* [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support
@ 2016-06-17  1:30 Darrick J. Wong
  2016-06-17  1:30 ` [PATCH 001/145] xfs_buflock: add a tool that can be used to find buffer deadlocks Darrick J. Wong
                   ` (144 more replies)
  0 siblings, 145 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:30 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Hi all,

This is the sixth revision of a patchset that adds to xfsprogs
support for tracking reverse-mappings of physical blocks to file and
metadata (rmap); support for mapping multiple file logical blocks to
the same physical block (reflink); and implements the beginnings of
online metadata scrubbing.  Given the significant number of design
assumptions that change with block sharing, rmap and reflink are
provided together.  There shouldn't be any incompatible on-disk format
changes, pending a thorough review of the patches within.
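
To make the two terms concrete, here is a toy sketch of the idea only.
It is not the XFS on-disk format and not an xfsprogs API; the class and
method names are made up for illustration.  A forward map goes from
(inode, logical block) to a physical block, the reverse map goes from a
physical block back to its owners, and a reflinked block is simply one
with more than one owner.

```python
from collections import defaultdict


class ToyFs:
    """Toy model only: not the XFS on-disk format or any xfsprogs API."""

    def __init__(self):
        self.fwd = {}                 # (inode, logical block) -> physical block
        self.rmap = defaultdict(set)  # physical block -> {(inode, logical block)}

    def map_block(self, ino, lblk, pblk):
        # Drop any stale reverse mapping before recording the new one.
        old = self.fwd.get((ino, lblk))
        if old is not None:
            self.rmap[old].discard((ino, lblk))
        self.fwd[(ino, lblk)] = pblk
        self.rmap[pblk].add((ino, lblk))

    def reflink(self, src_ino, src_lblk, dst_ino, dst_lblk):
        # Share the source's physical block instead of copying the data.
        self.map_block(dst_ino, dst_lblk, self.fwd[(src_ino, src_lblk)])

    def owners(self, pblk):
        # Reverse lookup: who references this physical block?
        return sorted(self.rmap.get(pblk, ()))

    def refcount(self, pblk):
        return len(self.rmap.get(pblk, ()))


fs = ToyFs()
fs.map_block(128, 0, 1000)   # inode 128, logical block 0 -> physical 1000
fs.reflink(128, 0, 129, 5)   # inode 129, logical block 5 shares it
print(fs.owners(1000))
print(fs.refcount(1000))
```

In this toy the reference count falls out of the reverse map; the real
patches keep separate rmapbt and refcountbt indices, as described below.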

The first few libxfs patches resync the xfsprogs libxfs with the
kernel libxfs so that the new feature development can branch off
(roughly) the same code.  All patches in the kernel patchset that
touch libxfs are provided here individually, and implement the same
code changes.

Patches to the actual xfsprogs programs are also provided -- xfs_db,
xfs_logprint, and xfs_metadump are taught to examine the new on-disk
data structures; xfs_io now knows how to inject errors for tests;
mkfs can format the new features; and xfs_repair has been taught to
check its own observations against the existing rmapbt and refcountbt
and to regenerate the two indices.

At the very end of the patchset is an initial implementation of twin
"fsmap" commands in xfs_db and xfs_io for offline and online analysis
of the filesystem.  The last patch implements xfs_scrub which performs
(very) limited online checking of the filesystem.  It should work on
any disk-based filesystem, and if it detects XFS it can trigger the
in-kernel metadata b+tree scrubbing.

The very first patch is a tool that can analyze xfs_buf deadlocks from
ftrace output, and the patches after that merge various kernel libxfs
changes into the xfsprogs libxfs so that the rest of the patches apply
consistently.

If you're going to start using this mess, you probably ought to just
pull from my github trees for kernel[1], xfsprogs[2], and xfstests[3].
There are also updates for xfs-docs[4].  The kernel patches should
apply to dchinner's for-next; xfsprogs patches to for-next; and
xfstests to master.  The kernel git tree already has for-next included.

The patches have been xfstested with x64, i386, and armv7l; arm64,
ppc64, and ppc64le no longer boot in qemu.  All three architectures
pass all 'clone' group tests except xfs/128 (which is the swapext
test), and AFAICT don't cause any new failures for the 'auto' group.

This is an extraordinary way to eat your data.  Enjoy! 
Comments and questions are, as always, welcome.

--D

[1] https://github.com/djwong/linux/tree/djwong-devel
[2] https://github.com/djwong/xfsprogs/tree/djwong-devel
[3] https://github.com/djwong/xfstests/tree/djwong-devel
[4] https://github.com/djwong/xfs-documentation/tree/djwong-devel

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 001/145] xfs_buflock: add a tool that can be used to find buffer deadlocks
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
@ 2016-06-17  1:30 ` Darrick J. Wong
  2016-06-17  1:30 ` [PATCH 002/145] libxfs: port changes from kernel libxfs Darrick J. Wong
                   ` (143 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:30 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Add a (rough) python script that can parse the output of:
# trace-cmd record -e 'xfs_buf_*lock*' <other tracepoints>
to identify xfs_buf deadlocks between XFS threads.
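
As a rough illustration of the approach (a simplified sketch, not the
script in this patch), the following splits each "trace-cmd report"
line into task, timestamp, event, and buffer identity, then remembers
which task still holds each buffer at the end of the trace.  The line
layout "task [cpu] time: event: dev D bno B nblks N ..." and the sample
lines are assumptions made for the example; the event names are the
ones the script matches on.

```python
def parse_line(line):
    """Split one report line into (task, time, event, buffer key).

    Assumed layout: 'task-pid [cpu] timestamp: event: dev D bno B nblks N ...'
    """
    toks = line.split()
    if len(toks) < 10:
        return None
    task = toks[0]
    time = float(toks[2].rstrip(':'))
    event = toks[3].rstrip(':')
    bufkey = ' '.join(toks[4:10])   # e.g. 'dev 8:16 bno 0x1 nblks 0x1'
    return task, time, event, bufkey


def held_buffers(lines):
    """Return {buffer key: owning task} for buffers still locked at the end."""
    held = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        task, _time, event, bufkey = parsed
        if event in ('xfs_buf_trylock', 'xfs_buf_lock_done'):
            held[bufkey] = task
        elif event in ('xfs_buf_unlock', 'xfs_buf_item_unlock_stale'):
            held.pop(bufkey, None)
    return held


# Hypothetical sample input, written by hand for this example.
sample = [
    "fsx-14956 [001] 3732.005567: xfs_buf_lock_done: dev 8:16 bno 0x64c371 nblks 0x1 hold 2",
    "fsx-14954 [000] 3732.005574: xfs_buf_trylock: dev 8:16 bno 0x1 nblks 0x1 hold 2",
    "fsx-14954 [000] 3732.005580: xfs_buf_unlock: dev 8:16 bno 0x1 nblks 0x1 hold 2",
]
held = held_buffers(sample)
print(held)
```

Any task that still owns a buffer once the trace ends is a candidate
for the deadlock report; the real script additionally tracks waiters
and keeps a short backtrace of other events per task.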

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 tools/xfsbuflock.py |  205 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 205 insertions(+)
 create mode 100755 tools/xfsbuflock.py


diff --git a/tools/xfsbuflock.py b/tools/xfsbuflock.py
new file mode 100755
index 0000000..f307461
--- /dev/null
+++ b/tools/xfsbuflock.py
@@ -0,0 +1,205 @@
+#!/usr/bin/env python3
+
+# Read ftrace input, looking for XFS buffer deadlocks.
+#
+# Copyright (C) 2016 Oracle.  All Rights Reserved.
+#
+# Author: Darrick J. Wong <darrick.wong@oracle.com>
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+#
+# Rough guide to using this script:
+# Collect ftrace data from a deadlock:
+#
+# # trace-cmd record -e 'xfs_buf_*lock*' <other traces> &
+# <run command, hang system>^Z
+# # killall -INT trace-cmd
+# <wait for trace-cmd to spit out trace.dat>
+#
+# Now analyze the captured trace data:
+#
+# # trace-cmd report | xfsbuflock.py
+# === fsx-14956 ===
+# <trace data>
+# 3732.005575: xfs_buf_trylock_fail: dev 8:16 bno 0x1 nblks 0x1 hold 4 \
+#		pincount 1 lock 0 flags DONE|KMEM caller 0xc009af36s
+# Locked buffers:
+# dev 8:16 bno 0x64c371 nblks 0x1 lock 1 owner fsx-14956@3732.005567
+#   waiting: fsx-14954
+# dev 8:16 bno 0x64c380 nblks 0x8 lock 1 owner fsx-14956@3732.005571
+# dev 8:16 bno 0x64c378 nblks 0x8 lock 1 owner fsx-14956@3732.005570
+# === fsx-14954 ===
+# <trace data>
+# 3732.005592: xfs_buf_trylock_fail: dev 8:16 bno 0x64c371 nblks 0x1 hold 4 \
+#		pincount 1 lock 0 flags ASYNC|DONE|KMEM caller 0xc009af36s
+# Locked buffers:
+# dev 8:16 bno 0x8 nblks 0x8 lock 1 owner fsx-14954@3732.005583
+# dev 8:16 bno 0x1 nblks 0x1 lock 1 owner fsx-14954@3732.005574
+#   waiting: fsx-14956
+#   waiting: fsx-14957
+#   waiting: fsx-14958
+# dev 8:16 bno 0x10 nblks 0x8 lock 1 owner fsx-14954@3732.005585
+#
+# As you can see, fsx-14956 is locking AGFs in violation of the locking
+# order rules.
+
+import sys
+import fileinput
+from collections import namedtuple
+
+NR_BACKTRACE = 50
+
+class Process:
+	def __init__(self, pid):
+		self.pid = pid
+		self.bufs = set()
+		self.locked_bufs = set()
+		self.backtrace = []
+
+	def dump(self):
+		print('=== %s ===' % self.pid)
+		for bt in self.backtrace:
+			print('%f: %s' % (bt.time, bt.descr))
+		print('Locked buffers:')
+		for buf in self.locked_bufs:
+			buf.dump()
+
+class Buffer:
+	def __init__(self, dev, bno, blen):
+		self.dev = dev
+		self.bno = int(bno, 0)
+		self.blen = int(blen, 0)
+		self.locked = False
+		self.locktime = None
+		self.owner = None
+		self.waiters = set()
+
+	def trylock(self, process, time):
+		self.lockdone(process, time)
+
+	def lockdone(self, process, time):
+		if self.locked:
+			print('Buffer already locked on line %d?!' % nr)
+		#	process.dump()
+		#	self.dump()
+		#	assert False
+		if process in self.waiters:
+			self.waiters.remove(process)
+		self.locked = True
+		self.owner = process
+		self.locktime = time
+		process.locked_bufs.add(self)
+		process.bufs.add(self)
+		locked_buffers.add(self)
+
+	def waitlock(self, process):
+		self.waiters.add(process)
+
+	def unlock(self):
+		self.locked = False
+		if self in locked_buffers:
+			locked_buffers.remove(self)
+		if self.owner is not None and \
+		   self in self.owner.locked_bufs:
+			self.owner.locked_bufs.remove(self)
+
+	def dump(self):
+		if self.owner is not None:
+			pid = '%s@%f' % (self.owner.pid, self.locktime)
+		else:
+			pid = ''
+		print('dev %s bno 0x%x nblks 0x%x lock %d owner %s' % \
+			(self.dev, self.bno, self.blen, self.locked, \
+			pid))
+		for proc in self.waiters:
+			print('  waiting: %s' % proc.pid)
+
+Event = namedtuple('Event', 'time descr')
+
+# Read ftrace input, looking for events and for buffer lock info
+processes = {}
+buffers = {}
+locked_buffers = set()
+
+def getbuf(toks):
+	if int(toks[7], 0) == 18446744073709551615:
+		return None
+	bufkey = ' '.join(toks[4:10])
+	if bufkey in buffers:
+		return buffers[bufkey]
+	buf = Buffer(toks[5], toks[7], toks[9])
+	buffers[bufkey] = buf
+	return buf
+
+nr = 0
+for line in fileinput.input():
+	nr += 1
+	toks = line.split()
+	if len(toks) < 4:
+		continue
+	pid = toks[0]
+	time = float(toks[2][:-1])
+	fn = toks[3][:-1]
+
+	if pid in processes:
+		proc = processes[pid]
+	else:
+		proc = Process(pid)
+		processes[pid] = proc
+
+	if fn == 'xfs_buf_unlock' or fn == 'xfs_buf_item_unlock_stale':
+		buf = getbuf(toks)
+		if buf is not None:
+			buf.unlock()
+	elif fn == 'xfs_buf_lock_done':
+		buf = getbuf(toks)
+		if buf is not None:
+			buf.lockdone(proc, time)
+	elif fn == 'xfs_buf_lock':
+		buf = getbuf(toks)
+		if buf is not None:
+			buf.waitlock(proc)
+	elif fn == 'xfs_buf_trylock':
+		buf = getbuf(toks)
+		if buf is not None:
+			buf.trylock(proc, time)
+	elif fn == 'xfs_buf_item_unlock':
+		pass
+	else:
+		e = Event(time, ' '.join(toks[3:]))
+		proc.backtrace.append(e)
+		if len(proc.backtrace) > NR_BACKTRACE:
+			proc.backtrace.pop(0)
+
+deadlocked = set()
+for buf in locked_buffers:
+	deadlocked.add(buf.owner)
+
+for proc in deadlocked:
+	proc.dump()
+	
+sys.exit(0)
+
+for key in buffers:
+	buf = buffers[key]
+	if buf.locked:
+		print('dev %s bno 0x%x len 0x%x owner %s' % (buf.dev, buf.bno, buf.blen, buf.owner.pid))
+	else:
+		print('dev %s bno 0x%x len 0x%x' % (buf.dev, buf.bno, buf.blen))
+
+sys.exit(0)
+
+for pid in processes:
+	proc = processes[pid]


* [PATCH 002/145] libxfs: port changes from kernel libxfs
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
  2016-06-17  1:30 ` [PATCH 001/145] xfs_buflock: add a tool that can be used to find buffer deadlocks Darrick J. Wong
@ 2016-06-17  1:30 ` Darrick J. Wong
  2016-06-17  1:31 ` [PATCH 003/145] libxfs: backport changes from 4.5 Darrick J. Wong
                   ` (142 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:30 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Port various changes between the kernel and xfsprogs.  This
cleans up both so that we can develop rmap and reflink on the
same libxfs code.
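
For reviewers checking the xfs_dquot_buf.c hunk: the userspace branch
of xfs_calc_dquots_per_chunk() is plain integer division of the chunk
size in bytes by the size of one dquot block record.  A small worked
sketch of that arithmetic follows.  The constants are assumptions for
illustration (512-byte basic blocks, and 136 bytes for
sizeof(xfs_dqblk_t)); check them against xfs_format.h before relying
on the numbers.

```python
BBSHIFT = 9        # assumed: XFS basic blocks are 512 bytes
DQBLK_SIZE = 136   # assumed: sizeof(xfs_dqblk_t)


def bbtob(nbblks):
    """Convert a count of basic blocks to bytes, like BBTOB()."""
    return nbblks << BBSHIFT


def calc_dquots_per_chunk(nbblks):
    """Mirror of the userspace branch: bytes in the chunk / record size."""
    assert nbblks > 0
    return bbtob(nbblks) // DQBLK_SIZE


print(calc_dquots_per_chunk(8))   # 4096 // 136
```

Under those assumptions an 8 basic block (4096 byte) chunk holds 30
dquots, with the remaining 16 bytes unused.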

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_alloc.c     |    5 +++--
 libxfs/xfs_btree.c     |    4 ++--
 libxfs/xfs_da_btree.c  |   51 ++++++++++++++++++++++++------------------------
 libxfs/xfs_dir2_node.c |    2 ++
 libxfs/xfs_dquot_buf.c |   10 +++++----
 libxfs/xfs_format.h    |    3 +--
 libxfs/xfs_ialloc.c    |    5 ++---
 libxfs/xfs_inode_buf.c |    3 +--
 libxfs/xfs_inode_buf.h |    2 --
 libxfs/xfs_sb.c        |   19 ++++++++++++++++++
 10 files changed, 61 insertions(+), 43 deletions(-)


diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index af40270..28b3fb9 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -2411,8 +2411,9 @@ xfs_alloc_read_agf(
 			be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
 		spin_lock_init(&pag->pagb_lock);
 		pag->pagb_count = 0;
-		/* XXX: pagb_tree doesn't exist in userspace */
-		//pag->pagb_tree = RB_ROOT;
+#ifdef __KERNEL__
+		pag->pagb_tree = RB_ROOT;
+#endif
 		pag->pagf_init = 1;
 	}
 #ifdef DEBUG
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index a736cb5..1448fd6 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -243,7 +243,7 @@ xfs_btree_lblock_verify_crc(
 	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
 	struct xfs_mount	*mp = bp->b_target->bt_mount;
 
-	if (xfs_sb_version_hascrc(&bp->b_target->bt_mount->m_sb)) {
+	if (xfs_sb_version_hascrc(&mp->m_sb)) {
 		if (!xfs_log_check_lsn(mp, be64_to_cpu(block->bb_u.l.bb_lsn)))
 			return false;
 		return xfs_buf_verify_cksum(bp, XFS_BTREE_LBLOCK_CRC_OFF);
@@ -281,7 +281,7 @@ xfs_btree_sblock_verify_crc(
 	struct xfs_btree_block  *block = XFS_BUF_TO_BLOCK(bp);
 	struct xfs_mount	*mp = bp->b_target->bt_mount;
 
-	if (xfs_sb_version_hascrc(&bp->b_target->bt_mount->m_sb)) {
+	if (xfs_sb_version_hascrc(&mp->m_sb)) {
 		if (!xfs_log_check_lsn(mp, be64_to_cpu(block->bb_u.s.bb_lsn)))
 			return false;
 		return xfs_buf_verify_cksum(bp, XFS_BTREE_SBLOCK_CRC_OFF);
diff --git a/libxfs/xfs_da_btree.c b/libxfs/xfs_da_btree.c
index f3c04ab..298252e 100644
--- a/libxfs/xfs_da_btree.c
+++ b/libxfs/xfs_da_btree.c
@@ -351,6 +351,7 @@ xfs_da3_split(
 	struct xfs_da_state_blk	*newblk;
 	struct xfs_da_state_blk	*addblk;
 	struct xfs_da_intnode	*node;
+	struct xfs_buf		*bp;
 	int			max;
 	int			action = 0;
 	int			error;
@@ -391,9 +392,7 @@ xfs_da3_split(
 				break;
 			}
 			/*
-			 * Entry wouldn't fit, split the leaf again. The new
-			 * extrablk will be consumed by xfs_da3_node_split if
-			 * the node is split.
+			 * Entry wouldn't fit, split the leaf again.
 			 */
 			state->extravalid = 1;
 			if (state->inleaf) {
@@ -442,14 +441,6 @@ xfs_da3_split(
 		return 0;
 
 	/*
-	 * xfs_da3_node_split() should have consumed any extra blocks we added
-	 * during a double leaf split in the attr fork. This is guaranteed as
-	 * we can't be here if the attr fork only has a single leaf block.
-	 */
-	ASSERT(state->extravalid == 0 ||
-	       state->path.blk[max].magic == XFS_DIR2_LEAFN_MAGIC);
-
-	/*
 	 * Split the root node.
 	 */
 	ASSERT(state->path.active == 0);
@@ -461,31 +452,41 @@ xfs_da3_split(
 	}
 
 	/*
-	 * Update pointers to the node which used to be block 0 and just got
-	 * bumped because of the addition of a new root node.  Note that the
-	 * original block 0 could be at any position in the list of blocks in
-	 * the tree.
+	 * Update pointers to the node which used to be block 0 and
+	 * just got bumped because of the addition of a new root node.
+	 * There might be three blocks involved if a double split occurred,
+	 * and the original block 0 could be at any position in the list.
 	 *
-	 * Note: the magic numbers and sibling pointers are in the same physical
-	 * place for both v2 and v3 headers (by design). Hence it doesn't matter
-	 * which version of the xfs_da_intnode structure we use here as the
-	 * result will be the same using either structure.
+	 * Note: the magic numbers and sibling pointers are in the same
+	 * physical place for both v2 and v3 headers (by design). Hence it
+	 * doesn't matter which version of the xfs_da_intnode structure we use
+	 * here as the result will be the same using either structure.
 	 */
 	node = oldblk->bp->b_addr;
 	if (node->hdr.info.forw) {
-		ASSERT(be32_to_cpu(node->hdr.info.forw) == addblk->blkno);
-		node = addblk->bp->b_addr;
+		if (be32_to_cpu(node->hdr.info.forw) == addblk->blkno) {
+			bp = addblk->bp;
+		} else {
+			ASSERT(state->extravalid);
+			bp = state->extrablk.bp;
+		}
+		node = bp->b_addr;
 		node->hdr.info.back = cpu_to_be32(oldblk->blkno);
-		xfs_trans_log_buf(state->args->trans, addblk->bp,
+		xfs_trans_log_buf(state->args->trans, bp,
 				  XFS_DA_LOGRANGE(node, &node->hdr.info,
 				  sizeof(node->hdr.info)));
 	}
 	node = oldblk->bp->b_addr;
 	if (node->hdr.info.back) {
-		ASSERT(be32_to_cpu(node->hdr.info.back) == addblk->blkno);
-		node = addblk->bp->b_addr;
+		if (be32_to_cpu(node->hdr.info.back) == addblk->blkno) {
+			bp = addblk->bp;
+		} else {
+			ASSERT(state->extravalid);
+			bp = state->extrablk.bp;
+		}
+		node = bp->b_addr;
 		node->hdr.info.forw = cpu_to_be32(oldblk->blkno);
-		xfs_trans_log_buf(state->args->trans, addblk->bp,
+		xfs_trans_log_buf(state->args->trans, bp,
 				  XFS_DA_LOGRANGE(node, &node->hdr.info,
 				  sizeof(node->hdr.info)));
 	}
diff --git a/libxfs/xfs_dir2_node.c b/libxfs/xfs_dir2_node.c
index 224daa6..04fecf1 100644
--- a/libxfs/xfs_dir2_node.c
+++ b/libxfs/xfs_dir2_node.c
@@ -2146,12 +2146,14 @@ xfs_dir2_node_replace(
 	state = xfs_da_state_alloc();
 	state->args = args;
 	state->mp = args->dp->i_mount;
+
 	/*
 	 * We have to save new inode number and ftype since
 	 * xfs_da3_node_lookup_int() is going to overwrite them
 	 */
 	inum = args->inumber;
 	ftype = args->filetype;
+
 	/*
 	 * Lookup the entry to change in the btree.
 	 */
diff --git a/libxfs/xfs_dquot_buf.c b/libxfs/xfs_dquot_buf.c
index 433abe4..7b7ea83 100644
--- a/libxfs/xfs_dquot_buf.c
+++ b/libxfs/xfs_dquot_buf.c
@@ -37,11 +37,8 @@ int
 xfs_calc_dquots_per_chunk(
 	unsigned int		nbblks)	/* basic block units */
 {
-
-	ASSERT(nbblks > 0);
-	return BBTOB(nbblks) / sizeof(xfs_dqblk_t);
-
-#if 0	/* kernel code that goes wrong in userspace! */
+#ifdef __KERNEL__
+	/* kernel code that goes wrong in userspace! */
 	unsigned int	ndquots;
 
 	ASSERT(nbblks > 0);
@@ -49,6 +46,9 @@ xfs_calc_dquots_per_chunk(
 	do_div(ndquots, sizeof(xfs_dqblk_t));
 
 	return ndquots;
+#else
+	ASSERT(nbblks > 0);
+	return BBTOB(nbblks) / sizeof(xfs_dqblk_t);
 #endif
 }
 
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index f89b6e0..825fa0c 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -469,7 +469,6 @@ xfs_sb_has_ro_compat_feature(
 #define XFS_SB_FEAT_INCOMPAT_FTYPE	(1 << 0)	/* filetype in dirent */
 #define XFS_SB_FEAT_INCOMPAT_SPINODES	(1 << 1)	/* sparse inode chunks */
 #define XFS_SB_FEAT_INCOMPAT_META_UUID	(1 << 2)	/* metadata UUID */
-
 #define XFS_SB_FEAT_INCOMPAT_ALL \
 		(XFS_SB_FEAT_INCOMPAT_FTYPE|	\
 		 XFS_SB_FEAT_INCOMPAT_SPINODES|	\
@@ -533,7 +532,7 @@ static inline bool xfs_sb_version_hassparseinodes(struct xfs_sb *sbp)
  * user-visible UUID to be changed on V5 filesystems which have a
  * filesystem UUID stamped into every piece of metadata.
  */
-static inline int xfs_sb_version_hasmetauuid(xfs_sb_t *sbp)
+static inline bool xfs_sb_version_hasmetauuid(struct xfs_sb *sbp)
 {
 	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
 		(sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_META_UUID);
diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index 64c3acf..72e9ff7 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -1822,8 +1822,7 @@ xfs_difree_inode_chunk(
 
 	if (!xfs_inobt_issparse(rec->ir_holemask)) {
 		/* not sparse, calculate extent info directly */
-		xfs_bmap_add_free(mp, flist, XFS_AGB_TO_FSB(mp, agno,
-				  XFS_AGINO_TO_AGBNO(mp, rec->ir_startino)),
+		xfs_bmap_add_free(mp, flist, XFS_AGB_TO_FSB(mp, agno, sagbno),
 				  mp->m_ialloc_blks);
 		return;
 	}
@@ -2228,7 +2227,7 @@ xfs_imap_lookup(
 	}
 
 	xfs_trans_brelse(tp, agbp);
-	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+	xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
 	if (error)
 		return error;
 
diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index e3d674d..c21a4e6 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -375,7 +375,7 @@ xfs_log_dinode_to_disk(
 	}
 }
 
-bool
+static bool
 xfs_dinode_verify(
 	struct xfs_mount	*mp,
 	xfs_ino_t		ino,
@@ -514,7 +514,6 @@ xfs_iread(
 	}
 
 	ASSERT(ip->i_d.di_version >= 2);
-
 	ip->i_delayed_blks = 0;
 
 	/*
diff --git a/libxfs/xfs_inode_buf.h b/libxfs/xfs_inode_buf.h
index 4ece9bf..958c543 100644
--- a/libxfs/xfs_inode_buf.h
+++ b/libxfs/xfs_inode_buf.h
@@ -72,8 +72,6 @@ void	xfs_inode_to_disk(struct xfs_inode *ip, struct xfs_dinode *to,
 void	xfs_inode_from_disk(struct xfs_inode *ip, struct xfs_dinode *from);
 void	xfs_log_dinode_to_disk(struct xfs_log_dinode *from,
 			       struct xfs_dinode *to);
-bool	xfs_dinode_verify(struct xfs_mount *mp, xfs_ino_t ino,
-			  struct xfs_dinode *dip);
 
 bool	xfs_dinode_good_version(struct xfs_mount *mp, __u8 version);
 
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 78ad889..67c7a65 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -256,6 +256,19 @@ xfs_mount_validate_sb(
 	}
 
 	/*
+	 * Until this is fixed only page-sized or smaller data blocks work.
+	 */
+#ifdef __KERNEL__
+	if (unlikely(sbp->sb_blocksize > PAGE_SIZE)) {
+		xfs_warn(mp,
+		"File system with blocksize %d bytes. "
+		"Only pagesize (%ld) or less will currently work.",
+				sbp->sb_blocksize, PAGE_SIZE);
+		return -ENOSYS;
+	}
+#endif
+
+	/*
 	 * Currently only very few inode sizes are supported.
 	 */
 	switch (sbp->sb_inodesize) {
@@ -277,6 +290,12 @@ xfs_mount_validate_sb(
 		return -EFBIG;
 	}
 
+#ifdef __KERNEL__
+	if (check_inprogress && sbp->sb_inprogress) {
+		xfs_warn(mp, "Offline file system operation in progress!");
+		return -EFSCORRUPTED;
+	}
+#endif
 	return 0;
 }
 


* [PATCH 003/145] libxfs: backport changes from 4.5
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
  2016-06-17  1:30 ` [PATCH 001/145] xfs_buflock: add a tool that can be used to find buffer deadlocks Darrick J. Wong
  2016-06-17  1:30 ` [PATCH 002/145] libxfs: port changes from kernel libxfs Darrick J. Wong
@ 2016-06-17  1:31 ` Darrick J. Wong
  2016-06-17  1:31 ` [PATCH 004/145] libxfs: backport kernel 4.6 changes Darrick J. Wong
                   ` (141 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:31 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Backport changes from kernel 4.3 -> 4.5.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/libxfs.h          |    1 +
 libxfs/libxfs_api_defs.h  |    2 +-
 libxfs/libxfs_priv.h      |    8 --------
 libxfs/xfs_alloc_btree.c  |    4 ----
 libxfs/xfs_attr.c         |    2 ++
 libxfs/xfs_dir2.h         |    3 +--
 libxfs/xfs_dir2_data.c    |   14 +++++++-------
 libxfs/xfs_dquot_buf.c    |    4 ++--
 libxfs/xfs_ialloc_btree.c |    2 --
 repair/dir2.c             |    3 +--
 repair/dir2.h             |   20 ++++++++++++++++++++
 repair/phase6.c           |    2 +-
 12 files changed, 36 insertions(+), 29 deletions(-)


diff --git a/include/libxfs.h b/include/libxfs.h
index cf2e20e..a34a3a9 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -163,6 +163,7 @@ extern unsigned int	libxfs_log2_roundup(unsigned int i);
 
 extern int	libxfs_alloc_file_space (struct xfs_inode *, xfs_off_t,
 				xfs_off_t, int, int);
+extern int	libxfs_bmap_finish(xfs_trans_t **, xfs_bmap_free_t *, struct xfs_inode *);
 
 extern void 	libxfs_fs_repair_cmn_err(int, struct xfs_mount *, char *, ...);
 extern void	libxfs_fs_cmn_err(int, struct xfs_mount *, char *, ...);
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 685c7a7..bb502e0 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -85,7 +85,7 @@
 #define xfs_dir_replace			libxfs_dir_replace
 #define xfs_dir2_isblock		libxfs_dir2_isblock
 #define xfs_dir2_isleaf			libxfs_dir2_isleaf
-#define __xfs_dir2_data_freescan	libxfs_dir2_data_freescan
+#define xfs_dir2_data_freescan		libxfs_dir2_data_freescan
 #define xfs_dir2_data_log_entry		libxfs_dir2_data_log_entry
 #define xfs_dir2_data_log_header	libxfs_dir2_data_log_header
 #define xfs_dir2_data_make_free		libxfs_dir2_data_make_free
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 2c5aba0..ef9ff3b 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -49,14 +49,6 @@
 #ifndef __LIBXFS_INTERNAL_XFS_H__
 #define __LIBXFS_INTERNAL_XFS_H__
 
-/*
- * Repair doesn't have a inode when it calls libxfs_dir2_data_freescan,
- * so we to work around this internally for now.
- */
-#define xfs_dir2_data_freescan(ip, hdr, loghead) \
-	__xfs_dir2_data_freescan((ip)->i_mount->m_dir_geo, \
-				 (ip)->d_ops, hdr, loghead)
-
 #include "libxfs_api_defs.h"
 #include "platform_defs.h"
 #include "xfs.h"
diff --git a/libxfs/xfs_alloc_btree.c b/libxfs/xfs_alloc_btree.c
index 79d0fb9..094135f 100644
--- a/libxfs/xfs_alloc_btree.c
+++ b/libxfs/xfs_alloc_btree.c
@@ -289,8 +289,6 @@ xfs_allocbt_verify(
 	level = be16_to_cpu(block->bb_level);
 	switch (block->bb_magic) {
 	case cpu_to_be32(XFS_ABTB_CRC_MAGIC):
-		if (!xfs_sb_version_hascrc(&mp->m_sb))
-			return false;
 		if (!xfs_btree_sblock_v5hdr_verify(bp))
 			return false;
 		/* fall through */
@@ -302,8 +300,6 @@ xfs_allocbt_verify(
 			return false;
 		break;
 	case cpu_to_be32(XFS_ABTC_CRC_MAGIC):
-		if (!xfs_sb_version_hascrc(&mp->m_sb))
-			return false;
 		if (!xfs_btree_sblock_v5hdr_verify(bp))
 			return false;
 		/* fall through */
diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index afe3dcb..82a7c5e 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -134,6 +134,8 @@ xfs_attr_get(
 
 	args.value = value;
 	args.valuelen = *valuelenp;
+	/* Entirely possible to look up a name which doesn't exist */
+	args.op_flags = XFS_DA_OP_OKNOENT;
 
 	lock_mode = xfs_ilock_attr_map_shared(ip);
 	if (!xfs_inode_hasattr(ip))
diff --git a/libxfs/xfs_dir2.h b/libxfs/xfs_dir2.h
index 0129e37..0a62e73 100644
--- a/libxfs/xfs_dir2.h
+++ b/libxfs/xfs_dir2.h
@@ -157,8 +157,7 @@ extern int xfs_dir2_isleaf(struct xfs_da_args *args, int *r);
 extern int xfs_dir2_shrink_inode(struct xfs_da_args *args, xfs_dir2_db_t db,
 				struct xfs_buf *bp);
 
-extern void __xfs_dir2_data_freescan(struct xfs_da_geometry *geo,
-		const struct xfs_dir_ops *ops,
+extern void xfs_dir2_data_freescan(struct xfs_inode *dp,
 		struct xfs_dir2_data_hdr *hdr, int *loghead);
 extern void xfs_dir2_data_log_entry(struct xfs_da_args *args,
 		struct xfs_buf *bp, struct xfs_dir2_data_entry *dep);
diff --git a/libxfs/xfs_dir2_data.c b/libxfs/xfs_dir2_data.c
index 6ae5cd2..c80ab7e 100644
--- a/libxfs/xfs_dir2_data.c
+++ b/libxfs/xfs_dir2_data.c
@@ -502,9 +502,8 @@ xfs_dir2_data_freeremove(
  * Given a data block, reconstruct its bestfree map.
  */
 void
-__xfs_dir2_data_freescan(
-	struct xfs_da_geometry	*geo,
-	const struct xfs_dir_ops *ops,
+xfs_dir2_data_freescan(
+	struct xfs_inode	*dp,
 	struct xfs_dir2_data_hdr *hdr,
 	int			*loghead)
 {
@@ -514,6 +513,7 @@ __xfs_dir2_data_freescan(
 	struct xfs_dir2_data_free *bf;
 	char			*endp;		/* end of block's data */
 	char			*p;		/* current entry pointer */
+	struct xfs_da_geometry	*geo = dp->i_mount->m_dir_geo;
 
 	ASSERT(hdr->magic == cpu_to_be32(XFS_DIR2_DATA_MAGIC) ||
 	       hdr->magic == cpu_to_be32(XFS_DIR3_DATA_MAGIC) ||
@@ -523,13 +523,13 @@ __xfs_dir2_data_freescan(
 	/*
 	 * Start by clearing the table.
 	 */
-	bf = ops->data_bestfree_p(hdr);
+	bf = dp->d_ops->data_bestfree_p(hdr);
 	memset(bf, 0, sizeof(*bf) * XFS_DIR2_DATA_FD_COUNT);
 	*loghead = 1;
 	/*
 	 * Set up pointers.
 	 */
-	p = (char *)ops->data_entry_p(hdr);
+	p = (char *)dp->d_ops->data_entry_p(hdr);
 	if (hdr->magic == cpu_to_be32(XFS_DIR2_BLOCK_MAGIC) ||
 	    hdr->magic == cpu_to_be32(XFS_DIR3_BLOCK_MAGIC)) {
 		btp = xfs_dir2_block_tail_p(geo, hdr);
@@ -556,8 +556,8 @@ __xfs_dir2_data_freescan(
 		else {
 			dep = (xfs_dir2_data_entry_t *)p;
 			ASSERT((char *)dep - (char *)hdr ==
-			       be16_to_cpu(*ops->data_entry_tag_p(dep)));
-			p += ops->data_entsize(dep->namelen);
+			       be16_to_cpu(*dp->d_ops->data_entry_tag_p(dep)));
+			p += dp->d_ops->data_entsize(dep->namelen);
 		}
 	}
 }
diff --git a/libxfs/xfs_dquot_buf.c b/libxfs/xfs_dquot_buf.c
index 7b7ea83..bfc6383 100644
--- a/libxfs/xfs_dquot_buf.c
+++ b/libxfs/xfs_dquot_buf.c
@@ -199,8 +199,8 @@ xfs_dquot_buf_verify_crc(
 	if (mp->m_quotainfo)
 		ndquots = mp->m_quotainfo->qi_dqperchunk;
 	else
-		ndquots = xfs_calc_dquots_per_chunk(bp->b_length);
-//					XFS_BB_TO_FSB(mp, bp->b_length));
+		ndquots = xfs_calc_dquots_per_chunk(
+					XFS_BB_TO_FSB(mp, bp->b_length));
 
 	for (i = 0; i < ndquots; i++, d++) {
 		if (!xfs_verify_cksum((char *)d, sizeof(struct xfs_dqblk),
diff --git a/libxfs/xfs_ialloc_btree.c b/libxfs/xfs_ialloc_btree.c
index a077635..0a7d985 100644
--- a/libxfs/xfs_ialloc_btree.c
+++ b/libxfs/xfs_ialloc_btree.c
@@ -227,8 +227,6 @@ xfs_inobt_verify(
 	switch (block->bb_magic) {
 	case cpu_to_be32(XFS_IBT_CRC_MAGIC):
 	case cpu_to_be32(XFS_FIBT_CRC_MAGIC):
-		if (!xfs_sb_version_hascrc(&mp->m_sb))
-			return false;
 		if (!xfs_btree_sblock_v5hdr_verify(bp))
 			return false;
 		/* fall through */
diff --git a/repair/dir2.c b/repair/dir2.c
index 61912d1..a2fe5c6 100644
--- a/repair/dir2.c
+++ b/repair/dir2.c
@@ -921,8 +921,7 @@ _("bad bestfree table in block %u in directory inode %" PRIu64 ": "),
 			da_bno, ino);
 		if (!no_modify) {
 			do_warn(_("repairing table\n"));
-			libxfs_dir2_data_freescan(mp->m_dir_geo, M_DIROPS(mp),
-						  d, &i);
+			repair_dir2_data_freescan(mp, M_DIROPS(mp), d, &i);
 			*dirty = 1;
 		} else {
 			do_warn(_("would repair table\n"));
diff --git a/repair/dir2.h b/repair/dir2.h
index e4d4eeb..a0d873e 100644
--- a/repair/dir2.h
+++ b/repair/dir2.h
@@ -43,4 +43,24 @@ int
 dir2_is_badino(
 	xfs_ino_t	ino);
 
+/*
+ * Repair doesn't have an inode when it calls libxfs_dir2_data_freescan,
+ * so we work around this internally for now.
+ */
+static inline void
+repair_dir2_data_freescan(
+	struct xfs_mount		*mp,
+	const struct xfs_dir_ops	*d_ops,
+	struct xfs_dir2_data_hdr	*hdr,
+	int				*loghead)
+{
+	struct xfs_inode	ino;
+
+	ino.d_ops = d_ops;
+	ino.i_mount = mp;
+	libxfs_dir2_data_freescan(&ino, hdr, loghead);
+}
+
+extern int xfs_dir_ino_validate(struct xfs_mount *mp, xfs_ino_t ino);
+
 #endif	/* _XR_DIR2_H */
diff --git a/repair/phase6.c b/repair/phase6.c
index 0a71164..7353c3a 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -1902,7 +1902,7 @@ _("entry \"%s\" in dir inode %" PRIu64 " inconsistent with .. value (%" PRIu64 "
 	}
 	*num_illegal += nbad;
 	if (needscan)
-		libxfs_dir2_data_freescan(mp->m_dir_geo, M_DIROPS(mp), d, &i);
+		repair_dir2_data_freescan(mp, M_DIROPS(mp), d, &i);
 	if (needlog)
 		libxfs_dir2_data_log_header(&da, bp);
 	libxfs_bmap_finish(&tp, &flist, ip);


* [PATCH 004/145] libxfs: backport kernel 4.6 changes
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (2 preceding siblings ...)
  2016-06-17  1:31 ` [PATCH 003/145] libxfs: backport changes from 4.5 Darrick J. Wong
@ 2016-06-17  1:31 ` Darrick J. Wong
  2016-06-17  1:31 ` [PATCH 005/145] libxfs: backport kernel 4.7 changes Darrick J. Wong
                   ` (140 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:31 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Backport the changes from kernel 4.5 -> 4.6.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_trans.h     |    1 
 libxfs/libxfs_priv.h    |    2 -
 libxfs/trans.c          |    1 
 libxfs/xfs_bmap.c       |  170 +++++++++++++++++++++++++++++++++--------------
 libxfs/xfs_bmap_btree.c |    4 +
 libxfs/xfs_dir2_node.c  |    4 +
 libxfs/xfs_ialloc.c     |    4 +
 7 files changed, 129 insertions(+), 57 deletions(-)


diff --git a/include/xfs_trans.h b/include/xfs_trans.h
index 5467c7f..d7ee1fd 100644
--- a/include/xfs_trans.h
+++ b/include/xfs_trans.h
@@ -81,6 +81,7 @@ typedef struct xfs_trans {
 	long		t_fdblocks_delta;	/* superblock fdblocks chg */
 	long		t_frextents_delta;	/* superblock freextents chg */
 	struct list_head	t_items;	/* first log item desc chunk */
+	unsigned int	t_blk_res;
 } xfs_trans_t;
 
 void	xfs_trans_init(struct xfs_mount *);
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index ef9ff3b..ecd75e7 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -187,7 +187,7 @@ enum ce { CE_DEBUG, CE_CONT, CE_NOTE, CE_WARN, CE_ALERT, CE_PANIC };
  */
 #define prandom_u32()		0
 
-#define PAGE_CACHE_SIZE		getpagesize()
+#define PAGE_SIZE		getpagesize()
 
 static inline int __do_div(unsigned long long *n, unsigned base)
 {
diff --git a/libxfs/trans.c b/libxfs/trans.c
index 0388950..18ea010 100644
--- a/libxfs/trans.c
+++ b/libxfs/trans.c
@@ -193,6 +193,7 @@ libxfs_trans_reserve(
 	if (blocks > 0) {
 		if (mpsb->sb_fdblocks < blocks)
 			return -ENOSPC;
+		tp->t_blk_res += blocks;
 	}
 	/* user space, don't need log/RT stuff (preserve the API though) */
 	return 0;
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 40286e4..cbcfd72 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -469,10 +469,7 @@ xfs_bmap_check_leaf_extents(
 		}
 		block = XFS_BUF_TO_BLOCK(bp);
 	}
-	if (bp_release) {
-		bp_release = 0;
-		xfs_trans_brelse(NULL, bp);
-	}
+
 	return;
 
 error0:
@@ -3737,11 +3734,11 @@ xfs_bmap_btalloc(
 		args.prod = align;
 		if ((args.mod = (xfs_extlen_t)do_mod(ap->offset, args.prod)))
 			args.mod = (xfs_extlen_t)(args.prod - args.mod);
-	} else if (mp->m_sb.sb_blocksize >= PAGE_CACHE_SIZE) {
+	} else if (mp->m_sb.sb_blocksize >= PAGE_SIZE) {
 		args.prod = 1;
 		args.mod = 0;
 	} else {
-		args.prod = PAGE_CACHE_SIZE >> mp->m_sb.sb_blocklog;
+		args.prod = PAGE_SIZE >> mp->m_sb.sb_blocklog;
 		if ((args.mod = (xfs_extlen_t)(do_mod(ap->offset, args.prod))))
 			args.mod = (xfs_extlen_t)(args.prod - args.mod);
 	}
@@ -4713,6 +4710,66 @@ error0:
 }
 
 /*
+ * When a delalloc extent is split (e.g., due to a hole punch), the original
+ * indlen reservation must be shared across the two new extents that are left
+ * behind.
+ *
+ * Given the original reservation and the worst case indlen for the two new
+ * extents (as calculated by xfs_bmap_worst_indlen()), split the original
+ * reservation fairly across the two new extents. If necessary, steal available
+ * blocks from a deleted extent to make up a reservation deficiency (e.g., if
+ * ores == 1). The number of stolen blocks is returned. The availability and
+ * subsequent accounting of stolen blocks is the responsibility of the caller.
+ */
+static xfs_filblks_t
+xfs_bmap_split_indlen(
+	xfs_filblks_t			ores,		/* original res. */
+	xfs_filblks_t			*indlen1,	/* ext1 worst indlen */
+	xfs_filblks_t			*indlen2,	/* ext2 worst indlen */
+	xfs_filblks_t			avail)		/* stealable blocks */
+{
+	xfs_filblks_t			len1 = *indlen1;
+	xfs_filblks_t			len2 = *indlen2;
+	xfs_filblks_t			nres = len1 + len2; /* new total res. */
+	xfs_filblks_t			stolen = 0;
+
+	/*
+	 * Steal as many blocks as we can to try and satisfy the worst case
+	 * indlen for both new extents.
+	 */
+	while (nres > ores && avail) {
+		nres--;
+		avail--;
+		stolen++;
+	}
+
+	/*
+	 * The only blocks available are those reserved for the original
+	 * extent and what we can steal from the extent being removed.
+	 * If this still isn't enough to satisfy the combined
+	 * requirements for the two new extents, skim blocks off of each
+	 * of the new reservations until they match what is available.
+	 */
+	while (nres > ores) {
+		if (len1) {
+			len1--;
+			nres--;
+		}
+		if (nres == ores)
+			break;
+		if (len2) {
+			len2--;
+			nres--;
+		}
+	}
+
+	*indlen1 = len1;
+	*indlen2 = len2;
+
+	return stolen;
+}
+
+/*
  * Called by xfs_bmapi to update file extent records and the btree
  * after removing space (or undoing a delayed allocation).
  */
@@ -4976,28 +5033,29 @@ xfs_bmap_del_extent(
 			XFS_IFORK_NEXT_SET(ip, whichfork,
 				XFS_IFORK_NEXTENTS(ip, whichfork) + 1);
 		} else {
+			xfs_filblks_t	stolen;
 			ASSERT(whichfork == XFS_DATA_FORK);
-			temp = xfs_bmap_worst_indlen(ip, temp);
+
+			/*
+			 * Distribute the original indlen reservation across the
+			 * two new extents. Steal blocks from the deleted extent
+			 * if necessary. Stealing blocks simply fudges the
+			 * fdblocks accounting in xfs_bunmapi().
+			 */
+			temp = xfs_bmap_worst_indlen(ip, got.br_blockcount);
+			temp2 = xfs_bmap_worst_indlen(ip, new.br_blockcount);
+			stolen = xfs_bmap_split_indlen(da_old, &temp, &temp2,
+						       del->br_blockcount);
+			da_new = temp + temp2 - stolen;
+			del->br_blockcount -= stolen;
+
+			/*
+			 * Set the reservation for each extent. Warn if either
+			 * is zero as this can lead to delalloc problems.
+			 */
+			WARN_ON_ONCE(!temp || !temp2);
 			xfs_bmbt_set_startblock(ep, nullstartblock((int)temp));
-			temp2 = xfs_bmap_worst_indlen(ip, temp2);
 			new.br_startblock = nullstartblock((int)temp2);
-			da_new = temp + temp2;
-			while (da_new > da_old) {
-				if (temp) {
-					temp--;
-					da_new--;
-					xfs_bmbt_set_startblock(ep,
-						nullstartblock((int)temp));
-				}
-				if (da_new == da_old)
-					break;
-				if (temp2) {
-					temp2--;
-					da_new--;
-					new.br_startblock =
-						nullstartblock((int)temp2);
-				}
-			}
 		}
 		trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_);
 		xfs_iext_insert(ip, *idx + 1, 1, &new, state);
@@ -5202,7 +5260,7 @@ xfs_bunmapi(
 			 * This is better than zeroing it.
 			 */
 			ASSERT(del.br_state == XFS_EXT_NORM);
-			ASSERT(xfs_trans_get_block_res(tp) > 0);
+			ASSERT(tp->t_blk_res > 0);
 			/*
 			 * If this spans a realtime extent boundary,
 			 * chop it back to the start of the one we end at.
@@ -5233,7 +5291,7 @@ xfs_bunmapi(
 				del.br_startblock += mod;
 			} else if ((del.br_startoff == start &&
 				    (del.br_state == XFS_EXT_UNWRITTEN ||
-				     xfs_trans_get_block_res(tp) == 0)) ||
+				     tp->t_blk_res == 0)) ||
 				   !xfs_sb_version_hasextflgbit(&mp->m_sb)) {
 				/*
 				 * Can't make it unwritten.  There isn't
@@ -5288,9 +5346,37 @@ xfs_bunmapi(
 				goto nodelete;
 			}
 		}
+
+		/*
+		 * If it's the case where the directory code is running
+		 * with no block reservation, and the deleted block is in
+		 * the middle of its extent, and the resulting insert
+		 * of an extent would cause transformation to btree format,
+		 * then reject it.  The calling code will then swap
+		 * blocks around instead.
+		 * We have to do this now, rather than waiting for the
+		 * conversion to btree format, since the transaction
+		 * will be dirty.
+		 */
+		if (!wasdel && tp->t_blk_res == 0 &&
+		    XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
+		    XFS_IFORK_NEXTENTS(ip, whichfork) >= /* Note the >= */
+			XFS_IFORK_MAXEXT(ip, whichfork) &&
+		    del.br_startoff > got.br_startoff &&
+		    del.br_startoff + del.br_blockcount <
+		    got.br_startoff + got.br_blockcount) {
+			error = -ENOSPC;
+			goto error0;
+		}
+
+		/*
+		 * Unreserve quota and update realtime free space, if
+		 * appropriate. If delayed allocation, update the inode delalloc
+		 * counter now and wait to update the sb counters as
+		 * xfs_bmap_del_extent() might need to borrow some blocks.
+		 */
 		if (wasdel) {
 			ASSERT(startblockval(del.br_startblock) > 0);
-			/* Update realtime/data freespace, unreserve quota */
 			if (isrt) {
 				xfs_filblks_t rtexts;
 
@@ -5301,8 +5387,6 @@ xfs_bunmapi(
 					ip, -((long)del.br_blockcount), 0,
 					XFS_QMOPT_RES_RTBLKS);
 			} else {
-				xfs_mod_fdblocks(mp, (int64_t)del.br_blockcount,
-						 false);
 				(void)xfs_trans_reserve_quota_nblks(NULL,
 					ip, -((long)del.br_blockcount), 0,
 					XFS_QMOPT_RES_REGBLKS);
@@ -5313,32 +5397,16 @@ xfs_bunmapi(
 					XFS_BTCUR_BPRV_WASDEL;
 		} else if (cur)
 			cur->bc_private.b.flags &= ~XFS_BTCUR_BPRV_WASDEL;
-		/*
-		 * If it's the case where the directory code is running
-		 * with no block reservation, and the deleted block is in
-		 * the middle of its extent, and the resulting insert
-		 * of an extent would cause transformation to btree format,
-		 * then reject it.  The calling code will then swap
-		 * blocks around instead.
-		 * We have to do this now, rather than waiting for the
-		 * conversion to btree format, since the transaction
-		 * will be dirty.
-		 */
-		if (!wasdel && xfs_trans_get_block_res(tp) == 0 &&
-		    XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
-		    XFS_IFORK_NEXTENTS(ip, whichfork) >= /* Note the >= */
-			XFS_IFORK_MAXEXT(ip, whichfork) &&
-		    del.br_startoff > got.br_startoff &&
-		    del.br_startoff + del.br_blockcount <
-		    got.br_startoff + got.br_blockcount) {
-			error = -ENOSPC;
-			goto error0;
-		}
+
 		error = xfs_bmap_del_extent(ip, tp, &lastx, flist, cur, &del,
 				&tmp_logflags, whichfork);
 		logflags |= tmp_logflags;
 		if (error)
 			goto error0;
+
+		if (!isrt && wasdel)
+			xfs_mod_fdblocks(mp, (int64_t)del.br_blockcount, false);
+
 		bno = del.br_startoff - 1;
 nodelete:
 		/*
diff --git a/libxfs/xfs_bmap_btree.c b/libxfs/xfs_bmap_btree.c
index a63379b..022d4b6 100644
--- a/libxfs/xfs_bmap_btree.c
+++ b/libxfs/xfs_bmap_btree.c
@@ -458,7 +458,7 @@ xfs_bmbt_alloc_block(
 		 * reservation amount is insufficient then we may fail a
 		 * block allocation here and corrupt the filesystem.
 		 */
-		args.minleft = xfs_trans_get_block_res(args.tp);
+		args.minleft = args.tp->t_blk_res;
 	} else if (cur->bc_private.b.flist->xbf_low) {
 		args.type = XFS_ALLOCTYPE_START_BNO;
 	} else {
@@ -467,7 +467,7 @@ xfs_bmbt_alloc_block(
 
 	args.minlen = args.maxlen = args.prod = 1;
 	args.wasdel = cur->bc_private.b.flags & XFS_BTCUR_BPRV_WASDEL;
-	if (!args.wasdel && xfs_trans_get_block_res(args.tp) == 0) {
+	if (!args.wasdel && args.tp->t_blk_res == 0) {
 		error = -ENOSPC;
 		goto error0;
 	}
diff --git a/libxfs/xfs_dir2_node.c b/libxfs/xfs_dir2_node.c
index 04fecf1..b75b432 100644
--- a/libxfs/xfs_dir2_node.c
+++ b/libxfs/xfs_dir2_node.c
@@ -2232,6 +2232,9 @@ xfs_dir2_node_trim_free(
 
 	dp = args->dp;
 	tp = args->trans;
+
+	*rvalp = 0;
+
 	/*
 	 * Read the freespace block.
 	 */
@@ -2252,7 +2255,6 @@ xfs_dir2_node_trim_free(
 	 */
 	if (freehdr.nused > 0) {
 		xfs_trans_brelse(tp, bp);
-		*rvalp = 0;
 		return 0;
 	}
 	/*
diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index 72e9ff7..4f0e4ee 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -2396,8 +2396,8 @@ xfs_ialloc_compute_maxlevels(
 
 	maxleafents = (1LL << XFS_INO_AGINO_BITS(mp)) >>
 		XFS_INODES_PER_CHUNK_LOG;
-	minleafrecs = mp->m_alloc_mnr[0];
-	minnoderecs = mp->m_alloc_mnr[1];
+	minleafrecs = mp->m_inobt_mnr[0];
+	minnoderecs = mp->m_inobt_mnr[1];
 	maxblocks = (maxleafents + minleafrecs - 1) / minleafrecs;
 	for (level = 1; maxblocks > 1; level++)
 		maxblocks = (maxblocks + minnoderecs - 1) / minnoderecs;
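
Reviewer note: the xfs_ialloc.c hunk above changes which minimum-record
limits feed the height calculation.  A minimal standalone sketch of that
loop, assuming plain integer inputs instead of the xfs_mount fields, shows
why the choice matters.  The function name btree_maxlevels and its
parameters are hypothetical and do not exist in libxfs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case btree height for "maxleafents" records, given the minimum
 * number of records per leaf block and per node block.  This mirrors the
 * loop in xfs_ialloc_compute_maxlevels() using plain integers (assumption).
 */
unsigned int
btree_maxlevels(uint64_t maxleafents, unsigned int minleafrecs,
		unsigned int minnoderecs)
{
	uint64_t	maxblocks;
	unsigned int	level;

	/* A node fanout of 1 would never shrink the block count. */
	assert(minleafrecs > 0 && minnoderecs > 1);

	/* Leaf blocks needed when every leaf is only minimally full. */
	maxblocks = (maxleafents + minleafrecs - 1) / minleafrecs;

	/* Add one level per round until a single root block remains. */
	for (level = 1; maxblocks > 1; level++)
		maxblocks = (maxblocks + minnoderecs - 1) / minnoderecs;

	return level;
}
```

For example, 1000 leaf entries with a minimum of 10 records per block need
3 levels in this sketch, while a minimum of 100 records per block needs only
2, so taking the limits from a different btree can change the computed
height.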


* [PATCH 005/145] libxfs: backport kernel 4.7 changes
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (3 preceding siblings ...)
  2016-06-17  1:31 ` [PATCH 004/145] libxfs: backport kernel 4.6 changes Darrick J. Wong
@ 2016-06-17  1:31 ` Darrick J. Wong
  2016-06-17  1:31 ` [PATCH 006/145] xfs: make several functions static Darrick J. Wong
                   ` (139 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:31 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

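Backport the libxfs changes made between kernel 4.6 and 4.7:
xfs_trans_alloc() now takes the reservation and flags and returns the
transaction through a pointer, kmem_realloc() drops its old-size
argument, xfs_init_local_fork() is introduced, and the XFS_TRANS_* type
definitions are removed apart from XFS_TRANS_CHECKPOINT, which moves to
xfs_log_format.h.  Callers in mkfs and repair are updated to match.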

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
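
Reviewer note: most call-site churn in this patch comes from folding the
old allocate-then-reserve pair into a single libxfs_trans_alloc() call that
cleans up after itself on failure.  Below is a minimal standalone sketch of
that pattern, assuming heavily reduced structures.  struct mount,
struct trans, trans_reserve and trans_alloc are hypothetical names for
illustration, and unlike the real userspace code the sketch returns -ENOMEM
instead of exiting when the allocation fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-ins for xfs_mount and xfs_trans. */
struct mount {
	unsigned long	free_blocks;
};

struct trans {
	struct mount	*mp;
	unsigned int	blk_res;	/* blocks reserved so far */
	unsigned int	flags;		/* recorded by this sketch only */
};

/* Reserve "blocks" for the transaction, or fail with -ENOSPC. */
static int
trans_reserve(struct trans *tp, unsigned int blocks)
{
	if (blocks > 0) {
		if (tp->mp->free_blocks < blocks)
			return -ENOSPC;
		tp->blk_res += blocks;
	}
	return 0;
}

/*
 * Allocate and reserve in one step.  On failure nothing is handed back,
 * so the caller has no transaction to cancel.
 */
int
trans_alloc(struct mount *mp, unsigned int blocks, unsigned int flags,
	    struct trans **tpp)
{
	struct trans	*tp = calloc(1, sizeof(*tp));
	int		error;

	if (!tp)
		return -ENOMEM;
	tp->mp = mp;
	tp->flags = flags;

	error = trans_reserve(tp, blocks);
	if (error) {
		free(tp);
		return error;
	}

	*tpp = tp;
	return 0;
}
```

The practical effect for callers is that the error path shrinks to a plain
"if (error) return error;", since *tpp is only written on success.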
 include/kmem.h            |    2 -
 include/xfs_trans.h       |    7 +--
 libxfs/kmem.c             |    2 -
 libxfs/libxfs_api_defs.h  |    1 
 libxfs/trans.c            |   63 ++++++++++++++++------------
 libxfs/util.c             |    5 +-
 libxfs/xfs_attr.c         |   58 ++++++--------------------
 libxfs/xfs_bmap.c         |   22 ++++------
 libxfs/xfs_dir2_sf.c      |    9 +---
 libxfs/xfs_inode_fork.c   |   99 +++++++++++++++++++++++++++++---------------
 libxfs/xfs_inode_fork.h   |    1 
 libxfs/xfs_log_format.h   |    5 ++
 libxfs/xfs_sb.c           |    8 +---
 libxfs/xfs_shared.h       |  102 +--------------------------------------------
 libxlog/xfs_log_recover.c |    2 -
 mkfs/proto.c              |   66 +++++++++++++++--------------
 mkfs/xfs_mkfs.c           |    5 +-
 repair/phase5.c           |   13 ++++--
 repair/phase6.c           |   82 ++++++++++++++----------------------
 repair/phase7.c           |    4 --
 20 files changed, 224 insertions(+), 332 deletions(-)
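
Reviewer note: xfs_init_local_fork(), added in the libxfs/xfs_inode_fork.c
hunk below, over-allocates by one byte for symlinks so the body can be
zero-terminated.  The following is a minimal standalone sketch of that
sizing logic, assuming a fixed 32-byte inline buffer and malloc() in place
of kmem_alloc().  struct fork, INLINE_BYTES, init_local_fork and the
is_symlink parameter are hypothetical stand-ins, not libxfs names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_BYTES	32	/* hypothetical inline buffer size */

/* Hypothetical, reduced stand-in for struct xfs_ifork. */
struct fork {
	char	*data;
	char	inline_data[INLINE_BYTES];
	int	bytes;		/* logical size of the fork contents */
	int	real_bytes;	/* heap allocation size, 0 if inline or empty */
};

int
init_local_fork(struct fork *f, const void *src, int size, bool is_symlink)
{
	/* Symlink bodies get one extra byte for the terminating zero. */
	int	mem_size = size + (is_symlink ? 1 : 0);
	int	real_size = 0;

	if (size == 0) {
		f->data = NULL;
	} else if (mem_size <= (int)sizeof(f->inline_data)) {
		f->data = f->inline_data;
	} else {
		real_size = (mem_size + 3) & ~3;	/* round up to 4 */
		f->data = malloc(real_size);
		if (!f->data)
			return -ENOMEM;
	}

	if (size) {
		memcpy(f->data, src, size);
		if (is_symlink)
			f->data[size] = '\0';
	}

	f->bytes = size;	/* the terminator is not counted */
	f->real_bytes = real_size;
	return 0;
}
```

Note that the inline-versus-heap decision uses mem_size, not size: in this
sketch a 32-byte symlink body no longer fits the 32-byte inline buffer and
moves to a 36-byte heap allocation, while 32 bytes of non-symlink data stay
inline.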


diff --git a/include/kmem.h b/include/kmem.h
index 5484d32..65f0ade 100644
--- a/include/kmem.h
+++ b/include/kmem.h
@@ -49,6 +49,6 @@ kmem_free(void *ptr) {
 	free(ptr);
 }
 
-extern void	*kmem_realloc(void *, size_t, size_t, int);
+extern void	*kmem_realloc(void *, size_t, int);
 
 #endif
diff --git a/include/xfs_trans.h b/include/xfs_trans.h
index d7ee1fd..a5e019d 100644
--- a/include/xfs_trans.h
+++ b/include/xfs_trans.h
@@ -71,7 +71,6 @@ typedef struct xfs_qoff_logitem {
 } xfs_qoff_logitem_t;
 
 typedef struct xfs_trans {
-	unsigned int	t_type;			/* transaction type */
 	unsigned int	t_log_res;		/* amt of log space resvd */
 	unsigned int	t_log_count;		/* count for perm log res */
 	struct xfs_mount *t_mountp;		/* ptr to fs mount struct */
@@ -87,9 +86,9 @@ typedef struct xfs_trans {
 void	xfs_trans_init(struct xfs_mount *);
 int	xfs_trans_roll(struct xfs_trans **, struct xfs_inode *);
 
-xfs_trans_t	*libxfs_trans_alloc(struct xfs_mount *, int);
-int	libxfs_trans_reserve(struct xfs_trans *, struct xfs_trans_res *,
-				     uint, uint);
+int	libxfs_trans_alloc(struct xfs_mount *mp, struct xfs_trans_res *resp,
+			uint blocks, uint rtextents, uint flags,
+			struct xfs_trans **tpp);
 int	libxfs_trans_commit(struct xfs_trans *);
 void	libxfs_trans_cancel(struct xfs_trans *);
 struct xfs_buf *libxfs_trans_getsb(struct xfs_trans *, struct xfs_mount *, int);
diff --git a/libxfs/kmem.c b/libxfs/kmem.c
index 4f3cd7e..c8bcb50 100644
--- a/libxfs/kmem.c
+++ b/libxfs/kmem.c
@@ -70,7 +70,7 @@ kmem_zalloc(size_t size, int flags)
 }
 
 void *
-kmem_realloc(void *ptr, size_t new_size, size_t old_size, int flags)
+kmem_realloc(void *ptr, size_t new_size, int flags)
 {
 	ptr = realloc(ptr, new_size);
 	if (ptr == NULL) {
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index bb502e0..611a849 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -55,7 +55,6 @@
 #define xfs_trans_read_buf_map		libxfs_trans_read_buf_map
 #define xfs_trans_roll			libxfs_trans_roll
 #define xfs_trans_get_buf_map		libxfs_trans_get_buf_map
-#define xfs_trans_reserve		libxfs_trans_reserve
 #define xfs_trans_resv_calc		libxfs_trans_resv_calc
 
 #define xfs_attr_get			libxfs_attr_get
diff --git a/libxfs/trans.c b/libxfs/trans.c
index 18ea010..97a29b9 100644
--- a/libxfs/trans.c
+++ b/libxfs/trans.c
@@ -119,7 +119,6 @@ libxfs_trans_roll(
 	 */
 	tres.tr_logres = trans->t_log_res;
 	tres.tr_logcount = trans->t_log_count;
-	*tpp = libxfs_trans_alloc(trans->t_mountp, trans->t_type);
 
 	/*
 	 * Commit the current transaction.
@@ -132,8 +131,6 @@ libxfs_trans_roll(
 	if (error)
 		return error;
 
-	trans = *tpp;
-
 	/*
 	 * Reserve space in the log for th next transaction.
 	 * This also pushes items in the "AIL", the list of logged items,
@@ -143,7 +140,7 @@ libxfs_trans_roll(
 	 * the prior and the next transactions.
 	 */
 	tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
-	error = xfs_trans_reserve(trans, &tres, 0, 0);
+	error = libxfs_trans_alloc(trans->t_mountp, &tres, 0, 0, 0, tpp);
 	/*
 	 *  Ensure that the inode is in the new transaction and locked.
 	 */
@@ -151,32 +148,11 @@ libxfs_trans_roll(
 		return error;
 
 	if (dp)
-		xfs_trans_ijoin(trans, dp, 0);
+		xfs_trans_ijoin(*tpp, dp, 0);
 	return 0;
 }
 
-xfs_trans_t *
-libxfs_trans_alloc(
-	xfs_mount_t	*mp,
-	int		type)
-{
-	xfs_trans_t	*ptr;
-
-	if ((ptr = calloc(sizeof(xfs_trans_t), 1)) == NULL) {
-		fprintf(stderr, _("%s: xact calloc failed (%d bytes): %s\n"),
-			progname, (int)sizeof(xfs_trans_t), strerror(errno));
-		exit(1);
-	}
-	ptr->t_mountp = mp;
-	ptr->t_type = type;
-	INIT_LIST_HEAD(&ptr->t_items);
-#ifdef XACT_DEBUG
-	fprintf(stderr, "allocated new transaction %p\n", ptr);
-#endif
-	return ptr;
-}
-
-int
+static int
 libxfs_trans_reserve(
 	struct xfs_trans	*tp,
 	struct xfs_trans_res	*resp,
@@ -199,6 +175,39 @@ libxfs_trans_reserve(
 	return 0;
 }
 
+int
+libxfs_trans_alloc(
+	struct xfs_mount	*mp,
+	struct xfs_trans_res	*resp,
+	uint			blocks,
+	uint			rtextents,
+	uint			flags,
+	struct xfs_trans	**tpp)
+{
+	struct xfs_trans	*ptr;
+	int			error;
+
+	if ((ptr = calloc(sizeof(xfs_trans_t), 1)) == NULL) {
+		fprintf(stderr, _("%s: xact calloc failed (%d bytes): %s\n"),
+			progname, (int)sizeof(xfs_trans_t), strerror(errno));
+		exit(1);
+	}
+	ptr->t_mountp = mp;
+	INIT_LIST_HEAD(&ptr->t_items);
+#ifdef XACT_DEBUG
+	fprintf(stderr, "allocated new transaction %p\n", ptr);
+#endif
+
+	error = libxfs_trans_reserve(ptr, resp, blocks, rtextents);
+	if (error) {
+		libxfs_trans_cancel(ptr);
+		return error;
+	}
+
+	*tpp = ptr;
+	return 0;
+}
+
 void
 libxfs_trans_cancel(
 	xfs_trans_t	*tp)
diff --git a/libxfs/util.c b/libxfs/util.c
index f3b9895..b992ad0 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -541,10 +541,9 @@ libxfs_alloc_file_space(
 	while (allocatesize_fsb && !error) {
 		datablocks = allocatesize_fsb;
 
-		tp = xfs_trans_alloc(mp, XFS_TRANS_DIOSTRAT);
 		resblks = (uint)XFS_DIOSTRAT_SPACE_RES(mp, datablocks);
-		error = xfs_trans_reserve(tp, &M_RES(mp)->tr_write,
-					  resblks, 0);
+		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write,
+					  resblks, 0, 0, &tp);
 		/*
 		 * Check for running out of space
 		 */
diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index 82a7c5e..0b05654 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -237,37 +237,21 @@ xfs_attr_set(
 			return error;
 	}
 
-	/*
-	 * Start our first transaction of the day.
-	 *
-	 * All future transactions during this code must be "chained" off
-	 * this one via the trans_dup() call.  All transactions will contain
-	 * the inode, and the inode will always be marked with trans_ihold().
-	 * Since the inode will be locked in all transactions, we must log
-	 * the inode in every transaction to let it float upward through
-	 * the log.
-	 */
-	args.trans = xfs_trans_alloc(mp, XFS_TRANS_ATTR_SET);
+	tres.tr_logres = M_RES(mp)->tr_attrsetm.tr_logres +
+			 M_RES(mp)->tr_attrsetrt.tr_logres * args.total;
+	tres.tr_logcount = XFS_ATTRSET_LOG_COUNT;
+	tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
 
 	/*
 	 * Root fork attributes can use reserved data blocks for this
 	 * operation if necessary
 	 */
-
-	if (rsvd)
-		args.trans->t_flags |= XFS_TRANS_RESERVE;
-
-	tres.tr_logres = M_RES(mp)->tr_attrsetm.tr_logres +
-			 M_RES(mp)->tr_attrsetrt.tr_logres * args.total;
-	tres.tr_logcount = XFS_ATTRSET_LOG_COUNT;
-	tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
-	error = xfs_trans_reserve(args.trans, &tres, args.total, 0);
-	if (error) {
-		xfs_trans_cancel(args.trans);
+	error = xfs_trans_alloc(mp, &tres, args.total, 0,
+			rsvd ? XFS_TRANS_RESERVE : 0, &args.trans);
+	if (error)
 		return error;
-	}
-	xfs_ilock(dp, XFS_ILOCK_EXCL);
 
+	xfs_ilock(dp, XFS_ILOCK_EXCL);
 	error = xfs_trans_reserve_quota_nblks(args.trans, dp, args.total, 0,
 				rsvd ? XFS_QMOPT_RES_REGBLKS | XFS_QMOPT_FORCE_RES :
 				       XFS_QMOPT_RES_REGBLKS);
@@ -424,31 +408,15 @@ xfs_attr_remove(
 		return error;
 
 	/*
-	 * Start our first transaction of the day.
-	 *
-	 * All future transactions during this code must be "chained" off
-	 * this one via the trans_dup() call.  All transactions will contain
-	 * the inode, and the inode will always be marked with trans_ihold().
-	 * Since the inode will be locked in all transactions, we must log
-	 * the inode in every transaction to let it float upward through
-	 * the log.
-	 */
-	args.trans = xfs_trans_alloc(mp, XFS_TRANS_ATTR_RM);
-
-	/*
 	 * Root fork attributes can use reserved data blocks for this
 	 * operation if necessary
 	 */
-
-	if (flags & ATTR_ROOT)
-		args.trans->t_flags |= XFS_TRANS_RESERVE;
-
-	error = xfs_trans_reserve(args.trans, &M_RES(mp)->tr_attrrm,
-				  XFS_ATTRRM_SPACE_RES(mp), 0);
-	if (error) {
-		xfs_trans_cancel(args.trans);
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_attrrm,
+			XFS_ATTRRM_SPACE_RES(mp), 0,
+			(flags & ATTR_ROOT) ? XFS_TRANS_RESERVE : 0,
+			&args.trans);
+	if (error)
 		return error;
-	}
 
 	xfs_ilock(dp, XFS_ILOCK_EXCL);
 	/*
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index cbcfd72..c2a2c53 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -1113,15 +1113,14 @@ xfs_bmap_add_attrfork(
 
 	mp = ip->i_mount;
 	ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
-	tp = xfs_trans_alloc(mp, XFS_TRANS_ADDAFORK);
+
 	blks = XFS_ADDAFORK_SPACE_RES(mp);
-	if (rsvd)
-		tp->t_flags |= XFS_TRANS_RESERVE;
-	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_addafork, blks, 0);
-	if (error) {
-		xfs_trans_cancel(tp);
+
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_addafork, blks, 0,
+			rsvd ? XFS_TRANS_RESERVE : 0, &tp);
+	if (error)
 		return error;
-	}
+
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	error = xfs_trans_reserve_quota_nblks(tp, ip, blks, 0, rsvd ?
 			XFS_QMOPT_RES_REGBLKS | XFS_QMOPT_FORCE_RES :
@@ -6018,13 +6017,10 @@ xfs_bmap_split_extent(
 	xfs_fsblock_t           firstfsb;
 	int                     error;
 
-	tp = xfs_trans_alloc(mp, XFS_TRANS_DIOSTRAT);
-	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_write,
-			XFS_DIOSTRAT_SPACE_RES(mp, 0), 0);
-	if (error) {
-		xfs_trans_cancel(tp);
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write,
+			XFS_DIOSTRAT_SPACE_RES(mp, 0), 0, 0, &tp);
+	if (error)
 		return error;
-	}
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
diff --git a/libxfs/xfs_dir2_sf.c b/libxfs/xfs_dir2_sf.c
index e4b505b..90b07f7 100644
--- a/libxfs/xfs_dir2_sf.c
+++ b/libxfs/xfs_dir2_sf.c
@@ -255,15 +255,12 @@ xfs_dir2_block_to_sf(
 	 *
 	 * Convert the inode to local format and copy the data in.
 	 */
-	dp->i_df.if_flags &= ~XFS_IFEXTENTS;
-	dp->i_df.if_flags |= XFS_IFINLINE;
-	dp->i_d.di_format = XFS_DINODE_FMT_LOCAL;
 	ASSERT(dp->i_df.if_bytes == 0);
-	xfs_idata_realloc(dp, size, XFS_DATA_FORK);
+	xfs_init_local_fork(dp, XFS_DATA_FORK, dst, size);
+	dp->i_d.di_format = XFS_DINODE_FMT_LOCAL;
+	dp->i_d.di_size = size;
 
 	logflags |= XFS_ILOG_DDATA;
-	memcpy(dp->i_df.if_u1.if_data, dst, size);
-	dp->i_d.di_size = size;
 	xfs_dir2_sf_check(args);
 out:
 	xfs_trans_log_inode(args->trans, dp, logflags);
diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c
index 2af1dba..799873a 100644
--- a/libxfs/xfs_inode_fork.c
+++ b/libxfs/xfs_inode_fork.c
@@ -227,6 +227,48 @@ xfs_iformat_fork(
 	return error;
 }
 
+void
+xfs_init_local_fork(
+	struct xfs_inode	*ip,
+	int			whichfork,
+	const void		*data,
+	int			size)
+{
+	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
+	int			mem_size = size, real_size = 0;
+	bool			zero_terminate;
+
+	/*
+	 * If we are using the local fork to store a symlink body we need to
+	 * zero-terminate it so that we can pass it back to the VFS directly.
+	 * Overallocate the in-memory fork by one for that and add a zero
+	 * to terminate it below.
+	 */
+	zero_terminate = S_ISLNK(VFS_I(ip)->i_mode);
+	if (zero_terminate)
+		mem_size++;
+
+	if (size == 0)
+		ifp->if_u1.if_data = NULL;
+	else if (mem_size <= sizeof(ifp->if_u2.if_inline_data))
+		ifp->if_u1.if_data = ifp->if_u2.if_inline_data;
+	else {
+		real_size = roundup(mem_size, 4);
+		ifp->if_u1.if_data = kmem_alloc(real_size, KM_SLEEP | KM_NOFS);
+	}
+
+	if (size) {
+		memcpy(ifp->if_u1.if_data, data, size);
+		if (zero_terminate)
+			ifp->if_u1.if_data[size] = '\0';
+	}
+
+	ifp->if_bytes = size;
+	ifp->if_real_bytes = real_size;
+	ifp->if_flags &= ~(XFS_IFEXTENTS | XFS_IFBROOT);
+	ifp->if_flags |= XFS_IFINLINE;
+}
+
 /*
  * The file is in-lined in the on-disk inode.
  * If it fits into if_inline_data, then copy
@@ -244,8 +286,6 @@ xfs_iformat_local(
 	int		whichfork,
 	int		size)
 {
-	xfs_ifork_t	*ifp;
-	int		real_size;
 
 	/*
 	 * If the size is unreasonable, then something
@@ -261,22 +301,8 @@ xfs_iformat_local(
 				     ip->i_mount, dip);
 		return -EFSCORRUPTED;
 	}
-	ifp = XFS_IFORK_PTR(ip, whichfork);
-	real_size = 0;
-	if (size == 0)
-		ifp->if_u1.if_data = NULL;
-	else if (size <= sizeof(ifp->if_u2.if_inline_data))
-		ifp->if_u1.if_data = ifp->if_u2.if_inline_data;
-	else {
-		real_size = roundup(size, 4);
-		ifp->if_u1.if_data = kmem_alloc(real_size, KM_SLEEP | KM_NOFS);
-	}
-	ifp->if_bytes = size;
-	ifp->if_real_bytes = real_size;
-	if (size)
-		memcpy(ifp->if_u1.if_data, XFS_DFORK_PTR(dip, whichfork), size);
-	ifp->if_flags &= ~XFS_IFEXTENTS;
-	ifp->if_flags |= XFS_IFINLINE;
+
+	xfs_init_local_fork(ip, whichfork, XFS_DFORK_PTR(dip, whichfork), size);
 	return 0;
 }
 
@@ -512,7 +538,6 @@ xfs_iroot_realloc(
 		new_max = cur_max + rec_diff;
 		new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, new_max);
 		ifp->if_broot = kmem_realloc(ifp->if_broot, new_size,
-				XFS_BMAP_BROOT_SPACE_CALC(mp, cur_max),
 				KM_SLEEP | KM_NOFS);
 		op = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
 						     ifp->if_broot_bytes);
@@ -656,7 +681,6 @@ xfs_idata_realloc(
 				ifp->if_u1.if_data =
 					kmem_realloc(ifp->if_u1.if_data,
 							real_size,
-							ifp->if_real_bytes,
 							KM_SLEEP | KM_NOFS);
 			}
 		} else {
@@ -1372,8 +1396,7 @@ xfs_iext_realloc_direct(
 		if (rnew_size != ifp->if_real_bytes) {
 			ifp->if_u1.if_extents =
 				kmem_realloc(ifp->if_u1.if_extents,
-						rnew_size,
-						ifp->if_real_bytes, KM_NOFS);
+						rnew_size, KM_NOFS);
 		}
 		if (rnew_size > ifp->if_real_bytes) {
 			memset(&ifp->if_u1.if_extents[ifp->if_bytes /
@@ -1457,9 +1480,8 @@ xfs_iext_realloc_indirect(
 	if (new_size == 0) {
 		xfs_iext_destroy(ifp);
 	} else {
-		ifp->if_u1.if_ext_irec = (xfs_ext_irec_t *)
-			kmem_realloc(ifp->if_u1.if_ext_irec,
-				new_size, size, KM_NOFS);
+		ifp->if_u1.if_ext_irec =
+			kmem_realloc(ifp->if_u1.if_ext_irec, new_size, KM_NOFS);
 	}
 }
 
@@ -1493,6 +1515,24 @@ xfs_iext_indirect_to_direct(
 }
 
 /*
+ * Remove all records from the indirection array.
+ */
+STATIC void
+xfs_iext_irec_remove_all(
+	struct xfs_ifork *ifp)
+{
+	int		nlists;
+	int		i;
+
+	ASSERT(ifp->if_flags & XFS_IFEXTIREC);
+	nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
+	for (i = 0; i < nlists; i++)
+		kmem_free(ifp->if_u1.if_ext_irec[i].er_extbuf);
+	kmem_free(ifp->if_u1.if_ext_irec);
+	ifp->if_flags &= ~XFS_IFEXTIREC;
+}
+
+/*
  * Free incore file extents.
  */
 void
@@ -1500,14 +1540,7 @@ xfs_iext_destroy(
 	xfs_ifork_t	*ifp)		/* inode fork pointer */
 {
 	if (ifp->if_flags & XFS_IFEXTIREC) {
-		int	erp_idx;
-		int	nlists;
-
-		nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
-		for (erp_idx = nlists - 1; erp_idx >= 0 ; erp_idx--) {
-			xfs_iext_irec_remove(ifp, erp_idx);
-		}
-		ifp->if_flags &= ~XFS_IFEXTIREC;
+		xfs_iext_irec_remove_all(ifp);
 	} else if (ifp->if_real_bytes) {
 		kmem_free(ifp->if_u1.if_extents);
 	} else if (ifp->if_bytes) {
diff --git a/libxfs/xfs_inode_fork.h b/libxfs/xfs_inode_fork.h
index 7d3b1ed..f95e072 100644
--- a/libxfs/xfs_inode_fork.h
+++ b/libxfs/xfs_inode_fork.h
@@ -134,6 +134,7 @@ void		xfs_iroot_realloc(struct xfs_inode *, int, int);
 int		xfs_iread_extents(struct xfs_trans *, struct xfs_inode *, int);
 int		xfs_iextents_copy(struct xfs_inode *, struct xfs_bmbt_rec *,
 				  int);
+void		xfs_init_local_fork(struct xfs_inode *, int, const void *, int);
 
 struct xfs_bmbt_rec_host *
 		xfs_iext_get_ext(struct xfs_ifork *, xfs_extnum_t);
diff --git a/libxfs/xfs_log_format.h b/libxfs/xfs_log_format.h
index 40005bf..e5baba3 100644
--- a/libxfs/xfs_log_format.h
+++ b/libxfs/xfs_log_format.h
@@ -212,6 +212,11 @@ typedef struct xfs_trans_header {
 #define	XFS_TRANS_HEADER_MAGIC	0x5452414e	/* TRAN */
 
 /*
+ * The only type valid for th_type in CIL-enabled file system logs:
+ */
+#define XFS_TRANS_CHECKPOINT	40
+
+/*
  * Log item types.
  */
 #define	XFS_LI_EFI		0x1236
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 67c7a65..e2cc83e 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -839,12 +839,10 @@ xfs_sync_sb(
 	struct xfs_trans	*tp;
 	int			error;
 
-	tp = _xfs_trans_alloc(mp, XFS_TRANS_SB_CHANGE, KM_SLEEP);
-	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_sb, 0, 0);
-	if (error) {
-		xfs_trans_cancel(tp);
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_sb, 0, 0,
+			XFS_TRANS_NO_WRITECOUNT, &tp);
+	if (error)
 		return error;
-	}
 
 	xfs_log_sb(tp);
 	if (wait)
diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h
index 81ac870..16002b5 100644
--- a/libxfs/xfs_shared.h
+++ b/libxfs/xfs_shared.h
@@ -56,103 +56,6 @@ extern const struct xfs_buf_ops xfs_symlink_buf_ops;
 extern const struct xfs_buf_ops xfs_rtbuf_ops;
 
 /*
- * Transaction types.  Used to distinguish types of buffers. These never reach
- * the log.
- */
-#define XFS_TRANS_SETATTR_NOT_SIZE	1
-#define XFS_TRANS_SETATTR_SIZE		2
-#define XFS_TRANS_INACTIVE		3
-#define XFS_TRANS_CREATE		4
-#define XFS_TRANS_CREATE_TRUNC		5
-#define XFS_TRANS_TRUNCATE_FILE		6
-#define XFS_TRANS_REMOVE		7
-#define XFS_TRANS_LINK			8
-#define XFS_TRANS_RENAME		9
-#define XFS_TRANS_MKDIR			10
-#define XFS_TRANS_RMDIR			11
-#define XFS_TRANS_SYMLINK		12
-#define XFS_TRANS_SET_DMATTRS		13
-#define XFS_TRANS_GROWFS		14
-#define XFS_TRANS_STRAT_WRITE		15
-#define XFS_TRANS_DIOSTRAT		16
-/* 17 was XFS_TRANS_WRITE_SYNC */
-#define	XFS_TRANS_WRITEID		18
-#define	XFS_TRANS_ADDAFORK		19
-#define	XFS_TRANS_ATTRINVAL		20
-#define	XFS_TRANS_ATRUNCATE		21
-#define	XFS_TRANS_ATTR_SET		22
-#define	XFS_TRANS_ATTR_RM		23
-#define	XFS_TRANS_ATTR_FLAG		24
-#define	XFS_TRANS_CLEAR_AGI_BUCKET	25
-#define XFS_TRANS_SB_CHANGE		26
-/*
- * Dummy entries since we use the transaction type to index into the
- * trans_type[] in xlog_recover_print_trans_head()
- */
-#define XFS_TRANS_DUMMY1		27
-#define XFS_TRANS_DUMMY2		28
-#define XFS_TRANS_QM_QUOTAOFF		29
-#define XFS_TRANS_QM_DQALLOC		30
-#define XFS_TRANS_QM_SETQLIM		31
-#define XFS_TRANS_QM_DQCLUSTER		32
-#define XFS_TRANS_QM_QINOCREATE		33
-#define XFS_TRANS_QM_QUOTAOFF_END	34
-#define XFS_TRANS_FSYNC_TS		35
-#define	XFS_TRANS_GROWFSRT_ALLOC	36
-#define	XFS_TRANS_GROWFSRT_ZERO		37
-#define	XFS_TRANS_GROWFSRT_FREE		38
-#define	XFS_TRANS_SWAPEXT		39
-#define	XFS_TRANS_CHECKPOINT		40
-#define	XFS_TRANS_ICREATE		41
-#define	XFS_TRANS_CREATE_TMPFILE	42
-#define	XFS_TRANS_TYPE_MAX		43
-/* new transaction types need to be reflected in xfs_logprint(8) */
-
-#define XFS_TRANS_TYPES \
-	{ XFS_TRANS_SETATTR_NOT_SIZE,	"SETATTR_NOT_SIZE" }, \
-	{ XFS_TRANS_SETATTR_SIZE,	"SETATTR_SIZE" }, \
-	{ XFS_TRANS_INACTIVE,		"INACTIVE" }, \
-	{ XFS_TRANS_CREATE,		"CREATE" }, \
-	{ XFS_TRANS_CREATE_TRUNC,	"CREATE_TRUNC" }, \
-	{ XFS_TRANS_TRUNCATE_FILE,	"TRUNCATE_FILE" }, \
-	{ XFS_TRANS_REMOVE,		"REMOVE" }, \
-	{ XFS_TRANS_LINK,		"LINK" }, \
-	{ XFS_TRANS_RENAME,		"RENAME" }, \
-	{ XFS_TRANS_MKDIR,		"MKDIR" }, \
-	{ XFS_TRANS_RMDIR,		"RMDIR" }, \
-	{ XFS_TRANS_SYMLINK,		"SYMLINK" }, \
-	{ XFS_TRANS_SET_DMATTRS,	"SET_DMATTRS" }, \
-	{ XFS_TRANS_GROWFS,		"GROWFS" }, \
-	{ XFS_TRANS_STRAT_WRITE,	"STRAT_WRITE" }, \
-	{ XFS_TRANS_DIOSTRAT,		"DIOSTRAT" }, \
-	{ XFS_TRANS_WRITEID,		"WRITEID" }, \
-	{ XFS_TRANS_ADDAFORK,		"ADDAFORK" }, \
-	{ XFS_TRANS_ATTRINVAL,		"ATTRINVAL" }, \
-	{ XFS_TRANS_ATRUNCATE,		"ATRUNCATE" }, \
-	{ XFS_TRANS_ATTR_SET,		"ATTR_SET" }, \
-	{ XFS_TRANS_ATTR_RM,		"ATTR_RM" }, \
-	{ XFS_TRANS_ATTR_FLAG,		"ATTR_FLAG" }, \
-	{ XFS_TRANS_CLEAR_AGI_BUCKET,	"CLEAR_AGI_BUCKET" }, \
-	{ XFS_TRANS_SB_CHANGE,		"SBCHANGE" }, \
-	{ XFS_TRANS_DUMMY1,		"DUMMY1" }, \
-	{ XFS_TRANS_DUMMY2,		"DUMMY2" }, \
-	{ XFS_TRANS_QM_QUOTAOFF,	"QM_QUOTAOFF" }, \
-	{ XFS_TRANS_QM_DQALLOC,		"QM_DQALLOC" }, \
-	{ XFS_TRANS_QM_SETQLIM,		"QM_SETQLIM" }, \
-	{ XFS_TRANS_QM_DQCLUSTER,	"QM_DQCLUSTER" }, \
-	{ XFS_TRANS_QM_QINOCREATE,	"QM_QINOCREATE" }, \
-	{ XFS_TRANS_QM_QUOTAOFF_END,	"QM_QOFF_END" }, \
-	{ XFS_TRANS_FSYNC_TS,		"FSYNC_TS" }, \
-	{ XFS_TRANS_GROWFSRT_ALLOC,	"GROWFSRT_ALLOC" }, \
-	{ XFS_TRANS_GROWFSRT_ZERO,	"GROWFSRT_ZERO" }, \
-	{ XFS_TRANS_GROWFSRT_FREE,	"GROWFSRT_FREE" }, \
-	{ XFS_TRANS_SWAPEXT,		"SWAPEXT" }, \
-	{ XFS_TRANS_CHECKPOINT,		"CHECKPOINT" }, \
-	{ XFS_TRANS_ICREATE,		"ICREATE" }, \
-	{ XFS_TRANS_CREATE_TMPFILE,	"CREATE_TMPFILE" }, \
-	{ XLOG_UNMOUNT_REC_TYPE,	"UNMOUNT" }
-
-/*
  * This structure is used to track log items associated with
  * a transaction.  It points to the log item and keeps some
  * flags to track the state of the log item.  It also tracks
@@ -181,8 +84,9 @@ int	xfs_log_calc_minimum_size(struct xfs_mount *);
 #define	XFS_TRANS_SYNC		0x08	/* make commit synchronous */
 #define XFS_TRANS_DQ_DIRTY	0x10	/* at least one dquot in trx dirty */
 #define XFS_TRANS_RESERVE	0x20    /* OK to use reserved data blocks */
-#define XFS_TRANS_FREEZE_PROT	0x40	/* Transaction has elevated writer
-					   count in superblock */
+#define XFS_TRANS_NO_WRITECOUNT 0x40	/* do not elevate SB writecount */
+#define XFS_TRANS_NOFS		0x80	/* pass KM_NOFS to kmem_alloc */
+
 /*
  * Field values for xfs_trans_mod_sb.
  */
diff --git a/libxlog/xfs_log_recover.c b/libxlog/xfs_log_recover.c
index 6116ecd..6cd77d5 100644
--- a/libxlog/xfs_log_recover.c
+++ b/libxlog/xfs_log_recover.c
@@ -1062,7 +1062,7 @@ xlog_recover_add_to_cont_trans(
 	old_ptr = item->ri_buf[item->ri_cnt-1].i_addr;
 	old_len = item->ri_buf[item->ri_cnt-1].i_len;
 
-	ptr = kmem_realloc(old_ptr, len+old_len, old_len, KM_SLEEP);
+	ptr = kmem_realloc(old_ptr, len+old_len, KM_SLEEP);
 	memcpy(&ptr[old_len], dp, len); /* d, s, l */
 	item->ri_buf[item->ri_cnt-1].i_len += len;
 	item->ri_buf[item->ri_cnt-1].i_addr = ptr;
diff --git a/mkfs/proto.c b/mkfs/proto.c
index 09a9439..edbaa33 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -25,7 +25,7 @@
  */
 static char *getstr(char **pp);
 static void fail(char *msg, int i);
-static void getres(xfs_trans_t *tp, uint blocks);
+static void getres(struct xfs_mount *mp, uint blocks, struct xfs_trans **tpp);
 static void rsvfile(xfs_mount_t *mp, xfs_inode_t *ip, long long len);
 static int newfile(xfs_trans_t *tp, xfs_inode_t *ip, xfs_bmap_free_t *flist,
 	xfs_fsblock_t *first, int dolocal, int logit, char *buf, int len);
@@ -127,22 +127,23 @@ res_failed(
 
 static void
 getres(
-	xfs_trans_t	*tp,
-	uint		blocks)
+	struct xfs_mount	*mp,
+	uint			blocks,
+	struct xfs_trans	**tpp)
 {
-	int		i;
-	xfs_mount_t	*mp;
-	uint		r;
-
-	mp = tp->t_mountp;
-	for (i = 0, r = MKFS_BLOCKRES(blocks); r >= blocks; r--) {
-		struct xfs_trans_res    tres = {0};
-
-		i = -libxfs_trans_reserve(tp, &tres, r, 0);
-		if (i == 0)
+	struct xfs_trans	*tp;
+	uint			r;
+	int			error = ENOSPC;
+
+	for (r = MKFS_BLOCKRES(blocks); r >= blocks; r--) {
+		error = -libxfs_trans_alloc(mp, NULL, r, 0, 0, &tp);
+		if (!error) {
+			*tpp = tp;
 			return;
+		}
 	}
-	res_failed(i);
+
+	res_failed(error);
 	/* NOTREACHED */
 }
 
@@ -203,7 +204,11 @@ rsvfile(
 	/*
 	 * update the inode timestamp, mode, and prealloc flag bits
 	 */
-	tp = libxfs_trans_alloc(mp, 0);
+	error = -libxfs_trans_alloc(mp, NULL, 0, 0, 0, &tp);
+	if (error) {
+		fail(_("error reserving space for a file"), error);
+		exit(1);
+	}
 
 	libxfs_trans_ijoin(tp, ip, 0);
 
@@ -454,13 +459,12 @@ parseproto(
 	xname.name = (unsigned char *)name;
 	xname.len = name ? strlen(name) : 0;
 	xname.type = 0;
-	tp = libxfs_trans_alloc(mp, 0);
 	flags = XFS_ILOG_CORE;
 	xfs_bmap_init(&flist, &first);
 	switch (fmt) {
 	case IF_REGULAR:
 		buf = newregfile(pp, &len);
-		getres(tp, XFS_B_TO_FSB(mp, len));
+		getres(mp, XFS_B_TO_FSB(mp, len), &tp);
 		error = -libxfs_inode_alloc(&tp, pip, mode|S_IFREG, 1, 0,
 					   &creds, fsxp, &ip);
 		if (error)
@@ -483,7 +487,7 @@ parseproto(
 				progname, value, name);
 			exit(1);
 		}
-		getres(tp, XFS_B_TO_FSB(mp, llen));
+		getres(mp, XFS_B_TO_FSB(mp, llen), &tp);
 
 		error = -libxfs_inode_alloc(&tp, pip, mode|S_IFREG, 1, 0,
 					  &creds, fsxp, &ip);
@@ -505,7 +509,7 @@ parseproto(
 		return;
 
 	case IF_BLOCK:
-		getres(tp, 0);
+		getres(mp, 0, &tp);
 		majdev = getnum(getstr(pp), 0, 0, false);
 		mindev = getnum(getstr(pp), 0, 0, false);
 		error = -libxfs_inode_alloc(&tp, pip, mode|S_IFBLK, 1,
@@ -520,7 +524,7 @@ parseproto(
 		break;
 
 	case IF_CHAR:
-		getres(tp, 0);
+		getres(mp, 0, &tp);
 		majdev = getnum(getstr(pp), 0, 0, false);
 		mindev = getnum(getstr(pp), 0, 0, false);
 		error = -libxfs_inode_alloc(&tp, pip, mode|S_IFCHR, 1,
@@ -534,7 +538,7 @@ parseproto(
 		break;
 
 	case IF_FIFO:
-		getres(tp, 0);
+		getres(mp, 0, &tp);
 		error = -libxfs_inode_alloc(&tp, pip, mode|S_IFIFO, 1, 0,
 				&creds, fsxp, &ip);
 		if (error)
@@ -546,7 +550,7 @@ parseproto(
 	case IF_SYMLINK:
 		buf = getstr(pp);
 		len = (int)strlen(buf);
-		getres(tp, XFS_B_TO_FSB(mp, len));
+		getres(mp, XFS_B_TO_FSB(mp, len), &tp);
 		error = -libxfs_inode_alloc(&tp, pip, mode|S_IFLNK, 1, 0,
 				&creds, fsxp, &ip);
 		if (error)
@@ -557,7 +561,7 @@ parseproto(
 		newdirent(mp, tp, pip, &xname, ip->i_ino, &first, &flist);
 		break;
 	case IF_DIRECTORY:
-		getres(tp, 0);
+		getres(mp, 0, &tp);
 		error = -libxfs_inode_alloc(&tp, pip, mode|S_IFDIR, 1, 0,
 				&creds, fsxp, &ip);
 		if (error)
@@ -649,8 +653,7 @@ rtinit(
 	/*
 	 * First, allocate the inodes.
 	 */
-	tp = libxfs_trans_alloc(mp, 0);
-	i = -libxfs_trans_reserve(tp, &tres, MKFS_BLOCKRES_INODE, 0);
+	i = -libxfs_trans_alloc(mp, &tres, MKFS_BLOCKRES_INODE, 0, 0, &tp);
 	if (i)
 		res_failed(i);
 
@@ -687,9 +690,8 @@ rtinit(
 	/*
 	 * Next, give the bitmap file some zero-filled blocks.
 	 */
-	tp = libxfs_trans_alloc(mp, 0);
-	i = -libxfs_trans_reserve(tp, &tres, mp->m_sb.sb_rbmblocks +
-				 (XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) - 1), 0);
+	i = -libxfs_trans_alloc(mp, &tres, mp->m_sb.sb_rbmblocks +
+			 (XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) - 1), 0, 0, &tp);
 	if (i)
 		res_failed(i);
 
@@ -723,10 +725,9 @@ rtinit(
 	/*
 	 * Give the summary file some zero-filled blocks.
 	 */
-	tp = libxfs_trans_alloc(mp, 0);
 	nsumblocks = mp->m_rsumsize >> mp->m_sb.sb_blocklog;
-	i = -libxfs_trans_reserve(tp, &tres, nsumblocks +
-				 (XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) - 1), 0);
+	i = -libxfs_trans_alloc(mp, &tres, nsumblocks +
+			(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) - 1), 0, 0, &tp);
 	if (i)
 		res_failed(i);
 	libxfs_trans_ijoin(tp, rsumip, 0);
@@ -760,8 +761,7 @@ rtinit(
 	 * Do one transaction per bitmap block.
 	 */
 	for (bno = 0; bno < mp->m_sb.sb_rextents; bno = ebno) {
-		tp = libxfs_trans_alloc(mp, 0);
-		i = -libxfs_trans_reserve(tp, &tres, 0, 0);
+		i = -libxfs_trans_alloc(mp, &tres, 0, 0, 0, &tp);
 		if (i)
 			res_failed(i);
 		libxfs_trans_ijoin(tp, rbmip, 0);
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 955dcfd..de00c8e 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -3451,14 +3451,15 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 		struct xfs_trans_res tres = {0};
 
 		memset(&args, 0, sizeof(args));
-		args.tp = tp = libxfs_trans_alloc(mp, 0);
+
 		args.mp = mp;
 		args.agno = agno;
 		args.alignment = 1;
 		args.pag = xfs_perag_get(mp,agno);
-		c = -libxfs_trans_reserve(tp, &tres, worst_freelist, 0);
+		c = -libxfs_trans_alloc(mp, &tres, worst_freelist, 0, 0, &tp);
 		if (c)
 			res_failed(c);
+		args.tp = tp;
 
 		libxfs_alloc_fix_freelist(&args, 0);
 		xfs_perag_put(args.pag);
diff --git a/repair/phase5.c b/repair/phase5.c
index 5d48848..b58111b 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -1505,14 +1505,19 @@ build_agf_agfl(xfs_mount_t	*mp,
 		int		error;
 
 		memset(&args, 0, sizeof(args));
-		args.tp = tp = libxfs_trans_alloc(mp, 0);
 		args.mp = mp;
 		args.agno = agno;
 		args.alignment = 1;
 		args.pag = xfs_perag_get(mp,agno);
-		libxfs_trans_reserve(tp, &tres,
-				     xfs_alloc_min_freelist(mp, args.pag), 0);
-		error = libxfs_alloc_fix_freelist(&args, 0);
+		error = -libxfs_trans_alloc(mp, &tres,
+				xfs_alloc_min_freelist(mp, args.pag),
+				0, 0, &tp);
+		if (error) {
+			do_error(_("failed to fix AGFL on AG %d, error %d\n"),
+					agno, error);
+		}
+		args.tp = tp;
+		error = -libxfs_alloc_fix_freelist(&args, 0);
 		xfs_perag_put(args.pag);
 		if (error) {
 			do_error(_("failed to fix AGFL on AG %d, error %d\n"),
diff --git a/repair/phase6.c b/repair/phase6.c
index 7353c3a..d1acb68 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -494,9 +494,7 @@ mk_rbmino(xfs_mount_t *mp)
 	/*
 	 * first set up inode
 	 */
-	tp = libxfs_trans_alloc(mp, 0);
-
-	i = -libxfs_trans_reserve(tp, &tres, 10, 0);
+	i = -libxfs_trans_alloc(mp, &tres, 10, 0, 0, &tp);
 	if (i)
 		res_failed(i);
 
@@ -544,9 +542,9 @@ mk_rbmino(xfs_mount_t *mp)
 	 * then allocate blocks for file and fill with zeroes (stolen
 	 * from mkfs)
 	 */
-	tp = libxfs_trans_alloc(mp, 0);
-	error = -libxfs_trans_reserve(tp, &tres, mp->m_sb.sb_rbmblocks +
-				(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) - 1), 0);
+	error = -libxfs_trans_alloc(mp, &tres, mp->m_sb.sb_rbmblocks +
+				(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) - 1),
+				0, 0, &tp);
 	if (error)
 		res_failed(error);
 
@@ -598,9 +596,7 @@ fill_rbmino(xfs_mount_t *mp)
 	bmp = btmcompute;
 	bno = 0;
 
-	tp = libxfs_trans_alloc(mp, 0);
-
-	error = -libxfs_trans_reserve(tp, &tres, 10, 0);
+	error = -libxfs_trans_alloc(mp, &tres, 10, 0, 0, &tp);
 	if (error)
 		res_failed(error);
 
@@ -671,9 +667,7 @@ fill_rsumino(xfs_mount_t *mp)
 	bno = 0;
 	end_bno = mp->m_rsumsize >> mp->m_sb.sb_blocklog;
 
-	tp = libxfs_trans_alloc(mp, 0);
-
-	error = -libxfs_trans_reserve(tp, &tres, 10, 0);
+	error = -libxfs_trans_alloc(mp, &tres, 10, 0, 0, &tp);
 	if (error)
 		res_failed(error);
 
@@ -747,9 +741,7 @@ mk_rsumino(xfs_mount_t *mp)
 	/*
 	 * first set up inode
 	 */
-	tp = libxfs_trans_alloc(mp, 0);
-
-	i = -libxfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 10, 0);
+	i = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 10, 0, 0, &tp);
 	if (i)
 		res_failed(i);
 
@@ -797,15 +789,15 @@ mk_rsumino(xfs_mount_t *mp)
 	 * then allocate blocks for file and fill with zeroes (stolen
 	 * from mkfs)
 	 */
-	tp = libxfs_trans_alloc(mp, 0);
 	xfs_bmap_init(&flist, &first);
 
 	nsumblocks = mp->m_rsumsize >> mp->m_sb.sb_blocklog;
 	tres.tr_logres = BBTOB(128);
 	tres.tr_logcount = XFS_DEFAULT_PERM_LOG_COUNT;
 	tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
-	error = -libxfs_trans_reserve(tp, &tres, mp->m_sb.sb_rbmblocks +
-			      (XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) - 1), 0);
+	error = -libxfs_trans_alloc(mp, &tres, mp->m_sb.sb_rbmblocks +
+			      (XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) - 1),
+			      0, 0, &tp);
 	if (error)
 		res_failed(error);
 
@@ -854,10 +846,9 @@ mk_root_dir(xfs_mount_t *mp)
 	int		vers;
 	int		times;
 
-	tp = libxfs_trans_alloc(mp, 0);
 	ip = NULL;
 
-	i = -libxfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 10, 0);
+	i = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 10, 0, 0, &tp);
 	if (i)
 		res_failed(i);
 
@@ -954,11 +945,10 @@ mk_orphanage(xfs_mount_t *mp)
 	 * could not be found, create it
 	 */
 
-	tp = libxfs_trans_alloc(mp, 0);
 	xfs_bmap_init(&flist, &first);
 
 	nres = XFS_MKDIR_SPACE_RES(mp, xname.len);
-	i = -libxfs_trans_reserve(tp, &M_RES(mp)->tr_mkdir, nres, 0);
+	i = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_mkdir, nres, 0, 0, &tp);
 	if (i)
 		res_failed(i);
 
@@ -1092,8 +1082,6 @@ mv_orphanage(
 		xname.len = snprintf((char *)fname, sizeof(fname), "%llu.%d",
 					(unsigned long long)ino, ++incr);
 
-	tp = libxfs_trans_alloc(mp, 0);
-
 	if ((err = -libxfs_iget(mp, NULL, ino, 0, &ino_p, 0)))
 		do_error(_("%d - couldn't iget disconnected inode\n"), err);
 
@@ -1112,8 +1100,8 @@ mv_orphanage(
 		if (err) {
 			ASSERT(err == ENOENT);
 
-			err = -libxfs_trans_reserve(tp, &M_RES(mp)->tr_rename,
-						   nres, 0);
+			err = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_rename,
+						   nres, 0, 0, &tp);
 			if (err)
 				do_error(
 	_("space reservation failed (%d), filesystem may be out of space\n"),
@@ -1154,8 +1142,8 @@ mv_orphanage(
 
 			libxfs_trans_commit(tp);
 		} else  {
-			err = -libxfs_trans_reserve(tp, &M_RES(mp)->tr_rename,
-						   nres, 0);
+			err = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_rename,
+						   nres, 0, 0, &tp);
 			if (err)
 				do_error(
 	_("space reservation failed (%d), filesystem may be out of space\n"),
@@ -1210,8 +1198,8 @@ mv_orphanage(
 		 * also accounted for in the create
 		 */
 		nres = XFS_DIRENTER_SPACE_RES(mp, xname.len);
-		err = -libxfs_trans_reserve(tp, &M_RES(mp)->tr_remove,
-					   nres, 0);
+		err = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_remove,
+					   nres, 0, 0, &tp);
 		if (err)
 			do_error(
 	_("space reservation failed (%d), filesystem may be out of space\n"),
@@ -1306,9 +1294,8 @@ longform_dir2_rebuild(
 
 	xfs_bmap_init(&flist, &firstblock);
 
-	tp = libxfs_trans_alloc(mp, 0);
 	nres = XFS_REMOVE_SPACE_RES(mp);
-	error = -libxfs_trans_reserve(tp, &M_RES(mp)->tr_remove, nres, 0);
+	error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_remove, nres, 0, 0, &tp);
 	if (error)
 		res_failed(error);
 	libxfs_trans_ijoin(tp, ip, 0);
@@ -1349,10 +1336,9 @@ longform_dir2_rebuild(
 						p->name.name[1] == '.'))))
 			continue;
 
-		tp = libxfs_trans_alloc(mp, 0);
 		nres = XFS_CREATE_SPACE_RES(mp, p->name.len);
-		error = -libxfs_trans_reserve(tp, &M_RES(mp)->tr_create,
-					     nres, 0);
+		error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_create,
+					     nres, 0, 0, &tp);
 		if (error)
 			res_failed(error);
 
@@ -1406,9 +1392,8 @@ dir2_kill_block(
 	int		nres;
 	xfs_trans_t	*tp;
 
-	tp = libxfs_trans_alloc(mp, 0);
 	nres = XFS_REMOVE_SPACE_RES(mp);
-	error = -libxfs_trans_reserve(tp, &M_RES(mp)->tr_remove, nres, 0);
+	error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_remove, nres, 0, 0, &tp);
 	if (error)
 		res_failed(error);
 	libxfs_trans_ijoin(tp, ip, 0);
@@ -1598,8 +1583,7 @@ longform_dir2_entry_check_data(
 	if (freetab->nents < db + 1)
 		freetab->nents = db + 1;
 
-	tp = libxfs_trans_alloc(mp, 0);
-	error = -libxfs_trans_reserve(tp, &M_RES(mp)->tr_remove, 0, 0);
+	error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_remove, 0, 0, 0, &tp);
 	if (error)
 		res_failed(error);
 	da.trans = tp;
@@ -2902,7 +2886,6 @@ process_dir_inode(
 			break;
 
 		case XFS_DINODE_FMT_LOCAL:
-			tp = libxfs_trans_alloc(mp, 0);
 			/*
 			 * using the remove reservation is overkill
 			 * since at most we'll only need to log the
@@ -2910,8 +2893,8 @@ process_dir_inode(
 			 * new define in ourselves.
 			 */
 			nres = no_modify ? 0 : XFS_REMOVE_SPACE_RES(mp);
-			error = -libxfs_trans_reserve(tp, &M_RES(mp)->tr_remove,
-						     nres, 0);
+			error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_remove,
+						     nres, 0, 0, &tp);
 			if (error)
 				res_failed(error);
 
@@ -2951,13 +2934,12 @@ process_dir_inode(
 
 		do_warn(_("recreating root directory .. entry\n"));
 
-		tp = libxfs_trans_alloc(mp, 0);
-		ASSERT(tp != NULL);
-
 		nres = XFS_MKDIR_SPACE_RES(mp, 2);
-		error = -libxfs_trans_reserve(tp, &M_RES(mp)->tr_mkdir, nres, 0);
+		error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_mkdir,
+				nres, 0, 0, &tp);
 		if (error)
 			res_failed(error);
+		ASSERT(tp != NULL);
 
 		libxfs_trans_ijoin(tp, ip, 0);
 
@@ -3010,14 +2992,12 @@ process_dir_inode(
 			do_warn(
 	_("creating missing \".\" entry in dir ino %" PRIu64 "\n"), ino);
 
-			tp = libxfs_trans_alloc(mp, 0);
-			ASSERT(tp != NULL);
-
 			nres = XFS_MKDIR_SPACE_RES(mp, 1);
-			error = -libxfs_trans_reserve(tp, &M_RES(mp)->tr_mkdir,
-						     nres, 0);
+			error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_mkdir,
+						     nres, 0, 0, &tp);
 			if (error)
 				res_failed(error);
+			ASSERT(tp != NULL);
 
 			libxfs_trans_ijoin(tp, ip, 0);
 
diff --git a/repair/phase7.c b/repair/phase7.c
index 3e234b9..8bce117 100644
--- a/repair/phase7.c
+++ b/repair/phase7.c
@@ -40,10 +40,8 @@ update_inode_nlinks(
 	int			dirty;
 	int			nres;
 
-	tp = libxfs_trans_alloc(mp, XFS_TRANS_REMOVE);
-
 	nres = no_modify ? 0 : 10;
-	error = -libxfs_trans_reserve(tp, &M_RES(mp)->tr_remove, nres, 0);
+	error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_remove, nres, 0, 0, &tp);
 	ASSERT(error == 0);
 
 	error = -libxfs_trans_iget(mp, tp, ino, 0, 0, &ip);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 149+ messages in thread
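
The conversion in the patch above folds the old two-step sequence (allocate a transaction, then reserve space for it) into a single call that returns an error code and hands the transaction back through its last argument. The getres() retry loop is the least obvious caller, so here is a minimal standalone sketch of that pattern. Everything prefixed demo_ is hypothetical and only mimics the calling convention visible in the diff; it is not the libxfs implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the mount and transaction; not the libxfs types. */
struct demo_mount { unsigned int free_blocks; };
struct demo_trans { unsigned int blocks_reserved; };

/*
 * Mock of a combined allocate-and-reserve call.  It returns 0 or a
 * negative errno and writes *tpp only on success.
 */
static int
demo_trans_alloc(struct demo_mount *mp, unsigned int blocks,
		 struct demo_trans *storage, struct demo_trans **tpp)
{
	if (blocks > mp->free_blocks)
		return -ENOSPC;
	storage->blocks_reserved = blocks;
	*tpp = storage;
	return 0;
}

/*
 * Try the generous reservation first, then back off one block at a
 * time down to min_blocks.  Returns 0 or a positive errno, matching
 * the negate-at-the-call-site style used by the callers in the patch.
 */
static int
demo_getres(struct demo_mount *mp, unsigned int max_blocks,
	    unsigned int min_blocks, struct demo_trans *storage,
	    struct demo_trans **tpp)
{
	unsigned int r = max_blocks;
	int error;

	for (;;) {
		error = -demo_trans_alloc(mp, r, storage, tpp);
		if (!error)
			return 0;
		/* Stop before decrementing so the unsigned counter cannot wrap. */
		if (r <= min_blocks)
			break;
		r--;
	}
	return error;
}
```

Because the caller's pointer is written only on success, a failed attempt leaves it untouched, which is why the converted call sites can drop the separate allocation step. The explicit lower-bound check is a choice made for this sketch: it keeps the loop finite even when the minimum reservation is zero.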

* [PATCH 006/145] xfs: make several functions static
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (4 preceding siblings ...)
  2016-06-17  1:31 ` [PATCH 005/145] libxfs: backport kernel 4.7 changes Darrick J. Wong
@ 2016-06-17  1:31 ` Darrick J. Wong
  2016-06-17  1:31 ` [PATCH 007/145] xfs: define XFS_IOC_FREEZE even if FIFREEZE is defined Darrick J. Wong
                   ` (138 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:31 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Eric Sandeen, Christoph Hellwig, xfs

From: Eric Sandeen <sandeen@sandeen.net>

Al Viro noticed that xfs_lock_inodes should be static, and
that led to ... a few more.

These are just the easy ones; others require moving functions
higher in their source files, so that is not done here to keep
this review simple.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/xfs_alloc.c     |    2 +-
 libxfs/xfs_alloc.h     |    7 -------
 libxfs/xfs_attr_leaf.h |    3 ---
 libxfs/xfs_rtbitmap.c  |    2 +-
 4 files changed, 2 insertions(+), 12 deletions(-)


diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 28b3fb9..8520f31 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -80,7 +80,7 @@ xfs_alloc_lookup_ge(
  * Lookup the first record less than or equal to [bno, len]
  * in the btree given by cur.
  */
-int					/* error */
+static int				/* error */
 xfs_alloc_lookup_le(
 	struct xfs_btree_cur	*cur,	/* btree cursor */
 	xfs_agblock_t		bno,	/* starting block of extent */
diff --git a/libxfs/xfs_alloc.h b/libxfs/xfs_alloc.h
index 135eb3d..92a66ba 100644
--- a/libxfs/xfs_alloc.h
+++ b/libxfs/xfs_alloc.h
@@ -212,13 +212,6 @@ xfs_free_extent(
 	xfs_fsblock_t	bno,	/* starting block number of extent */
 	xfs_extlen_t	len);	/* length of extent */
 
-int					/* error */
-xfs_alloc_lookup_le(
-	struct xfs_btree_cur	*cur,	/* btree cursor */
-	xfs_agblock_t		bno,	/* starting block of extent */
-	xfs_extlen_t		len,	/* length of extent */
-	int			*stat);	/* success/failure */
-
 int				/* error */
 xfs_alloc_lookup_ge(
 	struct xfs_btree_cur	*cur,	/* btree cursor */
diff --git a/libxfs/xfs_attr_leaf.h b/libxfs/xfs_attr_leaf.h
index 9d7e741..8ef420a 100644
--- a/libxfs/xfs_attr_leaf.h
+++ b/libxfs/xfs_attr_leaf.h
@@ -50,7 +50,6 @@ int	xfs_attr_shortform_lookup(struct xfs_da_args *args);
 int	xfs_attr_shortform_getvalue(struct xfs_da_args *args);
 int	xfs_attr_shortform_to_leaf(struct xfs_da_args *args);
 int	xfs_attr_shortform_remove(struct xfs_da_args *args);
-int	xfs_attr_shortform_list(struct xfs_attr_list_context *context);
 int	xfs_attr_shortform_allfit(struct xfs_buf *bp, struct xfs_inode *dp);
 int	xfs_attr_shortform_bytesfit(struct xfs_inode *dp, int bytes);
 void	xfs_attr_fork_remove(struct xfs_inode *ip, struct xfs_trans *tp);
@@ -88,8 +87,6 @@ int	xfs_attr3_leaf_toosmall(struct xfs_da_state *state, int *retval);
 void	xfs_attr3_leaf_unbalance(struct xfs_da_state *state,
 				       struct xfs_da_state_blk *drop_blk,
 				       struct xfs_da_state_blk *save_blk);
-int	xfs_attr3_root_inactive(struct xfs_trans **trans, struct xfs_inode *dp);
-
 /*
  * Utility routines.
  */
diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c
index 70ea975..36fe323 100644
--- a/libxfs/xfs_rtbitmap.c
+++ b/libxfs/xfs_rtbitmap.c
@@ -65,7 +65,7 @@ const struct xfs_buf_ops xfs_rtbuf_ops = {
  * Get a buffer for the bitmap or summary file block specified.
  * The buffer is returned read and locked.
  */
-int
+static int
 xfs_rtbuf_get(
 	xfs_mount_t	*mp,		/* file system mount structure */
 	xfs_trans_t	*tp,		/* transaction pointer */

^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH 007/145] xfs: define XFS_IOC_FREEZE even if FIFREEZE is defined
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (5 preceding siblings ...)
  2016-06-17  1:31 ` [PATCH 006/145] xfs: make several functions static Darrick J. Wong
@ 2016-06-17  1:31 ` Darrick J. Wong
  2016-06-17  1:31 ` [PATCH 008/145] libxfs: add more list operations Darrick J. Wong
                   ` (137 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:31 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Eric Sandeen, Christoph Hellwig, xfs

From: Christoph Hellwig <hch@lst.de>

The same applies to XFS_IOC_THAW.  Even though there is now a common
version of each ioctl, we still need to provide the old names for
anyone using them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/xfs_fs.h |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index b9622ba..1f17e1c 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -542,12 +542,8 @@ typedef struct xfs_swapext
 #define XFS_IOC_ERROR_CLEARALL	     _IOW ('X', 117, struct xfs_error_injection)
 /*	XFS_IOC_ATTRCTL_BY_HANDLE -- deprecated 118	 */
 
-/*	XFS_IOC_FREEZE		  -- FIFREEZE   119	 */
-/*	XFS_IOC_THAW		  -- FITHAW     120	 */
-#ifndef FIFREEZE
-#define XFS_IOC_FREEZE		     _IOWR('X', 119, int)
-#define XFS_IOC_THAW		     _IOWR('X', 120, int)
-#endif
+#define XFS_IOC_FREEZE		     _IOWR('X', 119, int)	/* aka FIFREEZE */
+#define XFS_IOC_THAW		     _IOWR('X', 120, int)	/* aka FITHAW */
 
 #define XFS_IOC_FSSETDM_BY_HANDLE    _IOW ('X', 121, struct xfs_fsop_setdm_handlereq)
 #define XFS_IOC_ATTRLIST_BY_HANDLE   _IOW ('X', 122, struct xfs_fsop_attrlist_handlereq)

^ permalink raw reply related	[flat|nested] 149+ messages in thread
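
Defining the XFS names unconditionally is safe because they encode the same request numbers as the generic ones. The sketch below checks that on a Linux system, assuming <linux/fs.h> is installed and provides FIFREEZE and FITHAW. The DEMO_ macros are local copies of the two definitions in the patch, renamed so they cannot clash with a real xfs_fs.h.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* FIFREEZE, FITHAW (Linux only) */

/* Local copies of the two definitions from the patch, renamed for the demo. */
#define DEMO_XFS_IOC_FREEZE	_IOWR('X', 119, int)
#define DEMO_XFS_IOC_THAW	_IOWR('X', 120, int)

/* Returns 1 when the XFS names encode the same requests as the generic ones. */
static int
demo_freeze_names_match(void)
{
	return DEMO_XFS_IOC_FREEZE == FIFREEZE &&
	       DEMO_XFS_IOC_THAW == FITHAW;
}
```

A program written against either spelling therefore issues the same ioctl, so keeping the old names costs nothing at run time.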

* [PATCH 008/145] libxfs: add more list operations
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (6 preceding siblings ...)
  2016-06-17  1:31 ` [PATCH 007/145] xfs: define XFS_IOC_FREEZE even if FIFREEZE is defined Darrick J. Wong
@ 2016-06-17  1:31 ` Darrick J. Wong
  2016-06-24  0:40   ` Dave Chinner
  2016-06-17  1:31 ` [PATCH 009/145] xfs_logprint: move the EFI copying/printing functions to a redo items file Darrick J. Wong
                   ` (136 subsequent siblings)
  144 siblings, 1 reply; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:31 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Add some list operations that the deferred rmap code requires.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/list.h |   76 ++++++++++++++++++++++++++++---
 libxfs/rdwr.c  |    2 -
 libxfs/util.c  |  137 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 206 insertions(+), 9 deletions(-)


diff --git a/include/list.h b/include/list.h
index f92faed..372cf06 100644
--- a/include/list.h
+++ b/include/list.h
@@ -111,30 +111,30 @@ static inline int list_empty(const struct list_head *head)
 }
 
 static inline void __list_splice(struct list_head *list,
-				 struct list_head *head)
+				 struct list_head *prev,
+				 struct list_head *next)
 {
 	struct list_head *first = list->next;
 	struct list_head *last = list->prev;
-	struct list_head *at = head->next;
 
-	first->prev = head;
-	head->next = first;
+	first->prev = prev;
+	prev->next = first;
 
-	last->next = at;
-	at->prev = last;
+	last->next = next;
+	next->prev = last;
 }
 
 static inline void list_splice(struct list_head *list, struct list_head *head)
 {
 	if (!list_empty(list))
-		__list_splice(list, head);
+		__list_splice(list, head, head->next);
 }
 
 static inline void list_splice_init(struct list_head *list,
 				    struct list_head *head)
 {
 	if (!list_empty(list)) {
-		__list_splice(list, head);
+		__list_splice(list, head, head->next);
 		list_head_init(list);
 	}
 }
@@ -161,4 +161,64 @@ static inline void list_splice_init(struct list_head *list,
 	     &pos->member != (head); 					\
 	     pos = n, n = list_entry(n->member.next, typeof(*n), member))
 
+#define list_first_entry(ptr, type, member) \
+	list_entry((ptr)->next, type, member)
+
+#define container_of(ptr, type, member) ({			\
+	const typeof( ((type *)0)->member ) *__mptr = (ptr);	\
+	(type *)( (char *)__mptr - offsetof(type,member) );})
+
+void list_sort(void *priv, struct list_head *head,
+	       int (*cmp)(void *priv, struct list_head *a,
+			  struct list_head *b));
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+
+/**
+ * list_splice_tail_init - join two lists and reinitialise the emptied list
+ * @list: the new list to add.
+ * @head: the place to add it in the first list.
+ *
+ * Each of the lists is a queue.
+ * The list at @list is reinitialised
+ */
+static inline void list_splice_tail_init(struct list_head *list,
+					 struct list_head *head)
+{
+	if (!list_empty(list)) {
+		__list_splice(list, head->prev, head);
+		INIT_LIST_HEAD(list);
+	}
+}
+
+/**
+ * list_last_entry - get the last element from a list
+ * @ptr:	the list head to take the element from.
+ * @type:	the type of the struct this is embedded in.
+ * @member:	the name of the list_head within the struct.
+ *
+ * Note, that list is expected to be not empty.
+ */
+#define list_last_entry(ptr, type, member) \
+	list_entry((ptr)->prev, type, member)
+
+/**
+ * list_prev_entry - get the prev element in list
+ * @pos:	the type * to cursor
+ * @member:	the name of the list_head within the struct.
+ */
+#define list_prev_entry(pos, member) \
+	list_entry((pos)->member.prev, typeof(*(pos)), member)
+
+/**
+ * list_for_each_entry_reverse - iterate backwards over list of given type.
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the list_head within the struct.
+ */
+#define list_for_each_entry_reverse(pos, head, member)			\
+	for (pos = list_last_entry(head, typeof(*pos), member);		\
+	     &pos->member != (head); 					\
+	     pos = list_prev_entry(pos, member))
+
 #endif	/* __LIST_H__ */
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 0ec38c5..aa30522 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -1260,7 +1260,7 @@ libxfs_bulkrelse(
 	}
 
 	pthread_mutex_lock(&xfs_buf_freelist.cm_mutex);
-	__list_splice(list, &xfs_buf_freelist.cm_list);
+	list_splice(list, &xfs_buf_freelist.cm_list);
 	pthread_mutex_unlock(&xfs_buf_freelist.cm_mutex);
 
 	return count;
diff --git a/libxfs/util.c b/libxfs/util.c
index b992ad0..2e826c9 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -35,6 +35,7 @@
 #include "xfs_ialloc.h"
 #include "xfs_alloc.h"
 #include "xfs_bit.h"
+#include "list.h"
 
 /*
  * Calculate the worst case log unit reservation for a given superblock
@@ -781,3 +782,139 @@ libxfs_zero_extent(
 
 	return libxfs_device_zero(xfs_find_bdev_for_inode(ip), sector, size);
 }
+
+/* List sorting code from Linux. */
+#define MAX_LIST_LENGTH_BITS 20
+
+/*
+ * Returns a list organized in an intermediate format suited
+ * to chaining of merge() calls: null-terminated, no reserved or
+ * sentinel head node, "prev" links not maintained.
+ */
+static struct list_head *merge(void *priv,
+				int (*cmp)(void *priv, struct list_head *a,
+					struct list_head *b),
+				struct list_head *a, struct list_head *b)
+{
+	struct list_head head, *tail = &head;
+
+	while (a && b) {
+		/* if equal, take 'a' -- important for sort stability */
+		if ((*cmp)(priv, a, b) <= 0) {
+			tail->next = a;
+			a = a->next;
+		} else {
+			tail->next = b;
+			b = b->next;
+		}
+		tail = tail->next;
+	}
+	tail->next = a?:b;
+	return head.next;
+}
+
+/*
+ * Combine final list merge with restoration of standard doubly-linked
+ * list structure.  This approach duplicates code from merge(), but
+ * runs faster than the tidier alternatives of either a separate final
+ * prev-link restoration pass, or maintaining the prev links
+ * throughout.
+ */
+static void merge_and_restore_back_links(void *priv,
+				int (*cmp)(void *priv, struct list_head *a,
+					struct list_head *b),
+				struct list_head *head,
+				struct list_head *a, struct list_head *b)
+{
+	struct list_head *tail = head;
+	unsigned count = 0;
+
+	while (a && b) {
+		/* if equal, take 'a' -- important for sort stability */
+		if ((*cmp)(priv, a, b) <= 0) {
+			tail->next = a;
+			a->prev = tail;
+			a = a->next;
+		} else {
+			tail->next = b;
+			b->prev = tail;
+			b = b->next;
+		}
+		tail = tail->next;
+	}
+	tail->next = a ? : b;
+
+	do {
+		/*
+		 * In worst cases this loop may run many iterations.
+		 * Continue callbacks to the client even though no
+		 * element comparison is needed, so the client's cmp()
+		 * routine can invoke cond_resched() periodically.
+		 */
+		if (unlikely(!(++count)))
+			(*cmp)(priv, tail->next, tail->next);
+
+		tail->next->prev = tail;
+		tail = tail->next;
+	} while (tail->next);
+
+	tail->next = head;
+	head->prev = tail;
+}
+
+/**
+ * list_sort - sort a list
+ * @priv: private data, opaque to list_sort(), passed to @cmp
+ * @head: the list to sort
+ * @cmp: the elements comparison function
+ *
+ * This function implements "merge sort", which has O(nlog(n))
+ * complexity.
+ *
+ * The comparison function @cmp must return a negative value if @a
+ * should sort before @b, and a positive value if @a should sort after
+ * @b. If @a and @b are equivalent, and their original relative
+ * ordering is to be preserved, @cmp must return 0.
+ */
+void list_sort(void *priv, struct list_head *head,
+		int (*cmp)(void *priv, struct list_head *a,
+			struct list_head *b))
+{
+	struct list_head *part[MAX_LIST_LENGTH_BITS+1]; /* sorted partial lists
+						-- last slot is a sentinel */
+	int lev;  /* index into part[] */
+	int max_lev = 0;
+	struct list_head *list;
+
+	if (list_empty(head))
+		return;
+
+	memset(part, 0, sizeof(part));
+
+	head->prev->next = NULL;
+	list = head->next;
+
+	while (list) {
+		struct list_head *cur = list;
+		list = list->next;
+		cur->next = NULL;
+
+		for (lev = 0; part[lev]; lev++) {
+			cur = merge(priv, cmp, part[lev], cur);
+			part[lev] = NULL;
+		}
+		if (lev > max_lev) {
+			if (unlikely(lev >= ARRAY_SIZE(part)-1)) {
+				lev--;
+			}
+			max_lev = lev;
+		}
+		part[lev] = cur;
+	}
+
+	for (lev = 0; lev < max_lev; lev++)
+		if (part[lev])
+			list = merge(priv, cmp, part[lev], list);
+
+	merge_and_restore_back_links(priv, cmp, head, part[max_lev], list);
+}

^ permalink raw reply related	[flat|nested] 149+ messages in thread
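
The list changes in the patch above come down to one idea: __list_splice() now takes the two neighbours to link between, so the same helper can splice at the head (head, head->next) or at the tail (head->prev, head). Below is a minimal standalone sketch of the tail splice and a reverse walk. All demo_ names are hypothetical re-implementations written for illustration; they are not the contents of include/list.h.

```c
#include <assert.h>
#include <stddef.h>

struct demo_list_head { struct demo_list_head *next, *prev; };

static void demo_list_init(struct demo_list_head *h) { h->next = h; h->prev = h; }
static int demo_list_empty(const struct demo_list_head *h) { return h->next == h; }

static void
demo_list_add_tail(struct demo_list_head *item, struct demo_list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Three-argument splice: link the body of 'list' between 'prev' and 'next'. */
static void
demo_list_splice_between(struct demo_list_head *list,
			 struct demo_list_head *prev,
			 struct demo_list_head *next)
{
	struct demo_list_head *first = list->next;
	struct demo_list_head *last = list->prev;

	first->prev = prev;
	prev->next = first;
	last->next = next;
	next->prev = last;
}

/* Append 'list' to the tail of 'head' and leave 'list' empty. */
static void
demo_list_splice_tail_init(struct demo_list_head *list,
			   struct demo_list_head *head)
{
	if (!demo_list_empty(list)) {
		demo_list_splice_between(list, head->prev, head);
		demo_list_init(list);
	}
}

struct demo_item { int value; struct demo_list_head node; };

/* Recover the containing item from its embedded list node. */
#define demo_entry(ptr) \
	((struct demo_item *)((char *)(ptr) - offsetof(struct demo_item, node)))

/* Walk from tail to head, copying values; returns how many were written. */
static int
demo_collect_reverse(struct demo_list_head *head, int *out, int max)
{
	struct demo_list_head *pos;
	int n = 0;

	for (pos = head->prev; pos != head && n < max; pos = pos->prev)
		out[n++] = demo_entry(pos)->value;
	return n;
}
```

Splicing a list holding 3 and 4 onto the tail of one holding 1 and 2 and then walking in reverse visits 4, 3, 2, 1, and the source list is left empty and reusable.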

* [PATCH 009/145] xfs_logprint: move the EFI copying/printing functions to a redo items file
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (7 preceding siblings ...)
  2016-06-17  1:31 ` [PATCH 008/145] libxfs: add more list operations Darrick J. Wong
@ 2016-06-17  1:31 ` Darrick J. Wong
  2016-06-17  1:31 ` [PATCH 010/145] xfs_logprint: fix formatting issues with the EFI printing code Darrick J. Wong
                   ` (135 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:31 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Move the functions that handle EFI items into a separate file to
avoid cluttering up log_misc.c even more when we start adding the
other redo item types.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 logprint/Makefile        |    2 
 logprint/log_misc.c      |  144 -----------------------------
 logprint/log_print_all.c |   60 ------------
 logprint/log_redo.c      |  226 ++++++++++++++++++++++++++++++++++++++++++++++
 logprint/logprint.h      |    6 +
 5 files changed, 232 insertions(+), 206 deletions(-)
 create mode 100644 logprint/log_redo.c


diff --git a/logprint/Makefile b/logprint/Makefile
index 7bcf27f..534bf5b 100644
--- a/logprint/Makefile
+++ b/logprint/Makefile
@@ -10,7 +10,7 @@ LTCOMMAND = xfs_logprint
 HFILES = logprint.h
 CFILES = logprint.c \
 	 log_copy.c log_dump.c log_misc.c \
-	 log_print_all.c log_print_trans.c
+	 log_print_all.c log_print_trans.c log_redo.c
 
 LLDLIBS	= $(LIBXFS) $(LIBXLOG) $(LIBUUID) $(LIBRT) $(LIBPTHREAD)
 LTDEPENDENCIES = $(LIBXFS) $(LIBXLOG)
diff --git a/logprint/log_misc.c b/logprint/log_misc.c
index e6ee832..57d397c 100644
--- a/logprint/log_misc.c
+++ b/logprint/log_misc.c
@@ -475,100 +475,6 @@ xlog_print_trans_buffer(char **ptr, int len, int *i, int num_ops)
 
 
 int
-xlog_print_trans_efd(char **ptr, uint len)
-{
-    xfs_efd_log_format_t *f;
-    xfs_efd_log_format_t lbuf;
-    /* size without extents at end */
-    uint core_size = sizeof(xfs_efd_log_format_t) - sizeof(xfs_extent_t);
-
-    /*
-     * memmove to ensure 8-byte alignment for the long longs in
-     * xfs_efd_log_format_t structure
-     */
-    memmove(&lbuf, *ptr, MIN(core_size, len));
-    f = &lbuf;
-    *ptr += len;
-    if (len >= core_size) {
-	printf(_("EFD:  #regs: %d    num_extents: %d  id: 0x%llx\n"),
-	       f->efd_size, f->efd_nextents, (unsigned long long)f->efd_efi_id);
-
-	/* don't print extents as they are not used */
-
-	return 0;
-    } else {
-	printf(_("EFD: Not enough data to decode further\n"));
-	return 1;
-    }
-}	/* xlog_print_trans_efd */
-
-
-int
-xlog_print_trans_efi(
-	char **ptr,
-	uint src_len,
-	int continued)
-{
-    xfs_efi_log_format_t *src_f, *f = NULL;
-    uint		 dst_len;
-    xfs_extent_t	 *ex;
-    int			 i;
-    int			 error = 0;
-    int			 core_size = offsetof(xfs_efi_log_format_t, efi_extents);
-
-    /*
-     * memmove to ensure 8-byte alignment for the long longs in
-     * xfs_efi_log_format_t structure
-     */
-    if ((src_f = (xfs_efi_log_format_t *)malloc(src_len)) == NULL) {
-	fprintf(stderr, _("%s: xlog_print_trans_efi: malloc failed\n"), progname);
-	exit(1);
-    }
-    memmove((char*)src_f, *ptr, src_len);
-    *ptr += src_len;
-
-    /* convert to native format */
-    dst_len = sizeof(xfs_efi_log_format_t) + (src_f->efi_nextents - 1) * sizeof(xfs_extent_t);
-
-    if (continued && src_len < core_size) {
-	printf(_("EFI: Not enough data to decode further\n"));
-	error = 1;
-	goto error;
-    }
-
-    if ((f = (xfs_efi_log_format_t *)malloc(dst_len)) == NULL) {
-	fprintf(stderr, _("%s: xlog_print_trans_efi: malloc failed\n"), progname);
-	exit(1);
-    }
-    if (xfs_efi_copy_format((char*)src_f, src_len, f, continued)) {
-	error = 1;
-	goto error;
-    }
-
-    printf(_("EFI:  #regs: %d    num_extents: %d  id: 0x%llx\n"),
-	   f->efi_size, f->efi_nextents, (unsigned long long)f->efi_id);
-
-    if (continued) {
-	printf(_("EFI free extent data skipped (CONTINUE set, no space)\n"));
-	goto error;
-    }
-
-    ex = f->efi_extents;
-    for (i=0; i < f->efi_nextents; i++) {
-	    printf("(s: 0x%llx, l: %d) ",
-		    (unsigned long long)ex->ext_start, ex->ext_len);
-	    if (i % 4 == 3) printf("\n");
-	    ex++;
-    }
-    if (i % 4 != 0) printf("\n");
-error:
-    free(src_f);
-    free(f);
-    return error;
-}	/* xlog_print_trans_efi */
-
-
-int
 xlog_print_trans_qoff(char **ptr, uint len)
 {
     xfs_qoff_logformat_t *f;
@@ -1617,53 +1523,3 @@ xfs_inode_item_format_convert(char *src_buf, uint len, xfs_inode_log_format_t *i
 	}
 	return in_f;
 }
-
-int
-xfs_efi_copy_format(
-	char			  *buf,
-	uint			  len,
-	struct xfs_efi_log_format *dst_efi_fmt,
-	int			  continued)
-{
-        uint i;
-	uint nextents = ((xfs_efi_log_format_t *)buf)->efi_nextents;
-        uint dst_len = sizeof(xfs_efi_log_format_t) + (nextents - 1) * sizeof(xfs_extent_t);
-        uint len32 = sizeof(xfs_efi_log_format_32_t) + (nextents - 1) * sizeof(xfs_extent_32_t);
-        uint len64 = sizeof(xfs_efi_log_format_64_t) + (nextents - 1) * sizeof(xfs_extent_64_t);
-
-        if (len == dst_len || continued) {
-                memcpy((char *)dst_efi_fmt, buf, len);
-                return 0;
-        } else if (len == len32) {
-                xfs_efi_log_format_32_t *src_efi_fmt_32 = (xfs_efi_log_format_32_t *)buf;
-
-                dst_efi_fmt->efi_type     = src_efi_fmt_32->efi_type;
-                dst_efi_fmt->efi_size     = src_efi_fmt_32->efi_size;
-                dst_efi_fmt->efi_nextents = src_efi_fmt_32->efi_nextents;
-                dst_efi_fmt->efi_id       = src_efi_fmt_32->efi_id;
-                for (i = 0; i < dst_efi_fmt->efi_nextents; i++) {
-                        dst_efi_fmt->efi_extents[i].ext_start =
-                                src_efi_fmt_32->efi_extents[i].ext_start;
-                        dst_efi_fmt->efi_extents[i].ext_len =
-                                src_efi_fmt_32->efi_extents[i].ext_len;
-                }
-                return 0;
-        } else if (len == len64) {
-                xfs_efi_log_format_64_t *src_efi_fmt_64 = (xfs_efi_log_format_64_t *)buf;
-
-                dst_efi_fmt->efi_type     = src_efi_fmt_64->efi_type;
-                dst_efi_fmt->efi_size     = src_efi_fmt_64->efi_size;
-                dst_efi_fmt->efi_nextents = src_efi_fmt_64->efi_nextents;
-                dst_efi_fmt->efi_id       = src_efi_fmt_64->efi_id;
-                for (i = 0; i < dst_efi_fmt->efi_nextents; i++) {
-                        dst_efi_fmt->efi_extents[i].ext_start =
-                                src_efi_fmt_64->efi_extents[i].ext_start;
-                        dst_efi_fmt->efi_extents[i].ext_len =
-                                src_efi_fmt_64->efi_extents[i].ext_len;
-                }
-                return 0;
-        }
-	fprintf(stderr, _("%s: bad size of efi format: %u; expected %u or %u; nextents = %u\n"),
-		progname, len, len32, len64, nextents);
-        return 1;
-}
diff --git a/logprint/log_print_all.c b/logprint/log_print_all.c
index f95f4aa..4d92c3b 100644
--- a/logprint/log_print_all.c
+++ b/logprint/log_print_all.c
@@ -372,66 +372,6 @@ xlog_recover_print_inode(
 	}
 }
 
-STATIC void
-xlog_recover_print_efd(
-	xlog_recover_item_t	*item)
-{
-	xfs_efd_log_format_t	*f;
-
-	f = (xfs_efd_log_format_t *)item->ri_buf[0].i_addr;
-	/*
-	 * An xfs_efd_log_format structure contains a variable length array
-	 * as the last field.
-	 * Each element is of size xfs_extent_32_t or xfs_extent_64_t.
-	 * However, the extents are never used and won't be printed.
-	 */
-	printf(_("	EFD:  #regs: %d    num_extents: %d  id: 0x%llx\n"),
-	       f->efd_size, f->efd_nextents, (unsigned long long)f->efd_efi_id);
-}
-
-
-STATIC void
-xlog_recover_print_efi(
-	xlog_recover_item_t	*item)
-{
-	xfs_efi_log_format_t	*f, *src_f;
-	xfs_extent_t		*ex;
-	int			i;
-	uint			src_len, dst_len;
-
-	src_f = (xfs_efi_log_format_t *)item->ri_buf[0].i_addr;
-	src_len = item->ri_buf[0].i_len;
-	/*
-	 * An xfs_efi_log_format structure contains a variable length array
-	 * as the last field.
-	 * Each element is of size xfs_extent_32_t or xfs_extent_64_t.
-	 * Need to convert to native format.
-	 */
-	dst_len = sizeof(xfs_efi_log_format_t) + (src_f->efi_nextents - 1) * sizeof(xfs_extent_t);
-	if ((f = (xfs_efi_log_format_t *)malloc(dst_len)) == NULL) {
-	    fprintf(stderr, _("%s: xlog_recover_print_efi: malloc failed\n"), progname);
-	    exit(1);
-	}
-	if (xfs_efi_copy_format((char*)src_f, src_len, f, 0)) {
-	    free(f);
-	    return;
-	}
-
-	printf(_("	EFI:  #regs:%d    num_extents:%d  id:0x%llx\n"),
-	       f->efi_size, f->efi_nextents, (unsigned long long)f->efi_id);
-	ex = f->efi_extents;
-	printf("	");
-	for (i=0; i< f->efi_nextents; i++) {
-		printf("(s: 0x%llx, l: %d) ",
-			(unsigned long long)ex->ext_start, ex->ext_len);
-		if (i % 4 == 3)
-			printf("\n");
-		ex++;
-	}
-	if (i % 4 != 0)
-		printf("\n");
-	free(f);
-}
 
 STATIC void
 xlog_recover_print_icreate(
diff --git a/logprint/log_redo.c b/logprint/log_redo.c
new file mode 100644
index 0000000..a9608f0
--- /dev/null
+++ b/logprint/log_redo.c
@@ -0,0 +1,226 @@
+/*
+ * Copyright (c) 2000-2003,2005 Silicon Graphics, Inc.
+ * Copyright (c) 2016 Oracle, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#include "libxfs.h"
+#include "libxlog.h"
+
+#include "logprint.h"
+
+/* Extent Free Items */
+
+int
+xfs_efi_copy_format(
+	char			  *buf,
+	uint			  len,
+	struct xfs_efi_log_format *dst_efi_fmt,
+	int			  continued)
+{
+        uint i;
+	uint nextents = ((xfs_efi_log_format_t *)buf)->efi_nextents;
+        uint dst_len = sizeof(xfs_efi_log_format_t) + (nextents - 1) * sizeof(xfs_extent_t);
+        uint len32 = sizeof(xfs_efi_log_format_32_t) + (nextents - 1) * sizeof(xfs_extent_32_t);
+        uint len64 = sizeof(xfs_efi_log_format_64_t) + (nextents - 1) * sizeof(xfs_extent_64_t);
+
+        if (len == dst_len || continued) {
+                memcpy((char *)dst_efi_fmt, buf, len);
+                return 0;
+        } else if (len == len32) {
+                xfs_efi_log_format_32_t *src_efi_fmt_32 = (xfs_efi_log_format_32_t *)buf;
+
+                dst_efi_fmt->efi_type     = src_efi_fmt_32->efi_type;
+                dst_efi_fmt->efi_size     = src_efi_fmt_32->efi_size;
+                dst_efi_fmt->efi_nextents = src_efi_fmt_32->efi_nextents;
+                dst_efi_fmt->efi_id       = src_efi_fmt_32->efi_id;
+                for (i = 0; i < dst_efi_fmt->efi_nextents; i++) {
+                        dst_efi_fmt->efi_extents[i].ext_start =
+                                src_efi_fmt_32->efi_extents[i].ext_start;
+                        dst_efi_fmt->efi_extents[i].ext_len =
+                                src_efi_fmt_32->efi_extents[i].ext_len;
+                }
+                return 0;
+        } else if (len == len64) {
+                xfs_efi_log_format_64_t *src_efi_fmt_64 = (xfs_efi_log_format_64_t *)buf;
+
+                dst_efi_fmt->efi_type     = src_efi_fmt_64->efi_type;
+                dst_efi_fmt->efi_size     = src_efi_fmt_64->efi_size;
+                dst_efi_fmt->efi_nextents = src_efi_fmt_64->efi_nextents;
+                dst_efi_fmt->efi_id       = src_efi_fmt_64->efi_id;
+                for (i = 0; i < dst_efi_fmt->efi_nextents; i++) {
+                        dst_efi_fmt->efi_extents[i].ext_start =
+                                src_efi_fmt_64->efi_extents[i].ext_start;
+                        dst_efi_fmt->efi_extents[i].ext_len =
+                                src_efi_fmt_64->efi_extents[i].ext_len;
+                }
+                return 0;
+        }
+	fprintf(stderr, _("%s: bad size of efi format: %u; expected %u or %u; nextents = %u\n"),
+		progname, len, len32, len64, nextents);
+        return 1;
+}
+
+int
+xlog_print_trans_efi(
+	char **ptr,
+	uint src_len,
+	int continued)
+{
+    xfs_efi_log_format_t *src_f, *f = NULL;
+    uint		 dst_len;
+    xfs_extent_t	 *ex;
+    int			 i;
+    int			 error = 0;
+    int			 core_size = offsetof(xfs_efi_log_format_t, efi_extents);
+
+    /*
+     * memmove to ensure 8-byte alignment for the long longs in
+     * xfs_efi_log_format_t structure
+     */
+    if ((src_f = (xfs_efi_log_format_t *)malloc(src_len)) == NULL) {
+	fprintf(stderr, _("%s: xlog_print_trans_efi: malloc failed\n"), progname);
+	exit(1);
+    }
+    memmove((char*)src_f, *ptr, src_len);
+    *ptr += src_len;
+
+    /* convert to native format */
+    dst_len = sizeof(xfs_efi_log_format_t) + (src_f->efi_nextents - 1) * sizeof(xfs_extent_t);
+
+    if (continued && src_len < core_size) {
+	printf(_("EFI: Not enough data to decode further\n"));
+	error = 1;
+	goto error;
+    }
+
+    if ((f = (xfs_efi_log_format_t *)malloc(dst_len)) == NULL) {
+	fprintf(stderr, _("%s: xlog_print_trans_efi: malloc failed\n"), progname);
+	exit(1);
+    }
+    if (xfs_efi_copy_format((char*)src_f, src_len, f, continued)) {
+	error = 1;
+	goto error;
+    }
+
+    printf(_("EFI:  #regs: %d    num_extents: %d  id: 0x%llx\n"),
+	   f->efi_size, f->efi_nextents, (unsigned long long)f->efi_id);
+
+    if (continued) {
+	printf(_("EFI free extent data skipped (CONTINUE set, no space)\n"));
+	goto error;
+    }
+
+    ex = f->efi_extents;
+    for (i=0; i < f->efi_nextents; i++) {
+	    printf("(s: 0x%llx, l: %d) ",
+		    (unsigned long long)ex->ext_start, ex->ext_len);
+	    if (i % 4 == 3) printf("\n");
+	    ex++;
+    }
+    if (i % 4 != 0) printf("\n");
+error:
+    free(src_f);
+    free(f);
+    return error;
+}	/* xlog_print_trans_efi */
+
+void
+xlog_recover_print_efi(
+	xlog_recover_item_t	*item)
+{
+	xfs_efi_log_format_t	*f, *src_f;
+	xfs_extent_t		*ex;
+	int			i;
+	uint			src_len, dst_len;
+
+	src_f = (xfs_efi_log_format_t *)item->ri_buf[0].i_addr;
+	src_len = item->ri_buf[0].i_len;
+	/*
+	 * An xfs_efi_log_format structure contains a variable length array
+	 * as the last field.
+	 * Each element is of size xfs_extent_32_t or xfs_extent_64_t.
+	 * Need to convert to native format.
+	 */
+	dst_len = sizeof(xfs_efi_log_format_t) + (src_f->efi_nextents - 1) * sizeof(xfs_extent_t);
+	if ((f = (xfs_efi_log_format_t *)malloc(dst_len)) == NULL) {
+	    fprintf(stderr, _("%s: xlog_recover_print_efi: malloc failed\n"), progname);
+	    exit(1);
+	}
+	if (xfs_efi_copy_format((char*)src_f, src_len, f, 0)) {
+	    free(f);
+	    return;
+	}
+
+	printf(_("	EFI:  #regs:%d    num_extents:%d  id:0x%llx\n"),
+	       f->efi_size, f->efi_nextents, (unsigned long long)f->efi_id);
+	ex = f->efi_extents;
+	printf("	");
+	for (i=0; i< f->efi_nextents; i++) {
+		printf("(s: 0x%llx, l: %d) ",
+			(unsigned long long)ex->ext_start, ex->ext_len);
+		if (i % 4 == 3)
+			printf("\n");
+		ex++;
+	}
+	if (i % 4 != 0)
+		printf("\n");
+	free(f);
+}
+
+int
+xlog_print_trans_efd(char **ptr, uint len)
+{
+    xfs_efd_log_format_t *f;
+    xfs_efd_log_format_t lbuf;
+    /* size without extents at end */
+    uint core_size = sizeof(xfs_efd_log_format_t) - sizeof(xfs_extent_t);
+
+    /*
+     * memmove to ensure 8-byte alignment for the long longs in
+     * xfs_efd_log_format_t structure
+     */
+    memmove(&lbuf, *ptr, MIN(core_size, len));
+    f = &lbuf;
+    *ptr += len;
+    if (len >= core_size) {
+	printf(_("EFD:  #regs: %d    num_extents: %d  id: 0x%llx\n"),
+	       f->efd_size, f->efd_nextents, (unsigned long long)f->efd_efi_id);
+
+	/* don't print extents as they are not used */
+
+	return 0;
+    } else {
+	printf(_("EFD: Not enough data to decode further\n"));
+	return 1;
+    }
+}	/* xlog_print_trans_efd */
+
+void
+xlog_recover_print_efd(
+	xlog_recover_item_t	*item)
+{
+	xfs_efd_log_format_t	*f;
+
+	f = (xfs_efd_log_format_t *)item->ri_buf[0].i_addr;
+	/*
+	 * An xfs_efd_log_format structure contains a variable length array
+	 * as the last field.
+	 * Each element is of size xfs_extent_32_t or xfs_extent_64_t.
+	 * However, the extents are never used and won't be printed.
+	 */
+	printf(_("	EFD:  #regs: %d    num_extents: %d  id: 0x%llx\n"),
+	       f->efd_size, f->efd_nextents, (unsigned long long)f->efd_efi_id);
+}
diff --git a/logprint/logprint.h b/logprint/logprint.h
index 018af81..517e852 100644
--- a/logprint/logprint.h
+++ b/logprint/logprint.h
@@ -45,6 +45,10 @@ extern void print_stars(void);
 
 extern xfs_inode_log_format_t *
 	xfs_inode_item_format_convert(char *, uint, xfs_inode_log_format_t *);
-extern int xfs_efi_copy_format(char *, uint, xfs_efi_log_format_t *, int);
+
+extern int xlog_print_trans_efi(char **ptr, uint src_len, int continued);
+extern void xlog_recover_print_efi(xlog_recover_item_t *item);
+extern int xlog_print_trans_efd(char **ptr, uint len);
+extern void xlog_recover_print_efd(xlog_recover_item_t *item);
 
 #endif	/* LOGPRINT_H */
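
As a side note on the length checks in xfs_efi_copy_format() that this patch moves, the following is a rough standalone sketch of how the on-disk item length can tell the 32-bit and 64-bit layouts apart. The struct definitions are hypothetical stand-ins, not the real xfs_efi_log_format_32_t and xfs_efi_log_format_64_t; the assumption here is that the 32-bit extent record is packed to 12 bytes while the 64-bit one is padded to 16. The patch computes sizeof(format) + (nextents - 1) * sizeof(extent) because the real format struct embeds one extent; the sketch uses a header-only core struct, which gives the same totals under the assumed layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the 32-bit and 64-bit extent records. */
struct example_extent_32 {
	uint64_t	ext_start;
	uint32_t	ext_len;
} __attribute__((packed));		/* 12 bytes, no tail padding */

struct example_extent_64 {
	uint64_t	ext_start;
	uint32_t	ext_len;
	uint32_t	ext_pad;	/* explicit padding to 16 bytes */
};

/* Fixed-size part of the item that precedes the extent array. */
struct example_efi_core {
	uint16_t	efi_type;
	uint16_t	efi_size;
	uint32_t	efi_nextents;
	uint64_t	efi_id;
};

_Static_assert(sizeof(struct example_extent_32) == 12, "packed extent");
_Static_assert(sizeof(struct example_extent_64) == 16, "padded extent");

enum example_efi_layout {
	EXAMPLE_EFI_32,
	EXAMPLE_EFI_64,
	EXAMPLE_EFI_BAD,
};

/* Expected item length if the writer used the 32-bit layout. */
static size_t
example_efi_len32(uint32_t nextents)
{
	return sizeof(struct example_efi_core) +
		(size_t)nextents * sizeof(struct example_extent_32);
}

/* Expected item length if the writer used the 64-bit layout. */
static size_t
example_efi_len64(uint32_t nextents)
{
	return sizeof(struct example_efi_core) +
		(size_t)nextents * sizeof(struct example_extent_64);
}

/* Guess the layout from the logged length, as the copy helper does. */
static enum example_efi_layout
example_efi_classify(size_t len, uint32_t nextents)
{
	if (len == example_efi_len32(nextents))
		return EXAMPLE_EFI_32;
	if (len == example_efi_len64(nextents))
		return EXAMPLE_EFI_64;
	return EXAMPLE_EFI_BAD;
}
```

With three extents the two candidate lengths are 52 and 64 bytes, so any other length falls through to the "bad size of efi format" path.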


* [PATCH 010/145] xfs_logprint: fix formatting issues with the EFI printing code
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (8 preceding siblings ...)
  2016-06-17  1:31 ` [PATCH 009/145] xfs_logprint: move the EFI copying/printing functions to a redo items file Darrick J. Wong
@ 2016-06-17  1:31 ` Darrick J. Wong
  2016-06-17  1:31 ` [PATCH 011/145] man: document the DAX fsxattr inode flag Darrick J. Wong
                   ` (134 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:31 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Fix some formatting issues with the EFI handling functions.
This is a purely mechanical whitespace fix; the only code change
is making xfs_efi_copy_format() static.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 logprint/log_redo.c |  261 ++++++++++++++++++++++++++-------------------------
 1 file changed, 133 insertions(+), 128 deletions(-)


diff --git a/logprint/log_redo.c b/logprint/log_redo.c
index a9608f0..d60cc1b 100644
--- a/logprint/log_redo.c
+++ b/logprint/log_redo.c
@@ -23,118 +23,119 @@
 
 /* Extent Free Items */
 
-int
+static int
 xfs_efi_copy_format(
 	char			  *buf,
 	uint			  len,
 	struct xfs_efi_log_format *dst_efi_fmt,
 	int			  continued)
 {
-        uint i;
+	uint i;
 	uint nextents = ((xfs_efi_log_format_t *)buf)->efi_nextents;
-        uint dst_len = sizeof(xfs_efi_log_format_t) + (nextents - 1) * sizeof(xfs_extent_t);
-        uint len32 = sizeof(xfs_efi_log_format_32_t) + (nextents - 1) * sizeof(xfs_extent_32_t);
-        uint len64 = sizeof(xfs_efi_log_format_64_t) + (nextents - 1) * sizeof(xfs_extent_64_t);
-
-        if (len == dst_len || continued) {
-                memcpy((char *)dst_efi_fmt, buf, len);
-                return 0;
-        } else if (len == len32) {
-                xfs_efi_log_format_32_t *src_efi_fmt_32 = (xfs_efi_log_format_32_t *)buf;
-
-                dst_efi_fmt->efi_type     = src_efi_fmt_32->efi_type;
-                dst_efi_fmt->efi_size     = src_efi_fmt_32->efi_size;
-                dst_efi_fmt->efi_nextents = src_efi_fmt_32->efi_nextents;
-                dst_efi_fmt->efi_id       = src_efi_fmt_32->efi_id;
-                for (i = 0; i < dst_efi_fmt->efi_nextents; i++) {
-                        dst_efi_fmt->efi_extents[i].ext_start =
-                                src_efi_fmt_32->efi_extents[i].ext_start;
-                        dst_efi_fmt->efi_extents[i].ext_len =
-                                src_efi_fmt_32->efi_extents[i].ext_len;
-                }
-                return 0;
-        } else if (len == len64) {
-                xfs_efi_log_format_64_t *src_efi_fmt_64 = (xfs_efi_log_format_64_t *)buf;
-
-                dst_efi_fmt->efi_type     = src_efi_fmt_64->efi_type;
-                dst_efi_fmt->efi_size     = src_efi_fmt_64->efi_size;
-                dst_efi_fmt->efi_nextents = src_efi_fmt_64->efi_nextents;
-                dst_efi_fmt->efi_id       = src_efi_fmt_64->efi_id;
-                for (i = 0; i < dst_efi_fmt->efi_nextents; i++) {
-                        dst_efi_fmt->efi_extents[i].ext_start =
-                                src_efi_fmt_64->efi_extents[i].ext_start;
-                        dst_efi_fmt->efi_extents[i].ext_len =
-                                src_efi_fmt_64->efi_extents[i].ext_len;
-                }
-                return 0;
-        }
+	uint dst_len = sizeof(xfs_efi_log_format_t) + (nextents - 1) * sizeof(xfs_extent_t);
+	uint len32 = sizeof(xfs_efi_log_format_32_t) + (nextents - 1) * sizeof(xfs_extent_32_t);
+	uint len64 = sizeof(xfs_efi_log_format_64_t) + (nextents - 1) * sizeof(xfs_extent_64_t);
+
+	if (len == dst_len || continued) {
+		memcpy((char *)dst_efi_fmt, buf, len);
+		return 0;
+	} else if (len == len32) {
+		xfs_efi_log_format_32_t *src_efi_fmt_32 = (xfs_efi_log_format_32_t *)buf;
+
+		dst_efi_fmt->efi_type	 = src_efi_fmt_32->efi_type;
+		dst_efi_fmt->efi_size	 = src_efi_fmt_32->efi_size;
+		dst_efi_fmt->efi_nextents = src_efi_fmt_32->efi_nextents;
+		dst_efi_fmt->efi_id	   = src_efi_fmt_32->efi_id;
+		for (i = 0; i < dst_efi_fmt->efi_nextents; i++) {
+			dst_efi_fmt->efi_extents[i].ext_start =
+				src_efi_fmt_32->efi_extents[i].ext_start;
+			dst_efi_fmt->efi_extents[i].ext_len =
+				src_efi_fmt_32->efi_extents[i].ext_len;
+		}
+		return 0;
+	} else if (len == len64) {
+		xfs_efi_log_format_64_t *src_efi_fmt_64 = (xfs_efi_log_format_64_t *)buf;
+
+		dst_efi_fmt->efi_type	 = src_efi_fmt_64->efi_type;
+		dst_efi_fmt->efi_size	 = src_efi_fmt_64->efi_size;
+		dst_efi_fmt->efi_nextents = src_efi_fmt_64->efi_nextents;
+		dst_efi_fmt->efi_id	   = src_efi_fmt_64->efi_id;
+		for (i = 0; i < dst_efi_fmt->efi_nextents; i++) {
+			dst_efi_fmt->efi_extents[i].ext_start =
+				src_efi_fmt_64->efi_extents[i].ext_start;
+			dst_efi_fmt->efi_extents[i].ext_len =
+				src_efi_fmt_64->efi_extents[i].ext_len;
+		}
+		return 0;
+	}
 	fprintf(stderr, _("%s: bad size of efi format: %u; expected %u or %u; nextents = %u\n"),
 		progname, len, len32, len64, nextents);
-        return 1;
+	return 1;
 }
 
 int
 xlog_print_trans_efi(
-	char **ptr,
-	uint src_len,
-	int continued)
+	char			**ptr,
+	uint			src_len,
+	int			continued)
 {
-    xfs_efi_log_format_t *src_f, *f = NULL;
-    uint		 dst_len;
-    xfs_extent_t	 *ex;
-    int			 i;
-    int			 error = 0;
-    int			 core_size = offsetof(xfs_efi_log_format_t, efi_extents);
-
-    /*
-     * memmove to ensure 8-byte alignment for the long longs in
-     * xfs_efi_log_format_t structure
-     */
-    if ((src_f = (xfs_efi_log_format_t *)malloc(src_len)) == NULL) {
-	fprintf(stderr, _("%s: xlog_print_trans_efi: malloc failed\n"), progname);
-	exit(1);
-    }
-    memmove((char*)src_f, *ptr, src_len);
-    *ptr += src_len;
-
-    /* convert to native format */
-    dst_len = sizeof(xfs_efi_log_format_t) + (src_f->efi_nextents - 1) * sizeof(xfs_extent_t);
-
-    if (continued && src_len < core_size) {
-	printf(_("EFI: Not enough data to decode further\n"));
-	error = 1;
-	goto error;
-    }
-
-    if ((f = (xfs_efi_log_format_t *)malloc(dst_len)) == NULL) {
-	fprintf(stderr, _("%s: xlog_print_trans_efi: malloc failed\n"), progname);
-	exit(1);
-    }
-    if (xfs_efi_copy_format((char*)src_f, src_len, f, continued)) {
-	error = 1;
-	goto error;
-    }
-
-    printf(_("EFI:  #regs: %d    num_extents: %d  id: 0x%llx\n"),
-	   f->efi_size, f->efi_nextents, (unsigned long long)f->efi_id);
-
-    if (continued) {
-	printf(_("EFI free extent data skipped (CONTINUE set, no space)\n"));
-	goto error;
-    }
-
-    ex = f->efi_extents;
-    for (i=0; i < f->efi_nextents; i++) {
-	    printf("(s: 0x%llx, l: %d) ",
-		    (unsigned long long)ex->ext_start, ex->ext_len);
-	    if (i % 4 == 3) printf("\n");
-	    ex++;
-    }
-    if (i % 4 != 0) printf("\n");
+	xfs_efi_log_format_t	*src_f, *f = NULL;
+	uint			dst_len;
+	xfs_extent_t		*ex;
+	int			i;
+	int			error = 0;
+	int			core_size = offsetof(xfs_efi_log_format_t, efi_extents);
+
+	/*
+	 * memmove to ensure 8-byte alignment for the long longs in
+	 * xfs_efi_log_format_t structure
+	 */
+	if ((src_f = (xfs_efi_log_format_t *)malloc(src_len)) == NULL) {
+		fprintf(stderr, _("%s: xlog_print_trans_efi: malloc failed\n"), progname);
+		exit(1);
+	}
+	memmove((char*)src_f, *ptr, src_len);
+	*ptr += src_len;
+
+	/* convert to native format */
+	dst_len = sizeof(xfs_efi_log_format_t) + (src_f->efi_nextents - 1) * sizeof(xfs_extent_t);
+
+	if (continued && src_len < core_size) {
+		printf(_("EFI: Not enough data to decode further\n"));
+		error = 1;
+		goto error;
+	}
+
+	if ((f = (xfs_efi_log_format_t *)malloc(dst_len)) == NULL) {
+		fprintf(stderr, _("%s: xlog_print_trans_efi: malloc failed\n"), progname);
+		exit(1);
+	}
+	if (xfs_efi_copy_format((char*)src_f, src_len, f, continued)) {
+		error = 1;
+		goto error;
+	}
+
+	printf(_("EFI:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
+		f->efi_size, f->efi_nextents, (unsigned long long)f->efi_id);
+
+	if (continued) {
+		printf(_("EFI free extent data skipped (CONTINUE set, no space)\n"));
+		goto error;
+	}
+
+	ex = f->efi_extents;
+	for (i=0; i < f->efi_nextents; i++) {
+		printf("(s: 0x%llx, l: %d) ",
+			(unsigned long long)ex->ext_start, ex->ext_len);
+		if (i % 4 == 3) printf("\n");
+		ex++;
+	}
+	if (i % 4 != 0)
+		printf("\n");
 error:
-    free(src_f);
-    free(f);
-    return error;
+	free(src_f);
+	free(f);
+	return error;
 }	/* xlog_print_trans_efi */
 
 void
@@ -154,18 +155,20 @@ xlog_recover_print_efi(
 	 * Each element is of size xfs_extent_32_t or xfs_extent_64_t.
 	 * Need to convert to native format.
 	 */
-	dst_len = sizeof(xfs_efi_log_format_t) + (src_f->efi_nextents - 1) * sizeof(xfs_extent_t);
+	dst_len = sizeof(xfs_efi_log_format_t) +
+		(src_f->efi_nextents - 1) * sizeof(xfs_extent_t);
 	if ((f = (xfs_efi_log_format_t *)malloc(dst_len)) == NULL) {
-	    fprintf(stderr, _("%s: xlog_recover_print_efi: malloc failed\n"), progname);
-	    exit(1);
+		fprintf(stderr, _("%s: xlog_recover_print_efi: malloc failed\n"),
+			progname);
+		exit(1);
 	}
 	if (xfs_efi_copy_format((char*)src_f, src_len, f, 0)) {
-	    free(f);
-	    return;
+		free(f);
+		return;
 	}
 
-	printf(_("	EFI:  #regs:%d    num_extents:%d  id:0x%llx\n"),
-	       f->efi_size, f->efi_nextents, (unsigned long long)f->efi_id);
+	printf(_("	EFI:  #regs:%d	num_extents:%d  id:0x%llx\n"),
+		   f->efi_size, f->efi_nextents, (unsigned long long)f->efi_id);
 	ex = f->efi_extents;
 	printf("	");
 	for (i=0; i< f->efi_nextents; i++) {
@@ -183,29 +186,30 @@ xlog_recover_print_efi(
 int
 xlog_print_trans_efd(char **ptr, uint len)
 {
-    xfs_efd_log_format_t *f;
-    xfs_efd_log_format_t lbuf;
-    /* size without extents at end */
-    uint core_size = sizeof(xfs_efd_log_format_t) - sizeof(xfs_extent_t);
-
-    /*
-     * memmove to ensure 8-byte alignment for the long longs in
-     * xfs_efd_log_format_t structure
-     */
-    memmove(&lbuf, *ptr, MIN(core_size, len));
-    f = &lbuf;
-    *ptr += len;
-    if (len >= core_size) {
-	printf(_("EFD:  #regs: %d    num_extents: %d  id: 0x%llx\n"),
-	       f->efd_size, f->efd_nextents, (unsigned long long)f->efd_efi_id);
-
-	/* don't print extents as they are not used */
-
-	return 0;
-    } else {
-	printf(_("EFD: Not enough data to decode further\n"));
+	xfs_efd_log_format_t *f;
+	xfs_efd_log_format_t lbuf;
+	/* size without extents at end */
+	uint core_size = sizeof(xfs_efd_log_format_t) - sizeof(xfs_extent_t);
+
+	/*
+	 * memmove to ensure 8-byte alignment for the long longs in
+	 * xfs_efd_log_format_t structure
+	 */
+	memmove(&lbuf, *ptr, MIN(core_size, len));
+	f = &lbuf;
+	*ptr += len;
+	if (len >= core_size) {
+		printf(_("EFD:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
+			f->efd_size, f->efd_nextents,
+			(unsigned long long)f->efd_efi_id);
+
+		/* don't print extents as they are not used */
+
+		return 0;
+	} else {
+		printf(_("EFD: Not enough data to decode further\n"));
 	return 1;
-    }
+	}
 }	/* xlog_print_trans_efd */
 
 void
@@ -221,6 +225,7 @@ xlog_recover_print_efd(
 	 * Each element is of size xfs_extent_32_t or xfs_extent_64_t.
 	 * However, the extents are never used and won't be printed.
 	 */
-	printf(_("	EFD:  #regs: %d    num_extents: %d  id: 0x%llx\n"),
-	       f->efd_size, f->efd_nextents, (unsigned long long)f->efd_efi_id);
+	printf(_("	EFD:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
+		f->efd_size, f->efd_nextents,
+		(unsigned long long)f->efd_efi_id);
 }


* [PATCH 011/145] man: document the DAX fsxattr inode flag
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (9 preceding siblings ...)
  2016-06-17  1:31 ` [PATCH 010/145] xfs_logprint: fix formatting issues with the EFI printing code Darrick J. Wong
@ 2016-06-17  1:31 ` Darrick J. Wong
  2016-06-17  1:32 ` [PATCH 012/145] xfs: separate freelist fixing into a separate helper Darrick J. Wong
                   ` (133 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:31 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Document the new inode flag in struct fsxattr for DAX.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 man/man3/xfsctl.3 |    5 +++++
 1 file changed, 5 insertions(+)


diff --git a/man/man3/xfsctl.3 b/man/man3/xfsctl.3
index e84b829..9e7f138 100644
--- a/man/man3/xfsctl.3
+++ b/man/man3/xfsctl.3
@@ -225,6 +225,11 @@ group and so files within different directories will not interleave
 extents on disk. The reservation is only active while files are being
 created and written into the directory.
 .TP
+.SM "Bit 15 (0x8000) \- XFS_XFLAG_DAX"
+If the filesystem lives on directly accessible persistent memory, reads and
+writes to this file will go straight to the persistent memory, bypassing the
+page cache.
+.TP
 .SM "Bit 31 (0x80000000) \- XFS_XFLAG_HASATTR"
 The file has extended attributes associated with it.
 .RE
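
To illustrate the bit values listed in the man page text above, here is a small sketch of testing and setting the DAX bit in a flags word. The `EXAMPLE_*` macros and helper names are hypothetical and only mirror the documented values (bit 15 for DAX, bit 31 for HASATTR); a real program would use the constants from the XFS headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as listed in the man page text; the names are hypothetical. */
#define EXAMPLE_XFLAG_DAX	(UINT32_C(1) << 15)	/* 0x8000 */
#define EXAMPLE_XFLAG_HASATTR	(UINT32_C(1) << 31)	/* 0x80000000 */

/* Return true if the DAX bit is set in an fsx_xflags-style word. */
static bool
example_wants_dax(uint32_t xflags)
{
	return (xflags & EXAMPLE_XFLAG_DAX) != 0;
}

/* Return @xflags with the DAX bit set or cleared, other bits untouched. */
static uint32_t
example_set_dax(uint32_t xflags, bool enable)
{
	if (enable)
		return xflags | EXAMPLE_XFLAG_DAX;
	return xflags & ~EXAMPLE_XFLAG_DAX;
}
```

In practice the flags word would presumably come from the fsx_xflags field of struct fsxattr, read and written with the get/set xattr requests that xfsctl(3) describes.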


* [PATCH 012/145] xfs: separate freelist fixing into a separate helper
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (10 preceding siblings ...)
  2016-06-17  1:31 ` [PATCH 011/145] man: document the DAX fsxattr inode flag Darrick J. Wong
@ 2016-06-17  1:32 ` Darrick J. Wong
  2016-06-17  1:32 ` [PATCH 013/145] xfs: convert list of extents to free into a regular list Darrick J. Wong
                   ` (132 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:32 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

From: Dave Chinner <david@fromorbit.com>

Split the freelist fixup out of xfs_free_extent() into its own helper.
This helper will be used subsequently to ensure that the freelist is
at full capacity during deferred rmap processing.

Signed-off-by: Dave Chinner <david@fromorbit.com>
[darrick: refactor to put this at the head of the patchset]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_alloc.c |   82 ++++++++++++++++++++++++++++++++++------------------
 libxfs/xfs_alloc.h |    2 +
 2 files changed, 55 insertions(+), 29 deletions(-)


diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 8520f31..2998af8 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -2656,55 +2656,79 @@ error0:
 	return error;
 }
 
-/*
- * Free an extent.
- * Just break up the extent address and hand off to xfs_free_ag_extent
- * after fixing up the freelist.
- */
-int				/* error */
-xfs_free_extent(
-	xfs_trans_t	*tp,	/* transaction pointer */
-	xfs_fsblock_t	bno,	/* starting block number of extent */
-	xfs_extlen_t	len)	/* length of extent */
+/* Ensure that the freelist is at full capacity. */
+int
+xfs_free_extent_fix_freelist(
+	struct xfs_trans	*tp,
+	xfs_agnumber_t		agno,
+	struct xfs_buf		**agbp)
 {
-	xfs_alloc_arg_t	args;
-	int		error;
+	xfs_alloc_arg_t		args;
+	int			error;
 
-	ASSERT(len != 0);
 	memset(&args, 0, sizeof(xfs_alloc_arg_t));
 	args.tp = tp;
 	args.mp = tp->t_mountp;
+	args.agno = agno;
 
 	/*
 	 * validate that the block number is legal - the enables us to detect
 	 * and handle a silent filesystem corruption rather than crashing.
 	 */
-	args.agno = XFS_FSB_TO_AGNO(args.mp, bno);
 	if (args.agno >= args.mp->m_sb.sb_agcount)
 		return -EFSCORRUPTED;
 
-	args.agbno = XFS_FSB_TO_AGBNO(args.mp, bno);
-	if (args.agbno >= args.mp->m_sb.sb_agblocks)
-		return -EFSCORRUPTED;
-
 	args.pag = xfs_perag_get(args.mp, args.agno);
 	ASSERT(args.pag);
 
 	error = xfs_alloc_fix_freelist(&args, XFS_ALLOC_FLAG_FREEING);
 	if (error)
-		goto error0;
+		goto out;
+
+	*agbp = args.agbp;
+out:
+	xfs_perag_put(args.pag);
+	return error;
+}
+
+/*
+ * Free an extent.
+ * Just break up the extent address and hand off to xfs_free_ag_extent
+ * after fixing up the freelist.
+ */
+int				/* error */
+xfs_free_extent(
+	struct xfs_trans	*tp,	/* transaction pointer */
+	xfs_fsblock_t		bno,	/* starting block number of extent */
+	xfs_extlen_t		len)	/* length of extent */
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_buf		*agbp;
+	xfs_agnumber_t		agno = XFS_FSB_TO_AGNO(mp, bno);
+	xfs_agblock_t		agbno = XFS_FSB_TO_AGBNO(mp, bno);
+	int			error;
+
+	ASSERT(len != 0);
+
+	error = xfs_free_extent_fix_freelist(tp, agno, &agbp);
+	if (error)
+		return error;
+
+	XFS_WANT_CORRUPTED_GOTO(mp, agbno < mp->m_sb.sb_agblocks, err);
 
 	/* validate the extent size is legal now we have the agf locked */
-	if (args.agbno + len >
-			be32_to_cpu(XFS_BUF_TO_AGF(args.agbp)->agf_length)) {
-		error = -EFSCORRUPTED;
-		goto error0;
-	}
+	XFS_WANT_CORRUPTED_GOTO(mp,
+			agbno + len <= be32_to_cpu(XFS_BUF_TO_AGF(agbp)->agf_length),
+			err);
 
-	error = xfs_free_ag_extent(tp, args.agbp, args.agno, args.agbno, len, 0);
-	if (!error)
-		xfs_extent_busy_insert(tp, args.agno, args.agbno, len, 0);
-error0:
-	xfs_perag_put(args.pag);
+	error = xfs_free_ag_extent(tp, agbp, agno, agbno, len, 0);
+	if (error)
+		goto err;
+
+	xfs_extent_busy_insert(tp, agno, agbno, len, 0);
+	return 0;
+
+err:
+	xfs_trans_brelse(tp, agbp);
 	return error;
 }
diff --git a/libxfs/xfs_alloc.h b/libxfs/xfs_alloc.h
index 92a66ba..cf268b2 100644
--- a/libxfs/xfs_alloc.h
+++ b/libxfs/xfs_alloc.h
@@ -229,5 +229,7 @@ xfs_alloc_get_rec(
 int xfs_read_agf(struct xfs_mount *mp, struct xfs_trans *tp,
 			xfs_agnumber_t agno, int flags, struct xfs_buf **bpp);
 int xfs_alloc_fix_freelist(struct xfs_alloc_arg *args, int flags);
+int xfs_free_extent_fix_freelist(struct xfs_trans *tp, xfs_agnumber_t agno,
+		struct xfs_buf **agbp);
 
 #endif	/* __XFS_ALLOC_H__ */


* [PATCH 013/145] xfs: convert list of extents to free into a regular list
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (11 preceding siblings ...)
  2016-06-17  1:32 ` [PATCH 012/145] xfs: separate freelist fixing into a separate helper Darrick J. Wong
@ 2016-06-17  1:32 ` Darrick J. Wong
  2016-06-17  1:32 ` [PATCH 014/145] xfs: create a standard btree size calculator code Darrick J. Wong
                   ` (131 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:32 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

In struct xfs_bmap_free, convert the open-coded free extent list to
a regular list, then use list_sort to sort it prior to processing.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/init.c        |    3 ++-
 libxfs/libxfs_priv.h |    3 +--
 libxfs/util.c        |   10 +++++-----
 libxfs/xfs_bmap.c    |   39 +++++++++++----------------------------
 libxfs/xfs_bmap.h    |   14 ++++++++------
 5 files changed, 27 insertions(+), 42 deletions(-)


diff --git a/libxfs/init.c b/libxfs/init.c
index e04b6e0..67c3b30 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -417,7 +417,8 @@ manage_zones(int release)
 	xfs_btree_cur_zone = kmem_zone_init(
 			sizeof(xfs_btree_cur_t), "xfs_btree_cur");
 	xfs_bmap_free_item_zone = kmem_zone_init(
-			sizeof(xfs_bmap_free_item_t), "xfs_bmap_free_item");
+			sizeof(struct xfs_bmap_free_item),
+			"xfs_bmap_free_item");
 	xfs_log_item_desc_zone = kmem_zone_init(
 			sizeof(struct xfs_log_item_desc), "xfs_log_item_desc");
 	xfs_dir_startup();
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index ecd75e7..ba16544 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -466,8 +466,7 @@ struct xfs_buftarg;
 int xfs_attr_rmtval_get(struct xfs_da_args *);
 
 /* xfs_bmap.c */
-void xfs_bmap_del_free(struct xfs_bmap_free *, struct xfs_bmap_free_item *,
-			struct xfs_bmap_free_item *);
+void xfs_bmap_del_free(struct xfs_bmap_free *, struct xfs_bmap_free_item *);
 
 /* xfs_mount.c */
 int xfs_initialize_perag_data(struct xfs_mount *, xfs_agnumber_t);
diff --git a/libxfs/util.c b/libxfs/util.c
index 2e826c9..f37b396 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -480,20 +480,20 @@ libxfs_bmap_finish(
 	struct xfs_bmap_free	*flist,
 	struct xfs_inode	*ip)
 {
-	xfs_bmap_free_item_t	*free;	/* free extent list item */
-	xfs_bmap_free_item_t	*next;	/* next item on free list */
+	struct xfs_bmap_free_item	*free;	/* free extent list item */
 	int			error;
 
 	if (flist->xbf_count == 0)
 		return 0;
 
-	for (free = flist->xbf_first; free != NULL; free = next) {
-		next = free->xbfi_next;
+	while (!list_empty(&flist->xbf_flist)) {
+		free = list_first_entry(&flist->xbf_flist,
+				struct xfs_bmap_free_item, xbfi_list);
 		error = xfs_free_extent(*tp, free->xbfi_startblock,
 					free->xbfi_blockcount);
 		if (error)
 			return error;
-		xfs_bmap_del_free(flist, NULL, free);
+		xfs_bmap_del_free(flist, free);
 	}
 	return 0;
 }
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index c2a2c53..65de5ad 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -567,9 +567,7 @@ xfs_bmap_add_free(
 	xfs_fsblock_t		bno,		/* fs block number of extent */
 	xfs_filblks_t		len)		/* length of extent */
 {
-	xfs_bmap_free_item_t	*cur;		/* current (next) element */
-	xfs_bmap_free_item_t	*new;		/* new element */
-	xfs_bmap_free_item_t	*prev;		/* previous element */
+	struct xfs_bmap_free_item	*new;		/* new element */
 #ifdef DEBUG
 	xfs_agnumber_t		agno;
 	xfs_agblock_t		agbno;
@@ -589,17 +587,7 @@ xfs_bmap_add_free(
 	new = kmem_zone_alloc(xfs_bmap_free_item_zone, KM_SLEEP);
 	new->xbfi_startblock = bno;
 	new->xbfi_blockcount = (xfs_extlen_t)len;
-	for (prev = NULL, cur = flist->xbf_first;
-	     cur != NULL;
-	     prev = cur, cur = cur->xbfi_next) {
-		if (cur->xbfi_startblock >= bno)
-			break;
-	}
-	if (prev)
-		prev->xbfi_next = new;
-	else
-		flist->xbf_first = new;
-	new->xbfi_next = cur;
+	list_add(&new->xbfi_list, &flist->xbf_flist);
 	flist->xbf_count++;
 }
 
@@ -609,14 +597,10 @@ xfs_bmap_add_free(
  */
 void
 xfs_bmap_del_free(
-	xfs_bmap_free_t		*flist,	/* free item list header */
-	xfs_bmap_free_item_t	*prev,	/* previous item on list, if any */
-	xfs_bmap_free_item_t	*free)	/* list item to be freed */
+	struct xfs_bmap_free		*flist,	/* free item list header */
+	struct xfs_bmap_free_item	*free)	/* list item to be freed */
 {
-	if (prev)
-		prev->xbfi_next = free->xbfi_next;
-	else
-		flist->xbf_first = free->xbfi_next;
+	list_del(&free->xbfi_list);
 	flist->xbf_count--;
 	kmem_zone_free(xfs_bmap_free_item_zone, free);
 }
@@ -626,17 +610,16 @@ xfs_bmap_del_free(
  */
 void
 xfs_bmap_cancel(
-	xfs_bmap_free_t		*flist)	/* list of bmap_free_items */
+	struct xfs_bmap_free		*flist)	/* list of bmap_free_items */
 {
-	xfs_bmap_free_item_t	*free;	/* free list item */
-	xfs_bmap_free_item_t	*next;
+	struct xfs_bmap_free_item	*free;	/* free list item */
 
 	if (flist->xbf_count == 0)
 		return;
-	ASSERT(flist->xbf_first != NULL);
-	for (free = flist->xbf_first; free; free = next) {
-		next = free->xbfi_next;
-		xfs_bmap_del_free(flist, NULL, free);
+	while (!list_empty(&flist->xbf_flist)) {
+		free = list_first_entry(&flist->xbf_flist,
+				struct xfs_bmap_free_item, xbfi_list);
+		xfs_bmap_del_free(flist, free);
 	}
 	ASSERT(flist->xbf_count == 0);
 }
diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
index 6485403..c165b2d 100644
--- a/libxfs/xfs_bmap.h
+++ b/libxfs/xfs_bmap.h
@@ -62,12 +62,12 @@ struct xfs_bmalloca {
  * List of extents to be free "later".
  * The list is kept sorted on xbf_startblock.
  */
-typedef struct xfs_bmap_free_item
+struct xfs_bmap_free_item
 {
 	xfs_fsblock_t		xbfi_startblock;/* starting fs block number */
 	xfs_extlen_t		xbfi_blockcount;/* number of blocks in extent */
-	struct xfs_bmap_free_item *xbfi_next;	/* link to next entry */
-} xfs_bmap_free_item_t;
+	struct list_head	xbfi_list;
+};
 
 /*
  * Header for free extent list.
@@ -85,7 +85,7 @@ typedef struct xfs_bmap_free_item
  */
 typedef	struct xfs_bmap_free
 {
-	xfs_bmap_free_item_t	*xbf_first;	/* list of to-be-free extents */
+	struct list_head	xbf_flist;	/* list of to-be-free extents */
 	int			xbf_count;	/* count of items on list */
 	int			xbf_low;	/* alloc in low mode */
 } xfs_bmap_free_t;
@@ -141,8 +141,10 @@ static inline int xfs_bmapi_aflag(int w)
 
 static inline void xfs_bmap_init(xfs_bmap_free_t *flp, xfs_fsblock_t *fbp)
 {
-	((flp)->xbf_first = NULL, (flp)->xbf_count = 0, \
-		(flp)->xbf_low = 0, *(fbp) = NULLFSBLOCK);
+	INIT_LIST_HEAD(&flp->xbf_flist);
+	flp->xbf_count = 0;
+	flp->xbf_low = 0;
+	*fbp = NULLFSBLOCK;
 }
 
 /*


* [PATCH 014/145] xfs: create a standard btree size calculator code
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (12 preceding siblings ...)
  2016-06-17  1:32 ` [PATCH 013/145] xfs: convert list of extents to free into a regular list Darrick J. Wong
@ 2016-06-17  1:32 ` Darrick J. Wong
  2016-06-17  1:32 ` [PATCH 015/145] xfs: refactor btree maxlevels computation Darrick J. Wong
                   ` (130 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:32 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Create a helper to calculate the number of blocks needed to store a
given number of records in a per-AG btree.
This will be used (much) later when we get to the refcount btree.

v2: Use a helper function instead of a macro.
v3: We can (theoretically) store more than 2^32 records in a btree, so
    widen the fields to accept that.
v4: Don't modify xfs_bmap_worst_indlen; the purpose of /that/ function
    is to estimate the worst-case number of blocks needed for a bmbt
    expansion, not to calculate the space required to store nr records.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_btree.c |   27 +++++++++++++++++++++++++++
 libxfs/xfs_btree.h |    3 +++
 2 files changed, 30 insertions(+)


diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 1448fd6..27e58b2 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -4152,3 +4152,30 @@ xfs_btree_sblock_verify(
 
 	return true;
 }
+
+/*
+ * Calculate the number of blocks needed to store a given number of records
+ * in a short-format (per-AG metadata) btree.
+ */
+xfs_extlen_t
+xfs_btree_calc_size(
+	struct xfs_mount	*mp,
+	uint			*limits,
+	unsigned long long	len)
+{
+	int			level;
+	int			maxrecs;
+	xfs_extlen_t		rval;
+
+	maxrecs = limits[0];
+	for (level = 0, rval = 0; len > 0; level++) {
+		len += maxrecs - 1;
+		do_div(len, maxrecs);
+		rval += len;
+		if (len == 1)
+			return rval;
+		if (level == 0)
+			maxrecs = limits[1];
+	}
+	return rval;
+}
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index 9a88839..b330f19 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -475,4 +475,7 @@ static inline int xfs_btree_get_level(struct xfs_btree_block *block)
 bool xfs_btree_sblock_v5hdr_verify(struct xfs_buf *bp);
 bool xfs_btree_sblock_verify(struct xfs_buf *bp, unsigned int max_recs);
 
+xfs_extlen_t xfs_btree_calc_size(struct xfs_mount *mp, uint *limits,
+		unsigned long long len);
+
 #endif	/* __XFS_BTREE_H__ */


* [PATCH 015/145] xfs: refactor btree maxlevels computation
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (13 preceding siblings ...)
  2016-06-17  1:32 ` [PATCH 014/145] xfs: create a standard btree size calculator code Darrick J. Wong
@ 2016-06-17  1:32 ` Darrick J. Wong
  2016-06-17  1:32 ` [PATCH 016/145] xfs: during btree split, save new block key & ptr for future insertion Darrick J. Wong
                   ` (129 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:32 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Create a common function to calculate the maximum height of a per-AG
btree.  This will eventually be used by the rmapbt and refcountbt code
to calculate appropriate maxlevels values for each.  This is important
because the verifiers and the transaction block reservations depend on
accurate estimates of how many blocks are needed to satisfy a btree split.

We were mistakenly using the max bnobt height for all the btrees,
which creates a dangerous situation since the larger records and keys
in an rmapbt make it very possible that the rmapbt will be taller than
the bnobt and so we can run out of transaction block reservation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_alloc.c  |   15 ++-------------
 libxfs/xfs_btree.c  |   19 +++++++++++++++++++
 libxfs/xfs_btree.h  |    2 ++
 libxfs/xfs_ialloc.c |   19 +++++--------------
 4 files changed, 28 insertions(+), 27 deletions(-)


diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 2998af8..7276419 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -1835,19 +1835,8 @@ void
 xfs_alloc_compute_maxlevels(
 	xfs_mount_t	*mp)	/* file system mount structure */
 {
-	int		level;
-	uint		maxblocks;
-	uint		maxleafents;
-	int		minleafrecs;
-	int		minnoderecs;
-
-	maxleafents = (mp->m_sb.sb_agblocks + 1) / 2;
-	minleafrecs = mp->m_alloc_mnr[0];
-	minnoderecs = mp->m_alloc_mnr[1];
-	maxblocks = (maxleafents + minleafrecs - 1) / minleafrecs;
-	for (level = 1; maxblocks > 1; level++)
-		maxblocks = (maxblocks + minnoderecs - 1) / minnoderecs;
-	mp->m_ag_maxlevels = level;
+	mp->m_ag_maxlevels = xfs_btree_compute_maxlevels(mp, mp->m_alloc_mnr,
+			(mp->m_sb.sb_agblocks + 1) / 2);
 }
 
 /*
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 27e58b2..623841f 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -4154,6 +4154,25 @@ xfs_btree_sblock_verify(
 }
 
 /*
+ * Calculate the number of btree levels needed to store a given number of
+ * records in a short-format btree.
+ */
+uint
+xfs_btree_compute_maxlevels(
+	struct xfs_mount	*mp,
+	uint			*limits,
+	unsigned long		len)
+{
+	uint			level;
+	unsigned long		maxblocks;
+
+	maxblocks = (len + limits[0] - 1) / limits[0];
+	for (level = 1; maxblocks > 1; level++)
+		maxblocks = (maxblocks + limits[1] - 1) / limits[1];
+	return level;
+}
+
+/*
  * Calculate the number of blocks needed to store a given number of records
  * in a short-format (per-AG metadata) btree.
  */
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index b330f19..b955e5d 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -477,5 +477,7 @@ bool xfs_btree_sblock_verify(struct xfs_buf *bp, unsigned int max_recs);
 
 xfs_extlen_t xfs_btree_calc_size(struct xfs_mount *mp, uint *limits,
 		unsigned long long len);
+uint xfs_btree_compute_maxlevels(struct xfs_mount *mp, uint *limits,
+		unsigned long len);
 
 #endif	/* __XFS_BTREE_H__ */
diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index 4f0e4ee..249512b 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -2388,20 +2388,11 @@ void
 xfs_ialloc_compute_maxlevels(
 	xfs_mount_t	*mp)		/* file system mount structure */
 {
-	int		level;
-	uint		maxblocks;
-	uint		maxleafents;
-	int		minleafrecs;
-	int		minnoderecs;
-
-	maxleafents = (1LL << XFS_INO_AGINO_BITS(mp)) >>
-		XFS_INODES_PER_CHUNK_LOG;
-	minleafrecs = mp->m_inobt_mnr[0];
-	minnoderecs = mp->m_inobt_mnr[1];
-	maxblocks = (maxleafents + minleafrecs - 1) / minleafrecs;
-	for (level = 1; maxblocks > 1; level++)
-		maxblocks = (maxblocks + minnoderecs - 1) / minnoderecs;
-	mp->m_in_maxlevels = level;
+	uint		inodes;
+
+	inodes = (1LL << XFS_INO_AGINO_BITS(mp)) >> XFS_INODES_PER_CHUNK_LOG;
+	mp->m_in_maxlevels = xfs_btree_compute_maxlevels(mp, mp->m_inobt_mnr,
+							 inodes);
 }
 
 /*

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH 016/145] xfs: during btree split, save new block key & ptr for future insertion
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (14 preceding siblings ...)
  2016-06-17  1:32 ` [PATCH 015/145] xfs: refactor btree maxlevels computation Darrick J. Wong
@ 2016-06-17  1:32 ` Darrick J. Wong
  2016-06-17  1:32 ` [PATCH 017/145] xfs: support btrees with overlapping intervals for keys Darrick J. Wong
                   ` (128 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:32 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

When a btree block has to be split, we pass the new block's ptr from
xfs_btree_split() back to xfs_btree_insert() via a pointer parameter;
however, we pass the block's key through the cursor's record.  It is a
little weird to "initialize" a record from a key since the non-key
attributes will have garbage values.

When we go to add support for interval queries, we have to be able to
pass the lowest and highest keys accessible via a pointer.  There's no
clean way to pass this back through the cursor's record field.
Therefore, pass the key directly back to xfs_btree_insert() the same
way that we pass the btree_ptr.

As a bonus, we no longer need init_rec_from_key and can drop it from the
codebase.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_alloc_btree.c  |   12 ------------
 libxfs/xfs_bmap_btree.c   |   12 ------------
 libxfs/xfs_btree.c        |   44 ++++++++++++++++++++++----------------------
 libxfs/xfs_btree.h        |    2 --
 libxfs/xfs_ialloc_btree.c |   10 ----------
 5 files changed, 22 insertions(+), 58 deletions(-)


diff --git a/libxfs/xfs_alloc_btree.c b/libxfs/xfs_alloc_btree.c
index 094135f..ff4bae4 100644
--- a/libxfs/xfs_alloc_btree.c
+++ b/libxfs/xfs_alloc_btree.c
@@ -210,17 +210,6 @@ xfs_allocbt_init_key_from_rec(
 }
 
 STATIC void
-xfs_allocbt_init_rec_from_key(
-	union xfs_btree_key	*key,
-	union xfs_btree_rec	*rec)
-{
-	ASSERT(key->alloc.ar_startblock != 0);
-
-	rec->alloc.ar_startblock = key->alloc.ar_startblock;
-	rec->alloc.ar_blockcount = key->alloc.ar_blockcount;
-}
-
-STATIC void
 xfs_allocbt_init_rec_from_cur(
 	struct xfs_btree_cur	*cur,
 	union xfs_btree_rec	*rec)
@@ -404,7 +393,6 @@ static const struct xfs_btree_ops xfs_allocbt_ops = {
 	.get_minrecs		= xfs_allocbt_get_minrecs,
 	.get_maxrecs		= xfs_allocbt_get_maxrecs,
 	.init_key_from_rec	= xfs_allocbt_init_key_from_rec,
-	.init_rec_from_key	= xfs_allocbt_init_rec_from_key,
 	.init_rec_from_cur	= xfs_allocbt_init_rec_from_cur,
 	.init_ptr_from_cur	= xfs_allocbt_init_ptr_from_cur,
 	.key_diff		= xfs_allocbt_key_diff,
diff --git a/libxfs/xfs_bmap_btree.c b/libxfs/xfs_bmap_btree.c
index 022d4b6..2ae701e 100644
--- a/libxfs/xfs_bmap_btree.c
+++ b/libxfs/xfs_bmap_btree.c
@@ -597,17 +597,6 @@ xfs_bmbt_init_key_from_rec(
 }
 
 STATIC void
-xfs_bmbt_init_rec_from_key(
-	union xfs_btree_key	*key,
-	union xfs_btree_rec	*rec)
-{
-	ASSERT(key->bmbt.br_startoff != 0);
-
-	xfs_bmbt_disk_set_allf(&rec->bmbt, be64_to_cpu(key->bmbt.br_startoff),
-			       0, 0, XFS_EXT_NORM);
-}
-
-STATIC void
 xfs_bmbt_init_rec_from_cur(
 	struct xfs_btree_cur	*cur,
 	union xfs_btree_rec	*rec)
@@ -757,7 +746,6 @@ static const struct xfs_btree_ops xfs_bmbt_ops = {
 	.get_minrecs		= xfs_bmbt_get_minrecs,
 	.get_dmaxrecs		= xfs_bmbt_get_dmaxrecs,
 	.init_key_from_rec	= xfs_bmbt_init_key_from_rec,
-	.init_rec_from_key	= xfs_bmbt_init_rec_from_key,
 	.init_rec_from_cur	= xfs_bmbt_init_rec_from_cur,
 	.init_ptr_from_cur	= xfs_bmbt_init_ptr_from_cur,
 	.key_diff		= xfs_bmbt_key_diff,
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 623841f..9a92c65 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -2858,10 +2858,9 @@ xfs_btree_make_block_unfull(
 	int			*index,	/* new tree index */
 	union xfs_btree_ptr	*nptr,	/* new btree ptr */
 	struct xfs_btree_cur	**ncur,	/* new btree cursor */
-	union xfs_btree_rec	*nrec,	/* new record */
+	union xfs_btree_key	*key, /* key of new block */
 	int			*stat)
 {
-	union xfs_btree_key	key;	/* new btree key value */
 	int			error = 0;
 
 	if ((cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) &&
@@ -2906,13 +2905,12 @@ xfs_btree_make_block_unfull(
 	 * If this works we have to re-set our variables because we
 	 * could be in a different block now.
 	 */
-	error = xfs_btree_split(cur, level, nptr, &key, ncur, stat);
+	error = xfs_btree_split(cur, level, nptr, key, ncur, stat);
 	if (error || *stat == 0)
 		return error;
 
 
 	*index = cur->bc_ptrs[level];
-	cur->bc_ops->init_rec_from_key(&key, nrec);
 	return 0;
 }
 
@@ -2925,16 +2923,16 @@ xfs_btree_insrec(
 	struct xfs_btree_cur	*cur,	/* btree cursor */
 	int			level,	/* level to insert record at */
 	union xfs_btree_ptr	*ptrp,	/* i/o: block number inserted */
-	union xfs_btree_rec	*recp,	/* i/o: record data inserted */
+	union xfs_btree_key	*key,	/* i/o: block key for ptrp */
 	struct xfs_btree_cur	**curp,	/* output: new cursor replacing cur */
 	int			*stat)	/* success/failure */
 {
 	struct xfs_btree_block	*block;	/* btree block */
 	struct xfs_buf		*bp;	/* buffer for block */
-	union xfs_btree_key	key;	/* btree key */
 	union xfs_btree_ptr	nptr;	/* new block ptr */
 	struct xfs_btree_cur	*ncur;	/* new btree cursor */
-	union xfs_btree_rec	nrec;	/* new record count */
+	union xfs_btree_key	nkey;	/* new block key */
+	union xfs_btree_rec	rec;	/* record to insert */
 	int			optr;	/* old key/record index */
 	int			ptr;	/* key/record index */
 	int			numrecs;/* number of records */
@@ -2943,8 +2941,14 @@ xfs_btree_insrec(
 	int			i;
 #endif
 
+	/* Make a key out of the record data to be inserted, and save it. */
+	if (level == 0) {
+		cur->bc_ops->init_rec_from_cur(cur, &rec);
+		cur->bc_ops->init_key_from_rec(key, &rec);
+	}
+
 	XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY);
-	XFS_BTREE_TRACE_ARGIPR(cur, level, *ptrp, recp);
+	XFS_BTREE_TRACE_ARGIPR(cur, level, *ptrp, &rec);
 
 	ncur = NULL;
 
@@ -2969,9 +2973,6 @@ xfs_btree_insrec(
 		return 0;
 	}
 
-	/* Make a key out of the record data to be inserted, and save it. */
-	cur->bc_ops->init_key_from_rec(&key, recp);
-
 	optr = ptr;
 
 	XFS_BTREE_STATS_INC(cur, insrec);
@@ -2988,10 +2989,10 @@ xfs_btree_insrec(
 	/* Check that the new entry is being inserted in the right place. */
 	if (ptr <= numrecs) {
 		if (level == 0) {
-			ASSERT(cur->bc_ops->recs_inorder(cur, recp,
+			ASSERT(cur->bc_ops->recs_inorder(cur, &rec,
 				xfs_btree_rec_addr(cur, ptr, block)));
 		} else {
-			ASSERT(cur->bc_ops->keys_inorder(cur, &key,
+			ASSERT(cur->bc_ops->keys_inorder(cur, key,
 				xfs_btree_key_addr(cur, ptr, block)));
 		}
 	}
@@ -3004,7 +3005,7 @@ xfs_btree_insrec(
 	xfs_btree_set_ptr_null(cur, &nptr);
 	if (numrecs == cur->bc_ops->get_maxrecs(cur, level)) {
 		error = xfs_btree_make_block_unfull(cur, level, numrecs,
-					&optr, &ptr, &nptr, &ncur, &nrec, stat);
+					&optr, &ptr, &nptr, &ncur, &nkey, stat);
 		if (error || *stat == 0)
 			goto error0;
 	}
@@ -3054,7 +3055,7 @@ xfs_btree_insrec(
 #endif
 
 		/* Now put the new data in, bump numrecs and log it. */
-		xfs_btree_copy_keys(cur, kp, &key, 1);
+		xfs_btree_copy_keys(cur, kp, key, 1);
 		xfs_btree_copy_ptrs(cur, pp, ptrp, 1);
 		numrecs++;
 		xfs_btree_set_numrecs(block, numrecs);
@@ -3075,7 +3076,7 @@ xfs_btree_insrec(
 		xfs_btree_shift_recs(cur, rp, 1, numrecs - ptr + 1);
 
 		/* Now put the new data in, bump numrecs and log it. */
-		xfs_btree_copy_recs(cur, rp, recp, 1);
+		xfs_btree_copy_recs(cur, rp, &rec, 1);
 		xfs_btree_set_numrecs(block, ++numrecs);
 		xfs_btree_log_recs(cur, bp, ptr, numrecs);
 #ifdef DEBUG
@@ -3091,7 +3092,7 @@ xfs_btree_insrec(
 
 	/* If we inserted at the start of a block, update the parents' keys. */
 	if (optr == 1) {
-		error = xfs_btree_updkey(cur, &key, level + 1);
+		error = xfs_btree_updkey(cur, key, level + 1);
 		if (error)
 			goto error0;
 	}
@@ -3101,7 +3102,7 @@ xfs_btree_insrec(
 	 * we are at the far right edge of the tree, update it.
 	 */
 	if (xfs_btree_is_lastrec(cur, block, level)) {
-		cur->bc_ops->update_lastrec(cur, block, recp,
+		cur->bc_ops->update_lastrec(cur, block, &rec,
 					    ptr, LASTREC_INSREC);
 	}
 
@@ -3111,7 +3112,7 @@ xfs_btree_insrec(
 	 */
 	*ptrp = nptr;
 	if (!xfs_btree_ptr_is_null(cur, &nptr)) {
-		*recp = nrec;
+		*key = nkey;
 		*curp = ncur;
 	}
 
@@ -3142,14 +3143,13 @@ xfs_btree_insert(
 	union xfs_btree_ptr	nptr;	/* new block number (split result) */
 	struct xfs_btree_cur	*ncur;	/* new cursor (split result) */
 	struct xfs_btree_cur	*pcur;	/* previous level's cursor */
-	union xfs_btree_rec	rec;	/* record to insert */
+	union xfs_btree_key	key;	/* key of block to insert */
 
 	level = 0;
 	ncur = NULL;
 	pcur = cur;
 
 	xfs_btree_set_ptr_null(cur, &nptr);
-	cur->bc_ops->init_rec_from_cur(cur, &rec);
 
 	/*
 	 * Loop going up the tree, starting at the leaf level.
@@ -3161,7 +3161,7 @@ xfs_btree_insert(
 		 * Insert nrec/nptr into this level of the tree.
 		 * Note if we fail, nptr will be null.
 		 */
-		error = xfs_btree_insrec(pcur, level, &nptr, &rec, &ncur, &i);
+		error = xfs_btree_insrec(pcur, level, &nptr, &key, &ncur, &i);
 		if (error) {
 			if (pcur != cur)
 				xfs_btree_del_cursor(pcur, XFS_BTREE_ERROR);
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index b955e5d..b99c018 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -158,8 +158,6 @@ struct xfs_btree_ops {
 	/* init values of btree structures */
 	void	(*init_key_from_rec)(union xfs_btree_key *key,
 				     union xfs_btree_rec *rec);
-	void	(*init_rec_from_key)(union xfs_btree_key *key,
-				     union xfs_btree_rec *rec);
 	void	(*init_rec_from_cur)(struct xfs_btree_cur *cur,
 				     union xfs_btree_rec *rec);
 	void	(*init_ptr_from_cur)(struct xfs_btree_cur *cur,
diff --git a/libxfs/xfs_ialloc_btree.c b/libxfs/xfs_ialloc_btree.c
index 0a7d985..cd61419 100644
--- a/libxfs/xfs_ialloc_btree.c
+++ b/libxfs/xfs_ialloc_btree.c
@@ -145,14 +145,6 @@ xfs_inobt_init_key_from_rec(
 }
 
 STATIC void
-xfs_inobt_init_rec_from_key(
-	union xfs_btree_key	*key,
-	union xfs_btree_rec	*rec)
-{
-	rec->inobt.ir_startino = key->inobt.ir_startino;
-}
-
-STATIC void
 xfs_inobt_init_rec_from_cur(
 	struct xfs_btree_cur	*cur,
 	union xfs_btree_rec	*rec)
@@ -313,7 +305,6 @@ static const struct xfs_btree_ops xfs_inobt_ops = {
 	.get_minrecs		= xfs_inobt_get_minrecs,
 	.get_maxrecs		= xfs_inobt_get_maxrecs,
 	.init_key_from_rec	= xfs_inobt_init_key_from_rec,
-	.init_rec_from_key	= xfs_inobt_init_rec_from_key,
 	.init_rec_from_cur	= xfs_inobt_init_rec_from_cur,
 	.init_ptr_from_cur	= xfs_inobt_init_ptr_from_cur,
 	.key_diff		= xfs_inobt_key_diff,
@@ -335,7 +326,6 @@ static const struct xfs_btree_ops xfs_finobt_ops = {
 	.get_minrecs		= xfs_inobt_get_minrecs,
 	.get_maxrecs		= xfs_inobt_get_maxrecs,
 	.init_key_from_rec	= xfs_inobt_init_key_from_rec,
-	.init_rec_from_key	= xfs_inobt_init_rec_from_key,
 	.init_rec_from_cur	= xfs_inobt_init_rec_from_cur,
 	.init_ptr_from_cur	= xfs_finobt_init_ptr_from_cur,
 	.key_diff		= xfs_inobt_key_diff,


* [PATCH 017/145] xfs: support btrees with overlapping intervals for keys
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (15 preceding siblings ...)
  2016-06-17  1:32 ` [PATCH 016/145] xfs: during btree split, save new block key & ptr for future insertion Darrick J. Wong
@ 2016-06-17  1:32 ` Darrick J. Wong
  2016-06-17  1:32 ` [PATCH 018/145] xfs: introduce interval queries on btrees Darrick J. Wong
                   ` (127 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:32 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

On a filesystem with both reflink and reverse mapping enabled, it's
possible to have multiple rmap records referring to the same blocks on
disk.  When overlapping intervals are possible, querying a classic
btree to find all records intersecting a given interval is inefficient
because we cannot use the left side of the search interval to filter
out non-matching records the same way that we can use the existing
btree key to filter out records coming after the right side of the
search interval.  This will become important once we want to use the
rmap btree to rebuild BMBTs, or implement the (future) fsmap ioctl.

(For the non-overlapping case, we can perform such queries trivially
by starting at the left side of the interval and walking the tree
until we pass the right side.)

Therefore, extend the btree code to come closer to supporting
intervals as a first-class record attribute.  This involves widening
the btree node's key space to store both the lowest key reachable via
the node pointer (as the btree does now) and the highest key reachable
via the same pointer and teaching the btree modifying functions to
keep the highest-key records up to date.

This behavior can be turned on via a new btree ops flag so that btrees
that cannot store overlapping intervals don't pay the overhead costs
in terms of extra code and disk format changes.

v2: When we're deleting a record in a btree that supports overlapped
interval records and the deletion results in two btree blocks being
joined, we defer updating the high/low keys until after all possible
joins (at higher levels in the tree) have finished.  At this point,
the btree pointers at all levels have been updated to remove the empty
blocks and we can update the low and high keys.

When we're doing this, we must be careful to update the keys of all
node pointers up to the root instead of stopping at the first set of
keys that don't need updating.  This is because it's possible for a
single deletion to cause joining of multiple levels of tree, and so
we need to update everything going back to the root.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_trace.h |    2 
 libxfs/xfs_btree.c  |  379 ++++++++++++++++++++++++++++++++++++++++++++++-----
 libxfs/xfs_btree.h  |   16 ++
 3 files changed, 361 insertions(+), 36 deletions(-)


diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index 423772f..aea41bc 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -166,6 +166,8 @@
 #define trace_xfs_extlist(a,b,c,d)	((void) 0)
 #define trace_xfs_bunmap(a,b,c,d,e)	((void) 0)
 
+#define trace_xfs_btree_updkeys(...)		((void) 0)
+
 /* set c = c to avoid unused var warnings */
 #define trace_xfs_perag_get(a,b,c,d)	((c) = (c))
 #define trace_xfs_perag_get_tag(a,b,c,d) ((c) = (c))
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 9a92c65..7fa0226 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -48,6 +48,11 @@ static const __uint32_t xfs_magics[2][XFS_BTNUM_MAX] = {
 	xfs_magics[!!((cur)->bc_flags & XFS_BTREE_CRC_BLOCKS)][cur->bc_btnum]
 
 
+struct xfs_btree_double_key {
+	union xfs_btree_key	low;
+	union xfs_btree_key	high;
+};
+
 STATIC int				/* error (0 or EFSCORRUPTED) */
 xfs_btree_check_lblock(
 	struct xfs_btree_cur	*cur,	/* btree cursor */
@@ -424,6 +429,30 @@ xfs_btree_dup_cursor(
  * into a btree block (xfs_btree_*_offset) or return a pointer to the given
  * record, key or pointer (xfs_btree_*_addr).  Note that all addressing
  * inside the btree block is done using indices starting at one, not zero!
+ *
+ * If XFS_BTREE_OPS_OVERLAPPING is set, this btree supports keys containing
+ * overlapping intervals.  In such a tree, records are still sorted lowest to
+ * highest and indexed by the smallest key value that refers to the record.
+ * However, nodes are different: each pointer has two associated keys -- one
+ * indexing the lowest key available in the block(s) below (the same behavior
+ * as the key in a regular btree) and another indexing the highest key
+ * available in the block(s) below.  Because records are /not/ sorted by the
+ * highest key, all leaf block updates require us to compute the highest key
+ * that matches any record in the leaf and to recursively update the high keys
+ * in the nodes going further up in the tree, if necessary.  Nodes look like
+ * this:
+ *
+ *		+--------+-----+-----+-----+-----+-----+-------+-------+-----+
+ * Non-Leaf:	| header | lo1 | hi1 | lo2 | hi2 | ... | ptr 1 | ptr 2 | ... |
+ *		+--------+-----+-----+-----+-----+-----+-------+-------+-----+
+ *
+ * To perform an interval query on an overlapped tree, perform the usual
+ * depth-first search and use the low and high keys to decide if we can skip
+ * that particular node.  If a leaf node is reached, return the records that
+ * intersect the interval.  Note that an interval query may return numerous
+ * entries.  For a non-overlapped tree, simply search for the record associated
+ * with the lowest key and iterate forward until a non-matching record is
+ * found.
  */
 
 /*
@@ -441,6 +470,17 @@ static inline size_t xfs_btree_block_len(struct xfs_btree_cur *cur)
 	return XFS_BTREE_SBLOCK_LEN;
 }
 
+/* Return size of btree block keys for this btree instance. */
+static inline size_t xfs_btree_key_len(struct xfs_btree_cur *cur)
+{
+	size_t			len;
+
+	len = cur->bc_ops->key_len;
+	if (cur->bc_ops->flags & XFS_BTREE_OPS_OVERLAPPING)
+		len *= 2;
+	return len;
+}
+
 /*
  * Return size of btree block pointers for this btree instance.
  */
@@ -471,7 +511,19 @@ xfs_btree_key_offset(
 	int			n)
 {
 	return xfs_btree_block_len(cur) +
-		(n - 1) * cur->bc_ops->key_len;
+		(n - 1) * xfs_btree_key_len(cur);
+}
+
+/*
+ * Calculate offset of the n-th high key in a btree block.
+ */
+STATIC size_t
+xfs_btree_high_key_offset(
+	struct xfs_btree_cur	*cur,
+	int			n)
+{
+	return xfs_btree_block_len(cur) +
+		(n - 1) * xfs_btree_key_len(cur) + cur->bc_ops->key_len;
 }
 
 /*
@@ -484,7 +536,7 @@ xfs_btree_ptr_offset(
 	int			level)
 {
 	return xfs_btree_block_len(cur) +
-		cur->bc_ops->get_maxrecs(cur, level) * cur->bc_ops->key_len +
+		cur->bc_ops->get_maxrecs(cur, level) * xfs_btree_key_len(cur) +
 		(n - 1) * xfs_btree_ptr_len(cur);
 }
 
@@ -515,6 +567,19 @@ xfs_btree_key_addr(
 }
 
 /*
+ * Return a pointer to the n-th high key in the btree block.
+ */
+STATIC union xfs_btree_key *
+xfs_btree_high_key_addr(
+	struct xfs_btree_cur	*cur,
+	int			n,
+	struct xfs_btree_block	*block)
+{
+	return (union xfs_btree_key *)
+		((char *)block + xfs_btree_high_key_offset(cur, n));
+}
+
+/*
  * Return a pointer to the n-th block pointer in the btree block.
  */
 STATIC union xfs_btree_ptr *
@@ -1213,7 +1278,7 @@ xfs_btree_copy_keys(
 	int			numkeys)
 {
 	ASSERT(numkeys >= 0);
-	memcpy(dst_key, src_key, numkeys * cur->bc_ops->key_len);
+	memcpy(dst_key, src_key, numkeys * xfs_btree_key_len(cur));
 }
 
 /*
@@ -1259,8 +1324,8 @@ xfs_btree_shift_keys(
 	ASSERT(numkeys >= 0);
 	ASSERT(dir == 1 || dir == -1);
 
-	dst_key = (char *)key + (dir * cur->bc_ops->key_len);
-	memmove(dst_key, key, numkeys * cur->bc_ops->key_len);
+	dst_key = (char *)key + (dir * xfs_btree_key_len(cur));
+	memmove(dst_key, key, numkeys * xfs_btree_key_len(cur));
 }
 
 /*
@@ -1875,6 +1940,180 @@ error0:
 	return error;
 }
 
+/* Determine the low and high keys of a leaf block */
+STATIC void
+xfs_btree_find_leaf_keys(
+	struct xfs_btree_cur	*cur,
+	struct xfs_btree_block	*block,
+	union xfs_btree_key	*low,
+	union xfs_btree_key	*high)
+{
+	int			n;
+	union xfs_btree_rec	*rec;
+	union xfs_btree_key	max_hkey;
+	union xfs_btree_key	hkey;
+
+	rec = xfs_btree_rec_addr(cur, 1, block);
+	cur->bc_ops->init_key_from_rec(low, rec);
+
+	if (!(cur->bc_ops->flags & XFS_BTREE_OPS_OVERLAPPING))
+		return;
+
+	cur->bc_ops->init_high_key_from_rec(&max_hkey, rec);
+	for (n = 2; n <= xfs_btree_get_numrecs(block); n++) {
+		rec = xfs_btree_rec_addr(cur, n, block);
+		cur->bc_ops->init_high_key_from_rec(&hkey, rec);
+		if (cur->bc_ops->diff_two_keys(cur, &max_hkey, &hkey) > 0)
+			max_hkey = hkey;
+	}
+
+	*high = max_hkey;
+}
+
+/* Determine the low and high keys of a node block */
+STATIC void
+xfs_btree_find_node_keys(
+	struct xfs_btree_cur	*cur,
+	struct xfs_btree_block	*block,
+	union xfs_btree_key	*low,
+	union xfs_btree_key	*high)
+{
+	int			n;
+	union xfs_btree_key	*hkey;
+	union xfs_btree_key	*max_hkey;
+
+	*low = *xfs_btree_key_addr(cur, 1, block);
+
+	if (!(cur->bc_ops->flags & XFS_BTREE_OPS_OVERLAPPING))
+		return;
+
+	max_hkey = xfs_btree_high_key_addr(cur, 1, block);
+	for (n = 2; n <= xfs_btree_get_numrecs(block); n++) {
+		hkey = xfs_btree_high_key_addr(cur, n, block);
+		if (cur->bc_ops->diff_two_keys(cur, max_hkey, hkey) > 0)
+			max_hkey = hkey;
+	}
+
+	*high = *max_hkey;
+}
+
+/*
+ * Update parental low & high keys from some block all the way back to the
+ * root of the btree.
+ */
+STATIC int
+__xfs_btree_updkeys(
+	struct xfs_btree_cur	*cur,
+	int			level,
+	struct xfs_btree_block	*block,
+	struct xfs_buf		*bp0,
+	bool			force_all)
+{
+	union xfs_btree_key	lkey;	/* keys from current level */
+	union xfs_btree_key	hkey;
+	union xfs_btree_key	*nlkey;	/* keys from the next level up */
+	union xfs_btree_key	*nhkey;
+	struct xfs_buf		*bp;
+	int			ptr = -1;
+
+	if (!(cur->bc_ops->flags & XFS_BTREE_OPS_OVERLAPPING))
+		return 0;
+
+	if (level + 1 >= cur->bc_nlevels)
+		return 0;
+
+	trace_xfs_btree_updkeys(cur, level, bp0);
+
+	if (level == 0)
+		xfs_btree_find_leaf_keys(cur, block, &lkey, &hkey);
+	else
+		xfs_btree_find_node_keys(cur, block, &lkey, &hkey);
+	for (level++; level < cur->bc_nlevels; level++) {
+		block = xfs_btree_get_block(cur, level, &bp);
+		trace_xfs_btree_updkeys(cur, level, bp);
+		ptr = cur->bc_ptrs[level];
+		nlkey = xfs_btree_key_addr(cur, ptr, block);
+		nhkey = xfs_btree_high_key_addr(cur, ptr, block);
+		if (!(cur->bc_ops->diff_two_keys(cur, nlkey, &lkey) != 0 ||
+		      cur->bc_ops->diff_two_keys(cur, nhkey, &hkey) != 0) &&
+		    !force_all)
+			break;
+		memcpy(nlkey, &lkey, cur->bc_ops->key_len);
+		memcpy(nhkey, &hkey, cur->bc_ops->key_len);
+		xfs_btree_log_keys(cur, bp, ptr, ptr);
+		if (level + 1 >= cur->bc_nlevels)
+			break;
+		xfs_btree_find_node_keys(cur, block, &lkey, &hkey);
+	}
+
+	return 0;
+}
+
+/*
+ * Update all the keys from a sibling block at some level in the cursor back
+ * to the root, stopping when we find a key pair that doesn't need updating.
+ */
+STATIC int
+xfs_btree_sibling_updkeys(
+	struct xfs_btree_cur	*cur,
+	int			level,
+	int			ptr,
+	struct xfs_btree_block	*block,
+	struct xfs_buf		*bp0)
+{
+	struct xfs_btree_cur	*ncur;
+	int			stat;
+	int			error;
+
+	error = xfs_btree_dup_cursor(cur, &ncur);
+	if (error)
+		return error;
+
+	if (level + 1 >= ncur->bc_nlevels)
+		error = -EDOM;
+	else if (ptr == XFS_BB_RIGHTSIB)
+		error = xfs_btree_increment(ncur, level + 1, &stat);
+	else if (ptr == XFS_BB_LEFTSIB)
+		error = xfs_btree_decrement(ncur, level + 1, &stat);
+	else
+		error = -EBADE;
+	if (!error && stat)
+		error = __xfs_btree_updkeys(ncur, level, block, bp0, false);
+
+	/* Always release the duplicated cursor, even on error. */
+	xfs_btree_del_cursor(ncur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
+	return error;
+}
+
+/*
+ * Update all the keys from some level in cursor back to the root, stopping
+ * when we find a key pair that doesn't need updating.
+ */
+STATIC int
+xfs_btree_updkeys(
+	struct xfs_btree_cur	*cur,
+	int			level)
+{
+	struct xfs_buf		*bp;
+	struct xfs_btree_block	*block;
+
+	block = xfs_btree_get_block(cur, level, &bp);
+	return __xfs_btree_updkeys(cur, level, block, bp, false);
+}
+
+/* Update all the keys from some level in cursor back to the root. */
+STATIC int
+xfs_btree_updkeys_force(
+	struct xfs_btree_cur	*cur,
+	int			level)
+{
+	struct xfs_buf		*bp;
+	struct xfs_btree_block	*block;
+
+	block = xfs_btree_get_block(cur, level, &bp);
+	return __xfs_btree_updkeys(cur, level, block, bp, true);
+}
+
 /*
  * Update keys at all levels from here to the root along the cursor's path.
  */
@@ -1889,6 +2128,9 @@ xfs_btree_updkey(
 	union xfs_btree_key	*kp;
 	int			ptr;
 
+	if (cur->bc_ops->flags & XFS_BTREE_OPS_OVERLAPPING)
+		return 0;
+
 	XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY);
 	XFS_BTREE_TRACE_ARGIK(cur, level, keyp);
 
@@ -1966,7 +2208,8 @@ xfs_btree_update(
 					    ptr, LASTREC_UPDATE);
 	}
 
-	/* Updating first rec in leaf. Pass new key value up to our parent. */
+	/* Pass new key value up to our parent. */
+	xfs_btree_updkeys(cur, 0);
 	if (ptr == 1) {
 		union xfs_btree_key	key;
 
@@ -2145,7 +2388,9 @@ xfs_btree_lshift(
 		rkp = &key;
 	}
 
-	/* Update the parent key values of right. */
+	/* Update the parent key values of left and right. */
+	xfs_btree_sibling_updkeys(cur, level, XFS_BB_LEFTSIB, left, lbp);
+	xfs_btree_updkeys(cur, level);
 	error = xfs_btree_updkey(cur, rkp, level + 1);
 	if (error)
 		goto error0;
@@ -2317,6 +2562,9 @@ xfs_btree_rshift(
 	if (error)
 		goto error1;
 
+	/* Update left and right parent pointers */
+	xfs_btree_updkeys(cur, level);
+	xfs_btree_updkeys(tcur, level);
 	error = xfs_btree_updkey(tcur, rkp, level + 1);
 	if (error)
 		goto error1;
@@ -2352,7 +2600,7 @@ __xfs_btree_split(
 	struct xfs_btree_cur	*cur,
 	int			level,
 	union xfs_btree_ptr	*ptrp,
-	union xfs_btree_key	*key,
+	struct xfs_btree_double_key	*key,
 	struct xfs_btree_cur	**curp,
 	int			*stat)		/* success/failure */
 {
@@ -2448,9 +2696,6 @@ __xfs_btree_split(
 
 		xfs_btree_log_keys(cur, rbp, 1, rrecs);
 		xfs_btree_log_ptrs(cur, rbp, 1, rrecs);
-
-		/* Grab the keys to the entries moved to the right block */
-		xfs_btree_copy_keys(cur, key, rkp, 1);
 	} else {
 		/* It's a leaf.  Move records.  */
 		union xfs_btree_rec	*lrp;	/* left record pointer */
@@ -2461,12 +2706,8 @@ __xfs_btree_split(
 
 		xfs_btree_copy_recs(cur, rrp, lrp, rrecs);
 		xfs_btree_log_recs(cur, rbp, 1, rrecs);
-
-		cur->bc_ops->init_key_from_rec(key,
-			xfs_btree_rec_addr(cur, 1, right));
 	}
 
-
 	/*
 	 * Find the left block number by looking in the buffer.
 	 * Adjust numrecs, sibling pointers.
@@ -2480,6 +2721,12 @@ __xfs_btree_split(
 	xfs_btree_set_numrecs(left, lrecs);
 	xfs_btree_set_numrecs(right, xfs_btree_get_numrecs(right) + rrecs);
 
+	/* Find the low & high keys for the new block. */
+	if (level > 0)
+		xfs_btree_find_node_keys(cur, right, &key->low, &key->high);
+	else
+		xfs_btree_find_leaf_keys(cur, right, &key->low, &key->high);
+
 	xfs_btree_log_block(cur, rbp, XFS_BB_ALL_BITS);
 	xfs_btree_log_block(cur, lbp, XFS_BB_NUMRECS | XFS_BB_RIGHTSIB);
 
@@ -2495,6 +2742,10 @@ __xfs_btree_split(
 		xfs_btree_set_sibling(cur, rrblock, &rptr, XFS_BB_LEFTSIB);
 		xfs_btree_log_block(cur, rrbp, XFS_BB_LEFTSIB);
 	}
+
+	/* Update the left block's keys... */
+	xfs_btree_updkeys(cur, level);
+
 	/*
 	 * If the cursor is really in the right block, move it there.
 	 * If it's just pointing past the last entry in left, then we'll
@@ -2533,7 +2784,7 @@ struct xfs_btree_split_args {
 	struct xfs_btree_cur	*cur;
 	int			level;
 	union xfs_btree_ptr	*ptrp;
-	union xfs_btree_key	*key;
+	struct xfs_btree_double_key	*key;
 	struct xfs_btree_cur	**curp;
 	int			*stat;		/* success/failure */
 	int			result;
@@ -2582,7 +2833,7 @@ xfs_btree_split(
 	struct xfs_btree_cur	*cur,
 	int			level,
 	union xfs_btree_ptr	*ptrp,
-	union xfs_btree_key	*key,
+	struct xfs_btree_double_key	*key,
 	struct xfs_btree_cur	**curp,
 	int			*stat)		/* success/failure */
 {
@@ -2802,27 +3053,27 @@ xfs_btree_new_root(
 		bp = lbp;
 		nptr = 2;
 	}
+
 	/* Fill in the new block's btree header and log it. */
 	xfs_btree_init_block_cur(cur, nbp, cur->bc_nlevels, 2);
 	xfs_btree_log_block(cur, nbp, XFS_BB_ALL_BITS);
 	ASSERT(!xfs_btree_ptr_is_null(cur, &lptr) &&
 			!xfs_btree_ptr_is_null(cur, &rptr));
-
 	/* Fill in the key data in the new root. */
 	if (xfs_btree_get_level(left) > 0) {
-		xfs_btree_copy_keys(cur,
+		xfs_btree_find_node_keys(cur, left,
 				xfs_btree_key_addr(cur, 1, new),
-				xfs_btree_key_addr(cur, 1, left), 1);
-		xfs_btree_copy_keys(cur,
+				xfs_btree_high_key_addr(cur, 1, new));
+		xfs_btree_find_node_keys(cur, right,
 				xfs_btree_key_addr(cur, 2, new),
-				xfs_btree_key_addr(cur, 1, right), 1);
+				xfs_btree_high_key_addr(cur, 2, new));
 	} else {
-		cur->bc_ops->init_key_from_rec(
-				xfs_btree_key_addr(cur, 1, new),
-				xfs_btree_rec_addr(cur, 1, left));
-		cur->bc_ops->init_key_from_rec(
-				xfs_btree_key_addr(cur, 2, new),
-				xfs_btree_rec_addr(cur, 1, right));
+		xfs_btree_find_leaf_keys(cur, left,
+			xfs_btree_key_addr(cur, 1, new),
+			xfs_btree_high_key_addr(cur, 1, new));
+		xfs_btree_find_leaf_keys(cur, right,
+			xfs_btree_key_addr(cur, 2, new),
+			xfs_btree_high_key_addr(cur, 2, new));
 	}
 	xfs_btree_log_keys(cur, nbp, 1, 2);
 
@@ -2833,6 +3084,7 @@ xfs_btree_new_root(
 		xfs_btree_ptr_addr(cur, 2, new), &rptr, 1);
 	xfs_btree_log_ptrs(cur, nbp, 1, 2);
 
+
 	/* Fix up the cursor. */
 	xfs_btree_setbuf(cur, cur->bc_nlevels, nbp);
 	cur->bc_ptrs[cur->bc_nlevels] = nptr;
@@ -2858,7 +3110,7 @@ xfs_btree_make_block_unfull(
 	int			*index,	/* new tree index */
 	union xfs_btree_ptr	*nptr,	/* new btree ptr */
 	struct xfs_btree_cur	**ncur,	/* new btree cursor */
-	union xfs_btree_key	*key, /* key of new block */
+	struct xfs_btree_double_key	*key,	/* key of new block */
 	int			*stat)
 {
 	int			error = 0;
@@ -2914,6 +3166,22 @@ xfs_btree_make_block_unfull(
 	return 0;
 }
 
+/* Copy a double key into a btree block. */
+static void
+xfs_btree_copy_double_keys(
+	struct xfs_btree_cur	*cur,
+	int			ptr,
+	struct xfs_btree_block	*block,
+	struct xfs_btree_double_key	*key)
+{
+	memcpy(xfs_btree_key_addr(cur, ptr, block), &key->low,
+			cur->bc_ops->key_len);
+
+	if (cur->bc_ops->flags & XFS_BTREE_OPS_OVERLAPPING)
+		memcpy(xfs_btree_high_key_addr(cur, ptr, block), &key->high,
+				cur->bc_ops->key_len);
+}
+
 /*
  * Insert one record/level.  Return information to the caller
  * allowing the next level up to proceed if necessary.
@@ -2923,7 +3191,7 @@ xfs_btree_insrec(
 	struct xfs_btree_cur	*cur,	/* btree cursor */
 	int			level,	/* level to insert record at */
 	union xfs_btree_ptr	*ptrp,	/* i/o: block number inserted */
-	union xfs_btree_key	*key,	/* i/o: block key for ptrp */
+	struct xfs_btree_double_key	*key, /* i/o: block key for ptrp */
 	struct xfs_btree_cur	**curp,	/* output: new cursor replacing cur */
 	int			*stat)	/* success/failure */
 {
@@ -2931,7 +3199,7 @@ xfs_btree_insrec(
 	struct xfs_buf		*bp;	/* buffer for block */
 	union xfs_btree_ptr	nptr;	/* new block ptr */
 	struct xfs_btree_cur	*ncur;	/* new btree cursor */
-	union xfs_btree_key	nkey;	/* new block key */
+	struct xfs_btree_double_key	nkey;	/* new block key */
 	union xfs_btree_rec	rec;	/* record to insert */
 	int			optr;	/* old key/record index */
 	int			ptr;	/* key/record index */
@@ -2940,11 +3208,12 @@ xfs_btree_insrec(
 #ifdef DEBUG
 	int			i;
 #endif
+	xfs_daddr_t		old_bn;
 
 	/* Make a key out of the record data to be inserted, and save it. */
 	if (level == 0) {
 		cur->bc_ops->init_rec_from_cur(cur, &rec);
-		cur->bc_ops->init_key_from_rec(key, &rec);
+		cur->bc_ops->init_key_from_rec(&key->low, &rec);
 	}
 
 	XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY);
@@ -2979,6 +3248,7 @@ xfs_btree_insrec(
 
 	/* Get pointers to the btree buffer and block. */
 	block = xfs_btree_get_block(cur, level, &bp);
+	old_bn = bp ? bp->b_bn : XFS_BUF_DADDR_NULL;
 	numrecs = xfs_btree_get_numrecs(block);
 
 #ifdef DEBUG
@@ -2992,7 +3262,7 @@ xfs_btree_insrec(
 			ASSERT(cur->bc_ops->recs_inorder(cur, &rec,
 				xfs_btree_rec_addr(cur, ptr, block)));
 		} else {
-			ASSERT(cur->bc_ops->keys_inorder(cur, key,
+			ASSERT(cur->bc_ops->keys_inorder(cur, &key->low,
 				xfs_btree_key_addr(cur, ptr, block)));
 		}
 	}
@@ -3055,7 +3325,7 @@ xfs_btree_insrec(
 #endif
 
 		/* Now put the new data in, bump numrecs and log it. */
-		xfs_btree_copy_keys(cur, kp, key, 1);
+		xfs_btree_copy_double_keys(cur, ptr, block, key);
 		xfs_btree_copy_ptrs(cur, pp, ptrp, 1);
 		numrecs++;
 		xfs_btree_set_numrecs(block, numrecs);
@@ -3091,8 +3361,24 @@ xfs_btree_insrec(
 	xfs_btree_log_block(cur, bp, XFS_BB_NUMRECS);
 
 	/* If we inserted at the start of a block, update the parents' keys. */
+	if (ncur && bp->b_bn != old_bn) {
+		/*
+		 * We just inserted into a new tree block, which means that
+		 * the key for the block is in nkey, not the tree.
+		 */
+		if (level == 0)
+			xfs_btree_find_leaf_keys(cur, block, &nkey.low,
+					&nkey.high);
+		else
+			xfs_btree_find_node_keys(cur, block, &nkey.low,
+					&nkey.high);
+	} else {
+		/* Updating the left block, do it the standard way. */
+		xfs_btree_updkeys(cur, level);
+	}
+
 	if (optr == 1) {
-		error = xfs_btree_updkey(cur, key, level + 1);
+		error = xfs_btree_updkey(cur, &key->low, level + 1);
 		if (error)
 			goto error0;
 	}
@@ -3143,7 +3429,7 @@ xfs_btree_insert(
 	union xfs_btree_ptr	nptr;	/* new block number (split result) */
 	struct xfs_btree_cur	*ncur;	/* new cursor (split result) */
 	struct xfs_btree_cur	*pcur;	/* previous level's cursor */
-	union xfs_btree_key	key;	/* key of block to insert */
+	struct xfs_btree_double_key	key;	/* key of block to insert */
 
 	level = 0;
 	ncur = NULL;
@@ -3548,6 +3834,7 @@ xfs_btree_delrec(
 	 * If we deleted the leftmost entry in the block, update the
 	 * key values above us in the tree.
 	 */
+	xfs_btree_updkeys(cur, level);
 	if (ptr == 1) {
 		error = xfs_btree_updkey(cur, keyp, level + 1);
 		if (error)
@@ -3878,6 +4165,16 @@ xfs_btree_delrec(
 	if (level > 0)
 		cur->bc_ptrs[level]--;
 
+	/*
+	 * We combined blocks, so we have to update the parent keys if the
+	 * btree supports overlapped intervals.  However, bc_ptrs[level + 1]
+	 * points to the old block so that the caller knows which record to
+	 * delete.  Therefore, the caller must be savvy enough to call updkeys
+	 * for us if we return stat == 2.  The other exit points from this
+	 * function don't require deletions further up the tree, so they can
+	 * call updkeys directly.
+	 */
+
 	XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
 	/* Return value means the next level up has something to do. */
 	*stat = 2;
@@ -3903,6 +4200,7 @@ xfs_btree_delete(
 	int			error;	/* error return value */
 	int			level;
 	int			i;
+	bool			joined = false;
 
 	XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY);
 
@@ -3916,8 +4214,17 @@ xfs_btree_delete(
 		error = xfs_btree_delrec(cur, level, &i);
 		if (error)
 			goto error0;
+		if (i == 2)
+			joined = true;
 	}
 
+	/*
+	 * If we combined blocks as part of deleting the record, delrec won't
+	 * have updated the parent keys so we have to do that here.
+	 */
+	if (joined)
+		xfs_btree_updkeys_force(cur, 0);
+
 	if (i == 0) {
 		for (level = 1; level < cur->bc_nlevels; level++) {
 			if (cur->bc_ptrs[level] == 0) {
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index b99c018..a5ec6c7 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -126,6 +126,9 @@ struct xfs_btree_ops {
 	size_t	key_len;
 	size_t	rec_len;
 
+	/* flags */
+	uint	flags;
+
 	/* cursor operations */
 	struct xfs_btree_cur *(*dup_cursor)(struct xfs_btree_cur *);
 	void	(*update_cursor)(struct xfs_btree_cur *src,
@@ -162,11 +165,21 @@ struct xfs_btree_ops {
 				     union xfs_btree_rec *rec);
 	void	(*init_ptr_from_cur)(struct xfs_btree_cur *cur,
 				     union xfs_btree_ptr *ptr);
+	void	(*init_high_key_from_rec)(union xfs_btree_key *key,
+					  union xfs_btree_rec *rec);
 
 	/* difference between key value and cursor value */
 	__int64_t (*key_diff)(struct xfs_btree_cur *cur,
 			      union xfs_btree_key *key);
 
+	/*
+	 * Difference between key2 and key1 -- positive if key2 > key1,
+	 * negative if key2 < key1, and zero if equal.
+	 */
+	__int64_t (*diff_two_keys)(struct xfs_btree_cur *cur,
+				   union xfs_btree_key *key1,
+				   union xfs_btree_key *key2);
+
 	const struct xfs_buf_ops	*buf_ops;
 
 #if defined(DEBUG) || defined(XFS_WARN)
@@ -182,6 +195,9 @@ struct xfs_btree_ops {
 #endif
 };
 
+/* btree ops flags */
+#define XFS_BTREE_OPS_OVERLAPPING	(1<<0)	/* overlapping intervals */
+
 /*
  * Reasons for the update_lastrec method to be called.
  */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 018/145] xfs: introduce interval queries on btrees
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (16 preceding siblings ...)
  2016-06-17  1:32 ` [PATCH 017/145] xfs: support btrees with overlapping intervals for keys Darrick J. Wong
@ 2016-06-17  1:32 ` Darrick J. Wong
  2016-06-17  1:32 ` [PATCH 019/145] xfs: refactor btree owner change into a separate visit-blocks function Darrick J. Wong
                   ` (126 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:32 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Create a function to enable querying of btree records mapping to a
range of keys.  This will be used in subsequent patches to allow
querying the reverse mapping btree to find the extents mapped to a
range of physical blocks, though the generic code can be used for
any range query.

v2: add some shortcuts so that we can jump out of processing once
we know there won't be any more records to find.
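
The pruning idea behind the overlapped query, including the shortcuts, can
be sketched on a toy two-level tree.  This is a simplified userspace model
and not the libxfs code: `struct leaf`, `struct node_ent` and
`overlapped_query` are hypothetical, keys are plain integers, and the
`leaves_read` counter exists only to show which subtrees were skipped.

```c
#include <stddef.h>

/* Hypothetical record: a closed interval [start, end]. */
struct ival {
	unsigned long	start;
	unsigned long	end;
};

struct leaf {
	const struct ival	*recs;	/* sorted by start */
	size_t			nr;
};

/* One node entry: low and high keys plus the pointer they describe. */
struct node_ent {
	unsigned long		low;	/* smallest start in the leaf */
	unsigned long		high;	/* largest end in the leaf */
	const struct leaf	*child;
};

/* Count records intersecting [lo, hi]; entries are sorted by low key. */
size_t
overlapped_query(const struct node_ent *ents, size_t nents,
		 unsigned long lo, unsigned long hi, size_t *leaves_read)
{
	size_t	i;
	size_t	j;
	size_t	found = 0;

	for (i = 0; i < nents; i++) {
		/* Low key is past the query: no later entry can match. */
		if (ents[i].low > hi)
			break;
		/* Everything below ends before the query: skip the leaf. */
		if (ents[i].high < lo)
			continue;

		(*leaves_read)++;
		for (j = 0; j < ents[i].child->nr; j++) {
			const struct ival	*r = &ents[i].child->recs[j];

			if (r->start > hi)
				break;
			if (r->end >= lo)
				found++;
		}
	}
	return found;
}
```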

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_trace.h |    1 
 libxfs/xfs_btree.c  |  249 +++++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_btree.h  |   22 +++--
 3 files changed, 267 insertions(+), 5 deletions(-)


diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index aea41bc..56c4533 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -167,6 +167,7 @@
 #define trace_xfs_bunmap(a,b,c,d,e)	((void) 0)
 
 #define trace_xfs_btree_updkeys(...)		((void) 0)
+#define trace_xfs_btree_overlapped_query_range(...)	((void) 0)
 
 /* set c = c to avoid unused var warnings */
 #define trace_xfs_perag_get(a,b,c,d)	((c) = (c))
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 7fa0226..7a34aa1 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -4505,3 +4505,252 @@ xfs_btree_calc_size(
 	}
 	return rval;
 }
+
+/* Query a regular btree for all records overlapping a given interval. */
+STATIC int
+xfs_btree_simple_query_range(
+	struct xfs_btree_cur		*cur,
+	union xfs_btree_irec		*low_rec,
+	union xfs_btree_irec		*high_rec,
+	xfs_btree_query_range_fn	fn,
+	void				*priv)
+{
+	union xfs_btree_rec		*recp;
+	union xfs_btree_rec		rec;
+	union xfs_btree_key		low_key;
+	union xfs_btree_key		high_key;
+	union xfs_btree_key		rec_key;
+	__int64_t			diff;
+	int				stat;
+	bool				firstrec = true;
+	int				error;
+
+	ASSERT(cur->bc_ops->init_high_key_from_rec);
+
+	/* Find the keys of both ends of the interval. */
+	cur->bc_rec = *high_rec;
+	cur->bc_ops->init_rec_from_cur(cur, &rec);
+	cur->bc_ops->init_key_from_rec(&high_key, &rec);
+
+	cur->bc_rec = *low_rec;
+	cur->bc_ops->init_rec_from_cur(cur, &rec);
+	cur->bc_ops->init_key_from_rec(&low_key, &rec);
+
+	/* Find the leftmost record. */
+	stat = 0;
+	error = xfs_btree_lookup(cur, XFS_LOOKUP_LE, &stat);
+	if (error)
+		goto out;
+
+	while (stat) {
+		/* Find the record. */
+		error = xfs_btree_get_rec(cur, &recp, &stat);
+		if (error || !stat)
+			break;
+
+		/* Can we tell if this record is too low? */
+		if (firstrec) {
+			cur->bc_rec = *low_rec;
+			cur->bc_ops->init_high_key_from_rec(&rec_key, recp);
+			diff = cur->bc_ops->key_diff(cur, &rec_key);
+			if (diff < 0)
+				goto advloop;
+		}
+		firstrec = false;
+
+		/* Have we gone past the end? */
+		cur->bc_rec = *high_rec;
+		cur->bc_ops->init_key_from_rec(&rec_key, recp);
+		diff = cur->bc_ops->key_diff(cur, &rec_key);
+		if (diff > 0)
+			break;
+
+		/* Callback */
+		error = fn(cur, recp, priv);
+		if (error < 0 || error == XFS_BTREE_QUERY_RANGE_ABORT)
+			break;
+
+advloop:
+		/* Move on to the next record. */
+		error = xfs_btree_increment(cur, 0, &stat);
+		if (error)
+			break;
+	}
+
+out:
+	return error;
+}
+
+/*
+ * Query an overlapped interval btree for all records overlapping a given
+ * interval.
+ */
+STATIC int
+xfs_btree_overlapped_query_range(
+	struct xfs_btree_cur		*cur,
+	union xfs_btree_irec		*low_rec,
+	union xfs_btree_irec		*high_rec,
+	xfs_btree_query_range_fn	fn,
+	void				*priv)
+{
+	union xfs_btree_ptr		ptr;
+	union xfs_btree_ptr		*pp;
+	union xfs_btree_key		rec_key;
+	union xfs_btree_key		low_key;
+	union xfs_btree_key		high_key;
+	union xfs_btree_key		*lkp;
+	union xfs_btree_key		*hkp;
+	union xfs_btree_rec		rec;
+	union xfs_btree_rec		*recp;
+	struct xfs_btree_block		*block;
+	__int64_t			ldiff;
+	__int64_t			hdiff;
+	int				level;
+	struct xfs_buf			*bp;
+	int				i;
+	int				error;
+
+	/* Find the keys of both ends of the interval. */
+	cur->bc_rec = *high_rec;
+	cur->bc_ops->init_rec_from_cur(cur, &rec);
+	cur->bc_ops->init_key_from_rec(&high_key, &rec);
+
+	cur->bc_rec = *low_rec;
+	cur->bc_ops->init_rec_from_cur(cur, &rec);
+	cur->bc_ops->init_key_from_rec(&low_key, &rec);
+
+	/* Load the root of the btree. */
+	level = cur->bc_nlevels - 1;
+	cur->bc_ops->init_ptr_from_cur(cur, &ptr);
+	error = xfs_btree_lookup_get_block(cur, level, &ptr, &block);
+	if (error)
+		return error;
+	xfs_btree_get_block(cur, level, &bp);
+	trace_xfs_btree_overlapped_query_range(cur, level, bp);
+#ifdef DEBUG
+	error = xfs_btree_check_block(cur, block, level, bp);
+	if (error)
+		goto out;
+#endif
+	cur->bc_ptrs[level] = 1;
+
+	while (level < cur->bc_nlevels) {
+		block = XFS_BUF_TO_BLOCK(cur->bc_bufs[level]);
+
+		if (level == 0) {
+			/* End of leaf, pop back towards the root. */
+			if (cur->bc_ptrs[level] >
+			    be16_to_cpu(block->bb_numrecs)) {
+leaf_pop_up:
+				if (level < cur->bc_nlevels - 1)
+					cur->bc_ptrs[level + 1]++;
+				level++;
+				continue;
+			}
+
+			recp = xfs_btree_rec_addr(cur, cur->bc_ptrs[0], block);
+
+			cur->bc_ops->init_high_key_from_rec(&rec_key, recp);
+			ldiff = cur->bc_ops->diff_two_keys(cur, &low_key,
+					&rec_key);
+
+			cur->bc_ops->init_key_from_rec(&rec_key, recp);
+			hdiff = cur->bc_ops->diff_two_keys(cur, &rec_key,
+					&high_key);
+
+			/* If the record matches, callback */
+			if (ldiff >= 0 && hdiff >= 0) {
+				error = fn(cur, recp, priv);
+				if (error < 0 ||
+				    error == XFS_BTREE_QUERY_RANGE_ABORT)
+					break;
+			} else if (hdiff < 0) {
+				/* Record is larger than high key; pop. */
+				goto leaf_pop_up;
+			}
+			cur->bc_ptrs[level]++;
+			continue;
+		}
+
+		/* End of node, pop back towards the root. */
+		if (cur->bc_ptrs[level] > be16_to_cpu(block->bb_numrecs)) {
+node_pop_up:
+			if (level < cur->bc_nlevels - 1)
+				cur->bc_ptrs[level + 1]++;
+			level++;
+			continue;
+		}
+
+		lkp = xfs_btree_key_addr(cur, cur->bc_ptrs[level], block);
+		hkp = xfs_btree_high_key_addr(cur, cur->bc_ptrs[level], block);
+		pp = xfs_btree_ptr_addr(cur, cur->bc_ptrs[level], block);
+
+		ldiff = cur->bc_ops->diff_two_keys(cur, &low_key, hkp);
+		hdiff = cur->bc_ops->diff_two_keys(cur, lkp, &high_key);
+
+		/* If the key matches, drill another level deeper. */
+		if (ldiff >= 0 && hdiff >= 0) {
+			level--;
+			error = xfs_btree_lookup_get_block(cur, level, pp,
+					&block);
+			if (error)
+				goto out;
+			xfs_btree_get_block(cur, level, &bp);
+			trace_xfs_btree_overlapped_query_range(cur, level, bp);
+#ifdef DEBUG
+			error = xfs_btree_check_block(cur, block, level, bp);
+			if (error)
+				goto out;
+#endif
+			cur->bc_ptrs[level] = 1;
+			continue;
+		} else if (hdiff < 0) {
+			/* The low key is larger than the upper range; pop. */
+			goto node_pop_up;
+		}
+		cur->bc_ptrs[level]++;
+	}
+
+out:
+	/*
+	 * If we don't end this function with the cursor pointing at a record
+	 * block, a subsequent non-error cursor deletion will not release
+	 * node-level buffers, causing a buffer leak.  This is quite possible
+	 * with a zero-results range query, so release the buffers if we
+	 * failed to return any results.
+	 */
+	if (cur->bc_bufs[0] == NULL) {
+		for (i = 0; i < cur->bc_nlevels; i++) {
+			if (cur->bc_bufs[i]) {
+				xfs_trans_brelse(cur->bc_tp, cur->bc_bufs[i]);
+				cur->bc_bufs[i] = NULL;
+				cur->bc_ptrs[i] = 0;
+				cur->bc_ra[i] = 0;
+			}
+		}
+	}
+
+	return error;
+}
+
+/*
+ * Query a btree for all records overlapping a given interval of keys.  The
+ * supplied function will be called with each record found; return one of the
+ * XFS_BTREE_QUERY_RANGE_{CONTINUE,ABORT} values or the usual negative error
+ * code.  This function returns XFS_BTREE_QUERY_RANGE_ABORT, zero, or a
+ * negative error code.
+ */
+int
+xfs_btree_query_range(
+	struct xfs_btree_cur		*cur,
+	union xfs_btree_irec		*low_rec,
+	union xfs_btree_irec		*high_rec,
+	xfs_btree_query_range_fn	fn,
+	void				*priv)
+{
+	if (!(cur->bc_ops->flags & XFS_BTREE_OPS_OVERLAPPING))
+		return xfs_btree_simple_query_range(cur, low_rec,
+				high_rec, fn, priv);
+	return xfs_btree_overlapped_query_range(cur, low_rec, high_rec,
+			fn, priv);
+}
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index a5ec6c7..898fee5 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -206,6 +206,12 @@ struct xfs_btree_ops {
 #define LASTREC_DELREC	2
 
 
+union xfs_btree_irec {
+	xfs_alloc_rec_incore_t		a;
+	xfs_bmbt_irec_t			b;
+	xfs_inobt_rec_incore_t		i;
+};
+
 /*
  * Btree cursor structure.
  * This collects all information needed by the btree code in one place.
@@ -216,11 +222,7 @@ typedef struct xfs_btree_cur
 	struct xfs_mount	*bc_mp;	/* file system mount struct */
 	const struct xfs_btree_ops *bc_ops;
 	uint			bc_flags; /* btree features - below */
-	union {
-		xfs_alloc_rec_incore_t	a;
-		xfs_bmbt_irec_t		b;
-		xfs_inobt_rec_incore_t	i;
-	}		bc_rec;		/* current insert/search record value */
+	union xfs_btree_irec	bc_rec;	/* current insert/search record value */
 	struct xfs_buf	*bc_bufs[XFS_BTREE_MAXLEVELS];	/* buf ptr per level */
 	int		bc_ptrs[XFS_BTREE_MAXLEVELS];	/* key/record # */
 	__uint8_t	bc_ra[XFS_BTREE_MAXLEVELS];	/* readahead bits */
@@ -494,4 +496,14 @@ xfs_extlen_t xfs_btree_calc_size(struct xfs_mount *mp, uint *limits,
 uint xfs_btree_compute_maxlevels(struct xfs_mount *mp, uint *limits,
 		unsigned long len);
 
+/* return codes */
+#define XFS_BTREE_QUERY_RANGE_CONTINUE	0	/* keep iterating */
+#define XFS_BTREE_QUERY_RANGE_ABORT	1	/* stop iterating */
+typedef int (*xfs_btree_query_range_fn)(struct xfs_btree_cur *cur,
+		union xfs_btree_rec *rec, void *priv);
+
+int xfs_btree_query_range(struct xfs_btree_cur *cur,
+		union xfs_btree_irec *low_rec, union xfs_btree_irec *high_rec,
+		xfs_btree_query_range_fn fn, void *priv);
+
 #endif	/* __XFS_BTREE_H__ */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 019/145] xfs: refactor btree owner change into a separate visit-blocks function
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (17 preceding siblings ...)
  2016-06-17  1:32 ` [PATCH 018/145] xfs: introduce interval queries on btrees Darrick J. Wong
@ 2016-06-17  1:32 ` Darrick J. Wong
  2016-06-17  1:32 ` [PATCH 020/145] xfs: move deferred operations into a separate file Darrick J. Wong
                   ` (125 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:32 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Refactor the btree_change_owner function into a more generic apparatus
which visits all blocks in a btree.  We'll use this in a subsequent
patch for counting btree blocks for AG reservations.
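
For readers skimming the series, the shape of the refactoring can be
sketched outside of libxfs.  This is only an illustration, not code from
the patch: struct demo_block, demo_visit_fn and every other demo_* name
are hypothetical stand-ins, and the "tree" is a set of in-memory sibling
chains rather than buffers read through a cursor.  It keeps the two
properties the patch relies on: the per-block helper returns -ENOENT at
the end of a level, and the callback receives an opaque data pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for a btree block; not an XFS type. */
struct demo_block {
	struct demo_block	*right;		/* right sibling, NULL at end of level */
	struct demo_block	*first_child;	/* leftmost block one level down */
};

/* Callback shape modelled on xfs_btree_visit_blocks_fn. */
typedef int (*demo_visit_fn)(struct demo_block *block, int level, void *data);

/* Visit one block, then step right; -ENOENT means "end of this level". */
static int
demo_visit_block(
	struct demo_block	**blockp,
	int			level,
	demo_visit_fn		fn,
	void			*data)
{
	int			error;

	error = fn(*blockp, level, data);
	if (error)
		return error;
	if (!(*blockp)->right)
		return -ENOENT;
	*blockp = (*blockp)->right;
	return 0;
}

/* Walk every level from the root down, left to right within each level. */
static int
demo_visit_blocks(
	struct demo_block	*root,
	int			nlevels,
	demo_visit_fn		fn,
	void			*data)
{
	struct demo_block	*left = root;
	struct demo_block	*block;
	int			level;
	int			error;

	assert(fn != NULL);
	for (level = nlevels - 1; level >= 0; level--) {
		block = left;
		/* remember the leftmost block of the next level down */
		if (level > 0)
			left = block->first_child;
		do {
			error = demo_visit_block(&block, level, fn, data);
		} while (!error);
		if (error != -ENOENT)
			return error;
	}
	return 0;
}

/* Example callback: count every block that the walk hands us. */
static int
demo_count_block(struct demo_block *block, int level, void *data)
{
	(void)block;
	(void)level;
	(*(unsigned int *)data)++;
	return 0;
}

/* Build a two-level tree (one root, three leaves) and count its blocks. */
unsigned int
demo_count_two_level_tree(void)
{
	struct demo_block	leaves[3] = {
		{ &leaves[1], NULL },
		{ &leaves[2], NULL },
		{ NULL, NULL },
	};
	struct demo_block	root = { NULL, &leaves[0] };
	unsigned int		count = 0;

	if (demo_visit_blocks(&root, 2, demo_count_block, &count))
		return 0;
	return count;
}
```

The owner-change code in this patch is one such callback, and a block
counter like demo_count_block is presumably close to what the AG
reservation patch will want.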

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_btree.c |  141 ++++++++++++++++++++++++++++++++++------------------
 libxfs/xfs_btree.h |    5 ++
 2 files changed, 96 insertions(+), 50 deletions(-)


diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 7a34aa1..4f48878 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -4285,6 +4285,81 @@ xfs_btree_get_rec(
 	return 0;
 }
 
+/* Visit a block in a btree. */
+STATIC int
+xfs_btree_visit_block(
+	struct xfs_btree_cur		*cur,
+	int				level,
+	xfs_btree_visit_blocks_fn	fn,
+	void				*data)
+{
+	struct xfs_btree_block		*block;
+	struct xfs_buf			*bp;
+	union xfs_btree_ptr		rptr;
+	int				error;
+
+	/* do right sibling readahead */
+	xfs_btree_readahead(cur, level, XFS_BTCUR_RIGHTRA);
+	block = xfs_btree_get_block(cur, level, &bp);
+
+	/* process the block */
+	error = fn(cur, level, data);
+	if (error)
+		return error;
+
+	/* now read rh sibling block for next iteration */
+	xfs_btree_get_sibling(cur, block, &rptr, XFS_BB_RIGHTSIB);
+	if (xfs_btree_ptr_is_null(cur, &rptr))
+		return -ENOENT;
+
+	return xfs_btree_lookup_get_block(cur, level, &rptr, &block);
+}
+
+
+/* Visit every block in a btree. */
+int
+xfs_btree_visit_blocks(
+	struct xfs_btree_cur		*cur,
+	xfs_btree_visit_blocks_fn	fn,
+	void				*data)
+{
+	union xfs_btree_ptr		lptr;
+	int				level;
+	struct xfs_btree_block		*block = NULL;
+	int				error = 0;
+
+	cur->bc_ops->init_ptr_from_cur(cur, &lptr);
+
+	/* for each level */
+	for (level = cur->bc_nlevels - 1; level >= 0; level--) {
+		/* grab the left hand block */
+		error = xfs_btree_lookup_get_block(cur, level, &lptr, &block);
+		if (error)
+			return error;
+
+		/* readahead the left most block for the next level down */
+		if (level > 0) {
+			union xfs_btree_ptr     *ptr;
+
+			ptr = xfs_btree_ptr_addr(cur, 1, block);
+			xfs_btree_readahead_ptr(cur, ptr, 1);
+
+			/* save for the next iteration of the loop */
+			lptr = *ptr;
+		}
+
+		/* for each buffer in the level */
+		do {
+			error = xfs_btree_visit_block(cur, level, fn, data);
+		} while (!error);
+
+		if (error != -ENOENT)
+			return error;
+	}
+
+	return 0;
+}
+
 /*
  * Change the owner of a btree.
  *
@@ -4309,26 +4384,27 @@ xfs_btree_get_rec(
  * just queue the modified buffer as delayed write buffer so the transaction
  * recovery completion writes the changes to disk.
  */
+struct xfs_btree_block_change_owner_info {
+	__uint64_t		new_owner;
+	struct list_head	*buffer_list;
+};
+
 static int
 xfs_btree_block_change_owner(
 	struct xfs_btree_cur	*cur,
 	int			level,
-	__uint64_t		new_owner,
-	struct list_head	*buffer_list)
+	void			*data)
 {
+	struct xfs_btree_block_change_owner_info	*bbcoi = data;
 	struct xfs_btree_block	*block;
 	struct xfs_buf		*bp;
-	union xfs_btree_ptr     rptr;
-
-	/* do right sibling readahead */
-	xfs_btree_readahead(cur, level, XFS_BTCUR_RIGHTRA);
 
 	/* modify the owner */
 	block = xfs_btree_get_block(cur, level, &bp);
 	if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
-		block->bb_u.l.bb_owner = cpu_to_be64(new_owner);
+		block->bb_u.l.bb_owner = cpu_to_be64(bbcoi->new_owner);
 	else
-		block->bb_u.s.bb_owner = cpu_to_be32(new_owner);
+		block->bb_u.s.bb_owner = cpu_to_be32(bbcoi->new_owner);
 
 	/*
 	 * If the block is a root block hosted in an inode, we might not have a
@@ -4342,19 +4418,14 @@ xfs_btree_block_change_owner(
 			xfs_trans_ordered_buf(cur->bc_tp, bp);
 			xfs_btree_log_block(cur, bp, XFS_BB_OWNER);
 		} else {
-			xfs_buf_delwri_queue(bp, buffer_list);
+			xfs_buf_delwri_queue(bp, bbcoi->buffer_list);
 		}
 	} else {
 		ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE);
 		ASSERT(level == cur->bc_nlevels - 1);
 	}
 
-	/* now read rh sibling block for next iteration */
-	xfs_btree_get_sibling(cur, block, &rptr, XFS_BB_RIGHTSIB);
-	if (xfs_btree_ptr_is_null(cur, &rptr))
-		return -ENOENT;
-
-	return xfs_btree_lookup_get_block(cur, level, &rptr, &block);
+	return 0;
 }
 
 int
@@ -4363,43 +4434,13 @@ xfs_btree_change_owner(
 	__uint64_t		new_owner,
 	struct list_head	*buffer_list)
 {
-	union xfs_btree_ptr     lptr;
-	int			level;
-	struct xfs_btree_block	*block = NULL;
-	int			error = 0;
+	struct xfs_btree_block_change_owner_info	bbcoi;
 
-	cur->bc_ops->init_ptr_from_cur(cur, &lptr);
+	bbcoi.new_owner = new_owner;
+	bbcoi.buffer_list = buffer_list;
 
-	/* for each level */
-	for (level = cur->bc_nlevels - 1; level >= 0; level--) {
-		/* grab the left hand block */
-		error = xfs_btree_lookup_get_block(cur, level, &lptr, &block);
-		if (error)
-			return error;
-
-		/* readahead the left most block for the next level down */
-		if (level > 0) {
-			union xfs_btree_ptr     *ptr;
-
-			ptr = xfs_btree_ptr_addr(cur, 1, block);
-			xfs_btree_readahead_ptr(cur, ptr, 1);
-
-			/* save for the next iteration of the loop */
-			lptr = *ptr;
-		}
-
-		/* for each buffer in the level */
-		do {
-			error = xfs_btree_block_change_owner(cur, level,
-							     new_owner,
-							     buffer_list);
-		} while (!error);
-
-		if (error != -ENOENT)
-			return error;
-	}
-
-	return 0;
+	return xfs_btree_visit_blocks(cur, xfs_btree_block_change_owner,
+			&bbcoi);
 }
 
 /**
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index 898fee5..0ec3055 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -506,4 +506,9 @@ int xfs_btree_query_range(struct xfs_btree_cur *cur,
 		union xfs_btree_irec *low_rec, union xfs_btree_irec *high_rec,
 		xfs_btree_query_range_fn fn, void *priv);
 
+typedef int (*xfs_btree_visit_blocks_fn)(struct xfs_btree_cur *cur, int level,
+		void *data);
+int xfs_btree_visit_blocks(struct xfs_btree_cur *cur,
+		xfs_btree_visit_blocks_fn fn, void *data);
+
 #endif	/* __XFS_BTREE_H__ */


* [PATCH 020/145] xfs: move deferred operations into a separate file
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (18 preceding siblings ...)
  2016-06-17  1:32 ` [PATCH 019/145] xfs: refactor btree owner change into a separate visit-blocks function Darrick J. Wong
@ 2016-06-17  1:32 ` Darrick J. Wong
  2016-06-17  1:32 ` [PATCH 021/145] xfs: add tracepoints for the deferred ops mechanism Darrick J. Wong
                   ` (124 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:32 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

All the code around struct xfs_bmap_free basically implements a
deferred operation framework through which we can roll transactions
(to unlock buffers and avoid violating lock order rules) while
managing all the necessary log redo items.  Previously we only used
this code to free extents after some sort of mapping operation, but
with the advent of rmap and reflink, we suddenly need to do more than
that.

With that in mind, xfs_bmap_free really becomes a deferred ops control
structure.  Rename the structure and move the deferred ops into their
own file to avoid further bloating of the bmap code.

v2: actually sort the work items by AG to avoid deadlocks
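
As a rough illustration of the intake/pending split and of the AG
sorting mentioned in the v2 note, here is a toy model.  It is a sketch
under heavy assumptions, not the libxfs code: all demo_* names are
hypothetical, fixed-size arrays replace the list_heads, "logging an
intent" is reduced to sorting a group and marking it logged, and
transaction rolling, done items and -EAGAIN handling are left out.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MAX_ITEMS	8	/* hypothetical cap on items per group */
#define DEMO_MAX_GROUPS	8	/* hypothetical cap on groups */

/* Hypothetical work item: all we track is the AG it touches. */
struct demo_item {
	unsigned int		agno;
};

/* A batch of same-typed items; loosely modelled on xfs_defer_pending. */
struct demo_pending {
	int			type;
	unsigned int		count;
	struct demo_item	items[DEMO_MAX_ITEMS];
};

/* groups[0..nlogged) play the pending list, the rest play the intake. */
struct demo_ops {
	unsigned int		nlogged;
	unsigned int		ngroups;
	struct demo_pending	groups[DEMO_MAX_GROUPS];
};

static int
demo_cmp_agno(const void *a, const void *b)
{
	const struct demo_item	*ia = a;
	const struct demo_item	*ib = b;

	return (ia->agno > ib->agno) - (ia->agno < ib->agno);
}

/* Queue an item; reuse the last unlogged group if it has the same type. */
static int
demo_defer_add(struct demo_ops *dop, int type, unsigned int agno)
{
	struct demo_pending	*dfp = NULL;

	if (dop->ngroups > dop->nlogged) {
		dfp = &dop->groups[dop->ngroups - 1];
		if (dfp->type != type || dfp->count >= DEMO_MAX_ITEMS)
			dfp = NULL;
	}
	if (!dfp) {
		if (dop->ngroups >= DEMO_MAX_GROUPS)
			return -1;
		dfp = &dop->groups[dop->ngroups++];
		dfp->type = type;
		dfp->count = 0;
	}
	dfp->items[dfp->count++].agno = agno;
	return 0;
}

/* "Log intents": sort each unlogged group by AG, then treat it as pending. */
static void
demo_defer_intake_work(struct demo_ops *dop)
{
	unsigned int		i;

	for (i = dop->nlogged; i < dop->ngroups; i++)
		qsort(dop->groups[i].items, dop->groups[i].count,
				sizeof(struct demo_item), demo_cmp_agno);
	dop->nlogged = dop->ngroups;
}

/* Finish all work, recording the AG order in which items were processed. */
static unsigned int
demo_defer_finish(struct demo_ops *dop, unsigned int *trace)
{
	unsigned int		i, j, n = 0;

	assert(trace != NULL);
	demo_defer_intake_work(dop);
	for (i = 0; i < dop->nlogged; i++)
		for (j = 0; j < dop->groups[i].count; j++)
			trace[n++] = dop->groups[i].items[j].agno;
	dop->ngroups = dop->nlogged = 0;
	return n;
}

/* Queue three items out of AG order; returns 1 if they finish in AG order. */
int
demo_defer_example(void)
{
	struct demo_ops		dop = { 0 };
	unsigned int		trace[DEMO_MAX_GROUPS * DEMO_MAX_ITEMS];
	unsigned int		n;

	demo_defer_add(&dop, 0, 3);
	demo_defer_add(&dop, 0, 1);
	demo_defer_add(&dop, 0, 2);
	n = demo_defer_finish(&dop, trace);
	return n == 3 && trace[0] == 1 && trace[1] == 2 && trace[2] == 3;
}
```

The point of the sort is that items queued in arbitrary order are still
processed in ascending AG order within a group, which is what keeps the
real code from taking AG locks out of order.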

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/Makefile     |    3 
 libxfs/defer_item.c |   36 ++++
 libxfs/init.c       |    2 
 libxfs/xfs_defer.c  |  471 +++++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_defer.h  |   96 ++++++++++
 5 files changed, 608 insertions(+)
 create mode 100644 libxfs/defer_item.c
 create mode 100644 libxfs/xfs_defer.c
 create mode 100644 libxfs/xfs_defer.h


diff --git a/libxfs/Makefile b/libxfs/Makefile
index ca1a5ee..0073cd1 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -29,6 +29,7 @@ HFILES = \
 	xfs_attr_remote.h \
 	xfs_cksum.h \
 	xfs_da_btree.h \
+	xfs_defer.h \
 	xfs_dir2.h \
 	xfs_ialloc.h \
 	xfs_ialloc_btree.h \
@@ -49,6 +50,7 @@ HFILES = \
 
 CFILES = cache.c \
 	crc32.c \
+	defer_item.c \
 	init.c \
 	kmem.c \
 	logitem.c \
@@ -67,6 +69,7 @@ CFILES = cache.c \
 	xfs_btree.c \
 	xfs_da_btree.c \
 	xfs_da_format.c \
+	xfs_defer.c \
 	xfs_dir2.c \
 	xfs_dir2_block.c \
 	xfs_dir2_data.c \
diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c
new file mode 100644
index 0000000..06d294f
--- /dev/null
+++ b/libxfs/defer_item.c
@@ -0,0 +1,36 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "libxfs_priv.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_bit.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_trans.h"
+
+/* Initialize the deferred operation types. */
+void
+xfs_defer_init_types(void)
+{
+}
diff --git a/libxfs/init.c b/libxfs/init.c
index 67c3b30..553da7b 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -26,6 +26,7 @@
 #include "xfs_log_format.h"
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
+#include "xfs_defer.h"
 #include "xfs_inode_buf.h"
 #include "xfs_inode_fork.h"
 #include "xfs_inode.h"
@@ -262,6 +263,7 @@ libxfs_init(libxfs_init_t *a)
 	flags = (a->isreadonly | a->isdirect);
 
 	radix_tree_init();
+	xfs_defer_init_types();
 
 	if (a->volname) {
 		if(!check_open(a->volname,flags,&rawfile,&blockfile))
diff --git a/libxfs/xfs_defer.c b/libxfs/xfs_defer.c
new file mode 100644
index 0000000..5ae5e0d
--- /dev/null
+++ b/libxfs/xfs_defer.c
@@ -0,0 +1,471 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "libxfs_priv.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_bit.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+
+/*
+ * Deferred Operations in XFS
+ *
+ * Due to the way locking rules work in XFS, certain transactions (block
+ * mapping and unmapping, typically) have permanent reservations so that
+ * we can roll the transaction to adhere to AG locking order rules and
+ * to unlock buffers between metadata updates.  Prior to rmap/reflink,
+ * the mapping code had a mechanism to perform these deferrals for
+ * extents that were going to be freed; this code makes that facility
+ * more generic.
+ *
+ * When adding the reverse mapping and reflink features, it became
+ * necessary to perform complex remapping multi-transactions to comply
+ * with AG locking order rules, and to be able to spread a single
+ * refcount update operation (an operation on an n-block extent can
+ * update as many as n records!) among multiple transactions.  XFS can
+ * roll a transaction to facilitate this, but using this facility
+ * requires us to log "intent" items in case log recovery needs to
+ * redo the operation, and to log "done" items to indicate that redo
+ * is not necessary.
+ *
+ * The xfs_defer_ops structure tracks incoming deferred work (which is
+ * work that has not yet had an intent logged) in xfs_defer_intake.
+ * There is one xfs_defer_intake for each type of deferrable
+ * operation.  Each new deferral is placed in the op's intake list,
+ * where it waits for the caller to finish the deferred operations.
+ *
+ * Finishing a set of deferred operations is an involved process.  To
+ * start, we define "rolling a deferred-op transaction" as follows:
+ *
+ * > For each xfs_defer_intake,
+ *   - Sort the items on the intake list in AG order.
+ *   - Create a log intent item for that type.
+ *   - Attach to it the items on the intake list.
+ *   - Stash the intent+items for later in an xfs_defer_pending.
+ *   - Attach the xfs_defer_pending to the xfs_defer_ops work list.
+ * > Roll the transaction.
+ *
+ * NOTE: To avoid exceeding the transaction reservation, we limit the
+ * number of items that we attach to a given xfs_defer_pending.
+ *
+ * The actual finishing process looks like this:
+ *
+ * > For each xfs_defer_pending in the xfs_defer_ops work list,
+ *   - Roll the deferred-op transaction as above.
+ *   - Create a log done item for that type, and attach it to the
+ *     intent item.
+ *   - For each work item attached to the intent item,
+ *     * Perform the described action.
+ *     * Attach the work item to the log done item.
+ *     * If the result of doing the work was -EAGAIN, log a fresh
+ *       intent item and attach all remaining work items to it.  Put
+ *       the xfs_defer_pending item back on the work list, and repeat
+ *       the loop.  This allows us to make partial progress even if
+ *       the transaction is too full to finish the job.
+ *
+ * The key here is that we must log an intent item for all pending
+ * work items every time we roll the transaction, and that we must log
+ * a done item as soon as the work is completed.  With this mechanism
+ * we can perform complex remapping operations, chaining intent items
+ * as needed.
+ *
+ * This is an example of remapping the extent (E, E+B) into file X at
+ * offset A and dealing with the extent (C, C+B) already being mapped
+ * there:
+ * +-------------------------------------------------+
+ * | Unmap file X startblock C offset A length B     | t0
+ * | Intent to reduce refcount for extent (C, B)     |
+ * | Intent to remove rmap (X, C, A, B)              |
+ * | Intent to free extent (D, 1) (bmbt block)       |
+ * | Intent to map (X, A, B) at startblock E         |
+ * +-------------------------------------------------+
+ * | Map file X startblock E offset A length B       | t1
+ * | Done mapping (X, E, A, B)                       |
+ * | Intent to increase refcount for extent (E, B)   |
+ * | Intent to add rmap (X, E, A, B)                 |
+ * +-------------------------------------------------+
+ * | Reduce refcount for extent (C, B)               | t2
+ * | Done reducing refcount for extent (C, B)        |
+ * | Increase refcount for extent (E, B)             |
+ * | Done increasing refcount for extent (E, B)      |
+ * | Intent to free extent (C, B)                    |
+ * | Intent to free extent (F, 1) (refcountbt block) |
+ * | Intent to remove rmap (F, 1, REFC)              |
+ * +-------------------------------------------------+
+ * | Remove rmap (X, C, A, B)                        | t3
+ * | Done removing rmap (X, C, A, B)                 |
+ * | Add rmap (X, E, A, B)                           |
+ * | Done adding rmap (X, E, A, B)                   |
+ * | Remove rmap (F, 1, REFC)                        |
+ * | Done removing rmap (F, 1, REFC)                 |
+ * +-------------------------------------------------+
+ * | Free extent (C, B)                              | t4
+ * | Done freeing extent (C, B)                      |
+ * | Free extent (D, 1)                              |
+ * | Done freeing extent (D, 1)                      |
+ * | Free extent (F, 1)                              |
+ * | Done freeing extent (F, 1)                      |
+ * +-------------------------------------------------+
+ *
+ * If we should crash before t2 commits, log recovery replays
+ * the following intent items:
+ *
+ * - Intent to reduce refcount for extent (C, B)
+ * - Intent to remove rmap (X, C, A, B)
+ * - Intent to free extent (D, 1) (bmbt block)
+ * - Intent to increase refcount for extent (E, B)
+ * - Intent to add rmap (X, E, A, B)
+ *
+ * In the process of recovering, it should also generate and take care
+ * of these intent items:
+ *
+ * - Intent to free extent (C, B)
+ * - Intent to free extent (F, 1) (refcountbt block)
+ * - Intent to remove rmap (F, 1, REFC)
+ */
+
+static const struct xfs_defer_op_type *defer_op_types[XFS_DEFER_OPS_TYPE_MAX];
+
+/*
+ * For each pending item in the intake list, log its intent item and the
+ * associated extents, then add the entire intake list to the end of
+ * the pending list.
+ */
+STATIC void
+xfs_defer_intake_work(
+	struct xfs_trans		*tp,
+	struct xfs_defer_ops		*dop)
+{
+	struct list_head		*li;
+	struct xfs_defer_pending	*dfp;
+
+	list_for_each_entry(dfp, &dop->dop_intake, dfp_list) {
+		dfp->dfp_intent = dfp->dfp_type->create_intent(tp,
+				dfp->dfp_count);
+		list_sort(tp->t_mountp, &dfp->dfp_work,
+				dfp->dfp_type->diff_items);
+		list_for_each(li, &dfp->dfp_work)
+			dfp->dfp_type->log_item(tp, dfp->dfp_intent, li);
+	}
+
+	list_splice_tail_init(&dop->dop_intake, &dop->dop_pending);
+}
+
+/* Abort all the intents that were committed. */
+STATIC void
+xfs_defer_trans_abort(
+	struct xfs_trans		*tp,
+	struct xfs_defer_ops		*dop,
+	int				error)
+{
+	struct xfs_defer_pending	*dfp;
+
+	/*
+	 * If the transaction was committed, drop the intent reference
+	 * since we're bailing out of here. The other reference is
+	 * dropped when the intent hits the AIL.  If the transaction
+	 * was not committed, the intent is freed by the intent item
+	 * unlock handler on abort.
+	 */
+	if (!dop->dop_committed)
+		return;
+
+	/* Abort intent items. */
+	list_for_each_entry(dfp, &dop->dop_pending, dfp_list) {
+		if (dfp->dfp_committed)
+			dfp->dfp_type->abort_intent(dfp->dfp_intent);
+	}
+
+	/* Shut down FS. */
+	xfs_force_shutdown(tp->t_mountp, (error == -EFSCORRUPTED) ?
+			SHUTDOWN_CORRUPT_INCORE : SHUTDOWN_META_IO_ERROR);
+}
+
+/* Roll a transaction so we can do some deferred op processing. */
+STATIC int
+xfs_defer_trans_roll(
+	struct xfs_trans		**tp,
+	struct xfs_defer_ops		*dop,
+	struct xfs_inode		*ip)
+{
+	int				i;
+	int				error;
+
+	/* Log all the joined inodes except the one we passed in. */
+	for (i = 0; i < XFS_DEFER_OPS_NR_INODES && dop->dop_inodes[i]; i++) {
+		if (dop->dop_inodes[i] == ip)
+			continue;
+		xfs_trans_log_inode(*tp, dop->dop_inodes[i], XFS_ILOG_CORE);
+	}
+
+	/* Roll the transaction. */
+	error = xfs_trans_roll(tp, ip);
+	if (error) {
+		xfs_defer_trans_abort(*tp, dop, error);
+		return error;
+	}
+	dop->dop_committed = true;
+
+	/* Rejoin the joined inodes except the one we passed in. */
+	for (i = 0; i < XFS_DEFER_OPS_NR_INODES && dop->dop_inodes[i]; i++) {
+		if (dop->dop_inodes[i] == ip)
+			continue;
+		xfs_trans_ijoin(*tp, dop->dop_inodes[i], 0);
+	}
+
+	return error;
+}
+
+/* Do we have any work items to finish? */
+bool
+xfs_defer_has_unfinished_work(
+	struct xfs_defer_ops		*dop)
+{
+	return !list_empty(&dop->dop_pending) || !list_empty(&dop->dop_intake);
+}
+
+/*
+ * Add this inode to the deferred op.  Each joined inode is relogged
+ * each time we roll the transaction, in addition to any inode passed
+ * to xfs_defer_finish().
+ */
+int
+xfs_defer_join(
+	struct xfs_defer_ops		*dop,
+	struct xfs_inode		*ip)
+{
+	int				i;
+
+	for (i = 0; i < XFS_DEFER_OPS_NR_INODES; i++) {
+		if (dop->dop_inodes[i] == ip)
+			return 0;
+		else if (dop->dop_inodes[i] == NULL) {
+			dop->dop_inodes[i] = ip;
+			return 0;
+		}
+	}
+
+	return -EFSCORRUPTED;
+}
+
+/*
+ * Finish all the pending work.  This involves logging intent items for
+ * any work items that wandered in since the last transaction roll (if
+ * one has even happened), rolling the transaction, and finishing the
+ * work items in the first item on the logged-and-pending list.
+ *
+ * If an inode is provided, relog it to the new transaction.
+ */
+int
+xfs_defer_finish(
+	struct xfs_trans		**tp,
+	struct xfs_defer_ops		*dop,
+	struct xfs_inode		*ip)
+{
+	struct xfs_defer_pending	*dfp;
+	struct list_head		*li;
+	struct list_head		*n;
+	void				*done_item = NULL;
+	void				*state;
+	int				error = 0;
+	void				(*cleanup_fn)(struct xfs_trans *, void *, int);
+
+	ASSERT((*tp)->t_flags & XFS_TRANS_PERM_LOG_RES);
+
+	/* Until we run out of pending work to finish... */
+	while (xfs_defer_has_unfinished_work(dop)) {
+		/* Log intents for work items sitting in the intake. */
+		xfs_defer_intake_work(*tp, dop);
+
+		/* Roll the transaction. */
+		error = xfs_defer_trans_roll(tp, dop, ip);
+		if (error)
+			goto out;
+
+		/* Mark all pending intents as committed. */
+		list_for_each_entry_reverse(dfp, &dop->dop_pending, dfp_list) {
+			if (dfp->dfp_committed)
+				break;
+			dfp->dfp_committed = true;
+		}
+
+		/* Log an intent-done item for the first pending item. */
+		dfp = list_first_entry(&dop->dop_pending,
+				struct xfs_defer_pending, dfp_list);
+		done_item = dfp->dfp_type->create_done(*tp, dfp->dfp_intent,
+				dfp->dfp_count);
+		cleanup_fn = dfp->dfp_type->finish_cleanup;
+
+		/* Finish the work items. */
+		state = NULL;
+		list_for_each_safe(li, n, &dfp->dfp_work) {
+			list_del(li);
+			dfp->dfp_count--;
+			error = dfp->dfp_type->finish_item(*tp, dop, li,
+					done_item, &state);
+			if (error == -EAGAIN) {
+				/*
+				 * If the caller needs to try again, put the
+				 * item back on the pending list and jump out
+				 * for further processing.
+				 */
+				list_add(li, &dfp->dfp_work);
+				dfp->dfp_count++;
+				break;
+			} else if (error) {
+				/*
+				 * Clean up after ourselves and jump out.
+				 * xfs_defer_cancel will take care of freeing
+				 * all these lists and stuff.
+				 */
+				if (cleanup_fn)
+					cleanup_fn(*tp, state, error);
+				xfs_defer_trans_abort(*tp, dop, error);
+				goto out;
+			}
+		}
+		if (error == -EAGAIN) {
+			/*
+			 * Log a new intent, relog all the remaining work
+			 * items to the new intent, attach the new intent to
+			 * the dfp, and leave the dfp at the head of the list
+			 * for further processing.
+			 */
+			dfp->dfp_intent = dfp->dfp_type->create_intent(*tp,
+					dfp->dfp_count);
+			list_for_each(li, &dfp->dfp_work)
+				dfp->dfp_type->log_item(*tp, dfp->dfp_intent,
+						li);
+		} else {
+			/* Done with the dfp, free it. */
+			list_del(&dfp->dfp_list);
+			kmem_free(dfp);
+		}
+
+		if (cleanup_fn)
+			cleanup_fn(*tp, state, error);
+	}
+
+out:
+	return error;
+}
+
+/*
+ * Free up any items left in the list.
+ */
+void
+xfs_defer_cancel(
+	struct xfs_defer_ops		*dop)
+{
+	struct xfs_defer_pending	*dfp;
+	struct xfs_defer_pending	*pli;
+	struct list_head		*pwi;
+	struct list_head		*n;
+
+	/*
+	 * Free the pending items.  Caller should already have arranged
+	 * for the intent items to be released.
+	 */
+	list_for_each_entry_safe(dfp, pli, &dop->dop_intake, dfp_list) {
+		list_del(&dfp->dfp_list);
+		list_for_each_safe(pwi, n, &dfp->dfp_work) {
+			list_del(pwi);
+			dfp->dfp_count--;
+			dfp->dfp_type->cancel_item(pwi);
+		}
+		ASSERT(dfp->dfp_count == 0);
+		kmem_free(dfp);
+	}
+	list_for_each_entry_safe(dfp, pli, &dop->dop_pending, dfp_list) {
+		list_del(&dfp->dfp_list);
+		list_for_each_safe(pwi, n, &dfp->dfp_work) {
+			list_del(pwi);
+			dfp->dfp_count--;
+			dfp->dfp_type->cancel_item(pwi);
+		}
+		ASSERT(dfp->dfp_count == 0);
+		kmem_free(dfp);
+	}
+}
+
+/* Add an item for later deferred processing. */
+void
+xfs_defer_add(
+	struct xfs_defer_ops		*dop,
+	enum xfs_defer_ops_type		type,
+	struct list_head		*li)
+{
+	struct xfs_defer_pending	*dfp = NULL;
+
+	/*
+	 * Add the item to a pending item at the end of the intake list.
+	 * If the last pending item has the same type, reuse it.  Else,
+	 * create a new pending item at the end of the intake list.
+	 */
+	if (!list_empty(&dop->dop_intake)) {
+		dfp = list_last_entry(&dop->dop_intake,
+				struct xfs_defer_pending, dfp_list);
+		if (dfp->dfp_type->type != type ||
+		    (dfp->dfp_type->max_items &&
+		     dfp->dfp_count >= dfp->dfp_type->max_items))
+			dfp = NULL;
+	}
+	if (!dfp) {
+		dfp = kmem_alloc(sizeof(struct xfs_defer_pending),
+				KM_SLEEP | KM_NOFS);
+		dfp->dfp_type = defer_op_types[type];
+		dfp->dfp_committed = false;
+		dfp->dfp_intent = NULL;
+		dfp->dfp_count = 0;
+		INIT_LIST_HEAD(&dfp->dfp_work);
+		list_add_tail(&dfp->dfp_list, &dop->dop_intake);
+	}
+
+	list_add_tail(li, &dfp->dfp_work);
+	dfp->dfp_count++;
+}
+
+/* Register a deferred operation type. */
+void
+xfs_defer_init_op_type(
+	const struct xfs_defer_op_type	*type)
+{
+	defer_op_types[type->type] = type;
+}
+
+/* Initialize a deferred operation. */
+void
+xfs_defer_init(
+	struct xfs_defer_ops		*dop,
+	xfs_fsblock_t			*fbp)
+{
+	dop->dop_committed = false;
+	dop->dop_low = false;
+	memset(&dop->dop_inodes, 0, sizeof(dop->dop_inodes));
+	*fbp = NULLFSBLOCK;
+	INIT_LIST_HEAD(&dop->dop_intake);
+	INIT_LIST_HEAD(&dop->dop_pending);
+}
diff --git a/libxfs/xfs_defer.h b/libxfs/xfs_defer.h
new file mode 100644
index 0000000..85c7a3a
--- /dev/null
+++ b/libxfs/xfs_defer.h
@@ -0,0 +1,96 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef __XFS_DEFER_H__
+#define	__XFS_DEFER_H__
+
+struct xfs_defer_op_type;
+
+/*
+ * Save a log intent item and a list of extents, so that we can replay
+ * whatever action had to happen to the extent list and file the log done
+ * item.
+ */
+struct xfs_defer_pending {
+	const struct xfs_defer_op_type	*dfp_type;	/* function pointers */
+	struct list_head		dfp_list;	/* pending items */
+	bool				dfp_committed;	/* committed trans? */
+	void				*dfp_intent;	/* log intent item */
+	struct list_head		dfp_work;	/* work items */
+	unsigned int			dfp_count;	/* # extent items */
+};
+
+/*
+ * Header for deferred operation list.
+ *
+ * dop_low is used by the allocator to activate the lowspace algorithm -
+ * when free space is running low the extent allocator may choose to
+ * allocate an extent from an AG without leaving sufficient space for
+ * a btree split when inserting the new extent.  In this case the allocator
+ * will enable the lowspace algorithm which is supposed to allow further
+ * allocations (such as btree splits and newroots) to allocate from
+ * sequential AGs.  In order to avoid locking AGs out of order the lowspace
+ * algorithm will start searching for free space from AG 0.  If the correct
+ * transaction reservations have been made then this algorithm will eventually
+ * find all the space it needs.
+ */
+enum xfs_defer_ops_type {
+	XFS_DEFER_OPS_TYPE_MAX,
+};
+
+#define XFS_DEFER_OPS_NR_INODES	2	/* join up to two inodes */
+
+struct xfs_defer_ops {
+	bool			dop_committed;	/* did any trans commit? */
+	bool			dop_low;	/* alloc in low mode */
+	struct list_head	dop_intake;	/* unlogged pending work */
+	struct list_head	dop_pending;	/* logged pending work */
+
+	/* relog these inodes with each roll */
+	struct xfs_inode	*dop_inodes[XFS_DEFER_OPS_NR_INODES];
+};
+
+void xfs_defer_add(struct xfs_defer_ops *dop, enum xfs_defer_ops_type type,
+		struct list_head *h);
+int xfs_defer_finish(struct xfs_trans **tp, struct xfs_defer_ops *dop,
+		struct xfs_inode *ip);
+void xfs_defer_cancel(struct xfs_defer_ops *dop);
+void xfs_defer_init(struct xfs_defer_ops *dop, xfs_fsblock_t *fbp);
+bool xfs_defer_has_unfinished_work(struct xfs_defer_ops *dop);
+int xfs_defer_join(struct xfs_defer_ops *dop, struct xfs_inode *ip);
+
+/* Description of a deferred type. */
+struct xfs_defer_op_type {
+	enum xfs_defer_ops_type	type;
+	unsigned int		max_items;
+	void (*abort_intent)(void *);
+	void *(*create_done)(struct xfs_trans *, void *, unsigned int);
+	int (*finish_item)(struct xfs_trans *, struct xfs_defer_ops *,
+			struct list_head *, void *, void **);
+	void (*finish_cleanup)(struct xfs_trans *, void *, int);
+	void (*cancel_item)(struct list_head *);
+	int (*diff_items)(void *, struct list_head *, struct list_head *);
+	void *(*create_intent)(struct xfs_trans *, uint);
+	void (*log_item)(struct xfs_trans *, void *, struct list_head *);
+};
+
+void xfs_defer_init_op_type(const struct xfs_defer_op_type *type);
+void xfs_defer_init_types(void);
+
+#endif /* __XFS_DEFER_H__ */


* [PATCH 021/145] xfs: add tracepoints for the deferred ops mechanism
From: Darrick J. Wong @ 2016-06-17  1:32 UTC
  To: david, darrick.wong; +Cc: xfs

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_trace.h |   15 +++++++++++++++
 libxfs/xfs_defer.c  |   19 +++++++++++++++++++
 2 files changed, 34 insertions(+)
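
Illustrative sketch, not part of the patch: the userspace tracepoints
in this patch are variadic macros that expand to ((void) 0), so their
arguments are discarded by the preprocessor and never evaluated.  The
block below demonstrates that behaviour, together with the
self-assignment variant already used for trace_xfs_perag_get.  All
toy_ names are hypothetical.

```c
#include <assert.h>

/* Stub in the style of include/xfs_trace.h: expands to nothing. */
#define trace_toy_defer_init(...)	((void) 0)

/* Variant that references one argument to avoid unused-var warnings. */
#define trace_toy_perag_get(a,b,c,d)	((c) = (c))

static int toy_calls;

/* Helper with a visible side effect, to show what the stub discards. */
static int
toy_next_id(void)
{
	return ++toy_calls;
}

/* Returns how many times toy_next_id() actually ran. */
static int
toy_trace_demo(void)
{
	int	refcount = 3;

	toy_calls = 0;

	/* A real call: toy_calls becomes 1. */
	(void)toy_next_id();

	/* Discarded by cpp: toy_next_id() is not called a second time. */
	trace_toy_defer_init(toy_next_id(), refcount);

	/* Self-assignment: refcount is referenced but keeps its value. */
	trace_toy_perag_get(0, 0, refcount, 0);
	assert(refcount == 3);

	return toy_calls;
}
```

One consequence is that a stubbed tracepoint must not be relied upon
for side effects; any expression passed to it simply disappears from
the userspace build.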


diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index 56c4533..dd0d46f 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -169,6 +169,21 @@
 #define trace_xfs_btree_updkeys(...)		((void) 0)
 #define trace_xfs_btree_overlapped_query_range(...)	((void) 0)
 
+#define trace_xfs_defer_intake_work(...)	((void) 0)
+#define trace_xfs_defer_trans_abort(...)	((void) 0)
+#define trace_xfs_defer_pending_abort(...)	((void) 0)
+#define trace_xfs_defer_trans_roll(...)		((void) 0)
+#define trace_xfs_defer_trans_roll_error(...)	((void) 0)
+#define trace_xfs_defer_finish(...)		((void) 0)
+#define trace_xfs_defer_pending_commit(...)	((void) 0)
+#define trace_xfs_defer_pending_finish(...)	((void) 0)
+#define trace_xfs_defer_finish_error(...)	((void) 0)
+#define trace_xfs_defer_finish_done(...)	((void) 0)
+#define trace_xfs_defer_cancel(...)		((void) 0)
+#define trace_xfs_defer_intake_cancel(...)	((void) 0)
+#define trace_xfs_defer_pending_cancel(...)	((void) 0)
+#define trace_xfs_defer_init(...)		((void) 0)
+
 /* set c = c to avoid unused var warnings */
 #define trace_xfs_perag_get(a,b,c,d)	((c) = (c))
 #define trace_xfs_perag_get_tag(a,b,c,d) ((c) = (c))
diff --git a/libxfs/xfs_defer.c b/libxfs/xfs_defer.c
index 5ae5e0d..069b7bb 100644
--- a/libxfs/xfs_defer.c
+++ b/libxfs/xfs_defer.c
@@ -163,6 +163,7 @@ xfs_defer_intake_work(
 	struct xfs_defer_pending	*dfp;
 
 	list_for_each_entry(dfp, &dop->dop_intake, dfp_list) {
+		trace_xfs_defer_intake_work(tp->t_mountp, dfp);
 		dfp->dfp_intent = dfp->dfp_type->create_intent(tp,
 				dfp->dfp_count);
 		list_sort(tp->t_mountp, &dfp->dfp_work,
@@ -183,6 +184,7 @@ xfs_defer_trans_abort(
 {
 	struct xfs_defer_pending	*dfp;
 
+	trace_xfs_defer_trans_abort(tp->t_mountp, dop);
 	/*
 	 * If the transaction was committed, drop the intent reference
 	 * since we're bailing out of here. The other reference is
@@ -195,6 +197,7 @@ xfs_defer_trans_abort(
 
 	/* Abort intent items. */
 	list_for_each_entry(dfp, &dop->dop_pending, dfp_list) {
+		trace_xfs_defer_pending_abort(tp->t_mountp, dfp);
 		if (dfp->dfp_committed)
 			dfp->dfp_type->abort_intent(dfp->dfp_intent);
 	}
@@ -221,9 +224,12 @@ xfs_defer_trans_roll(
 		xfs_trans_log_inode(*tp, dop->dop_inodes[i], XFS_ILOG_CORE);
 	}
 
+	trace_xfs_defer_trans_roll((*tp)->t_mountp, dop);
+
 	/* Roll the transaction. */
 	error = xfs_trans_roll(tp, ip);
 	if (error) {
+		trace_xfs_defer_trans_roll_error((*tp)->t_mountp, dop, error);
 		xfs_defer_trans_abort(*tp, dop, error);
 		return error;
 	}
@@ -295,6 +301,8 @@ xfs_defer_finish(
 
 	ASSERT((*tp)->t_flags & XFS_TRANS_PERM_LOG_RES);
 
+	trace_xfs_defer_finish((*tp)->t_mountp, dop);
+
 	/* Until we run out of pending work to finish... */
 	while (xfs_defer_has_unfinished_work(dop)) {
 		/* Log intents for work items sitting in the intake. */
@@ -309,12 +317,14 @@ xfs_defer_finish(
 		list_for_each_entry_reverse(dfp, &dop->dop_pending, dfp_list) {
 			if (dfp->dfp_committed)
 				break;
+			trace_xfs_defer_pending_commit((*tp)->t_mountp, dfp);
 			dfp->dfp_committed = true;
 		}
 
 		/* Log an intent-done item for the first pending item. */
 		dfp = list_first_entry(&dop->dop_pending,
 				struct xfs_defer_pending, dfp_list);
+		trace_xfs_defer_pending_finish((*tp)->t_mountp, dfp);
 		done_item = dfp->dfp_type->create_done(*tp, dfp->dfp_intent,
 				dfp->dfp_count);
 		cleanup_fn = dfp->dfp_type->finish_cleanup;
@@ -370,6 +380,10 @@ xfs_defer_finish(
 	}
 
 out:
+	if (error)
+		trace_xfs_defer_finish_error((*tp)->t_mountp, dop, error);
+	else
+		trace_xfs_defer_finish_done((*tp)->t_mountp, dop);
 	return error;
 }
 
@@ -385,11 +399,14 @@ xfs_defer_cancel(
 	struct list_head		*pwi;
 	struct list_head		*n;
 
+	trace_xfs_defer_cancel(NULL, dop);
+
 	/*
 	 * Free the pending items.  Caller should already have arranged
 	 * for the intent items to be released.
 	 */
 	list_for_each_entry_safe(dfp, pli, &dop->dop_intake, dfp_list) {
+		trace_xfs_defer_intake_cancel(NULL, dfp);
 		list_del(&dfp->dfp_list);
 		list_for_each_safe(pwi, n, &dfp->dfp_work) {
 			list_del(pwi);
@@ -400,6 +417,7 @@ xfs_defer_cancel(
 		kmem_free(dfp);
 	}
 	list_for_each_entry_safe(dfp, pli, &dop->dop_pending, dfp_list) {
+		trace_xfs_defer_pending_cancel(NULL, dfp);
 		list_del(&dfp->dfp_list);
 		list_for_each_safe(pwi, n, &dfp->dfp_work) {
 			list_del(pwi);
@@ -468,4 +486,5 @@ xfs_defer_init(
 	*fbp = NULLFSBLOCK;
 	INIT_LIST_HEAD(&dop->dop_intake);
 	INIT_LIST_HEAD(&dop->dop_pending);
+	trace_xfs_defer_init(NULL, dop);
 }


* [PATCH 022/145] xfs: enable the xfs_defer mechanism to process extents to free
From: Darrick J. Wong @ 2016-06-17  1:33 UTC
  To: david, darrick.wong; +Cc: xfs

Connect the xfs_defer mechanism with the pieces that we'll need to
handle deferred extent freeing.  We'll wire up the existing code to
our new deferred mechanism later.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/defer_item.c |  100 +++++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_defer.h  |    1 +
 2 files changed, 101 insertions(+)
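
Illustrative sketch, not part of the patch: the diff_items callback
added here orders pending extent frees by allocation group.  The block
below shows the same idea on a plain array.  It assumes that the AG
number is held in the high bits of a filesystem block number, uses an
arbitrary shift of 12, and lets qsort() stand in for the list_sort()
call made by xfs_defer_intake_work().  All toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for struct xfs_bmap_free_item. */
struct toy_free_item {
	uint64_t	startblock;	/* filesystem block number */
	uint32_t	blockcount;	/* length of the extent */
};

/* Assumption: low 12 bits are the AG-relative block, the rest the AG. */
#define TOY_AGBLKLOG	12

static uint32_t
toy_fsb_to_agno(
	uint64_t			fsb)
{
	return (uint32_t)(fsb >> TOY_AGBLKLOG);
}

/* qsort() comparator: order items by ascending AG number. */
static int
toy_free_item_cmp(
	const void			*a,
	const void			*b)
{
	const struct toy_free_item	*ra = a;
	const struct toy_free_item	*rb = b;
	uint32_t			agno_a;
	uint32_t			agno_b;

	agno_a = toy_fsb_to_agno(ra->startblock);
	agno_b = toy_fsb_to_agno(rb->startblock);

	/* Compare rather than subtract, so large values cannot wrap. */
	return (agno_a > agno_b) - (agno_a < agno_b);
}

/* Sort an array of pending extent frees into AG order. */
static void
toy_sort_by_ag(
	struct toy_free_item		*items,
	size_t				nr)
{
	qsort(items, nr, sizeof(*items), toy_free_item_cmp);
}
```

Unlike list_sort(), qsort() is not guaranteed to be stable, so items
within the same AG may be reordered in this sketch.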


diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c
index 06d294f..72c28f8 100644
--- a/libxfs/defer_item.c
+++ b/libxfs/defer_item.c
@@ -28,9 +28,109 @@
 #include "xfs_mount.h"
 #include "xfs_defer.h"
 #include "xfs_trans.h"
+#include "xfs_bmap.h"
+#include "xfs_alloc.h"
+
+/* Extent Freeing */
+
+/* Sort bmap items by AG. */
+static int
+xfs_bmap_free_diff_items(
+	void				*priv,
+	struct list_head		*a,
+	struct list_head		*b)
+{
+	struct xfs_mount		*mp = priv;
+	struct xfs_bmap_free_item	*ra;
+	struct xfs_bmap_free_item	*rb;
+
+	ra = container_of(a, struct xfs_bmap_free_item, xbfi_list);
+	rb = container_of(b, struct xfs_bmap_free_item, xbfi_list);
+	return  XFS_FSB_TO_AGNO(mp, ra->xbfi_startblock) -
+		XFS_FSB_TO_AGNO(mp, rb->xbfi_startblock);
+}
+
+/* Get an EFI. */
+STATIC void *
+xfs_bmap_free_create_intent(
+	struct xfs_trans		*tp,
+	unsigned int			count)
+{
+	return NULL;
+}
+
+/* Log a free extent to the intent item. */
+STATIC void
+xfs_bmap_free_log_item(
+	struct xfs_trans		*tp,
+	void				*intent,
+	struct list_head		*item)
+{
+}
+
+/* Get an EFD so we can process all the free extents. */
+STATIC void *
+xfs_bmap_free_create_done(
+	struct xfs_trans		*tp,
+	void				*intent,
+	unsigned int			count)
+{
+	return NULL;
+}
+
+/* Process a free extent. */
+STATIC int
+xfs_bmap_free_finish_item(
+	struct xfs_trans		*tp,
+	struct xfs_defer_ops		*dop,
+	struct list_head		*item,
+	void				*done_item,
+	void				**state)
+{
+	struct xfs_bmap_free_item	*free;
+	int				error;
+
+	free = container_of(item, struct xfs_bmap_free_item, xbfi_list);
+	error = xfs_free_extent(tp, free->xbfi_startblock,
+			free->xbfi_blockcount);
+	kmem_free(free);
+	return error;
+}
+
+/* Abort all pending EFIs. */
+STATIC void
+xfs_bmap_free_abort_intent(
+	void				*intent)
+{
+}
+
+/* Cancel a free extent. */
+STATIC void
+xfs_bmap_free_cancel_item(
+	struct list_head		*item)
+{
+	struct xfs_bmap_free_item	*free;
+
+	free = container_of(item, struct xfs_bmap_free_item, xbfi_list);
+	kmem_free(free);
+}
+
+const struct xfs_defer_op_type xfs_extent_free_defer_type = {
+	.type		= XFS_DEFER_OPS_TYPE_FREE,
+	.diff_items	= xfs_bmap_free_diff_items,
+	.create_intent	= xfs_bmap_free_create_intent,
+	.abort_intent	= xfs_bmap_free_abort_intent,
+	.log_item	= xfs_bmap_free_log_item,
+	.create_done	= xfs_bmap_free_create_done,
+	.finish_item	= xfs_bmap_free_finish_item,
+	.cancel_item	= xfs_bmap_free_cancel_item,
+};
+
+/* Deferred Item Initialization */
 
 /* Initialize the deferred operation types. */
 void
 xfs_defer_init_types(void)
 {
+	xfs_defer_init_op_type(&xfs_extent_free_defer_type);
 }
diff --git a/libxfs/xfs_defer.h b/libxfs/xfs_defer.h
index 85c7a3a..743fc32 100644
--- a/libxfs/xfs_defer.h
+++ b/libxfs/xfs_defer.h
@@ -51,6 +51,7 @@ struct xfs_defer_pending {
  * find all the space it needs.
  */
 enum xfs_defer_ops_type {
+	XFS_DEFER_OPS_TYPE_FREE,
 	XFS_DEFER_OPS_TYPE_MAX,
 };
 


* [PATCH 023/145] xfs: rework xfs_bmap_free callers to use xfs_defer_ops
From: Darrick J. Wong @ 2016-06-17  1:33 UTC
  To: david, darrick.wong; +Cc: xfs

Rework everything that used xfs_bmap_free to use xfs_defer_ops
instead.  For now, remove the old symbols and alias the old names to
the new ones with cpp #defines so that existing callers keep building;
the next patch does the actual rename.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/libxfs.h         |    5 +++-
 libxfs/libxfs_api_defs.h |    2 --
 libxfs/util.c            |   25 +--------------------
 libxfs/xfs_alloc.c       |    1 +
 libxfs/xfs_attr.c        |    1 +
 libxfs/xfs_attr_remote.c |    1 +
 libxfs/xfs_bmap.c        |   55 +++++++++-------------------------------------
 libxfs/xfs_bmap.h        |   32 ---------------------------
 libxfs/xfs_bmap_btree.c  |    5 +++-
 libxfs/xfs_btree.c       |    1 +
 libxfs/xfs_defer.h       |    7 ++++++
 libxfs/xfs_dir2.c        |    1 +
 libxfs/xfs_ialloc.c      |    1 +
 libxfs/xfs_inode_buf.c   |    1 +
 libxfs/xfs_sb.c          |    1 +
 15 files changed, 34 insertions(+), 105 deletions(-)
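
Illustrative sketch, not part of the patch: the temporary shims in this
patch map old names onto new ones with #define and a typedef.  The
block below shows the same technique on toy types, including the trick
of defining the old struct tag so that "struct old_name" still
compiles.  All toy_ names and fields are hypothetical.

```c
#include <assert.h>

/* New-style type and functions (toy versions, not the libxfs ones). */
struct toy_defer_ops {
	int	nr_pending;
	int	low;
};

static void
toy_defer_init(
	struct toy_defer_ops	*dop)
{
	dop->nr_pending = 0;
	dop->low = 0;
}

/* Pretend to process everything; returns the number of items done. */
static int
toy_defer_finish(
	struct toy_defer_ops	*dop)
{
	int			done = dop->nr_pending;

	dop->nr_pending = 0;
	return done;
}

/* Compatibility shims: the old spellings resolve to the new symbols. */
#define toy_bmap_init		toy_defer_init
#define toy_bmap_finish		toy_defer_finish
#define toy_bmap_free		toy_defer_ops	/* for "struct toy_bmap_free" */
typedef struct toy_defer_ops	toy_bmap_free_t;

/* An unconverted caller, still written against the old names. */
static int
toy_legacy_caller(void)
{
	toy_bmap_free_t		flist;
	struct toy_bmap_free	*flp = &flist;

	toy_bmap_init(flp);
	flp->nr_pending = 2;
	return toy_bmap_finish(flp);
}
```

Note that a #define of this kind rewrites every matching token, not
only the struct tag, which is one reason such shims are best kept
short-lived.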


diff --git a/include/libxfs.h b/include/libxfs.h
index a34a3a9..71e758b 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -60,6 +60,7 @@ extern uint32_t crc32c_le(uint32_t crc, unsigned char const *p, size_t len);
 #include "xfs_bit.h"
 #include "xfs_sb.h"
 #include "xfs_mount.h"
+#include "xfs_defer.h"
 #include "xfs_da_format.h"
 #include "xfs_da_btree.h"
 #include "xfs_dir2.h"
@@ -163,7 +164,9 @@ extern unsigned int	libxfs_log2_roundup(unsigned int i);
 
 extern int	libxfs_alloc_file_space (struct xfs_inode *, xfs_off_t,
 				xfs_off_t, int, int);
-extern int	libxfs_bmap_finish(xfs_trans_t **, xfs_bmap_free_t *, struct xfs_inode *);
+#define libxfs_bmap_finish	xfs_defer_finish
+#define libxfs_bmap_cancel	xfs_defer_cancel
+#define xfs_bmap_free_t		struct xfs_defer_ops
 
 extern void 	libxfs_fs_repair_cmn_err(int, struct xfs_mount *, char *, ...);
 extern void	libxfs_fs_cmn_err(int, struct xfs_mount *, char *, ...);
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 611a849..84b86bc 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -63,10 +63,8 @@
 #define xfs_attr_leaf_newentsize	libxfs_attr_leaf_newentsize
 
 #define xfs_alloc_fix_freelist		libxfs_alloc_fix_freelist
-#define xfs_bmap_cancel			libxfs_bmap_cancel
 #define xfs_bmap_last_offset		libxfs_bmap_last_offset
 #define xfs_bmap_search_extents		libxfs_bmap_search_extents
-#define xfs_bmap_finish			libxfs_bmap_finish
 #define xfs_bmapi_write			libxfs_bmapi_write
 #define xfs_bmapi_read			libxfs_bmapi_read
 #define xfs_bunmapi			libxfs_bunmapi
diff --git a/libxfs/util.c b/libxfs/util.c
index f37b396..ab007b0 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -25,6 +25,7 @@
 #include "xfs_log_format.h"
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
+#include "xfs_defer.h"
 #include "xfs_inode_buf.h"
 #include "xfs_inode_fork.h"
 #include "xfs_inode.h"
@@ -474,30 +475,6 @@ libxfs_mod_incore_sb(
 	}
 }
 
-int
-libxfs_bmap_finish(
-	struct xfs_trans	**tp,
-	struct xfs_bmap_free	*flist,
-	struct xfs_inode	*ip)
-{
-	struct xfs_bmap_free_item	*free;	/* free extent list item */
-	int			error;
-
-	if (flist->xbf_count == 0)
-		return 0;
-
-	while (!list_empty(&flist->xbf_flist)) {
-		free = list_first_entry(&flist->xbf_flist,
-				struct xfs_bmap_free_item, xbfi_list);
-		error = xfs_free_extent(*tp, free->xbfi_startblock,
-					free->xbfi_blockcount);
-		if (error)
-			return error;
-		xfs_bmap_del_free(flist, free);
-	}
-	return 0;
-}
-
 /*
  * This routine allocates disk space for the given file.
  * Originally derived from xfs_alloc_file_space().
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 7276419..8454816 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -24,6 +24,7 @@
 #include "xfs_bit.h"
 #include "xfs_sb.h"
 #include "xfs_mount.h"
+#include "xfs_defer.h"
 #include "xfs_inode.h"
 #include "xfs_btree.h"
 #include "xfs_alloc_btree.h"
diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index 0b05654..ee370dc 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -23,6 +23,7 @@
 #include "xfs_trans_resv.h"
 #include "xfs_bit.h"
 #include "xfs_mount.h"
+#include "xfs_defer.h"
 #include "xfs_da_format.h"
 #include "xfs_da_btree.h"
 #include "xfs_attr_sf.h"
diff --git a/libxfs/xfs_attr_remote.c b/libxfs/xfs_attr_remote.c
index 79d663e..ed4cc94 100644
--- a/libxfs/xfs_attr_remote.c
+++ b/libxfs/xfs_attr_remote.c
@@ -24,6 +24,7 @@
 #include "xfs_trans_resv.h"
 #include "xfs_bit.h"
 #include "xfs_mount.h"
+#include "xfs_defer.h"
 #include "xfs_da_format.h"
 #include "xfs_da_btree.h"
 #include "xfs_inode.h"
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 65de5ad..a2d8268 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -24,6 +24,7 @@
 #include "xfs_bit.h"
 #include "xfs_sb.h"
 #include "xfs_mount.h"
+#include "xfs_defer.h"
 #include "xfs_da_format.h"
 #include "xfs_da_btree.h"
 #include "xfs_dir2.h"
@@ -587,41 +588,7 @@ xfs_bmap_add_free(
 	new = kmem_zone_alloc(xfs_bmap_free_item_zone, KM_SLEEP);
 	new->xbfi_startblock = bno;
 	new->xbfi_blockcount = (xfs_extlen_t)len;
-	list_add(&new->xbfi_list, &flist->xbf_flist);
-	flist->xbf_count++;
-}
-
-/*
- * Remove the entry "free" from the free item list.  Prev points to the
- * previous entry, unless "free" is the head of the list.
- */
-void
-xfs_bmap_del_free(
-	struct xfs_bmap_free		*flist,	/* free item list header */
-	struct xfs_bmap_free_item	*free)	/* list item to be freed */
-{
-	list_del(&free->xbfi_list);
-	flist->xbf_count--;
-	kmem_zone_free(xfs_bmap_free_item_zone, free);
-}
-
-/*
- * Free up any items left in the list.
- */
-void
-xfs_bmap_cancel(
-	struct xfs_bmap_free		*flist)	/* list of bmap_free_items */
-{
-	struct xfs_bmap_free_item	*free;	/* free list item */
-
-	if (flist->xbf_count == 0)
-		return;
-	while (!list_empty(&flist->xbf_flist)) {
-		free = list_first_entry(&flist->xbf_flist,
-				struct xfs_bmap_free_item, xbfi_list);
-		xfs_bmap_del_free(flist, free);
-	}
-	ASSERT(flist->xbf_count == 0);
+	xfs_defer_add(flist, XFS_DEFER_OPS_TYPE_FREE, &new->xbfi_list);
 }
 
 /*
@@ -759,7 +726,7 @@ xfs_bmap_extents_to_btree(
 	if (*firstblock == NULLFSBLOCK) {
 		args.type = XFS_ALLOCTYPE_START_BNO;
 		args.fsbno = XFS_INO_TO_FSB(mp, ip->i_ino);
-	} else if (flist->xbf_low) {
+	} else if (flist->dop_low) {
 		args.type = XFS_ALLOCTYPE_START_BNO;
 		args.fsbno = *firstblock;
 	} else {
@@ -780,7 +747,7 @@ xfs_bmap_extents_to_btree(
 	ASSERT(args.fsbno != NULLFSBLOCK);
 	ASSERT(*firstblock == NULLFSBLOCK ||
 	       args.agno == XFS_FSB_TO_AGNO(mp, *firstblock) ||
-	       (flist->xbf_low &&
+	       (flist->dop_low &&
 		args.agno > XFS_FSB_TO_AGNO(mp, *firstblock)));
 	*firstblock = cur->bc_private.b.firstblock = args.fsbno;
 	cur->bc_private.b.allocated++;
@@ -3700,7 +3667,7 @@ xfs_bmap_btalloc(
 			error = xfs_bmap_btalloc_nullfb(ap, &args, &blen);
 		if (error)
 			return error;
-	} else if (ap->flist->xbf_low) {
+	} else if (ap->flist->dop_low) {
 		if (xfs_inode_is_filestream(ap->ip))
 			args.type = XFS_ALLOCTYPE_FIRST_AG;
 		else
@@ -3733,7 +3700,7 @@ xfs_bmap_btalloc(
 	 * is >= the stripe unit and the allocation offset is
 	 * at the end of file.
 	 */
-	if (!ap->flist->xbf_low && ap->aeof) {
+	if (!ap->flist->dop_low && ap->aeof) {
 		if (!ap->offset) {
 			args.alignment = stripe_align;
 			atype = args.type;
@@ -3826,7 +3793,7 @@ xfs_bmap_btalloc(
 		args.minleft = 0;
 		if ((error = xfs_alloc_vextent(&args)))
 			return error;
-		ap->flist->xbf_low = 1;
+		ap->flist->dop_low = true;
 	}
 	if (args.fsbno != NULLFSBLOCK) {
 		/*
@@ -3836,7 +3803,7 @@ xfs_bmap_btalloc(
 		ASSERT(*ap->firstblock == NULLFSBLOCK ||
 		       XFS_FSB_TO_AGNO(mp, *ap->firstblock) ==
 		       XFS_FSB_TO_AGNO(mp, args.fsbno) ||
-		       (ap->flist->xbf_low &&
+		       (ap->flist->dop_low &&
 			XFS_FSB_TO_AGNO(mp, *ap->firstblock) <
 			XFS_FSB_TO_AGNO(mp, args.fsbno)));
 
@@ -3844,7 +3811,7 @@ xfs_bmap_btalloc(
 		if (*ap->firstblock == NULLFSBLOCK)
 			*ap->firstblock = args.fsbno;
 		ASSERT(nullfb || fb_agno == args.agno ||
-		       (ap->flist->xbf_low && fb_agno < args.agno));
+		       (ap->flist->dop_low && fb_agno < args.agno));
 		ap->length = args.len;
 		ap->ip->i_d.di_nblocks += args.len;
 		xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
@@ -4311,7 +4278,7 @@ xfs_bmapi_allocate(
 	if (error)
 		return error;
 
-	if (bma->flist->xbf_low)
+	if (bma->flist->dop_low)
 		bma->minleft = 0;
 	if (bma->cur)
 		bma->cur->bc_private.b.firstblock = *bma->firstblock;
@@ -4676,7 +4643,7 @@ error0:
 			       XFS_FSB_TO_AGNO(mp, *firstblock) ==
 			       XFS_FSB_TO_AGNO(mp,
 				       bma.cur->bc_private.b.firstblock) ||
-			       (flist->xbf_low &&
+			       (flist->dop_low &&
 				XFS_FSB_TO_AGNO(mp, *firstblock) <
 				XFS_FSB_TO_AGNO(mp,
 					bma.cur->bc_private.b.firstblock)));
diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
index c165b2d..9a74610 100644
--- a/libxfs/xfs_bmap.h
+++ b/libxfs/xfs_bmap.h
@@ -69,27 +69,6 @@ struct xfs_bmap_free_item
 	struct list_head	xbfi_list;
 };
 
-/*
- * Header for free extent list.
- *
- * xbf_low is used by the allocator to activate the lowspace algorithm -
- * when free space is running low the extent allocator may choose to
- * allocate an extent from an AG without leaving sufficient space for
- * a btree split when inserting the new extent.  In this case the allocator
- * will enable the lowspace algorithm which is supposed to allow further
- * allocations (such as btree splits and newroots) to allocate from
- * sequential AGs.  In order to avoid locking AGs out of order the lowspace
- * algorithm will start searching for free space from AG 0.  If the correct
- * transaction reservations have been made then this algorithm will eventually
- * find all the space it needs.
- */
-typedef	struct xfs_bmap_free
-{
-	struct list_head	xbf_flist;	/* list of to-be-free extents */
-	int			xbf_count;	/* count of items on list */
-	int			xbf_low;	/* alloc in low mode */
-} xfs_bmap_free_t;
-
 #define	XFS_BMAP_MAX_NMAP	4
 
 /*
@@ -139,14 +118,6 @@ static inline int xfs_bmapi_aflag(int w)
 #define	DELAYSTARTBLOCK		((xfs_fsblock_t)-1LL)
 #define	HOLESTARTBLOCK		((xfs_fsblock_t)-2LL)
 
-static inline void xfs_bmap_init(xfs_bmap_free_t *flp, xfs_fsblock_t *fbp)
-{
-	INIT_LIST_HEAD(&flp->xbf_flist);
-	flp->xbf_count = 0;
-	flp->xbf_low = 0;
-	*fbp = NULLFSBLOCK;
-}
-
 /*
  * Flags for xfs_bmap_add_extent*.
  */
@@ -195,9 +166,6 @@ int	xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd);
 void	xfs_bmap_local_to_extents_empty(struct xfs_inode *ip, int whichfork);
 void	xfs_bmap_add_free(struct xfs_mount *mp, struct xfs_bmap_free *flist,
 			  xfs_fsblock_t bno, xfs_filblks_t len);
-void	xfs_bmap_cancel(struct xfs_bmap_free *flist);
-int	xfs_bmap_finish(struct xfs_trans **tp, struct xfs_bmap_free *flist,
-			struct xfs_inode *ip);
 void	xfs_bmap_compute_maxlevels(struct xfs_mount *mp, int whichfork);
 int	xfs_bmap_first_unused(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_extlen_t len, xfs_fileoff_t *unused, int whichfork);
diff --git a/libxfs/xfs_bmap_btree.c b/libxfs/xfs_bmap_btree.c
index 2ae701e..8d4d4b0 100644
--- a/libxfs/xfs_bmap_btree.c
+++ b/libxfs/xfs_bmap_btree.c
@@ -23,6 +23,7 @@
 #include "xfs_trans_resv.h"
 #include "xfs_bit.h"
 #include "xfs_mount.h"
+#include "xfs_defer.h"
 #include "xfs_inode.h"
 #include "xfs_trans.h"
 #include "xfs_alloc.h"
@@ -459,7 +460,7 @@ xfs_bmbt_alloc_block(
 		 * block allocation here and corrupt the filesystem.
 		 */
 		args.minleft = args.tp->t_blk_res;
-	} else if (cur->bc_private.b.flist->xbf_low) {
+	} else if (cur->bc_private.b.flist->dop_low) {
 		args.type = XFS_ALLOCTYPE_START_BNO;
 	} else {
 		args.type = XFS_ALLOCTYPE_NEAR_BNO;
@@ -487,7 +488,7 @@ xfs_bmbt_alloc_block(
 		error = xfs_alloc_vextent(&args);
 		if (error)
 			goto error0;
-		cur->bc_private.b.flist->xbf_low = 1;
+		cur->bc_private.b.flist->dop_low = true;
 	}
 	if (args.fsbno == NULLFSBLOCK) {
 		XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 4f48878..c5a475f 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -23,6 +23,7 @@
 #include "xfs_trans_resv.h"
 #include "xfs_bit.h"
 #include "xfs_mount.h"
+#include "xfs_defer.h"
 #include "xfs_inode.h"
 #include "xfs_trans.h"
 #include "xfs_btree.h"
diff --git a/libxfs/xfs_defer.h b/libxfs/xfs_defer.h
index 743fc32..4c05ba6 100644
--- a/libxfs/xfs_defer.h
+++ b/libxfs/xfs_defer.h
@@ -94,4 +94,11 @@ struct xfs_defer_op_type {
 void xfs_defer_init_op_type(const struct xfs_defer_op_type *type);
 void xfs_defer_init_types(void);
 
+/* XXX: compatibility shims, will go away in the next patch */
+#define xfs_bmap_finish		xfs_defer_finish
+#define xfs_bmap_cancel		xfs_defer_cancel
+#define xfs_bmap_init		xfs_defer_init
+#define xfs_bmap_free		xfs_defer_ops
+typedef struct xfs_defer_ops	xfs_bmap_free_t;
+
 #endif /* __XFS_DEFER_H__ */
diff --git a/libxfs/xfs_dir2.c b/libxfs/xfs_dir2.c
index 723b267..402611f 100644
--- a/libxfs/xfs_dir2.c
+++ b/libxfs/xfs_dir2.c
@@ -21,6 +21,7 @@
 #include "xfs_log_format.h"
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
+#include "xfs_defer.h"
 #include "xfs_da_format.h"
 #include "xfs_da_btree.h"
 #include "xfs_inode.h"
diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index 249512b..44d5e76 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -24,6 +24,7 @@
 #include "xfs_bit.h"
 #include "xfs_sb.h"
 #include "xfs_mount.h"
+#include "xfs_defer.h"
 #include "xfs_inode.h"
 #include "xfs_btree.h"
 #include "xfs_ialloc.h"
diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index c21a4e6..572c101 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -22,6 +22,7 @@
 #include "xfs_log_format.h"
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
+#include "xfs_defer.h"
 #include "xfs_inode.h"
 #include "xfs_cksum.h"
 #include "xfs_trans.h"
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index e2cc83e..a4ee48e 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -24,6 +24,7 @@
 #include "xfs_bit.h"
 #include "xfs_sb.h"
 #include "xfs_mount.h"
+#include "xfs_defer.h"
 #include "xfs_inode.h"
 #include "xfs_ialloc.h"
 #include "xfs_alloc.h"


* [PATCH 024/145] xfs: change xfs_bmap_{finish, cancel, init, free} -> xfs_defer_*
From: Darrick J. Wong @ 2016-06-17  1:33 UTC
  To: david, darrick.wong; +Cc: xfs

Drop the compatibility shims that we were using to integrate the new
deferred operation mechanism into the existing code, and convert all
callers to the xfs_defer_* names directly.  No new code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/libxfs.h         |    3 --
 libxfs/util.c            |    8 +++--
 libxfs/xfs_attr.c        |   58 ++++++++++++++++++++-------------------
 libxfs/xfs_attr_remote.c |   14 +++++----
 libxfs/xfs_bmap.c        |   38 +++++++++++++-------------
 libxfs/xfs_bmap.h        |   10 +++----
 libxfs/xfs_btree.h       |    5 ++-
 libxfs/xfs_da_btree.h    |    4 +--
 libxfs/xfs_defer.h       |    7 -----
 libxfs/xfs_dir2.c        |    6 ++--
 libxfs/xfs_dir2.h        |    8 +++--
 libxfs/xfs_ialloc.c      |    6 ++--
 libxfs/xfs_ialloc.h      |    2 +
 libxfs/xfs_trans_resv.c  |    4 +--
 mkfs/proto.c             |   30 ++++++++++----------
 repair/phase6.c          |   68 +++++++++++++++++++++++-----------------------
 16 files changed, 131 insertions(+), 140 deletions(-)
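
Illustrative sketch, not part of the patch: most hunks below repeat one
calling pattern, namely init, queue work, finish, and cancel on error.
The block below models that pattern with a toy singly linked queue.  It
makes no claim about how xfs_defer_finish() behaves internally; the
toy_ names, the LIFO queue and the simulated failure flag are
assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_work {
	struct toy_work	*next;
	int		fail;	/* nonzero: simulate an error when processed */
};

struct toy_defer_ops {
	struct toy_work	*head;
};

static int toy_live_items;	/* allocated but not yet freed */

static void
toy_defer_init(struct toy_defer_ops *dop)
{
	dop->head = NULL;
}

/* Queue one work item at the head of the list. */
static int
toy_defer_add(struct toy_defer_ops *dop, int fail)
{
	struct toy_work	*w = malloc(sizeof(*w));

	if (!w)
		return -1;
	w->fail = fail;
	w->next = dop->head;
	dop->head = w;
	toy_live_items++;
	return 0;
}

/* Process and free queued items; stop at the first failure. */
static int
toy_defer_finish(struct toy_defer_ops *dop)
{
	while (dop->head) {
		struct toy_work	*w = dop->head;

		if (w->fail)
			return -1;	/* remaining items stay queued */
		dop->head = w->next;
		free(w);
		toy_live_items--;
	}
	return 0;
}

/* Free everything still queued without processing it. */
static void
toy_defer_cancel(struct toy_defer_ops *dop)
{
	while (dop->head) {
		struct toy_work	*w = dop->head;

		dop->head = w->next;
		free(w);
		toy_live_items--;
	}
}

/* The calling pattern: init, queue, finish, cancel on error. */
static int
toy_do_update(int nr_items, int fail_last)
{
	struct toy_defer_ops	dfops;
	int			error = 0;
	int			i;

	toy_defer_init(&dfops);
	/* The first item added ends up last in this LIFO queue. */
	for (i = 0; i < nr_items && !error; i++)
		error = toy_defer_add(&dfops, fail_last && i == 0);
	if (!error)
		error = toy_defer_finish(&dfops);
	if (error) {
		toy_defer_cancel(&dfops);
		return error;
	}
	return 0;
}
```

The point of the cancel call on the error path is that work left on the
queue after a failed finish is still released, so the caller does not
leak items regardless of where the failure occurred.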


diff --git a/include/libxfs.h b/include/libxfs.h
index 71e758b..863b0e3 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -164,9 +164,6 @@ extern unsigned int	libxfs_log2_roundup(unsigned int i);
 
 extern int	libxfs_alloc_file_space (struct xfs_inode *, xfs_off_t,
 				xfs_off_t, int, int);
-#define libxfs_bmap_finish	xfs_defer_finish
-#define libxfs_bmap_cancel	xfs_defer_cancel
-#define xfs_bmap_free_t		struct xfs_defer_ops
 
 extern void 	libxfs_fs_repair_cmn_err(int, struct xfs_mount *, char *, ...);
 extern void	libxfs_fs_cmn_err(int, struct xfs_mount *, char *, ...);
diff --git a/libxfs/util.c b/libxfs/util.c
index ab007b0..4b81818 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -493,7 +493,7 @@ libxfs_alloc_file_space(
 	xfs_filblks_t	allocated_fsb;
 	xfs_filblks_t	allocatesize_fsb;
 	xfs_fsblock_t	firstfsb;
-	xfs_bmap_free_t free_list;
+	struct xfs_defer_ops free_list;
 	xfs_bmbt_irec_t *imapp;
 	xfs_bmbt_irec_t imaps[1];
 	int		reccount;
@@ -535,7 +535,7 @@ libxfs_alloc_file_space(
 		}
 		xfs_trans_ijoin(tp, ip, 0);
 
-		xfs_bmap_init(&free_list, &firstfsb);
+		xfs_defer_init(&free_list, &firstfsb);
 		error = xfs_bmapi_write(tp, ip, startoffset_fsb, allocatesize_fsb,
 				xfs_bmapi_flags, &firstfsb, 0, imapp,
 				&reccount, &free_list);
@@ -544,7 +544,7 @@ libxfs_alloc_file_space(
 			goto error0;
 
 		/* complete the transaction */
-		error = xfs_bmap_finish(&tp, &free_list, ip);
+		error = xfs_defer_finish(&tp, &free_list, ip);
 		if (error)
 			goto error0;
 
@@ -562,7 +562,7 @@ libxfs_alloc_file_space(
 	return error;
 
 error0:	/* Cancel bmap, cancel trans */
-	xfs_bmap_cancel(&free_list);
+	xfs_defer_cancel(&free_list);
 	xfs_trans_cancel(tp);
 	return error;
 }
diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index ee370dc..06b3c5d 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -199,7 +199,7 @@ xfs_attr_set(
 {
 	struct xfs_mount	*mp = dp->i_mount;
 	struct xfs_da_args	args;
-	struct xfs_bmap_free	flist;
+	struct xfs_defer_ops	flist;
 	struct xfs_trans_res	tres;
 	xfs_fsblock_t		firstblock;
 	int			rsvd = (flags & ATTR_ROOT) != 0;
@@ -312,13 +312,13 @@ xfs_attr_set(
 		 * It won't fit in the shortform, transform to a leaf block.
 		 * GROT: another possible req'mt for a double-split btree op.
 		 */
-		xfs_bmap_init(args.flist, args.firstblock);
+		xfs_defer_init(args.flist, args.firstblock);
 		error = xfs_attr_shortform_to_leaf(&args);
 		if (!error)
-			error = xfs_bmap_finish(&args.trans, args.flist, dp);
+			error = xfs_defer_finish(&args.trans, args.flist, dp);
 		if (error) {
 			args.trans = NULL;
-			xfs_bmap_cancel(&flist);
+			xfs_defer_cancel(&flist);
 			goto out;
 		}
 
@@ -378,7 +378,7 @@ xfs_attr_remove(
 {
 	struct xfs_mount	*mp = dp->i_mount;
 	struct xfs_da_args	args;
-	struct xfs_bmap_free	flist;
+	struct xfs_defer_ops	flist;
 	xfs_fsblock_t		firstblock;
 	int			error;
 
@@ -580,13 +580,13 @@ xfs_attr_leaf_addname(xfs_da_args_t *args)
 		 * Commit that transaction so that the node_addname() call
 		 * can manage its own transactions.
 		 */
-		xfs_bmap_init(args->flist, args->firstblock);
+		xfs_defer_init(args->flist, args->firstblock);
 		error = xfs_attr3_leaf_to_node(args);
 		if (!error)
-			error = xfs_bmap_finish(&args->trans, args->flist, dp);
+			error = xfs_defer_finish(&args->trans, args->flist, dp);
 		if (error) {
 			args->trans = NULL;
-			xfs_bmap_cancel(args->flist);
+			xfs_defer_cancel(args->flist);
 			return error;
 		}
 
@@ -670,15 +670,15 @@ xfs_attr_leaf_addname(xfs_da_args_t *args)
 		 * If the result is small enough, shrink it all into the inode.
 		 */
 		if ((forkoff = xfs_attr_shortform_allfit(bp, dp))) {
-			xfs_bmap_init(args->flist, args->firstblock);
+			xfs_defer_init(args->flist, args->firstblock);
 			error = xfs_attr3_leaf_to_shortform(bp, args, forkoff);
 			/* bp is gone due to xfs_da_shrink_inode */
 			if (!error)
-				error = xfs_bmap_finish(&args->trans,
+				error = xfs_defer_finish(&args->trans,
 							args->flist, dp);
 			if (error) {
 				args->trans = NULL;
-				xfs_bmap_cancel(args->flist);
+				xfs_defer_cancel(args->flist);
 				return error;
 			}
 		}
@@ -733,14 +733,14 @@ xfs_attr_leaf_removename(xfs_da_args_t *args)
 	 * If the result is small enough, shrink it all into the inode.
 	 */
 	if ((forkoff = xfs_attr_shortform_allfit(bp, dp))) {
-		xfs_bmap_init(args->flist, args->firstblock);
+		xfs_defer_init(args->flist, args->firstblock);
 		error = xfs_attr3_leaf_to_shortform(bp, args, forkoff);
 		/* bp is gone due to xfs_da_shrink_inode */
 		if (!error)
-			error = xfs_bmap_finish(&args->trans, args->flist, dp);
+			error = xfs_defer_finish(&args->trans, args->flist, dp);
 		if (error) {
 			args->trans = NULL;
-			xfs_bmap_cancel(args->flist);
+			xfs_defer_cancel(args->flist);
 			return error;
 		}
 	}
@@ -859,14 +859,14 @@ restart:
 			 */
 			xfs_da_state_free(state);
 			state = NULL;
-			xfs_bmap_init(args->flist, args->firstblock);
+			xfs_defer_init(args->flist, args->firstblock);
 			error = xfs_attr3_leaf_to_node(args);
 			if (!error)
-				error = xfs_bmap_finish(&args->trans,
+				error = xfs_defer_finish(&args->trans,
 							args->flist, dp);
 			if (error) {
 				args->trans = NULL;
-				xfs_bmap_cancel(args->flist);
+				xfs_defer_cancel(args->flist);
 				goto out;
 			}
 
@@ -887,13 +887,13 @@ restart:
 		 * in the index/blkno/rmtblkno/rmtblkcnt fields and
 		 * in the index2/blkno2/rmtblkno2/rmtblkcnt2 fields.
 		 */
-		xfs_bmap_init(args->flist, args->firstblock);
+		xfs_defer_init(args->flist, args->firstblock);
 		error = xfs_da3_split(state);
 		if (!error)
-			error = xfs_bmap_finish(&args->trans, args->flist, dp);
+			error = xfs_defer_finish(&args->trans, args->flist, dp);
 		if (error) {
 			args->trans = NULL;
-			xfs_bmap_cancel(args->flist);
+			xfs_defer_cancel(args->flist);
 			goto out;
 		}
 	} else {
@@ -986,14 +986,14 @@ restart:
 		 * Check to see if the tree needs to be collapsed.
 		 */
 		if (retval && (state->path.active > 1)) {
-			xfs_bmap_init(args->flist, args->firstblock);
+			xfs_defer_init(args->flist, args->firstblock);
 			error = xfs_da3_join(state);
 			if (!error)
-				error = xfs_bmap_finish(&args->trans,
+				error = xfs_defer_finish(&args->trans,
 							args->flist, dp);
 			if (error) {
 				args->trans = NULL;
-				xfs_bmap_cancel(args->flist);
+				xfs_defer_cancel(args->flist);
 				goto out;
 			}
 		}
@@ -1109,13 +1109,13 @@ xfs_attr_node_removename(xfs_da_args_t *args)
 	 * Check to see if the tree needs to be collapsed.
 	 */
 	if (retval && (state->path.active > 1)) {
-		xfs_bmap_init(args->flist, args->firstblock);
+		xfs_defer_init(args->flist, args->firstblock);
 		error = xfs_da3_join(state);
 		if (!error)
-			error = xfs_bmap_finish(&args->trans, args->flist, dp);
+			error = xfs_defer_finish(&args->trans, args->flist, dp);
 		if (error) {
 			args->trans = NULL;
-			xfs_bmap_cancel(args->flist);
+			xfs_defer_cancel(args->flist);
 			goto out;
 		}
 		/*
@@ -1142,15 +1142,15 @@ xfs_attr_node_removename(xfs_da_args_t *args)
 			goto out;
 
 		if ((forkoff = xfs_attr_shortform_allfit(bp, dp))) {
-			xfs_bmap_init(args->flist, args->firstblock);
+			xfs_defer_init(args->flist, args->firstblock);
 			error = xfs_attr3_leaf_to_shortform(bp, args, forkoff);
 			/* bp is gone due to xfs_da_shrink_inode */
 			if (!error)
-				error = xfs_bmap_finish(&args->trans,
+				error = xfs_defer_finish(&args->trans,
 							args->flist, dp);
 			if (error) {
 				args->trans = NULL;
-				xfs_bmap_cancel(args->flist);
+				xfs_defer_cancel(args->flist);
 				goto out;
 			}
 		} else
diff --git a/libxfs/xfs_attr_remote.c b/libxfs/xfs_attr_remote.c
index ed4cc94..dc545f0 100644
--- a/libxfs/xfs_attr_remote.c
+++ b/libxfs/xfs_attr_remote.c
@@ -456,16 +456,16 @@ xfs_attr_rmtval_set(
 		 * extent and then crash then the block may not contain the
 		 * correct metadata after log recovery occurs.
 		 */
-		xfs_bmap_init(args->flist, args->firstblock);
+		xfs_defer_init(args->flist, args->firstblock);
 		nmap = 1;
 		error = xfs_bmapi_write(args->trans, dp, (xfs_fileoff_t)lblkno,
 				  blkcnt, XFS_BMAPI_ATTRFORK, args->firstblock,
 				  args->total, &map, &nmap, args->flist);
 		if (!error)
-			error = xfs_bmap_finish(&args->trans, args->flist, dp);
+			error = xfs_defer_finish(&args->trans, args->flist, dp);
 		if (error) {
 			args->trans = NULL;
-			xfs_bmap_cancel(args->flist);
+			xfs_defer_cancel(args->flist);
 			return error;
 		}
 
@@ -499,7 +499,7 @@ xfs_attr_rmtval_set(
 
 		ASSERT(blkcnt > 0);
 
-		xfs_bmap_init(args->flist, args->firstblock);
+		xfs_defer_init(args->flist, args->firstblock);
 		nmap = 1;
 		error = xfs_bmapi_read(dp, (xfs_fileoff_t)lblkno,
 				       blkcnt, &map, &nmap,
@@ -599,16 +599,16 @@ xfs_attr_rmtval_remove(
 	blkcnt = args->rmtblkcnt;
 	done = 0;
 	while (!done) {
-		xfs_bmap_init(args->flist, args->firstblock);
+		xfs_defer_init(args->flist, args->firstblock);
 		error = xfs_bunmapi(args->trans, args->dp, lblkno, blkcnt,
 				    XFS_BMAPI_ATTRFORK, 1, args->firstblock,
 				    args->flist, &done);
 		if (!error)
-			error = xfs_bmap_finish(&args->trans, args->flist,
+			error = xfs_defer_finish(&args->trans, args->flist,
 						args->dp);
 		if (error) {
 			args->trans = NULL;
-			xfs_bmap_cancel(args->flist);
+			xfs_defer_cancel(args->flist);
 			return error;
 		}
 
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index a2d8268..0faa0a9 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -564,7 +564,7 @@ xfs_bmap_validate_ret(
 void
 xfs_bmap_add_free(
 	struct xfs_mount	*mp,		/* mount point structure */
-	struct xfs_bmap_free	*flist,		/* list of extents */
+	struct xfs_defer_ops	*flist,		/* list of extents */
 	xfs_fsblock_t		bno,		/* fs block number of extent */
 	xfs_filblks_t		len)		/* length of extent */
 {
@@ -664,7 +664,7 @@ xfs_bmap_extents_to_btree(
 	xfs_trans_t		*tp,		/* transaction pointer */
 	xfs_inode_t		*ip,		/* incore inode pointer */
 	xfs_fsblock_t		*firstblock,	/* first-block-allocated */
-	xfs_bmap_free_t		*flist,		/* blocks freed in xaction */
+	struct xfs_defer_ops	*flist,		/* blocks freed in xaction */
 	xfs_btree_cur_t		**curp,		/* cursor returned to caller */
 	int			wasdel,		/* converting a delayed alloc */
 	int			*logflagsp,	/* inode logging flags */
@@ -932,7 +932,7 @@ xfs_bmap_add_attrfork_btree(
 	xfs_trans_t		*tp,		/* transaction pointer */
 	xfs_inode_t		*ip,		/* incore inode pointer */
 	xfs_fsblock_t		*firstblock,	/* first block allocated */
-	xfs_bmap_free_t		*flist,		/* blocks to free at commit */
+	struct xfs_defer_ops	*flist,		/* blocks to free at commit */
 	int			*flags)		/* inode logging flags */
 {
 	xfs_btree_cur_t		*cur;		/* btree cursor */
@@ -975,7 +975,7 @@ xfs_bmap_add_attrfork_extents(
 	xfs_trans_t		*tp,		/* transaction pointer */
 	xfs_inode_t		*ip,		/* incore inode pointer */
 	xfs_fsblock_t		*firstblock,	/* first block allocated */
-	xfs_bmap_free_t		*flist,		/* blocks to free at commit */
+	struct xfs_defer_ops	*flist,		/* blocks to free at commit */
 	int			*flags)		/* inode logging flags */
 {
 	xfs_btree_cur_t		*cur;		/* bmap btree cursor */
@@ -1010,7 +1010,7 @@ xfs_bmap_add_attrfork_local(
 	xfs_trans_t		*tp,		/* transaction pointer */
 	xfs_inode_t		*ip,		/* incore inode pointer */
 	xfs_fsblock_t		*firstblock,	/* first block allocated */
-	xfs_bmap_free_t		*flist,		/* blocks to free at commit */
+	struct xfs_defer_ops	*flist,		/* blocks to free at commit */
 	int			*flags)		/* inode logging flags */
 {
 	xfs_da_args_t		dargs;		/* args for dir/attr code */
@@ -1051,7 +1051,7 @@ xfs_bmap_add_attrfork(
 	int			rsvd)		/* xact may use reserved blks */
 {
 	xfs_fsblock_t		firstblock;	/* 1st block/ag allocated */
-	xfs_bmap_free_t		flist;		/* freed extent records */
+	struct xfs_defer_ops	flist;		/* freed extent records */
 	xfs_mount_t		*mp;		/* mount structure */
 	xfs_trans_t		*tp;		/* transaction pointer */
 	int			blks;		/* space reservation */
@@ -1117,7 +1117,7 @@ xfs_bmap_add_attrfork(
 	ip->i_afp = kmem_zone_zalloc(xfs_ifork_zone, KM_SLEEP);
 	ip->i_afp->if_flags = XFS_IFEXTENTS;
 	logflags = 0;
-	xfs_bmap_init(&flist, &firstblock);
+	xfs_defer_init(&flist, &firstblock);
 	switch (ip->i_d.di_format) {
 	case XFS_DINODE_FMT_LOCAL:
 		error = xfs_bmap_add_attrfork_local(tp, ip, &firstblock, &flist,
@@ -1157,7 +1157,7 @@ xfs_bmap_add_attrfork(
 			xfs_log_sb(tp);
 	}
 
-	error = xfs_bmap_finish(&tp, &flist, NULL);
+	error = xfs_defer_finish(&tp, &flist, NULL);
 	if (error)
 		goto bmap_cancel;
 	error = xfs_trans_commit(tp);
@@ -1165,7 +1165,7 @@ xfs_bmap_add_attrfork(
 	return error;
 
 bmap_cancel:
-	xfs_bmap_cancel(&flist);
+	xfs_defer_cancel(&flist);
 trans_cancel:
 	xfs_trans_cancel(tp);
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
@@ -2206,7 +2206,7 @@ xfs_bmap_add_extent_unwritten_real(
 	xfs_btree_cur_t		**curp,	/* if *curp is null, not a btree */
 	xfs_bmbt_irec_t		*new,	/* new data to add to file extents */
 	xfs_fsblock_t		*first,	/* pointer to firstblock variable */
-	xfs_bmap_free_t		*flist,	/* list of extents to be freed */
+	struct xfs_defer_ops	*flist,	/* list of extents to be freed */
 	int			*logflagsp) /* inode logging flags */
 {
 	xfs_btree_cur_t		*cur;	/* btree cursor */
@@ -4439,7 +4439,7 @@ xfs_bmapi_write(
 	xfs_extlen_t		total,		/* total blocks needed */
 	struct xfs_bmbt_irec	*mval,		/* output: map values */
 	int			*nmap,		/* i/o: mval size/count */
-	struct xfs_bmap_free	*flist)		/* i/o: list extents to free */
+	struct xfs_defer_ops	*flist)		/* i/o: list extents to free */
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_ifork	*ifp;
@@ -4727,7 +4727,7 @@ xfs_bmap_del_extent(
 	xfs_inode_t		*ip,	/* incore inode pointer */
 	xfs_trans_t		*tp,	/* current transaction pointer */
 	xfs_extnum_t		*idx,	/* extent number to update/delete */
-	xfs_bmap_free_t		*flist,	/* list of extents to be freed */
+	struct xfs_defer_ops	*flist,	/* list of extents to be freed */
 	xfs_btree_cur_t		*cur,	/* if null, not a btree */
 	xfs_bmbt_irec_t		*del,	/* data to remove from extents */
 	int			*logflagsp, /* inode logging flags */
@@ -5056,7 +5056,7 @@ xfs_bunmapi(
 	xfs_extnum_t		nexts,		/* number of extents max */
 	xfs_fsblock_t		*firstblock,	/* first allocated block
 						   controls a.g. for allocs */
-	xfs_bmap_free_t		*flist,		/* i/o: list extents to free */
+	struct xfs_defer_ops	*flist,		/* i/o: list extents to free */
 	int			*done)		/* set if not done yet */
 {
 	xfs_btree_cur_t		*cur;		/* bmap btree cursor */
@@ -5670,7 +5670,7 @@ xfs_bmap_shift_extents(
 	int			*done,
 	xfs_fileoff_t		stop_fsb,
 	xfs_fsblock_t		*firstblock,
-	struct xfs_bmap_free	*flist,
+	struct xfs_defer_ops	*flist,
 	enum shift_direction	direction,
 	int			num_exts)
 {
@@ -5824,7 +5824,7 @@ xfs_bmap_split_extent_at(
 	struct xfs_inode	*ip,
 	xfs_fileoff_t		split_fsb,
 	xfs_fsblock_t		*firstfsb,
-	struct xfs_bmap_free	*free_list)
+	struct xfs_defer_ops	*free_list)
 {
 	int				whichfork = XFS_DATA_FORK;
 	struct xfs_btree_cur		*cur = NULL;
@@ -5963,7 +5963,7 @@ xfs_bmap_split_extent(
 {
 	struct xfs_mount        *mp = ip->i_mount;
 	struct xfs_trans        *tp;
-	struct xfs_bmap_free    free_list;
+	struct xfs_defer_ops    free_list;
 	xfs_fsblock_t           firstfsb;
 	int                     error;
 
@@ -5975,21 +5975,21 @@ xfs_bmap_split_extent(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
 
-	xfs_bmap_init(&free_list, &firstfsb);
+	xfs_defer_init(&free_list, &firstfsb);
 
 	error = xfs_bmap_split_extent_at(tp, ip, split_fsb,
 			&firstfsb, &free_list);
 	if (error)
 		goto out;
 
-	error = xfs_bmap_finish(&tp, &free_list, NULL);
+	error = xfs_defer_finish(&tp, &free_list, NULL);
 	if (error)
 		goto out;
 
 	return xfs_trans_commit(tp);
 
 out:
-	xfs_bmap_cancel(&free_list);
+	xfs_defer_cancel(&free_list);
 	xfs_trans_cancel(tp);
 	return error;
 }
diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
index 9a74610..a779cc5 100644
--- a/libxfs/xfs_bmap.h
+++ b/libxfs/xfs_bmap.h
@@ -32,7 +32,7 @@ extern kmem_zone_t	*xfs_bmap_free_item_zone;
  */
 struct xfs_bmalloca {
 	xfs_fsblock_t		*firstblock; /* i/o first block allocated */
-	struct xfs_bmap_free	*flist;	/* bmap freelist */
+	struct xfs_defer_ops	*flist;	/* bmap freelist */
 	struct xfs_trans	*tp;	/* transaction pointer */
 	struct xfs_inode	*ip;	/* incore inode pointer */
 	struct xfs_bmbt_irec	prev;	/* extent before the new one */
@@ -164,7 +164,7 @@ void	xfs_bmap_trace_exlist(struct xfs_inode *ip, xfs_extnum_t cnt,
 
 int	xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd);
 void	xfs_bmap_local_to_extents_empty(struct xfs_inode *ip, int whichfork);
-void	xfs_bmap_add_free(struct xfs_mount *mp, struct xfs_bmap_free *flist,
+void	xfs_bmap_add_free(struct xfs_mount *mp, struct xfs_defer_ops *flist,
 			  xfs_fsblock_t bno, xfs_filblks_t len);
 void	xfs_bmap_compute_maxlevels(struct xfs_mount *mp, int whichfork);
 int	xfs_bmap_first_unused(struct xfs_trans *tp, struct xfs_inode *ip,
@@ -186,18 +186,18 @@ int	xfs_bmapi_write(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_fileoff_t bno, xfs_filblks_t len, int flags,
 		xfs_fsblock_t *firstblock, xfs_extlen_t total,
 		struct xfs_bmbt_irec *mval, int *nmap,
-		struct xfs_bmap_free *flist);
+		struct xfs_defer_ops *flist);
 int	xfs_bunmapi(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_fileoff_t bno, xfs_filblks_t len, int flags,
 		xfs_extnum_t nexts, xfs_fsblock_t *firstblock,
-		struct xfs_bmap_free *flist, int *done);
+		struct xfs_defer_ops *flist, int *done);
 int	xfs_check_nostate_extents(struct xfs_ifork *ifp, xfs_extnum_t idx,
 		xfs_extnum_t num);
 uint	xfs_default_attroffset(struct xfs_inode *ip);
 int	xfs_bmap_shift_extents(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_fileoff_t *next_fsb, xfs_fileoff_t offset_shift_fsb,
 		int *done, xfs_fileoff_t stop_fsb, xfs_fsblock_t *firstblock,
-		struct xfs_bmap_free *flist, enum shift_direction direction,
+		struct xfs_defer_ops *flist, enum shift_direction direction,
 		int num_exts);
 int	xfs_bmap_split_extent(struct xfs_inode *ip, xfs_fileoff_t split_offset);
 
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index 0ec3055..ae714a8 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -19,7 +19,7 @@
 #define	__XFS_BTREE_H__
 
 struct xfs_buf;
-struct xfs_bmap_free;
+struct xfs_defer_ops;
 struct xfs_inode;
 struct xfs_mount;
 struct xfs_trans;
@@ -234,11 +234,12 @@ typedef struct xfs_btree_cur
 	union {
 		struct {			/* needed for BNO, CNT, INO */
 			struct xfs_buf	*agbp;	/* agf/agi buffer pointer */
+			struct xfs_defer_ops *flist;	/* deferred updates */
 			xfs_agnumber_t	agno;	/* ag number */
 		} a;
 		struct {			/* needed for BMAP */
 			struct xfs_inode *ip;	/* pointer to our inode */
-			struct xfs_bmap_free *flist;	/* list to free after */
+			struct xfs_defer_ops *flist;	/* deferred updates */
 			xfs_fsblock_t	firstblock;	/* 1st blk allocated */
 			int		allocated;	/* count of alloced */
 			short		forksize;	/* fork's inode space */
diff --git a/libxfs/xfs_da_btree.h b/libxfs/xfs_da_btree.h
index 6e153e3..249813a 100644
--- a/libxfs/xfs_da_btree.h
+++ b/libxfs/xfs_da_btree.h
@@ -19,7 +19,7 @@
 #ifndef __XFS_DA_BTREE_H__
 #define	__XFS_DA_BTREE_H__
 
-struct xfs_bmap_free;
+struct xfs_defer_ops;
 struct xfs_inode;
 struct xfs_trans;
 struct zone;
@@ -70,7 +70,7 @@ typedef struct xfs_da_args {
 	xfs_ino_t	inumber;	/* input/output inode number */
 	struct xfs_inode *dp;		/* directory inode to manipulate */
 	xfs_fsblock_t	*firstblock;	/* ptr to firstblock for bmap calls */
-	struct xfs_bmap_free *flist;	/* ptr to freelist for bmap_finish */
+	struct xfs_defer_ops *flist;	/* ptr to freelist for bmap_finish */
 	struct xfs_trans *trans;	/* current trans (changes over time) */
 	xfs_extlen_t	total;		/* total blocks needed, for 1st bmap */
 	int		whichfork;	/* data or attribute fork */
diff --git a/libxfs/xfs_defer.h b/libxfs/xfs_defer.h
index 4c05ba6..743fc32 100644
--- a/libxfs/xfs_defer.h
+++ b/libxfs/xfs_defer.h
@@ -94,11 +94,4 @@ struct xfs_defer_op_type {
 void xfs_defer_init_op_type(const struct xfs_defer_op_type *type);
 void xfs_defer_init_types(void);
 
-/* XXX: compatibility shims, will go away in the next patch */
-#define xfs_bmap_finish		xfs_defer_finish
-#define xfs_bmap_cancel		xfs_defer_cancel
-#define xfs_bmap_init		xfs_defer_init
-#define xfs_bmap_free		xfs_defer_ops
-typedef struct xfs_defer_ops	xfs_bmap_free_t;
-
 #endif /* __XFS_DEFER_H__ */
diff --git a/libxfs/xfs_dir2.c b/libxfs/xfs_dir2.c
index 402611f..6edaa8e 100644
--- a/libxfs/xfs_dir2.c
+++ b/libxfs/xfs_dir2.c
@@ -258,7 +258,7 @@ xfs_dir_createname(
 	struct xfs_name		*name,
 	xfs_ino_t		inum,		/* new entry inode number */
 	xfs_fsblock_t		*first,		/* bmap's firstblock */
-	xfs_bmap_free_t		*flist,		/* bmap's freeblock list */
+	struct xfs_defer_ops	*flist,		/* bmap's freeblock list */
 	xfs_extlen_t		total)		/* bmap's total block count */
 {
 	struct xfs_da_args	*args;
@@ -435,7 +435,7 @@ xfs_dir_removename(
 	struct xfs_name	*name,
 	xfs_ino_t	ino,
 	xfs_fsblock_t	*first,		/* bmap's firstblock */
-	xfs_bmap_free_t	*flist,		/* bmap's freeblock list */
+	struct xfs_defer_ops	*flist,		/* bmap's freeblock list */
 	xfs_extlen_t	total)		/* bmap's total block count */
 {
 	struct xfs_da_args *args;
@@ -497,7 +497,7 @@ xfs_dir_replace(
 	struct xfs_name	*name,		/* name of entry to replace */
 	xfs_ino_t	inum,		/* new inode number */
 	xfs_fsblock_t	*first,		/* bmap's firstblock */
-	xfs_bmap_free_t	*flist,		/* bmap's freeblock list */
+	struct xfs_defer_ops	*flist,		/* bmap's freeblock list */
 	xfs_extlen_t	total)		/* bmap's total block count */
 {
 	struct xfs_da_args *args;
diff --git a/libxfs/xfs_dir2.h b/libxfs/xfs_dir2.h
index 0a62e73..5737d85 100644
--- a/libxfs/xfs_dir2.h
+++ b/libxfs/xfs_dir2.h
@@ -18,7 +18,7 @@
 #ifndef __XFS_DIR2_H__
 #define __XFS_DIR2_H__
 
-struct xfs_bmap_free;
+struct xfs_defer_ops;
 struct xfs_da_args;
 struct xfs_inode;
 struct xfs_mount;
@@ -129,18 +129,18 @@ extern int xfs_dir_init(struct xfs_trans *tp, struct xfs_inode *dp,
 extern int xfs_dir_createname(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name, xfs_ino_t inum,
 				xfs_fsblock_t *first,
-				struct xfs_bmap_free *flist, xfs_extlen_t tot);
+				struct xfs_defer_ops *flist, xfs_extlen_t tot);
 extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name, xfs_ino_t *inum,
 				struct xfs_name *ci_name);
 extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name, xfs_ino_t ino,
 				xfs_fsblock_t *first,
-				struct xfs_bmap_free *flist, xfs_extlen_t tot);
+				struct xfs_defer_ops *flist, xfs_extlen_t tot);
 extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name, xfs_ino_t inum,
 				xfs_fsblock_t *first,
-				struct xfs_bmap_free *flist, xfs_extlen_t tot);
+				struct xfs_defer_ops *flist, xfs_extlen_t tot);
 extern int xfs_dir_canenter(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name);
 
diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index 44d5e76..d03570c 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -1812,7 +1812,7 @@ xfs_difree_inode_chunk(
 	struct xfs_mount		*mp,
 	xfs_agnumber_t			agno,
 	struct xfs_inobt_rec_incore	*rec,
-	struct xfs_bmap_free		*flist)
+	struct xfs_defer_ops		*flist)
 {
 	xfs_agblock_t	sagbno = XFS_AGINO_TO_AGBNO(mp, rec->ir_startino);
 	int		startidx, endidx;
@@ -1884,7 +1884,7 @@ xfs_difree_inobt(
 	struct xfs_trans		*tp,
 	struct xfs_buf			*agbp,
 	xfs_agino_t			agino,
-	struct xfs_bmap_free		*flist,
+	struct xfs_defer_ops		*flist,
 	struct xfs_icluster		*xic,
 	struct xfs_inobt_rec_incore	*orec)
 {
@@ -2116,7 +2116,7 @@ int
 xfs_difree(
 	struct xfs_trans	*tp,		/* transaction pointer */
 	xfs_ino_t		inode,		/* inode to be freed */
-	struct xfs_bmap_free	*flist,		/* extents to free */
+	struct xfs_defer_ops	*flist,		/* extents to free */
 	struct xfs_icluster	*xic)	/* cluster info if deleted */
 {
 	/* REFERENCED */
diff --git a/libxfs/xfs_ialloc.h b/libxfs/xfs_ialloc.h
index 6e450df..2e06b67 100644
--- a/libxfs/xfs_ialloc.h
+++ b/libxfs/xfs_ialloc.h
@@ -95,7 +95,7 @@ int					/* error */
 xfs_difree(
 	struct xfs_trans *tp,		/* transaction pointer */
 	xfs_ino_t	inode,		/* inode to be freed */
-	struct xfs_bmap_free *flist,	/* extents to free */
+	struct xfs_defer_ops *flist,	/* extents to free */
 	struct xfs_icluster *ifree);	/* cluster info if deleted */
 
 /*
diff --git a/libxfs/xfs_trans_resv.c b/libxfs/xfs_trans_resv.c
index 9e519a5..c55220f 100644
--- a/libxfs/xfs_trans_resv.c
+++ b/libxfs/xfs_trans_resv.c
@@ -152,9 +152,9 @@ xfs_calc_finobt_res(
  * item logged to try to account for the overhead of the transaction mechanism.
  *
  * Note:  Most of the reservations underestimate the number of allocation
- * groups into which they could free extents in the xfs_bmap_finish() call.
+ * groups into which they could free extents in the xfs_defer_finish() call.
  * This is because the number in the worst case is quite high and quite
- * unusual.  In order to fix this we need to change xfs_bmap_finish() to free
+ * unusual.  In order to fix this we need to change xfs_defer_finish() to free
  * extents in only a single AG at a time.  This will require changes to the
  * EFI code as well, however, so that the EFI for the extents not freed is
  * logged again in each transaction.  See SGI PV #261917.
diff --git a/mkfs/proto.c b/mkfs/proto.c
index edbaa33..f0c33e4 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -27,7 +27,7 @@ static char *getstr(char **pp);
 static void fail(char *msg, int i);
 static void getres(struct xfs_mount *mp, uint blocks, struct xfs_trans **tpp);
 static void rsvfile(xfs_mount_t *mp, xfs_inode_t *ip, long long len);
-static int newfile(xfs_trans_t *tp, xfs_inode_t *ip, xfs_bmap_free_t *flist,
+static int newfile(xfs_trans_t *tp, xfs_inode_t *ip, struct xfs_defer_ops *flist,
 	xfs_fsblock_t *first, int dolocal, int logit, char *buf, int len);
 static char *newregfile(char **pp, int *len);
 static void rtinit(xfs_mount_t *mp);
@@ -236,7 +236,7 @@ static int
 newfile(
 	xfs_trans_t	*tp,
 	xfs_inode_t	*ip,
-	xfs_bmap_free_t	*flist,
+	struct xfs_defer_ops	*flist,
 	xfs_fsblock_t	*first,
 	int		dolocal,
 	int		logit,
@@ -329,7 +329,7 @@ newdirent(
 	struct xfs_name	*name,
 	xfs_ino_t	inum,
 	xfs_fsblock_t	*first,
-	xfs_bmap_free_t	*flist)
+	struct xfs_defer_ops	*flist)
 {
 	int	error;
 	int	rsv;
@@ -375,7 +375,7 @@ parseproto(
 	int		error;
 	xfs_fsblock_t	first;
 	int		flags;
-	xfs_bmap_free_t	flist;
+	struct xfs_defer_ops	flist;
 	int		fmt;
 	int		i;
 	xfs_inode_t	*ip;
@@ -460,7 +460,7 @@ parseproto(
 	xname.len = name ? strlen(name) : 0;
 	xname.type = 0;
 	flags = XFS_ILOG_CORE;
-	xfs_bmap_init(&flist, &first);
+	xfs_defer_init(&flist, &first);
 	switch (fmt) {
 	case IF_REGULAR:
 		buf = newregfile(pp, &len);
@@ -500,7 +500,7 @@ parseproto(
 		newdirent(mp, tp, pip, &xname, ip->i_ino, &first, &flist);
 		libxfs_trans_log_inode(tp, ip, flags);
 
-		error = -libxfs_bmap_finish(&tp, &flist, ip);
+		error = -xfs_defer_finish(&tp, &flist, ip);
 		if (error)
 			fail(_("Pre-allocated file creation failed"), error);
 		libxfs_trans_commit(tp);
@@ -582,7 +582,7 @@ parseproto(
 		}
 		newdirectory(mp, tp, ip, pip);
 		libxfs_trans_log_inode(tp, ip, flags);
-		error = -libxfs_bmap_finish(&tp, &flist, ip);
+		error = -xfs_defer_finish(&tp, &flist, ip);
 		if (error)
 			fail(_("Directory creation failed"), error);
 		libxfs_trans_commit(tp);
@@ -608,7 +608,7 @@ parseproto(
 		fail(_("Unknown format"), EINVAL);
 	}
 	libxfs_trans_log_inode(tp, ip, flags);
-	error = -libxfs_bmap_finish(&tp, &flist, ip);
+	error = -xfs_defer_finish(&tp, &flist, ip);
 	if (error) {
 		fail(_("Error encountered creating file from prototype file"),
 			error);
@@ -638,7 +638,7 @@ rtinit(
 	xfs_bmbt_irec_t	*ep;
 	int		error;
 	xfs_fsblock_t	first;
-	xfs_bmap_free_t	flist;
+	struct xfs_defer_ops	flist;
 	int		i;
 	xfs_bmbt_irec_t	map[XFS_BMAP_MAX_NMAP];
 	xfs_extlen_t	nsumblocks;
@@ -697,7 +697,7 @@ rtinit(
 
 	libxfs_trans_ijoin(tp, rbmip, 0);
 	bno = 0;
-	xfs_bmap_init(&flist, &first);
+	xfs_defer_init(&flist, &first);
 	while (bno < mp->m_sb.sb_rbmblocks) {
 		nmap = XFS_BMAP_MAX_NMAP;
 		error = -libxfs_bmapi_write(tp, rbmip, bno,
@@ -716,7 +716,7 @@ rtinit(
 		}
 	}
 
-	error = -libxfs_bmap_finish(&tp, &flist, rbmip);
+	error = -xfs_defer_finish(&tp, &flist, rbmip);
 	if (error) {
 		fail(_("Completion of the realtime bitmap failed"), error);
 	}
@@ -732,7 +732,7 @@ rtinit(
 		res_failed(i);
 	libxfs_trans_ijoin(tp, rsumip, 0);
 	bno = 0;
-	xfs_bmap_init(&flist, &first);
+	xfs_defer_init(&flist, &first);
 	while (bno < nsumblocks) {
 		nmap = XFS_BMAP_MAX_NMAP;
 		error = -libxfs_bmapi_write(tp, rsumip, bno,
@@ -750,7 +750,7 @@ rtinit(
 			bno += ep->br_blockcount;
 		}
 	}
-	error = -libxfs_bmap_finish(&tp, &flist, rsumip);
+	error = -xfs_defer_finish(&tp, &flist, rsumip);
 	if (error) {
 		fail(_("Completion of the realtime summary failed"), error);
 	}
@@ -765,7 +765,7 @@ rtinit(
 		if (i)
 			res_failed(i);
 		libxfs_trans_ijoin(tp, rbmip, 0);
-		xfs_bmap_init(&flist, &first);
+		xfs_defer_init(&flist, &first);
 		ebno = XFS_RTMIN(mp->m_sb.sb_rextents,
 			bno + NBBY * mp->m_sb.sb_blocksize);
 		error = -libxfs_rtfree_extent(tp, bno, (xfs_extlen_t)(ebno-bno));
@@ -773,7 +773,7 @@ rtinit(
 			fail(_("Error initializing the realtime space"),
 				error);
 		}
-		error = -libxfs_bmap_finish(&tp, &flist, rbmip);
+		error = -xfs_defer_finish(&tp, &flist, rbmip);
 		if (error) {
 			fail(_("Error completing the realtime space"), error);
 		}
diff --git a/repair/phase6.c b/repair/phase6.c
index d1acb68..961f7bc 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -484,7 +484,7 @@ mk_rbmino(xfs_mount_t *mp)
 	int		i;
 	int		nmap;
 	int		error;
-	xfs_bmap_free_t	flist;
+	struct xfs_defer_ops	flist;
 	xfs_fileoff_t	bno;
 	xfs_bmbt_irec_t	map[XFS_BMAP_MAX_NMAP];
 	int		vers;
@@ -550,7 +550,7 @@ mk_rbmino(xfs_mount_t *mp)
 
 	libxfs_trans_ijoin(tp, ip, 0);
 	bno = 0;
-	xfs_bmap_init(&flist, &first);
+	xfs_defer_init(&flist, &first);
 	while (bno < mp->m_sb.sb_rbmblocks) {
 		nmap = XFS_BMAP_MAX_NMAP;
 		error = -libxfs_bmapi_write(tp, ip, bno,
@@ -569,7 +569,7 @@ mk_rbmino(xfs_mount_t *mp)
 			bno += ep->br_blockcount;
 		}
 	}
-	error = -libxfs_bmap_finish(&tp, &flist, ip);
+	error = -xfs_defer_finish(&tp, &flist, ip);
 	if (error) {
 		do_error(
 		_("allocation of the realtime bitmap failed, error = %d\n"),
@@ -731,7 +731,7 @@ mk_rsumino(xfs_mount_t *mp)
 	int		nmap;
 	int		error;
 	int		nsumblocks;
-	xfs_bmap_free_t	flist;
+	struct xfs_defer_ops	flist;
 	xfs_fileoff_t	bno;
 	xfs_bmbt_irec_t	map[XFS_BMAP_MAX_NMAP];
 	int		vers;
@@ -789,7 +789,7 @@ mk_rsumino(xfs_mount_t *mp)
 	 * then allocate blocks for file and fill with zeroes (stolen
 	 * from mkfs)
 	 */
-	xfs_bmap_init(&flist, &first);
+	xfs_defer_init(&flist, &first);
 
 	nsumblocks = mp->m_rsumsize >> mp->m_sb.sb_blocklog;
 	tres.tr_logres = BBTOB(128);
@@ -803,7 +803,7 @@ mk_rsumino(xfs_mount_t *mp)
 
 	libxfs_trans_ijoin(tp, ip, 0);
 	bno = 0;
-	xfs_bmap_init(&flist, &first);
+	xfs_defer_init(&flist, &first);
 	while (bno < nsumblocks) {
 		nmap = XFS_BMAP_MAX_NMAP;
 		error = -libxfs_bmapi_write(tp, ip, bno,
@@ -821,7 +821,7 @@ mk_rsumino(xfs_mount_t *mp)
 			bno += ep->br_blockcount;
 		}
 	}
-	error = -libxfs_bmap_finish(&tp, &flist, ip);
+	error = -xfs_defer_finish(&tp, &flist, ip);
 	if (error) {
 		do_error(
 	_("allocation of the realtime summary ino failed, error = %d\n"),
@@ -919,7 +919,7 @@ mk_orphanage(xfs_mount_t *mp)
 	int		ino_offset = 0;
 	int		i;
 	int		error;
-	xfs_bmap_free_t	flist;
+	struct xfs_defer_ops	flist;
 	const int	mode = 0755;
 	int		nres;
 	struct xfs_name	xname;
@@ -945,7 +945,7 @@ mk_orphanage(xfs_mount_t *mp)
 	 * could not be found, create it
 	 */
 
-	xfs_bmap_init(&flist, &first);
+	xfs_defer_init(&flist, &first);
 
 	nres = XFS_MKDIR_SPACE_RES(mp, xname.len);
 	i = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_mkdir, nres, 0, 0, &tp);
@@ -1028,7 +1028,7 @@ mk_orphanage(xfs_mount_t *mp)
 	libxfs_dir_init(tp, ip, pip);
 	libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
-	error = -libxfs_bmap_finish(&tp, &flist, ip);
+	error = -xfs_defer_finish(&tp, &flist, ip);
 	if (error) {
 		do_error(_("%s directory creation failed -- bmapf error %d\n"),
 			ORPHANAGE, error);
@@ -1057,7 +1057,7 @@ mv_orphanage(
 	xfs_inode_t		*ino_p;
 	xfs_trans_t		*tp;
 	xfs_fsblock_t		first;
-	xfs_bmap_free_t		flist;
+	struct xfs_defer_ops	flist;
 	int			err;
 	unsigned char		fname[MAXPATHLEN + 1];
 	int			nres;
@@ -1110,7 +1110,7 @@ mv_orphanage(
 			libxfs_trans_ijoin(tp, orphanage_ip, 0);
 			libxfs_trans_ijoin(tp, ino_p, 0);
 
-			xfs_bmap_init(&flist, &first);
+			xfs_defer_init(&flist, &first);
 			err = -libxfs_dir_createname(tp, orphanage_ip, &xname,
 						ino, &first, &flist, nres);
 			if (err)
@@ -1134,7 +1134,7 @@ mv_orphanage(
 			inc_nlink(VFS_I(ino_p));
 			libxfs_trans_log_inode(tp, ino_p, XFS_ILOG_CORE);
 
-			err = -libxfs_bmap_finish(&tp, &flist, ino_p);
+			err = -xfs_defer_finish(&tp, &flist, ino_p);
 			if (err)
 				do_error(
 	_("bmap finish failed (err - %d), filesystem may be out of space\n"),
@@ -1152,7 +1152,7 @@ mv_orphanage(
 			libxfs_trans_ijoin(tp, orphanage_ip, 0);
 			libxfs_trans_ijoin(tp, ino_p, 0);
 
-			xfs_bmap_init(&flist, &first);
+			xfs_defer_init(&flist, &first);
 
 			err = -libxfs_dir_createname(tp, orphanage_ip, &xname,
 						ino, &first, &flist, nres);
@@ -1181,7 +1181,7 @@ mv_orphanage(
 						err);
 			}
 
-			err = -libxfs_bmap_finish(&tp, &flist, ino_p);
+			err = -xfs_defer_finish(&tp, &flist, ino_p);
 			if (err)
 				do_error(
 	_("bmap finish failed (%d), filesystem may be out of space\n"),
@@ -1208,7 +1208,7 @@ mv_orphanage(
 		libxfs_trans_ijoin(tp, orphanage_ip, 0);
 		libxfs_trans_ijoin(tp, ino_p, 0);
 
-		xfs_bmap_init(&flist, &first);
+		xfs_defer_init(&flist, &first);
 		err = -libxfs_dir_createname(tp, orphanage_ip, &xname, ino,
 						&first, &flist, nres);
 		if (err)
@@ -1220,7 +1220,7 @@ mv_orphanage(
 		set_nlink(VFS_I(ino_p), 1);
 		libxfs_trans_log_inode(tp, ino_p, XFS_ILOG_CORE);
 
-		err = -libxfs_bmap_finish(&tp, &flist, ino_p);
+		err = -xfs_defer_finish(&tp, &flist, ino_p);
 		if (err)
 			do_error(
 	_("bmap finish failed (%d), filesystem may be out of space\n"),
@@ -1269,7 +1269,7 @@ longform_dir2_rebuild(
 	xfs_trans_t		*tp;
 	xfs_fileoff_t		lastblock;
 	xfs_fsblock_t		firstblock;
-	xfs_bmap_free_t		flist;
+	struct xfs_defer_ops	flist;
 	xfs_inode_t		pip;
 	dir_hash_ent_t		*p;
 	int			done;
@@ -1292,7 +1292,7 @@ longform_dir2_rebuild(
 	    xfs_dir_ino_validate(mp, pip.i_ino))
 		pip.i_ino = mp->m_sb.sb_rootino;
 
-	xfs_bmap_init(&flist, &firstblock);
+	xfs_defer_init(&flist, &firstblock);
 
 	nres = XFS_REMOVE_SPACE_RES(mp);
 	error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_remove, nres, 0, 0, &tp);
@@ -1320,7 +1320,7 @@ longform_dir2_rebuild(
 		goto out_bmap_cancel;
 	}
 
-	error = -libxfs_bmap_finish(&tp, &flist, ip);
+	error = -xfs_defer_finish(&tp, &flist, ip);
 
 	libxfs_trans_commit(tp);
 
@@ -1344,7 +1344,7 @@ longform_dir2_rebuild(
 
 		libxfs_trans_ijoin(tp, ip, 0);
 
-		xfs_bmap_init(&flist, &firstblock);
+		xfs_defer_init(&flist, &firstblock);
 		error = -libxfs_dir_createname(tp, ip, &p->name, p->inum,
 						&firstblock, &flist, nres);
 		if (error) {
@@ -1354,7 +1354,7 @@ _("name create failed in ino %" PRIu64 " (%d), filesystem may be out of space\n"
 			goto out_bmap_cancel;
 		}
 
-		error = -libxfs_bmap_finish(&tp, &flist, ip);
+		error = -xfs_defer_finish(&tp, &flist, ip);
 		if (error) {
 			do_warn(
 	_("bmap finish failed (%d), filesystem may be out of space\n"),
@@ -1368,7 +1368,7 @@ _("name create failed in ino %" PRIu64 " (%d), filesystem may be out of space\n"
 	return;
 
 out_bmap_cancel:
-	libxfs_bmap_cancel(&flist);
+	xfs_defer_cancel(&flist);
 	libxfs_trans_cancel(tp);
 	return;
 }
@@ -1388,7 +1388,7 @@ dir2_kill_block(
 	xfs_da_args_t	args;
 	int		error;
 	xfs_fsblock_t	firstblock;
-	xfs_bmap_free_t	flist;
+	struct xfs_defer_ops	flist;
 	int		nres;
 	xfs_trans_t	*tp;
 
@@ -1399,7 +1399,7 @@ dir2_kill_block(
 	libxfs_trans_ijoin(tp, ip, 0);
 	libxfs_trans_bjoin(tp, bp);
 	memset(&args, 0, sizeof(args));
-	xfs_bmap_init(&flist, &firstblock);
+	xfs_defer_init(&flist, &firstblock);
 	args.dp = ip;
 	args.trans = tp;
 	args.firstblock = &firstblock;
@@ -1414,7 +1414,7 @@ dir2_kill_block(
 	if (error)
 		do_error(_("shrink_inode failed inode %" PRIu64 " block %u\n"),
 			ip->i_ino, da_bno);
-	libxfs_bmap_finish(&tp, &flist, ip);
+	xfs_defer_finish(&tp, &flist, ip);
 	libxfs_trans_commit(tp);
 }
 
@@ -1448,7 +1448,7 @@ longform_dir2_entry_check_data(
 	char			*endptr;
 	int			error;
 	xfs_fsblock_t		firstblock;
-	xfs_bmap_free_t		flist;
+	struct xfs_defer_ops	flist;
 	char			fname[MAXNAMELEN + 1];
 	freetab_t		*freetab;
 	int			i;
@@ -1590,7 +1590,7 @@ longform_dir2_entry_check_data(
 	libxfs_trans_ijoin(tp, ip, 0);
 	libxfs_trans_bjoin(tp, bp);
 	libxfs_trans_bhold(tp, bp);
-	xfs_bmap_init(&flist, &firstblock);
+	xfs_defer_init(&flist, &firstblock);
 	if (be32_to_cpu(d->magic) != wantmagic) {
 		do_warn(
 	_("bad directory block magic # %#x for directory inode %" PRIu64 " block %d: "),
@@ -1889,7 +1889,7 @@ _("entry \"%s\" in dir inode %" PRIu64 " inconsistent with .. value (%" PRIu64 "
 		repair_dir2_data_freescan(mp, M_DIROPS(mp), d, &i);
 	if (needlog)
 		libxfs_dir2_data_log_header(&da, bp);
-	libxfs_bmap_finish(&tp, &flist, ip);
+	xfs_defer_finish(&tp, &flist, ip);
 	libxfs_trans_commit(tp);
 
 	/* record the largest free space in the freetab for later checking */
@@ -2805,7 +2805,7 @@ process_dir_inode(
 	int			ino_offset)
 {
 	xfs_ino_t		ino;
-	xfs_bmap_free_t		flist;
+	struct xfs_defer_ops	flist;
 	xfs_fsblock_t		first;
 	xfs_inode_t		*ip;
 	xfs_trans_t		*tp;
@@ -2943,7 +2943,7 @@ process_dir_inode(
 
 		libxfs_trans_ijoin(tp, ip, 0);
 
-		xfs_bmap_init(&flist, &first);
+		xfs_defer_init(&flist, &first);
 
 		error = -libxfs_dir_createname(tp, ip, &xfs_name_dotdot,
 					ip->i_ino, &first, &flist, nres);
@@ -2953,7 +2953,7 @@ process_dir_inode(
 
 		libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
-		error = -libxfs_bmap_finish(&tp, &flist, ip);
+		error = -xfs_defer_finish(&tp, &flist, ip);
 		ASSERT(error == 0);
 		libxfs_trans_commit(tp);
 
@@ -3001,7 +3001,7 @@ process_dir_inode(
 
 			libxfs_trans_ijoin(tp, ip, 0);
 
-			xfs_bmap_init(&flist, &first);
+			xfs_defer_init(&flist, &first);
 
 			error = -libxfs_dir_createname(tp, ip, &xfs_name_dot,
 					ip->i_ino, &first, &flist, nres);
@@ -3012,7 +3012,7 @@ process_dir_inode(
 
 			libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
-			error = -libxfs_bmap_finish(&tp, &flist, ip);
+			error = -xfs_defer_finish(&tp, &flist, ip);
 			ASSERT(error == 0);
 			libxfs_trans_commit(tp);
 		}



* [PATCH 025/145] xfs: rename flist/free_list to dfops
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (23 preceding siblings ...)
  2016-06-17  1:33 ` [PATCH 024/145] xfs: change xfs_bmap_{finish, cancel, init, free} -> xfs_defer_* Darrick J. Wong
@ 2016-06-17  1:33 ` Darrick J. Wong
  2016-06-17  1:33 ` [PATCH 026/145] xfs: add tracepoints and error injection for deferred extent freeing Darrick J. Wong
                   ` (119 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:33 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Mechanically rename the flist/free_list variables, parameters, and
structure fields to dfops, since they now track deferred operations
in general, not just a list of extents to free.
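
For readers skimming the diff, the call pattern that the renamed
variables take part in (init, queue work, finish, cancel on error)
can be sketched roughly as below.  This is only an illustration: the
demo_* names and the struct layout are hypothetical stand-ins, not the
real struct xfs_defer_ops or the xfs_defer_* signatures, and the
transaction, inode, and firstblock arguments are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xfs_defer_ops; not the real layout. */
struct demo_defer_ops {
	int	pending;	/* deferred work items queued so far */
	int	finished;	/* items processed by the finish step */
};

/* Reset the tracker before an operation starts queueing work. */
static void demo_defer_init(struct demo_defer_ops *dfops)
{
	assert(dfops != NULL);
	dfops->pending = 0;
	dfops->finished = 0;
}

/* Queue one deferred item (for example, an extent to free later). */
static void demo_defer_add(struct demo_defer_ops *dfops)
{
	dfops->pending++;
}

/* Process everything that was queued; returns 0 on success. */
static int demo_defer_finish(struct demo_defer_ops *dfops)
{
	dfops->finished += dfops->pending;
	dfops->pending = 0;
	return 0;
}

/* Drop queued items without processing them (error path). */
static void demo_defer_cancel(struct demo_defer_ops *dfops)
{
	dfops->pending = 0;
}

/*
 * Same shape as the callers in this patch: init, do the work that
 * queues deferred items, finish on success, cancel on any error.
 * A nonzero 'fail' simulates an error from the queueing step.
 */
static int demo_operation(struct demo_defer_ops *dfops, int nitems, int fail)
{
	int	error = 0;
	int	i;

	demo_defer_init(dfops);
	for (i = 0; i < nitems; i++)
		demo_defer_add(dfops);
	if (fail)
		error = -1;
	if (!error)
		error = demo_defer_finish(dfops);
	if (error) {
		demo_defer_cancel(dfops);
		return error;
	}
	return 0;
}
```

The point of the sketch is only that the object is a general queue of
pending work rather than specifically a list of freed extents, which
is the motivation for the new name.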

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/util.c            |   10 ++--
 libxfs/xfs_attr.c        |   62 ++++++++++++-------------
 libxfs/xfs_attr_leaf.c   |    4 +-
 libxfs/xfs_attr_remote.c |   18 ++++---
 libxfs/xfs_bmap.c        |  116 +++++++++++++++++++++++-----------------------
 libxfs/xfs_bmap.h        |   10 ++--
 libxfs/xfs_bmap_btree.c  |   14 +++---
 libxfs/xfs_btree.h       |    4 +-
 libxfs/xfs_da_btree.c    |    6 +-
 libxfs/xfs_da_btree.h    |    2 -
 libxfs/xfs_dir2.c        |   14 +++---
 libxfs/xfs_dir2.h        |    6 +-
 libxfs/xfs_ialloc.c      |   14 +++---
 libxfs/xfs_ialloc.h      |    2 -
 mkfs/proto.c             |   56 +++++++++++-----------
 repair/phase6.c          |   94 +++++++++++++++++++------------------
 repair/sb.c              |   10 ++--
 17 files changed, 221 insertions(+), 221 deletions(-)


diff --git a/libxfs/util.c b/libxfs/util.c
index 4b81818..5b277c2 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -493,7 +493,7 @@ libxfs_alloc_file_space(
 	xfs_filblks_t	allocated_fsb;
 	xfs_filblks_t	allocatesize_fsb;
 	xfs_fsblock_t	firstfsb;
-	struct xfs_defer_ops free_list;
+	struct xfs_defer_ops dfops;
 	xfs_bmbt_irec_t *imapp;
 	xfs_bmbt_irec_t imaps[1];
 	int		reccount;
@@ -535,16 +535,16 @@ libxfs_alloc_file_space(
 		}
 		xfs_trans_ijoin(tp, ip, 0);
 
-		xfs_defer_init(&free_list, &firstfsb);
+		xfs_defer_init(&dfops, &firstfsb);
 		error = xfs_bmapi_write(tp, ip, startoffset_fsb, allocatesize_fsb,
 				xfs_bmapi_flags, &firstfsb, 0, imapp,
-				&reccount, &free_list);
+				&reccount, &dfops);
 
 		if (error)
 			goto error0;
 
 		/* complete the transaction */
-		error = xfs_defer_finish(&tp, &free_list, ip);
+		error = xfs_defer_finish(&tp, &dfops, ip);
 		if (error)
 			goto error0;
 
@@ -562,7 +562,7 @@ libxfs_alloc_file_space(
 	return error;
 
 error0:	/* Cancel bmap, cancel trans */
-	xfs_defer_cancel(&free_list);
+	xfs_defer_cancel(&dfops);
 	xfs_trans_cancel(tp);
 	return error;
 }
diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index 06b3c5d..60513f9 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -199,7 +199,7 @@ xfs_attr_set(
 {
 	struct xfs_mount	*mp = dp->i_mount;
 	struct xfs_da_args	args;
-	struct xfs_defer_ops	flist;
+	struct xfs_defer_ops	dfops;
 	struct xfs_trans_res	tres;
 	xfs_fsblock_t		firstblock;
 	int			rsvd = (flags & ATTR_ROOT) != 0;
@@ -217,7 +217,7 @@ xfs_attr_set(
 	args.value = value;
 	args.valuelen = valuelen;
 	args.firstblock = &firstblock;
-	args.flist = &flist;
+	args.dfops = &dfops;
 	args.op_flags = XFS_DA_OP_ADDNAME | XFS_DA_OP_OKNOENT;
 	args.total = xfs_attr_calc_size(&args, &local);
 
@@ -312,13 +312,13 @@ xfs_attr_set(
 		 * It won't fit in the shortform, transform to a leaf block.
 		 * GROT: another possible req'mt for a double-split btree op.
 		 */
-		xfs_defer_init(args.flist, args.firstblock);
+		xfs_defer_init(args.dfops, args.firstblock);
 		error = xfs_attr_shortform_to_leaf(&args);
 		if (!error)
-			error = xfs_defer_finish(&args.trans, args.flist, dp);
+			error = xfs_defer_finish(&args.trans, args.dfops, dp);
 		if (error) {
 			args.trans = NULL;
-			xfs_defer_cancel(&flist);
+			xfs_defer_cancel(&dfops);
 			goto out;
 		}
 
@@ -378,7 +378,7 @@ xfs_attr_remove(
 {
 	struct xfs_mount	*mp = dp->i_mount;
 	struct xfs_da_args	args;
-	struct xfs_defer_ops	flist;
+	struct xfs_defer_ops	dfops;
 	xfs_fsblock_t		firstblock;
 	int			error;
 
@@ -395,7 +395,7 @@ xfs_attr_remove(
 		return error;
 
 	args.firstblock = &firstblock;
-	args.flist = &flist;
+	args.dfops = &dfops;
 
 	/*
 	 * we have no control over the attribute names that userspace passes us
@@ -580,13 +580,13 @@ xfs_attr_leaf_addname(xfs_da_args_t *args)
 		 * Commit that transaction so that the node_addname() call
 		 * can manage its own transactions.
 		 */
-		xfs_defer_init(args->flist, args->firstblock);
+		xfs_defer_init(args->dfops, args->firstblock);
 		error = xfs_attr3_leaf_to_node(args);
 		if (!error)
-			error = xfs_defer_finish(&args->trans, args->flist, dp);
+			error = xfs_defer_finish(&args->trans, args->dfops, dp);
 		if (error) {
 			args->trans = NULL;
-			xfs_defer_cancel(args->flist);
+			xfs_defer_cancel(args->dfops);
 			return error;
 		}
 
@@ -670,15 +670,15 @@ xfs_attr_leaf_addname(xfs_da_args_t *args)
 		 * If the result is small enough, shrink it all into the inode.
 		 */
 		if ((forkoff = xfs_attr_shortform_allfit(bp, dp))) {
-			xfs_defer_init(args->flist, args->firstblock);
+			xfs_defer_init(args->dfops, args->firstblock);
 			error = xfs_attr3_leaf_to_shortform(bp, args, forkoff);
 			/* bp is gone due to xfs_da_shrink_inode */
 			if (!error)
 				error = xfs_defer_finish(&args->trans,
-							args->flist, dp);
+							args->dfops, dp);
 			if (error) {
 				args->trans = NULL;
-				xfs_defer_cancel(args->flist);
+				xfs_defer_cancel(args->dfops);
 				return error;
 			}
 		}
@@ -733,14 +733,14 @@ xfs_attr_leaf_removename(xfs_da_args_t *args)
 	 * If the result is small enough, shrink it all into the inode.
 	 */
 	if ((forkoff = xfs_attr_shortform_allfit(bp, dp))) {
-		xfs_defer_init(args->flist, args->firstblock);
+		xfs_defer_init(args->dfops, args->firstblock);
 		error = xfs_attr3_leaf_to_shortform(bp, args, forkoff);
 		/* bp is gone due to xfs_da_shrink_inode */
 		if (!error)
-			error = xfs_defer_finish(&args->trans, args->flist, dp);
+			error = xfs_defer_finish(&args->trans, args->dfops, dp);
 		if (error) {
 			args->trans = NULL;
-			xfs_defer_cancel(args->flist);
+			xfs_defer_cancel(args->dfops);
 			return error;
 		}
 	}
@@ -859,14 +859,14 @@ restart:
 			 */
 			xfs_da_state_free(state);
 			state = NULL;
-			xfs_defer_init(args->flist, args->firstblock);
+			xfs_defer_init(args->dfops, args->firstblock);
 			error = xfs_attr3_leaf_to_node(args);
 			if (!error)
 				error = xfs_defer_finish(&args->trans,
-							args->flist, dp);
+							args->dfops, dp);
 			if (error) {
 				args->trans = NULL;
-				xfs_defer_cancel(args->flist);
+				xfs_defer_cancel(args->dfops);
 				goto out;
 			}
 
@@ -887,13 +887,13 @@ restart:
 		 * in the index/blkno/rmtblkno/rmtblkcnt fields and
 		 * in the index2/blkno2/rmtblkno2/rmtblkcnt2 fields.
 		 */
-		xfs_defer_init(args->flist, args->firstblock);
+		xfs_defer_init(args->dfops, args->firstblock);
 		error = xfs_da3_split(state);
 		if (!error)
-			error = xfs_defer_finish(&args->trans, args->flist, dp);
+			error = xfs_defer_finish(&args->trans, args->dfops, dp);
 		if (error) {
 			args->trans = NULL;
-			xfs_defer_cancel(args->flist);
+			xfs_defer_cancel(args->dfops);
 			goto out;
 		}
 	} else {
@@ -986,14 +986,14 @@ restart:
 		 * Check to see if the tree needs to be collapsed.
 		 */
 		if (retval && (state->path.active > 1)) {
-			xfs_defer_init(args->flist, args->firstblock);
+			xfs_defer_init(args->dfops, args->firstblock);
 			error = xfs_da3_join(state);
 			if (!error)
 				error = xfs_defer_finish(&args->trans,
-							args->flist, dp);
+							args->dfops, dp);
 			if (error) {
 				args->trans = NULL;
-				xfs_defer_cancel(args->flist);
+				xfs_defer_cancel(args->dfops);
 				goto out;
 			}
 		}
@@ -1109,13 +1109,13 @@ xfs_attr_node_removename(xfs_da_args_t *args)
 	 * Check to see if the tree needs to be collapsed.
 	 */
 	if (retval && (state->path.active > 1)) {
-		xfs_defer_init(args->flist, args->firstblock);
+		xfs_defer_init(args->dfops, args->firstblock);
 		error = xfs_da3_join(state);
 		if (!error)
-			error = xfs_defer_finish(&args->trans, args->flist, dp);
+			error = xfs_defer_finish(&args->trans, args->dfops, dp);
 		if (error) {
 			args->trans = NULL;
-			xfs_defer_cancel(args->flist);
+			xfs_defer_cancel(args->dfops);
 			goto out;
 		}
 		/*
@@ -1142,15 +1142,15 @@ xfs_attr_node_removename(xfs_da_args_t *args)
 			goto out;
 
 		if ((forkoff = xfs_attr_shortform_allfit(bp, dp))) {
-			xfs_defer_init(args->flist, args->firstblock);
+			xfs_defer_init(args->dfops, args->firstblock);
 			error = xfs_attr3_leaf_to_shortform(bp, args, forkoff);
 			/* bp is gone due to xfs_da_shrink_inode */
 			if (!error)
 				error = xfs_defer_finish(&args->trans,
-							args->flist, dp);
+							args->dfops, dp);
 			if (error) {
 				args->trans = NULL;
-				xfs_defer_cancel(args->flist);
+				xfs_defer_cancel(args->dfops);
 				goto out;
 			}
 		} else
diff --git a/libxfs/xfs_attr_leaf.c b/libxfs/xfs_attr_leaf.c
index d0eb0a7..0ca0347 100644
--- a/libxfs/xfs_attr_leaf.c
+++ b/libxfs/xfs_attr_leaf.c
@@ -787,7 +787,7 @@ xfs_attr_shortform_to_leaf(xfs_da_args_t *args)
 	nargs.dp = dp;
 	nargs.geo = args->geo;
 	nargs.firstblock = args->firstblock;
-	nargs.flist = args->flist;
+	nargs.dfops = args->dfops;
 	nargs.total = args->total;
 	nargs.whichfork = XFS_ATTR_FORK;
 	nargs.trans = args->trans;
@@ -917,7 +917,7 @@ xfs_attr3_leaf_to_shortform(
 	nargs.geo = args->geo;
 	nargs.dp = dp;
 	nargs.firstblock = args->firstblock;
-	nargs.flist = args->flist;
+	nargs.dfops = args->dfops;
 	nargs.total = args->total;
 	nargs.whichfork = XFS_ATTR_FORK;
 	nargs.trans = args->trans;
diff --git a/libxfs/xfs_attr_remote.c b/libxfs/xfs_attr_remote.c
index dc545f0..abe1705 100644
--- a/libxfs/xfs_attr_remote.c
+++ b/libxfs/xfs_attr_remote.c
@@ -456,16 +456,16 @@ xfs_attr_rmtval_set(
 		 * extent and then crash then the block may not contain the
 		 * correct metadata after log recovery occurs.
 		 */
-		xfs_defer_init(args->flist, args->firstblock);
+		xfs_defer_init(args->dfops, args->firstblock);
 		nmap = 1;
 		error = xfs_bmapi_write(args->trans, dp, (xfs_fileoff_t)lblkno,
 				  blkcnt, XFS_BMAPI_ATTRFORK, args->firstblock,
-				  args->total, &map, &nmap, args->flist);
+				  args->total, &map, &nmap, args->dfops);
 		if (!error)
-			error = xfs_defer_finish(&args->trans, args->flist, dp);
+			error = xfs_defer_finish(&args->trans, args->dfops, dp);
 		if (error) {
 			args->trans = NULL;
-			xfs_defer_cancel(args->flist);
+			xfs_defer_cancel(args->dfops);
 			return error;
 		}
 
@@ -499,7 +499,7 @@ xfs_attr_rmtval_set(
 
 		ASSERT(blkcnt > 0);
 
-		xfs_defer_init(args->flist, args->firstblock);
+		xfs_defer_init(args->dfops, args->firstblock);
 		nmap = 1;
 		error = xfs_bmapi_read(dp, (xfs_fileoff_t)lblkno,
 				       blkcnt, &map, &nmap,
@@ -599,16 +599,16 @@ xfs_attr_rmtval_remove(
 	blkcnt = args->rmtblkcnt;
 	done = 0;
 	while (!done) {
-		xfs_defer_init(args->flist, args->firstblock);
+		xfs_defer_init(args->dfops, args->firstblock);
 		error = xfs_bunmapi(args->trans, args->dp, lblkno, blkcnt,
 				    XFS_BMAPI_ATTRFORK, 1, args->firstblock,
-				    args->flist, &done);
+				    args->dfops, &done);
 		if (!error)
-			error = xfs_defer_finish(&args->trans, args->flist,
+			error = xfs_defer_finish(&args->trans, args->dfops,
 						args->dp);
 		if (error) {
 			args->trans = NULL;
-			xfs_defer_cancel(args->flist);
+			xfs_defer_cancel(args->dfops);
 			return error;
 		}
 
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 0faa0a9..5385800 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -564,7 +564,7 @@ xfs_bmap_validate_ret(
 void
 xfs_bmap_add_free(
 	struct xfs_mount	*mp,		/* mount point structure */
-	struct xfs_defer_ops	*flist,		/* list of extents */
+	struct xfs_defer_ops	*dfops,		/* list of extents */
 	xfs_fsblock_t		bno,		/* fs block number of extent */
 	xfs_filblks_t		len)		/* length of extent */
 {
@@ -588,7 +588,7 @@ xfs_bmap_add_free(
 	new = kmem_zone_alloc(xfs_bmap_free_item_zone, KM_SLEEP);
 	new->xbfi_startblock = bno;
 	new->xbfi_blockcount = (xfs_extlen_t)len;
-	xfs_defer_add(flist, XFS_DEFER_OPS_TYPE_FREE, &new->xbfi_list);
+	xfs_defer_add(dfops, XFS_DEFER_OPS_TYPE_FREE, &new->xbfi_list);
 }
 
 /*
@@ -641,7 +641,7 @@ xfs_bmap_btree_to_extents(
 	cblock = XFS_BUF_TO_BLOCK(cbp);
 	if ((error = xfs_btree_check_block(cur, cblock, 0, cbp)))
 		return error;
-	xfs_bmap_add_free(mp, cur->bc_private.b.flist, cbno, 1);
+	xfs_bmap_add_free(mp, cur->bc_private.b.dfops, cbno, 1);
 	ip->i_d.di_nblocks--;
 	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L);
 	xfs_trans_binval(tp, cbp);
@@ -664,7 +664,7 @@ xfs_bmap_extents_to_btree(
 	xfs_trans_t		*tp,		/* transaction pointer */
 	xfs_inode_t		*ip,		/* incore inode pointer */
 	xfs_fsblock_t		*firstblock,	/* first-block-allocated */
-	struct xfs_defer_ops	*flist,		/* blocks freed in xaction */
+	struct xfs_defer_ops	*dfops,		/* blocks freed in xaction */
 	xfs_btree_cur_t		**curp,		/* cursor returned to caller */
 	int			wasdel,		/* converting a delayed alloc */
 	int			*logflagsp,	/* inode logging flags */
@@ -713,7 +713,7 @@ xfs_bmap_extents_to_btree(
 	 */
 	cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork);
 	cur->bc_private.b.firstblock = *firstblock;
-	cur->bc_private.b.flist = flist;
+	cur->bc_private.b.dfops = dfops;
 	cur->bc_private.b.flags = wasdel ? XFS_BTCUR_BPRV_WASDEL : 0;
 	/*
 	 * Convert to a btree with two levels, one record in root.
@@ -726,7 +726,7 @@ xfs_bmap_extents_to_btree(
 	if (*firstblock == NULLFSBLOCK) {
 		args.type = XFS_ALLOCTYPE_START_BNO;
 		args.fsbno = XFS_INO_TO_FSB(mp, ip->i_ino);
-	} else if (flist->dop_low) {
+	} else if (dfops->dop_low) {
 		args.type = XFS_ALLOCTYPE_START_BNO;
 		args.fsbno = *firstblock;
 	} else {
@@ -747,7 +747,7 @@ xfs_bmap_extents_to_btree(
 	ASSERT(args.fsbno != NULLFSBLOCK);
 	ASSERT(*firstblock == NULLFSBLOCK ||
 	       args.agno == XFS_FSB_TO_AGNO(mp, *firstblock) ||
-	       (flist->dop_low &&
+	       (dfops->dop_low &&
 		args.agno > XFS_FSB_TO_AGNO(mp, *firstblock)));
 	*firstblock = cur->bc_private.b.firstblock = args.fsbno;
 	cur->bc_private.b.allocated++;
@@ -932,7 +932,7 @@ xfs_bmap_add_attrfork_btree(
 	xfs_trans_t		*tp,		/* transaction pointer */
 	xfs_inode_t		*ip,		/* incore inode pointer */
 	xfs_fsblock_t		*firstblock,	/* first block allocated */
-	struct xfs_defer_ops	*flist,		/* blocks to free at commit */
+	struct xfs_defer_ops	*dfops,		/* blocks to free at commit */
 	int			*flags)		/* inode logging flags */
 {
 	xfs_btree_cur_t		*cur;		/* btree cursor */
@@ -945,7 +945,7 @@ xfs_bmap_add_attrfork_btree(
 		*flags |= XFS_ILOG_DBROOT;
 	else {
 		cur = xfs_bmbt_init_cursor(mp, tp, ip, XFS_DATA_FORK);
-		cur->bc_private.b.flist = flist;
+		cur->bc_private.b.dfops = dfops;
 		cur->bc_private.b.firstblock = *firstblock;
 		if ((error = xfs_bmbt_lookup_ge(cur, 0, 0, 0, &stat)))
 			goto error0;
@@ -975,7 +975,7 @@ xfs_bmap_add_attrfork_extents(
 	xfs_trans_t		*tp,		/* transaction pointer */
 	xfs_inode_t		*ip,		/* incore inode pointer */
 	xfs_fsblock_t		*firstblock,	/* first block allocated */
-	struct xfs_defer_ops	*flist,		/* blocks to free at commit */
+	struct xfs_defer_ops	*dfops,		/* blocks to free at commit */
 	int			*flags)		/* inode logging flags */
 {
 	xfs_btree_cur_t		*cur;		/* bmap btree cursor */
@@ -984,7 +984,7 @@ xfs_bmap_add_attrfork_extents(
 	if (ip->i_d.di_nextents * sizeof(xfs_bmbt_rec_t) <= XFS_IFORK_DSIZE(ip))
 		return 0;
 	cur = NULL;
-	error = xfs_bmap_extents_to_btree(tp, ip, firstblock, flist, &cur, 0,
+	error = xfs_bmap_extents_to_btree(tp, ip, firstblock, dfops, &cur, 0,
 		flags, XFS_DATA_FORK);
 	if (cur) {
 		cur->bc_private.b.allocated = 0;
@@ -1010,7 +1010,7 @@ xfs_bmap_add_attrfork_local(
 	xfs_trans_t		*tp,		/* transaction pointer */
 	xfs_inode_t		*ip,		/* incore inode pointer */
 	xfs_fsblock_t		*firstblock,	/* first block allocated */
-	struct xfs_defer_ops	*flist,		/* blocks to free at commit */
+	struct xfs_defer_ops	*dfops,		/* blocks to free at commit */
 	int			*flags)		/* inode logging flags */
 {
 	xfs_da_args_t		dargs;		/* args for dir/attr code */
@@ -1023,7 +1023,7 @@ xfs_bmap_add_attrfork_local(
 		dargs.geo = ip->i_mount->m_dir_geo;
 		dargs.dp = ip;
 		dargs.firstblock = firstblock;
-		dargs.flist = flist;
+		dargs.dfops = dfops;
 		dargs.total = dargs.geo->fsbcount;
 		dargs.whichfork = XFS_DATA_FORK;
 		dargs.trans = tp;
@@ -1051,7 +1051,7 @@ xfs_bmap_add_attrfork(
 	int			rsvd)		/* xact may use reserved blks */
 {
 	xfs_fsblock_t		firstblock;	/* 1st block/ag allocated */
-	struct xfs_defer_ops	flist;		/* freed extent records */
+	struct xfs_defer_ops	dfops;		/* freed extent records */
 	xfs_mount_t		*mp;		/* mount structure */
 	xfs_trans_t		*tp;		/* transaction pointer */
 	int			blks;		/* space reservation */
@@ -1117,18 +1117,18 @@ xfs_bmap_add_attrfork(
 	ip->i_afp = kmem_zone_zalloc(xfs_ifork_zone, KM_SLEEP);
 	ip->i_afp->if_flags = XFS_IFEXTENTS;
 	logflags = 0;
-	xfs_defer_init(&flist, &firstblock);
+	xfs_defer_init(&dfops, &firstblock);
 	switch (ip->i_d.di_format) {
 	case XFS_DINODE_FMT_LOCAL:
-		error = xfs_bmap_add_attrfork_local(tp, ip, &firstblock, &flist,
+		error = xfs_bmap_add_attrfork_local(tp, ip, &firstblock, &dfops,
 			&logflags);
 		break;
 	case XFS_DINODE_FMT_EXTENTS:
 		error = xfs_bmap_add_attrfork_extents(tp, ip, &firstblock,
-			&flist, &logflags);
+			&dfops, &logflags);
 		break;
 	case XFS_DINODE_FMT_BTREE:
-		error = xfs_bmap_add_attrfork_btree(tp, ip, &firstblock, &flist,
+		error = xfs_bmap_add_attrfork_btree(tp, ip, &firstblock, &dfops,
 			&logflags);
 		break;
 	default:
@@ -1157,7 +1157,7 @@ xfs_bmap_add_attrfork(
 			xfs_log_sb(tp);
 	}
 
-	error = xfs_defer_finish(&tp, &flist, NULL);
+	error = xfs_defer_finish(&tp, &dfops, NULL);
 	if (error)
 		goto bmap_cancel;
 	error = xfs_trans_commit(tp);
@@ -1165,7 +1165,7 @@ xfs_bmap_add_attrfork(
 	return error;
 
 bmap_cancel:
-	xfs_defer_cancel(&flist);
+	xfs_defer_cancel(&dfops);
 trans_cancel:
 	xfs_trans_cancel(tp);
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
@@ -1962,7 +1962,7 @@ xfs_bmap_add_extent_delay_real(
 
 		if (xfs_bmap_needs_btree(bma->ip, whichfork)) {
 			error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
-					bma->firstblock, bma->flist,
+					bma->firstblock, bma->dfops,
 					&bma->cur, 1, &tmp_rval, whichfork);
 			rval |= tmp_rval;
 			if (error)
@@ -2046,7 +2046,7 @@ xfs_bmap_add_extent_delay_real(
 
 		if (xfs_bmap_needs_btree(bma->ip, whichfork)) {
 			error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
-				bma->firstblock, bma->flist, &bma->cur, 1,
+				bma->firstblock, bma->dfops, &bma->cur, 1,
 				&tmp_rval, whichfork);
 			rval |= tmp_rval;
 			if (error)
@@ -2115,7 +2115,7 @@ xfs_bmap_add_extent_delay_real(
 
 		if (xfs_bmap_needs_btree(bma->ip, whichfork)) {
 			error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
-					bma->firstblock, bma->flist, &bma->cur,
+					bma->firstblock, bma->dfops, &bma->cur,
 					1, &tmp_rval, whichfork);
 			rval |= tmp_rval;
 			if (error)
@@ -2164,7 +2164,7 @@ xfs_bmap_add_extent_delay_real(
 
 		ASSERT(bma->cur == NULL);
 		error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
-				bma->firstblock, bma->flist, &bma->cur,
+				bma->firstblock, bma->dfops, &bma->cur,
 				da_old > 0, &tmp_logflags, whichfork);
 		bma->logflags |= tmp_logflags;
 		if (error)
@@ -2206,7 +2206,7 @@ xfs_bmap_add_extent_unwritten_real(
 	xfs_btree_cur_t		**curp,	/* if *curp is null, not a btree */
 	xfs_bmbt_irec_t		*new,	/* new data to add to file extents */
 	xfs_fsblock_t		*first,	/* pointer to firstblock variable */
-	struct xfs_defer_ops	*flist,	/* list of extents to be freed */
+	struct xfs_defer_ops	*dfops,	/* list of extents to be freed */
 	int			*logflagsp) /* inode logging flags */
 {
 	xfs_btree_cur_t		*cur;	/* btree cursor */
@@ -2699,7 +2699,7 @@ xfs_bmap_add_extent_unwritten_real(
 		int	tmp_logflags;	/* partial log flag return val */
 
 		ASSERT(cur == NULL);
-		error = xfs_bmap_extents_to_btree(tp, ip, first, flist, &cur,
+		error = xfs_bmap_extents_to_btree(tp, ip, first, dfops, &cur,
 				0, &tmp_logflags, XFS_DATA_FORK);
 		*logflagsp |= tmp_logflags;
 		if (error)
@@ -3092,7 +3092,7 @@ xfs_bmap_add_extent_hole_real(
 
 		ASSERT(bma->cur == NULL);
 		error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
-				bma->firstblock, bma->flist, &bma->cur,
+				bma->firstblock, bma->dfops, &bma->cur,
 				0, &tmp_logflags, whichfork);
 		bma->logflags |= tmp_logflags;
 		if (error)
@@ -3667,7 +3667,7 @@ xfs_bmap_btalloc(
 			error = xfs_bmap_btalloc_nullfb(ap, &args, &blen);
 		if (error)
 			return error;
-	} else if (ap->flist->dop_low) {
+	} else if (ap->dfops->dop_low) {
 		if (xfs_inode_is_filestream(ap->ip))
 			args.type = XFS_ALLOCTYPE_FIRST_AG;
 		else
@@ -3700,7 +3700,7 @@ xfs_bmap_btalloc(
 	 * is >= the stripe unit and the allocation offset is
 	 * at the end of file.
 	 */
-	if (!ap->flist->dop_low && ap->aeof) {
+	if (!ap->dfops->dop_low && ap->aeof) {
 		if (!ap->offset) {
 			args.alignment = stripe_align;
 			atype = args.type;
@@ -3793,7 +3793,7 @@ xfs_bmap_btalloc(
 		args.minleft = 0;
 		if ((error = xfs_alloc_vextent(&args)))
 			return error;
-		ap->flist->dop_low = true;
+		ap->dfops->dop_low = true;
 	}
 	if (args.fsbno != NULLFSBLOCK) {
 		/*
@@ -3803,7 +3803,7 @@ xfs_bmap_btalloc(
 		ASSERT(*ap->firstblock == NULLFSBLOCK ||
 		       XFS_FSB_TO_AGNO(mp, *ap->firstblock) ==
 		       XFS_FSB_TO_AGNO(mp, args.fsbno) ||
-		       (ap->flist->dop_low &&
+		       (ap->dfops->dop_low &&
 			XFS_FSB_TO_AGNO(mp, *ap->firstblock) <
 			XFS_FSB_TO_AGNO(mp, args.fsbno)));
 
@@ -3811,7 +3811,7 @@ xfs_bmap_btalloc(
 		if (*ap->firstblock == NULLFSBLOCK)
 			*ap->firstblock = args.fsbno;
 		ASSERT(nullfb || fb_agno == args.agno ||
-		       (ap->flist->dop_low && fb_agno < args.agno));
+		       (ap->dfops->dop_low && fb_agno < args.agno));
 		ap->length = args.len;
 		ap->ip->i_d.di_nblocks += args.len;
 		xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
@@ -4278,7 +4278,7 @@ xfs_bmapi_allocate(
 	if (error)
 		return error;
 
-	if (bma->flist->dop_low)
+	if (bma->dfops->dop_low)
 		bma->minleft = 0;
 	if (bma->cur)
 		bma->cur->bc_private.b.firstblock = *bma->firstblock;
@@ -4287,7 +4287,7 @@ xfs_bmapi_allocate(
 	if ((ifp->if_flags & XFS_IFBROOT) && !bma->cur) {
 		bma->cur = xfs_bmbt_init_cursor(mp, bma->tp, bma->ip, whichfork);
 		bma->cur->bc_private.b.firstblock = *bma->firstblock;
-		bma->cur->bc_private.b.flist = bma->flist;
+		bma->cur->bc_private.b.dfops = bma->dfops;
 	}
 	/*
 	 * Bump the number of extents we've allocated
@@ -4368,7 +4368,7 @@ xfs_bmapi_convert_unwritten(
 		bma->cur = xfs_bmbt_init_cursor(bma->ip->i_mount, bma->tp,
 					bma->ip, whichfork);
 		bma->cur->bc_private.b.firstblock = *bma->firstblock;
-		bma->cur->bc_private.b.flist = bma->flist;
+		bma->cur->bc_private.b.dfops = bma->dfops;
 	}
 	mval->br_state = (mval->br_state == XFS_EXT_UNWRITTEN)
 				? XFS_EXT_NORM : XFS_EXT_UNWRITTEN;
@@ -4385,7 +4385,7 @@ xfs_bmapi_convert_unwritten(
 	}
 
 	error = xfs_bmap_add_extent_unwritten_real(bma->tp, bma->ip, &bma->idx,
-			&bma->cur, mval, bma->firstblock, bma->flist,
+			&bma->cur, mval, bma->firstblock, bma->dfops,
 			&tmp_logflags);
 	/*
 	 * Log the inode core unconditionally in the unwritten extent conversion
@@ -4439,7 +4439,7 @@ xfs_bmapi_write(
 	xfs_extlen_t		total,		/* total blocks needed */
 	struct xfs_bmbt_irec	*mval,		/* output: map values */
 	int			*nmap,		/* i/o: mval size/count */
-	struct xfs_defer_ops	*flist)		/* i/o: list extents to free */
+	struct xfs_defer_ops	*dfops)		/* i/o: list extents to free */
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_ifork	*ifp;
@@ -4529,7 +4529,7 @@ xfs_bmapi_write(
 	bma.ip = ip;
 	bma.total = total;
 	bma.userdata = 0;
-	bma.flist = flist;
+	bma.dfops = dfops;
 	bma.firstblock = firstblock;
 
 	while (bno < end && n < *nmap) {
@@ -4643,7 +4643,7 @@ error0:
 			       XFS_FSB_TO_AGNO(mp, *firstblock) ==
 			       XFS_FSB_TO_AGNO(mp,
 				       bma.cur->bc_private.b.firstblock) ||
-			       (flist->dop_low &&
+			       (dfops->dop_low &&
 				XFS_FSB_TO_AGNO(mp, *firstblock) <
 				XFS_FSB_TO_AGNO(mp,
 					bma.cur->bc_private.b.firstblock)));
@@ -4727,7 +4727,7 @@ xfs_bmap_del_extent(
 	xfs_inode_t		*ip,	/* incore inode pointer */
 	xfs_trans_t		*tp,	/* current transaction pointer */
 	xfs_extnum_t		*idx,	/* extent number to update/delete */
-	struct xfs_defer_ops	*flist,	/* list of extents to be freed */
+	struct xfs_defer_ops	*dfops,	/* list of extents to be freed */
 	xfs_btree_cur_t		*cur,	/* if null, not a btree */
 	xfs_bmbt_irec_t		*del,	/* data to remove from extents */
 	int			*logflagsp, /* inode logging flags */
@@ -5015,7 +5015,7 @@ xfs_bmap_del_extent(
 	 * If we need to, add to list of extents to delete.
 	 */
 	if (do_fx)
-		xfs_bmap_add_free(mp, flist, del->br_startblock,
+		xfs_bmap_add_free(mp, dfops, del->br_startblock,
 			del->br_blockcount);
 	/*
 	 * Adjust inode # blocks in the file.
@@ -5056,7 +5056,7 @@ xfs_bunmapi(
 	xfs_extnum_t		nexts,		/* number of extents max */
 	xfs_fsblock_t		*firstblock,	/* first allocated block
 						   controls a.g. for allocs */
-	struct xfs_defer_ops	*flist,		/* i/o: list extents to free */
+	struct xfs_defer_ops	*dfops,		/* i/o: list extents to free */
 	int			*done)		/* set if not done yet */
 {
 	xfs_btree_cur_t		*cur;		/* bmap btree cursor */
@@ -5129,7 +5129,7 @@ xfs_bunmapi(
 		ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE);
 		cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork);
 		cur->bc_private.b.firstblock = *firstblock;
-		cur->bc_private.b.flist = flist;
+		cur->bc_private.b.dfops = dfops;
 		cur->bc_private.b.flags = 0;
 	} else
 		cur = NULL;
@@ -5221,7 +5221,7 @@ xfs_bunmapi(
 			}
 			del.br_state = XFS_EXT_UNWRITTEN;
 			error = xfs_bmap_add_extent_unwritten_real(tp, ip,
-					&lastx, &cur, &del, firstblock, flist,
+					&lastx, &cur, &del, firstblock, dfops,
 					&logflags);
 			if (error)
 				goto error0;
@@ -5280,7 +5280,7 @@ xfs_bunmapi(
 				lastx--;
 				error = xfs_bmap_add_extent_unwritten_real(tp,
 						ip, &lastx, &cur, &prev,
-						firstblock, flist, &logflags);
+						firstblock, dfops, &logflags);
 				if (error)
 					goto error0;
 				goto nodelete;
@@ -5289,7 +5289,7 @@ xfs_bunmapi(
 				del.br_state = XFS_EXT_UNWRITTEN;
 				error = xfs_bmap_add_extent_unwritten_real(tp,
 						ip, &lastx, &cur, &del,
-						firstblock, flist, &logflags);
+						firstblock, dfops, &logflags);
 				if (error)
 					goto error0;
 				goto nodelete;
@@ -5347,7 +5347,7 @@ xfs_bunmapi(
 		} else if (cur)
 			cur->bc_private.b.flags &= ~XFS_BTCUR_BPRV_WASDEL;
 
-		error = xfs_bmap_del_extent(ip, tp, &lastx, flist, cur, &del,
+		error = xfs_bmap_del_extent(ip, tp, &lastx, dfops, cur, &del,
 				&tmp_logflags, whichfork);
 		logflags |= tmp_logflags;
 		if (error)
@@ -5381,7 +5381,7 @@ nodelete:
 	 */
 	if (xfs_bmap_needs_btree(ip, whichfork)) {
 		ASSERT(cur == NULL);
-		error = xfs_bmap_extents_to_btree(tp, ip, firstblock, flist,
+		error = xfs_bmap_extents_to_btree(tp, ip, firstblock, dfops,
 			&cur, 0, &tmp_logflags, whichfork);
 		logflags |= tmp_logflags;
 		if (error)
@@ -5670,7 +5670,7 @@ xfs_bmap_shift_extents(
 	int			*done,
 	xfs_fileoff_t		stop_fsb,
 	xfs_fsblock_t		*firstblock,
-	struct xfs_defer_ops	*flist,
+	struct xfs_defer_ops	*dfops,
 	enum shift_direction	direction,
 	int			num_exts)
 {
@@ -5715,7 +5715,7 @@ xfs_bmap_shift_extents(
 	if (ifp->if_flags & XFS_IFBROOT) {
 		cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork);
 		cur->bc_private.b.firstblock = *firstblock;
-		cur->bc_private.b.flist = flist;
+		cur->bc_private.b.dfops = dfops;
 		cur->bc_private.b.flags = 0;
 	}
 
@@ -5824,7 +5824,7 @@ xfs_bmap_split_extent_at(
 	struct xfs_inode	*ip,
 	xfs_fileoff_t		split_fsb,
 	xfs_fsblock_t		*firstfsb,
-	struct xfs_defer_ops	*free_list)
+	struct xfs_defer_ops	*dfops)
 {
 	int				whichfork = XFS_DATA_FORK;
 	struct xfs_btree_cur		*cur = NULL;
@@ -5886,7 +5886,7 @@ xfs_bmap_split_extent_at(
 	if (ifp->if_flags & XFS_IFBROOT) {
 		cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork);
 		cur->bc_private.b.firstblock = *firstfsb;
-		cur->bc_private.b.flist = free_list;
+		cur->bc_private.b.dfops = dfops;
 		cur->bc_private.b.flags = 0;
 		error = xfs_bmbt_lookup_eq(cur, got.br_startoff,
 				got.br_startblock,
@@ -5939,7 +5939,7 @@ xfs_bmap_split_extent_at(
 		int tmp_logflags; /* partial log flag return val */
 
 		ASSERT(cur == NULL);
-		error = xfs_bmap_extents_to_btree(tp, ip, firstfsb, free_list,
+		error = xfs_bmap_extents_to_btree(tp, ip, firstfsb, dfops,
 				&cur, 0, &tmp_logflags, whichfork);
 		logflags |= tmp_logflags;
 	}
@@ -5963,7 +5963,7 @@ xfs_bmap_split_extent(
 {
 	struct xfs_mount        *mp = ip->i_mount;
 	struct xfs_trans        *tp;
-	struct xfs_defer_ops    free_list;
+	struct xfs_defer_ops    dfops;
 	xfs_fsblock_t           firstfsb;
 	int                     error;
 
@@ -5975,21 +5975,21 @@ xfs_bmap_split_extent(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
 
-	xfs_defer_init(&free_list, &firstfsb);
+	xfs_defer_init(&dfops, &firstfsb);
 
 	error = xfs_bmap_split_extent_at(tp, ip, split_fsb,
-			&firstfsb, &free_list);
+			&firstfsb, &dfops);
 	if (error)
 		goto out;
 
-	error = xfs_defer_finish(&tp, &free_list, NULL);
+	error = xfs_defer_finish(&tp, &dfops, NULL);
 	if (error)
 		goto out;
 
 	return xfs_trans_commit(tp);
 
 out:
-	xfs_defer_cancel(&free_list);
+	xfs_defer_cancel(&dfops);
 	xfs_trans_cancel(tp);
 	return error;
 }
diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
index a779cc5..6854e61 100644
--- a/libxfs/xfs_bmap.h
+++ b/libxfs/xfs_bmap.h
@@ -32,7 +32,7 @@ extern kmem_zone_t	*xfs_bmap_free_item_zone;
  */
 struct xfs_bmalloca {
 	xfs_fsblock_t		*firstblock; /* i/o first block allocated */
-	struct xfs_defer_ops	*flist;	/* bmap freelist */
+	struct xfs_defer_ops	*dfops;	/* bmap freelist */
 	struct xfs_trans	*tp;	/* transaction pointer */
 	struct xfs_inode	*ip;	/* incore inode pointer */
 	struct xfs_bmbt_irec	prev;	/* extent before the new one */
@@ -164,7 +164,7 @@ void	xfs_bmap_trace_exlist(struct xfs_inode *ip, xfs_extnum_t cnt,
 
 int	xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd);
 void	xfs_bmap_local_to_extents_empty(struct xfs_inode *ip, int whichfork);
-void	xfs_bmap_add_free(struct xfs_mount *mp, struct xfs_defer_ops *flist,
+void	xfs_bmap_add_free(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
 			  xfs_fsblock_t bno, xfs_filblks_t len);
 void	xfs_bmap_compute_maxlevels(struct xfs_mount *mp, int whichfork);
 int	xfs_bmap_first_unused(struct xfs_trans *tp, struct xfs_inode *ip,
@@ -186,18 +186,18 @@ int	xfs_bmapi_write(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_fileoff_t bno, xfs_filblks_t len, int flags,
 		xfs_fsblock_t *firstblock, xfs_extlen_t total,
 		struct xfs_bmbt_irec *mval, int *nmap,
-		struct xfs_defer_ops *flist);
+		struct xfs_defer_ops *dfops);
 int	xfs_bunmapi(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_fileoff_t bno, xfs_filblks_t len, int flags,
 		xfs_extnum_t nexts, xfs_fsblock_t *firstblock,
-		struct xfs_defer_ops *flist, int *done);
+		struct xfs_defer_ops *dfops, int *done);
 int	xfs_check_nostate_extents(struct xfs_ifork *ifp, xfs_extnum_t idx,
 		xfs_extnum_t num);
 uint	xfs_default_attroffset(struct xfs_inode *ip);
 int	xfs_bmap_shift_extents(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_fileoff_t *next_fsb, xfs_fileoff_t offset_shift_fsb,
 		int *done, xfs_fileoff_t stop_fsb, xfs_fsblock_t *firstblock,
-		struct xfs_defer_ops *flist, enum shift_direction direction,
+		struct xfs_defer_ops *dfops, enum shift_direction direction,
 		int num_exts);
 int	xfs_bmap_split_extent(struct xfs_inode *ip, xfs_fileoff_t split_offset);
 
diff --git a/libxfs/xfs_bmap_btree.c b/libxfs/xfs_bmap_btree.c
index 8d4d4b0..d290769 100644
--- a/libxfs/xfs_bmap_btree.c
+++ b/libxfs/xfs_bmap_btree.c
@@ -404,11 +404,11 @@ xfs_bmbt_dup_cursor(
 			cur->bc_private.b.ip, cur->bc_private.b.whichfork);
 
 	/*
-	 * Copy the firstblock, flist, and flags values,
+	 * Copy the firstblock, dfops, and flags values,
 	 * since init cursor doesn't get them.
 	 */
 	new->bc_private.b.firstblock = cur->bc_private.b.firstblock;
-	new->bc_private.b.flist = cur->bc_private.b.flist;
+	new->bc_private.b.dfops = cur->bc_private.b.dfops;
 	new->bc_private.b.flags = cur->bc_private.b.flags;
 
 	return new;
@@ -421,7 +421,7 @@ xfs_bmbt_update_cursor(
 {
 	ASSERT((dst->bc_private.b.firstblock != NULLFSBLOCK) ||
 	       (dst->bc_private.b.ip->i_d.di_flags & XFS_DIFLAG_REALTIME));
-	ASSERT(dst->bc_private.b.flist == src->bc_private.b.flist);
+	ASSERT(dst->bc_private.b.dfops == src->bc_private.b.dfops);
 
 	dst->bc_private.b.allocated += src->bc_private.b.allocated;
 	dst->bc_private.b.firstblock = src->bc_private.b.firstblock;
@@ -460,7 +460,7 @@ xfs_bmbt_alloc_block(
 		 * block allocation here and corrupt the filesystem.
 		 */
 		args.minleft = args.tp->t_blk_res;
-	} else if (cur->bc_private.b.flist->dop_low) {
+	} else if (cur->bc_private.b.dfops->dop_low) {
 		args.type = XFS_ALLOCTYPE_START_BNO;
 	} else {
 		args.type = XFS_ALLOCTYPE_NEAR_BNO;
@@ -488,7 +488,7 @@ xfs_bmbt_alloc_block(
 		error = xfs_alloc_vextent(&args);
 		if (error)
 			goto error0;
-		cur->bc_private.b.flist->dop_low = true;
+		cur->bc_private.b.dfops->dop_low = true;
 	}
 	if (args.fsbno == NULLFSBLOCK) {
 		XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
@@ -524,7 +524,7 @@ xfs_bmbt_free_block(
 	struct xfs_trans	*tp = cur->bc_tp;
 	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
 
-	xfs_bmap_add_free(mp, cur->bc_private.b.flist, fsbno, 1);
+	xfs_bmap_add_free(mp, cur->bc_private.b.dfops, fsbno, 1);
 	ip->i_d.di_nblocks--;
 
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
@@ -786,7 +786,7 @@ xfs_bmbt_init_cursor(
 	cur->bc_private.b.forksize = XFS_IFORK_SIZE(ip, whichfork);
 	cur->bc_private.b.ip = ip;
 	cur->bc_private.b.firstblock = NULLFSBLOCK;
-	cur->bc_private.b.flist = NULL;
+	cur->bc_private.b.dfops = NULL;
 	cur->bc_private.b.allocated = 0;
 	cur->bc_private.b.flags = 0;
 	cur->bc_private.b.whichfork = whichfork;
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index ae714a8..7483cac 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -234,12 +234,12 @@ typedef struct xfs_btree_cur
 	union {
 		struct {			/* needed for BNO, CNT, INO */
 			struct xfs_buf	*agbp;	/* agf/agi buffer pointer */
-			struct xfs_defer_ops *flist;	/* deferred updates */
+			struct xfs_defer_ops *dfops;	/* deferred updates */
 			xfs_agnumber_t	agno;	/* ag number */
 		} a;
 		struct {			/* needed for BMAP */
 			struct xfs_inode *ip;	/* pointer to our inode */
-			struct xfs_defer_ops *flist;	/* deferred updates */
+			struct xfs_defer_ops *dfops;	/* deferred updates */
 			xfs_fsblock_t	firstblock;	/* 1st blk allocated */
 			int		allocated;	/* count of alloced */
 			short		forksize;	/* fork's inode space */
diff --git a/libxfs/xfs_da_btree.c b/libxfs/xfs_da_btree.c
index 298252e..bf24ade 100644
--- a/libxfs/xfs_da_btree.c
+++ b/libxfs/xfs_da_btree.c
@@ -2025,7 +2025,7 @@ xfs_da_grow_inode_int(
 	error = xfs_bmapi_write(tp, dp, *bno, count,
 			xfs_bmapi_aflag(w)|XFS_BMAPI_METADATA|XFS_BMAPI_CONTIG,
 			args->firstblock, args->total, &map, &nmap,
-			args->flist);
+			args->dfops);
 	if (error)
 		return error;
 
@@ -2048,7 +2048,7 @@ xfs_da_grow_inode_int(
 			error = xfs_bmapi_write(tp, dp, b, c,
 					xfs_bmapi_aflag(w)|XFS_BMAPI_METADATA,
 					args->firstblock, args->total,
-					&mapp[mapi], &nmap, args->flist);
+					&mapp[mapi], &nmap, args->dfops);
 			if (error)
 				goto out_free_map;
 			if (nmap < 1)
@@ -2358,7 +2358,7 @@ xfs_da_shrink_inode(
 		 */
 		error = xfs_bunmapi(tp, dp, dead_blkno, count,
 				    xfs_bmapi_aflag(w), 0, args->firstblock,
-				    args->flist, &done);
+				    args->dfops, &done);
 		if (error == -ENOSPC) {
 			if (w != XFS_DATA_FORK)
 				break;
diff --git a/libxfs/xfs_da_btree.h b/libxfs/xfs_da_btree.h
index 249813a..98c75cb 100644
--- a/libxfs/xfs_da_btree.h
+++ b/libxfs/xfs_da_btree.h
@@ -70,7 +70,7 @@ typedef struct xfs_da_args {
 	xfs_ino_t	inumber;	/* input/output inode number */
 	struct xfs_inode *dp;		/* directory inode to manipulate */
 	xfs_fsblock_t	*firstblock;	/* ptr to firstblock for bmap calls */
-	struct xfs_defer_ops *flist;	/* ptr to freelist for bmap_finish */
+	struct xfs_defer_ops *dfops;	/* ptr to freelist for bmap_finish */
 	struct xfs_trans *trans;	/* current trans (changes over time) */
 	xfs_extlen_t	total;		/* total blocks needed, for 1st bmap */
 	int		whichfork;	/* data or attribute fork */
diff --git a/libxfs/xfs_dir2.c b/libxfs/xfs_dir2.c
index 6edaa8e..4180a93 100644
--- a/libxfs/xfs_dir2.c
+++ b/libxfs/xfs_dir2.c
@@ -258,7 +258,7 @@ xfs_dir_createname(
 	struct xfs_name		*name,
 	xfs_ino_t		inum,		/* new entry inode number */
 	xfs_fsblock_t		*first,		/* bmap's firstblock */
-	struct xfs_defer_ops	*flist,		/* bmap's freeblock list */
+	struct xfs_defer_ops	*dfops,		/* bmap's freeblock list */
 	xfs_extlen_t		total)		/* bmap's total block count */
 {
 	struct xfs_da_args	*args;
@@ -285,7 +285,7 @@ xfs_dir_createname(
 	args->inumber = inum;
 	args->dp = dp;
 	args->firstblock = first;
-	args->flist = flist;
+	args->dfops = dfops;
 	args->total = total;
 	args->whichfork = XFS_DATA_FORK;
 	args->trans = tp;
@@ -435,7 +435,7 @@ xfs_dir_removename(
 	struct xfs_name	*name,
 	xfs_ino_t	ino,
 	xfs_fsblock_t	*first,		/* bmap's firstblock */
-	struct xfs_defer_ops	*flist,		/* bmap's freeblock list */
+	struct xfs_defer_ops	*dfops,		/* bmap's freeblock list */
 	xfs_extlen_t	total)		/* bmap's total block count */
 {
 	struct xfs_da_args *args;
@@ -457,7 +457,7 @@ xfs_dir_removename(
 	args->inumber = ino;
 	args->dp = dp;
 	args->firstblock = first;
-	args->flist = flist;
+	args->dfops = dfops;
 	args->total = total;
 	args->whichfork = XFS_DATA_FORK;
 	args->trans = tp;
@@ -497,7 +497,7 @@ xfs_dir_replace(
 	struct xfs_name	*name,		/* name of entry to replace */
 	xfs_ino_t	inum,		/* new inode number */
 	xfs_fsblock_t	*first,		/* bmap's firstblock */
-	struct xfs_defer_ops	*flist,		/* bmap's freeblock list */
+	struct xfs_defer_ops	*dfops,		/* bmap's freeblock list */
 	xfs_extlen_t	total)		/* bmap's total block count */
 {
 	struct xfs_da_args *args;
@@ -522,7 +522,7 @@ xfs_dir_replace(
 	args->inumber = inum;
 	args->dp = dp;
 	args->firstblock = first;
-	args->flist = flist;
+	args->dfops = dfops;
 	args->total = total;
 	args->whichfork = XFS_DATA_FORK;
 	args->trans = tp;
@@ -679,7 +679,7 @@ xfs_dir2_shrink_inode(
 
 	/* Unmap the fsblock(s). */
 	error = xfs_bunmapi(tp, dp, da, args->geo->fsbcount, 0, 0,
-			    args->firstblock, args->flist, &done);
+			    args->firstblock, args->dfops, &done);
 	if (error) {
 		/*
 		 * ENOSPC actually can happen if we're in a removename with no
diff --git a/libxfs/xfs_dir2.h b/libxfs/xfs_dir2.h
index 5737d85..a9bab0e 100644
--- a/libxfs/xfs_dir2.h
+++ b/libxfs/xfs_dir2.h
@@ -129,18 +129,18 @@ extern int xfs_dir_init(struct xfs_trans *tp, struct xfs_inode *dp,
 extern int xfs_dir_createname(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name, xfs_ino_t inum,
 				xfs_fsblock_t *first,
-				struct xfs_defer_ops *flist, xfs_extlen_t tot);
+				struct xfs_defer_ops *dfops, xfs_extlen_t tot);
 extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name, xfs_ino_t *inum,
 				struct xfs_name *ci_name);
 extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name, xfs_ino_t ino,
 				xfs_fsblock_t *first,
-				struct xfs_defer_ops *flist, xfs_extlen_t tot);
+				struct xfs_defer_ops *dfops, xfs_extlen_t tot);
 extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name, xfs_ino_t inum,
 				xfs_fsblock_t *first,
-				struct xfs_defer_ops *flist, xfs_extlen_t tot);
+				struct xfs_defer_ops *dfops, xfs_extlen_t tot);
 extern int xfs_dir_canenter(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name);
 
diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index d03570c..1545338 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -1812,7 +1812,7 @@ xfs_difree_inode_chunk(
 	struct xfs_mount		*mp,
 	xfs_agnumber_t			agno,
 	struct xfs_inobt_rec_incore	*rec,
-	struct xfs_defer_ops		*flist)
+	struct xfs_defer_ops		*dfops)
 {
 	xfs_agblock_t	sagbno = XFS_AGINO_TO_AGBNO(mp, rec->ir_startino);
 	int		startidx, endidx;
@@ -1823,7 +1823,7 @@ xfs_difree_inode_chunk(
 
 	if (!xfs_inobt_issparse(rec->ir_holemask)) {
 		/* not sparse, calculate extent info directly */
-		xfs_bmap_add_free(mp, flist, XFS_AGB_TO_FSB(mp, agno, sagbno),
+		xfs_bmap_add_free(mp, dfops, XFS_AGB_TO_FSB(mp, agno, sagbno),
 				  mp->m_ialloc_blks);
 		return;
 	}
@@ -1867,7 +1867,7 @@ xfs_difree_inode_chunk(
 
 		ASSERT(agbno % mp->m_sb.sb_spino_align == 0);
 		ASSERT(contigblk % mp->m_sb.sb_spino_align == 0);
-		xfs_bmap_add_free(mp, flist, XFS_AGB_TO_FSB(mp, agno, agbno),
+		xfs_bmap_add_free(mp, dfops, XFS_AGB_TO_FSB(mp, agno, agbno),
 				  contigblk);
 
 		/* reset range to current bit and carry on... */
@@ -1884,7 +1884,7 @@ xfs_difree_inobt(
 	struct xfs_trans		*tp,
 	struct xfs_buf			*agbp,
 	xfs_agino_t			agino,
-	struct xfs_defer_ops		*flist,
+	struct xfs_defer_ops		*dfops,
 	struct xfs_icluster		*xic,
 	struct xfs_inobt_rec_incore	*orec)
 {
@@ -1971,7 +1971,7 @@ xfs_difree_inobt(
 			goto error0;
 		}
 
-		xfs_difree_inode_chunk(mp, agno, &rec, flist);
+		xfs_difree_inode_chunk(mp, agno, &rec, dfops);
 	} else {
 		xic->deleted = 0;
 
@@ -2116,7 +2116,7 @@ int
 xfs_difree(
 	struct xfs_trans	*tp,		/* transaction pointer */
 	xfs_ino_t		inode,		/* inode to be freed */
-	struct xfs_defer_ops	*flist,		/* extents to free */
+	struct xfs_defer_ops	*dfops,		/* extents to free */
 	struct xfs_icluster	*xic)	/* cluster info if deleted */
 {
 	/* REFERENCED */
@@ -2168,7 +2168,7 @@ xfs_difree(
 	/*
 	 * Fix up the inode allocation btree.
 	 */
-	error = xfs_difree_inobt(mp, tp, agbp, agino, flist, xic, &rec);
+	error = xfs_difree_inobt(mp, tp, agbp, agino, dfops, xic, &rec);
 	if (error)
 		goto error0;
 
diff --git a/libxfs/xfs_ialloc.h b/libxfs/xfs_ialloc.h
index 2e06b67..0bb8966 100644
--- a/libxfs/xfs_ialloc.h
+++ b/libxfs/xfs_ialloc.h
@@ -95,7 +95,7 @@ int					/* error */
 xfs_difree(
 	struct xfs_trans *tp,		/* transaction pointer */
 	xfs_ino_t	inode,		/* inode to be freed */
-	struct xfs_defer_ops *flist,	/* extents to free */
+	struct xfs_defer_ops *dfops,	/* extents to free */
 	struct xfs_icluster *ifree);	/* cluster info if deleted */
 
 /*
diff --git a/mkfs/proto.c b/mkfs/proto.c
index f0c33e4..cf9ed77 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -27,7 +27,7 @@ static char *getstr(char **pp);
 static void fail(char *msg, int i);
 static void getres(struct xfs_mount *mp, uint blocks, struct xfs_trans **tpp);
 static void rsvfile(xfs_mount_t *mp, xfs_inode_t *ip, long long len);
-static int newfile(xfs_trans_t *tp, xfs_inode_t *ip, struct xfs_defer_ops *flist,
+static int newfile(xfs_trans_t *tp, xfs_inode_t *ip, struct xfs_defer_ops *dfops,
 	xfs_fsblock_t *first, int dolocal, int logit, char *buf, int len);
 static char *newregfile(char **pp, int *len);
 static void rtinit(xfs_mount_t *mp);
@@ -236,7 +236,7 @@ static int
 newfile(
 	xfs_trans_t	*tp,
 	xfs_inode_t	*ip,
-	struct xfs_defer_ops	*flist,
+	struct xfs_defer_ops	*dfops,
 	xfs_fsblock_t	*first,
 	int		dolocal,
 	int		logit,
@@ -267,7 +267,7 @@ newfile(
 		nb = XFS_B_TO_FSB(mp, len);
 		nmap = 1;
 		error = -libxfs_bmapi_write(tp, ip, 0, nb, 0, first, nb,
-				&map, &nmap, flist);
+				&map, &nmap, dfops);
 		if (error) {
 			fail(_("error allocating space for a file"), error);
 		}
@@ -329,14 +329,14 @@ newdirent(
 	struct xfs_name	*name,
 	xfs_ino_t	inum,
 	xfs_fsblock_t	*first,
-	struct xfs_defer_ops	*flist)
+	struct xfs_defer_ops	*dfops)
 {
 	int	error;
 	int	rsv;
 
 	rsv = XFS_DIRENTER_SPACE_RES(mp, name->len);
 
-	error = -libxfs_dir_createname(tp, pip, name, inum, first, flist, rsv);
+	error = -libxfs_dir_createname(tp, pip, name, inum, first, dfops, rsv);
 	if (error)
 		fail(_("directory createname error"), error);
 }
@@ -375,7 +375,7 @@ parseproto(
 	int		error;
 	xfs_fsblock_t	first;
 	int		flags;
-	struct xfs_defer_ops	flist;
+	struct xfs_defer_ops	dfops;
 	int		fmt;
 	int		i;
 	xfs_inode_t	*ip;
@@ -460,7 +460,7 @@ parseproto(
 	xname.len = name ? strlen(name) : 0;
 	xname.type = 0;
 	flags = XFS_ILOG_CORE;
-	xfs_defer_init(&flist, &first);
+	xfs_defer_init(&dfops, &first);
 	switch (fmt) {
 	case IF_REGULAR:
 		buf = newregfile(pp, &len);
@@ -469,12 +469,12 @@ parseproto(
 					   &creds, fsxp, &ip);
 		if (error)
 			fail(_("Inode allocation failed"), error);
-		flags |= newfile(tp, ip, &flist, &first, 0, 0, buf, len);
+		flags |= newfile(tp, ip, &dfops, &first, 0, 0, buf, len);
 		if (buf)
 			free(buf);
 		libxfs_trans_ijoin(tp, pip, 0);
 		xname.type = XFS_DIR3_FT_REG_FILE;
-		newdirent(mp, tp, pip, &xname, ip->i_ino, &first, &flist);
+		newdirent(mp, tp, pip, &xname, ip->i_ino, &first, &dfops);
 		break;
 
 	case IF_RESERVED:			/* pre-allocated space only */
@@ -497,10 +497,10 @@ parseproto(
 		libxfs_trans_ijoin(tp, pip, 0);
 
 		xname.type = XFS_DIR3_FT_REG_FILE;
-		newdirent(mp, tp, pip, &xname, ip->i_ino, &first, &flist);
+		newdirent(mp, tp, pip, &xname, ip->i_ino, &first, &dfops);
 		libxfs_trans_log_inode(tp, ip, flags);
 
-		error = -xfs_defer_finish(&tp, &flist, ip);
+		error = -xfs_defer_finish(&tp, &dfops, ip);
 		if (error)
 			fail(_("Pre-allocated file creation failed"), error);
 		libxfs_trans_commit(tp);
@@ -519,7 +519,7 @@ parseproto(
 		}
 		libxfs_trans_ijoin(tp, pip, 0);
 		xname.type = XFS_DIR3_FT_BLKDEV;
-		newdirent(mp, tp, pip, &xname, ip->i_ino, &first, &flist);
+		newdirent(mp, tp, pip, &xname, ip->i_ino, &first, &dfops);
 		flags |= XFS_ILOG_DEV;
 		break;
 
@@ -533,7 +533,7 @@ parseproto(
 			fail(_("Inode allocation failed"), error);
 		libxfs_trans_ijoin(tp, pip, 0);
 		xname.type = XFS_DIR3_FT_CHRDEV;
-		newdirent(mp, tp, pip, &xname, ip->i_ino, &first, &flist);
+		newdirent(mp, tp, pip, &xname, ip->i_ino, &first, &dfops);
 		flags |= XFS_ILOG_DEV;
 		break;
 
@@ -545,7 +545,7 @@ parseproto(
 			fail(_("Inode allocation failed"), error);
 		libxfs_trans_ijoin(tp, pip, 0);
 		xname.type = XFS_DIR3_FT_FIFO;
-		newdirent(mp, tp, pip, &xname, ip->i_ino, &first, &flist);
+		newdirent(mp, tp, pip, &xname, ip->i_ino, &first, &dfops);
 		break;
 	case IF_SYMLINK:
 		buf = getstr(pp);
@@ -555,10 +555,10 @@ parseproto(
 				&creds, fsxp, &ip);
 		if (error)
 			fail(_("Inode allocation failed"), error);
-		flags |= newfile(tp, ip, &flist, &first, 1, 1, buf, len);
+		flags |= newfile(tp, ip, &dfops, &first, 1, 1, buf, len);
 		libxfs_trans_ijoin(tp, pip, 0);
 		xname.type = XFS_DIR3_FT_SYMLINK;
-		newdirent(mp, tp, pip, &xname, ip->i_ino, &first, &flist);
+		newdirent(mp, tp, pip, &xname, ip->i_ino, &first, &dfops);
 		break;
 	case IF_DIRECTORY:
 		getres(mp, 0, &tp);
@@ -576,13 +576,13 @@ parseproto(
 			libxfs_trans_ijoin(tp, pip, 0);
 			xname.type = XFS_DIR3_FT_DIR;
 			newdirent(mp, tp, pip, &xname, ip->i_ino,
-				  &first, &flist);
+				  &first, &dfops);
 			inc_nlink(VFS_I(pip));
 			libxfs_trans_log_inode(tp, pip, XFS_ILOG_CORE);
 		}
 		newdirectory(mp, tp, ip, pip);
 		libxfs_trans_log_inode(tp, ip, flags);
-		error = -xfs_defer_finish(&tp, &flist, ip);
+		error = -xfs_defer_finish(&tp, &dfops, ip);
 		if (error)
 			fail(_("Directory creation failed"), error);
 		libxfs_trans_commit(tp);
@@ -608,7 +608,7 @@ parseproto(
 		fail(_("Unknown format"), EINVAL);
 	}
 	libxfs_trans_log_inode(tp, ip, flags);
-	error = -xfs_defer_finish(&tp, &flist, ip);
+	error = -xfs_defer_finish(&tp, &dfops, ip);
 	if (error) {
 		fail(_("Error encountered creating file from prototype file"),
 			error);
@@ -638,7 +638,7 @@ rtinit(
 	xfs_bmbt_irec_t	*ep;
 	int		error;
 	xfs_fsblock_t	first;
-	struct xfs_defer_ops	flist;
+	struct xfs_defer_ops	dfops;
 	int		i;
 	xfs_bmbt_irec_t	map[XFS_BMAP_MAX_NMAP];
 	xfs_extlen_t	nsumblocks;
@@ -697,13 +697,13 @@ rtinit(
 
 	libxfs_trans_ijoin(tp, rbmip, 0);
 	bno = 0;
-	xfs_defer_init(&flist, &first);
+	xfs_defer_init(&dfops, &first);
 	while (bno < mp->m_sb.sb_rbmblocks) {
 		nmap = XFS_BMAP_MAX_NMAP;
 		error = -libxfs_bmapi_write(tp, rbmip, bno,
 				(xfs_extlen_t)(mp->m_sb.sb_rbmblocks - bno),
 				0, &first, mp->m_sb.sb_rbmblocks,
-				map, &nmap, &flist);
+				map, &nmap, &dfops);
 		if (error) {
 			fail(_("Allocation of the realtime bitmap failed"),
 				error);
@@ -716,7 +716,7 @@ rtinit(
 		}
 	}
 
-	error = -xfs_defer_finish(&tp, &flist, rbmip);
+	error = -xfs_defer_finish(&tp, &dfops, rbmip);
 	if (error) {
 		fail(_("Completion of the realtime bitmap failed"), error);
 	}
@@ -732,13 +732,13 @@ rtinit(
 		res_failed(i);
 	libxfs_trans_ijoin(tp, rsumip, 0);
 	bno = 0;
-	xfs_defer_init(&flist, &first);
+	xfs_defer_init(&dfops, &first);
 	while (bno < nsumblocks) {
 		nmap = XFS_BMAP_MAX_NMAP;
 		error = -libxfs_bmapi_write(tp, rsumip, bno,
 				(xfs_extlen_t)(nsumblocks - bno),
 				0, &first, nsumblocks,
-				map, &nmap, &flist);
+				map, &nmap, &dfops);
 		if (error) {
 			fail(_("Allocation of the realtime summary failed"),
 				error);
@@ -750,7 +750,7 @@ rtinit(
 			bno += ep->br_blockcount;
 		}
 	}
-	error = -xfs_defer_finish(&tp, &flist, rsumip);
+	error = -xfs_defer_finish(&tp, &dfops, rsumip);
 	if (error) {
 		fail(_("Completion of the realtime summary failed"), error);
 	}
@@ -765,7 +765,7 @@ rtinit(
 		if (i)
 			res_failed(i);
 		libxfs_trans_ijoin(tp, rbmip, 0);
-		xfs_defer_init(&flist, &first);
+		xfs_defer_init(&dfops, &first);
 		ebno = XFS_RTMIN(mp->m_sb.sb_rextents,
 			bno + NBBY * mp->m_sb.sb_blocksize);
 		error = -libxfs_rtfree_extent(tp, bno, (xfs_extlen_t)(ebno-bno));
@@ -773,7 +773,7 @@ rtinit(
 			fail(_("Error initializing the realtime space"),
 				error);
 		}
-		error = -xfs_defer_finish(&tp, &flist, rbmip);
+		error = -xfs_defer_finish(&tp, &dfops, rbmip);
 		if (error) {
 			fail(_("Error completing the realtime space"), error);
 		}
diff --git a/repair/phase6.c b/repair/phase6.c
index 961f7bc..0194580 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -484,7 +484,7 @@ mk_rbmino(xfs_mount_t *mp)
 	int		i;
 	int		nmap;
 	int		error;
-	struct xfs_defer_ops	flist;
+	struct xfs_defer_ops	dfops;
 	xfs_fileoff_t	bno;
 	xfs_bmbt_irec_t	map[XFS_BMAP_MAX_NMAP];
 	int		vers;
@@ -550,13 +550,13 @@ mk_rbmino(xfs_mount_t *mp)
 
 	libxfs_trans_ijoin(tp, ip, 0);
 	bno = 0;
-	xfs_defer_init(&flist, &first);
+	xfs_defer_init(&dfops, &first);
 	while (bno < mp->m_sb.sb_rbmblocks) {
 		nmap = XFS_BMAP_MAX_NMAP;
 		error = -libxfs_bmapi_write(tp, ip, bno,
 			  (xfs_extlen_t)(mp->m_sb.sb_rbmblocks - bno),
 			  0, &first, mp->m_sb.sb_rbmblocks,
-			  map, &nmap, &flist);
+			  map, &nmap, &dfops);
 		if (error) {
 			do_error(
 			_("couldn't allocate realtime bitmap, error = %d\n"),
@@ -569,7 +569,7 @@ mk_rbmino(xfs_mount_t *mp)
 			bno += ep->br_blockcount;
 		}
 	}
-	error = -xfs_defer_finish(&tp, &flist, ip);
+	error = -xfs_defer_finish(&tp, &dfops, ip);
 	if (error) {
 		do_error(
 		_("allocation of the realtime bitmap failed, error = %d\n"),
@@ -731,7 +731,7 @@ mk_rsumino(xfs_mount_t *mp)
 	int		nmap;
 	int		error;
 	int		nsumblocks;
-	struct xfs_defer_ops	flist;
+	struct xfs_defer_ops	dfops;
 	xfs_fileoff_t	bno;
 	xfs_bmbt_irec_t	map[XFS_BMAP_MAX_NMAP];
 	int		vers;
@@ -789,7 +789,7 @@ mk_rsumino(xfs_mount_t *mp)
 	 * then allocate blocks for file and fill with zeroes (stolen
 	 * from mkfs)
 	 */
-	xfs_defer_init(&flist, &first);
+	xfs_defer_init(&dfops, &first);
 
 	nsumblocks = mp->m_rsumsize >> mp->m_sb.sb_blocklog;
 	tres.tr_logres = BBTOB(128);
@@ -803,12 +803,12 @@ mk_rsumino(xfs_mount_t *mp)
 
 	libxfs_trans_ijoin(tp, ip, 0);
 	bno = 0;
-	xfs_defer_init(&flist, &first);
+	xfs_defer_init(&dfops, &first);
 	while (bno < nsumblocks) {
 		nmap = XFS_BMAP_MAX_NMAP;
 		error = -libxfs_bmapi_write(tp, ip, bno,
 			  (xfs_extlen_t)(nsumblocks - bno),
-			  0, &first, nsumblocks, map, &nmap, &flist);
+			  0, &first, nsumblocks, map, &nmap, &dfops);
 		if (error) {
 			do_error(
 		_("couldn't allocate realtime summary inode, error = %d\n"),
@@ -821,7 +821,7 @@ mk_rsumino(xfs_mount_t *mp)
 			bno += ep->br_blockcount;
 		}
 	}
-	error = -xfs_defer_finish(&tp, &flist, ip);
+	error = -xfs_defer_finish(&tp, &dfops, ip);
 	if (error) {
 		do_error(
 	_("allocation of the realtime summary ino failed, error = %d\n"),
@@ -919,7 +919,7 @@ mk_orphanage(xfs_mount_t *mp)
 	int		ino_offset = 0;
 	int		i;
 	int		error;
-	struct xfs_defer_ops	flist;
+	struct xfs_defer_ops	dfops;
 	const int	mode = 0755;
 	int		nres;
 	struct xfs_name	xname;
@@ -945,7 +945,7 @@ mk_orphanage(xfs_mount_t *mp)
 	 * could not be found, create it
 	 */
 
-	xfs_defer_init(&flist, &first);
+	xfs_defer_init(&dfops, &first);
 
 	nres = XFS_MKDIR_SPACE_RES(mp, xname.len);
 	i = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_mkdir, nres, 0, 0, &tp);
@@ -1007,7 +1007,7 @@ mk_orphanage(xfs_mount_t *mp)
 	 * create the actual entry
 	 */
 	error = -libxfs_dir_createname(tp, pip, &xname, ip->i_ino, &first,
-					&flist, nres);
+					&dfops, nres);
 	if (error)
 		do_error(
 		_("can't make %s, createname error %d\n"),
@@ -1028,7 +1028,7 @@ mk_orphanage(xfs_mount_t *mp)
 	libxfs_dir_init(tp, ip, pip);
 	libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
-	error = -xfs_defer_finish(&tp, &flist, ip);
+	error = -xfs_defer_finish(&tp, &dfops, ip);
 	if (error) {
 		do_error(_("%s directory creation failed -- bmapf error %d\n"),
 			ORPHANAGE, error);
@@ -1057,7 +1057,7 @@ mv_orphanage(
 	xfs_inode_t		*ino_p;
 	xfs_trans_t		*tp;
 	xfs_fsblock_t		first;
-	struct xfs_defer_ops	flist;
+	struct xfs_defer_ops	dfops;
 	int			err;
 	unsigned char		fname[MAXPATHLEN + 1];
 	int			nres;
@@ -1110,9 +1110,9 @@ mv_orphanage(
 			libxfs_trans_ijoin(tp, orphanage_ip, 0);
 			libxfs_trans_ijoin(tp, ino_p, 0);
 
-			xfs_defer_init(&flist, &first);
+			xfs_defer_init(&dfops, &first);
 			err = -libxfs_dir_createname(tp, orphanage_ip, &xname,
-						ino, &first, &flist, nres);
+						ino, &first, &dfops, nres);
 			if (err)
 				do_error(
 	_("name create failed in %s (%d), filesystem may be out of space\n"),
@@ -1125,7 +1125,7 @@ mv_orphanage(
 			libxfs_trans_log_inode(tp, orphanage_ip, XFS_ILOG_CORE);
 
 			err = -libxfs_dir_createname(tp, ino_p, &xfs_name_dotdot,
-					orphanage_ino, &first, &flist, nres);
+					orphanage_ino, &first, &dfops, nres);
 			if (err)
 				do_error(
 	_("creation of .. entry failed (%d), filesystem may be out of space\n"),
@@ -1134,7 +1134,7 @@ mv_orphanage(
 			inc_nlink(VFS_I(ino_p));
 			libxfs_trans_log_inode(tp, ino_p, XFS_ILOG_CORE);
 
-			err = -xfs_defer_finish(&tp, &flist, ino_p);
+			err = -xfs_defer_finish(&tp, &dfops, ino_p);
 			if (err)
 				do_error(
 	_("bmap finish failed (err - %d), filesystem may be out of space\n"),
@@ -1152,10 +1152,10 @@ mv_orphanage(
 			libxfs_trans_ijoin(tp, orphanage_ip, 0);
 			libxfs_trans_ijoin(tp, ino_p, 0);
 
-			xfs_defer_init(&flist, &first);
+			xfs_defer_init(&dfops, &first);
 
 			err = -libxfs_dir_createname(tp, orphanage_ip, &xname,
-						ino, &first, &flist, nres);
+						ino, &first, &dfops, nres);
 			if (err)
 				do_error(
 	_("name create failed in %s (%d), filesystem may be out of space\n"),
@@ -1174,14 +1174,14 @@ mv_orphanage(
 			if (entry_ino_num != orphanage_ino)  {
 				err = -libxfs_dir_replace(tp, ino_p,
 						&xfs_name_dotdot, orphanage_ino,
-						&first, &flist, nres);
+						&first, &dfops, nres);
 				if (err)
 					do_error(
 	_("name replace op failed (%d), filesystem may be out of space\n"),
 						err);
 			}
 
-			err = -xfs_defer_finish(&tp, &flist, ino_p);
+			err = -xfs_defer_finish(&tp, &dfops, ino_p);
 			if (err)
 				do_error(
 	_("bmap finish failed (%d), filesystem may be out of space\n"),
@@ -1208,9 +1208,9 @@ mv_orphanage(
 		libxfs_trans_ijoin(tp, orphanage_ip, 0);
 		libxfs_trans_ijoin(tp, ino_p, 0);
 
-		xfs_defer_init(&flist, &first);
+		xfs_defer_init(&dfops, &first);
 		err = -libxfs_dir_createname(tp, orphanage_ip, &xname, ino,
-						&first, &flist, nres);
+						&first, &dfops, nres);
 		if (err)
 			do_error(
 	_("name create failed in %s (%d), filesystem may be out of space\n"),
@@ -1220,7 +1220,7 @@ mv_orphanage(
 		set_nlink(VFS_I(ino_p), 1);
 		libxfs_trans_log_inode(tp, ino_p, XFS_ILOG_CORE);
 
-		err = -xfs_defer_finish(&tp, &flist, ino_p);
+		err = -xfs_defer_finish(&tp, &dfops, ino_p);
 		if (err)
 			do_error(
 	_("bmap finish failed (%d), filesystem may be out of space\n"),
@@ -1269,7 +1269,7 @@ longform_dir2_rebuild(
 	xfs_trans_t		*tp;
 	xfs_fileoff_t		lastblock;
 	xfs_fsblock_t		firstblock;
-	struct xfs_defer_ops	flist;
+	struct xfs_defer_ops	dfops;
 	xfs_inode_t		pip;
 	dir_hash_ent_t		*p;
 	int			done;
@@ -1292,7 +1292,7 @@ longform_dir2_rebuild(
 	    xfs_dir_ino_validate(mp, pip.i_ino))
 		pip.i_ino = mp->m_sb.sb_rootino;
 
-	xfs_defer_init(&flist, &firstblock);
+	xfs_defer_init(&dfops, &firstblock);
 
 	nres = XFS_REMOVE_SPACE_RES(mp);
 	error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_remove, nres, 0, 0, &tp);
@@ -1306,7 +1306,7 @@ longform_dir2_rebuild(
 
 	/* free all data, leaf, node and freespace blocks */
 	error = -libxfs_bunmapi(tp, ip, 0, lastblock, XFS_BMAPI_METADATA, 0,
-				&firstblock, &flist, &done);
+				&firstblock, &dfops, &done);
 	if (error) {
 		do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
 		goto out_bmap_cancel;
@@ -1320,7 +1320,7 @@ longform_dir2_rebuild(
 		goto out_bmap_cancel;
 	}
 
-	error = -xfs_defer_finish(&tp, &flist, ip);
+	error = -xfs_defer_finish(&tp, &dfops, ip);
 
 	libxfs_trans_commit(tp);
 
@@ -1344,9 +1344,9 @@ longform_dir2_rebuild(
 
 		libxfs_trans_ijoin(tp, ip, 0);
 
-		xfs_defer_init(&flist, &firstblock);
+		xfs_defer_init(&dfops, &firstblock);
 		error = -libxfs_dir_createname(tp, ip, &p->name, p->inum,
-						&firstblock, &flist, nres);
+						&firstblock, &dfops, nres);
 		if (error) {
 			do_warn(
 _("name create failed in ino %" PRIu64 " (%d), filesystem may be out of space\n"),
@@ -1354,7 +1354,7 @@ _("name create failed in ino %" PRIu64 " (%d), filesystem may be out of space\n"
 			goto out_bmap_cancel;
 		}
 
-		error = -xfs_defer_finish(&tp, &flist, ip);
+		error = -xfs_defer_finish(&tp, &dfops, ip);
 		if (error) {
 			do_warn(
 	_("bmap finish failed (%d), filesystem may be out of space\n"),
@@ -1368,7 +1368,7 @@ _("name create failed in ino %" PRIu64 " (%d), filesystem may be out of space\n"
 	return;
 
 out_bmap_cancel:
-	xfs_defer_cancel(&flist);
+	xfs_defer_cancel(&dfops);
 	libxfs_trans_cancel(tp);
 	return;
 }
@@ -1388,7 +1388,7 @@ dir2_kill_block(
 	xfs_da_args_t	args;
 	int		error;
 	xfs_fsblock_t	firstblock;
-	struct xfs_defer_ops	flist;
+	struct xfs_defer_ops	dfops;
 	int		nres;
 	xfs_trans_t	*tp;
 
@@ -1399,11 +1399,11 @@ dir2_kill_block(
 	libxfs_trans_ijoin(tp, ip, 0);
 	libxfs_trans_bjoin(tp, bp);
 	memset(&args, 0, sizeof(args));
-	xfs_defer_init(&flist, &firstblock);
+	xfs_defer_init(&dfops, &firstblock);
 	args.dp = ip;
 	args.trans = tp;
 	args.firstblock = &firstblock;
-	args.flist = &flist;
+	args.dfops = &dfops;
 	args.whichfork = XFS_DATA_FORK;
 	args.geo = mp->m_dir_geo;
 	if (da_bno >= mp->m_dir_geo->leafblk && da_bno < mp->m_dir_geo->freeblk)
@@ -1414,7 +1414,7 @@ dir2_kill_block(
 	if (error)
 		do_error(_("shrink_inode failed inode %" PRIu64 " block %u\n"),
 			ip->i_ino, da_bno);
-	xfs_defer_finish(&tp, &flist, ip);
+	xfs_defer_finish(&tp, &dfops, ip);
 	libxfs_trans_commit(tp);
 }
 
@@ -1448,7 +1448,7 @@ longform_dir2_entry_check_data(
 	char			*endptr;
 	int			error;
 	xfs_fsblock_t		firstblock;
-	struct xfs_defer_ops	flist;
+	struct xfs_defer_ops	dfops;
 	char			fname[MAXNAMELEN + 1];
 	freetab_t		*freetab;
 	int			i;
@@ -1590,7 +1590,7 @@ longform_dir2_entry_check_data(
 	libxfs_trans_ijoin(tp, ip, 0);
 	libxfs_trans_bjoin(tp, bp);
 	libxfs_trans_bhold(tp, bp);
-	xfs_defer_init(&flist, &firstblock);
+	xfs_defer_init(&dfops, &firstblock);
 	if (be32_to_cpu(d->magic) != wantmagic) {
 		do_warn(
 	_("bad directory block magic # %#x for directory inode %" PRIu64 " block %d: "),
@@ -1889,7 +1889,7 @@ _("entry \"%s\" in dir inode %" PRIu64 " inconsistent with .. value (%" PRIu64 "
 		repair_dir2_data_freescan(mp, M_DIROPS(mp), d, &i);
 	if (needlog)
 		libxfs_dir2_data_log_header(&da, bp);
-	xfs_defer_finish(&tp, &flist, ip);
+	xfs_defer_finish(&tp, &dfops, ip);
 	libxfs_trans_commit(tp);
 
 	/* record the largest free space in the freetab for later checking */
@@ -2805,7 +2805,7 @@ process_dir_inode(
 	int			ino_offset)
 {
 	xfs_ino_t		ino;
-	struct xfs_defer_ops	flist;
+	struct xfs_defer_ops	dfops;
 	xfs_fsblock_t		first;
 	xfs_inode_t		*ip;
 	xfs_trans_t		*tp;
@@ -2943,17 +2943,17 @@ process_dir_inode(
 
 		libxfs_trans_ijoin(tp, ip, 0);
 
-		xfs_defer_init(&flist, &first);
+		xfs_defer_init(&dfops, &first);
 
 		error = -libxfs_dir_createname(tp, ip, &xfs_name_dotdot,
-					ip->i_ino, &first, &flist, nres);
+					ip->i_ino, &first, &dfops, nres);
 		if (error)
 			do_error(
 	_("can't make \"..\" entry in root inode %" PRIu64 ", createname error %d\n"), ino, error);
 
 		libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
-		error = -xfs_defer_finish(&tp, &flist, ip);
+		error = -xfs_defer_finish(&tp, &dfops, ip);
 		ASSERT(error == 0);
 		libxfs_trans_commit(tp);
 
@@ -3001,10 +3001,10 @@ process_dir_inode(
 
 			libxfs_trans_ijoin(tp, ip, 0);
 
-			xfs_defer_init(&flist, &first);
+			xfs_defer_init(&dfops, &first);
 
 			error = -libxfs_dir_createname(tp, ip, &xfs_name_dot,
-					ip->i_ino, &first, &flist, nres);
+					ip->i_ino, &first, &dfops, nres);
 			if (error)
 				do_error(
 	_("can't make \".\" entry in dir ino %" PRIu64 ", createname error %d\n"),
@@ -3012,7 +3012,7 @@ process_dir_inode(
 
 			libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
-			error = -xfs_defer_finish(&tp, &flist, ip);
+			error = -xfs_defer_finish(&tp, &dfops, ip);
 			ASSERT(error == 0);
 			libxfs_trans_commit(tp);
 		}
diff --git a/repair/sb.c b/repair/sb.c
index 4eef14a..803125c 100644
--- a/repair/sb.c
+++ b/repair/sb.c
@@ -653,7 +653,7 @@ verify_set_primary_sb(xfs_sb_t		*rsb,
 
 		retval = get_sb(sb, off, size, agno);
 		if (retval == XR_EOF)
-			goto out_free_list;
+			goto out_dfops;
 
 		if (retval == XR_OK) {
 			/*
@@ -675,7 +675,7 @@ verify_set_primary_sb(xfs_sb_t		*rsb,
 	retval = 0;
 	if (num_ok < num_sbs / 2) {
 		retval = XR_INSUFF_SEC_SB;
-		goto out_free_list;
+		goto out_dfops;
 	}
 
 	current = get_best_geo(list);
@@ -703,7 +703,7 @@ verify_set_primary_sb(xfs_sb_t		*rsb,
 				exit(1);
 			}
 		}
-		goto out_free_list;
+		goto out_dfops;
 	case 1:
 		/*
 		 * If we only have a single allocation group there is no
@@ -718,7 +718,7 @@ verify_set_primary_sb(xfs_sb_t		*rsb,
 	  "Use the -o force_geometry option to proceed.\n"));
 			exit(1);
 		}
-		goto out_free_list;
+		goto out_dfops;
 	default:
 		/*
 		 * at least half of the probed superblocks have
@@ -758,7 +758,7 @@ verify_set_primary_sb(xfs_sb_t		*rsb,
 		sb_width = sb->sb_width;
 	}
 
-out_free_list:
+out_dfops:
 	free_geo(list);
 	free(sb);
 	return retval;

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 026/145] xfs: add tracepoints and error injection for deferred extent freeing
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (24 preceding siblings ...)
  2016-06-17  1:33 ` [PATCH 025/145] xfs: rename flist/free_list to dfops Darrick J. Wong
@ 2016-06-17  1:33 ` Darrick J. Wong
  2016-06-17  1:33 ` [PATCH 027/145] xfs_io: add free-extent error injection type Darrick J. Wong
                   ` (118 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:33 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Add a couple of tracepoints for the deferred extent free operation and
a site for injecting errors while finishing the operation.  This makes
it easier to debug deferred ops and test log redo.
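
As a rough illustration of the pattern (a sketch under stated assumptions,
not the xfsprogs code), the fragment below fires a tracepoint stand-in and
then checks an injection site before doing any work.  Every demo_* name is
hypothetical, and the per-tag boolean is a simplification: the real
XFS_TEST_ERROR() macro takes a tag and a rate argument as shown in the diff
below.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the real xfsprogs definitions. */
enum demo_errtag { DEMO_ERRTAG_FREE_EXTENT, DEMO_ERRTAG_MAX };

struct demo_mount {
	bool		inject[DEMO_ERRTAG_MAX];	/* true = fail at this site */
	unsigned int	trace_hits;			/* tracepoint call count */
};

/* Tracepoint stand-in: only counts how often the site was reached. */
static void demo_trace_free_deferred(struct demo_mount *mp)
{
	mp->trace_hits++;
}

/* Report an error if the caller's own check failed or the tag is armed. */
static bool demo_test_error(bool expr, struct demo_mount *mp,
			    enum demo_errtag tag)
{
	return expr || mp->inject[tag];
}

/* Trace first, then give the injection site a chance to fail the call. */
static int demo_free_extent(struct demo_mount *mp, unsigned int len)
{
	assert(len != 0);

	demo_trace_free_deferred(mp);

	if (demo_test_error(false, mp, DEMO_ERRTAG_FREE_EXTENT))
		return -EIO;

	/* ... the actual extent freeing would happen here ... */
	return 0;
}
```

Arming the tag makes the call return -EIO after the tracepoint has fired,
which is the ordering a log-redo test would want to observe.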

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_trace.h |    3 +++
 libxfs/xfs_alloc.c  |    7 +++++++
 libxfs/xfs_bmap.c   |    2 ++
 3 files changed, 12 insertions(+)


diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index dd0d46f..cdabd18 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -184,6 +184,9 @@
 #define trace_xfs_defer_pending_cancel(...)	((void) 0)
 #define trace_xfs_defer_init(...)		((void) 0)
 
+#define trace_xfs_bmap_free_deferred(...)	((void) 0)
+#define trace_xfs_bmap_free_defer(...)		((void) 0)
+
 /* set c = c to avoid unused var warnings */
 #define trace_xfs_perag_get(a,b,c,d)	((c) = (c))
 #define trace_xfs_perag_get_tag(a,b,c,d) ((c) = (c))
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 8454816..06133c0 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -2700,6 +2700,13 @@ xfs_free_extent(
 
 	ASSERT(len != 0);
 
+	trace_xfs_bmap_free_deferred(mp, agno, 0, agbno, len);
+
+	if (XFS_TEST_ERROR(false, mp,
+			XFS_ERRTAG_FREE_EXTENT,
+			XFS_RANDOM_FREE_EXTENT))
+		return -EIO;
+
 	error = xfs_free_extent_fix_freelist(tp, agno, &agbp);
 	if (error)
 		return error;
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 5385800..33e181c 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -588,6 +588,8 @@ xfs_bmap_add_free(
 	new = kmem_zone_alloc(xfs_bmap_free_item_zone, KM_SLEEP);
 	new->xbfi_startblock = bno;
 	new->xbfi_blockcount = (xfs_extlen_t)len;
+	trace_xfs_bmap_free_defer(mp, XFS_FSB_TO_AGNO(mp, bno), 0,
+			XFS_FSB_TO_AGBNO(mp, bno), len);
 	xfs_defer_add(dfops, XFS_DEFER_OPS_TYPE_FREE, &new->xbfi_list);
 }
 


* [PATCH 027/145] xfs_io: add free-extent error injection type
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (25 preceding siblings ...)
  2016-06-17  1:33 ` [PATCH 026/145] xfs: add tracepoints and error injection for deferred extent freeing Darrick J. Wong
@ 2016-06-17  1:33 ` Darrick J. Wong
  2016-06-17  1:33 ` [PATCH 028/145] xfs: introduce rmap btree definitions Darrick J. Wong
                   ` (117 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:33 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Add XFS_ERRTAG_FREE_EXTENT to the types of errors we can inject.
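
For readers unfamiliar with the tag table, the lookup it feeds can be
sketched roughly as below.  This is a simplified, hypothetical version:
the demo_* names are invented for illustration, only the two entries
visible in the diff are reproduced, and returning -1 for an unknown name
is an assumption rather than what io/inject.c does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tag numbers copied from the diff; the DEMO_ prefix marks them as local. */
#define DEMO_ERRTAG_BMAPIFORMAT	21
#define DEMO_ERRTAG_FREE_EXTENT	22
#define DEMO_ERRTAG_MAX		23

static const struct demo_tag {
	int		tag;
	const char	*name;
} demo_tags[] = {
	{ DEMO_ERRTAG_BMAPIFORMAT,	"bmapifmt" },
	{ DEMO_ERRTAG_FREE_EXTENT,	"free_extent" },
	{ DEMO_ERRTAG_MAX,		NULL },		/* sentinel entry */
};

/* Map a tag name to its number; -1 (an assumption) means "unknown". */
static int demo_error_tag(const char *name)
{
	const struct demo_tag	*e;

	for (e = demo_tags; e->name != NULL; e++)
		if (strcmp(e->name, name) == 0)
			return e->tag;
	return -1;
}
```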

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 io/inject.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


diff --git a/io/inject.c b/io/inject.c
index 90ccda8..12e0fb3 100644
--- a/io/inject.c
+++ b/io/inject.c
@@ -74,7 +74,9 @@ error_tag(char *name)
 		{ XFS_ERRTAG_DIOWRITE_IOERR,		"diowrite" },
 #define XFS_ERRTAG_BMAPIFORMAT                          21
 		{ XFS_ERRTAG_BMAPIFORMAT,		"bmapifmt" },
-#define XFS_ERRTAG_MAX                                  22
+#define XFS_ERRTAG_FREE_EXTENT				22
+		{ XFS_ERRTAG_FREE_EXTENT,		"free_extent" },
+#define XFS_ERRTAG_MAX                                  23
 		{ XFS_ERRTAG_MAX,			NULL }
 	};
 	int	count;


* [PATCH 028/145] xfs: introduce rmap btree definitions
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (26 preceding siblings ...)
  2016-06-17  1:33 ` [PATCH 027/145] xfs_io: add free-extent error injection type Darrick J. Wong
@ 2016-06-17  1:33 ` Darrick J. Wong
  2016-06-17  1:33 ` [PATCH 029/145] xfs: add rmap btree stats infrastructure Darrick J. Wong
                   ` (116 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:33 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

From: Dave Chinner <dchinner@redhat.com>

Add new per-ag rmap btree definitions to the per-ag structures. The
rmap btree will sit in the empty slots on disk after the free space
btrees, and hence form a part of the array of space management
btrees. This requires the definition of the btree to be contiguous
with the free space btrees.
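
The layout argument can be checked with a small sketch.  The structures
below are trimmed-down, hypothetical models of the AGF (demo_* names, host
endianness, only the fields around the arrays), assuming the spare words
sit directly after each two-entry array as in the diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Btree slots in the per-AG arrays; rmap must directly follow bno/cnt. */
enum { DEMO_BTNUM_BNO, DEMO_BTNUM_CNT, DEMO_BTNUM_RMAP };

/* Simplified "before" layout: two btrees plus a spare word per array. */
struct demo_agf_old {
	uint32_t	roots[2];
	uint32_t	spare0;
	uint32_t	levels[2];
	uint32_t	spare1;
	uint32_t	flfirst;
};

/* Simplified "after" layout: the spare words become the third slot. */
struct demo_agf_new {
	uint32_t	roots[3];
	uint32_t	levels[3];
	uint32_t	flfirst;
};

/* Byte offset of the rmap root slot in the new layout. */
static size_t demo_rmap_root_offset(void)
{
	return offsetof(struct demo_agf_new, roots) +
	       DEMO_BTNUM_RMAP * sizeof(uint32_t);
}

/* Byte offset of the rmap level slot in the new layout. */
static size_t demo_rmap_level_offset(void)
{
	return offsetof(struct demo_agf_new, levels) +
	       DEMO_BTNUM_RMAP * sizeof(uint32_t);
}

/* True if no field after the arrays moved and the size is unchanged. */
static int demo_agf_layout_unchanged(void)
{
	return sizeof(struct demo_agf_old) == sizeof(struct demo_agf_new) &&
	       offsetof(struct demo_agf_old, levels) ==
	       offsetof(struct demo_agf_new, levels) &&
	       offsetof(struct demo_agf_old, flfirst) ==
	       offsetof(struct demo_agf_new, flfirst);
}
```

In this model the rmap root and level slots land exactly where spare0 and
spare1 used to be, so the fields that follow keep their offsets.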

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/xfs_alloc.c  |    6 ++++++
 libxfs/xfs_btree.c  |    4 ++--
 libxfs/xfs_btree.h  |    3 +++
 libxfs/xfs_format.h |   22 +++++++++++++++++-----
 libxfs/xfs_types.h  |    4 ++--
 5 files changed, 30 insertions(+), 9 deletions(-)


diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 06133c0..675151d 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -2268,6 +2268,10 @@ xfs_agf_verify(
 	    be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNT]) > XFS_BTREE_MAXLEVELS)
 		return false;
 
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb) &&
+	    be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]) > XFS_BTREE_MAXLEVELS)
+		return false;
+
 	/*
 	 * during growfs operations, the perag is not fully initialised,
 	 * so we can't use it for any useful checking. growfs ensures we can't
@@ -2399,6 +2403,8 @@ xfs_alloc_read_agf(
 			be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]);
 		pag->pagf_levels[XFS_BTNUM_CNTi] =
 			be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
+		pag->pagf_levels[XFS_BTNUM_RMAPi] =
+			be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
 		spin_lock_init(&pag->pagb_lock);
 		pag->pagb_count = 0;
 #ifdef __KERNEL__
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index c5a475f..8f66e14 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -40,9 +40,9 @@ kmem_zone_t	*xfs_btree_cur_zone;
  * Btree magic numbers.
  */
 static const __uint32_t xfs_magics[2][XFS_BTNUM_MAX] = {
-	{ XFS_ABTB_MAGIC, XFS_ABTC_MAGIC, XFS_BMAP_MAGIC, XFS_IBT_MAGIC,
+	{ XFS_ABTB_MAGIC, XFS_ABTC_MAGIC, 0, XFS_BMAP_MAGIC, XFS_IBT_MAGIC,
 	  XFS_FIBT_MAGIC },
-	{ XFS_ABTB_CRC_MAGIC, XFS_ABTC_CRC_MAGIC,
+	{ XFS_ABTB_CRC_MAGIC, XFS_ABTC_CRC_MAGIC, XFS_RMAP_CRC_MAGIC,
 	  XFS_BMAP_CRC_MAGIC, XFS_IBT_CRC_MAGIC, XFS_FIBT_CRC_MAGIC }
 };
 #define xfs_btree_magic(cur) \
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index 7483cac..202fdd3 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -63,6 +63,7 @@ union xfs_btree_rec {
 #define	XFS_BTNUM_BMAP	((xfs_btnum_t)XFS_BTNUM_BMAPi)
 #define	XFS_BTNUM_INO	((xfs_btnum_t)XFS_BTNUM_INOi)
 #define	XFS_BTNUM_FINO	((xfs_btnum_t)XFS_BTNUM_FINOi)
+#define	XFS_BTNUM_RMAP	((xfs_btnum_t)XFS_BTNUM_RMAPi)
 
 /*
  * For logging record fields.
@@ -95,6 +96,7 @@ do {    \
 	case XFS_BTNUM_BMAP: __XFS_BTREE_STATS_INC(__mp, bmbt, stat); break; \
 	case XFS_BTNUM_INO: __XFS_BTREE_STATS_INC(__mp, ibt, stat); break; \
 	case XFS_BTNUM_FINO: __XFS_BTREE_STATS_INC(__mp, fibt, stat); break; \
+	case XFS_BTNUM_RMAP: break;	\
 	case XFS_BTNUM_MAX: ASSERT(0); __mp = __mp /* fucking gcc */ ; break; \
 	}       \
 } while (0)
@@ -115,6 +117,7 @@ do {    \
 		__XFS_BTREE_STATS_ADD(__mp, ibt, stat, val); break; \
 	case XFS_BTNUM_FINO:	\
 		__XFS_BTREE_STATS_ADD(__mp, fibt, stat, val); break; \
+	case XFS_BTNUM_RMAP: break; \
 	case XFS_BTNUM_MAX: ASSERT(0); __mp = __mp /* fucking gcc */ ; break; \
 	}       \
 } while (0)
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 825fa0c..cb30a25 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -455,6 +455,7 @@ xfs_sb_has_compat_feature(
 }
 
 #define XFS_SB_FEAT_RO_COMPAT_FINOBT   (1 << 0)		/* free inode btree */
+#define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
 		(XFS_SB_FEAT_RO_COMPAT_FINOBT)
 #define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL
@@ -538,6 +539,12 @@ static inline bool xfs_sb_version_hasmetauuid(struct xfs_sb *sbp)
 		(sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_META_UUID);
 }
 
+static inline bool xfs_sb_version_hasrmapbt(struct xfs_sb *sbp)
+{
+	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
+		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_RMAPBT);
+}
+
 /*
  * end of superblock version macros
  */
@@ -598,10 +605,10 @@ xfs_is_quota_inode(struct xfs_sb *sbp, xfs_ino_t ino)
 #define	XFS_AGI_GOOD_VERSION(v)	((v) == XFS_AGI_VERSION)
 
 /*
- * Btree number 0 is bno, 1 is cnt.  This value gives the size of the
+ * Btree number 0 is bno, 1 is cnt, 2 is rmap. This value gives the size of the
  * arrays below.
  */
-#define	XFS_BTNUM_AGF	((int)XFS_BTNUM_CNTi + 1)
+#define	XFS_BTNUM_AGF	((int)XFS_BTNUM_RMAPi + 1)
 
 /*
  * The second word of agf_levels in the first a.g. overlaps the EFS
@@ -618,12 +625,10 @@ typedef struct xfs_agf {
 	__be32		agf_seqno;	/* sequence # starting from 0 */
 	__be32		agf_length;	/* size in blocks of a.g. */
 	/*
-	 * Freespace information
+	 * Freespace and rmap information
 	 */
 	__be32		agf_roots[XFS_BTNUM_AGF];	/* root blocks */
-	__be32		agf_spare0;	/* spare field */
 	__be32		agf_levels[XFS_BTNUM_AGF];	/* btree levels */
-	__be32		agf_spare1;	/* spare field */
 
 	__be32		agf_flfirst;	/* first freelist block's index */
 	__be32		agf_fllast;	/* last freelist block's index */
@@ -1307,6 +1312,13 @@ typedef __be32 xfs_inobt_ptr_t;
 #define	XFS_FIBT_BLOCK(mp)		((xfs_agblock_t)(XFS_IBT_BLOCK(mp) + 1))
 
 /*
+ * Reverse mapping btree format definitions
+ *
+ * There is a btree for the reverse map per allocation group
+ */
+#define	XFS_RMAP_CRC_MAGIC	0x524d4233	/* 'RMB3' */
+
+/*
  * The first data block of an AG depends on whether the filesystem was formatted
  * with the finobt feature. If so, account for the finobt reserved root btree
  * block.
diff --git a/libxfs/xfs_types.h b/libxfs/xfs_types.h
index f0d145a..da87796 100644
--- a/libxfs/xfs_types.h
+++ b/libxfs/xfs_types.h
@@ -111,8 +111,8 @@ typedef enum {
 } xfs_lookup_t;
 
 typedef enum {
-	XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_BMAPi, XFS_BTNUM_INOi,
-	XFS_BTNUM_FINOi, XFS_BTNUM_MAX
+	XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_RMAPi, XFS_BTNUM_BMAPi,
+	XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_MAX
 } xfs_btnum_t;
 
 struct xfs_name {


* [PATCH 029/145] xfs: add rmap btree stats infrastructure
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (27 preceding siblings ...)
  2016-06-17  1:33 ` [PATCH 028/145] xfs: introduce rmap btree definitions Darrick J. Wong
@ 2016-06-17  1:33 ` Darrick J. Wong
  2016-06-17  1:33 ` [PATCH 030/145] xfs: rmap btree add more reserved blocks Darrick J. Wong
                   ` (115 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:33 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

From: Dave Chinner <dchinner@redhat.com>

The rmap btree will require the same stats as all the other generic
btrees, so add all the code for that now.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/xfs_btree.h |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)


diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index 202fdd3..a29067c 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -96,7 +96,7 @@ do {    \
 	case XFS_BTNUM_BMAP: __XFS_BTREE_STATS_INC(__mp, bmbt, stat); break; \
 	case XFS_BTNUM_INO: __XFS_BTREE_STATS_INC(__mp, ibt, stat); break; \
 	case XFS_BTNUM_FINO: __XFS_BTREE_STATS_INC(__mp, fibt, stat); break; \
-	case XFS_BTNUM_RMAP: break;	\
+	case XFS_BTNUM_RMAP: __XFS_BTREE_STATS_INC(__mp, rmap, stat); break; \
 	case XFS_BTNUM_MAX: ASSERT(0); __mp = __mp /* fucking gcc */ ; break; \
 	}       \
 } while (0)
@@ -117,7 +117,8 @@ do {    \
 		__XFS_BTREE_STATS_ADD(__mp, ibt, stat, val); break; \
 	case XFS_BTNUM_FINO:	\
 		__XFS_BTREE_STATS_ADD(__mp, fibt, stat, val); break; \
-	case XFS_BTNUM_RMAP: break; \
+	case XFS_BTNUM_RMAP:	\
+		__XFS_BTREE_STATS_ADD(__mp, rmap, stat, val); break; \
 	case XFS_BTNUM_MAX: ASSERT(0); __mp = __mp /* fucking gcc */ ; break; \
 	}       \
 } while (0)


* [PATCH 030/145] xfs: rmap btree add more reserved blocks
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (28 preceding siblings ...)
  2016-06-17  1:33 ` [PATCH 029/145] xfs: add rmap btree stats infrastructure Darrick J. Wong
@ 2016-06-17  1:33 ` Darrick J. Wong
  2016-06-17  1:33 ` [PATCH 031/145] xfs: add owner field to extent allocation and freeing Darrick J. Wong
                   ` (114 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:33 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

From: Dave Chinner <dchinner@redhat.com>

XFS reserves a small amount of space in each AG for the minimum
number of free blocks needed for operation. Adding the rmap btree
increases the number of reserved blocks, but it also increases the
complexity of the calculation as the free inode btree is optional
(like the rmapbt).

Rather than calculate the prealloc blocks every time we need to
check it, add a function to calculate it at mount time and store it
in the struct xfs_mount, and convert the XFS_PREALLOC_BLOCKS macro
to use the xfs_mount variable directly.
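
A minimal sketch of the compute-once-and-cache idea follows.  The demo_*
names and the flat feature flags are hypothetical; the block arithmetic
mirrors the macros and function in the diff below, with the inode btree
root block number supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down mount structure. */
struct demo_mount {
	bool		has_finobt;		/* free inode btree enabled */
	bool		has_rmapbt;		/* reverse map btree enabled */
	uint32_t	ibt_block;		/* AG block of the inobt root */
	uint32_t	ag_prealloc_blocks;	/* cached at mount time */
};

/* The finobt root, when present, follows the inobt root. */
static uint32_t demo_fibt_block(const struct demo_mount *mp)
{
	return mp->ibt_block + 1;
}

/* The rmapbt root follows whichever inode btree root comes last. */
static uint32_t demo_rmap_block(const struct demo_mount *mp)
{
	return mp->has_finobt ? demo_fibt_block(mp) + 1 : mp->ibt_block + 1;
}

/* First AG block that is not reserved for static metadata. */
static uint32_t demo_prealloc_blocks(const struct demo_mount *mp)
{
	if (mp->has_rmapbt)
		return demo_rmap_block(mp) + 1;
	if (mp->has_finobt)
		return demo_fibt_block(mp) + 1;
	return mp->ibt_block + 1;
}

/* Do the feature checks once; later users read the cached field. */
static void demo_mount_init(struct demo_mount *mp)
{
	mp->ag_prealloc_blocks = demo_prealloc_blocks(mp);
}
```

Each optional btree adds one reserved root block, so the cached value grows
by one per enabled feature.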

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 include/xfs_mount.h |    1 +
 libxfs/init.c       |    2 ++
 libxfs/xfs_alloc.c  |   11 +++++++++++
 libxfs/xfs_alloc.h  |    2 ++
 libxfs/xfs_format.h |    9 +--------
 mkfs/xfs_mkfs.c     |   14 +++++++-------
 6 files changed, 24 insertions(+), 15 deletions(-)


diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 14b05a6..2c7c72e 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -67,6 +67,7 @@ typedef struct xfs_mount {
 	uint			m_ag_maxlevels;	/* XFS_AG_MAXLEVELS */
 	uint			m_bm_maxlevels[2]; /* XFS_BM_MAXLEVELS */
 	uint			m_in_maxlevels;	/* XFS_IN_MAXLEVELS */
+	xfs_extlen_t		m_ag_prealloc_blocks; /* reserved ag blocks */
 	struct radix_tree_root	m_perag_tree;
 	uint			m_flags;	/* global mount flags */
 	uint			m_qflags;	/* quota status flags */
diff --git a/libxfs/init.c b/libxfs/init.c
index 553da7b..7b618da 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -572,6 +572,8 @@ libxfs_initialize_perag(
 
 	if (maxagi)
 		*maxagi = index;
+
+	mp->m_ag_prealloc_blocks = xfs_prealloc_blocks(mp);
 	return 0;
 
 out_unwind:
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 675151d..738275f 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -46,6 +46,17 @@ STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *);
 STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
 		xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
 
+xfs_extlen_t
+xfs_prealloc_blocks(
+	struct xfs_mount	*mp)
+{
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return XFS_RMAP_BLOCK(mp) + 1;
+	if (xfs_sb_version_hasfinobt(&mp->m_sb))
+		return XFS_FIBT_BLOCK(mp) + 1;
+	return XFS_IBT_BLOCK(mp) + 1;
+}
+
 /*
  * Lookup the record equal to [bno, len] in the btree given by cur.
  */
diff --git a/libxfs/xfs_alloc.h b/libxfs/xfs_alloc.h
index cf268b2..20b54aa 100644
--- a/libxfs/xfs_alloc.h
+++ b/libxfs/xfs_alloc.h
@@ -232,4 +232,6 @@ int xfs_alloc_fix_freelist(struct xfs_alloc_arg *args, int flags);
 int xfs_free_extent_fix_freelist(struct xfs_trans *tp, xfs_agnumber_t agno,
 		struct xfs_buf **agbp);
 
+xfs_extlen_t xfs_prealloc_blocks(struct xfs_mount *mp);
+
 #endif	/* __XFS_ALLOC_H__ */
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index cb30a25..12048fa 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -1318,18 +1318,11 @@ typedef __be32 xfs_inobt_ptr_t;
  */
 #define	XFS_RMAP_CRC_MAGIC	0x524d4233	/* 'RMB3' */
 
-/*
- * The first data block of an AG depends on whether the filesystem was formatted
- * with the finobt feature. If so, account for the finobt reserved root btree
- * block.
- */
-#define XFS_PREALLOC_BLOCKS(mp) \
+#define	XFS_RMAP_BLOCK(mp) \
 	(xfs_sb_version_hasfinobt(&((mp)->m_sb)) ? \
 	 XFS_FIBT_BLOCK(mp) + 1 : \
 	 XFS_IBT_BLOCK(mp) + 1)
 
-
-
 /*
  * BMAP Btree format definitions
  *
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index de00c8e..e9d2851 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2965,7 +2965,7 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 
 	/*
 	 * sb_versionnum and finobt flags must be set before we use
-	 * XFS_PREALLOC_BLOCKS().
+	 * xfs_prealloc_blocks().
 	 */
 	sb_set_features(&mp->m_sb, &sb_feat, sectorsize, lsectorsize, dsunit);
 
@@ -2982,7 +2982,7 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 			/* revalidate the log size is valid if we changed it */
 			validate_log_size(logblocks, blocklog, min_logblocks);
 		}
-		if (logblocks > agsize - XFS_PREALLOC_BLOCKS(mp)) {
+		if (logblocks > agsize - xfs_prealloc_blocks(mp)) {
 			fprintf(stderr,
 	_("internal log size %lld too large, must fit in allocation group\n"),
 				(long long)logblocks);
@@ -2999,7 +2999,7 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 		} else
 			logagno = (xfs_agnumber_t)(agcount / 2);
 
-		logstart = XFS_AGB_TO_FSB(mp, logagno, XFS_PREALLOC_BLOCKS(mp));
+		logstart = XFS_AGB_TO_FSB(mp, logagno, xfs_prealloc_blocks(mp));
 		/*
 		 * Align the logstart at stripe unit boundary.
 		 */
@@ -3077,7 +3077,7 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 	sbp->sb_imax_pct = imaxpct;
 	sbp->sb_icount = 0;
 	sbp->sb_ifree = 0;
-	sbp->sb_fdblocks = dblocks - agcount * XFS_PREALLOC_BLOCKS(mp) -
+	sbp->sb_fdblocks = dblocks - agcount * xfs_prealloc_blocks(mp) -
 		(loginternal ? logblocks : 0);
 	sbp->sb_frextents = 0;	/* will do a free later */
 	sbp->sb_uquotino = sbp->sb_gquotino = sbp->sb_pquotino = 0;
@@ -3219,7 +3219,7 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 		agf->agf_flfirst = 0;
 		agf->agf_fllast = cpu_to_be32(XFS_AGFL_SIZE(mp) - 1);
 		agf->agf_flcount = 0;
-		nbmblocks = (xfs_extlen_t)(agsize - XFS_PREALLOC_BLOCKS(mp));
+		nbmblocks = (xfs_extlen_t)(agsize - xfs_prealloc_blocks(mp));
 		agf->agf_freeblks = cpu_to_be32(nbmblocks);
 		agf->agf_longest = cpu_to_be32(nbmblocks);
 		if (xfs_sb_version_hascrc(&mp->m_sb))
@@ -3300,7 +3300,7 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 						agno, 0);
 
 		arec = XFS_ALLOC_REC_ADDR(mp, block, 1);
-		arec->ar_startblock = cpu_to_be32(XFS_PREALLOC_BLOCKS(mp));
+		arec->ar_startblock = cpu_to_be32(xfs_prealloc_blocks(mp));
 		if (loginternal && agno == logagno) {
 			if (lalign) {
 				/*
@@ -3355,7 +3355,7 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 						agno, 0);
 
 		arec = XFS_ALLOC_REC_ADDR(mp, block, 1);
-		arec->ar_startblock = cpu_to_be32(XFS_PREALLOC_BLOCKS(mp));
+		arec->ar_startblock = cpu_to_be32(xfs_prealloc_blocks(mp));
 		if (loginternal && agno == logagno) {
 			if (lalign) {
 				arec->ar_blockcount = cpu_to_be32(


* [PATCH 031/145] xfs: add owner field to extent allocation and freeing
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (29 preceding siblings ...)
  2016-06-17  1:33 ` [PATCH 030/145] xfs: rmap btree add more reserved blocks Darrick J. Wong
@ 2016-06-17  1:33 ` Darrick J. Wong
  2016-06-17  1:34 ` [PATCH 032/145] xfs: introduce rmap extent operation stubs Darrick J. Wong
                   ` (113 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:33 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

For the rmap btree to work, we have to feed the extent owner
information to the allocation and freeing functions. This
information is what will end up in the rmap btree that tracks
allocated extents. While we technically don't need the owner
information when freeing extents, passing it allows us to validate
that the extent we are removing from the rmap btree actually
belonged to the owner we expected it to belong to.

We also define a special set of owner values for internal metadata
that would otherwise have no owner. This allows us to tell the
difference between metadata owned by different per-ag btrees, as
well as static fs metadata (e.g. AG headers) and internal journal
blocks.

There are also a couple of special cases we need to take care of -
during EFI recovery, we don't actually know who the original owner
was, so we need to pass a wildcard to indicate that we aren't
checking the owner for validity. We also need special handling in
growfs, as we "free" the space in the last AG when extending it, but
because it's new space it has no actual owner...

While touching the xfs_bmap_add_free() function, re-order the
parameters to put the struct xfs_mount first.

Extend the owner field to include both the owner type and some sort
of index within the owner.  The index field will be used to support
reverse mappings when reflink is enabled.
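
To make the owner, offset and wildcard handling concrete, here is a hedged
sketch.  The structure and the two initialisers follow the shape of the
helpers added in the diff below, but all demo_* names are hypothetical, and
demo_owner_matches() in particular is an assumed illustration of how a
later validation step might treat a zeroed owner as "do not check"; it is
not part of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_OI_ATTR_FORK	(1u << 0)
#define DEMO_OI_BMBT_BLOCK	(1u << 1)
#define DEMO_ATTR_FORK		1	/* hypothetical fork identifier */

struct demo_owner_info {
	uint64_t	owner;	/* inode number or special owner value */
	uint64_t	offset;	/* file offset of the extent, if any */
	unsigned int	flags;
};

/* Owner for a file data or attr fork extent at a given offset. */
static void demo_ino_owner(struct demo_owner_info *oi, uint64_t ino,
			   int whichfork, uint64_t offset)
{
	oi->owner = ino;
	oi->offset = offset;
	oi->flags = (whichfork == DEMO_ATTR_FORK) ? DEMO_OI_ATTR_FORK : 0;
}

/* Owner for a block holding part of an inode's bmap btree. */
static void demo_ino_bmbt_owner(struct demo_owner_info *oi, uint64_t ino,
				int whichfork)
{
	oi->owner = ino;
	oi->offset = 0;
	oi->flags = DEMO_OI_BMBT_BLOCK;
	if (whichfork == DEMO_ATTR_FORK)
		oi->flags |= DEMO_OI_ATTR_FORK;
}

/* Copy the caller's owner; a NULL owner is stored as all zeroes. */
static void demo_record_owner(struct demo_owner_info *dst,
			      const struct demo_owner_info *src)
{
	if (src)
		*dst = *src;
	else
		memset(dst, 0, sizeof(*dst));
}

/* Assumed check: an all-zero claim is a wildcard that matches anything. */
static bool demo_owner_matches(const struct demo_owner_info *recorded,
			       const struct demo_owner_info *claimed)
{
	if (claimed->owner == 0 && claimed->flags == 0)
		return true;
	return recorded->owner == claimed->owner &&
	       recorded->flags == claimed->flags;
}
```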

This is based upon a patch originally from Dave Chinner. It has been
extended to add more owner information with the intent of helping
recovery operations when things go wrong (e.g. offset of user data
block in a file).

[dchinner: de-shout the xfs_rmap_*_owner helpers]
[darrick: minor style fixes suggested by Christoph Hellwig]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/defer_item.c       |    2 +
 libxfs/xfs_alloc.c        |   11 ++++++--
 libxfs/xfs_alloc.h        |    4 ++-
 libxfs/xfs_bmap.c         |   17 ++++++++++--
 libxfs/xfs_bmap.h         |    4 ++-
 libxfs/xfs_bmap_btree.c   |    6 +++-
 libxfs/xfs_format.h       |   65 +++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_ialloc.c       |    7 +++--
 libxfs/xfs_ialloc_btree.c |    7 ++++-
 9 files changed, 110 insertions(+), 13 deletions(-)


diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c
index 72c28f8..777875c 100644
--- a/libxfs/defer_item.c
+++ b/libxfs/defer_item.c
@@ -92,7 +92,7 @@ xfs_bmap_free_finish_item(
 
 	free = container_of(item, struct xfs_bmap_free_item, xbfi_list);
 	error = xfs_free_extent(tp, free->xbfi_startblock,
-			free->xbfi_blockcount);
+			free->xbfi_blockcount, &free->xbfi_oinfo);
 	kmem_free(free);
 	return error;
 }
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 738275f..e3442a3 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -1592,6 +1592,7 @@ xfs_free_ag_extent(
 	xfs_agnumber_t	agno,	/* allocation group number */
 	xfs_agblock_t	bno,	/* starting block number */
 	xfs_extlen_t	len,	/* length of extent */
+	struct xfs_owner_info	*oinfo,	/* extent owner */
 	int		isfl)	/* set if is freelist blocks - no sb acctg */
 {
 	xfs_btree_cur_t	*bno_cur;	/* cursor for by-block btree */
@@ -2001,13 +2002,15 @@ xfs_alloc_fix_freelist(
 	 * back on the free list? Maybe we should only do this when space is
 	 * getting low or the AGFL is more than half full?
 	 */
+	xfs_rmap_ag_owner(&targs.oinfo, XFS_RMAP_OWN_AG);
 	while (pag->pagf_flcount > need) {
 		struct xfs_buf	*bp;
 
 		error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
 		if (error)
 			goto out_agbp_relse;
-		error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1, 1);
+		error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1,
+					   &targs.oinfo, 1);
 		if (error)
 			goto out_agbp_relse;
 		bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
@@ -2017,6 +2020,7 @@ xfs_alloc_fix_freelist(
 	memset(&targs, 0, sizeof(targs));
 	targs.tp = tp;
 	targs.mp = mp;
+	xfs_rmap_ag_owner(&targs.oinfo, XFS_RMAP_OWN_AG);
 	targs.agbp = agbp;
 	targs.agno = args->agno;
 	targs.alignment = targs.minlen = targs.prod = targs.isfl = 1;
@@ -2707,7 +2711,8 @@ int				/* error */
 xfs_free_extent(
 	struct xfs_trans	*tp,	/* transaction pointer */
 	xfs_fsblock_t		bno,	/* starting block number of extent */
-	xfs_extlen_t		len)	/* length of extent */
+	xfs_extlen_t		len,	/* length of extent */
+	struct xfs_owner_info	*oinfo)	/* extent owner */
 {
 	struct xfs_mount	*mp = tp->t_mountp;
 	struct xfs_buf		*agbp;
@@ -2735,7 +2740,7 @@ xfs_free_extent(
 			agbno + len <= be32_to_cpu(XFS_BUF_TO_AGF(agbp)->agf_length),
 			err);
 
-	error = xfs_free_ag_extent(tp, agbp, agno, agbno, len, 0);
+	error = xfs_free_ag_extent(tp, agbp, agno, agbno, len, oinfo, 0);
 	if (error)
 		goto err;
 
diff --git a/libxfs/xfs_alloc.h b/libxfs/xfs_alloc.h
index 20b54aa..0721a48 100644
--- a/libxfs/xfs_alloc.h
+++ b/libxfs/xfs_alloc.h
@@ -123,6 +123,7 @@ typedef struct xfs_alloc_arg {
 	char		isfl;		/* set if is freelist blocks - !acctg */
 	char		userdata;	/* mask defining userdata treatment */
 	xfs_fsblock_t	firstblock;	/* io first block allocated */
+	struct xfs_owner_info	oinfo;	/* owner of blocks being allocated */
 } xfs_alloc_arg_t;
 
 /*
@@ -210,7 +211,8 @@ int				/* error */
 xfs_free_extent(
 	struct xfs_trans *tp,	/* transaction pointer */
 	xfs_fsblock_t	bno,	/* starting block number of extent */
-	xfs_extlen_t	len);	/* length of extent */
+	xfs_extlen_t	len,	/* length of extent */
+	struct xfs_owner_info	*oinfo);	/* extent owner */
 
 int				/* error */
 xfs_alloc_lookup_ge(
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 33e181c..de1c759 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -566,7 +566,8 @@ xfs_bmap_add_free(
 	struct xfs_mount	*mp,		/* mount point structure */
 	struct xfs_defer_ops	*dfops,		/* list of extents */
 	xfs_fsblock_t		bno,		/* fs block number of extent */
-	xfs_filblks_t		len)		/* length of extent */
+	xfs_filblks_t		len,		/* length of extent */
+	struct xfs_owner_info	*oinfo)		/* extent owner */
 {
 	struct xfs_bmap_free_item	*new;		/* new element */
 #ifdef DEBUG
@@ -585,9 +586,14 @@ xfs_bmap_add_free(
 	ASSERT(agbno + len <= mp->m_sb.sb_agblocks);
 #endif
 	ASSERT(xfs_bmap_free_item_zone != NULL);
+
 	new = kmem_zone_alloc(xfs_bmap_free_item_zone, KM_SLEEP);
 	new->xbfi_startblock = bno;
 	new->xbfi_blockcount = (xfs_extlen_t)len;
+	if (oinfo)
+		memcpy(&new->xbfi_oinfo, oinfo, sizeof(struct xfs_owner_info));
+	else
+		memset(&new->xbfi_oinfo, 0, sizeof(struct xfs_owner_info));
 	trace_xfs_bmap_free_defer(mp, XFS_FSB_TO_AGNO(mp, bno), 0,
 			XFS_FSB_TO_AGBNO(mp, bno), len);
 	xfs_defer_add(dfops, XFS_DEFER_OPS_TYPE_FREE, &new->xbfi_list);
@@ -620,6 +626,7 @@ xfs_bmap_btree_to_extents(
 	xfs_mount_t		*mp;	/* mount point structure */
 	__be64			*pp;	/* ptr to block address */
 	struct xfs_btree_block	*rblock;/* root btree block */
+	struct xfs_owner_info	oinfo;
 
 	mp = ip->i_mount;
 	ifp = XFS_IFORK_PTR(ip, whichfork);
@@ -643,7 +650,8 @@ xfs_bmap_btree_to_extents(
 	cblock = XFS_BUF_TO_BLOCK(cbp);
 	if ((error = xfs_btree_check_block(cur, cblock, 0, cbp)))
 		return error;
-	xfs_bmap_add_free(mp, cur->bc_private.b.dfops, cbno, 1);
+	xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork);
+	xfs_bmap_add_free(mp, cur->bc_private.b.dfops, cbno, 1, &oinfo);
 	ip->i_d.di_nblocks--;
 	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L);
 	xfs_trans_binval(tp, cbp);
@@ -724,6 +732,7 @@ xfs_bmap_extents_to_btree(
 	memset(&args, 0, sizeof(args));
 	args.tp = tp;
 	args.mp = mp;
+	xfs_rmap_ino_bmbt_owner(&args.oinfo, ip->i_ino, whichfork);
 	args.firstblock = *firstblock;
 	if (*firstblock == NULLFSBLOCK) {
 		args.type = XFS_ALLOCTYPE_START_BNO;
@@ -870,6 +879,7 @@ xfs_bmap_local_to_extents(
 	memset(&args, 0, sizeof(args));
 	args.tp = tp;
 	args.mp = ip->i_mount;
+	xfs_rmap_ino_owner(&args.oinfo, ip->i_ino, whichfork, 0);
 	args.firstblock = *firstblock;
 	/*
 	 * Allocate a block.  We know we need only one, since the
@@ -4831,6 +4841,7 @@ xfs_bmap_del_extent(
 		nblks = 0;
 		do_fx = 0;
 	}
+
 	/*
 	 * Set flag value to use in switch statement.
 	 * Left-contig is 2, right-contig is 1.
@@ -5018,7 +5029,7 @@ xfs_bmap_del_extent(
 	 */
 	if (do_fx)
 		xfs_bmap_add_free(mp, dfops, del->br_startblock,
-			del->br_blockcount);
+				  del->br_blockcount, NULL);
 	/*
 	 * Adjust inode # blocks in the file.
 	 */
diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
index 6854e61..41f7ef2 100644
--- a/libxfs/xfs_bmap.h
+++ b/libxfs/xfs_bmap.h
@@ -67,6 +67,7 @@ struct xfs_bmap_free_item
 	xfs_fsblock_t		xbfi_startblock;/* starting fs block number */
 	xfs_extlen_t		xbfi_blockcount;/* number of blocks in extent */
 	struct list_head	xbfi_list;
+	struct xfs_owner_info	xbfi_oinfo;	/* extent owner */
 };
 
 #define	XFS_BMAP_MAX_NMAP	4
@@ -165,7 +166,8 @@ void	xfs_bmap_trace_exlist(struct xfs_inode *ip, xfs_extnum_t cnt,
 int	xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd);
 void	xfs_bmap_local_to_extents_empty(struct xfs_inode *ip, int whichfork);
 void	xfs_bmap_add_free(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
-			  xfs_fsblock_t bno, xfs_filblks_t len);
+			  xfs_fsblock_t bno, xfs_filblks_t len,
+			  struct xfs_owner_info *oinfo);
 void	xfs_bmap_compute_maxlevels(struct xfs_mount *mp, int whichfork);
 int	xfs_bmap_first_unused(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_extlen_t len, xfs_fileoff_t *unused, int whichfork);
diff --git a/libxfs/xfs_bmap_btree.c b/libxfs/xfs_bmap_btree.c
index d290769..404e321 100644
--- a/libxfs/xfs_bmap_btree.c
+++ b/libxfs/xfs_bmap_btree.c
@@ -444,6 +444,8 @@ xfs_bmbt_alloc_block(
 	args.mp = cur->bc_mp;
 	args.fsbno = cur->bc_private.b.firstblock;
 	args.firstblock = args.fsbno;
+	xfs_rmap_ino_bmbt_owner(&args.oinfo, cur->bc_private.b.ip->i_ino,
+			cur->bc_private.b.whichfork);
 
 	if (args.fsbno == NULLFSBLOCK) {
 		args.fsbno = be64_to_cpu(start->l);
@@ -523,8 +525,10 @@ xfs_bmbt_free_block(
 	struct xfs_inode	*ip = cur->bc_private.b.ip;
 	struct xfs_trans	*tp = cur->bc_tp;
 	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
+	struct xfs_owner_info	oinfo;
 
-	xfs_bmap_add_free(mp, cur->bc_private.b.dfops, fsbno, 1);
+	xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, cur->bc_private.b.whichfork);
+	xfs_bmap_add_free(mp, cur->bc_private.b.dfops, fsbno, 1, &oinfo);
 	ip->i_d.di_nblocks--;
 
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 12048fa..1b63315 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -1318,6 +1318,71 @@ typedef __be32 xfs_inobt_ptr_t;
  */
 #define	XFS_RMAP_CRC_MAGIC	0x524d4233	/* 'RMB3' */
 
+/*
+ * Ownership info for an extent.  This is used to create reverse-mapping
+ * entries.
+ */
+#define XFS_OWNER_INFO_ATTR_FORK	(1 << 0)
+#define XFS_OWNER_INFO_BMBT_BLOCK	(1 << 1)
+struct xfs_owner_info {
+	uint64_t		oi_owner;
+	xfs_fileoff_t		oi_offset;
+	unsigned int		oi_flags;
+};
+
+static inline void
+xfs_rmap_ag_owner(
+	struct xfs_owner_info	*oi,
+	uint64_t		owner)
+{
+	oi->oi_owner = owner;
+	oi->oi_offset = 0;
+	oi->oi_flags = 0;
+}
+
+static inline void
+xfs_rmap_ino_bmbt_owner(
+	struct xfs_owner_info	*oi,
+	xfs_ino_t		ino,
+	int			whichfork)
+{
+	oi->oi_owner = ino;
+	oi->oi_offset = 0;
+	oi->oi_flags = XFS_OWNER_INFO_BMBT_BLOCK;
+	if (whichfork == XFS_ATTR_FORK)
+		oi->oi_flags |= XFS_OWNER_INFO_ATTR_FORK;
+}
+
+static inline void
+xfs_rmap_ino_owner(
+	struct xfs_owner_info	*oi,
+	xfs_ino_t		ino,
+	int			whichfork,
+	xfs_fileoff_t		offset)
+{
+	oi->oi_owner = ino;
+	oi->oi_offset = offset;
+	oi->oi_flags = 0;
+	if (whichfork == XFS_ATTR_FORK)
+		oi->oi_flags |= XFS_OWNER_INFO_ATTR_FORK;
+}
+
+/*
+ * Special owner types.
+ *
+ * Seeing as we only support up to 8EB, we have the upper bit of the owner field
+ * to tell us we have a special owner value. We use these for static metadata
+ * allocated at mkfs/growfs time, as well as for freespace management metadata.
+ */
+#define XFS_RMAP_OWN_NULL	(-1ULL)	/* No owner, for growfs */
+#define XFS_RMAP_OWN_UNKNOWN	(-2ULL)	/* Unknown owner, for EFI recovery */
+#define XFS_RMAP_OWN_FS		(-3ULL)	/* static fs metadata */
+#define XFS_RMAP_OWN_LOG	(-4ULL)	/* static fs metadata */
+#define XFS_RMAP_OWN_AG		(-5ULL)	/* AG freespace btree blocks */
+#define XFS_RMAP_OWN_INOBT	(-6ULL)	/* Inode btree blocks */
+#define XFS_RMAP_OWN_INODES	(-7ULL)	/* Inode chunk */
+#define XFS_RMAP_OWN_MIN	(-8ULL) /* guard */
+
 #define	XFS_RMAP_BLOCK(mp) \
 	(xfs_sb_version_hasfinobt(&((mp)->m_sb)) ? \
 	 XFS_FIBT_BLOCK(mp) + 1 : \
diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index 1545338..8c2344c 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -609,6 +609,7 @@ xfs_ialloc_ag_alloc(
 	args.tp = tp;
 	args.mp = tp->t_mountp;
 	args.fsbno = NULLFSBLOCK;
+	xfs_rmap_ag_owner(&args.oinfo, XFS_RMAP_OWN_INODES);
 
 #ifdef DEBUG
 	/* randomly do sparse inode allocations */
@@ -1819,12 +1820,14 @@ xfs_difree_inode_chunk(
 	int		nextbit;
 	xfs_agblock_t	agbno;
 	int		contigblk;
+	struct xfs_owner_info	oinfo;
 	DECLARE_BITMAP(holemask, XFS_INOBT_HOLEMASK_BITS);
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INODES);
 
 	if (!xfs_inobt_issparse(rec->ir_holemask)) {
 		/* not sparse, calculate extent info directly */
 		xfs_bmap_add_free(mp, dfops, XFS_AGB_TO_FSB(mp, agno, sagbno),
-				  mp->m_ialloc_blks);
+				  mp->m_ialloc_blks, &oinfo);
 		return;
 	}
 
@@ -1868,7 +1871,7 @@ xfs_difree_inode_chunk(
 		ASSERT(agbno % mp->m_sb.sb_spino_align == 0);
 		ASSERT(contigblk % mp->m_sb.sb_spino_align == 0);
 		xfs_bmap_add_free(mp, dfops, XFS_AGB_TO_FSB(mp, agno, agbno),
-				  contigblk);
+				  contigblk, &oinfo);
 
 		/* reset range to current bit and carry on... */
 		startidx = endidx = nextbit;
diff --git a/libxfs/xfs_ialloc_btree.c b/libxfs/xfs_ialloc_btree.c
index cd61419..d79ddfe 100644
--- a/libxfs/xfs_ialloc_btree.c
+++ b/libxfs/xfs_ialloc_btree.c
@@ -95,6 +95,7 @@ xfs_inobt_alloc_block(
 	memset(&args, 0, sizeof(args));
 	args.tp = cur->bc_tp;
 	args.mp = cur->bc_mp;
+	xfs_rmap_ag_owner(&args.oinfo, XFS_RMAP_OWN_INOBT);
 	args.fsbno = XFS_AGB_TO_FSB(args.mp, cur->bc_private.a.agno, sbno);
 	args.minlen = 1;
 	args.maxlen = 1;
@@ -124,8 +125,12 @@ xfs_inobt_free_block(
 	struct xfs_btree_cur	*cur,
 	struct xfs_buf		*bp)
 {
+	struct xfs_owner_info	oinfo;
+
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT);
 	return xfs_free_extent(cur->bc_tp,
-			XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp)), 1);
+			XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp)), 1,
+			&oinfo);
 }
 
 STATIC int
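
A note on the owner-info helpers added to xfs_format.h above: the
following is only a minimal standalone sketch of how the three
initializers fill in the structure, and of the "upper bit means special
owner" convention described in the comment.  All demo_* and DEMO_* names
are made up for illustration, and DEMO_ATTR_FORK is an assumed stand-in
for XFS_ATTR_FORK, which is defined elsewhere in libxfs.

```c
#include <stdint.h>

/* Flag bits with the same values as XFS_OWNER_INFO_* in the patch. */
#define DEMO_OWNER_INFO_ATTR_FORK	(1U << 0)
#define DEMO_OWNER_INFO_BMBT_BLOCK	(1U << 1)

/* Assumed stand-in for XFS_ATTR_FORK; illustration only. */
#define DEMO_ATTR_FORK			1

/* Special owners are small negative numbers stored as unsigned 64-bit. */
#define DEMO_OWN_INOBT			((uint64_t)-6)
#define DEMO_OWN_INODES			((uint64_t)-7)

struct demo_owner_info {
	uint64_t	oi_owner;
	uint64_t	oi_offset;
	unsigned int	oi_flags;
};

/* Mirrors xfs_rmap_ag_owner(): per-AG metadata has no offset or flags. */
static inline void
demo_ag_owner(struct demo_owner_info *oi, uint64_t owner)
{
	oi->oi_owner = owner;
	oi->oi_offset = 0;
	oi->oi_flags = 0;
}

/* Mirrors xfs_rmap_ino_bmbt_owner(): a bmbt block owned by an inode. */
static inline void
demo_ino_bmbt_owner(struct demo_owner_info *oi, uint64_t ino, int whichfork)
{
	oi->oi_owner = ino;
	oi->oi_offset = 0;
	oi->oi_flags = DEMO_OWNER_INFO_BMBT_BLOCK;
	if (whichfork == DEMO_ATTR_FORK)
		oi->oi_flags |= DEMO_OWNER_INFO_ATTR_FORK;
}

/* Mirrors xfs_rmap_ino_owner(): file data at a given file offset. */
static inline void
demo_ino_owner(struct demo_owner_info *oi, uint64_t ino, int whichfork,
	       uint64_t offset)
{
	oi->oi_owner = ino;
	oi->oi_offset = offset;
	oi->oi_flags = 0;
	if (whichfork == DEMO_ATTR_FORK)
		oi->oi_flags |= DEMO_OWNER_INFO_ATTR_FORK;
}

/* Nonzero when bit 63 is set, i.e. the owner is not an inode number. */
static inline int
demo_is_special_owner(uint64_t owner)
{
	return (owner >> 63) != 0;
}
```

The inode btree paths in this patch would correspond to
demo_ag_owner(&oi, DEMO_OWN_INOBT), while xfs_bmbt_free_block()
corresponds to the demo_ino_bmbt_owner() case.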

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 032/145] xfs: introduce rmap extent operation stubs
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (30 preceding siblings ...)
  2016-06-17  1:33 ` [PATCH 031/145] xfs: add owner field to extent allocation and freeing Darrick J. Wong
@ 2016-06-17  1:34 ` Darrick J. Wong
  2016-06-17  1:34 ` [PATCH 033/145] xfs: define the on-disk rmap btree format Darrick J. Wong
                   ` (112 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:34 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

From: Dave Chinner <dchinner@redhat.com>

Add the stubs into the extent allocation and freeing paths that the
rmap btree implementation will hook into. While doing this, add the
trace points that will be used to track rmap btree extent
manipulations.

[darrick.wong@oracle.com: Extend the stubs to take full owner info.]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 include/libxfs.h        |    1 +
 include/xfs_trace.h     |    6 +++
 libxfs/Makefile         |    2 +
 libxfs/xfs_alloc.c      |   18 +++++++++-
 libxfs/xfs_rmap.c       |   88 +++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rmap_btree.h |   30 ++++++++++++++++
 6 files changed, 144 insertions(+), 1 deletion(-)
 create mode 100644 libxfs/xfs_rmap.c
 create mode 100644 libxfs/xfs_rmap_btree.h
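
For readers skimming the stubs: the control flow of xfs_rmap_alloc() and
xfs_rmap_free() in this patch can be sketched roughly as below.  This is
an illustration only; struct demo_mount, demo_trace() and the
error_path_taken counter are hypothetical stand-ins and are not part of
libxfs.  The point is that the stubs return 0 whether or not the rmapbt
feature is enabled, because "error" is never set before the jump to the
error label.

```c
#include <stdbool.h>

/* Tracepoints are no-ops in userspace, as in include/xfs_trace.h. */
#define demo_trace(...)	((void) 0)

/* Hypothetical stand-in for the superblock feature check. */
struct demo_mount {
	bool	has_rmapbt;
};

/*
 * Shape of the stubs: bail out early when the feature is off, otherwise
 * fall through to the error label with error still 0, so the caller
 * sees success either way.
 */
static inline int
demo_rmap_stub(const struct demo_mount *mp, unsigned int *error_path_taken)
{
	int	error = 0;

	if (!mp->has_rmapbt)
		return 0;

	demo_trace(mp);
	if (1)		/* placeholder for the real btree update */
		goto out_error;
	demo_trace(mp);
	return 0;

out_error:
	demo_trace(mp);
	(*error_path_taken)++;
	return error;
}
```

Presumably a later patch replaces the "if (1)" placeholder with the
actual btree update, at which point the error label starts to carry real
error codes.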


diff --git a/include/libxfs.h b/include/libxfs.h
index 863b0e3..5e76263 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -77,6 +77,7 @@ extern uint32_t crc32c_le(uint32_t crc, unsigned char const *p, size_t len);
 #include "xfs_bmap.h"
 #include "xfs_trace.h"
 #include "xfs_trans.h"
+#include "xfs_rmap_btree.h"
 
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index cdabd18..b4174a3 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -186,6 +186,12 @@
 
 #define trace_xfs_bmap_free_deferred(...)	((void) 0)
 #define trace_xfs_bmap_free_defer(...)		((void) 0)
+#define trace_xfs_rmap_free_extent(...)		((void) 0)
+#define trace_xfs_rmap_free_extent_error(...)	((void) 0)
+#define trace_xfs_rmap_alloc_extent(...)	((void) 0)
+#define trace_xfs_rmap_alloc_extent_error(...)	((void) 0)
+#define trace_xfs_rmap_alloc_extent_done(...)	((void) 0)
+#define trace_xfs_rmap_free_extent_done(...)	((void) 0)
 
 /* set c = c to avoid unused var warnings */
 #define trace_xfs_perag_get(a,b,c,d)	((c) = (c))
diff --git a/libxfs/Makefile b/libxfs/Makefile
index 0073cd1..d7a3e57 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -36,6 +36,7 @@ HFILES = \
 	xfs_inode_buf.h \
 	xfs_inode_fork.h \
 	xfs_quota_defs.h \
+	xfs_rmap_btree.h \
 	xfs_sb.h \
 	xfs_shared.h \
 	xfs_trans_resv.h \
@@ -82,6 +83,7 @@ CFILES = cache.c \
 	xfs_inode_fork.c \
 	xfs_ialloc_btree.c \
 	xfs_log_rlimit.c \
+	xfs_rmap.c \
 	xfs_rtbitmap.c \
 	xfs_sb.c \
 	xfs_symlink_remote.c \
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index e3442a3..0f35e96 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -27,6 +27,7 @@
 #include "xfs_defer.h"
 #include "xfs_inode.h"
 #include "xfs_btree.h"
+#include "xfs_rmap_btree.h"
 #include "xfs_alloc_btree.h"
 #include "xfs_alloc.h"
 #include "xfs_cksum.h"
@@ -644,6 +645,14 @@ xfs_alloc_ag_vextent(
 	ASSERT(!args->wasfromfl || !args->isfl);
 	ASSERT(args->agbno % args->alignment == 0);
 
+	/* if not file data, insert new block into the reverse map btree */
+	if (args->oinfo.oi_owner) {
+		error = xfs_rmap_alloc(args->tp, args->agbp, args->agno,
+				       args->agbno, args->len, &args->oinfo);
+		if (error)
+			return error;
+	}
+
 	if (!args->wasfromfl) {
 		error = xfs_alloc_update_counters(args->tp, args->pag,
 						  args->agbp,
@@ -1610,12 +1619,19 @@ xfs_free_ag_extent(
 	xfs_extlen_t	nlen;		/* new length of freespace */
 	xfs_perag_t	*pag;		/* per allocation group data */
 
+	bno_cur = cnt_cur = NULL;
 	mp = tp->t_mountp;
+
+	if (oinfo->oi_owner) {
+		error = xfs_rmap_free(tp, agbp, agno, bno, len, oinfo);
+		if (error)
+			goto error0;
+	}
+
 	/*
 	 * Allocate and initialize a cursor for the by-block btree.
 	 */
 	bno_cur = xfs_allocbt_init_cursor(mp, tp, agbp, agno, XFS_BTNUM_BNO);
-	cnt_cur = NULL;
 	/*
 	 * Look for a neighboring block on the left (lower block numbers)
 	 * that is contiguous with this space.
diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
new file mode 100644
index 0000000..31c6336
--- /dev/null
+++ b/libxfs/xfs_rmap.c
@@ -0,0 +1,88 @@
+
+/*
+ * Copyright (c) 2014 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#include "libxfs_priv.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_bit.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_btree.h"
+#include "xfs_trans.h"
+#include "xfs_alloc.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_trans_space.h"
+#include "xfs_trace.h"
+
+int
+xfs_rmap_free(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	struct xfs_owner_info	*oinfo)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	int			error = 0;
+
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return 0;
+
+	trace_xfs_rmap_free_extent(mp, agno, bno, len, false, oinfo);
+	if (1)
+		goto out_error;
+	trace_xfs_rmap_free_extent_done(mp, agno, bno, len, false, oinfo);
+	return 0;
+
+out_error:
+	trace_xfs_rmap_free_extent_error(mp, agno, bno, len, false, oinfo);
+	return error;
+}
+
+int
+xfs_rmap_alloc(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	struct xfs_owner_info	*oinfo)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	int			error = 0;
+
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return 0;
+
+	trace_xfs_rmap_alloc_extent(mp, agno, bno, len, false, oinfo);
+	if (1)
+		goto out_error;
+	trace_xfs_rmap_alloc_extent_done(mp, agno, bno, len, false, oinfo);
+	return 0;
+
+out_error:
+	trace_xfs_rmap_alloc_extent_error(mp, agno, bno, len, false, oinfo);
+	return error;
+}
diff --git a/libxfs/xfs_rmap_btree.h b/libxfs/xfs_rmap_btree.h
new file mode 100644
index 0000000..a3b8f90
--- /dev/null
+++ b/libxfs/xfs_rmap_btree.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright (c) 2014 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#ifndef __XFS_RMAP_BTREE_H__
+#define	__XFS_RMAP_BTREE_H__
+
+struct xfs_buf;
+
+int xfs_rmap_alloc(struct xfs_trans *tp, struct xfs_buf *agbp,
+		   xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
+		   struct xfs_owner_info *oinfo);
+int xfs_rmap_free(struct xfs_trans *tp, struct xfs_buf *agbp,
+		  xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
+		  struct xfs_owner_info *oinfo);
+
+#endif	/* __XFS_RMAP_BTREE_H__ */
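
One property of the userspace trace stubs added to include/xfs_trace.h
above is that a variadic macro expanding to ((void) 0) drops its
arguments at preprocessing time, so they are never evaluated.  A tiny
sketch of that behaviour (the demo_* names are hypothetical):

```c
/* Same shape as the trace_xfs_rmap_*() stubs in the patch. */
#define demo_trace_rmap_free_extent(...)	((void) 0)

/*
 * Returns how many times the argument expression was evaluated.  The
 * macro discards its arguments, so the increment never runs.
 */
static inline int
demo_trace_side_effects(void)
{
	int	evaluated = 0;

	demo_trace_rmap_free_extent(evaluated++);
	return evaluated;
}
```

This is why the trace calls in xfs_rmap.c are free in xfsprogs, and also
why trace arguments should not carry side effects that the code relies
on.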


* [PATCH 033/145] xfs: define the on-disk rmap btree format
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (31 preceding siblings ...)
  2016-06-17  1:34 ` [PATCH 032/145] xfs: introduce rmap extent operation stubs Darrick J. Wong
@ 2016-06-17  1:34 ` Darrick J. Wong
  2016-06-17  1:34 ` [PATCH 034/145] xfs: rmap btree transaction reservations Darrick J. Wong
                   ` (111 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:34 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

From: Dave Chinner <dchinner@redhat.com>

Now that we have all the surrounding call infrastructure in place, we can
start filling out the rmap btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate rmap btree blocks. This prepares the way for adding the
btree operations implementation.

[darrick: record owner and offset info in rmap btree]
[darrick: fork, bmbt and unwritten state in rmap btree]
[darrick: flags are a separate field in xfs_rmap_irec]
[darrick: calculate maxlevels separately]
[darrick: move 'unwritten' bit to rm_offset]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 include/xfs_mount.h     |    3 +
 libxfs/Makefile         |    1 
 libxfs/init.c           |    2 +
 libxfs/xfs_btree.c      |    3 +
 libxfs/xfs_btree.h      |   18 +++--
 libxfs/xfs_format.h     |  140 +++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rmap_btree.c |  178 +++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rmap_btree.h |   32 ++++++++
 libxfs/xfs_sb.c         |    6 ++
 libxfs/xfs_shared.h     |    2 +
 10 files changed, 377 insertions(+), 8 deletions(-)
 create mode 100644 libxfs/xfs_rmap_btree.c
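
To make the rm_offset layout easier to follow, here is a minimal
standalone sketch of the pack/unpack round trip: bits 63, 62 and 61
carry the attr fork, bmbt block and unwritten flags, bits 54-60 must be
zero, and the low 54 bits hold the block offset.  The demo_* and DEMO_*
names are hypothetical.  Note one deliberate difference, which is an
assumption on my part rather than something this patch does: the sketch
clears rm_flags before OR-ing in the decoded bits, whereas
xfs_rmap_irec_offset_unpack() as posted relies on the caller to have
zeroed it.  The sketch also returns -1 where the patch returns
-EFSCORRUPTED.

```c
#include <stdint.h>

/* On-disk flag bits in rm_offset, same positions as in the patch. */
#define DEMO_OFF_ATTR_FORK	(1ULL << 63)
#define DEMO_OFF_BMBT_BLOCK	(1ULL << 62)
#define DEMO_OFF_UNWRITTEN	(1ULL << 61)
#define DEMO_OFF_FLAGS		(DEMO_OFF_ATTR_FORK | \
				 DEMO_OFF_BMBT_BLOCK | \
				 DEMO_OFF_UNWRITTEN)
#define DEMO_OFF_MASK		((1ULL << 54) - 1)	/* low 54 bits */

/* In-core flag bits, same values as XFS_RMAP_* in the patch. */
#define DEMO_ATTR_FORK		(1U << 0)
#define DEMO_BMBT_BLOCK		(1U << 1)
#define DEMO_UNWRITTEN		(1U << 2)

struct demo_irec {
	uint64_t	rm_offset;
	unsigned int	rm_flags;
};

/* Fold the in-core offset and flags into one 64-bit on-disk value. */
static inline uint64_t
demo_offset_pack(const struct demo_irec *irec)
{
	uint64_t	x = irec->rm_offset & DEMO_OFF_MASK;

	if (irec->rm_flags & DEMO_ATTR_FORK)
		x |= DEMO_OFF_ATTR_FORK;
	if (irec->rm_flags & DEMO_BMBT_BLOCK)
		x |= DEMO_OFF_BMBT_BLOCK;
	if (irec->rm_flags & DEMO_UNWRITTEN)
		x |= DEMO_OFF_UNWRITTEN;
	return x;
}

/* Returns 0 on success, -1 if any of the reserved bits 54-60 is set. */
static inline int
demo_offset_unpack(uint64_t offset, struct demo_irec *irec)
{
	if (offset & ~(DEMO_OFF_MASK | DEMO_OFF_FLAGS))
		return -1;
	irec->rm_offset = offset & DEMO_OFF_MASK;
	irec->rm_flags = 0;	/* assumption: start from a clean slate */
	if (offset & DEMO_OFF_ATTR_FORK)
		irec->rm_flags |= DEMO_ATTR_FORK;
	if (offset & DEMO_OFF_BMBT_BLOCK)
		irec->rm_flags |= DEMO_BMBT_BLOCK;
	if (offset & DEMO_OFF_UNWRITTEN)
		irec->rm_flags |= DEMO_UNWRITTEN;
	return 0;
}
```

Packing offset 42 with the attr fork and unwritten flags sets bits 63
and 61 on top of the value 42, and unpacking that value gives the same
offset and flags back.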


diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 2c7c72e..7d63c93 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -64,9 +64,12 @@ typedef struct xfs_mount {
 	uint			m_bmap_dmnr[2];	/* XFS_BMAP_BLOCK_DMINRECS */
 	uint			m_inobt_mxr[2];	/* XFS_INOBT_BLOCK_MAXRECS */
 	uint			m_inobt_mnr[2];	/* XFS_INOBT_BLOCK_MINRECS */
+	uint			m_rmap_mxr[2];	/* max rmap btree records */
+	uint			m_rmap_mnr[2];	/* min rmap btree records */
 	uint			m_ag_maxlevels;	/* XFS_AG_MAXLEVELS */
 	uint			m_bm_maxlevels[2]; /* XFS_BM_MAXLEVELS */
 	uint			m_in_maxlevels;	/* XFS_IN_MAXLEVELS */
+	uint			m_rmap_maxlevels; /* max rmap btree levels */
 	xfs_extlen_t		m_ag_prealloc_blocks; /* reserved ag blocks */
 	struct radix_tree_root	m_perag_tree;
 	uint			m_flags;	/* global mount flags */
diff --git a/libxfs/Makefile b/libxfs/Makefile
index d7a3e57..df47da2 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -84,6 +84,7 @@ CFILES = cache.c \
 	xfs_ialloc_btree.c \
 	xfs_log_rlimit.c \
 	xfs_rmap.c \
+	xfs_rmap_btree.c \
 	xfs_rtbitmap.c \
 	xfs_sb.c \
 	xfs_symlink_remote.c \
diff --git a/libxfs/init.c b/libxfs/init.c
index 7b618da..c56d123 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -31,6 +31,7 @@
 #include "xfs_inode_fork.h"
 #include "xfs_inode.h"
 #include "xfs_trans.h"
+#include "xfs_rmap_btree.h"
 
 #include "libxfs.h"		/* for now */
 
@@ -683,6 +684,7 @@ libxfs_mount(
 	xfs_bmap_compute_maxlevels(mp, XFS_DATA_FORK);
 	xfs_bmap_compute_maxlevels(mp, XFS_ATTR_FORK);
 	xfs_ialloc_compute_maxlevels(mp);
+	xfs_rmapbt_compute_maxlevels(mp);
 
 	if (sbp->sb_imax_pct) {
 		/* Make sure the maximum inode count is a multiple of the
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 8f66e14..db0267a 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -1206,6 +1206,9 @@ xfs_btree_set_refs(
 	case XFS_BTNUM_BMAP:
 		xfs_buf_set_ref(bp, XFS_BMAP_BTREE_REF);
 		break;
+	case XFS_BTNUM_RMAP:
+		xfs_buf_set_ref(bp, XFS_RMAP_BTREE_REF);
+		break;
 	default:
 		ASSERT(0);
 	}
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index a29067c..90ea2a7 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -38,17 +38,19 @@ union xfs_btree_ptr {
 };
 
 union xfs_btree_key {
-	xfs_bmbt_key_t		bmbt;
-	xfs_bmdr_key_t		bmbr;	/* bmbt root block */
-	xfs_alloc_key_t		alloc;
-	xfs_inobt_key_t		inobt;
+	struct xfs_bmbt_key		bmbt;
+	xfs_bmdr_key_t			bmbr;	/* bmbt root block */
+	xfs_alloc_key_t			alloc;
+	struct xfs_inobt_key		inobt;
+	struct xfs_rmap_key		rmap;
 };
 
 union xfs_btree_rec {
-	xfs_bmbt_rec_t		bmbt;
-	xfs_bmdr_rec_t		bmbr;	/* bmbt root block */
-	xfs_alloc_rec_t		alloc;
-	xfs_inobt_rec_t		inobt;
+	struct xfs_bmbt_rec		bmbt;
+	xfs_bmdr_rec_t			bmbr;	/* bmbt root block */
+	struct xfs_alloc_rec		alloc;
+	struct xfs_inobt_rec		inobt;
+	struct xfs_rmap_rec		rmap;
 };
 
 /*
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 1b63315..2525004 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -1383,11 +1383,151 @@ xfs_rmap_ino_owner(
 #define XFS_RMAP_OWN_INODES	(-7ULL)	/* Inode chunk */
 #define XFS_RMAP_OWN_MIN	(-8ULL) /* guard */
 
+#define XFS_RMAP_NON_INODE_OWNER(owner)	(!!((owner) & (1ULL << 63)))
+
+/*
+ * Data record structure
+ */
+struct xfs_rmap_rec {
+	__be32		rm_startblock;	/* extent start block */
+	__be32		rm_blockcount;	/* extent length */
+	__be64		rm_owner;	/* extent owner */
+	__be64		rm_offset;	/* offset within the owner */
+};
+
+/*
+ * rmap btree record
+ *  rm_offset:63 is the attribute fork flag
+ *  rm_offset:62 is the bmbt block flag
+ *  rm_offset:61 is the unwritten extent flag (same as l0:63 in bmbt)
+ *  rm_offset:54-60 aren't used and should be zero
+ *  rm_offset:0-53 is the block offset within the inode
+ */
+#define XFS_RMAP_OFF_ATTR_FORK	((__uint64_t)1ULL << 63)
+#define XFS_RMAP_OFF_BMBT_BLOCK	((__uint64_t)1ULL << 62)
+#define XFS_RMAP_OFF_UNWRITTEN	((__uint64_t)1ULL << 61)
+
+#define XFS_RMAP_LEN_MAX	((__uint32_t)~0U)
+#define XFS_RMAP_OFF_FLAGS	(XFS_RMAP_OFF_ATTR_FORK | \
+				 XFS_RMAP_OFF_BMBT_BLOCK | \
+				 XFS_RMAP_OFF_UNWRITTEN)
+#define XFS_RMAP_OFF_MASK	((__uint64_t)0x3FFFFFFFFFFFFFULL)
+
+#define XFS_RMAP_OFF(off)		((off) & XFS_RMAP_OFF_MASK)
+
+#define XFS_RMAP_IS_BMBT_BLOCK(off)	(!!((off) & XFS_RMAP_OFF_BMBT_BLOCK))
+#define XFS_RMAP_IS_ATTR_FORK(off)	(!!((off) & XFS_RMAP_OFF_ATTR_FORK))
+#define XFS_RMAP_IS_UNWRITTEN(off)	(!!((off) & XFS_RMAP_OFF_UNWRITTEN))
+
+#define RMAPBT_STARTBLOCK_BITLEN	32
+#define RMAPBT_BLOCKCOUNT_BITLEN	32
+#define RMAPBT_OWNER_BITLEN		64
+#define RMAPBT_ATTRFLAG_BITLEN		1
+#define RMAPBT_BMBTFLAG_BITLEN		1
+#define RMAPBT_EXNTFLAG_BITLEN		1
+#define RMAPBT_UNUSED_OFFSET_BITLEN	7
+#define RMAPBT_OFFSET_BITLEN		54
+
+#define XFS_RMAP_ATTR_FORK		(1 << 0)
+#define XFS_RMAP_BMBT_BLOCK		(1 << 1)
+#define XFS_RMAP_UNWRITTEN		(1 << 2)
+#define XFS_RMAP_KEY_FLAGS		(XFS_RMAP_ATTR_FORK | \
+					 XFS_RMAP_BMBT_BLOCK)
+#define XFS_RMAP_REC_FLAGS		(XFS_RMAP_UNWRITTEN)
+struct xfs_rmap_irec {
+	xfs_agblock_t	rm_startblock;	/* extent start block */
+	xfs_extlen_t	rm_blockcount;	/* extent length */
+	__uint64_t	rm_owner;	/* extent owner */
+	__uint64_t	rm_offset;	/* offset within the owner */
+	unsigned int	rm_flags;	/* state flags */
+};
+
+static inline __u64
+xfs_rmap_irec_offset_pack(
+	const struct xfs_rmap_irec	*irec)
+{
+	__u64			x;
+
+	x = XFS_RMAP_OFF(irec->rm_offset);
+	if (irec->rm_flags & XFS_RMAP_ATTR_FORK)
+		x |= XFS_RMAP_OFF_ATTR_FORK;
+	if (irec->rm_flags & XFS_RMAP_BMBT_BLOCK)
+		x |= XFS_RMAP_OFF_BMBT_BLOCK;
+	if (irec->rm_flags & XFS_RMAP_UNWRITTEN)
+		x |= XFS_RMAP_OFF_UNWRITTEN;
+	return x;
+}
+
+static inline int
+xfs_rmap_irec_offset_unpack(
+	__u64			offset,
+	struct xfs_rmap_irec	*irec)
+{
+	if (offset & ~(XFS_RMAP_OFF_MASK | XFS_RMAP_OFF_FLAGS))
+		return -EFSCORRUPTED;
+	irec->rm_offset = XFS_RMAP_OFF(offset);
+	if (offset & XFS_RMAP_OFF_ATTR_FORK)
+		irec->rm_flags |= XFS_RMAP_ATTR_FORK;
+	if (offset & XFS_RMAP_OFF_BMBT_BLOCK)
+		irec->rm_flags |= XFS_RMAP_BMBT_BLOCK;
+	if (offset & XFS_RMAP_OFF_UNWRITTEN)
+		irec->rm_flags |= XFS_RMAP_UNWRITTEN;
+	return 0;
+}
+
+/*
+ * Key structure
+ *
+ * We don't use the length for lookups
+ */
+struct xfs_rmap_key {
+	__be32		rm_startblock;	/* extent start block */
+	__be64		rm_owner;	/* extent owner */
+	__be64		rm_offset;	/* offset within the owner */
+} __attribute__((packed));
+
+/* btree pointer type */
+typedef __be32 xfs_rmap_ptr_t;
+
 #define	XFS_RMAP_BLOCK(mp) \
 	(xfs_sb_version_hasfinobt(&((mp)->m_sb)) ? \
 	 XFS_FIBT_BLOCK(mp) + 1 : \
 	 XFS_IBT_BLOCK(mp) + 1)
 
+static inline void
+xfs_owner_info_unpack(
+	struct xfs_owner_info	*oinfo,
+	uint64_t		*owner,
+	uint64_t		*offset,
+	unsigned int		*flags)
+{
+	unsigned int		r = 0;
+
+	*owner = oinfo->oi_owner;
+	*offset = oinfo->oi_offset;
+	if (oinfo->oi_flags & XFS_OWNER_INFO_ATTR_FORK)
+		r |= XFS_RMAP_ATTR_FORK;
+	if (oinfo->oi_flags & XFS_OWNER_INFO_BMBT_BLOCK)
+		r |= XFS_RMAP_BMBT_BLOCK;
+	*flags = r;
+}
+
+static inline void
+xfs_owner_info_pack(
+	struct xfs_owner_info	*oinfo,
+	uint64_t		owner,
+	uint64_t		offset,
+	unsigned int		flags)
+{
+	oinfo->oi_owner = owner;
+	oinfo->oi_offset = XFS_RMAP_OFF(offset);
+	oinfo->oi_flags = 0;
+	if (flags & XFS_RMAP_ATTR_FORK)
+		oinfo->oi_flags |= XFS_OWNER_INFO_ATTR_FORK;
+	if (flags & XFS_RMAP_BMBT_BLOCK)
+		oinfo->oi_flags |= XFS_OWNER_INFO_BMBT_BLOCK;
+}
+
 /*
  * BMAP Btree format definitions
  *
diff --git a/libxfs/xfs_rmap_btree.c b/libxfs/xfs_rmap_btree.c
new file mode 100644
index 0000000..02636f6
--- /dev/null
+++ b/libxfs/xfs_rmap_btree.c
@@ -0,0 +1,178 @@
+/*
+ * Copyright (c) 2014 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#include "libxfs_priv.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_bit.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_inode.h"
+#include "xfs_trans.h"
+#include "xfs_alloc.h"
+#include "xfs_btree.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_trace.h"
+#include "xfs_cksum.h"
+
+static struct xfs_btree_cur *
+xfs_rmapbt_dup_cursor(
+	struct xfs_btree_cur	*cur)
+{
+	return xfs_rmapbt_init_cursor(cur->bc_mp, cur->bc_tp,
+			cur->bc_private.a.agbp, cur->bc_private.a.agno);
+}
+
+static bool
+xfs_rmapbt_verify(
+	struct xfs_buf		*bp)
+{
+	struct xfs_mount	*mp = bp->b_target->bt_mount;
+	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
+	struct xfs_perag	*pag = bp->b_pag;
+	unsigned int		level;
+
+	/*
+	 * magic number and level verification
+	 *
+	 * During growfs operations, we can't verify the exact level or owner as
+	 * the perag is not fully initialised and hence not attached to the
+	 * buffer.  In this case, check against the maximum tree depth.
+	 *
+	 * Similarly, during log recovery we will have a perag structure
+	 * attached, but the agf information will not yet have been initialised
+	 * from the on disk AGF. Again, we can only check against maximum limits
+	 * in this case.
+	 */
+	if (block->bb_magic != cpu_to_be32(XFS_RMAP_CRC_MAGIC))
+		return false;
+
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return false;
+	if (!xfs_btree_sblock_v5hdr_verify(bp))
+		return false;
+
+	level = be16_to_cpu(block->bb_level);
+	if (pag && pag->pagf_init) {
+		if (level >= pag->pagf_levels[XFS_BTNUM_RMAPi])
+			return false;
+	} else if (level >= mp->m_rmap_maxlevels)
+		return false;
+
+	return xfs_btree_sblock_verify(bp, mp->m_rmap_mxr[level != 0]);
+}
+
+static void
+xfs_rmapbt_read_verify(
+	struct xfs_buf	*bp)
+{
+	if (!xfs_btree_sblock_verify_crc(bp))
+		xfs_buf_ioerror(bp, -EFSBADCRC);
+	else if (!xfs_rmapbt_verify(bp))
+		xfs_buf_ioerror(bp, -EFSCORRUPTED);
+
+	if (bp->b_error) {
+		trace_xfs_btree_corrupt(bp, _RET_IP_);
+		xfs_verifier_error(bp);
+	}
+}
+
+static void
+xfs_rmapbt_write_verify(
+	struct xfs_buf	*bp)
+{
+	if (!xfs_rmapbt_verify(bp)) {
+		trace_xfs_btree_corrupt(bp, _RET_IP_);
+		xfs_buf_ioerror(bp, -EFSCORRUPTED);
+		xfs_verifier_error(bp);
+		return;
+	}
+	xfs_btree_sblock_calc_crc(bp);
+
+}
+
+const struct xfs_buf_ops xfs_rmapbt_buf_ops = {
+	.name			= "xfs_rmapbt",
+	.verify_read		= xfs_rmapbt_read_verify,
+	.verify_write		= xfs_rmapbt_write_verify,
+};
+
+static const struct xfs_btree_ops xfs_rmapbt_ops = {
+	.rec_len		= sizeof(struct xfs_rmap_rec),
+	.key_len		= sizeof(struct xfs_rmap_key),
+
+	.dup_cursor		= xfs_rmapbt_dup_cursor,
+	.buf_ops		= &xfs_rmapbt_buf_ops,
+};
+
+/*
+ * Allocate a new reverse mapping btree cursor.
+ */
+struct xfs_btree_cur *
+xfs_rmapbt_init_cursor(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
+	struct xfs_btree_cur	*cur;
+
+	cur = kmem_zone_zalloc(xfs_btree_cur_zone, KM_NOFS);
+	cur->bc_tp = tp;
+	cur->bc_mp = mp;
+	cur->bc_btnum = XFS_BTNUM_RMAP;
+	cur->bc_flags = XFS_BTREE_CRC_BLOCKS;
+	cur->bc_blocklog = mp->m_sb.sb_blocklog;
+	cur->bc_ops = &xfs_rmapbt_ops;
+	cur->bc_nlevels = be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]);
+
+	cur->bc_private.a.agbp = agbp;
+	cur->bc_private.a.agno = agno;
+
+	return cur;
+}
+
+/*
+ * Calculate number of records in an rmap btree block.
+ */
+int
+xfs_rmapbt_maxrecs(
+	struct xfs_mount	*mp,
+	int			blocklen,
+	int			leaf)
+{
+	blocklen -= XFS_RMAP_BLOCK_LEN;
+
+	if (leaf)
+		return blocklen / sizeof(struct xfs_rmap_rec);
+	return blocklen /
+		(sizeof(struct xfs_rmap_key) + sizeof(xfs_rmap_ptr_t));
+}
+
+/* Compute the maximum height of an rmap btree. */
+void
+xfs_rmapbt_compute_maxlevels(
+	struct xfs_mount		*mp)
+{
+	mp->m_rmap_maxlevels = xfs_btree_compute_maxlevels(mp,
+			mp->m_rmap_mnr, mp->m_sb.sb_agblocks);
+}
diff --git a/libxfs/xfs_rmap_btree.h b/libxfs/xfs_rmap_btree.h
index a3b8f90..462767f 100644
--- a/libxfs/xfs_rmap_btree.h
+++ b/libxfs/xfs_rmap_btree.h
@@ -19,6 +19,38 @@
 #define	__XFS_RMAP_BTREE_H__
 
 struct xfs_buf;
+struct xfs_btree_cur;
+struct xfs_mount;
+
+/* rmaps only exist on crc enabled filesystems */
+#define XFS_RMAP_BLOCK_LEN	XFS_BTREE_SBLOCK_CRC_LEN
+
+/*
+ * Record, key, and pointer address macros for btree blocks.
+ *
+ * (note that some of these may appear unused, but they are used in userspace)
+ */
+#define XFS_RMAP_REC_ADDR(block, index) \
+	((struct xfs_rmap_rec *) \
+		((char *)(block) + XFS_RMAP_BLOCK_LEN + \
+		 (((index) - 1) * sizeof(struct xfs_rmap_rec))))
+
+#define XFS_RMAP_KEY_ADDR(block, index) \
+	((struct xfs_rmap_key *) \
+		((char *)(block) + XFS_RMAP_BLOCK_LEN + \
+		 ((index) - 1) * sizeof(struct xfs_rmap_key)))
+
+#define XFS_RMAP_PTR_ADDR(block, index, maxrecs) \
+	((xfs_rmap_ptr_t *) \
+		((char *)(block) + XFS_RMAP_BLOCK_LEN + \
+		 (maxrecs) * sizeof(struct xfs_rmap_key) + \
+		 ((index) - 1) * sizeof(xfs_rmap_ptr_t)))
+
+struct xfs_btree_cur *xfs_rmapbt_init_cursor(struct xfs_mount *mp,
+				struct xfs_trans *tp, struct xfs_buf *bp,
+				xfs_agnumber_t agno);
+int xfs_rmapbt_maxrecs(struct xfs_mount *mp, int blocklen, int leaf);
+extern void xfs_rmapbt_compute_maxlevels(struct xfs_mount *mp);
 
 int xfs_rmap_alloc(struct xfs_trans *tp, struct xfs_buf *agbp,
 		   xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index a4ee48e..0dfa179 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -34,6 +34,7 @@
 #include "xfs_bmap_btree.h"
 #include "xfs_alloc_btree.h"
 #include "xfs_ialloc_btree.h"
+#include "xfs_rmap_btree.h"
 
 /*
  * Physical superblock buffer manipulations. Shared with libxfs in userspace.
@@ -731,6 +732,11 @@ xfs_sb_mount_common(
 	mp->m_bmap_dmnr[0] = mp->m_bmap_dmxr[0] / 2;
 	mp->m_bmap_dmnr[1] = mp->m_bmap_dmxr[1] / 2;
 
+	mp->m_rmap_mxr[0] = xfs_rmapbt_maxrecs(mp, sbp->sb_blocksize, 1);
+	mp->m_rmap_mxr[1] = xfs_rmapbt_maxrecs(mp, sbp->sb_blocksize, 0);
+	mp->m_rmap_mnr[0] = mp->m_rmap_mxr[0] / 2;
+	mp->m_rmap_mnr[1] = mp->m_rmap_mxr[1] / 2;
+
 	mp->m_bsize = XFS_FSB_TO_BB(mp, 1);
 	mp->m_ialloc_inos = (int)MAX((__uint16_t)XFS_INODES_PER_CHUNK,
 					sbp->sb_inopblock);
diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h
index 16002b5..0c5b30b 100644
--- a/libxfs/xfs_shared.h
+++ b/libxfs/xfs_shared.h
@@ -38,6 +38,7 @@ extern const struct xfs_buf_ops xfs_agi_buf_ops;
 extern const struct xfs_buf_ops xfs_agf_buf_ops;
 extern const struct xfs_buf_ops xfs_agfl_buf_ops;
 extern const struct xfs_buf_ops xfs_allocbt_buf_ops;
+extern const struct xfs_buf_ops xfs_rmapbt_buf_ops;
 extern const struct xfs_buf_ops xfs_attr3_leaf_buf_ops;
 extern const struct xfs_buf_ops xfs_attr3_rmt_buf_ops;
 extern const struct xfs_buf_ops xfs_bmbt_buf_ops;
@@ -116,6 +117,7 @@ int	xfs_log_calc_minimum_size(struct xfs_mount *);
 #define	XFS_INO_BTREE_REF	3
 #define	XFS_ALLOC_BTREE_REF	2
 #define	XFS_BMAP_BTREE_REF	2
+#define	XFS_RMAP_BTREE_REF	2
 #define	XFS_DIR_BTREE_REF	2
 #define	XFS_INO_REF		2
 #define	XFS_ATTR_BTREE_REF	1
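
As a worked example of the xfs_rmapbt_maxrecs() arithmetic above, the
sketch below plugs in concrete sizes.  The record and key sizes follow
from the structures in this patch (a 24-byte record; a 20-byte packed
key plus a 4-byte pointer).  The 56-byte header is my assumption for
XFS_BTREE_SBLOCK_CRC_LEN, which is not shown in this patch, and the
demo_* and DEMO_* names are hypothetical.

```c
/* Assumed value of XFS_BTREE_SBLOCK_CRC_LEN (short-form CRC header). */
#define DEMO_BLOCK_HDR_LEN	56
#define DEMO_REC_LEN		24	/* 4 + 4 + 8 + 8 bytes */
#define DEMO_KEY_LEN		20	/* 4 + 8 + 8 bytes, packed */
#define DEMO_PTR_LEN		4	/* one __be32 pointer */

/* Same calculation as xfs_rmapbt_maxrecs(), with fixed sizes. */
static inline int
demo_rmapbt_maxrecs(int blocklen, int leaf)
{
	blocklen -= DEMO_BLOCK_HDR_LEN;

	if (leaf)
		return blocklen / DEMO_REC_LEN;
	return blocklen / (DEMO_KEY_LEN + DEMO_PTR_LEN);
}
```

Under these assumptions a 4096-byte block leaves 4040 bytes after the
header, which is 168 records in a leaf; a node also holds 168 key and
pointer pairs, since each pair comes to 24 bytes as well.  The minimum
record counts set in xfs_sb_mount_common() are half of that, so 84.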


* [PATCH 034/145] xfs: rmap btree transaction reservations
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (32 preceding siblings ...)
  2016-06-17  1:34 ` [PATCH 033/145] xfs: define the on-disk rmap btree format Darrick J. Wong
@ 2016-06-17  1:34 ` Darrick J. Wong
  2016-06-17  1:34 ` [PATCH 035/145] xfs: rmap btree requires more reserved free space Darrick J. Wong
                   ` (110 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:34 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

The rmap btrees will use the AGFL as the block allocation source, so
we need to ensure that the transaction reservations reflect the fact
that this tree is modified by allocation and freeing. Hence we need to
extend all the extent allocation/free reservations used in
transactions to handle this.

Note that this also gets rid of the unused XFS_ALLOCFREE_LOG_RES
macro, as we now do buffer reservations based on the number of
buffers logged via xfs_calc_buf_res(). Hence we only need the buffer
count calculation now.
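
As a rough illustration of the buffer count arithmetic, the standalone
sketch below mirrors the formula used by the new helper. The struct
resv_geom and its field names are stand-ins invented for this example,
not the real struct xfs_mount; only the arithmetic is taken from the
patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few mount fields the formula needs. */
struct resv_geom {
	unsigned int	ag_maxlevels;	/* max height of bnobt/cntbt */
	unsigned int	rmap_maxlevels;	/* max height of rmapbt */
	bool		has_rmapbt;	/* rmap feature enabled? */
};

/*
 * Buffers logged for num_ops extent allocations/frees:
 * num trees * ((2 blocks/level * max depth) - 1), with the depth
 * taken separately for each type of tree.
 */
static unsigned int
allocfree_log_count(const struct resv_geom *g, unsigned int num_ops)
{
	unsigned int	blocks;

	/* bnobt + cntbt share the same maximum depth */
	blocks = num_ops * 2 * (2 * g->ag_maxlevels - 1);
	/* the rmapbt is only counted when the feature is enabled */
	if (g->has_rmapbt)
		blocks += num_ops * (2 * g->rmap_maxlevels - 1);
	return blocks;
}
```

For example, with ag_maxlevels = 5 and rmap_maxlevels = 9, a single
operation counts 18 buffers without rmap and 35 with it.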

[darrick: use rmap_maxlevels when calculating log block resv]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_trans_resv.c |   58 +++++++++++++++++++++++++++++++++--------------
 libxfs/xfs_trans_resv.h |   10 --------
 2 files changed, 41 insertions(+), 27 deletions(-)


diff --git a/libxfs/xfs_trans_resv.c b/libxfs/xfs_trans_resv.c
index c55220f..2ed80a5 100644
--- a/libxfs/xfs_trans_resv.c
+++ b/libxfs/xfs_trans_resv.c
@@ -63,6 +63,30 @@ xfs_calc_buf_res(
 }
 
 /*
+ * Per-extent log reservation for the btree changes involved in freeing or
+ * allocating an extent.  In classic XFS there are two trees that will be
+ * modified (bnobt + cntbt).  With rmap enabled, there are three trees
+ * (bnobt + cntbt + rmapbt).  The number of blocks reserved is based on:
+ *
+ * num trees * ((2 blocks/level * max depth) - 1)
+ *
+ * Keep in mind that max depth is calculated separately for each type of tree.
+ */
+static uint
+xfs_allocfree_log_count(
+	struct xfs_mount *mp,
+	uint		num_ops)
+{
+	uint		blocks;
+
+	blocks = num_ops * 2 * (2 * mp->m_ag_maxlevels - 1);
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+		blocks += num_ops * (2 * mp->m_rmap_maxlevels - 1);
+
+	return blocks;
+}
+
+/*
  * Logging inodes is really tricksy. They are logged in memory format,
  * which means that what we write into the log doesn't directly translate into
  * the amount of space they use on disk.
@@ -125,7 +149,7 @@ xfs_calc_inode_res(
  */
 STATIC uint
 xfs_calc_finobt_res(
-	struct xfs_mount 	*mp,
+	struct xfs_mount	*mp,
 	int			alloc,
 	int			modify)
 {
@@ -136,7 +160,7 @@ xfs_calc_finobt_res(
 
 	res = xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1));
 	if (alloc)
-		res += xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		res += xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 					XFS_FSB_TO_B(mp, 1));
 	if (modify)
 		res += (uint)XFS_FSB_TO_B(mp, 1);
@@ -187,10 +211,10 @@ xfs_calc_write_reservation(
 		     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK),
 				      XFS_FSB_TO_B(mp, 1)) +
 		     xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
 				      XFS_FSB_TO_B(mp, 1))));
 }
 
@@ -216,10 +240,10 @@ xfs_calc_itruncate_reservation(
 		     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) + 1,
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 4),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 4),
 				      XFS_FSB_TO_B(mp, 1)) +
 		    xfs_calc_buf_res(5, 0) +
-		    xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		    xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				     XFS_FSB_TO_B(mp, 1)) +
 		    xfs_calc_buf_res(2 + mp->m_ialloc_blks +
 				     mp->m_in_maxlevels, 0)));
@@ -246,7 +270,7 @@ xfs_calc_rename_reservation(
 		     xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 3),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 3),
 				      XFS_FSB_TO_B(mp, 1))));
 }
 
@@ -285,7 +309,7 @@ xfs_calc_link_reservation(
 		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				      XFS_FSB_TO_B(mp, 1))));
 }
 
@@ -323,7 +347,7 @@ xfs_calc_remove_reservation(
 		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
 				      XFS_FSB_TO_B(mp, 1))));
 }
 
@@ -370,7 +394,7 @@ xfs_calc_create_resv_alloc(
 		mp->m_sb.sb_sectsize +
 		xfs_calc_buf_res(mp->m_ialloc_blks, XFS_FSB_TO_B(mp, 1)) +
 		xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
-		xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				 XFS_FSB_TO_B(mp, 1));
 }
 
@@ -398,7 +422,7 @@ xfs_calc_icreate_resv_alloc(
 	return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
 		mp->m_sb.sb_sectsize +
 		xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
-		xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				 XFS_FSB_TO_B(mp, 1)) +
 		xfs_calc_finobt_res(mp, 0, 0);
 }
@@ -482,7 +506,7 @@ xfs_calc_ifree_reservation(
 		xfs_calc_buf_res(1, 0) +
 		xfs_calc_buf_res(2 + mp->m_ialloc_blks +
 				 mp->m_in_maxlevels, 0) +
-		xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				 XFS_FSB_TO_B(mp, 1)) +
 		xfs_calc_finobt_res(mp, 0, 1);
 }
@@ -512,7 +536,7 @@ xfs_calc_growdata_reservation(
 	struct xfs_mount	*mp)
 {
 	return xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-		xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				 XFS_FSB_TO_B(mp, 1));
 }
 
@@ -534,7 +558,7 @@ xfs_calc_growrtalloc_reservation(
 		xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK),
 				 XFS_FSB_TO_B(mp, 1)) +
 		xfs_calc_inode_res(mp, 1) +
-		xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				 XFS_FSB_TO_B(mp, 1));
 }
 
@@ -610,7 +634,7 @@ xfs_calc_addafork_reservation(
 		xfs_calc_buf_res(1, mp->m_dir_geo->blksize) +
 		xfs_calc_buf_res(XFS_DAENTER_BMAP1B(mp, XFS_DATA_FORK) + 1,
 				 XFS_FSB_TO_B(mp, 1)) +
-		xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				 XFS_FSB_TO_B(mp, 1));
 }
 
@@ -633,7 +657,7 @@ xfs_calc_attrinval_reservation(
 		    xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK),
 				     XFS_FSB_TO_B(mp, 1))),
 		   (xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
-		    xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 4),
+		    xfs_calc_buf_res(xfs_allocfree_log_count(mp, 4),
 				     XFS_FSB_TO_B(mp, 1))));
 }
 
@@ -700,7 +724,7 @@ xfs_calc_attrrm_reservation(
 					XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK)) +
 		     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), 0)),
 		    (xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
 				      XFS_FSB_TO_B(mp, 1))));
 }
 
diff --git a/libxfs/xfs_trans_resv.h b/libxfs/xfs_trans_resv.h
index 7978150..0eb46ed 100644
--- a/libxfs/xfs_trans_resv.h
+++ b/libxfs/xfs_trans_resv.h
@@ -68,16 +68,6 @@ struct xfs_trans_resv {
 #define M_RES(mp)	(&(mp)->m_resv)
 
 /*
- * Per-extent log reservation for the allocation btree changes
- * involved in freeing or allocating an extent.
- * 2 trees * (2 blocks/level * max depth - 1) * block size
- */
-#define	XFS_ALLOCFREE_LOG_RES(mp,nx) \
-	((nx) * (2 * XFS_FSB_TO_B((mp), 2 * (mp)->m_ag_maxlevels - 1)))
-#define	XFS_ALLOCFREE_LOG_COUNT(mp,nx) \
-	((nx) * (2 * (2 * (mp)->m_ag_maxlevels - 1)))
-
-/*
  * Per-directory log reservation for any directory change.
  * dir blocks: (1 btree block per level + data block + free block) * dblock size
  * bmap btree: (levels + 2) * max depth * block size


* [PATCH 035/145] xfs: rmap btree requires more reserved free space
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (33 preceding siblings ...)
  2016-06-17  1:34 ` [PATCH 034/145] xfs: rmap btree transaction reservations Darrick J. Wong
@ 2016-06-17  1:34 ` Darrick J. Wong
  2016-06-17  1:34 ` [PATCH 036/145] xfs: add rmap btree operations Darrick J. Wong
                   ` (109 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:34 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

From: Dave Chinner <dchinner@redhat.com>

The rmap btree is allocated from the AGFL, which means we have to
ensure ENOSPC is reported to userspace before we run out of free
space in each AG. The last allocation in an AG can cause a full
height rmap btree split, and that means we have to reserve at least
this many blocks *in each AG* to be placed on the AGFL at ENOSPC.
Update the various space calculation functions to handle this.

Also, because the macros are now executing conditional code and are
called quite frequently, convert them to functions that initialise
variables in the struct xfs_mount, use the new variables everywhere and
document the calculations better.

v2: If rmapbt is disabled, it is incorrect to require 1 extra AGFL block
for the rmapbt (due to the + 1); the entire clause needs to be gated
on the feature flag.
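
The per-AG reservation described above can be sketched as follows.
This is a simplified model under stated assumptions: struct aside_geom
is a made-up stand-in for the mount structure, and the sketch applies
the full-height-split term (2 * maxlevels - 1) to every AG, which is
one reading of the description. Check it against the expression in the
patch itself before relying on it.

```c
#include <stdbool.h>

#define AGFL_RESERVE	4	/* same value as XFS_ALLOC_AGFL_RESERVE */

/* Hypothetical stand-in for the superblock/mount fields used here. */
struct aside_geom {
	unsigned int	agcount;	/* number of allocation groups */
	unsigned int	rmap_maxlevels;	/* max height of the rmapbt */
	bool		has_rmapbt;	/* rmap feature enabled? */
};

static unsigned int
alloc_set_aside(const struct aside_geom *g)
{
	unsigned int	per_ag = AGFL_RESERVE;

	/* Only reserve for a full-height rmapbt split if rmap is on. */
	if (g->has_rmapbt)
		per_ag += 2 * g->rmap_maxlevels - 1;

	/* The leading 4 is carried over from XFS_ALLOC_SET_ASIDE. */
	return 4 + g->agcount * per_ag;
}
```

With 4 AGs this gives 20 blocks without rmap, and 88 blocks with rmap
and rmap_maxlevels = 9.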

v3: Use m_rmap_maxlevels to determine min_free.
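
A minimal sketch of the min_free idea, assuming each btree contributes
its current height plus one, capped at that tree's maximum height. The
function name and the flattened argument list are hypothetical. The
by-block term is assumed to mirror the by-size term visible in the
diff context.

```c
#include <stdbool.h>

static unsigned int
min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Hypothetical flattened form of the per-AG min freelist calculation. */
static unsigned int
min_freelist(unsigned int bno_levels, unsigned int cnt_levels,
	     unsigned int rmap_levels, unsigned int ag_maxlevels,
	     unsigned int rmap_maxlevels, bool has_rmapbt)
{
	unsigned int	min_free;

	/* by-block and by-size freespace btrees */
	min_free = min_uint(bno_levels + 1, ag_maxlevels);
	min_free += min_uint(cnt_levels + 1, ag_maxlevels);
	/* the rmapbt is capped by its own maximum, not ag_maxlevels */
	if (has_rmapbt)
		min_free += min_uint(rmap_levels + 1, rmap_maxlevels);
	return min_free;
}
```

The point of the v3 change is visible in the last term: the rmapbt can
grow taller than the freespace btrees, so it needs its own cap.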

[darrick.wong@oracle.com: don't reserve blocks if !rmap]
[dchinner@redhat.com: update m_ag_max_usable after growfs]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 include/xfs_mount.h |    2 +
 libxfs/xfs_alloc.c  |   71 +++++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_alloc.h  |   41 ++++-------------------------
 libxfs/xfs_bmap.c   |    2 +
 libxfs/xfs_sb.c     |    2 +
 mkfs/xfs_mkfs.c     |    2 +
 6 files changed, 82 insertions(+), 38 deletions(-)


diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 7d63c93..5cd9464 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -71,6 +71,8 @@ typedef struct xfs_mount {
 	uint			m_in_maxlevels;	/* XFS_IN_MAXLEVELS */
 	uint			m_rmap_maxlevels; /* max rmap btree levels */
 	xfs_extlen_t		m_ag_prealloc_blocks; /* reserved ag blocks */
+	uint			m_alloc_set_aside; /* space we can't use */
+	uint			m_ag_max_usable; /* max space per AG */
 	struct radix_tree_root	m_perag_tree;
 	uint			m_flags;	/* global mount flags */
 	uint			m_qflags;	/* quota status flags */
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 0f35e96..7d680da 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -59,6 +59,72 @@ xfs_prealloc_blocks(
 }
 
 /*
+ * In order to avoid ENOSPC-related deadlock caused by out-of-order locking of
+ * AGF buffer (PV 947395), we place constraints on the relationship among
+ * actual allocations for data blocks, freelist blocks, and potential file data
+ * bmap btree blocks. However, these restrictions may result in no actual space
+ * allocated for a delayed extent, for example, a data block in a certain AG is
+ * allocated but there is no additional block for the additional bmap btree
+ * block due to a split of the bmap btree of the file. The result of this may
+ * lead to an infinite loop when the file gets flushed to disk and all delayed
+ * extents need to be actually allocated. To get around this, we explicitly set
+ * aside a few blocks which will not be reserved in delayed allocation.
+ *
+ * The minimum number of needed freelist blocks is 4 fsbs _per AG_ when we are
+ * not using rmap btrees, and a potential split of a file's bmap btree requires
+ * 1 fsb, so we set the number of set-aside blocks to 4 + 4*agcount when not
+ * using rmap btrees.
+ *
+ * When rmap btrees are active, we have to consider that using the last block
+ * in the AG can cause a full height rmap btree split and we need enough blocks
+ * on the AGFL to be able to handle this. That means we have, in addition to
+ * the above consideration, another (2 * mp->m_rmap_maxlevels) - 1 blocks
+ * required to be available to the free list.
+ */
+unsigned int
+xfs_alloc_set_aside(
+	struct xfs_mount *mp)
+{
+	unsigned int	blocks;
+
+	blocks = 4 + (mp->m_sb.sb_agcount * XFS_ALLOC_AGFL_RESERVE);
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return blocks;
+	return blocks + (mp->m_sb.sb_agcount * (2 * mp->m_rmap_maxlevels) - 1);
+}
+
+/*
+ * When deciding how much space to allocate out of an AG, we limit the
+ * allocation maximum size to the size of the AG. However, we cannot use all
+ * the blocks in the AG - some are permanently used by metadata. These
+ * blocks are generally:
+ *	- the AG superblock, AGF, AGI and AGFL
+ *	- the AGF (bno and cnt) and AGI btree root blocks, and optionally
+ *	  the AGI free inode and rmap btree root blocks.
+ *	- blocks on the AGFL according to xfs_alloc_set_aside() limits
+ *
+ * The AG headers are sector sized, so the amount of space they take up is
+ * dependent on filesystem geometry. The others are all single blocks.
+ */
+unsigned int
+xfs_alloc_ag_max_usable(struct xfs_mount *mp)
+{
+	unsigned int	blocks;
+
+	blocks = XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)); /* ag headers */
+	blocks += XFS_ALLOC_AGFL_RESERVE;
+	blocks += 3;			/* AGF, AGI btree root blocks */
+	if (xfs_sb_version_hasfinobt(&mp->m_sb))
+		blocks++;		/* finobt root block */
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		/* rmap root block + full tree split on full AG */
+		blocks += 1 + (2 * mp->m_ag_maxlevels) - 1;
+	}
+
+	return mp->m_sb.sb_agblocks - blocks;
+}
+
+/*
  * Lookup the record equal to [bno, len] in the btree given by cur.
  */
 STATIC int				/* error */
@@ -1900,6 +1966,11 @@ xfs_alloc_min_freelist(
 	/* space needed by-size freespace btree */
 	min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_CNTi] + 1,
 				       mp->m_ag_maxlevels);
+	/* space needed by the reverse mapping btree */
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+		min_free += min_t(unsigned int,
+				  pag->pagf_levels[XFS_BTNUM_RMAPi] + 1,
+				  mp->m_rmap_maxlevels);
 
 	return min_free;
 }
diff --git a/libxfs/xfs_alloc.h b/libxfs/xfs_alloc.h
index 0721a48..7b6c66b 100644
--- a/libxfs/xfs_alloc.h
+++ b/libxfs/xfs_alloc.h
@@ -56,42 +56,6 @@ typedef unsigned int xfs_alloctype_t;
 #define	XFS_ALLOC_FLAG_FREEING	0x00000002  /* indicate caller is freeing extents*/
 
 /*
- * In order to avoid ENOSPC-related deadlock caused by
- * out-of-order locking of AGF buffer (PV 947395), we place
- * constraints on the relationship among actual allocations for
- * data blocks, freelist blocks, and potential file data bmap
- * btree blocks. However, these restrictions may result in no
- * actual space allocated for a delayed extent, for example, a data
- * block in a certain AG is allocated but there is no additional
- * block for the additional bmap btree block due to a split of the
- * bmap btree of the file. The result of this may lead to an
- * infinite loop in xfssyncd when the file gets flushed to disk and
- * all delayed extents need to be actually allocated. To get around
- * this, we explicitly set aside a few blocks which will not be
- * reserved in delayed allocation. Considering the minimum number of
- * needed freelist blocks is 4 fsbs _per AG_, a potential split of file's bmap
- * btree requires 1 fsb, so we set the number of set-aside blocks
- * to 4 + 4*agcount.
- */
-#define XFS_ALLOC_SET_ASIDE(mp)  (4 + ((mp)->m_sb.sb_agcount * 4))
-
-/*
- * When deciding how much space to allocate out of an AG, we limit the
- * allocation maximum size to the size the AG. However, we cannot use all the
- * blocks in the AG - some are permanently used by metadata. These
- * blocks are generally:
- *	- the AG superblock, AGF, AGI and AGFL
- *	- the AGF (bno and cnt) and AGI btree root blocks
- *	- 4 blocks on the AGFL according to XFS_ALLOC_SET_ASIDE() limits
- *
- * The AG headers are sector sized, so the amount of space they take up is
- * dependent on filesystem geometry. The others are all single blocks.
- */
-#define XFS_ALLOC_AG_MAX_USABLE(mp)	\
-	((mp)->m_sb.sb_agblocks - XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)) - 7)
-
-
-/*
  * Argument structure for xfs_alloc routines.
  * This is turned into a structure to avoid having 20 arguments passed
  * down several levels of the stack.
@@ -133,6 +97,11 @@ typedef struct xfs_alloc_arg {
 #define XFS_ALLOC_INITIAL_USER_DATA	(1 << 1)/* special case start of file */
 #define XFS_ALLOC_USERDATA_ZERO		(1 << 2)/* zero extent on allocation */
 
+/* freespace limit calculations */
+#define XFS_ALLOC_AGFL_RESERVE	4
+unsigned int xfs_alloc_set_aside(struct xfs_mount *mp);
+unsigned int xfs_alloc_ag_max_usable(struct xfs_mount *mp);
+
 xfs_extlen_t xfs_alloc_longest_free_extent(struct xfs_mount *mp,
 		struct xfs_perag *pag, xfs_extlen_t need);
 unsigned int xfs_alloc_min_freelist(struct xfs_mount *mp,
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index de1c759..453d073 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -3664,7 +3664,7 @@ xfs_bmap_btalloc(
 	args.fsbno = ap->blkno;
 
 	/* Trim the allocation back to the maximum an AG can fit. */
-	args.maxlen = MIN(ap->length, XFS_ALLOC_AG_MAX_USABLE(mp));
+	args.maxlen = MIN(ap->length, mp->m_ag_max_usable);
 	args.firstblock = *ap->firstblock;
 	blen = 0;
 	if (nullfb) {
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 0dfa179..26c29ea 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -746,6 +746,8 @@ xfs_sb_mount_common(
 		mp->m_ialloc_min_blks = sbp->sb_spino_align;
 	else
 		mp->m_ialloc_min_blks = mp->m_ialloc_blks;
+	mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
+	mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp);
 }
 
 /*
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index e9d2851..4b5df98 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2977,7 +2977,7 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 		 */
 		if (!logsize) {
 			logblocks = MIN(logblocks,
-					XFS_ALLOC_AG_MAX_USABLE(mp));
+					xfs_alloc_ag_max_usable(mp));
 
 			/* revalidate the log size is valid if we changed it */
 			validate_log_size(logblocks, blocklog, min_logblocks);


* [PATCH 036/145] xfs: add rmap btree operations
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (34 preceding siblings ...)
  2016-06-17  1:34 ` [PATCH 035/145] xfs: rmap btree requires more reserved free space Darrick J. Wong
@ 2016-06-17  1:34 ` Darrick J. Wong
  2016-06-17  1:34 ` [PATCH 037/145] xfs: support overlapping intervals in the rmap btree Darrick J. Wong
                   ` (108 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:34 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

From: Dave Chinner <dchinner@redhat.com>

Implement the generic btree operations needed to manipulate rmap
btree blocks. This is very similar to the per-ag freespace btree
implementation, and uses the AGFL for allocation and freeing of
blocks.

Adapt the rmap btree to store owner offsets within each rmap record,
and to handle the primary key being redefined as the tuple
[agblk, owner, offset].  The expansion of the primary key is crucial
to allowing multiple owners per extent.
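
To make the key ordering concrete, here is a small sketch of a
three-way comparison over the [agblk, owner, offset] tuple. The struct
rmap_key_sketch and both helpers are illustrative names only. The
sketch works on host-endian values and ignores the flag bits that the
real code masks out of rm_offset with XFS_RMAP_OFF().

```c
#include <stdint.h>

/* Hypothetical host-endian view of an rmap key. */
struct rmap_key_sketch {
	uint32_t	startblock;	/* AG block number */
	uint64_t	owner;		/* inode or metadata owner */
	uint64_t	offset;		/* file offset, flags stripped */
};

/* Three-way compare that cannot overflow, unlike subtraction. */
static int
cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/* Returns <0, 0 or >0 as a sorts before, equal to, or after b. */
static int
rmap_key_cmp(const struct rmap_key_sketch *a,
	     const struct rmap_key_sketch *b)
{
	int	d;

	d = cmp_u64(a->startblock, b->startblock);
	if (d)
		return d;
	d = cmp_u64(a->owner, b->owner);
	if (d)
		return d;
	return cmp_u64(a->offset, b->offset);
}
```

Two records for the same AG block but different owners compare as
distinct keys, which is what lets a shared extent carry one record per
owner.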

[darrick: adapt the btree ops to deal with offsets]
[darrick: remove init_rec_from_key]
[darrick: move unwritten bit to rm_offset]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 include/xfs_trace.h     |    2 
 libxfs/xfs_btree.h      |    1 
 libxfs/xfs_rmap.c       |   96 +++++++++++++++++++
 libxfs/xfs_rmap_btree.c |  243 +++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rmap_btree.h |    9 ++
 5 files changed, 351 insertions(+)


diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index b4174a3..af3f68b 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -192,6 +192,8 @@
 #define trace_xfs_rmap_alloc_extent_error(...)	((void) 0)
 #define trace_xfs_rmap_alloc_extent_done(...)	((void) 0)
 #define trace_xfs_rmap_free_extent_done(...)	((void) 0)
+#define trace_xfs_rmapbt_free_block(...)	((void) 0)
+#define trace_xfs_rmapbt_alloc_block(...)	((void) 0)
 
 /* set c = c to avoid unused var warnings */
 #define trace_xfs_perag_get(a,b,c,d)	((c) = (c))
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index 90ea2a7..9963c48 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -216,6 +216,7 @@ union xfs_btree_irec {
 	xfs_alloc_rec_incore_t		a;
 	xfs_bmbt_irec_t			b;
 	xfs_inobt_rec_incore_t		i;
+	struct xfs_rmap_irec		r;
 };
 
 /*
diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index 31c6336..5034fd3 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -35,6 +35,102 @@
 #include "xfs_trans_space.h"
 #include "xfs_trace.h"
 
+/*
+ * Lookup the first record less than or equal to [bno, len, owner, offset]
+ * in the btree given by cur.
+ */
+int
+xfs_rmap_lookup_le(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	uint64_t		owner,
+	uint64_t		offset,
+	unsigned int		flags,
+	int			*stat)
+{
+	cur->bc_rec.r.rm_startblock = bno;
+	cur->bc_rec.r.rm_blockcount = len;
+	cur->bc_rec.r.rm_owner = owner;
+	cur->bc_rec.r.rm_offset = offset;
+	cur->bc_rec.r.rm_flags = flags;
+	return xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
+}
+
+/*
+ * Lookup the record exactly matching [bno, len, owner, offset]
+ * in the btree given by cur.
+ */
+int
+xfs_rmap_lookup_eq(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	uint64_t		owner,
+	uint64_t		offset,
+	unsigned int		flags,
+	int			*stat)
+{
+	cur->bc_rec.r.rm_startblock = bno;
+	cur->bc_rec.r.rm_blockcount = len;
+	cur->bc_rec.r.rm_owner = owner;
+	cur->bc_rec.r.rm_offset = offset;
+	cur->bc_rec.r.rm_flags = flags;
+	return xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat);
+}
+
+/*
+ * Update the record referred to by cur to the value given
+ * by [bno, len, owner, offset].
+ * This either works (return 0) or gets an EFSCORRUPTED error.
+ */
+STATIC int
+xfs_rmap_update(
+	struct xfs_btree_cur	*cur,
+	struct xfs_rmap_irec	*irec)
+{
+	union xfs_btree_rec	rec;
+
+	rec.rmap.rm_startblock = cpu_to_be32(irec->rm_startblock);
+	rec.rmap.rm_blockcount = cpu_to_be32(irec->rm_blockcount);
+	rec.rmap.rm_owner = cpu_to_be64(irec->rm_owner);
+	rec.rmap.rm_offset = cpu_to_be64(
+			xfs_rmap_irec_offset_pack(irec));
+	return xfs_btree_update(cur, &rec);
+}
+
+static int
+xfs_rmapbt_btrec_to_irec(
+	union xfs_btree_rec	*rec,
+	struct xfs_rmap_irec	*irec)
+{
+	irec->rm_flags = 0;
+	irec->rm_startblock = be32_to_cpu(rec->rmap.rm_startblock);
+	irec->rm_blockcount = be32_to_cpu(rec->rmap.rm_blockcount);
+	irec->rm_owner = be64_to_cpu(rec->rmap.rm_owner);
+	return xfs_rmap_irec_offset_unpack(be64_to_cpu(rec->rmap.rm_offset),
+			irec);
+}
+
+/*
+ * Get the data from the pointed-to record.
+ */
+int
+xfs_rmap_get_rec(
+	struct xfs_btree_cur	*cur,
+	struct xfs_rmap_irec	*irec,
+	int			*stat)
+{
+	union xfs_btree_rec	*rec;
+	int			error;
+
+	error = xfs_btree_get_rec(cur, &rec, stat);
+	if (error || !*stat)
+		return error;
+
+	return xfs_rmapbt_btrec_to_irec(rec, irec);
+}
+
 int
 xfs_rmap_free(
 	struct xfs_trans	*tp,
diff --git a/libxfs/xfs_rmap_btree.c b/libxfs/xfs_rmap_btree.c
index 02636f6..c8fd99e 100644
--- a/libxfs/xfs_rmap_btree.c
+++ b/libxfs/xfs_rmap_btree.c
@@ -33,6 +33,31 @@
 #include "xfs_trace.h"
 #include "xfs_cksum.h"
 
+/*
+ * Reverse map btree.
+ *
+ * This is a per-ag tree used to track the owner(s) of a given extent. With
+ * reflink it is possible for there to be multiple owners, which is a departure
+ * from classic XFS. Owner records for data extents are inserted when the
+ * extent is mapped and removed when an extent is unmapped.  Owner records for
+ * all other block types (i.e. metadata) are inserted when an extent is
+ * allocated and removed when an extent is freed. There can only be one owner
+ * of a metadata extent, usually an inode or some other metadata structure like
+ * an AG btree.
+ *
+ * The rmap btree is part of the free space management, so blocks for the tree
+ * are sourced from the agfl. Hence we need transaction reservation support for
+ * this tree so that the freelist is always large enough. This also impacts on
+ * the minimum space we need to leave free in the AG.
+ *
+ * The tree is ordered by [ag block, owner, offset]. This is a large key size,
+ * but it is the only way to enforce unique keys when a block can be owned by
+ * multiple files at any offset. There's no need to order/search by extent
+ * size for online updating/management of the tree. It is intended that most
+ * reverse lookups will be to find the owner(s) of a particular block, or to
+ * try to recover tree and file data from corrupt primary metadata.
+ */
+
 static struct xfs_btree_cur *
 xfs_rmapbt_dup_cursor(
 	struct xfs_btree_cur	*cur)
@@ -41,6 +66,173 @@ xfs_rmapbt_dup_cursor(
 			cur->bc_private.a.agbp, cur->bc_private.a.agno);
 }
 
+STATIC void
+xfs_rmapbt_set_root(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr,
+	int			inc)
+{
+	struct xfs_buf		*agbp = cur->bc_private.a.agbp;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
+	xfs_agnumber_t		seqno = be32_to_cpu(agf->agf_seqno);
+	int			btnum = cur->bc_btnum;
+	struct xfs_perag	*pag = xfs_perag_get(cur->bc_mp, seqno);
+
+	ASSERT(ptr->s != 0);
+
+	agf->agf_roots[btnum] = ptr->s;
+	be32_add_cpu(&agf->agf_levels[btnum], inc);
+	pag->pagf_levels[btnum] += inc;
+	xfs_perag_put(pag);
+
+	xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_ROOTS | XFS_AGF_LEVELS);
+}
+
+STATIC int
+xfs_rmapbt_alloc_block(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*start,
+	union xfs_btree_ptr	*new,
+	int			*stat)
+{
+	int			error;
+	xfs_agblock_t		bno;
+
+	XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY);
+
+	/* Allocate the new block from the freelist. If we can't, give up.  */
+	error = xfs_alloc_get_freelist(cur->bc_tp, cur->bc_private.a.agbp,
+				       &bno, 1);
+	if (error) {
+		XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR);
+		return error;
+	}
+
+	trace_xfs_rmapbt_alloc_block(cur->bc_mp, cur->bc_private.a.agno,
+			bno, 1);
+	if (bno == NULLAGBLOCK) {
+		XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
+		*stat = 0;
+		return 0;
+	}
+
+	xfs_extent_busy_reuse(cur->bc_mp, cur->bc_private.a.agno, bno, 1,
+			false);
+
+	xfs_trans_agbtree_delta(cur->bc_tp, 1);
+	new->s = cpu_to_be32(bno);
+
+	XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
+	*stat = 1;
+	return 0;
+}
+
+STATIC int
+xfs_rmapbt_free_block(
+	struct xfs_btree_cur	*cur,
+	struct xfs_buf		*bp)
+{
+	struct xfs_buf		*agbp = cur->bc_private.a.agbp;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
+	xfs_agblock_t		bno;
+	int			error;
+
+	bno = xfs_daddr_to_agbno(cur->bc_mp, XFS_BUF_ADDR(bp));
+	trace_xfs_rmapbt_free_block(cur->bc_mp, cur->bc_private.a.agno,
+			bno, 1);
+	error = xfs_alloc_put_freelist(cur->bc_tp, agbp, NULL, bno, 1);
+	if (error)
+		return error;
+
+	xfs_extent_busy_insert(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1,
+			      XFS_EXTENT_BUSY_SKIP_DISCARD);
+	xfs_trans_agbtree_delta(cur->bc_tp, -1);
+
+	xfs_trans_binval(cur->bc_tp, bp);
+	return 0;
+}
+
+STATIC int
+xfs_rmapbt_get_minrecs(
+	struct xfs_btree_cur	*cur,
+	int			level)
+{
+	return cur->bc_mp->m_rmap_mnr[level != 0];
+}
+
+STATIC int
+xfs_rmapbt_get_maxrecs(
+	struct xfs_btree_cur	*cur,
+	int			level)
+{
+	return cur->bc_mp->m_rmap_mxr[level != 0];
+}
+
+STATIC void
+xfs_rmapbt_init_key_from_rec(
+	union xfs_btree_key	*key,
+	union xfs_btree_rec	*rec)
+{
+	key->rmap.rm_startblock = rec->rmap.rm_startblock;
+	key->rmap.rm_owner = rec->rmap.rm_owner;
+	key->rmap.rm_offset = rec->rmap.rm_offset;
+}
+
+STATIC void
+xfs_rmapbt_init_rec_from_cur(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_rec	*rec)
+{
+	rec->rmap.rm_startblock = cpu_to_be32(cur->bc_rec.r.rm_startblock);
+	rec->rmap.rm_blockcount = cpu_to_be32(cur->bc_rec.r.rm_blockcount);
+	rec->rmap.rm_owner = cpu_to_be64(cur->bc_rec.r.rm_owner);
+	rec->rmap.rm_offset = cpu_to_be64(
+			xfs_rmap_irec_offset_pack(&cur->bc_rec.r));
+}
+
+STATIC void
+xfs_rmapbt_init_ptr_from_cur(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr)
+{
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(cur->bc_private.a.agbp);
+
+	ASSERT(cur->bc_private.a.agno == be32_to_cpu(agf->agf_seqno));
+	ASSERT(agf->agf_roots[cur->bc_btnum] != 0);
+
+	ptr->s = agf->agf_roots[cur->bc_btnum];
+}
+
+STATIC __int64_t
+xfs_rmapbt_key_diff(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_key	*key)
+{
+	struct xfs_rmap_irec	*rec = &cur->bc_rec.r;
+	struct xfs_rmap_key	*kp = &key->rmap;
+	__u64			x, y;
+	__int64_t		d;
+
+	d = (__int64_t)be32_to_cpu(kp->rm_startblock) - rec->rm_startblock;
+	if (d)
+		return d;
+
+	x = be64_to_cpu(kp->rm_owner);
+	y = rec->rm_owner;
+	if (x > y)
+		return 1;
+	else if (y > x)
+		return -1;
+
+	x = XFS_RMAP_OFF(be64_to_cpu(kp->rm_offset));
+	y = rec->rm_offset;
+	if (x > y)
+		return 1;
+	else if (y > x)
+		return -1;
+	return 0;
+}
+
 static bool
 xfs_rmapbt_verify(
 	struct xfs_buf		*bp)
@@ -115,12 +307,63 @@ const struct xfs_buf_ops xfs_rmapbt_buf_ops = {
 	.verify_write		= xfs_rmapbt_write_verify,
 };
 
+#if defined(DEBUG) || defined(XFS_WARN)
+STATIC int
+xfs_rmapbt_keys_inorder(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_key	*k1,
+	union xfs_btree_key	*k2)
+{
+	if (be32_to_cpu(k1->rmap.rm_startblock) <
+	    be32_to_cpu(k2->rmap.rm_startblock))
+		return 1;
+	if (be64_to_cpu(k1->rmap.rm_owner) <
+	    be64_to_cpu(k2->rmap.rm_owner))
+		return 1;
+	if (XFS_RMAP_OFF(be64_to_cpu(k1->rmap.rm_offset)) <=
+	    XFS_RMAP_OFF(be64_to_cpu(k2->rmap.rm_offset)))
+		return 1;
+	return 0;
+}
+
+STATIC int
+xfs_rmapbt_recs_inorder(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_rec	*r1,
+	union xfs_btree_rec	*r2)
+{
+	if (be32_to_cpu(r1->rmap.rm_startblock) <
+	    be32_to_cpu(r2->rmap.rm_startblock))
+		return 1;
+	if (XFS_RMAP_OFF(be64_to_cpu(r1->rmap.rm_offset)) <
+	    XFS_RMAP_OFF(be64_to_cpu(r2->rmap.rm_offset)))
+		return 1;
+	if (be64_to_cpu(r1->rmap.rm_owner) <=
+	    be64_to_cpu(r2->rmap.rm_owner))
+		return 1;
+	return 0;
+}
+#endif	/* DEBUG */
+
 static const struct xfs_btree_ops xfs_rmapbt_ops = {
 	.rec_len		= sizeof(struct xfs_rmap_rec),
 	.key_len		= sizeof(struct xfs_rmap_key),
 
 	.dup_cursor		= xfs_rmapbt_dup_cursor,
+	.set_root		= xfs_rmapbt_set_root,
+	.alloc_block		= xfs_rmapbt_alloc_block,
+	.free_block		= xfs_rmapbt_free_block,
+	.get_minrecs		= xfs_rmapbt_get_minrecs,
+	.get_maxrecs		= xfs_rmapbt_get_maxrecs,
+	.init_key_from_rec	= xfs_rmapbt_init_key_from_rec,
+	.init_rec_from_cur	= xfs_rmapbt_init_rec_from_cur,
+	.init_ptr_from_cur	= xfs_rmapbt_init_ptr_from_cur,
+	.key_diff		= xfs_rmapbt_key_diff,
 	.buf_ops		= &xfs_rmapbt_buf_ops,
+#if defined(DEBUG) || defined(XFS_WARN)
+	.keys_inorder		= xfs_rmapbt_keys_inorder,
+	.recs_inorder		= xfs_rmapbt_recs_inorder,
+#endif
 };
 
 /*
diff --git a/libxfs/xfs_rmap_btree.h b/libxfs/xfs_rmap_btree.h
index 462767f..17fa383 100644
--- a/libxfs/xfs_rmap_btree.h
+++ b/libxfs/xfs_rmap_btree.h
@@ -52,6 +52,15 @@ struct xfs_btree_cur *xfs_rmapbt_init_cursor(struct xfs_mount *mp,
 int xfs_rmapbt_maxrecs(struct xfs_mount *mp, int blocklen, int leaf);
 extern void xfs_rmapbt_compute_maxlevels(struct xfs_mount *mp);
 
+int xfs_rmap_lookup_le(struct xfs_btree_cur *cur, xfs_agblock_t bno,
+		xfs_extlen_t len, uint64_t owner, uint64_t offset,
+		unsigned int flags, int *stat);
+int xfs_rmap_lookup_eq(struct xfs_btree_cur *cur, xfs_agblock_t bno,
+		xfs_extlen_t len, uint64_t owner, uint64_t offset,
+		unsigned int flags, int *stat);
+int xfs_rmap_get_rec(struct xfs_btree_cur *cur, struct xfs_rmap_irec *irec,
+		int *stat);
+
 int xfs_rmap_alloc(struct xfs_trans *tp, struct xfs_buf *agbp,
 		   xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
 		   struct xfs_owner_info *oinfo);


* [PATCH 037/145] xfs: support overlapping intervals in the rmap btree
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (35 preceding siblings ...)
  2016-06-17  1:34 ` [PATCH 036/145] xfs: add rmap btree operations Darrick J. Wong
@ 2016-06-17  1:34 ` Darrick J. Wong
  2016-06-17  1:34 ` [PATCH 038/145] xfs: teach rmapbt to support interval queries Darrick J. Wong
                   ` (107 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:34 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Now that the generic btree code supports overlapping intervals, plug
the rmap btree into this functionality.  We will need it to find
potential left neighbors in xfs_rmap_{alloc,free} later in the patch
set.
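
The high key of a record is the key of the last block it covers.  As a
minimal sketch of that arithmetic, assuming host-endian fields and a
hypothetical layout (the demo_ names, the 54-bit offset mask and the flag
bit positions are assumptions for illustration, not taken from this patch):

```c
#include <stdint.h>

/* Assumed for this sketch only: low 54 bits of offset hold the file offset. */
#define DEMO_OFF_MASK		((uint64_t)0x3FFFFFFFFFFFFFULL)
#define DEMO_BMBT_BLOCK		((uint64_t)1 << 62)	/* assumed flag bit */
/* Assumed: non-inode (metadata) owners have the top bit set. */
#define DEMO_NON_INODE(owner)	((((uint64_t)(owner)) >> 63) & 1)

struct demo_rmap {
	uint32_t	startblock;
	uint32_t	blockcount;	/* unused in keys, kept for simplicity */
	uint64_t	owner;
	uint64_t	offset;		/* offset in low bits, flags in high bits */
};

/* Build the key describing the last block mapped by rec. */
static struct demo_rmap
demo_high_key(const struct demo_rmap *rec)
{
	struct demo_rmap	key = *rec;
	uint32_t		adj = rec->blockcount - 1;

	key.startblock = rec->startblock + adj;
	key.blockcount = 0;

	/* Metadata and bmbt block mappings have no meaningful file offset. */
	if (DEMO_NON_INODE(rec->owner) || (rec->offset & DEMO_BMBT_BLOCK))
		return key;

	/* Advance only the offset bits; carry the flag bits over unchanged. */
	key.offset = ((rec->offset & DEMO_OFF_MASK) + adj) |
		     (rec->offset & ~DEMO_OFF_MASK);
	return key;
}
```

Masking before the addition and OR-ing the flags back afterwards keeps the
high bits of the offset field out of the arithmetic.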

v2: Fix bit manipulation bug when generating high key offset.

v3: Move unwritten bit to rm_offset.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_rmap_btree.c |   59 ++++++++++++++++++++++++++++++++++++++++++++++-
 libxfs/xfs_rmap_btree.h |   10 ++++++--
 2 files changed, 66 insertions(+), 3 deletions(-)


diff --git a/libxfs/xfs_rmap_btree.c b/libxfs/xfs_rmap_btree.c
index c8fd99e..b5c3c21 100644
--- a/libxfs/xfs_rmap_btree.c
+++ b/libxfs/xfs_rmap_btree.c
@@ -179,6 +179,28 @@ xfs_rmapbt_init_key_from_rec(
 }
 
 STATIC void
+xfs_rmapbt_init_high_key_from_rec(
+	union xfs_btree_key	*key,
+	union xfs_btree_rec	*rec)
+{
+	__uint64_t		off;
+	int			adj;
+
+	adj = be32_to_cpu(rec->rmap.rm_blockcount) - 1;
+
+	key->rmap.rm_startblock = rec->rmap.rm_startblock;
+	be32_add_cpu(&key->rmap.rm_startblock, adj);
+	key->rmap.rm_owner = rec->rmap.rm_owner;
+	key->rmap.rm_offset = rec->rmap.rm_offset;
+	if (XFS_RMAP_NON_INODE_OWNER(be64_to_cpu(rec->rmap.rm_owner)) ||
+	    XFS_RMAP_IS_BMBT_BLOCK(be64_to_cpu(rec->rmap.rm_offset)))
+		return;
+	off = be64_to_cpu(key->rmap.rm_offset);
+	off = (XFS_RMAP_OFF(off) + adj) | (off & ~XFS_RMAP_OFF_MASK);
+	key->rmap.rm_offset = cpu_to_be64(off);
+}
+
+STATIC void
 xfs_rmapbt_init_rec_from_cur(
 	struct xfs_btree_cur	*cur,
 	union xfs_btree_rec	*rec)
@@ -233,6 +255,38 @@ xfs_rmapbt_key_diff(
 	return 0;
 }
 
+STATIC __int64_t
+xfs_rmapbt_diff_two_keys(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_key	*k1,
+	union xfs_btree_key	*k2)
+{
+	struct xfs_rmap_key	*kp1 = &k1->rmap;
+	struct xfs_rmap_key	*kp2 = &k2->rmap;
+	__int64_t		d;
+	__u64			x, y;
+
+	d = (__int64_t)be32_to_cpu(kp2->rm_startblock) -
+		       be32_to_cpu(kp1->rm_startblock);
+	if (d)
+		return d;
+
+	x = be64_to_cpu(kp2->rm_owner);
+	y = be64_to_cpu(kp1->rm_owner);
+	if (x > y)
+		return 1;
+	else if (y > x)
+		return -1;
+
+	x = XFS_RMAP_OFF(be64_to_cpu(kp2->rm_offset));
+	y = XFS_RMAP_OFF(be64_to_cpu(kp1->rm_offset));
+	if (x > y)
+		return 1;
+	else if (y > x)
+		return -1;
+	return 0;
+}
+
 static bool
 xfs_rmapbt_verify(
 	struct xfs_buf		*bp)
@@ -348,6 +402,7 @@ xfs_rmapbt_recs_inorder(
 static const struct xfs_btree_ops xfs_rmapbt_ops = {
 	.rec_len		= sizeof(struct xfs_rmap_rec),
 	.key_len		= sizeof(struct xfs_rmap_key),
+	.flags			= XFS_BTREE_OPS_OVERLAPPING,
 
 	.dup_cursor		= xfs_rmapbt_dup_cursor,
 	.set_root		= xfs_rmapbt_set_root,
@@ -356,10 +411,12 @@ static const struct xfs_btree_ops xfs_rmapbt_ops = {
 	.get_minrecs		= xfs_rmapbt_get_minrecs,
 	.get_maxrecs		= xfs_rmapbt_get_maxrecs,
 	.init_key_from_rec	= xfs_rmapbt_init_key_from_rec,
+	.init_high_key_from_rec	= xfs_rmapbt_init_high_key_from_rec,
 	.init_rec_from_cur	= xfs_rmapbt_init_rec_from_cur,
 	.init_ptr_from_cur	= xfs_rmapbt_init_ptr_from_cur,
 	.key_diff		= xfs_rmapbt_key_diff,
 	.buf_ops		= &xfs_rmapbt_buf_ops,
+	.diff_two_keys		= xfs_rmapbt_diff_two_keys,
 #if defined(DEBUG) || defined(XFS_WARN)
 	.keys_inorder		= xfs_rmapbt_keys_inorder,
 	.recs_inorder		= xfs_rmapbt_recs_inorder,
@@ -408,7 +465,7 @@ xfs_rmapbt_maxrecs(
 	if (leaf)
 		return blocklen / sizeof(struct xfs_rmap_rec);
 	return blocklen /
-		(sizeof(struct xfs_rmap_key) + sizeof(xfs_rmap_ptr_t));
+		(2 * sizeof(struct xfs_rmap_key) + sizeof(xfs_rmap_ptr_t));
 }
 
 /* Compute the maximum height of an rmap btree. */
diff --git a/libxfs/xfs_rmap_btree.h b/libxfs/xfs_rmap_btree.h
index 17fa383..796071c 100644
--- a/libxfs/xfs_rmap_btree.h
+++ b/libxfs/xfs_rmap_btree.h
@@ -38,12 +38,18 @@ struct xfs_mount;
 #define XFS_RMAP_KEY_ADDR(block, index) \
 	((struct xfs_rmap_key *) \
 		((char *)(block) + XFS_RMAP_BLOCK_LEN + \
-		 ((index) - 1) * sizeof(struct xfs_rmap_key)))
+		 ((index) - 1) * 2 * sizeof(struct xfs_rmap_key)))
+
+#define XFS_RMAP_HIGH_KEY_ADDR(block, index) \
+	((struct xfs_rmap_key *) \
+		((char *)(block) + XFS_RMAP_BLOCK_LEN + \
+		 sizeof(struct xfs_rmap_key) + \
+		 ((index) - 1) * 2 * sizeof(struct xfs_rmap_key)))
 
 #define XFS_RMAP_PTR_ADDR(block, index, maxrecs) \
 	((xfs_rmap_ptr_t *) \
 		((char *)(block) + XFS_RMAP_BLOCK_LEN + \
-		 (maxrecs) * sizeof(struct xfs_rmap_key) + \
+		 (maxrecs) * 2 * sizeof(struct xfs_rmap_key) + \
 		 ((index) - 1) * sizeof(xfs_rmap_ptr_t)))
 
 struct xfs_btree_cur *xfs_rmapbt_init_cursor(struct xfs_mount *mp,


* [PATCH 038/145] xfs: teach rmapbt to support interval queries
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (36 preceding siblings ...)
  2016-06-17  1:34 ` [PATCH 037/145] xfs: support overlapping intervals in the rmap btree Darrick J. Wong
@ 2016-06-17  1:34 ` Darrick J. Wong
  2016-06-17  1:34 ` [PATCH 039/145] xfs: add an extent to the rmap btree Darrick J. Wong
                   ` (106 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:34 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Now that the generic btree code supports querying all records within a
range of keys, use that functionality to allow us to ask for all the
extents mapped to a range of physical blocks.
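
The calling convention can be sketched outside the btree code.  The example
below is only an illustration: it swaps the btree walk for a linear scan
over an array, and every demo_ name is hypothetical.  It shows the shape of
the interface, where a callback receives each record overlapping the
requested block range along with a private pointer, and a nonzero return
stops the walk:

```c
#include <stddef.h>
#include <stdint.h>

struct demo_rmap {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint64_t	owner;
};

typedef int (*demo_query_fn)(const struct demo_rmap *rec, void *priv);

/* Call fn for every record overlapping blocks [low, high], inclusive. */
static int
demo_query_range(const struct demo_rmap *recs, size_t nr,
		 uint32_t low, uint32_t high,
		 demo_query_fn fn, void *priv)
{
	size_t	i;

	for (i = 0; i < nr; i++) {
		uint32_t	end = recs[i].startblock +
				      recs[i].blockcount - 1;
		int		error;

		if (end < low || recs[i].startblock > high)
			continue;
		error = fn(&recs[i], priv);
		if (error)
			return error;	/* callback asked us to stop */
	}
	return 0;
}

/* Example callback: add up the blocks of every record passed in. */
static int
demo_count_blocks(const struct demo_rmap *rec, void *priv)
{
	*(uint64_t *)priv += rec->blockcount;
	return 0;
}
```

Passing state through the void pointer lets one query routine serve many
callers without the query code knowing what each caller collects.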

v2: Move unwritten bit to rm_offset.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_rmap.c       |   43 +++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rmap_btree.h |    9 +++++++++
 2 files changed, 52 insertions(+)


diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index 5034fd3..390ddde 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -182,3 +182,46 @@ out_error:
 	trace_xfs_rmap_alloc_extent_error(mp, agno, bno, len, false, oinfo);
 	return error;
 }
+
+struct xfs_rmapbt_query_range_info {
+	xfs_rmapbt_query_range_fn	fn;
+	void				*priv;
+};
+
+/* Format btree record and pass to our callback. */
+STATIC int
+xfs_rmapbt_query_range_helper(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_rec	*rec,
+	void			*priv)
+{
+	struct xfs_rmapbt_query_range_info	*query = priv;
+	struct xfs_rmap_irec			irec;
+	int					error;
+
+	error = xfs_rmapbt_btrec_to_irec(rec, &irec);
+	if (error)
+		return error;
+	return query->fn(cur, &irec, query->priv);
+}
+
+/* Find all rmaps between two keys. */
+int
+xfs_rmapbt_query_range(
+	struct xfs_btree_cur		*cur,
+	struct xfs_rmap_irec		*low_rec,
+	struct xfs_rmap_irec		*high_rec,
+	xfs_rmapbt_query_range_fn	fn,
+	void				*priv)
+{
+	union xfs_btree_irec		low_brec;
+	union xfs_btree_irec		high_brec;
+	struct xfs_rmapbt_query_range_info	query;
+
+	low_brec.r = *low_rec;
+	high_brec.r = *high_rec;
+	query.priv = priv;
+	query.fn = fn;
+	return xfs_btree_query_range(cur, &low_brec, &high_brec,
+			xfs_rmapbt_query_range_helper, &query);
+}
diff --git a/libxfs/xfs_rmap_btree.h b/libxfs/xfs_rmap_btree.h
index 796071c..e926c6e 100644
--- a/libxfs/xfs_rmap_btree.h
+++ b/libxfs/xfs_rmap_btree.h
@@ -74,4 +74,13 @@ int xfs_rmap_free(struct xfs_trans *tp, struct xfs_buf *agbp,
 		  xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
 		  struct xfs_owner_info *oinfo);
 
+typedef int (*xfs_rmapbt_query_range_fn)(
+	struct xfs_btree_cur	*cur,
+	struct xfs_rmap_irec	*rec,
+	void			*priv);
+
+int xfs_rmapbt_query_range(struct xfs_btree_cur *cur,
+		struct xfs_rmap_irec *low_rec, struct xfs_rmap_irec *high_rec,
+		xfs_rmapbt_query_range_fn fn, void *priv);
+
 #endif	/* __XFS_RMAP_BTREE_H__ */


* [PATCH 039/145] xfs: add an extent to the rmap btree
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (37 preceding siblings ...)
  2016-06-17  1:34 ` [PATCH 038/145] xfs: teach rmapbt to support interval queries Darrick J. Wong
@ 2016-06-17  1:34 ` Darrick J. Wong
  2016-06-17  1:34 ` [PATCH 040/145] xfs: remove an extent from " Darrick J. Wong
                   ` (105 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:34 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

From: Dave Chinner <dchinner@redhat.com>

Now that all the btree, free space and transaction infrastructure is in
place, we can finally add the code to insert reverse mappings into the
rmap btree. Freeing will be done in a separate patch, so just the
addition operation can be focussed on here.
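
The choice between merging and inserting can be sketched as a small
classifier.  This is a simplified illustration with hypothetical demo_
names: it assumes both neighbors have already passed the owner and flag
checks, and it leaves out the maximum-length test that guards the merge of
all three extents.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum demo_action {
	DEMO_INSERT,		/* no contiguous neighbor, add a new record */
	DEMO_MERGE_LEFT,	/* grow the left neighbor */
	DEMO_MERGE_RIGHT,	/* grow the right neighbor downwards */
	DEMO_MERGE_BOTH,	/* grow the left neighbor, delete the right */
};

struct demo_ext {
	uint32_t	bno;	/* first physical block */
	uint32_t	len;	/* length in blocks */
	uint64_t	off;	/* file offset of the first block */
};

/* lt and gt may be NULL when there is no mergeable neighbor on that side. */
static enum demo_action
demo_classify_add(const struct demo_ext *lt, const struct demo_ext *gt,
		  const struct demo_ext *add)
{
	bool	left = lt && lt->bno + lt->len == add->bno &&
		       lt->off + lt->len == add->off;
	bool	right = gt && add->bno + add->len == gt->bno &&
			add->off + add->len == gt->off;

	if (left && right)
		return DEMO_MERGE_BOTH;
	if (left)
		return DEMO_MERGE_LEFT;
	if (right)
		return DEMO_MERGE_RIGHT;
	return DEMO_INSERT;
}
```

Contiguity has to hold in both the physical block space and the file offset
space; an extent that is physically adjacent but logically distant must stay
a separate record.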

v2: Update alloc function to handle non-shared file data.  Isolate the
part that makes changes from the part that initializes the rmap
cursor; this will be useful for deferred updates.

[darrick: handle owner offsets when adding rmaps]
[dchinner: remove remaining debug printk statements]
[darrick: move unwritten bit to rm_offset]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 include/xfs_trace.h     |    9 ++
 libxfs/xfs_rmap.c       |  225 ++++++++++++++++++++++++++++++++++++++++++++++-
 libxfs/xfs_rmap_btree.h |    1 
 3 files changed, 230 insertions(+), 5 deletions(-)


diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index af3f68b..45b5284 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -194,6 +194,15 @@
 #define trace_xfs_rmap_free_extent_done(...)	((void) 0)
 #define trace_xfs_rmapbt_free_block(...)	((void) 0)
 #define trace_xfs_rmapbt_alloc_block(...)	((void) 0)
+#define trace_xfs_rmapbt_update(...)		((void) 0)
+#define trace_xfs_rmapbt_update_error(...)	((void) 0)
+#define trace_xfs_rmapbt_insert(...)		((void) 0)
+#define trace_xfs_rmapbt_insert_error(...)	((void) 0)
+#define trace_xfs_rmapbt_delete(...)		((void) 0)
+#define trace_xfs_rmapbt_delete_error(...)	((void) 0)
+
+#define trace_xfs_rmap_lookup_le_range_result(...)		((void) 0)
+#define trace_xfs_rmap_map_gtrec(...)		((void) 0)
 
 /* set c = c to avoid unused var warnings */
 #define trace_xfs_perag_get(a,b,c,d)	((c) = (c))
diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index 390ddde..fc3522b 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -157,6 +157,218 @@ out_error:
 	return error;
 }
 
+/*
+ * A mergeable rmap must have the same owner and the same unwritten and attr
+ * fork state, and must be a bmbt rmap if we're asking about a bmbt rmap.
+ */
+static bool
+xfs_rmap_is_mergeable(
+	struct xfs_rmap_irec	*irec,
+	uint64_t		owner,
+	uint64_t		offset,
+	xfs_extlen_t		len,
+	unsigned int		flags)
+{
+	if (irec->rm_owner == XFS_RMAP_OWN_NULL)
+		return false;
+	if (irec->rm_owner != owner)
+		return false;
+	if ((flags & XFS_RMAP_UNWRITTEN) ^
+	    (irec->rm_flags & XFS_RMAP_UNWRITTEN))
+		return false;
+	if ((flags & XFS_RMAP_ATTR_FORK) ^
+	    (irec->rm_flags & XFS_RMAP_ATTR_FORK))
+		return false;
+	if ((flags & XFS_RMAP_BMBT_BLOCK) ^
+	    (irec->rm_flags & XFS_RMAP_BMBT_BLOCK))
+		return false;
+	return true;
+}
+
+/*
+ * When we allocate a new block, the first thing we do is add a reference to
+ * the extent in the rmap btree. This takes the form of a [agbno, length,
+ * owner, offset] record.  Flags are encoded in the high bits of the offset
+ * field.
+ */
+STATIC int
+__xfs_rmap_alloc(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	bool			unwritten,
+	struct xfs_owner_info	*oinfo)
+{
+	struct xfs_mount	*mp = cur->bc_mp;
+	struct xfs_rmap_irec	ltrec;
+	struct xfs_rmap_irec	gtrec;
+	int			have_gt;
+	int			have_lt;
+	int			error = 0;
+	int			i;
+	uint64_t		owner;
+	uint64_t		offset;
+	unsigned int		flags = 0;
+	bool			ignore_off;
+
+	xfs_owner_info_unpack(oinfo, &owner, &offset, &flags);
+	ignore_off = XFS_RMAP_NON_INODE_OWNER(owner) ||
+			(flags & XFS_RMAP_BMBT_BLOCK);
+	if (unwritten)
+		flags |= XFS_RMAP_UNWRITTEN;
+	trace_xfs_rmap_alloc_extent(mp, cur->bc_private.a.agno, bno, len,
+			unwritten, oinfo);
+
+	/*
+	 * For the initial lookup, look for an exact match or the left-adjacent
+	 * record for our insertion point. This will also give us the record for
+	 * start block contiguity tests.
+	 */
+	error = xfs_rmap_lookup_le(cur, bno, len, owner, offset, flags,
+			&have_lt);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(mp, have_lt == 1, out_error);
+
+	error = xfs_rmap_get_rec(cur, &ltrec, &have_lt);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(mp, have_lt == 1, out_error);
+	trace_xfs_rmap_lookup_le_range_result(cur->bc_mp,
+			cur->bc_private.a.agno, ltrec.rm_startblock,
+			ltrec.rm_blockcount, ltrec.rm_owner,
+			ltrec.rm_offset, ltrec.rm_flags);
+
+	if (!xfs_rmap_is_mergeable(&ltrec, owner, offset, len, flags))
+		have_lt = 0;
+
+	XFS_WANT_CORRUPTED_GOTO(mp,
+		have_lt == 0 ||
+		ltrec.rm_startblock + ltrec.rm_blockcount <= bno, out_error);
+
+	/*
+	 * Increment the cursor to see if we have a right-adjacent record to our
+	 * insertion point. This will give us the record for end block
+	 * contiguity tests.
+	 */
+	error = xfs_btree_increment(cur, 0, &have_gt);
+	if (error)
+		goto out_error;
+	if (have_gt) {
+		error = xfs_rmap_get_rec(cur, &gtrec, &have_gt);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(mp, have_gt == 1, out_error);
+		XFS_WANT_CORRUPTED_GOTO(mp, bno + len <= gtrec.rm_startblock,
+					out_error);
+		trace_xfs_rmap_map_gtrec(cur->bc_mp,
+			cur->bc_private.a.agno, gtrec.rm_startblock,
+			gtrec.rm_blockcount, gtrec.rm_owner,
+			gtrec.rm_offset, gtrec.rm_flags);
+		if (!xfs_rmap_is_mergeable(&gtrec, owner, offset, len, flags))
+			have_gt = 0;
+	}
+
+	/*
+	 * Note: cursor currently points one record to the right of ltrec, even
+	 * if there is no record in the tree to the right.
+	 */
+	if (have_lt &&
+	    ltrec.rm_startblock + ltrec.rm_blockcount == bno &&
+	    (ignore_off || ltrec.rm_offset + ltrec.rm_blockcount == offset)) {
+		/*
+		 * left edge contiguous, merge into left record.
+		 *
+		 *       ltbno     ltlen
+		 * orig:   |ooooooooo|
+		 * adding:           |aaaaaaaaa|
+		 * result: |rrrrrrrrrrrrrrrrrrr|
+		 *                  bno       len
+		 */
+		ltrec.rm_blockcount += len;
+		if (have_gt &&
+		    bno + len == gtrec.rm_startblock &&
+		    (ignore_off || offset + len == gtrec.rm_offset) &&
+		    (unsigned long)ltrec.rm_blockcount + len +
+				gtrec.rm_blockcount <= XFS_RMAP_LEN_MAX) {
+			/*
+			 * right edge also contiguous, delete right record
+			 * and merge into left record.
+			 *
+			 *       ltbno     ltlen    gtbno     gtlen
+			 * orig:   |ooooooooo|         |ooooooooo|
+			 * adding:           |aaaaaaaaa|
+			 * result: |rrrrrrrrrrrrrrrrrrrrrrrrrrrrr|
+			 */
+			ltrec.rm_blockcount += gtrec.rm_blockcount;
+			trace_xfs_rmapbt_delete(mp, cur->bc_private.a.agno,
+					gtrec.rm_startblock,
+					gtrec.rm_blockcount,
+					gtrec.rm_owner,
+					gtrec.rm_offset,
+					gtrec.rm_flags);
+			error = xfs_btree_delete(cur, &i);
+			if (error)
+				goto out_error;
+			XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+		}
+
+		/* point the cursor back to the left record and update */
+		error = xfs_btree_decrement(cur, 0, &have_gt);
+		if (error)
+			goto out_error;
+		error = xfs_rmap_update(cur, &ltrec);
+		if (error)
+			goto out_error;
+	} else if (have_gt &&
+		   bno + len == gtrec.rm_startblock &&
+		   (ignore_off || offset + len == gtrec.rm_offset)) {
+		/*
+		 * right edge contiguous, merge into right record.
+		 *
+		 *                 gtbno     gtlen
+		 * Orig:             |ooooooooo|
+		 * adding: |aaaaaaaaa|
+		 * Result: |rrrrrrrrrrrrrrrrrrr|
+		 *        bno       len
+		 */
+		gtrec.rm_startblock = bno;
+		gtrec.rm_blockcount += len;
+		if (!ignore_off)
+			gtrec.rm_offset = offset;
+		error = xfs_rmap_update(cur, &gtrec);
+		if (error)
+			goto out_error;
+	} else {
+		/*
+		 * no contiguous edge with identical owner, insert
+		 * new record at current cursor position.
+		 */
+		cur->bc_rec.r.rm_startblock = bno;
+		cur->bc_rec.r.rm_blockcount = len;
+		cur->bc_rec.r.rm_owner = owner;
+		cur->bc_rec.r.rm_offset = offset;
+		cur->bc_rec.r.rm_flags = flags;
+		trace_xfs_rmapbt_insert(mp, cur->bc_private.a.agno, bno, len,
+			owner, offset, flags);
+		error = xfs_btree_insert(cur, &i);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+	}
+
+	trace_xfs_rmap_alloc_extent_done(mp, cur->bc_private.a.agno, bno, len,
+			unwritten, oinfo);
+out_error:
+	if (error)
+		trace_xfs_rmap_alloc_extent_error(mp, cur->bc_private.a.agno,
+				bno, len, unwritten, oinfo);
+	return error;
+}
+
+/*
+ * Add a reference to an extent in the rmap btree.
+ */
 int
 xfs_rmap_alloc(
 	struct xfs_trans	*tp,
@@ -167,19 +379,22 @@ xfs_rmap_alloc(
 	struct xfs_owner_info	*oinfo)
 {
 	struct xfs_mount	*mp = tp->t_mountp;
-	int			error = 0;
+	struct xfs_btree_cur	*cur;
+	int			error;
 
 	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
 		return 0;
 
-	trace_xfs_rmap_alloc_extent(mp, agno, bno, len, false, oinfo);
-	if (1)
+	cur = xfs_rmapbt_init_cursor(mp, tp, agbp, agno);
+	error = __xfs_rmap_alloc(cur, bno, len, false, oinfo);
+	if (error)
 		goto out_error;
-	trace_xfs_rmap_alloc_extent_done(mp, agno, bno, len, false, oinfo);
+
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
 	return 0;
 
 out_error:
-	trace_xfs_rmap_alloc_extent_error(mp, agno, bno, len, false, oinfo);
+	xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
 	return error;
 }
 
diff --git a/libxfs/xfs_rmap_btree.h b/libxfs/xfs_rmap_btree.h
index e926c6e..9d92da5 100644
--- a/libxfs/xfs_rmap_btree.h
+++ b/libxfs/xfs_rmap_btree.h
@@ -67,6 +67,7 @@ int xfs_rmap_lookup_eq(struct xfs_btree_cur *cur, xfs_agblock_t bno,
 int xfs_rmap_get_rec(struct xfs_btree_cur *cur, struct xfs_rmap_irec *irec,
 		int *stat);
 
+/* functions for updating the rmapbt for bmbt blocks and AG btree blocks */
 int xfs_rmap_alloc(struct xfs_trans *tp, struct xfs_buf *agbp,
 		   xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
 		   struct xfs_owner_info *oinfo);


* [PATCH 040/145] xfs: remove an extent from the rmap btree
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (38 preceding siblings ...)
  2016-06-17  1:34 ` [PATCH 039/145] xfs: add an extent to the rmap btree Darrick J. Wong
@ 2016-06-17  1:34 ` Darrick J. Wong
  2016-06-17  1:35 ` [PATCH 041/145] xfs: convert unwritten status of reverse mappings Darrick J. Wong
                   ` (104 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:34 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

From: Dave Chinner <dchinner@redhat.com>

Now that we have records in the rmap btree, we need to remove them
when extents are freed. This needs to find the relevant record in
the btree and remove/trim/split it accordingly.
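
The four outcomes (remove, trim the left side, trim the right side, split)
reduce to computing which pieces of the record survive.  A minimal sketch
with hypothetical demo_ names, tracking physical blocks only and ignoring
owners, offsets and flags:

```c
#include <stdint.h>

struct demo_ext {
	uint32_t	bno;	/* first physical block */
	uint32_t	len;	/* length in blocks */
};

/*
 * Remove [bno, bno + len) from rec.  Stores the surviving pieces in out[]
 * and returns how many there are: 0 for an exact match, 1 for a trim, 2 for
 * a split.  Returns -1 if rec does not cover the whole range.
 */
static int
demo_free_range(struct demo_ext rec, uint32_t bno, uint32_t len,
		struct demo_ext out[2])
{
	uint32_t	rec_end = rec.bno + rec.len;
	uint32_t	end = bno + len;
	int		n = 0;

	if (bno < rec.bno || end > rec_end)
		return -1;

	if (bno > rec.bno) {		/* blocks survive on the left */
		out[n].bno = rec.bno;
		out[n].len = bno - rec.bno;
		n++;
	}
	if (end < rec_end) {		/* blocks survive on the right */
		out[n].bno = end;
		out[n].len = rec_end - end;
		n++;
	}
	return n;
}
```

In btree terms, zero surviving pieces corresponds to a delete, one to an
update of the existing record, and two to an update followed by an insert.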

v2: Update the free function to deal with non-shared file data, and
isolate the part that does the rmap update from the part that deals
with cursors.  This will be useful for deferred ops.

[darrick.wong@oracle.com: make rmap routines handle the enlarged keyspace]
[dchinner: remove remaining unused debug printks]
[darrick: fix a bug when growfs in an AG with an rmap ending at EOFS]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/xfs_rmap.c |  220 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 215 insertions(+), 5 deletions(-)


diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index fc3522b..cc82824 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -131,6 +131,212 @@ xfs_rmap_get_rec(
 	return xfs_rmapbt_btrec_to_irec(rec, irec);
 }
 
+/*
+ * Find the extent in the rmap btree and remove it.
+ *
+ * The record we find should always be an exact match for the extent that we're
+ * looking for, since we insert them into the btree without modification.
+ *
+ * Special Case #1: when growing the filesystem, we "free" an extent when
+ * growing the last AG. This extent is new space and so it is not tracked as
+ * used space in the btree. The growfs code will pass in an owner of
+ * XFS_RMAP_OWN_NULL to indicate that it expected that there is no owner of this
+ * extent. We verify this: the extent lookup must return a record that does not
+ * overlap.
+ *
+ * Special Case #2: EFIs do not record the owner of the extent, so when
+ * recovering EFIs from the log we pass in XFS_RMAP_OWN_UNKNOWN to tell the rmap
+ * btree to ignore the owner (i.e. wildcard match) so we don't trigger
+ * corruption checks during log recovery.
+ */
+STATIC int
+__xfs_rmap_free(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	bool			unwritten,
+	struct xfs_owner_info	*oinfo)
+{
+	struct xfs_mount	*mp = cur->bc_mp;
+	struct xfs_rmap_irec	ltrec;
+	uint64_t		ltoff;
+	int			error = 0;
+	int			i;
+	uint64_t		owner;
+	uint64_t		offset;
+	unsigned int		flags;
+	bool			ignore_off;
+
+	xfs_owner_info_unpack(oinfo, &owner, &offset, &flags);
+	ignore_off = XFS_RMAP_NON_INODE_OWNER(owner) ||
+			(flags & XFS_RMAP_BMBT_BLOCK);
+	if (unwritten)
+		flags |= XFS_RMAP_UNWRITTEN;
+	trace_xfs_rmap_free_extent(mp, cur->bc_private.a.agno, bno, len,
+			unwritten, oinfo);
+
+	/*
+	 * We should always have a left record because there's a static record
+	 * for the AG headers at rm_startblock == 0 created by mkfs/growfs that
+	 * will not ever be removed from the tree.
+	 */
+	error = xfs_rmap_lookup_le(cur, bno, len, owner, offset, flags, &i);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+
+	error = xfs_rmap_get_rec(cur, &ltrec, &i);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+	trace_xfs_rmap_lookup_le_range_result(cur->bc_mp,
+			cur->bc_private.a.agno, ltrec.rm_startblock,
+			ltrec.rm_blockcount, ltrec.rm_owner,
+			ltrec.rm_offset, ltrec.rm_flags);
+	ltoff = ltrec.rm_offset;
+
+	/*
+	 * For growfs, the incoming extent must be beyond the left record we
+	 * just found as it is new space and won't be used by anyone. This is
+	 * just a corruption check as we don't actually do anything with this
+	 * extent.  Note that we need to use >= instead of > because it might
+	 * be the case that the "left" extent goes all the way to EOFS.
+	 */
+	if (owner == XFS_RMAP_OWN_NULL) {
+		XFS_WANT_CORRUPTED_GOTO(mp, bno >= ltrec.rm_startblock +
+						ltrec.rm_blockcount, out_error);
+		goto out_done;
+	}
+
+	/* Make sure the unwritten flag matches. */
+	XFS_WANT_CORRUPTED_GOTO(mp, (flags & XFS_RMAP_UNWRITTEN) ==
+			(ltrec.rm_flags & XFS_RMAP_UNWRITTEN), out_error);
+
+	/* Make sure the extent we found covers the entire freeing range. */
+	XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_startblock <= bno &&
+		ltrec.rm_startblock + ltrec.rm_blockcount >=
+		bno + len, out_error);
+
+	/* Make sure the owner matches what we expect to find in the tree. */
+	XFS_WANT_CORRUPTED_GOTO(mp, owner == ltrec.rm_owner ||
+				    XFS_RMAP_NON_INODE_OWNER(owner), out_error);
+
+	/* Check the offset, if necessary. */
+	if (!XFS_RMAP_NON_INODE_OWNER(owner)) {
+		if (flags & XFS_RMAP_BMBT_BLOCK) {
+			XFS_WANT_CORRUPTED_GOTO(mp,
+					ltrec.rm_flags & XFS_RMAP_BMBT_BLOCK,
+					out_error);
+		} else {
+			XFS_WANT_CORRUPTED_GOTO(mp,
+					ltrec.rm_offset <= offset, out_error);
+			XFS_WANT_CORRUPTED_GOTO(mp,
+					ltoff + ltrec.rm_blockcount >= offset + len,
+					out_error);
+		}
+	}
+
+	if (ltrec.rm_startblock == bno && ltrec.rm_blockcount == len) {
+		/* exact match, simply remove the record from rmap tree */
+		trace_xfs_rmapbt_delete(mp, cur->bc_private.a.agno,
+				ltrec.rm_startblock, ltrec.rm_blockcount,
+				ltrec.rm_owner, ltrec.rm_offset,
+				ltrec.rm_flags);
+		error = xfs_btree_delete(cur, &i);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+	} else if (ltrec.rm_startblock == bno) {
+		/*
+		 * overlap left hand side of extent: move the start, trim the
+		 * length and update the current record.
+		 *
+		 *       ltbno                ltlen
+		 * Orig:    |oooooooooooooooooooo|
+		 * Freeing: |fffffffff|
+		 * Result:            |rrrrrrrrrr|
+		 *         bno       len
+		 */
+		ltrec.rm_startblock += len;
+		ltrec.rm_blockcount -= len;
+		if (!ignore_off)
+			ltrec.rm_offset += len;
+		error = xfs_rmap_update(cur, &ltrec);
+		if (error)
+			goto out_error;
+	} else if (ltrec.rm_startblock + ltrec.rm_blockcount == bno + len) {
+		/*
+		 * overlap right hand side of extent: trim the length and update
+		 * the current record.
+		 *
+		 *       ltbno                ltlen
+		 * Orig:    |oooooooooooooooooooo|
+		 * Freeing:            |fffffffff|
+		 * Result:  |rrrrrrrrrr|
+		 *                    bno       len
+		 */
+		ltrec.rm_blockcount -= len;
+		error = xfs_rmap_update(cur, &ltrec);
+		if (error)
+			goto out_error;
+	} else {
+
+		/*
+		 * overlap middle of extent: trim the length of the existing
+		 * record to the length of the new left-extent size, increment
+		 * the insertion position so we can insert a new record
+		 * containing the remaining right-extent space.
+		 *
+		 *       ltbno                ltlen
+		 * Orig:    |oooooooooooooooooooo|
+		 * Freeing:       |fffffffff|
+		 * Result:  |rrrrr|         |rrrr|
+		 *               bno       len
+		 */
+		xfs_extlen_t	orig_len = ltrec.rm_blockcount;
+
+		ltrec.rm_blockcount = bno - ltrec.rm_startblock;
+		error = xfs_rmap_update(cur, &ltrec);
+		if (error)
+			goto out_error;
+
+		error = xfs_btree_increment(cur, 0, &i);
+		if (error)
+			goto out_error;
+
+		cur->bc_rec.r.rm_startblock = bno + len;
+		cur->bc_rec.r.rm_blockcount = orig_len - len -
+						     ltrec.rm_blockcount;
+		cur->bc_rec.r.rm_owner = ltrec.rm_owner;
+		if (ignore_off)
+			cur->bc_rec.r.rm_offset = 0;
+		else
+			cur->bc_rec.r.rm_offset = offset + len;
+		cur->bc_rec.r.rm_flags = flags;
+		trace_xfs_rmapbt_insert(mp, cur->bc_private.a.agno,
+				cur->bc_rec.r.rm_startblock,
+				cur->bc_rec.r.rm_blockcount,
+				cur->bc_rec.r.rm_owner,
+				cur->bc_rec.r.rm_offset,
+				cur->bc_rec.r.rm_flags);
+		error = xfs_btree_insert(cur, &i);
+		if (error)
+			goto out_error;
+	}
+
+out_done:
+	trace_xfs_rmap_free_extent_done(mp, cur->bc_private.a.agno, bno, len,
+			unwritten, oinfo);
+out_error:
+	if (error)
+		trace_xfs_rmap_free_extent_error(mp, cur->bc_private.a.agno,
+				bno, len, unwritten, oinfo);
+	return error;
+}
+
+/*
+ * Remove a reference to an extent in the rmap btree.
+ */
 int
 xfs_rmap_free(
 	struct xfs_trans	*tp,
@@ -141,19 +347,23 @@ xfs_rmap_free(
 	struct xfs_owner_info	*oinfo)
 {
 	struct xfs_mount	*mp = tp->t_mountp;
-	int			error = 0;
+	struct xfs_btree_cur	*cur;
+	int			error;
 
 	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
 		return 0;
 
-	trace_xfs_rmap_free_extent(mp, agno, bno, len, false, oinfo);
-	if (1)
+	cur = xfs_rmapbt_init_cursor(mp, tp, agbp, agno);
+
+	error = __xfs_rmap_free(cur, bno, len, false, oinfo);
+	if (error)
 		goto out_error;
-	trace_xfs_rmap_free_extent_done(mp, agno, bno, len, false, oinfo);
+
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
 	return 0;
 
 out_error:
-	trace_xfs_rmap_free_extent_error(mp, agno, bno, len, false, oinfo);
+	xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
 	return error;
 }
 


* [PATCH 041/145] xfs: convert unwritten status of reverse mappings
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (39 preceding siblings ...)
  2016-06-17  1:34 ` [PATCH 040/145] xfs: remove an extent from " Darrick J. Wong
@ 2016-06-17  1:35 ` Darrick J. Wong
  2016-06-17  1:35 ` [PATCH 042/145] xfs: add rmap btree insert and delete helpers Darrick J. Wong
                   ` (103 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:35 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Provide a function to convert an unwritten extent to a real one and
vice versa.
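
How much of the existing mapping a conversion covers determines which case
applies.  As a rough sketch of that first classification step (the demo_
names are hypothetical; the bit values mirror the RMAP_LEFT_FILLING and
RMAP_RIGHT_FILLING definitions in this patch):

```c
#include <stdint.h>

#define DEMO_LEFT_FILLING	(1 << 2)
#define DEMO_RIGHT_FILLING	(1 << 3)

/*
 * Compare the file offset range being converted, [off, off + len), with the
 * existing mapping [prev_off, prev_off + prev_len) that contains it.
 */
static int
demo_fill_state(uint64_t prev_off, uint32_t prev_len,
		uint64_t off, uint32_t len)
{
	int	state = 0;

	if (prev_off == off)
		state |= DEMO_LEFT_FILLING;	/* starts at the same offset */
	if (prev_off + prev_len == off + len)
		state |= DEMO_RIGHT_FILLING;	/* ends at the same offset */
	return state;
}
```

Both bits set means the whole mapping changes state; neither bit set means
the conversion lands in the middle and the mapping must be split in three.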

v2: Move unwritten bit to rm_offset.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_trace.h |    6 +
 libxfs/xfs_rmap.c   |  442 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 448 insertions(+)


diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index 45b5284..55df410 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -203,6 +203,12 @@
 
 #define trace_xfs_rmap_lookup_le_range_result(...)		((void) 0)
 #define trace_xfs_rmap_map_gtrec(...)		((void) 0)
+#define trace_xfs_rmap_convert(...)		((void) 0)
+#define trace_xfs_rmap_convert_gtrec(...)	((void) 0)
+#define trace_xfs_rmap_convert_state(...)	((void) 0)
+#define trace_xfs_rmap_convert_done(...)	((void) 0)
+#define trace_xfs_rmap_convert_error(...)	((void) 0)
+#define trace_xfs_rmap_find_left_neighbor_result(...)		((void) 0)
 
 /* set c = c to avoid unused var warnings */
 #define trace_xfs_perag_get(a,b,c,d)	((c) = (c))
diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index cc82824..1e25285 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -608,6 +608,448 @@ out_error:
 	return error;
 }
 
+#define RMAP_LEFT_CONTIG	(1 << 0)
+#define RMAP_RIGHT_CONTIG	(1 << 1)
+#define RMAP_LEFT_FILLING	(1 << 2)
+#define RMAP_RIGHT_FILLING	(1 << 3)
+#define RMAP_LEFT_VALID		(1 << 6)
+#define RMAP_RIGHT_VALID	(1 << 7)
+
+#define LEFT		r[0]
+#define RIGHT		r[1]
+#define PREV		r[2]
+#define NEW		r[3]
+
+/*
+ * Convert an unwritten extent to a real extent or vice versa.
+ * Does not handle overlapping extents.
+ */
+STATIC int
+__xfs_rmap_convert(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	bool			unwritten,
+	struct xfs_owner_info	*oinfo)
+{
+	struct xfs_mount	*mp = cur->bc_mp;
+	struct xfs_rmap_irec	r[4];	/* neighbor extent entries */
+					/* left is 0, right is 1, prev is 2 */
+					/* new is 3 */
+	uint64_t		owner;
+	uint64_t		offset;
+	uint64_t		new_endoff;
+	unsigned int		oldext;
+	unsigned int		newext;
+	unsigned int		flags = 0;
+	int			i;
+	int			state = 0;
+	int			error;
+
+	xfs_owner_info_unpack(oinfo, &owner, &offset, &flags);
+	ASSERT(!(XFS_RMAP_NON_INODE_OWNER(owner) ||
+			(flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))));
+	oldext = unwritten ? XFS_RMAP_UNWRITTEN : 0;
+	new_endoff = offset + len;
+	trace_xfs_rmap_convert(mp, cur->bc_private.a.agno, bno, len,
+			unwritten, oinfo);
+
+	/*
+	 * For the initial lookup, look for an exact match or the left-adjacent
+	 * record for our insertion point. This will also give us the record for
+	 * start block contiguity tests.
+	 */
+	error = xfs_rmap_lookup_le(cur, bno, len, owner, offset, oldext, &i);
+	if (error)
+		goto done;
+	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+
+	error = xfs_rmap_get_rec(cur, &PREV, &i);
+	if (error)
+		goto done;
+	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+	trace_xfs_rmap_lookup_le_range_result(cur->bc_mp,
+			cur->bc_private.a.agno, PREV.rm_startblock,
+			PREV.rm_blockcount, PREV.rm_owner,
+			PREV.rm_offset, PREV.rm_flags);
+
+	ASSERT(PREV.rm_offset <= offset);
+	ASSERT(PREV.rm_offset + PREV.rm_blockcount >= new_endoff);
+	ASSERT((PREV.rm_flags & XFS_RMAP_UNWRITTEN) == oldext);
+	newext = ~oldext & XFS_RMAP_UNWRITTEN;
+
+	/*
+	 * Set flags determining what part of the previous oldext allocation
+	 * extent is being replaced by a newext allocation.
+	 */
+	if (PREV.rm_offset == offset)
+		state |= RMAP_LEFT_FILLING;
+	if (PREV.rm_offset + PREV.rm_blockcount == new_endoff)
+		state |= RMAP_RIGHT_FILLING;
+
+	/*
+	 * Decrement the cursor to see if we have a left-adjacent record to our
+	 * insertion point. This will give us the record for end block
+	 * contiguity tests.
+	 */
+	error = xfs_btree_decrement(cur, 0, &i);
+	if (error)
+		goto done;
+	if (i) {
+		state |= RMAP_LEFT_VALID;
+		error = xfs_rmap_get_rec(cur, &LEFT, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		XFS_WANT_CORRUPTED_GOTO(mp,
+				LEFT.rm_startblock + LEFT.rm_blockcount <= bno,
+				done);
+		trace_xfs_rmap_find_left_neighbor_result(cur->bc_mp,
+				cur->bc_private.a.agno, LEFT.rm_startblock,
+				LEFT.rm_blockcount, LEFT.rm_owner,
+				LEFT.rm_offset, LEFT.rm_flags);
+		if (LEFT.rm_startblock + LEFT.rm_blockcount == bno &&
+		    LEFT.rm_offset + LEFT.rm_blockcount == offset &&
+		    xfs_rmap_is_mergeable(&LEFT, owner, offset, len, newext))
+			state |= RMAP_LEFT_CONTIG;
+	}
+
+	/*
+	 * Increment the cursor to see if we have a right-adjacent record to our
+	 * insertion point. This will give us the record for end block
+	 * contiguity tests.
+	 */
+	error = xfs_btree_increment(cur, 0, &i);
+	if (error)
+		goto done;
+	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+	error = xfs_btree_increment(cur, 0, &i);
+	if (error)
+		goto done;
+	if (i) {
+		state |= RMAP_RIGHT_VALID;
+		error = xfs_rmap_get_rec(cur, &RIGHT, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		XFS_WANT_CORRUPTED_GOTO(mp, bno + len <= RIGHT.rm_startblock,
+					done);
+		trace_xfs_rmap_convert_gtrec(cur->bc_mp,
+				cur->bc_private.a.agno, RIGHT.rm_startblock,
+				RIGHT.rm_blockcount, RIGHT.rm_owner,
+				RIGHT.rm_offset, RIGHT.rm_flags);
+		if (bno + len == RIGHT.rm_startblock &&
+		    offset + len == RIGHT.rm_offset &&
+		    xfs_rmap_is_mergeable(&RIGHT, owner, offset, len, newext))
+			state |= RMAP_RIGHT_CONTIG;
+	}
+
+	/* check that left + prev + right is not too long */
+	if ((state & (RMAP_LEFT_FILLING | RMAP_LEFT_CONTIG |
+			 RMAP_RIGHT_FILLING | RMAP_RIGHT_CONTIG)) ==
+	    (RMAP_LEFT_FILLING | RMAP_LEFT_CONTIG |
+	     RMAP_RIGHT_FILLING | RMAP_RIGHT_CONTIG) &&
+	    (unsigned long)LEFT.rm_blockcount + len +
+	     RIGHT.rm_blockcount > XFS_RMAP_LEN_MAX)
+		state &= ~RMAP_RIGHT_CONTIG;
+
+	trace_xfs_rmap_convert_state(mp, cur->bc_private.a.agno, state,
+			_RET_IP_);
+
+	/* reset the cursor back to PREV */
+	error = xfs_rmap_lookup_le(cur, bno, len, owner, offset, oldext, &i);
+	if (error)
+		goto done;
+	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+
+	/*
+	 * Switch out based on the FILLING and CONTIG state bits.
+	 */
+	switch (state & (RMAP_LEFT_FILLING | RMAP_LEFT_CONTIG |
+			 RMAP_RIGHT_FILLING | RMAP_RIGHT_CONTIG)) {
+	case RMAP_LEFT_FILLING | RMAP_LEFT_CONTIG |
+	     RMAP_RIGHT_FILLING | RMAP_RIGHT_CONTIG:
+		/*
+		 * Setting all of a previous oldext extent to newext.
+		 * The left and right neighbors are both contiguous with new.
+		 */
+		error = xfs_btree_increment(cur, 0, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		trace_xfs_rmapbt_delete(mp, cur->bc_private.a.agno,
+				RIGHT.rm_startblock, RIGHT.rm_blockcount,
+				RIGHT.rm_owner, RIGHT.rm_offset,
+				RIGHT.rm_flags);
+		error = xfs_btree_delete(cur, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		error = xfs_btree_decrement(cur, 0, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		trace_xfs_rmapbt_delete(mp, cur->bc_private.a.agno,
+				PREV.rm_startblock, PREV.rm_blockcount,
+				PREV.rm_owner, PREV.rm_offset,
+				PREV.rm_flags);
+		error = xfs_btree_delete(cur, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		error = xfs_btree_decrement(cur, 0, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		NEW = LEFT;
+		NEW.rm_blockcount += PREV.rm_blockcount + RIGHT.rm_blockcount;
+		error = xfs_rmap_update(cur, &NEW);
+		if (error)
+			goto done;
+		break;
+
+	case RMAP_LEFT_FILLING | RMAP_RIGHT_FILLING | RMAP_LEFT_CONTIG:
+		/*
+		 * Setting all of a previous oldext extent to newext.
+		 * The left neighbor is contiguous, the right is not.
+		 */
+		trace_xfs_rmapbt_delete(mp, cur->bc_private.a.agno,
+				PREV.rm_startblock, PREV.rm_blockcount,
+				PREV.rm_owner, PREV.rm_offset,
+				PREV.rm_flags);
+		error = xfs_btree_delete(cur, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		error = xfs_btree_decrement(cur, 0, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		NEW = LEFT;
+		NEW.rm_blockcount += PREV.rm_blockcount;
+		error = xfs_rmap_update(cur, &NEW);
+		if (error)
+			goto done;
+		break;
+
+	case RMAP_LEFT_FILLING | RMAP_RIGHT_FILLING | RMAP_RIGHT_CONTIG:
+		/*
+		 * Setting all of a previous oldext extent to newext.
+		 * The right neighbor is contiguous, the left is not.
+		 */
+		error = xfs_btree_increment(cur, 0, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		trace_xfs_rmapbt_delete(mp, cur->bc_private.a.agno,
+				RIGHT.rm_startblock, RIGHT.rm_blockcount,
+				RIGHT.rm_owner, RIGHT.rm_offset,
+				RIGHT.rm_flags);
+		error = xfs_btree_delete(cur, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		error = xfs_btree_decrement(cur, 0, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		NEW.rm_startblock = bno;
+		NEW.rm_owner = owner;
+		NEW.rm_offset = offset;
+		NEW.rm_blockcount = len + RIGHT.rm_blockcount;
+		NEW.rm_flags = newext;
+		error = xfs_rmap_update(cur, &NEW);
+		if (error)
+			goto done;
+		break;
+
+	case RMAP_LEFT_FILLING | RMAP_RIGHT_FILLING:
+		/*
+		 * Setting all of a previous oldext extent to newext.
+		 * Neither the left nor right neighbors are contiguous with
+		 * the new one.
+		 */
+		NEW = PREV;
+		NEW.rm_flags = newext;
+		error = xfs_rmap_update(cur, &NEW);
+		if (error)
+			goto done;
+		break;
+
+	case RMAP_LEFT_FILLING | RMAP_LEFT_CONTIG:
+		/*
+		 * Setting the first part of a previous oldext extent to newext.
+		 * The left neighbor is contiguous.
+		 */
+		NEW = PREV;
+		NEW.rm_offset += len;
+		NEW.rm_startblock += len;
+		NEW.rm_blockcount -= len;
+		error = xfs_rmap_update(cur, &NEW);
+		if (error)
+			goto done;
+		error = xfs_btree_decrement(cur, 0, &i);
+		if (error)
+			goto done;
+		NEW = LEFT;
+		NEW.rm_blockcount += len;
+		error = xfs_rmap_update(cur, &NEW);
+		if (error)
+			goto done;
+		break;
+
+	case RMAP_LEFT_FILLING:
+		/*
+		 * Setting the first part of a previous oldext extent to newext.
+		 * The left neighbor is not contiguous.
+		 */
+		NEW = PREV;
+		NEW.rm_startblock += len;
+		NEW.rm_offset += len;
+		NEW.rm_blockcount -= len;
+		error = xfs_rmap_update(cur, &NEW);
+		if (error)
+			goto done;
+		NEW.rm_startblock = bno;
+		NEW.rm_owner = owner;
+		NEW.rm_offset = offset;
+		NEW.rm_blockcount = len;
+		NEW.rm_flags = newext;
+		cur->bc_rec.r = NEW;
+		trace_xfs_rmapbt_insert(mp, cur->bc_private.a.agno, bno,
+				len, owner, offset, newext);
+		error = xfs_btree_insert(cur, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		break;
+
+	case RMAP_RIGHT_FILLING | RMAP_RIGHT_CONTIG:
+		/*
+		 * Setting the last part of a previous oldext extent to newext.
+		 * The right neighbor is contiguous with the new allocation.
+		 */
+		NEW = PREV;
+		NEW.rm_blockcount -= len;
+		error = xfs_rmap_update(cur, &NEW);
+		if (error)
+			goto done;
+		error = xfs_btree_increment(cur, 0, &i);
+		if (error)
+			goto done;
+		NEW = RIGHT;
+		NEW.rm_offset = offset;
+		NEW.rm_startblock = bno;
+		NEW.rm_blockcount += len;
+		error = xfs_rmap_update(cur, &NEW);
+		if (error)
+			goto done;
+		break;
+
+	case RMAP_RIGHT_FILLING:
+		/*
+		 * Setting the last part of a previous oldext extent to newext.
+		 * The right neighbor is not contiguous.
+		 */
+		NEW = PREV;
+		NEW.rm_blockcount -= len;
+		error = xfs_rmap_update(cur, &NEW);
+		if (error)
+			goto done;
+		error = xfs_rmap_lookup_eq(cur, bno, len, owner, offset,
+				oldext, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 0, done);
+		NEW.rm_startblock = bno;
+		NEW.rm_owner = owner;
+		NEW.rm_offset = offset;
+		NEW.rm_blockcount = len;
+		NEW.rm_flags = newext;
+		cur->bc_rec.r = NEW;
+		trace_xfs_rmapbt_insert(mp, cur->bc_private.a.agno, bno,
+				len, owner, offset, newext);
+		error = xfs_btree_insert(cur, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		break;
+
+	case 0:
+		/*
+		 * Setting the middle part of a previous oldext extent to
+		 * newext.  Contiguity is impossible here.
+		 * One extent becomes three extents.
+		 */
+		/* new right extent - oldext */
+		NEW.rm_startblock = bno + len;
+		NEW.rm_owner = owner;
+		NEW.rm_offset = new_endoff;
+		NEW.rm_blockcount = PREV.rm_offset + PREV.rm_blockcount -
+				new_endoff;
+		NEW.rm_flags = PREV.rm_flags;
+		error = xfs_rmap_update(cur, &NEW);
+		if (error)
+			goto done;
+		/* new left extent - oldext */
+		NEW = PREV;
+		NEW.rm_blockcount = offset - PREV.rm_offset;
+		cur->bc_rec.r = NEW;
+		trace_xfs_rmapbt_insert(mp, cur->bc_private.a.agno,
+				NEW.rm_startblock, NEW.rm_blockcount,
+				NEW.rm_owner, NEW.rm_offset,
+				NEW.rm_flags);
+		error = xfs_btree_insert(cur, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		/*
+		 * Reset the cursor to the position of the new extent
+		 * we are about to insert as we can't trust it after
+		 * the previous insert.
+		 */
+		error = xfs_rmap_lookup_eq(cur, bno, len, owner, offset,
+				oldext, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 0, done);
+		/* new middle extent - newext */
+		cur->bc_rec.r.rm_flags = newext;
+		trace_xfs_rmapbt_insert(mp, cur->bc_private.a.agno, bno, len,
+				owner, offset, newext);
+		error = xfs_btree_insert(cur, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		break;
+
+	case RMAP_LEFT_FILLING | RMAP_LEFT_CONTIG | RMAP_RIGHT_CONTIG:
+	case RMAP_RIGHT_FILLING | RMAP_LEFT_CONTIG | RMAP_RIGHT_CONTIG:
+	case RMAP_LEFT_FILLING | RMAP_RIGHT_CONTIG:
+	case RMAP_RIGHT_FILLING | RMAP_LEFT_CONTIG:
+	case RMAP_LEFT_CONTIG | RMAP_RIGHT_CONTIG:
+	case RMAP_LEFT_CONTIG:
+	case RMAP_RIGHT_CONTIG:
+		/*
+		 * These cases are all impossible.
+		 */
+		ASSERT(0);
+	}
+
+	trace_xfs_rmap_convert_done(mp, cur->bc_private.a.agno, bno, len,
+			unwritten, oinfo);
+done:
+	if (error)
+		trace_xfs_rmap_convert_error(cur->bc_mp,
+				cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+#undef	NEW
+#undef	LEFT
+#undef	RIGHT
+#undef	PREV
+
 struct xfs_rmapbt_query_range_info {
 	xfs_rmapbt_query_range_fn	fn;
 	void				*priv;


* [PATCH 042/145] xfs: add rmap btree insert and delete helpers
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (40 preceding siblings ...)
  2016-06-17  1:35 ` [PATCH 041/145] xfs: convert unwritten status of reverse mappings Darrick J. Wong
@ 2016-06-17  1:35 ` Darrick J. Wong
  2016-06-17  1:35 ` [PATCH 043/145] xfs: create helpers for mapping, unmapping, and converting file fork extents Darrick J. Wong
                   ` (102 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:35 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

Add a couple of helper functions to encapsulate rmap btree insert and
delete operations.  Add tracepoints to the update function.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/xfs_rmap.c       |   78 ++++++++++++++++++++++++++++++++++++++++++++++-
 libxfs/xfs_rmap_btree.h |    3 ++
 2 files changed, 80 insertions(+), 1 deletion(-)
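
Both helpers follow the same pattern: do an exact-match lookup first, and
treat an unexpected result (record already present on insert, record
missing on delete) as corruption.  A minimal standalone sketch of that
pattern follows, using a small sorted array in place of the btree.  All
sk_* names and the SK_ECORRUPT value are hypothetical stand-ins, not
xfsprogs APIs.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define SK_MAX_RECS	16
#define SK_ECORRUPT	117	/* stand-in for EFSCORRUPTED */

struct sk_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint64_t	owner;
};

struct sk_tree {
	struct sk_rec	recs[SK_MAX_RECS];
	int		nr;
};

/* Return the index of an exact match, or -1 if there is none. */
static int
sk_lookup_eq(const struct sk_tree *t, const struct sk_rec *key)
{
	int	i;

	for (i = 0; i < t->nr; i++)
		if (t->recs[i].startblock == key->startblock &&
		    t->recs[i].blockcount == key->blockcount &&
		    t->recs[i].owner == key->owner)
			return i;
	return -1;
}

/* Insert a record; an existing exact match is treated as corruption. */
static int
sk_insert(struct sk_tree *t, const struct sk_rec *rec)
{
	int	pos;

	if (sk_lookup_eq(t, rec) >= 0)
		return -SK_ECORRUPT;
	if (t->nr == SK_MAX_RECS)
		return -ENOSPC;
	/* Shift larger records up to keep the array sorted by startblock. */
	for (pos = t->nr;
	     pos > 0 && t->recs[pos - 1].startblock > rec->startblock;
	     pos--)
		t->recs[pos] = t->recs[pos - 1];
	t->recs[pos] = *rec;
	t->nr++;
	return 0;
}

/* Delete a record; a missing exact match is treated as corruption. */
static int
sk_delete(struct sk_tree *t, const struct sk_rec *rec)
{
	int	pos = sk_lookup_eq(t, rec);

	if (pos < 0)
		return -SK_ECORRUPT;
	memmove(&t->recs[pos], &t->recs[pos + 1],
		(size_t)(t->nr - pos - 1) * sizeof(t->recs[0]));
	t->nr--;
	return 0;
}
```

Checking before modifying means a double insert or double delete is
reported to the caller instead of silently changing the index.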


diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index 1e25285..3c13fac 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -90,13 +90,89 @@ xfs_rmap_update(
 	struct xfs_rmap_irec	*irec)
 {
 	union xfs_btree_rec	rec;
+	int			error;
+
+	trace_xfs_rmapbt_update(cur->bc_mp, cur->bc_private.a.agno,
+			irec->rm_startblock, irec->rm_blockcount,
+			irec->rm_owner, irec->rm_offset, irec->rm_flags);
 
 	rec.rmap.rm_startblock = cpu_to_be32(irec->rm_startblock);
 	rec.rmap.rm_blockcount = cpu_to_be32(irec->rm_blockcount);
 	rec.rmap.rm_owner = cpu_to_be64(irec->rm_owner);
 	rec.rmap.rm_offset = cpu_to_be64(
 			xfs_rmap_irec_offset_pack(irec));
-	return xfs_btree_update(cur, &rec);
+	error = xfs_btree_update(cur, &rec);
+	if (error)
+		trace_xfs_rmapbt_update_error(cur->bc_mp,
+				cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+int
+xfs_rmapbt_insert(
+	struct xfs_btree_cur	*rcur,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		len,
+	uint64_t		owner,
+	uint64_t		offset,
+	unsigned int		flags)
+{
+	int			i;
+	int			error;
+
+	trace_xfs_rmapbt_insert(rcur->bc_mp, rcur->bc_private.a.agno, agbno,
+			len, owner, offset, flags);
+
+	error = xfs_rmap_lookup_eq(rcur, agbno, len, owner, offset, flags, &i);
+	if (error)
+		goto done;
+	XFS_WANT_CORRUPTED_GOTO(rcur->bc_mp, i == 0, done);
+
+	rcur->bc_rec.r.rm_startblock = agbno;
+	rcur->bc_rec.r.rm_blockcount = len;
+	rcur->bc_rec.r.rm_owner = owner;
+	rcur->bc_rec.r.rm_offset = offset;
+	rcur->bc_rec.r.rm_flags = flags;
+	error = xfs_btree_insert(rcur, &i);
+	if (error)
+		goto done;
+	XFS_WANT_CORRUPTED_GOTO(rcur->bc_mp, i == 1, done);
+done:
+	if (error)
+		trace_xfs_rmapbt_insert_error(rcur->bc_mp,
+				rcur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+STATIC int
+xfs_rmapbt_delete(
+	struct xfs_btree_cur	*rcur,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		len,
+	uint64_t		owner,
+	uint64_t		offset,
+	unsigned int		flags)
+{
+	int			i;
+	int			error;
+
+	trace_xfs_rmapbt_delete(rcur->bc_mp, rcur->bc_private.a.agno, agbno,
+			len, owner, offset, flags);
+
+	error = xfs_rmap_lookup_eq(rcur, agbno, len, owner, offset, flags, &i);
+	if (error)
+		goto done;
+	XFS_WANT_CORRUPTED_GOTO(rcur->bc_mp, i == 1, done);
+
+	error = xfs_btree_delete(rcur, &i);
+	if (error)
+		goto done;
+	XFS_WANT_CORRUPTED_GOTO(rcur->bc_mp, i == 1, done);
+done:
+	if (error)
+		trace_xfs_rmapbt_delete_error(rcur->bc_mp,
+				rcur->bc_private.a.agno, error, _RET_IP_);
+	return error;
 }
 
 static int
diff --git a/libxfs/xfs_rmap_btree.h b/libxfs/xfs_rmap_btree.h
index 9d92da5..6674340 100644
--- a/libxfs/xfs_rmap_btree.h
+++ b/libxfs/xfs_rmap_btree.h
@@ -64,6 +64,9 @@ int xfs_rmap_lookup_le(struct xfs_btree_cur *cur, xfs_agblock_t bno,
 int xfs_rmap_lookup_eq(struct xfs_btree_cur *cur, xfs_agblock_t bno,
 		xfs_extlen_t len, uint64_t owner, uint64_t offset,
 		unsigned int flags, int *stat);
+int xfs_rmapbt_insert(struct xfs_btree_cur *rcur, xfs_agblock_t agbno,
+		xfs_extlen_t len, uint64_t owner, uint64_t offset,
+		unsigned int flags);
 int xfs_rmap_get_rec(struct xfs_btree_cur *cur, struct xfs_rmap_irec *irec,
 		int *stat);
 


* [PATCH 043/145] xfs: create helpers for mapping, unmapping, and converting file fork extents
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (41 preceding siblings ...)
  2016-06-17  1:35 ` [PATCH 042/145] xfs: add rmap btree insert and delete helpers Darrick J. Wong
@ 2016-06-17  1:35 ` Darrick J. Wong
  2016-06-17  1:35 ` [PATCH 044/145] xfs: create rmap update intent log items Darrick J. Wong
                   ` (101 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:35 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Create three helper functions to assist with mapping, unmapping, and
converting flag status of extents in a file's data/attr forks.  For
non-shared files we can use the _alloc, _free, and _convert functions;
when reflink arrives these functions will be augmented to deal with
shared extents.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_rmap.c |   42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)


diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index 3c13fac..47f37d7 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -1121,11 +1121,53 @@ done:
 	return error;
 }
 
+/*
+ * Convert an unwritten extent to a real extent or vice versa.
+ */
+STATIC int
+xfs_rmap_convert(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	bool			unwritten,
+	struct xfs_owner_info	*oinfo)
+{
+	return __xfs_rmap_convert(cur, bno, len, unwritten, oinfo);
+}
+
 #undef	NEW
 #undef	LEFT
 #undef	RIGHT
 #undef	PREV
 
+/*
+ * Find an extent in the rmap btree and unmap it.
+ */
+STATIC int
+xfs_rmap_unmap(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	bool			unwritten,
+	struct xfs_owner_info	*oinfo)
+{
+	return __xfs_rmap_free(cur, bno, len, unwritten, oinfo);
+}
+
+/*
+ * Find an extent in the rmap btree and map it.
+ */
+STATIC int
+xfs_rmap_map(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	bool			unwritten,
+	struct xfs_owner_info	*oinfo)
+{
+	return __xfs_rmap_alloc(cur, bno, len, unwritten, oinfo);
+}
+
 struct xfs_rmapbt_query_range_info {
 	xfs_rmapbt_query_range_fn	fn;
 	void				*priv;


* [PATCH 044/145] xfs: create rmap update intent log items
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (42 preceding siblings ...)
  2016-06-17  1:35 ` [PATCH 043/145] xfs: create helpers for mapping, unmapping, and converting file fork extents Darrick J. Wong
@ 2016-06-17  1:35 ` Darrick J. Wong
  2016-06-17  1:35 ` [PATCH 045/145] xfs: enable the xfs_defer mechanism to process rmaps to update Darrick J. Wong
                   ` (100 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:35 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Create rmap update intent/done log items to record redo information in
the log.  Because we need to roll transactions between updating the
bmbt mapping and updating the reverse mapping, we also have to track
the status of the metadata updates that will be recorded in the
post-roll transactions, just in case we crash before committing the
final transaction.  This mechanism enables log recovery to finish what
was already started.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_log_format.h |   67 ++++++++++++++++++++++++++++++++++++++++++++++-
 libxfs/xfs_rmap_btree.h |   19 +++++++++++++
 2 files changed, 84 insertions(+), 2 deletions(-)
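
The me_flags field defined below carries two things at once: a type code
in the low byte and flag bits at the top of the word.  The following
standalone sketch shows one way to pack and validate such a value.  The
macro values are copied from the patch; the sk_* helpers are hypothetical
and exist only for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Values copied from the patch. */
#define XFS_RMAP_EXTENT_MAP		1
#define XFS_RMAP_EXTENT_CONVERT		5
#define XFS_RMAP_EXTENT_FREE		8
#define XFS_RMAP_EXTENT_TYPE_MASK	0xFF

#define XFS_RMAP_EXTENT_ATTR_FORK	(1U << 31)
#define XFS_RMAP_EXTENT_BMBT_BLOCK	(1U << 30)
#define XFS_RMAP_EXTENT_UNWRITTEN	(1U << 29)

#define XFS_RMAP_EXTENT_FLAGS		(XFS_RMAP_EXTENT_TYPE_MASK | \
					 XFS_RMAP_EXTENT_ATTR_FORK | \
					 XFS_RMAP_EXTENT_BMBT_BLOCK | \
					 XFS_RMAP_EXTENT_UNWRITTEN)

/* Hypothetical helper: combine a type code with the flag bits. */
static uint32_t
sk_me_flags_pack(uint32_t type, bool attr_fork, bool unwritten)
{
	uint32_t	me_flags = type & XFS_RMAP_EXTENT_TYPE_MASK;

	if (attr_fork)
		me_flags |= XFS_RMAP_EXTENT_ATTR_FORK;
	if (unwritten)
		me_flags |= XFS_RMAP_EXTENT_UNWRITTEN;
	return me_flags;
}

/* Hypothetical helper: extract the type code from the low byte. */
static uint32_t
sk_me_flags_type(uint32_t me_flags)
{
	return me_flags & XFS_RMAP_EXTENT_TYPE_MASK;
}

/* Hypothetical helper: reject unknown bits and out-of-range type codes. */
static bool
sk_me_flags_valid(uint32_t me_flags)
{
	uint32_t	type = sk_me_flags_type(me_flags);

	if (me_flags & ~XFS_RMAP_EXTENT_FLAGS)
		return false;
	return type >= XFS_RMAP_EXTENT_MAP && type <= XFS_RMAP_EXTENT_FREE;
}
```

A check along these lines is the sort of validation log recovery would
want before acting on an intent read back from disk, though the patch
itself does not include one.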


diff --git a/libxfs/xfs_log_format.h b/libxfs/xfs_log_format.h
index e5baba3..b9627b7 100644
--- a/libxfs/xfs_log_format.h
+++ b/libxfs/xfs_log_format.h
@@ -110,7 +110,9 @@ static inline uint xlog_get_cycle(char *ptr)
 #define XLOG_REG_TYPE_COMMIT		18
 #define XLOG_REG_TYPE_TRANSHDR		19
 #define XLOG_REG_TYPE_ICREATE		20
-#define XLOG_REG_TYPE_MAX		20
+#define XLOG_REG_TYPE_RUI_FORMAT	21
+#define XLOG_REG_TYPE_RUD_FORMAT	22
+#define XLOG_REG_TYPE_MAX		22
 
 /*
  * Flags to log operation header
@@ -227,6 +229,8 @@ typedef struct xfs_trans_header {
 #define	XFS_LI_DQUOT		0x123d
 #define	XFS_LI_QUOTAOFF		0x123e
 #define	XFS_LI_ICREATE		0x123f
+#define	XFS_LI_RUI		0x1240	/* rmap update intent */
+#define	XFS_LI_RUD		0x1241
 
 #define XFS_LI_TYPE_DESC \
 	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
@@ -236,7 +240,9 @@ typedef struct xfs_trans_header {
 	{ XFS_LI_BUF,		"XFS_LI_BUF" }, \
 	{ XFS_LI_DQUOT,		"XFS_LI_DQUOT" }, \
 	{ XFS_LI_QUOTAOFF,	"XFS_LI_QUOTAOFF" }, \
-	{ XFS_LI_ICREATE,	"XFS_LI_ICREATE" }
+	{ XFS_LI_ICREATE,	"XFS_LI_ICREATE" }, \
+	{ XFS_LI_RUI,		"XFS_LI_RUI" }, \
+	{ XFS_LI_RUD,		"XFS_LI_RUD" }
 
 /*
  * Inode Log Item Format definitions.
@@ -604,6 +610,63 @@ typedef struct xfs_efd_log_format_64 {
 } xfs_efd_log_format_64_t;
 
 /*
+ * RUI/RUD (reverse mapping) log format definitions
+ */
+struct xfs_map_extent {
+	__uint64_t		me_owner;
+	__uint64_t		me_startblock;
+	__uint64_t		me_startoff;
+	__uint32_t		me_len;
+	__uint32_t		me_flags;
+};
+
+/* rmap me_flags: upper bits are flags, lower byte is type code */
+#define XFS_RMAP_EXTENT_MAP		1
+#define XFS_RMAP_EXTENT_MAP_SHARED	2
+#define XFS_RMAP_EXTENT_UNMAP		3
+#define XFS_RMAP_EXTENT_UNMAP_SHARED	4
+#define XFS_RMAP_EXTENT_CONVERT		5
+#define XFS_RMAP_EXTENT_CONVERT_SHARED	6
+#define XFS_RMAP_EXTENT_ALLOC		7
+#define XFS_RMAP_EXTENT_FREE		8
+#define XFS_RMAP_EXTENT_TYPE_MASK	0xFF
+
+#define XFS_RMAP_EXTENT_ATTR_FORK	(1U << 31)
+#define XFS_RMAP_EXTENT_BMBT_BLOCK	(1U << 30)
+#define XFS_RMAP_EXTENT_UNWRITTEN	(1U << 29)
+
+#define XFS_RMAP_EXTENT_FLAGS		(XFS_RMAP_EXTENT_TYPE_MASK | \
+					 XFS_RMAP_EXTENT_ATTR_FORK | \
+					 XFS_RMAP_EXTENT_BMBT_BLOCK | \
+					 XFS_RMAP_EXTENT_UNWRITTEN)
+
+/*
+ * This is the structure used to lay out an rui log item in the
+ * log.  The rui_extents field is a variable size array whose
+ * size is given by rui_nextents.
+ */
+struct xfs_rui_log_format {
+	__uint16_t		rui_type;	/* rui log item type */
+	__uint16_t		rui_size;	/* size of this item */
+	__uint32_t		rui_nextents;	/* # extents to rmap */
+	__uint64_t		rui_id;		/* rui identifier */
+	struct xfs_map_extent	rui_extents[1];	/* array of extents to rmap */
+};
+
+/*
+ * This is the structure used to lay out an rud log item in the
+ * log.  The rud_extents array is a variable size array whose
+ * size is given by rud_nextents.
+ */
+struct xfs_rud_log_format {
+	__uint16_t		rud_type;	/* rud log item type */
+	__uint16_t		rud_size;	/* size of this item */
+	__uint32_t		rud_nextents;	/* # of extents rmapped */
+	__uint64_t		rud_rui_id;	/* id of corresponding rui */
+	struct xfs_map_extent	rud_extents[1];	/* array of extents rmapped */
+};
+
+/*
  * Dquot Log format definitions.
  *
  * The first two fields must be the type and size fitting into
diff --git a/libxfs/xfs_rmap_btree.h b/libxfs/xfs_rmap_btree.h
index 6674340..aff60dc 100644
--- a/libxfs/xfs_rmap_btree.h
+++ b/libxfs/xfs_rmap_btree.h
@@ -87,4 +87,23 @@ int xfs_rmapbt_query_range(struct xfs_btree_cur *cur,
 		struct xfs_rmap_irec *low_rec, struct xfs_rmap_irec *high_rec,
 		xfs_rmapbt_query_range_fn fn, void *priv);
 
+enum xfs_rmap_intent_type {
+	XFS_RMAP_MAP,
+	XFS_RMAP_MAP_SHARED,
+	XFS_RMAP_UNMAP,
+	XFS_RMAP_UNMAP_SHARED,
+	XFS_RMAP_CONVERT,
+	XFS_RMAP_CONVERT_SHARED,
+	XFS_RMAP_ALLOC,
+	XFS_RMAP_FREE,
+};
+
+struct xfs_rmap_intent {
+	struct list_head			ri_list;
+	enum xfs_rmap_intent_type		ri_type;
+	__uint64_t				ri_owner;
+	int					ri_whichfork;
+	struct xfs_bmbt_irec			ri_bmap;
+};
+
 #endif	/* __XFS_RMAP_BTREE_H__ */


* [PATCH 045/145] xfs: enable the xfs_defer mechanism to process rmaps to update
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (43 preceding siblings ...)
  2016-06-17  1:35 ` [PATCH 044/145] xfs: create rmap update intent log items Darrick J. Wong
@ 2016-06-17  1:35 ` Darrick J. Wong
  2016-06-17  1:35 ` [PATCH 046/145] xfs: propagate bmap updates to rmapbt Darrick J. Wong
                   ` (99 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:35 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Connect the xfs_defer mechanism with the pieces that we'll need to
handle deferred rmap updates.  We'll wire up the existing code to
our new deferred mechanism later.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/defer_item.c |  100 +++++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_defer.h  |    1 +
 2 files changed, 101 insertions(+)
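
The diff_items callback below orders pending rmap intents by allocation
group, presumably so that AG headers are always locked in increasing
order.  Here is a small standalone sketch of the same ordering using
qsort().  The sk_* names and the fixed sk_agblklog shift are assumptions
made for illustration; the real code derives the AG number from the mount
via XFS_FSB_TO_AGNO.  The patch subtracts the two AG numbers directly,
whereas this sketch uses explicit comparisons.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xfs_rmap_intent. */
struct sk_intent {
	uint64_t	startblock;	/* filesystem block number */
};

/* Assumed log2 of the blocks-per-AG rounding; 20 is just an example. */
static unsigned int sk_agblklog = 20;

/* The AG number lives in the high bits of a filesystem block number. */
static uint32_t
sk_fsb_to_agno(uint64_t fsb)
{
	return (uint32_t)(fsb >> sk_agblklog);
}

/* Order two intents by the AG that their start block belongs to. */
static int
sk_cmp_intents(const void *a, const void *b)
{
	const struct sk_intent	*ia = a;
	const struct sk_intent	*ib = b;
	uint32_t		agno_a = sk_fsb_to_agno(ia->startblock);
	uint32_t		agno_b = sk_fsb_to_agno(ib->startblock);

	return (agno_a > agno_b) - (agno_a < agno_b);
}

/* Sort an array of intents into ascending AG order. */
static void
sk_sort_intents(struct sk_intent *items, size_t nr)
{
	qsort(items, nr, sizeof(*items), sk_cmp_intents);
}
```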


diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c
index 777875c..cd88cfd 100644
--- a/libxfs/defer_item.c
+++ b/libxfs/defer_item.c
@@ -30,6 +30,7 @@
 #include "xfs_trans.h"
 #include "xfs_bmap.h"
 #include "xfs_alloc.h"
+#include "xfs_rmap_btree.h"
 
 /* Extent Freeing */
 
@@ -126,11 +127,110 @@ const struct xfs_defer_op_type xfs_extent_free_defer_type = {
 	.cancel_item	= xfs_bmap_free_cancel_item,
 };
 
+/* Reverse Mapping */
+
+/* Sort rmap intents by AG. */
+static int
+xfs_rmap_update_diff_items(
+	void				*priv,
+	struct list_head		*a,
+	struct list_head		*b)
+{
+	struct xfs_mount		*mp = priv;
+	struct xfs_rmap_intent		*ra;
+	struct xfs_rmap_intent		*rb;
+
+	ra = container_of(a, struct xfs_rmap_intent, ri_list);
+	rb = container_of(b, struct xfs_rmap_intent, ri_list);
+	return  XFS_FSB_TO_AGNO(mp, ra->ri_bmap.br_startblock) -
+		XFS_FSB_TO_AGNO(mp, rb->ri_bmap.br_startblock);
+}
+
+/* Get an RUI. */
+STATIC void *
+xfs_rmap_update_create_intent(
+	struct xfs_trans		*tp,
+	unsigned int			count)
+{
+	return NULL;
+}
+
+/* Log rmap updates in the intent item. */
+STATIC void
+xfs_rmap_update_log_item(
+	struct xfs_trans		*tp,
+	void				*intent,
+	struct list_head		*item)
+{
+}
+
+/* Get an RUD so we can process all the deferred rmap updates. */
+STATIC void *
+xfs_rmap_update_create_done(
+	struct xfs_trans		*tp,
+	void				*intent,
+	unsigned int			count)
+{
+	return NULL;
+}
+
+/* Process a deferred rmap update. */
+STATIC int
+xfs_rmap_update_finish_item(
+	struct xfs_trans		*tp,
+	struct xfs_defer_ops		*dop,
+	struct list_head		*item,
+	void				*done_item,
+	void				**state)
+{
+	return -EFSCORRUPTED;
+}
+
+/* Clean up after processing deferred rmaps. */
+STATIC void
+xfs_rmap_update_finish_cleanup(
+	struct xfs_trans	*tp,
+	void			*state,
+	int			error)
+{
+}
+
+/* Abort all pending RUIs. */
+STATIC void
+xfs_rmap_update_abort_intent(
+	void				*intent)
+{
+}
+
+/* Cancel a deferred rmap update. */
+STATIC void
+xfs_rmap_update_cancel_item(
+	struct list_head		*item)
+{
+	struct xfs_rmap_intent		*rmap;
+
+	rmap = container_of(item, struct xfs_rmap_intent, ri_list);
+	kmem_free(rmap);
+}
+
+const struct xfs_defer_op_type xfs_rmap_update_defer_type = {
+	.type		= XFS_DEFER_OPS_TYPE_RMAP,
+	.diff_items	= xfs_rmap_update_diff_items,
+	.create_intent	= xfs_rmap_update_create_intent,
+	.abort_intent	= xfs_rmap_update_abort_intent,
+	.log_item	= xfs_rmap_update_log_item,
+	.create_done	= xfs_rmap_update_create_done,
+	.finish_item	= xfs_rmap_update_finish_item,
+	.finish_cleanup = xfs_rmap_update_finish_cleanup,
+	.cancel_item	= xfs_rmap_update_cancel_item,
+};
+
 /* Deferred Item Initialization */
 
 /* Initialize the deferred operation types. */
 void
 xfs_defer_init_types(void)
 {
+	xfs_defer_init_op_type(&xfs_rmap_update_defer_type);
 	xfs_defer_init_op_type(&xfs_extent_free_defer_type);
 }
diff --git a/libxfs/xfs_defer.h b/libxfs/xfs_defer.h
index 743fc32..920642e 100644
--- a/libxfs/xfs_defer.h
+++ b/libxfs/xfs_defer.h
@@ -51,6 +51,7 @@ struct xfs_defer_pending {
  * find all the space it needs.
  */
 enum xfs_defer_ops_type {
+	XFS_DEFER_OPS_TYPE_RMAP,
 	XFS_DEFER_OPS_TYPE_FREE,
 	XFS_DEFER_OPS_TYPE_MAX,
 };


* [PATCH 046/145] xfs: propagate bmap updates to rmapbt
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (44 preceding siblings ...)
  2016-06-17  1:35 ` [PATCH 045/145] xfs: enable the xfs_defer mechanism to process rmaps to update Darrick J. Wong
@ 2016-06-17  1:35 ` Darrick J. Wong
  2016-06-17  1:35 ` [PATCH 047/145] xfs: add rmap btree geometry feature flag Darrick J. Wong
                   ` (98 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:35 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

When we map, unmap, or convert an extent in a file's data or attr
fork, schedule a corresponding update in the rmapbt.  Previous versions
of this patch required a 1:1 correspondence between bmap and rmap,
but this is no longer true.

v2: Remove the 1:1 correspondence requirement now that we have the
ability to make interval queries against the rmapbt.  Update the
commit message to reflect the broad restructuring of this patch.
Fix the bmap shift code to adjust the rmaps correctly.

v3: Use the deferred operations code to handle redo operations
atomically and deadlock-free.  Plumb in all five rmap actions
(map, unmap, convert extent, alloc, free); we'll use the first
three now for file data, and reflink will want the last two.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_trace.h     |    2 
 libxfs/defer_item.c     |   18 +++
 libxfs/util.c           |    1 
 libxfs/xfs_bmap.c       |   56 +++++++++-
 libxfs/xfs_rmap.c       |  252 +++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rmap_btree.h |   24 ++++
 6 files changed, 344 insertions(+), 9 deletions(-)
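
In the defer_item.c hunk below, finish_item passes its opaque state
pointer through as a btree cursor, and finish_cleanup releases it once at
the end.  The standalone sketch below illustrates that "carry state
across items, clean up once" pattern.  It assumes the cursor is reused
while consecutive items target the same AG and replaced otherwise; that
behaviour, and every sk_* name, is an assumption for illustration and is
not taken from this patch.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the btree cursor and a deferred item. */
struct sk_cursor {
	int	agno;
	int	updates;
};

struct sk_item {
	int	agno;
};

/* Counts how many cursors were allocated, for illustration only. */
static int sk_cursors_opened;

/* Process one item, reusing the cursor cached in *state when possible. */
static int
sk_finish_item(const struct sk_item *item, void **state)
{
	struct sk_cursor	*cur = *state;

	/* Drop a cached cursor that points at a different AG. */
	if (cur && cur->agno != item->agno) {
		free(cur);
		cur = NULL;
	}
	if (!cur) {
		cur = calloc(1, sizeof(*cur));
		if (!cur) {
			*state = NULL;
			return -ENOMEM;
		}
		cur->agno = item->agno;
		sk_cursors_opened++;
	}
	cur->updates++;
	*state = cur;
	return 0;
}

/* Release whatever cursor is still cached after the last item. */
static void
sk_finish_cleanup(void *state)
{
	free(state);
}
```

If items arrive sorted by AG, a scheme like this allocates one cursor per
AG rather than one per item.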


diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index 55df410..00c2ccb 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -200,6 +200,8 @@
 #define trace_xfs_rmapbt_insert_error(...)	((void) 0)
 #define trace_xfs_rmapbt_delete(...)		((void) 0)
 #define trace_xfs_rmapbt_delete_error(...)	((void) 0)
+#define trace_xfs_rmap_defer(...)		((void) 0)
+#define trace_xfs_rmap_deferred(...)		((void) 0)
 
 #define trace_xfs_rmap_lookup_le_range_result(...)		((void) 0)
 #define trace_xfs_rmap_map_gtrec(...)		((void) 0)
diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c
index cd88cfd..381c969 100644
--- a/libxfs/defer_item.c
+++ b/libxfs/defer_item.c
@@ -183,7 +183,20 @@ xfs_rmap_update_finish_item(
 	void				*done_item,
 	void				**state)
 {
-	return -EFSCORRUPTED;
+	struct xfs_rmap_intent		*rmap;
+	int				error;
+
+	rmap = container_of(item, struct xfs_rmap_intent, ri_list);
+	error = xfs_rmap_finish_one(tp,
+			rmap->ri_type,
+			rmap->ri_owner, rmap->ri_whichfork,
+			rmap->ri_bmap.br_startoff,
+			rmap->ri_bmap.br_startblock,
+			rmap->ri_bmap.br_blockcount,
+			rmap->ri_bmap.br_state,
+			(struct xfs_btree_cur **)state);
+	kmem_free(rmap);
+	return error;
 }
 
 /* Clean up after processing deferred rmaps. */
@@ -193,6 +206,9 @@ xfs_rmap_update_finish_cleanup(
 	void			*state,
 	int			error)
 {
+	struct xfs_btree_cur	*rcur = state;
+
+	xfs_rmap_finish_one_cleanup(tp, rcur, error);
 }
 
 /* Abort all pending RUIs. */
diff --git a/libxfs/util.c b/libxfs/util.c
index 5b277c2..2ba6510 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -37,6 +37,7 @@
 #include "xfs_alloc.h"
 #include "xfs_bit.h"
 #include "list.h"
+#include "xfs_rmap_btree.h"
 
 /*
  * Calculate the worst case log unit reservation for a given superblock
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 453d073..e9ccec5 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -38,6 +38,7 @@
 #include "xfs_trace.h"
 #include "xfs_attr_leaf.h"
 #include "xfs_quota_defs.h"
+#include "xfs_rmap_btree.h"
 
 
 kmem_zone_t		*xfs_bmap_free_item_zone;
@@ -2170,6 +2171,11 @@ xfs_bmap_add_extent_delay_real(
 		ASSERT(0);
 	}
 
+	/* add reverse mapping */
+	error = xfs_rmap_map_extent(mp, bma->dfops, bma->ip, whichfork, new);
+	if (error)
+		goto done;
+
 	/* convert to a btree if necessary */
 	if (xfs_bmap_needs_btree(bma->ip, whichfork)) {
 		int	tmp_logflags;	/* partial log flag return val */
@@ -2706,6 +2712,11 @@ xfs_bmap_add_extent_unwritten_real(
 		ASSERT(0);
 	}
 
+	/* update reverse mappings */
+	error = xfs_rmap_convert_extent(mp, dfops, ip, XFS_DATA_FORK, new);
+	if (error)
+		goto done;
+
 	/* convert to a btree if necessary */
 	if (xfs_bmap_needs_btree(ip, XFS_DATA_FORK)) {
 		int	tmp_logflags;	/* partial log flag return val */
@@ -3098,6 +3109,11 @@ xfs_bmap_add_extent_hole_real(
 		break;
 	}
 
+	/* add reverse mapping */
+	error = xfs_rmap_map_extent(mp, bma->dfops, bma->ip, whichfork, new);
+	if (error)
+		goto done;
+
 	/* convert to a btree if necessary */
 	if (xfs_bmap_needs_btree(bma->ip, whichfork)) {
 		int	tmp_logflags;	/* partial log flag return val */
@@ -5024,6 +5040,14 @@ xfs_bmap_del_extent(
 		++*idx;
 		break;
 	}
+
+	/* remove reverse mapping */
+	if (!delay) {
+		error = xfs_rmap_unmap_extent(mp, dfops, ip, whichfork, del);
+		if (error)
+			goto done;
+	}
+
 	/*
 	 * If we need to, add to list of extents to delete.
 	 */
@@ -5561,7 +5585,8 @@ xfs_bmse_shift_one(
 	struct xfs_bmbt_rec_host	*gotp,
 	struct xfs_btree_cur		*cur,
 	int				*logflags,
-	enum shift_direction		direction)
+	enum shift_direction		direction,
+	struct xfs_defer_ops		*dfops)
 {
 	struct xfs_ifork		*ifp;
 	struct xfs_mount		*mp;
@@ -5609,9 +5634,13 @@ xfs_bmse_shift_one(
 		/* check whether to merge the extent or shift it down */
 		if (xfs_bmse_can_merge(&adj_irec, &got,
 				       offset_shift_fsb)) {
-			return xfs_bmse_merge(ip, whichfork, offset_shift_fsb,
-					      *current_ext, gotp, adj_irecp,
-					      cur, logflags);
+			error = xfs_bmse_merge(ip, whichfork, offset_shift_fsb,
+					       *current_ext, gotp, adj_irecp,
+					       cur, logflags);
+			if (error)
+				return error;
+			adj_irec = got;
+			goto update_rmap;
 		}
 	} else {
 		startoff = got.br_startoff + offset_shift_fsb;
@@ -5648,9 +5677,10 @@ update_current_ext:
 		(*current_ext)--;
 	xfs_bmbt_set_startoff(gotp, startoff);
 	*logflags |= XFS_ILOG_CORE;
+	adj_irec = got;
 	if (!cur) {
 		*logflags |= XFS_ILOG_DEXT;
-		return 0;
+		goto update_rmap;
 	}
 
 	error = xfs_bmbt_lookup_eq(cur, got.br_startoff, got.br_startblock,
@@ -5660,8 +5690,18 @@ update_current_ext:
 	XFS_WANT_CORRUPTED_RETURN(mp, i == 1);
 
 	got.br_startoff = startoff;
-	return xfs_bmbt_update(cur, got.br_startoff, got.br_startblock,
-			       got.br_blockcount, got.br_state);
+	error = xfs_bmbt_update(cur, got.br_startoff, got.br_startblock,
+			got.br_blockcount, got.br_state);
+	if (error)
+		return error;
+
+update_rmap:
+	/* update reverse mapping */
+	error = xfs_rmap_unmap_extent(mp, dfops, ip, whichfork, &adj_irec);
+	if (error)
+		return error;
+	adj_irec.br_startoff = startoff;
+	return xfs_rmap_map_extent(mp, dfops, ip, whichfork, &adj_irec);
 }
 
 /*
@@ -5789,7 +5829,7 @@ xfs_bmap_shift_extents(
 	while (nexts++ < num_exts) {
 		error = xfs_bmse_shift_one(ip, whichfork, offset_shift_fsb,
 					   &current_ext, gotp, cur, &logflags,
-					   direction);
+					   direction, dfops);
 		if (error)
 			goto del_cursor;
 		/*
diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index 47f37d7..7637903 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -34,6 +34,8 @@
 #include "xfs_rmap_btree.h"
 #include "xfs_trans_space.h"
 #include "xfs_trace.h"
+#include "xfs_bmap.h"
+#include "xfs_inode.h"
 
 /*
  * Lookup the first record less than or equal to [bno, len, owner, offset]
@@ -1210,3 +1212,253 @@ xfs_rmapbt_query_range(
 	return xfs_btree_query_range(cur, &low_brec, &high_brec,
 			xfs_rmapbt_query_range_helper, &query);
 }
+
+/* Clean up after calling xfs_rmap_finish_one. */
+void
+xfs_rmap_finish_one_cleanup(
+	struct xfs_trans	*tp,
+	struct xfs_btree_cur	*rcur,
+	int			error)
+{
+	struct xfs_buf		*agbp;
+
+	if (rcur == NULL)
+		return;
+	agbp = rcur->bc_private.a.agbp;
+	xfs_btree_del_cursor(rcur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
+	xfs_trans_brelse(tp, agbp);
+}
+
+/*
+ * Process one of the deferred rmap operations.  We pass back the
+ * btree cursor to maintain our lock on the rmapbt between calls.
+ * This saves time and eliminates a buffer deadlock between the
+ * superblock and the AGF because we'll always grab them in the same
+ * order.
+ */
+int
+xfs_rmap_finish_one(
+	struct xfs_trans		*tp,
+	enum xfs_rmap_intent_type	type,
+	__uint64_t			owner,
+	int				whichfork,
+	xfs_fileoff_t			startoff,
+	xfs_fsblock_t			startblock,
+	xfs_filblks_t			blockcount,
+	xfs_exntst_t			state,
+	struct xfs_btree_cur		**pcur)
+{
+	struct xfs_mount		*mp = tp->t_mountp;
+	struct xfs_btree_cur		*rcur;
+	struct xfs_buf			*agbp = NULL;
+	int				error = 0;
+	xfs_agnumber_t			agno;
+	struct xfs_owner_info		oinfo;
+	xfs_agblock_t			bno;
+	bool				unwritten;
+
+	agno = XFS_FSB_TO_AGNO(mp, startblock);
+	ASSERT(agno != NULLAGNUMBER);
+	bno = XFS_FSB_TO_AGBNO(mp, startblock);
+
+	trace_xfs_rmap_deferred(mp, agno, type, bno, owner, whichfork,
+			startoff, blockcount, state);
+
+	if (XFS_TEST_ERROR(false, mp,
+			XFS_ERRTAG_RMAP_FINISH_ONE,
+			XFS_RANDOM_RMAP_FINISH_ONE))
+		return -EIO;
+
+	/*
+	 * If we haven't gotten a cursor or the cursor AG doesn't match
+	 * the startblock, get one now.
+	 */
+	rcur = *pcur;
+	if (rcur != NULL && rcur->bc_private.a.agno != agno) {
+		xfs_rmap_finish_one_cleanup(tp, rcur, 0);
+		rcur = NULL;
+		*pcur = NULL;
+	}
+	if (rcur == NULL) {
+		error = xfs_free_extent_fix_freelist(tp, agno, &agbp);
+		if (error)
+			return error;
+		if (!agbp)
+			return -EFSCORRUPTED;
+
+		rcur = xfs_rmapbt_init_cursor(mp, tp, agbp, agno);
+		if (!rcur) {
+			error = -ENOMEM;
+			goto out_cur;
+		}
+	}
+	*pcur = rcur;
+
+	xfs_rmap_ino_owner(&oinfo, owner, whichfork, startoff);
+	unwritten = state == XFS_EXT_UNWRITTEN;
+	bno = XFS_FSB_TO_AGBNO(rcur->bc_mp, startblock);
+
+	switch (type) {
+	case XFS_RMAP_MAP:
+		error = xfs_rmap_map(rcur, bno, blockcount, unwritten, &oinfo);
+		break;
+	case XFS_RMAP_UNMAP:
+		error = xfs_rmap_unmap(rcur, bno, blockcount, unwritten,
+				&oinfo);
+		break;
+	case XFS_RMAP_CONVERT:
+		error = xfs_rmap_convert(rcur, bno, blockcount, !unwritten,
+				&oinfo);
+		break;
+	case XFS_RMAP_ALLOC:
+		error = __xfs_rmap_alloc(rcur, bno, blockcount, unwritten,
+				&oinfo);
+		break;
+	case XFS_RMAP_FREE:
+		error = __xfs_rmap_free(rcur, bno, blockcount, unwritten,
+				&oinfo);
+		break;
+	default:
+		ASSERT(0);
+		error = -EFSCORRUPTED;
+	}
+	return error;
+
+out_cur:
+	xfs_trans_brelse(tp, agbp);
+
+	return error;
+}
+
+/*
+ * Record a rmap intent; the list is kept sorted first by AG and then by
+ * increasing age.
+ */
+static int
+__xfs_rmap_add(
+	struct xfs_mount	*mp,
+	struct xfs_defer_ops	*dfops,
+	struct xfs_rmap_intent	*ri)
+{
+	struct xfs_rmap_intent	*new;
+
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return 0;
+
+	trace_xfs_rmap_defer(mp, XFS_FSB_TO_AGNO(mp, ri->ri_bmap.br_startblock),
+			ri->ri_type,
+			XFS_FSB_TO_AGBNO(mp, ri->ri_bmap.br_startblock),
+			ri->ri_owner, ri->ri_whichfork,
+			ri->ri_bmap.br_startoff,
+			ri->ri_bmap.br_blockcount,
+			ri->ri_bmap.br_state);
+
+	new = kmem_zalloc(sizeof(struct xfs_rmap_intent), KM_SLEEP | KM_NOFS);
+	*new = *ri;
+
+	xfs_defer_add(dfops, XFS_DEFER_OPS_TYPE_RMAP, &new->ri_list);
+	return 0;
+}
+
+/* Map an extent into a file. */
+int
+xfs_rmap_map_extent(
+	struct xfs_mount	*mp,
+	struct xfs_defer_ops	*dfops,
+	struct xfs_inode	*ip,
+	int			whichfork,
+	struct xfs_bmbt_irec	*PREV)
+{
+	struct xfs_rmap_intent	ri;
+
+	ri.ri_type = XFS_RMAP_MAP;
+	ri.ri_owner = ip->i_ino;
+	ri.ri_whichfork = whichfork;
+	ri.ri_bmap = *PREV;
+
+	return __xfs_rmap_add(mp, dfops, &ri);
+}
+
+/* Unmap an extent out of a file. */
+int
+xfs_rmap_unmap_extent(
+	struct xfs_mount	*mp,
+	struct xfs_defer_ops	*dfops,
+	struct xfs_inode	*ip,
+	int			whichfork,
+	struct xfs_bmbt_irec	*PREV)
+{
+	struct xfs_rmap_intent	ri;
+
+	ri.ri_type = XFS_RMAP_UNMAP;
+	ri.ri_owner = ip->i_ino;
+	ri.ri_whichfork = whichfork;
+	ri.ri_bmap = *PREV;
+
+	return __xfs_rmap_add(mp, dfops, &ri);
+}
+
+/* Convert a data fork extent from unwritten to real or vice versa. */
+int
+xfs_rmap_convert_extent(
+	struct xfs_mount	*mp,
+	struct xfs_defer_ops	*dfops,
+	struct xfs_inode	*ip,
+	int			whichfork,
+	struct xfs_bmbt_irec	*PREV)
+{
+	struct xfs_rmap_intent	ri;
+
+	ri.ri_type = XFS_RMAP_CONVERT;
+	ri.ri_owner = ip->i_ino;
+	ri.ri_whichfork = whichfork;
+	ri.ri_bmap = *PREV;
+
+	return __xfs_rmap_add(mp, dfops, &ri);
+}
+
+/* Schedule the creation of an rmap for non-file data. */
+int
+xfs_rmap_alloc_defer(
+	struct xfs_mount	*mp,
+	struct xfs_defer_ops	*dfops,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	__uint64_t		owner)
+{
+	struct xfs_rmap_intent	ri;
+
+	ri.ri_type = XFS_RMAP_ALLOC;
+	ri.ri_owner = owner;
+	ri.ri_whichfork = XFS_DATA_FORK;
+	ri.ri_bmap.br_startblock = XFS_AGB_TO_FSB(mp, agno, bno);
+	ri.ri_bmap.br_blockcount = len;
+	ri.ri_bmap.br_startoff = 0;
+	ri.ri_bmap.br_state = XFS_EXT_NORM;
+
+	return __xfs_rmap_add(mp, dfops, &ri);
+}
+
+/* Schedule the deletion of an rmap for non-file data. */
+int
+xfs_rmap_free_defer(
+	struct xfs_mount	*mp,
+	struct xfs_defer_ops	*dfops,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	__uint64_t		owner)
+{
+	struct xfs_rmap_intent	ri;
+
+	ri.ri_type = XFS_RMAP_FREE;
+	ri.ri_owner = owner;
+	ri.ri_whichfork = XFS_DATA_FORK;
+	ri.ri_bmap.br_startblock = XFS_AGB_TO_FSB(mp, agno, bno);
+	ri.ri_bmap.br_blockcount = len;
+	ri.ri_bmap.br_startoff = 0;
+	ri.ri_bmap.br_state = XFS_EXT_NORM;
+
+	return __xfs_rmap_add(mp, dfops, &ri);
+}
diff --git a/libxfs/xfs_rmap_btree.h b/libxfs/xfs_rmap_btree.h
index aff60dc..5df406e 100644
--- a/libxfs/xfs_rmap_btree.h
+++ b/libxfs/xfs_rmap_btree.h
@@ -106,4 +106,28 @@ struct xfs_rmap_intent {
 	struct xfs_bmbt_irec			ri_bmap;
 };
 
+/* functions for updating the rmapbt based on bmbt map/unmap operations */
+int xfs_rmap_map_extent(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
+		struct xfs_inode *ip, int whichfork,
+		struct xfs_bmbt_irec *imap);
+int xfs_rmap_unmap_extent(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
+		struct xfs_inode *ip, int whichfork,
+		struct xfs_bmbt_irec *imap);
+int xfs_rmap_convert_extent(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
+		struct xfs_inode *ip, int whichfork,
+		struct xfs_bmbt_irec *imap);
+int xfs_rmap_alloc_defer(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
+		xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
+		__uint64_t owner);
+int xfs_rmap_free_defer(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
+		xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
+		__uint64_t owner);
+
+void xfs_rmap_finish_one_cleanup(struct xfs_trans *tp,
+		struct xfs_btree_cur *rcur, int error);
+int xfs_rmap_finish_one(struct xfs_trans *tp, enum xfs_rmap_intent_type type,
+		__uint64_t owner, int whichfork, xfs_fileoff_t startoff,
+		xfs_fsblock_t startblock, xfs_filblks_t blockcount,
+		xfs_exntst_t state, struct xfs_btree_cur **pcur);
+
 #endif	/* __XFS_RMAP_BTREE_H__ */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* [PATCH 047/145] xfs: add rmap btree geometry feature flag
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (45 preceding siblings ...)
  2016-06-17  1:35 ` [PATCH 046/145] xfs: propagate bmap updates to rmapbt Darrick J. Wong
@ 2016-06-17  1:35 ` Darrick J. Wong
  2016-06-17  1:35 ` [PATCH 048/145] xfs: don't update rmapbt when fixing agfl Darrick J. Wong
                   ` (97 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:35 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

From: Dave Chinner <dchinner@redhat.com>

Add a geometry flag so that xfs_info and other userspace utilities
can tell that the filesystem is using the rmap btree.
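
A userspace tool could test for the feature roughly as follows.  This
is only a sketch: the flag value is copied from the hunk in this patch,
while geom_has_rmapbt() is a hypothetical helper, and the caller is
assumed to have already obtained the geometry flags word (for example
from the filesystem geometry ioctl).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value as added to libxfs/xfs_fs.h by this patch. */
#define XFS_FSOP_GEOM_FLAGS_RMAPBT	0x80000	/* Reverse mapping btree */

/* Hypothetical helper: does the geometry flags word advertise rmapbt? */
bool geom_has_rmapbt(uint32_t geom_flags)
{
	return (geom_flags & XFS_FSOP_GEOM_FLAGS_RMAPBT) != 0;
}
```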

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 libxfs/xfs_fs.h |    1 +
 1 file changed, 1 insertion(+)


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 1f17e1c..085ea6f 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -230,6 +230,7 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_FTYPE	0x10000	/* inode directory types */
 #define XFS_FSOP_GEOM_FLAGS_FINOBT	0x20000	/* free inode btree */
 #define XFS_FSOP_GEOM_FLAGS_SPINODES	0x40000	/* sparse inode chunks	*/
+#define XFS_FSOP_GEOM_FLAGS_RMAPBT	0x80000	/* Reverse mapping btree */
 
 /*
  * Minimum and maximum sizes need for growth checks.


* [PATCH 048/145] xfs: don't update rmapbt when fixing agfl
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (46 preceding siblings ...)
  2016-06-17  1:35 ` [PATCH 047/145] xfs: add rmap btree geometry feature flag Darrick J. Wong
@ 2016-06-17  1:35 ` Darrick J. Wong
  2016-06-17  1:35 ` [PATCH 049/145] xfs: enable the rmap btree functionality Darrick J. Wong
                   ` (96 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:35 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Allow a caller of xfs_alloc_fix_freelist to disable rmapbt updates
when fixing the AG freelist.  xfs_repair needs this during phase 5
to be able to adjust the freelist while it's reconstructing the rmap
btree; the missing entries will be added back at the very end of
phase 5 once the AGFL contents settle down.
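
The effect of the two new flags on the freelist fixup can be sketched
as below.  The flag values are copied from the libxfs/xfs_alloc.h hunk
in this patch; struct freelist_plan and plan_freelist_fixup() are
hypothetical names used only to illustrate the decision, not functions
in libxfs.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as added to libxfs/xfs_alloc.h by this patch. */
#define XFS_ALLOC_FLAG_NORMAP	0x00000004  /* don't modify the rmapbt */
#define XFS_ALLOC_FLAG_NOSHRINK	0x00000008  /* don't shrink the freelist */

/* Hypothetical summary of what a freelist fixup is allowed to do. */
struct freelist_plan {
	bool may_shrink;	/* may free surplus AGFL blocks */
	bool record_rmap;	/* AGFL changes update the rmapbt */
};

struct freelist_plan plan_freelist_fixup(int flags)
{
	struct freelist_plan plan;

	plan.may_shrink = !(flags & XFS_ALLOC_FLAG_NOSHRINK);
	plan.record_rmap = !(flags & XFS_ALLOC_FLAG_NORMAP);
	return plan;
}
```

With neither flag set, which is the normal case, both behaviours stay
enabled; a rebuild pass such as the one described above would pass both
flags.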

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_alloc.c |   40 ++++++++++++++++++++++++++--------------
 libxfs/xfs_alloc.h |    3 +++
 2 files changed, 29 insertions(+), 14 deletions(-)


diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 7d680da..2f943db 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -2088,26 +2088,38 @@ xfs_alloc_fix_freelist(
 	 * anything other than extra overhead when we need to put more blocks
 	 * back on the free list? Maybe we should only do this when space is
 	 * getting low or the AGFL is more than half full?
+	 *
+	 * The NOSHRINK flag prevents the AGFL from being shrunk if it's too
+	 * big; the NORMAP flag prevents AGFL expand/shrink operations from
+	 * updating the rmapbt.  Both flags are used in xfs_repair while we're
+	 * rebuilding the rmapbt, and neither is used by the kernel.  They're
+	 * both required to ensure that rmaps are correctly recorded for the
+	 * regenerated AGFL, bnobt, and cntbt.  See repair/phase5.c and
+	 * repair/rmap.c in xfsprogs for details.
 	 */
-	xfs_rmap_ag_owner(&targs.oinfo, XFS_RMAP_OWN_AG);
-	while (pag->pagf_flcount > need) {
-		struct xfs_buf	*bp;
+	memset(&targs, 0, sizeof(targs));
+	if (!(flags & XFS_ALLOC_FLAG_NOSHRINK)) {
+		if (!(flags & XFS_ALLOC_FLAG_NORMAP))
+			xfs_rmap_ag_owner(&targs.oinfo, XFS_RMAP_OWN_AG);
+		while (pag->pagf_flcount > need) {
+			struct xfs_buf	*bp;
 
-		error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
-		if (error)
-			goto out_agbp_relse;
-		error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1,
-					   &targs.oinfo, 1);
-		if (error)
-			goto out_agbp_relse;
-		bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
-		xfs_trans_binval(tp, bp);
+			error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
+			if (error)
+				goto out_agbp_relse;
+			error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1,
+						   &targs.oinfo, 1);
+			if (error)
+				goto out_agbp_relse;
+			bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
+			xfs_trans_binval(tp, bp);
+		}
 	}
 
-	memset(&targs, 0, sizeof(targs));
 	targs.tp = tp;
 	targs.mp = mp;
-	xfs_rmap_ag_owner(&targs.oinfo, XFS_RMAP_OWN_AG);
+	if (!(flags & XFS_ALLOC_FLAG_NORMAP))
+		xfs_rmap_ag_owner(&targs.oinfo, XFS_RMAP_OWN_AG);
 	targs.agbp = agbp;
 	targs.agno = args->agno;
 	targs.alignment = targs.minlen = targs.prod = targs.isfl = 1;
diff --git a/libxfs/xfs_alloc.h b/libxfs/xfs_alloc.h
index 7b6c66b..7b9e67e 100644
--- a/libxfs/xfs_alloc.h
+++ b/libxfs/xfs_alloc.h
@@ -54,6 +54,9 @@ typedef unsigned int xfs_alloctype_t;
  */
 #define	XFS_ALLOC_FLAG_TRYLOCK	0x00000001  /* use trylock for buffer locking */
 #define	XFS_ALLOC_FLAG_FREEING	0x00000002  /* indicate caller is freeing extents*/
+#define	XFS_ALLOC_FLAG_NORMAP	0x00000004  /* don't modify the rmapbt */
+#define	XFS_ALLOC_FLAG_NOSHRINK	0x00000008  /* don't shrink the freelist */
+
 
 /*
  * Argument structure for xfs_alloc routines.


* [PATCH 049/145] xfs: enable the rmap btree functionality
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (47 preceding siblings ...)
  2016-06-17  1:35 ` [PATCH 048/145] xfs: don't update rmapbt when fixing agfl Darrick J. Wong
@ 2016-06-17  1:35 ` Darrick J. Wong
  2016-06-17  1:36 ` [PATCH 050/145] xfs_db: display rmap btree contents Darrick J. Wong
                   ` (95 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:35 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

From: Dave Chinner <dchinner@redhat.com>

Add the feature flag to the supported matrix so that the kernel can
mount and use rmap-btree-enabled filesystems.
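
To illustrate what adding the bit to the supported mask changes, here
is a small sketch of the unknown-feature test.  The macro definitions
mirror the libxfs/xfs_format.h hunk in this patch; has_unknown_ro_compat()
is a hypothetical name chosen for this example rather than the libxfs
helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors libxfs/xfs_format.h after this patch is applied. */
#define XFS_SB_FEAT_RO_COMPAT_FINOBT   (1 << 0)		/* free inode btree */
#define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
#define XFS_SB_FEAT_RO_COMPAT_ALL \
		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
		 XFS_SB_FEAT_RO_COMPAT_RMAPBT)
#define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL

/* Hypothetical check: true if any bit outside the supported mask is set. */
bool has_unknown_ro_compat(uint32_t features)
{
	return (features & XFS_SB_FEAT_RO_COMPAT_UNKNOWN) != 0;
}
```

Before this change the RMAPBT bit fell outside the supported mask, so a
superblock carrying it would have been reported as having an unknown
feature.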

v2: Move the EXPERIMENTAL message to fill_super so it only prints once.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick.wong@oracle.com: move the experimental tag]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_format.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 2525004..7205806 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -457,7 +457,8 @@ xfs_sb_has_compat_feature(
 #define XFS_SB_FEAT_RO_COMPAT_FINOBT   (1 << 0)		/* free inode btree */
 #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
-		(XFS_SB_FEAT_RO_COMPAT_FINOBT)
+		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
+		 XFS_SB_FEAT_RO_COMPAT_RMAPBT)
 #define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL
 static inline bool
 xfs_sb_has_ro_compat_feature(


* [PATCH 050/145] xfs_db: display rmap btree contents
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (48 preceding siblings ...)
  2016-06-17  1:35 ` [PATCH 049/145] xfs: enable the rmap btree functionality Darrick J. Wong
@ 2016-06-17  1:36 ` Darrick J. Wong
  2016-06-17  1:36 ` [PATCH 051/145] xfs_db: spot check rmapbt Darrick J. Wong
                   ` (94 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:36 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

From: Dave Chinner <dchinner@redhat.com>

Teach the debugger how to dump the reverse-mapping btree contents.
Decode the extra fields in the rmapbt records and keys now that we
support reflink.
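
The field tables in this patch locate each rmapbt record field by
chaining bit offsets: every field starts where the previous one ends.
The sketch below shows that scheme on a raw record.  The bit widths
(32/32/64 for startblock/blockcount/owner, one bit per flag, 7 unused
bits, 54 offset bits) are assumptions made for this example, as is the
most-significant-bit-first layout; get_bits() is a hypothetical helper,
not the xfs_db bit extraction routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed field widths, in bits; not taken from this patch. */
enum {
	STARTBLOCK_BITLEN	= 32,
	BLOCKCOUNT_BITLEN	= 32,
	OWNER_BITLEN		= 64,
	ATTRFLAG_BITLEN		= 1,
	BMBTFLAG_BITLEN		= 1,
	EXNTFLAG_BITLEN		= 1,
	UNUSED_OFFSET_BITLEN	= 7,
	OFFSET_BITLEN		= 54
};

/* Each offset is the previous offset plus the previous width. */
enum {
	STARTBLOCK_BITOFF	= 0,
	BLOCKCOUNT_BITOFF	= STARTBLOCK_BITOFF + STARTBLOCK_BITLEN,
	OWNER_BITOFF		= BLOCKCOUNT_BITOFF + BLOCKCOUNT_BITLEN,
	ATTRFLAG_BITOFF		= OWNER_BITOFF + OWNER_BITLEN,
	BMBTFLAG_BITOFF		= ATTRFLAG_BITOFF + ATTRFLAG_BITLEN,
	EXNTFLAG_BITOFF		= BMBTFLAG_BITOFF + BMBTFLAG_BITLEN,
	UNUSED_OFFSET_BITOFF	= EXNTFLAG_BITOFF + EXNTFLAG_BITLEN,
	OFFSET_BITOFF		= UNUSED_OFFSET_BITOFF + UNUSED_OFFSET_BITLEN
};

/* Read len bits (len <= 64) starting bitoff bits into buf, MSB first. */
uint64_t get_bits(const uint8_t *buf, size_t bitoff, unsigned int len)
{
	uint64_t v = 0;
	unsigned int i;

	for (i = 0; i < len; i++) {
		size_t bit = bitoff + i;
		unsigned int b = (buf[bit / 8] >> (7 - bit % 8)) & 1;

		v = (v << 1) | b;
	}
	return v;
}
```

Under these assumptions the three flags occupy the top bits of the
64-bit offset word and the file offset occupies its low 54 bits, so a
record is 24 bytes long.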

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick: split patch, add commit message, decode extra fields]
[darrick: support overlapped interval btree fields]
[darrick: move unwritten bit to rm_offset]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/agf.c          |    6 +++
 db/btblock.c      |  100 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 db/btblock.h      |    5 +++
 db/field.c        |   19 ++++++++++
 db/field.h        |    9 +++++
 db/type.c         |    5 +++
 db/type.h         |    2 +
 man/man8/xfs_db.8 |   56 +++++++++++++++++++++++++++++-
 8 files changed, 200 insertions(+), 2 deletions(-)


diff --git a/db/agf.c b/db/agf.c
index e10526d..f4c4269 100644
--- a/db/agf.c
+++ b/db/agf.c
@@ -55,6 +55,9 @@ const field_t	agf_flds[] = {
 	{ "cntroot", FLDT_AGBLOCK,
 	  OI(OFF(roots) + XFS_BTNUM_CNT * SZ(roots[XFS_BTNUM_CNT])), C1, 0,
 	  TYP_CNTBT },
+	{ "rmaproot", FLDT_AGBLOCKNZ,
+	  OI(OFF(roots) + XFS_BTNUM_RMAP * SZ(roots[XFS_BTNUM_RMAP])), C1, 0,
+	  TYP_RMAPBT },
 	{ "levels", FLDT_UINT32D, OI(OFF(levels)), CI(XFS_BTNUM_AGF),
 	  FLD_ARRAY|FLD_SKIPALL, TYP_NONE },
 	{ "bnolevel", FLDT_UINT32D,
@@ -63,6 +66,9 @@ const field_t	agf_flds[] = {
 	{ "cntlevel", FLDT_UINT32D,
 	  OI(OFF(levels) + XFS_BTNUM_CNT * SZ(levels[XFS_BTNUM_CNT])), C1, 0,
 	  TYP_NONE },
+	{ "rmaplevel", FLDT_UINT32D,
+	  OI(OFF(levels) + XFS_BTNUM_RMAP * SZ(levels[XFS_BTNUM_RMAP])), C1, 0,
+	  TYP_NONE },
 	{ "flfirst", FLDT_UINT32D, OI(OFF(flfirst)), C1, 0, TYP_NONE },
 	{ "fllast", FLDT_UINT32D, OI(OFF(fllast)), C1, 0, TYP_NONE },
 	{ "flcount", FLDT_UINT32D, OI(OFF(flcount)), C1, 0, TYP_NONE },
diff --git a/db/btblock.c b/db/btblock.c
index 46140fc..ce59d18 100644
--- a/db/btblock.c
+++ b/db/btblock.c
@@ -96,6 +96,12 @@ struct xfs_db_btree {
 		sizeof(xfs_inobt_rec_t),
 		sizeof(__be32),
 	},
+	{	XFS_RMAP_CRC_MAGIC,
+		XFS_BTREE_SBLOCK_CRC_LEN,
+		2 * sizeof(struct xfs_rmap_key),
+		sizeof(struct xfs_rmap_rec),
+		sizeof(__be32),
+	},
 	{	0,
 	},
 };
@@ -607,3 +613,97 @@ const field_t	cntbt_rec_flds[] = {
 	{ NULL }
 };
 #undef ROFF
+
+/* RMAP btree blocks */
+const field_t	rmapbt_crc_hfld[] = {
+	{ "", FLDT_RMAPBT_CRC, OI(0), C1, 0, TYP_NONE },
+	{ NULL }
+};
+
+#define	OFF(f)	bitize(offsetof(struct xfs_btree_block, bb_ ## f))
+const field_t	rmapbt_crc_flds[] = {
+	{ "magic", FLDT_UINT32X, OI(OFF(magic)), C1, 0, TYP_NONE },
+	{ "level", FLDT_UINT16D, OI(OFF(level)), C1, 0, TYP_NONE },
+	{ "numrecs", FLDT_UINT16D, OI(OFF(numrecs)), C1, 0, TYP_NONE },
+	{ "leftsib", FLDT_AGBLOCK, OI(OFF(u.s.bb_leftsib)), C1, 0, TYP_RMAPBT },
+	{ "rightsib", FLDT_AGBLOCK, OI(OFF(u.s.bb_rightsib)), C1, 0, TYP_RMAPBT },
+	{ "bno", FLDT_DFSBNO, OI(OFF(u.s.bb_blkno)), C1, 0, TYP_RMAPBT },
+	{ "lsn", FLDT_UINT64X, OI(OFF(u.s.bb_lsn)), C1, 0, TYP_NONE },
+	{ "uuid", FLDT_UUID, OI(OFF(u.s.bb_uuid)), C1, 0, TYP_NONE },
+	{ "owner", FLDT_AGNUMBER, OI(OFF(u.s.bb_owner)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(OFF(u.s.bb_crc)), C1, 0, TYP_NONE },
+	{ "recs", FLDT_RMAPBTREC, btblock_rec_offset, btblock_rec_count,
+	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE },
+	{ "keys", FLDT_RMAPBTKEY, btblock_key_offset, btblock_key_count,
+	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE },
+	{ "ptrs", FLDT_RMAPBTPTR, btblock_ptr_offset, btblock_key_count,
+	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_RMAPBT },
+	{ NULL }
+};
+#undef OFF
+
+#define	KOFF(f)	bitize(offsetof(struct xfs_rmap_key, rm_ ## f))
+
+#define RMAPBK_STARTBLOCK_BITOFF	0
+#define RMAPBK_OWNER_BITOFF		(RMAPBK_STARTBLOCK_BITOFF + RMAPBT_STARTBLOCK_BITLEN)
+#define RMAPBK_ATTRFLAG_BITOFF		(RMAPBK_OWNER_BITOFF + RMAPBT_OWNER_BITLEN)
+#define RMAPBK_BMBTFLAG_BITOFF		(RMAPBK_ATTRFLAG_BITOFF + RMAPBT_ATTRFLAG_BITLEN)
+#define RMAPBK_EXNTFLAG_BITOFF		(RMAPBK_BMBTFLAG_BITOFF + RMAPBT_BMBTFLAG_BITLEN)
+#define RMAPBK_UNUSED_OFFSET_BITOFF	(RMAPBK_EXNTFLAG_BITOFF + RMAPBT_EXNTFLAG_BITLEN)
+#define RMAPBK_OFFSET_BITOFF		(RMAPBK_UNUSED_OFFSET_BITOFF + RMAPBT_UNUSED_OFFSET_BITLEN)
+
+#define HI_KOFF(f)	bitize(sizeof(struct xfs_rmap_key) + offsetof(struct xfs_rmap_key, rm_ ## f))
+
+#define RMAPBK_STARTBLOCKHI_BITOFF	(bitize(sizeof(struct xfs_rmap_key)))
+#define RMAPBK_OWNERHI_BITOFF		(RMAPBK_STARTBLOCKHI_BITOFF + RMAPBT_STARTBLOCK_BITLEN)
+#define RMAPBK_ATTRFLAGHI_BITOFF	(RMAPBK_OWNERHI_BITOFF + RMAPBT_OWNER_BITLEN)
+#define RMAPBK_BMBTFLAGHI_BITOFF	(RMAPBK_ATTRFLAGHI_BITOFF + RMAPBT_ATTRFLAG_BITLEN)
+#define RMAPBK_EXNTFLAGHI_BITOFF	(RMAPBK_BMBTFLAGHI_BITOFF + RMAPBT_BMBTFLAG_BITLEN)
+#define RMAPBK_UNUSED_OFFSETHI_BITOFF	(RMAPBK_EXNTFLAGHI_BITOFF + RMAPBT_EXNTFLAG_BITLEN)
+#define RMAPBK_OFFSETHI_BITOFF		(RMAPBK_UNUSED_OFFSETHI_BITOFF + RMAPBT_UNUSED_OFFSET_BITLEN)
+
+const field_t	rmapbt_key_flds[] = {
+	{ "startblock", FLDT_AGBLOCK, OI(KOFF(startblock)), C1, 0, TYP_DATA },
+	{ "owner", FLDT_INT64D, OI(KOFF(owner)), C1, 0, TYP_NONE },
+	{ "offset", FLDT_RFILEOFFD, OI(RMAPBK_OFFSET_BITOFF), C1, 0, TYP_NONE },
+	{ "attrfork", FLDT_RATTRFORKFLG, OI(RMAPBK_ATTRFLAG_BITOFF), C1, 0,
+	  TYP_NONE },
+	{ "bmbtblock", FLDT_RBMBTFLG, OI(RMAPBK_BMBTFLAG_BITOFF), C1, 0,
+	  TYP_NONE },
+	{ "startblock_hi", FLDT_AGBLOCK, OI(HI_KOFF(startblock)), C1, 0, TYP_DATA },
+	{ "owner_hi", FLDT_INT64D, OI(HI_KOFF(owner)), C1, 0, TYP_NONE },
+	{ "offset_hi", FLDT_RFILEOFFD, OI(RMAPBK_OFFSETHI_BITOFF), C1, 0, TYP_NONE },
+	{ "attrfork_hi", FLDT_RATTRFORKFLG, OI(RMAPBK_ATTRFLAGHI_BITOFF), C1, 0,
+	  TYP_NONE },
+	{ "bmbtblock_hi", FLDT_RBMBTFLG, OI(RMAPBK_BMBTFLAGHI_BITOFF), C1, 0,
+	  TYP_NONE },
+	{ NULL }
+};
+#undef HI_KOFF
+#undef KOFF
+
+#define	ROFF(f)	bitize(offsetof(struct xfs_rmap_rec, rm_ ## f))
+
+#define RMAPBT_STARTBLOCK_BITOFF	0
+#define RMAPBT_BLOCKCOUNT_BITOFF	(RMAPBT_STARTBLOCK_BITOFF + RMAPBT_STARTBLOCK_BITLEN)
+#define RMAPBT_OWNER_BITOFF		(RMAPBT_BLOCKCOUNT_BITOFF + RMAPBT_BLOCKCOUNT_BITLEN)
+#define RMAPBT_ATTRFLAG_BITOFF		(RMAPBT_OWNER_BITOFF + RMAPBT_OWNER_BITLEN)
+#define RMAPBT_BMBTFLAG_BITOFF		(RMAPBT_ATTRFLAG_BITOFF + RMAPBT_ATTRFLAG_BITLEN)
+#define RMAPBT_EXNTFLAG_BITOFF		(RMAPBT_BMBTFLAG_BITOFF + RMAPBT_BMBTFLAG_BITLEN)
+#define RMAPBT_UNUSED_OFFSET_BITOFF	(RMAPBT_EXNTFLAG_BITOFF + RMAPBT_EXNTFLAG_BITLEN)
+#define RMAPBT_OFFSET_BITOFF		(RMAPBT_UNUSED_OFFSET_BITOFF + RMAPBT_UNUSED_OFFSET_BITLEN)
+
+const field_t	rmapbt_rec_flds[] = {
+	{ "startblock", FLDT_AGBLOCK, OI(RMAPBT_STARTBLOCK_BITOFF), C1, 0, TYP_DATA },
+	{ "blockcount", FLDT_REXTLEN, OI(RMAPBT_BLOCKCOUNT_BITOFF), C1, 0, TYP_NONE },
+	{ "owner", FLDT_INT64D, OI(RMAPBT_OWNER_BITOFF), C1, 0, TYP_NONE },
+	{ "offset", FLDT_RFILEOFFD, OI(RMAPBT_OFFSET_BITOFF), C1, 0, TYP_NONE },
+	{ "extentflag", FLDT_REXTFLG, OI(RMAPBT_EXNTFLAG_BITOFF), C1, 0,
+	  TYP_NONE },
+	{ "attrfork", FLDT_RATTRFORKFLG, OI(RMAPBT_ATTRFLAG_BITOFF), C1, 0,
+	  TYP_NONE },
+	{ "bmbtblock", FLDT_RBMBTFLG, OI(RMAPBT_BMBTFLAG_BITOFF), C1, 0,
+	  TYP_NONE },
+	{ NULL }
+};
+#undef ROFF
diff --git a/db/btblock.h b/db/btblock.h
index 228eb36..35299b4 100644
--- a/db/btblock.h
+++ b/db/btblock.h
@@ -54,4 +54,9 @@ extern const struct field	cntbt_crc_hfld[];
 extern const struct field	cntbt_key_flds[];
 extern const struct field	cntbt_rec_flds[];
 
+extern const struct field	rmapbt_crc_flds[];
+extern const struct field	rmapbt_crc_hfld[];
+extern const struct field	rmapbt_key_flds[];
+extern const struct field	rmapbt_rec_flds[];
+
 extern int	btblock_size(void *obj, int startoff, int idx);
diff --git a/db/field.c b/db/field.c
index 843c385..58728a9 100644
--- a/db/field.c
+++ b/db/field.c
@@ -153,6 +153,16 @@ const ftattr_t	ftattrtab[] = {
 	{ FLDT_CHARNS, "charns", fp_charns, NULL, SI(bitsz(char)), 0, NULL,
 	  NULL },
 	{ FLDT_CHARS, "chars", fp_num, "%c", SI(bitsz(char)), 0, NULL, NULL },
+	{ FLDT_REXTLEN, "rextlen", fp_num, "%u", SI(RMAPBT_BLOCKCOUNT_BITLEN),
+	  0, NULL, NULL },
+	{ FLDT_RFILEOFFD, "rfileoffd", fp_num, "%llu", SI(RMAPBT_OFFSET_BITLEN),
+	  0, NULL, NULL },
+	{ FLDT_REXTFLG, "rextflag", fp_num, "%u", SI(RMAPBT_EXNTFLAG_BITLEN), 0,
+	  NULL, NULL },
+	{ FLDT_RATTRFORKFLG, "rattrforkflag", fp_num, "%u", SI(RMAPBT_ATTRFLAG_BITLEN), 0,
+	  NULL, NULL },
+	{ FLDT_RBMBTFLG, "rbmbtflag", fp_num, "%u", SI(RMAPBT_BMBTFLAG_BITLEN), 0,
+	  NULL, NULL },
 	{ FLDT_CNTBT, "cntbt", NULL, (char *)cntbt_flds, btblock_size, FTARG_SIZE,
 	  NULL, cntbt_flds },
 	{ FLDT_CNTBT_CRC, "cntbt", NULL, (char *)cntbt_crc_flds, btblock_size,
@@ -164,6 +174,15 @@ const ftattr_t	ftattrtab[] = {
 	{ FLDT_CNTBTREC, "cntbtrec", fp_sarray, (char *)cntbt_rec_flds,
 	  SI(bitsz(xfs_alloc_rec_t)), 0, NULL, cntbt_rec_flds },
 
+	{ FLDT_RMAPBT_CRC, "rmapbt", NULL, (char *)rmapbt_crc_flds, btblock_size,
+	  FTARG_SIZE, NULL, rmapbt_crc_flds },
+	{ FLDT_RMAPBTKEY, "rmapbtkey", fp_sarray, (char *)rmapbt_key_flds,
+	  SI(bitize(2 * sizeof(struct xfs_rmap_key))), 0, NULL, rmapbt_key_flds },
+	{ FLDT_RMAPBTPTR, "rmapbtptr", fp_num, "%u",
+	  SI(bitsz(xfs_rmap_ptr_t)), 0, fa_agblock, NULL },
+	{ FLDT_RMAPBTREC, "rmapbtrec", fp_sarray, (char *)rmapbt_rec_flds,
+	  SI(bitsz(struct xfs_rmap_rec)), 0, NULL, rmapbt_rec_flds },
+
 /* CRC field */
 	{ FLDT_CRC, "crc", fp_crc, "%#x (%s)", SI(bitsz(__uint32_t)),
 	  0, NULL, NULL },
diff --git a/db/field.h b/db/field.h
index 11aebc3..47f562a 100644
--- a/db/field.h
+++ b/db/field.h
@@ -75,11 +75,20 @@ typedef enum fldt	{
 	FLDT_CFSBLOCK,
 	FLDT_CHARNS,
 	FLDT_CHARS,
+	FLDT_REXTLEN,
+	FLDT_RFILEOFFD,
+	FLDT_REXTFLG,
+	FLDT_RATTRFORKFLG,
+	FLDT_RBMBTFLG,
 	FLDT_CNTBT,
 	FLDT_CNTBT_CRC,
 	FLDT_CNTBTKEY,
 	FLDT_CNTBTPTR,
 	FLDT_CNTBTREC,
+	FLDT_RMAPBT_CRC,
+	FLDT_RMAPBTKEY,
+	FLDT_RMAPBTPTR,
+	FLDT_RMAPBTREC,
 
 	/* CRC field type */
 	FLDT_CRC,
diff --git a/db/type.c b/db/type.c
index 1da7ee1..dd192a1 100644
--- a/db/type.c
+++ b/db/type.c
@@ -58,6 +58,7 @@ static const typ_t	__typtab[] = {
 	{ TYP_BMAPBTD, "bmapbtd", handle_struct, bmapbtd_hfld, NULL },
 	{ TYP_BNOBT, "bnobt", handle_struct, bnobt_hfld, NULL },
 	{ TYP_CNTBT, "cntbt", handle_struct, cntbt_hfld, NULL },
+	{ TYP_RMAPBT, NULL },
 	{ TYP_DATA, "data", handle_block, NULL, NULL },
 	{ TYP_DIR2, "dir2", handle_struct, dir2_hfld, NULL },
 	{ TYP_DQBLK, "dqblk", handle_struct, dqblk_hfld, NULL },
@@ -88,6 +89,8 @@ static const typ_t	__typtab_crc[] = {
 		&xfs_allocbt_buf_ops },
 	{ TYP_CNTBT, "cntbt", handle_struct, cntbt_crc_hfld,
 		&xfs_allocbt_buf_ops },
+	{ TYP_RMAPBT, "rmapbt", handle_struct, rmapbt_crc_hfld,
+		&xfs_rmapbt_buf_ops },
 	{ TYP_DATA, "data", handle_block, NULL, NULL },
 	{ TYP_DIR2, "dir3", handle_struct, dir3_hfld,
 		&xfs_dir3_db_buf_ops },
@@ -124,6 +127,8 @@ static const typ_t	__typtab_spcrc[] = {
 		&xfs_allocbt_buf_ops },
 	{ TYP_CNTBT, "cntbt", handle_struct, cntbt_crc_hfld,
 		&xfs_allocbt_buf_ops },
+	{ TYP_RMAPBT, "rmapbt", handle_struct, rmapbt_crc_hfld,
+		&xfs_rmapbt_buf_ops },
 	{ TYP_DATA, "data", handle_block, NULL, NULL },
 	{ TYP_DIR2, "dir3", handle_struct, dir3_hfld,
 		&xfs_dir3_db_buf_ops },
diff --git a/db/type.h b/db/type.h
index d9583e5..1bef8e6 100644
--- a/db/type.h
+++ b/db/type.h
@@ -24,7 +24,7 @@ struct field;
 typedef enum typnm
 {
 	TYP_AGF, TYP_AGFL, TYP_AGI, TYP_ATTR, TYP_BMAPBTA,
-	TYP_BMAPBTD, TYP_BNOBT, TYP_CNTBT, TYP_DATA,
+	TYP_BMAPBTD, TYP_BNOBT, TYP_CNTBT, TYP_RMAPBT, TYP_DATA,
 	TYP_DIR2, TYP_DQBLK, TYP_INOBT, TYP_INODATA, TYP_INODE,
 	TYP_LOG, TYP_RTBITMAP, TYP_RTSUMMARY, TYP_SB, TYP_SYMLINK,
 	TYP_TEXT, TYP_FINOBT, TYP_NONE
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index ff8f862..a380f78 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -673,7 +673,7 @@ If no argument is given, show the current data type.
 The possible data types are:
 .BR agf ", " agfl ", " agi ", " attr ", " bmapbta ", " bmapbtd ,
 .BR bnobt ", " cntbt ", " data ", " dir ", " dir2 ", " dqblk ,
-.BR inobt ", " inode ", " log ", " rtbitmap ", " rtsummary ,
+.BR inobt ", " inode ", " log ", " rmapbt ", " rtbitmap ", " rtsummary ,
 .BR sb ", " symlink " and " text .
 See the TYPES section below for more information on these data types.
 .TP
@@ -1658,6 +1658,60 @@ use
 .BR xfs_logprint (8)
 instead.
 .TP
+.B rmapbt
+There is one set of filesystem blocks forming the reverse mapping Btree for
+each allocation group. The root block of this Btree is designated by the
+.B rmaproot
+field in the corresponding AGF block.  The blocks are linked to sibling left
+and right blocks at each level, as well as by pointers from parent to child
+blocks.  Each block has the following fields:
+.RS 1.4i
+.PD 0
+.TP 1.2i
+.B magic
+RMAP block magic number, 0x524d4233 ('RMB3').
+.TP
+.B level
+level number of this block, 0 is a leaf.
+.TP
+.B numrecs
+number of data entries in the block.
+.TP
+.B leftsib
+left (logically lower) sibling block, 0 if none.
+.TP
+.B rightsib
+right (logically higher) sibling block, 0 if none.
+.TP
+.B recs
+[leaf blocks only] array of reverse mapping records. Each record contains
+.BR startblock ,
+.BR blockcount ,
+.BR owner ,
+.BR offset ,
+.BR attr_fork ,
+.BR bmbt_block ,
+and
+.BR unwritten .
+.TP
+.B keys
+[non-leaf blocks only] array of double-key records. The first ("low") key
+contains the first value of each block in the level below this one. The second
+("high") key contains the largest key that can be used to identify any record
+in the subtree. Each record contains
+.BR startblock ,
+.BR owner ,
+.BR offset ,
+.BR attr_fork ,
+and
+.BR bmbt_block .
+.TP
+.B ptrs
+[non-leaf blocks only] array of child block pointers. Each pointer is a
+block number within the allocation group to the next level in the Btree.
+.PD
+.RE
+.TP
 .B rtbitmap
 If the filesystem has a realtime subvolume, then the
 .B rbmino

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 051/145] xfs_db: spot check rmapbt
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (49 preceding siblings ...)
  2016-06-17  1:36 ` [PATCH 050/145] xfs_db: display rmap btree contents Darrick J. Wong
@ 2016-06-17  1:36 ` Darrick J. Wong
  2016-06-17  1:36 ` [PATCH 052/145] xfs_db: copy the rmap btree Darrick J. Wong
                   ` (93 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:36 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Check the rmapbt for obvious errors.  Thorough checks, such as
comparing the primary metadata against the rmapbt contents, are left
to newer tools like xfs_repair.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
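Illustration only, not part of the patch: the leaf-level ordering check
described above can be sketched in isolation.  The helper name
count_out_of_order() and the plain uint32_t array are assumptions made
for this sketch; the real code reads big-endian rm_startblock fields
from the btree block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: count records whose start block is lower than
 * the last in-order start block seen.  Equal start blocks are not
 * flagged, matching the '<' comparison used by scanfunc_rmap().
 */
static size_t
count_out_of_order(const uint32_t *startblocks, size_t nrecs)
{
	uint32_t	lastblock = 0;
	size_t		bad = 0;
	size_t		i;

	assert(startblocks != NULL || nrecs == 0);
	for (i = 0; i < nrecs; i++) {
		if (startblocks[i] < lastblock)
			bad++;
		else
			lastblock = startblocks[i];
	}
	return bad;
}
```

As in the patch, lastblock only advances on in-order records, so a
single stray low record does not cause every following record to be
reported.
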
 db/check.c |   85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 84 insertions(+), 1 deletion(-)


diff --git a/db/check.c b/db/check.c
index 0871ed7..2964b5f 100644
--- a/db/check.c
+++ b/db/check.c
@@ -44,7 +44,7 @@ typedef enum {
 	DBM_FREE1,	DBM_FREE2,	DBM_FREELIST,	DBM_INODE,
 	DBM_LOG,	DBM_MISSING,	DBM_QUOTA,	DBM_RTBITMAP,
 	DBM_RTDATA,	DBM_RTFREE,	DBM_RTSUM,	DBM_SB,
-	DBM_SYMLINK,	DBM_BTFINO,
+	DBM_SYMLINK,	DBM_BTFINO,	DBM_BTRMAP,
 	DBM_NDBM
 } dbm_t;
 
@@ -171,6 +171,7 @@ static const char	*typename[] = {
 	"sb",
 	"symlink",
 	"btfino",
+	"btrmap",
 	NULL
 };
 static int		verbose;
@@ -349,6 +350,9 @@ static void		scanfunc_ino(struct xfs_btree_block *block, int level,
 static void		scanfunc_fino(struct xfs_btree_block *block, int level,
 				     struct xfs_agf *agf, xfs_agblock_t bno,
 				     int isroot);
+static void		scanfunc_rmap(struct xfs_btree_block *block, int level,
+				     struct xfs_agf *agf, xfs_agblock_t bno,
+				     int isroot);
 static void		set_dbmap(xfs_agnumber_t agno, xfs_agblock_t agbno,
 				  xfs_extlen_t len, dbm_t type,
 				  xfs_agnumber_t c_agno, xfs_agblock_t c_agbno);
@@ -1050,6 +1054,7 @@ blocktrash_f(
 		   (1 << DBM_RTSUM) |
 		   (1 << DBM_SYMLINK) |
 		   (1 << DBM_BTFINO) |
+		   (1 << DBM_BTRMAP) |
 		   (1 << DBM_SB);
 	while ((c = getopt(argc, argv, "0123n:o:s:t:x:y:z")) != EOF) {
 		switch (c) {
@@ -3899,6 +3904,12 @@ scan_ag(
 		be32_to_cpu(agf->agf_roots[XFS_BTNUM_CNT]),
 		be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNT]),
 		1, scanfunc_cnt, TYP_CNTBT);
+	if (agf->agf_roots[XFS_BTNUM_RMAP]) {
+		scan_sbtree(agf,
+			be32_to_cpu(agf->agf_roots[XFS_BTNUM_RMAP]),
+			be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]),
+			1, scanfunc_rmap, TYP_RMAPBT);
+	}
 	scan_sbtree(agf,
 		be32_to_cpu(agi->agi_root),
 		be32_to_cpu(agi->agi_level),
@@ -4650,6 +4661,78 @@ scanfunc_fino(
 }
 
 static void
+scanfunc_rmap(
+	struct xfs_btree_block	*block,
+	int			level,
+	struct xfs_agf		*agf,
+	xfs_agblock_t		bno,
+	int			isroot)
+{
+	xfs_agnumber_t		seqno = be32_to_cpu(agf->agf_seqno);
+	int			i;
+	xfs_rmap_ptr_t		*pp;
+	struct xfs_rmap_rec	*rp;
+	xfs_agblock_t		lastblock;
+
+	if (be32_to_cpu(block->bb_magic) != XFS_RMAP_CRC_MAGIC) {
+		dbprintf(_("bad magic # %#x in rmapbt block %u/%u\n"),
+			be32_to_cpu(block->bb_magic), seqno, bno);
+		serious_error++;
+		return;
+	}
+	if (be16_to_cpu(block->bb_level) != level) {
+		if (!sflag)
+			dbprintf(_("expected level %d got %d in rmapbt block "
+				 "%u/%u\n"),
+				level, be16_to_cpu(block->bb_level), seqno, bno);
+		error++;
+	}
+	if (!isroot) {
+		fdblocks++;
+		agfbtreeblks++;
+	}
+	set_dbmap(seqno, bno, 1, DBM_BTRMAP, seqno, bno);
+	if (level == 0) {
+		if (be16_to_cpu(block->bb_numrecs) > mp->m_rmap_mxr[0] ||
+		    (isroot == 0 && be16_to_cpu(block->bb_numrecs) < mp->m_rmap_mnr[0])) {
+			dbprintf(_("bad btree nrecs (%u, min=%u, max=%u) in "
+				 "rmapbt block %u/%u\n"),
+				be16_to_cpu(block->bb_numrecs), mp->m_rmap_mnr[0],
+				mp->m_rmap_mxr[0], seqno, bno);
+			serious_error++;
+			return;
+		}
+		rp = XFS_RMAP_REC_ADDR(block, 1);
+		lastblock = 0;
+		for (i = 0; i < be16_to_cpu(block->bb_numrecs); i++) {
+			if (be32_to_cpu(rp[i].rm_startblock) < lastblock) {
+				dbprintf(_(
+		"out-of-order rmap btree record %d (%u %u) block %u/%u\n"),
+					 i, be32_to_cpu(rp[i].rm_startblock),
+					 be32_to_cpu(rp[i].rm_blockcount),
+					 be32_to_cpu(agf->agf_seqno), bno);
+			} else {
+				lastblock = be32_to_cpu(rp[i].rm_startblock);
+			}
+		}
+		return;
+	}
+	if (be16_to_cpu(block->bb_numrecs) > mp->m_rmap_mxr[1] ||
+	    (isroot == 0 && be16_to_cpu(block->bb_numrecs) < mp->m_rmap_mnr[1])) {
+		dbprintf(_("bad btree nrecs (%u, min=%u, max=%u) in rmapbt "
+			 "block %u/%u\n"),
+			be16_to_cpu(block->bb_numrecs), mp->m_rmap_mnr[1],
+			mp->m_rmap_mxr[1], seqno, bno);
+		serious_error++;
+		return;
+	}
+	pp = XFS_RMAP_PTR_ADDR(block, 1, mp->m_rmap_mxr[1]);
+	for (i = 0; i < be16_to_cpu(block->bb_numrecs); i++)
+		scan_sbtree(agf, be32_to_cpu(pp[i]), level, 0, scanfunc_rmap,
+				TYP_RMAPBT);
+}
+
+static void
 set_dbmap(
 	xfs_agnumber_t	agno,
 	xfs_agblock_t	agbno,


* [PATCH 052/145] xfs_db: copy the rmap btree
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (50 preceding siblings ...)
  2016-06-17  1:36 ` [PATCH 051/145] xfs_db: spot check rmapbt Darrick J. Wong
@ 2016-06-17  1:36 ` Darrick J. Wong
  2016-06-17  1:36 ` [PATCH 053/145] xfs_growfs: report rmapbt presence Darrick J. Wong
                   ` (92 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:36 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Copy the rmapbt when we're metadumping the filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
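Illustration only, not part of the patch: before walking the rmapbt,
copy_rmap_btree() rejects an implausible root block or level count.
The sketch below isolates that test.  SKETCH_BTREE_MAXLEVELS is an
assumed stand-in for XFS_BTREE_MAXLEVELS, and the helper name is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for XFS_BTREE_MAXLEVELS; the real value is in libxfs. */
#define SKETCH_BTREE_MAXLEVELS	8

/*
 * Hypothetical helper: return true if the root block number and level
 * count read from the AGF look sane enough to start a tree walk.  The
 * comparisons mirror the ones in the patch.
 */
static bool
rmap_root_is_plausible(uint32_t root, uint32_t levels, uint32_t agblocks)
{
	assert(agblocks > 0);
	if (root == 0 || root > agblocks)
		return false;
	if (levels >= SKETCH_BTREE_MAXLEVELS)
		return false;
	return true;
}
```

Note that the patch returns 1 (keep going) rather than 0 when this test
fails, so a damaged rmapbt root only skips the rmapbt copy and does not
abort the whole metadump.
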
 db/metadump.c |   74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)


diff --git a/db/metadump.c b/db/metadump.c
index d7ff6e5..609a5d7 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -543,6 +543,78 @@ copy_free_cnt_btree(
 	return scan_btree(agno, root, levels, TYP_CNTBT, agf, scanfunc_freesp);
 }
 
+static int
+scanfunc_rmapbt(
+	struct xfs_btree_block	*block,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno,
+	int			level,
+	typnm_t			btype,
+	void			*arg)
+{
+	xfs_rmap_ptr_t		*pp;
+	int			i;
+	int			numrecs;
+
+	if (level == 0)
+		return 1;
+
+	numrecs = be16_to_cpu(block->bb_numrecs);
+	if (numrecs > mp->m_rmap_mxr[1]) {
+		if (show_warnings)
+			print_warning("invalid numrecs (%u) in %s block %u/%u",
+				numrecs, typtab[btype].name, agno, agbno);
+		return 1;
+	}
+
+	pp = XFS_RMAP_PTR_ADDR(block, 1, mp->m_rmap_mxr[1]);
+	for (i = 0; i < numrecs; i++) {
+		if (!valid_bno(agno, be32_to_cpu(pp[i]))) {
+			if (show_warnings)
+				print_warning("invalid block number (%u/%u) "
+					"in %s block %u/%u",
+					agno, be32_to_cpu(pp[i]),
+					typtab[btype].name, agno, agbno);
+			continue;
+		}
+		if (!scan_btree(agno, be32_to_cpu(pp[i]), level, btype, arg,
+				scanfunc_rmapbt))
+			return 0;
+	}
+	return 1;
+}
+
+static int
+copy_rmap_btree(
+	xfs_agnumber_t	agno,
+	struct xfs_agf	*agf)
+{
+	xfs_agblock_t	root;
+	int		levels;
+
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return 1;
+
+	root = be32_to_cpu(agf->agf_roots[XFS_BTNUM_RMAP]);
+	levels = be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]);
+
+	/* validate root and levels before processing the tree */
+	if (root == 0 || root > mp->m_sb.sb_agblocks) {
+		if (show_warnings)
+			print_warning("invalid block number (%u) in rmapbt "
+					"root in agf %u", root, agno);
+		return 1;
+	}
+	if (levels >= XFS_BTREE_MAXLEVELS) {
+		if (show_warnings)
+			print_warning("invalid level (%u) in rmapbt root "
+					"in agf %u", levels, agno);
+		return 1;
+	}
+
+	return scan_btree(agno, root, levels, TYP_RMAPBT, agf, scanfunc_rmapbt);
+}
+
 /* filename and extended attribute obfuscation routines */
 
 struct name_ent {
@@ -2451,6 +2523,8 @@ scan_ag(
 			goto pop_out;
 		if (!copy_free_cnt_btree(agno, agf))
 			goto pop_out;
+		if (!copy_rmap_btree(agno, agf))
+			goto pop_out;
 	}
 
 	/* copy inode btrees and the inodes and their associated metadata */


* [PATCH 053/145] xfs_growfs: report rmapbt presence
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (51 preceding siblings ...)
  2016-06-17  1:36 ` [PATCH 052/145] xfs_db: copy the rmap btree Darrick J. Wong
@ 2016-06-17  1:36 ` Darrick J. Wong
  2016-06-17  1:36 ` [PATCH 054/145] xfs_io: add rmap-finish error injection type Darrick J. Wong
                   ` (91 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:36 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 growfs/xfs_growfs.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)


diff --git a/growfs/xfs_growfs.c b/growfs/xfs_growfs.c
index 56315f9..2b46480 100644
--- a/growfs/xfs_growfs.c
+++ b/growfs/xfs_growfs.c
@@ -58,12 +58,13 @@ report_info(
 	int		cimode,
 	int		ftype_enabled,
 	int		finobt_enabled,
-	int		spinodes)
+	int		spinodes,
+	int		rmapbt_enabled)
 {
 	printf(_(
 	    "meta-data=%-22s isize=%-6u agcount=%u, agsize=%u blks\n"
 	    "         =%-22s sectsz=%-5u attr=%u, projid32bit=%u\n"
-	    "         =%-22s crc=%-8u finobt=%u spinodes=%u\n"
+	    "         =%-22s crc=%-8u finobt=%u spinodes=%u rmapbt=%u\n"
 	    "data     =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n"
 	    "         =%-22s sunit=%-6u swidth=%u blks\n"
 	    "naming   =version %-14u bsize=%-6u ascii-ci=%d ftype=%d\n"
@@ -73,7 +74,7 @@ report_info(
 
 		mntpoint, geo.inodesize, geo.agcount, geo.agblocks,
 		"", geo.sectsize, attrversion, projid32bit,
-		"", crcs_enabled, finobt_enabled, spinodes,
+		"", crcs_enabled, finobt_enabled, spinodes, rmapbt_enabled,
 		"", geo.blocksize, (unsigned long long)geo.datablocks,
 			geo.imaxpct,
 		"", geo.sunit, geo.swidth,
@@ -127,6 +128,7 @@ main(int argc, char **argv)
 	int			ftype_enabled = 0;
 	int			finobt_enabled;	/* free inode btree */
 	int			spinodes;
+	int			rmapbt_enabled;
 
 	progname = basename(argv[0]);
 	setlocale(LC_ALL, "");
@@ -250,11 +252,13 @@ main(int argc, char **argv)
 	ftype_enabled = geo.flags & XFS_FSOP_GEOM_FLAGS_FTYPE ? 1 : 0;
 	finobt_enabled = geo.flags & XFS_FSOP_GEOM_FLAGS_FINOBT ? 1 : 0;
 	spinodes = geo.flags & XFS_FSOP_GEOM_FLAGS_SPINODES ? 1 : 0;
+	rmapbt_enabled = geo.flags & XFS_FSOP_GEOM_FLAGS_RMAPBT ? 1 : 0;
 	if (nflag) {
 		report_info(geo, datadev, isint, logdev, rtdev,
 				lazycount, dirversion, logversion,
 				attrversion, projid32bit, crcs_enabled, ci,
-				ftype_enabled, finobt_enabled, spinodes);
+				ftype_enabled, finobt_enabled, spinodes,
+				rmapbt_enabled);
 		exit(0);
 	}
 
@@ -292,7 +296,7 @@ main(int argc, char **argv)
 	report_info(geo, datadev, isint, logdev, rtdev,
 			lazycount, dirversion, logversion,
 			attrversion, projid32bit, crcs_enabled, ci, ftype_enabled,
-			finobt_enabled, spinodes);
+			finobt_enabled, spinodes, rmapbt_enabled);
 
 	ddsize = xi.dsize;
 	dlsize = ( xi.logBBsize? xi.logBBsize :


* [PATCH 054/145] xfs_io: add rmap-finish error injection type
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (52 preceding siblings ...)
  2016-06-17  1:36 ` [PATCH 053/145] xfs_growfs: report rmapbt presence Darrick J. Wong
@ 2016-06-17  1:36 ` Darrick J. Wong
  2016-06-17  1:36 ` [PATCH 055/145] xfs_logprint: support rmap redo items Darrick J. Wong
                   ` (90 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:36 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Add XFS_ERRTAG_RMAP_FINISH_ONE to the types of errors we can inject.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
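Illustration only, not part of the patch: error_tag() maps a name given
on the xfs_io command line to a numeric tag by scanning a table that
ends in a NULL-named sentinel.  The sketch below shows that lookup with
a two-entry table.  The tag numbers are the ones in the hunk below; the
sketch_ names and the "return 0 when nothing matches" convention are
assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sketch_errtag {
	int		tag;
	const char	*name;
};

/* Subset of the table; a NULL name terminates it, like XFS_ERRTAG_MAX. */
static const struct sketch_errtag sketch_tags[] = {
	{ 22, "free_extent" },
	{ 23, "rmap_finish_one" },
	{ 24, NULL },
};

/* Hypothetical helper: return the tag for name, or 0 if none matches. */
static int
sketch_error_tag(const char *name)
{
	size_t	i;

	assert(name != NULL);
	for (i = 0; sketch_tags[i].name != NULL; i++) {
		if (strcmp(sketch_tags[i].name, name) == 0)
			return sketch_tags[i].tag;
	}
	return 0;
}
```

Because the sentinel carries the MAX value, adding a tag means adding
one table entry and bumping the sentinel, which is what the hunk does.
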
 io/inject.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


diff --git a/io/inject.c b/io/inject.c
index 12e0fb3..16ac925 100644
--- a/io/inject.c
+++ b/io/inject.c
@@ -76,7 +76,9 @@ error_tag(char *name)
 		{ XFS_ERRTAG_BMAPIFORMAT,		"bmapifmt" },
 #define XFS_ERRTAG_FREE_EXTENT				22
 		{ XFS_ERRTAG_FREE_EXTENT,		"free_extent" },
-#define XFS_ERRTAG_MAX                                  23
+#define XFS_ERRTAG_RMAP_FINISH_ONE			23
+		{ XFS_ERRTAG_RMAP_FINISH_ONE,		"rmap_finish_one" },
+#define XFS_ERRTAG_MAX                                  24
 		{ XFS_ERRTAG_MAX,			NULL }
 	};
 	int	count;


* [PATCH 055/145] xfs_logprint: support rmap redo items
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (53 preceding siblings ...)
  2016-06-17  1:36 ` [PATCH 054/145] xfs_io: add rmap-finish error injection type Darrick J. Wong
@ 2016-06-17  1:36 ` Darrick J. Wong
  2016-06-17  1:36 ` [PATCH 056/145] xfs_repair: use rmap btree data to check block types Darrick J. Wong
                   ` (89 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:36 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Print reverse mapping update redo items.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
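Illustration only, not part of the patch: the RUI decoder sizes its
buffer as one fixed header, which already embeds the first extent, plus
(nextents - 1) further extents.  The sketch below reproduces that
arithmetic.  The sketch_ structures are simplified stand-ins; their
field layout is assumed, not copied from libxfs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; field names follow the patch, layout is assumed. */
struct sketch_map_extent {
	uint64_t	me_owner;
	uint64_t	me_startblock;
	uint64_t	me_startoff;
	uint32_t	me_len;
	uint32_t	me_flags;
};

struct sketch_rui_log_format {
	uint16_t			rui_type;
	uint16_t			rui_size;
	uint32_t			rui_nextents;
	uint64_t			rui_id;
	struct sketch_map_extent	rui_extents[1];	/* first extent */
};

/* Hypothetical helper: bytes needed for an item with nextents extents. */
static size_t
sketch_rui_len(uint32_t nextents)
{
	/* nextents == 0 would make the unsigned subtraction below wrap. */
	assert(nextents >= 1);
	return sizeof(struct sketch_rui_log_format) +
		(size_t)(nextents - 1) * sizeof(struct sketch_map_extent);
}
```

The assertion is an addition of this sketch.  The hunk as posted
computes (nextents - 1) on an unsigned value with no visible guard, so
an item claiming zero extents may deserve a separate check.
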
 logprint/log_misc.c      |   11 +++
 logprint/log_print_all.c |   12 ++++
 logprint/log_redo.c      |  152 ++++++++++++++++++++++++++++++++++++++++++++++
 logprint/logprint.h      |    5 ++
 4 files changed, 180 insertions(+)


diff --git a/logprint/log_misc.c b/logprint/log_misc.c
index 57d397c..479fc14 100644
--- a/logprint/log_misc.c
+++ b/logprint/log_misc.c
@@ -993,6 +993,17 @@ xlog_print_record(
 					be32_to_cpu(op_head->oh_len));
 			break;
 		    }
+		    case XFS_LI_RUI: {
+			skip = xlog_print_trans_rui(&ptr,
+					be32_to_cpu(op_head->oh_len),
+					continued);
+			break;
+		    }
+		    case XFS_LI_RUD: {
+			skip = xlog_print_trans_rud(&ptr,
+					be32_to_cpu(op_head->oh_len));
+			break;
+		    }
 		    case XFS_LI_QUOTAOFF: {
 			skip = xlog_print_trans_qoff(&ptr,
 					be32_to_cpu(op_head->oh_len));
diff --git a/logprint/log_print_all.c b/logprint/log_print_all.c
index 4d92c3b..0fe354b 100644
--- a/logprint/log_print_all.c
+++ b/logprint/log_print_all.c
@@ -408,6 +408,12 @@ xlog_recover_print_logitem(
 	case XFS_LI_EFI:
 		xlog_recover_print_efi(item);
 		break;
+	case XFS_LI_RUD:
+		xlog_recover_print_rud(item);
+		break;
+	case XFS_LI_RUI:
+		xlog_recover_print_rui(item);
+		break;
 	case XFS_LI_DQUOT:
 		xlog_recover_print_dquot(item);
 		break;
@@ -442,6 +448,12 @@ xlog_recover_print_item(
 	case XFS_LI_EFI:
 		printf("EFI");
 		break;
+	case XFS_LI_RUD:
+		printf("RUD");
+		break;
+	case XFS_LI_RUI:
+		printf("RUI");
+		break;
 	case XFS_LI_DQUOT:
 		printf("DQ ");
 		break;
diff --git a/logprint/log_redo.c b/logprint/log_redo.c
index d60cc1b..717dccd 100644
--- a/logprint/log_redo.c
+++ b/logprint/log_redo.c
@@ -229,3 +229,155 @@ xlog_recover_print_efd(
 		f->efd_size, f->efd_nextents,
 		(unsigned long long)f->efd_efi_id);
 }
+
+/* Reverse Mapping Update Items */
+
+static int
+xfs_rui_copy_format(
+	char			  *buf,
+	uint			  len,
+	struct xfs_rui_log_format *dst_fmt,
+	int			  continued)
+{
+	uint nextents = ((struct xfs_rui_log_format *)buf)->rui_nextents;
+	uint dst_len = sizeof(struct xfs_rui_log_format) +
+			(nextents - 1) * sizeof(struct xfs_map_extent);
+
+	if (len == dst_len || continued) {
+		memcpy((char *)dst_fmt, buf, len);
+		return 0;
+	}
+	fprintf(stderr, _("%s: bad size of RUI format: %u; expected %u; nextents = %u\n"),
+		progname, len, dst_len, nextents);
+	return 1;
+}
+
+int
+xlog_print_trans_rui(
+	char			**ptr,
+	uint			src_len,
+	int			continued)
+{
+	struct xfs_rui_log_format	*src_f, *f = NULL;
+	uint			dst_len;
+	uint			nextents;
+	struct xfs_map_extent	*ex;
+	int			i;
+	int			error = 0;
+	int			core_size;
+
+	core_size = offsetof(struct xfs_rui_log_format, rui_extents);
+
+	/*
+	 * memmove to ensure 8-byte alignment for the long longs in
+	 * struct xfs_rui_log_format structure
+	 */
+	src_f = malloc(src_len);
+	if (src_f == NULL) {
+		fprintf(stderr, _("%s: %s: malloc failed\n"),
+			progname, __func__);
+		exit(1);
+	}
+	memmove((char*)src_f, *ptr, src_len);
+	*ptr += src_len;
+
+	/* convert to native format */
+	nextents = src_f->rui_nextents;
+	dst_len = sizeof(struct xfs_rui_log_format) +
+			(nextents - 1) * sizeof(struct xfs_map_extent);
+
+	if (continued && src_len < core_size) {
+		printf(_("RUI: Not enough data to decode further\n"));
+		error = 1;
+		goto error;
+	}
+
+	f = malloc(dst_len);
+	if (f == NULL) {
+		fprintf(stderr, _("%s: %s: malloc failed\n"),
+			progname, __func__);
+		exit(1);
+	}
+	if (xfs_rui_copy_format((char *)src_f, src_len, f, continued)) {
+		error = 1;
+		goto error;
+	}
+
+	printf(_("RUI:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
+		f->rui_size, f->rui_nextents, (unsigned long long)f->rui_id);
+
+	if (continued) {
+		printf(_("RUI extent data skipped (CONTINUE set, no space)\n"));
+		goto error;
+	}
+
+	ex = f->rui_extents;
+	for (i=0; i < f->rui_nextents; i++) {
+		printf("(s: 0x%llx, l: %d, own: %lld, off: %llu, f: 0x%x) ",
+			(unsigned long long)ex->me_startblock, ex->me_len,
+			(long long)ex->me_owner,
+			(unsigned long long)ex->me_startoff, ex->me_flags);
+		printf("\n");
+		ex++;
+	}
+error:
+	free(src_f);
+	free(f);
+	return error;
+}
+
+void
+xlog_recover_print_rui(
+	struct xlog_recover_item	*item)
+{
+	char				*src_f;
+	uint				src_len;
+
+	src_f = item->ri_buf[0].i_addr;
+	src_len = item->ri_buf[0].i_len;
+
+	xlog_print_trans_rui(&src_f, src_len, 0);
+}
+
+int
+xlog_print_trans_rud(
+	char				**ptr,
+	uint				len)
+{
+	struct xfs_rud_log_format	*f;
+	struct xfs_rud_log_format	lbuf;
+
+	/* size without extents at end */
+	uint core_size = sizeof(struct xfs_rud_log_format) -
+		sizeof(struct xfs_map_extent);
+
+	/*
+	 * memmove to ensure 8-byte alignment for the long longs in
+	 * struct xfs_rud_log_format
+	 */
+	memmove(&lbuf, *ptr, MIN(core_size, len));
+	f = &lbuf;
+	*ptr += len;
+	if (len >= core_size) {
+		printf(_("RUD:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
+			f->rud_size, f->rud_nextents,
+			(unsigned long long)f->rud_rui_id);
+
+		/* don't print extents as they are not used */
+
+		return 0;
+	} else {
+		printf(_("RUD: Not enough data to decode further\n"));
+		return 1;
+	}
+}
+
+void
+xlog_recover_print_rud(
+	struct xlog_recover_item	*item)
+{
+	char				*f;
+
+	f = item->ri_buf[0].i_addr;
+	xlog_print_trans_rud(&f, sizeof(struct xfs_rud_log_format));
+}
diff --git a/logprint/logprint.h b/logprint/logprint.h
index 517e852..0c03c08 100644
--- a/logprint/logprint.h
+++ b/logprint/logprint.h
@@ -51,4 +51,9 @@ extern void xlog_recover_print_efi(xlog_recover_item_t *item);
 extern int xlog_print_trans_efd(char **ptr, uint len);
 extern void xlog_recover_print_efd(xlog_recover_item_t *item);
 
+extern int xlog_print_trans_rui(char **ptr, uint src_len, int continued);
+extern void xlog_recover_print_rui(struct xlog_recover_item *item);
+extern int xlog_print_trans_rud(char **ptr, uint len);
+extern void xlog_recover_print_rud(struct xlog_recover_item *item);
+
 #endif	/* LOGPRINT_H */


* [PATCH 056/145] xfs_repair: use rmap btree data to check block types
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (54 preceding siblings ...)
  2016-06-17  1:36 ` [PATCH 055/145] xfs_logprint: support rmap redo items Darrick J. Wong
@ 2016-06-17  1:36 ` Darrick J. Wong
  2016-06-17  1:36 ` [PATCH 057/145] xfs_repair: fix fino_bno calculation when rmapbt is enabled Darrick J. Wong
                   ` (88 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:36 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

From: Dave Chinner <dchinner@redhat.com>

Use the rmap btree to pre-populate the block type information so that
when repair iterates the primary metadata, we can confirm the block
type.

Ensure that we remove the flag bits from blockcount before using the
length field.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick.wong@oracle.com: split patch, strip flag bits from blockcount]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
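Illustration only, not part of the patch: the pre-population step
described above turns each rmap record's owner into a provisional
"seen by rmap" block state, which the later primary-metadata scan then
confirms.  The XR_E_ values below are the ones in the repair/incore.h
hunk; the SKETCH_OWN_ owner codes are assumed values standing in for
the XFS_RMAP_OWN_ constants, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* State values as defined in the repair/incore.h hunk of this patch. */
#define XR_E_UNKNOWN	0
#define XR_E_INUSE1	8
#define XR_E_INUSE_FS1	9
#define XR_E_INO1	10
#define XR_E_FS_MAP1	11
#define XR_E_BAD_STATE	12

/* Assumed special owner codes; the real XFS_RMAP_OWN_* live in libxfs. */
enum sketch_owner {
	SKETCH_OWN_NULL		= -1,
	SKETCH_OWN_FS		= -3,
	SKETCH_OWN_LOG		= -4,
	SKETCH_OWN_AG		= -5,
	SKETCH_OWN_INOBT	= -6,
	SKETCH_OWN_INODES	= -7,
};

/*
 * Hypothetical helper: pick the provisional state for a block that the
 * rmapbt says belongs to owner.  Only unclassified blocks change state.
 */
static int
sketch_prepopulate(int cur_state, int64_t owner)
{
	assert(cur_state >= XR_E_UNKNOWN && cur_state <= XR_E_BAD_STATE);
	if (cur_state != XR_E_UNKNOWN)
		return cur_state;

	switch (owner) {
	case SKETCH_OWN_FS:
	case SKETCH_OWN_LOG:
		return XR_E_INUSE_FS1;
	case SKETCH_OWN_AG:
	case SKETCH_OWN_INOBT:
		return XR_E_FS_MAP1;
	case SKETCH_OWN_INODES:
		return XR_E_INO1;
	case SKETCH_OWN_NULL:
		return XR_E_UNKNOWN;	/* still unknown */
	default:
		return XR_E_INUSE1;	/* treated as file data */
	}
}
```

In the patch itself, blocks that already carry a state are cross-checked
against the owner and a warning is printed on mismatch; the sketch
leaves that part out and simply returns the existing state.
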
 repair/dinode.c     |    6 +
 repair/incore.h     |   16 +-
 repair/scan.c       |  356 ++++++++++++++++++++++++++++++++++++++++++++++++---
 repair/xfs_repair.c |    2 
 4 files changed, 351 insertions(+), 29 deletions(-)


diff --git a/repair/dinode.c b/repair/dinode.c
index cbd4305..c1e60ff 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -744,6 +744,7 @@ _("%s fork in ino %" PRIu64 " claims dup extent, "
 _("%s fork in ino %" PRIu64 " claims free block %" PRIu64 "\n"),
 					forkname, ino, (__uint64_t) b);
 				/* fall through ... */
+			case XR_E_INUSE1:	/* seen by rmap */
 			case XR_E_UNKNOWN:
 				set_bmap_ext(agno, agbno, blen, XR_E_INUSE);
 				break;
@@ -751,6 +752,11 @@ _("%s fork in ino %" PRIu64 " claims free block %" PRIu64 "\n"),
 			case XR_E_BAD_STATE:
 				do_error(_("bad state in block map %" PRIu64 "\n"), b);
 
+			case XR_E_FS_MAP1:
+			case XR_E_INO1:
+			case XR_E_INUSE_FS1:
+				do_warn(_("rmap claims metadata use!\n"));
+				/* fall through */
 			case XR_E_FS_MAP:
 			case XR_E_INO:
 			case XR_E_INUSE_FS:
diff --git a/repair/incore.h b/repair/incore.h
index c92475e..bc0810b 100644
--- a/repair/incore.h
+++ b/repair/incore.h
@@ -102,17 +102,11 @@ typedef struct rt_extent_tree_node  {
 #define XR_E_MULT	5	/* extent is multiply referenced */
 #define XR_E_INO	6	/* extent used by inodes (inode blocks) */
 #define XR_E_FS_MAP	7	/* extent used by fs space/inode maps */
-#define XR_E_BAD_STATE	8
-
-/* extent states, in 64 bit word chunks */
-#define	XR_E_UNKNOWN_LL		0x0000000000000000LL
-#define	XR_E_FREE1_LL		0x1111111111111111LL
-#define	XR_E_FREE_LL		0x2222222222222222LL
-#define	XR_E_INUSE_LL		0x3333333333333333LL
-#define	XR_E_INUSE_FS_LL	0x4444444444444444LL
-#define	XR_E_MULT_LL		0x5555555555555555LL
-#define	XR_E_INO_LL		0x6666666666666666LL
-#define	XR_E_FS_MAP_LL		0x7777777777777777LL
+#define XR_E_INUSE1	8	/* used block (marked by rmap btree) */
+#define XR_E_INUSE_FS1	9	/* used by fs ag header or log (rmap btree) */
+#define XR_E_INO1	10	/* used by inodes (marked by rmap btree) */
+#define XR_E_FS_MAP1	11	/* used by fs space/inode maps (rmap btree) */
+#define XR_E_BAD_STATE	12
 
 /* separate state bit, OR'ed into high (4th) bit of ex_state field */
 
diff --git a/repair/scan.c b/repair/scan.c
index 964ff06..eb23685 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -44,6 +44,7 @@ struct aghdr_cnts {
 	__uint32_t	agicount;
 	__uint32_t	agifreecount;
 	__uint64_t	fdblocks;
+	__uint64_t	usedblocks;
 	__uint64_t	ifreecount;
 	__uint32_t	fibtfreecount;
 };
@@ -308,6 +309,13 @@ _("bad back (left) sibling pointer (saw %llu should be NULL (0))\n"
 		pthread_mutex_lock(&ag_locks[agno].lock);
 		state = get_bmap(agno, agbno);
 		switch (state) {
+		case XR_E_INUSE1:
+			/*
+			 * block was claimed as in use data by the rmap
+			 * btree, but has not been found in the data extent
+			 * map for the inode. That means this bmbt block hasn't
+			 * yet been claimed as in use, which means -it's ours-
+			 */
 		case XR_E_UNKNOWN:
 		case XR_E_FREE1:
 		case XR_E_FREE:
@@ -764,6 +772,272 @@ ino_issparse(
 	return xfs_inobt_is_sparse_disk(rp, offset);
 }
 
+static void
+scan_rmapbt(
+	struct xfs_btree_block	*block,
+	int			level,
+	xfs_agblock_t		bno,
+	xfs_agnumber_t		agno,
+	int			suspect,
+	int			isroot,
+	__uint32_t		magic,
+	void			*priv)
+{
+	struct aghdr_cnts	*agcnts = priv;
+	const char		*name = "rmap";
+	int			i;
+	xfs_rmap_ptr_t		*pp;
+	struct xfs_rmap_rec	*rp;
+	int			hdr_errors = 0;
+	int			numrecs;
+	int			state;
+	xfs_agblock_t		lastblock = 0;
+	int64_t			lastowner = 0;
+	int64_t			lastoffset = 0;
+
+	if (magic != XFS_RMAP_CRC_MAGIC) {
+		name = "(unknown)";
+		assert(0);
+	}
+
+	if (be32_to_cpu(block->bb_magic) != magic) {
+		do_warn(_("bad magic # %#x in bt%s block %d/%d\n"),
+			be32_to_cpu(block->bb_magic), name, agno, bno);
+		hdr_errors++;
+		if (suspect)
+			return;
+	}
+
+	/*
+	 * All RMAP btree blocks except the roots are freed for a
+	 * fully empty filesystem, thus they are counted towards the
+	 * free data block counter.
+	 */
+	if (!isroot) {
+		agcnts->agfbtreeblks++;
+		agcnts->fdblocks++;
+	}
+
+	if (be16_to_cpu(block->bb_level) != level) {
+		do_warn(_("expected level %d got %d in bt%s block %d/%d\n"),
+			level, be16_to_cpu(block->bb_level), name, agno, bno);
+		hdr_errors++;
+		if (suspect)
+			return;
+	}
+
+	/* check for btree blocks multiply claimed */
+	state = get_bmap(agno, bno);
+	if (!(state == XR_E_UNKNOWN || state == XR_E_FS_MAP1))  {
+		set_bmap(agno, bno, XR_E_MULT);
+		do_warn(
+_("%s rmap btree block claimed (state %d), agno %d, bno %d, suspect %d\n"),
+				name, state, agno, bno, suspect);
+		return;
+	}
+	set_bmap(agno, bno, XR_E_FS_MAP);
+
+	numrecs = be16_to_cpu(block->bb_numrecs);
+	if (level == 0) {
+		if (numrecs > mp->m_rmap_mxr[0])  {
+			numrecs = mp->m_rmap_mxr[0];
+			hdr_errors++;
+		}
+		if (isroot == 0 && numrecs < mp->m_rmap_mnr[0])  {
+			numrecs = mp->m_rmap_mnr[0];
+			hdr_errors++;
+		}
+
+		if (hdr_errors) {
+			do_warn(
+	_("bad btree nrecs (%u, min=%u, max=%u) in bt%s block %u/%u\n"),
+				be16_to_cpu(block->bb_numrecs),
+				mp->m_rmap_mnr[0], mp->m_rmap_mxr[0],
+				name, agno, bno);
+			suspect++;
+		}
+
+		rp = XFS_RMAP_REC_ADDR(block, 1);
+		for (i = 0; i < numrecs; i++) {
+			xfs_agblock_t		b, end;
+			xfs_extlen_t		len, blen;
+			int64_t			owner, offset;
+
+			b = be32_to_cpu(rp[i].rm_startblock);
+			len = be32_to_cpu(rp[i].rm_blockcount);
+			owner = be64_to_cpu(rp[i].rm_owner);
+			offset = be64_to_cpu(rp[i].rm_offset);
+			end = b + len;
+
+			/* Make sure agbno & len make sense. */
+			if (!verify_agbno(mp, agno, b)) {
+				do_warn(
+	_("invalid start block %u in record %u of %s btree block %u/%u\n"),
+					b, i, name, agno, bno);
+				continue;
+			}
+			if (len == 0 || !verify_agbno(mp, agno, end - 1)) {
+				do_warn(
+	_("invalid length %u in record %u of %s btree block %u/%u\n"),
+					len, i, name, agno, bno);
+				continue;
+			}
+
+			/* Look for impossible owners. */
+			if (!(owner > 0 || (owner > XFS_RMAP_OWN_MIN &&
+					    owner <= XFS_RMAP_OWN_FS)))
+				do_warn(
+	_("invalid owner in rmap btree record %d (%"PRId64" %u) block %u/%u\n"),
+						i, owner, len, agno, bno);
+
+			/* Check for out of order records. */
+			if (i == 0) {
+advance:
+				lastblock = b;
+				lastowner = owner;
+				lastoffset = offset;
+			} else {
+				bool bad;
+
+				bad = b <= lastblock;
+				if (bad)
+					do_warn(
+	_("out-of-order rmap btree record %d (%u %"PRId64" %"PRIx64" %u) block %u/%u\n"),
+					i, b, owner, offset, len, agno, bno);
+				else
+					goto advance;
+			}
+
+			/* Check for block owner collisions. */
+			for ( ; b < end; b += blen)  {
+				state = get_bmap_ext(agno, b, end, &blen);
+				switch (state) {
+				case XR_E_UNKNOWN:
+					switch (owner) {
+					case XFS_RMAP_OWN_FS:
+					case XFS_RMAP_OWN_LOG:
+						set_bmap(agno, b, XR_E_INUSE_FS1);
+						break;
+					case XFS_RMAP_OWN_AG:
+					case XFS_RMAP_OWN_INOBT:
+						set_bmap(agno, b, XR_E_FS_MAP1);
+						break;
+					case XFS_RMAP_OWN_INODES:
+						set_bmap(agno, b, XR_E_INO1);
+						break;
+					case XFS_RMAP_OWN_NULL:
+						/* still unknown */
+						break;
+					default:
+						/* file data */
+						set_bmap(agno, b, XR_E_INUSE1);
+						break;
+					}
+					break;
+				case XR_E_INUSE_FS:
+					if (owner == XFS_RMAP_OWN_FS ||
+					    owner == XFS_RMAP_OWN_LOG)
+						break;
+					do_warn(
+_("Static meta block (%d,%d-%d) mismatch in %s tree, state - %d,%" PRIx64 "\n"),
+						agno, b, b + blen - 1,
+						name, state, owner);
+					break;
+				case XR_E_FS_MAP:
+					if (owner == XFS_RMAP_OWN_AG ||
+					    owner == XFS_RMAP_OWN_INOBT)
+						break;
+					do_warn(
+_("AG meta block (%d,%d-%d) mismatch in %s tree, state - %d,%" PRIx64 "\n"),
+						agno, b, b + blen - 1,
+						name, state, owner);
+					break;
+				case XR_E_INO:
+					if (owner == XFS_RMAP_OWN_INODES)
+						break;
+					do_warn(
+_("inode block (%d,%d-%d) mismatch in %s tree, state - %d,%" PRIx64 "\n"),
+						agno, b, b + blen - 1,
+						name, state, owner);
+					break;
+				case XR_E_INUSE:
+					if (owner >= 0 &&
+					    owner < mp->m_sb.sb_dblocks)
+						break;
+					do_warn(
+_("in use block (%d,%d-%d) mismatch in %s tree, state - %d,%" PRIx64 "\n"),
+						agno, b, b + blen - 1,
+						name, state, owner);
+					break;
+				case XR_E_FREE1:
+				case XR_E_FREE:
+					/*
+					 * May be on the AGFL. If not, they'll
+					 * be caught later.
+					 */
+					break;
+				default:
+					do_warn(
+_("unknown block (%d,%d-%d) mismatch on %s tree, state - %d,%" PRIx64 "\n"),
+						agno, b, b + blen - 1,
+						name, state, owner);
+					break;
+				}
+			}
+		}
+		return;
+	}
+
+	/*
+	 * interior record
+	 */
+	pp = XFS_RMAP_PTR_ADDR(block, 1, mp->m_rmap_mxr[1]);
+
+	if (numrecs > mp->m_rmap_mxr[1])  {
+		numrecs = mp->m_rmap_mxr[1];
+		hdr_errors++;
+	}
+	if (isroot == 0 && numrecs < mp->m_rmap_mnr[1])  {
+		numrecs = mp->m_rmap_mnr[1];
+		hdr_errors++;
+	}
+
+	/*
+	 * don't pass bogus tree flag down further if this block
+	 * looked ok.  bail out if two levels in a row look bad.
+	 */
+	if (hdr_errors)  {
+		do_warn(
+	_("bad btree nrecs (%u, min=%u, max=%u) in bt%s block %u/%u\n"),
+			be16_to_cpu(block->bb_numrecs),
+			mp->m_rmap_mnr[1], mp->m_rmap_mxr[1],
+			name, agno, bno);
+		if (suspect)
+			return;
+		suspect++;
+	} else if (suspect) {
+		suspect = 0;
+	}
+
+	for (i = 0; i < numrecs; i++)  {
+		xfs_agblock_t		bno = be32_to_cpu(pp[i]);
+
+		/*
+		 * XXX - put sibling detection right here.
+		 * we know our sibling chain is good.  So as we go,
+		 * we check the entry before and after each entry.
+		 * If either of the entries references a different block,
+		 * check the sibling pointer.  If there's a sibling
+		 * pointer mismatch, try and extract as much data
+		 * as possible.
+		 */
+		if (bno != 0 && verify_agbno(mp, agno, bno)) {
+			scan_sbtree(bno, level, agno, suspect, scan_rmapbt, 0,
+				    magic, priv, &xfs_rmapbt_buf_ops);
+		}
+	}
+}
+
 /*
  * The following helpers are to help process and validate individual on-disk
  * inode btree records. We have two possible inode btrees with slightly
@@ -976,20 +1250,27 @@ scan_single_ino_chunk(
 
 			agbno = XFS_AGINO_TO_AGBNO(mp, ino + j);
 			state = get_bmap(agno, agbno);
-			if (state == XR_E_UNKNOWN)  {
-				set_bmap(agno, agbno, XR_E_INO);
-			} else if (state == XR_E_INUSE_FS && agno == 0 &&
-				   ino + j >= first_prealloc_ino &&
-				   ino + j < last_prealloc_ino)  {
+			switch (state) {
+			case XR_E_INO:
+				break;
+			case XR_E_UNKNOWN:
+			case XR_E_INO1:	/* seen by rmap */
 				set_bmap(agno, agbno, XR_E_INO);
-			} else  {
+				break;
+			case XR_E_INUSE_FS:
+			case XR_E_INUSE_FS1:
+				if (agno == 0 &&
+				    ino + j >= first_prealloc_ino &&
+				    ino + j < last_prealloc_ino) {
+					set_bmap(agno, agbno, XR_E_INO);
+					break;
+				}
+				/* fall through */
+			default:
+				/* XXX - maybe should mark block a duplicate */
 				do_warn(
 _("inode chunk claims used block, inobt block - agno %d, bno %d, inopb %d\n"),
 					agno, agbno, mp->m_sb.sb_inopblock);
-				/*
-				 * XXX - maybe should mark
-				 * block a duplicate
-				 */
 				return ++suspect;
 			}
 		}
@@ -1099,19 +1380,35 @@ _("sparse inode chunk claims inode block, finobt block - agno %d, bno %d, inopb
 				continue;
 			}
 
-			if (state == XR_E_INO) {
-				continue;
-			} else if ((state == XR_E_UNKNOWN) ||
-				   (state == XR_E_INUSE_FS && agno == 0 &&
-				    ino + j >= first_prealloc_ino &&
-				    ino + j < last_prealloc_ino)) {
+			switch (state) {
+			case XR_E_INO:
+				break;
+			case XR_E_INO1:	/* seen by rmap */
+				set_bmap(agno, agbno, XR_E_INO);
+				break;
+			case XR_E_UNKNOWN:
 				do_warn(
 _("inode chunk claims untracked block, finobt block - agno %d, bno %d, inopb %d\n"),
 					agno, agbno, mp->m_sb.sb_inopblock);
 
 				set_bmap(agno, agbno, XR_E_INO);
 				suspect++;
-			} else {
+				break;
+			case XR_E_INUSE_FS:
+			case XR_E_INUSE_FS1:
+				if (agno == 0 &&
+				    ino + j >= first_prealloc_ino &&
+				    ino + j < last_prealloc_ino) {
+					do_warn(
+_("inode chunk claims untracked block, finobt block - agno %d, bno %d, inopb %d\n"),
+						agno, agbno, mp->m_sb.sb_inopblock);
+
+					set_bmap(agno, agbno, XR_E_INO);
+					suspect++;
+					break;
+				}
+				/* fall through */
+			default:
 				do_warn(
 _("inode chunk claims used block, finobt block - agno %d, bno %d, inopb %d\n"),
 					agno, agbno, mp->m_sb.sb_inopblock);
@@ -1280,6 +1577,7 @@ scan_inobt(
 	 */
 	state = get_bmap(agno, bno);
 	switch (state)  {
+	case XR_E_FS_MAP1: /* already been seen by an rmap scan */
 	case XR_E_UNKNOWN:
 	case XR_E_FREE1:
 	case XR_E_FREE:
@@ -1420,7 +1718,7 @@ scan_freelist(
 	if (XFS_SB_BLOCK(mp) != XFS_AGFL_BLOCK(mp) &&
 	    XFS_AGF_BLOCK(mp) != XFS_AGFL_BLOCK(mp) &&
 	    XFS_AGI_BLOCK(mp) != XFS_AGFL_BLOCK(mp))
-		set_bmap(agno, XFS_AGFL_BLOCK(mp), XR_E_FS_MAP);
+		set_bmap(agno, XFS_AGFL_BLOCK(mp), XR_E_INUSE_FS);
 
 	if (be32_to_cpu(agf->agf_flcount) == 0)
 		return;
@@ -1505,6 +1803,19 @@ validate_agf(
 			bno, agno);
 	}
 
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		bno = be32_to_cpu(agf->agf_roots[XFS_BTNUM_RMAP]);
+		if (bno != 0 && verify_agbno(mp, agno, bno)) {
+			scan_sbtree(bno,
+				    be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]),
+				    agno, 0, scan_rmapbt, 1, XFS_RMAP_CRC_MAGIC,
+				    agcnts, &xfs_rmapbt_buf_ops);
+		} else  {
+			do_warn(_("bad agbno %u for rmapbt root, agno %d\n"),
+				bno, agno);
+		}
+	}
+
 	if (be32_to_cpu(agf->agf_freeblks) != agcnts->agffreeblks) {
 		do_warn(_("agf_freeblks %u, counted %u in ag %u\n"),
 			be32_to_cpu(agf->agf_freeblks), agcnts->agffreeblks, agno);
@@ -1520,6 +1831,7 @@ validate_agf(
 		do_warn(_("agf_btreeblks %u, counted %" PRIu64 " in ag %u\n"),
 			be32_to_cpu(agf->agf_btreeblks), agcnts->agfbtreeblks, agno);
 	}
+
 }
 
 static void
@@ -1759,6 +2071,7 @@ scan_ags(
 	__uint64_t	fdblocks = 0;
 	__uint64_t	icount = 0;
 	__uint64_t	ifreecount = 0;
+	__uint64_t	usedblocks = 0;
 	xfs_agnumber_t	i;
 	work_queue_t	wq;
 
@@ -1781,6 +2094,7 @@ scan_ags(
 		fdblocks += agcnts[i].fdblocks;
 		icount += agcnts[i].agicount;
 		ifreecount += agcnts[i].ifreecount;
+		usedblocks += agcnts[i].usedblocks;
 	}
 
 	free(agcnts);
@@ -1802,4 +2116,10 @@ scan_ags(
 		do_warn(_("sb_fdblocks %" PRIu64 ", counted %" PRIu64 "\n"),
 			mp->m_sb.sb_fdblocks, fdblocks);
 	}
+
+	if (usedblocks &&
+	    usedblocks != mp->m_sb.sb_dblocks - fdblocks) {
+		do_warn(_("used blocks %" PRIu64 ", counted %" PRIu64 "\n"),
+			mp->m_sb.sb_dblocks - fdblocks, usedblocks);
+	}
 }
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 9d91f2d..709c0c3 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -417,6 +417,8 @@ calc_mkfs(xfs_mount_t *mp)
 	fino_bno = inobt_root + (2 * min(2, mp->m_ag_maxlevels)) + 1;
 	if (xfs_sb_version_hasfinobt(&mp->m_sb))
 		fino_bno++;
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+		fino_bno++;
 
 	/*
 	 * If the log is allocated in the first allocation group we need to

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* [PATCH 057/145] xfs_repair: fix fino_bno calculation when rmapbt is enabled
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (55 preceding siblings ...)
  2016-06-17  1:36 ` [PATCH 056/145] xfs_repair: use rmap btree data to check block types Darrick J. Wong
@ 2016-06-17  1:36 ` Darrick J. Wong
  2016-06-17  1:36 ` [PATCH 058/145] xfs_repair: create a slab API for allocating arrays in large chunks Darrick J. Wong
                   ` (87 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:36 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

In xfs_repair, we calculate where we think mkfs put the root inode
block.  However, that calculation doesn't account for the fact that
mkfs reserves up to 2 AGFL blocks for the rmapbt, so the result is
off by those blocks.  This leads xfs_repair to complain (incorrectly)
that the root inode block is in the wrong place and then blow up.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/xfs_repair.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
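
For reviewers who want to check the arithmetic by hand, here is a minimal
standalone sketch of the fino_bno calculation as it reads after this patch.
The struct sketch_geom type and the sketch_fino_bno() and min_u() helpers
are hypothetical stand-ins for the xfs_mount fields and the min() macro; they
are not part of xfsprogs, and the starting inobt_root value in any test is
arbitrary.

```c
#include <assert.h>

/* Hypothetical stand-in for the few xfs_mount fields calc_mkfs() reads. */
struct sketch_geom {
	unsigned int	ag_maxlevels;	/* mirrors mp->m_ag_maxlevels */
	unsigned int	rmap_maxlevels;	/* mirrors mp->m_rmap_maxlevels */
	int		has_finobt;	/* xfs_sb_version_hasfinobt() result */
	int		has_rmapbt;	/* xfs_sb_version_hasrmapbt() result */
};

static unsigned int
min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Same arithmetic as the patched hunk, on plain integers. */
static unsigned int
sketch_fino_bno(unsigned int inobt_root, const struct sketch_geom *g)
{
	unsigned int	fino_bno;

	fino_bno = inobt_root + (2 * min_u(2, g->ag_maxlevels)) + 1;
	if (g->has_finobt)
		fino_bno++;
	if (g->has_rmapbt) {
		/* the term added by this patch */
		fino_bno += min_u(2, g->rmap_maxlevels);
		fino_bno++;
	}
	return fino_bno;
}
```

With ag_maxlevels = 2, rmap_maxlevels = 2, finobt enabled and inobt_root = 3,
the sketch yields 12 with rmapbt and 9 without it; before this patch the
rmapbt case would have come out as 10.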


diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 709c0c3..3b63754 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -417,8 +417,10 @@ calc_mkfs(xfs_mount_t *mp)
 	fino_bno = inobt_root + (2 * min(2, mp->m_ag_maxlevels)) + 1;
 	if (xfs_sb_version_hasfinobt(&mp->m_sb))
 		fino_bno++;
-	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		fino_bno += min(2, mp->m_rmap_maxlevels);
 		fino_bno++;
+	}
 
 	/*
 	 * If the log is allocated in the first allocation group we need to

* [PATCH 058/145] xfs_repair: create a slab API for allocating arrays in large chunks
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (56 preceding siblings ...)
  2016-06-17  1:36 ` [PATCH 057/145] xfs_repair: fix fino_bno calculation when rmapbt is enabled Darrick J. Wong
@ 2016-06-17  1:36 ` Darrick J. Wong
  2016-06-17  1:36 ` [PATCH 059/145] xfs_repair: collect reverse-mapping data for refcount/rmap tree rebuilding Darrick J. Wong
                   ` (86 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:36 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Create a slab-based array and a bag-of-pointers data structure to
facilitate rapid linear scans of reverse-mapping data for later
reconstruction of the refcount and rmap btrees.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/Makefile |    4 
 repair/slab.c   |  456 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/slab.h   |   60 +++++++
 3 files changed, 518 insertions(+), 2 deletions(-)
 create mode 100644 repair/slab.c
 create mode 100644 repair/slab.h
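
As a rough illustration of the growth strategy, the following standalone
sketch shows a chunked array where each new chunk is twice the size of the
previous one and items are copied in by value.  All sketch_* names and the
tiny SKETCH_MIN_NR are hypothetical and chosen only to make the doubling easy
to observe; the real code in repair/slab.c starts at 4096 items and also caps
the chunk size, which this sketch omits.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MIN_NR	4	/* tiny first chunk, for illustration only */

struct sketch_chunk {
	size_t			nr;	/* capacity in items */
	size_t			inuse;	/* items stored so far */
	struct sketch_chunk	*next;	/* next chunk in the list */
	/* items follow the header */
};

struct sketch_slab {
	size_t			item_sz;
	size_t			nr_chunks;
	size_t			nr_items;
	struct sketch_chunk	*first;
	struct sketch_chunk	*last;
};

/* Address of item idx inside chunk c. */
static void *
sketch_item(struct sketch_slab *s, struct sketch_chunk *c, size_t idx)
{
	return (char *)(c + 1) + s->item_sz * idx;
}

/* Copy one item in; when the last chunk is full, chain a chunk twice as big. */
static int
sketch_add(struct sketch_slab *s, const void *item)
{
	struct sketch_chunk	*c = s->last;

	if (!c || c->inuse == c->nr) {
		size_t		n = c ? c->nr * 2 : SKETCH_MIN_NR;

		c = malloc(sizeof(*c) + n * s->item_sz);
		if (!c)
			return -1;
		c->nr = n;
		c->inuse = 0;
		c->next = NULL;
		if (s->last)
			s->last->next = c;
		else
			s->first = c;
		s->last = c;
		s->nr_chunks++;
	}
	memcpy(sketch_item(s, c, c->inuse), item, s->item_sz);
	c->inuse++;
	s->nr_items++;
	return 0;
}

/* Release every chunk and reset the bookkeeping. */
static void
sketch_free(struct sketch_slab *s)
{
	struct sketch_chunk	*c = s->first;

	while (c) {
		struct sketch_chunk	*n = c->next;

		free(c);
		c = n;
	}
	s->first = s->last = NULL;
	s->nr_chunks = s->nr_items = 0;
}
```

Because existing chunks are never reallocated, adding items does not move
earlier ones; only sorting reorders items within a chunk.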


diff --git a/repair/Makefile b/repair/Makefile
index 251722b..756ba95 100644
--- a/repair/Makefile
+++ b/repair/Makefile
@@ -11,13 +11,13 @@ LTCOMMAND = xfs_repair
 
 HFILES = agheader.h attr_repair.h avl.h avl64.h bmap.h btree.h \
 	da_util.h dinode.h dir2.h err_protos.h globals.h incore.h protos.h \
-	rt.h progress.h scan.h versions.h prefetch.h threads.h
+	rt.h progress.h scan.h versions.h prefetch.h slab.h threads.h
 
 CFILES = agheader.c attr_repair.c avl.c avl64.c bmap.c btree.c \
 	da_util.c dino_chunks.c dinode.c dir2.c globals.c incore.c \
 	incore_bmc.c init.c incore_ext.c incore_ino.c phase1.c \
 	phase2.c phase3.c phase4.c phase5.c phase6.c phase7.c \
-	progress.c prefetch.c rt.c sb.c scan.c threads.c \
+	progress.c prefetch.c rt.c sb.c scan.c slab.c threads.c \
 	versions.c xfs_repair.c
 
 LLDLIBS = $(LIBXFS) $(LIBXLOG) $(LIBUUID) $(LIBRT) $(LIBPTHREAD)
diff --git a/repair/slab.c b/repair/slab.c
new file mode 100644
index 0000000..97c13d3
--- /dev/null
+++ b/repair/slab.c
@@ -0,0 +1,456 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <libxfs.h>
+#include "slab.h"
+
+#undef SLAB_DEBUG
+
+#ifdef SLAB_DEBUG
+# define dbg_printf(f, a...)  do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
+/*
+ * Slab Arrays and Bags
+ *
+ * The slab array is a dynamically growable linear array.  Internally it
+ * maintains a list of slabs of increasing size; when a slab fills up, another
+ * is allocated.  Each slab is sorted individually, which means that one must
+ * use an iterator to walk the entire logical array, sorted order or otherwise.
+ * Array items can neither be removed nor accessed randomly, since (at the
+ * moment) the only user of them (storing reverse mappings) doesn't need either
+ * piece.  Pointers are not stable across sort operations.
+ *
+ * A bag is a collection of pointers.  The bag can be added to or removed from
+ * arbitrarily, and the bag items can be iterated.  Bags are used to process
+ * rmaps into refcount btree entries.
+ */
+
+/*
+ * Slabs -- each slab_hdr holds an array of items; when a slab_hdr fills up, we
+ * allocate a new one and add to that one.  The slab object coordinates the
+ * slab_hdrs.
+ */
+
+/* Each slab holds at least 4096 items */
+#define MIN_SLAB_NR		4096
+/* and cannot be larger than 128M */
+#define MAX_SLAB_SIZE		(128 * 1048576)
+struct xfs_slab_hdr {
+	size_t			sh_nr;
+	size_t			sh_inuse;	/* items in use */
+	struct xfs_slab_hdr	*sh_next;	/* next slab hdr */
+						/* objects follow */
+};
+
+struct xfs_slab {
+	size_t			s_item_sz;	/* item size */
+	size_t			s_nr_slabs;	/* # of slabs */
+	size_t			s_nr_items;	/* # of items */
+	struct xfs_slab_hdr	*s_first;	/* first slab header */
+	struct xfs_slab_hdr	*s_last;	/* last slab header */
+};
+
+/*
+ * Slab cursors -- each slab_hdr_cursor tracks a slab_hdr; the slab_cursor
+ * tracks the slab_hdr_cursors.  If a compare_fn is specified, the cursor
+ * returns objects in increasing order (if you've previously sorted the
+ * slabs with qsort_slab()).  If compare_fn == NULL, it returns slab items
+ * in order.
+ */
+struct xfs_slab_hdr_cursor {
+	struct xfs_slab_hdr	*hdr;		/* a slab header */
+	size_t			loc;		/* where we are in the slab */
+};
+
+typedef int (*xfs_slab_compare_fn)(const void *, const void *);
+
+struct xfs_slab_cursor {
+	size_t				nr;		/* # of per-slab cursors */
+	struct xfs_slab			*slab;		/* pointer to the slab */
+	struct xfs_slab_hdr_cursor	*last_hcur;	/* last header we took from */
+	xfs_slab_compare_fn		compare_fn;	/* compare items */
+	struct xfs_slab_hdr_cursor	hcur[0];	/* per-slab cursors */
+};
+
+/*
+ * Bags -- each bag is an array of pointers to items; when a bag fills up, we
+ * resize it.
+ */
+#define MIN_BAG_SIZE	4096
+struct xfs_bag {
+	size_t			bg_nr;		/* number of pointers */
+	size_t			bg_inuse;	/* number of slots in use */
+	void			**bg_ptrs;	/* pointers */
+};
+#define BAG_SIZE(nr)	(sizeof(struct xfs_bag) + ((nr) * sizeof(void *)))
+#define BAG_END(bag)	(&(bag)->bg_ptrs[(bag)->bg_nr])
+
+/*
+ * Create a slab to hold some objects of a particular size.
+ */
+int
+init_slab(
+	struct xfs_slab	**slab,
+	size_t		item_size)
+{
+	struct xfs_slab	*ptr;
+
+	ptr = calloc(1, sizeof(struct xfs_slab));
+	if (!ptr)
+		return -ENOMEM;
+	ptr->s_item_sz = item_size;
+	ptr->s_last = NULL;
+	*slab = ptr;
+
+	return 0;
+}
+
+/*
+ * Frees a slab.
+ */
+void
+free_slab(
+	struct xfs_slab		**slab)
+{
+	struct xfs_slab		*ptr;
+	struct xfs_slab_hdr	*hdr;
+	struct xfs_slab_hdr	*nhdr;
+
+	ptr = *slab;
+	if (!ptr)
+		return;
+	hdr = ptr->s_first;
+	while (hdr) {
+		nhdr = hdr->sh_next;
+		free(hdr);
+		hdr = nhdr;
+	}
+	free(ptr);
+	*slab = NULL;
+}
+
+static void *
+slab_ptr(
+	struct xfs_slab		*slab,
+	struct xfs_slab_hdr	*hdr,
+	size_t			idx)
+{
+	char			*p;
+
+	ASSERT(idx < hdr->sh_inuse);
+	p = (char *)(hdr + 1);
+	p += slab->s_item_sz * idx;
+	return p;
+}
+
+/*
+ * Add an item to the slab.
+ */
+int
+slab_add(
+	struct xfs_slab		*slab,
+	void			*item)
+{
+	struct xfs_slab_hdr		*hdr;
+	void			*p;
+
+	hdr = slab->s_last;
+	if (!hdr || hdr->sh_inuse == hdr->sh_nr) {
+		size_t n;
+
+		n = (hdr ? hdr->sh_nr * 2 : MIN_SLAB_NR);
+		if (n * slab->s_item_sz > MAX_SLAB_SIZE)
+			n = MAX_SLAB_SIZE / slab->s_item_sz;
+		hdr = malloc(sizeof(struct xfs_slab_hdr) + (n * slab->s_item_sz));
+		if (!hdr)
+			return -ENOMEM;
+		hdr->sh_nr = n;
+		hdr->sh_inuse = 0;
+		hdr->sh_next = NULL;
+		if (slab->s_last)
+			slab->s_last->sh_next = hdr;
+		if (!slab->s_first)
+			slab->s_first = hdr;
+		slab->s_last = hdr;
+		slab->s_nr_slabs++;
+	}
+	hdr->sh_inuse++;
+	p = slab_ptr(slab, hdr, hdr->sh_inuse - 1);
+	memcpy(p, item, slab->s_item_sz);
+	slab->s_nr_items++;
+
+	return 0;
+}
+
+/*
+ * Sort the items in the slab.  Do not run this method if there are any
+ * cursors holding on to the slab.
+ */
+void
+qsort_slab(
+	struct xfs_slab		*slab,
+	int (*compare_fn)(const void *, const void *))
+{
+	struct xfs_slab_hdr	*hdr;
+
+	hdr = slab->s_first;
+	while (hdr) {
+		qsort(slab_ptr(slab, hdr, 0), hdr->sh_inuse, slab->s_item_sz,
+		      compare_fn);
+		hdr = hdr->sh_next;
+	}
+}
+
+/*
+ * init_slab_cursor() -- Create a slab cursor to iterate the slab items.
+ *
+ * @slab: The slab.
+ * @compare_fn: If specified, use this function to return items in ascending order.
+ * @cur: The new cursor.
+ */
+int
+init_slab_cursor(
+	struct xfs_slab		*slab,
+	int (*compare_fn)(const void *, const void *),
+	struct xfs_slab_cursor	**cur)
+{
+	struct xfs_slab_cursor	*c;
+	struct xfs_slab_hdr_cursor	*hcur;
+	struct xfs_slab_hdr	*hdr;
+
+	c = malloc(sizeof(struct xfs_slab_cursor) +
+		   (sizeof(struct xfs_slab_hdr_cursor) * slab->s_nr_slabs));
+	if (!c)
+		return -ENOMEM;
+	c->nr = slab->s_nr_slabs;
+	c->slab = slab;
+	c->compare_fn = compare_fn;
+	c->last_hcur = NULL;
+	hcur = (struct xfs_slab_hdr_cursor *)(c + 1);
+	hdr = slab->s_first;
+	while (hdr) {
+		hcur->hdr = hdr;
+		hcur->loc = 0;
+		hcur++;
+		hdr = hdr->sh_next;
+	}
+	*cur = c;
+	return 0;
+}
+
+/*
+ * Free the slab cursor.
+ */
+void
+free_slab_cursor(
+	struct xfs_slab_cursor	**cur)
+{
+	if (!*cur)
+		return;
+	free(*cur);
+	*cur = NULL;
+}
+
+/*
+ * Return the smallest item in the slab, without advancing the iterator.
+ * The slabs must be sorted prior to the creation of the cursor.
+ */
+void *
+peek_slab_cursor(
+	struct xfs_slab_cursor	*cur)
+{
+	struct xfs_slab_hdr_cursor	*hcur;
+	void			*p = NULL;
+	void			*q;
+	size_t			i;
+
+	cur->last_hcur = NULL;
+
+	/* no compare function; inorder traversal */
+	if (!cur->compare_fn) {
+		if (!cur->last_hcur)
+			cur->last_hcur = &cur->hcur[0];
+		hcur = cur->last_hcur;
+		while (hcur < &cur->hcur[cur->nr] &&
+			hcur->loc >= hcur->hdr->sh_inuse)
+			hcur++;
+		if (hcur == &cur->hcur[cur->nr])
+			return NULL;
+		p = slab_ptr(cur->slab, hcur->hdr, hcur->loc);
+		cur->last_hcur = hcur;
+		return p;
+	}
+
+	/* otherwise return things in increasing order */
+	for (i = 0, hcur = &cur->hcur[i]; i < cur->nr; i++, hcur++) {
+		if (hcur->loc >= hcur->hdr->sh_inuse)
+			continue;
+		q = slab_ptr(cur->slab, hcur->hdr, hcur->loc);
+		if (!p || cur->compare_fn(p, q) > 0) {
+			p = q;
+			cur->last_hcur = hcur;
+		}
+	}
+
+	return p;
+}
+
+/*
+ * After a peek operation, advance the cursor.
+ */
+void
+advance_slab_cursor(
+	struct xfs_slab_cursor	*cur)
+{
+	ASSERT(cur->last_hcur);
+	cur->last_hcur->loc++;
+}
+
+/*
+ * Retrieve the next item in the slab and advance the cursor.
+ */
+void *
+pop_slab_cursor(
+	struct xfs_slab_cursor	*cur)
+{
+	void			*p;
+
+	p = peek_slab_cursor(cur);
+	if (p)
+		advance_slab_cursor(cur);
+	return p;
+}
+
+/*
+ * Return the number of items in the slab.
+ */
+size_t
+slab_count(
+	struct xfs_slab	*slab)
+{
+	return slab->s_nr_items;
+}
+
+/*
+ * Create a bag to point to some objects.
+ */
+int
+init_bag(
+	struct xfs_bag	**bag)
+{
+	struct xfs_bag	*ptr;
+
+	ptr = calloc(1, sizeof(struct xfs_bag));
+	if (!ptr)
+		return -ENOMEM;
+	ptr->bg_ptrs = calloc(MIN_BAG_SIZE, sizeof(void *));
+	if (!ptr->bg_ptrs) {
+		free(ptr);
+		return -ENOMEM;
+	}
+	ptr->bg_nr = MIN_BAG_SIZE;
+	*bag = ptr;
+	return 0;
+}
+
+/*
+ * Free a bag of pointers.
+ */
+void
+free_bag(
+	struct xfs_bag	**bag)
+{
+	struct xfs_bag	*ptr;
+
+	ptr = *bag;
+	if (!ptr)
+		return;
+	free(ptr->bg_ptrs);
+	free(ptr);
+	*bag = NULL;
+}
+
+/*
+ * Add an object to the pointer bag.
+ */
+int
+bag_add(
+	struct xfs_bag	*bag,
+	void		*ptr)
+{
+	void		**p, **x;
+
+	p = &bag->bg_ptrs[bag->bg_inuse];
+	if (p == BAG_END(bag)) {
+		/* No free space, alloc more pointers */
+		size_t nr;
+
+		nr = bag->bg_nr * 2;
+		x = realloc(bag->bg_ptrs, nr * sizeof(void *));
+		if (!x)
+			return -ENOMEM;
+		bag->bg_ptrs = x;
+		memset(BAG_END(bag), 0, bag->bg_nr * sizeof(void *));
+		bag->bg_nr = nr;
+	}
+	bag->bg_ptrs[bag->bg_inuse] = ptr;
+	bag->bg_inuse++;
+	return 0;
+}
+
+/*
+ * Remove a pointer from a bag.
+ */
+int
+bag_remove(
+	struct xfs_bag	*bag,
+	size_t		nr)
+{
+	ASSERT(nr < bag->bg_inuse);
+	memmove(&bag->bg_ptrs[nr], &bag->bg_ptrs[nr + 1],
+		(bag->bg_inuse - nr - 1) * sizeof(void *));
+	bag->bg_inuse--;
+	return 0;
+}
+
+/*
+ * Return the number of items in a bag.
+ */
+size_t
+bag_count(
+	struct xfs_bag	*bag)
+{
+	return bag->bg_inuse;
+}
+
+/*
+ * Return the nth item in a bag.
+ */
+void *
+bag_item(
+	struct xfs_bag	*bag,
+	size_t		nr)
+{
+	if (nr >= bag->bg_inuse)
+		return NULL;
+	return bag->bg_ptrs[nr];
+}
diff --git a/repair/slab.h b/repair/slab.h
new file mode 100644
index 0000000..4aa5512
--- /dev/null
+++ b/repair/slab.h
@@ -0,0 +1,60 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef SLAB_H_
+#define SLAB_H_
+
+struct xfs_slab;
+struct xfs_slab_cursor;
+
+extern int init_slab(struct xfs_slab **, size_t);
+extern void free_slab(struct xfs_slab **);
+
+extern int slab_add(struct xfs_slab *, void *);
+extern void qsort_slab(struct xfs_slab *, int (*)(const void *, const void *));
+extern size_t slab_count(struct xfs_slab *);
+
+extern int init_slab_cursor(struct xfs_slab *,
+	int (*)(const void *, const void *), struct xfs_slab_cursor **);
+extern void free_slab_cursor(struct xfs_slab_cursor **);
+
+extern void *peek_slab_cursor(struct xfs_slab_cursor *);
+extern void advance_slab_cursor(struct xfs_slab_cursor *);
+extern void *pop_slab_cursor(struct xfs_slab_cursor *);
+
+struct xfs_bag;
+
+extern int init_bag(struct xfs_bag **);
+extern void free_bag(struct xfs_bag **);
+extern int bag_add(struct xfs_bag *, void *);
+extern int bag_remove(struct xfs_bag *, size_t);
+extern size_t bag_count(struct xfs_bag *);
+extern void *bag_item(struct xfs_bag *, size_t);
+
+#define foreach_bag_ptr(bag, idx, ptr) \
+	for ((idx) = 0, (ptr) = bag_item((bag), (idx)); \
+	     (idx) < bag_count(bag); \
+	     (idx)++, (ptr) = bag_item((bag), (idx)))
+
+#define foreach_bag_ptr_reverse(bag, idx, ptr) \
+	for ((idx) = bag_count(bag) - 1, (ptr) = bag_item((bag), (idx)); \
+	     (idx) >= 0 && (ptr) != NULL; \
+	     (idx)--, (ptr) = bag_item((bag), (idx)))
+
+#endif /* SLAB_H_ */
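
Note on the sorted cursor: since each slab is sorted on its own, returning
items in global order amounts to a k-way merge that peeks at the head of every
slab and takes the smallest.  The sketch below shows that idea on plain int
arrays.  The sketch_run type and the sketch_peek() and sketch_pop() helpers are
hypothetical and only loosely modelled on peek_slab_cursor() and
pop_slab_cursor(); they are not the patch's API.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_run {
	const int	*items;	/* one individually sorted run */
	size_t		nr;	/* number of items in the run */
	size_t		loc;	/* next unread position */
};

/*
 * Return the smallest unread item across all runs and report which run
 * supplied it, or return NULL once every run is exhausted.
 */
static const int *
sketch_peek(struct sketch_run *runs, size_t nr_runs, struct sketch_run **from)
{
	const int	*best = NULL;
	size_t		i;

	*from = NULL;
	for (i = 0; i < nr_runs; i++) {
		const int	*q;

		if (runs[i].loc >= runs[i].nr)
			continue;	/* this run is used up */
		q = &runs[i].items[runs[i].loc];
		if (!best || *best > *q) {
			best = q;
			*from = &runs[i];
		}
	}
	return best;
}

/* Peek, then advance only the run that supplied the item. */
static const int *
sketch_pop(struct sketch_run *runs, size_t nr_runs)
{
	struct sketch_run	*from;
	const int		*p = sketch_peek(runs, nr_runs, &from);

	if (p)
		from->loc++;
	return p;
}
```

Each pop costs one comparison per run, which stays cheap as long as the
number of runs is small; doubling the slab size on every allocation keeps
that number roughly logarithmic in the item count until the size cap is hit.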

* [PATCH 059/145] xfs_repair: collect reverse-mapping data for refcount/rmap tree rebuilding
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (57 preceding siblings ...)
  2016-06-17  1:36 ` [PATCH 058/145] xfs_repair: create a slab API for allocating arrays in large chunks Darrick J. Wong
@ 2016-06-17  1:36 ` Darrick J. Wong
  2016-06-17  1:37 ` [PATCH 060/145] xfs_repair: record and merge raw rmap data Darrick J. Wong
                   ` (85 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:36 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Collect reverse-mapping data for the entire filesystem so that we can
later check and rebuild the reference count tree and the reverse mapping
tree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/Makefile     |    4 +
 repair/dinode.c     |    9 ++
 repair/phase4.c     |    5 +
 repair/rmap.c       |  191 +++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/rmap.h       |   32 +++++++++
 repair/xfs_repair.c |    4 +
 6 files changed, 243 insertions(+), 2 deletions(-)
 create mode 100644 repair/rmap.c
 create mode 100644 repair/rmap.h
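
To make the intended sort order of the collected observations concrete, here
is a small standalone sketch that orders records by start block, then owner,
then offset, which is the key order rmap_compare() uses.  The sketch_rmap
struct is a hypothetical, trimmed-down stand-in for struct xfs_rmap_irec, and
unlike the patch it compares the raw offset instead of the packed offset and
flags value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for struct xfs_rmap_irec. */
struct sketch_rmap {
	uint32_t	startblock;	/* AG block where the extent starts */
	uint32_t	blockcount;	/* extent length in blocks */
	uint64_t	owner;		/* inode number for file data */
	uint64_t	offset;		/* logical offset within the owner */
};

/* Order by start block, then owner, then offset. */
static int
sketch_rmap_compare(const void *a, const void *b)
{
	const struct sketch_rmap	*pa = a;
	const struct sketch_rmap	*pb = b;

	if (pa->startblock != pb->startblock)
		return pa->startblock < pb->startblock ? -1 : 1;
	if (pa->owner != pb->owner)
		return pa->owner < pb->owner ? -1 : 1;
	if (pa->offset != pb->offset)
		return pa->offset < pb->offset ? -1 : 1;
	return 0;
}
```

A comparator of this shape can be handed directly to qsort(), so two
observations that start at the same block end up adjacent and ordered by
owner.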


diff --git a/repair/Makefile b/repair/Makefile
index 756ba95..81c2b9f 100644
--- a/repair/Makefile
+++ b/repair/Makefile
@@ -11,13 +11,13 @@ LTCOMMAND = xfs_repair
 
 HFILES = agheader.h attr_repair.h avl.h avl64.h bmap.h btree.h \
 	da_util.h dinode.h dir2.h err_protos.h globals.h incore.h protos.h \
-	rt.h progress.h scan.h versions.h prefetch.h slab.h threads.h
+	rt.h progress.h scan.h versions.h prefetch.h rmap.h slab.h threads.h
 
 CFILES = agheader.c attr_repair.c avl.c avl64.c bmap.c btree.c \
 	da_util.c dino_chunks.c dinode.c dir2.c globals.c incore.c \
 	incore_bmc.c init.c incore_ext.c incore_ino.c phase1.c \
 	phase2.c phase3.c phase4.c phase5.c phase6.c phase7.c \
-	progress.c prefetch.c rt.c sb.c scan.c slab.c threads.c \
+	progress.c prefetch.c rmap.c rt.c sb.c scan.c slab.c threads.c \
 	versions.c xfs_repair.c
 
 LLDLIBS = $(LIBXFS) $(LIBXLOG) $(LIBUUID) $(LIBRT) $(LIBPTHREAD)
diff --git a/repair/dinode.c b/repair/dinode.c
index c1e60ff..89163b1 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -30,6 +30,8 @@
 #include "attr_repair.h"
 #include "bmap.h"
 #include "threads.h"
+#include "slab.h"
+#include "rmap.h"
 
 /*
  * gettext lookups for translations of strings use mutexes internally to
@@ -779,6 +781,13 @@ _("illegal state %d in block map %" PRIu64 "\n"),
 					state, b);
 			}
 		}
+		if (collect_rmaps) { /* && !check_dups */
+			error = add_rmap(mp, ino, whichfork, &irec);
+			if (error)
+				do_error(
+_("couldn't add reverse mapping\n")
+					);
+		}
 		*tot += irec.br_blockcount;
 	}
 	error = 0;
diff --git a/repair/phase4.c b/repair/phase4.c
index 1a7d7b5..b4264df 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -30,7 +30,10 @@
 #include "versions.h"
 #include "dir2.h"
 #include "progress.h"
+#include "slab.h"
+#include "rmap.h"
 
+bool collect_rmaps;
 
 /*
  * null out quota inode fields in sb if they point to non-existent inodes.
@@ -170,6 +173,8 @@ phase4(xfs_mount_t *mp)
 	int			ag_hdr_block;
 	int			bstate;
 
+	if (needs_rmap_work(mp))
+		collect_rmaps = true;
 	ag_hdr_block = howmany(ag_hdr_len, mp->m_sb.sb_blocksize);
 
 	do_log(_("Phase 4 - check for duplicate blocks...\n"));
diff --git a/repair/rmap.c b/repair/rmap.c
new file mode 100644
index 0000000..e78115e
--- /dev/null
+++ b/repair/rmap.c
@@ -0,0 +1,191 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <libxfs.h>
+#include "btree.h"
+#include "err_protos.h"
+#include "libxlog.h"
+#include "incore.h"
+#include "globals.h"
+#include "dinode.h"
+#include "slab.h"
+#include "rmap.h"
+
+#undef RMAP_DEBUG
+
+#ifdef RMAP_DEBUG
+# define dbg_printf(f, a...)  do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
+/* per-AG rmap object anchor */
+struct xfs_ag_rmap {
+	struct xfs_slab	*ar_rmaps;		/* rmap observations, p4 */
+};
+
+static struct xfs_ag_rmap *ag_rmaps;
+
+/*
+ * Compare rmap observations for array sorting.
+ */
+static int
+rmap_compare(
+	const void		*a,
+	const void		*b)
+{
+	const struct xfs_rmap_irec	*pa;
+	const struct xfs_rmap_irec	*pb;
+	__u64			oa;
+	__u64			ob;
+
+	pa = a; pb = b;
+	oa = xfs_rmap_irec_offset_pack(pa);
+	ob = xfs_rmap_irec_offset_pack(pb);
+
+	if (pa->rm_startblock < pb->rm_startblock)
+		return -1;
+	else if (pa->rm_startblock > pb->rm_startblock)
+		return 1;
+	else if (pa->rm_owner < pb->rm_owner)
+		return -1;
+	else if (pa->rm_owner > pb->rm_owner)
+		return 1;
+	else if (oa < ob)
+		return -1;
+	else if (oa > ob)
+		return 1;
+	else
+		return 0;
+}
+
+/*
+ * Returns true if we must reconstruct either the reference count or reverse
+ * mapping trees.
+ */
+bool
+needs_rmap_work(
+	struct xfs_mount	*mp)
+{
+	return xfs_sb_version_hasrmapbt(&mp->m_sb);
+}
+
+/*
+ * Initialize per-AG reverse map data.
+ */
+void
+init_rmaps(
+	struct xfs_mount	*mp)
+{
+	xfs_agnumber_t		i;
+	int			error;
+
+	if (!needs_rmap_work(mp))
+		return;
+
+	ag_rmaps = calloc(mp->m_sb.sb_agcount, sizeof(struct xfs_ag_rmap));
+	if (!ag_rmaps)
+		do_error(_("couldn't allocate per-AG reverse map roots\n"));
+
+	for (i = 0; i < mp->m_sb.sb_agcount; i++) {
+		error = init_slab(&ag_rmaps[i].ar_rmaps,
+				sizeof(struct xfs_rmap_irec));
+		if (error)
+			do_error(
+_("Insufficient memory while allocating reverse mapping slabs."));
+	}
+}
+
+/*
+ * Free the per-AG reverse-mapping data.
+ */
+void
+free_rmaps(
+	struct xfs_mount	*mp)
+{
+	xfs_agnumber_t		i;
+
+	if (!needs_rmap_work(mp))
+		return;
+
+	for (i = 0; i < mp->m_sb.sb_agcount; i++)
+		free_slab(&ag_rmaps[i].ar_rmaps);
+	free(ag_rmaps);
+	ag_rmaps = NULL;
+}
+
+/*
+ * Add an observation about a block mapping in an inode's data or attribute
+ * fork for later btree reconstruction.
+ */
+int
+add_rmap(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino,
+	int			whichfork,
+	struct xfs_bmbt_irec	*irec)
+{
+	struct xfs_slab		*rmaps;
+	struct xfs_rmap_irec	rmap;
+	xfs_agnumber_t		agno;
+	xfs_agblock_t		agbno;
+
+	if (!needs_rmap_work(mp))
+		return 0;
+
+	agno = XFS_FSB_TO_AGNO(mp, irec->br_startblock);
+	agbno = XFS_FSB_TO_AGBNO(mp, irec->br_startblock);
+	ASSERT(agno != NULLAGNUMBER);
+	ASSERT(agno < mp->m_sb.sb_agcount);
+	ASSERT(agbno + irec->br_blockcount <= mp->m_sb.sb_agblocks);
+	ASSERT(ino != NULLFSINO);
+	ASSERT(whichfork == XFS_DATA_FORK || whichfork == XFS_ATTR_FORK);
+
+	rmaps = ag_rmaps[agno].ar_rmaps;
+	rmap.rm_owner = ino;
+	rmap.rm_offset = irec->br_startoff;
+	rmap.rm_flags = 0;
+	if (whichfork == XFS_ATTR_FORK)
+		rmap.rm_flags |= XFS_RMAP_ATTR_FORK;
+	rmap.rm_startblock = agbno;
+	rmap.rm_blockcount = irec->br_blockcount;
+	if (irec->br_state == XFS_EXT_UNWRITTEN)
+		rmap.rm_flags |= XFS_RMAP_UNWRITTEN;
+	return slab_add(rmaps, &rmap);
+}
+
+#ifdef RMAP_DEBUG
+static void
+dump_rmap(
+	const char		*msg,
+	xfs_agnumber_t		agno,
+	struct xfs_rmap_irec	*rmap)
+{
+	printf("%s: %p agno=%u pblk=%llu own=%lld lblk=%llu len=%u flags=0x%x\n",
+		msg, rmap,
+		(unsigned int)agno,
+		(unsigned long long)rmap->rm_startblock,
+		(long long)rmap->rm_owner,
+		(unsigned long long)rmap->rm_offset,
+		(unsigned int)rmap->rm_blockcount,
+		(unsigned int)rmap->rm_flags);
+}
+#else
+# define dump_rmap(m, a, r)
+#endif
diff --git a/repair/rmap.h b/repair/rmap.h
new file mode 100644
index 0000000..0832790
--- /dev/null
+++ b/repair/rmap.h
@@ -0,0 +1,32 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef RMAP_H_
+#define RMAP_H_
+
+extern bool collect_rmaps;
+
+extern bool needs_rmap_work(struct xfs_mount *);
+
+extern void init_rmaps(struct xfs_mount *);
+extern void free_rmaps(struct xfs_mount *);
+
+extern int add_rmap(struct xfs_mount *, xfs_ino_t, int, struct xfs_bmbt_irec *);
+
+#endif /* RMAP_H_ */
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 3b63754..2ecd81d 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -32,6 +32,8 @@
 #include "threads.h"
 #include "progress.h"
 #include "dinode.h"
+#include "slab.h"
+#include "rmap.h"
 
 #define	rounddown(x, y)	(((x)/(y))*(y))
 
@@ -898,6 +900,7 @@ main(int argc, char **argv)
 	init_bmaps(mp);
 	incore_ino_init(mp);
 	incore_ext_init(mp);
+	init_rmaps(mp);
 
 	/* initialize random globals now that we know the fs geometry */
 	inodes_per_block = mp->m_sb.sb_inopblock;
@@ -931,6 +934,7 @@ main(int argc, char **argv)
 	/*
 	 * Done with the block usage maps, toss them...
 	 */
+	free_rmaps(mp);
 	free_bmaps(mp);
 
 	if (!bad_ino_btree)  {

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* [PATCH 060/145] xfs_repair: record and merge raw rmap data
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (58 preceding siblings ...)
  2016-06-17  1:36 ` [PATCH 059/145] xfs_repair: collect reverse-mapping data for refcount/rmap tree rebuilding Darrick J. Wong
@ 2016-06-17  1:37 ` Darrick J. Wong
  2016-06-17  1:37 ` [PATCH 061/145] xfs_repair: add inode bmbt block rmaps Darrick J. Wong
                   ` (84 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:37 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Rmaps for BMBT blocks, AG metadata, and AG btree blocks can still be
merged, so provide a facility to collect these raw observations and
merge adjacent ones into maximal-length records in the main rmap list.
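
For illustration only (this is not code from the patch): a minimal
sketch of the sort-then-fold idea described above, using a simplified
record type and a plain array instead of the slab. The names demo_rmap,
demo_compare, demo_mergeable, demo_fold and DEMO_LEN_MAX are
hypothetical, and the merge rule is reduced to "same owner, physically
adjacent, combined length fits".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed length limit; stands in for XFS_RMAP_LEN_MAX. */
#define DEMO_LEN_MAX	UINT32_MAX

/* Hypothetical, simplified stand-in for struct xfs_rmap_irec. */
struct demo_rmap {
	uint32_t	startblock;	/* first block within the AG */
	uint32_t	blockcount;	/* length in blocks */
	uint64_t	owner;		/* owner code or inode number */
};

/* Order by startblock, then owner. */
static int
demo_compare(const void *a, const void *b)
{
	const struct demo_rmap	*pa = a;
	const struct demo_rmap	*pb = b;

	if (pa->startblock != pb->startblock)
		return pa->startblock < pb->startblock ? -1 : 1;
	if (pa->owner != pb->owner)
		return pa->owner < pb->owner ? -1 : 1;
	return 0;
}

/* Records merge only if they share an owner and r2 starts where r1 ends. */
static bool
demo_mergeable(const struct demo_rmap *r1, const struct demo_rmap *r2)
{
	if (r1->owner != r2->owner)
		return false;
	if ((uint64_t)r1->startblock + r1->blockcount != r2->startblock)
		return false;
	return (uint64_t)r1->blockcount + r2->blockcount <= DEMO_LEN_MAX;
}

/*
 * Sort recs[] and fold adjacent records together in place.
 * Returns the number of records remaining after the merge.
 */
static size_t
demo_fold(struct demo_rmap *recs, size_t nr)
{
	size_t	out = 0;
	size_t	i;

	if (nr == 0)
		return 0;
	assert(recs != NULL);

	qsort(recs, nr, sizeof(*recs), demo_compare);
	for (i = 1; i < nr; i++) {
		if (demo_mergeable(&recs[out], &recs[i]))
			recs[out].blockcount += recs[i].blockcount;
		else
			recs[++out] = recs[i];
	}
	return out + 1;
}
```

Sorting first is what makes a single linear pass sufficient: once the
records are ordered by start block, any mergeable neighbour of the
current record is the very next one.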

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/rmap.c |  137 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/rmap.h |    4 ++
 2 files changed, 140 insertions(+), 1 deletion(-)


diff --git a/repair/rmap.c b/repair/rmap.c
index e78115e..1851742 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -38,6 +38,7 @@
 /* per-AG rmap object anchor */
 struct xfs_ag_rmap {
 	struct xfs_slab	*ar_rmaps;		/* rmap observations, p4 */
+	struct xfs_slab	*ar_raw_rmaps;		/* unmerged rmaps */
 };
 
 static struct xfs_ag_rmap *ag_rmaps;
@@ -109,6 +110,11 @@ init_rmaps(
 		if (error)
 			do_error(
 _("Insufficient memory while allocating reverse mapping slabs."));
+		error = init_slab(&ag_rmaps[i].ar_raw_rmaps,
+				  sizeof(struct xfs_rmap_irec));
+		if (error)
+			do_error(
+_("Insufficient memory while allocating raw metadata reverse mapping slabs."));
 	}
 }
 
@@ -124,13 +130,40 @@ free_rmaps(
 	if (!needs_rmap_work(mp))
 		return;
 
-	for (i = 0; i < mp->m_sb.sb_agcount; i++)
+	for (i = 0; i < mp->m_sb.sb_agcount; i++) {
 		free_slab(&ag_rmaps[i].ar_rmaps);
+		free_slab(&ag_rmaps[i].ar_raw_rmaps);
+	}
 	free(ag_rmaps);
 	ag_rmaps = NULL;
 }
 
 /*
+ * Decide if two reverse-mapping records can be merged.
+ */
+bool
+mergeable_rmaps(
+	struct xfs_rmap_irec	*r1,
+	struct xfs_rmap_irec	*r2)
+{
+	if (r1->rm_owner != r2->rm_owner)
+		return false;
+	if (r1->rm_startblock + r1->rm_blockcount != r2->rm_startblock)
+		return false;
+	if ((unsigned long long)r1->rm_blockcount + r2->rm_blockcount >
+	    XFS_RMAP_LEN_MAX)
+		return false;
+	if (XFS_RMAP_NON_INODE_OWNER(r2->rm_owner))
+		return true;
+	/* must be an inode owner below here */
+	if (r1->rm_flags != r2->rm_flags)
+		return false;
+	if (r1->rm_flags & XFS_RMAP_BMBT_BLOCK)
+		return true;
+	return r1->rm_offset + r1->rm_blockcount == r2->rm_offset;
+}
+
+/*
  * Add an observation about a block mapping in an inode's data or attribute
  * fork for later btree reconstruction.
  */
@@ -170,6 +203,108 @@ add_rmap(
 	return slab_add(rmaps, &rmap);
 }
 
+/* add a raw rmap; these will be merged later */
+static int
+__add_raw_rmap(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		len,
+	uint64_t		owner,
+	bool			is_attr,
+	bool			is_bmbt)
+{
+	struct xfs_rmap_irec	rmap;
+
+	ASSERT(len != 0);
+	rmap.rm_owner = owner;
+	rmap.rm_offset = 0;
+	rmap.rm_flags = 0;
+	if (is_attr)
+		rmap.rm_flags |= XFS_RMAP_ATTR_FORK;
+	if (is_bmbt)
+		rmap.rm_flags |= XFS_RMAP_BMBT_BLOCK;
+	rmap.rm_startblock = agbno;
+	rmap.rm_blockcount = len;
+	return slab_add(ag_rmaps[agno].ar_raw_rmaps, &rmap);
+}
+
+/*
+ * Add a reverse mapping for a per-AG fixed metadata extent.
+ */
+int
+add_ag_rmap(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		len,
+	uint64_t		owner)
+{
+	if (!needs_rmap_work(mp))
+		return 0;
+
+	ASSERT(agno != NULLAGNUMBER);
+	ASSERT(agno < mp->m_sb.sb_agcount);
+	ASSERT(agbno + len <= mp->m_sb.sb_agblocks);
+
+	return __add_raw_rmap(mp, agno, agbno, len, owner, false, false);
+}
+
+/*
+ * Merge adjacent raw rmaps and add them to the main rmap list.
+ */
+int
+fold_raw_rmaps(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_slab_cursor	*cur = NULL;
+	struct xfs_rmap_irec	*prev, *rec;
+	size_t			old_sz;
+	int			error = 0;
+
+	old_sz = slab_count(ag_rmaps[agno].ar_rmaps);
+	if (slab_count(ag_rmaps[agno].ar_raw_rmaps) == 0)
+		goto no_raw;
+	qsort_slab(ag_rmaps[agno].ar_raw_rmaps, rmap_compare);
+	error = init_slab_cursor(ag_rmaps[agno].ar_raw_rmaps, rmap_compare,
+			&cur);
+	if (error)
+		goto err;
+
+	prev = pop_slab_cursor(cur);
+	rec = pop_slab_cursor(cur);
+	while (rec) {
+		if (mergeable_rmaps(prev, rec)) {
+			prev->rm_blockcount += rec->rm_blockcount;
+			rec = pop_slab_cursor(cur);
+			continue;
+		}
+		error = slab_add(ag_rmaps[agno].ar_rmaps, prev);
+		if (error)
+			goto err;
+		prev = rec;
+		rec = pop_slab_cursor(cur);
+	}
+	if (prev) {
+		error = slab_add(ag_rmaps[agno].ar_rmaps, prev);
+		if (error)
+			goto err;
+	}
+	free_slab(&ag_rmaps[agno].ar_raw_rmaps);
+	error = init_slab(&ag_rmaps[agno].ar_raw_rmaps,
+			sizeof(struct xfs_rmap_irec));
+	if (error)
+		do_error(
+_("Insufficient memory while allocating raw metadata reverse mapping slabs."));
+no_raw:
+	if (old_sz)
+		qsort_slab(ag_rmaps[agno].ar_rmaps, rmap_compare);
+err:
+	free_slab_cursor(&cur);
+	return error;
+}
+
 #ifdef RMAP_DEBUG
 static void
 dump_rmap(
diff --git a/repair/rmap.h b/repair/rmap.h
index 0832790..ca92623 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -28,5 +28,9 @@ extern void init_rmaps(struct xfs_mount *);
 extern void free_rmaps(struct xfs_mount *);
 
 extern int add_rmap(struct xfs_mount *, xfs_ino_t, int, struct xfs_bmbt_irec *);
+extern int add_ag_rmap(struct xfs_mount *, xfs_agnumber_t agno,
+		xfs_agblock_t agbno, xfs_extlen_t len, uint64_t owner);
+extern int fold_raw_rmaps(struct xfs_mount *mp, xfs_agnumber_t agno);
+extern bool mergeable_rmaps(struct xfs_rmap_irec *r1, struct xfs_rmap_irec *r2);
 
 #endif /* RMAP_H_ */

* [PATCH 061/145] xfs_repair: add inode bmbt block rmaps
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (59 preceding siblings ...)
  2016-06-17  1:37 ` [PATCH 060/145] xfs_repair: record and merge raw rmap data Darrick J. Wong
@ 2016-06-17  1:37 ` Darrick J. Wong
  2016-06-17  1:37 ` [PATCH 062/145] xfs_repair: add fixed-location per-AG rmaps Darrick J. Wong
                   ` (83 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:37 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Record BMBT blocks in the raw rmap list.
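
For illustration only (not code from the patch): a rough sketch of how
a single BMBT block could be turned into a one-block raw record. It
assumes the usual XFS split of a filesystem block number into an AG
number (high bits) and an AG block number (low agblklog bits). The
names demo_raw_rmap and demo_bmbt_rmap and the DEMO_* flag values are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; the real ones live in libxfs. */
#define DEMO_RMAP_ATTR_FORK	(1u << 0)
#define DEMO_RMAP_BMBT_BLOCK	(1u << 1)

/* Hypothetical raw observation for one extent of metadata. */
struct demo_raw_rmap {
	uint32_t	agno;	/* allocation group number */
	uint32_t	agbno;	/* block number within the AG */
	uint32_t	len;	/* length in blocks */
	uint64_t	owner;	/* inode that owns the btree block */
	unsigned int	flags;
};

/* Build the raw record for one bmbt block owned by inode ino. */
static struct demo_raw_rmap
demo_bmbt_rmap(uint64_t fsbno, unsigned int agblklog, uint64_t ino,
	       bool is_attr)
{
	struct demo_raw_rmap	r;

	assert(agblklog > 0 && agblklog < 32);

	r.agno = (uint32_t)(fsbno >> agblklog);
	r.agbno = (uint32_t)(fsbno & ((1ULL << agblklog) - 1));
	r.len = 1;			/* a bmbt block is always one block */
	r.owner = ino;
	r.flags = DEMO_RMAP_BMBT_BLOCK;
	if (is_attr)
		r.flags |= DEMO_RMAP_ATTR_FORK;
	return r;
}
```

Each block is recorded with length 1; merging neighbouring blocks of
the same owner is left to the later fold step.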

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/rmap.c |   26 ++++++++++++++++++++++++++
 repair/rmap.h |    1 +
 repair/scan.c |   11 +++++++++++
 3 files changed, 38 insertions(+)


diff --git a/repair/rmap.c b/repair/rmap.c
index 1851742..e30e99b 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -230,6 +230,32 @@ __add_raw_rmap(
 }
 
 /*
+ * Add a reverse mapping for an inode fork's block mapping btree block.
+ */
+int
+add_bmbt_rmap(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino,
+	int			whichfork,
+	xfs_fsblock_t		fsbno)
+{
+	xfs_agnumber_t		agno;
+	xfs_agblock_t		agbno;
+
+	if (!needs_rmap_work(mp))
+		return 0;
+
+	agno = XFS_FSB_TO_AGNO(mp, fsbno);
+	agbno = XFS_FSB_TO_AGBNO(mp, fsbno);
+	ASSERT(agno != NULLAGNUMBER);
+	ASSERT(agno < mp->m_sb.sb_agcount);
+	ASSERT(agbno + 1 <= mp->m_sb.sb_agblocks);
+
+	return __add_raw_rmap(mp, agno, agbno, 1, ino,
+			whichfork == XFS_ATTR_FORK, true);
+}
+
+/*
  * Add a reverse mapping for a per-AG fixed metadata extent.
  */
 int
diff --git a/repair/rmap.h b/repair/rmap.h
index ca92623..6a3a0a4 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -30,6 +30,7 @@ extern void free_rmaps(struct xfs_mount *);
 extern int add_rmap(struct xfs_mount *, xfs_ino_t, int, struct xfs_bmbt_irec *);
 extern int add_ag_rmap(struct xfs_mount *, xfs_agnumber_t agno,
 		xfs_agblock_t agbno, xfs_extlen_t len, uint64_t owner);
+extern int add_bmbt_rmap(struct xfs_mount *, xfs_ino_t, int, xfs_fsblock_t);
 extern int fold_raw_rmaps(struct xfs_mount *mp, xfs_agnumber_t agno);
 extern bool mergeable_rmaps(struct xfs_rmap_irec *r1, struct xfs_rmap_irec *r2);
 
diff --git a/repair/scan.c b/repair/scan.c
index eb23685..6157d71 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -29,6 +29,7 @@
 #include "bmap.h"
 #include "progress.h"
 #include "threads.h"
+#include "rmap.h"
 
 static xfs_mount_t	*mp = NULL;
 
@@ -197,6 +198,7 @@ scan_bmapbt(
 	xfs_agnumber_t		agno;
 	xfs_agblock_t		agbno;
 	int			state;
+	int			error;
 
 	/*
 	 * unlike the ag freeblock btrees, if anything looks wrong
@@ -378,6 +380,15 @@ _("bad state %d, inode %" PRIu64 " bmap block 0x%" PRIx64 "\n"),
 	(*tot)++;
 	numrecs = be16_to_cpu(block->bb_numrecs);
 
+	/* Record BMBT blocks in the reverse-mapping data. */
+	if (check_dups && collect_rmaps) {
+		error = add_bmbt_rmap(mp, ino, whichfork, bno);
+		if (error)
+			do_error(
+_("couldn't add inode %"PRIu64" bmbt block %"PRIu64" reverse-mapping data."),
+				ino, bno);
+	}
+
 	if (level == 0) {
 		if (numrecs > mp->m_bmap_dmxr[0] || (isroot == 0 && numrecs <
 							mp->m_bmap_dmnr[0])) {

* [PATCH 062/145] xfs_repair: add fixed-location per-AG rmaps
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (60 preceding siblings ...)
  2016-06-17  1:37 ` [PATCH 061/145] xfs_repair: add inode bmbt block rmaps Darrick J. Wong
@ 2016-06-17  1:37 ` Darrick J. Wong
  2016-06-17  1:37 ` [PATCH 063/145] xfs_repair: check existing rmapbt entries against observed rmaps Darrick J. Wong
                   ` (82 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:37 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Add reverse-mappings for fixed-location per-AG metadata such as inode
chunks, superblocks, and the log to the raw rmap list, then merge the
raw rmap data (which also has the BMBT data) into the main rmap list.

v2: Support sparse inode chunks.
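
For illustration only (not code from the patch): a small sketch of the
sparse-chunk arithmetic, assuming a 64-bit mask in which a set bit
marks an inode that is not physically allocated and a chunk of 64
inodes. The helper names and DEMO_INODES_PER_CHUNK are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_INODES_PER_CHUNK	64

/* Index of the lowest clear bit, or 64 if every bit is set. */
static int
demo_first_zero_bit(uint64_t mask)
{
	int	n = 0;

	while (n < 64 && (mask & 1)) {
		n++;
		mask >>= 1;
	}
	return n;
}

/* Number of set bits in the mask. */
static int
demo_popcount(uint64_t mask)
{
	int	b = 0;

	for (; mask; mask >>= 1)
		b += (int)(mask & 1);
	return b;
}

/*
 * Given a sparse mask, return how many blocks the allocated part of
 * the chunk occupies and store the index of the first allocated inode
 * in *startidx.  At least one block is always reported.
 */
static unsigned int
demo_chunk_blocks(uint64_t sparse, unsigned int inopblock, int *startidx)
{
	unsigned int	nr;

	assert(inopblock > 0);
	assert(startidx != 0);

	*startidx = demo_first_zero_bit(sparse);
	nr = DEMO_INODES_PER_CHUNK - (unsigned int)demo_popcount(sparse);
	nr /= inopblock;
	return nr ? nr : 1;
}
```

With the low 32 inodes sparse and 16 inodes per block, this reports a
start index of 32 and a length of 2 blocks.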

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase4.c |   41 +++++++++++++++++++++++++
 repair/rmap.c   |   92 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/rmap.h   |    2 +
 3 files changed, 135 insertions(+)


diff --git a/repair/phase4.c b/repair/phase4.c
index b4264df..8880c91 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -157,6 +157,40 @@ process_ags(
 	do_inode_prefetch(mp, ag_stride, process_ag_func, true, false);
 }
 
+static void
+check_rmap_btrees(
+	work_queue_t	*wq,
+	xfs_agnumber_t	agno,
+	void		*arg)
+{
+	int		error;
+
+	error = add_fixed_ag_rmap_data(wq->mp, agno);
+	if (error)
+		do_error(
+_("unable to add AG %u metadata reverse-mapping data.\n"), agno);
+
+	error = fold_raw_rmaps(wq->mp, agno);
+	if (error)
+		do_error(
+_("unable to merge AG %u metadata reverse-mapping data.\n"), agno);
+}
+
+static void
+process_rmap_data(
+	struct xfs_mount	*mp)
+{
+	struct work_queue	wq;
+	xfs_agnumber_t		i;
+
+	if (!needs_rmap_work(mp))
+		return;
+
+	create_work_queue(&wq, mp, libxfs_nproc());
+	for (i = 0; i < mp->m_sb.sb_agcount; i++)
+		queue_work(&wq, check_rmap_btrees, i, NULL);
+	destroy_work_queue(&wq);
+}
 
 void
 phase4(xfs_mount_t *mp)
@@ -306,6 +340,13 @@ phase4(xfs_mount_t *mp)
 	 * already in phase 3.
 	 */
 	process_ags(mp);
+
+	/*
+	 * Process all the reverse-mapping data that we collected.  This
+	 * involves checking the rmap data against the btree.
+	 */
+	process_rmap_data(mp);
+
 	print_final_rpt();
 
 	/*
diff --git a/repair/rmap.c b/repair/rmap.c
index e30e99b..8f532fb 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -331,6 +331,98 @@ err:
 	return error;
 }
 
+static int
+find_first_zero_bit(
+	__uint64_t	mask)
+{
+	int		n;
+	int		b = 0;
+
+	for (n = 0; n < sizeof(mask) * NBBY && (mask & 1); n++, mask >>= 1)
+		b++;
+
+	return b;
+}
+
+static int
+popcnt(
+	__uint64_t	mask)
+{
+	int		n;
+	int		b = 0;
+
+	if (mask == 0)
+		return 0;
+
+	for (n = 0; n < sizeof(mask) * NBBY; n++, mask >>= 1)
+		if (mask & 1)
+			b++;
+
+	return b;
+}
+
+/*
+ * Add an allocation group's fixed metadata to the rmap list.  This includes
+ * sb/agi/agf/agfl headers, inode chunks, and the log.
+ */
+int
+add_fixed_ag_rmap_data(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	xfs_fsblock_t		fsbno;
+	xfs_agblock_t		agbno;
+	ino_tree_node_t		*ino_rec;
+	xfs_agino_t		agino;
+	int			error;
+	int			startidx;
+	int			nr;
+
+	if (!needs_rmap_work(mp))
+		return 0;
+
+	/* sb/agi/agf/agfl headers */
+	error = add_ag_rmap(mp, agno, 0, XFS_BNO_BLOCK(mp),
+			XFS_RMAP_OWN_FS);
+	if (error)
+		goto out;
+
+	/* inodes */
+	ino_rec = findfirst_inode_rec(agno);
+	for (; ino_rec != NULL; ino_rec = next_ino_rec(ino_rec)) {
+		if (xfs_sb_version_hassparseinodes(&mp->m_sb)) {
+			startidx = find_first_zero_bit(ino_rec->ir_sparse);
+			nr = XFS_INODES_PER_CHUNK - popcnt(ino_rec->ir_sparse);
+		} else {
+			startidx = 0;
+			nr = XFS_INODES_PER_CHUNK;
+		}
+		nr /= mp->m_sb.sb_inopblock;
+		if (nr == 0)
+			nr = 1;
+		agino = ino_rec->ino_startnum + startidx;
+		agbno = XFS_AGINO_TO_AGBNO(mp, agino);
+		if (XFS_AGINO_TO_OFFSET(mp, agino) == 0) {
+			error = add_ag_rmap(mp, agno, agbno, nr,
+					XFS_RMAP_OWN_INODES);
+			if (error)
+				goto out;
+		}
+	}
+
+	/* log */
+	fsbno = mp->m_sb.sb_logstart;
+	if (fsbno && XFS_FSB_TO_AGNO(mp, fsbno) == agno) {
+		agbno = XFS_FSB_TO_AGBNO(mp, mp->m_sb.sb_logstart);
+		error = add_ag_rmap(mp, agno, agbno, mp->m_sb.sb_logblocks,
+				XFS_RMAP_OWN_LOG);
+		if (error)
+			goto out;
+	}
+out:
+	return error;
+}
+
 #ifdef RMAP_DEBUG
 static void
 dump_rmap(
diff --git a/repair/rmap.h b/repair/rmap.h
index 6a3a0a4..f948f25 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -34,4 +34,6 @@ extern int add_bmbt_rmap(struct xfs_mount *, xfs_ino_t, int, xfs_fsblock_t);
 extern int fold_raw_rmaps(struct xfs_mount *mp, xfs_agnumber_t agno);
 extern bool mergeable_rmaps(struct xfs_rmap_irec *r1, struct xfs_rmap_irec *r2);
 
+extern int add_fixed_ag_rmap_data(struct xfs_mount *, xfs_agnumber_t);
+
 #endif /* RMAP_H_ */

* [PATCH 063/145] xfs_repair: check existing rmapbt entries against observed rmaps
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (61 preceding siblings ...)
  2016-06-17  1:37 ` [PATCH 062/145] xfs_repair: add fixed-location per-AG rmaps Darrick J. Wong
@ 2016-06-17  1:37 ` Darrick J. Wong
  2016-06-17  1:37 ` [PATCH 064/145] xfs_repair: rebuild reverse-mapping btree Darrick J. Wong
                   ` (81 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:37 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Once we've finished collecting reverse mapping observations from the
metadata scan, check those observations against the rmap btree
(particularly if we're in -n mode) to detect rmapbt problems.
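
For illustration only (not code from the patch): a simplified sketch
of the "does the btree record cover the observed record" test.  The
struct and function names are hypothetical, and the has_offset
parameter stands in for the "is this file data rather than metadata
or a bmbt block" decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct xfs_rmap_irec. */
struct demo_rmap {
	uint32_t	startblock;	/* physical start within the AG */
	uint32_t	blockcount;	/* length in blocks */
	uint64_t	owner;
	uint64_t	offset;		/* logical offset within the file */
	unsigned int	flags;
};

/*
 * Return true if the btree record fully contains the observed record.
 * has_offset is false for records whose logical offset is meaningless.
 */
static bool
demo_covers(const struct demo_rmap *observed, const struct demo_rmap *btree,
	    bool has_offset)
{
	uint64_t	obs_end, bt_end;

	assert(observed && btree);

	/* Owner and flags must match exactly. */
	if (btree->flags != observed->flags ||
	    btree->owner != observed->owner)
		return false;

	/* Physical containment. */
	obs_end = (uint64_t)observed->startblock + observed->blockcount;
	bt_end = (uint64_t)btree->startblock + btree->blockcount;
	if (btree->startblock > observed->startblock || bt_end < obs_end)
		return false;

	if (!has_offset)
		return true;

	/* Logical containment. */
	return btree->offset <= observed->offset &&
	       btree->offset + btree->blockcount >=
	       observed->offset + observed->blockcount;
}
```

Containment rather than equality is used because the on-disk btree
may hold one merged record where the scan saw several smaller ones.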

v2: Restructure after moving rmap_irec flags to separate field.
v3: Refactor code to prepare to do range queries for reflink.
Move unwritten bit to rm_offset.
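
For illustration only (not code from the patch): a sketch of what
carrying flag bits in the upper part of the 64-bit offset field can
look like, and why unpacking can fail on "impossible" flags.  The bit
positions, the 54-bit offset width, and all DEMO_* and demo_* names
are assumptions made for this example; the authoritative layout is
whatever libxfs defines.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: low 54 bits are the offset, top three bits are flags. */
#define DEMO_OFF_MASK		((1ULL << 54) - 1)
#define DEMO_OFF_ATTR_FORK	(1ULL << 63)
#define DEMO_OFF_BMBT_BLOCK	(1ULL << 62)
#define DEMO_OFF_UNWRITTEN	(1ULL << 61)
#define DEMO_OFF_FLAGS		(DEMO_OFF_ATTR_FORK | DEMO_OFF_BMBT_BLOCK | \
				 DEMO_OFF_UNWRITTEN)

/* Illustrative in-core flag values. */
#define DEMO_RMAP_ATTR_FORK	(1u << 0)
#define DEMO_RMAP_BMBT_BLOCK	(1u << 1)
#define DEMO_RMAP_UNWRITTEN	(1u << 2)

struct demo_rmap {
	uint64_t	offset;
	unsigned int	flags;
};

/* Combine the in-core offset and flags into one on-disk style value. */
static uint64_t
demo_offset_pack(const struct demo_rmap *r)
{
	uint64_t	x;

	assert(r != 0);
	x = r->offset & DEMO_OFF_MASK;
	if (r->flags & DEMO_RMAP_ATTR_FORK)
		x |= DEMO_OFF_ATTR_FORK;
	if (r->flags & DEMO_RMAP_BMBT_BLOCK)
		x |= DEMO_OFF_BMBT_BLOCK;
	if (r->flags & DEMO_RMAP_UNWRITTEN)
		x |= DEMO_OFF_UNWRITTEN;
	return x;
}

/* Split a packed value; returns -1 if unknown bits are set, else 0. */
static int
demo_offset_unpack(uint64_t packed, struct demo_rmap *r)
{
	assert(r != 0);
	if (packed & ~(DEMO_OFF_MASK | DEMO_OFF_FLAGS))
		return -1;
	r->offset = packed & DEMO_OFF_MASK;
	r->flags = 0;
	if (packed & DEMO_OFF_ATTR_FORK)
		r->flags |= DEMO_RMAP_ATTR_FORK;
	if (packed & DEMO_OFF_BMBT_BLOCK)
		r->flags |= DEMO_RMAP_BMBT_BLOCK;
	if (packed & DEMO_OFF_UNWRITTEN)
		r->flags |= DEMO_RMAP_UNWRITTEN;
	return 0;
}
```

Comparing packed values, as the sort comparator does, orders records
by flags and offset in one integer comparison.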

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase4.c |    6 +
 repair/rmap.c   |  253 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/rmap.h   |   10 ++
 repair/scan.c   |  104 ++++++++++++++++++++---
 4 files changed, 362 insertions(+), 11 deletions(-)


diff --git a/repair/phase4.c b/repair/phase4.c
index 8880c91..e234d92 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -174,6 +174,12 @@ _("unable to add AG %u metadata reverse-mapping data.\n"), agno);
 	if (error)
 		do_error(
 _("unable to merge AG %u metadata reverse-mapping data.\n"), agno);
+
+	error = check_rmaps(wq->mp, agno);
+	if (error)
+		do_error(
+_("%s while checking reverse-mappings"),
+			 strerror(-error));
 }
 
 static void
diff --git a/repair/rmap.c b/repair/rmap.c
index 8f532fb..4648425 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -42,6 +42,7 @@ struct xfs_ag_rmap {
 };
 
 static struct xfs_ag_rmap *ag_rmaps;
+static bool rmapbt_suspect;
 
 /*
  * Compare rmap observations for array sorting.
@@ -442,3 +443,255 @@ dump_rmap(
 #else
 # define dump_rmap(m, a, r)
 #endif
+
+/*
+ * Return the number of rmap objects for an AG.
+ */
+size_t
+rmap_record_count(
+	struct xfs_mount		*mp,
+	xfs_agnumber_t		agno)
+{
+	return slab_count(ag_rmaps[agno].ar_rmaps);
+}
+
+/*
+ * Return a slab cursor that will return rmap objects in order.
+ */
+int
+init_rmap_cursor(
+	xfs_agnumber_t		agno,
+	struct xfs_slab_cursor	**cur)
+{
+	return init_slab_cursor(ag_rmaps[agno].ar_rmaps, rmap_compare, cur);
+}
+
+/*
+ * Disable the rmap btree check.
+ */
+void
+rmap_avoid_check(void)
+{
+	rmapbt_suspect = true;
+}
+
+/* Look for an rmap in the rmapbt that matches a given rmap. */
+static int
+lookup_rmap(
+	struct xfs_btree_cur	*bt_cur,
+	struct xfs_rmap_irec	*rm_rec,
+	struct xfs_rmap_irec	*tmp,
+	int			*have)
+{
+	int			error;
+
+	/* Use the regular btree retrieval routine. */
+	error = xfs_rmap_lookup_le(bt_cur, rm_rec->rm_startblock,
+				rm_rec->rm_blockcount,
+				rm_rec->rm_owner, rm_rec->rm_offset,
+				rm_rec->rm_flags, have);
+	if (error)
+		return error;
+	if (*have == 0)
+		return error;
+	return xfs_rmap_get_rec(bt_cur, tmp, have);
+}
+
+/* Does the btree rmap cover the observed rmap? */
+#define NEXTP(x)	((x)->rm_startblock + (x)->rm_blockcount)
+#define NEXTL(x)	((x)->rm_offset + (x)->rm_blockcount)
+static bool
+is_good_rmap(
+	struct xfs_rmap_irec	*observed,
+	struct xfs_rmap_irec	*btree)
+{
+	/* Can't have mismatches in the flags or the owner. */
+	if (btree->rm_flags != observed->rm_flags ||
+	    btree->rm_owner != observed->rm_owner)
+		return false;
+
+	/*
+	 * Btree record can't physically start after the observed
+	 * record, nor can it end before the observed record.
+	 */
+	if (btree->rm_startblock > observed->rm_startblock ||
+	    NEXTP(btree) < NEXTP(observed))
+		return false;
+
+	/* If this is metadata or bmbt, we're done. */
+	if (XFS_RMAP_NON_INODE_OWNER(observed->rm_owner) ||
+	    (observed->rm_flags & XFS_RMAP_BMBT_BLOCK))
+		return true;
+	/*
+	 * Btree record can't logically start after the observed
+	 * record, nor can it end before the observed record.
+	 */
+	if (btree->rm_offset > observed->rm_offset ||
+	    NEXTL(btree) < NEXTL(observed))
+		return false;
+
+	return true;
+}
+#undef NEXTP
+#undef NEXTL
+
+/*
+ * Compare the observed reverse mappings against what's in the ag btree.
+ */
+int
+check_rmaps(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_slab_cursor	*rm_cur;
+	struct xfs_btree_cur	*bt_cur = NULL;
+	int			error;
+	int			have;
+	struct xfs_buf		*agbp = NULL;
+	struct xfs_rmap_irec	*rm_rec;
+	struct xfs_rmap_irec	tmp;
+	struct xfs_perag	*pag;		/* per allocation group data */
+
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return 0;
+	if (rmapbt_suspect) {
+		if (no_modify && agno == 0)
+			do_warn(_("would rebuild corrupt rmap btrees.\n"));
+		return 0;
+	}
+
+	/* Create cursors to rmap structures */
+	error = init_rmap_cursor(agno, &rm_cur);
+	if (error)
+		return error;
+
+	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
+	if (error)
+		goto err;
+
+	/* Leave the per-ag data "uninitialized" since we rewrite it later */
+	pag = xfs_perag_get(mp, agno);
+	pag->pagf_init = 0;
+	xfs_perag_put(pag);
+
+	bt_cur = xfs_rmapbt_init_cursor(mp, NULL, agbp, agno);
+	if (!bt_cur) {
+		error = -ENOMEM;
+		goto err;
+	}
+
+	rm_rec = pop_slab_cursor(rm_cur);
+	while (rm_rec) {
+		error = lookup_rmap(bt_cur, rm_rec, &tmp, &have);
+		if (error)
+			goto err;
+		if (!have) {
+			do_warn(
+_("Missing reverse-mapping record for (%u/%u) %slen %u owner %"PRId64" \
+%s%soff %"PRIu64"\n"),
+				agno, rm_rec->rm_startblock,
+				(rm_rec->rm_flags & XFS_RMAP_UNWRITTEN) ?
+					_("unwritten ") : "",
+				rm_rec->rm_blockcount,
+				rm_rec->rm_owner,
+				(rm_rec->rm_flags & XFS_RMAP_ATTR_FORK) ?
+					_("attr ") : "",
+				(rm_rec->rm_flags & XFS_RMAP_BMBT_BLOCK) ?
+					_("bmbt ") : "",
+				rm_rec->rm_offset);
+			goto next_loop;
+		}
+
+		/* Compare each rmap observation against the btree's */
+		if (!is_good_rmap(rm_rec, &tmp)) {
+			do_warn(
+_("Incorrect reverse-mapping: saw (%u/%u) %slen %u owner %"PRId64" %s%soff \
+%"PRIu64"; should be (%u/%u) %slen %u owner %"PRId64" %s%soff %"PRIu64"\n"),
+				agno, tmp.rm_startblock,
+				(tmp.rm_flags & XFS_RMAP_UNWRITTEN) ?
+					_("unwritten ") : "",
+				tmp.rm_blockcount,
+				tmp.rm_owner,
+				(tmp.rm_flags & XFS_RMAP_ATTR_FORK) ?
+					_("attr ") : "",
+				(tmp.rm_flags & XFS_RMAP_BMBT_BLOCK) ?
+					_("bmbt ") : "",
+				tmp.rm_offset,
+				agno, rm_rec->rm_startblock,
+				(rm_rec->rm_flags & XFS_RMAP_UNWRITTEN) ?
+					_("unwritten ") : "",
+				rm_rec->rm_blockcount,
+				rm_rec->rm_owner,
+				(rm_rec->rm_flags & XFS_RMAP_ATTR_FORK) ?
+					_("attr ") : "",
+				(rm_rec->rm_flags & XFS_RMAP_BMBT_BLOCK) ?
+					_("bmbt ") : "",
+				rm_rec->rm_offset);
+			goto next_loop;
+		}
+next_loop:
+		rm_rec = pop_slab_cursor(rm_cur);
+	}
+
+err:
+	if (bt_cur)
+		xfs_btree_del_cursor(bt_cur, XFS_BTREE_NOERROR);
+	if (agbp)
+		libxfs_putbuf(agbp);
+	free_slab_cursor(&rm_cur);
+	return error;
+}
+
+/* Compare the key fields of two rmap records. */
+__int64_t
+rmap_diffkeys(
+	struct xfs_rmap_irec	*kp1,
+	struct xfs_rmap_irec	*kp2)
+{
+	__u64			oa;
+	__u64			ob;
+	__int64_t		d;
+	struct xfs_rmap_irec	tmp;
+
+	tmp = *kp1;
+	tmp.rm_flags &= ~XFS_RMAP_REC_FLAGS;
+	oa = xfs_rmap_irec_offset_pack(&tmp);
+	tmp = *kp2;
+	tmp.rm_flags &= ~XFS_RMAP_REC_FLAGS;
+	ob = xfs_rmap_irec_offset_pack(&tmp);
+
+	d = (__int64_t)kp2->rm_startblock - kp1->rm_startblock;
+	if (d)
+		return d;
+
+	if (kp2->rm_owner > kp1->rm_owner)
+		return 1;
+	else if (kp1->rm_owner > kp2->rm_owner)
+		return -1;
+
+	if (ob > oa)
+		return 1;
+	else if (oa > ob)
+		return -1;
+	return 0;
+}
+
+/* Compute the high key of an rmap record. */
+void
+rmap_high_key_from_rec(
+	struct xfs_rmap_irec	*rec,
+	struct xfs_rmap_irec	*key)
+{
+	int			adj;
+
+	adj = rec->rm_blockcount - 1;
+
+	key->rm_startblock = rec->rm_startblock + adj;
+	key->rm_owner = rec->rm_owner;
+	key->rm_offset = rec->rm_offset;
+	key->rm_flags = rec->rm_flags & XFS_RMAP_KEY_FLAGS;
+	if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
+	    (rec->rm_flags & XFS_RMAP_BMBT_BLOCK))
+		return;
+	key->rm_offset += adj;
+}
diff --git a/repair/rmap.h b/repair/rmap.h
index f948f25..d9d08d4 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -36,4 +36,14 @@ extern bool mergeable_rmaps(struct xfs_rmap_irec *r1, struct xfs_rmap_irec *r2);
 
 extern int add_fixed_ag_rmap_data(struct xfs_mount *, xfs_agnumber_t);
 
+extern size_t rmap_record_count(struct xfs_mount *, xfs_agnumber_t);
+extern int init_rmap_cursor(xfs_agnumber_t, struct xfs_slab_cursor **);
+extern void rmap_avoid_check(void);
+extern int check_rmaps(struct xfs_mount *, xfs_agnumber_t);
+
+extern __int64_t rmap_diffkeys(struct xfs_rmap_irec *kp1,
+		struct xfs_rmap_irec *kp2);
+extern void rmap_high_key_from_rec(struct xfs_rmap_irec *rec,
+		struct xfs_rmap_irec *key);
+
 #endif /* RMAP_H_ */
diff --git a/repair/scan.c b/repair/scan.c
index 6157d71..6106d93 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -29,6 +29,7 @@
 #include "bmap.h"
 #include "progress.h"
 #include "threads.h"
+#include "slab.h"
 #include "rmap.h"
 
 static xfs_mount_t	*mp = NULL;
@@ -783,6 +784,11 @@ ino_issparse(
 	return xfs_inobt_is_sparse_disk(rp, offset);
 }
 
+struct rmap_priv {
+	struct aghdr_cnts	*agcnts;
+	struct xfs_rmap_irec	high_key;
+};
+
 static void
 scan_rmapbt(
 	struct xfs_btree_block	*block,
@@ -794,21 +800,26 @@ scan_rmapbt(
 	__uint32_t		magic,
 	void			*priv)
 {
-	struct aghdr_cnts	*agcnts = priv;
 	const char		*name = "rmap";
 	int			i;
 	xfs_rmap_ptr_t		*pp;
 	struct xfs_rmap_rec	*rp;
+	struct rmap_priv	*rmap_priv = priv;
 	int			hdr_errors = 0;
 	int			numrecs;
 	int			state;
 	xfs_agblock_t		lastblock = 0;
 	int64_t			lastowner = 0;
 	int64_t			lastoffset = 0;
+	struct xfs_rmap_key	*kp;
+	struct xfs_rmap_irec	key;
+
 
 	if (magic != XFS_RMAP_CRC_MAGIC) {
 		name = "(unknown)";
-		assert(0);
+		hdr_errors++;
+		suspect++;
+		goto out;
 	}
 
 	if (be32_to_cpu(block->bb_magic) != magic) {
@@ -816,7 +827,7 @@ scan_rmapbt(
 			be32_to_cpu(block->bb_magic), name, agno, bno);
 		hdr_errors++;
 		if (suspect)
-			return;
+			goto out;
 	}
 
 	/*
@@ -825,8 +836,8 @@ scan_rmapbt(
 	 * free data block counter.
 	 */
 	if (!isroot) {
-		agcnts->agfbtreeblks++;
-		agcnts->fdblocks++;
+		rmap_priv->agcnts->agfbtreeblks++;
+		rmap_priv->agcnts->fdblocks++;
 	}
 
 	if (be16_to_cpu(block->bb_level) != level) {
@@ -834,7 +845,7 @@ scan_rmapbt(
 			level, be16_to_cpu(block->bb_level), name, agno, bno);
 		hdr_errors++;
 		if (suspect)
-			return;
+			goto out;
 	}
 
 	/* check for btree blocks multiply claimed */
@@ -844,7 +855,7 @@ scan_rmapbt(
 		do_warn(
 _("%s rmap btree block claimed (state %d), agno %d, bno %d, suspect %d\n"),
 				name, state, agno, bno, suspect);
-		return;
+		goto out;
 	}
 	set_bmap(agno, bno, XR_E_FS_MAP);
 
@@ -878,7 +889,20 @@ _("%s rmap btree block claimed (state %d), agno %d, bno %d, suspect %d\n"),
 			len = be32_to_cpu(rp[i].rm_blockcount);
 			owner = be64_to_cpu(rp[i].rm_owner);
 			offset = be64_to_cpu(rp[i].rm_offset);
-			end = b + len;
+
+			key.rm_flags = 0;
+			key.rm_startblock = b;
+			key.rm_blockcount = len;
+			key.rm_owner = owner;
+			if (xfs_rmap_irec_offset_unpack(offset, &key)) {
+				/* Look for impossible flags. */
+				do_warn(
+	_("invalid flags in record %u of %s btree block %u/%u\n"),
+					i, name, agno, bno);
+				continue;
+			}
+
+			end = key.rm_startblock + key.rm_blockcount;
 
 			/* Make sure agbno & len make sense. */
 			if (!verify_agbno(mp, agno, b)) {
@@ -919,6 +943,18 @@ advance:
 					goto advance;
 			}
 
+			/* Check that we don't go past the high key. */
+			key.rm_startblock += key.rm_blockcount - 1;
+			if (!XFS_RMAP_NON_INODE_OWNER(key.rm_owner) &&
+			    !(key.rm_flags & XFS_RMAP_BMBT_BLOCK))
+				key.rm_offset += key.rm_blockcount - 1;
+			key.rm_blockcount = 0;
+			if (rmap_diffkeys(&rmap_priv->high_key, &key) > 0) {
+				do_warn(
+	_("record %d greater than high key of block (%u/%u) in %s tree\n"),
+					i, agno, bno, name);
+			}
+
 			/* Check for block owner collisions. */
 			for ( ; b < end; b += blen)  {
 				state = get_bmap_ext(agno, b, end, &blen);
@@ -996,7 +1032,7 @@ _("unknown block (%d,%d-%d) mismatch on %s tree, state - %d,%" PRIx64 "\n"),
 				}
 			}
 		}
-		return;
+		goto out;
 	}
 
 	/*
@@ -1024,12 +1060,33 @@ _("unknown block (%d,%d-%d) mismatch on %s tree, state - %d,%" PRIx64 "\n"),
 			mp->m_rmap_mnr[1], mp->m_rmap_mxr[1],
 			name, agno, bno);
 		if (suspect)
-			return;
+			goto out;
 		suspect++;
 	} else if (suspect) {
 		suspect = 0;
 	}
 
+	/* check the node's high keys */
+	for (i = 0; !isroot && i < numrecs; i++) {
+		kp = XFS_RMAP_HIGH_KEY_ADDR(block, i + 1);
+
+		key.rm_flags = 0;
+		key.rm_startblock = be32_to_cpu(kp->rm_startblock);
+		key.rm_owner = be64_to_cpu(kp->rm_owner);
+		if (xfs_rmap_irec_offset_unpack(be64_to_cpu(kp->rm_offset),
+				&key)) {
+			/* Look for impossible flags. */
+			do_warn(
+	_("invalid flags in key %u of %s btree block %u/%u\n"),
+				i, name, agno, bno);
+			continue;
+		}
+		if (rmap_diffkeys(&rmap_priv->high_key, &key) > 0)
+			do_warn(
+	_("key %d greater than high key of block (%u/%u) in %s tree\n"),
+				i, agno, bno, name);
+	}
+
 	for (i = 0; i < numrecs; i++)  {
 		xfs_agblock_t		bno = be32_to_cpu(pp[i]);
 
@@ -1042,11 +1099,30 @@ _("unknown block (%d,%d-%d) mismatch on %s tree, state - %d,%" PRIx64 "\n"),
 		 * pointer mismatch, try and extract as much data
 		 * as possible.
 		 */
+		kp = XFS_RMAP_HIGH_KEY_ADDR(block, i + 1);
+		rmap_priv->high_key.rm_flags = 0;
+		rmap_priv->high_key.rm_startblock =
+				be32_to_cpu(kp->rm_startblock);
+		rmap_priv->high_key.rm_owner =
+				be64_to_cpu(kp->rm_owner);
+		if (xfs_rmap_irec_offset_unpack(be64_to_cpu(kp->rm_offset),
+				&rmap_priv->high_key)) {
+			/* Look for impossible flags. */
+			do_warn(
+	_("invalid flags in high key %u of %s btree block %u/%u\n"),
+				i, name, agno, bno);
+			continue;
+		}
+
 		if (bno != 0 && verify_agbno(mp, agno, bno)) {
 			scan_sbtree(bno, level, agno, suspect, scan_rmapbt, 0,
 				    magic, priv, &xfs_rmapbt_buf_ops);
 		}
 	}
+
+out:
+	if (suspect)
+		rmap_avoid_check();
 }
 
 /*
@@ -1815,15 +1891,21 @@ validate_agf(
 	}
 
 	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		struct rmap_priv	priv;
+
+		memset(&priv.high_key, 0xFF, sizeof(priv.high_key));
+		priv.high_key.rm_blockcount = 0;
+		priv.agcnts = agcnts;
 		bno = be32_to_cpu(agf->agf_roots[XFS_BTNUM_RMAP]);
 		if (bno != 0 && verify_agbno(mp, agno, bno)) {
 			scan_sbtree(bno,
 				    be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]),
 				    agno, 0, scan_rmapbt, 1, XFS_RMAP_CRC_MAGIC,
-				    agcnts, &xfs_rmapbt_buf_ops);
+				    &priv, &xfs_rmapbt_buf_ops);
 		} else  {
 			do_warn(_("bad agbno %u for rmapbt root, agno %d\n"),
 				bno, agno);
+			rmap_avoid_check();
 		}
 	}
 


* [PATCH 064/145] xfs_repair: rebuild reverse-mapping btree
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (62 preceding siblings ...)
  2016-06-17  1:37 ` [PATCH 063/145] xfs_repair: check existing rmapbt entries against observed rmaps Darrick J. Wong
@ 2016-06-17  1:37 ` Darrick J. Wong
  2016-06-17  1:37 ` [PATCH 065/145] xfs_repair: add per-AG btree blocks to rmap data and add to rmapbt Darrick J. Wong
                   ` (80 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:37 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Rebuild the reverse-mapping btree with the rmap observations
corresponding to file extents, bmbt blocks, and fixed per-AG metadata.

v2: Update to use the flags in rm_flags and to support high key
updates for the rmapbt.

v3: Initialize rm_flags to zero so as to avoid corruption problems
later when we stash non-key flags in the offset field.

v4: Leave a few empty slots in each rmapbt leaf when we're rebuilding
the rmapbt so that we can insert records for the AG metadata blocks
without causing too many btree splits.  This (hopefully) prevents the
situation where running xfs_repair greatly increases the size of the
btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase5.c |  407 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 393 insertions(+), 14 deletions(-)
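
Note on the v4 leaf slack: the sizing rule can be sketched in isolation as
below.  This is only an illustration of the arithmetic in
init_rmapbt_cursor(), not code from the patch; struct leaf_geometry,
rmap_leaf_geometry() and RMAP_LEAF_SLACK are hypothetical names, and the
sketch assumes maxrecs is larger than the slack.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the "maxrecs -= 10" in the patch; the name is hypothetical. */
#define RMAP_LEAF_SLACK	10

/* Hypothetical summary of the level-0 numbers the cursor tracks. */
struct leaf_geometry {
	size_t	num_blocks;	/* leaf blocks needed */
	size_t	num_recs_pb;	/* base number of records per leaf */
	size_t	modulo;		/* this many leaves hold one extra record */
};

static struct leaf_geometry
rmap_leaf_geometry(size_t num_recs, size_t maxrecs)
{
	struct leaf_geometry	g = { 1, 0, 0 };

	assert(maxrecs > RMAP_LEAF_SLACK);

	/* No records: a single empty root block. */
	if (num_recs == 0)
		return g;

	/* Only reserve slack when more than one leaf is needed anyway. */
	if (num_recs > maxrecs)
		maxrecs -= RMAP_LEAF_SLACK;

	/* Round up, then spread the records evenly over the leaves. */
	g.num_blocks = (num_recs + maxrecs - 1) / maxrecs;
	g.num_recs_pb = num_recs / g.num_blocks;
	g.modulo = num_recs % g.num_blocks;
	return g;
}
```

With 1000 records and room for 100 per leaf, the sketch sizes each leaf for
90 records and asks for 12 leaves, four of which hold 84 records and the
rest 83, so every leaf keeps free slots for the later AG metadata inserts.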


diff --git a/repair/phase5.c b/repair/phase5.c
index b58111b..bb065ec 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -28,6 +28,8 @@
 #include "versions.h"
 #include "threads.h"
 #include "progress.h"
+#include "slab.h"
+#include "rmap.h"
 
 /*
  * we maintain the current slice (path from root to leaf)
@@ -1326,6 +1328,359 @@ nextrec:
 	}
 }
 
+/* rebuild the rmap tree */
+
+/*
+ * we don't have to worry here about how chewing up free extents
+ * may perturb things because rmap tree building happens before
+ * freespace tree building.
+ */
+static void
+init_rmapbt_cursor(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	struct bt_status	*btree_curs)
+{
+	size_t			num_recs;
+	int			level;
+	struct bt_stat_level	*lptr;
+	struct bt_stat_level	*p_lptr;
+	xfs_extlen_t		blocks_allocated;
+	int			maxrecs;
+
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		memset(btree_curs, 0, sizeof(struct bt_status));
+		return;
+	}
+
+	lptr = &btree_curs->level[0];
+	btree_curs->init = 1;
+
+	/*
+	 * build up statistics
+	 */
+	num_recs = rmap_record_count(mp, agno);
+	if (num_recs == 0) {
+		/*
+		 * easy corner-case -- no rmap records
+		 */
+		lptr->num_blocks = 1;
+		lptr->modulo = 0;
+		lptr->num_recs_pb = 0;
+		lptr->num_recs_tot = 0;
+
+		btree_curs->num_levels = 1;
+		btree_curs->num_tot_blocks = btree_curs->num_free_blocks = 1;
+
+		setup_cursor(mp, agno, btree_curs);
+
+		return;
+	}
+
+	/*
+	 * Leave enough slack in the rmapbt that we can insert the
+	 * metadata AG entries without too many splits.
+	 */
+	maxrecs = mp->m_rmap_mxr[0];
+	if (num_recs > maxrecs)
+		maxrecs -= 10;
+	blocks_allocated = lptr->num_blocks = howmany(num_recs, maxrecs);
+
+	lptr->modulo = num_recs % lptr->num_blocks;
+	lptr->num_recs_pb = num_recs / lptr->num_blocks;
+	lptr->num_recs_tot = num_recs;
+	level = 1;
+
+	if (lptr->num_blocks > 1)  {
+		for (; btree_curs->level[level-1].num_blocks > 1
+				&& level < XFS_BTREE_MAXLEVELS;
+				level++)  {
+			lptr = &btree_curs->level[level];
+			p_lptr = &btree_curs->level[level - 1];
+			lptr->num_blocks = howmany(p_lptr->num_blocks,
+				mp->m_rmap_mxr[1]);
+			lptr->modulo = p_lptr->num_blocks % lptr->num_blocks;
+			lptr->num_recs_pb = p_lptr->num_blocks
+					/ lptr->num_blocks;
+			lptr->num_recs_tot = p_lptr->num_blocks;
+
+			blocks_allocated += lptr->num_blocks;
+		}
+	}
+	ASSERT(lptr->num_blocks == 1);
+	btree_curs->num_levels = level;
+
+	btree_curs->num_tot_blocks = btree_curs->num_free_blocks
+			= blocks_allocated;
+
+	setup_cursor(mp, agno, btree_curs);
+}
+
+static void
+prop_rmap_cursor(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	struct bt_status	*btree_curs,
+	struct xfs_rmap_irec	*rm_rec,
+	int			level)
+{
+	struct xfs_btree_block	*bt_hdr;
+	struct xfs_rmap_key	*bt_key;
+	xfs_rmap_ptr_t		*bt_ptr;
+	xfs_agblock_t		agbno;
+	struct bt_stat_level	*lptr;
+
+	level++;
+
+	if (level >= btree_curs->num_levels)
+		return;
+
+	lptr = &btree_curs->level[level];
+	bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
+
+	if (be16_to_cpu(bt_hdr->bb_numrecs) == 0)  {
+		/*
+		 * this only happens once to initialize the
+		 * first path up the left side of the tree
+		 * where the agbno's are already set up
+		 */
+		prop_rmap_cursor(mp, agno, btree_curs, rm_rec, level);
+	}
+
+	if (be16_to_cpu(bt_hdr->bb_numrecs) ==
+				lptr->num_recs_pb + (lptr->modulo > 0))  {
+		/*
+		 * write out current prev block, grab us a new block,
+		 * and set the rightsib pointer of current block
+		 */
+#ifdef XR_BLD_INO_TRACE
+		fprintf(stderr, " ino prop agbno %d ", lptr->prev_agbno);
+#endif
+		if (lptr->prev_agbno != NULLAGBLOCK)  {
+			ASSERT(lptr->prev_buf_p != NULL);
+			libxfs_writebuf(lptr->prev_buf_p, 0);
+		}
+		lptr->prev_agbno = lptr->agbno;
+		lptr->prev_buf_p = lptr->buf_p;
+		agbno = get_next_blockaddr(agno, level, btree_curs);
+
+		bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(agbno);
+
+		lptr->buf_p = libxfs_getbuf(mp->m_dev,
+					XFS_AGB_TO_DADDR(mp, agno, agbno),
+					XFS_FSB_TO_BB(mp, 1));
+		lptr->agbno = agbno;
+
+		if (lptr->modulo)
+			lptr->modulo--;
+
+		/*
+		 * initialize block header
+		 */
+		lptr->buf_p->b_ops = &xfs_rmapbt_buf_ops;
+		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
+		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
+		xfs_btree_init_block(mp, lptr->buf_p, XFS_RMAP_CRC_MAGIC,
+					level, 0, agno,
+					XFS_BTREE_CRC_BLOCKS);
+
+		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
+
+		/*
+		 * propagate extent record for first extent in new block up
+		 */
+		prop_rmap_cursor(mp, agno, btree_curs, rm_rec, level);
+	}
+	/*
+	 * add rmap key and pointer to current block
+	 */
+	be16_add_cpu(&bt_hdr->bb_numrecs, 1);
+
+	bt_key = XFS_RMAP_KEY_ADDR(bt_hdr,
+				    be16_to_cpu(bt_hdr->bb_numrecs));
+	bt_ptr = XFS_RMAP_PTR_ADDR(bt_hdr,
+				    be16_to_cpu(bt_hdr->bb_numrecs),
+				    mp->m_rmap_mxr[1]);
+
+	bt_key->rm_startblock = cpu_to_be32(rm_rec->rm_startblock);
+	bt_key->rm_owner = cpu_to_be64(rm_rec->rm_owner);
+	bt_key->rm_offset = cpu_to_be64(rm_rec->rm_offset);
+
+	*bt_ptr = cpu_to_be32(btree_curs->level[level-1].agbno);
+}
+
+static void
+prop_rmap_highkey(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	struct bt_status	*btree_curs,
+	struct xfs_rmap_irec	*rm_highkey)
+{
+	struct xfs_btree_block	*bt_hdr;
+	struct xfs_rmap_key	*bt_key;
+	struct bt_stat_level	*lptr;
+	struct xfs_rmap_irec	key;
+	struct xfs_rmap_irec	high_key;
+	int			level;
+	int			i;
+	int			numrecs;
+
+	key.rm_flags = 0;
+	high_key = *rm_highkey;
+	for (level = 1; level < btree_curs->num_levels; level++) {
+		lptr = &btree_curs->level[level];
+		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
+		numrecs = be16_to_cpu(bt_hdr->bb_numrecs);
+		bt_key = XFS_RMAP_HIGH_KEY_ADDR(bt_hdr, numrecs);
+
+		bt_key->rm_startblock = cpu_to_be32(high_key.rm_startblock);
+		bt_key->rm_owner = cpu_to_be64(high_key.rm_owner);
+		bt_key->rm_offset = cpu_to_be64(
+				xfs_rmap_irec_offset_pack(&high_key));
+
+		for (i = 1; i < numrecs - 1; i++) {
+			bt_key = XFS_RMAP_HIGH_KEY_ADDR(bt_hdr, i);
+			key.rm_startblock = be32_to_cpu(bt_key->rm_startblock);
+			key.rm_owner = be64_to_cpu(bt_key->rm_owner);
+			key.rm_offset = be64_to_cpu(bt_key->rm_offset);
+			if (rmap_diffkeys(&high_key, &key) > 0)
+				high_key = key;
+		}
+	}
+}
+
+/*
+ * rebuilds an rmap btree given a cursor.
+ */
+static void
+build_rmap_tree(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	struct bt_status	*btree_curs)
+{
+	xfs_agnumber_t		i;
+	xfs_agblock_t		j;
+	xfs_agblock_t		agbno;
+	struct xfs_btree_block	*bt_hdr;
+	struct xfs_rmap_irec	*rm_rec;
+	struct xfs_slab_cursor	*rmap_cur;
+	struct xfs_rmap_rec	*bt_rec;
+	struct xfs_rmap_irec	highest_key;
+	struct xfs_rmap_irec	hi_key;
+	struct bt_stat_level	*lptr;
+	int			level = btree_curs->num_levels;
+	int			error;
+
+	highest_key.rm_flags = 0;
+	for (i = 0; i < level; i++)  {
+		lptr = &btree_curs->level[i];
+
+		agbno = get_next_blockaddr(agno, i, btree_curs);
+		lptr->buf_p = libxfs_getbuf(mp->m_dev,
+					XFS_AGB_TO_DADDR(mp, agno, agbno),
+					XFS_FSB_TO_BB(mp, 1));
+
+		if (i == btree_curs->num_levels - 1)
+			btree_curs->root = agbno;
+
+		lptr->agbno = agbno;
+		lptr->prev_agbno = NULLAGBLOCK;
+		lptr->prev_buf_p = NULL;
+		/*
+		 * initialize block header
+		 */
+
+		lptr->buf_p->b_ops = &xfs_rmapbt_buf_ops;
+		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
+		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
+		xfs_btree_init_block(mp, lptr->buf_p, XFS_RMAP_CRC_MAGIC,
+					i, 0, agno,
+					XFS_BTREE_CRC_BLOCKS);
+	}
+
+	/*
+	 * run along leaf, setting up records.  as we have to switch
+	 * blocks, call the prop_rmap_cursor routine to set up the new
+	 * pointers for the parent.  that can recurse up to the root
+	 * if required.  set the sibling pointers for leaf level here.
+	 */
+	error = init_rmap_cursor(agno, &rmap_cur);
+	if (error)
+		do_error(
+_("Insufficient memory to construct reverse-map cursor."));
+	rm_rec = pop_slab_cursor(rmap_cur);
+	lptr = &btree_curs->level[0];
+
+	for (i = 0; i < lptr->num_blocks; i++)  {
+		/*
+		 * block initialization, lay in block header
+		 */
+		lptr->buf_p->b_ops = &xfs_rmapbt_buf_ops;
+		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
+		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
+		xfs_btree_init_block(mp, lptr->buf_p, XFS_RMAP_CRC_MAGIC,
+					0, 0, agno,
+					XFS_BTREE_CRC_BLOCKS);
+
+		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
+		bt_hdr->bb_numrecs = cpu_to_be16(lptr->num_recs_pb +
+							(lptr->modulo > 0));
+
+		if (lptr->modulo > 0)
+			lptr->modulo--;
+
+		if (lptr->num_recs_pb > 0)
+			prop_rmap_cursor(mp, agno, btree_curs, rm_rec, 0);
+
+		bt_rec = (struct xfs_rmap_rec *)
+			  ((char *)bt_hdr + XFS_RMAP_BLOCK_LEN);
+		highest_key.rm_startblock = 0;
+		highest_key.rm_owner = 0;
+		highest_key.rm_offset = 0;
+		for (j = 0; j < be16_to_cpu(bt_hdr->bb_numrecs); j++) {
+			ASSERT(rm_rec != NULL);
+			bt_rec[j].rm_startblock =
+					cpu_to_be32(rm_rec->rm_startblock);
+			bt_rec[j].rm_blockcount =
+					cpu_to_be32(rm_rec->rm_blockcount);
+			bt_rec[j].rm_owner = cpu_to_be64(rm_rec->rm_owner);
+			bt_rec[j].rm_offset = cpu_to_be64(
+					xfs_rmap_irec_offset_pack(rm_rec));
+			rmap_high_key_from_rec(rm_rec, &hi_key);
+			if (rmap_diffkeys(&highest_key, &hi_key) > 0)
+				highest_key = hi_key;
+
+			rm_rec = pop_slab_cursor(rmap_cur);
+		}
+
+		/* Now go set the parent key */
+		prop_rmap_highkey(mp, agno, btree_curs, &highest_key);
+
+		if (rm_rec != NULL)  {
+			/*
+			 * get next leaf level block
+			 */
+			if (lptr->prev_buf_p != NULL)  {
+#ifdef XR_BLD_RL_TRACE
+				fprintf(stderr, "writing rmapbt agbno %u\n",
+					lptr->prev_agbno);
+#endif
+				ASSERT(lptr->prev_agbno != NULLAGBLOCK);
+				libxfs_writebuf(lptr->prev_buf_p, 0);
+			}
+			lptr->prev_buf_p = lptr->buf_p;
+			lptr->prev_agbno = lptr->agbno;
+			lptr->agbno = get_next_blockaddr(agno, 0, btree_curs);
+			bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(lptr->agbno);
+
+			lptr->buf_p = libxfs_getbuf(mp->m_dev,
+					XFS_AGB_TO_DADDR(mp, agno, lptr->agbno),
+					XFS_FSB_TO_BB(mp, 1));
+		}
+	}
+	free_slab_cursor(&rmap_cur);
+}
+
 /*
  * build both the agf and the agfl for an agno given both
  * btree cursors.
@@ -1333,19 +1688,21 @@ nextrec:
  * XXX: yet more common code that can be shared with mkfs/growfs.
  */
 static void
-build_agf_agfl(xfs_mount_t	*mp,
-		xfs_agnumber_t	agno,
-		bt_status_t	*bno_bt,
-		bt_status_t	*bcnt_bt,
-		xfs_extlen_t	freeblks,	/* # free blocks in tree */
-		int		lostblocks)	/* # blocks that will be lost */
+build_agf_agfl(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	struct bt_status	*bno_bt,
+	struct bt_status	*bcnt_bt,
+	xfs_extlen_t		freeblks,	/* # free blocks in tree */
+	int			lostblocks,	/* # blocks that will be lost */
+	struct bt_status	*rmap_bt)
 {
-	extent_tree_node_t	*ext_ptr;
-	xfs_buf_t		*agf_buf, *agfl_buf;
+	struct extent_tree_node	*ext_ptr;
+	struct xfs_buf		*agf_buf, *agfl_buf;
 	int			i;
 	int			j;
-	xfs_agfl_t		*agfl;
-	xfs_agf_t		*agf;
+	struct xfs_agfl		*agfl;
+	struct xfs_agf		*agf;
 	__be32			*freelist;
 
 	agf_buf = libxfs_getbuf(mp->m_dev,
@@ -1377,20 +1734,25 @@ build_agf_agfl(xfs_mount_t	*mp,
 	agf->agf_levels[XFS_BTNUM_BNO] = cpu_to_be32(bno_bt->num_levels);
 	agf->agf_roots[XFS_BTNUM_CNT] = cpu_to_be32(bcnt_bt->root);
 	agf->agf_levels[XFS_BTNUM_CNT] = cpu_to_be32(bcnt_bt->num_levels);
+	agf->agf_roots[XFS_BTNUM_RMAP] = cpu_to_be32(rmap_bt->root);
+	agf->agf_levels[XFS_BTNUM_RMAP] = cpu_to_be32(rmap_bt->num_levels);
 	agf->agf_freeblks = cpu_to_be32(freeblks);
 
 	/*
 	 * Count and record the number of btree blocks consumed if required.
 	 */
 	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
+		unsigned int blks;
 		/*
 		 * Don't count the root blocks as they are already
 		 * accounted for.
 		 */
-		agf->agf_btreeblks = cpu_to_be32(
-			(bno_bt->num_tot_blocks - bno_bt->num_free_blocks) +
+		blks = (bno_bt->num_tot_blocks - bno_bt->num_free_blocks) +
 			(bcnt_bt->num_tot_blocks - bcnt_bt->num_free_blocks) -
-			2);
+			2;
+		if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+			blks += rmap_bt->num_tot_blocks - rmap_bt->num_free_blocks - 1;
+		agf->agf_btreeblks = cpu_to_be32(blks);
 #ifdef XR_BLD_FREE_TRACE
 		fprintf(stderr, "agf->agf_btreeblks = %u\n",
 				be32_to_cpu(agf->agf_btreeblks));
@@ -1586,6 +1948,7 @@ phase5_func(
 	bt_status_t	bcnt_btree_curs;
 	bt_status_t	ino_btree_curs;
 	bt_status_t	fino_btree_curs;
+	bt_status_t	rmap_btree_curs;
 	int		extra_blocks = 0;
 	uint		num_freeblocks;
 	xfs_extlen_t	freeblks1;
@@ -1641,6 +2004,12 @@ phase5_func(
 		sb_icount_ag[agno] += num_inos;
 		sb_ifree_ag[agno] += num_free_inos;
 
+		/*
+		 * Set up the btree cursors for the on-disk rmap btrees,
+		 * which includes pre-allocating all required blocks.
+		 */
+		init_rmapbt_cursor(mp, agno, &rmap_btree_curs);
+
 		num_extents = count_bno_extents_blocks(agno, &num_freeblocks);
 		/*
 		 * lose two blocks per AG -- the space tree roots
@@ -1725,11 +2094,19 @@ phase5_func(
 
 		ASSERT(freeblks1 == freeblks2);
 
+		if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+			build_rmap_tree(mp, agno, &rmap_btree_curs);
+			write_cursor(&rmap_btree_curs);
+			sb_fdblocks_ag[agno] += (rmap_btree_curs.num_tot_blocks -
+					rmap_btree_curs.num_free_blocks) - 1;
+		}
+
 		/*
 		 * set up agf and agfl
 		 */
 		build_agf_agfl(mp, agno, &bno_btree_curs,
-				&bcnt_btree_curs, freeblks1, extra_blocks);
+				&bcnt_btree_curs, freeblks1, extra_blocks,
+				&rmap_btree_curs);
 		/*
 		 * build inode allocation tree.
 		 */
@@ -1758,6 +2135,8 @@ phase5_func(
 		 */
 		finish_cursor(&bno_btree_curs);
 		finish_cursor(&ino_btree_curs);
+		if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+			finish_cursor(&rmap_btree_curs);
 		if (xfs_sb_version_hasfinobt(&mp->m_sb))
 			finish_cursor(&fino_btree_curs);
 		finish_cursor(&bcnt_btree_curs);


* [PATCH 065/145] xfs_repair: add per-AG btree blocks to rmap data and add to rmapbt
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (63 preceding siblings ...)
  2016-06-17  1:37 ` [PATCH 064/145] xfs_repair: rebuild reverse-mapping btree Darrick J. Wong
@ 2016-06-17  1:37 ` Darrick J. Wong
  2016-06-17  1:37 ` [PATCH 066/145] xfs_repair: merge data & attr fork reverse mappings Darrick J. Wong
                   ` (79 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:37 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Since we can't know the location of the new per-AG btree blocks prior
to constructing the rmapbt, we must record raw reverse-mapping data for
btree blocks while the new btrees are under construction.  After the
rmapbt has been rebuilt, merge the btree rmap entries into the rmapbt
with the libxfs code.

Also refactor the freelist fixing code since we need it to tidy up
the AGFL after each rmapbt allocation.

v2: Use xfs_rmap_alloc to add rmap records for AG metadata blocks
because it knows how to merge adjacent rmaps.  The missing merge was
discovered while running xfs_repair twice on generic/175: block X was
originally allocated to the rmapbt, and then block X+1 was also
allocated to the rmapbt when we expanded it to hold all the entries
for the rmapbt blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase5.c |   52 +++++++-------
 repair/rmap.c   |  198 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/rmap.h   |    4 +
 3 files changed, 226 insertions(+), 28 deletions(-)
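
Note on the v2 change: the X / X+1 case comes down to merging physically
adjacent extents that have the same owner.  A minimal sketch of that idea
follows, assuming a simplified record layout; struct ag_extent,
extents_mergeable() and merge_sorted_extents() are hypothetical names and
are not the libxfs implementation behind xfs_rmap_alloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an AG-metadata rmap record. */
struct ag_extent {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint64_t	owner;
};

/* Same owner, and b starts exactly where a ends. */
static int
extents_mergeable(const struct ag_extent *a, const struct ag_extent *b)
{
	return a->owner == b->owner &&
	       a->startblock + a->blockcount == b->startblock;
}

/*
 * Merge adjacent extents in place.  The array must already be sorted by
 * startblock.  Returns the number of extents left after merging.
 */
static size_t
merge_sorted_extents(struct ag_extent *ext, size_t n)
{
	size_t	out = 0;
	size_t	i;

	if (n == 0)
		return 0;

	for (i = 1; i < n; i++) {
		if (extents_mergeable(&ext[out], &ext[i]))
			ext[out].blockcount += ext[i].blockcount;
		else
			ext[++out] = ext[i];
	}
	return out + 1;
}
```

Inserting blocks X and X+1 as two separate single-block records would leave
the tree with two records where a merged one is expected; going through a
merging insert avoids that.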


diff --git a/repair/phase5.c b/repair/phase5.c
index bb065ec..db84440 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -74,6 +74,7 @@ typedef struct bt_status  {
 	 * per-level status info
 	 */
 	bt_stat_level_t		level[XFS_BTREE_MAXLEVELS];
+	uint64_t		owner;		/* owner */
 } bt_status_t;
 
 /*
@@ -205,6 +206,7 @@ setup_cursor(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *curs)
 	extent_tree_node_t	*bno_ext_ptr;
 	xfs_extlen_t		blocks_allocated;
 	xfs_agblock_t		*agb_ptr;
+	int			error;
 
 	/*
 	 * get the number of blocks we need to allocate, then
@@ -249,6 +251,12 @@ setup_cursor(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *curs)
 			blocks_allocated++;
 		}
 
+		error = add_ag_rmap(mp, agno, ext_ptr->ex_startblock, u,
+				curs->owner);
+		if (error)
+			do_error(_("could not set up btree rmaps: %s\n"),
+				strerror(-error));
+
 		/*
 		 * if we only used part of this last extent, then we
 		 * need only to reset the extent in the extent
@@ -916,6 +924,7 @@ init_ino_cursor(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *btree_curs,
 
 	lptr = &btree_curs->level[0];
 	btree_curs->init = 1;
+	btree_curs->owner = XFS_RMAP_OWN_INOBT;
 
 	/*
 	 * build up statistics
@@ -1355,6 +1364,7 @@ init_rmapbt_cursor(
 
 	lptr = &btree_curs->level[0];
 	btree_curs->init = 1;
+	btree_curs->owner = XFS_RMAP_OWN_AG;
 
 	/*
 	 * build up statistics
@@ -1834,6 +1844,7 @@ build_agf_agfl(
 		agf->agf_flfirst = 0;
 		agf->agf_fllast = cpu_to_be32(i - 1);
 		agf->agf_flcount = cpu_to_be32(i);
+		rmap_store_agflcount(mp, agno, i);
 
 #ifdef XR_BLD_FREE_TRACE
 		fprintf(stderr, "writing agfl for ag %u\n", agno);
@@ -1858,35 +1869,8 @@ build_agf_agfl(
 
 	/*
 	 * now fix up the free list appropriately
-	 * XXX: code lifted from mkfs, should be shared.
 	 */
-	{
-		xfs_alloc_arg_t	args;
-		xfs_trans_t	*tp;
-		struct xfs_trans_res tres = {0};
-		int		error;
-
-		memset(&args, 0, sizeof(args));
-		args.mp = mp;
-		args.agno = agno;
-		args.alignment = 1;
-		args.pag = xfs_perag_get(mp,agno);
-		error = -libxfs_trans_alloc(mp, &tres,
-				xfs_alloc_min_freelist(mp, args.pag),
-				0, 0, &tp);
-		if (error) {
-			do_error(_("failed to fix AGFL on AG %d, error %d\n"),
-					agno, error);
-		}
-		args.tp = tp;
-		error = -libxfs_alloc_fix_freelist(&args, 0);
-		xfs_perag_put(args.pag);
-		if (error) {
-			do_error(_("failed to fix AGFL on AG %d, error %d\n"),
-					agno, error);
-		}
-		libxfs_trans_commit(tp);
-	}
+	fix_freelist(mp, agno, true);
 
 #ifdef XR_BLD_FREE_TRACE
 	fprintf(stderr, "wrote agf for ag %u\n", agno);
@@ -1958,6 +1942,7 @@ phase5_func(
 	xfs_agblock_t	num_extents;
 	__uint32_t	magic;
 	struct agi_stat	agi_stat = {0,};
+	int		error;
 
 	if (verbose)
 		do_log(_("        - agno = %d\n"), agno);
@@ -2063,6 +2048,8 @@ phase5_func(
 
 		bcnt_btree_curs = bno_btree_curs;
 
+		bno_btree_curs.owner = XFS_RMAP_OWN_AG;
+		bcnt_btree_curs.owner = XFS_RMAP_OWN_AG;
 		setup_cursor(mp, agno, &bno_btree_curs);
 		setup_cursor(mp, agno, &bcnt_btree_curs);
 
@@ -2140,6 +2127,15 @@ phase5_func(
 		if (xfs_sb_version_hasfinobt(&mp->m_sb))
 			finish_cursor(&fino_btree_curs);
 		finish_cursor(&bcnt_btree_curs);
+
+		/*
+		 * Put the per-AG btree rmap data into the rmapbt
+		 */
+		error = store_ag_btree_rmap_data(mp, agno);
+		if (error)
+			do_error(
+_("unable to add AG %u reverse-mapping data to btree.\n"), agno);
+
 		/*
 		 * release the incore per-AG bno/bcnt trees so
 		 * the extent nodes can be recycled
diff --git a/repair/rmap.c b/repair/rmap.c
index 4648425..9c17ee8 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -39,6 +39,8 @@
 struct xfs_ag_rmap {
 	struct xfs_slab	*ar_rmaps;		/* rmap observations, p4 */
 	struct xfs_slab	*ar_raw_rmaps;		/* unmerged rmaps */
+	int		ar_flcount;		/* agfl entries from leftover */
+						/* agbt allocations */
 };
 
 static struct xfs_ag_rmap *ag_rmaps;
@@ -424,6 +426,124 @@ out:
 	return error;
 }
 
+/*
+ * Copy the per-AG btree reverse-mapping data into the rmapbt.
+ *
+ * At rmapbt reconstruction time, the rmapbt will be populated _only_ with
+ * rmaps for file extents, inode chunks, AG headers, and bmbt blocks.  While
+ * building the AG btrees we can record all the blocks allocated for each
+ * btree, but we cannot resolve the conflict between the fact that one has to
+ * finish allocating the space for the rmapbt before building the bnobt and the
+ * fact that allocating blocks for the bnobt requires adding rmapbt entries.
+ * Therefore we record in-core the rmaps for each btree and here use the
+ * libxfs rmap functions to finish building the rmap btree.
+ *
+ * During AGF/AGFL reconstruction in phase 5, rmaps for the AG btrees are
+ * recorded in memory.  The rmapbt has not been set up yet, so we need to be
+ * able to "expand" the AGFL without updating the rmapbt.  After we've written
+ * out the new AGF header the new rmapbt is available, so this function reads
+ * each AGFL to generate rmap entries.  These entries are merged with the AG
+ * btree rmap entries, and then we use libxfs' rmap functions to add them to
+ * the rmapbt, after which it is fully regenerated.
+ */
+int
+store_ag_btree_rmap_data(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_slab_cursor	*rm_cur;
+	struct xfs_rmap_irec	*rm_rec = NULL;
+	struct xfs_buf		*agbp = NULL;
+	struct xfs_buf		*agflbp = NULL;
+	struct xfs_trans	*tp;
+	struct xfs_trans_res tres = {0};
+	__be32			*agfl_bno, *b;
+	int			error = 0;
+	struct xfs_owner_info	oinfo;
+
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return 0;
+
+	/* Release the ar_rmaps; they were put into the rmapbt during p5. */
+	free_slab(&ag_rmaps[agno].ar_rmaps);
+	error = init_slab(&ag_rmaps[agno].ar_rmaps,
+				  sizeof(struct xfs_rmap_irec));
+	if (error)
+		goto err;
+
+	/* Add the AGFL blocks to the rmap list */
+	error = xfs_trans_read_buf(
+			mp, NULL, mp->m_ddev_targp,
+			XFS_AG_DADDR(mp, agno, XFS_AGFL_DADDR(mp)),
+			XFS_FSS_TO_BB(mp, 1), 0, &agflbp, &xfs_agfl_buf_ops);
+	if (error)
+		goto err;
+
+	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
+	agfl_bno += ag_rmaps[agno].ar_flcount;
+	b = agfl_bno;
+	while (*b != NULLAGBLOCK && b - agfl_bno <= XFS_AGFL_SIZE(mp)) {
+		error = add_ag_rmap(mp, agno, be32_to_cpu(*b), 1,
+				XFS_RMAP_OWN_AG);
+		if (error)
+			goto err;
+		b++;
+	}
+	libxfs_putbuf(agflbp);
+	agflbp = NULL;
+
+	/* Merge all the raw rmaps into the main list */
+	error = fold_raw_rmaps(mp, agno);
+	if (error)
+		goto err;
+
+	/* Create a cursor to the merged rmap records */
+	error = init_slab_cursor(ag_rmaps[agno].ar_rmaps, rmap_compare,
+			&rm_cur);
+	if (error)
+		goto err;
+
+	/* Insert rmaps into the btree one at a time */
+	rm_rec = pop_slab_cursor(rm_cur);
+	while (rm_rec) {
+		error = -libxfs_trans_alloc(mp, &tres, 16, 0, 0, &tp);
+		if (error)
+			goto err_slab;
+
+		error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp);
+		if (error)
+			goto err_trans;
+
+		ASSERT(XFS_RMAP_NON_INODE_OWNER(rm_rec->rm_owner));
+		xfs_rmap_ag_owner(&oinfo, rm_rec->rm_owner);
+		error = xfs_rmap_alloc(tp, agbp, agno, rm_rec->rm_startblock,
+				rm_rec->rm_blockcount, &oinfo);
+		if (error)
+			goto err_trans;
+
+		error = -libxfs_trans_commit(tp);
+		if (error)
+			goto err_slab;
+
+		fix_freelist(mp, agno, false);
+
+		rm_rec = pop_slab_cursor(rm_cur);
+	}
+
+	free_slab_cursor(&rm_cur);
+	return 0;
+
+err_trans:
+	libxfs_trans_cancel(tp);
+err_slab:
+	free_slab_cursor(&rm_cur);
+err:
+	if (agflbp)
+		libxfs_putbuf(agflbp);
+	printf("FAIL err %d\n", error);
+	return error;
+}
+
 #ifdef RMAP_DEBUG
 static void
 dump_rmap(
@@ -695,3 +815,81 @@ rmap_high_key_from_rec(
 		return;
 	key->rm_offset += adj;
 }
+
+/*
+ * Regenerate the AGFL so that we don't run out of it while rebuilding the
+ * rmap btree.  If skip_rmapbt is true, don't update the rmapbt (the AGF
+ * that libxfs reads does not yet point to the new rmapbt).
+ */
+void
+fix_freelist(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	bool			skip_rmapbt)
+{
+	xfs_alloc_arg_t		args;
+	xfs_trans_t		*tp;
+	struct xfs_trans_res	tres = {0};
+	int			flags;
+	int			error;
+
+	memset(&args, 0, sizeof(args));
+	args.mp = mp;
+	args.agno = agno;
+	args.alignment = 1;
+	args.pag = xfs_perag_get(mp, agno);
+	error = -libxfs_trans_alloc(mp, &tres,
+			xfs_alloc_min_freelist(mp, args.pag), 0, 0, &tp);
+	if (error)
+		do_error(_("failed to fix AGFL on AG %d, error %d\n"),
+				agno, error);
+	args.tp = tp;
+
+	/*
+	 * Prior to rmapbt, all we had to do to fix the freelist is "expand"
+	 * the fresh AGFL header from empty to full.  That hasn't changed.  For
+	 * rmapbt, however, things change a bit.
+	 *
+	 * When we're stuffing the rmapbt with the AG btree rmaps the tree can
+	 * expand, so we need to keep the AGFL well-stocked for the expansion.
+	 * However, this expansion can cause the bnobt/cntbt to shrink, which
+	 * can make the AGFL eligible for shrinking.  Shrinking involves
+	 * freeing rmapbt entries, but since we haven't finished loading the
+	 * rmapbt with the btree rmaps it's possible for the remove operation
+	 * to fail.  The AGFL block is large enough at this point to absorb any
+	 * blocks freed from the bnobt/cntbt, so we can disable shrinking.
+	 *
+	 * During the initial AGFL regeneration during AGF generation in phase5
+	 * we must also disable rmapbt modifications because the AGF that
+	 * libxfs reads does not yet point to the new rmapbt.  These initial
+	 * AGFL entries are added just prior to adding the AG btree block rmaps
+	 * to the rmapbt.  It's ok to pass NOSHRINK here too, since the AGFL is
+	 * empty and cannot shrink.
+	 */
+	flags = XFS_ALLOC_FLAG_NOSHRINK;
+	if (skip_rmapbt)
+		flags |= XFS_ALLOC_FLAG_NORMAP;
+	error = libxfs_alloc_fix_freelist(&args, flags);
+	xfs_perag_put(args.pag);
+	if (error) {
+		do_error(_("failed to fix AGFL on AG %d, error %d\n"),
+				agno, error);
+	}
+	libxfs_trans_commit(tp);
+}
+
+/*
+ * Remember how many AGFL entries came from excess AG btree allocations and
+ * therefore already have rmap entries.
+ */
+void
+rmap_store_agflcount(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	int			count)
+{
+	if (!needs_rmap_work(mp))
+		return;
+
+	ag_rmaps[agno].ar_flcount = count;
+}
diff --git a/repair/rmap.h b/repair/rmap.h
index d9d08d4..4722266 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -35,6 +35,7 @@ extern int fold_raw_rmaps(struct xfs_mount *mp, xfs_agnumber_t agno);
 extern bool mergeable_rmaps(struct xfs_rmap_irec *r1, struct xfs_rmap_irec *r2);
 
 extern int add_fixed_ag_rmap_data(struct xfs_mount *, xfs_agnumber_t);
+extern int store_ag_btree_rmap_data(struct xfs_mount *, xfs_agnumber_t);
 
 extern size_t rmap_record_count(struct xfs_mount *, xfs_agnumber_t);
 extern int init_rmap_cursor(xfs_agnumber_t, struct xfs_slab_cursor **);
@@ -46,4 +47,7 @@ extern __int64_t rmap_diffkeys(struct xfs_rmap_irec *kp1,
 extern void rmap_high_key_from_rec(struct xfs_rmap_irec *rec,
 		struct xfs_rmap_irec *key);
 
+extern void fix_freelist(struct xfs_mount *, xfs_agnumber_t, bool);
+extern void rmap_store_agflcount(struct xfs_mount *, xfs_agnumber_t, int);
+
 #endif /* RMAP_H_ */


* [PATCH 066/145] xfs_repair: merge data & attr fork reverse mappings
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (64 preceding siblings ...)
  2016-06-17  1:37 ` [PATCH 065/145] xfs_repair: add per-AG btree blocks to rmap data and add to rmapbt Darrick J. Wong
@ 2016-06-17  1:37 ` Darrick J. Wong
  2016-06-17  1:37 ` [PATCH 067/145] xfs_repair: look for mergeable rmaps Darrick J. Wong
                   ` (78 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:37 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

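Merge adjacent data and attribute fork reverse mappings as they are
collected.  Remember the last rmap seen in each AG; if the next rmap is
mergeable with it, extend the remembered record instead of adding a new
one, and only add the remembered record to the slab once a non-mergeable
rmap arrives or collection for the AG finishes.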

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase4.c |   10 ++++++++++
 repair/rmap.c   |   32 +++++++++++++++++++++++++++++---
 repair/rmap.h   |    2 ++
 3 files changed, 41 insertions(+), 3 deletions(-)
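
Note: the collection scheme in add_rmap() and
finish_collecting_fork_rmaps() is a one-record lookbehind.  A simplified
sketch of that pattern follows.  All names here (struct fork_rmap,
struct collector, OWNER_NONE, the fixed-size output array) are
hypothetical, and the merge test is an assumption: it checks only owner,
physical adjacency and logical adjacency, and ignores flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OWNER_NONE	UINT64_MAX	/* sentinel: nothing pending yet */
#define MAX_OUT		16		/* toy output capacity */

struct fork_rmap {
	uint64_t	owner;
	uint64_t	offset;
	uint32_t	startblock;
	uint32_t	blockcount;
};

struct collector {
	struct fork_rmap	pending;	/* last rmap seen, not yet stored */
	struct fork_rmap	out[MAX_OUT];	/* stands in for the slab */
	size_t			nout;
};

static void
collector_init(struct collector *c)
{
	c->pending.owner = OWNER_NONE;
	c->nout = 0;
}

static int
collector_emit(struct collector *c, const struct fork_rmap *r)
{
	if (c->nout == MAX_OUT)
		return -1;
	c->out[c->nout++] = *r;
	return 0;
}

/* Assumed merge rule: same owner, physically and logically contiguous. */
static int
rmaps_mergeable(const struct fork_rmap *a, const struct fork_rmap *b)
{
	return a->owner == b->owner &&
	       a->startblock + a->blockcount == b->startblock &&
	       a->offset + a->blockcount == b->offset;
}

/* Extend the pending record if possible, else store it and start anew. */
static int
collector_add(struct collector *c, const struct fork_rmap *r)
{
	if (c->pending.owner == OWNER_NONE) {
		c->pending = *r;
	} else if (rmaps_mergeable(&c->pending, r)) {
		c->pending.blockcount += r->blockcount;
	} else {
		if (collector_emit(c, &c->pending))
			return -1;
		c->pending = *r;
	}
	return 0;
}

/* Store the final pending record; without this step it would be lost. */
static int
collector_finish(struct collector *c)
{
	int	error;

	if (c->pending.owner == OWNER_NONE)
		return 0;
	error = collector_emit(c, &c->pending);
	/* Resetting here is a choice made in this sketch only. */
	c->pending.owner = OWNER_NONE;
	return error;
}
```

The explicit finish step is why process_ags() gains a loop over all AGs
after the inode scan: the last record seen in each AG is still pending at
that point and has to be stored.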


diff --git a/repair/phase4.c b/repair/phase4.c
index e234d92..3be3786 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -154,7 +154,17 @@ static void
 process_ags(
 	xfs_mount_t		*mp)
 {
+	xfs_agnumber_t		i;
+	int			error;
+
 	do_inode_prefetch(mp, ag_stride, process_ag_func, true, false);
+	for (i = 0; i < mp->m_sb.sb_agcount; i++) {
+		error = finish_collecting_fork_rmaps(mp, i);
+		if (error)
+			do_error(
+_("unable to finish adding attr/data fork reverse-mapping data for AG %u.\n"),
+				i);
+	}
 }
 
 static void
diff --git a/repair/rmap.c b/repair/rmap.c
index 9c17ee8..e39df5a 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -41,6 +41,7 @@ struct xfs_ag_rmap {
 	struct xfs_slab	*ar_raw_rmaps;		/* unmerged rmaps */
 	int		ar_flcount;		/* agfl entries from leftover */
 						/* agbt allocations */
+	struct xfs_rmap_irec	ar_last_rmap;	/* last rmap seen */
 };
 
 static struct xfs_ag_rmap *ag_rmaps;
@@ -118,6 +119,7 @@ _("Insufficient memory while allocating reverse mapping slabs."));
 		if (error)
 			do_error(
 _("Insufficient memory while allocating raw metadata reverse mapping slabs."));
+		ag_rmaps[i].ar_last_rmap.rm_owner = XFS_RMAP_OWN_UNKNOWN;
 	}
 }
 
@@ -177,10 +179,11 @@ add_rmap(
 	int			whichfork,
 	struct xfs_bmbt_irec	*irec)
 {
-	struct xfs_slab		*rmaps;
 	struct xfs_rmap_irec	rmap;
 	xfs_agnumber_t		agno;
 	xfs_agblock_t		agbno;
+	struct xfs_rmap_irec	*last_rmap;
+	int			error = 0;
 
 	if (!needs_rmap_work(mp))
 		return 0;
@@ -193,7 +196,6 @@ add_rmap(
 	ASSERT(ino != NULLFSINO);
 	ASSERT(whichfork == XFS_DATA_FORK || whichfork == XFS_ATTR_FORK);
 
-	rmaps = ag_rmaps[agno].ar_rmaps;
 	rmap.rm_owner = ino;
 	rmap.rm_offset = irec->br_startoff;
 	rmap.rm_flags = 0;
@@ -203,7 +205,31 @@ add_rmap(
 	rmap.rm_blockcount = irec->br_blockcount;
 	if (irec->br_state == XFS_EXT_UNWRITTEN)
 		rmap.rm_flags |= XFS_RMAP_UNWRITTEN;
-	return slab_add(rmaps, &rmap);
+	last_rmap = &ag_rmaps[agno].ar_last_rmap;
+	if (last_rmap->rm_owner == XFS_RMAP_OWN_UNKNOWN)
+		*last_rmap = rmap;
+	else if (mergeable_rmaps(last_rmap, &rmap))
+		last_rmap->rm_blockcount += rmap.rm_blockcount;
+	else {
+		error = slab_add(ag_rmaps[agno].ar_rmaps, last_rmap);
+		if (error)
+			return error;
+		*last_rmap = rmap;
+	}
+
+	return error;
+}
+
+/* Finish collecting inode data/attr fork rmaps. */
+int
+finish_collecting_fork_rmaps(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	if (!needs_rmap_work(mp) ||
+	    ag_rmaps[agno].ar_last_rmap.rm_owner == XFS_RMAP_OWN_UNKNOWN)
+		return 0;
+	return slab_add(ag_rmaps[agno].ar_rmaps, &ag_rmaps[agno].ar_last_rmap);
 }
 
 /* add a raw rmap; these will be merged later */
diff --git a/repair/rmap.h b/repair/rmap.h
index 4722266..69215e8 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -28,6 +28,8 @@ extern void init_rmaps(struct xfs_mount *);
 extern void free_rmaps(struct xfs_mount *);
 
 extern int add_rmap(struct xfs_mount *, xfs_ino_t, int, struct xfs_bmbt_irec *);
+extern int finish_collecting_fork_rmaps(struct xfs_mount *mp,
+		xfs_agnumber_t agno);
 extern int add_ag_rmap(struct xfs_mount *, xfs_agnumber_t agno,
 		xfs_agblock_t agbno, xfs_extlen_t len, uint64_t owner);
 extern int add_bmbt_rmap(struct xfs_mount *, xfs_ino_t, int, xfs_fsblock_t);


^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH 067/145] xfs_repair: look for mergeable rmaps
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (65 preceding siblings ...)
  2016-06-17  1:37 ` [PATCH 066/145] xfs_repair: merge data & attr fork reverse mappings Darrick J. Wong
@ 2016-06-17  1:37 ` Darrick J. Wong
  2016-06-17  1:37 ` [PATCH 068/145] xfs_repair: check for impossible rmap record field combinations Darrick J. Wong
                   ` (77 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:37 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Check for adjacent mergeable rmaps; this is a sign that we've
screwed up somehow.
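
A rough standalone sketch of the check, assuming records arrive in key
order.  The demo_* names and the inlined merge rule (same owner,
physically and logically contiguous) are hypothetical and may not match
mergeable_rmaps() exactly.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_rec {
	uint32_t	start;
	uint32_t	len;
	uint64_t	owner;
	uint64_t	offset;
};

/*
 * Walk sorted records and count how many could have been folded into
 * their predecessor.  On a hit the remembered record is extended, so a
 * run of three mergeable records reports two hits.
 */
static size_t
demo_count_unmerged(const struct demo_rec *recs, size_t n)
{
	struct demo_rec	last;
	size_t		i, hits = 0;

	if (n == 0)
		return 0;
	last = recs[0];
	for (i = 1; i < n; i++) {
		if (recs[i].owner == last.owner &&
		    last.start + last.len == recs[i].start &&
		    last.offset + last.len == recs[i].offset) {
			hits++;
			last.len += recs[i].len;
		} else {
			last = recs[i];
		}
	}
	return hits;
}
```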

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/scan.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)


diff --git a/repair/scan.c b/repair/scan.c
index 6106d93..d72b257 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -787,6 +787,7 @@ ino_issparse(
 struct rmap_priv {
 	struct aghdr_cnts	*agcnts;
 	struct xfs_rmap_irec	high_key;
+	struct xfs_rmap_irec	last_rec;
 };
 
 static void
@@ -943,6 +944,16 @@ advance:
 					goto advance;
 			}
 
+			/* Is this mergeable with the previous record? */
+			if (mergeable_rmaps(&rmap_priv->last_rec, &key)) {
+				do_warn(
+	_("record %d in block (%u/%u) of %s tree should be merged with previous record\n"),
+					i, agno, bno, name);
+				rmap_priv->last_rec.rm_blockcount +=
+						key.rm_blockcount;
+			} else
+				rmap_priv->last_rec = key;
+
 			/* Check that we don't go past the high key. */
 			key.rm_startblock += key.rm_blockcount - 1;
 			if (!XFS_RMAP_NON_INODE_OWNER(key.rm_owner) &&
@@ -1896,6 +1907,7 @@ validate_agf(
 		memset(&priv.high_key, 0xFF, sizeof(priv.high_key));
 		priv.high_key.rm_blockcount = 0;
 		priv.agcnts = agcnts;
+		priv.last_rec.rm_owner = XFS_RMAP_OWN_UNKNOWN;
 		bno = be32_to_cpu(agf->agf_roots[XFS_BTNUM_RMAP]);
 		if (bno != 0 && verify_agbno(mp, agno, bno)) {
 			scan_sbtree(bno,


^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH 068/145] xfs_repair: check for impossible rmap record field combinations
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (66 preceding siblings ...)
  2016-06-17  1:37 ` [PATCH 067/145] xfs_repair: look for mergeable rmaps Darrick J. Wong
@ 2016-06-17  1:37 ` Darrick J. Wong
  2016-06-17  1:38 ` [PATCH 069/145] mkfs: set agsize prior to calculating minimum log size Darrick J. Wong
                   ` (76 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:37 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Make sure there are no records or keys with impossible field
combinations, such as non-inode records with offsets or flags.
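
As a minimal sketch of the rule, assuming that non-inode ("special")
owners are the ones with the top bit of the 64-bit owner field set.  That
encoding is an assumption here, and the demo_* names and flag values are
made up for illustration.

```c
#include <stdint.h>

#define DEMO_BAD_FLAGS	0x1u	/* non-inode owner carries flags */
#define DEMO_BAD_OFFSET	0x2u	/* non-inode owner carries an offset */

/* Assumed encoding: special owners have the top bit set. */
static int
demo_non_inode_owner(uint64_t owner)
{
	return (owner >> 63) != 0;
}

/* Return a bitmask of impossible combinations found in one record. */
static unsigned int
demo_check_rec(uint64_t owner, uint64_t offset, unsigned int flags)
{
	unsigned int	problems = 0;

	if (!demo_non_inode_owner(owner))
		return 0;	/* inode owners may have both */
	if (flags)
		problems |= DEMO_BAD_FLAGS;
	if (offset)
		problems |= DEMO_BAD_OFFSET;
	return problems;
}
```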

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/scan.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)


diff --git a/repair/scan.c b/repair/scan.c
index d72b257..ec41ba6 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -926,6 +926,18 @@ _("%s rmap btree block claimed (state %d), agno %d, bno %d, suspect %d\n"),
 	_("invalid owner in rmap btree record %d (%"PRId64" %u) block %u/%u\n"),
 						i, owner, len, agno, bno);
 
+			/* Look for impossible record field combinations. */
+			if (XFS_RMAP_NON_INODE_OWNER(key.rm_owner)) {
+				if (key.rm_flags)
+					do_warn(
+	_("record %d of block (%u/%u) in %s btree cannot have non-inode owner with flags\n"),
+						i, agno, bno, name);
+				if (key.rm_offset)
+					do_warn(
+	_("record %d of block (%u/%u) in %s btree cannot have non-inode owner with offset\n"),
+						i, agno, bno, name);
+			}
+
 			/* Check for out of order records. */
 			if (i == 0) {
 advance:


^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH 069/145] mkfs: set agsize prior to calculating minimum log size
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (67 preceding siblings ...)
  2016-06-17  1:37 ` [PATCH 068/145] xfs_repair: check for impossible rmap record field combinations Darrick J. Wong
@ 2016-06-17  1:38 ` Darrick J. Wong
  2016-06-17  1:38 ` [PATCH 070/145] mkfs.xfs: create filesystems with reverse-mappings Darrick J. Wong
                   ` (75 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:38 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Each btree has its own maxlevels variable.  Since the level count of
certain btrees depends on agblocks, it's necessary to know the AG size
prior to calculating the log reservations.  These reservations are
needed to calculate the log size and the kernel will refuse to mount
if we guess too low, so stuff in the real agsize when we're formatting
the log.
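
To see why the AG size matters, here is a back-of-the-envelope sketch of
a worst-case btree height calculation.  The function name and the
minimum-records parameters are hypothetical; the real per-btree limits
come from the block size and record format and are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case height of a btree holding nrecs records when every block
 * is only minimally full.  A bigger AG can hold more records, hence a
 * taller tree, hence larger transaction reservations.
 */
static unsigned int
demo_btree_levels(uint64_t nrecs, uint64_t min_leaf_recs,
		  uint64_t min_node_recs)
{
	uint64_t	blocks;
	unsigned int	level;

	assert(min_leaf_recs >= 1 && min_node_recs >= 2);
	blocks = (nrecs + min_leaf_recs - 1) / min_leaf_recs;
	for (level = 1; blocks > 1; level++)
		blocks = (blocks + min_node_recs - 1) / min_node_recs;
	return level;
}
```

With the toy limits of 10 records per block, growing the record count
from a thousand to a million doubles the height, which is the kind of
difference that guessing the minimum AG size would miss.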

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_multidisk.h |    2 +-
 mkfs/maxtrres.c         |    3 ++-
 mkfs/xfs_mkfs.c         |    3 ++-
 3 files changed, 5 insertions(+), 3 deletions(-)


diff --git a/include/xfs_multidisk.h b/include/xfs_multidisk.h
index fe3e98f..4429dab 100644
--- a/include/xfs_multidisk.h
+++ b/include/xfs_multidisk.h
@@ -66,7 +66,7 @@ extern void parse_proto (xfs_mount_t *mp, struct fsxattr *fsx, char **pp);
 extern void res_failed (int err);
 
 /* maxtrres.c */
-extern int max_trans_res (int crcs_enabled, int dirversion,
+extern int max_trans_res(unsigned long agsize, int crcs_enabled, int dirversion,
 		int sectorlog, int blocklog, int inodelog, int dirblocklog,
 		int logversion, int log_sunit, int finobt);
 
diff --git a/mkfs/maxtrres.c b/mkfs/maxtrres.c
index f48a0f7..c0b1b5d 100644
--- a/mkfs/maxtrres.c
+++ b/mkfs/maxtrres.c
@@ -29,6 +29,7 @@
 
 int
 max_trans_res(
+	unsigned long	agsize,
 	int		crcs_enabled,
 	int		dirversion,
 	int		sectorlog,
@@ -50,7 +51,7 @@ max_trans_res(
 	sbp->sb_sectsize = 1 << sbp->sb_sectlog;
 	sbp->sb_blocklog = blocklog;
 	sbp->sb_blocksize = 1 << blocklog;
-	sbp->sb_agblocks = XFS_AG_MIN_BYTES / (1 << blocklog);
+	sbp->sb_agblocks = agsize;
 	sbp->sb_inodelog = inodelog;
 	sbp->sb_inopblog = blocklog - inodelog;
 	sbp->sb_inodesize = 1 << inodelog;
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 4b5df98..8b3cad8 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2887,7 +2887,8 @@ an AG size that is one stripe unit smaller, for example %llu.\n"),
 		lsunit = (32 * 1024) >> blocklog;
 	}
 
-	min_logblocks = max_trans_res(sb_feat.crcs_enabled, sb_feat.dir_version,
+	min_logblocks = max_trans_res(agsize,
+				   sb_feat.crcs_enabled, sb_feat.dir_version,
 				   sectorlog, blocklog, inodelog, dirblocklog,
 				   sb_feat.log_version, lsunit, sb_feat.finobt);
 	ASSERT(min_logblocks);


^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH 070/145] mkfs.xfs: create filesystems with reverse-mappings
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (68 preceding siblings ...)
  2016-06-17  1:38 ` [PATCH 069/145] mkfs: set agsize prior to calculating minimum log size Darrick J. Wong
@ 2016-06-17  1:38 ` Darrick J. Wong
  2016-06-17  1:38 ` [PATCH 071/145] xfs: count the blocks in a btree Darrick J. Wong
                   ` (74 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:38 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

From: Dave Chinner <dchinner@redhat.com>

Create v5 filesystems with rmapbt turned on.  Document the rmapbt
options to mkfs, and initialize the extra field we added for reflink
support.

v2: Turn on the rmapbt feature when calculating the minimum log size.
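
For orientation, the initial records that mkfs seeds into each AG's
rmapbt root can be pictured with the sketch below.  It mirrors the
ordering used in this patch, but the demo_* types, the owner tags, and
the host-endian fields are hypothetical simplifications, and the block
numbers are inputs rather than the real XFS_*_BLOCK() macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical owner tags; the real XFS_RMAP_OWN_* values are not used. */
enum demo_owner { DEMO_OWN_FS, DEMO_OWN_AG, DEMO_OWN_INOBT, DEMO_OWN_LOG };

struct demo_rec {
	uint32_t	start;
	uint32_t	count;
	enum demo_owner	owner;
};

struct demo_layout {
	uint32_t	bno_block;	/* first block after the AG headers */
	uint32_t	ibt_block;	/* inode btree root */
	uint32_t	rmap_block;	/* rmap btree root */
	uint32_t	log_start;	/* AG block where the log starts */
	uint32_t	log_blocks;	/* 0 if no internal log in this AG */
};

/* Fill out[] with the initial records; returns how many were written. */
static size_t
demo_initial_rmaps(const struct demo_layout *l, struct demo_rec out[5])
{
	size_t	n = 0;

	assert(l->bno_block + 2 <= l->ibt_block);
	assert(l->ibt_block < l->rmap_block);

	/* AG headers are static filesystem metadata. */
	out[n++] = (struct demo_rec){ 0, l->bno_block, DEMO_OWN_FS };
	/* The two free space btree roots. */
	out[n++] = (struct demo_rec){ l->bno_block, 2, DEMO_OWN_AG };
	/* Inode btree root(s), up to the rmap root. */
	out[n++] = (struct demo_rec){ l->ibt_block,
				      l->rmap_block - l->ibt_block,
				      DEMO_OWN_INOBT };
	/* The rmap btree root itself. */
	out[n++] = (struct demo_rec){ l->rmap_block, 1, DEMO_OWN_AG };
	/* The internal log, if it lives in this AG. */
	if (l->log_blocks)
		out[n++] = (struct demo_rec){ l->log_start, l->log_blocks,
					      DEMO_OWN_LOG };
	return n;
}
```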

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick.wong@oracle.com: split patch, add commit message and extra fields]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_multidisk.h |    2 -
 man/man8/mkfs.xfs.8     |   20 +++++++
 mkfs/maxtrres.c         |    5 +-
 mkfs/xfs_mkfs.c         |  138 +++++++++++++++++++++++++++++++++++++++++------
 4 files changed, 145 insertions(+), 20 deletions(-)


diff --git a/include/xfs_multidisk.h b/include/xfs_multidisk.h
index 4429dab..8dc3027 100644
--- a/include/xfs_multidisk.h
+++ b/include/xfs_multidisk.h
@@ -68,6 +68,6 @@ extern void res_failed (int err);
 /* maxtrres.c */
 extern int max_trans_res(unsigned long agsize, int crcs_enabled, int dirversion,
 		int sectorlog, int blocklog, int inodelog, int dirblocklog,
-		int logversion, int log_sunit, int finobt);
+		int logversion, int log_sunit, int finobt, int rmapbt);
 
 #endif	/* __XFS_MULTIDISK_H__ */
diff --git a/man/man8/mkfs.xfs.8 b/man/man8/mkfs.xfs.8
index 980b0e1..d88d314 100644
--- a/man/man8/mkfs.xfs.8
+++ b/man/man8/mkfs.xfs.8
@@ -193,6 +193,26 @@ is used, the free inode btree feature is not supported and is disabled.
 .BI uuid= value
 Use the given value as the filesystem UUID for the newly created filesystem.
 The default is to generate a random UUID.
+.TP
+.BI rmapbt= value
+This option enables the creation of a reverse-mapping btree index in each
+allocation group.  The value is either 0 to disable the feature, or 1 to
+create the btree.
+.IP
+The reverse mapping btree maps filesystem blocks to the owner of the
+filesystem block.  Most of the mappings will be to an inode number and an
+offset, though there will also be mappings to filesystem metadata.  This
+secondary metadata can be used to validate the primary metadata or to
+pinpoint exactly which data has been lost when a disk error occurs.
+.IP
+By default,
+.B mkfs.xfs
+will not create reverse mapping btrees.  This feature is only available
+for filesystems created with the (default)
+.B \-m crc=1
+option set. When the option
+.B \-m crc=0
+is used, the reverse mapping btree feature is not supported and is disabled.
 .RE
 .TP
 .BI \-d " data_section_options"
diff --git a/mkfs/maxtrres.c b/mkfs/maxtrres.c
index c0b1b5d..fc24eac 100644
--- a/mkfs/maxtrres.c
+++ b/mkfs/maxtrres.c
@@ -38,7 +38,8 @@ max_trans_res(
 	int		dirblocklog,
 	int		logversion,
 	int		log_sunit,
-	int		finobt)
+	int		finobt,
+	int		rmapbt)
 {
 	xfs_sb_t	*sbp;
 	xfs_mount_t	mount;
@@ -72,6 +73,8 @@ max_trans_res(
 			XFS_DFL_SB_VERSION_BITS;
 	if (finobt)
 		sbp->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_FINOBT;
+	if (rmapbt)
+		sbp->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_RMAPBT;
 
 	libxfs_mount(&mount, sbp, 0,0,0,0);
 	maxfsb = xfs_log_calc_minimum_size(&mount);
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 8b3cad8..634dcfd 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -680,6 +680,8 @@ struct opt_params mopts = {
 		"finobt",
 #define M_UUID		2
 		"uuid",
+#define M_RMAPBT	3
+		"rmapbt",
 		NULL
 	},
 	.subopt_params = {
@@ -699,6 +701,12 @@ struct opt_params mopts = {
 		  .conflicts = { LAST_CONFLICT },
 		  .defaultval = SUBOPT_NEEDS_VAL,
 		},
+		{ .index = M_RMAPBT,
+		  .conflicts = { LAST_CONFLICT },
+		  .minval = 0,
+		  .maxval = 1,
+		  .defaultval = 0,
+		},
 	},
 };
 
@@ -1454,6 +1462,7 @@ struct sb_feat_args {
 	bool	crcs_enabled;
 	bool	dirftype;
 	bool	parent_pointers;
+	bool	rmapbt;
 };
 
 static void
@@ -1524,6 +1533,8 @@ sb_set_features(
 
 	if (fp->finobt)
 		sbp->sb_features_ro_compat = XFS_SB_FEAT_RO_COMPAT_FINOBT;
+	if (fp->rmapbt)
+		sbp->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_RMAPBT;
 
 	/*
 	 * Sparse inode chunk support has two main inode alignment requirements.
@@ -1784,6 +1795,7 @@ main(
 		.crcs_enabled = true,
 		.dirftype = true,
 		.parent_pointers = false,
+		.rmapbt = false,
 	};
 
 	platform_uuid_generate(&uuid);
@@ -2073,6 +2085,10 @@ main(
 					if (platform_uuid_parse(value, &uuid))
 						illegal(optarg, "m uuid");
 					break;
+				case M_RMAPBT:
+					sb_feat.rmapbt = getnum(
+						value, &mopts, M_RMAPBT);
+					break;
 				default:
 					unknown('m', value);
 				}
@@ -2409,6 +2425,20 @@ _("sparse inodes not supported without CRC support\n"));
 		}
 		sb_feat.spinodes = 0;
 
+		if (sb_feat.rmapbt) {
+			fprintf(stderr,
+_("rmapbt not supported without CRC support\n"));
+			usage();
+		}
+		sb_feat.rmapbt = false;
+	}
+
+
+	if (sb_feat.rmapbt && xi.rtname) {
+		fprintf(stderr,
+_("rmapbt not supported with realtime devices\n"));
+		usage();
+		sb_feat.rmapbt = false;
 	}
 
 	if (nsflag || nlflag) {
@@ -2890,7 +2920,8 @@ an AG size that is one stripe unit smaller, for example %llu.\n"),
 	min_logblocks = max_trans_res(agsize,
 				   sb_feat.crcs_enabled, sb_feat.dir_version,
 				   sectorlog, blocklog, inodelog, dirblocklog,
-				   sb_feat.log_version, lsunit, sb_feat.finobt);
+				   sb_feat.log_version, lsunit, sb_feat.finobt,
+				   sb_feat.rmapbt);
 	ASSERT(min_logblocks);
 	min_logblocks = MAX(XFS_MIN_LOG_BLOCKS, min_logblocks);
 	if (!logsize && dblocks >= (1024*1024*1024) >> blocklog)
@@ -2965,7 +2996,7 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 	mp->m_sectbb_log = sbp->sb_sectlog - BBSHIFT;
 
 	/*
-	 * sb_versionnum and finobt flags must be set before we use
+	 * sb_versionnum, finobt and rmapbt flags must be set before we use
 	 * xfs_prealloc_blocks().
 	 */
 	sb_set_features(&mp->m_sb, &sb_feat, sectorsize, lsectorsize, dsunit);
@@ -3025,7 +3056,7 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 		printf(_(
 		   "meta-data=%-22s isize=%-6d agcount=%lld, agsize=%lld blks\n"
 		   "         =%-22s sectsz=%-5u attr=%u, projid32bit=%u\n"
-		   "         =%-22s crc=%-8u finobt=%u, sparse=%u\n"
+		   "         =%-22s crc=%-8u finobt=%u, sparse=%u, rmapbt=%u\n"
 		   "data     =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n"
 		   "         =%-22s sunit=%-6u swidth=%u blks\n"
 		   "naming   =version %-14u bsize=%-6u ascii-ci=%d ftype=%d\n"
@@ -3036,6 +3067,7 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 			"", sectorsize, sb_feat.attr_version,
 				    !sb_feat.projid16bit,
 			"", sb_feat.crcs_enabled, sb_feat.finobt, sb_feat.spinodes,
+			sb_feat.rmapbt,
 			"", blocksize, (long long)dblocks, imaxpct,
 			"", dsunit, dswidth,
 			sb_feat.dir_version, dirblocksize, sb_feat.nci,
@@ -3217,6 +3249,12 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 		agf->agf_levels[XFS_BTNUM_CNTi] = cpu_to_be32(1);
 		pag->pagf_levels[XFS_BTNUM_BNOi] = 1;
 		pag->pagf_levels[XFS_BTNUM_CNTi] = 1;
+		if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+			agf->agf_roots[XFS_BTNUM_RMAPi] =
+						cpu_to_be32(XFS_RMAP_BLOCK(mp));
+			agf->agf_levels[XFS_BTNUM_RMAPi] = cpu_to_be32(1);
+		}
+
 		agf->agf_flfirst = 0;
 		agf->agf_fllast = cpu_to_be32(XFS_AGFL_SIZE(mp) - 1);
 		agf->agf_flcount = 0;
@@ -3404,24 +3442,88 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 		/*
 		 * Free INO btree root block
 		 */
-		if (!sb_feat.finobt) {
-			xfs_perag_put(pag);
-			continue;
+		if (sb_feat.finobt) {
+			buf = libxfs_getbuf(mp->m_ddev_targp,
+					XFS_AGB_TO_DADDR(mp, agno, XFS_FIBT_BLOCK(mp)),
+					bsize);
+			buf->b_ops = &xfs_inobt_buf_ops;
+			block = XFS_BUF_TO_BLOCK(buf);
+			memset(block, 0, blocksize);
+			if (xfs_sb_version_hascrc(&mp->m_sb))
+				xfs_btree_init_block(mp, buf, XFS_FIBT_CRC_MAGIC, 0, 0,
+							agno, XFS_BTREE_CRC_BLOCKS);
+			else
+				xfs_btree_init_block(mp, buf, XFS_FIBT_MAGIC, 0, 0,
+							agno, 0);
+			libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
 		}
 
-		buf = libxfs_getbuf(mp->m_ddev_targp,
-				XFS_AGB_TO_DADDR(mp, agno, XFS_FIBT_BLOCK(mp)),
+		/* RMAP btree root block */
+		if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+			struct xfs_rmap_rec	*rrec;
+
+			buf = libxfs_getbuf(mp->m_ddev_targp,
+				XFS_AGB_TO_DADDR(mp, agno, XFS_RMAP_BLOCK(mp)),
 				bsize);
-		buf->b_ops = &xfs_inobt_buf_ops;
-		block = XFS_BUF_TO_BLOCK(buf);
-		memset(block, 0, blocksize);
-		if (xfs_sb_version_hascrc(&mp->m_sb))
-			xfs_btree_init_block(mp, buf, XFS_FIBT_CRC_MAGIC, 0, 0,
+			buf->b_ops = &xfs_rmapbt_buf_ops;
+			block = XFS_BUF_TO_BLOCK(buf);
+			memset(block, 0, blocksize);
+
+			xfs_btree_init_block(mp, buf, XFS_RMAP_CRC_MAGIC, 0, 0,
 						agno, XFS_BTREE_CRC_BLOCKS);
-		else
-			xfs_btree_init_block(mp, buf, XFS_FIBT_MAGIC, 0, 0,
-						agno, 0);
-		libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
+
+			/*
+			 * mark the AG header regions as static metadata
+			 * The BNO btree block is the first block after the
+			 * headers, so its location defines the size of region
+			 * the static metadata consumes.
+			 */
+			rrec = XFS_RMAP_REC_ADDR(block, 1);
+			rrec->rm_startblock = 0;
+			rrec->rm_blockcount = cpu_to_be32(XFS_BNO_BLOCK(mp));
+			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_FS);
+			rrec->rm_offset = 0;
+			be16_add_cpu(&block->bb_numrecs, 1);
+
+			/* account freespace btree root blocks */
+			rrec = XFS_RMAP_REC_ADDR(block, 2);
+			rrec->rm_startblock = cpu_to_be32(XFS_BNO_BLOCK(mp));
+			rrec->rm_blockcount = cpu_to_be32(2);
+			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
+			rrec->rm_offset = 0;
+			be16_add_cpu(&block->bb_numrecs, 1);
+
+			/* account inode btree root blocks */
+			rrec = XFS_RMAP_REC_ADDR(block, 3);
+			rrec->rm_startblock = cpu_to_be32(XFS_IBT_BLOCK(mp));
+			rrec->rm_blockcount = cpu_to_be32(XFS_RMAP_BLOCK(mp) -
+							XFS_IBT_BLOCK(mp));
+			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_INOBT);
+			rrec->rm_offset = 0;
+			be16_add_cpu(&block->bb_numrecs, 1);
+
+			/* account for rmap btree root */
+			rrec = XFS_RMAP_REC_ADDR(block, 4);
+			rrec->rm_startblock = cpu_to_be32(XFS_RMAP_BLOCK(mp));
+			rrec->rm_blockcount = cpu_to_be32(1);
+			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
+			rrec->rm_offset = 0;
+			be16_add_cpu(&block->bb_numrecs, 1);
+
+			/* account for the log space */
+			if (loginternal && agno == logagno) {
+				rrec = XFS_RMAP_REC_ADDR(block, 5);
+				rrec->rm_startblock = cpu_to_be32(
+						XFS_FSB_TO_AGBNO(mp, logstart));
+				rrec->rm_blockcount = cpu_to_be32(logblocks);
+				rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_LOG);
+				rrec->rm_offset = 0;
+				be16_add_cpu(&block->bb_numrecs, 1);
+			}
+
+			libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
+		}
+
 		xfs_perag_put(pag);
 	}
 
@@ -3646,7 +3748,7 @@ usage( void )
 {
 	fprintf(stderr, _("Usage: %s\n\
 /* blocksize */		[-b log=n|size=num]\n\
-/* metadata */		[-m crc=0|1,finobt=0|1,uuid=xxx]\n\
+/* metadata */		[-m crc=0|1,finobt=0|1,uuid=xxx,rmapbt=0|1]\n\
 /* data subvol */	[-d agcount=n,agsize=n,file,name=xxx,size=num,\n\
 			    (sunit=value,swidth=value|su=num,sw=num|noalign),\n\
 			    sectlog=n|sectsize=num\n\


^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH 071/145] xfs: count the blocks in a btree
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (69 preceding siblings ...)
  2016-06-17  1:38 ` [PATCH 070/145] mkfs.xfs: create filesystems with reverse-mappings Darrick J. Wong
@ 2016-06-17  1:38 ` Darrick J. Wong
  2016-06-17  1:38 ` [PATCH 072/145] xfs: set up per-AG free space reservations Darrick J. Wong
                   ` (73 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:38 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Provide a helper method to count the number of blocks in a short-form
btree.  The refcount and rmap btrees need to know the number of blocks
already in use to set up their per-AG block reservations during mount.

v2: Use btree_visit_blocks instead of open-coding our own traversal
routine.
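
The visitor pattern used here looks roughly like the sketch below, which
walks a toy in-memory tree rather than a real btree cursor.  All demo_*
names are hypothetical.  Unlike the helper in this patch, the sketch
zeroes the counter itself rather than relying on the caller to do so.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_CHILDREN	4

struct demo_block {
	struct demo_block	*child[DEMO_MAX_CHILDREN];
	size_t			nchildren;
};

typedef int (*demo_visit_fn)(struct demo_block *blk, int level, void *data);

/* Call fn on this block and then on every block beneath it. */
static int
demo_visit_blocks(struct demo_block *blk, int level, demo_visit_fn fn,
		  void *data)
{
	size_t	i;
	int	error;

	error = fn(blk, level, data);
	if (error)
		return error;
	for (i = 0; i < blk->nchildren; i++) {
		error = demo_visit_blocks(blk->child[i], level - 1, fn, data);
		if (error)
			return error;
	}
	return 0;
}

/* Visitor callback: bump the counter once per block. */
static int
demo_count_helper(struct demo_block *blk, int level, void *data)
{
	uint32_t	*blocks = data;

	(void)blk;
	(void)level;
	(*blocks)++;
	return 0;
}

/* Count the blocks in the tree and return the result in *blocks. */
static int
demo_count_blocks(struct demo_block *root, int root_level, uint32_t *blocks)
{
	*blocks = 0;
	return demo_visit_blocks(root, root_level, demo_count_helper, blocks);
}
```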

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_btree.c |   22 ++++++++++++++++++++++
 libxfs/xfs_btree.h |    2 ++
 2 files changed, 24 insertions(+)


diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index db0267a..1e547f8 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -4799,3 +4799,25 @@ xfs_btree_query_range(
 	return xfs_btree_overlapped_query_range(cur, low_rec, high_rec,
 			fn, priv);
 }
+
+int
+xfs_btree_count_blocks_helper(
+	struct xfs_btree_cur	*cur,
+	int			level,
+	void			*data)
+{
+	xfs_extlen_t		*blocks = data;
+	(*blocks)++;
+
+	return 0;
+}
+
+/* Count the blocks in a btree and return the result in *blocks. */
+int
+xfs_btree_count_blocks(
+	struct xfs_btree_cur	*cur,
+	xfs_extlen_t		*blocks)
+{
+	return xfs_btree_visit_blocks(cur, xfs_btree_count_blocks_helper,
+			blocks);
+}
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index 9963c48..6fa13a9 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -519,4 +519,6 @@ typedef int (*xfs_btree_visit_blocks_fn)(struct xfs_btree_cur *cur, int level,
 int xfs_btree_visit_blocks(struct xfs_btree_cur *cur,
 		xfs_btree_visit_blocks_fn fn, void *data);
 
+int xfs_btree_count_blocks(struct xfs_btree_cur *cur, xfs_extlen_t *blocks);
+
 #endif	/* __XFS_BTREE_H__ */


^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH 072/145] xfs: set up per-AG free space reservations
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (70 preceding siblings ...)
  2016-06-17  1:38 ` [PATCH 071/145] xfs: count the blocks in a btree Darrick J. Wong
@ 2016-06-17  1:38 ` Darrick J. Wong
  2016-06-17  1:38 ` [PATCH 073/145] xfs: introduce refcount btree definitions Darrick J. Wong
                   ` (72 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:38 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

One unfortunate quirk of the reference count btree -- it can expand in
size when blocks are written to *other* allocation groups if, say, one
large extent becomes a lot of tiny extents.  Since we don't want to
start throwing errors in the middle of CoWing, we need to reserve some
blocks to handle future expansion.

Use the count of how many reserved blocks we need to have on hand to
create a virtual reservation in the AG.  Through selective clamping of
the maximum length of allocation requests and of the length of the
longest free extent, we can make it look like there's less free space
in the AG unless the reservation owner is asking for blocks.

In other words, play some accounting tricks in-core to make sure that
we always have blocks available.  On the plus side, there's nothing to
clean up if we crash, which is in contrast to the strategy that the rough
draft used (actually removing extents from the freespace btrees).

v2: There are really only two kinds of per-AG reservation pools -- one
to feed the AGFL (rmapbt), and one to feed everything else
(refcountbt).  Bearing that in mind, we can embed the reservation
controls in xfs_perag and greatly simplify the block accounting.
Furthermore, fix some longstanding accounting bugs that were a direct
result of the goofy "allocate a block and later fix up the accounting"
strategy by integrating the reservation accounting code more tightly
with the allocator.  This eliminates the ENOSPC complaints resulting
from refcount btree splits during truncate operations.
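
The clamping idea can be illustrated with a much-simplified sketch.  This
is not the allocator code from this patch: the function name is
hypothetical, it ignores the AGFL entirely, and it only shows how hiding
reserved blocks shrinks the longest extent that an ordinary caller is
allowed to see.

```c
#include <assert.h>
#include <stdint.h>

/*
 * How long an extent may a caller take from this AG if "reserved"
 * blocks must stay free for someone else?  If the free space outside
 * the longest extent already covers the reservation, the longest
 * extent is fully usable; otherwise the shortfall is carved out of it.
 * A reservation owner would pass reserved = 0 for its own pool.
 */
static uint32_t
demo_longest_usable(uint32_t freeblks, uint32_t longest, uint32_t reserved)
{
	uint32_t	others;
	uint32_t	shortfall;

	assert(freeblks >= longest);
	others = freeblks - longest;
	if (others >= reserved)
		return longest;
	shortfall = reserved - others;
	return longest > shortfall ? longest - shortfall : 0;
}
```

Since nothing is actually removed from the free space btrees, the
reservation disappears with the in-core state on a crash and there is
nothing on disk to undo.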

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_mount.h       |   34 +++++
 include/xfs_trace.h       |    9 +
 libxfs/Makefile           |    2 
 libxfs/defer_item.c       |    3 
 libxfs/xfs_ag_resv.c      |  317 +++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_ag_resv.h      |   35 +++++
 libxfs/xfs_alloc.c        |   93 ++++++++++---
 libxfs/xfs_alloc.h        |    8 +
 libxfs/xfs_bmap.c         |    6 +
 libxfs/xfs_ialloc_btree.c |    2 
 10 files changed, 479 insertions(+), 30 deletions(-)
 create mode 100644 libxfs/xfs_ag_resv.c
 create mode 100644 libxfs/xfs_ag_resv.h


diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 5cd9464..c452de2 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -112,6 +112,20 @@ typedef struct xfs_mount {
 	struct xlog		*m_log;
 } xfs_mount_t;
 
+/* per-AG block reservation data structures */
+enum xfs_ag_resv_type {
+	XFS_AG_RESV_NONE = 0,
+	XFS_AG_RESV_METADATA,
+	XFS_AG_RESV_AGFL,
+};
+
+struct xfs_ag_resv {
+	/* number of blocks reserved here */
+	xfs_extlen_t			ar_reserved;
+	/* number of blocks originally asked for */
+	xfs_extlen_t			ar_asked;
+};
+
 /*
  * Per-ag incore structure, copies of information in agf and agi,
  * to improve the performance of allocation group selection.
@@ -142,8 +156,28 @@ typedef struct xfs_perag {
 	xfs_agino_t	pagl_leftrec;
 	xfs_agino_t	pagl_rightrec;
 	int		pagb_count;	/* pagb slots in use */
+
+	/* Blocks reserved for all kinds of metadata. */
+	struct xfs_ag_resv	pag_meta_resv;
+	/* Blocks reserved for just AGFL-based metadata. */
+	struct xfs_ag_resv	pag_agfl_resv;
 } xfs_perag_t;
 
+static inline struct xfs_ag_resv *
+xfs_perag_resv(
+	struct xfs_perag	*pag,
+	enum xfs_ag_resv_type	type)
+{
+	switch (type) {
+	case XFS_AG_RESV_METADATA:
+		return &pag->pag_meta_resv;
+	case XFS_AG_RESV_AGFL:
+		return &pag->pag_agfl_resv;
+	default:
+		return NULL;
+	}
+}
+
 #define LIBXFS_MOUNT_DEBUGGER		0x0001
 #define LIBXFS_MOUNT_32BITINODES	0x0002
 #define LIBXFS_MOUNT_32BITINOOPT	0x0004
diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index 00c2ccb..040f2f0 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -212,6 +212,15 @@
 #define trace_xfs_rmap_convert_error(...)	((void) 0)
 #define trace_xfs_rmap_find_left_neighbor_result(...)		((void) 0)
 
+#define trace_xfs_ag_resv_critical(...)		((void) 0)
+#define trace_xfs_ag_resv_needed(...)		((void) 0)
+#define trace_xfs_ag_resv_free(...)		((void) 0)
+#define trace_xfs_ag_resv_free_error(...)	((void) 0)
+#define trace_xfs_ag_resv_init(...)		((void) 0)
+#define trace_xfs_ag_resv_init_error(...)	((void) 0)
+#define trace_xfs_ag_resv_alloc_extent(...)	((void) 0)
+#define trace_xfs_ag_resv_free_extent(...)	((void) 0)
+
 /* set c = c to avoid unused var warnings */
 #define trace_xfs_perag_get(a,b,c,d)	((c) = (c))
 #define trace_xfs_perag_get_tag(a,b,c,d) ((c) = (c))
diff --git a/libxfs/Makefile b/libxfs/Makefile
index df47da2..588c663 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -18,6 +18,7 @@ PKGHFILES = xfs_fs.h \
 	xfs_log_format.h
 
 HFILES = \
+	xfs_ag_resv.h \
 	xfs_alloc.h \
 	xfs_alloc_btree.h \
 	xfs_attr_leaf.h \
@@ -59,6 +60,7 @@ CFILES = cache.c \
 	rdwr.c \
 	trans.c \
 	util.c \
+	xfs_ag_resv.c \
 	xfs_alloc.c \
 	xfs_alloc_btree.c \
 	xfs_attr.c \
diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c
index 381c969..d2e4ad0 100644
--- a/libxfs/defer_item.c
+++ b/libxfs/defer_item.c
@@ -93,7 +93,8 @@ xfs_bmap_free_finish_item(
 
 	free = container_of(item, struct xfs_bmap_free_item, xbfi_list);
 	error = xfs_free_extent(tp, free->xbfi_startblock,
-			free->xbfi_blockcount, &free->xbfi_oinfo);
+			free->xbfi_blockcount, &free->xbfi_oinfo,
+			XFS_AG_RESV_NONE);
 	kmem_free(free);
 	return error;
 }
diff --git a/libxfs/xfs_ag_resv.c b/libxfs/xfs_ag_resv.c
new file mode 100644
index 0000000..03413e4
--- /dev/null
+++ b/libxfs/xfs_ag_resv.c
@@ -0,0 +1,317 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "libxfs_priv.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_alloc.h"
+#include "xfs_trace.h"
+#include "xfs_cksum.h"
+#include "xfs_trans.h"
+#include "xfs_bit.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_ag_resv.h"
+#include "xfs_trans_space.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_btree.h"
+
+/*
+ * Per-AG Block Reservations
+ *
+ * For some kinds of allocation group metadata structures, it is advantageous
+ * to reserve a small number of blocks in each AG so that future expansions of
+ * that data structure do not encounter ENOSPC because errors during a btree
+ * split cause the filesystem to go offline.
+ *
+ * Prior to the introduction of reflink, this wasn't an issue because the free
+ * space btrees maintain a reserve of space (the AGFL) to handle any expansion
+ * that may be necessary; and allocations of other metadata (inodes, BMBT,
+ * dir/attr) aren't restricted to a single AG.  However, with reflink it is
+ * possible to allocate all the space in an AG, have subsequent reflink/CoW
+ * activity expand the refcount btree, and discover that there's no space left
+ * to handle that expansion.  Since we can calculate the maximum size of the
+ * refcount btree, we can reserve space for it and avoid ENOSPC.
+ *
+ * Handling per-AG reservations consists of four changes to the allocator's
+ * behavior:  First, because these reservations are always needed, we decrease
+ * the ag_max_usable counter to reflect the size of the AG after the reserved
+ * blocks are taken.  Second, the reservations must be reflected in the
+ * fdblocks count to maintain proper accounting.  Third, each AG must maintain
+ * its own reserved block counter so that we can calculate the amount of space
+ * that must remain free to maintain the reservations.  Fourth, the "remaining
+ * reserved blocks" count must be used when calculating the length of the
+ * longest free extent in an AG and to clamp maxlen in the per-AG allocation
+ * functions.  In other words, we maintain a virtual allocation via in-core
+ * accounting tricks so that we don't have to clean up after a crash. :)
+ *
+ * Reserved blocks can be managed by passing one of the enum xfs_ag_resv_type
+ * values via struct xfs_alloc_arg or directly to the xfs_free_extent
+ * function.  It might seem a little funny to maintain a reservoir of blocks
+ * to feed another reservoir, but the AGFL only holds enough blocks to get
+ * through the next transaction.  The per-AG reservation is to ensure (we
+ * hope) that each AG never runs out of blocks.  Each data structure wanting
+ * to use the reservation system should update ask/used in xfs_ag_resv_init.
+ */
+
+/*
+ * Are we critically low on blocks?  For now we'll define that as the number
+ * of blocks we can get our hands on being less than 10% of what we reserved
+ * or less than some arbitrary number (eight).
+ */
+bool
+xfs_ag_resv_critical(
+	struct xfs_perag		*pag,
+	enum xfs_ag_resv_type		type)
+{
+	xfs_extlen_t			avail;
+	xfs_extlen_t			orig;
+
+	switch (type) {
+	case XFS_AG_RESV_METADATA:
+		avail = pag->pagf_freeblks - pag->pag_agfl_resv.ar_reserved;
+		orig = pag->pag_meta_resv.ar_asked;
+		break;
+	case XFS_AG_RESV_AGFL:
+		avail = pag->pagf_freeblks + pag->pagf_flcount -
+			pag->pag_meta_resv.ar_reserved;
+		orig = pag->pag_agfl_resv.ar_asked;
+		break;
+	default:
+		ASSERT(0);
+		return false;
+	}
+
+	trace_xfs_ag_resv_critical(pag, type, avail);
+
+	return avail < orig / 10 || avail < XFS_BTREE_MAXLEVELS;
+}
+
+/*
+ * How many blocks are reserved but not used, and therefore must not be
+ * allocated away?
+ */
+xfs_extlen_t
+xfs_ag_resv_needed(
+	struct xfs_perag		*pag,
+	enum xfs_ag_resv_type		type)
+{
+	xfs_extlen_t			len;
+
+	len = pag->pag_meta_resv.ar_reserved + pag->pag_agfl_resv.ar_reserved;
+	switch (type) {
+	case XFS_AG_RESV_METADATA:
+	case XFS_AG_RESV_AGFL:
+		len -= xfs_perag_resv(pag, type)->ar_reserved;
+		break;
+	case XFS_AG_RESV_NONE:
+		/* empty */
+		break;
+	default:
+		ASSERT(0);
+	}
+
+	trace_xfs_ag_resv_needed(pag, type, len);
+
+	return len;
+}
+
+/* Clean out a reservation */
+static int
+__xfs_ag_resv_free(
+	struct xfs_perag		*pag,
+	enum xfs_ag_resv_type		type)
+{
+	struct xfs_ag_resv		*resv;
+	struct xfs_ag_resv		t;
+	int				error;
+
+	trace_xfs_ag_resv_free(pag, type, 0);
+
+	resv = xfs_perag_resv(pag, type);
+	t = *resv;
+	resv->ar_reserved = 0;
+	resv->ar_asked = 0;
+	pag->pag_mount->m_ag_max_usable += t.ar_asked;
+
+	error = xfs_mod_fdblocks(pag->pag_mount, t.ar_reserved, true);
+	if (error)
+		trace_xfs_ag_resv_free_error(pag->pag_mount, pag->pag_agno,
+				error, _RET_IP_);
+	return error;
+}
+
+/* Free a per-AG reservation. */
+int
+xfs_ag_resv_free(
+	struct xfs_perag		*pag)
+{
+	int				error = 0;
+	int				err2;
+
+	err2 = __xfs_ag_resv_free(pag, XFS_AG_RESV_AGFL);
+	if (err2 && !error)
+		error = err2;
+	err2 = __xfs_ag_resv_free(pag, XFS_AG_RESV_METADATA);
+	if (err2 && !error)
+		error = err2;
+	return error;
+}
+
+static int
+__xfs_ag_resv_init(
+	struct xfs_perag		*pag,
+	enum xfs_ag_resv_type		type,
+	xfs_extlen_t			ask,
+	xfs_extlen_t			used)
+{
+	struct xfs_mount		*mp = pag->pag_mount;
+	struct xfs_ag_resv		*resv;
+	int				error;
+
+	resv = xfs_perag_resv(pag, type);
+	if (used > ask)
+		ask = used;
+	resv->ar_asked = ask;
+	resv->ar_reserved = ask - used;
+	mp->m_ag_max_usable -= ask;
+
+	trace_xfs_ag_resv_init(pag, type, ask);
+
+	error = xfs_mod_fdblocks(mp, -(int64_t)resv->ar_reserved, true);
+	if (error)
+		trace_xfs_ag_resv_init_error(pag->pag_mount, pag->pag_agno,
+				error, _RET_IP_);
+
+	return error;
+}
+
+/* Create a per-AG block reservation. */
+int
+xfs_ag_resv_init(
+	struct xfs_perag		*pag)
+{
+	xfs_extlen_t			ask;
+	xfs_extlen_t			used;
+	int				error = 0;
+	int				err2;
+
+	if (pag->pag_meta_resv.ar_asked)
+		goto init_agfl;
+
+	/* Create the metadata reservation. */
+	ask = used = 0;
+
+	err2 = __xfs_ag_resv_init(pag, XFS_AG_RESV_METADATA, ask, used);
+	if (err2 && !error)
+		error = err2;
+
+init_agfl:
+	if (pag->pag_agfl_resv.ar_asked)
+		return error;
+
+	/* Create the AGFL metadata reservation */
+	ask = used = 0;
+
+	err2 = __xfs_ag_resv_init(pag, XFS_AG_RESV_AGFL, ask, used);
+	if (err2 && !error)
+		error = err2;
+
+	return error;
+}
+
+/* Allocate a block from the reservation. */
+void
+xfs_ag_resv_alloc_extent(
+	struct xfs_perag		*pag,
+	enum xfs_ag_resv_type		type,
+	struct xfs_alloc_arg		*args)
+{
+	struct xfs_ag_resv		*resv;
+	xfs_extlen_t			leftover;
+	uint				field;
+
+	trace_xfs_ag_resv_alloc_extent(pag, type, args->len);
+
+	switch (type) {
+	case XFS_AG_RESV_METADATA:
+	case XFS_AG_RESV_AGFL:
+		resv = xfs_perag_resv(pag, type);
+		break;
+	default:
+		ASSERT(0);
+		/* fall through */
+	case XFS_AG_RESV_NONE:
+		field = args->wasdel ? XFS_TRANS_SB_RES_FDBLOCKS :
+				       XFS_TRANS_SB_FDBLOCKS;
+		xfs_trans_mod_sb(args->tp, field, -(int64_t)args->len);
+		return;
+	}
+
+	if (args->len > resv->ar_reserved) {
+		leftover = args->len - resv->ar_reserved;
+		if (type != XFS_AG_RESV_AGFL)
+			xfs_trans_mod_sb(args->tp, XFS_TRANS_SB_FDBLOCKS,
+					-(int64_t)leftover);
+		resv->ar_reserved = 0;
+	} else
+		resv->ar_reserved -= args->len;
+}
+
+/* Free a block to the reservation. */
+void
+xfs_ag_resv_free_extent(
+	struct xfs_perag		*pag,
+	enum xfs_ag_resv_type		type,
+	struct xfs_trans		*tp,
+	xfs_extlen_t			len)
+{
+	xfs_extlen_t			leftover;
+	struct xfs_ag_resv		*resv;
+
+	trace_xfs_ag_resv_free_extent(pag, type, len);
+
+	switch (type) {
+	case XFS_AG_RESV_METADATA:
+	case XFS_AG_RESV_AGFL:
+		resv = xfs_perag_resv(pag, type);
+		break;
+	default:
+		ASSERT(0);
+		/* fall through */
+	case XFS_AG_RESV_NONE:
+		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, (int64_t)len);
+		return;
+	}
+
+	if (resv->ar_reserved + len > resv->ar_asked) {
+		leftover = resv->ar_reserved + len - resv->ar_asked;
+		if (type != XFS_AG_RESV_AGFL)
+			xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS,
+					(int64_t)leftover);
+		resv->ar_reserved = resv->ar_asked;
+	} else
+		resv->ar_reserved += len;
+}
diff --git a/libxfs/xfs_ag_resv.h b/libxfs/xfs_ag_resv.h
new file mode 100644
index 0000000..8d6c687
--- /dev/null
+++ b/libxfs/xfs_ag_resv.h
@@ -0,0 +1,35 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef __XFS_AG_RESV_H__
+#define	__XFS_AG_RESV_H__
+
+int xfs_ag_resv_free(struct xfs_perag *pag);
+int xfs_ag_resv_init(struct xfs_perag *pag);
+
+bool xfs_ag_resv_critical(struct xfs_perag *pag, enum xfs_ag_resv_type type);
+xfs_extlen_t xfs_ag_resv_needed(struct xfs_perag *pag,
+		enum xfs_ag_resv_type type);
+
+void xfs_ag_resv_alloc_extent(struct xfs_perag *pag, enum xfs_ag_resv_type type,
+		struct xfs_alloc_arg *args);
+void xfs_ag_resv_free_extent(struct xfs_perag *pag, enum xfs_ag_resv_type type,
+		struct xfs_trans *tp, xfs_extlen_t len);
+
+#endif	/* __XFS_AG_RESV_H__ */
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 2f943db..7b9040e 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -33,6 +33,7 @@
 #include "xfs_cksum.h"
 #include "xfs_trace.h"
 #include "xfs_trans.h"
+#include "xfs_ag_resv.h"
 
 struct workqueue_struct *xfs_alloc_wq;
 
@@ -678,12 +679,29 @@ xfs_alloc_ag_vextent(
 	xfs_alloc_arg_t	*args)	/* argument structure for allocation */
 {
 	int		error=0;
+	xfs_extlen_t	reservation;
+	xfs_extlen_t	oldmax;
 
 	ASSERT(args->minlen > 0);
 	ASSERT(args->maxlen > 0);
 	ASSERT(args->minlen <= args->maxlen);
 	ASSERT(args->mod < args->prod);
 	ASSERT(args->alignment > 0);
+
+	/*
+	 * Clamp maxlen to the amount of free space minus any reservations
+	 * that have been made.
+	 */
+	oldmax = args->maxlen;
+	reservation = xfs_ag_resv_needed(args->pag, args->resv);
+	if (args->maxlen > args->pag->pagf_freeblks - reservation)
+		args->maxlen = args->pag->pagf_freeblks - reservation;
+	if (args->maxlen == 0) {
+		args->agbno = NULLAGBLOCK;
+		args->maxlen = oldmax;
+		return 0;
+	}
+
 	/*
 	 * Branch to correct routine based on the type.
 	 */
@@ -703,12 +721,14 @@ xfs_alloc_ag_vextent(
 		/* NOTREACHED */
 	}
 
+	args->maxlen = oldmax;
+
 	if (error || args->agbno == NULLAGBLOCK)
 		return error;
 
 	ASSERT(args->len >= args->minlen);
 	ASSERT(args->len <= args->maxlen);
-	ASSERT(!args->wasfromfl || !args->isfl);
+	ASSERT(!args->wasfromfl || args->resv != XFS_AG_RESV_AGFL);
 	ASSERT(args->agbno % args->alignment == 0);
 
 	/* if not file data, insert new block into the reverse map btree */
@@ -730,12 +750,7 @@ xfs_alloc_ag_vextent(
 					      args->agbno, args->len));
 	}
 
-	if (!args->isfl) {
-		xfs_trans_mod_sb(args->tp, args->wasdel ?
-				 XFS_TRANS_SB_RES_FDBLOCKS :
-				 XFS_TRANS_SB_FDBLOCKS,
-				 -((long)(args->len)));
-	}
+	xfs_ag_resv_alloc_extent(args->pag, args->resv, args);
 
 	XFS_STATS_INC(args->mp, xs_allocx);
 	XFS_STATS_ADD(args->mp, xs_allocb, args->len);
@@ -1597,7 +1612,8 @@ xfs_alloc_ag_vextent_small(
 	 * to respect minleft even when pulling from the
 	 * freelist.
 	 */
-	else if (args->minlen == 1 && args->alignment == 1 && !args->isfl &&
+	else if (args->minlen == 1 && args->alignment == 1 &&
+		 args->resv != XFS_AG_RESV_AGFL &&
 		 (be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_flcount)
 		  > args->minleft)) {
 		error = xfs_alloc_get_freelist(args->tp, args->agbp, &fbno, 0);
@@ -1668,7 +1684,7 @@ xfs_free_ag_extent(
 	xfs_agblock_t	bno,	/* starting block number */
 	xfs_extlen_t	len,	/* length of extent */
 	struct xfs_owner_info	*oinfo,	/* extent owner */
-	int		isfl)	/* set if is freelist blocks - no sb acctg */
+	enum xfs_ag_resv_type	type) /* extent reservation type */
 {
 	xfs_btree_cur_t	*bno_cur;	/* cursor for by-block btree */
 	xfs_btree_cur_t	*cnt_cur;	/* cursor for by-size btree */
@@ -1896,21 +1912,22 @@ xfs_free_ag_extent(
 	 */
 	pag = xfs_perag_get(mp, agno);
 	error = xfs_alloc_update_counters(tp, pag, agbp, len);
+	xfs_ag_resv_free_extent(pag, type, tp, len);
 	xfs_perag_put(pag);
 	if (error)
 		goto error0;
 
-	if (!isfl)
-		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, (long)len);
 	XFS_STATS_INC(mp, xs_freex);
 	XFS_STATS_ADD(mp, xs_freeb, len);
 
-	trace_xfs_free_extent(mp, agno, bno, len, isfl, haveleft, haveright);
+	trace_xfs_free_extent(mp, agno, bno, len, type == XFS_AG_RESV_AGFL,
+			haveleft, haveright);
 
 	return 0;
 
  error0:
-	trace_xfs_free_extent(mp, agno, bno, len, isfl, -1, -1);
+	trace_xfs_free_extent(mp, agno, bno, len, type == XFS_AG_RESV_AGFL,
+			-1, -1);
 	if (bno_cur)
 		xfs_btree_del_cursor(bno_cur, XFS_BTREE_ERROR);
 	if (cnt_cur)
@@ -1935,21 +1952,43 @@ xfs_alloc_compute_maxlevels(
 }
 
 /*
- * Find the length of the longest extent in an AG.
+ * Find the length of the longest extent in an AG.  The 'need' parameter
+ * specifies how much space we're going to need for the AGFL and the
+ * 'reserved' parameter tells us how many blocks in this AG are reserved for
+ * other callers.
  */
 xfs_extlen_t
 xfs_alloc_longest_free_extent(
 	struct xfs_mount	*mp,
 	struct xfs_perag	*pag,
-	xfs_extlen_t		need)
+	xfs_extlen_t		need,
+	xfs_extlen_t		reserved)
 {
 	xfs_extlen_t		delta = 0;
 
+	/*
+	 * If the AGFL needs a recharge, we'll have to subtract that from the
+	 * longest extent.
+	 */
 	if (need > pag->pagf_flcount)
 		delta = need - pag->pagf_flcount;
 
+	/*
+	 * If we cannot maintain others' reservations with space from the
+	 * not-longest freesp extents, we'll have to subtract /that/ from
+	 * the longest extent too.
+	 */
+	if (pag->pagf_freeblks - pag->pagf_longest < reserved)
+		delta += reserved - (pag->pagf_freeblks - pag->pagf_longest);
+
+	/*
+	 * If the longest extent is long enough to satisfy all the
+	 * reservations and AGFL rules in place, we can return this extent.
+	 */
 	if (pag->pagf_longest > delta)
 		return pag->pagf_longest - delta;
+
+	/* Otherwise, let the caller try for 1 block if there's space. */
 	return pag->pagf_flcount > 0 || pag->pagf_longest > 0;
 }
 
@@ -1989,20 +2028,24 @@ xfs_alloc_space_available(
 {
 	struct xfs_perag	*pag = args->pag;
 	xfs_extlen_t		longest;
+	xfs_extlen_t		reservation; /* blocks that are still reserved */
 	int			available;
 
 	if (flags & XFS_ALLOC_FLAG_FREEING)
 		return true;
 
+	reservation = xfs_ag_resv_needed(pag, args->resv);
+
 	/* do we have enough contiguous free space for the allocation? */
-	longest = xfs_alloc_longest_free_extent(args->mp, pag, min_free);
+	longest = xfs_alloc_longest_free_extent(args->mp, pag, min_free,
+			reservation);
 	if ((args->minlen + args->alignment + args->minalignslop - 1) > longest)
 		return false;
 
-	/* do have enough free space remaining for the allocation? */
+	/* do we have enough free space remaining for the allocation? */
 	available = (int)(pag->pagf_freeblks + pag->pagf_flcount -
-			  min_free - args->total);
-	if (available < (int)args->minleft)
+			  reservation - min_free - args->total);
+	if (available < (int)args->minleft || available <= 0)
 		return false;
 
 	return true;
@@ -2108,7 +2151,8 @@ xfs_alloc_fix_freelist(
 			if (error)
 				goto out_agbp_relse;
 			error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1,
-						   &targs.oinfo, 1);
+						   &targs.oinfo,
+						   XFS_AG_RESV_AGFL);
 			if (error)
 				goto out_agbp_relse;
 			bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
@@ -2122,7 +2166,7 @@ xfs_alloc_fix_freelist(
 		xfs_rmap_ag_owner(&targs.oinfo, XFS_RMAP_OWN_AG);
 	targs.agbp = agbp;
 	targs.agno = args->agno;
-	targs.alignment = targs.minlen = targs.prod = targs.isfl = 1;
+	targs.alignment = targs.minlen = targs.prod = 1;
 	targs.type = XFS_ALLOCTYPE_THIS_AG;
 	targs.pag = pag;
 	error = xfs_alloc_read_agfl(mp, tp, targs.agno, &agflbp);
@@ -2133,6 +2177,7 @@ xfs_alloc_fix_freelist(
 	while (pag->pagf_flcount < need) {
 		targs.agbno = 0;
 		targs.maxlen = need - pag->pagf_flcount;
+		targs.resv = XFS_AG_RESV_AGFL;
 
 		/* Allocate as many blocks as possible at once. */
 		error = xfs_alloc_ag_vextent(&targs);
@@ -2811,7 +2856,8 @@ xfs_free_extent(
 	struct xfs_trans	*tp,	/* transaction pointer */
 	xfs_fsblock_t		bno,	/* starting block number of extent */
 	xfs_extlen_t		len,	/* length of extent */
-	struct xfs_owner_info	*oinfo)	/* extent owner */
+	struct xfs_owner_info	*oinfo,	/* extent owner */
+	enum xfs_ag_resv_type	type)	/* block reservation type */
 {
 	struct xfs_mount	*mp = tp->t_mountp;
 	struct xfs_buf		*agbp;
@@ -2820,6 +2866,7 @@ xfs_free_extent(
 	int			error;
 
 	ASSERT(len != 0);
+	ASSERT(type != XFS_AG_RESV_AGFL);
 
 	trace_xfs_bmap_free_deferred(mp, agno, 0, agbno, len);
 
@@ -2839,7 +2886,7 @@ xfs_free_extent(
 			agbno + len <= be32_to_cpu(XFS_BUF_TO_AGF(agbp)->agf_length),
 			err);
 
-	error = xfs_free_ag_extent(tp, agbp, agno, agbno, len, oinfo, 0);
+	error = xfs_free_ag_extent(tp, agbp, agno, agbno, len, oinfo, type);
 	if (error)
 		goto err;
 
diff --git a/libxfs/xfs_alloc.h b/libxfs/xfs_alloc.h
index 7b9e67e..9f6373a 100644
--- a/libxfs/xfs_alloc.h
+++ b/libxfs/xfs_alloc.h
@@ -87,10 +87,10 @@ typedef struct xfs_alloc_arg {
 	xfs_alloctype_t	otype;		/* original allocation type */
 	char		wasdel;		/* set if allocation was prev delayed */
 	char		wasfromfl;	/* set if allocation is from freelist */
-	char		isfl;		/* set if is freelist blocks - !acctg */
 	char		userdata;	/* mask defining userdata treatment */
 	xfs_fsblock_t	firstblock;	/* io first block allocated */
 	struct xfs_owner_info	oinfo;	/* owner of blocks being allocated */
+	enum xfs_ag_resv_type	resv;	/* block reservation to use */
 } xfs_alloc_arg_t;
 
 /*
@@ -106,7 +106,8 @@ unsigned int xfs_alloc_set_aside(struct xfs_mount *mp);
 unsigned int xfs_alloc_ag_max_usable(struct xfs_mount *mp);
 
 xfs_extlen_t xfs_alloc_longest_free_extent(struct xfs_mount *mp,
-		struct xfs_perag *pag, xfs_extlen_t need);
+		struct xfs_perag *pag, xfs_extlen_t need,
+		xfs_extlen_t reserved);
 unsigned int xfs_alloc_min_freelist(struct xfs_mount *mp,
 		struct xfs_perag *pag);
 
@@ -184,7 +185,8 @@ xfs_free_extent(
 	struct xfs_trans *tp,	/* transaction pointer */
 	xfs_fsblock_t	bno,	/* starting block number of extent */
 	xfs_extlen_t	len,	/* length of extent */
-	struct xfs_owner_info	*oinfo);	/* extent owner */
+	struct xfs_owner_info	*oinfo,	/* extent owner */
+	enum xfs_ag_resv_type	type);	/* block reservation type */
 
 int				/* error */
 xfs_alloc_lookup_ge(
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index e9ccec5..50faacd 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -39,6 +39,7 @@
 #include "xfs_attr_leaf.h"
 #include "xfs_quota_defs.h"
 #include "xfs_rmap_btree.h"
+#include "xfs_ag_resv.h"
 
 
 kmem_zone_t		*xfs_bmap_free_item_zone;
@@ -3493,7 +3494,8 @@ xfs_bmap_longest_free_extent(
 	}
 
 	longest = xfs_alloc_longest_free_extent(mp, pag,
-					xfs_alloc_min_freelist(mp, pag));
+				xfs_alloc_min_freelist(mp, pag),
+				xfs_ag_resv_needed(pag, XFS_AG_RESV_NONE));
 	if (*blen < longest)
 		*blen = longest;
 
@@ -3772,7 +3774,7 @@ xfs_bmap_btalloc(
 	}
 	args.minleft = ap->minleft;
 	args.wasdel = ap->wasdel;
-	args.isfl = 0;
+	args.resv = XFS_AG_RESV_NONE;
 	args.userdata = ap->userdata;
 	if (ap->userdata & XFS_ALLOC_USERDATA_ZERO)
 		args.ip = ap->ip;
diff --git a/libxfs/xfs_ialloc_btree.c b/libxfs/xfs_ialloc_btree.c
index d79ddfe..36d09eb 100644
--- a/libxfs/xfs_ialloc_btree.c
+++ b/libxfs/xfs_ialloc_btree.c
@@ -130,7 +130,7 @@ xfs_inobt_free_block(
 	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT);
 	return xfs_free_extent(cur->bc_tp,
 			XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp)), 1,
-			&oinfo);
+			&oinfo, XFS_AG_RESV_NONE);
 }
 
 STATIC int
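
For readers following the accounting in xfs_ag_resv_alloc_extent() and
xfs_ag_resv_free_extent() above, here is a minimal standalone sketch of the
clamp-and-spill arithmetic.  It is a simplified model, not the libxfs code:
struct resv_model and both helper names are hypothetical, and the returned
"leftover" stands in for the amount the real functions charge to (or credit
back to) fdblocks via xfs_trans_mod_sb() for non-AGFL reservations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_ag_resv; not the real libxfs type. */
struct resv_model {
	uint32_t asked;		/* blocks originally requested */
	uint32_t reserved;	/* blocks still held back for this user */
};

/*
 * Take len blocks out of the reservation.  Returns how many blocks did
 * not fit in the reservation and would have to be charged to the global
 * free block counter instead.
 */
static uint32_t resv_model_alloc(struct resv_model *r, uint32_t len)
{
	uint32_t leftover = 0;

	if (len > r->reserved) {
		leftover = len - r->reserved;
		r->reserved = 0;
	} else {
		r->reserved -= len;
	}
	return leftover;
}

/*
 * Give len blocks back.  The reservation is refilled up to "asked";
 * anything beyond that is returned so the caller can credit it to the
 * global free block counter.
 */
static uint32_t resv_model_free(struct resv_model *r, uint32_t len)
{
	uint32_t leftover = 0;

	if (r->reserved + len > r->asked) {
		leftover = r->reserved + len - r->asked;
		r->reserved = r->asked;
	} else {
		r->reserved += len;
	}
	return leftover;
}
```

With asked = reserved = 10, allocating 4 blocks and then 8 blocks drains the
reservation and spills 2 blocks; freeing blocks later refills the reservation
first and only then returns the excess.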


* [PATCH 073/145] xfs: introduce refcount btree definitions
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (71 preceding siblings ...)
  2016-06-17  1:38 ` [PATCH 072/145] xfs: set up per-AG free space reservations Darrick J. Wong
@ 2016-06-17  1:38 ` Darrick J. Wong
  2016-06-17  1:38 ` [PATCH 074/145] xfs: add refcount btree stats infrastructure Darrick J. Wong
                   ` (71 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:38 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Christoph Hellwig, xfs

Add new per-AG refcount btree definitions to the per-AG structures.

v2: Move the reflink inode flag out of the way of the DAX flag, and
add the new cowextsize flag.

v3: Don't allow pNFS to export reflinked files; this restriction will be
removed some day when the Linux pNFS server supports it.

[hch: don't allow pNFS export of reflinked files]
[darrick: fix the feature test in hch's patch]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/xfs_inode.h     |    5 +++++
 include/xfs_mount.h     |    3 +++
 libxfs/xfs_alloc.c      |    5 +++++
 libxfs/xfs_btree.c      |    5 +++--
 libxfs/xfs_btree.h      |    3 +++
 libxfs/xfs_format.h     |   29 ++++++++++++++++++++++++++---
 libxfs/xfs_rmap_btree.c |    7 +++++--
 libxfs/xfs_types.h      |    2 +-
 8 files changed, 51 insertions(+), 8 deletions(-)


diff --git a/include/xfs_inode.h b/include/xfs_inode.h
index 8141d97..3876fa6 100644
--- a/include/xfs_inode.h
+++ b/include/xfs_inode.h
@@ -122,6 +122,11 @@ xfs_set_projid(struct xfs_icdinode *id, prid_t projid)
 	id->di_projid_lo = (__uint16_t) (projid & 0xffff);
 }
 
+static inline bool xfs_is_reflink_inode(struct xfs_inode *ip)
+{
+	return ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK;
+}
+
 typedef struct cred {
 	uid_t	cr_uid;
 	gid_t	cr_gid;
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index c452de2..c5db4d8 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -161,6 +161,9 @@ typedef struct xfs_perag {
 	struct xfs_ag_resv	pag_meta_resv;
 	/* Blocks reserved for just AGFL-based metadata. */
 	struct xfs_ag_resv	pag_agfl_resv;
+
+	/* reference count */
+	__uint8_t	pagf_refcount_level;
 } xfs_perag_t;
 
 static inline struct xfs_ag_resv *
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 7b9040e..c4d001f 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -2444,6 +2444,10 @@ xfs_agf_verify(
 	    be32_to_cpu(agf->agf_btreeblks) > be32_to_cpu(agf->agf_length))
 		return false;
 
+	if (xfs_sb_version_hasreflink(&mp->m_sb) &&
+	    be32_to_cpu(agf->agf_refcount_level) > XFS_BTREE_MAXLEVELS)
+		return false;
+
 	return true;;
 
 }
@@ -2564,6 +2568,7 @@ xfs_alloc_read_agf(
 			be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
 		pag->pagf_levels[XFS_BTNUM_RMAPi] =
 			be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
+		pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level);
 		spin_lock_init(&pag->pagb_lock);
 		pag->pagb_count = 0;
 #ifdef __KERNEL__
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 1e547f8..f4b1780 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -41,9 +41,10 @@ kmem_zone_t	*xfs_btree_cur_zone;
  */
 static const __uint32_t xfs_magics[2][XFS_BTNUM_MAX] = {
 	{ XFS_ABTB_MAGIC, XFS_ABTC_MAGIC, 0, XFS_BMAP_MAGIC, XFS_IBT_MAGIC,
-	  XFS_FIBT_MAGIC },
+	  XFS_FIBT_MAGIC, 0 },
 	{ XFS_ABTB_CRC_MAGIC, XFS_ABTC_CRC_MAGIC, XFS_RMAP_CRC_MAGIC,
-	  XFS_BMAP_CRC_MAGIC, XFS_IBT_CRC_MAGIC, XFS_FIBT_CRC_MAGIC }
+	  XFS_BMAP_CRC_MAGIC, XFS_IBT_CRC_MAGIC, XFS_FIBT_CRC_MAGIC,
+	  XFS_REFC_CRC_MAGIC }
 };
 #define xfs_btree_magic(cur) \
 	xfs_magics[!!((cur)->bc_flags & XFS_BTREE_CRC_BLOCKS)][cur->bc_btnum]
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index 6fa13a9..85506a9 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -66,6 +66,7 @@ union xfs_btree_rec {
 #define	XFS_BTNUM_INO	((xfs_btnum_t)XFS_BTNUM_INOi)
 #define	XFS_BTNUM_FINO	((xfs_btnum_t)XFS_BTNUM_FINOi)
 #define	XFS_BTNUM_RMAP	((xfs_btnum_t)XFS_BTNUM_RMAPi)
+#define	XFS_BTNUM_REFC	((xfs_btnum_t)XFS_BTNUM_REFCi)
 
 /*
  * For logging record fields.
@@ -99,6 +100,7 @@ do {    \
 	case XFS_BTNUM_INO: __XFS_BTREE_STATS_INC(__mp, ibt, stat); break; \
 	case XFS_BTNUM_FINO: __XFS_BTREE_STATS_INC(__mp, fibt, stat); break; \
 	case XFS_BTNUM_RMAP: __XFS_BTREE_STATS_INC(__mp, rmap, stat); break; \
+	case XFS_BTNUM_REFC: break; \
 	case XFS_BTNUM_MAX: ASSERT(0); __mp = __mp /* fucking gcc */ ; break; \
 	}       \
 } while (0)
@@ -121,6 +123,7 @@ do {    \
 		__XFS_BTREE_STATS_ADD(__mp, fibt, stat, val); break; \
 	case XFS_BTNUM_RMAP:	\
 		__XFS_BTREE_STATS_ADD(__mp, rmap, stat, val); break; \
+	case XFS_BTNUM_REFC: break; \
 	case XFS_BTNUM_MAX: ASSERT(0); __mp = __mp /* fucking gcc */ ; break; \
 	}       \
 } while (0)
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 7205806..865ce0f 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -456,6 +456,7 @@ xfs_sb_has_compat_feature(
 
 #define XFS_SB_FEAT_RO_COMPAT_FINOBT   (1 << 0)		/* free inode btree */
 #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
+#define XFS_SB_FEAT_RO_COMPAT_REFLINK  (1 << 2)		/* reflinked files */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
 		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
 		 XFS_SB_FEAT_RO_COMPAT_RMAPBT)
@@ -546,6 +547,12 @@ static inline bool xfs_sb_version_hasrmapbt(struct xfs_sb *sbp)
 		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_RMAPBT);
 }
 
+static inline bool xfs_sb_version_hasreflink(struct xfs_sb *sbp)
+{
+	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
+		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_REFLINK);
+}
+
 /*
  * end of superblock version macros
  */
@@ -640,12 +647,15 @@ typedef struct xfs_agf {
 	__be32		agf_btreeblks;	/* # of blocks held in AGF btrees */
 	uuid_t		agf_uuid;	/* uuid of filesystem */
 
+	__be32		agf_refcount_root;	/* refcount tree root block */
+	__be32		agf_refcount_level;	/* refcount btree levels */
+
 	/*
 	 * reserve some contiguous space for future logged fields before we add
 	 * the unlogged fields. This makes the range logging via flags and
 	 * structure offsets much simpler.
 	 */
-	__be64		agf_spare64[16];
+	__be64		agf_spare64[15];
 
 	/* unlogged fields, written during buffer writeback. */
 	__be64		agf_lsn;	/* last write sequence */
@@ -1033,9 +1043,14 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
  * 16 bits of the XFS_XFLAG_s range.
  */
 #define XFS_DIFLAG2_DAX_BIT	0	/* use DAX for this inode */
+#define XFS_DIFLAG2_REFLINK_BIT	1	/* file's blocks may be shared */
+#define XFS_DIFLAG2_COWEXTSIZE_BIT   2  /* copy on write extent size hint */
 #define XFS_DIFLAG2_DAX		(1 << XFS_DIFLAG2_DAX_BIT)
+#define XFS_DIFLAG2_REFLINK     (1 << XFS_DIFLAG2_REFLINK_BIT)
+#define XFS_DIFLAG2_COWEXTSIZE  (1 << XFS_DIFLAG2_COWEXTSIZE_BIT)
 
-#define XFS_DIFLAG2_ANY		(XFS_DIFLAG2_DAX)
+#define XFS_DIFLAG2_ANY \
+	(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE)
 
 /*
  * Inode number format:
@@ -1382,7 +1397,8 @@ xfs_rmap_ino_owner(
 #define XFS_RMAP_OWN_AG		(-5ULL)	/* AG freespace btree blocks */
 #define XFS_RMAP_OWN_INOBT	(-6ULL)	/* Inode btree blocks */
 #define XFS_RMAP_OWN_INODES	(-7ULL)	/* Inode chunk */
-#define XFS_RMAP_OWN_MIN	(-8ULL) /* guard */
+#define XFS_RMAP_OWN_REFC	(-8ULL) /* refcount tree */
+#define XFS_RMAP_OWN_MIN	(-9ULL) /* guard */
 
 #define XFS_RMAP_NON_INODE_OWNER(owner)	(!!((owner) & (1ULL << 63)))
 
@@ -1530,6 +1546,13 @@ xfs_owner_info_pack(
 }
 
 /*
+ * Reference Count Btree format definitions
+ *
+ */
+#define	XFS_REFC_CRC_MAGIC	0x52334643	/* 'R3FC' */
+
+
+/*
  * BMAP Btree format definitions
  *
  * This includes both the root block definition that sits inside an inode fork
diff --git a/libxfs/xfs_rmap_btree.c b/libxfs/xfs_rmap_btree.c
index b5c3c21..3fedfd7 100644
--- a/libxfs/xfs_rmap_btree.c
+++ b/libxfs/xfs_rmap_btree.c
@@ -473,6 +473,9 @@ void
 xfs_rmapbt_compute_maxlevels(
 	struct xfs_mount		*mp)
 {
-	mp->m_rmap_maxlevels = xfs_btree_compute_maxlevels(mp,
-			mp->m_rmap_mnr, mp->m_sb.sb_agblocks);
+	if (xfs_sb_version_hasreflink(&mp->m_sb))
+		mp->m_rmap_maxlevels = XFS_BTREE_MAXLEVELS;
+	else
+		mp->m_rmap_maxlevels = xfs_btree_compute_maxlevels(mp,
+				mp->m_rmap_mnr, mp->m_sb.sb_agblocks);
 }
diff --git a/libxfs/xfs_types.h b/libxfs/xfs_types.h
index da87796..690d616 100644
--- a/libxfs/xfs_types.h
+++ b/libxfs/xfs_types.h
@@ -112,7 +112,7 @@ typedef enum {
 
 typedef enum {
 	XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_RMAPi, XFS_BTNUM_BMAPi,
-	XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_MAX
+	XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_REFCi, XFS_BTNUM_MAX
 } xfs_btnum_t;
 
 struct xfs_name {
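
As an illustration of the feature test added in xfs_format.h above, the
following sketch models xfs_sb_version_hasreflink() on a cut-down superblock.
The RO_COMPAT bit values are copied from the hunk; struct sb_model, the
helper name, and the literal 5 for the v5 superblock version are assumptions
made for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of XFS_SB_VERSION_5 for this sketch. */
#define SB_MODEL_VERSION_5		5

/* Bit values as defined in the xfs_format.h hunk above. */
#define SB_FEAT_RO_COMPAT_FINOBT	(1 << 0)
#define SB_FEAT_RO_COMPAT_RMAPBT	(1 << 1)
#define SB_FEAT_RO_COMPAT_REFLINK	(1 << 2)

/* Hypothetical cut-down superblock; the real struct xfs_sb is much larger. */
struct sb_model {
	uint16_t version;		/* superblock version number */
	uint32_t features_ro_compat;	/* read-only compatible feature bits */
};

/* Reflink is only meaningful on a v5 superblock with the feature bit set. */
static bool sb_model_hasreflink(const struct sb_model *sb)
{
	return sb->version == SB_MODEL_VERSION_5 &&
	       (sb->features_ro_compat & SB_FEAT_RO_COMPAT_REFLINK);
}
```

Both conditions must hold: a stray REFLINK bit on an older superblock version
does not enable the feature in this model.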


* [PATCH 074/145] xfs: add refcount btree stats infrastructure
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (72 preceding siblings ...)
  2016-06-17  1:38 ` [PATCH 073/145] xfs: introduce refcount btree definitions Darrick J. Wong
@ 2016-06-17  1:38 ` Darrick J. Wong
  2016-06-17  1:38 ` [PATCH 075/145] xfs: refcount btree add more reserved blocks Darrick J. Wong
                   ` (70 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:38 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

The refcount btree presents the same stats as the other btrees, so
add all the code for that now.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_btree.h |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)


diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index 85506a9..93e761e 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -100,7 +100,7 @@ do {    \
 	case XFS_BTNUM_INO: __XFS_BTREE_STATS_INC(__mp, ibt, stat); break; \
 	case XFS_BTNUM_FINO: __XFS_BTREE_STATS_INC(__mp, fibt, stat); break; \
 	case XFS_BTNUM_RMAP: __XFS_BTREE_STATS_INC(__mp, rmap, stat); break; \
-	case XFS_BTNUM_REFC: break; \
+	case XFS_BTNUM_REFC: __XFS_BTREE_STATS_INC(__mp, refcbt, stat); break; \
 	case XFS_BTNUM_MAX: ASSERT(0); __mp = __mp /* fucking gcc */ ; break; \
 	}       \
 } while (0)
@@ -123,7 +123,8 @@ do {    \
 		__XFS_BTREE_STATS_ADD(__mp, fibt, stat, val); break; \
 	case XFS_BTNUM_RMAP:	\
 		__XFS_BTREE_STATS_ADD(__mp, rmap, stat, val); break; \
-	case XFS_BTNUM_REFC: break; \
+	case XFS_BTNUM_REFC:	\
+		__XFS_BTREE_STATS_ADD(__mp, refcbt, stat, val); break; \
 	case XFS_BTNUM_MAX: ASSERT(0); __mp = __mp /* fucking gcc */ ; break; \
 	}       \
 } while (0)


* [PATCH 075/145] xfs: refcount btree add more reserved blocks
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (73 preceding siblings ...)
  2016-06-17  1:38 ` [PATCH 074/145] xfs: add refcount btree stats infrastructure Darrick J. Wong
@ 2016-06-17  1:38 ` Darrick J. Wong
  2016-06-17  1:38 ` [PATCH 076/145] xfs: define the on-disk refcount btree format Darrick J. Wong
                   ` (69 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:38 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Since XFS reserves a small amount of space in each AG as the minimum
free space needed for an operation, save some more space in case we
touch the refcount btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_alloc.c  |   13 +++++++++++++
 libxfs/xfs_format.h |    2 ++
 2 files changed, 15 insertions(+)


diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index c4d001f..6e6ada8 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -48,10 +48,23 @@ STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *);
 STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
 		xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
 
+unsigned int
+xfs_refc_block(
+	struct xfs_mount	*mp)
+{
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return XFS_RMAP_BLOCK(mp) + 1;
+	if (xfs_sb_version_hasfinobt(&mp->m_sb))
+		return XFS_FIBT_BLOCK(mp) + 1;
+	return XFS_IBT_BLOCK(mp) + 1;
+}
+
 xfs_extlen_t
 xfs_prealloc_blocks(
 	struct xfs_mount	*mp)
 {
+	if (xfs_sb_version_hasreflink(&mp->m_sb))
+		return xfs_refc_block(mp) + 1;
 	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
 		return XFS_RMAP_BLOCK(mp) + 1;
 	if (xfs_sb_version_hasfinobt(&mp->m_sb))
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 865ce0f..54fbe2b 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -1551,6 +1551,8 @@ xfs_owner_info_pack(
  */
 #define	XFS_REFC_CRC_MAGIC	0x52334643	/* 'R3FC' */
 
+unsigned int xfs_refc_block(struct xfs_mount *mp);
+
 
 /*
  * BMAP Btree format definitions


* [PATCH 076/145] xfs: define the on-disk refcount btree format
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (74 preceding siblings ...)
  2016-06-17  1:38 ` [PATCH 075/145] xfs: refcount btree add more reserved blocks Darrick J. Wong
@ 2016-06-17  1:38 ` Darrick J. Wong
  2016-06-17  1:38 ` [PATCH 077/145] xfs: account for the refcount btree in the alloc/free log reservation Darrick J. Wong
                   ` (68 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:38 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Christoph Hellwig, xfs

Start constructing the refcount btree implementation by establishing
the on-disk format and everything needed to read, write, and
manipulate the refcount btree blocks.

v2: Calculate a separate maxlevels for the refcount btree.

v3: Enable the tracking of per-cursor stats for refcount btrees.
The refcount update code will use this to guess if it's time to
split a refcountbt update across two transactions to avoid
exhausting the transaction reservation.

xfs_refcountbt_init_cursor can be called under the ilock, so
use KM_NOFS to prevent fs activity with a lock held.  This
should shut up some of the lockdep warnings.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: allocate the cursor with KM_NOFS to quiet lockdep]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/darwin.h            |    1 
 include/freebsd.h           |    1 
 include/gnukfreebsd.h       |    1 
 include/irix.h              |    1 
 include/libxfs.h            |    1 
 include/linux.h             |    1 
 include/xfs_mount.h         |    3 +
 libxfs/Makefile             |    2 
 libxfs/init.c               |    2 
 libxfs/xfs_btree.c          |    3 +
 libxfs/xfs_btree.h          |   12 +++
 libxfs/xfs_format.h         |   32 ++++++++
 libxfs/xfs_refcount_btree.c |  177 +++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_refcount_btree.h |   67 ++++++++++++++++
 libxfs/xfs_sb.c             |    9 ++
 libxfs/xfs_shared.h         |    2 
 libxfs/xfs_trans_resv.c     |    2 
 libxfs/xfs_trans_resv.h     |    1 
 18 files changed, 317 insertions(+), 1 deletion(-)
 create mode 100644 libxfs/xfs_refcount_btree.c
 create mode 100644 libxfs/xfs_refcount_btree.h
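
A note for readers: the record-capacity arithmetic in
xfs_refcountbt_maxrecs() and the pointer offset computed by
XFS_REFCOUNT_PTR_ADDR can be tried outside libxfs.  The sketch below is
a standalone model, not the patch's code: the struct and function names
are hypothetical, the fields are host-endian uint32_t rather than
__be32, and the header length of 56 bytes is an assumption about
XFS_BTREE_SBLOCK_CRC_LEN that should be checked against xfs_format.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a CRC-enabled short-form btree block header is 56 bytes. */
#define REFC_BLOCK_LEN	56u

/* Hypothetical host-side mirrors of the on-disk record, key and pointer. */
struct refc_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint32_t	refcount;
};

struct refc_key {
	uint32_t	startblock;
};

typedef uint32_t refc_ptr_t;

/* Leaf blocks hold records; node blocks hold key/pointer pairs. */
static unsigned int refc_maxrecs(unsigned int blocklen, bool leaf)
{
	assert(blocklen > REFC_BLOCK_LEN);
	blocklen -= REFC_BLOCK_LEN;

	if (leaf)
		return (unsigned int)(blocklen / sizeof(struct refc_rec));
	return (unsigned int)(blocklen /
			(sizeof(struct refc_key) + sizeof(refc_ptr_t)));
}

/* The minimum is half the maximum, as in xfs_sb_mount_common(). */
static unsigned int refc_minrecs(unsigned int blocklen, bool leaf)
{
	return refc_maxrecs(blocklen, leaf) / 2;
}

/*
 * Byte offset of the index'th pointer (1-based) in a node block: the
 * pointers start after room for maxrecs keys, not after the keys in use.
 */
static size_t refc_ptr_offset(unsigned int index, unsigned int maxrecs)
{
	assert(index >= 1);
	return REFC_BLOCK_LEN + maxrecs * sizeof(struct refc_key) +
			(index - 1) * sizeof(refc_ptr_t);
}
```

With the 56-byte header assumption and a 4096-byte block, this gives 336
records per leaf and 505 key/pointer pairs per node, and the first
pointer of a node block sits at byte offset 2076.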


diff --git a/include/darwin.h b/include/darwin.h
index a52030d..2935b4c 100644
--- a/include/darwin.h
+++ b/include/darwin.h
@@ -140,6 +140,7 @@ typedef off_t		xfs_off_t;
 typedef u_int64_t	xfs_ino_t;
 typedef u_int32_t	xfs_dev_t;
 typedef int64_t		xfs_daddr_t;
+typedef u_int32_t	xfs_nlink_t;
 
 #define stat64		stat
 #define fstat64		fstat
diff --git a/include/freebsd.h b/include/freebsd.h
index f7e0c75..3feca07 100644
--- a/include/freebsd.h
+++ b/include/freebsd.h
@@ -53,6 +53,7 @@ typedef off_t		off64_t;
 typedef __uint64_t	xfs_ino_t;
 typedef __uint32_t	xfs_dev_t;
 typedef __int64_t	xfs_daddr_t;
+typedef __uint32_t	xfs_nlink_t;
 
 typedef unsigned char		__u8;
 typedef signed char		__s8;
diff --git a/include/gnukfreebsd.h b/include/gnukfreebsd.h
index 64167b2..a62789e 100644
--- a/include/gnukfreebsd.h
+++ b/include/gnukfreebsd.h
@@ -42,6 +42,7 @@ typedef off_t		xfs_off_t;
 typedef __uint64_t	xfs_ino_t;
 typedef __uint32_t	xfs_dev_t;
 typedef __int64_t	xfs_daddr_t;
+typedef __uint32_t	xfs_nlink_t;
 
 typedef unsigned char		__u8;
 typedef signed char		__s8;
diff --git a/include/irix.h b/include/irix.h
index c2191ee..45c8594 100644
--- a/include/irix.h
+++ b/include/irix.h
@@ -47,6 +47,7 @@ typedef off64_t		xfs_off_t;
 typedef __int64_t	xfs_ino_t;
 typedef __int32_t	xfs_dev_t;
 typedef __int64_t	xfs_daddr_t;
+typedef __int32_t	xfs_nlink_t;
 
 typedef unsigned char		__u8;
 typedef signed char		__s8;
diff --git a/include/libxfs.h b/include/libxfs.h
index 5e76263..bec4ee6 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -78,6 +78,7 @@ extern uint32_t crc32c_le(uint32_t crc, unsigned char const *p, size_t len);
 #include "xfs_trace.h"
 #include "xfs_trans.h"
 #include "xfs_rmap_btree.h"
+#include "xfs_refcount_btree.h"
 
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
diff --git a/include/linux.h b/include/linux.h
index cc0f70c..cd4b3eb 100644
--- a/include/linux.h
+++ b/include/linux.h
@@ -146,6 +146,7 @@ typedef off64_t		xfs_off_t;
 typedef __uint64_t	xfs_ino_t;
 typedef __uint32_t	xfs_dev_t;
 typedef __int64_t	xfs_daddr_t;
+typedef __uint32_t	xfs_nlink_t;
 
 /**
  * Abstraction of mountpoints.
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index c5db4d8..d47efa0 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -66,10 +66,13 @@ typedef struct xfs_mount {
 	uint			m_inobt_mnr[2];	/* XFS_INOBT_BLOCK_MINRECS */
 	uint			m_rmap_mxr[2];	/* max rmap btree records */
 	uint			m_rmap_mnr[2];	/* min rmap btree records */
+	uint			m_refc_mxr[2];	/* max refc btree records */
+	uint			m_refc_mnr[2];	/* min refc btree records */
 	uint			m_ag_maxlevels;	/* XFS_AG_MAXLEVELS */
 	uint			m_bm_maxlevels[2]; /* XFS_BM_MAXLEVELS */
 	uint			m_in_maxlevels;	/* XFS_IN_MAXLEVELS */
 	uint			m_rmap_maxlevels; /* max rmap btree levels */
+	uint			m_refc_maxlevels; /* max refcount btree level */
 	xfs_extlen_t		m_ag_prealloc_blocks; /* reserved ag blocks */
 	uint			m_alloc_set_aside; /* space we can't use */
 	uint			m_ag_max_usable; /* max space per AG */
diff --git a/libxfs/Makefile b/libxfs/Makefile
index 588c663..c31a2e9 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -37,6 +37,7 @@ HFILES = \
 	xfs_inode_buf.h \
 	xfs_inode_fork.h \
 	xfs_quota_defs.h \
+	xfs_refcount_btree.h \
 	xfs_rmap_btree.h \
 	xfs_sb.h \
 	xfs_shared.h \
@@ -85,6 +86,7 @@ CFILES = cache.c \
 	xfs_inode_fork.c \
 	xfs_ialloc_btree.c \
 	xfs_log_rlimit.c \
+	xfs_refcount_btree.c \
 	xfs_rmap.c \
 	xfs_rmap_btree.c \
 	xfs_rtbitmap.c \
diff --git a/libxfs/init.c b/libxfs/init.c
index c56d123..66bfbd8 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -32,6 +32,7 @@
 #include "xfs_inode.h"
 #include "xfs_trans.h"
 #include "xfs_rmap_btree.h"
+#include "xfs_refcount_btree.h"
 
 #include "libxfs.h"		/* for now */
 
@@ -685,6 +686,7 @@ libxfs_mount(
 	xfs_bmap_compute_maxlevels(mp, XFS_ATTR_FORK);
 	xfs_ialloc_compute_maxlevels(mp);
 	xfs_rmapbt_compute_maxlevels(mp);
+	xfs_refcountbt_compute_maxlevels(mp);
 
 	if (sbp->sb_imax_pct) {
 		/* Make sure the maximum inode count is a multiple of the
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index f4b1780..89fb2fe 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -1210,6 +1210,9 @@ xfs_btree_set_refs(
 	case XFS_BTNUM_RMAP:
 		xfs_buf_set_ref(bp, XFS_RMAP_BTREE_REF);
 		break;
+	case XFS_BTNUM_REFC:
+		xfs_buf_set_ref(bp, XFS_REFC_BTREE_REF);
+		break;
 	default:
 		ASSERT(0);
 	}
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index 93e761e..dbf299f 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -43,6 +43,7 @@ union xfs_btree_key {
 	xfs_alloc_key_t			alloc;
 	struct xfs_inobt_key		inobt;
 	struct xfs_rmap_key		rmap;
+	struct xfs_refcount_key		refc;
 };
 
 union xfs_btree_rec {
@@ -51,6 +52,7 @@ union xfs_btree_rec {
 	struct xfs_alloc_rec		alloc;
 	struct xfs_inobt_rec		inobt;
 	struct xfs_rmap_rec		rmap;
+	struct xfs_refcount_rec		refc;
 };
 
 /*
@@ -221,6 +223,15 @@ union xfs_btree_irec {
 	xfs_bmbt_irec_t			b;
 	xfs_inobt_rec_incore_t		i;
 	struct xfs_rmap_irec		r;
+	struct xfs_refcount_irec	rc;
+};
+
+/* Per-AG btree private information. */
+union xfs_btree_cur_private {
+	struct {
+		unsigned long	nr_ops;		/* # record updates */
+		int		shape_changes;	/* # of extent splits */
+	} refc;
 };
 
 /*
@@ -247,6 +258,7 @@ typedef struct xfs_btree_cur
 			struct xfs_buf	*agbp;	/* agf/agi buffer pointer */
 			struct xfs_defer_ops *dfops;	/* deferred updates */
 			xfs_agnumber_t	agno;	/* ag number */
+			union xfs_btree_cur_private	priv;
 		} a;
 		struct {			/* needed for BMAP */
 			struct xfs_inode *ip;	/* pointer to our inode */
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 54fbe2b..916d92b 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -1553,6 +1553,38 @@ xfs_owner_info_pack(
 
 unsigned int xfs_refc_block(struct xfs_mount *mp);
 
+/*
+ * Data record/key structure
+ *
+ * Each record associates a range of physical blocks (starting at
+ * rc_startblock and ending rc_blockcount blocks later) with a
+ * reference count (rc_refcount).  A record is only stored in the
+ * btree if the refcount is > 1.  An entry in the free block btree
+ * means that the refcount is 0, and no entries anywhere means that
+ * the refcount is 1, as was true in XFS before reflinking.
+ */
+struct xfs_refcount_rec {
+	__be32		rc_startblock;	/* starting block number */
+	__be32		rc_blockcount;	/* count of blocks */
+	__be32		rc_refcount;	/* number of inodes linked here */
+};
+
+struct xfs_refcount_key {
+	__be32		rc_startblock;	/* starting block number */
+};
+
+struct xfs_refcount_irec {
+	xfs_agblock_t	rc_startblock;	/* starting block number */
+	xfs_extlen_t	rc_blockcount;	/* count of blocks */
+	xfs_nlink_t	rc_refcount;	/* number of inodes linked here */
+};
+
+#define MAXREFCOUNT	((xfs_nlink_t)~0U)
+#define MAXREFCEXTLEN	((xfs_extlen_t)~0U)
+
+/* btree pointer type */
+typedef __be32 xfs_refcount_ptr_t;
+
 
 /*
  * BMAP Btree format definitions
diff --git a/libxfs/xfs_refcount_btree.c b/libxfs/xfs_refcount_btree.c
new file mode 100644
index 0000000..a7b99e4
--- /dev/null
+++ b/libxfs/xfs_refcount_btree.c
@@ -0,0 +1,177 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "libxfs_priv.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_btree.h"
+#include "xfs_bmap.h"
+#include "xfs_refcount_btree.h"
+#include "xfs_alloc.h"
+#include "xfs_trace.h"
+#include "xfs_cksum.h"
+#include "xfs_trans.h"
+#include "xfs_bit.h"
+
+static struct xfs_btree_cur *
+xfs_refcountbt_dup_cursor(
+	struct xfs_btree_cur	*cur)
+{
+	return xfs_refcountbt_init_cursor(cur->bc_mp, cur->bc_tp,
+			cur->bc_private.a.agbp, cur->bc_private.a.agno,
+			cur->bc_private.a.dfops);
+}
+
+STATIC bool
+xfs_refcountbt_verify(
+	struct xfs_buf		*bp)
+{
+	struct xfs_mount	*mp = bp->b_target->bt_mount;
+	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
+	struct xfs_perag	*pag = bp->b_pag;
+	unsigned int		level;
+
+	if (block->bb_magic != cpu_to_be32(XFS_REFC_CRC_MAGIC))
+		return false;
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return false;
+	if (!xfs_btree_sblock_v5hdr_verify(bp))
+		return false;
+
+	level = be16_to_cpu(block->bb_level);
+	if (pag && pag->pagf_init) {
+		if (level >= pag->pagf_refcount_level)
+			return false;
+	} else if (level >= mp->m_refc_maxlevels)
+		return false;
+
+	return xfs_btree_sblock_verify(bp, mp->m_refc_mxr[level != 0]);
+}
+
+STATIC void
+xfs_refcountbt_read_verify(
+	struct xfs_buf	*bp)
+{
+	if (!xfs_btree_sblock_verify_crc(bp))
+		xfs_buf_ioerror(bp, -EFSBADCRC);
+	else if (!xfs_refcountbt_verify(bp))
+		xfs_buf_ioerror(bp, -EFSCORRUPTED);
+
+	if (bp->b_error) {
+		trace_xfs_btree_corrupt(bp, _RET_IP_);
+		xfs_verifier_error(bp);
+	}
+}
+
+STATIC void
+xfs_refcountbt_write_verify(
+	struct xfs_buf	*bp)
+{
+	if (!xfs_refcountbt_verify(bp)) {
+		trace_xfs_btree_corrupt(bp, _RET_IP_);
+		xfs_buf_ioerror(bp, -EFSCORRUPTED);
+		xfs_verifier_error(bp);
+		return;
+	}
+	xfs_btree_sblock_calc_crc(bp);
+
+}
+
+const struct xfs_buf_ops xfs_refcountbt_buf_ops = {
+	.name			= "xfs_refcountbt",
+	.verify_read		= xfs_refcountbt_read_verify,
+	.verify_write		= xfs_refcountbt_write_verify,
+};
+
+static const struct xfs_btree_ops xfs_refcountbt_ops = {
+	.rec_len		= sizeof(struct xfs_refcount_rec),
+	.key_len		= sizeof(struct xfs_refcount_key),
+
+	.dup_cursor		= xfs_refcountbt_dup_cursor,
+	.buf_ops		= &xfs_refcountbt_buf_ops,
+};
+
+/*
+ * Allocate a new refcount btree cursor.
+ */
+struct xfs_btree_cur *
+xfs_refcountbt_init_cursor(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	xfs_agnumber_t		agno,
+	struct xfs_defer_ops	*dfops)
+{
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
+	struct xfs_btree_cur	*cur;
+
+	ASSERT(agno != NULLAGNUMBER);
+	ASSERT(agno < mp->m_sb.sb_agcount);
+	cur = kmem_zone_zalloc(xfs_btree_cur_zone, KM_NOFS);
+
+	cur->bc_tp = tp;
+	cur->bc_mp = mp;
+	cur->bc_btnum = XFS_BTNUM_REFC;
+	cur->bc_blocklog = mp->m_sb.sb_blocklog;
+	cur->bc_ops = &xfs_refcountbt_ops;
+
+	cur->bc_nlevels = be32_to_cpu(agf->agf_refcount_level);
+
+	cur->bc_private.a.agbp = agbp;
+	cur->bc_private.a.agno = agno;
+	cur->bc_private.a.dfops = dfops;
+	cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
+
+	cur->bc_private.a.priv.refc.nr_ops = 0;
+	cur->bc_private.a.priv.refc.shape_changes = 0;
+
+	return cur;
+}
+
+/*
+ * Calculate the number of records in a refcount btree block.
+ */
+int
+xfs_refcountbt_maxrecs(
+	struct xfs_mount	*mp,
+	int			blocklen,
+	bool			leaf)
+{
+	blocklen -= XFS_REFCOUNT_BLOCK_LEN;
+
+	if (leaf)
+		return blocklen / sizeof(struct xfs_refcount_rec);
+	return blocklen / (sizeof(struct xfs_refcount_key) +
+			   sizeof(xfs_refcount_ptr_t));
+}
+
+/* Compute the maximum height of a refcount btree. */
+void
+xfs_refcountbt_compute_maxlevels(
+	struct xfs_mount		*mp)
+{
+	mp->m_refc_maxlevels = xfs_btree_compute_maxlevels(mp,
+			mp->m_refc_mnr, mp->m_sb.sb_agblocks);
+}
diff --git a/libxfs/xfs_refcount_btree.h b/libxfs/xfs_refcount_btree.h
new file mode 100644
index 0000000..9e9ad7c
--- /dev/null
+++ b/libxfs/xfs_refcount_btree.h
@@ -0,0 +1,67 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef __XFS_REFCOUNT_BTREE_H__
+#define	__XFS_REFCOUNT_BTREE_H__
+
+/*
+ * Reference Count Btree on-disk structures
+ */
+
+struct xfs_buf;
+struct xfs_btree_cur;
+struct xfs_mount;
+
+/*
+ * Btree block header size
+ */
+#define XFS_REFCOUNT_BLOCK_LEN	XFS_BTREE_SBLOCK_CRC_LEN
+
+/*
+ * Record, key, and pointer address macros for btree blocks.
+ *
+ * (note that some of these may appear unused, but they are used in userspace)
+ */
+#define XFS_REFCOUNT_REC_ADDR(block, index) \
+	((struct xfs_refcount_rec *) \
+		((char *)(block) + \
+		 XFS_REFCOUNT_BLOCK_LEN + \
+		 (((index) - 1) * sizeof(struct xfs_refcount_rec))))
+
+#define XFS_REFCOUNT_KEY_ADDR(block, index) \
+	((struct xfs_refcount_key *) \
+		((char *)(block) + \
+		 XFS_REFCOUNT_BLOCK_LEN + \
+		 ((index) - 1) * sizeof(struct xfs_refcount_key)))
+
+#define XFS_REFCOUNT_PTR_ADDR(block, index, maxrecs) \
+	((xfs_refcount_ptr_t *) \
+		((char *)(block) + \
+		 XFS_REFCOUNT_BLOCK_LEN + \
+		 (maxrecs) * sizeof(struct xfs_refcount_key) + \
+		 ((index) - 1) * sizeof(xfs_refcount_ptr_t)))
+
+extern struct xfs_btree_cur *xfs_refcountbt_init_cursor(struct xfs_mount *mp,
+		struct xfs_trans *tp, struct xfs_buf *agbp, xfs_agnumber_t agno,
+		struct xfs_defer_ops *dfops);
+extern int xfs_refcountbt_maxrecs(struct xfs_mount *mp, int blocklen,
+		bool leaf);
+extern void xfs_refcountbt_compute_maxlevels(struct xfs_mount *mp);
+
+#endif	/* __XFS_REFCOUNT_BTREE_H__ */
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 26c29ea..bb54bc2 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -35,6 +35,8 @@
 #include "xfs_alloc_btree.h"
 #include "xfs_ialloc_btree.h"
 #include "xfs_rmap_btree.h"
+#include "xfs_bmap.h"
+#include "xfs_refcount_btree.h"
 
 /*
  * Physical superblock buffer manipulations. Shared with libxfs in userspace.
@@ -737,6 +739,13 @@ xfs_sb_mount_common(
 	mp->m_rmap_mnr[0] = mp->m_rmap_mxr[0] / 2;
 	mp->m_rmap_mnr[1] = mp->m_rmap_mxr[1] / 2;
 
+	mp->m_refc_mxr[0] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize,
+			true);
+	mp->m_refc_mxr[1] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize,
+			false);
+	mp->m_refc_mnr[0] = mp->m_refc_mxr[0] / 2;
+	mp->m_refc_mnr[1] = mp->m_refc_mxr[1] / 2;
+
 	mp->m_bsize = XFS_FSB_TO_BB(mp, 1);
 	mp->m_ialloc_inos = (int)MAX((__uint16_t)XFS_INODES_PER_CHUNK,
 					sbp->sb_inopblock);
diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h
index 0c5b30b..c6f4eb4 100644
--- a/libxfs/xfs_shared.h
+++ b/libxfs/xfs_shared.h
@@ -39,6 +39,7 @@ extern const struct xfs_buf_ops xfs_agf_buf_ops;
 extern const struct xfs_buf_ops xfs_agfl_buf_ops;
 extern const struct xfs_buf_ops xfs_allocbt_buf_ops;
 extern const struct xfs_buf_ops xfs_rmapbt_buf_ops;
+extern const struct xfs_buf_ops xfs_refcountbt_buf_ops;
 extern const struct xfs_buf_ops xfs_attr3_leaf_buf_ops;
 extern const struct xfs_buf_ops xfs_attr3_rmt_buf_ops;
 extern const struct xfs_buf_ops xfs_bmbt_buf_ops;
@@ -122,6 +123,7 @@ int	xfs_log_calc_minimum_size(struct xfs_mount *);
 #define	XFS_INO_REF		2
 #define	XFS_ATTR_BTREE_REF	1
 #define	XFS_DQUOT_REF		1
+#define	XFS_REFC_BTREE_REF	1
 
 /*
  * Flags for xfs_trans_ichgtime().
diff --git a/libxfs/xfs_trans_resv.c b/libxfs/xfs_trans_resv.c
index 2ed80a5..10234bb 100644
--- a/libxfs/xfs_trans_resv.c
+++ b/libxfs/xfs_trans_resv.c
@@ -72,7 +72,7 @@ xfs_calc_buf_res(
  *
  * Keep in mind that max depth is calculated separately for each type of tree.
  */
-static uint
+uint
 xfs_allocfree_log_count(
 	struct xfs_mount *mp,
 	uint		num_ops)
diff --git a/libxfs/xfs_trans_resv.h b/libxfs/xfs_trans_resv.h
index 0eb46ed..36a1511 100644
--- a/libxfs/xfs_trans_resv.h
+++ b/libxfs/xfs_trans_resv.h
@@ -102,5 +102,6 @@ struct xfs_trans_resv {
 #define	XFS_ATTRRM_LOG_COUNT		3
 
 void xfs_trans_resv_calc(struct xfs_mount *mp, struct xfs_trans_resv *resp);
+uint xfs_allocfree_log_count(struct xfs_mount *mp, uint num_ops);
 
 #endif	/* __XFS_TRANS_RESV_H__ */


* [PATCH 077/145] xfs: account for the refcount btree in the alloc/free log reservation
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (75 preceding siblings ...)
  2016-06-17  1:38 ` [PATCH 076/145] xfs: define the on-disk refcount btree format Darrick J. Wong
@ 2016-06-17  1:38 ` Darrick J. Wong
  2016-06-17  1:38 ` [PATCH 078/145] xfs: add refcount btree operations Darrick J. Wong
                   ` (67 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:38 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Christoph Hellwig, xfs

Every time we allocate or free an extent, we might need to split the
refcount btree.  Reserve some blocks in the transaction to handle
this possibility.

(Reproduced by generic/167 over NFS atop XFS)

Signed-off-by: Christoph Hellwig <hch@lst.de>
[darrick.wong@oracle.com: add commit message]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_trans_resv.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
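
A note for readers: the reservation formula in the comment, num trees *
((2 blocks/level * max depth) - 1), is easy to evaluate by hand.  The
sketch below restates xfs_allocfree_log_count() as a standalone
function; struct resv_geom and its field names are hypothetical
stand-ins for the xfs_mount fields and superblock feature checks, and
the level values in the example that follows are made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the mount geometry used by the calculation. */
struct resv_geom {
	unsigned int	ag_maxlevels;	/* bnobt/cntbt max height */
	unsigned int	rmap_maxlevels;	/* rmapbt max height */
	unsigned int	refc_maxlevels;	/* refcountbt max height */
	bool		has_rmapbt;
	bool		has_reflink;
};

/*
 * Blocks to reserve in the log for num_ops extent allocations or frees.
 * Each tree contributes (2 * height - 1) blocks per operation: a full
 * split touches two blocks per level, except that the root is one block.
 */
static unsigned int allocfree_log_count(const struct resv_geom *g,
					unsigned int num_ops)
{
	unsigned int	blocks;

	assert(g != NULL && g->ag_maxlevels > 0);

	/* bnobt + cntbt are always present, hence the factor of two. */
	blocks = num_ops * 2 * (2 * g->ag_maxlevels - 1);
	if (g->has_rmapbt)
		blocks += num_ops * (2 * g->rmap_maxlevels - 1);
	if (g->has_reflink)
		blocks += num_ops * (2 * g->refc_maxlevels - 1);

	return blocks;
}
```

For example, with heights of 5 (bnobt/cntbt), 9 (rmapbt) and 6
(refcountbt) and num_ops = 2, the total is 36 + 34 + 22 = 92 blocks; of
that, the refcount btree accounts for 22.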


diff --git a/libxfs/xfs_trans_resv.c b/libxfs/xfs_trans_resv.c
index 10234bb..5b6bbcd 100644
--- a/libxfs/xfs_trans_resv.c
+++ b/libxfs/xfs_trans_resv.c
@@ -66,7 +66,8 @@ xfs_calc_buf_res(
  * Per-extent log reservation for the btree changes involved in freeing or
  * allocating an extent.  In classic XFS there were two trees that will be
  * modified (bnobt + cntbt).  With rmap enabled, there are three trees
- * (rmapbt).  The number of blocks reserved is based on the formula:
+ * (rmapbt).  With reflink, there are four trees (refcountbt).  The number of
+ * blocks reserved is based on the formula:
  *
  * num trees * ((2 blocks/level * max depth) - 1)
  *
@@ -82,6 +83,8 @@ xfs_allocfree_log_count(
 	blocks = num_ops * 2 * (2 * mp->m_ag_maxlevels - 1);
 	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
 		blocks += num_ops * (2 * mp->m_rmap_maxlevels - 1);
+	if (xfs_sb_version_hasreflink(&mp->m_sb))
+		blocks += num_ops * (2 * mp->m_refc_maxlevels - 1);
 
 	return blocks;
 }


* [PATCH 078/145] xfs: add refcount btree operations
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (76 preceding siblings ...)
  2016-06-17  1:38 ` [PATCH 077/145] xfs: account for the refcount btree in the alloc/free log reservation Darrick J. Wong
@ 2016-06-17  1:38 ` Darrick J. Wong
  2016-06-17  1:39 ` [PATCH 079/145] xfs: create refcount update intent log items Darrick J. Wong
                   ` (66 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:38 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Christoph Hellwig, xfs

Implement the generic btree operations required to manipulate refcount
btree blocks.  The implementation is similar to the bmapbt, though it
will only allocate and free blocks from the AG.

v2: Remove init_rec_from_key since we no longer need it, and add
tracepoints when refcount btree operations fail.

Since the refcount root and level fields are separate from the
existing roots and levels array, they need a separate logging flag.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: fix logging of AGF refcount btree fields]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/libxfs.h            |    1 
 include/xfs_trace.h         |   12 +++
 libxfs/Makefile             |    2 
 libxfs/xfs_alloc.c          |    4 +
 libxfs/xfs_format.h         |    5 +
 libxfs/xfs_refcount.c       |  176 ++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_refcount.h       |   30 +++++++
 libxfs/xfs_refcount_btree.c |  197 +++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 426 insertions(+), 1 deletion(-)
 create mode 100644 libxfs/xfs_refcount.c
 create mode 100644 libxfs/xfs_refcount.h
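
A note for readers: xfs_refcountbt_get_rec() and xfs_refcountbt_update()
convert between the big-endian on-disk record and the in-core record.
The sketch below shows the same round trip on a plain byte buffer so it
can be run on any host.  It is a model under stated assumptions, not the
libxfs code: struct refc_irec, put_be32(), get_be32(), refc_rec_pack()
and refc_rec_unpack() are hypothetical names, and explicit byte shifts
stand in for cpu_to_be32() and be32_to_cpu().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REFC_REC_LEN	12	/* three 32-bit big-endian fields */

/* Hypothetical in-core form of a refcount record. */
struct refc_irec {
	uint32_t	startblock;	/* first AG block of the extent */
	uint32_t	blockcount;	/* length of the extent in blocks */
	uint32_t	refcount;	/* number of owners of the extent */
};

/* Store a 32-bit value most significant byte first. */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Load a 32-bit value stored most significant byte first. */
static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* In-core to on-disk, as the update path does before writing. */
static void refc_rec_pack(uint8_t *disk, const struct refc_irec *irec)
{
	assert(disk != NULL && irec != NULL);
	put_be32(disk, irec->startblock);
	put_be32(disk + 4, irec->blockcount);
	put_be32(disk + 8, irec->refcount);
}

/* On-disk to in-core, as the get_rec path does after a lookup. */
static void refc_rec_unpack(struct refc_irec *irec, const uint8_t *disk)
{
	assert(disk != NULL && irec != NULL);
	irec->startblock = get_be32(disk);
	irec->blockcount = get_be32(disk + 4);
	irec->refcount = get_be32(disk + 8);
}
```

Packing a record and unpacking the resulting 12 bytes returns the
original three values regardless of host byte order, which is the
property the on-disk format relies on.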


diff --git a/include/libxfs.h b/include/libxfs.h
index bec4ee6..cc6a877 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -79,6 +79,7 @@ extern uint32_t crc32c_le(uint32_t crc, unsigned char const *p, size_t len);
 #include "xfs_trans.h"
 #include "xfs_rmap_btree.h"
 #include "xfs_refcount_btree.h"
+#include "xfs_refcount.h"
 
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index 040f2f0..cb5fa89 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -221,6 +221,18 @@
 #define trace_xfs_ag_resv_alloc_extent(...)	((void) 0)
 #define trace_xfs_ag_resv_free_extent(...)	((void) 0)
 
+#define trace_xfs_refcountbt_lookup(...)	((void) 0)
+#define trace_xfs_refcountbt_get(...)		((void) 0)
+#define trace_xfs_refcountbt_update(...)	((void) 0)
+#define trace_xfs_refcountbt_update_error(...)	((void) 0)
+#define trace_xfs_refcountbt_insert(...)	((void) 0)
+#define trace_xfs_refcountbt_insert_error(...)	((void) 0)
+#define trace_xfs_refcountbt_delete(...)	((void) 0)
+#define trace_xfs_refcountbt_delete_error(...)	((void) 0)
+#define trace_xfs_refcountbt_free_block(...)	((void) 0)
+#define trace_xfs_refcountbt_alloc_block(...)	((void) 0)
+#define trace_xfs_refcount_rec_order_error(...)	((void) 0)
+
 /* set c = c to avoid unused var warnings */
 #define trace_xfs_perag_get(a,b,c,d)	((c) = (c))
 #define trace_xfs_perag_get_tag(a,b,c,d) ((c) = (c))
diff --git a/libxfs/Makefile b/libxfs/Makefile
index c31a2e9..4b1ada0 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -37,6 +37,7 @@ HFILES = \
 	xfs_inode_buf.h \
 	xfs_inode_fork.h \
 	xfs_quota_defs.h \
+	xfs_refcount.h \
 	xfs_refcount_btree.h \
 	xfs_rmap_btree.h \
 	xfs_sb.h \
@@ -86,6 +87,7 @@ CFILES = cache.c \
 	xfs_inode_fork.c \
 	xfs_ialloc_btree.c \
 	xfs_log_rlimit.c \
+	xfs_refcount.c \
 	xfs_refcount_btree.c \
 	xfs_rmap.c \
 	xfs_rmap_btree.c \
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 6e6ada8..6554ce7 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -2322,6 +2322,10 @@ xfs_alloc_log_agf(
 		offsetof(xfs_agf_t, agf_longest),
 		offsetof(xfs_agf_t, agf_btreeblks),
 		offsetof(xfs_agf_t, agf_uuid),
+		offsetof(xfs_agf_t, agf_refcount_root),
+		offsetof(xfs_agf_t, agf_refcount_level),
+		/* needed so that we don't log the whole rest of the structure: */
+		offsetof(xfs_agf_t, agf_spare64),
 		sizeof(xfs_agf_t)
 	};
 
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 916d92b..fdeaf53 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -680,7 +680,10 @@ typedef struct xfs_agf {
 #define	XFS_AGF_LONGEST		0x00000400
 #define	XFS_AGF_BTREEBLKS	0x00000800
 #define	XFS_AGF_UUID		0x00001000
-#define	XFS_AGF_NUM_BITS	13
+#define	XFS_AGF_REFCOUNT_ROOT	0x00002000
+#define	XFS_AGF_REFCOUNT_LEVEL	0x00004000
+#define	XFS_AGF_SPARE64		0x00008000
+#define	XFS_AGF_NUM_BITS	16
 #define	XFS_AGF_ALL_BITS	((1 << XFS_AGF_NUM_BITS) - 1)
 
 #define XFS_AGF_FLAGS \
diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
new file mode 100644
index 0000000..0eda933
--- /dev/null
+++ b/libxfs/xfs_refcount.c
@@ -0,0 +1,176 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "libxfs_priv.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bmap.h"
+#include "xfs_refcount_btree.h"
+#include "xfs_alloc.h"
+#include "xfs_trace.h"
+#include "xfs_cksum.h"
+#include "xfs_trans.h"
+#include "xfs_bit.h"
+#include "xfs_refcount.h"
+
+/*
+ * Look up the first record less than or equal to [bno, len] in the btree
+ * given by cur.
+ */
+int
+xfs_refcountbt_lookup_le(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	int			*stat)
+{
+	trace_xfs_refcountbt_lookup(cur->bc_mp, cur->bc_private.a.agno, bno,
+			XFS_LOOKUP_LE);
+	cur->bc_rec.rc.rc_startblock = bno;
+	cur->bc_rec.rc.rc_blockcount = 0;
+	return xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
+}
+
+/*
+ * Look up the first record greater than or equal to [bno, len] in the btree
+ * given by cur.
+ */
+int
+xfs_refcountbt_lookup_ge(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	int			*stat)
+{
+	trace_xfs_refcountbt_lookup(cur->bc_mp, cur->bc_private.a.agno, bno,
+			XFS_LOOKUP_GE);
+	cur->bc_rec.rc.rc_startblock = bno;
+	cur->bc_rec.rc.rc_blockcount = 0;
+	return xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat);
+}
+
+/*
+ * Get the data from the pointed-to record.
+ */
+int
+xfs_refcountbt_get_rec(
+	struct xfs_btree_cur		*cur,
+	struct xfs_refcount_irec	*irec,
+	int				*stat)
+{
+	union xfs_btree_rec	*rec;
+	int			error;
+
+	error = xfs_btree_get_rec(cur, &rec, stat);
+	if (!error && *stat == 1) {
+		irec->rc_startblock = be32_to_cpu(rec->refc.rc_startblock);
+		irec->rc_blockcount = be32_to_cpu(rec->refc.rc_blockcount);
+		irec->rc_refcount = be32_to_cpu(rec->refc.rc_refcount);
+		trace_xfs_refcountbt_get(cur->bc_mp, cur->bc_private.a.agno,
+				irec);
+	}
+	return error;
+}
+
+/*
+ * Update the record referred to by cur to the value given
+ * by [bno, len, refcount].
+ * This either works (return 0) or gets an EFSCORRUPTED error.
+ */
+STATIC int
+xfs_refcountbt_update(
+	struct xfs_btree_cur		*cur,
+	struct xfs_refcount_irec	*irec)
+{
+	union xfs_btree_rec	rec;
+	int			error;
+
+	trace_xfs_refcountbt_update(cur->bc_mp, cur->bc_private.a.agno, irec);
+	rec.refc.rc_startblock = cpu_to_be32(irec->rc_startblock);
+	rec.refc.rc_blockcount = cpu_to_be32(irec->rc_blockcount);
+	rec.refc.rc_refcount = cpu_to_be32(irec->rc_refcount);
+	error = xfs_btree_update(cur, &rec);
+	if (error)
+		trace_xfs_refcountbt_update_error(cur->bc_mp,
+				cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+/*
+ * Insert the record given by [bno, len, refcount] into the btree at the
+ * position referred to by cur.
+ * This either works (return 0) or gets an EFSCORRUPTED error.
+ */
+STATIC int
+xfs_refcountbt_insert(
+	struct xfs_btree_cur		*cur,
+	struct xfs_refcount_irec	*irec,
+	int				*i)
+{
+	int				error;
+
+	trace_xfs_refcountbt_insert(cur->bc_mp, cur->bc_private.a.agno, irec);
+	cur->bc_rec.rc.rc_startblock = irec->rc_startblock;
+	cur->bc_rec.rc.rc_blockcount = irec->rc_blockcount;
+	cur->bc_rec.rc.rc_refcount = irec->rc_refcount;
+	error = xfs_btree_insert(cur, i);
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, *i == 1, out_error);
+out_error:
+	if (error)
+		trace_xfs_refcountbt_insert_error(cur->bc_mp,
+				cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+/*
+ * Remove the record referred to by cur, then set the pointer to the spot
+ * where the record could be re-inserted, in case we want to increment or
+ * decrement the cursor.
+ * This either works (return 0) or gets an EFSCORRUPTED error.
+ */
+STATIC int
+xfs_refcountbt_delete(
+	struct xfs_btree_cur	*cur,
+	int			*i)
+{
+	struct xfs_refcount_irec	irec;
+	int			found_rec;
+	int			error;
+
+	error = xfs_refcountbt_get_rec(cur, &irec, &found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+	trace_xfs_refcountbt_delete(cur->bc_mp, cur->bc_private.a.agno, &irec);
+	error = xfs_btree_delete(cur, i);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, *i == 1, out_error);
+	error = xfs_refcountbt_lookup_ge(cur, irec.rc_startblock, &found_rec);
+out_error:
+	if (error)
+		trace_xfs_refcountbt_delete_error(cur->bc_mp,
+				cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
diff --git a/libxfs/xfs_refcount.h b/libxfs/xfs_refcount.h
new file mode 100644
index 0000000..8ea65c6
--- /dev/null
+++ b/libxfs/xfs_refcount.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef __XFS_REFCOUNT_H__
+#define __XFS_REFCOUNT_H__
+
+extern int xfs_refcountbt_lookup_le(struct xfs_btree_cur *cur,
+		xfs_agblock_t bno, int *stat);
+extern int xfs_refcountbt_lookup_ge(struct xfs_btree_cur *cur,
+		xfs_agblock_t bno, int *stat);
+extern int xfs_refcountbt_get_rec(struct xfs_btree_cur *cur,
+		struct xfs_refcount_irec *irec, int *stat);
+
+#endif	/* __XFS_REFCOUNT_H__ */
diff --git a/libxfs/xfs_refcount_btree.c b/libxfs/xfs_refcount_btree.c
index a7b99e4..8c53e71 100644
--- a/libxfs/xfs_refcount_btree.c
+++ b/libxfs/xfs_refcount_btree.c
@@ -43,6 +43,153 @@ xfs_refcountbt_dup_cursor(
 			cur->bc_private.a.dfops);
 }
 
+STATIC void
+xfs_refcountbt_set_root(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr,
+	int			inc)
+{
+	struct xfs_buf		*agbp = cur->bc_private.a.agbp;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
+	xfs_agnumber_t		seqno = be32_to_cpu(agf->agf_seqno);
+	struct xfs_perag	*pag = xfs_perag_get(cur->bc_mp, seqno);
+
+	ASSERT(ptr->s != 0);
+
+	agf->agf_refcount_root = ptr->s;
+	be32_add_cpu(&agf->agf_refcount_level, inc);
+	pag->pagf_refcount_level += inc;
+	xfs_perag_put(pag);
+
+	xfs_alloc_log_agf(cur->bc_tp, agbp,
+			XFS_AGF_REFCOUNT_ROOT | XFS_AGF_REFCOUNT_LEVEL);
+}
+
+STATIC int
+xfs_refcountbt_alloc_block(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*start,
+	union xfs_btree_ptr	*new,
+	int			*stat)
+{
+	struct xfs_alloc_arg	args;		/* block allocation args */
+	int			error;		/* error return value */
+
+	memset(&args, 0, sizeof(args));
+	args.tp = cur->bc_tp;
+	args.mp = cur->bc_mp;
+	args.type = XFS_ALLOCTYPE_NEAR_BNO;
+	args.fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
+			xfs_refc_block(args.mp));
+	args.firstblock = args.fsbno;
+	xfs_rmap_ag_owner(&args.oinfo, XFS_RMAP_OWN_REFC);
+	args.minlen = args.maxlen = args.prod = 1;
+
+	error = xfs_alloc_vextent(&args);
+	if (error)
+		goto out_error;
+	trace_xfs_refcountbt_alloc_block(cur->bc_mp, cur->bc_private.a.agno,
+			args.agbno, 1);
+	if (args.fsbno == NULLFSBLOCK) {
+		XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
+		*stat = 0;
+		return 0;
+	}
+	ASSERT(args.agno == cur->bc_private.a.agno);
+	ASSERT(args.len == 1);
+
+	new->s = cpu_to_be32(args.agbno);
+
+	XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
+	*stat = 1;
+	return 0;
+
+out_error:
+	XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR);
+	return error;
+}
+
+STATIC int
+xfs_refcountbt_free_block(
+	struct xfs_btree_cur	*cur,
+	struct xfs_buf		*bp)
+{
+	struct xfs_mount	*mp = cur->bc_mp;
+	struct xfs_trans	*tp = cur->bc_tp;
+	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
+	struct xfs_owner_info	oinfo;
+
+	trace_xfs_refcountbt_free_block(cur->bc_mp, cur->bc_private.a.agno,
+			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1);
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_REFC);
+	xfs_bmap_add_free(mp, cur->bc_private.a.dfops, fsbno, 1,
+			&oinfo);
+	xfs_trans_binval(tp, bp);
+	return 0;
+}
+
+STATIC int
+xfs_refcountbt_get_minrecs(
+	struct xfs_btree_cur	*cur,
+	int			level)
+{
+	return cur->bc_mp->m_refc_mnr[level != 0];
+}
+
+STATIC int
+xfs_refcountbt_get_maxrecs(
+	struct xfs_btree_cur	*cur,
+	int			level)
+{
+	return cur->bc_mp->m_refc_mxr[level != 0];
+}
+
+STATIC void
+xfs_refcountbt_init_key_from_rec(
+	union xfs_btree_key	*key,
+	union xfs_btree_rec	*rec)
+{
+	ASSERT(rec->refc.rc_startblock != 0);
+
+	key->refc.rc_startblock = rec->refc.rc_startblock;
+}
+
+STATIC void
+xfs_refcountbt_init_rec_from_cur(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_rec	*rec)
+{
+	ASSERT(cur->bc_rec.rc.rc_startblock != 0);
+
+	rec->refc.rc_startblock = cpu_to_be32(cur->bc_rec.rc.rc_startblock);
+	rec->refc.rc_blockcount = cpu_to_be32(cur->bc_rec.rc.rc_blockcount);
+	rec->refc.rc_refcount = cpu_to_be32(cur->bc_rec.rc.rc_refcount);
+}
+
+STATIC void
+xfs_refcountbt_init_ptr_from_cur(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr)
+{
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(cur->bc_private.a.agbp);
+
+	ASSERT(cur->bc_private.a.agno == be32_to_cpu(agf->agf_seqno));
+	ASSERT(agf->agf_refcount_root != 0);
+
+	ptr->s = agf->agf_refcount_root;
+}
+
+STATIC __int64_t
+xfs_refcountbt_key_diff(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_key	*key)
+{
+	struct xfs_refcount_irec	*rec = &cur->bc_rec.rc;
+	struct xfs_refcount_key		*kp = &key->refc;
+
+	return (__int64_t)be32_to_cpu(kp->rc_startblock) - rec->rc_startblock;
+}
+
 STATIC bool
 xfs_refcountbt_verify(
 	struct xfs_buf		*bp)
@@ -105,12 +252,62 @@ const struct xfs_buf_ops xfs_refcountbt_buf_ops = {
 	.verify_write		= xfs_refcountbt_write_verify,
 };
 
+#if defined(DEBUG) || defined(XFS_WARN)
+STATIC int
+xfs_refcountbt_keys_inorder(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_key	*k1,
+	union xfs_btree_key	*k2)
+{
+	return be32_to_cpu(k1->refc.rc_startblock) <
+	       be32_to_cpu(k2->refc.rc_startblock);
+}
+
+STATIC int
+xfs_refcountbt_recs_inorder(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_rec	*r1,
+	union xfs_btree_rec	*r2)
+{
+	struct xfs_refcount_irec	a, b;
+
+	int ret = be32_to_cpu(r1->refc.rc_startblock) +
+		be32_to_cpu(r1->refc.rc_blockcount) <=
+		be32_to_cpu(r2->refc.rc_startblock);
+	if (!ret) {
+		a.rc_startblock = be32_to_cpu(r1->refc.rc_startblock);
+		a.rc_blockcount = be32_to_cpu(r1->refc.rc_blockcount);
+		a.rc_refcount = be32_to_cpu(r1->refc.rc_refcount);
+		b.rc_startblock = be32_to_cpu(r2->refc.rc_startblock);
+		b.rc_blockcount = be32_to_cpu(r2->refc.rc_blockcount);
+		b.rc_refcount = be32_to_cpu(r2->refc.rc_refcount);
+		trace_xfs_refcount_rec_order_error(cur->bc_mp,
+				cur->bc_private.a.agno, &a, &b);
+	}
+
+	return ret;
+}
+#endif	/* DEBUG */
+
 static const struct xfs_btree_ops xfs_refcountbt_ops = {
 	.rec_len		= sizeof(struct xfs_refcount_rec),
 	.key_len		= sizeof(struct xfs_refcount_key),
 
 	.dup_cursor		= xfs_refcountbt_dup_cursor,
+	.set_root		= xfs_refcountbt_set_root,
+	.alloc_block		= xfs_refcountbt_alloc_block,
+	.free_block		= xfs_refcountbt_free_block,
+	.get_minrecs		= xfs_refcountbt_get_minrecs,
+	.get_maxrecs		= xfs_refcountbt_get_maxrecs,
+	.init_key_from_rec	= xfs_refcountbt_init_key_from_rec,
+	.init_rec_from_cur	= xfs_refcountbt_init_rec_from_cur,
+	.init_ptr_from_cur	= xfs_refcountbt_init_ptr_from_cur,
+	.key_diff		= xfs_refcountbt_key_diff,
 	.buf_ops		= &xfs_refcountbt_buf_ops,
+#if defined(DEBUG) || defined(XFS_WARN)
+	.keys_inorder		= xfs_refcountbt_keys_inorder,
+	.recs_inorder		= xfs_refcountbt_recs_inorder,
+#endif
 };
 
 /*

* [PATCH 079/145] xfs: create refcount update intent log items
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (77 preceding siblings ...)
  2016-06-17  1:38 ` [PATCH 078/145] xfs: add refcount btree operations Darrick J. Wong
@ 2016-06-17  1:39 ` Darrick J. Wong
  2016-06-17  1:39 ` [PATCH 080/145] xfs: log refcount intent items Darrick J. Wong
                   ` (65 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:39 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Create refcount update intent/done log items to record redo
information in the log.  Because we need to roll transactions between
updating the bmbt mapping and updating the reverse mapping, we also
have to track the status of the metadata updates that will be recorded
in the post-roll transactions, just in case we crash before committing
the final transaction.  This mechanism enables log recovery to finish
what was already started.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_log_format.h |   54 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 2 deletions(-)
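
As a reading aid only, here is a minimal userspace sketch of how a CUI
carrying several extents could be sized, and how the type code could be
pulled back out of pe_flags.  The structure layouts and the
XFS_REFCOUNT_EXTENT_* values mirror the definitions in the diff below
(with <stdint.h> types standing in for __uint*_t); the helper names
cui_log_format_sizeof() and cui_extent_type() are hypothetical and are
not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the physical extent record from this patch. */
struct xfs_phys_extent {
	uint64_t		pe_startblock;
	uint32_t		pe_len;
	uint32_t		pe_flags;
};

/* Type codes live in the low byte of pe_flags. */
#define XFS_REFCOUNT_EXTENT_INCREASE	1
#define XFS_REFCOUNT_EXTENT_DECREASE	2
#define XFS_REFCOUNT_EXTENT_ALLOC_COW	3
#define XFS_REFCOUNT_EXTENT_FREE_COW	4
#define XFS_REFCOUNT_EXTENT_TYPE_MASK	0xFF

/* Userspace mirror of the CUI log format from this patch. */
struct xfs_cui_log_format {
	uint16_t		cui_type;
	uint16_t		cui_size;
	uint32_t		cui_nextents;
	uint64_t		cui_id;
	struct xfs_phys_extent	cui_extents[1];
};

/*
 * Hypothetical helper: bytes needed for a CUI carrying nr extents.
 * The structure already embeds one extent, so only nr - 1 more are
 * added.  Assumes nr >= 1.
 */
static size_t
cui_log_format_sizeof(unsigned int nr)
{
	return sizeof(struct xfs_cui_log_format) +
	       (nr - 1) * sizeof(struct xfs_phys_extent);
}

/* Hypothetical helper: extract the type code from pe_flags. */
static unsigned int
cui_extent_type(const struct xfs_phys_extent *pe)
{
	return pe->pe_flags & XFS_REFCOUNT_EXTENT_TYPE_MASK;
}
```

Under these assumptions, a CUI with three extents needs the base
structure plus two additional xfs_phys_extent entries.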


diff --git a/libxfs/xfs_log_format.h b/libxfs/xfs_log_format.h
index b9627b7..923b08f 100644
--- a/libxfs/xfs_log_format.h
+++ b/libxfs/xfs_log_format.h
@@ -112,7 +112,9 @@ static inline uint xlog_get_cycle(char *ptr)
 #define XLOG_REG_TYPE_ICREATE		20
 #define XLOG_REG_TYPE_RUI_FORMAT	21
 #define XLOG_REG_TYPE_RUD_FORMAT	22
-#define XLOG_REG_TYPE_MAX		22
+#define XLOG_REG_TYPE_CUI_FORMAT	23
+#define XLOG_REG_TYPE_CUD_FORMAT	24
+#define XLOG_REG_TYPE_MAX		24
 
 /*
  * Flags to log operation header
@@ -231,6 +233,8 @@ typedef struct xfs_trans_header {
 #define	XFS_LI_ICREATE		0x123f
 #define	XFS_LI_RUI		0x1240	/* rmap update intent */
 #define	XFS_LI_RUD		0x1241
+#define	XFS_LI_CUI		0x1242	/* refcount update intent */
+#define	XFS_LI_CUD		0x1243
 
 #define XFS_LI_TYPE_DESC \
 	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
@@ -242,7 +246,9 @@ typedef struct xfs_trans_header {
 	{ XFS_LI_QUOTAOFF,	"XFS_LI_QUOTAOFF" }, \
 	{ XFS_LI_ICREATE,	"XFS_LI_ICREATE" }, \
 	{ XFS_LI_RUI,		"XFS_LI_RUI" }, \
-	{ XFS_LI_RUD,		"XFS_LI_RUD" }
+	{ XFS_LI_RUD,		"XFS_LI_RUD" }, \
+	{ XFS_LI_CUI,		"XFS_LI_CUI" }, \
+	{ XFS_LI_CUD,		"XFS_LI_CUD" }
 
 /*
  * Inode Log Item Format definitions.
@@ -667,6 +673,50 @@ struct xfs_rud_log_format {
 };
 
 /*
+ * CUI/CUD (refcount update) log format definitions
+ */
+struct xfs_phys_extent {
+	__uint64_t		pe_startblock;
+	__uint32_t		pe_len;
+	__uint32_t		pe_flags;
+};
+
+/* refcount pe_flags: upper bits are flags, lower byte is type code */
+#define XFS_REFCOUNT_EXTENT_INCREASE	1
+#define XFS_REFCOUNT_EXTENT_DECREASE	2
+#define XFS_REFCOUNT_EXTENT_ALLOC_COW	3
+#define XFS_REFCOUNT_EXTENT_FREE_COW	4
+#define XFS_REFCOUNT_EXTENT_TYPE_MASK	0xFF
+
+#define XFS_REFCOUNT_EXTENT_FLAGS	(XFS_REFCOUNT_EXTENT_TYPE_MASK)
+
+/*
+ * This is the structure used to lay out a cui log item in the
+ * log.  The cui_extents field is a variable size array whose
+ * size is given by cui_nextents.
+ */
+struct xfs_cui_log_format {
+	__uint16_t		cui_type;	/* cui log item type */
+	__uint16_t		cui_size;	/* size of this item */
+	__uint32_t		cui_nextents;	/* # extents to update */
+	__uint64_t		cui_id;		/* cui identifier */
+	struct xfs_phys_extent	cui_extents[1];	/* array of extents */
+};
+
+/*
+ * This is the structure used to lay out a cud log item in the
+ * log.  The cud_extents array is a variable size array whose
+ * size is given by cud_nextents;
+ */
+struct xfs_cud_log_format {
+	__uint16_t		cud_type;	/* cud log item type */
+	__uint16_t		cud_size;	/* size of this item */
+	__uint32_t		cud_nextents;	/* # of extents updated */
+	__uint64_t		cud_cui_id;	/* id of corresponding cui */
+	struct xfs_phys_extent	cud_extents[1];	/* array of extents */
+};
+
+/*
  * Dquot Log format definitions.
  *
  * The first two fields must be the type and size fitting into

* [PATCH 080/145] xfs: log refcount intent items
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (78 preceding siblings ...)
  2016-06-17  1:39 ` [PATCH 079/145] xfs: create refcount update intent log items Darrick J. Wong
@ 2016-06-17  1:39 ` Darrick J. Wong
  2016-06-17  1:39 ` [PATCH 081/145] xfs: adjust refcount of an extent of blocks in refcount btree Darrick J. Wong
                   ` (64 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:39 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Provide a mechanism for higher levels to create CUI/CUD items and submit
them to the log, plus a stub function to deal with recovered CUI items.
These parts will be connected to the refcountbt in a later patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_refcount.h |   14 ++++++++++++++
 1 file changed, 14 insertions(+)
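
To illustrate how the in-core intent added here might relate to the
on-log extent format from the previous patch, the sketch below converts
one xfs_refcount_intent into one xfs_phys_extent.  This is an assumption
made for illustration: the enum in this patch starts at 0 while the
XFS_REFCOUNT_EXTENT_* log codes start at 1, so an explicit mapping is
shown.  The helper name refcount_intent_to_extent(), the minimal
struct list_head, and the typedefs are stand-ins, not code from this
series.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the kernel types used by the intent structure. */
typedef uint64_t xfs_fsblock_t;
typedef uint32_t xfs_extlen_t;
struct list_head { struct list_head *next, *prev; };

/* In-core intent, as declared in this patch. */
enum xfs_refcount_intent_type {
	XFS_REFCOUNT_INCREASE,
	XFS_REFCOUNT_DECREASE,
	XFS_REFCOUNT_ALLOC_COW,
	XFS_REFCOUNT_FREE_COW,
};

struct xfs_refcount_intent {
	struct list_head		ri_list;
	enum xfs_refcount_intent_type	ri_type;
	xfs_fsblock_t			ri_startblock;
	xfs_extlen_t			ri_blockcount;
};

/* On-log extent and type codes, as declared in the previous patch. */
struct xfs_phys_extent {
	uint64_t	pe_startblock;
	uint32_t	pe_len;
	uint32_t	pe_flags;
};

#define XFS_REFCOUNT_EXTENT_INCREASE	1
#define XFS_REFCOUNT_EXTENT_DECREASE	2
#define XFS_REFCOUNT_EXTENT_ALLOC_COW	3
#define XFS_REFCOUNT_EXTENT_FREE_COW	4

/*
 * Hypothetical helper: fill in a log extent from an in-core intent.
 * Returns 0 on success or -1 if the intent type is not recognized.
 */
static int
refcount_intent_to_extent(
	const struct xfs_refcount_intent	*ri,
	struct xfs_phys_extent			*pe)
{
	uint32_t				type;

	switch (ri->ri_type) {
	case XFS_REFCOUNT_INCREASE:
		type = XFS_REFCOUNT_EXTENT_INCREASE;
		break;
	case XFS_REFCOUNT_DECREASE:
		type = XFS_REFCOUNT_EXTENT_DECREASE;
		break;
	case XFS_REFCOUNT_ALLOC_COW:
		type = XFS_REFCOUNT_EXTENT_ALLOC_COW;
		break;
	case XFS_REFCOUNT_FREE_COW:
		type = XFS_REFCOUNT_EXTENT_FREE_COW;
		break;
	default:
		return -1;
	}

	pe->pe_startblock = ri->ri_startblock;
	pe->pe_len = ri->ri_blockcount;
	pe->pe_flags = type;
	return 0;
}
```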


diff --git a/libxfs/xfs_refcount.h b/libxfs/xfs_refcount.h
index 8ea65c6..0b36c1d 100644
--- a/libxfs/xfs_refcount.h
+++ b/libxfs/xfs_refcount.h
@@ -27,4 +27,18 @@ extern int xfs_refcountbt_lookup_ge(struct xfs_btree_cur *cur,
 extern int xfs_refcountbt_get_rec(struct xfs_btree_cur *cur,
 		struct xfs_refcount_irec *irec, int *stat);
 
+enum xfs_refcount_intent_type {
+	XFS_REFCOUNT_INCREASE,
+	XFS_REFCOUNT_DECREASE,
+	XFS_REFCOUNT_ALLOC_COW,
+	XFS_REFCOUNT_FREE_COW,
+};
+
+struct xfs_refcount_intent {
+	struct list_head			ri_list;
+	enum xfs_refcount_intent_type		ri_type;
+	xfs_fsblock_t				ri_startblock;
+	xfs_extlen_t				ri_blockcount;
+};
+
 #endif	/* __XFS_REFCOUNT_H__ */

* [PATCH 081/145] xfs: adjust refcount of an extent of blocks in refcount btree
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (79 preceding siblings ...)
  2016-06-17  1:39 ` [PATCH 080/145] xfs: log refcount intent items Darrick J. Wong
@ 2016-06-17  1:39 ` Darrick J. Wong
  2016-06-17  1:39 ` [PATCH 082/145] xfs: connect refcount adjust functions to upper layers Darrick J. Wong
                   ` (63 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:39 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Provide functions to adjust the reference counts for an extent of
physical blocks stored in the refcount btree.

v2: Refactor the left/right split code into a single function.  Track
the number of btree shape changes and record updates during a refcount
update so that we can decide if we need to get a fresh transaction to
continue.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_trace.h   |   20 +
 libxfs/xfs_refcount.c |  783 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 803 insertions(+)
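
The split/merge/adjust walk-through in the big comment below can be
cross-checked against a toy reference model.  The sketch keeps one
reference count per block in a plain array, applies the adjustment, and
then re-encodes the array as records, omitting anything with
refcount < 2 and coalescing neighbours with equal counts.  It is not the
btree implementation in this patch: toy_rec, toy_adjust() and
toy_emit_records() are hypothetical names, and the MAXREFCEXTLEN limit
on merged extents is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a refcount record: [start, start + len) at refcount. */
struct toy_rec {
	uint32_t	start;
	uint32_t	len;
	uint32_t	refcount;
};

/* Add adjust (+1 or -1) to every block in [bno, bno + len). */
static void
toy_adjust(
	int		*blocks,
	uint32_t	bno,
	uint32_t	len,
	int		adjust)
{
	uint32_t	b;

	for (b = bno; b < bno + len; b++)
		blocks[b] += adjust;
}

/*
 * Re-encode per-block counts as records.  Runs of equal counts become
 * one record; runs with a count below 2 are left out, just as the
 * refcount btree leaves them implicit.  Returns the number of records
 * written, which never exceeds max.
 */
static size_t
toy_emit_records(
	const int	*blocks,
	uint32_t	nblocks,
	struct toy_rec	*recs,
	size_t		max)
{
	size_t		n = 0;
	uint32_t	b = 0;

	while (b < nblocks) {
		uint32_t	start = b;
		int		rc = blocks[b];

		/* Walk to the end of this run of equal counts. */
		while (b < nblocks && blocks[b] == rc)
			b++;
		if (rc < 2)
			continue;
		if (n == max)
			break;
		recs[n].start = start;
		recs[n].len = b - start;
		recs[n].refcount = (uint32_t)rc;
		n++;
	}
	return n;
}
```

For example, starting from per-block counts {2,2,1,1,3,3,2,1,5,5} and
incrementing blocks 2 through 7, the implied refcount == 1 gap at
blocks 2-3 rises to 2 and merges with the untouched extent to its left,
which is the left-merge case described in the comment.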


diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index cb5fa89..0b167e9 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -232,6 +232,26 @@
 #define trace_xfs_refcountbt_free_block(...)	((void) 0)
 #define trace_xfs_refcountbt_alloc_block(...)	((void) 0)
 #define trace_xfs_refcount_rec_order_error(...)	((void) 0)
+#define trace_xfs_refcount_split_extent(...)	((void) 0)
+#define trace_xfs_refcount_split_extent_error(...)	((void) 0)
+#define trace_xfs_refcount_merge_center_extents_error(...)	((void) 0)
+#define trace_xfs_refcount_merge_left_extent_error(...)	((void) 0)
+#define trace_xfs_refcount_merge_right_extent_error(...)	((void) 0)
+#define trace_xfs_refcount_find_left_extent(...)	((void) 0)
+#define trace_xfs_refcount_find_left_extent_error(...)	((void) 0)
+#define trace_xfs_refcount_find_right_extent(...)	((void) 0)
+#define trace_xfs_refcount_find_right_extent_error(...)	((void) 0)
+#define trace_xfs_refcount_merge_center_extents(...)	((void) 0)
+#define trace_xfs_refcount_merge_left_extent(...)	((void) 0)
+#define trace_xfs_refcount_merge_right_extent(...)	((void) 0)
+#define trace_xfs_refcount_modify_extent(...)	((void) 0)
+#define trace_xfs_refcount_modify_extent_error(...)	((void) 0)
+#define trace_xfs_refcount_adjust_error(...)	((void) 0)
+#define trace_xfs_refcount_increase(...)	((void) 0)
+#define trace_xfs_refcount_decrease(...)	((void) 0)
+#define trace_xfs_refcount_deferred(...)	((void) 0)
+#define trace_xfs_refcount_defer(...)		((void) 0)
+#define trace_xfs_refcount_finish_one_leftover(...)	((void) 0)
 
 /* set c = c to avoid unused var warnings */
 #define trace_xfs_perag_get(a,b,c,d)	((c) = (c))
diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index 0eda933..cd68c1e 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -36,6 +36,12 @@
 #include "xfs_bit.h"
 #include "xfs_refcount.h"
 
+/* Allowable refcount adjustment amounts. */
+enum xfs_refc_adjust_op {
+	XFS_REFCOUNT_ADJUST_INCREASE	= 1,
+	XFS_REFCOUNT_ADJUST_DECREASE	= -1,
+};
+
 /*
  * Look up the first record less than or equal to [bno, len] in the btree
  * given by cur.
@@ -174,3 +180,780 @@ out_error:
 				cur->bc_private.a.agno, error, _RET_IP_);
 	return error;
 }
+
+/*
+ * Adjusting the Reference Count
+ *
+ * As stated elsewhere, the reference count btree (refcbt) stores
+ * >1 reference counts for extents of physical blocks.  In this
+ * operation, we're either raising or lowering the reference count of
+ * some subrange stored in the tree:
+ *
+ *      <------ adjustment range ------>
+ * ----+   +---+-----+ +--+--------+---------
+ *  2  |   | 3 |  2  | |17|   55   |   10
+ * ----+   +---+-----+ +--+--------+---------
+ * X axis is physical block number;
+ * reference counts are the numbers inside the rectangles
+ *
+ * The first thing we need to do is to ensure that there are no
+ * refcount extents crossing either boundary of the range to be
+ * adjusted.  For any extent that does cross a boundary, split it into
+ * two extents so that we can increment the refcount of one of the
+ * pieces later:
+ *
+ *      <------ adjustment range ------>
+ * ----+   +---+-----+ +--+--------+----+----
+ *  2  |   | 3 |  2  | |17|   55   | 10 | 10
+ * ----+   +---+-----+ +--+--------+----+----
+ *
+ * For this next step, let's assume that all the physical blocks in
+ * the adjustment range are mapped to a file and are therefore in use
+ * at least once.  Therefore, we can infer that any gap in the
+ * refcount tree within the adjustment range represents a physical
+ * extent with refcount == 1:
+ *
+ *      <------ adjustment range ------>
+ * ----+---+---+-----+-+--+--------+----+----
+ *  2  |"1"| 3 |  2  |1|17|   55   | 10 | 10
+ * ----+---+---+-----+-+--+--------+----+----
+ *      ^
+ *
+ * For each extent that falls within the interval range, figure out
+ * which extent is to the left or the right of that extent.  Now we
+ * have a left, current, and right extent.  If the new reference count
+ * of the center extent enables us to merge left, center, and right
+ * into one record covering all three, do so.  If the center extent is
+ * at the left end of the range, abuts the left extent, and its new
+ * reference count matches the left extent's record, then merge them.
+ * If the center extent is at the right end of the range, abuts the
+ * right extent, and the reference counts match, merge those.  In the
+ * example, we can left merge (assuming an increment operation):
+ *
+ *      <------ adjustment range ------>
+ * --------+---+-----+-+--+--------+----+----
+ *    2    | 3 |  2  |1|17|   55   | 10 | 10
+ * --------+---+-----+-+--+--------+----+----
+ *          ^
+ *
+ * For all other extents within the range, adjust the reference count
+ * or delete it if the refcount falls below 2.  If we were
+ * incrementing, the end result looks like this:
+ *
+ *      <------ adjustment range ------>
+ * --------+---+-----+-+--+--------+----+----
+ *    2    | 4 |  3  |2|18|   56   | 11 | 10
+ * --------+---+-----+-+--+--------+----+----
+ *
+ * The result of a decrement operation looks as such:
+ *
+ *      <------ adjustment range ------>
+ * ----+   +---+       +--+--------+----+----
+ *  2  |   | 2 |       |16|   54   |  9 | 10
+ * ----+   +---+       +--+--------+----+----
+ *      DDDD    111111DD
+ *
+ * The blocks marked "D" are freed; the blocks marked "1" are only
+ * referenced once and therefore the record is removed from the
+ * refcount btree.
+ */
+
+#define RCNEXT(rc)	((rc).rc_startblock + (rc).rc_blockcount)
+/*
+ * Split a refcount extent that crosses agbno.
+ */
+STATIC int
+xfs_refcount_split_extent(
+	struct xfs_btree_cur		*cur,
+	xfs_agblock_t			agbno,
+	bool				*shape_changed)
+{
+	struct xfs_refcount_irec	rcext, tmp;
+	int				found_rec;
+	int				error;
+
+	*shape_changed = false;
+	error = xfs_refcountbt_lookup_le(cur, agbno, &found_rec);
+	if (error)
+		goto out_error;
+	if (!found_rec)
+		return 0;
+
+	error = xfs_refcountbt_get_rec(cur, &rcext, &found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+	if (rcext.rc_startblock == agbno || RCNEXT(rcext) <= agbno)
+		return 0;
+
+	*shape_changed = true;
+	trace_xfs_refcount_split_extent(cur->bc_mp, cur->bc_private.a.agno,
+			&rcext, agbno);
+
+	/* Establish the right extent. */
+	tmp = rcext;
+	tmp.rc_startblock = agbno;
+	tmp.rc_blockcount -= (agbno - rcext.rc_startblock);
+	error = xfs_refcountbt_update(cur, &tmp);
+	if (error)
+		goto out_error;
+
+	/* Insert the left extent. */
+	tmp = rcext;
+	tmp.rc_blockcount = agbno - rcext.rc_startblock;
+	error = xfs_refcountbt_insert(cur, &tmp, &found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+	return error;
+
+out_error:
+	trace_xfs_refcount_split_extent_error(cur->bc_mp,
+			cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+/*
+ * Merge the left, center, and right extents.
+ */
+STATIC int
+xfs_refcount_merge_center_extent(
+	struct xfs_btree_cur		*cur,
+	struct xfs_refcount_irec	*left,
+	struct xfs_refcount_irec	*center,
+	unsigned long long		extlen,
+	xfs_agblock_t			*agbno,
+	xfs_extlen_t			*aglen)
+{
+	int				error;
+	int				found_rec;
+
+	error = xfs_refcountbt_lookup_ge(cur, center->rc_startblock,
+			&found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+	error = xfs_refcountbt_delete(cur, &found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+	if (center->rc_refcount > 1) {
+		error = xfs_refcountbt_delete(cur, &found_rec);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+				out_error);
+	}
+
+	error = xfs_refcountbt_lookup_le(cur, left->rc_startblock,
+			&found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+	left->rc_blockcount = extlen;
+	error = xfs_refcountbt_update(cur, left);
+	if (error)
+		goto out_error;
+
+	*aglen = 0;
+	return error;
+
+out_error:
+	trace_xfs_refcount_merge_center_extents_error(cur->bc_mp,
+			cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+/*
+ * Merge with the left extent.
+ */
+STATIC int
+xfs_refcount_merge_left_extent(
+	struct xfs_btree_cur		*cur,
+	struct xfs_refcount_irec	*left,
+	struct xfs_refcount_irec	*cleft,
+	xfs_agblock_t			*agbno,
+	xfs_extlen_t			*aglen)
+{
+	int				error;
+	int				found_rec;
+
+	if (cleft->rc_refcount > 1) {
+		error = xfs_refcountbt_lookup_le(cur, cleft->rc_startblock,
+				&found_rec);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+				out_error);
+
+		error = xfs_refcountbt_delete(cur, &found_rec);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+				out_error);
+	}
+
+	error = xfs_refcountbt_lookup_le(cur, left->rc_startblock,
+			&found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+	left->rc_blockcount += cleft->rc_blockcount;
+	error = xfs_refcountbt_update(cur, left);
+	if (error)
+		goto out_error;
+
+	*agbno += cleft->rc_blockcount;
+	*aglen -= cleft->rc_blockcount;
+	return error;
+
+out_error:
+	trace_xfs_refcount_merge_left_extent_error(cur->bc_mp,
+			cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+/*
+ * Merge with the right extent.
+ */
+STATIC int
+xfs_refcount_merge_right_extent(
+	struct xfs_btree_cur		*cur,
+	struct xfs_refcount_irec	*right,
+	struct xfs_refcount_irec	*cright,
+	xfs_agblock_t			*agbno,
+	xfs_extlen_t			*aglen)
+{
+	int				error;
+	int				found_rec;
+
+	if (cright->rc_refcount > 1) {
+		error = xfs_refcountbt_lookup_le(cur, cright->rc_startblock,
+			&found_rec);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+				out_error);
+
+		error = xfs_refcountbt_delete(cur, &found_rec);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+				out_error);
+	}
+
+	error = xfs_refcountbt_lookup_le(cur, right->rc_startblock,
+			&found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+	right->rc_startblock -= cright->rc_blockcount;
+	right->rc_blockcount += cright->rc_blockcount;
+	error = xfs_refcountbt_update(cur, right);
+	if (error)
+		goto out_error;
+
+	*aglen -= cright->rc_blockcount;
+	return error;
+
+out_error:
+	trace_xfs_refcount_merge_right_extent_error(cur->bc_mp,
+			cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+/*
+ * Find the left extent and the one after it (cleft).  This function assumes
+ * that we've already split any extent crossing agbno.
+ */
+STATIC int
+xfs_refcount_find_left_extents(
+	struct xfs_btree_cur		*cur,
+	struct xfs_refcount_irec	*left,
+	struct xfs_refcount_irec	*cleft,
+	xfs_agblock_t			agbno,
+	xfs_extlen_t			aglen)
+{
+	struct xfs_refcount_irec	tmp;
+	int				error;
+	int				found_rec;
+
+	left->rc_blockcount = cleft->rc_blockcount = 0;
+	error = xfs_refcountbt_lookup_le(cur, agbno - 1, &found_rec);
+	if (error)
+		goto out_error;
+	if (!found_rec)
+		return 0;
+
+	error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+	if (RCNEXT(tmp) != agbno)
+		return 0;
+	/* We have a left extent; retrieve (or invent) the next right one */
+	*left = tmp;
+
+	error = xfs_btree_increment(cur, 0, &found_rec);
+	if (error)
+		goto out_error;
+	if (found_rec) {
+		error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+				out_error);
+
+		/* if tmp starts at the start of our range, just use that */
+		if (tmp.rc_startblock == agbno)
+			*cleft = tmp;
+		else {
+			/*
+			 * There's a gap in the refcntbt at the start of the
+			 * range we're interested in (refcount == 1) so
+			 * create the implied extent and pass it back.
+			 */
+			cleft->rc_startblock = agbno;
+			cleft->rc_blockcount = min(aglen,
+					tmp.rc_startblock - agbno);
+			cleft->rc_refcount = 1;
+		}
+	} else {
+		/*
+		 * No extents, so pretend that there's one covering the whole
+		 * range.
+		 */
+		cleft->rc_startblock = agbno;
+		cleft->rc_blockcount = aglen;
+		cleft->rc_refcount = 1;
+	}
+	trace_xfs_refcount_find_left_extent(cur->bc_mp, cur->bc_private.a.agno,
+			left, cleft, agbno);
+	return error;
+
+out_error:
+	trace_xfs_refcount_find_left_extent_error(cur->bc_mp,
+			cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+/*
+ * Find the right extent and the one before it (cright).  This function
+ * assumes that we've already split any extents crossing agbno + aglen.
+ */
+STATIC int
+xfs_refcount_find_right_extents(
+	struct xfs_btree_cur		*cur,
+	struct xfs_refcount_irec	*right,
+	struct xfs_refcount_irec	*cright,
+	xfs_agblock_t			agbno,
+	xfs_extlen_t			aglen)
+{
+	struct xfs_refcount_irec	tmp;
+	int				error;
+	int				found_rec;
+
+	right->rc_blockcount = cright->rc_blockcount = 0;
+	error = xfs_refcountbt_lookup_ge(cur, agbno + aglen, &found_rec);
+	if (error)
+		goto out_error;
+	if (!found_rec)
+		return 0;
+
+	error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+	if (tmp.rc_startblock != agbno + aglen)
+		return 0;
+	/* We have a right extent; retrieve (or invent) the next left one */
+	*right = tmp;
+
+	error = xfs_btree_decrement(cur, 0, &found_rec);
+	if (error)
+		goto out_error;
+	if (found_rec) {
+		error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+				out_error);
+
+		/* if tmp ends at the end of our range, just use that */
+		if (RCNEXT(tmp) == agbno + aglen)
+			*cright = tmp;
+		else {
+			/*
+			 * There's a gap in the refcntbt at the end of the
+			 * range we're interested in (refcount == 1) so
+			 * create the implied extent and pass it back.
+			 */
+			cright->rc_startblock = max(agbno, RCNEXT(tmp));
+			cright->rc_blockcount = right->rc_startblock -
+					cright->rc_startblock;
+			cright->rc_refcount = 1;
+		}
+	} else {
+		/*
+		 * No extents, so pretend that there's one covering the whole
+		 * range.
+		 */
+		cright->rc_startblock = agbno;
+		cright->rc_blockcount = aglen;
+		cright->rc_refcount = 1;
+	}
+	trace_xfs_refcount_find_right_extent(cur->bc_mp, cur->bc_private.a.agno,
+			cright, right, agbno + aglen);
+	return error;
+
+out_error:
+	trace_xfs_refcount_find_right_extent_error(cur->bc_mp,
+			cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+#undef RCNEXT
+
+/*
+ * Try to merge with any extents on the boundaries of the adjustment range.
+ */
+STATIC int
+xfs_refcount_merge_extents(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		*agbno,
+	xfs_extlen_t		*aglen,
+	enum xfs_refc_adjust_op adjust,
+	bool			*shape_changed)
+{
+	struct xfs_refcount_irec	left = {0}, cleft = {0};
+	struct xfs_refcount_irec	cright = {0}, right = {0};
+	int				error;
+	unsigned long long		ulen;
+	bool				cequal;
+
+	*shape_changed = false;
+	/*
+	 * Find the extent just below agbno [left], just above agbno [cleft],
+	 * just below (agbno + aglen) [cright], and just above (agbno + aglen)
+	 * [right].
+	 */
+	error = xfs_refcount_find_left_extents(cur, &left, &cleft, *agbno,
+			*aglen);
+	if (error)
+		return error;
+	error = xfs_refcount_find_right_extents(cur, &right, &cright, *agbno,
+			*aglen);
+	if (error)
+		return error;
+
+	/* No left or right extent to merge; exit. */
+	if (left.rc_blockcount == 0 && right.rc_blockcount == 0)
+		return 0;
+
+	*shape_changed = true;
+	cequal = (cleft.rc_startblock == cright.rc_startblock) &&
+		 (cleft.rc_blockcount == cright.rc_blockcount);
+
+	/* Try to merge left, cleft, and right.  cleft must == cright. */
+	ulen = (unsigned long long)left.rc_blockcount + cleft.rc_blockcount +
+			right.rc_blockcount;
+	if (left.rc_blockcount != 0 && right.rc_blockcount != 0 &&
+	    cleft.rc_blockcount != 0 && cright.rc_blockcount != 0 &&
+	    cequal &&
+	    left.rc_refcount == cleft.rc_refcount + adjust &&
+	    right.rc_refcount == cleft.rc_refcount + adjust &&
+	    ulen < MAXREFCEXTLEN) {
+		trace_xfs_refcount_merge_center_extents(cur->bc_mp,
+				cur->bc_private.a.agno, &left, &cleft, &right);
+		return xfs_refcount_merge_center_extent(cur, &left, &cleft,
+				ulen, agbno, aglen);
+	}
+
+	/* Try to merge left and cleft. */
+	ulen = (unsigned long long)left.rc_blockcount + cleft.rc_blockcount;
+	if (left.rc_blockcount != 0 && cleft.rc_blockcount != 0 &&
+	    left.rc_refcount == cleft.rc_refcount + adjust &&
+	    ulen < MAXREFCEXTLEN) {
+		trace_xfs_refcount_merge_left_extent(cur->bc_mp,
+				cur->bc_private.a.agno, &left, &cleft);
+		error = xfs_refcount_merge_left_extent(cur, &left, &cleft,
+				agbno, aglen);
+		if (error)
+			return error;
+
+		/*
+		 * If we just merged left + cleft and cleft == cright,
+		 * we no longer have a cright to merge with right.  We're done.
+		 */
+		if (cequal)
+			return 0;
+	}
+
+	/* Try to merge cright and right. */
+	ulen = (unsigned long long)right.rc_blockcount + cright.rc_blockcount;
+	if (right.rc_blockcount != 0 && cright.rc_blockcount != 0 &&
+	    right.rc_refcount == cright.rc_refcount + adjust &&
+	    ulen < MAXREFCEXTLEN) {
+		trace_xfs_refcount_merge_right_extent(cur->bc_mp,
+				cur->bc_private.a.agno, &cright, &right);
+		return xfs_refcount_merge_right_extent(cur, &right, &cright,
+				agbno, aglen);
+	}
+
+	return error;
+}
+
+/*
+ * While we're adjusting the refcount records of an extent, we have
+ * to keep an eye on the number of extents we're dirtying -- run too
+ * many in a single transaction and we'll exceed the transaction's
+ * reservation and crash the fs.  Each record adds 12 bytes to the
+ * log (plus any key updates) so we'll conservatively assume 32 bytes
+ * per record.  We must also leave space for btree splits on both ends
+ * of the range and space for the CUD and a new CUI.
+ *
+ * XXX: This is a pretty hand-wavy estimate.  The penalty for guessing
+ * true incorrectly is a shutdown FS; the penalty for guessing false
+ * incorrectly is more transaction rolls than might be necessary.
+ * Be conservative here.
+ */
+static bool
+xfs_refcount_still_have_space(
+	struct xfs_btree_cur		*cur)
+{
+	unsigned long			overhead;
+
+	overhead = cur->bc_private.a.priv.refc.shape_changes *
+			xfs_allocfree_log_count(cur->bc_mp, 1);
+	overhead *= cur->bc_mp->m_sb.sb_blocksize;
+
+	/*
+	 * Only allow 2 refcount extent updates per transaction if the
+	 * refcount continue update "error" has been injected.
+	 */
+	if (cur->bc_private.a.priv.refc.nr_ops > 2 &&
+	    XFS_TEST_ERROR(false, cur->bc_mp,
+			XFS_ERRTAG_REFCOUNT_CONTINUE_UPDATE,
+			XFS_RANDOM_REFCOUNT_CONTINUE_UPDATE))
+		return false;
+
+	if (cur->bc_private.a.priv.refc.nr_ops == 0)
+		return true;
+	else if (overhead > cur->bc_tp->t_log_res)
+		return false;
+	return  cur->bc_tp->t_log_res - overhead >
+		cur->bc_private.a.priv.refc.nr_ops * 32;
+}
+
+/*
+ * Adjust the refcounts of middle extents.  At this point we should have
+ * split extents that crossed the adjustment range; merged with adjacent
+ * extents; and updated agbno/aglen to reflect the merges.  Therefore,
+ * all we have to do is update the extents inside [agbno, agbno + aglen].
+ */
+STATIC int
+xfs_refcount_adjust_extents(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		aglen,
+	xfs_extlen_t		*adjusted,
+	enum xfs_refc_adjust_op	adj,
+	struct xfs_defer_ops	*dfops,
+	struct xfs_owner_info	*oinfo)
+{
+	struct xfs_refcount_irec	ext, tmp;
+	int				error;
+	int				found_rec, found_tmp;
+	xfs_fsblock_t			fsbno;
+
+	/* Merging did all the work already. */
+	if (aglen == 0)
+		return 0;
+
+	error = xfs_refcountbt_lookup_ge(cur, agbno, &found_rec);
+	if (error)
+		goto out_error;
+
+	while (aglen > 0 && xfs_refcount_still_have_space(cur)) {
+		error = xfs_refcountbt_get_rec(cur, &ext, &found_rec);
+		if (error)
+			goto out_error;
+		if (!found_rec) {
+			ext.rc_startblock = cur->bc_mp->m_sb.sb_agblocks;
+			ext.rc_blockcount = 0;
+			ext.rc_refcount = 0;
+		}
+
+		/*
+		 * Deal with a hole in the refcount tree; if a file maps to
+		 * these blocks and there's no refcountbt record, pretend that
+		 * there is one with refcount == 1.
+		 */
+		if (ext.rc_startblock != agbno) {
+			tmp.rc_startblock = agbno;
+			tmp.rc_blockcount = min(aglen,
+					ext.rc_startblock - agbno);
+			tmp.rc_refcount = 1 + adj;
+			trace_xfs_refcount_modify_extent(cur->bc_mp,
+					cur->bc_private.a.agno, &tmp);
+
+			/*
+			 * Either cover the hole (increment) or
+			 * delete the range (decrement).
+			 */
+			if (tmp.rc_refcount) {
+				error = xfs_refcountbt_insert(cur, &tmp,
+						&found_tmp);
+				if (error)
+					goto out_error;
+				XFS_WANT_CORRUPTED_GOTO(cur->bc_mp,
+						found_tmp == 1, out_error);
+				cur->bc_private.a.priv.refc.nr_ops++;
+			} else {
+				fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
+						cur->bc_private.a.agno,
+						tmp.rc_startblock);
+				xfs_bmap_add_free(cur->bc_mp, dfops, fsbno,
+						tmp.rc_blockcount, oinfo);
+			}
+
+			(*adjusted) += tmp.rc_blockcount;
+			agbno += tmp.rc_blockcount;
+			aglen -= tmp.rc_blockcount;
+
+			error = xfs_refcountbt_lookup_ge(cur, agbno,
+					&found_rec);
+			if (error)
+				goto out_error;
+		}
+
+		/* Stop if there's nothing left to modify */
+		if (aglen == 0 || !xfs_refcount_still_have_space(cur))
+			break;
+
+		/*
+		 * Adjust the reference count and either update the tree
+		 * (incr) or free the blocks (decr).
+		 */
+		if (ext.rc_refcount == MAXREFCOUNT)
+			goto skip;
+		ext.rc_refcount += adj;
+		trace_xfs_refcount_modify_extent(cur->bc_mp,
+				cur->bc_private.a.agno, &ext);
+		if (ext.rc_refcount > 1) {
+			error = xfs_refcountbt_update(cur, &ext);
+			if (error)
+				goto out_error;
+			cur->bc_private.a.priv.refc.nr_ops++;
+		} else if (ext.rc_refcount == 1) {
+			error = xfs_refcountbt_delete(cur, &found_rec);
+			if (error)
+				goto out_error;
+			XFS_WANT_CORRUPTED_GOTO(cur->bc_mp,
+					found_rec == 1, out_error);
+			cur->bc_private.a.priv.refc.nr_ops++;
+			goto advloop;
+		} else {
+			fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
+					cur->bc_private.a.agno,
+					ext.rc_startblock);
+			xfs_bmap_add_free(cur->bc_mp, dfops, fsbno,
+					ext.rc_blockcount, oinfo);
+		}
+
+skip:
+		error = xfs_btree_increment(cur, 0, &found_rec);
+		if (error)
+			goto out_error;
+
+advloop:
+		(*adjusted) += ext.rc_blockcount;
+		agbno += ext.rc_blockcount;
+		aglen -= ext.rc_blockcount;
+	}
+
+	return error;
+out_error:
+	trace_xfs_refcount_modify_extent_error(cur->bc_mp,
+			cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+/* Adjust the reference count of a range of AG blocks. */
+STATIC int
+xfs_refcount_adjust(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		aglen,
+	xfs_extlen_t		*adjusted,
+	enum xfs_refc_adjust_op	adj,
+	struct xfs_defer_ops	*dfops,
+	struct xfs_owner_info	*oinfo)
+{
+	xfs_extlen_t		orig_aglen;
+	bool			shape_changed;
+	int			shape_changes = 0;
+	int			error;
+
+	*adjusted = 0;
+	switch (adj) {
+	case XFS_REFCOUNT_ADJUST_INCREASE:
+		trace_xfs_refcount_increase(cur->bc_mp, cur->bc_private.a.agno,
+				agbno, aglen);
+		break;
+	case XFS_REFCOUNT_ADJUST_DECREASE:
+		trace_xfs_refcount_decrease(cur->bc_mp, cur->bc_private.a.agno,
+				agbno, aglen);
+		break;
+	default:
+		ASSERT(0);
+	}
+
+	/*
+	 * Ensure that no rcextents cross the boundary of the adjustment range.
+	 */
+	error = xfs_refcount_split_extent(cur, agbno, &shape_changed);
+	if (error)
+		goto out_error;
+	if (shape_changed)
+		shape_changes++;
+
+	error = xfs_refcount_split_extent(cur, agbno + aglen, &shape_changed);
+	if (error)
+		goto out_error;
+	if (shape_changed)
+		shape_changes++;
+
+	/*
+	 * Try to merge with the left or right extents of the range.
+	 */
+	orig_aglen = aglen;
+	error = xfs_refcount_merge_extents(cur, &agbno, &aglen, adj,
+			&shape_changed);
+	if (error)
+		goto out_error;
+	if (shape_changed)
+		shape_changes++;
+	(*adjusted) += orig_aglen - aglen;
+	if (shape_changes)
+		cur->bc_private.a.priv.refc.shape_changes++;
+
+	/* Now that we've taken care of the ends, adjust the middle extents */
+	error = xfs_refcount_adjust_extents(cur, agbno, aglen, adjusted, adj,
+			dfops, oinfo);
+	if (error)
+		goto out_error;
+
+	return 0;
+
+out_error:
+	trace_xfs_refcount_adjust_error(cur->bc_mp, cur->bc_private.a.agno,
+			error, _RET_IP_);
+	return error;
+}
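
The reservation check in xfs_refcount_still_have_space() above can be
modelled outside the kernel.  The following is only a sketch: struct
refc_budget and its fields are hypothetical stand-ins for the cursor and
transaction fields, and blocks_per_shape_change stands in for the value
of xfs_allocfree_log_count(mp, 1).  The error-injection branch is left
out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the cursor/transaction state used above. */
struct refc_budget {
	unsigned long	log_res;	/* transaction log reservation, bytes */
	unsigned long	shape_changes;	/* splits and merges seen so far */
	unsigned long	nr_ops;		/* record updates logged so far */
	unsigned long	blocks_per_shape_change; /* assumed per-change block count */
	unsigned long	blocksize;	/* filesystem block size, bytes */
};

/* Return true if another record update is assumed to fit in the log. */
static bool
refc_still_have_space(const struct refc_budget *b)
{
	unsigned long	overhead;

	/* Each shape change may dirty whole btree blocks. */
	overhead = b->shape_changes * b->blocks_per_shape_change *
			b->blocksize;

	/* The first update is always allowed. */
	if (b->nr_ops == 0)
		return true;
	if (overhead > b->log_res)
		return false;

	/* Charge 32 bytes per record update, as the function above does. */
	return b->log_res - overhead > b->nr_ops * 32;
}
```

With a 100000-byte reservation, one shape change costing 8 blocks of
4096 bytes leaves 67232 bytes, which covers 10 updates but not 3000.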

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* [PATCH 082/145] xfs: connect refcount adjust functions to upper layers
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (80 preceding siblings ...)
  2016-06-17  1:39 ` [PATCH 081/145] xfs: adjust refcount of an extent of blocks in refcount btree Darrick J. Wong
@ 2016-06-17  1:39 ` Darrick J. Wong
  2016-06-17  1:39 ` [PATCH 083/145] xfs: adjust refcount when unmapping file blocks Darrick J. Wong
                   ` (62 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:39 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Plumb in the upper level interface to schedule and finish deferred
refcount operations via the deferred ops mechanism.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/defer_item.c   |  123 +++++++++++++++++++++++++++++++++++
 libxfs/xfs_defer.h    |    1 
 libxfs/xfs_refcount.c |  171 +++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_refcount.h |   12 +++
 4 files changed, 307 insertions(+)
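
The requeue behaviour of xfs_refcount_update_finish_item() in this patch
can be sketched in isolation.  This is a minimal model under stated
assumptions: struct refc_intent, finish_some() and the per-call budget
are hypothetical and only imitate a worker that may stop early when it
runs out of reservation.

```c
#include <errno.h>

/* Hypothetical, reduced form of a refcount intent. */
struct refc_intent {
	unsigned long long	startblock;
	unsigned int		blockcount;
};

/* Hypothetical worker: adjusts at most 'budget' blocks per call. */
static int
finish_some(const struct refc_intent *ri, unsigned int budget,
	    unsigned int *adjusted)
{
	*adjusted = ri->blockcount < budget ? ri->blockcount : budget;
	return 0;
}

/*
 * Process one intent.  If only part of the range was adjusted, trim the
 * intent to the unfinished tail and ask the caller to retry it later.
 */
static int
finish_item(struct refc_intent *ri, unsigned int budget)
{
	unsigned int	adjusted;
	int		error;

	error = finish_some(ri, budget, &adjusted);
	if (!error && adjusted < ri->blockcount) {
		ri->startblock += adjusted;
		ri->blockcount -= adjusted;
		return -EAGAIN;
	}
	return error;
}
```

An intent covering 10 blocks with a budget of 4 blocks per call returns
-EAGAIN twice, advancing startblock by 4 each time, then returns 0.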


diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c
index d2e4ad0..a383813 100644
--- a/libxfs/defer_item.c
+++ b/libxfs/defer_item.c
@@ -31,6 +31,7 @@
 #include "xfs_bmap.h"
 #include "xfs_alloc.h"
 #include "xfs_rmap_btree.h"
+#include "xfs_refcount.h"
 
 /* Extent Freeing */
 
@@ -242,12 +243,134 @@ const struct xfs_defer_op_type xfs_rmap_update_defer_type = {
 	.cancel_item	= xfs_rmap_update_cancel_item,
 };
 
+/* Reference Counting */
+
+/* Sort refcount intents by AG. */
+static int
+xfs_refcount_update_diff_items(
+	void				*priv,
+	struct list_head		*a,
+	struct list_head		*b)
+{
+	struct xfs_mount		*mp = priv;
+	struct xfs_refcount_intent	*ra;
+	struct xfs_refcount_intent	*rb;
+
+	ra = container_of(a, struct xfs_refcount_intent, ri_list);
+	rb = container_of(b, struct xfs_refcount_intent, ri_list);
+	return  XFS_FSB_TO_AGNO(mp, ra->ri_startblock) -
+		XFS_FSB_TO_AGNO(mp, rb->ri_startblock);
+}
+
+/* Get a CUI. */
+STATIC void *
+xfs_refcount_update_create_intent(
+	struct xfs_trans		*tp,
+	unsigned int			count)
+{
+	return NULL;
+}
+
+/* Log refcount updates in the intent item. */
+STATIC void
+xfs_refcount_update_log_item(
+	struct xfs_trans		*tp,
+	void				*intent,
+	struct list_head		*item)
+{
+}
+
+/* Get a CUD so we can process all the deferred refcount updates. */
+STATIC void *
+xfs_refcount_update_create_done(
+	struct xfs_trans		*tp,
+	void				*intent,
+	unsigned int			count)
+{
+	return NULL;
+}
+
+/* Process a deferred refcount update. */
+STATIC int
+xfs_refcount_update_finish_item(
+	struct xfs_trans		*tp,
+	struct xfs_defer_ops		*dop,
+	struct list_head		*item,
+	void				*done_item,
+	void				**state)
+{
+	struct xfs_refcount_intent	*refc;
+	xfs_extlen_t			adjusted;
+	int				error;
+
+	refc = container_of(item, struct xfs_refcount_intent, ri_list);
+	error = xfs_refcount_finish_one(tp, dop,
+			refc->ri_type,
+			refc->ri_startblock,
+			refc->ri_blockcount,
+			&adjusted,
+			(struct xfs_btree_cur **)state);
+	/* Did we run out of reservation?  Requeue what we didn't finish. */
+	if (!error && adjusted < refc->ri_blockcount) {
+		ASSERT(refc->ri_type == XFS_REFCOUNT_INCREASE ||
+		       refc->ri_type == XFS_REFCOUNT_DECREASE);
+		refc->ri_startblock += adjusted;
+		refc->ri_blockcount -= adjusted;
+		return -EAGAIN;
+	}
+	kmem_free(refc);
+	return error;
+}
+
+/* Clean up after processing deferred refcounts. */
+STATIC void
+xfs_refcount_update_finish_cleanup(
+	struct xfs_trans	*tp,
+	void			*state,
+	int			error)
+{
+	struct xfs_btree_cur	*rcur = state;
+
+	xfs_refcount_finish_one_cleanup(tp, rcur, error);
+}
+
+/* Abort all pending CUIs. */
+STATIC void
+xfs_refcount_update_abort_intent(
+	void				*intent)
+{
+}
+
+/* Cancel a deferred refcount update. */
+STATIC void
+xfs_refcount_update_cancel_item(
+	struct list_head		*item)
+{
+	struct xfs_refcount_intent	*refc;
+
+	refc = container_of(item, struct xfs_refcount_intent, ri_list);
+	kmem_free(refc);
+}
+
+const struct xfs_defer_op_type xfs_refcount_update_defer_type = {
+	.type		= XFS_DEFER_OPS_TYPE_REFCOUNT,
+	.diff_items	= xfs_refcount_update_diff_items,
+	.create_intent	= xfs_refcount_update_create_intent,
+	.abort_intent	= xfs_refcount_update_abort_intent,
+	.log_item	= xfs_refcount_update_log_item,
+	.create_done	= xfs_refcount_update_create_done,
+	.finish_item	= xfs_refcount_update_finish_item,
+	.finish_cleanup = xfs_refcount_update_finish_cleanup,
+	.cancel_item	= xfs_refcount_update_cancel_item,
+};
+
 /* Deferred Item Initialization */
 
 /* Initialize the deferred operation types. */
 void
 xfs_defer_init_types(void)
 {
+	xfs_defer_init_op_type(&xfs_refcount_update_defer_type);
 	xfs_defer_init_op_type(&xfs_rmap_update_defer_type);
 	xfs_defer_init_op_type(&xfs_extent_free_defer_type);
 }
diff --git a/libxfs/xfs_defer.h b/libxfs/xfs_defer.h
index 920642e..4081b00 100644
--- a/libxfs/xfs_defer.h
+++ b/libxfs/xfs_defer.h
@@ -51,6 +51,7 @@ struct xfs_defer_pending {
  * find all the space it needs.
  */
 enum xfs_defer_ops_type {
+	XFS_DEFER_OPS_TYPE_REFCOUNT,
 	XFS_DEFER_OPS_TYPE_RMAP,
 	XFS_DEFER_OPS_TYPE_FREE,
 	XFS_DEFER_OPS_TYPE_MAX,
diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index cd68c1e..55859cd 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -957,3 +957,174 @@ out_error:
 			error, _RET_IP_);
 	return error;
 }
+
+/* Clean up after calling xfs_refcount_finish_one. */
+void
+xfs_refcount_finish_one_cleanup(
+	struct xfs_trans	*tp,
+	struct xfs_btree_cur	*rcur,
+	int			error)
+{
+	struct xfs_buf		*agbp;
+
+	if (rcur == NULL)
+		return;
+	agbp = rcur->bc_private.a.agbp;
+	xfs_btree_del_cursor(rcur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
+	xfs_trans_brelse(tp, agbp);
+}
+
+/*
+ * Process one of the deferred refcount operations.  We pass back the
+ * btree cursor to maintain our lock on the btree between calls.
+ * This saves time and eliminates a buffer deadlock between the
+ * superblock and the AGF because we'll always grab them in the same
+ * order.
+ */
+int
+xfs_refcount_finish_one(
+	struct xfs_trans		*tp,
+	struct xfs_defer_ops		*dfops,
+	enum xfs_refcount_intent_type	type,
+	xfs_fsblock_t			startblock,
+	xfs_extlen_t			blockcount,
+	xfs_extlen_t			*adjusted,
+	struct xfs_btree_cur		**pcur)
+{
+	struct xfs_mount		*mp = tp->t_mountp;
+	struct xfs_btree_cur		*rcur;
+	struct xfs_buf			*agbp = NULL;
+	int				error = 0;
+	xfs_agnumber_t			agno;
+	xfs_agblock_t			bno;
+	unsigned long			nr_ops = 0;
+	int				shape_changes = 0;
+
+	agno = XFS_FSB_TO_AGNO(mp, startblock);
+	ASSERT(agno != NULLAGNUMBER);
+	bno = XFS_FSB_TO_AGBNO(mp, startblock);
+
+	trace_xfs_refcount_deferred(mp, XFS_FSB_TO_AGNO(mp, startblock),
+			type, XFS_FSB_TO_AGBNO(mp, startblock),
+			blockcount);
+
+	if (XFS_TEST_ERROR(false, mp,
+			XFS_ERRTAG_REFCOUNT_FINISH_ONE,
+			XFS_RANDOM_REFCOUNT_FINISH_ONE))
+		return -EIO;
+
+	/*
+	 * If we haven't gotten a cursor or the cursor AG doesn't match
+	 * the startblock, get one now.
+	 */
+	rcur = *pcur;
+	if (rcur != NULL && rcur->bc_private.a.agno != agno) {
+		nr_ops = rcur->bc_private.a.priv.refc.nr_ops;
+		shape_changes = rcur->bc_private.a.priv.refc.shape_changes;
+		xfs_refcount_finish_one_cleanup(tp, rcur, 0);
+		rcur = NULL;
+		*pcur = NULL;
+	}
+	if (rcur == NULL) {
+		error = xfs_alloc_read_agf(tp->t_mountp, tp, agno,
+				XFS_ALLOC_FLAG_FREEING, &agbp);
+		if (error)
+			return error;
+		if (!agbp)
+			return -EFSCORRUPTED;
+
+		rcur = xfs_refcountbt_init_cursor(mp, tp, agbp, agno, dfops);
+		if (!rcur) {
+			error = -ENOMEM;
+			goto out_cur;
+		}
+		rcur->bc_private.a.priv.refc.nr_ops = nr_ops;
+		rcur->bc_private.a.priv.refc.shape_changes = shape_changes;
+	}
+	*pcur = rcur;
+
+	switch (type) {
+	case XFS_REFCOUNT_INCREASE:
+		error = xfs_refcount_adjust(rcur, bno, blockcount, adjusted,
+			XFS_REFCOUNT_ADJUST_INCREASE, dfops, NULL);
+		break;
+	case XFS_REFCOUNT_DECREASE:
+		error = xfs_refcount_adjust(rcur, bno, blockcount, adjusted,
+			XFS_REFCOUNT_ADJUST_DECREASE, dfops, NULL);
+		break;
+	default:
+		ASSERT(0);
+		error = -EFSCORRUPTED;
+	}
+	if (!error && *adjusted != blockcount)
+		trace_xfs_refcount_finish_one_leftover(mp, agno, type,
+				bno, blockcount, *adjusted);
+	return error;
+
+out_cur:
+	xfs_trans_brelse(tp, agbp);
+
+	return error;
+}
+
+/*
+ * Record a refcount intent for later processing.
+ */
+static int
+__xfs_refcount_add(
+	struct xfs_mount		*mp,
+	struct xfs_defer_ops		*dfops,
+	struct xfs_refcount_intent	*ri)
+{
+	struct xfs_refcount_intent	*new;
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return 0;
+
+	trace_xfs_refcount_defer(mp, XFS_FSB_TO_AGNO(mp, ri->ri_startblock),
+			ri->ri_type, XFS_FSB_TO_AGBNO(mp, ri->ri_startblock),
+			ri->ri_blockcount);
+
+	new = kmem_zalloc(sizeof(struct xfs_refcount_intent),
+			KM_SLEEP | KM_NOFS);
+	*new = *ri;
+
+	xfs_defer_add(dfops, XFS_DEFER_OPS_TYPE_REFCOUNT, &new->ri_list);
+	return 0;
+}
+
+/*
+ * Increase the reference count of the blocks backing a file's extent.
+ */
+int
+xfs_refcount_increase_extent(
+	struct xfs_mount		*mp,
+	struct xfs_defer_ops		*dfops,
+	struct xfs_bmbt_irec		*PREV)
+{
+	struct xfs_refcount_intent	ri;
+
+	ri.ri_type = XFS_REFCOUNT_INCREASE;
+	ri.ri_startblock = PREV->br_startblock;
+	ri.ri_blockcount = PREV->br_blockcount;
+
+	return __xfs_refcount_add(mp, dfops, &ri);
+}
+
+/*
+ * Decrease the reference count of the blocks backing a file's extent.
+ */
+int
+xfs_refcount_decrease_extent(
+	struct xfs_mount		*mp,
+	struct xfs_defer_ops		*dfops,
+	struct xfs_bmbt_irec		*PREV)
+{
+	struct xfs_refcount_intent	ri;
+
+	ri.ri_type = XFS_REFCOUNT_DECREASE;
+	ri.ri_startblock = PREV->br_startblock;
+	ri.ri_blockcount = PREV->br_blockcount;
+
+	return __xfs_refcount_add(mp, dfops, &ri);
+}
diff --git a/libxfs/xfs_refcount.h b/libxfs/xfs_refcount.h
index 0b36c1d..92c05ea 100644
--- a/libxfs/xfs_refcount.h
+++ b/libxfs/xfs_refcount.h
@@ -41,4 +41,16 @@ struct xfs_refcount_intent {
 	xfs_extlen_t				ri_blockcount;
 };
 
+extern int xfs_refcount_increase_extent(struct xfs_mount *mp,
+		struct xfs_defer_ops *dfops, struct xfs_bmbt_irec *irec);
+extern int xfs_refcount_decrease_extent(struct xfs_mount *mp,
+		struct xfs_defer_ops *dfops, struct xfs_bmbt_irec *irec);
+
+extern void xfs_refcount_finish_one_cleanup(struct xfs_trans *tp,
+		struct xfs_btree_cur *rcur, int error);
+extern int xfs_refcount_finish_one(struct xfs_trans *tp,
+		struct xfs_defer_ops *dfops, enum xfs_refcount_intent_type type,
+		xfs_fsblock_t startblock, xfs_extlen_t blockcount,
+		xfs_extlen_t *adjusted, struct xfs_btree_cur **pcur);
+
 #endif	/* __XFS_REFCOUNT_H__ */


* [PATCH 083/145] xfs: adjust refcount when unmapping file blocks
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (81 preceding siblings ...)
  2016-06-17  1:39 ` [PATCH 082/145] xfs: connect refcount adjust functions to upper layers Darrick J. Wong
@ 2016-06-17  1:39 ` Darrick J. Wong
  2016-06-17  1:39 ` [PATCH 084/145] xfs: refcount btree requires more reserved space Darrick J. Wong
                   ` (61 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:39 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

When we're unmapping blocks from a reflinked file, decrease the
refcount of the affected blocks and free the extents that are no
longer in use.

v2: Use deferred ops system to avoid deadlocks and running out of
transaction reservation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_bmap.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)
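
The effect of the xfs_bmap_del_extent() change can be illustrated with a
toy model.  It is a sketch only: the per-block refcount array and
unmap_blocks() are hypothetical, and the real code defers the work
through refcount intents instead of touching counts directly.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Unmap 'len' blocks starting at 'start'.  For the data fork of a
 * reflinked file, drop one reference per block and free only the blocks
 * whose count reaches zero; otherwise free every block immediately.
 * Returns the number of blocks freed.
 */
static size_t
unmap_blocks(unsigned int *refcount, size_t start, size_t len,
	     bool reflink_data_fork)
{
	size_t	freed = 0;
	size_t	i;

	for (i = start; i < start + len; i++) {
		if (reflink_data_fork && refcount[i] > 1) {
			refcount[i]--;		/* still shared or owned */
			continue;
		}
		refcount[i] = 0;		/* last owner is gone */
		freed++;
	}
	return freed;
}
```

Unmapping four blocks with counts {1, 2, 3, 1} from a reflinked file
frees the first and last block and leaves counts {0, 1, 2, 0}.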


diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 50faacd..7ef1d18 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -40,6 +40,7 @@
 #include "xfs_quota_defs.h"
 #include "xfs_rmap_btree.h"
 #include "xfs_ag_resv.h"
+#include "xfs_refcount.h"
 
 
 kmem_zone_t		*xfs_bmap_free_item_zone;
@@ -5053,9 +5054,16 @@ xfs_bmap_del_extent(
 	/*
 	 * If we need to, add to list of extents to delete.
 	 */
-	if (do_fx)
-		xfs_bmap_add_free(mp, dfops, del->br_startblock,
-				  del->br_blockcount, NULL);
+	if (do_fx) {
+		if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) {
+			error = xfs_refcount_decrease_extent(mp, dfops, del);
+			if (error)
+				goto done;
+		} else
+			xfs_bmap_add_free(mp, dfops, del->br_startblock,
+					  del->br_blockcount, NULL);
+	}
+
 	/*
 	 * Adjust inode # blocks in the file.
 	 */


* [PATCH 084/145] xfs: refcount btree requires more reserved space
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (82 preceding siblings ...)
  2016-06-17  1:39 ` [PATCH 083/145] xfs: adjust refcount when unmapping file blocks Darrick J. Wong
@ 2016-06-17  1:39 ` Darrick J. Wong
  2016-06-17  1:39 ` [PATCH 085/145] xfs: introduce reflink utility functions Darrick J. Wong
                   ` (60 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:39 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

The reference count btree is allocated from the free space, which
means that we have to ensure that an AG can't run out of free space
while performing a refcount operation.  In the pathological case each
AG block has its own refcountbt record, so we have to keep that much
space available.

v2: Calculate the maximum possible size of the rmap and refcount
btrees based on minimally-full btree blocks.  This increases the
per-AG block reservations to handle the worst case btree size.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_alloc.c          |    3 +++
 libxfs/xfs_refcount_btree.c |   23 +++++++++++++++++++++++
 libxfs/xfs_refcount_btree.h |    4 ++++
 3 files changed, 30 insertions(+)
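
The worst-case sizing described above can be worked through with a small
calculation.  This is a sketch that assumes the size is computed by
ceiling division, level by level, with minimally-full blocks; the helper
btree_calc_size() and its parameters are hypothetical and are not the
libxfs xfs_btree_calc_size() signature.

```c
/*
 * Count the blocks needed to index 'nrecs' records when every leaf
 * holds 'leaf_min' records and every node holds 'node_min' pointers.
 */
static unsigned long long
btree_calc_size(unsigned long long nrecs, unsigned int leaf_min,
		unsigned int node_min)
{
	unsigned long long	blocks = 0;
	unsigned int		per_block = leaf_min;

	/* Reject fan-outs that would never converge to a single root. */
	if (leaf_min < 1 || node_min < 2)
		return 0;

	while (nrecs > 0) {
		/* Blocks needed at this level, rounded up. */
		nrecs = (nrecs + per_block - 1) / per_block;
		blocks += nrecs;
		if (nrecs == 1)
			break;		/* reached the root */
		per_block = node_min;
	}
	return blocks;
}
```

For 1000 records with 10 records per leaf and 5 pointers per node, the
levels need 100, 20, 4 and 1 blocks, 125 in total.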


diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 6554ce7..ca3e7ce 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -34,6 +34,7 @@
 #include "xfs_trace.h"
 #include "xfs_trans.h"
 #include "xfs_ag_resv.h"
+#include "xfs_refcount_btree.h"
 
 struct workqueue_struct *xfs_alloc_wq;
 
@@ -134,6 +135,8 @@ xfs_alloc_ag_max_usable(struct xfs_mount *mp)
 		/* rmap root block + full tree split on full AG */
 		blocks += 1 + (2 * mp->m_ag_maxlevels) - 1;
 	}
+	if (xfs_sb_version_hasreflink(&mp->m_sb))
+		blocks += xfs_refcountbt_max_size(mp);
 
 	return mp->m_sb.sb_agblocks - blocks;
 }
diff --git a/libxfs/xfs_refcount_btree.c b/libxfs/xfs_refcount_btree.c
index 8c53e71..8c1cba9 100644
--- a/libxfs/xfs_refcount_btree.c
+++ b/libxfs/xfs_refcount_btree.c
@@ -372,3 +372,26 @@ xfs_refcountbt_compute_maxlevels(
 	mp->m_refc_maxlevels = xfs_btree_compute_maxlevels(mp,
 			mp->m_refc_mnr, mp->m_sb.sb_agblocks);
 }
+
+/* Calculate the refcount btree size for some records. */
+xfs_extlen_t
+xfs_refcountbt_calc_size(
+	struct xfs_mount	*mp,
+	unsigned long long	len)
+{
+	return xfs_btree_calc_size(mp, mp->m_refc_mnr, len);
+}
+
+/*
+ * Calculate the maximum refcount btree size.
+ */
+xfs_extlen_t
+xfs_refcountbt_max_size(
+	struct xfs_mount	*mp)
+{
+	/* Bail out if we're uninitialized, which can happen in mkfs. */
+	if (mp->m_refc_mxr[0] == 0)
+		return 0;
+
+	return xfs_refcountbt_calc_size(mp, mp->m_sb.sb_agblocks);
+}
diff --git a/libxfs/xfs_refcount_btree.h b/libxfs/xfs_refcount_btree.h
index 9e9ad7c..780b02f 100644
--- a/libxfs/xfs_refcount_btree.h
+++ b/libxfs/xfs_refcount_btree.h
@@ -64,4 +64,8 @@ extern int xfs_refcountbt_maxrecs(struct xfs_mount *mp, int blocklen,
 		bool leaf);
 extern void xfs_refcountbt_compute_maxlevels(struct xfs_mount *mp);
 
+extern xfs_extlen_t xfs_refcountbt_calc_size(struct xfs_mount *mp,
+		unsigned long long len);
+extern xfs_extlen_t xfs_refcountbt_max_size(struct xfs_mount *mp);
+
 #endif	/* __XFS_REFCOUNT_BTREE_H__ */


* [PATCH 085/145] xfs: introduce reflink utility functions
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (83 preceding siblings ...)
  2016-06-17  1:39 ` [PATCH 084/145] xfs: refcount btree requires more reserved space Darrick J. Wong
@ 2016-06-17  1:39 ` Darrick J. Wong
  2016-06-17  1:39 ` [PATCH 086/145] xfs: create bmbt update intent log items Darrick J. Wong
                   ` (59 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:39 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

These functions will be used by the other reflink functions to find
the maximum length of a range of shared blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_trace.h   |    3 +
 libxfs/xfs_refcount.c |  109 +++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_refcount.h |    4 ++
 3 files changed, 116 insertions(+)
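
The search done by xfs_refcount_find_shared() in this patch can be
modelled over a sorted array instead of a btree.  This is only a sketch:
struct shared_rec and find_shared() are hypothetical, and the records
are assumed to be sorted by start block and non-overlapping.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record: a run of shared blocks. */
struct shared_rec {
	unsigned int	start;
	unsigned int	len;
};

static unsigned int
min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Find the first shared run inside [agbno, agbno + aglen). */
static void
find_shared(const struct shared_rec *recs, size_t nrecs,
	    unsigned int agbno, unsigned int aglen,
	    unsigned int *fbno, unsigned int *flen, bool find_maximal)
{
	unsigned int		end = agbno + aglen;
	struct shared_rec	tmp;
	size_t			i = 0;

	/* By default, skip the whole range. */
	*fbno = end;
	*flen = 0;

	/* Skip records that end at or before the start of the range. */
	while (i < nrecs && recs[i].start + recs[i].len <= agbno)
		i++;
	if (i == nrecs || recs[i].start >= end)
		return;

	/* Clamp a record that crosses the start of the range. */
	tmp = recs[i];
	if (tmp.start < agbno) {
		tmp.len -= agbno - tmp.start;
		tmp.start = agbno;
	}
	*fbno = tmp.start;
	*flen = min_u(tmp.len, end - *fbno);
	if (!find_maximal)
		return;

	/* Extend across records that are physically contiguous. */
	for (i++; i < nrecs && *fbno + *flen < end; i++) {
		if (recs[i].start >= end || recs[i].start != *fbno + *flen)
			break;
		*flen = min_u(*flen + recs[i].len, end - *fbno);
	}
}
```

With records {10,5}, {15,5} and {30,4}, a query for 20 blocks at block
12 yields (12, 3) without find_maximal and (12, 8) with it.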


diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index 0b167e9..d3b2486 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -252,6 +252,9 @@
 #define trace_xfs_refcount_deferred(...)	((void) 0)
 #define trace_xfs_refcount_defer(...)		((void) 0)
 #define trace_xfs_refcount_finish_one_leftover(...)	((void) 0)
+#define trace_xfs_refcount_find_shared(...)	((void) 0)
+#define trace_xfs_refcount_find_shared_result(...)	((void) 0)
+#define trace_xfs_refcount_find_shared_error(...)	((void) 0)
 
 /* set c = c to avoid unused var warnings */
 #define trace_xfs_perag_get(a,b,c,d)	((c) = (c))
diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index 55859cd..d2b614b 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -1128,3 +1128,112 @@ xfs_refcount_decrease_extent(
 
 	return __xfs_refcount_add(mp, dfops, &ri);
 }
+
+/*
+ * Given an AG extent, find the lowest-numbered run of shared blocks within
+ * that range and return the range in fbno/flen.  If find_maximal is set,
+ * return the longest extent of shared blocks; if not, just return the first
+ * extent we find.  If no shared blocks are found, flen will be set to zero.
+ */
+int
+xfs_refcount_find_shared(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		aglen,
+	xfs_agblock_t		*fbno,
+	xfs_extlen_t		*flen,
+	bool			find_maximal)
+{
+	struct xfs_btree_cur	*cur;
+	struct xfs_buf		*agbp;
+	struct xfs_refcount_irec	tmp;
+	int			error;
+	int			i, have;
+	int			bt_error = XFS_BTREE_ERROR;
+
+	trace_xfs_refcount_find_shared(mp, agno, agbno, aglen);
+
+	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
+	if (error)
+		goto out;
+	cur = xfs_refcountbt_init_cursor(mp, NULL, agbp, agno, NULL);
+
+	/* By default, skip the whole range */
+	*fbno = agbno + aglen;
+	*flen = 0;
+
+	/* Try to find a refcount extent that crosses the start */
+	error = xfs_refcountbt_lookup_le(cur, agbno, &have);
+	if (error)
+		goto out_error;
+	if (!have) {
+		/* No left extent, look at the next one */
+		error = xfs_btree_increment(cur, 0, &have);
+		if (error)
+			goto out_error;
+		if (!have)
+			goto done;
+	}
+	error = xfs_refcountbt_get_rec(cur, &tmp, &i);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+
+	/* If the extent ends before the start, look at the next one */
+	if (tmp.rc_startblock + tmp.rc_blockcount <= agbno) {
+		error = xfs_btree_increment(cur, 0, &have);
+		if (error)
+			goto out_error;
+		if (!have)
+			goto done;
+		error = xfs_refcountbt_get_rec(cur, &tmp, &i);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+	}
+
+	/* If the extent starts after the range we want, bail out */
+	if (tmp.rc_startblock >= agbno + aglen)
+		goto done;
+
+	/* We found the start of a shared extent! */
+	if (tmp.rc_startblock < agbno) {
+		tmp.rc_blockcount -= (agbno - tmp.rc_startblock);
+		tmp.rc_startblock = agbno;
+	}
+
+	*fbno = tmp.rc_startblock;
+	*flen = min(tmp.rc_blockcount, agbno + aglen - *fbno);
+	if (!find_maximal)
+		goto done;
+
+	/* Otherwise, find the end of this shared extent */
+	while (*fbno + *flen < agbno + aglen) {
+		error = xfs_btree_increment(cur, 0, &have);
+		if (error)
+			goto out_error;
+		if (!have)
+			break;
+		error = xfs_refcountbt_get_rec(cur, &tmp, &i);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+		if (tmp.rc_startblock >= agbno + aglen ||
+		    tmp.rc_startblock != *fbno + *flen)
+			break;
+		*flen = min(*flen + tmp.rc_blockcount, agbno + aglen - *fbno);
+	}
+
+done:
+	bt_error = XFS_BTREE_NOERROR;
+	trace_xfs_refcount_find_shared_result(mp, agno, *fbno, *flen);
+
+out_error:
+	xfs_btree_del_cursor(cur, bt_error);
+	xfs_buf_relse(agbp);
+out:
+	if (error)
+		trace_xfs_refcount_find_shared_error(mp, agno, error, _RET_IP_);
+	return error;
+}
diff --git a/libxfs/xfs_refcount.h b/libxfs/xfs_refcount.h
index 92c05ea..b7b83b8 100644
--- a/libxfs/xfs_refcount.h
+++ b/libxfs/xfs_refcount.h
@@ -53,4 +53,8 @@ extern int xfs_refcount_finish_one(struct xfs_trans *tp,
 		xfs_fsblock_t startblock, xfs_extlen_t blockcount,
 		xfs_extlen_t *adjusted, struct xfs_btree_cur **pcur);
 
+extern int xfs_refcount_find_shared(struct xfs_mount *mp, xfs_agnumber_t agno,
+		xfs_agblock_t agbno, xfs_extlen_t aglen, xfs_agblock_t *fbno,
+		xfs_extlen_t *flen, bool find_maximal);
+
 #endif	/* __XFS_REFCOUNT_H__ */


* [PATCH 086/145] xfs: create bmbt update intent log items
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (84 preceding siblings ...)
  2016-06-17  1:39 ` [PATCH 085/145] xfs: introduce reflink utility functions Darrick J. Wong
@ 2016-06-17  1:39 ` Darrick J. Wong
  2016-06-17  1:39 ` [PATCH 087/145] xfs: log bmap intent items Darrick J. Wong
                   ` (58 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:39 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Create bmbt update intent/done log items to record redo information in
the log.  Because we need to roll transactions multiple times for
reflink operations, we also have to track the status of the
metadata updates that will be recorded in the post-roll transactions,
just in case we crash before committing the final transaction.  This
mechanism enables log recovery to finish what was already started.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_log_format.h |   52 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 50 insertions(+), 2 deletions(-)


diff --git a/libxfs/xfs_log_format.h b/libxfs/xfs_log_format.h
index 923b08f..320a305 100644
--- a/libxfs/xfs_log_format.h
+++ b/libxfs/xfs_log_format.h
@@ -114,7 +114,9 @@ static inline uint xlog_get_cycle(char *ptr)
 #define XLOG_REG_TYPE_RUD_FORMAT	22
 #define XLOG_REG_TYPE_CUI_FORMAT	23
 #define XLOG_REG_TYPE_CUD_FORMAT	24
-#define XLOG_REG_TYPE_MAX		24
+#define XLOG_REG_TYPE_BUI_FORMAT	25
+#define XLOG_REG_TYPE_BUD_FORMAT	26
+#define XLOG_REG_TYPE_MAX		26
 
 /*
  * Flags to log operation header
@@ -235,6 +237,8 @@ typedef struct xfs_trans_header {
 #define	XFS_LI_RUD		0x1241
 #define	XFS_LI_CUI		0x1242	/* refcount update intent */
 #define	XFS_LI_CUD		0x1243
+#define	XFS_LI_BUI		0x1244	/* bmbt update intent */
+#define	XFS_LI_BUD		0x1245
 
 #define XFS_LI_TYPE_DESC \
 	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
@@ -248,7 +252,9 @@ typedef struct xfs_trans_header {
 	{ XFS_LI_RUI,		"XFS_LI_RUI" }, \
 	{ XFS_LI_RUD,		"XFS_LI_RUD" }, \
 	{ XFS_LI_CUI,		"XFS_LI_CUI" }, \
-	{ XFS_LI_CUD,		"XFS_LI_CUD" }
+	{ XFS_LI_CUD,		"XFS_LI_CUD" }, \
+	{ XFS_LI_BUI,		"XFS_LI_BUI" }, \
+	{ XFS_LI_BUD,		"XFS_LI_BUD" }
 
 /*
  * Inode Log Item Format definitions.
@@ -717,6 +723,48 @@ struct xfs_cud_log_format {
 };
 
 /*
+ * BUI/BUD (inode block mapping) log format definitions
+ */
+
+/* bmbt me_flags: upper bits are flags, lower byte is type code */
+#define XFS_BMAP_EXTENT_MAP		1
+#define XFS_BMAP_EXTENT_UNMAP		2
+#define XFS_BMAP_EXTENT_TYPE_MASK	0xFF
+
+#define XFS_BMAP_EXTENT_ATTR_FORK	(1U << 31)
+#define XFS_BMAP_EXTENT_UNWRITTEN	(1U << 30)
+
+#define XFS_BMAP_EXTENT_FLAGS		(XFS_BMAP_EXTENT_TYPE_MASK | \
+					 XFS_BMAP_EXTENT_ATTR_FORK | \
+					 XFS_BMAP_EXTENT_UNWRITTEN)
+
+/*
+ * This is the structure used to lay out a bui log item in the
+ * log.  The bui_extents field is a variable size array whose
+ * size is given by bui_nextents.
+ */
+struct xfs_bui_log_format {
+	__uint16_t		bui_type;	/* bui log item type */
+	__uint16_t		bui_size;	/* size of this item */
+	__uint32_t		bui_nextents;	/* # extents to free */
+	__uint64_t		bui_id;		/* bui identifier */
+	struct xfs_map_extent	bui_extents[1];	/* array of extents to bmap */
+};
+
+/*
+ * This is the structure used to lay out a bud log item in the
+ * log.  The bud_extents array is a variable size array whose
+ * size is given by bud_nextents.
+ */
+struct xfs_bud_log_format {
+	__uint16_t		bud_type;	/* bud log item type */
+	__uint16_t		bud_size;	/* size of this item */
+	__uint32_t		bud_nextents;	/* # of extents freed */
+	__uint64_t		bud_bui_id;	/* id of corresponding bui */
+	struct xfs_map_extent	bud_extents[1];	/* array of extents bmapped */
+};
+
+/*
  * Dquot Log format definitions.
  *
  * The first two fields must be the type and size fitting into


* [PATCH 087/145] xfs: log bmap intent items
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (85 preceding siblings ...)
  2016-06-17  1:39 ` [PATCH 086/145] xfs: create bmbt update intent log items Darrick J. Wong
@ 2016-06-17  1:39 ` Darrick J. Wong
  2016-06-17  1:40 ` [PATCH 088/145] xfs: map an inode's offset to an exact physical block Darrick J. Wong
                   ` (57 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:39 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Provide a mechanism for higher levels to create BUI/BUD items and
submit them to the log, along with a stub function to deal with
recovered BUI items.  These parts will be connected to the rmapbt in a
later patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_bmap.h |   12 ++++++++++++
 1 file changed, 12 insertions(+)


diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
index 41f7ef2..62a66d0 100644
--- a/libxfs/xfs_bmap.h
+++ b/libxfs/xfs_bmap.h
@@ -209,5 +209,17 @@ struct xfs_bmbt_rec_host *
 				struct xfs_bmbt_irec *gotp,
 				struct xfs_bmbt_irec *prevp);
 
+enum xfs_bmap_intent_type {
+	XFS_BMAP_MAP,
+	XFS_BMAP_UNMAP,
+};
+
+struct xfs_bmap_intent {
+	struct list_head			bi_list;
+	enum xfs_bmap_intent_type		bi_type;
+	struct xfs_inode			*bi_owner;
+	int					bi_whichfork;
+	struct xfs_bmbt_irec			bi_bmap;
+};
 
 #endif	/* __XFS_BMAP_H__ */


* [PATCH 088/145] xfs: map an inode's offset to an exact physical block
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (86 preceding siblings ...)
  2016-06-17  1:39 ` [PATCH 087/145] xfs: log bmap intent items Darrick J. Wong
@ 2016-06-17  1:40 ` Darrick J. Wong
  2016-06-17  1:40 ` [PATCH 089/145] xfs: implement deferred bmbt map/unmap operations Darrick J. Wong
                   ` (56 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:40 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Teach the bmap routine to know how to map a range of file blocks to a
specific range of physical blocks, instead of simply allocating fresh
blocks.  This enables reflink to map a file to blocks that are already
in use.
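
As a rough sketch of the constraints this patch asserts for
XFS_BMAPI_REMAP (data fork only, never combined with PREALLOC or
CONVERT, and only into a hole), the checks could be gathered into one
predicate.  XFS_BMAPI_REMAP is taken from this patch; the values given
for XFS_BMAPI_PREALLOC, XFS_BMAPI_CONVERT and the fork indices are
assumptions, since they are not shown here, and remap_request_ok() is
a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

#define XFS_BMAPI_PREALLOC	0x008	/* assumed value, not shown in this patch */
#define XFS_BMAPI_CONVERT	0x040	/* assumed value, not shown in this patch */
#define XFS_BMAPI_REMAP		0x100	/* from this patch */

#define SKETCH_DATA_FORK	0	/* assumed fork index */
#define SKETCH_ATTR_FORK	1	/* assumed fork index */

/*
 * Hypothetical helper: return true if a bmapi_write request satisfies
 * the REMAP rules.  Requests without REMAP are not restricted here.
 */
static inline bool
remap_request_ok(int flags, int whichfork, bool inhole)
{
	if (!(flags & XFS_BMAPI_REMAP))
		return true;
	/* REMAP cannot be used on the attr fork. */
	if (whichfork == SKETCH_ATTR_FORK)
		return false;
	/* REMAP cannot be combined with PREALLOC or CONVERT. */
	if (flags & (XFS_BMAPI_PREALLOC | XFS_BMAPI_CONVERT))
		return false;
	/* The target range must be a hole. */
	return inhole;
}
```

The patch itself expresses these rules as ASSERTs inside
xfs_bmapi_write rather than as a separate predicate; the sketch only
restates them in one place.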

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_trace.h |    3 ++
 libxfs/xfs_bmap.c   |   63 +++++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_bmap.h   |   10 +++++++-
 3 files changed, 75 insertions(+), 1 deletion(-)


diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index d3b2486..6277b53 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -256,6 +256,9 @@
 #define trace_xfs_refcount_find_shared_result(...)	((void) 0)
 #define trace_xfs_refcount_find_shared_error(...)	((void) 0)
 
+#define trace_xfs_bmap_remap_alloc(...)		((void) 0)
+#define trace_xfs_bmap_remap_alloc_error(...)	((void) 0)
+
 /* set c = c to avoid unused var warnings */
 #define trace_xfs_perag_get(a,b,c,d)	((c) = (c))
 #define trace_xfs_perag_get_tag(a,b,c,d) ((c) = (c))
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 7ef1d18..58f730e 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -3864,6 +3864,55 @@ xfs_bmap_btalloc(
 }
 
 /*
+ * For a remap operation, just "allocate" an extent at the address that the
+ * caller passed in, and ensure that the AGFL is the right size.  The caller
+ * will then map the "allocated" extent into the file somewhere.
+ */
+STATIC int
+xfs_bmap_remap_alloc(
+	struct xfs_bmalloca	*ap)
+{
+	struct xfs_trans	*tp = ap->tp;
+	struct xfs_mount	*mp = tp->t_mountp;
+	xfs_agblock_t		bno;
+	struct xfs_alloc_arg	args;
+	int			error;
+
+	/*
+	 * validate that the block number is legal - this enables us to detect
+	 * and handle a silent filesystem corruption rather than crashing.
+	 */
+	memset(&args, 0, sizeof(struct xfs_alloc_arg));
+	args.tp = ap->tp;
+	args.mp = ap->tp->t_mountp;
+	bno = *ap->firstblock;
+	args.agno = XFS_FSB_TO_AGNO(mp, bno);
+	ASSERT(args.agno < mp->m_sb.sb_agcount);
+	args.agbno = XFS_FSB_TO_AGBNO(mp, bno);
+	ASSERT(args.agbno < mp->m_sb.sb_agblocks);
+
+	/* "Allocate" the extent from the range we passed in. */
+	trace_xfs_bmap_remap_alloc(ap->ip, *ap->firstblock, ap->length);
+	ap->blkno = bno;
+	ap->ip->i_d.di_nblocks += ap->length;
+	xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
+
+	/* Fix the freelist, like a real allocator does. */
+	args.userdata = 1;
+	args.pag = xfs_perag_get(args.mp, args.agno);
+	ASSERT(args.pag);
+
+	error = xfs_alloc_fix_freelist(&args, XFS_ALLOC_FLAG_FREEING);
+	if (error)
+		goto error0;
+error0:
+	xfs_perag_put(args.pag);
+	if (error)
+		trace_xfs_bmap_remap_alloc_error(ap->ip, error, _RET_IP_);
+	return error;
+}
+
+/*
  * xfs_bmap_alloc is called by xfs_bmapi to allocate an extent for a file.
  * It figures out where to ask the underlying allocator to put the new extent.
  */
@@ -3871,6 +3920,8 @@ STATIC int
 xfs_bmap_alloc(
 	struct xfs_bmalloca	*ap)	/* bmap alloc argument struct */
 {
+	if (ap->flags & XFS_BMAPI_REMAP)
+		return xfs_bmap_remap_alloc(ap);
 	if (XFS_IS_REALTIME_INODE(ap->ip) && ap->userdata)
 		return xfs_bmap_rtalloc(ap);
 	return xfs_bmap_btalloc(ap);
@@ -4507,6 +4558,12 @@ xfs_bmapi_write(
 	ASSERT(len > 0);
 	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL);
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+	if (whichfork == XFS_ATTR_FORK)
+		ASSERT(!(flags & XFS_BMAPI_REMAP));
+	if (flags & XFS_BMAPI_REMAP) {
+		ASSERT(!(flags & XFS_BMAPI_PREALLOC));
+		ASSERT(!(flags & XFS_BMAPI_CONVERT));
+	}
 
 	/* zeroing is for currently only for data extents, not metadata */
 	ASSERT((flags & (XFS_BMAPI_METADATA | XFS_BMAPI_ZERO)) !=
@@ -4568,6 +4625,12 @@ xfs_bmapi_write(
 		wasdelay = !inhole && isnullstartblock(bma.got.br_startblock);
 
 		/*
+		 * Make sure we only reflink into a hole.
+		 */
+		if (flags & XFS_BMAPI_REMAP)
+			ASSERT(inhole);
+
+		/*
 		 * First, deal with the hole before the allocated space
 		 * that we found, if any.
 		 */
diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
index 62a66d0..fb2fd4c 100644
--- a/libxfs/xfs_bmap.h
+++ b/libxfs/xfs_bmap.h
@@ -97,6 +97,13 @@ struct xfs_bmap_free_item
  */
 #define XFS_BMAPI_ZERO		0x080
 
+/*
+ * Map the inode offset to the block given in ap->firstblock.  Primarily
+ * used for reflink.  The range must be in a hole, and this flag cannot be
+ * turned on with PREALLOC or CONVERT, and cannot be used on the attr fork.
+ */
+#define XFS_BMAPI_REMAP		0x100
+
 #define XFS_BMAPI_FLAGS \
 	{ XFS_BMAPI_ENTIRE,	"ENTIRE" }, \
 	{ XFS_BMAPI_METADATA,	"METADATA" }, \
@@ -105,7 +112,8 @@ struct xfs_bmap_free_item
 	{ XFS_BMAPI_IGSTATE,	"IGSTATE" }, \
 	{ XFS_BMAPI_CONTIG,	"CONTIG" }, \
 	{ XFS_BMAPI_CONVERT,	"CONVERT" }, \
-	{ XFS_BMAPI_ZERO,	"ZERO" }
+	{ XFS_BMAPI_ZERO,	"ZERO" }, \
+	{ XFS_BMAPI_REMAP,	"REMAP" }
 
 
 static inline int xfs_bmapi_aflag(int w)


* [PATCH 089/145] xfs: implement deferred bmbt map/unmap operations
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (87 preceding siblings ...)
  2016-06-17  1:40 ` [PATCH 088/145] xfs: map an inode's offset to an exact physical block Darrick J. Wong
@ 2016-06-17  1:40 ` Darrick J. Wong
  2016-06-17  1:40 ` [PATCH 090/145] xfs: return work remaining at the end of a bunmapi operation Darrick J. Wong
                   ` (55 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:40 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Implement deferred versions of the inode block map/unmap functions.
These will be used in subsequent patches to make reflink operations
atomic.
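
The deferred-ops machinery sorts pending intents before processing
them, and for bmap intents the sort key is the owning inode.  Below is
a small standalone sketch of that ordering step.  It is not the
patch's code: it uses qsort() over an array instead of a list sort,
the struct and function names are hypothetical, and it uses a
three-way comparison rather than subtracting inode numbers, on the
assumption that inode numbers are 64 bits wide and a difference would
not fit in an int.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xfs_bmap_intent. */
struct bmap_intent_sketch {
	uint64_t	ino;	/* owning inode number */
	int		type;	/* 0 = map, 1 = unmap (illustrative only) */
};

/* Order intents by inode number without truncating the comparison. */
static int
intent_cmp(const void *a, const void *b)
{
	const struct bmap_intent_sketch	*ba = a;
	const struct bmap_intent_sketch	*bb = b;

	return (ba->ino > bb->ino) - (ba->ino < bb->ino);
}

/* Sort an array of pending intents so each inode's work is adjacent. */
static inline void
sort_intents(struct bmap_intent_sketch *v, size_t n)
{
	qsort(v, n, sizeof(*v), intent_cmp);
}
```

Grouping by inode means all updates for one file are processed
together, which appears to be the intent of the diff_items callback in
this patch.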

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_trace.h |    2 +
 libxfs/defer_item.c |  101 ++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_bmap.c   |  124 +++++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_bmap.h   |   11 +++++
 libxfs/xfs_defer.h  |    1 
 5 files changed, 239 insertions(+)


diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index 6277b53..dfc92a6 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -258,6 +258,8 @@
 
 #define trace_xfs_bmap_remap_alloc(...)		((void) 0)
 #define trace_xfs_bmap_remap_alloc_error(...)	((void) 0)
+#define trace_xfs_bmap_deferred(...)		((void) 0)
+#define trace_xfs_bmap_defer(...)		((void) 0)
 
 /* set c = c to avoid unused var warnings */
 #define trace_xfs_perag_get(a,b,c,d)	((c) = (c))
diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c
index a383813..bd41808 100644
--- a/libxfs/defer_item.c
+++ b/libxfs/defer_item.c
@@ -32,6 +32,8 @@
 #include "xfs_alloc.h"
 #include "xfs_rmap_btree.h"
 #include "xfs_refcount.h"
+#include "xfs_bmap.h"
+#include "xfs_inode.h"
 
 /* Extent Freeing */
 
@@ -364,12 +366,111 @@ const struct xfs_defer_op_type xfs_refcount_update_defer_type = {
 	.cancel_item	= xfs_refcount_update_cancel_item,
 };
 
+/* Inode Block Mapping */
+
+/* Sort bmap intents by inode. */
+static int
+xfs_bmap_update_diff_items(
+	void				*priv,
+	struct list_head		*a,
+	struct list_head		*b)
+{
+	struct xfs_bmap_intent		*ba;
+	struct xfs_bmap_intent		*bb;
+
+	ba = container_of(a, struct xfs_bmap_intent, bi_list);
+	bb = container_of(b, struct xfs_bmap_intent, bi_list);
+	return ba->bi_owner->i_ino - bb->bi_owner->i_ino;
+}
+
+/* Get a BUI. */
+STATIC void *
+xfs_bmap_update_create_intent(
+	struct xfs_trans		*tp,
+	unsigned int			count)
+{
+	return NULL;
+}
+
+/* Log bmap updates in the intent item. */
+STATIC void
+xfs_bmap_update_log_item(
+	struct xfs_trans		*tp,
+	void				*intent,
+	struct list_head		*item)
+{
+}
+
+/* Get a BUD so we can process all the deferred bmap updates. */
+STATIC void *
+xfs_bmap_update_create_done(
+	struct xfs_trans		*tp,
+	void				*intent,
+	unsigned int			count)
+{
+	return NULL;
+}
+
+/* Process a deferred bmap update. */
+STATIC int
+xfs_bmap_update_finish_item(
+	struct xfs_trans		*tp,
+	struct xfs_defer_ops		*dop,
+	struct list_head		*item,
+	void				*done_item,
+	void				**state)
+{
+	struct xfs_bmap_intent		*bmap;
+	int				error;
+
+	bmap = container_of(item, struct xfs_bmap_intent, bi_list);
+	error = xfs_bmap_finish_one(tp, dop,
+			bmap->bi_owner,
+			bmap->bi_type, bmap->bi_whichfork,
+			bmap->bi_bmap.br_startoff,
+			bmap->bi_bmap.br_startblock,
+			bmap->bi_bmap.br_blockcount,
+			bmap->bi_bmap.br_state);
+	kmem_free(bmap);
+	return error;
+}
+
+/* Abort all pending BUIs. */
+STATIC void
+xfs_bmap_update_abort_intent(
+	void				*intent)
+{
+}
+
+/* Cancel a deferred bmap update. */
+STATIC void
+xfs_bmap_update_cancel_item(
+	struct list_head		*item)
+{
+	struct xfs_bmap_intent		*bmap;
+
+	bmap = container_of(item, struct xfs_bmap_intent, bi_list);
+	kmem_free(bmap);
+}
+
+const struct xfs_defer_op_type xfs_bmap_update_defer_type = {
+	.type		= XFS_DEFER_OPS_TYPE_BMAP,
+	.diff_items	= xfs_bmap_update_diff_items,
+	.create_intent	= xfs_bmap_update_create_intent,
+	.abort_intent	= xfs_bmap_update_abort_intent,
+	.log_item	= xfs_bmap_update_log_item,
+	.create_done	= xfs_bmap_update_create_done,
+	.finish_item	= xfs_bmap_update_finish_item,
+	.cancel_item	= xfs_bmap_update_cancel_item,
+};
+
 /* Deferred Item Initialization */
 
 /* Initialize the deferred operation types. */
 void
 xfs_defer_init_types(void)
 {
+	xfs_defer_init_op_type(&xfs_bmap_update_defer_type);
 	xfs_defer_init_op_type(&xfs_refcount_update_defer_type);
 	xfs_defer_init_op_type(&xfs_rmap_update_defer_type);
 	xfs_defer_init_op_type(&xfs_extent_free_defer_type);
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 58f730e..dff4b7b 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -6119,3 +6119,127 @@ out:
 	xfs_trans_cancel(tp);
 	return error;
 }
+
+/* Record a bmap intent. */
+static int
+__xfs_bmap_add(
+	struct xfs_mount	*mp,
+	struct xfs_defer_ops	*dfops,
+	struct xfs_bmap_intent	*bi)
+{
+	int			error;
+	struct xfs_bmap_intent	*new;
+
+	ASSERT(bi->bi_whichfork == XFS_DATA_FORK);
+
+	trace_xfs_bmap_defer(mp, XFS_FSB_TO_AGNO(mp, bi->bi_bmap.br_startblock),
+			bi->bi_type,
+			XFS_FSB_TO_AGBNO(mp, bi->bi_bmap.br_startblock),
+			bi->bi_owner->i_ino, bi->bi_whichfork,
+			bi->bi_bmap.br_startoff,
+			bi->bi_bmap.br_blockcount,
+			bi->bi_bmap.br_state);
+
+	new = kmem_zalloc(sizeof(struct xfs_bmap_intent), KM_SLEEP | KM_NOFS);
+	*new = *bi;
+
+	error = xfs_defer_join(dfops, bi->bi_owner);
+	if (error)
+		return error;
+
+	xfs_defer_add(dfops, XFS_DEFER_OPS_TYPE_BMAP, &new->bi_list);
+	return 0;
+}
+
+/* Map an extent into a file. */
+int
+xfs_bmap_map_extent(
+	struct xfs_mount	*mp,
+	struct xfs_defer_ops	*dfops,
+	struct xfs_inode	*ip,
+	int			whichfork,
+	struct xfs_bmbt_irec	*PREV)
+{
+	struct xfs_bmap_intent	bi;
+
+	bi.bi_type = XFS_BMAP_MAP;
+	bi.bi_owner = ip;
+	bi.bi_whichfork = whichfork;
+	bi.bi_bmap = *PREV;
+
+	return __xfs_bmap_add(mp, dfops, &bi);
+}
+
+/* Unmap an extent out of a file. */
+int
+xfs_bmap_unmap_extent(
+	struct xfs_mount	*mp,
+	struct xfs_defer_ops	*dfops,
+	struct xfs_inode	*ip,
+	int			whichfork,
+	struct xfs_bmbt_irec	*PREV)
+{
+	struct xfs_bmap_intent	bi;
+
+	bi.bi_type = XFS_BMAP_UNMAP;
+	bi.bi_owner = ip;
+	bi.bi_whichfork = whichfork;
+	bi.bi_bmap = *PREV;
+
+	return __xfs_bmap_add(mp, dfops, &bi);
+}
+
+/*
+ * Process one of the deferred bmap operations.  Only mapping an extent
+ * is implemented so far; unmap requests are rejected as corruption.
+ */
+int
+xfs_bmap_finish_one(
+	struct xfs_trans		*tp,
+	struct xfs_defer_ops		*dfops,
+	struct xfs_inode		*ip,
+	enum xfs_bmap_intent_type	type,
+	int				whichfork,
+	xfs_fileoff_t			startoff,
+	xfs_fsblock_t			startblock,
+	xfs_filblks_t			blockcount,
+	xfs_exntst_t			state)
+{
+	struct xfs_bmbt_irec		bmap;
+	int				nimaps = 1;
+	xfs_fsblock_t			firstfsb;
+	int				error = 0;
+
+	bmap.br_startblock = startblock;
+	bmap.br_startoff = startoff;
+	bmap.br_blockcount = blockcount;
+	bmap.br_state = state;
+
+	trace_xfs_bmap_deferred(tp->t_mountp,
+			XFS_FSB_TO_AGNO(tp->t_mountp, startblock), type,
+			XFS_FSB_TO_AGBNO(tp->t_mountp, startblock),
+			ip->i_ino, whichfork, startoff, blockcount, state);
+
+	if (XFS_TEST_ERROR(false, tp->t_mountp,
+			XFS_ERRTAG_BMAP_FINISH_ONE,
+			XFS_RANDOM_BMAP_FINISH_ONE))
+		return -EIO;
+
+	switch (type) {
+	case XFS_BMAP_MAP:
+		firstfsb = bmap.br_startblock;
+		error = xfs_bmapi_write(tp, ip, bmap.br_startoff,
+					bmap.br_blockcount,
+					XFS_BMAPI_REMAP, &firstfsb,
+					bmap.br_blockcount, &bmap, &nimaps,
+					dfops);
+		break;
+	case XFS_BMAP_UNMAP:
+		/* not implemented for now */
+	default:
+		ASSERT(0);
+		error = -EFSCORRUPTED;
+	}
+
+	return error;
+}
diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
index fb2fd4c..394a22c 100644
--- a/libxfs/xfs_bmap.h
+++ b/libxfs/xfs_bmap.h
@@ -230,4 +230,15 @@ struct xfs_bmap_intent {
 	struct xfs_bmbt_irec			bi_bmap;
 };
 
+int	xfs_bmap_finish_one(struct xfs_trans *tp, struct xfs_defer_ops *dfops,
+		struct xfs_inode *ip, enum xfs_bmap_intent_type type,
+		int whichfork, xfs_fileoff_t startoff, xfs_fsblock_t startblock,
+		xfs_filblks_t blockcount, xfs_exntst_t state);
+int	xfs_bmap_map_extent(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
+		struct xfs_inode *ip, int whichfork,
+		struct xfs_bmbt_irec *imap);
+int	xfs_bmap_unmap_extent(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
+		struct xfs_inode *ip, int whichfork,
+		struct xfs_bmbt_irec *imap);
+
 #endif	/* __XFS_BMAP_H__ */
diff --git a/libxfs/xfs_defer.h b/libxfs/xfs_defer.h
index 4081b00..47aa048 100644
--- a/libxfs/xfs_defer.h
+++ b/libxfs/xfs_defer.h
@@ -51,6 +51,7 @@ struct xfs_defer_pending {
  * find all the space it needs.
  */
 enum xfs_defer_ops_type {
+	XFS_DEFER_OPS_TYPE_BMAP,
 	XFS_DEFER_OPS_TYPE_REFCOUNT,
 	XFS_DEFER_OPS_TYPE_RMAP,
 	XFS_DEFER_OPS_TYPE_FREE,


* [PATCH 090/145] xfs: return work remaining at the end of a bunmapi operation
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (88 preceding siblings ...)
  2016-06-17  1:40 ` [PATCH 089/145] xfs: implement deferred bmbt map/unmap operations Darrick J. Wong
@ 2016-06-17  1:40 ` Darrick J. Wong
  2016-06-17  1:40 ` [PATCH 091/145] xfs: add reflink feature flag to geometry Darrick J. Wong
                   ` (54 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:40 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Return the range of file blocks that bunmapi didn't free.  This hint
is used by CoW and reflink to figure out what part of an extent
actually got freed so that they can set up the appropriate atomic
remapping of just the freed range.
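
The shape of the change is an in/out "remaining length" parameter on
the core routine, with the old interface kept as a thin wrapper that
derives *done from it.  The following is a simplified sketch of that
pattern only; unmap_core(), unmap_legacy() and unmap_all() are
hypothetical names, and the "work" is reduced to subtracting a block
count, which is far simpler than what __xfs_bunmapi does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical core routine.  On entry *rlen is the number of blocks
 * the caller wants unmapped; on return it holds the number of blocks
 * still left to do.  At most max_blocks are handled per call.
 */
static inline int
unmap_core(uint64_t *rlen, uint64_t max_blocks)
{
	uint64_t	handled;

	assert(max_blocks > 0);
	handled = *rlen < max_blocks ? *rlen : max_blocks;
	*rlen -= handled;
	return 0;
}

/* Old-style wrapper: callers only learn whether the work finished. */
static inline int
unmap_legacy(uint64_t len, uint64_t max_blocks, int *done)
{
	int		error;

	error = unmap_core(&len, max_blocks);
	*done = (len == 0);
	return error;
}

/* New-style caller: loop until nothing remains, counting the passes. */
static inline unsigned int
unmap_all(uint64_t len, uint64_t max_blocks)
{
	unsigned int	passes = 0;

	while (len > 0) {
		unmap_core(&len, max_blocks);
		passes++;
	}
	return passes;
}
```

A caller that needs to remap exactly what was freed can compare the
length before and after each pass, which the boolean *done of the old
interface could not express.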

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_bmap.c |   36 ++++++++++++++++++++++++++++++------
 libxfs/xfs_bmap.h |    4 ++++
 2 files changed, 34 insertions(+), 6 deletions(-)


diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index dff4b7b..f0c0871 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -5157,17 +5157,16 @@ done:
  * *done is set.
  */
 int						/* error */
-xfs_bunmapi(
+__xfs_bunmapi(
 	xfs_trans_t		*tp,		/* transaction pointer */
 	struct xfs_inode	*ip,		/* incore inode */
 	xfs_fileoff_t		bno,		/* starting offset to unmap */
-	xfs_filblks_t		len,		/* length to unmap in file */
+	xfs_filblks_t		*rlen,		/* i/o: amount remaining */
 	int			flags,		/* misc flags */
 	xfs_extnum_t		nexts,		/* number of extents max */
 	xfs_fsblock_t		*firstblock,	/* first allocated block
 						   controls a.g. for allocs */
-	struct xfs_defer_ops	*dfops,		/* i/o: list extents to free */
-	int			*done)		/* set if not done yet */
+	struct xfs_defer_ops	*dfops)		/* i/o: deferred updates */
 {
 	xfs_btree_cur_t		*cur;		/* bmap btree cursor */
 	xfs_bmbt_irec_t		del;		/* extent being deleted */
@@ -5189,6 +5188,7 @@ xfs_bunmapi(
 	int			wasdel;		/* was a delayed alloc extent */
 	int			whichfork;	/* data or attribute fork */
 	xfs_fsblock_t		sum;
+	xfs_filblks_t		len = *rlen;	/* length to unmap in file */
 
 	trace_xfs_bunmap(ip, bno, len, flags, _RET_IP_);
 
@@ -5215,7 +5215,7 @@ xfs_bunmapi(
 		return error;
 	nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
 	if (nextents == 0) {
-		*done = 1;
+		*rlen = 0;
 		return 0;
 	}
 	XFS_STATS_INC(mp, xs_blk_unmap);
@@ -5484,7 +5484,10 @@ nodelete:
 			extno++;
 		}
 	}
-	*done = bno == (xfs_fileoff_t)-1 || bno < start || lastx < 0;
+	if (bno == (xfs_fileoff_t)-1 || bno < start || lastx < 0)
+		*rlen = 0;
+	else
+		*rlen = bno - start + 1;
 
 	/*
 	 * Convert to a btree if necessary.
@@ -5540,6 +5543,27 @@ error0:
 	return error;
 }
 
+/* Unmap a range of a file. */
+int
+xfs_bunmapi(
+	xfs_trans_t		*tp,
+	struct xfs_inode	*ip,
+	xfs_fileoff_t		bno,
+	xfs_filblks_t		len,
+	int			flags,
+	xfs_extnum_t		nexts,
+	xfs_fsblock_t		*firstblock,
+	struct xfs_defer_ops	*dfops,
+	int			*done)
+{
+	int			error;
+
+	error = __xfs_bunmapi(tp, ip, bno, &len, flags, nexts, firstblock,
+			dfops);
+	*done = (len == 0);
+	return error;
+}
+
 /*
  * Determine whether an extent shift can be accomplished by a merge with the
  * extent that precedes the target hole of the shift.
diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
index 394a22c..97828c5 100644
--- a/libxfs/xfs_bmap.h
+++ b/libxfs/xfs_bmap.h
@@ -197,6 +197,10 @@ int	xfs_bmapi_write(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_fsblock_t *firstblock, xfs_extlen_t total,
 		struct xfs_bmbt_irec *mval, int *nmap,
 		struct xfs_defer_ops *dfops);
+int	__xfs_bunmapi(struct xfs_trans *tp, struct xfs_inode *ip,
+		xfs_fileoff_t bno, xfs_filblks_t *rlen, int flags,
+		xfs_extnum_t nexts, xfs_fsblock_t *firstblock,
+		struct xfs_defer_ops *dfops);
 int	xfs_bunmapi(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_fileoff_t bno, xfs_filblks_t len, int flags,
 		xfs_extnum_t nexts, xfs_fsblock_t *firstblock,


* [PATCH 091/145] xfs: add reflink feature flag to geometry
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (89 preceding siblings ...)
  2016-06-17  1:40 ` [PATCH 090/145] xfs: return work remaining at the end of a bunmapi operation Darrick J. Wong
@ 2016-06-17  1:40 ` Darrick J. Wong
  2016-06-17  1:40 ` [PATCH 092/145] xfs: don't allow reflinked dir/dev/fifo/socket/pipe files Darrick J. Wong
                   ` (53 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:40 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Report the reflink feature in the XFS geometry so that xfs_info and
friends know the filesystem has this feature.
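
A tool that has already fetched the geometry flags could test for the
new bit as sketched below.  The two flag values are copied from the
xfs_fs.h hunk in this patch; geom_has_reflink() is a hypothetical
helper, and how the flags word is obtained (for example through a
geometry ioctl) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the xfs_fs.h hunk in this patch. */
#define XFS_FSOP_GEOM_FLAGS_RMAPBT	0x80000	/* reverse mapping btree */
#define XFS_FSOP_GEOM_FLAGS_REFLINK	0x100000 /* files can share blocks */

/* Hypothetical helper: does the geometry advertise reflink support? */
static inline bool
geom_has_reflink(uint32_t geom_flags)
{
	return (geom_flags & XFS_FSOP_GEOM_FLAGS_REFLINK) != 0;
}
```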

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_fs.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 085ea6f..f291a53 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -230,7 +230,8 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_FTYPE	0x10000	/* inode directory types */
 #define XFS_FSOP_GEOM_FLAGS_FINOBT	0x20000	/* free inode btree */
 #define XFS_FSOP_GEOM_FLAGS_SPINODES	0x40000	/* sparse inode chunks	*/
-#define XFS_FSOP_GEOM_FLAGS_RMAPBT	0x80000	/* Reverse mapping btree */
+#define XFS_FSOP_GEOM_FLAGS_RMAPBT	0x80000	/* reverse mapping btree */
+#define XFS_FSOP_GEOM_FLAGS_REFLINK	0x100000 /* files can share blocks */
 
 /*
  * Minimum and maximum sizes need for growth checks.


* [PATCH 092/145] xfs: don't allow reflinked dir/dev/fifo/socket/pipe files
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (90 preceding siblings ...)
  2016-06-17  1:40 ` [PATCH 091/145] xfs: add reflink feature flag to geometry Darrick J. Wong
@ 2016-06-17  1:40 ` Darrick J. Wong
  2016-06-17  1:40 ` [PATCH 093/145] xfs: introduce the CoW fork Darrick J. Wong
                   ` (52 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:40 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Only regular, non-realtime files can be reflinked, so check both
conditions when we load an inode and flag the inode as corrupt if
either fails.  Also, don't leak the attr fork if there's a failure.
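
The two load-time checks can be restated as a small standalone
predicate.  This is only a sketch: struct sketch_inode, the SKETCH_*
constants and sketch_check_reflink_inode() are hypothetical, the mode
constants are the usual Linux octal values written out locally, and
the error number is a placeholder for EFSCORRUPTED.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_S_IFMT		0170000	/* file type mask (usual Linux value) */
#define SKETCH_S_IFREG		0100000	/* regular file */
#define SKETCH_S_IFDIR		0040000	/* directory */
#define SKETCH_EFSCORRUPTED	117	/* placeholder error number */

/* Hypothetical, heavily reduced view of an in-core inode. */
struct sketch_inode {
	unsigned int	mode;		/* file type and permission bits */
	bool		is_reflink;	/* reflink flag set on disk */
	bool		is_realtime;	/* realtime flag set on disk */
};

/* Return 0 if the inode is acceptable, or a negative error if not. */
static inline int
sketch_check_reflink_inode(const struct sketch_inode *ip)
{
	if (!ip->is_reflink)
		return 0;
	/* Only regular files may share blocks. */
	if ((ip->mode & SKETCH_S_IFMT) != SKETCH_S_IFREG)
		return -SKETCH_EFSCORRUPTED;
	/* Realtime files may not share blocks. */
	if (ip->is_realtime)
		return -SKETCH_EFSCORRUPTED;
	return 0;
}
```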

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_inode_fork.c |   23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c
index 799873a..83fd4f3 100644
--- a/libxfs/xfs_inode_fork.c
+++ b/libxfs/xfs_inode_fork.c
@@ -117,6 +117,26 @@ xfs_iformat_fork(
 		return -EFSCORRUPTED;
 	}
 
+	if (unlikely(xfs_is_reflink_inode(ip) &&
+	    (VFS_I(ip)->i_mode & S_IFMT) != S_IFREG)) {
+		xfs_warn(ip->i_mount,
+			"corrupt dinode %llu, wrong file type for reflink.",
+			ip->i_ino);
+		XFS_CORRUPTION_ERROR("xfs_iformat(reflink)",
+				     XFS_ERRLEVEL_LOW, ip->i_mount, dip);
+		return -EFSCORRUPTED;
+	}
+
+	if (unlikely(xfs_is_reflink_inode(ip) &&
+	    (ip->i_d.di_flags & XFS_DIFLAG_REALTIME))) {
+		xfs_warn(ip->i_mount,
+			"corrupt dinode %llu, has reflink+realtime flag set.",
+			ip->i_ino);
+		XFS_CORRUPTION_ERROR("xfs_iformat(reflink)",
+				     XFS_ERRLEVEL_LOW, ip->i_mount, dip);
+		return -EFSCORRUPTED;
+	}
+
 	switch (VFS_I(ip)->i_mode & S_IFMT) {
 	case S_IFIFO:
 	case S_IFCHR:
@@ -204,7 +224,8 @@ xfs_iformat_fork(
 			XFS_CORRUPTION_ERROR("xfs_iformat(8)",
 					     XFS_ERRLEVEL_LOW,
 					     ip->i_mount, dip);
-			return -EFSCORRUPTED;
+			error = -EFSCORRUPTED;
+			break;
 		}
 
 		error = xfs_iformat_local(ip, dip, XFS_ATTR_FORK, size);


* [PATCH 093/145] xfs: introduce the CoW fork
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (91 preceding siblings ...)
  2016-06-17  1:40 ` [PATCH 092/145] xfs: don't allow reflinked dir/dev/fifo/socket/pipe files Darrick J. Wong
@ 2016-06-17  1:40 ` Darrick J. Wong
  2016-06-17  1:40 ` [PATCH 094/145] xfs: support bmapping delalloc extents in " Darrick J. Wong
                   ` (51 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:40 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Introduce a new in-core fork for storing copy-on-write delalloc
reservations and allocated extents that are in the process of being
written out.

v2: fix up bmapi_read so that we can query the CoW fork, and have it
return a "hole" extent if there's no CoW fork.
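
To make the v2 behaviour concrete, here is a minimal sketch of the
"no CoW fork means a hole" rule together with one plausible way of
picking the fork from the bmapi flags.  XFS_BMAPI_COWFORK is taken
from this patch; the XFS_BMAPI_ATTRFORK value, the hole sentinel, the
precedence of COWFORK over ATTRFORK, and all sketch_* names are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XFS_BMAPI_ATTRFORK	0x004	/* assumed value, not shown in this patch */
#define XFS_BMAPI_COWFORK	0x200	/* from this patch */
#define SKETCH_HOLESTARTBLOCK	((uint64_t)-2)	/* assumed hole sentinel */

enum sketch_fork { SKETCH_DATA_FORK, SKETCH_ATTR_FORK, SKETCH_COW_FORK };

/* Hypothetical, reduced extent record. */
struct sketch_irec {
	uint64_t	startoff;
	uint64_t	startblock;
	uint64_t	blockcount;
};

/* Assumed precedence: COWFORK wins, then ATTRFORK, else the data fork. */
static inline enum sketch_fork
sketch_whichfork(int flags)
{
	if (flags & XFS_BMAPI_COWFORK)
		return SKETCH_COW_FORK;
	if (flags & XFS_BMAPI_ATTRFORK)
		return SKETCH_ATTR_FORK;
	return SKETCH_DATA_FORK;
}

/*
 * If the CoW fork is requested but the inode has none (cowfp is NULL),
 * report the whole range as a single hole and return 1.  Otherwise
 * return 0 so the caller performs the normal extent lookup.
 */
static inline int
sketch_cow_hole(const void *cowfp, int flags, uint64_t bno, uint64_t len,
		struct sketch_irec *mval, int *nmap)
{
	if (sketch_whichfork(flags) != SKETCH_COW_FORK || cowfp != NULL)
		return 0;
	mval->startoff = bno;
	mval->startblock = SKETCH_HOLESTARTBLOCK;
	mval->blockcount = len;
	*nmap = 1;
	return 1;
}
```

Returning a hole lets callers query the CoW fork unconditionally
instead of first checking whether the inode has one.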

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_inode.h     |    3 +++
 libxfs/rdwr.c           |    2 ++
 libxfs/xfs_bmap.c       |   27 ++++++++++++++++++++-------
 libxfs/xfs_bmap.h       |   22 +++++++++++++++++++---
 libxfs/xfs_bmap_btree.c |    1 +
 libxfs/xfs_inode_fork.c |   47 ++++++++++++++++++++++++++++++++++++++++++++---
 libxfs/xfs_inode_fork.h |   28 ++++++++++++++++++++++------
 libxfs/xfs_rmap.c       |    2 ++
 libxfs/xfs_types.h      |    1 +
 9 files changed, 114 insertions(+), 19 deletions(-)


diff --git a/include/xfs_inode.h b/include/xfs_inode.h
index 3876fa6..b7623c0 100644
--- a/include/xfs_inode.h
+++ b/include/xfs_inode.h
@@ -50,6 +50,7 @@ typedef struct xfs_inode {
 	struct xfs_imap		i_imap;		/* location for xfs_imap() */
 	struct xfs_buftarg	i_dev;		/* dev for this inode */
 	struct xfs_ifork	*i_afp;		/* attribute fork pointer */
+	struct xfs_ifork	*i_cowfp;	/* copy on write extents */
 	struct xfs_ifork	i_df;		/* data fork */
 	struct xfs_trans	*i_transp;	/* ptr to owning transaction */
 	struct xfs_inode_log_item *i_itemp;	/* logging information */
@@ -58,6 +59,8 @@ typedef struct xfs_inode {
 	xfs_fsize_t		i_size;		/* in-memory size */
 	const struct xfs_dir_ops *d_ops;	/* directory ops vector */
 	struct inode		i_vnode;
+	xfs_extnum_t		i_cnextents;	/* # of extents in cow fork */
+	unsigned int		i_cformat;	/* format of cow fork */
 } xfs_inode_t;
 
 static inline struct inode *VFS_I(struct xfs_inode *ip)
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index aa30522..533a064 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -1372,6 +1372,8 @@ libxfs_idestroy(xfs_inode_t *ip)
 	}
 	if (ip->i_afp)
 		libxfs_idestroy_fork(ip, XFS_ATTR_FORK);
+	if (ip->i_cowfp)
+		xfs_idestroy_fork(ip, XFS_COW_FORK);
 }
 
 void
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index f0c0871..2ec3385 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -2916,6 +2916,7 @@ xfs_bmap_add_extent_hole_real(
 	ASSERT(!isnullstartblock(new->br_startblock));
 	ASSERT(!bma->cur ||
 	       !(bma->cur->bc_private.b.flags & XFS_BTCUR_BPRV_WASDEL));
+	ASSERT(whichfork != XFS_COW_FORK);
 
 	XFS_STATS_INC(mp, xs_add_exlist);
 
@@ -4050,12 +4051,11 @@ xfs_bmapi_read(
 	int			error;
 	int			eof;
 	int			n = 0;
-	int			whichfork = (flags & XFS_BMAPI_ATTRFORK) ?
-						XFS_ATTR_FORK : XFS_DATA_FORK;
+	int			whichfork = xfs_bmapi_whichfork(flags);
 
 	ASSERT(*nmap >= 1);
 	ASSERT(!(flags & ~(XFS_BMAPI_ATTRFORK|XFS_BMAPI_ENTIRE|
-			   XFS_BMAPI_IGSTATE)));
+			   XFS_BMAPI_IGSTATE|XFS_BMAPI_COWFORK)));
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED|XFS_ILOCK_EXCL));
 
 	if (unlikely(XFS_TEST_ERROR(
@@ -4073,6 +4073,16 @@ xfs_bmapi_read(
 
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 
+	/* No CoW fork?  Return a hole. */
+	if (whichfork == XFS_COW_FORK && !ifp) {
+		mval->br_startoff = bno;
+		mval->br_startblock = HOLESTARTBLOCK;
+		mval->br_blockcount = len;
+		mval->br_state = XFS_EXT_NORM;
+		*nmap = 1;
+		return 0;
+	}
+
 	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
 		error = xfs_iread_extents(NULL, ip, whichfork);
 		if (error)
@@ -4425,8 +4435,7 @@ xfs_bmapi_convert_unwritten(
 	xfs_filblks_t		len,
 	int			flags)
 {
-	int			whichfork = (flags & XFS_BMAPI_ATTRFORK) ?
-						XFS_ATTR_FORK : XFS_DATA_FORK;
+	int			whichfork = xfs_bmapi_whichfork(flags);
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(bma->ip, whichfork);
 	int			tmp_logflags = 0;
 	int			error;
@@ -4442,6 +4451,8 @@ xfs_bmapi_convert_unwritten(
 			(XFS_BMAPI_PREALLOC | XFS_BMAPI_CONVERT))
 		return 0;
 
+	ASSERT(whichfork != XFS_COW_FORK);
+
 	/*
 	 * Modify (by adding) the state flag, if writing.
 	 */
@@ -4854,6 +4865,8 @@ xfs_bmap_del_extent(
 
 	if (whichfork == XFS_ATTR_FORK)
 		state |= BMAP_ATTRFORK;
+	else if (whichfork == XFS_COW_FORK)
+		state |= BMAP_COWFORK;
 
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 	ASSERT((*idx >= 0) && (*idx < ifp->if_bytes /
@@ -5192,8 +5205,8 @@ __xfs_bunmapi(
 
 	trace_xfs_bunmap(ip, bno, len, flags, _RET_IP_);
 
-	whichfork = (flags & XFS_BMAPI_ATTRFORK) ?
-		XFS_ATTR_FORK : XFS_DATA_FORK;
+	whichfork = xfs_bmapi_whichfork(flags);
+	ASSERT(whichfork != XFS_COW_FORK);
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 	if (unlikely(
 	    XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
index 97828c5..a8ef1c6 100644
--- a/libxfs/xfs_bmap.h
+++ b/libxfs/xfs_bmap.h
@@ -104,6 +104,9 @@ struct xfs_bmap_free_item
  */
 #define XFS_BMAPI_REMAP		0x100
 
+/* Map something in the CoW fork. */
+#define XFS_BMAPI_COWFORK	0x200
+
 #define XFS_BMAPI_FLAGS \
 	{ XFS_BMAPI_ENTIRE,	"ENTIRE" }, \
 	{ XFS_BMAPI_METADATA,	"METADATA" }, \
@@ -113,12 +116,23 @@ struct xfs_bmap_free_item
 	{ XFS_BMAPI_CONTIG,	"CONTIG" }, \
 	{ XFS_BMAPI_CONVERT,	"CONVERT" }, \
 	{ XFS_BMAPI_ZERO,	"ZERO" }, \
-	{ XFS_BMAPI_REMAP,	"REMAP" }
+	{ XFS_BMAPI_REMAP,	"REMAP" }, \
+	{ XFS_BMAPI_COWFORK,	"COWFORK" }
 
 
 static inline int xfs_bmapi_aflag(int w)
 {
-	return (w == XFS_ATTR_FORK ? XFS_BMAPI_ATTRFORK : 0);
+	return (w == XFS_ATTR_FORK ? XFS_BMAPI_ATTRFORK :
+	       (w == XFS_COW_FORK ? XFS_BMAPI_COWFORK : 0));
+}
+
+static inline int xfs_bmapi_whichfork(int bmapi_flags)
+{
+	if (bmapi_flags & XFS_BMAPI_COWFORK)
+		return XFS_COW_FORK;
+	else if (bmapi_flags & XFS_BMAPI_ATTRFORK)
+		return XFS_ATTR_FORK;
+	return XFS_DATA_FORK;
 }
 
 /*
@@ -139,13 +153,15 @@ static inline int xfs_bmapi_aflag(int w)
 #define BMAP_LEFT_VALID		(1 << 6)
 #define BMAP_RIGHT_VALID	(1 << 7)
 #define BMAP_ATTRFORK		(1 << 8)
+#define BMAP_COWFORK		(1 << 9)
 
 #define XFS_BMAP_EXT_FLAGS \
 	{ BMAP_LEFT_CONTIG,	"LC" }, \
 	{ BMAP_RIGHT_CONTIG,	"RC" }, \
 	{ BMAP_LEFT_FILLING,	"LF" }, \
 	{ BMAP_RIGHT_FILLING,	"RF" }, \
-	{ BMAP_ATTRFORK,	"ATTR" }
+	{ BMAP_ATTRFORK,	"ATTR" }, \
+	{ BMAP_COWFORK,		"COW" }
 
 
 /*
diff --git a/libxfs/xfs_bmap_btree.c b/libxfs/xfs_bmap_btree.c
index 404e321..2145ac0 100644
--- a/libxfs/xfs_bmap_btree.c
+++ b/libxfs/xfs_bmap_btree.c
@@ -773,6 +773,7 @@ xfs_bmbt_init_cursor(
 {
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
 	struct xfs_btree_cur	*cur;
+	ASSERT(whichfork != XFS_COW_FORK);
 
 	cur = kmem_zone_zalloc(xfs_btree_cur_zone, KM_SLEEP);
 
diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c
index 83fd4f3..ab708dc 100644
--- a/libxfs/xfs_inode_fork.c
+++ b/libxfs/xfs_inode_fork.c
@@ -202,9 +202,14 @@ xfs_iformat_fork(
 		XFS_ERROR_REPORT("xfs_iformat(7)", XFS_ERRLEVEL_LOW, ip->i_mount);
 		return -EFSCORRUPTED;
 	}
-	if (error) {
+	if (error)
 		return error;
+
+	if (xfs_is_reflink_inode(ip)) {
+		ASSERT(ip->i_cowfp == NULL);
+		xfs_ifork_init_cow(ip);
 	}
+
 	if (!XFS_DFORK_Q(dip))
 		return 0;
 
@@ -243,6 +248,9 @@ xfs_iformat_fork(
 	if (error) {
 		kmem_zone_free(xfs_ifork_zone, ip->i_afp);
 		ip->i_afp = NULL;
+		if (ip->i_cowfp)
+			kmem_zone_free(xfs_ifork_zone, ip->i_cowfp);
+		ip->i_cowfp = NULL;
 		xfs_idestroy_fork(ip, XFS_DATA_FORK);
 	}
 	return error;
@@ -757,6 +765,9 @@ xfs_idestroy_fork(
 	if (whichfork == XFS_ATTR_FORK) {
 		kmem_zone_free(xfs_ifork_zone, ip->i_afp);
 		ip->i_afp = NULL;
+	} else if (whichfork == XFS_COW_FORK) {
+		kmem_zone_free(xfs_ifork_zone, ip->i_cowfp);
+		ip->i_cowfp = NULL;
 	}
 }
 
@@ -944,6 +955,19 @@ xfs_iext_get_ext(
 	}
 }
 
+/* XFS_IEXT_STATE_TO_FORK() -- Convert BMAP state flags to an inode fork. */
+xfs_ifork_t *
+XFS_IEXT_STATE_TO_FORK(
+	struct xfs_inode	*ip,
+	int			state)
+{
+	if (state & BMAP_COWFORK)
+		return ip->i_cowfp;
+	else if (state & BMAP_ATTRFORK)
+		return ip->i_afp;
+	return &ip->i_df;
+}
+
 /*
  * Insert new item(s) into the extent records for incore inode
  * fork 'ifp'.  'count' new items are inserted at index 'idx'.
@@ -956,7 +980,7 @@ xfs_iext_insert(
 	xfs_bmbt_irec_t	*new,		/* items to insert */
 	int		state)		/* type of extent conversion */
 {
-	xfs_ifork_t	*ifp = (state & BMAP_ATTRFORK) ? ip->i_afp : &ip->i_df;
+	xfs_ifork_t	*ifp = XFS_IEXT_STATE_TO_FORK(ip, state);
 	xfs_extnum_t	i;		/* extent record index */
 
 	trace_xfs_iext_insert(ip, idx, new, state, _RET_IP_);
@@ -1206,7 +1230,7 @@ xfs_iext_remove(
 	int		ext_diff,	/* number of extents to remove */
 	int		state)		/* type of extent conversion */
 {
-	xfs_ifork_t	*ifp = (state & BMAP_ATTRFORK) ? ip->i_afp : &ip->i_df;
+	xfs_ifork_t	*ifp = XFS_IEXT_STATE_TO_FORK(ip, state);
 	xfs_extnum_t	nextents;	/* number of extents in file */
 	int		new_size;	/* size of extents after removal */
 
@@ -1951,3 +1975,20 @@ xfs_iext_irec_update_extoffs(
 		ifp->if_u1.if_ext_irec[i].er_extoff += ext_diff;
 	}
 }
+
+/*
+ * Initialize an inode's copy-on-write fork.
+ */
+void
+xfs_ifork_init_cow(
+	struct xfs_inode	*ip)
+{
+	if (ip->i_cowfp)
+		return;
+
+	ip->i_cowfp = kmem_zone_zalloc(xfs_ifork_zone,
+				       KM_SLEEP | KM_NOFS);
+	ip->i_cowfp->if_flags = XFS_IFEXTENTS;
+	ip->i_cformat = XFS_DINODE_FMT_EXTENTS;
+	ip->i_cnextents = 0;
+}
diff --git a/libxfs/xfs_inode_fork.h b/libxfs/xfs_inode_fork.h
index f95e072..44d38eb 100644
--- a/libxfs/xfs_inode_fork.h
+++ b/libxfs/xfs_inode_fork.h
@@ -92,7 +92,9 @@ typedef struct xfs_ifork {
 #define XFS_IFORK_PTR(ip,w)		\
 	((w) == XFS_DATA_FORK ? \
 		&(ip)->i_df : \
-		(ip)->i_afp)
+		((w) == XFS_ATTR_FORK ? \
+			(ip)->i_afp : \
+			(ip)->i_cowfp))
 #define XFS_IFORK_DSIZE(ip) \
 	(XFS_IFORK_Q(ip) ? \
 		XFS_IFORK_BOFF(ip) : \
@@ -105,26 +107,38 @@ typedef struct xfs_ifork {
 #define XFS_IFORK_SIZE(ip,w) \
 	((w) == XFS_DATA_FORK ? \
 		XFS_IFORK_DSIZE(ip) : \
-		XFS_IFORK_ASIZE(ip))
+		((w) == XFS_ATTR_FORK ? \
+			XFS_IFORK_ASIZE(ip) : \
+			0))
 #define XFS_IFORK_FORMAT(ip,w) \
 	((w) == XFS_DATA_FORK ? \
 		(ip)->i_d.di_format : \
-		(ip)->i_d.di_aformat)
+		((w) == XFS_ATTR_FORK ? \
+			(ip)->i_d.di_aformat : \
+			(ip)->i_cformat))
 #define XFS_IFORK_FMT_SET(ip,w,n) \
 	((w) == XFS_DATA_FORK ? \
 		((ip)->i_d.di_format = (n)) : \
-		((ip)->i_d.di_aformat = (n)))
+		((w) == XFS_ATTR_FORK ? \
+			((ip)->i_d.di_aformat = (n)) : \
+			((ip)->i_cformat = (n))))
 #define XFS_IFORK_NEXTENTS(ip,w) \
 	((w) == XFS_DATA_FORK ? \
 		(ip)->i_d.di_nextents : \
-		(ip)->i_d.di_anextents)
+		((w) == XFS_ATTR_FORK ? \
+			(ip)->i_d.di_anextents : \
+			(ip)->i_cnextents))
 #define XFS_IFORK_NEXT_SET(ip,w,n) \
 	((w) == XFS_DATA_FORK ? \
 		((ip)->i_d.di_nextents = (n)) : \
-		((ip)->i_d.di_anextents = (n)))
+		((w) == XFS_ATTR_FORK ? \
+			((ip)->i_d.di_anextents = (n)) : \
+			((ip)->i_cnextents = (n))))
 #define XFS_IFORK_MAXEXT(ip, w) \
 	(XFS_IFORK_SIZE(ip, w) / sizeof(xfs_bmbt_rec_t))
 
+xfs_ifork_t	*XFS_IEXT_STATE_TO_FORK(struct xfs_inode *ip, int state);
+
 int		xfs_iformat_fork(struct xfs_inode *, struct xfs_dinode *);
 void		xfs_iflush_fork(struct xfs_inode *, struct xfs_dinode *,
 				struct xfs_inode_log_item *, int);
@@ -169,4 +183,6 @@ void		xfs_iext_irec_update_extoffs(struct xfs_ifork *, int, int);
 
 extern struct kmem_zone	*xfs_ifork_zone;
 
+extern void xfs_ifork_init_cow(struct xfs_inode *ip);
+
 #endif	/* __XFS_INODE_FORK_H__ */
diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index 7637903..6e69208 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -1344,6 +1344,8 @@ __xfs_rmap_add(
 
 	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
 		return 0;
+	if (ri->ri_whichfork == XFS_COW_FORK)
+		return 0;
 
 	trace_xfs_rmap_defer(mp, XFS_FSB_TO_AGNO(mp, ri->ri_bmap.br_startblock),
 			ri->ri_type,
diff --git a/libxfs/xfs_types.h b/libxfs/xfs_types.h
index 690d616..cf044c0 100644
--- a/libxfs/xfs_types.h
+++ b/libxfs/xfs_types.h
@@ -93,6 +93,7 @@ typedef __int64_t	xfs_sfiloff_t;	/* signed block number in a file */
  */
 #define	XFS_DATA_FORK	0
 #define	XFS_ATTR_FORK	1
+#define	XFS_COW_FORK	2
 
 /*
  * Min numbers of data/attr fork btree root pointers.
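
Reading note on the xfs_bmap.h hunks above: xfs_bmapi_whichfork() gives
XFS_BMAPI_COWFORK precedence over XFS_BMAPI_ATTRFORK, and
xfs_bmapi_aflag() is its inverse for the three fork numbers.  A minimal
standalone sketch of that round trip follows.  The helper names without
the xfs_ prefix and fork_roundtrips() are illustrative only, and the
value used for XFS_BMAPI_ATTRFORK is an assumption since this patch
does not show it.

```c
#include <assert.h>

/* Fork numbers and the CoW flag are copied from the hunks above. */
#define XFS_DATA_FORK		0
#define XFS_ATTR_FORK		1
#define XFS_COW_FORK		2

#define XFS_BMAPI_ATTRFORK	0x004	/* assumed value, not shown in this patch */
#define XFS_BMAPI_COWFORK	0x200

/* Same decision order as xfs_bmapi_whichfork(): CoW wins over attr. */
static int bmapi_whichfork(int bmapi_flags)
{
	if (bmapi_flags & XFS_BMAPI_COWFORK)
		return XFS_COW_FORK;
	if (bmapi_flags & XFS_BMAPI_ATTRFORK)
		return XFS_ATTR_FORK;
	return XFS_DATA_FORK;
}

/* Same mapping as xfs_bmapi_aflag(): fork number back to a BMAPI flag. */
static int bmapi_aflag(int w)
{
	assert(w >= XFS_DATA_FORK && w <= XFS_COW_FORK);
	return w == XFS_ATTR_FORK ? XFS_BMAPI_ATTRFORK :
	       (w == XFS_COW_FORK ? XFS_BMAPI_COWFORK : 0);
}

/* Illustrative check: fork -> flag -> fork should be the identity. */
static int fork_roundtrips(int w)
{
	return bmapi_whichfork(bmapi_aflag(w)) == w;
}
```

The data fork has no flag of its own, so "no fork flag set" has to mean
XFS_DATA_FORK for the round trip to hold.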

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 094/145] xfs: support bmapping delalloc extents in the CoW fork
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (92 preceding siblings ...)
  2016-06-17  1:40 ` [PATCH 093/145] xfs: introduce the CoW fork Darrick J. Wong
@ 2016-06-17  1:40 ` Darrick J. Wong
  2016-06-17  1:40 ` [PATCH 095/145] xfs: support allocating delayed extents in " Darrick J. Wong
                   ` (50 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:40 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Allow the creation of delayed allocation extents in the CoW fork.
In a subsequent patch we'll wire up write_begin and page_mkwrite to
actually do this.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
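Reviewer-style sketch, not part of the commit: the loop in
xfs_bmapi_delay() reserves a delalloc extent whenever the lookup hits
EOF or the next extent starts after bno, and with this patch that
lookup runs against whichever fork the caller names.  The toy model
below shows only that hole test.  Every toy_* name is hypothetical, the
flat sorted array stands in for the real incore extent list, and the
clamping to the next extent's start is my reading of the reservation
path rather than something shown in this diff.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_DATA_FORK = 0, TOY_ATTR_FORK = 1, TOY_COW_FORK = 2 };

struct toy_ext {
	unsigned long long	startoff;	/* first file block */
	unsigned long long	blockcount;	/* length in blocks */
};

/* One fork: extents sorted by startoff, non-overlapping. */
struct toy_fork {
	const struct toy_ext	*recs;
	size_t			nrecs;
};

struct toy_inode {
	struct toy_fork		forks[3];	/* indexed by fork number */
};

/*
 * How many blocks starting at bno would need a delalloc reservation in
 * the given fork?  Returns 0 if bno is already mapped there.
 */
static unsigned long long
toy_delay_len(const struct toy_inode *ip, int whichfork,
	      unsigned long long bno, unsigned long long len)
{
	const struct toy_fork	*ifp;
	size_t			i;

	assert(whichfork == TOY_DATA_FORK || whichfork == TOY_COW_FORK);
	ifp = &ip->forks[whichfork];

	for (i = 0; i < ifp->nrecs; i++) {
		const struct toy_ext *got = &ifp->recs[i];

		if (got->startoff + got->blockcount <= bno)
			continue;	/* extent ends before bno */
		if (got->startoff <= bno)
			return 0;	/* bno is already mapped */
		/* Hole in front of 'got': stop at its first block. */
		return len < got->startoff - bno ? len : got->startoff - bno;
	}
	return len;	/* past the last extent: the whole range is a hole */
}
```

An empty CoW fork reports the whole range as a hole even where the
data fork is mapped, which is the situation a CoW reservation starts
from.
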
 libxfs/xfs_bmap.c |   29 ++++++++++++++++++-----------
 libxfs/xfs_bmap.h |    2 +-
 2 files changed, 19 insertions(+), 12 deletions(-)


diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 2ec3385..acb3011 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -2752,6 +2752,7 @@ done:
 STATIC void
 xfs_bmap_add_extent_hole_delay(
 	xfs_inode_t		*ip,	/* incore inode pointer */
+	int			whichfork,
 	xfs_extnum_t		*idx,	/* extent number to update/insert */
 	xfs_bmbt_irec_t		*new)	/* new data to add to file extents */
 {
@@ -2763,8 +2764,10 @@ xfs_bmap_add_extent_hole_delay(
 	int			state;  /* state bits, accessed thru macros */
 	xfs_filblks_t		temp=0;	/* temp for indirect calculations */
 
-	ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+	ifp = XFS_IFORK_PTR(ip, whichfork);
 	state = 0;
+	if (whichfork == XFS_COW_FORK)
+		state |= BMAP_COWFORK;
 	ASSERT(isnullstartblock(new->br_startblock));
 
 	/*
@@ -2782,7 +2785,7 @@ xfs_bmap_add_extent_hole_delay(
 	 * Check and set flags if the current (right) segment exists.
 	 * If it doesn't exist, we're converting the hole at end-of-file.
 	 */
-	if (*idx < ip->i_df.if_bytes / (uint)sizeof(xfs_bmbt_rec_t)) {
+	if (*idx < ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)) {
 		state |= BMAP_RIGHT_VALID;
 		xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *idx), &right);
 
@@ -4132,6 +4135,7 @@ xfs_bmapi_read(
 STATIC int
 xfs_bmapi_reserve_delalloc(
 	struct xfs_inode	*ip,
+	int			whichfork,
 	xfs_fileoff_t		aoff,
 	xfs_filblks_t		len,
 	struct xfs_bmbt_irec	*got,
@@ -4140,7 +4144,7 @@ xfs_bmapi_reserve_delalloc(
 	int			eof)
 {
 	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
 	xfs_extlen_t		alen;
 	xfs_extlen_t		indlen;
 	char			rt = XFS_IS_REALTIME_INODE(ip);
@@ -4199,7 +4203,7 @@ xfs_bmapi_reserve_delalloc(
 	got->br_startblock = nullstartblock(indlen);
 	got->br_blockcount = alen;
 	got->br_state = XFS_EXT_NORM;
-	xfs_bmap_add_extent_hole_delay(ip, lastx, got);
+	xfs_bmap_add_extent_hole_delay(ip, whichfork, lastx, got);
 
 	/*
 	 * Update our extent pointer, given that xfs_bmap_add_extent_hole_delay
@@ -4231,6 +4235,7 @@ out_unreserve_quota:
 int
 xfs_bmapi_delay(
 	struct xfs_inode	*ip,	/* incore inode */
+	int			whichfork, /* data or cow fork? */
 	xfs_fileoff_t		bno,	/* starting file offs. mapped */
 	xfs_filblks_t		len,	/* length to map in file */
 	struct xfs_bmbt_irec	*mval,	/* output: map values */
@@ -4238,7 +4243,7 @@ xfs_bmapi_delay(
 	int			flags)	/* XFS_BMAPI_... */
 {
 	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
 	struct xfs_bmbt_irec	got;	/* current file extent record */
 	struct xfs_bmbt_irec	prev;	/* previous file extent record */
 	xfs_fileoff_t		obno;	/* old block number (offset) */
@@ -4248,14 +4253,15 @@ xfs_bmapi_delay(
 	int			n = 0;	/* current extent index */
 	int			error = 0;
 
+	ASSERT(whichfork == XFS_DATA_FORK || whichfork == XFS_COW_FORK);
 	ASSERT(*nmap >= 1);
 	ASSERT(*nmap <= XFS_BMAP_MAX_NMAP);
 	ASSERT(!(flags & ~XFS_BMAPI_ENTIRE));
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 
 	if (unlikely(XFS_TEST_ERROR(
-	    (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_EXTENTS &&
-	     XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE),
+	    (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
+	     XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE),
 	     mp, XFS_ERRTAG_BMAPIFORMAT, XFS_RANDOM_BMAPIFORMAT))) {
 		XFS_ERROR_REPORT("xfs_bmapi_delay", XFS_ERRLEVEL_LOW, mp);
 		return -EFSCORRUPTED;
@@ -4266,19 +4272,20 @@ xfs_bmapi_delay(
 
 	XFS_STATS_INC(mp, xs_blk_mapw);
 
-	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
-		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
+	if (whichfork == XFS_DATA_FORK && !(ifp->if_flags & XFS_IFEXTENTS)) {
+		error = xfs_iread_extents(NULL, ip, whichfork);
 		if (error)
 			return error;
 	}
 
-	xfs_bmap_search_extents(ip, bno, XFS_DATA_FORK, &eof, &lastx, &got, &prev);
+	xfs_bmap_search_extents(ip, bno, whichfork, &eof, &lastx, &got, &prev);
 	end = bno + len;
 	obno = bno;
 
 	while (bno < end && n < *nmap) {
 		if (eof || got.br_startoff > bno) {
-			error = xfs_bmapi_reserve_delalloc(ip, bno, len, &got,
+			error = xfs_bmapi_reserve_delalloc(ip, whichfork,
+							   bno, len, &got,
 							   &prev, &lastx, eof);
 			if (error) {
 				if (n == 0) {
diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
index a8ef1c6..d90f88e 100644
--- a/libxfs/xfs_bmap.h
+++ b/libxfs/xfs_bmap.h
@@ -205,7 +205,7 @@ int	xfs_bmap_read_extents(struct xfs_trans *tp, struct xfs_inode *ip,
 int	xfs_bmapi_read(struct xfs_inode *ip, xfs_fileoff_t bno,
 		xfs_filblks_t len, struct xfs_bmbt_irec *mval,
 		int *nmap, int flags);
-int	xfs_bmapi_delay(struct xfs_inode *ip, xfs_fileoff_t bno,
+int	xfs_bmapi_delay(struct xfs_inode *ip, int whichfork, xfs_fileoff_t bno,
 		xfs_filblks_t len, struct xfs_bmbt_irec *mval,
 		int *nmap, int flags);
 int	xfs_bmapi_write(struct xfs_trans *tp, struct xfs_inode *ip,


* [PATCH 095/145] xfs: support allocating delayed extents in CoW fork
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (93 preceding siblings ...)
  2016-06-17  1:40 ` [PATCH 094/145] xfs: support bmapping delalloc extents in " Darrick J. Wong
@ 2016-06-17  1:40 ` Darrick J. Wong
  2016-06-17  1:40 ` [PATCH 096/145] xfs: support removing extents from " Darrick J. Wong
                   ` (49 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:40 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Modify xfs_bmap_add_extent_delay_real() so that we can convert delayed
allocation extents in the CoW fork to real allocations, and wire this
up all the way back to xfs_iomap_write_allocate().  In a subsequent
patch, we'll modify the writepage handler to call this.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
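Reviewer-style sketch, not part of the commit: two things change in
xfs_bmap_add_extent_delay_real().  The extent counter is picked per
fork through a pointer, and the logging flags are dropped for the CoW
fork because it exists only in memory.  The toy below models just those
two decisions.  All toy_* and TOY_* names and the flag values are
hypothetical stand-ins, not the libxfs definitions.

```c
#include <assert.h>

enum { TOY_DATA_FORK = 0, TOY_ATTR_FORK = 1, TOY_COW_FORK = 2 };

#define TOY_ILOG_CORE	0x1	/* hypothetical stand-in for XFS_ILOG_CORE */
#define TOY_ILOG_DEXT	0x2	/* hypothetical stand-in for XFS_ILOG_DEXT */

struct toy_inode {
	int	d_nextents;	/* stands in for i_d.di_nextents */
	int	c_nextents;	/* stands in for i_cnextents (incore only) */
};

/* Pick the counter the same way the patch picks 'nextents'. */
static int *toy_nextents(struct toy_inode *ip, int whichfork)
{
	assert(whichfork != TOY_ATTR_FORK);
	return whichfork == TOY_COW_FORK ? &ip->c_nextents : &ip->d_nextents;
}

/*
 * Model one of the cases that adds an extent record: bump the counter
 * for the chosen fork and return the flags the caller should log.
 */
static int toy_add_one_extent(struct toy_inode *ip, int whichfork)
{
	int	rval = TOY_ILOG_CORE | TOY_ILOG_DEXT;
	int	logflags = 0;

	(*toy_nextents(ip, whichfork))++;

	/* Mirrors the 'done:' label: nothing to log for the CoW fork. */
	if (whichfork != TOY_COW_FORK)
		logflags |= rval;
	return logflags;
}
```

The pointer keeps the five counter updates in the function identical
for both forks, so only the setup and the final logflags merge need to
know which fork is in play.
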
 libxfs/xfs_bmap.c |   51 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 35 insertions(+), 16 deletions(-)


diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index acb3011..18dcd5f 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -133,7 +133,8 @@ xfs_bmbt_lookup_ge(
  */
 static inline bool xfs_bmap_needs_btree(struct xfs_inode *ip, int whichfork)
 {
-	return XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
+	return whichfork != XFS_COW_FORK &&
+		XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
 		XFS_IFORK_NEXTENTS(ip, whichfork) >
 			XFS_IFORK_MAXEXT(ip, whichfork);
 }
@@ -143,7 +144,8 @@ static inline bool xfs_bmap_needs_btree(struct xfs_inode *ip, int whichfork)
  */
 static inline bool xfs_bmap_wants_extents(struct xfs_inode *ip, int whichfork)
 {
-	return XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE &&
+	return whichfork != XFS_COW_FORK &&
+		XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE &&
 		XFS_IFORK_NEXTENTS(ip, whichfork) <=
 			XFS_IFORK_MAXEXT(ip, whichfork);
 }
@@ -633,6 +635,7 @@ xfs_bmap_btree_to_extents(
 
 	mp = ip->i_mount;
 	ifp = XFS_IFORK_PTR(ip, whichfork);
+	ASSERT(whichfork != XFS_COW_FORK);
 	ASSERT(ifp->if_flags & XFS_IFEXTENTS);
 	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE);
 	rblock = ifp->if_broot;
@@ -699,6 +702,7 @@ xfs_bmap_extents_to_btree(
 	xfs_bmbt_ptr_t		*pp;		/* root block address pointer */
 
 	mp = ip->i_mount;
+	ASSERT(whichfork != XFS_COW_FORK);
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS);
 
@@ -830,6 +834,7 @@ xfs_bmap_local_to_extents_empty(
 {
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
 
+	ASSERT(whichfork != XFS_COW_FORK);
 	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL);
 	ASSERT(ifp->if_bytes == 0);
 	ASSERT(XFS_IFORK_NEXTENTS(ip, whichfork) == 0);
@@ -1663,7 +1668,8 @@ xfs_bmap_one_block(
  */
 STATIC int				/* error */
 xfs_bmap_add_extent_delay_real(
-	struct xfs_bmalloca	*bma)
+	struct xfs_bmalloca	*bma,
+	int			whichfork)
 {
 	struct xfs_bmbt_irec	*new = &bma->got;
 	int			diff;	/* temp value */
@@ -1682,10 +1688,13 @@ xfs_bmap_add_extent_delay_real(
 	xfs_filblks_t		temp2=0;/* value for da_new calculations */
 	int			tmp_rval;	/* partial logging flags */
 	struct xfs_mount	*mp;
-	int			whichfork = XFS_DATA_FORK;
+	xfs_extnum_t		*nextents;
 
 	mp = bma->ip->i_mount;
 	ifp = XFS_IFORK_PTR(bma->ip, whichfork);
+	ASSERT(whichfork != XFS_ATTR_FORK);
+	nextents = (whichfork == XFS_COW_FORK ? &bma->ip->i_cnextents :
+						&bma->ip->i_d.di_nextents);
 
 	ASSERT(bma->idx >= 0);
 	ASSERT(bma->idx <= ifp->if_bytes / sizeof(struct xfs_bmbt_rec));
@@ -1699,6 +1708,9 @@ xfs_bmap_add_extent_delay_real(
 #define	RIGHT		r[1]
 #define	PREV		r[2]
 
+	if (whichfork == XFS_COW_FORK)
+		state |= BMAP_COWFORK;
+
 	/*
 	 * Set up a bunch of variables to make the tests simpler.
 	 */
@@ -1785,7 +1797,7 @@ xfs_bmap_add_extent_delay_real(
 		trace_xfs_bmap_post_update(bma->ip, bma->idx, state, _THIS_IP_);
 
 		xfs_iext_remove(bma->ip, bma->idx + 1, 2, state);
-		bma->ip->i_d.di_nextents--;
+		(*nextents)--;
 		if (bma->cur == NULL)
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
 		else {
@@ -1887,7 +1899,7 @@ xfs_bmap_add_extent_delay_real(
 		xfs_bmbt_set_startblock(ep, new->br_startblock);
 		trace_xfs_bmap_post_update(bma->ip, bma->idx, state, _THIS_IP_);
 
-		bma->ip->i_d.di_nextents++;
+		(*nextents)++;
 		if (bma->cur == NULL)
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
 		else {
@@ -1957,7 +1969,7 @@ xfs_bmap_add_extent_delay_real(
 		temp = PREV.br_blockcount - new->br_blockcount;
 		xfs_bmbt_set_blockcount(ep, temp);
 		xfs_iext_insert(bma->ip, bma->idx, 1, new, state);
-		bma->ip->i_d.di_nextents++;
+		(*nextents)++;
 		if (bma->cur == NULL)
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
 		else {
@@ -2041,7 +2053,7 @@ xfs_bmap_add_extent_delay_real(
 		trace_xfs_bmap_pre_update(bma->ip, bma->idx, state, _THIS_IP_);
 		xfs_bmbt_set_blockcount(ep, temp);
 		xfs_iext_insert(bma->ip, bma->idx + 1, 1, new, state);
-		bma->ip->i_d.di_nextents++;
+		(*nextents)++;
 		if (bma->cur == NULL)
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
 		else {
@@ -2110,7 +2122,7 @@ xfs_bmap_add_extent_delay_real(
 		RIGHT.br_blockcount = temp2;
 		/* insert LEFT (r[0]) and RIGHT (r[1]) at the same time */
 		xfs_iext_insert(bma->ip, bma->idx + 1, 2, &LEFT, state);
-		bma->ip->i_d.di_nextents++;
+		(*nextents)++;
 		if (bma->cur == NULL)
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
 		else {
@@ -2208,7 +2220,8 @@ xfs_bmap_add_extent_delay_real(
 
 	xfs_bmap_check_leaf_extents(bma->cur, bma->ip, whichfork);
 done:
-	bma->logflags |= rval;
+	if (whichfork != XFS_COW_FORK)
+		bma->logflags |= rval;
 	return error;
 #undef	LEFT
 #undef	RIGHT
@@ -3848,7 +3861,8 @@ xfs_bmap_btalloc(
 		ASSERT(nullfb || fb_agno == args.agno ||
 		       (ap->dfops->dop_low && fb_agno < args.agno));
 		ap->length = args.len;
-		ap->ip->i_d.di_nblocks += args.len;
+		if (!(ap->flags & XFS_BMAPI_COWFORK))
+			ap->ip->i_d.di_nblocks += args.len;
 		xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
 		if (ap->wasdel)
 			ap->ip->i_delayed_blks -= args.len;
@@ -4322,8 +4336,7 @@ xfs_bmapi_allocate(
 	struct xfs_bmalloca	*bma)
 {
 	struct xfs_mount	*mp = bma->ip->i_mount;
-	int			whichfork = (bma->flags & XFS_BMAPI_ATTRFORK) ?
-						XFS_ATTR_FORK : XFS_DATA_FORK;
+	int			whichfork = xfs_bmapi_whichfork(bma->flags);
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(bma->ip, whichfork);
 	int			tmp_logflags = 0;
 	int			error;
@@ -4412,7 +4425,7 @@ xfs_bmapi_allocate(
 		bma->got.br_state = XFS_EXT_UNWRITTEN;
 
 	if (bma->wasdel)
-		error = xfs_bmap_add_extent_delay_real(bma);
+		error = xfs_bmap_add_extent_delay_real(bma, whichfork);
 	else
 		error = xfs_bmap_add_extent_hole_real(bma, whichfork);
 
@@ -4566,8 +4579,7 @@ xfs_bmapi_write(
 	orig_mval = mval;
 	orig_nmap = *nmap;
 #endif
-	whichfork = (flags & XFS_BMAPI_ATTRFORK) ?
-		XFS_ATTR_FORK : XFS_DATA_FORK;
+	whichfork = xfs_bmapi_whichfork(flags);
 
 	ASSERT(*nmap >= 1);
 	ASSERT(*nmap <= XFS_BMAP_MAX_NMAP);
@@ -4578,6 +4590,11 @@ xfs_bmapi_write(
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 	if (whichfork == XFS_ATTR_FORK)
 		ASSERT(!(flags & XFS_BMAPI_REMAP));
+	if (whichfork == XFS_COW_FORK) {
+		ASSERT(!(flags & XFS_BMAPI_REMAP));
+		ASSERT(!(flags & XFS_BMAPI_PREALLOC));
+		ASSERT(!(flags & XFS_BMAPI_CONVERT));
+	}
 	if (flags & XFS_BMAPI_REMAP) {
 		ASSERT(!(flags & XFS_BMAPI_PREALLOC));
 		ASSERT(!(flags & XFS_BMAPI_CONVERT));
@@ -4647,6 +4664,8 @@ xfs_bmapi_write(
 		 */
 		if (flags & XFS_BMAPI_REMAP)
 			ASSERT(inhole);
+		if (flags & XFS_BMAPI_COWFORK)
+			ASSERT(!inhole);
 
 		/*
 		 * First, deal with the hole before the allocated space


* [PATCH 096/145] xfs: support removing extents from CoW fork
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (94 preceding siblings ...)
  2016-06-17  1:40 ` [PATCH 095/145] xfs: support allocating delayed extents in " Darrick J. Wong
@ 2016-06-17  1:40 ` Darrick J. Wong
  2016-06-17  1:41 ` [PATCH 097/145] xfs: store in-progress CoW allocations in the refcount btree Darrick J. Wong
                   ` (48 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:40 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Create a helper method to remove extents from the CoW fork without
any of the side effects (rmapbt/bmbt updates) of the regular extent
deletion routine.  We'll eventually use this to clear out the CoW fork
during ioend processing.

v2: Use bmapi_read to iterate and trim the CoW extents instead of
reading them raw via the iext code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
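Reviewer-style sketch, not part of the commit: xfs_bunmapi_cow() keeps
the four-way switch from xfs_bmap_del_extent(), keyed on whether the
deleted range shares its start and/or end with the extent it sits in.
The toy below reproduces only the offset arithmetic of those four
cases.  struct toy_ext and toy_trim() are hypothetical names, and the
delalloc indirect-block accounting is deliberately left out.

```c
#include <assert.h>

struct toy_ext {
	unsigned long long	startoff;	/* first file block */
	unsigned long long	blockcount;	/* length in blocks */
};

/*
 * Remove 'del' from 'got'.  'del' must lie entirely inside 'got'.
 * Writes the surviving pieces to out[0..1], returns how many there are.
 */
static int toy_trim(const struct toy_ext *got, const struct toy_ext *del,
		    struct toy_ext out[2])
{
	unsigned long long got_end = got->startoff + got->blockcount;
	unsigned long long del_end = del->startoff + del->blockcount;

	assert(del->blockcount > 0);
	assert(got->startoff <= del->startoff);
	assert(got_end >= del_end);

	/* Start matches is 2, end matches is 1, as in the patch. */
	switch (((got->startoff == del->startoff) << 1) |
		(got_end == del_end)) {
	case 3:		/* matches the whole extent */
		return 0;
	case 2:		/* deleting the first part */
		out[0].startoff = del_end;
		out[0].blockcount = got_end - del_end;
		return 1;
	case 1:		/* deleting the last part */
		out[0].startoff = got->startoff;
		out[0].blockcount = del->startoff - got->startoff;
		return 1;
	default:	/* deleting the middle: two pieces survive */
		out[0].startoff = got->startoff;
		out[0].blockcount = del->startoff - got->startoff;
		out[1].startoff = del_end;
		out[1].blockcount = got_end - del_end;
		return 2;
	}
}
```

Only the middle case grows the extent list, which is why it is the one
case in the patch that calls xfs_iext_insert().
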
 libxfs/xfs_bmap.c |  176 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_bmap.h |    1 
 2 files changed, 177 insertions(+)


diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 18dcd5f..3e4a3e7 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -4973,6 +4973,7 @@ xfs_bmap_del_extent(
 		/*
 		 * Matches the whole extent.  Delete the entry.
 		 */
+		trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_);
 		xfs_iext_remove(ip, *idx, 1,
 				whichfork == XFS_ATTR_FORK ? BMAP_ATTRFORK : 0);
 		--*idx;
@@ -5190,6 +5191,181 @@ done:
 }
 
 /*
+ * xfs_bunmapi_cow() -- Remove the relevant parts of the CoW fork.
+ *			See xfs_bmap_del_extent.
+ * @ip: XFS inode.
+ * @del: Extent to remove; must lie within a single extent of the
+ *	 CoW fork.
+ */
+int
+xfs_bunmapi_cow(
+	xfs_inode_t		*ip,
+	xfs_bmbt_irec_t		*del)
+{
+	xfs_filblks_t		da_new;	/* new delay-alloc indirect blocks */
+	xfs_filblks_t		da_old;	/* old delay-alloc indirect blocks */
+	xfs_fsblock_t		del_endblock = 0;/* first block past del */
+	xfs_fileoff_t		del_endoff;	/* first offset past del */
+	int			delay;	/* current block is delayed allocated */
+	xfs_bmbt_rec_host_t	*ep;	/* current extent entry pointer */
+	int			error;	/* error return value */
+	xfs_bmbt_irec_t		got;	/* current extent entry */
+	xfs_fileoff_t		got_endoff;	/* first offset past got */
+	xfs_ifork_t		*ifp;	/* inode fork pointer */
+	xfs_mount_t		*mp;	/* mount structure */
+	xfs_filblks_t		nblks;	/* quota/sb block count */
+	xfs_bmbt_irec_t		new;	/* new record to be inserted */
+	/* REFERENCED */
+	uint			qfield;	/* quota field to update */
+	xfs_filblks_t		temp;	/* for indirect length calculations */
+	xfs_filblks_t		temp2;	/* for indirect length calculations */
+	int			state = BMAP_COWFORK;
+	int			eof;
+	xfs_extnum_t		eidx;
+
+	mp = ip->i_mount;
+	XFS_STATS_INC(mp, xs_del_exlist);
+
+	ep = xfs_bmap_search_extents(ip, del->br_startoff, XFS_COW_FORK, &eof,
+			&eidx, &got, &new);
+
+	ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
+	ASSERT((eidx >= 0) && (eidx < ifp->if_bytes /
+		(uint)sizeof(xfs_bmbt_rec_t)));
+	ASSERT(del->br_blockcount > 0);
+	ASSERT(got.br_startoff <= del->br_startoff);
+	del_endoff = del->br_startoff + del->br_blockcount;
+	got_endoff = got.br_startoff + got.br_blockcount;
+	ASSERT(got_endoff >= del_endoff);
+	delay = isnullstartblock(got.br_startblock);
+	ASSERT(isnullstartblock(del->br_startblock) == delay);
+	qfield = 0;
+	error = 0;
+	/*
+	 * If deleting a real allocation, must free up the disk space.
+	 */
+	if (!delay) {
+		nblks = del->br_blockcount;
+		qfield = XFS_TRANS_DQ_BCOUNT;
+		/*
+		 * Set up del_endblock for later.
+		 */
+		del_endblock = del->br_startblock + del->br_blockcount;
+		da_old = da_new = 0;
+	} else {
+		da_old = startblockval(got.br_startblock);
+		da_new = 0;
+		nblks = 0;
+	}
+	qfield = qfield;
+	nblks = nblks;
+
+	/*
+	 * Set flag value to use in switch statement.
+	 * Left-contig is 2, right-contig is 1.
+	 */
+	switch (((got.br_startoff == del->br_startoff) << 1) |
+		(got_endoff == del_endoff)) {
+	case 3:
+		/*
+		 * Matches the whole extent.  Delete the entry.
+		 */
+		xfs_iext_remove(ip, eidx, 1, BMAP_COWFORK);
+		--eidx;
+		break;
+
+	case 2:
+		/*
+		 * Deleting the first part of the extent.
+		 */
+		trace_xfs_bmap_pre_update(ip, eidx, state, _THIS_IP_);
+		xfs_bmbt_set_startoff(ep, del_endoff);
+		temp = got.br_blockcount - del->br_blockcount;
+		xfs_bmbt_set_blockcount(ep, temp);
+		if (delay) {
+			temp = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, temp),
+				da_old);
+			xfs_bmbt_set_startblock(ep, nullstartblock((int)temp));
+			trace_xfs_bmap_post_update(ip, eidx, state, _THIS_IP_);
+			da_new = temp;
+			break;
+		}
+		xfs_bmbt_set_startblock(ep, del_endblock);
+		trace_xfs_bmap_post_update(ip, eidx, state, _THIS_IP_);
+		break;
+
+	case 1:
+		/*
+		 * Deleting the last part of the extent.
+		 */
+		temp = got.br_blockcount - del->br_blockcount;
+		trace_xfs_bmap_pre_update(ip, eidx, state, _THIS_IP_);
+		xfs_bmbt_set_blockcount(ep, temp);
+		if (delay) {
+			temp = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, temp),
+				da_old);
+			xfs_bmbt_set_startblock(ep, nullstartblock((int)temp));
+			trace_xfs_bmap_post_update(ip, eidx, state, _THIS_IP_);
+			da_new = temp;
+			break;
+		}
+		trace_xfs_bmap_post_update(ip, eidx, state, _THIS_IP_);
+		break;
+
+	case 0:
+		/*
+		 * Deleting the middle of the extent.
+		 */
+		temp = del->br_startoff - got.br_startoff;
+		trace_xfs_bmap_pre_update(ip, eidx, state, _THIS_IP_);
+		xfs_bmbt_set_blockcount(ep, temp);
+		new.br_startoff = del_endoff;
+		temp2 = got_endoff - del_endoff;
+		new.br_blockcount = temp2;
+		new.br_state = got.br_state;
+		if (!delay) {
+			new.br_startblock = del_endblock;
+		} else {
+			temp = xfs_bmap_worst_indlen(ip, temp);
+			xfs_bmbt_set_startblock(ep, nullstartblock((int)temp));
+			temp2 = xfs_bmap_worst_indlen(ip, temp2);
+			new.br_startblock = nullstartblock((int)temp2);
+			da_new = temp + temp2;
+			while (da_new > da_old) {
+				if (temp) {
+					temp--;
+					da_new--;
+					xfs_bmbt_set_startblock(ep,
+						nullstartblock((int)temp));
+				}
+				if (da_new == da_old)
+					break;
+				if (temp2) {
+					temp2--;
+					da_new--;
+					new.br_startblock =
+						nullstartblock((int)temp2);
+				}
+			}
+		}
+		trace_xfs_bmap_post_update(ip, eidx, state, _THIS_IP_);
+		xfs_iext_insert(ip, eidx + 1, 1, &new, state);
+		++eidx;
+		break;
+	}
+
+	/*
+	 * Account for change in delayed indirect blocks.
+	 * Nothing to do for disk quota accounting here.
+	 */
+	ASSERT(da_old >= da_new);
+	if (da_old > da_new)
+		xfs_mod_fdblocks(mp, (int64_t)(da_old - da_new), false);
+
+	return error;
+}
+
+/*
  * Unmap (remove) blocks from a file.
  * If nexts is nonzero then the number of extents to remove is limited to
  * that value.  If not all extents in the block range can be removed then
diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
index d90f88e..1c7ab70 100644
--- a/libxfs/xfs_bmap.h
+++ b/libxfs/xfs_bmap.h
@@ -221,6 +221,7 @@ int	xfs_bunmapi(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_fileoff_t bno, xfs_filblks_t len, int flags,
 		xfs_extnum_t nexts, xfs_fsblock_t *firstblock,
 		struct xfs_defer_ops *dfops, int *done);
+int	xfs_bunmapi_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *del);
 int	xfs_check_nostate_extents(struct xfs_ifork *ifp, xfs_extnum_t idx,
 		xfs_extnum_t num);
 uint	xfs_default_attroffset(struct xfs_inode *ip);


* [PATCH 097/145] xfs: store in-progress CoW allocations in the refcount btree
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (95 preceding siblings ...)
  2016-06-17  1:40 ` [PATCH 096/145] xfs: support removing extents from " Darrick J. Wong
@ 2016-06-17  1:41 ` Darrick J. Wong
  2016-06-17  1:41 ` [PATCH 098/145] xfs: teach get_bmapx and fiemap about shared extents and the CoW fork Darrick J. Wong
                   ` (47 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:41 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Due to the way the CoW algorithm in XFS works, there's an interval
during which blocks allocated to handle a CoW can be lost -- if the FS
goes down after the blocks are allocated but before the block
remapping takes place.  This is exacerbated by the cowextsz hint --
allocated reservations can sit around for a while, waiting to get
used.

Since the refcount btree doesn't normally store records with refcount
of 1, we can use it to record these in-progress extents.  In-progress
blocks cannot be shared because they're not user-visible, so there
shouldn't be any conflicts with other programs.  This is a better
solution than holding EFIs during writeback because (a) EFIs can't be
relogged currently, (b) even if they could, EFIs are bound by
available log space, which puts an unnecessary upper bound on how much
CoW we can have in flight, and (c) we already have a mechanism to
track blocks.

At mount time, read the refcount records and free anything we find
with a refcount of 1 because those were in-progress when the FS went
down.
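
The mount-time sweep described above can be sketched in plain C.  This
is only an illustration under stated assumptions: the toy_* names and
the flat array are hypothetical stand-ins for the refcount btree, not
the patch's code, and "freeing" is modelled as counting blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for one refcount btree record. */
struct toy_refc_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint32_t	refcount;	/* 1 == in-progress CoW staging extent */
};

#define TOY_MAX_RECS	16

/* Hypothetical flat table standing in for one AG's refcount btree. */
struct toy_refc_table {
	struct toy_refc_rec	recs[TOY_MAX_RECS];
	size_t			nr;
};

/* Append a record; returns 0 on success or -1 if the table is full. */
static int
toy_refc_add(struct toy_refc_table *t, uint32_t bno, uint32_t len,
		uint32_t refcount)
{
	assert(t != NULL);
	if (t->nr == TOY_MAX_RECS)
		return -1;
	t->recs[t->nr].startblock = bno;
	t->recs[t->nr].blockcount = len;
	t->recs[t->nr].refcount = refcount;
	t->nr++;
	return 0;
}

/* Record a CoW staging extent: always stored with a refcount of 1. */
static int
toy_cow_alloc(struct toy_refc_table *t, uint32_t bno, uint32_t len)
{
	return toy_refc_add(t, bno, len, 1);
}

/*
 * Mount-time sweep: drop every refcount == 1 record, keep the shared
 * ones, and return how many blocks would go back to free space.
 */
static uint64_t
toy_recover_cow_leftovers(struct toy_refc_table *t)
{
	uint64_t	freed = 0;
	size_t		i, keep = 0;

	for (i = 0; i < t->nr; i++) {
		if (t->recs[i].refcount == 1) {
			freed += t->recs[i].blockcount;
			continue;
		}
		t->recs[keep++] = t->recs[i];
	}
	t->nr = keep;
	return freed;
}
```

Shared records (refcount of 2 or more) survive the sweep untouched,
which is why the two kinds of record must never be merged.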

v2: Use the deferred operations system to avoid deadlocks and blowing
out the transaction reservation.  This allows us to unmap a CoW
extent from the refcountbt and into a file atomically.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_trace.h   |    4 +
 libxfs/xfs_bmap.c     |   11 ++
 libxfs/xfs_format.h   |    3 
 libxfs/xfs_refcount.c |  321 ++++++++++++++++++++++++++++++++++++++++++++++++-
 libxfs/xfs_refcount.h |    7 +
 5 files changed, 340 insertions(+), 6 deletions(-)


diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index dfc92a6..7022164 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -261,6 +261,10 @@
 #define trace_xfs_bmap_deferred(...)		((void) 0)
 #define trace_xfs_bmap_defer(...)		((void) 0)
 
+#define trace_xfs_refcount_adjust_cow_error(...)	((void) 0)
+#define trace_xfs_refcount_cow_increase(...)	((void) 0)
+#define trace_xfs_refcount_cow_decrease(...)	((void) 0)
+
 /* set c = c to avoid unused var warnings */
 #define trace_xfs_perag_get(a,b,c,d)	((c) = (c))
 #define trace_xfs_perag_get_tag(a,b,c,d) ((c) = (c))
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 3e4a3e7..4b77ed0 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -4697,6 +4697,17 @@ xfs_bmapi_write(
 				goto error0;
 			if (bma.blkno == NULLFSBLOCK)
 				break;
+
+			/*
+			 * If this is a CoW allocation, record the data in
+			 * the refcount btree for orphan recovery.
+			 */
+			if (whichfork == XFS_COW_FORK) {
+				error = xfs_refcount_alloc_cow_extent(mp, dfops,
+						bma.blkno, bma.length);
+				if (error)
+					goto error0;
+			}
 		}
 
 		/* Deal with the allocated space we found.  */
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index fdeaf53..820dfde 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -1401,7 +1401,8 @@ xfs_rmap_ino_owner(
 #define XFS_RMAP_OWN_INOBT	(-6ULL)	/* Inode btree blocks */
 #define XFS_RMAP_OWN_INODES	(-7ULL)	/* Inode chunk */
 #define XFS_RMAP_OWN_REFC	(-8ULL) /* refcount tree */
-#define XFS_RMAP_OWN_MIN	(-9ULL) /* guard */
+#define XFS_RMAP_OWN_COW	(-9ULL) /* cow allocations */
+#define XFS_RMAP_OWN_MIN	(-10ULL) /* guard */
 
 #define XFS_RMAP_NON_INODE_OWNER(owner)	(!!((owner) & (1ULL << 63)))
 
diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index d2b614b..e03a9b7 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -35,13 +35,23 @@
 #include "xfs_trans.h"
 #include "xfs_bit.h"
 #include "xfs_refcount.h"
+#include "xfs_rmap_btree.h"
 
 /* Allowable refcount adjustment amounts. */
 enum xfs_refc_adjust_op {
 	XFS_REFCOUNT_ADJUST_INCREASE	= 1,
 	XFS_REFCOUNT_ADJUST_DECREASE	= -1,
+	XFS_REFCOUNT_ADJUST_COW_ALLOC	= 0,
+	XFS_REFCOUNT_ADJUST_COW_FREE	= -1,
 };
 
+STATIC int __xfs_refcount_cow_alloc(struct xfs_btree_cur *rcur,
+		xfs_agblock_t agbno, xfs_extlen_t aglen,
+		struct xfs_defer_ops *dfops);
+STATIC int __xfs_refcount_cow_free(struct xfs_btree_cur *rcur,
+		xfs_agblock_t agbno, xfs_extlen_t aglen,
+		struct xfs_defer_ops *dfops);
+
 /*
  * Look up the first record less than or equal to [bno, len] in the btree
  * given by cur.
@@ -467,6 +477,8 @@ out_error:
 	return error;
 }
 
+#define XFS_FIND_RCEXT_SHARED	1
+#define XFS_FIND_RCEXT_COW	2
 /*
  * Find the left extent and the one after it (cleft).  This function assumes
  * that we've already split any extent crossing agbno.
@@ -477,7 +489,8 @@ xfs_refcount_find_left_extents(
 	struct xfs_refcount_irec	*left,
 	struct xfs_refcount_irec	*cleft,
 	xfs_agblock_t			agbno,
-	xfs_extlen_t			aglen)
+	xfs_extlen_t			aglen,
+	int				flags)
 {
 	struct xfs_refcount_irec	tmp;
 	int				error;
@@ -497,6 +510,10 @@ xfs_refcount_find_left_extents(
 
 	if (RCNEXT(tmp) != agbno)
 		return 0;
+	if ((flags & XFS_FIND_RCEXT_SHARED) && tmp.rc_refcount < 2)
+		return 0;
+	if ((flags & XFS_FIND_RCEXT_COW) && tmp.rc_refcount > 1)
+		return 0;
 	/* We have a left extent; retrieve (or invent) the next right one */
 	*left = tmp;
 
@@ -553,7 +570,8 @@ xfs_refcount_find_right_extents(
 	struct xfs_refcount_irec	*right,
 	struct xfs_refcount_irec	*cright,
 	xfs_agblock_t			agbno,
-	xfs_extlen_t			aglen)
+	xfs_extlen_t			aglen,
+	int				flags)
 {
 	struct xfs_refcount_irec	tmp;
 	int				error;
@@ -573,6 +591,10 @@ xfs_refcount_find_right_extents(
 
 	if (tmp.rc_startblock != agbno + aglen)
 		return 0;
+	if ((flags & XFS_FIND_RCEXT_SHARED) && tmp.rc_refcount < 2)
+		return 0;
+	if ((flags & XFS_FIND_RCEXT_COW) && tmp.rc_refcount > 1)
+		return 0;
 	/* We have a right extent; retrieve (or invent) the next left one */
 	*right = tmp;
 
@@ -629,6 +651,7 @@ xfs_refcount_merge_extents(
 	xfs_agblock_t		*agbno,
 	xfs_extlen_t		*aglen,
 	enum xfs_refc_adjust_op adjust,
+	int			flags,
 	bool			*shape_changed)
 {
 	struct xfs_refcount_irec	left = {0}, cleft = {0};
@@ -644,11 +667,11 @@ xfs_refcount_merge_extents(
 	 * [right].
 	 */
 	error = xfs_refcount_find_left_extents(cur, &left, &cleft, *agbno,
-			*aglen);
+			*aglen, flags);
 	if (error)
 		return error;
 	error = xfs_refcount_find_right_extents(cur, &right, &cright, *agbno,
-			*aglen);
+			*aglen, flags);
 	if (error)
 		return error;
 
@@ -935,7 +958,7 @@ xfs_refcount_adjust(
 	 */
 	orig_aglen = aglen;
 	error = xfs_refcount_merge_extents(cur, &agbno, &aglen, adj,
-			&shape_changed);
+			XFS_FIND_RCEXT_SHARED, &shape_changed);
 	if (error)
 		goto out_error;
 	if (shape_changed)
@@ -1052,6 +1075,18 @@ xfs_refcount_finish_one(
 		error = xfs_refcount_adjust(rcur, bno, blockcount, adjusted,
 			XFS_REFCOUNT_ADJUST_DECREASE, dfops, NULL);
 		break;
+	case XFS_REFCOUNT_ALLOC_COW:
+		*adjusted = 0;
+		error = __xfs_refcount_cow_alloc(rcur, bno, blockcount, dfops);
+		if (!error)
+			*adjusted = blockcount;
+		break;
+	case XFS_REFCOUNT_FREE_COW:
+		*adjusted = 0;
+		error = __xfs_refcount_cow_free(rcur, bno, blockcount, dfops);
+		if (!error)
+			*adjusted = blockcount;
+		break;
 	default:
 		ASSERT(0);
 		error = -EFSCORRUPTED;
@@ -1237,3 +1272,279 @@ out:
 		trace_xfs_refcount_find_shared_error(mp, agno, error, _RET_IP_);
 	return error;
 }
+
+/*
+ * Recovering CoW Blocks After a Crash
+ *
+ * Due to the way that the copy on write mechanism works, there's a window of
+ * opportunity in which we can lose track of allocated blocks during a crash.
+ * Because CoW uses delayed allocation in the in-core CoW fork, writeback
+ * causes blocks to be allocated and stored in the CoW fork.  The blocks are
+ * no longer in the free space btree but are not otherwise recorded anywhere
+ * until the write completes and the blocks are mapped into the file.  A crash
+ * in between allocation and remapping results in the replacement blocks being
+ * lost.  This situation is exacerbated by the CoW extent size hint because
+ * allocations can hang around for a long time.
+ *
+ * However, there is a place where we can record these allocations before they
+ * become mappings -- the reference count btree.  The btree does not record
+ * extents with refcount == 1, so we can record allocations with a refcount of
+ * 1.  Blocks being used for CoW writeout cannot be shared, so there should be
+ * no conflict with shared block records.  These mappings should be created
+ * when we allocate blocks to the CoW fork and deleted when they're removed
+ * from the CoW fork.
+ *
+ * Minor nit: records for in-progress CoW allocations and records for shared
+ * extents must never be merged, to preserve the property that (except for CoW
+ * allocations) there are no refcount btree entries with refcount == 1.  The
+ * only time this could potentially happen is when unsharing a block that's
+ * adjacent to CoW allocations, so we must be careful to avoid this.
+ *
+ * At mount time we recover lost CoW allocations by searching the refcount
+ * btree for these refcount == 1 mappings.  These represent CoW allocations
+ * that were in progress at the time the filesystem went down, so we can free
+ * them to get the space back.
+ *
+ * This mechanism is superior to creating EFIs for unmapped CoW extents for
+ * several reasons -- first, EFIs pin the tail of the log and would have to be
+ * periodically relogged to avoid filling up the log.  Second, CoW completions
+ * will have to file an EFD and create new EFIs for whatever remains in the
+ * CoW fork; this partially takes care of (1) but extent-size reservations
+ * will have to periodically relog even if there's no writeout in progress.
+ * This can happen if the CoW extent size hint is set, which you really want.
+ * Third, EFIs cannot currently be automatically relogged into newer
+ * transactions to advance the log tail.  Fourth, stuffing the log full of
+ * EFIs places an upper bound on the number of CoW allocations that can be
+ * held filesystem-wide at any given time.  Recording them in the refcount
+ * btree doesn't require us to maintain any state in memory and doesn't pin
+ * the log.
+ */
+/*
+ * Adjust the refcounts of CoW allocations.  These allocations are "magic"
+ * in that they're not referenced anywhere else in the filesystem, so we
+ * stash them in the refcount btree with a refcount of 1 until either file
+ * remapping (or CoW cancellation) happens.
+ */
+STATIC int
+xfs_refcount_adjust_cow_extents(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		aglen,
+	enum xfs_refc_adjust_op	adj,
+	struct xfs_defer_ops	*dfops,
+	struct xfs_owner_info	*oinfo)
+{
+	struct xfs_refcount_irec	ext, tmp;
+	int				error;
+	int				found_rec, found_tmp;
+
+	if (aglen == 0)
+		return 0;
+
+	/* Find any overlapping refcount records */
+	error = xfs_refcountbt_lookup_ge(cur, agbno, &found_rec);
+	if (error)
+		goto out_error;
+	error = xfs_refcountbt_get_rec(cur, &ext, &found_rec);
+	if (error)
+		goto out_error;
+	if (!found_rec) {
+		ext.rc_startblock = cur->bc_mp->m_sb.sb_agblocks;
+		ext.rc_blockcount = 0;
+		ext.rc_refcount = 0;
+	}
+
+	switch (adj) {
+	case XFS_REFCOUNT_ADJUST_COW_ALLOC:
+		/* Adding a CoW reservation, there should be nothing here. */
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp,
+				ext.rc_startblock >= agbno + aglen, out_error);
+
+		tmp.rc_startblock = agbno;
+		tmp.rc_blockcount = aglen;
+		tmp.rc_refcount = 1;
+		trace_xfs_refcount_modify_extent(cur->bc_mp,
+				cur->bc_private.a.agno, &tmp);
+
+		error = xfs_refcountbt_insert(cur, &tmp,
+				&found_tmp);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp,
+				found_tmp == 1, out_error);
+		break;
+	case XFS_REFCOUNT_ADJUST_COW_FREE:
+		/* Removing a CoW reservation, there should be one extent. */
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp,
+			ext.rc_startblock == agbno, out_error);
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp,
+			ext.rc_blockcount == aglen, out_error);
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp,
+			ext.rc_refcount == 1, out_error);
+
+		ext.rc_refcount = 0;
+		trace_xfs_refcount_modify_extent(cur->bc_mp,
+				cur->bc_private.a.agno, &ext);
+		error = xfs_refcountbt_delete(cur, &found_rec);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp,
+				found_rec == 1, out_error);
+		break;
+	default:
+		ASSERT(0);
+	}
+
+	return error;
+out_error:
+	trace_xfs_refcount_modify_extent_error(cur->bc_mp,
+			cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+/*
+ * Add or remove refcount btree entries for CoW reservations.
+ */
+STATIC int
+xfs_refcount_adjust_cow(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		aglen,
+	enum xfs_refc_adjust_op	adj,
+	struct xfs_defer_ops	*dfops)
+{
+	bool			shape_changed;
+	int			error;
+
+	/*
+	 * Ensure that no rcextents cross the boundary of the adjustment range.
+	 */
+	error = xfs_refcount_split_extent(cur, agbno, &shape_changed);
+	if (error)
+		goto out_error;
+
+	error = xfs_refcount_split_extent(cur, agbno + aglen, &shape_changed);
+	if (error)
+		goto out_error;
+
+	/*
+	 * Try to merge with the left or right extents of the range.
+	 */
+	error = xfs_refcount_merge_extents(cur, &agbno, &aglen, adj,
+			XFS_FIND_RCEXT_COW, &shape_changed);
+	if (error)
+		goto out_error;
+
+	/* Now that we've taken care of the ends, adjust the middle extents */
+	error = xfs_refcount_adjust_cow_extents(cur, agbno, aglen, adj,
+			dfops, NULL);
+	if (error)
+		goto out_error;
+
+	return 0;
+
+out_error:
+	trace_xfs_refcount_adjust_cow_error(cur->bc_mp, cur->bc_private.a.agno,
+			error, _RET_IP_);
+	return error;
+}
+
+/*
+ * Record a CoW allocation in the refcount btree.
+ */
+STATIC int
+__xfs_refcount_cow_alloc(
+	struct xfs_btree_cur	*rcur,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		aglen,
+	struct xfs_defer_ops	*dfops)
+{
+	int			error;
+
+	trace_xfs_refcount_cow_increase(rcur->bc_mp, rcur->bc_private.a.agno,
+			agbno, aglen);
+
+	/* Add refcount btree reservation */
+	error = xfs_refcount_adjust_cow(rcur, agbno, aglen,
+			XFS_REFCOUNT_ADJUST_COW_ALLOC, dfops);
+	if (error)
+		return error;
+
+	/* Add rmap entry */
+	if (xfs_sb_version_hasrmapbt(&rcur->bc_mp->m_sb)) {
+		error = xfs_rmap_alloc_defer(rcur->bc_mp, dfops,
+				rcur->bc_private.a.agno,
+				agbno, aglen, XFS_RMAP_OWN_COW);
+		if (error)
+			return error;
+	}
+
+	return error;
+}
+
+/*
+ * Remove a CoW allocation from the refcount btree.
+ */
+STATIC int
+__xfs_refcount_cow_free(
+	struct xfs_btree_cur	*rcur,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		aglen,
+	struct xfs_defer_ops	*dfops)
+{
+	int			error;
+
+	trace_xfs_refcount_cow_decrease(rcur->bc_mp, rcur->bc_private.a.agno,
+			agbno, aglen);
+
+	/* Remove refcount btree reservation */
+	error = xfs_refcount_adjust_cow(rcur, agbno, aglen,
+			XFS_REFCOUNT_ADJUST_COW_FREE, dfops);
+	if (error)
+		return error;
+
+	/* Remove rmap entry */
+	if (xfs_sb_version_hasrmapbt(&rcur->bc_mp->m_sb)) {
+		error = xfs_rmap_free_defer(rcur->bc_mp, dfops,
+				rcur->bc_private.a.agno,
+				agbno, aglen, XFS_RMAP_OWN_COW);
+		if (error)
+			return error;
+	}
+
+	return error;
+}
+
+/* Record a CoW staging extent in the refcount btree. */
+int
+xfs_refcount_alloc_cow_extent(
+	struct xfs_mount		*mp,
+	struct xfs_defer_ops		*dfops,
+	xfs_fsblock_t			fsb,
+	xfs_extlen_t			len)
+{
+	struct xfs_refcount_intent	ri;
+
+	ri.ri_type = XFS_REFCOUNT_ALLOC_COW;
+	ri.ri_startblock = fsb;
+	ri.ri_blockcount = len;
+
+	return __xfs_refcount_add(mp, dfops, &ri);
+}
+
+/* Forget a CoW staging extent in the refcount btree. */
+int
+xfs_refcount_free_cow_extent(
+	struct xfs_mount		*mp,
+	struct xfs_defer_ops		*dfops,
+	xfs_fsblock_t			fsb,
+	xfs_extlen_t			len)
+{
+	struct xfs_refcount_intent	ri;
+
+	ri.ri_type = XFS_REFCOUNT_FREE_COW;
+	ri.ri_startblock = fsb;
+	ri.ri_blockcount = len;
+
+	return __xfs_refcount_add(mp, dfops, &ri);
+}
diff --git a/libxfs/xfs_refcount.h b/libxfs/xfs_refcount.h
index b7b83b8..6665eeb 100644
--- a/libxfs/xfs_refcount.h
+++ b/libxfs/xfs_refcount.h
@@ -57,4 +57,11 @@ extern int xfs_refcount_find_shared(struct xfs_mount *mp, xfs_agnumber_t agno,
 		xfs_agblock_t agbno, xfs_extlen_t aglen, xfs_agblock_t *fbno,
 		xfs_extlen_t *flen, bool find_maximal);
 
+extern int xfs_refcount_alloc_cow_extent(struct xfs_mount *mp,
+		struct xfs_defer_ops *dfops, xfs_fsblock_t fsb,
+		xfs_extlen_t len);
+extern int xfs_refcount_free_cow_extent(struct xfs_mount *mp,
+		struct xfs_defer_ops *dfops, xfs_fsblock_t fsb,
+		xfs_extlen_t len);
+
 #endif	/* __XFS_REFCOUNT_H__ */

^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH 098/145] xfs: teach get_bmapx and fiemap about shared extents and the CoW fork
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (96 preceding siblings ...)
  2016-06-17  1:41 ` [PATCH 097/145] xfs: store in-progress CoW allocations in the refcount btree Darrick J. Wong
@ 2016-06-17  1:41 ` Darrick J. Wong
  2016-06-17  1:41 ` [PATCH 099/145] xfs: support FS_XFLAG_REFLINK on reflink filesystems Darrick J. Wong
                   ` (46 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:41 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Teach xfs_getbmapx how to report shared extents and CoW fork contents,
then modify the FIEMAP formatters to set the appropriate flags.  A
previous version of this patch only modified the fiemap formatter,
which is insufficient.
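
As a rough sketch of how a caller might interpret the per-segment
output flags, the snippet below decodes a bmv_oflags value into a
readable string.  The BMV_OF_* values are copied from the hunk in this
patch; the toy_format_oflags helper and its output format are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Values as defined in libxfs/xfs_fs.h after this patch. */
#define BMV_OF_PREALLOC		0x1	/* segment = unwritten pre-allocation */
#define BMV_OF_DELALLOC		0x2	/* segment = delayed allocation */
#define BMV_OF_LAST		0x4	/* segment is the last in the file */
#define BMV_OF_SHARED		0x8	/* segment shared with another file */

/*
 * Hypothetical helper: write a comma-separated list of the flag names
 * set in oflags into buf.  Output is truncated if buf is too small.
 */
static void
toy_format_oflags(int oflags, char *buf, size_t buflen)
{
	static const struct {
		int		flag;
		const char	*name;
	} names[] = {
		{ BMV_OF_PREALLOC,	"prealloc" },
		{ BMV_OF_DELALLOC,	"delalloc" },
		{ BMV_OF_LAST,		"last" },
		{ BMV_OF_SHARED,	"shared" },
	};
	size_t	i, used = 0;

	assert(buf != NULL && buflen > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		int	n;

		if (!(oflags & names[i].flag))
			continue;
		n = snprintf(buf + used, buflen - used, "%s%s",
				used ? "," : "", names[i].name);
		if (n < 0 || (size_t)n >= buflen - used)
			break;	/* out of room; keep what fits */
		used += (size_t)n;
	}
}
```

A segment that is both the last in the file and shared would be
reported with BMV_OF_LAST and BMV_OF_SHARED set together.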

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_fs.h |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index f291a53..bb70066 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -105,14 +105,16 @@ struct getbmapx {
 #define BMV_IF_PREALLOC		0x4	/* rtn status BMV_OF_PREALLOC if req */
 #define BMV_IF_DELALLOC		0x8	/* rtn status BMV_OF_DELALLOC if req */
 #define BMV_IF_NO_HOLES		0x10	/* Do not return holes */
+#define BMV_IF_COWFORK		0x20	/* return CoW fork rather than data */
 #define BMV_IF_VALID	\
 	(BMV_IF_ATTRFORK|BMV_IF_NO_DMAPI_READ|BMV_IF_PREALLOC|	\
-	 BMV_IF_DELALLOC|BMV_IF_NO_HOLES)
+	 BMV_IF_DELALLOC|BMV_IF_NO_HOLES|BMV_IF_COWFORK)
 
 /*	bmv_oflags values - returned for each non-header segment */
 #define BMV_OF_PREALLOC		0x1	/* segment = unwritten pre-allocation */
 #define BMV_OF_DELALLOC		0x2	/* segment = delayed allocation */
 #define BMV_OF_LAST		0x4	/* segment is the last in the file */
+#define BMV_OF_SHARED		0x8	/* segment shared with another file */
 
 /*
  * Structure for XFS_IOC_FSSETDM.

^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH 099/145] xfs: support FS_XFLAG_REFLINK on reflink filesystems
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (97 preceding siblings ...)
  2016-06-17  1:41 ` [PATCH 098/145] xfs: teach get_bmapx and fiemap about shared extents and the CoW fork Darrick J. Wong
@ 2016-06-17  1:41 ` Darrick J. Wong
  2016-06-17  1:41 ` [PATCH 100/145] xfs: create a separate cow extent size hint for the allocator Darrick J. Wong
                   ` (45 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:41 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Add support for reporting the "reflink" inode flag in the XFS-specific
getxflags ioctl, and allow the user to clear the flag if file size is
zero.

v2: Move the reflink flag out of the way of the DAX flag, and add the
new cowextsize flag.

v3: Do not report (or allow changes to) FS_NOCOW_FL, since we don't
support a flag to prevent CoWing and the reflink flag is a poor
proxy.  We'll try to design away the need for the NOCOW flag.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/darwin.h  |    2 ++
 include/freebsd.h |    2 ++
 include/irix.h    |    2 ++
 include/linux.h   |    2 ++
 4 files changed, 8 insertions(+)


diff --git a/include/darwin.h b/include/darwin.h
index 2935b4c..abb2a22 100644
--- a/include/darwin.h
+++ b/include/darwin.h
@@ -314,6 +314,8 @@ struct fsxattr {
 #define FS_XFLAG_NODEFRAG	0x00002000	/* do not defragment */
 #define FS_XFLAG_FILESTREAM	0x00004000	/* use filestream allocator */
 #define FS_XFLAG_DAX		0x00008000	/* use DAX for IO */
+#define FS_XFLAG_REFLINK	0x00010000	/* file is reflinked */
+#define FS_XFLAG_COWEXTSIZE	0x00020000	/* CoW extent size allocator hint */
 #define FS_XFLAG_HASATTR	0x80000000	/* no DIFLAG for this	*/
 
 #define FS_IOC_FSGETXATTR     _IOR ('X', 31, struct fsxattr)
diff --git a/include/freebsd.h b/include/freebsd.h
index 3feca07..fc58a74 100644
--- a/include/freebsd.h
+++ b/include/freebsd.h
@@ -204,6 +204,8 @@ struct fsxattr {
 #define FS_XFLAG_NODEFRAG	0x00002000	/* do not defragment */
 #define FS_XFLAG_FILESTREAM	0x00004000	/* use filestream allocator */
 #define FS_XFLAG_DAX		0x00008000	/* use DAX for IO */
+#define FS_XFLAG_REFLINK	0x00010000	/* file is reflinked */
+#define FS_XFLAG_COWEXTSIZE	0x00020000	/* CoW extent size allocator hint */
 #define FS_XFLAG_HASATTR	0x80000000	/* no DIFLAG for this	*/
 
 #define FS_IOC_FSGETXATTR     _IOR ('X', 31, struct fsxattr)
diff --git a/include/irix.h b/include/irix.h
index 45c8594..c4d25b5 100644
--- a/include/irix.h
+++ b/include/irix.h
@@ -449,6 +449,8 @@ struct fsxattr {
 #define FS_XFLAG_NODEFRAG	0x00002000	/* do not defragment */
 #define FS_XFLAG_FILESTREAM	0x00004000	/* use filestream allocator */
 #define FS_XFLAG_DAX		0x00008000	/* use DAX for IO */
+#define FS_XFLAG_REFLINK	0x00010000	/* file is reflinked */
+#define FS_XFLAG_COWEXTSIZE	0x00020000	/* CoW extent size allocator hint */
 #define FS_XFLAG_HASATTR	0x80000000	/* no DIFLAG for this	*/
 
 #define FS_IOC_FSGETXATTR		F_FSGETXATTR
diff --git a/include/linux.h b/include/linux.h
index cd4b3eb..d47a29c 100644
--- a/include/linux.h
+++ b/include/linux.h
@@ -207,6 +207,8 @@ struct fsxattr {
 #define FS_XFLAG_NODEFRAG	0x00002000	/* do not defragment */
 #define FS_XFLAG_FILESTREAM	0x00004000	/* use filestream allocator */
 #define FS_XFLAG_DAX		0x00008000	/* use DAX for IO */
+#define FS_XFLAG_REFLINK	0x00010000	/* file is reflinked */
+#define FS_XFLAG_COWEXTSIZE	0x00020000	/* CoW extent size allocator hint */
 #define FS_XFLAG_HASATTR	0x80000000	/* no DIFLAG for this	*/
 
 #define FS_IOC_FSGETXATTR     _IOR ('X', 31, struct fsxattr)

^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH 100/145] xfs: create a separate cow extent size hint for the allocator
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (98 preceding siblings ...)
  2016-06-17  1:41 ` [PATCH 099/145] xfs: support FS_XFLAG_REFLINK on reflink filesystems Darrick J. Wong
@ 2016-06-17  1:41 ` Darrick J. Wong
  2016-06-17  1:41 ` [PATCH 101/145] xfs: preallocate blocks for worst-case btree expansion Darrick J. Wong
                   ` (44 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:41 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Create a per-inode extent size allocator hint for copy-on-write.  This
hint is separate from the existing extent size hint so that CoW can
take advantage of the fragmentation-reducing properties of extent size
hints without disabling delalloc for regular writes.

The extent size hint that's fed to the allocator during a copy on
write operation is the greater of the cowextsize and regular extsize
hint.

During reflink, if we're sharing the entire source file to the entire
destination file and the destination file doesn't already have a
cowextsize hint, propagate the source file's cowextsize hint to the
destination file.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/darwin.h        |    3 ++-
 include/freebsd.h       |    3 ++-
 include/irix.h          |    3 ++-
 include/linux.h         |    3 ++-
 libxfs/libxfs_priv.h    |    1 +
 libxfs/xfs_bmap.c       |   13 +++++++++++--
 libxfs/xfs_format.h     |    3 ++-
 libxfs/xfs_fs.h         |    3 ++-
 libxfs/xfs_inode_buf.c  |    4 +++-
 libxfs/xfs_inode_buf.h  |    1 +
 libxfs/xfs_log_format.h |    3 ++-
 11 files changed, 30 insertions(+), 10 deletions(-)


diff --git a/include/darwin.h b/include/darwin.h
index abb2a22..2cbd058 100644
--- a/include/darwin.h
+++ b/include/darwin.h
@@ -293,7 +293,8 @@ struct fsxattr {
 	__u32		fsx_extsize;	/* extsize field value (get/set)*/
 	__u32		fsx_nextents;	/* nextents field value (get)	*/
 	__u32		fsx_projid;	/* project identifier (get/set) */
-	unsigned char	fsx_pad[12];
+	__u32		fsx_cowextsize;	/* cow extsize field value (get/set) */
+	unsigned char	fsx_pad[8];
 };
 
 /*
diff --git a/include/freebsd.h b/include/freebsd.h
index fc58a74..1d84b14 100644
--- a/include/freebsd.h
+++ b/include/freebsd.h
@@ -183,7 +183,8 @@ struct fsxattr {
 	__u32		fsx_extsize;	/* extsize field value (get/set)*/
 	__u32		fsx_nextents;	/* nextents field value (get)	*/
 	__u32		fsx_projid;	/* project identifier (get/set) */
-	unsigned char	fsx_pad[12];
+	__u32		fsx_cowextsize;	/* cow extsize field value (get/set) */
+	unsigned char	fsx_pad[8];
 };
 
 /*
diff --git a/include/irix.h b/include/irix.h
index c4d25b5..0cf796a 100644
--- a/include/irix.h
+++ b/include/irix.h
@@ -428,7 +428,8 @@ struct fsxattr {
 	__u32		fsx_extsize;	/* extsize field value (get/set)*/
 	__u32		fsx_nextents;	/* nextents field value (get)	*/
 	__u32		fsx_projid;	/* project identifier (get/set) */
-	unsigned char	fsx_pad[12];
+	__u32		fsx_cowextsize;	/* cow extsize field value (get/set) */
+	unsigned char	fsx_pad[8];
 };
 
 /*
diff --git a/include/linux.h b/include/linux.h
index d47a29c..2aeecd0 100644
--- a/include/linux.h
+++ b/include/linux.h
@@ -186,7 +186,8 @@ struct fsxattr {
 	__u32		fsx_extsize;	/* extsize field value (get/set)*/
 	__u32		fsx_nextents;	/* nextents field value (get)	*/
 	__u32		fsx_projid;	/* project identifier (get/set) */
-	unsigned char	fsx_pad[12];
+	__u32		fsx_cowextsize;	/* cow extsize field value (get/set) */
+	unsigned char	fsx_pad[8];
 };
 
 /*
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index ba16544..41f6b96 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -426,6 +426,7 @@ do { \
 #define xfs_rotorstep				1
 #define xfs_bmap_rtalloc(a)			(-ENOSYS)
 #define xfs_get_extsz_hint(ip)			(0)
+#define xfs_get_cowextsz_hint(ip)		(0)
 #define xfs_inode_is_filestream(ip)		(0)
 #define xfs_filestream_lookup_ag(ip)		(0)
 #define xfs_filestream_new_ag(ip,ag)		(0)
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 4b77ed0..e81ade5 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -3657,7 +3657,13 @@ xfs_bmap_btalloc(
 	else if (mp->m_dalign)
 		stripe_align = mp->m_dalign;
 
-	align = ap->userdata ? xfs_get_extsz_hint(ap->ip) : 0;
+	if (ap->userdata) {
+		if (ap->flags & XFS_BMAPI_COWFORK)
+			align = xfs_get_cowextsz_hint(ap->ip);
+		else
+			align = xfs_get_extsz_hint(ap->ip);
+	} else
+		align = 0;
 	if (unlikely(align)) {
 		error = xfs_bmap_extsize_align(mp, &ap->got, &ap->prev,
 						align, 0, ap->eof, 0, ap->conv,
@@ -4170,7 +4176,10 @@ xfs_bmapi_reserve_delalloc(
 		alen = XFS_FILBLKS_MIN(alen, got->br_startoff - aoff);
 
 	/* Figure out the extent size, adjust alen */
-	extsz = xfs_get_extsz_hint(ip);
+	if (whichfork == XFS_COW_FORK)
+		extsz = xfs_get_cowextsz_hint(ip);
+	else
+		extsz = xfs_get_extsz_hint(ip);
 	if (extsz) {
 		error = xfs_bmap_extsize_align(mp, got, prev, extsz, rt, eof,
 					       1, 0, &aoff, &alen);
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 820dfde..de29220 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -890,7 +890,8 @@ typedef struct xfs_dinode {
 	__be64		di_changecount;	/* number of attribute changes */
 	__be64		di_lsn;		/* flush sequence */
 	__be64		di_flags2;	/* more random flags */
-	__u8		di_pad2[16];	/* more padding for future expansion */
+	__be32		di_cowextsize;	/* basic cow extent size for file */
+	__u8		di_pad2[12];	/* more padding for future expansion */
 
 	/* fields only written to during inode creation */
 	xfs_timestamp_t	di_crtime;	/* time created */
diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index bb70066..df58c1c 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -302,7 +302,8 @@ typedef struct xfs_bstat {
 #define	bs_projid	bs_projid_lo	/* (previously just bs_projid)	*/
 	__u16		bs_forkoff;	/* inode fork offset in bytes	*/
 	__u16		bs_projid_hi;	/* higher part of project id	*/
-	unsigned char	bs_pad[10];	/* pad space, unused		*/
+	unsigned char	bs_pad[6];	/* pad space, unused		*/
+	__u32		bs_cowextsize;	/* cow extent size		*/
 	__u32		bs_dmevmask;	/* DMIG event mask		*/
 	__u16		bs_dmstate;	/* DMIG state info		*/
 	__u16		bs_aextents;	/* attribute number of extents	*/
diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index 572c101..8a804e2 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -265,6 +265,7 @@ xfs_inode_from_disk(
 		to->di_crtime.t_sec = be32_to_cpu(from->di_crtime.t_sec);
 		to->di_crtime.t_nsec = be32_to_cpu(from->di_crtime.t_nsec);
 		to->di_flags2 = be64_to_cpu(from->di_flags2);
+		to->di_cowextsize = be32_to_cpu(from->di_cowextsize);
 	}
 }
 
@@ -314,7 +315,7 @@ xfs_inode_to_disk(
 		to->di_crtime.t_sec = cpu_to_be32(from->di_crtime.t_sec);
 		to->di_crtime.t_nsec = cpu_to_be32(from->di_crtime.t_nsec);
 		to->di_flags2 = cpu_to_be64(from->di_flags2);
-
+		to->di_cowextsize = cpu_to_be32(from->di_cowextsize);
 		to->di_ino = cpu_to_be64(ip->i_ino);
 		to->di_lsn = cpu_to_be64(lsn);
 		memset(to->di_pad2, 0, sizeof(to->di_pad2));
@@ -366,6 +367,7 @@ xfs_log_dinode_to_disk(
 		to->di_crtime.t_sec = cpu_to_be32(from->di_crtime.t_sec);
 		to->di_crtime.t_nsec = cpu_to_be32(from->di_crtime.t_nsec);
 		to->di_flags2 = cpu_to_be64(from->di_flags2);
+		to->di_cowextsize = cpu_to_be32(from->di_cowextsize);
 		to->di_ino = cpu_to_be64(from->di_ino);
 		to->di_lsn = cpu_to_be64(from->di_lsn);
 		memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
diff --git a/libxfs/xfs_inode_buf.h b/libxfs/xfs_inode_buf.h
index 958c543..6848a0a 100644
--- a/libxfs/xfs_inode_buf.h
+++ b/libxfs/xfs_inode_buf.h
@@ -47,6 +47,7 @@ struct xfs_icdinode {
 	__uint16_t	di_flags;	/* random flags, XFS_DIFLAG_... */
 
 	__uint64_t	di_flags2;	/* more random flags */
+	__uint32_t	di_cowextsize;	/* basic cow extent size for file */
 
 	xfs_ictimestamp_t di_crtime;	/* time created */
 };
diff --git a/libxfs/xfs_log_format.h b/libxfs/xfs_log_format.h
index 320a305..9cab67f 100644
--- a/libxfs/xfs_log_format.h
+++ b/libxfs/xfs_log_format.h
@@ -423,7 +423,8 @@ struct xfs_log_dinode {
 	__uint64_t	di_changecount;	/* number of attribute changes */
 	xfs_lsn_t	di_lsn;		/* flush sequence */
 	__uint64_t	di_flags2;	/* more random flags */
-	__uint8_t	di_pad2[16];	/* more padding for future expansion */
+	__uint32_t	di_cowextsize;	/* basic cow extent size for file */
+	__uint8_t	di_pad2[12];	/* more padding for future expansion */
 
 	/* fields only written to during inode creation */
 	xfs_ictimestamp_t di_crtime;	/* time created */

^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH 101/145] xfs: preallocate blocks for worst-case btree expansion
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (99 preceding siblings ...)
  2016-06-17  1:41 ` [PATCH 100/145] xfs: create a separate cow extent size hint for the allocator Darrick J. Wong
@ 2016-06-17  1:41 ` Darrick J. Wong
  2016-06-17  1:41 ` [PATCH 102/145] xfs: try other AGs to allocate a BMBT block Darrick J. Wong
                   ` (43 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:41 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Christoph Hellwig, xfs

To gracefully handle the situation where a CoW operation turns a
single refcount extent into a lot of tiny ones and then runs out of
space when a tree split has to happen, use the per-AG reserved block
pool to pre-allocate all the space we'll ever need for a maximal
btree.  For a 4K block size, this only costs an overhead of 0.3% of
available disk space.
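
As a rough illustration of what "a maximal btree" means here, the
sketch below counts the blocks needed to index a given number of
records when every block is only as full as it is allowed to be.  It
is not the libxfs code; the function name and the two minimum-fanout
parameters are hypothetical stand-ins for the per-filesystem limits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative only: estimate how many blocks a btree needs to index
 * nr_records when every block holds the minimum number of entries.
 * min_leaf_recs and min_node_keys are assumed inputs, not real libxfs
 * fields.
 */
uint64_t
worst_case_btree_blocks(
	unsigned int	min_leaf_recs,
	unsigned int	min_node_keys,
	uint64_t	nr_records)
{
	uint64_t	blocks = 0;
	uint64_t	items = nr_records;
	unsigned int	fanout = min_leaf_recs;

	assert(min_leaf_recs > 1 && min_node_keys > 1);

	if (nr_records == 0)
		return 0;

	/* Walk up the tree one level at a time until one root remains. */
	do {
		/* Blocks needed at this level, rounding up. */
		items = (items + fanout - 1) / fanout;
		blocks += items;
		/* Every level above the leaves uses the node fanout. */
		fanout = min_node_keys;
	} while (items > 1);

	return blocks;
}
```

With a minimum fanout of 10 at every level, 1000 records need
100 + 10 + 1 = 111 blocks.  Passing the number of blocks in an AG as
nr_records gives an upper bound of the kind the reservation is sized
for.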

When reflink is enabled, we have an unfortunate problem with rmap --
since we can share a block billions of times, this means that the
reverse mapping btree can expand basically infinitely.  When an AG is
so full that there are no free blocks with which to expand the rmapbt,
the filesystem will shut down hard.

This is rather annoying to the user, so use the AG reservation code to
reserve a "reasonable" amount of space for rmap.  We'll prevent
reflinks and CoW operations if we think we're getting close to
exhausting an AG's free space rather than shutting down, but this
permanent reservation should be enough for "most" users.  Hopefully.
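
A minimal sketch of the ask/used bookkeeping, assuming the rmapbt rule
from this patch (the larger of 1% of the AG or the maximal tree size).
The struct and function names are hypothetical, and the real
accounting lives in the per-AG reservation code rather than in a
helper like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one AG's reservation bookkeeping. */
struct ag_resv_sketch {
	uint32_t	ask;	/* blocks we want set aside in total */
	uint32_t	used;	/* blocks the btrees already occupy */
};

/* Add one btree's demand: the larger of 1% of the AG or its max size. */
void
resv_add_btree(
	struct ag_resv_sketch	*resv,
	uint32_t		agblocks,
	uint32_t		max_tree_blocks,
	uint32_t		cur_tree_blocks)
{
	uint32_t		pool = agblocks / 100;

	assert(resv != NULL);

	if (max_tree_blocks > pool)
		pool = max_tree_blocks;
	resv->ask += pool;
	resv->used += cur_tree_blocks;
}

/* Blocks that still have to be withheld from ordinary allocations. */
uint32_t
resv_still_needed(
	const struct ag_resv_sketch	*resv)
{
	assert(resv != NULL);

	return resv->used >= resv->ask ? 0 : resv->ask - resv->used;
}
```

The point of tracking "used" separately is that blocks the tree
already owns count against the reservation, so only the difference
needs to be held back from the free space counters.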

v2: Simplify the return value from xfs_perag_pool_free_block to a bool
so that we can easily call xfs_trans_binval for both the per-AG pool
and the real freeing case.  Without this we fail to invalidate the
btree buffer and will trip over the write verifier on a shrinking
refcount btree.

v3: Convert to the new per-AG reservation code.

v4: Combine this patch with the one that adds the rmapbt reservation,
since the rmapbt reservation is only needed for reflink filesystems.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch@lst.de: ensure that we invalidate the freed btree buffer]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_ag_resv.c        |   11 ++++++
 libxfs/xfs_alloc.c          |    2 -
 libxfs/xfs_refcount_btree.c |   61 ++++++++++++++++++++++++++++++---
 libxfs/xfs_refcount_btree.h |    3 ++
 libxfs/xfs_rmap_btree.c     |   80 +++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rmap_btree.h     |    7 ++++
 6 files changed, 157 insertions(+), 7 deletions(-)


diff --git a/libxfs/xfs_ag_resv.c b/libxfs/xfs_ag_resv.c
index 03413e4..dd899d4 100644
--- a/libxfs/xfs_ag_resv.c
+++ b/libxfs/xfs_ag_resv.c
@@ -37,6 +37,7 @@
 #include "xfs_trans_space.h"
 #include "xfs_rmap_btree.h"
 #include "xfs_btree.h"
+#include "xfs_refcount_btree.h"
 
 /*
  * Per-AG Block Reservations
@@ -224,6 +225,11 @@ xfs_ag_resv_init(
 	/* Create the metadata reservation. */
 	ask = used = 0;
 
+	err2 = xfs_refcountbt_calc_reserves(pag->pag_mount, pag->pag_agno,
+			&ask, &used);
+	if (err2 && !error)
+		error = err2;
+
 	err2 = __xfs_ag_resv_init(pag, XFS_AG_RESV_METADATA, ask, used);
 	if (err2 && !error)
 		error = err2;
@@ -235,6 +241,11 @@ init_agfl:
 	/* Create the AGFL metadata reservation */
 	ask = used = 0;
 
+	err2 = xfs_rmapbt_calc_reserves(pag->pag_mount, pag->pag_agno,
+			&ask, &used);
+	if (err2 && !error)
+		error = err2;
+
 	err2 = __xfs_ag_resv_init(pag, XFS_AG_RESV_AGFL, ask, used);
 	if (err2 && !error)
 		error = err2;
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index ca3e7ce..33087e6 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -135,8 +135,6 @@ xfs_alloc_ag_max_usable(struct xfs_mount *mp)
 		/* rmap root block + full tree split on full AG */
 		blocks += 1 + (2 * mp->m_ag_maxlevels) - 1;
 	}
-	if (xfs_sb_version_hasreflink(&mp->m_sb))
-		blocks += xfs_refcountbt_max_size(mp);
 
 	return mp->m_sb.sb_agblocks - blocks;
 }
diff --git a/libxfs/xfs_refcount_btree.c b/libxfs/xfs_refcount_btree.c
index 8c1cba9..1b3ba07 100644
--- a/libxfs/xfs_refcount_btree.c
+++ b/libxfs/xfs_refcount_btree.c
@@ -75,6 +75,8 @@ xfs_refcountbt_alloc_block(
 	struct xfs_alloc_arg	args;		/* block allocation args */
 	int			error;		/* error return value */
 
+	XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY);
+
 	memset(&args, 0, sizeof(args));
 	args.tp = cur->bc_tp;
 	args.mp = cur->bc_mp;
@@ -84,6 +86,7 @@ xfs_refcountbt_alloc_block(
 	args.firstblock = args.fsbno;
 	xfs_rmap_ag_owner(&args.oinfo, XFS_RMAP_OWN_REFC);
 	args.minlen = args.maxlen = args.prod = 1;
+	args.resv = XFS_AG_RESV_METADATA;
 
 	error = xfs_alloc_vextent(&args);
 	if (error)
@@ -115,17 +118,20 @@ xfs_refcountbt_free_block(
 	struct xfs_buf		*bp)
 {
 	struct xfs_mount	*mp = cur->bc_mp;
-	struct xfs_trans	*tp = cur->bc_tp;
 	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
 	struct xfs_owner_info	oinfo;
+	int			error;
 
 	trace_xfs_refcountbt_free_block(cur->bc_mp, cur->bc_private.a.agno,
 			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1);
 	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_REFC);
-	xfs_bmap_add_free(mp, cur->bc_private.a.dfops, fsbno, 1,
-			&oinfo);
-	xfs_trans_binval(tp, bp);
-	return 0;
+	error = xfs_free_extent(cur->bc_tp, fsbno, 1, &oinfo,
+			XFS_AG_RESV_METADATA);
+	if (error)
+		return error;
+
+	xfs_trans_binval(cur->bc_tp, bp);
+	return error;
 }
 
 STATIC int
@@ -395,3 +401,48 @@ xfs_refcountbt_max_size(
 
 	return xfs_refcountbt_calc_size(mp, mp->m_sb.sb_agblocks);
 }
+
+/* Count the blocks in the reference count tree. */
+static int
+xfs_refcountbt_count_blocks(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_extlen_t		*tree_blocks)
+{
+	struct xfs_buf		*agbp;
+	struct xfs_btree_cur	*cur;
+	int			error;
+
+	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
+	if (error)
+		return error;
+	cur = xfs_refcountbt_init_cursor(mp, NULL, agbp, agno, NULL);
+	error = xfs_btree_count_blocks(cur, tree_blocks);
+	xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
+	xfs_trans_brelse(NULL, agbp);
+
+	return error;
+}
+
+/*
+ * Figure out how many blocks to reserve and how many are used by this btree.
+ */
+int
+xfs_refcountbt_calc_reserves(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_extlen_t		*ask,
+	xfs_extlen_t		*used)
+{
+	xfs_extlen_t		tree_len = 0;
+	int			error;
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return 0;
+
+	*ask += xfs_refcountbt_max_size(mp);
+	error = xfs_refcountbt_count_blocks(mp, agno, &tree_len);
+	*used += tree_len;
+
+	return error;
+}
diff --git a/libxfs/xfs_refcount_btree.h b/libxfs/xfs_refcount_btree.h
index 780b02f..3be7768 100644
--- a/libxfs/xfs_refcount_btree.h
+++ b/libxfs/xfs_refcount_btree.h
@@ -68,4 +68,7 @@ extern xfs_extlen_t xfs_refcountbt_calc_size(struct xfs_mount *mp,
 		unsigned long long len);
 extern xfs_extlen_t xfs_refcountbt_max_size(struct xfs_mount *mp);
 
+extern int xfs_refcountbt_calc_reserves(struct xfs_mount *mp,
+		xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used);
+
 #endif	/* __XFS_REFCOUNT_BTREE_H__ */
diff --git a/libxfs/xfs_rmap_btree.c b/libxfs/xfs_rmap_btree.c
index 3fedfd7..0b7da82 100644
--- a/libxfs/xfs_rmap_btree.c
+++ b/libxfs/xfs_rmap_btree.c
@@ -32,6 +32,7 @@
 #include "xfs_rmap_btree.h"
 #include "xfs_trace.h"
 #include "xfs_cksum.h"
+#include "xfs_ag_resv.h"
 
 /*
  * Reverse map btree.
@@ -58,6 +59,14 @@
  * try to recover tree and file data from corrupt primary metadata.
  */
 
+static bool
+xfs_rmapbt_need_reserve(
+	struct xfs_mount	*mp)
+{
+	return  xfs_sb_version_hasrmapbt(&mp->m_sb) &&
+		xfs_sb_version_hasreflink(&mp->m_sb);
+}
+
 static struct xfs_btree_cur *
 xfs_rmapbt_dup_cursor(
 	struct xfs_btree_cur	*cur)
@@ -479,3 +488,74 @@ xfs_rmapbt_compute_maxlevels(
 		mp->m_rmap_maxlevels = xfs_btree_compute_maxlevels(mp,
 				mp->m_rmap_mnr, mp->m_sb.sb_agblocks);
 }
+
+/* Calculate the rmap btree size for some records. */
+xfs_extlen_t
+xfs_rmapbt_calc_size(
+	struct xfs_mount	*mp,
+	unsigned long long	len)
+{
+	return xfs_btree_calc_size(mp, mp->m_rmap_mnr, len);
+}
+
+/*
+ * Calculate the maximum rmap btree size.
+ */
+xfs_extlen_t
+xfs_rmapbt_max_size(
+	struct xfs_mount	*mp)
+{
+	/* Bail out if we're uninitialized, which can happen in mkfs. */
+	if (mp->m_rmap_mxr[0] == 0)
+		return 0;
+
+	return xfs_rmapbt_calc_size(mp, mp->m_sb.sb_agblocks);
+}
+
+/* Count the blocks in the reverse mapping tree. */
+static int
+xfs_rmapbt_count_blocks(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_extlen_t		*tree_blocks)
+{
+	struct xfs_buf		*agbp;
+	struct xfs_btree_cur	*cur;
+	int			error;
+
+	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
+	if (error)
+		return error;
+	cur = xfs_rmapbt_init_cursor(mp, NULL, agbp, agno);
+	error = xfs_btree_count_blocks(cur, tree_blocks);
+	xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
+	xfs_trans_brelse(NULL, agbp);
+
+	return error;
+}
+
+/*
+ * Figure out how many blocks to reserve and how many are used by this btree.
+ */
+int
+xfs_rmapbt_calc_reserves(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_extlen_t		*ask,
+	xfs_extlen_t		*used)
+{
+	xfs_extlen_t		pool_len;
+	xfs_extlen_t		tree_len = 0;
+	int			error;
+
+	if (!xfs_rmapbt_need_reserve(mp))
+		return 0;
+
+	/* Reserve 1% of the AG or enough for 1 block per record. */
+	pool_len = max(mp->m_sb.sb_agblocks / 100, xfs_rmapbt_max_size(mp));
+	*ask += pool_len;
+	error = xfs_rmapbt_count_blocks(mp, agno, &tree_len);
+	*used += tree_len;
+
+	return error;
+}
diff --git a/libxfs/xfs_rmap_btree.h b/libxfs/xfs_rmap_btree.h
index 5df406e..f398e8b 100644
--- a/libxfs/xfs_rmap_btree.h
+++ b/libxfs/xfs_rmap_btree.h
@@ -130,4 +130,11 @@ int xfs_rmap_finish_one(struct xfs_trans *tp, enum xfs_rmap_intent_type type,
 		xfs_fsblock_t startblock, xfs_filblks_t blockcount,
 		xfs_exntst_t state, struct xfs_btree_cur **pcur);
 
+extern xfs_extlen_t xfs_rmapbt_calc_size(struct xfs_mount *mp,
+		unsigned long long len);
+extern xfs_extlen_t xfs_rmapbt_max_size(struct xfs_mount *mp);
+
+extern int xfs_rmapbt_calc_reserves(struct xfs_mount *mp,
+		xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used);
+
 #endif	/* __XFS_RMAP_BTREE_H__ */


* [PATCH 102/145] xfs: try other AGs to allocate a BMBT block
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (100 preceding siblings ...)
  2016-06-17  1:41 ` [PATCH 101/145] xfs: preallocate blocks for worst-case btree expansion Darrick J. Wong
@ 2016-06-17  1:41 ` Darrick J. Wong
  2016-06-17  1:41 ` [PATCH 103/145] xfs: provide switch to force filesystem to copy-on-write all the time Darrick J. Wong
                   ` (42 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:41 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Prior to the introduction of reflink, allocating a block and mapping
it into a file was performed in a single transaction with a single
block reservation, and the allocator was supposed to find enough
blocks to allocate the extent and any BMBT blocks that might be
necessary (unless we're low on space).

However, due to the way copy on write works, allocation and mapping
have been split into two transactions, which means that we must be
able to handle the case where we allocate an extent for CoW but that
AG runs out of free space before the blocks can be mapped into a file,
and the mapping requires a new BMBT block.  When this happens, look in
one of the other AGs for a BMBT block instead of taking the FS down.

The same applies to the functions that convert a data fork from local
to extents format and from extents to btree format.
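
The retry can be pictured with a toy allocator, sketched below.  All
names are hypothetical, the per-AG free counters stand in for the real
allocator, and the wrap-around scan order is a simplification rather
than the actual XFS search policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_BLOCK	UINT32_MAX	/* hypothetical "allocation failed" */

/*
 * Take one block for a BMBT node.  Returns the AG that supplied it, or
 * NO_BLOCK if every AG is empty.  free_blocks[] is a toy per-AG free
 * block counter.
 */
uint32_t
alloc_bmbt_block_sketch(
	uint32_t	*free_blocks,
	uint32_t	nr_ags,
	uint32_t	near_ag,
	uint32_t	first_ag,
	bool		*low_space)
{
	uint32_t	i;

	assert(near_ag < nr_ags && first_ag < nr_ags);

	/* First attempt: stay close to the extent being mapped. */
	if (free_blocks[near_ag] > 0) {
		free_blocks[near_ag]--;
		return near_ag;
	}

	/* That AG is full: enter low-space mode and widen the search. */
	*low_space = true;
	for (i = 0; i < nr_ags; i++) {
		uint32_t	ag = (first_ag + i) % nr_ags;

		if (free_blocks[ag] > 0) {
			free_blocks[ag]--;
			return ag;
		}
	}
	return NO_BLOCK;
}
```

Only when the wider search also comes back empty does the caller see a
failure, which is the case the existing "could not find an AG with
enough free space" handling still has to cover.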

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_bmap.c       |   30 ++++++++++++++++++++++++++++++
 libxfs/xfs_bmap_btree.c |   17 +++++++++++++++++
 2 files changed, 47 insertions(+)


diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index e81ade5..c48dd4a 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -745,6 +745,7 @@ xfs_bmap_extents_to_btree(
 		args.type = XFS_ALLOCTYPE_START_BNO;
 		args.fsbno = XFS_INO_TO_FSB(mp, ip->i_ino);
 	} else if (dfops->dop_low) {
+try_another_ag:
 		args.type = XFS_ALLOCTYPE_START_BNO;
 		args.fsbno = *firstblock;
 	} else {
@@ -759,6 +760,21 @@ xfs_bmap_extents_to_btree(
 		xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
 		return error;
 	}
+
+	/*
+	 * During a CoW operation, the allocation and bmbt updates occur in
+	 * different transactions.  The mapping code tries to put new bmbt
+	 * blocks near extents being mapped, but the only way to guarantee this
+	 * is if the alloc and the mapping happen in a single transaction that
+	 * has a block reservation.  That isn't the case here, so if we run out
+	 * of space we'll try again with another AG.
+	 */
+	if (xfs_sb_version_hasreflink(&cur->bc_mp->m_sb) &&
+	    args.fsbno == NULLFSBLOCK &&
+	    args.type == XFS_ALLOCTYPE_NEAR_BNO) {
+		dfops->dop_low = true;
+		goto try_another_ag;
+	}
 	/*
 	 * Allocation can't fail, the space was reserved.
 	 */
@@ -894,6 +910,7 @@ xfs_bmap_local_to_extents(
 	 * file currently fits in an inode.
 	 */
 	if (*firstblock == NULLFSBLOCK) {
+try_another_ag:
 		args.fsbno = XFS_INO_TO_FSB(args.mp, ip->i_ino);
 		args.type = XFS_ALLOCTYPE_START_BNO;
 	} else {
@@ -906,6 +923,19 @@ xfs_bmap_local_to_extents(
 	if (error)
 		goto done;
 
+	/*
+	 * During a CoW operation, the allocation and bmbt updates occur in
+	 * different transactions.  The mapping code tries to put new bmbt
+	 * blocks near extents being mapped, but the only way to guarantee this
+	 * is if the alloc and the mapping happen in a single transaction that
+	 * has a block reservation.  That isn't the case here, so if we run out
+	 * of space we'll try again with another AG.
+	 */
+	if (xfs_sb_version_hasreflink(&ip->i_mount->m_sb) &&
+	    args.fsbno == NULLFSBLOCK &&
+	    args.type == XFS_ALLOCTYPE_NEAR_BNO) {
+		goto try_another_ag;
+	}
 	/* Can't fail, the space was reserved. */
 	ASSERT(args.fsbno != NULLFSBLOCK);
 	ASSERT(args.len == 1);
diff --git a/libxfs/xfs_bmap_btree.c b/libxfs/xfs_bmap_btree.c
index 2145ac0..ecfcd5a 100644
--- a/libxfs/xfs_bmap_btree.c
+++ b/libxfs/xfs_bmap_btree.c
@@ -449,6 +449,7 @@ xfs_bmbt_alloc_block(
 
 	if (args.fsbno == NULLFSBLOCK) {
 		args.fsbno = be64_to_cpu(start->l);
+try_another_ag:
 		args.type = XFS_ALLOCTYPE_START_BNO;
 		/*
 		 * Make sure there is sufficient room left in the AG to
@@ -478,6 +479,22 @@ xfs_bmbt_alloc_block(
 	if (error)
 		goto error0;
 
+	/*
+	 * During a CoW operation, the allocation and bmbt updates occur in
+	 * different transactions.  The mapping code tries to put new bmbt
+	 * blocks near extents being mapped, but the only way to guarantee this
+	 * is if the alloc and the mapping happen in a single transaction that
+	 * has a block reservation.  That isn't the case here, so if we run out
+	 * of space we'll try again with another AG.
+	 */
+	if (xfs_sb_version_hasreflink(&cur->bc_mp->m_sb) &&
+	    args.fsbno == NULLFSBLOCK &&
+	    args.type == XFS_ALLOCTYPE_NEAR_BNO) {
+		cur->bc_private.b.dfops->dop_low = true;
+		args.fsbno = cur->bc_private.b.firstblock;
+		goto try_another_ag;
+	}
+
 	if (args.fsbno == NULLFSBLOCK && args.minleft) {
 		/*
 		 * Could not find an AG with enough free space to satisfy


* [PATCH 103/145] xfs: provide switch to force filesystem to copy-on-write all the time
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (101 preceding siblings ...)
  2016-06-17  1:41 ` [PATCH 102/145] xfs: try other AGs to allocate a BMBT block Darrick J. Wong
@ 2016-06-17  1:41 ` Darrick J. Wong
  2016-06-17  1:41 ` [PATCH 104/145] xfs: increase log reservations for reflink Darrick J. Wong
                   ` (41 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:41 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Make it possible to force XFS to use copy on write all the time, at
least if reflink is turned on.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/libxfs_priv.h  |    2 ++
 libxfs/xfs_refcount.c |    6 ++++++
 2 files changed, 8 insertions(+)


diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 41f6b96..527bd49 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -517,4 +517,6 @@ int libxfs_zero_extent(struct xfs_inode *ip, xfs_fsblock_t start_fsb,
 
 bool xfs_log_check_lsn(struct xfs_mount *, xfs_lsn_t);
 
+#define xfs_always_cow	(false)
+
 #endif	/* __LIBXFS_INTERNAL_XFS_H__ */
diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index e03a9b7..855ab54 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -1189,6 +1189,12 @@ xfs_refcount_find_shared(
 
 	trace_xfs_refcount_find_shared(mp, agno, agbno, aglen);
 
+	if (xfs_always_cow) {
+		*fbno = agbno;
+		*flen = aglen;
+		return 0;
+	}
+
 	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
 	if (error)
 		goto out;


* [PATCH 104/145] xfs: increase log reservations for reflink
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (102 preceding siblings ...)
  2016-06-17  1:41 ` [PATCH 103/145] xfs: provide switch to force filesystem to copy-on-write all the time Darrick J. Wong
@ 2016-06-17  1:41 ` Darrick J. Wong
  2016-06-17  1:41 ` [PATCH 105/145] xfs: use interval query for rmap map and unmap operations on shared files Darrick J. Wong
                   ` (40 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:41 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Increase the log reservations to handle the increased rolling that
happens at the end of copy-on-write operations.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_trans_resv.c |   16 +++++++++++++---
 libxfs/xfs_trans_resv.h |    2 ++
 2 files changed, 15 insertions(+), 3 deletions(-)


diff --git a/libxfs/xfs_trans_resv.c b/libxfs/xfs_trans_resv.c
index 5b6bbcd..5152a5b 100644
--- a/libxfs/xfs_trans_resv.c
+++ b/libxfs/xfs_trans_resv.c
@@ -811,11 +811,18 @@ xfs_trans_resv_calc(
 	 * require a permanent reservation on space.
 	 */
 	resp->tr_write.tr_logres = xfs_calc_write_reservation(mp);
-	resp->tr_write.tr_logcount = XFS_WRITE_LOG_COUNT;
+	if (xfs_sb_version_hasreflink(&mp->m_sb))
+		resp->tr_write.tr_logcount = XFS_WRITE_LOG_COUNT_REFLINK;
+	else
+		resp->tr_write.tr_logcount = XFS_WRITE_LOG_COUNT;
 	resp->tr_write.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
 	resp->tr_itruncate.tr_logres = xfs_calc_itruncate_reservation(mp);
-	resp->tr_itruncate.tr_logcount = XFS_ITRUNCATE_LOG_COUNT;
+	if (xfs_sb_version_hasreflink(&mp->m_sb))
+		resp->tr_itruncate.tr_logcount =
+				XFS_ITRUNCATE_LOG_COUNT_REFLINK;
+	else
+		resp->tr_itruncate.tr_logcount = XFS_ITRUNCATE_LOG_COUNT;
 	resp->tr_itruncate.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
 	resp->tr_rename.tr_logres = xfs_calc_rename_reservation(mp);
@@ -872,7 +879,10 @@ xfs_trans_resv_calc(
 	resp->tr_growrtalloc.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
 	resp->tr_qm_dqalloc.tr_logres = xfs_calc_qm_dqalloc_reservation(mp);
-	resp->tr_qm_dqalloc.tr_logcount = XFS_WRITE_LOG_COUNT;
+	if (xfs_sb_version_hasreflink(&mp->m_sb))
+		resp->tr_qm_dqalloc.tr_logcount = XFS_WRITE_LOG_COUNT_REFLINK;
+	else
+		resp->tr_qm_dqalloc.tr_logcount = XFS_WRITE_LOG_COUNT;
 	resp->tr_qm_dqalloc.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
 	/*
diff --git a/libxfs/xfs_trans_resv.h b/libxfs/xfs_trans_resv.h
index 36a1511..b7e5357 100644
--- a/libxfs/xfs_trans_resv.h
+++ b/libxfs/xfs_trans_resv.h
@@ -87,6 +87,7 @@ struct xfs_trans_resv {
 #define	XFS_DEFAULT_LOG_COUNT		1
 #define	XFS_DEFAULT_PERM_LOG_COUNT	2
 #define	XFS_ITRUNCATE_LOG_COUNT		2
+#define	XFS_ITRUNCATE_LOG_COUNT_REFLINK	8
 #define XFS_INACTIVE_LOG_COUNT		2
 #define	XFS_CREATE_LOG_COUNT		2
 #define	XFS_CREATE_TMPFILE_LOG_COUNT	2
@@ -96,6 +97,7 @@ struct xfs_trans_resv {
 #define	XFS_LINK_LOG_COUNT		2
 #define	XFS_RENAME_LOG_COUNT		2
 #define	XFS_WRITE_LOG_COUNT		2
+#define	XFS_WRITE_LOG_COUNT_REFLINK	8
 #define	XFS_ADDAFORK_LOG_COUNT		2
 #define	XFS_ATTRINVAL_LOG_COUNT		1
 #define	XFS_ATTRSET_LOG_COUNT		3


* [PATCH 105/145] xfs: use interval query for rmap map and unmap operations on shared files
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (103 preceding siblings ...)
  2016-06-17  1:41 ` [PATCH 104/145] xfs: increase log reservations for reflink Darrick J. Wong
@ 2016-06-17  1:41 ` Darrick J. Wong
  2016-06-17  1:41 ` [PATCH 106/145] xfs: convert unwritten status of shared-extent reverse mappings " Darrick J. Wong
                   ` (39 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:41 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

When it's possible for reverse mappings to overlap (data fork extents
of files on reflink filesystems), use the interval query function to
find the left neighbor of an extent we're trying to add, and be
careful to use the lookup functions to update the neighbors and/or
add new extents.

v2: xfs_rmap_find_left_neighbor() needs to calculate the high key of a
query range correctly.  We can also add a few shortcuts -- there are
no left neighbors of a query at offset zero.
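
To make the matching rule concrete, here is a simplified sketch in
which a linear scan over an array stands in for the btree interval
query.  The record layout and function name are hypothetical, and only
inode-owned data fork records are modelled (the patch zeroes the
offset for non-inode owners and BMBT blocks, which is left out here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified reverse-mapping record. */
struct rmap_sketch {
	uint32_t	startblock;	/* first physical block */
	uint32_t	blockcount;	/* length in blocks, at least 1 */
	uint64_t	owner;		/* owning inode */
	uint64_t	offset;		/* first logical block in the file */
};

/* Return the left neighbor of (bno, owner, offset), or NULL if none. */
const struct rmap_sketch *
find_left_neighbor_sketch(
	const struct rmap_sketch	*recs,
	size_t				nr,
	uint32_t			bno,
	uint64_t			owner,
	uint64_t			offset)
{
	size_t				i;

	assert(recs != NULL || nr == 0);

	/* Nothing sits to the left of physical or logical block zero. */
	if (bno == 0 || offset == 0)
		return NULL;

	for (i = 0; i < nr; i++) {
		const struct rmap_sketch	*r = &recs[i];

		/* Interval test: does this record cover block bno - 1? */
		if (r->startblock > bno - 1 ||
		    r->startblock + r->blockcount - 1 < bno - 1)
			continue;
		/* Overlapping records may belong to other files. */
		if (r->owner != owner)
			continue;
		/* Logical adjacency: the record must end at offset - 1. */
		if (r->offset + r->blockcount - 1 != offset - 1)
			continue;
		return r;
	}
	return NULL;
}
```

The owner check is what matters on a reflink filesystem: several
records can cover block bno - 1, so taking whichever record happens to
precede the insertion point is no longer safe.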

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_trace.h     |   13 +
 libxfs/xfs_rmap.c       |  483 +++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rmap_btree.h |    7 +
 3 files changed, 500 insertions(+), 3 deletions(-)


diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index 7022164..ce973ba 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -210,7 +210,6 @@
 #define trace_xfs_rmap_convert_state(...)	((void) 0)
 #define trace_xfs_rmap_convert_done(...)	((void) 0)
 #define trace_xfs_rmap_convert_error(...)	((void) 0)
-#define trace_xfs_rmap_find_left_neighbor_result(...)		((void) 0)
 
 #define trace_xfs_ag_resv_critical(...)		((void) 0)
 #define trace_xfs_ag_resv_needed(...)		((void) 0)
@@ -265,6 +264,18 @@
 #define trace_xfs_refcount_cow_increase(...)	((void) 0)
 #define trace_xfs_refcount_cow_decrease(...)	((void) 0)
 
+#define trace_xfs_rmap_find_left_neighbor_candidate(...)	((void) 0)
+#define trace_xfs_rmap_find_left_neighbor_query(...)		((void) 0)
+#define trace_xfs_rmap_find_left_neighbor_result(...)		((void) 0)
+#define trace_xfs_rmap_lookup_le_range_candidate(...)		((void) 0)
+#define trace_xfs_rmap_lookup_le_range(...)	((void) 0)
+#define trace_xfs_rmap_unmap(...)		((void) 0)
+#define trace_xfs_rmap_unmap_done(...)		((void) 0)
+#define trace_xfs_rmap_unmap_error(...)		((void) 0)
+#define trace_xfs_rmap_map(...)			((void) 0)
+#define trace_xfs_rmap_map_done(...)		((void) 0)
+#define trace_xfs_rmap_map_error(...)		((void) 0)
+
 /* set c = c to avoid unused var warnings */
 #define trace_xfs_perag_get(a,b,c,d)	((c) = (c))
 #define trace_xfs_perag_get_tag(a,b,c,d) ((c) = (c))
diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index 6e69208..c1cb218 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -209,6 +209,160 @@ xfs_rmap_get_rec(
 	return xfs_rmapbt_btrec_to_irec(rec, irec);
 }
 
+struct xfs_find_left_neighbor_info {
+	struct xfs_rmap_irec	high;
+	struct xfs_rmap_irec	*irec;
+	int			*stat;
+};
+
+/* For each rmap given, figure out if it matches the key we want. */
+STATIC int
+xfs_rmap_find_left_neighbor_helper(
+	struct xfs_btree_cur	*cur,
+	struct xfs_rmap_irec	*rec,
+	void			*priv)
+{
+	struct xfs_find_left_neighbor_info	*info = priv;
+
+	trace_xfs_rmap_find_left_neighbor_candidate(cur->bc_mp,
+			cur->bc_private.a.agno, rec->rm_startblock,
+			rec->rm_blockcount, rec->rm_owner, rec->rm_offset,
+			rec->rm_flags);
+
+	if (rec->rm_owner != info->high.rm_owner)
+		return XFS_BTREE_QUERY_RANGE_CONTINUE;
+	if (!XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) &&
+	    !(rec->rm_flags & XFS_RMAP_BMBT_BLOCK) &&
+	    rec->rm_offset + rec->rm_blockcount - 1 != info->high.rm_offset)
+		return XFS_BTREE_QUERY_RANGE_CONTINUE;
+
+	*info->irec = *rec;
+	*info->stat = 1;
+	return XFS_BTREE_QUERY_RANGE_ABORT;
+}
+
+/*
+ * Find the record to the left of the given extent, being careful only to
+ * return a match with the same owner and adjacent physical and logical
+ * block ranges.
+ */
+int
+xfs_rmap_find_left_neighbor(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	uint64_t		owner,
+	uint64_t		offset,
+	unsigned int		flags,
+	struct xfs_rmap_irec	*irec,
+	int			*stat)
+{
+	struct xfs_find_left_neighbor_info	info;
+	int			error;
+
+	*stat = 0;
+	if (bno == 0)
+		return 0;
+	info.high.rm_startblock = bno - 1;
+	info.high.rm_owner = owner;
+	if (!XFS_RMAP_NON_INODE_OWNER(owner) &&
+	    !(flags & XFS_RMAP_BMBT_BLOCK)) {
+		if (offset == 0)
+			return 0;
+		info.high.rm_offset = offset - 1;
+	} else
+		info.high.rm_offset = 0;
+	info.high.rm_flags = flags;
+	info.high.rm_blockcount = 0;
+	info.irec = irec;
+	info.stat = stat;
+
+	trace_xfs_rmap_find_left_neighbor_query(cur->bc_mp,
+			cur->bc_private.a.agno, bno, 0, owner, offset, flags);
+
+	error = xfs_rmapbt_query_range(cur, &info.high, &info.high,
+			xfs_rmap_find_left_neighbor_helper, &info);
+	if (error == XFS_BTREE_QUERY_RANGE_ABORT)
+		error = 0;
+	if (*stat)
+		trace_xfs_rmap_find_left_neighbor_result(cur->bc_mp,
+				cur->bc_private.a.agno, irec->rm_startblock,
+				irec->rm_blockcount, irec->rm_owner,
+				irec->rm_offset, irec->rm_flags);
+	return error;
+}
+
+/* For each rmap given, figure out if it matches the key we want. */
+STATIC int
+xfs_rmap_lookup_le_range_helper(
+	struct xfs_btree_cur	*cur,
+	struct xfs_rmap_irec	*rec,
+	void			*priv)
+{
+	struct xfs_find_left_neighbor_info	*info = priv;
+
+	trace_xfs_rmap_lookup_le_range_candidate(cur->bc_mp,
+			cur->bc_private.a.agno, rec->rm_startblock,
+			rec->rm_blockcount, rec->rm_owner, rec->rm_offset,
+			rec->rm_flags);
+
+	if (rec->rm_owner != info->high.rm_owner)
+		return XFS_BTREE_QUERY_RANGE_CONTINUE;
+	if (!XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) &&
+	    !(rec->rm_flags & XFS_RMAP_BMBT_BLOCK) &&
+	    (rec->rm_offset > info->high.rm_offset ||
+	     rec->rm_offset + rec->rm_blockcount <= info->high.rm_offset))
+		return XFS_BTREE_QUERY_RANGE_CONTINUE;
+
+	*info->irec = *rec;
+	*info->stat = 1;
+	return XFS_BTREE_QUERY_RANGE_ABORT;
+}
+
+/*
+ * Find the record to the left of the given extent, being careful only to
+ * return a match with the same owner and overlapping physical and logical
+ * block ranges.  This is the overlapping-interval version of
+ * xfs_rmap_lookup_le.
+ */
+int
+xfs_rmap_lookup_le_range(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	uint64_t		owner,
+	uint64_t		offset,
+	unsigned int		flags,
+	struct xfs_rmap_irec	*irec,
+	int			*stat)
+{
+	struct xfs_find_left_neighbor_info	info;
+	int			error;
+
+	info.high.rm_startblock = bno;
+	info.high.rm_owner = owner;
+	if (!XFS_RMAP_NON_INODE_OWNER(owner) && !(flags & XFS_RMAP_BMBT_BLOCK))
+		info.high.rm_offset = offset;
+	else
+		info.high.rm_offset = 0;
+	info.high.rm_flags = flags;
+	info.high.rm_blockcount = 0;
+	*stat = 0;
+	info.irec = irec;
+	info.stat = stat;
+
+	trace_xfs_rmap_lookup_le_range(cur->bc_mp,
+			cur->bc_private.a.agno, bno, 0, owner, offset, flags);
+	error = xfs_rmapbt_query_range(cur, &info.high, &info.high,
+			xfs_rmap_lookup_le_range_helper, &info);
+	if (error == XFS_BTREE_QUERY_RANGE_ABORT)
+		error = 0;
+	if (*stat)
+		trace_xfs_rmap_lookup_le_range_result(cur->bc_mp,
+				cur->bc_private.a.agno, irec->rm_startblock,
+				irec->rm_blockcount, irec->rm_owner,
+				irec->rm_offset, irec->rm_flags);
+	return error;
+}
+
 /*
  * Find the extent in the rmap btree and remove it.
  *
@@ -1157,6 +1311,168 @@ xfs_rmap_unmap(
 }
 
 /*
+ * Find an extent in the rmap btree and unmap it.  For rmap extent types that
+ * can overlap (data fork rmaps on reflink filesystems) we must be careful
+ * that the prev/next records in the btree might belong to another owner.
+ * Therefore we must use delete+insert to alter any of the key fields.
+ *
+ * For every other situation there can only be one owner for a given extent,
+ * so we can call the regular _free function.
+ */
+STATIC int
+xfs_rmap_unmap_shared(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	bool			unwritten,
+	struct xfs_owner_info	*oinfo)
+{
+	struct xfs_mount	*mp = cur->bc_mp;
+	struct xfs_rmap_irec	ltrec;
+	uint64_t		ltoff;
+	int			error = 0;
+	int			i;
+	uint64_t		owner;
+	uint64_t		offset;
+	unsigned int		flags;
+
+	xfs_owner_info_unpack(oinfo, &owner, &offset, &flags);
+	if (unwritten)
+		flags |= XFS_RMAP_UNWRITTEN;
+	trace_xfs_rmap_unmap(mp, cur->bc_private.a.agno, bno, len,
+			unwritten, oinfo);
+
+	/*
+	 * We should always have a left record because there's a static record
+	 * for the AG headers at rm_startblock == 0 created by mkfs/growfs that
+	 * will not ever be removed from the tree.
+	 */
+	error = xfs_rmap_lookup_le_range(cur, bno, owner, offset, flags,
+			&ltrec, &i);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+	ltoff = ltrec.rm_offset;
+
+	/* Make sure the extent we found covers the entire freeing range. */
+	XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_startblock <= bno &&
+		ltrec.rm_startblock + ltrec.rm_blockcount >=
+		bno + len, out_error);
+
+	/* Make sure the owner matches what we expect to find in the tree. */
+	XFS_WANT_CORRUPTED_GOTO(mp, owner == ltrec.rm_owner, out_error);
+
+	/* Make sure the unwritten flag matches. */
+	XFS_WANT_CORRUPTED_GOTO(mp, (flags & XFS_RMAP_UNWRITTEN) ==
+			(ltrec.rm_flags & XFS_RMAP_UNWRITTEN), out_error);
+
+	/* Check the offset. */
+	XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_offset <= offset, out_error);
+	XFS_WANT_CORRUPTED_GOTO(mp, offset <= ltoff + ltrec.rm_blockcount,
+			out_error);
+
+	if (ltrec.rm_startblock == bno && ltrec.rm_blockcount == len) {
+		/* Exact match, simply remove the record from rmap tree. */
+		error = xfs_rmapbt_delete(cur, ltrec.rm_startblock,
+				ltrec.rm_blockcount, ltrec.rm_owner,
+				ltrec.rm_offset, ltrec.rm_flags);
+		if (error)
+			goto out_error;
+	} else if (ltrec.rm_startblock == bno) {
+		/*
+		 * Overlap left hand side of extent: move the start, trim the
+		 * length and update the current record.
+		 *
+		 *       ltbno                ltlen
+		 * Orig:    |oooooooooooooooooooo|
+		 * Freeing: |fffffffff|
+		 * Result:            |rrrrrrrrrr|
+		 *         bno       len
+		 */
+
+		/* Delete prev rmap. */
+		error = xfs_rmapbt_delete(cur, ltrec.rm_startblock,
+				ltrec.rm_blockcount, ltrec.rm_owner,
+				ltrec.rm_offset, ltrec.rm_flags);
+		if (error)
+			goto out_error;
+
+		/* Add an rmap at the new offset. */
+		ltrec.rm_startblock += len;
+		ltrec.rm_blockcount -= len;
+		ltrec.rm_offset += len;
+		error = xfs_rmapbt_insert(cur, ltrec.rm_startblock,
+				ltrec.rm_blockcount, ltrec.rm_owner,
+				ltrec.rm_offset, ltrec.rm_flags);
+		if (error)
+			goto out_error;
+	} else if (ltrec.rm_startblock + ltrec.rm_blockcount == bno + len) {
+		/*
+		 * Overlap right hand side of extent: trim the length and
+		 * update the current record.
+		 *
+		 *       ltbno                ltlen
+		 * Orig:    |oooooooooooooooooooo|
+		 * Freeing:            |fffffffff|
+		 * Result:  |rrrrrrrrrr|
+		 *                    bno       len
+		 */
+		error = xfs_rmap_lookup_eq(cur, ltrec.rm_startblock,
+				ltrec.rm_blockcount, ltrec.rm_owner,
+				ltrec.rm_offset, ltrec.rm_flags, &i);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+		ltrec.rm_blockcount -= len;
+		error = xfs_rmap_update(cur, &ltrec);
+		if (error)
+			goto out_error;
+	} else {
+		/*
+		 * Overlap middle of extent: trim the length of the existing
+		 * record to the length of the new left-extent size, increment
+		 * the insertion position so we can insert a new record
+		 * containing the remaining right-extent space.
+		 *
+		 *       ltbno                ltlen
+		 * Orig:    |oooooooooooooooooooo|
+		 * Freeing:       |fffffffff|
+		 * Result:  |rrrrr|         |rrrr|
+		 *               bno       len
+		 */
+		xfs_extlen_t	orig_len = ltrec.rm_blockcount;
+
+		/* Shrink the left side of the rmap */
+		error = xfs_rmap_lookup_eq(cur, ltrec.rm_startblock,
+				ltrec.rm_blockcount, ltrec.rm_owner,
+				ltrec.rm_offset, ltrec.rm_flags, &i);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+		ltrec.rm_blockcount = bno - ltrec.rm_startblock;
+		error = xfs_rmap_update(cur, &ltrec);
+		if (error)
+			goto out_error;
+
+		/* Add an rmap at the new offset */
+		error = xfs_rmapbt_insert(cur, bno + len,
+				orig_len - len - ltrec.rm_blockcount,
+				ltrec.rm_owner, offset + len,
+				ltrec.rm_flags);
+		if (error)
+			goto out_error;
+	}
+
+	trace_xfs_rmap_unmap_done(mp, cur->bc_private.a.agno, bno, len,
+			unwritten, oinfo);
+out_error:
+	if (error)
+		trace_xfs_rmap_unmap_error(cur->bc_mp,
+				cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+/*
  * Find an extent in the rmap btree and map it.
  */
 STATIC int
@@ -1170,6 +1486,159 @@ xfs_rmap_map(
 	return __xfs_rmap_alloc(cur, bno, len, unwritten, oinfo);
 }
 
+/*
+ * Find an extent in the rmap btree and map it.  For rmap extent types that
+ * can overlap (data fork rmaps on reflink filesystems) we must be careful
+ * that the prev/next records in the btree might belong to another owner.
+ * Therefore we must use delete+insert to alter any of the key fields.
+ *
+ * For every other situation there can only be one owner for a given extent,
+ * so we can call the regular _alloc function.
+ */
+STATIC int
+xfs_rmap_map_shared(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	bool			unwritten,
+	struct xfs_owner_info	*oinfo)
+{
+	struct xfs_mount	*mp = cur->bc_mp;
+	struct xfs_rmap_irec	ltrec;
+	struct xfs_rmap_irec	gtrec;
+	int			have_gt;
+	int			have_lt;
+	int			error = 0;
+	int			i;
+	uint64_t		owner;
+	uint64_t		offset;
+	unsigned int		flags = 0;
+
+	xfs_owner_info_unpack(oinfo, &owner, &offset, &flags);
+	if (unwritten)
+		flags |= XFS_RMAP_UNWRITTEN;
+	trace_xfs_rmap_map(mp, cur->bc_private.a.agno, bno, len,
+			unwritten, oinfo);
+
+	/* Is there a left record that abuts our range? */
+	error = xfs_rmap_find_left_neighbor(cur, bno, owner, offset, flags,
+			&ltrec, &have_lt);
+	if (error)
+		goto out_error;
+	if (have_lt &&
+	    !xfs_rmap_is_mergeable(&ltrec, owner, offset, len, flags))
+		have_lt = 0;
+
+	/* Is there a right record that abuts our range? */
+	error = xfs_rmap_lookup_eq(cur, bno + len, len, owner, offset + len,
+			flags, &have_gt);
+	if (error)
+		goto out_error;
+	if (have_gt) {
+		error = xfs_rmap_get_rec(cur, &gtrec, &have_gt);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(mp, have_gt == 1, out_error);
+		trace_xfs_rmap_map_gtrec(cur->bc_mp,
+			cur->bc_private.a.agno, gtrec.rm_startblock,
+			gtrec.rm_blockcount, gtrec.rm_owner,
+			gtrec.rm_offset, gtrec.rm_flags);
+
+		if (!xfs_rmap_is_mergeable(&gtrec, owner, offset, len, flags))
+			have_gt = 0;
+	}
+
+	if (have_lt &&
+	    ltrec.rm_startblock + ltrec.rm_blockcount == bno &&
+	    ltrec.rm_offset + ltrec.rm_blockcount == offset) {
+		/*
+		 * Left edge contiguous, merge into left record.
+		 *
+		 *       ltbno     ltlen
+		 * orig:   |ooooooooo|
+		 * adding:           |aaaaaaaaa|
+		 * result: |rrrrrrrrrrrrrrrrrrr|
+		 *                  bno       len
+		 */
+		ltrec.rm_blockcount += len;
+		if (have_gt &&
+		    bno + len == gtrec.rm_startblock &&
+		    offset + len == gtrec.rm_offset) {
+			/*
+			 * Right edge also contiguous, delete right record
+			 * and merge into left record.
+			 *
+			 *       ltbno     ltlen    gtbno     gtlen
+			 * orig:   |ooooooooo|         |ooooooooo|
+			 * adding:           |aaaaaaaaa|
+			 * result: |rrrrrrrrrrrrrrrrrrrrrrrrrrrrr|
+			 */
+			ltrec.rm_blockcount += gtrec.rm_blockcount;
+			error = xfs_rmapbt_delete(cur, gtrec.rm_startblock,
+					gtrec.rm_blockcount, gtrec.rm_owner,
+					gtrec.rm_offset, gtrec.rm_flags);
+			if (error)
+				goto out_error;
+		}
+
+		/* Point the cursor back to the left record and update. */
+		error = xfs_rmap_lookup_eq(cur, ltrec.rm_startblock,
+				ltrec.rm_blockcount, ltrec.rm_owner,
+				ltrec.rm_offset, ltrec.rm_flags, &i);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+
+		error = xfs_rmap_update(cur, &ltrec);
+		if (error)
+			goto out_error;
+	} else if (have_gt &&
+		   bno + len == gtrec.rm_startblock &&
+		   offset + len == gtrec.rm_offset) {
+		/*
+		 * Right edge contiguous, merge into right record.
+		 *
+		 *                 gtbno     gtlen
+		 * Orig:             |ooooooooo|
+		 * adding: |aaaaaaaaa|
+		 * Result: |rrrrrrrrrrrrrrrrrrr|
+		 *        bno       len
+		 */
+		/* Delete the old record. */
+		error = xfs_rmapbt_delete(cur, gtrec.rm_startblock,
+				gtrec.rm_blockcount, gtrec.rm_owner,
+				gtrec.rm_offset, gtrec.rm_flags);
+		if (error)
+			goto out_error;
+
+		/* Move the start and re-add it. */
+		gtrec.rm_startblock = bno;
+		gtrec.rm_blockcount += len;
+		gtrec.rm_offset = offset;
+		error = xfs_rmapbt_insert(cur, gtrec.rm_startblock,
+				gtrec.rm_blockcount, gtrec.rm_owner,
+				gtrec.rm_offset, gtrec.rm_flags);
+		if (error)
+			goto out_error;
+	} else {
+		/*
+		 * No contiguous edge with identical owner, insert
+		 * new record at current cursor position.
+		 */
+		error = xfs_rmapbt_insert(cur, bno, len, owner, offset, flags);
+		if (error)
+			goto out_error;
+	}
+
+	trace_xfs_rmap_map_done(mp, cur->bc_private.a.agno, bno, len,
+			unwritten, oinfo);
+out_error:
+	if (error)
+		trace_xfs_rmap_map_error(cur->bc_mp,
+				cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
 struct xfs_rmapbt_query_range_info {
 	xfs_rmapbt_query_range_fn	fn;
 	void				*priv;
@@ -1302,10 +1771,18 @@ xfs_rmap_finish_one(
 	case XFS_RMAP_MAP:
 		error = xfs_rmap_map(rcur, bno, blockcount, unwritten, &oinfo);
 		break;
+	case XFS_RMAP_MAP_SHARED:
+		error = xfs_rmap_map_shared(rcur, bno, blockcount, unwritten,
+				&oinfo);
+		break;
 	case XFS_RMAP_UNMAP:
 		error = xfs_rmap_unmap(rcur, bno, blockcount, unwritten,
 				&oinfo);
 		break;
+	case XFS_RMAP_UNMAP_SHARED:
+		error = xfs_rmap_unmap_shared(rcur, bno, blockcount, unwritten,
+				&oinfo);
+		break;
 	case XFS_RMAP_CONVERT:
 		error = xfs_rmap_convert(rcur, bno, blockcount, !unwritten,
 				&oinfo);
@@ -1373,7 +1850,8 @@ xfs_rmap_map_extent(
 {
 	struct xfs_rmap_intent	ri;
 
-	ri.ri_type = XFS_RMAP_MAP;
+	ri.ri_type = xfs_is_reflink_inode(ip) ? XFS_RMAP_MAP_SHARED :
+			XFS_RMAP_MAP;
 	ri.ri_owner = ip->i_ino;
 	ri.ri_whichfork = whichfork;
 	ri.ri_bmap = *PREV;
@@ -1392,7 +1870,8 @@ xfs_rmap_unmap_extent(
 {
 	struct xfs_rmap_intent	ri;
 
-	ri.ri_type = XFS_RMAP_UNMAP;
+	ri.ri_type = xfs_is_reflink_inode(ip) ? XFS_RMAP_UNMAP_SHARED :
+			XFS_RMAP_UNMAP;
 	ri.ri_owner = ip->i_ino;
 	ri.ri_whichfork = whichfork;
 	ri.ri_bmap = *PREV;
diff --git a/libxfs/xfs_rmap_btree.h b/libxfs/xfs_rmap_btree.h
index f398e8b..5baa81f 100644
--- a/libxfs/xfs_rmap_btree.h
+++ b/libxfs/xfs_rmap_btree.h
@@ -70,6 +70,13 @@ int xfs_rmapbt_insert(struct xfs_btree_cur *rcur, xfs_agblock_t agbno,
 int xfs_rmap_get_rec(struct xfs_btree_cur *cur, struct xfs_rmap_irec *irec,
 		int *stat);
 
+int xfs_rmap_find_left_neighbor(struct xfs_btree_cur *cur, xfs_agblock_t bno,
+		uint64_t owner, uint64_t offset, unsigned int flags,
+		struct xfs_rmap_irec *irec, int	*stat);
+int xfs_rmap_lookup_le_range(struct xfs_btree_cur *cur, xfs_agblock_t bno,
+		uint64_t owner, uint64_t offset, unsigned int flags,
+		struct xfs_rmap_irec *irec, int	*stat);
+
 /* functions for updating the rmapbt for bmbt blocks and AG btree blocks */
 int xfs_rmap_alloc(struct xfs_trans *tp, struct xfs_buf *agbp,
 		   xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
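
The four cases handled by xfs_rmap_unmap_shared() in the diff above (exact match, left-hand overlap, right-hand overlap, middle overlap) can be sketched on a plain in-memory record. This is only an illustration under stated assumptions: `struct rec`, `unmap_range()` and the two-slot output array are hypothetical names, not libxfs API, and the btree delete/insert/update calls are replaced by returning the surviving pieces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_rmap_irec (owner and flags omitted). */
struct rec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint64_t	offset;
};

/*
 * Remove [bno, bno + len) from *orig and store the surviving pieces in out[].
 * Returns the number of pieces (0, 1 or 2), or -1 if the range is empty or
 * not fully contained in *orig.
 */
static int
unmap_range(const struct rec *orig, uint32_t bno, uint32_t len,
	    struct rec out[2])
{
	uint32_t	end = orig->startblock + orig->blockcount;
	int		n = 0;

	if (len == 0 || bno < orig->startblock || bno + len > end)
		return -1;

	/* Left remainder: keeps the original start, only the length shrinks. */
	if (bno > orig->startblock) {
		out[n] = *orig;
		out[n].blockcount = bno - orig->startblock;
		n++;
	}

	/* Right remainder: start block and file offset both move forward. */
	if (bno + len < end) {
		out[n].startblock = bno + len;
		out[n].blockcount = end - (bno + len);
		out[n].offset = orig->offset + (bno + len - orig->startblock);
		n++;
	}
	return n;
}
```

A left remainder keeps its key fields, which is why the patch can update it in place, while a right remainder gets a new start block and offset and therefore goes through delete+insert.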

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* [PATCH 106/145] xfs: convert unwritten status of shared-extent reverse mappings on shared files
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (104 preceding siblings ...)
  2016-06-17  1:41 ` [PATCH 105/145] xfs: use interval query for rmap map and unmap operations on shared files Darrick J. Wong
@ 2016-06-17  1:41 ` Darrick J. Wong
  2016-06-17  1:42 ` [PATCH 107/145] xfs: don't allow realtime and reflinked files to mix Darrick J. Wong
                   ` (38 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:41 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Upgrade the rmap extent conversion function to handle shared extents.

v2: Move unwritten bit to rm_offset.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_rmap.c |  385 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 384 insertions(+), 1 deletion(-)
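
The conversion function in the diff below first classifies how the converted range sits inside the existing record (the FILLING bits) and then switches on that state. A minimal sketch of that classification follows. All names and bit values here are stand-ins chosen for illustration, not the libxfs definitions, and neighbor merging (the CONTIG bits) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in values for illustration; the real definitions live in libxfs. */
#define UNWRITTEN	(1U << 0)	/* plays the role of XFS_RMAP_UNWRITTEN */
#define LEFT_FILLING	(1 << 0)	/* play the role of RMAP_*_FILLING */
#define RIGHT_FILLING	(1 << 1)

/* Hypothetical, cut-down version of struct xfs_rmap_irec. */
struct ext {
	uint64_t	offset;
	uint32_t	blockcount;
	unsigned int	flags;
};

/* The state being converted to is the inverse of the old state. */
static unsigned int
flip_unwritten(unsigned int oldext)
{
	return ~oldext & UNWRITTEN;
}

/* Which edges of prev does the range [offset, offset + len) touch? */
static int
fill_state(const struct ext *prev, uint64_t offset, uint32_t len)
{
	int	state = 0;

	if (prev->offset == offset)
		state |= LEFT_FILLING;
	if (prev->offset + prev->blockcount == offset + len)
		state |= RIGHT_FILLING;
	return state;
}

/* Number of records prev turns into when no neighbor can be merged. */
static int
pieces_without_merge(int state)
{
	switch (state & (LEFT_FILLING | RIGHT_FILLING)) {
	case LEFT_FILLING | RIGHT_FILLING:
		return 1;	/* whole extent flips in place */
	case LEFT_FILLING:
	case RIGHT_FILLING:
		return 2;	/* one converted part, one remainder */
	default:
		return 3;	/* left remainder, converted middle, right remainder */
	}
}
```

The three return values correspond to the non-contiguous cases of the switch statement in the patch: full conversion, conversion of one end, and conversion of the middle.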


diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index c1cb218..23bffdc 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -1291,6 +1291,384 @@ xfs_rmap_convert(
 	return __xfs_rmap_convert(cur, bno, len, unwritten, oinfo);
 }
 
+/*
+ * Convert an unwritten extent to a real extent or vice versa.  If there is no
+ * possibility of overlapping extents, delegate to the simpler convert
+ * function.
+ */
+STATIC int
+xfs_rmap_convert_shared(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	bool			unwritten,
+	struct xfs_owner_info	*oinfo)
+{
+	struct xfs_mount	*mp = cur->bc_mp;
+	struct xfs_rmap_irec	r[4];	/* neighbor extent entries */
+					/* left is 0, right is 1, prev is 2 */
+					/* new is 3 */
+	uint64_t		owner;
+	uint64_t		offset;
+	uint64_t		new_endoff;
+	unsigned int		oldext;
+	unsigned int		newext;
+	unsigned int		flags = 0;
+	int			i;
+	int			state = 0;
+	int			error;
+
+	xfs_owner_info_unpack(oinfo, &owner, &offset, &flags);
+	ASSERT(!(XFS_RMAP_NON_INODE_OWNER(owner) ||
+			(flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))));
+	oldext = unwritten ? XFS_RMAP_UNWRITTEN : 0;
+	new_endoff = offset + len;
+	trace_xfs_rmap_convert(mp, cur->bc_private.a.agno, bno, len,
+			unwritten, oinfo);
+
+	/*
+	 * For the initial lookup, look for an exact match or the left-adjacent
+	 * record for our insertion point. This will also give us the record for
+	 * start block contiguity tests.
+	 */
+	error = xfs_rmap_lookup_le_range(cur, bno, owner, offset, flags,
+			&PREV, &i);
+	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+
+	ASSERT(PREV.rm_offset <= offset);
+	ASSERT(PREV.rm_offset + PREV.rm_blockcount >= new_endoff);
+	ASSERT((PREV.rm_flags & XFS_RMAP_UNWRITTEN) == oldext);
+	newext = ~oldext & XFS_RMAP_UNWRITTEN;
+
+	/*
+	 * Set flags determining what part of the previous oldext allocation
+	 * extent is being replaced by a newext allocation.
+	 */
+	if (PREV.rm_offset == offset)
+		state |= RMAP_LEFT_FILLING;
+	if (PREV.rm_offset + PREV.rm_blockcount == new_endoff)
+		state |= RMAP_RIGHT_FILLING;
+
+	/* Is there a left record that abuts our range? */
+	error = xfs_rmap_find_left_neighbor(cur, bno, owner, offset, newext,
+			&LEFT, &i);
+	if (error)
+		goto done;
+	if (i) {
+		state |= RMAP_LEFT_VALID;
+		XFS_WANT_CORRUPTED_GOTO(mp,
+				LEFT.rm_startblock + LEFT.rm_blockcount <= bno,
+				done);
+		if (xfs_rmap_is_mergeable(&LEFT, owner, offset, len, newext))
+			state |= RMAP_LEFT_CONTIG;
+	}
+
+	/* Is there a right record that abuts our range? */
+	error = xfs_rmap_lookup_eq(cur, bno + len, len, owner, offset + len,
+			newext, &i);
+	if (error)
+		goto done;
+	if (i) {
+		state |= RMAP_RIGHT_VALID;
+		error = xfs_rmap_get_rec(cur, &RIGHT, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		XFS_WANT_CORRUPTED_GOTO(mp, bno + len <= RIGHT.rm_startblock,
+				done);
+		trace_xfs_rmap_convert_gtrec(cur->bc_mp,
+				cur->bc_private.a.agno, RIGHT.rm_startblock,
+				RIGHT.rm_blockcount, RIGHT.rm_owner,
+				RIGHT.rm_offset, RIGHT.rm_flags);
+		if (xfs_rmap_is_mergeable(&RIGHT, owner, offset, len, newext))
+			state |= RMAP_RIGHT_CONTIG;
+	}
+
+	/* check that left + prev + right is not too long */
+	if ((state & (RMAP_LEFT_FILLING | RMAP_LEFT_CONTIG |
+			 RMAP_RIGHT_FILLING | RMAP_RIGHT_CONTIG)) ==
+	    (RMAP_LEFT_FILLING | RMAP_LEFT_CONTIG |
+	     RMAP_RIGHT_FILLING | RMAP_RIGHT_CONTIG) &&
+	    (unsigned long)LEFT.rm_blockcount + len +
+	     RIGHT.rm_blockcount > XFS_RMAP_LEN_MAX)
+		state &= ~RMAP_RIGHT_CONTIG;
+
+	trace_xfs_rmap_convert_state(mp, cur->bc_private.a.agno, state,
+			_RET_IP_);
+	/*
+	 * Switch out based on the FILLING and CONTIG state bits.
+	 */
+	switch (state & (RMAP_LEFT_FILLING | RMAP_LEFT_CONTIG |
+			 RMAP_RIGHT_FILLING | RMAP_RIGHT_CONTIG)) {
+	case RMAP_LEFT_FILLING | RMAP_LEFT_CONTIG |
+	     RMAP_RIGHT_FILLING | RMAP_RIGHT_CONTIG:
+		/*
+		 * Setting all of a previous oldext extent to newext.
+		 * The left and right neighbors are both contiguous with new.
+		 */
+		error = xfs_rmapbt_delete(cur, RIGHT.rm_startblock,
+				RIGHT.rm_blockcount, RIGHT.rm_owner,
+				RIGHT.rm_offset, RIGHT.rm_flags);
+		if (error)
+			goto done;
+		error = xfs_rmapbt_delete(cur, PREV.rm_startblock,
+				PREV.rm_blockcount, PREV.rm_owner,
+				PREV.rm_offset, PREV.rm_flags);
+		if (error)
+			goto done;
+		NEW = LEFT;
+		error = xfs_rmap_lookup_eq(cur, NEW.rm_startblock,
+				NEW.rm_blockcount, NEW.rm_owner,
+				NEW.rm_offset, NEW.rm_flags, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		NEW.rm_blockcount += PREV.rm_blockcount + RIGHT.rm_blockcount;
+		error = xfs_rmap_update(cur, &NEW);
+		if (error)
+			goto done;
+		break;
+
+	case RMAP_LEFT_FILLING | RMAP_RIGHT_FILLING | RMAP_LEFT_CONTIG:
+		/*
+		 * Setting all of a previous oldext extent to newext.
+		 * The left neighbor is contiguous, the right is not.
+		 */
+		error = xfs_rmapbt_delete(cur, PREV.rm_startblock,
+				PREV.rm_blockcount, PREV.rm_owner,
+				PREV.rm_offset, PREV.rm_flags);
+		if (error)
+			goto done;
+		NEW = LEFT;
+		error = xfs_rmap_lookup_eq(cur, NEW.rm_startblock,
+				NEW.rm_blockcount, NEW.rm_owner,
+				NEW.rm_offset, NEW.rm_flags, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		NEW.rm_blockcount += PREV.rm_blockcount;
+		error = xfs_rmap_update(cur, &NEW);
+		if (error)
+			goto done;
+		break;
+
+	case RMAP_LEFT_FILLING | RMAP_RIGHT_FILLING | RMAP_RIGHT_CONTIG:
+		/*
+		 * Setting all of a previous oldext extent to newext.
+		 * The right neighbor is contiguous, the left is not.
+		 */
+		error = xfs_rmapbt_delete(cur, RIGHT.rm_startblock,
+				RIGHT.rm_blockcount, RIGHT.rm_owner,
+				RIGHT.rm_offset, RIGHT.rm_flags);
+		if (error)
+			goto done;
+		NEW = PREV;
+		error = xfs_rmap_lookup_eq(cur, NEW.rm_startblock,
+				NEW.rm_blockcount, NEW.rm_owner,
+				NEW.rm_offset, NEW.rm_flags, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		NEW.rm_blockcount += RIGHT.rm_blockcount;
+		NEW.rm_flags = RIGHT.rm_flags;
+		error = xfs_rmap_update(cur, &NEW);
+		if (error)
+			goto done;
+		break;
+
+	case RMAP_LEFT_FILLING | RMAP_RIGHT_FILLING:
+		/*
+		 * Setting all of a previous oldext extent to newext.
+		 * Neither the left nor right neighbors are contiguous with
+		 * the new one.
+		 */
+		NEW = PREV;
+		error = xfs_rmap_lookup_eq(cur, NEW.rm_startblock,
+				NEW.rm_blockcount, NEW.rm_owner,
+				NEW.rm_offset, NEW.rm_flags, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		NEW.rm_flags = newext;
+		error = xfs_rmap_update(cur, &NEW);
+		if (error)
+			goto done;
+		break;
+
+	case RMAP_LEFT_FILLING | RMAP_LEFT_CONTIG:
+		/*
+		 * Setting the first part of a previous oldext extent to newext.
+		 * The left neighbor is contiguous.
+		 */
+		NEW = PREV;
+		error = xfs_rmapbt_delete(cur, NEW.rm_startblock,
+				NEW.rm_blockcount, NEW.rm_owner,
+				NEW.rm_offset, NEW.rm_flags);
+		if (error)
+			goto done;
+		NEW.rm_offset += len;
+		NEW.rm_startblock += len;
+		NEW.rm_blockcount -= len;
+		error = xfs_rmapbt_insert(cur, NEW.rm_startblock,
+				NEW.rm_blockcount, NEW.rm_owner,
+				NEW.rm_offset, NEW.rm_flags);
+		if (error)
+			goto done;
+		NEW = LEFT;
+		error = xfs_rmap_lookup_eq(cur, NEW.rm_startblock,
+				NEW.rm_blockcount, NEW.rm_owner,
+				NEW.rm_offset, NEW.rm_flags, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		NEW.rm_blockcount += len;
+		error = xfs_rmap_update(cur, &NEW);
+		if (error)
+			goto done;
+		break;
+
+	case RMAP_LEFT_FILLING:
+		/*
+		 * Setting the first part of a previous oldext extent to newext.
+		 * The left neighbor is not contiguous.
+		 */
+		NEW = PREV;
+		error = xfs_rmapbt_delete(cur, NEW.rm_startblock,
+				NEW.rm_blockcount, NEW.rm_owner,
+				NEW.rm_offset, NEW.rm_flags);
+		if (error)
+			goto done;
+		NEW.rm_offset += len;
+		NEW.rm_startblock += len;
+		NEW.rm_blockcount -= len;
+		error = xfs_rmapbt_insert(cur, NEW.rm_startblock,
+				NEW.rm_blockcount, NEW.rm_owner,
+				NEW.rm_offset, NEW.rm_flags);
+		if (error)
+			goto done;
+		error = xfs_rmapbt_insert(cur, bno, len, owner, offset, newext);
+		if (error)
+			goto done;
+		break;
+
+	case RMAP_RIGHT_FILLING | RMAP_RIGHT_CONTIG:
+		/*
+		 * Setting the last part of a previous oldext extent to newext.
+		 * The right neighbor is contiguous with the new allocation.
+		 */
+		NEW = PREV;
+		error = xfs_rmap_lookup_eq(cur, NEW.rm_startblock,
+				NEW.rm_blockcount, NEW.rm_owner,
+				NEW.rm_offset, NEW.rm_flags, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		NEW.rm_blockcount = offset - NEW.rm_offset;
+		error = xfs_rmap_update(cur, &NEW);
+		if (error)
+			goto done;
+		NEW = RIGHT;
+		error = xfs_rmapbt_delete(cur, NEW.rm_startblock,
+				NEW.rm_blockcount, NEW.rm_owner,
+				NEW.rm_offset, NEW.rm_flags);
+		if (error)
+			goto done;
+		NEW.rm_offset = offset;
+		NEW.rm_startblock = bno;
+		NEW.rm_blockcount += len;
+		error = xfs_rmapbt_insert(cur, NEW.rm_startblock,
+				NEW.rm_blockcount, NEW.rm_owner,
+				NEW.rm_offset, NEW.rm_flags);
+		if (error)
+			goto done;
+		break;
+
+	case RMAP_RIGHT_FILLING:
+		/*
+		 * Setting the last part of a previous oldext extent to newext.
+		 * The right neighbor is not contiguous.
+		 */
+		NEW = PREV;
+		error = xfs_rmap_lookup_eq(cur, NEW.rm_startblock,
+				NEW.rm_blockcount, NEW.rm_owner,
+				NEW.rm_offset, NEW.rm_flags, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		NEW.rm_blockcount -= len;
+		error = xfs_rmap_update(cur, &NEW);
+		if (error)
+			goto done;
+		error = xfs_rmapbt_insert(cur, bno, len, owner, offset, newext);
+		if (error)
+			goto done;
+		break;
+
+	case 0:
+		/*
+		 * Setting the middle part of a previous oldext extent to
+		 * newext.  Contiguity is impossible here.
+		 * One extent becomes three extents.
+		 */
+		/* new right extent - oldext */
+		NEW.rm_startblock = bno + len;
+		NEW.rm_owner = owner;
+		NEW.rm_offset = new_endoff;
+		NEW.rm_blockcount = PREV.rm_offset + PREV.rm_blockcount -
+				new_endoff;
+		NEW.rm_flags = PREV.rm_flags;
+		error = xfs_rmapbt_insert(cur, NEW.rm_startblock,
+				NEW.rm_blockcount, NEW.rm_owner, NEW.rm_offset,
+				NEW.rm_flags);
+		if (error)
+			goto done;
+		/* new left extent - oldext */
+		NEW = PREV;
+		error = xfs_rmap_lookup_eq(cur, NEW.rm_startblock,
+				NEW.rm_blockcount, NEW.rm_owner,
+				NEW.rm_offset, NEW.rm_flags, &i);
+		if (error)
+			goto done;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
+		NEW.rm_blockcount = offset - NEW.rm_offset;
+		error = xfs_rmap_update(cur, &NEW);
+		if (error)
+			goto done;
+		/* new middle extent - newext */
+		NEW.rm_startblock = bno;
+		NEW.rm_blockcount = len;
+		NEW.rm_owner = owner;
+		NEW.rm_offset = offset;
+		NEW.rm_flags = newext;
+		error = xfs_rmapbt_insert(cur, NEW.rm_startblock,
+				NEW.rm_blockcount, NEW.rm_owner, NEW.rm_offset,
+				NEW.rm_flags);
+		if (error)
+			goto done;
+		break;
+
+	case RMAP_LEFT_FILLING | RMAP_LEFT_CONTIG | RMAP_RIGHT_CONTIG:
+	case RMAP_RIGHT_FILLING | RMAP_LEFT_CONTIG | RMAP_RIGHT_CONTIG:
+	case RMAP_LEFT_FILLING | RMAP_RIGHT_CONTIG:
+	case RMAP_RIGHT_FILLING | RMAP_LEFT_CONTIG:
+	case RMAP_LEFT_CONTIG | RMAP_RIGHT_CONTIG:
+	case RMAP_LEFT_CONTIG:
+	case RMAP_RIGHT_CONTIG:
+		/*
+		 * These cases are all impossible.
+		 */
+		ASSERT(0);
+	}
+
+	trace_xfs_rmap_convert_done(mp, cur->bc_private.a.agno, bno, len,
+			unwritten, oinfo);
+done:
+	if (error)
+		trace_xfs_rmap_convert_error(cur->bc_mp,
+				cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
 #undef	NEW
 #undef	LEFT
 #undef	RIGHT
@@ -1787,6 +2165,10 @@ xfs_rmap_finish_one(
 		error = xfs_rmap_convert(rcur, bno, blockcount, !unwritten,
 				&oinfo);
 		break;
+	case XFS_RMAP_CONVERT_SHARED:
+		error = xfs_rmap_convert_shared(rcur, bno, blockcount,
+				!unwritten, &oinfo);
+		break;
 	case XFS_RMAP_ALLOC:
 		error = __xfs_rmap_alloc(rcur, bno, blockcount, unwritten,
 				&oinfo);
@@ -1890,7 +2272,8 @@ xfs_rmap_convert_extent(
 {
 	struct xfs_rmap_intent	ri;
 
-	ri.ri_type = XFS_RMAP_CONVERT;
+	ri.ri_type = xfs_is_reflink_inode(ip) ? XFS_RMAP_CONVERT_SHARED :
+			XFS_RMAP_CONVERT;
 	ri.ri_owner = ip->i_ino;
 	ri.ri_whichfork = whichfork;
 	ri.ri_bmap = *PREV;


* [PATCH 107/145] xfs: don't allow realtime and reflinked files to mix
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (105 preceding siblings ...)
  2016-06-17  1:41 ` [PATCH 106/145] xfs: convert unwritten status of shared-extent reverse mappings " Darrick J. Wong
@ 2016-06-17  1:42 ` Darrick J. Wong
  2016-06-17  1:42 ` [PATCH 108/145] xfs: don't mix reflink and DAX mode for now Darrick J. Wong
                   ` (37 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:42 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

We don't support sharing blocks on the realtime device.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_inode_buf.c |   10 ++++++++++
 1 file changed, 10 insertions(+)
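
The check added below decodes the two big-endian flag fields and rejects an inode that carries both the reflink and the realtime flag. A self-contained sketch of the same idea follows. The struct layout, the helper names and the bit positions are assumptions made for illustration, and the byte-wise decoders stand in for be16_to_cpu()/be64_to_cpu().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions; see libxfs/xfs_format.h for the real ones. */
#define DIFLAG_REALTIME		(1U << 0)
#define DIFLAG2_REFLINK		(1ULL << 1)

/* Hypothetical on-disk fragment: both fields are stored big-endian. */
struct disk_inode {
	uint8_t	di_flags[2];
	uint8_t	di_flags2[8];
};

/* Decode a big-endian 16-bit value regardless of host byte order. */
static uint16_t
get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode a big-endian 64-bit value regardless of host byte order. */
static uint64_t
get_be64(const uint8_t *p)
{
	uint64_t	v = 0;
	int		i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Return false if the inode claims to be both reflinked and realtime. */
static bool
flags_ok(const struct disk_inode *dip)
{
	uint16_t	flags = get_be16(dip->di_flags);
	uint64_t	flags2 = get_be64(dip->di_flags2);

	if ((flags2 & DIFLAG2_REFLINK) && (flags & DIFLAG_REALTIME))
		return false;
	return true;
}
```

Either flag on its own passes; only the combination is treated as a verifier failure.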


diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index 8a804e2..3e46bc5 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -384,6 +384,9 @@ xfs_dinode_verify(
 	xfs_ino_t		ino,
 	struct xfs_dinode	*dip)
 {
+	uint16_t		flags;
+	uint64_t		flags2;
+
 	if (dip->di_magic != cpu_to_be16(XFS_DINODE_MAGIC))
 		return false;
 
@@ -400,6 +403,13 @@ xfs_dinode_verify(
 		return false;
 	if (!uuid_equal(&dip->di_uuid, &mp->m_sb.sb_meta_uuid))
 		return false;
+
+	/* don't let reflink and realtime mix */
+	flags = be16_to_cpu(dip->di_flags);
+	flags2 = be64_to_cpu(dip->di_flags2);
+	if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags & XFS_DIFLAG_REALTIME))
+		return false;
+
 	return true;
 }
 


* [PATCH 108/145] xfs: don't mix reflink and DAX mode for now
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (106 preceding siblings ...)
  2016-06-17  1:42 ` [PATCH 107/145] xfs: don't allow realtime and reflinked files to mix Darrick J. Wong
@ 2016-06-17  1:42 ` Darrick J. Wong
  2016-06-17  1:42 ` [PATCH 109/145] xfs: recognize the reflink feature bit Darrick J. Wong
                   ` (36 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:42 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Since we don't have a strategy for handling both DAX and reflink,
for now we'll just prohibit both being set at the same time.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_inode_buf.c |    4 ++++
 1 file changed, 4 insertions(+)


diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index 3e46bc5..cf48c8c 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -410,6 +410,10 @@ xfs_dinode_verify(
 	if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags & XFS_DIFLAG_REALTIME))
 		return false;
 
+	/* don't let reflink and dax mix */
+	if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags2 & XFS_DIFLAG2_DAX))
+		return false;
+
 	return true;
 }
 


* [PATCH 109/145] xfs: recognize the reflink feature bit
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (107 preceding siblings ...)
  2016-06-17  1:42 ` [PATCH 108/145] xfs: don't mix reflink and DAX mode for now Darrick J. Wong
@ 2016-06-17  1:42 ` Darrick J. Wong
  2016-06-17  1:42 ` [PATCH 110/145] xfs_db: dump refcount btree data Darrick J. Wong
                   ` (35 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:42 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Add the reflink feature flag to the set of recognized feature flags.
This enables users to write to reflink filesystems.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_format.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
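
The effect of widening the "all known" mask can be sketched as follows. The reflink bit value is taken from the diff below. The other two bit values and the helper name are assumptions for illustration. The sketch shows only the mask arithmetic; refusing write access when unknown read-only-compatible bits are set is how such masks are generally consumed, as far as I understand it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RO_COMPAT_FINOBT	(1U << 0)	/* assumed value */
#define RO_COMPAT_RMAPBT	(1U << 1)	/* assumed value */
#define RO_COMPAT_REFLINK	(1U << 2)	/* value shown in the diff */

/* Known-feature masks before and after this patch. */
#define RO_COMPAT_ALL_OLD	(RO_COMPAT_FINOBT | RO_COMPAT_RMAPBT)
#define RO_COMPAT_ALL_NEW	(RO_COMPAT_FINOBT | RO_COMPAT_RMAPBT | \
				 RO_COMPAT_REFLINK)

/* True if the superblock advertises a feature bit outside the known set. */
static bool
has_unknown_ro_compat(uint32_t features, uint32_t known)
{
	return (features & ~known) != 0;
}
```

With the old mask a reflink filesystem reports an unknown feature; with the new mask it does not, which is what allows it to be written to.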


diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index de29220..bfbf6e8 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -459,7 +459,8 @@ xfs_sb_has_compat_feature(
 #define XFS_SB_FEAT_RO_COMPAT_REFLINK  (1 << 2)		/* reflinked files */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
 		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
-		 XFS_SB_FEAT_RO_COMPAT_RMAPBT)
+		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
+		 XFS_SB_FEAT_RO_COMPAT_REFLINK)
 #define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL
 static inline bool
 xfs_sb_has_ro_compat_feature(


* [PATCH 110/145] xfs_db: dump refcount btree data
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (108 preceding siblings ...)
  2016-06-17  1:42 ` [PATCH 109/145] xfs: recognize the reflink feature bit Darrick J. Wong
@ 2016-06-17  1:42 ` Darrick J. Wong
  2016-06-17  1:42 ` [PATCH 111/145] xfs_db: add support for checking the refcount btree Darrick J. Wong
                   ` (34 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:42 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Add the ability to walk and dump the refcount btree in xfs_db.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/agf.c          |   10 ++++++++--
 db/btblock.c      |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 db/btblock.h      |    5 +++++
 db/field.c        |    9 +++++++++
 db/field.h        |    4 ++++
 db/inode.c        |    3 +++
 db/sb.c           |    2 ++
 db/type.c         |    5 +++++
 db/type.h         |    2 +-
 man/man8/xfs_db.8 |   47 +++++++++++++++++++++++++++++++++++++++++++++--
 10 files changed, 132 insertions(+), 5 deletions(-)
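
The field tables added below describe each on-disk member by name and bit offset, computed with offsetof() and a bytes-to-bits macro. A reduced sketch of that table-driven approach follows. The structure, table layout and lookup helper are simplified stand-ins, not the xfs_db definitions, and every field is assumed to be a 32-bit big-endian integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout: three 32-bit big-endian fields. */
struct refc_rec {
	uint32_t	rc_startblock;
	uint32_t	rc_blockcount;
	uint32_t	rc_refcount;
};

#define bitize(x)	((x) * 8)	/* bytes to bits */
#define ROFF(f)		bitize(offsetof(struct refc_rec, rc_ ## f))

/* Simplified field descriptor: a name and a bit offset into the record. */
struct fld {
	const char	*name;
	size_t		bitoff;
};

static const struct fld refc_rec_flds[] = {
	{ "startblock",	ROFF(startblock) },
	{ "blockcount",	ROFF(blockcount) },
	{ "refcount",	ROFF(refcount) },
	{ NULL, 0 }
};

/* Look up a field by name and decode it from a raw record buffer. */
static int
fld_read(const uint8_t *rec, const char *name, uint32_t *val)
{
	const struct fld	*f;

	for (f = refc_rec_flds; f->name; f++) {
		if (strcmp(f->name, name) == 0) {
			const uint8_t	*p = rec + f->bitoff / 8;

			*val = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
			       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
			return 0;
		}
	}
	return -1;	/* no such field */
}
```

Describing the layout as data rather than code is what lets a new btree type be added mostly by appending table entries, as this patch does.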


diff --git a/db/agf.c b/db/agf.c
index f4c4269..86d8929 100644
--- a/db/agf.c
+++ b/db/agf.c
@@ -47,7 +47,7 @@ const field_t	agf_flds[] = {
 	{ "versionnum", FLDT_UINT32D, OI(OFF(versionnum)), C1, 0, TYP_NONE },
 	{ "seqno", FLDT_AGNUMBER, OI(OFF(seqno)), C1, 0, TYP_NONE },
 	{ "length", FLDT_AGBLOCK, OI(OFF(length)), C1, 0, TYP_NONE },
-	{ "roots", FLDT_AGBLOCK, OI(OFF(roots)), CI(XFS_BTNUM_AGF),
+	{ "roots", FLDT_AGBLOCK, OI(OFF(roots)), CI(XFS_BTNUM_AGF) + 1,
 	  FLD_ARRAY|FLD_SKIPALL, TYP_NONE },
 	{ "bnoroot", FLDT_AGBLOCK,
 	  OI(OFF(roots) + XFS_BTNUM_BNO * SZ(roots[XFS_BTNUM_BNO])), C1, 0,
@@ -58,7 +58,10 @@ const field_t	agf_flds[] = {
 	{ "rmaproot", FLDT_AGBLOCKNZ,
 	  OI(OFF(roots) + XFS_BTNUM_RMAP * SZ(roots[XFS_BTNUM_RMAP])), C1, 0,
 	  TYP_RMAPBT },
-	{ "levels", FLDT_UINT32D, OI(OFF(levels)), CI(XFS_BTNUM_AGF),
+	{ "refcntroot", FLDT_AGBLOCKNZ,
+	  OI(OFF(refcount_root)), C1, 0,
+	  TYP_REFCBT },
+	{ "levels", FLDT_UINT32D, OI(OFF(levels)), CI(XFS_BTNUM_AGF) + 1,
 	  FLD_ARRAY|FLD_SKIPALL, TYP_NONE },
 	{ "bnolevel", FLDT_UINT32D,
 	  OI(OFF(levels) + XFS_BTNUM_BNO * SZ(levels[XFS_BTNUM_BNO])), C1, 0,
@@ -69,6 +72,9 @@ const field_t	agf_flds[] = {
 	{ "rmaplevel", FLDT_UINT32D,
 	  OI(OFF(levels) + XFS_BTNUM_RMAP * SZ(levels[XFS_BTNUM_RMAP])), C1, 0,
 	  TYP_NONE },
+	{ "refcntlevel", FLDT_UINT32D,
+	  OI(OFF(refcount_level)), C1, 0,
+	  TYP_NONE },
 	{ "flfirst", FLDT_UINT32D, OI(OFF(flfirst)), C1, 0, TYP_NONE },
 	{ "fllast", FLDT_UINT32D, OI(OFF(fllast)), C1, 0, TYP_NONE },
 	{ "flcount", FLDT_UINT32D, OI(OFF(flcount)), C1, 0, TYP_NONE },
diff --git a/db/btblock.c b/db/btblock.c
index ce59d18..e0c896b 100644
--- a/db/btblock.c
+++ b/db/btblock.c
@@ -102,6 +102,12 @@ struct xfs_db_btree {
 		sizeof(struct xfs_rmap_rec),
 		sizeof(__be32),
 	},
+	{	XFS_REFC_CRC_MAGIC,
+		XFS_BTREE_SBLOCK_CRC_LEN,
+		sizeof(struct xfs_refcount_key),
+		sizeof(struct xfs_refcount_rec),
+		sizeof(__be32),
+	},
 	{	0,
 	},
 };
@@ -707,3 +713,47 @@ const field_t	rmapbt_rec_flds[] = {
 	{ NULL }
 };
 #undef ROFF
+
+/* refcount btree blocks */
+const field_t	refcbt_crc_hfld[] = {
+	{ "", FLDT_REFCBT_CRC, OI(0), C1, 0, TYP_NONE },
+	{ NULL }
+};
+
+#define	OFF(f)	bitize(offsetof(struct xfs_btree_block, bb_ ## f))
+const field_t	refcbt_crc_flds[] = {
+	{ "magic", FLDT_UINT32X, OI(OFF(magic)), C1, 0, TYP_NONE },
+	{ "level", FLDT_UINT16D, OI(OFF(level)), C1, 0, TYP_NONE },
+	{ "numrecs", FLDT_UINT16D, OI(OFF(numrecs)), C1, 0, TYP_NONE },
+	{ "leftsib", FLDT_AGBLOCK, OI(OFF(u.s.bb_leftsib)), C1, 0, TYP_REFCBT },
+	{ "rightsib", FLDT_AGBLOCK, OI(OFF(u.s.bb_rightsib)), C1, 0, TYP_REFCBT },
+	{ "bno", FLDT_DFSBNO, OI(OFF(u.s.bb_blkno)), C1, 0, TYP_REFCBT },
+	{ "lsn", FLDT_UINT64X, OI(OFF(u.s.bb_lsn)), C1, 0, TYP_NONE },
+	{ "uuid", FLDT_UUID, OI(OFF(u.s.bb_uuid)), C1, 0, TYP_NONE },
+	{ "owner", FLDT_AGNUMBER, OI(OFF(u.s.bb_owner)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(OFF(u.s.bb_crc)), C1, 0, TYP_NONE },
+	{ "recs", FLDT_REFCBTREC, btblock_rec_offset, btblock_rec_count,
+	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE },
+	{ "keys", FLDT_REFCBTKEY, btblock_key_offset, btblock_key_count,
+	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE },
+	{ "ptrs", FLDT_REFCBTPTR, btblock_ptr_offset, btblock_key_count,
+	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_REFCBT },
+	{ NULL }
+};
+#undef OFF
+
+#define	KOFF(f)	bitize(offsetof(struct xfs_refcount_key, rc_ ## f))
+const field_t	refcbt_key_flds[] = {
+	{ "startblock", FLDT_AGBLOCK, OI(KOFF(startblock)), C1, 0, TYP_DATA },
+	{ NULL }
+};
+#undef KOFF
+
+#define	ROFF(f)	bitize(offsetof(struct xfs_refcount_rec, rc_ ## f))
+const field_t	refcbt_rec_flds[] = {
+	{ "startblock", FLDT_AGBLOCK, OI(ROFF(startblock)), C1, 0, TYP_DATA },
+	{ "blockcount", FLDT_EXTLEN, OI(ROFF(blockcount)), C1, 0, TYP_NONE },
+	{ "refcount", FLDT_UINT32D, OI(ROFF(refcount)), C1, 0, TYP_DATA },
+	{ NULL }
+};
+#undef ROFF
diff --git a/db/btblock.h b/db/btblock.h
index 35299b4..fead2f1 100644
--- a/db/btblock.h
+++ b/db/btblock.h
@@ -59,4 +59,9 @@ extern const struct field	rmapbt_crc_hfld[];
 extern const struct field	rmapbt_key_flds[];
 extern const struct field	rmapbt_rec_flds[];
 
+extern const struct field	refcbt_crc_flds[];
+extern const struct field	refcbt_crc_hfld[];
+extern const struct field	refcbt_key_flds[];
+extern const struct field	refcbt_rec_flds[];
+
 extern int	btblock_size(void *obj, int startoff, int idx);
diff --git a/db/field.c b/db/field.c
index 58728a9..f81b64d 100644
--- a/db/field.c
+++ b/db/field.c
@@ -183,6 +183,15 @@ const ftattr_t	ftattrtab[] = {
 	{ FLDT_RMAPBTREC, "rmapbtrec", fp_sarray, (char *)rmapbt_rec_flds,
 	  SI(bitsz(struct xfs_rmap_rec)), 0, NULL, rmapbt_rec_flds },
 
+	{ FLDT_REFCBT_CRC, "refcntbt", NULL, (char *)refcbt_crc_flds, btblock_size,
+	  FTARG_SIZE, NULL, refcbt_crc_flds },
+	{ FLDT_REFCBTKEY, "refcntbtkey", fp_sarray, (char *)refcbt_key_flds,
+	  SI(bitsz(struct xfs_refcount_key)), 0, NULL, refcbt_key_flds },
+	{ FLDT_REFCBTPTR, "refcntbtptr", fp_num, "%u", SI(bitsz(xfs_refcount_ptr_t)),
+	  0, fa_agblock, NULL },
+	{ FLDT_REFCBTREC, "refcntbtrec", fp_sarray, (char *)refcbt_rec_flds,
+	  SI(bitsz(struct xfs_refcount_rec)), 0, NULL, refcbt_rec_flds },
+
 /* CRC field */
 	{ FLDT_CRC, "crc", fp_crc, "%#x (%s)", SI(bitsz(__uint32_t)),
 	  0, NULL, NULL },
diff --git a/db/field.h b/db/field.h
index 47f562a..ae5f490 100644
--- a/db/field.h
+++ b/db/field.h
@@ -89,6 +89,10 @@ typedef enum fldt	{
 	FLDT_RMAPBTKEY,
 	FLDT_RMAPBTPTR,
 	FLDT_RMAPBTREC,
+	FLDT_REFCBT_CRC,
+	FLDT_REFCBTKEY,
+	FLDT_REFCBTPTR,
+	FLDT_REFCBTREC,
 
 	/* CRC field type */
 	FLDT_CRC,
diff --git a/db/inode.c b/db/inode.c
index 442e6ea..702cdf8 100644
--- a/db/inode.c
+++ b/db/inode.c
@@ -175,6 +175,9 @@ const field_t	inode_v3_flds[] = {
 	{ "crtime", FLDT_TIMESTAMP, OI(COFF(crtime)), C1, 0, TYP_NONE },
 	{ "inumber", FLDT_INO, OI(COFF(ino)), C1, 0, TYP_NONE },
 	{ "uuid", FLDT_UUID, OI(COFF(uuid)), C1, 0, TYP_NONE },
+	{ "reflink", FLDT_UINT1,
+	  OI(COFF(flags2) + bitsz(__uint64_t) - XFS_DIFLAG2_REFLINK_BIT-1), C1,
+	  0, TYP_NONE },
 	{ NULL }
 };
 
diff --git a/db/sb.c b/db/sb.c
index 79a3c1d..8e7722c 100644
--- a/db/sb.c
+++ b/db/sb.c
@@ -694,6 +694,8 @@ version_string(
 		strcat(s, ",SPARSE_INODES");
 	if (xfs_sb_version_hasmetauuid(sbp))
 		strcat(s, ",META_UUID");
+	if (xfs_sb_version_hasreflink(sbp))
+		strcat(s, ",REFLINK");
 	return s;
 }
 
diff --git a/db/type.c b/db/type.c
index dd192a1..2a501ca 100644
--- a/db/type.c
+++ b/db/type.c
@@ -59,6 +59,7 @@ static const typ_t	__typtab[] = {
 	{ TYP_BNOBT, "bnobt", handle_struct, bnobt_hfld, NULL },
 	{ TYP_CNTBT, "cntbt", handle_struct, cntbt_hfld, NULL },
 	{ TYP_RMAPBT, NULL },
+	{ TYP_REFCBT, NULL },
 	{ TYP_DATA, "data", handle_block, NULL, NULL },
 	{ TYP_DIR2, "dir2", handle_struct, dir2_hfld, NULL },
 	{ TYP_DQBLK, "dqblk", handle_struct, dqblk_hfld, NULL },
@@ -91,6 +92,8 @@ static const typ_t	__typtab_crc[] = {
 		&xfs_allocbt_buf_ops },
 	{ TYP_RMAPBT, "rmapbt", handle_struct, rmapbt_crc_hfld,
 		&xfs_rmapbt_buf_ops },
+	{ TYP_REFCBT, "refcntbt", handle_struct, refcbt_crc_hfld,
+		&xfs_refcountbt_buf_ops },
 	{ TYP_DATA, "data", handle_block, NULL, NULL },
 	{ TYP_DIR2, "dir3", handle_struct, dir3_hfld,
 		&xfs_dir3_db_buf_ops },
@@ -129,6 +132,8 @@ static const typ_t	__typtab_spcrc[] = {
 		&xfs_allocbt_buf_ops },
 	{ TYP_RMAPBT, "rmapbt", handle_struct, rmapbt_crc_hfld,
 		&xfs_rmapbt_buf_ops },
+	{ TYP_REFCBT, "refcntbt", handle_struct, refcbt_crc_hfld,
+		&xfs_refcountbt_buf_ops },
 	{ TYP_DATA, "data", handle_block, NULL, NULL },
 	{ TYP_DIR2, "dir3", handle_struct, dir3_hfld,
 		&xfs_dir3_db_buf_ops },
diff --git a/db/type.h b/db/type.h
index 1bef8e6..998f755 100644
--- a/db/type.h
+++ b/db/type.h
@@ -24,7 +24,7 @@ struct field;
 typedef enum typnm
 {
 	TYP_AGF, TYP_AGFL, TYP_AGI, TYP_ATTR, TYP_BMAPBTA,
-	TYP_BMAPBTD, TYP_BNOBT, TYP_CNTBT, TYP_RMAPBT, TYP_DATA,
+	TYP_BMAPBTD, TYP_BNOBT, TYP_CNTBT, TYP_RMAPBT, TYP_REFCBT, TYP_DATA,
 	TYP_DIR2, TYP_DQBLK, TYP_INOBT, TYP_INODATA, TYP_INODE,
 	TYP_LOG, TYP_RTBITMAP, TYP_RTSUMMARY, TYP_SB, TYP_SYMLINK,
 	TYP_TEXT, TYP_FINOBT, TYP_NONE
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index a380f78..b6d2f64 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -673,8 +673,8 @@ If no argument is given, show the current data type.
 The possible data types are:
 .BR agf ", " agfl ", " agi ", " attr ", " bmapbta ", " bmapbtd ,
 .BR bnobt ", " cntbt ", " data ", " dir ", " dir2 ", " dqblk ,
-.BR inobt ", " inode ", " log ", " rmapbt ", " rtbitmap ", " rtsummary ,
-.BR sb ", " symlink " and " text .
+.BR inobt ", " inode ", " log ", " refcntbt ", " rmapbt ", " rtbitmap ,
+.BR rtsummary ", " sb ", " symlink " and " text .
 See the TYPES section below for more information on these data types.
 .TP
 .BI "uuid [" uuid " | " generate " | " rewrite " | " restore ]
@@ -1658,6 +1658,49 @@ use
 .BR xfs_logprint (8)
 instead.
 .TP
+.B refcntbt
+There is one set of filesystem blocks forming the reference count Btree for
+each allocation group. The root block of this Btree is designated by the
+.B refcntroot
+field in the corresponding AGF block.  The blocks are linked to sibling left
+and right blocks at each level, as well as by pointers from parent to child
+blocks.  Each block has the following fields:
+.RS 1.4i
+.PD 0
+.TP 1.2i
+.B magic
+REFC block magic number, 0x52334643 ('R3FC').
+.TP
+.B level
+level number of this block, 0 is a leaf.
+.TP
+.B numrecs
+number of data entries in the block.
+.TP
+.B leftsib
+left (logically lower) sibling block, 0 if none.
+.TP
+.B rightsib
+right (logically higher) sibling block, 0 if none.
+.TP
+.B recs
+[leaf blocks only] array of reference count records. Each record contains
+.BR startblock ,
+.BR blockcount ,
+and
+.BR refcount .
+.TP
+.B keys
+[non-leaf blocks only] array of key records. These are the first value
+of each block in the level below this one. Each record contains
+.BR startblock .
+.TP
+.B ptrs
+[non-leaf blocks only] array of child block pointers. Each pointer is a
+block number within the allocation group to the next level in the Btree.
+.PD
+.RE
+.TP
 .B rmapbt
 There is one set of filesystem blocks forming the reverse mapping Btree for
 each allocation group. The root block of this Btree is designated by the
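
Reader note (illustrative only, not part of the patch): the refcntbt man
page text above says each leaf record carries startblock, blockcount and
refcount. A minimal sketch of decoding one such record follows. It assumes
a 12-byte record made of three big-endian 32-bit values, which is inferred
from the be32_to_cpu() calls and 32-bit field types elsewhere in this
series. The names demo_refc_rec, demo_be32 and demo_decode_refc_rec are
hypothetical and do not exist in xfsprogs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-endian view of one refcount record. */
struct demo_refc_rec {
	uint32_t startblock;	/* first AG block of the extent */
	uint32_t blockcount;	/* length of the extent in blocks */
	uint32_t refcount;	/* number of owners sharing the extent */
};

/* Read a big-endian 32-bit value from a possibly unaligned buffer. */
static uint32_t
demo_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode one on-disk record; assumed layout is 3 x be32 = 12 bytes. */
static struct demo_refc_rec
demo_decode_refc_rec(const unsigned char *buf)
{
	struct demo_refc_rec r;

	assert(buf != NULL);
	r.startblock = demo_be32(buf);
	r.blockcount = demo_be32(buf + 4);
	r.refcount = demo_be32(buf + 8);
	return r;
}
```

The same helper also shows why the magic reads as 'R3FC': the four ASCII
bytes interpreted big-endian give 0x52334643.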

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 111/145] xfs_db: add support for checking the refcount btree
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (109 preceding siblings ...)
  2016-06-17  1:42 ` [PATCH 110/145] xfs_db: dump refcount btree data Darrick J. Wong
@ 2016-06-17  1:42 ` Darrick J. Wong
  2016-06-17  1:42 ` [PATCH 112/145] xfs_db: metadump should copy the refcount btree too Darrick J. Wong
                   ` (33 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:42 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Do some basic checks of the refcount btree.  xfs_repair will have to
check that the reference counts match the various bmbt mappings.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/check.c |  136 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 128 insertions(+), 8 deletions(-)
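
Reader note (illustrative only, not part of the patch): the comment added
above is_reflink() describes a small state rule for the per-block map. A
second data claim on a data block promotes it to rldata, and an rldata
block is never downgraded back to data. The sketch below restates that
rule as a pure function. The enum and function names are hypothetical
stand-ins for dbm_t and the loop body in check_set_dbmap(), not the real
definitions.

```c
/* Hypothetical subset of the dbm_t block states used by db/check.c. */
enum demo_dbm { DEMO_UNKNOWN, DEMO_DATA, DEMO_RLDATA, DEMO_BTREFC };

/*
 * Return the state a block should have after an owner claims it as
 * `incoming`, given that it is currently recorded as `cur`.
 */
static enum demo_dbm
demo_merge_state(enum demo_dbm cur, enum demo_dbm incoming)
{
	/* A shared block stays shared when another file maps it as data. */
	if (cur == DEMO_RLDATA && incoming == DEMO_DATA)
		return DEMO_RLDATA;
	/* A second data claim on a data block means it is shared. */
	if (cur == DEMO_DATA && incoming == DEMO_DATA)
		return DEMO_RLDATA;
	/* Everything else simply takes the incoming state. */
	return incoming;
}
```

Writing the rule with early returns makes the three cases mutually
exclusive, which is the behaviour the comment in the patch asks for.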


diff --git a/db/check.c b/db/check.c
index 2964b5f..3b17585 100644
--- a/db/check.c
+++ b/db/check.c
@@ -44,7 +44,8 @@ typedef enum {
 	DBM_FREE1,	DBM_FREE2,	DBM_FREELIST,	DBM_INODE,
 	DBM_LOG,	DBM_MISSING,	DBM_QUOTA,	DBM_RTBITMAP,
 	DBM_RTDATA,	DBM_RTFREE,	DBM_RTSUM,	DBM_SB,
-	DBM_SYMLINK,	DBM_BTFINO,	DBM_BTRMAP,
+	DBM_SYMLINK,	DBM_BTFINO,	DBM_BTRMAP,	DBM_BTREFC,
+	DBM_RLDATA,
 	DBM_NDBM
 } dbm_t;
 
@@ -52,7 +53,8 @@ typedef struct inodata {
 	struct inodata	*next;
 	nlink_t		link_set;
 	nlink_t		link_add;
-	char		isdir;
+	char		isdir:1;
+	char		isreflink:1;
 	char		security;
 	char		ilist;
 	xfs_ino_t	ino;
@@ -172,6 +174,8 @@ static const char	*typename[] = {
 	"symlink",
 	"btfino",
 	"btrmap",
+	"btrefcnt",
+	"rldata",
 	NULL
 };
 static int		verbose;
@@ -229,7 +233,8 @@ static int		blocktrash_f(int argc, char **argv);
 static int		blockuse_f(int argc, char **argv);
 static int		check_blist(xfs_fsblock_t bno);
 static void		check_dbmap(xfs_agnumber_t agno, xfs_agblock_t agbno,
-				    xfs_extlen_t len, dbm_t type);
+				    xfs_extlen_t len, dbm_t type,
+				    int ignore_reflink);
 static int		check_inomap(xfs_agnumber_t agno, xfs_agblock_t agbno,
 				     xfs_extlen_t len, xfs_ino_t c_ino);
 static void		check_linkcounts(xfs_agnumber_t agno);
@@ -353,6 +358,9 @@ static void		scanfunc_fino(struct xfs_btree_block *block, int level,
 static void		scanfunc_rmap(struct xfs_btree_block *block, int level,
 				     struct xfs_agf *agf, xfs_agblock_t bno,
 				     int isroot);
+static void		scanfunc_refcnt(struct xfs_btree_block *block, int level,
+				     struct xfs_agf *agf, xfs_agblock_t bno,
+				     int isroot);
 static void		set_dbmap(xfs_agnumber_t agno, xfs_agblock_t agbno,
 				  xfs_extlen_t len, dbm_t type,
 				  xfs_agnumber_t c_agno, xfs_agblock_t c_agbno);
@@ -1055,6 +1063,7 @@ blocktrash_f(
 		   (1 << DBM_SYMLINK) |
 		   (1 << DBM_BTFINO) |
 		   (1 << DBM_BTRMAP) |
+		   (1 << DBM_BTREFC) |
 		   (1 << DBM_SB);
 	while ((c = getopt(argc, argv, "0123n:o:s:t:x:y:z")) != EOF) {
 		switch (c) {
@@ -1291,18 +1300,25 @@ check_dbmap(
 	xfs_agnumber_t	agno,
 	xfs_agblock_t	agbno,
 	xfs_extlen_t	len,
-	dbm_t		type)
+	dbm_t		type,
+	int		ignore_reflink)
 {
 	xfs_extlen_t	i;
 	char		*p;
+	dbm_t		d;
 
 	for (i = 0, p = &dbmap[agno][agbno]; i < len; i++, p++) {
+		d = (dbm_t)*p;
+		if (ignore_reflink && (d == DBM_UNKNOWN || d == DBM_DATA ||
+				       d == DBM_RLDATA))
+			continue;
 		if ((dbm_t)*p != type) {
-			if (!sflag || CHECK_BLISTA(agno, agbno + i))
+			if (!sflag || CHECK_BLISTA(agno, agbno + i)) {
 				dbprintf(_("block %u/%u expected type %s got "
 					 "%s\n"),
 					agno, agbno + i, typename[type],
 					typename[(dbm_t)*p]);
+			}
 			error++;
 		}
 	}
@@ -1336,7 +1352,7 @@ check_inomap(
 		return 0;
 	}
 	for (i = 0, rval = 1, idp = &inomap[agno][agbno]; i < len; i++, idp++) {
-		if (*idp) {
+		if (*idp && !(*idp)->isreflink) {
 			if (!sflag || (*idp)->ilist ||
 			    CHECK_BLISTA(agno, agbno + i))
 				dbprintf(_("block %u/%u claimed by inode %lld, "
@@ -1542,6 +1558,26 @@ check_rrange(
 	return 1;
 }
 
+/*
+ * We don't check the accuracy of reference counts -- all we do is ensure
+ * that a data block never crosses with non-data blocks.  repair can check
+ * those kinds of things.
+ *
+ * So with that in mind, if we're setting a block to be data or rldata,
+ * don't complain so long as the block is currently unknown, data, or rldata.
+ * Don't let blocks downgrade from rldata -> data.
+ */
+static bool
+is_reflink(
+	dbm_t		type2)
+{
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return false;
+	if (type2 == DBM_DATA || type2 == DBM_RLDATA)
+		return true;
+	return false;
+}
+
 static void
 check_set_dbmap(
 	xfs_agnumber_t	agno,
@@ -1561,10 +1597,15 @@ check_set_dbmap(
 			agbno, agbno + len - 1, c_agno, c_agbno);
 		return;
 	}
-	check_dbmap(agno, agbno, len, type1);
+	check_dbmap(agno, agbno, len, type1, is_reflink(type2));
 	mayprint = verbose | blist_size;
 	for (i = 0, p = &dbmap[agno][agbno]; i < len; i++, p++) {
-		*p = (char)type2;
+		if (*p == DBM_RLDATA && type2 == DBM_DATA)
+			;	/* do nothing */
+		else if (*p == DBM_DATA && type2 == DBM_DATA)
+			*p = (char)DBM_RLDATA;
+		else
+			*p = (char)type2;
 		if (mayprint && (verbose || CHECK_BLISTA(agno, agbno + i)))
 			dbprintf(_("setting block %u/%u to %s\n"), agno, agbno + i,
 				typename[type2]);
@@ -2804,6 +2845,7 @@ process_inode(
 		break;
 	}
 
+	id->isreflink = !!(xino.i_d.di_flags2 & XFS_DIFLAG2_REFLINK);
 	setlink_inode(id, VFS_I(&xino)->i_nlink, type == DBM_DIR, security);
 
 	switch (xino.i_d.di_format) {
@@ -3910,6 +3952,12 @@ scan_ag(
 			be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]),
 			1, scanfunc_rmap, TYP_RMAPBT);
 	}
+	if (agf->agf_refcount_root) {
+		scan_sbtree(agf,
+			be32_to_cpu(agf->agf_refcount_root),
+			be32_to_cpu(agf->agf_refcount_level),
+			1, scanfunc_refcnt, TYP_REFCBT);
+	}
 	scan_sbtree(agf,
 		be32_to_cpu(agi->agi_root),
 		be32_to_cpu(agi->agi_level),
@@ -4733,6 +4781,78 @@ scanfunc_rmap(
 }
 
 static void
+scanfunc_refcnt(
+	struct xfs_btree_block	*block,
+	int			level,
+	struct xfs_agf		*agf,
+	xfs_agblock_t		bno,
+	int			isroot)
+{
+	xfs_agnumber_t		seqno = be32_to_cpu(agf->agf_seqno);
+	int			i;
+	xfs_refcount_ptr_t	*pp;
+	struct xfs_refcount_rec	*rp;
+	xfs_agblock_t		lastblock;
+
+	if (be32_to_cpu(block->bb_magic) != XFS_REFC_CRC_MAGIC) {
+		dbprintf(_("bad magic # %#x in refcntbt block %u/%u\n"),
+			be32_to_cpu(block->bb_magic), seqno, bno);
+		serious_error++;
+		return;
+	}
+	if (be16_to_cpu(block->bb_level) != level) {
+		if (!sflag)
+			dbprintf(_("expected level %d got %d in refcntbt block "
+				 "%u/%u\n"),
+				level, be16_to_cpu(block->bb_level), seqno, bno);
+		error++;
+	}
+	set_dbmap(seqno, bno, 1, DBM_BTREFC, seqno, bno);
+	if (level == 0) {
+		if (be16_to_cpu(block->bb_numrecs) > mp->m_refc_mxr[0] ||
+		    (isroot == 0 && be16_to_cpu(block->bb_numrecs) < mp->m_refc_mnr[0])) {
+			dbprintf(_("bad btree nrecs (%u, min=%u, max=%u) in "
+				 "refcntbt block %u/%u\n"),
+				be16_to_cpu(block->bb_numrecs), mp->m_refc_mnr[0],
+				mp->m_refc_mxr[0], seqno, bno);
+			serious_error++;
+			return;
+		}
+		rp = XFS_REFCOUNT_REC_ADDR(block, 1);
+		lastblock = 0;
+		for (i = 0; i < be16_to_cpu(block->bb_numrecs); i++) {
+			set_dbmap(seqno, be32_to_cpu(rp[i].rc_startblock),
+				be32_to_cpu(rp[i].rc_blockcount), DBM_RLDATA,
+				seqno, bno);
+			if (be32_to_cpu(rp[i].rc_startblock) < lastblock) {
+				dbprintf(_(
+		"out-of-order refcnt btree record %d (%u %u) block %u/%u\n"),
+					 i, be32_to_cpu(rp[i].rc_startblock),
+					 be32_to_cpu(rp[i].rc_blockcount),
+					 be32_to_cpu(agf->agf_seqno), bno);
+			} else {
+				lastblock = be32_to_cpu(rp[i].rc_startblock) +
+					    be32_to_cpu(rp[i].rc_blockcount);
+			}
+		}
+		return;
+	}
+	if (be16_to_cpu(block->bb_numrecs) > mp->m_refc_mxr[1] ||
+	    (isroot == 0 && be16_to_cpu(block->bb_numrecs) < mp->m_refc_mnr[1])) {
+		dbprintf(_("bad btree nrecs (%u, min=%u, max=%u) in refcntbt "
+			 "block %u/%u\n"),
+			be16_to_cpu(block->bb_numrecs), mp->m_refc_mnr[1],
+			mp->m_refc_mxr[1], seqno, bno);
+		serious_error++;
+		return;
+	}
+	pp = XFS_REFCOUNT_PTR_ADDR(block, 1, mp->m_refc_mxr[1]);
+	for (i = 0; i < be16_to_cpu(block->bb_numrecs); i++)
+		scan_sbtree(agf, be32_to_cpu(pp[i]), level, 0, scanfunc_refcnt,
+				TYP_REFCBT);
+}
+
+static void
 set_dbmap(
 	xfs_agnumber_t	agno,
 	xfs_agblock_t	agbno,
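
Reader note (illustrative only, not part of the patch): the leaf loop in
scanfunc_refcnt() flags a record as out of order when it starts before the
end of the last in-order record. A self-contained sketch of that check is
below. The struct and function names are hypothetical, and the sketch
counts offending records, whereas the hunk as posted only prints a
message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-endian extent: start block and length. */
struct demo_extent {
	uint32_t startblock;
	uint32_t blockcount;
};

/* Count records that start before the end of the last in-order record. */
static size_t
demo_count_out_of_order(const struct demo_extent *recs, size_t nrecs)
{
	uint32_t lastblock = 0;	/* one past the end of the last good record */
	size_t bad = 0;
	size_t i;

	for (i = 0; i < nrecs; i++) {
		if (recs[i].startblock < lastblock)
			bad++;
		else
			lastblock = recs[i].startblock + recs[i].blockcount;
	}
	return bad;
}
```

Because lastblock only advances on in-order records, a single stray
record does not cause every later record to be reported as well.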


* [PATCH 112/145] xfs_db: metadump should copy the refcount btree too
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (110 preceding siblings ...)
  2016-06-17  1:42 ` [PATCH 111/145] xfs_db: add support for checking the refcount btree Darrick J. Wong
@ 2016-06-17  1:42 ` Darrick J. Wong
  2016-06-17  1:42 ` [PATCH 113/145] xfs_db: deal with the CoW extent size hint Darrick J. Wong
                   ` (32 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:42 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Teach metadump to copy the refcount btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/metadump.c |   74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)
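
Reader note (illustrative only, not part of the patch): copy_refcount_btree()
validates the root block and level count from the AGF before walking the
tree. The sketch below restates those two checks as a predicate. The
function name and DEMO_BTREE_MAXLEVELS are hypothetical; the constant's
value here is a placeholder, not the real XFS_BTREE_MAXLEVELS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder limit; stands in for XFS_BTREE_MAXLEVELS. */
#define DEMO_BTREE_MAXLEVELS	8

/* Mirror the root/levels sanity checks done before scanning the btree. */
static bool
demo_refc_root_ok(uint32_t root, uint32_t levels, uint32_t agblocks)
{
	/* Zero or past-the-AG roots are rejected, as in the patch. */
	if (root == 0 || root > agblocks)
		return false;
	/* A level count at or above the limit is rejected. */
	if (levels >= DEMO_BTREE_MAXLEVELS)
		return false;
	return true;
}
```

Note that, as in the patch, a root equal to agblocks passes this test;
the per-pointer valid_bno() check is what guards the child pointers.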


diff --git a/db/metadump.c b/db/metadump.c
index 609a5d7..a98e951 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -615,6 +615,78 @@ copy_rmap_btree(
 	return scan_btree(agno, root, levels, TYP_RMAPBT, agf, scanfunc_rmapbt);
 }
 
+static int
+scanfunc_refcntbt(
+	struct xfs_btree_block	*block,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno,
+	int			level,
+	typnm_t			btype,
+	void			*arg)
+{
+	xfs_refcount_ptr_t	*pp;
+	int			i;
+	int			numrecs;
+
+	if (level == 0)
+		return 1;
+
+	numrecs = be16_to_cpu(block->bb_numrecs);
+	if (numrecs > mp->m_refc_mxr[1]) {
+		if (show_warnings)
+			print_warning("invalid numrecs (%u) in %s block %u/%u",
+				numrecs, typtab[btype].name, agno, agbno);
+		return 1;
+	}
+
+	pp = XFS_REFCOUNT_PTR_ADDR(block, 1, mp->m_refc_mxr[1]);
+	for (i = 0; i < numrecs; i++) {
+		if (!valid_bno(agno, be32_to_cpu(pp[i]))) {
+			if (show_warnings)
+				print_warning("invalid block number (%u/%u) "
+					"in %s block %u/%u",
+					agno, be32_to_cpu(pp[i]),
+					typtab[btype].name, agno, agbno);
+			continue;
+		}
+		if (!scan_btree(agno, be32_to_cpu(pp[i]), level, btype, arg,
+				scanfunc_refcntbt))
+			return 0;
+	}
+	return 1;
+}
+
+static int
+copy_refcount_btree(
+	xfs_agnumber_t	agno,
+	struct xfs_agf	*agf)
+{
+	xfs_agblock_t	root;
+	int		levels;
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return 1;
+
+	root = be32_to_cpu(agf->agf_refcount_root);
+	levels = be32_to_cpu(agf->agf_refcount_level);
+
+	/* validate root and levels before processing the tree */
+	if (root == 0 || root > mp->m_sb.sb_agblocks) {
+		if (show_warnings)
+			print_warning("invalid block number (%u) in refcntbt "
+					"root in agf %u", root, agno);
+		return 1;
+	}
+	if (levels >= XFS_BTREE_MAXLEVELS) {
+		if (show_warnings)
+			print_warning("invalid level (%u) in refcntbt root "
+					"in agf %u", levels, agno);
+		return 1;
+	}
+
+	return scan_btree(agno, root, levels, TYP_REFCBT, agf, scanfunc_refcntbt);
+}
+
 /* filename and extended attribute obfuscation routines */
 
 struct name_ent {
@@ -2525,6 +2597,8 @@ scan_ag(
 			goto pop_out;
 		if (!copy_rmap_btree(agno, agf))
 			goto pop_out;
+		if (!copy_refcount_btree(agno, agf))
+			goto pop_out;
 	}
 
 	/* copy inode btrees and the inodes and their associated metadata */


* [PATCH 113/145] xfs_db: deal with the CoW extent size hint
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (111 preceding siblings ...)
  2016-06-17  1:42 ` [PATCH 112/145] xfs_db: metadump should copy the refcount btree too Darrick J. Wong
@ 2016-06-17  1:42 ` Darrick J. Wong
  2016-06-17  1:42 ` [PATCH 114/145] xfs_growfs: report the presence of the reflink feature Darrick J. Wong
                   ` (31 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:42 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Display the CoW extent size hint and its inode flag when dumping inodes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/inode.c |    4 ++++
 1 file changed, 4 insertions(+)
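
Reader note (illustrative only, not part of the patch): the single-bit
fields are addressed with the expression
COFF(flags2) + bitsz(__uint64_t) - BIT - 1. The sketch below shows the
arithmetic behind it, assuming bit offsets count from the most significant
bit of the first byte of a big-endian field. The helper names are
hypothetical, and the bit number used in any call is just an example
value, not the real XFS_DIFLAG2_* constant.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bit offset of flag bit `bit` (mask 1ULL << bit) inside a big-endian
 * 64-bit field that starts `field_bitoff` bits into the structure.
 */
static unsigned int
demo_flag_bitoff(unsigned int field_bitoff, unsigned int bit)
{
	assert(bit < 64);
	return field_bitoff + 64 - bit - 1;
}

/* Fetch one bit from a byte buffer; offset 0 is the MSB of byte 0. */
static int
demo_get_bit(const unsigned char *buf, unsigned int bitoff)
{
	return (buf[bitoff / 8] >> (7 - bitoff % 8)) & 1;
}
```

The least significant flag bit therefore lands at the highest bit offset
of the field, which is why the bit number is subtracted rather than added.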


diff --git a/db/inode.c b/db/inode.c
index 702cdf8..cac19fc 100644
--- a/db/inode.c
+++ b/db/inode.c
@@ -172,12 +172,16 @@ const field_t	inode_v3_flds[] = {
 	{ "change_count", FLDT_UINT64D, OI(COFF(changecount)), C1, 0, TYP_NONE },
 	{ "lsn", FLDT_UINT64X, OI(COFF(lsn)), C1, 0, TYP_NONE },
 	{ "flags2", FLDT_UINT64X, OI(COFF(flags2)), C1, 0, TYP_NONE },
+	{ "cowextsize", FLDT_EXTLEN, OI(COFF(cowextsize)), C1, 0, TYP_NONE },
 	{ "crtime", FLDT_TIMESTAMP, OI(COFF(crtime)), C1, 0, TYP_NONE },
 	{ "inumber", FLDT_INO, OI(COFF(ino)), C1, 0, TYP_NONE },
 	{ "uuid", FLDT_UUID, OI(COFF(uuid)), C1, 0, TYP_NONE },
 	{ "reflink", FLDT_UINT1,
 	  OI(COFF(flags2) + bitsz(__uint64_t) - XFS_DIFLAG2_REFLINK_BIT-1), C1,
 	  0, TYP_NONE },
+	{ "cowextsz", FLDT_UINT1,
+	  OI(COFF(flags2) + bitsz(__uint64_t) - XFS_DIFLAG2_COWEXTSIZE_BIT-1), C1,
+	  0, TYP_NONE },
 	{ NULL }
 };
 


* [PATCH 114/145] xfs_growfs: report the presence of the reflink feature
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (112 preceding siblings ...)
  2016-06-17  1:42 ` [PATCH 113/145] xfs_db: deal with the CoW extent size hint Darrick J. Wong
@ 2016-06-17  1:42 ` Darrick J. Wong
  2016-06-17  1:42 ` [PATCH 115/145] xfs_io: bmap should support querying CoW fork, shared blocks Darrick J. Wong
                   ` (30 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:42 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Report the presence of the reflink feature in xfs_info.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 growfs/xfs_growfs.c |   12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)


diff --git a/growfs/xfs_growfs.c b/growfs/xfs_growfs.c
index 2b46480..a294e14 100644
--- a/growfs/xfs_growfs.c
+++ b/growfs/xfs_growfs.c
@@ -59,12 +59,14 @@ report_info(
 	int		ftype_enabled,
 	int		finobt_enabled,
 	int		spinodes,
-	int		rmapbt_enabled)
+	int		rmapbt_enabled,
+	int		reflink_enabled)
 {
 	printf(_(
 	    "meta-data=%-22s isize=%-6u agcount=%u, agsize=%u blks\n"
 	    "         =%-22s sectsz=%-5u attr=%u, projid32bit=%u\n"
 	    "         =%-22s crc=%-8u finobt=%u spinodes=%u rmapbt=%u\n"
+	    "         =%-22s reflink=%u\n"
 	    "data     =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n"
 	    "         =%-22s sunit=%-6u swidth=%u blks\n"
 	    "naming   =version %-14u bsize=%-6u ascii-ci=%d ftype=%d\n"
@@ -75,6 +77,7 @@ report_info(
 		mntpoint, geo.inodesize, geo.agcount, geo.agblocks,
 		"", geo.sectsize, attrversion, projid32bit,
 		"", crcs_enabled, finobt_enabled, spinodes, rmapbt_enabled,
+		"", reflink_enabled,
 		"", geo.blocksize, (unsigned long long)geo.datablocks,
 			geo.imaxpct,
 		"", geo.sunit, geo.swidth,
@@ -129,6 +132,7 @@ main(int argc, char **argv)
 	int			finobt_enabled;	/* free inode btree */
 	int			spinodes;
 	int			rmapbt_enabled;
+	int			reflink_enabled;
 
 	progname = basename(argv[0]);
 	setlocale(LC_ALL, "");
@@ -253,12 +257,13 @@ main(int argc, char **argv)
 	finobt_enabled = geo.flags & XFS_FSOP_GEOM_FLAGS_FINOBT ? 1 : 0;
 	spinodes = geo.flags & XFS_FSOP_GEOM_FLAGS_SPINODES ? 1 : 0;
 	rmapbt_enabled = geo.flags & XFS_FSOP_GEOM_FLAGS_RMAPBT ? 1 : 0;
+	reflink_enabled = geo.flags & XFS_FSOP_GEOM_FLAGS_REFLINK ? 1 : 0;
 	if (nflag) {
 		report_info(geo, datadev, isint, logdev, rtdev,
 				lazycount, dirversion, logversion,
 				attrversion, projid32bit, crcs_enabled, ci,
 				ftype_enabled, finobt_enabled, spinodes,
-				rmapbt_enabled);
+				rmapbt_enabled, reflink_enabled);
 		exit(0);
 	}
 
@@ -296,7 +301,8 @@ main(int argc, char **argv)
 	report_info(geo, datadev, isint, logdev, rtdev,
 			lazycount, dirversion, logversion,
 			attrversion, projid32bit, crcs_enabled, ci, ftype_enabled,
-			finobt_enabled, spinodes, rmapbt_enabled);
+			finobt_enabled, spinodes, rmapbt_enabled,
+			reflink_enabled);
 
 	ddsize = xi.dsize;
 	dlsize = ( xi.logBBsize? xi.logBBsize :


* [PATCH 115/145] xfs_io: bmap should support querying CoW fork, shared blocks
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (113 preceding siblings ...)
  2016-06-17  1:42 ` [PATCH 114/145] xfs_growfs: report the presence of the reflink feature Darrick J. Wong
@ 2016-06-17  1:42 ` Darrick J. Wong
  2016-06-17  1:42 ` [PATCH 116/145] xfs_io: get and set the CoW extent size hint Darrick J. Wong
                   ` (29 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:42 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Teach the bmap command to report shared and delayed allocation
extents, and to be able to query the CoW fork.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 io/bmap.c           |   43 ++++++++++++++++++++++++++++++++++---------
 man/man8/xfs_bmap.8 |   14 ++++++++++++++
 man/man8/xfs_io.8   |    2 +-
 3 files changed, 49 insertions(+), 10 deletions(-)
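
Reader note (illustrative only, not part of the patch): the verbose flag
legend encodes each flag as one octal digit, so adding FLG_SHARED means
widening every constant by a digit and bumping NFLG from 5 to 6. The
sketch below formats a flag word the way the legend printf() does, with
NFLG+1 zero-padded octal digits. The DEMO_ names and the helper are
hypothetical copies made for this example.

```c
#include <stddef.h>
#include <stdio.h>

#define DEMO_NFLG	6		/* count of flags */
#define DEMO_FLG_SHARED	0100000		/* shared extent */
#define DEMO_FLG_PRE	0010000		/* unwritten extent */
#define DEMO_FLG_ESW	0000001		/* not on end of stripe width */

/* Format a flag word as NFLG+1 octal digits, zero padded. */
static int
demo_format_flags(char *out, size_t outsz, unsigned int flg)
{
	return snprintf(out, outsz, "%*.*o",
			DEMO_NFLG + 1, DEMO_NFLG + 1, flg);
}
```

Using the precision as well as the width is what forces the leading
zeros, so a shared unwritten extent prints as 0110000.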


diff --git a/io/bmap.c b/io/bmap.c
index 04d04c7..0834cdd 100644
--- a/io/bmap.c
+++ b/io/bmap.c
@@ -41,7 +41,9 @@ bmap_help(void)
 " Holes are marked by replacing the startblock..endblock with 'hole'.\n"
 " All the file offsets and disk blocks are in units of 512-byte blocks.\n"
 " -a -- prints the attribute fork map instead of the data fork.\n"
+" -c -- prints the copy-on-write fork map instead of the data fork.\n"
 " -d -- suppresses a DMAPI read event, offline portions shown as holes.\n"
+" -e -- prints delayed allocation extents.\n"
 " -l -- also displays the length of each extent in 512-byte blocks.\n"
 " -n -- query n extents.\n"
 " -p -- obtain all unwritten extents as well (w/ -v show which are unwritten.)\n"
@@ -75,6 +77,7 @@ bmap_f(
 	int			loop = 0;
 	int			flg = 0;
 	int			aflag = 0;
+	int			cflag = 0;
 	int			lflag = 0;
 	int			nflag = 0;
 	int			pflag = 0;
@@ -85,12 +88,19 @@ bmap_f(
 	int			c;
 	int			egcnt;
 
-	while ((c = getopt(argc, argv, "adln:pv")) != EOF) {
+	while ((c = getopt(argc, argv, "acdeln:pv")) != EOF) {
 		switch (c) {
 		case 'a':	/* Attribute fork. */
 			bmv_iflags |= BMV_IF_ATTRFORK;
 			aflag = 1;
 			break;
+		case 'c':	/* CoW fork. */
+			bmv_iflags |= BMV_IF_COWFORK | BMV_IF_DELALLOC;
+			cflag = 1;
+			break;
+		case 'e':
+			bmv_iflags |= BMV_IF_DELALLOC;
+			break;
 		case 'l':	/* list number of blocks with each extent */
 			lflag = 1;
 			break;
@@ -113,7 +123,7 @@ bmap_f(
 			return command_usage(&bmap_cmd);
 		}
 	}
-	if (aflag)
+	if (aflag || cflag)
 		bmv_iflags &= ~(BMV_IF_PREALLOC|BMV_IF_NO_DMAPI_READ);
 
 	if (vflag) {
@@ -271,13 +281,14 @@ bmap_f(
 #define MINRANGE_WIDTH	16
 #define MINAG_WIDTH	2
 #define MINTOT_WIDTH	5
-#define NFLG		5	/* count of flags */
-#define	FLG_NULL	000000	/* Null flag */
-#define	FLG_PRE		010000	/* Unwritten extent */
-#define	FLG_BSU		001000	/* Not on begin of stripe unit  */
-#define	FLG_ESU		000100	/* Not on end   of stripe unit  */
-#define	FLG_BSW		000010	/* Not on begin of stripe width */
-#define	FLG_ESW		000001	/* Not on end   of stripe width */
+#define NFLG		6	/* count of flags */
+#define	FLG_NULL	0000000	/* Null flag */
+#define	FLG_SHARED	0100000	/* shared extent */
+#define	FLG_PRE		0010000	/* Unwritten extent */
+#define	FLG_BSU		0001000	/* Not on begin of stripe unit  */
+#define	FLG_ESU		0000100	/* Not on end   of stripe unit  */
+#define	FLG_BSW		0000010	/* Not on begin of stripe width */
+#define	FLG_ESW		0000001	/* Not on end   of stripe width */
 		int	agno;
 		off64_t agoff, bbperag;
 		int	foff_w, boff_w, aoff_w, tot_w, agno_w;
@@ -348,6 +359,10 @@ bmap_f(
 			if (map[i + 1].bmv_oflags & BMV_OF_PREALLOC) {
 				flg |= FLG_PRE;
 			}
+			if (map[i + 1].bmv_oflags & BMV_OF_SHARED)
+				flg |= FLG_SHARED;
+			if (map[i + 1].bmv_oflags & BMV_OF_DELALLOC)
+				map[i + 1].bmv_block = -2;
 			/*
 			 * If striping enabled, determine if extent starts/ends
 			 * on a stripe unit boundary.
@@ -380,6 +395,14 @@ bmap_f(
 					agno_w, "",
 					aoff_w, "",
 					tot_w, (long long)map[i+1].bmv_length);
+			} else if (map[i + 1].bmv_block == -2) {
+				printf("%4d: %-*s %-*s %*s %-*s %*lld\n",
+					i,
+					foff_w, rbuf,
+					boff_w, _("delalloc"),
+					agno_w, "",
+					aoff_w, "",
+					tot_w, (long long)map[i+1].bmv_length);
 			} else {
 				snprintf(bbuf, sizeof(bbuf), "%lld..%lld",
 					(long long) map[i + 1].bmv_block,
@@ -411,6 +434,8 @@ bmap_f(
 		}
 		if ((flg || pflag) && vflag > 1) {
 			printf(_(" FLAG Values:\n"));
+			printf(_("    %*.*o Shared extent\n"),
+				NFLG+1, NFLG+1, FLG_SHARED);
 			printf(_("    %*.*o Unwritten preallocated extent\n"),
 				NFLG+1, NFLG+1, FLG_PRE);
 			printf(_("    %*.*o Doesn't begin on stripe unit\n"),
diff --git a/man/man8/xfs_bmap.8 b/man/man8/xfs_bmap.8
index e196559..098cfae 100644
--- a/man/man8/xfs_bmap.8
+++ b/man/man8/xfs_bmap.8
@@ -36,6 +36,10 @@ no matter what the filesystem's block size is.
 If this option is specified, information about the file's
 attribute fork is printed instead of the default data fork.
 .TP
+.B \-c
+If this option is specified, information about the file's
+copy-on-write fork is printed instead of the default data fork.
+.TP
 .B \-d
 If portions of the file have been migrated offline by
 a DMAPI application, a DMAPI read event will be generated to
@@ -45,6 +49,16 @@ printed.  However if the
 option is used, no DMAPI read event will be generated for a
 DMAPI file and offline portions will be reported as holes.
 .TP
+.B \-e
+If this option is used,
+.B xfs_bmap
+obtains all delayed allocation extents, and does not flush dirty pages
+to disk before querying extent data. With the
+.B \-v
+option, the
+.I flags
+column will show which extents have not yet been allocated.
+.TP
 .B \-l
 If this option is used, then
 .IP
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index b0300bb..6c45c37 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -256,7 +256,7 @@ See the
 .B pwrite
 command.
 .TP
-.BI "bmap [ \-adlpv ] [ \-n " nx " ]"
+.BI "bmap [ \-acdelpv ] [ \-n " nx " ]"
 Prints the block mapping for the current open file. Refer to the
 .BR xfs_bmap (8)
 manual page for complete documentation.


* [PATCH 116/145] xfs_io: get and set the CoW extent size hint
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (114 preceding siblings ...)
  2016-06-17  1:42 ` [PATCH 115/145] xfs_io: bmap should support querying CoW fork, shared blocks Darrick J. Wong
@ 2016-06-17  1:42 ` Darrick J. Wong
  2016-06-17  1:43 ` [PATCH 117/145] xfs_io: add refcount+bmap error injection types Darrick J. Wong
                   ` (28 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:42 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Enable administrators to get or set the CoW extent size hint.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 io/attr.c         |    8 ++-
 io/open.c         |  166 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 man/man8/xfs_io.8 |   16 +++++
 3 files changed, 189 insertions(+), 1 deletion(-)


diff --git a/io/attr.c b/io/attr.c
index 0186b1d..13bec73 100644
--- a/io/attr.c
+++ b/io/attr.c
@@ -48,9 +48,11 @@ static struct xflags {
 	{ FS_XFLAG_NODEFRAG,		"f", "no-defrag"	},
 	{ FS_XFLAG_FILESTREAM,		"S", "filestream"	},
 	{ FS_XFLAG_DAX,			"x", "dax"		},
+	{ FS_XFLAG_REFLINK,		"R", "reflink"		},
+	{ FS_XFLAG_COWEXTSIZE,		"C", "cowextsize"	},
 	{ 0, NULL, NULL }
 };
-#define CHATTR_XFLAG_LIST	"r"/*p*/"iasAdtPneEfSx"
+#define CHATTR_XFLAG_LIST	"r"/*p*/"iasAdtPneEfSxRC"
 
 static void
 lsattr_help(void)
@@ -75,6 +77,8 @@ lsattr_help(void)
 " f -- do not include this file when defragmenting the filesystem\n"
 " S -- enable filestreams allocator for this directory\n"
 " x -- Use direct access (DAX) for data in this file\n"
+" R -- file data blocks may be shared with another file\n"
+" C -- for files with shared blocks, observe the inode CoW extent size value\n"
 "\n"
 " Options:\n"
 " -R -- recursively descend (useful when current file is a directory)\n"
@@ -111,6 +115,8 @@ chattr_help(void)
 " +/-f -- set/clear the no-defrag flag\n"
 " +/-S -- set/clear the filestreams allocator flag\n"
 " +/-x -- set/clear the direct access (DAX) flag\n"
+" +/-R -- set/clear the reflink flag\n"
+" +/-C -- set/clear the CoW extent-size flag\n"
 " Note1: user must have certain capabilities to modify immutable/append-only.\n"
 " Note2: immutable/append-only files cannot be deleted; removing these files\n"
 "        requires the immutable/append-only flag to be cleared first.\n"
diff --git a/io/open.c b/io/open.c
index 2303527..1e682a4 100644
--- a/io/open.c
+++ b/io/open.c
@@ -46,8 +46,10 @@ static cmdinfo_t chproj_cmd;
 static cmdinfo_t lsproj_cmd;
 static cmdinfo_t extsize_cmd;
 static cmdinfo_t inode_cmd;
+static cmdinfo_t cowextsize_cmd;
 static prid_t prid;
 static long extsize;
+static long cowextsize;
 
 off64_t
 filesize(void)
@@ -125,6 +127,7 @@ stat_f(
 		printxattr(fsx.fsx_xflags, verbose, 0, file->name, 1, 1);
 		printf(_("fsxattr.projid = %u\n"), fsx.fsx_projid);
 		printf(_("fsxattr.extsize = %u\n"), fsx.fsx_extsize);
+		printf(_("fsxattr.cowextsize = %u\n"), fsx.fsx_cowextsize);
 		printf(_("fsxattr.nextents = %u\n"), fsx.fsx_nextents);
 		printf(_("fsxattr.naextents = %u\n"), fsxa.fsx_nextents);
 	}
@@ -696,6 +699,158 @@ extsize_f(
 	return 0;
 }
 
+static void
+cowextsize_help(void)
+{
+	printf(_(
+"\n"
+" report or modify preferred CoW extent size (in bytes) for the current path\n"
+"\n"
+" -R -- recursively descend (useful when current path is a directory)\n"
+" -D -- recursively descend, only modifying cowextsize on directories\n"
+"\n"));
+}
+
+static int
+get_cowextsize(const char *path, int fd)
+{
+	struct fsxattr	fsx;
+
+	if ((xfsctl(path, fd, XFS_IOC_FSGETXATTR, &fsx)) < 0) {
+		printf("%s: XFS_IOC_FSGETXATTR %s: %s\n",
+			progname, path, strerror(errno));
+		return 0;
+	}
+	printf("[%u] %s\n", fsx.fsx_cowextsize, path);
+	return 0;
+}
+
+static int
+set_cowextsize(const char *path, int fd, long extsz)
+{
+	struct fsxattr	fsx;
+	struct stat64	stat;
+
+	if (fstat64(fd, &stat) < 0) {
+		perror("fstat64");
+		return 0;
+	}
+	if ((xfsctl(path, fd, XFS_IOC_FSGETXATTR, &fsx)) < 0) {
+		printf("%s: XFS_IOC_FSGETXATTR %s: %s\n",
+			progname, path, strerror(errno));
+		return 0;
+	}
+
+	if (S_ISREG(stat.st_mode) || S_ISDIR(stat.st_mode)) {
+		fsx.fsx_xflags |= FS_XFLAG_COWEXTSIZE;
+	} else {
+		printf(_("invalid target file type - file %s\n"), path);
+		return 0;
+	}
+	fsx.fsx_cowextsize = extsz;
+
+	if ((xfsctl(path, fd, XFS_IOC_FSSETXATTR, &fsx)) < 0) {
+		printf("%s: XFS_IOC_FSSETXATTR %s: %s\n",
+			progname, path, strerror(errno));
+		return 0;
+	}
+
+	return 0;
+}
+
+static int
+get_cowextsize_callback(
+	const char		*path,
+	const struct stat	*stat,
+	int			status,
+	struct FTW		*data)
+{
+	int			fd;
+
+	if (recurse_dir && !S_ISDIR(stat->st_mode))
+		return 0;
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0) {
+		fprintf(stderr, _("%s: cannot open %s: %s\n"),
+			progname, path, strerror(errno));
+	} else {
+		get_cowextsize(path, fd);
+		close(fd);
+	}
+	return 0;
+}
+
+static int
+set_cowextsize_callback(
+	const char		*path,
+	const struct stat	*stat,
+	int			status,
+	struct FTW		*data)
+{
+	int			fd;
+
+	if (recurse_dir && !S_ISDIR(stat->st_mode))
+		return 0;
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0) {
+		fprintf(stderr, _("%s: cannot open %s: %s\n"),
+			progname, path, strerror(errno));
+	} else {
+		set_cowextsize(path, fd, cowextsize);
+		close(fd);
+	}
+	return 0;
+}
+
+static int
+cowextsize_f(
+	int		argc,
+	char		**argv)
+{
+	size_t			blocksize, sectsize;
+	int			c;
+
+	recurse_all = recurse_dir = 0;
+	init_cvtnum(&blocksize, &sectsize);
+	while ((c = getopt(argc, argv, "DR")) != EOF) {
+		switch (c) {
+		case 'D':
+			recurse_all = 0;
+			recurse_dir = 1;
+			break;
+		case 'R':
+			recurse_all = 1;
+			recurse_dir = 0;
+			break;
+		default:
+			return command_usage(&cowextsize_cmd);
+		}
+	}
+
+	if (optind < argc) {
+		cowextsize = (long)cvtnum(blocksize, sectsize, argv[optind]);
+		if (cowextsize < 0) {
+			printf(_("non-numeric cowextsize argument -- %s\n"),
+				argv[optind]);
+			return 0;
+		}
+	} else {
+		cowextsize = -1;
+	}
+
+	if (recurse_all || recurse_dir)
+		nftw(file->name, (cowextsize >= 0) ?
+			set_cowextsize_callback : get_cowextsize_callback,
+			100, FTW_PHYS | FTW_MOUNT | FTW_DEPTH);
+	else if (cowextsize >= 0)
+		set_cowextsize(file->name, file->fd, cowextsize);
+	else
+		get_cowextsize(file->name, file->fd);
+	return 0;
+}
+
 static int
 statfs_f(
 	int			argc,
@@ -964,6 +1119,16 @@ open_init(void)
 		_("Query inode number usage in the filesystem");
 	inode_cmd.help = inode_help;
 
+	cowextsize_cmd.name = "cowextsize";
+	cowextsize_cmd.cfunc = cowextsize_f;
+	cowextsize_cmd.args = _("[-D | -R] [cowextsize]");
+	cowextsize_cmd.argmin = 0;
+	cowextsize_cmd.argmax = -1;
+	cowextsize_cmd.flags = CMD_NOMAP_OK;
+	cowextsize_cmd.oneline =
+		_("get/set preferred CoW extent size (in bytes) for the open file");
+	cowextsize_cmd.help = cowextsize_help;
+
 	add_command(&open_cmd);
 	add_command(&stat_cmd);
 	add_command(&close_cmd);
@@ -972,4 +1137,5 @@ open_init(void)
 	add_command(&lsproj_cmd);
 	add_command(&extsize_cmd);
 	add_command(&inode_cmd);
+	add_command(&cowextsize_cmd);
 }
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index 6c45c37..cc70b7c 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -283,6 +283,22 @@ The
 should be specified in bytes, or using one of the usual units suffixes
 (k, m, g, b, etc). The extent size is always reported in units of bytes.
 .TP
+.BI "cowextsize [ \-R | \-D ] [ " value " ]"
+Display and/or modify the preferred copy-on-write extent size used
+when allocating space for the currently open file. If the
+.B \-R
+option is specified, a recursive descent is performed
+for all directory entries below the currently open file
+.RB ( \-D
+can be used to restrict the output to directories only).
+If the target file is a directory, then the inherited CoW extent size
+is set for that directory (new files created in that directory
+inherit that CoW extent size).
+The
+.I value
+should be specified in bytes, or using one of the usual units suffixes
+(k, m, g, b, etc). The extent size is always reported in units of bytes.
+.TP
 .BI "allocsp " size " 0"
 Sets the size of the file to
 .I size


* [PATCH 117/145] xfs_io: add refcount+bmap error injection types
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (115 preceding siblings ...)
  2016-06-17  1:42 ` [PATCH 116/145] xfs_io: get and set the CoW extent size hint Darrick J. Wong
@ 2016-06-17  1:43 ` Darrick J. Wong
  2016-06-17  1:43 ` [PATCH 118/145] xfs_logprint: support cowextsize reporting in log contents Darrick J. Wong
                   ` (27 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:43 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Add refcount and bmap deferred finish to the types of errors we can
inject.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
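As a rough illustration of how a name/tag table like the one in
io/inject.c can be searched, the sketch below copies the tag numbers
from this patch into a stand-alone table.  The struct, table, and
function names (errtag_name, demo_errtags, errtag_lookup) are invented
for the example and are not the identifiers used in xfs_io.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative pairing of an error tag number with its user-visible name. */
struct errtag_name {
	int		tag;
	const char	*name;
};

/* Values taken from this patch; the NULL name marks the end of the table. */
static const struct errtag_name demo_errtags[] = {
	{ 23, "rmap_finish_one" },
	{ 24, "refcount_continue_update" },
	{ 25, "refcount_finish_one" },
	{ 26, "bmap_finish_one" },
	{ 27, NULL },	/* XFS_ERRTAG_MAX sentinel */
};

/* Return the tag number for a name, or -1 if the name is unknown. */
int
errtag_lookup(const char *name)
{
	const struct errtag_name	*e;

	assert(name != NULL);
	for (e = demo_errtags; e->name != NULL; e++) {
		if (strcmp(e->name, name) == 0)
			return e->tag;
	}
	return -1;
}
```

Because the sentinel entry carries the maximum tag value, adding a tag
means inserting a row before it and bumping the sentinel, which is what
the hunk in this patch does.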
 io/inject.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)


diff --git a/io/inject.c b/io/inject.c
index 16ac925..56642b8 100644
--- a/io/inject.c
+++ b/io/inject.c
@@ -78,7 +78,13 @@ error_tag(char *name)
 		{ XFS_ERRTAG_FREE_EXTENT,		"free_extent" },
 #define XFS_ERRTAG_RMAP_FINISH_ONE			23
 		{ XFS_ERRTAG_RMAP_FINISH_ONE,		"rmap_finish_one" },
-#define XFS_ERRTAG_MAX                                  24
+#define XFS_ERRTAG_REFCOUNT_CONTINUE_UPDATE		24
+		{ XFS_ERRTAG_REFCOUNT_CONTINUE_UPDATE,	"refcount_continue_update" },
+#define XFS_ERRTAG_REFCOUNT_FINISH_ONE			25
+		{ XFS_ERRTAG_REFCOUNT_FINISH_ONE,	"refcount_finish_one" },
+#define XFS_ERRTAG_BMAP_FINISH_ONE			26
+		{ XFS_ERRTAG_BMAP_FINISH_ONE,		"bmap_finish_one" },
+#define XFS_ERRTAG_MAX                                  27
 		{ XFS_ERRTAG_MAX,			NULL }
 	};
 	int	count;


* [PATCH 118/145] xfs_logprint: support cowextsize reporting in log contents
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (116 preceding siblings ...)
  2016-06-17  1:43 ` [PATCH 117/145] xfs_io: add refcount+bmap error injection types Darrick J. Wong
@ 2016-06-17  1:43 ` Darrick J. Wong
  2016-06-17  1:43 ` [PATCH 119/145] xfs_logprint: support refcount redo items Darrick J. Wong
                   ` (26 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:43 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Print the flags2 and cowextsize fields when dumping version 3 inode
cores.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 logprint/log_misc.c      |    4 ++++
 logprint/log_print_all.c |    4 ++++
 2 files changed, 8 insertions(+)


diff --git a/logprint/log_misc.c b/logprint/log_misc.c
index 479fc14..f6488d9 100644
--- a/logprint/log_misc.c
+++ b/logprint/log_misc.c
@@ -513,6 +513,10 @@ xlog_print_trans_inode_core(
 	   ip->di_dmstate);
     printf(_("flags 0x%x gen 0x%x\n"),
 	   ip->di_flags, ip->di_gen);
+    if (ip->di_version == 3) {
+        printf(_("flags2 0x%llx cowextsize 0x%x\n"),
+            (unsigned long long)ip->di_flags2, ip->di_cowextsize);
+    }
 }
 
 void
diff --git a/logprint/log_print_all.c b/logprint/log_print_all.c
index 0fe354b..46952c4 100644
--- a/logprint/log_print_all.c
+++ b/logprint/log_print_all.c
@@ -272,6 +272,10 @@ xlog_recover_print_inode_core(
 	     "gen:%d\n"),
 	       (int)di->di_forkoff, di->di_dmevmask, (int)di->di_dmstate,
 	       (int)di->di_flags, di->di_gen);
+	if (di->di_version == 3) {
+		printf(_("flags2 0x%llx cowextsize 0x%x\n"),
+			(unsigned long long)di->di_flags2, di->di_cowextsize);
+	}
 }
 
 STATIC void


* [PATCH 119/145] xfs_logprint: support refcount redo items
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (117 preceding siblings ...)
  2016-06-17  1:43 ` [PATCH 118/145] xfs_logprint: support cowextsize reporting in log contents Darrick J. Wong
@ 2016-06-17  1:43 ` Darrick J. Wong
  2016-06-17  1:43 ` [PATCH 120/145] xfs_logprint: support bmap " Darrick J. Wong
                   ` (25 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:43 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Print reference count update redo items.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
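The size check in xfs_cui_copy_format depends on the "one extent is
embedded in the header" layout.  The sketch below shows that
arithmetic with stand-in structures.  The demo_* types are assumptions
modeled on the fields this patch prints (the cui_type member in
particular is a guess), not the real libxfs definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct xfs_phys_extent (assumed layout). */
struct demo_phys_extent {
	uint64_t	pe_startblock;
	uint32_t	pe_len;
	uint32_t	pe_flags;
};

/* Stand-in for struct xfs_cui_log_format (assumed layout). */
struct demo_cui_format {
	uint16_t		cui_type;
	uint16_t		cui_size;
	uint32_t		cui_nextents;
	uint64_t		cui_id;
	struct demo_phys_extent	cui_extents[1];	/* first extent lives here */
};

/* Same formula as the patch: header plus (nextents - 1) extra extents. */
size_t
demo_cui_len(uint32_t nextents)
{
	assert(nextents >= 1);	/* nextents - 1 would wrap for zero */
	return sizeof(struct demo_cui_format) +
		(size_t)(nextents - 1) * sizeof(struct demo_phys_extent);
}

/* Equivalent form that needs no subtraction and also accepts zero. */
size_t
demo_cui_len_offsetof(uint32_t nextents)
{
	return offsetof(struct demo_cui_format, cui_extents) +
		(size_t)nextents * sizeof(struct demo_phys_extent);
}
```

The two forms agree whenever nextents is at least one and the
structure has no tail padding, which holds for the stand-in layout
used here.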
 logprint/log_misc.c      |   11 +++
 logprint/log_print_all.c |   12 ++++
 logprint/log_redo.c      |  151 ++++++++++++++++++++++++++++++++++++++++++++++
 logprint/logprint.h      |    5 ++
 4 files changed, 179 insertions(+)


diff --git a/logprint/log_misc.c b/logprint/log_misc.c
index f6488d9..5389b72 100644
--- a/logprint/log_misc.c
+++ b/logprint/log_misc.c
@@ -1008,6 +1008,17 @@ xlog_print_record(
 					be32_to_cpu(op_head->oh_len));
 			break;
 		    }
+		    case XFS_LI_CUI: {
+			skip = xlog_print_trans_cui(&ptr,
+					be32_to_cpu(op_head->oh_len),
+					continued);
+			break;
+		    }
+		    case XFS_LI_CUD: {
+			skip = xlog_print_trans_cud(&ptr,
+					be32_to_cpu(op_head->oh_len));
+			break;
+		    }
 		    case XFS_LI_QUOTAOFF: {
 			skip = xlog_print_trans_qoff(&ptr,
 					be32_to_cpu(op_head->oh_len));
diff --git a/logprint/log_print_all.c b/logprint/log_print_all.c
index 46952c4..eb3e326 100644
--- a/logprint/log_print_all.c
+++ b/logprint/log_print_all.c
@@ -418,6 +418,12 @@ xlog_recover_print_logitem(
 	case XFS_LI_RUI:
 		xlog_recover_print_rui(item);
 		break;
+	case XFS_LI_CUD:
+		xlog_recover_print_cud(item);
+		break;
+	case XFS_LI_CUI:
+		xlog_recover_print_cui(item);
+		break;
 	case XFS_LI_DQUOT:
 		xlog_recover_print_dquot(item);
 		break;
@@ -458,6 +464,12 @@ xlog_recover_print_item(
 	case XFS_LI_RUI:
 		printf("RUI");
 		break;
+	case XFS_LI_CUD:
+		printf("CUD");
+		break;
+	case XFS_LI_CUI:
+		printf("CUI");
+		break;
 	case XFS_LI_DQUOT:
 		printf("DQ ");
 		break;
diff --git a/logprint/log_redo.c b/logprint/log_redo.c
index 717dccd..1539be1 100644
--- a/logprint/log_redo.c
+++ b/logprint/log_redo.c
@@ -381,3 +381,154 @@ xlog_recover_print_rud(
 	f = item->ri_buf[0].i_addr;
 	xlog_print_trans_rud(&f, sizeof(struct xfs_rud_log_format));
 }
+
+/* Reference Count Update Items */
+
+static int
+xfs_cui_copy_format(
+	char			  *buf,
+	uint			  len,
+	struct xfs_cui_log_format *dst_fmt,
+	int			  continued)
+{
+	uint nextents = ((struct xfs_cui_log_format *)buf)->cui_nextents;
+	uint dst_len = sizeof(struct xfs_cui_log_format) +
+			(nextents - 1) * sizeof(struct xfs_phys_extent);
+
+	if (len == dst_len || continued) {
+		memcpy((char *)dst_fmt, buf, len);
+		return 0;
+	}
+	fprintf(stderr, _("%s: bad size of CUI format: %u; expected %u; nextents = %u\n"),
+		progname, len, dst_len, nextents);
+	return 1;
+}
+
+int
+xlog_print_trans_cui(
+	char			**ptr,
+	uint			src_len,
+	int			continued)
+{
+	struct xfs_cui_log_format	*src_f, *f = NULL;
+	uint			dst_len;
+	uint			nextents;
+	struct xfs_phys_extent	*ex;
+	int			i;
+	int			error = 0;
+	int			core_size;
+
+	core_size = offsetof(struct xfs_cui_log_format, cui_extents);
+
+	/*
+	 * memmove to ensure 8-byte alignment for the long longs in
+	 * struct xfs_cui_log_format structure
+	 */
+	src_f = malloc(src_len);
+	if (src_f == NULL) {
+		fprintf(stderr, _("%s: %s: malloc failed\n"),
+			progname, __func__);
+		exit(1);
+	}
+	memmove((char*)src_f, *ptr, src_len);
+	*ptr += src_len;
+
+	/* convert to native format */
+	nextents = src_f->cui_nextents;
+	dst_len = sizeof(struct xfs_cui_log_format) +
+			(nextents - 1) * sizeof(struct xfs_phys_extent);
+
+	if (continued && src_len < core_size) {
+		printf(_("CUI: Not enough data to decode further\n"));
+		error = 1;
+		goto error;
+	}
+
+	f = malloc(dst_len);
+	if (f == NULL) {
+		fprintf(stderr, _("%s: %s: malloc failed\n"),
+			progname, __func__);
+		exit(1);
+	}
+	if (xfs_cui_copy_format((char *)src_f, src_len, f, continued)) {
+		error = 1;
+		goto error;
+	}
+
+	printf(_("CUI:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
+		f->cui_size, f->cui_nextents, (unsigned long long)f->cui_id);
+
+	if (continued) {
+		printf(_("CUI extent data skipped (CONTINUE set, no space)\n"));
+		goto error;
+	}
+
+	ex = f->cui_extents;
+	for (i=0; i < f->cui_nextents; i++) {
+		printf("(s: 0x%llx, l: %d, f: 0x%x) ",
+			(unsigned long long)ex->pe_startblock, ex->pe_len,
+			ex->pe_flags);
+		printf("\n");
+		ex++;
+	}
+error:
+	free(src_f);
+	free(f);
+	return error;
+}
+
+void
+xlog_recover_print_cui(
+	struct xlog_recover_item	*item)
+{
+	char				*src_f;
+	uint				src_len;
+
+	src_f = item->ri_buf[0].i_addr;
+	src_len = item->ri_buf[0].i_len;
+
+	xlog_print_trans_cui(&src_f, src_len, 0);
+}
+
+int
+xlog_print_trans_cud(
+	char				**ptr,
+	uint				len)
+{
+	struct xfs_cud_log_format	*f;
+	struct xfs_cud_log_format	lbuf;
+
+	/* size without extents at end */
+	uint core_size = sizeof(struct xfs_cud_log_format) -
+		sizeof(struct xfs_phys_extent);
+
+	/*
+	 * memmove to ensure 8-byte alignment for the long longs in
+	 * struct xfs_cud_log_format structure
+	 */
+	memmove(&lbuf, *ptr, MIN(core_size, len));
+	f = &lbuf;
+	*ptr += len;
+	if (len >= core_size) {
+		printf(_("CUD:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
+			f->cud_size, f->cud_nextents,
+			(unsigned long long)f->cud_cui_id);
+
+		/* don't print extents as they are not used */
+
+		return 0;
+	} else {
+		printf(_("CUD: Not enough data to decode further\n"));
+		return 1;
+	}
+}
+
+void
+xlog_recover_print_cud(
+	struct xlog_recover_item	*item)
+{
+	char				*f;
+
+	f = item->ri_buf[0].i_addr;
+	xlog_print_trans_cud(&f, sizeof(struct xfs_cud_log_format));
+}
diff --git a/logprint/logprint.h b/logprint/logprint.h
index 0c03c08..a1115e2 100644
--- a/logprint/logprint.h
+++ b/logprint/logprint.h
@@ -56,4 +56,9 @@ extern void xlog_recover_print_rui(struct xlog_recover_item *item);
 extern int xlog_print_trans_rud(char **ptr, uint len);
 extern void xlog_recover_print_rud(struct xlog_recover_item *item);
 
+extern int xlog_print_trans_cui(char **ptr, uint src_len, int continued);
+extern void xlog_recover_print_cui(struct xlog_recover_item *item);
+extern int xlog_print_trans_cud(char **ptr, uint len);
+extern void xlog_recover_print_cud(struct xlog_recover_item *item);
+
 #endif	/* LOGPRINT_H */


* [PATCH 120/145] xfs_logprint: support bmap redo items
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (118 preceding siblings ...)
  2016-06-17  1:43 ` [PATCH 119/145] xfs_logprint: support refcount redo items Darrick J. Wong
@ 2016-06-17  1:43 ` Darrick J. Wong
  2016-06-17  1:43 ` [PATCH 121/145] man: document the reflink inode flag in fsxattr Darrick J. Wong
                   ` (24 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:43 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Print block mapping update redo items.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
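The "done" item printers copy the header out of the log buffer before
touching its 64-bit fields, since the source pointer may not be
8-byte aligned.  A minimal sketch of that pattern follows.  The
demo_done_format structure and demo_read_done() are hypothetical
stand-ins, not the BUD format or any xfs_logprint function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size header with a 64-bit member. */
struct demo_done_format {
	uint16_t	type;
	uint16_t	size;
	uint32_t	nextents;
	uint64_t	intent_id;
};

/*
 * Copy up to one header from a possibly unaligned buffer into *out and
 * advance *ptr past the whole region.  Returns 0 if a full header was
 * available, 1 if the region was too short to decode.
 */
int
demo_read_done(const char **ptr, size_t len, struct demo_done_format *out)
{
	size_t	core_size = sizeof(*out);

	assert(ptr != NULL && *ptr != NULL && out != NULL);
	memset(out, 0, sizeof(*out));
	/* Never copy more than the destination can hold. */
	memmove(out, *ptr, len < core_size ? len : core_size);
	*ptr += len;
	return len >= core_size ? 0 : 1;
}
```

Bounding the copy by the smaller of the two sizes keeps a short region
from being over-read and a long one from overflowing the local copy,
which is what the MIN(core_size, len) in the patch is for.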
 logprint/log_misc.c      |   11 +++
 logprint/log_print_all.c |   12 ++++
 logprint/log_redo.c      |  152 ++++++++++++++++++++++++++++++++++++++++++++++
 logprint/logprint.h      |    5 ++
 4 files changed, 180 insertions(+)


diff --git a/logprint/log_misc.c b/logprint/log_misc.c
index 5389b72..c10a1d1 100644
--- a/logprint/log_misc.c
+++ b/logprint/log_misc.c
@@ -1019,6 +1019,17 @@ xlog_print_record(
 					be32_to_cpu(op_head->oh_len));
 			break;
 		    }
+		    case XFS_LI_BUI: {
+			skip = xlog_print_trans_bui(&ptr,
+					be32_to_cpu(op_head->oh_len),
+					continued);
+			break;
+		    }
+		    case XFS_LI_BUD: {
+			skip = xlog_print_trans_bud(&ptr,
+					be32_to_cpu(op_head->oh_len));
+			break;
+		    }
 		    case XFS_LI_QUOTAOFF: {
 			skip = xlog_print_trans_qoff(&ptr,
 					be32_to_cpu(op_head->oh_len));
diff --git a/logprint/log_print_all.c b/logprint/log_print_all.c
index eb3e326..f49316e 100644
--- a/logprint/log_print_all.c
+++ b/logprint/log_print_all.c
@@ -424,6 +424,12 @@ xlog_recover_print_logitem(
 	case XFS_LI_CUI:
 		xlog_recover_print_cui(item);
 		break;
+	case XFS_LI_BUD:
+		xlog_recover_print_bud(item);
+		break;
+	case XFS_LI_BUI:
+		xlog_recover_print_bui(item);
+		break;
 	case XFS_LI_DQUOT:
 		xlog_recover_print_dquot(item);
 		break;
@@ -470,6 +476,12 @@ xlog_recover_print_item(
 	case XFS_LI_CUI:
 		printf("CUI");
 		break;
+	case XFS_LI_BUD:
+		printf("BUD");
+		break;
+	case XFS_LI_BUI:
+		printf("BUI");
+		break;
 	case XFS_LI_DQUOT:
 		printf("DQ ");
 		break;
diff --git a/logprint/log_redo.c b/logprint/log_redo.c
index 1539be1..7a1bbef 100644
--- a/logprint/log_redo.c
+++ b/logprint/log_redo.c
@@ -532,3 +532,155 @@ xlog_recover_print_cud(
 	f = item->ri_buf[0].i_addr;
 	xlog_print_trans_cud(&f, sizeof(struct xfs_cud_log_format));
 }
+
+/* Block Mapping Update Items */
+
+static int
+xfs_bui_copy_format(
+	char			  *buf,
+	uint			  len,
+	struct xfs_bui_log_format *dst_fmt,
+	int			  continued)
+{
+	uint nextents = ((struct xfs_bui_log_format *)buf)->bui_nextents;
+	uint dst_len = sizeof(struct xfs_bui_log_format) +
+			(nextents - 1) * sizeof(struct xfs_map_extent);
+
+	if (len == dst_len || continued) {
+		memcpy((char *)dst_fmt, buf, len);
+		return 0;
+	}
+	fprintf(stderr, _("%s: bad size of BUI format: %u; expected %u; nextents = %u\n"),
+		progname, len, dst_len, nextents);
+	return 1;
+}
+
+int
+xlog_print_trans_bui(
+	char			**ptr,
+	uint			src_len,
+	int			continued)
+{
+	struct xfs_bui_log_format	*src_f, *f = NULL;
+	uint			dst_len;
+	uint			nextents;
+	struct xfs_map_extent	*ex;
+	int			i;
+	int			error = 0;
+	int			core_size;
+
+	core_size = offsetof(struct xfs_bui_log_format, bui_extents);
+
+	/*
+	 * memmove to ensure 8-byte alignment for the long longs in
+	 * struct xfs_bui_log_format structure
+	 */
+	src_f = malloc(src_len);
+	if (src_f == NULL) {
+		fprintf(stderr, _("%s: %s: malloc failed\n"),
+			progname, __func__);
+		exit(1);
+	}
+	memmove((char*)src_f, *ptr, src_len);
+	*ptr += src_len;
+
+	/* convert to native format */
+	nextents = src_f->bui_nextents;
+	dst_len = sizeof(struct xfs_bui_log_format) +
+			(nextents - 1) * sizeof(struct xfs_map_extent);
+
+	if (continued && src_len < core_size) {
+		printf(_("BUI: Not enough data to decode further\n"));
+		error = 1;
+		goto error;
+	}
+
+	f = malloc(dst_len);
+	if (f == NULL) {
+		fprintf(stderr, _("%s: %s: malloc failed\n"),
+			progname, __func__);
+		exit(1);
+	}
+	if (xfs_bui_copy_format((char *)src_f, src_len, f, continued)) {
+		error = 1;
+		goto error;
+	}
+
+	printf(_("BUI:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
+		f->bui_size, f->bui_nextents, (unsigned long long)f->bui_id);
+
+	if (continued) {
+		printf(_("BUI extent data skipped (CONTINUE set, no space)\n"));
+		goto error;
+	}
+
+	ex = f->bui_extents;
+	for (i=0; i < f->bui_nextents; i++) {
+		printf("(s: 0x%llx, l: %d, own: %lld, off: %llu, f: 0x%x) ",
+			(unsigned long long)ex->me_startblock, ex->me_len,
+			(long long)ex->me_owner,
+			(unsigned long long)ex->me_startoff, ex->me_flags);
+		printf("\n");
+		ex++;
+	}
+error:
+	free(src_f);
+	free(f);
+	return error;
+}
+
+void
+xlog_recover_print_bui(
+	struct xlog_recover_item	*item)
+{
+	char				*src_f;
+	uint				src_len;
+
+	src_f = item->ri_buf[0].i_addr;
+	src_len = item->ri_buf[0].i_len;
+
+	xlog_print_trans_bui(&src_f, src_len, 0);
+}
+
+int
+xlog_print_trans_bud(
+	char				**ptr,
+	uint				len)
+{
+	struct xfs_bud_log_format	*f;
+	struct xfs_bud_log_format	lbuf;
+
+	/* size without extents at end */
+	uint core_size = sizeof(struct xfs_bud_log_format) -
+		sizeof(struct xfs_map_extent);
+
+	/*
+	 * memmove to ensure 8-byte alignment for the long longs in
+	 * struct xfs_bud_log_format structure
+	 */
+	memmove(&lbuf, *ptr, MIN(core_size, len));
+	f = &lbuf;
+	*ptr += len;
+	if (len >= core_size) {
+		printf(_("BUD:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
+			f->bud_size, f->bud_nextents,
+			(unsigned long long)f->bud_bui_id);
+
+		/* don't print extents as they are not used */
+
+		return 0;
+	} else {
+		printf(_("BUD: Not enough data to decode further\n"));
+		return 1;
+	}
+}
+
+void
+xlog_recover_print_bud(
+	struct xlog_recover_item	*item)
+{
+	char				*f;
+
+	f = item->ri_buf[0].i_addr;
+	xlog_print_trans_bud(&f, sizeof(struct xfs_bud_log_format));
+}
diff --git a/logprint/logprint.h b/logprint/logprint.h
index a1115e2..81feff3 100644
--- a/logprint/logprint.h
+++ b/logprint/logprint.h
@@ -61,4 +61,9 @@ extern void xlog_recover_print_cui(struct xlog_recover_item *item);
 extern int xlog_print_trans_cud(char **ptr, uint len);
 extern void xlog_recover_print_cud(struct xlog_recover_item *item);
 
+extern int xlog_print_trans_bui(char **ptr, uint src_len, int continued);
+extern void xlog_recover_print_bui(struct xlog_recover_item *item);
+extern int xlog_print_trans_bud(char **ptr, uint len);
+extern void xlog_recover_print_bud(struct xlog_recover_item *item);
+
 #endif	/* LOGPRINT_H */


* [PATCH 121/145] man: document the reflink inode flag in fsxattr
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (119 preceding siblings ...)
  2016-06-17  1:43 ` [PATCH 120/145] xfs_logprint: support bmap " Darrick J. Wong
@ 2016-06-17  1:43 ` Darrick J. Wong
  2016-06-17  1:43 ` [PATCH 122/145] man: document the inode cowextsize flags & fields Darrick J. Wong
                   ` (23 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:43 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Document the new inode flag in struct fsxattr for reflink.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 man/man3/xfsctl.3 |   10 ++++++++++
 1 file changed, 10 insertions(+)


diff --git a/man/man3/xfsctl.3 b/man/man3/xfsctl.3
index 9e7f138..ef6c992 100644
--- a/man/man3/xfsctl.3
+++ b/man/man3/xfsctl.3
@@ -230,6 +230,16 @@ If the filesystem lives on directly accessible persistent memory, reads and
 writes to this file will go straight to the persistent memory, bypassing the
 page cache.
 .TP
+.SM "Bit 16 (0x10000) \- XFS_XFLAG_REFLINK"
+This file is sharing or has shared blocks with another file.
+This flag can be set by reflinking or deduping blocks with another file
+and cleared by fallocating the entire file to pre-copy all shared extents.
+A file cannot have
+.BR XFS_XFLAG_REFLINK
+and
+.BR XFS_XFLAG_DAX
+set at the same time, that is to say that DAX files cannot share blocks.
+.TP
 .SM "Bit 31 (0x80000000) \- XFS_XFLAG_HASATTR"
 The file has extended attributes associated with it.
 .RE


* [PATCH 122/145] man: document the inode cowextsize flags & fields
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (120 preceding siblings ...)
  2016-06-17  1:43 ` [PATCH 121/145] man: document the reflink inode flag in fsxattr Darrick J. Wong
@ 2016-06-17  1:43 ` Darrick J. Wong
  2016-06-17  1:43 ` [PATCH 123/145] xfs_repair: fix get_agino_buf to avoid corrupting inodes Darrick J. Wong
                   ` (22 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:43 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Document the new copy-on-write extent size fields and inode flags
available in struct fsxattr.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 man/man3/xfsctl.3 |   24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)


diff --git a/man/man3/xfsctl.3 b/man/man3/xfsctl.3
index ef6c992..77506aa 100644
--- a/man/man3/xfsctl.3
+++ b/man/man3/xfsctl.3
@@ -150,6 +150,15 @@ value returned indicates that a preferred extent size was previously
 set on the file, a
 .B fsx_extsize
 of zero indicates that the defaults for that filesystem will be used.
+A
+.B fsx_cowextsize
+value returned indicates that a preferred copy on write extent size was
+previously set on the file, whereas a
+.B fsx_cowextsize
+of zero indicates that the defaults for that filesystem will be used.
+The current default for
+.B fsx_cowextsize
+is 128 blocks.
 Currently the meaningful bits for the
 .B fsx_xflags
 field are:
@@ -240,6 +249,15 @@ and
 .BR XFS_XFLAG_DAX
 set at the same time, that is to say that DAX files cannot share blocks.
 .TP
+.SM "Bit 17 (0x20000) \- XFS_XFLAG_COWEXTSIZE"
+Copy on Write Extent size bit - if a CoW extent size value is set on the file,
+the allocator will allocate extents for staging a copy on write operation
+in multiples of the set size for this file (see
+.B XFS_IOC_FSSETXATTR
+below).
+If the CoW extent size is set on a directory, then new files and directories
+created in the directory will inherit the parent's CoW extent size value.
+.TP
 .SM "Bit 31 (0x80000000) \- XFS_XFLAG_HASATTR"
 The file has extended attributes associated with it.
 .RE
@@ -261,7 +279,8 @@ The final argument points to a variable of type
 .BR "struct fsxattr" ,
 but only the following fields are used in this call:
 .BR fsx_xflags ,
-.B fsx_extsize
+.BR fsx_extsize ,
+.BR fsx_cowextsize ,
 and
 .BR fsx_projid .
 The
@@ -271,6 +290,9 @@ when the file is empty, except in the case of a directory where
 the extent size can be set at any time (this value is only used
 for regular file allocations, so should only be set on a directory
 in conjunction with the XFS_XFLAG_EXTSZINHERIT flag).
+The copy on write extent size,
+.BR fsx_cowextsize ,
+can be set at any time.
 
 .TP
 .B XFS_IOC_GETBMAP


* [PATCH 123/145] xfs_repair: fix get_agino_buf to avoid corrupting inodes
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (121 preceding siblings ...)
  2016-06-17  1:43 ` [PATCH 122/145] man: document the inode cowextsize flags & fields Darrick J. Wong
@ 2016-06-17  1:43 ` Darrick J. Wong
  2016-06-17  1:43 ` [PATCH 124/145] xfs_repair: check the existing refcount btree Darrick J. Wong
                   ` (21 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:43 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

The inode buffering code tries to read inodes in units of chunks,
which are the larger of 8K or 1 FSB.  Each chunk gets its own xfs_buf,
which means that get_agino_buf must calculate the disk address of the
chunk and feed that to libxfs_readbuf in order to find the inode data
correctly.  The current code simply grabs the chunk for the start
inode and indexes from that, which corrupts memory because the start
inode and the target inode could be in different inode chunks.  That
causes the assert in rmap.c to trigger when we clear the reflink flag.

(Also fix some minor errors in the debugging printfs.)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
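To make the cluster arithmetic concrete, here is a small sketch with
plain integers.  The demo_geom structure and the demo_* helpers are
invented for the example.  The final step, indexing into the buffer by
the inode's offset within its cluster, is an assumption based on the
description above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative subset of the geometry the calculation needs. */
struct demo_geom {
	uint32_t	inode_cluster_size;	/* bytes */
	uint32_t	blocksize;		/* bytes */
	uint32_t	inodesize;		/* bytes */
};

/* A chunk is the larger of the inode cluster size and one block. */
uint32_t
demo_inodes_per_cluster(const struct demo_geom *g)
{
	uint32_t	cluster_size;

	cluster_size = g->inode_cluster_size > g->blocksize ?
			g->inode_cluster_size : g->blocksize;
	return cluster_size / g->inodesize;
}

/* Round an AG inode number down to the first inode of its chunk. */
uint32_t
demo_cluster_agino(const struct demo_geom *g, uint32_t agino)
{
	uint32_t	ipc = demo_inodes_per_cluster(g);

	/* The mask trick only works for a power-of-two inode count. */
	assert(ipc != 0 && (ipc & (ipc - 1)) == 0);
	return agino & ~(ipc - 1);
}

/* Index of the inode within the buffer that backs its chunk. */
uint32_t
demo_cluster_offset(const struct demo_geom *g, uint32_t agino)
{
	return agino - demo_cluster_agino(g, agino);
}
```

With 8K clusters, 4K blocks, and 512-byte inodes there are 16 inodes
per chunk, so AG inode 37 lives in the chunk starting at inode 32, at
index 5.  Reading the chunk of some other start inode and indexing
from there is the mistake this patch removes.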
 libxfs/rdwr.c   |    8 +++---
 repair/dinode.c |   73 +++++++++++++++++++++++++++++++++----------------------
 repair/dinode.h |   12 +++++----
 3 files changed, 54 insertions(+), 39 deletions(-)


diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 533a064..9fcc319 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -1038,9 +1038,9 @@ libxfs_readbufr_map(struct xfs_buftarg *btp, struct xfs_buf *bp, int flags)
 	if (!error)
 		bp->b_flags |= LIBXFS_B_UPTODATE;
 #ifdef IO_DEBUG
-	printf("%lx: %s: read %u bytes, error %d, blkno=0x%llx(0x%llx), %p\n",
-		pthread_self(), __FUNCTION__, , error,
-		(long long)LIBXFS_BBTOOFF64(blkno), (long long)blkno, bp);
+	printf("%lx: %s: read %lu bytes, error %d, blkno=%llu(%llu), %p\n",
+		pthread_self(), __FUNCTION__, buf - (char *)bp->b_addr, error,
+		(long long)LIBXFS_BBTOOFF64(bp->b_bn), (long long)bp->b_bn, bp);
 #endif
 	return error;
 }
@@ -1070,7 +1070,7 @@ libxfs_readbuf_map(struct xfs_buftarg *btp, struct xfs_buf_map *map, int nmaps,
 	if (!error)
 		libxfs_readbuf_verify(bp, ops);
 
-#ifdef IO_DEBUG
+#ifdef IO_DEBUGX
 	printf("%lx: %s: read %lu bytes, error %d, blkno=%llu(%llu), %p\n",
 		pthread_self(), __FUNCTION__, buf - (char *)bp->b_addr, error,
 		(long long)LIBXFS_BBTOOFF64(bp->b_bn), (long long)bp->b_bn, bp);
diff --git a/repair/dinode.c b/repair/dinode.c
index 89163b1..e9b4f8f 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -847,43 +847,58 @@ scan_bmbt_reclist(
 }
 
 /*
- * these two are meant for routines that read and work with inodes
- * one at a time where the inodes may be in any order (like walking
- * the unlinked lists to look for inodes).  the caller is responsible
- * for writing/releasing the buffer.
+ * Grab the buffer backing an inode.  This is meant for routines that
+ * work with inodes one at a time in any order (like walking the
+ * unlinked lists to look for inodes).  The caller is responsible for
+ * writing/releasing the buffer.
  */
-xfs_buf_t *
-get_agino_buf(xfs_mount_t	 *mp,
-		xfs_agnumber_t	agno,
-		xfs_agino_t	agino,
-		xfs_dinode_t	**dipp)
+struct xfs_buf *
+get_agino_buf(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_agino_t		agino,
+	struct xfs_dinode	**dipp)
 {
-	ino_tree_node_t *irec;
-	xfs_buf_t *bp;
-	int size;
-
-	if ((irec = find_inode_rec(mp, agno, agino)) == NULL)
-		return(NULL);
+	struct xfs_buf		*bp;
+	int			cluster_size;
+	int			ino_per_cluster;
+	xfs_agino_t		cluster_agino;
+	xfs_daddr_t		cluster_daddr;
+	xfs_daddr_t		cluster_blks;
 
-	size = MAX(1, XFS_FSB_TO_BB(mp,
+	/*
+	 * Inode buffers have been read into memory in inode_cluster_size
+	 * chunks (or one FSB).  To find the correct buffer for an inode,
+	 * we must find the buffer for its cluster, add the appropriate
+	 * offset, and return that.
+	 */
+	cluster_size = MAX(mp->m_inode_cluster_size, mp->m_sb.sb_blocksize);
+	ino_per_cluster = cluster_size / mp->m_sb.sb_inodesize;
+	cluster_agino = agino & ~(ino_per_cluster - 1);
+	cluster_blks = XFS_FSB_TO_DADDR(mp, MAX(1,
 			mp->m_inode_cluster_size >> mp->m_sb.sb_blocklog));
-	bp = libxfs_readbuf(mp->m_dev, XFS_AGB_TO_DADDR(mp, agno,
-		XFS_AGINO_TO_AGBNO(mp, irec->ino_startnum)), size, 0,
-		&xfs_inode_buf_ops);
+	cluster_daddr = XFS_AGB_TO_DADDR(mp, agno,
+			XFS_AGINO_TO_AGBNO(mp, cluster_agino));
+
+#ifdef XR_INODE_TRACE
+	printf("cluster_size %d ipc %d clusagino %d daddr %lld sectors %lld\n",
+		cluster_size, ino_per_cluster, cluster_agino, cluster_daddr,
+		cluster_blks);
+#endif
+
+	bp = libxfs_readbuf(mp->m_dev, cluster_daddr, cluster_blks,
+			0, &xfs_inode_buf_ops);
 	if (!bp) {
 		do_warn(_("cannot read inode (%u/%u), disk block %" PRIu64 "\n"),
-			agno, irec->ino_startnum,
-			XFS_AGB_TO_DADDR(mp, agno,
-				XFS_AGINO_TO_AGBNO(mp, irec->ino_startnum)));
-		return(NULL);
+			agno, cluster_agino, cluster_daddr);
+		return NULL;
 	}
 
-	*dipp = xfs_make_iptr(mp, bp, agino -
-		XFS_OFFBNO_TO_AGINO(mp, XFS_AGINO_TO_AGBNO(mp,
-						irec->ino_startnum),
-		0));
-
-	return(bp);
+	*dipp = xfs_make_iptr(mp, bp, agino - cluster_agino);
+	ASSERT(!xfs_sb_version_hascrc(&mp->m_sb) ||
+			XFS_AGINO_TO_INO(mp, agno, agino) ==
+			be64_to_cpu((*dipp)->di_ino));
+	return bp;
 }
 
 /*
diff --git a/repair/dinode.h b/repair/dinode.h
index 5aebf5b..61d0736 100644
--- a/repair/dinode.h
+++ b/repair/dinode.h
@@ -113,12 +113,12 @@ void
 check_uncertain_aginodes(xfs_mount_t	*mp,
 			xfs_agnumber_t	agno);
 
-xfs_buf_t *
-get_agino_buf(xfs_mount_t	*mp,
-		xfs_agnumber_t	agno,
-		xfs_agino_t	agino,
-		xfs_dinode_t	**dipp);
-
+struct xfs_buf *
+get_agino_buf(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_agino_t		agino,
+	struct xfs_dinode	**dipp);
 
 void dinode_bmbt_translation_init(void);
 char * get_forkname(int whichfork);


* [PATCH 124/145] xfs_repair: check the existing refcount btree
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (122 preceding siblings ...)
  2016-06-17  1:43 ` [PATCH 123/145] xfs_repair: fix get_agino_buf to avoid corrupting inodes Darrick J. Wong
@ 2016-06-17  1:43 ` Darrick J. Wong
  2016-06-17  1:43 ` [PATCH 125/145] xfs_repair: handle multiple owners of data blocks Darrick J. Wong
                   ` (20 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:43 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Spot-check the refcount btree for obvious errors, and mark the
refcount btree blocks as such.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/incore.h     |    3 +
 repair/scan.c       |  185 +++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/xfs_repair.c |    2 +
 3 files changed, 189 insertions(+), 1 deletion(-)


diff --git a/repair/incore.h b/repair/incore.h
index bc0810b..b6c4b4f 100644
--- a/repair/incore.h
+++ b/repair/incore.h
@@ -106,7 +106,8 @@ typedef struct rt_extent_tree_node  {
 #define XR_E_INUSE_FS1	9	/* used by fs ag header or log (rmap btree) */
 #define XR_E_INO1	10	/* used by inodes (marked by rmap btree) */
 #define XR_E_FS_MAP1	11	/* used by fs space/inode maps (rmap btree) */
-#define XR_E_BAD_STATE	12
+#define XR_E_REFC	12	/* used by fs ag reference count btree */
+#define XR_E_BAD_STATE	13
 
 /* separate state bit, OR'ed into high (4th) bit of ex_state field */
 
diff --git a/repair/scan.c b/repair/scan.c
index ec41ba6..6300204 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -995,6 +995,9 @@ advance:
 					case XFS_RMAP_OWN_INODES:
 						set_bmap(agno, b, XR_E_INO1);
 						break;
+					case XFS_RMAP_OWN_REFC:
+						set_bmap(agno, b, XR_E_REFC);
+						break;
 					case XFS_RMAP_OWN_NULL:
 						/* still unknown */
 						break;
@@ -1030,6 +1033,14 @@ _("inode block (%d,%d-%d) mismatch in %s tree, state - %d,%" PRIx64 "\n"),
 						agno, b, b + blen - 1,
 						name, state, owner);
 					break;
+				case XR_E_REFC:
+					if (owner == XFS_RMAP_OWN_REFC)
+						break;
+					do_warn(
+_("AG refcount block (%d,%d-%d) mismatch in %s tree, state - %d,%" PRIx64 "\n"),
+						agno, b, b + blen - 1,
+						name, state, owner);
+					break;
 				case XR_E_INUSE:
 					if (owner >= 0 &&
 					    owner < mp->m_sb.sb_dblocks)
@@ -1148,6 +1159,167 @@ out:
 		rmap_avoid_check();
 }
 
+static void
+scan_refcbt(
+	struct xfs_btree_block	*block,
+	int			level,
+	xfs_agblock_t		bno,
+	xfs_agnumber_t		agno,
+	int			suspect,
+	int			isroot,
+	__uint32_t		magic,
+	void			*priv)
+{
+	const char		*name = "refcount";
+	int			i;
+	xfs_refcount_ptr_t	*pp;
+	struct xfs_refcount_rec	*rp;
+	int			hdr_errors = 0;
+	int			numrecs;
+	int			state;
+	xfs_agblock_t		lastblock = 0;
+
+	if (magic != XFS_REFC_CRC_MAGIC) {
+		name = "(unknown)";
+		hdr_errors++;
+		suspect++;
+		goto out;
+	}
+
+	if (be32_to_cpu(block->bb_magic) != magic) {
+		do_warn(_("bad magic # %#x in %s btree block %d/%d\n"),
+			be32_to_cpu(block->bb_magic), name, agno, bno);
+		hdr_errors++;
+		if (suspect)
+			goto out;
+	}
+
+	if (be16_to_cpu(block->bb_level) != level) {
+		do_warn(_("expected level %d got %d in %s btree block %d/%d\n"),
+			level, be16_to_cpu(block->bb_level), name, agno, bno);
+		hdr_errors++;
+		if (suspect)
+			goto out;
+	}
+
+	/* check for btree blocks multiply claimed */
+	state = get_bmap(agno, bno);
+	if (!(state == XR_E_UNKNOWN || state == XR_E_REFC))  {
+		set_bmap(agno, bno, XR_E_MULT);
+		do_warn(
+_("%s btree block claimed (state %d), agno %d, bno %d, suspect %d\n"),
+				name, state, agno, bno, suspect);
+		goto out;
+	}
+	set_bmap(agno, bno, XR_E_FS_MAP);
+
+	numrecs = be16_to_cpu(block->bb_numrecs);
+	if (level == 0) {
+		if (numrecs > mp->m_refc_mxr[0])  {
+			numrecs = mp->m_refc_mxr[0];
+			hdr_errors++;
+		}
+		if (isroot == 0 && numrecs < mp->m_refc_mnr[0])  {
+			numrecs = mp->m_refc_mnr[0];
+			hdr_errors++;
+		}
+
+		if (hdr_errors) {
+			do_warn(
+	_("bad btree nrecs (%u, min=%u, max=%u) in %s btree block %u/%u\n"),
+				be16_to_cpu(block->bb_numrecs),
+				mp->m_refc_mnr[0], mp->m_refc_mxr[0],
+				name, agno, bno);
+			suspect++;
+		}
+
+		rp = XFS_REFCOUNT_REC_ADDR(block, 1);
+		for (i = 0; i < numrecs; i++) {
+			xfs_agblock_t		b, end;
+			xfs_extlen_t		len;
+			xfs_nlink_t		nr;
+
+			b = be32_to_cpu(rp[i].rc_startblock);
+			len = be32_to_cpu(rp[i].rc_blockcount);
+			nr = be32_to_cpu(rp[i].rc_refcount);
+			end = b + len;
+
+			if (!verify_agbno(mp, agno, b)) {
+				do_warn(
+	_("invalid start block %u in record %u of %s btree block %u/%u\n"),
+					b, i, name, agno, bno);
+				continue;
+			}
+			if (len == 0 || !verify_agbno(mp, agno, end - 1)) {
+				do_warn(
+	_("invalid length %u in record %u of %s btree block %u/%u\n"),
+					len, i, name, agno, bno);
+				continue;
+			}
+
+			if (nr < 2 || nr > MAXREFCOUNT) {
+				do_warn(
+	_("invalid reference count %u in record %u of %s btree block %u/%u\n"),
+					nr, i, name, agno, bno);
+				continue;
+			}
+
+			if (b && b <= lastblock) {
+				do_warn(_(
+	"out-of-order %s btree record %d (%u %u) block %u/%u\n"),
+					name, i, b, len, agno, bno);
+			} else {
+				lastblock = b;
+			}
+
+			/* XXX: probably want to mark the reflinked areas? */
+		}
+		goto out;
+	}
+
+	/*
+	 * interior record
+	 */
+	pp = XFS_REFCOUNT_PTR_ADDR(block, 1, mp->m_refc_mxr[1]);
+
+	if (numrecs > mp->m_refc_mxr[1])  {
+		numrecs = mp->m_refc_mxr[1];
+		hdr_errors++;
+	}
+	if (isroot == 0 && numrecs < mp->m_refc_mnr[1])  {
+		numrecs = mp->m_refc_mnr[1];
+		hdr_errors++;
+	}
+
+	/*
+	 * don't pass bogus tree flag down further if this block
+	 * looked ok.  bail out if two levels in a row look bad.
+	 */
+	if (hdr_errors)  {
+		do_warn(
+	_("bad btree nrecs (%u, min=%u, max=%u) in %s btree block %u/%u\n"),
+			be16_to_cpu(block->bb_numrecs),
+			mp->m_refc_mnr[1], mp->m_refc_mxr[1],
+			name, agno, bno);
+		if (suspect)
+			goto out;
+		suspect++;
+	} else if (suspect) {
+		suspect = 0;
+	}
+
+	for (i = 0; i < numrecs; i++)  {
+		xfs_agblock_t		bno = be32_to_cpu(pp[i]);
+
+		if (bno != 0 && verify_agbno(mp, agno, bno)) {
+			scan_sbtree(bno, level, agno, suspect, scan_refcbt, 0,
+				    magic, priv, &xfs_refcountbt_buf_ops);
+		}
+	}
+out:
+	return;
+}
+
 /*
  * The following helpers are to help process and validate individual on-disk
  * inode btree records. We have two possible inode btrees with slightly
@@ -1933,6 +2105,19 @@ validate_agf(
 		}
 	}
 
+	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+		bno = be32_to_cpu(agf->agf_refcount_root);
+		if (bno != 0 && verify_agbno(mp, agno, bno)) {
+			scan_sbtree(bno,
+				    be32_to_cpu(agf->agf_refcount_level),
+				    agno, 0, scan_refcbt, 1, XFS_REFC_CRC_MAGIC,
+				    agcnts, &xfs_refcountbt_buf_ops);
+		} else  {
+			do_warn(_("bad agbno %u for refcntbt root, agno %d\n"),
+				bno, agno);
+		}
+	}
+
 	if (be32_to_cpu(agf->agf_freeblks) != agcnts->agffreeblks) {
 		do_warn(_("agf_freeblks %u, counted %u in ag %u\n"),
 			be32_to_cpu(agf->agf_freeblks), agcnts->agffreeblks, agno);
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 2ecd81d..cc557f7 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -423,6 +423,8 @@ calc_mkfs(xfs_mount_t *mp)
 		fino_bno += min(2, mp->m_rmap_maxlevels);
 		fino_bno++;
 	}
+	if (xfs_sb_version_hasreflink(&mp->m_sb))
+		fino_bno++;
 
 	/*
 	 * If the log is allocated in the first allocation group we need to

* [PATCH 125/145] xfs_repair: handle multiple owners of data blocks
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (123 preceding siblings ...)
  2016-06-17  1:43 ` [PATCH 124/145] xfs_repair: check the existing refcount btree Darrick J. Wong
@ 2016-06-17  1:43 ` Darrick J. Wong
  2016-06-17  1:44 ` [PATCH 126/145] xfs_repair: process reverse-mapping data into refcount data Darrick J. Wong
                   ` (19 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:43 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

If reflink is enabled, don't freak out if there are multiple owners of
a given block; that's just a sign that each of those owners is a
reflink file.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/dinode.c |   66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/scan.c   |   38 +++++++++++++++++++++++++++++++-
 2 files changed, 103 insertions(+), 1 deletion(-)


diff --git a/repair/dinode.c b/repair/dinode.c
index e9b4f8f..ecba3be 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -722,6 +722,9 @@ _("Fatal error: inode %" PRIu64 " - blkmap_set_ext(): %s\n"
 			 * checking each entry without setting the
 			 * block bitmap
 			 */
+			if (type == XR_INO_DATA &&
+			    xfs_sb_version_hasreflink(&mp->m_sb))
+				goto skip_dup;
 			if (search_dup_extent(agno, agbno, ebno)) {
 				do_warn(
 _("%s fork in ino %" PRIu64 " claims dup extent, "
@@ -731,6 +734,7 @@ _("%s fork in ino %" PRIu64 " claims dup extent, "
 					irec.br_blockcount);
 				goto done;
 			}
+skip_dup:
 			*tot += irec.br_blockcount;
 			continue;
 		}
@@ -770,6 +774,9 @@ _("%s fork in inode %" PRIu64 " claims metadata block %" PRIu64 "\n"),
 			case XR_E_INUSE:
 			case XR_E_MULT:
 				set_bmap_ext(agno, agbno, blen, XR_E_MULT);
+				if (type == XR_INO_DATA &&
+				    xfs_sb_version_hasreflink(&mp->m_sb))
+					break;
 				do_warn(
 _("%s fork in %s inode %" PRIu64 " claims used block %" PRIu64 "\n"),
 					forkname, ftype, ino, b);
@@ -2475,6 +2482,65 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
 		}
 	}
 
+	/*
+	 * check that we only have valid flags2 set, and those that are set make
+	 * sense.
+	 */
+	if (dino->di_version >= 3) {
+		uint16_t flags = be16_to_cpu(dino->di_flags);
+		uint64_t flags2 = be64_to_cpu(dino->di_flags2);
+
+		if (flags2 & ~XFS_DIFLAG2_ANY) {
+			if (!uncertain) {
+				do_warn(
+	_("Bad flags2 set in inode %" PRIu64 "\n"),
+					lino);
+			}
+			flags2 &= XFS_DIFLAG2_ANY;
+		}
+
+		if ((flags2 & XFS_DIFLAG2_REFLINK) &&
+		    !xfs_sb_version_hasreflink(&mp->m_sb)) {
+			if (!uncertain) {
+				do_warn(
+	_("inode %" PRIu64 " is marked reflinked but file system does not support reflink\n"),
+					lino);
+			}
+			goto clear_bad_out;
+		}
+
+		if (flags2 & XFS_DIFLAG2_REFLINK) {
+			/* must be a file */
+			if (di_mode && !S_ISREG(di_mode)) {
+				if (!uncertain) {
+					do_warn(
+	_("reflink flag set on non-file inode %" PRIu64 "\n"),
+						lino);
+				}
+				goto clear_bad_out;
+			}
+		}
+
+		if ((flags2 & XFS_DIFLAG2_REFLINK) &&
+		    (flags & (XFS_DIFLAG_REALTIME | XFS_DIFLAG_RTINHERIT))) {
+			if (!uncertain) {
+				do_warn(
+	_("Cannot have a reflinked realtime inode %" PRIu64 "\n"),
+					lino);
+			}
+			goto clear_bad_out;
+		}
+
+		if (!verify_mode && flags2 != be64_to_cpu(dino->di_flags2)) {
+			if (!no_modify) {
+				do_warn(_("fixing bad flags2.\n"));
+				dino->di_flags2 = cpu_to_be64(flags2);
+				*dirty = 1;
+			} else
+				do_warn(_("would fix bad flags2.\n"));
+		}
+	}
+
 	if (verify_mode)
 		return retval;
 
diff --git a/repair/scan.c b/repair/scan.c
index 6300204..8938341 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -790,6 +790,28 @@ struct rmap_priv {
 	struct xfs_rmap_irec	last_rec;
 };
 
+static bool
+rmap_in_order(
+	xfs_agblock_t	b,
+	xfs_agblock_t	lastblock,
+	int64_t		owner,
+	int64_t		lastowner,
+	int64_t		offset,
+	int64_t		lastoffset)
+{
+	if (b > lastblock)
+		return true;
+	else if (b < lastblock)
+		return false;
+
+	if (owner > lastowner)
+		return true;
+	else if (owner < lastowner)
+		return false;
+
+	return offset > lastoffset;
+}
+
 static void
 scan_rmapbt(
 	struct xfs_btree_block	*block,
@@ -947,7 +969,12 @@ advance:
 			} else {
 				bool bad;
 
-				bad = b <= lastblock;
+				if (xfs_sb_version_hasreflink(&mp->m_sb))
+					bad = !rmap_in_order(b, lastblock,
+							owner, lastowner,
+							offset, lastoffset);
+				else
+					bad = b <= lastblock;
 				if (bad)
 					do_warn(
 	_("out-of-order rmap btree record %d (%u %"PRId64" %"PRIx64" %u) block %u/%u\n"),
@@ -1057,6 +1084,15 @@ _("in use block (%d,%d-%d) mismatch in %s tree, state - %d,%" PRIx64 "\n"),
 					 * be caught later.
 					 */
 					break;
+				case XR_E_INUSE1:
+					/*
+					 * multiple inode owners are ok with
+					 * reflink enabled
+					 */
+					if (xfs_sb_version_hasreflink(&mp->m_sb) &&
+					    !XFS_RMAP_NON_INODE_OWNER(owner))
+						break;
+					/* fall through */
 				default:
 					do_warn(
 _("unknown block (%d,%d-%d) mismatch on %s tree, state - %d,%" PRIx64 "\n"),

* [PATCH 126/145] xfs_repair: process reverse-mapping data into refcount data
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (124 preceding siblings ...)
  2016-06-17  1:43 ` [PATCH 125/145] xfs_repair: handle multiple owners of data blocks Darrick J. Wong
@ 2016-06-17  1:44 ` Darrick J. Wong
  2016-06-17  1:44 ` [PATCH 127/145] xfs_repair: record reflink inode state Darrick J. Wong
                   ` (18 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:44 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Take all the reverse-mapping data we've acquired and use it to generate
reference count data.  This data is used in phase 5 to rebuild the
refcount btree.

v2: Update to reflect separation of rmap_irec flags.
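
To make the transformation concrete, here is a simplified sketch that turns
a sorted list of extents into refcount records.  It is not the bag-based
loop in this patch: it sweeps over sorted start and end points instead, and
struct ext, struct refc, and compute_refc() are hypothetical names.  It
assumes every extent has a nonzero length and that start + len does not
overflow 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct ext  { uint32_t start; uint32_t len; };			/* one rmap */
struct refc { uint32_t start; uint32_t len; uint32_t count; };	/* output */

static int
cmp_u32(const void *a, const void *b)
{
	uint32_t	x = *(const uint32_t *)a;
	uint32_t	y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/*
 * in:  n extents sorted by start block.
 * out: room for at least 2 * n records.
 * Returns the number of records written, or -1 if allocation fails.
 */
static long
compute_refc(const struct ext *in, size_t n, struct refc *out)
{
	uint32_t	*ends;
	uint32_t	cur_start = 0, cur_count = 0;
	size_t		si = 0, ei = 0, nout = 0, i;

	if (n == 0)
		return 0;
	ends = malloc(n * sizeof(*ends));
	if (!ends)
		return -1;
	for (i = 0; i < n; i++) {
		assert(in[i].len > 0);
		ends[i] = in[i].start + in[i].len;
	}
	qsort(ends, n, sizeof(*ends), cmp_u32);

	while (ei < n) {
		/* Next boundary: the smallest pending start or end. */
		uint32_t	p = ends[ei];
		uint32_t	count = cur_count;

		if (si < n && in[si].start < p)
			p = in[si].start;
		for (; si < n && in[si].start == p; si++)
			count++;		/* extents beginning at p */
		for (; ei < n && ends[ei] == p; ei++)
			count--;		/* extents ending at p */

		/* Emit only where the overlap level actually changes. */
		if (count != cur_count) {
			if (cur_count >= 2) {
				out[nout].start = cur_start;
				out[nout].len = p - cur_start;
				out[nout].count = cur_count;
				nout++;
			}
			cur_start = p;
			cur_count = count;
		}
	}
	free(ends);
	return (long)nout;
}
```

For the extents (0,10), (5,10), (5,3), and (20,5), blocks 5-7 have three
owners and blocks 8-9 have two, so the sketch produces (5,3,3) and (8,2,2).
Singly-owned ranges produce no record.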

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase4.c |   27 ++++++
 repair/rmap.c   |  232 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/rmap.h   |    2 
 3 files changed, 259 insertions(+), 2 deletions(-)


diff --git a/repair/phase4.c b/repair/phase4.c
index 3be3786..021d51d 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -193,6 +193,21 @@ _("%s while checking reverse-mappings"),
 }
 
 static void
+compute_ag_refcounts(
+	work_queue_t	*wq,
+	xfs_agnumber_t	agno,
+	void		*arg)
+{
+	int		error;
+
+	error = compute_refcounts(wq->mp, agno);
+	if (error)
+		do_error(
+_("%s while computing reference count records.\n"),
+			 strerror(-error));
+}
+
+static void
 process_rmap_data(
 	struct xfs_mount	*mp)
 {
@@ -206,6 +221,14 @@ process_rmap_data(
 	for (i = 0; i < mp->m_sb.sb_agcount; i++)
 		queue_work(&wq, check_rmap_btrees, i, NULL);
 	destroy_work_queue(&wq);
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return;
+
+	create_work_queue(&wq, mp, libxfs_nproc());
+	for (i = 0; i < mp->m_sb.sb_agcount; i++)
+		queue_work(&wq, compute_ag_refcounts, i, NULL);
+	destroy_work_queue(&wq);
 }
 
 void
@@ -359,7 +382,9 @@ phase4(xfs_mount_t *mp)
 
 	/*
 	 * Process all the reverse-mapping data that we collected.  This
-	 * involves checking the rmap data against the btree.
+	 * involves checking the rmap data against the btree, computing
+	 * reference counts based on the rmap data, and checking the counts
+	 * against the refcount btree.
 	 */
 	process_rmap_data(mp);
 
diff --git a/repair/rmap.c b/repair/rmap.c
index e39df5a..4da8003 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -42,6 +42,7 @@ struct xfs_ag_rmap {
 	int		ar_flcount;		/* agfl entries from leftover */
 						/* agbt allocations */
 	struct xfs_rmap_irec	ar_last_rmap;	/* last rmap seen */
+	struct xfs_slab	*ar_refcount_items;	/* refcount items, p4-5 */
 };
 
 static struct xfs_ag_rmap *ag_rmaps;
@@ -88,7 +89,8 @@ bool
 needs_rmap_work(
 	struct xfs_mount	*mp)
 {
-	return xfs_sb_version_hasrmapbt(&mp->m_sb);
+	return xfs_sb_version_hasreflink(&mp->m_sb) ||
+	       xfs_sb_version_hasrmapbt(&mp->m_sb);
 }
 
 /*
@@ -120,6 +122,11 @@ _("Insufficient memory while allocating reverse mapping slabs."));
 			do_error(
 _("Insufficient memory while allocating raw metadata reverse mapping slabs."));
 		ag_rmaps[i].ar_last_rmap.rm_owner = XFS_RMAP_OWN_UNKNOWN;
+		error = init_slab(&ag_rmaps[i].ar_refcount_items,
+				  sizeof(struct xfs_refcount_irec));
+		if (error)
+			do_error(
+_("Insufficient memory while allocating refcount item slabs."));
 	}
 }
 
@@ -138,6 +145,7 @@ free_rmaps(
 	for (i = 0; i < mp->m_sb.sb_agcount; i++) {
 		free_slab(&ag_rmaps[i].ar_rmaps);
 		free_slab(&ag_rmaps[i].ar_raw_rmaps);
+		free_slab(&ag_rmaps[i].ar_refcount_items);
 	}
 	free(ag_rmaps);
 	ag_rmaps = NULL;
@@ -591,6 +599,228 @@ dump_rmap(
 #endif
 
 /*
+ * Rebuilding the Reference Count & Reverse Mapping Btrees
+ *
+ * The reference count (refcnt) and reverse mapping (rmap) btrees are rebuilt
+ * during phase 5, like all other AG btrees.  Therefore, reverse mappings must
+ * be processed into reference counts at the end of phase 4, and the rmaps must
+ * be recorded during phase 4.  There is a need to access the rmaps in physical
+ * block order, but no particular need for random access, so the slab.c code
+ * provides a big logical array (consisting of smaller slabs) and some inorder
+ * iterator functions.
+ *
+ * Once we've recorded all the reverse mappings, we're ready to translate the
+ * rmaps into refcount entries.  Imagine the rmap entries as rectangles
+ * representing extents of physical blocks, and that the rectangles can be laid
+ * down to allow them to overlap each other; then we know that we must emit
+ * a refcnt btree entry wherever the amount of overlap changes, i.e. the
+ * emission stimulus is level-triggered:
+ *
+ *                 -    ---
+ *       --      ----- ----   ---        ------
+ * --   ----     ----------- ----     ---------
+ * -------------------------------- -----------
+ * ^ ^  ^^ ^^    ^ ^^ ^^^  ^^^^  ^ ^^ ^  ^     ^
+ * 2 1  23 21    3 43 234  2123  1 01 2  3     0
+ *
+ * For our purposes, a rmap is a tuple (startblock, len, fileoff, owner).
+ *
+ * Note that in the actual refcnt btree we don't store the refcount < 2 cases
+ * because the bnobt tells us which blocks are free; single-use blocks aren't
+ * recorded in the bnobt or the refcntbt.  If the rmapbt supports storing
+ * multiple entries covering a given block we could theoretically dispense with
+ * the refcntbt and simply count rmaps, but that's inefficient in the (hot)
+ * write path, so we'll take the cost of the extra tree to save time.  Also
+ * there's no guarantee that rmap will be enabled.
+ *
+ * Given an array of rmaps sorted by physical block number, a starting physical
+ * block (sp), a bag to hold rmaps that cover sp, and the next physical
+ * block where the level changes (np), we can reconstruct the refcount
+ * btree as follows:
+ *
+ * While there are still unprocessed rmaps in the array,
+ *  - Set sp to the physical block (pblk) of the next unprocessed rmap.
+ *  - Add to the bag all rmaps in the array where startblock == sp.
+ *  - Set np to the physical block where the bag size will change.
+ *    This is the minimum of (the pblk of the next unprocessed rmap) and
+ *    (startblock + len of each rmap in the bag).
+ *  - Record the bag size as old_bag_size.
+ *
+ *  - While the bag isn't empty,
+ *     - Remove from the bag all rmaps where startblock + len == np.
+ *     - Add to the bag all rmaps in the array where startblock == np.
+ *     - If the bag size isn't old_bag_size, store the refcount entry
+ *       (sp, np - sp, bag_size) in the refcnt btree.
+ *     - If the bag is empty, break out of the inner loop.
+ *     - Set old_bag_size to the bag size
+ *     - Set sp = np.
+ *     - Set np to the physical block where the bag size will change.
+ *       This is the minimum of (the pblk of the next unprocessed rmap) and
+ *       (startblock + len of each rmap in the bag).
+ *
+ * An implementation detail is that because this processing happens during
+ * phase 4, the refcount entries are stored in an array so that phase 5 can
+ * load them into the refcount btree.  The rmaps can be loaded directly into
+ * the rmap btree during phase 5 as well.
+ */
+
+/*
+ * Emit a refcount object for refcntbt reconstruction during phase 5.
+ */
+#define REFCOUNT_CLAMP(nr)	((nr) > MAXREFCOUNT ? MAXREFCOUNT : (nr))
+static void
+refcount_emit(
+	struct xfs_mount		*mp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		len,
+	size_t			nr_rmaps)
+{
+	struct xfs_refcount_irec	rlrec;
+	int			error;
+	struct xfs_slab		*rlslab;
+
+	rlslab = ag_rmaps[agno].ar_refcount_items;
+	ASSERT(nr_rmaps > 0);
+
+	dbg_printf("REFL: agno=%u pblk=%u, len=%u -> refcount=%zu\n",
+		agno, agbno, len, nr_rmaps);
+	rlrec.rc_startblock = agbno;
+	rlrec.rc_blockcount = len;
+	rlrec.rc_refcount = REFCOUNT_CLAMP(nr_rmaps);
+	error = slab_add(rlslab, &rlrec);
+	if (error)
+		do_error(
+_("Insufficient memory while recreating refcount tree."));
+}
+#undef REFCOUNT_CLAMP
+
+/*
+ * Transform a pile of physical block mapping observations into refcount data
+ * for eventual rebuilding of the btrees.
+ */
+#define RMAP_END(r)	((r)->rm_startblock + (r)->rm_blockcount)
+int
+compute_refcounts(
+	struct xfs_mount		*mp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_bag		*stack_top = NULL;
+	struct xfs_slab		*rmaps;
+	struct xfs_slab_cursor	*rmaps_cur;
+	struct xfs_rmap_irec	*array_cur;
+	struct xfs_rmap_irec	*rmap;
+	xfs_agblock_t		sbno;	/* first bno of this rmap set */
+	xfs_agblock_t		cbno;	/* first bno of this refcount set */
+	xfs_agblock_t		nbno;	/* next bno where rmap set changes */
+	size_t			n, idx;
+	size_t			old_stack_nr;
+	int			error;
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return 0;
+
+	rmaps = ag_rmaps[agno].ar_rmaps;
+
+	error = init_slab_cursor(rmaps, rmap_compare, &rmaps_cur);
+	if (error)
+		return error;
+
+	error = init_bag(&stack_top);
+	if (error)
+		goto err;
+
+	/* While there are rmaps to be processed... */
+	n = 0;
+	while (n < slab_count(rmaps)) {
+		array_cur = peek_slab_cursor(rmaps_cur);
+		sbno = cbno = array_cur->rm_startblock;
+		/* Push all rmaps with pblk == sbno onto the stack */
+		for (;
+		     array_cur && array_cur->rm_startblock == sbno;
+		     array_cur = peek_slab_cursor(rmaps_cur)) {
+			advance_slab_cursor(rmaps_cur); n++;
+			dump_rmap("push0", agno, array_cur);
+			error = bag_add(stack_top, array_cur);
+			if (error)
+				goto err;
+		}
+
+		/* Set nbno to the bno of the next refcount change */
+		if (n < slab_count(rmaps))
+			nbno = array_cur->rm_startblock;
+		else
+			nbno = NULLAGBLOCK;
+		foreach_bag_ptr(stack_top, idx, rmap) {
+			nbno = min(nbno, RMAP_END(rmap));
+		}
+
+		/* Emit reverse mappings, if needed */
+		ASSERT(nbno > sbno);
+		old_stack_nr = bag_count(stack_top);
+
+		/* While stack isn't empty... */
+		while (bag_count(stack_top)) {
+			/* Pop all rmaps that end at nbno */
+			foreach_bag_ptr_reverse(stack_top, idx, rmap) {
+				if (RMAP_END(rmap) != nbno)
+					continue;
+				dump_rmap("pop", agno, rmap);
+				error = bag_remove(stack_top, idx);
+				if (error)
+					goto err;
+			}
+
+			/* Push array items that start at nbno */
+			for (;
+			     array_cur && array_cur->rm_startblock == nbno;
+			     array_cur = peek_slab_cursor(rmaps_cur)) {
+				advance_slab_cursor(rmaps_cur); n++;
+				dump_rmap("push1", agno, array_cur);
+				error = bag_add(stack_top, array_cur);
+				if (error)
+					goto err;
+			}
+
+			/* Emit refcount if necessary */
+			ASSERT(nbno > cbno);
+			if (bag_count(stack_top) != old_stack_nr) {
+				if (old_stack_nr > 1) {
+					refcount_emit(mp, agno, cbno,
+						      nbno - cbno,
+						      old_stack_nr);
+				}
+				cbno = nbno;
+			}
+
+			/* Stack empty, go find the next rmap */
+			if (bag_count(stack_top) == 0)
+				break;
+			old_stack_nr = bag_count(stack_top);
+			sbno = nbno;
+
+			/* Set nbno to the bno of the next refcount change */
+			if (n < slab_count(rmaps))
+				nbno = array_cur->rm_startblock;
+			else
+				nbno = NULLAGBLOCK;
+			foreach_bag_ptr(stack_top, idx, rmap) {
+				nbno = min(nbno, RMAP_END(rmap));
+			}
+
+			/* Emit reverse mappings, if needed */
+			ASSERT(nbno > sbno);
+		}
+	}
+err:
+	free_bag(&stack_top);
+	free_slab_cursor(&rmaps_cur);
+
+	return error;
+}
+#undef RMAP_END
+
+/*
  * Return the number of rmap objects for an AG.
  */
 size_t
diff --git a/repair/rmap.h b/repair/rmap.h
index 69215e8..65f33e0 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -49,6 +49,8 @@ extern __int64_t rmap_diffkeys(struct xfs_rmap_irec *kp1,
 extern void rmap_high_key_from_rec(struct xfs_rmap_irec *rec,
 		struct xfs_rmap_irec *key);
 
+extern int compute_refcounts(struct xfs_mount *, xfs_agnumber_t);
+
 extern void fix_freelist(struct xfs_mount *, xfs_agnumber_t, bool);
 extern void rmap_store_agflcount(struct xfs_mount *, xfs_agnumber_t, int);
 

* [PATCH 127/145] xfs_repair: record reflink inode state
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (125 preceding siblings ...)
  2016-06-17  1:44 ` [PATCH 126/145] xfs_repair: process reverse-mapping data into refcount data Darrick J. Wong
@ 2016-06-17  1:44 ` Darrick J. Wong
  2016-06-17  1:44 ` [PATCH 128/145] xfs_repair: fix inode reflink flags Darrick J. Wong
                   ` (17 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:44 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Record the state of the per-inode reflink flag, so that we can
compare against the rmap data and update the flags accordingly.
Clear the (reflink) state if we clear the inode.
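
The bookkeeping amounts to two 64-bit masks per inode record, one bit per
inode.  A minimal sketch follows; struct ino_flags and its helpers are
hypothetical names, and flags_to_fix() is only an assumption about how the
later comparison could be expressed, not code from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-record state: bit N describes inode N of the record. */
struct ino_flags {
	uint64_t	was_rl;	/* reflink flag was set on disk */
	uint64_t	is_rl;	/* reflink flag should be set */
};

static uint64_t
ino_mask(int offset)
{
	assert(offset >= 0 && offset < 64);
	return UINT64_C(1) << offset;
}

static void
set_was_rl(struct ino_flags *f, int offset)
{
	f->was_rl |= ino_mask(offset);
}

static void
clear_was_rl(struct ino_flags *f, int offset)
{
	f->was_rl &= ~ino_mask(offset);
}

static bool
test_was_rl(const struct ino_flags *f, int offset)
{
	return (f->was_rl & ino_mask(offset)) != 0;
}

/* Bits whose on-disk state disagrees with the computed state. */
static uint64_t
flags_to_fix(const struct ino_flags *f)
{
	return f->was_rl ^ f->is_rl;
}
```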

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/dino_chunks.c |    1 +
 repair/dinode.c      |    6 ++++++
 repair/incore.h      |   38 ++++++++++++++++++++++++++++++++++++++
 repair/incore_ino.c  |    2 ++
 repair/rmap.c        |   26 ++++++++++++++++++++++++++
 repair/rmap.h        |    2 ++
 6 files changed, 75 insertions(+)


diff --git a/repair/dino_chunks.c b/repair/dino_chunks.c
index 7dbaca6..4db9512 100644
--- a/repair/dino_chunks.c
+++ b/repair/dino_chunks.c
@@ -931,6 +931,7 @@ next_readbuf:
 				do_warn(_("would have cleared inode %" PRIu64 "\n"),
 					ino);
 			}
+			clear_inode_was_rl(ino_rec, irec_offset);
 		}
 
 process_next:
diff --git a/repair/dinode.c b/repair/dinode.c
index ecba3be..d48e415 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -2636,6 +2636,12 @@ _("bad non-zero extent size %u for non-realtime/extsize inode %" PRIu64 ", "),
 		goto clear_bad_out;
 
 	/*
+	 * record the state of the reflink flag
+	 */
+	if (collect_rmaps)
+		record_inode_reflink_flag(mp, dino, agno, ino, lino);
+
+	/*
 	 * check data fork -- if it's bad, clear the inode
 	 */
 	if (process_inode_data_fork(mp, agno, ino, dino, type, dirty,
diff --git a/repair/incore.h b/repair/incore.h
index b6c4b4f..bcd2f4b 100644
--- a/repair/incore.h
+++ b/repair/incore.h
@@ -283,6 +283,8 @@ typedef struct ino_tree_node  {
 	__uint64_t		ir_sparse;	/* sparse inode bitmask */
 	__uint64_t		ino_confirmed;	/* confirmed bitmask */
 	__uint64_t		ino_isa_dir;	/* bit == 1 if a directory */
+	__uint64_t		ino_was_rl;	/* bit == 1 if reflink flag set */
+	__uint64_t		ino_is_rl;	/* bit == 1 if reflink flag should be set */
 	__uint8_t		nlink_size;
 	union ino_nlink		disk_nlinks;	/* on-disk nlinks, set in P3 */
 	union  {
@@ -494,6 +496,42 @@ static inline bool is_inode_sparse(struct ino_tree_node *irec, int offset)
 }
 
 /*
+ * set/clear/test was inode marked as reflinked
+ */
+static inline void set_inode_was_rl(struct ino_tree_node *irec, int offset)
+{
+	irec->ino_was_rl |= IREC_MASK(offset);
+}
+
+static inline void clear_inode_was_rl(struct ino_tree_node *irec, int offset)
+{
+	irec->ino_was_rl &= ~IREC_MASK(offset);
+}
+
+static inline int inode_was_rl(struct ino_tree_node *irec, int offset)
+{
+	return (irec->ino_was_rl & IREC_MASK(offset)) != 0;
+}
+
+/*
+ * set/clear/test whether the inode should be marked as reflinked
+ */
+static inline void set_inode_is_rl(struct ino_tree_node *irec, int offset)
+{
+	irec->ino_is_rl |= IREC_MASK(offset);
+}
+
+static inline void clear_inode_is_rl(struct ino_tree_node *irec, int offset)
+{
+	irec->ino_is_rl &= ~IREC_MASK(offset);
+}
+
+static inline int inode_is_rl(struct ino_tree_node *irec, int offset)
+{
+	return (irec->ino_is_rl & IREC_MASK(offset)) != 0;
+}
+
+/*
  * add_inode_reached() is set on inode I only if I has been reached
  * by an inode P claiming to be the parent and if I is a directory,
  * the .. link in the I says that P is I's parent.
diff --git a/repair/incore_ino.c b/repair/incore_ino.c
index 1898257..2ec1765 100644
--- a/repair/incore_ino.c
+++ b/repair/incore_ino.c
@@ -257,6 +257,8 @@ alloc_ino_node(
 	irec->ino_startnum = starting_ino;
 	irec->ino_confirmed = 0;
 	irec->ino_isa_dir = 0;
+	irec->ino_was_rl = 0;
+	irec->ino_is_rl = 0;
 	irec->ir_free = (xfs_inofree_t) - 1;
 	irec->ir_sparse = 0;
 	irec->ino_un.ex_data = NULL;
diff --git a/repair/rmap.c b/repair/rmap.c
index 4da8003..6a62665 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -1073,6 +1073,32 @@ rmap_high_key_from_rec(
 }
 
 /*
+ * Record that an inode had the reflink flag set when repair started.  The
+ * inode reflink flag will be adjusted as necessary.
+ */
+void
+record_inode_reflink_flag(
+	struct xfs_mount	*mp,
+	struct xfs_dinode	*dino,
+	xfs_agnumber_t		agno,
+	xfs_agino_t		ino,
+	xfs_ino_t		lino)
+{
+	struct ino_tree_node	*irec;
+	int			off;
+
+	ASSERT(XFS_AGINO_TO_INO(mp, agno, ino) == be64_to_cpu(dino->di_ino));
+	if (!(be64_to_cpu(dino->di_flags2) & XFS_DIFLAG2_REFLINK))
+		return;
+	irec = find_inode_rec(mp, agno, ino);
+	off = get_inode_offset(mp, lino, irec);
+	ASSERT(!inode_was_rl(irec, off));
+	set_inode_was_rl(irec, off);
+	dbg_printf("set was_rl lino=%llu was=0x%llx\n",
+		(unsigned long long)lino, (unsigned long long)irec->ino_was_rl);
+}
+
+/*
  * Regenerate the AGFL so that we don't run out of it while rebuilding the
  * rmap btree.  If skip_rmapbt is true, don't update the rmapbt (most probably
  * because we're updating the rmapbt).
diff --git a/repair/rmap.h b/repair/rmap.h
index 65f33e0..47b2f3b 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -50,6 +50,8 @@ extern void rmap_high_key_from_rec(struct xfs_rmap_irec *rec,
 		struct xfs_rmap_irec *key);
 
 extern int compute_refcounts(struct xfs_mount *, xfs_agnumber_t);
+extern void record_inode_reflink_flag(struct xfs_mount *, struct xfs_dinode *,
+	xfs_agnumber_t, xfs_agino_t, xfs_ino_t);
 
 extern void fix_freelist(struct xfs_mount *, xfs_agnumber_t, bool);
 extern void rmap_store_agflcount(struct xfs_mount *, xfs_agnumber_t, int);


* [PATCH 128/145] xfs_repair: fix inode reflink flags
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (126 preceding siblings ...)
  2016-06-17  1:44 ` [PATCH 127/145] xfs_repair: record reflink inode state Darrick J. Wong
@ 2016-06-17  1:44 ` Darrick J. Wong
  2016-06-17  1:44 ` [PATCH 129/145] xfs_repair: check the refcount btree against our observed reference counts when -n Darrick J. Wong
                   ` (16 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:44 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

While we're computing reference counts, record which inodes actually
share blocks with other files and fix the flags as necessary.
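
As a rough illustration of the reconciliation described above, the following
sketch compares two per-chunk bitmasks: one recording which inodes had the
reflink flag on disk ("was") and one recording which inodes were observed to
share blocks ("is").  The names rl_action, rl_action_for_bit and
rl_count_fixes are hypothetical and do not appear in the patch; the patch
itself calls fix_inode_reflink_flag() for each differing bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical action codes, used only in this sketch. */
enum rl_action { RL_KEEP = 0, RL_SET = 1, RL_CLEAR = 2 };

/*
 * Decide what to do for the inode at position 'bit' of a 64-inode chunk.
 * 'was' has a bit set for each inode whose on-disk reflink flag was set;
 * 'is' has a bit set for each inode that was seen sharing blocks.
 */
static enum rl_action
rl_action_for_bit(uint64_t was, uint64_t is, int bit)
{
	uint64_t	mask;

	assert(bit >= 0 && bit < 64);
	mask = (uint64_t)1 << bit;

	/* The XOR of the two masks is nonzero only where they disagree. */
	if (!((was ^ is) & mask))
		return RL_KEEP;
	return (is & mask) ? RL_SET : RL_CLEAR;
}

/* Count how many inodes in a chunk need their flag changed. */
static int
rl_count_fixes(uint64_t was, uint64_t is)
{
	int	bit;
	int	nr = 0;

	for (bit = 0; bit < 64; bit++)
		if (rl_action_for_bit(was, is, bit) != RL_KEEP)
			nr++;
	return nr;
}
```

With was = 0x5 and is = 0x3, bit 0 is left alone, bit 1 gets the flag set,
and bit 2 gets the flag cleared.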

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase4.c |   20 ++++++++
 repair/rmap.c   |  133 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/rmap.h   |    1 
 3 files changed, 154 insertions(+)


diff --git a/repair/phase4.c b/repair/phase4.c
index 021d51d..59bb9fb 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -208,6 +208,21 @@ _("%s while computing reference count records.\n"),
 }
 
 static void
+process_inode_reflink_flags(
+	struct work_queue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	int			error;
+
+	error = fix_inode_reflink_flags(wq->mp, agno);
+	if (error)
+		do_error(
+_("%s while fixing inode reflink flags.\n"),
+			 strerror(-error));
+}
+
+static void
 process_rmap_data(
 	struct xfs_mount	*mp)
 {
@@ -229,6 +244,11 @@ process_rmap_data(
 	for (i = 0; i < mp->m_sb.sb_agcount; i++)
 		queue_work(&wq, compute_ag_refcounts, i, NULL);
 	destroy_work_queue(&wq);
+
+	create_work_queue(&wq, mp, libxfs_nproc());
+	for (i = 0; i < mp->m_sb.sb_agcount; i++)
+		queue_work(&wq, process_inode_reflink_flags, i, NULL);
+	destroy_work_queue(&wq);
 }
 
 void
diff --git a/repair/rmap.c b/repair/rmap.c
index 6a62665..124173d 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -665,6 +665,39 @@ dump_rmap(
  */
 
 /*
+ * Mark all inodes in the reverse-mapping observation stack as requiring the
+ * reflink inode flag, if the stack depth is greater than 1.
+ */
+static void
+mark_inode_rl(
+	struct xfs_mount		*mp,
+	struct xfs_bag		*rmaps)
+{
+	xfs_agnumber_t		iagno;
+	struct xfs_rmap_irec	*rmap;
+	struct ino_tree_node	*irec;
+	int			off;
+	size_t			idx;
+	xfs_agino_t		ino;
+
+	if (bag_count(rmaps) < 2)
+		return;
+
+	/* Reflink flag accounting */
+	foreach_bag_ptr(rmaps, idx, rmap) {
+		ASSERT(!XFS_RMAP_NON_INODE_OWNER(rmap->rm_owner));
+		iagno = XFS_INO_TO_AGNO(mp, rmap->rm_owner);
+		ino = XFS_INO_TO_AGINO(mp, rmap->rm_owner);
+		pthread_mutex_lock(&ag_locks[iagno].lock);
+		irec = find_inode_rec(mp, iagno, ino);
+		off = get_inode_offset(mp, rmap->rm_owner, irec);
+		/* lock here because we might go outside this ag */
+		set_inode_is_rl(irec, off);
+		pthread_mutex_unlock(&ag_locks[iagno].lock);
+	}
+}
+
+/*
  * Emit a refcount object for refcntbt reconstruction during phase 5.
  */
 #define REFCOUNT_CLAMP(nr)	((nr) > MAXREFCOUNT ? MAXREFCOUNT : (nr))
@@ -745,6 +778,7 @@ compute_refcounts(
 			if (error)
 				goto err;
 		}
+		mark_inode_rl(mp, stack_top);
 
 		/* Set nbno to the bno of the next refcount change */
 		if (n < slab_count(rmaps))
@@ -781,6 +815,7 @@ compute_refcounts(
 				if (error)
 					goto err;
 			}
+			mark_inode_rl(mp, stack_top);
 
 			/* Emit refcount if necessary */
 			ASSERT(nbno > cbno);
@@ -1099,6 +1134,104 @@ record_inode_reflink_flag(
 }
 
 /*
+ * Fix an inode's reflink flag.
+ */
+static int
+fix_inode_reflink_flag(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_agino_t		agino,
+	bool			set)
+{
+	struct xfs_dinode	*dino;
+	struct xfs_buf		*buf;
+
+	if (set)
+		do_warn(
+_("setting reflink flag on inode %"PRIu64"\n"),
+			XFS_AGINO_TO_INO(mp, agno, agino));
+	else if (!no_modify) /* && !set */
+		do_warn(
+_("clearing reflink flag on inode %"PRIu64"\n"),
+			XFS_AGINO_TO_INO(mp, agno, agino));
+	if (no_modify)
+		return 0;
+
+	buf = get_agino_buf(mp, agno, agino, &dino);
+	if (!buf)
+		return 1;
+	ASSERT(XFS_AGINO_TO_INO(mp, agno, agino) == be64_to_cpu(dino->di_ino));
+	if (set)
+		dino->di_flags2 |= cpu_to_be64(XFS_DIFLAG2_REFLINK);
+	else
+		dino->di_flags2 &= cpu_to_be64(~XFS_DIFLAG2_REFLINK);
+	libxfs_dinode_calc_crc(mp, dino);
+	libxfs_writebuf(buf, 0);
+
+	return 0;
+}
+
+/*
+ * Fix discrepancies between the state of the inode reflink flag and our
+ * observations as to whether or not the inode really needs it.
+ */
+int
+fix_inode_reflink_flags(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	struct ino_tree_node	*irec;
+	int			bit;
+	__uint64_t		was;
+	__uint64_t		is;
+	__uint64_t		diff;
+	__uint64_t		mask;
+	int			error = 0;
+	xfs_agino_t		agino;
+
+	/*
+	 * Update the reflink flag for any inode where there's a discrepancy
+	 * between the inode flag and whether or not we found any reflinked
+	 * extents.
+	 */
+	for (irec = findfirst_inode_rec(agno);
+	     irec != NULL;
+	     irec = next_ino_rec(irec)) {
+		ASSERT((irec->ino_was_rl & irec->ir_free) == 0);
+		ASSERT((irec->ino_is_rl & irec->ir_free) == 0);
+		was = irec->ino_was_rl;
+		is = irec->ino_is_rl;
+		if (was == is)
+			continue;
+		diff = was ^ is;
+		dbg_printf("mismatch ino=%llu was=0x%llx is=0x%llx dif=0x%llx\n",
+			(unsigned long long)XFS_AGINO_TO_INO(mp, agno,
+						irec->ino_startnum),
+			(unsigned long long)was, (unsigned long long)is, (unsigned long long)diff);
+
+		for (bit = 0, mask = 1; bit < 64; bit++, mask <<= 1) {
+			agino = bit + irec->ino_startnum;
+			if (!(diff & mask))
+				continue;
+			else if (was & mask)
+				error = fix_inode_reflink_flag(mp, agno, agino,
+						false);
+			else if (is & mask)
+				error = fix_inode_reflink_flag(mp, agno, agino,
+						true);
+			else
+				ASSERT(0);
+			if (error)
+				do_error(
+_("Unable to fix reflink flag on inode %"PRIu64".\n"),
+					XFS_AGINO_TO_INO(mp, agno, agino));
+		}
+	}
+
+	return error;
+}
+
+/*
  * Regenerate the AGFL so that we don't run out of it while rebuilding the
  * rmap btree.  If skip_rmapbt is true, don't update the rmapbt (most probably
  * because we're updating the rmapbt).
diff --git a/repair/rmap.h b/repair/rmap.h
index 47b2f3b..d6d360f 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -52,6 +52,7 @@ extern void rmap_high_key_from_rec(struct xfs_rmap_irec *rec,
 extern int compute_refcounts(struct xfs_mount *, xfs_agnumber_t);
 extern void record_inode_reflink_flag(struct xfs_mount *, struct xfs_dinode *,
 	xfs_agnumber_t, xfs_agino_t, xfs_ino_t);
+extern int fix_inode_reflink_flags(struct xfs_mount *, xfs_agnumber_t);
 
 extern void fix_freelist(struct xfs_mount *, xfs_agnumber_t, bool);
 extern void rmap_store_agflcount(struct xfs_mount *, xfs_agnumber_t, int);


* [PATCH 129/145] xfs_repair: check the refcount btree against our observed reference counts when -n
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (127 preceding siblings ...)
  2016-06-17  1:44 ` [PATCH 128/145] xfs_repair: fix inode reflink flags Darrick J. Wong
@ 2016-06-17  1:44 ` Darrick J. Wong
  2016-06-17  1:44 ` [PATCH 130/145] xfs_repair: rebuild the refcount btree Darrick J. Wong
                   ` (15 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:44 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Check the observed reference counts against whatever's in the refcount
btree for discrepancies.
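
The per-record comparison can be sketched in isolation as below.  This is a
simplified stand-in, not the patch's code: struct refc_rec, refc_make and
refc_rec_mismatch are hypothetical names, and the struct only mirrors the
three fields of struct xfs_refcount_irec that the check reads.  As in the
patch, an on-disk record that is longer or has a higher count than the
observation is not reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the fields of struct xfs_refcount_irec. */
struct refc_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint32_t	refcount;
};

/* Convenience constructor, used only by this sketch. */
static struct refc_rec
refc_make(uint32_t startblock, uint32_t blockcount, uint32_t refcount)
{
	struct refc_rec	r;

	assert(blockcount > 0);
	r.startblock = startblock;
	r.blockcount = blockcount;
	r.refcount = refcount;
	return r;
}

/*
 * Return true if the on-disk record does not cover the observation:
 * it starts at a different block, is shorter, or has a lower count.
 */
static bool
refc_rec_mismatch(struct refc_rec ondisk, struct refc_rec observed)
{
	return ondisk.startblock != observed.startblock ||
	       ondisk.blockcount < observed.blockcount ||
	       ondisk.refcount < observed.refcount;
}
```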

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase4.c |   20 +++++++++
 repair/rmap.c   |  126 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/rmap.h   |    5 ++
 repair/scan.c   |    3 +
 4 files changed, 154 insertions(+)


diff --git a/repair/phase4.c b/repair/phase4.c
index 59bb9fb..9f4e0d0 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -223,6 +223,21 @@ _("%s while fixing inode reflink flags.\n"),
 }
 
 static void
+check_refcount_btrees(
+	work_queue_t	*wq,
+	xfs_agnumber_t	agno,
+	void		*arg)
+{
+	int		error;
+
+	error = check_refcounts(wq->mp, agno);
+	if (error)
+		do_error(
+_("%s while checking reference counts.\n"),
+			 strerror(-error));
+}
+
+static void
 process_rmap_data(
 	struct xfs_mount	*mp)
 {
@@ -249,6 +264,11 @@ process_rmap_data(
 	for (i = 0; i < mp->m_sb.sb_agcount; i++)
 		queue_work(&wq, process_inode_reflink_flags, i, NULL);
 	destroy_work_queue(&wq);
+
+	create_work_queue(&wq, mp, libxfs_nproc());
+	for (i = 0; i < mp->m_sb.sb_agcount; i++)
+		queue_work(&wq, check_refcount_btrees, i, NULL);
+	destroy_work_queue(&wq);
 }
 
 void
diff --git a/repair/rmap.c b/repair/rmap.c
index 124173d..5c0e015 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -47,6 +47,7 @@ struct xfs_ag_rmap {
 
 static struct xfs_ag_rmap *ag_rmaps;
 static bool rmapbt_suspect;
+static bool refcbt_suspect;
 
 /*
  * Compare rmap observations for array sorting.
@@ -1232,6 +1233,131 @@ _("Unable to fix reflink flag on inode %"PRIu64".\n"),
 }
 
 /*
+ * Return the number of refcount objects for an AG.
+ */
+size_t
+refcount_record_count(
+	struct xfs_mount		*mp,
+	xfs_agnumber_t		agno)
+{
+	return slab_count(ag_rmaps[agno].ar_refcount_items);
+}
+
+/*
+ * Return a slab cursor that will return refcount objects in order.
+ */
+int
+init_refcount_cursor(
+	xfs_agnumber_t		agno,
+	struct xfs_slab_cursor	**cur)
+{
+	return init_slab_cursor(ag_rmaps[agno].ar_refcount_items, NULL, cur);
+}
+
+/*
+ * Disable the refcount btree check.
+ */
+void
+refcount_avoid_check(void)
+{
+	refcbt_suspect = true;
+}
+
+/*
+ * Compare the observed reference counts against what's in the ag btree.
+ */
+int
+check_refcounts(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_slab_cursor	*rl_cur;
+	struct xfs_btree_cur	*bt_cur = NULL;
+	int			error;
+	int			have;
+	int			i;
+	struct xfs_buf		*agbp = NULL;
+	struct xfs_refcount_irec	*rl_rec;
+	struct xfs_refcount_irec	tmp;
+	struct xfs_perag	*pag;		/* per allocation group data */
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return 0;
+	if (refcbt_suspect) {
+		if (no_modify && agno == 0)
+			do_warn(_("would rebuild corrupt refcount btrees.\n"));
+		return 0;
+	}
+
+	/* Create cursors to refcount structures */
+	error = init_refcount_cursor(agno, &rl_cur);
+	if (error)
+		return error;
+
+	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
+	if (error)
+		goto err;
+
+	/* Leave the per-ag data "uninitialized" since we rewrite it later */
+	pag = xfs_perag_get(mp, agno);
+	pag->pagf_init = 0;
+	xfs_perag_put(pag);
+
+	bt_cur = xfs_refcountbt_init_cursor(mp, NULL, agbp, agno, NULL);
+	if (!bt_cur) {
+		error = -ENOMEM;
+		goto err;
+	}
+
+	rl_rec = pop_slab_cursor(rl_cur);
+	while (rl_rec) {
+		/* Look for a refcount record in the btree */
+		error = xfs_refcountbt_lookup_le(bt_cur,
+				rl_rec->rc_startblock, &have);
+		if (error)
+			goto err;
+		if (!have) {
+			do_warn(
+_("Missing reference count record for (%u/%u) len %u count %u\n"),
+				agno, rl_rec->rc_startblock,
+				rl_rec->rc_blockcount, rl_rec->rc_refcount);
+			goto next_loop;
+		}
+
+		error = xfs_refcountbt_get_rec(bt_cur, &tmp, &i);
+		if (error)
+			goto err;
+		if (!i) {
+			do_warn(
+_("Missing reference count record for (%u/%u) len %u count %u\n"),
+				agno, rl_rec->rc_startblock,
+				rl_rec->rc_blockcount, rl_rec->rc_refcount);
+			goto next_loop;
+		}
+
+		/* Compare each refcount observation against the btree's */
+		if (tmp.rc_startblock != rl_rec->rc_startblock ||
+		    tmp.rc_blockcount < rl_rec->rc_blockcount ||
+		    tmp.rc_refcount < rl_rec->rc_refcount)
+			do_warn(
+_("Incorrect reference count: saw (%u/%u) len %u nlinks %u; should be (%u/%u) len %u nlinks %u\n"),
+				agno, tmp.rc_startblock, tmp.rc_blockcount,
+				tmp.rc_refcount, agno, rl_rec->rc_startblock,
+				rl_rec->rc_blockcount, rl_rec->rc_refcount);
+next_loop:
+		rl_rec = pop_slab_cursor(rl_cur);
+	}
+
+err:
+	if (bt_cur)
+		xfs_btree_del_cursor(bt_cur, XFS_BTREE_NOERROR);
+	if (agbp)
+		libxfs_putbuf(agbp);
+	free_slab_cursor(&rl_cur);
+	return 0;
+}
+
+/*
  * Regenerate the AGFL so that we don't run out of it while rebuilding the
  * rmap btree.  If skip_rmapbt is true, don't update the rmapbt (most probably
  * because we're updating the rmapbt).
diff --git a/repair/rmap.h b/repair/rmap.h
index d6d360f..d26c259 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -50,6 +50,11 @@ extern void rmap_high_key_from_rec(struct xfs_rmap_irec *rec,
 		struct xfs_rmap_irec *key);
 
 extern int compute_refcounts(struct xfs_mount *, xfs_agnumber_t);
+extern size_t refcount_record_count(struct xfs_mount *, xfs_agnumber_t);
+extern int init_refcount_cursor(xfs_agnumber_t, struct xfs_slab_cursor **);
+extern void refcount_avoid_check(void);
+extern int check_refcounts(struct xfs_mount *, xfs_agnumber_t);
+
 extern void record_inode_reflink_flag(struct xfs_mount *, struct xfs_dinode *,
 	xfs_agnumber_t, xfs_agino_t, xfs_ino_t);
 extern int fix_inode_reflink_flags(struct xfs_mount *, xfs_agnumber_t);
diff --git a/repair/scan.c b/repair/scan.c
index 8938341..4e78335 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -1353,6 +1353,8 @@ _("%s btree block claimed (state %d), agno %d, bno %d, suspect %d\n"),
 		}
 	}
 out:
+	if (suspect)
+		refcount_avoid_check();
 	return;
 }
 
@@ -2151,6 +2153,7 @@ validate_agf(
 		} else  {
 			do_warn(_("bad agbno %u for refcntbt root, agno %d\n"),
 				bno, agno);
+			refcount_avoid_check();
 		}
 	}
 


* [PATCH 130/145] xfs_repair: rebuild the refcount btree
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (128 preceding siblings ...)
  2016-06-17  1:44 ` [PATCH 129/145] xfs_repair: check the refcount btree against our observed reference counts when -n Darrick J. Wong
@ 2016-06-17  1:44 ` Darrick J. Wong
  2016-06-17  1:44 ` [PATCH 131/145] xfs_repair: complain about copy-on-write leftovers Darrick J. Wong
                   ` (14 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:44 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Rebuild the refcount btree with the reference count data we assembled
during phase 4.
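
Before any blocks are written, the rebuild has to know how many btree blocks
each level needs.  A minimal sketch of that sizing arithmetic follows,
assuming a fixed maximum number of records per leaf and of key/pointer pairs
per node.  The names tree_geom, compute_geom and SKETCH_MAXLEVELS are
hypothetical; the patch keeps the equivalent numbers in struct bt_status and
takes the limits from mp->m_refc_mxr[].

```c
#include <assert.h>
#include <stddef.h>

/* Round-up division, in the spirit of the howmany() used by the patch. */
#define HOWMANY(x, y)	(((x) + ((y) - 1)) / (y))

/* Illustrative cap on tree height; stands in for XFS_BTREE_MAXLEVELS. */
#define SKETCH_MAXLEVELS	8

struct tree_geom {
	int	num_levels;	/* height of the tree */
	size_t	total_blocks;	/* blocks needed across all levels */
	size_t	leaf_blocks;	/* blocks in the leaf level */
	size_t	leaf_recs_pb;	/* base number of records per leaf */
	size_t	leaf_modulo;	/* this many leaves hold one extra record */
};

static struct tree_geom
compute_geom(size_t num_recs, size_t leaf_max, size_t node_max)
{
	struct tree_geom	g = { 1, 1, 1, 0, 0 };
	size_t			blocks;

	assert(leaf_max > 0 && node_max > 1);

	/* No records: a single empty root block. */
	if (num_recs == 0)
		return g;

	blocks = HOWMANY(num_recs, leaf_max);
	g.leaf_blocks = blocks;
	g.leaf_recs_pb = num_recs / blocks;
	g.leaf_modulo = num_recs % blocks;
	g.total_blocks = blocks;

	/* Each node level needs one key/pointer pair per block below it. */
	while (blocks > 1 && g.num_levels < SKETCH_MAXLEVELS) {
		blocks = HOWMANY(blocks, node_max);
		g.total_blocks += blocks;
		g.num_levels++;
	}
	return g;
}
```

For 10 records with at most 4 records per leaf and 3 pointers per node, this
gives 3 leaf blocks (one holding 4 records, two holding 3) under a single
root, for 4 blocks in 2 levels.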

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase5.c |  316 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 314 insertions(+), 2 deletions(-)


diff --git a/repair/phase5.c b/repair/phase5.c
index db84440..5018191 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -1691,6 +1691,297 @@ _("Insufficient memory to construct reverse-map cursor."));
 	free_slab_cursor(&rmap_cur);
 }
 
+/* rebuild the refcount tree */
+
+/*
+ * we don't have to worry here about how chewing up free extents
+ * may perturb things because reflink tree building happens before
+ * freespace tree building.
+ */
+static void
+init_refc_cursor(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	struct bt_status	*btree_curs)
+{
+	size_t			num_recs;
+	int			level;
+	struct bt_stat_level	*lptr;
+	struct bt_stat_level	*p_lptr;
+	xfs_extlen_t		blocks_allocated;
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb)) {
+		memset(btree_curs, 0, sizeof(struct bt_status));
+		return;
+	}
+
+	lptr = &btree_curs->level[0];
+	btree_curs->init = 1;
+	btree_curs->owner = XFS_RMAP_OWN_REFC;
+
+	/*
+	 * build up statistics
+	 */
+	num_recs = refcount_record_count(mp, agno);
+	if (num_recs == 0) {
+		/*
+		 * easy corner-case -- no refcount records
+		 */
+		lptr->num_blocks = 1;
+		lptr->modulo = 0;
+		lptr->num_recs_pb = 0;
+		lptr->num_recs_tot = 0;
+
+		btree_curs->num_levels = 1;
+		btree_curs->num_tot_blocks = btree_curs->num_free_blocks = 1;
+
+		setup_cursor(mp, agno, btree_curs);
+
+		return;
+	}
+
+	blocks_allocated = lptr->num_blocks = howmany(num_recs,
+					mp->m_refc_mxr[0]);
+
+	lptr->modulo = num_recs % lptr->num_blocks;
+	lptr->num_recs_pb = num_recs / lptr->num_blocks;
+	lptr->num_recs_tot = num_recs;
+	level = 1;
+
+	if (lptr->num_blocks > 1)  {
+		for (; btree_curs->level[level-1].num_blocks > 1
+				&& level < XFS_BTREE_MAXLEVELS;
+				level++)  {
+			lptr = &btree_curs->level[level];
+			p_lptr = &btree_curs->level[level - 1];
+			lptr->num_blocks = howmany(p_lptr->num_blocks,
+					mp->m_refc_mxr[1]);
+			lptr->modulo = p_lptr->num_blocks % lptr->num_blocks;
+			lptr->num_recs_pb = p_lptr->num_blocks
+					/ lptr->num_blocks;
+			lptr->num_recs_tot = p_lptr->num_blocks;
+
+			blocks_allocated += lptr->num_blocks;
+		}
+	}
+	ASSERT(lptr->num_blocks == 1);
+	btree_curs->num_levels = level;
+
+	btree_curs->num_tot_blocks = btree_curs->num_free_blocks
+			= blocks_allocated;
+
+	setup_cursor(mp, agno, btree_curs);
+}
+
+static void
+prop_refc_cursor(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	struct bt_status	*btree_curs,
+	xfs_agblock_t		startbno,
+	int			level)
+{
+	struct xfs_btree_block	*bt_hdr;
+	struct xfs_refcount_key	*bt_key;
+	xfs_refcount_ptr_t	*bt_ptr;
+	xfs_agblock_t		agbno;
+	struct bt_stat_level	*lptr;
+
+	level++;
+
+	if (level >= btree_curs->num_levels)
+		return;
+
+	lptr = &btree_curs->level[level];
+	bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
+
+	if (be16_to_cpu(bt_hdr->bb_numrecs) == 0)  {
+		/*
+		 * this only happens once to initialize the
+		 * first path up the left side of the tree
+		 * where the agbno's are already set up
+		 */
+		prop_refc_cursor(mp, agno, btree_curs, startbno, level);
+	}
+
+	if (be16_to_cpu(bt_hdr->bb_numrecs) ==
+				lptr->num_recs_pb + (lptr->modulo > 0))  {
+		/*
+		 * write out current prev block, grab us a new block,
+		 * and set the rightsib pointer of current block
+		 */
+#ifdef XR_BLD_RL_TRACE
+		fprintf(stderr, " refc prop agbno %u ", lptr->prev_agbno);
+#endif
+		if (lptr->prev_agbno != NULLAGBLOCK)  {
+			ASSERT(lptr->prev_buf_p != NULL);
+			libxfs_writebuf(lptr->prev_buf_p, 0);
+		}
+		lptr->prev_agbno = lptr->agbno;
+		lptr->prev_buf_p = lptr->buf_p;
+		agbno = get_next_blockaddr(agno, level, btree_curs);
+
+		bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(agbno);
+
+		lptr->buf_p = libxfs_getbuf(mp->m_dev,
+					XFS_AGB_TO_DADDR(mp, agno, agbno),
+					XFS_FSB_TO_BB(mp, 1));
+		lptr->agbno = agbno;
+
+		if (lptr->modulo)
+			lptr->modulo--;
+
+		/*
+		 * initialize block header
+		 */
+		lptr->buf_p->b_ops = &xfs_refcountbt_buf_ops;
+		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
+		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
+		xfs_btree_init_block(mp, lptr->buf_p, XFS_REFC_CRC_MAGIC,
+					level, 0, agno,
+					XFS_BTREE_CRC_BLOCKS);
+
+		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
+
+		/*
+		 * propagate extent record for first extent in new block up
+		 */
+		prop_refc_cursor(mp, agno, btree_curs, startbno, level);
+	}
+	/*
+	 * add refcount key and pointer to current block
+	 */
+	be16_add_cpu(&bt_hdr->bb_numrecs, 1);
+
+	bt_key = XFS_REFCOUNT_KEY_ADDR(bt_hdr,
+				    be16_to_cpu(bt_hdr->bb_numrecs));
+	bt_ptr = XFS_REFCOUNT_PTR_ADDR(bt_hdr,
+				    be16_to_cpu(bt_hdr->bb_numrecs),
+				    mp->m_refc_mxr[1]);
+
+	bt_key->rc_startblock = cpu_to_be32(startbno);
+	*bt_ptr = cpu_to_be32(btree_curs->level[level-1].agbno);
+}
+
+/*
+ * rebuilds a refcount btree given a cursor.
+ */
+static void
+build_refcount_tree(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	struct bt_status	*btree_curs)
+{
+	xfs_agnumber_t		i;
+	xfs_agblock_t		j;
+	xfs_agblock_t		agbno;
+	struct xfs_btree_block	*bt_hdr;
+	struct xfs_refcount_irec	*refc_rec;
+	struct xfs_slab_cursor	*refc_cur;
+	struct xfs_refcount_rec	*bt_rec;
+	struct bt_stat_level	*lptr;
+	int			level = btree_curs->num_levels;
+	int			error;
+
+	for (i = 0; i < level; i++)  {
+		lptr = &btree_curs->level[i];
+
+		agbno = get_next_blockaddr(agno, i, btree_curs);
+		lptr->buf_p = libxfs_getbuf(mp->m_dev,
+					XFS_AGB_TO_DADDR(mp, agno, agbno),
+					XFS_FSB_TO_BB(mp, 1));
+
+		if (i == btree_curs->num_levels - 1)
+			btree_curs->root = agbno;
+
+		lptr->agbno = agbno;
+		lptr->prev_agbno = NULLAGBLOCK;
+		lptr->prev_buf_p = NULL;
+		/*
+		 * initialize block header
+		 */
+
+		lptr->buf_p->b_ops = &xfs_refcountbt_buf_ops;
+		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
+		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
+		xfs_btree_init_block(mp, lptr->buf_p, XFS_REFC_CRC_MAGIC,
+					i, 0, agno,
+					XFS_BTREE_CRC_BLOCKS);
+	}
+
+	/*
+	 * run along leaf, setting up records.  as we have to switch
+	 * blocks, call the prop_refc_cursor routine to set up the new
+	 * pointers for the parent.  that can recurse up to the root
+	 * if required.  set the sibling pointers for leaf level here.
+	 */
+	error = init_refcount_cursor(agno, &refc_cur);
+	if (error)
+		do_error(
+_("Insufficient memory to construct refcount cursor."));
+	refc_rec = pop_slab_cursor(refc_cur);
+	lptr = &btree_curs->level[0];
+
+	for (i = 0; i < lptr->num_blocks; i++)  {
+		/*
+		 * block initialization, lay in block header
+		 */
+		lptr->buf_p->b_ops = &xfs_refcountbt_buf_ops;
+		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
+		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
+		xfs_btree_init_block(mp, lptr->buf_p, XFS_REFC_CRC_MAGIC,
+					0, 0, agno,
+					XFS_BTREE_CRC_BLOCKS);
+
+		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
+		bt_hdr->bb_numrecs = cpu_to_be16(lptr->num_recs_pb +
+							(lptr->modulo > 0));
+
+		if (lptr->modulo > 0)
+			lptr->modulo--;
+
+		if (lptr->num_recs_pb > 0)
+			prop_refc_cursor(mp, agno, btree_curs,
+					refc_rec->rc_startblock, 0);
+
+		bt_rec = (struct xfs_refcount_rec *)
+			  ((char *)bt_hdr + XFS_REFCOUNT_BLOCK_LEN);
+		for (j = 0; j < be16_to_cpu(bt_hdr->bb_numrecs); j++) {
+			ASSERT(refc_rec != NULL);
+			bt_rec[j].rc_startblock =
+					cpu_to_be32(refc_rec->rc_startblock);
+			bt_rec[j].rc_blockcount =
+					cpu_to_be32(refc_rec->rc_blockcount);
+			bt_rec[j].rc_refcount = cpu_to_be32(refc_rec->rc_refcount);
+
+			refc_rec = pop_slab_cursor(refc_cur);
+		}
+
+		if (refc_rec != NULL)  {
+			/*
+			 * get next leaf level block
+			 */
+			if (lptr->prev_buf_p != NULL)  {
+#ifdef XR_BLD_RL_TRACE
+				fprintf(stderr, "writing refcntbt agbno %u\n",
+					lptr->prev_agbno);
+#endif
+				ASSERT(lptr->prev_agbno != NULLAGBLOCK);
+				libxfs_writebuf(lptr->prev_buf_p, 0);
+			}
+			lptr->prev_buf_p = lptr->buf_p;
+			lptr->prev_agbno = lptr->agbno;
+			lptr->agbno = get_next_blockaddr(agno, 0, btree_curs);
+			bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(lptr->agbno);
+
+			lptr->buf_p = libxfs_getbuf(mp->m_dev,
+					XFS_AGB_TO_DADDR(mp, agno, lptr->agbno),
+					XFS_FSB_TO_BB(mp, 1));
+		}
+	}
+	free_slab_cursor(&refc_cur);
+}
+
 /*
  * build both the agf and the agfl for an agno given both
  * btree cursors.
@@ -1705,7 +1996,8 @@ build_agf_agfl(
 	struct bt_status	*bcnt_bt,
 	xfs_extlen_t		freeblks,	/* # free blocks in tree */
 	int			lostblocks,	/* # blocks that will be lost */
-	struct bt_status	*rmap_bt)
+	struct bt_status	*rmap_bt,
+	struct bt_status	*refcnt_bt)
 {
 	struct extent_tree_node	*ext_ptr;
 	struct xfs_buf		*agf_buf, *agfl_buf;
@@ -1747,6 +2039,8 @@ build_agf_agfl(
 	agf->agf_roots[XFS_BTNUM_RMAP] = cpu_to_be32(rmap_bt->root);
 	agf->agf_levels[XFS_BTNUM_RMAP] = cpu_to_be32(rmap_bt->num_levels);
 	agf->agf_freeblks = cpu_to_be32(freeblks);
+	agf->agf_refcount_root = cpu_to_be32(refcnt_bt->root);
+	agf->agf_refcount_level = cpu_to_be32(refcnt_bt->num_levels);
 
 	/*
 	 * Count and record the number of btree blocks consumed if required.
@@ -1864,6 +2158,10 @@ build_agf_agfl(
 
 	ASSERT(be32_to_cpu(agf->agf_roots[XFS_BTNUM_BNOi]) !=
 		be32_to_cpu(agf->agf_roots[XFS_BTNUM_CNTi]));
+	ASSERT(be32_to_cpu(agf->agf_refcount_root) !=
+		be32_to_cpu(agf->agf_roots[XFS_BTNUM_BNOi]));
+	ASSERT(be32_to_cpu(agf->agf_refcount_root) !=
+		be32_to_cpu(agf->agf_roots[XFS_BTNUM_CNTi]));
 
 	libxfs_writebuf(agf_buf, 0);
 
@@ -1933,6 +2231,7 @@ phase5_func(
 	bt_status_t	ino_btree_curs;
 	bt_status_t	fino_btree_curs;
 	bt_status_t	rmap_btree_curs;
+	bt_status_t	refcnt_btree_curs;
 	int		extra_blocks = 0;
 	uint		num_freeblocks;
 	xfs_extlen_t	freeblks1;
@@ -1995,6 +2294,12 @@ phase5_func(
 		 */
 		init_rmapbt_cursor(mp, agno, &rmap_btree_curs);
 
+		/*
+		 * Set up the btree cursors for the on-disk refcount btrees,
+		 * which includes pre-allocating all required blocks.
+		 */
+		init_refc_cursor(mp, agno, &refcnt_btree_curs);
+
 		num_extents = count_bno_extents_blocks(agno, &num_freeblocks);
 		/*
 		 * lose two blocks per AG -- the space tree roots
@@ -2088,12 +2393,17 @@ phase5_func(
 					rmap_btree_curs.num_free_blocks) - 1;
 		}
 
+		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+			build_refcount_tree(mp, agno, &refcnt_btree_curs);
+			write_cursor(&refcnt_btree_curs);
+		}
+
 		/*
 		 * set up agf and agfl
 		 */
 		build_agf_agfl(mp, agno, &bno_btree_curs,
 				&bcnt_btree_curs, freeblks1, extra_blocks,
-				&rmap_btree_curs);
+				&rmap_btree_curs, &refcnt_btree_curs);
 		/*
 		 * build inode allocation tree.
 		 */
@@ -2124,6 +2434,8 @@ phase5_func(
 		finish_cursor(&ino_btree_curs);
 		if (xfs_sb_version_hasrmapbt(&mp->m_sb))
 			finish_cursor(&rmap_btree_curs);
+		if (xfs_sb_version_hasreflink(&mp->m_sb))
+			finish_cursor(&refcnt_btree_curs);
 		if (xfs_sb_version_hasfinobt(&mp->m_sb))
 			finish_cursor(&fino_btree_curs);
 		finish_cursor(&bcnt_btree_curs);


* [PATCH 131/145] xfs_repair: complain about copy-on-write leftovers
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (129 preceding siblings ...)
  2016-06-17  1:44 ` [PATCH 130/145] xfs_repair: rebuild the refcount btree Darrick J. Wong
@ 2016-06-17  1:44 ` Darrick J. Wong
  2016-06-17  1:44 ` [PATCH 132/145] xfs_repair: check the CoW extent size hint Darrick J. Wong
                   ` (13 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:44 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Complain about leftover CoW allocations that are hanging off the
refcount btree.  These are cleaned out at mount time, but we could be
louder about flagging down evidence of trouble.

Since these extents aren't "owned" by anything, we'll free them up by
reconstructing the free space btrees.
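
The classification this relies on can be sketched as follows: a refcount
record whose count is exactly 1 is treated as a leftover CoW allocation,
counts from 2 up to the maximum are ordinary shared extents, and anything
else is invalid.  The names refc_class and classify_refcount are
hypothetical, and the maximum is passed in as a parameter here as an
assumption; the patch compares against MAXREFCOUNT.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification codes, used only in this sketch. */
enum refc_class { REFC_INVALID, REFC_COW_LEFTOVER, REFC_SHARED };

/*
 * Classify a refcount btree record by its reference count.
 * 'max_refcount' stands in for MAXREFCOUNT.
 */
static enum refc_class
classify_refcount(uint64_t nr, uint64_t max_refcount)
{
	assert(max_refcount >= 2);

	/* A count of one marks a staging extent left behind by CoW. */
	if (nr == 1)
		return REFC_COW_LEFTOVER;
	if (nr < 2 || nr > max_refcount)
		return REFC_INVALID;
	return REFC_SHARED;
}
```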

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/check.c      |   21 +++++++++++++++++----
 repair/incore.h |    3 ++-
 repair/scan.c   |   25 ++++++++++++++++++++++++-
 3 files changed, 43 insertions(+), 6 deletions(-)


diff --git a/db/check.c b/db/check.c
index 3b17585..841a605 100644
--- a/db/check.c
+++ b/db/check.c
@@ -45,7 +45,7 @@ typedef enum {
 	DBM_LOG,	DBM_MISSING,	DBM_QUOTA,	DBM_RTBITMAP,
 	DBM_RTDATA,	DBM_RTFREE,	DBM_RTSUM,	DBM_SB,
 	DBM_SYMLINK,	DBM_BTFINO,	DBM_BTRMAP,	DBM_BTREFC,
-	DBM_RLDATA,
+	DBM_RLDATA,	DBM_COWDATA,
 	DBM_NDBM
 } dbm_t;
 
@@ -4821,9 +4821,22 @@ scanfunc_refcnt(
 		rp = XFS_REFCOUNT_REC_ADDR(block, 1);
 		lastblock = 0;
 		for (i = 0; i < be16_to_cpu(block->bb_numrecs); i++) {
-			set_dbmap(seqno, be32_to_cpu(rp[i].rc_startblock),
-				be32_to_cpu(rp[i].rc_blockcount), DBM_RLDATA,
-				seqno, bno);
+			if (be32_to_cpu(rp[i].rc_refcount) == 1) {
+				dbprintf(_(
+		"leftover CoW extent (%u/%u) len %u\n"),
+					seqno,
+					be32_to_cpu(rp[i].rc_startblock),
+					be32_to_cpu(rp[i].rc_blockcount));
+				set_dbmap(seqno,
+					be32_to_cpu(rp[i].rc_startblock),
+					be32_to_cpu(rp[i].rc_blockcount),
+					DBM_COWDATA, seqno, bno);
+			} else {
+				set_dbmap(seqno,
+					be32_to_cpu(rp[i].rc_startblock),
+					be32_to_cpu(rp[i].rc_blockcount),
+					DBM_RLDATA, seqno, bno);
+			}
 			if (be32_to_cpu(rp[i].rc_startblock) < lastblock) {
 				dbprintf(_(
 		"out-of-order refcnt btree record %d (%u %u) block %u/%u\n"),
diff --git a/repair/incore.h b/repair/incore.h
index bcd2f4b..c23a3a3 100644
--- a/repair/incore.h
+++ b/repair/incore.h
@@ -107,7 +107,8 @@ typedef struct rt_extent_tree_node  {
 #define XR_E_INO1	10	/* used by inodes (marked by rmap btree) */
 #define XR_E_FS_MAP1	11	/* used by fs space/inode maps (rmap btree) */
 #define XR_E_REFC	12	/* used by fs ag reference count btree */
-#define XR_E_BAD_STATE	13
+#define XR_E_COW	13	/* leftover cow extent */
+#define XR_E_BAD_STATE	14
 
 /* separate state bit, OR'ed into high (4th) bit of ex_state field */
 
diff --git a/repair/scan.c b/repair/scan.c
index 4e78335..1c2afb6 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -1293,7 +1293,30 @@ _("%s btree block claimed (state %d), agno %d, bno %d, suspect %d\n"),
 				continue;
 			}
 
-			if (nr < 2 || nr > MAXREFCOUNT) {
+			if (nr == 1) {
+				xfs_agblock_t	c;
+				xfs_extlen_t	cnr;
+
+				for (c = b; c < end; c += cnr) {
+					state = get_bmap_ext(agno, c, end, &cnr);
+					switch (state) {
+					case XR_E_COW:
+						break;
+					case XR_E_UNKNOWN:
+						do_warn(
+_("leftover CoW extent (%u/%u) len %u\n"),
+						agno, c, cnr);
+						set_bmap_ext(agno, c, cnr, XR_E_FREE);
+						break;
+					default:
+						do_warn(
+_("extent (%u/%u) len %u claimed, state is %d\n"),
+						agno, c, cnr, state);
+						set_bmap_ext(agno, c, cnr, XR_E_FREE);
+						break;
+					}
+				}
+			} else if (nr < 2 || nr > MAXREFCOUNT) {
 				do_warn(
 	_("invalid reference count %u in record %u of %s btree block %u/%u\n"),
 					nr, i, name, agno, bno);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH 132/145] xfs_repair: check the CoW extent size hint
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (130 preceding siblings ...)
  2016-06-17  1:44 ` [PATCH 131/145] xfs_repair: complain about copy-on-write leftovers Darrick J. Wong
@ 2016-06-17  1:44 ` Darrick J. Wong
  2016-06-17  1:44 ` [PATCH 133/145] xfs_repair: use range query when checking rmaps Darrick J. Wong
                   ` (12 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:44 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/dinode.c |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)


diff --git a/repair/dinode.c b/repair/dinode.c
index d48e415..9973f3e 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -2531,6 +2531,38 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
 			goto clear_bad_out;
 		}
 
+		if ((flags2 & XFS_DIFLAG2_COWEXTSIZE) &&
+		    !xfs_sb_version_hasreflink(&mp->m_sb)) {
+			if (!uncertain) {
+				do_warn(
+	_("inode %" PRIu64 " has CoW extent size hint but file system does not support reflink\n"),
+					lino);
+			}
+			flags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
+		}
+
+		if (flags2 & XFS_DIFLAG2_COWEXTSIZE) {
+			/* must be a directory or file */
+			if (di_mode && !S_ISDIR(di_mode) && !S_ISREG(di_mode)) {
+				if (!uncertain) {
+					do_warn(
+	_("CoW extent size flag set on non-file, non-directory inode %" PRIu64 "\n" ),
+						lino);
+				}
+				flags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
+			}
+		}
+
+		if ((flags2 & XFS_DIFLAG2_COWEXTSIZE) &&
+		    (flags & (XFS_DIFLAG_REALTIME | XFS_DIFLAG_RTINHERIT))) {
+			if (!uncertain) {
+				do_warn(
+	_("Cannot have CoW extent size hint on a realtime inode %" PRIu64 "\n"),
+					lino);
+			}
+			flags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
+		}
+
 		if (!verify_mode && flags2 != be64_to_cpu(dino->di_flags2)) {
 			if (!no_modify) {
 				do_warn(_("fixing bad flags2.\n"));
@@ -2624,6 +2656,29 @@ _("bad non-zero extent size %u for non-realtime/extsize inode %" PRIu64 ", "),
 	}
 
 	/*
+	 * Only regular files and directories with the COWEXTSIZE flag
+	 * set can have a non-zero cowextsize.
+	 */
+	if (dino->di_version >= 3 &&
+	    be32_to_cpu(dino->di_cowextsize) != 0) {
+		if ((type == XR_INO_DIR || type == XR_INO_DATA) &&
+		    (be64_to_cpu(dino->di_flags2) &
+					XFS_DIFLAG2_COWEXTSIZE)) {
+			/* s'okay */ ;
+		} else {
+			do_warn(
+_("Cannot have non-zero CoW extent size %u on non-cowextsize inode %" PRIu64 ", "),
+					be32_to_cpu(dino->di_cowextsize), lino);
+			if (!no_modify)  {
+				do_warn(_("resetting to zero\n"));
+				dino->di_cowextsize = 0;
+				*dirty = 1;
+			} else
+				do_warn(_("would reset to zero\n"));
+		}
+	}
+
+	/*
 	 * general size/consistency checks:
 	 */
 	if (process_check_inode_sizes(mp, dino, lino, type) != 0)

* [PATCH 133/145] xfs_repair: use range query when checking rmaps
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (131 preceding siblings ...)
  2016-06-17  1:44 ` [PATCH 132/145] xfs_repair: check the CoW extent size hint Darrick J. Wong
@ 2016-06-17  1:44 ` Darrick J. Wong
  2016-06-17  1:44 ` [PATCH 134/145] xfs_repair: check for mergeable refcount records Darrick J. Wong
                   ` (11 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:44 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

For shared extents, we ought to use a range query on the rmapbt to
find the corresponding rmap.  However, most of the time the observed
rmap will be an exact match for the rmapbt rmap, in which case we
could have used the (much faster) regular lookup.  Therefore, try the
regular lookup first and resort to the range lookup if that doesn't
get us what we want.  This can cut the run time of the rmap check of
xfs_repair in half.

In theory, the only reason an observed rmap would not exactly match
an rmapbt rmap is that we modified some file on account of a metadata
error.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
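A minimal, self-contained sketch of the lookup order described above:
try the cheap exact-match lookup first and only fall back to the
expensive covering search when that fails.  The demo_* names and the
flat array standing in for the rmapbt are assumptions made for
illustration only; the real code uses lookup_rmap() and
lookup_rmap_overlapped() on a btree cursor, and it also falls back when
the first lookup returns a record that is_good_rmap() rejects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified reverse-mapping record; not the xfsprogs type. */
struct demo_rmap {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint64_t	owner;
};

/* Cheap path: succeeds only when a stored record is identical to @want. */
static bool
demo_lookup_exact(
	const struct demo_rmap	*recs,
	size_t			nr,
	const struct demo_rmap	*want,
	struct demo_rmap	*out)
{
	size_t			i;

	for (i = 0; i < nr; i++) {
		if (recs[i].startblock == want->startblock &&
		    recs[i].blockcount == want->blockcount &&
		    recs[i].owner == want->owner) {
			*out = recs[i];
			return true;
		}
	}
	return false;
}

/* Expensive path: any record of the same owner that covers all of @want. */
static bool
demo_lookup_covering(
	const struct demo_rmap	*recs,
	size_t			nr,
	const struct demo_rmap	*want,
	struct demo_rmap	*out)
{
	uint64_t		want_end;
	size_t			i;

	want_end = (uint64_t)want->startblock + want->blockcount;
	for (i = 0; i < nr; i++) {
		uint64_t	rec_end;

		rec_end = (uint64_t)recs[i].startblock + recs[i].blockcount;
		if (recs[i].owner == want->owner &&
		    recs[i].startblock <= want->startblock &&
		    rec_end >= want_end) {
			*out = recs[i];
			return true;
		}
	}
	return false;
}

/* Try the cheap lookup first; resort to the covering search if it fails. */
static bool
demo_find_rmap(
	const struct demo_rmap	*recs,
	size_t			nr,
	const struct demo_rmap	*want,
	struct demo_rmap	*out,
	bool			*used_fallback)
{
	assert(recs != NULL && want != NULL && out != NULL);
	assert(used_fallback != NULL);

	*used_fallback = false;
	if (demo_lookup_exact(recs, nr, want, out))
		return true;

	*used_fallback = true;
	return demo_lookup_covering(recs, nr, want, out);
}
```

In this toy model both paths are linear scans, so it shows only the
control flow, not the cost difference between the two btree lookups.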
 repair/rmap.c |   26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)


diff --git a/repair/rmap.c b/repair/rmap.c
index 5c0e015..1b89d4c 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -909,6 +909,20 @@ lookup_rmap(
 	return xfs_rmap_get_rec(bt_cur, tmp, have);
 }
 
+/* Look for an rmap in the rmapbt that matches a given rmap. */
+static int
+lookup_rmap_overlapped(
+	struct xfs_btree_cur	*bt_cur,
+	struct xfs_rmap_irec	*rm_rec,
+	struct xfs_rmap_irec	*tmp,
+	int			*have)
+{
+	/* Have to use our fancy version for overlapped */
+	return xfs_rmap_lookup_le_range(bt_cur, rm_rec->rm_startblock,
+				rm_rec->rm_owner, rm_rec->rm_offset,
+				rm_rec->rm_flags, tmp, have);
+}
+
 /* Does the btree rmap cover the observed rmap? */
 #define NEXTP(x)	((x)->rm_startblock + (x)->rm_blockcount)
 #define NEXTL(x)	((x)->rm_offset + (x)->rm_blockcount)
@@ -997,6 +1011,18 @@ check_rmaps(
 		error = lookup_rmap(bt_cur, rm_rec, &tmp, &have);
 		if (error)
 			goto err;
+		/*
+		 * Using the range query is expensive, so only do it if
+		 * the regular lookup doesn't find anything or if it doesn't
+		 * match the observed rmap.
+		 */
+		if (xfs_sb_version_hasreflink(&bt_cur->bc_mp->m_sb) &&
+				(!have || !is_good_rmap(rm_rec, &tmp))) {
+			error = lookup_rmap_overlapped(bt_cur, rm_rec,
+					&tmp, &have);
+			if (error)
+				goto err;
+		}
 		if (!have) {
 			do_warn(
 _("Missing reverse-mapping record for (%u/%u) %slen %u owner %"PRId64" \

* [PATCH 134/145] xfs_repair: check for mergeable refcount records
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (132 preceding siblings ...)
  2016-06-17  1:44 ` [PATCH 133/145] xfs_repair: use range query when checking rmaps Darrick J. Wong
@ 2016-06-17  1:44 ` Darrick J. Wong
  2016-06-17  1:44 ` [PATCH 135/145] mkfs.xfs: format reflink enabled filesystems Darrick J. Wong
                   ` (10 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:44 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Make sure there aren't adjacent refcount records that could be merged;
this is a sign that the refcount tree algorithms aren't working
correctly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
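The check can be illustrated with a small standalone sketch: walk the
records in order, remember the last (possibly merged) extent, and flag
any record that starts exactly where the previous one ended and carries
the same refcount.  The demo_* names and the flat array are assumptions
for illustration; the patch keeps the same state in struct refc_priv
while scanning btree blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified refcount record; not the xfsprogs type. */
struct demo_refc {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint32_t	refcount;
};

/*
 * Count the records that could have been merged with their predecessor.
 * @recs must be sorted by startblock.  The zeroed initial state never
 * matches a valid record because a valid refcount is never zero.
 */
static size_t
demo_count_mergeable(
	const struct demo_refc	*recs,
	size_t			nr)
{
	struct demo_refc	last = { 0, 0, 0 };
	size_t			mergeable = 0;
	size_t			i;

	assert(recs != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		uint64_t	last_end;

		last_end = (uint64_t)last.startblock + last.blockcount;
		if (last_end == recs[i].startblock &&
		    last.refcount == recs[i].refcount) {
			/* Adjacent and same refcount: should be one record. */
			mergeable++;
			last.blockcount += recs[i].blockcount;
		} else {
			last = recs[i];
		}
	}
	return mergeable;
}
```

Extending the remembered extent on a hit, as the patch does, means a
run of three or more mergeable records is reported once per extra
record rather than only once.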
 repair/scan.c |   25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)


diff --git a/repair/scan.c b/repair/scan.c
index 1c2afb6..d2e588a 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -1195,6 +1195,11 @@ out:
 		rmap_avoid_check();
 }
 
+struct refc_priv {
+	struct xfs_refcount_irec	last_rec;
+};
+
+
 static void
 scan_refcbt(
 	struct xfs_btree_block	*block,
@@ -1214,6 +1219,7 @@ scan_refcbt(
 	int			numrecs;
 	int			state;
 	xfs_agblock_t		lastblock = 0;
+	struct refc_priv	*refc_priv = priv;
 
 	if (magic != XFS_REFC_CRC_MAGIC) {
 		name = "(unknown)";
@@ -1331,6 +1337,20 @@ _("extent (%u/%u) len %u claimed, state is %d\n"),
 				lastblock = b;
 			}
 
+			/* Is this record mergeable with the last one? */
+			if (refc_priv->last_rec.rc_startblock +
+			    refc_priv->last_rec.rc_blockcount == b &&
+			    refc_priv->last_rec.rc_refcount == nr) {
+				do_warn(
+	_("record %d in block (%u/%u) of %s tree should be merged with previous record\n"),
+					i, agno, bno, name);
+				refc_priv->last_rec.rc_blockcount += len;
+			} else {
+				refc_priv->last_rec.rc_startblock = b;
+				refc_priv->last_rec.rc_blockcount = len;
+				refc_priv->last_rec.rc_refcount = nr;
+			}
+
 			/* XXX: probably want to mark the reflinked areas? */
 		}
 		goto out;
@@ -2169,10 +2189,13 @@ validate_agf(
 	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
 		bno = be32_to_cpu(agf->agf_refcount_root);
 		if (bno != 0 && verify_agbno(mp, agno, bno)) {
+			struct refc_priv	priv;
+
+			memset(&priv, 0, sizeof(priv));
 			scan_sbtree(bno,
 				    be32_to_cpu(agf->agf_refcount_level),
 				    agno, 0, scan_refcbt, 1, XFS_REFC_CRC_MAGIC,
-				    agcnts, &xfs_refcountbt_buf_ops);
+				    &priv, &xfs_refcountbt_buf_ops);
 		} else  {
 			do_warn(_("bad agbno %u for refcntbt root, agno %d\n"),
 				bno, agno);

* [PATCH 135/145] mkfs.xfs: format reflink enabled filesystems
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (133 preceding siblings ...)
  2016-06-17  1:44 ` [PATCH 134/145] xfs_repair: check for mergeable refcount records Darrick J. Wong
@ 2016-06-17  1:44 ` Darrick J. Wong
  2016-06-17  1:45 ` [PATCH 136/145] xfs: introduce the XFS_IOC_GETFSMAPX ioctl Darrick J. Wong
                   ` (9 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:44 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Create the refcount btree at mkfs time and set the feature flag.

v2: Turn on the reflink feature when calculating the minimum log size.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
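The mkfs.xfs.8 text added by this patch describes copy on write: a
write to a multiply-referenced block is redirected to a new block and
only the writer's mapping changes.  The toy model below is a rough
sketch of that behaviour under simplifying assumptions.  All demo_*
names are hypothetical, and unlike the on-disk refcount btree this
model stores a count for every block, with 1 meaning "owned by exactly
one file".

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_PHYS	8	/* physical blocks in the toy filesystem */
#define DEMO_NR_LOGICAL	4	/* logical blocks per toy file */

struct demo_fs {
	uint32_t	refcount[DEMO_NR_PHYS];	/* 0 means free */
	char		data[DEMO_NR_PHYS];	/* one byte per block */
};

struct demo_file {
	uint32_t	map[DEMO_NR_LOGICAL];	/* logical -> physical */
};

/* Allocate a free physical block; returns its number or -1 if full. */
static int
demo_alloc(struct demo_fs *fs)
{
	int		i;

	for (i = 0; i < DEMO_NR_PHYS; i++) {
		if (fs->refcount[i] == 0) {
			fs->refcount[i] = 1;
			return i;
		}
	}
	return -1;
}

/* Make @dst share the physical block behind logical block @lblk of @src. */
static void
demo_share(struct demo_fs *fs, const struct demo_file *src,
	   struct demo_file *dst, unsigned int lblk)
{
	assert(lblk < DEMO_NR_LOGICAL);
	dst->map[lblk] = src->map[lblk];
	fs->refcount[src->map[lblk]]++;
}

/* Write one byte; shared blocks are copied-on-write to a new block. */
static int
demo_write(struct demo_fs *fs, struct demo_file *f, unsigned int lblk,
	   char value)
{
	uint32_t	pblk;

	assert(lblk < DEMO_NR_LOGICAL);
	pblk = f->map[lblk];
	if (fs->refcount[pblk] > 1) {
		int	newblk = demo_alloc(fs);

		if (newblk < 0)
			return -1;
		fs->refcount[pblk]--;		/* drop our share */
		f->map[lblk] = (uint32_t)newblk; /* remap this file only */
		pblk = (uint32_t)newblk;
	}
	fs->data[pblk] = value;
	return 0;
}
```

After demo_share(), a write through either file leaves the other file's
data untouched, which is the property that makes per-file snapshots
possible.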
 include/xfs_multidisk.h |    3 +-
 man/man8/mkfs.xfs.8     |   28 ++++++++++++++++++++
 mkfs/maxtrres.c         |    5 +++-
 mkfs/xfs_mkfs.c         |   67 +++++++++++++++++++++++++++++++++++++++++++----
 4 files changed, 95 insertions(+), 8 deletions(-)


diff --git a/include/xfs_multidisk.h b/include/xfs_multidisk.h
index 8dc3027..ce9bbce 100644
--- a/include/xfs_multidisk.h
+++ b/include/xfs_multidisk.h
@@ -68,6 +68,7 @@ extern void res_failed (int err);
 /* maxtrres.c */
 extern int max_trans_res(unsigned long agsize, int crcs_enabled, int dirversion,
 		int sectorlog, int blocklog, int inodelog, int dirblocklog,
-		int logversion, int log_sunit, int finobt, int rmapbt);
+		int logversion, int log_sunit, int finobt, int rmapbt,
+		int reflink);
 
 #endif	/* __XFS_MULTIDISK_H__ */
diff --git a/man/man8/mkfs.xfs.8 b/man/man8/mkfs.xfs.8
index d88d314..6131e24 100644
--- a/man/man8/mkfs.xfs.8
+++ b/man/man8/mkfs.xfs.8
@@ -213,6 +213,34 @@ for filesystems created with the (default)
 option set. When the option
 .B \-m crc=0
 is used, the reverse mapping btree feature is not supported and is disabled.
+.TP
+.BI reflink= value
+This option enables the use of a separate reference count btree index in each
+allocation group. The value is either 0 to disable the feature, or 1 to create
+a reference count btree in each allocation group.
+.IP
+The reference count btree enables the sharing of physical extents between
+the data forks of different files, which is commonly known as "reflink".
+Unlike traditional Unix filesystems which assume that every inode and
+logical block pair map to a unique physical block, a reflink-capable
+XFS filesystem removes the uniqueness requirement, allowing up to four
+billion arbitrary inode/logical block pairs to map to a physical block.
+If a program tries to write to a multiply-referenced block in a file, the write
+will be redirected to a new block, and that file's logical-to-physical
+mapping will be changed to the new block ("copy on write").  This feature
+enables the creation of per-file snapshots and deduplication.  It is only
+available for the data forks of regular files.
+.IP
+By default,
+.B mkfs.xfs
+will not create reference count btrees and therefore will not enable the
+reflink feature.  This feature is only available for filesystems created with
+the (default)
+.B \-m crc=1
+option set. When the option
+.B \-m crc=0
+is used, the reference count btree feature is not supported and reflink is
+disabled.
 .RE
 .TP
 .BI \-d " data_section_options"
diff --git a/mkfs/maxtrres.c b/mkfs/maxtrres.c
index fc24eac..a9c0985 100644
--- a/mkfs/maxtrres.c
+++ b/mkfs/maxtrres.c
@@ -39,7 +39,8 @@ max_trans_res(
 	int		logversion,
 	int		log_sunit,
 	int		finobt,
-	int		rmapbt)
+	int		rmapbt,
+	int		reflink)
 {
 	xfs_sb_t	*sbp;
 	xfs_mount_t	mount;
@@ -75,6 +76,8 @@ max_trans_res(
 		sbp->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_FINOBT;
 	if (rmapbt)
 		sbp->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_RMAPBT;
+	if (reflink)
+		sbp->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_REFLINK;
 
 	libxfs_mount(&mount, sbp, 0,0,0,0);
 	maxfsb = xfs_log_calc_minimum_size(&mount);
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 634dcfd..3753731 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -682,6 +682,8 @@ struct opt_params mopts = {
 		"uuid",
 #define M_RMAPBT	3
 		"rmapbt",
+#define M_REFLINK	4
+		"reflink",
 		NULL
 	},
 	.subopt_params = {
@@ -707,6 +709,12 @@ struct opt_params mopts = {
 		  .maxval = 1,
 		  .defaultval = 0,
 		},
+		{ .index = M_REFLINK,
+		  .conflicts = { LAST_CONFLICT },
+		  .minval = 0,
+		  .maxval = 1,
+		  .defaultval = 0,
+		},
 	},
 };
 
@@ -1463,6 +1471,7 @@ struct sb_feat_args {
 	bool	dirftype;
 	bool	parent_pointers;
 	bool	rmapbt;
+	bool	reflink;
 };
 
 static void
@@ -1535,6 +1544,8 @@ sb_set_features(
 		sbp->sb_features_ro_compat = XFS_SB_FEAT_RO_COMPAT_FINOBT;
 	if (fp->rmapbt)
 		sbp->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_RMAPBT;
+	if (fp->reflink)
+		sbp->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_REFLINK;
 
 	/*
 	 * Sparse inode chunk support has two main inode alignment requirements.
@@ -1796,6 +1807,7 @@ main(
 		.dirftype = true,
 		.parent_pointers = false,
 		.rmapbt = false,
+		.reflink = false,
 	};
 
 	platform_uuid_generate(&uuid);
@@ -2089,6 +2101,10 @@ main(
 					sb_feat.rmapbt = getnum(
 						value, &mopts, M_RMAPBT);
 					break;
+				case M_REFLINK:
+					sb_feat.reflink = getnum(
+						value, &mopts, M_REFLINK);
+					break;
 				default:
 					unknown('m', value);
 				}
@@ -2431,6 +2447,13 @@ _("rmapbt not supported without CRC support\n"));
 			usage();
 		}
 		sb_feat.rmapbt = false;
+
+		if (sb_feat.reflink) {
+			fprintf(stderr,
+_("reflink not supported without CRC support\n"));
+			usage();
+		}
+		sb_feat.reflink = false;
 	}
 
 
@@ -2921,7 +2944,7 @@ an AG size that is one stripe unit smaller, for example %llu.\n"),
 				   sb_feat.crcs_enabled, sb_feat.dir_version,
 				   sectorlog, blocklog, inodelog, dirblocklog,
 				   sb_feat.log_version, lsunit, sb_feat.finobt,
-				   sb_feat.rmapbt);
+				   sb_feat.rmapbt, sb_feat.reflink);
 	ASSERT(min_logblocks);
 	min_logblocks = MAX(XFS_MIN_LOG_BLOCKS, min_logblocks);
 	if (!logsize && dblocks >= (1024*1024*1024) >> blocklog)
@@ -3056,7 +3079,7 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 		printf(_(
 		   "meta-data=%-22s isize=%-6d agcount=%lld, agsize=%lld blks\n"
 		   "         =%-22s sectsz=%-5u attr=%u, projid32bit=%u\n"
-		   "         =%-22s crc=%-8u finobt=%u, sparse=%u, rmapbt=%u\n"
+		   "         =%-22s crc=%-8u finobt=%u, sparse=%u, rmapbt=%u, reflink=%u\n"
 		   "data     =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n"
 		   "         =%-22s sunit=%-6u swidth=%u blks\n"
 		   "naming   =version %-14u bsize=%-6u ascii-ci=%d ftype=%d\n"
@@ -3067,7 +3090,7 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 			"", sectorsize, sb_feat.attr_version,
 				    !sb_feat.projid16bit,
 			"", sb_feat.crcs_enabled, sb_feat.finobt, sb_feat.spinodes,
-			sb_feat.rmapbt,
+			sb_feat.rmapbt, sb_feat.reflink,
 			"", blocksize, (long long)dblocks, imaxpct,
 			"", dsunit, dswidth,
 			sb_feat.dir_version, dirblocksize, sb_feat.nci,
@@ -3254,7 +3277,10 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 						cpu_to_be32(XFS_RMAP_BLOCK(mp));
 			agf->agf_levels[XFS_BTNUM_RMAPi] = cpu_to_be32(1);
 		}
-
+		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+			agf->agf_refcount_root = cpu_to_be32(xfs_refc_block(mp));
+			agf->agf_refcount_level = cpu_to_be32(1);
+		}
 		agf->agf_flfirst = 0;
 		agf->agf_fllast = cpu_to_be32(XFS_AGFL_SIZE(mp) - 1);
 		agf->agf_flcount = 0;
@@ -3423,6 +3449,23 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 		libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
 
 		/*
+		 * refcount btree root block
+		 */
+		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+			buf = libxfs_getbuf(mp->m_ddev_targp,
+					XFS_AGB_TO_DADDR(mp, agno, xfs_refc_block(mp)),
+					bsize);
+			buf->b_ops = &xfs_refcountbt_buf_ops;
+
+			block = XFS_BUF_TO_BLOCK(buf);
+			memset(block, 0, blocksize);
+			xfs_btree_init_block(mp, buf, XFS_REFC_CRC_MAGIC, 0, 0,
+						agno, XFS_BTREE_CRC_BLOCKS);
+
+			libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
+		}
+
+		/*
 		 * INO btree root block
 		 */
 		buf = libxfs_getbuf(mp->m_ddev_targp,
@@ -3510,9 +3553,21 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 			rrec->rm_offset = 0;
 			be16_add_cpu(&block->bb_numrecs, 1);
 
+			/* account for refcount btree root */
+			if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+				rrec = XFS_RMAP_REC_ADDR(block, 5);
+				rrec->rm_startblock = cpu_to_be32(
+							xfs_refc_block(mp));
+				rrec->rm_blockcount = cpu_to_be32(1);
+				rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_REFC);
+				rrec->rm_offset = 0;
+				be16_add_cpu(&block->bb_numrecs, 1);
+			}
+
 			/* account for the log space */
 			if (loginternal && agno == logagno) {
-				rrec = XFS_RMAP_REC_ADDR(block, 5);
+				rrec = XFS_RMAP_REC_ADDR(block,
+					be16_to_cpu(block->bb_numrecs) + 1);
 				rrec->rm_startblock = cpu_to_be32(
 						XFS_FSB_TO_AGBNO(mp, logstart));
 				rrec->rm_blockcount = cpu_to_be32(logblocks);
@@ -3748,7 +3803,7 @@ usage( void )
 {
 	fprintf(stderr, _("Usage: %s\n\
 /* blocksize */		[-b log=n|size=num]\n\
-/* metadata */		[-m crc=0|1,finobt=0|1,uuid=xxx,rmapbt=0|1]\n\
+/* metadata */		[-m crc=0|1,finobt=0|1,uuid=xxx,rmapbt=0|1,reflink=0|1]\n\
 /* data subvol */	[-d agcount=n,agsize=n,file,name=xxx,size=num,\n\
 			    (sunit=value,swidth=value|su=num,sw=num|noalign),\n\
 			    sectlog=n|sectsize=num\n\

* [PATCH 136/145] xfs: introduce the XFS_IOC_GETFSMAPX ioctl
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (134 preceding siblings ...)
  2016-06-17  1:44 ` [PATCH 135/145] mkfs.xfs: format reflink enabled filesystems Darrick J. Wong
@ 2016-06-17  1:45 ` Darrick J. Wong
  2016-06-17  1:45 ` [PATCH 137/145] xfs_db: introduce the 'fsmap' command to find what owns a set of fsblocks Darrick J. Wong
                   ` (8 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:45 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

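Introduce a new ioctl that uses the reverse mapping btree to return
information about the physical layout of the filesystem.

This patch also splits xfs_refcount_find_shared() into a wrapper that
reads the AGF and a helper, __xfs_refcount_find_shared(), that takes an
AGF buffer supplied by the caller.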

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
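As a rough illustration of the calling convention documented in the
xfs_fs.h comment below, the sketch here fills in the two header
elements of a request array.  The structure is a local copy of the
layout in this patch using <stdint.h> types, and demo_fsmap_prepare()
and demo_bytes_to_sectors() are hypothetical helpers, not part of any
header.  The comment does not say how fmv_owner and fmv_offset are
interpreted in the header elements, so the sketch leaves them zero;
that may not be what the kernel side expects.  No ioctl is issued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the layout proposed in this patch (illustration only). */
struct demo_getfsmapx {
	uint32_t	fmv_device;	/* device id */
	uint32_t	fmv_unused1;	/* future use, must be zero */
	uint64_t	fmv_block;	/* starting block */
	uint64_t	fmv_owner;	/* owner id */
	uint64_t	fmv_offset;	/* file offset of segment */
	uint64_t	fmv_length;	/* length of segment */
	uint32_t	fmv_oflags;	/* mapping flags */
	uint32_t	fmv_iflags;	/* control flags (1st structure) */
	uint32_t	fmv_count;	/* # of entries in array incl. input */
	uint32_t	fmv_entries;	/* # of entries filled in (output) */
	uint64_t	fmv_unused2;	/* future use, must be zero */
};

/* fmv_block, fmv_offset and fmv_length are in 512-byte sectors. */
static uint64_t
demo_bytes_to_sectors(uint64_t bytes)
{
	return bytes >> 9;
}

/*
 * Prepare a request: element 0 carries the lowest address and the array
 * size, element 1 carries the highest address.  Everything else,
 * including the output slots from element 2 onwards, is zeroed.
 */
static void
demo_fsmap_prepare(
	struct demo_getfsmapx	*map,
	uint32_t		count,
	uint32_t		device,
	uint64_t		low_sector,
	uint64_t		high_sector)
{
	assert(map != NULL);
	assert(count >= 2);
	assert(low_sector <= high_sector);

	memset(map, 0, (size_t)count * sizeof(*map));

	map[0].fmv_device = device;
	map[0].fmv_block = low_sector;
	map[0].fmv_iflags = 0;		/* no input flags defined yet */
	map[0].fmv_count = count;	/* includes the two header slots */

	map[1].fmv_device = device;
	map[1].fmv_block = high_sector;
	/* fmv_count, fmv_entries and fmv_iflags must stay zero here. */
}
```

A caller would presumably pass the prepared array to XFS_IOC_GETFSMAPX
and then read fmv_entries from the first element; that step is left out
because it needs a kernel carrying the matching patch.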
 libxfs/xfs_fs.h       |   62 +++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_refcount.c |   51 +++++++++++++++++++++++++++++-----------
 libxfs/xfs_refcount.h |    4 +++
 3 files changed, 103 insertions(+), 14 deletions(-)


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index df58c1c..236a5a7 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -117,6 +117,67 @@ struct getbmapx {
 #define BMV_OF_SHARED		0x8	/* segment shared with another file */
 
 /*
+ *	Structure for XFS_IOC_GETFSMAPX.
+ *
+ *	Similar to XFS_IOC_GETBMAPX, the first two elements in the array are
+ *	used to constrain the output.  The first element in the array should
+ *	represent the lowest disk address that the user wants to learn about.
+ *	The second element in the array should represent the highest disk
+ *	address to query.  Subsequent array elements will be filled out by the
+ *	command.
+ *
+ *	The fmv_iflags field is only used in the first structure.  The
+ *	fmv_oflags field is filled in for each returned structure after the
+ *	second structure.  The fmv_unused1 fields in the first two array
+ *	elements must be zero.
+ *
+ *	The fmv_count, fmv_entries, and fmv_iflags fields in the second array
+ *	element must be zero.
+ *
+ *	fmv_block, fmv_offset, and fmv_length are expressed in units of
+ *	512-byte sectors.
+ */
+#ifndef HAVE_GETFSMAPX
+struct getfsmapx {
+	__u32		fmv_device;	/* device id */
+	__u32		fmv_unused1;	/* future use, must be zero */
+	__u64		fmv_block;	/* starting block */
+	__u64		fmv_owner;	/* owner id */
+	__u64		fmv_offset;	/* file offset of segment */
+	__u64		fmv_length;	/* length of segment, blocks */
+	__u32		fmv_oflags;	/* mapping flags */
+	__u32		fmv_iflags;	/* control flags (1st structure) */
+	__u32		fmv_count;	/* # of entries in array incl. input */
+	__u32		fmv_entries;	/* # of entries filled in (output). */
+	__u64		fmv_unused2;	/* future use, must be zero */
+};
+#endif
+
+/*	fmv_iflags values - set by XFS_IOC_GETFSMAPX caller.	*/
+/* no flags defined yet */
+#define FMV_IF_VALID	0
+
+/*	fmv_oflags values - returned for each non-header segment */
+#define FMV_OF_PREALLOC		0x1	/* segment = unwritten pre-allocation */
+#define FMV_OF_ATTR_FORK	0x2	/* segment = attribute fork */
+#define FMV_OF_EXTENT_MAP	0x4	/* segment = extent map */
+#define FMV_OF_SHARED		0x8	/* segment = shared with another file */
+#define FMV_OF_SPECIAL_OWNER	0x10	/* owner is a special value */
+#define FMV_OF_LAST		0x20	/* segment is the last in the FS */
+
+/*	fmv_owner special values */
+#define	FMV_OWN_FREE		(-1ULL)	/* free space */
+#define FMV_OWN_UNKNOWN		(-2ULL)	/* unknown owner */
+#define FMV_OWN_FS		(-3ULL)	/* static fs metadata */
+#define FMV_OWN_LOG		(-4ULL)	/* journalling log */
+#define FMV_OWN_AG		(-5ULL)	/* per-AG metadata */
+#define FMV_OWN_INOBT		(-6ULL)	/* inode btree blocks */
+#define FMV_OWN_INODES		(-7ULL)	/* inodes */
+#define FMV_OWN_REFC		(-8ULL) /* refcount tree */
+#define FMV_OWN_COW		(-9ULL) /* cow allocations */
+#define FMV_OWN_DEFECTIVE	(-10ULL) /* bad blocks */
+
+/*
  * Structure for XFS_IOC_FSSETDM.
  * For use by backup and restore programs to set the XFS on-disk inode
  * fields di_dmevmask and di_dmstate.  These must be set to exactly and
@@ -523,6 +584,7 @@ typedef struct xfs_swapext
 #define XFS_IOC_GETBMAPX	_IOWR('X', 56, struct getbmap)
 #define XFS_IOC_ZERO_RANGE	_IOW ('X', 57, struct xfs_flock64)
 #define XFS_IOC_FREE_EOFBLOCKS	_IOR ('X', 58, struct xfs_fs_eofblocks)
+#define XFS_IOC_GETFSMAPX	_IOWR('X', 59, struct getfsmapx)
 
 /*
  * ioctl commands that replace IRIX syssgi()'s
diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index 855ab54..a19cb45 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -1171,8 +1171,9 @@ xfs_refcount_decrease_extent(
  * extent we find.  If no shared blocks are found, flen will be set to zero.
  */
 int
-xfs_refcount_find_shared(
+__xfs_refcount_find_shared(
 	struct xfs_mount	*mp,
+	struct xfs_buf		*agbp,
 	xfs_agnumber_t		agno,
 	xfs_agblock_t		agbno,
 	xfs_extlen_t		aglen,
@@ -1181,23 +1182,13 @@ xfs_refcount_find_shared(
 	bool			find_maximal)
 {
 	struct xfs_btree_cur	*cur;
-	struct xfs_buf		*agbp;
 	struct xfs_refcount_irec	tmp;
-	int			error;
 	int			i, have;
 	int			bt_error = XFS_BTREE_ERROR;
+	int			error;
 
 	trace_xfs_refcount_find_shared(mp, agno, agbno, aglen);
 
-	if (xfs_always_cow) {
-		*fbno = agbno;
-		*flen = aglen;
-		return 0;
-	}
-
-	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
-	if (error)
-		goto out;
 	cur = xfs_refcountbt_init_cursor(mp, NULL, agbp, agno, NULL);
 
 	/* By default, skip the whole range */
@@ -1272,14 +1263,46 @@ done:
 
 out_error:
 	xfs_btree_del_cursor(cur, bt_error);
-	xfs_buf_relse(agbp);
-out:
 	if (error)
 		trace_xfs_refcount_find_shared_error(mp, agno, error, _RET_IP_);
 	return error;
 }
 
 /*
+ * Given an AG extent, find the lowest-numbered run of shared blocks within
+ * that range and return the range in fbno/flen.
+ */
+int
+xfs_refcount_find_shared(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		aglen,
+	xfs_agblock_t		*fbno,
+	xfs_extlen_t		*flen,
+	bool			find_maximal)
+{
+	struct xfs_buf		*agbp;
+	int			error;
+
+	if (xfs_always_cow) {
+		*fbno = agbno;
+		*flen = aglen;
+		return 0;
+	}
+
+	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
+	if (error)
+		return error;
+
+	error = __xfs_refcount_find_shared(mp, agbp, agno, agbno, aglen,
+			fbno, flen, find_maximal);
+
+	xfs_buf_relse(agbp);
+	return error;
+}
+
+/*
  * Recovering CoW Blocks After a Crash
  *
  * Due to the way that the copy on write mechanism works, there's a window of
diff --git a/libxfs/xfs_refcount.h b/libxfs/xfs_refcount.h
index 6665eeb..44b0346 100644
--- a/libxfs/xfs_refcount.h
+++ b/libxfs/xfs_refcount.h
@@ -53,6 +53,10 @@ extern int xfs_refcount_finish_one(struct xfs_trans *tp,
 		xfs_fsblock_t startblock, xfs_extlen_t blockcount,
 		xfs_extlen_t *adjusted, struct xfs_btree_cur **pcur);
 
+extern int __xfs_refcount_find_shared(struct xfs_mount *mp,
+		struct xfs_buf *agbp, xfs_agnumber_t agno, xfs_agblock_t agbno,
+		xfs_extlen_t aglen, xfs_agblock_t *fbno, xfs_extlen_t *flen,
+		bool find_maximal);
 extern int xfs_refcount_find_shared(struct xfs_mount *mp, xfs_agnumber_t agno,
 		xfs_agblock_t agbno, xfs_extlen_t aglen, xfs_agblock_t *fbno,
 		xfs_extlen_t *flen, bool find_maximal);

* [PATCH 137/145] xfs_db: introduce the 'fsmap' command to find what owns a set of fsblocks
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (135 preceding siblings ...)
  2016-06-17  1:45 ` [PATCH 136/145] xfs: introduce the XFS_IOC_GETFSMAPX ioctl Darrick J. Wong
@ 2016-06-17  1:45 ` Darrick J. Wong
  2016-06-17  1:45 ` [PATCH 138/145] xfs_io: support the new getfsmap ioctl Darrick J. Wong
                   ` (7 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:45 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Introduce a new 'fsmap' command to the fs debugger that will query the
rmap btree to report the file/metadata extents mapped to a range of
physical blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/Makefile       |    2 -
 db/command.c      |    2 +
 db/fsmap.c        |  163 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 db/fsmap.h        |   20 +++++++
 man/man8/xfs_db.8 |    9 +++
 5 files changed, 195 insertions(+), 1 deletion(-)
 create mode 100644 db/fsmap.c
 create mode 100644 db/fsmap.h


diff --git a/db/Makefile b/db/Makefile
index 8260da3..5adea48 100644
--- a/db/Makefile
+++ b/db/Makefile
@@ -12,7 +12,7 @@ HFILES = addr.h agf.h agfl.h agi.h attr.h attrshort.h bit.h block.h bmap.h \
 	dir2.h dir2sf.h dquot.h echo.h faddr.h field.h \
 	flist.h fprint.h frag.h freesp.h hash.h help.h init.h inode.h input.h \
 	io.h logformat.h malloc.h metadump.h output.h print.h quit.h sb.h \
-	 sig.h strvec.h text.h type.h write.h attrset.h symlink.h
+	 sig.h strvec.h text.h type.h write.h attrset.h symlink.h fsmap.h
 CFILES = $(HFILES:.h=.c)
 LSRCFILES = xfs_admin.sh xfs_ncheck.sh xfs_metadump.sh
 
diff --git a/db/command.c b/db/command.c
index 3c17a1e..278c357 100644
--- a/db/command.c
+++ b/db/command.c
@@ -49,6 +49,7 @@
 #include "write.h"
 #include "malloc.h"
 #include "dquot.h"
+#include "fsmap.h"
 
 cmdinfo_t	*cmdtab;
 int		ncmds;
@@ -128,6 +129,7 @@ init_commands(void)
 	echo_init();
 	frag_init();
 	freesp_init();
+	fsmap_init();
 	help_init();
 	hash_init();
 	inode_init();
diff --git a/db/fsmap.c b/db/fsmap.c
new file mode 100644
index 0000000..a9de401
--- /dev/null
+++ b/db/fsmap.c
@@ -0,0 +1,163 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "libxfs.h"
+#include "command.h"
+#include "fsmap.h"
+#include "output.h"
+#include "init.h"
+
+struct fsmap_info {
+	unsigned long long	nr;
+	xfs_agnumber_t		agno;
+};
+
+static int
+fsmap_fn(
+	struct xfs_btree_cur	*cur,
+	struct xfs_rmap_irec	*rec,
+	void			*priv)
+{
+	struct fsmap_info	*info = priv;
+
+	dbprintf(_("%llu: %u/%u len %u owner %lld offset %llu bmbt %d attrfork %d extflag %d\n"),
+		info->nr, info->agno, rec->rm_startblock,
+		rec->rm_blockcount, rec->rm_owner, rec->rm_offset,
+		!!(rec->rm_flags & XFS_RMAP_BMBT_BLOCK),
+		!!(rec->rm_flags & XFS_RMAP_ATTR_FORK),
+		!!(rec->rm_flags & XFS_RMAP_UNWRITTEN));
+	info->nr++;
+
+	return 0;
+}
+
+int
+fsmap_f(
+	int			argc,
+	char			**argv)
+{
+	char			*p;
+	struct fsmap_info	info;
+	xfs_agnumber_t		start_ag;
+	xfs_agnumber_t		end_ag;
+	xfs_agnumber_t		agno;
+	xfs_fsblock_t		start_fsb = 0;
+	xfs_fsblock_t		end_fsb = NULLFSBLOCK;
+	struct xfs_rmap_irec	low;
+	struct xfs_rmap_irec	high;
+	struct xfs_btree_cur	*bt_cur;
+	struct xfs_buf		*agbp;
+	int			c;
+	xfs_daddr_t		eofs;
+	int			error;
+
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		dbprintf(_("Filesystem does not support reverse mapping btree.\n"));
+		return 0;
+	}
+
+	while ((c = getopt(argc, argv, "")) != EOF) {
+		switch (c) {
+		default:
+			dbprintf(_("Bad option for fsmap command.\n"));
+			return 0;
+		}
+	}
+
+	if (argc > optind) {
+		start_fsb = strtoull(argv[optind], &p, 0);
+		if (*p != '\0' || start_fsb >= mp->m_sb.sb_dblocks) {
+			dbprintf(_("Bad fsmap start_fsb %s.\n"), argv[optind]);
+			return 0;
+		}
+	}
+
+	if (argc > optind + 1) {
+		end_fsb = strtoull(argv[optind + 1], &p, 0);
+		if (*p != '\0') {
+			dbprintf(_("Bad fsmap end_fsb %s.\n"), argv[optind + 1]);
+			return 0;
+		}
+	}
+
+	eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks);
+	if (XFS_FSB_TO_DADDR(mp, end_fsb) >= eofs)
+		end_fsb = XFS_DADDR_TO_FSB(mp, eofs - 1);
+
+	low.rm_startblock = XFS_FSB_TO_AGBNO(mp, start_fsb);
+	low.rm_owner = 0;
+	low.rm_offset = 0;
+	low.rm_flags = 0;
+	high.rm_startblock = -1U;
+	high.rm_owner = ULLONG_MAX;
+	high.rm_offset = ULLONG_MAX;
+	high.rm_flags = XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK | XFS_RMAP_UNWRITTEN;
+
+	start_ag = XFS_FSB_TO_AGNO(mp, start_fsb);
+	end_ag = XFS_FSB_TO_AGNO(mp, end_fsb);
+
+	info.nr = 0;
+	for (agno = start_ag; agno <= end_ag; agno++) {
+		if (agno == end_ag)
+			high.rm_startblock = XFS_FSB_TO_AGBNO(mp, end_fsb);
+
+		error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
+		if (error) {
+			dbprintf(_("Error %d while reading AGF.\n"), error);
+			return 0;
+		}
+
+		bt_cur = xfs_rmapbt_init_cursor(mp, NULL, agbp, agno);
+		if (!bt_cur) {
+			libxfs_putbuf(agbp);
+			dbprintf(_("Not enough memory.\n"));
+			return 0;
+		}
+
+		info.agno = agno;
+		error = xfs_rmapbt_query_range(bt_cur, &low, &high,
+				fsmap_fn, &info);
+		if (error) {
+			xfs_btree_del_cursor(bt_cur, XFS_BTREE_ERROR);
+			libxfs_putbuf(agbp);
+			dbprintf(_("Error %d while querying rmap btree.\n"),
+				error);
+			return 0;
+		}
+
+		xfs_btree_del_cursor(bt_cur, XFS_BTREE_NOERROR);
+		libxfs_putbuf(agbp);
+
+		if (agno == start_ag)
+			low.rm_startblock = 0;
+	}
+
+	return 0;
+}
+
+static const cmdinfo_t	fsmap_cmd =
+	{ "fsmap", NULL, fsmap_f, 0, 2, 0,
+	  N_("start_fsb [end_fsb]"),
+	  N_("display reverse mapping(s)"), NULL };
+
+void
+fsmap_init(void)
+{
+	add_command(&fsmap_cmd);
+}
diff --git a/db/fsmap.h b/db/fsmap.h
new file mode 100644
index 0000000..f8aacd3
--- /dev/null
+++ b/db/fsmap.h
@@ -0,0 +1,20 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+extern void	fsmap_init(void);
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index b6d2f64..514e3aa 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -568,6 +568,15 @@ command to convert to and from this form. Block numbers given for file blocks
 .B bmap
 command) are in this form.
 .TP
+.BI "fsmap [ " start " ] [ " end " ]
+Prints the reverse mappings recorded for an XFS filesystem.  Each record
+lists the allocation group and AG block where an extent starts, its
+length, its owner, the owner's file offset, and flags that mark bmbt
+blocks, attribute fork extents, and unwritten extents.  Block numbers and
+lengths are in units of filesystem blocks.
+.BI "The optional " start " and " end " arguments can be used to constrain
+the output to a particular range of filesystem blocks.
+.TP
 .BI hash " string
 Prints the hash value of
 .I string
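
For readers following the db fsmap_f() loop above, here is a minimal sketch
of how one fsblock range gets split into per-AG query windows.  It assumes a
simplified linear layout (fsb = agno * agblocks + agbno), whereas real XFS
derives the AG number with a shift by sb_agblklog.  struct ag_window and
split_range() are hypothetical names used only for illustration; they are
not part of xfsprogs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one per-AG rmap query window. */
struct ag_window {
	uint32_t	agno;
	uint32_t	low_agbno;
	uint32_t	high_agbno;
};

/*
 * Split [start_fsb, end_fsb] into per-AG windows and return how many
 * windows were written.  Only the first AG uses the caller's start
 * offset and only the last AG uses the caller's end offset; every AG
 * in between is queried in full.
 */
static int
split_range(
	uint64_t		start_fsb,
	uint64_t		end_fsb,
	uint32_t		agblocks,
	struct ag_window	*out,
	int			max_out)
{
	uint32_t		start_ag, end_ag, agno;
	uint32_t		low;
	uint32_t		high = UINT32_MAX;
	int			n = 0;

	assert(agblocks > 0 && start_fsb <= end_fsb);

	/* Assumption: linear layout, not the shifted XFS encoding. */
	start_ag = (uint32_t)(start_fsb / agblocks);
	end_ag = (uint32_t)(end_fsb / agblocks);
	low = (uint32_t)(start_fsb % agblocks);

	for (agno = start_ag; agno <= end_ag && n < max_out; agno++) {
		/* Clamp the high key only in the last AG. */
		if (agno == end_ag)
			high = (uint32_t)(end_fsb % agblocks);

		out[n].agno = agno;
		out[n].low_agbno = low;
		out[n].high_agbno = high;
		n++;

		/* Every AG after the first starts at block zero. */
		low = 0;
	}
	return n;
}
```

With agblocks = 10, the range 5..25 would produce three windows: AG 0 from
block 5 upward, all of AG 1, and AG 2 up to block 5.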


* [PATCH 138/145] xfs_io: support the new getfsmap ioctl
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (136 preceding siblings ...)
  2016-06-17  1:45 ` [PATCH 137/145] xfs_db: introduce the 'fsmap' command to find what owns a set of fsblocks Darrick J. Wong
@ 2016-06-17  1:45 ` Darrick J. Wong
  2016-06-17  1:45 ` [PATCH 139/145] xfs: scrub btree records and pointers while querying Darrick J. Wong
                   ` (6 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:45 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 io/Makefile       |    2 
 io/fsmap.c        |  488 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 io/init.c         |    1 
 io/io.h           |    1 
 man/man8/xfs_io.8 |   47 +++++
 5 files changed, 538 insertions(+), 1 deletion(-)
 create mode 100644 io/fsmap.c
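
As a reading aid for dump_map_verbose() in this patch, the stripe alignment
flags can be sketched as a standalone helper.  The flag values mirror the
octal FLG_* definitions in the patch; stripe_flags() itself is a
hypothetical name, and the requirement that swidth be nonzero whenever sunit
is nonzero is an assumption of this sketch rather than something the patch
checks.

```c
#include <assert.h>
#include <stdint.h>

#define FLG_BSU	00001000	/* Not on begin of stripe unit  */
#define FLG_ESU	00000100	/* Not on end   of stripe unit  */
#define FLG_BSW	00000010	/* Not on begin of stripe width */
#define FLG_ESW	00000001	/* Not on end   of stripe width */

/*
 * Return the alignment flags for an extent.  All arguments are in the
 * same units (512-byte blocks in the patch).  A filesystem without
 * striping (sunit == 0) never sets any flag.
 */
static int
stripe_flags(
	uint64_t	block,
	uint64_t	length,
	uint64_t	sunit,
	uint64_t	swidth)
{
	int		flg = 0;

	if (sunit == 0)
		return 0;
	/* Assumption: a striped filesystem always reports a width. */
	assert(swidth != 0);

	if (block % sunit != 0)
		flg |= FLG_BSU;
	if ((block + length) % sunit != 0)
		flg |= FLG_ESU;
	if (block % swidth != 0)
		flg |= FLG_BSW;
	if ((block + length) % swidth != 0)
		flg |= FLG_ESW;
	return flg;
}
```

Printing the result with "%o" gives the same digit-per-flag layout that the
verbose listing uses.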


diff --git a/io/Makefile b/io/Makefile
index 0b53f41..6439e1d 100644
--- a/io/Makefile
+++ b/io/Makefile
@@ -11,7 +11,7 @@ HFILES = init.h io.h
 CFILES = init.c \
 	attr.c bmap.c file.c freeze.c fsync.c getrusage.c imap.c link.c \
 	mmap.c open.c parent.c pread.c prealloc.c pwrite.c seek.c shutdown.c \
-	sync.c truncate.c reflink.c
+	sync.c truncate.c reflink.c fsmap.c
 
 LLDLIBS = $(LIBXCMD) $(LIBHANDLE)
 LTDEPENDENCIES = $(LIBXCMD) $(LIBHANDLE)
diff --git a/io/fsmap.c b/io/fsmap.c
new file mode 100644
index 0000000..7cf3776
--- /dev/null
+++ b/io/fsmap.c
@@ -0,0 +1,488 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "platform_defs.h"
+#include "command.h"
+#include "init.h"
+#include "io.h"
+#include "input.h"
+
+static cmdinfo_t fsmap_cmd;
+
+static void
+fsmap_help(void)
+{
+	printf(_(
+"\n"
+" prints the block mapping for an XFS filesystem\n"
+"\n"
+" Example:\n"
+" 'fsmap -v' - tabular format verbose map\n"
+"\n"
+" fsmap prints the map of disk blocks used by the whole filesystem.\n"
+" The map lists each extent used by files and metadata, as well as regions\n"
+" of the filesystem that are not in use (free space).\n"
+" By default, each line of the listing takes the following form:\n"
+"     extent: major:minor [startblock..endblock]: owner startoffset..endoffset length\n"
+" All the file offsets and disk blocks are in units of 512-byte blocks.\n"
+" -n -- query n extents at a time.\n"
+" -v -- Verbose information, specify ag info.  Show flags legend on 2nd -v\n"
+"\n"));
+}
+
+static int
+numlen(
+	off64_t	val)
+{
+	off64_t	tmp;
+	int	len;
+
+	for (len = 0, tmp = val; tmp > 0; tmp = tmp/10)
+		len++;
+	return (len == 0 ? 1 : len);
+}
+
+static const char *
+special_owner(
+	__int64_t	owner)
+{
+	switch (owner) {
+	case FMV_OWN_FREE:
+		return _("free space");
+	case FMV_OWN_UNKNOWN:
+		return _("unknown");
+	case FMV_OWN_FS:
+		return _("static fs metadata");
+	case FMV_OWN_LOG:
+		return _("journalling log");
+	case FMV_OWN_AG:
+		return _("per-AG metadata");
+	case FMV_OWN_INOBT:
+		return _("inode btree");
+	case FMV_OWN_INODES:
+		return _("inodes");
+	case FMV_OWN_REFC:
+		return _("refcount btree");
+	case FMV_OWN_COW:
+		return _("cow reservation");
+	case FMV_OWN_DEFECTIVE:
+		return _("defective");
+	default:
+		return _("unknown");
+	}
+}
+
+static void
+dump_map(
+	unsigned long long	nr,
+	struct getfsmapx	*map)
+{
+	unsigned long long	i;
+	struct getfsmapx	*p;
+
+	for (i = 0, p = map + 2; i < map->fmv_entries; i++, p++) {
+		printf("\t%llu: %u:%u [%lld..%lld]: ", i + nr,
+			major(p->fmv_device), minor(p->fmv_device),
+			(long long) p->fmv_block,
+			(long long)(p->fmv_block + p->fmv_length - 1));
+		if (p->fmv_oflags & FMV_OF_SPECIAL_OWNER)
+			printf("%s", special_owner(p->fmv_owner));
+		else if (p->fmv_oflags & FMV_OF_EXTENT_MAP)
+			printf(_("inode %lld extent map"),
+				(long long) p->fmv_owner);
+		else
+			printf(_("inode %lld %lld..%lld"),
+				(long long) p->fmv_owner,
+				(long long) p->fmv_offset,
+				(long long)(p->fmv_offset + p->fmv_length - 1));
+		printf(_(" %lld blocks\n"),
+			(long long)p->fmv_length);
+	}
+}
+
+/*
+ * Verbose mode displays:
+ *   extent: major:minor [startblock..endblock]: startoffset..endoffset \
+ *	ag# (agoffset..agendoffset) totalbbs flags
+ */
+#define MINRANGE_WIDTH	16
+#define MINAG_WIDTH	2
+#define MINTOT_WIDTH	5
+#define NFLG		7		/* count of flags */
+#define	FLG_NULL	00000000	/* Null flag */
+#define	FLG_SHARED	01000000	/* shared extent */
+#define	FLG_ATTR_FORK	00100000	/* attribute fork */
+#define	FLG_PRE		00010000	/* Unwritten extent */
+#define	FLG_BSU		00001000	/* Not on begin of stripe unit  */
+#define	FLG_ESU		00000100	/* Not on end   of stripe unit  */
+#define	FLG_BSW		00000010	/* Not on begin of stripe width */
+#define	FLG_ESW		00000001	/* Not on end   of stripe width */
+static void
+dump_map_verbose(
+	unsigned long long	nr,
+	struct getfsmapx	*map,
+	bool			*dumped_flags,
+	struct xfs_fsop_geom	*fsgeo)
+{
+	unsigned long long	i;
+	struct getfsmapx	*p;
+	int			agno;
+	off64_t			agoff, bbperag;
+	int			foff_w, boff_w, aoff_w, tot_w, agno_w, own_w, nr_w, dev_w;
+	char			rbuf[32], bbuf[32], abuf[32], obuf[32], nbuf[32], dbuf[32];
+	int			sunit, swidth;
+	int			flg = 0;
+
+	foff_w = boff_w = aoff_w = own_w = MINRANGE_WIDTH;
+	dev_w = 3;
+	nr_w = 4;
+	tot_w = MINTOT_WIDTH;
+	bbperag = (off64_t)fsgeo->agblocks *
+		  (off64_t)fsgeo->blocksize / BBSIZE;
+	sunit = (fsgeo->sunit * fsgeo->blocksize) / BBSIZE;
+	swidth = (fsgeo->swidth * fsgeo->blocksize) / BBSIZE;
+
+	/*
+	 * Go through the extents and figure out the width
+	 * needed for all columns.
+	 */
+	for (i = 0, p = map + 2; i < map->fmv_entries; i++, p++) {
+		if (p->fmv_oflags & FMV_OF_PREALLOC ||
+		    p->fmv_oflags & FMV_OF_ATTR_FORK ||
+		    p->fmv_oflags & FMV_OF_SHARED)
+			flg = 1;
+		if (sunit &&
+		    (p->fmv_block  % sunit != 0 ||
+		     ((p->fmv_block + p->fmv_length) % sunit) != 0 ||
+		     p->fmv_block % swidth != 0 ||
+		     ((p->fmv_block + p->fmv_length) % swidth) != 0))
+			flg = 1;
+		if (flg)
+			*dumped_flags = true;
+		snprintf(nbuf, sizeof(nbuf), "%llu", nr + i);
+		nr_w = max(nr_w, strlen(nbuf));
+		snprintf(dbuf, sizeof(dbuf), "%u:%u", major(p->fmv_device),
+			minor(p->fmv_device));
+		dev_w = max(dev_w, strlen(dbuf));
+		snprintf(bbuf, sizeof(bbuf), "[%lld..%lld]:",
+			(long long) p->fmv_block,
+			(long long)(p->fmv_block + p->fmv_length - 1));
+		boff_w = max(boff_w, strlen(bbuf));
+		if (p->fmv_oflags & FMV_OF_SPECIAL_OWNER)
+			own_w = max(own_w, strlen(special_owner(p->fmv_owner)));
+		else {
+			snprintf(obuf, sizeof(obuf), "%lld",
+				(long long)p->fmv_owner);
+			own_w = max(own_w, strlen(obuf));
+		}
+		if (p->fmv_oflags & FMV_OF_EXTENT_MAP)
+			foff_w = max(foff_w, strlen(_("extent map")));
+		else if (p->fmv_oflags & FMV_OF_SPECIAL_OWNER)
+			;
+		else {
+			snprintf(rbuf, sizeof(rbuf), "%lld..%lld",
+				(long long) p->fmv_offset,
+				(long long)(p->fmv_offset + p->fmv_length - 1));
+			foff_w = max(foff_w, strlen(rbuf));
+		}
+		agno = p->fmv_block / bbperag;
+		agoff = p->fmv_block - (agno * bbperag);
+		snprintf(abuf, sizeof(abuf),
+			"(%lld..%lld)",
+			(long long)agoff,
+			(long long)(agoff + p->fmv_length - 1));
+		aoff_w = max(aoff_w, strlen(abuf));
+		tot_w = max(tot_w,
+			numlen(p->fmv_length));
+	}
+	agno_w = max(MINAG_WIDTH, numlen(fsgeo->agcount));
+	if (nr == 0)
+		printf("%*s: %-*s %-*s %-*s %-*s %*s %-*s %*s%s\n",
+			nr_w, _("EXT"),
+			dev_w, _("DEV"),
+			boff_w, _("BLOCK-RANGE"),
+			own_w, _("OWNER"),
+			foff_w, _("FILE-OFFSET"),
+			agno_w, _("AG"),
+			aoff_w, _("AG-OFFSET"),
+			tot_w, _("TOTAL"),
+			flg ? _(" FLAGS") : "");
+	for (i = 0, p = map + 2; i < map->fmv_entries; i++, p++) {
+		flg = FLG_NULL;
+		if (p->fmv_oflags & FMV_OF_PREALLOC)
+			flg |= FLG_PRE;
+		if (p->fmv_oflags & FMV_OF_ATTR_FORK)
+			flg |= FLG_ATTR_FORK;
+		if (p->fmv_oflags & FMV_OF_SHARED)
+			flg |= FLG_SHARED;
+		/*
+		 * If striping enabled, determine if extent starts/ends
+		 * on a stripe unit boundary.
+		 */
+		if (sunit) {
+			if (p->fmv_block  % sunit != 0)
+				flg |= FLG_BSU;
+			if (((p->fmv_block +
+			      p->fmv_length ) % sunit ) != 0)
+				flg |= FLG_ESU;
+			if (p->fmv_block % swidth != 0)
+				flg |= FLG_BSW;
+			if (((p->fmv_block +
+			      p->fmv_length ) % swidth ) != 0)
+				flg |= FLG_ESW;
+		}
+		snprintf(dbuf, sizeof(dbuf), "%u:%u", major(p->fmv_device),
+			minor(p->fmv_device));
+		snprintf(bbuf, sizeof(bbuf), "[%lld..%lld]:",
+			(long long) p->fmv_block,
+			(long long)(p->fmv_block + p->fmv_length - 1));
+		if (p->fmv_oflags & FMV_OF_SPECIAL_OWNER) {
+			snprintf(obuf, sizeof(obuf), "%s",
+				special_owner(p->fmv_owner));
+			snprintf(rbuf, sizeof(rbuf), " ");
+		} else {
+			snprintf(obuf, sizeof(obuf), "%lld",
+				(long long)p->fmv_owner);
+			snprintf(rbuf, sizeof(rbuf), "%lld..%lld",
+				(long long) p->fmv_offset,
+				(long long)(p->fmv_offset + p->fmv_length - 1));
+		}
+		agno = p->fmv_block / bbperag;
+		agoff = p->fmv_block - (agno * bbperag);
+		snprintf(abuf, sizeof(abuf),
+			"(%lld..%lld)",
+			(long long)agoff,
+			(long long)(agoff + p->fmv_length - 1));
+		if (p->fmv_oflags & FMV_OF_EXTENT_MAP)
+			printf("%*llu: %-*s %-*s %-*s %-*s %*d %-*s %*lld\n",
+				nr_w, nr + i,
+				dev_w, dbuf,
+				boff_w, bbuf,
+				own_w, obuf,
+				foff_w, _("extent map"),
+				agno_w, agno,
+				aoff_w, abuf,
+				tot_w, (long long)p->fmv_length);
+		else {
+			printf("%*llu: %-*s %-*s %-*s %-*s", nr_w, nr + i,
+				dev_w, dbuf, boff_w, bbuf, own_w, obuf,
+				foff_w, rbuf);
+			printf(" %*d %-*s", agno_w, agno,
+				aoff_w, abuf);
+			printf(" %*lld", tot_w,
+				(long long)p->fmv_length);
+			if (flg == FLG_NULL)
+				printf("\n");
+			else
+				printf(" %-*.*o\n", NFLG, NFLG, flg);
+		}
+	}
+}
+
+static void
+dump_verbose_key(void)
+{
+	printf(_(" FLAG Values:\n"));
+	printf(_("    %*.*o Shared extent\n"),
+		NFLG+1, NFLG+1, FLG_SHARED);
+	printf(_("    %*.*o Attribute fork\n"),
+		NFLG+1, NFLG+1, FLG_ATTR_FORK);
+	printf(_("    %*.*o Unwritten preallocated extent\n"),
+		NFLG+1, NFLG+1, FLG_PRE);
+	printf(_("    %*.*o Doesn't begin on stripe unit\n"),
+		NFLG+1, NFLG+1, FLG_BSU);
+	printf(_("    %*.*o Doesn't end   on stripe unit\n"),
+		NFLG+1, NFLG+1, FLG_ESU);
+	printf(_("    %*.*o Doesn't begin on stripe width\n"),
+		NFLG+1, NFLG+1, FLG_BSW);
+	printf(_("    %*.*o Doesn't end   on stripe width\n"),
+		NFLG+1, NFLG+1, FLG_ESW);
+}
+
+int
+fsmap_f(
+	int			argc,
+	char			**argv)
+{
+	struct getfsmapx	*p;
+	struct getfsmapx	*nmap;
+	struct getfsmapx	*map;
+	struct xfs_fsop_geom	fsgeo;
+	long long		start = 0;
+	long long		end = -1;
+	int			nmap_size;
+	int			map_size;
+	int			nflag = 0;
+	int			vflag = 0;
+	int			fmv_iflags = 0;	/* flags for GETFSMAPX */
+	int			i = 0;
+	int			c;
+	unsigned long long	nr = 0;
+	size_t			fsblocksize, fssectsize;
+	bool			dumped_flags = false;
+
+	init_cvtnum(&fsblocksize, &fssectsize);
+
+	while ((c = getopt(argc, argv, "n:v")) != EOF) {
+		switch (c) {
+		case 'n':	/* number of extents specified */
+			nflag = atoi(optarg);
+			break;
+		case 'v':	/* Verbose output */
+			vflag++;
+			break;
+		default:
+			return command_usage(&fsmap_cmd);
+		}
+	}
+
+	if (argc > optind) {
+		start = cvtnum(fsblocksize, fssectsize, argv[optind]);
+		if (start < 0) {
+			fprintf(stderr,
+				_("Bad fsmap start %s.\n"),
+				argv[optind]);
+			return 0;
+		}
+	}
+
+	if (argc > optind + 1) {
+		end = cvtnum(fsblocksize, fssectsize, argv[optind + 1]);
+		if (end < 0) {
+			fprintf(stderr,
+				_("Bad fsmap end %s.\n"),
+				argv[optind + 1]);
+			return 0;
+		}
+	}
+
+	if (vflag) {
+		c = xfsctl(file->name, file->fd, XFS_IOC_FSGEOMETRY_V1, &fsgeo);
+		if (c < 0) {
+			fprintf(stderr,
+				_("%s: can't get geometry [\"%s\"]: %s\n"),
+				progname, file->name, strerror(errno));
+			exitcode = 1;
+			return 0;
+		}
+	}
+
+	map_size = nflag ? nflag + 2 : 32;	/* initial guess - 32 */
+	map = malloc(map_size * sizeof(*map));
+	if (map == NULL) {
+		fprintf(stderr, _("%s: malloc of %zu bytes failed.\n"),
+			progname, map_size * sizeof(*map));
+		exitcode = 1;
+		return 0;
+	}
+
+	memset(map, 0, sizeof(*map) * 2);
+	map->fmv_iflags = fmv_iflags;
+	map->fmv_block = start / 512;
+	(map + 1)->fmv_device = UINT_MAX;
+	(map + 1)->fmv_block = (unsigned long long)end / 512;
+	(map + 1)->fmv_owner = ULLONG_MAX;
+	(map + 1)->fmv_offset = ULLONG_MAX;
+
+	/* Count mappings */
+	if (!nflag) {
+		map->fmv_count = 2;
+		i = xfsctl(file->name, file->fd, XFS_IOC_GETFSMAPX, map);
+		if (i < 0) {
+			fprintf(stderr, _("%s: xfsctl(XFS_IOC_GETFSMAPX)"
+				" iflags=0x%x [\"%s\"]: %s\n"),
+				progname, map->fmv_iflags, file->name,
+				strerror(errno));
+			free(map);
+			exitcode = 1;
+			return 0;
+		}
+		if (map->fmv_entries > map_size * 2) {
+			unsigned long long nr;
+
+			nr = 5ULL * map->fmv_entries / 4 + 2;
+			nmap_size = nr > INT_MAX ? INT_MAX : nr;
+			nmap = realloc(map, nmap_size * sizeof(*map));
+			if (nmap == NULL) {
+				fprintf(stderr,
+					_("%s: cannot realloc %zu bytes\n"),
+					progname, nmap_size * sizeof(*map));
+			} else {
+				map = nmap;
+				map_size = nmap_size;
+			}
+		}
+	}
+
+	map->fmv_count = map_size;
+	do {
+		/* Get some extents */
+		i = xfsctl(file->name, file->fd, XFS_IOC_GETFSMAPX, map);
+		if (i < 0) {
+			fprintf(stderr, _("%s: xfsctl(XFS_IOC_GETFSMAPX)"
+				" iflags=0x%x [\"%s\"]: %s\n"),
+				progname, map->fmv_iflags, file->name,
+				strerror(errno));
+			free(map);
+			exitcode = 1;
+			return 0;
+		}
+
+		if (map->fmv_entries == 0)
+			break;
+
+		if (!vflag)
+			dump_map(nr, map);
+		else
+			dump_map_verbose(nr, map, &dumped_flags, &fsgeo);
+
+		p = map + 1 + map->fmv_entries;
+		if (p->fmv_oflags & FMV_OF_LAST)
+			break;
+
+		nr += map->fmv_entries;
+		map->fmv_device = p->fmv_device;
+		map->fmv_block = p->fmv_block;
+		map->fmv_owner = p->fmv_owner;
+		map->fmv_offset = p->fmv_offset;
+		map->fmv_oflags = p->fmv_oflags;
+		map->fmv_length = p->fmv_length;
+	} while (true);
+
+	if (dumped_flags && vflag > 1)
+		dump_verbose_key();
+
+	free(map);
+	return 0;
+}
+
+void
+fsmap_init(void)
+{
+	fsmap_cmd.name = "fsmap";
+	fsmap_cmd.cfunc = fsmap_f;
+	fsmap_cmd.argmin = 0;
+	fsmap_cmd.argmax = -1;
+	fsmap_cmd.flags = CMD_NOMAP_OK;
+	fsmap_cmd.args = _("[-v] [-n nx] [start] [end]");
+	fsmap_cmd.oneline = _("print filesystem mapping for a range of blocks");
+	fsmap_cmd.help = fsmap_help;
+
+	add_command(&fsmap_cmd);
+}
diff --git a/io/init.c b/io/init.c
index 51f1f5c..4ae8274 100644
--- a/io/init.c
+++ b/io/init.c
@@ -60,6 +60,7 @@ init_commands(void)
 	file_init();
 	flink_init();
 	freeze_init();
+	fsmap_init();
 	fsync_init();
 	getrusage_init();
 	help_init();
diff --git a/io/io.h b/io/io.h
index 172b1f8..cef1763 100644
--- a/io/io.h
+++ b/io/io.h
@@ -97,6 +97,7 @@ extern void		bmap_init(void);
 extern void		file_init(void);
 extern void		flink_init(void);
 extern void		freeze_init(void);
+extern void		fsmap_init(void);
 extern void		fsync_init(void);
 extern void		getrusage_init(void);
 extern void		help_init(void);
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index cc70b7c..2872f13 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -267,6 +267,53 @@ ioctl.  Options behave as described in the
 .BR xfs_bmap (8)
 manual page.
 .TP
+.BI "fsmap [ \-v ] [ \-n " nx " ] [ " start " ] [ " end " ]
+Prints the mapping of disk blocks used by an XFS filesystem.  The map
+lists each extent used by files, allocation group metadata,
+journalling logs, and static filesystem metadata, as well as any
+regions that are unused.  Each line of the listing takes the
+following form:
+.PP
+.RS
+.IR extent ": " major ":" minor " [" startblock .. endblock "]: " owner " " startoffset .. endoffset " " length
+.PP
+Static filesystem metadata, allocation group metadata, btrees,
+journalling logs, and free space are marked by replacing the
+.IR startoffset .. endoffset
+with the appropriate marker.  All blocks, offsets, and lengths are specified
+in units of 512-byte blocks, no matter what the filesystem's block size is.
+.BI "The optional " start " and " end " arguments can be used to constrain
+the output to a particular range of disk blocks.
+.RE
+.RS 1.0i
+.PD 0
+.TP
+.BI \-n " num_extents"
+If this option is given,
+.B fsmap
+obtains the extent list of the filesystem in groups of
+.I num_extents
+extents. In the absence of
+.BR \-n ", " fsmap
+queries the system for the number of extents in the filesystem and uses that
+value to compute the group size.
+.TP
+.B \-v
+Shows verbose information. When this flag is specified, additional
+AG-specific information is appended to each line in the following form:
+.IP
+.RS 1.2i
+.IR agno " (" startagblock .. endagblock ") " nblocks " " flags
+.RE
+.IP
+A second
+.B \-v
+option will print out the
+.I flags
+legend.
+.RE
+.PD
+.TP
 .BI "extsize [ \-R | \-D ] [ " value " ]"
 Display and/or modify the preferred extent size used when allocating
 space for the currently open file. If the
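
The batched query loop in the io fsmap_f() above can be modelled without a
kernel.  The sketch below replaces the XFS_IOC_GETFSMAPX call with
fake_query(), a hypothetical in-memory stand-in, and resumes each batch from
the end of the last record returned.  How the real ioctl interprets the low
key is not shown in this patch, so the resume rule here (block + length) is
an assumption made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record: just enough state to drive the loop. */
struct rec {
	unsigned long long	block;
	unsigned long long	length;
	bool			last;	/* plays the role of FMV_OF_LAST */
};

/*
 * Stand-in for the ioctl: copy up to cap records whose start block is
 * at or after low, marking the final record of the whole data set.
 */
static size_t
fake_query(
	const struct rec	*all,
	size_t			nall,
	unsigned long long	low,
	struct rec		*out,
	size_t			cap)
{
	size_t			i, n = 0;

	for (i = 0; i < nall && n < cap; i++) {
		if (all[i].block < low)
			continue;
		out[n] = all[i];
		out[n].last = (i == nall - 1);
		n++;
	}
	return n;
}

/* Walk every record in batches of cap; return how many were seen. */
static size_t
walk_all(
	const struct rec	*all,
	size_t			nall,
	size_t			cap)
{
	struct rec		batch[8];
	unsigned long long	low = 0;
	size_t			seen = 0;

	assert(cap > 0 && cap <= 8);
	for (;;) {
		size_t		n = fake_query(all, nall, low, batch, cap);

		if (n == 0)
			break;		/* nothing left to report */
		seen += n;
		if (batch[n - 1].last)
			break;		/* backend says we are done */
		/* Resume just past the last record of this batch. */
		low = batch[n - 1].block + batch[n - 1].length;
	}
	return seen;
}
```

The two exit conditions correspond to the fmv_entries == 0 and FMV_OF_LAST
checks in the patch; the batch size plays the role of the -n option.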


* [PATCH 139/145] xfs: scrub btree records and pointers while querying
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (137 preceding siblings ...)
  2016-06-17  1:45 ` [PATCH 138/145] xfs_io: support the new getfsmap ioctl Darrick J. Wong
@ 2016-06-17  1:45 ` Darrick J. Wong
  2016-06-17  1:45 ` [PATCH 140/145] xfs: support scrubbing free space btrees Darrick J. Wong
                   ` (5 subsequent siblings)
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:45 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Create a function that walks a btree, checking the integrity of each
btree block (headers, keys, records) and calling back to the caller
to perform further checks on the records.

v2: Prefix function names with xfs_

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/Makefile         |    2 
 libxfs/xfs_alloc.c      |   33 ++++
 libxfs/xfs_alloc.h      |    3 
 libxfs/xfs_btree.c      |   12 +
 libxfs/xfs_btree.h      |   15 ++
 libxfs/xfs_format.h     |    2 
 libxfs/xfs_rmap.c       |   39 +++++
 libxfs/xfs_rmap_btree.h |    3 
 libxfs/xfs_scrub.c      |  396 +++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_scrub.h      |   76 +++++++++
 10 files changed, 572 insertions(+), 9 deletions(-)
 create mode 100644 libxfs/xfs_scrub.c
 create mode 100644 libxfs/xfs_scrub.h
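
To illustrate the kind of per-record check the walker performs, here is a
minimal sketch over a flat array instead of a real btree block.  It assumes
simple extent keys ordered by start block and a parent that supplies both a
low and a high key; struct ext_rec and check_leaf() are hypothetical names,
and the real code delegates ordering and key comparison to the btree's
recs_inorder and diff_two_keys callbacks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified record: an extent keyed by start block. */
struct ext_rec {
	uint32_t	start;
	uint32_t	len;
};

/*
 * Check that the records of one leaf are in order and stay within the
 * parent's [parent_low, parent_high] key range.  Returns the index of
 * the first bad record, or -1 if the leaf looks clean.
 */
static int
check_leaf(
	const struct ext_rec	*recs,
	int			nrecs,
	uint32_t		parent_low,
	uint32_t		parent_high)
{
	int			i;

	assert(nrecs >= 0);
	for (i = 0; i < nrecs; i++) {
		/* A zero-length extent is never valid. */
		if (recs[i].len == 0)
			return i;
		/* If this isn't the first record, are they in order? */
		if (i > 0 && recs[i].start <= recs[i - 1].start)
			return i;
		/* Is this at least as large as the parent low key? */
		if (recs[i].start < parent_low)
			return i;
		/* Is the last block no larger than the parent high key? */
		if ((uint64_t)recs[i].start + recs[i].len - 1 > parent_high)
			return i;
	}
	return -1;
}
```

The high key comparison only applies to btrees with overlapping records in
the patch; the sketch applies it unconditionally to keep the example short.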


diff --git a/libxfs/Makefile b/libxfs/Makefile
index 4b1ada0..a2188ec 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -41,6 +41,7 @@ HFILES = \
 	xfs_refcount_btree.h \
 	xfs_rmap_btree.h \
 	xfs_sb.h \
+	xfs_scrub.h \
 	xfs_shared.h \
 	xfs_trans_resv.h \
 	xfs_trans_space.h \
@@ -93,6 +94,7 @@ CFILES = cache.c \
 	xfs_rmap_btree.c \
 	xfs_rtbitmap.c \
 	xfs_sb.c \
+	xfs_scrub.c \
 	xfs_symlink_remote.c \
 	xfs_trans_resv.c
 
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 33087e6..7ab05e3 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -2920,3 +2920,36 @@ err:
 	xfs_trans_brelse(tp, agbp);
 	return error;
 }
+
+/* Is there a record covering a given extent? */
+int
+xfs_alloc_record_exists(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	bool			*is_freesp)
+{
+	int			stat;
+	xfs_agblock_t		fbno;
+	xfs_extlen_t		flen;
+	int			error;
+
+	error = xfs_alloc_lookup_le(cur, bno, len, &stat);
+	if (error)
+		return error;
+	if (!stat) {
+		*is_freesp = false;
+		return 0;
+	}
+
+	error = xfs_alloc_get_rec(cur, &fbno, &flen, &stat);
+	if (error)
+		return error;
+	if (!stat) {
+		*is_freesp = false;
+		return 0;
+	}
+
+	*is_freesp = (fbno <= bno && fbno + flen >= bno + len);
+	return 0;
+}
diff --git a/libxfs/xfs_alloc.h b/libxfs/xfs_alloc.h
index 9f6373a..4f2ce38 100644
--- a/libxfs/xfs_alloc.h
+++ b/libxfs/xfs_alloc.h
@@ -210,4 +210,7 @@ int xfs_free_extent_fix_freelist(struct xfs_trans *tp, xfs_agnumber_t agno,
 
 xfs_extlen_t xfs_prealloc_blocks(struct xfs_mount *mp);
 
+int xfs_alloc_record_exists(struct xfs_btree_cur *cur, xfs_agblock_t bno,
+		xfs_extlen_t len, bool *is_freesp);
+
 #endif	/* __XFS_ALLOC_H__ */
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 89fb2fe..89d4bec 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -545,7 +545,7 @@ xfs_btree_ptr_offset(
 /*
  * Return a pointer to the n-th record in the btree block.
  */
-STATIC union xfs_btree_rec *
+union xfs_btree_rec *
 xfs_btree_rec_addr(
 	struct xfs_btree_cur	*cur,
 	int			n,
@@ -558,7 +558,7 @@ xfs_btree_rec_addr(
 /*
  * Return a pointer to the n-th key in the btree block.
  */
-STATIC union xfs_btree_key *
+union xfs_btree_key *
 xfs_btree_key_addr(
 	struct xfs_btree_cur	*cur,
 	int			n,
@@ -571,7 +571,7 @@ xfs_btree_key_addr(
 /*
  * Return a pointer to the n-th high key in the btree block.
  */
-STATIC union xfs_btree_key *
+union xfs_btree_key *
 xfs_btree_high_key_addr(
 	struct xfs_btree_cur	*cur,
 	int			n,
@@ -584,7 +584,7 @@ xfs_btree_high_key_addr(
 /*
  * Return a pointer to the n-th block pointer in the btree block.
  */
-STATIC union xfs_btree_ptr *
+union xfs_btree_ptr *
 xfs_btree_ptr_addr(
 	struct xfs_btree_cur	*cur,
 	int			n,
@@ -618,7 +618,7 @@ xfs_btree_get_iroot(
  * Retrieve the block pointer from the cursor at the given level.
  * This may be an inode btree root or from a buffer.
  */
-STATIC struct xfs_btree_block *		/* generic btree block pointer */
+struct xfs_btree_block *		/* generic btree block pointer */
 xfs_btree_get_block(
 	struct xfs_btree_cur	*cur,	/* btree cursor */
 	int			level,	/* level in btree */
@@ -1729,7 +1729,7 @@ error0:
 	return error;
 }
 
-STATIC int
+int
 xfs_btree_lookup_get_block(
 	struct xfs_btree_cur	*cur,	/* btree cursor */
 	int			level,	/* level in the btree */
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index dbf299f..6f22cb0 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -194,7 +194,6 @@ struct xfs_btree_ops {
 
 	const struct xfs_buf_ops	*buf_ops;
 
-#if defined(DEBUG) || defined(XFS_WARN)
 	/* check that k1 is lower than k2 */
 	int	(*keys_inorder)(struct xfs_btree_cur *cur,
 				union xfs_btree_key *k1,
@@ -204,7 +203,6 @@ struct xfs_btree_ops {
 	int	(*recs_inorder)(struct xfs_btree_cur *cur,
 				union xfs_btree_rec *r1,
 				union xfs_btree_rec *r2);
-#endif
 };
 
 /* btree ops flags */
@@ -537,4 +535,17 @@ int xfs_btree_visit_blocks(struct xfs_btree_cur *cur,
 
 int xfs_btree_count_blocks(struct xfs_btree_cur *cur, xfs_extlen_t *blocks);
 
+union xfs_btree_rec *xfs_btree_rec_addr(struct xfs_btree_cur *cur, int n,
+		struct xfs_btree_block *block);
+union xfs_btree_key *xfs_btree_key_addr(struct xfs_btree_cur *cur, int n,
+		struct xfs_btree_block *block);
+union xfs_btree_key *xfs_btree_high_key_addr(struct xfs_btree_cur *cur, int n,
+		struct xfs_btree_block *block);
+union xfs_btree_ptr *xfs_btree_ptr_addr(struct xfs_btree_cur *cur, int n,
+		struct xfs_btree_block *block);
+int xfs_btree_lookup_get_block(struct xfs_btree_cur *cur, int level,
+		union xfs_btree_ptr *pp, struct xfs_btree_block **blkp);
+struct xfs_btree_block *xfs_btree_get_block(struct xfs_btree_cur *cur,
+		int level, struct xfs_buf **bpp);
+
 #endif	/* __XFS_BTREE_H__ */
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index bfbf6e8..de69327 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -518,7 +518,7 @@ static inline int xfs_sb_version_hasftype(struct xfs_sb *sbp)
 		 (sbp->sb_features2 & XFS_SB_VERSION2_FTYPE));
 }
 
-static inline int xfs_sb_version_hasfinobt(xfs_sb_t *sbp)
+static inline bool xfs_sb_version_hasfinobt(xfs_sb_t *sbp)
 {
 	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
 		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_FINOBT);
diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index 23bffdc..3ec9a35 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -2326,3 +2326,42 @@ xfs_rmap_free_defer(
 
 	return __xfs_rmap_add(mp, dfops, &ri);
 }
+
+/* Is there a record covering a given extent? */
+int
+xfs_rmap_record_exists(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	struct xfs_owner_info	*oinfo,
+	bool			*has_rmap)
+{
+	uint64_t		owner;
+	uint64_t		offset;
+	unsigned int		flags;
+	int			stat;
+	struct xfs_rmap_irec	irec;
+	int			error;
+
+	xfs_owner_info_unpack(oinfo, &owner, &offset, &flags);
+
+	error = xfs_rmap_lookup_le(cur, bno, len, owner, offset, flags, &stat);
+	if (error)
+		return error;
+	if (!stat) {
+		*has_rmap = false;
+		return 0;
+	}
+
+	error = xfs_rmap_get_rec(cur, &irec, &stat);
+	if (error)
+		return error;
+	if (!stat) {
+		*has_rmap = false;
+		return 0;
+	}
+
+	*has_rmap = (irec.rm_startblock <= bno &&
+		     irec.rm_startblock + irec.rm_blockcount >= bno + len);
+	return 0;
+}
diff --git a/libxfs/xfs_rmap_btree.h b/libxfs/xfs_rmap_btree.h
index 5baa81f..2f072c8 100644
--- a/libxfs/xfs_rmap_btree.h
+++ b/libxfs/xfs_rmap_btree.h
@@ -144,4 +144,7 @@ extern xfs_extlen_t xfs_rmapbt_max_size(struct xfs_mount *mp);
 extern int xfs_rmapbt_calc_reserves(struct xfs_mount *mp,
 		xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used);
 
+extern int xfs_rmap_record_exists(struct xfs_btree_cur *cur, xfs_agblock_t bno,
+		xfs_extlen_t len, struct xfs_owner_info *oinfo, bool *has_rmap);
+
 #endif	/* __XFS_RMAP_BTREE_H__ */
diff --git a/libxfs/xfs_scrub.c b/libxfs/xfs_scrub.c
new file mode 100644
index 0000000..750c482
--- /dev/null
+++ b/libxfs/xfs_scrub.c
@@ -0,0 +1,396 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "libxfs_priv.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_alloc.h"
+#include "xfs_bmap.h"
+#include "xfs_ialloc.h"
+#include "xfs_refcount.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_scrub.h"
+
+static const char * const btree_types[] = {
+	[XFS_BTNUM_BNO]		= "bnobt",
+	[XFS_BTNUM_CNT]		= "cntbt",
+	[XFS_BTNUM_RMAP]	= "rmapbt",
+	[XFS_BTNUM_BMAP]	= "bmapbt",
+	[XFS_BTNUM_INO]		= "inobt",
+	[XFS_BTNUM_FINO]	= "finobt",
+	[XFS_BTNUM_REFC]	= "refcountbt",
+};
+
+/* Report a scrub corruption in dmesg. */
+void
+xfs_btree_scrub_error(
+	struct xfs_btree_cur		*cur,
+	int				level,
+	const char			*file,
+	int				line,
+	const char			*check)
+{
+	char				buf[16];
+	xfs_fsblock_t			fsbno;
+
+	if (cur->bc_ptrs[level] >= 1)
+		snprintf(buf, 16, " ptr %d", cur->bc_ptrs[level]);
+	else
+		buf[0] = 0;
+
+	fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, cur->bc_bufs[level]->b_bn);
+	xfs_alert(cur->bc_mp, "scrub: %s btree corruption in block %u/%u%s: %s, file: %s, line: %d",
+			btree_types[cur->bc_btnum],
+			XFS_FSB_TO_AGNO(cur->bc_mp, fsbno),
+			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno),
+			buf, check, file, line);
+}
+
+/* AG metadata scrubbing */
+
+/*
+ * Make sure this record is in order and doesn't stray outside of the parent
+ * keys.
+ */
+static int
+xfs_btree_scrub_rec(
+	struct xfs_btree_scrub	*bs)
+{
+	struct xfs_btree_cur	*cur = bs->cur;
+	union xfs_btree_rec	*rec;
+	union xfs_btree_key	key;
+	union xfs_btree_key	*keyp;
+	struct xfs_btree_block	*block;
+	struct xfs_btree_block	*keyblock;
+
+	block = XFS_BUF_TO_BLOCK(cur->bc_bufs[0]);
+	rec = xfs_btree_rec_addr(cur, cur->bc_ptrs[0], block);
+
+	/* If this isn't the first record, are they in order? */
+	XFS_BTREC_SCRUB_CHECK(bs, bs->firstrec ||
+			cur->bc_ops->recs_inorder(cur, &bs->lastrec, rec));
+	bs->firstrec = false;
+	bs->lastrec = *rec;
+
+	if (cur->bc_nlevels == 1)
+		return 0;
+
+	/* Is this at least as large as the parent low key? */
+	cur->bc_ops->init_key_from_rec(&key, rec);
+	keyblock = XFS_BUF_TO_BLOCK(cur->bc_bufs[1]);
+	keyp = xfs_btree_key_addr(cur, cur->bc_ptrs[1], keyblock);
+
+	XFS_BTKEY_SCRUB_CHECK(bs, 0,
+			cur->bc_ops->diff_two_keys(cur, keyp, &key) >= 0);
+
+	if (!(cur->bc_ops->flags & XFS_BTREE_OPS_OVERLAPPING))
+		return 0;
+
+	/* Is this no larger than the parent high key? */
+	keyp = xfs_btree_high_key_addr(cur, cur->bc_ptrs[1], keyblock);
+
+	XFS_BTKEY_SCRUB_CHECK(bs, 0,
+			cur->bc_ops->diff_two_keys(cur, &key, keyp) >= 0);
+
+	return 0;
+}
+
+/*
+ * Make sure this key is in order and doesn't stray outside of the parent
+ * keys.
+ */
+static int
+xfs_btree_scrub_key(
+	struct xfs_btree_scrub	*bs,
+	int			level)
+{
+	struct xfs_btree_cur	*cur = bs->cur;
+	union xfs_btree_key	*key;
+	union xfs_btree_key	*keyp;
+	struct xfs_btree_block	*block;
+	struct xfs_btree_block	*keyblock;
+
+	block = XFS_BUF_TO_BLOCK(cur->bc_bufs[level]);
+	key = xfs_btree_key_addr(cur, cur->bc_ptrs[level], block);
+
+	/* If this isn't the first key, are they in order? */
+	XFS_BTKEY_SCRUB_CHECK(bs, level, bs->firstkey[level] ||
+			cur->bc_ops->keys_inorder(cur, &bs->lastkey[level],
+					key));
+	bs->firstkey[level] = false;
+	bs->lastkey[level] = *key;
+
+	if (level + 1 >= cur->bc_nlevels)
+		return 0;
+
+	/* Is this at least as large as the parent low key? */
+	keyblock = XFS_BUF_TO_BLOCK(cur->bc_bufs[level + 1]);
+	keyp = xfs_btree_key_addr(cur, cur->bc_ptrs[level + 1], keyblock);
+
+	XFS_BTKEY_SCRUB_CHECK(bs, level,
+			cur->bc_ops->diff_two_keys(cur, keyp, key) >= 0);
+
+	if (!(cur->bc_ops->flags & XFS_BTREE_OPS_OVERLAPPING))
+		return 0;
+
+	/* Is this no larger than the parent high key? */
+	key = xfs_btree_high_key_addr(cur, cur->bc_ptrs[level], block);
+	keyp = xfs_btree_high_key_addr(cur, cur->bc_ptrs[level + 1], keyblock);
+
+	XFS_BTKEY_SCRUB_CHECK(bs, level,
+			cur->bc_ops->diff_two_keys(cur, key, keyp) >= 0);
+
+	return 0;
+}
+
+struct check_owner {
+	struct list_head	list;
+	xfs_agblock_t		bno;
+};
+
+/*
+ * Make sure this btree block isn't in the free list and that there's
+ * an rmap record for it.
+ */
+static int
+xfs_btree_block_check_owner(
+	struct xfs_btree_scrub		*bs,
+	xfs_agblock_t			bno)
+{
+	bool				has_rmap;
+	bool				is_freesp;
+	int				error;
+
+	/* Check that this block isn't free */
+	error = xfs_alloc_record_exists(bs->bno_cur, bno, 1, &is_freesp);
+	if (error)
+		goto err;
+	XFS_BTREC_SCRUB_CHECK(bs, !is_freesp);
+
+	if (!bs->rmap_cur)
+		return 0;
+
+	/* Check that there's an rmap record for this */
+	error = xfs_rmap_record_exists(bs->rmap_cur, bno, 1, &bs->oinfo,
+			&has_rmap);
+	if (error)
+		goto err;
+	XFS_BTREC_SCRUB_CHECK(bs, has_rmap);
+err:
+	return error;
+}
+
+/* Check the owner of a btree block. */
+static int
+xfs_btree_scrub_check_owner(
+	struct xfs_btree_scrub		*bs,
+	struct xfs_buf			*bp)
+{
+	struct xfs_btree_cur		*cur = bs->cur;
+	xfs_agblock_t			bno;
+	xfs_fsblock_t			fsbno;
+	struct check_owner		*co;
+
+	fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
+	bno = XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno);
+
+	/* Do we need to defer this one? */
+	if ((!bs->rmap_cur && xfs_sb_version_hasrmapbt(&cur->bc_mp->m_sb)) ||
+	    !bs->bno_cur) {
+		co = kmem_alloc(sizeof(struct check_owner), KM_SLEEP | KM_NOFS);
+		co->bno = bno;
+		list_add_tail(&co->list, &bs->to_check);
+		return 0;
+	}
+
+	return xfs_btree_block_check_owner(bs, bno);
+}
+
+/*
+ * Visit all nodes and leaves of a btree.  Check that all pointers and
+ * records are in order, that the keys reflect the records, and use a callback
+ * so that the caller can verify individual records.  The callback is the same
+ * as the one for xfs_btree_query_range, so this function also
+ * returns XFS_BTREE_QUERY_RANGE_ABORT, zero, or a negative error code.
+ */
+int
+xfs_btree_scrub(
+	struct xfs_btree_scrub		*bs)
+{
+	struct xfs_btree_cur		*cur = bs->cur;
+	union xfs_btree_ptr		ptr;
+	union xfs_btree_ptr		*pp;
+	union xfs_btree_rec		*recp;
+	struct xfs_btree_block		*block;
+	int				level;
+	struct xfs_buf			*bp;
+	int				i;
+	struct check_owner		*co, *n;
+	int				error;
+
+	/* Finish filling out the scrub state */
+	bs->error = 0;
+	bs->firstrec = true;
+	for (i = 0; i < XFS_BTREE_MAXLEVELS; i++)
+		bs->firstkey[i] = true;
+	bs->bno_cur = bs->rmap_cur = NULL;
+	INIT_LIST_HEAD(&bs->to_check);
+	if (bs->cur->bc_btnum != XFS_BTNUM_BNO)
+		bs->bno_cur = xfs_allocbt_init_cursor(cur->bc_mp, NULL,
+				bs->agf_bp, bs->cur->bc_private.a.agno,
+				XFS_BTNUM_BNO);
+	if (bs->cur->bc_btnum != XFS_BTNUM_RMAP &&
+	    xfs_sb_version_hasrmapbt(&cur->bc_mp->m_sb))
+		bs->rmap_cur = xfs_rmapbt_init_cursor(cur->bc_mp, NULL,
+				bs->agf_bp, bs->cur->bc_private.a.agno);
+
+	/* Load the root of the btree. */
+	level = cur->bc_nlevels - 1;
+	cur->bc_ops->init_ptr_from_cur(cur, &ptr);
+	error = xfs_btree_lookup_get_block(cur, level, &ptr, &block);
+	if (error)
+		goto out;
+
+	xfs_btree_get_block(cur, level, &bp);
+	error = xfs_btree_check_block(cur, block, level, bp);
+	if (error)
+		goto out;
+	error = xfs_btree_scrub_check_owner(bs, bp);
+	if (error)
+		goto out;
+
+	cur->bc_ptrs[level] = 1;
+
+	while (level < cur->bc_nlevels) {
+		block = XFS_BUF_TO_BLOCK(cur->bc_bufs[level]);
+
+		if (level == 0) {
+			/* End of leaf, pop back towards the root. */
+			if (cur->bc_ptrs[level] >
+			    be16_to_cpu(block->bb_numrecs)) {
+				if (level < cur->bc_nlevels - 1)
+					cur->bc_ptrs[level + 1]++;
+				level++;
+				continue;
+			}
+
+			/* Records in order for scrub? */
+			error = xfs_btree_scrub_rec(bs);
+			if (error)
+				goto out;
+
+			recp = xfs_btree_rec_addr(cur, cur->bc_ptrs[0], block);
+			error = bs->scrub_rec(bs, recp);
+			if (error < 0 ||
+			    error == XFS_BTREE_QUERY_RANGE_ABORT)
+				break;
+
+			cur->bc_ptrs[level]++;
+			continue;
+		}
+
+		/* End of node, pop back towards the root. */
+		if (cur->bc_ptrs[level] > be16_to_cpu(block->bb_numrecs)) {
+			if (level < cur->bc_nlevels - 1)
+				cur->bc_ptrs[level + 1]++;
+			level++;
+			continue;
+		}
+
+		/* Keys in order for scrub? */
+		error = xfs_btree_scrub_key(bs, level);
+		if (error)
+			goto out;
+
+		/* Drill another level deeper. */
+		pp = xfs_btree_ptr_addr(cur, cur->bc_ptrs[level], block);
+		level--;
+		error = xfs_btree_lookup_get_block(cur, level, pp,
+				&block);
+		if (error)
+			goto out;
+
+		xfs_btree_get_block(cur, level, &bp);
+		error = xfs_btree_check_block(cur, block, level, bp);
+		if (error)
+			goto out;
+
+		error = xfs_btree_scrub_check_owner(bs, bp);
+		if (error)
+			goto out;
+
+		cur->bc_ptrs[level] = 1;
+	}
+
+out:
+	/*
+	 * If we don't end this function with the cursor pointing at a record
+	 * block, a subsequent non-error cursor deletion will not release
+	 * node-level buffers, causing a buffer leak.  This is quite possible
+	 * with a zero-results range query, so release the buffers if we
+	 * failed to return any results.
+	 */
+	if (cur->bc_bufs[0] == NULL) {
+		for (i = 0; i < cur->bc_nlevels; i++) {
+			if (cur->bc_bufs[i]) {
+				xfs_trans_brelse(cur->bc_tp, cur->bc_bufs[i]);
+				cur->bc_bufs[i] = NULL;
+				cur->bc_ptrs[i] = 0;
+				cur->bc_ra[i] = 0;
+			}
+		}
+	}
+
+	/* Check the block owners we deferred earlier */
+	if (!error) {
+		if (bs->cur->bc_btnum == XFS_BTNUM_BNO)
+			bs->bno_cur = bs->cur;
+		else if (bs->cur->bc_btnum == XFS_BTNUM_RMAP)
+			bs->rmap_cur = bs->cur;
+		list_for_each_entry(co, &bs->to_check, list) {
+			error = xfs_btree_block_check_owner(bs, co->bno);
+			if (error)
+				break;
+		}
+	}
+	list_for_each_entry_safe(co, n, &bs->to_check, list) {
+		list_del(&co->list);
+		kmem_free(co);
+	}
+
+	if (bs->bno_cur && bs->bno_cur != bs->cur)
+		xfs_btree_del_cursor(bs->bno_cur, XFS_BTREE_ERROR);
+	if (bs->rmap_cur && bs->rmap_cur != bs->cur)
+		xfs_btree_del_cursor(bs->rmap_cur, XFS_BTREE_ERROR);
+
+	if (error || bs->error)
+		xfs_alert(cur->bc_mp,
+			"Corruption detected. Unmount and run xfs_repair.");
+
+	return error;
+}
diff --git a/libxfs/xfs_scrub.h b/libxfs/xfs_scrub.h
new file mode 100644
index 0000000..af80a9d
--- /dev/null
+++ b/libxfs/xfs_scrub.h
@@ -0,0 +1,76 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef __XFS_SCRUB_H__
+#define	__XFS_SCRUB_H__
+
+/* btree scrub */
+struct xfs_btree_scrub;
+
+typedef int (*xfs_btree_scrub_rec_fn)(
+	struct xfs_btree_scrub	*bs,
+	union xfs_btree_rec	*rec);
+
+struct xfs_btree_scrub {
+	/* caller-provided scrub state */
+	struct xfs_btree_cur		*cur;
+	xfs_btree_scrub_rec_fn		scrub_rec;
+	struct xfs_buf			*agi_bp;
+	struct xfs_buf			*agf_bp;
+	struct xfs_buf			*agfl_bp;
+	struct xfs_owner_info		oinfo;
+
+	/* internal scrub state */
+	union xfs_btree_rec		lastrec;
+	bool				firstrec;
+	union xfs_btree_key		lastkey[XFS_BTREE_MAXLEVELS];
+	bool				firstkey[XFS_BTREE_MAXLEVELS];
+	struct xfs_btree_cur		*rmap_cur;
+	struct xfs_btree_cur		*bno_cur;
+	struct list_head		to_check;
+	int				error;
+};
+
+int xfs_btree_scrub(struct xfs_btree_scrub *bs);
+void xfs_btree_scrub_error(struct xfs_btree_cur *cur, int level,
+		const char *file, int line, const char *check);
+#define XFS_BTREC_SCRUB_CHECK(bs, fs_ok) \
+	if (!(fs_ok)) { \
+		xfs_btree_scrub_error((bs)->cur, 0, __FILE__, __LINE__, #fs_ok); \
+		(bs)->error = -EFSCORRUPTED; \
+	}
+#define XFS_BTREC_SCRUB_GOTO(bs, fs_ok, label) \
+	if (!(fs_ok)) { \
+		xfs_btree_scrub_error((bs)->cur, 0, __FILE__, __LINE__, #fs_ok); \
+		(bs)->error = -EFSCORRUPTED; \
+		goto label; \
+	}
+#define XFS_BTKEY_SCRUB_CHECK(bs, level, fs_ok) \
+	if (!(fs_ok)) { \
+		xfs_btree_scrub_error((bs)->cur, (level), __FILE__, __LINE__, #fs_ok); \
+		(bs)->error = -EFSCORRUPTED; \
+	}
+#define XFS_BTKEY_SCRUB_GOTO(bs, level, fs_ok, label) \
+	if (!(fs_ok)) { \
+		xfs_btree_scrub_error((bs)->cur, (level), __FILE__, __LINE__, #fs_ok); \
+		(bs)->error = -EFSCORRUPTED; \
+		goto label; \
+	}
+
+#endif	/* __XFS_SCRUB_H__ */
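
For readers skimming the diff, the record-ordering part of
xfs_btree_scrub_rec() can be modelled outside libxfs.  What follows is a
minimal sketch, not the patch's code: struct extent, struct scrub_state and
scrub_extents() are hypothetical names, and the in-order rule used here (each
extent starts at or after the end of the previous one) is an assumption
standing in for the per-btree recs_inorder callback.  Like
XFS_BTREC_SCRUB_CHECK, it notes the failure in a sticky error field and keeps
walking rather than aborting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an on-disk record. */
struct extent {
	uint32_t	start;
	uint32_t	len;
};

/* Hypothetical stand-in for the internal half of struct xfs_btree_scrub. */
struct scrub_state {
	struct extent	last;
	bool		first;
	int		error;		/* sticky, like bs->error */
};

/* Check one record against the previous one, then remember it. */
static void
scrub_extent(struct scrub_state *ss, const struct extent *rec)
{
	if (!ss->first &&
	    (uint64_t)ss->last.start + ss->last.len > rec->start)
		ss->error = -1;		/* note it, but keep walking */
	ss->first = false;
	ss->last = *rec;
}

/* Walk an array of records; returns 0 or the sticky error. */
static int
scrub_extents(const struct extent *recs, int nr)
{
	struct scrub_state	ss = { .first = true, .error = 0 };
	int			i;

	for (i = 0; i < nr; i++)
		scrub_extent(&ss, &recs[i]);
	return ss.error;
}
```

The first flag plays the same role as bs->firstrec: the very first record has
no predecessor, so the comparison is skipped for it.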


* [PATCH 140/145] xfs: support scrubbing free space btrees
From: Darrick J. Wong @ 2016-06-17  1:45 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Plumb in the pieces necessary to check the free space btrees.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
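
A note for reviewers, not part of the commit: the three bounds checks in
xfs_allocbt_scrub_helper() boil down to the predicate sketched below.
extent_in_ag() is a hypothetical name, and plain uint32_t stands in for
xfs_agblock_t and xfs_extlen_t.  The sketch widens to 64 bits before adding
because bno + len can wrap in 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if [bno, bno + len) is a non-empty extent that lies
 * entirely inside an AG of agblocks blocks.
 */
static bool
extent_in_ag(uint32_t bno, uint32_t len, uint32_t agblocks)
{
	if (len == 0)
		return false;
	if (bno >= agblocks)
		return false;
	/* Widen before adding so a huge len cannot wrap back into range. */
	return (uint64_t)bno + len <= agblocks;
}
```

For example, extent_in_ag(99, 1, 100) holds, while extent_in_ag(100, 1, 100)
and extent_in_ag(50, 0xFFFFFFFF, 100) do not.
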
 libxfs/libxfs_priv.h     |    1 
 libxfs/xfs_alloc.c       |   98 ++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_alloc.h       |    3 +
 libxfs/xfs_alloc_btree.c |   51 +++++++++++++++++++++---
 4 files changed, 147 insertions(+), 6 deletions(-)
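
The new diff_two_keys callbacks return k2 minus k1, so a non-negative result
means k1 sorts at or before k2; the bnobt variant compares ar_startblock and
the cntbt variant compares ar_blockcount.  Below is a minimal sketch of how
that sign convention bounds a child key by its parent's keys.  It uses
host-endian uint32_t values and hypothetical names (diff_two_keys_u32,
key_within_parent) rather than the big-endian on-disk types, and the high-key
half only matters for btrees that set XFS_BTREE_OPS_OVERLAPPING.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same sign convention as the patch: positive when k2 > k1. */
static int64_t
diff_two_keys_u32(uint32_t k1, uint32_t k2)
{
	/* Widen both operands so the subtraction cannot wrap. */
	return (int64_t)k2 - (int64_t)k1;
}

/* Is key no smaller than the parent low key and no larger than the high key? */
static bool
key_within_parent(uint32_t low, uint32_t high, uint32_t key)
{
	return diff_two_keys_u32(low, key) >= 0 &&
	       diff_two_keys_u32(key, high) >= 0;
}
```
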


diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 527bd49..dabdafc 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -518,5 +518,6 @@ int libxfs_zero_extent(struct xfs_inode *ip, xfs_fsblock_t start_fsb,
 bool xfs_log_check_lsn(struct xfs_mount *, xfs_lsn_t);
 
 #define xfs_always_cow	(false)
+#define xfs_err(...)	((void)0)
 
 #endif	/* __LIBXFS_INTERNAL_XFS_H__ */
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 7ab05e3..7219057 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -35,6 +35,7 @@
 #include "xfs_trans.h"
 #include "xfs_ag_resv.h"
 #include "xfs_refcount_btree.h"
+#include "xfs_scrub.h"
 
 struct workqueue_struct *xfs_alloc_wq;
 
@@ -2953,3 +2954,100 @@ xfs_alloc_record_exists(
 	*is_freesp = (fbno <= bno && fbno + flen >= bno + len);
 	return 0;
 }
+
+STATIC int
+xfs_allocbt_scrub_rmap_check(
+	struct xfs_btree_cur		*cur,
+	struct xfs_rmap_irec		*rec,
+	void				*priv)
+{
+	xfs_err(cur->bc_mp, "%s: freespace in rmapbt! %u/%u %u %lld %lld %x",
+			__func__, cur->bc_private.a.agno, rec->rm_startblock,
+			rec->rm_blockcount, rec->rm_owner, rec->rm_offset,
+			rec->rm_flags);
+	return XFS_BTREE_QUERY_RANGE_ABORT;
+}
+
+STATIC int
+xfs_allocbt_scrub_helper(
+	struct xfs_btree_scrub		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	xfs_agblock_t			bno;
+	xfs_extlen_t			len;
+	struct xfs_rmap_irec		low;
+	struct xfs_rmap_irec		high;
+	bool				no_rmap;
+	int				error;
+
+	bno = be32_to_cpu(rec->alloc.ar_startblock);
+	len = be32_to_cpu(rec->alloc.ar_blockcount);
+
+	XFS_BTREC_SCRUB_CHECK(bs, bno < mp->m_sb.sb_agblocks);
+	XFS_BTREC_SCRUB_CHECK(bs, bno < bno + len);
+	XFS_BTREC_SCRUB_CHECK(bs, (unsigned long long)bno + len <=
+			mp->m_sb.sb_agblocks);
+
+	/* If we have an rmapbt, free space must not appear in it. */
+	if (!bs->rmap_cur)
+		return 0;
+
+	memset(&low, 0, sizeof(low));
+	low.rm_startblock = bno;
+	memset(&high, 0xFF, sizeof(high));
+	high.rm_startblock = bno + len - 1;
+
+	error = xfs_rmapbt_query_range(bs->rmap_cur, &low, &high,
+			&xfs_allocbt_scrub_rmap_check, NULL);
+	if (error && error != XFS_BTREE_QUERY_RANGE_ABORT)
+		goto err;
+	no_rmap = error == 0;
+	XFS_BTREC_SCRUB_CHECK(bs, no_rmap);
+err:
+	return error;
+}
+
+/* Scrub the freespace btrees for some AG. */
+STATIC int
+xfs_allocbt_scrub(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	int			which)
+{
+	struct xfs_btree_scrub	bs;
+	int			error;
+
+	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &bs.agf_bp);
+	if (error)
+		return error;
+
+	bs.cur = xfs_allocbt_init_cursor(mp, NULL, bs.agf_bp, agno, which);
+	bs.scrub_rec = xfs_allocbt_scrub_helper;
+	xfs_rmap_ag_owner(&bs.oinfo, XFS_RMAP_OWN_AG);
+	error = xfs_btree_scrub(&bs);
+	xfs_btree_del_cursor(bs.cur,
+			error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
+	xfs_trans_brelse(NULL, bs.agf_bp);
+
+	if (!error && bs.error)
+		error = bs.error;
+
+	return error;
+}
+
+int
+xfs_bnobt_scrub(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	return xfs_allocbt_scrub(mp, agno, XFS_BTNUM_BNO);
+}
+
+int
+xfs_cntbt_scrub(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	return xfs_allocbt_scrub(mp, agno, XFS_BTNUM_CNT);
+}
diff --git a/libxfs/xfs_alloc.h b/libxfs/xfs_alloc.h
index 4f2ce38..f1fcc7e 100644
--- a/libxfs/xfs_alloc.h
+++ b/libxfs/xfs_alloc.h
@@ -213,4 +213,7 @@ xfs_extlen_t xfs_prealloc_blocks(struct xfs_mount *mp);
 int xfs_alloc_record_exists(struct xfs_btree_cur *cur, xfs_agblock_t bno,
 		xfs_extlen_t len, bool *is_freesp);
 
+int xfs_bnobt_scrub(struct xfs_mount *mp, xfs_agnumber_t agno);
+int xfs_cntbt_scrub(struct xfs_mount *mp, xfs_agnumber_t agno);
+
 #endif	/* __XFS_ALLOC_H__ */
diff --git a/libxfs/xfs_alloc_btree.c b/libxfs/xfs_alloc_btree.c
index ff4bae4..1794791 100644
--- a/libxfs/xfs_alloc_btree.c
+++ b/libxfs/xfs_alloc_btree.c
@@ -254,6 +254,26 @@ xfs_allocbt_key_diff(
 	return (__int64_t)be32_to_cpu(kp->ar_startblock) - rec->ar_startblock;
 }
 
+STATIC __int64_t
+xfs_bnobt_diff_two_keys(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_key	*k1,
+	union xfs_btree_key	*k2)
+{
+	return (__int64_t)be32_to_cpu(k2->alloc.ar_startblock) -
+			  be32_to_cpu(k1->alloc.ar_startblock);
+}
+
+STATIC __int64_t
+xfs_cntbt_diff_two_keys(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_key	*k1,
+	union xfs_btree_key	*k2)
+{
+	return (__int64_t)be32_to_cpu(k2->alloc.ar_blockcount) -
+			  be32_to_cpu(k1->alloc.ar_blockcount);
+}
+
 static bool
 xfs_allocbt_verify(
 	struct xfs_buf		*bp)
@@ -342,7 +362,6 @@ const struct xfs_buf_ops xfs_allocbt_buf_ops = {
 };
 
 
-#if defined(DEBUG) || defined(XFS_WARN)
 STATIC int
 xfs_allocbt_keys_inorder(
 	struct xfs_btree_cur	*cur,
@@ -379,9 +398,29 @@ xfs_allocbt_recs_inorder(
 			 be32_to_cpu(r2->alloc.ar_startblock));
 	}
 }
-#endif	/* DEBUG */
 
-static const struct xfs_btree_ops xfs_allocbt_ops = {
+static const struct xfs_btree_ops xfs_bnobt_ops = {
+	.rec_len		= sizeof(xfs_alloc_rec_t),
+	.key_len		= sizeof(xfs_alloc_key_t),
+
+	.dup_cursor		= xfs_allocbt_dup_cursor,
+	.set_root		= xfs_allocbt_set_root,
+	.alloc_block		= xfs_allocbt_alloc_block,
+	.free_block		= xfs_allocbt_free_block,
+	.update_lastrec		= xfs_allocbt_update_lastrec,
+	.get_minrecs		= xfs_allocbt_get_minrecs,
+	.get_maxrecs		= xfs_allocbt_get_maxrecs,
+	.init_key_from_rec	= xfs_allocbt_init_key_from_rec,
+	.init_rec_from_cur	= xfs_allocbt_init_rec_from_cur,
+	.init_ptr_from_cur	= xfs_allocbt_init_ptr_from_cur,
+	.key_diff		= xfs_allocbt_key_diff,
+	.buf_ops		= &xfs_allocbt_buf_ops,
+	.diff_two_keys		= xfs_bnobt_diff_two_keys,
+	.keys_inorder		= xfs_allocbt_keys_inorder,
+	.recs_inorder		= xfs_allocbt_recs_inorder,
+};
+
+static const struct xfs_btree_ops xfs_cntbt_ops = {
 	.rec_len		= sizeof(xfs_alloc_rec_t),
 	.key_len		= sizeof(xfs_alloc_key_t),
 
@@ -397,10 +436,9 @@ static const struct xfs_btree_ops xfs_allocbt_ops = {
 	.init_ptr_from_cur	= xfs_allocbt_init_ptr_from_cur,
 	.key_diff		= xfs_allocbt_key_diff,
 	.buf_ops		= &xfs_allocbt_buf_ops,
-#if defined(DEBUG) || defined(XFS_WARN)
+	.diff_two_keys		= xfs_cntbt_diff_two_keys,
 	.keys_inorder		= xfs_allocbt_keys_inorder,
 	.recs_inorder		= xfs_allocbt_recs_inorder,
-#endif
 };
 
 /*
@@ -425,12 +463,13 @@ xfs_allocbt_init_cursor(
 	cur->bc_mp = mp;
 	cur->bc_btnum = btnum;
 	cur->bc_blocklog = mp->m_sb.sb_blocklog;
-	cur->bc_ops = &xfs_allocbt_ops;
 
 	if (btnum == XFS_BTNUM_CNT) {
+		cur->bc_ops = &xfs_cntbt_ops;
 		cur->bc_nlevels = be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNT]);
 		cur->bc_flags = XFS_BTREE_LASTREC_UPDATE;
 	} else {
+		cur->bc_ops = &xfs_bnobt_ops;
 		cur->bc_nlevels = be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNO]);
 	}
 


* [PATCH 141/145] xfs: support scrubbing inode btrees
From: Darrick J. Wong @ 2016-06-17  1:45 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Plumb in the pieces necessary to check the inode btrees.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
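
A note for reviewers, not part of the commit: the sparse-inode half of
xfs_iallocbt_scrub_helper() relies on each of the 16 holemask bits covering 4
inodes of a 64-inode chunk, with a set bit marking a hole.  The sketch below
models just that accounting.  The function names are hypothetical, and the
constants mirror XFS_INODES_PER_CHUNK, XFS_INOBT_HOLEMASK_BITS and
XFS_INODES_PER_HOLEMASK_BIT.

```c
#include <stdbool.h>
#include <stdint.h>

#define INODES_PER_CHUNK	64	/* mirrors XFS_INODES_PER_CHUNK */
#define HOLEMASK_BITS		16	/* mirrors XFS_INOBT_HOLEMASK_BITS */
#define INODES_PER_HOLEMASK_BIT	(INODES_PER_CHUNK / HOLEMASK_BITS)

/* Number of inodes covered by set (hole) bits in the holemask. */
static int
holemask_holecount(uint16_t holemask)
{
	int	holecount = 0;
	int	i;

	for (i = 0; i < HOLEMASK_BITS; i++, holemask >>= 1) {
		if (holemask & 1)
			holecount += INODES_PER_HOLEMASK_BIT;
	}
	return holecount;
}

/* Holes plus allocated inodes must exactly fill the chunk. */
static bool
sparse_counts_ok(uint16_t holemask, int count, int freecount)
{
	if (freecount > count)
		return false;
	return holemask_holecount(holemask) + count == INODES_PER_CHUNK;
}
```

With holemask 0x00ff the low half of the chunk is a hole, so a consistent
record has ir_count == 32.
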
 libxfs/xfs_ialloc.c       |  178 +++++++++++++++++++++++++++++++++++++++++----
 libxfs/xfs_ialloc.h       |    2 +
 libxfs/xfs_ialloc_btree.c |   18 +++--
 3 files changed, 176 insertions(+), 22 deletions(-)


diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index 8c2344c..d0fb2db 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -34,6 +34,8 @@
 #include "xfs_cksum.h"
 #include "xfs_trans.h"
 #include "xfs_trace.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_scrub.h"
 
 
 /*
@@ -92,24 +94,14 @@ xfs_inobt_update(
 	return xfs_btree_update(cur, &rec);
 }
 
-/*
- * Get the data from the pointed-to record.
- */
-int					/* error */
-xfs_inobt_get_rec(
-	struct xfs_btree_cur	*cur,	/* btree cursor */
-	xfs_inobt_rec_incore_t	*irec,	/* btree record */
-	int			*stat)	/* output: success/failure */
+STATIC void
+xfs_inobt_btrec_to_irec(
+	struct xfs_mount		*mp,
+	union xfs_btree_rec		*rec,
+	struct xfs_inobt_rec_incore	*irec)
 {
-	union xfs_btree_rec	*rec;
-	int			error;
-
-	error = xfs_btree_get_rec(cur, &rec, stat);
-	if (error || *stat == 0)
-		return error;
-
 	irec->ir_startino = be32_to_cpu(rec->inobt.ir_startino);
-	if (xfs_sb_version_hassparseinodes(&cur->bc_mp->m_sb)) {
+	if (xfs_sb_version_hassparseinodes(&mp->m_sb)) {
 		irec->ir_holemask = be16_to_cpu(rec->inobt.ir_u.sp.ir_holemask);
 		irec->ir_count = rec->inobt.ir_u.sp.ir_count;
 		irec->ir_freecount = rec->inobt.ir_u.sp.ir_freecount;
@@ -124,6 +116,25 @@ xfs_inobt_get_rec(
 				be32_to_cpu(rec->inobt.ir_u.f.ir_freecount);
 	}
 	irec->ir_free = be64_to_cpu(rec->inobt.ir_free);
+}
+
+/*
+ * Get the data from the pointed-to record.
+ */
+int					/* error */
+xfs_inobt_get_rec(
+	struct xfs_btree_cur	*cur,	/* btree cursor */
+	xfs_inobt_rec_incore_t	*irec,	/* btree record */
+	int			*stat)	/* output: success/failure */
+{
+	union xfs_btree_rec	*rec;
+	int			error;
+
+	error = xfs_btree_get_rec(cur, &rec, stat);
+	if (error || *stat == 0)
+		return error;
+
+	xfs_inobt_btrec_to_irec(cur->bc_mp, rec, irec);
 
 	return 0;
 }
@@ -2644,3 +2655,138 @@ xfs_ialloc_pagi_init(
 		xfs_trans_brelse(tp, bp);
 	return 0;
 }
+
+STATIC int
+xfs_iallocbt_scrub_helper(
+	struct xfs_btree_scrub		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_inobt_rec_incore	irec;
+	__uint16_t			holemask;
+	xfs_agino_t			agino;
+	xfs_agblock_t			bno;
+	xfs_extlen_t			len;
+	int				holecount;
+	int				i;
+	bool				has_rmap = false;
+	struct xfs_owner_info		oinfo;
+	int				error = 0;
+	uint64_t			holes;
+
+	xfs_inobt_btrec_to_irec(mp, rec, &irec);
+
+	XFS_BTREC_SCRUB_CHECK(bs, irec.ir_count <= XFS_INODES_PER_CHUNK);
+	XFS_BTREC_SCRUB_CHECK(bs, irec.ir_freecount <= XFS_INODES_PER_CHUNK);
+	agino = irec.ir_startino;
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INODES);
+
+	/* Handle non-sparse inodes */
+	if (!xfs_inobt_issparse(irec.ir_holemask)) {
+		len = XFS_B_TO_FSB(mp,
+				XFS_INODES_PER_CHUNK * mp->m_sb.sb_inodesize);
+		bno = XFS_AGINO_TO_AGBNO(mp, agino);
+
+		XFS_BTREC_SCRUB_CHECK(bs, bno < mp->m_sb.sb_agblocks);
+		XFS_BTREC_SCRUB_CHECK(bs, bno < bno + len);
+		XFS_BTREC_SCRUB_CHECK(bs, (unsigned long long)bno + len <=
+				mp->m_sb.sb_agblocks);
+
+		if (!bs->rmap_cur)
+			return error;
+		error = xfs_rmap_record_exists(bs->rmap_cur, bno, len, &oinfo,
+				&has_rmap);
+		if (error)
+			return error;
+		XFS_BTREC_SCRUB_CHECK(bs, has_rmap);
+		return 0;
+	}
+
+	/* Check each chunk of a sparse inode cluster. */
+	holemask = irec.ir_holemask;
+	holecount = 0;
+	len = XFS_B_TO_FSB(mp,
+			XFS_INODES_PER_HOLEMASK_BIT * mp->m_sb.sb_inodesize);
+	holes = ~xfs_inobt_irec_to_allocmask(&irec);
+	XFS_BTREC_SCRUB_CHECK(bs, (holes & irec.ir_free) == holes);
+	XFS_BTREC_SCRUB_CHECK(bs, irec.ir_freecount <= irec.ir_count);
+
+	for (i = 0; i < XFS_INOBT_HOLEMASK_BITS; holemask >>= 1,
+			i++, agino += XFS_INODES_PER_HOLEMASK_BIT) {
+		if (holemask & 1) {
+			holecount += XFS_INODES_PER_HOLEMASK_BIT;
+			continue;
+		}
+		bno = XFS_AGINO_TO_AGBNO(mp, agino);
+
+		XFS_BTREC_SCRUB_CHECK(bs, bno < mp->m_sb.sb_agblocks);
+		XFS_BTREC_SCRUB_CHECK(bs, bno < bno + len);
+		XFS_BTREC_SCRUB_CHECK(bs, (unsigned long long)bno + len <=
+				mp->m_sb.sb_agblocks);
+
+		if (!bs->rmap_cur)
+			continue;
+		error = xfs_rmap_record_exists(bs->rmap_cur, bno, len, &oinfo,
+				&has_rmap);
+		if (error)
+			break;
+		XFS_BTREC_SCRUB_CHECK(bs, has_rmap);
+	}
+
+	XFS_BTREC_SCRUB_CHECK(bs, holecount <= XFS_INODES_PER_CHUNK);
+	XFS_BTREC_SCRUB_CHECK(bs, holecount + irec.ir_count ==
+			XFS_INODES_PER_CHUNK);
+
+	return error;
+}
+
+/* Scrub the inode btrees for some AG. */
+STATIC int
+xfs_iallocbt_scrub(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_btnum_t		which)
+{
+	struct xfs_btree_scrub	bs;
+	int			error;
+
+	error = xfs_ialloc_read_agi(mp, NULL, agno, &bs.agi_bp);
+	if (error)
+		return error;
+
+	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &bs.agf_bp);
+	if (error) {
+		xfs_trans_brelse(NULL, bs.agi_bp);
+		return error;
+	}
+
+	bs.cur = xfs_inobt_init_cursor(mp, NULL, bs.agi_bp, agno, which);
+	bs.scrub_rec = xfs_iallocbt_scrub_helper;
+	xfs_rmap_ag_owner(&bs.oinfo, XFS_RMAP_OWN_INOBT);
+	error = xfs_btree_scrub(&bs);
+	xfs_btree_del_cursor(bs.cur,
+			error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
+	xfs_trans_brelse(NULL, bs.agf_bp);
+	xfs_trans_brelse(NULL, bs.agi_bp);
+
+	if (!error && bs.error)
+		error = bs.error;
+
+	return error;
+}
+
+int
+xfs_inobt_scrub(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	return xfs_iallocbt_scrub(mp, agno, XFS_BTNUM_INO);
+}
+
+int
+xfs_finobt_scrub(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	return xfs_iallocbt_scrub(mp, agno, XFS_BTNUM_FINO);
+}
diff --git a/libxfs/xfs_ialloc.h b/libxfs/xfs_ialloc.h
index 0bb8966..7ea6ff3 100644
--- a/libxfs/xfs_ialloc.h
+++ b/libxfs/xfs_ialloc.h
@@ -168,5 +168,7 @@ int xfs_ialloc_inode_init(struct xfs_mount *mp, struct xfs_trans *tp,
 int xfs_read_agi(struct xfs_mount *mp, struct xfs_trans *tp,
 		xfs_agnumber_t agno, struct xfs_buf **bpp);
 
+extern int xfs_inobt_scrub(struct xfs_mount *mp, xfs_agnumber_t agno);
+extern int xfs_finobt_scrub(struct xfs_mount *mp, xfs_agnumber_t agno);
 
 #endif	/* __XFS_IALLOC_H__ */
diff --git a/libxfs/xfs_ialloc_btree.c b/libxfs/xfs_ialloc_btree.c
index 36d09eb..81a7b97 100644
--- a/libxfs/xfs_ialloc_btree.c
+++ b/libxfs/xfs_ialloc_btree.c
@@ -203,6 +203,16 @@ xfs_inobt_key_diff(
 			  cur->bc_rec.i.ir_startino;
 }
 
+STATIC __int64_t
+xfs_inobt_diff_two_keys(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_key	*k1,
+	union xfs_btree_key	*k2)
+{
+	return (__int64_t)be32_to_cpu(k2->inobt.ir_startino) -
+			  be32_to_cpu(k1->inobt.ir_startino);
+}
+
 static int
 xfs_inobt_verify(
 	struct xfs_buf		*bp)
@@ -277,7 +287,6 @@ const struct xfs_buf_ops xfs_inobt_buf_ops = {
 	.verify_write = xfs_inobt_write_verify,
 };
 
-#if defined(DEBUG) || defined(XFS_WARN)
 STATIC int
 xfs_inobt_keys_inorder(
 	struct xfs_btree_cur	*cur,
@@ -297,7 +306,6 @@ xfs_inobt_recs_inorder(
 	return be32_to_cpu(r1->inobt.ir_startino) + XFS_INODES_PER_CHUNK <=
 		be32_to_cpu(r2->inobt.ir_startino);
 }
-#endif	/* DEBUG */
 
 static const struct xfs_btree_ops xfs_inobt_ops = {
 	.rec_len		= sizeof(xfs_inobt_rec_t),
@@ -314,10 +322,9 @@ static const struct xfs_btree_ops xfs_inobt_ops = {
 	.init_ptr_from_cur	= xfs_inobt_init_ptr_from_cur,
 	.key_diff		= xfs_inobt_key_diff,
 	.buf_ops		= &xfs_inobt_buf_ops,
-#if defined(DEBUG) || defined(XFS_WARN)
+	.diff_two_keys		= xfs_inobt_diff_two_keys,
 	.keys_inorder		= xfs_inobt_keys_inorder,
 	.recs_inorder		= xfs_inobt_recs_inorder,
-#endif
 };
 
 static const struct xfs_btree_ops xfs_finobt_ops = {
@@ -335,10 +342,9 @@ static const struct xfs_btree_ops xfs_finobt_ops = {
 	.init_ptr_from_cur	= xfs_finobt_init_ptr_from_cur,
 	.key_diff		= xfs_inobt_key_diff,
 	.buf_ops		= &xfs_inobt_buf_ops,
-#if defined(DEBUG) || defined(XFS_WARN)
+	.diff_two_keys		= xfs_inobt_diff_two_keys,
 	.keys_inorder		= xfs_inobt_keys_inorder,
 	.recs_inorder		= xfs_inobt_recs_inorder,
-#endif
 };
 
 /*


* [PATCH 142/145] xfs: support scrubbing rmap btree
From: Darrick J. Wong @ 2016-06-17  1:45 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Plumb in the pieces necessary to check the rmap btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
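
A note for reviewers, not part of the commit: the flag sanity rules that
xfs_rmapbt_scrub_helper() applies to each record can be read as the predicate
sketched below.  rmap_flags_ok() is a hypothetical name, the RMAP_* bit values
are illustrative only (the real definitions live in the libxfs headers), and
non_inode stands in for the result of XFS_RMAP_NON_INODE_OWNER().

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; the real values live in the libxfs headers. */
#define RMAP_ATTR_FORK	(1u << 0)
#define RMAP_BMBT_BLOCK	(1u << 1)
#define RMAP_UNWRITTEN	(1u << 2)

static bool
rmap_flags_ok(bool non_inode, uint64_t offset, unsigned int flags)
{
	bool	is_attr = flags & RMAP_ATTR_FORK;
	bool	is_bmbt = flags & RMAP_BMBT_BLOCK;
	bool	is_unwritten = flags & RMAP_UNWRITTEN;

	/* bmbt blocks and non-inode owners must have a zero offset */
	if ((is_bmbt || non_inode) && offset != 0)
		return false;
	/* unwritten cannot combine with bmbt, attr fork or non-inode owners */
	if (is_unwritten && (is_bmbt || non_inode || is_attr))
		return false;
	/* non-inode owners carry none of the three flags */
	if (non_inode && (is_bmbt || is_unwritten || is_attr))
		return false;
	return true;
}
```

For example, an unwritten data-fork extent at offset 100 passes, while a bmbt
block with a nonzero offset or a non-inode owner with the attr-fork flag set
does not.
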
 libxfs/xfs_rmap.c       |   77 +++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rmap_btree.c |    4 --
 libxfs/xfs_rmap_btree.h |    2 +
 3 files changed, 79 insertions(+), 4 deletions(-)


diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index 3ec9a35..4e0e472 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -36,6 +36,7 @@
 #include "xfs_trace.h"
 #include "xfs_bmap.h"
 #include "xfs_inode.h"
+#include "xfs_scrub.h"
 
 /*
  * Lookup the first record less than or equal to [bno, len, owner, offset]
@@ -2365,3 +2366,79 @@ xfs_rmap_record_exists(
 		     irec.rm_startblock + irec.rm_blockcount >= bno + len);
 	return 0;
 }
+
+STATIC int
+xfs_rmapbt_scrub_helper(
+	struct xfs_btree_scrub		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_rmap_irec		irec;
+	bool				is_freesp;
+	bool				non_inode;
+	bool				is_unwritten;
+	bool				is_bmbt;
+	bool				is_attr;
+	int				error;
+
+	error = xfs_rmapbt_btrec_to_irec(rec, &irec);
+	if (error)
+		return error;
+
+	XFS_BTREC_SCRUB_CHECK(bs, irec.rm_startblock < mp->m_sb.sb_agblocks);
+	XFS_BTREC_SCRUB_CHECK(bs, irec.rm_startblock < irec.rm_startblock +
+			irec.rm_blockcount);
+	XFS_BTREC_SCRUB_CHECK(bs, (unsigned long long)irec.rm_startblock +
+			irec.rm_blockcount <= mp->m_sb.sb_agblocks);
+
+	non_inode = XFS_RMAP_NON_INODE_OWNER(irec.rm_owner);
+	is_bmbt = irec.rm_flags & XFS_RMAP_BMBT_BLOCK;
+	is_attr = irec.rm_flags & XFS_RMAP_ATTR_FORK;
+	is_unwritten = irec.rm_flags & XFS_RMAP_UNWRITTEN;
+
+	XFS_BTREC_SCRUB_CHECK(bs, !is_bmbt || irec.rm_offset == 0);
+	XFS_BTREC_SCRUB_CHECK(bs, !non_inode || irec.rm_offset == 0);
+	XFS_BTREC_SCRUB_CHECK(bs, !is_unwritten || !(is_bmbt || non_inode ||
+			is_attr));
+	XFS_BTREC_SCRUB_CHECK(bs, !non_inode || !(is_bmbt || is_unwritten ||
+			is_attr));
+
+	/* check there's no record in freesp btrees */
+	error = xfs_alloc_record_exists(bs->bno_cur, irec.rm_startblock,
+			irec.rm_blockcount, &is_freesp);
+	if (error)
+		goto err;
+	XFS_BTREC_SCRUB_CHECK(bs, !is_freesp);
+
+	/* XXX: check with the owner */
+
+err:
+	return error;
+}
+
+/* Scrub the rmap btree for some AG. */
+int
+xfs_rmapbt_scrub(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_btree_scrub	bs;
+	int			error;
+
+	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &bs.agf_bp);
+	if (error)
+		return error;
+
+	bs.cur = xfs_rmapbt_init_cursor(mp, NULL, bs.agf_bp, agno);
+	bs.scrub_rec = xfs_rmapbt_scrub_helper;
+	xfs_rmap_ag_owner(&bs.oinfo, XFS_RMAP_OWN_AG);
+	error = xfs_btree_scrub(&bs);
+	xfs_btree_del_cursor(bs.cur,
+			error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
+	xfs_trans_brelse(NULL, bs.agf_bp);
+
+	if (!error && bs.error)
+		error = bs.error;
+
+	return error;
+}
diff --git a/libxfs/xfs_rmap_btree.c b/libxfs/xfs_rmap_btree.c
index 0b7da82..e592833 100644
--- a/libxfs/xfs_rmap_btree.c
+++ b/libxfs/xfs_rmap_btree.c
@@ -370,7 +370,6 @@ const struct xfs_buf_ops xfs_rmapbt_buf_ops = {
 	.verify_write		= xfs_rmapbt_write_verify,
 };
 
-#if defined(DEBUG) || defined(XFS_WARN)
 STATIC int
 xfs_rmapbt_keys_inorder(
 	struct xfs_btree_cur	*cur,
@@ -406,7 +405,6 @@ xfs_rmapbt_recs_inorder(
 		return 1;
 	return 0;
 }
-#endif	/* DEBUG */
 
 static const struct xfs_btree_ops xfs_rmapbt_ops = {
 	.rec_len		= sizeof(struct xfs_rmap_rec),
@@ -426,10 +424,8 @@ static const struct xfs_btree_ops xfs_rmapbt_ops = {
 	.key_diff		= xfs_rmapbt_key_diff,
 	.buf_ops		= &xfs_rmapbt_buf_ops,
 	.diff_two_keys		= xfs_rmapbt_diff_two_keys,
-#if defined(DEBUG) || defined(XFS_WARN)
 	.keys_inorder		= xfs_rmapbt_keys_inorder,
 	.recs_inorder		= xfs_rmapbt_recs_inorder,
-#endif
 };
 
 /*
diff --git a/libxfs/xfs_rmap_btree.h b/libxfs/xfs_rmap_btree.h
index 2f072c8..3f8742d 100644
--- a/libxfs/xfs_rmap_btree.h
+++ b/libxfs/xfs_rmap_btree.h
@@ -147,4 +147,6 @@ extern int xfs_rmapbt_calc_reserves(struct xfs_mount *mp,
 extern int xfs_rmap_record_exists(struct xfs_btree_cur *cur, xfs_agblock_t bno,
 		xfs_extlen_t len, struct xfs_owner_info *oinfo, bool *has_rmap);
 
+int xfs_rmapbt_scrub(struct xfs_mount *mp, xfs_agnumber_t agno);
+
 #endif	/* __XFS_RMAP_BTREE_H__ */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 143/145] xfs: support scrubbing refcount btree
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (141 preceding siblings ...)
  2016-06-17  1:45 ` [PATCH 142/145] xfs: support scrubbing rmap btree Darrick J. Wong
@ 2016-06-17  1:45 ` Darrick J. Wong
  2016-06-17  1:45 ` [PATCH 144/145] xfs: add btree scrub tracepoints Darrick J. Wong
  2016-06-17  1:46 ` [PATCH 145/145] xfs_scrub: create online filesystem scrub program Darrick J. Wong
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:45 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Plumb in the pieces necessary to check the refcount btree.  If rmap is
available, check the reference count by performing an interval query
against the rmapbt.

v2: Handle the case where the rmap records are not all at least the
length of the refcount extent.
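
As a rough illustration of the invariant being checked (not the code
in this patch, which walks a worklist of rmap fragments instead), the
sketch below brute-forces the same question: is every block of a
refcount extent covered by exactly rc_refcount reverse mappings?  The
demo_* types and functions are hypothetical stand-ins for the real
records, and overlapping rmaps are assumed to already be collected in
an array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the on-disk record types. */
struct demo_rmap {
	uint32_t	startblock;	/* first AG block mapped */
	uint32_t	blockcount;	/* number of blocks mapped */
};

struct demo_refcount {
	uint32_t	startblock;	/* first AG block of the extent */
	uint32_t	blockcount;	/* length of the extent */
	uint32_t	refcount;	/* expected number of owners */
};

/* Count the rmaps that cover one AG block. */
static uint32_t
demo_owners_of_block(
	const struct demo_rmap	*rmaps,
	size_t			nr_rmaps,
	uint32_t		agbno)
{
	uint32_t		owners = 0;
	size_t			i;

	for (i = 0; i < nr_rmaps; i++) {
		/* Widen before adding so the end cannot wrap. */
		uint64_t	end = (uint64_t)rmaps[i].startblock +
				      rmaps[i].blockcount;

		if (rmaps[i].startblock <= agbno && agbno < end)
			owners++;
	}
	return owners;
}

/*
 * Return true if every block of the refcount extent is covered by
 * exactly rc->refcount rmaps, whether each rmap spans the whole
 * extent or several shorter rmaps line up end to end.
 */
bool
demo_refcount_matches_rmaps(
	const struct demo_refcount	*rc,
	const struct demo_rmap		*rmaps,
	size_t				nr_rmaps)
{
	uint32_t			i;

	for (i = 0; i < rc->blockcount; i++) {
		if (demo_owners_of_block(rmaps, nr_rmaps,
				rc->startblock + i) != rc->refcount)
			return false;
	}
	return true;
}
```

The per-block loop is quadratic and only meant to make the v2 case
concrete: a refcount extent of blocks 10-13 with refcount 2 is
satisfied by one rmap covering 10-13 plus two shorter rmaps covering
9-11 and 12-16, even though neither short rmap is as long as the
refcount extent.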

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_refcount.c       |  224 +++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_refcount.h       |    2 
 libxfs/xfs_refcount_btree.c |   16 ++-
 3 files changed, 238 insertions(+), 4 deletions(-)


diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index a19cb45..760ad7e 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -36,6 +36,7 @@
 #include "xfs_bit.h"
 #include "xfs_refcount.h"
 #include "xfs_rmap_btree.h"
+#include "xfs_scrub.h"
 
 /* Allowable refcount adjustment amounts. */
 enum xfs_refc_adjust_op {
@@ -1577,3 +1578,226 @@ xfs_refcount_free_cow_extent(
 
 	return __xfs_refcount_add(mp, dfops, &ri);
 }
+
+struct xfs_refcountbt_scrub_fragment {
+	struct xfs_rmap_irec		rm;
+	struct list_head		list;
+};
+
+struct xfs_refcountbt_scrub_rmap_check_info {
+	xfs_nlink_t			nr;
+	struct xfs_refcount_irec	rc;
+	struct list_head		fragments;
+};
+
+static int
+xfs_refcountbt_scrub_rmap_check(
+	struct xfs_btree_cur		*cur,
+	struct xfs_rmap_irec		*rec,
+	void				*priv)
+{
+	struct xfs_refcountbt_scrub_rmap_check_info	*rsrci = priv;
+	struct xfs_refcountbt_scrub_fragment		*frag;
+	xfs_agblock_t			rm_last;
+	xfs_agblock_t			rc_last;
+
+	rm_last = rec->rm_startblock + rec->rm_blockcount;
+	rc_last = rsrci->rc.rc_startblock + rsrci->rc.rc_blockcount;
+	if (rec->rm_startblock <= rsrci->rc.rc_startblock && rm_last >= rc_last)
+		rsrci->nr++;
+	else {
+		frag = kmem_zalloc(sizeof(struct xfs_refcountbt_scrub_fragment),
+				KM_SLEEP);
+		frag->rm = *rec;
+		list_add_tail(&frag->list, &rsrci->fragments);
+	}
+
+	return 0;
+}
+
+STATIC void
+xfs_refcountbt_process_rmap_fragments(
+	struct xfs_mount				*mp,
+	struct xfs_refcountbt_scrub_rmap_check_info	*rsrci)
+{
+	struct list_head				worklist;
+	struct xfs_refcountbt_scrub_fragment		*cur;
+	struct xfs_refcountbt_scrub_fragment		*n;
+	xfs_agblock_t					bno;
+	xfs_agblock_t					rbno;
+	xfs_agblock_t					next_rbno;
+	xfs_nlink_t					nr;
+	xfs_nlink_t					target_nr;
+
+	target_nr = rsrci->rc.rc_refcount - rsrci->nr;
+	if (target_nr == 0)
+		return;
+
+	/*
+	 * There are (rsrci->rc.rc_refcount - rsrci->nr)
+	 * references we haven't found yet.  Pull that many off the
+	 * fragment list and figure out where the smallest rmap ends
+	 * (and therefore the next rmap should start).  All the rmaps
+	 * we pull off should start at or before the beginning of the
+	 * refcount record's range.
+	 */
+	INIT_LIST_HEAD(&worklist);
+	rbno = NULLAGBLOCK;
+	nr = 1;
+	list_for_each_entry_safe(cur, n, &rsrci->fragments, list) {
+		if (cur->rm.rm_startblock > rsrci->rc.rc_startblock)
+			goto fail;
+		bno = cur->rm.rm_startblock + cur->rm.rm_blockcount;
+		if (rbno > bno)
+			rbno = bno;
+		list_del(&cur->list);
+		list_add_tail(&cur->list, &worklist);
+		if (nr == target_nr)
+			break;
+		nr++;
+	}
+
+	if (nr != target_nr)
+		goto fail;
+
+	while (!list_empty(&rsrci->fragments)) {
+		/* Discard any fragments ending at rbno. */
+		nr = 0;
+		next_rbno = NULLAGBLOCK;
+		list_for_each_entry_safe(cur, n, &worklist, list) {
+			bno = cur->rm.rm_startblock + cur->rm.rm_blockcount;
+			if (bno != rbno) {
+				if (next_rbno > bno)
+					next_rbno = bno;
+				continue;
+			}
+			list_del(&cur->list);
+			kmem_free(cur);
+			nr++;
+		}
+
+		/* Empty list?  We're done. */
+		if (list_empty(&rsrci->fragments))
+			break;
+
+		/* Try to add nr rmaps starting at rbno to the worklist. */
+		list_for_each_entry_safe(cur, n, &rsrci->fragments, list) {
+			bno = cur->rm.rm_startblock + cur->rm.rm_blockcount;
+			if (cur->rm.rm_startblock != rbno)
+				goto fail;
+			list_del(&cur->list);
+			list_add_tail(&cur->list, &worklist);
+			if (next_rbno > bno)
+				next_rbno = bno;
+			nr--;
+			if (nr == 0)
+				break;
+		}
+
+		rbno = next_rbno;
+	}
+
+	/*
+	 * Make sure the last extent we processed ends at or beyond
+	 * the end of the refcount extent.
+	 */
+	if (rbno < rsrci->rc.rc_startblock + rsrci->rc.rc_blockcount)
+		goto fail;
+
+	rsrci->nr = rsrci->rc.rc_refcount;
+fail:
+	/* Delete fragments and work list. */
+	while (!list_empty(&worklist)) {
+		cur = list_first_entry(&worklist,
+				struct xfs_refcountbt_scrub_fragment, list);
+		list_del(&cur->list);
+		kmem_free(cur);
+	}
+	while (!list_empty(&rsrci->fragments)) {
+		cur = list_first_entry(&rsrci->fragments,
+				struct xfs_refcountbt_scrub_fragment, list);
+		list_del(&cur->list);
+		kmem_free(cur);
+	}
+}
+
+STATIC int
+xfs_refcountbt_scrub_helper(
+	struct xfs_btree_scrub		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_rmap_irec		low;
+	struct xfs_rmap_irec		high;
+	struct xfs_refcount_irec	irec;
+	struct xfs_refcountbt_scrub_rmap_check_info	rsrci;
+	struct xfs_refcountbt_scrub_fragment		*cur;
+	int				error;
+
+	irec.rc_startblock = be32_to_cpu(rec->refc.rc_startblock);
+	irec.rc_blockcount = be32_to_cpu(rec->refc.rc_blockcount);
+	irec.rc_refcount = be32_to_cpu(rec->refc.rc_refcount);
+
+	XFS_BTREC_SCRUB_CHECK(bs, irec.rc_startblock < mp->m_sb.sb_agblocks);
+	XFS_BTREC_SCRUB_CHECK(bs, irec.rc_startblock < irec.rc_startblock +
+			irec.rc_blockcount);
+	XFS_BTREC_SCRUB_CHECK(bs, (unsigned long long)irec.rc_startblock +
+			irec.rc_blockcount <= mp->m_sb.sb_agblocks);
+	XFS_BTREC_SCRUB_CHECK(bs, irec.rc_refcount >= 1);
+
+	/* confirm the refcount */
+	if (!bs->rmap_cur)
+		return 0;
+
+	memset(&low, 0, sizeof(low));
+	low.rm_startblock = irec.rc_startblock;
+	memset(&high, 0xFF, sizeof(high));
+	high.rm_startblock = irec.rc_startblock + irec.rc_blockcount - 1;
+
+	rsrci.nr = 0;
+	rsrci.rc = irec;
+	INIT_LIST_HEAD(&rsrci.fragments);
+	error = xfs_rmapbt_query_range(bs->rmap_cur, &low, &high,
+			&xfs_refcountbt_scrub_rmap_check, &rsrci);
+	if (error && error != XFS_BTREE_QUERY_RANGE_ABORT)
+		goto err;
+	error = 0;
+	xfs_refcountbt_process_rmap_fragments(mp, &rsrci);
+	XFS_BTREC_SCRUB_CHECK(bs, irec.rc_refcount == rsrci.nr);
+
+err:
+	while (!list_empty(&rsrci.fragments)) {
+		cur = list_first_entry(&rsrci.fragments,
+				struct xfs_refcountbt_scrub_fragment, list);
+		list_del(&cur->list);
+		kmem_free(cur);
+	}
+	return error;
+}
+
+/* Scrub the refcount btree for some AG. */
+int
+xfs_refcountbt_scrub(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_btree_scrub	bs;
+	int			error;
+
+	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &bs.agf_bp);
+	if (error)
+		return error;
+
+	bs.cur = xfs_refcountbt_init_cursor(mp, NULL, bs.agf_bp, agno, NULL);
+	bs.scrub_rec = xfs_refcountbt_scrub_helper;
+	xfs_rmap_ag_owner(&bs.oinfo, XFS_RMAP_OWN_REFC);
+	error = xfs_btree_scrub(&bs);
+	xfs_btree_del_cursor(bs.cur,
+			error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
+	xfs_trans_brelse(NULL, bs.agf_bp);
+
+	if (!error && bs.error)
+		error = bs.error;
+
+	return error;
+}
diff --git a/libxfs/xfs_refcount.h b/libxfs/xfs_refcount.h
index 44b0346..d2317f1 100644
--- a/libxfs/xfs_refcount.h
+++ b/libxfs/xfs_refcount.h
@@ -68,4 +68,6 @@ extern int xfs_refcount_free_cow_extent(struct xfs_mount *mp,
 		struct xfs_defer_ops *dfops, xfs_fsblock_t fsb,
 		xfs_extlen_t len);
 
+extern int xfs_refcountbt_scrub(struct xfs_mount *mp, xfs_agnumber_t agno);
+
 #endif	/* __XFS_REFCOUNT_H__ */
diff --git a/libxfs/xfs_refcount_btree.c b/libxfs/xfs_refcount_btree.c
index 1b3ba07..3cd30d0 100644
--- a/libxfs/xfs_refcount_btree.c
+++ b/libxfs/xfs_refcount_btree.c
@@ -196,6 +196,16 @@ xfs_refcountbt_key_diff(
 	return (__int64_t)be32_to_cpu(kp->rc_startblock) - rec->rc_startblock;
 }
 
+STATIC __int64_t
+xfs_refcountbt_diff_two_keys(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_key	*k1,
+	union xfs_btree_key	*k2)
+{
+	return (__int64_t)be32_to_cpu(k2->refc.rc_startblock) -
+			  be32_to_cpu(k1->refc.rc_startblock);
+}
+
 STATIC bool
 xfs_refcountbt_verify(
 	struct xfs_buf		*bp)
@@ -258,7 +268,6 @@ const struct xfs_buf_ops xfs_refcountbt_buf_ops = {
 	.verify_write		= xfs_refcountbt_write_verify,
 };
 
-#if defined(DEBUG) || defined(XFS_WARN)
 STATIC int
 xfs_refcountbt_keys_inorder(
 	struct xfs_btree_cur	*cur,
@@ -287,13 +296,13 @@ xfs_refcountbt_recs_inorder(
 		b.rc_startblock = be32_to_cpu(r2->refc.rc_startblock);
 		b.rc_blockcount = be32_to_cpu(r2->refc.rc_blockcount);
 		b.rc_refcount = be32_to_cpu(r2->refc.rc_refcount);
+		a = a; b = b;
 		trace_xfs_refcount_rec_order_error(cur->bc_mp,
 				cur->bc_private.a.agno, &a, &b);
 	}
 
 	return ret;
 }
-#endif	/* DEBUG */
 
 static const struct xfs_btree_ops xfs_refcountbt_ops = {
 	.rec_len		= sizeof(struct xfs_refcount_rec),
@@ -310,10 +319,9 @@ static const struct xfs_btree_ops xfs_refcountbt_ops = {
 	.init_ptr_from_cur	= xfs_refcountbt_init_ptr_from_cur,
 	.key_diff		= xfs_refcountbt_key_diff,
 	.buf_ops		= &xfs_refcountbt_buf_ops,
-#if defined(DEBUG) || defined(XFS_WARN)
+	.diff_two_keys		= xfs_refcountbt_diff_two_keys,
 	.keys_inorder		= xfs_refcountbt_keys_inorder,
 	.recs_inorder		= xfs_refcountbt_recs_inorder,
-#endif
 };
 
 /*


* [PATCH 144/145] xfs: add btree scrub tracepoints
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (142 preceding siblings ...)
  2016-06-17  1:45 ` [PATCH 143/145] xfs: support scrubbing refcount btree Darrick J. Wong
@ 2016-06-17  1:45 ` Darrick J. Wong
  2016-06-17  1:46 ` [PATCH 145/145] xfs_scrub: create online filesystem scrub program Darrick J. Wong
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:45 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_trace.h |    3 +++
 libxfs/xfs_scrub.c  |   14 ++++++++++++++
 2 files changed, 17 insertions(+)


diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index ce973ba..206e550 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -276,6 +276,9 @@
 #define trace_xfs_rmap_map_done(...)		((void) 0)
 #define trace_xfs_rmap_map_error(...)		((void) 0)
 
+#define trace_xfs_btree_scrub_key(...)		((void) 0)
+#define trace_xfs_btree_scrub_rec(...)		((void) 0)
+
 /* set c = c to avoid unused var warnings */
 #define trace_xfs_perag_get(a,b,c,d)	((c) = (c))
 #define trace_xfs_perag_get_tag(a,b,c,d) ((c) = (c))
diff --git a/libxfs/xfs_scrub.c b/libxfs/xfs_scrub.c
index 750c482..bd9669d 100644
--- a/libxfs/xfs_scrub.c
+++ b/libxfs/xfs_scrub.c
@@ -34,6 +34,7 @@
 #include "xfs_rmap_btree.h"
 #include "xfs_log_format.h"
 #include "xfs_trans.h"
+#include "xfs_trace.h"
 #include "xfs_scrub.h"
 
 static const char * const btree_types[] = {
@@ -88,6 +89,12 @@ xfs_btree_scrub_rec(
 	struct xfs_btree_block	*block;
 	struct xfs_btree_block	*keyblock;
 
+	trace_xfs_btree_scrub_rec(cur->bc_mp, cur->bc_private.a.agno,
+			XFS_FSB_TO_AGBNO(cur->bc_mp,
+				XFS_DADDR_TO_FSB(cur->bc_mp,
+					cur->bc_bufs[0]->b_bn)),
+			cur->bc_btnum, 0, cur->bc_nlevels, cur->bc_ptrs[0]);
+
 	block = XFS_BUF_TO_BLOCK(cur->bc_bufs[0]);
 	rec = xfs_btree_rec_addr(cur, cur->bc_ptrs[0], block);
 
@@ -135,6 +142,13 @@ xfs_btree_scrub_key(
 	struct xfs_btree_block	*block;
 	struct xfs_btree_block	*keyblock;
 
+	trace_xfs_btree_scrub_key(cur->bc_mp, cur->bc_private.a.agno,
+			XFS_FSB_TO_AGBNO(cur->bc_mp,
+				XFS_DADDR_TO_FSB(cur->bc_mp,
+					cur->bc_bufs[level]->b_bn)),
+			cur->bc_btnum, level, cur->bc_nlevels,
+			cur->bc_ptrs[level]);
+
 	block = XFS_BUF_TO_BLOCK(cur->bc_bufs[level]);
 	key = xfs_btree_key_addr(cur, cur->bc_ptrs[level], block);
 


* [PATCH 145/145] xfs_scrub: create online filesystem scrub program
  2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
                   ` (143 preceding siblings ...)
  2016-06-17  1:45 ` [PATCH 144/145] xfs: add btree scrub tracepoints Darrick J. Wong
@ 2016-06-17  1:46 ` Darrick J. Wong
  144 siblings, 0 replies; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:46 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Create a toy filesystem scrubbing tool that walks the directory tree
and queries every file's extents, extended attributes, and stat data.
For generic (non-XFS) filesystems this depends on the kernel to do
nearly all the validation.  Optionally, we can (try to) read all the
file data.

Future XFS extensions to this program will perform much stronger
metadata checking and cross-referencing.  In the future we might be
able to do things like lock a directory, check the entries and back
pointers, and unlock it; or lock an inode to check the extent map and
cross-reference the entries therein with a reverse-mapping index.

However, this tool /should/ work for most non-XFS filesystems.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 Makefile             |    2 
 man/man8/xfs_scrub.8 |   82 +++++
 scrub/Makefile       |   26 ++
 scrub/generic.c      |  372 +++++++++++++++++++++++
 scrub/scrub.c        |  822 ++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/scrub.h        |  100 ++++++
 scrub/xfs.c          |  241 +++++++++++++++
 7 files changed, 1644 insertions(+), 1 deletion(-)
 create mode 100644 man/man8/xfs_scrub.8
 create mode 100644 scrub/Makefile
 create mode 100644 scrub/generic.c
 create mode 100644 scrub/scrub.c
 create mode 100644 scrub/scrub.h
 create mode 100644 scrub/xfs.c


diff --git a/Makefile b/Makefile
index fca0a42..72334d9 100644
--- a/Makefile
+++ b/Makefile
@@ -46,7 +46,7 @@ HDR_SUBDIRS = include libxfs
 DLIB_SUBDIRS = libxlog libxcmd libhandle
 LIB_SUBDIRS = libxfs $(DLIB_SUBDIRS)
 TOOL_SUBDIRS = copy db estimate fsck fsr growfs io logprint mkfs quota \
-		mdrestore repair rtcp m4 man doc debian
+		mdrestore repair rtcp m4 man doc debian scrub
 
 ifneq ("$(XGETTEXT)","")
 TOOL_SUBDIRS += po
diff --git a/man/man8/xfs_scrub.8 b/man/man8/xfs_scrub.8
new file mode 100644
index 0000000..95d7169
--- /dev/null
+++ b/man/man8/xfs_scrub.8
@@ -0,0 +1,82 @@
+.TH xfs_scrub 8
+.SH NAME
+xfs_scrub \- scrub the contents of an XFS filesystem
+.SH SYNOPSIS
+.B xfs_scrub
+[
+.B \-dvx
+] [
+.B \-t
+.I fstype
+]
+.I mountpoint
+.br
+.B xfs_scrub \-V
+.SH DESCRIPTION
+.B xfs_scrub
+attempts to read and check all the metadata in a Linux filesystem.
+.PP
+If
+.B xfs_scrub
+does not detect an XFS filesystem, it will use a generic backend to
+scrub the filesystem.  This involves walking the directory tree,
+querying the data and extended attribute extent maps, performing
+limited checks of directory and inode data, reading all of an
+inode's extended attributes, and optionally reading all data in
+a file.
+.PP
+If an XFS filesystem is detected, then
+.B xfs_scrub
+will use private XFS ioctls and sysfs interfaces to perform more
+rigorous scrubbing of the internal metadata.  Currently this is
+limited to asking the kernel to check the per-AG btrees, which
+also performs limited cross-referencing.
+.SH OPTIONS
+.TP
+.B \-d
+Enable debugging mode, which augments error reports with the exact file
+and line where the scrub failure occurred.  This also enables verbose
+mode.
+.TP
+.B \-v
+Enable verbose mode, which prints periodic status updates.
+.TP
+.BI \-t " fstype"
+Force the use of a particular type of filesystem scrubber.  Currently
+supported backends are
+.I xfs
+and
+.I generic
+scrubbers.
+.TP
+.B \-V
+Prints the version number and exits.
+.TP
+.B \-x
+Scrub file data.  This reads every block of every file on disk.
+.SH EXIT CODE
+The exit code returned by
+.B xfs_scrub
+is the sum of the following conditions:
+.br
+\	0\	\-\ No errors
+.br
+\	4\	\-\ File system errors left uncorrected
+.br
+\	8\	\-\ Operational error
+.br
+\	16\	\-\ Usage or syntax error
+.br
+.SH CAVEATS
+.B xfs_scrub
+is a very immature utility!  The generic scrub backend walks the directory
+tree, reads file extents and data, and queries every extended attribute it
+can find.  The generic scrub does not grab exclusive locks on the objects
+it is examining, nor does it have any way to cross-reference what it sees
+against the internal filesystem metadata.
+.PP
+The XFS backend will some day learn how to do all those things, but for
+now its only advantage over the generic backend is that it knows how to
+ask the kernel to perform a basic scrub of the XFS AG metadata.
+.SH SEE ALSO
+.BR xfs_repair (8).
diff --git a/scrub/Makefile b/scrub/Makefile
new file mode 100644
index 0000000..52b2838
--- /dev/null
+++ b/scrub/Makefile
@@ -0,0 +1,26 @@
+#
+# Copyright (c) 2016 Oracle.  All Rights Reserved.
+#
+
+TOPDIR = ..
+include $(TOPDIR)/include/builddefs
+
+LTCOMMAND = xfs_scrub
+
+HFILES = scrub.h
+CFILES = scrub.c generic.c xfs.c
+
+LLDLIBS += $(LIBBLKID) $(LIBXFS) $(LIBUUID) $(LIBRT) $(LIBPTHREAD)
+LTDEPENDENCIES += $(LIBXFS)
+LLDFLAGS = -static-libtool-libs
+
+default: depend $(LTCOMMAND)
+
+include $(BUILDRULES)
+
+install: default
+	$(INSTALL) -m 755 -d $(PKG_ROOT_SBIN_DIR)
+	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_ROOT_SBIN_DIR)
+install-dev:
+
+-include .dep
diff --git a/scrub/generic.c b/scrub/generic.c
new file mode 100644
index 0000000..6c397fd
--- /dev/null
+++ b/scrub/generic.c
@@ -0,0 +1,372 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <linux/fs.h>
+#include <linux/fiemap.h>
+#include <sys/statvfs.h>
+#include <sys/types.h>
+#include <dirent.h>
+#include <attr/xattr.h>
+#include "libxfs.h"
+#include "scrub.h"
+
+/* Routines to scrub a generic filesystem with nothing but the VFS. */
+
+bool
+generic_scan_fs(
+	struct scrub_ctx	*ctx)
+{
+	/* Nothing to do here. */
+	return true;
+}
+
+bool
+generic_scan_inodes(
+	struct scrub_ctx	*ctx)
+{
+	/* Nothing to do here. */
+	return true;
+}
+
+bool
+generic_cleanup(
+	struct scrub_ctx	*ctx)
+{
+	/* Nothing to do here. */
+	return true;
+}
+
+bool
+generic_scan_metadata(
+	struct scrub_ctx	*ctx)
+{
+	/* Nothing to do here. */
+	return true;
+}
+
+/* Check all entries in a directory. */
+bool
+generic_check_dir(
+	struct scrub_ctx	*ctx,
+	int			dir_fd)
+{
+	/* Nothing to do here. */
+	return true;
+}
+
+/* Check an inode's extents... the hard way. */
+static bool
+generic_scan_extents_fibmap(
+	struct scrub_ctx	*ctx,
+	int			fd,
+	struct stat64		*sb)
+{
+	unsigned int		blk;
+	unsigned int		b;
+	off_t			numblocks;
+	int			error;
+
+	if (!(ctx->quirks & SCRUB_QUIRK_FIBMAP_WORKS))
+		return true;
+
+	numblocks = (sb->st_size + sb->st_blksize - 1) / sb->st_blksize;
+	if (numblocks > UINT_MAX)
+		numblocks = UINT_MAX;
+	for (blk = 0; blk < numblocks; blk++) {
+		b = blk;
+		error = ioctl(fd, FIBMAP, &b);
+		if (error) {
+			if (errno == EOPNOTSUPP) {
+				path_warn(ctx,
+_("data block FIEMAP/FIBMAP not supported, will not check extent map."));
+				ctx->quirks &= ~SCRUB_QUIRK_FIBMAP_WORKS;
+				return true;
+			}
+			path_errno(ctx);
+		}
+	}
+
+	return true;
+}
+
+/* Check an inode's extents. */
+#define NR_EXTENTS	512
+bool
+generic_scan_extents(
+	struct scrub_ctx	*ctx,
+	int			fd,
+	struct stat64		*sb,
+	bool			attr_fork)
+{
+	struct fiemap		*fiemap;
+	size_t			sz;
+	struct fiemap_extent	*extent;
+	__u64			next_logical;
+	bool			last = false;
+	int			error;
+	unsigned int		i;
+
+	/* FIEMAP only works for files. */
+	if (!S_ISREG(sb->st_mode))
+		return true;
+
+	if (!attr_fork && !(ctx->quirks & SCRUB_QUIRK_FIEMAP_WORKS))
+		return generic_scan_extents_fibmap(ctx, fd, sb);
+	else if (attr_fork && !(ctx->quirks & SCRUB_QUIRK_FIEMAP_ATTR_WORKS))
+		return true;
+
+	sz = sizeof(struct fiemap) + sizeof(struct fiemap_extent) * NR_EXTENTS;
+	fiemap = calloc(sz, 1);
+	if (!fiemap) {
+		path_errno(ctx);
+		return false;
+	}
+
+	fiemap->fm_length = ~0ULL;
+	fiemap->fm_flags = FIEMAP_FLAG_SYNC;
+	if (attr_fork)
+		fiemap->fm_flags |= FIEMAP_FLAG_XATTR;
+	fiemap->fm_extent_count = NR_EXTENTS;
+	fiemap->fm_reserved = 0;
+	next_logical = 0;
+
+	while (!last) {
+		fiemap->fm_start = next_logical;
+		error = ioctl(fd, FS_IOC_FIEMAP, (unsigned long)fiemap);
+		if (error < 0 && errno == EOPNOTSUPP) {
+			if (attr_fork) {
+				path_warn(ctx,
+_("extended attribute FIEMAP not supported, will not check extent map."));
+				ctx->quirks &= ~SCRUB_QUIRK_FIEMAP_ATTR_WORKS;
+			} else
+				ctx->quirks &= ~SCRUB_QUIRK_FIEMAP_WORKS;
+			break;
+		}
+		if (error < 0) {
+			path_errno(ctx);
+			break;
+		}
+
+		/* No more extents to map, exit */
+		if (!fiemap->fm_mapped_extents)
+			break;
+
+		for (i = 0; i < fiemap->fm_mapped_extents; i++) {
+			extent = &fiemap->fm_extents[i];
+
+			if (extent->fe_length == 0)
+				path_error(ctx,
+_("zero-length extent at offset %llu\n"),
+					extent->fe_logical);
+
+			next_logical = extent->fe_logical + extent->fe_length;
+			if (extent->fe_flags & FIEMAP_EXTENT_LAST)
+				last = true;
+		}
+	}
+
+	free(fiemap);
+	return true;
+}
+
+/* Check the fields of an inode. */
+bool
+generic_check_inode(
+	struct scrub_ctx	*ctx,
+	int			fd,
+	struct stat64		*sb)
+{
+	if (sb->st_nlink == 0)
+		path_error(ctx,
+_("nlinks should not be 0."));
+
+	return true;
+}
+
+/* Try to read all the extended attributes. */
+bool
+generic_scan_xattrs(
+	struct scrub_ctx	*ctx,
+	int			fd)
+{
+	char			*buf = NULL;
+	char			*p;
+	ssize_t			buf_sz;
+	ssize_t			sz;
+	char			*valbuf = NULL;
+	ssize_t			valbuf_sz = 0;
+	ssize_t			val_sz;
+	ssize_t			sz2;
+	bool			moveon = true;
+	char			*x;
+
+	buf_sz = flistxattr(fd, NULL, 0);
+	if (buf_sz < 0 && errno == EOPNOTSUPP)
+		return true;
+	else if (buf_sz == 0)
+		return true;
+	else if (buf_sz < 0) {
+		path_errno(ctx);
+		return true;
+	}
+
+	buf = malloc(buf_sz);
+	if (!buf) {
+		path_errno(ctx);
+		return false;
+	}
+
+	sz = flistxattr(fd, buf, buf_sz);
+	if (sz < 0) {
+		path_errno(ctx);
+		goto out;
+	} else if (sz != buf_sz) {
+		path_error(ctx,
+_("read %zu bytes of xattr names, expected %zu bytes."),
+				sz, buf_sz);
+	}
+
+	/* Read all the attrs and values. */
+	for (p = buf; p < buf + sz; p += strlen(p) + 1) {
+		val_sz = fgetxattr(fd, p, NULL, 0);
+		if (val_sz < 0) {
+			if (errno != ENODATA)
+				path_errno(ctx);
+			continue;
+		}
+		if (val_sz > valbuf_sz) {
+			x = realloc(valbuf, val_sz);
+			if (!x) {
+				path_errno(ctx);
+				moveon = false;
+				break;
+			}
+			valbuf = x;
+			valbuf_sz = val_sz;
+		}
+		sz2 = fgetxattr(fd, p, valbuf, val_sz);
+		if (sz2 < 0) {
+			path_errno(ctx);
+			continue;
+		} else if (sz2 != val_sz)
+			path_error(ctx,
+_("read %zu bytes from xattr %s value, expected %zu bytes."),
+					sz2, p, val_sz);
+	}
+out:
+	free(valbuf);
+	free(buf);
+	return moveon;
+}
+
+/* Try to read all the extended attributes of things that have no fd. */
+bool
+generic_scan_special_xattrs(
+	struct scrub_ctx	*ctx)
+{
+	char			*buf = NULL;
+	char			*p;
+	ssize_t			buf_sz;
+	ssize_t			sz;
+	char			*valbuf = NULL;
+	ssize_t			valbuf_sz = 0;
+	ssize_t			val_sz;
+	ssize_t			sz2;
+	bool			moveon = true;
+	char			*x;
+	char			path[PATH_MAX];
+	int			error;
+
+	/* Construct the full path to this file. */
+	error = construct_path(ctx, path, PATH_MAX);
+	if (error) {
+		path_errno(ctx);
+		return false;
+	}
+
+	buf_sz = llistxattr(path, NULL, 0);
+	if (buf_sz < 0 && errno == EOPNOTSUPP)
+		return true;
+	else if (buf_sz == 0)
+		return true;
+	else if (buf_sz < 0) {
+		path_errno(ctx);
+		return true;
+	}
+
+	buf = malloc(buf_sz);
+	if (!buf) {
+		path_errno(ctx);
+		return false;
+	}
+
+	sz = llistxattr(path, buf, buf_sz);
+	if (sz < 0) {
+		path_errno(ctx);
+		goto out;
+	} else if (sz != buf_sz) {
+		path_error(ctx,
+_("read %zu bytes of xattr names, expected %zu bytes."),
+				sz, buf_sz);
+	}
+
+	/* Read all the attrs and values. */
+	for (p = buf; p < buf + sz; p += strlen(p) + 1) {
+		val_sz = lgetxattr(path, p, NULL, 0);
+		if (val_sz < 0) {
+			path_errno(ctx);
+			continue;
+		}
+		if (val_sz > valbuf_sz) {
+			x = realloc(valbuf, val_sz);
+			if (!x) {
+				path_errno(ctx);
+				moveon = false;
+				break;
+			}
+			valbuf = x;
+			valbuf_sz = val_sz;
+		}
+		sz2 = lgetxattr(path, p, valbuf, val_sz);
+		if (sz2 < 0) {
+			path_errno(ctx);
+			continue;
+		} else if (sz2 != val_sz)
+			path_error(ctx,
+_("read %zu bytes from xattr %s value, expected %zu bytes."),
+					sz2, p, val_sz);
+	}
+out:
+	free(valbuf);
+	free(buf);
+	return moveon;
+}
+
+struct scrub_ops generic_scrub_ops = {
+	.name			= "generic",
+	.cleanup		= generic_cleanup,
+	.scan_fs		= generic_scan_fs,
+	.scan_inodes		= generic_scan_inodes,
+	.check_dir		= generic_check_dir,
+	.check_inode		= generic_check_inode,
+	.scan_extents		= generic_scan_extents,
+	.scan_xattrs		= generic_scan_xattrs,
+	.scan_special_xattrs	= generic_scan_special_xattrs,
+	.scan_metadata		= generic_scan_metadata,
+};
diff --git a/scrub/scrub.c b/scrub/scrub.c
new file mode 100644
index 0000000..2d68b07
--- /dev/null
+++ b/scrub/scrub.c
@@ -0,0 +1,822 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "libxfs.h"
+#include <stdio.h>
+#include <mntent.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+#include <sys/statvfs.h>
+#include <sys/vfs.h>
+#include <fcntl.h>
+#include <dirent.h>
+#include "scrub.h"
+
+#define _PATH_PROC_MOUNTS	"/proc/mounts"
+
+bool				verbose;
+bool				debug;
+bool				scrub_data;
+
+static void __attribute__((noreturn))
+usage(void)
+{
+	fprintf(stderr, _("Usage: %s [OPTIONS] mountpoint\n"), progname);
+	fprintf(stderr, _("-d:\tRun program in debug mode.\n"));
+	fprintf(stderr, _("-t:\tUse this filesystem backend for scrubbing.\n"));
+	fprintf(stderr, _("-v:\tVerbose output.\n"));
+	fprintf(stderr, _("-x:\tScrub file data too.\n"));
+
+	exit(16);
+}
+
+/*
+ * Check if the argument is either the device name or mountpoint of a mounted
+ * filesystem.
+ */
+static bool
+find_mountpoint_check(struct stat64 *sb, struct mntent *t)
+{
+	struct stat64 ms;
+
+	if (S_ISDIR(sb->st_mode)) {		/* mount point */
+		if (stat64(t->mnt_dir, &ms) < 0)
+			return false;
+		if (sb->st_ino != ms.st_ino)
+			return false;
+		if (sb->st_dev != ms.st_dev)
+			return false;
+		/*
+		 * Make sure the device given by mtab is accessible
+		 * before using it.
+		 */
+		if (stat64(t->mnt_fsname, &ms) < 0)
+			return false;
+	} else {				/* device */
+		if (stat64(t->mnt_fsname, &ms) < 0)
+			return false;
+		if (sb->st_rdev != ms.st_rdev)
+			return false;
+		/*
+		 * Make sure the mountpoint given by mtab is accessible
+		 * before using it.
+		 */
+		if (stat64(t->mnt_dir, &ms) < 0)
+			return false;
+	}
+
+	return true;
+}
+
+/* Check that our alleged mountpoint is in mtab */
+static bool
+find_mountpoint(char *mtab, struct stat64 *sb, struct mntent *mnt)
+{
+	struct mntent_cursor cursor;
+	struct mntent *t = NULL;
+	bool found = false;
+
+	if (platform_mntent_open(&cursor, mtab) != 0) {
+		fprintf(stderr, "Error: can't get mntent entries.\n");
+		exit(1);
+	}
+
+	while ((t = platform_mntent_next(&cursor)) != NULL) {
+		if (find_mountpoint_check(sb, t)) {
+			*mnt = *t;
+			found = true;
+			break;
+		}
+	}
+	platform_mntent_close(&cursor);
+	return found;
+}
+
+/* Print a string and whatever error is stored in errno. */
+void
+__str_errno(
+	struct scrub_ctx	*ctx,
+	const char		*str,
+	const char		*file,
+	int			line)
+{
+	char			buf[256];
+
+	fprintf(stderr, "%s: %s.", str, strerror_r(errno, buf, 256));
+	if (debug)
+		fprintf(stderr, " (%s line %d)", file, line);
+	fprintf(stderr, "\n");
+	ctx->errors_found++;
+}
+
+/* Print a string and some error text. */
+void
+__str_error(
+	struct scrub_ctx	*ctx,
+	const char		*str,
+	const char		*file,
+	int			line,
+	const char		*format,
+	...)
+{
+	va_list			args;
+
+	fprintf(stderr, "%s: ", str);
+	va_start(args, format);
+	vfprintf(stderr, format, args);
+	va_end(args);
+	if (debug)
+		fprintf(stderr, " (%s line %d)", file, line);
+	fprintf(stderr, "\n");
+	ctx->errors_found++;
+}
+
+/* Print a string and some warning text. */
+void
+__str_warn(
+	struct scrub_ctx	*ctx,
+	const char		*str,
+	const char		*file,
+	int			line,
+	const char		*format,
+	...)
+{
+	va_list			args;
+
+	fprintf(stderr, "%s: ", str);
+	va_start(args, format);
+	vfprintf(stderr, format, args);
+	va_end(args);
+	if (debug)
+		fprintf(stderr, " (%s line %d)", file, line);
+	fprintf(stderr, "\n");
+	ctx->warnings_found++;
+}
+
+/* Print the current path and whatever error is stored in errno. */
+void
+__path_errno(
+	struct scrub_ctx	*ctx,
+	const char		*file,
+	int			line)
+{
+	char			buf[256];
+	struct list_head	*l;
+	struct path_piece	*pp;
+	int			err;
+
+	err = errno;
+	fprintf(stderr, "%s", ctx->mntpoint);
+	list_for_each(l, &ctx->path_stack) {
+		pp = container_of(l, struct path_piece, list);
+		fprintf(stderr, "/%s", pp->name);
+	}
+	fprintf(stderr, ": %s.", strerror_r(err, buf, 256));
+	if (debug)
+		fprintf(stderr, " (%s line %d)", file, line);
+	fprintf(stderr, "\n");
+	ctx->errors_found++;
+}
+
+/* Print the current path and some error text. */
+void
+__path_error(
+	struct scrub_ctx	*ctx,
+	const char		*file,
+	int			line,
+	const char		*format,
+	...)
+{
+	va_list			args;
+	struct list_head	*l;
+	struct path_piece	*pp;
+
+	fprintf(stderr, "%s", ctx->mntpoint);
+	list_for_each(l, &ctx->path_stack) {
+		pp = container_of(l, struct path_piece, list);
+		fprintf(stderr, "/%s", pp->name);
+	}
+	fprintf(stderr, ": ");
+	va_start(args, format);
+	vfprintf(stderr, format, args);
+	va_end(args);
+	if (debug)
+		fprintf(stderr, " (%s line %d)", file, line);
+	fprintf(stderr, "\n");
+	ctx->errors_found++;
+}
+
+/* Print the current path and some warning text. */
+void
+__path_warn(
+	struct scrub_ctx	*ctx,
+	const char		*file,
+	int			line,
+	const char		*format,
+	...)
+{
+	va_list			args;
+	struct list_head	*l;
+	struct path_piece	*pp;
+
+	fprintf(stderr, "%s", ctx->mntpoint);
+	list_for_each(l, &ctx->path_stack) {
+		pp = container_of(l, struct path_piece, list);
+		fprintf(stderr, "/%s", pp->name);
+	}
+	fprintf(stderr, ": ");
+	va_start(args, format);
+	vfprintf(stderr, format, args);
+	va_end(args);
+	if (debug)
+		fprintf(stderr, " (%s line %d)", file, line);
+	fprintf(stderr, "\n");
+	ctx->warnings_found++;
+}
+
+/* Construct the current path. */
+int
+construct_path(
+	struct scrub_ctx	*ctx,
+	char			*buf,
+	size_t			buflen)
+{
+	size_t			nr = 0;
+	struct list_head	*l;
+	struct path_piece	*pp;
+	int			sz;
+
+	/* Mountpoint */
+	sz = snprintf(buf + nr, buflen - nr, "%s", ctx->mntpoint);
+	if (sz < 0)
+		return -1;
+	else if ((size_t)sz >= buflen - nr) {
+		errno = ENOMEM;
+		return -1;
+	}
+	nr += sz;
+
+	/* Intermediate path components. */
+	list_for_each(l, &ctx->path_stack) {
+		pp = container_of(l, struct path_piece, list);
+
+		sz = snprintf(buf + nr, buflen - nr, "/%s", pp->name);
+		if (sz < 0)
+			return -1;
+		else if ((size_t)sz >= buflen - nr) {
+			errno = ENOMEM;
+			return -1;
+		}
+		nr += sz;
+	}
+
+	return 0;
+}
+
+#define CHECK_TYPE(type) \
+	case DT_##type: \
+		if (!S_IS##type(sb->st_mode)) { \
+			path_error(ctx, \
+_("dtype of block does not match mode 0x%x"), \
+				sb->st_mode & S_IFMT); \
+		} \
+		break;
+
+/* Ensure that the directory entry matches the stat info. */
+static bool
+verify_dirent(
+	struct scrub_ctx	*ctx,
+	struct dirent		*dirent,
+	struct stat64		*sb)
+{
+	if (dirent->d_ino != sb->st_ino)
+		path_error(ctx,
+_("inode numbers (%llu != %llu) do not match!"),
+			(unsigned long long)dirent->d_ino,
+			(unsigned long long)sb->st_ino);
+
+	switch (dirent->d_type) {
+	case DT_UNKNOWN:
+		break;
+	CHECK_TYPE(BLK)
+	CHECK_TYPE(CHR)
+	CHECK_TYPE(DIR)
+	CHECK_TYPE(FIFO)
+	CHECK_TYPE(LNK)
+	CHECK_TYPE(REG)
+	CHECK_TYPE(SOCK)
+	}
+
+	return true;
+}
+#undef CHECK_TYPE
+
+/* Read all the data in a file. */
+#define READ_BUF_SIZE		262144
+static bool
+read_file(
+	struct scrub_ctx	*ctx,
+	int			fd,
+	struct stat64		*sb)
+{
+	off_t			data_end = 0;
+	off_t			data_start;
+	off_t			start;
+	ssize_t			sz;
+	size_t			count;
+	static char		*readbuf;
+	bool			reports_holes = true;
+	bool			direct_io = false;
+	int			flags;
+	int			error;
+	static long		page_size;
+
+	/* Find the page size. */
+	if (!page_size) {
+		page_size = sysconf(_SC_PAGESIZE);
+		if (page_size < 0) {
+			path_errno(ctx);
+			return false;
+		}
+	}
+
+	/* Try to allocate a read buffer if we don't have one. */
+	if (!readbuf) {
+		error = posix_memalign((void **)&readbuf, page_size,
+				READ_BUF_SIZE);
+		if (error || !readbuf) {
+			path_errno(ctx);
+			return false;
+		}
+	}
+
+	/* Can we set O_DIRECT? */
+	flags = fcntl(fd, F_GETFL);
+	error = fcntl(fd, F_SETFL, flags | O_DIRECT);
+	if (!error)
+		direct_io = true;
+
+	/* See if SEEK_DATA/SEEK_HOLE work... */
+	data_start = lseek(fd, data_end, SEEK_DATA);
+	if (data_start < 0) {
+		/* ENXIO for SEEK_DATA means no file data anywhere. */
+		if (errno == ENXIO)
+			return true;
+		reports_holes = false;
+	}
+
+	if (reports_holes) {
+		data_end = lseek(fd, data_start, SEEK_HOLE);
+		if (data_end < 0)
+			reports_holes = false;
+	}
+
+	/* ...or just read everything if they don't. */
+	if (!reports_holes) {
+		data_start = 0;
+		data_end = sb->st_size;
+	}
+
+	if (!direct_io) {
+		posix_fadvise(fd, 0, sb->st_size, POSIX_FADV_SEQUENTIAL);
+		posix_fadvise(fd, 0, sb->st_size, POSIX_FADV_WILLNEED);
+	}
+	/* Read the non-hole areas. */
+	while (data_start < data_end) {
+		start = data_start;
+
+		if (direct_io && (start & (page_size - 1)))
+			start &= ~(page_size - 1);
+		count = min(READ_BUF_SIZE, data_end - start);
+		if (direct_io && (count & (page_size - 1)))
+			count = (count + page_size) & ~(page_size - 1);
+		sz = pread(fd, readbuf, count, start);
+		if (sz < 0) {
+			path_errno(ctx);
+			break;
+		} else if (sz == 0) {
+			path_error(ctx,
+_("Read zero bytes, expected %zu."), count);
+			break;
+		} else if (sz != count && start + sz != data_end) {
+			path_warn(ctx,
+_("Short read of %zd bytes, expected %zu."),
+					sz, count);
+		}
+		data_start = start + sz;
+
+		if (data_start >= data_end && reports_holes) {
+			data_start = lseek(fd, data_end, SEEK_DATA);
+			if (data_start < 0) {
+				if (errno != ENXIO)
+					path_errno(ctx);
+				break;
+			}
+			data_end = lseek(fd, data_start, SEEK_HOLE);
+			if (data_end < 0) {
+				if (errno != ENXIO)
+					path_errno(ctx);
+				break;
+			}
+		}
+	}
+
+	/* Turn off O_DIRECT. */
+	if (direct_io) {
+		flags = fcntl(fd, F_GETFL);
+		error = fcntl(fd, F_SETFL, flags & ~O_DIRECT);
+		if (error)
+			path_errno(ctx);
+	}
+
+	return true;
+}
+
+/* Scrub a directory. */
+static bool
+check_dir(
+	struct scrub_ctx	*ctx,
+	int			dir_fd)
+{
+	DIR			*dir;
+	struct dirent		*dirent;
+	struct path_piece	pp;
+	int			fd = -1;
+	struct stat64		sb;
+	struct stat64		fd_sb;
+	bool			moveon;
+	static char		linkbuf[PATH_MAX];
+	ssize_t			len;
+	int			error;
+
+	/* FS-specific directory checks. */
+	moveon = ctx->ops->check_dir(ctx, dir_fd);
+	if (!moveon)
+		return moveon;
+
+	/* Iterate the directory entries. */
+	dir = fdopendir(dir_fd);
+	if (!dir) {
+		path_errno(ctx);
+		return true;
+	}
+
+	/* Iterate every directory entry. */
+	INIT_LIST_HEAD(&pp.list);
+	list_add_tail(&pp.list, &ctx->path_stack);
+	dirent = readdir(dir);
+	while (dirent) {
+		if (!strcmp(".", dirent->d_name) ||
+		    !strcmp("..", dirent->d_name))
+			goto next;
+
+		pp.name = dirent->d_name;
+		error = fstatat64(dir_fd, dirent->d_name, &sb,
+				AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW);
+		if (error) {
+			path_errno(ctx);
+			break;
+		}
+
+		/* Ignore files on other filesystems. */
+		if (sb.st_dev != ctx->mnt_sb.st_dev)
+			goto next;
+
+		/* Check the directory entry itself. */
+		moveon = verify_dirent(ctx, dirent, &sb);
+		if (!moveon)
+			break;
+
+		/* If symlink, read the target value. */
+		if (S_ISLNK(sb.st_mode)) {
+			len = readlinkat(dir_fd, dirent->d_name, linkbuf,
+					PATH_MAX);
+			if (len < 0)
+				path_errno(ctx);
+			else if (len != sb.st_size)
+				path_error(ctx,
+_("read %zd bytes from a %lld byte symlink?"),
+					len, (long long)sb.st_size);
+		}
+
+		/* Read the xattrs without a file descriptor. */
+		if (S_ISSOCK(sb.st_mode) || S_ISFIFO(sb.st_mode) ||
+		    S_ISBLK(sb.st_mode) || S_ISCHR(sb.st_mode) ||
+		    S_ISLNK(sb.st_mode)) {
+			moveon = ctx->ops->scan_special_xattrs(ctx);
+			if (!moveon)
+				break;
+		}
+
+		/* If not dir or file, move on to the next dirent. */
+		if (!S_ISDIR(sb.st_mode) && !S_ISREG(sb.st_mode))
+			goto next;
+
+		/* Open the file */
+		fd = openat(dir_fd, dirent->d_name,
+				O_RDONLY | O_NOATIME | O_NOFOLLOW | O_NOCTTY);
+		if (fd < 0) {
+			path_errno(ctx);
+			goto next;
+		}
+
+		/* Did the fstatat and the open race? */
+		if (fstat64(fd, &fd_sb) < 0) {
+			path_errno(ctx);
+			goto close;
+		}
+		if (fd_sb.st_ino != sb.st_ino || fd_sb.st_dev != sb.st_dev)
+			path_warn(ctx,
+_("inode changed out from under us!"));
+
+		/* Check the inode. */
+		moveon = ctx->ops->check_inode(ctx, fd, &fd_sb);
+		if (!moveon)
+			break;
+
+		/* Scan the extent maps. */
+		moveon = ctx->ops->scan_extents(ctx, fd, &fd_sb, false);
+		if (!moveon)
+			break;
+		moveon = ctx->ops->scan_extents(ctx, fd, &fd_sb, true);
+		if (!moveon)
+			break;
+
+		/* Read all the file data. */
+		if (scrub_data && S_ISREG(fd_sb.st_mode)) {
+			moveon = read_file(ctx, fd, &fd_sb);
+			if (!moveon)
+				break;
+		}
+
+		/* Read all the extended attributes. */
+		moveon = ctx->ops->scan_xattrs(ctx, fd);
+		if (!moveon)
+			break;
+
+		/* If directory, call ourselves recursively. */
+		if (S_ISDIR(fd_sb.st_mode)) {
+			moveon = check_dir(ctx, fd);
+			if (!moveon)
+				break;
+			/* closedir already closed fd for us */
+			fd = -1;
+			goto next;
+		}
+
+		/* Close file. */
+close:
+		error = close(fd);
+		if (error)
+			path_errno(ctx);
+		fd = -1;
+
+next:
+		dirent = readdir(dir);
+	}
+
+	if (fd >= 0) {
+		error = close(fd);
+		if (error)
+			path_errno(ctx);
+	}
+	list_del(&pp.list);
+
+	/* Close dir, go away. */
+	error = closedir(dir);
+	if (error)
+		path_errno(ctx);
+
+	return moveon;
+}
+
+
+
+/* Traverse the directory tree. */
+static bool
+traverse_fs(
+	struct scrub_ctx	*ctx)
+{
+	bool			moveon;
+
+	/* Check the inode. */
+	moveon = ctx->ops->check_inode(ctx, ctx->mnt_fd, &ctx->mnt_sb);
+	if (!moveon)
+		return moveon;
+
+	/* Scan the extent maps. */
+	moveon = ctx->ops->scan_extents(ctx, ctx->mnt_fd, &ctx->mnt_sb, false);
+	if (!moveon)
+		return moveon;
+	moveon = ctx->ops->scan_extents(ctx, ctx->mnt_fd, &ctx->mnt_sb, true);
+	if (!moveon)
+		return moveon;
+
+	/* Check the mountpoint directory. */
+	moveon = check_dir(ctx, ctx->mnt_fd);
+	if (!moveon)
+		return moveon;
+
+	return true;
+}
+
+static struct scrub_ops *scrub_impl[] = {
+	&xfs_scrub_ops,
+	&generic_scrub_ops,
+	NULL
+};
+
+int
+main(
+	int			argc,
+	char			**argv)
+{
+	int			c;
+	char			*mtab = NULL;
+	struct scrub_ctx	ctx;
+	bool			ismnt;
+	bool			moveon;
+	int			ret;
+	struct scrub_ops	**ops;
+
+	progname = basename(argv[0]);
+	setlocale(LC_ALL, "");
+	bindtextdomain(PACKAGE, LOCALEDIR);
+	textdomain(PACKAGE);
+
+	ctx.ops = NULL;
+	while ((c = getopt(argc, argv, "dt:vxV")) != EOF) {
+		switch (c) {
+		case 'd':
+			debug = true;
+			break;
+		case 't':
+			for (ops = scrub_impl; *ops; ops++) {
+				if (!strcmp(optarg, (*ops)->name)) {
+					ctx.ops = *ops;
+					break;
+				}
+			}
+			if (!ctx.ops) {
+				fprintf(stderr,
+_("Unknown filesystem driver '%s'.\n"),
+						optarg);
+				return 1;
+			}
+			break;
+		case 'v':
+			verbose = true;
+			break;
+		case 'x':
+			scrub_data = true;
+			break;
+		case 'V':
+			printf(_("%s version %s\n"), progname, VERSION);
+			exit(0);
+		case '?':
+		default:
+			usage();
+		}
+	}
+
+	if (optind != argc - 1)
+		usage();
+
+	ctx.errors_found = 0;
+	ctx.warnings_found = 0;
+	ctx.mntpoint = argv[optind];
+	ctx.quirks = SCRUB_QUIRK_FIEMAP_WORKS | SCRUB_QUIRK_FIEMAP_ATTR_WORKS |
+		     SCRUB_QUIRK_FIBMAP_WORKS;
+
+	/* Find the mount record for the passed-in argument. */
+
+	if (stat64(argv[optind], &ctx.mnt_sb) < 0) {
+		fprintf(stderr,
+			_("%s: could not stat: %s: %s\n"),
+			progname, argv[optind], strerror(errno));
+		return 16;
+	}
+
+	/*
+	 * If the user did not specify an explicit mount table, try to use
+	 * /proc/mounts if it is available, else /etc/mtab.  We prefer
+	 * /proc/mounts because it is kernel controlled, while /etc/mtab
+	 * may contain garbage that userspace tools like pam_mounts wrote
+	 * into it.
+	 */
+	if (!mtab) {
+		if (access(_PATH_PROC_MOUNTS, R_OK) == 0)
+			mtab = _PATH_PROC_MOUNTS;
+		else
+			mtab = _PATH_MOUNTED;
+	}
+
+	ismnt = find_mountpoint(mtab, &ctx.mnt_sb, &ctx.mnt_ent);
+	if (!ismnt) {
+		fprintf(stderr, _("%s: Not a mount point or block device.\n"),
+			ctx.mntpoint);
+		return 16;
+	}
+	ctx.mntpoint = ctx.mnt_ent.mnt_dir;
+
+	/* Find an appropriate scrub backend. */
+	for (ops = scrub_impl; !ctx.ops && *ops; ops++) {
+		if (!strcmp(ctx.mnt_ent.mnt_type, (*ops)->name))
+			ctx.ops = *ops;
+	}
+	if (!ctx.ops)
+		ctx.ops = &generic_scrub_ops;
+	INIT_LIST_HEAD(&ctx.path_stack);
+	if (verbose)
+		printf(_("%s: scrubbing %s filesystem with %s driver.\n"),
+			ctx.mntpoint, ctx.mnt_ent.mnt_type, ctx.ops->name);
+
+	/* Phase 1: Find and verify filesystem */
+	if (verbose)
+		printf(_("Phase 1: Find filesystem.\n"));
+	ctx.mnt_fd = open(ctx.mntpoint, O_RDONLY | O_NOATIME);
+	if (ctx.mnt_fd < 0) {
+		perror(ctx.mntpoint);
+		return 8;
+	}
+	ret = fstat64(ctx.mnt_fd, &ctx.mnt_sb);
+	if (ret) {
+		path_errno(&ctx);
+		moveon = false;
+		goto out;
+	}
+	moveon = ctx.ops->scan_fs(&ctx);
+	if (!moveon)
+		goto out;
+
+	/* Phase 2: Check inodes, blocks, and sizes */
+	if (verbose)
+		printf(_("Phase 2: Scanning inodes.\n"));
+	moveon = ctx.ops->scan_inodes(&ctx);
+	if (!moveon)
+		goto out;
+
+	/* Phase 3: Check the directory structure. */
+	if (verbose)
+		printf(_("Phase 3: Check the directory structure.\n"));
+	moveon = traverse_fs(&ctx);
+	if (!moveon)
+		goto out;
+
+	/* Phase X: Check for duplicate blocks(??) */
+
+	/* Phase Y: Verify link counts(??) */
+
+	/* Phase 4: Check internal group metadata. */
+	if (verbose)
+		printf(_("Phase 4: Check internal metadata.\n"));
+	moveon = ctx.ops->scan_metadata(&ctx);
+	if (!moveon)
+		goto out;
+
+	/* Clean up scan data. */
+	moveon = ctx.ops->cleanup(&ctx);
+	if (!moveon)
+		goto out;
+
+out:
+	ret = 0;
+	if (!moveon)
+		ret |= 8;
+
+	if (ctx.errors_found && ctx.warnings_found)
+		fprintf(stderr,
+_("%s: %lu errors and %lu warnings found.  Unmount and run fsck.\n"),
+			ctx.mntpoint, ctx.errors_found, ctx.warnings_found);
+	else if (ctx.errors_found && ctx.warnings_found == 0)
+		fprintf(stderr,
+_("%s: %lu errors found.  Unmount and run fsck.\n"),
+			ctx.mntpoint, ctx.errors_found);
+	else if (ctx.errors_found == 0 && ctx.warnings_found)
+		fprintf(stderr,
+_("%s: %lu warnings found.\n"),
+			ctx.mntpoint, ctx.warnings_found);
+	if (ctx.errors_found)
+		ret |= 4;
+
+	return ret;
+}
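
Not part of the patch: a minimal standalone sketch of the SEEK_DATA/SEEK_HOLE
walk that read_file() performs above, assuming a Linux libc that understands
both whence values.  data_bytes() is a hypothetical helper name; it only adds
up the length of each data extent instead of reading it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* exposes SEEK_DATA / SEEK_HOLE in glibc */
#endif
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux whence values, in case the libc headers hide them. */
#ifndef SEEK_DATA
#define SEEK_DATA	3
#define SEEK_HOLE	4
#endif

/*
 * Hypothetical helper (not in the patch): return the number of bytes
 * that live in data extents of fd, or -1 with errno set on failure.
 */
static int64_t
data_bytes(int fd)
{
	int64_t		total = 0;
	off_t		data_start;
	off_t		data_end = 0;

	for (;;) {
		/* Find the next data extent at or after data_end. */
		data_start = lseek(fd, data_end, SEEK_DATA);
		if (data_start < 0) {
			/* ENXIO means there is no data past data_end. */
			return errno == ENXIO ? total : -1;
		}

		/* Find where that extent ends (EOF counts as a hole). */
		data_end = lseek(fd, data_start, SEEK_HOLE);
		if (data_end < 0)
			return -1;

		total += data_end - data_start;
	}
}
```

A caller that wants the file contents would pread() each
[data_start, data_end) range instead of summing it; filesystems that do not
track holes report the whole file as one data extent.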
diff --git a/scrub/scrub.h b/scrub/scrub.h
new file mode 100644
index 0000000..b7436c1
--- /dev/null
+++ b/scrub/scrub.h
@@ -0,0 +1,100 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef SCRUB_H_
+#define SCRUB_H_
+
+struct scrub_ctx;
+
+struct scrub_ops {
+	const char	*name;
+	bool (*cleanup)(struct scrub_ctx *ctx);
+	bool (*scan_fs)(struct scrub_ctx *ctx);
+	bool (*scan_inodes)(struct scrub_ctx *ctx);
+	bool (*check_dir)(struct scrub_ctx *ctx, int dir_fd);
+	bool (*check_inode)(struct scrub_ctx *ctx, int fd, struct stat64 *sb);
+	bool (*scan_extents)(struct scrub_ctx *ctx, int fd, struct stat64 *sb,
+			     bool attr_fork);
+	bool (*scan_xattrs)(struct scrub_ctx *ctx, int fd);
+	bool (*scan_special_xattrs)(struct scrub_ctx *ctx);
+	bool (*scan_metadata)(struct scrub_ctx *ctx);
+};
+
+#define SCRUB_QUIRK_FIEMAP_WORKS	(1 << 0)
+#define SCRUB_QUIRK_FIEMAP_ATTR_WORKS	(1 << 1)
+#define SCRUB_QUIRK_FIBMAP_WORKS	(1 << 2)
+struct scrub_ctx {
+	struct scrub_ops	*ops;
+	char			*mntpoint;
+	int			mnt_fd;
+	struct mntent		mnt_ent;
+	struct stat64		mnt_sb;
+	struct statvfs		mnt_sv;
+	struct statfs		mnt_sf;
+	unsigned long		errors_found;
+	unsigned long		warnings_found;
+	unsigned long		quirks;
+
+	struct list_head	path_stack;
+	void			*priv;
+};
+
+struct path_piece {
+	struct list_head	list;
+	const char		*name;
+};
+
+extern bool		verbose;
+extern bool		debug;
+extern bool		scrub_data;
+
+void __path_errno(struct scrub_ctx *, const char *, int);
+void __path_error(struct scrub_ctx *, const char *, int, const char *, ...);
+void __path_warn(struct scrub_ctx *, const char *, int, const char *, ...);
+void __str_errno(struct scrub_ctx *, const char *, const char *, int);
+void __str_error(struct scrub_ctx *, const char *, const char *, int, const char *, ...);
+void __str_warn(struct scrub_ctx *, const char *, const char *, int, const char *, ...);
+
+#define path_errno(ctx)		__path_errno(ctx, __FILE__, __LINE__)
+#define path_error(ctx, ...)	__path_error(ctx, __FILE__, __LINE__, __VA_ARGS__)
+#define path_warn(ctx, ...)	__path_warn(ctx, __FILE__, __LINE__, __VA_ARGS__)
+#define str_errno(ctx, str)		__str_errno(ctx, str, __FILE__, __LINE__)
+#define str_error(ctx, str, ...)	__str_error(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
+#define str_warn(ctx, str, ...)		__str_warn(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
+
+int construct_path(struct scrub_ctx *ctx, char *buf, size_t buflen);
+
+#define container_of(ptr, type, member) ({			\
+	const typeof( ((type *)0)->member ) *__mptr = (ptr);	\
+		(type *)( (char *)__mptr - offsetof(type,member) );})
+
+extern struct scrub_ops	generic_scrub_ops;
+extern struct scrub_ops	xfs_scrub_ops;
+
+bool generic_cleanup(struct scrub_ctx *ctx);
+bool generic_scan_fs(struct scrub_ctx *ctx);
+bool generic_scan_inodes(struct scrub_ctx *ctx);
+bool generic_check_dir(struct scrub_ctx *ctx, int dir_fd);
+bool generic_check_inode(struct scrub_ctx *ctx, int fd, struct stat64 *sb);
+bool generic_scan_extents(struct scrub_ctx *ctx, int fd, struct stat64 *sb,
+		bool attr_fork);
+bool generic_scan_xattrs(struct scrub_ctx *ctx, int fd);
+bool generic_scan_special_xattrs(struct scrub_ctx *ctx);
+
+#endif /* SCRUB_H_ */
diff --git a/scrub/xfs.c b/scrub/xfs.c
new file mode 100644
index 0000000..0300066
--- /dev/null
+++ b/scrub/xfs.c
@@ -0,0 +1,241 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "libxfs.h"
+#include <sys/statvfs.h>
+#include <sys/types.h>
+#include <dirent.h>
+#include "scrub.h"
+
+/* Routines to scrub an XFS filesystem. */
+
+struct xfs_scrub_ctx {
+	xfs_fsop_geom_t		geo;
+	int			check_fd;
+};
+
+static bool
+xfs_cleanup(
+	struct scrub_ctx	*ctx)
+{
+	free(ctx->priv);
+	ctx->priv = NULL;
+
+	return generic_cleanup(ctx);
+}
+
+/* Find the /sys/fs/xfs/$dev/check path that corresponds to this fs. */
+static bool
+xfs_find_sysfs_check(
+	struct scrub_ctx	*ctx)
+{
+	struct xfs_scrub_ctx	*xctx = ctx->priv;
+	char			path[PATH_MAX];
+	char			buf[PATH_MAX];
+	int			sz;
+	ssize_t			ssz;
+	char			*p;
+
+	/* /dev/block/$major:$minor usually points "../$kernel_name" */
+	sz = snprintf(path, PATH_MAX, "/dev/block/%d:%d",
+			major(ctx->mnt_sb.st_dev), minor(ctx->mnt_sb.st_dev));
+	if (sz < 0) {
+		path_errno(ctx);
+		return false;
+	}
+
+	ssz = readlink(path, buf, PATH_MAX - 1);
+	if (ssz < 0) {
+		perror(path);
+		return false;
+	}
+	buf[ssz] = 0;
+
+	p = strchr(buf, '/');
+	p = p ? p + 1 : buf;
+
+	/* See if we can find a pointer to /sys/fs/xfs/$p/check */
+	sz = snprintf(path, PATH_MAX, "/sys/fs/xfs/%s/check", p);
+	if (sz < 0) {
+		path_errno(ctx);
+		return false;
+	}
+
+	xctx->check_fd = open(path, O_RDONLY | O_DIRECTORY);
+	if (xctx->check_fd < 0) {
+		if (errno != ENOENT)
+			perror(path);
+		return false;
+	}
+
+	return true;
+}
+
+/* Read the XFS geometry. */
+static bool
+xfs_scan_fs(
+	struct scrub_ctx	*ctx)
+{
+	struct xfs_scrub_ctx	*xctx;
+	int			error;
+
+	if (!platform_test_xfs_fd(ctx->mnt_fd)) {
+		path_error(ctx,
+_("Does not appear to be an XFS filesystem!"));
+		return false;
+	}
+
+	xctx = malloc(sizeof(struct xfs_scrub_ctx));
+	if (!xctx) {
+		path_errno(ctx);
+		return false;
+	}
+	xctx->check_fd = -1;
+
+	/* Retrieve XFS geometry. */
+	error = xfsctl(ctx->mntpoint, ctx->mnt_fd, XFS_IOC_FSGEOMETRY,
+			&xctx->geo);
+	if (error) {
+		path_errno(ctx);
+		free(xctx);
+		return false;
+	}
+	ctx->priv = xctx;
+
+	printf("xfs_scrub is incomplete on XFS.\n");
+
+	/* XXX: should we whine if we can't find the sysfs check dir? */
+	xfs_find_sysfs_check(ctx);
+
+	return generic_scan_fs(ctx);
+}
+
+/* Scrub a piece of metadata in a particular AG. */
+static bool
+xfs_scan_ag_metadata(
+	struct scrub_ctx	*ctx,
+	const char		*name,
+	xfs_agnumber_t		ag)
+{
+	struct xfs_scrub_ctx	*xctx = ctx->priv;
+	char			descr[256];
+	char			cmd[256];
+	int			fd;
+	int			sz;
+	ssize_t			ssz;
+
+	sz = snprintf(descr, 256, "AG %u %s", ag, name);
+	if (sz < 0) {
+		str_errno(ctx, name);
+		return false;
+	}
+
+	fd = openat(xctx->check_fd, name, O_WRONLY);
+	if (fd < 0) {
+		str_errno(ctx, descr);
+		return true;
+	}
+
+	sz = snprintf(cmd, 256, "%u", ag);
+	if (sz < 0) {
+		str_errno(ctx, descr);
+		goto out;
+	}
+
+	ssz = write(fd, cmd, strlen(cmd));
+	if (ssz < 0) {
+		str_errno(ctx, descr);
+		goto out;
+	} else if (ssz != strlen(cmd)) {
+		str_error(ctx, descr,
+_("Strange output length %zd (expected %zu)"),
+				ssz, strlen(cmd));
+		/* str_error() already counted this error. */
+		goto out;
+	}
+
+out:
+	sz = close(fd);
+	if (sz)
+		str_errno(ctx, descr);
+
+	return true;
+}
+
+/* Try to scan metadata via sysfs. */
+static bool
+xfs_scan_metadata(
+	struct scrub_ctx	*ctx)
+{
+	struct xfs_scrub_ctx	*xctx = ctx->priv;
+	xfs_agnumber_t		ag;
+	DIR			*checkdir;
+	bool			moveon = true;
+	struct dirent		*dirent;
+	int			error;
+
+	if (xctx->check_fd < 0)
+		return true;
+
+	/* Open the check controls. */
+	checkdir = fdopendir(xctx->check_fd);
+	if (!checkdir) {
+		path_error(ctx,
+_("Failed to open the check control."));
+		return false;
+	}
+
+	/* Scan everything we can in here. */
+	while ((dirent = readdir(checkdir)) != NULL) {
+		if (!strcmp(".", dirent->d_name) ||
+		    !strcmp("..", dirent->d_name))
+			continue;
+
+		/* A failure stops the whole scan, not just this control. */
+		for (ag = 0; moveon && ag < xctx->geo.agcount; ag++)
+			moveon = xfs_scan_ag_metadata(ctx, dirent->d_name, ag);
+		if (!moveon)
+			break;
+	}
+
+	/* Done with metadata scrub. */
+	error = closedir(checkdir);
+	if (error)
+		path_errno(ctx);
+	xctx->check_fd = -1;
+
+	return moveon;
+}
+
+/*
+ * XXX: eventually we'll want to do better checking here, but the generic
+ * tree walk + metadata scrub is good enough for now.
+ */
+struct scrub_ops xfs_scrub_ops = {
+	.name			= "xfs",
+	.cleanup		= xfs_cleanup,
+	.scan_fs		= xfs_scan_fs,
+	.scan_inodes		= generic_scan_inodes,
+	.check_dir		= generic_check_dir,
+	.check_inode		= generic_check_inode,
+	.scan_extents		= generic_scan_extents,
+	.scan_xattrs		= generic_scan_xattrs,
+	.scan_special_xattrs	= generic_scan_special_xattrs,
+	.scan_metadata		= xfs_scan_metadata,
+};
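
Not part of the patch: a small sketch of the symlink handling that
xfs_find_sysfs_check() needs, under the assumption that the link target looks
like "../sda1".  readlink() does not NUL-terminate its output, so the caller
has to reserve a byte and terminate at the returned length.  Both helper names
are hypothetical, and last_component() uses strrchr() where the patch uses
strchr().

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the final path component of a link target such as "../sda1". */
static const char *
last_component(const char *target)
{
	const char	*p = strrchr(target, '/');

	return p ? p + 1 : target;
}

/*
 * Hypothetical helper: read the target of the symlink at path into buf
 * and NUL-terminate it.  Returns 0 on success, -1 on failure.
 */
static int
read_link_target(const char *path, char *buf, size_t buflen)
{
	ssize_t		ssz;

	if (buflen == 0)
		return -1;

	/* Leave room for the terminator that readlink() does not write. */
	ssz = readlink(path, buf, buflen - 1);
	if (ssz < 0)
		return -1;
	buf[ssz] = '\0';
	return 0;
}
```

With these, the kernel device name is last_component(buf) after a successful
read_link_target("/dev/block/MAJOR:MINOR", buf, sizeof(buf)).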

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: [PATCH 008/145] libxfs: add more list operations
  2016-06-17  1:31 ` [PATCH 008/145] libxfs: add more list operations Darrick J. Wong
@ 2016-06-24  0:40   ` Dave Chinner
  2016-06-24  0:46     ` Darrick J. Wong
  0 siblings, 1 reply; 149+ messages in thread
From: Dave Chinner @ 2016-06-24  0:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Thu, Jun 16, 2016 at 06:31:35PM -0700, Darrick J. Wong wrote:
> Add some list operations that the deferred rmap code requires.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

This has all come from the linux kernel, right? Can you tell me
which files it has come from so I can add it to the commit message?
Maybe it would be better to keep the list sorting code in its own
file (e.g. libxfs/list_sort.c) just to keep a bit of separation
between the xfs code and code that is copied in from outside?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 008/145] libxfs: add more list operations
  2016-06-24  0:40   ` Dave Chinner
@ 2016-06-24  0:46     ` Darrick J. Wong
  2016-06-24  1:50       ` Dave Chinner
  0 siblings, 1 reply; 149+ messages in thread
From: Darrick J. Wong @ 2016-06-24  0:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Fri, Jun 24, 2016 at 10:40:22AM +1000, Dave Chinner wrote:
> On Thu, Jun 16, 2016 at 06:31:35PM -0700, Darrick J. Wong wrote:
> > Add some list operations that the deferred rmap code requires.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> This has all come from the linux kernel, right?

Yes.

> Can you tell me
> which files it has come from so I can add it to the commit message?

lib/list_sort.c for all the list_sort stuff,
include/linux/list.h for the rest of the list_* stuff,
include/linux/kernel.h for container_of.
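
For anyone who has not used these helpers, here is a rough standalone sketch
of how container_of and an intrusive list fit together, in the way the scrub
path stack uses them.  It is an illustration only: list_init(), join_path()
and the simplified container_of below are stand-ins written for this example,
not the code copied from the kernel.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the kernel list helpers. */
struct list_head {
	struct list_head	*next;
	struct list_head	*prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void
list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void
list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* The list node is embedded in the object it links together. */
struct path_piece {
	struct list_head	list;
	const char		*name;
};

/*
 * Hypothetical helper: join every name on the stack into buf as "/a/b".
 * Returns 0 on success, or -1 if buf is too small.
 */
static int
join_path(struct list_head *stack, char *buf, size_t buflen)
{
	struct list_head	*l;
	size_t			nr = 0;

	if (buflen == 0)
		return -1;
	buf[0] = '\0';

	for (l = stack->next; l != stack; l = l->next) {
		/* Step back from the embedded node to its container. */
		struct path_piece *pp = container_of(l, struct path_piece, list);
		size_t len = strlen(pp->name);

		if (nr + 1 + len >= buflen)
			return -1;
		buf[nr++] = '/';
		memcpy(buf + nr, pp->name, len + 1);
		nr += len;
	}
	return 0;
}
```

The point of container_of is that the list code only ever sees
struct list_head, and the caller recovers the enclosing structure from the
member offset.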

> Maybe it would be better to keep the list sorting code in its own
> file (e.g. libxfs/list_sort.c) just to keep a bit of separation
> between the xfs code and code that is copied in from outside?

I'd wondered if it made more sense to do that, but (sort of arbitrarily)
decided not to add more files.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs


* Re: [PATCH 008/145] libxfs: add more list operations
  2016-06-24  0:46     ` Darrick J. Wong
@ 2016-06-24  1:50       ` Dave Chinner
  0 siblings, 0 replies; 149+ messages in thread
From: Dave Chinner @ 2016-06-24  1:50 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Thu, Jun 23, 2016 at 05:46:38PM -0700, Darrick J. Wong wrote:
> On Fri, Jun 24, 2016 at 10:40:22AM +1000, Dave Chinner wrote:
> > On Thu, Jun 16, 2016 at 06:31:35PM -0700, Darrick J. Wong wrote:
> > > Add some list operations that the deferred rmap code requires.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > This has all come from the linux kernel, right?
> 
> Yes.
> 
> > Can you tell me
> > which files it has come from so I can add it to the commit message?
> 
> lib/list_sort.c for all the list_sort stuff,
> include/linux/list.h for the rest of the list_* stuff,
> include/linux/kernel.h for container_of.

I thought we already had container_of. Ah, only inside __KERNEL__
code fragments....

> > Maybe it would be better to keep the list sorting code in its own
> > file (e.g. libxfs/list_sort.c) just to keep a bit of separation
> > between the xfs code and code that is copied in from outside?
> 
> I'd wondered if it made more sense to do that, but (sort of arbitrarily)
> decided not to add more files.

Keeping it in a separate file makes it easier to update in future
(e.g. forklift replacement of radix tree code). Maybe we don't need
to do this here, but I don't mind adding new files for stuff like
this...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 149+ messages
2016-06-17  1:30 [PATCH v6 000/145] xfsprogs: add reverse mapping, reflink, dedupe, and online scrub support Darrick J. Wong
2016-06-17  1:30 ` [PATCH 001/145] xfs_buflock: add a tool that can be used to find buffer deadlocks Darrick J. Wong
2016-06-17  1:30 ` [PATCH 002/145] libxfs: port changes from kernel libxfs Darrick J. Wong
2016-06-17  1:31 ` [PATCH 003/145] libxfs: backport changes from 4.5 Darrick J. Wong
2016-06-17  1:31 ` [PATCH 004/145] libxfs: backport kernel 4.6 changes Darrick J. Wong
2016-06-17  1:31 ` [PATCH 005/145] libxfs: backport kernel 4.7 changes Darrick J. Wong
2016-06-17  1:31 ` [PATCH 006/145] xfs: make several functions static Darrick J. Wong
2016-06-17  1:31 ` [PATCH 007/145] xfs: define XFS_IOC_FREEZE even if FIFREEZE is defined Darrick J. Wong
2016-06-17  1:31 ` [PATCH 008/145] libxfs: add more list operations Darrick J. Wong
2016-06-24  0:40   ` Dave Chinner
2016-06-24  0:46     ` Darrick J. Wong
2016-06-24  1:50       ` Dave Chinner
2016-06-17  1:31 ` [PATCH 009/145] xfs_logprint: move the EFI copying/printing functions to a redo items file Darrick J. Wong
2016-06-17  1:31 ` [PATCH 010/145] xfs_logprint: fix formatting issues with the EFI printing code Darrick J. Wong
2016-06-17  1:31 ` [PATCH 011/145] man: document the DAX fsxattr inode flag Darrick J. Wong
2016-06-17  1:32 ` [PATCH 012/145] xfs: separate freelist fixing into a separate helper Darrick J. Wong
2016-06-17  1:32 ` [PATCH 013/145] xfs: convert list of extents to free into a regular list Darrick J. Wong
2016-06-17  1:32 ` [PATCH 014/145] xfs: create a standard btree size calculator code Darrick J. Wong
2016-06-17  1:32 ` [PATCH 015/145] xfs: refactor btree maxlevels computation Darrick J. Wong
2016-06-17  1:32 ` [PATCH 016/145] xfs: during btree split, save new block key & ptr for future insertion Darrick J. Wong
2016-06-17  1:32 ` [PATCH 017/145] xfs: support btrees with overlapping intervals for keys Darrick J. Wong
2016-06-17  1:32 ` [PATCH 018/145] xfs: introduce interval queries on btrees Darrick J. Wong
2016-06-17  1:32 ` [PATCH 019/145] xfs: refactor btree owner change into a separate visit-blocks function Darrick J. Wong
2016-06-17  1:32 ` [PATCH 020/145] xfs: move deferred operations into a separate file Darrick J. Wong
2016-06-17  1:32 ` [PATCH 021/145] xfs: add tracepoints for the deferred ops mechanism Darrick J. Wong
2016-06-17  1:33 ` [PATCH 022/145] xfs: enable the xfs_defer mechanism to process extents to free Darrick J. Wong
2016-06-17  1:33 ` [PATCH 023/145] xfs: rework xfs_bmap_free callers to use xfs_defer_ops Darrick J. Wong
2016-06-17  1:33 ` [PATCH 024/145] xfs: change xfs_bmap_{finish, cancel, init, free} -> xfs_defer_* Darrick J. Wong
2016-06-17  1:33 ` [PATCH 025/145] xfs: rename flist/free_list to dfops Darrick J. Wong
2016-06-17  1:33 ` [PATCH 026/145] xfs: add tracepoints and error injection for deferred extent freeing Darrick J. Wong
2016-06-17  1:33 ` [PATCH 027/145] xfs_io: add free-extent error injection type Darrick J. Wong
2016-06-17  1:33 ` [PATCH 028/145] xfs: introduce rmap btree definitions Darrick J. Wong
2016-06-17  1:33 ` [PATCH 029/145] xfs: add rmap btree stats infrastructure Darrick J. Wong
2016-06-17  1:33 ` [PATCH 030/145] xfs: rmap btree add more reserved blocks Darrick J. Wong
2016-06-17  1:33 ` [PATCH 031/145] xfs: add owner field to extent allocation and freeing Darrick J. Wong
2016-06-17  1:34 ` [PATCH 032/145] xfs: introduce rmap extent operation stubs Darrick J. Wong
2016-06-17  1:34 ` [PATCH 033/145] xfs: define the on-disk rmap btree format Darrick J. Wong
2016-06-17  1:34 ` [PATCH 034/145] xfs: rmap btree transaction reservations Darrick J. Wong
2016-06-17  1:34 ` [PATCH 035/145] xfs: rmap btree requires more reserved free space Darrick J. Wong
2016-06-17  1:34 ` [PATCH 036/145] xfs: add rmap btree operations Darrick J. Wong
2016-06-17  1:34 ` [PATCH 037/145] xfs: support overlapping intervals in the rmap btree Darrick J. Wong
2016-06-17  1:34 ` [PATCH 038/145] xfs: teach rmapbt to support interval queries Darrick J. Wong
2016-06-17  1:34 ` [PATCH 039/145] xfs: add an extent to the rmap btree Darrick J. Wong
2016-06-17  1:34 ` [PATCH 040/145] xfs: remove an extent from " Darrick J. Wong
2016-06-17  1:35 ` [PATCH 041/145] xfs: convert unwritten status of reverse mappings Darrick J. Wong
2016-06-17  1:35 ` [PATCH 042/145] xfs: add rmap btree insert and delete helpers Darrick J. Wong
2016-06-17  1:35 ` [PATCH 043/145] xfs: create helpers for mapping, unmapping, and converting file fork extents Darrick J. Wong
2016-06-17  1:35 ` [PATCH 044/145] xfs: create rmap update intent log items Darrick J. Wong
2016-06-17  1:35 ` [PATCH 045/145] xfs: enable the xfs_defer mechanism to process rmaps to update Darrick J. Wong
2016-06-17  1:35 ` [PATCH 046/145] xfs: propagate bmap updates to rmapbt Darrick J. Wong
2016-06-17  1:35 ` [PATCH 047/145] xfs: add rmap btree geometry feature flag Darrick J. Wong
2016-06-17  1:35 ` [PATCH 048/145] xfs: don't update rmapbt when fixing agfl Darrick J. Wong
2016-06-17  1:35 ` [PATCH 049/145] xfs: enable the rmap btree functionality Darrick J. Wong
2016-06-17  1:36 ` [PATCH 050/145] xfs_db: display rmap btree contents Darrick J. Wong
2016-06-17  1:36 ` [PATCH 051/145] xfs_db: spot check rmapbt Darrick J. Wong
2016-06-17  1:36 ` [PATCH 052/145] xfs_db: copy the rmap btree Darrick J. Wong
2016-06-17  1:36 ` [PATCH 053/145] xfs_growfs: report rmapbt presence Darrick J. Wong
2016-06-17  1:36 ` [PATCH 054/145] xfs_io: add rmap-finish error injection type Darrick J. Wong
2016-06-17  1:36 ` [PATCH 055/145] xfs_logprint: support rmap redo items Darrick J. Wong
2016-06-17  1:36 ` [PATCH 056/145] xfs_repair: use rmap btree data to check block types Darrick J. Wong
2016-06-17  1:36 ` [PATCH 057/145] xfs_repair: fix fino_bno calculation when rmapbt is enabled Darrick J. Wong
2016-06-17  1:36 ` [PATCH 058/145] xfs_repair: create a slab API for allocating arrays in large chunks Darrick J. Wong
2016-06-17  1:36 ` [PATCH 059/145] xfs_repair: collect reverse-mapping data for refcount/rmap tree rebuilding Darrick J. Wong
2016-06-17  1:37 ` [PATCH 060/145] xfs_repair: record and merge raw rmap data Darrick J. Wong
2016-06-17  1:37 ` [PATCH 061/145] xfs_repair: add inode bmbt block rmaps Darrick J. Wong
2016-06-17  1:37 ` [PATCH 062/145] xfs_repair: add fixed-location per-AG rmaps Darrick J. Wong
2016-06-17  1:37 ` [PATCH 063/145] xfs_repair: check existing rmapbt entries against observed rmaps Darrick J. Wong
2016-06-17  1:37 ` [PATCH 064/145] xfs_repair: rebuild reverse-mapping btree Darrick J. Wong
2016-06-17  1:37 ` [PATCH 065/145] xfs_repair: add per-AG btree blocks to rmap data and add to rmapbt Darrick J. Wong
2016-06-17  1:37 ` [PATCH 066/145] xfs_repair: merge data & attr fork reverse mappings Darrick J. Wong
2016-06-17  1:37 ` [PATCH 067/145] xfs_repair: look for mergeable rmaps Darrick J. Wong
2016-06-17  1:37 ` [PATCH 068/145] xfs_repair: check for impossible rmap record field combinations Darrick J. Wong
2016-06-17  1:38 ` [PATCH 069/145] mkfs: set agsize prior to calculating minimum log size Darrick J. Wong
2016-06-17  1:38 ` [PATCH 070/145] mkfs.xfs: create filesystems with reverse-mappings Darrick J. Wong
2016-06-17  1:38 ` [PATCH 071/145] xfs: count the blocks in a btree Darrick J. Wong
2016-06-17  1:38 ` [PATCH 072/145] xfs: set up per-AG free space reservations Darrick J. Wong
2016-06-17  1:38 ` [PATCH 073/145] xfs: introduce refcount btree definitions Darrick J. Wong
2016-06-17  1:38 ` [PATCH 074/145] xfs: add refcount btree stats infrastructure Darrick J. Wong
2016-06-17  1:38 ` [PATCH 075/145] xfs: refcount btree add more reserved blocks Darrick J. Wong
2016-06-17  1:38 ` [PATCH 076/145] xfs: define the on-disk refcount btree format Darrick J. Wong
2016-06-17  1:38 ` [PATCH 077/145] xfs: account for the refcount btree in the alloc/free log reservation Darrick J. Wong
2016-06-17  1:38 ` [PATCH 078/145] xfs: add refcount btree operations Darrick J. Wong
2016-06-17  1:39 ` [PATCH 079/145] xfs: create refcount update intent log items Darrick J. Wong
2016-06-17  1:39 ` [PATCH 080/145] xfs: log refcount intent items Darrick J. Wong
2016-06-17  1:39 ` [PATCH 081/145] xfs: adjust refcount of an extent of blocks in refcount btree Darrick J. Wong
2016-06-17  1:39 ` [PATCH 082/145] xfs: connect refcount adjust functions to upper layers Darrick J. Wong
2016-06-17  1:39 ` [PATCH 083/145] xfs: adjust refcount when unmapping file blocks Darrick J. Wong
2016-06-17  1:39 ` [PATCH 084/145] xfs: refcount btree requires more reserved space Darrick J. Wong
2016-06-17  1:39 ` [PATCH 085/145] xfs: introduce reflink utility functions Darrick J. Wong
2016-06-17  1:39 ` [PATCH 086/145] xfs: create bmbt update intent log items Darrick J. Wong
2016-06-17  1:39 ` [PATCH 087/145] xfs: log bmap intent items Darrick J. Wong
2016-06-17  1:40 ` [PATCH 088/145] xfs: map an inode's offset to an exact physical block Darrick J. Wong
2016-06-17  1:40 ` [PATCH 089/145] xfs: implement deferred bmbt map/unmap operations Darrick J. Wong
2016-06-17  1:40 ` [PATCH 090/145] xfs: return work remaining at the end of a bunmapi operation Darrick J. Wong
2016-06-17  1:40 ` [PATCH 091/145] xfs: add reflink feature flag to geometry Darrick J. Wong
2016-06-17  1:40 ` [PATCH 092/145] xfs: don't allow reflinked dir/dev/fifo/socket/pipe files Darrick J. Wong
2016-06-17  1:40 ` [PATCH 093/145] xfs: introduce the CoW fork Darrick J. Wong
2016-06-17  1:40 ` [PATCH 094/145] xfs: support bmapping delalloc extents in " Darrick J. Wong
2016-06-17  1:40 ` [PATCH 095/145] xfs: support allocating delayed extents in " Darrick J. Wong
2016-06-17  1:40 ` [PATCH 096/145] xfs: support removing extents from " Darrick J. Wong
2016-06-17  1:41 ` [PATCH 097/145] xfs: store in-progress CoW allocations in the refcount btree Darrick J. Wong
2016-06-17  1:41 ` [PATCH 098/145] xfs: teach get_bmapx and fiemap about shared extents and the CoW fork Darrick J. Wong
2016-06-17  1:41 ` [PATCH 099/145] xfs: support FS_XFLAG_REFLINK on reflink filesystems Darrick J. Wong
2016-06-17  1:41 ` [PATCH 100/145] xfs: create a separate cow extent size hint for the allocator Darrick J. Wong
2016-06-17  1:41 ` [PATCH 101/145] xfs: preallocate blocks for worst-case btree expansion Darrick J. Wong
2016-06-17  1:41 ` [PATCH 102/145] xfs: try other AGs to allocate a BMBT block Darrick J. Wong
2016-06-17  1:41 ` [PATCH 103/145] xfs: provide switch to force filesystem to copy-on-write all the time Darrick J. Wong
2016-06-17  1:41 ` [PATCH 104/145] xfs: increase log reservations for reflink Darrick J. Wong
2016-06-17  1:41 ` [PATCH 105/145] xfs: use interval query for rmap map and unmap operations on shared files Darrick J. Wong
2016-06-17  1:41 ` [PATCH 106/145] xfs: convert unwritten status of shared-extent reverse mappings " Darrick J. Wong
2016-06-17  1:42 ` [PATCH 107/145] xfs: don't allow realtime and reflinked files to mix Darrick J. Wong
2016-06-17  1:42 ` [PATCH 108/145] xfs: don't mix reflink and DAX mode for now Darrick J. Wong
2016-06-17  1:42 ` [PATCH 109/145] xfs: recognize the reflink feature bit Darrick J. Wong
2016-06-17  1:42 ` [PATCH 110/145] xfs_db: dump refcount btree data Darrick J. Wong
2016-06-17  1:42 ` [PATCH 111/145] xfs_db: add support for checking the refcount btree Darrick J. Wong
2016-06-17  1:42 ` [PATCH 112/145] xfs_db: metadump should copy the refcount btree too Darrick J. Wong
2016-06-17  1:42 ` [PATCH 113/145] xfs_db: deal with the CoW extent size hint Darrick J. Wong
2016-06-17  1:42 ` [PATCH 114/145] xfs_growfs: report the presence of the reflink feature Darrick J. Wong
2016-06-17  1:42 ` [PATCH 115/145] xfs_io: bmap should support querying CoW fork, shared blocks Darrick J. Wong
2016-06-17  1:42 ` [PATCH 116/145] xfs_io: get and set the CoW extent size hint Darrick J. Wong
2016-06-17  1:43 ` [PATCH 117/145] xfs_io: add refcount+bmap error injection types Darrick J. Wong
2016-06-17  1:43 ` [PATCH 118/145] xfs_logprint: support cowextsize reporting in log contents Darrick J. Wong
2016-06-17  1:43 ` [PATCH 119/145] xfs_logprint: support refcount redo items Darrick J. Wong
2016-06-17  1:43 ` [PATCH 120/145] xfs_logprint: support bmap " Darrick J. Wong
2016-06-17  1:43 ` [PATCH 121/145] man: document the reflink inode flag in fsxattr Darrick J. Wong
2016-06-17  1:43 ` [PATCH 122/145] man: document the inode cowextsize flags & fields Darrick J. Wong
2016-06-17  1:43 ` [PATCH 123/145] xfs_repair: fix get_agino_buf to avoid corrupting inodes Darrick J. Wong
2016-06-17  1:43 ` [PATCH 124/145] xfs_repair: check the existing refcount btree Darrick J. Wong
2016-06-17  1:43 ` [PATCH 125/145] xfs_repair: handle multiple owners of data blocks Darrick J. Wong
2016-06-17  1:44 ` [PATCH 126/145] xfs_repair: process reverse-mapping data into refcount data Darrick J. Wong
2016-06-17  1:44 ` [PATCH 127/145] xfs_repair: record reflink inode state Darrick J. Wong
2016-06-17  1:44 ` [PATCH 128/145] xfs_repair: fix inode reflink flags Darrick J. Wong
2016-06-17  1:44 ` [PATCH 129/145] xfs_repair: check the refcount btree against our observed reference counts when -n Darrick J. Wong
2016-06-17  1:44 ` [PATCH 130/145] xfs_repair: rebuild the refcount btree Darrick J. Wong
2016-06-17  1:44 ` [PATCH 131/145] xfs_repair: complain about copy-on-write leftovers Darrick J. Wong
2016-06-17  1:44 ` [PATCH 132/145] xfs_repair: check the CoW extent size hint Darrick J. Wong
2016-06-17  1:44 ` [PATCH 133/145] xfs_repair: use range query when while checking rmaps Darrick J. Wong
2016-06-17  1:44 ` [PATCH 134/145] xfs_repair: check for mergeable refcount records Darrick J. Wong
2016-06-17  1:44 ` [PATCH 135/145] mkfs.xfs: format reflink enabled filesystems Darrick J. Wong
2016-06-17  1:45 ` [PATCH 136/145] xfs: introduce the XFS_IOC_GETFSMAPX ioctl Darrick J. Wong
2016-06-17  1:45 ` [PATCH 137/145] xfs_db: introduce the 'fsmap' command to find what owns a set of fsblocks Darrick J. Wong
2016-06-17  1:45 ` [PATCH 138/145] xfs_io: support the new getfsmap ioctl Darrick J. Wong
2016-06-17  1:45 ` [PATCH 139/145] xfs: scrub btree records and pointers while querying Darrick J. Wong
2016-06-17  1:45 ` [PATCH 140/145] xfs: support scrubbing free space btrees Darrick J. Wong
2016-06-17  1:45 ` [PATCH 141/145] xfs: support scrubbing inode btrees Darrick J. Wong
2016-06-17  1:45 ` [PATCH 142/145] xfs: support scrubbing rmap btree Darrick J. Wong
2016-06-17  1:45 ` [PATCH 143/145] xfs: support scrubbing refcount btree Darrick J. Wong
2016-06-17  1:45 ` [PATCH 144/145] xfs: add btree scrub tracepoints Darrick J. Wong
2016-06-17  1:46 ` [PATCH 145/145] xfs_scrub: create online filesystem scrub program Darrick J. Wong
