* [PATCH v10 00/39] xfsprogs: add reflink and dedupe support
@ 2016-10-25 23:03 Darrick J. Wong
  2016-10-25 23:03 ` [PATCH 01/39] xfs: convert RUI log formats to use variable length arrays Darrick J. Wong
                   ` (38 more replies)
  0 siblings, 39 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:03 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Hi all,

This is the tenth revision of a patchset that adds support to the XFS
userland tools for mapping multiple file logical blocks to the same
physical block (reflink/deduplication).

The core libxfs patches from the kernel have already been imported into
for-next, so this patch series contains fixes for those patches followed
by patches for the userspace programs themselves to support reflink.

If you're going to start using this mess, you probably ought to just
pull from my github trees for xfsprogs[1] and xfstests[2].  Kernel
support is in 4.9-rc1 or later.

The patches have been xfstested with x64, ppc64, and armhf; all tests
in the clone and rmap groups pass.  AFAICT they don't cause any new
failures for the 'auto' group.

This is an extraordinary way to eat your data.  Enjoy! 
Comments and questions are, as always, welcome.

--D

[1] https://github.com/djwong/xfsprogs/tree/djwong-devel
[2] https://github.com/djwong/xfstests/tree/djwong-devel


* [PATCH 01/39] xfs: convert RUI log formats to use variable length arrays
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
@ 2016-10-25 23:03 ` Darrick J. Wong
  2016-10-26 10:21   ` Christoph Hellwig
  2016-10-25 23:03 ` [PATCH 02/39] xfs_buflock: handling parsing errors more gracefully Darrick J. Wong
                   ` (37 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:03 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Use variable length array declarations for RUI log items,
and replace the open coded sizeof formulae with a single function.

[Fix up the logprint code to reflect the new RUI format.]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 logprint/log_redo.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)


diff --git a/logprint/log_redo.c b/logprint/log_redo.c
index add0764..40e0727 100644
--- a/logprint/log_redo.c
+++ b/logprint/log_redo.c
@@ -240,8 +240,7 @@ xfs_rui_copy_format(
 	int			  continued)
 {
 	uint nextents = ((struct xfs_rui_log_format *)buf)->rui_nextents;
-	uint dst_len = sizeof(struct xfs_rui_log_format) +
-			(nextents - 1) * sizeof(struct xfs_map_extent);
+	uint dst_len = xfs_rui_log_format_sizeof(nextents);
 
 	if (len == dst_len || continued) {
 		memcpy((char *)dst_fmt, buf, len);
@@ -283,8 +282,7 @@ xlog_print_trans_rui(
 
 	/* convert to native format */
 	nextents = src_f->rui_nextents;
-	dst_len = sizeof(struct xfs_rui_log_format) +
-			(nextents - 1) * sizeof(struct xfs_map_extent);
+	dst_len = xfs_rui_log_format_sizeof(nextents);
 
 	if (continued && src_len < core_size) {
 		printf(_("RUI: Not enough data to decode further\n"));


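As a rough illustration of the change described in this patch, the sketch
below shows how one helper can replace open-coded size arithmetic once the
trailing extent array is declared with variable length.  The names
demo_extent, demo_log_format and demo_log_format_sizeof are hypothetical
stand-ins for this note only; they are not the real XFS definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one mapped extent carried in a log item. */
struct demo_extent {
	uint64_t	owner;
	uint64_t	startblock;
	uint64_t	startoff;
	uint32_t	len;
	uint32_t	flags;
};

/* Hypothetical log format: fixed header, then nextents extents. */
struct demo_log_format {
	uint16_t		type;
	uint16_t		size;
	uint32_t		nextents;
	uint64_t		id;
	struct demo_extent	extents[];	/* flexible array member */
};

/* Bytes needed for the header plus nr trailing extents. */
static inline size_t
demo_log_format_sizeof(unsigned int nr)
{
	return sizeof(struct demo_log_format) +
			nr * sizeof(struct demo_extent);
}
```

With a one-element trailing array the callers had to remember the
"(nextents - 1)" adjustment seen in the removed lines; with the variable
length form the header size no longer includes any extent, so that
adjustment goes away.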

* [PATCH 02/39] xfs_buflock: handling parsing errors more gracefully
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
  2016-10-25 23:03 ` [PATCH 01/39] xfs: convert RUI log formats to use variable length arrays Darrick J. Wong
@ 2016-10-25 23:03 ` Darrick J. Wong
  2016-10-26 10:22   ` Christoph Hellwig
  2016-10-25 23:03 ` [PATCH 03/39] xfs: define the on-disk refcount btree format Darrick J. Wong
                   ` (36 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:03 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Skip ftrace output lines that don't parse.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 tools/xfsbuflock.py |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)


diff --git a/tools/xfsbuflock.py b/tools/xfsbuflock.py
index f307461..82b6e01 100755
--- a/tools/xfsbuflock.py
+++ b/tools/xfsbuflock.py
@@ -150,7 +150,10 @@ for line in fileinput.input():
 	if len(toks) < 4:
 		continue
 	pid = toks[0]
-	time = float(toks[2][:-1])
+	try:
+		time = float(toks[2][:-1])
+	except:
+		continue
 	fn = toks[3][:-1]
 
 	if pid in processes:



* [PATCH 03/39] xfs: define the on-disk refcount btree format
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
  2016-10-25 23:03 ` [PATCH 01/39] xfs: convert RUI log formats to use variable length arrays Darrick J. Wong
  2016-10-25 23:03 ` [PATCH 02/39] xfs_buflock: handling parsing errors more gracefully Darrick J. Wong
@ 2016-10-25 23:03 ` Darrick J. Wong
  2016-10-26 10:23   ` Christoph Hellwig
  2016-10-25 23:04 ` [PATCH 04/39] xfs: add refcount btree operations Darrick J. Wong
                   ` (35 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:03 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs, Christoph Hellwig

Start constructing the refcount btree implementation by establishing
the on-disk format and everything needed to read, write, and
manipulate the refcount btree blocks.

[Initialize the in-core mount context.]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: allocate the cursor with KM_NOFS to quiet lockdep]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
v2: Calculate a separate maxlevels for the refcount btree.

v3: Enable the tracking of per-cursor stats for refcount btrees.
The refcount update code will use this to guess if it's time to
split a refcountbt update across two transactions to avoid
exhausting the transaction reservation.

xfs_refcountbt_init_cursor can be called under the ilock, so
use KM_NOFS to prevent fs activity with a lock held.  This
should shut up some of the lockdep warnings.
---
 include/libxfs.h |    1 +
 libxfs/init.c    |    2 ++
 2 files changed, 3 insertions(+)


diff --git a/include/libxfs.h b/include/libxfs.h
index cf59d6c..ec8f6ab 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -79,6 +79,7 @@ extern uint32_t crc32c_le(uint32_t crc, unsigned char const *p, size_t len);
 #include "xfs_trans.h"
 #include "xfs_rmap_btree.h"
 #include "xfs_rmap.h"
+#include "xfs_refcount_btree.h"
 
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
diff --git a/libxfs/init.c b/libxfs/init.c
index 828ae3e..c962d3e 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -32,6 +32,7 @@
 #include "xfs_inode.h"
 #include "xfs_trans.h"
 #include "xfs_rmap_btree.h"
+#include "xfs_refcount_btree.h"
 
 #include "libxfs.h"		/* for now */
 
@@ -689,6 +690,7 @@ libxfs_mount(
 	xfs_bmap_compute_maxlevels(mp, XFS_ATTR_FORK);
 	xfs_ialloc_compute_maxlevels(mp);
 	xfs_rmapbt_compute_maxlevels(mp);
+	xfs_refcountbt_compute_maxlevels(mp);
 
 	if (sbp->sb_imax_pct) {
 		/* Make sure the maximum inode count is a multiple of the



* [PATCH 04/39] xfs: add refcount btree operations
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (2 preceding siblings ...)
  2016-10-25 23:03 ` [PATCH 03/39] xfs: define the on-disk refcount btree format Darrick J. Wong
@ 2016-10-25 23:04 ` Darrick J. Wong
  2016-10-26 10:23   ` Christoph Hellwig
  2016-10-25 23:04 ` [PATCH 05/39] xfs: connect refcount adjust functions to upper layers Darrick J. Wong
                   ` (34 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:04 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs, Christoph Hellwig

Implement the generic btree operations required to manipulate refcount
btree blocks.  The implementation is similar to the bmapbt, though it
will only allocate and free blocks from the AG.

[Add the xfs_refcount.h file to the standard include list.]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: fix logging of AGF refcount btree fields]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
v2: Remove init_rec_from_key since we no longer need it, and add
tracepoints when refcount btree operations fail.

Since the refcount root and level fields are separate from the
existing roots and levels array, they need a separate logging flag.
---
 include/libxfs.h |    1 +
 1 file changed, 1 insertion(+)


diff --git a/include/libxfs.h b/include/libxfs.h
index ec8f6ab..e5e1523 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -80,6 +80,7 @@ extern uint32_t crc32c_le(uint32_t crc, unsigned char const *p, size_t len);
 #include "xfs_rmap_btree.h"
 #include "xfs_rmap.h"
 #include "xfs_refcount_btree.h"
+#include "xfs_refcount.h"
 
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))



* [PATCH 05/39] xfs: connect refcount adjust functions to upper layers
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (3 preceding siblings ...)
  2016-10-25 23:04 ` [PATCH 04/39] xfs: add refcount btree operations Darrick J. Wong
@ 2016-10-25 23:04 ` Darrick J. Wong
  2016-10-26 10:24   ` Christoph Hellwig
  2016-10-25 23:04 ` [PATCH 06/39] xfs: implement deferred bmbt map/unmap operations Darrick J. Wong
                   ` (33 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:04 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Plumb in the upper level interface to schedule and finish deferred
refcount operations via the deferred ops mechanism.

[Plumb in refcount deferred op log items.]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_trans.h |    1 
 libxfs/defer_item.c |  130 +++++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/init.c       |    1 
 3 files changed, 132 insertions(+)


diff --git a/include/xfs_trans.h b/include/xfs_trans.h
index ab5d59b..739a792 100644
--- a/include/xfs_trans.h
+++ b/include/xfs_trans.h
@@ -147,5 +147,6 @@ libxfs_trans_read_buf(
 
 void xfs_extent_free_init_defer_op(void);
 void xfs_rmap_update_init_defer_op(void);
+void xfs_refcount_update_init_defer_op(void);
 
 #endif	/* __XFS_TRANS_H__ */
diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c
index f60a11b..1b7d037 100644
--- a/libxfs/defer_item.c
+++ b/libxfs/defer_item.c
@@ -31,6 +31,7 @@
 #include "xfs_bmap.h"
 #include "xfs_alloc.h"
 #include "xfs_rmap.h"
+#include "xfs_refcount.h"
 
 /* Dummy defer item ops, since we don't do logging. */
 
@@ -257,3 +258,132 @@ xfs_rmap_update_init_defer_op(void)
 {
 	xfs_defer_init_op_type(&xfs_rmap_update_defer_type);
 }
+
+/* Reference Counting */
+
+/* Sort refcount intents by AG. */
+static int
+xfs_refcount_update_diff_items(
+	void				*priv,
+	struct list_head		*a,
+	struct list_head		*b)
+{
+	struct xfs_mount		*mp = priv;
+	struct xfs_refcount_intent	*ra;
+	struct xfs_refcount_intent	*rb;
+
+	ra = container_of(a, struct xfs_refcount_intent, ri_list);
+	rb = container_of(b, struct xfs_refcount_intent, ri_list);
+	return  XFS_FSB_TO_AGNO(mp, ra->ri_startblock) -
+		XFS_FSB_TO_AGNO(mp, rb->ri_startblock);
+}
+
+/* Get a CUI. */
+STATIC void *
+xfs_refcount_update_create_intent(
+	struct xfs_trans		*tp,
+	unsigned int			count)
+{
+	return NULL;
+}
+
+/* Log refcount updates in the intent item. */
+STATIC void
+xfs_refcount_update_log_item(
+	struct xfs_trans		*tp,
+	void				*intent,
+	struct list_head		*item)
+{
+}
+
+/* Get a CUD so we can process all the deferred refcount updates. */
+STATIC void *
+xfs_refcount_update_create_done(
+	struct xfs_trans		*tp,
+	void				*intent,
+	unsigned int			count)
+{
+	return NULL;
+}
+
+/* Process a deferred refcount update. */
+STATIC int
+xfs_refcount_update_finish_item(
+	struct xfs_trans		*tp,
+	struct xfs_defer_ops		*dop,
+	struct list_head		*item,
+	void				*done_item,
+	void				**state)
+{
+	struct xfs_refcount_intent	*refc;
+	xfs_fsblock_t			new_fsb;
+	xfs_extlen_t			new_aglen;
+	int				error;
+
+	refc = container_of(item, struct xfs_refcount_intent, ri_list);
+	error = xfs_refcount_finish_one(tp, dop,
+			refc->ri_type,
+			refc->ri_startblock,
+			refc->ri_blockcount,
+			&new_fsb, &new_aglen,
+			(struct xfs_btree_cur **)state);
+	/* Did we run out of reservation?  Requeue what we didn't finish. */
+	if (!error && new_aglen > 0) {
+		ASSERT(refc->ri_type == XFS_REFCOUNT_INCREASE ||
+		       refc->ri_type == XFS_REFCOUNT_DECREASE);
+		refc->ri_startblock = new_fsb;
+		refc->ri_blockcount = new_aglen;
+		return -EAGAIN;
+	}
+	kmem_free(refc);
+	return error;
+}
+
+/* Clean up after processing deferred refcounts. */
+STATIC void
+xfs_refcount_update_finish_cleanup(
+	struct xfs_trans	*tp,
+	void			*state,
+	int			error)
+{
+	struct xfs_btree_cur	*rcur = state;
+
+	xfs_refcount_finish_one_cleanup(tp, rcur, error);
+}
+
+/* Abort all pending CUIs. */
+STATIC void
+xfs_refcount_update_abort_intent(
+	void				*intent)
+{
+}
+
+/* Cancel a deferred refcount update. */
+STATIC void
+xfs_refcount_update_cancel_item(
+	struct list_head		*item)
+{
+	struct xfs_refcount_intent	*refc;
+
+	refc = container_of(item, struct xfs_refcount_intent, ri_list);
+	kmem_free(refc);
+}
+
+static const struct xfs_defer_op_type xfs_refcount_update_defer_type = {
+	.type		= XFS_DEFER_OPS_TYPE_REFCOUNT,
+	.diff_items	= xfs_refcount_update_diff_items,
+	.create_intent	= xfs_refcount_update_create_intent,
+	.abort_intent	= xfs_refcount_update_abort_intent,
+	.log_item	= xfs_refcount_update_log_item,
+	.create_done	= xfs_refcount_update_create_done,
+	.finish_item	= xfs_refcount_update_finish_item,
+	.finish_cleanup = xfs_refcount_update_finish_cleanup,
+	.cancel_item	= xfs_refcount_update_cancel_item,
+};
+
+/* Register the deferred op type. */
+void
+xfs_refcount_update_init_defer_op(void)
+{
+	xfs_defer_init_op_type(&xfs_refcount_update_defer_type);
+}
diff --git a/libxfs/init.c b/libxfs/init.c
index c962d3e..9d9e928 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -268,6 +268,7 @@ libxfs_init(libxfs_init_t *a)
 
 	xfs_extent_free_init_defer_op();
 	xfs_rmap_update_init_defer_op();
+	xfs_refcount_update_init_defer_op();
 
 	radix_tree_init();
 


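The finish_item callback in this patch requeues whatever part of a refcount
update it could not complete by shrinking the intent and returning -EAGAIN.
A minimal sketch of that pattern follows.  Everything named demo_* is
hypothetical, and the "budget" parameter is an invented stand-in for the
transaction reservation; the sketch simply loops, whereas the real retry is
presumably driven by the deferred ops machinery.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical deferred work item covering a run of blocks. */
struct demo_intent {
	uint64_t	startblock;
	uint32_t	blockcount;
};

/*
 * Hypothetical worker: handles at most "budget" blocks and reports the
 * unfinished remainder via new_start/new_len (new_len == 0 means done).
 */
static int
demo_finish_one(uint64_t start, uint32_t len, uint32_t budget,
		uint64_t *new_start, uint32_t *new_len)
{
	uint32_t	done = len < budget ? len : budget;

	*new_start = start + done;
	*new_len = len - done;
	return 0;
}

/* Finish one item; return -EAGAIN if part of it is still pending. */
static int
demo_finish_item(struct demo_intent *it, uint32_t budget)
{
	uint64_t	new_start;
	uint32_t	new_len;
	int		error;

	error = demo_finish_one(it->startblock, it->blockcount, budget,
			&new_start, &new_len);
	if (!error && new_len > 0) {
		/* Shrink the item to the unfinished tail and ask for a retry. */
		it->startblock = new_start;
		it->blockcount = new_len;
		return -EAGAIN;
	}
	return error;
}

/* Drive an item to completion (budget must be > 0), counting the passes. */
static int
demo_run(struct demo_intent *it, uint32_t budget, int *passes)
{
	int	error;

	*passes = 0;
	do {
		(*passes)++;
		error = demo_finish_item(it, budget);
	} while (error == -EAGAIN);
	return error;
}
```

For example, an item of 10 blocks with a budget of 4 blocks per pass would
take three passes in this sketch.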

* [PATCH 06/39] xfs: implement deferred bmbt map/unmap operations
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (4 preceding siblings ...)
  2016-10-25 23:04 ` [PATCH 05/39] xfs: connect refcount adjust functions to upper layers Darrick J. Wong
@ 2016-10-25 23:04 ` Darrick J. Wong
  2016-10-25 23:04 ` [PATCH 07/39] xfs: introduce the CoW fork Darrick J. Wong
                   ` (32 subsequent siblings)
  38 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:04 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Implement deferred versions of the inode block map/unmap functions.
These will be used in subsequent patches to make reflink operations
atomic.

[Plumb in bmap deferred op log items.]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_trans.h |    1 
 libxfs/defer_item.c |  107 +++++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/init.c       |    1 
 3 files changed, 109 insertions(+)


diff --git a/include/xfs_trans.h b/include/xfs_trans.h
index 739a792..44deebb 100644
--- a/include/xfs_trans.h
+++ b/include/xfs_trans.h
@@ -148,5 +148,6 @@ libxfs_trans_read_buf(
 void xfs_extent_free_init_defer_op(void);
 void xfs_rmap_update_init_defer_op(void);
 void xfs_refcount_update_init_defer_op(void);
+void xfs_bmap_update_init_defer_op(void);
 
 #endif	/* __XFS_TRANS_H__ */
diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c
index 1b7d037..49bf7f8 100644
--- a/libxfs/defer_item.c
+++ b/libxfs/defer_item.c
@@ -32,6 +32,8 @@
 #include "xfs_alloc.h"
 #include "xfs_rmap.h"
 #include "xfs_refcount.h"
+#include "xfs_bmap.h"
+#include "xfs_inode.h"
 
 /* Dummy defer item ops, since we don't do logging. */
 
@@ -387,3 +389,108 @@ xfs_refcount_update_init_defer_op(void)
 {
 	xfs_defer_init_op_type(&xfs_refcount_update_defer_type);
 }
+
+/* Inode Block Mapping */
+
+/* Sort bmap intents by inode. */
+static int
+xfs_bmap_update_diff_items(
+	void				*priv,
+	struct list_head		*a,
+	struct list_head		*b)
+{
+	struct xfs_bmap_intent		*ba;
+	struct xfs_bmap_intent		*bb;
+
+	ba = container_of(a, struct xfs_bmap_intent, bi_list);
+	bb = container_of(b, struct xfs_bmap_intent, bi_list);
+	return ba->bi_owner->i_ino - bb->bi_owner->i_ino;
+}
+
+/* Get a BUI. */
+STATIC void *
+xfs_bmap_update_create_intent(
+	struct xfs_trans		*tp,
+	unsigned int			count)
+{
+	return NULL;
+}
+
+/* Log bmap updates in the intent item. */
+STATIC void
+xfs_bmap_update_log_item(
+	struct xfs_trans		*tp,
+	void				*intent,
+	struct list_head		*item)
+{
+}
+
+/* Get a BUD so we can process all the deferred bmap updates. */
+STATIC void *
+xfs_bmap_update_create_done(
+	struct xfs_trans		*tp,
+	void				*intent,
+	unsigned int			count)
+{
+	return NULL;
+}
+
+/* Process a deferred bmap update. */
+STATIC int
+xfs_bmap_update_finish_item(
+	struct xfs_trans		*tp,
+	struct xfs_defer_ops		*dop,
+	struct list_head		*item,
+	void				*done_item,
+	void				**state)
+{
+	struct xfs_bmap_intent		*bmap;
+	int				error;
+
+	bmap = container_of(item, struct xfs_bmap_intent, bi_list);
+	error = xfs_bmap_finish_one(tp, dop,
+			bmap->bi_owner,
+			bmap->bi_type, bmap->bi_whichfork,
+			bmap->bi_bmap.br_startoff,
+			bmap->bi_bmap.br_startblock,
+			bmap->bi_bmap.br_blockcount,
+			bmap->bi_bmap.br_state);
+	kmem_free(bmap);
+	return error;
+}
+
+/* Abort all pending BUIs. */
+STATIC void
+xfs_bmap_update_abort_intent(
+	void				*intent)
+{
+}
+
+/* Cancel a deferred bmap update. */
+STATIC void
+xfs_bmap_update_cancel_item(
+	struct list_head		*item)
+{
+	struct xfs_bmap_intent		*bmap;
+
+	bmap = container_of(item, struct xfs_bmap_intent, bi_list);
+	kmem_free(bmap);
+}
+
+static const struct xfs_defer_op_type xfs_bmap_update_defer_type = {
+	.type		= XFS_DEFER_OPS_TYPE_BMAP,
+	.diff_items	= xfs_bmap_update_diff_items,
+	.create_intent	= xfs_bmap_update_create_intent,
+	.abort_intent	= xfs_bmap_update_abort_intent,
+	.log_item	= xfs_bmap_update_log_item,
+	.create_done	= xfs_bmap_update_create_done,
+	.finish_item	= xfs_bmap_update_finish_item,
+	.cancel_item	= xfs_bmap_update_cancel_item,
+};
+
+/* Register the deferred op type. */
+void
+xfs_bmap_update_init_defer_op(void)
+{
+	xfs_defer_init_op_type(&xfs_bmap_update_defer_type);
+}
diff --git a/libxfs/init.c b/libxfs/init.c
index 9d9e928..3721589 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -269,6 +269,7 @@ libxfs_init(libxfs_init_t *a)
 	xfs_extent_free_init_defer_op();
 	xfs_rmap_update_init_defer_op();
 	xfs_refcount_update_init_defer_op();
+	xfs_bmap_update_init_defer_op();
 
 	radix_tree_init();
 


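This patch orders pending bmap intents by owning inode number.  The sketch
below shows the same ordering idea on a plain array with qsort(), as an
illustration only.  The demo_* names are hypothetical, and the explicit
three-way comparison is a choice made for this sketch (it keeps wide 64-bit
values from being truncated when squeezed into an int return value), not a
description of what the patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pending mapping update, keyed by owning inode number. */
struct demo_bmap_intent {
	uint64_t	ino;
	uint64_t	startoff;
};

/* Three-way compare by inode number: returns -1, 0 or 1. */
static int
demo_cmp_by_ino(const void *a, const void *b)
{
	const struct demo_bmap_intent	*ba = a;
	const struct demo_bmap_intent	*bb = b;

	return (ba->ino > bb->ino) - (ba->ino < bb->ino);
}

/* Sort an array of intents so updates to the same inode sit together. */
static void
demo_sort_intents(struct demo_bmap_intent *v, size_t n)
{
	qsort(v, n, sizeof(*v), demo_cmp_by_ino);
}
```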

* [PATCH 07/39] xfs: introduce the CoW fork
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (5 preceding siblings ...)
  2016-10-25 23:04 ` [PATCH 06/39] xfs: implement deferred bmbt map/unmap operations Darrick J. Wong
@ 2016-10-25 23:04 ` Darrick J. Wong
  2016-10-26 10:25   ` Christoph Hellwig
  2016-10-25 23:04 ` [PATCH 08/39] xfs: create a separate cow extent size hint for the allocator Darrick J. Wong
                   ` (31 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:04 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Introduce a new in-core fork for storing copy-on-write delalloc
reservations and allocated extents that are in the process of being
written out.

[Clean up the CoW fork, should there ever be one.]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: fix up bmapi_read so that we can query the CoW fork, and have it
return a "hole" extent if there's no CoW fork.
---
 libxfs/rdwr.c |    2 ++
 1 file changed, 2 insertions(+)


diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 526bc62..8b22eb4 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -1372,6 +1372,8 @@ libxfs_idestroy(xfs_inode_t *ip)
 	}
 	if (ip->i_afp)
 		libxfs_idestroy_fork(ip, XFS_ATTR_FORK);
+	if (ip->i_cowfp)
+		xfs_idestroy_fork(ip, XFS_COW_FORK);
 }
 
 void



* [PATCH 08/39] xfs: create a separate cow extent size hint for the allocator
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (6 preceding siblings ...)
  2016-10-25 23:04 ` [PATCH 07/39] xfs: introduce the CoW fork Darrick J. Wong
@ 2016-10-25 23:04 ` Darrick J. Wong
  2016-10-25 23:04 ` [PATCH 09/39] xfs_db: dump refcount btree data Darrick J. Wong
                   ` (30 subsequent siblings)
  38 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:04 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Create a per-inode extent size allocator hint for copy-on-write.  This
hint is separate from the existing extent size hint so that CoW can
take advantage of the fragmentation-reducing properties of extent size
hints without disabling delalloc for regular writes.

The extent size hint that's fed to the allocator during a copy on
write operation is the greater of the cowextsize and regular extsize
hint.

During reflink, if we're sharing the entire source file to the entire
destination file and the destination file doesn't already have a
cowextsize hint, propagate the source file's cowextsize hint to the
destination file.

[Plumb in the appropriate fsxattr flags and fields in the headers.]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/darwin.h  |    7 ++++++-
 include/freebsd.h |    6 +++++-
 include/irix.h    |    6 +++++-
 include/linux.h   |    7 ++++++-
 4 files changed, 22 insertions(+), 4 deletions(-)


diff --git a/include/darwin.h b/include/darwin.h
index 4132bfd..6c9f87a 100644
--- a/include/darwin.h
+++ b/include/darwin.h
@@ -292,7 +292,8 @@ struct fsxattr {
 	__u32		fsx_extsize;	/* extsize field value (get/set)*/
 	__u32		fsx_nextents;	/* nextents field value (get)	*/
 	__u32		fsx_projid;	/* project identifier (get/set) */
-	unsigned char	fsx_pad[12];
+	__u32		fsx_cowextsize;	/* cow extsize field value (get/set) */
+	unsigned char	fsx_pad[8];
 };
 
 /*
@@ -320,4 +321,8 @@ struct fsxattr {
 
 #endif
 
+#ifndef FS_XFLAG_COWEXTSIZE
+#define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
+#endif
+
 #endif	/* __XFS_DARWIN_H__ */
diff --git a/include/freebsd.h b/include/freebsd.h
index 32268ef..72935ec 100644
--- a/include/freebsd.h
+++ b/include/freebsd.h
@@ -181,7 +181,8 @@ struct fsxattr {
 	__u32		fsx_extsize;	/* extsize field value (get/set)*/
 	__u32		fsx_nextents;	/* nextents field value (get)	*/
 	__u32		fsx_projid;	/* project identifier (get/set) */
-	unsigned char	fsx_pad[12];
+	__u32		fsx_cowextsize;	/* cow extsize field value (get/set) */
+	unsigned char	fsx_pad[8];
 };
 
 /*
@@ -209,5 +210,8 @@ struct fsxattr {
 
 #endif
 
+#ifndef FS_XFLAG_COWEXTSIZE
+#define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
+#endif
 
 #endif	/* __XFS_FREEBSD_H__ */
diff --git a/include/irix.h b/include/irix.h
index 6094b8a..8bbcbd8 100644
--- a/include/irix.h
+++ b/include/irix.h
@@ -427,7 +427,8 @@ struct fsxattr {
 	__u32		fsx_extsize;	/* extsize field value (get/set)*/
 	__u32		fsx_nextents;	/* nextents field value (get)	*/
 	__u32		fsx_projid;	/* project identifier (get/set) */
-	unsigned char	fsx_pad[12];
+	__u32		fsx_cowextsize;	/* cow extsize field value (get/set) */
+	unsigned char	fsx_pad[8];
 };
 
 /*
@@ -455,6 +456,9 @@ struct fsxattr {
 
 #endif
 
+#ifndef FS_XFLAG_COWEXTSIZE
+#define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
+#endif
 
 /**
  * Abstraction of mountpoints.
diff --git a/include/linux.h b/include/linux.h
index 06f1af4..ddc053e 100644
--- a/include/linux.h
+++ b/include/linux.h
@@ -181,7 +181,8 @@ struct fsxattr {
 	__u32		fsx_extsize;	/* extsize field value (get/set)*/
 	__u32		fsx_nextents;	/* nextents field value (get)	*/
 	__u32		fsx_projid;	/* project identifier (get/set) */
-	unsigned char	fsx_pad[12];
+	__u32		fsx_cowextsize;	/* cow extsize field value (get/set) */
+	unsigned char	fsx_pad[8];
 };
 
 /*
@@ -209,4 +210,8 @@ struct fsxattr {
 
 #endif
 
+#ifndef FS_XFLAG_COWEXTSIZE
+#define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
+#endif
+
 #endif	/* __XFS_LINUX_H__ */


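Two points from this patch can be sketched in a few lines: the new
fsx_cowextsize field is carved out of the existing padding, so the structure
size does not change, and the hint handed to the allocator for a CoW write
is the greater of the two hints.  The demo_* structures and function below
are hypothetical copies made for illustration; the first field (fsx_xflags)
is not visible in the hunks and is assumed here to be a 32-bit flags word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout before the patch: 12 bytes of padding after the project id. */
struct demo_fsxattr_old {
	uint32_t	fsx_xflags;	/* assumed; not shown in the hunks */
	uint32_t	fsx_extsize;
	uint32_t	fsx_nextents;
	uint32_t	fsx_projid;
	unsigned char	fsx_pad[12];
};

/* Layout after the patch: 4 bytes of padding become fsx_cowextsize. */
struct demo_fsxattr_new {
	uint32_t	fsx_xflags;	/* assumed; not shown in the hunks */
	uint32_t	fsx_extsize;
	uint32_t	fsx_nextents;
	uint32_t	fsx_projid;
	uint32_t	fsx_cowextsize;
	unsigned char	fsx_pad[8];
};

/* The new field must not change the size or move existing fields. */
_Static_assert(sizeof(struct demo_fsxattr_old) ==
	       sizeof(struct demo_fsxattr_new),
	       "structure size must stay the same");
_Static_assert(offsetof(struct demo_fsxattr_new, fsx_cowextsize) ==
	       offsetof(struct demo_fsxattr_old, fsx_pad),
	       "new field must reuse the old padding");

/* Hint fed to the allocator for CoW: the larger of the two hints. */
static inline uint32_t
demo_cow_alloc_hint(uint32_t extsize, uint32_t cowextsize)
{
	return cowextsize > extsize ? cowextsize : extsize;
}
```

Keeping the size constant matters because older binaries pass a structure of
the old size through the same interface.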

* [PATCH 09/39] xfs_db: dump refcount btree data
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (7 preceding siblings ...)
  2016-10-25 23:04 ` [PATCH 08/39] xfs: create a separate cow extent size hint for the allocator Darrick J. Wong
@ 2016-10-25 23:04 ` Darrick J. Wong
  2016-10-26 10:28   ` Christoph Hellwig
  2016-10-25 23:04 ` [PATCH 10/39] xfs_db: add support for checking the refcount btree Darrick J. Wong
                   ` (29 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:04 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Add the ability to walk and dump the refcount btree in xfs_db.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/agf.c          |   13 +++++++++++--
 db/btblock.c      |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 db/btblock.h      |    5 +++++
 db/field.c        |   13 +++++++++++++
 db/field.h        |    6 ++++++
 db/inode.c        |    3 +++
 db/sb.c           |    2 ++
 db/type.c         |    5 +++++
 db/type.h         |    2 +-
 man/man8/xfs_db.8 |   47 +++++++++++++++++++++++++++++++++++++++++++--
 10 files changed, 146 insertions(+), 5 deletions(-)


diff --git a/db/agf.c b/db/agf.c
index 467dd4c..275f407 100644
--- a/db/agf.c
+++ b/db/agf.c
@@ -47,7 +47,7 @@ const field_t	agf_flds[] = {
 	{ "versionnum", FLDT_UINT32D, OI(OFF(versionnum)), C1, 0, TYP_NONE },
 	{ "seqno", FLDT_AGNUMBER, OI(OFF(seqno)), C1, 0, TYP_NONE },
 	{ "length", FLDT_AGBLOCK, OI(OFF(length)), C1, 0, TYP_NONE },
-	{ "roots", FLDT_AGBLOCK, OI(OFF(roots)), CI(XFS_BTNUM_AGF),
+	{ "roots", FLDT_AGBLOCK, OI(OFF(roots)), CI(XFS_BTNUM_AGF) + 1,
 	  FLD_ARRAY|FLD_SKIPALL, TYP_NONE },
 	{ "bnoroot", FLDT_AGBLOCK,
 	  OI(OFF(roots) + XFS_BTNUM_BNO * SZ(roots[XFS_BTNUM_BNO])), C1, 0,
@@ -58,7 +58,10 @@ const field_t	agf_flds[] = {
 	{ "rmaproot", FLDT_AGBLOCKNZ,
 	  OI(OFF(roots) + XFS_BTNUM_RMAP * SZ(roots[XFS_BTNUM_RMAP])), C1, 0,
 	  TYP_RMAPBT },
-	{ "levels", FLDT_UINT32D, OI(OFF(levels)), CI(XFS_BTNUM_AGF),
+	{ "refcntroot", FLDT_AGBLOCKNZ,
+	  OI(OFF(refcount_root)), C1, 0,
+	  TYP_REFCBT },
+	{ "levels", FLDT_UINT32D, OI(OFF(levels)), CI(XFS_BTNUM_AGF) + 1,
 	  FLD_ARRAY|FLD_SKIPALL, TYP_NONE },
 	{ "bnolevel", FLDT_UINT32D,
 	  OI(OFF(levels) + XFS_BTNUM_BNO * SZ(levels[XFS_BTNUM_BNO])), C1, 0,
@@ -69,9 +72,15 @@ const field_t	agf_flds[] = {
 	{ "rmaplevel", FLDT_UINT32D,
 	  OI(OFF(levels) + XFS_BTNUM_RMAP * SZ(levels[XFS_BTNUM_RMAP])), C1, 0,
 	  TYP_NONE },
+	{ "refcntlevel", FLDT_UINT32D,
+	  OI(OFF(refcount_level)), C1, 0,
+	  TYP_NONE },
 	{ "rmapblocks", FLDT_UINT32D,
 	  OI(OFF(rmap_blocks)), C1, 0,
 	  TYP_NONE },
+	{ "refcntblocks", FLDT_UINT32D,
+	  OI(OFF(refcount_blocks)), C1, 0,
+	  TYP_NONE },
 	{ "flfirst", FLDT_UINT32D, OI(OFF(flfirst)), C1, 0, TYP_NONE },
 	{ "fllast", FLDT_UINT32D, OI(OFF(fllast)), C1, 0, TYP_NONE },
 	{ "flcount", FLDT_UINT32D, OI(OFF(flcount)), C1, 0, TYP_NONE },
diff --git a/db/btblock.c b/db/btblock.c
index ce59d18..835a5f0 100644
--- a/db/btblock.c
+++ b/db/btblock.c
@@ -102,6 +102,12 @@ struct xfs_db_btree {
 		sizeof(struct xfs_rmap_rec),
 		sizeof(__be32),
 	},
+	{	XFS_REFC_CRC_MAGIC,
+		XFS_BTREE_SBLOCK_CRC_LEN,
+		sizeof(struct xfs_refcount_key),
+		sizeof(struct xfs_refcount_rec),
+		sizeof(__be32),
+	},
 	{	0,
 	},
 };
@@ -707,3 +713,52 @@ const field_t	rmapbt_rec_flds[] = {
 	{ NULL }
 };
 #undef ROFF
+
+/* refcount btree blocks */
+const field_t	refcbt_crc_hfld[] = {
+	{ "", FLDT_REFCBT_CRC, OI(0), C1, 0, TYP_NONE },
+	{ NULL }
+};
+
+#define	OFF(f)	bitize(offsetof(struct xfs_btree_block, bb_ ## f))
+const field_t	refcbt_crc_flds[] = {
+	{ "magic", FLDT_UINT32X, OI(OFF(magic)), C1, 0, TYP_NONE },
+	{ "level", FLDT_UINT16D, OI(OFF(level)), C1, 0, TYP_NONE },
+	{ "numrecs", FLDT_UINT16D, OI(OFF(numrecs)), C1, 0, TYP_NONE },
+	{ "leftsib", FLDT_AGBLOCK, OI(OFF(u.s.bb_leftsib)), C1, 0, TYP_REFCBT },
+	{ "rightsib", FLDT_AGBLOCK, OI(OFF(u.s.bb_rightsib)), C1, 0, TYP_REFCBT },
+	{ "bno", FLDT_DFSBNO, OI(OFF(u.s.bb_blkno)), C1, 0, TYP_REFCBT },
+	{ "lsn", FLDT_UINT64X, OI(OFF(u.s.bb_lsn)), C1, 0, TYP_NONE },
+	{ "uuid", FLDT_UUID, OI(OFF(u.s.bb_uuid)), C1, 0, TYP_NONE },
+	{ "owner", FLDT_AGNUMBER, OI(OFF(u.s.bb_owner)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(OFF(u.s.bb_crc)), C1, 0, TYP_NONE },
+	{ "recs", FLDT_REFCBTREC, btblock_rec_offset, btblock_rec_count,
+	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE },
+	{ "keys", FLDT_REFCBTKEY, btblock_key_offset, btblock_key_count,
+	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE },
+	{ "ptrs", FLDT_REFCBTPTR, btblock_ptr_offset, btblock_key_count,
+	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_REFCBT },
+	{ NULL }
+};
+#undef OFF
+
+#define REFCNTBT_COWFLAG_BITOFF		0
+#define REFCNTBT_STARTBLOCK_BITOFF	(REFCNTBT_COWFLAG_BITOFF + REFCNTBT_COWFLAG_BITLEN)
+
+#define	KOFF(f)	bitize(offsetof(struct xfs_refcount_key, rc_ ## f))
+const field_t	refcbt_key_flds[] = {
+	{ "startblock", FLDT_CAGBLOCK, OI(REFCNTBT_STARTBLOCK_BITOFF), C1, 0, TYP_DATA },
+	{ "cowflag", FLDT_CCOWFLG, OI(REFCNTBT_COWFLAG_BITOFF), C1, 0, TYP_DATA },
+	{ NULL }
+};
+#undef KOFF
+
+#define	ROFF(f)	bitize(offsetof(struct xfs_refcount_rec, rc_ ## f))
+const field_t	refcbt_rec_flds[] = {
+	{ "startblock", FLDT_CAGBLOCK, OI(REFCNTBT_STARTBLOCK_BITOFF), C1, 0, TYP_DATA },
+	{ "blockcount", FLDT_EXTLEN, OI(ROFF(blockcount)), C1, 0, TYP_NONE },
+	{ "refcount", FLDT_UINT32D, OI(ROFF(refcount)), C1, 0, TYP_DATA },
+	{ "cowflag", FLDT_CCOWFLG, OI(REFCNTBT_COWFLAG_BITOFF), C1, 0, TYP_DATA },
+	{ NULL }
+};
+#undef ROFF
diff --git a/db/btblock.h b/db/btblock.h
index 35299b4..fead2f1 100644
--- a/db/btblock.h
+++ b/db/btblock.h
@@ -59,4 +59,9 @@ extern const struct field	rmapbt_crc_hfld[];
 extern const struct field	rmapbt_key_flds[];
 extern const struct field	rmapbt_rec_flds[];
 
+extern const struct field	refcbt_crc_flds[];
+extern const struct field	refcbt_crc_hfld[];
+extern const struct field	refcbt_key_flds[];
+extern const struct field	refcbt_rec_flds[];
+
 extern int	btblock_size(void *obj, int startoff, int idx);
diff --git a/db/field.c b/db/field.c
index ca7642f..1968dd5 100644
--- a/db/field.c
+++ b/db/field.c
@@ -163,6 +163,10 @@ const ftattr_t	ftattrtab[] = {
 	  NULL, NULL },
 	{ FLDT_RBMBTFLG, "rbmbtflag", fp_num, "%u", SI(RMAPBT_BMBTFLAG_BITLEN), 0,
 	  NULL, NULL },
+	{ FLDT_CAGBLOCK, "cagblock", fp_num, "%u", SI(REFCNTBT_AGBLOCK_BITLEN),
+	  FTARG_DONULL, fa_agblock, NULL },
+	{ FLDT_CCOWFLG, "ccowflag", fp_num, "%u", SI(REFCNTBT_COWFLAG_BITLEN), 0,
+	  NULL, NULL },
 	{ FLDT_CNTBT, "cntbt", NULL, (char *)cntbt_flds, btblock_size, FTARG_SIZE,
 	  NULL, cntbt_flds },
 	{ FLDT_CNTBT_CRC, "cntbt", NULL, (char *)cntbt_crc_flds, btblock_size,
@@ -183,6 +187,15 @@ const ftattr_t	ftattrtab[] = {
 	{ FLDT_RMAPBTREC, "rmapbtrec", fp_sarray, (char *)rmapbt_rec_flds,
 	  SI(bitsz(struct xfs_rmap_rec)), 0, NULL, rmapbt_rec_flds },
 
+	{ FLDT_REFCBT_CRC, "refcntbt", NULL, (char *)refcbt_crc_flds, btblock_size,
+	  FTARG_SIZE, NULL, refcbt_crc_flds },
+	{ FLDT_REFCBTKEY, "refcntbtkey", fp_sarray, (char *)refcbt_key_flds,
+	  SI(bitsz(struct xfs_refcount_key)), 0, NULL, refcbt_key_flds },
+	{ FLDT_REFCBTPTR, "refcntbtptr", fp_num, "%u", SI(bitsz(xfs_refcount_ptr_t)),
+	  0, fa_agblock, NULL },
+	{ FLDT_REFCBTREC, "refcntbtrec", fp_sarray, (char *)refcbt_rec_flds,
+	  SI(bitsz(struct xfs_refcount_rec)), 0, NULL, refcbt_rec_flds },
+
 /* CRC field */
 	{ FLDT_CRC, "crc", fp_crc, "%#x (%s)", SI(bitsz(__uint32_t)),
 	  0, NULL, NULL },
diff --git a/db/field.h b/db/field.h
index 47f562a..53616f1 100644
--- a/db/field.h
+++ b/db/field.h
@@ -80,6 +80,8 @@ typedef enum fldt	{
 	FLDT_REXTFLG,
 	FLDT_RATTRFORKFLG,
 	FLDT_RBMBTFLG,
+	FLDT_CAGBLOCK,
+	FLDT_CCOWFLG,
 	FLDT_CNTBT,
 	FLDT_CNTBT_CRC,
 	FLDT_CNTBTKEY,
@@ -89,6 +91,10 @@ typedef enum fldt	{
 	FLDT_RMAPBTKEY,
 	FLDT_RMAPBTPTR,
 	FLDT_RMAPBTREC,
+	FLDT_REFCBT_CRC,
+	FLDT_REFCBTKEY,
+	FLDT_REFCBTPTR,
+	FLDT_REFCBTREC,
 
 	/* CRC field type */
 	FLDT_CRC,
diff --git a/db/inode.c b/db/inode.c
index 442e6ea..702cdf8 100644
--- a/db/inode.c
+++ b/db/inode.c
@@ -175,6 +175,9 @@ const field_t	inode_v3_flds[] = {
 	{ "crtime", FLDT_TIMESTAMP, OI(COFF(crtime)), C1, 0, TYP_NONE },
 	{ "inumber", FLDT_INO, OI(COFF(ino)), C1, 0, TYP_NONE },
 	{ "uuid", FLDT_UUID, OI(COFF(uuid)), C1, 0, TYP_NONE },
+	{ "reflink", FLDT_UINT1,
+	  OI(COFF(flags2) + bitsz(__uint64_t) - XFS_DIFLAG2_REFLINK_BIT-1), C1,
+	  0, TYP_NONE },
 	{ NULL }
 };
 
diff --git a/db/sb.c b/db/sb.c
index 79a3c1d..8e7722c 100644
--- a/db/sb.c
+++ b/db/sb.c
@@ -694,6 +694,8 @@ version_string(
 		strcat(s, ",SPARSE_INODES");
 	if (xfs_sb_version_hasmetauuid(sbp))
 		strcat(s, ",META_UUID");
+	if (xfs_sb_version_hasreflink(sbp))
+		strcat(s, ",REFLINK");
 	return s;
 }
 
diff --git a/db/type.c b/db/type.c
index 337243a..10fa54e 100644
--- a/db/type.c
+++ b/db/type.c
@@ -61,6 +61,7 @@ static const typ_t	__typtab[] = {
 	{ TYP_BNOBT, "bnobt", handle_struct, bnobt_hfld, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_CNTBT, "cntbt", handle_struct, cntbt_hfld, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_RMAPBT, NULL },
+	{ TYP_REFCBT, NULL },
 	{ TYP_DATA, "data", handle_block, NULL, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_DIR2, "dir2", handle_struct, dir2_hfld, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_DQBLK, "dqblk", handle_struct, dqblk_hfld, NULL, TYP_F_NO_CRC_OFF },
@@ -97,6 +98,8 @@ static const typ_t	__typtab_crc[] = {
 		&xfs_allocbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
 	{ TYP_RMAPBT, "rmapbt", handle_struct, rmapbt_crc_hfld,
 		&xfs_rmapbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
+	{ TYP_REFCBT, "refcntbt", handle_struct, refcbt_crc_hfld,
+		&xfs_refcountbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
 	{ TYP_DATA, "data", handle_block, NULL, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_DIR2, "dir3", handle_struct, dir3_hfld,
 		&xfs_dir3_db_buf_ops, TYP_F_NO_CRC_OFF },
@@ -139,6 +142,8 @@ static const typ_t	__typtab_spcrc[] = {
 		&xfs_allocbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
 	{ TYP_RMAPBT, "rmapbt", handle_struct, rmapbt_crc_hfld,
 		&xfs_rmapbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
+	{ TYP_REFCBT, "refcntbt", handle_struct, refcbt_crc_hfld,
+		&xfs_refcountbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
 	{ TYP_DATA, "data", handle_block, NULL, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_DIR2, "dir3", handle_struct, dir3_hfld,
 		&xfs_dir3_db_buf_ops, TYP_F_NO_CRC_OFF },
diff --git a/db/type.h b/db/type.h
index b5a21a7..87ff107 100644
--- a/db/type.h
+++ b/db/type.h
@@ -24,7 +24,7 @@ struct field;
 typedef enum typnm
 {
 	TYP_AGF, TYP_AGFL, TYP_AGI, TYP_ATTR, TYP_BMAPBTA,
-	TYP_BMAPBTD, TYP_BNOBT, TYP_CNTBT, TYP_RMAPBT, TYP_DATA,
+	TYP_BMAPBTD, TYP_BNOBT, TYP_CNTBT, TYP_RMAPBT, TYP_REFCBT, TYP_DATA,
 	TYP_DIR2, TYP_DQBLK, TYP_INOBT, TYP_INODATA, TYP_INODE,
 	TYP_LOG, TYP_RTBITMAP, TYP_RTSUMMARY, TYP_SB, TYP_SYMLINK,
 	TYP_TEXT, TYP_FINOBT, TYP_NONE
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index 8056b30..460d89d 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -699,8 +699,8 @@ If no argument is given, show the current data type.
 The possible data types are:
 .BR agf ", " agfl ", " agi ", " attr ", " bmapbta ", " bmapbtd ,
 .BR bnobt ", " cntbt ", " data ", " dir ", " dir2 ", " dqblk ,
-.BR inobt ", " inode ", " log ", " rmapbt ", " rtbitmap ", " rtsummary ,
-.BR sb ", " symlink " and " text .
+.BR inobt ", " inode ", " log ", " refcntbt ", " rmapbt ", " rtbitmap ,
+.BR rtsummary ", " sb ", " symlink " and " text .
 See the TYPES section below for more information on these data types.
 .TP
 .BI "uuid [" uuid " | " generate " | " rewrite " | " restore ]
@@ -1684,6 +1684,49 @@ use
 .BR xfs_logprint (8)
 instead.
 .TP
+.B refcntbt
+There is one set of filesystem blocks forming the reference count Btree for
+each allocation group. The root block of this Btree is designated by the
+.B refcntroot
+field in the corresponding AGF block.  The blocks are linked to sibling left
+and right blocks at each level, as well as by pointers from parent to child
+blocks.  Each block has the following fields:
+.RS 1.4i
+.PD 0
+.TP 1.2i
+.B magic
+REFC block magic number, 0x52334643 ('R3FC').
+.TP
+.B level
+level number of this block, 0 is a leaf.
+.TP
+.B numrecs
+number of data entries in the block.
+.TP
+.B leftsib
+left (logically lower) sibling block, 0 if none.
+.TP
+.B rightsib
+right (logically higher) sibling block, 0 if none.
+.TP
+.B recs
+[leaf blocks only] array of reference count records. Each record contains
+.BR startblock ,
+.BR blockcount ,
+and
+.BR refcount .
+.TP
+.B keys
+[non-leaf blocks only] array of key records. These are the first value
+of each block in the level below this one. Each record contains
+.BR startblock .
+.TP
+.B ptrs
+[non-leaf blocks only] array of child block pointers. Each pointer is a
+block number within the allocation group to the next level in the Btree.
+.PD
+.RE
+.TP
 .B rmapbt
 There is one set of filesystem blocks forming the reverse mapping Btree for
 each allocation group. The root block of this Btree is designated by the



* [PATCH 10/39] xfs_db: add support for checking the refcount btree
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (8 preceding siblings ...)
  2016-10-25 23:04 ` [PATCH 09/39] xfs_db: dump refcount btree data Darrick J. Wong
@ 2016-10-25 23:04 ` Darrick J. Wong
  2016-10-26  0:49   ` Dave Chinner
  2016-10-25 23:04 ` [PATCH 11/39] xfs_db: metadump should copy the refcount btree too Darrick J. Wong
                   ` (28 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:04 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Do some basic checks of the refcount btree.  xfs_repair will have to
check that the reference counts match the various bmbt mappings.
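
The block-type bookkeeping behind these checks can be sketched in
isolation.  This is a minimal illustration, not the xfs_db code: the enum
values and merge_state() are hypothetical names, and the rule assumed is
the one stated in the patch comment (a block claimed as data twice is
treated as shared "rldata", and rldata is never downgraded to data).

```c
#include <assert.h>

/* Hypothetical, reduced set of block states; the real dbm_t has many more. */
enum blk_state { BLK_UNKNOWN, BLK_DATA, BLK_RLDATA, BLK_BTREFC };

/*
 * Decide what a block's state becomes when something claims it as
 * "incoming".  Illustration only; merge_state() is not an xfs_db function.
 */
enum blk_state
merge_state(enum blk_state cur, enum blk_state incoming)
{
	assert(incoming != BLK_UNKNOWN);	/* a claim always has a type */

	if (cur == BLK_RLDATA && incoming == BLK_DATA)
		return BLK_RLDATA;	/* never downgrade rldata -> data */
	if (cur == BLK_DATA && incoming == BLK_DATA)
		return BLK_RLDATA;	/* second data claim means shared */
	return incoming;
}
```

Keeping the decision in one small function makes the three cases mutually
exclusive, so a later branch cannot overwrite the result of an earlier one.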

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/check.c |  136 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 128 insertions(+), 8 deletions(-)


diff --git a/db/check.c b/db/check.c
index a6a8372..5b90182 100644
--- a/db/check.c
+++ b/db/check.c
@@ -44,7 +44,8 @@ typedef enum {
 	DBM_FREE1,	DBM_FREE2,	DBM_FREELIST,	DBM_INODE,
 	DBM_LOG,	DBM_MISSING,	DBM_QUOTA,	DBM_RTBITMAP,
 	DBM_RTDATA,	DBM_RTFREE,	DBM_RTSUM,	DBM_SB,
-	DBM_SYMLINK,	DBM_BTFINO,	DBM_BTRMAP,
+	DBM_SYMLINK,	DBM_BTFINO,	DBM_BTRMAP,	DBM_BTREFC,
+	DBM_RLDATA,
 	DBM_NDBM
 } dbm_t;
 
@@ -52,7 +53,8 @@ typedef struct inodata {
 	struct inodata	*next;
 	nlink_t		link_set;
 	nlink_t		link_add;
-	char		isdir;
+	char		isdir:1;
+	char		isreflink:1;
 	char		security;
 	char		ilist;
 	xfs_ino_t	ino;
@@ -172,6 +174,8 @@ static const char	*typename[] = {
 	"symlink",
 	"btfino",
 	"btrmap",
+	"btrefcnt",
+	"rldata",
 	NULL
 };
 static int		verbose;
@@ -229,7 +233,8 @@ static int		blocktrash_f(int argc, char **argv);
 static int		blockuse_f(int argc, char **argv);
 static int		check_blist(xfs_fsblock_t bno);
 static void		check_dbmap(xfs_agnumber_t agno, xfs_agblock_t agbno,
-				    xfs_extlen_t len, dbm_t type);
+				    xfs_extlen_t len, dbm_t type,
+				    int ignore_reflink);
 static int		check_inomap(xfs_agnumber_t agno, xfs_agblock_t agbno,
 				     xfs_extlen_t len, xfs_ino_t c_ino);
 static void		check_linkcounts(xfs_agnumber_t agno);
@@ -353,6 +358,9 @@ static void		scanfunc_fino(struct xfs_btree_block *block, int level,
 static void		scanfunc_rmap(struct xfs_btree_block *block, int level,
 				     struct xfs_agf *agf, xfs_agblock_t bno,
 				     int isroot);
+static void		scanfunc_refcnt(struct xfs_btree_block *block, int level,
+				     struct xfs_agf *agf, xfs_agblock_t bno,
+				     int isroot);
 static void		set_dbmap(xfs_agnumber_t agno, xfs_agblock_t agbno,
 				  xfs_extlen_t len, dbm_t type,
 				  xfs_agnumber_t c_agno, xfs_agblock_t c_agbno);
@@ -1055,6 +1063,7 @@ blocktrash_f(
 		   (1 << DBM_SYMLINK) |
 		   (1 << DBM_BTFINO) |
 		   (1 << DBM_BTRMAP) |
+		   (1 << DBM_BTREFC) |
 		   (1 << DBM_SB);
 	while ((c = getopt(argc, argv, "0123n:o:s:t:x:y:z")) != EOF) {
 		switch (c) {
@@ -1291,18 +1300,25 @@ check_dbmap(
 	xfs_agnumber_t	agno,
 	xfs_agblock_t	agbno,
 	xfs_extlen_t	len,
-	dbm_t		type)
+	dbm_t		type,
+	int		ignore_reflink)
 {
 	xfs_extlen_t	i;
 	char		*p;
+	dbm_t		d;
 
 	for (i = 0, p = &dbmap[agno][agbno]; i < len; i++, p++) {
+		d = (dbm_t)*p;
+		if (ignore_reflink && (d == DBM_UNKNOWN || d == DBM_DATA ||
+				       d == DBM_RLDATA))
+			continue;
 		if ((dbm_t)*p != type) {
-			if (!sflag || CHECK_BLISTA(agno, agbno + i))
+			if (!sflag || CHECK_BLISTA(agno, agbno + i)) {
 				dbprintf(_("block %u/%u expected type %s got "
 					 "%s\n"),
 					agno, agbno + i, typename[type],
 					typename[(dbm_t)*p]);
+			}
 			error++;
 		}
 	}
@@ -1336,7 +1352,7 @@ check_inomap(
 		return 0;
 	}
 	for (i = 0, rval = 1, idp = &inomap[agno][agbno]; i < len; i++, idp++) {
-		if (*idp) {
+		if (*idp && !(*idp)->isreflink) {
 			if (!sflag || (*idp)->ilist ||
 			    CHECK_BLISTA(agno, agbno + i))
 				dbprintf(_("block %u/%u claimed by inode %lld, "
@@ -1542,6 +1558,26 @@ check_rrange(
 	return 1;
 }
 
+/*
+ * We don't check the accuracy of reference counts -- all we do is ensure
+ * that a data block never crosses with non-data blocks.  repair can check
+ * those kinds of things.
+ *
+ * So with that in mind, if we're setting a block to be data or rldata,
+ * don't complain so long as the block is currently unknown, data, or rldata.
+ * Don't let blocks downgrade from rldata -> data.
+ */
+static bool
+is_reflink(
+	dbm_t		type2)
+{
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return false;
+	if (type2 == DBM_DATA || type2 == DBM_RLDATA)
+		return true;
+	return false;
+}
+
 static void
 check_set_dbmap(
 	xfs_agnumber_t	agno,
@@ -1561,10 +1597,15 @@ check_set_dbmap(
 			agbno, agbno + len - 1, c_agno, c_agbno);
 		return;
 	}
-	check_dbmap(agno, agbno, len, type1);
+	check_dbmap(agno, agbno, len, type1, is_reflink(type2));
 	mayprint = verbose | blist_size;
 	for (i = 0, p = &dbmap[agno][agbno]; i < len; i++, p++) {
-		*p = (char)type2;
+		if (*p == DBM_RLDATA && type2 == DBM_DATA)
+			;	/* do nothing */
+		else if (*p == DBM_DATA && type2 == DBM_DATA)
+			*p = (char)DBM_RLDATA;
+		else
+			*p = (char)type2;
 		if (mayprint && (verbose || CHECK_BLISTA(agno, agbno + i)))
 			dbprintf(_("setting block %u/%u to %s\n"), agno, agbno + i,
 				typename[type2]);
@@ -2804,6 +2845,7 @@ process_inode(
 		break;
 	}
 
+	id->isreflink = !!(xino.i_d.di_flags2 & XFS_DIFLAG2_REFLINK);
 	setlink_inode(id, VFS_I(&xino)->i_nlink, type == DBM_DIR, security);
 
 	switch (xino.i_d.di_format) {
@@ -3910,6 +3952,12 @@ scan_ag(
 			be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]),
 			1, scanfunc_rmap, TYP_RMAPBT);
 	}
+	if (agf->agf_refcount_root) {
+		scan_sbtree(agf,
+			be32_to_cpu(agf->agf_refcount_root),
+			be32_to_cpu(agf->agf_refcount_level),
+			1, scanfunc_refcnt, TYP_REFCBT);
+	}
 	scan_sbtree(agf,
 		be32_to_cpu(agi->agi_root),
 		be32_to_cpu(agi->agi_level),
@@ -4643,6 +4691,78 @@ scanfunc_rmap(
 }
 
 static void
+scanfunc_refcnt(
+	struct xfs_btree_block	*block,
+	int			level,
+	struct xfs_agf		*agf,
+	xfs_agblock_t		bno,
+	int			isroot)
+{
+	xfs_agnumber_t		seqno = be32_to_cpu(agf->agf_seqno);
+	int			i;
+	xfs_refcount_ptr_t	*pp;
+	struct xfs_refcount_rec	*rp;
+	xfs_agblock_t		lastblock;
+
+	if (be32_to_cpu(block->bb_magic) != XFS_REFC_CRC_MAGIC) {
+		dbprintf(_("bad magic # %#x in refcntbt block %u/%u\n"),
+			be32_to_cpu(block->bb_magic), seqno, bno);
+		serious_error++;
+		return;
+	}
+	if (be16_to_cpu(block->bb_level) != level) {
+		if (!sflag)
+			dbprintf(_("expected level %d got %d in refcntbt block "
+				 "%u/%u\n"),
+				level, be16_to_cpu(block->bb_level), seqno, bno);
+		error++;
+	}
+	set_dbmap(seqno, bno, 1, DBM_BTREFC, seqno, bno);
+	if (level == 0) {
+		if (be16_to_cpu(block->bb_numrecs) > mp->m_refc_mxr[0] ||
+		    (isroot == 0 && be16_to_cpu(block->bb_numrecs) < mp->m_refc_mnr[0])) {
+			dbprintf(_("bad btree nrecs (%u, min=%u, max=%u) in "
+				 "refcntbt block %u/%u\n"),
+				be16_to_cpu(block->bb_numrecs), mp->m_refc_mnr[0],
+				mp->m_refc_mxr[0], seqno, bno);
+			serious_error++;
+			return;
+		}
+		rp = XFS_REFCOUNT_REC_ADDR(block, 1);
+		lastblock = 0;
+		for (i = 0; i < be16_to_cpu(block->bb_numrecs); i++) {
+			set_dbmap(seqno, be32_to_cpu(rp[i].rc_startblock),
+				be32_to_cpu(rp[i].rc_blockcount), DBM_RLDATA,
+				seqno, bno);
+			if (be32_to_cpu(rp[i].rc_startblock) < lastblock) {
+				dbprintf(_(
+		"out-of-order refcnt btree record %d (%u %u) block %u/%u\n"),
+					 i, be32_to_cpu(rp[i].rc_startblock),
+					 be32_to_cpu(rp[i].rc_blockcount),
+					 be32_to_cpu(agf->agf_seqno), bno);
+			} else {
+				lastblock = be32_to_cpu(rp[i].rc_startblock) +
+					    be32_to_cpu(rp[i].rc_blockcount);
+			}
+		}
+		return;
+	}
+	if (be16_to_cpu(block->bb_numrecs) > mp->m_refc_mxr[1] ||
+	    (isroot == 0 && be16_to_cpu(block->bb_numrecs) < mp->m_refc_mnr[1])) {
+		dbprintf(_("bad btree nrecs (%u, min=%u, max=%u) in refcntbt "
+			 "block %u/%u\n"),
+			be16_to_cpu(block->bb_numrecs), mp->m_refc_mnr[1],
+			mp->m_refc_mxr[1], seqno, bno);
+		serious_error++;
+		return;
+	}
+	pp = XFS_REFCOUNT_PTR_ADDR(block, 1, mp->m_refc_mxr[1]);
+	for (i = 0; i < be16_to_cpu(block->bb_numrecs); i++)
+		scan_sbtree(agf, be32_to_cpu(pp[i]), level, 0, scanfunc_refcnt,
+				TYP_REFCBT);
+}
+
+static void
 set_dbmap(
 	xfs_agnumber_t	agno,
 	xfs_agblock_t	agbno,



* [PATCH 11/39] xfs_db: metadump should copy the refcount btree too
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (9 preceding siblings ...)
  2016-10-25 23:04 ` [PATCH 10/39] xfs_db: add support for checking the refcount btree Darrick J. Wong
@ 2016-10-25 23:04 ` Darrick J. Wong
  2016-10-26 10:29   ` Christoph Hellwig
  2016-10-25 23:04 ` [PATCH 12/39] xfs_db: deal with the CoW extent size hint Darrick J. Wong
                   ` (27 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:04 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Teach metadump to copy the refcount btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/metadump.c |   74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)


diff --git a/db/metadump.c b/db/metadump.c
index c769958..1ba6b38 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -615,6 +615,78 @@ copy_rmap_btree(
 	return scan_btree(agno, root, levels, TYP_RMAPBT, agf, scanfunc_rmapbt);
 }
 
+static int
+scanfunc_refcntbt(
+	struct xfs_btree_block	*block,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno,
+	int			level,
+	typnm_t			btype,
+	void			*arg)
+{
+	xfs_refcount_ptr_t	*pp;
+	int			i;
+	int			numrecs;
+
+	if (level == 0)
+		return 1;
+
+	numrecs = be16_to_cpu(block->bb_numrecs);
+	if (numrecs > mp->m_refc_mxr[1]) {
+		if (show_warnings)
+			print_warning("invalid numrecs (%u) in %s block %u/%u",
+				numrecs, typtab[btype].name, agno, agbno);
+		return 1;
+	}
+
+	pp = XFS_REFCOUNT_PTR_ADDR(block, 1, mp->m_refc_mxr[1]);
+	for (i = 0; i < numrecs; i++) {
+		if (!valid_bno(agno, be32_to_cpu(pp[i]))) {
+			if (show_warnings)
+				print_warning("invalid block number (%u/%u) "
+					"in %s block %u/%u",
+					agno, be32_to_cpu(pp[i]),
+					typtab[btype].name, agno, agbno);
+			continue;
+		}
+		if (!scan_btree(agno, be32_to_cpu(pp[i]), level, btype, arg,
+				scanfunc_refcntbt))
+			return 0;
+	}
+	return 1;
+}
+
+static int
+copy_refcount_btree(
+	xfs_agnumber_t	agno,
+	struct xfs_agf	*agf)
+{
+	xfs_agblock_t	root;
+	int		levels;
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return 1;
+
+	root = be32_to_cpu(agf->agf_refcount_root);
+	levels = be32_to_cpu(agf->agf_refcount_level);
+
+	/* validate root and levels before processing the tree */
+	if (root == 0 || root > mp->m_sb.sb_agblocks) {
+		if (show_warnings)
+			print_warning("invalid block number (%u) in refcntbt "
+					"root in agf %u", root, agno);
+		return 1;
+	}
+	if (levels >= XFS_BTREE_MAXLEVELS) {
+		if (show_warnings)
+			print_warning("invalid level (%u) in refcntbt root "
+					"in agf %u", levels, agno);
+		return 1;
+	}
+
+	return scan_btree(agno, root, levels, TYP_REFCBT, agf, scanfunc_refcntbt);
+}
+
 /* filename and extended attribute obfuscation routines */
 
 struct name_ent {
@@ -2525,6 +2597,8 @@ scan_ag(
 			goto pop_out;
 		if (!copy_rmap_btree(agno, agf))
 			goto pop_out;
+		if (!copy_refcount_btree(agno, agf))
+			goto pop_out;
 	}
 
 	/* copy inode btrees and the inodes and their associated metadata */



* [PATCH 12/39] xfs_db: deal with the CoW extent size hint
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (10 preceding siblings ...)
  2016-10-25 23:04 ` [PATCH 11/39] xfs_db: metadump should copy the refcount btree too Darrick J. Wong
@ 2016-10-25 23:04 ` Darrick J. Wong
  2016-10-26 10:28   ` Christoph Hellwig
  2016-10-25 23:04 ` [PATCH 13/39] xfs_db: print one array element per line Darrick J. Wong
                   ` (26 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:04 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Display the CoW extent size hint (cowextsize) and the corresponding
flags2 bit (cowextsz) when dumping inodes.
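
The single-bit flag fields are addressed by a bit offset into the on-disk
(big-endian) 64-bit flags2 word.  A rough sketch of that arithmetic
follows, assuming offsets are counted MSB-first from the start of the
field; flag_bitoff() and get_bit() are hypothetical helpers written for
this illustration, not xfs_db functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bit offset, counted MSB-first from the start of a big-endian 64-bit
 * field, of the flag whose value is (1ULL << bit).
 */
unsigned int
flag_bitoff(unsigned int bit)
{
	assert(bit < 64);
	return 64 - bit - 1;
}

/* Read the single bit at an MSB-first bit offset from a byte buffer. */
unsigned int
get_bit(const uint8_t *buf, unsigned int bitoff)
{
	return (buf[bitoff / 8] >> (7 - bitoff % 8)) & 1;
}
```

With a buffer holding the big-endian encoding of (1ULL << 1), the flag is
found at offset 62, i.e. the second-lowest bit of the last byte.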

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/inode.c |    4 ++++
 1 file changed, 4 insertions(+)


diff --git a/db/inode.c b/db/inode.c
index 702cdf8..cac19fc 100644
--- a/db/inode.c
+++ b/db/inode.c
@@ -172,12 +172,16 @@ const field_t	inode_v3_flds[] = {
 	{ "change_count", FLDT_UINT64D, OI(COFF(changecount)), C1, 0, TYP_NONE },
 	{ "lsn", FLDT_UINT64X, OI(COFF(lsn)), C1, 0, TYP_NONE },
 	{ "flags2", FLDT_UINT64X, OI(COFF(flags2)), C1, 0, TYP_NONE },
+	{ "cowextsize", FLDT_EXTLEN, OI(COFF(cowextsize)), C1, 0, TYP_NONE },
 	{ "crtime", FLDT_TIMESTAMP, OI(COFF(crtime)), C1, 0, TYP_NONE },
 	{ "inumber", FLDT_INO, OI(COFF(ino)), C1, 0, TYP_NONE },
 	{ "uuid", FLDT_UUID, OI(COFF(uuid)), C1, 0, TYP_NONE },
 	{ "reflink", FLDT_UINT1,
 	  OI(COFF(flags2) + bitsz(__uint64_t) - XFS_DIFLAG2_REFLINK_BIT-1), C1,
 	  0, TYP_NONE },
+	{ "cowextsz", FLDT_UINT1,
+	  OI(COFF(flags2) + bitsz(__uint64_t) - XFS_DIFLAG2_COWEXTSIZE_BIT-1), C1,
+	  0, TYP_NONE },
 	{ NULL }
 };
 



* [PATCH 13/39] xfs_db: print one array element per line
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (11 preceding siblings ...)
  2016-10-25 23:04 ` [PATCH 12/39] xfs_db: deal with the CoW extent size hint Darrick J. Wong
@ 2016-10-25 23:04 ` Darrick J. Wong
  2016-10-26  0:51   ` Dave Chinner
  2016-10-25 23:05 ` [PATCH 14/39] xfs_growfs: report the presence of the reflink feature Darrick J. Wong
                   ` (25 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:04 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Print one array element per line so that the debugger output isn't
a gigantic pile of screen snow.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/print.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/db/print.c b/db/print.c
index 998daf4..e31372f 100644
--- a/db/print.c
+++ b/db/print.c
@@ -197,7 +197,7 @@ print_sarray(
 	     i < count && !seenint();
 	     i++, bitoff += size) {
 		if (array)
-			dbprintf("%d:", i + base);
+			dbprintf("\n%d:", i + base);
 		for (f = flds, first = 1; f->name; f++) {
 			if (f->flags & FLD_SKIPALL)
 				continue;



* [PATCH 14/39] xfs_growfs: report the presence of the reflink feature
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (12 preceding siblings ...)
  2016-10-25 23:04 ` [PATCH 13/39] xfs_db: print one array element per line Darrick J. Wong
@ 2016-10-25 23:05 ` Darrick J. Wong
  2016-10-26 10:31   ` Christoph Hellwig
  2016-10-25 23:05 ` [PATCH 15/39] xfs_io: bmap should support querying CoW fork, shared blocks Darrick J. Wong
                   ` (24 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:05 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Report the presence of the reflink feature in xfs_info.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 growfs/xfs_growfs.c |   12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)


diff --git a/growfs/xfs_growfs.c b/growfs/xfs_growfs.c
index 2b46480..a294e14 100644
--- a/growfs/xfs_growfs.c
+++ b/growfs/xfs_growfs.c
@@ -59,12 +59,14 @@ report_info(
 	int		ftype_enabled,
 	int		finobt_enabled,
 	int		spinodes,
-	int		rmapbt_enabled)
+	int		rmapbt_enabled,
+	int		reflink_enabled)
 {
 	printf(_(
 	    "meta-data=%-22s isize=%-6u agcount=%u, agsize=%u blks\n"
 	    "         =%-22s sectsz=%-5u attr=%u, projid32bit=%u\n"
 	    "         =%-22s crc=%-8u finobt=%u spinodes=%u rmapbt=%u\n"
+	    "         =%-22s reflink=%u\n"
 	    "data     =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n"
 	    "         =%-22s sunit=%-6u swidth=%u blks\n"
 	    "naming   =version %-14u bsize=%-6u ascii-ci=%d ftype=%d\n"
@@ -75,6 +77,7 @@ report_info(
 		mntpoint, geo.inodesize, geo.agcount, geo.agblocks,
 		"", geo.sectsize, attrversion, projid32bit,
 		"", crcs_enabled, finobt_enabled, spinodes, rmapbt_enabled,
+		"", reflink_enabled,
 		"", geo.blocksize, (unsigned long long)geo.datablocks,
 			geo.imaxpct,
 		"", geo.sunit, geo.swidth,
@@ -129,6 +132,7 @@ main(int argc, char **argv)
 	int			finobt_enabled;	/* free inode btree */
 	int			spinodes;
 	int			rmapbt_enabled;
+	int			reflink_enabled;
 
 	progname = basename(argv[0]);
 	setlocale(LC_ALL, "");
@@ -253,12 +257,13 @@ main(int argc, char **argv)
 	finobt_enabled = geo.flags & XFS_FSOP_GEOM_FLAGS_FINOBT ? 1 : 0;
 	spinodes = geo.flags & XFS_FSOP_GEOM_FLAGS_SPINODES ? 1 : 0;
 	rmapbt_enabled = geo.flags & XFS_FSOP_GEOM_FLAGS_RMAPBT ? 1 : 0;
+	reflink_enabled = geo.flags & XFS_FSOP_GEOM_FLAGS_REFLINK ? 1 : 0;
 	if (nflag) {
 		report_info(geo, datadev, isint, logdev, rtdev,
 				lazycount, dirversion, logversion,
 				attrversion, projid32bit, crcs_enabled, ci,
 				ftype_enabled, finobt_enabled, spinodes,
-				rmapbt_enabled);
+				rmapbt_enabled, reflink_enabled);
 		exit(0);
 	}
 
@@ -296,7 +301,8 @@ main(int argc, char **argv)
 	report_info(geo, datadev, isint, logdev, rtdev,
 			lazycount, dirversion, logversion,
 			attrversion, projid32bit, crcs_enabled, ci, ftype_enabled,
-			finobt_enabled, spinodes, rmapbt_enabled);
+			finobt_enabled, spinodes, rmapbt_enabled,
+			reflink_enabled);
 
 	ddsize = xi.dsize;
 	dlsize = ( xi.logBBsize? xi.logBBsize :



* [PATCH 15/39] xfs_io: bmap should support querying CoW fork, shared blocks
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (13 preceding siblings ...)
  2016-10-25 23:05 ` [PATCH 14/39] xfs_growfs: report the presence of the reflink feature Darrick J. Wong
@ 2016-10-25 23:05 ` Darrick J. Wong
  2016-10-25 23:05 ` [PATCH 16/39] libxfs: add configure option to override system header fsxattr Darrick J. Wong
                   ` (23 subsequent siblings)
  38 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:05 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Teach the bmap command to report shared and delayed allocation
extents, and to be able to query the CoW fork.
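
The verbose flags column packs one octal digit per flag, so adding the
shared flag widens the column from five digits to six.  The sketch below
shows that encoding under the assumption that the column is printed as
NFLG zero-padded octal digits; format_flags() is a hypothetical helper,
and only the FLG_* values are taken from the patch.

```c
#include <stdio.h>

#define NFLG		6	/* count of flags */
#define FLG_SHARED	0100000	/* shared extent */
#define FLG_PRE		0010000	/* unwritten extent */
#define FLG_BSU		0001000	/* not on begin of stripe unit */
#define FLG_ESU		0000100	/* not on end of stripe unit */
#define FLG_BSW		0000010	/* not on begin of stripe width */
#define FLG_ESW		0000001	/* not on end of stripe width */

/*
 * Render a flag mask as NFLG zero-padded octal digits, one digit per
 * flag.  Returns the number of characters snprintf() wanted to write.
 */
int
format_flags(char *buf, size_t len, int flg)
{
	return snprintf(buf, len, "%*.*o", NFLG, NFLG, (unsigned int)flg);
}
```

Because each flag owns a whole octal digit, every digit in the column is
either 0 or 1 and can be read off by position without decoding.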

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 io/bmap.c           |   43 ++++++++++++++++++++++++++++++++++---------
 man/man8/xfs_bmap.8 |   14 ++++++++++++++
 man/man8/xfs_io.8   |    2 +-
 3 files changed, 49 insertions(+), 10 deletions(-)


diff --git a/io/bmap.c b/io/bmap.c
index b2e48da..2333244 100644
--- a/io/bmap.c
+++ b/io/bmap.c
@@ -41,7 +41,9 @@ bmap_help(void)
 " Holes are marked by replacing the startblock..endblock with 'hole'.\n"
 " All the file offsets and disk blocks are in units of 512-byte blocks.\n"
 " -a -- prints the attribute fork map instead of the data fork.\n"
+" -c -- prints the copy-on-write fork map instead of the data fork.\n"
 " -d -- suppresses a DMAPI read event, offline portions shown as holes.\n"
+" -e -- prints delayed allocation extents.\n"
 " -l -- also displays the length of each extent in 512-byte blocks.\n"
 " -n -- query n extents.\n"
 " -p -- obtain all unwritten extents as well (w/ -v show which are unwritten.)\n"
@@ -75,6 +77,7 @@ bmap_f(
 	int			loop = 0;
 	int			flg = 0;
 	int			aflag = 0;
+	int			cflag = 0;
 	int			lflag = 0;
 	int			nflag = 0;
 	int			pflag = 0;
@@ -85,12 +88,19 @@ bmap_f(
 	int			c;
 	int			egcnt;
 
-	while ((c = getopt(argc, argv, "adln:pv")) != EOF) {
+	while ((c = getopt(argc, argv, "acdeln:pv")) != EOF) {
 		switch (c) {
 		case 'a':	/* Attribute fork. */
 			bmv_iflags |= BMV_IF_ATTRFORK;
 			aflag = 1;
 			break;
+		case 'c':	/* CoW fork. */
+			bmv_iflags |= BMV_IF_COWFORK | BMV_IF_DELALLOC;
+			cflag = 1;
+			break;
+		case 'e':
+			bmv_iflags |= BMV_IF_DELALLOC;
+			break;
 		case 'l':	/* list number of blocks with each extent */
 			lflag = 1;
 			break;
@@ -113,7 +123,7 @@ bmap_f(
 			return command_usage(&bmap_cmd);
 		}
 	}
-	if (aflag)
+	if (aflag || cflag)
 		bmv_iflags &= ~(BMV_IF_PREALLOC|BMV_IF_NO_DMAPI_READ);
 
 	if (vflag) {
@@ -273,13 +283,14 @@ bmap_f(
 #define MINRANGE_WIDTH	16
 #define MINAG_WIDTH	2
 #define MINTOT_WIDTH	5
-#define NFLG		5	/* count of flags */
-#define	FLG_NULL	000000	/* Null flag */
-#define	FLG_PRE		010000	/* Unwritten extent */
-#define	FLG_BSU		001000	/* Not on begin of stripe unit  */
-#define	FLG_ESU		000100	/* Not on end   of stripe unit  */
-#define	FLG_BSW		000010	/* Not on begin of stripe width */
-#define	FLG_ESW		000001	/* Not on end   of stripe width */
+#define NFLG		6	/* count of flags */
+#define	FLG_NULL	0000000	/* Null flag */
+#define	FLG_SHARED	0100000	/* shared extent */
+#define	FLG_PRE		0010000	/* Unwritten extent */
+#define	FLG_BSU		0001000	/* Not on begin of stripe unit  */
+#define	FLG_ESU		0000100	/* Not on end   of stripe unit  */
+#define	FLG_BSW		0000010	/* Not on begin of stripe width */
+#define	FLG_ESW		0000001	/* Not on end   of stripe width */
 		int	agno;
 		off64_t agoff, bbperag;
 		int	foff_w, boff_w, aoff_w, tot_w, agno_w;
@@ -350,6 +361,10 @@ bmap_f(
 			if (map[i + 1].bmv_oflags & BMV_OF_PREALLOC) {
 				flg |= FLG_PRE;
 			}
+			if (map[i + 1].bmv_oflags & BMV_OF_SHARED)
+				flg |= FLG_SHARED;
+			if (map[i + 1].bmv_oflags & BMV_OF_DELALLOC)
+				map[i + 1].bmv_block = -2;
 			/*
 			 * If striping enabled, determine if extent starts/ends
 			 * on a stripe unit boundary.
@@ -382,6 +397,14 @@ bmap_f(
 					agno_w, "",
 					aoff_w, "",
 					tot_w, (long long)map[i+1].bmv_length);
+			} else if (map[i + 1].bmv_block == -2) {
+				printf("%4d: %-*s %-*s %*s %-*s %*lld\n",
+					i,
+					foff_w, rbuf,
+					boff_w, _("delalloc"),
+					agno_w, "",
+					aoff_w, "",
+					tot_w, (long long)map[i+1].bmv_length);
 			} else {
 				snprintf(bbuf, sizeof(bbuf), "%lld..%lld",
 					(long long) map[i + 1].bmv_block,
@@ -413,6 +436,8 @@ bmap_f(
 		}
 		if ((flg || pflag) && vflag > 1) {
 			printf(_(" FLAG Values:\n"));
+			printf(_("    %*.*o Shared extent\n"),
+				NFLG+1, NFLG+1, FLG_SHARED);
 			printf(_("    %*.*o Unwritten preallocated extent\n"),
 				NFLG+1, NFLG+1, FLG_PRE);
 			printf(_("    %*.*o Doesn't begin on stripe unit\n"),
diff --git a/man/man8/xfs_bmap.8 b/man/man8/xfs_bmap.8
index e196559..098cfae 100644
--- a/man/man8/xfs_bmap.8
+++ b/man/man8/xfs_bmap.8
@@ -36,6 +36,10 @@ no matter what the filesystem's block size is.
 If this option is specified, information about the file's
 attribute fork is printed instead of the default data fork.
 .TP
+.B \-c
+If this option is specified, information about the file's
+copy-on-write fork is printed instead of the default data fork.
+.TP
 .B \-d
 If portions of the file have been migrated offline by
 a DMAPI application, a DMAPI read event will be generated to
@@ -45,6 +49,16 @@ printed.  However if the
 option is used, no DMAPI read event will be generated for a
 DMAPI file and offline portions will be reported as holes.
 .TP
+.B \-e
+If this option is used,
+.B xfs_bmap
+obtains all delayed allocation extents, and does not flush dirty pages
+to disk before querying extent data. With the
+.B \-v
+option, the
+.I flags
+column will show which extents have not yet been allocated.
+.TP
 .B \-l
 If this option is used, then
 .IP
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index 2c56f09..d089524 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -256,7 +256,7 @@ See the
 .B pwrite
 command.
 .TP
-.BI "bmap [ \-adlpv ] [ \-n " nx " ]"
+.BI "bmap [ \-acdelpv ] [ \-n " nx " ]"
 Prints the block mapping for the current open file. Refer to the
 .BR xfs_bmap (8)
 manual page for complete documentation.



* [PATCH 16/39] libxfs: add configure option to override system header fsxattr
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (14 preceding siblings ...)
  2016-10-25 23:05 ` [PATCH 15/39] xfs_io: bmap should support querying CoW fork, shared blocks Darrick J. Wong
@ 2016-10-25 23:05 ` Darrick J. Wong
  2016-10-26  0:56   ` Dave Chinner
  2016-10-26 10:32   ` Christoph Hellwig
  2016-10-25 23:05 ` [PATCH 17/39] xfs_io: get and set the CoW extent size hint Darrick J. Wong
                   ` (22 subsequent siblings)
  38 siblings, 2 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:05 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

By default, libxfs will use the kernel/system headers to define struct
fsxattr.  Unfortunately, this creates a problem for developers who are
writing new features but building xfsprogs on a stable system, because
the stable kernel's headers don't reflect the new feature.  In this
case, we want to be able to use the internal fsxattr definition while
the kernel headers catch up, so provide some configure magic to allow
developers to force the use of the internal definition.
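
The override works by renaming the struct tag while the system header is
parsed, which frees the real name for the internal definition.  Here is a
self-contained sketch of the trick; the inline "system" struct stands in
for <linux/fs.h>, and both layouts are reduced, hypothetical versions of
the real struct fsxattr.

```c
#include <stddef.h>

/* Step 1: rename the tag while the "system" definition is parsed. */
#define fsxattr sys_fsxattr
/* Stand-in for what an old system header might declare (hypothetical). */
struct fsxattr {
	unsigned int	fsx_xflags;
	unsigned int	fsx_extsize;
};
#undef fsxattr

/* Step 2: the internal definition can now use the real name. */
struct fsxattr {
	unsigned int	fsx_xflags;
	unsigned int	fsx_extsize;
	unsigned int	fsx_cowextsize;	/* field the old header lacks */
};

/* Offset of the new field, to show which definition is in effect. */
size_t
new_field_offset(void)
{
	return offsetof(struct fsxattr, fsx_cowextsize);
}
```

After the #undef, the system's version survives only as struct
sys_fsxattr, so code that names struct fsxattr gets the internal layout.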

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 configure.ac         |    5 +++++
 include/builddefs.in |    4 ++++
 include/linux.h      |   10 +++++++++-
 io/fiemap.c          |    1 -
 4 files changed, 18 insertions(+), 2 deletions(-)
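
A minimal sketch of the rename trick used in include/linux.h, for anyone
who has not seen it before.  The first struct below merely stands in for
what <linux/fs.h> would provide on an old system; its layout and the
helper names (sys_size, internal_size) are made up for illustration.
In the real build the override is selected with
--enable-internal-fsxattr=yes at configure time.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Pretend this is the system header.  While it is being parsed,
 * "fsxattr" is spelled "sys_fsxattr", so the old definition lands
 * under a different tag and cannot clash with ours.
 */
#define fsxattr sys_fsxattr
struct fsxattr {			/* really declares struct sys_fsxattr */
	uint32_t	fsx_xflags;
	uint32_t	fsx_extsize;
};
#undef fsxattr

/* Internal definition, free to carry the new field. */
struct fsxattr {
	uint32_t	fsx_xflags;
	uint32_t	fsx_extsize;
	uint32_t	fsx_cowextsize;
};

/* Hypothetical helpers, only here so the two layouts can be compared. */
static size_t sys_size(void)      { return sizeof(struct sys_fsxattr); }
static size_t internal_size(void) { return sizeof(struct fsxattr); }
```

Both definitions coexist in one translation unit, and every later use of
struct fsxattr picks up the internal one.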


diff --git a/configure.ac b/configure.ac
index 50e04df..8a39e75 100644
--- a/configure.ac
+++ b/configure.ac
@@ -61,6 +61,11 @@ AC_ARG_ENABLE(librt,
 	enable_librt=yes)
 AC_SUBST(enable_librt)
 
+AC_ARG_ENABLE(internal-fsxattr,
+[ --enable-internal-fsxattr=[yes/no] Override system definition of struct fsxattr [default=no]],,
+	enable_internal_fsxattr=no)
+AC_SUBST(enable_internal_fsxattr)
+
 #
 # If the user specified a libdir ending in lib64 do not append another
 # 64 to the library names.
diff --git a/include/builddefs.in b/include/builddefs.in
index 7153d7a..fd7eb74 100644
--- a/include/builddefs.in
+++ b/include/builddefs.in
@@ -109,6 +109,7 @@ HAVE_MNTENT = @have_mntent@
 HAVE_FLS = @have_fls@
 HAVE_FSETXATTR = @have_fsetxattr@
 HAVE_MREMAP = @have_mremap@
+ENABLE_INTERNAL_FSXATTR = @enable_internal_fsxattr@
 
 GCCFLAGS = -funsigned-char -fno-strict-aliasing -Wall
 #	   -Wbitwise -Wno-transparent-union -Wno-old-initializer -Wno-decl
@@ -148,6 +149,9 @@ endif
 ifeq ($(ENABLE_BLKID),yes)
 PCFLAGS+= -DENABLE_BLKID
 endif
+ifeq ($(ENABLE_INTERNAL_FSXATTR),yes)
+PCFLAGS+= -DOVERRIDE_SYSTEM_FSXATTR
+endif
 
 
 GCFLAGS = $(OPTIMIZER) $(DEBUG) \
diff --git a/include/linux.h b/include/linux.h
index ddc053e..e26388b 100644
--- a/include/linux.h
+++ b/include/linux.h
@@ -32,7 +32,13 @@
 #include <stdio.h>
 #include <asm/types.h>
 #include <mntent.h>
+#ifdef OVERRIDE_SYSTEM_FSXATTR
+# define fsxattr sys_fsxattr
+#endif
 #include <linux/fs.h> /* fsxattr defintion for new kernels */
+#ifdef OVERRIDE_SYSTEM_FSXATTR
+# undef fsxattr
+#endif
 
 static __inline__ int xfsctl(const char *path, int fd, int cmd, void *p)
 {
@@ -175,7 +181,7 @@ static inline void platform_mntent_close(struct mntent_cursor * cursor)
  * are a copy of the definitions moved to linux/uapi/fs.h in the 4.5 kernel,
  * so this is purely for supporting builds against old kernel headers.
  */
-#ifndef FS_IOC_FSGETXATTR
+#if !defined FS_IOC_FSGETXATTR || defined OVERRIDE_SYSTEM_FSXATTR
 struct fsxattr {
 	__u32		fsx_xflags;	/* xflags field value (get/set) */
 	__u32		fsx_extsize;	/* extsize field value (get/set)*/
@@ -184,7 +190,9 @@ struct fsxattr {
 	__u32		fsx_cowextsize;	/* cow extsize field value (get/set) */
 	unsigned char	fsx_pad[8];
 };
+#endif
 
+#ifndef FS_IOC_FSGETXATTR
 /*
  * Flags for the fsx_xflags field
  */
diff --git a/io/fiemap.c b/io/fiemap.c
index f89da06..bcbae49 100644
--- a/io/fiemap.c
+++ b/io/fiemap.c
@@ -19,7 +19,6 @@
 #include "platform_defs.h"
 #include "command.h"
 #include <linux/fiemap.h>
-#include <linux/fs.h>
 #include "init.h"
 #include "io.h"
 



* [PATCH 17/39] xfs_io: get and set the CoW extent size hint
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (15 preceding siblings ...)
  2016-10-25 23:05 ` [PATCH 16/39] libxfs: add configure option to override system header fsxattr Darrick J. Wong
@ 2016-10-25 23:05 ` Darrick J. Wong
  2016-10-26  1:06   ` Dave Chinner
  2016-10-25 23:05 ` [PATCH 18/39] xfs_io: add refcount+bmap error injection types Darrick J. Wong
                   ` (21 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:05 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Enable administrators to get or set the CoW extent size hint.
Report the hint when we run stat.  This also requires some
autoconf magic to detect whether or not fsx_cowextsize exists.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 configure.ac          |    1 
 include/builddefs.in  |    4 +
 io/Makefile           |    5 +
 io/attr.c             |    5 +
 io/cowextsize.c       |  202 +++++++++++++++++++++++++++++++++++++++++++++++++
 io/init.c             |    1 
 io/io.h               |    6 +
 io/open.c             |    3 +
 m4/package_libcdev.m4 |   26 ++++++
 man/man8/xfs_io.8     |   16 ++++
 10 files changed, 268 insertions(+), 1 deletion(-)
 create mode 100644 io/cowextsize.c
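
For reference, a rough standalone sketch of what the new command does
underneath, without the xfs_io plumbing.  It assumes kernel headers new
enough (4.9 or later) that <linux/fs.h> already has fsx_cowextsize and
FS_XFLAG_COWEXTSIZE, and it uses the generic FS_IOC_* names rather than
the XFS_IOC_* aliases.  The function names are illustrative only.

```c
#include <errno.h>	/* errno is set by ioctl() on failure */
#include <sys/ioctl.h>
#include <linux/fs.h>	/* struct fsxattr, FS_IOC_FS[GS]ETXATTR */

/* Fetch the CoW extent size hint, in bytes.  Returns 0, or -1 with errno set. */
static int
cowextsize_get(int fd, unsigned int *bytes)
{
	struct fsxattr	fsx;

	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		return -1;
	*bytes = fsx.fsx_cowextsize;
	return 0;
}

/*
 * Set the hint.  This is a read-modify-write so that the other xflags
 * and the regular extsize hint are preserved.
 */
static int
cowextsize_set(int fd, unsigned int bytes)
{
	struct fsxattr	fsx;

	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		return -1;
	fsx.fsx_xflags |= FS_XFLAG_COWEXTSIZE;
	fsx.fsx_cowextsize = bytes;
	return ioctl(fd, FS_IOC_FSSETXATTR, &fsx);
}
```

Unlike set_cowextsize() in the patch, this sketch does not check that
the target is a regular file or a directory before setting the flag.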


diff --git a/configure.ac b/configure.ac
index 8a39e75..539966d 100644
--- a/configure.ac
+++ b/configure.ac
@@ -134,6 +134,7 @@ AC_HAVE_FLS
 AC_HAVE_READDIR
 AC_HAVE_FSETXATTR
 AC_HAVE_MREMAP
+AC_HAVE_FSXATTR_COWEXTSIZE
 
 if test "$enable_blkid" = yes; then
 AC_HAVE_BLKID_TOPO
diff --git a/include/builddefs.in b/include/builddefs.in
index fd7eb74..165fa78 100644
--- a/include/builddefs.in
+++ b/include/builddefs.in
@@ -109,6 +109,7 @@ HAVE_MNTENT = @have_mntent@
 HAVE_FLS = @have_fls@
 HAVE_FSETXATTR = @have_fsetxattr@
 HAVE_MREMAP = @have_mremap@
+HAVE_FSXATTR_COWEXTSIZE = @have_fsxattr_cowextsize@
 ENABLE_INTERNAL_FSXATTR = @enable_internal_fsxattr@
 
 GCCFLAGS = -funsigned-char -fno-strict-aliasing -Wall
@@ -149,6 +150,9 @@ endif
 ifeq ($(ENABLE_BLKID),yes)
 PCFLAGS+= -DENABLE_BLKID
 endif
+ifeq ($(HAVE_FSXATTR_COWEXTSIZE),yes)
+PCFLAGS+= -DHAVE_FSXATTR_COWEXTSIZE
+endif
 ifeq ($(ENABLE_INTERNAL_FSXATTR),yes)
 PCFLAGS+= -DOVERRIDE_SYSTEM_FSXATTR
 endif
diff --git a/io/Makefile b/io/Makefile
index 62bc03b..1997ca9 100644
--- a/io/Makefile
+++ b/io/Makefile
@@ -99,6 +99,11 @@ ifeq ($(HAVE_MREMAP),yes)
 LCFLAGS += -DHAVE_MREMAP
 endif
 
+ifeq ($(HAVE_FSXATTR_COWEXTSIZE),yes)
+CFILES += cowextsize.c
+# -DHAVE_FSXATTR_COWEXTSIZE already set in PCFLAGS
+endif
+
 default: depend $(LTCOMMAND)
 
 include $(BUILDRULES)
diff --git a/io/attr.c b/io/attr.c
index d1962f3..b8eec1b 100644
--- a/io/attr.c
+++ b/io/attr.c
@@ -48,9 +48,10 @@ static struct xflags {
 	{ FS_XFLAG_NODEFRAG,		"f", "no-defrag"	},
 	{ FS_XFLAG_FILESTREAM,		"S", "filestream"	},
 	{ FS_XFLAG_DAX,			"x", "dax"		},
+	{ FS_XFLAG_COWEXTSIZE,		"C", "cowextsize"	},
 	{ 0, NULL, NULL }
 };
-#define CHATTR_XFLAG_LIST	"r"/*p*/"iasAdtPneEfSx"
+#define CHATTR_XFLAG_LIST	"r"/*p*/"iasAdtPneEfSxC"
 
 static void
 lsattr_help(void)
@@ -75,6 +76,7 @@ lsattr_help(void)
 " f -- do not include this file when defragmenting the filesystem\n"
 " S -- enable filestreams allocator for this directory\n"
 " x -- Use direct access (DAX) for data in this file\n"
+" C -- for files with shared blocks, observe the inode CoW extent size value\n"
 "\n"
 " Options:\n"
 " -R -- recursively descend (useful when current file is a directory)\n"
@@ -111,6 +113,7 @@ chattr_help(void)
 " +/-f -- set/clear the no-defrag flag\n"
 " +/-S -- set/clear the filestreams allocator flag\n"
 " +/-x -- set/clear the direct access (DAX) flag\n"
+" +/-C -- set/clear the CoW extent-size flag\n"
 " Note1: user must have certain capabilities to modify immutable/append-only.\n"
 " Note2: immutable/append-only files cannot be deleted; removing these files\n"
 "        requires the immutable/append-only flag to be cleared first.\n"
diff --git a/io/cowextsize.c b/io/cowextsize.c
new file mode 100644
index 0000000..b4a1c2e
--- /dev/null
+++ b/io/cowextsize.c
@@ -0,0 +1,202 @@
+/*
+ * Copyright (C) 2016 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+/*
+ * If configure didn't find a struct fsxattr with fsx_cowextsize,
+ * disable the only other source (so far) of struct fsxattr.  Thus,
+ * build with the internal definition of struct fsxattr, which has
+ * fsx_cowextsize.
+ */
+#include "platform_defs.h"
+#include "command.h"
+#include "init.h"
+#include "io.h"
+#include "input.h"
+#include "path.h"
+
+static cmdinfo_t cowextsize_cmd;
+static long cowextsize;
+
+static void
+cowextsize_help(void)
+{
+	printf(_(
+"\n"
+" report or modify preferred CoW extent size (in bytes) for the current path\n"
+"\n"
+" -R -- recursively descend (useful when current path is a directory)\n"
+" -D -- recursively descend, only modifying cowextsize on directories\n"
+"\n"));
+}
+
+static int
+get_cowextsize(const char *path, int fd)
+{
+	struct fsxattr	fsx;
+
+	if ((xfsctl(path, fd, XFS_IOC_FSGETXATTR, &fsx)) < 0) {
+		printf("%s: XFS_IOC_FSGETXATTR %s: %s\n",
+			progname, path, strerror(errno));
+		return 0;
+	}
+	printf("[%u] %s\n", fsx.fsx_cowextsize, path);
+	return 0;
+}
+
+static int
+set_cowextsize(const char *path, int fd, long extsz)
+{
+	struct fsxattr	fsx;
+	struct stat64	stat;
+
+	if (fstat64(fd, &stat) < 0) {
+		perror("fstat64");
+		return 0;
+	}
+	if ((xfsctl(path, fd, XFS_IOC_FSGETXATTR, &fsx)) < 0) {
+		printf("%s: XFS_IOC_FSGETXATTR %s: %s\n",
+			progname, path, strerror(errno));
+		return 0;
+	}
+
+	if (S_ISREG(stat.st_mode) || S_ISDIR(stat.st_mode)) {
+		fsx.fsx_xflags |= FS_XFLAG_COWEXTSIZE;
+	} else {
+		printf(_("invalid target file type - file %s\n"), path);
+		return 0;
+	}
+	fsx.fsx_cowextsize = extsz;
+
+	if ((xfsctl(path, fd, XFS_IOC_FSSETXATTR, &fsx)) < 0) {
+		printf("%s: XFS_IOC_FSSETXATTR %s: %s\n",
+			progname, path, strerror(errno));
+		return 0;
+	}
+
+	return 0;
+}
+
+static int
+get_cowextsize_callback(
+	const char		*path,
+	const struct stat	*stat,
+	int			status,
+	struct FTW		*data)
+{
+	int			fd;
+
+	if (recurse_dir && !S_ISDIR(stat->st_mode))
+		return 0;
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0) {
+		fprintf(stderr, _("%s: cannot open %s: %s\n"),
+			progname, path, strerror(errno));
+	} else {
+		get_cowextsize(path, fd);
+		close(fd);
+	}
+	return 0;
+}
+
+static int
+set_cowextsize_callback(
+	const char		*path,
+	const struct stat	*stat,
+	int			status,
+	struct FTW		*data)
+{
+	int			fd;
+
+	if (recurse_dir && !S_ISDIR(stat->st_mode))
+		return 0;
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0) {
+		fprintf(stderr, _("%s: cannot open %s: %s\n"),
+			progname, path, strerror(errno));
+	} else {
+		set_cowextsize(path, fd, cowextsize);
+		close(fd);
+	}
+	return 0;
+}
+
+static int
+cowextsize_f(
+	int		argc,
+	char		**argv)
+{
+	size_t			blocksize, sectsize;
+	int			c;
+
+	recurse_all = recurse_dir = 0;
+	init_cvtnum(&blocksize, &sectsize);
+	while ((c = getopt(argc, argv, "DR")) != EOF) {
+		switch (c) {
+		case 'D':
+			recurse_all = 0;
+			recurse_dir = 1;
+			break;
+		case 'R':
+			recurse_all = 1;
+			recurse_dir = 0;
+			break;
+		default:
+			return command_usage(&cowextsize_cmd);
+		}
+	}
+
+	if (optind < argc) {
+		cowextsize = (long)cvtnum(blocksize, sectsize, argv[optind]);
+		if (cowextsize < 0) {
+			printf(_("non-numeric cowextsize argument -- %s\n"),
+				argv[optind]);
+			return 0;
+		}
+	} else {
+		cowextsize = -1;
+	}
+
+	if (recurse_all || recurse_dir)
+		nftw(file->name, (cowextsize >= 0) ?
+			set_cowextsize_callback : get_cowextsize_callback,
+			100, FTW_PHYS | FTW_MOUNT | FTW_DEPTH);
+	else if (cowextsize >= 0)
+		set_cowextsize(file->name, file->fd, cowextsize);
+	else
+		get_cowextsize(file->name, file->fd);
+	return 0;
+}
+
+void
+cowextsize_init(void)
+{
+	cowextsize_cmd.name = "cowextsize";
+	cowextsize_cmd.cfunc = cowextsize_f;
+	cowextsize_cmd.args = _("[-D | -R] [cowextsize]");
+	cowextsize_cmd.argmin = 0;
+	cowextsize_cmd.argmax = -1;
+	cowextsize_cmd.flags = CMD_NOMAP_OK;
+	cowextsize_cmd.oneline =
+		_("get/set preferred CoW extent size (in bytes) for the open file");
+	cowextsize_cmd.help = cowextsize_help;
+
+	add_command(&cowextsize_cmd);
+}
diff --git a/io/init.c b/io/init.c
index efe7390..6b88cc6 100644
--- a/io/init.c
+++ b/io/init.c
@@ -85,6 +85,7 @@ init_commands(void)
 	sync_range_init();
 	truncate_init();
 	reflink_init();
+	cowextsize_init();
 }
 
 static int
diff --git a/io/io.h b/io/io.h
index 2bc7ac4..4264e4d 100644
--- a/io/io.h
+++ b/io/io.h
@@ -169,3 +169,9 @@ extern void		readdir_init(void);
 #endif
 
 extern void		reflink_init(void);
+
+#ifdef HAVE_FSXATTR_COWEXTSIZE
+extern void		cowextsize_init(void);
+#else
+#define cowextsize_init()	do { } while (0)
+#endif
diff --git a/io/open.c b/io/open.c
index 8f934ee..27943c7 100644
--- a/io/open.c
+++ b/io/open.c
@@ -125,6 +125,9 @@ stat_f(
 		printxattr(fsx.fsx_xflags, verbose, 0, file->name, 1, 1);
 		printf(_("fsxattr.projid = %u\n"), fsx.fsx_projid);
 		printf(_("fsxattr.extsize = %u\n"), fsx.fsx_extsize);
+#if defined HAVE_FSXATTR_COWEXTSIZE
+		printf(_("fsxattr.cowextsize = %u\n"), fsx.fsx_cowextsize);
+#endif
 		printf(_("fsxattr.nextents = %u\n"), fsx.fsx_nextents);
 		printf(_("fsxattr.naextents = %u\n"), fsxa.fsx_nextents);
 	}
diff --git a/m4/package_libcdev.m4 b/m4/package_libcdev.m4
index 7a847e9..45954c2 100644
--- a/m4/package_libcdev.m4
+++ b/m4/package_libcdev.m4
@@ -265,3 +265,29 @@ AC_DEFUN([AC_HAVE_MREMAP],
        )
     AC_SUBST(have_mremap)
   ])
+
+#
+# Check if we have a struct fsxattr with a fsx_cowextsize field.
+# If linux/fs.h has a struct with that field, then we're ok.
+# If we can't find fsxattr in linux/fs.h at all, the internal
+# definitions provide it, and we're ok.
+#
+# The only way we won't have this is if the kernel headers don't
+# have the field.
+#
+AC_DEFUN([AC_HAVE_FSXATTR_COWEXTSIZE],
+  [ AM_CONDITIONAL([INTERNAL_FSXATTR], [test "x$enable_internal_fsxattr" = xyes])
+    AM_COND_IF([INTERNAL_FSXATTR],
+    [have_fsxattr_cowextsize=yes],
+    [ AC_CHECK_TYPE(struct fsxattr,
+	  [AC_CHECK_MEMBER(struct fsxattr.fsx_cowextsize,
+		  have_fsxattr_cowextsize=yes,
+		  have_fsxattr_cowextsize=no,
+		  [#include <linux/fs.h>]
+          )],
+	  have_fsxattr_cowextsize=yes,
+	  [#include <linux/fs.h>]
+      )
+    ])
+    AC_SUBST(have_fsxattr_cowextsize)
+  ])
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index d089524..2365550 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -283,6 +283,22 @@ The
 should be specified in bytes, or using one of the usual units suffixes
 (k, m, g, b, etc). The extent size is always reported in units of bytes.
 .TP
+.BI "cowextsize [ \-R | \-D ] [ " value " ]"
+Display and/or modify the preferred copy-on-write extent size used
+when allocating space for the currently open file. If the
+.B \-R
+option is specified, a recursive descent is performed
+for all directory entries below the currently open file
+.RB ( \-D
+can be used to restrict the output to directories only).
+If the target file is a directory, then the inherited CoW extent size
+is set for that directory (new files created in that directory
+inherit that CoW extent size).
+The
+.I value
+should be specified in bytes, or using one of the usual unit suffixes
+(k, m, g, b, etc). The extent size is always reported in units of bytes.
+.TP
 .BI "allocsp " size " 0"
 Sets the size of the file to
 .I size



* [PATCH 18/39] xfs_io: add refcount+bmap error injection types
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (16 preceding siblings ...)
  2016-10-25 23:05 ` [PATCH 17/39] xfs_io: get and set the CoW extent size hint Darrick J. Wong
@ 2016-10-25 23:05 ` Darrick J. Wong
  2016-10-26 10:33   ` Christoph Hellwig
  2016-10-25 23:05 ` [PATCH 19/39] xfs_io: support injecting the 'per-AG reservation critically low' error Darrick J. Wong
                   ` (20 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:05 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Add refcount and bmap deferred finish to the types of errors we can
inject.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 io/inject.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
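
The table in error_tag() is a name-to-number map closed by a sentinel
entry whose name is NULL.  A cut-down sketch of that pattern, carrying
only the tags visible in this hunk; the struct and function names are
invented for the example and are not the ones in io/inject.c.

```c
#include <stddef.h>
#include <string.h>

struct errtag {
	int		tag;
	const char	*name;
};

#define ERRTAG_MAX	27	/* value of XFS_ERRTAG_MAX after this patch */

static const struct errtag errtags[] = {
	{ 23,		"rmap_finish_one" },
	{ 24,		"refcount_continue_update" },
	{ 25,		"refcount_finish_one" },
	{ 26,		"bmap_finish_one" },
	{ ERRTAG_MAX,	NULL },		/* sentinel, must stay last */
};

/* Return the tag number for name, or -1 if the name is unknown. */
static int
errtag_lookup(const char *name)
{
	const struct errtag	*e;

	for (e = errtags; e->name != NULL; e++)
		if (strcmp(e->name, name) == 0)
			return e->tag;
	return -1;
}
```

Adding a tag therefore means inserting one row before the sentinel and
bumping the MAX value, which is all this patch does.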


diff --git a/io/inject.c b/io/inject.c
index 16ac925..56642b8 100644
--- a/io/inject.c
+++ b/io/inject.c
@@ -78,7 +78,13 @@ error_tag(char *name)
 		{ XFS_ERRTAG_FREE_EXTENT,		"free_extent" },
 #define XFS_ERRTAG_RMAP_FINISH_ONE			23
 		{ XFS_ERRTAG_RMAP_FINISH_ONE,		"rmap_finish_one" },
-#define XFS_ERRTAG_MAX                                  24
+#define XFS_ERRTAG_REFCOUNT_CONTINUE_UPDATE		24
+		{ XFS_ERRTAG_REFCOUNT_CONTINUE_UPDATE,	"refcount_continue_update" },
+#define XFS_ERRTAG_REFCOUNT_FINISH_ONE			25
+		{ XFS_ERRTAG_REFCOUNT_FINISH_ONE,	"refcount_finish_one" },
+#define XFS_ERRTAG_BMAP_FINISH_ONE			26
+		{ XFS_ERRTAG_BMAP_FINISH_ONE,		"bmap_finish_one" },
+#define XFS_ERRTAG_MAX                                  27
 		{ XFS_ERRTAG_MAX,			NULL }
 	};
 	int	count;



* [PATCH 19/39] xfs_io: support injecting the 'per-AG reservation critically low' error
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (17 preceding siblings ...)
  2016-10-25 23:05 ` [PATCH 18/39] xfs_io: add refcount+bmap error injection types Darrick J. Wong
@ 2016-10-25 23:05 ` Darrick J. Wong
  2016-10-26 10:33   ` Christoph Hellwig
  2016-10-25 23:05 ` [PATCH 20/39] xfs_io: provide long-format help for falloc Darrick J. Wong
                   ` (19 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:05 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 io/inject.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


diff --git a/io/inject.c b/io/inject.c
index 56642b8..5d5e4ae 100644
--- a/io/inject.c
+++ b/io/inject.c
@@ -84,7 +84,9 @@ error_tag(char *name)
 		{ XFS_ERRTAG_REFCOUNT_FINISH_ONE,	"refcount_finish_one" },
 #define XFS_ERRTAG_BMAP_FINISH_ONE			26
 		{ XFS_ERRTAG_BMAP_FINISH_ONE,		"bmap_finish_one" },
-#define XFS_ERRTAG_MAX                                  27
+#define XFS_ERRTAG_AG_RESV_CRITICAL			27
+		{ XFS_ERRTAG_AG_RESV_CRITICAL,		"ag_resv_critical" },
+#define XFS_ERRTAG_MAX                                  28
 		{ XFS_ERRTAG_MAX,			NULL }
 	};
 	int	count;



* [PATCH 20/39] xfs_io: provide long-format help for falloc
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (18 preceding siblings ...)
  2016-10-25 23:05 ` [PATCH 19/39] xfs_io: support injecting the 'per-AG reservation critically low' error Darrick J. Wong
@ 2016-10-25 23:05 ` Darrick J. Wong
  2016-10-26 10:34   ` Christoph Hellwig
  2016-10-25 23:05 ` [PATCH 21/39] xfs_io: try to unshare copy-on-write blocks via fallocate Darrick J. Wong
                   ` (18 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:05 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Provide long-format help for falloc so that xfstests can use
_require_xfs_io_command to check for falloc command line args.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 io/prealloc.c |   21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)


diff --git a/io/prealloc.c b/io/prealloc.c
index 713ea7b..7f5c200 100644
--- a/io/prealloc.c
+++ b/io/prealloc.c
@@ -164,6 +164,26 @@ zero_f(
 
 
 #if defined (HAVE_FALLOCATE)
+static void
+falloc_help(void)
+{
+	printf(_(
+"\n"
+" modifies space associated with part of a file via fallocate\n"
+"\n"
+" Example:\n"
+" 'falloc 0 1m' - fills all holes within the first megabyte\n"
+"\n"
+" falloc uses the fallocate system call to alter space allocations in the\n"
+" open file.  All the file offsets are in units of bytes.\n"
+" The following operations are supported:\n"
+" -c -- collapses the given range.\n"
+" -i -- inserts a hole into the given range of the file.\n"
+" -k -- do not change file size.\n"
+" -p -- unmap the given range from the file.\n"
+"\n"));
+}
+
 static int
 fallocate_f(
 	int		argc,
@@ -349,6 +369,7 @@ prealloc_init(void)
 	falloc_cmd.args = _("[-c] [-k] [-p] off len");
 	falloc_cmd.oneline =
 	_("allocates space associated with part of a file via fallocate");
+	falloc_cmd.help = falloc_help;
 	add_command(&falloc_cmd);
 
 	fpunch_cmd.name = "fpunch";



* [PATCH 21/39] xfs_io: try to unshare copy-on-write blocks via fallocate
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (19 preceding siblings ...)
  2016-10-25 23:05 ` [PATCH 20/39] xfs_io: provide long-format help for falloc Darrick J. Wong
@ 2016-10-25 23:05 ` Darrick J. Wong
  2016-10-26 10:34   ` Christoph Hellwig
  2016-10-25 23:05 ` [PATCH 22/39] xfs_logprint: support cowextsize reporting in log contents Darrick J. Wong
                   ` (17 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:05 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Wire up the "unshare" flag to the xfs_io fallocate command.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 io/prealloc.c     |   42 ++++++++++++++++++++++++++++++++++++++++--
 man/man8/xfs_io.8 |    5 +++++
 2 files changed, 45 insertions(+), 2 deletions(-)
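
Outside of xfs_io, the operation boils down to a single fallocate()
call.  A small sketch, assuming glibc (which wants _GNU_SOURCE for the
fallocate() prototype) and reusing the same fallback definition of the
flag as the patch; unshare_range() is a made-up name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for the fallocate() prototype */
#endif
#include <errno.h>	/* errno is set by fallocate() on failure */
#include <fcntl.h>
#include <sys/types.h>

#ifndef FALLOC_FL_UNSHARE_RANGE
#define FALLOC_FL_UNSHARE_RANGE 0x40	/* same fallback as io/prealloc.c */
#endif

/*
 * Ask the filesystem to give this file private copies of any shared
 * blocks in [offset, offset + len).  Returns 0, or -1 with errno set.
 */
static int
unshare_range(int fd, off_t offset, off_t len)
{
	return fallocate(fd, FALLOC_FL_UNSHARE_RANGE, offset, len);
}
```

This is what "funshare off len" ends up doing on the open file.  On a
kernel or filesystem that does not know the flag, I would expect the
call to fail (typically with EOPNOTSUPP) rather than silently succeed.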


diff --git a/io/prealloc.c b/io/prealloc.c
index 7f5c200..a9d66cc 100644
--- a/io/prealloc.c
+++ b/io/prealloc.c
@@ -40,6 +40,10 @@
 #define FALLOC_FL_INSERT_RANGE 0x20
 #endif
 
+#ifndef FALLOC_FL_UNSHARE_RANGE
+#define FALLOC_FL_UNSHARE_RANGE 0x40
+#endif
+
 static cmdinfo_t allocsp_cmd;
 static cmdinfo_t freesp_cmd;
 static cmdinfo_t resvsp_cmd;
@@ -181,6 +185,7 @@ falloc_help(void)
 " -i -- inserts a hole into the given range of the file.\n"
 " -k -- do not change file size.\n"
 " -p -- unmap the given range from the file.\n"
+" -u -- unshare shared extents in the given range.\n"
 "\n"));
 }
 
@@ -193,7 +198,7 @@ fallocate_f(
 	int		mode = 0;
 	int		c;
 
-	while ((c = getopt(argc, argv, "cikp")) != EOF) {
+	while ((c = getopt(argc, argv, "cikpu")) != EOF) {
 		switch (c) {
 		case 'c':
 			mode = FALLOC_FL_COLLAPSE_RANGE;
@@ -207,6 +212,9 @@ fallocate_f(
 		case 'p':
 			mode = FALLOC_FL_PUNCH_HOLE;
 			break;
+		case 'u':
+			mode = FALLOC_FL_UNSHARE_RANGE;
+			break;
 		default:
 			command_usage(&falloc_cmd);
 		}
@@ -306,6 +314,26 @@ fzero_f(
 	}
 	return 0;
 }
+
+static int
+funshare_f(
+	int		argc,
+	char		**argv)
+{
+	xfs_flock64_t	segment;
+	int		mode = FALLOC_FL_UNSHARE_RANGE;
+	int		index = 1;
+
+	if (!offset_length(argv[index], argv[index + 1], &segment))
+		return 0;
+
+	if (fallocate(file->fd, mode,
+			segment.l_start, segment.l_len)) {
+		perror("fallocate");
+		return 0;
+	}
+	return 0;
+}
 #endif	/* HAVE_FALLOCATE */
 
 void
@@ -366,7 +394,7 @@ prealloc_init(void)
 	falloc_cmd.argmin = 2;
 	falloc_cmd.argmax = -1;
 	falloc_cmd.flags = CMD_NOMAP_OK | CMD_FOREIGN_OK;
-	falloc_cmd.args = _("[-c] [-k] [-p] off len");
+	falloc_cmd.args = _("[-c] [-k] [-p] [-u] off len");
 	falloc_cmd.oneline =
 	_("allocates space associated with part of a file via fallocate");
 	falloc_cmd.help = falloc_help;
@@ -411,5 +439,15 @@ prealloc_init(void)
 	fzero_cmd.oneline =
 	_("zeroes space and eliminates holes by preallocating");
 	add_command(&fzero_cmd);
+
+	fzero_cmd.name = "funshare";
+	fzero_cmd.cfunc = funshare_f;
+	fzero_cmd.argmin = 2;
+	fzero_cmd.argmax = 2;
+	fzero_cmd.flags = CMD_NOMAP_OK | CMD_FOREIGN_OK;
+	fzero_cmd.args = _("off len");
+	fzero_cmd.oneline =
+	_("unshares shared blocks within the range");
+	add_command(&fzero_cmd);
 #endif	/* HAVE_FALLOCATE */
 }
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index 2365550..eb7b878 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -431,6 +431,11 @@ the FALLOC_FL_PUNCH_HOLE flag as described in the
 .BR fallocate (2)
 manual page.
 .TP
+.BI funshare " offset length"
+Call fallocate with FALLOC_FL_UNSHARE_RANGE flag as described in the
+.BR fallocate (2)
+manual page to unshare all shared blocks within the range.
+.TP
 .BI fzero " offset length"
 Call fallocate with FALLOC_FL_ZERO_RANGE flag as described in the
 .BR fallocate (2)



* [PATCH 22/39] xfs_logprint: support cowextsize reporting in log contents
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (20 preceding siblings ...)
  2016-10-25 23:05 ` [PATCH 21/39] xfs_io: try to unshare copy-on-write blocks via fallocate Darrick J. Wong
@ 2016-10-25 23:05 ` Darrick J. Wong
  2016-10-26 10:34   ` Christoph Hellwig
  2016-10-25 23:06 ` [PATCH 23/39] xfs_logprint: support refcount redo items Darrick J. Wong
                   ` (16 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:05 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 logprint/log_misc.c      |    4 ++++
 logprint/log_print_all.c |    4 ++++
 2 files changed, 8 insertions(+)
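
The new output line is only emitted for v3 inodes, and the 64-bit flags2
value is cast to unsigned long long so that %llx is correct on every
platform.  A toy version of that logic follows; the struct is a
three-field stand-in for the inode core and the function name is
invented for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the handful of inode core fields used here. */
struct inode_core {
	uint8_t		version;
	uint64_t	flags2;
	uint32_t	cowextsize;
};

/*
 * Format the v3-only fields into buf.  Returns the number of characters
 * written (as snprintf does), or 0 for older inode versions, in which
 * case buf is left empty.
 */
static int
format_v3_fields(const struct inode_core *ip, char *buf, size_t len)
{
	if (ip->version != 3) {
		if (len > 0)
			buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, len, "flags2 0x%llx cowextsize 0x%x\n",
			(unsigned long long)ip->flags2,
			(unsigned int)ip->cowextsize);
}
```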


diff --git a/logprint/log_misc.c b/logprint/log_misc.c
index e4af09b..dbe5729 100644
--- a/logprint/log_misc.c
+++ b/logprint/log_misc.c
@@ -467,6 +467,10 @@ xlog_print_trans_inode_core(
 	   ip->di_dmstate);
     printf(_("flags 0x%x gen 0x%x\n"),
 	   ip->di_flags, ip->di_gen);
+    if (ip->di_version == 3) {
+        printf(_("flags2 0x%llx cowextsize 0x%x\n"),
+            (unsigned long long)ip->di_flags2, ip->di_cowextsize);
+    }
 }
 
 void
diff --git a/logprint/log_print_all.c b/logprint/log_print_all.c
index 0fe354b..46952c4 100644
--- a/logprint/log_print_all.c
+++ b/logprint/log_print_all.c
@@ -272,6 +272,10 @@ xlog_recover_print_inode_core(
 	     "gen:%d\n"),
 	       (int)di->di_forkoff, di->di_dmevmask, (int)di->di_dmstate,
 	       (int)di->di_flags, di->di_gen);
+	if (di->di_version == 3) {
+		printf(_("flags2 0x%llx cowextsize 0x%x\n"),
+			(unsigned long long)di->di_flags2, di->di_cowextsize);
+	}
 }
 
 STATIC void



* [PATCH 23/39] xfs_logprint: support refcount redo items
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (21 preceding siblings ...)
  2016-10-25 23:05 ` [PATCH 22/39] xfs_logprint: support cowextsize reporting in log contents Darrick J. Wong
@ 2016-10-25 23:06 ` Darrick J. Wong
  2016-10-26 10:37   ` Christoph Hellwig
  2016-10-25 23:06 ` [PATCH 24/39] xfs_logprint: support bmap " Darrick J. Wong
                   ` (15 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:06 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Print reference count update redo items.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 logprint/log_misc.c      |   11 +++
 logprint/log_print_all.c |   12 ++++
 logprint/log_redo.c      |  148 ++++++++++++++++++++++++++++++++++++++++++++++
 logprint/logprint.h      |    5 ++
 4 files changed, 176 insertions(+)
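
The CUI is a fixed header followed by a variable number of extents, and
the log buffer it sits in is not guaranteed to be 8-byte aligned, hence
the copy into malloc'd memory before any field is dereferenced.  A
self-contained sketch of that idea is below.  The two structs are
stand-ins modelled on the field names used in this patch, not the real
on-disk definitions, and cui_copy() is stricter than the patch in that
it never accepts a short region.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for struct xfs_phys_extent and struct xfs_cui_log_format. */
struct phys_extent {
	uint64_t		startblock;
	uint32_t		len;
	uint32_t		flags;
};

struct cui_format {
	uint16_t		type;
	uint16_t		size;
	uint32_t		nextents;
	uint64_t		id;
	struct phys_extent	extents[];	/* variable-length tail */
};

/* Bytes needed for the header plus nr extents. */
static size_t
cui_format_sizeof(uint32_t nr)
{
	return sizeof(struct cui_format) +
	       (size_t)nr * sizeof(struct phys_extent);
}

/*
 * Copy a possibly unaligned log region into aligned heap memory.
 * Returns NULL if the region is shorter than the header, if its length
 * disagrees with the extent count it claims, or if malloc fails.
 * The caller frees the result.
 */
static struct cui_format *
cui_copy(const char *buf, size_t len)
{
	struct cui_format	hdr;
	struct cui_format	*f;

	if (len < sizeof(hdr))
		return NULL;
	memcpy(&hdr, buf, sizeof(hdr));	/* fine even if buf is unaligned */
	if (len != cui_format_sizeof(hdr.nextents))
		return NULL;

	f = malloc(len);
	if (f != NULL)
		memcpy(f, buf, len);
	return f;
}
```

Checking the length before trusting the extent count keeps a truncated
or continued record from being read past its end.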


diff --git a/logprint/log_misc.c b/logprint/log_misc.c
index dbe5729..8284206 100644
--- a/logprint/log_misc.c
+++ b/logprint/log_misc.c
@@ -962,6 +962,17 @@ xlog_print_record(
 					be32_to_cpu(op_head->oh_len));
 			break;
 		    }
+		    case XFS_LI_CUI: {
+			skip = xlog_print_trans_cui(&ptr,
+					be32_to_cpu(op_head->oh_len),
+					continued);
+			break;
+		    }
+		    case XFS_LI_CUD: {
+			skip = xlog_print_trans_cud(&ptr,
+					be32_to_cpu(op_head->oh_len));
+			break;
+		    }
 		    case XFS_LI_QUOTAOFF: {
 			skip = xlog_print_trans_qoff(&ptr,
 					be32_to_cpu(op_head->oh_len));
diff --git a/logprint/log_print_all.c b/logprint/log_print_all.c
index 46952c4..eb3e326 100644
--- a/logprint/log_print_all.c
+++ b/logprint/log_print_all.c
@@ -418,6 +418,12 @@ xlog_recover_print_logitem(
 	case XFS_LI_RUI:
 		xlog_recover_print_rui(item);
 		break;
+	case XFS_LI_CUD:
+		xlog_recover_print_cud(item);
+		break;
+	case XFS_LI_CUI:
+		xlog_recover_print_cui(item);
+		break;
 	case XFS_LI_DQUOT:
 		xlog_recover_print_dquot(item);
 		break;
@@ -458,6 +464,12 @@ xlog_recover_print_item(
 	case XFS_LI_RUI:
 		printf("RUI");
 		break;
+	case XFS_LI_CUD:
+		printf("CUD");
+		break;
+	case XFS_LI_CUI:
+		printf("CUI");
+		break;
 	case XFS_LI_DQUOT:
 		printf("DQ ");
 		break;
diff --git a/logprint/log_redo.c b/logprint/log_redo.c
index 40e0727..6be073e 100644
--- a/logprint/log_redo.c
+++ b/logprint/log_redo.c
@@ -378,3 +378,151 @@ xlog_recover_print_rud(
 	f = item->ri_buf[0].i_addr;
 	xlog_print_trans_rud(&f, sizeof(struct xfs_rud_log_format));
 }
+
+/* Reference Count Update Items */
+
+static int
+xfs_cui_copy_format(
+	char			  *buf,
+	uint			  len,
+	struct xfs_cui_log_format *dst_fmt,
+	int			  continued)
+{
+	uint nextents = ((struct xfs_cui_log_format *)buf)->cui_nextents;
+	uint dst_len = xfs_cui_log_format_sizeof(nextents);
+
+	if (len == dst_len || continued) {
+		memcpy((char *)dst_fmt, buf, len);
+		return 0;
+	}
+	fprintf(stderr, _("%s: bad size of CUI format: %u; expected %u; nextents = %u\n"),
+		progname, len, dst_len, nextents);
+	return 1;
+}
+
+int
+xlog_print_trans_cui(
+	char			**ptr,
+	uint			src_len,
+	int			continued)
+{
+	struct xfs_cui_log_format	*src_f, *f = NULL;
+	uint			dst_len;
+	uint			nextents;
+	struct xfs_phys_extent	*ex;
+	int			i;
+	int			error = 0;
+	int			core_size;
+
+	core_size = offsetof(struct xfs_cui_log_format, cui_extents);
+
+	/*
+	 * memmove to ensure 8-byte alignment for the long longs in
+	 * struct xfs_cui_log_format structure
+	 */
+	src_f = malloc(src_len);
+	if (src_f == NULL) {
+		fprintf(stderr, _("%s: %s: malloc failed\n"),
+			progname, __func__);
+		exit(1);
+	}
+	memmove((char*)src_f, *ptr, src_len);
+	*ptr += src_len;
+
+	/* convert to native format */
+	nextents = src_f->cui_nextents;
+	dst_len = xfs_cui_log_format_sizeof(nextents);
+
+	if (continued && src_len < core_size) {
+		printf(_("CUI: Not enough data to decode further\n"));
+		error = 1;
+		goto error;
+	}
+
+	f = malloc(dst_len);
+	if (f == NULL) {
+		fprintf(stderr, _("%s: %s: malloc failed\n"),
+			progname, __func__);
+		exit(1);
+	}
+	if (xfs_cui_copy_format((char *)src_f, src_len, f, continued)) {
+		error = 1;
+		goto error;
+	}
+
+	printf(_("CUI:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
+		f->cui_size, f->cui_nextents, (unsigned long long)f->cui_id);
+
+	if (continued) {
+		printf(_("CUI extent data skipped (CONTINUE set, no space)\n"));
+		goto error;
+	}
+
+	ex = f->cui_extents;
+	for (i=0; i < f->cui_nextents; i++) {
+		printf("(s: 0x%llx, l: %d, f: 0x%x) ",
+			(unsigned long long)ex->pe_startblock, ex->pe_len,
+			ex->pe_flags);
+		printf("\n");
+		ex++;
+	}
+error:
+	free(src_f);
+	free(f);
+	return error;
+}
+
+void
+xlog_recover_print_cui(
+	struct xlog_recover_item	*item)
+{
+	char				*src_f;
+	uint				src_len;
+
+	src_f = item->ri_buf[0].i_addr;
+	src_len = item->ri_buf[0].i_len;
+
+	xlog_print_trans_cui(&src_f, src_len, 0);
+}
+
+int
+xlog_print_trans_cud(
+	char				**ptr,
+	uint				len)
+{
+	struct xfs_cud_log_format	*f;
+	struct xfs_cud_log_format	lbuf;
+
+	/* size without extents at end */
+	uint core_size = sizeof(struct xfs_cud_log_format);
+
+	/*
+	 * memmove to ensure 8-byte alignment for the long longs in
+	 * struct xfs_cud_log_format
+	 */
+	memmove(&lbuf, *ptr, MIN(core_size, len));
+	f = &lbuf;
+	*ptr += len;
+	if (len >= core_size) {
+		printf(_("CUD:  #regs: %d	                 id: 0x%llx\n"),
+			f->cud_size,
+			(unsigned long long)f->cud_cui_id);
+
+		/* don't print extents as they are not used */
+
+		return 0;
+	} else {
+		printf(_("CUD: Not enough data to decode further\n"));
+		return 1;
+	}
+}
+
+void
+xlog_recover_print_cud(
+	struct xlog_recover_item	*item)
+{
+	char				*f;
+
+	f = item->ri_buf[0].i_addr;
+	xlog_print_trans_cud(&f, sizeof(struct xfs_cud_log_format));
+}
diff --git a/logprint/logprint.h b/logprint/logprint.h
index bdd0ee1..ae63c2e 100644
--- a/logprint/logprint.h
+++ b/logprint/logprint.h
@@ -54,4 +54,9 @@ extern void xlog_recover_print_rui(struct xlog_recover_item *item);
 extern int xlog_print_trans_rud(char **ptr, uint len);
 extern void xlog_recover_print_rud(struct xlog_recover_item *item);
 
+extern int xlog_print_trans_cui(char **ptr, uint src_len, int continued);
+extern void xlog_recover_print_cui(struct xlog_recover_item *item);
+extern int xlog_print_trans_cud(char **ptr, uint len);
+extern void xlog_recover_print_cud(struct xlog_recover_item *item);
+
 #endif	/* LOGPRINT_H */



* [PATCH 24/39] xfs_logprint: support bmap redo items
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (22 preceding siblings ...)
  2016-10-25 23:06 ` [PATCH 23/39] xfs_logprint: support refcount redo items Darrick J. Wong
@ 2016-10-25 23:06 ` Darrick J. Wong
  2016-10-26 10:38   ` Christoph Hellwig
  2016-10-25 23:06 ` [PATCH 25/39] man: document the inode cowextsize flags & fields Darrick J. Wong
                   ` (14 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:06 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Print block mapping update redo items.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 logprint/log_misc.c      |   11 +++
 logprint/log_print_all.c |   12 ++++
 logprint/log_redo.c      |  149 ++++++++++++++++++++++++++++++++++++++++++++++
 logprint/logprint.h      |    5 ++
 4 files changed, 177 insertions(+)


diff --git a/logprint/log_misc.c b/logprint/log_misc.c
index 8284206..6dc0ed4 100644
--- a/logprint/log_misc.c
+++ b/logprint/log_misc.c
@@ -973,6 +973,17 @@ xlog_print_record(
 					be32_to_cpu(op_head->oh_len));
 			break;
 		    }
+		    case XFS_LI_BUI: {
+			skip = xlog_print_trans_bui(&ptr,
+					be32_to_cpu(op_head->oh_len),
+					continued);
+			break;
+		    }
+		    case XFS_LI_BUD: {
+			skip = xlog_print_trans_bud(&ptr,
+					be32_to_cpu(op_head->oh_len));
+			break;
+		    }
 		    case XFS_LI_QUOTAOFF: {
 			skip = xlog_print_trans_qoff(&ptr,
 					be32_to_cpu(op_head->oh_len));
diff --git a/logprint/log_print_all.c b/logprint/log_print_all.c
index eb3e326..f49316e 100644
--- a/logprint/log_print_all.c
+++ b/logprint/log_print_all.c
@@ -424,6 +424,12 @@ xlog_recover_print_logitem(
 	case XFS_LI_CUI:
 		xlog_recover_print_cui(item);
 		break;
+	case XFS_LI_BUD:
+		xlog_recover_print_bud(item);
+		break;
+	case XFS_LI_BUI:
+		xlog_recover_print_bui(item);
+		break;
 	case XFS_LI_DQUOT:
 		xlog_recover_print_dquot(item);
 		break;
@@ -470,6 +476,12 @@ xlog_recover_print_item(
 	case XFS_LI_CUI:
 		printf("CUI");
 		break;
+	case XFS_LI_BUD:
+		printf("BUD");
+		break;
+	case XFS_LI_BUI:
+		printf("BUI");
+		break;
 	case XFS_LI_DQUOT:
 		printf("DQ ");
 		break;
diff --git a/logprint/log_redo.c b/logprint/log_redo.c
index 6be073e..9668567 100644
--- a/logprint/log_redo.c
+++ b/logprint/log_redo.c
@@ -526,3 +526,152 @@ xlog_recover_print_cud(
 	f = item->ri_buf[0].i_addr;
 	xlog_print_trans_cud(&f, sizeof(struct xfs_cud_log_format));
 }
+
+/* Block Mapping Update Items */
+
+static int
+xfs_bui_copy_format(
+	char			  *buf,
+	uint			  len,
+	struct xfs_bui_log_format *dst_fmt,
+	int			  continued)
+{
+	uint nextents = ((struct xfs_bui_log_format *)buf)->bui_nextents;
+	uint dst_len = xfs_bui_log_format_sizeof(nextents);
+
+	if (len == dst_len || continued) {
+		memcpy((char *)dst_fmt, buf, len);
+		return 0;
+	}
+	fprintf(stderr, _("%s: bad size of BUI format: %u; expected %u; nextents = %u\n"),
+		progname, len, dst_len, nextents);
+	return 1;
+}
+
+int
+xlog_print_trans_bui(
+	char			**ptr,
+	uint			src_len,
+	int			continued)
+{
+	struct xfs_bui_log_format	*src_f, *f = NULL;
+	uint			dst_len;
+	uint			nextents;
+	struct xfs_map_extent	*ex;
+	int			i;
+	int			error = 0;
+	int			core_size;
+
+	core_size = offsetof(struct xfs_bui_log_format, bui_extents);
+
+	/*
+	 * memmove to ensure 8-byte alignment for the long longs in
+	 * struct xfs_bui_log_format
+	 */
+	src_f = malloc(src_len);
+	if (src_f == NULL) {
+		fprintf(stderr, _("%s: %s: malloc failed\n"),
+			progname, __func__);
+		exit(1);
+	}
+	memmove((char*)src_f, *ptr, src_len);
+	*ptr += src_len;
+
+	/* convert to native format */
+	nextents = src_f->bui_nextents;
+	dst_len = xfs_bui_log_format_sizeof(nextents);
+
+	if (continued && src_len < core_size) {
+		printf(_("BUI: Not enough data to decode further\n"));
+		error = 1;
+		goto error;
+	}
+
+	f = malloc(dst_len);
+	if (f == NULL) {
+		fprintf(stderr, _("%s: %s: malloc failed\n"),
+			progname, __func__);
+		exit(1);
+	}
+	if (xfs_bui_copy_format((char *)src_f, src_len, f, continued)) {
+		error = 1;
+		goto error;
+	}
+
+	printf(_("BUI:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
+		f->bui_size, f->bui_nextents, (unsigned long long)f->bui_id);
+
+	if (continued) {
+		printf(_("BUI extent data skipped (CONTINUE set, no space)\n"));
+		goto error;
+	}
+
+	ex = f->bui_extents;
+	for (i=0; i < f->bui_nextents; i++) {
+		printf("(s: 0x%llx, l: %d, own: %lld, off: %llu, f: 0x%x) ",
+			(unsigned long long)ex->me_startblock, ex->me_len,
+			(long long)ex->me_owner,
+			(unsigned long long)ex->me_startoff, ex->me_flags);
+		printf("\n");
+		ex++;
+	}
+error:
+	free(src_f);
+	free(f);
+	return error;
+}
+
+void
+xlog_recover_print_bui(
+	struct xlog_recover_item	*item)
+{
+	char				*src_f;
+	uint				src_len;
+
+	src_f = item->ri_buf[0].i_addr;
+	src_len = item->ri_buf[0].i_len;
+
+	xlog_print_trans_bui(&src_f, src_len, 0);
+}
+
+int
+xlog_print_trans_bud(
+	char				**ptr,
+	uint				len)
+{
+	struct xfs_bud_log_format	*f;
+	struct xfs_bud_log_format	lbuf;
+
+	/* size without extents at end */
+	uint core_size = sizeof(struct xfs_bud_log_format);
+
+	/*
+	 * memmove to ensure 8-byte alignment for the long longs in
+	 * struct xfs_bud_log_format
+	 */
+	memmove(&lbuf, *ptr, MIN(core_size, len));
+	f = &lbuf;
+	*ptr += len;
+	if (len >= core_size) {
+		printf(_("BUD:  #regs: %d	                 id: 0x%llx\n"),
+			f->bud_size,
+			(unsigned long long)f->bud_bui_id);
+
+		/* don't print extents as they are not used */
+
+		return 0;
+	} else {
+		printf(_("BUD: Not enough data to decode further\n"));
+		return 1;
+	}
+}
+
+void
+xlog_recover_print_bud(
+	struct xlog_recover_item	*item)
+{
+	char				*f;
+
+	f = item->ri_buf[0].i_addr;
+	xlog_print_trans_bud(&f, sizeof(struct xfs_bud_log_format));
+}
diff --git a/logprint/logprint.h b/logprint/logprint.h
index ae63c2e..fc10e16 100644
--- a/logprint/logprint.h
+++ b/logprint/logprint.h
@@ -59,4 +59,9 @@ extern void xlog_recover_print_cui(struct xlog_recover_item *item);
 extern int xlog_print_trans_cud(char **ptr, uint len);
 extern void xlog_recover_print_cud(struct xlog_recover_item *item);
 
+extern int xlog_print_trans_bui(char **ptr, uint src_len, int continued);
+extern void xlog_recover_print_bui(struct xlog_recover_item *item);
+extern int xlog_print_trans_bud(char **ptr, uint len);
+extern void xlog_recover_print_bud(struct xlog_recover_item *item);
+
 #endif	/* LOGPRINT_H */



* [PATCH 25/39] man: document the inode cowextsize flags & fields
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (23 preceding siblings ...)
  2016-10-25 23:06 ` [PATCH 24/39] xfs_logprint: support bmap " Darrick J. Wong
@ 2016-10-25 23:06 ` Darrick J. Wong
  2016-10-26 10:39   ` Christoph Hellwig
  2016-10-25 23:06 ` [PATCH 26/39] xfs_repair: fix get_agino_buf to avoid corrupting inodes Darrick J. Wong
                   ` (13 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:06 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Document the new copy-on-write extent size fields and inode flags
available in struct fsxattr.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 man/man3/xfsctl.3 |   28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)
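
(Editorial sketch, not part of the patch.)  A minimal example of how a
caller might set the copy-on-write extent size hint documented below.  It
assumes the generic FS_IOC_FSGETXATTR, FS_IOC_FSSETXATTR and
FS_XFLAG_COWEXTSIZE names from <linux/fs.h> are interchangeable with the
XFS_-prefixed names used in the man page, and that fsx_cowextsize uses the
same units as fsx_extsize.  The helper name set_cowextsize is made up for
illustration.

```c
#include <linux/fs.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: ask for a CoW extent size hint on an open file.
 * Returns 0 on success, -1 on failure (errno is set by ioctl).
 */
static int
set_cowextsize(
	int		fd,
	unsigned int	cowextsize)
{
	struct fsxattr	fsx;

	/* Read the current attributes so unrelated fields are preserved. */
	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		return -1;

	/* Turn on the flag and record the requested hint. */
	fsx.fsx_xflags |= FS_XFLAG_COWEXTSIZE;
	fsx.fsx_cowextsize = cowextsize;

	return ioctl(fd, FS_IOC_FSSETXATTR, &fsx);
}
```

Reading the attributes first and writing them back keeps the existing
fsx_xflags, fsx_extsize and fsx_projid values intact, since the set call
consumes all of those fields.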


diff --git a/man/man3/xfsctl.3 b/man/man3/xfsctl.3
index 9e7f138..6e5027c 100644
--- a/man/man3/xfsctl.3
+++ b/man/man3/xfsctl.3
@@ -150,6 +150,15 @@ value returned indicates that a preferred extent size was previously
 set on the file, a
 .B fsx_extsize
 of zero indicates that the defaults for that filesystem will be used.
+A
+.B fsx_cowextsize
+value returned indicates that a preferred copy on write extent size was
+previously set on the file, whereas a
+.B fsx_cowextsize
+of zero indicates that the defaults for that filesystem will be used.
+The current default for
+.B fsx_cowextsize
+is 128 blocks.
 Currently the meaningful bits for the
 .B fsx_xflags
 field are:
@@ -229,6 +238,19 @@ created and written into the directory.
 If the filesystem lives on directly accessible persistent memory, reads and
 writes to this file will go straight to the persistent memory, bypassing the
 page cache.
+A file cannot be reflinked and have the
+.BR XFS_XFLAG_DAX
+flag set at the same time.
+That is to say that DAX files cannot share blocks.
+.TP
+.SM "Bit 16 (0x10000) \- XFS_XFLAG_COWEXTSIZE"
+Copy on Write Extent size bit - if a CoW extent size value is set on the file,
+the allocator will allocate extents for staging a copy on write operation
+in multiples of the set size for this file (see
+.B XFS_IOC_FSSETXATTR
+below).
+If the CoW extent size is set on a directory, then new files and directories
+created in the directory will inherit the parent's CoW extent size value.
 .TP
 .SM "Bit 31 (0x80000000) \- XFS_XFLAG_HASATTR"
 The file has extended attributes associated with it.
@@ -251,7 +273,8 @@ The final argument points to a variable of type
 .BR "struct fsxattr" ,
 but only the following fields are used in this call:
 .BR fsx_xflags ,
-.B fsx_extsize
+.BR fsx_extsize ,
+.BR fsx_cowextsize ,
 and
 .BR fsx_projid .
 The
@@ -261,6 +284,9 @@ when the file is empty, except in the case of a directory where
 the extent size can be set at any time (this value is only used
 for regular file allocations, so should only be set on a directory
 in conjunction with the XFS_XFLAG_EXTSZINHERIT flag).
+The copy on write extent size,
+.BR fsx_cowextsize ,
+can be set at any time.
 
 .TP
 .B XFS_IOC_GETBMAP



* [PATCH 26/39] xfs_repair: fix get_agino_buf to avoid corrupting inodes
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (24 preceding siblings ...)
  2016-10-25 23:06 ` [PATCH 25/39] man: document the inode cowextsize flags & fields Darrick J. Wong
@ 2016-10-25 23:06 ` Darrick J. Wong
  2016-10-26 10:48   ` Christoph Hellwig
  2016-10-25 23:06 ` [PATCH 27/39] xfs_repair: check the existing refcount btree Darrick J. Wong
                   ` (12 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:06 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

The inode buffering code tries to read inodes in units of chunks,
which are the larger of 8K or 1 FSB.  Each chunk gets its own xfs_buf,
which means that get_agino_buf must calculate the disk address of the
chunk and feed that to libxfs_readbuf in order to find the inode data
correctly.  The current code simply grabs the chunk for the start
inode and indexes from that, which corrupts memory because the start
inode and the target inode could be in different inode chunks.  That
causes the assert in rmap.c to blow up when we clear the reflink flag.

(Also fix some minor errors in the debugging printfs.)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/rdwr.c   |    8 +++---
 repair/dinode.c |   73 +++++++++++++++++++++++++++++++++----------------------
 repair/dinode.h |   12 +++++----
 3 files changed, 54 insertions(+), 39 deletions(-)
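
(Editorial sketch, not part of the patch.)  The index arithmetic that the
new get_agino_buf relies on can be tried in isolation.  This assumes the
number of inodes per cluster is a power of two, which the masking
requires.  The names cluster_locate and struct cluster_loc are invented
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: where an inode lives relative to its cluster. */
struct cluster_loc {
	uint32_t	cluster_agino;	/* first inode of the cluster */
	uint32_t	index;		/* slot of the target inode in the cluster */
};

static struct cluster_loc
cluster_locate(
	uint32_t	agino,
	uint32_t	cluster_size,	/* bytes per inode cluster buffer */
	uint32_t	inode_size)	/* bytes per on-disk inode */
{
	struct cluster_loc	loc;
	uint32_t		ino_per_cluster = cluster_size / inode_size;

	/* The mask below is only valid for a power-of-two inode count. */
	assert(ino_per_cluster != 0 &&
	       (ino_per_cluster & (ino_per_cluster - 1)) == 0);

	/* Round down to the first inode of the cluster, then take the offset. */
	loc.cluster_agino = agino & ~(ino_per_cluster - 1);
	loc.index = agino - loc.cluster_agino;
	return loc;
}
```

With 8192-byte clusters and 512-byte inodes there are 16 inodes per
buffer, so agino 100 lands in the cluster starting at agino 96, slot 4.
If the offset were instead taken from an unrelated start inode such as 64,
the result would be 36, well past the end of a 16-inode buffer.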


diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 8b22eb4..0fea9cb 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -1038,9 +1038,9 @@ libxfs_readbufr_map(struct xfs_buftarg *btp, struct xfs_buf *bp, int flags)
 	if (!error)
 		bp->b_flags |= LIBXFS_B_UPTODATE;
 #ifdef IO_DEBUG
-	printf("%lx: %s: read %u bytes, error %d, blkno=0x%llx(0x%llx), %p\n",
-		pthread_self(), __FUNCTION__, , error,
-		(long long)LIBXFS_BBTOOFF64(blkno), (long long)blkno, bp);
+	printf("%lx: %s: read %lu bytes, error %d, blkno=%llu(%llu), %p\n",
+		pthread_self(), __FUNCTION__, buf - (char *)bp->b_addr, error,
+		(long long)LIBXFS_BBTOOFF64(bp->b_bn), (long long)bp->b_bn, bp);
 #endif
 	return error;
 }
@@ -1070,7 +1070,7 @@ libxfs_readbuf_map(struct xfs_buftarg *btp, struct xfs_buf_map *map, int nmaps,
 	if (!error)
 		libxfs_readbuf_verify(bp, ops);
 
-#ifdef IO_DEBUG
+#ifdef IO_DEBUGX
 	printf("%lx: %s: read %lu bytes, error %d, blkno=%llu(%llu), %p\n",
 		pthread_self(), __FUNCTION__, buf - (char *)bp->b_addr, error,
 		(long long)LIBXFS_BBTOOFF64(bp->b_bn), (long long)bp->b_bn, bp);
diff --git a/repair/dinode.c b/repair/dinode.c
index 512a668..16e0a06 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -847,43 +847,58 @@ scan_bmbt_reclist(
 }
 
 /*
- * these two are meant for routines that read and work with inodes
- * one at a time where the inodes may be in any order (like walking
- * the unlinked lists to look for inodes).  the caller is responsible
- * for writing/releasing the buffer.
+ * Grab the buffer backing an inode.  This is meant for routines that
+ * work with inodes one at a time in any order (like walking the
+ * unlinked lists to look for inodes).  The caller is responsible for
+ * writing/releasing the buffer.
  */
-xfs_buf_t *
-get_agino_buf(xfs_mount_t	 *mp,
-		xfs_agnumber_t	agno,
-		xfs_agino_t	agino,
-		xfs_dinode_t	**dipp)
+struct xfs_buf *
+get_agino_buf(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_agino_t		agino,
+	struct xfs_dinode	**dipp)
 {
-	ino_tree_node_t *irec;
-	xfs_buf_t *bp;
-	int size;
-
-	if ((irec = find_inode_rec(mp, agno, agino)) == NULL)
-		return(NULL);
+	struct xfs_buf		*bp;
+	int			cluster_size;
+	int			ino_per_cluster;
+	xfs_agino_t		cluster_agino;
+	xfs_daddr_t		cluster_daddr;
+	xfs_daddr_t		cluster_blks;
 
-	size = MAX(1, XFS_FSB_TO_BB(mp,
+	/*
+	 * Inode buffers have been read into memory in inode_cluster_size
+	 * chunks (or one FSB).  To find the correct buffer for an inode,
+	 * we must find the buffer for its cluster, add the appropriate
+	 * offset, and return that.
+	 */
+	cluster_size = MAX(mp->m_inode_cluster_size, mp->m_sb.sb_blocksize);
+	ino_per_cluster = cluster_size / mp->m_sb.sb_inodesize;
+	cluster_agino = agino & ~(ino_per_cluster - 1);
+	cluster_blks = XFS_FSB_TO_DADDR(mp, MAX(1,
 			mp->m_inode_cluster_size >> mp->m_sb.sb_blocklog));
-	bp = libxfs_readbuf(mp->m_dev, XFS_AGB_TO_DADDR(mp, agno,
-		XFS_AGINO_TO_AGBNO(mp, irec->ino_startnum)), size, 0,
-		&xfs_inode_buf_ops);
+	cluster_daddr = XFS_AGB_TO_DADDR(mp, agno,
+			XFS_AGINO_TO_AGBNO(mp, cluster_agino));
+
+#ifdef XR_INODE_TRACE
+	printf("cluster_size %d ipc %d clusagino %d daddr %lld sectors %lld\n",
+		cluster_size, ino_per_cluster, cluster_agino, cluster_daddr,
+		cluster_blks);
+#endif
+
+	bp = libxfs_readbuf(mp->m_dev, cluster_daddr, cluster_blks,
+			0, &xfs_inode_buf_ops);
 	if (!bp) {
 		do_warn(_("cannot read inode (%u/%u), disk block %" PRIu64 "\n"),
-			agno, irec->ino_startnum,
-			XFS_AGB_TO_DADDR(mp, agno,
-				XFS_AGINO_TO_AGBNO(mp, irec->ino_startnum)));
-		return(NULL);
+			agno, cluster_agino, cluster_daddr);
+		return NULL;
 	}
 
-	*dipp = xfs_make_iptr(mp, bp, agino -
-		XFS_OFFBNO_TO_AGINO(mp, XFS_AGINO_TO_AGBNO(mp,
-						irec->ino_startnum),
-		0));
-
-	return(bp);
+	*dipp = xfs_make_iptr(mp, bp, agino - cluster_agino);
+	ASSERT(!xfs_sb_version_hascrc(&mp->m_sb) ||
+			XFS_AGINO_TO_INO(mp, agno, agino) ==
+			be64_to_cpu((*dipp)->di_ino));
+	return bp;
 }
 
 /*
diff --git a/repair/dinode.h b/repair/dinode.h
index 5aebf5b..61d0736 100644
--- a/repair/dinode.h
+++ b/repair/dinode.h
@@ -113,12 +113,12 @@ void
 check_uncertain_aginodes(xfs_mount_t	*mp,
 			xfs_agnumber_t	agno);
 
-xfs_buf_t *
-get_agino_buf(xfs_mount_t	*mp,
-		xfs_agnumber_t	agno,
-		xfs_agino_t	agino,
-		xfs_dinode_t	**dipp);
-
+struct xfs_buf *
+get_agino_buf(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_agino_t		agino,
+	struct xfs_dinode	**dipp);
 
 void dinode_bmbt_translation_init(void);
 char * get_forkname(int whichfork);



* [PATCH 27/39] xfs_repair: check the existing refcount btree
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (25 preceding siblings ...)
  2016-10-25 23:06 ` [PATCH 26/39] xfs_repair: fix get_agino_buf to avoid corrupting inodes Darrick J. Wong
@ 2016-10-25 23:06 ` Darrick J. Wong
  2016-10-26 10:49   ` Christoph Hellwig
  2016-10-25 23:06 ` [PATCH 28/39] xfs_repair: handle multiple owners of data blocks Darrick J. Wong
                   ` (11 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:06 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Spot-check the refcount btree for obvious errors, and mark the
refcount btree blocks as such.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/incore.h     |    3 +
 repair/scan.c       |  185 +++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/xfs_repair.c |    2 +
 3 files changed, 189 insertions(+), 1 deletion(-)


diff --git a/repair/incore.h b/repair/incore.h
index bc0810b..b6c4b4f 100644
--- a/repair/incore.h
+++ b/repair/incore.h
@@ -106,7 +106,8 @@ typedef struct rt_extent_tree_node  {
 #define XR_E_INUSE_FS1	9	/* used by fs ag header or log (rmap btree) */
 #define XR_E_INO1	10	/* used by inodes (marked by rmap btree) */
 #define XR_E_FS_MAP1	11	/* used by fs space/inode maps (rmap btree) */
-#define XR_E_BAD_STATE	12
+#define XR_E_REFC	12	/* used by fs ag reference count btree */
+#define XR_E_BAD_STATE	13
 
 /* separate state bit, OR'ed into high (4th) bit of ex_state field */
 
diff --git a/repair/scan.c b/repair/scan.c
index 35a974a..c27f969 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -810,6 +810,9 @@ process_rmap_rec(
 		case XFS_RMAP_OWN_INODES:
 			set_bmap_ext(agno, b, blen, XR_E_INO1);
 			break;
+		case XFS_RMAP_OWN_REFC:
+			set_bmap_ext(agno, b, blen, XR_E_REFC);
+			break;
 		case XFS_RMAP_OWN_NULL:
 			/* still unknown */
 			break;
@@ -845,6 +848,14 @@ _("inode block (%d,%d-%d) mismatch in %s tree, state - %d,%" PRIx64 "\n"),
 			agno, b, b + blen - 1,
 			name, state, owner);
 		break;
+	case XR_E_REFC:
+		if (owner == XFS_RMAP_OWN_REFC)
+			break;
+		do_warn(
+_("AG refcount block (%d,%d-%d) mismatch in %s tree, state - %d,%" PRIx64 "\n"),
+			agno, b, b + blen - 1,
+			name, state, owner);
+		break;
 	case XR_E_INUSE:
 		if (owner >= 0 &&
 		    owner < mp->m_sb.sb_dblocks)
@@ -1161,6 +1172,167 @@ _("%s rmap btree block claimed (state %d), agno %d, bno %d, suspect %d\n"),
 		rmap_avoid_check();
 }
 
+static void
+scan_refcbt(
+	struct xfs_btree_block	*block,
+	int			level,
+	xfs_agblock_t		bno,
+	xfs_agnumber_t		agno,
+	int			suspect,
+	int			isroot,
+	__uint32_t		magic,
+	void			*priv)
+{
+	const char		*name = "refcount";
+	int			i;
+	xfs_refcount_ptr_t	*pp;
+	struct xfs_refcount_rec	*rp;
+	int			hdr_errors = 0;
+	int			numrecs;
+	int			state;
+	xfs_agblock_t		lastblock = 0;
+
+	if (magic != XFS_REFC_CRC_MAGIC) {
+		name = "(unknown)";
+		hdr_errors++;
+		suspect++;
+		goto out;
+	}
+
+	if (be32_to_cpu(block->bb_magic) != magic) {
+		do_warn(_("bad magic # %#x in %s btree block %d/%d\n"),
+			be32_to_cpu(block->bb_magic), name, agno, bno);
+		hdr_errors++;
+		if (suspect)
+			goto out;
+	}
+
+	if (be16_to_cpu(block->bb_level) != level) {
+		do_warn(_("expected level %d got %d in %s btree block %d/%d\n"),
+			level, be16_to_cpu(block->bb_level), name, agno, bno);
+		hdr_errors++;
+		if (suspect)
+			goto out;
+	}
+
+	/* check for btree blocks multiply claimed */
+	state = get_bmap(agno, bno);
+	if (!(state == XR_E_UNKNOWN || state == XR_E_REFC))  {
+		set_bmap(agno, bno, XR_E_MULT);
+		do_warn(
+_("%s btree block claimed (state %d), agno %d, bno %d, suspect %d\n"),
+				name, state, agno, bno, suspect);
+		goto out;
+	}
+	set_bmap(agno, bno, XR_E_FS_MAP);
+
+	numrecs = be16_to_cpu(block->bb_numrecs);
+	if (level == 0) {
+		if (numrecs > mp->m_refc_mxr[0])  {
+			numrecs = mp->m_refc_mxr[0];
+			hdr_errors++;
+		}
+		if (isroot == 0 && numrecs < mp->m_refc_mnr[0])  {
+			numrecs = mp->m_refc_mnr[0];
+			hdr_errors++;
+		}
+
+		if (hdr_errors) {
+			do_warn(
+	_("bad btree nrecs (%u, min=%u, max=%u) in %s btree block %u/%u\n"),
+				be16_to_cpu(block->bb_numrecs),
+				mp->m_refc_mnr[0], mp->m_refc_mxr[0],
+				name, agno, bno);
+			suspect++;
+		}
+
+		rp = XFS_REFCOUNT_REC_ADDR(block, 1);
+		for (i = 0; i < numrecs; i++) {
+			xfs_agblock_t		b, end;
+			xfs_extlen_t		len;
+			xfs_nlink_t		nr;
+
+			b = be32_to_cpu(rp[i].rc_startblock);
+			len = be32_to_cpu(rp[i].rc_blockcount);
+			nr = be32_to_cpu(rp[i].rc_refcount);
+			end = b + len;
+
+			if (!verify_agbno(mp, agno, b)) {
+				do_warn(
+	_("invalid start block %u in record %u of %s btree block %u/%u\n"),
+					b, i, name, agno, bno);
+				continue;
+			}
+			if (len == 0 || !verify_agbno(mp, agno, end - 1)) {
+				do_warn(
+	_("invalid length %u in record %u of %s btree block %u/%u\n"),
+					len, i, name, agno, bno);
+				continue;
+			}
+
+			if (nr < 2 || nr > MAXREFCOUNT) {
+				do_warn(
+	_("invalid reference count %u in record %u of %s btree block %u/%u\n"),
+					nr, i, name, agno, bno);
+				continue;
+			}
+
+			if (b && b <= lastblock) {
+				do_warn(_(
+	"out-of-order %s btree record %d (%u %u) block %u/%u\n"),
+					name, i, b, len, agno, bno);
+			} else {
+				lastblock = b;
+			}
+
+			/* XXX: probably want to mark the reflinked areas? */
+		}
+		goto out;
+	}
+
+	/*
+	 * interior record
+	 */
+	pp = XFS_REFCOUNT_PTR_ADDR(block, 1, mp->m_refc_mxr[1]);
+
+	if (numrecs > mp->m_refc_mxr[1])  {
+		numrecs = mp->m_refc_mxr[1];
+		hdr_errors++;
+	}
+	if (isroot == 0 && numrecs < mp->m_refc_mnr[1])  {
+		numrecs = mp->m_refc_mnr[1];
+		hdr_errors++;
+	}
+
+	/*
+	 * don't pass bogus tree flag down further if this block
+	 * looked ok.  bail out if two levels in a row look bad.
+	 */
+	if (hdr_errors)  {
+		do_warn(
+	_("bad btree nrecs (%u, min=%u, max=%u) in %s btree block %u/%u\n"),
+			be16_to_cpu(block->bb_numrecs),
+			mp->m_refc_mnr[1], mp->m_refc_mxr[1],
+			name, agno, bno);
+		if (suspect)
+			goto out;
+		suspect++;
+	} else if (suspect) {
+		suspect = 0;
+	}
+
+	for (i = 0; i < numrecs; i++)  {
+		xfs_agblock_t		bno = be32_to_cpu(pp[i]);
+
+		if (bno != 0 && verify_agbno(mp, agno, bno)) {
+			scan_sbtree(bno, level, agno, suspect, scan_refcbt, 0,
+				    magic, priv, &xfs_refcountbt_buf_ops);
+		}
+	}
+out:
+	return;
+}
+
 /*
  * The following helpers are to help process and validate individual on-disk
  * inode btree records. We have two possible inode btrees with slightly
@@ -1951,6 +2123,19 @@ validate_agf(
 		}
 	}
 
+	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+		bno = be32_to_cpu(agf->agf_refcount_root);
+		if (bno != 0 && verify_agbno(mp, agno, bno)) {
+			scan_sbtree(bno,
+				    be32_to_cpu(agf->agf_refcount_level),
+				    agno, 0, scan_refcbt, 1, XFS_REFC_CRC_MAGIC,
+				    agcnts, &xfs_refcountbt_buf_ops);
+		} else  {
+			do_warn(_("bad agbno %u for refcntbt root, agno %d\n"),
+				bno, agno);
+		}
+	}
+
 	if (be32_to_cpu(agf->agf_freeblks) != agcnts->agffreeblks) {
 		do_warn(_("agf_freeblks %u, counted %u in ag %u\n"),
 			be32_to_cpu(agf->agf_freeblks), agcnts->agffreeblks, agno);
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index dc38ece..4d92b90 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -423,6 +423,8 @@ calc_mkfs(xfs_mount_t *mp)
 		fino_bno += min(2, mp->m_rmap_maxlevels); /* agfl blocks */
 		fino_bno++;
 	}
+	if (xfs_sb_version_hasreflink(&mp->m_sb))
+		fino_bno++;
 
 	/*
 	 * If the log is allocated in the first allocation group we need to



* [PATCH 28/39] xfs_repair: handle multiple owners of data blocks
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (26 preceding siblings ...)
  2016-10-25 23:06 ` [PATCH 27/39] xfs_repair: check the existing refcount btree Darrick J. Wong
@ 2016-10-25 23:06 ` Darrick J. Wong
  2016-10-26 10:57   ` Christoph Hellwig
  2016-10-25 23:06 ` [PATCH 29/39] xfs_repair: process reverse-mapping data into refcount data Darrick J. Wong
                   ` (10 subsequent siblings)
  38 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:06 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

If reflink is enabled, don't freak out if there are multiple owners of
a given block; that's just a sign that each of those owners is a
reflink file.

v2: owner and offset are unsigned types, so use those for in-order
comparison.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/dinode.c |   66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/scan.c   |   42 ++++++++++++++++++++++++++++++++++-
 2 files changed, 107 insertions(+), 1 deletion(-)
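
(Editorial sketch, not part of the patch.)  With reflink, several rmap
records can start at the same block, so ordering has to be decided on the
(block, owner, offset) triple rather than on the block alone.  The
standalone version below uses made-up names (struct rec_key,
key_in_order) and plain fixed-width types.  It is meant only to show the
lexicographic comparison and why the unsigned types matter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sort key for one reverse-mapping record. */
struct rec_key {
	uint32_t	block;
	uint64_t	owner;
	uint64_t	offset;
};

/* Return true if cur sorts strictly after prev. */
static bool
key_in_order(
	const struct rec_key	*prev,
	const struct rec_key	*cur)
{
	if (cur->block != prev->block)
		return cur->block > prev->block;
	if (cur->owner != prev->owner)
		return cur->owner > prev->owner;
	return cur->offset > prev->offset;
}
```

Two records for the same block with owners 128 and 129 are in order here,
whereas a block-only test would flag the second one.  Because the owner is
compared as an unsigned value, an owner stored as a small negative number
cast to uint64_t sorts after every ordinary inode number.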


diff --git a/repair/dinode.c b/repair/dinode.c
index 16e0a06..98afdc9 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -722,6 +722,9 @@ _("Fatal error: inode %" PRIu64 " - blkmap_set_ext(): %s\n"
 			 * checking each entry without setting the
 			 * block bitmap
 			 */
+			if (type == XR_INO_DATA &&
+			    xfs_sb_version_hasreflink(&mp->m_sb))
+				goto skip_dup;
 			if (search_dup_extent(agno, agbno, ebno)) {
 				do_warn(
 _("%s fork in ino %" PRIu64 " claims dup extent, "
@@ -731,6 +734,7 @@ _("%s fork in ino %" PRIu64 " claims dup extent, "
 					irec.br_blockcount);
 				goto done;
 			}
+skip_dup:
 			*tot += irec.br_blockcount;
 			continue;
 		}
@@ -770,6 +774,9 @@ _("%s fork in inode %" PRIu64 " claims metadata block %" PRIu64 "\n"),
 			case XR_E_INUSE:
 			case XR_E_MULT:
 				set_bmap_ext(agno, agbno, blen, XR_E_MULT);
+				if (type == XR_INO_DATA &&
+				    xfs_sb_version_hasreflink(&mp->m_sb))
+					break;
 				do_warn(
 _("%s fork in %s inode %" PRIu64 " claims used block %" PRIu64 "\n"),
 					forkname, ftype, ino, b);
@@ -2475,6 +2482,65 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
 		}
 	}
 
+	/*
+	 * check that we only have valid flags2 set, and those that are set make
+	 * sense.
+	 */
+	if (dino->di_version >= 3) {
+		uint16_t flags = be16_to_cpu(dino->di_flags);
+		uint64_t flags2 = be64_to_cpu(dino->di_flags2);
+
+		if (flags2 & ~XFS_DIFLAG2_ANY) {
+			if (!uncertain) {
+				do_warn(
+	_("Bad flags2 set in inode %" PRIu64 "\n"),
+					lino);
+			}
+			flags2 &= XFS_DIFLAG2_ANY;
+		}
+
+		if ((flags2 & XFS_DIFLAG2_REFLINK) &&
+		    !xfs_sb_version_hasreflink(&mp->m_sb)) {
+			if (!uncertain) {
+				do_warn(
+	_("inode %" PRIu64 " is marked reflinked but file system does not support reflink\n"),
+					lino);
+			}
+			goto clear_bad_out;
+		}
+
+		if (flags2 & XFS_DIFLAG2_REFLINK) {
+			/* must be a file */
+			if (di_mode && !S_ISREG(di_mode)) {
+				if (!uncertain) {
+					do_warn(
+	_("reflink flag set on non-file inode %" PRIu64 "\n"),
+						lino);
+				}
+				goto clear_bad_out;
+			}
+		}
+
+		if ((flags2 & XFS_DIFLAG2_REFLINK) &&
+		    (flags & (XFS_DIFLAG_REALTIME | XFS_DIFLAG_RTINHERIT))) {
+			if (!uncertain) {
+				do_warn(
+	_("Cannot have a reflinked realtime inode %" PRIu64 "\n"),
+					lino);
+			}
+			goto clear_bad_out;
+		}
+
+		if (!verify_mode && flags2 != be64_to_cpu(dino->di_flags2)) {
+			if (!no_modify) {
+				do_warn(_("fixing bad flags2.\n"));
+				dino->di_flags2 = cpu_to_be64(flags2);
+				*dirty = 1;
+			} else
+				do_warn(_("would fix bad flags2.\n"));
+		}
+	}
+
 	if (verify_mode)
 		return retval;
 
diff --git a/repair/scan.c b/repair/scan.c
index c27f969..d3a1a82 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -872,6 +872,15 @@ _("in use block (%d,%d-%d) mismatch in %s tree, state - %d,%" PRIx64 "\n"),
 		 * be caught later.
 		 */
 		break;
+	case XR_E_INUSE1:
+		/*
+		 * multiple inode owners are ok with
+		 * reflink enabled
+		 */
+		if (xfs_sb_version_hasreflink(&mp->m_sb) &&
+		    !XFS_RMAP_NON_INODE_OWNER(owner))
+			break;
+		/* fall through */
 	default:
 		do_warn(
 _("unknown block (%d,%d-%d) mismatch on %s tree, state - %d,%" PRIx64 "\n"),
@@ -888,6 +897,28 @@ struct rmap_priv {
 	xfs_agblock_t		nr_blocks;
 };
 
+static bool
+rmap_in_order(
+	xfs_agblock_t	b,
+	xfs_agblock_t	lastblock,
+	uint64_t	owner,
+	uint64_t	lastowner,
+	uint64_t	offset,
+	uint64_t	lastoffset)
+{
+	if (b > lastblock)
+		return true;
+	else if (b < lastblock)
+		return false;
+
+	if (owner > lastowner)
+		return true;
+	else if (owner < lastowner)
+		return false;
+
+	return offset > lastoffset;
+}
+
 static void
 scan_rmapbt(
 	struct xfs_btree_block	*block,
@@ -908,6 +939,8 @@ scan_rmapbt(
 	int			numrecs;
 	int			state;
 	xfs_agblock_t		lastblock = 0;
+	uint64_t		lastowner = 0;
+	uint64_t		lastoffset = 0;
 	struct xfs_rmap_key	*kp;
 	struct xfs_rmap_irec	key = {0};
 
@@ -1038,10 +1071,17 @@ _("%s rmap btree block claimed (state %d), agno %d, bno %d, suspect %d\n"),
 			if (i == 0) {
 advance:
 				lastblock = b;
+				lastowner = owner;
+				lastoffset = offset;
 			} else {
 				bool bad;
 
-				bad = b <= lastblock;
+				if (xfs_sb_version_hasreflink(&mp->m_sb))
+					bad = !rmap_in_order(b, lastblock,
+							owner, lastowner,
+							offset, lastoffset);
+				else
+					bad = b <= lastblock;
 				if (bad)
 					do_warn(
 	_("out-of-order rmap btree record %d (%u %"PRId64" %"PRIx64" %u) block %u/%u\n"),



* [PATCH 29/39] xfs_repair: process reverse-mapping data into refcount data
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (27 preceding siblings ...)
  2016-10-25 23:06 ` [PATCH 28/39] xfs_repair: handle multiple owners of data blocks Darrick J. Wong
@ 2016-10-25 23:06 ` Darrick J. Wong
  2016-10-25 23:06 ` [PATCH 30/39] xfs_repair: record reflink inode state Darrick J. Wong
                   ` (9 subsequent siblings)
  38 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:06 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Take all the reverse-mapping data we've acquired and use it to generate
reference count data.  This data is used in phase 5 to rebuild the
refcount btree.

v2: Update to reflect separation of rmap_irec flags.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase4.c |   27 ++++++
 repair/rmap.c   |  232 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/rmap.h   |    2 
 3 files changed, 259 insertions(+), 2 deletions(-)
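
(Editorial sketch, not part of the patch.)  To make the idea of turning
reverse mappings into reference counts concrete, here is a small
standalone sweep over block ranges.  It is not the algorithm the patch
uses (that one is described in the comment the patch adds to
repair/rmap.c).  All names here (struct mapping, struct refc,
sketch_refcounts) are invented, and only ranges with two or more owners
are reported.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct mapping { uint32_t start; uint32_t len; };		/* one owner's extent */
struct refc { uint32_t start; uint32_t len; int count; };	/* shared range */
struct edge { uint32_t pos; int delta; };			/* +1 start, -1 end */

static int
edge_cmp(const void *a, const void *b)
{
	const struct edge	*ea = a, *eb = b;

	if (ea->pos != eb->pos)
		return ea->pos < eb->pos ? -1 : 1;
	return ea->delta - eb->delta;	/* ends sort before starts */
}

/*
 * Write one record per maximal run of blocks that has the same owner
 * count, skipping runs with fewer than two owners.  Returns the number
 * of records written, or -1 if memory or output space runs out.
 */
static int
sketch_refcounts(
	const struct mapping	*maps,
	size_t			nmaps,
	struct refc		*out,
	size_t			max_out)
{
	struct edge	*edges;
	size_t		i, nout = 0;
	int		depth = 0;
	uint32_t	run_start = 0;

	if (nmaps == 0)
		return 0;
	edges = malloc(2 * nmaps * sizeof(*edges));
	if (!edges)
		return -1;
	for (i = 0; i < nmaps; i++) {
		edges[2 * i] = (struct edge){ maps[i].start, 1 };
		edges[2 * i + 1] = (struct edge){ maps[i].start + maps[i].len, -1 };
	}
	qsort(edges, 2 * nmaps, sizeof(*edges), edge_cmp);

	i = 0;
	while (i < 2 * nmaps) {
		uint32_t	pos = edges[i].pos;
		int		old_depth = depth;

		/* Apply every edge at this block before looking at the count. */
		while (i < 2 * nmaps && edges[i].pos == pos)
			depth += edges[i++].delta;
		if (depth == old_depth)
			continue;	/* owner count unchanged, run continues */

		/* The count changed: close the previous run if it was shared. */
		if (old_depth >= 2) {
			if (nout == max_out) {
				free(edges);
				return -1;
			}
			out[nout++] = (struct refc){ run_start, pos - run_start,
						     old_depth };
		}
		run_start = pos;
	}
	free(edges);
	return (int)nout;
}
```

For owners covering blocks [0,10), [5,15) and [8,12), this yields three
records: (5, len 3, count 2), (8, len 2, count 3) and (10, len 2, count
2).  Singly owned ranges produce nothing, which matches the repair check
that rejects refcount records with a count below 2.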


diff --git a/repair/phase4.c b/repair/phase4.c
index 9da1bb1..86992c9 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -193,6 +193,21 @@ _("%s while checking reverse-mappings"),
 }
 
 static void
+compute_ag_refcounts(
+	work_queue_t	*wq,
+	xfs_agnumber_t	agno,
+	void		*arg)
+{
+	int		error;
+
+	error = compute_refcounts(wq->mp, agno);
+	if (error)
+		do_error(
+_("%s while computing reference count records.\n"),
+			 strerror(-error));
+}
+
+static void
 process_rmap_data(
 	struct xfs_mount	*mp)
 {
@@ -206,6 +221,14 @@ process_rmap_data(
 	for (i = 0; i < mp->m_sb.sb_agcount; i++)
 		queue_work(&wq, check_rmap_btrees, i, NULL);
 	destroy_work_queue(&wq);
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return;
+
+	create_work_queue(&wq, mp, libxfs_nproc());
+	for (i = 0; i < mp->m_sb.sb_agcount; i++)
+		queue_work(&wq, compute_ag_refcounts, i, NULL);
+	destroy_work_queue(&wq);
 }
 
 void
@@ -359,7 +382,9 @@ phase4(xfs_mount_t *mp)
 
 	/*
 	 * Process all the reverse-mapping data that we collected.  This
-	 * involves checking the rmap data against the btree.
+	 * involves checking the rmap data against the btree, computing
+	 * reference counts based on the rmap data, and checking the counts
+	 * against the refcount btree.
 	 */
 	process_rmap_data(mp);
 
diff --git a/repair/rmap.c b/repair/rmap.c
index 645af31..e95f6c8 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -42,6 +42,7 @@ struct xfs_ag_rmap {
 	int		ar_flcount;		/* agfl entries from leftover */
 						/* agbt allocations */
 	struct xfs_rmap_irec	ar_last_rmap;	/* last rmap seen */
+	struct xfs_slab	*ar_refcount_items;	/* refcount items, p4-5 */
 };
 
 static struct xfs_ag_rmap *ag_rmaps;
@@ -88,7 +89,8 @@ bool
 rmap_needs_work(
 	struct xfs_mount	*mp)
 {
-	return xfs_sb_version_hasrmapbt(&mp->m_sb);
+	return xfs_sb_version_hasreflink(&mp->m_sb) ||
+	       xfs_sb_version_hasrmapbt(&mp->m_sb);
 }
 
 /*
@@ -120,6 +122,11 @@ _("Insufficient memory while allocating reverse mapping slabs."));
 			do_error(
 _("Insufficient memory while allocating raw metadata reverse mapping slabs."));
 		ag_rmaps[i].ar_last_rmap.rm_owner = XFS_RMAP_OWN_UNKNOWN;
+		error = init_slab(&ag_rmaps[i].ar_refcount_items,
+				  sizeof(struct xfs_refcount_irec));
+		if (error)
+			do_error(
+_("Insufficient memory while allocating refcount item slabs."));
 	}
 }
 
@@ -138,6 +145,7 @@ rmaps_free(
 	for (i = 0; i < mp->m_sb.sb_agcount; i++) {
 		free_slab(&ag_rmaps[i].ar_rmaps);
 		free_slab(&ag_rmaps[i].ar_raw_rmaps);
+		free_slab(&ag_rmaps[i].ar_refcount_items);
 	}
 	free(ag_rmaps);
 	ag_rmaps = NULL;
@@ -598,6 +606,228 @@ rmap_dump(
 #endif
 
 /*
+ * Rebuilding the Reference Count & Reverse Mapping Btrees
+ *
+ * The reference count (refcnt) and reverse mapping (rmap) btrees are rebuilt
+ * during phase 5, like all other AG btrees.  Therefore, reverse mappings must
+ * be processed into reference counts at the end of phase 4, and the rmaps must
+ * be recorded during phase 4.  There is a need to access the rmaps in physical
+ * block order, but no particular need for random access, so the slab.c code
+ * provides a big logical array (consisting of smaller slabs) and some in-order
+ * iterator functions.
+ *
+ * Once we've recorded all the reverse mappings, we're ready to translate the
+ * rmaps into refcount entries.  Imagine the rmap entries as rectangles
+ * representing extents of physical blocks, and that the rectangles can be laid
+ * down to allow them to overlap each other; then we know that we must emit
+ * a refcnt btree entry wherever the amount of overlap changes, i.e. the
+ * emission stimulus is level-triggered:
+ *
+ *                 -    ---
+ *       --      ----- ----   ---        ------
+ * --   ----     ----------- ----     ---------
+ * -------------------------------- -----------
+ * ^ ^  ^^ ^^    ^ ^^ ^^^  ^^^^  ^ ^^ ^  ^     ^
+ * 2 1  23 21    3 43 234  2123  1 01 2  3     0
+ *
+ * For our purposes, a rmap is a tuple (startblock, len, fileoff, owner).
+ *
+ * Note that in the actual refcnt btree we don't store the refcount < 2 cases
+ * because the bnobt tells us which blocks are free; single-use blocks aren't
+ * recorded in the bnobt or the refcntbt.  If the rmapbt supports storing
+ * multiple entries covering a given block we could theoretically dispense with
+ * the refcntbt and simply count rmaps, but that's inefficient in the (hot)
+ * write path, so we'll take the cost of the extra tree to save time.  Also
+ * there's no guarantee that rmap will be enabled.
+ *
+ * Given an array of rmaps sorted by physical block number, a starting physical
+ * block (sp), a bag to hold rmaps that cover sp, and the next physical
+ * block where the level changes (np), we can reconstruct the refcount
+ * btree as follows:
+ *
+ * While there are still unprocessed rmaps in the array,
+ *  - Set sp to the physical block (pblk) of the next unprocessed rmap.
+ *  - Add to the bag all rmaps in the array where startblock == sp.
+ *  - Set np to the physical block where the bag size will change.
+ *    This is the minimum of (the pblk of the next unprocessed rmap) and
+ *    (startblock + len of each rmap in the bag).
+ *  - Record the bag size as old_bag_size.
+ *
+ *  - While the bag isn't empty,
+ *     - Remove from the bag all rmaps where startblock + len == np.
+ *     - Add to the bag all rmaps in the array where startblock == np.
+ *     - If the bag size isn't old_bag_size, store the refcount entry
+ *       (sp, np - sp, old_bag_size) in the refcnt btree.
+ *     - If the bag is empty, break out of the inner loop.
+ *     - Set old_bag_size to the bag size
+ *     - Set sp = np.
+ *     - Set np to the physical block where the bag size will change.
+ *       This is the minimum of (the pblk of the next unprocessed rmap) and
+ *       (startblock + len of each rmap in the bag).
+ *
+ * An implementation detail is that because this processing happens during
+ * phase 4, the refcount entries are stored in an array so that phase 5 can
+ * load them into the refcount btree.  The rmaps can be loaded directly into
+ * the rmap btree during phase 5 as well.
+ */
+
+/*
+ * Emit a refcount object for refcntbt reconstruction during phase 5.
+ */
+#define REFCOUNT_CLAMP(nr)	((nr) > MAXREFCOUNT ? MAXREFCOUNT : (nr))
+static void
+refcount_emit(
+	struct xfs_mount		*mp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		len,
+	size_t			nr_rmaps)
+{
+	struct xfs_refcount_irec	rlrec;
+	int			error;
+	struct xfs_slab		*rlslab;
+
+	rlslab = ag_rmaps[agno].ar_refcount_items;
+	ASSERT(nr_rmaps > 0);
+
+	dbg_printf("REFL: agno=%u pblk=%u, len=%u -> refcount=%zu\n",
+		agno, agbno, len, nr_rmaps);
+	rlrec.rc_startblock = agbno;
+	rlrec.rc_blockcount = len;
+	rlrec.rc_refcount = REFCOUNT_CLAMP(nr_rmaps);
+	error = slab_add(rlslab, &rlrec);
+	if (error)
+		do_error(
+_("Insufficient memory while recreating refcount tree."));
+}
+#undef REFCOUNT_CLAMP
+
+/*
+ * Transform a pile of physical block mapping observations into refcount data
+ * for eventual rebuilding of the btrees.
+ */
+#define RMAP_END(r)	((r)->rm_startblock + (r)->rm_blockcount)
+int
+compute_refcounts(
+	struct xfs_mount		*mp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_bag		*stack_top = NULL;
+	struct xfs_slab		*rmaps;
+	struct xfs_slab_cursor	*rmaps_cur;
+	struct xfs_rmap_irec	*array_cur;
+	struct xfs_rmap_irec	*rmap;
+	xfs_agblock_t		sbno;	/* first bno of this rmap set */
+	xfs_agblock_t		cbno;	/* first bno of this refcount set */
+	xfs_agblock_t		nbno;	/* next bno where rmap set changes */
+	size_t			n, idx;
+	size_t			old_stack_nr;
+	int			error;
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return 0;
+
+	rmaps = ag_rmaps[agno].ar_rmaps;
+
+	error = init_slab_cursor(rmaps, rmap_compare, &rmaps_cur);
+	if (error)
+		return error;
+
+	error = init_bag(&stack_top);
+	if (error)
+		goto err;
+
+	/* While there are rmaps to be processed... */
+	n = 0;
+	while (n < slab_count(rmaps)) {
+		array_cur = peek_slab_cursor(rmaps_cur);
+		sbno = cbno = array_cur->rm_startblock;
+		/* Push all rmaps with pblk == sbno onto the stack */
+		for (;
+		     array_cur && array_cur->rm_startblock == sbno;
+		     array_cur = peek_slab_cursor(rmaps_cur)) {
+			advance_slab_cursor(rmaps_cur); n++;
+			rmap_dump("push0", agno, array_cur);
+			error = bag_add(stack_top, array_cur);
+			if (error)
+				goto err;
+		}
+
+		/* Set nbno to the bno of the next refcount change */
+		if (n < slab_count(rmaps))
+			nbno = array_cur->rm_startblock;
+		else
+			nbno = NULLAGBLOCK;
+		foreach_bag_ptr(stack_top, idx, rmap) {
+			nbno = min(nbno, RMAP_END(rmap));
+		}
+
+		/* The next change point must lie beyond sbno */
+		ASSERT(nbno > sbno);
+		old_stack_nr = bag_count(stack_top);
+
+		/* While stack isn't empty... */
+		while (bag_count(stack_top)) {
+			/* Pop all rmaps that end at nbno */
+			foreach_bag_ptr_reverse(stack_top, idx, rmap) {
+				if (RMAP_END(rmap) != nbno)
+					continue;
+				rmap_dump("pop", agno, rmap);
+				error = bag_remove(stack_top, idx);
+				if (error)
+					goto err;
+			}
+
+			/* Push array items that start at nbno */
+			for (;
+			     array_cur && array_cur->rm_startblock == nbno;
+			     array_cur = peek_slab_cursor(rmaps_cur)) {
+				advance_slab_cursor(rmaps_cur); n++;
+				rmap_dump("push1", agno, array_cur);
+				error = bag_add(stack_top, array_cur);
+				if (error)
+					goto err;
+			}
+
+			/* Emit refcount if necessary */
+			ASSERT(nbno > cbno);
+			if (bag_count(stack_top) != old_stack_nr) {
+				if (old_stack_nr > 1) {
+					refcount_emit(mp, agno, cbno,
+						      nbno - cbno,
+						      old_stack_nr);
+				}
+				cbno = nbno;
+			}
+
+			/* Stack empty, go find the next rmap */
+			if (bag_count(stack_top) == 0)
+				break;
+			old_stack_nr = bag_count(stack_top);
+			sbno = nbno;
+
+			/* Set nbno to the bno of the next refcount change */
+			if (n < slab_count(rmaps))
+				nbno = array_cur->rm_startblock;
+			else
+				nbno = NULLAGBLOCK;
+			foreach_bag_ptr(stack_top, idx, rmap) {
+				nbno = min(nbno, RMAP_END(rmap));
+			}
+
+			/* The next change point must lie beyond sbno */
+			ASSERT(nbno > sbno);
+		}
+	}
+err:
+	free_bag(&stack_top);
+	free_slab_cursor(&rmaps_cur);
+
+	return error;
+}
+#undef RMAP_END
+
+/*
  * Return the number of rmap objects for an AG.
  */
 size_t
diff --git a/repair/rmap.h b/repair/rmap.h
index 7106dfc..01dec9f 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -49,6 +49,8 @@ extern __int64_t rmap_diffkeys(struct xfs_rmap_irec *kp1,
 extern void rmap_high_key_from_rec(struct xfs_rmap_irec *rec,
 		struct xfs_rmap_irec *key);
 
+extern int compute_refcounts(struct xfs_mount *, xfs_agnumber_t);
+
 extern void fix_freelist(struct xfs_mount *, xfs_agnumber_t, bool);
 extern void rmap_store_agflcount(struct xfs_mount *, xfs_agnumber_t, int);
 



* [PATCH 30/39] xfs_repair: record reflink inode state
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (28 preceding siblings ...)
  2016-10-25 23:06 ` [PATCH 29/39] xfs_repair: process reverse-mapping data into refcount data Darrick J. Wong
@ 2016-10-25 23:06 ` Darrick J. Wong
  2016-10-25 23:06 ` [PATCH 31/39] xfs_repair: fix inode reflink flags Darrick J. Wong
                   ` (8 subsequent siblings)
  38 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:06 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Record the state of the per-inode reflink flag, so that we can
compare against the rmap data and update the flags accordingly.
Clear the (reflink) state if we clear the inode.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/dino_chunks.c |    1 +
 repair/dinode.c      |    6 ++++++
 repair/incore.h      |   38 ++++++++++++++++++++++++++++++++++++++
 repair/incore_ino.c  |    2 ++
 repair/rmap.c        |   26 ++++++++++++++++++++++++++
 repair/rmap.h        |    2 ++
 6 files changed, 75 insertions(+)


diff --git a/repair/dino_chunks.c b/repair/dino_chunks.c
index 7dbaca6..4db9512 100644
--- a/repair/dino_chunks.c
+++ b/repair/dino_chunks.c
@@ -931,6 +931,7 @@ process_inode_chunk(
 				do_warn(_("would have cleared inode %" PRIu64 "\n"),
 					ino);
 			}
+			clear_inode_was_rl(ino_rec, irec_offset);
 		}
 
 process_next:
diff --git a/repair/dinode.c b/repair/dinode.c
index 98afdc9..64fc983 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -2636,6 +2636,12 @@ _("bad non-zero extent size %u for non-realtime/extsize inode %" PRIu64 ", "),
 		goto clear_bad_out;
 
 	/*
+	 * record the state of the reflink flag
+	 */
+	if (collect_rmaps)
+		record_inode_reflink_flag(mp, dino, agno, ino, lino);
+
+	/*
 	 * check data fork -- if it's bad, clear the inode
 	 */
 	if (process_inode_data_fork(mp, agno, ino, dino, type, dirty,
diff --git a/repair/incore.h b/repair/incore.h
index b6c4b4f..bcd2f4b 100644
--- a/repair/incore.h
+++ b/repair/incore.h
@@ -283,6 +283,8 @@ typedef struct ino_tree_node  {
 	__uint64_t		ir_sparse;	/* sparse inode bitmask */
 	__uint64_t		ino_confirmed;	/* confirmed bitmask */
 	__uint64_t		ino_isa_dir;	/* bit == 1 if a directory */
+	__uint64_t		ino_was_rl;	/* bit == 1 if reflink flag set */
+	__uint64_t		ino_is_rl;	/* bit == 1 if reflink flag should be set */
 	__uint8_t		nlink_size;
 	union ino_nlink		disk_nlinks;	/* on-disk nlinks, set in P3 */
 	union  {
@@ -494,6 +496,42 @@ static inline bool is_inode_sparse(struct ino_tree_node *irec, int offset)
 }
 
 /*
+ * set/clear/test whether the inode was marked as reflinked
+ */
+static inline void set_inode_was_rl(struct ino_tree_node *irec, int offset)
+{
+	irec->ino_was_rl |= IREC_MASK(offset);
+}
+
+static inline void clear_inode_was_rl(struct ino_tree_node *irec, int offset)
+{
+	irec->ino_was_rl &= ~IREC_MASK(offset);
+}
+
+static inline int inode_was_rl(struct ino_tree_node *irec, int offset)
+{
+	return (irec->ino_was_rl & IREC_MASK(offset)) != 0;
+}
+
+/*
+ * set/clear/test whether the inode should be marked as reflinked
+ */
+static inline void set_inode_is_rl(struct ino_tree_node *irec, int offset)
+{
+	irec->ino_is_rl |= IREC_MASK(offset);
+}
+
+static inline void clear_inode_is_rl(struct ino_tree_node *irec, int offset)
+{
+	irec->ino_is_rl &= ~IREC_MASK(offset);
+}
+
+static inline int inode_is_rl(struct ino_tree_node *irec, int offset)
+{
+	return (irec->ino_is_rl & IREC_MASK(offset)) != 0;
+}
+
+/*
  * add_inode_reached() is set on inode I only if I has been reached
  * by an inode P claiming to be the parent and if I is a directory,
  * the .. link in the I says that P is I's parent.
diff --git a/repair/incore_ino.c b/repair/incore_ino.c
index 1898257..2ec1765 100644
--- a/repair/incore_ino.c
+++ b/repair/incore_ino.c
@@ -257,6 +257,8 @@ alloc_ino_node(
 	irec->ino_startnum = starting_ino;
 	irec->ino_confirmed = 0;
 	irec->ino_isa_dir = 0;
+	irec->ino_was_rl = 0;
+	irec->ino_is_rl = 0;
 	irec->ir_free = (xfs_inofree_t) - 1;
 	irec->ir_sparse = 0;
 	irec->ino_un.ex_data = NULL;
diff --git a/repair/rmap.c b/repair/rmap.c
index e95f6c8..5de4d32 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -1083,6 +1083,32 @@ rmap_high_key_from_rec(
 }
 
 /*
+ * Record that an inode had the reflink flag set when repair started.  The
+ * inode reflink flag will be adjusted as necessary.
+ */
+void
+record_inode_reflink_flag(
+	struct xfs_mount	*mp,
+	struct xfs_dinode	*dino,
+	xfs_agnumber_t		agno,
+	xfs_agino_t		ino,
+	xfs_ino_t		lino)
+{
+	struct ino_tree_node	*irec;
+	int			off;
+
+	ASSERT(XFS_AGINO_TO_INO(mp, agno, ino) == be64_to_cpu(dino->di_ino));
+	if (!(be64_to_cpu(dino->di_flags2) & XFS_DIFLAG2_REFLINK))
+		return;
+	irec = find_inode_rec(mp, agno, ino);
+	off = get_inode_offset(mp, lino, irec);
+	ASSERT(!inode_was_rl(irec, off));
+	set_inode_was_rl(irec, off);
+	dbg_printf("set was_rl lino=%llu was=0x%llx\n",
+		(unsigned long long)lino, (unsigned long long)irec->ino_was_rl);
+}
+
+/*
  * Regenerate the AGFL so that we don't run out of it while rebuilding the
  * rmap btree.  If skip_rmapbt is true, don't update the rmapbt (most probably
  * because we're updating the rmapbt).
diff --git a/repair/rmap.h b/repair/rmap.h
index 01dec9f..ab6f434 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -50,6 +50,8 @@ extern void rmap_high_key_from_rec(struct xfs_rmap_irec *rec,
 		struct xfs_rmap_irec *key);
 
 extern int compute_refcounts(struct xfs_mount *, xfs_agnumber_t);
+extern void record_inode_reflink_flag(struct xfs_mount *, struct xfs_dinode *,
+	xfs_agnumber_t, xfs_agino_t, xfs_ino_t);
 
 extern void fix_freelist(struct xfs_mount *, xfs_agnumber_t, bool);
 extern void rmap_store_agflcount(struct xfs_mount *, xfs_agnumber_t, int);



* [PATCH 31/39] xfs_repair: fix inode reflink flags
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (29 preceding siblings ...)
  2016-10-25 23:06 ` [PATCH 30/39] xfs_repair: record reflink inode state Darrick J. Wong
@ 2016-10-25 23:06 ` Darrick J. Wong
  2016-10-25 23:07 ` [PATCH 32/39] xfs_repair: check the refcount btree against our observed reference counts when -n Darrick J. Wong
                   ` (7 subsequent siblings)
  38 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:06 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

While we're computing reference counts, record which inodes actually
share blocks with other files and fix the flags as necessary.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase4.c |   20 ++++++++
 repair/rmap.c   |  133 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/rmap.h   |    1 
 3 files changed, 154 insertions(+)
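
The per-chunk flag fixup in this patch boils down to XORing two 64-bit
masks and looking at which side has the differing bit.  A minimal sketch
of that decision follows; enum rl_action and sketch_rl_action() are
made-up names for illustration and do not exist in xfs_repair.

```c
#include <assert.h>
#include <stdint.h>

enum rl_action { RL_KEEP, RL_SET, RL_CLEAR };

/*
 * Decide what to do with the reflink flag of the inode at position `bit`
 * of a 64-inode chunk.  `was` holds the flags seen on disk, `is` holds
 * the flags we computed from the rmap data.
 */
static enum rl_action
sketch_rl_action(uint64_t was, uint64_t is, int bit)
{
	uint64_t mask;

	assert(bit >= 0 && bit < 64);
	mask = (uint64_t)1 << bit;

	if (!((was ^ is) & mask))
		return RL_KEEP;		/* disk and observation agree */
	return (is & mask) ? RL_SET : RL_CLEAR;
}
```

With was = 0x5 and is = 0x3, bit 0 is left alone, bit 1 gets the flag set
and bit 2 gets it cleared.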


diff --git a/repair/phase4.c b/repair/phase4.c
index 86992c9..2c2d611 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -208,6 +208,21 @@ _("%s while computing reference count records.\n"),
 }
 
 static void
+process_inode_reflink_flags(
+	struct work_queue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	int			error;
+
+	error = fix_inode_reflink_flags(wq->mp, agno);
+	if (error)
+		do_error(
+_("%s while fixing inode reflink flags.\n"),
+			 strerror(-error));
+}
+
+static void
 process_rmap_data(
 	struct xfs_mount	*mp)
 {
@@ -229,6 +244,11 @@ process_rmap_data(
 	for (i = 0; i < mp->m_sb.sb_agcount; i++)
 		queue_work(&wq, compute_ag_refcounts, i, NULL);
 	destroy_work_queue(&wq);
+
+	create_work_queue(&wq, mp, libxfs_nproc());
+	for (i = 0; i < mp->m_sb.sb_agcount; i++)
+		queue_work(&wq, process_inode_reflink_flags, i, NULL);
+	destroy_work_queue(&wq);
 }
 
 void
diff --git a/repair/rmap.c b/repair/rmap.c
index 5de4d32..4ef9051 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -672,6 +672,39 @@ rmap_dump(
  */
 
 /*
+ * Mark all inodes in the reverse-mapping observation stack as requiring the
+ * reflink inode flag, if the stack depth is greater than 1.
+ */
+static void
+mark_inode_rl(
+	struct xfs_mount		*mp,
+	struct xfs_bag		*rmaps)
+{
+	xfs_agnumber_t		iagno;
+	struct xfs_rmap_irec	*rmap;
+	struct ino_tree_node	*irec;
+	int			off;
+	size_t			idx;
+	xfs_agino_t		ino;
+
+	if (bag_count(rmaps) < 2)
+		return;
+
+	/* Reflink flag accounting */
+	foreach_bag_ptr(rmaps, idx, rmap) {
+		ASSERT(!XFS_RMAP_NON_INODE_OWNER(rmap->rm_owner));
+		iagno = XFS_INO_TO_AGNO(mp, rmap->rm_owner);
+		ino = XFS_INO_TO_AGINO(mp, rmap->rm_owner);
+		pthread_mutex_lock(&ag_locks[iagno].lock);
+		irec = find_inode_rec(mp, iagno, ino);
+		off = get_inode_offset(mp, rmap->rm_owner, irec);
+		/* lock here because we might go outside this ag */
+		set_inode_is_rl(irec, off);
+		pthread_mutex_unlock(&ag_locks[iagno].lock);
+	}
+}
+
+/*
  * Emit a refcount object for refcntbt reconstruction during phase 5.
  */
 #define REFCOUNT_CLAMP(nr)	((nr) > MAXREFCOUNT ? MAXREFCOUNT : (nr))
@@ -752,6 +785,7 @@ compute_refcounts(
 			if (error)
 				goto err;
 		}
+		mark_inode_rl(mp, stack_top);
 
 		/* Set nbno to the bno of the next refcount change */
 		if (n < slab_count(rmaps))
@@ -788,6 +822,7 @@ compute_refcounts(
 				if (error)
 					goto err;
 			}
+			mark_inode_rl(mp, stack_top);
 
 			/* Emit refcount if necessary */
 			ASSERT(nbno > cbno);
@@ -1109,6 +1144,104 @@ record_inode_reflink_flag(
 }
 
 /*
+ * Fix an inode's reflink flag.
+ */
+static int
+fix_inode_reflink_flag(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_agino_t		agino,
+	bool			set)
+{
+	struct xfs_dinode	*dino;
+	struct xfs_buf		*buf;
+
+	if (set)
+		do_warn(
+_("setting reflink flag on inode %"PRIu64"\n"),
+			XFS_AGINO_TO_INO(mp, agno, agino));
+	else if (!no_modify) /* && !set */
+		do_warn(
+_("clearing reflink flag on inode %"PRIu64"\n"),
+			XFS_AGINO_TO_INO(mp, agno, agino));
+	if (no_modify)
+		return 0;
+
+	buf = get_agino_buf(mp, agno, agino, &dino);
+	if (!buf)
+		return 1;
+	ASSERT(XFS_AGINO_TO_INO(mp, agno, agino) == be64_to_cpu(dino->di_ino));
+	if (set)
+		dino->di_flags2 |= cpu_to_be64(XFS_DIFLAG2_REFLINK);
+	else
+		dino->di_flags2 &= cpu_to_be64(~XFS_DIFLAG2_REFLINK);
+	libxfs_dinode_calc_crc(mp, dino);
+	libxfs_writebuf(buf, 0);
+
+	return 0;
+}
+
+/*
+ * Fix discrepancies between the state of the inode reflink flag and our
+ * observations as to whether or not the inode really needs it.
+ */
+int
+fix_inode_reflink_flags(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	struct ino_tree_node	*irec;
+	int			bit;
+	__uint64_t		was;
+	__uint64_t		is;
+	__uint64_t		diff;
+	__uint64_t		mask;
+	int			error = 0;
+	xfs_agino_t		agino;
+
+	/*
+	 * Update the reflink flag for any inode where there's a discrepancy
+	 * between the inode flag and whether or not we found any reflinked
+	 * extents.
+	 */
+	for (irec = findfirst_inode_rec(agno);
+	     irec != NULL;
+	     irec = next_ino_rec(irec)) {
+		ASSERT((irec->ino_was_rl & irec->ir_free) == 0);
+		ASSERT((irec->ino_is_rl & irec->ir_free) == 0);
+		was = irec->ino_was_rl;
+		is = irec->ino_is_rl;
+		if (was == is)
+			continue;
+		diff = was ^ is;
+		dbg_printf("mismatch ino=%llu was=0x%lx is=0x%lx dif=0x%lx\n",
+			(unsigned long long)XFS_AGINO_TO_INO(mp, agno,
+						irec->ino_startnum),
+			was, is, diff);
+
+		for (bit = 0, mask = 1; bit < 64; bit++, mask <<= 1) {
+			agino = bit + irec->ino_startnum;
+			if (!(diff & mask))
+				continue;
+			else if (was & mask)
+				error = fix_inode_reflink_flag(mp, agno, agino,
+						false);
+			else if (is & mask)
+				error = fix_inode_reflink_flag(mp, agno, agino,
+						true);
+			else
+				ASSERT(0);
+			if (error)
+				do_error(
+_("Unable to fix reflink flag on inode %"PRIu64".\n"),
+					XFS_AGINO_TO_INO(mp, agno, agino));
+		}
+	}
+
+	return error;
+}
+
+/*
  * Regenerate the AGFL so that we don't run out of it while rebuilding the
  * rmap btree.  If skip_rmapbt is true, don't update the rmapbt (most probably
  * because we're updating the rmapbt).
diff --git a/repair/rmap.h b/repair/rmap.h
index ab6f434..266448f 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -52,6 +52,7 @@ extern void rmap_high_key_from_rec(struct xfs_rmap_irec *rec,
 extern int compute_refcounts(struct xfs_mount *, xfs_agnumber_t);
 extern void record_inode_reflink_flag(struct xfs_mount *, struct xfs_dinode *,
 	xfs_agnumber_t, xfs_agino_t, xfs_ino_t);
+extern int fix_inode_reflink_flags(struct xfs_mount *, xfs_agnumber_t);
 
 extern void fix_freelist(struct xfs_mount *, xfs_agnumber_t, bool);
 extern void rmap_store_agflcount(struct xfs_mount *, xfs_agnumber_t, int);



* [PATCH 32/39] xfs_repair: check the refcount btree against our observed reference counts when -n
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (30 preceding siblings ...)
  2016-10-25 23:06 ` [PATCH 31/39] xfs_repair: fix inode reflink flags Darrick J. Wong
@ 2016-10-25 23:07 ` Darrick J. Wong
  2016-10-25 23:07 ` [PATCH 33/39] xfs_repair: rebuild the refcount btree Darrick J. Wong
                   ` (6 subsequent siblings)
  38 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:07 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Check the observed reference counts against whatever's in the refcount
btree for discrepancies.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/libxfs_api_defs.h |    4 +
 repair/phase4.c          |   19 +++++++
 repair/rmap.c            |  126 ++++++++++++++++++++++++++++++++++++++++++++++
 repair/rmap.h            |    5 ++
 repair/scan.c            |    3 +
 5 files changed, 156 insertions(+), 1 deletion(-)
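
To illustrate the comparison that check_refcounts() performs, here is a
small sketch that replaces the btree with a sorted array.  The names
struct refc, sketch_lookup_le() and sketch_check_one() are hypothetical;
the real code uses the libxfs refcount btree cursor, and only the
comparison rule (same start block, length and count not smaller than
observed) is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

struct refc { unsigned start, len, count; };

enum check_result { CHECK_OK, CHECK_MISSING, CHECK_MISMATCH };

/* Index of the last record with start <= agbno, or -1 if there is none. */
static long
sketch_lookup_le(const struct refc *tree, size_t n, unsigned agbno)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tree[mid].start <= agbno)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (long)lo - 1;
}

/* Compare one observed record against the sorted on-disk records. */
static enum check_result
sketch_check_one(const struct refc *tree, size_t n, const struct refc *obs)
{
	long i = sketch_lookup_le(tree, n, obs->start);

	if (i < 0)
		return CHECK_MISSING;
	if (tree[i].start != obs->start ||
	    tree[i].len < obs->len ||
	    tree[i].count < obs->count)
		return CHECK_MISMATCH;
	return CHECK_OK;
}
```

Note that a less-than-or-equal lookup can land on a record that starts
before the observed extent, which is why the start block is compared
again after the lookup.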


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index b30228d..8c15b75 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -138,4 +138,8 @@
 #define xfs_dinode_good_version		libxfs_dinode_good_version
 #define xfs_free_extent			libxfs_free_extent
 
+#define xfs_refcountbt_init_cursor	libxfs_refcountbt_init_cursor
+#define xfs_refcount_lookup_le		libxfs_refcount_lookup_le
+#define xfs_refcount_get_rec		libxfs_refcount_get_rec
+
 #endif /* __LIBXFS_API_DEFS_H__ */
diff --git a/repair/phase4.c b/repair/phase4.c
index 2c2d611..e59464b 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -223,6 +223,21 @@ _("%s while fixing inode reflink flags.\n"),
 }
 
 static void
+check_refcount_btrees(
+	work_queue_t	*wq,
+	xfs_agnumber_t	agno,
+	void		*arg)
+{
+	int		error;
+
+	error = check_refcounts(wq->mp, agno);
+	if (error)
+		do_error(
+_("%s while checking reference counts"),
+			 strerror(-error));
+}
+
+static void
 process_rmap_data(
 	struct xfs_mount	*mp)
 {
@@ -246,8 +261,10 @@ process_rmap_data(
 	destroy_work_queue(&wq);
 
 	create_work_queue(&wq, mp, libxfs_nproc());
-	for (i = 0; i < mp->m_sb.sb_agcount; i++)
+	for (i = 0; i < mp->m_sb.sb_agcount; i++) {
 		queue_work(&wq, process_inode_reflink_flags, i, NULL);
+		queue_work(&wq, check_refcount_btrees, i, NULL);
+	}
 	destroy_work_queue(&wq);
 }
 
diff --git a/repair/rmap.c b/repair/rmap.c
index 4ef9051..849b788 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -47,6 +47,7 @@ struct xfs_ag_rmap {
 
 static struct xfs_ag_rmap *ag_rmaps;
 static bool rmapbt_suspect;
+static bool refcbt_suspect;
 
 /*
  * Compare rmap observations for array sorting.
@@ -1242,6 +1243,131 @@ _("Unable to fix reflink flag on inode %"PRIu64".\n"),
 }
 
 /*
+ * Return the number of refcount objects for an AG.
+ */
+size_t
+refcount_record_count(
+	struct xfs_mount		*mp,
+	xfs_agnumber_t		agno)
+{
+	return slab_count(ag_rmaps[agno].ar_refcount_items);
+}
+
+/*
+ * Return a slab cursor that will return refcount objects in order.
+ */
+int
+init_refcount_cursor(
+	xfs_agnumber_t		agno,
+	struct xfs_slab_cursor	**cur)
+{
+	return init_slab_cursor(ag_rmaps[agno].ar_refcount_items, NULL, cur);
+}
+
+/*
+ * Disable the refcount btree check.
+ */
+void
+refcount_avoid_check(void)
+{
+	refcbt_suspect = true;
+}
+
+/*
+ * Compare the observed reference counts against what's in the ag btree.
+ */
+int
+check_refcounts(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_slab_cursor	*rl_cur;
+	struct xfs_btree_cur	*bt_cur = NULL;
+	int			error;
+	int			have;
+	int			i;
+	struct xfs_buf		*agbp = NULL;
+	struct xfs_refcount_irec	*rl_rec;
+	struct xfs_refcount_irec	tmp;
+	struct xfs_perag	*pag;		/* per allocation group data */
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return 0;
+	if (refcbt_suspect) {
+		if (no_modify && agno == 0)
+			do_warn(_("would rebuild corrupt refcount btrees.\n"));
+		return 0;
+	}
+
+	/* Create cursors to refcount structures */
+	error = init_refcount_cursor(agno, &rl_cur);
+	if (error)
+		return error;
+
+	error = -libxfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
+	if (error)
+		goto err;
+
+	/* Leave the per-ag data "uninitialized" since we rewrite it later */
+	pag = libxfs_perag_get(mp, agno);
+	pag->pagf_init = 0;
+	libxfs_perag_put(pag);
+
+	bt_cur = libxfs_refcountbt_init_cursor(mp, NULL, agbp, agno, NULL);
+	if (!bt_cur) {
+		error = -ENOMEM;
+		goto err;
+	}
+
+	rl_rec = pop_slab_cursor(rl_cur);
+	while (rl_rec) {
+		/* Look for a refcount record in the btree */
+		error = -libxfs_refcount_lookup_le(bt_cur,
+				rl_rec->rc_startblock, &have);
+		if (error)
+			goto err;
+		if (!have) {
+			do_warn(
+_("Missing reference count record for (%u/%u) len %u count %u\n"),
+				agno, rl_rec->rc_startblock,
+				rl_rec->rc_blockcount, rl_rec->rc_refcount);
+			goto next_loop;
+		}
+
+		error = -libxfs_refcount_get_rec(bt_cur, &tmp, &i);
+		if (error)
+			goto err;
+		if (!i) {
+			do_warn(
+_("Missing reference count record for (%u/%u) len %u count %u\n"),
+				agno, rl_rec->rc_startblock,
+				rl_rec->rc_blockcount, rl_rec->rc_refcount);
+			goto next_loop;
+		}
+
+		/* Compare each refcount observation against the btree's */
+		if (tmp.rc_startblock != rl_rec->rc_startblock ||
+		    tmp.rc_blockcount < rl_rec->rc_blockcount ||
+		    tmp.rc_refcount < rl_rec->rc_refcount)
+			do_warn(
+_("Incorrect reference count: saw (%u/%u) len %u nlinks %u; should be (%u/%u) len %u nlinks %u\n"),
+				agno, tmp.rc_startblock, tmp.rc_blockcount,
+				tmp.rc_refcount, agno, rl_rec->rc_startblock,
+				rl_rec->rc_blockcount, rl_rec->rc_refcount);
+next_loop:
+		rl_rec = pop_slab_cursor(rl_cur);
+	}
+
+err:
+	if (bt_cur)
+		libxfs_btree_del_cursor(bt_cur, XFS_BTREE_NOERROR);
+	if (agbp)
+		libxfs_putbuf(agbp);
+	free_slab_cursor(&rl_cur);
+	return 0;
+}
+
+/*
  * Regenerate the AGFL so that we don't run out of it while rebuilding the
  * rmap btree.  If skip_rmapbt is true, don't update the rmapbt (most probably
  * because we're updating the rmapbt).
diff --git a/repair/rmap.h b/repair/rmap.h
index 266448f..752ece8 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -50,6 +50,11 @@ extern void rmap_high_key_from_rec(struct xfs_rmap_irec *rec,
 		struct xfs_rmap_irec *key);
 
 extern int compute_refcounts(struct xfs_mount *, xfs_agnumber_t);
+extern size_t refcount_record_count(struct xfs_mount *, xfs_agnumber_t);
+extern int init_refcount_cursor(xfs_agnumber_t, struct xfs_slab_cursor **);
+extern void refcount_avoid_check(void);
+extern int check_refcounts(struct xfs_mount *, xfs_agnumber_t);
+
 extern void record_inode_reflink_flag(struct xfs_mount *, struct xfs_dinode *,
 	xfs_agnumber_t, xfs_agino_t, xfs_ino_t);
 extern int fix_inode_reflink_flags(struct xfs_mount *, xfs_agnumber_t);
diff --git a/repair/scan.c b/repair/scan.c
index d3a1a82..1c60784 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -1370,6 +1370,8 @@ _("%s btree block claimed (state %d), agno %d, bno %d, suspect %d\n"),
 		}
 	}
 out:
+	if (suspect)
+		refcount_avoid_check();
 	return;
 }
 
@@ -2173,6 +2175,7 @@ validate_agf(
 		} else  {
 			do_warn(_("bad agbno %u for refcntbt root, agno %d\n"),
 				bno, agno);
+			refcount_avoid_check();
 		}
 	}
 



* [PATCH 33/39] xfs_repair: rebuild the refcount btree
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (31 preceding siblings ...)
  2016-10-25 23:07 ` [PATCH 32/39] xfs_repair: check the refcount btree against our observed reference counts when -n Darrick J. Wong
@ 2016-10-25 23:07 ` Darrick J. Wong
  2016-10-25 23:07 ` [PATCH 34/39] xfs_repair: complain about copy-on-write leftovers Darrick J. Wong
                   ` (5 subsequent siblings)
  38 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:07 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Rebuild the refcount btree with the reference count data we assembled
during phase 4.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase5.c |  316 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 315 insertions(+), 1 deletion(-)
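
The block accounting in init_refc_cursor() can be tried in isolation
with the sketch below.  It only mirrors the howmany() arithmetic from
the patch; SKETCH_MAXLEVELS, howmany_sz() and sketch_btree_geometry()
are hypothetical names, and leaf_mxr/node_mxr stand in for
mp->m_refc_mxr[0] and mp->m_refc_mxr[1].

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAXLEVELS	8

/* Ceiling division, like the howmany() macro. */
static size_t howmany_sz(size_t x, size_t y) { return (x + y - 1) / y; }

/*
 * Compute how many blocks each btree level needs for num_recs records.
 * blocks[0] is the leaf level.  Returns the number of levels and stores
 * the total block count in *total.
 */
static int
sketch_btree_geometry(size_t num_recs, size_t leaf_mxr, size_t node_mxr,
		      size_t blocks[SKETCH_MAXLEVELS], size_t *total)
{
	int level = 0;

	assert(leaf_mxr > 0 && node_mxr > 1);

	/* An empty tree still needs one (empty) root block. */
	blocks[0] = num_recs ? howmany_sz(num_recs, leaf_mxr) : 1;
	*total = blocks[0];

	/* Add node levels until a level fits in a single block. */
	while (blocks[level] > 1 && level + 1 < SKETCH_MAXLEVELS) {
		blocks[level + 1] = howmany_sz(blocks[level], node_mxr);
		*total += blocks[level + 1];
		level++;
	}
	return level + 1;
}
```

For 1000 records with 100 records per leaf and 4 pointers per node this
gives 10 leaf blocks, then 3 node blocks, then 1 root block: three
levels and 14 blocks in total.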


diff --git a/repair/phase5.c b/repair/phase5.c
index 7be1df8..3604d1d 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -1694,6 +1694,297 @@ _("Insufficient memory to construct reverse-map cursor."));
 	free_slab_cursor(&rmap_cur);
 }
 
+/* rebuild the refcount tree */
+
+/*
+ * we don't have to worry here about how chewing up free extents
+ * may perturb things because reflink tree building happens before
+ * freespace tree building.
+ */
+static void
+init_refc_cursor(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	struct bt_status	*btree_curs)
+{
+	size_t			num_recs;
+	int			level;
+	struct bt_stat_level	*lptr;
+	struct bt_stat_level	*p_lptr;
+	xfs_extlen_t		blocks_allocated;
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb)) {
+		memset(btree_curs, 0, sizeof(struct bt_status));
+		return;
+	}
+
+	lptr = &btree_curs->level[0];
+	btree_curs->init = 1;
+	btree_curs->owner = XFS_RMAP_OWN_REFC;
+
+	/*
+	 * build up statistics
+	 */
+	num_recs = refcount_record_count(mp, agno);
+	if (num_recs == 0) {
+		/*
+		 * easy corner-case -- no refcount records
+		 */
+		lptr->num_blocks = 1;
+		lptr->modulo = 0;
+		lptr->num_recs_pb = 0;
+		lptr->num_recs_tot = 0;
+
+		btree_curs->num_levels = 1;
+		btree_curs->num_tot_blocks = btree_curs->num_free_blocks = 1;
+
+		setup_cursor(mp, agno, btree_curs);
+
+		return;
+	}
+
+	blocks_allocated = lptr->num_blocks = howmany(num_recs,
+					mp->m_refc_mxr[0]);
+
+	lptr->modulo = num_recs % lptr->num_blocks;
+	lptr->num_recs_pb = num_recs / lptr->num_blocks;
+	lptr->num_recs_tot = num_recs;
+	level = 1;
+
+	if (lptr->num_blocks > 1)  {
+		for (; btree_curs->level[level-1].num_blocks > 1
+				&& level < XFS_BTREE_MAXLEVELS;
+				level++)  {
+			lptr = &btree_curs->level[level];
+			p_lptr = &btree_curs->level[level - 1];
+			lptr->num_blocks = howmany(p_lptr->num_blocks,
+					mp->m_refc_mxr[1]);
+			lptr->modulo = p_lptr->num_blocks % lptr->num_blocks;
+			lptr->num_recs_pb = p_lptr->num_blocks
+					/ lptr->num_blocks;
+			lptr->num_recs_tot = p_lptr->num_blocks;
+
+			blocks_allocated += lptr->num_blocks;
+		}
+	}
+	ASSERT(lptr->num_blocks == 1);
+	btree_curs->num_levels = level;
+
+	btree_curs->num_tot_blocks = btree_curs->num_free_blocks
+			= blocks_allocated;
+
+	setup_cursor(mp, agno, btree_curs);
+}
+
+static void
+prop_refc_cursor(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	struct bt_status	*btree_curs,
+	xfs_agblock_t		startbno,
+	int			level)
+{
+	struct xfs_btree_block	*bt_hdr;
+	struct xfs_refcount_key	*bt_key;
+	xfs_refcount_ptr_t	*bt_ptr;
+	xfs_agblock_t		agbno;
+	struct bt_stat_level	*lptr;
+
+	level++;
+
+	if (level >= btree_curs->num_levels)
+		return;
+
+	lptr = &btree_curs->level[level];
+	bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
+
+	if (be16_to_cpu(bt_hdr->bb_numrecs) == 0)  {
+		/*
+		 * this only happens once to initialize the
+		 * first path up the left side of the tree
+		 * where the agbno's are already set up
+		 */
+		prop_refc_cursor(mp, agno, btree_curs, startbno, level);
+	}
+
+	if (be16_to_cpu(bt_hdr->bb_numrecs) ==
+				lptr->num_recs_pb + (lptr->modulo > 0))  {
+		/*
+		 * write out current prev block, grab us a new block,
+		 * and set the rightsib pointer of current block
+		 */
+#ifdef XR_BLD_INO_TRACE
+		fprintf(stderr, " ino prop agbno %d ", lptr->prev_agbno);
+#endif
+		if (lptr->prev_agbno != NULLAGBLOCK)  {
+			ASSERT(lptr->prev_buf_p != NULL);
+			libxfs_writebuf(lptr->prev_buf_p, 0);
+		}
+		lptr->prev_agbno = lptr->agbno;
+		lptr->prev_buf_p = lptr->buf_p;
+		agbno = get_next_blockaddr(agno, level, btree_curs);
+
+		bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(agbno);
+
+		lptr->buf_p = libxfs_getbuf(mp->m_dev,
+					XFS_AGB_TO_DADDR(mp, agno, agbno),
+					XFS_FSB_TO_BB(mp, 1));
+		lptr->agbno = agbno;
+
+		if (lptr->modulo)
+			lptr->modulo--;
+
+		/*
+		 * initialize block header
+		 */
+		lptr->buf_p->b_ops = &xfs_refcountbt_buf_ops;
+		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
+		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
+		libxfs_btree_init_block(mp, lptr->buf_p, XFS_REFC_CRC_MAGIC,
+					level, 0, agno,
+					XFS_BTREE_CRC_BLOCKS);
+
+		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
+
+		/*
+		 * propagate extent record for first extent in new block up
+		 */
+		prop_refc_cursor(mp, agno, btree_curs, startbno, level);
+	}
+	/*
+	 * add refcount key and pointer to current block
+	 */
+	be16_add_cpu(&bt_hdr->bb_numrecs, 1);
+
+	bt_key = XFS_REFCOUNT_KEY_ADDR(bt_hdr,
+				    be16_to_cpu(bt_hdr->bb_numrecs));
+	bt_ptr = XFS_REFCOUNT_PTR_ADDR(bt_hdr,
+				    be16_to_cpu(bt_hdr->bb_numrecs),
+				    mp->m_refc_mxr[1]);
+
+	bt_key->rc_startblock = cpu_to_be32(startbno);
+	*bt_ptr = cpu_to_be32(btree_curs->level[level-1].agbno);
+}
+
+/*
+ * rebuilds a refcount btree given a cursor.
+ */
+static void
+build_refcount_tree(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	struct bt_status	*btree_curs)
+{
+	xfs_agnumber_t		i;
+	xfs_agblock_t		j;
+	xfs_agblock_t		agbno;
+	struct xfs_btree_block	*bt_hdr;
+	struct xfs_refcount_irec	*refc_rec;
+	struct xfs_slab_cursor	*refc_cur;
+	struct xfs_refcount_rec	*bt_rec;
+	struct bt_stat_level	*lptr;
+	int			level = btree_curs->num_levels;
+	int			error;
+
+	for (i = 0; i < level; i++)  {
+		lptr = &btree_curs->level[i];
+
+		agbno = get_next_blockaddr(agno, i, btree_curs);
+		lptr->buf_p = libxfs_getbuf(mp->m_dev,
+					XFS_AGB_TO_DADDR(mp, agno, agbno),
+					XFS_FSB_TO_BB(mp, 1));
+
+		if (i == btree_curs->num_levels - 1)
+			btree_curs->root = agbno;
+
+		lptr->agbno = agbno;
+		lptr->prev_agbno = NULLAGBLOCK;
+		lptr->prev_buf_p = NULL;
+		/*
+		 * initialize block header
+		 */
+
+		lptr->buf_p->b_ops = &xfs_refcountbt_buf_ops;
+		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
+		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
+		libxfs_btree_init_block(mp, lptr->buf_p, XFS_REFC_CRC_MAGIC,
+					i, 0, agno,
+					XFS_BTREE_CRC_BLOCKS);
+	}
+
+	/*
+	 * run along leaf, setting up records.  as we have to switch
+	 * blocks, call the prop_refc_cursor routine to set up the new
+	 * pointers for the parent.  that can recurse up to the root
+	 * if required.  set the sibling pointers for leaf level here.
+	 */
+	error = init_refcount_cursor(agno, &refc_cur);
+	if (error)
+		do_error(
+_("Insufficient memory to construct refcount cursor."));
+	refc_rec = pop_slab_cursor(refc_cur);
+	lptr = &btree_curs->level[0];
+
+	for (i = 0; i < lptr->num_blocks; i++)  {
+		/*
+		 * block initialization, lay in block header
+		 */
+		lptr->buf_p->b_ops = &xfs_refcountbt_buf_ops;
+		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
+		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
+		libxfs_btree_init_block(mp, lptr->buf_p, XFS_REFC_CRC_MAGIC,
+					0, 0, agno,
+					XFS_BTREE_CRC_BLOCKS);
+
+		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
+		bt_hdr->bb_numrecs = cpu_to_be16(lptr->num_recs_pb +
+							(lptr->modulo > 0));
+
+		if (lptr->modulo > 0)
+			lptr->modulo--;
+
+		if (lptr->num_recs_pb > 0)
+			prop_refc_cursor(mp, agno, btree_curs,
+					refc_rec->rc_startblock, 0);
+
+		bt_rec = (struct xfs_refcount_rec *)
+			  ((char *)bt_hdr + XFS_REFCOUNT_BLOCK_LEN);
+		for (j = 0; j < be16_to_cpu(bt_hdr->bb_numrecs); j++) {
+			ASSERT(refc_rec != NULL);
+			bt_rec[j].rc_startblock =
+					cpu_to_be32(refc_rec->rc_startblock);
+			bt_rec[j].rc_blockcount =
+					cpu_to_be32(refc_rec->rc_blockcount);
+			bt_rec[j].rc_refcount = cpu_to_be32(refc_rec->rc_refcount);
+
+			refc_rec = pop_slab_cursor(refc_cur);
+		}
+
+		if (refc_rec != NULL)  {
+			/*
+			 * get next leaf level block
+			 */
+			if (lptr->prev_buf_p != NULL)  {
+#ifdef XR_BLD_RL_TRACE
+				fprintf(stderr, "writing refcntbt agbno %u\n",
+					lptr->prev_agbno);
+#endif
+				ASSERT(lptr->prev_agbno != NULLAGBLOCK);
+				libxfs_writebuf(lptr->prev_buf_p, 0);
+			}
+			lptr->prev_buf_p = lptr->buf_p;
+			lptr->prev_agbno = lptr->agbno;
+			lptr->agbno = get_next_blockaddr(agno, 0, btree_curs);
+			bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(lptr->agbno);
+
+			lptr->buf_p = libxfs_getbuf(mp->m_dev,
+					XFS_AGB_TO_DADDR(mp, agno, lptr->agbno),
+					XFS_FSB_TO_BB(mp, 1));
+		}
+	}
+	free_slab_cursor(&refc_cur);
+}
+
 /*
  * build both the agf and the agfl for an agno given both
  * btree cursors.
@@ -1709,6 +2000,7 @@ build_agf_agfl(
 	xfs_extlen_t		freeblks,	/* # free blocks in tree */
 	int			lostblocks,	/* # blocks that will be lost */
 	struct bt_status	*rmap_bt,
+	struct bt_status	*refcnt_bt,
 	struct xfs_slab		*lost_fsb)
 {
 	struct extent_tree_node	*ext_ptr;
@@ -1754,6 +2046,10 @@ build_agf_agfl(
 	agf->agf_freeblks = cpu_to_be32(freeblks);
 	agf->agf_rmap_blocks = cpu_to_be32(rmap_bt->num_tot_blocks -
 			rmap_bt->num_free_blocks);
+	agf->agf_refcount_root = cpu_to_be32(refcnt_bt->root);
+	agf->agf_refcount_level = cpu_to_be32(refcnt_bt->num_levels);
+	agf->agf_refcount_blocks = cpu_to_be32(refcnt_bt->num_tot_blocks -
+			refcnt_bt->num_free_blocks);
 
 	/*
 	 * Count and record the number of btree blocks consumed if required.
@@ -1868,6 +2164,10 @@ _("Insufficient memory saving lost blocks.\n"));
 
 	ASSERT(be32_to_cpu(agf->agf_roots[XFS_BTNUM_BNOi]) !=
 		be32_to_cpu(agf->agf_roots[XFS_BTNUM_CNTi]));
+	ASSERT(be32_to_cpu(agf->agf_refcount_root) !=
+		be32_to_cpu(agf->agf_roots[XFS_BTNUM_BNOi]));
+	ASSERT(be32_to_cpu(agf->agf_refcount_root) !=
+		be32_to_cpu(agf->agf_roots[XFS_BTNUM_CNTi]));
 
 	libxfs_writebuf(agf_buf, 0);
 
@@ -1938,6 +2238,7 @@ phase5_func(
 	bt_status_t	ino_btree_curs;
 	bt_status_t	fino_btree_curs;
 	bt_status_t	rmap_btree_curs;
+	bt_status_t	refcnt_btree_curs;
 	int		extra_blocks = 0;
 	uint		num_freeblocks;
 	xfs_extlen_t	freeblks1;
@@ -2000,6 +2301,12 @@ phase5_func(
 		 */
 		init_rmapbt_cursor(mp, agno, &rmap_btree_curs);
 
+		/*
+		 * Set up the btree cursors for the on-disk refcount btrees,
+		 * which includes pre-allocating all required blocks.
+		 */
+		init_refc_cursor(mp, agno, &refcnt_btree_curs);
+
 		num_extents = count_bno_extents_blocks(agno, &num_freeblocks);
 		/*
 		 * lose two blocks per AG -- the space tree roots
@@ -2090,12 +2397,17 @@ phase5_func(
 					rmap_btree_curs.num_free_blocks) - 1;
 		}
 
+		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+			build_refcount_tree(mp, agno, &refcnt_btree_curs);
+			write_cursor(&refcnt_btree_curs);
+		}
+
 		/*
 		 * set up agf and agfl
 		 */
 		build_agf_agfl(mp, agno, &bno_btree_curs,
 				&bcnt_btree_curs, freeblks1, extra_blocks,
-				&rmap_btree_curs, lost_fsb);
+				&rmap_btree_curs, &refcnt_btree_curs, lost_fsb);
 		/*
 		 * build inode allocation tree.
 		 */
@@ -2126,6 +2438,8 @@ phase5_func(
 		finish_cursor(&ino_btree_curs);
 		if (xfs_sb_version_hasrmapbt(&mp->m_sb))
 			finish_cursor(&rmap_btree_curs);
+		if (xfs_sb_version_hasreflink(&mp->m_sb))
+			finish_cursor(&refcnt_btree_curs);
 		if (xfs_sb_version_hasfinobt(&mp->m_sb))
 			finish_cursor(&fino_btree_curs);
 		finish_cursor(&bcnt_btree_curs);



* [PATCH 34/39] xfs_repair: complain about copy-on-write leftovers
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (32 preceding siblings ...)
  2016-10-25 23:07 ` [PATCH 33/39] xfs_repair: rebuild the refcount btree Darrick J. Wong
@ 2016-10-25 23:07 ` Darrick J. Wong
  2016-10-25 23:07 ` [PATCH 35/39] xfs_repair: check the CoW extent size hint Darrick J. Wong
                   ` (4 subsequent siblings)
  38 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:07 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Complain about leftover CoW allocations that are hanging off the
refcount btree.  These are cleaned out at mount time, but we could be
louder about flagging evidence of trouble.

Since these extents aren't "owned" by anything, we'll free them up by
reconstructing the free space btrees.

v2: When we're processing rmap records, we inadvertently forgot to
handle the CoW owner, so the leftover CoW staging blocks got marked as
file data.  These blocks will just get freed later, so mark them
"CoW".  When we process the refcountbt, complain about leftovers if
the type is unknown or "CoW".
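
To make the record classification easier to follow, here is a simplified
standalone sketch of the rule the scan.c hunk applies.  It assumes that
XFS_REFC_COW_START is the high bit of a 32-bit AG block number (modelled
here as REFC_COW_START); struct refc_rec, enum refc_kind and classify_refc
are invented for this example, and the MAXREFCOUNT upper bound that the real
code also checks is left out.

```c
#include <stdint.h>

/* Assumed to mirror XFS_REFC_COW_START: the high bit of an AG block number. */
#define REFC_COW_START	((uint32_t)1U << 31)

enum refc_kind {
	REFC_SHARED,		/* ordinary shared extent, refcount >= 2 */
	REFC_COW_LEFTOVER,	/* well-formed CoW staging record */
	REFC_BAD,		/* startblock and refcount disagree */
};

/* Hypothetical in-memory refcount record, CPU byte order. */
struct refc_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint32_t	refcount;
};

/*
 * Classify one record and return the real AG block number in *agbno.
 * CoW staging extents are stored with refcount 1 and with their start
 * block offset by REFC_COW_START.
 */
static enum refc_kind classify_refc(const struct refc_rec *r, uint32_t *agbno)
{
	int	in_cow_range = r->startblock >= REFC_COW_START;

	if (r->refcount == 1 && in_cow_range) {
		*agbno = r->startblock - REFC_COW_START;
		return REFC_COW_LEFTOVER;
	}

	*agbno = r->startblock;
	if (in_cow_range || r->refcount < 2)
		return REFC_BAD;
	return REFC_SHARED;
}
```

Unlike this sketch, the patch only warns about the mismatched cases and
carries on checking the record, so treat REFC_BAD as shorthand for "would
have produced a warning".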

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/check.c      |   21 +++++++++++++++++----
 repair/incore.h |    3 ++-
 repair/scan.c   |   45 ++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 59 insertions(+), 10 deletions(-)


diff --git a/db/check.c b/db/check.c
index 5b90182..7392852 100644
--- a/db/check.c
+++ b/db/check.c
@@ -45,7 +45,7 @@ typedef enum {
 	DBM_LOG,	DBM_MISSING,	DBM_QUOTA,	DBM_RTBITMAP,
 	DBM_RTDATA,	DBM_RTFREE,	DBM_RTSUM,	DBM_SB,
 	DBM_SYMLINK,	DBM_BTFINO,	DBM_BTRMAP,	DBM_BTREFC,
-	DBM_RLDATA,
+	DBM_RLDATA,	DBM_COWDATA,
 	DBM_NDBM
 } dbm_t;
 
@@ -4731,9 +4731,22 @@ scanfunc_refcnt(
 		rp = XFS_REFCOUNT_REC_ADDR(block, 1);
 		lastblock = 0;
 		for (i = 0; i < be16_to_cpu(block->bb_numrecs); i++) {
-			set_dbmap(seqno, be32_to_cpu(rp[i].rc_startblock),
-				be32_to_cpu(rp[i].rc_blockcount), DBM_RLDATA,
-				seqno, bno);
+			if (be32_to_cpu(rp[i].rc_refcount) == 1) {
+				dbprintf(_(
+		"leftover CoW extent (%u/%u) len %u\n"),
+					seqno,
+					be32_to_cpu(rp[i].rc_startblock),
+					be32_to_cpu(rp[i].rc_blockcount));
+				set_dbmap(seqno,
+					be32_to_cpu(rp[i].rc_startblock),
+					be32_to_cpu(rp[i].rc_blockcount),
+					DBM_COWDATA, seqno, bno);
+			} else {
+				set_dbmap(seqno,
+					be32_to_cpu(rp[i].rc_startblock),
+					be32_to_cpu(rp[i].rc_blockcount),
+					DBM_RLDATA, seqno, bno);
+			}
 			if (be32_to_cpu(rp[i].rc_startblock) < lastblock) {
 				dbprintf(_(
 		"out-of-order refcnt btree record %d (%u %u) block %u/%u\n"),
diff --git a/repair/incore.h b/repair/incore.h
index bcd2f4b..c23a3a3 100644
--- a/repair/incore.h
+++ b/repair/incore.h
@@ -107,7 +107,8 @@ typedef struct rt_extent_tree_node  {
 #define XR_E_INO1	10	/* used by inodes (marked by rmap btree) */
 #define XR_E_FS_MAP1	11	/* used by fs space/inode maps (rmap btree) */
 #define XR_E_REFC	12	/* used by fs ag reference count btree */
-#define XR_E_BAD_STATE	13
+#define XR_E_COW	13	/* leftover cow extent */
+#define XR_E_BAD_STATE	14
 
 /* separate state bit, OR'ed into high (4th) bit of ex_state field */
 
diff --git a/repair/scan.c b/repair/scan.c
index 1c60784..800a88a 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -813,6 +813,9 @@ process_rmap_rec(
 		case XFS_RMAP_OWN_REFC:
 			set_bmap_ext(agno, b, blen, XR_E_REFC);
 			break;
+		case XFS_RMAP_OWN_COW:
+			set_bmap_ext(agno, b, blen, XR_E_COW);
+			break;
 		case XFS_RMAP_OWN_NULL:
 			/* still unknown */
 			break;
@@ -1288,16 +1291,27 @@ _("%s btree block claimed (state %d), agno %d, bno %d, suspect %d\n"),
 
 		rp = XFS_REFCOUNT_REC_ADDR(block, 1);
 		for (i = 0; i < numrecs; i++) {
-			xfs_agblock_t		b, end;
+			xfs_agblock_t		b, agb, end;
 			xfs_extlen_t		len;
 			xfs_nlink_t		nr;
 
-			b = be32_to_cpu(rp[i].rc_startblock);
+			b = agb = be32_to_cpu(rp[i].rc_startblock);
 			len = be32_to_cpu(rp[i].rc_blockcount);
 			nr = be32_to_cpu(rp[i].rc_refcount);
-			end = b + len;
+			if (b >= XFS_REFC_COW_START && nr != 1)
+				do_warn(
+_("leftover CoW extent has incorrect refcount in record %u of %s btree block %u/%u\n"),
+					i, name, agno, bno);
+			if (nr == 1) {
+				if (agb < XFS_REFC_COW_START)
+					do_warn(
+_("leftover CoW extent has invalid startblock in record %u of %s btree block %u/%u\n"),
+						i, name, agno, bno);
+				agb -= XFS_REFC_COW_START;
+			}
+			end = agb + len;
 
-			if (!verify_agbno(mp, agno, b)) {
+			if (!verify_agbno(mp, agno, agb)) {
 				do_warn(
 	_("invalid start block %u in record %u of %s btree block %u/%u\n"),
 					b, i, name, agno, bno);
@@ -1310,7 +1324,28 @@ _("%s btree block claimed (state %d), agno %d, bno %d, suspect %d\n"),
 				continue;
 			}
 
-			if (nr < 2 || nr > MAXREFCOUNT) {
+			if (nr == 1) {
+				xfs_agblock_t	c;
+				xfs_extlen_t	cnr;
+
+				for (c = agb; c < end; c += cnr) {
+					state = get_bmap_ext(agno, c, end, &cnr);
+					switch (state) {
+					case XR_E_UNKNOWN:
+					case XR_E_COW:
+						do_warn(
+_("leftover CoW extent (%u/%u) len %u\n"),
+						agno, c, cnr);
+						set_bmap_ext(agno, c, cnr, XR_E_FREE);
+						break;
+					default:
+						do_warn(
+_("extent (%u/%u) len %u claimed, state is %d\n"),
+						agno, c, cnr, state);
+						break;
+					}
+				}
+			} else if (nr < 2 || nr > MAXREFCOUNT) {
 				do_warn(
 	_("invalid reference count %u in record %u of %s btree block %u/%u\n"),
 					nr, i, name, agno, bno);



* [PATCH 35/39] xfs_repair: check the CoW extent size hint
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (33 preceding siblings ...)
  2016-10-25 23:07 ` [PATCH 34/39] xfs_repair: complain about copy-on-write leftovers Darrick J. Wong
@ 2016-10-25 23:07 ` Darrick J. Wong
  2016-10-25 23:07 ` [PATCH 36/39] xfs_repair: use range query when checking rmaps Darrick J. Wong
                   ` (3 subsequent siblings)
  38 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:07 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/dinode.c |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)


diff --git a/repair/dinode.c b/repair/dinode.c
index 64fc983..11b60ce 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -2531,6 +2531,38 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
 			goto clear_bad_out;
 		}
 
+		if ((flags2 & XFS_DIFLAG2_COWEXTSIZE) &&
+		    !xfs_sb_version_hasreflink(&mp->m_sb)) {
+			if (!uncertain) {
+				do_warn(
+	_("inode %" PRIu64 " has CoW extent size hint but file system does not support reflink\n"),
+					lino);
+			}
+			flags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
+		}
+
+		if (flags2 & XFS_DIFLAG2_COWEXTSIZE) {
+			/* must be a directory or file */
+			if (di_mode && !S_ISDIR(di_mode) && !S_ISREG(di_mode)) {
+				if (!uncertain) {
+					do_warn(
+	_("CoW extent size flag set on non-file, non-directory inode %" PRIu64 "\n" ),
+						lino);
+				}
+				flags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
+			}
+		}
+
+		if ((flags2 & XFS_DIFLAG2_COWEXTSIZE) &&
+		    (flags & (XFS_DIFLAG_REALTIME | XFS_DIFLAG_RTINHERIT))) {
+			if (!uncertain) {
+				do_warn(
+	_("Cannot have CoW extent size hint on a realtime inode %" PRIu64 "\n"),
+					lino);
+			}
+			flags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
+		}
+
 		if (!verify_mode && flags2 != be64_to_cpu(dino->di_flags2)) {
 			if (!no_modify) {
 				do_warn(_("fixing bad flags2.\n"));
@@ -2624,6 +2656,29 @@ _("bad non-zero extent size %u for non-realtime/extsize inode %" PRIu64 ", "),
 	}
 
 	/*
+	 * Only regular files and directories with the COWEXTSIZE flag
+	 * set can have a nonzero cowextsize.
+	 */
+	if (dino->di_version >= 3 &&
+	    be32_to_cpu(dino->di_cowextsize) != 0) {
+		if ((type == XR_INO_DIR || type == XR_INO_DATA) &&
+		    (be64_to_cpu(dino->di_flags2) &
+					XFS_DIFLAG2_COWEXTSIZE)) {
+			/* s'okay */ ;
+		} else {
+			do_warn(
+_("Cannot have non-zero CoW extent size %u on non-cowextsize inode %" PRIu64 ", "),
+					be32_to_cpu(dino->di_cowextsize), lino);
+			if (!no_modify)  {
+				do_warn(_("resetting to zero\n"));
+				dino->di_cowextsize = 0;
+				*dirty = 1;
+			} else
+				do_warn(_("would reset to zero\n"));
+		}
+	}
+
+	/*
 	 * general size/consistency checks:
 	 */
 	if (process_check_inode_sizes(mp, dino, lino, type) != 0)



* [PATCH 36/39] xfs_repair: use range query when checking rmaps
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (34 preceding siblings ...)
  2016-10-25 23:07 ` [PATCH 35/39] xfs_repair: check the CoW extent size hint Darrick J. Wong
@ 2016-10-25 23:07 ` Darrick J. Wong
  2016-10-25 23:07 ` [PATCH 37/39] xfs_repair: check for mergeable refcount records Darrick J. Wong
                   ` (2 subsequent siblings)
  38 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:07 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

For shared extents, we ought to use a range query on the rmapbt to
find the corresponding rmap.  However, most of the time the observed
rmap will be an exact match for the rmapbt rmap, in which case we
could have used the (much faster) regular lookup.  Therefore, try the
regular lookup first and resort to the range lookup if that doesn't
get us what we want.  This can cut the run time of the rmap check of
xfs_repair in half.

Theoretically, the only reason an observed rmap wouldn't exactly match
an rmapbt rmap is that we modified some file on account of a metadata
error.
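
The "cheap lookup first, expensive lookup as a fallback" pattern can be
sketched outside of libxfs roughly as follows.  This is only an illustration
over a sorted array: struct rmap, covers, lookup_exact, lookup_range and
find_rmap are invented names, the binary search stands in for the regular
btree lookup, and the linear scan stands in for
libxfs_rmap_lookup_le_range().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified reverse-mapping record. */
struct rmap {
	uint32_t	start;
	uint32_t	len;
	uint64_t	owner;
};

/* Does rec belong to the same owner and fully cover obs? */
static int covers(const struct rmap *rec, const struct rmap *obs)
{
	return rec->owner == obs->owner &&
	       rec->start <= obs->start &&
	       (uint64_t)rec->start + rec->len >=
	       (uint64_t)obs->start + obs->len;
}

/* Fast path: binary search for a record starting exactly at obs->start. */
static const struct rmap *lookup_exact(const struct rmap *recs, size_t n,
				       const struct rmap *obs)
{
	size_t	lo = 0, hi = n;

	while (lo < hi) {
		size_t	mid = lo + (hi - lo) / 2;

		if (recs[mid].start < obs->start)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo < n && recs[lo].start == obs->start)
		return &recs[lo];
	return NULL;
}

/* Slow path: scan every record for one that covers obs. */
static const struct rmap *lookup_range(const struct rmap *recs, size_t n,
				       const struct rmap *obs)
{
	size_t	i;

	for (i = 0; i < n; i++)
		if (covers(&recs[i], obs))
			return &recs[i];
	return NULL;
}

/* Try the fast path; fall back only if it finds nothing usable. */
static const struct rmap *find_rmap(const struct rmap *recs, size_t n,
				    const struct rmap *obs)
{
	const struct rmap	*r = lookup_exact(recs, n, obs);

	if (r && covers(r, obs))
		return r;
	return lookup_range(recs, n, obs);
}
```

The array must be sorted by start block for the fast path to work.  The
fallback stays correct even when several owners share the same start block,
which is the shared-extent case that motivates the range query.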

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/libxfs_api_defs.h |    1 +
 repair/rmap.c            |   26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 8c15b75..e95763d 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -141,5 +141,6 @@
 #define xfs_refcountbt_init_cursor	libxfs_refcountbt_init_cursor
 #define xfs_refcount_lookup_le		libxfs_refcount_lookup_le
 #define xfs_refcount_get_rec		libxfs_refcount_get_rec
+#define xfs_rmap_lookup_le_range	libxfs_rmap_lookup_le_range
 
 #endif /* __LIBXFS_API_DEFS_H__ */
diff --git a/repair/rmap.c b/repair/rmap.c
index 849b788..db5a30f 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -916,6 +916,20 @@ rmap_lookup(
 	return -libxfs_rmap_get_rec(bt_cur, tmp, have);
 }
 
+/* Look for an rmap in the rmapbt that matches a given rmap. */
+static int
+rmap_lookup_overlapped(
+	struct xfs_btree_cur	*bt_cur,
+	struct xfs_rmap_irec	*rm_rec,
+	struct xfs_rmap_irec	*tmp,
+	int			*have)
+{
+	/* Have to use our fancy version for overlapped */
+	return -libxfs_rmap_lookup_le_range(bt_cur, rm_rec->rm_startblock,
+				rm_rec->rm_owner, rm_rec->rm_offset,
+				rm_rec->rm_flags, tmp, have);
+}
+
 /* Does the btree rmap cover the observed rmap? */
 #define NEXTP(x)	((x)->rm_startblock + (x)->rm_blockcount)
 #define NEXTL(x)	((x)->rm_offset + (x)->rm_blockcount)
@@ -1004,6 +1018,18 @@ rmaps_verify_btree(
 		error = rmap_lookup(bt_cur, rm_rec, &tmp, &have);
 		if (error)
 			goto err;
+		/*
+		 * Using the range query is expensive, so only do it if
+		 * the regular lookup doesn't find anything or if it doesn't
+		 * match the observed rmap.
+		 */
+		if (xfs_sb_version_hasreflink(&bt_cur->bc_mp->m_sb) &&
+				(!have || !rmap_is_good(rm_rec, &tmp))) {
+			error = rmap_lookup_overlapped(bt_cur, rm_rec,
+					&tmp, &have);
+			if (error)
+				goto err;
+		}
 		if (!have) {
 			do_warn(
 _("Missing reverse-mapping record for (%u/%u) %slen %u owner %"PRId64" \



* [PATCH 37/39] xfs_repair: check for mergeable refcount records
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (35 preceding siblings ...)
  2016-10-25 23:07 ` [PATCH 36/39] xfs_repair: use range query when checking rmaps Darrick J. Wong
@ 2016-10-25 23:07 ` Darrick J. Wong
  2016-10-25 23:07 ` [PATCH 38/39] xfs_repair: use thread pools to sort rmap data Darrick J. Wong
  2016-10-25 23:07 ` [PATCH 39/39] mkfs.xfs: format reflink enabled filesystems Darrick J. Wong
  38 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:07 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Make sure there aren't adjacent refcount records that could be merged;
this is a sign that the refcount tree algorithms aren't working
correctly.
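
The mergeability test itself is small.  A standalone sketch over an array of
records sorted by start block might look like the following; struct refc_rec
and count_mergeable are names made up for this example, while the comparison
mirrors the one added to scan_refcbt() below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory refcount record, CPU byte order. */
struct refc_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint32_t	refcount;
};

/*
 * Count the records that are physically adjacent to their predecessor
 * and carry the same refcount, i.e. records that should have been
 * merged.  recs[] must be sorted by startblock.
 */
static size_t count_mergeable(const struct refc_rec *recs, size_t n)
{
	struct refc_rec	last = { 0, 0, 0 };
	size_t		i, mergeable = 0;

	for (i = 0; i < n; i++) {
		if (last.startblock + last.blockcount == recs[i].startblock &&
		    last.refcount == recs[i].refcount) {
			mergeable++;
			/* Extend the run so a third adjacent record is caught. */
			last.blockcount += recs[i].blockcount;
		} else {
			last = recs[i];
		}
	}
	return mergeable;
}
```

Growing last.blockcount on a hit is what lets a chain of three or more
adjacent records be reported once per extra record instead of only once.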

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/scan.c |   32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)


diff --git a/repair/scan.c b/repair/scan.c
index 800a88a..0e13581 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -1215,6 +1215,12 @@ _("%s rmap btree block claimed (state %d), agno %d, bno %d, suspect %d\n"),
 		rmap_avoid_check();
 }
 
+struct refc_priv {
+	struct xfs_refcount_irec	last_rec;
+	xfs_agblock_t			nr_blocks;
+};
+
+
 static void
 scan_refcbt(
 	struct xfs_btree_block	*block,
@@ -1234,6 +1240,7 @@ scan_refcbt(
 	int			numrecs;
 	int			state;
 	xfs_agblock_t		lastblock = 0;
+	struct refc_priv	*refc_priv = priv;
 
 	if (magic != XFS_REFC_CRC_MAGIC) {
 		name = "(unknown)";
@@ -1258,6 +1265,8 @@ scan_refcbt(
 			goto out;
 	}
 
+	refc_priv->nr_blocks++;
+
 	/* check for btree blocks multiply claimed */
 	state = get_bmap(agno, bno);
 	if (!(state == XR_E_UNKNOWN || state == XR_E_REFC))  {
@@ -1360,6 +1369,20 @@ _("extent (%u/%u) len %u claimed, state is %d\n"),
 				lastblock = b;
 			}
 
+			/* Is this record mergeable with the last one? */
+			if (refc_priv->last_rec.rc_startblock +
+			    refc_priv->last_rec.rc_blockcount == b &&
+			    refc_priv->last_rec.rc_refcount == nr) {
+				do_warn(
+	_("record %d in block (%u/%u) of %s tree should be merged with previous record\n"),
+					i, agno, bno, name);
+				refc_priv->last_rec.rc_blockcount += len;
+			} else {
+				refc_priv->last_rec.rc_startblock = b;
+				refc_priv->last_rec.rc_blockcount = len;
+				refc_priv->last_rec.rc_refcount = nr;
+			}
+
 			/* XXX: probably want to mark the reflinked areas? */
 		}
 		goto out;
@@ -2203,10 +2226,17 @@ validate_agf(
 	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
 		bno = be32_to_cpu(agf->agf_refcount_root);
 		if (bno != 0 && verify_agbno(mp, agno, bno)) {
+			struct refc_priv	priv;
+
+			memset(&priv, 0, sizeof(priv));
 			scan_sbtree(bno,
 				    be32_to_cpu(agf->agf_refcount_level),
 				    agno, 0, scan_refcbt, 1, XFS_REFC_CRC_MAGIC,
-				    agcnts, &xfs_refcountbt_buf_ops);
+				    &priv, &xfs_refcountbt_buf_ops);
+			if (be32_to_cpu(agf->agf_refcount_blocks) != priv.nr_blocks)
+				do_warn(_("bad refcountbt block count %u, saw %u\n"),
+					priv.nr_blocks,
+					be32_to_cpu(agf->agf_refcount_blocks));
 		} else  {
 			do_warn(_("bad agbno %u for refcntbt root, agno %d\n"),
 				bno, agno);



* [PATCH 38/39] xfs_repair: use thread pools to sort rmap data
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (36 preceding siblings ...)
  2016-10-25 23:07 ` [PATCH 37/39] xfs_repair: check for mergeable refcount records Darrick J. Wong
@ 2016-10-25 23:07 ` Darrick J. Wong
  2016-10-25 23:07 ` [PATCH 39/39] mkfs.xfs: format reflink enabled filesystems Darrick J. Wong
  38 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:07 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Since each slab is a collection of independent mini-slabs, we can
fire up a bunch of threads to sort the mini-slabs in parallel.
This speeds up the sorting phase of the rmapbt rebuilding if we
have a large number of mini-slabs.
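
For readers without the xfs_repair work queue code at hand, the idea can be
sketched with plain POSIX threads.  This is not what the patch does
internally: it starts one thread per chunk instead of a pool sized by
libxfs_nproc(), and struct chunk, sort_chunk, sort_chunks and cmp_int are
all invented for this example.  The serial shortcut for four or fewer chunks
follows the threshold used in the patch.

```c
#include <pthread.h>
#include <stdlib.h>

/* One independently sortable region, standing in for a mini-slab. */
struct chunk {
	void	*base;
	size_t	nmemb;
	size_t	size;
	int	(*cmp)(const void *, const void *);
};

/* Example comparison function for arrays of int. */
static int cmp_int(const void *a, const void *b)
{
	int	x = *(const int *)a;
	int	y = *(const int *)b;

	return (x > y) - (x < y);
}

/* Thread body: sort a single chunk in place. */
static void *sort_chunk(void *arg)
{
	struct chunk	*c = arg;

	qsort(c->base, c->nmemb, c->size, c->cmp);
	return NULL;
}

/* Sort every chunk; returns 0 on success, -1 if memory ran out. */
static int sort_chunks(struct chunk *chunks, size_t nr)
{
	pthread_t	*tids;
	size_t		i, started;

	/* With only a few chunks the thread overhead is not worth it. */
	if (nr <= 4) {
		for (i = 0; i < nr; i++)
			sort_chunk(&chunks[i]);
		return 0;
	}

	tids = malloc(nr * sizeof(*tids));
	if (!tids)
		return -1;

	for (i = 0; i < nr; i++)
		if (pthread_create(&tids[i], NULL, sort_chunk, &chunks[i]))
			break;
	started = i;

	/* Anything we could not hand to a thread is sorted right here. */
	for (; i < nr; i++)
		sort_chunk(&chunks[i]);

	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	free(tids);
	return 0;
}
```

Because the chunks never overlap, no locking is needed; the only
synchronization point is the join at the end, which plays the role of
destroy_work_queue() in the patch.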

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/slab.c |   46 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 44 insertions(+), 2 deletions(-)


diff --git a/repair/slab.c b/repair/slab.c
index 97c13d3..8609270 100644
--- a/repair/slab.c
+++ b/repair/slab.c
@@ -201,6 +201,27 @@ slab_add(
 	return 0;
 }
 
+#include "threads.h"
+
+struct qsort_slab {
+	struct xfs_slab		*slab;
+	struct xfs_slab_hdr	*hdr;
+	int			(*compare_fn)(const void *, const void *);
+};
+
+static void
+qsort_slab_helper(
+	struct work_queue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	struct qsort_slab	*qs = arg;
+
+	qsort(slab_ptr(qs->slab, qs->hdr, 0), qs->hdr->sh_inuse,
+			qs->slab->s_item_sz, qs->compare_fn);
+	free(qs);
+}
+
 /*
  * Sort the items in the slab.  Do not run this method if there are any
  * cursors holding on to the slab.
@@ -210,14 +231,35 @@ qsort_slab(
 	struct xfs_slab		*slab,
 	int (*compare_fn)(const void *, const void *))
 {
+	struct work_queue	wq;
 	struct xfs_slab_hdr	*hdr;
+	struct qsort_slab	*qs;
+
+	/*
+	 * If we don't have that many slabs, we're probably better
+	 * off skipping all the thread overhead.
+	 */
+	if (slab->s_nr_slabs <= 4) {
+		hdr = slab->s_first;
+		while (hdr) {
+			qsort(slab_ptr(slab, hdr, 0), hdr->sh_inuse,
+					slab->s_item_sz, compare_fn);
+			hdr = hdr->sh_next;
+		}
+		return;
+	}
 
+	create_work_queue(&wq, NULL, libxfs_nproc());
 	hdr = slab->s_first;
 	while (hdr) {
-		qsort(slab_ptr(slab, hdr, 0), hdr->sh_inuse, slab->s_item_sz,
-		      compare_fn);
+		qs = malloc(sizeof(struct qsort_slab));
+		qs->slab = slab;
+		qs->hdr = hdr;
+		qs->compare_fn = compare_fn;
+		queue_work(&wq, qsort_slab_helper, 0, qs);
 		hdr = hdr->sh_next;
 	}
+	destroy_work_queue(&wq);
 }
 
 /*



* [PATCH 39/39] mkfs.xfs: format reflink enabled filesystems
  2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
                   ` (37 preceding siblings ...)
  2016-10-25 23:07 ` [PATCH 38/39] xfs_repair: use thread pools to sort rmap data Darrick J. Wong
@ 2016-10-25 23:07 ` Darrick J. Wong
  38 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-25 23:07 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: linux-xfs

Create the refcount btree at mkfs time and set the feature flag.

v2: Turn on the reflink feature when calculating the minimum log size.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_multidisk.h  |    3 +-
 libxfs/libxfs_api_defs.h |    1 +
 man/man8/mkfs.xfs.8      |   28 ++++++++++++++++++
 mkfs/maxtrres.c          |    5 +++
 mkfs/xfs_mkfs.c          |   70 ++++++++++++++++++++++++++++++++++++++++++----
 5 files changed, 99 insertions(+), 8 deletions(-)


diff --git a/include/xfs_multidisk.h b/include/xfs_multidisk.h
index 8dc3027..ce9bbce 100644
--- a/include/xfs_multidisk.h
+++ b/include/xfs_multidisk.h
@@ -68,6 +68,7 @@ extern void res_failed (int err);
 /* maxtrres.c */
 extern int max_trans_res(unsigned long agsize, int crcs_enabled, int dirversion,
 		int sectorlog, int blocklog, int inodelog, int dirblocklog,
-		int logversion, int log_sunit, int finobt, int rmapbt);
+		int logversion, int log_sunit, int finobt, int rmapbt,
+		int reflink);
 
 #endif	/* __XFS_MULTIDISK_H__ */
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index e95763d..d60b6c2 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -142,5 +142,6 @@
 #define xfs_refcount_lookup_le		libxfs_refcount_lookup_le
 #define xfs_refcount_get_rec		libxfs_refcount_get_rec
 #define xfs_rmap_lookup_le_range	libxfs_rmap_lookup_le_range
+#define xfs_refc_block			libxfs_refc_block
 
 #endif /* __LIBXFS_API_DEFS_H__ */
diff --git a/man/man8/mkfs.xfs.8 b/man/man8/mkfs.xfs.8
index c44b3bd..b2bc223 100644
--- a/man/man8/mkfs.xfs.8
+++ b/man/man8/mkfs.xfs.8
@@ -213,6 +213,34 @@ for filesystems created with the (default)
 option set. When the option
 .B \-m crc=0
 is used, the reverse mapping btree feature is not supported and is disabled.
+.TP
+.BI reflink= value
+This option enables the use of a separate reference count btree index in each
+allocation group. The value is either 0 to disable the feature, or 1 to create
+a reference count btree in each allocation group.
+.IP
+The reference count btree enables the sharing of physical extents between
+the data forks of different files, which is commonly known as "reflink".
+Unlike traditional Unix filesystems, which assume that every inode and
+logical block pair maps to a unique physical block, a reflink-capable
+XFS filesystem removes the uniqueness requirement, allowing up to four
+billion arbitrary inode/logical block pairs to map to a physical block.
+If a program tries to write to a multiply-referenced block in a file, the write
+will be redirected to a new block, and that file's logical-to-physical
+mapping will be changed to the new block ("copy on write").  This feature
+enables the creation of per-file snapshots and deduplication.  It is only
+available for the data forks of regular files.
+.IP
+By default,
+.B mkfs.xfs
+will not create reference count btrees and therefore will not enable the
+reflink feature.  This feature is only available for filesystems created with
+the (default)
+.B \-m crc=1
+option set. When the option
+.B \-m crc=0
+is used, the reference count btree feature is not supported and reflink is
+disabled.
 .RE
 .TP
 .BI \-d " data_section_options"
diff --git a/mkfs/maxtrres.c b/mkfs/maxtrres.c
index d7978b6..fba7818 100644
--- a/mkfs/maxtrres.c
+++ b/mkfs/maxtrres.c
@@ -39,7 +39,8 @@ max_trans_res(
 	int		logversion,
 	int		log_sunit,
 	int		finobt,
-	int		rmapbt)
+	int		rmapbt,
+	int		reflink)
 {
 	xfs_sb_t	*sbp;
 	xfs_mount_t	mount;
@@ -75,6 +76,8 @@ max_trans_res(
 		sbp->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_FINOBT;
 	if (rmapbt)
 		sbp->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_RMAPBT;
+	if (reflink)
+		sbp->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_REFLINK;
 
 	libxfs_mount(&mount, sbp, 0,0,0,0);
 	maxfsb = libxfs_log_calc_minimum_size(&mount);
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 580119e..fc565c0 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -672,6 +672,8 @@ struct opt_params mopts = {
 		"uuid",
 #define M_RMAPBT	3
 		"rmapbt",
+#define M_REFLINK	4
+		"reflink",
 		NULL
 	},
 	.subopt_params = {
@@ -697,6 +699,12 @@ struct opt_params mopts = {
 		  .maxval = 1,
 		  .defaultval = 0,
 		},
+		{ .index = M_REFLINK,
+		  .conflicts = { LAST_CONFLICT },
+		  .minval = 0,
+		  .maxval = 1,
+		  .defaultval = 0,
+		},
 	},
 };
 
@@ -1155,6 +1163,7 @@ struct sb_feat_args {
 	bool	dirftype;
 	bool	parent_pointers;
 	bool	rmapbt;
+	bool	reflink;
 };
 
 static void
@@ -1227,6 +1236,8 @@ sb_set_features(
 		sbp->sb_features_ro_compat = XFS_SB_FEAT_RO_COMPAT_FINOBT;
 	if (fp->rmapbt)
 		sbp->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_RMAPBT;
+	if (fp->reflink)
+		sbp->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_REFLINK;
 
 	/*
 	 * Sparse inode chunk support has two main inode alignment requirements.
@@ -1488,6 +1499,7 @@ main(
 		.dirftype = true,
 		.parent_pointers = false,
 		.rmapbt = false,
+		.reflink = false,
 	};
 
 	platform_uuid_generate(&uuid);
@@ -1776,6 +1788,10 @@ main(
 					sb_feat.rmapbt = getnum(
 						value, &mopts, M_RMAPBT);
 					break;
+				case M_REFLINK:
+					sb_feat.reflink = getnum(
+						value, &mopts, M_REFLINK);
+					break;
 				default:
 					unknown('m', value);
 				}
@@ -2115,6 +2131,13 @@ _("rmapbt not supported without CRC support\n"));
 			usage();
 		}
 		sb_feat.rmapbt = false;
+
+		if (sb_feat.reflink) {
+			fprintf(stderr,
+_("reflink not supported without CRC support\n"));
+			usage();
+		}
+		sb_feat.reflink = false;
 	}
 
 
@@ -2599,7 +2622,7 @@ an AG size that is one stripe unit smaller, for example %llu.\n"),
 				   sb_feat.crcs_enabled, sb_feat.dir_version,
 				   sectorlog, blocklog, inodelog, dirblocklog,
 				   sb_feat.log_version, lsunit, sb_feat.finobt,
-				   sb_feat.rmapbt);
+				   sb_feat.rmapbt, sb_feat.reflink);
 	ASSERT(min_logblocks);
 	min_logblocks = MAX(XFS_MIN_LOG_BLOCKS, min_logblocks);
 	if (!logsize && dblocks >= (1024*1024*1024) >> blocklog)
@@ -2734,7 +2757,7 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 		printf(_(
 		   "meta-data=%-22s isize=%-6d agcount=%lld, agsize=%lld blks\n"
 		   "         =%-22s sectsz=%-5u attr=%u, projid32bit=%u\n"
-		   "         =%-22s crc=%-8u finobt=%u, sparse=%u, rmapbt=%u\n"
+		   "         =%-22s crc=%-8u finobt=%u, sparse=%u, rmapbt=%u, reflink=%u\n"
 		   "data     =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n"
 		   "         =%-22s sunit=%-6u swidth=%u blks\n"
 		   "naming   =version %-14u bsize=%-6u ascii-ci=%d ftype=%d\n"
@@ -2745,7 +2768,7 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 			"", sectorsize, sb_feat.attr_version,
 				    !sb_feat.projid16bit,
 			"", sb_feat.crcs_enabled, sb_feat.finobt, sb_feat.spinodes,
-			sb_feat.rmapbt,
+			sb_feat.rmapbt, sb_feat.reflink,
 			"", blocksize, (long long)dblocks, imaxpct,
 			"", dsunit, dswidth,
 			sb_feat.dir_version, dirblocksize, sb_feat.nci,
@@ -2933,7 +2956,12 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 			agf->agf_levels[XFS_BTNUM_RMAPi] = cpu_to_be32(1);
 			agf->agf_rmap_blocks = cpu_to_be32(1);
 		}
-
+		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+			agf->agf_refcount_root = cpu_to_be32(
+					libxfs_refc_block(mp));
+			agf->agf_refcount_level = cpu_to_be32(1);
+			agf->agf_refcount_blocks = cpu_to_be32(1);
+		}
 		agf->agf_flfirst = 0;
 		agf->agf_fllast = cpu_to_be32(XFS_AGFL_SIZE(mp) - 1);
 		agf->agf_flcount = 0;
@@ -3102,6 +3130,24 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 		libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
 
 		/*
+		 * refcount btree root block
+		 */
+		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+			buf = libxfs_getbuf(mp->m_ddev_targp,
+					XFS_AGB_TO_DADDR(mp, agno,
+						libxfs_refc_block(mp)),
+					bsize);
+			buf->b_ops = &xfs_refcountbt_buf_ops;
+
+			block = XFS_BUF_TO_BLOCK(buf);
+			memset(block, 0, blocksize);
+			libxfs_btree_init_block(mp, buf, XFS_REFC_CRC_MAGIC, 0,
+						0, agno, XFS_BTREE_CRC_BLOCKS);
+
+			libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
+		}
+
+		/*
 		 * INO btree root block
 		 */
 		buf = libxfs_getbuf(mp->m_ddev_targp,
@@ -3189,9 +3235,21 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 			rrec->rm_offset = 0;
 			be16_add_cpu(&block->bb_numrecs, 1);
 
+			/* account for refcount btree root */
+			if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+				rrec = XFS_RMAP_REC_ADDR(block, 5);
+				rrec->rm_startblock = cpu_to_be32(
+							libxfs_refc_block(mp));
+				rrec->rm_blockcount = cpu_to_be32(1);
+				rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_REFC);
+				rrec->rm_offset = 0;
+				be16_add_cpu(&block->bb_numrecs, 1);
+			}
+
 			/* account for the log space */
 			if (loginternal && agno == logagno) {
-				rrec = XFS_RMAP_REC_ADDR(block, 5);
+				rrec = XFS_RMAP_REC_ADDR(block,
+					be16_to_cpu(block->bb_numrecs) + 1);
 				rrec->rm_startblock = cpu_to_be32(
 						XFS_FSB_TO_AGBNO(mp, logstart));
 				rrec->rm_blockcount = cpu_to_be32(logblocks);
@@ -3446,7 +3504,7 @@ usage( void )
 {
 	fprintf(stderr, _("Usage: %s\n\
 /* blocksize */		[-b log=n|size=num]\n\
-/* metadata */		[-m crc=0|1,finobt=0|1,uuid=xxx,rmapbt=0|1]\n\
+/* metadata */		[-m crc=0|1,finobt=0|1,uuid=xxx,rmapbt=0|1,reflink=0|1]\n\
 /* data subvol */	[-d agcount=n,agsize=n,file,name=xxx,size=num,\n\
 			    (sunit=value,swidth=value|su=num,sw=num|noalign),\n\
 			    sectlog=n|sectsize=num\n\


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* Re: [PATCH 10/39] xfs_db: add support for checking the refcount btree
  2016-10-25 23:04 ` [PATCH 10/39] xfs_db: add support for checking the refcount btree Darrick J. Wong
@ 2016-10-26  0:49   ` Dave Chinner
  2016-10-26  1:13     ` Darrick J. Wong
  0 siblings, 1 reply; 85+ messages in thread
From: Dave Chinner @ 2016-10-26  0:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Oct 25, 2016 at 04:04:39PM -0700, Darrick J. Wong wrote:
> Do some basic checks of the refcount btree.  xfs_repair will have to
> check that the reference counts match the various bmbt mappings.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
.....
> @@ -1561,10 +1597,15 @@ check_set_dbmap(
>  			agbno, agbno + len - 1, c_agno, c_agbno);
>  		return;
>  	}
> -	check_dbmap(agno, agbno, len, type1);
> +	check_dbmap(agno, agbno, len, type1, is_reflink(type2));
>  	mayprint = verbose | blist_size;
>  	for (i = 0, p = &dbmap[agno][agbno]; i < len; i++, p++) {
> -		*p = (char)type2;
> +		if (*p == DBM_RLDATA && type2 == DBM_DATA)
> +			;	/* do nothing */
> +		if (*p == DBM_DATA && type2 == DBM_DATA)
> +			*p = (char)DBM_RLDATA;
> +		else
> +			*p = (char)type2;

What is this /* do nothing */ case for?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 13/39] xfs_db: print one array element per line
  2016-10-25 23:04 ` [PATCH 13/39] xfs_db: print one array element per line Darrick J. Wong
@ 2016-10-26  0:51   ` Dave Chinner
  2016-10-26  1:13     ` Darrick J. Wong
  0 siblings, 1 reply; 85+ messages in thread
From: Dave Chinner @ 2016-10-26  0:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Oct 25, 2016 at 04:04:58PM -0700, Darrick J. Wong wrote:
> Print one array element per line so that the debugger output isn't
> a gigantic pile of screen snow.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

What commands does this affect?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 16/39] libxfs: add configure option to override system header fsxattr
  2016-10-25 23:05 ` [PATCH 16/39] libxfs: add configure option to override system header fsxattr Darrick J. Wong
@ 2016-10-26  0:56   ` Dave Chinner
  2016-10-26  1:16     ` Darrick J. Wong
  2016-10-26 10:32   ` Christoph Hellwig
  1 sibling, 1 reply; 85+ messages in thread
From: Dave Chinner @ 2016-10-26  0:56 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Oct 25, 2016 at 04:05:17PM -0700, Darrick J. Wong wrote:
> By default, libxfs will use the kernel/system headers to define struct
> fsxattr.  Unfortunately, this creates a problem for developers who are
> writing new features but building xfsprogs on a stable system, because
> the stable kernel's headers don't reflect the new feature.  In this
> case, we want to be able to use the internal fsxattr definition while
> the kernel headers catch up, so provide some configure magic to allow
> developers to force the use of the internal definition.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
....
>  #include <stdio.h>
>  #include <asm/types.h>
>  #include <mntent.h>
> +#ifdef OVERRIDE_SYSTEM_FSXATTR
> +# define fsxattr sys_fsxattr
> +#endif
>  #include <linux/fs.h> /* fsxattr defintion for new kernels */
> +#ifdef OVERRIDE_SYSTEM_FSXATTR
> +# undef fsxattr
> +#endif

messy, but I can't think of a cleaner way of doing this.
>  
>  static __inline__ int xfsctl(const char *path, int fd, int cmd, void *p)
>  {
> @@ -175,7 +181,7 @@ static inline void platform_mntent_close(struct mntent_cursor * cursor)
>   * are a copy of the definitions moved to linux/uapi/fs.h in the 4.5 kernel,
>   * so this is purely for supporting builds against old kernel headers.
>   */
> -#ifndef FS_IOC_FSGETXATTR
> +#if !defined FS_IOC_FSGETXATTR || defined OVERRIDE_SYSTEM_FSXATTR
>  struct fsxattr {
>  	__u32		fsx_xflags;	/* xflags field value (get/set) */
>  	__u32		fsx_extsize;	/* extsize field value (get/set)*/
> @@ -184,7 +190,9 @@ struct fsxattr {
>  	__u32		fsx_cowextsize;	/* cow extsize field value (get/set) */
>  	unsigned char	fsx_pad[8];
>  };
> +#endif
>  
> +#ifndef FS_IOC_FSGETXATTR
>  /*
>   * Flags for the fsx_xflags field
>   */

Hmmm - what happens if all we are doing is introducing new flags?
Doesn't the override need to cover them as well?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 17/39] xfs_io: get and set the CoW extent size hint
  2016-10-25 23:05 ` [PATCH 17/39] xfs_io: get and set the CoW extent size hint Darrick J. Wong
@ 2016-10-26  1:06   ` Dave Chinner
  0 siblings, 0 replies; 85+ messages in thread
From: Dave Chinner @ 2016-10-26  1:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Oct 25, 2016 at 04:05:23PM -0700, Darrick J. Wong wrote:
> Enable administrators to get or set the CoW extent size hint.
> Report the hint when we run stat.  This also requires some
> autoconf magic to detect whether or not fsx_cowextsize exists.
....
> +++ b/include/builddefs.in
> @@ -109,6 +109,7 @@ HAVE_MNTENT = @have_mntent@
>  HAVE_FLS = @have_fls@
>  HAVE_FSETXATTR = @have_fsetxattr@
>  HAVE_MREMAP = @have_mremap@
> +HAVE_FSXATTR_COWEXTSIZE = @have_fsxattr_cowextsize@
>  ENABLE_INTERNAL_FSXATTR = @enable_internal_fsxattr@

We really need a comment here that states that
ENABLE_INTERNAL_FSXATTR=yes implies HAVE_FSXATTR_COWEXTSIZE=yes
because of the autoconf magic we do....

> diff --git a/m4/package_libcdev.m4 b/m4/package_libcdev.m4
> index 7a847e9..45954c2 100644
> --- a/m4/package_libcdev.m4
> +++ b/m4/package_libcdev.m4
> @@ -265,3 +265,29 @@ AC_DEFUN([AC_HAVE_MREMAP],
>         )
>      AC_SUBST(have_mremap)
>    ])
> +
> +#
> +# Check if we have a struct fsxattr with a fsx_cowextsize field.
> +# If linux/fs.h has a struct with that field, then we're ok.
> +# If we can't find fsxattr in linux/fs.h at all, the internal
> +# definitions provide it, and we're ok.
> +#
> +# The only way we won't have this is if the kernel headers don't
> +# have the field.
> +#
> +AC_DEFUN([AC_HAVE_FSXATTR_COWEXTSIZE],
> +  [ AM_CONDITIONAL([INTERNAL_FSXATTR], [test "x$enable_internal_fsxattr" = xyes])
> +    AM_COND_IF([INTERNAL_FSXATTR],
> +    [have_fsxattr_cowextsize=yes],

Ok, so here we set the value. I think this needs a better/more
generic comment because we're going to have to repeat this
pattern in future. i.e. explain the basic construct, then as a
separate statement say "apply it to detection of the fsx_cowextsize
field".

> +    [ AC_CHECK_TYPE(struct fsxattr,
> +	  [AC_CHECK_MEMBER(struct fsxattr.fsx_cowextsize,
> +		  have_fsxattr_cowextsize=yes,
> +		  have_fsxattr_cowextsize=no,
> +		  [#include <linux/fs.h>]
> +          )],
> +	  have_fsxattr_cowextsize=yes,
> +	  [#include <linux/fs.h>]
> +      )
> +    ])
> +    AC_SUBST(have_fsxattr_cowextsize)
> +  ])

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 10/39] xfs_db: add support for checking the refcount btree
  2016-10-26  0:49   ` Dave Chinner
@ 2016-10-26  1:13     ` Darrick J. Wong
  2016-10-26  3:26       ` Dave Chinner
  0 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-26  1:13 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Oct 26, 2016 at 11:49:20AM +1100, Dave Chinner wrote:
> On Tue, Oct 25, 2016 at 04:04:39PM -0700, Darrick J. Wong wrote:
> > Do some basic checks of the refcount btree.  xfs_repair will have to
> > check that the reference counts match the various bmbt mappings.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> .....
> > @@ -1561,10 +1597,15 @@ check_set_dbmap(
> >  			agbno, agbno + len - 1, c_agno, c_agbno);
> >  		return;
> >  	}
> > -	check_dbmap(agno, agbno, len, type1);
> > +	check_dbmap(agno, agbno, len, type1, is_reflink(type2));
> >  	mayprint = verbose | blist_size;
> >  	for (i = 0, p = &dbmap[agno][agbno]; i < len; i++, p++) {
> > -		*p = (char)type2;
> > +		if (*p == DBM_RLDATA && type2 == DBM_DATA)
> > +			;	/* do nothing */
> > +		if (*p == DBM_DATA && type2 == DBM_DATA)
> > +			*p = (char)DBM_RLDATA;
> > +		else
> > +			*p = (char)type2;
> 
> What is this /* do nothing */ case for?

It handles the case where a data block that already has multiple owners
encounters another owner.  The second test in the block handles the case
where a data block with a single owner encounters a second owner.

(Assuming you're prodding me to add a comment, I'll go do that.)

--D

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 13/39] xfs_db: print one array element per line
  2016-10-26  0:51   ` Dave Chinner
@ 2016-10-26  1:13     ` Darrick J. Wong
  2016-10-26  3:23       ` Dave Chinner
  0 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-26  1:13 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Oct 26, 2016 at 11:51:38AM +1100, Dave Chinner wrote:
> On Tue, Oct 25, 2016 at 04:04:58PM -0700, Darrick J. Wong wrote:
> > Print one array element per line so that the debugger output isn't
> > a gigantic pile of screen snow.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> What commands does this affect?

The 'print' command.

--D

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 16/39] libxfs: add configure option to override system header fsxattr
  2016-10-26  0:56   ` Dave Chinner
@ 2016-10-26  1:16     ` Darrick J. Wong
  0 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-26  1:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Oct 26, 2016 at 11:56:07AM +1100, Dave Chinner wrote:
> On Tue, Oct 25, 2016 at 04:05:17PM -0700, Darrick J. Wong wrote:
> > By default, libxfs will use the kernel/system headers to define struct
> > fsxattr.  Unfortunately, this creates a problem for developers who are
> > writing new features but building xfsprogs on a stable system, because
> > the stable kernel's headers don't reflect the new feature.  In this
> > case, we want to be able to use the internal fsxattr definition while
> > the kernel headers catch up, so provide some configure magic to allow
> > developers to force the use of the internal definition.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ....
> >  #include <stdio.h>
> >  #include <asm/types.h>
> >  #include <mntent.h>
> > +#ifdef OVERRIDE_SYSTEM_FSXATTR
> > +# define fsxattr sys_fsxattr
> > +#endif
> >  #include <linux/fs.h> /* fsxattr defintion for new kernels */
> > +#ifdef OVERRIDE_SYSTEM_FSXATTR
> > +# undef fsxattr
> > +#endif
> 
> messy, but I can't think of a cleaner way of doing this.
> >  
> >  static __inline__ int xfsctl(const char *path, int fd, int cmd, void *p)
> >  {
> > @@ -175,7 +181,7 @@ static inline void platform_mntent_close(struct mntent_cursor * cursor)
> >   * are a copy of the definitions moved to linux/uapi/fs.h in the 4.5 kernel,
> >   * so this is purely for supporting builds against old kernel headers.
> >   */
> > -#ifndef FS_IOC_FSGETXATTR
> > +#if !defined FS_IOC_FSGETXATTR || defined OVERRIDE_SYSTEM_FSXATTR
> >  struct fsxattr {
> >  	__u32		fsx_xflags;	/* xflags field value (get/set) */
> >  	__u32		fsx_extsize;	/* extsize field value (get/set)*/
> > @@ -184,7 +190,9 @@ struct fsxattr {
> >  	__u32		fsx_cowextsize;	/* cow extsize field value (get/set) */
> >  	unsigned char	fsx_pad[8];
> >  };
> > +#endif
> >  
> > +#ifndef FS_IOC_FSGETXATTR
> >  /*
> >   * Flags for the fsx_xflags field
> >   */
> 
> Hmmm - what happens if all we are doing is introducing new flags?
> Doesn't the override need to cover them as well?

As in the next patch, my intent is for the include/ header files to
define any flag that the system headers don't provide.  IOW,
OVERRIDE_SYSTEM_FSXATTR only helps us pick up changes to the struct
fsxattr definition; a new flag gets a guarded define like this:

#ifndef FS_XFLAG_MOOCOW
# define FS_XFLAG_MOOCOW   0xBEEF
#endif

(Inelegant, but oh well.)
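
For anyone following along, here is a rough standalone sketch of how the
two pieces fit together.  It is not the patch itself: the first struct is
only a stand-in for what an older <linux/fs.h> would declare (its layout
is an assumption), check_layout() is a hypothetical helper, and
FS_XFLAG_MOOCOW is the joke flag from above.  The rename macro pushes the
"system" definition out of the way so the internal one can use the real
name, and the flag is defined only if nothing else supplied it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for an older <linux/fs.h>.  In a real build this struct would
 * come from the system header; the rename macro is active while it is
 * declared, so it ends up as struct sys_fsxattr.
 */
#define fsxattr sys_fsxattr
struct fsxattr {
	uint32_t	fsx_xflags;
	uint32_t	fsx_extsize;
	uint32_t	fsx_nextents;
	uint32_t	fsx_projid;
	unsigned char	fsx_pad[12];	/* assumed: no cowextsize field yet */
};
#undef fsxattr

/* Internal definition, now free to use the real name. */
struct fsxattr {
	uint32_t	fsx_xflags;
	uint32_t	fsx_extsize;
	uint32_t	fsx_nextents;
	uint32_t	fsx_projid;
	uint32_t	fsx_cowextsize;	/* carved out of the padding */
	unsigned char	fsx_pad[8];
};

/* Flags are handled separately: define only what the headers lack. */
#ifndef FS_XFLAG_MOOCOW
# define FS_XFLAG_MOOCOW	0xBEEF
#endif

/* Hypothetical helper: both layouts must stay the same size for the ioctl. */
void
check_layout(void)
{
	assert(sizeof(struct sys_fsxattr) == sizeof(struct fsxattr));
}
```

Because the new field only reuses padding, the two structs have the same
size in this sketch, which is what lets the internal definition stand in
for the system one.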

--D

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 13/39] xfs_db: print one array element per line
  2016-10-26  1:13     ` Darrick J. Wong
@ 2016-10-26  3:23       ` Dave Chinner
  2016-10-26  3:34         ` Darrick J. Wong
  0 siblings, 1 reply; 85+ messages in thread
From: Dave Chinner @ 2016-10-26  3:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Oct 25, 2016 at 06:13:43PM -0700, Darrick J. Wong wrote:
> On Wed, Oct 26, 2016 at 11:51:38AM +1100, Dave Chinner wrote:
> > On Tue, Oct 25, 2016 at 04:04:58PM -0700, Darrick J. Wong wrote:
> > > Print one array element per line so that the debugger output isn't
> > > a gigantic pile of screen snow.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > What commands does this affect?
> 
> The 'print' command.

Ok, let me be more specific - what is an example of the change of
behaviour? printing bmbt records in a block, or something else?
Can you post a before/after example?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 10/39] xfs_db: add support for checking the refcount btree
  2016-10-26  1:13     ` Darrick J. Wong
@ 2016-10-26  3:26       ` Dave Chinner
  2016-10-26  6:29         ` Darrick J. Wong
  0 siblings, 1 reply; 85+ messages in thread
From: Dave Chinner @ 2016-10-26  3:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Oct 25, 2016 at 06:13:09PM -0700, Darrick J. Wong wrote:
> On Wed, Oct 26, 2016 at 11:49:20AM +1100, Dave Chinner wrote:
> > On Tue, Oct 25, 2016 at 04:04:39PM -0700, Darrick J. Wong wrote:
> > > Do some basic checks of the refcount btree.  xfs_repair will have to
> > > check that the reference counts match the various bmbt mappings.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > .....
> > > @@ -1561,10 +1597,15 @@ check_set_dbmap(
> > >  			agbno, agbno + len - 1, c_agno, c_agbno);
> > >  		return;
> > >  	}
> > > -	check_dbmap(agno, agbno, len, type1);
> > > +	check_dbmap(agno, agbno, len, type1, is_reflink(type2));
> > >  	mayprint = verbose | blist_size;
> > >  	for (i = 0, p = &dbmap[agno][agbno]; i < len; i++, p++) {
> > > -		*p = (char)type2;
> > > +		if (*p == DBM_RLDATA && type2 == DBM_DATA)
> > > +			;	/* do nothing */
> > > +		if (*p == DBM_DATA && type2 == DBM_DATA)
> > > +			*p = (char)DBM_RLDATA;
> > > +		else
> > > +			*p = (char)type2;
> > 
> > What is this /* do nothing */ case for?
> 
> Handles the case that a data block with multiple owners encounters
> another owner.  The second case in the block handles the case that
> a data block with a single owner encounters a second owner.
> 
> (Assuming you're prodding me to add a comment, I'll go do that.)

Ah, no, I'm asking why you added dead code:

	if (foo)
		;
	if (bar)
		....
	else
		....

the "if (foo) ;" case is dead code - it doesn't need to exist.
Did you mean this:

	if (foo)
		;
	else if (bar)
		....
	else
		....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 13/39] xfs_db: print one array element per line
  2016-10-26  3:23       ` Dave Chinner
@ 2016-10-26  3:34         ` Darrick J. Wong
  2016-10-26  5:48           ` Dave Chinner
  0 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-26  3:34 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Oct 26, 2016 at 02:23:08PM +1100, Dave Chinner wrote:
> On Tue, Oct 25, 2016 at 06:13:43PM -0700, Darrick J. Wong wrote:
> > On Wed, Oct 26, 2016 at 11:51:38AM +1100, Dave Chinner wrote:
> > > On Tue, Oct 25, 2016 at 04:04:58PM -0700, Darrick J. Wong wrote:
> > > > Print one array element per line so that the debugger output isn't
> > > > a gigantic pile of screen snow.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > What commands does this affect?
> > 
> > The 'print' command.
> 
> Ok, let me be more specific - what is an example of the change of
> behaviour? printing bmbt records in a block, or something else?
> Can you post a before/after example?

Oh, sorry.  The patch changes the print command such that arrays of
records are printed with one record per line instead of one enormously
long line.

Before (inobt):

xfs_db> p recs
recs[1-55] = [startino,holemask,count,freecount,free] 
1:[128,0,64,0,0] 2:[4288,0xff,32,0,0xffffffff] 3:[4352,0,64,0,0] 4:[4416,0,64,10,0x1f0003e000000000] 5:[4480,0,64,17,0xc00e1803c2007840] 6:[4544,0,64,18,0x21e00c3801870070] 7:[4608,0,64,16,0x403e0007c010f00] 8:[4672,0,64,28,0xe1f07c3e8078200f] 9:[4736,0,64,18,0x838030700e0c01c1] 10:[4800,0,64,33,0xe1f07c3e0f87c10f] 11:[4864,0,64,33,0xc3e0f87c1f0f83e1] 12:[4928,0,64,33,0xf87c1f0f83e1f07] 13:[4992,0,64,32,0x7c3e0f87c1f0f83] 14:[5056,0,64,38,0xf87c1f0ff0f83e1f] 15:[5120,0,64,32,0x83e1f07c3e0f87c1] 16:[5184,0,64,32,0xc1f0f83e1f07c3e0] 17:[5248,0,64,35,0x7c1f0f837c3e0f87] 18:[5312,0,64,33,0xe0f87c1f0f83e1f0] 19:[5376,0,64,33,0x87c1f0f83e1f07c3] 20:[5440,0,64,34,0x1f0f83e1f07c3e0f] 

After:

xfs_db> p recs
recs[1-55] = [startino,holemask,count,freecount,free] 
1:[128,0,64,0,0] 
2:[4288,0xff,32,0,0xffffffff] 
3:[4352,0,64,0,0] 
4:[4416,0,64,10,0x1f0003e000000000] 
5:[4480,0,64,17,0xc00e1803c2007840] 
6:[4544,0,64,18,0x21e00c3801870070] 
7:[4608,0,64,16,0x403e0007c010f00] 
8:[4672,0,64,28,0xe1f07c3e8078200f] 
9:[4736,0,64,18,0x838030700e0c01c1] 
10:[4800,0,64,33,0xe1f07c3e0f87c10f] 
11:[4864,0,64,33,0xc3e0f87c1f0f83e1] 
12:[4928,0,64,33,0xf87c1f0f83e1f07] 
13:[4992,0,64,32,0x7c3e0f87c1f0f83] 
14:[5056,0,64,38,0xf87c1f0ff0f83e1f] 
15:[5120,0,64,32,0x83e1f07c3e0f87c1] 
16:[5184,0,64,32,0xc1f0f83e1f07c3e0] 
17:[5248,0,64,35,0x7c1f0f837c3e0f87] 
18:[5312,0,64,33,0xe0f87c1f0f83e1f0] 
19:[5376,0,64,33,0x87c1f0f83e1f07c3] 
20:[5440,0,64,34,0x1f0f83e1f07c3e0f] 

Somewhat less eyeball-bleeding, hopefully. :)

--D

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 13/39] xfs_db: print one array element per line
  2016-10-26  3:34         ` Darrick J. Wong
@ 2016-10-26  5:48           ` Dave Chinner
  0 siblings, 0 replies; 85+ messages in thread
From: Dave Chinner @ 2016-10-26  5:48 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Oct 25, 2016 at 08:34:13PM -0700, Darrick J. Wong wrote:
> On Wed, Oct 26, 2016 at 02:23:08PM +1100, Dave Chinner wrote:
> > On Tue, Oct 25, 2016 at 06:13:43PM -0700, Darrick J. Wong wrote:
> > > On Wed, Oct 26, 2016 at 11:51:38AM +1100, Dave Chinner wrote:
> > > > On Tue, Oct 25, 2016 at 04:04:58PM -0700, Darrick J. Wong wrote:
> > > > > Print one array element per line so that the debugger output isn't
> > > > > a gigantic pile of screen snow.
> > > > > 
> > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > What commands does this affect?
> > > 
> > > The 'print' command.
> > 
> > Ok, let me be more specific - what is an example of the change of
> > behaviour? printing bmbt records in a block, or something else?
> > Can you post a before/after example?
> 
> Oh, sorry.  The patch changes the print command such that arrays of
> records are printed with one record per line instead of one enormously
> long line.
> 
> Before (inobt):
> 
> xfs_db> p recs
> recs[1-55] = [startino,holemask,count,freecount,free] 
> 1:[128,0,64,0,0] 2:[4288,0xff,32,0,0xffffffff] 3:[4352,0,64,0,0] 4:[4416,0,64,10,0x1f0003e000000000] 5:[4480,0,64,17,0xc00e1803c2007840] 6:[4544,0,64,18,0x21e00c3801870070] 7:[4608,0,64,16,0x403e0007c010f00] 8:[4672,0,64,28,0xe1f07c3e8078200f] 9:[4736,0,64,18,0x838030700e0c01c1] 10:[4800,0,64,33,0xe1f07c3e0f87c10f] 11:[4864,0,64,33,0xc3e0f87c1f0f83e1] 12:[4928,0,64,33,0xf87c1f0f83e1f07] 13:[4992,0,64,32,0x7c3e0f87c1f0f83] 14:[5056,0,64,38,0xf87c1f0ff0f83e1f] 15:[5120,0,64,32,0x83e1f07c3e0f87c1] 16:[5184,0,64,32,0xc1f0f83e1f07c3e0] 17:[5248,0,64,35,0x7c1f0f837c3e0f87] 18:[5312,0,64,33,0xe0f87c1f0f83e1f0] 19:[5376,0,64,33,0x87c1f0f83e1f07c3] 20:[5440,0,64,34,0x1f0f83e1f07c3e0f] 
> 
> After:
> 
> xfs_db> p recs
> recs[1-55] = [startino,holemask,count,freecount,free] 
> 1:[128,0,64,0,0] 
> 2:[4288,0xff,32,0,0xffffffff] 
> 3:[4352,0,64,0,0] 
> 4:[4416,0,64,10,0x1f0003e000000000] 
> 5:[4480,0,64,17,0xc00e1803c2007840] 
> 6:[4544,0,64,18,0x21e00c3801870070] 
> 7:[4608,0,64,16,0x403e0007c010f00] 
> 8:[4672,0,64,28,0xe1f07c3e8078200f] 
> 9:[4736,0,64,18,0x838030700e0c01c1] 
> 10:[4800,0,64,33,0xe1f07c3e0f87c10f] 
> 11:[4864,0,64,33,0xc3e0f87c1f0f83e1] 
> 12:[4928,0,64,33,0xf87c1f0f83e1f07] 

Ok, that's what I suspected it did. Some benefits, some drawbacks
(e.g. if you don't have a scroll buffer). I can live with it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 10/39] xfs_db: add support for checking the refcount btree
  2016-10-26  3:26       ` Dave Chinner
@ 2016-10-26  6:29         ` Darrick J. Wong
  0 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-26  6:29 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Oct 26, 2016 at 02:26:38PM +1100, Dave Chinner wrote:
> On Tue, Oct 25, 2016 at 06:13:09PM -0700, Darrick J. Wong wrote:
> > On Wed, Oct 26, 2016 at 11:49:20AM +1100, Dave Chinner wrote:
> > > On Tue, Oct 25, 2016 at 04:04:39PM -0700, Darrick J. Wong wrote:
> > > > Do some basic checks of the refcount btree.  xfs_repair will have to
> > > > check that the reference counts match the various bmbt mappings.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > .....
> > > > @@ -1561,10 +1597,15 @@ check_set_dbmap(
> > > >  			agbno, agbno + len - 1, c_agno, c_agbno);
> > > >  		return;
> > > >  	}
> > > > -	check_dbmap(agno, agbno, len, type1);
> > > > +	check_dbmap(agno, agbno, len, type1, is_reflink(type2));
> > > >  	mayprint = verbose | blist_size;
> > > >  	for (i = 0, p = &dbmap[agno][agbno]; i < len; i++, p++) {
> > > > -		*p = (char)type2;
> > > > +		if (*p == DBM_RLDATA && type2 == DBM_DATA)
> > > > +			;	/* do nothing */
> > > > +		if (*p == DBM_DATA && type2 == DBM_DATA)
> > > > +			*p = (char)DBM_RLDATA;
> > > > +		else
> > > > +			*p = (char)type2;
> > > 
> > > What is this /* do nothing */ case for?
> > 
> > Handles the case that a data block with multiple owners encounters
> > another owner.  The second case in the block handles the case that
> > a data block with a single owner encounters a second owner.
> > 
> > (Assuming you're prodding me to add a comment, I'll go do that.)
> 
> Ah, no, I'm asking why you added dead code:
> 
> 	if (foo)
> 		;
> 	if (bar)
> 		....
> 	else
> 		....
> 
> the "if (foo) ;" case is dead code - it doesn't need to exist.
> Did you mean this:
> 
> 	if (foo)
> 		;
> 	else if (bar)
> 		....
> 	else
> 		....

Yes, there should be an 'else' before the second if test.
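
To spell out the corrected control flow, here is a minimal standalone
sketch.  The enum values and the mark_block() helper are hypothetical
stand-ins, not the xfs_db definitions; only the if/else-if chain mirrors
the hunk quoted above.  Without the 'else', a block already marked
DBM_RLDATA falls through to the final else and is reset to DBM_DATA.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the xfs_db block types; values are made up. */
enum dbm_type {
	DBM_UNKNOWN = 0,
	DBM_DATA,	/* data block with a single owner */
	DBM_RLDATA,	/* data block with multiple owners */
	DBM_INODE,
};

/* Record that the block tracked at *p has been claimed as type2. */
void
mark_block(char *p, int type2)
{
	assert(p != NULL);

	if (*p == DBM_RLDATA && type2 == DBM_DATA)
		;	/* already shared; one more owner changes nothing */
	else if (*p == DBM_DATA && type2 == DBM_DATA)
		*p = (char)DBM_RLDATA;	/* second owner: now shared */
	else
		*p = (char)type2;	/* first claim, or a non-data type */
}
```

Calling mark_block() repeatedly with DBM_DATA on the same slot moves it
from unclaimed to DBM_DATA to DBM_RLDATA, and it then stays at DBM_RLDATA.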

--D

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 01/39] xfs: convert RUI log formats to use variable length arrays
  2016-10-25 23:03 ` [PATCH 01/39] xfs: convert RUI log formats to use variable length arrays Darrick J. Wong
@ 2016-10-26 10:21   ` Christoph Hellwig
  0 siblings, 0 replies; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

On Tue, Oct 25, 2016 at 04:03:44PM -0700, Darrick J. Wong wrote:
> Use variable length array declarations for RUI log items,
> and replace the open coded sizeof formulae with a single function.
> 
> [Fix up the logprint code to reflect the new RUI format.]

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 02/39] xfs_buflock: handling parsing errors more gracefully
  2016-10-25 23:03 ` [PATCH 02/39] xfs_buflock: handling parsing errors more gracefully Darrick J. Wong
@ 2016-10-26 10:22   ` Christoph Hellwig
  0 siblings, 0 replies; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

On Tue, Oct 25, 2016 at 04:03:50PM -0700, Darrick J. Wong wrote:
> Skip ftrace output lines that don't parse.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 03/39] xfs: define the on-disk refcount btree format
  2016-10-25 23:03 ` [PATCH 03/39] xfs: define the on-disk refcount btree format Darrick J. Wong
@ 2016-10-26 10:23   ` Christoph Hellwig
  0 siblings, 0 replies; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs, Christoph Hellwig

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

> [hch: allocate the cursor with KM_NOFS to quiet lockdep]
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I don't remember touching any of this, though :)

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 04/39] xfs: add refcount btree operations
  2016-10-25 23:04 ` [PATCH 04/39] xfs: add refcount btree operations Darrick J. Wong
@ 2016-10-26 10:23   ` Christoph Hellwig
  0 siblings, 0 replies; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs, Christoph Hellwig

On Tue, Oct 25, 2016 at 04:04:02PM -0700, Darrick J. Wong wrote:
> Implement the generic btree operations required to manipulate refcount
> btree blocks.  The implementation is similar to the bmapbt, though it
> will only allocate and free blocks from the AG.
> 
> [Add the xfs_refcount.h file to the standard include list.]
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> [hch: fix logging of AGF refcount btree fields]
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Both the commit log and signoff look odd for adding a single include
directive.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 05/39] xfs: connect refcount adjust functions to upper layers
  2016-10-25 23:04 ` [PATCH 05/39] xfs: connect refcount adjust functions to upper layers Darrick J. Wong
@ 2016-10-26 10:24   ` Christoph Hellwig
  2016-10-26 18:06     ` Darrick J. Wong
  0 siblings, 1 reply; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

On Tue, Oct 25, 2016 at 04:04:08PM -0700, Darrick J. Wong wrote:
> Plumb in the upper level interface to schedule and finish deferred
> refcount operations via the deferred ops mechanism.
> 
> [Plumb in refcount deferred op log items.]
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Shouldn't we try to share xfs_trans_refcount.c instead?

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 07/39] xfs: introduce the CoW fork
  2016-10-25 23:04 ` [PATCH 07/39] xfs: introduce the CoW fork Darrick J. Wong
@ 2016-10-26 10:25   ` Christoph Hellwig
  2016-10-26 17:59     ` Darrick J. Wong
  0 siblings, 1 reply; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

On Tue, Oct 25, 2016 at 04:04:21PM -0700, Darrick J. Wong wrote:
> Introduce a new in-core fork for storing copy-on-write delalloc
> reservations and allocated extents that are in the process of being
> written out.
> 
> [Clean up the CoW fork, should there ever be one.]
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

What's up with all these odd commit messages and tiny not really
standalone patches?

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 09/39] xfs_db: dump refcount btree data
  2016-10-25 23:04 ` [PATCH 09/39] xfs_db: dump refcount btree data Darrick J. Wong
@ 2016-10-26 10:28   ` Christoph Hellwig
  2016-10-26 17:52     ` Darrick J. Wong
  0 siblings, 1 reply; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

> @@ -47,7 +47,7 @@ const field_t	agf_flds[] = {
>  	{ "versionnum", FLDT_UINT32D, OI(OFF(versionnum)), C1, 0, TYP_NONE },
>  	{ "seqno", FLDT_AGNUMBER, OI(OFF(seqno)), C1, 0, TYP_NONE },
>  	{ "length", FLDT_AGBLOCK, OI(OFF(length)), C1, 0, TYP_NONE },
> -	{ "roots", FLDT_AGBLOCK, OI(OFF(roots)), CI(XFS_BTNUM_AGF),
> +	{ "roots", FLDT_AGBLOCK, OI(OFF(roots)), CI(XFS_BTNUM_AGF) + 1,
>  	  FLD_ARRAY|FLD_SKIPALL, TYP_NONE },

Please replace XFS_BTNUM_AGF with a proper XFS_NUM_AG_BTREES or
similar define.  Without that this line and the ones below are black
magic.

Otherwise this looks fine to me.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 12/39] xfs_db: deal with the CoW extent size hint
  2016-10-25 23:04 ` [PATCH 12/39] xfs_db: deal with the CoW extent size hint Darrick J. Wong
@ 2016-10-26 10:28   ` Christoph Hellwig
  0 siblings, 0 replies; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 11/39] xfs_db: metadump should copy the refcount btree too
  2016-10-25 23:04 ` [PATCH 11/39] xfs_db: metadump should copy the refcount btree too Darrick J. Wong
@ 2016-10-26 10:29   ` Christoph Hellwig
  2016-10-26 16:33     ` Darrick J. Wong
  0 siblings, 1 reply; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:29 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

Do we have a testcase for metadump on a reflink fs, btw?

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 14/39] xfs_growfs: report the presence of the reflink feature
  2016-10-25 23:05 ` [PATCH 14/39] xfs_growfs: report the presence of the reflink feature Darrick J. Wong
@ 2016-10-26 10:31   ` Christoph Hellwig
  0 siblings, 0 replies; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 16/39] libxfs: add configure option to override system header fsxattr
  2016-10-25 23:05 ` [PATCH 16/39] libxfs: add configure option to override system header fsxattr Darrick J. Wong
  2016-10-26  0:56   ` Dave Chinner
@ 2016-10-26 10:32   ` Christoph Hellwig
  2016-10-26 19:04     ` Darrick J. Wong
  1 sibling, 1 reply; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

On Tue, Oct 25, 2016 at 04:05:17PM -0700, Darrick J. Wong wrote:
> By default, libxfs will use the kernel/system headers to define struct
> fsxattr.  Unfortunately, this creates a problem for developers who are
> writing new features but building xfsprogs on a stable system, because
> the stable kernel's headers don't reflect the new feature.  In this
> case, we want to be able to use the internal fsxattr definition while
> the kernel headers catch up, so provide some configure magic to allow
> developers to force the use of the internal definition.

We should simply always use our definition either unconditionally or
based on checking the system one.  It's definitely not something that
should require user interaction.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 18/39] xfs_io: add refcount+bmap error injection types
  2016-10-25 23:05 ` [PATCH 18/39] xfs_io: add refcount+bmap error injection types Darrick J. Wong
@ 2016-10-26 10:33   ` Christoph Hellwig
  0 siblings, 0 replies; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 19/39] xfs_io: support injecting the 'per-AG reservation critically low' error
  2016-10-25 23:05 ` [PATCH 19/39] xfs_io: support injecting the 'per-AG reservation critically low' error Darrick J. Wong
@ 2016-10-26 10:33   ` Christoph Hellwig
  0 siblings, 0 replies; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 20/39] xfs_io: provide long-format help for falloc
  2016-10-25 23:05 ` [PATCH 20/39] xfs_io: provide long-format help for falloc Darrick J. Wong
@ 2016-10-26 10:34   ` Christoph Hellwig
  2016-10-26 16:37     ` Darrick J. Wong
  0 siblings, 1 reply; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

On Tue, Oct 25, 2016 at 04:05:43PM -0700, Darrick J. Wong wrote:
> Provide long-format help for falloc so that xfstests can use
> _require_xfs_io_command to check for falloc command line args.

As it turns out checking those args isn't actually suitable.

But documentation is always good to have, so:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 21/39] xfs_io: try to unshare copy-on-write blocks via fallocate
  2016-10-25 23:05 ` [PATCH 21/39] xfs_io: try to unshare copy-on-write blocks via fallocate Darrick J. Wong
@ 2016-10-26 10:34   ` Christoph Hellwig
  0 siblings, 0 replies; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

On Tue, Oct 25, 2016 at 04:05:49PM -0700, Darrick J. Wong wrote:
> Wire up the "unshare" flag to the xfs_io fallocate command.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 22/39] xfs_logprint: support cowextsize reporting in log contents
  2016-10-25 23:05 ` [PATCH 22/39] xfs_logprint: support cowextsize reporting in log contents Darrick J. Wong
@ 2016-10-26 10:34   ` Christoph Hellwig
  0 siblings, 0 replies; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 23/39] xfs_logprint: support refcount redo items
  2016-10-25 23:06 ` [PATCH 23/39] xfs_logprint: support refcount redo items Darrick J. Wong
@ 2016-10-26 10:37   ` Christoph Hellwig
  2016-10-26 17:31     ` Darrick J. Wong
  0 siblings, 1 reply; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

> +static int
> +xfs_cui_copy_format(
> +	char			  *buf,
> +	uint			  len,
> +	struct xfs_cui_log_format *dst_fmt,
> +	int			  continued)
> +{
> +	uint nextents = ((struct xfs_cui_log_format *)buf)->cui_nextents;

nit: maybe have a local variable of type struct xfs_cui_log_format *
to clean this up a bit?

> +	uint dst_len = xfs_cui_log_format_sizeof(nextents);
> +
> +	if (len == dst_len || continued) {
> +		memcpy((char *)dst_fmt, buf, len);

no need to cast here.

> +int
> +xlog_print_trans_cui(
> +	char			**ptr,
> +	uint			src_len,
> +	int			continued)
> +{
> +	struct xfs_cui_log_format	*src_f, *f = NULL;
> +	uint			dst_len;
> +	uint			nextents;
> +	struct xfs_phys_extent	*ex;
> +	int			i;
> +	int			error = 0;
> +	int			core_size;
> +
> +	core_size = offsetof(struct xfs_cui_log_format, cui_extents);
> +
> +	/*
> +	 * memmove to ensure 8-byte alignment for the long longs in
> +	 * struct xfs_cui_log_format structure
> +	 */
> +	src_f = malloc(src_len);
> +	if (src_f == NULL) {
> +		fprintf(stderr, _("%s: %s: malloc failed\n"),
> +			progname, __func__);
> +		exit(1);
> +	}
> +	memmove((char*)src_f, *ptr, src_len);

No need to use memmove on a freshly allocated buffer ever, memcpy
is enough.  Also no need to cast here.

> +int
> +xlog_print_trans_cud(
> +	char				**ptr,
> +	uint				len)
> +{
> +	struct xfs_cud_log_format	*f;
> +	struct xfs_cud_log_format	lbuf;
> +
> +	/* size without extents at end */
> +	uint core_size = sizeof(struct xfs_cud_log_format);
> +
> +	/*
> +	 * memmove to ensure 8-byte alignment for the long longs in
> +	 * xfs_efd_log_format_t structure
> +	 */
> +	memmove(&lbuf, *ptr, MIN(core_size, len));

Can be a memcpy again.
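
As a minimal sketch of the point about memcpy, assuming a hypothetical fixed-size record (the demo_* names are illustrative and not the real CUD format): the destination is a separate local object, so source and destination cannot overlap and memcpy is enough, and no cast is needed because memcpy takes void pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size record; the real CUD format differs. */
struct demo_done_format {
	uint16_t	type;
	uint16_t	size;
	uint32_t	pad;
	uint64_t	intent_id;
};

#define DEMO_MIN(a, b)	((a) < (b) ? (a) : (b))

/*
 * Copy a possibly unaligned record out of a log buffer into an aligned
 * local structure before reading its 64-bit field.  lbuf is a distinct
 * object, so the two regions cannot overlap.
 */
static uint64_t
demo_read_intent_id(const char *ptr, size_t len)
{
	struct demo_done_format	lbuf;

	assert(ptr != NULL);
	memset(&lbuf, 0, sizeof(lbuf));
	memcpy(&lbuf, ptr, DEMO_MIN(sizeof(lbuf), len));
	return lbuf.intent_id;
}
```

memmove is only required when the two regions may overlap, which cannot happen with a freshly allocated buffer or a local variable.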


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 24/39] xfs_logprint: support bmap redo items
  2016-10-25 23:06 ` [PATCH 24/39] xfs_logprint: support bmap " Darrick J. Wong
@ 2016-10-26 10:38   ` Christoph Hellwig
  0 siblings, 0 replies; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

Same comments as for the last patch, looks like a copy and paste job :)

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 25/39] man: document the inode cowextsize flags & fields
  2016-10-25 23:06 ` [PATCH 25/39] man: document the inode cowextsize flags & fields Darrick J. Wong
@ 2016-10-26 10:39   ` Christoph Hellwig
  2016-10-26 17:20     ` Darrick J. Wong
  0 siblings, 1 reply; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

On Tue, Oct 25, 2016 at 04:06:14PM -0700, Darrick J. Wong wrote:
> Document the new copy-on-write extent size fields and inode flags
> available in struct fsxattr.

Looks fine.  Btw, now that fsxattr is in common code should this
documentation move to or be duplicated in the man-pages project?

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 26/39] xfs_repair: fix get_agino_buf to avoid corrupting inodes
  2016-10-25 23:06 ` [PATCH 26/39] xfs_repair: fix get_agino_buf to avoid corrupting inodes Darrick J. Wong
@ 2016-10-26 10:48   ` Christoph Hellwig
  0 siblings, 0 replies; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:48 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 27/39] xfs_repair: check the existing refcount btree
  2016-10-25 23:06 ` [PATCH 27/39] xfs_repair: check the existing refcount btree Darrick J. Wong
@ 2016-10-26 10:49   ` Christoph Hellwig
  0 siblings, 0 replies; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

On Tue, Oct 25, 2016 at 04:06:28PM -0700, Darrick J. Wong wrote:
> Spot-check the refcount btree for obvious errors, and mark the
> refcount btree blocks as such.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 28/39] xfs_repair: handle multiple owners of data blocks
  2016-10-25 23:06 ` [PATCH 28/39] xfs_repair: handle multiple owners of data blocks Darrick J. Wong
@ 2016-10-26 10:57   ` Christoph Hellwig
  2016-10-26 17:15     ` Darrick J. Wong
  0 siblings, 1 reply; 85+ messages in thread
From: Christoph Hellwig @ 2016-10-26 10:57 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, linux-xfs

> --- a/repair/dinode.c
> +++ b/repair/dinode.c
> @@ -722,6 +722,9 @@ _("Fatal error: inode %" PRIu64 " - blkmap_set_ext(): %s\n"
>  			 * checking each entry without setting the
>  			 * block bitmap
>  			 */
> +			if (type == XR_INO_DATA &&
> +			    xfs_sb_version_hasreflink(&mp->m_sb))
> +				goto skip_dup;
>  			if (search_dup_extent(agno, agbno, ebno)) {
>  				do_warn(
>  _("%s fork in ino %" PRIu64 " claims dup extent, "
> @@ -731,6 +734,7 @@ _("%s fork in ino %" PRIu64 " claims dup extent, "
>  					irec.br_blockcount);
>  				goto done;
>  			}
> +skip_dup:

For some weird reason this goto makes me sad.  I'm getting into
major nitpick territory here, but can we avoid it?  Either duplicate
the *tot increment or find some funky condition?

Otherwise this looks fine.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 11/39] xfs_db: metadump should copy the refcount btree too
  2016-10-26 10:29   ` Christoph Hellwig
@ 2016-10-26 16:33     ` Darrick J. Wong
  0 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-26 16:33 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: david, linux-xfs

On Wed, Oct 26, 2016 at 03:29:40AM -0700, Christoph Hellwig wrote:
> Looks fine,
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> Do we have a testcase for metadump on a reflink fs, btw?

xfs/129 for refcountbt, xfs/234 for rmapbt, and xfs/336 for realtime rmap.

--D

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 20/39] xfs_io: provide long-format help for falloc
  2016-10-26 10:34   ` Christoph Hellwig
@ 2016-10-26 16:37     ` Darrick J. Wong
  0 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-26 16:37 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: david, linux-xfs

On Wed, Oct 26, 2016 at 03:34:05AM -0700, Christoph Hellwig wrote:
> On Tue, Oct 25, 2016 at 04:05:43PM -0700, Darrick J. Wong wrote:
> > Provide long-format help for falloc so that xfstests can use
> > _require_xfs_io_command to check for falloc command line args.
> 
> As it turns out checking those args isn't actually suitable.

Yeah, ultimately I simply made an 'funshare' command and xfstests
checks whether it works the same way it does for fzero/fpunch/etc.
That makes the commit message a little out of date; will change it.

--D

> But documentation is always good to have, so:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 28/39] xfs_repair: handle multiple owners of data blocks
  2016-10-26 10:57   ` Christoph Hellwig
@ 2016-10-26 17:15     ` Darrick J. Wong
  2016-10-26 21:15       ` Dave Chinner
  0 siblings, 1 reply; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-26 17:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: david, linux-xfs

On Wed, Oct 26, 2016 at 03:57:58AM -0700, Christoph Hellwig wrote:
> > --- a/repair/dinode.c
> > +++ b/repair/dinode.c
> > @@ -722,6 +722,9 @@ _("Fatal error: inode %" PRIu64 " - blkmap_set_ext(): %s\n"
> >  			 * checking each entry without setting the
> >  			 * block bitmap
> >  			 */
> > +			if (type == XR_INO_DATA &&
> > +			    xfs_sb_version_hasreflink(&mp->m_sb))
> > +				goto skip_dup;
> >  			if (search_dup_extent(agno, agbno, ebno)) {
> >  				do_warn(
> >  _("%s fork in ino %" PRIu64 " claims dup extent, "
> > @@ -731,6 +734,7 @@ _("%s fork in ino %" PRIu64 " claims dup extent, "
> >  					irec.br_blockcount);
> >  				goto done;
> >  			}
> > +skip_dup:
> 
> For some weird reason this goto makes me sad.  I'm getting into
> major nitpick territory here, but can we avoid it?  Either duplicate
> the *tot increment or find some funky condition?
> 
> Otherwise this looks fine.

Ugh, unnecessary goto.  Sad! ;)

But in all seriousness an 'else if' could do the same work, so:

if (type == XR_INO_DATA &&
    xfs_sb_version_hasreflink(&mp->m_sb))
	; /* avoid the dup extent check below */
else if (search_dup_extent(agno, agbno, ebno)) {
	do_warn(...);
}
*tot += irec.br_blockcount;

--D

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 25/39] man: document the inode cowextsize flags & fields
  2016-10-26 10:39   ` Christoph Hellwig
@ 2016-10-26 17:20     ` Darrick J. Wong
  0 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-26 17:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: david, linux-xfs

On Wed, Oct 26, 2016 at 03:39:44AM -0700, Christoph Hellwig wrote:
> On Tue, Oct 25, 2016 at 04:06:14PM -0700, Darrick J. Wong wrote:
> > Document the new copy-on-write extent size fields and inode flags
> > available in struct fsxattr.
> 
> Looks fine.  Btw, now that fsxattr is in common code should this
> documentation move to or be duplicated in the man-pages project?

Yeah.  I'll port the manpage over to man-pages and we can
kill the one in xfsprogs later if we want.

> Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 23/39] xfs_logprint: support refcount redo items
  2016-10-26 10:37   ` Christoph Hellwig
@ 2016-10-26 17:31     ` Darrick J. Wong
  0 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-26 17:31 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: david, linux-xfs

On Wed, Oct 26, 2016 at 03:37:50AM -0700, Christoph Hellwig wrote:
> > +static int
> > +xfs_cui_copy_format(
> > +	char			  *buf,
> > +	uint			  len,
> > +	struct xfs_cui_log_format *dst_fmt,
> > +	int			  continued)
> > +{
> > +	uint nextents = ((struct xfs_cui_log_format *)buf)->cui_nextents;
> 
> nit: maybe have a local variable of type struct xfs_cui_log_format *
> to clean this up a bit?
> 
> > +	uint dst_len = xfs_cui_log_format_sizeof(nextents);
> > +
> > +	if (len == dst_len || continued) {
> > +		memcpy((char *)dst_fmt, buf, len);
> 
> no need to cast here.
> 
> > +int
> > +xlog_print_trans_cui(
> > +	char			**ptr,
> > +	uint			src_len,
> > +	int			continued)
> > +{
> > +	struct xfs_cui_log_format	*src_f, *f = NULL;
> > +	uint			dst_len;
> > +	uint			nextents;
> > +	struct xfs_phys_extent	*ex;
> > +	int			i;
> > +	int			error = 0;
> > +	int			core_size;
> > +
> > +	core_size = offsetof(struct xfs_cui_log_format, cui_extents);
> > +
> > +	/*
> > +	 * memmove to ensure 8-byte alignment for the long longs in
> > +	 * struct xfs_cui_log_format structure
> > +	 */
> > +	src_f = malloc(src_len);
> > +	if (src_f == NULL) {
> > +		fprintf(stderr, _("%s: %s: malloc failed\n"),
> > +			progname, __func__);
> > +		exit(1);
> > +	}
> > +	memmove((char*)src_f, *ptr, src_len);
> 
> No need to use memmove on a freshly allocated buffer ever, memcpy
> is enough.  Also no need to cast here.
> 
> > +int
> > +xlog_print_trans_cud(
> > +	char				**ptr,
> > +	uint				len)
> > +{
> > +	struct xfs_cud_log_format	*f;
> > +	struct xfs_cud_log_format	lbuf;
> > +
> > +	/* size without extents at end */
> > +	uint core_size = sizeof(struct xfs_cud_log_format);
> > +
> > +	/*
> > +	 * memmove to ensure 8-byte alignment for the long longs in
> > +	 * xfs_efd_log_format_t structure
> > +	 */
> > +	memmove(&lbuf, *ptr, MIN(core_size, len));
> 
> Can be a memcpy again.
> 

Yeah, cut & paste, sorry.

The copy function can be passed a struct xfs_cui_log_format * as the
first parameter instead of char *, which cleans out a lot of the
cruftiness.  Will fix this all up.

--D

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 09/39] xfs_db: dump refcount btree data
  2016-10-26 10:28   ` Christoph Hellwig
@ 2016-10-26 17:52     ` Darrick J. Wong
  0 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-26 17:52 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: david, linux-xfs

On Wed, Oct 26, 2016 at 03:28:13AM -0700, Christoph Hellwig wrote:
> > @@ -47,7 +47,7 @@ const field_t	agf_flds[] = {
> >  	{ "versionnum", FLDT_UINT32D, OI(OFF(versionnum)), C1, 0, TYP_NONE },
> >  	{ "seqno", FLDT_AGNUMBER, OI(OFF(seqno)), C1, 0, TYP_NONE },
> >  	{ "length", FLDT_AGBLOCK, OI(OFF(length)), C1, 0, TYP_NONE },
> > -	{ "roots", FLDT_AGBLOCK, OI(OFF(roots)), CI(XFS_BTNUM_AGF),
> > +	{ "roots", FLDT_AGBLOCK, OI(OFF(roots)), CI(XFS_BTNUM_AGF) + 1,

NAK.  The 'roots' array in the AGF is only three items long.  Adding one
here makes it pick up 'bnolevel' as the root of a fourth tree, which is
incorrect.  Same applies to the 'levels' array; it should also be left
alone.

(The refcount_root and refcount_level field definitions are ok.)

> >  	  FLD_ARRAY|FLD_SKIPALL, TYP_NONE },
> 
> Please replace XFS_BTNUM_AGF with a proper XFS_NUM_AG_BTREES or
> similar define.  Without that this line and the ones below are black
> magic.

The naming here is awkward, indeed.  The agf_roots[] and agf_levels[]
arrays are declared to be XFS_BTNUM_AGF (3) elements long.  That was
fine when we wanted to root the rmapbt in that third slot, but now it's
weird because the refcountbt is the fourth tree to be rooted in the AGF.

Increasing XFS_BTNUM_AGF would have screwed up the on-disk format way
back when we could change the disk format, and instead renaming the
constant to XFS_AGF_ROOT_ARRAY_LEN is still somewhat misleading since
the agf_root array doesn't cover all the AGF roots.

I guess I could enlarge the comment for XFS_BTNUM_AGF to warn that
there are other btree roots in the AGF, though it felt like that was
fairly obvious.
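
A minimal sketch of why the "+ 1" is wrong, using a simplified, hypothetical AGF-like layout (demo_agf is not the real on-disk structure): the element one past the end of a three-entry roots array is the first entry of the levels array, not the refcount btree root, which lives in its own field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BTNUM_AGF	3	/* length of the roots[] and levels[] arrays */

/* Simplified, hypothetical excerpt of an AGF-like header. */
struct demo_agf {
	uint32_t	roots[DEMO_BTNUM_AGF];	/* bno, cnt, rmap roots */
	uint32_t	levels[DEMO_BTNUM_AGF];	/* bno, cnt, rmap levels */
	uint32_t	refcount_root;		/* fourth tree, own field */
	uint32_t	refcount_level;
};

/* Byte offset of the element that a "count + 1" field table would dump. */
static size_t
demo_fourth_root_offset(void)
{
	return offsetof(struct demo_agf, roots) +
			DEMO_BTNUM_AGF * sizeof(uint32_t);
}

/* The fourth "root" aliases levels[0], not the refcount root. */
static void
demo_check_layout(void)
{
	assert(demo_fourth_root_offset() == offsetof(struct demo_agf, levels));
	assert(demo_fourth_root_offset() !=
	       offsetof(struct demo_agf, refcount_root));
}
```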

--D

> Otherwise this looks fine to me.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 07/39] xfs: introduce the CoW fork
  2016-10-26 10:25   ` Christoph Hellwig
@ 2016-10-26 17:59     ` Darrick J. Wong
  0 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-26 17:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: david, linux-xfs

On Wed, Oct 26, 2016 at 03:25:24AM -0700, Christoph Hellwig wrote:
> On Tue, Oct 25, 2016 at 04:04:21PM -0700, Darrick J. Wong wrote:
> > Introduce a new in-core fork for storing copy-on-write delalloc
> > reservations and allocated extents that are in the process of being
> > written out.
> > 
> > [Clean up the CoW fork, should there ever be one.]
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> What's up with all these odd commit messages and tiny not really
> standalone patches?

Silly me just rammed the old patch stack into the new for-next base,
with the result that the patches that still had un-merged hunks got left
in the patch stack with commit messages intact.

Dave yelled at me to fix the commit messages, so I changed the commit
messages to be the text in the brackets before the S-o-B and pushed the
mess to github.  I figured I ought to wait a day or two for more reviews
to come in before re-spamming everyone.

Sorry about the spew.

--D

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 05/39] xfs: connect refcount adjust functions to upper layers
  2016-10-26 10:24   ` Christoph Hellwig
@ 2016-10-26 18:06     ` Darrick J. Wong
  0 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-26 18:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: david, linux-xfs

On Wed, Oct 26, 2016 at 03:24:44AM -0700, Christoph Hellwig wrote:
> On Tue, Oct 25, 2016 at 04:04:08PM -0700, Darrick J. Wong wrote:
> > Plumb in the upper level interface to schedule and finish deferred
> > refcount operations via the deferred ops mechanism.
> > 
> > [Plumb in refcount deferred op log items.]
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Shouldn't we try to share xfs_trans_refcount.c instead?

There's not a lot to share, since the kernel version actually has to
create log intent items and log them prior to calling the libxfs
functions, whereas xfsprogs calls the libxfs functions directly.

The only identical functions are _diff_items, _finish_cleanup, and
_cancel_item.  Those last two are one-liners.

--D

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 16/39] libxfs: add configure option to override system header fsxattr
  2016-10-26 10:32   ` Christoph Hellwig
@ 2016-10-26 19:04     ` Darrick J. Wong
  0 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-26 19:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: david, linux-xfs

On Wed, Oct 26, 2016 at 03:32:42AM -0700, Christoph Hellwig wrote:
> On Tue, Oct 25, 2016 at 04:05:17PM -0700, Darrick J. Wong wrote:
> > By default, libxfs will use the kernel/system headers to define struct
> > fsxattr.  Unfortunately, this creates a problem for developers who are
> > writing new features but building xfsprogs on a stable system, because
> > the stable kernel's headers don't reflect the new feature.  In this
> > case, we want to be able to use the internal fsxattr definition while
> > the kernel headers catch up, so provide some configure magic to allow
> > developers to force the use of the internal definition.
> 
> We should simply always use our definition either unconditionally or
> based on checking the system one.  It's definitely not something that
> should require user interaction.

All right.  For this patch I'll remove the configure option, leaving
only the pieces that actually make the override happen.  In patch #17
I'll add a configure check that enables the override if the system
struct fsxattr is present but does not contain the cowextsize field, and
make it so that io/cowextsize.c is always built.

As a side note this will leave intact the ability to detect that the
system headers don't define fsxattr (or its ioctl) at all, and use the
internal definitions in that case.

--D


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 28/39] xfs_repair: handle multiple owners of data blocks
  2016-10-26 17:15     ` Darrick J. Wong
@ 2016-10-26 21:15       ` Dave Chinner
  2016-10-26 21:59         ` Darrick J. Wong
  0 siblings, 1 reply; 85+ messages in thread
From: Dave Chinner @ 2016-10-26 21:15 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Wed, Oct 26, 2016 at 10:15:53AM -0700, Darrick J. Wong wrote:
> On Wed, Oct 26, 2016 at 03:57:58AM -0700, Christoph Hellwig wrote:
> > > --- a/repair/dinode.c
> > > +++ b/repair/dinode.c
> > > @@ -722,6 +722,9 @@ _("Fatal error: inode %" PRIu64 " - blkmap_set_ext(): %s\n"
> > >  			 * checking each entry without setting the
> > >  			 * block bitmap
> > >  			 */
> > > +			if (type == XR_INO_DATA &&
> > > +			    xfs_sb_version_hasreflink(&mp->m_sb))
> > > +				goto skip_dup;
> > >  			if (search_dup_extent(agno, agbno, ebno)) {
> > >  				do_warn(
> > >  _("%s fork in ino %" PRIu64 " claims dup extent, "
> > > @@ -731,6 +734,7 @@ _("%s fork in ino %" PRIu64 " claims dup extent, "
> > >  					irec.br_blockcount);
> > >  				goto done;
> > >  			}
> > > +skip_dup:
> > 
> > For some weird reason this goto makes me sad.  I'm getting into
> > > major nitpick territory here, but can we avoid it?  Either duplicate
> > the *tot increment or find some funky condition?
> > 
> > Otherwise this looks fine.
> 
> Ugh, unnecessary goto.  Sad! ;)
> 
> But in all seriousness an 'else if' could do the same work, so:
> 
> if (type == XR_INO_DATA &&
>     xfs_sb_version_hasreflink(&mp->m_sb))
> 	; /* avoid the dup extent check below */
> else if (search_dup_extent(agno, agbno, ebno)) {
> 	do_warn(...);
> }
> *tot += irec.br_blockcount;

Even that is unnecessary, yes?

	if ((type != XR_INO_DATA ||
	     !xfs_sb_version_hasreflink(&mp->m_sb)) &&
	    search_dup_extent(agno, agbno, ebno)) {
		do_warn(...);
	}
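
A minimal sketch comparing the three control-flow forms discussed in this subthread, with the real conditions replaced by hypothetical boolean stand-ins (demo_ctx and the demo_warns_* functions are illustrative only). The combined form relies on && short-circuiting, so the duplicate search is still skipped for reflinked data forks:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the conditions tested in the hunk. */
struct demo_ctx {
	bool	is_data;	/* type == XR_INO_DATA */
	bool	has_reflink;	/* reflink feature enabled */
	bool	is_dup;		/* result of the duplicate extent search */
};

/* Original form: skip the duplicate check with a goto. */
static bool
demo_warns_goto(const struct demo_ctx *c)
{
	bool	warned = false;

	if (c->is_data && c->has_reflink)
		goto skip_dup;
	if (c->is_dup)
		warned = true;
skip_dup:
	return warned;
}

/* Empty statement followed by 'else if'. */
static bool
demo_warns_else_if(const struct demo_ctx *c)
{
	bool	warned = false;

	if (c->is_data && c->has_reflink)
		; /* avoid the dup extent check below */
	else if (c->is_dup)
		warned = true;
	return warned;
}

/* Single condition, negated with De Morgan's law. */
static bool
demo_warns_combined(const struct demo_ctx *c)
{
	return (!c->is_data || !c->has_reflink) && c->is_dup;
}

/* Walk all eight input combinations and check that the forms agree. */
static void
demo_check_all(void)
{
	for (int i = 0; i < 8; i++) {
		struct demo_ctx c = {
			.is_data = (i & 1) != 0,
			.has_reflink = (i & 2) != 0,
			.is_dup = (i & 4) != 0,
		};

		assert(demo_warns_goto(&c) == demo_warns_else_if(&c));
		assert(demo_warns_goto(&c) == demo_warns_combined(&c));
	}
}
```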


Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 28/39] xfs_repair: handle multiple owners of data blocks
  2016-10-26 21:15       ` Dave Chinner
@ 2016-10-26 21:59         ` Darrick J. Wong
  0 siblings, 0 replies; 85+ messages in thread
From: Darrick J. Wong @ 2016-10-26 21:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs

On Thu, Oct 27, 2016 at 08:15:40AM +1100, Dave Chinner wrote:
> On Wed, Oct 26, 2016 at 10:15:53AM -0700, Darrick J. Wong wrote:
> > On Wed, Oct 26, 2016 at 03:57:58AM -0700, Christoph Hellwig wrote:
> > > > --- a/repair/dinode.c
> > > > +++ b/repair/dinode.c
> > > > @@ -722,6 +722,9 @@ _("Fatal error: inode %" PRIu64 " - blkmap_set_ext(): %s\n"
> > > >  			 * checking each entry without setting the
> > > >  			 * block bitmap
> > > >  			 */
> > > > +			if (type == XR_INO_DATA &&
> > > > +			    xfs_sb_version_hasreflink(&mp->m_sb))
> > > > +				goto skip_dup;
> > > >  			if (search_dup_extent(agno, agbno, ebno)) {
> > > >  				do_warn(
> > > >  _("%s fork in ino %" PRIu64 " claims dup extent, "
> > > > @@ -731,6 +734,7 @@ _("%s fork in ino %" PRIu64 " claims dup extent, "
> > > >  					irec.br_blockcount);
> > > >  				goto done;
> > > >  			}
> > > > +skip_dup:
> > > 
> > > For some weird reason this goto makes me sad.  I'm getting into
> > > > major nitpick territory here, but can we avoid it?  Either duplicate
> > > the *tot increment or find some funky condition?
> > > 
> > > Otherwise this looks fine.
> > 
> > Ugh, unnecessary goto.  Sad! ;)
> > 
> > But in all seriousness an 'else if' could do the same work, so:
> > 
> > if (type == XR_INO_DATA &&
> >     xfs_sb_version_hasreflink(&mp->m_sb))
> > 	; /* avoid the dup extent check below */
> > else if (search_dup_extent(agno, agbno, ebno)) {
> > 	do_warn(...);
> > }
> > *tot += irec.br_blockcount;
> 
> Even that is unnecessary, yes?
> 
> 	if ((type != XR_INO_DATA ||
> 	     !xfs_sb_version_hasreflink(&mp->m_sb)) &&
> 	    search_dup_extent(agno, agbno, ebno)) {
> 		do_warn(...);
> 	}

Err, yes.
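
For anyone following along, here is a minimal standalone sketch (not the
actual repair/dinode.c code) that checks the three control-flow shapes
discussed above against each other.  The booleans is_data, has_reflink
and dup are hypothetical stand-ins for type == XR_INO_DATA,
xfs_sb_version_hasreflink(&mp->m_sb) and the result of
search_dup_extent(); struct outcome and the function names are made up
for illustration.  "warned" models the do_warn() + goto done path and
"counted" models the *tot increment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result of processing one extent. */
struct outcome {
	bool warned;	/* the do_warn() path was taken, then bail out */
	bool counted;	/* *tot += irec.br_blockcount was reached */
};

/* Shape 1: the goto from the original patch. */
static struct outcome form_goto(bool is_data, bool has_reflink, bool dup)
{
	struct outcome o = { false, false };

	if (is_data && has_reflink)
		goto skip_dup;
	if (dup) {
		o.warned = true;
		goto done;
	}
skip_dup:
	o.counted = true;
done:
	return o;
}

/* Shape 2: empty statement plus 'else if'. */
static struct outcome form_else_if(bool is_data, bool has_reflink, bool dup)
{
	struct outcome o = { false, false };

	if (is_data && has_reflink)
		; /* avoid the dup extent check below */
	else if (dup) {
		o.warned = true;
		return o;
	}
	o.counted = true;
	return o;
}

/* Shape 3: single combined condition. */
static struct outcome form_combined(bool is_data, bool has_reflink, bool dup)
{
	struct outcome o = { false, false };

	if ((!is_data || !has_reflink) && dup) {
		o.warned = true;
		return o;
	}
	o.counted = true;
	return o;
}

/* Walk all eight input combinations; return how many disagree. */
static int count_mismatches(void)
{
	int bad = 0;

	for (int i = 0; i < 8; i++) {
		bool d = i & 1, r = i & 2, x = i & 4;
		struct outcome a = form_goto(d, r, x);
		struct outcome b = form_else_if(d, r, x);
		struct outcome c = form_combined(d, r, x);

		if (a.warned != b.warned || a.counted != b.counted ||
		    a.warned != c.warned || a.counted != c.counted)
			bad++;
	}
	return bad;
}
```

Calling count_mismatches() from a test harness should find no
disagreement between the three shapes.  In the real code the dup check
is a function call rather than a precomputed boolean; because && short
circuits, the combined condition still skips that call for reflinked
data forks, same as the goto did.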

--D

> 
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


end of thread, other threads:[~2016-10-26 21:59 UTC | newest]

Thread overview: 85+ messages
2016-10-25 23:03 [PATCH v10 00/39] xfsprogs: add reflink and dedupe support Darrick J. Wong
2016-10-25 23:03 ` [PATCH 01/39] xfs: convert RUI log formats to use variable length arrays Darrick J. Wong
2016-10-26 10:21   ` Christoph Hellwig
2016-10-25 23:03 ` [PATCH 02/39] xfs_buflock: handling parsing errors more gracefully Darrick J. Wong
2016-10-26 10:22   ` Christoph Hellwig
2016-10-25 23:03 ` [PATCH 03/39] xfs: define the on-disk refcount btree format Darrick J. Wong
2016-10-26 10:23   ` Christoph Hellwig
2016-10-25 23:04 ` [PATCH 04/39] xfs: add refcount btree operations Darrick J. Wong
2016-10-26 10:23   ` Christoph Hellwig
2016-10-25 23:04 ` [PATCH 05/39] xfs: connect refcount adjust functions to upper layers Darrick J. Wong
2016-10-26 10:24   ` Christoph Hellwig
2016-10-26 18:06     ` Darrick J. Wong
2016-10-25 23:04 ` [PATCH 06/39] xfs: implement deferred bmbt map/unmap operations Darrick J. Wong
2016-10-25 23:04 ` [PATCH 07/39] xfs: introduce the CoW fork Darrick J. Wong
2016-10-26 10:25   ` Christoph Hellwig
2016-10-26 17:59     ` Darrick J. Wong
2016-10-25 23:04 ` [PATCH 08/39] xfs: create a separate cow extent size hint for the allocator Darrick J. Wong
2016-10-25 23:04 ` [PATCH 09/39] xfs_db: dump refcount btree data Darrick J. Wong
2016-10-26 10:28   ` Christoph Hellwig
2016-10-26 17:52     ` Darrick J. Wong
2016-10-25 23:04 ` [PATCH 10/39] xfs_db: add support for checking the refcount btree Darrick J. Wong
2016-10-26  0:49   ` Dave Chinner
2016-10-26  1:13     ` Darrick J. Wong
2016-10-26  3:26       ` Dave Chinner
2016-10-26  6:29         ` Darrick J. Wong
2016-10-25 23:04 ` [PATCH 11/39] xfs_db: metadump should copy the refcount btree too Darrick J. Wong
2016-10-26 10:29   ` Christoph Hellwig
2016-10-26 16:33     ` Darrick J. Wong
2016-10-25 23:04 ` [PATCH 12/39] xfs_db: deal with the CoW extent size hint Darrick J. Wong
2016-10-26 10:28   ` Christoph Hellwig
2016-10-25 23:04 ` [PATCH 13/39] xfs_db: print one array element per line Darrick J. Wong
2016-10-26  0:51   ` Dave Chinner
2016-10-26  1:13     ` Darrick J. Wong
2016-10-26  3:23       ` Dave Chinner
2016-10-26  3:34         ` Darrick J. Wong
2016-10-26  5:48           ` Dave Chinner
2016-10-25 23:05 ` [PATCH 14/39] xfs_growfs: report the presence of the reflink feature Darrick J. Wong
2016-10-26 10:31   ` Christoph Hellwig
2016-10-25 23:05 ` [PATCH 15/39] xfs_io: bmap should support querying CoW fork, shared blocks Darrick J. Wong
2016-10-25 23:05 ` [PATCH 16/39] libxfs: add configure option to override system header fsxattr Darrick J. Wong
2016-10-26  0:56   ` Dave Chinner
2016-10-26  1:16     ` Darrick J. Wong
2016-10-26 10:32   ` Christoph Hellwig
2016-10-26 19:04     ` Darrick J. Wong
2016-10-25 23:05 ` [PATCH 17/39] xfs_io: get and set the CoW extent size hint Darrick J. Wong
2016-10-26  1:06   ` Dave Chinner
2016-10-25 23:05 ` [PATCH 18/39] xfs_io: add refcount+bmap error injection types Darrick J. Wong
2016-10-26 10:33   ` Christoph Hellwig
2016-10-25 23:05 ` [PATCH 19/39] xfs_io: support injecting the 'per-AG reservation critically low' error Darrick J. Wong
2016-10-26 10:33   ` Christoph Hellwig
2016-10-25 23:05 ` [PATCH 20/39] xfs_io: provide long-format help for falloc Darrick J. Wong
2016-10-26 10:34   ` Christoph Hellwig
2016-10-26 16:37     ` Darrick J. Wong
2016-10-25 23:05 ` [PATCH 21/39] xfs_io: try to unshare copy-on-write blocks via fallocate Darrick J. Wong
2016-10-26 10:34   ` Christoph Hellwig
2016-10-25 23:05 ` [PATCH 22/39] xfs_logprint: support cowextsize reporting in log contents Darrick J. Wong
2016-10-26 10:34   ` Christoph Hellwig
2016-10-25 23:06 ` [PATCH 23/39] xfs_logprint: support refcount redo items Darrick J. Wong
2016-10-26 10:37   ` Christoph Hellwig
2016-10-26 17:31     ` Darrick J. Wong
2016-10-25 23:06 ` [PATCH 24/39] xfs_logprint: support bmap " Darrick J. Wong
2016-10-26 10:38   ` Christoph Hellwig
2016-10-25 23:06 ` [PATCH 25/39] man: document the inode cowextsize flags & fields Darrick J. Wong
2016-10-26 10:39   ` Christoph Hellwig
2016-10-26 17:20     ` Darrick J. Wong
2016-10-25 23:06 ` [PATCH 26/39] xfs_repair: fix get_agino_buf to avoid corrupting inodes Darrick J. Wong
2016-10-26 10:48   ` Christoph Hellwig
2016-10-25 23:06 ` [PATCH 27/39] xfs_repair: check the existing refcount btree Darrick J. Wong
2016-10-26 10:49   ` Christoph Hellwig
2016-10-25 23:06 ` [PATCH 28/39] xfs_repair: handle multiple owners of data blocks Darrick J. Wong
2016-10-26 10:57   ` Christoph Hellwig
2016-10-26 17:15     ` Darrick J. Wong
2016-10-26 21:15       ` Dave Chinner
2016-10-26 21:59         ` Darrick J. Wong
2016-10-25 23:06 ` [PATCH 29/39] xfs_repair: process reverse-mapping data into refcount data Darrick J. Wong
2016-10-25 23:06 ` [PATCH 30/39] xfs_repair: record reflink inode state Darrick J. Wong
2016-10-25 23:06 ` [PATCH 31/39] xfs_repair: fix inode reflink flags Darrick J. Wong
2016-10-25 23:07 ` [PATCH 32/39] xfs_repair: check the refcount btree against our observed reference counts when -n Darrick J. Wong
2016-10-25 23:07 ` [PATCH 33/39] xfs_repair: rebuild the refcount btree Darrick J. Wong
2016-10-25 23:07 ` [PATCH 34/39] xfs_repair: complain about copy-on-write leftovers Darrick J. Wong
2016-10-25 23:07 ` [PATCH 35/39] xfs_repair: check the CoW extent size hint Darrick J. Wong
2016-10-25 23:07 ` [PATCH 36/39] xfs_repair: use range query when while checking rmaps Darrick J. Wong
2016-10-25 23:07 ` [PATCH 37/39] xfs_repair: check for mergeable refcount records Darrick J. Wong
2016-10-25 23:07 ` [PATCH 38/39] xfs_repair: use thread pools to sort rmap data Darrick J. Wong
2016-10-25 23:07 ` [PATCH 39/39] mkfs.xfs: format reflink enabled filesystems Darrick J. Wong
