linux-xfs.vger.kernel.org archive mirror
* [PATCH v8 00/22] xfs: online scrub support
@ 2017-07-21  4:38 Darrick J. Wong
  2017-07-21  4:38 ` [PATCH 01/22] xfs: query the per-AG reservation counters Darrick J. Wong
                   ` (21 more replies)
  0 siblings, 22 replies; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:38 UTC
  To: darrick.wong; +Cc: linux-xfs

Hi all,

This is the eighth revision of a patchset that adds kernel support to
XFS for online metadata scrubbing and repair.  There aren't any
on-disk format changes.  Changes since v7 include adding scrubbers for
metadata buffers that were sitting around in memory before scrub
started, scrubbers for directory parent pointers, and scrubbers for
quota counters.  I have been performing daily online scrubs of my XFS
filesystems for several months now, with surprisingly few problems.

Online scrub/repair support consists of four major pieces -- first, an
ioctl that maps physical extents to their owners (GETFSMAP; already in
4.12); second, various in-kernel metadata scrubbing ioctls to examine
metadata records and cross-reference them with other filesystem
metadata; third, an in-kernel mechanism for rebuilding damaged metadata
objects and btrees; and fourth, a userspace component to coordinate
scrubbing and repair operations.

This new utility, xfs_scrub, is separate from the existing offline
xfs_repair tool.  The program uses various XFS ioctls to iterate all XFS
metadata and asks the kernel to check the metadata and repair it if
necessary.

Per reviewer request, the v8 patch series will be broken into four
smaller series.  The first series adds the scrub functionality, and the
second series will contain all the userspace code.  The third series
will add cross-referencing checks, and the fourth series will add online
repair functionality.

While I understand that reviewer bandwidth is limited, I would like to
get this series prepped for 4.14, if possible.  I have isolated the
scrub code such that it can be compiled out entirely, in the hopes that
we can stabilize the code while not exposing regular users to riskier
code.

If you're going to start using this mess, you probably ought to just
pull from my git trees.  The kernel patches[1] should apply against
4.13-rc1.  xfsprogs[2] and xfstests[3] can be found in their usual
places.  The git trees contain all four series' worth of changes.

This is an extraordinary way to eat your data.  Enjoy!

Comments and questions are, as always, welcome.

--D

[1] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=djwong-devel
[2] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=djwong-devel
[3] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=djwong-devel


* [PATCH 01/22] xfs: query the per-AG reservation counters
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
@ 2017-07-21  4:38 ` Darrick J. Wong
  2017-07-23 16:16   ` Allison Henderson
  2017-07-23 22:25   ` Dave Chinner
  2017-07-21  4:38 ` [PATCH 02/22] xfs: add scrub tracepoints Darrick J. Wong
                   ` (20 subsequent siblings)
  21 siblings, 2 replies; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:38 UTC
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Establish an ioctl for userspace to query the original and current
per-AG reservation counts.  This will be used by xfs_scrub to
check that the vfs counters are at least somewhat sane.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
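For anyone who wants to poke at this from userspace, here is a minimal
sketch of a caller.  The struct and ioctl number are copied from the
xfs_fs.h hunk in this patch; they are not in any released header, so the
sketch declares them itself.  query_ag_resblks() is a hypothetical
helper name used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/types.h>

/* Copied from this patch's xfs_fs.h hunk; not in any released header. */
struct xfs_fsop_ag_resblks {
	__u64 resblks;		/* blocks reserved now */
	__u64 resblks_orig;	/* blocks reserved at mount time */
	__u64 reserved[2];
};
#define XFS_IOC_GET_AG_RESBLKS	_IOR('X', 126, struct xfs_fsop_ag_resblks)

/* The ioctl number encodes the struct size, so pin it down at build time. */
_Static_assert(sizeof(struct xfs_fsop_ag_resblks) == 32,
	       "unexpected ioctl struct size");

/*
 * Hypothetical helper: open a file or directory on the filesystem and
 * ask for the summed per-AG reservation counters.  Returns 0 on
 * success, or -1 with errno set on failure.
 */
static int query_ag_resblks(const char *path, struct xfs_fsop_ag_resblks *out)
{
	int fd, ret, saved_errno;

	assert(path != NULL && out != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, XFS_IOC_GET_AG_RESBLKS, out);
	saved_errno = errno;	/* close() may overwrite errno */
	close(fd);
	errno = saved_errno;
	return ret;
}
```

Going by the handler in this patch, a caller without CAP_SYS_ADMIN
should see EPERM.  On success, resblks and resblks_orig are totals
across all AGs, not per-AG values.
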
 fs/xfs/libxfs/xfs_fs.h |   10 ++++++++++
 fs/xfs/xfs_fsops.c     |   29 +++++++++++++++++++++++++++++
 fs/xfs/xfs_fsops.h     |    2 ++
 fs/xfs/xfs_ioctl.c     |   16 ++++++++++++++++
 fs/xfs/xfs_ioctl32.c   |    1 +
 5 files changed, 58 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 8c61f21..5dedab9 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -469,6 +469,15 @@ typedef struct xfs_swapext
 #define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH		0x2	/* don't flush log nor data */
 
 /*
+ * AG reserved block counters
+ */
+struct xfs_fsop_ag_resblks {
+	__u64 resblks;		/* blocks reserved now */
+	__u64 resblks_orig;	/* blocks reserved at mount time */
+	__u64 reserved[2];
+};
+
+/*
  * ioctl limits
  */
 #ifdef XATTR_LIST_MAX
@@ -543,6 +552,7 @@ typedef struct xfs_swapext
 #define XFS_IOC_ATTRMULTI_BY_HANDLE  _IOW ('X', 123, struct xfs_fsop_attrmulti_handlereq)
 #define XFS_IOC_FSGEOMETRY	     _IOR ('X', 124, struct xfs_fsop_geom)
 #define XFS_IOC_GOINGDOWN	     _IOR ('X', 125, uint32_t)
+#define XFS_IOC_GET_AG_RESBLKS	     _IOR ('X', 126, struct xfs_fsop_ag_resblks)
 /*	XFS_IOC_GETFSUUID ---------- deprecated 140	 */
 
 
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 8f22fc5..0920d59 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -44,6 +44,7 @@
 #include "xfs_filestream.h"
 #include "xfs_rmap.h"
 #include "xfs_ag_resv.h"
+#include "xfs_fs.h"
 
 /*
  * File system operations
@@ -1046,3 +1047,31 @@ xfs_fs_unreserve_ag_blocks(
 
 	return error;
 }
+
+/* Query the per-AG reservations to see how many blocks we have reserved. */
+int
+xfs_fs_get_ag_reserve_blocks(
+	struct xfs_mount		*mp,
+	struct xfs_fsop_ag_resblks	*out)
+{
+	struct xfs_ag_resv		*r;
+	struct xfs_perag		*pag;
+	xfs_agnumber_t			agno;
+
+	out->resblks = 0;
+	out->resblks_orig = 0;
+	out->reserved[0] = out->reserved[1] = 0;
+
+	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+		pag = xfs_perag_get(mp, agno);
+		r = xfs_perag_resv(pag, XFS_AG_RESV_METADATA);
+		out->resblks += r->ar_reserved;
+		out->resblks_orig += r->ar_asked;
+		r = xfs_perag_resv(pag, XFS_AG_RESV_AGFL);
+		out->resblks += r->ar_reserved;
+		out->resblks_orig += r->ar_asked;
+		xfs_perag_put(pag);
+	}
+
+	return 0;
+}
diff --git a/fs/xfs/xfs_fsops.h b/fs/xfs/xfs_fsops.h
index 2954c13..c8f5e26 100644
--- a/fs/xfs/xfs_fsops.h
+++ b/fs/xfs/xfs_fsops.h
@@ -25,6 +25,8 @@ extern int xfs_fs_counts(xfs_mount_t *mp, xfs_fsop_counts_t *cnt);
 extern int xfs_reserve_blocks(xfs_mount_t *mp, uint64_t *inval,
 				xfs_fsop_resblks_t *outval);
 extern int xfs_fs_goingdown(xfs_mount_t *mp, uint32_t inflags);
+extern int xfs_fs_get_ag_reserve_blocks(struct xfs_mount *mp,
+		struct xfs_fsop_ag_resblks *out);
 
 extern int xfs_fs_reserve_ag_blocks(struct xfs_mount *mp);
 extern int xfs_fs_unreserve_ag_blocks(struct xfs_mount *mp);
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 9c0c7a9..cc00260 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1974,6 +1974,22 @@ xfs_file_ioctl(
 		return 0;
 	}
 
+	case XFS_IOC_GET_AG_RESBLKS: {
+		struct xfs_fsop_ag_resblks	out;
+
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+
+		error = xfs_fs_get_ag_reserve_blocks(mp, &out);
+		if (error)
+			return error;
+
+		if (copy_to_user(arg, &out, sizeof(out)))
+			return -EFAULT;
+
+		return 0;
+	}
+
 	case XFS_IOC_FSGROWFSDATA: {
 		xfs_growfs_data_t in;
 
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index fa0bc4d..e8b4de3 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -556,6 +556,7 @@ xfs_file_compat_ioctl(
 	case XFS_IOC_ERROR_INJECTION:
 	case XFS_IOC_ERROR_CLEARALL:
 	case FS_IOC_GETFSMAP:
+	case XFS_IOC_GET_AG_RESBLKS:
 		return xfs_file_ioctl(filp, cmd, p);
 #ifndef BROKEN_X86_ALIGNMENT
 	/* These are handled fine if no alignment issues */



* [PATCH 02/22] xfs: add scrub tracepoints
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
  2017-07-21  4:38 ` [PATCH 01/22] xfs: query the per-AG reservation counters Darrick J. Wong
@ 2017-07-21  4:38 ` Darrick J. Wong
  2017-07-23 16:23   ` Allison Henderson
  2017-07-21  4:38 ` [PATCH 03/22] xfs: create an ioctl to scrub AG metadata Darrick J. Wong
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:38 UTC
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_types.h |    5 +
 fs/xfs/xfs_trace.h        |  375 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 380 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index 0220159..3ee2dba 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -94,6 +94,11 @@ typedef int64_t		xfs_sfiloff_t;	/* signed block number in a file */
 #define	XFS_ATTR_FORK	1
 #define	XFS_COW_FORK	2
 
+#define XFS_FORK_DESC \
+	{ XFS_DATA_FORK,	"data" }, \
+	{ XFS_ATTR_FORK,	"attr" }, \
+	{ XFS_COW_FORK,		"CoW" }
+
 /*
  * Min numbers of data/attr fork btree root pointers.
  */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index bcc3cdf..2e7e193 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -42,6 +42,7 @@ struct xfs_btree_cur;
 struct xfs_refcount_irec;
 struct xfs_fsmap;
 struct xfs_rmap_irec;
+struct xfs_scrub_metadata;
 
 DECLARE_EVENT_CLASS(xfs_attr_list_class,
 	TP_PROTO(struct xfs_attr_list_context *ctx),
@@ -3309,6 +3310,380 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_low_key);
 DEFINE_GETFSMAP_EVENT(xfs_getfsmap_high_key);
 DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
 
+/* scrub */
+#define XFS_SCRUB_TYPE_DESC \
+	{ 0, NULL }
+DECLARE_EVENT_CLASS(xfs_scrub_class,
+	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
+		 int error),
+	TP_ARGS(ip, sm, error),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(int, type)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_ino_t, inum)
+		__field(unsigned int, gen)
+		__field(unsigned int, flags)
+		__field(int, error)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->error = error;
+	),
+	TP_printk("dev %d:%d ino %llu type %s agno %u inum %llu gen %u flags 0x%x error %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __print_symbolic(__entry->type, XFS_SCRUB_TYPE_DESC),
+		  __entry->agno,
+		  __entry->inum,
+		  __entry->gen,
+		  __entry->flags,
+		  __entry->error)
+)
+#define DEFINE_SCRUB_EVENT(name) \
+DEFINE_EVENT(xfs_scrub_class, name, \
+	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm, \
+		 int error), \
+	TP_ARGS(ip, sm, error))
+
+DEFINE_SCRUB_EVENT(xfs_scrub);
+DEFINE_SCRUB_EVENT(xfs_scrub_done);
+DEFINE_SCRUB_EVENT(xfs_scrub_deadlock_retry);
+
+DECLARE_EVENT_CLASS(xfs_scrub_sbtree_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t bno,
+		 xfs_btnum_t btnum, int level, int nlevels, int ptr),
+	TP_ARGS(mp, agno, bno, btnum, level, nlevels, ptr),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_btnum_t, btnum)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__field(int, level)
+		__field(int, nlevels)
+		__field(int, ptr)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->btnum = btnum;
+		__entry->bno = bno;
+		__entry->level = level;
+		__entry->nlevels = nlevels;
+		__entry->ptr = ptr;
+	),
+	TP_printk("dev %d:%d agno %u agbno %u btnum %d level %d nlevels %d ptr %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->bno,
+		  __entry->btnum,
+		  __entry->level,
+		  __entry->nlevels,
+		  __entry->ptr)
+)
+#define DEFINE_SCRUB_SBTREE_EVENT(name) \
+DEFINE_EVENT(xfs_scrub_sbtree_class, name, \
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t bno, \
+		 xfs_btnum_t btnum, int level, int nlevels, int ptr), \
+	TP_ARGS(mp, agno, bno, btnum, level, nlevels, ptr))
+
+DEFINE_SCRUB_SBTREE_EVENT(xfs_scrub_btree_rec);
+DEFINE_SCRUB_SBTREE_EVENT(xfs_scrub_btree_key);
+
+TRACE_EVENT(xfs_scrub_op_error,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t bno,
+		 const char *type, int error, const char *func,
+		 int line),
+	TP_ARGS(mp, agno, bno, type, error, func, line),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__string(type, type)
+		__field(int, error)
+		__string(func, func)
+		__field(int, line)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->bno = bno;
+		__assign_str(type, type);
+		__entry->error = error;
+		__assign_str(func, func);
+		__entry->line = line;
+	),
+	TP_printk("dev %d:%d agno %u agbno %u type '%s' error %d fn %s:%d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->bno,
+		  __get_str(type),
+		  __entry->error,
+		  __get_str(func),
+		  __entry->line)
+);
+
+TRACE_EVENT(xfs_scrub_file_op_error,
+	TP_PROTO(struct xfs_inode *ip, int whichfork, xfs_fileoff_t offset,
+		 const char *type, int error, const char *func,
+		 int line),
+	TP_ARGS(ip, whichfork, offset, type, error, func, line),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(int, whichfork)
+		__field(xfs_fileoff_t, offset)
+		__string(type, type)
+		__field(int, error)
+		__string(func, func)
+		__field(int, line)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->whichfork = whichfork;
+		__entry->offset = offset;
+		__assign_str(type, type);
+		__entry->error = error;
+		__assign_str(func, func);
+		__entry->line = line;
+	),
+	TP_printk("dev %d:%d ino %llu %s offset %llu type '%s' error %d fn %s:%d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __print_symbolic(__entry->whichfork, XFS_FORK_DESC),
+		  __entry->offset,
+		  __get_str(type),
+		  __entry->error,
+		  __get_str(func),
+		  __entry->line)
+);
+
+DECLARE_EVENT_CLASS(xfs_scrub_block_error_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t bno,
+		 const char *type, const char *check, const char *func,
+		 int line),
+	TP_ARGS(mp, agno, bno, type, check, func, line),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__string(type, type)
+		__string(check, check)
+		__string(func, func)
+		__field(int, line)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->bno = bno;
+		__assign_str(type, type);
+		__assign_str(check, check);
+		__assign_str(func, func);
+		__entry->line = line;
+	),
+	TP_printk("dev %d:%d agno %u agbno %u type '%s' check '%s' fn %s:%d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->bno,
+		  __get_str(type),
+		  __get_str(check),
+		  __get_str(func),
+		  __entry->line)
+)
+
+#define DEFINE_SCRUB_BLOCK_ERROR_EVENT(name) \
+DEFINE_EVENT(xfs_scrub_block_error_class, name, \
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t bno, \
+		 const char *type, const char *check, const char *func, \
+		 int line), \
+	TP_ARGS(mp, agno, bno, type, check, func, line))
+
+DEFINE_SCRUB_BLOCK_ERROR_EVENT(xfs_scrub_block_error);
+DEFINE_SCRUB_BLOCK_ERROR_EVENT(xfs_scrub_block_preen);
+
+DECLARE_EVENT_CLASS(xfs_scrub_ino_error_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino, xfs_agnumber_t agno, xfs_agblock_t bno,
+		 const char *type, const char *check, const char *func,
+		 int line),
+	TP_ARGS(mp, ino, agno, bno, type, check, func, line),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__string(type, type)
+		__string(check, check)
+		__string(func, func)
+		__field(int, line)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->ino = ino;
+		__entry->agno = agno;
+		__entry->bno = bno;
+		__assign_str(type, type);
+		__assign_str(check, check);
+		__assign_str(func, func);
+		__entry->line = line;
+	),
+	TP_printk("dev %d:%d ino %llu agno %u agbno %u type '%s' check '%s' fn %s:%d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->agno,
+		  __entry->bno,
+		  __get_str(type),
+		  __get_str(check),
+		  __get_str(func),
+		  __entry->line)
+)
+
+#define DEFINE_SCRUB_INO_ERROR_EVENT(name) \
+DEFINE_EVENT(xfs_scrub_ino_error_class, name, \
+	TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino, xfs_agnumber_t agno, xfs_agblock_t bno, \
+		 const char *type, const char *check, const char *func, \
+		 int line), \
+	TP_ARGS(mp, ino, agno, bno, type, check, func, line))
+
+DEFINE_SCRUB_INO_ERROR_EVENT(xfs_scrub_ino_error);
+DEFINE_SCRUB_INO_ERROR_EVENT(xfs_scrub_ino_preen);
+
+DECLARE_EVENT_CLASS(xfs_scrub_data_error_class,
+	TP_PROTO(struct xfs_inode *ip, int whichfork, xfs_fileoff_t offset,
+		 const char *type, const char *check, const char *func,
+		 int line),
+	TP_ARGS(ip, whichfork, offset, type, check, func, line),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(int, whichfork)
+		__field(xfs_fileoff_t, offset)
+		__string(type, type)
+		__string(check, check)
+		__string(func, func)
+		__field(int, line)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->whichfork = whichfork;
+		__entry->offset = offset;
+		__assign_str(type, type);
+		__assign_str(check, check);
+		__assign_str(func, func);
+		__entry->line = line;
+	),
+	TP_printk("dev %d:%d ino %llu %s fork offset %llu type '%s' check '%s' fn %s:%d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __print_symbolic(__entry->whichfork, XFS_FORK_DESC),
+		  __entry->offset,
+		  __get_str(type),
+		  __get_str(check),
+		  __get_str(func),
+		  __entry->line)
+);
+
+#define DEFINE_SCRUB_DATA_ERROR_EVENT(name) \
+DEFINE_EVENT(xfs_scrub_data_error_class, name, \
+	TP_PROTO(struct xfs_inode *ip, int whichfork, xfs_fileoff_t offset, \
+		 const char *type, const char *check, const char *func, \
+		 int line), \
+	TP_ARGS(ip, whichfork, offset, type, check, func, line))
+
+DEFINE_SCRUB_DATA_ERROR_EVENT(xfs_scrub_data_error);
+DEFINE_SCRUB_DATA_ERROR_EVENT(xfs_scrub_data_warning);
+
+TRACE_EVENT(xfs_scrub_xref_error,
+	TP_PROTO(struct xfs_mount *mp, const char *type, int error,
+		 const char *func, int line),
+	TP_ARGS(mp, type, error, func, line),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__string(type, type)
+		__field(int, error)
+		__string(func, func)
+		__field(int, line)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__assign_str(type, type);
+		__entry->error = error;
+		__assign_str(func, func);
+		__entry->line = line;
+	),
+	TP_printk("dev %d:%d btree %s xref error %d fn %s:%d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __get_str(type),
+		  __entry->error,
+		  __get_str(func),
+		  __entry->line)
+);
+
+TRACE_EVENT(xfs_scrub_btree_error,
+	TP_PROTO(struct xfs_mount *mp, const char *bt_type, const char *bt_ptr,
+		 xfs_agnumber_t agno, xfs_agblock_t bno, const char *check,
+		 const char *func, int line),
+	TP_ARGS(mp, bt_type, bt_ptr, agno, bno, check, func, line),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__string(bt_type, bt_type)
+		__string(bt_ptr, bt_ptr)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__string(check, check)
+		__string(func, func)
+		__field(int, line)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__assign_str(bt_type, bt_type);
+		__assign_str(bt_ptr, bt_ptr);
+		__entry->agno = agno;
+		__entry->bno = bno;
+		__assign_str(check, check);
+		__assign_str(func, func);
+		__entry->line = line;
+	),
+	TP_printk("dev %d:%d %s %s agno %u agbno %u check '%s' fn %s:%d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __get_str(bt_type),
+		  __get_str(bt_ptr),
+		  __entry->agno,
+		  __entry->bno,
+		  __get_str(check),
+		  __get_str(func),
+		  __entry->line)
+);
+
+TRACE_EVENT(xfs_scrub_incomplete,
+	TP_PROTO(struct xfs_mount *mp, const char *type, const char *check,
+		 const char *func, int line),
+	TP_ARGS(mp, type, check, func, line),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__string(type, type)
+		__string(check, check)
+		__string(func, func)
+		__field(int, line)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__assign_str(type, type);
+		__assign_str(check, check);
+		__assign_str(func, func);
+		__entry->line = line;
+	),
+	TP_printk("dev %d:%d %s check '%s' fn %s:%d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __get_str(type),
+		  __get_str(check),
+		  __get_str(func),
+		  __entry->line)
+);
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH



* [PATCH 03/22] xfs: create an ioctl to scrub AG metadata
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
  2017-07-21  4:38 ` [PATCH 01/22] xfs: query the per-AG reservation counters Darrick J. Wong
  2017-07-21  4:38 ` [PATCH 02/22] xfs: add scrub tracepoints Darrick J. Wong
@ 2017-07-21  4:38 ` Darrick J. Wong
  2017-07-23 16:37   ` Allison Henderson
  2017-07-23 23:45   ` Dave Chinner
  2017-07-21  4:38 ` [PATCH 04/22] xfs: generic functions to scrub metadata and btrees Darrick J. Wong
                   ` (18 subsequent siblings)
  21 siblings, 2 replies; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:38 UTC
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create an ioctl that can be used to scrub internal filesystem metadata.
The new ioctl takes the metadata type, an (optional) AG number, an
(optional) inode number and generation, and a flags argument.  This will
be used by the upcoming XFS online scrub tool.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
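To make the calling convention concrete, here is a rough sketch of how
a userspace caller might build a request and interpret the output
flags.  The struct, flag values, and ioctl number are copied from the
xfs_fs.h hunk in this patch.  scrub_req_init() and scrub_verdict() are
hypothetical helpers, and the priority order in scrub_verdict() is an
assumption made for illustration, not something this patch defines.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/types.h>

/* Definitions copied from this patch's xfs_fs.h hunk. */
struct xfs_scrub_metadata {
	__u32 sm_type;		/* What to check? */
	__u32 sm_flags;		/* flags; see below. */
	__u64 sm_ino;		/* inode number. */
	__u32 sm_gen;		/* inode generation. */
	__u32 sm_agno;		/* ag number. */
	__u64 sm_reserved[5];	/* pad to 64 bytes */
};

#define XFS_SCRUB_TYPE_TEST		0
#define XFS_SCRUB_FLAG_REPAIR		(1 << 0)	/* in */
#define XFS_SCRUB_FLAG_CORRUPT		(1 << 1)	/* out */
#define XFS_SCRUB_FLAG_PREEN		(1 << 2)	/* out */
#define XFS_SCRUB_FLAG_XFAIL		(1 << 3)	/* out */
#define XFS_SCRUB_FLAG_XCORRUPT		(1 << 4)	/* out */
#define XFS_SCRUB_FLAG_INCOMPLETE	(1 << 5)	/* out */
#define XFS_SCRUB_FLAG_WARNING		(1 << 6)	/* out */
#define XFS_SCRUB_FLAGS_IN	(XFS_SCRUB_FLAG_REPAIR)
#define XFS_SCRUB_FLAGS_OUT	(XFS_SCRUB_FLAG_CORRUPT | \
				 XFS_SCRUB_FLAG_PREEN | \
				 XFS_SCRUB_FLAG_XFAIL | \
				 XFS_SCRUB_FLAG_XCORRUPT | \
				 XFS_SCRUB_FLAG_INCOMPLETE | \
				 XFS_SCRUB_FLAG_WARNING)
#define XFS_SCRUB_FLAGS_ALL	(XFS_SCRUB_FLAGS_IN | XFS_SCRUB_FLAGS_OUT)
#define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)

_Static_assert(sizeof(struct xfs_scrub_metadata) == 64,
	       "scrub request should be padded to 64 bytes");

/* Hypothetical helper: build a check-only request for per-AG metadata. */
static void scrub_req_init(struct xfs_scrub_metadata *sm, __u32 type,
			   __u32 agno)
{
	assert(sm != NULL);
	memset(sm, 0, sizeof(*sm));	/* clears flags and reserved fields */
	sm->sm_type = type;
	sm->sm_agno = agno;
}

/*
 * Hypothetical helper: reduce the output flags to one word.  The
 * ordering (corruption first, then incomplete scans, then preen, then
 * warnings) is an assumption of this sketch.
 */
static const char *scrub_verdict(__u32 flags)
{
	if (flags & (XFS_SCRUB_FLAG_CORRUPT | XFS_SCRUB_FLAG_XCORRUPT))
		return "corrupt";
	if (flags & (XFS_SCRUB_FLAG_XFAIL | XFS_SCRUB_FLAG_INCOMPLETE))
		return "incomplete";
	if (flags & XFS_SCRUB_FLAG_PREEN)
		return "preen";
	if (flags & XFS_SCRUB_FLAG_WARNING)
		return "warning";
	return "clean";
}
```

A caller would pass the initialized struct to
ioctl(fd, XFS_IOC_SCRUB_METADATA, &sm) and then feed sm.sm_flags to
scrub_verdict().  Zeroing the whole struct first keeps the reserved
fields clear.
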
 fs/xfs/Kconfig           |   17 +
 fs/xfs/Makefile          |    7 +
 fs/xfs/libxfs/xfs_fs.h   |   41 ++++
 fs/xfs/scrub/common.c    |  533 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.h    |  179 +++++++++++++++
 fs/xfs/scrub/xfs_scrub.h |   29 +++
 fs/xfs/xfs_ioctl.c       |   28 ++
 fs/xfs/xfs_ioctl32.c     |    1 
 fs/xfs/xfs_trace.h       |    7 +
 9 files changed, 841 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/common.c
 create mode 100644 fs/xfs/scrub/common.h
 create mode 100644 fs/xfs/scrub/xfs_scrub.h


diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
index 1b98cfa..f42fcf1 100644
--- a/fs/xfs/Kconfig
+++ b/fs/xfs/Kconfig
@@ -71,6 +71,23 @@ config XFS_RT
 
 	  If unsure, say N.
 
+config XFS_ONLINE_SCRUB
+	bool "XFS online metadata check support"
+	default n
+	depends on XFS_FS
+	help
+	  If you say Y here you will be able to check metadata on a
+	  mounted XFS filesystem.  This feature is intended to reduce
+	  filesystem downtime by supplementing xfs_repair.  The key
+	  advantage here is to look for problems proactively so that
+	  they can be dealt with in a controlled manner.
+
+	  This feature is considered EXPERIMENTAL.  Use with caution!
+
+	  See the xfs_scrub man page in section 8 for additional information.
+
+	  If unsure, say N.
+
 config XFS_WARN
 	bool "XFS Verbose Warnings"
 	depends on XFS_FS && !XFS_DEBUG
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 5b959ee..c4fdaa2 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -136,3 +136,10 @@ xfs-$(CONFIG_XFS_POSIX_ACL)	+= xfs_acl.o
 xfs-$(CONFIG_SYSCTL)		+= xfs_sysctl.o
 xfs-$(CONFIG_COMPAT)		+= xfs_ioctl32.o
 xfs-$(CONFIG_EXPORTFS_BLOCK_OPS)	+= xfs_pnfs.o
+
+# online scrub/repair
+ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
+xfs-y				+= $(addprefix scrub/, \
+				   common.o \
+				   )
+endif
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 5dedab9..aeccc99 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -468,6 +468,46 @@ typedef struct xfs_swapext
 #define XFS_FSOP_GOING_FLAGS_LOGFLUSH		0x1	/* flush log but not data */
 #define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH		0x2	/* don't flush log nor data */
 
+/* metadata scrubbing */
+struct xfs_scrub_metadata {
+	__u32 sm_type;		/* What to check? */
+	__u32 sm_flags;		/* flags; see below. */
+	__u64 sm_ino;		/* inode number. */
+	__u32 sm_gen;		/* inode generation. */
+	__u32 sm_agno;		/* ag number. */
+	__u64 sm_reserved[5];	/* pad to 64 bytes */
+};
+
+/*
+ * Metadata types and flags for scrub operation.
+ */
+#define XFS_SCRUB_TYPE_TEST	0	/* dummy to test ioctl */
+#define XFS_SCRUB_TYPE_MAX	0
+
+/* i: repair this metadata */
+#define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
+/* o: metadata object needs repair */
+#define XFS_SCRUB_FLAG_CORRUPT		(1 << 1)
+/* o: metadata object could be optimized */
+#define XFS_SCRUB_FLAG_PREEN		(1 << 2)
+/* o: cross-referencing failed */
+#define XFS_SCRUB_FLAG_XFAIL		(1 << 3)
+/* o: metadata object disagrees with cross-referenced metadata */
+#define XFS_SCRUB_FLAG_XCORRUPT		(1 << 4)
+/* o: scan was not complete */
+#define XFS_SCRUB_FLAG_INCOMPLETE	(1 << 5)
+/* o: metadata object looked funny but isn't corrupt */
+#define XFS_SCRUB_FLAG_WARNING		(1 << 6)
+
+#define XFS_SCRUB_FLAGS_IN	(XFS_SCRUB_FLAG_REPAIR)
+#define XFS_SCRUB_FLAGS_OUT	(XFS_SCRUB_FLAG_CORRUPT | \
+				 XFS_SCRUB_FLAG_PREEN | \
+				 XFS_SCRUB_FLAG_XFAIL | \
+				 XFS_SCRUB_FLAG_XCORRUPT | \
+				 XFS_SCRUB_FLAG_INCOMPLETE | \
+				 XFS_SCRUB_FLAG_WARNING)
+#define XFS_SCRUB_FLAGS_ALL	(XFS_SCRUB_FLAGS_IN | XFS_SCRUB_FLAGS_OUT)
+
 /*
  * AG reserved block counters
  */
@@ -520,6 +560,7 @@ struct xfs_fsop_ag_resblks {
 #define XFS_IOC_ZERO_RANGE	_IOW ('X', 57, struct xfs_flock64)
 #define XFS_IOC_FREE_EOFBLOCKS	_IOR ('X', 58, struct xfs_fs_eofblocks)
 /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
+#define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
 
 /*
  * ioctl commands that replace IRIX syssgi()'s
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
new file mode 100644
index 0000000..6931793
--- /dev/null
+++ b/fs/xfs/scrub/common.c
@@ -0,0 +1,533 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_alloc.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_refcount.h"
+#include "xfs_refcount_btree.h"
+#include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/common.h"
+
+/*
+ * Online Scrub and Repair
+ *
+ * Traditionally, XFS (the kernel driver) did not know how to check or
+ * repair on-disk data structures.  That task was left to the xfs_check
+ * and xfs_repair tools, both of which require taking the filesystem
+ * offline for a thorough but time consuming examination.  Online
+ * scrub & repair, on the other hand, enables us to check the metadata
+ * for obvious errors while carefully stepping around the filesystem's
+ * ongoing operations, locking rules, etc.
+ *
+ * Given that most XFS metadata consist of records stored in a btree,
+ * most of the checking functions iterate the btree blocks themselves
+ * looking for irregularities.  When a record block is encountered, each
+ * record can be checked for obviously bad values.  Record values can
+ * also be cross-referenced against other btrees to look for potential
+ * misunderstandings between pieces of metadata.
+ *
+ * It is expected that the checkers responsible for per-AG metadata
+ * structures will lock the AG headers (AGI, AGF, AGFL), iterate the
+ * metadata structure, and perform any relevant cross-referencing before
+ * unlocking the AG and returning the results to userspace.  These
+ * scrubbers must not keep an AG locked for too long to avoid tying up
+ * the block and inode allocators.
+ *
+ * Block maps and b-trees rooted in an inode present a special challenge
+ * because they can involve extents from any AG.  The general scrubber
+ * structure of lock -> check -> xref -> unlock still holds, but AG
+ * locking order rules /must/ be obeyed to avoid deadlocks.  The
+ * ordering rule, of course, is that we must lock in increasing AG
+ * order.  Helper functions are provided to track which AG headers we've
+ * already locked.  If we detect an imminent locking order violation, we
+ * can signal a potential deadlock, in which case the scrubber can jump
+ * out to the top level, lock all the AGs in order, and retry the scrub.
+ *
+ * For file data (directories, extended attributes, symlinks) scrub, we
+ * can simply lock the inode and walk the data.  For btree data
+ * (directories and attributes) we follow the same btree-scrubbing
+ * strategy outlined previously to check the records.
+ *
+ * We use a bit of trickery with transactions to avoid buffer deadlocks
+ * if there is a cycle in the metadata.  The basic problem is that
+ * travelling down a btree involves locking the current buffer at each
+ * tree level.  If a pointer should somehow point back to a buffer that
+ * we've already examined, we will deadlock due to the second buffer
+ * locking attempt.  Note however that grabbing a buffer in transaction
+ * context links the locked buffer to the transaction.  If we try to
+ * re-grab the buffer in the context of the same transaction, we avoid
+ * the second lock attempt and continue.  Between the verifier and the
+ * scrubber, something will notice that something is amiss and report
+ * the corruption.  Therefore, each scrubber will allocate an empty
+ * transaction, attach buffers to it, and cancel the transaction at the
+ * end of the scrub run.  Cancelling a non-dirty transaction simply
+ * unlocks the buffers.
+ *
+ * Scrub communicates two kinds of results to userspace.  The first is
+ * the error code (errno), which reports operational errors encountered
+ * while performing the scrub.  The second is the set of output flags
+ * (XFS_SCRUB_FLAGS_OUT) returned in sm_flags.  For example, if the
+ * data structure itself is corrupt, the CORRUPT flag will be set.  If
+ * the metadata is correct but otherwise suboptimal, the PREEN flag
+ * will be set.
+ */
+
+struct xfs_scrub_meta_fns {
+	int		(*setup)(struct xfs_scrub_context *,
+				 struct xfs_inode *);
+	int		(*scrub)(struct xfs_scrub_context *);
+	bool		(*has)(struct xfs_sb *);
+};
+
+/* Check for operational errors. */
+bool
+xfs_scrub_op_ok(
+	struct xfs_scrub_context	*sc,
+	xfs_agnumber_t			agno,
+	xfs_agblock_t			bno,
+	const char			*type,
+	int				*error,
+	const char			*func,
+	int				line)
+{
+	struct xfs_mount		*mp = sc->mp;
+
+	switch (*error) {
+	case 0:
+		return true;
+	case -EDEADLOCK:
+		/* Used to restart an op with deadlock avoidance. */
+		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
+		break;
+	case -EFSBADCRC:
+	case -EFSCORRUPTED:
+		/* Note the badness but don't abort. */
+		sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
+		*error = 0;
+		/* fall through */
+	default:
+		trace_xfs_scrub_op_error(mp, agno, bno, type, *error, func,
+				line);
+		break;
+	}
+	return false;
+}
+
+/* Check for operational errors for a file offset. */
+bool
+xfs_scrub_file_op_ok(
+	struct xfs_scrub_context	*sc,
+	int				whichfork,
+	xfs_fileoff_t			offset,
+	const char			*type,
+	int				*error,
+	const char			*func,
+	int				line)
+{
+	switch (*error) {
+	case 0:
+		return true;
+	case -EDEADLOCK:
+		/* Used to restart an op with deadlock avoidance. */
+		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
+		break;
+	case -EFSBADCRC:
+	case -EFSCORRUPTED:
+		/* Note the badness but don't abort. */
+		sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
+		*error = 0;
+		/* fall through */
+	default:
+		trace_xfs_scrub_file_op_error(sc->ip, whichfork, offset, type,
+				*error, func, line);
+		break;
+	}
+	return false;
+}
+
+/* Check for metadata block optimization possibilities. */
+bool
+xfs_scrub_block_preen(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp,
+	const char			*type,
+	bool				fs_ok,
+	const char			*check,
+	const char			*func,
+	int				line)
+{
+	struct xfs_mount		*mp = sc->mp;
+	xfs_fsblock_t			fsbno;
+	xfs_agnumber_t			agno;
+	xfs_agblock_t			bno;
+
+	if (fs_ok)
+		return fs_ok;
+
+	fsbno = XFS_DADDR_TO_FSB(mp, bp->b_bn);
+	agno = XFS_FSB_TO_AGNO(mp, fsbno);
+	bno = XFS_FSB_TO_AGBNO(mp, fsbno);
+
+	sc->sm->sm_flags |= XFS_SCRUB_FLAG_PREEN;
+	trace_xfs_scrub_block_preen(mp, agno, bno, type, check, func, line);
+	return fs_ok;
+}
+
+/* Check for metadata block corruption. */
+bool
+xfs_scrub_block_ok(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp,
+	const char			*type,
+	bool				fs_ok,
+	const char			*check,
+	const char			*func,
+	int				line)
+{
+	struct xfs_mount		*mp = sc->mp;
+	xfs_fsblock_t			fsbno;
+	xfs_agnumber_t			agno;
+	xfs_agblock_t			bno;
+
+	if (fs_ok)
+		return fs_ok;
+
+	fsbno = XFS_DADDR_TO_FSB(mp, bp->b_bn);
+	agno = XFS_FSB_TO_AGNO(mp, fsbno);
+	bno = XFS_FSB_TO_AGBNO(mp, fsbno);
+
+	sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
+	trace_xfs_scrub_block_error(mp, agno, bno, type, check, func, line);
+	return fs_ok;
+}
+
+/* Check for inode metadata corruption. */
+bool
+xfs_scrub_ino_ok(
+	struct xfs_scrub_context	*sc,
+	xfs_ino_t			ino,
+	struct xfs_buf			*bp,
+	const char			*type,
+	bool				fs_ok,
+	const char			*check,
+	const char			*func,
+	int				line)
+{
+	struct xfs_inode		*ip = sc->ip;
+	struct xfs_mount		*mp = sc->mp;
+	xfs_fsblock_t			fsbno;
+	xfs_agnumber_t			agno;
+	xfs_agblock_t			bno;
+
+	if (fs_ok)
+		return fs_ok;
+
+	if (bp) {
+		fsbno = XFS_DADDR_TO_FSB(mp, bp->b_bn);
+		agno = XFS_FSB_TO_AGNO(mp, fsbno);
+		bno = XFS_FSB_TO_AGBNO(mp, fsbno);
+	} else {
+		agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
+		bno = XFS_INO_TO_AGINO(mp, ip->i_ino);
+	}
+
+	sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
+	trace_xfs_scrub_ino_error(mp, ino, agno, bno, type, check, func, line);
+	return fs_ok;
+}
+
+/* Check for inode metadata optimization possibilities. */
+bool
+xfs_scrub_ino_preen(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp,
+	const char			*type,
+	bool				fs_ok,
+	const char			*check,
+	const char			*func,
+	int				line)
+{
+	struct xfs_inode		*ip = sc->ip;
+	struct xfs_mount		*mp = sc->mp;
+	xfs_fsblock_t			fsbno;
+	xfs_agnumber_t			agno;
+	xfs_agblock_t			bno;
+
+	if (fs_ok)
+		return fs_ok;
+
+	if (bp) {
+		fsbno = XFS_DADDR_TO_FSB(mp, bp->b_bn);
+		agno = XFS_FSB_TO_AGNO(mp, fsbno);
+		bno = XFS_FSB_TO_AGBNO(mp, fsbno);
+	} else {
+		agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
+		bno = XFS_INO_TO_AGINO(mp, ip->i_ino);
+	}
+
+	sc->sm->sm_flags |= XFS_SCRUB_FLAG_PREEN;
+	trace_xfs_scrub_ino_preen(mp, ip->i_ino, agno, bno, type, check,
+			func, line);
+	return fs_ok;
+}
+
+/* Check for file data block corruption. */
+bool
+xfs_scrub_data_ok(
+	struct xfs_scrub_context	*sc,
+	int				whichfork,
+	xfs_fileoff_t			offset,
+	const char			*type,
+	bool				fs_ok,
+	const char			*check,
+	const char			*func,
+	int				line)
+{
+	if (fs_ok)
+		return fs_ok;
+
+	sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
+	trace_xfs_scrub_data_error(sc->ip, whichfork, offset, type, check,
+			func, line);
+	return fs_ok;
+}
+
+/* Check for file data block non-corruption problems. */
+bool
+xfs_scrub_data_warn_ok(
+	struct xfs_scrub_context	*sc,
+	int				whichfork,
+	xfs_fileoff_t			offset,
+	const char			*type,
+	bool				fs_ok,
+	const char			*check,
+	const char			*func,
+	int				line)
+{
+	if (fs_ok)
+		return fs_ok;
+
+	sc->sm->sm_flags |= XFS_SCRUB_FLAG_WARNING;
+	trace_xfs_scrub_data_warning(sc->ip, whichfork, offset, type, check,
+			func, line);
+	return fs_ok;
+}
+
+/* Signal an incomplete scrub. */
+bool
+xfs_scrub_incomplete(
+	struct xfs_scrub_context	*sc,
+	const char			*type,
+	bool				fs_ok,
+	const char			*check,
+	const char			*func,
+	int				line)
+{
+	if (fs_ok)
+		return fs_ok;
+
+	sc->sm->sm_flags |= XFS_SCRUB_FLAG_INCOMPLETE;
+	trace_xfs_scrub_incomplete(sc->mp, type, check, func, line);
+	return fs_ok;
+}
+
+/* Dummy scrubber */
+
+int
+xfs_scrub_dummy(
+	struct xfs_scrub_context	*sc)
+{
+	if (sc->sm->sm_ino || sc->sm->sm_agno)
+		return -EINVAL;
+	if (sc->sm->sm_gen & XFS_SCRUB_FLAG_CORRUPT)
+		sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
+	if (sc->sm->sm_gen & XFS_SCRUB_FLAG_PREEN)
+		sc->sm->sm_flags |= XFS_SCRUB_FLAG_PREEN;
+	if (sc->sm->sm_gen & XFS_SCRUB_FLAG_XFAIL)
+		sc->sm->sm_flags |= XFS_SCRUB_FLAG_XFAIL;
+	if (sc->sm->sm_gen & XFS_SCRUB_FLAG_XCORRUPT)
+		sc->sm->sm_flags |= XFS_SCRUB_FLAG_XCORRUPT;
+	if (sc->sm->sm_gen & ~XFS_SCRUB_FLAGS_OUT)
+		return -ENOENT;
+
+	return 0;
+}
+
+/* Per-scrubber setup functions */
+
+/* Set us up with a transaction and an empty context. */
+int
+xfs_scrub_setup_fs(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_trans_alloc(sc->sm, sc->mp,
+			&M_RES(sc->mp)->tr_itruncate, 0, 0, 0, &sc->tp);
+}
+
+/* Scrub setup and teardown */
+
+/* Free all the resources and finish the transactions. */
+STATIC int
+xfs_scrub_teardown(
+	struct xfs_scrub_context	*sc,
+	int				error)
+{
+	if (sc->tp) {
+		xfs_trans_cancel(sc->tp);
+		sc->tp = NULL;
+	}
+	return error;
+}
+
+/* Perform common scrub context initialization. */
+STATIC int
+xfs_scrub_setup(
+	struct xfs_inode		*ip,
+	struct xfs_scrub_context	*sc,
+	const struct xfs_scrub_meta_fns	*fns,
+	struct xfs_scrub_metadata	*sm,
+	bool				try_harder)
+{
+	memset(sc, 0, sizeof(*sc));
+	sc->mp = ip->i_mount;
+	sc->sm = sm;
+	sc->fns = fns;
+	sc->try_harder = try_harder;
+
+	return sc->fns->setup(sc, ip);
+}
+
+/* Scrubbing dispatch. */
+
+static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
+	{ /* dummy verifier */
+		.setup	= xfs_scrub_setup_fs,
+		.scrub	= xfs_scrub_dummy,
+	},
+};
+
+/* Dispatch metadata scrubbing. */
+int
+xfs_scrub_metadata(
+	struct xfs_inode		*ip,
+	struct xfs_scrub_metadata	*sm)
+{
+	struct xfs_scrub_context	sc;
+	struct xfs_mount		*mp = ip->i_mount;
+	const struct xfs_scrub_meta_fns	*fns;
+	bool				try_harder = false;
+	int				error = 0;
+
+	trace_xfs_scrub(ip, sm, error);
+
+	/* Forbidden if we are shut down or mounted norecovery. */
+	error = -ESHUTDOWN;
+	if (XFS_FORCED_SHUTDOWN(mp))
+		goto out;
+	error = -ENOTRECOVERABLE;
+	if (mp->m_flags & XFS_MOUNT_NORECOVERY)
+		goto out;
+
+	/* Check our inputs. */
+	error = -EINVAL;
+	sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT;
+	if (sm->sm_flags & ~XFS_SCRUB_FLAGS_IN)
+		goto out;
+	if (memchr_inv(sm->sm_reserved, 0, sizeof(sm->sm_reserved)))
+		goto out;
+
+	/* Do we know about this type of metadata? */
+	error = -ENOENT;
+	if (sm->sm_type > XFS_SCRUB_TYPE_MAX)
+		goto out;
+	fns = &meta_scrub_fns[sm->sm_type];
+	if (fns->scrub == NULL)
+		goto out;
+
+	/* Does this fs even support this type of metadata? */
+	if (fns->has && !fns->has(&mp->m_sb))
+		goto out;
+
+	/* We don't know how to repair anything yet. */
+	error = -EOPNOTSUPP;
+	if (sm->sm_flags & XFS_SCRUB_FLAG_REPAIR)
+		goto out;
+
+	/* This isn't a stable feature.  Use with care. */
+	{
+		static bool warned;
+
+		if (!warned)
+			xfs_alert(mp,
+	"EXPERIMENTAL online scrub feature in use. Use at your own risk!");
+		warned = true;
+	}
+
+retry_op:
+	/* Set up for the operation. */
+	error = xfs_scrub_setup(ip, &sc, fns, sm, try_harder);
+	if (error)
+		goto out_teardown;
+
+	/* Scrub for errors. */
+	error = fns->scrub(&sc);
+	if (!try_harder && error == -EDEADLOCK) {
+		/*
+		 * Scrubbers return -EDEADLOCK to mean 'try harder'.
+		 * Tear down everything we hold, then set up again with
+		 * preparation for worst-case scenarios.
+		 */
+		error = xfs_scrub_teardown(&sc, 0);
+		if (error)
+			goto out;
+		try_harder = true;
+		goto retry_op;
+	} else if (error)
+		goto out_teardown;
+
+	if (xfs_scrub_found_corruption(sm))
+		xfs_alert_ratelimited(mp, "Corruption detected during scrub.");
+
+out_teardown:
+	error = xfs_scrub_teardown(&sc, error);
+out:
+	trace_xfs_scrub_done(ip, sm, error);
+	return error;
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
new file mode 100644
index 0000000..4f3113a
--- /dev/null
+++ b/fs/xfs/scrub/common.h
@@ -0,0 +1,179 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef __XFS_REPAIR_COMMON_H__
+#define __XFS_REPAIR_COMMON_H__
+
+/* Did we find something broken? */
+static inline bool xfs_scrub_found_corruption(struct xfs_scrub_metadata *sm)
+{
+	return sm->sm_flags & (XFS_SCRUB_FLAG_CORRUPT |
+			       XFS_SCRUB_FLAG_XCORRUPT);
+}
+
+struct xfs_scrub_context {
+	/* General scrub state. */
+	struct xfs_mount		*mp;
+	struct xfs_scrub_metadata	*sm;
+	const struct xfs_scrub_meta_fns	*fns;
+	struct xfs_trans		*tp;
+	struct xfs_inode		*ip;
+	bool				try_harder;
+};
+
+/* Should we end the scrub early? */
+static inline bool
+xfs_scrub_should_terminate(
+	int		*error)
+{
+	if (fatal_signal_pending(current)) {
+		if (*error == 0)
+			*error = -EAGAIN;
+		return true;
+	}
+	return false;
+}
+
+/*
+ * Grab a transaction.  If we're going to repair something, we need to
+ * ensure there's enough reservation to make all the changes.  If not,
+ * we can use an empty transaction.
+ */
+static inline int
+xfs_scrub_trans_alloc(
+	struct xfs_scrub_metadata	*sm,
+	struct xfs_mount		*mp,
+	struct xfs_trans_res		*resp,
+	uint				blocks,
+	uint				rtextents,
+	uint				flags,
+	struct xfs_trans		**tpp)
+{
+	return xfs_trans_alloc_empty(mp, tpp);
+}
+
+/* Check for operational errors. */
+bool xfs_scrub_op_ok(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
+		     xfs_agblock_t bno, const char *type, int *error,
+		     const char *func, int line);
+#define XFS_SCRUB_OP_ERROR_GOTO(sc, agno, bno, type, error, label) \
+	do { \
+		if (!xfs_scrub_op_ok((sc), (agno), (bno), (type), \
+				(error), __func__, __LINE__)) \
+			goto label; \
+	} while (0)
+
+/* Check for operational errors for a file offset. */
+bool xfs_scrub_file_op_ok(struct xfs_scrub_context *sc, int whichfork,
+			  xfs_fileoff_t offset, const char *type,
+			  int *error, const char *func, int line);
+#define XFS_SCRUB_FILE_OP_ERROR_GOTO(sc, which, off, type, error, label) \
+	do { \
+		if (!xfs_scrub_file_op_ok((sc), (which), (off), (type), \
+				(error), __func__, __LINE__)) \
+			goto label; \
+	} while (0)
+
+/* Check for metadata block optimization possibilities. */
+bool xfs_scrub_block_preen(struct xfs_scrub_context *sc, struct xfs_buf *bp,
+			   const char *type, bool fs_ok, const char *check,
+			   const char *func, int line);
+#define XFS_SCRUB_PREEN(sc, bp, type, fs_ok) \
+	xfs_scrub_block_preen((sc), (bp), (type), (fs_ok), #fs_ok, \
+			__func__, __LINE__)
+
+/* Check for inode metadata optimization possibilities. */
+bool xfs_scrub_ino_preen(struct xfs_scrub_context *sc, struct xfs_buf *bp,
+		      const char *type, bool fs_ok, const char *check,
+		      const char *func, int line);
+#define XFS_SCRUB_INO_PREEN(sc, bp, type, fs_ok) \
+	xfs_scrub_ino_preen((sc), (bp), (type), (fs_ok), #fs_ok, \
+			__func__, __LINE__)
+
+/* Check for metadata block corruption. */
+bool xfs_scrub_block_ok(struct xfs_scrub_context *sc, struct xfs_buf *bp,
+			const char *type, bool fs_ok, const char *check,
+			const char *func, int line);
+#define XFS_SCRUB_CHECK(sc, bp, type, fs_ok) \
+	xfs_scrub_block_ok((sc), (bp), (type), (fs_ok), #fs_ok, \
+			__func__, __LINE__)
+#define XFS_SCRUB_GOTO(sc, bp, type, fs_ok, label) \
+	do { \
+		if (!xfs_scrub_block_ok((sc), (bp), (type), (fs_ok), \
+				#fs_ok, __func__, __LINE__)) \
+			goto label; \
+	} while (0)
+
+/* Check for inode metadata corruption. */
+bool xfs_scrub_ino_ok(struct xfs_scrub_context *sc, xfs_ino_t ino,
+		      struct xfs_buf *bp, const char *type, bool fs_ok,
+		      const char *check, const char *func, int line);
+#define XFS_SCRUB_INO_CHECK(sc, ino, bp, type, fs_ok) \
+	xfs_scrub_ino_ok((sc), (ino), (bp), (type), (fs_ok), #fs_ok, \
+			__func__, __LINE__)
+#define XFS_SCRUB_INO_GOTO(sc, ino, bp, type, fs_ok, label) \
+	do { \
+		if (!xfs_scrub_ino_ok((sc), (ino), (bp), (type), (fs_ok), \
+				#fs_ok, __func__, __LINE__)) \
+			goto label; \
+	} while (0)
+
+/* Check for file data block corruption. */
+bool xfs_scrub_data_ok(struct xfs_scrub_context *sc, int whichfork,
+		       xfs_fileoff_t offset, const char *type, bool fs_ok,
+		       const char *check, const char *func, int line);
+#define XFS_SCRUB_DATA_CHECK(sc, whichfork, offset, type, fs_ok) \
+	xfs_scrub_data_ok((sc), (whichfork), (offset), (type), (fs_ok), \
+			#fs_ok, __func__, __LINE__)
+#define XFS_SCRUB_DATA_GOTO(sc, whichfork, offset, type, fs_ok, label) \
+	do { \
+		if (!xfs_scrub_data_ok((sc), (whichfork), (offset), \
+				(type), (fs_ok), #fs_ok, __func__, __LINE__)) \
+			goto label; \
+	} while (0)
+
+/* Check for file data block non-corruption problems. */
+bool xfs_scrub_data_warn_ok(struct xfs_scrub_context *sc, int whichfork,
+			    xfs_fileoff_t offset, const char *type, bool fs_ok,
+			    const char *check, const char *func, int line);
+#define XFS_SCRUB_DATA_WARN(sc, whichfork, offset, type, fs_ok) \
+	xfs_scrub_data_warn_ok((sc), (whichfork), (offset), (type), (fs_ok), \
+			#fs_ok, __func__, __LINE__)
+
+/* Signal an incomplete scrub. */
+bool xfs_scrub_incomplete(struct xfs_scrub_context *sc, const char *type,
+			  bool fs_ok, const char *check, const char *func,
+			  int line);
+#define XFS_SCRUB_INCOMPLETE(sc, type, fs_ok) \
+	xfs_scrub_incomplete((sc), (type), (fs_ok), \
+			#fs_ok, __func__, __LINE__)
+
+/* Setup functions */
+
+#define SETUP_FN(name) int name(struct xfs_scrub_context *sc, struct xfs_inode *ip)
+SETUP_FN(xfs_scrub_setup_fs);
+#undef SETUP_FN
+
+/* Metadata scrubbers */
+
+#define SCRUB_FN(name) int name(struct xfs_scrub_context *sc)
+SCRUB_FN(xfs_scrub_dummy);
+#undef SCRUB_FN
+
+#endif	/* __XFS_REPAIR_COMMON_H__ */
diff --git a/fs/xfs/scrub/xfs_scrub.h b/fs/xfs/scrub/xfs_scrub.h
new file mode 100644
index 0000000..e00e0ea
--- /dev/null
+++ b/fs/xfs/scrub/xfs_scrub.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef __XFS_SCRUB_H__
+#define __XFS_SCRUB_H__
+
+#ifndef CONFIG_XFS_ONLINE_SCRUB
+# define xfs_scrub_metadata(ip, sm)	(-ENOTTY)
+#else
+int xfs_scrub_metadata(struct xfs_inode *ip, struct xfs_scrub_metadata *sm);
+#endif /* CONFIG_XFS_ONLINE_SCRUB */
+
+#endif	/* __XFS_SCRUB_H__ */
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index cc00260..87b3874 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -44,6 +44,7 @@
 #include "xfs_btree.h"
 #include <linux/fsmap.h>
 #include "xfs_fsmap.h"
+#include "scrub/xfs_scrub.h"
 
 #include <linux/capability.h>
 #include <linux/cred.h>
@@ -1689,6 +1690,30 @@ xfs_ioc_getfsmap(
 	return 0;
 }
 
+STATIC int
+xfs_ioc_scrub_metadata(
+	struct xfs_inode		*ip,
+	void				__user *arg)
+{
+	struct xfs_scrub_metadata	scrub;
+	int				error;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (copy_from_user(&scrub, arg, sizeof(scrub)))
+		return -EFAULT;
+
+	error = xfs_scrub_metadata(ip, &scrub);
+	if (error)
+		return error;
+
+	if (copy_to_user(arg, &scrub, sizeof(scrub)))
+		return -EFAULT;
+
+	return 0;
+}
+
 int
 xfs_ioc_swapext(
 	xfs_swapext_t	*sxp)
@@ -1872,6 +1897,9 @@ xfs_file_ioctl(
 	case FS_IOC_GETFSMAP:
 		return xfs_ioc_getfsmap(ip, arg);
 
+	case XFS_IOC_SCRUB_METADATA:
+		return xfs_ioc_scrub_metadata(ip, arg);
+
 	case XFS_IOC_FD_TO_HANDLE:
 	case XFS_IOC_PATH_TO_HANDLE:
 	case XFS_IOC_PATH_TO_FSHANDLE: {
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index e8b4de3..972d4bd 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -557,6 +557,7 @@ xfs_file_compat_ioctl(
 	case XFS_IOC_ERROR_CLEARALL:
 	case FS_IOC_GETFSMAP:
 	case XFS_IOC_GET_AG_RESBLKS:
+	case XFS_IOC_SCRUB_METADATA:
 		return xfs_file_ioctl(filp, cmd, p);
 #ifndef BROKEN_X86_ALIGNMENT
 	/* These are handled fine if no alignment issues */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 2e7e193..d4de29b 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3312,7 +3312,7 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
 
 /* scrub */
 #define XFS_SCRUB_TYPE_DESC \
-	{ 0, NULL }
+	{ XFS_SCRUB_TYPE_TEST,		"dummy" }
 DECLARE_EVENT_CLASS(xfs_scrub_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
 		 int error),
@@ -3330,6 +3330,11 @@ DECLARE_EVENT_CLASS(xfs_scrub_class,
 	TP_fast_assign(
 		__entry->dev = ip->i_mount->m_super->s_dev;
 		__entry->ino = ip->i_ino;
+		__entry->type = sm->sm_type;
+		__entry->agno = sm->sm_agno;
+		__entry->inum = sm->sm_ino;
+		__entry->gen = sm->sm_gen;
+		__entry->flags = sm->sm_flags;
 		__entry->error = error;
 	),
 	TP_printk("dev %d:%d ino %llu type %s agno %u inum %llu gen %u flags 0x%x error %d",



* [PATCH 04/22] xfs: generic functions to scrub metadata and btrees
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
                   ` (2 preceding siblings ...)
  2017-07-21  4:38 ` [PATCH 03/22] xfs: create an ioctl to scrub AG metadata Darrick J. Wong
@ 2017-07-21  4:38 ` Darrick J. Wong
  2017-07-23 16:40   ` Allison Henderson
  2017-07-24  1:05   ` Dave Chinner
  2017-07-21  4:39 ` [PATCH 05/22] xfs: scrub in-memory metadata buffers Darrick J. Wong
                   ` (17 subsequent siblings)
  21 siblings, 2 replies; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:38 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create a function that walks a btree, checking the integrity of each
btree block (headers, keys, records) and calling back to the caller
to perform further checks on the records.  Add some helper functions
so that we report detailed scrub errors in a uniform manner in dmesg.
These are helper functions for subsequent patches.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
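For readers following along, here is a minimal userspace sketch of the
check-helper pattern that this patch extends to btrees.  It is not the
kernel code: demo_scrub_context, DEMO_FLAG_CORRUPT, DEMO_CHECK and
demo_scrub_records are hypothetical names invented for illustration,
fprintf stands in for the tracepoints, and the strict "less than"
comparison is an assumed stand-in for the per-btree recs_inorder
callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for XFS_SCRUB_FLAG_CORRUPT. */
#define DEMO_FLAG_CORRUPT	(1u << 0)

/* Hypothetical, much reduced model of the scrub context. */
struct demo_scrub_context {
	unsigned int	flags;		/* models sc->sm->sm_flags */
	unsigned int	nr_failed;	/* number of failed checks */
};

/* Record a failed check; same shape as the *_ok() helpers. */
static bool
demo_check_ok(
	struct demo_scrub_context	*sc,
	bool				fs_ok,
	const char			*check,
	const char			*func,
	int				line)
{
	assert(sc != NULL);
	if (fs_ok)
		return fs_ok;

	sc->flags |= DEMO_FLAG_CORRUPT;
	sc->nr_failed++;
	/* The kernel helpers emit a tracepoint here; we print instead. */
	fprintf(stderr, "%s:%d: check failed: %s\n", func, line, check);
	return fs_ok;
}

/* Stringify the predicate so the report names the exact failed test. */
#define DEMO_CHECK(sc, fs_ok) \
	demo_check_ok((sc), (fs_ok), #fs_ok, __func__, __LINE__)

/*
 * Walk an array of keys as if it were a single btree leaf.  Every
 * record after the first must sort strictly after its predecessor;
 * a failure is noted in the context but does not stop the walk.
 */
static void
demo_scrub_records(
	struct demo_scrub_context	*sc,
	const unsigned int		*recs,
	size_t				nr)
{
	bool		firstrec = true;
	unsigned int	lastrec = 0;
	size_t		i;

	for (i = 0; i < nr; i++) {
		DEMO_CHECK(sc, firstrec || lastrec < recs[i]);
		firstrec = false;
		lastrec = recs[i];
	}
}
```

Calling demo_scrub_records() on { 1, 9, 5, 7 } should flag the context
once, for the 9 -> 5 step, and keep walking.  Noting the problem and
carrying on mirrors how the non-GOTO check macros in this series behave.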
 fs/xfs/Makefile       |    1 
 fs/xfs/scrub/btree.c  |  658 +++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/btree.h  |   95 +++++++
 fs/xfs/scrub/common.c |  169 +++++++++++++
 fs/xfs/scrub/common.h |   30 ++
 5 files changed, 953 insertions(+)
 create mode 100644 fs/xfs/scrub/btree.c
 create mode 100644 fs/xfs/scrub/btree.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index c4fdaa2..4e04da9 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -140,6 +140,7 @@ xfs-$(CONFIG_EXPORTFS_BLOCK_OPS)	+= xfs_pnfs.o
 # online scrub/repair
 ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
 xfs-y				+= $(addprefix scrub/, \
+				   btree.o \
 				   common.o \
 				   )
 endif
diff --git a/fs/xfs/scrub/btree.c b/fs/xfs/scrub/btree.c
new file mode 100644
index 0000000..452e70a
--- /dev/null
+++ b/fs/xfs/scrub/btree.c
@@ -0,0 +1,658 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_alloc.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+
+/* btree scrubbing */
+
+const char * const btree_types[] = {
+	[XFS_BTNUM_BNO]		= "bnobt",
+	[XFS_BTNUM_CNT]		= "cntbt",
+	[XFS_BTNUM_RMAP]	= "rmapbt",
+	[XFS_BTNUM_BMAP]	= "bmapbt",
+	[XFS_BTNUM_INO]		= "inobt",
+	[XFS_BTNUM_FINO]	= "finobt",
+	[XFS_BTNUM_REFC]	= "refcountbt",
+};
+
+/* Format the trace parameters for the tree cursor. */
+static inline void
+xfs_scrub_btree_format(
+	struct xfs_btree_cur		*cur,
+	int				level,
+	char				*bt_type,
+	size_t				type_len,
+	char				*bt_ptr,
+	size_t				ptr_len,
+	xfs_fsblock_t			*fsbno)
+{
+	char				*type = NULL;
+	struct xfs_btree_block		*block;
+	struct xfs_buf			*bp;
+
+	switch (cur->bc_btnum) {
+	case XFS_BTNUM_BMAP:
+		switch (cur->bc_private.b.whichfork) {
+		case XFS_DATA_FORK:
+			type = "data";
+			break;
+		case XFS_ATTR_FORK:
+			type = "attr";
+			break;
+		case XFS_COW_FORK:
+			type = "CoW";
+			break;
+		}
+		snprintf(bt_type, type_len, "inode %llu %s fork",
+				(unsigned long long)cur->bc_private.b.ip->i_ino,
+				type);
+		break;
+	default:
+		strlcpy(bt_type, btree_types[cur->bc_btnum], type_len);
+		break;
+	}
+
+	if (level < cur->bc_nlevels && cur->bc_ptrs[level] >= 1) {
+		block = xfs_btree_get_block(cur, level, &bp);
+		snprintf(bt_ptr, ptr_len, " %s %d/%d",
+				level == 0 ? "rec" : "ptr",
+				cur->bc_ptrs[level],
+				be16_to_cpu(block->bb_numrecs));
+	} else
+		bt_ptr[0] = 0;
+
+	if (level < cur->bc_nlevels && cur->bc_bufs[level])
+		*fsbno = XFS_DADDR_TO_FSB(cur->bc_mp,
+				cur->bc_bufs[level]->b_bn);
+	else if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
+		*fsbno = XFS_INO_TO_FSB(cur->bc_mp,
+				cur->bc_private.b.ip->i_ino);
+	else
+		*fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno, 0);
+}
+
+/* Check for btree corruption. */
+bool
+xfs_scrub_btree_ok(
+	struct xfs_scrub_context	*sc,
+	struct xfs_btree_cur		*cur,
+	int				level,
+	bool				fs_ok,
+	const char			*check,
+	const char			*func,
+	int				line)
+{
+	char				bt_ptr[24];
+	char				bt_type[48];
+	xfs_fsblock_t			fsbno;
+
+	if (fs_ok)
+		return fs_ok;
+
+	sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
+	xfs_scrub_btree_format(cur, level, bt_type, 48, bt_ptr, 24, &fsbno);
+
+	trace_xfs_scrub_btree_error(cur->bc_mp, bt_type, bt_ptr,
+			XFS_FSB_TO_AGNO(cur->bc_mp, fsbno),
+			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno),
+			check, func, line);
+	return fs_ok;
+}
+
+/* Check for btree operation errors. */
+bool
+xfs_scrub_btree_op_ok(
+	struct xfs_scrub_context	*sc,
+	struct xfs_btree_cur		*cur,
+	int				level,
+	int				*error,
+	const char			*func,
+	int				line)
+{
+	char				bt_ptr[24];
+	char				bt_type[48];
+	xfs_fsblock_t			fsbno;
+
+	if (*error == 0)
+		return true;
+
+	xfs_scrub_btree_format(cur, level, bt_type, 48, bt_ptr, 24, &fsbno);
+
+	return xfs_scrub_op_ok(sc,
+			XFS_FSB_TO_AGNO(cur->bc_mp, fsbno),
+			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno),
+			bt_type, error, func, line);
+}
+
+/*
+ * Make sure this record is in order and doesn't stray outside of the parent
+ * keys.
+ */
+STATIC int
+xfs_scrub_btree_rec(
+	struct xfs_scrub_btree	*bs)
+{
+	struct xfs_btree_cur	*cur = bs->cur;
+	union xfs_btree_rec	*rec;
+	union xfs_btree_key	key;
+	union xfs_btree_key	hkey;
+	union xfs_btree_key	*keyp;
+	struct xfs_btree_block	*block;
+	struct xfs_btree_block	*keyblock;
+	struct xfs_buf		*bp;
+
+	block = xfs_btree_get_block(cur, 0, &bp);
+	rec = xfs_btree_rec_addr(cur, cur->bc_ptrs[0], block);
+
+	if (bp)
+		trace_xfs_scrub_btree_rec(cur->bc_mp,
+				XFS_FSB_TO_AGNO(cur->bc_mp,
+					XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn)),
+				XFS_FSB_TO_AGBNO(cur->bc_mp,
+					XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn)),
+				cur->bc_btnum, 0, cur->bc_nlevels,
+				cur->bc_ptrs[0]);
+	else if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
+		trace_xfs_scrub_btree_rec(cur->bc_mp,
+				XFS_INO_TO_AGNO(cur->bc_mp,
+					cur->bc_private.b.ip->i_ino),
+				XFS_INO_TO_AGBNO(cur->bc_mp,
+					cur->bc_private.b.ip->i_ino),
+				cur->bc_btnum, 0, cur->bc_nlevels,
+				cur->bc_ptrs[0]);
+	else
+		trace_xfs_scrub_btree_rec(cur->bc_mp,
+				NULLAGNUMBER, NULLAGBLOCK,
+				cur->bc_btnum, 0, cur->bc_nlevels,
+				cur->bc_ptrs[0]);
+
+	/* If this isn't the first record, are they in order? */
+	XFS_SCRUB_BTREC_CHECK(bs, bs->firstrec ||
+			cur->bc_ops->recs_inorder(cur, &bs->lastrec, rec));
+	bs->firstrec = false;
+	memcpy(&bs->lastrec, rec, cur->bc_ops->rec_len);
+
+	if (cur->bc_nlevels == 1)
+		return 0;
+
+	/* Is this at least as large as the parent low key? */
+	cur->bc_ops->init_key_from_rec(&key, rec);
+	keyblock = xfs_btree_get_block(cur, 1, &bp);
+	keyp = xfs_btree_key_addr(cur, cur->bc_ptrs[1], keyblock);
+	XFS_SCRUB_BTKEY_CHECK(bs, 1,
+			cur->bc_ops->diff_two_keys(cur, &key, keyp) >= 0);
+
+	if (!(cur->bc_flags & XFS_BTREE_OVERLAPPING))
+		return 0;
+
+	/* Is this no larger than the parent high key? */
+	cur->bc_ops->init_high_key_from_rec(&hkey, rec);
+	keyp = xfs_btree_high_key_addr(cur, cur->bc_ptrs[1], keyblock);
+	XFS_SCRUB_BTKEY_CHECK(bs, 1,
+			cur->bc_ops->diff_two_keys(cur, keyp, &hkey) >= 0);
+
+	return 0;
+}
+
+/*
+ * Make sure this key is in order and doesn't stray outside of the parent
+ * keys.
+ */
+STATIC int
+xfs_scrub_btree_key(
+	struct xfs_scrub_btree	*bs,
+	int			level)
+{
+	struct xfs_btree_cur	*cur = bs->cur;
+	union xfs_btree_key	*key;
+	union xfs_btree_key	*keyp;
+	struct xfs_btree_block	*block;
+	struct xfs_btree_block	*keyblock;
+	struct xfs_buf		*bp;
+
+	block = xfs_btree_get_block(cur, level, &bp);
+	key = xfs_btree_key_addr(cur, cur->bc_ptrs[level], block);
+
+	if (bp)
+		trace_xfs_scrub_btree_key(cur->bc_mp,
+				XFS_FSB_TO_AGNO(cur->bc_mp,
+					XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn)),
+				XFS_FSB_TO_AGBNO(cur->bc_mp,
+					XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn)),
+				cur->bc_btnum, level, cur->bc_nlevels,
+				cur->bc_ptrs[level]);
+	else if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
+		trace_xfs_scrub_btree_key(cur->bc_mp,
+				XFS_INO_TO_AGNO(cur->bc_mp,
+					cur->bc_private.b.ip->i_ino),
+				XFS_INO_TO_AGBNO(cur->bc_mp,
+					cur->bc_private.b.ip->i_ino),
+				cur->bc_btnum, level, cur->bc_nlevels,
+				cur->bc_ptrs[level]);
+	else
+		trace_xfs_scrub_btree_key(cur->bc_mp,
+				NULLAGNUMBER, NULLAGBLOCK,
+				cur->bc_btnum, level, cur->bc_nlevels,
+				cur->bc_ptrs[level]);
+
+	/* If this isn't the first key, are they in order? */
+	XFS_SCRUB_BTKEY_CHECK(bs, level, bs->firstkey[level] ||
+			cur->bc_ops->keys_inorder(cur, &bs->lastkey[level],
+					key));
+	bs->firstkey[level] = false;
+	memcpy(&bs->lastkey[level], key, cur->bc_ops->key_len);
+
+	if (level + 1 >= cur->bc_nlevels)
+		return 0;
+
+	/* Is this at least as large as the parent low key? */
+	keyblock = xfs_btree_get_block(cur, level + 1, &bp);
+	keyp = xfs_btree_key_addr(cur, cur->bc_ptrs[level + 1], keyblock);
+	XFS_SCRUB_BTKEY_CHECK(bs, level,
+			cur->bc_ops->diff_two_keys(cur, key, keyp) >= 0);
+
+	if (!(cur->bc_flags & XFS_BTREE_OVERLAPPING))
+		return 0;
+
+	/* Is this no larger than the parent high key? */
+	key = xfs_btree_high_key_addr(cur, cur->bc_ptrs[level], block);
+	keyp = xfs_btree_high_key_addr(cur, cur->bc_ptrs[level + 1], keyblock);
+	XFS_SCRUB_BTKEY_CHECK(bs, level,
+			cur->bc_ops->diff_two_keys(cur, keyp, key) >= 0);
+
+	return 0;
+}
+
+/* Check a btree pointer. */
+static int
+xfs_scrub_btree_ptr(
+	struct xfs_scrub_btree		*bs,
+	int				level,
+	union xfs_btree_ptr		*ptr)
+{
+	struct xfs_btree_cur		*cur = bs->cur;
+	xfs_daddr_t			daddr;
+	xfs_daddr_t			eofs;
+
+	if ((cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) &&
+			level == cur->bc_nlevels) {
+		if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
+			XFS_SCRUB_BTKEY_GOTO(bs, level, ptr->l == 0, corrupt);
+		} else {
+			XFS_SCRUB_BTKEY_GOTO(bs, level, ptr->s == 0, corrupt);
+		}
+		return 0;
+	}
+
+	if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
+		XFS_SCRUB_BTKEY_GOTO(bs, level,
+				ptr->l != cpu_to_be64(NULLFSBLOCK), corrupt);
+
+		daddr = XFS_FSB_TO_DADDR(cur->bc_mp, be64_to_cpu(ptr->l));
+	} else {
+		XFS_SCRUB_BTKEY_GOTO(bs, level,
+				cur->bc_private.a.agno != NULLAGNUMBER, corrupt);
+		XFS_SCRUB_BTKEY_GOTO(bs, level,
+				ptr->s != cpu_to_be32(NULLAGBLOCK), corrupt);
+
+		daddr = XFS_AGB_TO_DADDR(cur->bc_mp, cur->bc_private.a.agno,
+				be32_to_cpu(ptr->s));
+	}
+	eofs = XFS_FSB_TO_BB(cur->bc_mp, cur->bc_mp->m_sb.sb_dblocks);
+	XFS_SCRUB_BTKEY_GOTO(bs, level, daddr != 0, corrupt);
+	XFS_SCRUB_BTKEY_GOTO(bs, level, daddr < eofs, corrupt);
+
+	return 0;
+
+corrupt:
+	return -EFSCORRUPTED;
+}
+
+/* Check the siblings of a long format btree block. */
+STATIC int
+xfs_scrub_btree_lblock_check_siblings(
+	struct xfs_scrub_btree		*bs,
+	struct xfs_btree_block		*block)
+{
+	struct xfs_btree_block		*pblock;
+	struct xfs_buf			*pbp;
+	struct xfs_btree_cur		*ncur = NULL;
+	union xfs_btree_ptr		*pp;
+	xfs_fsblock_t			leftsib;
+	xfs_fsblock_t			rightsib;
+	xfs_fsblock_t			fsbno;
+	int				level;
+	int				success;
+	int				error = 0;
+
+	leftsib = be64_to_cpu(block->bb_u.l.bb_leftsib);
+	rightsib = be64_to_cpu(block->bb_u.l.bb_rightsib);
+	level = xfs_btree_get_level(block);
+
+	/* Root block should never have siblings. */
+	if (level == bs->cur->bc_nlevels - 1) {
+		XFS_SCRUB_BTKEY_CHECK(bs, level, leftsib == NULLFSBLOCK);
+		XFS_SCRUB_BTKEY_CHECK(bs, level, rightsib == NULLFSBLOCK);
+		return error;
+	}
+
+	/* Does the left sibling match the parent level left block? */
+	if (leftsib != NULLFSBLOCK) {
+		error = xfs_btree_dup_cursor(bs->cur, &ncur);
+		if (error)
+			return error;
+		error = xfs_btree_decrement(ncur, level + 1, &success);
+		XFS_SCRUB_BTKEY_OP_ERROR_GOTO(bs, level + 1, &error, out_cur);
+		XFS_SCRUB_BTKEY_GOTO(bs, level, success, out_cur);
+
+		pblock = xfs_btree_get_block(ncur, level + 1, &pbp);
+		pp = xfs_btree_ptr_addr(ncur, ncur->bc_ptrs[level + 1], pblock);
+		if (!xfs_scrub_btree_ptr(bs, level + 1, pp)) {
+			fsbno = be64_to_cpu(pp->l);
+			XFS_SCRUB_BTKEY_CHECK(bs, level, fsbno == leftsib);
+		}
+
+		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
+		ncur = NULL;
+	}
+
+	/* Does the right sibling match the parent level right block? */
+	if (!error && rightsib != NULLFSBLOCK) {
+		error = xfs_btree_dup_cursor(bs->cur, &ncur);
+		if (error)
+			return error;
+		error = xfs_btree_increment(ncur, level + 1, &success);
+		XFS_SCRUB_BTKEY_OP_ERROR_GOTO(bs, level + 1, &error, out_cur);
+		XFS_SCRUB_BTKEY_GOTO(bs, level, success, out_cur);
+
+		pblock = xfs_btree_get_block(ncur, level + 1, &pbp);
+		pp = xfs_btree_ptr_addr(ncur, ncur->bc_ptrs[level + 1], pblock);
+		if (!xfs_scrub_btree_ptr(bs, level + 1, pp)) {
+			fsbno = be64_to_cpu(pp->l);
+			XFS_SCRUB_BTKEY_CHECK(bs, level, fsbno == rightsib);
+		}
+
+		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
+		ncur = NULL;
+	}
+
+out_cur:
+	if (ncur)
+		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
+	return error;
+}
+
+/* Check the siblings of a small format btree block. */
+STATIC int
+xfs_scrub_btree_sblock_check_siblings(
+	struct xfs_scrub_btree		*bs,
+	struct xfs_btree_block		*block)
+{
+	struct xfs_btree_block		*pblock;
+	struct xfs_buf			*pbp;
+	struct xfs_btree_cur		*ncur = NULL;
+	union xfs_btree_ptr		*pp;
+	xfs_agblock_t			leftsib;
+	xfs_agblock_t			rightsib;
+	xfs_agblock_t			agbno;
+	int				level;
+	int				success;
+	int				error = 0;
+
+	leftsib = be32_to_cpu(block->bb_u.s.bb_leftsib);
+	rightsib = be32_to_cpu(block->bb_u.s.bb_rightsib);
+	level = xfs_btree_get_level(block);
+
+	/* Root block should never have siblings. */
+	if (level == bs->cur->bc_nlevels - 1) {
+		XFS_SCRUB_BTKEY_CHECK(bs, level, leftsib == NULLAGBLOCK);
+		XFS_SCRUB_BTKEY_CHECK(bs, level, rightsib == NULLAGBLOCK);
+		return error;
+	}
+
+	/* Does the left sibling match the parent level left block? */
+	if (leftsib != NULLAGBLOCK) {
+		error = xfs_btree_dup_cursor(bs->cur, &ncur);
+		if (error)
+			return error;
+		error = xfs_btree_decrement(ncur, level + 1, &success);
+		XFS_SCRUB_BTKEY_OP_ERROR_GOTO(bs, level + 1, &error, out_cur);
+		XFS_SCRUB_BTKEY_GOTO(bs, level, success, verify_rightsib);
+
+		pblock = xfs_btree_get_block(ncur, level + 1, &pbp);
+		pp = xfs_btree_ptr_addr(ncur, ncur->bc_ptrs[level + 1], pblock);
+		if (!xfs_scrub_btree_ptr(bs, level + 1, pp)) {
+			agbno = be32_to_cpu(pp->s);
+			XFS_SCRUB_BTKEY_CHECK(bs, level, agbno == leftsib);
+		}
+
+		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
+		ncur = NULL;
+	}
+
+verify_rightsib:
+	if (ncur) {
+		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
+		ncur = NULL;
+	}
+
+	/* Does the right sibling match the parent level right block? */
+	if (rightsib != NULLAGBLOCK) {
+		error = xfs_btree_dup_cursor(bs->cur, &ncur);
+		if (error)
+			return error;
+		error = xfs_btree_increment(ncur, level + 1, &success);
+		XFS_SCRUB_BTKEY_OP_ERROR_GOTO(bs, level + 1, &error, out_cur);
+		XFS_SCRUB_BTKEY_GOTO(bs, level, success, out_cur);
+
+		pblock = xfs_btree_get_block(ncur, level + 1, &pbp);
+		pp = xfs_btree_ptr_addr(ncur, ncur->bc_ptrs[level + 1], pblock);
+		if (!xfs_scrub_btree_ptr(bs, level + 1, pp)) {
+			agbno = be32_to_cpu(pp->s);
+			XFS_SCRUB_BTKEY_CHECK(bs, level, agbno == rightsib);
+		}
+
+		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
+		ncur = NULL;
+	}
+
+out_cur:
+	if (ncur)
+		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
+	return error;
+}
+
+/* Grab and scrub a btree block. */
+STATIC int
+xfs_scrub_btree_block(
+	struct xfs_scrub_btree		*bs,
+	int				level,
+	union xfs_btree_ptr		*pp,
+	struct xfs_btree_block		**pblock,
+	struct xfs_buf			**pbp)
+{
+	int				error;
+
+	error = xfs_btree_lookup_get_block(bs->cur, level, pp, pblock);
+	if (error)
+		return error;
+
+	xfs_btree_get_block(bs->cur, level, pbp);
+	error = xfs_btree_check_block(bs->cur, *pblock, level, *pbp);
+	if (error)
+		return error;
+
+	return bs->check_siblings_fn(bs, *pblock);
+}
+
+/*
+ * Visit all nodes and leaves of a btree.  Check that all pointers and
+ * records are in order, that the keys reflect the records, and use a callback
+ * so that the caller can verify individual records.  The callback is the same
+ * as the one for xfs_btree_query_range, so this function also
+ * returns XFS_BTREE_QUERY_RANGE_ABORT, zero, or a negative error code.
+ */
+int
+xfs_scrub_btree(
+	struct xfs_scrub_context	*sc,
+	struct xfs_btree_cur		*cur,
+	xfs_scrub_btree_rec_fn		scrub_fn,
+	struct xfs_owner_info		*oinfo,
+	void				*private)
+{
+	struct xfs_scrub_btree		bs = {0};
+	union xfs_btree_ptr		ptr;
+	union xfs_btree_ptr		*pp;
+	union xfs_btree_rec		*recp;
+	struct xfs_btree_block		*block;
+	int				level;
+	struct xfs_buf			*bp;
+	int				i;
+	int				error = 0;
+
+	/* Finish filling out the scrub state */
+	bs.cur = cur;
+	bs.scrub_rec = scrub_fn;
+	bs.oinfo = oinfo;
+	bs.firstrec = true;
+	bs.private = private;
+	bs.sc = sc;
+	for (i = 0; i < XFS_BTREE_MAXLEVELS; i++)
+		bs.firstkey[i] = true;
+	INIT_LIST_HEAD(&bs.to_check);
+
+	if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
+		bs.check_siblings_fn = xfs_scrub_btree_lblock_check_siblings;
+	else
+		bs.check_siblings_fn = xfs_scrub_btree_sblock_check_siblings;
+
+	/* Don't try to check a tree with a height we can't handle. */
+	XFS_SCRUB_BTREC_GOTO(&bs, cur->bc_nlevels > 0, out_badcursor);
+	XFS_SCRUB_BTREC_GOTO(&bs, cur->bc_nlevels <= XFS_BTREE_MAXLEVELS,
+			out_badcursor);
+
+	/* Make sure the root isn't in the superblock. */
+	cur->bc_ops->init_ptr_from_cur(cur, &ptr);
+	error = xfs_scrub_btree_ptr(&bs, cur->bc_nlevels, &ptr);
+	XFS_SCRUB_BTKEY_OP_ERROR_GOTO(&bs, cur->bc_nlevels, &error,
+			out_badcursor);
+
+	/* Load the root of the btree. */
+	level = cur->bc_nlevels - 1;
+	cur->bc_ops->init_ptr_from_cur(cur, &ptr);
+	error = xfs_scrub_btree_block(&bs, level, &ptr, &block, &bp);
+	XFS_SCRUB_BTKEY_OP_ERROR_GOTO(&bs, level, &error, out);
+
+	cur->bc_ptrs[level] = 1;
+
+	while (level < cur->bc_nlevels) {
+		block = xfs_btree_get_block(cur, level, &bp);
+
+		if (level == 0) {
+			/* End of leaf, pop back towards the root. */
+			if (cur->bc_ptrs[level] >
+			    be16_to_cpu(block->bb_numrecs)) {
+				if (level < cur->bc_nlevels - 1)
+					cur->bc_ptrs[level + 1]++;
+				level++;
+				continue;
+			}
+
+			/* Records in order for scrub? */
+			error = xfs_scrub_btree_rec(&bs);
+			if (error)
+				goto out;
+			recp = xfs_btree_rec_addr(cur, cur->bc_ptrs[0], block);
+			error = bs.scrub_rec(&bs, recp);
+			if (error < 0 ||
+			    error == XFS_BTREE_QUERY_RANGE_ABORT)
+				break;
+			if (xfs_scrub_should_terminate(&error))
+				break;
+
+			cur->bc_ptrs[level]++;
+			continue;
+		}
+
+		/* End of node, pop back towards the root. */
+		if (cur->bc_ptrs[level] > be16_to_cpu(block->bb_numrecs)) {
+			if (level < cur->bc_nlevels - 1)
+				cur->bc_ptrs[level + 1]++;
+			level++;
+			continue;
+		}
+
+		/* Keys in order for scrub? */
+		error = xfs_scrub_btree_key(&bs, level);
+		if (error)
+			goto out;
+
+		/* Drill another level deeper. */
+		pp = xfs_btree_ptr_addr(cur, cur->bc_ptrs[level], block);
+		error = xfs_scrub_btree_ptr(&bs, level, pp);
+		if (error) {
+			error = 0;
+			cur->bc_ptrs[level]++;
+			continue;
+		}
+		level--;
+		error = xfs_scrub_btree_block(&bs, level, pp, &block, &bp);
+		XFS_SCRUB_BTKEY_OP_ERROR_GOTO(&bs, level, &error, out);
+
+		cur->bc_ptrs[level] = 1;
+	}
+
+out:
+	/*
+	 * If we don't end this function with the cursor pointing at a record
+	 * block, a subsequent non-error cursor deletion will not release
+	 * node-level buffers, causing a buffer leak.  This is quite possible
+	 * with a zero-results scrubbing run, so release the buffers if we
+	 * aren't pointing at a record.
+	 */
+	if (cur->bc_bufs[0] == NULL) {
+		for (i = 0; i < cur->bc_nlevels; i++) {
+			if (cur->bc_bufs[i]) {
+				xfs_trans_brelse(cur->bc_tp, cur->bc_bufs[i]);
+				cur->bc_bufs[i] = NULL;
+				cur->bc_ptrs[i] = 0;
+				cur->bc_ra[i] = 0;
+			}
+		}
+	}
+
+out_badcursor:
+	return error;
+}
diff --git a/fs/xfs/scrub/btree.h b/fs/xfs/scrub/btree.h
new file mode 100644
index 0000000..75e89b1
--- /dev/null
+++ b/fs/xfs/scrub/btree.h
@@ -0,0 +1,95 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef __XFS_REPAIR_BTREE_H__
+#define __XFS_REPAIR_BTREE_H__
+
+/* btree scrub */
+
+extern const char * const btree_types[];
+
+/* Check for btree corruption. */
+bool xfs_scrub_btree_ok(struct xfs_scrub_context *sc,
+			struct xfs_btree_cur *cur, int level, bool fs_ok,
+			const char *check, const char *func, int line);
+
+/* Check for btree operation errors. */
+bool xfs_scrub_btree_op_ok(struct xfs_scrub_context *sc,
+			   struct xfs_btree_cur *cur, int level, int *error,
+			   const char *func, int line);
+
+#define XFS_SCRUB_BTREC_CHECK(bs, fs_ok) \
+	xfs_scrub_btree_ok((bs)->sc, (bs)->cur, 0, (fs_ok), #fs_ok, \
+			__func__, __LINE__)
+#define XFS_SCRUB_BTREC_GOTO(bs, fs_ok, label) \
+	do { \
+		if (!xfs_scrub_btree_ok((bs)->sc, (bs)->cur, 0, (fs_ok), \
+				#fs_ok, __func__, __LINE__)) \
+			goto label; \
+	} while (0)
+#define XFS_SCRUB_BTREC_OP_ERROR_GOTO(bs, error, label) \
+	do { \
+		if (!xfs_scrub_btree_op_ok((bs)->sc, (bs)->cur, 0, \
+				(error), __func__, __LINE__)) \
+			goto label; \
+	} while (0)
+#define XFS_SCRUB_BTKEY_CHECK(bs, level, fs_ok) \
+	xfs_scrub_btree_ok((bs)->sc, (bs)->cur, (level), (fs_ok), #fs_ok, \
+			__func__, __LINE__)
+#define XFS_SCRUB_BTKEY_GOTO(bs, level, fs_ok, label) \
+	do { \
+		if (!xfs_scrub_btree_ok((bs)->sc, (bs)->cur, (level), (fs_ok), \
+				#fs_ok, __func__, __LINE__)) \
+			goto label; \
+	} while (0)
+#define XFS_SCRUB_BTKEY_OP_ERROR_GOTO(bs, level, error, label) \
+	do { \
+		if (!xfs_scrub_btree_op_ok((bs)->sc, (bs)->cur, (level), \
+				(error), __func__, __LINE__)) \
+			goto label; \
+	} while (0)
+
+struct xfs_scrub_btree;
+typedef int (*xfs_scrub_btree_rec_fn)(
+	struct xfs_scrub_btree	*bs,
+	union xfs_btree_rec	*rec);
+
+struct xfs_scrub_btree {
+	/* caller-provided scrub state */
+	struct xfs_scrub_context	*sc;
+	struct xfs_btree_cur		*cur;
+	xfs_scrub_btree_rec_fn		scrub_rec;
+	struct xfs_owner_info		*oinfo;
+	void				*private;
+
+	/* internal scrub state */
+	union xfs_btree_rec		lastrec;
+	bool				firstrec;
+	union xfs_btree_key		lastkey[XFS_BTREE_MAXLEVELS];
+	bool				firstkey[XFS_BTREE_MAXLEVELS];
+	struct list_head		to_check;
+	int				(*check_siblings_fn)(
+						struct xfs_scrub_btree *,
+						struct xfs_btree_block *);
+};
+int xfs_scrub_btree(struct xfs_scrub_context *sc, struct xfs_btree_cur *cur,
+		    xfs_scrub_btree_rec_fn scrub_fn,
+		    struct xfs_owner_info *oinfo, void *private);
+
+#endif /* __XFS_REPAIR_BTREE_H__ */
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 6931793..331aa14 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -43,6 +43,7 @@
 #include "xfs_rmap_btree.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/common.h"
+#include "scrub/btree.h"
 
 /*
  * Online Scrub and Repair
@@ -367,6 +368,172 @@ xfs_scrub_incomplete(
 	return fs_ok;
 }
 
+/* AG scrubbing */
+
+/* Grab all the headers for an AG. */
+int
+xfs_scrub_ag_read_headers(
+	struct xfs_scrub_context	*sc,
+	xfs_agnumber_t			agno,
+	struct xfs_buf			**agi,
+	struct xfs_buf			**agf,
+	struct xfs_buf			**agfl)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	error = xfs_ialloc_read_agi(mp, sc->tp, agno, agi);
+	if (error)
+		goto out;
+
+	error = xfs_alloc_read_agf(mp, sc->tp, agno, 0, agf);
+	if (error)
+		goto out;
+	if (!*agf) {
+		error = -ENOMEM;
+		goto out;
+	}
+
+	error = xfs_alloc_read_agfl(mp, sc->tp, agno, agfl);
+	if (error)
+		goto out;
+
+out:
+	return error;
+}
+
+/* Release all the AG btree cursors. */
+STATIC void
+xfs_scrub_ag_btcur_free(
+	struct xfs_scrub_ag		*sa)
+{
+	if (sa->refc_cur)
+		xfs_btree_del_cursor(sa->refc_cur, XFS_BTREE_ERROR);
+	if (sa->rmap_cur)
+		xfs_btree_del_cursor(sa->rmap_cur, XFS_BTREE_ERROR);
+	if (sa->fino_cur)
+		xfs_btree_del_cursor(sa->fino_cur, XFS_BTREE_ERROR);
+	if (sa->ino_cur)
+		xfs_btree_del_cursor(sa->ino_cur, XFS_BTREE_ERROR);
+	if (sa->cnt_cur)
+		xfs_btree_del_cursor(sa->cnt_cur, XFS_BTREE_ERROR);
+	if (sa->bno_cur)
+		xfs_btree_del_cursor(sa->bno_cur, XFS_BTREE_ERROR);
+
+	sa->refc_cur = NULL;
+	sa->rmap_cur = NULL;
+	sa->fino_cur = NULL;
+	sa->ino_cur = NULL;
+	sa->bno_cur = NULL;
+	sa->cnt_cur = NULL;
+}
+
+/* Initialize all the btree cursors for an AG. */
+int
+xfs_scrub_ag_btcur_init(
+	struct xfs_scrub_context	*sc,
+	struct xfs_scrub_ag		*sa)
+{
+	struct xfs_mount		*mp = sc->mp;
+	xfs_agnumber_t			agno = sa->agno;
+
+	if (sa->agf_bp) {
+		/* Set up a bnobt cursor for cross-referencing. */
+		sa->bno_cur = xfs_allocbt_init_cursor(mp, sc->tp, sa->agf_bp,
+				agno, XFS_BTNUM_BNO);
+		if (!sa->bno_cur)
+			goto err;
+
+		/* Set up a cntbt cursor for cross-referencing. */
+		sa->cnt_cur = xfs_allocbt_init_cursor(mp, sc->tp, sa->agf_bp,
+				agno, XFS_BTNUM_CNT);
+		if (!sa->cnt_cur)
+			goto err;
+	}
+
+	/* Set up an inobt cursor for cross-referencing. */
+	if (sa->agi_bp) {
+		sa->ino_cur = xfs_inobt_init_cursor(mp, sc->tp, sa->agi_bp,
+					agno, XFS_BTNUM_INO);
+		if (!sa->ino_cur)
+			goto err;
+	}
+
+	/* Set up a finobt cursor for cross-referencing. */
+	if (sa->agi_bp && xfs_sb_version_hasfinobt(&mp->m_sb)) {
+		sa->fino_cur = xfs_inobt_init_cursor(mp, sc->tp, sa->agi_bp,
+				agno, XFS_BTNUM_FINO);
+		if (!sa->fino_cur)
+			goto err;
+	}
+
+	/* Set up a rmapbt cursor for cross-referencing. */
+	if (sa->agf_bp && xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		sa->rmap_cur = xfs_rmapbt_init_cursor(mp, sc->tp, sa->agf_bp,
+				agno);
+		if (!sa->rmap_cur)
+			goto err;
+	}
+
+	/* Set up a refcountbt cursor for cross-referencing. */
+	if (sa->agf_bp && xfs_sb_version_hasreflink(&mp->m_sb)) {
+		sa->refc_cur = xfs_refcountbt_init_cursor(mp, sc->tp,
+				sa->agf_bp, agno, NULL);
+		if (!sa->refc_cur)
+			goto err;
+	}
+
+	return 0;
+err:
+	return -ENOMEM;
+}
+
+/* Release the AG header context and btree cursors. */
+void
+xfs_scrub_ag_free(
+	struct xfs_scrub_context	*sc,
+	struct xfs_scrub_ag		*sa)
+{
+	xfs_scrub_ag_btcur_free(sa);
+	if (sa->agfl_bp) {
+		xfs_trans_brelse(sc->tp, sa->agfl_bp);
+		sa->agfl_bp = NULL;
+	}
+	if (sa->agf_bp) {
+		xfs_trans_brelse(sc->tp, sa->agf_bp);
+		sa->agf_bp = NULL;
+	}
+	if (sa->agi_bp) {
+		xfs_trans_brelse(sc->tp, sa->agi_bp);
+		sa->agi_bp = NULL;
+	}
+	sa->agno = NULLAGNUMBER;
+}
+
+/*
+ * For scrub, grab the AGI and the AGF headers, in that order.  Locking
+ * order requires us to get the AGI before the AGF.  We use the
+ * transaction to avoid deadlocking on crosslinked metadata buffers;
+ * either the caller passes one in (bmap scrub) or we have to create a
+ * transaction ourselves.
+ */
+int
+xfs_scrub_ag_init(
+	struct xfs_scrub_context	*sc,
+	xfs_agnumber_t			agno,
+	struct xfs_scrub_ag		*sa)
+{
+	int				error;
+
+	sa->agno = agno;
+	error = xfs_scrub_ag_read_headers(sc, agno, &sa->agi_bp,
+			&sa->agf_bp, &sa->agfl_bp);
+	if (error)
+		return error;
+
+	return xfs_scrub_ag_btcur_init(sc, sa);
+}
+
 /* Dummy scrubber */
 
 int
@@ -409,6 +576,7 @@ xfs_scrub_teardown(
 	struct xfs_scrub_context	*sc,
 	int				error)
 {
+	xfs_scrub_ag_free(sc, &sc->sa);
 	if (sc->tp) {
 		xfs_trans_cancel(sc->tp);
 		sc->tp = NULL;
@@ -430,6 +598,7 @@ xfs_scrub_setup(
 	sc->sm = sm;
 	sc->fns = fns;
 	sc->try_harder = try_harder;
+	sc->sa.agno = NULLAGNUMBER;
 
 	return sc->fns->setup(sc, ip);
 }
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 4f3113a..15baccb 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -27,6 +27,24 @@ static inline bool xfs_scrub_found_corruption(struct xfs_scrub_metadata *sm)
 			       XFS_SCRUB_FLAG_XCORRUPT);
 }
 
+/* Buffer pointers and btree cursors for an entire AG. */
+struct xfs_scrub_ag {
+	xfs_agnumber_t			agno;
+
+	/* AG btree roots */
+	struct xfs_buf			*agf_bp;
+	struct xfs_buf			*agfl_bp;
+	struct xfs_buf			*agi_bp;
+
+	/* AG btrees */
+	struct xfs_btree_cur		*bno_cur;
+	struct xfs_btree_cur		*cnt_cur;
+	struct xfs_btree_cur		*ino_cur;
+	struct xfs_btree_cur		*fino_cur;
+	struct xfs_btree_cur		*rmap_cur;
+	struct xfs_btree_cur		*refc_cur;
+};
+
 struct xfs_scrub_context {
 	/* General scrub state. */
 	struct xfs_mount		*mp;
@@ -35,6 +53,9 @@ struct xfs_scrub_context {
 	struct xfs_trans		*tp;
 	struct xfs_inode		*ip;
 	bool				try_harder;
+
+	/* State tracking for single-AG operations. */
+	struct xfs_scrub_ag		sa;
 };
 
 /* Should we end the scrub early? */
@@ -164,6 +185,15 @@ bool xfs_scrub_incomplete(struct xfs_scrub_context *sc, const char *type,
 	xfs_scrub_incomplete((sc), (type), (fs_ok), \
 			#fs_ok, __func__, __LINE__)
 
+void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
+int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
+		      struct xfs_scrub_ag *sa);
+int xfs_scrub_ag_read_headers(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
+			      struct xfs_buf **agi, struct xfs_buf **agf,
+			      struct xfs_buf **agfl);
+int xfs_scrub_ag_btcur_init(struct xfs_scrub_context *sc,
+			    struct xfs_scrub_ag *sa);
+
 /* Setup functions */
 
 #define SETUP_FN(name) int name(struct xfs_scrub_context *sc, struct xfs_inode *ip)
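
A note on the sibling checks added by this patch: for every non-root block,
xfs_scrub_btree_lblock_check_siblings() and
xfs_scrub_btree_sblock_check_siblings() step a duplicate cursor one slot left
or right at the parent level and compare the pointer found there with the
sibling pointer recorded in the block itself.  The userspace sketch below
illustrates only that comparison.  It assumes the parent level has been
flattened into one array of child pointers in key order, and it treats a
missing neighbour as "sibling must be null".  Every demo_* name is
hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NULLBLOCK	UINT32_MAX	/* plays the role of NULLAGBLOCK */

/* Hypothetical block header: only the two sibling pointers matter here. */
struct demo_block {
	uint32_t	leftsib;
	uint32_t	rightsib;
};

/*
 * blocks[] is indexed by block number.  ptrs[] holds every child pointer
 * stored at the parent level, in key order, as if a cursor had walked the
 * whole level.  Check the block referenced by ptrs[idx] and return the
 * number of sibling pointers that disagree with the parent level.
 */
unsigned int
demo_check_siblings(
	const struct demo_block	*blocks,
	const uint32_t		*ptrs,
	size_t			nptrs,
	size_t			idx)
{
	const struct demo_block	*block;
	uint32_t		expect_left;
	uint32_t		expect_right;
	unsigned int		bad = 0;

	assert(idx < nptrs);
	block = &blocks[ptrs[idx]];

	/* The neighbouring parent slots say who the siblings should be. */
	expect_left = idx > 0 ? ptrs[idx - 1] : DEMO_NULLBLOCK;
	expect_right = idx + 1 < nptrs ? ptrs[idx + 1] : DEMO_NULLBLOCK;

	if (block->leftsib != expect_left)
		bad++;
	if (block->rightsib != expect_right)
		bad++;
	return bad;
}
```

Calling demo_check_siblings() once per index in ptrs[] covers a whole level.
The kernel code differs in that it only consults the parent when the sibling
pointer is non-null, and it reports each mismatch through the
XFS_SCRUB_BTKEY_CHECK macro instead of counting.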



* [PATCH 05/22] xfs: scrub in-memory metadata buffers
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
                   ` (3 preceding siblings ...)
  2017-07-21  4:38 ` [PATCH 04/22] xfs: generic functions to scrub metadata and btrees Darrick J. Wong
@ 2017-07-21  4:39 ` Darrick J. Wong
  2017-07-23 16:48   ` Allison Henderson
  2017-07-24  1:43   ` Dave Chinner
  2017-07-21  4:39 ` [PATCH 06/22] xfs: scrub the backup superblocks Darrick J. Wong
                   ` (16 subsequent siblings)
  21 siblings, 2 replies; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:39 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Call the verifier function for all in-memory metadata buffers, looking
for memory corruption either due to bad memory or coding bugs.
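
To illustrate the idea outside the kernel, here is a minimal userspace
sketch of the verifier dispatch.  It assumes a buffer with a magic number and
a checksum, a read verifier that checks both, and a write verifier that checks
structure only, on the theory that a dirty buffer's checksum is refreshed when
it goes to disk.  All demo_* names and error numbers are hypothetical
stand-ins and are not the XFS buffer API.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_buf;

/* Hypothetical stand-in for the buffer's verifier operations. */
struct demo_buf_ops {
	void (*verify_read)(struct demo_buf *bp);
	void (*verify_write)(struct demo_buf *bp);
};

/* Hypothetical stand-in for an in-memory metadata buffer. */
struct demo_buf {
	int				error;		/* last I/O error */
	bool				has_log_item;	/* attached to the log */
	unsigned int			magic;		/* structure under test */
	unsigned int			csum;		/* checksum as last read */
	const struct demo_buf_ops	*ops;
};

#define DEMO_MAGIC	0x58465342u	/* arbitrary value for this sketch */
#define DEMO_EBADCSUM	74		/* illustrative error numbers */
#define DEMO_ECORRUPT	117

static unsigned int demo_csum(unsigned int magic)
{
	return ~magic;
}

/* Read verifier: checksum first, then structure. */
static void demo_verify_read(struct demo_buf *bp)
{
	if (bp->csum != demo_csum(bp->magic))
		bp->error = -DEMO_EBADCSUM;
	else if (bp->magic != DEMO_MAGIC)
		bp->error = -DEMO_ECORRUPT;
}

/* Write verifier: structure only, since the checksum may be stale. */
static void demo_verify_write(struct demo_buf *bp)
{
	if (bp->magic != DEMO_MAGIC)
		bp->error = -DEMO_ECORRUPT;
}

static const struct demo_buf_ops demo_ops = {
	.verify_read	= demo_verify_read,
	.verify_write	= demo_verify_write,
};

/* Run the matching verifier and return its verdict without disturbing bp. */
int demo_scrub_buf(struct demo_buf *bp)
{
	int	olderror;
	int	error;

	assert(bp);
	if (!bp->ops)
		return 0;	/* no verifiers, nothing to check */

	olderror = bp->error;
	if (bp->has_log_item)
		bp->ops->verify_write(bp);
	else
		bp->ops->verify_read(bp);
	error = bp->error;
	bp->error = olderror;	/* scrub must not change buffer state */
	return error;
}
```

The save and restore around the verifier call is the point of the sketch: the
verifiers report through the buffer's error field, so the scrubber has to put
the old value back before it releases the buffer.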

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile         |    1 
 fs/xfs/libxfs/xfs_fs.h  |    3 +
 fs/xfs/scrub/common.c   |    4 +
 fs/xfs/scrub/common.h   |    2 +
 fs/xfs/scrub/metabufs.c |  177 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_trace.h      |    3 +
 6 files changed, 188 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/metabufs.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 4e04da9..67cf4ac 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -142,5 +142,6 @@ ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
 xfs-y				+= $(addprefix scrub/, \
 				   btree.o \
 				   common.o \
+				   metabufs.o \
 				   )
 endif
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index aeccc99..9fb3c65 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -482,7 +482,8 @@ struct xfs_scrub_metadata {
  * Metadata types and flags for scrub operation.
  */
 #define XFS_SCRUB_TYPE_TEST	0	/* dummy to test ioctl */
-#define XFS_SCRUB_TYPE_MAX	0
+#define XFS_SCRUB_TYPE_METABUFS	1	/* in-core metadata buffers */
+#define XFS_SCRUB_TYPE_MAX	1
 
 /* i: repair this metadata */
 #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 331aa14..e06131f 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -610,6 +610,10 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
 		.setup	= xfs_scrub_setup_fs,
 		.scrub	= xfs_scrub_dummy,
 	},
+	{ /* in-memory metadata buffers */
+		.setup	= xfs_scrub_setup_metabufs,
+		.scrub	= xfs_scrub_metabufs,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 15baccb..5f0818c 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -198,12 +198,14 @@ int xfs_scrub_ag_btcur_init(struct xfs_scrub_context *sc,
 
 #define SETUP_FN(name) int name(struct xfs_scrub_context *sc, struct xfs_inode *ip)
 SETUP_FN(xfs_scrub_setup_fs);
+SETUP_FN(xfs_scrub_setup_metabufs);
 #undef SETUP_FN
 
 /* Metadata scrubbers */
 
 #define SCRUB_FN(name) int name(struct xfs_scrub_context *sc)
 SCRUB_FN(xfs_scrub_dummy);
+SCRUB_FN(xfs_scrub_metabufs);
 #undef SCRUB_FN
 
 #endif	/* __XFS_REPAIR_COMMON_H__ */
diff --git a/fs/xfs/scrub/metabufs.c b/fs/xfs/scrub/metabufs.c
new file mode 100644
index 0000000..63faaa6
--- /dev/null
+++ b/fs/xfs/scrub/metabufs.c
@@ -0,0 +1,177 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_sb.h"
+#include "scrub/common.h"
+
+/* We only iterate buffers one by one, so we don't need any setup. */
+int
+xfs_scrub_setup_metabufs(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return 0;
+}
+
+#define XFS_SCRUB_METABUFS_TOO_MANY_RETRIES	10
+struct xfs_scrub_metabufs_info {
+	struct xfs_scrub_context	*sc;
+	unsigned int			retries;
+};
+
+/* In-memory buffer corruption. */
+
+#define XFS_SCRUB_BUF_OP_ERROR_GOTO(label) \
+	XFS_SCRUB_OP_ERROR_GOTO(smi->sc, \
+			xfs_daddr_to_agno(smi->sc->mp, bp->b_bn), \
+			xfs_daddr_to_agbno(smi->sc->mp, bp->b_bn), "buf", \
+			&error, label)
+STATIC int
+xfs_scrub_metabufs_scrub_buf(
+	struct xfs_scrub_metabufs_info	*smi,
+	struct xfs_buf			*bp)
+{
+	int				olderror;
+	int				error = 0;
+
+	/*
+	 * We hold the rcu lock during the rhashtable walk, so we can't risk
+	 * having the log forced due to a stale buffer by xfs_buf_lock.
+	 */
+	if (bp->b_flags & XBF_STALE)
+		return 0;
+
+	atomic_inc(&bp->b_hold);
+	if (!xfs_buf_trylock(bp)) {
+		if (smi->retries > XFS_SCRUB_METABUFS_TOO_MANY_RETRIES) {
+			/* We've retried too many times, do what we can. */
+			XFS_SCRUB_INCOMPLETE(smi->sc, "metabufs", true);
+			error = 0;
+		} else {
+			/* Restart the metabuf scrub from the start. */
+			smi->retries++;
+			error = -EAGAIN;
+		}
+		goto out_dec;
+	}
+
+	/* Skip this buffer if it's stale, unread, or has no verifiers. */
+	if ((bp->b_flags & XBF_STALE) ||
+	    !(bp->b_flags & XBF_DONE) ||
+	    !bp->b_ops)
+		goto out_unlock;
+
+	/*
+	 * Run the verifiers to see if the in-memory buffer is bitrotting or
+	 * otherwise corrupt.  If the buffer doesn't have a log item then
+	 * it's clean, so call the read verifier.  However, if the buffer
+	 * has a log item, it is probably dirty.  Checksums will be written
+	 * when the buffer is about to go out to disk, so call the write
+	 * verifier to check the structure.
+	 */
+	olderror = bp->b_error;
+	if (bp->b_fspriv)
+		bp->b_ops->verify_write(bp);
+	else
+		bp->b_ops->verify_read(bp);
+	error = bp->b_error;
+	bp->b_error = olderror;
+
+	/* Mark any corruption errors we might find. */
+	XFS_SCRUB_BUF_OP_ERROR_GOTO(out_unlock);
+
+out_unlock:
+	xfs_buf_unlock(bp);
+out_dec:
+	atomic_dec(&bp->b_hold);
+	return error;
+}
+#undef XFS_SCRUB_BUF_OP_ERROR_GOTO
+
+/* Walk the buffer rhashtable and dispatch buffer checking. */
+STATIC int
+xfs_scrub_metabufs_walk_rhash(
+	struct xfs_scrub_metabufs_info	*smi,
+	struct rhashtable_iter		*iter)
+{
+	struct xfs_buf			*bp;
+	int				error = 0;
+
+	do {
+		if (xfs_scrub_should_terminate(&error))
+			break;
+
+		bp = rhashtable_walk_next(iter);
+		if (IS_ERR(bp))
+			return PTR_ERR(bp);
+		else if (bp == NULL)
+			return 0;
+
+		error = xfs_scrub_metabufs_scrub_buf(smi, bp);
+	} while (error == 0);
+
+	return error;
+}
+
+/* Try to walk the buffers in this AG in order to scrub them. */
+int
+xfs_scrub_metabufs(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_scrub_metabufs_info	smi;
+	struct rhashtable_iter		iter;
+	struct xfs_perag		*pag;
+	int				error = 0;
+
+	smi.sc = sc;
+	smi.retries = 0;
+	pag = xfs_perag_get(sc->mp, sc->sm->sm_agno);
+	rhashtable_walk_enter(&pag->pag_buf_hash, &iter);
+
+	while (1) {
+		if (xfs_scrub_should_terminate(&error))
+			break;
+
+		error = rhashtable_walk_start(&iter);
+		if (!error) {
+			error = xfs_scrub_metabufs_walk_rhash(&smi, &iter);
+			rhashtable_walk_stop(&iter);
+		}
+
+		if (error != -EAGAIN)
+			break;
+		cond_resched();
+	}
+
+	rhashtable_walk_exit(&iter);
+	xfs_perag_put(pag);
+	return error;
+}
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index d4de29b..036e65c 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3312,7 +3312,8 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
 
 /* scrub */
 #define XFS_SCRUB_TYPE_DESC \
-	{ XFS_SCRUB_TYPE_TEST,		"dummy" }
+	{ XFS_SCRUB_TYPE_TEST,		"dummy" }, \
+	{ XFS_SCRUB_TYPE_METABUFS,	"metabufs" }
 DECLARE_EVENT_CLASS(xfs_scrub_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
 		 int error),



* [PATCH 06/22] xfs: scrub the backup superblocks
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
                   ` (4 preceding siblings ...)
  2017-07-21  4:39 ` [PATCH 05/22] xfs: scrub in-memory metadata buffers Darrick J. Wong
@ 2017-07-21  4:39 ` Darrick J. Wong
  2017-07-23 16:50   ` Allison Henderson
  2017-07-25  4:05   ` Dave Chinner
  2017-07-21  4:39 ` [PATCH 07/22] xfs: scrub AGF and AGFL Darrick J. Wong
                   ` (15 subsequent siblings)
  21 siblings, 2 replies; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:39 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Ensure that the geometry presented in the backup superblocks matches
the primary superblock so that repair can recover the filesystem if
that primary gets corrupted.
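
As a rough userspace illustration of the comparison, the sketch below checks
a backup copy against the primary field by field.  A geometry mismatch is
flagged as corruption, while a mismatch in a field that does not affect the
layout is flagged only for preening.  The struct is a heavily trimmed,
hypothetical superblock, and the demo_* names and flag values are assumptions
made for this example, not the XFS definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, trimmed superblock; the real one has many more fields. */
struct demo_sb {
	uint32_t	blocksize;
	uint64_t	dblocks;
	uint32_t	agblocks;
	uint32_t	agcount;
	uint64_t	rootino;
};

#define DEMO_CORRUPT	(1u << 0)	/* geometry differs: needs repair */
#define DEMO_PREEN	(1u << 1)	/* harmless difference: could be tidied */

/* Set @flag in @flags and name the field when the two copies differ. */
#define DEMO_SB_FIELD(flags, primary, backup, fn, flag) \
	do { \
		if ((backup)->fn != (primary)->fn) { \
			fprintf(stderr, "sb field %s differs\n", #fn); \
			(flags) |= (flag); \
		} \
	} while (0)

/* Compare a backup superblock with the primary; return the flags raised. */
unsigned int
demo_check_backup_sb(
	const struct demo_sb	*primary,
	const struct demo_sb	*backup)
{
	unsigned int		flags = 0;

	assert(primary && backup);

	/* Geometry must match or the backup is useless for recovery. */
	DEMO_SB_FIELD(flags, primary, backup, blocksize, DEMO_CORRUPT);
	DEMO_SB_FIELD(flags, primary, backup, dblocks, DEMO_CORRUPT);
	DEMO_SB_FIELD(flags, primary, backup, agblocks, DEMO_CORRUPT);
	DEMO_SB_FIELD(flags, primary, backup, agcount, DEMO_CORRUPT);

	/* A stale root inode number is worth fixing but is not fatal. */
	DEMO_SB_FIELD(flags, primary, backup, rootino, DEMO_PREEN);

	return flags;
}
```

Stringifying the field name in the macro keeps each check to one line while
still producing a report that says which field disagreed, which is the same
trick the XFS_SCRUB_SB_FIELD and XFS_PREEN_SB_FIELD macros in this patch rely
on.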

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile         |    1 
 fs/xfs/libxfs/xfs_fs.h  |    3 -
 fs/xfs/scrub/agheader.c |  197 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.c   |    4 +
 fs/xfs/scrub/common.h   |    2 
 fs/xfs/xfs_trace.h      |    3 -
 6 files changed, 208 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/agheader.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 67cf4ac..1b9bd1a 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -140,6 +140,7 @@ xfs-$(CONFIG_EXPORTFS_BLOCK_OPS)	+= xfs_pnfs.o
 # online scrub/repair
 ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
 xfs-y				+= $(addprefix scrub/, \
+				   agheader.o \
 				   btree.o \
 				   common.o \
 				   metabufs.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 9fb3c65..2f12fb1 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -483,7 +483,8 @@ struct xfs_scrub_metadata {
  */
 #define XFS_SCRUB_TYPE_TEST	0	/* dummy to test ioctl */
 #define XFS_SCRUB_TYPE_METABUFS	1	/* in-core metadata buffers */
-#define XFS_SCRUB_TYPE_MAX	1
+#define XFS_SCRUB_TYPE_SB	2	/* superblock */
+#define XFS_SCRUB_TYPE_MAX	2
 
 /* i: repair this metadata */
 #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
new file mode 100644
index 0000000..3bca60d
--- /dev/null
+++ b/fs/xfs/scrub/agheader.c
@@ -0,0 +1,197 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "scrub/common.h"
+
+/* Set us up to check an AG header. */
+int
+xfs_scrub_setup_ag_header(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	struct xfs_mount		*mp = sc->mp;
+
+	if (sc->sm->sm_agno >= mp->m_sb.sb_agcount ||
+	    sc->sm->sm_ino || sc->sm->sm_gen)
+		return -EINVAL;
+	return xfs_scrub_setup_fs(sc, ip);
+}
+
+/* Superblock */
+
+#define XFS_SCRUB_SB_CHECK(fs_ok) \
+	XFS_SCRUB_CHECK(sc, bp, "superblock", fs_ok)
+#define XFS_SCRUB_SB_PREEN(fs_ok) \
+	XFS_SCRUB_PREEN(sc, bp, "superblock", fs_ok)
+#define XFS_SCRUB_SB_OP_ERROR_GOTO(label) \
+	XFS_SCRUB_OP_ERROR_GOTO(sc, agno, 0, "superblock", &error, label)
+/* Scrub the filesystem superblock. */
+int
+xfs_scrub_superblock(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*bp;
+	struct xfs_sb			sb;
+	xfs_agnumber_t			agno;
+	uint32_t			v2_ok;
+	int				error;
+
+	agno = sc->sm->sm_agno;
+
+	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+		  XFS_AGB_TO_DADDR(mp, agno, XFS_SB_BLOCK(mp)),
+		  XFS_FSS_TO_BB(mp, 1), 0, &bp, &xfs_sb_buf_ops);
+	if (error) {
+		trace_xfs_scrub_block_error(mp, agno, XFS_SB_BLOCK(mp),
+				"superblock", "error != 0", __func__, __LINE__);
+		error = 0;
+		sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
+		goto out;
+	}
+
+	/*
+	 * The in-core sb is a more up-to-date copy of AG 0's sb,
+	 * so there's no point in comparing the two.
+	 */
+	if (agno == 0)
+		goto out;
+
+	xfs_sb_from_disk(&sb, XFS_BUF_TO_SBP(bp));
+
+	/* Verify the geometries match. */
+#define XFS_SCRUB_SB_FIELD(fn) \
+		XFS_SCRUB_SB_CHECK(sb.sb_##fn == mp->m_sb.sb_##fn)
+#define XFS_PREEN_SB_FIELD(fn) \
+		XFS_SCRUB_SB_PREEN(sb.sb_##fn == mp->m_sb.sb_##fn)
+	XFS_SCRUB_SB_FIELD(blocksize);
+	XFS_SCRUB_SB_FIELD(dblocks);
+	XFS_SCRUB_SB_FIELD(rblocks);
+	XFS_SCRUB_SB_FIELD(rextents);
+	XFS_SCRUB_SB_PREEN(uuid_equal(&sb.sb_uuid, &mp->m_sb.sb_uuid));
+	XFS_SCRUB_SB_FIELD(logstart);
+	XFS_PREEN_SB_FIELD(rootino);
+	XFS_PREEN_SB_FIELD(rbmino);
+	XFS_PREEN_SB_FIELD(rsumino);
+	XFS_SCRUB_SB_FIELD(rextsize);
+	XFS_SCRUB_SB_FIELD(agblocks);
+	XFS_SCRUB_SB_FIELD(agcount);
+	XFS_SCRUB_SB_FIELD(rbmblocks);
+	XFS_SCRUB_SB_FIELD(logblocks);
+	XFS_SCRUB_SB_CHECK(!(sb.sb_versionnum & ~XFS_SB_VERSION_OKBITS));
+	XFS_SCRUB_SB_CHECK(XFS_SB_VERSION_NUM(&sb) ==
+			   XFS_SB_VERSION_NUM(&mp->m_sb));
+	XFS_SCRUB_SB_FIELD(sectsize);
+	XFS_SCRUB_SB_FIELD(inodesize);
+	XFS_SCRUB_SB_FIELD(inopblock);
+	XFS_SCRUB_SB_PREEN(memcmp(sb.sb_fname, mp->m_sb.sb_fname,
+			   sizeof(sb.sb_fname)) == 0);
+	XFS_SCRUB_SB_FIELD(blocklog);
+	XFS_SCRUB_SB_FIELD(sectlog);
+	XFS_SCRUB_SB_FIELD(inodelog);
+	XFS_SCRUB_SB_FIELD(inopblog);
+	XFS_SCRUB_SB_FIELD(agblklog);
+	XFS_SCRUB_SB_FIELD(rextslog);
+	XFS_PREEN_SB_FIELD(imax_pct);
+	XFS_PREEN_SB_FIELD(uquotino);
+	XFS_PREEN_SB_FIELD(gquotino);
+	XFS_SCRUB_SB_FIELD(shared_vn);
+	XFS_SCRUB_SB_FIELD(inoalignmt);
+	XFS_PREEN_SB_FIELD(unit);
+	XFS_PREEN_SB_FIELD(width);
+	XFS_SCRUB_SB_FIELD(dirblklog);
+	XFS_SCRUB_SB_FIELD(logsectlog);
+	XFS_SCRUB_SB_FIELD(logsectsize);
+	XFS_SCRUB_SB_FIELD(logsunit);
+	v2_ok = XFS_SB_VERSION2_OKBITS;
+	if (XFS_SB_VERSION_NUM(&sb) >= XFS_SB_VERSION_5)
+		v2_ok |= XFS_SB_VERSION2_CRCBIT;
+	XFS_SCRUB_SB_CHECK(!(sb.sb_features2 & ~v2_ok));
+	XFS_SCRUB_SB_PREEN(sb.sb_features2 == sb.sb_bad_features2);
+	XFS_SCRUB_SB_CHECK(!sb.sb_features2 ||
+			xfs_sb_version_hasmorebits(&mp->m_sb));
+	if (xfs_sb_version_hascrc(&mp->m_sb)) {
+		XFS_SCRUB_SB_CHECK(!xfs_sb_has_compat_feature(&sb,
+				XFS_SB_FEAT_COMPAT_UNKNOWN));
+		XFS_SCRUB_SB_CHECK(!xfs_sb_has_ro_compat_feature(&sb,
+				XFS_SB_FEAT_RO_COMPAT_UNKNOWN));
+		XFS_SCRUB_SB_CHECK(!xfs_sb_has_incompat_feature(&sb,
+				XFS_SB_FEAT_INCOMPAT_UNKNOWN));
+		XFS_SCRUB_SB_CHECK(!xfs_sb_has_incompat_log_feature(&sb,
+				XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN));
+		XFS_SCRUB_SB_FIELD(spino_align);
+		XFS_PREEN_SB_FIELD(pquotino);
+	}
+	if (xfs_sb_version_hasmetauuid(&mp->m_sb)) {
+		XFS_SCRUB_SB_CHECK(uuid_equal(&sb.sb_meta_uuid,
+					&mp->m_sb.sb_meta_uuid));
+		XFS_SCRUB_SB_CHECK(uuid_equal(&sb.sb_uuid,
+					&mp->m_sb.sb_uuid));
+	} else
+		XFS_SCRUB_SB_CHECK(uuid_equal(&sb.sb_uuid,
+					&mp->m_sb.sb_meta_uuid));
+#undef XFS_SCRUB_SB_FIELD
+
+#define XFS_SCRUB_SB_FEAT(fn) \
+		XFS_SCRUB_SB_CHECK(xfs_sb_version_has##fn(&sb) == \
+		xfs_sb_version_has##fn(&mp->m_sb))
+	XFS_SCRUB_SB_FEAT(align);
+	XFS_SCRUB_SB_FEAT(dalign);
+	XFS_SCRUB_SB_FEAT(logv2);
+	XFS_SCRUB_SB_FEAT(extflgbit);
+	XFS_SCRUB_SB_FEAT(sector);
+	XFS_SCRUB_SB_FEAT(asciici);
+	XFS_SCRUB_SB_FEAT(morebits);
+	XFS_SCRUB_SB_FEAT(lazysbcount);
+	XFS_SCRUB_SB_FEAT(crc);
+	XFS_SCRUB_SB_FEAT(_pquotino);
+	XFS_SCRUB_SB_FEAT(ftype);
+	XFS_SCRUB_SB_FEAT(finobt);
+	XFS_SCRUB_SB_FEAT(sparseinodes);
+	XFS_SCRUB_SB_FEAT(metauuid);
+	XFS_SCRUB_SB_FEAT(rmapbt);
+	XFS_SCRUB_SB_FEAT(reflink);
+#undef XFS_SCRUB_SB_FEAT
+
+#define XFS_SCRUB_SB_FEAT_PREEN(fn) \
+		XFS_SCRUB_SB_PREEN(xfs_sb_version_has##fn(&sb) == \
+		xfs_sb_version_has##fn(&mp->m_sb))
+	XFS_SCRUB_SB_FEAT_PREEN(attr);
+	XFS_SCRUB_SB_FEAT_PREEN(attr2);
+#undef XFS_SCRUB_SB_FEAT_PREEN
+
+out:
+	return error;
+}
+#undef XFS_SCRUB_SB_OP_ERROR_GOTO
+#undef XFS_SCRUB_SB_CHECK
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index e06131f..9285107 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -614,6 +614,10 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
 		.setup	= xfs_scrub_setup_metabufs,
 		.scrub	= xfs_scrub_metabufs,
 	},
+	{ /* superblock */
+		.setup	= xfs_scrub_setup_ag_header,
+		.scrub	= xfs_scrub_superblock,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 5f0818c..094a708 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -199,6 +199,7 @@ int xfs_scrub_ag_btcur_init(struct xfs_scrub_context *sc,
 #define SETUP_FN(name) int name(struct xfs_scrub_context *sc, struct xfs_inode *ip)
 SETUP_FN(xfs_scrub_setup_fs);
 SETUP_FN(xfs_scrub_setup_metabufs);
+SETUP_FN(xfs_scrub_setup_ag_header);
 #undef SETUP_FN
 
 /* Metadata scrubbers */
@@ -206,6 +207,7 @@ SETUP_FN(xfs_scrub_setup_metabufs);
 #define SCRUB_FN(name) int name(struct xfs_scrub_context *sc)
 SCRUB_FN(xfs_scrub_dummy);
 SCRUB_FN(xfs_scrub_metabufs);
+SCRUB_FN(xfs_scrub_superblock);
 #undef SCRUB_FN
 
 #endif	/* __XFS_REPAIR_COMMON_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 036e65c..483008a 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3313,7 +3313,8 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
 /* scrub */
 #define XFS_SCRUB_TYPE_DESC \
 	{ XFS_SCRUB_TYPE_TEST,		"dummy" }, \
-	{ XFS_SCRUB_TYPE_METABUFS,	"metabufs" }
+	{ XFS_SCRUB_TYPE_METABUFS,	"metabufs" }, \
+	{ XFS_SCRUB_TYPE_SB,		"superblock" }
 DECLARE_EVENT_CLASS(xfs_scrub_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
 		 int error),
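
A note on the CHECK/PREEN split used in xfs_scrub_superblock() above.
The sketch below assumes, from the macro names only, that a failed
XFS_SCRUB_SB_CHECK marks the backup superblock corrupt while a failed
XFS_SCRUB_SB_PREEN only flags it as worth tidying up; the macro bodies
are defined elsewhere in the series.  The struct, the flag bits, and
sketch_compare_sb() are hypothetical stand-ins, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; these are not the XFS_SCRUB_FLAG_* values. */
#define SKETCH_FLAG_CORRUPT	(1u << 0)
#define SKETCH_FLAG_PREEN	(1u << 1)

/* A handful of superblock fields, already in host byte order. */
struct sketch_sb {
	uint32_t	blocksize;
	uint64_t	dblocks;
	uint32_t	agcount;
	uint64_t	rootino;
	uint8_t		imax_pct;
};

/* Compare a backup superblock against the primary and report flags. */
static uint32_t
sketch_compare_sb(
	const struct sketch_sb	*primary,
	const struct sketch_sb	*backup)
{
	uint32_t		flags = 0;

	assert(primary && backup);

	/* Geometry fields: any mismatch is treated as corruption. */
	if (backup->blocksize != primary->blocksize ||
	    backup->dblocks != primary->dblocks ||
	    backup->agcount != primary->agcount)
		flags |= SKETCH_FLAG_CORRUPT;

	/* Advisory fields: a mismatch only asks for a cleanup pass. */
	if (backup->rootino != primary->rootino ||
	    backup->imax_pct != primary->imax_pct)
		flags |= SKETCH_FLAG_PREEN;

	return flags;
}
```

The field grouping follows the patch: blocksize, dblocks and agcount go
through the CHECK macro, rootino and imax_pct through the PREEN macro.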



* [PATCH 07/22] xfs: scrub AGF and AGFL
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
                   ` (5 preceding siblings ...)
  2017-07-21  4:39 ` [PATCH 06/22] xfs: scrub the backup superblocks Darrick J. Wong
@ 2017-07-21  4:39 ` Darrick J. Wong
  2017-07-23 16:59   ` Allison Henderson
  2017-07-21  4:39 ` [PATCH 08/22] xfs: scrub the AGI Darrick J. Wong
                   ` (14 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:39 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Check the block references in the AGF and AGFL headers to make sure
they make sense.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_fs.h  |    4 +
 fs/xfs/scrub/agheader.c |  227 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.c   |   68 ++++++++++++++
 fs/xfs/scrub/common.h   |    8 ++
 fs/xfs/xfs_trace.h      |    4 +
 5 files changed, 309 insertions(+), 2 deletions(-)
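
For reviewers: xfs_scrub_agf() in this patch applies the same four
bounds tests to every btree root it finds (past the AGI block, inside
the nominal AG size, inside the recorded AG length, and before the end
of the filesystem).  A minimal userspace sketch of that test and of the
AG length calculation follows.  The struct and the sketch_* names are
hypothetical, and the end-of-filesystem test compares filesystem block
numbers rather than disk addresses, which is assumed to be equivalent
because both sides scale by the same factor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the filesystem geometry. */
struct sketch_geom {
	uint64_t	dblocks;	/* total blocks on the data device */
	uint32_t	agblocks;	/* nominal blocks per AG */
	uint32_t	agcount;	/* number of AGs */
	uint32_t	agi_block;	/* AG block that holds the AGI header */
};

/* Size of an AG in blocks; only the last AG may be short. */
static uint32_t
sketch_ag_blocks(
	const struct sketch_geom	*g,
	uint32_t			agno)
{
	assert(agno < g->agcount);

	if (agno < g->agcount - 1)
		return g->agblocks;
	return (uint32_t)(g->dblocks - (uint64_t)agno * g->agblocks);
}

/* Apply the four root-pointer bounds tests in one place. */
static bool
sketch_root_in_bounds(
	const struct sketch_geom	*g,
	uint32_t			agno,
	uint32_t			agbno,
	uint32_t			eoag)
{
	uint64_t			fsbno;

	/* Absolute block number of this AG block within the filesystem. */
	fsbno = (uint64_t)agno * g->agblocks + agbno;
	return agbno > g->agi_block &&
	       agbno < g->agblocks &&
	       agbno < eoag &&
	       fsbno < g->dblocks;
}
```

With 1000 data blocks split into four AGs of 256 blocks, the last AG
holds 232 blocks, so a root at AG block 240 in that AG is rejected.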


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 2f12fb1..cc35b7d 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -484,7 +484,9 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_TEST	0	/* dummy to test ioctl */
 #define XFS_SCRUB_TYPE_METABUFS	1	/* in-core metadata buffers */
 #define XFS_SCRUB_TYPE_SB	2	/* superblock */
-#define XFS_SCRUB_TYPE_MAX	2
+#define XFS_SCRUB_TYPE_AGF	3	/* AG free header */
+#define XFS_SCRUB_TYPE_AGFL	4	/* AG free list */
+#define XFS_SCRUB_TYPE_MAX	4
 
 /* i: repair this metadata */
 #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
index 3bca60d..48e276c 100644
--- a/fs/xfs/scrub/agheader.c
+++ b/fs/xfs/scrub/agheader.c
@@ -47,6 +47,72 @@ xfs_scrub_setup_ag_header(
 	return xfs_scrub_setup_fs(sc, ip);
 }
 
+/* Find the size of the AG, in blocks. */
+static inline xfs_agblock_t
+xfs_scrub_ag_blocks(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	ASSERT(agno < mp->m_sb.sb_agcount);
+
+	if (agno < mp->m_sb.sb_agcount - 1)
+		return mp->m_sb.sb_agblocks;
+	return mp->m_sb.sb_dblocks - (agno * mp->m_sb.sb_agblocks);
+}
+
+/* Walk all the blocks in the AGFL. */
+int
+xfs_scrub_walk_agfl(
+	struct xfs_scrub_context	*sc,
+	int				(*fn)(struct xfs_scrub_context *,
+					      xfs_agblock_t bno, void *),
+	void				*priv)
+{
+	struct xfs_agf			*agf;
+	__be32				*agfl_bno;
+	struct xfs_mount		*mp = sc->mp;
+	unsigned int			flfirst;
+	unsigned int			fllast;
+	int				i;
+	int				error;
+
+	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, sc->sa.agfl_bp);
+	flfirst = be32_to_cpu(agf->agf_flfirst);
+	fllast = be32_to_cpu(agf->agf_fllast);
+
+	/* Skip an empty AGFL. */
+	if (agf->agf_flcount == cpu_to_be32(0))
+		return 0;
+
+	/* first to last is a consecutive list. */
+	if (fllast >= flfirst) {
+		for (i = flfirst; i <= fllast; i++) {
+			error = fn(sc, be32_to_cpu(agfl_bno[i]), priv);
+			if (error)
+				return error;
+		}
+
+		return 0;
+	}
+
+	/* first to the end */
+	for (i = flfirst; i < XFS_AGFL_SIZE(mp); i++) {
+		error = fn(sc, be32_to_cpu(agfl_bno[i]), priv);
+		if (error)
+			return error;
+	}
+
+	/* the start to last. */
+	for (i = 0; i <= fllast; i++) {
+		error = fn(sc, be32_to_cpu(agfl_bno[i]), priv);
+		if (error)
+			return error;
+	}
+
+	return 0;
+}
+
 /* Superblock */
 
 #define XFS_SCRUB_SB_CHECK(fs_ok) \
@@ -195,3 +261,164 @@ xfs_scrub_superblock(
 }
 #undef XFS_SCRUB_SB_OP_ERROR_GOTO
 #undef XFS_SCRUB_SB_CHECK
+
+/* AGF */
+
+#define XFS_SCRUB_AGF_CHECK(fs_ok) \
+	XFS_SCRUB_CHECK(sc, sc->sa.agf_bp, "AGF", fs_ok)
+#define XFS_SCRUB_AGF_OP_ERROR_GOTO(error, label) \
+	XFS_SCRUB_OP_ERROR_GOTO(sc, sc->sm->sm_agno, \
+			XFS_AGF_BLOCK(sc->mp), "AGF", error, label)
+/* Scrub the AGF. */
+int
+xfs_scrub_agf(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_agf			*agf;
+	xfs_daddr_t			daddr;
+	xfs_daddr_t			eofs;
+	xfs_agnumber_t			agno;
+	xfs_agblock_t			agbno;
+	xfs_agblock_t			eoag;
+	xfs_agblock_t			agfl_first;
+	xfs_agblock_t			agfl_last;
+	xfs_agblock_t			agfl_count;
+	xfs_agblock_t			fl_count;
+	int				level;
+	int				error = 0;
+
+	agno = sc->sm->sm_agno;
+	error = xfs_scrub_load_ag_headers(sc, agno, XFS_SCRUB_TYPE_AGF);
+	XFS_SCRUB_AGF_OP_ERROR_GOTO(&error, out);
+
+	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+	eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks);
+
+	/* Check the AG length */
+	eoag = be32_to_cpu(agf->agf_length);
+	XFS_SCRUB_AGF_CHECK(eoag == xfs_scrub_ag_blocks(mp, agno));
+
+	/* Check the AGF btree roots and levels */
+	agbno = be32_to_cpu(agf->agf_roots[XFS_BTNUM_BNO]);
+	daddr = XFS_AGB_TO_DADDR(mp, agno, agbno);
+	XFS_SCRUB_AGF_CHECK(agbno > XFS_AGI_BLOCK(mp));
+	XFS_SCRUB_AGF_CHECK(agbno < mp->m_sb.sb_agblocks);
+	XFS_SCRUB_AGF_CHECK(agbno < eoag);
+	XFS_SCRUB_AGF_CHECK(daddr < eofs);
+
+	agbno = be32_to_cpu(agf->agf_roots[XFS_BTNUM_CNT]);
+	daddr = XFS_AGB_TO_DADDR(mp, agno, agbno);
+	XFS_SCRUB_AGF_CHECK(agbno > XFS_AGI_BLOCK(mp));
+	XFS_SCRUB_AGF_CHECK(agbno < mp->m_sb.sb_agblocks);
+	XFS_SCRUB_AGF_CHECK(agbno < eoag);
+	XFS_SCRUB_AGF_CHECK(daddr < eofs);
+
+	level = be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNO]);
+	XFS_SCRUB_AGF_CHECK(level > 0);
+	XFS_SCRUB_AGF_CHECK(level <= XFS_BTREE_MAXLEVELS);
+
+	level = be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNT]);
+	XFS_SCRUB_AGF_CHECK(level > 0);
+	XFS_SCRUB_AGF_CHECK(level <= XFS_BTREE_MAXLEVELS);
+
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		agbno = be32_to_cpu(agf->agf_roots[XFS_BTNUM_RMAP]);
+		daddr = XFS_AGB_TO_DADDR(mp, agno, agbno);
+		XFS_SCRUB_AGF_CHECK(agbno > XFS_AGI_BLOCK(mp));
+		XFS_SCRUB_AGF_CHECK(agbno < mp->m_sb.sb_agblocks);
+		XFS_SCRUB_AGF_CHECK(agbno < eoag);
+		XFS_SCRUB_AGF_CHECK(daddr < eofs);
+
+		level = be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]);
+		XFS_SCRUB_AGF_CHECK(level > 0);
+		XFS_SCRUB_AGF_CHECK(level <= XFS_BTREE_MAXLEVELS);
+	}
+
+	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+		agbno = be32_to_cpu(agf->agf_refcount_root);
+		daddr = XFS_AGB_TO_DADDR(mp, agno, agbno);
+		XFS_SCRUB_AGF_CHECK(agbno > XFS_AGI_BLOCK(mp));
+		XFS_SCRUB_AGF_CHECK(agbno < mp->m_sb.sb_agblocks);
+		XFS_SCRUB_AGF_CHECK(agbno < eoag);
+		XFS_SCRUB_AGF_CHECK(daddr < eofs);
+
+		level = be32_to_cpu(agf->agf_refcount_level);
+		XFS_SCRUB_AGF_CHECK(level > 0);
+		XFS_SCRUB_AGF_CHECK(level <= XFS_BTREE_MAXLEVELS);
+	}
+
+	/* Check the AGFL counters */
+	agfl_first = be32_to_cpu(agf->agf_flfirst);
+	agfl_last = be32_to_cpu(agf->agf_fllast);
+	agfl_count = be32_to_cpu(agf->agf_flcount);
+	if (agfl_last >= agfl_first)
+		fl_count = agfl_last - agfl_first + 1;
+	else
+		fl_count = XFS_AGFL_SIZE(mp) - agfl_first + agfl_last + 1;
+	XFS_SCRUB_AGF_CHECK(agfl_count == 0 || fl_count == agfl_count);
+
+out:
+	return error;
+}
+#undef XFS_SCRUB_AGF_OP_ERROR_GOTO
+#undef XFS_SCRUB_AGF_CHECK
+
+/* AGFL */
+
+#define XFS_SCRUB_AGFL_CHECK(fs_ok) \
+	XFS_SCRUB_CHECK(sc, sc->sa.agfl_bp, "AGFL", fs_ok)
+struct xfs_scrub_agfl {
+	xfs_agblock_t			eoag;
+	xfs_daddr_t			eofs;
+};
+
+/* Scrub an AGFL block. */
+STATIC int
+xfs_scrub_agfl_block(
+	struct xfs_scrub_context	*sc,
+	xfs_agblock_t			agbno,
+	void				*priv)
+{
+	struct xfs_mount		*mp = sc->mp;
+	xfs_agnumber_t			agno = sc->sa.agno;
+	struct xfs_scrub_agfl		*sagfl = priv;
+
+	XFS_SCRUB_AGFL_CHECK(agbno > XFS_AGI_BLOCK(mp));
+	XFS_SCRUB_AGFL_CHECK(XFS_AGB_TO_DADDR(mp, agno, agbno) < sagfl->eofs);
+	XFS_SCRUB_AGFL_CHECK(agbno < mp->m_sb.sb_agblocks);
+	XFS_SCRUB_AGFL_CHECK(agbno < sagfl->eoag);
+
+	return 0;
+}
+
+#define XFS_SCRUB_AGFL_OP_ERROR_GOTO(error, label) \
+	XFS_SCRUB_OP_ERROR_GOTO(sc, sc->sm->sm_agno, \
+			XFS_AGFL_BLOCK(sc->mp), "AGFL", error, label)
+/* Scrub the AGFL. */
+int
+xfs_scrub_agfl(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_scrub_agfl		sagfl;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_agf			*agf;
+	int				error;
+
+	error = xfs_scrub_load_ag_headers(sc, sc->sm->sm_agno,
+			XFS_SCRUB_TYPE_AGFL);
+	XFS_SCRUB_AGFL_OP_ERROR_GOTO(&error, out);
+	if (!sc->sa.agf_bp)
+		return -EFSCORRUPTED;
+
+	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+	sagfl.eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks);
+	sagfl.eoag = be32_to_cpu(agf->agf_length);
+
+	/* Check the blocks in the AGFL. */
+	return xfs_scrub_walk_agfl(sc, xfs_scrub_agfl_block, &sagfl);
+out:
+	return error;
+}
+#undef XFS_SCRUB_AGFL_OP_ERROR_GOTO
+#undef XFS_SCRUB_AGFL_CHECK
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 9285107..d1ef722 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -603,6 +603,66 @@ xfs_scrub_setup(
 	return sc->fns->setup(sc, ip);
 }
 
+/*
+ * Load and verify an AG header for further AG header examination.
+ * If a runtime or verifier error occurs and this header is not the
+ * target of the examination, return zero and leave the buffer NULL.
+ */
+STATIC int
+xfs_scrub_load_ag_header(
+	struct xfs_scrub_context	*sc,
+	xfs_daddr_t			daddr,
+	struct xfs_buf			**bpp,
+	const struct xfs_buf_ops	*ops,
+	bool				is_target)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	*bpp = NULL;
+	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+			XFS_AG_DADDR(mp, sc->sa.agno, daddr),
+			XFS_FSS_TO_BB(mp, 1), 0, bpp, ops);
+	return is_target ? error : 0;
+}
+
+/*
+ * Load as many of the AG headers and btree cursors as we can for an
+ * examination and cross-reference of an AG header.
+ */
+int
+xfs_scrub_load_ag_headers(
+	struct xfs_scrub_context	*sc,
+	xfs_agnumber_t			agno,
+	unsigned int			type)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	ASSERT(type == XFS_SCRUB_TYPE_AGF || type == XFS_SCRUB_TYPE_AGFL);
+	memset(&sc->sa, 0, sizeof(sc->sa));
+	sc->sa.agno = agno;
+
+	error = xfs_scrub_load_ag_header(sc, XFS_AGI_DADDR(mp),
+			&sc->sa.agi_bp, &xfs_agi_buf_ops, false);
+	if (error)
+		return error;
+
+	error = xfs_scrub_load_ag_header(sc, XFS_AGF_DADDR(mp),
+			&sc->sa.agf_bp, &xfs_agf_buf_ops,
+			type == XFS_SCRUB_TYPE_AGF);
+	if (error)
+		return error;
+
+	error = xfs_scrub_load_ag_header(sc, XFS_AGFL_DADDR(mp),
+			&sc->sa.agfl_bp, &xfs_agfl_buf_ops,
+			type == XFS_SCRUB_TYPE_AGFL);
+	if (error)
+		return error;
+
+	return 0;
+}
+
 /* Scrubbing dispatch. */
 
 static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
@@ -618,6 +678,14 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
 		.setup	= xfs_scrub_setup_ag_header,
 		.scrub	= xfs_scrub_superblock,
 	},
+	{ /* agf */
+		.setup	= xfs_scrub_setup_ag_header,
+		.scrub	= xfs_scrub_agf,
+	},
+	{ /* agfl */
+		.setup	= xfs_scrub_setup_ag_header,
+		.scrub	= xfs_scrub_agfl,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 094a708..5c4893d 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -193,6 +193,12 @@ int xfs_scrub_ag_read_headers(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
 			      struct xfs_buf **agfl);
 int xfs_scrub_ag_btcur_init(struct xfs_scrub_context *sc,
 			    struct xfs_scrub_ag *sa);
+int xfs_scrub_load_ag_headers(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
+			      unsigned int type);
+int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
+			int (*fn)(struct xfs_scrub_context *, xfs_agblock_t bno,
+				  void *),
+			void *priv);
 
 /* Setup functions */
 
@@ -208,6 +214,8 @@ SETUP_FN(xfs_scrub_setup_ag_header);
 SCRUB_FN(xfs_scrub_dummy);
 SCRUB_FN(xfs_scrub_metabufs);
 SCRUB_FN(xfs_scrub_superblock);
+SCRUB_FN(xfs_scrub_agf);
+SCRUB_FN(xfs_scrub_agfl);
 #undef SCRUB_FN
 
 #endif	/* __XFS_REPAIR_COMMON_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 483008a..ebf6045 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3314,7 +3314,9 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
 #define XFS_SCRUB_TYPE_DESC \
 	{ XFS_SCRUB_TYPE_TEST,		"dummy" }, \
 	{ XFS_SCRUB_TYPE_METABUFS,	"metabufs" }, \
-	{ XFS_SCRUB_TYPE_SB,		"superblock" }
+	{ XFS_SCRUB_TYPE_SB,		"superblock" }, \
+	{ XFS_SCRUB_TYPE_AGF,		"AGF" }, \
+	{ XFS_SCRUB_TYPE_AGFL,		"AGFL" }
 DECLARE_EVENT_CLASS(xfs_scrub_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
 		 int error),
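
The AGFL is a circular array, which is why xfs_scrub_walk_agfl() above
needs separate loops for the wrapped and unwrapped cases.  The same
traversal can be sketched in userspace with a single loop that steps
modulo the array size.  Everything named sketch_* is hypothetical, the
slot array is assumed to be in host byte order already, and first == last
is treated as a one-entry list, as the walker's fllast >= flfirst test
does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback type: return nonzero to stop the walk. */
typedef int (*sketch_agfl_fn)(uint32_t bno, void *priv);

/* Example callback: count the slots visited. */
static int
sketch_count_slot(uint32_t bno, void *priv)
{
	(void)bno;
	(*(unsigned int *)priv)++;
	return 0;
}

/*
 * Visit every active slot of a circular free list kept in bno[0..size-1].
 * first and last are inclusive slot indexes; count == 0 means empty.
 */
static int
sketch_walk_agfl(
	const uint32_t	*bno,
	unsigned int	size,
	unsigned int	first,
	unsigned int	last,
	unsigned int	count,
	sketch_agfl_fn	fn,
	void		*priv)
{
	unsigned int	i = first;
	int		error;

	assert(size > 0 && first < size && last < size);

	/* Skip an empty list. */
	if (count == 0)
		return 0;

	for (;;) {
		error = fn(bno[i], priv);
		if (error)
			return error;
		if (i == last)
			break;
		i = (i + 1) % size;	/* wrap from the end to slot 0 */
	}
	return 0;
}

/* Number of active slots implied by first and last on a non-empty list. */
static unsigned int
sketch_agfl_active(unsigned int size, unsigned int first, unsigned int last)
{
	if (last >= first)
		return last - first + 1;
	return size - first + last + 1;
}
```

A scrubber can compare sketch_agfl_active() against the recorded count
before walking, so a first/last pair that disagrees with the count is
reported instead of silently visiting the wrong number of slots.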



* [PATCH 08/22] xfs: scrub the AGI
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
                   ` (6 preceding siblings ...)
  2017-07-21  4:39 ` [PATCH 07/22] xfs: scrub AGF and AGFL Darrick J. Wong
@ 2017-07-21  4:39 ` Darrick J. Wong
  2017-07-23 17:02   ` Allison Henderson
  2017-07-21  4:39 ` [PATCH 09/22] xfs: scrub free space btrees Darrick J. Wong
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:39 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Add a forgotten check to the AGI verifier, then wire up the scrub
infrastructure to check the AGI contents.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_fs.h  |    3 +
 fs/xfs/scrub/agheader.c |   96 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.c   |   10 ++++-
 fs/xfs/scrub/common.h   |    1 
 fs/xfs/xfs_trace.h      |    3 +
 5 files changed, 109 insertions(+), 4 deletions(-)
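
For reviewers: the inode pointer checks in xfs_scrub_agi() all reduce to
one range test against the first and last inode numbers an AG can hold.
The userspace sketch below mirrors the arithmetic of the patch as
posted, including the eoag + 1 upper bound.  It assumes that
XFS_OFFBNO_TO_AGINO() shifts the AG block number left by sb_inopblog and
ORs in the offset; the sketch_* names and SKETCH_NULLAGINO are
hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NULLAGINO	((uint32_t)-1)	/* stand-in for NULLAGINO */

struct sketch_agino_range {
	uint32_t	first;	/* lowest valid AG inode number */
	uint32_t	last;	/* highest valid AG inode number */
};

/* Inode number of slot 'off' in AG block 'agbno'. */
static uint32_t
sketch_offbno_to_agino(unsigned int inopblog, uint32_t agbno, uint32_t off)
{
	return (agbno << inopblog) | off;
}

/* Compute the inode number range the same way the patch does. */
static struct sketch_agino_range
sketch_agino_bounds(unsigned int inopblog, uint32_t agi_block, uint32_t eoag)
{
	struct sketch_agino_range	r;

	r.first = sketch_offbno_to_agino(inopblog, agi_block + 1, 0);
	r.last = sketch_offbno_to_agino(inopblog, eoag + 1, 0) - 1;
	return r;
}

/* A null pointer is always fine; anything else must be in range. */
static bool
sketch_agino_ptr_ok(const struct sketch_agino_range *r, uint32_t agino)
{
	if (agino == SKETCH_NULLAGINO)
		return true;
	return agino >= r->first && agino <= r->last;
}

/* The inode count must fit in the range and cover the free count. */
static bool
sketch_agi_counts_ok(
	const struct sketch_agino_range	*r,
	uint32_t			count,
	uint32_t			freecount)
{
	return count <= r->last - r->first + 1 && count >= freecount;
}
```

The same sketch_agino_ptr_ok() test would serve agi_newino, agi_dirino,
and each of the unlinked buckets.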


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index cc35b7d..208cc48 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -486,7 +486,8 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_SB	2	/* superblock */
 #define XFS_SCRUB_TYPE_AGF	3	/* AG free header */
 #define XFS_SCRUB_TYPE_AGFL	4	/* AG free list */
-#define XFS_SCRUB_TYPE_MAX	4
+#define XFS_SCRUB_TYPE_AGI	5	/* AG inode header */
+#define XFS_SCRUB_TYPE_MAX	5
 
 /* i: repair this metadata */
 #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
index 48e276c..137d2ad 100644
--- a/fs/xfs/scrub/agheader.c
+++ b/fs/xfs/scrub/agheader.c
@@ -422,3 +422,99 @@ xfs_scrub_agfl(
 }
 #undef XFS_SCRUB_AGFL_OP_ERROR_GOTO
 #undef XFS_SCRUB_AGFL_CHECK
+
+/* AGI */
+
+#define XFS_SCRUB_AGI_CHECK(fs_ok) \
+	XFS_SCRUB_CHECK(sc, sc->sa.agi_bp, "AGI", fs_ok)
+#define XFS_SCRUB_AGI_OP_ERROR_GOTO(error, label) \
+	XFS_SCRUB_OP_ERROR_GOTO(sc, sc->sm->sm_agno, \
+			XFS_AGI_BLOCK(sc->mp), "AGI", error, label)
+/* Scrub the AGI. */
+int
+xfs_scrub_agi(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_agi			*agi;
+	xfs_daddr_t			daddr;
+	xfs_daddr_t			eofs;
+	xfs_agnumber_t			agno;
+	xfs_agblock_t			agbno;
+	xfs_agblock_t			eoag;
+	xfs_agino_t			agino;
+	xfs_agino_t			first_agino;
+	xfs_agino_t			last_agino;
+	int				i;
+	int				level;
+	int				error = 0;
+
+	agno = sc->sm->sm_agno;
+	error = xfs_scrub_load_ag_headers(sc, agno, XFS_SCRUB_TYPE_AGI);
+	XFS_SCRUB_AGI_OP_ERROR_GOTO(&error, out);
+
+	agi = XFS_BUF_TO_AGI(sc->sa.agi_bp);
+	eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks);
+
+	/* Check the AG length */
+	eoag = be32_to_cpu(agi->agi_length);
+	XFS_SCRUB_AGI_CHECK(eoag == xfs_scrub_ag_blocks(mp, agno));
+
+	/* Check btree roots and levels */
+	agbno = be32_to_cpu(agi->agi_root);
+	daddr = XFS_AGB_TO_DADDR(mp, agno, agbno);
+	XFS_SCRUB_AGI_CHECK(agbno > XFS_AGI_BLOCK(mp));
+	XFS_SCRUB_AGI_CHECK(agbno < mp->m_sb.sb_agblocks);
+	XFS_SCRUB_AGI_CHECK(agbno < eoag);
+	XFS_SCRUB_AGI_CHECK(daddr < eofs);
+
+	level = be32_to_cpu(agi->agi_level);
+	XFS_SCRUB_AGI_CHECK(level > 0);
+	XFS_SCRUB_AGI_CHECK(level <= XFS_BTREE_MAXLEVELS);
+
+	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
+		agbno = be32_to_cpu(agi->agi_free_root);
+		daddr = XFS_AGB_TO_DADDR(mp, agno, agbno);
+		XFS_SCRUB_AGI_CHECK(agbno > XFS_AGI_BLOCK(mp));
+		XFS_SCRUB_AGI_CHECK(agbno < mp->m_sb.sb_agblocks);
+		XFS_SCRUB_AGI_CHECK(agbno < eoag);
+		XFS_SCRUB_AGI_CHECK(daddr < eofs);
+
+		level = be32_to_cpu(agi->agi_free_level);
+		XFS_SCRUB_AGI_CHECK(level > 0);
+		XFS_SCRUB_AGI_CHECK(level <= XFS_BTREE_MAXLEVELS);
+	}
+
+	/* Check inode counters */
+	first_agino = XFS_OFFBNO_TO_AGINO(mp, XFS_AGI_BLOCK(mp) + 1, 0);
+	last_agino = XFS_OFFBNO_TO_AGINO(mp, eoag + 1, 0) - 1;
+	agino = be32_to_cpu(agi->agi_count);
+	XFS_SCRUB_AGI_CHECK(agino <= last_agino - first_agino + 1);
+	XFS_SCRUB_AGI_CHECK(agino >= be32_to_cpu(agi->agi_freecount));
+
+	/* Check inode pointers */
+	agino = be32_to_cpu(agi->agi_newino);
+	if (agino != NULLAGINO) {
+		XFS_SCRUB_AGI_CHECK(agino >= first_agino);
+		XFS_SCRUB_AGI_CHECK(agino <= last_agino);
+	}
+	agino = be32_to_cpu(agi->agi_dirino);
+	if (agino != NULLAGINO) {
+		XFS_SCRUB_AGI_CHECK(agino >= first_agino);
+		XFS_SCRUB_AGI_CHECK(agino <= last_agino);
+	}
+
+	/* Check unlinked inode buckets */
+	for (i = 0; i < XFS_AGI_UNLINKED_BUCKETS; i++) {
+		agino = be32_to_cpu(agi->agi_unlinked[i]);
+		if (agino == NULLAGINO)
+			continue;
+		XFS_SCRUB_AGI_CHECK(agino >= first_agino);
+		XFS_SCRUB_AGI_CHECK(agino <= last_agino);
+	}
+
+out:
+	return error;
+}
+#undef XFS_SCRUB_AGI_CHECK
+#undef XFS_SCRUB_AGI_OP_ERROR_GOTO
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index d1ef722..994c6c8 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -639,12 +639,14 @@ xfs_scrub_load_ag_headers(
 	struct xfs_mount		*mp = sc->mp;
 	int				error;
 
-	ASSERT(type == XFS_SCRUB_TYPE_AGF || type == XFS_SCRUB_TYPE_AGFL);
+	ASSERT(type == XFS_SCRUB_TYPE_AGF || type == XFS_SCRUB_TYPE_AGFL ||
+	       type == XFS_SCRUB_TYPE_AGI);
 	memset(&sc->sa, 0, sizeof(sc->sa));
 	sc->sa.agno = agno;
 
 	error = xfs_scrub_load_ag_header(sc, XFS_AGI_DADDR(mp),
-			&sc->sa.agi_bp, &xfs_agi_buf_ops, false);
+			&sc->sa.agi_bp, &xfs_agi_buf_ops,
+			type == XFS_SCRUB_TYPE_AGI);
 	if (error)
 		return error;
 
@@ -686,6 +688,10 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
 		.setup	= xfs_scrub_setup_ag_header,
 		.scrub	= xfs_scrub_agfl,
 	},
+	{ /* agi */
+		.setup	= xfs_scrub_setup_ag_header,
+		.scrub	= xfs_scrub_agi,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 5c4893d..952151a 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -216,6 +216,7 @@ SCRUB_FN(xfs_scrub_metabufs);
 SCRUB_FN(xfs_scrub_superblock);
 SCRUB_FN(xfs_scrub_agf);
 SCRUB_FN(xfs_scrub_agfl);
+SCRUB_FN(xfs_scrub_agi);
 #undef SCRUB_FN
 
 #endif	/* __XFS_REPAIR_COMMON_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index ebf6045..24efbff 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3316,7 +3316,8 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
 	{ XFS_SCRUB_TYPE_METABUFS,	"metabufs" }, \
 	{ XFS_SCRUB_TYPE_SB,		"superblock" }, \
 	{ XFS_SCRUB_TYPE_AGF,		"AGF" }, \
-	{ XFS_SCRUB_TYPE_AGFL,		"AGFL" }
+	{ XFS_SCRUB_TYPE_AGFL,		"AGFL" }, \
+	{ XFS_SCRUB_TYPE_AGI,		"AGI" }
 DECLARE_EVENT_CLASS(xfs_scrub_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
 		 int error),



* [PATCH 09/22] xfs: scrub free space btrees
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
                   ` (7 preceding siblings ...)
  2017-07-21  4:39 ` [PATCH 08/22] xfs: scrub the AGI Darrick J. Wong
@ 2017-07-21  4:39 ` Darrick J. Wong
  2017-07-23 17:09   ` Allison Henderson
  2017-07-21  4:39 ` [PATCH 10/22] xfs: scrub inode btrees Darrick J. Wong
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:39 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Check the extent records in the free space btrees to ensure that the
values look sane.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    4 +-
 fs/xfs/scrub/alloc.c   |  105 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.c  |   24 +++++++++++
 fs/xfs/scrub/common.h  |    6 +++
 fs/xfs/xfs_trace.h     |    4 +-
 6 files changed, 142 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/alloc.c
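
For reviewers: the record checks in xfs_scrub_allocbt_helper() boil down
to "the extent is non-empty, does not wrap, and ends inside both the
nominal AG size and the recorded AG length".  A userspace sketch of that
predicate follows.  sketch_free_extent_ok() is a hypothetical name, and
folding the two limits into their minimum is a simplification that is
assumed to accept and reject the same records as the separate checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Validate one free space record against the AG size limits. */
static bool
sketch_free_extent_ok(
	uint32_t	bno,		/* first block of the extent */
	uint32_t	len,		/* length in blocks */
	uint32_t	agblocks,	/* nominal AG size */
	uint32_t	aglength)	/* AG length recorded in the AGF */
{
	uint32_t	limit = agblocks < aglength ? agblocks : aglength;
	uint64_t	end = (uint64_t)bno + len;	/* cannot wrap in 64 bits */

	/* An empty extent has no business being in the btree. */
	if (len == 0)
		return false;

	/* The start must lie inside the AG. */
	if (bno >= limit)
		return false;

	/* The end may touch the limit but not pass it. */
	return end <= limit;
}
```

Doing the addition in 64 bits is what makes the end-of-extent test safe;
a 32-bit sum of a large bno and len could wrap and look small.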


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 1b9bd1a..ce492ee 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -141,6 +141,7 @@ xfs-$(CONFIG_EXPORTFS_BLOCK_OPS)	+= xfs_pnfs.o
 ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
 xfs-y				+= $(addprefix scrub/, \
 				   agheader.o \
+				   alloc.o \
 				   btree.o \
 				   common.o \
 				   metabufs.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 208cc48..bb36acf 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -487,7 +487,9 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_AGF	3	/* AG free header */
 #define XFS_SCRUB_TYPE_AGFL	4	/* AG free list */
 #define XFS_SCRUB_TYPE_AGI	5	/* AG inode header */
-#define XFS_SCRUB_TYPE_MAX	5
+#define XFS_SCRUB_TYPE_BNOBT	6	/* freesp by block btree */
+#define XFS_SCRUB_TYPE_CNTBT	7	/* freesp by length btree */
+#define XFS_SCRUB_TYPE_MAX	7
 
 /* i: repair this metadata */
 #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/alloc.c b/fs/xfs/scrub/alloc.c
new file mode 100644
index 0000000..1709ab2
--- /dev/null
+++ b/fs/xfs/scrub/alloc.c
@@ -0,0 +1,105 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_sb.h"
+#include "xfs_rmap.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+
+/*
+ * Set us up to scrub free space btrees.
+ * Push everything out of the log so that the busy extent list is empty.
+ */
+int
+xfs_scrub_setup_ag_allocbt(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_ag_btree(sc, ip, sc->try_harder);
+}
+
+/* Free space btree scrubber. */
+
+/* Scrub a bnobt/cntbt record. */
+STATIC int
+xfs_scrub_allocbt_helper(
+	struct xfs_scrub_btree		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_agf			*agf;
+	xfs_agblock_t			bno;
+	xfs_extlen_t			len;
+	int				error = 0;
+
+	bno = be32_to_cpu(rec->alloc.ar_startblock);
+	len = be32_to_cpu(rec->alloc.ar_blockcount);
+	agf = XFS_BUF_TO_AGF(bs->sc->sa.agf_bp);
+
+	XFS_SCRUB_BTREC_CHECK(bs, bno < mp->m_sb.sb_agblocks);
+	XFS_SCRUB_BTREC_CHECK(bs, bno < be32_to_cpu(agf->agf_length));
+	XFS_SCRUB_BTREC_CHECK(bs, bno < bno + len);
+	XFS_SCRUB_BTREC_CHECK(bs, (unsigned long long)bno + len <=
+			mp->m_sb.sb_agblocks);
+	XFS_SCRUB_BTREC_CHECK(bs, (unsigned long long)bno + len <=
+			be32_to_cpu(agf->agf_length));
+
+	return error;
+}
+
+/* Scrub the freespace btrees for some AG. */
+STATIC int
+xfs_scrub_allocbt(
+	struct xfs_scrub_context	*sc,
+	xfs_btnum_t			which)
+{
+	struct xfs_owner_info		oinfo;
+	struct xfs_btree_cur		*cur;
+
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
+	cur = which == XFS_BTNUM_BNO ? sc->sa.bno_cur : sc->sa.cnt_cur;
+	return xfs_scrub_btree(sc, cur, xfs_scrub_allocbt_helper,
+			&oinfo, NULL);
+}
+
+int
+xfs_scrub_bnobt(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_allocbt(sc, XFS_BTNUM_BNO);
+}
+
+int
+xfs_scrub_cntbt(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_allocbt(sc, XFS_BTNUM_CNT);
+}
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 994c6c8..86161b5 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -556,6 +556,22 @@ xfs_scrub_dummy(
 	return 0;
 }
 
+/* Set us up with AG headers and btree cursors. */
+int
+xfs_scrub_setup_ag_btree(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip,
+	bool				force_log)
+{
+	int				error;
+
+	error = xfs_scrub_setup_ag_header(sc, ip);
+	if (error)
+		return error;
+
+	return xfs_scrub_ag_init(sc, sc->sm->sm_agno, &sc->sa);
+}
+
 /* Per-scrubber setup functions */
 
 /* Set us up with a transaction and an empty context. */
@@ -692,6 +708,14 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
 		.setup	= xfs_scrub_setup_ag_header,
 		.scrub	= xfs_scrub_agi,
 	},
+	{ /* bnobt */
+		.setup	= xfs_scrub_setup_ag_allocbt,
+		.scrub	= xfs_scrub_bnobt,
+	},
+	{ /* cntbt */
+		.setup	= xfs_scrub_setup_ag_allocbt,
+		.scrub	= xfs_scrub_cntbt,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 952151a..f14abfb 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -202,10 +202,14 @@ int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
 
 /* Setup functions */
 
+int xfs_scrub_setup_ag_btree(struct xfs_scrub_context *sc,
+			     struct xfs_inode *ip, bool force_log);
+
 #define SETUP_FN(name) int name(struct xfs_scrub_context *sc, struct xfs_inode *ip)
 SETUP_FN(xfs_scrub_setup_fs);
 SETUP_FN(xfs_scrub_setup_metabufs);
 SETUP_FN(xfs_scrub_setup_ag_header);
+SETUP_FN(xfs_scrub_setup_ag_allocbt);
 #undef SETUP_FN
 
 /* Metadata scrubbers */
@@ -217,6 +221,8 @@ SCRUB_FN(xfs_scrub_superblock);
 SCRUB_FN(xfs_scrub_agf);
 SCRUB_FN(xfs_scrub_agfl);
 SCRUB_FN(xfs_scrub_agi);
+SCRUB_FN(xfs_scrub_bnobt);
+SCRUB_FN(xfs_scrub_cntbt);
 #undef SCRUB_FN
 
 #endif	/* __XFS_REPAIR_COMMON_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 24efbff..4a9a645 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3317,7 +3317,9 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
 	{ XFS_SCRUB_TYPE_SB,		"superblock" }, \
 	{ XFS_SCRUB_TYPE_AGF,		"AGF" }, \
 	{ XFS_SCRUB_TYPE_AGFL,		"AGFL" }, \
-	{ XFS_SCRUB_TYPE_AGI,		"AGI" }
+	{ XFS_SCRUB_TYPE_AGI,		"AGI" }, \
+	{ XFS_SCRUB_TYPE_BNOBT,		"bnobt" }, \
+	{ XFS_SCRUB_TYPE_CNTBT,		"cntbt" }
 DECLARE_EVENT_CLASS(xfs_scrub_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
 		 int error),



* [PATCH 10/22] xfs: scrub inode btrees
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
                   ` (8 preceding siblings ...)
  2017-07-21  4:39 ` [PATCH 09/22] xfs: scrub free space btrees Darrick J. Wong
@ 2017-07-21  4:39 ` Darrick J. Wong
  2017-07-23 17:15   ` Allison Henderson
  2017-07-21  4:39 ` [PATCH 11/22] xfs: scrub rmap btrees Darrick J. Wong
                   ` (11 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:39 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Check the records of the inode btrees to make sure that the values
make sense given the inode records themselves.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile            |    1 
 fs/xfs/libxfs/xfs_format.h |    2 
 fs/xfs/libxfs/xfs_fs.h     |    4 -
 fs/xfs/scrub/common.c      |    9 +
 fs/xfs/scrub/common.h      |    3 
 fs/xfs/scrub/ialloc.c      |  347 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_trace.h         |    4 -
 7 files changed, 367 insertions(+), 3 deletions(-)
 create mode 100644 fs/xfs/scrub/ialloc.c
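
The record-level invariants that the new ialloc.c scrubber enforces can be
sketched in userspace as follows.  This is only an illustration under stated
assumptions: struct demo_inobt_rec and the demo_* helpers are hypothetical
stand-ins for struct xfs_inobt_rec_incore and the kernel helpers, the
constants assume 64 inodes per chunk with a 16-bit holemask, and the sparse
and non-sparse cases are folded into a single predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry: 64 inodes per chunk, one holemask bit per 4 inodes. */
#define DEMO_INODES_PER_CHUNK		64
#define DEMO_HOLEMASK_BITS		16
#define DEMO_INODES_PER_HOLEMASK_BIT	\
	(DEMO_INODES_PER_CHUNK / DEMO_HOLEMASK_BITS)

/* Hypothetical stand-in for struct xfs_inobt_rec_incore. */
struct demo_inobt_rec {
	uint16_t	holemask;	/* set bit: these 4 inodes are a hole */
	uint8_t		count;		/* inodes physically allocated */
	uint8_t		freecount;	/* allocated inodes that are free */
	uint64_t	free;		/* set bit: inode is free or a hole */
};

/* Count the set bits in a 64-bit mask, one bit per inode. */
static unsigned int
demo_freecount(uint64_t mask)
{
	unsigned int	ret = 0;

	for (; mask; mask >>= 1)
		ret += (unsigned int)(mask & 1);
	return ret;
}

/* Expand the 16-bit holemask into a 64-bit, one-bit-per-inode hole mask. */
static uint64_t
demo_holes(uint16_t holemask)
{
	uint64_t	holes = 0;
	int		i;

	for (i = 0; i < DEMO_HOLEMASK_BITS; i++)
		if (holemask & (1u << i))
			holes |= 0xFULL << (i * DEMO_INODES_PER_HOLEMASK_BIT);
	return holes;
}

/* Return true if the record's counters and masks agree with each other. */
static bool
demo_inobt_rec_ok(const struct demo_inobt_rec *r)
{
	uint64_t	holes;
	unsigned int	expected_free;

	assert(r != NULL);
	if (r->count > DEMO_INODES_PER_CHUNK || r->freecount > r->count)
		return false;

	/* Free inodes plus unallocated (hole) inodes must match the mask. */
	expected_free = r->freecount + (DEMO_INODES_PER_CHUNK - r->count);
	if (expected_free != demo_freecount(r->free))
		return false;

	/* Every inode inside a hole must be marked free. */
	holes = demo_holes(r->holemask);
	if ((holes & r->free) != holes)
		return false;

	/* Holes and allocated inodes together must fill the chunk. */
	return demo_freecount(holes) + r->count == DEMO_INODES_PER_CHUNK;
}
```

For example, a record with holemask 0x000F, count 48, freecount 0 and a free
mask of 0xFFFF passes, because the 16 inodes covered by the holes are exactly
the 16 bits set in the free mask.  Comparing the free mask against the
in-core and on-disk inodes, which the patch also does, is not modelled here.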


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index ce492ee..5197bea 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -144,6 +144,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   alloc.o \
 				   btree.o \
 				   common.o \
+				   ialloc.o \
 				   metabufs.o \
 				   )
 endif
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 23229f0..154c3dd 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -518,7 +518,7 @@ static inline int xfs_sb_version_hasftype(struct xfs_sb *sbp)
 		 (sbp->sb_features2 & XFS_SB_VERSION2_FTYPE));
 }
 
-static inline int xfs_sb_version_hasfinobt(xfs_sb_t *sbp)
+static inline bool xfs_sb_version_hasfinobt(xfs_sb_t *sbp)
 {
 	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
 		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_FINOBT);
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index bb36acf..5120cfd 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -489,7 +489,9 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_AGI	5	/* AG inode header */
 #define XFS_SCRUB_TYPE_BNOBT	6	/* freesp by block btree */
 #define XFS_SCRUB_TYPE_CNTBT	7	/* freesp by length btree */
-#define XFS_SCRUB_TYPE_MAX	7
+#define XFS_SCRUB_TYPE_INOBT	8	/* inode btree */
+#define XFS_SCRUB_TYPE_FINOBT	9	/* free inode btree */
+#define XFS_SCRUB_TYPE_MAX	9
 
 /* i: repair this metadata */
 #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 86161b5..9a31846 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -716,6 +716,15 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
 		.setup	= xfs_scrub_setup_ag_allocbt,
 		.scrub	= xfs_scrub_cntbt,
 	},
+	{ /* inobt */
+		.setup	= xfs_scrub_setup_ag_iallocbt,
+		.scrub	= xfs_scrub_inobt,
+	},
+	{ /* finobt */
+		.setup	= xfs_scrub_setup_ag_iallocbt,
+		.scrub	= xfs_scrub_finobt,
+		.has	= xfs_sb_version_hasfinobt,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index f14abfb..cd89bec 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -210,6 +210,7 @@ SETUP_FN(xfs_scrub_setup_fs);
 SETUP_FN(xfs_scrub_setup_metabufs);
 SETUP_FN(xfs_scrub_setup_ag_header);
 SETUP_FN(xfs_scrub_setup_ag_allocbt);
+SETUP_FN(xfs_scrub_setup_ag_iallocbt);
 #undef SETUP_FN
 
 /* Metadata scrubbers */
@@ -223,6 +224,8 @@ SCRUB_FN(xfs_scrub_agfl);
 SCRUB_FN(xfs_scrub_agi);
 SCRUB_FN(xfs_scrub_bnobt);
 SCRUB_FN(xfs_scrub_cntbt);
+SCRUB_FN(xfs_scrub_inobt);
+SCRUB_FN(xfs_scrub_finobt);
 #undef SCRUB_FN
 
 #endif	/* __XFS_REPAIR_COMMON_H__ */
diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
new file mode 100644
index 0000000..ecf1852
--- /dev/null
+++ b/fs/xfs/scrub/ialloc.c
@@ -0,0 +1,347 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_icache.h"
+#include "xfs_rmap.h"
+#include "xfs_log.h"
+#include "xfs_trans_priv.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+
+/*
+ * Set us up to scrub inode btrees.
+ * If we detect a discrepancy between the inobt and the inode,
+ * try again after forcing logged inode cores out to disk.
+ */
+int
+xfs_scrub_setup_ag_iallocbt(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_ag_btree(sc, ip, sc->try_harder);
+}
+
+/* Inode btree scrubber. */
+
+/* Scrub a chunk of an inobt record. */
+STATIC int
+xfs_scrub_iallocbt_chunk(
+	struct xfs_scrub_btree		*bs,
+	struct xfs_inobt_rec_incore	*irec,
+	xfs_agino_t			agino,
+	xfs_extlen_t			len,
+	bool				*keep_scanning)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_agf			*agf;
+	xfs_agblock_t			eoag;
+	xfs_agblock_t			bno;
+	int				error = 0;
+
+	agf = XFS_BUF_TO_AGF(bs->sc->sa.agf_bp);
+	eoag = be32_to_cpu(agf->agf_length);
+	bno = XFS_AGINO_TO_AGBNO(mp, agino);
+
+	*keep_scanning = true;
+	XFS_SCRUB_BTREC_CHECK(bs, bno < mp->m_sb.sb_agblocks);
+	XFS_SCRUB_BTREC_CHECK(bs, bno < eoag);
+	XFS_SCRUB_BTREC_CHECK(bs, bno < bno + len);
+	XFS_SCRUB_BTREC_CHECK(bs, (unsigned long long)bno + len <=
+			mp->m_sb.sb_agblocks);
+	XFS_SCRUB_BTREC_CHECK(bs, (unsigned long long)bno + len <=
+			eoag);
+	if (error) {
+		*keep_scanning = false;
+		goto out;
+	}
+
+out:
+	return error;
+}
+
+/* Count the number of free inodes. */
+static unsigned int
+xfs_scrub_iallocbt_freecount(
+	xfs_inofree_t			freemask)
+{
+	int				bits = XFS_INODES_PER_CHUNK;
+	unsigned int			ret = 0;
+
+	while (bits--) {
+		if (freemask & 1)
+			ret++;
+		freemask >>= 1;
+	}
+
+	return ret;
+}
+
+/* Check a particular inode with ir_free. */
+STATIC int
+xfs_scrub_iallocbt_check_cluster_freemask(
+	struct xfs_scrub_btree		*bs,
+	xfs_ino_t			fsino,
+	xfs_agino_t			chunkino,
+	xfs_agino_t			clusterino,
+	struct xfs_inobt_rec_incore	*irec,
+	struct xfs_buf			*bp)
+{
+	struct xfs_dinode		*dip;
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	bool				freemask_ok;
+	bool				inuse;
+	int				error;
+
+	dip = xfs_buf_offset(bp, clusterino * mp->m_sb.sb_inodesize);
+	XFS_SCRUB_BTREC_GOTO(bs,
+			be16_to_cpu(dip->di_magic) == XFS_DINODE_MAGIC,
+			out);
+	XFS_SCRUB_BTREC_GOTO(bs,
+			dip->di_version < 3 || be64_to_cpu(dip->di_ino) ==
+				fsino + clusterino,
+			out);
+	freemask_ok = !!(irec->ir_free & XFS_INOBT_MASK(chunkino + clusterino));
+	error = xfs_icache_inode_is_allocated(mp, bs->cur->bc_tp,
+			fsino + clusterino, &inuse);
+	if (error == -ENOENT) {
+		/* Not cached, just read the disk buffer */
+		freemask_ok ^= !!(dip->di_mode);
+		if (!bs->sc->try_harder && !freemask_ok)
+			return -EDEADLOCK;
+	} else if (error < 0) {
+		/* Inode is only half assembled, don't bother. */
+		freemask_ok = true;
+	} else {
+		/* Inode is all there. */
+		freemask_ok ^= inuse;
+	}
+	XFS_SCRUB_BTREC_CHECK(bs, freemask_ok);
+out:
+	return 0;
+}
+
+/* Make sure the free mask is consistent with what the inodes think. */
+STATIC int
+xfs_scrub_iallocbt_check_freemask(
+	struct xfs_scrub_btree		*bs,
+	struct xfs_inobt_rec_incore	*irec)
+{
+	struct xfs_owner_info		oinfo;
+	struct xfs_imap			imap;
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_dinode		*dip;
+	struct xfs_buf			*bp;
+	xfs_ino_t			fsino;
+	xfs_agino_t			nr_inodes;
+	xfs_agino_t			agino;
+	xfs_agino_t			chunkino;
+	xfs_agino_t			clusterino;
+	xfs_agblock_t			agbno;
+	int				blks_per_cluster;
+	uint16_t			holemask;
+	uint16_t			ir_holemask;
+	int				error = 0;
+
+	/* Make sure the freemask matches the inode records. */
+	blks_per_cluster = xfs_icluster_size_fsb(mp);
+	nr_inodes = XFS_OFFBNO_TO_AGINO(mp, blks_per_cluster, 0);
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INODES);
+
+	for (agino = irec->ir_startino;
+	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;
+	     agino += blks_per_cluster * mp->m_sb.sb_inopblock) {
+		fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
+		chunkino = agino - irec->ir_startino;
+		agbno = XFS_AGINO_TO_AGBNO(mp, agino);
+
+		/* Compute the holemask bits that cover this cluster. */
+		for (clusterino = 0, holemask = 0; clusterino < nr_inodes;
+		     clusterino += XFS_INODES_PER_HOLEMASK_BIT)
+			holemask |= XFS_INOBT_MASK((chunkino + clusterino) /
+					XFS_INODES_PER_HOLEMASK_BIT);
+
+		/* The whole cluster must be a hole or not a hole. */
+		ir_holemask = (irec->ir_holemask & holemask);
+		XFS_SCRUB_BTREC_CHECK(bs, ir_holemask == holemask ||
+				ir_holemask == 0);
+
+		/* If any part of this is a hole, skip it. */
+		if (ir_holemask)
+			continue;
+
+		/* Grab the inode cluster buffer. */
+		imap.im_blkno = XFS_AGB_TO_DADDR(mp, bs->cur->bc_private.a.agno,
+				agbno);
+		imap.im_len = XFS_FSB_TO_BB(mp, blks_per_cluster);
+		imap.im_boffset = 0;
+
+		error = xfs_imap_to_bp(mp, bs->cur->bc_tp, &imap,
+				&dip, &bp, 0, 0);
+		XFS_SCRUB_BTREC_OP_ERROR_GOTO(bs, &error, next_cluster);
+
+		/* Which inodes are free? */
+		for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
+			error = xfs_scrub_iallocbt_check_cluster_freemask(bs,
+					fsino, chunkino, clusterino, irec, bp);
+			if (error) {
+				xfs_trans_brelse(bs->cur->bc_tp, bp);
+				return error;
+			}
+		}
+
+		xfs_trans_brelse(bs->cur->bc_tp, bp);
+next_cluster:
+		;
+	}
+
+	return error;
+}
+
+/* Scrub an inobt/finobt record. */
+STATIC int
+xfs_scrub_iallocbt_helper(
+	struct xfs_scrub_btree		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_agi			*agi;
+	struct xfs_inobt_rec_incore	irec;
+	uint64_t			holes;
+	xfs_agino_t			agino;
+	xfs_agblock_t			agbno;
+	xfs_extlen_t			len;
+	bool				keep_scanning;
+	int				holecount;
+	int				i;
+	int				error = 0;
+	int				err2 = 0;
+	unsigned int			real_freecount;
+	uint16_t			holemask;
+
+	xfs_inobt_btrec_to_irec(mp, rec, &irec);
+
+	XFS_SCRUB_BTREC_CHECK(bs, irec.ir_count <= XFS_INODES_PER_CHUNK);
+	XFS_SCRUB_BTREC_CHECK(bs, irec.ir_freecount <= XFS_INODES_PER_CHUNK);
+	real_freecount = irec.ir_freecount +
+			(XFS_INODES_PER_CHUNK - irec.ir_count);
+	XFS_SCRUB_BTREC_CHECK(bs, real_freecount ==
+			xfs_scrub_iallocbt_freecount(irec.ir_free));
+	agi = XFS_BUF_TO_AGI(bs->sc->sa.agi_bp);
+	agino = irec.ir_startino;
+	agbno = XFS_AGINO_TO_AGBNO(mp, irec.ir_startino);
+	XFS_SCRUB_BTREC_GOTO(bs, agbno < be32_to_cpu(agi->agi_length), out);
+	XFS_SCRUB_BTREC_CHECK(bs,
+			!(agbno & (xfs_ialloc_cluster_alignment(mp) - 1)));
+	XFS_SCRUB_BTREC_CHECK(bs, !(agbno & (xfs_icluster_size_fsb(mp) - 1)));
+
+	/* Handle non-sparse inodes */
+	if (!xfs_inobt_issparse(irec.ir_holemask)) {
+		len = XFS_B_TO_FSB(mp,
+				XFS_INODES_PER_CHUNK * mp->m_sb.sb_inodesize);
+		XFS_SCRUB_BTREC_CHECK(bs,
+				irec.ir_count == XFS_INODES_PER_CHUNK);
+
+		error = xfs_scrub_iallocbt_chunk(bs, &irec, agino, len,
+				&keep_scanning);
+		if (error)
+			goto out;
+		goto check_freemask;
+	}
+
+	/* Check each chunk of a sparse inode cluster. */
+	holemask = irec.ir_holemask;
+	holecount = 0;
+	len = XFS_B_TO_FSB(mp,
+			XFS_INODES_PER_HOLEMASK_BIT * mp->m_sb.sb_inodesize);
+	holes = ~xfs_inobt_irec_to_allocmask(&irec);
+	XFS_SCRUB_BTREC_CHECK(bs, (holes & irec.ir_free) == holes);
+	XFS_SCRUB_BTREC_CHECK(bs, irec.ir_freecount <= irec.ir_count);
+
+	for (i = 0; i < XFS_INOBT_HOLEMASK_BITS; holemask >>= 1,
+			i++, agino += XFS_INODES_PER_HOLEMASK_BIT) {
+		if (holemask & 1) {
+			holecount += XFS_INODES_PER_HOLEMASK_BIT;
+			continue;
+		}
+
+		err2 = xfs_scrub_iallocbt_chunk(bs, &irec, agino, len,
+				&keep_scanning);
+		if (!error && err2)
+			error = err2;
+		if (!keep_scanning)
+			break;
+	}
+
+	XFS_SCRUB_BTREC_CHECK(bs, holecount <= XFS_INODES_PER_CHUNK);
+	XFS_SCRUB_BTREC_CHECK(bs, holecount + irec.ir_count ==
+			XFS_INODES_PER_CHUNK);
+
+check_freemask:
+	error = xfs_scrub_iallocbt_check_freemask(bs, &irec);
+	if (error)
+		goto out;
+
+out:
+	return error;
+}
+
+/* Scrub the inode btrees for some AG. */
+STATIC int
+xfs_scrub_iallocbt(
+	struct xfs_scrub_context	*sc,
+	xfs_btnum_t			which)
+{
+	struct xfs_btree_cur		*cur;
+	struct xfs_owner_info		oinfo;
+
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT);
+	cur = which == XFS_BTNUM_INO ? sc->sa.ino_cur : sc->sa.fino_cur;
+	return xfs_scrub_btree(sc, cur, xfs_scrub_iallocbt_helper,
+			&oinfo, NULL);
+}
+
+int
+xfs_scrub_inobt(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_iallocbt(sc, XFS_BTNUM_INO);
+}
+
+int
+xfs_scrub_finobt(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_iallocbt(sc, XFS_BTNUM_FINO);
+}
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 4a9a645..e2c5f99 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3319,7 +3319,9 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
 	{ XFS_SCRUB_TYPE_AGFL,		"AGFL" }, \
 	{ XFS_SCRUB_TYPE_AGI,		"AGI" }, \
 	{ XFS_SCRUB_TYPE_BNOBT,		"bnobt" }, \
-	{ XFS_SCRUB_TYPE_CNTBT,		"cntbt" }
+	{ XFS_SCRUB_TYPE_CNTBT,		"cntbt" }, \
+	{ XFS_SCRUB_TYPE_INOBT,		"inobt" }, \
+	{ XFS_SCRUB_TYPE_FINOBT,	"finobt" }
 DECLARE_EVENT_CLASS(xfs_scrub_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
 		 int error),


* [PATCH 11/22] xfs: scrub rmap btrees
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
                   ` (9 preceding siblings ...)
  2017-07-21  4:39 ` [PATCH 10/22] xfs: scrub inode btrees Darrick J. Wong
@ 2017-07-21  4:39 ` Darrick J. Wong
  2017-07-23 17:21   ` Allison Henderson
  2017-07-21  4:39 ` [PATCH 12/22] xfs: scrub refcount btrees Darrick J. Wong
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:39 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Check the reverse mapping records to make sure that the contents
make sense.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    3 +
 fs/xfs/scrub/common.c  |    5 ++
 fs/xfs/scrub/common.h  |    2 +
 fs/xfs/scrub/rmap.c    |  127 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_trace.h     |    3 +
 6 files changed, 139 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/rmap.c
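
The flag checks in the rmapbt record helper reduce to a small set of
implications, which can be sketched in userspace like this.  The sketch is
not the kernel code: struct demo_rmap, the DEMO_RMAP_* bits and
demo_rmap_flags_ok() are hypothetical names, and the bit values are
assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits; treat the exact values as an assumption. */
#define DEMO_RMAP_ATTR_FORK	(1u << 0)
#define DEMO_RMAP_BMBT_BLOCK	(1u << 1)
#define DEMO_RMAP_UNWRITTEN	(1u << 2)

/* Hypothetical stand-in for the flag-related parts of an rmap record. */
struct demo_rmap {
	uint64_t	offset;		/* file offset of the mapping */
	unsigned int	flags;		/* DEMO_RMAP_* bits */
	bool		non_inode;	/* owner is AG metadata, not a file */
};

/* Return true if offset, flags and owner type are mutually consistent. */
static bool
demo_rmap_flags_ok(const struct demo_rmap *r)
{
	bool		is_attr, is_bmbt, is_unwritten;

	assert(r != NULL);
	is_attr = r->flags & DEMO_RMAP_ATTR_FORK;
	is_bmbt = r->flags & DEMO_RMAP_BMBT_BLOCK;
	is_unwritten = r->flags & DEMO_RMAP_UNWRITTEN;

	/* bmbt blocks and non-inode owners carry no file offset. */
	if ((is_bmbt || r->non_inode) && r->offset != 0)
		return false;
	/* Only written-state data fork file extents may be unwritten. */
	if (is_unwritten && (is_bmbt || r->non_inode || is_attr))
		return false;
	/* Non-inode owners may not set any of the three flags. */
	if (r->non_inode && (is_bmbt || is_unwritten || is_attr))
		return false;
	return true;
}
```

A data fork extent at a nonzero offset passes, while an unwritten attr fork
extent or a non-inode owner with a nonzero offset is reported as bad.  The
extent bounds and owner range checks in the patch are left out of this
sketch.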


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 5197bea..5fe0c8e 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -146,5 +146,6 @@ xfs-y				+= $(addprefix scrub/, \
 				   common.o \
 				   ialloc.o \
 				   metabufs.o \
+				   rmap.o \
 				   )
 endif
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 5120cfd..24db73a 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -491,7 +491,8 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_CNTBT	7	/* freesp by length btree */
 #define XFS_SCRUB_TYPE_INOBT	8	/* inode btree */
 #define XFS_SCRUB_TYPE_FINOBT	9	/* free inode btree */
-#define XFS_SCRUB_TYPE_MAX	9
+#define XFS_SCRUB_TYPE_RMAPBT	10	/* reverse mapping btree */
+#define XFS_SCRUB_TYPE_MAX	10
 
 /* i: repair this metadata */
 #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 9a31846..dfa5fc5 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -725,6 +725,11 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
 		.scrub	= xfs_scrub_finobt,
 		.has	= xfs_sb_version_hasfinobt,
 	},
+	{ /* rmapbt */
+		.setup	= xfs_scrub_setup_ag_rmapbt,
+		.scrub	= xfs_scrub_rmapbt,
+		.has	= xfs_sb_version_hasrmapbt,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index cd89bec..8fbd19b 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -211,6 +211,7 @@ SETUP_FN(xfs_scrub_setup_metabufs);
 SETUP_FN(xfs_scrub_setup_ag_header);
 SETUP_FN(xfs_scrub_setup_ag_allocbt);
 SETUP_FN(xfs_scrub_setup_ag_iallocbt);
+SETUP_FN(xfs_scrub_setup_ag_rmapbt);
 #undef SETUP_FN
 
 /* Metadata scrubbers */
@@ -226,6 +227,7 @@ SCRUB_FN(xfs_scrub_bnobt);
 SCRUB_FN(xfs_scrub_cntbt);
 SCRUB_FN(xfs_scrub_inobt);
 SCRUB_FN(xfs_scrub_finobt);
+SCRUB_FN(xfs_scrub_rmapbt);
 #undef SCRUB_FN
 
 #endif	/* __XFS_REPAIR_COMMON_H__ */
diff --git a/fs/xfs/scrub/rmap.c b/fs/xfs/scrub/rmap.c
new file mode 100644
index 0000000..82d027b
--- /dev/null
+++ b/fs/xfs/scrub/rmap.c
@@ -0,0 +1,127 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_sb.h"
+#include "xfs_rmap.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+
+/*
+ * Set us up to scrub reverse mapping btrees.
+ */
+int
+xfs_scrub_setup_ag_rmapbt(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_ag_btree(sc, ip, false);
+}
+
+/* Reverse-mapping scrubber. */
+
+/* Scrub an rmapbt record. */
+STATIC int
+xfs_scrub_rmapbt_helper(
+	struct xfs_scrub_btree		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_agf			*agf;
+	struct xfs_rmap_irec		irec;
+	xfs_agblock_t			eoag;
+	bool				non_inode;
+	bool				is_unwritten;
+	bool				is_bmbt;
+	bool				is_attr;
+	int				error;
+
+	error = xfs_rmap_btrec_to_irec(rec, &irec);
+	XFS_SCRUB_BTREC_OP_ERROR_GOTO(bs, &error, out);
+
+	/* Check extent. */
+	agf = XFS_BUF_TO_AGF(bs->sc->sa.agf_bp);
+	eoag = be32_to_cpu(agf->agf_length);
+	XFS_SCRUB_BTREC_CHECK(bs, irec.rm_startblock < mp->m_sb.sb_agblocks);
+	XFS_SCRUB_BTREC_CHECK(bs, irec.rm_startblock < eoag);
+	XFS_SCRUB_BTREC_CHECK(bs, irec.rm_startblock < irec.rm_startblock +
+			irec.rm_blockcount);
+	XFS_SCRUB_BTREC_CHECK(bs, irec.rm_startblock + irec.rm_blockcount <=
+			mp->m_sb.sb_agblocks);
+	XFS_SCRUB_BTREC_CHECK(bs, irec.rm_startblock + irec.rm_blockcount <=
+			eoag);
+
+	/* Check flags. */
+	non_inode = XFS_RMAP_NON_INODE_OWNER(irec.rm_owner);
+	is_bmbt = irec.rm_flags & XFS_RMAP_BMBT_BLOCK;
+	is_attr = irec.rm_flags & XFS_RMAP_ATTR_FORK;
+	is_unwritten = irec.rm_flags & XFS_RMAP_UNWRITTEN;
+
+	XFS_SCRUB_BTREC_CHECK(bs, !is_bmbt || irec.rm_offset == 0);
+	XFS_SCRUB_BTREC_CHECK(bs, !non_inode || irec.rm_offset == 0);
+	XFS_SCRUB_BTREC_CHECK(bs, !is_unwritten || !(is_bmbt || non_inode ||
+			is_attr));
+	XFS_SCRUB_BTREC_CHECK(bs, !non_inode || !(is_bmbt || is_unwritten ||
+			is_attr));
+
+	/* Owner inode within an AG? */
+	XFS_SCRUB_BTREC_CHECK(bs, non_inode ||
+			(XFS_INO_TO_AGNO(mp, irec.rm_owner) <
+							mp->m_sb.sb_agcount &&
+			 XFS_AGINO_TO_AGBNO(mp,
+				XFS_INO_TO_AGINO(mp, irec.rm_owner)) <
+							mp->m_sb.sb_agblocks));
+	/* Owner inode within the FS? */
+	XFS_SCRUB_BTREC_CHECK(bs, non_inode ||
+			XFS_AGB_TO_DADDR(mp,
+				XFS_INO_TO_AGNO(mp, irec.rm_owner),
+				XFS_AGINO_TO_AGBNO(mp,
+					XFS_INO_TO_AGINO(mp, irec.rm_owner))) <
+			XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks));
+
+	/* Non-inode owner within the magic values? */
+	XFS_SCRUB_BTREC_CHECK(bs, !non_inode ||
+			(irec.rm_owner > XFS_RMAP_OWN_MIN &&
+			 irec.rm_owner <= XFS_RMAP_OWN_FS));
+out:
+	return error;
+}
+
+/* Scrub the rmap btree for some AG. */
+int
+xfs_scrub_rmapbt(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_owner_info		oinfo;
+
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
+	return xfs_scrub_btree(sc, sc->sa.rmap_cur, xfs_scrub_rmapbt_helper,
+			&oinfo, NULL);
+}
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index e2c5f99..3996cb8 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3321,7 +3321,8 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
 	{ XFS_SCRUB_TYPE_BNOBT,		"bnobt" }, \
 	{ XFS_SCRUB_TYPE_CNTBT,		"cntbt" }, \
 	{ XFS_SCRUB_TYPE_INOBT,		"inobt" }, \
-	{ XFS_SCRUB_TYPE_FINOBT,	"finobt" }
+	{ XFS_SCRUB_TYPE_FINOBT,	"finobt" }, \
+	{ XFS_SCRUB_TYPE_RMAPBT,	"rmapbt" }
 DECLARE_EVENT_CLASS(xfs_scrub_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
 		 int error),


* [PATCH 12/22] xfs: scrub refcount btrees
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
                   ` (10 preceding siblings ...)
  2017-07-21  4:39 ` [PATCH 11/22] xfs: scrub rmap btrees Darrick J. Wong
@ 2017-07-21  4:39 ` Darrick J. Wong
  2017-07-23 17:25   ` Allison Henderson
  2017-07-21  4:39 ` [PATCH 13/22] xfs: scrub inodes Darrick J. Wong
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:39 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Plumb in the pieces necessary to check the refcount btree.  If rmap is
available, check the reference count by performing an interval query
against the rmapbt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile         |    1 
 fs/xfs/libxfs/xfs_fs.h  |    3 +
 fs/xfs/scrub/common.c   |    5 ++
 fs/xfs/scrub/common.h   |    2 +
 fs/xfs/scrub/refcount.c |   96 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_trace.h      |    3 +
 6 files changed, 108 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/refcount.c
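
The per-record checks in the refcountbt helper can be sketched in userspace
as follows.  This is a simplified illustration, not the kernel code: struct
demo_refc, DEMO_REFC_COW_START and demo_refc_ok() are hypothetical names, and
the sketch assumes that the CoW staging flag is the top bit of the 32-bit
start block.  It widens to 64 bits before adding, so a wrapped extent is
rejected by the bounds comparison instead of by a separate overflow test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding: top bit of the start block marks a CoW staging extent. */
#define DEMO_REFC_COW_START	(1u << 31)

/* Hypothetical stand-in for a decoded refcount btree record. */
struct demo_refc {
	uint32_t	startblock;	/* AG block, possibly with the CoW bit */
	uint32_t	blockcount;	/* extent length in blocks */
	uint32_t	refcount;	/* number of owners */
};

/*
 * Return true if the record is plausible for an AG that the superblock
 * says has 'agblocks' blocks and whose AGF says has 'eoag' blocks.
 */
static bool
demo_refc_ok(const struct demo_refc *rec, uint32_t agblocks, uint32_t eoag)
{
	bool		has_cowflag;
	uint32_t	start;
	uint64_t	end;

	assert(rec != NULL);
	has_cowflag = rec->startblock & DEMO_REFC_COW_START;
	start = rec->startblock & ~DEMO_REFC_COW_START;
	end = (uint64_t)start + rec->blockcount;

	/* A refcount of exactly 1 is valid only for CoW staging extents. */
	if (rec->refcount < 1 || (rec->refcount == 1) != has_cowflag)
		return false;
	/* The extent must be non-empty and start inside the AG. */
	if (rec->blockcount == 0 || start >= agblocks || start >= eoag)
		return false;
	/* The extent must also end inside the AG. */
	return end <= agblocks && end <= eoag;
}
```

With agblocks = 1000 and eoag = 900, the record (10, 5, 2) passes, while
(10, 5, 1) fails because a refcount of 1 lacks the CoW bit, and (898, 5, 2)
fails because the extent runs past the end of the AG.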


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 5fe0c8e..1b1972b 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -146,6 +146,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   common.o \
 				   ialloc.o \
 				   metabufs.o \
+				   refcount.o \
 				   rmap.o \
 				   )
 endif
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 24db73a..3253de9 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -492,7 +492,8 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_INOBT	8	/* inode btree */
 #define XFS_SCRUB_TYPE_FINOBT	9	/* free inode btree */
 #define XFS_SCRUB_TYPE_RMAPBT	10	/* reverse mapping btree */
-#define XFS_SCRUB_TYPE_MAX	10
+#define XFS_SCRUB_TYPE_REFCNTBT	11	/* reference count btree */
+#define XFS_SCRUB_TYPE_MAX	11
 
 /* i: repair this metadata */
 #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index dfa5fc5..71a980e 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -730,6 +730,11 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
 		.scrub	= xfs_scrub_rmapbt,
 		.has	= xfs_sb_version_hasrmapbt,
 	},
+	{ /* refcountbt */
+		.setup	= xfs_scrub_setup_ag_refcountbt,
+		.scrub	= xfs_scrub_refcountbt,
+		.has	= xfs_sb_version_hasreflink,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 8fbd19b..1f9ba8c6 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -212,6 +212,7 @@ SETUP_FN(xfs_scrub_setup_ag_header);
 SETUP_FN(xfs_scrub_setup_ag_allocbt);
 SETUP_FN(xfs_scrub_setup_ag_iallocbt);
 SETUP_FN(xfs_scrub_setup_ag_rmapbt);
+SETUP_FN(xfs_scrub_setup_ag_refcountbt);
 #undef SETUP_FN
 
 /* Metadata scrubbers */
@@ -228,6 +229,7 @@ SCRUB_FN(xfs_scrub_cntbt);
 SCRUB_FN(xfs_scrub_inobt);
 SCRUB_FN(xfs_scrub_finobt);
 SCRUB_FN(xfs_scrub_rmapbt);
+SCRUB_FN(xfs_scrub_refcountbt);
 #undef SCRUB_FN
 
 #endif	/* __XFS_REPAIR_COMMON_H__ */
diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c
new file mode 100644
index 0000000..fcc72c5
--- /dev/null
+++ b/fs/xfs/scrub/refcount.c
@@ -0,0 +1,96 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_sb.h"
+#include "xfs_rmap.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+
+/*
+ * Set us up to scrub reference count btrees.
+ */
+int
+xfs_scrub_setup_ag_refcountbt(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_ag_btree(sc, ip, false);
+}
+
+/* Reference count btree scrubber. */
+
+/* Scrub a refcountbt record. */
+STATIC int
+xfs_scrub_refcountbt_helper(
+	struct xfs_scrub_btree		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_agf			*agf;
+	struct xfs_refcount_irec	irec;
+	xfs_agblock_t			eoag;
+	bool				has_cowflag;
+	int				error = 0;
+
+	irec.rc_startblock = be32_to_cpu(rec->refc.rc_startblock);
+	irec.rc_blockcount = be32_to_cpu(rec->refc.rc_blockcount);
+	irec.rc_refcount = be32_to_cpu(rec->refc.rc_refcount);
+	agf = XFS_BUF_TO_AGF(bs->sc->sa.agf_bp);
+	eoag = be32_to_cpu(agf->agf_length);
+
+	has_cowflag = !!(irec.rc_startblock & XFS_REFC_COW_START);
+	XFS_SCRUB_BTREC_CHECK(bs, (irec.rc_refcount == 1 && has_cowflag) ||
+				  (irec.rc_refcount != 1 && !has_cowflag));
+	irec.rc_startblock &= ~XFS_REFC_COW_START;
+	XFS_SCRUB_BTREC_CHECK(bs, irec.rc_startblock < mp->m_sb.sb_agblocks);
+	XFS_SCRUB_BTREC_CHECK(bs, irec.rc_startblock < eoag);
+	XFS_SCRUB_BTREC_CHECK(bs, irec.rc_startblock < irec.rc_startblock +
+			irec.rc_blockcount);
+	XFS_SCRUB_BTREC_CHECK(bs, (unsigned long long)irec.rc_startblock +
+			irec.rc_blockcount <= mp->m_sb.sb_agblocks);
+	XFS_SCRUB_BTREC_CHECK(bs, (unsigned long long)irec.rc_startblock +
+			irec.rc_blockcount <= eoag);
+	XFS_SCRUB_BTREC_CHECK(bs, irec.rc_refcount >= 1);
+
+	return error;
+}
+
+/* Scrub the refcount btree for some AG. */
+int
+xfs_scrub_refcountbt(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_owner_info		oinfo;
+
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_REFC);
+	return xfs_scrub_btree(sc, sc->sa.refc_cur, xfs_scrub_refcountbt_helper,
+			&oinfo, NULL);
+}
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 3996cb8..6c0281b 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3322,7 +3322,8 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
 	{ XFS_SCRUB_TYPE_CNTBT,		"cntbt" }, \
 	{ XFS_SCRUB_TYPE_INOBT,		"inobt" }, \
 	{ XFS_SCRUB_TYPE_FINOBT,	"finobt" }, \
-	{ XFS_SCRUB_TYPE_RMAPBT,	"rmapbt" }
+	{ XFS_SCRUB_TYPE_RMAPBT,	"rmapbt" }, \
+	{ XFS_SCRUB_TYPE_REFCNTBT,	"refcountbt" }
 DECLARE_EVENT_CLASS(xfs_scrub_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
 		 int error),


* [PATCH 13/22] xfs: scrub inodes
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
                   ` (11 preceding siblings ...)
  2017-07-21  4:39 ` [PATCH 12/22] xfs: scrub refcount btrees Darrick J. Wong
@ 2017-07-21  4:39 ` Darrick J. Wong
  2017-07-23 17:38   ` Allison Henderson
  2017-07-21  4:40 ` [PATCH 14/22] xfs: scrub inode block mappings Darrick J. Wong
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:39 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Scrub the fields within an inode.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    3 
 fs/xfs/scrub/common.c  |   64 +++++++++
 fs/xfs/scrub/common.h  |    4 +
 fs/xfs/scrub/inode.c   |  326 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_trace.h     |    3 
 6 files changed, 397 insertions(+), 4 deletions(-)
 create mode 100644 fs/xfs/scrub/inode.c
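
The way xfs_scrub_get_inode() in this patch decides which inode to scrub can
be sketched in userspace as a pure decision function.  The names below
(struct demo_scrub_req, enum demo_target, demo_pick_inode(),
demo_generation_matches()) are hypothetical; the sketch models only the
argument validation and the generation comparison, not the inode cache
lookup, the internal-inode rejection, or the locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the scrub request that selects an inode. */
struct demo_scrub_req {
	uint64_t	ino;	/* 0 means "the file the ioctl was issued on" */
	uint32_t	gen;	/* expected generation, meaningful with ino */
	uint32_t	agno;	/* must be 0 for inode scrubs */
};

enum demo_target {
	DEMO_INVALID,		/* caller passed a nonsensical request */
	DEMO_USE_OPEN_INODE,	/* scrub the already-open inode */
	DEMO_LOOKUP_BY_NUMBER,	/* look the inode up, then check gen */
};

/* Decide how to find the inode, given the inode number of the open file. */
static enum demo_target
demo_pick_inode(const struct demo_scrub_req *sm, uint64_t open_ino)
{
	assert(sm != NULL);
	/* An AG number, or a generation without an inode, is rejected. */
	if (sm->agno || (sm->gen && !sm->ino))
		return DEMO_INVALID;
	/* No inode given, or it names the open file: reuse that inode. */
	if (sm->ino == 0 || sm->ino == open_ino)
		return DEMO_USE_OPEN_INODE;
	return DEMO_LOOKUP_BY_NUMBER;
}

/* After a lookup, a stale generation means the inode was reused. */
static bool
demo_generation_matches(const struct demo_scrub_req *sm, uint32_t found_gen)
{
	assert(sm != NULL);
	return sm->gen == found_gen;
}
```

In the patch, the invalid case maps to -EINVAL and a generation mismatch
after lookup maps to -ENOENT; the teardown path then releases the inode only
when it is not the one the caller already had open.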


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 1b1972b..2ba33ad 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -145,6 +145,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   btree.o \
 				   common.o \
 				   ialloc.o \
+				   inode.o \
 				   metabufs.o \
 				   refcount.o \
 				   rmap.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 3253de9..277b528 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -493,7 +493,8 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_FINOBT	9	/* free inode btree */
 #define XFS_SCRUB_TYPE_RMAPBT	10	/* reverse mapping btree */
 #define XFS_SCRUB_TYPE_REFCNTBT	11	/* reference count btree */
-#define XFS_SCRUB_TYPE_MAX	11
+#define XFS_SCRUB_TYPE_INODE	12	/* inode record */
+#define XFS_SCRUB_TYPE_MAX	12
 
 /* i: repair this metadata */
 #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 71a980e..066fd3e 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -31,6 +31,8 @@
 #include "xfs_trace.h"
 #include "xfs_sb.h"
 #include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_itable.h"
 #include "xfs_alloc.h"
 #include "xfs_alloc_btree.h"
 #include "xfs_bmap.h"
@@ -584,12 +586,60 @@ xfs_scrub_setup_fs(
 			&M_RES(sc->mp)->tr_itruncate, 0, 0, 0, &sc->tp);
 }
 
+/*
+ * Given an inode and the scrub control structure, grab either the
+ * inode referenced in the control structure or the inode passed in.
+ * The inode is not locked.
+ */
+int
+xfs_scrub_get_inode(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip_in)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_inode		*ips = NULL;
+	int				error;
+
+	if (sc->sm->sm_agno || (sc->sm->sm_gen && !sc->sm->sm_ino))
+		return -EINVAL;
+
+	/* We want to scan the inode we already had opened. */
+	if (sc->sm->sm_ino == 0 || sc->sm->sm_ino == ip_in->i_ino) {
+		sc->ip = ip_in;
+		return 0;
+	}
+
+	/* Look up the inode, see if the generation number matches. */
+	if (xfs_internal_inum(mp, sc->sm->sm_ino))
+		return -ENOENT;
+	error = xfs_iget(mp, NULL, sc->sm->sm_ino, XFS_IGET_UNTRUSTED,
+			0, &ips);
+	if (error == -ENOENT || error == -EINVAL) {
+		/* inode doesn't exist... */
+		return -ENOENT;
+	} else if (error) {
+		trace_xfs_scrub_op_error(mp,
+				XFS_INO_TO_AGNO(mp, sc->sm->sm_ino),
+				XFS_INO_TO_AGBNO(mp, sc->sm->sm_ino),
+				"inode", error, __func__, __LINE__);
+		return error;
+	}
+	if (VFS_I(ips)->i_generation != sc->sm->sm_gen) {
+		IRELE(ips);
+		return -ENOENT;
+	}
+
+	sc->ip = ips;
+	return 0;
+}
+
 /* Scrub setup and teardown */
 
 /* Free all the resources and finish the transactions. */
 STATIC int
 xfs_scrub_teardown(
 	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip_in,
 	int				error)
 {
 	xfs_scrub_ag_free(sc, &sc->sa);
@@ -597,6 +647,12 @@ xfs_scrub_teardown(
 		xfs_trans_cancel(sc->tp);
 		sc->tp = NULL;
 	}
+	if (sc->ip) {
+		xfs_iunlock(sc->ip, sc->ilock_flags);
+		if (sc->ip != ip_in)
+			IRELE(sc->ip);
+		sc->ip = NULL;
+	}
 	return error;
 }
 
@@ -735,6 +791,10 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
 		.scrub	= xfs_scrub_refcountbt,
 		.has	= xfs_sb_version_hasreflink,
 	},
+	{ /* inode record */
+		.setup	= xfs_scrub_setup_inode,
+		.scrub	= xfs_scrub_inode,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
@@ -808,7 +868,7 @@ xfs_scrub_metadata(
 		 * Tear down everything we hold, then set up again with
 		 * preparation for worst-case scenarios.
 		 */
-		error = xfs_scrub_teardown(&sc, 0);
+		error = xfs_scrub_teardown(&sc, ip, 0);
 		if (error)
 			goto out;
 		try_harder = true;
@@ -820,7 +880,7 @@ xfs_scrub_metadata(
 		xfs_alert_ratelimited(mp, "Corruption detected during scrub.");
 
 out_teardown:
-	error = xfs_scrub_teardown(&sc, error);
+	error = xfs_scrub_teardown(&sc, ip, error);
 out:
 	trace_xfs_scrub_done(ip, sm, error);
 	return error;
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 1f9ba8c6..5caa6c9 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -52,6 +52,7 @@ struct xfs_scrub_context {
 	const struct xfs_scrub_meta_fns	*fns;
 	struct xfs_trans		*tp;
 	struct xfs_inode		*ip;
+	uint				ilock_flags;
 	bool				try_harder;
 
 	/* State tracking for single-AG operations. */
@@ -204,6 +205,7 @@ int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
 
 int xfs_scrub_setup_ag_btree(struct xfs_scrub_context *sc,
 			     struct xfs_inode *ip, bool force_log);
+int xfs_scrub_get_inode(struct xfs_scrub_context *sc, struct xfs_inode *ip_in);
 
 #define SETUP_FN(name) int name(struct xfs_scrub_context *sc, struct xfs_inode *ip)
 SETUP_FN(xfs_scrub_setup_fs);
@@ -213,6 +215,7 @@ SETUP_FN(xfs_scrub_setup_ag_allocbt);
 SETUP_FN(xfs_scrub_setup_ag_iallocbt);
 SETUP_FN(xfs_scrub_setup_ag_rmapbt);
 SETUP_FN(xfs_scrub_setup_ag_refcountbt);
+SETUP_FN(xfs_scrub_setup_inode);
 #undef SETUP_FN
 
 /* Metadata scrubbers */
@@ -230,6 +233,7 @@ SCRUB_FN(xfs_scrub_inobt);
 SCRUB_FN(xfs_scrub_finobt);
 SCRUB_FN(xfs_scrub_rmapbt);
 SCRUB_FN(xfs_scrub_refcountbt);
+SCRUB_FN(xfs_scrub_inode);
 #undef SCRUB_FN
 
 #endif	/* __XFS_REPAIR_COMMON_H__ */
diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
new file mode 100644
index 0000000..6e1e037
--- /dev/null
+++ b/fs/xfs/scrub/inode.c
@@ -0,0 +1,326 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_inode_buf.h"
+#include "xfs_inode_fork.h"
+#include "xfs_ialloc.h"
+#include "xfs_log.h"
+#include "xfs_trans_priv.h"
+#include "xfs_reflink.h"
+#include "scrub/common.h"
+
+/* Set us up with an inode. */
+int
+xfs_scrub_setup_inode(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	/*
+	 * Try to get the inode.  If the verifiers fail, we try again
+	 * in raw mode.
+	 */
+	error = xfs_scrub_get_inode(sc, ip);
+	switch (error) {
+	case 0:
+		break;
+	case -EFSCORRUPTED:
+	case -EFSBADCRC:
+		/* Push everything out of the log onto disk prior to check. */
+		error = _xfs_log_force(mp, XFS_LOG_SYNC, NULL);
+		if (error)
+			return error;
+		xfs_ail_push_all_sync(mp->m_ail);
+		return 0;
+	default:
+		return error;
+	}
+
+	/* Got the inode, lock it and we're ready to go. */
+	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+	error = xfs_scrub_trans_alloc(sc->sm, mp, &M_RES(mp)->tr_itruncate,
+			0, 0, 0, &sc->tp);
+	if (error)
+		goto out_unlock;
+	sc->ilock_flags |= XFS_ILOCK_EXCL;
+	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
+
+	return error;
+out_unlock:
+	xfs_iunlock(sc->ip, sc->ilock_flags);
+	if (sc->ip != ip)
+		IRELE(sc->ip);
+	sc->ip = NULL;
+	return error;
+}
+
+/* Inode core */
+
+#define XFS_SCRUB_INODE_CHECK(fs_ok) \
+	XFS_SCRUB_INO_CHECK(sc, ino, bp, "inode", fs_ok)
+#define XFS_SCRUB_INODE_GOTO(fs_ok, label) \
+	XFS_SCRUB_INO_GOTO(sc, ino, bp, "inode", fs_ok, label)
+#define XFS_SCRUB_INODE_OP_ERROR_GOTO(label) \
+	XFS_SCRUB_OP_ERROR_GOTO(sc, XFS_INO_TO_AGNO(mp, ino), \
+			XFS_INO_TO_AGBNO(mp, ino), "inode", &error, label)
+#define XFS_SCRUB_INODE_PREEN(fs_ok) \
+	XFS_SCRUB_INO_PREEN(sc, bp, "inode", fs_ok)
+/* Scrub an inode. */
+int
+xfs_scrub_inode(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_imap			imap;
+	struct xfs_dinode		di;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*bp = NULL;
+	struct xfs_dinode		*dip;
+	xfs_ino_t			ino;
+	unsigned long long		isize;
+	uint64_t			flags2;
+	uint32_t			nextents;
+	uint32_t			extsize;
+	uint32_t			cowextsize;
+	uint16_t			flags;
+	uint16_t			mode;
+	bool				has_shared;
+	int				error = 0;
+
+	/* Did we get the in-core inode, or are we doing this manually? */
+	if (sc->ip) {
+		ino = sc->ip->i_ino;
+		xfs_inode_to_disk(sc->ip, &di, 0);
+		dip = &di;
+	} else {
+		/* Map & read inode. */
+		ino = sc->sm->sm_ino;
+		error = xfs_imap(mp, sc->tp, ino, &imap, XFS_IGET_UNTRUSTED);
+		if (error == -EINVAL) {
+			/*
+			 * Inode could have gotten deleted out from under us;
+			 * just forget about it.
+			 */
+			error = -ENOENT;
+			goto out;
+		}
+		XFS_SCRUB_INODE_OP_ERROR_GOTO(out);
+
+		error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+				imap.im_blkno, imap.im_len, XBF_UNMAPPED, &bp,
+				NULL);
+		XFS_SCRUB_INODE_OP_ERROR_GOTO(out);
+
+		/* Is this really the inode we want? */
+		bp->b_ops = &xfs_inode_buf_ops;
+		dip = xfs_buf_offset(bp, imap.im_boffset);
+		error = xfs_dinode_verify(mp, ino, dip) ? 0 : -EFSCORRUPTED;
+		XFS_SCRUB_INODE_OP_ERROR_GOTO(out);
+		XFS_SCRUB_INODE_GOTO(
+				xfs_dinode_good_version(mp, dip->di_version),
+				out);
+		if (be32_to_cpu(dip->di_gen) != sc->sm->sm_gen) {
+			error = -ENOENT;
+			goto out;
+		}
+	}
+
+	flags = be16_to_cpu(dip->di_flags);
+	if (dip->di_version >= 3)
+		flags2 = be64_to_cpu(dip->di_flags2);
+	else
+		flags2 = 0;
+
+	/* di_mode */
+	mode = be16_to_cpu(dip->di_mode);
+	XFS_SCRUB_INODE_CHECK(!(mode & ~(S_IALLUGO | S_IFMT)));
+
+	/* v1/v2 fields */
+	switch (dip->di_version) {
+	case 1:
+		XFS_SCRUB_INODE_CHECK(dip->di_nlink == 0);
+		XFS_SCRUB_INODE_CHECK(dip->di_mode || !sc->ip);
+		XFS_SCRUB_INODE_CHECK(dip->di_projid_lo == 0);
+		XFS_SCRUB_INODE_CHECK(dip->di_projid_hi == 0);
+		break;
+	case 2:
+	case 3:
+		XFS_SCRUB_INODE_CHECK(dip->di_onlink == 0);
+		XFS_SCRUB_INODE_CHECK(dip->di_mode || !sc->ip);
+		XFS_SCRUB_INODE_CHECK(dip->di_projid_hi == 0 ||
+				xfs_sb_version_hasprojid32bit(&mp->m_sb));
+		break;
+	default:
+		ASSERT(0);
+		break;
+	}
+
+	/* di_format */
+	switch (dip->di_format) {
+	case XFS_DINODE_FMT_DEV:
+		XFS_SCRUB_INODE_CHECK(S_ISCHR(mode) || S_ISBLK(mode) ||
+				      S_ISFIFO(mode) || S_ISSOCK(mode));
+		break;
+	case XFS_DINODE_FMT_LOCAL:
+		XFS_SCRUB_INODE_CHECK(S_ISDIR(mode) || S_ISLNK(mode));
+		break;
+	case XFS_DINODE_FMT_EXTENTS:
+		XFS_SCRUB_INODE_CHECK(S_ISREG(mode) || S_ISDIR(mode) ||
+				      S_ISLNK(mode));
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		XFS_SCRUB_INODE_CHECK(S_ISREG(mode) || S_ISDIR(mode));
+		break;
+	case XFS_DINODE_FMT_UUID:
+	default:
+		XFS_SCRUB_INODE_CHECK(false);
+		break;
+	}
+
+	/* di_size */
+	isize = be64_to_cpu(dip->di_size);
+	XFS_SCRUB_INODE_CHECK(!(isize & (1ULL << 63)));
+	if (!S_ISDIR(mode) && !S_ISREG(mode) && !S_ISLNK(mode))
+		XFS_SCRUB_INODE_CHECK(isize == 0);
+
+	/* di_nblocks */
+	if (flags2 & XFS_DIFLAG2_REFLINK) {
+		; /* nblocks can exceed dblocks */
+	} else if (flags & XFS_DIFLAG_REALTIME) {
+		XFS_SCRUB_INODE_CHECK(be64_to_cpu(dip->di_nblocks) <
+				mp->m_sb.sb_dblocks + mp->m_sb.sb_rblocks);
+	} else {
+		XFS_SCRUB_INODE_CHECK(be64_to_cpu(dip->di_nblocks) <
+				mp->m_sb.sb_dblocks);
+	}
+
+	/* di_extsize */
+	if (flags & XFS_DIFLAG_EXTSIZE) {
+		extsize = be32_to_cpu(dip->di_extsize);
+		XFS_SCRUB_INODE_CHECK(extsize > 0);
+		XFS_SCRUB_INODE_CHECK(extsize <= MAXEXTLEN);
+		XFS_SCRUB_INODE_CHECK(extsize <= mp->m_sb.sb_agblocks / 2 ||
+				(flags & XFS_DIFLAG_REALTIME));
+	}
+
+	/* di_flags */
+	XFS_SCRUB_INODE_CHECK(!(flags & XFS_DIFLAG_IMMUTABLE) ||
+			      !(flags & XFS_DIFLAG_APPEND));
+
+	XFS_SCRUB_INODE_CHECK(!(flags & XFS_DIFLAG_FILESTREAM) ||
+			      !(flags & XFS_DIFLAG_REALTIME));
+
+	/* di_nextents */
+	nextents = be32_to_cpu(dip->di_nextents);
+	switch (dip->di_format) {
+	case XFS_DINODE_FMT_EXTENTS:
+		XFS_SCRUB_INODE_CHECK(nextents <=
+			XFS_DFORK_DSIZE(dip, mp) / sizeof(struct xfs_bmbt_rec));
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		XFS_SCRUB_INODE_CHECK(nextents >
+			XFS_DFORK_DSIZE(dip, mp) / sizeof(struct xfs_bmbt_rec));
+		break;
+	case XFS_DINODE_FMT_LOCAL:
+	case XFS_DINODE_FMT_DEV:
+	case XFS_DINODE_FMT_UUID:
+	default:
+		XFS_SCRUB_INODE_CHECK(nextents == 0);
+		break;
+	}
+
+	/* di_anextents */
+	nextents = be16_to_cpu(dip->di_anextents);
+	switch (dip->di_aformat) {
+	case XFS_DINODE_FMT_EXTENTS:
+		XFS_SCRUB_INODE_CHECK(nextents <=
+			XFS_DFORK_ASIZE(dip, mp) / sizeof(struct xfs_bmbt_rec));
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		XFS_SCRUB_INODE_CHECK(nextents >
+			XFS_DFORK_ASIZE(dip, mp) / sizeof(struct xfs_bmbt_rec));
+		break;
+	case XFS_DINODE_FMT_LOCAL:
+	case XFS_DINODE_FMT_DEV:
+	case XFS_DINODE_FMT_UUID:
+	default:
+		XFS_SCRUB_INODE_CHECK(nextents == 0);
+		break;
+	}
+
+	/* di_forkoff */
+	XFS_SCRUB_INODE_CHECK(XFS_DFORK_APTR(dip) <
+			(char *)dip + mp->m_sb.sb_inodesize);
+	XFS_SCRUB_INODE_CHECK(dip->di_anextents == 0 || dip->di_forkoff);
+
+	/* di_aformat */
+	XFS_SCRUB_INODE_CHECK(dip->di_aformat == XFS_DINODE_FMT_LOCAL ||
+			      dip->di_aformat == XFS_DINODE_FMT_EXTENTS ||
+			      dip->di_aformat == XFS_DINODE_FMT_BTREE);
+
+	/* di_cowextsize */
+	if (flags2 & XFS_DIFLAG2_COWEXTSIZE) {
+		cowextsize = be32_to_cpu(dip->di_cowextsize);
+		XFS_SCRUB_INODE_CHECK(xfs_sb_version_hasreflink(&mp->m_sb));
+		XFS_SCRUB_INODE_CHECK(cowextsize > 0);
+		XFS_SCRUB_INODE_CHECK(cowextsize <= MAXEXTLEN);
+		XFS_SCRUB_INODE_CHECK(cowextsize <= mp->m_sb.sb_agblocks / 2);
+	}
+
+	/* Now let's do the things that require a live inode. */
+	if (!sc->ip)
+		goto out;
+
+	/*
+	 * Does this inode have the reflink flag set but no shared extents?
+	 * Set the preening flag if this is the case.
+	 */
+	if (xfs_is_reflink_inode(sc->ip)) {
+		error = xfs_reflink_inode_has_shared_extents(sc->tp, sc->ip,
+				&has_shared);
+		XFS_SCRUB_INODE_OP_ERROR_GOTO(out);
+		XFS_SCRUB_INODE_PREEN(has_shared == true);
+	}
+
+out:
+	if (bp)
+		xfs_trans_brelse(sc->tp, bp);
+	return error;
+}
+#undef XFS_SCRUB_INODE_PREEN
+#undef XFS_SCRUB_INODE_OP_ERROR_GOTO
+#undef XFS_SCRUB_INODE_GOTO
+#undef XFS_SCRUB_INODE_CHECK
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 6c0281b..950e2c8 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3323,7 +3323,8 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
 	{ XFS_SCRUB_TYPE_INOBT,		"inobt" }, \
 	{ XFS_SCRUB_TYPE_FINOBT,	"finobt" }, \
 	{ XFS_SCRUB_TYPE_RMAPBT,	"rmapbt" }, \
-	{ XFS_SCRUB_TYPE_REFCNTBT,	"refcountbt" }
+	{ XFS_SCRUB_TYPE_REFCNTBT,	"refcountbt" }, \
+	{ XFS_SCRUB_TYPE_INODE,		"inode" }
 DECLARE_EVENT_CLASS(xfs_scrub_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
 		 int error),



* [PATCH 14/22] xfs: scrub inode block mappings
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
                   ` (12 preceding siblings ...)
  2017-07-21  4:39 ` [PATCH 13/22] xfs: scrub inodes Darrick J. Wong
@ 2017-07-21  4:40 ` Darrick J. Wong
  2017-07-23 17:41   ` Allison Henderson
  2017-07-21  4:40 ` [PATCH 15/22] xfs: scrub directory/attribute btrees Darrick J. Wong
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:40 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

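Scrub an individual inode's data, attr, and CoW fork block mappings:
check that each extent is in file offset order, has a sane length, and
maps blocks that lie within the filesystem.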

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    5 +
 fs/xfs/scrub/bmap.c    |  378 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.c  |   12 ++
 fs/xfs/scrub/common.h  |    5 +
 fs/xfs/xfs_trace.h     |    5 +
 6 files changed, 404 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/bmap.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 2ba33ad..89c67e1a 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -142,6 +142,7 @@ ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
 xfs-y				+= $(addprefix scrub/, \
 				   agheader.o \
 				   alloc.o \
+				   bmap.o \
 				   btree.o \
 				   common.o \
 				   ialloc.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 277b528..d762277 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -494,7 +494,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_RMAPBT	10	/* reverse mapping btree */
 #define XFS_SCRUB_TYPE_REFCNTBT	11	/* reference count btree */
 #define XFS_SCRUB_TYPE_INODE	12	/* inode record */
-#define XFS_SCRUB_TYPE_MAX	12
+#define XFS_SCRUB_TYPE_BMBTD	13	/* data fork block mapping */
+#define XFS_SCRUB_TYPE_BMBTA	14	/* attr fork block mapping */
+#define XFS_SCRUB_TYPE_BMBTC	15	/* CoW fork block mapping */
+#define XFS_SCRUB_TYPE_MAX	15
 
 /* i: repair this metadata */
 #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
new file mode 100644
index 0000000..731f026
--- /dev/null
+++ b/fs/xfs/scrub/bmap.c
@@ -0,0 +1,378 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_inode_fork.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_rmap.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+
+/* Set us up with an inode's bmap. */
+STATIC int
+__xfs_scrub_setup_inode_bmap(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip,
+	bool				flush_data)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	error = xfs_scrub_get_inode(sc, ip);
+	if (error)
+		return error;
+
+	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+
+	/*
+	 * We don't want any ephemeral data fork updates sitting around
+	 * while we inspect block mappings, so wait for directio to finish
+	 * and flush dirty data if we have delalloc reservations.
+	 */
+	if (S_ISREG(VFS_I(sc->ip)->i_mode) && flush_data) {
+		inode_dio_wait(VFS_I(sc->ip));
+		error = filemap_write_and_wait(VFS_I(sc->ip)->i_mapping);
+		if (error)
+			goto out_unlock;
+		error = invalidate_inode_pages2(VFS_I(sc->ip)->i_mapping);
+		if (error)
+			goto out_unlock;
+	}
+
+	/* Allocate a transaction, then take the ILOCK; we're ready to go. */
+	error = xfs_scrub_trans_alloc(sc->sm, mp, &M_RES(mp)->tr_itruncate,
+			0, 0, 0, &sc->tp);
+	if (error)
+		goto out_unlock;
+	sc->ilock_flags |= XFS_ILOCK_EXCL;
+	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
+
+	return 0;
+out_unlock:
+	xfs_iunlock(sc->ip, sc->ilock_flags);
+	if (sc->ip != ip)
+		IRELE(sc->ip);
+	sc->ip = NULL;
+	return error;
+}
+
+/* Set us up to scrub the data fork. */
+int
+xfs_scrub_setup_inode_bmap_data(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return __xfs_scrub_setup_inode_bmap(sc, ip, true);
+}
+
+/* Set us up to scrub the attr or CoW fork. */
+int
+xfs_scrub_setup_inode_bmap(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return __xfs_scrub_setup_inode_bmap(sc, ip, false);
+}
+
+/*
+ * Inode fork block mapping (BMBT) scrubber.
+ * More complex than the others because we have to scrub
+ * all the extents regardless of whether or not the fork
+ * is in btree format.
+ */
+
+struct xfs_scrub_bmap_info {
+	struct xfs_scrub_context	*sc;
+	const char			*type;
+	xfs_daddr_t			eofs;
+	xfs_fileoff_t			lastoff;
+	bool				is_rt;
+	bool				is_shared;
+	int				whichfork;
+};
+
+#define XFS_SCRUB_BMAP_CHECK(fs_ok) \
+	XFS_SCRUB_INO_CHECK(info->sc, info->sc->ip->i_ino, bp, info->type, fs_ok)
+#define XFS_SCRUB_BMAP_GOTO(fs_ok, label) \
+	XFS_SCRUB_INO_GOTO(info->sc, info->sc->ip->i_ino, bp, info->type, fs_ok, label)
+#define XFS_SCRUB_BMAP_OP_ERROR_GOTO(label) \
+	XFS_SCRUB_OP_ERROR_GOTO(info->sc, agno, 0, "bmap", &error, label)
+/* Scrub a single extent record. */
+STATIC int
+xfs_scrub_bmap_extent(
+	struct xfs_inode		*ip,
+	struct xfs_btree_cur		*cur,
+	struct xfs_scrub_bmap_info	*info,
+	struct xfs_bmbt_irec		*irec)
+{
+	struct xfs_scrub_ag		sa = { 0 };
+	struct xfs_mount		*mp = info->sc->mp;
+	struct xfs_buf			*bp = NULL;
+	xfs_daddr_t			daddr;
+	xfs_daddr_t			dlen;
+	xfs_fsblock_t			bno;
+	xfs_agnumber_t			agno;
+	int				error = 0;
+
+	if (cur)
+		xfs_btree_get_block(cur, 0, &bp);
+
+	XFS_SCRUB_BMAP_CHECK(irec->br_startoff >= info->lastoff);
+	XFS_SCRUB_BMAP_CHECK(irec->br_startblock != HOLESTARTBLOCK);
+	XFS_SCRUB_BMAP_CHECK(!isnullstartblock(irec->br_startblock));
+
+	/* Actual mapping, so check the block ranges. */
+	if (info->is_rt) {
+		daddr = XFS_FSB_TO_BB(mp, irec->br_startblock);
+		agno = NULLAGNUMBER;
+		bno = irec->br_startblock;
+	} else {
+		daddr = XFS_FSB_TO_DADDR(mp, irec->br_startblock);
+		agno = XFS_FSB_TO_AGNO(mp, irec->br_startblock);
+		XFS_SCRUB_BMAP_GOTO(agno < mp->m_sb.sb_agcount, out);
+		bno = XFS_FSB_TO_AGBNO(mp, irec->br_startblock);
+		XFS_SCRUB_BMAP_CHECK(bno < mp->m_sb.sb_agblocks);
+	}
+	dlen = XFS_FSB_TO_BB(mp, irec->br_blockcount);
+	XFS_SCRUB_BMAP_CHECK(irec->br_blockcount > 0);
+	XFS_SCRUB_BMAP_CHECK(irec->br_blockcount <= MAXEXTLEN);
+	XFS_SCRUB_BMAP_CHECK(daddr < info->eofs);
+	XFS_SCRUB_BMAP_CHECK(daddr + dlen <= info->eofs);
+	XFS_SCRUB_BMAP_CHECK(irec->br_state != XFS_EXT_UNWRITTEN ||
+			xfs_sb_version_hasextflgbit(&mp->m_sb));
+	if (error)
+		goto out;
+
+	/* Set ourselves up for cross-referencing later. */
+	if (!info->is_rt) {
+		error = xfs_scrub_ag_init(info->sc, agno, &sa);
+		XFS_SCRUB_BMAP_OP_ERROR_GOTO(out);
+	}
+
+	xfs_scrub_ag_free(info->sc, &sa);
+out:
+	info->lastoff = irec->br_startoff + irec->br_blockcount;
+	return error;
+}
+#undef XFS_SCRUB_BMAP_OP_ERROR_GOTO
+#undef XFS_SCRUB_BMAP_GOTO
+
+/* Scrub a bmbt record. */
+STATIC int
+xfs_scrub_bmapbt_helper(
+	struct xfs_scrub_btree		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_bmbt_rec_host	ihost;
+	struct xfs_bmbt_irec		irec;
+	struct xfs_scrub_bmap_info	*info = bs->private;
+	struct xfs_inode		*ip = bs->cur->bc_private.b.ip;
+	struct xfs_buf			*bp = NULL;
+	struct xfs_btree_block		*block;
+	uint64_t			owner;
+	int				i;
+
+	/*
+	 * Check the owners of the btree blocks up to the level below
+	 * the root since the verifiers don't do that.
+	 */
+	if (xfs_sb_version_hascrc(&bs->cur->bc_mp->m_sb) &&
+	    bs->cur->bc_ptrs[0] == 1) {
+		for (i = 0; i < bs->cur->bc_nlevels - 1; i++) {
+			block = xfs_btree_get_block(bs->cur, i, &bp);
+			owner = be64_to_cpu(block->bb_u.l.bb_owner);
+			XFS_SCRUB_BMAP_CHECK(owner == ip->i_ino);
+		}
+	}
+
+	/* Set up the in-core record and scrub it. */
+	ihost.l0 = be64_to_cpu(rec->bmbt.l0);
+	ihost.l1 = be64_to_cpu(rec->bmbt.l1);
+	xfs_bmbt_get_all(&ihost, &irec);
+	return xfs_scrub_bmap_extent(ip, bs->cur, info, &irec);
+}
+#undef XFS_SCRUB_BMAP_CHECK
+
+#define XFS_SCRUB_FORK_CHECK(fs_ok) \
+	XFS_SCRUB_INO_CHECK(sc, ip->i_ino, NULL, info.type, fs_ok)
+#define XFS_SCRUB_FORK_GOTO(fs_ok, label) \
+	XFS_SCRUB_INO_GOTO(sc, ip->i_ino, NULL, info.type, fs_ok, label)
+#define XFS_SCRUB_FORK_OP_ERROR_GOTO(label) \
+	XFS_SCRUB_OP_ERROR_GOTO(sc, \
+			XFS_INO_TO_AGNO(mp, ip->i_ino), \
+			XFS_INO_TO_AGBNO(mp, ip->i_ino), \
+			info.type, &error, label)
+/* Scrub an inode fork's block mappings. */
+STATIC int
+xfs_scrub_bmap(
+	struct xfs_scrub_context	*sc,
+	int				whichfork)
+{
+	struct xfs_bmbt_irec		irec;
+	struct xfs_scrub_bmap_info	info = {0};
+	struct xfs_owner_info		oinfo;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_inode		*ip = sc->ip;
+	struct xfs_ifork		*ifp;
+	struct xfs_btree_cur		*cur;
+	xfs_fileoff_t			endoff;
+	xfs_extnum_t			idx;
+	bool				found;
+	int				error = 0;
+	int				err2 = 0;
+
+	switch (whichfork) {
+	case XFS_DATA_FORK:
+		info.type = "data fork";
+		break;
+	case XFS_ATTR_FORK:
+		info.type = "attr fork";
+		break;
+	case XFS_COW_FORK:
+		info.type = "CoW fork";
+		break;
+	}
+	ifp = XFS_IFORK_PTR(ip, whichfork);
+
+	info.is_rt = whichfork == XFS_DATA_FORK && XFS_IS_REALTIME_INODE(ip);
+	info.eofs = XFS_FSB_TO_BB(mp, info.is_rt ? mp->m_sb.sb_rblocks :
+					      mp->m_sb.sb_dblocks);
+	info.whichfork = whichfork;
+	info.is_shared = whichfork == XFS_DATA_FORK && xfs_is_reflink_inode(ip);
+	info.sc = sc;
+
+	switch (whichfork) {
+	case XFS_COW_FORK:
+		/* Non-existent CoW forks are ignorable. */
+		if (!ifp)
+			goto out_unlock;
+		/* No CoW forks on non-reflink inodes/filesystems. */
+		XFS_SCRUB_FORK_GOTO(xfs_is_reflink_inode(ip), out_unlock);
+		break;
+	case XFS_ATTR_FORK:
+		if (!ifp)
+			goto out_unlock;
+		XFS_SCRUB_FORK_CHECK(xfs_sb_version_hasattr(&mp->m_sb) ||
+				     xfs_sb_version_hasattr2(&mp->m_sb));
+		break;
+	}
+
+	/* Check the fork values */
+	switch (XFS_IFORK_FORMAT(ip, whichfork)) {
+	case XFS_DINODE_FMT_UUID:
+	case XFS_DINODE_FMT_DEV:
+	case XFS_DINODE_FMT_LOCAL:
+		/* No mappings to check. */
+		goto out_unlock;
+	case XFS_DINODE_FMT_EXTENTS:
+		XFS_SCRUB_FORK_GOTO(ifp->if_flags & XFS_IFEXTENTS, out_unlock);
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		XFS_SCRUB_FORK_CHECK(whichfork != XFS_COW_FORK);
+		/* Scan the btree records. */
+		cur = xfs_bmbt_init_cursor(mp, sc->tp, ip, whichfork);
+		xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork);
+		err2 = xfs_scrub_btree(sc, cur, xfs_scrub_bmapbt_helper,
+				&oinfo, &info);
+		xfs_btree_del_cursor(cur, err2 ? XFS_BTREE_ERROR :
+						 XFS_BTREE_NOERROR);
+		if (err2 == -EDEADLOCK)
+			return err2;
+		else if (err2)
+			goto out_unlock;
+		break;
+	default:
+		XFS_SCRUB_FORK_GOTO(false, out_unlock);
+		break;
+	}
+
+	/* Extent data is in memory, so scrub that. */
+
+	/* Find the offset of the last extent in the mapping. */
+	error = xfs_bmap_last_offset(ip, &endoff, whichfork);
+	XFS_SCRUB_FORK_OP_ERROR_GOTO(out_unlock);
+
+	/* Scrub extent records. */
+	info.lastoff = 0;
+	ifp = XFS_IFORK_PTR(ip, whichfork);
+	for (found = xfs_iext_lookup_extent(ip, ifp, 0, &idx, &irec);
+	     found;
+	     found = xfs_iext_get_extent(ifp, ++idx, &irec)) {
+		if (xfs_scrub_should_terminate(&error))
+			break;
+		if (isnullstartblock(irec.br_startblock))
+			continue;
+		XFS_SCRUB_FORK_CHECK(irec.br_startoff < endoff);
+		err2 = xfs_scrub_bmap_extent(ip, NULL, &info, &irec);
+		if (err2 == -EDEADLOCK)
+			return err2;
+		else if (!error && err2)
+			error = err2;
+	}
+
+out_unlock:
+	if (error == 0 && err2 != 0)
+		error = err2;
+	return error;
+}
+#undef XFS_SCRUB_FORK_CHECK
+#undef XFS_SCRUB_FORK_GOTO
+
+/* Scrub an inode's data fork. */
+int
+xfs_scrub_bmap_data(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_bmap(sc, XFS_DATA_FORK);
+}
+
+/* Scrub an inode's attr fork. */
+int
+xfs_scrub_bmap_attr(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_bmap(sc, XFS_ATTR_FORK);
+}
+
+/* Scrub an inode's CoW fork. */
+int
+xfs_scrub_bmap_cow(
+	struct xfs_scrub_context	*sc)
+{
+	if (!xfs_is_reflink_inode(sc->ip))
+		return -ENOENT;
+
+	return xfs_scrub_bmap(sc, XFS_COW_FORK);
+}
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 066fd3e..da3c006 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -795,6 +795,18 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
 		.setup	= xfs_scrub_setup_inode,
 		.scrub	= xfs_scrub_inode,
 	},
+	{ /* inode data fork */
+		.setup	= xfs_scrub_setup_inode_bmap_data,
+		.scrub	= xfs_scrub_bmap_data,
+	},
+	{ /* inode attr fork */
+		.setup	= xfs_scrub_setup_inode_bmap,
+		.scrub	= xfs_scrub_bmap_attr,
+	},
+	{ /* inode CoW fork */
+		.setup	= xfs_scrub_setup_inode_bmap,
+		.scrub	= xfs_scrub_bmap_cow,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 5caa6c9..1025466 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -216,6 +216,8 @@ SETUP_FN(xfs_scrub_setup_ag_iallocbt);
 SETUP_FN(xfs_scrub_setup_ag_rmapbt);
 SETUP_FN(xfs_scrub_setup_ag_refcountbt);
 SETUP_FN(xfs_scrub_setup_inode);
+SETUP_FN(xfs_scrub_setup_inode_bmap_data);
+SETUP_FN(xfs_scrub_setup_inode_bmap);
 #undef SETUP_FN
 
 /* Metadata scrubbers */
@@ -234,6 +236,9 @@ SCRUB_FN(xfs_scrub_finobt);
 SCRUB_FN(xfs_scrub_rmapbt);
 SCRUB_FN(xfs_scrub_refcountbt);
 SCRUB_FN(xfs_scrub_inode);
+SCRUB_FN(xfs_scrub_bmap_data);
+SCRUB_FN(xfs_scrub_bmap_attr);
+SCRUB_FN(xfs_scrub_bmap_cow);
 #undef SCRUB_FN
 
 #endif	/* __XFS_REPAIR_COMMON_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 950e2c8..edfa4c7 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3324,7 +3324,10 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
 	{ XFS_SCRUB_TYPE_FINOBT,	"finobt" }, \
 	{ XFS_SCRUB_TYPE_RMAPBT,	"rmapbt" }, \
 	{ XFS_SCRUB_TYPE_REFCNTBT,	"refcountbt" }, \
-	{ XFS_SCRUB_TYPE_INODE,		"inode" }
+	{ XFS_SCRUB_TYPE_INODE,		"inode" }, \
+	{ XFS_SCRUB_TYPE_BMBTD,		"bmapbtd" }, \
+	{ XFS_SCRUB_TYPE_BMBTA,		"bmapbta" }, \
+	{ XFS_SCRUB_TYPE_BMBTC,		"bmapbtc" }
 DECLARE_EVENT_CLASS(xfs_scrub_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
 		 int error),



* [PATCH 15/22] xfs: scrub directory/attribute btrees
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
                   ` (13 preceding siblings ...)
  2017-07-21  4:40 ` [PATCH 14/22] xfs: scrub inode block mappings Darrick J. Wong
@ 2017-07-21  4:40 ` Darrick J. Wong
  2017-07-23 17:45   ` Allison Henderson
  2017-07-21  4:40 ` [PATCH 16/22] xfs: scrub directory metadata Darrick J. Wong
                   ` (6 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:40 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, Fengguang Wu

From: Darrick J. Wong <darrick.wong@oracle.com>

Provide a way to check the shape and scrub the hashes and records
in a directory or extended attribute btree.  These are helper functions
for the directory & attribute scrubbers in subsequent patches.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[fengguang: remove unneeded variable to store return value]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/scrub/dabtree.c |  473 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/dabtree.h |   62 ++++++
 3 files changed, 536 insertions(+)
 create mode 100644 fs/xfs/scrub/dabtree.c
 create mode 100644 fs/xfs/scrub/dabtree.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 89c67e1a..5f9e121 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -145,6 +145,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   bmap.o \
 				   btree.o \
 				   common.o \
+				   dabtree.o \
 				   ialloc.o \
 				   inode.o \
 				   metabufs.o \
diff --git a/fs/xfs/scrub/dabtree.c b/fs/xfs/scrub/dabtree.c
new file mode 100644
index 0000000..d730e75
--- /dev/null
+++ b/fs/xfs/scrub/dabtree.c
@@ -0,0 +1,473 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_inode_fork.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_attr_leaf.h"
+#include "scrub/common.h"
+#include "scrub/dabtree.h"
+
+/* Directory/Attribute Btree */
+
+/* Find an entry at a certain level in a da btree. */
+STATIC void *
+xfs_scrub_da_btree_entry(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	int				rec)
+{
+	char				*ents;
+	void				*(*fn)(void *);
+	size_t				sz;
+	struct xfs_da_state_blk		*blk;
+
+	/* Dispatch the entry finding function. */
+	blk = &ds->state->path.blk[level];
+	switch (blk->magic) {
+	case XFS_ATTR_LEAF_MAGIC:
+	case XFS_ATTR3_LEAF_MAGIC:
+		fn = (xfs_da_leaf_ents_fn)xfs_attr3_leaf_entryp;
+		sz = sizeof(struct xfs_attr_leaf_entry);
+		break;
+	case XFS_DIR2_LEAFN_MAGIC:
+	case XFS_DIR3_LEAFN_MAGIC:
+		fn = (xfs_da_leaf_ents_fn)ds->dargs.dp->d_ops->leaf_ents_p;
+		sz = sizeof(struct xfs_dir2_leaf_entry);
+		break;
+	case XFS_DIR2_LEAF1_MAGIC:
+	case XFS_DIR3_LEAF1_MAGIC:
+		fn = (xfs_da_leaf_ents_fn)ds->dargs.dp->d_ops->leaf_ents_p;
+		sz = sizeof(struct xfs_dir2_leaf_entry);
+		break;
+	case XFS_DA_NODE_MAGIC:
+	case XFS_DA3_NODE_MAGIC:
+		fn = (xfs_da_leaf_ents_fn)ds->dargs.dp->d_ops->node_tree_p;
+		sz = sizeof(struct xfs_da_node_entry);
+		break;
+	default:
+		return NULL;
+	}
+
+	ents = fn(blk->bp->b_addr);
+	return ents + (sz * rec);
+}
+
+/* Scrub a da btree hash (key). */
+int
+xfs_scrub_da_btree_hash(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	__be32				*hashp)
+{
+	struct xfs_da_state_blk		*blks;
+	struct xfs_da_node_entry	*btree;
+	xfs_dahash_t			hash;
+	xfs_dahash_t			parent_hash;
+
+	/* Is this hash in order? */
+	hash = be32_to_cpu(*hashp);
+	XFS_SCRUB_DA_CHECK(ds, hash >= ds->hashes[level]);
+	ds->hashes[level] = hash;
+
+	if (level == 0)
+		return 0;
+
+	/* Is this hash no larger than the parent hash? */
+	blks = ds->state->path.blk;
+	btree = xfs_scrub_da_btree_entry(ds, level - 1, blks[level - 1].index);
+	parent_hash = be32_to_cpu(btree->hashval);
+	XFS_SCRUB_DA_CHECK(ds, hash <= parent_hash);
+
+	return 0;
+}
+
+/* Scrub a da btree pointer. */
+STATIC int
+xfs_scrub_da_btree_ptr(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	xfs_dablk_t			blkno)
+{
+	XFS_SCRUB_DA_CHECK(ds, blkno >= ds->lowest);
+	XFS_SCRUB_DA_CHECK(ds, ds->highest == 0 || blkno < ds->highest);
+
+	return 0;
+}
+
+/*
+ * The da btree scrubber can handle leaf1 blocks as a degenerate
+ * form of da btree.  Since the regular da code doesn't handle
+ * leaf1, we must multiplex the verifiers.
+ */
+static void
+xfs_scrub_da_btree_read_verify(
+	struct xfs_buf		*bp)
+{
+	struct xfs_da_blkinfo	*info = bp->b_addr;
+
+	switch (be16_to_cpu(info->magic)) {
+	case XFS_DIR2_LEAF1_MAGIC:
+	case XFS_DIR3_LEAF1_MAGIC:
+		bp->b_ops = &xfs_dir3_leaf1_buf_ops;
+		bp->b_ops->verify_read(bp);
+		return;
+	default:
+		bp->b_ops = &xfs_da3_node_buf_ops;
+		bp->b_ops->verify_read(bp);
+		return;
+	}
+}
+static void
+xfs_scrub_da_btree_write_verify(
+	struct xfs_buf	*bp)
+{
+	struct xfs_da_blkinfo	*info = bp->b_addr;
+
+	switch (be16_to_cpu(info->magic)) {
+	case XFS_DIR2_LEAF1_MAGIC:
+	case XFS_DIR3_LEAF1_MAGIC:
+		bp->b_ops = &xfs_dir3_leaf1_buf_ops;
+		bp->b_ops->verify_write(bp);
+		return;
+	default:
+		bp->b_ops = &xfs_da3_node_buf_ops;
+		bp->b_ops->verify_write(bp);
+		return;
+	}
+}
+
+static const struct xfs_buf_ops xfs_scrub_da_btree_buf_ops = {
+	.name = "xfs_scrub_da_btree",
+	.verify_read = xfs_scrub_da_btree_read_verify,
+	.verify_write = xfs_scrub_da_btree_write_verify,
+};
+
+/* Check a block's sibling pointers. */
+STATIC int
+xfs_scrub_da_btree_block_check_siblings(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	struct xfs_da_blkinfo		*hdr)
+{
+	xfs_dablk_t			forw;
+	xfs_dablk_t			back;
+	int				retval;
+	int				error = 0;
+
+	forw = be32_to_cpu(hdr->forw);
+	back = be32_to_cpu(hdr->back);
+
+	/* Top level blocks should not have sibling pointers. */
+	if (level == 0) {
+		XFS_SCRUB_DA_CHECK(ds, forw == 0);
+		XFS_SCRUB_DA_CHECK(ds, back == 0);
+		return error;
+	}
+
+	/* Check back (left) pointer. */
+	if (back != 0) {
+		/* Move the alternate cursor back one block. */
+		ds->state->altpath = ds->state->path;
+		error = xfs_da3_path_shift(ds->state, &ds->state->altpath,
+				0, false, &retval);
+		XFS_SCRUB_DA_OP_ERROR_GOTO(ds, &error, out);
+		XFS_SCRUB_DA_GOTO(ds, retval == 0, verify_forw);
+		XFS_SCRUB_DA_CHECK(ds,
+				ds->state->altpath.blk[level].blkno == back);
+		xfs_trans_brelse(ds->dargs.trans,
+				ds->state->altpath.blk[level].bp);
+	}
+
+verify_forw:
+	/* Check forw (right) pointer. */
+	if (!error && forw != 0) {
+		/* Move the alternate cursor forward one block. */
+		ds->state->altpath = ds->state->path;
+		error = xfs_da3_path_shift(ds->state, &ds->state->altpath,
+				1, false, &retval);
+		XFS_SCRUB_DA_OP_ERROR_GOTO(ds, &error, out);
+		XFS_SCRUB_DA_GOTO(ds, retval == 0, out);
+		XFS_SCRUB_DA_CHECK(ds,
+				ds->state->altpath.blk[level].blkno == forw);
+		xfs_trans_brelse(ds->dargs.trans,
+				ds->state->altpath.blk[level].bp);
+	}
+out:
+	memset(&ds->state->altpath, 0, sizeof(ds->state->altpath));
+	return error;
+}
+
+/* Load a dir/attribute block from a btree. */
+STATIC int
+xfs_scrub_da_btree_block(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	xfs_dablk_t			blkno)
+{
+	struct xfs_da_state_blk		*blk;
+	struct xfs_da_intnode		*node;
+	struct xfs_da_node_entry	*btree;
+	struct xfs_da3_blkinfo		*hdr3;
+	struct xfs_da_args		*dargs = &ds->dargs;
+	struct xfs_inode		*ip = ds->dargs.dp;
+	xfs_ino_t			owner;
+	int				*pmaxrecs;
+	struct xfs_da3_icnode_hdr	nodehdr;
+	int				error;
+
+	blk = &ds->state->path.blk[level];
+	ds->state->path.active = level + 1;
+
+	/* Release old block. */
+	if (blk->bp) {
+		xfs_trans_brelse(dargs->trans, blk->bp);
+		blk->bp = NULL;
+	}
+
+	/* Check the pointer. */
+	blk->blkno = blkno;
+	error = xfs_scrub_da_btree_ptr(ds, level, blkno);
+	if (error) {
+		blk->blkno = 0;
+		goto out;
+	}
+
+	/* Read the buffer. */
+	error = xfs_da_read_buf(dargs->trans, dargs->dp, blk->blkno, -2,
+			&blk->bp, dargs->whichfork,
+			&xfs_scrub_da_btree_buf_ops);
+	XFS_SCRUB_DA_OP_ERROR_GOTO(ds, &error, out_nobuf);
+
+	/* It's ok for a directory not to have a da btree in it. */
+	if (ds->dargs.whichfork == XFS_DATA_FORK && level == 0 &&
+			blk->bp == NULL)
+		goto out_nobuf;
+	XFS_SCRUB_DA_GOTO(ds, blk->bp != NULL, out_nobuf);
+
+	hdr3 = blk->bp->b_addr;
+	blk->magic = be16_to_cpu(hdr3->hdr.magic);
+	pmaxrecs = &ds->maxrecs[level];
+
+	/* Check the owner. */
+	if (xfs_sb_version_hascrc(&ip->i_mount->m_sb)) {
+		owner = be64_to_cpu(hdr3->owner);
+		error = -EFSCORRUPTED;
+		XFS_SCRUB_DA_GOTO(ds, owner == ip->i_ino, out);
+	}
+
+	/* Check the siblings. */
+	error = xfs_scrub_da_btree_block_check_siblings(ds, level, &hdr3->hdr);
+	if (error)
+		goto out;
+
+	/* Interpret the buffer. */
+	error = -EFSCORRUPTED;
+	switch (blk->magic) {
+	case XFS_ATTR_LEAF_MAGIC:
+	case XFS_ATTR3_LEAF_MAGIC:
+		xfs_trans_buf_set_type(dargs->trans, blk->bp,
+				XFS_BLFT_ATTR_LEAF_BUF);
+		blk->magic = XFS_ATTR_LEAF_MAGIC;
+		blk->hashval = xfs_attr_leaf_lasthash(blk->bp, pmaxrecs);
+		XFS_SCRUB_DA_CHECK(ds, ds->tree_level == 0);
+		break;
+	case XFS_DIR2_LEAFN_MAGIC:
+	case XFS_DIR3_LEAFN_MAGIC:
+		xfs_trans_buf_set_type(dargs->trans, blk->bp,
+				XFS_BLFT_DIR_LEAFN_BUF);
+		blk->magic = XFS_DIR2_LEAFN_MAGIC;
+		blk->hashval = xfs_dir2_leaf_lasthash(ip, blk->bp, pmaxrecs);
+		XFS_SCRUB_DA_CHECK(ds, ds->tree_level == 0);
+		break;
+	case XFS_DIR2_LEAF1_MAGIC:
+	case XFS_DIR3_LEAF1_MAGIC:
+		xfs_trans_buf_set_type(dargs->trans, blk->bp,
+				XFS_BLFT_DIR_LEAF1_BUF);
+		blk->magic = XFS_DIR2_LEAF1_MAGIC;
+		blk->hashval = xfs_dir2_leaf_lasthash(ip, blk->bp, pmaxrecs);
+		XFS_SCRUB_DA_CHECK(ds, ds->tree_level == 0);
+		break;
+	case XFS_DA_NODE_MAGIC:
+	case XFS_DA3_NODE_MAGIC:
+		xfs_trans_buf_set_type(dargs->trans, blk->bp,
+				XFS_BLFT_DA_NODE_BUF);
+		blk->magic = XFS_DA_NODE_MAGIC;
+		node = blk->bp->b_addr;
+		ip->d_ops->node_hdr_from_disk(&nodehdr, node);
+		btree = ip->d_ops->node_tree_p(node);
+		*pmaxrecs = nodehdr.count;
+		blk->hashval = be32_to_cpu(btree[*pmaxrecs - 1].hashval);
+		if (level == 0) {
+			XFS_SCRUB_DA_GOTO(ds,
+					nodehdr.level < XFS_DA_NODE_MAXDEPTH,
+					out);
+			ds->tree_level = nodehdr.level;
+		} else
+			XFS_SCRUB_DA_GOTO(ds, ds->tree_level == nodehdr.level,
+					out);
+		break;
+	default:
+		XFS_SCRUB_DA_CHECK(ds, false);
+		xfs_trans_brelse(dargs->trans, blk->bp);
+		blk->bp = NULL;
+		blk->blkno = 0;
+		break;
+	}
+	error = 0;
+
+out:
+	return error;
+out_nobuf:
+	blk->blkno = 0;
+	return error;
+}
+
+/* Visit all nodes and leaves of a da btree. */
+int
+xfs_scrub_da_btree(
+	struct xfs_scrub_context	*sc,
+	int				whichfork,
+	xfs_scrub_da_btree_rec_fn	scrub_fn)
+{
+	struct xfs_scrub_da_btree	ds;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_da_state_blk		*blks;
+	struct xfs_da_node_entry	*btree;
+	void				*rec;
+	xfs_dablk_t			blkno;
+	bool				is_attr;
+	int				level;
+	int				error;
+
+	memset(&ds, 0, sizeof(ds));
+	/* Skip short format data structures; no btree to scan. */
+	if (XFS_IFORK_FORMAT(sc->ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
+	    XFS_IFORK_FORMAT(sc->ip, whichfork) != XFS_DINODE_FMT_BTREE)
+		return 0;
+
+	/* Set up initial da state. */
+	is_attr = whichfork == XFS_ATTR_FORK;
+	ds.dargs.geo = is_attr ? mp->m_attr_geo : mp->m_dir_geo;
+	ds.dargs.dp = sc->ip;
+	ds.dargs.whichfork = whichfork;
+	ds.dargs.trans = sc->tp;
+	ds.dargs.op_flags = XFS_DA_OP_OKNOENT;
+	ds.state = xfs_da_state_alloc();
+	ds.state->args = &ds.dargs;
+	ds.state->mp = mp;
+	ds.type = is_attr ? "attr" : "dir";
+	ds.sc = sc;
+	blkno = ds.lowest = is_attr ? 0 : ds.dargs.geo->leafblk;
+	ds.highest = is_attr ? 0 : ds.dargs.geo->freeblk;
+	level = 0;
+
+	/* Find the root of the da tree, if present. */
+	blks = ds.state->path.blk;
+	error = xfs_scrub_da_btree_block(&ds, level, blkno);
+	if (error)
+		goto out_state;
+	if (blks[level].bp == NULL)
+		goto out_state;
+
+	blks[level].index = 0;
+	while (level >= 0 && level < XFS_DA_NODE_MAXDEPTH) {
+		/* Handle leaf block. */
+		if (blks[level].magic != XFS_DA_NODE_MAGIC) {
+			/* End of leaf, pop back towards the root. */
+			if (blks[level].index >= ds.maxrecs[level]) {
+				if (level > 0)
+					blks[level - 1].index++;
+				ds.tree_level++;
+				level--;
+				continue;
+			}
+
+			/* Dispatch record scrubbing. */
+			rec = xfs_scrub_da_btree_entry(&ds, level,
+					blks[level].index);
+			error = scrub_fn(&ds, level, rec);
+			if (error < 0 ||
+			    error == XFS_BTREE_QUERY_RANGE_ABORT)
+				break;
+			if (xfs_scrub_should_terminate(&error))
+				break;
+
+			blks[level].index++;
+			continue;
+		}
+
+		btree = xfs_scrub_da_btree_entry(&ds, level, blks[level].index);
+
+		/* End of node, pop back towards the root. */
+		if (blks[level].index >= ds.maxrecs[level]) {
+			if (level > 0)
+				blks[level - 1].index++;
+			ds.tree_level++;
+			level--;
+			continue;
+		}
+
+		/* Hashes in order for scrub? */
+		error = xfs_scrub_da_btree_hash(&ds, level, &btree->hashval);
+		if (error)
+			goto out;
+
+		/* Drill another level deeper. */
+		blkno = be32_to_cpu(btree->before);
+		level++;
+		ds.tree_level--;
+		error = xfs_scrub_da_btree_block(&ds, level, blkno);
+		if (error)
+			goto out;
+		if (blks[level].bp == NULL)
+			goto out;
+
+		blks[level].index = 0;
+	}
+
+out:
+	/* Release all the buffers we're tracking. */
+	for (level = 0; level < XFS_DA_NODE_MAXDEPTH; level++) {
+		if (blks[level].bp == NULL)
+			continue;
+		xfs_trans_brelse(sc->tp, blks[level].bp);
+		blks[level].bp = NULL;
+	}
+
+out_state:
+	xfs_da_state_free(ds.state);
+	return error;
+}
diff --git a/fs/xfs/scrub/dabtree.h b/fs/xfs/scrub/dabtree.h
new file mode 100644
index 0000000..1302d67
--- /dev/null
+++ b/fs/xfs/scrub/dabtree.h
@@ -0,0 +1,62 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef __XFS_REPAIR_DABTREE_H__
+#define __XFS_REPAIR_DABTREE_H__
+
+/* dir/attr btree */
+
+struct xfs_scrub_da_btree {
+	struct xfs_da_args		dargs;
+	xfs_dahash_t			hashes[XFS_DA_NODE_MAXDEPTH];
+	int				maxrecs[XFS_DA_NODE_MAXDEPTH];
+	struct xfs_da_state		*state;
+	const char			*type;
+	struct xfs_scrub_context	*sc;
+	xfs_dablk_t			lowest;
+	xfs_dablk_t			highest;
+	int				tree_level;
+};
+
+typedef void *(*xfs_da_leaf_ents_fn)(void *);
+typedef int (*xfs_scrub_da_btree_rec_fn)(struct xfs_scrub_da_btree *ds,
+		int level, void *rec);
+
+#define XFS_SCRUB_DA_CHECK(ds, fs_ok) \
+	XFS_SCRUB_DATA_CHECK((ds)->sc, (ds)->dargs.whichfork, \
+			xfs_dir2_da_to_db((ds)->dargs.geo, \
+			(ds)->state->path.blk[level].blkno), (ds)->type, \
+			fs_ok)
+#define XFS_SCRUB_DA_GOTO(ds, fs_ok, label) \
+	XFS_SCRUB_DATA_GOTO((ds)->sc, (ds)->dargs.whichfork, \
+			xfs_dir2_da_to_db((ds)->dargs.geo, \
+			(ds)->state->path.blk[level].blkno), (ds)->type, \
+			fs_ok, label)
+#define XFS_SCRUB_DA_OP_ERROR_GOTO(ds, error, label) \
+	XFS_SCRUB_FILE_OP_ERROR_GOTO((ds)->sc, (ds)->dargs.whichfork, \
+			xfs_dir2_da_to_db((ds)->dargs.geo, \
+			(ds)->state->path.blk[level].blkno), (ds)->type, \
+			(error), label)
+
+int xfs_scrub_da_btree_hash(struct xfs_scrub_da_btree *ds, int level,
+			    __be32 *hashp);
+int xfs_scrub_da_btree(struct xfs_scrub_context *sc, int whichfork,
+		       xfs_scrub_da_btree_rec_fn scrub_fn);
+
+#endif /* __XFS_REPAIR_DABTREE_H__ */



* [PATCH 16/22] xfs: scrub directory metadata
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
                   ` (14 preceding siblings ...)
  2017-07-21  4:40 ` [PATCH 15/22] xfs: scrub directory/attribute btrees Darrick J. Wong
@ 2017-07-21  4:40 ` Darrick J. Wong
  2017-07-23 17:51   ` Allison Henderson
  2017-07-21  4:40 ` [PATCH 17/22] xfs: scrub directory freespace Darrick J. Wong
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:40 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Scrub the hash tree and all the entries in a directory.
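
For readers following the file type check in xfs_scrub_dir_check_ftype(),
here is a minimal userspace sketch of the mode-to-DT_* conversion it
relies on.  The SK_* constants and sk_* helpers are hypothetical names
that restate the Linux mode bits and DT_* values for illustration; they
are assumptions of this sketch and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Linux file-type bits of i_mode (octal), restated for this sketch. */
#define SK_S_IFMT	0170000u
#define SK_S_IFDIR	0040000u
#define SK_S_IFREG	0100000u
#define SK_S_IFLNK	0120000u

/* DT_* values as reported through readdir, restated for this sketch. */
#define SK_DT_UNKNOWN	0u
#define SK_DT_DIR	4u
#define SK_DT_REG	8u
#define SK_DT_LNK	10u

/* Convert an inode mode to the matching DT_* value. */
static unsigned int
sk_mode_to_dtype(uint32_t mode)
{
	/* The file type lives in the top four bits of the 16-bit mode. */
	return (mode & SK_S_IFMT) >> 12;
}

/* Does the type recorded in the dirent agree with the inode's mode? */
static int
sk_ftype_matches(uint32_t mode, unsigned int dtype)
{
	/* DT_* values fit in four bits. */
	assert(dtype <= 15);
	return sk_mode_to_dtype(mode) == dtype;
}
```

A caller would pass the inode's mode and the type seen by the readdir
actor, e.g. sk_ftype_matches(SK_S_IFDIR | 0755, SK_DT_DIR), and flag the
dirent as suspect when the two disagree.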

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    3 
 fs/xfs/scrub/common.c  |   37 ++++++
 fs/xfs/scrub/common.h  |    4 +
 fs/xfs/scrub/dir.c     |  291 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_trace.h     |    3 
 6 files changed, 337 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/dir.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 5f9e121..c568d0d 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -146,6 +146,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   btree.o \
 				   common.o \
 				   dabtree.o \
+				   dir.o \
 				   ialloc.o \
 				   inode.o \
 				   metabufs.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index d762277..e646b5f 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -497,7 +497,8 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_BMBTD	13	/* data fork block mapping */
 #define XFS_SCRUB_TYPE_BMBTA	14	/* attr fork block mapping */
 #define XFS_SCRUB_TYPE_BMBTC	15	/* CoW fork block mapping */
-#define XFS_SCRUB_TYPE_MAX	15
+#define XFS_SCRUB_TYPE_DIR	16	/* directory */
+#define XFS_SCRUB_TYPE_MAX	16
 
 /* i: repair this metadata */
 #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index da3c006..92627e9 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -633,6 +633,39 @@ xfs_scrub_get_inode(
 	return 0;
 }
 
+/* Set us up to scrub a file's contents. */
+int
+xfs_scrub_setup_inode_contents(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip,
+	unsigned int			resblks)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	error = xfs_scrub_get_inode(sc, ip);
+	if (error)
+		return error;
+
+	/* Got the inode, lock it and we're ready to go. */
+	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+	error = xfs_scrub_trans_alloc(sc->sm, mp, &M_RES(mp)->tr_itruncate,
+			resblks, 0, 0, &sc->tp);
+	if (error)
+		goto out_unlock;
+	sc->ilock_flags |= XFS_ILOCK_EXCL;
+	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
+
+	return 0;
+out_unlock:
+	xfs_iunlock(sc->ip, sc->ilock_flags);
+	if (sc->ip != ip)
+		IRELE(sc->ip);
+	sc->ip = NULL;
+	return error;
+}
+
 /* Scrub setup and teardown */
 
 /* Free all the resources and finish the transactions. */
@@ -807,6 +840,10 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
 		.setup	= xfs_scrub_setup_inode_bmap,
 		.scrub	= xfs_scrub_bmap_cow,
 	},
+	{ /* directory */
+		.setup	= xfs_scrub_setup_directory,
+		.scrub	= xfs_scrub_directory,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 1025466..7baaa2d 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -206,6 +206,8 @@ int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
 int xfs_scrub_setup_ag_btree(struct xfs_scrub_context *sc,
 			     struct xfs_inode *ip, bool force_log);
 int xfs_scrub_get_inode(struct xfs_scrub_context *sc, struct xfs_inode *ip_in);
+int xfs_scrub_setup_inode_contents(struct xfs_scrub_context *sc,
+				   struct xfs_inode *ip, unsigned int resblks);
 
 #define SETUP_FN(name) int name(struct xfs_scrub_context *sc, struct xfs_inode *ip)
 SETUP_FN(xfs_scrub_setup_fs);
@@ -218,6 +220,7 @@ SETUP_FN(xfs_scrub_setup_ag_refcountbt);
 SETUP_FN(xfs_scrub_setup_inode);
 SETUP_FN(xfs_scrub_setup_inode_bmap_data);
 SETUP_FN(xfs_scrub_setup_inode_bmap);
+SETUP_FN(xfs_scrub_setup_directory);
 #undef SETUP_FN
 
 /* Metadata scrubbers */
@@ -239,6 +242,7 @@ SCRUB_FN(xfs_scrub_inode);
 SCRUB_FN(xfs_scrub_bmap_data);
 SCRUB_FN(xfs_scrub_bmap_attr);
 SCRUB_FN(xfs_scrub_bmap_cow);
+SCRUB_FN(xfs_scrub_directory);
 #undef SCRUB_FN
 
 #endif	/* __XFS_REPAIR_COMMON_H__ */
diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
new file mode 100644
index 0000000..1d6b5329
--- /dev/null
+++ b/fs/xfs/scrub/dir.c
@@ -0,0 +1,291 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_itable.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "scrub/common.h"
+#include "scrub/dabtree.h"
+
+/* Set us up to scrub directories. */
+int
+xfs_scrub_setup_directory(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_inode_contents(sc, ip, 0);
+}
+
+/* Directories */
+
+/* Scrub a directory entry. */
+
+struct xfs_scrub_dir_ctx {
+	struct dir_context		dc;
+	struct xfs_scrub_context	*sc;
+};
+
+#define XFS_SCRUB_DIR_CHECK(fs_ok) \
+	XFS_SCRUB_DATA_CHECK(sdc->sc, XFS_DATA_FORK, offset, "dir", fs_ok)
+#define XFS_SCRUB_DIR_GOTO(fs_ok, label) \
+	XFS_SCRUB_DATA_GOTO(sdc->sc, XFS_DATA_FORK, offset, "dir", fs_ok, label)
+#define XFS_SCRUB_DIR_OP_ERROR_GOTO(label) \
+	XFS_SCRUB_FILE_OP_ERROR_GOTO(sdc->sc, XFS_DATA_FORK, offset, "dir", &error, label)
+/* Check that an inode's mode matches a given DT_ type. */
+STATIC int
+xfs_scrub_dir_check_ftype(
+	struct xfs_scrub_dir_ctx	*sdc,
+	xfs_fileoff_t			offset,
+	xfs_ino_t			inum,
+	int				dtype)
+{
+	struct xfs_mount		*mp = sdc->sc->mp;
+	struct xfs_inode		*ip;
+	int				ino_dtype;
+	int				error = 0;
+
+	if (!xfs_sb_version_hasftype(&mp->m_sb)) {
+		XFS_SCRUB_DIR_CHECK(dtype == DT_UNKNOWN || dtype == DT_DIR);
+		goto out;
+	}
+
+	error = xfs_iget(mp, sdc->sc->tp, inum, 0, 0, &ip);
+	XFS_SCRUB_OP_ERROR_GOTO(sdc->sc,
+			XFS_INO_TO_AGNO(mp, inum),
+			XFS_INO_TO_AGBNO(mp, inum),
+			"inode", &error, out);
+	/* Convert mode to the DT_* values that dir_emit uses. */
+	ino_dtype = (VFS_I(ip)->i_mode & S_IFMT) >> 12;
+	XFS_SCRUB_DIR_CHECK(ino_dtype == dtype);
+	IRELE(ip);
+out:
+	return error;
+}
+
+/* Scrub a single directory entry. */
+STATIC int
+xfs_scrub_dir_actor(
+	struct dir_context		*dc,
+	const char			*name,
+	int				namelen,
+	loff_t				pos,
+	u64				ino,
+	unsigned			type)
+{
+	struct xfs_mount		*mp;
+	struct xfs_inode		*ip;
+	struct xfs_scrub_dir_ctx	*sdc;
+	struct xfs_name			xname;
+	xfs_ino_t			lookup_ino;
+	xfs_dablk_t			offset;
+	int				error = 0;
+
+	sdc = container_of(dc, struct xfs_scrub_dir_ctx, dc);
+	ip = sdc->sc->ip;
+	mp = ip->i_mount;
+	offset = xfs_dir2_db_to_da(mp->m_dir_geo,
+			xfs_dir2_dataptr_to_db(mp->m_dir_geo, pos));
+
+	/* Does this inode number make sense? */
+	XFS_SCRUB_DIR_GOTO(xfs_dir_ino_validate(mp, ino) == 0, out);
+	XFS_SCRUB_DIR_GOTO(!xfs_internal_inum(mp, ino), out);
+
+	/* Verify that we can look up this name by hash. */
+	xname.name = name;
+	xname.len = namelen;
+	xname.type = XFS_DIR3_FT_UNKNOWN;
+
+	error = xfs_dir_lookup(sdc->sc->tp, ip, &xname, &lookup_ino, NULL);
+	XFS_SCRUB_DIR_OP_ERROR_GOTO(fail_xref);
+	XFS_SCRUB_DIR_GOTO(lookup_ino == ino, out);
+
+	if (!strncmp(".", name, namelen)) {
+		/* If this is "." then check that the inum matches the dir. */
+		if (xfs_sb_version_hasftype(&mp->m_sb))
+			XFS_SCRUB_DIR_CHECK(type == DT_DIR);
+		XFS_SCRUB_DIR_CHECK(ino == ip->i_ino);
+	} else if (!strncmp("..", name, namelen)) {
+		/*
+		 * If this is ".." in the root inode, check that the inum
+		 * matches this dir.
+		 */
+		if (xfs_sb_version_hasftype(&mp->m_sb))
+			XFS_SCRUB_DIR_CHECK(type == DT_DIR);
+		if (ip->i_ino == mp->m_sb.sb_rootino)
+			XFS_SCRUB_DIR_CHECK(ino == ip->i_ino);
+	}
+	if (error)
+		goto out;
+
+	/* Verify the file type. */
+	error = xfs_scrub_dir_check_ftype(sdc, offset, lookup_ino, type);
+	if (error)
+		goto out;
+out:
+	return error;
+fail_xref:
+	return error ? error : -EFSCORRUPTED;
+}
+#undef XFS_SCRUB_DIR_OP_ERROR_GOTO
+#undef XFS_SCRUB_DIR_GOTO
+#undef XFS_SCRUB_DIR_CHECK
+
+#define XFS_SCRUB_DIRENT_CHECK(fs_ok) \
+	XFS_SCRUB_DATA_CHECK(ds->sc, XFS_DATA_FORK, rec_bno, "dir", fs_ok)
+#define XFS_SCRUB_DIRENT_GOTO(fs_ok, label) \
+	XFS_SCRUB_DATA_GOTO(ds->sc, XFS_DATA_FORK, rec_bno, "dir", fs_ok, label)
+#define XFS_SCRUB_DIRENT_OP_ERROR_GOTO(label) \
+	XFS_SCRUB_FILE_OP_ERROR_GOTO(ds->sc, XFS_DATA_FORK, rec_bno, "dir", &error, label)
+/* Scrub a directory btree record. */
+STATIC int
+xfs_scrub_dir_rec(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	void				*rec)
+{
+	struct xfs_mount		*mp = ds->state->mp;
+	struct xfs_dir2_leaf_entry	*ent = rec;
+	struct xfs_inode		*dp = ds->dargs.dp;
+	struct xfs_dir2_data_entry	*dent;
+	struct xfs_buf			*bp;
+	xfs_ino_t			ino;
+	xfs_dablk_t			rec_bno;
+	xfs_dir2_db_t			db;
+	xfs_dir2_data_aoff_t		off;
+	xfs_dir2_dataptr_t		ptr;
+	xfs_dahash_t			calc_hash;
+	xfs_dahash_t			hash;
+	unsigned int			tag;
+	int				error;
+
+	/* Check the hash of the entry. */
+	error = xfs_scrub_da_btree_hash(ds, level, &ent->hashval);
+	if (error)
+		goto out;
+
+	/* Valid hash pointer? */
+	ptr = be32_to_cpu(ent->address);
+	if (ptr == 0)
+		return 0;
+
+	/* Find the directory entry's location. */
+	db = xfs_dir2_dataptr_to_db(mp->m_dir_geo, ptr);
+	off = xfs_dir2_dataptr_to_off(mp->m_dir_geo, ptr);
+	rec_bno = xfs_dir2_db_to_da(mp->m_dir_geo, db);
+
+	XFS_SCRUB_DA_GOTO(ds, rec_bno < mp->m_dir_geo->leafblk, out);
+	error = xfs_dir3_data_read(ds->dargs.trans, dp, rec_bno, -2, &bp);
+	XFS_SCRUB_DIRENT_OP_ERROR_GOTO(out);
+	XFS_SCRUB_DIRENT_GOTO(bp != NULL, out);
+
+	/* Retrieve the entry and check it. */
+	dent = (struct xfs_dir2_data_entry *)(((char *)bp->b_addr) + off);
+	ino = be64_to_cpu(dent->inumber);
+	hash = be32_to_cpu(ent->hashval);
+	tag = be16_to_cpup(dp->d_ops->data_entry_tag_p(dent));
+	XFS_SCRUB_DIRENT_CHECK(xfs_dir_ino_validate(mp, ino) == 0);
+	XFS_SCRUB_DIRENT_CHECK(!xfs_internal_inum(mp, ino));
+	XFS_SCRUB_DIRENT_CHECK(tag == off);
+	XFS_SCRUB_DIRENT_GOTO(dent->namelen < MAXNAMELEN, out_relse);
+	calc_hash = xfs_da_hashname(dent->name, dent->namelen);
+	XFS_SCRUB_DIRENT_CHECK(calc_hash == hash);
+
+out_relse:
+	xfs_trans_brelse(ds->dargs.trans, bp);
+out:
+	return error;
+}
+#undef XFS_SCRUB_DIRENT_OP_ERROR_GOTO
+#undef XFS_SCRUB_DIRENT_GOTO
+#undef XFS_SCRUB_DIRENT_CHECK
+
+/* Scrub a whole directory. */
+int
+xfs_scrub_directory(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_scrub_dir_ctx	sdc = {
+		.dc.actor = xfs_scrub_dir_actor,
+		.dc.pos = 0,
+	};
+	struct xfs_mount		*mp = sc->mp;
+	size_t				bufsize;
+	loff_t				oldpos;
+	int				error = 0;
+
+	if (!S_ISDIR(VFS_I(sc->ip)->i_mode))
+		return -ENOENT;
+
+	/* Plausible size? */
+	XFS_SCRUB_INO_GOTO(sc, sc->ip->i_ino, NULL, "inode",
+			sc->ip->i_d.di_size >= xfs_dir2_sf_hdr_size(0), out);
+
+	/* Check directory tree structure */
+	error = xfs_scrub_da_btree(sc, XFS_DATA_FORK, xfs_scrub_dir_rec);
+	if (error)
+		return error;
+
+	/* Check that every dirent we see can also be looked up by hash. */
+	bufsize = (size_t)min_t(loff_t, 32768, sc->ip->i_d.di_size);
+	sdc.sc = sc;
+
+	/*
+	 * Look up every name in this directory by hash.
+	 *
+	 * The VFS grabs a read or write lock via i_rwsem before it reads
+	 * or writes to a directory.  If we've gotten this far we've
+	 * already obtained IOLOCK_EXCL, which (since 4.10) is the same as
+	 * getting a write lock on i_rwsem.  Therefore, it is safe for us
+	 * to drop the ILOCK here in order to reuse the _readdir and
+	 * _dir_lookup routines, which do their own ILOCK locking.
+	 */
+	oldpos = 0;
+	sc->ilock_flags &= ~XFS_ILOCK_EXCL;
+	xfs_iunlock(sc->ip, XFS_ILOCK_EXCL);
+	while (true) {
+		error = xfs_readdir(sc->tp, sc->ip, &sdc.dc, bufsize);
+		XFS_SCRUB_OP_ERROR_GOTO(sc,
+				XFS_INO_TO_AGNO(mp, sc->ip->i_ino),
+				XFS_INO_TO_AGBNO(mp, sc->ip->i_ino),
+				"inode", &error, out);
+		if (oldpos == sdc.dc.pos)
+			break;
+		oldpos = sdc.dc.pos;
+	}
+
+out:
+	return error;
+}
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index edfa4c7..ccd27ec 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3327,7 +3327,8 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
 	{ XFS_SCRUB_TYPE_INODE,		"inode" }, \
 	{ XFS_SCRUB_TYPE_BMBTD,		"bmapbtd" }, \
 	{ XFS_SCRUB_TYPE_BMBTA,		"bmapbta" }, \
-	{ XFS_SCRUB_TYPE_BMBTC,		"bmapbtc" }
+	{ XFS_SCRUB_TYPE_BMBTC,		"bmapbtc" }, \
+	{ XFS_SCRUB_TYPE_DIR,		"dir" }
 DECLARE_EVENT_CLASS(xfs_scrub_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
 		 int error),



* [PATCH 17/22] xfs: scrub directory freespace
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
                   ` (15 preceding siblings ...)
  2017-07-21  4:40 ` [PATCH 16/22] xfs: scrub directory metadata Darrick J. Wong
@ 2017-07-21  4:40 ` Darrick J. Wong
  2017-07-23 17:55   ` Allison Henderson
  2017-07-21  4:40 ` [PATCH 18/22] xfs: scrub extended attributes Darrick J. Wong
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:40 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Check the free space information in a directory.
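
One of the invariants checked here is that every free region in a data
block is either listed in the bestfree table or is no longer than the
smallest bestfree slot.  The following is a minimal userspace sketch of
that rule, assuming host-endian fields and a simplified structure.  The
sk_* names are hypothetical, and SK_FD_COUNT stands in for
XFS_DIR2_DATA_FD_COUNT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_FD_COUNT	3	/* stands in for XFS_DIR2_DATA_FD_COUNT */

/* Simplified, host-endian stand-in for one bestfree slot. */
struct sk_bestfree {
	uint16_t	offset;		/* 0 means the slot is unused */
	uint16_t	length;
};

/*
 * Return true if a free region of free_len bytes is consistent with
 * the bestfree table: either some in-use slot records that length, or
 * the region is no longer than the smallest slot.
 */
static bool
sk_check_free_entry(const struct sk_bestfree *bf, uint16_t free_len)
{
	unsigned int	smallest = UINT16_MAX;
	int		i;

	assert(bf != NULL);
	for (i = 0; i < SK_FD_COUNT; i++) {
		if (bf[i].offset && bf[i].length == free_len)
			return true;
		/* Track the minimum, so start high and move down. */
		if (bf[i].length < smallest)
			smallest = bf[i].length;
	}
	return free_len <= smallest;
}
```

Note that an unused slot has length zero, so in this sketch any nonzero
free region that is missing from a table with an empty slot is reported
as inconsistent.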

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/dir.c |  337 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 337 insertions(+)


diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index 1d6b5329..661aec3 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -232,6 +232,338 @@ xfs_scrub_dir_rec(
 #undef XFS_SCRUB_DIRENT_GOTO
 #undef XFS_SCRUB_DIRENT_CHECK
 
+#define XFS_SCRUB_DIR_BLOCK_CHECK(fs_ok) \
+	XFS_SCRUB_DATA_CHECK(sc, XFS_DATA_FORK, \
+		lblk << mp->m_sb.sb_blocklog, "dir", fs_ok)
+#define XFS_SCRUB_DIR_BLOCK_GOTO(fs_ok, label) \
+	XFS_SCRUB_DATA_GOTO(sc, XFS_DATA_FORK, \
+		lblk << mp->m_sb.sb_blocklog, "dir", fs_ok, label)
+#define XFS_SCRUB_DIR_BLOCK_OP_ERROR_GOTO(label) \
+	XFS_SCRUB_FILE_OP_ERROR_GOTO(sc, XFS_DATA_FORK, \
+		lblk << mp->m_sb.sb_blocklog, "dir", &error, label)
+/* Is this free entry either in the bestfree or smaller than all of them? */
+static inline bool
+xfs_scrub_directory_check_free_entry(
+	struct xfs_dir2_data_free	*bf,
+	struct xfs_dir2_data_unused	*dup)
+{
+	struct xfs_dir2_data_free	*dfp;
+	unsigned int			smallest;
+
+	smallest = -1U;
+	for (dfp = &bf[0]; dfp < &bf[XFS_DIR2_DATA_FD_COUNT]; dfp++) {
+		if (dfp->offset &&
+		    be16_to_cpu(dfp->length) == be16_to_cpu(dup->length))
+			return true;
+		if (smallest > be16_to_cpu(dfp->length))
+			smallest = be16_to_cpu(dfp->length);
+	}
+
+	return be16_to_cpu(dup->length) <= smallest;
+}
+
+/* Check free space info in a directory data block. */
+STATIC int
+xfs_scrub_directory_data_bestfree(
+	struct xfs_scrub_context	*sc,
+	xfs_dablk_t			lblk,
+	bool				is_block)
+{
+	struct xfs_dir2_data_unused	*dup;
+	struct xfs_dir2_data_free	*dfp;
+	struct xfs_buf			*bp;
+	struct xfs_dir2_data_free	*bf;
+	struct xfs_mount		*mp = sc->mp;
+	char				*ptr;
+	char				*endptr;
+	u16				tag;
+	int				newlen;
+	int				offset;
+	int				error;
+
+	if (is_block) {
+		/* dir block format */
+		XFS_SCRUB_DIR_BLOCK_CHECK(lblk ==
+				XFS_B_TO_FSBT(mp, XFS_DIR2_DATA_OFFSET));
+		error = xfs_dir3_block_read(sc->tp, sc->ip, &bp);
+	} else {
+		/* dir data format */
+		error = xfs_dir3_data_read(sc->tp, sc->ip, lblk, -1, &bp);
+	}
+	XFS_SCRUB_DIR_BLOCK_OP_ERROR_GOTO(out);
+
+	/* Do the bestfrees correspond to actual free space? */
+	bf = sc->ip->d_ops->data_bestfree_p(bp->b_addr);
+	for (dfp = &bf[0]; dfp < &bf[XFS_DIR2_DATA_FD_COUNT]; dfp++) {
+		offset = be16_to_cpu(dfp->offset);
+		XFS_SCRUB_DIR_BLOCK_GOTO(offset < BBTOB(bp->b_length), nextloop);
+		if (!offset)
+			continue;
+		dup = (struct xfs_dir2_data_unused *)(bp->b_addr + offset);
+		tag = be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup));
+
+		XFS_SCRUB_DIR_BLOCK_CHECK(dup->freetag ==
+				cpu_to_be16(XFS_DIR2_DATA_FREE_TAG));
+		XFS_SCRUB_DIR_BLOCK_CHECK(be16_to_cpu(dup->length) ==
+				be16_to_cpu(dfp->length));
+		XFS_SCRUB_DIR_BLOCK_CHECK(tag ==
+				((char *)dup - (char *)bp->b_addr));
+nextloop:;
+	}
+
+	/* Make sure the bestfrees are actually the best free spaces. */
+	ptr = (char *)sc->ip->d_ops->data_entry_p(bp->b_addr);
+	if (is_block) {
+		struct xfs_dir2_block_tail	*btp;
+
+		btp = xfs_dir2_block_tail_p(sc->mp->m_dir_geo, bp->b_addr);
+		endptr = (char *)xfs_dir2_block_leaf_p(btp);
+	} else
+		endptr = (char *)bp->b_addr + BBTOB(bp->b_length);
+	while (ptr < endptr) {
+		dup = (struct xfs_dir2_data_unused *)ptr;
+		/* Skip real entries */
+		if (dup->freetag != cpu_to_be16(XFS_DIR2_DATA_FREE_TAG)) {
+			struct xfs_dir2_data_entry	*dep;
+
+			dep = (struct xfs_dir2_data_entry *)ptr;
+			newlen = sc->ip->d_ops->data_entsize(dep->namelen);
+			XFS_SCRUB_DIR_BLOCK_GOTO(newlen > 0, out_buf);
+			ptr += newlen;
+			XFS_SCRUB_DIR_BLOCK_CHECK(ptr <= endptr);
+			continue;
+		}
+
+		/* Spot check this free entry */
+		tag = be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup));
+		XFS_SCRUB_DIR_BLOCK_CHECK(tag ==
+				((char *)dup - (char *)bp->b_addr));
+
+		/*
+		 * Either this entry is a bestfree or it's smaller than
+		 * any of the bestfrees.
+		 */
+		XFS_SCRUB_DIR_BLOCK_CHECK(
+				xfs_scrub_directory_check_free_entry(bf, dup));
+
+		/* Move on. */
+		newlen = be16_to_cpu(dup->length);
+		XFS_SCRUB_DIR_BLOCK_GOTO(newlen > 0, out_buf);
+		ptr += newlen;
+		XFS_SCRUB_DIR_BLOCK_CHECK(ptr <= endptr);
+	}
+out_buf:
+	xfs_trans_brelse(sc->tp, bp);
+out:
+	return error;
+}
+
+/* Is this the longest free entry in the block? */
+static inline bool
+xfs_scrub_directory_check_freesp(
+	struct xfs_inode		*dp,
+	struct xfs_buf			*dbp,
+	unsigned int			len)
+{
+	struct xfs_dir2_data_free	*bf;
+	struct xfs_dir2_data_free	*dfp;
+	unsigned int			longest = 0;
+	int				offset;
+
+	bf = dp->d_ops->data_bestfree_p(dbp->b_addr);
+	for (dfp = &bf[0]; dfp < &bf[XFS_DIR2_DATA_FD_COUNT]; dfp++) {
+		offset = be16_to_cpu(dfp->offset);
+		if (!offset)
+			continue;
+		if (longest < be16_to_cpu(dfp->length))
+			longest = be16_to_cpu(dfp->length);
+	}
+
+	return longest == len;
+}
+
+/* Check free space info in a directory leaf1 block. */
+STATIC int
+xfs_scrub_directory_leaf1_bestfree(
+	struct xfs_scrub_context	*sc,
+	struct xfs_da_args		*args,
+	xfs_dablk_t			lblk)
+{
+	struct xfs_dir2_leaf_tail	*ltp;
+	struct xfs_buf			*dbp;
+	struct xfs_buf			*bp;
+	struct xfs_mount		*mp = sc->mp;
+	__be16				*bestp;
+	__u16				best;
+	int				i;
+	int				error;
+
+	/* Read the free space block */
+	error = xfs_dir3_leaf_read(sc->tp, sc->ip, lblk, -1, &bp);
+	XFS_SCRUB_DIR_BLOCK_OP_ERROR_GOTO(out);
+
+	/* Check all the entries. */
+	ltp = xfs_dir2_leaf_tail_p(mp->m_dir_geo, bp->b_addr);
+	bestp = xfs_dir2_leaf_bests_p(ltp);
+	for (i = 0; i < be32_to_cpu(ltp->bestcount); i++, bestp++) {
+		best = be16_to_cpu(*bestp);
+		if (best == NULLDATAOFF)
+			continue;
+		error = xfs_dir3_data_read(sc->tp, sc->ip,
+				i * args->geo->fsbcount, -1, &dbp);
+		XFS_SCRUB_DIR_BLOCK_OP_ERROR_GOTO(skip_buf);
+		XFS_SCRUB_DIR_BLOCK_CHECK(
+				xfs_scrub_directory_check_freesp(sc->ip, dbp,
+					best));
+		xfs_trans_brelse(sc->tp, dbp);
+skip_buf:
+		;
+	}
+out:
+	return error;
+}
+
+/* Check free space info in a directory freespace block. */
+STATIC int
+xfs_scrub_directory_free_bestfree(
+	struct xfs_scrub_context	*sc,
+	struct xfs_da_args		*args,
+	xfs_dablk_t			lblk)
+{
+	struct xfs_dir3_icfree_hdr	freehdr;
+	struct xfs_buf			*dbp;
+	struct xfs_buf			*bp;
+	struct xfs_mount		*mp = sc->mp;
+	__be16				*bestp;
+	__u16				best;
+	int				i;
+	int				error;
+
+	/* Read the free space block */
+	error = xfs_dir2_free_read(sc->tp, sc->ip, lblk, &bp);
+	XFS_SCRUB_DIR_BLOCK_OP_ERROR_GOTO(out);
+
+	/* Check all the entries. */
+	sc->ip->d_ops->free_hdr_from_disk(&freehdr, bp->b_addr);
+	bestp = sc->ip->d_ops->free_bests_p(bp->b_addr);
+	for (i = 0; i < freehdr.nvalid; i++, bestp++) {
+		best = be16_to_cpu(*bestp);
+		if (best == NULLDATAOFF)
+			continue;
+		error = xfs_dir3_data_read(sc->tp, sc->ip,
+				(freehdr.firstdb + i) * args->geo->fsbcount,
+				-1, &dbp);
+		XFS_SCRUB_DIR_BLOCK_OP_ERROR_GOTO(skip_buf);
+		XFS_SCRUB_DIR_BLOCK_CHECK(
+				xfs_scrub_directory_check_freesp(sc->ip, dbp,
+					best));
+		xfs_trans_brelse(sc->tp, dbp);
+skip_buf:
+		;
+	}
+out:
+	return error;
+}
+
+#define for_each_extent_dablk(lblk, args, got) \
+		for ((lblk) = roundup((xfs_dablk_t)(got)->br_startoff, (args)->geo->fsbcount); \
+		     (lblk) < (got)->br_startoff + (got)->br_blockcount; \
+		     (lblk) += (args)->geo->fsbcount)
+/* Check free space information in directories. */
+STATIC int
+xfs_scrub_directory_blocks(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_bmbt_irec		got;
+	struct xfs_da_args		args;
+	struct xfs_ifork		*ifp;
+	struct xfs_mount		*mp = sc->mp;
+	xfs_fileoff_t			leaf_lblk;
+	xfs_fileoff_t			free_lblk;
+	xfs_fileoff_t			lblk;
+	xfs_extnum_t			idx;
+	bool				found;
+	int				is_block = 0;
+	int				error;
+
+	/* Ignore local format directories. */
+	if (sc->ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS &&
+	    sc->ip->i_d.di_format != XFS_DINODE_FMT_BTREE)
+		return 0;
+
+	ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK);
+	lblk = XFS_B_TO_FSB(mp, XFS_DIR2_DATA_OFFSET);
+	leaf_lblk = XFS_B_TO_FSB(mp, XFS_DIR2_LEAF_OFFSET);
+	free_lblk = XFS_B_TO_FSB(mp, XFS_DIR2_FREE_OFFSET);
+
+	/* Is this a block dir? */
+	args.dp = sc->ip;
+	args.geo = mp->m_dir_geo;
+	args.trans = sc->tp;
+	error = xfs_dir2_isblock(&args, &is_block);
+	XFS_SCRUB_DIR_BLOCK_OP_ERROR_GOTO(out);
+
+	/* Iterate all the data extents in the directory... */
+	found = xfs_iext_lookup_extent(sc->ip, ifp, lblk, &idx, &got);
+	while (found) {
+		/* No more data blocks... */
+		if (got.br_startoff >= leaf_lblk)
+			break;
+
+		/* Check each data block's bestfree data */
+		for_each_extent_dablk(lblk, &args, &got) {
+			error = xfs_scrub_directory_data_bestfree(sc, lblk,
+					is_block);
+			if (error)
+				goto out;
+		}
+
+		found = xfs_iext_get_extent(ifp, ++idx, &got);
+	}
+
+	/* Look for a leaf1 block, which has free info. */
+	if (xfs_iext_lookup_extent(sc->ip, ifp, leaf_lblk, &idx, &got) &&
+	    got.br_startoff == leaf_lblk &&
+	    got.br_blockcount == args.geo->fsbcount &&
+	    !xfs_iext_get_extent(ifp, ++idx, &got)) {
+		XFS_SCRUB_DIR_BLOCK_GOTO(!is_block, not_leaf1);
+		error = xfs_scrub_directory_leaf1_bestfree(sc, &args,
+				leaf_lblk);
+		if (error)
+			goto out;
+	}
+not_leaf1:
+
+	/* Scan for free blocks */
+	lblk = free_lblk;
+	found = xfs_iext_lookup_extent(sc->ip, ifp, lblk, &idx, &got);
+	while (found) {
+		/*
+		 * Dirs can't have blocks mapped above 2^32.
+		 * Single-block dirs shouldn't even be here.
+		 */
+		lblk = got.br_startoff;
+		XFS_SCRUB_DIR_BLOCK_GOTO(!(lblk & ~((1ULL << 32) - 1ULL)), out);
+		XFS_SCRUB_DIR_BLOCK_GOTO(!is_block, nextfree);
+
+		/* Check each dir free block's bestfree data */
+		for_each_extent_dablk(lblk, &args, &got) {
+			error = xfs_scrub_directory_free_bestfree(sc, &args,
+					lblk);
+			if (error)
+				goto out;
+		}
+
+nextfree:
+		found = xfs_iext_get_extent(ifp, ++idx, &got);
+	}
+out:
+	return error;
+}
+#undef for_each_extent_dablk
+#undef XFS_SCRUB_DIR_BLOCK_OP_ERROR_GOTO
+#undef XFS_SCRUB_DIR_BLOCK_CHECK
+
 /* Scrub a whole directory. */
 int
 xfs_scrub_directory(
@@ -258,6 +590,11 @@ xfs_scrub_directory(
 	if (error)
 		return error;
 
+	/* Check the freespace. */
+	error = xfs_scrub_directory_blocks(sc);
+	if (error)
+		return error;
+
 	/* Check that every dirent we see can also be looked up by hash. */
 	bufsize = (size_t)min_t(loff_t, 32768, sc->ip->i_d.di_size);
 	sdc.sc = sc;

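The second loop in xfs_scrub_directory_data_bestfree() enforces the rule
stated in its comment: every free region is either recorded in the
bestfree table or is no larger than anything recorded there.  A rough
userspace sketch of that rule follows.  It rests on assumptions:
`struct bestfree`, `BESTFREE_COUNT`, and `check_free_entry()` are
hypothetical stand-ins (not the kernel's
xfs_scrub_directory_check_free_entry()), values are host-endian, and an
offset of 0 marks an unused table slot.

```c
#include <stdbool.h>
#include <stdint.h>

#define BESTFREE_COUNT 3	/* hypothetical stand-in for XFS_DIR2_DATA_FD_COUNT */

/* Hypothetical, host-endian stand-in for one bestfree table slot. */
struct bestfree {
	uint16_t offset;	/* byte offset of the free region; 0 = unused slot */
	uint16_t length;	/* length of the free region in bytes */
};

/*
 * A free region passes if it is recorded in the bestfree table with a
 * matching length, or if it is no longer than every recorded region.
 */
bool check_free_entry(const struct bestfree bf[BESTFREE_COUNT],
		      uint16_t offset, uint16_t length)
{
	int i;

	/* Recorded in the table?  Then the lengths must agree. */
	for (i = 0; i < BESTFREE_COUNT; i++) {
		if (bf[i].offset == offset)
			return bf[i].length == length;
	}

	/* Unrecorded regions must not be longer than any table slot. */
	for (i = 0; i < BESTFREE_COUNT; i++) {
		if (length > bf[i].length)
			return false;
	}
	return true;
}
```

Note that an unused slot has length 0, so in this sketch any unrecorded
free region fails the check while the table still has room for it.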


* [PATCH 18/22] xfs: scrub extended attributes
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
                   ` (16 preceding siblings ...)
  2017-07-21  4:40 ` [PATCH 17/22] xfs: scrub directory freespace Darrick J. Wong
@ 2017-07-21  4:40 ` Darrick J. Wong
  2017-07-23 17:57   ` Allison Henderson
  2017-07-21  4:40 ` [PATCH 19/22] xfs: scrub symbolic links Darrick J. Wong
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:40 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Scrub the hash tree, keys, and values in an extended attribute structure.
Refactor the attribute code to use the transaction if the caller supplied
one to avoid buffer deadlocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
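The per-record checks in xfs_scrub_xattr_rec() come down to three
questions: does the name offset land inside the block, are only known
flag bits set, and does the stored hash match the hash recomputed from
the name?  A minimal userspace sketch, under loud assumptions: the flag
values, `struct attr_rec`, `attr_rec_ok()`, and `toy_hashname()` are all
hypothetical, and the toy hash is NOT the xfs_da_hashname() algorithm.
The sketch also adds a "whole name fits in the block" check that the
patch itself does not make.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits and record layout, for illustration only. */
#define ATTR_F_LOCAL		(1u << 0)
#define ATTR_F_ROOT		(1u << 1)
#define ATTR_F_SECURE		(1u << 2)
#define ATTR_F_INCOMPLETE	(1u << 7)
#define ATTR_F_KNOWN \
	(ATTR_F_LOCAL | ATTR_F_ROOT | ATTR_F_SECURE | ATTR_F_INCOMPLETE)

struct attr_rec {
	uint32_t hashval;	/* hash stored in the leaf entry */
	uint16_t nameidx;	/* byte offset of the name within the block */
	uint8_t  flags;
};

/* Toy hash standing in for xfs_da_hashname(); not the real algorithm. */
uint32_t toy_hashname(const unsigned char *name, size_t namelen)
{
	uint32_t hash = 0;
	size_t i;

	for (i = 0; i < namelen; i++)
		hash = (hash << 7 | hash >> 25) ^ name[i];
	return hash;
}

/* Returns true if the record looks sane for a block of blksize bytes. */
bool attr_rec_ok(const struct attr_rec *rec, const unsigned char *block,
		 size_t blksize, size_t hdrsize, size_t namelen)
{
	/* The name must start after the header and inside the block. */
	if (rec->nameidx < hdrsize || rec->nameidx >= blksize)
		return false;
	/* Extra check for this sketch: the whole name must fit too. */
	if (namelen > blksize - rec->nameidx)
		return false;
	/* No flag bits outside the known set. */
	if (rec->flags & ~ATTR_F_KNOWN)
		return false;
	/* The stored hash must match the hash of the name itself. */
	return toy_hashname(block + rec->nameidx, namelen) == rec->hashval;
}
```
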
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    3 -
 fs/xfs/scrub/attr.c    |  212 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.c  |    8 ++
 fs/xfs/scrub/common.h  |    3 +
 fs/xfs/xfs_trace.h     |    3 -
 6 files changed, 228 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/attr.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index c568d0d..da64bef 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -142,6 +142,7 @@ ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
 xfs-y				+= $(addprefix scrub/, \
 				   agheader.o \
 				   alloc.o \
+				   attr.o \
 				   bmap.o \
 				   btree.o \
 				   common.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index e646b5f..2f553ed 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -498,7 +498,8 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_BMBTA	14	/* attr fork block mapping */
 #define XFS_SCRUB_TYPE_BMBTC	15	/* CoW fork block mapping */
 #define XFS_SCRUB_TYPE_DIR	16	/* directory */
-#define XFS_SCRUB_TYPE_MAX	16
+#define XFS_SCRUB_TYPE_XATTR	17	/* extended attribute */
+#define XFS_SCRUB_TYPE_MAX	17
 
 /* i: repair this metadata */
 #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
new file mode 100644
index 0000000..f6a4b59
--- /dev/null
+++ b/fs/xfs/scrub/attr.c
@@ -0,0 +1,212 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2.h"
+#include "xfs_attr.h"
+#include "xfs_attr_leaf.h"
+#include "scrub/common.h"
+#include "scrub/dabtree.h"
+
+#include <linux/posix_acl_xattr.h>
+#include <linux/xattr.h>
+
+/* Set us up to scrub an inode's extended attributes. */
+int
+xfs_scrub_setup_xattr(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	/* Allocate the buffer without the inode lock held. */
+	sc->buf = kmem_zalloc_large(XATTR_SIZE_MAX, KM_SLEEP);
+	if (!sc->buf)
+		return -ENOMEM;
+
+	return xfs_scrub_setup_inode_contents(sc, ip, 0);
+}
+
+/* Extended Attributes */
+
+struct xfs_scrub_xattr {
+	struct xfs_attr_list_context	context;
+	struct xfs_scrub_context	*sc;
+};
+
+#define XFS_SCRUB_ATTR_CHECK(fs_ok) \
+	XFS_SCRUB_DATA_CHECK(sx->sc, XFS_ATTR_FORK, args.blkno, "attr", fs_ok)
+#define XFS_SCRUB_ATTR_OP_ERROR_GOTO(label) \
+	XFS_SCRUB_FILE_OP_ERROR_GOTO(sx->sc, XFS_ATTR_FORK, args.blkno, "attr", &error, label)
+/* Check that an extended attribute key can be looked up by hash. */
+static void
+xfs_scrub_xattr_listent(
+	struct xfs_attr_list_context	*context,
+	int				flags,
+	unsigned char			*name,
+	int				namelen,
+	int				valuelen)
+{
+	struct xfs_scrub_xattr		*sx;
+	struct xfs_da_args		args = {0};
+	int				error = 0;
+
+	sx = container_of(context, struct xfs_scrub_xattr, context);
+
+	args.flags = ATTR_KERNOTIME;
+	if (flags & XFS_ATTR_ROOT)
+		args.flags |= ATTR_ROOT;
+	else if (flags & XFS_ATTR_SECURE)
+		args.flags |= ATTR_SECURE;
+	args.geo = context->dp->i_mount->m_attr_geo;
+	args.whichfork = XFS_ATTR_FORK;
+	args.dp = context->dp;
+	args.name = name;
+	args.namelen = namelen;
+	args.hashval = xfs_da_hashname(args.name, args.namelen);
+	args.trans = context->tp;
+	args.value = sx->sc->buf;
+	args.valuelen = XATTR_SIZE_MAX;
+
+	error = xfs_attr_get_ilocked(context->dp, &args);
+	if (error == -EEXIST)
+		error = 0;
+	XFS_SCRUB_ATTR_OP_ERROR_GOTO(fail_xref);
+	XFS_SCRUB_ATTR_CHECK(args.valuelen == valuelen);
+
+fail_xref:
+	return;
+}
+#undef XFS_SCRUB_ATTR_OP_ERROR_GOTO
+#undef XFS_SCRUB_ATTR_CHECK
+
+/* Scrub an attribute btree record. */
+STATIC int
+xfs_scrub_xattr_rec(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	void				*rec)
+{
+	struct xfs_mount		*mp = ds->state->mp;
+	struct xfs_attr_leaf_entry	*ent = rec;
+	struct xfs_da_state_blk		*blk;
+	struct xfs_attr_leaf_name_local	*lentry;
+	struct xfs_attr_leaf_name_remote	*rentry;
+	struct xfs_buf			*bp;
+	xfs_dahash_t			calc_hash;
+	xfs_dahash_t			hash;
+	int				nameidx;
+	int				hdrsize;
+	unsigned int			badflags;
+	int				error;
+
+	blk = &ds->state->path.blk[level];
+
+	/* Check the hash of the entry. */
+	error = xfs_scrub_da_btree_hash(ds, level, &ent->hashval);
+	if (error)
+		goto out;
+
+	/* Find the attr entry's location. */
+	bp = blk->bp;
+	hdrsize = xfs_attr3_leaf_hdr_size(bp->b_addr);
+	nameidx = be16_to_cpu(ent->nameidx);
+	XFS_SCRUB_DA_GOTO(ds, nameidx >= hdrsize, out);
+	XFS_SCRUB_DA_GOTO(ds, nameidx < mp->m_attr_geo->blksize, out);
+
+	/* Retrieve the entry and check it. */
+	hash = be32_to_cpu(ent->hashval);
+	badflags = ~(XFS_ATTR_LOCAL | XFS_ATTR_ROOT | XFS_ATTR_SECURE |
+			XFS_ATTR_INCOMPLETE);
+	XFS_SCRUB_DA_CHECK(ds, (ent->flags & badflags) == 0);
+	if (ent->flags & XFS_ATTR_LOCAL) {
+		lentry = (struct xfs_attr_leaf_name_local *)
+				(((char *)bp->b_addr) + nameidx);
+		XFS_SCRUB_DA_GOTO(ds, lentry->namelen < MAXNAMELEN, out);
+		calc_hash = xfs_da_hashname(lentry->nameval, lentry->namelen);
+	} else {
+		rentry = (struct xfs_attr_leaf_name_remote *)
+				(((char *)bp->b_addr) + nameidx);
+		XFS_SCRUB_DA_GOTO(ds, rentry->namelen < MAXNAMELEN, out);
+		calc_hash = xfs_da_hashname(rentry->name, rentry->namelen);
+	}
+	XFS_SCRUB_DA_CHECK(ds, calc_hash == hash);
+
+out:
+	return error;
+}
+
+/* Scrub the extended attribute metadata. */
+int
+xfs_scrub_xattr(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_scrub_xattr		sx = { 0 };
+	struct attrlist_cursor_kern	cursor = { 0 };
+	struct xfs_mount		*mp = sc->mp;
+	int				error = 0;
+
+	if (!xfs_inode_hasattr(sc->ip))
+		return -ENOENT;
+
+	memset(&sx, 0, sizeof(sx));
+	/* Check attribute tree structure */
+	error = xfs_scrub_da_btree(sc, XFS_ATTR_FORK, xfs_scrub_xattr_rec);
+	if (error)
+		goto out;
+
+	/* Check that every attr key can also be looked up by hash. */
+	sx.context.dp = sc->ip;
+	sx.context.cursor = &cursor;
+	sx.context.resynch = 1;
+	sx.context.put_listent = xfs_scrub_xattr_listent;
+	sx.context.tp = sc->tp;
+	sx.sc = sc;
+
+	/*
+	 * Look up every xattr in this file by name.
+	 *
+	 * The VFS only locks i_rwsem when modifying attrs, so keep all
+	 * three locks held because that's the only way to ensure we're
+	 * the only thread poking into the da btree.  We traverse the da
+	 * btree while holding a leaf buffer locked for the xattr name
+	 * iteration, which doesn't really follow the usual buffer
+	 * locking order.
+	 */
+	error = xfs_attr_list_int_ilocked(&sx.context);
+	XFS_SCRUB_OP_ERROR_GOTO(sc,
+			XFS_INO_TO_AGNO(mp, sc->ip->i_ino),
+			XFS_INO_TO_AGBNO(mp, sc->ip->i_ino),
+			"inode", &error, out);
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 92627e9..a47c654 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -686,6 +686,10 @@ xfs_scrub_teardown(
 			IRELE(sc->ip);
 		sc->ip = NULL;
 	}
+	if (sc->buf) {
+		kmem_free(sc->buf);
+		sc->buf = NULL;
+	}
 	return error;
 }
 
@@ -844,6 +848,10 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
 		.setup	= xfs_scrub_setup_directory,
 		.scrub	= xfs_scrub_directory,
 	},
+	{ /* extended attributes */
+		.setup	= xfs_scrub_setup_xattr,
+		.scrub	= xfs_scrub_xattr,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 7baaa2d..1cfe0cc 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -52,6 +52,7 @@ struct xfs_scrub_context {
 	const struct xfs_scrub_meta_fns	*fns;
 	struct xfs_trans		*tp;
 	struct xfs_inode		*ip;
+	void				*buf;
 	uint				ilock_flags;
 	bool				try_harder;
 
@@ -221,6 +222,7 @@ SETUP_FN(xfs_scrub_setup_inode);
 SETUP_FN(xfs_scrub_setup_inode_bmap_data);
 SETUP_FN(xfs_scrub_setup_inode_bmap);
 SETUP_FN(xfs_scrub_setup_directory);
+SETUP_FN(xfs_scrub_setup_xattr);
 #undef SETUP_FN
 
 /* Metadata scrubbers */
@@ -243,6 +245,7 @@ SCRUB_FN(xfs_scrub_bmap_data);
 SCRUB_FN(xfs_scrub_bmap_attr);
 SCRUB_FN(xfs_scrub_bmap_cow);
 SCRUB_FN(xfs_scrub_directory);
+SCRUB_FN(xfs_scrub_xattr);
 #undef SCRUB_FN
 
 #endif	/* __XFS_REPAIR_COMMON_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index ccd27ec..fe4b313 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3328,7 +3328,8 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
 	{ XFS_SCRUB_TYPE_BMBTD,		"bmapbtd" }, \
 	{ XFS_SCRUB_TYPE_BMBTA,		"bmapbta" }, \
 	{ XFS_SCRUB_TYPE_BMBTC,		"bmapbtc" }, \
-	{ XFS_SCRUB_TYPE_DIR,		"dir" }
+	{ XFS_SCRUB_TYPE_DIR,		"dir" }, \
+	{ XFS_SCRUB_TYPE_XATTR,		"xattr" }
 DECLARE_EVENT_CLASS(xfs_scrub_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
 		 int error),



* [PATCH 19/22] xfs: scrub symbolic links
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
                   ` (17 preceding siblings ...)
  2017-07-21  4:40 ` [PATCH 18/22] xfs: scrub extended attributes Darrick J. Wong
@ 2017-07-21  4:40 ` Darrick J. Wong
  2017-07-23 17:59   ` Allison Henderson
  2017-07-21  4:40 ` [PATCH 20/22] xfs: scrub parent pointers Darrick J. Wong
                   ` (2 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:40 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create the infrastructure to scrub symbolic link data.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
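The symlink checks amount to: the recorded size must be plausible, and
the target data must not contain a NUL before that many bytes.  A small
userspace sketch of the same logic, with assumptions called out:
`symlink_ok()` and `bounded_len()` are hypothetical helpers, and
`SYMLINK_MAXLEN` is assumed to mirror XFS_SYMLINK_MAXLEN.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SYMLINK_MAXLEN 1024	/* assumed to mirror XFS_SYMLINK_MAXLEN */

/* Length of the string in buf, looking at no more than max bytes. */
static size_t bounded_len(const char *buf, size_t max)
{
	const char *nul = memchr(buf, '\0', max);

	return nul ? (size_t)(nul - buf) : max;
}

/*
 * di_size is the length the inode claims; buf holds bufsize bytes of
 * target data (inline fork contents or data read from remote blocks).
 */
bool symlink_ok(long long di_size, const char *buf, size_t bufsize)
{
	/* Plausible size? */
	if (di_size <= 0 || di_size > SYMLINK_MAXLEN)
		return false;
	/* The claimed length cannot exceed the space holding the target. */
	if ((size_t)di_size > bufsize)
		return false;
	/* A NUL before di_size bytes means the target is truncated. */
	return (size_t)di_size <= bounded_len(buf, bufsize);
}
```
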
 fs/xfs/Makefile        |    1 +
 fs/xfs/libxfs/xfs_fs.h |    3 +-
 fs/xfs/scrub/common.c  |    4 ++
 fs/xfs/scrub/common.h  |    2 +
 fs/xfs/scrub/symlink.c |   94 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_trace.h     |    3 +-
 6 files changed, 105 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/symlink.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index da64bef..3d862b9 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -153,5 +153,6 @@ xfs-y				+= $(addprefix scrub/, \
 				   metabufs.o \
 				   refcount.o \
 				   rmap.o \
+				   symlink.o \
 				   )
 endif
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 2f553ed..95d9ce9 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -499,7 +499,8 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_BMBTC	15	/* CoW fork block mapping */
 #define XFS_SCRUB_TYPE_DIR	16	/* directory */
 #define XFS_SCRUB_TYPE_XATTR	17	/* extended attribute */
-#define XFS_SCRUB_TYPE_MAX	17
+#define XFS_SCRUB_TYPE_SYMLINK	18	/* symbolic link */
+#define XFS_SCRUB_TYPE_MAX	18
 
 /* i: repair this metadata */
 #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index a47c654..4003c2f 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -852,6 +852,10 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
 		.setup	= xfs_scrub_setup_xattr,
 		.scrub	= xfs_scrub_xattr,
 	},
+	{ /* symbolic link */
+		.setup	= xfs_scrub_setup_symlink,
+		.scrub	= xfs_scrub_symlink,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 1cfe0cc..6d02a64 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -223,6 +223,7 @@ SETUP_FN(xfs_scrub_setup_inode_bmap_data);
 SETUP_FN(xfs_scrub_setup_inode_bmap);
 SETUP_FN(xfs_scrub_setup_directory);
 SETUP_FN(xfs_scrub_setup_xattr);
+SETUP_FN(xfs_scrub_setup_symlink);
 #undef SETUP_FN
 
 /* Metadata scrubbers */
@@ -246,6 +247,7 @@ SCRUB_FN(xfs_scrub_bmap_attr);
 SCRUB_FN(xfs_scrub_bmap_cow);
 SCRUB_FN(xfs_scrub_directory);
 SCRUB_FN(xfs_scrub_xattr);
+SCRUB_FN(xfs_scrub_symlink);
 #undef SCRUB_FN
 
 #endif	/* __XFS_REPAIR_COMMON_H__ */
diff --git a/fs/xfs/scrub/symlink.c b/fs/xfs/scrub/symlink.c
new file mode 100644
index 0000000..75537e9
--- /dev/null
+++ b/fs/xfs/scrub/symlink.c
@@ -0,0 +1,94 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_inode_fork.h"
+#include "xfs_symlink.h"
+#include "scrub/common.h"
+
+/* Set us up to scrub a symbolic link. */
+int
+xfs_scrub_setup_symlink(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	/* Allocate the buffer without the inode lock held. */
+	sc->buf = kmem_zalloc_large(XFS_SYMLINK_MAXLEN + 1, KM_SLEEP);
+	if (!sc->buf)
+		return -ENOMEM;
+
+	return xfs_scrub_setup_inode_contents(sc, ip, 0);
+}
+
+/* Symbolic links. */
+
+#define XFS_SCRUB_SYMLINK_CHECK(fs_ok) \
+	XFS_SCRUB_INO_CHECK(sc, ip->i_ino, NULL, "symlink", fs_ok)
+#define XFS_SCRUB_SYMLINK_GOTO(fs_ok, label) \
+	XFS_SCRUB_INO_GOTO(sc, ip->i_ino, NULL, "symlink", fs_ok, label)
+int
+xfs_scrub_symlink(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_inode		*ip = sc->ip;
+	struct xfs_ifork		*ifp;
+	loff_t				len;
+	int				error = 0;
+
+	if (!S_ISLNK(VFS_I(ip)->i_mode))
+		return -ENOENT;
+	ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+	len = ip->i_d.di_size;
+
+	/* Plausible size? */
+	XFS_SCRUB_SYMLINK_GOTO(len <= XFS_SYMLINK_MAXLEN, out);
+	XFS_SCRUB_SYMLINK_GOTO(len > 0, out);
+
+	/* Inline symlink? */
+	if (ifp->if_flags & XFS_IFINLINE) {
+		XFS_SCRUB_SYMLINK_GOTO(len > 0, out);
+		XFS_SCRUB_SYMLINK_CHECK(len <= XFS_IFORK_DSIZE(ip));
+		XFS_SCRUB_SYMLINK_CHECK(len <= strnlen(ifp->if_u1.if_data,
+				XFS_IFORK_DSIZE(ip)));
+		goto out;
+	}
+
+	/* Remote symlink; must read the contents. */
+	error = xfs_readlink_bmap_ilocked(sc->ip, sc->buf);
+	XFS_SCRUB_FILE_OP_ERROR_GOTO(sc, XFS_DATA_FORK, 0, "symlink",
+			&error, out);
+	XFS_SCRUB_SYMLINK_CHECK(len <= strnlen(sc->buf, XFS_SYMLINK_MAXLEN));
+out:
+	return error;
+}
+#undef XFS_SCRUB_SYMLINK_GOTO
+#undef XFS_SCRUB_SYMLINK_CHECK
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index fe4b313..39824f8 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3329,7 +3329,8 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
 	{ XFS_SCRUB_TYPE_BMBTA,		"bmapbta" }, \
 	{ XFS_SCRUB_TYPE_BMBTC,		"bmapbtc" }, \
 	{ XFS_SCRUB_TYPE_DIR,		"dir" }, \
-	{ XFS_SCRUB_TYPE_XATTR,		"xattr" }
+	{ XFS_SCRUB_TYPE_XATTR,		"xattr" }, \
+	{ XFS_SCRUB_TYPE_SYMLINK,	"symlink" }
 DECLARE_EVENT_CLASS(xfs_scrub_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
 		 int error),



* [PATCH 20/22] xfs: scrub parent pointers
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
                   ` (18 preceding siblings ...)
  2017-07-21  4:40 ` [PATCH 19/22] xfs: scrub symbolic links Darrick J. Wong
@ 2017-07-21  4:40 ` Darrick J. Wong
  2017-07-23 18:03   ` Allison Henderson
  2017-07-21  4:40 ` [PATCH 21/22] xfs: scrub realtime bitmap/summary Darrick J. Wong
  2017-07-21  4:40 ` [PATCH 22/22] xfs: scrub quota information Darrick J. Wong
  21 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:40 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Scrub parent pointers, sort of.  For directories, we can ride the
'..' entry up to the parent to confirm that there's exactly one
dentry that points back to this directory.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
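The dentry count relies on the usual kernel pattern of embedding a
callback context inside a larger structure and recovering the outer
structure with container_of().  A simplified userspace sketch of that
pattern follows.  Everything here is an assumption made for
illustration: `struct dir_ctx`, `struct dirent_lite`, and
`count_parent_dentries()` are hypothetical and far simpler than
dir_context and xfs_readdir().

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified stand-ins for dir_context and a dirent. */
struct dir_ctx {
	int (*actor)(struct dir_ctx *dc, uint64_t ino);
};

struct dirent_lite {
	const char *name;
	uint64_t ino;
};

/* The iteration context embeds dir_ctx so the actor can find its state. */
struct parent_ctx {
	struct dir_ctx dc;
	uint64_t ino;		/* child inode we are looking for */
	unsigned int nr;	/* dentries found pointing at it */
};

/* Called once per directory entry; bumps nr on a match. */
static int parent_actor(struct dir_ctx *dc, uint64_t ino)
{
	struct parent_ctx *pc = container_of(dc, struct parent_ctx, dc);

	if (pc->ino == ino)
		pc->nr++;
	return 0;
}

/* Count the entries in a (toy) parent directory that point to child. */
unsigned int count_parent_dentries(const struct dirent_lite *ents,
				   size_t nents, uint64_t child)
{
	struct parent_ctx pc = {
		.dc.actor = parent_actor,
		.ino = child,
		.nr = 0,
	};
	size_t i;

	for (i = 0; i < nents; i++)
		pc.dc.actor(&pc.dc, ents[i].ino);
	return pc.nr;
}
```

A scrubber built on this would flag the directory unless the count
comes back as 1, which corresponds to the `nr == 1` check in the patch.
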
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    3 -
 fs/xfs/scrub/common.c  |    4 +
 fs/xfs/scrub/common.h  |    2 
 fs/xfs/scrub/parent.c  |  252 ++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 261 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/parent.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 3d862b9..e73cdc2 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -151,6 +151,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   ialloc.o \
 				   inode.o \
 				   metabufs.o \
+				   parent.o \
 				   refcount.o \
 				   rmap.o \
 				   symlink.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 95d9ce9..4ad3056 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -500,7 +500,8 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_DIR	16	/* directory */
 #define XFS_SCRUB_TYPE_XATTR	17	/* extended attribute */
 #define XFS_SCRUB_TYPE_SYMLINK	18	/* symbolic link */
-#define XFS_SCRUB_TYPE_MAX	18
+#define XFS_SCRUB_TYPE_PARENT	19	/* parent pointers */
+#define XFS_SCRUB_TYPE_MAX	19
 
 /* i: repair this metadata */
 #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 4003c2f..6f701c6 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -856,6 +856,10 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
 		.setup	= xfs_scrub_setup_symlink,
 		.scrub	= xfs_scrub_symlink,
 	},
+	{ /* parent pointers */
+		.setup	= xfs_scrub_setup_parent,
+		.scrub	= xfs_scrub_parent,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 6d02a64..1873a31 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -223,6 +223,7 @@ SETUP_FN(xfs_scrub_setup_inode_bmap_data);
 SETUP_FN(xfs_scrub_setup_inode_bmap);
 SETUP_FN(xfs_scrub_setup_directory);
 SETUP_FN(xfs_scrub_setup_xattr);
+SETUP_FN(xfs_scrub_setup_parent);
 SETUP_FN(xfs_scrub_setup_symlink);
 #undef SETUP_FN
 
@@ -247,6 +248,7 @@ SCRUB_FN(xfs_scrub_bmap_attr);
 SCRUB_FN(xfs_scrub_bmap_cow);
 SCRUB_FN(xfs_scrub_directory);
 SCRUB_FN(xfs_scrub_xattr);
+SCRUB_FN(xfs_scrub_parent);
 SCRUB_FN(xfs_scrub_symlink);
 #undef SCRUB_FN
 
diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
new file mode 100644
index 0000000..e604885
--- /dev/null
+++ b/fs/xfs/scrub/parent.c
@@ -0,0 +1,252 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "scrub/common.h"
+
+/* Set us up to scrub parents. */
+int
+xfs_scrub_setup_parent(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_inode_contents(sc, ip, 0);
+}
+
+/* Parent pointers */
+
+/* Look for an entry in a parent pointing to this inode. */
+
+struct xfs_scrub_parent_ctx {
+	struct dir_context		dc;
+	xfs_ino_t			ino;
+	xfs_nlink_t			nr;
+};
+
+/* Look for a single entry in a directory pointing to an inode. */
+STATIC int
+xfs_scrub_parent_actor(
+	struct dir_context		*dc,
+	const char			*name,
+	int				namelen,
+	loff_t				pos,
+	u64				ino,
+	unsigned			type)
+{
+	struct xfs_scrub_parent_ctx	*spc;
+
+	spc = container_of(dc, struct xfs_scrub_parent_ctx, dc);
+	if (spc->ino == ino)
+		spc->nr++;
+	return 0;
+}
+
+/* Count the number of dentries in the parent dir that point to this inode. */
+STATIC int
+xfs_scrub_parent_count_parent_dentries(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*parent,
+	xfs_nlink_t			*nr)
+{
+	struct xfs_scrub_parent_ctx	spc = {
+		.dc.actor = xfs_scrub_parent_actor,
+		.dc.pos = 0,
+		.ino = sc->ip->i_ino,
+		.nr = 0,
+	};
+	struct xfs_ifork		*ifp;
+	size_t				bufsize;
+	loff_t				oldpos;
+	uint				lock_mode;
+	int				error;
+
+	/*
+	 * Load the parent directory's extent map.  A regular directory
+	 * open would start readahead (and thus load the extent map)
+	 * before we even got to a readdir call, but this isn't
+	 * guaranteed here.
+	 */
+	lock_mode = xfs_ilock_data_map_shared(parent);
+	ifp = XFS_IFORK_PTR(parent, XFS_DATA_FORK);
+	if (XFS_IFORK_FORMAT(parent, XFS_DATA_FORK) == XFS_DINODE_FMT_BTREE &&
+	    !(ifp->if_flags & XFS_IFEXTENTS)) {
+		error = xfs_iread_extents(sc->tp, parent, XFS_DATA_FORK);
+		if (error) {
+			xfs_iunlock(parent, lock_mode);
+			return error;
+		}
+	}
+	xfs_iunlock(parent, lock_mode);
+
+	/*
+	 * Iterate the parent dir to confirm that there is
+	 * exactly one entry pointing back to the inode being
+	 * scanned.
+	 */
+	bufsize = (size_t)min_t(loff_t, 32768, parent->i_d.di_size);
+	oldpos = 0;
+	while (true) {
+		error = xfs_readdir(sc->tp, parent, &spc.dc, bufsize);
+		if (error)
+			goto out;
+		if (oldpos == spc.dc.pos)
+			break;
+		oldpos = spc.dc.pos;
+	}
+	*nr = spc.nr;
+out:
+	return error;
+}
+
+/* Scrub a parent pointer. */
+#define XFS_SCRUB_PARENT_CHECK(fs_ok) \
+	XFS_SCRUB_DATA_CHECK(sc, XFS_DATA_FORK, 0, "parent", fs_ok)
+#define XFS_SCRUB_PARENT_GOTO(fs_ok, label) \
+	XFS_SCRUB_DATA_GOTO(sc, XFS_DATA_FORK, 0, "parent", fs_ok, label)
+#define XFS_SCRUB_PARENT_OP_ERROR_GOTO(label) \
+	XFS_SCRUB_FILE_OP_ERROR_GOTO(sc, XFS_DATA_FORK, 0, "parent", \
+		&error, label)
+int
+xfs_scrub_parent(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_inode		*dp = NULL;
+	xfs_ino_t			dnum;
+	xfs_nlink_t			nr;
+	int				tries = 0;
+	int				error;
+
+	/*
+	 * If we're a directory, check that the '..' link points up to
+	 * a directory that has one entry pointing to us.
+	 */
+	if (!S_ISDIR(VFS_I(sc->ip)->i_mode))
+		return -ENOENT;
+
+	/*
+	 * The VFS grabs a read or write lock via i_rwsem before it reads
+	 * or writes to a directory.  If we've gotten this far we've
+	 * already obtained IOLOCK_EXCL, which (since 4.10) is the same as
+	 * getting a write lock on i_rwsem.  Therefore, it is safe for us
+	 * to drop the ILOCK here in order to do directory lookups.
+	 */
+	sc->ilock_flags &= ~(XFS_ILOCK_EXCL | XFS_MMAPLOCK_EXCL);
+	xfs_iunlock(sc->ip, XFS_ILOCK_EXCL | XFS_MMAPLOCK_EXCL);
+
+	/* Look up '..' */
+	error = xfs_dir_lookup(sc->tp, sc->ip, &xfs_name_dotdot, &dnum, NULL);
+	XFS_SCRUB_PARENT_OP_ERROR_GOTO(out);
+
+	/* Is this the root dir?  Then '..' must point to itself. */
+	if (sc->ip == mp->m_rootip) {
+		XFS_SCRUB_PARENT_CHECK(sc->ip->i_ino == mp->m_sb.sb_rootino);
+		XFS_SCRUB_PARENT_CHECK(dnum == sc->ip->i_ino);
+		return 0;
+	}
+
+try_again:
+	/* Otherwise, '..' must not point to ourselves. */
+	XFS_SCRUB_PARENT_GOTO(sc->ip->i_ino != dnum, out);
+
+	error = xfs_iget(mp, sc->tp, dnum, 0, 0, &dp);
+	XFS_SCRUB_PARENT_OP_ERROR_GOTO(out);
+	XFS_SCRUB_PARENT_GOTO(dp != sc->ip, out_rele);
+
+	/*
+	 * We prefer to keep the inode locked while we lock and search
+	 * its alleged parent for a forward reference.  However, this
+	 * child -> parent scheme can deadlock with the parent -> child
+	 * scheme that is normally used.  Therefore, if we can trylock the
+	 * parent, just validate the references and get out.
+	 */
+	if (xfs_ilock_nowait(dp, XFS_IOLOCK_SHARED)) {
+		error = xfs_scrub_parent_count_parent_dentries(sc, dp, &nr);
+		XFS_SCRUB_PARENT_OP_ERROR_GOTO(out_unlock);
+		XFS_SCRUB_PARENT_CHECK(nr == 1);
+		goto out_unlock;
+	}
+
+	/*
+	 * The game changes if we get here.  We failed to lock the parent,
+	 * so we're going to try to verify both pointers while only holding
+	 * one lock so as to avoid deadlocking with something that's actually
+	 * trying to traverse down the directory tree.
+	 */
+	xfs_iunlock(sc->ip, sc->ilock_flags);
+	sc->ilock_flags = 0;
+	xfs_ilock(dp, XFS_IOLOCK_SHARED);
+
+	/* Go looking for our dentry. */
+	error = xfs_scrub_parent_count_parent_dentries(sc, dp, &nr);
+	XFS_SCRUB_PARENT_OP_ERROR_GOTO(out_unlock);
+
+	/* Drop the parent lock, relock this inode. */
+	xfs_iunlock(dp, XFS_IOLOCK_SHARED);
+	sc->ilock_flags = XFS_IOLOCK_EXCL;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+
+	/* Look up '..' again to see if the parent changed. */
+	error = xfs_dir_lookup(sc->tp, sc->ip, &xfs_name_dotdot, &dnum, NULL);
+	XFS_SCRUB_PARENT_OP_ERROR_GOTO(out_rele);
+
+	/* Drat, parent changed.  Try again! */
+	if (dnum != dp->i_ino) {
+		IRELE(dp);
+		tries++;
+		if (tries < 20)
+			goto try_again;
+		XFS_SCRUB_INCOMPLETE(sc, "parent", false);
+		goto out;
+	}
+	IRELE(dp);
+
+	/*
+	 * '..' didn't change, so check that there was only one entry
+	 * for us in the parent.
+	 */
+	XFS_SCRUB_PARENT_CHECK(nr == 1);
+	goto out;
+
+out_unlock:
+	xfs_iunlock(dp, XFS_IOLOCK_SHARED);
+out_rele:
+	IRELE(dp);
+out:
+	return error;
+}
+#undef XFS_SCRUB_PARENT_OP_ERROR_GOTO
+#undef XFS_SCRUB_PARENT_GOTO
+#undef XFS_SCRUB_PARENT_CHECK
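
The locking dance in xfs_scrub_parent() above may be easier to follow
outside the kernel.  What follows is only a userspace sketch of the same
idea, assuming pthread mutexes in place of the inode locks: try the
child -> parent lock order without blocking, and if that fails, hold one
lock at a time and revalidate '..' afterwards.  struct node,
count_entries() and check_parent() are hypothetical names made up for
this illustration, and the sketch assumes the nodes outlive the call
(the patch pins the parent with xfs_iget/IRELE instead).

```c
#include <pthread.h>
#include <stdint.h>

#define MAX_TRIES	20	/* same retry bound as the patch */
#define MAX_ENTRIES	8	/* arbitrary size for this sketch */

enum verdict { PARENT_OK, PARENT_CORRUPT, PARENT_INCOMPLETE };

/* Hypothetical stand-in for a directory inode; not an XFS structure. */
struct node {
	pthread_mutex_t	lock;
	uint64_t	ino;
	struct node	*dotdot;		/* target of the '..' entry */
	uint64_t	entries[MAX_ENTRIES];	/* inode numbers of children */
	unsigned int	nr_entries;
};

/* Count the entries in @dir that point at @ino.  Caller holds dir->lock. */
static unsigned int count_entries(const struct node *dir, uint64_t ino)
{
	unsigned int i, nr = 0;

	for (i = 0; i < dir->nr_entries; i++)
		if (dir->entries[i] == ino)
			nr++;
	return nr;
}

/*
 * Check that child's '..' target holds exactly one entry for child.
 * The caller holds child->lock, and it is held again on return.
 */
enum verdict check_parent(struct node *child)
{
	int tries;

	for (tries = 0; tries < MAX_TRIES; tries++) {
		struct node *parent = child->dotdot;
		unsigned int nr;

		if (parent == child)
			return PARENT_CORRUPT;	/* non-root '..' loops back */

		/* Fast path: child -> parent order, but never block on it. */
		if (pthread_mutex_trylock(&parent->lock) == 0) {
			nr = count_entries(parent, child->ino);
			pthread_mutex_unlock(&parent->lock);
			return nr == 1 ? PARENT_OK : PARENT_CORRUPT;
		}

		/* Slow path: hold only one lock at a time. */
		pthread_mutex_unlock(&child->lock);
		pthread_mutex_lock(&parent->lock);
		nr = count_entries(parent, child->ino);
		pthread_mutex_unlock(&parent->lock);
		pthread_mutex_lock(&child->lock);

		/* The count is only meaningful if '..' did not move. */
		if (child->dotdot == parent)
			return nr == 1 ? PARENT_OK : PARENT_CORRUPT;
	}
	return PARENT_INCOMPLETE;	/* parent kept changing; give up */
}
```

The slow path trades atomicity for deadlock freedom: the entry count is
taken while the child is unlocked, so it is trusted only if the second
look at '..' still names the same parent.  Giving up after MAX_TRIES
corresponds to the XFS_SCRUB_INCOMPLETE case in the patch.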



* [PATCH 21/22] xfs: scrub realtime bitmap/summary
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
                   ` (19 preceding siblings ...)
  2017-07-21  4:40 ` [PATCH 20/22] xfs: scrub parent pointers Darrick J. Wong
@ 2017-07-21  4:40 ` Darrick J. Wong
  2017-07-23 18:05   ` Allison Henderson
  2017-07-21  4:40 ` [PATCH 22/22] xfs: scrub quota information Darrick J. Wong
  21 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:40 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Perform simple tests of the realtime bitmap and summary.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile            |    2 +
 fs/xfs/libxfs/xfs_format.h |    5 ++
 fs/xfs/libxfs/xfs_fs.h     |    4 +-
 fs/xfs/scrub/agheader.c    |    1 
 fs/xfs/scrub/common.c      |   15 +++++++
 fs/xfs/scrub/common.h      |    3 +
 fs/xfs/scrub/rtbitmap.c    |  101 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_trace.h         |    4 +-
 8 files changed, 133 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/rtbitmap.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index e73cdc2..86fc94c 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -156,4 +156,6 @@ xfs-y				+= $(addprefix scrub/, \
 				   rmap.o \
 				   symlink.o \
 				   )
+
+xfs-$(CONFIG_XFS_RT)		+= scrub/rtbitmap.o
 endif
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 154c3dd..d4d9bef 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -315,6 +315,11 @@ static inline bool xfs_sb_good_version(struct xfs_sb *sbp)
 	return false;
 }
 
+static inline bool xfs_sb_version_hasrealtime(struct xfs_sb *sbp)
+{
+	return sbp->sb_rblocks > 0;
+}
+
 /*
  * Detect a mismatched features2 field.  Older kernels read/wrote
  * this into the wrong slot, so to be safe we keep them in sync.
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 4ad3056..83121fc 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -501,7 +501,9 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_XATTR	17	/* extended attribute */
 #define XFS_SCRUB_TYPE_SYMLINK	18	/* symbolic link */
 #define XFS_SCRUB_TYPE_PARENT	19	/* parent pointers */
-#define XFS_SCRUB_TYPE_MAX	19
+#define XFS_SCRUB_TYPE_RTBITMAP	20	/* realtime bitmap */
+#define XFS_SCRUB_TYPE_RTSUM	21	/* realtime summary */
+#define XFS_SCRUB_TYPE_MAX	21
 
 /* i: repair this metadata */
 #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
index 137d2ad..8048a63 100644
--- a/fs/xfs/scrub/agheader.c
+++ b/fs/xfs/scrub/agheader.c
@@ -247,6 +247,7 @@ xfs_scrub_superblock(
 	XFS_SCRUB_SB_FEAT(metauuid);
 	XFS_SCRUB_SB_FEAT(rmapbt);
 	XFS_SCRUB_SB_FEAT(reflink);
+	XFS_SCRUB_SB_FEAT(realtime);
 #undef XFS_SCRUB_SB_FEAT
 
 #define XFS_SCRUB_SB_FEAT_PREEN(fn) \
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 6f701c6..6e40fa6 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -860,6 +860,21 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
 		.setup	= xfs_scrub_setup_parent,
 		.scrub	= xfs_scrub_parent,
 	},
+#ifdef CONFIG_XFS_RT
+	{ /* realtime bitmap */
+		.setup	= xfs_scrub_setup_rt,
+		.scrub	= xfs_scrub_rtbitmap,
+		.has	= xfs_sb_version_hasrealtime,
+	},
+	{ /* realtime summary */
+		.setup	= xfs_scrub_setup_rt,
+		.scrub	= xfs_scrub_rtsummary,
+		.has	= xfs_sb_version_hasrealtime,
+	},
+#else
+	{ NULL },
+	{ NULL },
+#endif
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 1873a31..43a74f0 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -225,6 +225,7 @@ SETUP_FN(xfs_scrub_setup_directory);
 SETUP_FN(xfs_scrub_setup_xattr);
 SETUP_FN(xfs_scrub_setup_parent);
 SETUP_FN(xfs_scrub_setup_symlink);
+SETUP_FN(xfs_scrub_setup_rt);
 #undef SETUP_FN
 
 /* Metadata scrubbers */
@@ -250,6 +251,8 @@ SCRUB_FN(xfs_scrub_directory);
 SCRUB_FN(xfs_scrub_xattr);
 SCRUB_FN(xfs_scrub_parent);
 SCRUB_FN(xfs_scrub_symlink);
+SCRUB_FN(xfs_scrub_rtbitmap);
+SCRUB_FN(xfs_scrub_rtsummary);
 #undef SCRUB_FN
 
 #endif	/* __XFS_REPAIR_COMMON_H__ */
diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c
new file mode 100644
index 0000000..b061066
--- /dev/null
+++ b/fs/xfs/scrub/rtbitmap.c
@@ -0,0 +1,101 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_sb.h"
+#include "xfs_alloc.h"
+#include "xfs_rtalloc.h"
+#include "xfs_inode.h"
+#include "scrub/common.h"
+
+/* Set us up with the realtime metadata locked. */
+int
+xfs_scrub_setup_rt(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				lockmode;
+	int				error = 0;
+
+	if (sc->sm->sm_agno || sc->sm->sm_ino || sc->sm->sm_gen)
+		return -EINVAL;
+
+	error = xfs_scrub_setup_fs(sc, ip);
+	if (error)
+		return error;
+
+	lockmode = XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP;
+	xfs_ilock(mp->m_rbmip, lockmode);
+	xfs_trans_ijoin(sc->tp, mp->m_rbmip, lockmode);
+
+	return 0;
+}
+
+/* Realtime bitmap. */
+
+#define XFS_SCRUB_RTBITMAP_CHECK(fs_ok) \
+	XFS_SCRUB_CHECK(sc, bp, "rtbitmap", fs_ok)
+#define XFS_SCRUB_RTBITMAP_OP_ERROR_GOTO(error, label) \
+	XFS_SCRUB_OP_ERROR_GOTO(sc, 0, 0, "rtbitmap", error, label)
+/* Scrub a free extent record from the realtime bitmap. */
+STATIC int
+xfs_scrub_rtbitmap_helper(
+	struct xfs_trans		*tp,
+	struct xfs_rtalloc_rec		*rec,
+	void				*priv)
+{
+	return 0;
+}
+
+/* Scrub the realtime bitmap. */
+int
+xfs_scrub_rtbitmap(
+	struct xfs_scrub_context	*sc)
+{
+	int				error;
+
+	error = xfs_rtalloc_query_all(sc->tp, xfs_scrub_rtbitmap_helper, NULL);
+	XFS_SCRUB_RTBITMAP_OP_ERROR_GOTO(&error, out);
+
+out:
+	return error;
+}
+#undef XFS_SCRUB_RTBITMAP_OP_ERROR_GOTO
+#undef XFS_SCRUB_RTBITMAP_CHECK
+
+/* Scrub the realtime summary. */
+int
+xfs_scrub_rtsummary(
+	struct xfs_scrub_context	*sc)
+{
+	/* XXX: implement this some day */
+	return -ENOENT;
+}
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 39824f8..1be7b00 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3330,7 +3330,9 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
 	{ XFS_SCRUB_TYPE_BMBTC,		"bmapbtc" }, \
 	{ XFS_SCRUB_TYPE_DIR,		"dir" }, \
 	{ XFS_SCRUB_TYPE_XATTR,		"xattr" }, \
-	{ XFS_SCRUB_TYPE_SYMLINK,	"symlink" }
+	{ XFS_SCRUB_TYPE_SYMLINK,	"symlink" }, \
+	{ XFS_SCRUB_TYPE_RTBITMAP,	"rtbitmap" }, \
+	{ XFS_SCRUB_TYPE_RTSUM,		"rtsummary" }
 DECLARE_EVENT_CLASS(xfs_scrub_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
 		 int error),
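
The common.c hunk in this patch adds two table slots that are either
real handlers or { NULL } placeholders, depending on CONFIG_XFS_RT.  A
rough userspace sketch of why the placeholders matter, assuming a table
indexed directly by type code: without them, every later type would
shift down by two.  All names here (struct scrub_fns, dispatch(),
SKETCH_HAVE_RT, the TYPE_* codes) are hypothetical, and returning
-ENOENT for every "not available" case is an assumption of the sketch,
since the real dispatcher is not part of this patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HAVE_RT	1	/* set to 0 to model CONFIG_XFS_RT=n */

/* Hypothetical, trimmed-down superblock and scrub context. */
struct sb { uint64_t rblocks; };
struct ctx { const struct sb *sb; };

struct scrub_fns {
	int	(*setup)(struct ctx *);
	int	(*scrub)(struct ctx *);
	bool	(*has)(const struct sb *);	/* optional feature gate */
};

enum { TYPE_PROBE = 0, TYPE_RTBITMAP = 1, TYPE_RTSUM = 2, TYPE_MAX = 2 };

static int setup_noop(struct ctx *c) { (void)c; return 0; }
static int scrub_ok(struct ctx *c) { (void)c; return 0; }

#if SKETCH_HAVE_RT
/* Mirrors xfs_sb_version_hasrealtime(): realtime means rblocks > 0. */
static bool has_realtime(const struct sb *sb) { return sb->rblocks > 0; }
/* Like xfs_scrub_rtsummary() in the patch: not implemented yet. */
static int scrub_unimplemented(struct ctx *c) { (void)c; return -ENOENT; }
#endif

/* Slot N must always describe type N, so compiled-out types stay NULL. */
static const struct scrub_fns fns[] = {
	[TYPE_PROBE]	= { .setup = setup_noop, .scrub = scrub_ok },
#if SKETCH_HAVE_RT
	[TYPE_RTBITMAP]	= { .setup = setup_noop, .scrub = scrub_ok,
			    .has = has_realtime },
	[TYPE_RTSUM]	= { .setup = setup_noop, .scrub = scrub_unimplemented,
			    .has = has_realtime },
#else
	[TYPE_RTBITMAP]	= { NULL },
	[TYPE_RTSUM]	= { NULL },
#endif
};

/* Look up the handler for @type and run it if it exists and applies. */
int dispatch(struct ctx *c, unsigned int type)
{
	const struct scrub_fns *f;
	int error;

	if (type > (unsigned int)TYPE_MAX)
		return -ENOENT;		/* unknown type code */
	f = &fns[type];
	if (!f->scrub)
		return -ENOENT;		/* support compiled out */
	if (f->has && !f->has(c->sb))
		return -ENOENT;		/* feature absent on this fs */
	error = f->setup(c);
	if (error)
		return error;
	return f->scrub(c);
}
```

With the feature gate in the table, a filesystem without a realtime
device reports "nothing to scrub" before any setup or locking happens.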



* [PATCH 22/22] xfs: scrub quota information
  2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
                   ` (20 preceding siblings ...)
  2017-07-21  4:40 ` [PATCH 21/22] xfs: scrub realtime bitmap/summary Darrick J. Wong
@ 2017-07-21  4:40 ` Darrick J. Wong
  2017-07-23 18:07   ` Allison Henderson
  21 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-21  4:40 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Perform some quick sanity testing of the disk quota information.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    5 +
 fs/xfs/scrub/common.c  |   18 +++
 fs/xfs/scrub/common.h  |    2 
 fs/xfs/scrub/quota.c   |  274 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_trace.h     |    5 +
 6 files changed, 303 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/quota.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 86fc94c..010a90f 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -158,4 +158,5 @@ xfs-y				+= $(addprefix scrub/, \
 				   )
 
 xfs-$(CONFIG_XFS_RT)		+= scrub/rtbitmap.o
+xfs-$(CONFIG_XFS_QUOTA)		+= scrub/quota.o
 endif
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 83121fc..444e286 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -503,7 +503,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_PARENT	19	/* parent pointers */
 #define XFS_SCRUB_TYPE_RTBITMAP	20	/* realtime bitmap */
 #define XFS_SCRUB_TYPE_RTSUM	21	/* realtime summary */
-#define XFS_SCRUB_TYPE_MAX	21
+#define XFS_SCRUB_TYPE_UQUOTA	22	/* user quotas */
+#define XFS_SCRUB_TYPE_GQUOTA	23	/* group quotas */
+#define XFS_SCRUB_TYPE_PQUOTA	24	/* project quotas */
+#define XFS_SCRUB_TYPE_MAX	24
 
 /* i: repair this metadata */
 #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 6e40fa6..62884a8 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -875,6 +875,24 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
 	{ NULL },
 	{ NULL },
 #endif
+#ifdef CONFIG_XFS_QUOTA
+	{ /* user quota */
+		.setup = xfs_scrub_setup_quota,
+		.scrub = xfs_scrub_quota,
+	},
+	{ /* group quota */
+		.setup = xfs_scrub_setup_quota,
+		.scrub = xfs_scrub_quota,
+	},
+	{ /* project quota */
+		.setup = xfs_scrub_setup_quota,
+		.scrub = xfs_scrub_quota,
+	},
+#else
+	{ NULL },
+	{ NULL },
+	{ NULL },
+#endif
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 43a74f0..fcb3764 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -226,6 +226,7 @@ SETUP_FN(xfs_scrub_setup_xattr);
 SETUP_FN(xfs_scrub_setup_parent);
 SETUP_FN(xfs_scrub_setup_symlink);
 SETUP_FN(xfs_scrub_setup_rt);
+SETUP_FN(xfs_scrub_setup_quota);
 #undef SETUP_FN
 
 /* Metadata scrubbers */
@@ -253,6 +254,7 @@ SCRUB_FN(xfs_scrub_parent);
 SCRUB_FN(xfs_scrub_symlink);
 SCRUB_FN(xfs_scrub_rtbitmap);
 SCRUB_FN(xfs_scrub_rtsummary);
+SCRUB_FN(xfs_scrub_quota);
 #undef SCRUB_FN
 
 #endif	/* __XFS_REPAIR_COMMON_H__ */
diff --git a/fs/xfs/scrub/quota.c b/fs/xfs/scrub/quota.c
new file mode 100644
index 0000000..117b8b6
--- /dev/null
+++ b/fs/xfs/scrub/quota.c
@@ -0,0 +1,274 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_inode_fork.h"
+#include "xfs_bmap.h"
+#include "xfs_quota.h"
+#include "xfs_qm.h"
+#include "xfs_dquot.h"
+#include "xfs_dquot_item.h"
+#include "scrub/common.h"
+
+/* Convert a scrub type code to a DQ flag, or return 0 if error. */
+static inline uint
+xfs_scrub_quota_to_dqtype(
+	struct xfs_scrub_context	*sc)
+{
+	switch (sc->sm->sm_type) {
+	case XFS_SCRUB_TYPE_UQUOTA:
+		return XFS_DQ_USER;
+	case XFS_SCRUB_TYPE_GQUOTA:
+		return XFS_DQ_GROUP;
+	case XFS_SCRUB_TYPE_PQUOTA:
+		return XFS_DQ_PROJ;
+	default:
+		return 0;
+	}
+}
+
+/* Set us up to scrub a quota. */
+int
+xfs_scrub_setup_quota(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	uint				dqtype;
+
+	if (sc->sm->sm_agno || sc->sm->sm_ino || sc->sm->sm_gen)
+		return -EINVAL;
+
+	dqtype = xfs_scrub_quota_to_dqtype(sc);
+	if (dqtype == 0)
+		return -EINVAL;
+	return 0;
+}
+
+/* Quotas. */
+
+#define XFS_SCRUB_QUOTA_CHECK(fs_ok) \
+	XFS_SCRUB_DATA_CHECK(sc, XFS_DATA_FORK, id, tag, fs_ok)
+#define XFS_SCRUB_QUOTA_WARN(fs_ok) \
+	XFS_SCRUB_DATA_WARN(sc, XFS_DATA_FORK, id, tag, fs_ok)
+#define XFS_SCRUB_QUOTA_GOTO(fs_ok, label) \
+	XFS_SCRUB_DATA_GOTO(sc, XFS_DATA_FORK, id, tag, fs_ok, label)
+#define XFS_SCRUB_QUOTA_OP_ERR(label) \
+	XFS_SCRUB_FILE_OP_ERROR_GOTO(sc, XFS_DATA_FORK, id, tag, &error, label)
+/* Scrub the fields in an individual quota item. */
+STATIC void
+xfs_scrub_quota_item(
+	struct xfs_scrub_context	*sc,
+	const char			*tag,
+	uint				dqtype,
+	struct xfs_dquot		*dq,
+	xfs_dqid_t			id)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_disk_dquot		*d = &dq->q_core;
+	unsigned long long		bsoft;
+	unsigned long long		isoft;
+	unsigned long long		rsoft;
+	unsigned long long		bhard;
+	unsigned long long		ihard;
+	unsigned long long		rhard;
+	unsigned long long		bcount;
+	unsigned long long		icount;
+	unsigned long long		rcount;
+	xfs_ino_t			inodes;
+
+	/* Did we get the dquot we wanted? */
+	XFS_SCRUB_QUOTA_CHECK(id <= be32_to_cpu(d->d_id));
+	XFS_SCRUB_QUOTA_CHECK(dqtype ==
+			(d->d_flags & XFS_DQ_ALLTYPES));
+
+	/* Check the limits. */
+	bhard = be64_to_cpu(d->d_blk_hardlimit);
+	ihard = be64_to_cpu(d->d_ino_hardlimit);
+	rhard = be64_to_cpu(d->d_rtb_hardlimit);
+
+	bsoft = be64_to_cpu(d->d_blk_softlimit);
+	isoft = be64_to_cpu(d->d_ino_softlimit);
+	rsoft = be64_to_cpu(d->d_rtb_softlimit);
+
+	inodes = XFS_AGINO_TO_INO(mp, mp->m_sb.sb_agcount, 0);
+
+	/*
+	 * Warn if the limits are larger than the fs.  Administrators
+	 * can do this, though in production this seems suspect.
+	 */
+	XFS_SCRUB_QUOTA_WARN(bhard <= mp->m_sb.sb_dblocks);
+	XFS_SCRUB_QUOTA_WARN(ihard <= inodes);
+	XFS_SCRUB_QUOTA_WARN(rhard <= mp->m_sb.sb_rblocks);
+
+	XFS_SCRUB_QUOTA_WARN(bsoft <= mp->m_sb.sb_dblocks);
+	XFS_SCRUB_QUOTA_WARN(isoft <= inodes);
+	XFS_SCRUB_QUOTA_WARN(rsoft <= mp->m_sb.sb_rblocks);
+
+	/* Soft limit must not exceed the hard limit. */
+	XFS_SCRUB_QUOTA_CHECK(bsoft <= bhard);
+	XFS_SCRUB_QUOTA_CHECK(isoft <= ihard);
+	XFS_SCRUB_QUOTA_CHECK(rsoft <= rhard);
+
+	/* Check the resource counts. */
+	bcount = be64_to_cpu(d->d_bcount);
+	icount = be64_to_cpu(d->d_icount);
+	rcount = be64_to_cpu(d->d_rtbcount);
+	inodes = percpu_counter_sum(&mp->m_icount);
+
+	/*
+	 * Check that usage doesn't exceed physical limits.  However, on
+	 * a reflink filesystem we're allowed to exceed physical space
+	 * if there are no quota limits.
+	 */
+	if (xfs_sb_version_hasreflink(&mp->m_sb))
+		XFS_SCRUB_QUOTA_WARN(bcount <= mp->m_sb.sb_dblocks);
+	else
+		XFS_SCRUB_QUOTA_CHECK(bcount <= mp->m_sb.sb_dblocks);
+	XFS_SCRUB_QUOTA_CHECK(icount <= inodes);
+	XFS_SCRUB_QUOTA_CHECK(rcount <= mp->m_sb.sb_rblocks);
+
+	/*
+	 * We can violate the hard limits if the admin suddenly sets a
+	 * lower limit than the actual usage.  However, we flag it for
+	 * admin review.
+	 */
+	XFS_SCRUB_QUOTA_WARN(id == 0 || bhard == 0 || bcount <= bhard);
+	XFS_SCRUB_QUOTA_WARN(id == 0 || ihard == 0 || icount <= ihard);
+	XFS_SCRUB_QUOTA_WARN(id == 0 || rhard == 0 || rcount <= rhard);
+}
+
+/* Scrub all of a quota type's items. */
+int
+xfs_scrub_quota(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_bmbt_irec		irec = { 0 };
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_inode		*ip;
+	const char			*tag = NULL;
+	struct xfs_quotainfo		*qi = mp->m_quotainfo;
+	struct xfs_dquot		*dq;
+	xfs_fileoff_t			max_dqid_off;
+	xfs_fileoff_t			off = 0;
+	xfs_dqid_t			id = 0;
+	uint				dqtype;
+	int				nimaps;
+	int				error;
+
+	if (!XFS_IS_QUOTA_RUNNING(mp) || !XFS_IS_QUOTA_ON(mp))
+		return -ENOENT;
+
+	dqtype = xfs_scrub_quota_to_dqtype(sc);
+	switch (dqtype) {
+	case XFS_DQ_USER:
+		tag = "usrquota";
+		break;
+	case XFS_DQ_GROUP:
+		tag = "grpquota";
+		break;
+	case XFS_DQ_PROJ:
+		tag = "prjquota";
+		break;
+	default:
+		ASSERT(0);
+	}
+
+	mutex_lock(&qi->qi_quotaofflock);
+	if (!xfs_this_quota_on(sc->mp, dqtype)) {
+		error = -ENOENT;
+		goto out;
+	}
+
+	/* Attach to the quota inode and set sc->ip so that reporting works. */
+	ip = xfs_quota_inode(sc->mp, dqtype);
+	sc->ip = ip;
+
+	/* Look for problem extents. */
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	max_dqid_off = ((xfs_dqid_t)-1) / qi->qi_dqperchunk;
+	while (1) {
+		if (xfs_scrub_should_terminate(&error))
+			break;
+
+		off = irec.br_startoff + irec.br_blockcount;
+		nimaps = 1;
+		error = xfs_bmapi_read(ip, off, -1, &irec, &nimaps,
+				XFS_BMAPI_ENTIRE);
+		XFS_SCRUB_QUOTA_OP_ERR(out_unlock);
+		if (!nimaps)
+			break;
+		if (irec.br_startblock == HOLESTARTBLOCK)
+			continue;
+
+		/*
+		 * Delalloc extents or blocks mapped above the highest
+		 * quota id shouldn't happen.
+		 */
+		XFS_SCRUB_QUOTA_GOTO(!isnullstartblock(irec.br_startblock),
+				next_extent);
+		XFS_SCRUB_QUOTA_GOTO(irec.br_startoff <= max_dqid_off,
+				next_extent);
+		XFS_SCRUB_QUOTA_GOTO(irec.br_startoff + irec.br_blockcount <=
+				max_dqid_off + 1, next_extent);
+next_extent:;
+	}
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	/* Check all the quota items. */
+	while (id < ((xfs_dqid_t)-1ULL)) {
+		if (xfs_scrub_should_terminate(&error))
+			break;
+
+		error = xfs_qm_dqget(mp, NULL, id, dqtype, XFS_QMOPT_DQNEXT,
+				&dq);
+		if (error == -ENOENT)
+			break;
+		XFS_SCRUB_QUOTA_OP_ERR(out);
+
+		xfs_scrub_quota_item(sc, tag, dqtype, dq, id);
+
+		id = be32_to_cpu(dq->q_core.d_id) + 1;
+		xfs_qm_dqput(dq);
+	}
+	goto out;
+
+out_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+out:
+	sc->ip = NULL;
+	mutex_unlock(&qi->qi_quotaofflock);
+	return error;
+}
+#undef XFS_SCRUB_QUOTA_OP_ERR
+#undef XFS_SCRUB_QUOTA_GOTO
+#undef XFS_SCRUB_QUOTA_WARN
+#undef XFS_SCRUB_QUOTA_CHECK
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 1be7b00..9f71cb9 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3332,7 +3332,10 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
 	{ XFS_SCRUB_TYPE_XATTR,		"xattr" }, \
 	{ XFS_SCRUB_TYPE_SYMLINK,	"symlink" }, \
 	{ XFS_SCRUB_TYPE_RTBITMAP,	"rtbitmap" }, \
-	{ XFS_SCRUB_TYPE_RTSUM,		"rtsummary" }
+	{ XFS_SCRUB_TYPE_RTSUM,		"rtsummary" }, \
+	{ XFS_SCRUB_TYPE_UQUOTA,	"usrquota" }, \
+	{ XFS_SCRUB_TYPE_GQUOTA,	"grpquota" }, \
+	{ XFS_SCRUB_TYPE_PQUOTA,	"prjquota" }
 DECLARE_EVENT_CLASS(xfs_scrub_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
 		 int error),
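
The per-dquot checks in xfs_scrub_quota_item() apply the same rules to
blocks, inodes and realtime blocks.  As a compact restatement, here is a
sketch of those rules for a single resource, with the CHECK/WARN split
modelled as result flags.  struct limits, check_resource() and the Q_*
flags are hypothetical names for this illustration; the two separate
ceilings reflect that the patch compares limits and usage against
different quantities (for inodes, the highest possible inode number
versus the allocated inode count).

```c
#include <stdbool.h>
#include <stdint.h>

enum { Q_OK = 0, Q_WARN = 1 << 0, Q_CORRUPT = 1 << 1 };

/* One resource (blocks, inodes or rt blocks) of one dquot. */
struct limits {
	uint64_t	soft;
	uint64_t	hard;
	uint64_t	count;	/* current usage */
};

/*
 * @limit_max:	largest limit that makes physical sense on this fs
 * @usage_max:	largest usage that makes physical sense on this fs
 * @strict:	usage above @usage_max is corruption rather than a
 *		warning (the patch relaxes this for blocks on reflink)
 * @id:		quota id; id 0 is exempt from the over-limit warning
 */
unsigned int check_resource(const struct limits *l, uint64_t limit_max,
			    uint64_t usage_max, bool strict, uint32_t id)
{
	unsigned int flags = Q_OK;

	/* Limits beyond the fs size are legal to set, but suspect. */
	if (l->hard > limit_max || l->soft > limit_max)
		flags |= Q_WARN;

	/* The soft limit must not exceed the hard limit. */
	if (l->soft > l->hard)
		flags |= Q_CORRUPT;

	/* Usage should fit within what physically exists. */
	if (l->count > usage_max)
		flags |= strict ? Q_CORRUPT : Q_WARN;

	/* Over the hard limit: possible if the admin lowered it later. */
	if (id != 0 && l->hard != 0 && l->count > l->hard)
		flags |= Q_WARN;

	return flags;
}
```

For example, a dquot with soft=30 and hard=20 is flagged Q_CORRUPT,
while one whose usage of 25 exceeds a hard limit of 20 only draws
Q_WARN, matching the "flag it for admin review" comment in the patch.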



* Re: [PATCH 01/22] xfs: query the per-AG reservation counters
  2017-07-21  4:38 ` [PATCH 01/22] xfs: query the per-AG reservation counters Darrick J. Wong
@ 2017-07-23 16:16   ` Allison Henderson
  2017-07-23 22:25   ` Dave Chinner
  1 sibling, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 16:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Looks good to me.  This one hasn't changed much since v7.  You can add 
my review:
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

On 7/20/2017 9:38 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Establish an ioctl for userspace to query the original and current
> per-AG reservation counts.  This will be used by xfs_scrub to
> check that the vfs counters are at least somewhat sane.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_fs.h |   10 ++++++++++
>  fs/xfs/xfs_fsops.c     |   29 +++++++++++++++++++++++++++++
>  fs/xfs/xfs_fsops.h     |    2 ++
>  fs/xfs/xfs_ioctl.c     |   16 ++++++++++++++++
>  fs/xfs/xfs_ioctl32.c   |    1 +
>  5 files changed, 58 insertions(+)
>
>
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 8c61f21..5dedab9 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -469,6 +469,15 @@ typedef struct xfs_swapext
>  #define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH		0x2	/* don't flush log nor data */
>
>  /*
> + * AG reserved block counters
> + */
> +struct xfs_fsop_ag_resblks {
> +	__u64 resblks;		/* blocks reserved now */
> +	__u64 resblks_orig;	/* blocks reserved at mount time */
> +	__u64 reserved[2];
> +};
> +
> +/*
>   * ioctl limits
>   */
>  #ifdef XATTR_LIST_MAX
> @@ -543,6 +552,7 @@ typedef struct xfs_swapext
>  #define XFS_IOC_ATTRMULTI_BY_HANDLE  _IOW ('X', 123, struct xfs_fsop_attrmulti_handlereq)
>  #define XFS_IOC_FSGEOMETRY	     _IOR ('X', 124, struct xfs_fsop_geom)
>  #define XFS_IOC_GOINGDOWN	     _IOR ('X', 125, uint32_t)
> +#define XFS_IOC_GET_AG_RESBLKS	     _IOR ('X', 126, struct xfs_fsop_ag_resblks)
>  /*	XFS_IOC_GETFSUUID ---------- deprecated 140	 */
>
>
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 8f22fc5..0920d59 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -44,6 +44,7 @@
>  #include "xfs_filestream.h"
>  #include "xfs_rmap.h"
>  #include "xfs_ag_resv.h"
> +#include "xfs_fs.h"
>
>  /*
>   * File system operations
> @@ -1046,3 +1047,31 @@ xfs_fs_unreserve_ag_blocks(
>
>  	return error;
>  }
> +
> +/* Query the per-AG reservations to see how many blocks we have reserved. */
> +int
> +xfs_fs_get_ag_reserve_blocks(
> +	struct xfs_mount		*mp,
> +	struct xfs_fsop_ag_resblks	*out)
> +{
> +	struct xfs_ag_resv		*r;
> +	struct xfs_perag		*pag;
> +	xfs_agnumber_t			agno;
> +
> +	out->resblks = 0;
> +	out->resblks_orig = 0;
> +	out->reserved[0] = out->reserved[1] = 0;
> +
> +	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> +		pag = xfs_perag_get(mp, agno);
> +		r = xfs_perag_resv(pag, XFS_AG_RESV_METADATA);
> +		out->resblks += r->ar_reserved;
> +		out->resblks_orig += r->ar_asked;
> +		r = xfs_perag_resv(pag, XFS_AG_RESV_AGFL);
> +		out->resblks += r->ar_reserved;
> +		out->resblks_orig += r->ar_asked;
> +		xfs_perag_put(pag);
> +	}
> +
> +	return 0;
> +}
> diff --git a/fs/xfs/xfs_fsops.h b/fs/xfs/xfs_fsops.h
> index 2954c13..c8f5e26 100644
> --- a/fs/xfs/xfs_fsops.h
> +++ b/fs/xfs/xfs_fsops.h
> @@ -25,6 +25,8 @@ extern int xfs_fs_counts(xfs_mount_t *mp, xfs_fsop_counts_t *cnt);
>  extern int xfs_reserve_blocks(xfs_mount_t *mp, uint64_t *inval,
>  				xfs_fsop_resblks_t *outval);
>  extern int xfs_fs_goingdown(xfs_mount_t *mp, uint32_t inflags);
> +extern int xfs_fs_get_ag_reserve_blocks(struct xfs_mount *mp,
> +		struct xfs_fsop_ag_resblks *out);
>
>  extern int xfs_fs_reserve_ag_blocks(struct xfs_mount *mp);
>  extern int xfs_fs_unreserve_ag_blocks(struct xfs_mount *mp);
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 9c0c7a9..cc00260 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1974,6 +1974,22 @@ xfs_file_ioctl(
>  		return 0;
>  	}
>
> +	case XFS_IOC_GET_AG_RESBLKS: {
> +		struct xfs_fsop_ag_resblks	out;
> +
> +		if (!capable(CAP_SYS_ADMIN))
> +			return -EPERM;
> +
> +		error = xfs_fs_get_ag_reserve_blocks(mp, &out);
> +		if (error)
> +			return error;
> +
> +		if (copy_to_user(arg, &out, sizeof(out)))
> +			return -EFAULT;
> +
> +		return 0;
> +	}
> +
>  	case XFS_IOC_FSGROWFSDATA: {
>  		xfs_growfs_data_t in;
>
> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> index fa0bc4d..e8b4de3 100644
> --- a/fs/xfs/xfs_ioctl32.c
> +++ b/fs/xfs/xfs_ioctl32.c
> @@ -556,6 +556,7 @@ xfs_file_compat_ioctl(
>  	case XFS_IOC_ERROR_INJECTION:
>  	case XFS_IOC_ERROR_CLEARALL:
>  	case FS_IOC_GETFSMAP:
> +	case XFS_IOC_GET_AG_RESBLKS:
>  		return xfs_file_ioctl(filp, cmd, p);
>  #ifndef BROKEN_X86_ALIGNMENT
>  	/* These are handled fine if no alignment issues */
>
>


* Re: [PATCH 02/22] xfs: add scrub tracepoints
  2017-07-21  4:38 ` [PATCH 02/22] xfs: add scrub tracepoints Darrick J. Wong
@ 2017-07-23 16:23   ` Allison Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 16:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Looks good.  When I had sets with multiple versions, I would add a quick 
version history in the commit, just to help reviewers find the updates. 
Up to you though.  You can mark down my review.

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

On 7/20/2017 9:38 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_types.h |    5 +
>  fs/xfs/xfs_trace.h        |  375 +++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 380 insertions(+)
>
>
> diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
> index 0220159..3ee2dba 100644
> --- a/fs/xfs/libxfs/xfs_types.h
> +++ b/fs/xfs/libxfs/xfs_types.h
> @@ -94,6 +94,11 @@ typedef int64_t		xfs_sfiloff_t;	/* signed block number in a file */
>  #define	XFS_ATTR_FORK	1
>  #define	XFS_COW_FORK	2
>
> +#define XFS_FORK_DESC \
> +	{ XFS_DATA_FORK,	"data" }, \
> +	{ XFS_ATTR_FORK,	"attr" }, \
> +	{ XFS_COW_FORK,		"CoW" }
> +
>  /*
>   * Min numbers of data/attr fork btree root pointers.
>   */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index bcc3cdf..2e7e193 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -42,6 +42,7 @@ struct xfs_btree_cur;
>  struct xfs_refcount_irec;
>  struct xfs_fsmap;
>  struct xfs_rmap_irec;
> +struct xfs_scrub_metadata;
>
>  DECLARE_EVENT_CLASS(xfs_attr_list_class,
>  	TP_PROTO(struct xfs_attr_list_context *ctx),
> @@ -3309,6 +3310,380 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_low_key);
>  DEFINE_GETFSMAP_EVENT(xfs_getfsmap_high_key);
>  DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
>
> +/* scrub */
> +#define XFS_SCRUB_TYPE_DESC \
> +	{ 0, NULL }
> +DECLARE_EVENT_CLASS(xfs_scrub_class,
> +	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
> +		 int error),
> +	TP_ARGS(ip, sm, error),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_ino_t, ino)
> +		__field(int, type)
> +		__field(xfs_agnumber_t, agno)
> +		__field(xfs_ino_t, inum)
> +		__field(unsigned int, gen)
> +		__field(unsigned int, flags)
> +		__field(int, error)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = ip->i_mount->m_super->s_dev;
> +		__entry->ino = ip->i_ino;
> +		__entry->error = error;
> +	),
> +	TP_printk("dev %d:%d ino %llu type %s agno %u inum %llu gen %u flags 0x%x error %d",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->ino,
> +		  __print_symbolic(__entry->type, XFS_SCRUB_TYPE_DESC),
> +		  __entry->agno,
> +		  __entry->inum,
> +		  __entry->gen,
> +		  __entry->flags,
> +		  __entry->error)
> +)
> +#define DEFINE_SCRUB_EVENT(name) \
> +DEFINE_EVENT(xfs_scrub_class, name, \
> +	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm, \
> +		 int error), \
> +	TP_ARGS(ip, sm, error))
> +
> +DEFINE_SCRUB_EVENT(xfs_scrub);
> +DEFINE_SCRUB_EVENT(xfs_scrub_done);
> +DEFINE_SCRUB_EVENT(xfs_scrub_deadlock_retry);
> +
> +DECLARE_EVENT_CLASS(xfs_scrub_sbtree_class,
> +	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t bno,
> +		 xfs_btnum_t btnum, int level, int nlevels, int ptr),
> +	TP_ARGS(mp, agno, bno, btnum, level, nlevels, ptr),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_btnum_t, btnum)
> +		__field(xfs_agnumber_t, agno)
> +		__field(xfs_agblock_t, bno)
> +		__field(int, level)
> +		__field(int, nlevels)
> +		__field(int, ptr)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = mp->m_super->s_dev;
> +		__entry->agno = agno;
> +		__entry->btnum = btnum;
> +		__entry->bno = bno;
> +		__entry->level = level;
> +		__entry->nlevels = nlevels;
> +		__entry->ptr = ptr;
> +	),
> +	TP_printk("dev %d:%d agno %u agbno %u btnum %d level %d nlevels %d ptr %d",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->agno,
> +		  __entry->bno,
> +		  __entry->btnum,
> +		  __entry->level,
> +		  __entry->nlevels,
> +		  __entry->ptr)
> +)
> +#define DEFINE_SCRUB_SBTREE_EVENT(name) \
> +DEFINE_EVENT(xfs_scrub_sbtree_class, name, \
> +	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t bno, \
> +		 xfs_btnum_t btnum, int level, int nlevels, int ptr), \
> +	TP_ARGS(mp, agno, bno, btnum, level, nlevels, ptr))
> +
> +DEFINE_SCRUB_SBTREE_EVENT(xfs_scrub_btree_rec);
> +DEFINE_SCRUB_SBTREE_EVENT(xfs_scrub_btree_key);
> +
> +TRACE_EVENT(xfs_scrub_op_error,
> +	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t bno,
> +		 const char *type, int error, const char *func,
> +		 int line),
> +	TP_ARGS(mp, agno, bno, type, error, func, line),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_agnumber_t, agno)
> +		__field(xfs_agblock_t, bno)
> +		__string(type, type)
> +		__field(int, error)
> +		__string(func, func)
> +		__field(int, line)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = mp->m_super->s_dev;
> +		__entry->agno = agno;
> +		__entry->bno = bno;
> +		__assign_str(type, type);
> +		__entry->error = error;
> +		__assign_str(func, func);
> +		__entry->line = line;
> +	),
> +	TP_printk("dev %d:%d agno %u agbno %u type '%s' error %d fn %s:%d",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->agno,
> +		  __entry->bno,
> +		  __get_str(type),
> +		  __entry->error,
> +		  __get_str(func),
> +		  __entry->line)
> +);
> +
> +TRACE_EVENT(xfs_scrub_file_op_error,
> +	TP_PROTO(struct xfs_inode *ip, int whichfork, xfs_fileoff_t offset,
> +		 const char *type, int error, const char *func,
> +		 int line),
> +	TP_ARGS(ip, whichfork, offset, type, error, func, line),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_ino_t, ino)
> +		__field(int, whichfork)
> +		__field(xfs_fileoff_t, offset)
> +		__string(type, type)
> +		__field(int, error)
> +		__string(func, func)
> +		__field(int, line)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = ip->i_mount->m_super->s_dev;
> +		__entry->ino = ip->i_ino;
> +		__entry->whichfork = whichfork;
> +		__entry->offset = offset;
> +		__assign_str(type, type);
> +		__entry->error = error;
> +		__assign_str(func, func);
> +		__entry->line = line;
> +	),
> +	TP_printk("dev %d:%d ino %llu %s offset %llu type '%s' error %d fn %s:%d",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->ino,
> +		  __print_symbolic(__entry->whichfork, XFS_FORK_DESC),
> +		  __entry->offset,
> +		  __get_str(type),
> +		  __entry->error,
> +		  __get_str(func),
> +		  __entry->line)
> +);
> +
> +DECLARE_EVENT_CLASS(xfs_scrub_block_error_class,
> +	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t bno,
> +		 const char *type, const char *check, const char *func,
> +		 int line),
> +	TP_ARGS(mp, agno, bno, type, check, func, line),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_agnumber_t, agno)
> +		__field(xfs_agblock_t, bno)
> +		__string(type, type)
> +		__string(check, check)
> +		__string(func, func)
> +		__field(int, line)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = mp->m_super->s_dev;
> +		__entry->agno = agno;
> +		__entry->bno = bno;
> +		__assign_str(type, type);
> +		__assign_str(check, check);
> +		__assign_str(func, func);
> +		__entry->line = line;
> +	),
> +	TP_printk("dev %d:%d agno %u agbno %u type '%s' check '%s' fn %s:%d",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->agno,
> +		  __entry->bno,
> +		  __get_str(type),
> +		  __get_str(check),
> +		  __get_str(func),
> +		  __entry->line)
> +)
> +
> +#define DEFINE_SCRUB_BLOCK_ERROR_EVENT(name) \
> +DEFINE_EVENT(xfs_scrub_block_error_class, name, \
> +	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t bno, \
> +		 const char *type, const char *check, const char *func, \
> +		 int line), \
> +	TP_ARGS(mp, agno, bno, type, check, func, line))
> +
> +DEFINE_SCRUB_BLOCK_ERROR_EVENT(xfs_scrub_block_error);
> +DEFINE_SCRUB_BLOCK_ERROR_EVENT(xfs_scrub_block_preen);
> +
> +DECLARE_EVENT_CLASS(xfs_scrub_ino_error_class,
> +	TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino, xfs_agnumber_t agno, xfs_agblock_t bno,
> +		 const char *type, const char *check, const char *func,
> +		 int line),
> +	TP_ARGS(mp, ino, agno, bno, type, check, func, line),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_ino_t, ino)
> +		__field(xfs_agnumber_t, agno)
> +		__field(xfs_agblock_t, bno)
> +		__string(type, type)
> +		__string(check, check)
> +		__string(func, func)
> +		__field(int, line)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = mp->m_super->s_dev;
> +		__entry->ino = ino;
> +		__entry->agno = agno;
> +		__entry->bno = bno;
> +		__assign_str(type, type);
> +		__assign_str(check, check);
> +		__assign_str(func, func);
> +		__entry->line = line;
> +	),
> +	TP_printk("dev %d:%d ino %llu agno %u agbno %u type '%s' check '%s' fn %s:%d",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->ino,
> +		  __entry->agno,
> +		  __entry->bno,
> +		  __get_str(type),
> +		  __get_str(check),
> +		  __get_str(func),
> +		  __entry->line)
> +)
> +
> +#define DEFINE_SCRUB_INO_ERROR_EVENT(name) \
> +DEFINE_EVENT(xfs_scrub_ino_error_class, name, \
> +	TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino, xfs_agnumber_t agno, xfs_agblock_t bno, \
> +		 const char *type, const char *check, const char *func, \
> +		 int line), \
> +	TP_ARGS(mp, ino, agno, bno, type, check, func, line))
> +
> +DEFINE_SCRUB_INO_ERROR_EVENT(xfs_scrub_ino_error);
> +DEFINE_SCRUB_INO_ERROR_EVENT(xfs_scrub_ino_preen);
> +
> +DECLARE_EVENT_CLASS(xfs_scrub_data_error_class,
> +	TP_PROTO(struct xfs_inode *ip, int whichfork, xfs_fileoff_t offset,
> +		 const char *type, const char *check, const char *func,
> +		 int line),
> +	TP_ARGS(ip, whichfork, offset, type, check, func, line),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_ino_t, ino)
> +		__field(int, whichfork)
> +		__field(xfs_fileoff_t, offset)
> +		__string(type, type)
> +		__string(check, check)
> +		__string(func, func)
> +		__field(int, line)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = ip->i_mount->m_super->s_dev;
> +		__entry->ino = ip->i_ino;
> +		__entry->whichfork = whichfork;
> +		__entry->offset = offset;
> +		__assign_str(type, type);
> +		__assign_str(check, check);
> +		__assign_str(func, func);
> +		__entry->line = line;
> +	),
> +	TP_printk("dev %d:%d ino %llu %s fork offset %llu type '%s' check '%s' fn %s:%d",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->ino,
> +		  __print_symbolic(__entry->whichfork, XFS_FORK_DESC),
> +		  __entry->offset,
> +		  __get_str(type),
> +		  __get_str(check),
> +		  __get_str(func),
> +		  __entry->line)
> +);
> +
> +#define DEFINE_SCRUB_DATA_ERROR_EVENT(name) \
> +DEFINE_EVENT(xfs_scrub_data_error_class, name, \
> +	TP_PROTO(struct xfs_inode *ip, int whichfork, xfs_fileoff_t offset, \
> +		 const char *type, const char *check, const char *func, \
> +		 int line), \
> +	TP_ARGS(ip, whichfork, offset, type, check, func, line))
> +
> +DEFINE_SCRUB_DATA_ERROR_EVENT(xfs_scrub_data_error);
> +DEFINE_SCRUB_DATA_ERROR_EVENT(xfs_scrub_data_warning);
> +
> +TRACE_EVENT(xfs_scrub_xref_error,
> +	TP_PROTO(struct xfs_mount *mp, const char *type, int error,
> +		 const char *func, int line),
> +	TP_ARGS(mp, type, error, func, line),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__string(type, type)
> +		__field(int, error)
> +		__string(func, func)
> +		__field(int, line)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = mp->m_super->s_dev;
> +		__assign_str(type, type);
> +		__entry->error = error;
> +		__assign_str(func, func);
> +		__entry->line = line;
> +	),
> +	TP_printk("dev %d:%d btree %s xref error %d fn %s:%d",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __get_str(type),
> +		  __entry->error,
> +		  __get_str(func),
> +		  __entry->line)
> +);
> +
> +TRACE_EVENT(xfs_scrub_btree_error,
> +	TP_PROTO(struct xfs_mount *mp, const char *bt_type, const char *bt_ptr,
> +		 xfs_agnumber_t agno, xfs_agblock_t bno, const char *check,
> +		 const char *func, int line),
> +	TP_ARGS(mp, bt_type, bt_ptr, agno, bno, check, func, line),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__string(bt_type, bt_type)
> +		__string(bt_ptr, bt_ptr)
> +		__field(xfs_agnumber_t, agno)
> +		__field(xfs_agblock_t, bno)
> +		__string(check, check)
> +		__string(func, func)
> +		__field(int, line)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = mp->m_super->s_dev;
> +		__assign_str(bt_type, bt_type);
> +		__assign_str(bt_ptr, bt_ptr);
> +		__entry->agno = agno;
> +		__entry->bno = bno;
> +		__assign_str(check, check);
> +		__assign_str(func, func);
> +		__entry->line = line;
> +	),
> +	TP_printk("dev %d:%d %s %s agno %u agbno %u check '%s' fn %s:%d",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __get_str(bt_type),
> +		  __get_str(bt_ptr),
> +		  __entry->agno,
> +		  __entry->bno,
> +		  __get_str(check),
> +		  __get_str(func),
> +		  __entry->line)
> +);
> +
> +TRACE_EVENT(xfs_scrub_incomplete,
> +	TP_PROTO(struct xfs_mount *mp, const char *type, const char *check,
> +		 const char *func, int line),
> +	TP_ARGS(mp, type, check, func, line),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__string(type, type)
> +		__string(check, check)
> +		__string(func, func)
> +		__field(int, line)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = mp->m_super->s_dev;
> +		__assign_str(type, type);
> +		__assign_str(check, check);
> +		__assign_str(func, func);
> +		__entry->line = line;
> +	),
> +	TP_printk("dev %d:%d %s check '%s' fn %s:%d",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __get_str(type),
> +		  __get_str(check),
> +		  __get_str(func),
> +		  __entry->line)
> +);
> +
>  #endif /* _TRACE_XFS_H */
>
>  #undef TRACE_INCLUDE_PATH
>
>


* Re: [PATCH 03/22] xfs: create an ioctl to scrub AG metadata
  2017-07-21  4:38 ` [PATCH 03/22] xfs: create an ioctl to scrub AG metadata Darrick J. Wong
@ 2017-07-23 16:37   ` Allison Henderson
  2017-07-23 23:45   ` Dave Chinner
  1 sibling, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 16:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

On 7/20/2017 9:38 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Create an ioctl that can be used to scrub internal filesystem metadata.
> The new ioctl takes the metadata type, an (optional) AG number, an
> (optional) inode number and generation, and a flags argument.  This will
> be used by the upcoming XFS online scrub tool.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Kconfig           |   17 +
>  fs/xfs/Makefile          |    7 +
>  fs/xfs/libxfs/xfs_fs.h   |   41 ++++
>  fs/xfs/scrub/common.c    |  533 ++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/common.h    |  179 +++++++++++++++
>  fs/xfs/scrub/xfs_scrub.h |   29 +++
>  fs/xfs/xfs_ioctl.c       |   28 ++
>  fs/xfs/xfs_ioctl32.c     |    1
>  fs/xfs/xfs_trace.h       |    7 +
>  9 files changed, 841 insertions(+), 1 deletion(-)
>  create mode 100644 fs/xfs/scrub/common.c
>  create mode 100644 fs/xfs/scrub/common.h
>  create mode 100644 fs/xfs/scrub/xfs_scrub.h
>
>
> diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
> index 1b98cfa..f42fcf1 100644
> --- a/fs/xfs/Kconfig
> +++ b/fs/xfs/Kconfig
> @@ -71,6 +71,23 @@ config XFS_RT
>
>  	  If unsure, say N.
>
> +config XFS_ONLINE_SCRUB
> +	bool "XFS online metadata check support"
> +	default n
> +	depends on XFS_FS
> +	help
> +	  If you say Y here you will be able to check metadata on a
> +	  mounted XFS filesystem.  This feature is intended to reduce
> +	  filesystem downtime by supplementing xfs_repair.  The key
> +	  advantage here is to look for problems proactively so that
> +	  they can be dealt with in a controlled manner.
> +
> +	  This feature is considered EXPERIMENTAL.  Use with caution!
> +
> +	  See the xfs_scrub man page in section 8 for additional information.
> +
> +	  If unsure, say N.
> +
>  config XFS_WARN
>  	bool "XFS Verbose Warnings"
>  	depends on XFS_FS && !XFS_DEBUG
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 5b959ee..c4fdaa2 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -136,3 +136,10 @@ xfs-$(CONFIG_XFS_POSIX_ACL)	+= xfs_acl.o
>  xfs-$(CONFIG_SYSCTL)		+= xfs_sysctl.o
>  xfs-$(CONFIG_COMPAT)		+= xfs_ioctl32.o
>  xfs-$(CONFIG_EXPORTFS_BLOCK_OPS)	+= xfs_pnfs.o
> +
> +# online scrub/repair
> +ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
> +xfs-y				+= $(addprefix scrub/, \
> +				   common.o \
> +				   )
> +endif
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 5dedab9..aeccc99 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -468,6 +468,46 @@ typedef struct xfs_swapext
>  #define XFS_FSOP_GOING_FLAGS_LOGFLUSH		0x1	/* flush log but not data */
>  #define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH		0x2	/* don't flush log nor data */
>
> +/* metadata scrubbing */
> +struct xfs_scrub_metadata {
> +	__u32 sm_type;		/* What to check? */
> +	__u32 sm_flags;		/* flags; see below. */
> +	__u64 sm_ino;		/* inode number. */
> +	__u32 sm_gen;		/* inode generation. */
> +	__u32 sm_agno;		/* ag number. */
> +	__u64 sm_reserved[5];	/* pad to 64 bytes */
> +};
> +
> +/*
> + * Metadata types and flags for scrub operation.
> + */
> +#define XFS_SCRUB_TYPE_TEST	0	/* dummy to test ioctl */
> +#define XFS_SCRUB_TYPE_MAX	0
> +
> +/* i: repair this metadata */
> +#define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> +/* o: metadata object needs repair */
> +#define XFS_SCRUB_FLAG_CORRUPT		(1 << 1)
> +/* o: metadata object could be optimized */
> +#define XFS_SCRUB_FLAG_PREEN		(1 << 2)
> +/* o: cross-referencing failed */
> +#define XFS_SCRUB_FLAG_XFAIL		(1 << 3)
> +/* o: metadata object disagrees with cross-referenced metadata */
> +#define XFS_SCRUB_FLAG_XCORRUPT		(1 << 4)
> +/* o: scan was not complete */
> +#define XFS_SCRUB_FLAG_INCOMPLETE	(1 << 5)
> +/* o: metadata object looked funny but isn't corrupt */
> +#define XFS_SCRUB_FLAG_WARNING		(1 << 6)
> +
> +#define XFS_SCRUB_FLAGS_IN	(XFS_SCRUB_FLAG_REPAIR)
> +#define XFS_SCRUB_FLAGS_OUT	(XFS_SCRUB_FLAG_CORRUPT | \
> +				 XFS_SCRUB_FLAG_PREEN | \
> +				 XFS_SCRUB_FLAG_XFAIL | \
> +				 XFS_SCRUB_FLAG_XCORRUPT | \
> +				 XFS_SCRUB_FLAG_INCOMPLETE | \
> +				 XFS_SCRUB_FLAG_WARNING)
> +#define XFS_SCRUB_FLAGS_ALL	(XFS_SCRUB_FLAGS_IN | XFS_SCRUB_FLAGS_OUT)
> +
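For anyone following along from userspace, here is a minimal sketch of
the request layout and flag masks defined above.  The struct and macro
names are local stand-ins (assumptions for illustration), not an
installed header; only the field order and bit values are taken from
the patch.  The two helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct xfs_scrub_metadata as laid out in the patch. */
struct scrub_metadata_sketch {
	uint32_t sm_type;		/* what to check */
	uint32_t sm_flags;		/* in/out flags */
	uint64_t sm_ino;		/* inode number */
	uint32_t sm_gen;		/* inode generation */
	uint32_t sm_agno;		/* AG number */
	uint64_t sm_reserved[5];	/* must be zero */
};

/* Bit values copied from the patch; the SKETCH_ names are illustrative. */
#define SKETCH_FLAG_REPAIR	(1u << 0)	/* input */
#define SKETCH_FLAG_CORRUPT	(1u << 1)	/* output */
#define SKETCH_FLAG_PREEN	(1u << 2)	/* output */
#define SKETCH_FLAG_XFAIL	(1u << 3)	/* output */
#define SKETCH_FLAG_XCORRUPT	(1u << 4)	/* output */
#define SKETCH_FLAG_INCOMPLETE	(1u << 5)	/* output */
#define SKETCH_FLAG_WARNING	(1u << 6)	/* output */

#define SKETCH_FLAGS_IN		(SKETCH_FLAG_REPAIR)
#define SKETCH_FLAGS_OUT	(SKETCH_FLAG_CORRUPT | SKETCH_FLAG_PREEN | \
				 SKETCH_FLAG_XFAIL | SKETCH_FLAG_XCORRUPT | \
				 SKETCH_FLAG_INCOMPLETE | SKETCH_FLAG_WARNING)
#define SKETCH_FLAGS_ALL	(SKETCH_FLAGS_IN | SKETCH_FLAGS_OUT)

/* The "pad to 64 bytes" comment in the patch can be checked at build time. */
static_assert(sizeof(struct scrub_metadata_sketch) == 64,
	      "scrub request should be exactly 64 bytes");

/* Hypothetical helper: zero the request so the reserved area is clean. */
static void scrub_request_init(struct scrub_metadata_sketch *sm, uint32_t type)
{
	memset(sm, 0, sizeof(*sm));
	sm->sm_type = type;
}

/* Hypothetical helper: same test as xfs_scrub_found_corruption(). */
static int scrub_result_is_corrupt(const struct scrub_metadata_sketch *sm)
{
	return (sm->sm_flags &
		(SKETCH_FLAG_CORRUPT | SKETCH_FLAG_XCORRUPT)) != 0;
}
```

A caller would presumably zero the request, fill in the type (and the
AG or inode fields where relevant), issue the ioctl, and then inspect
sm_flags for the output bits.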
>  /*
>   * AG reserved block counters
>   */
> @@ -520,6 +560,7 @@ struct xfs_fsop_ag_resblks {
>  #define XFS_IOC_ZERO_RANGE	_IOW ('X', 57, struct xfs_flock64)
>  #define XFS_IOC_FREE_EOFBLOCKS	_IOR ('X', 58, struct xfs_fs_eofblocks)
>  /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
> +#define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
>
>  /*
>   * ioctl commands that replace IRIX syssgi()'s
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> new file mode 100644
> index 0000000..6931793
> --- /dev/null
> +++ b/fs/xfs/scrub/common.c
> @@ -0,0 +1,533 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_sb.h"
> +#include "xfs_inode.h"
> +#include "xfs_alloc.h"
> +#include "xfs_alloc_btree.h"
> +#include "xfs_bmap.h"
> +#include "xfs_bmap_btree.h"
> +#include "xfs_ialloc.h"
> +#include "xfs_ialloc_btree.h"
> +#include "xfs_refcount.h"
> +#include "xfs_refcount_btree.h"
> +#include "xfs_rmap.h"
> +#include "xfs_rmap_btree.h"
> +#include "scrub/xfs_scrub.h"
> +#include "scrub/common.h"
> +
> +/*
> + * Online Scrub and Repair
> + *
> + * Traditionally, XFS (the kernel driver) did not know how to check or
> + * repair on-disk data structures.  That task was left to the xfs_check
> + * and xfs_repair tools, both of which require taking the filesystem
> + * offline for a thorough but time-consuming examination.  Online
> + * scrub & repair, on the other hand, enables us to check the metadata
> + * for obvious errors while carefully stepping around the filesystem's
> + * ongoing operations, locking rules, etc.
> + *
> + * Given that most XFS metadata consist of records stored in a btree,
> + * most of the checking functions iterate the btree blocks themselves
> + * looking for irregularities.  When a record block is encountered, each
> + * record can be checked for obviously bad values.  Record values can
> + * also be cross-referenced against other btrees to look for potential
> + * misunderstandings between pieces of metadata.
> + *
> + * It is expected that the checkers responsible for per-AG metadata
> + * structures will lock the AG headers (AGI, AGF, AGFL), iterate the
> + * metadata structure, and perform any relevant cross-referencing before
> + * unlocking the AG and returning the results to userspace.  These
> + * scrubbers must not keep an AG locked for too long to avoid tying up
> + * the block and inode allocators.
> + *
> + * Block maps and b-trees rooted in an inode present a special challenge
> + * because they can involve extents from any AG.  The general scrubber
> + * structure of lock -> check -> xref -> unlock still holds, but AG
> + * locking order rules /must/ be obeyed to avoid deadlocks.  The
> + * ordering rule, of course, is that we must lock in increasing AG
> + * order.  Helper functions are provided to track which AG headers we've
> + * already locked.  If we detect an imminent locking order violation, we
> + * can signal a potential deadlock, in which case the scrubber can jump
> + * out to the top level, lock all the AGs in order, and retry the scrub.
> + *
> + * For file data (directories, extended attributes, symlinks) scrub, we
> + * can simply lock the inode and walk the data.  For btree data
> + * (directories and attributes) we follow the same btree-scrubbing
> + * strategy outlined previously to check the records.
> + *
> + * We use a bit of trickery with transactions to avoid buffer deadlocks
> + * if there is a cycle in the metadata.  The basic problem is that
> + * travelling down a btree involves locking the current buffer at each
> + * tree level.  If a pointer should somehow point back to a buffer that
> + * we've already examined, we will deadlock due to the second buffer
> + * locking attempt.  Note however that grabbing a buffer in transaction
> + * context links the locked buffer to the transaction.  If we try to
> + * re-grab the buffer in the context of the same transaction, we avoid
> + * the second lock attempt and continue.  Between the verifier and the
> + * scrubber, something will notice that something is amiss and report
> + * the corruption.  Therefore, each scrubber will allocate an empty
> + * transaction, attach buffers to it, and cancel the transaction at the
> + * end of the scrub run.  Cancelling a non-dirty transaction simply
> + * unlocks the buffers.
> + *
> + * Scrub communicates two kinds of results to userspace.  The first is
> + * the error code (errno), which reports operational errors encountered
> + * while performing the scrub.  The second is the set of output flags
> + * (XFS_SCRUB_FLAGS_OUT) in sm_flags.  If the data structure itself is
> + * corrupt, the CORRUPT flag will be set.  If the metadata is correct
> + * but otherwise suboptimal, the PREEN flag will be set.  The other
> + * output flags are described in xfs_fs.h.
> + */
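The AG locking-order rule and the "jump out and retry" fallback in the
comment above can be modelled in userspace roughly as follows.  This is
only a sketch under assumptions: the tracker struct, the function names
and the fixed AG count are all hypothetical, and POSIX EDEADLK stands in
for the kernel's -EDEADLOCK.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_AG_COUNT	4	/* illustrative filesystem with 4 AGs */

/* Hypothetical tracker of which AG headers are currently held. */
struct ag_lock_tracker {
	bool	held[SKETCH_AG_COUNT];
	int	highest;	/* highest AG locked so far, or -1 */
};

static void ag_tracker_init(struct ag_lock_tracker *t)
{
	memset(t->held, 0, sizeof(t->held));
	t->highest = -1;
}

/* Lock one AG, refusing anything that would break increasing order. */
static int ag_lock(struct ag_lock_tracker *t, int agno)
{
	assert(agno >= 0 && agno < SKETCH_AG_COUNT);
	if (t->held[agno])
		return 0;		/* already ours, nothing to do */
	if (agno < t->highest)
		return -EDEADLK;	/* imminent ordering violation */
	t->held[agno] = true;
	t->highest = agno;
	return 0;
}

/* Worst-case preparation: take every AG in increasing order. */
static void ag_lock_all(struct ag_lock_tracker *t)
{
	for (int agno = 0; agno < SKETCH_AG_COUNT; agno++)
		t->held[agno] = true;
	t->highest = SKETCH_AG_COUNT - 1;
}

/*
 * Walk extents in whatever AG order the inode presents them.  On an
 * ordering violation, drop everything and retry once with all AGs held.
 */
static int scrub_extents(const int *agnos, int count, bool *retried)
{
	struct ag_lock_tracker	t;
	bool			try_harder = false;
	int			error;

	*retried = false;
retry:
	ag_tracker_init(&t);	/* models tearing down all held locks */
	if (try_harder)
		ag_lock_all(&t);
	error = 0;
	for (int i = 0; i < count && !error; i++)
		error = ag_lock(&t, agnos[i]);
	if (error == -EDEADLK && !try_harder) {
		try_harder = true;
		*retried = true;
		goto retry;
	}
	return error;
}
```

With this model, a walk over AGs {0, 1, 3} finishes on the first pass,
while {2, 0, 3} trips the ordering check at AG 0 and only succeeds on
the second pass with every AG held up front.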
> +
> +struct xfs_scrub_meta_fns {
> +	int		(*setup)(struct xfs_scrub_context *,
> +				 struct xfs_inode *);
> +	int		(*scrub)(struct xfs_scrub_context *);
> +	bool		(*has)(struct xfs_sb *);
> +};
> +
> +/* Check for operational errors. */
> +bool
> +xfs_scrub_op_ok(
> +	struct xfs_scrub_context	*sc,
> +	xfs_agnumber_t			agno,
> +	xfs_agblock_t			bno,
> +	const char			*type,
> +	int				*error,
> +	const char			*func,
> +	int				line)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +
> +	switch (*error) {
> +	case 0:
> +		return true;
> +	case -EDEADLOCK:
> +		/* Used to restart an op with deadlock avoidance. */
> +		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
> +		break;
> +	case -EFSBADCRC:
> +	case -EFSCORRUPTED:
> +		/* Note the badness but don't abort. */
> +		sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
> +		*error = 0;
> +		/* fall through */
> +	default:
> +		trace_xfs_scrub_op_error(mp, agno, bno, type, *error, func,
> +				line);
> +		break;
> +	}
> +	return false;
> +}
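To make the error classification in xfs_scrub_op_ok() easier to reason
about, here is a small userspace model of it.  It is a sketch, not the
kernel code: tracing is omitted, the function and macro names are
hypothetical, and it assumes the usual XFS mapping of EFSCORRUPTED to
EUCLEAN and EFSBADCRC to EBADMSG, with EDEADLK standing in for
EDEADLOCK.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FLAG_CORRUPT	(1u << 1)	/* same bit as in the patch */

/* Userspace stand-ins for the kernel's XFS-specific error names. */
#ifdef EUCLEAN
#define SKETCH_EFSCORRUPTED	EUCLEAN
#else
#define SKETCH_EFSCORRUPTED	117	/* Linux value, assumed */
#endif
#define SKETCH_EFSBADCRC	EBADMSG

/*
 * Returns true if the operation succeeded.  Corruption errors are
 * converted into the CORRUPT flag and the errno is cleared, so the
 * caller stops this check without failing the whole ioctl.
 */
static bool sketch_op_ok(uint32_t *sm_flags, int *error)
{
	assert(sm_flags && error);

	switch (*error) {
	case 0:
		return true;
	case -EDEADLK:
		/* Leave the error alone so the top level can retry. */
		break;
	case -SKETCH_EFSBADCRC:
	case -SKETCH_EFSCORRUPTED:
		/* Note the badness but don't abort. */
		*sm_flags |= SKETCH_FLAG_CORRUPT;
		*error = 0;
		break;
	default:
		/* Any other errno is a real operational failure. */
		break;
	}
	return false;
}
```

The point worth noticing is that a false return does not imply a nonzero
errno: for corruption the result travels back in sm_flags instead.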
> +
> +/* Check for operational errors for a file offset. */
> +bool
> +xfs_scrub_file_op_ok(
> +	struct xfs_scrub_context	*sc,
> +	int				whichfork,
> +	xfs_fileoff_t			offset,
> +	const char			*type,
> +	int				*error,
> +	const char			*func,
> +	int				line)
> +{
> +	switch (*error) {
> +	case 0:
> +		return true;
> +	case -EDEADLOCK:
> +		/* Used to restart an op with deadlock avoidance. */
> +		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
> +		break;
> +	case -EFSBADCRC:
> +	case -EFSCORRUPTED:
> +		/* Note the badness but don't abort. */
> +		sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
> +		*error = 0;
> +		/* fall through */
> +	default:
> +		trace_xfs_scrub_file_op_error(sc->ip, whichfork, offset, type,
> +				*error, func, line);
> +		break;
> +	}
> +	return false;
> +}
> +
> +/* Check for metadata block optimization possibilities. */
> +bool
> +xfs_scrub_block_preen(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_buf			*bp,
> +	const char			*type,
> +	bool				fs_ok,
> +	const char			*check,
> +	const char			*func,
> +	int				line)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	xfs_fsblock_t			fsbno;
> +	xfs_agnumber_t			agno;
> +	xfs_agblock_t			bno;
> +
> +	if (fs_ok)
> +		return fs_ok;
> +
> +	fsbno = XFS_DADDR_TO_FSB(mp, bp->b_bn);
> +	agno = XFS_FSB_TO_AGNO(mp, fsbno);
> +	bno = XFS_FSB_TO_AGBNO(mp, fsbno);
> +
> +	sc->sm->sm_flags |= XFS_SCRUB_FLAG_PREEN;
> +	trace_xfs_scrub_block_preen(mp, agno, bno, type, check, func, line);
> +	return fs_ok;
> +}
> +
> +/* Check for metadata block corruption. */
> +bool
> +xfs_scrub_block_ok(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_buf			*bp,
> +	const char			*type,
> +	bool				fs_ok,
> +	const char			*check,
> +	const char			*func,
> +	int				line)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	xfs_fsblock_t			fsbno;
> +	xfs_agnumber_t			agno;
> +	xfs_agblock_t			bno;
> +
> +	if (fs_ok)
> +		return fs_ok;
> +
> +	fsbno = XFS_DADDR_TO_FSB(mp, bp->b_bn);
> +	agno = XFS_FSB_TO_AGNO(mp, fsbno);
> +	bno = XFS_FSB_TO_AGBNO(mp, fsbno);
> +
> +	sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
> +	trace_xfs_scrub_block_error(mp, agno, bno, type, check, func, line);
> +	return fs_ok;
> +}
> +
> +/* Check for inode metadata corruption. */
> +bool
> +xfs_scrub_ino_ok(
> +	struct xfs_scrub_context	*sc,
> +	xfs_ino_t			ino,
> +	struct xfs_buf			*bp,
> +	const char			*type,
> +	bool				fs_ok,
> +	const char			*check,
> +	const char			*func,
> +	int				line)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	xfs_fsblock_t			fsbno;
> +	xfs_agnumber_t			agno;
> +	xfs_agblock_t			bno;
> +
> +	if (fs_ok)
> +		return fs_ok;
> +
> +	if (bp) {
> +		fsbno = XFS_DADDR_TO_FSB(mp, bp->b_bn);
> +		agno = XFS_FSB_TO_AGNO(mp, fsbno);
> +		bno = XFS_FSB_TO_AGBNO(mp, fsbno);
> +	} else {
> +		/* Use the caller's inode number; sc->ip may not be set. */
> +		agno = XFS_INO_TO_AGNO(mp, ino);
> +		bno = XFS_INO_TO_AGBNO(mp, ino);
> +	}
> +
> +	sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
> +	trace_xfs_scrub_ino_error(mp, ino, agno, bno, type, check, func, line);
> +	return fs_ok;
> +}
> +
> +/* Check for inode metadata optimization possibilities. */
> +bool
> +xfs_scrub_ino_preen(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_buf			*bp,
> +	const char			*type,
> +	bool				fs_ok,
> +	const char			*check,
> +	const char			*func,
> +	int				line)
> +{
> +	struct xfs_inode		*ip = sc->ip;
> +	struct xfs_mount		*mp = sc->mp;
> +	xfs_fsblock_t			fsbno;
> +	xfs_agnumber_t			agno;
> +	xfs_agblock_t			bno;
> +
> +	if (fs_ok)
> +		return fs_ok;
> +
> +	if (bp) {
> +		fsbno = XFS_DADDR_TO_FSB(mp, bp->b_bn);
> +		agno = XFS_FSB_TO_AGNO(mp, fsbno);
> +		bno = XFS_FSB_TO_AGBNO(mp, fsbno);
> +	} else {
> +		agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
> +		bno = XFS_INO_TO_AGBNO(mp, ip->i_ino);
> +	}
> +
> +	sc->sm->sm_flags |= XFS_SCRUB_FLAG_PREEN;
> +	trace_xfs_scrub_ino_preen(mp, ip->i_ino, agno, bno, type, check,
> +			func, line);
> +	return fs_ok;
> +}
> +
> +/* Check for file data block corruption. */
> +bool
> +xfs_scrub_data_ok(
> +	struct xfs_scrub_context	*sc,
> +	int				whichfork,
> +	xfs_fileoff_t			offset,
> +	const char			*type,
> +	bool				fs_ok,
> +	const char			*check,
> +	const char			*func,
> +	int				line)
> +{
> +	if (fs_ok)
> +		return fs_ok;
> +
> +	sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
> +	trace_xfs_scrub_data_error(sc->ip, whichfork, offset, type, check,
> +			func, line);
> +	return fs_ok;
> +}
> +
> +/* Check for file data block non-corruption problems. */
> +bool
> +xfs_scrub_data_warn_ok(
> +	struct xfs_scrub_context	*sc,
> +	int				whichfork,
> +	xfs_fileoff_t			offset,
> +	const char			*type,
> +	bool				fs_ok,
> +	const char			*check,
> +	const char			*func,
> +	int				line)
> +{
> +	if (fs_ok)
> +		return fs_ok;
> +
> +	sc->sm->sm_flags |= XFS_SCRUB_FLAG_WARNING;
> +	trace_xfs_scrub_data_warning(sc->ip, whichfork, offset, type, check,
> +			func, line);
> +	return fs_ok;
> +}
> +
> +/* Signal an incomplete scrub. */
> +bool
> +xfs_scrub_incomplete(
> +	struct xfs_scrub_context	*sc,
> +	const char			*type,
> +	bool				fs_ok,
> +	const char			*check,
> +	const char			*func,
> +	int				line)
> +{
> +	if (fs_ok)
> +		return fs_ok;
> +
> +	sc->sm->sm_flags |= XFS_SCRUB_FLAG_INCOMPLETE;
> +	trace_xfs_scrub_incomplete(sc->mp, type, check, func, line);
> +	return fs_ok;
> +}
> +
> +/* Dummy scrubber */
> +
> +int
> +xfs_scrub_dummy(
> +	struct xfs_scrub_context	*sc)
> +{
> +	if (sc->sm->sm_ino || sc->sm->sm_agno)
> +		return -EINVAL;
> +	if (sc->sm->sm_gen & XFS_SCRUB_FLAG_CORRUPT)
> +		sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
> +	if (sc->sm->sm_gen & XFS_SCRUB_FLAG_PREEN)
> +		sc->sm->sm_flags |= XFS_SCRUB_FLAG_PREEN;
> +	if (sc->sm->sm_gen & XFS_SCRUB_FLAG_XFAIL)
> +		sc->sm->sm_flags |= XFS_SCRUB_FLAG_XFAIL;
> +	if (sc->sm->sm_gen & XFS_SCRUB_FLAG_XCORRUPT)
> +		sc->sm->sm_flags |= XFS_SCRUB_FLAG_XCORRUPT;
> +	if (sc->sm->sm_gen & ~XFS_SCRUB_FLAGS_OUT)
> +		return -ENOENT;
> +
> +	return 0;
> +}
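As I read it, the dummy scrubber treats sm_gen as a test input that
selects which output flags to raise.  A userspace model of that
behaviour might look like the following; the struct is a trimmed-down
hypothetical stand-in and the SKETCH_ names are illustrative, with bit
values copied from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_FLAG_CORRUPT	(1u << 1)
#define SKETCH_FLAG_PREEN	(1u << 2)
#define SKETCH_FLAG_XFAIL	(1u << 3)
#define SKETCH_FLAG_XCORRUPT	(1u << 4)
#define SKETCH_FLAG_INCOMPLETE	(1u << 5)
#define SKETCH_FLAG_WARNING	(1u << 6)
#define SKETCH_FLAGS_OUT	(SKETCH_FLAG_CORRUPT | SKETCH_FLAG_PREEN | \
				 SKETCH_FLAG_XFAIL | SKETCH_FLAG_XCORRUPT | \
				 SKETCH_FLAG_INCOMPLETE | SKETCH_FLAG_WARNING)

/* Only the fields the dummy scrubber looks at. */
struct sketch_sm {
	uint32_t sm_flags;
	uint64_t sm_ino;
	uint32_t sm_gen;
	uint32_t sm_agno;
};

static int sketch_dummy_scrub(struct sketch_sm *sm)
{
	/* The four bits the patch copies from sm_gen into sm_flags. */
	const uint32_t echoed = SKETCH_FLAG_CORRUPT | SKETCH_FLAG_PREEN |
				SKETCH_FLAG_XFAIL | SKETCH_FLAG_XCORRUPT;

	assert(sm);
	if (sm->sm_ino || sm->sm_agno)
		return -EINVAL;
	sm->sm_flags |= sm->sm_gen & echoed;
	/* Bits outside the output mask are rejected after the echo. */
	if (sm->sm_gen & ~SKETCH_FLAGS_OUT)
		return -ENOENT;
	return 0;
}
```

One thing the model makes visible: INCOMPLETE and WARNING are accepted in
sm_gen but are not echoed back, so a test harness cannot currently
provoke those two outputs through the dummy type.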
> +
> +/* Per-scrubber setup functions */
> +
> +/* Set us up with a transaction and an empty context. */
> +int
> +xfs_scrub_setup_fs(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	return xfs_scrub_trans_alloc(sc->sm, sc->mp,
> +			&M_RES(sc->mp)->tr_itruncate, 0, 0, 0, &sc->tp);
> +}
> +
> +/* Scrub setup and teardown */
> +
> +/* Free all the resources and finish the transactions. */
> +STATIC int
> +xfs_scrub_teardown(
> +	struct xfs_scrub_context	*sc,
> +	int				error)
> +{
> +	if (sc->tp) {
> +		xfs_trans_cancel(sc->tp);
> +		sc->tp = NULL;
> +	}
> +	return error;
> +}
> +
> +/* Perform common scrub context initialization. */
> +STATIC int
> +xfs_scrub_setup(
> +	struct xfs_inode		*ip,
> +	struct xfs_scrub_context	*sc,
> +	const struct xfs_scrub_meta_fns	*fns,
> +	struct xfs_scrub_metadata	*sm,
> +	bool				try_harder)
> +{
> +	memset(sc, 0, sizeof(*sc));
> +	sc->mp = ip->i_mount;
> +	sc->sm = sm;
> +	sc->fns = fns;
> +	sc->try_harder = try_harder;
> +
> +	return sc->fns->setup(sc, ip);
> +}
> +
> +/* Scrubbing dispatch. */
> +
> +static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
> +	{ /* dummy verifier */
> +		.setup	= xfs_scrub_setup_fs,
> +		.scrub	= xfs_scrub_dummy,
> +	},
> +};
> +
> +/* Dispatch metadata scrubbing. */
> +int
> +xfs_scrub_metadata(
> +	struct xfs_inode		*ip,
> +	struct xfs_scrub_metadata	*sm)
> +{
> +	struct xfs_scrub_context	sc;
> +	struct xfs_mount		*mp = ip->i_mount;
> +	const struct xfs_scrub_meta_fns	*fns;
> +	bool				try_harder = false;
> +	int				error = 0;
> +
> +	trace_xfs_scrub(ip, sm, error);
> +
> +	/* Forbidden if we are shut down or mounted norecovery. */
> +	error = -ESHUTDOWN;
> +	if (XFS_FORCED_SHUTDOWN(mp))
> +		goto out;
> +	error = -ENOTRECOVERABLE;
> +	if (mp->m_flags & XFS_MOUNT_NORECOVERY)
> +		goto out;
> +
> +	/* Check our inputs. */
> +	error = -EINVAL;
> +	sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT;
> +	if (sm->sm_flags & ~XFS_SCRUB_FLAGS_IN)
> +		goto out;
> +	if (memchr_inv(sm->sm_reserved, 0, sizeof(sm->sm_reserved)))
> +		goto out;
> +
> +	/* Do we know about this type of metadata? */
> +	error = -ENOENT;
> +	if (sm->sm_type > XFS_SCRUB_TYPE_MAX)
> +		goto out;
> +	fns = &meta_scrub_fns[sm->sm_type];
> +	if (fns->scrub == NULL)
> +		goto out;
> +
> +	/* Does this fs even support this type of metadata? */
> +	if (fns->has && !fns->has(&mp->m_sb))
> +		goto out;
> +
> +	/* We don't know how to repair anything yet. */
> +	error = -EOPNOTSUPP;
> +	if (sm->sm_flags & XFS_SCRUB_FLAG_REPAIR)
> +		goto out;
> +
> +	/* This isn't a stable feature.  Use with care. */
> +	{
> +		static bool warned;
> +
> +		if (!warned)
> +			xfs_alert(mp,
> +	"EXPERIMENTAL online scrub feature in use. Use at your own risk!");
> +		warned = true;
> +	}
> +
> +retry_op:
> +	/* Set up for the operation. */
> +	error = xfs_scrub_setup(ip, &sc, fns, sm, try_harder);
> +	if (error)
> +		goto out_teardown;
> +
> +	/* Scrub for errors. */
> +	error = fns->scrub(&sc);
> +	if (!try_harder && error == -EDEADLOCK) {
> +		/*
> +		 * Scrubbers return -EDEADLOCK to mean 'try harder'.
> +		 * Tear down everything we hold, then set up again with
> +		 * preparation for worst-case scenarios.
> +		 */
> +		error = xfs_scrub_teardown(&sc, 0);
> +		if (error)
> +			goto out;
> +		try_harder = true;
> +		goto retry_op;
> +	} else if (error)
> +		goto out_teardown;
> +
> +	if (xfs_scrub_found_corruption(sm))
> +		xfs_alert_ratelimited(mp, "Corruption detected during scrub.");
> +
> +out_teardown:
> +	error = xfs_scrub_teardown(&sc, error);
> +out:
> +	trace_xfs_scrub_done(ip, sm, error);
> +	return error;
> +}
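The order of the input checks in xfs_scrub_metadata() determines which
errno userspace sees, so a compact model may help.  The sketch below
covers only the argument validation; the shutdown and norecovery tests,
the function-table lookup and the has() probe are left out.  The struct
and names are local stand-ins, and a plain loop replaces the
kernel-only memchr_inv().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct scrub_metadata_sketch {
	uint32_t sm_type;
	uint32_t sm_flags;
	uint64_t sm_ino;
	uint32_t sm_gen;
	uint32_t sm_agno;
	uint64_t sm_reserved[5];
};

#define SKETCH_TYPE_MAX		0		/* only the dummy type exists */
#define SKETCH_FLAG_REPAIR	(1u << 0)
#define SKETCH_FLAGS_IN		(SKETCH_FLAG_REPAIR)
#define SKETCH_FLAGS_OUT	(0x7eu)		/* bits 1..6, as in the patch */

static int sketch_check_inputs(struct scrub_metadata_sketch *sm)
{
	assert(sm);

	/* Stale output bits from a previous call are silently dropped. */
	sm->sm_flags &= ~SKETCH_FLAGS_OUT;
	if (sm->sm_flags & ~SKETCH_FLAGS_IN)
		return -EINVAL;

	/* The reserved area must be all zeroes. */
	for (size_t i = 0; i < 5; i++)
		if (sm->sm_reserved[i])
			return -EINVAL;

	/* Unknown metadata type. */
	if (sm->sm_type > SKETCH_TYPE_MAX)
		return -ENOENT;

	/* Repair is not implemented in this patch. */
	if (sm->sm_flags & SKETCH_FLAG_REPAIR)
		return -EOPNOTSUPP;

	return 0;
}
```

Because the output bits are masked off first, a caller can reuse the
same structure across calls without clearing sm_flags by hand, whereas
an unknown bit above bit 6 yields -EINVAL.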
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> new file mode 100644
> index 0000000..4f3113a
> --- /dev/null
> +++ b/fs/xfs/scrub/common.h
> @@ -0,0 +1,179 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#ifndef __XFS_REPAIR_COMMON_H__
> +#define __XFS_REPAIR_COMMON_H__
> +
> +/* Did we find something broken? */
> +static inline bool xfs_scrub_found_corruption(struct xfs_scrub_metadata *sm)
> +{
> +	return sm->sm_flags & (XFS_SCRUB_FLAG_CORRUPT |
> +			       XFS_SCRUB_FLAG_XCORRUPT);
> +}
> +
> +struct xfs_scrub_context {
> +	/* General scrub state. */
> +	struct xfs_mount		*mp;
> +	struct xfs_scrub_metadata	*sm;
> +	const struct xfs_scrub_meta_fns	*fns;
> +	struct xfs_trans		*tp;
> +	struct xfs_inode		*ip;
> +	bool				try_harder;
> +};
> +
> +/* Should we end the scrub early? */
> +static inline bool
> +xfs_scrub_should_terminate(
> +	int		*error)
> +{
> +	if (fatal_signal_pending(current)) {
> +		if (*error == 0)
> +			*error = -EAGAIN;
> +		return true;
> +	}
> +	return false;
> +}
> +
> +/*
> + * Grab a transaction.  If we're going to repair something, we need to
> + * ensure there's enough reservation to make all the changes.  If not,
> + * we can use an empty transaction.
> + */
> +static inline int
> +xfs_scrub_trans_alloc(
> +	struct xfs_scrub_metadata	*sm,
> +	struct xfs_mount		*mp,
> +	struct xfs_trans_res		*resp,
> +	uint				blocks,
> +	uint				rtextents,
> +	uint				flags,
> +	struct xfs_trans		**tpp)
> +{
> +	return xfs_trans_alloc_empty(mp, tpp);
> +}
> +
> +/* Check for operational errors. */
> +bool xfs_scrub_op_ok(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
> +		     xfs_agblock_t bno, const char *type, int *error,
> +		     const char	*func, int line);
> +#define XFS_SCRUB_OP_ERROR_GOTO(sc, agno, bno, type, error, label) \
> +	do { \
> +		if (!xfs_scrub_op_ok((sc), (agno), (bno), (type), \
> +				(error), __func__, __LINE__)) \
> +			goto label; \
> +	} while (0)
> +
> +/* Check for operational errors for a file offset. */
> +bool xfs_scrub_file_op_ok(struct xfs_scrub_context *sc, int whichfork,
> +			  xfs_fileoff_t offset, const char *type,
> +			  int *error, const char *func, int line);
> +#define XFS_SCRUB_FILE_OP_ERROR_GOTO(sc, which, off, type, error, label) \
> +	do { \
> +		if (!xfs_scrub_file_op_ok((sc), (which), (off), (type), \
> +				(error), __func__, __LINE__)) \
> +			goto label; \
> +	} while (0)
> +
> +/* Check for metadata block optimization possibilities. */
> +bool xfs_scrub_block_preen(struct xfs_scrub_context *sc, struct xfs_buf *bp,
> +			   const char *type, bool fs_ok, const char *check,
> +			   const char *func, int line);
> +#define XFS_SCRUB_PREEN(sc, bp, type, fs_ok) \
> +	xfs_scrub_block_preen((sc), (bp), (type), (fs_ok), #fs_ok, \
> +			__func__, __LINE__)
> +
> +/* Check for inode metadata optimization possibilities. */
> +bool xfs_scrub_ino_preen(struct xfs_scrub_context *sc, struct xfs_buf *bp,
> +		      const char *type, bool fs_ok, const char *check,
> +		      const char *func, int line);
> +#define XFS_SCRUB_INO_PREEN(sc, bp, type, fs_ok) \
> +	xfs_scrub_ino_preen((sc), (bp), (type), (fs_ok), #fs_ok, \
> +			__func__, __LINE__)
> +
> +/* Check for metadata block corruption. */
> +bool xfs_scrub_block_ok(struct xfs_scrub_context *sc, struct xfs_buf *bp,
> +			const char *type, bool fs_ok, const char *check,
> +			const char *func, int line);
> +#define XFS_SCRUB_CHECK(sc, bp, type, fs_ok) \
> +	xfs_scrub_block_ok((sc), (bp), (type), (fs_ok), #fs_ok, \
> +			__func__, __LINE__)
> +#define XFS_SCRUB_GOTO(sc, bp, type, fs_ok, label) \
> +	do { \
> +		if (!xfs_scrub_block_ok((sc), (bp), (type), (fs_ok), \
> +				#fs_ok, __func__, __LINE__)) \
> +			goto label; \
> +	} while (0)
> +
> +/* Check for inode metadata corruption. */
> +bool xfs_scrub_ino_ok(struct xfs_scrub_context *sc, xfs_ino_t ino,
> +		      struct xfs_buf *bp, const char *type, bool fs_ok,
> +		      const char *check, const char *func, int line);
> +#define XFS_SCRUB_INO_CHECK(sc, ino, bp, type, fs_ok) \
> +	xfs_scrub_ino_ok((sc), (ino), (bp), (type), (fs_ok), #fs_ok, \
> +			__func__, __LINE__)
> +#define XFS_SCRUB_INO_GOTO(sc, ino, bp, type, fs_ok, label) \
> +	do { \
> +		if (!xfs_scrub_ino_ok((sc), (ino), (bp), (type), (fs_ok), \
> +				#fs_ok, __func__, __LINE__)) \
> +			goto label; \
> +	} while (0)
> +
> +/* Check for file data block corruption. */
> +bool xfs_scrub_data_ok(struct xfs_scrub_context *sc, int whichfork,
> +		       xfs_fileoff_t offset, const char *type, bool fs_ok,
> +		       const char *check, const char *func, int line);
> +#define XFS_SCRUB_DATA_CHECK(sc, whichfork, offset, type, fs_ok) \
> +	xfs_scrub_data_ok((sc), (whichfork), (offset), (type), (fs_ok), \
> +			#fs_ok, __func__, __LINE__)
> +#define XFS_SCRUB_DATA_GOTO(sc, whichfork, offset, type, fs_ok, label) \
> +	do { \
> +		if (!xfs_scrub_data_ok((sc), (whichfork), (offset), \
> +				(type), (fs_ok), #fs_ok, __func__, __LINE__)) \
> +			goto label; \
> +	} while (0)
> +
> +/* Check for file data block non-corruption problems. */
> +bool xfs_scrub_data_warn_ok(struct xfs_scrub_context *sc, int whichfork,
> +			    xfs_fileoff_t offset, const char *type, bool fs_ok,
> +			    const char *check, const char *func, int line);
> +#define XFS_SCRUB_DATA_WARN(sc, whichfork, offset, type, fs_ok) \
> +	xfs_scrub_data_warn_ok((sc), (whichfork), (offset), (type), (fs_ok), \
> +			#fs_ok, __func__, __LINE__)
> +
> +/* Signal an incomplete scrub. */
> +bool xfs_scrub_incomplete(struct xfs_scrub_context *sc, const char *type,
> +			  bool fs_ok, const char *check, const char *func,
> +			  int line);
> +#define XFS_SCRUB_INCOMPLETE(sc, type, fs_ok) \
> +	xfs_scrub_incomplete((sc), (type), (fs_ok), \
> +			#fs_ok, __func__, __LINE__)
> +
> +/* Setup functions */
> +
> +#define SETUP_FN(name) int name(struct xfs_scrub_context *sc, struct xfs_inode *ip)
> +SETUP_FN(xfs_scrub_setup_fs);
> +#undef SETUP_FN
> +
> +/* Metadata scrubbers */
> +
> +#define SCRUB_FN(name) int name(struct xfs_scrub_context *sc)
> +SCRUB_FN(xfs_scrub_dummy);
> +#undef SCRUB_FN
> +
> +#endif	/* __XFS_REPAIR_COMMON_H__ */
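
The CHECK/GOTO macros in this header all follow one pattern: the
macro stringifies the condition with #fs_ok and forwards __func__ and
__LINE__ to a helper that records the failure and returns the
condition.  A minimal userspace sketch of that pattern is below.  The
demo_* names, the flag bit and the magic value are assumptions for
illustration only, and the helper prints to stderr where the patch
uses tracepoints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define DEMO_FLAG_CORRUPT	(1u << 0)	/* hypothetical flag bit */
#define DEMO_MAGIC		0x1234u		/* hypothetical magic number */

/* Hypothetical stand-in for the scrub context. */
struct demo_scrub {
	unsigned int	flags;
};

/* Report a failed check; returns the condition so callers can branch. */
static bool demo_check_ok(struct demo_scrub *sc, const char *type, bool ok,
			  const char *check, const char *func, int line)
{
	if (ok)
		return true;
	sc->flags |= DEMO_FLAG_CORRUPT;
	fprintf(stderr, "%s: check failed: %s (%s:%d)\n",
		type, check, func, line);
	return false;
}

/* Record the failure but keep going. */
#define DEMO_CHECK(sc, type, ok) \
	demo_check_ok((sc), (type), (ok), #ok, __func__, __LINE__)

/* Record the failure and bail out to a label. */
#define DEMO_GOTO(sc, type, ok, label) \
	do { \
		if (!demo_check_ok((sc), (type), (ok), #ok, \
				__func__, __LINE__)) \
			goto label; \
	} while (0)

/* Example caller: a bad magic is fatal, a large nrecs is only noted. */
static int demo_check_header(struct demo_scrub *sc, unsigned int magic,
			     unsigned int nrecs)
{
	DEMO_GOTO(sc, "header", magic == DEMO_MAGIC, corrupt);
	DEMO_CHECK(sc, "header", nrecs <= 16);
	return 0;
corrupt:
	return -1;
}
```

The do { } while (0) wrapper keeps the GOTO form safe to use as a
single statement inside an unbraced if/else.
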
> diff --git a/fs/xfs/scrub/xfs_scrub.h b/fs/xfs/scrub/xfs_scrub.h
> new file mode 100644
> index 0000000..e00e0ea
> --- /dev/null
> +++ b/fs/xfs/scrub/xfs_scrub.h
> @@ -0,0 +1,29 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#ifndef __XFS_SCRUB_H__
> +#define __XFS_SCRUB_H__
> +
> +#ifndef CONFIG_XFS_ONLINE_SCRUB
> +# define xfs_scrub_metadata(ip, sm)	(-ENOTTY)
> +#else
> +int xfs_scrub_metadata(struct xfs_inode *ip, struct xfs_scrub_metadata *sm);
> +#endif /* CONFIG_XFS_ONLINE_SCRUB */
> +
> +#endif	/* __XFS_SCRUB_H__ */
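
The #ifndef stub above lets callers invoke xfs_scrub_metadata()
unconditionally; with the option disabled the call collapses to
-ENOTTY.  A small sketch of the same idea outside the kernel, with
DEMO_FEATURE_ENABLED and the demo_* names being hypothetical:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical request structure. */
struct demo_request {
	int	type;
};

#ifndef DEMO_FEATURE_ENABLED
/* Feature compiled out: evaluate the argument, report "not supported". */
# define demo_feature_run(req)	((void)(req), -ENOTTY)
#else
int demo_feature_run(struct demo_request *req);
#endif

/* The caller needs no #ifdef of its own. */
static int demo_ioctl(struct demo_request *req)
{
	return demo_feature_run(req);
}
```
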
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index cc00260..87b3874 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -44,6 +44,7 @@
>  #include "xfs_btree.h"
>  #include <linux/fsmap.h>
>  #include "xfs_fsmap.h"
> +#include "scrub/xfs_scrub.h"
>
>  #include <linux/capability.h>
>  #include <linux/cred.h>
> @@ -1689,6 +1690,30 @@ xfs_ioc_getfsmap(
>  	return 0;
>  }
>
> +STATIC int
> +xfs_ioc_scrub_metadata(
> +	struct xfs_inode		*ip,
> +	void				__user *arg)
> +{
> +	struct xfs_scrub_metadata	scrub;
> +	int				error;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	if (copy_from_user(&scrub, arg, sizeof(scrub)))
> +		return -EFAULT;
> +
> +	error = xfs_scrub_metadata(ip, &scrub);
> +	if (error)
> +		return error;
> +
> +	if (copy_to_user(arg, &scrub, sizeof(scrub)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
>  int
>  xfs_ioc_swapext(
>  	xfs_swapext_t	*sxp)
> @@ -1872,6 +1897,9 @@ xfs_file_ioctl(
>  	case FS_IOC_GETFSMAP:
>  		return xfs_ioc_getfsmap(ip, arg);
>
> +	case XFS_IOC_SCRUB_METADATA:
> +		return xfs_ioc_scrub_metadata(ip, arg);
> +
>  	case XFS_IOC_FD_TO_HANDLE:
>  	case XFS_IOC_PATH_TO_HANDLE:
>  	case XFS_IOC_PATH_TO_FSHANDLE: {
> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> index e8b4de3..972d4bd 100644
> --- a/fs/xfs/xfs_ioctl32.c
> +++ b/fs/xfs/xfs_ioctl32.c
> @@ -557,6 +557,7 @@ xfs_file_compat_ioctl(
>  	case XFS_IOC_ERROR_CLEARALL:
>  	case FS_IOC_GETFSMAP:
>  	case XFS_IOC_GET_AG_RESBLKS:
> +	case XFS_IOC_SCRUB_METADATA:
>  		return xfs_file_ioctl(filp, cmd, p);
>  #ifndef BROKEN_X86_ALIGNMENT
>  	/* These are handled fine if no alignment issues */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 2e7e193..d4de29b 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3312,7 +3312,7 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
>
>  /* scrub */
>  #define XFS_SCRUB_TYPE_DESC \
> -	{ 0, NULL }
> +	{ XFS_SCRUB_TYPE_TEST,		"dummy" }
>  DECLARE_EVENT_CLASS(xfs_scrub_class,
>  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
>  		 int error),
> @@ -3330,6 +3330,11 @@ DECLARE_EVENT_CLASS(xfs_scrub_class,
>  	TP_fast_assign(
>  		__entry->dev = ip->i_mount->m_super->s_dev;
>  		__entry->ino = ip->i_ino;
> +		__entry->type = sm->sm_type;
> +		__entry->agno = sm->sm_agno;
> +		__entry->inum = sm->sm_ino;
> +		__entry->gen = sm->sm_gen;
> +		__entry->flags = sm->sm_flags;
>  		__entry->error = error;
>  	),
>  	TP_printk("dev %d:%d ino %llu type %s agno %u inum %llu gen %u flags 0x%x error %d",
>
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 04/22] xfs: generic functions to scrub metadata and btrees
  2017-07-21  4:38 ` [PATCH 04/22] xfs: generic functions to scrub metadata and btrees Darrick J. Wong
@ 2017-07-23 16:40   ` Allison Henderson
  2017-07-24  1:05   ` Dave Chinner
  1 sibling, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 16:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Looks good.  You can add
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

On 7/20/2017 9:38 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Create a function that walks a btree, checking the integrity of each
> btree block (headers, keys, records) and calling back to the caller
> to perform further checks on the records.  Add some helper functions
> so that we report detailed scrub errors in a uniform manner in dmesg.
> These are helper functions for subsequent patches.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile       |    1
>  fs/xfs/scrub/btree.c  |  658 +++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/btree.h  |   95 +++++++
>  fs/xfs/scrub/common.c |  169 +++++++++++++
>  fs/xfs/scrub/common.h |   30 ++
>  5 files changed, 953 insertions(+)
>  create mode 100644 fs/xfs/scrub/btree.c
>  create mode 100644 fs/xfs/scrub/btree.h
>
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index c4fdaa2..4e04da9 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -140,6 +140,7 @@ xfs-$(CONFIG_EXPORTFS_BLOCK_OPS)	+= xfs_pnfs.o
>  # online scrub/repair
>  ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
>  xfs-y				+= $(addprefix scrub/, \
> +				   btree.o \
>  				   common.o \
>  				   )
>  endif
> diff --git a/fs/xfs/scrub/btree.c b/fs/xfs/scrub/btree.c
> new file mode 100644
> index 0000000..452e70a
> --- /dev/null
> +++ b/fs/xfs/scrub/btree.c
> @@ -0,0 +1,658 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_sb.h"
> +#include "xfs_inode.h"
> +#include "xfs_alloc.h"
> +#include "scrub/common.h"
> +#include "scrub/btree.h"
> +
> +/* btree scrubbing */
> +
> +const char * const btree_types[] = {
> +	[XFS_BTNUM_BNO]		= "bnobt",
> +	[XFS_BTNUM_CNT]		= "cntbt",
> +	[XFS_BTNUM_RMAP]	= "rmapbt",
> +	[XFS_BTNUM_BMAP]	= "bmapbt",
> +	[XFS_BTNUM_INO]		= "inobt",
> +	[XFS_BTNUM_FINO]	= "finobt",
> +	[XFS_BTNUM_REFC]	= "refcountbt",
> +};
> +
> +/* Format the trace parameters for the tree cursor. */
> +static inline void
> +xfs_scrub_btree_format(
> +	struct xfs_btree_cur		*cur,
> +	int				level,
> +	char				*bt_type,
> +	size_t				type_len,
> +	char				*bt_ptr,
> +	size_t				ptr_len,
> +	xfs_fsblock_t			*fsbno)
> +{
> +	char				*type = NULL;
> +	struct xfs_btree_block		*block;
> +	struct xfs_buf			*bp;
> +
> +	switch (cur->bc_btnum) {
> +	case XFS_BTNUM_BMAP:
> +		switch (cur->bc_private.b.whichfork) {
> +		case XFS_DATA_FORK:
> +			type = "data";
> +			break;
> +		case XFS_ATTR_FORK:
> +			type = "attr";
> +			break;
> +		case XFS_COW_FORK:
> +			type = "CoW";
> +			break;
> +		}
> +		snprintf(bt_type, type_len, "inode %llu %s fork",
> +				(unsigned long long)cur->bc_private.b.ip->i_ino,
> +				type);
> +		break;
> +	default:
> +		strncpy(bt_type, btree_types[cur->bc_btnum], type_len);
> +		break;
> +	}
> +
> +	if (level < cur->bc_nlevels && cur->bc_ptrs[level] >= 1) {
> +		block = xfs_btree_get_block(cur, level, &bp);
> +		snprintf(bt_ptr, ptr_len, " %s %d/%d",
> +				level == 0 ? "rec" : "ptr",
> +				cur->bc_ptrs[level],
> +				be16_to_cpu(block->bb_numrecs));
> +	} else
> +		bt_ptr[0] = 0;
> +
> +	if (level < cur->bc_nlevels && cur->bc_bufs[level])
> +		*fsbno = XFS_DADDR_TO_FSB(cur->bc_mp,
> +				cur->bc_bufs[level]->b_bn);
> +	else if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
> +		*fsbno = XFS_INO_TO_FSB(cur->bc_mp,
> +				cur->bc_private.b.ip->i_ino);
> +	else
> +		*fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno, 0);
> +}
> +
> +/* Check for btree corruption. */
> +bool
> +xfs_scrub_btree_ok(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_btree_cur		*cur,
> +	int				level,
> +	bool				fs_ok,
> +	const char			*check,
> +	const char			*func,
> +	int				line)
> +{
> +	char				bt_ptr[24];
> +	char				bt_type[48];
> +	xfs_fsblock_t			fsbno;
> +
> +	if (fs_ok)
> +		return fs_ok;
> +
> +	sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
> +	xfs_scrub_btree_format(cur, level, bt_type, 48, bt_ptr, 24, &fsbno);
> +
> +	trace_xfs_scrub_btree_error(cur->bc_mp, bt_type, bt_ptr,
> +			XFS_FSB_TO_AGNO(cur->bc_mp, fsbno),
> +			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno),
> +			check, func, line);
> +	return fs_ok;
> +}
> +
> +/* Check for btree operation errors. */
> +bool
> +xfs_scrub_btree_op_ok(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_btree_cur		*cur,
> +	int				level,
> +	int				*error,
> +	const char			*func,
> +	int				line)
> +{
> +	char				bt_ptr[24];
> +	char				bt_type[48];
> +	xfs_fsblock_t			fsbno;
> +
> +	if (*error == 0)
> +		return true;
> +
> +	xfs_scrub_btree_format(cur, level, bt_type, 48, bt_ptr, 24, &fsbno);
> +
> +	return xfs_scrub_op_ok(sc,
> +			XFS_FSB_TO_AGNO(cur->bc_mp, fsbno),
> +			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno),
> +			bt_type, error, func, line);
> +}
> +
> +/*
> + * Make sure this record is in order and doesn't stray outside of the parent
> + * keys.
> + */
> +STATIC int
> +xfs_scrub_btree_rec(
> +	struct xfs_scrub_btree	*bs)
> +{
> +	struct xfs_btree_cur	*cur = bs->cur;
> +	union xfs_btree_rec	*rec;
> +	union xfs_btree_key	key;
> +	union xfs_btree_key	hkey;
> +	union xfs_btree_key	*keyp;
> +	struct xfs_btree_block	*block;
> +	struct xfs_btree_block	*keyblock;
> +	struct xfs_buf		*bp;
> +
> +	block = xfs_btree_get_block(cur, 0, &bp);
> +	rec = xfs_btree_rec_addr(cur, cur->bc_ptrs[0], block);
> +
> +	if (bp)
> +		trace_xfs_scrub_btree_rec(cur->bc_mp,
> +				XFS_FSB_TO_AGNO(cur->bc_mp,
> +					XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn)),
> +				XFS_FSB_TO_AGBNO(cur->bc_mp,
> +					XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn)),
> +				cur->bc_btnum, 0, cur->bc_nlevels,
> +				cur->bc_ptrs[0]);
> +	else if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
> +		trace_xfs_scrub_btree_rec(cur->bc_mp,
> +				XFS_INO_TO_AGNO(cur->bc_mp,
> +					cur->bc_private.b.ip->i_ino),
> +				XFS_INO_TO_AGBNO(cur->bc_mp,
> +					cur->bc_private.b.ip->i_ino),
> +				cur->bc_btnum, 0, cur->bc_nlevels,
> +				cur->bc_ptrs[0]);
> +	else
> +		trace_xfs_scrub_btree_rec(cur->bc_mp,
> +				NULLAGNUMBER, NULLAGBLOCK,
> +				cur->bc_btnum, 0, cur->bc_nlevels,
> +				cur->bc_ptrs[0]);
> +
> +	/* If this isn't the first record, are they in order? */
> +	XFS_SCRUB_BTREC_CHECK(bs, bs->firstrec ||
> +			cur->bc_ops->recs_inorder(cur, &bs->lastrec, rec));
> +	bs->firstrec = false;
> +	memcpy(&bs->lastrec, rec, cur->bc_ops->rec_len);
> +
> +	if (cur->bc_nlevels == 1)
> +		return 0;
> +
> +	/* Is this at least as large as the parent low key? */
> +	cur->bc_ops->init_key_from_rec(&key, rec);
> +	keyblock = xfs_btree_get_block(cur, 1, &bp);
> +	keyp = xfs_btree_key_addr(cur, cur->bc_ptrs[1], keyblock);
> +	XFS_SCRUB_BTKEY_CHECK(bs, 1,
> +			cur->bc_ops->diff_two_keys(cur, &key, keyp) >= 0);
> +
> +	if (!(cur->bc_flags & XFS_BTREE_OVERLAPPING))
> +		return 0;
> +
> +	/* Is this no larger than the parent high key? */
> +	cur->bc_ops->init_high_key_from_rec(&hkey, rec);
> +	keyp = xfs_btree_high_key_addr(cur, cur->bc_ptrs[1], keyblock);
> +	XFS_SCRUB_BTKEY_CHECK(bs, 1,
> +			cur->bc_ops->diff_two_keys(cur, keyp, &hkey) >= 0);
> +
> +	return 0;
> +}
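
The record checks above boil down to two comparisons per record:
against the previous record in the walk, and against the low key held
by the parent.  Here is a rough userspace sketch of that logic.  The
demo_* names are made up, "in order" is assumed to mean strictly
increasing start blocks, and the high-key check for overlapping
btrees is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record: an extent keyed by its start block. */
struct demo_rec {
	unsigned int	start;
	unsigned int	len;
};

/* Assumed ordering rule: start blocks must strictly increase. */
static bool demo_recs_inorder(const struct demo_rec *r1,
			      const struct demo_rec *r2)
{
	return r1->start < r2->start;
}

/*
 * Count the problems in one leaf.  parent_low is the low key that the
 * parent block stores for this leaf; no record may sort below it.
 */
static int demo_check_leaf(const struct demo_rec *recs, size_t nrecs,
			   unsigned int parent_low)
{
	struct demo_rec	last = {0};
	bool		firstrec = true;
	int		problems = 0;
	size_t		i;

	for (i = 0; i < nrecs; i++) {
		/* If this isn't the first record, are they in order? */
		if (!firstrec && !demo_recs_inorder(&last, &recs[i]))
			problems++;
		firstrec = false;
		last = recs[i];

		/* Is this at least as large as the parent low key? */
		if (recs[i].start < parent_low)
			problems++;
	}
	return problems;
}
```

Like the scrubber, the sketch keeps going after a failed comparison
so that every bad record in the leaf is counted.
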
> +
> +/*
> + * Make sure this key is in order and doesn't stray outside of the parent
> + * keys.
> + */
> +STATIC int
> +xfs_scrub_btree_key(
> +	struct xfs_scrub_btree	*bs,
> +	int			level)
> +{
> +	struct xfs_btree_cur	*cur = bs->cur;
> +	union xfs_btree_key	*key;
> +	union xfs_btree_key	*keyp;
> +	struct xfs_btree_block	*block;
> +	struct xfs_btree_block	*keyblock;
> +	struct xfs_buf		*bp;
> +
> +	block = xfs_btree_get_block(cur, level, &bp);
> +	key = xfs_btree_key_addr(cur, cur->bc_ptrs[level], block);
> +
> +	if (bp)
> +		trace_xfs_scrub_btree_key(cur->bc_mp,
> +				XFS_FSB_TO_AGNO(cur->bc_mp,
> +					XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn)),
> +				XFS_FSB_TO_AGBNO(cur->bc_mp,
> +					XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn)),
> +				cur->bc_btnum, level, cur->bc_nlevels,
> +				cur->bc_ptrs[level]);
> +	else if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
> +		trace_xfs_scrub_btree_key(cur->bc_mp,
> +				XFS_INO_TO_AGNO(cur->bc_mp,
> +					cur->bc_private.b.ip->i_ino),
> +				XFS_INO_TO_AGBNO(cur->bc_mp,
> +					cur->bc_private.b.ip->i_ino),
> +				cur->bc_btnum, level, cur->bc_nlevels,
> +				cur->bc_ptrs[level]);
> +	else
> +		trace_xfs_scrub_btree_key(cur->bc_mp,
> +				NULLAGNUMBER, NULLAGBLOCK,
> +				cur->bc_btnum, level, cur->bc_nlevels,
> +				cur->bc_ptrs[level]);
> +
> +	/* If this isn't the first key, are they in order? */
> +	XFS_SCRUB_BTKEY_CHECK(bs, level, bs->firstkey[level] ||
> +			cur->bc_ops->keys_inorder(cur, &bs->lastkey[level],
> +					key));
> +	bs->firstkey[level] = false;
> +	memcpy(&bs->lastkey[level], key, cur->bc_ops->key_len);
> +
> +	if (level + 1 >= cur->bc_nlevels)
> +		return 0;
> +
> +	/* Is this at least as large as the parent low key? */
> +	keyblock = xfs_btree_get_block(cur, level + 1, &bp);
> +	keyp = xfs_btree_key_addr(cur, cur->bc_ptrs[level + 1], keyblock);
> +	XFS_SCRUB_BTKEY_CHECK(bs, level,
> +			cur->bc_ops->diff_two_keys(cur, key, keyp) >= 0);
> +
> +	if (!(cur->bc_flags & XFS_BTREE_OVERLAPPING))
> +		return 0;
> +
> +	/* Is this no larger than the parent high key? */
> +	key = xfs_btree_high_key_addr(cur, cur->bc_ptrs[level], block);
> +	keyp = xfs_btree_high_key_addr(cur, cur->bc_ptrs[level + 1], keyblock);
> +	XFS_SCRUB_BTKEY_CHECK(bs, level,
> +			cur->bc_ops->diff_two_keys(cur, keyp, key) >= 0);
> +
> +	return 0;
> +}
> +
> +/* Check a btree pointer. */
> +static int
> +xfs_scrub_btree_ptr(
> +	struct xfs_scrub_btree		*bs,
> +	int				level,
> +	union xfs_btree_ptr		*ptr)
> +{
> +	struct xfs_btree_cur		*cur = bs->cur;
> +	xfs_daddr_t			daddr;
> +	xfs_daddr_t			eofs;
> +
> +	if ((cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) &&
> +			level == cur->bc_nlevels) {
> +		if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
> +			XFS_SCRUB_BTKEY_GOTO(bs, level, ptr->l == 0, corrupt);
> +		} else {
> +			XFS_SCRUB_BTKEY_GOTO(bs, level, ptr->s == 0, corrupt);
> +		}
> +		return 0;
> +	}
> +
> +	if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
> +		XFS_SCRUB_BTKEY_GOTO(bs, level,
> +				ptr->l != cpu_to_be64(NULLFSBLOCK), corrupt);
> +
> +		daddr = XFS_FSB_TO_DADDR(cur->bc_mp, be64_to_cpu(ptr->l));
> +	} else {
> +		XFS_SCRUB_BTKEY_GOTO(bs, level,
> +				cur->bc_private.a.agno != NULLAGNUMBER, corrupt);
> +		XFS_SCRUB_BTKEY_GOTO(bs, level,
> +				ptr->s != cpu_to_be32(NULLAGBLOCK), corrupt);
> +
> +		daddr = XFS_AGB_TO_DADDR(cur->bc_mp, cur->bc_private.a.agno,
> +				be32_to_cpu(ptr->s));
> +	}
> +	eofs = XFS_FSB_TO_BB(cur->bc_mp, cur->bc_mp->m_sb.sb_dblocks);
> +	XFS_SCRUB_BTKEY_GOTO(bs, level, daddr != 0, corrupt);
> +	XFS_SCRUB_BTKEY_GOTO(bs, level, daddr < eofs, corrupt);
> +
> +	return 0;
> +
> +corrupt:
> +	return -EFSCORRUPTED;
> +}
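
For the short-pointer case, the pointer check above amounts to: the
pointer must not be the null sentinel, and the resulting address must
be nonzero and below the end of the filesystem.  A simplified sketch,
assuming flat block addressing (group * blocks_per_group + ptr)
rather than the real XFS geometry macros, with -EINVAL standing in
for the kernel-only -EFSCORRUPTED:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_NULLBLOCK	UINT32_MAX	/* hypothetical "no block" sentinel */

/*
 * Return 0 if ptr could point at a real block, -EINVAL otherwise.
 * All parameters are hypothetical simplifications of the geometry.
 */
static int demo_check_ptr(uint32_t ptr, uint64_t blocks_per_group,
			  uint32_t group, uint64_t total_blocks)
{
	uint64_t	addr;

	/* A null pointer can never reference a child block. */
	if (ptr == DEMO_NULLBLOCK)
		return -EINVAL;

	addr = (uint64_t)group * blocks_per_group + ptr;

	/* Address zero is reserved; nothing may point past the end. */
	if (addr == 0)
		return -EINVAL;
	if (addr >= total_blocks)
		return -EINVAL;
	return 0;
}
```
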
> +
> +/* Check the siblings of a large format btree block. */
> +STATIC int
> +xfs_scrub_btree_lblock_check_siblings(
> +	struct xfs_scrub_btree		*bs,
> +	struct xfs_btree_block		*block)
> +{
> +	struct xfs_btree_block		*pblock;
> +	struct xfs_buf			*pbp;
> +	struct xfs_btree_cur		*ncur = NULL;
> +	union xfs_btree_ptr		*pp;
> +	xfs_fsblock_t			leftsib;
> +	xfs_fsblock_t			rightsib;
> +	xfs_fsblock_t			fsbno;
> +	int				level;
> +	int				success;
> +	int				error = 0;
> +
> +	leftsib = be64_to_cpu(block->bb_u.l.bb_leftsib);
> +	rightsib = be64_to_cpu(block->bb_u.l.bb_rightsib);
> +	level = xfs_btree_get_level(block);
> +
> +	/* Root block should never have siblings. */
> +	if (level == bs->cur->bc_nlevels - 1) {
> +		XFS_SCRUB_BTKEY_CHECK(bs, level, leftsib == NULLFSBLOCK);
> +		XFS_SCRUB_BTKEY_CHECK(bs, level, rightsib == NULLFSBLOCK);
> +		return error;
> +	}
> +
> +	/* Does the left sibling match the parent level left block? */
> +	if (leftsib != NULLFSBLOCK) {
> +		error = xfs_btree_dup_cursor(bs->cur, &ncur);
> +		if (error)
> +			return error;
> +		error = xfs_btree_decrement(ncur, level + 1, &success);
> +		XFS_SCRUB_BTKEY_OP_ERROR_GOTO(bs, level + 1, &error, out_cur);
> +		XFS_SCRUB_BTKEY_GOTO(bs, level, success, out_cur);
> +
> +		pblock = xfs_btree_get_block(ncur, level + 1, &pbp);
> +		pp = xfs_btree_ptr_addr(ncur, ncur->bc_ptrs[level + 1], pblock);
> +		if (!xfs_scrub_btree_ptr(bs, level + 1, pp)) {
> +			fsbno = be64_to_cpu(pp->l);
> +			XFS_SCRUB_BTKEY_CHECK(bs, level, fsbno == leftsib);
> +		}
> +
> +		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
> +		ncur = NULL;
> +	}
> +
> +	/* Does the right sibling match the parent level right block? */
> +	if (!error && rightsib != NULLFSBLOCK) {
> +		error = xfs_btree_dup_cursor(bs->cur, &ncur);
> +		if (error)
> +			return error;
> +		error = xfs_btree_increment(ncur, level + 1, &success);
> +		XFS_SCRUB_BTKEY_OP_ERROR_GOTO(bs, level + 1, &error, out_cur);
> +		XFS_SCRUB_BTKEY_GOTO(bs, level, success, out_cur);
> +
> +		pblock = xfs_btree_get_block(ncur, level + 1, &pbp);
> +		pp = xfs_btree_ptr_addr(ncur, ncur->bc_ptrs[level + 1], pblock);
> +		if (!xfs_scrub_btree_ptr(bs, level + 1, pp)) {
> +			fsbno = be64_to_cpu(pp->l);
> +			XFS_SCRUB_BTKEY_CHECK(bs, level, fsbno == rightsib);
> +		}
> +
> +		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
> +		ncur = NULL;
> +	}
> +
> +out_cur:
> +	if (ncur)
> +		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
> +	return error;
> +}
> +
> +/* Check the siblings of a small format btree block. */
> +STATIC int
> +xfs_scrub_btree_sblock_check_siblings(
> +	struct xfs_scrub_btree		*bs,
> +	struct xfs_btree_block		*block)
> +{
> +	struct xfs_btree_block		*pblock;
> +	struct xfs_buf			*pbp;
> +	struct xfs_btree_cur		*ncur = NULL;
> +	union xfs_btree_ptr		*pp;
> +	xfs_agblock_t			leftsib;
> +	xfs_agblock_t			rightsib;
> +	xfs_agblock_t			agbno;
> +	int				level;
> +	int				success;
> +	int				error = 0;
> +
> +	leftsib = be32_to_cpu(block->bb_u.s.bb_leftsib);
> +	rightsib = be32_to_cpu(block->bb_u.s.bb_rightsib);
> +	level = xfs_btree_get_level(block);
> +
> +	/* Root block should never have siblings. */
> +	if (level == bs->cur->bc_nlevels - 1) {
> +		XFS_SCRUB_BTKEY_CHECK(bs, level, leftsib == NULLAGBLOCK);
> +		XFS_SCRUB_BTKEY_CHECK(bs, level, rightsib == NULLAGBLOCK);
> +		return error;
> +	}
> +
> +	/* Does the left sibling match the parent level left block? */
> +	if (leftsib != NULLAGBLOCK) {
> +		error = xfs_btree_dup_cursor(bs->cur, &ncur);
> +		if (error)
> +			return error;
> +		error = xfs_btree_decrement(ncur, level + 1, &success);
> +		XFS_SCRUB_BTKEY_OP_ERROR_GOTO(bs, level + 1, &error, out_cur);
> +		XFS_SCRUB_BTKEY_GOTO(bs, level, success, verify_rightsib);
> +
> +		pblock = xfs_btree_get_block(ncur, level + 1, &pbp);
> +		pp = xfs_btree_ptr_addr(ncur, ncur->bc_ptrs[level + 1], pblock);
> +		if (!xfs_scrub_btree_ptr(bs, level + 1, pp)) {
> +			agbno = be32_to_cpu(pp->s);
> +			XFS_SCRUB_BTKEY_CHECK(bs, level, agbno == leftsib);
> +		}
> +
> +		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
> +		ncur = NULL;
> +	}
> +
> +verify_rightsib:
> +	if (ncur) {
> +		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
> +		ncur = NULL;
> +	}
> +
> +	/* Does the right sibling match the parent level right block? */
> +	if (rightsib != NULLAGBLOCK) {
> +		error = xfs_btree_dup_cursor(bs->cur, &ncur);
> +		if (error)
> +			return error;
> +		error = xfs_btree_increment(ncur, level + 1, &success);
> +		XFS_SCRUB_BTKEY_OP_ERROR_GOTO(bs, level + 1, &error, out_cur);
> +		XFS_SCRUB_BTKEY_GOTO(bs, level, success, out_cur);
> +
> +		pblock = xfs_btree_get_block(ncur, level + 1, &pbp);
> +		pp = xfs_btree_ptr_addr(ncur, ncur->bc_ptrs[level + 1], pblock);
> +		if (!xfs_scrub_btree_ptr(bs, level + 1, pp)) {
> +			agbno = be32_to_cpu(pp->s);
> +			XFS_SCRUB_BTKEY_CHECK(bs, level, agbno == rightsib);
> +		}
> +
> +		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
> +		ncur = NULL;
> +	}
> +
> +out_cur:
> +	if (ncur)
> +		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
> +	return error;
> +}
> +
> +/* Grab and scrub a btree block. */
> +STATIC int
> +xfs_scrub_btree_block(
> +	struct xfs_scrub_btree		*bs,
> +	int				level,
> +	union xfs_btree_ptr		*pp,
> +	struct xfs_btree_block		**pblock,
> +	struct xfs_buf			**pbp)
> +{
> +	int				error;
> +
> +	error = xfs_btree_lookup_get_block(bs->cur, level, pp, pblock);
> +	if (error)
> +		return error;
> +
> +	xfs_btree_get_block(bs->cur, level, pbp);
> +	error = xfs_btree_check_block(bs->cur, *pblock, level, *pbp);
> +	if (error)
> +		return error;
> +
> +	return bs->check_siblings_fn(bs, *pblock);
> +}
> +
> +/*
> + * Visit all nodes and leaves of a btree.  Check that all pointers and
> + * records are in order, that the keys reflect the records, and use a callback
> + * so that the caller can verify individual records.  The callback is the same
> + * as the one for xfs_btree_query_range, so therefore this function also
> + * returns XFS_BTREE_QUERY_RANGE_ABORT, zero, or a negative error code.
> + */
> +int
> +xfs_scrub_btree(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_btree_cur		*cur,
> +	xfs_scrub_btree_rec_fn		scrub_fn,
> +	struct xfs_owner_info		*oinfo,
> +	void				*private)
> +{
> +	struct xfs_scrub_btree		bs = {0};
> +	union xfs_btree_ptr		ptr;
> +	union xfs_btree_ptr		*pp;
> +	union xfs_btree_rec		*recp;
> +	struct xfs_btree_block		*block;
> +	int				level;
> +	struct xfs_buf			*bp;
> +	int				i;
> +	int				error = 0;
> +
> +	/* Finish filling out the scrub state */
> +	bs.cur = cur;
> +	bs.scrub_rec = scrub_fn;
> +	bs.oinfo = oinfo;
> +	bs.firstrec = true;
> +	bs.private = private;
> +	bs.sc = sc;
> +	for (i = 0; i < XFS_BTREE_MAXLEVELS; i++)
> +		bs.firstkey[i] = true;
> +	INIT_LIST_HEAD(&bs.to_check);
> +
> +	if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
> +		bs.check_siblings_fn = xfs_scrub_btree_lblock_check_siblings;
> +	else
> +		bs.check_siblings_fn = xfs_scrub_btree_sblock_check_siblings;
> +
> +	/* Don't try to check a tree with a height we can't handle. */
> +	XFS_SCRUB_BTREC_GOTO(&bs, cur->bc_nlevels > 0, out_badcursor);
> +	XFS_SCRUB_BTREC_GOTO(&bs, cur->bc_nlevels <= XFS_BTREE_MAXLEVELS,
> +			out_badcursor);
> +
> +	/* Make sure the root isn't in the superblock. */
> +	cur->bc_ops->init_ptr_from_cur(cur, &ptr);
> +	error = xfs_scrub_btree_ptr(&bs, cur->bc_nlevels, &ptr);
> +	XFS_SCRUB_BTKEY_OP_ERROR_GOTO(&bs, cur->bc_nlevels, &error,
> +			out_badcursor);
> +
> +	/* Load the root of the btree. */
> +	level = cur->bc_nlevels - 1;
> +	cur->bc_ops->init_ptr_from_cur(cur, &ptr);
> +	error = xfs_scrub_btree_block(&bs, level, &ptr, &block, &bp);
> +	XFS_SCRUB_BTKEY_OP_ERROR_GOTO(&bs, level, &error, out);
> +
> +	cur->bc_ptrs[level] = 1;
> +
> +	while (level < cur->bc_nlevels) {
> +		block = xfs_btree_get_block(cur, level, &bp);
> +
> +		if (level == 0) {
> +			/* End of leaf, pop back towards the root. */
> +			if (cur->bc_ptrs[level] >
> +			    be16_to_cpu(block->bb_numrecs)) {
> +				if (level < cur->bc_nlevels - 1)
> +					cur->bc_ptrs[level + 1]++;
> +				level++;
> +				continue;
> +			}
> +
> +			/* Records in order for scrub? */
> +			error = xfs_scrub_btree_rec(&bs);
> +			if (error)
> +				goto out;
> +			recp = xfs_btree_rec_addr(cur, cur->bc_ptrs[0], block);
> +			error = bs.scrub_rec(&bs, recp);
> +			if (error < 0 ||
> +			    error == XFS_BTREE_QUERY_RANGE_ABORT)
> +				break;
> +			if (xfs_scrub_should_terminate(&error))
> +				break;
> +
> +			cur->bc_ptrs[level]++;
> +			continue;
> +		}
> +
> +		/* End of node, pop back towards the root. */
> +		if (cur->bc_ptrs[level] > be16_to_cpu(block->bb_numrecs)) {
> +			if (level < cur->bc_nlevels - 1)
> +				cur->bc_ptrs[level + 1]++;
> +			level++;
> +			continue;
> +		}
> +
> +		/* Keys in order for scrub? */
> +		error = xfs_scrub_btree_key(&bs, level);
> +		if (error)
> +			goto out;
> +
> +		/* Drill another level deeper. */
> +		pp = xfs_btree_ptr_addr(cur, cur->bc_ptrs[level], block);
> +		error = xfs_scrub_btree_ptr(&bs, level, pp);
> +		if (error) {
> +			error = 0;
> +			cur->bc_ptrs[level]++;
> +			continue;
> +		}
> +		level--;
> +		error = xfs_scrub_btree_block(&bs, level, pp, &block, &bp);
> +		XFS_SCRUB_BTKEY_OP_ERROR_GOTO(&bs, level, &error, out);
> +
> +		cur->bc_ptrs[level] = 1;
> +	}
> +
> +out:
> +	/*
> +	 * If we don't end this function with the cursor pointing at a record
> +	 * block, a subsequent non-error cursor deletion will not release
> +	 * node-level buffers, causing a buffer leak.  This is quite possible
> +	 * with a zero-results scrubbing run, so release the buffers if we
> +	 * aren't pointing at a record.
> +	 */
> +	if (cur->bc_bufs[0] == NULL) {
> +		for (i = 0; i < cur->bc_nlevels; i++) {
> +			if (cur->bc_bufs[i]) {
> +				xfs_trans_brelse(cur->bc_tp, cur->bc_bufs[i]);
> +				cur->bc_bufs[i] = NULL;
> +				cur->bc_ptrs[i] = 0;
> +				cur->bc_ra[i] = 0;
> +			}
> +		}
> +	}
> +
> +out_badcursor:
> +	return error;
> +}
> diff --git a/fs/xfs/scrub/btree.h b/fs/xfs/scrub/btree.h
> new file mode 100644
> index 0000000..75e89b1
> --- /dev/null
> +++ b/fs/xfs/scrub/btree.h
> @@ -0,0 +1,95 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#ifndef __XFS_REPAIR_BTREE_H__
> +#define __XFS_REPAIR_BTREE_H__
> +
> +/* btree scrub */
> +
> +extern const char * const btree_types[];
> +
> +/* Check for btree corruption. */
> +bool xfs_scrub_btree_ok(struct xfs_scrub_context *sc,
> +			struct xfs_btree_cur *cur, int level, bool fs_ok,
> +			const char *check, const char *func, int line);
> +
> +/* Check for btree operation errors. */
> +bool xfs_scrub_btree_op_ok(struct xfs_scrub_context *sc,
> +			   struct xfs_btree_cur *cur, int level, int *error,
> +			   const char *func, int line);
> +
> +#define XFS_SCRUB_BTREC_CHECK(bs, fs_ok) \
> +	xfs_scrub_btree_ok((bs)->sc, (bs)->cur, 0, (fs_ok), #fs_ok, \
> +			__func__, __LINE__)
> +#define XFS_SCRUB_BTREC_GOTO(bs, fs_ok, label) \
> +	do { \
> +		if (!xfs_scrub_btree_ok((bs)->sc, (bs)->cur, 0, (fs_ok), \
> +				#fs_ok, __func__, __LINE__)) \
> +			goto label; \
> +	} while (0)
> +#define XFS_SCRUB_BTREC_OP_ERROR_GOTO(bs, error, label) \
> +	do { \
> +		if (!xfs_scrub_btree_op_ok((bs)->sc, (bs)->cur, 0, \
> +				(error), __func__, __LINE__)) \
> +			goto label; \
> +	} while (0)
> +#define XFS_SCRUB_BTKEY_CHECK(bs, level, fs_ok) \
> +	xfs_scrub_btree_ok((bs)->sc, (bs)->cur, (level), (fs_ok), #fs_ok, \
> +			__func__, __LINE__)
> +#define XFS_SCRUB_BTKEY_GOTO(bs, level, fs_ok, label) \
> +	do { \
> +		if (!xfs_scrub_btree_ok((bs)->sc, (bs)->cur, (level), (fs_ok), \
> +				#fs_ok, __func__, __LINE__)) \
> +			goto label; \
> +	} while (0)
> +#define XFS_SCRUB_BTKEY_OP_ERROR_GOTO(bs, level, error, label) \
> +	do { \
> +		if (!xfs_scrub_btree_op_ok((bs)->sc, (bs)->cur, (level), \
> +				(error), __func__, __LINE__)) \
> +			goto label; \
> +	} while (0)
> +
> +struct xfs_scrub_btree;
> +typedef int (*xfs_scrub_btree_rec_fn)(
> +	struct xfs_scrub_btree	*bs,
> +	union xfs_btree_rec	*rec);
> +
> +struct xfs_scrub_btree {
> +	/* caller-provided scrub state */
> +	struct xfs_scrub_context	*sc;
> +	struct xfs_btree_cur		*cur;
> +	xfs_scrub_btree_rec_fn		scrub_rec;
> +	struct xfs_owner_info		*oinfo;
> +	void				*private;
> +
> +	/* internal scrub state */
> +	union xfs_btree_rec		lastrec;
> +	bool				firstrec;
> +	union xfs_btree_key		lastkey[XFS_BTREE_MAXLEVELS];
> +	bool				firstkey[XFS_BTREE_MAXLEVELS];
> +	struct list_head		to_check;
> +	int				(*check_siblings_fn)(
> +						struct xfs_scrub_btree *,
> +						struct xfs_btree_block *);
> +};
> +int xfs_scrub_btree(struct xfs_scrub_context *sc, struct xfs_btree_cur *cur,
> +		    xfs_scrub_btree_rec_fn scrub_fn,
> +		    struct xfs_owner_info *oinfo, void *private);
> +
> +#endif /* __XFS_REPAIR_BTREE_H__ */
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 6931793..331aa14 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -43,6 +43,7 @@
>  #include "xfs_rmap_btree.h"
>  #include "scrub/xfs_scrub.h"
>  #include "scrub/common.h"
> +#include "scrub/btree.h"
>
>  /*
>   * Online Scrub and Repair
> @@ -367,6 +368,172 @@ xfs_scrub_incomplete(
>  	return fs_ok;
>  }
>
> +/* AG scrubbing */
> +
> +/* Grab all the headers for an AG. */
> +int
> +xfs_scrub_ag_read_headers(
> +	struct xfs_scrub_context	*sc,
> +	xfs_agnumber_t			agno,
> +	struct xfs_buf			**agi,
> +	struct xfs_buf			**agf,
> +	struct xfs_buf			**agfl)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	int				error;
> +
> +	error = xfs_ialloc_read_agi(mp, sc->tp, agno, agi);
> +	if (error)
> +		goto out;
> +
> +	error = xfs_alloc_read_agf(mp, sc->tp, agno, 0, agf);
> +	if (error)
> +		goto out;
> +	if (!*agf) {
> +		error = -ENOMEM;
> +		goto out;
> +	}
> +
> +	error = xfs_alloc_read_agfl(mp, sc->tp, agno, agfl);
> +	if (error)
> +		goto out;
> +
> +out:
> +	return error;
> +}
> +
> +/* Release all the AG btree cursors. */
> +STATIC void
> +xfs_scrub_ag_btcur_free(
> +	struct xfs_scrub_ag		*sa)
> +{
> +	if (sa->refc_cur)
> +		xfs_btree_del_cursor(sa->refc_cur, XFS_BTREE_ERROR);
> +	if (sa->rmap_cur)
> +		xfs_btree_del_cursor(sa->rmap_cur, XFS_BTREE_ERROR);
> +	if (sa->fino_cur)
> +		xfs_btree_del_cursor(sa->fino_cur, XFS_BTREE_ERROR);
> +	if (sa->ino_cur)
> +		xfs_btree_del_cursor(sa->ino_cur, XFS_BTREE_ERROR);
> +	if (sa->cnt_cur)
> +		xfs_btree_del_cursor(sa->cnt_cur, XFS_BTREE_ERROR);
> +	if (sa->bno_cur)
> +		xfs_btree_del_cursor(sa->bno_cur, XFS_BTREE_ERROR);
> +
> +	sa->refc_cur = NULL;
> +	sa->rmap_cur = NULL;
> +	sa->fino_cur = NULL;
> +	sa->ino_cur = NULL;
> +	sa->bno_cur = NULL;
> +	sa->cnt_cur = NULL;
> +}
> +
> +/* Initialize all the btree cursors for an AG. */
> +int
> +xfs_scrub_ag_btcur_init(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_scrub_ag		*sa)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	xfs_agnumber_t			agno = sa->agno;
> +
> +	if (sa->agf_bp) {
> +		/* Set up a bnobt cursor for cross-referencing. */
> +		sa->bno_cur = xfs_allocbt_init_cursor(mp, sc->tp, sa->agf_bp,
> +				agno, XFS_BTNUM_BNO);
> +		if (!sa->bno_cur)
> +			goto err;
> +
> +		/* Set up a cntbt cursor for cross-referencing. */
> +		sa->cnt_cur = xfs_allocbt_init_cursor(mp, sc->tp, sa->agf_bp,
> +				agno, XFS_BTNUM_CNT);
> +		if (!sa->cnt_cur)
> +			goto err;
> +	}
> +
> +	/* Set up a inobt cursor for cross-referencing. */
> +	if (sa->agi_bp) {
> +		sa->ino_cur = xfs_inobt_init_cursor(mp, sc->tp, sa->agi_bp,
> +					agno, XFS_BTNUM_INO);
> +		if (!sa->ino_cur)
> +			goto err;
> +	}
> +
> +	/* Set up a finobt cursor for cross-referencing. */
> +	if (sa->agi_bp && xfs_sb_version_hasfinobt(&mp->m_sb)) {
> +		sa->fino_cur = xfs_inobt_init_cursor(mp, sc->tp, sa->agi_bp,
> +				agno, XFS_BTNUM_FINO);
> +		if (!sa->fino_cur)
> +			goto err;
> +	}
> +
> +	/* Set up a rmapbt cursor for cross-referencing. */
> +	if (sa->agf_bp && xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> +		sa->rmap_cur = xfs_rmapbt_init_cursor(mp, sc->tp, sa->agf_bp,
> +				agno);
> +		if (!sa->rmap_cur)
> +			goto err;
> +	}
> +
> +	/* Set up a refcountbt cursor for cross-referencing. */
> +	if (sa->agf_bp && xfs_sb_version_hasreflink(&mp->m_sb)) {
> +		sa->refc_cur = xfs_refcountbt_init_cursor(mp, sc->tp,
> +				sa->agf_bp, agno, NULL);
> +		if (!sa->refc_cur)
> +			goto err;
> +	}
> +
> +	return 0;
> +err:
> +	return -ENOMEM;
> +}
> +
> +/* Release the AG header context and btree cursors. */
> +void
> +xfs_scrub_ag_free(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_scrub_ag		*sa)
> +{
> +	xfs_scrub_ag_btcur_free(sa);
> +	if (sa->agfl_bp) {
> +		xfs_trans_brelse(sc->tp, sa->agfl_bp);
> +		sa->agfl_bp = NULL;
> +	}
> +	if (sa->agf_bp) {
> +		xfs_trans_brelse(sc->tp, sa->agf_bp);
> +		sa->agf_bp = NULL;
> +	}
> +	if (sa->agi_bp) {
> +		xfs_trans_brelse(sc->tp, sa->agi_bp);
> +		sa->agi_bp = NULL;
> +	}
> +	sa->agno = NULLAGNUMBER;
> +}
> +
> +/*
> + * For scrub, grab the AGI and the AGF headers, in that order.  Locking
> + * order requires us to get the AGI before the AGF.  We use the
> + * transaction to avoid deadlocking on crosslinked metadata buffers;
> + * either the caller passes one in (bmap scrub) or we have to create a
> + * transaction ourselves.
> + */
> +int
> +xfs_scrub_ag_init(
> +	struct xfs_scrub_context	*sc,
> +	xfs_agnumber_t			agno,
> +	struct xfs_scrub_ag		*sa)
> +{
> +	int				error;
> +
> +	sa->agno = agno;
> +	error = xfs_scrub_ag_read_headers(sc, agno, &sa->agi_bp,
> +			&sa->agf_bp, &sa->agfl_bp);
> +	if (error)
> +		return error;
> +
> +	return xfs_scrub_ag_btcur_init(sc, sa);
> +}
> +
>  /* Dummy scrubber */
>
>  int
> @@ -409,6 +576,7 @@ xfs_scrub_teardown(
>  	struct xfs_scrub_context	*sc,
>  	int				error)
>  {
> +	xfs_scrub_ag_free(sc, &sc->sa);
>  	if (sc->tp) {
>  		xfs_trans_cancel(sc->tp);
>  		sc->tp = NULL;
> @@ -430,6 +598,7 @@ xfs_scrub_setup(
>  	sc->sm = sm;
>  	sc->fns = fns;
>  	sc->try_harder = try_harder;
> +	sc->sa.agno = NULLAGNUMBER;
>
>  	return sc->fns->setup(sc, ip);
>  }
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index 4f3113a..15baccb 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -27,6 +27,24 @@ static inline bool xfs_scrub_found_corruption(struct xfs_scrub_metadata *sm)
>  			       XFS_SCRUB_FLAG_XCORRUPT);
>  }
>
> +/* Buffer pointers and btree cursors for an entire AG. */
> +struct xfs_scrub_ag {
> +	xfs_agnumber_t			agno;
> +
> +	/* AG btree roots */
> +	struct xfs_buf			*agf_bp;
> +	struct xfs_buf			*agfl_bp;
> +	struct xfs_buf			*agi_bp;
> +
> +	/* AG btrees */
> +	struct xfs_btree_cur		*bno_cur;
> +	struct xfs_btree_cur		*cnt_cur;
> +	struct xfs_btree_cur		*ino_cur;
> +	struct xfs_btree_cur		*fino_cur;
> +	struct xfs_btree_cur		*rmap_cur;
> +	struct xfs_btree_cur		*refc_cur;
> +};
> +
>  struct xfs_scrub_context {
>  	/* General scrub state. */
>  	struct xfs_mount		*mp;
> @@ -35,6 +53,9 @@ struct xfs_scrub_context {
>  	struct xfs_trans		*tp;
>  	struct xfs_inode		*ip;
>  	bool				try_harder;
> +
> +	/* State tracking for single-AG operations. */
> +	struct xfs_scrub_ag		sa;
>  };
>
>  /* Should we end the scrub early? */
> @@ -164,6 +185,15 @@ bool xfs_scrub_incomplete(struct xfs_scrub_context *sc, const char *type,
>  	xfs_scrub_incomplete((sc), (type), (fs_ok), \
>  			#fs_ok, __func__, __LINE__)
>
> +void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
> +int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
> +		      struct xfs_scrub_ag *sa);
> +int xfs_scrub_ag_read_headers(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
> +			      struct xfs_buf **agi, struct xfs_buf **agf,
> +			      struct xfs_buf **agfl);
> +int xfs_scrub_ag_btcur_init(struct xfs_scrub_context *sc,
> +			    struct xfs_scrub_ag *sa);
> +
>  /* Setup functions */
>
>  #define SETUP_FN(name) int name(struct xfs_scrub_context *sc, struct xfs_inode *ip)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


* Re: [PATCH 05/22] xfs: scrub in-memory metadata buffers
  2017-07-21  4:39 ` [PATCH 05/22] xfs: scrub in-memory metadata buffers Darrick J. Wong
@ 2017-07-23 16:48   ` Allison Henderson
  2017-07-24  1:43   ` Dave Chinner
  1 sibling, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 16:48 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 7/20/2017 9:39 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Call the verifier function for all in-memory metadata buffers, looking
> for memory corruption either due to bad memory or coding bugs.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile         |    1
>  fs/xfs/libxfs/xfs_fs.h  |    3 +
>  fs/xfs/scrub/common.c   |    4 +
>  fs/xfs/scrub/common.h   |    2 +
>  fs/xfs/scrub/metabufs.c |  177 +++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_trace.h      |    3 +
>  6 files changed, 188 insertions(+), 2 deletions(-)
>  create mode 100644 fs/xfs/scrub/metabufs.c
>
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 4e04da9..67cf4ac 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -142,5 +142,6 @@ ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
>  xfs-y				+= $(addprefix scrub/, \
>  				   btree.o \
>  				   common.o \
> +				   metabufs.o \
>  				   )
>  endif
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index aeccc99..9fb3c65 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -482,7 +482,8 @@ struct xfs_scrub_metadata {
>   * Metadata types and flags for scrub operation.
>   */
>  #define XFS_SCRUB_TYPE_TEST	0	/* dummy to test ioctl */
> -#define XFS_SCRUB_TYPE_MAX	0
> +#define XFS_SCRUB_TYPE_METABUFS	1	/* in-core metadata buffers */
> +#define XFS_SCRUB_TYPE_MAX	1
>
>  /* i: repair this metadata */
>  #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 331aa14..e06131f 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -610,6 +610,10 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
>  		.setup	= xfs_scrub_setup_fs,
>  		.scrub	= xfs_scrub_dummy,
>  	},
> +	{ /* in-memory metadata buffers */
> +		.setup	= xfs_scrub_setup_metabufs,
> +		.scrub	= xfs_scrub_metabufs,
> +	},
>  };
>
>  /* Dispatch metadata scrubbing. */
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index 15baccb..5f0818c 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -198,12 +198,14 @@ int xfs_scrub_ag_btcur_init(struct xfs_scrub_context *sc,
>
>  #define SETUP_FN(name) int name(struct xfs_scrub_context *sc, struct xfs_inode *ip)
>  SETUP_FN(xfs_scrub_setup_fs);
> +SETUP_FN(xfs_scrub_setup_metabufs);
>  #undef SETUP_FN
>
>  /* Metadata scrubbers */
>
>  #define SCRUB_FN(name) int name(struct xfs_scrub_context *sc)
>  SCRUB_FN(xfs_scrub_dummy);
> +SCRUB_FN(xfs_scrub_metabufs);
>  #undef SCRUB_FN
>
>  #endif	/* __XFS_REPAIR_COMMON_H__ */
> diff --git a/fs/xfs/scrub/metabufs.c b/fs/xfs/scrub/metabufs.c
> new file mode 100644
> index 0000000..63faaa6
> --- /dev/null
> +++ b/fs/xfs/scrub/metabufs.c
> @@ -0,0 +1,177 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_sb.h"
> +#include "scrub/common.h"
> +
> +/* We only iterate buffers one by one, so we don't need any setup. */
> +int
> +xfs_scrub_setup_metabufs(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	return 0;
> +}
> +
> +#define XFS_SCRUB_METABUFS_TOO_MANY_RETRIES	10
> +struct xfs_scrub_metabufs_info {
> +	struct xfs_scrub_context	*sc;
> +	unsigned int			retries;
> +};
> +
> +/* In-memory buffer corruption. */
> +
> +#define XFS_SCRUB_BUF_OP_ERROR_GOTO(label) \
> +	XFS_SCRUB_OP_ERROR_GOTO(smi->sc, \
> +			xfs_daddr_to_agno(smi->sc->mp, bp->b_bn), \
> +			xfs_daddr_to_agbno(smi->sc->mp, bp->b_bn), "buf", \
> +			&error, label)
> +STATIC int
> +xfs_scrub_metabufs_scrub_buf(
> +	struct xfs_scrub_metabufs_info	*smi,
> +	struct xfs_buf			*bp)
> +{
> +	int				olderror;
> +	int				error = 0;
> +
> +	/*
> +	 * We hold the rcu lock during the rhashtable walk, so we can't risk
> +	 * having the log forced due to a stale buffer by xfs_buf_lock.
> +	 */
> +	if (bp->b_flags & XBF_STALE)
> +		return 0;
> +
> +	atomic_inc(&bp->b_hold);
> +	if (!xfs_buf_trylock(bp)) {
> +		if (smi->retries > XFS_SCRUB_METABUFS_TOO_MANY_RETRIES) {
> +			/* We've retried too many times, do what we can. */
> +			XFS_SCRUB_INCOMPLETE(smi->sc, "metabufs", true);
> +			error = 0;
> +		} else {
> +			/* Restart the metabuf scrub from the start. */
> +			smi->retries++;
> +			error = -EAGAIN;
> +		}
> +		goto out_dec;
> +	}
> +
> +	/* Skip this buffer if it's stale, unread, or has no verifiers. */
> +	if ((bp->b_flags & XBF_STALE) ||
> +	    !(bp->b_flags & XBF_DONE) ||
> +	    !bp->b_ops)
> +		goto out_unlock;
> +
> +	/*
> +	 * Run the verifiers to see if the in-memory buffer is bitrotting or
> +	 * otherwise corrupt.  If the buffer doesn't have a log item then
> +	 * it's clean, so call the read verifier.  However, if the buffer
> +	 * has a log item, it is probably dirty.  Checksums will be written
> +	 * when the buffer is about to go out to disk, so call the write
> +	 * verifier to check the structure.
> +	 */
> +	olderror = bp->b_error;
> +	if (bp->b_fspriv)
> +		bp->b_ops->verify_write(bp);
> +	else
> +		bp->b_ops->verify_read(bp);
> +	error = bp->b_error;
> +	bp->b_error = olderror;
> +
> +	/* Mark any corruption errors we might find. */
> +	XFS_SCRUB_BUF_OP_ERROR_GOTO(out_unlock);
> +
> +out_unlock:
> +	xfs_buf_unlock(bp);
> +out_dec:
> +	atomic_dec(&bp->b_hold);
> +	return error;
> +}
> +#undef XFS_SCRUB_BUF_OP_ERROR_GOTO
> +
> +/* Walk the buffer rhashtable and dispatch buffer checking. */
> +STATIC int
> +xfs_scrub_metabufs_walk_rhash(
> +	struct xfs_scrub_metabufs_info	*smi,
> +	struct rhashtable_iter		*iter)
> +{
> +	struct xfs_buf			*bp;
> +	int				error = 0;
> +
> +	do {
> +		if (xfs_scrub_should_terminate(&error))
> +			break;
> +
> +		bp = rhashtable_walk_next(iter);
> +		if (IS_ERR(bp))
> +			return PTR_ERR(bp);
> +		else if (bp == NULL)
> +			return 0;
> +
> +		error = xfs_scrub_metabufs_scrub_buf(smi, bp);
> +	} while (error != 0);
> +
> +	return error;
> +}
> +
> +/* Try to walk the buffers in this AG in order to scrub them. */
> +int
> +xfs_scrub_metabufs(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_scrub_metabufs_info	smi;
> +	struct rhashtable_iter		iter;
> +	struct xfs_perag		*pag;
> +	int				error;
> +
> +	smi.sc = sc;
> +	smi.retries = 0;
> +	pag = xfs_perag_get(sc->mp, sc->sm->sm_agno);
> +	rhashtable_walk_enter(&pag->pag_buf_hash, &iter);
> +
> +	while (1) {
> +		if (xfs_scrub_should_terminate(&error))
> +			break;
> +
> +		error = rhashtable_walk_start(&iter);
> +		if (!error) {
> +			error = xfs_scrub_metabufs_walk_rhash(&smi, &iter);
> +			rhashtable_walk_stop(&iter);
> +		}
> +
> +		if (error != -EAGAIN)
> +			break;
> +		cond_resched();
> +	}
I suppose it's unlikely that we end up looping too many times, but do
you think we should have a max number of tries just in case?

Rest of the patch looks good.
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> +
> +	rhashtable_walk_exit(&iter);
> +	xfs_perag_put(pag);
> +	return error;
> +}
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index d4de29b..036e65c 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3312,7 +3312,8 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
>
>  /* scrub */
>  #define XFS_SCRUB_TYPE_DESC \
> -	{ XFS_SCRUB_TYPE_TEST,		"dummy" }
> +	{ XFS_SCRUB_TYPE_TEST,		"dummy" }, \
> +	{ XFS_SCRUB_TYPE_METABUFS,	"metabufs" }
>  DECLARE_EVENT_CLASS(xfs_scrub_class,
>  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
>  		 int error),
>


* Re: [PATCH 06/22] xfs: scrub the backup superblocks
  2017-07-21  4:39 ` [PATCH 06/22] xfs: scrub the backup superblocks Darrick J. Wong
@ 2017-07-23 16:50   ` Allison Henderson
  2017-07-25  4:05   ` Dave Chinner
  1 sibling, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 16:50 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

This one looks ok to me.  Most of the work is in the macros, and I think
it looks pretty clean.

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
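
For readers unfamiliar with the macro style used in the patch below, here
is a minimal userspace sketch of the same idea: a check helper receives
the stringified expression plus __func__ and __LINE__, and a per-field
macro pastes the field name into a comparison.  All demo_* and DEMO_*
names are hypothetical and the two-field superblock is an assumption made
for illustration; the real XFS_SCRUB_CHECK machinery does more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical scrub state: just a count of failed checks. */
struct demo_ctx {
	unsigned int	corrupt;
};

/* Hypothetical two-field stand-in for struct xfs_sb. */
struct demo_sb {
	unsigned int	blocksize;
	unsigned int	agcount;
};

/* Record and report a failed check; returns the check result. */
static bool demo_check(struct demo_ctx *ctx, bool ok, const char *check,
		       const char *func, int line)
{
	assert(ctx != NULL);

	if (!ok) {
		ctx->corrupt++;
		fprintf(stderr, "%s:%d: check failed: %s\n",
			func, line, check);
	}
	return ok;
}

/* #ok turns the tested expression into a string for the report. */
#define DEMO_CHECK(ctx, ok) \
	demo_check((ctx), (ok), #ok, __func__, __LINE__)

/* Compare one named field between the backup and the primary. */
#define DEMO_SB_FIELD(ctx, backup, primary, fn) \
	DEMO_CHECK((ctx), (backup)->fn == (primary)->fn)

/* Returns the number of fields that do not match. */
static unsigned int demo_compare_sb(const struct demo_sb *backup,
				    const struct demo_sb *primary)
{
	struct demo_ctx	ctx = { 0 };

	DEMO_SB_FIELD(&ctx, backup, primary, blocksize);
	DEMO_SB_FIELD(&ctx, backup, primary, agcount);
	return ctx.corrupt;
}
```

The appeal of this layout is that each field check stays a single line,
while the failure report still names the exact expression and call site.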

On 7/20/2017 9:39 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Ensure that the geometry presented in the backup superblocks matches
> the primary superblock so that repair can recover the filesystem if
> that primary gets corrupted.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile         |    1
>  fs/xfs/libxfs/xfs_fs.h  |    3 -
>  fs/xfs/scrub/agheader.c |  197 +++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/common.c   |    4 +
>  fs/xfs/scrub/common.h   |    2
>  fs/xfs/xfs_trace.h      |    3 -
>  6 files changed, 208 insertions(+), 2 deletions(-)
>  create mode 100644 fs/xfs/scrub/agheader.c
>
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 67cf4ac..1b9bd1a 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -140,6 +140,7 @@ xfs-$(CONFIG_EXPORTFS_BLOCK_OPS)	+= xfs_pnfs.o
>  # online scrub/repair
>  ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
>  xfs-y				+= $(addprefix scrub/, \
> +				   agheader.o \
>  				   btree.o \
>  				   common.o \
>  				   metabufs.o \
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 9fb3c65..2f12fb1 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -483,7 +483,8 @@ struct xfs_scrub_metadata {
>   */
>  #define XFS_SCRUB_TYPE_TEST	0	/* dummy to test ioctl */
>  #define XFS_SCRUB_TYPE_METABUFS	1	/* in-core metadata buffers */
> -#define XFS_SCRUB_TYPE_MAX	1
> +#define XFS_SCRUB_TYPE_SB	2	/* superblock */
> +#define XFS_SCRUB_TYPE_MAX	2
>
>  /* i: repair this metadata */
>  #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
> new file mode 100644
> index 0000000..3bca60d
> --- /dev/null
> +++ b/fs/xfs/scrub/agheader.c
> @@ -0,0 +1,197 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_sb.h"
> +#include "xfs_inode.h"
> +#include "scrub/common.h"
> +
> +/* Set us up to check an AG header. */
> +int
> +xfs_scrub_setup_ag_header(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +
> +	if (sc->sm->sm_agno >= mp->m_sb.sb_agcount ||
> +	    sc->sm->sm_ino || sc->sm->sm_gen)
> +		return -EINVAL;
> +	return xfs_scrub_setup_fs(sc, ip);
> +}
> +
> +/* Superblock */
> +
> +#define XFS_SCRUB_SB_CHECK(fs_ok) \
> +	XFS_SCRUB_CHECK(sc, bp, "superblock", fs_ok)
> +#define XFS_SCRUB_SB_PREEN(fs_ok) \
> +	XFS_SCRUB_PREEN(sc, bp, "superblock", fs_ok)
> +#define XFS_SCRUB_SB_OP_ERROR_GOTO(label) \
> +	XFS_SCRUB_OP_ERROR_GOTO(sc, agno, 0, "superblock", &error, out)
> +/* Scrub the filesystem superblock. */
> +int
> +xfs_scrub_superblock(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_buf			*bp;
> +	struct xfs_sb			sb;
> +	xfs_agnumber_t			agno;
> +	uint32_t			v2_ok;
> +	int				error;
> +
> +	agno = sc->sm->sm_agno;
> +
> +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> +		  XFS_AGB_TO_DADDR(mp, agno, XFS_SB_BLOCK(mp)),
> +		  XFS_FSS_TO_BB(mp, 1), 0, &bp, &xfs_sb_buf_ops);
> +	if (error) {
> +		trace_xfs_scrub_block_error(mp, agno, XFS_SB_BLOCK(mp),
> +				"superblock", "error != 0", __func__, __LINE__);
> +		error = 0;
> +		sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
> +		goto out;
> +	}
> +
> +	/*
> +	 * The in-core sb is a more up-to-date copy of AG 0's sb,
> +	 * so there's no point in comparing the two.
> +	 */
> +	if (agno == 0)
> +		goto out;
> +
> +	xfs_sb_from_disk(&sb, XFS_BUF_TO_SBP(bp));
> +
> +	/* Verify the geometries match. */
> +#define XFS_SCRUB_SB_FIELD(fn) \
> +		XFS_SCRUB_SB_CHECK(sb.sb_##fn == mp->m_sb.sb_##fn)
> +#define XFS_PREEN_SB_FIELD(fn) \
> +		XFS_SCRUB_SB_PREEN(sb.sb_##fn == mp->m_sb.sb_##fn)
> +	XFS_SCRUB_SB_FIELD(blocksize);
> +	XFS_SCRUB_SB_FIELD(dblocks);
> +	XFS_SCRUB_SB_FIELD(rblocks);
> +	XFS_SCRUB_SB_FIELD(rextents);
> +	XFS_SCRUB_SB_PREEN(uuid_equal(&sb.sb_uuid, &mp->m_sb.sb_uuid));
> +	XFS_SCRUB_SB_FIELD(logstart);
> +	XFS_PREEN_SB_FIELD(rootino);
> +	XFS_PREEN_SB_FIELD(rbmino);
> +	XFS_PREEN_SB_FIELD(rsumino);
> +	XFS_SCRUB_SB_FIELD(rextsize);
> +	XFS_SCRUB_SB_FIELD(agblocks);
> +	XFS_SCRUB_SB_FIELD(agcount);
> +	XFS_SCRUB_SB_FIELD(rbmblocks);
> +	XFS_SCRUB_SB_FIELD(logblocks);
> +	XFS_SCRUB_SB_CHECK(!(sb.sb_versionnum & ~XFS_SB_VERSION_OKBITS));
> +	XFS_SCRUB_SB_CHECK(XFS_SB_VERSION_NUM(&sb) ==
> +			   XFS_SB_VERSION_NUM(&mp->m_sb));
> +	XFS_SCRUB_SB_FIELD(sectsize);
> +	XFS_SCRUB_SB_FIELD(inodesize);
> +	XFS_SCRUB_SB_FIELD(inopblock);
> +	XFS_SCRUB_SB_PREEN(memcmp(sb.sb_fname, mp->m_sb.sb_fname,
> +			   sizeof(sb.sb_fname)) == 0);
> +	XFS_SCRUB_SB_FIELD(blocklog);
> +	XFS_SCRUB_SB_FIELD(sectlog);
> +	XFS_SCRUB_SB_FIELD(inodelog);
> +	XFS_SCRUB_SB_FIELD(inopblog);
> +	XFS_SCRUB_SB_FIELD(agblklog);
> +	XFS_SCRUB_SB_FIELD(rextslog);
> +	XFS_PREEN_SB_FIELD(imax_pct);
> +	XFS_PREEN_SB_FIELD(uquotino);
> +	XFS_PREEN_SB_FIELD(gquotino);
> +	XFS_SCRUB_SB_FIELD(shared_vn);
> +	XFS_SCRUB_SB_FIELD(inoalignmt);
> +	XFS_PREEN_SB_FIELD(unit);
> +	XFS_PREEN_SB_FIELD(width);
> +	XFS_SCRUB_SB_FIELD(dirblklog);
> +	XFS_SCRUB_SB_FIELD(logsectlog);
> +	XFS_SCRUB_SB_FIELD(logsectsize);
> +	XFS_SCRUB_SB_FIELD(logsunit);
> +	v2_ok = XFS_SB_VERSION2_OKBITS;
> +	if (XFS_SB_VERSION_NUM(&sb) >= XFS_SB_VERSION_5)
> +		v2_ok |= XFS_SB_VERSION2_CRCBIT;
> +	XFS_SCRUB_SB_CHECK(!(sb.sb_features2 & ~v2_ok));
> +	XFS_SCRUB_SB_PREEN(sb.sb_features2 == sb.sb_bad_features2);
> +	XFS_SCRUB_SB_CHECK(!sb.sb_features2 ||
> +			xfs_sb_version_hasmorebits(&mp->m_sb));
> +	if (xfs_sb_version_hascrc(&mp->m_sb)) {
> +		XFS_SCRUB_SB_CHECK(!xfs_sb_has_compat_feature(&sb,
> +				XFS_SB_FEAT_COMPAT_UNKNOWN));
> +		XFS_SCRUB_SB_CHECK(!xfs_sb_has_ro_compat_feature(&sb,
> +				XFS_SB_FEAT_RO_COMPAT_UNKNOWN));
> +		XFS_SCRUB_SB_CHECK(!xfs_sb_has_incompat_feature(&sb,
> +				XFS_SB_FEAT_INCOMPAT_UNKNOWN));
> +		XFS_SCRUB_SB_CHECK(!xfs_sb_has_incompat_log_feature(&sb,
> +				XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN));
> +		XFS_SCRUB_SB_FIELD(spino_align);
> +		XFS_PREEN_SB_FIELD(pquotino);
> +	}
> +	if (xfs_sb_version_hasmetauuid(&mp->m_sb)) {
> +		XFS_SCRUB_SB_CHECK(uuid_equal(&sb.sb_meta_uuid,
> +					&mp->m_sb.sb_meta_uuid));
> +		XFS_SCRUB_SB_CHECK(uuid_equal(&sb.sb_uuid,
> +					&mp->m_sb.sb_uuid));
> +	} else
> +		XFS_SCRUB_SB_CHECK(uuid_equal(&sb.sb_uuid,
> +					&mp->m_sb.sb_meta_uuid));
> +#undef XFS_SCRUB_SB_FIELD
> +
> +#define XFS_SCRUB_SB_FEAT(fn) \
> +		XFS_SCRUB_SB_CHECK(xfs_sb_version_has##fn(&sb) == \
> +		xfs_sb_version_has##fn(&mp->m_sb))
> +	XFS_SCRUB_SB_FEAT(align);
> +	XFS_SCRUB_SB_FEAT(dalign);
> +	XFS_SCRUB_SB_FEAT(logv2);
> +	XFS_SCRUB_SB_FEAT(extflgbit);
> +	XFS_SCRUB_SB_FEAT(sector);
> +	XFS_SCRUB_SB_FEAT(asciici);
> +	XFS_SCRUB_SB_FEAT(morebits);
> +	XFS_SCRUB_SB_FEAT(lazysbcount);
> +	XFS_SCRUB_SB_FEAT(crc);
> +	XFS_SCRUB_SB_FEAT(_pquotino);
> +	XFS_SCRUB_SB_FEAT(ftype);
> +	XFS_SCRUB_SB_FEAT(finobt);
> +	XFS_SCRUB_SB_FEAT(sparseinodes);
> +	XFS_SCRUB_SB_FEAT(metauuid);
> +	XFS_SCRUB_SB_FEAT(rmapbt);
> +	XFS_SCRUB_SB_FEAT(reflink);
> +#undef XFS_SCRUB_SB_FEAT
> +
> +#define XFS_SCRUB_SB_FEAT_PREEN(fn) \
> +		XFS_SCRUB_SB_PREEN(xfs_sb_version_has##fn(&sb) == \
> +		xfs_sb_version_has##fn(&mp->m_sb))
> +	XFS_SCRUB_SB_FEAT_PREEN(attr);
> +	XFS_SCRUB_SB_FEAT_PREEN(attr2);
> +#undef XFS_SCRUB_SB_FEAT_PREEN
> +
> +out:
> +	return error;
> +}
> +#undef XFS_SCRUB_SB_OP_ERROR_GOTO
> +#undef XFS_SCRUB_SB_CHECK
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index e06131f..9285107 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -614,6 +614,10 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
>  		.setup	= xfs_scrub_setup_metabufs,
>  		.scrub	= xfs_scrub_metabufs,
>  	},
> +	{ /* superblock */
> +		.setup	= xfs_scrub_setup_ag_header,
> +		.scrub	= xfs_scrub_superblock,
> +	},
>  };
>
>  /* Dispatch metadata scrubbing. */
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index 5f0818c..094a708 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -199,6 +199,7 @@ int xfs_scrub_ag_btcur_init(struct xfs_scrub_context *sc,
>  #define SETUP_FN(name) int name(struct xfs_scrub_context *sc, struct xfs_inode *ip)
>  SETUP_FN(xfs_scrub_setup_fs);
>  SETUP_FN(xfs_scrub_setup_metabufs);
> +SETUP_FN(xfs_scrub_setup_ag_header);
>  #undef SETUP_FN
>
>  /* Metadata scrubbers */
> @@ -206,6 +207,7 @@ SETUP_FN(xfs_scrub_setup_metabufs);
>  #define SCRUB_FN(name) int name(struct xfs_scrub_context *sc)
>  SCRUB_FN(xfs_scrub_dummy);
>  SCRUB_FN(xfs_scrub_metabufs);
> +SCRUB_FN(xfs_scrub_superblock);
>  #undef SCRUB_FN
>
>  #endif	/* __XFS_REPAIR_COMMON_H__ */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 036e65c..483008a 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3313,7 +3313,8 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
>  /* scrub */
>  #define XFS_SCRUB_TYPE_DESC \
>  	{ XFS_SCRUB_TYPE_TEST,		"dummy" }, \
> -	{ XFS_SCRUB_TYPE_METABUFS,	"metabufs" }
> +	{ XFS_SCRUB_TYPE_METABUFS,	"metabufs" }, \
> +	{ XFS_SCRUB_TYPE_SB,		"superblock" }
>  DECLARE_EVENT_CLASS(xfs_scrub_class,
>  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
>  		 int error),
>


* Re: [PATCH 07/22] xfs: scrub AGF and AGFL
  2017-07-21  4:39 ` [PATCH 07/22] xfs: scrub AGF and AGFL Darrick J. Wong
@ 2017-07-23 16:59   ` Allison Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 16:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Looks good.  Similar to the other scrub routines.
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
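
As a rough illustration of what "make sense" can mean for a block
reference, the sketch below mirrors the xfs_scrub_ag_blocks() helper from
the patch quoted below: every AG but the last has sb_agblocks blocks, and
the last AG gets whatever remains of sb_dblocks.  The demo_* names and the
demo_agbno_ok() bounds check are hypothetical, and widening the
multiplication to 64 bits is this sketch's own choice rather than
something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the superblock geometry fields. */
struct demo_geom {
	uint64_t	dblocks;	/* total data blocks in the fs */
	uint32_t	agblocks;	/* blocks per full-size AG */
	uint32_t	agcount;	/* number of AGs */
};

/* Size of the given AG in blocks; the last AG may be short. */
static uint32_t demo_ag_blocks(const struct demo_geom *g, uint32_t agno)
{
	assert(agno < g->agcount);

	if (agno < g->agcount - 1)
		return g->agblocks;
	/* Widen before multiplying so large filesystems don't wrap. */
	return (uint32_t)(g->dblocks - (uint64_t)agno * g->agblocks);
}

/* Assumed sanity check: an AG block number must lie inside its AG. */
static bool demo_agbno_ok(const struct demo_geom *g, uint32_t agno,
			  uint32_t agbno)
{
	return agbno < demo_ag_blocks(g, agno);
}
```

For example, with 1000 data blocks split into 4 AGs of 300 blocks, the
last AG holds only 100 blocks, so block 100 in that AG is out of range.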

On 7/20/2017 9:39 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Check the block references in the AGF and AGFL headers to make sure
> they make sense.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_fs.h  |    4 +
>  fs/xfs/scrub/agheader.c |  227 +++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/common.c   |   68 ++++++++++++++
>  fs/xfs/scrub/common.h   |    8 ++
>  fs/xfs/xfs_trace.h      |    4 +
>  5 files changed, 309 insertions(+), 2 deletions(-)
>
>
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 2f12fb1..cc35b7d 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -484,7 +484,9 @@ struct xfs_scrub_metadata {
>  #define XFS_SCRUB_TYPE_TEST	0	/* dummy to test ioctl */
>  #define XFS_SCRUB_TYPE_METABUFS	1	/* in-core metadata buffers */
>  #define XFS_SCRUB_TYPE_SB	2	/* superblock */
> -#define XFS_SCRUB_TYPE_MAX	2
> +#define XFS_SCRUB_TYPE_AGF	3	/* AG free header */
> +#define XFS_SCRUB_TYPE_AGFL	4	/* AG free list */
> +#define XFS_SCRUB_TYPE_MAX	4
>
>  /* i: repair this metadata */
>  #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
> index 3bca60d..48e276c 100644
> --- a/fs/xfs/scrub/agheader.c
> +++ b/fs/xfs/scrub/agheader.c
> @@ -47,6 +47,72 @@ xfs_scrub_setup_ag_header(
>  	return xfs_scrub_setup_fs(sc, ip);
>  }
>
> +/* Find the size of the AG, in blocks. */
> +static inline xfs_agblock_t
> +xfs_scrub_ag_blocks(
> +	struct xfs_mount	*mp,
> +	xfs_agnumber_t		agno)
> +{
> +	ASSERT(agno < mp->m_sb.sb_agcount);
> +
> +	if (agno < mp->m_sb.sb_agcount - 1)
> +		return mp->m_sb.sb_agblocks;
> +	return mp->m_sb.sb_dblocks - (agno * mp->m_sb.sb_agblocks);
> +}
> +
> +/* Walk all the blocks in the AGFL. */
> +int
> +xfs_scrub_walk_agfl(
> +	struct xfs_scrub_context	*sc,
> +	int				(*fn)(struct xfs_scrub_context *,
> +					      xfs_agblock_t bno, void *),
> +	void				*priv)
> +{
> +	struct xfs_agf			*agf;
> +	__be32				*agfl_bno;
> +	struct xfs_mount		*mp = sc->mp;
> +	unsigned int			flfirst;
> +	unsigned int			fllast;
> +	int				i;
> +	int				error;
> +
> +	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
> +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, sc->sa.agfl_bp);
> +	flfirst = be32_to_cpu(agf->agf_flfirst);
> +	fllast = be32_to_cpu(agf->agf_fllast);
> +
> +	/* Skip an empty AGFL. */
> +	if (agf->agf_flcount == cpu_to_be32(0))
> +		return 0;
> +
> +	/* first to last is a consecutive list. */
> +	if (fllast >= flfirst) {
> +		for (i = flfirst; i <= fllast; i++) {
> +			error = fn(sc, be32_to_cpu(agfl_bno[i]), priv);
> +			if (error)
> +				return error;
> +		}
> +
> +		return 0;
> +	}
> +
> +	/* first to the end */
> +	for (i = flfirst; i < XFS_AGFL_SIZE(mp); i++) {
> +		error = fn(sc, be32_to_cpu(agfl_bno[i]), priv);
> +		if (error)
> +			return error;
> +	}
> +
> +	/* the start to last. */
> +	for (i = 0; i <= fllast; i++) {
> +		error = fn(sc, be32_to_cpu(agfl_bno[i]), priv);
> +		if (error)
> +			return error;
> +	}
> +
> +	return 0;
> +}
> +
>  /* Superblock */
>
>  #define XFS_SCRUB_SB_CHECK(fs_ok) \
> @@ -195,3 +261,164 @@ xfs_scrub_superblock(
>  }
>  #undef XFS_SCRUB_SB_OP_ERROR_GOTO
>  #undef XFS_SCRUB_SB_CHECK
> +
> +/* AGF */
> +
> +#define XFS_SCRUB_AGF_CHECK(fs_ok) \
> +	XFS_SCRUB_CHECK(sc, sc->sa.agf_bp, "AGF", fs_ok)
> +#define XFS_SCRUB_AGF_OP_ERROR_GOTO(error, label) \
> +	XFS_SCRUB_OP_ERROR_GOTO(sc, sc->sm->sm_agno, \
> +			XFS_AGF_BLOCK(sc->mp), "AGF", error, label)
> +/* Scrub the AGF. */
> +int
> +xfs_scrub_agf(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_agf			*agf;
> +	xfs_daddr_t			daddr;
> +	xfs_daddr_t			eofs;
> +	xfs_agnumber_t			agno;
> +	xfs_agblock_t			agbno;
> +	xfs_agblock_t			eoag;
> +	xfs_agblock_t			agfl_first;
> +	xfs_agblock_t			agfl_last;
> +	xfs_agblock_t			agfl_count;
> +	xfs_agblock_t			fl_count;
> +	int				level;
> +	int				error = 0;
> +
> +	agno = sc->sm->sm_agno;
> +	error = xfs_scrub_load_ag_headers(sc, agno, XFS_SCRUB_TYPE_AGF);
> +	XFS_SCRUB_AGF_OP_ERROR_GOTO(&error, out);
> +
> +	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
> +	eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks);
> +
> +	/* Check the AG length */
> +	eoag = be32_to_cpu(agf->agf_length);
> +	XFS_SCRUB_AGF_CHECK(eoag == xfs_scrub_ag_blocks(mp, agno));
> +
> +	/* Check the AGF btree roots and levels */
> +	agbno = be32_to_cpu(agf->agf_roots[XFS_BTNUM_BNO]);
> +	daddr = XFS_AGB_TO_DADDR(mp, agno, agbno);
> +	XFS_SCRUB_AGF_CHECK(agbno > XFS_AGI_BLOCK(mp));
> +	XFS_SCRUB_AGF_CHECK(agbno < mp->m_sb.sb_agblocks);
> +	XFS_SCRUB_AGF_CHECK(agbno < eoag);
> +	XFS_SCRUB_AGF_CHECK(daddr < eofs);
> +
> +	agbno = be32_to_cpu(agf->agf_roots[XFS_BTNUM_CNT]);
> +	daddr = XFS_AGB_TO_DADDR(mp, agno, agbno);
> +	XFS_SCRUB_AGF_CHECK(agbno > XFS_AGI_BLOCK(mp));
> +	XFS_SCRUB_AGF_CHECK(agbno < mp->m_sb.sb_agblocks);
> +	XFS_SCRUB_AGF_CHECK(agbno < eoag);
> +	XFS_SCRUB_AGF_CHECK(daddr < eofs);
> +
> +	level = be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNO]);
> +	XFS_SCRUB_AGF_CHECK(level > 0);
> +	XFS_SCRUB_AGF_CHECK(level <= XFS_BTREE_MAXLEVELS);
> +
> +	level = be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNT]);
> +	XFS_SCRUB_AGF_CHECK(level > 0);
> +	XFS_SCRUB_AGF_CHECK(level <= XFS_BTREE_MAXLEVELS);
> +
> +	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> +		agbno = be32_to_cpu(agf->agf_roots[XFS_BTNUM_RMAP]);
> +		daddr = XFS_AGB_TO_DADDR(mp, agno, agbno);
> +		XFS_SCRUB_AGF_CHECK(agbno > XFS_AGI_BLOCK(mp));
> +		XFS_SCRUB_AGF_CHECK(agbno < mp->m_sb.sb_agblocks);
> +		XFS_SCRUB_AGF_CHECK(agbno < eoag);
> +		XFS_SCRUB_AGF_CHECK(daddr < eofs);
> +
> +		level = be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]);
> +		XFS_SCRUB_AGF_CHECK(level > 0);
> +		XFS_SCRUB_AGF_CHECK(level <= XFS_BTREE_MAXLEVELS);
> +	}
> +
> +	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> +		agbno = be32_to_cpu(agf->agf_refcount_root);
> +		daddr = XFS_AGB_TO_DADDR(mp, agno, agbno);
> +		XFS_SCRUB_AGF_CHECK(agbno > XFS_AGI_BLOCK(mp));
> +		XFS_SCRUB_AGF_CHECK(agbno < mp->m_sb.sb_agblocks);
> +		XFS_SCRUB_AGF_CHECK(agbno < eoag);
> +		XFS_SCRUB_AGF_CHECK(daddr < eofs);
> +
> +		level = be32_to_cpu(agf->agf_refcount_level);
> +		XFS_SCRUB_AGF_CHECK(level > 0);
> +		XFS_SCRUB_AGF_CHECK(level <= XFS_BTREE_MAXLEVELS);
> +	}
> +
> +	/* Check the AGFL counters */
> +	agfl_first = be32_to_cpu(agf->agf_flfirst);
> +	agfl_last = be32_to_cpu(agf->agf_fllast);
> +	agfl_count = be32_to_cpu(agf->agf_flcount);
> +	if (agfl_last >= agfl_first)
> +		fl_count = agfl_last - agfl_first + 1;
> +	else
> +		fl_count = XFS_AGFL_SIZE(mp) - agfl_first + agfl_last + 1;
> +	XFS_SCRUB_AGF_CHECK(agfl_count == 0 || fl_count == agfl_count);
> +
> +out:
> +	return error;
> +}
> +#undef XFS_SCRUB_AGF_OP_ERROR_GOTO
> +#undef XFS_SCRUB_AGF_CHECK
> +
> +/* AGFL */
> +
> +#define XFS_SCRUB_AGFL_CHECK(fs_ok) \
> +	XFS_SCRUB_CHECK(sc, sc->sa.agfl_bp, "AGFL", fs_ok)
> +struct xfs_scrub_agfl {
> +	xfs_agblock_t			eoag;
> +	xfs_daddr_t			eofs;
> +};
> +
> +/* Scrub an AGFL block. */
> +STATIC int
> +xfs_scrub_agfl_block(
> +	struct xfs_scrub_context	*sc,
> +	xfs_agblock_t			agbno,
> +	void				*priv)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	xfs_agnumber_t			agno = sc->sa.agno;
> +	struct xfs_scrub_agfl		*sagfl = priv;
> +
> +	XFS_SCRUB_AGFL_CHECK(agbno > XFS_AGI_BLOCK(mp));
> +	XFS_SCRUB_AGFL_CHECK(XFS_AGB_TO_DADDR(mp, agno, agbno) < sagfl->eofs);
> +	XFS_SCRUB_AGFL_CHECK(agbno < mp->m_sb.sb_agblocks);
> +	XFS_SCRUB_AGFL_CHECK(agbno < sagfl->eoag);
> +
> +	return 0;
> +}
> +
> +#define XFS_SCRUB_AGFL_OP_ERROR_GOTO(error, label) \
> +	XFS_SCRUB_OP_ERROR_GOTO(sc, sc->sm->sm_agno, \
> +			XFS_AGFL_BLOCK(sc->mp), "AGFL", error, label)
> +/* Scrub the AGFL. */
> +int
> +xfs_scrub_agfl(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_scrub_agfl		sagfl;
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_agf			*agf;
> +	int				error;
> +
> +	error = xfs_scrub_load_ag_headers(sc, sc->sm->sm_agno,
> +			XFS_SCRUB_TYPE_AGFL);
> +	XFS_SCRUB_AGFL_OP_ERROR_GOTO(&error, out);
> +	if (!sc->sa.agf_bp)
> +		return -EFSCORRUPTED;
> +
> +	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
> +	sagfl.eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks);
> +	sagfl.eoag = be32_to_cpu(agf->agf_length);
> +
> +	/* Check the blocks in the AGFL. */
> +	return xfs_scrub_walk_agfl(sc, xfs_scrub_agfl_block, &sagfl);
> +out:
> +	return error;
> +}
> +#undef XFS_SCRUB_AGFL_OP_ERROR_GOTO
> +#undef XFS_SCRUB_AGFL_CHECK
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 9285107..d1ef722 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -603,6 +603,66 @@ xfs_scrub_setup(
>  	return sc->fns->setup(sc, ip);
>  }
>
> +/*
> + * Load and verify an AG header for further AG header examination.
> + * If this header is not the target of the examination, don't return
> + * the buffer if a runtime or verifier error occurs.
> + */
> +STATIC int
> +xfs_scrub_load_ag_header(
> +	struct xfs_scrub_context	*sc,
> +	xfs_daddr_t			daddr,
> +	struct xfs_buf			**bpp,
> +	const struct xfs_buf_ops	*ops,
> +	bool				is_target)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	int				error;
> +
> +	*bpp = NULL;
> +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> +			XFS_AG_DADDR(mp, sc->sa.agno, daddr),
> +			XFS_FSS_TO_BB(mp, 1), 0, bpp, ops);
> +	return is_target ? error : 0;
> +}
> +
> +/*
> + * Load as many of the AG headers and btree cursors as we can for an
> + * examination and cross-reference of an AG header.
> + */
> +int
> +xfs_scrub_load_ag_headers(
> +	struct xfs_scrub_context	*sc,
> +	xfs_agnumber_t			agno,
> +	unsigned int			type)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	int				error;
> +
> +	ASSERT(type == XFS_SCRUB_TYPE_AGF || type == XFS_SCRUB_TYPE_AGFL);
> +	memset(&sc->sa, 0, sizeof(sc->sa));
> +	sc->sa.agno = agno;
> +
> +	error = xfs_scrub_load_ag_header(sc, XFS_AGI_DADDR(mp),
> +			&sc->sa.agi_bp, &xfs_agi_buf_ops, false);
> +	if (error)
> +		return error;
> +
> +	error = xfs_scrub_load_ag_header(sc, XFS_AGF_DADDR(mp),
> +			&sc->sa.agf_bp, &xfs_agf_buf_ops,
> +			type == XFS_SCRUB_TYPE_AGF);
> +	if (error)
> +		return error;
> +
> +	error = xfs_scrub_load_ag_header(sc, XFS_AGFL_DADDR(mp),
> +			&sc->sa.agfl_bp, &xfs_agfl_buf_ops,
> +			type == XFS_SCRUB_TYPE_AGFL);
> +	if (error)
> +		return error;
> +
> +	return 0;
> +}
> +
>  /* Scrubbing dispatch. */
>
>  static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
> @@ -618,6 +678,14 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
>  		.setup	= xfs_scrub_setup_ag_header,
>  		.scrub	= xfs_scrub_superblock,
>  	},
> +	{ /* agf */
> +		.setup	= xfs_scrub_setup_ag_header,
> +		.scrub	= xfs_scrub_agf,
> +	},
> +	{ /* agfl */
> +		.setup	= xfs_scrub_setup_ag_header,
> +		.scrub	= xfs_scrub_agfl,
> +	},
>  };
>
>  /* Dispatch metadata scrubbing. */
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index 094a708..5c4893d 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -193,6 +193,12 @@ int xfs_scrub_ag_read_headers(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
>  			      struct xfs_buf **agfl);
>  int xfs_scrub_ag_btcur_init(struct xfs_scrub_context *sc,
>  			    struct xfs_scrub_ag *sa);
> +int xfs_scrub_load_ag_headers(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
> +			      unsigned int type);
> +int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
> +			int (*fn)(struct xfs_scrub_context *, xfs_agblock_t bno,
> +				  void *),
> +			void *priv);
>
>  /* Setup functions */
>
> @@ -208,6 +214,8 @@ SETUP_FN(xfs_scrub_setup_ag_header);
>  SCRUB_FN(xfs_scrub_dummy);
>  SCRUB_FN(xfs_scrub_metabufs);
>  SCRUB_FN(xfs_scrub_superblock);
> +SCRUB_FN(xfs_scrub_agf);
> +SCRUB_FN(xfs_scrub_agfl);
>  #undef SCRUB_FN
>
>  #endif	/* __XFS_REPAIR_COMMON_H__ */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 483008a..ebf6045 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3314,7 +3314,9 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
>  #define XFS_SCRUB_TYPE_DESC \
>  	{ XFS_SCRUB_TYPE_TEST,		"dummy" }, \
>  	{ XFS_SCRUB_TYPE_METABUFS,	"metabufs" }, \
> -	{ XFS_SCRUB_TYPE_SB,		"superblock" }
> +	{ XFS_SCRUB_TYPE_SB,		"superblock" }, \
> +	{ XFS_SCRUB_TYPE_AGF,		"AGF" }, \
> +	{ XFS_SCRUB_TYPE_AGFL,		"AGFL" }
>  DECLARE_EVENT_CLASS(xfs_scrub_class,
>  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
>  		 int error),
>
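
For readers following the AGFL handling in the patch above, here is a
minimal userspace sketch of the circular free-list walk.  It is not the
kernel code: the names agfl_active_slots, agfl_walk, agfl_tally and
agfl_tally_cb are hypothetical, the slots are plain host-endian integers
rather than big-endian on-disk values, and the slot count is computed
with the same "last >= first" convention that xfs_scrub_walk_agfl uses.

```c
#include <assert.h>
#include <stdint.h>

/* Callback invoked for each active slot; a nonzero return aborts the walk. */
typedef int (*agfl_walk_fn)(uint32_t bno, void *priv);

/*
 * Number of active slots from first to last (inclusive) in a circular
 * array of `size` slots.  Hypothetical helper, not a kernel function.
 */
static uint32_t
agfl_active_slots(uint32_t first, uint32_t last, uint32_t size)
{
	assert(first < size && last < size);

	if (last >= first)
		return last - first + 1;
	/* The list wraps: first..end of array, then start..last. */
	return size - first + last + 1;
}

/* Visit every active slot in order, wrapping at the end of the array. */
static int
agfl_walk(const uint32_t *slots, uint32_t size, uint32_t first,
	  uint32_t last, uint32_t count, agfl_walk_fn fn, void *priv)
{
	uint32_t i, n;
	int error;

	/* An empty list has nothing to visit, whatever first/last say. */
	if (count == 0)
		return 0;

	n = agfl_active_slots(first, last, size);
	for (i = first; n > 0; n--, i = (i + 1) % size) {
		error = fn(slots[i], priv);
		if (error)
			return error;
	}
	return 0;
}

/* Example callback state: how many slots were seen, and their sum. */
struct agfl_tally {
	uint32_t visited;
	uint64_t sum;
};

static int
agfl_tally_cb(uint32_t bno, void *priv)
{
	struct agfl_tally *t = priv;

	t->visited++;
	t->sum += bno;
	return 0;
}
```

A single modular loop replaces the three separate loops in the patch;
both forms should visit the same slots in the same order.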

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 08/22] xfs: scrub the AGI
  2017-07-21  4:39 ` [PATCH 08/22] xfs: scrub the AGI Darrick J. Wong
@ 2017-07-23 17:02   ` Allison Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 17:02 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

On 7/20/2017 9:39 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Add a forgotten check to the AGI verifier, then wire up the scrub
> infrastructure to check the AGI contents.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_fs.h  |    3 +
>  fs/xfs/scrub/agheader.c |   96 +++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/common.c   |   10 ++++-
>  fs/xfs/scrub/common.h   |    1
>  fs/xfs/xfs_trace.h      |    3 +
>  5 files changed, 109 insertions(+), 4 deletions(-)
>
>
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index cc35b7d..208cc48 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -486,7 +486,8 @@ struct xfs_scrub_metadata {
>  #define XFS_SCRUB_TYPE_SB	2	/* superblock */
>  #define XFS_SCRUB_TYPE_AGF	3	/* AG free header */
>  #define XFS_SCRUB_TYPE_AGFL	4	/* AG free list */
> -#define XFS_SCRUB_TYPE_MAX	4
> +#define XFS_SCRUB_TYPE_AGI	5	/* AG inode header */
> +#define XFS_SCRUB_TYPE_MAX	5
>
>  /* i: repair this metadata */
>  #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
> index 48e276c..137d2ad 100644
> --- a/fs/xfs/scrub/agheader.c
> +++ b/fs/xfs/scrub/agheader.c
> @@ -422,3 +422,99 @@ xfs_scrub_agfl(
>  }
>  #undef XFS_SCRUB_AGFL_OP_ERROR_GOTO
>  #undef XFS_SCRUB_AGFL_CHECK
> +
> +/* AGI */
> +
> +#define XFS_SCRUB_AGI_CHECK(fs_ok) \
> +	XFS_SCRUB_CHECK(sc, sc->sa.agi_bp, "AGI", fs_ok)
> +#define XFS_SCRUB_AGI_OP_ERROR_GOTO(error, label) \
> +	XFS_SCRUB_OP_ERROR_GOTO(sc, sc->sm->sm_agno, \
> +			XFS_AGI_BLOCK(sc->mp), "AGI", error, label)
> +/* Scrub the AGI. */
> +int
> +xfs_scrub_agi(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_agi			*agi;
> +	xfs_daddr_t			daddr;
> +	xfs_daddr_t			eofs;
> +	xfs_agnumber_t			agno;
> +	xfs_agblock_t			agbno;
> +	xfs_agblock_t			eoag;
> +	xfs_agino_t			agino;
> +	xfs_agino_t			first_agino;
> +	xfs_agino_t			last_agino;
> +	int				i;
> +	int				level;
> +	int				error = 0;
> +
> +	agno = sc->sm->sm_agno;
> +	error = xfs_scrub_load_ag_headers(sc, agno, XFS_SCRUB_TYPE_AGI);
> +	XFS_SCRUB_AGI_OP_ERROR_GOTO(&error, out);
> +
> +	agi = XFS_BUF_TO_AGI(sc->sa.agi_bp);
> +	eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks);
> +
> +	/* Check the AG length */
> +	eoag = be32_to_cpu(agi->agi_length);
> +	XFS_SCRUB_AGI_CHECK(eoag == xfs_scrub_ag_blocks(mp, agno));
> +
> +	/* Check btree roots and levels */
> +	agbno = be32_to_cpu(agi->agi_root);
> +	daddr = XFS_AGB_TO_DADDR(mp, agno, agbno);
> +	XFS_SCRUB_AGI_CHECK(agbno > XFS_AGI_BLOCK(mp));
> +	XFS_SCRUB_AGI_CHECK(agbno < mp->m_sb.sb_agblocks);
> +	XFS_SCRUB_AGI_CHECK(agbno < eoag);
> +	XFS_SCRUB_AGI_CHECK(daddr < eofs);
> +
> +	level = be32_to_cpu(agi->agi_level);
> +	XFS_SCRUB_AGI_CHECK(level > 0);
> +	XFS_SCRUB_AGI_CHECK(level <= XFS_BTREE_MAXLEVELS);
> +
> +	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
> +		agbno = be32_to_cpu(agi->agi_free_root);
> +		daddr = XFS_AGB_TO_DADDR(mp, agno, agbno);
> +		XFS_SCRUB_AGI_CHECK(agbno > XFS_AGI_BLOCK(mp));
> +		XFS_SCRUB_AGI_CHECK(agbno < mp->m_sb.sb_agblocks);
> +		XFS_SCRUB_AGI_CHECK(agbno < eoag);
> +		XFS_SCRUB_AGI_CHECK(daddr < eofs);
> +
> +		level = be32_to_cpu(agi->agi_free_level);
> +		XFS_SCRUB_AGI_CHECK(level > 0);
> +		XFS_SCRUB_AGI_CHECK(level <= XFS_BTREE_MAXLEVELS);
> +	}
> +
> +	/* Check inode counters */
> +	first_agino = XFS_OFFBNO_TO_AGINO(mp, XFS_AGI_BLOCK(mp) + 1, 0);
> +	last_agino = XFS_OFFBNO_TO_AGINO(mp, eoag + 1, 0) - 1;
> +	agino = be32_to_cpu(agi->agi_count);
> +	XFS_SCRUB_AGI_CHECK(agino <= last_agino - first_agino + 1);
> +	XFS_SCRUB_AGI_CHECK(agino >= be32_to_cpu(agi->agi_freecount));
> +
> +	/* Check inode pointers */
> +	agino = be32_to_cpu(agi->agi_newino);
> +	if (agino != NULLAGINO) {
> +		XFS_SCRUB_AGI_CHECK(agino >= first_agino);
> +		XFS_SCRUB_AGI_CHECK(agino <= last_agino);
> +	}
> +	agino = be32_to_cpu(agi->agi_dirino);
> +	if (agino != NULLAGINO) {
> +		XFS_SCRUB_AGI_CHECK(agino >= first_agino);
> +		XFS_SCRUB_AGI_CHECK(agino <= last_agino);
> +	}
> +
> +	/* Check unlinked inode buckets */
> +	for (i = 0; i < XFS_AGI_UNLINKED_BUCKETS; i++) {
> +		agino = be32_to_cpu(agi->agi_unlinked[i]);
> +		if (agino == NULLAGINO)
> +			continue;
> +		XFS_SCRUB_AGI_CHECK(agino >= first_agino);
> +		XFS_SCRUB_AGI_CHECK(agino <= last_agino);
> +	}
> +
> +out:
> +	return error;
> +}
> +#undef XFS_SCRUB_AGI_CHECK
> +#undef XFS_SCRUB_AGI_OP_ERROR_GOTO
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index d1ef722..994c6c8 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -639,12 +639,14 @@ xfs_scrub_load_ag_headers(
>  	struct xfs_mount		*mp = sc->mp;
>  	int				error;
>
> -	ASSERT(type == XFS_SCRUB_TYPE_AGF || type == XFS_SCRUB_TYPE_AGFL);
> +	ASSERT(type == XFS_SCRUB_TYPE_AGF || type == XFS_SCRUB_TYPE_AGFL ||
> +	       type == XFS_SCRUB_TYPE_AGI);
>  	memset(&sc->sa, 0, sizeof(sc->sa));
>  	sc->sa.agno = agno;
>
>  	error = xfs_scrub_load_ag_header(sc, XFS_AGI_DADDR(mp),
> -			&sc->sa.agi_bp, &xfs_agi_buf_ops, false);
> +			&sc->sa.agi_bp, &xfs_agi_buf_ops,
> +			type == XFS_SCRUB_TYPE_AGI);
>  	if (error)
>  		return error;
>
> @@ -686,6 +688,10 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
>  		.setup	= xfs_scrub_setup_ag_header,
>  		.scrub	= xfs_scrub_agfl,
>  	},
> +	{ /* agi */
> +		.setup	= xfs_scrub_setup_ag_header,
> +		.scrub	= xfs_scrub_agi,
> +	},
>  };
>
>  /* Dispatch metadata scrubbing. */
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index 5c4893d..952151a 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -216,6 +216,7 @@ SCRUB_FN(xfs_scrub_metabufs);
>  SCRUB_FN(xfs_scrub_superblock);
>  SCRUB_FN(xfs_scrub_agf);
>  SCRUB_FN(xfs_scrub_agfl);
> +SCRUB_FN(xfs_scrub_agi);
>  #undef SCRUB_FN
>
>  #endif	/* __XFS_REPAIR_COMMON_H__ */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index ebf6045..24efbff 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3316,7 +3316,8 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
>  	{ XFS_SCRUB_TYPE_METABUFS,	"metabufs" }, \
>  	{ XFS_SCRUB_TYPE_SB,		"superblock" }, \
>  	{ XFS_SCRUB_TYPE_AGF,		"AGF" }, \
> -	{ XFS_SCRUB_TYPE_AGFL,		"AGFL" }
> +	{ XFS_SCRUB_TYPE_AGFL,		"AGFL" }, \
> +	{ XFS_SCRUB_TYPE_AGI,		"AGI" }
>  DECLARE_EVENT_CLASS(xfs_scrub_class,
>  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
>  		 int error),
>
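
The inode pointer checks in the AGI patch above compare each AG-relative
inode number against a [first, last] window derived from the AG length.
The sketch below restates that arithmetic in userspace C.  The struct
ag_geom and all function names are assumptions made for illustration;
the bounds (first inode in the block after the AGI, last inode computed
from eoag + 1) are copied from the quoted patch rather than from any
later kernel version.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified geometry; the real values live in the superblock. */
struct ag_geom {
	uint64_t dblocks;	/* total data blocks in the filesystem */
	uint32_t agblocks;	/* blocks in a full-size AG */
	uint32_t agcount;	/* number of AGs */
	uint32_t inopblog;	/* log2(inodes per block) */
	uint32_t agi_block;	/* AG block that holds the AGI header */
};

/* Length of an AG in blocks; only the last AG may be short. */
static uint32_t
ag_length(const struct ag_geom *g, uint32_t agno)
{
	assert(agno < g->agcount);

	if (agno < g->agcount - 1)
		return g->agblocks;
	return (uint32_t)(g->dblocks - (uint64_t)agno * g->agblocks);
}

/* AG-relative inode number of inode `offset` within AG block `agbno`. */
static uint32_t
offbno_to_agino(const struct ag_geom *g, uint32_t agbno, uint32_t offset)
{
	return (agbno << g->inopblog) | offset;
}

/* Does agino fall inside the window the quoted patch accepts? */
static int
agino_in_range(const struct ag_geom *g, uint32_t agno, uint32_t agino)
{
	uint32_t eoag = ag_length(g, agno);
	uint32_t first = offbno_to_agino(g, g->agi_block + 1, 0);
	uint32_t last = offbno_to_agino(g, eoag + 1, 0) - 1;

	return agino >= first && agino <= last;
}
```

With 8 inodes per block and the AGI in block 2, the first acceptable
inode number is 24; the upper bound shrinks for a short last AG.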

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 09/22] xfs: scrub free space btrees
  2017-07-21  4:39 ` [PATCH 09/22] xfs: scrub free space btrees Darrick J. Wong
@ 2017-07-23 17:09   ` Allison Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 17:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

On 7/20/2017 9:39 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Check the extent records of the free space btrees to ensure that the
> values look sane.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile        |    1
>  fs/xfs/libxfs/xfs_fs.h |    4 +-
>  fs/xfs/scrub/alloc.c   |  105 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/common.c  |   24 +++++++++++
>  fs/xfs/scrub/common.h  |    6 +++
>  fs/xfs/xfs_trace.h     |    4 +-
>  6 files changed, 142 insertions(+), 2 deletions(-)
>  create mode 100644 fs/xfs/scrub/alloc.c
>
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 1b9bd1a..ce492ee 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -141,6 +141,7 @@ xfs-$(CONFIG_EXPORTFS_BLOCK_OPS)	+= xfs_pnfs.o
>  ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
>  xfs-y				+= $(addprefix scrub/, \
>  				   agheader.o \
> +				   alloc.o \
>  				   btree.o \
>  				   common.o \
>  				   metabufs.o \
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 208cc48..bb36acf 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -487,7 +487,9 @@ struct xfs_scrub_metadata {
>  #define XFS_SCRUB_TYPE_AGF	3	/* AG free header */
>  #define XFS_SCRUB_TYPE_AGFL	4	/* AG free list */
>  #define XFS_SCRUB_TYPE_AGI	5	/* AG inode header */
> -#define XFS_SCRUB_TYPE_MAX	5
> +#define XFS_SCRUB_TYPE_BNOBT	6	/* freesp by block btree */
> +#define XFS_SCRUB_TYPE_CNTBT	7	/* freesp by length btree */
> +#define XFS_SCRUB_TYPE_MAX	7
>
>  /* i: repair this metadata */
>  #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> diff --git a/fs/xfs/scrub/alloc.c b/fs/xfs/scrub/alloc.c
> new file mode 100644
> index 0000000..1709ab2
> --- /dev/null
> +++ b/fs/xfs/scrub/alloc.c
> @@ -0,0 +1,105 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_sb.h"
> +#include "xfs_rmap.h"
> +#include "scrub/common.h"
> +#include "scrub/btree.h"
> +
> +/*
> + * Set us up to scrub free space btrees.
> + * Push everything out of the log so that the busy extent list is empty.
> + */
> +int
> +xfs_scrub_setup_ag_allocbt(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	return xfs_scrub_setup_ag_btree(sc, ip, sc->try_harder);
> +}
> +
> +/* Free space btree scrubber. */
> +
> +/* Scrub a bnobt/cntbt record. */
> +STATIC int
> +xfs_scrub_allocbt_helper(
> +	struct xfs_scrub_btree		*bs,
> +	union xfs_btree_rec		*rec)
> +{
> +	struct xfs_mount		*mp = bs->cur->bc_mp;
> +	struct xfs_agf			*agf;
> +	xfs_agblock_t			bno;
> +	xfs_extlen_t			len;
> +	int				error = 0;
> +
> +	bno = be32_to_cpu(rec->alloc.ar_startblock);
> +	len = be32_to_cpu(rec->alloc.ar_blockcount);
> +	agf = XFS_BUF_TO_AGF(bs->sc->sa.agf_bp);
> +
> +	XFS_SCRUB_BTREC_CHECK(bs, bno < mp->m_sb.sb_agblocks);
> +	XFS_SCRUB_BTREC_CHECK(bs, bno < be32_to_cpu(agf->agf_length));
> +	XFS_SCRUB_BTREC_CHECK(bs, bno < bno + len);
> +	XFS_SCRUB_BTREC_CHECK(bs, (unsigned long long)bno + len <=
> +			mp->m_sb.sb_agblocks);
> +	XFS_SCRUB_BTREC_CHECK(bs, (unsigned long long)bno + len <=
> +			be32_to_cpu(agf->agf_length));
> +
> +	return error;
> +}
> +
> +/* Scrub the freespace btrees for some AG. */
> +STATIC int
> +xfs_scrub_allocbt(
> +	struct xfs_scrub_context	*sc,
> +	xfs_btnum_t			which)
> +{
> +	struct xfs_owner_info		oinfo;
> +	struct xfs_btree_cur		*cur;
> +
> +	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
> +	cur = which == XFS_BTNUM_BNO ? sc->sa.bno_cur : sc->sa.cnt_cur;
> +	return xfs_scrub_btree(sc, cur, xfs_scrub_allocbt_helper,
> +			&oinfo, NULL);
> +}
> +
> +int
> +xfs_scrub_bnobt(
> +	struct xfs_scrub_context	*sc)
> +{
> +	return xfs_scrub_allocbt(sc, XFS_BTNUM_BNO);
> +}
> +
> +int
> +xfs_scrub_cntbt(
> +	struct xfs_scrub_context	*sc)
> +{
> +	return xfs_scrub_allocbt(sc, XFS_BTNUM_CNT);
> +}
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 994c6c8..86161b5 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -556,6 +556,22 @@ xfs_scrub_dummy(
>  	return 0;
>  }
>
> +/* Set us up with AG headers and btree cursors. */
> +int
> +xfs_scrub_setup_ag_btree(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip,
> +	bool				force_log)
> +{
> +	int				error;
> +
> +	error = xfs_scrub_setup_ag_header(sc, ip);
> +	if (error)
> +		return error;
> +
> +	return xfs_scrub_ag_init(sc, sc->sm->sm_agno, &sc->sa);
> +}
> +
>  /* Per-scrubber setup functions */
>
>  /* Set us up with a transaction and an empty context. */
> @@ -692,6 +708,14 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
>  		.setup	= xfs_scrub_setup_ag_header,
>  		.scrub	= xfs_scrub_agi,
>  	},
> +	{ /* bnobt */
> +		.setup	= xfs_scrub_setup_ag_allocbt,
> +		.scrub	= xfs_scrub_bnobt,
> +	},
> +	{ /* cntbt */
> +		.setup	= xfs_scrub_setup_ag_allocbt,
> +		.scrub	= xfs_scrub_cntbt,
> +	},
>  };
>
>  /* Dispatch metadata scrubbing. */
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index 952151a..f14abfb 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -202,10 +202,14 @@ int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
>
>  /* Setup functions */
>
> +int xfs_scrub_setup_ag_btree(struct xfs_scrub_context *sc,
> +			     struct xfs_inode *ip, bool force_log);
> +
>  #define SETUP_FN(name) int name(struct xfs_scrub_context *sc, struct xfs_inode *ip)
>  SETUP_FN(xfs_scrub_setup_fs);
>  SETUP_FN(xfs_scrub_setup_metabufs);
>  SETUP_FN(xfs_scrub_setup_ag_header);
> +SETUP_FN(xfs_scrub_setup_ag_allocbt);
>  #undef SETUP_FN
>
>  /* Metadata scrubbers */
> @@ -217,6 +221,8 @@ SCRUB_FN(xfs_scrub_superblock);
>  SCRUB_FN(xfs_scrub_agf);
>  SCRUB_FN(xfs_scrub_agfl);
>  SCRUB_FN(xfs_scrub_agi);
> +SCRUB_FN(xfs_scrub_bnobt);
> +SCRUB_FN(xfs_scrub_cntbt);
>  #undef SCRUB_FN
>
>  #endif	/* __XFS_REPAIR_COMMON_H__ */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 24efbff..4a9a645 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3317,7 +3317,9 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
>  	{ XFS_SCRUB_TYPE_SB,		"superblock" }, \
>  	{ XFS_SCRUB_TYPE_AGF,		"AGF" }, \
>  	{ XFS_SCRUB_TYPE_AGFL,		"AGFL" }, \
> -	{ XFS_SCRUB_TYPE_AGI,		"AGI" }
> +	{ XFS_SCRUB_TYPE_AGI,		"AGI" }, \
> +	{ XFS_SCRUB_TYPE_BNOBT,		"bnobt" }, \
> +	{ XFS_SCRUB_TYPE_CNTBT,		"cntbt" }
>  DECLARE_EVENT_CLASS(xfs_scrub_class,
>  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
>  		 int error),
>
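
The per-record checks in xfs_scrub_allocbt_helper above amount to a
range test on each free extent.  Here is a small userspace sketch of
the same idea, assuming host-endian inputs; free_extent_ok is a
hypothetical name, not a kernel function.  The patch's 32-bit
"bno < bno + len" test rejects both zero-length records and wraparound;
the sketch gets the same effect with an explicit length check plus a
64-bit end-of-extent comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the free extent [bno, bno + len) lies entirely inside
 * the AG.  agblocks is the size of a full AG (sb_agblocks in the patch)
 * and ag_length is this AG's own length (agf_length in the patch).
 */
static bool
free_extent_ok(uint32_t bno, uint32_t len, uint32_t agblocks,
	       uint32_t ag_length)
{
	/* Widen before adding so a huge len cannot wrap around. */
	uint64_t end = (uint64_t)bno + len;

	/* The start block must be inside the AG. */
	if (bno >= agblocks || bno >= ag_length)
		return false;

	/* A free extent must cover at least one block. */
	if (len == 0)
		return false;

	/* The end of the extent must not run past the AG. */
	if (end > agblocks || end > ag_length)
		return false;

	return true;
}
```

For example, in a 100-block AG the record (95, 5) passes because it
ends exactly at block 100, while (95, 6) fails.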

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 10/22] xfs: scrub inode btrees
  2017-07-21  4:39 ` [PATCH 10/22] xfs: scrub inode btrees Darrick J. Wong
@ 2017-07-23 17:15   ` Allison Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 17:15 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Looks good. Thanks for all the comments, they help!
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

On 7/20/2017 9:39 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Check the records of the inode btrees to make sure that the values
> make sense given the inode records themselves.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile            |    1
>  fs/xfs/libxfs/xfs_format.h |    2
>  fs/xfs/libxfs/xfs_fs.h     |    4 -
>  fs/xfs/scrub/common.c      |    9 +
>  fs/xfs/scrub/common.h      |    3
>  fs/xfs/scrub/ialloc.c      |  347 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_trace.h         |    4 -
>  7 files changed, 367 insertions(+), 3 deletions(-)
>  create mode 100644 fs/xfs/scrub/ialloc.c
>
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index ce492ee..5197bea 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -144,6 +144,7 @@ xfs-y				+= $(addprefix scrub/, \
>  				   alloc.o \
>  				   btree.o \
>  				   common.o \
> +				   ialloc.o \
>  				   metabufs.o \
>  				   )
>  endif
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index 23229f0..154c3dd 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -518,7 +518,7 @@ static inline int xfs_sb_version_hasftype(struct xfs_sb *sbp)
>  		 (sbp->sb_features2 & XFS_SB_VERSION2_FTYPE));
>  }
>
> -static inline int xfs_sb_version_hasfinobt(xfs_sb_t *sbp)
> +static inline bool xfs_sb_version_hasfinobt(xfs_sb_t *sbp)
>  {
>  	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
>  		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_FINOBT);
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index bb36acf..5120cfd 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -489,7 +489,9 @@ struct xfs_scrub_metadata {
>  #define XFS_SCRUB_TYPE_AGI	5	/* AG inode header */
>  #define XFS_SCRUB_TYPE_BNOBT	6	/* freesp by block btree */
>  #define XFS_SCRUB_TYPE_CNTBT	7	/* freesp by length btree */
> -#define XFS_SCRUB_TYPE_MAX	7
> +#define XFS_SCRUB_TYPE_INOBT	8	/* inode btree */
> +#define XFS_SCRUB_TYPE_FINOBT	9	/* free inode btree */
> +#define XFS_SCRUB_TYPE_MAX	9
>
>  /* i: repair this metadata */
>  #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 86161b5..9a31846 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -716,6 +716,15 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
>  		.setup	= xfs_scrub_setup_ag_allocbt,
>  		.scrub	= xfs_scrub_cntbt,
>  	},
> +	{ /* inobt */
> +		.setup	= xfs_scrub_setup_ag_iallocbt,
> +		.scrub	= xfs_scrub_inobt,
> +	},
> +	{ /* finobt */
> +		.setup	= xfs_scrub_setup_ag_iallocbt,
> +		.scrub	= xfs_scrub_finobt,
> +		.has	= xfs_sb_version_hasfinobt,
> +	},
>  };
>
>  /* Dispatch metadata scrubbing. */
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index f14abfb..cd89bec 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -210,6 +210,7 @@ SETUP_FN(xfs_scrub_setup_fs);
>  SETUP_FN(xfs_scrub_setup_metabufs);
>  SETUP_FN(xfs_scrub_setup_ag_header);
>  SETUP_FN(xfs_scrub_setup_ag_allocbt);
> +SETUP_FN(xfs_scrub_setup_ag_iallocbt);
>  #undef SETUP_FN
>
>  /* Metadata scrubbers */
> @@ -223,6 +224,8 @@ SCRUB_FN(xfs_scrub_agfl);
>  SCRUB_FN(xfs_scrub_agi);
>  SCRUB_FN(xfs_scrub_bnobt);
>  SCRUB_FN(xfs_scrub_cntbt);
> +SCRUB_FN(xfs_scrub_inobt);
> +SCRUB_FN(xfs_scrub_finobt);
>  #undef SCRUB_FN
>
>  #endif	/* __XFS_REPAIR_COMMON_H__ */
> diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> new file mode 100644
> index 0000000..ecf1852
> --- /dev/null
> +++ b/fs/xfs/scrub/ialloc.c
> @@ -0,0 +1,347 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_sb.h"
> +#include "xfs_inode.h"
> +#include "xfs_ialloc.h"
> +#include "xfs_ialloc_btree.h"
> +#include "xfs_icache.h"
> +#include "xfs_rmap.h"
> +#include "xfs_log.h"
> +#include "xfs_trans_priv.h"
> +#include "scrub/common.h"
> +#include "scrub/btree.h"
> +
> +/*
> + * Set us up to scrub inode btrees.
> + * If we detect a discrepancy between the inobt and the inode,
> + * try again after forcing logged inode cores out to disk.
> + */
> +int
> +xfs_scrub_setup_ag_iallocbt(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	return xfs_scrub_setup_ag_btree(sc, ip, sc->try_harder);
> +}
> +
> +/* Inode btree scrubber. */
> +
> +/* Scrub a chunk of an inobt record. */
> +STATIC int
> +xfs_scrub_iallocbt_chunk(
> +	struct xfs_scrub_btree		*bs,
> +	struct xfs_inobt_rec_incore	*irec,
> +	xfs_agino_t			agino,
> +	xfs_extlen_t			len,
> +	bool				*keep_scanning)
> +{
> +	struct xfs_mount		*mp = bs->cur->bc_mp;
> +	struct xfs_agf			*agf;
> +	xfs_agblock_t			eoag;
> +	xfs_agblock_t			bno;
> +	int				error = 0;
> +
> +	agf = XFS_BUF_TO_AGF(bs->sc->sa.agf_bp);
> +	eoag = be32_to_cpu(agf->agf_length);
> +	bno = XFS_AGINO_TO_AGBNO(mp, agino);
> +
> +	*keep_scanning = true;
> +	XFS_SCRUB_BTREC_CHECK(bs, bno < mp->m_sb.sb_agblocks);
> +	XFS_SCRUB_BTREC_CHECK(bs, bno < eoag);
> +	XFS_SCRUB_BTREC_CHECK(bs, bno < bno + len);
> +	XFS_SCRUB_BTREC_CHECK(bs, (unsigned long long)bno + len <=
> +			mp->m_sb.sb_agblocks);
> +	XFS_SCRUB_BTREC_CHECK(bs, (unsigned long long)bno + len <=
> +			eoag);
> +	if (error) {
> +		*keep_scanning = false;
> +		goto out;
> +	}
> +
> +out:
> +	return error;
> +}
> +
> +/* Count the number of free inodes. */
> +static unsigned int
> +xfs_scrub_iallocbt_freecount(
> +	xfs_inofree_t			freemask)
> +{
> +	int				bits = XFS_INODES_PER_CHUNK;
> +	unsigned int			ret = 0;
> +
> +	while (bits--) {
> +		if (freemask & 1)
> +			ret++;
> +		freemask >>= 1;
> +	}
> +
> +	return ret;
> +}
> +
> +/* Check a particular inode with ir_free. */
> +STATIC int
> +xfs_scrub_iallocbt_check_cluster_freemask(
> +	struct xfs_scrub_btree		*bs,
> +	xfs_ino_t			fsino,
> +	xfs_agino_t			chunkino,
> +	xfs_agino_t			clusterino,
> +	struct xfs_inobt_rec_incore	*irec,
> +	struct xfs_buf			*bp)
> +{
> +	struct xfs_dinode		*dip;
> +	struct xfs_mount		*mp = bs->cur->bc_mp;
> +	bool				freemask_ok;
> +	bool				inuse;
> +	int				error;
> +
> +	dip = xfs_buf_offset(bp, clusterino * mp->m_sb.sb_inodesize);
> +	XFS_SCRUB_BTREC_GOTO(bs,
> +			be16_to_cpu(dip->di_magic) == XFS_DINODE_MAGIC,
> +			out);
> +	XFS_SCRUB_BTREC_GOTO(bs,
> +			dip->di_version < 3 || be64_to_cpu(dip->di_ino) ==
> +				fsino + clusterino,
> +			out);
> +	freemask_ok = !!(irec->ir_free & XFS_INOBT_MASK(chunkino + clusterino));
> +	error = xfs_icache_inode_is_allocated(mp, bs->cur->bc_tp,
> +			fsino + clusterino, &inuse);
> +	if (error == -ENOENT) {
> +		/* Not cached, just read the disk buffer */
> +		freemask_ok ^= !!(dip->di_mode);
> +		if (!bs->sc->try_harder && !freemask_ok)
> +			return -EDEADLOCK;
> +	} else if (error < 0) {
> +		/* Inode is only half assembled, don't bother. */
> +		freemask_ok = true;
> +	} else {
> +		/* Inode is all there. */
> +		freemask_ok ^= inuse;
> +	}
> +	XFS_SCRUB_BTREC_CHECK(bs, freemask_ok);
> +out:
> +	return 0;
> +}
> +
> +/* Make sure the free mask is consistent with what the inodes think. */
> +STATIC int
> +xfs_scrub_iallocbt_check_freemask(
> +	struct xfs_scrub_btree		*bs,
> +	struct xfs_inobt_rec_incore	*irec)
> +{
> +	struct xfs_owner_info		oinfo;
> +	struct xfs_imap			imap;
> +	struct xfs_mount		*mp = bs->cur->bc_mp;
> +	struct xfs_dinode		*dip;
> +	struct xfs_buf			*bp;
> +	xfs_ino_t			fsino;
> +	xfs_agino_t			nr_inodes;
> +	xfs_agino_t			agino;
> +	xfs_agino_t			chunkino;
> +	xfs_agino_t			clusterino;
> +	xfs_agblock_t			agbno;
> +	int				blks_per_cluster;
> +	uint16_t			holemask;
> +	uint16_t			ir_holemask;
> +	int				error = 0;
> +
> +	/* Make sure the freemask matches the inode records. */
> +	blks_per_cluster = xfs_icluster_size_fsb(mp);
> +	nr_inodes = XFS_OFFBNO_TO_AGINO(mp, blks_per_cluster, 0);
> +	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INODES);
> +
> +	for (agino = irec->ir_startino;
> +	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;
> +	     agino += blks_per_cluster * mp->m_sb.sb_inopblock) {
> +		fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
> +		chunkino = agino - irec->ir_startino;
> +		agbno = XFS_AGINO_TO_AGBNO(mp, agino);
> +
> +		/* Compute the holemask mask for this cluster. */
> +		for (clusterino = 0, holemask = 0; clusterino < nr_inodes;
> +		     clusterino += XFS_INODES_PER_HOLEMASK_BIT)
> +			holemask |= XFS_INOBT_MASK((chunkino + clusterino) /
> +					XFS_INODES_PER_HOLEMASK_BIT);
> +
> +		/* The whole cluster must be a hole or not a hole. */
> +		ir_holemask = (irec->ir_holemask & holemask);
> +		XFS_SCRUB_BTREC_CHECK(bs, ir_holemask == holemask ||
> +				ir_holemask == 0);
> +
> +		/* If any part of this is a hole, skip it. */
> +		if (ir_holemask)
> +			continue;
> +
> +		/* Grab the inode cluster buffer. */
> +		imap.im_blkno = XFS_AGB_TO_DADDR(mp, bs->cur->bc_private.a.agno,
> +				agbno);
> +		imap.im_len = XFS_FSB_TO_BB(mp, blks_per_cluster);
> +		imap.im_boffset = 0;
> +
> +		error = xfs_imap_to_bp(mp, bs->cur->bc_tp, &imap,
> +				&dip, &bp, 0, 0);
> +		XFS_SCRUB_BTREC_OP_ERROR_GOTO(bs, &error, next_cluster);
> +
> +		/* Which inodes are free? */
> +		for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
> +			error = xfs_scrub_iallocbt_check_cluster_freemask(bs,
> +					fsino, chunkino, clusterino, irec, bp);
> +			if (error) {
> +				xfs_trans_brelse(bs->cur->bc_tp, bp);
> +				return error;
> +			}
> +		}
> +
> +		xfs_trans_brelse(bs->cur->bc_tp, bp);
> +next_cluster:
> +		;
> +	}
> +
> +	return error;
> +}
> +
> +/* Scrub an inobt/finobt record. */
> +STATIC int
> +xfs_scrub_iallocbt_helper(
> +	struct xfs_scrub_btree		*bs,
> +	union xfs_btree_rec		*rec)
> +{
> +	struct xfs_mount		*mp = bs->cur->bc_mp;
> +	struct xfs_agi			*agi;
> +	struct xfs_inobt_rec_incore	irec;
> +	uint64_t			holes;
> +	xfs_agino_t			agino;
> +	xfs_agblock_t			agbno;
> +	xfs_extlen_t			len;
> +	bool				keep_scanning;
> +	int				holecount;
> +	int				i;
> +	int				error = 0;
> +	int				err2 = 0;
> +	unsigned int			real_freecount;
> +	uint16_t			holemask;
> +
> +	xfs_inobt_btrec_to_irec(mp, rec, &irec);
> +
> +	XFS_SCRUB_BTREC_CHECK(bs, irec.ir_count <= XFS_INODES_PER_CHUNK);
> +	XFS_SCRUB_BTREC_CHECK(bs, irec.ir_freecount <= XFS_INODES_PER_CHUNK);
> +	real_freecount = irec.ir_freecount +
> +			(XFS_INODES_PER_CHUNK - irec.ir_count);
> +	XFS_SCRUB_BTREC_CHECK(bs, real_freecount ==
> +			xfs_scrub_iallocbt_freecount(irec.ir_free));
> +	agi = XFS_BUF_TO_AGI(bs->sc->sa.agi_bp);
> +	agino = irec.ir_startino;
> +	agbno = XFS_AGINO_TO_AGBNO(mp, irec.ir_startino);
> +	XFS_SCRUB_BTREC_GOTO(bs, agbno < be32_to_cpu(agi->agi_length), out);
> +	XFS_SCRUB_BTREC_CHECK(bs,
> +			!(agbno & (xfs_ialloc_cluster_alignment(mp) - 1)));
> +	XFS_SCRUB_BTREC_CHECK(bs, !(agbno & (xfs_icluster_size_fsb(mp) - 1)));
> +
> +	/* Handle non-sparse inodes */
> +	if (!xfs_inobt_issparse(irec.ir_holemask)) {
> +		len = XFS_B_TO_FSB(mp,
> +				XFS_INODES_PER_CHUNK * mp->m_sb.sb_inodesize);
> +		XFS_SCRUB_BTREC_CHECK(bs,
> +				irec.ir_count == XFS_INODES_PER_CHUNK);
> +
> +		error = xfs_scrub_iallocbt_chunk(bs, &irec, agino, len,
> +				&keep_scanning);
> +		if (error)
> +			goto out;
> +		goto check_freemask;
> +	}
> +
> +	/* Check each chunk of a sparse inode cluster. */
> +	holemask = irec.ir_holemask;
> +	holecount = 0;
> +	len = XFS_B_TO_FSB(mp,
> +			XFS_INODES_PER_HOLEMASK_BIT * mp->m_sb.sb_inodesize);
> +	holes = ~xfs_inobt_irec_to_allocmask(&irec);
> +	XFS_SCRUB_BTREC_CHECK(bs, (holes & irec.ir_free) == holes);
> +	XFS_SCRUB_BTREC_CHECK(bs, irec.ir_freecount <= irec.ir_count);
> +
> +	for (i = 0; i < XFS_INOBT_HOLEMASK_BITS; holemask >>= 1,
> +			i++, agino += XFS_INODES_PER_HOLEMASK_BIT) {
> +		if (holemask & 1) {
> +			holecount += XFS_INODES_PER_HOLEMASK_BIT;
> +			continue;
> +		}
> +
> +		err2 = xfs_scrub_iallocbt_chunk(bs, &irec, agino, len,
> +				&keep_scanning);
> +		if (!error && err2)
> +			error = err2;
> +		if (!keep_scanning)
> +			break;
> +	}
> +
> +	XFS_SCRUB_BTREC_CHECK(bs, holecount <= XFS_INODES_PER_CHUNK);
> +	XFS_SCRUB_BTREC_CHECK(bs, holecount + irec.ir_count ==
> +			XFS_INODES_PER_CHUNK);
> +
> +check_freemask:
> +	error = xfs_scrub_iallocbt_check_freemask(bs, &irec);
> +	if (error)
> +		goto out;
> +
> +out:
> +	return error;
> +}
> +
> +/* Scrub the inode btrees for some AG. */
> +STATIC int
> +xfs_scrub_iallocbt(
> +	struct xfs_scrub_context	*sc,
> +	xfs_btnum_t			which)
> +{
> +	struct xfs_btree_cur		*cur;
> +	struct xfs_owner_info		oinfo;
> +
> +	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT);
> +	cur = which == XFS_BTNUM_INO ? sc->sa.ino_cur : sc->sa.fino_cur;
> +	return xfs_scrub_btree(sc, cur, xfs_scrub_iallocbt_helper,
> +			&oinfo, NULL);
> +}
> +
> +int
> +xfs_scrub_inobt(
> +	struct xfs_scrub_context	*sc)
> +{
> +	return xfs_scrub_iallocbt(sc, XFS_BTNUM_INO);
> +}
> +
> +int
> +xfs_scrub_finobt(
> +	struct xfs_scrub_context	*sc)
> +{
> +	return xfs_scrub_iallocbt(sc, XFS_BTNUM_FINO);
> +}
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 4a9a645..e2c5f99 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3319,7 +3319,9 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
>  	{ XFS_SCRUB_TYPE_AGFL,		"AGFL" }, \
>  	{ XFS_SCRUB_TYPE_AGI,		"AGI" }, \
>  	{ XFS_SCRUB_TYPE_BNOBT,		"bnobt" }, \
> -	{ XFS_SCRUB_TYPE_CNTBT,		"cntbt" }
> +	{ XFS_SCRUB_TYPE_CNTBT,		"cntbt" }, \
> +	{ XFS_SCRUB_TYPE_INOBT,		"inobt" }, \
> +	{ XFS_SCRUB_TYPE_FINOBT,	"finobt" }
>  DECLARE_EVENT_CLASS(xfs_scrub_class,
>  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
>  		 int error),
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
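
For readers following the freecount check in the quoted
xfs_scrub_iallocbt_helper(), here is a minimal userspace sketch of the
same arithmetic.  It is not the kernel code: INODES_PER_CHUNK stands in
for XFS_INODES_PER_CHUNK, and the sketch_* names are hypothetical.  As
the patch does, it assumes inodes in sparse holes are marked free in
the mask but are not counted in ir_freecount, so they are added back
before the comparison.

```c
#include <stdbool.h>
#include <stdint.h>

#define INODES_PER_CHUNK	64	/* stand-in for XFS_INODES_PER_CHUNK */

/* Count the set bits in a 64-bit free mask (one bit per inode). */
static unsigned int
sketch_freemask_count(uint64_t freemask)
{
	unsigned int	ret = 0;

	while (freemask) {
		freemask &= freemask - 1;	/* clear the lowest set bit */
		ret++;
	}
	return ret;
}

/*
 * Does the record's free count agree with its free mask?  Inodes that
 * fall in holes (INODES_PER_CHUNK - count of them) are added to the
 * recorded free count before comparing against the mask.
 */
static bool
sketch_freecount_ok(uint8_t count, uint8_t freecount, uint64_t freemask)
{
	unsigned int	real_freecount;

	if (count > INODES_PER_CHUNK || freecount > INODES_PER_CHUNK)
		return false;
	real_freecount = freecount + (INODES_PER_CHUNK - count);
	return real_freecount == sketch_freemask_count(freemask);
}
```

A full chunk with three free inodes passes with a mask of 0x7; a sparse
record with 32 allocated inodes, two of them free, needs 34 bits set in
the mask.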

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 11/22] xfs: scrub rmap btrees
  2017-07-21  4:39 ` [PATCH 11/22] xfs: scrub rmap btrees Darrick J. Wong
@ 2017-07-23 17:21   ` Allison Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 17:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

On 7/20/2017 9:39 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Check the reverse mapping records to make sure that the contents
> make sense.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile        |    1
>  fs/xfs/libxfs/xfs_fs.h |    3 +
>  fs/xfs/scrub/common.c  |    5 ++
>  fs/xfs/scrub/common.h  |    2 +
>  fs/xfs/scrub/rmap.c    |  127 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_trace.h     |    3 +
>  6 files changed, 139 insertions(+), 2 deletions(-)
>  create mode 100644 fs/xfs/scrub/rmap.c
>
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 5197bea..5fe0c8e 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -146,5 +146,6 @@ xfs-y				+= $(addprefix scrub/, \
>  				   common.o \
>  				   ialloc.o \
>  				   metabufs.o \
> +				   rmap.o \
>  				   )
>  endif
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 5120cfd..24db73a 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -491,7 +491,8 @@ struct xfs_scrub_metadata {
>  #define XFS_SCRUB_TYPE_CNTBT	7	/* freesp by length btree */
>  #define XFS_SCRUB_TYPE_INOBT	8	/* inode btree */
>  #define XFS_SCRUB_TYPE_FINOBT	9	/* free inode btree */
> -#define XFS_SCRUB_TYPE_MAX	9
> +#define XFS_SCRUB_TYPE_RMAPBT	10	/* reverse mapping btree */
> +#define XFS_SCRUB_TYPE_MAX	10
>
>  /* i: repair this metadata */
>  #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 9a31846..dfa5fc5 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -725,6 +725,11 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
>  		.scrub	= xfs_scrub_finobt,
>  		.has	= xfs_sb_version_hasfinobt,
>  	},
> +	{ /* rmapbt */
> +		.setup	= xfs_scrub_setup_ag_rmapbt,
> +		.scrub	= xfs_scrub_rmapbt,
> +		.has	= xfs_sb_version_hasrmapbt,
> +	},
>  };
>
>  /* Dispatch metadata scrubbing. */
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index cd89bec..8fbd19b 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -211,6 +211,7 @@ SETUP_FN(xfs_scrub_setup_metabufs);
>  SETUP_FN(xfs_scrub_setup_ag_header);
>  SETUP_FN(xfs_scrub_setup_ag_allocbt);
>  SETUP_FN(xfs_scrub_setup_ag_iallocbt);
> +SETUP_FN(xfs_scrub_setup_ag_rmapbt);
>  #undef SETUP_FN
>
>  /* Metadata scrubbers */
> @@ -226,6 +227,7 @@ SCRUB_FN(xfs_scrub_bnobt);
>  SCRUB_FN(xfs_scrub_cntbt);
>  SCRUB_FN(xfs_scrub_inobt);
>  SCRUB_FN(xfs_scrub_finobt);
> +SCRUB_FN(xfs_scrub_rmapbt);
>  #undef SCRUB_FN
>
>  #endif	/* __XFS_REPAIR_COMMON_H__ */
> diff --git a/fs/xfs/scrub/rmap.c b/fs/xfs/scrub/rmap.c
> new file mode 100644
> index 0000000..82d027b
> --- /dev/null
> +++ b/fs/xfs/scrub/rmap.c
> @@ -0,0 +1,127 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_sb.h"
> +#include "xfs_rmap.h"
> +#include "scrub/common.h"
> +#include "scrub/btree.h"
> +
> +/*
> + * Set us up to scrub reverse mapping btrees.
> + */
> +int
> +xfs_scrub_setup_ag_rmapbt(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	return xfs_scrub_setup_ag_btree(sc, ip, false);
> +}
> +
> +/* Reverse-mapping scrubber. */
> +
> +/* Scrub an rmapbt record. */
> +STATIC int
> +xfs_scrub_rmapbt_helper(
> +	struct xfs_scrub_btree		*bs,
> +	union xfs_btree_rec		*rec)
> +{
> +	struct xfs_mount		*mp = bs->cur->bc_mp;
> +	struct xfs_agf			*agf;
> +	struct xfs_rmap_irec		irec;
> +	xfs_agblock_t			eoag;
> +	bool				non_inode;
> +	bool				is_unwritten;
> +	bool				is_bmbt;
> +	bool				is_attr;
> +	int				error;
> +
> +	error = xfs_rmap_btrec_to_irec(rec, &irec);
> +	XFS_SCRUB_BTREC_OP_ERROR_GOTO(bs, &error, out);
> +
> +	/* Check extent. */
> +	agf = XFS_BUF_TO_AGF(bs->sc->sa.agf_bp);
> +	eoag = be32_to_cpu(agf->agf_length);
> +	XFS_SCRUB_BTREC_CHECK(bs, irec.rm_startblock < mp->m_sb.sb_agblocks);
> +	XFS_SCRUB_BTREC_CHECK(bs, irec.rm_startblock < eoag);
> +	XFS_SCRUB_BTREC_CHECK(bs, irec.rm_startblock < irec.rm_startblock +
> +			irec.rm_blockcount);
> +	XFS_SCRUB_BTREC_CHECK(bs, irec.rm_startblock + irec.rm_blockcount <=
> +			mp->m_sb.sb_agblocks);
> +	XFS_SCRUB_BTREC_CHECK(bs, irec.rm_startblock + irec.rm_blockcount <=
> +			eoag);
> +
> +	/* Check flags. */
> +	non_inode = XFS_RMAP_NON_INODE_OWNER(irec.rm_owner);
> +	is_bmbt = irec.rm_flags & XFS_RMAP_BMBT_BLOCK;
> +	is_attr = irec.rm_flags & XFS_RMAP_ATTR_FORK;
> +	is_unwritten = irec.rm_flags & XFS_RMAP_UNWRITTEN;
> +
> +	XFS_SCRUB_BTREC_CHECK(bs, !is_bmbt || irec.rm_offset == 0);
> +	XFS_SCRUB_BTREC_CHECK(bs, !non_inode || irec.rm_offset == 0);
> +	XFS_SCRUB_BTREC_CHECK(bs, !is_unwritten || !(is_bmbt || non_inode ||
> +			is_attr));
> +	XFS_SCRUB_BTREC_CHECK(bs, !non_inode || !(is_bmbt || is_unwritten ||
> +			is_attr));
> +
> +	/* Owner inode within an AG? */
> +	XFS_SCRUB_BTREC_CHECK(bs, non_inode ||
> +			(XFS_INO_TO_AGNO(mp, irec.rm_owner) <
> +							mp->m_sb.sb_agcount &&
> +			 XFS_AGINO_TO_AGBNO(mp,
> +				XFS_INO_TO_AGINO(mp, irec.rm_owner)) <
> +							mp->m_sb.sb_agblocks));
> +	/* Owner inode within the FS? */
> +	XFS_SCRUB_BTREC_CHECK(bs, non_inode ||
> +			XFS_AGB_TO_DADDR(mp,
> +				XFS_INO_TO_AGNO(mp, irec.rm_owner),
> +				XFS_AGINO_TO_AGBNO(mp,
> +					XFS_INO_TO_AGINO(mp, irec.rm_owner))) <
> +			XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks));
> +
> +	/* Non-inode owner within the magic values? */
> +	XFS_SCRUB_BTREC_CHECK(bs, !non_inode ||
> +			(irec.rm_owner > XFS_RMAP_OWN_MIN &&
> +			 irec.rm_owner <= XFS_RMAP_OWN_FS));
> +out:
> +	return error;
> +}
> +
> +/* Scrub the rmap btree for some AG. */
> +int
> +xfs_scrub_rmapbt(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_owner_info		oinfo;
> +
> +	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
> +	return xfs_scrub_btree(sc, sc->sa.rmap_cur, xfs_scrub_rmapbt_helper,
> +			&oinfo, NULL);
> +}
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index e2c5f99..3996cb8 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3321,7 +3321,8 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
>  	{ XFS_SCRUB_TYPE_BNOBT,		"bnobt" }, \
>  	{ XFS_SCRUB_TYPE_CNTBT,		"cntbt" }, \
>  	{ XFS_SCRUB_TYPE_INOBT,		"inobt" }, \
> -	{ XFS_SCRUB_TYPE_FINOBT,	"finobt" }
> +	{ XFS_SCRUB_TYPE_FINOBT,	"finobt" }, \
> +	{ XFS_SCRUB_TYPE_RMAPBT,	"rmapbt" }
>  DECLARE_EVENT_CLASS(xfs_scrub_class,
>  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
>  		 int error),
>
>
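
The flag checks in the quoted xfs_scrub_rmapbt_helper() are written as
a set of implications, which can be hard to read at a glance.  The
following is a minimal userspace sketch of the same four rules with the
flags already decoded into booleans.  The function name and parameter
list are hypothetical; only the rules themselves come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the decoded rmap flags are mutually consistent.
 * Mirrors the four XFS_SCRUB_BTREC_CHECK() flag tests in the patch.
 */
static bool
sketch_rmap_flags_ok(uint64_t offset, bool non_inode, bool is_bmbt,
		bool is_attr, bool is_unwritten)
{
	/* bmbt blocks must record an offset of zero. */
	if (is_bmbt && offset != 0)
		return false;
	/* Non-inode owners must record an offset of zero. */
	if (non_inode && offset != 0)
		return false;
	/* Unwritten cannot combine with bmbt, non-inode, or attr fork. */
	if (is_unwritten && (is_bmbt || non_inode || is_attr))
		return false;
	/* Non-inode owners carry none of the other flags. */
	if (non_inode && (is_bmbt || is_unwritten || is_attr))
		return false;
	return true;
}
```

For example, an unwritten data fork extent at offset 8 passes, while an
unwritten attr fork extent or a non-inode owner with a nonzero offset
does not.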

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 12/22] xfs: scrub refcount btrees
  2017-07-21  4:39 ` [PATCH 12/22] xfs: scrub refcount btrees Darrick J. Wong
@ 2017-07-23 17:25   ` Allison Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 17:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

This one looks fine to me; it is pretty much as it was in v7.
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

On 7/20/2017 9:39 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Plumb in the pieces necessary to check the refcount btree.  If rmap is
> available, check the reference count by performing an interval query
> against the rmapbt.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile         |    1
>  fs/xfs/libxfs/xfs_fs.h  |    3 +
>  fs/xfs/scrub/common.c   |    5 ++
>  fs/xfs/scrub/common.h   |    2 +
>  fs/xfs/scrub/refcount.c |   96 +++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_trace.h      |    3 +
>  6 files changed, 108 insertions(+), 2 deletions(-)
>  create mode 100644 fs/xfs/scrub/refcount.c
>
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 5fe0c8e..1b1972b 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -146,6 +146,7 @@ xfs-y				+= $(addprefix scrub/, \
>  				   common.o \
>  				   ialloc.o \
>  				   metabufs.o \
> +				   refcount.o \
>  				   rmap.o \
>  				   )
>  endif
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 24db73a..3253de9 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -492,7 +492,8 @@ struct xfs_scrub_metadata {
>  #define XFS_SCRUB_TYPE_INOBT	8	/* inode btree */
>  #define XFS_SCRUB_TYPE_FINOBT	9	/* free inode btree */
>  #define XFS_SCRUB_TYPE_RMAPBT	10	/* reverse mapping btree */
> -#define XFS_SCRUB_TYPE_MAX	10
> +#define XFS_SCRUB_TYPE_REFCNTBT	11	/* reference count btree */
> +#define XFS_SCRUB_TYPE_MAX	11
>
>  /* i: repair this metadata */
>  #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index dfa5fc5..71a980e 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -730,6 +730,11 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
>  		.scrub	= xfs_scrub_rmapbt,
>  		.has	= xfs_sb_version_hasrmapbt,
>  	},
> +	{ /* refcountbt */
> +		.setup	= xfs_scrub_setup_ag_refcountbt,
> +		.scrub	= xfs_scrub_refcountbt,
> +		.has	= xfs_sb_version_hasreflink,
> +	},
>  };
>
>  /* Dispatch metadata scrubbing. */
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index 8fbd19b..1f9ba8c6 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -212,6 +212,7 @@ SETUP_FN(xfs_scrub_setup_ag_header);
>  SETUP_FN(xfs_scrub_setup_ag_allocbt);
>  SETUP_FN(xfs_scrub_setup_ag_iallocbt);
>  SETUP_FN(xfs_scrub_setup_ag_rmapbt);
> +SETUP_FN(xfs_scrub_setup_ag_refcountbt);
>  #undef SETUP_FN
>
>  /* Metadata scrubbers */
> @@ -228,6 +229,7 @@ SCRUB_FN(xfs_scrub_cntbt);
>  SCRUB_FN(xfs_scrub_inobt);
>  SCRUB_FN(xfs_scrub_finobt);
>  SCRUB_FN(xfs_scrub_rmapbt);
> +SCRUB_FN(xfs_scrub_refcountbt);
>  #undef SCRUB_FN
>
>  #endif	/* __XFS_REPAIR_COMMON_H__ */
> diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c
> new file mode 100644
> index 0000000..fcc72c5
> --- /dev/null
> +++ b/fs/xfs/scrub/refcount.c
> @@ -0,0 +1,96 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_sb.h"
> +#include "xfs_rmap.h"
> +#include "scrub/common.h"
> +#include "scrub/btree.h"
> +
> +/*
> + * Set us up to scrub reference count btrees.
> + */
> +int
> +xfs_scrub_setup_ag_refcountbt(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	return xfs_scrub_setup_ag_btree(sc, ip, false);
> +}
> +
> +/* Reference count btree scrubber. */
> +
> +/* Scrub a refcountbt record. */
> +STATIC int
> +xfs_scrub_refcountbt_helper(
> +	struct xfs_scrub_btree		*bs,
> +	union xfs_btree_rec		*rec)
> +{
> +	struct xfs_mount		*mp = bs->cur->bc_mp;
> +	struct xfs_agf			*agf;
> +	struct xfs_refcount_irec	irec;
> +	xfs_agblock_t			eoag;
> +	bool				has_cowflag;
> +	int				error = 0;
> +
> +	irec.rc_startblock = be32_to_cpu(rec->refc.rc_startblock);
> +	irec.rc_blockcount = be32_to_cpu(rec->refc.rc_blockcount);
> +	irec.rc_refcount = be32_to_cpu(rec->refc.rc_refcount);
> +	agf = XFS_BUF_TO_AGF(bs->sc->sa.agf_bp);
> +	eoag = be32_to_cpu(agf->agf_length);
> +
> +	has_cowflag = !!(irec.rc_startblock & XFS_REFC_COW_START);
> +	XFS_SCRUB_BTREC_CHECK(bs, (irec.rc_refcount == 1 && has_cowflag) ||
> +				  (irec.rc_refcount != 1 && !has_cowflag));
> +	irec.rc_startblock &= ~XFS_REFC_COW_START;
> +	XFS_SCRUB_BTREC_CHECK(bs, irec.rc_startblock < mp->m_sb.sb_agblocks);
> +	XFS_SCRUB_BTREC_CHECK(bs, irec.rc_startblock < eoag);
> +	XFS_SCRUB_BTREC_CHECK(bs, irec.rc_startblock < irec.rc_startblock +
> +			irec.rc_blockcount);
> +	XFS_SCRUB_BTREC_CHECK(bs, (unsigned long long)irec.rc_startblock +
> +			irec.rc_blockcount <= mp->m_sb.sb_agblocks);
> +	XFS_SCRUB_BTREC_CHECK(bs, (unsigned long long)irec.rc_startblock +
> +			irec.rc_blockcount <= eoag);
> +	XFS_SCRUB_BTREC_CHECK(bs, irec.rc_refcount >= 1);
> +
> +	return error;
> +}
> +
> +/* Scrub the refcount btree for some AG. */
> +int
> +xfs_scrub_refcountbt(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_owner_info		oinfo;
> +
> +	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_REFC);
> +	return xfs_scrub_btree(sc, sc->sa.refc_cur, xfs_scrub_refcountbt_helper,
> +			&oinfo, NULL);
> +}
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 3996cb8..6c0281b 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3322,7 +3322,8 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
>  	{ XFS_SCRUB_TYPE_CNTBT,		"cntbt" }, \
>  	{ XFS_SCRUB_TYPE_INOBT,		"inobt" }, \
>  	{ XFS_SCRUB_TYPE_FINOBT,	"finobt" }, \
> -	{ XFS_SCRUB_TYPE_RMAPBT,	"rmapbt" }
> +	{ XFS_SCRUB_TYPE_RMAPBT,	"rmapbt" }, \
> +	{ XFS_SCRUB_TYPE_REFCNTBT,	"refcountbt" }
>  DECLARE_EVENT_CLASS(xfs_scrub_class,
>  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
>  		 int error),
>
>
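
As a reading aid for the quoted xfs_scrub_refcountbt_helper(), here is
a minimal userspace sketch of the record checks.  Assumptions: the
SKETCH_COW_START value (high bit of the start block) is a stand-in for
XFS_REFC_COW_START, the sketch_* names are hypothetical, and a single
agblocks limit replaces the two limits (sb_agblocks and the AGF length)
that the patch tests separately.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for XFS_REFC_COW_START: high bit of the start block. */
#define SKETCH_COW_START	(1U << 31)

/* Return true if a refcount record passes the scrubber's basic checks. */
static bool
sketch_refcount_rec_ok(uint32_t startblock, uint32_t blockcount,
		uint32_t refcount, uint32_t agblocks)
{
	bool	has_cowflag = (startblock & SKETCH_COW_START) != 0;

	/* The CoW flag must be set if and only if the refcount is 1. */
	if (has_cowflag != (refcount == 1))
		return false;
	if (refcount < 1)
		return false;

	/* Strip the flag before range-checking the extent. */
	startblock &= ~SKETCH_COW_START;
	if (startblock >= agblocks)
		return false;
	/* Reject zero-length extents and 32-bit wraparound. */
	if ((uint32_t)(startblock + blockcount) <= startblock)
		return false;
	/* Widen before adding so the end-of-extent test cannot wrap. */
	if ((uint64_t)startblock + blockcount > agblocks)
		return false;
	return true;
}
```

The widening cast is the same idea as the (unsigned long long) casts in
the patch: the sum is computed in 64 bits so that a large block count
cannot wrap around and slip under the limit.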

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 13/22] xfs: scrub inodes
  2017-07-21  4:39 ` [PATCH 13/22] xfs: scrub inodes Darrick J. Wong
@ 2017-07-23 17:38   ` Allison Henderson
  2017-07-24 20:02     ` Darrick J. Wong
  0 siblings, 1 reply; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 17:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 7/20/2017 9:39 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Scrub the fields within an inode.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile        |    1
>  fs/xfs/libxfs/xfs_fs.h |    3
>  fs/xfs/scrub/common.c  |   64 +++++++++
>  fs/xfs/scrub/common.h  |    4 +
>  fs/xfs/scrub/inode.c   |  326 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_trace.h     |    3
>  6 files changed, 397 insertions(+), 4 deletions(-)
>  create mode 100644 fs/xfs/scrub/inode.c
>
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 1b1972b..2ba33ad 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -145,6 +145,7 @@ xfs-y				+= $(addprefix scrub/, \
>  				   btree.o \
>  				   common.o \
>  				   ialloc.o \
> +				   inode.o \
>  				   metabufs.o \
>  				   refcount.o \
>  				   rmap.o \
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 3253de9..277b528 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -493,7 +493,8 @@ struct xfs_scrub_metadata {
>  #define XFS_SCRUB_TYPE_FINOBT	9	/* free inode btree */
>  #define XFS_SCRUB_TYPE_RMAPBT	10	/* reverse mapping btree */
>  #define XFS_SCRUB_TYPE_REFCNTBT	11	/* reference count btree */
> -#define XFS_SCRUB_TYPE_MAX	11
> +#define XFS_SCRUB_TYPE_INODE	12	/* inode record */
> +#define XFS_SCRUB_TYPE_MAX	12
>
>  /* i: repair this metadata */
>  #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 71a980e..066fd3e 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -31,6 +31,8 @@
>  #include "xfs_trace.h"
>  #include "xfs_sb.h"
>  #include "xfs_inode.h"
> +#include "xfs_icache.h"
> +#include "xfs_itable.h"
>  #include "xfs_alloc.h"
>  #include "xfs_alloc_btree.h"
>  #include "xfs_bmap.h"
> @@ -584,12 +586,60 @@ xfs_scrub_setup_fs(
>  			&M_RES(sc->mp)->tr_itruncate, 0, 0, 0, &sc->tp);
>  }
>
> +/*
> + * Given an inode and the scrub control structure, grab either the
> + * inode referenced in the control structure or the inode passed in.
> + * The inode is not locked.
> + */
> +int
> +xfs_scrub_get_inode(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip_in)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_inode		*ips = NULL;
> +	int				error;
> +
> +	if (sc->sm->sm_agno || (sc->sm->sm_gen && !sc->sm->sm_ino))
> +		return -EINVAL;
> +
> +	/* We want to scan the inode we already had opened. */
> +	if (sc->sm->sm_ino == 0 || sc->sm->sm_ino == ip_in->i_ino) {
> +		sc->ip = ip_in;
> +		return 0;
> +	}
> +
> +	/* Look up the inode, see if the generation number matches. */
> +	if (xfs_internal_inum(mp, sc->sm->sm_ino))
> +		return -ENOENT;
> +	error = xfs_iget(mp, NULL, sc->sm->sm_ino, XFS_IGET_UNTRUSTED,
> +			0, &ips);
> +	if (error == -ENOENT || error == -EINVAL) {
> +		/* inode doesn't exist... */
> +		return -ENOENT;
> +	} else if (error) {
> +		trace_xfs_scrub_op_error(mp,
> +				XFS_INO_TO_AGNO(mp, sc->sm->sm_ino),
> +				XFS_INO_TO_AGBNO(mp, sc->sm->sm_ino),
> +				"inode", error, __func__, __LINE__);
> +		return error;
> +	}
> +	if (VFS_I(ips)->i_generation != sc->sm->sm_gen) {
> +		IRELE(ips);
> +		return -ENOENT;
> +	}
> +
> +	sc->ip = ips;
> +	return 0;
> +}
> +
>  /* Scrub setup and teardown */
>
>  /* Free all the resources and finish the transactions. */
>  STATIC int
>  xfs_scrub_teardown(
>  	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip_in,
>  	int				error)
>  {
>  	xfs_scrub_ag_free(sc, &sc->sa);
> @@ -597,6 +647,12 @@ xfs_scrub_teardown(
>  		xfs_trans_cancel(sc->tp);
>  		sc->tp = NULL;
>  	}
> +	if (sc->ip) {
> +		xfs_iunlock(sc->ip, sc->ilock_flags);
> +		if (sc->ip != ip_in)
> +			IRELE(sc->ip);
> +		sc->ip = NULL;
> +	}
>  	return error;
>  }
>
> @@ -735,6 +791,10 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
>  		.scrub	= xfs_scrub_refcountbt,
>  		.has	= xfs_sb_version_hasreflink,
>  	},
> +	{ /* inode record */
> +		.setup	= xfs_scrub_setup_inode,
> +		.scrub	= xfs_scrub_inode,
> +	},
>  };
>
>  /* Dispatch metadata scrubbing. */
> @@ -808,7 +868,7 @@ xfs_scrub_metadata(
>  		 * Tear down everything we hold, then set up again with
>  		 * preparation for worst-case scenarios.
>  		 */
> -		error = xfs_scrub_teardown(&sc, 0);
> +		error = xfs_scrub_teardown(&sc, ip, 0);
>  		if (error)
>  			goto out;
>  		try_harder = true;
> @@ -820,7 +880,7 @@ xfs_scrub_metadata(
>  		xfs_alert_ratelimited(mp, "Corruption detected during scrub.");
>
>  out_teardown:
> -	error = xfs_scrub_teardown(&sc, error);
> +	error = xfs_scrub_teardown(&sc, ip, error);
>  out:
>  	trace_xfs_scrub_done(ip, sm, error);
>  	return error;
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index 1f9ba8c6..5caa6c9 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -52,6 +52,7 @@ struct xfs_scrub_context {
>  	const struct xfs_scrub_meta_fns	*fns;
>  	struct xfs_trans		*tp;
>  	struct xfs_inode		*ip;
> +	uint				ilock_flags;
>  	bool				try_harder;
>
>  	/* State tracking for single-AG operations. */
> @@ -204,6 +205,7 @@ int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
>
>  int xfs_scrub_setup_ag_btree(struct xfs_scrub_context *sc,
>  			     struct xfs_inode *ip, bool force_log);
> +int xfs_scrub_get_inode(struct xfs_scrub_context *sc, struct xfs_inode *ip_in);
>
>  #define SETUP_FN(name) int name(struct xfs_scrub_context *sc, struct xfs_inode *ip)
>  SETUP_FN(xfs_scrub_setup_fs);
> @@ -213,6 +215,7 @@ SETUP_FN(xfs_scrub_setup_ag_allocbt);
>  SETUP_FN(xfs_scrub_setup_ag_iallocbt);
>  SETUP_FN(xfs_scrub_setup_ag_rmapbt);
>  SETUP_FN(xfs_scrub_setup_ag_refcountbt);
> +SETUP_FN(xfs_scrub_setup_inode);
>  #undef SETUP_FN
>
>  /* Metadata scrubbers */
> @@ -230,6 +233,7 @@ SCRUB_FN(xfs_scrub_inobt);
>  SCRUB_FN(xfs_scrub_finobt);
>  SCRUB_FN(xfs_scrub_rmapbt);
>  SCRUB_FN(xfs_scrub_refcountbt);
> +SCRUB_FN(xfs_scrub_inode);
>  #undef SCRUB_FN
>
>  #endif	/* __XFS_REPAIR_COMMON_H__ */
> diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
> new file mode 100644
> index 0000000..6e1e037
> --- /dev/null
> +++ b/fs/xfs/scrub/inode.c
> @@ -0,0 +1,326 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_sb.h"
> +#include "xfs_inode.h"
> +#include "xfs_icache.h"
> +#include "xfs_inode_buf.h"
> +#include "xfs_inode_fork.h"
> +#include "xfs_ialloc.h"
> +#include "xfs_log.h"
> +#include "xfs_trans_priv.h"
> +#include "xfs_reflink.h"
> +#include "scrub/common.h"
> +
> +/* Set us up with an inode. */
> +int
> +xfs_scrub_setup_inode(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	int				error;
> +
> +	/*
> +	 * Try to get the inode.  If the verifiers fail, we try again
> +	 * in raw mode.
> +	 */
> +	error = xfs_scrub_get_inode(sc, ip);
> +	switch (error) {
> +	case 0:
> +		break;
> +	case -EFSCORRUPTED:
> +	case -EFSBADCRC:
> +		/* Push everything out of the log onto disk prior to check. */
> +		error = _xfs_log_force(mp, XFS_LOG_SYNC, NULL);
> +		if (error)
> +			return error;
> +		xfs_ail_push_all_sync(mp->m_ail);
> +		return 0;
> +	default:
> +		return error;
> +	}
> +
> +	/* Got the inode, lock it and we're ready to go. */
> +	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
> +	xfs_ilock(sc->ip, sc->ilock_flags);
Is this lock....

> +	error = xfs_scrub_trans_alloc(sc->sm, mp, &M_RES(mp)->tr_itruncate,
> +			0, 0, 0, &sc->tp);
> +	if (error)
> +		goto out_unlock;
> +	sc->ilock_flags |= XFS_ILOCK_EXCL;
> +	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
.... and then this lock: are they supposed to be locking twice like this?
Did you maybe mean for the second one to be an unlock?  Also, did you mean
to call it with sc->ilock_flags as the flags, like the first call does?

Other than that, it looks good.  It looks like you caught a few extra bugs
since the last revision.
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
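
For reference, one reading of the quoted hunk is that the two xfs_ilock()
calls take different lock classes: the first takes IOLOCK and MMAPLOCK, the
second takes only ILOCK, and sc->ilock_flags accumulates all three so that
the later xfs_iunlock(sc->ip, sc->ilock_flags) drops everything at once.
Below is a minimal sketch of that pattern.  The SK_* flags and sk_* helpers
are hypothetical stand-ins for illustration, not the kernel API.

```c
#include <assert.h>

/* Hypothetical stand-ins for the three inode lock classes. */
#define SK_IOLOCK_EXCL		(1u << 0)
#define SK_MMAPLOCK_EXCL	(1u << 1)
#define SK_ILOCK_EXCL		(1u << 2)

/* Toy inode: one bit per lock class that is currently held. */
struct sk_inode {
	unsigned int	held;
};

/* Take every class named in flags; re-taking a held class is a bug. */
static void sk_ilock(struct sk_inode *ip, unsigned int flags)
{
	assert((ip->held & flags) == 0);
	ip->held |= flags;
}

/* Drop every class named in flags; all of them must be held. */
static void sk_iunlock(struct sk_inode *ip, unsigned int flags)
{
	assert((ip->held & flags) == flags);
	ip->held &= ~flags;
}

/*
 * Same ordering as the setup function: lock two classes, do some work,
 * then add the third.  Returns the accumulated flags for the unlock.
 */
static unsigned int sk_setup(struct sk_inode *ip)
{
	unsigned int	ilock_flags;

	ilock_flags = SK_IOLOCK_EXCL | SK_MMAPLOCK_EXCL;
	sk_ilock(ip, ilock_flags);

	/* (transaction allocation would happen here) */

	ilock_flags |= SK_ILOCK_EXCL;
	sk_ilock(ip, SK_ILOCK_EXCL);	/* only the newly added class */
	return ilock_flags;
}
```

In this model, passing the accumulated flags to the second sk_ilock() call
would trip the first assertion, because IOLOCK and MMAPLOCK are already
held at that point.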

> +
> +	return error;
> +out_unlock:
> +	xfs_iunlock(sc->ip, sc->ilock_flags);
> +	if (sc->ip != ip)
> +		IRELE(sc->ip);
> +	sc->ip = NULL;
> +	return error;
> +}
> +
> +/* Inode core */
> +
> +#define XFS_SCRUB_INODE_CHECK(fs_ok) \
> +	XFS_SCRUB_INO_CHECK(sc, ino, bp, "inode", fs_ok)
> +#define XFS_SCRUB_INODE_GOTO(fs_ok, label) \
> +	XFS_SCRUB_INO_GOTO(sc, ino, bp, "inode", fs_ok, label)
> +#define XFS_SCRUB_INODE_OP_ERROR_GOTO(label) \
> +	XFS_SCRUB_OP_ERROR_GOTO(sc, XFS_INO_TO_AGNO(mp, ino), \
> +			XFS_INO_TO_AGBNO(mp, ino), "inode", &error, label)
> +#define XFS_SCRUB_INODE_PREEN(fs_ok) \
> +	XFS_SCRUB_INO_PREEN(sc, bp, "inode", fs_ok)
> +/* Scrub an inode. */
> +int
> +xfs_scrub_inode(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_imap			imap;
> +	struct xfs_dinode		di;
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_buf			*bp = NULL;
> +	struct xfs_dinode		*dip;
> +	xfs_ino_t			ino;
> +	unsigned long long		isize;
> +	uint64_t			flags2;
> +	uint32_t			nextents;
> +	uint32_t			extsize;
> +	uint32_t			cowextsize;
> +	uint16_t			flags;
> +	uint16_t			mode;
> +	bool				has_shared;
> +	int				error = 0;
> +
> +	/* Did we get the in-core inode, or are we doing this manually? */
> +	if (sc->ip) {
> +		ino = sc->ip->i_ino;
> +		xfs_inode_to_disk(sc->ip, &di, 0);
> +		dip = &di;
> +	} else {
> +		/* Map & read inode. */
> +		ino = sc->sm->sm_ino;
> +		error = xfs_imap(mp, sc->tp, ino, &imap, XFS_IGET_UNTRUSTED);
> +		if (error == -EINVAL) {
> +			/*
> +			 * Inode could have gotten deleted out from under us;
> +			 * just forget about it.
> +			 */
> +			error = -ENOENT;
> +			goto out;
> +		}
> +		XFS_SCRUB_INODE_OP_ERROR_GOTO(out);
> +
> +		error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> +				imap.im_blkno, imap.im_len, XBF_UNMAPPED, &bp,
> +				NULL);
> +		XFS_SCRUB_INODE_OP_ERROR_GOTO(out);
> +
> +		/* Is this really the inode we want? */
> +		bp->b_ops = &xfs_inode_buf_ops;
> +		dip = xfs_buf_offset(bp, imap.im_boffset);
> +		error = xfs_dinode_verify(mp, ino, dip) ? 0 : -EFSCORRUPTED;
> +		XFS_SCRUB_INODE_OP_ERROR_GOTO(out);
> +		XFS_SCRUB_INODE_GOTO(
> +				xfs_dinode_good_version(mp, dip->di_version),
> +				out);
> +		if (be32_to_cpu(dip->di_gen) != sc->sm->sm_gen) {
> +			error = -ENOENT;
> +			goto out;
> +		}
> +	}
> +
> +	flags = be16_to_cpu(dip->di_flags);
> +	if (dip->di_version >= 3)
> +		flags2 = be64_to_cpu(dip->di_flags2);
> +	else
> +		flags2 = 0;
> +
> +	/* di_mode */
> +	mode = be16_to_cpu(dip->di_mode);
> +	XFS_SCRUB_INODE_CHECK(!(mode & ~(S_IALLUGO | S_IFMT)));
> +
> +	/* v1/v2 fields */
> +	switch (dip->di_version) {
> +	case 1:
> +		XFS_SCRUB_INODE_CHECK(dip->di_nlink == 0);
> +		XFS_SCRUB_INODE_CHECK(dip->di_mode || !sc->ip);
> +		XFS_SCRUB_INODE_CHECK(dip->di_projid_lo == 0);
> +		XFS_SCRUB_INODE_CHECK(dip->di_projid_hi == 0);
> +		break;
> +	case 2:
> +	case 3:
> +		XFS_SCRUB_INODE_CHECK(dip->di_onlink == 0);
> +		XFS_SCRUB_INODE_CHECK(dip->di_mode || !sc->ip);
> +		XFS_SCRUB_INODE_CHECK(dip->di_projid_hi == 0 ||
> +				xfs_sb_version_hasprojid32bit(&mp->m_sb));
> +		break;
> +	default:
> +		ASSERT(0);
> +		break;
> +	}
> +
> +	/* di_format */
> +	switch (dip->di_format) {
> +	case XFS_DINODE_FMT_DEV:
> +		XFS_SCRUB_INODE_CHECK(S_ISCHR(mode) || S_ISBLK(mode) ||
> +				      S_ISFIFO(mode) || S_ISSOCK(mode));
> +		break;
> +	case XFS_DINODE_FMT_LOCAL:
> +		XFS_SCRUB_INODE_CHECK(S_ISDIR(mode) || S_ISLNK(mode));
> +		break;
> +	case XFS_DINODE_FMT_EXTENTS:
> +		XFS_SCRUB_INODE_CHECK(S_ISREG(mode) || S_ISDIR(mode) ||
> +				      S_ISLNK(mode));
> +		break;
> +	case XFS_DINODE_FMT_BTREE:
> +		XFS_SCRUB_INODE_CHECK(S_ISREG(mode) || S_ISDIR(mode));
> +		break;
> +	case XFS_DINODE_FMT_UUID:
> +	default:
> +		XFS_SCRUB_INODE_CHECK(false);
> +		break;
> +	}
> +
> +	/* di_size */
> +	isize = be64_to_cpu(dip->di_size);
> +	XFS_SCRUB_INODE_CHECK(!(isize & (1ULL << 63)));
> +	if (!S_ISDIR(mode) && !S_ISREG(mode) && !S_ISLNK(mode))
> +		XFS_SCRUB_INODE_CHECK(isize == 0);
> +
> +	/* di_nblocks */
> +	if (flags2 & XFS_DIFLAG2_REFLINK) {
> +		; /* nblocks can exceed dblocks */
> +	} else if (flags & XFS_DIFLAG_REALTIME) {
> +		XFS_SCRUB_INODE_CHECK(be64_to_cpu(dip->di_nblocks) <
> +				mp->m_sb.sb_dblocks + mp->m_sb.sb_rblocks);
> +	} else {
> +		XFS_SCRUB_INODE_CHECK(be64_to_cpu(dip->di_nblocks) <
> +				mp->m_sb.sb_dblocks);
> +	}
> +
> +	/* di_extsize */
> +	if (flags & XFS_DIFLAG_EXTSIZE) {
> +		extsize = be32_to_cpu(dip->di_extsize);
> +		XFS_SCRUB_INODE_CHECK(extsize > 0);
> +		XFS_SCRUB_INODE_CHECK(extsize <= MAXEXTLEN);
> +		XFS_SCRUB_INODE_CHECK(extsize <= mp->m_sb.sb_agblocks / 2 ||
> +				(flags & XFS_DIFLAG_REALTIME));
> +	}
> +
> +	/* di_flags */
> +	XFS_SCRUB_INODE_CHECK(!(flags & XFS_DIFLAG_IMMUTABLE) ||
> +			      !(flags & XFS_DIFLAG_APPEND));
> +
> +	XFS_SCRUB_INODE_CHECK(!(flags & XFS_DIFLAG_FILESTREAM) ||
> +			      !(flags & XFS_DIFLAG_REALTIME));
> +
> +	/* di_nextents */
> +	nextents = be32_to_cpu(dip->di_nextents);
> +	switch (dip->di_format) {
> +	case XFS_DINODE_FMT_EXTENTS:
> +		XFS_SCRUB_INODE_CHECK(nextents <=
> +			XFS_DFORK_DSIZE(dip, mp) / sizeof(struct xfs_bmbt_rec));
> +		break;
> +	case XFS_DINODE_FMT_BTREE:
> +		XFS_SCRUB_INODE_CHECK(nextents >
> +			XFS_DFORK_DSIZE(dip, mp) / sizeof(struct xfs_bmbt_rec));
> +		break;
> +	case XFS_DINODE_FMT_LOCAL:
> +	case XFS_DINODE_FMT_DEV:
> +	case XFS_DINODE_FMT_UUID:
> +	default:
> +		XFS_SCRUB_INODE_CHECK(nextents == 0);
> +		break;
> +	}
> +
> +	/* di_anextents */
> +	nextents = be16_to_cpu(dip->di_anextents);
> +	switch (dip->di_aformat) {
> +	case XFS_DINODE_FMT_EXTENTS:
> +		XFS_SCRUB_INODE_CHECK(nextents <=
> +			XFS_DFORK_ASIZE(dip, mp) / sizeof(struct xfs_bmbt_rec));
> +		break;
> +	case XFS_DINODE_FMT_BTREE:
> +		XFS_SCRUB_INODE_CHECK(nextents >
> +			XFS_DFORK_ASIZE(dip, mp) / sizeof(struct xfs_bmbt_rec));
> +		break;
> +	case XFS_DINODE_FMT_LOCAL:
> +	case XFS_DINODE_FMT_DEV:
> +	case XFS_DINODE_FMT_UUID:
> +	default:
> +		XFS_SCRUB_INODE_CHECK(nextents == 0);
> +		break;
> +	}
> +
> +	/* di_forkoff */
> +	XFS_SCRUB_INODE_CHECK(XFS_DFORK_APTR(dip) <
> +			(char *)dip + mp->m_sb.sb_inodesize);
> +	XFS_SCRUB_INODE_CHECK(dip->di_anextents == 0 || dip->di_forkoff);
> +
> +	/* di_aformat */
> +	XFS_SCRUB_INODE_CHECK(dip->di_aformat == XFS_DINODE_FMT_LOCAL ||
> +			      dip->di_aformat == XFS_DINODE_FMT_EXTENTS ||
> +			      dip->di_aformat == XFS_DINODE_FMT_BTREE);
> +
> +	/* di_cowextsize */
> +	if (flags2 & XFS_DIFLAG2_COWEXTSIZE) {
> +		cowextsize = be32_to_cpu(dip->di_cowextsize);
> +		XFS_SCRUB_INODE_CHECK(xfs_sb_version_hasreflink(&mp->m_sb));
> +		XFS_SCRUB_INODE_CHECK(cowextsize > 0);
> +		XFS_SCRUB_INODE_CHECK(cowextsize <= MAXEXTLEN);
> +		XFS_SCRUB_INODE_CHECK(cowextsize <= mp->m_sb.sb_agblocks / 2);
> +	}
> +
> +	/* Now let's do the things that require a live inode. */
> +	if (!sc->ip)
> +		goto out;
> +
> +	/*
> +	 * Does this inode have the reflink flag set but no shared extents?
> +	 * Set the preening flag if this is the case.
> +	 */
> +	if (xfs_is_reflink_inode(sc->ip)) {
> +		error = xfs_reflink_inode_has_shared_extents(sc->tp, sc->ip,
> +				&has_shared);
> +		XFS_SCRUB_INODE_OP_ERROR_GOTO(out);
> +		XFS_SCRUB_INODE_PREEN(has_shared == true);
> +	}
> +
> +out:
> +	if (bp)
> +		xfs_trans_brelse(sc->tp, bp);
> +	return error;
> +}
> +#undef XFS_SCRUB_INODE_PREEN
> +#undef XFS_SCRUB_INODE_OP_ERROR_GOTO
> +#undef XFS_SCRUB_INODE_GOTO
> +#undef XFS_SCRUB_INODE_CHECK
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 6c0281b..950e2c8 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3323,7 +3323,8 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
>  	{ XFS_SCRUB_TYPE_INOBT,		"inobt" }, \
>  	{ XFS_SCRUB_TYPE_FINOBT,	"finobt" }, \
>  	{ XFS_SCRUB_TYPE_RMAPBT,	"rmapbt" }, \
> -	{ XFS_SCRUB_TYPE_REFCNTBT,	"refcountbt" }
> +	{ XFS_SCRUB_TYPE_REFCNTBT,	"refcountbt" }, \
> +	{ XFS_SCRUB_TYPE_INODE,		"inode" }
>  DECLARE_EVENT_CLASS(xfs_scrub_class,
>  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
>  		 int error),
>
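
The di_nextents and di_anextents checks in xfs_scrub_inode() above come
down to one division: how many 16-byte extent records fit in the fork.  An
extents-format fork must fit within that count, a btree-format fork must
exceed it, and every other format must have no extents.  A minimal sketch
follows.  The sk_* names are hypothetical, and the 336-byte fork size in
the example is an assumed figure for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An on-disk extent record is two 64-bit words. */
#define SK_BMBT_REC_SIZE	16u

/* Hypothetical fork formats; "other" covers local, dev and uuid. */
enum sk_fmt { SK_FMT_EXTENTS, SK_FMT_BTREE, SK_FMT_OTHER };

/* Is the extent count plausible for this fork format and fork size? */
static bool sk_nextents_ok(enum sk_fmt fmt, uint32_t nextents,
			   size_t fork_bytes)
{
	size_t	max_inline = fork_bytes / SK_BMBT_REC_SIZE;

	switch (fmt) {
	case SK_FMT_EXTENTS:
		return nextents <= max_inline;	/* must fit in the fork */
	case SK_FMT_BTREE:
		return nextents > max_inline;	/* otherwise no btree needed */
	default:
		return nextents == 0;		/* no mappings at all */
	}
}

/* Worked example: a 336-byte fork holds 336 / 16 = 21 records. */
static void sk_nextents_example(void)
{
	assert(sk_nextents_ok(SK_FMT_EXTENTS, 21, 336));
	assert(sk_nextents_ok(SK_FMT_BTREE, 22, 336));
}
```
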


* Re: [PATCH 14/22] xfs: scrub inode block mappings
  2017-07-21  4:40 ` [PATCH 14/22] xfs: scrub inode block mappings Darrick J. Wong
@ 2017-07-23 17:41   ` Allison Henderson
  2017-07-24 20:05     ` Darrick J. Wong
  0 siblings, 1 reply; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 17:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 7/20/2017 9:40 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Scrub an individual inode's block mappings to make sure they make sense.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile        |    1
>  fs/xfs/libxfs/xfs_fs.h |    5 +
>  fs/xfs/scrub/bmap.c    |  378 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/common.c  |   12 ++
>  fs/xfs/scrub/common.h  |    5 +
>  fs/xfs/xfs_trace.h     |    5 +
>  6 files changed, 404 insertions(+), 2 deletions(-)
>  create mode 100644 fs/xfs/scrub/bmap.c
>
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 2ba33ad..89c67e1a 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -142,6 +142,7 @@ ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
>  xfs-y				+= $(addprefix scrub/, \
>  				   agheader.o \
>  				   alloc.o \
> +				   bmap.o \
>  				   btree.o \
>  				   common.o \
>  				   ialloc.o \
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 277b528..d762277 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -494,7 +494,10 @@ struct xfs_scrub_metadata {
>  #define XFS_SCRUB_TYPE_RMAPBT	10	/* reverse mapping btree */
>  #define XFS_SCRUB_TYPE_REFCNTBT	11	/* reference count btree */
>  #define XFS_SCRUB_TYPE_INODE	12	/* inode record */
> -#define XFS_SCRUB_TYPE_MAX	12
> +#define XFS_SCRUB_TYPE_BMBTD	13	/* data fork block mapping */
> +#define XFS_SCRUB_TYPE_BMBTA	14	/* attr fork block mapping */
> +#define XFS_SCRUB_TYPE_BMBTC	15	/* CoW fork block mapping */
> +#define XFS_SCRUB_TYPE_MAX	15
>
>  /* i: repair this metadata */
>  #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
> new file mode 100644
> index 0000000..731f026
> --- /dev/null
> +++ b/fs/xfs/scrub/bmap.c
> @@ -0,0 +1,378 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_sb.h"
> +#include "xfs_inode.h"
> +#include "xfs_inode_fork.h"
> +#include "xfs_bmap.h"
> +#include "xfs_bmap_util.h"
> +#include "xfs_bmap_btree.h"
> +#include "xfs_rmap.h"
> +#include "scrub/common.h"
> +#include "scrub/btree.h"
> +
> +/* Set us up with an inode's bmap. */
> +STATIC int
> +__xfs_scrub_setup_inode_bmap(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip,
> +	bool				flush_data)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	int				error;
> +
> +	error = xfs_scrub_get_inode(sc, ip);
> +	if (error)
> +		return error;
> +
> +	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
> +	xfs_ilock(sc->ip, sc->ilock_flags);
> +
> +	/*
> +	 * We don't want any ephemeral data fork updates sitting around
> +	 * while we inspect block mappings, so wait for directio to finish
> +	 * and flush dirty data if we have delalloc reservations.
> +	 */
> +	if (S_ISREG(VFS_I(sc->ip)->i_mode) && flush_data) {
> +		inode_dio_wait(VFS_I(sc->ip));
> +		error = filemap_write_and_wait(VFS_I(sc->ip)->i_mapping);
> +		if (error)
> +			goto out_unlock;
> +		error = invalidate_inode_pages2(VFS_I(sc->ip)->i_mapping);
> +		if (error)
> +			goto out_unlock;
> +	}
> +
> +	/* Got the inode, lock it and we're ready to go. */
> +	error = xfs_scrub_trans_alloc(sc->sm, mp, &M_RES(mp)->tr_itruncate,
> +			0, 0, 0, &sc->tp);
> +	if (error)
> +		goto out_unlock;
> +	sc->ilock_flags |= XFS_ILOCK_EXCL;
> +	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
> +
> +	return 0;
> +out_unlock:
> +	xfs_iunlock(sc->ip, sc->ilock_flags);
> +	if (sc->ip != ip)
> +		IRELE(sc->ip);
> +	sc->ip = NULL;
> +	return error;
> +}
> +
> +/* Set us up to scrub the data fork. */
> +int
> +xfs_scrub_setup_inode_bmap_data(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	return __xfs_scrub_setup_inode_bmap(sc, ip, true);
> +}
> +
> +/* Set us up to scrub the attr or CoW fork. */
> +int
> +xfs_scrub_setup_inode_bmap(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	return __xfs_scrub_setup_inode_bmap(sc, ip, false);
> +}
> +
> +/*
> + * Inode fork block mapping (BMBT) scrubber.
> + * More complex than the others because we have to scrub
> + * all the extents regardless of whether or not the fork
> + * is in btree format.
> + */
> +
> +struct xfs_scrub_bmap_info {
> +	struct xfs_scrub_context	*sc;
> +	const char			*type;
> +	xfs_daddr_t			eofs;
> +	xfs_fileoff_t			lastoff;
> +	bool				is_rt;
> +	bool				is_shared;
> +	int				whichfork;
> +};
> +
> +#define XFS_SCRUB_BMAP_CHECK(fs_ok) \
> +	XFS_SCRUB_INO_CHECK(info->sc, info->sc->ip->i_ino, bp, info->type, fs_ok)
> +#define XFS_SCRUB_BMAP_GOTO(fs_ok, label) \
> +	XFS_SCRUB_INO_GOTO(info->sc, info->sc->ip->i_ino, bp, info->type, fs_ok, label)
> +#define XFS_SCRUB_BMAP_OP_ERROR_GOTO(label) \
> +	XFS_SCRUB_OP_ERROR_GOTO(info->sc, agno, 0, "bmap", &error, label)
> +/* Scrub a single extent record. */
> +STATIC int
> +xfs_scrub_bmap_extent(
> +	struct xfs_inode		*ip,
> +	struct xfs_btree_cur		*cur,
> +	struct xfs_scrub_bmap_info	*info,
> +	struct xfs_bmbt_irec		*irec)
> +{
> +	struct xfs_scrub_ag		sa = { 0 };
> +	struct xfs_mount		*mp = info->sc->mp;
> +	struct xfs_buf			*bp = NULL;
> +	xfs_daddr_t			daddr;
> +	xfs_daddr_t			dlen;
> +	xfs_fsblock_t			bno;
> +	xfs_agnumber_t			agno;
> +	int				error = 0;
> +
> +	if (cur)
> +		xfs_btree_get_block(cur, 0, &bp);
> +
> +	XFS_SCRUB_BMAP_CHECK(irec->br_startoff >= info->lastoff);
> +	XFS_SCRUB_BMAP_CHECK(irec->br_startblock != HOLESTARTBLOCK);
> +	XFS_SCRUB_BMAP_CHECK(!isnullstartblock(irec->br_startblock));
> +
> +	/* Actual mapping, so check the block ranges. */
> +	if (info->is_rt) {
> +		daddr = XFS_FSB_TO_BB(mp, irec->br_startblock);
> +		agno = NULLAGNUMBER;
> +		bno = irec->br_startblock;
> +	} else {
> +		daddr = XFS_FSB_TO_DADDR(mp, irec->br_startblock);
> +		agno = XFS_FSB_TO_AGNO(mp, irec->br_startblock);
> +		XFS_SCRUB_BMAP_GOTO(agno < mp->m_sb.sb_agcount, out);
> +		bno = XFS_FSB_TO_AGBNO(mp, irec->br_startblock);
> +		XFS_SCRUB_BMAP_CHECK(bno < mp->m_sb.sb_agblocks);
> +	}
> +	dlen = XFS_FSB_TO_BB(mp, irec->br_blockcount);
> +	XFS_SCRUB_BMAP_CHECK(irec->br_blockcount > 0);
> +	XFS_SCRUB_BMAP_CHECK(irec->br_blockcount <= MAXEXTLEN);
> +	XFS_SCRUB_BMAP_CHECK(daddr < info->eofs);
> +	XFS_SCRUB_BMAP_CHECK(daddr + dlen <= info->eofs);
> +	XFS_SCRUB_BMAP_CHECK(irec->br_state != XFS_EXT_UNWRITTEN ||
> +			xfs_sb_version_hasextflgbit(&mp->m_sb));
> +	if (error)
> +		goto out;
> +
> +	/* Set ourselves up for cross-referencing later. */
> +	if (!info->is_rt) {
> +		error = xfs_scrub_ag_init(info->sc, agno, &sa);
> +		XFS_SCRUB_BMAP_OP_ERROR_GOTO(out);
> +	}
> +
> +	xfs_scrub_ag_free(info->sc, &sa);
> +out:
> +	info->lastoff = irec->br_startoff + irec->br_blockcount;
> +	return error;
> +}
> +#undef XFS_SCRUB_BMAP_OP_ERROR_GOTO
> +#undef XFS_SCRUB_BMAP_GOTO
> +
> +/* Scrub a bmbt record. */
> +STATIC int
> +xfs_scrub_bmapbt_helper(
> +	struct xfs_scrub_btree		*bs,
> +	union xfs_btree_rec		*rec)
> +{
> +	struct xfs_bmbt_rec_host	ihost;
> +	struct xfs_bmbt_irec		irec;
> +	struct xfs_scrub_bmap_info	*info = bs->private;
> +	struct xfs_inode		*ip = bs->cur->bc_private.b.ip;
> +	struct xfs_buf			*bp = NULL;
> +	struct xfs_btree_block		*block;
> +	uint64_t			owner;
> +	int				i;
> +
> +	/*
> +	 * Check the owners of the btree blocks up to the level below
> +	 * the root since the verifiers don't do that.
> +	 */
> +	if (xfs_sb_version_hascrc(&bs->cur->bc_mp->m_sb) &&
> +	    bs->cur->bc_ptrs[0] == 1) {
> +		for (i = 0; i < bs->cur->bc_nlevels - 1; i++) {
> +			block = xfs_btree_get_block(bs->cur, i, &bp);
> +			owner = be64_to_cpu(block->bb_u.l.bb_owner);
> +			XFS_SCRUB_BMAP_CHECK(owner == ip->i_ino);
> +		}
> +	}
> +
> +	/* Set up the in-core record and scrub it. */
> +	ihost.l0 = be64_to_cpu(rec->bmbt.l0);
> +	ihost.l1 = be64_to_cpu(rec->bmbt.l1);
> +	xfs_bmbt_get_all(&ihost, &irec);
> +	return xfs_scrub_bmap_extent(ip, bs->cur, info, &irec);
> +}
> +#undef XFS_SCRUB_BMAP_CHECK
> +
> +#define XFS_SCRUB_FORK_CHECK(fs_ok) \
> +	XFS_SCRUB_INO_CHECK(sc, ip->i_ino, NULL, info.type, fs_ok)
> +#define XFS_SCRUB_FORK_GOTO(fs_ok, label) \
> +	XFS_SCRUB_INO_GOTO(sc, ip->i_ino, NULL, info.type, fs_ok, label)
> +#define XFS_SCRUB_FORK_OP_ERROR_GOTO(label) \
> +	XFS_SCRUB_OP_ERROR_GOTO(sc, \
> +			XFS_INO_TO_AGNO(mp, ip->i_ino), \
> +			XFS_INO_TO_AGBNO(mp, ip->i_ino), \
> +			info.type, &error, label)
> +/* Scrub an inode fork's block mappings. */
> +STATIC int
> +xfs_scrub_bmap(
> +	struct xfs_scrub_context	*sc,
> +	int				whichfork)
> +{
> +	struct xfs_bmbt_irec		irec;
> +	struct xfs_scrub_bmap_info	info = {0};
> +	struct xfs_owner_info		oinfo;
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_inode		*ip = sc->ip;
> +	struct xfs_ifork		*ifp;
> +	struct xfs_btree_cur		*cur;
> +	xfs_fileoff_t			endoff;
> +	xfs_extnum_t			idx;
> +	bool				found;
> +	int				error = 0;
> +	int				err2 = 0;
> +
> +	switch (whichfork) {
> +	case XFS_DATA_FORK:
> +		info.type = "data fork";
> +		break;
> +	case XFS_ATTR_FORK:
> +		info.type = "attr fork";
> +		break;
> +	case XFS_COW_FORK:
> +		info.type = "CoW fork";
> +		break;
> +	}
> +	ifp = XFS_IFORK_PTR(ip, whichfork);
> +
> +	info.is_rt = whichfork == XFS_DATA_FORK && XFS_IS_REALTIME_INODE(ip);
> +	info.eofs = XFS_FSB_TO_BB(mp, info.is_rt ? mp->m_sb.sb_rblocks :
> +					      mp->m_sb.sb_dblocks);
> +	info.whichfork = whichfork;
> +	info.is_shared = whichfork == XFS_DATA_FORK && xfs_is_reflink_inode(ip);
> +	info.sc = sc;
> +
> +	switch (whichfork) {
> +	case XFS_COW_FORK:
> +		/* Non-existent CoW forks are ignorable. */
> +		if (!ifp)
> +			goto out_unlock;
> +		/* No CoW forks on non-reflink inodes/filesystems. */
> +		XFS_SCRUB_FORK_GOTO(xfs_is_reflink_inode(ip), out_unlock);
> +		break;
> +	case XFS_ATTR_FORK:
> +		if (!ifp)
> +			goto out_unlock;
> +		XFS_SCRUB_FORK_CHECK(xfs_sb_version_hasattr(&mp->m_sb) ||
> +				     xfs_sb_version_hasattr2(&mp->m_sb));
> +		break;
> +	}
> +
> +	/* Check the fork values */
> +	switch (XFS_IFORK_FORMAT(ip, whichfork)) {
> +	case XFS_DINODE_FMT_UUID:
> +	case XFS_DINODE_FMT_DEV:
> +	case XFS_DINODE_FMT_LOCAL:
> +		/* No mappings to check. */
> +		goto out_unlock;
> +	case XFS_DINODE_FMT_EXTENTS:
> +		XFS_SCRUB_FORK_GOTO(ifp->if_flags & XFS_IFEXTENTS, out_unlock);
> +		break;
> +	case XFS_DINODE_FMT_BTREE:
> +		XFS_SCRUB_FORK_CHECK(whichfork != XFS_COW_FORK);
> +		/* Scan the btree records. */
> +		cur = xfs_bmbt_init_cursor(mp, sc->tp, ip, whichfork);
> +		xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork);
> +		err2 = xfs_scrub_btree(sc, cur, xfs_scrub_bmapbt_helper,
> +				&oinfo, &info);
> +		xfs_btree_del_cursor(cur, err2 ? XFS_BTREE_ERROR :
> +						 XFS_BTREE_NOERROR);
> +		if (err2 == -EDEADLOCK)
> +			return err2;
> +		else if (err2)
> +			goto out_unlock;
> +		break;
> +	default:
> +		XFS_SCRUB_FORK_GOTO(false, out_unlock);
> +		break;
> +	}
> +
> +	/* Extent data is in memory, so scrub that. */
> +
> +	/* Find the offset of the last extent in the mapping. */
> +	error = xfs_bmap_last_offset(ip, &endoff, whichfork);
> +	XFS_SCRUB_FORK_OP_ERROR_GOTO(out_unlock);
> +
> +	/* Scrub extent records. */
> +	info.lastoff = 0;
> +	ifp = XFS_IFORK_PTR(ip, whichfork);
> +	for (found = xfs_iext_lookup_extent(ip, ifp, 0, &idx, &irec);
> +	     found;
Did you mean to have the bare "found;" here, without an assignment?  I'm not
sure whether that was intentional or a typo.

Otherwise looks good.
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
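
For context on the question above: in C, the middle clause of a for
statement is only a condition, so a bare "found;" is legal there, and the
assignments live in the first and third clauses.  Here is a minimal sketch
of the same loop shape.  It uses a hypothetical sk_get() lookup over an int
array in place of the xfs_iext_* helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lookup: copy element idx into *out; false past the end. */
static bool sk_get(const int *vals, size_t n, size_t idx, int *out)
{
	if (idx >= n)
		return false;
	*out = vals[idx];
	return true;
}

/* Sum the non-negative elements using the three-clause loop shape. */
static int sk_sum(const int *vals, size_t n)
{
	size_t	idx = 0;
	bool	found;
	int	v = 0;
	int	sum = 0;

	assert(vals != NULL || n == 0);

	for (found = sk_get(vals, n, 0, &v);		/* init: first lookup */
	     found;					/* condition: test only */
	     found = sk_get(vals, n, ++idx, &v)) {	/* step: next lookup */
		if (v < 0)
			continue;	/* the step clause still runs */
		sum += v;
	}
	return sum;
}
```

A "continue" in the body still executes the third clause, so skipped
entries advance the index just like processed ones.
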
> +	     found = xfs_iext_get_extent(ifp, ++idx, &irec)) {
> +		if (xfs_scrub_should_terminate(&error))
> +			break;
> +		if (isnullstartblock(irec.br_startblock))
> +			continue;
> +		XFS_SCRUB_FORK_CHECK(irec.br_startoff < endoff);
> +		err2 = xfs_scrub_bmap_extent(ip, NULL, &info, &irec);
> +		if (err2 == -EDEADLOCK)
> +			return err2;
> +		else if (!error && err2)
> +			error = err2;
> +	}
> +
> +out_unlock:
> +	if (error == 0 && err2 != 0)
> +		error = err2;
> +	return error;
> +}
> +#undef XFS_SCRUB_FORK_CHECK
> +#undef XFS_SCRUB_FORK_GOTO
> +
> +/* Scrub an inode's data fork. */
> +int
> +xfs_scrub_bmap_data(
> +	struct xfs_scrub_context	*sc)
> +{
> +	return xfs_scrub_bmap(sc, XFS_DATA_FORK);
> +}
> +
> +/* Scrub an inode's attr fork. */
> +int
> +xfs_scrub_bmap_attr(
> +	struct xfs_scrub_context	*sc)
> +{
> +	return xfs_scrub_bmap(sc, XFS_ATTR_FORK);
> +}
> +
> +/* Scrub an inode's CoW fork. */
> +int
> +xfs_scrub_bmap_cow(
> +	struct xfs_scrub_context	*sc)
> +{
> +	if (!xfs_is_reflink_inode(sc->ip))
> +		return -ENOENT;
> +
> +	return xfs_scrub_bmap(sc, XFS_COW_FORK);
> +}
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 066fd3e..da3c006 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -795,6 +795,18 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
>  		.setup	= xfs_scrub_setup_inode,
>  		.scrub	= xfs_scrub_inode,
>  	},
> +	{ /* inode data fork */
> +		.setup	= xfs_scrub_setup_inode_bmap_data,
> +		.scrub	= xfs_scrub_bmap_data,
> +	},
> +	{ /* inode attr fork */
> +		.setup	= xfs_scrub_setup_inode_bmap,
> +		.scrub	= xfs_scrub_bmap_attr,
> +	},
> +	{ /* inode CoW fork */
> +		.setup	= xfs_scrub_setup_inode_bmap,
> +		.scrub	= xfs_scrub_bmap_cow,
> +	},
>  };
>
>  /* Dispatch metadata scrubbing. */
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index 5caa6c9..1025466 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -216,6 +216,8 @@ SETUP_FN(xfs_scrub_setup_ag_iallocbt);
>  SETUP_FN(xfs_scrub_setup_ag_rmapbt);
>  SETUP_FN(xfs_scrub_setup_ag_refcountbt);
>  SETUP_FN(xfs_scrub_setup_inode);
> +SETUP_FN(xfs_scrub_setup_inode_bmap_data);
> +SETUP_FN(xfs_scrub_setup_inode_bmap);
>  #undef SETUP_FN
>
>  /* Metadata scrubbers */
> @@ -234,6 +236,9 @@ SCRUB_FN(xfs_scrub_finobt);
>  SCRUB_FN(xfs_scrub_rmapbt);
>  SCRUB_FN(xfs_scrub_refcountbt);
>  SCRUB_FN(xfs_scrub_inode);
> +SCRUB_FN(xfs_scrub_bmap_data);
> +SCRUB_FN(xfs_scrub_bmap_attr);
> +SCRUB_FN(xfs_scrub_bmap_cow);
>  #undef SCRUB_FN
>
>  #endif	/* __XFS_REPAIR_COMMON_H__ */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 950e2c8..edfa4c7 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3324,7 +3324,10 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
>  	{ XFS_SCRUB_TYPE_FINOBT,	"finobt" }, \
>  	{ XFS_SCRUB_TYPE_RMAPBT,	"rmapbt" }, \
>  	{ XFS_SCRUB_TYPE_REFCNTBT,	"refcountbt" }, \
> -	{ XFS_SCRUB_TYPE_INODE,		"inode" }
> +	{ XFS_SCRUB_TYPE_INODE,		"inode" }, \
> +	{ XFS_SCRUB_TYPE_BMBTD,		"bmapbtd" }, \
> +	{ XFS_SCRUB_TYPE_BMBTA,		"bmapbta" }, \
> +	{ XFS_SCRUB_TYPE_BMBTC,		"bmapbtc" }
>  DECLARE_EVENT_CLASS(xfs_scrub_class,
>  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
>  		 int error),
>


* Re: [PATCH 15/22] xfs: scrub directory/attribute btrees
  2017-07-21  4:40 ` [PATCH 15/22] xfs: scrub directory/attribute btrees Darrick J. Wong
@ 2017-07-23 17:45   ` Allison Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 17:45 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, Fengguang Wu

Looks good to me.  This one is mostly just rebased since v7.
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

On 7/20/2017 9:40 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Provide a way to check the shape and scrub the hashes and records
> in a directory or extended attribute btree.  These are helper functions
> for the directory & attribute scrubbers in subsequent patches.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> [fengguang: remove unneeded variable to store return value]
> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
> ---
>  fs/xfs/Makefile        |    1
>  fs/xfs/scrub/dabtree.c |  473 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/dabtree.h |   62 ++++++
>  3 files changed, 536 insertions(+)
>  create mode 100644 fs/xfs/scrub/dabtree.c
>  create mode 100644 fs/xfs/scrub/dabtree.h
>
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 89c67e1a..5f9e121 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -145,6 +145,7 @@ xfs-y				+= $(addprefix scrub/, \
>  				   bmap.o \
>  				   btree.o \
>  				   common.o \
> +				   dabtree.o \
>  				   ialloc.o \
>  				   inode.o \
>  				   metabufs.o \
> diff --git a/fs/xfs/scrub/dabtree.c b/fs/xfs/scrub/dabtree.c
> new file mode 100644
> index 0000000..d730e75
> --- /dev/null
> +++ b/fs/xfs/scrub/dabtree.c
> @@ -0,0 +1,473 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_sb.h"
> +#include "xfs_inode.h"
> +#include "xfs_inode_fork.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
> +#include "xfs_dir2.h"
> +#include "xfs_dir2_priv.h"
> +#include "xfs_attr_leaf.h"
> +#include "scrub/common.h"
> +#include "scrub/dabtree.h"
> +
> +/* Directory/Attribute Btree */
> +
> +/* Find an entry at a certain level in a da btree. */
> +STATIC void *
> +xfs_scrub_da_btree_entry(
> +	struct xfs_scrub_da_btree	*ds,
> +	int				level,
> +	int				rec)
> +{
> +	char				*ents;
> +	void				*(*fn)(void *);
> +	size_t				sz;
> +	struct xfs_da_state_blk		*blk;
> +
> +	/* Dispatch the entry finding function. */
> +	blk = &ds->state->path.blk[level];
> +	switch (blk->magic) {
> +	case XFS_ATTR_LEAF_MAGIC:
> +	case XFS_ATTR3_LEAF_MAGIC:
> +		fn = (xfs_da_leaf_ents_fn)xfs_attr3_leaf_entryp;
> +		sz = sizeof(struct xfs_attr_leaf_entry);
> +		break;
> +	case XFS_DIR2_LEAFN_MAGIC:
> +	case XFS_DIR3_LEAFN_MAGIC:
> +		fn = (xfs_da_leaf_ents_fn)ds->dargs.dp->d_ops->leaf_ents_p;
> +		sz = sizeof(struct xfs_dir2_leaf_entry);
> +		break;
> +	case XFS_DIR2_LEAF1_MAGIC:
> +	case XFS_DIR3_LEAF1_MAGIC:
> +		fn = (xfs_da_leaf_ents_fn)ds->dargs.dp->d_ops->leaf_ents_p;
> +		sz = sizeof(struct xfs_dir2_leaf_entry);
> +		break;
> +	case XFS_DA_NODE_MAGIC:
> +	case XFS_DA3_NODE_MAGIC:
> +		fn = (xfs_da_leaf_ents_fn)ds->dargs.dp->d_ops->node_tree_p;
> +		sz = sizeof(struct xfs_da_node_entry);
> +		break;
> +	default:
> +		return NULL;
> +	}
> +
> +	ents = fn(blk->bp->b_addr);
> +	return ents + (sz * rec);
> +}
> +
> +/* Scrub a da btree hash (key). */
> +int
> +xfs_scrub_da_btree_hash(
> +	struct xfs_scrub_da_btree	*ds,
> +	int				level,
> +	__be32				*hashp)
> +{
> +	struct xfs_da_state_blk		*blks;
> +	struct xfs_da_node_entry	*btree;
> +	xfs_dahash_t			hash;
> +	xfs_dahash_t			parent_hash;
> +
> +	/* Is this hash in order? */
> +	hash = be32_to_cpu(*hashp);
> +	XFS_SCRUB_DA_CHECK(ds, hash >= ds->hashes[level]);
> +	ds->hashes[level] = hash;
> +
> +	if (level == 0)
> +		return 0;
> +
> +	/* Is this hash no larger than the parent hash? */
> +	blks = ds->state->path.blk;
> +	btree = xfs_scrub_da_btree_entry(ds, level - 1, blks[level - 1].index);
> +	parent_hash = be32_to_cpu(btree->hashval);
> +	XFS_SCRUB_DA_CHECK(ds, hash <= parent_hash);
> +
> +	return 0;
> +}
> +
> +/* Scrub a da btree pointer. */
> +STATIC int
> +xfs_scrub_da_btree_ptr(
> +	struct xfs_scrub_da_btree	*ds,
> +	int				level,
> +	xfs_dablk_t			blkno)
> +{
> +	XFS_SCRUB_DA_CHECK(ds, blkno >= ds->lowest);
> +	XFS_SCRUB_DA_CHECK(ds, ds->highest == 0 || blkno < ds->highest);
> +
> +	return 0;
> +}
> +
> +/*
> + * The da btree scrubber can handle leaf1 blocks as a degenerate
> + * form of da btree.  Since the regular da code doesn't handle
> + * leaf1, we must multiplex the verifiers.
> + */
> +static void
> +xfs_scrub_da_btree_read_verify(
> +	struct xfs_buf		*bp)
> +{
> +	struct xfs_da_blkinfo	*info = bp->b_addr;
> +
> +	switch (be16_to_cpu(info->magic)) {
> +	case XFS_DIR2_LEAF1_MAGIC:
> +	case XFS_DIR3_LEAF1_MAGIC:
> +		bp->b_ops = &xfs_dir3_leaf1_buf_ops;
> +		bp->b_ops->verify_read(bp);
> +		return;
> +	default:
> +		bp->b_ops = &xfs_da3_node_buf_ops;
> +		bp->b_ops->verify_read(bp);
> +		return;
> +	}
> +}
> +static void
> +xfs_scrub_da_btree_write_verify(
> +	struct xfs_buf	*bp)
> +{
> +	struct xfs_da_blkinfo	*info = bp->b_addr;
> +
> +	switch (be16_to_cpu(info->magic)) {
> +	case XFS_DIR2_LEAF1_MAGIC:
> +	case XFS_DIR3_LEAF1_MAGIC:
> +		bp->b_ops = &xfs_dir3_leaf1_buf_ops;
> +		bp->b_ops->verify_write(bp);
> +		return;
> +	default:
> +		bp->b_ops = &xfs_da3_node_buf_ops;
> +		bp->b_ops->verify_write(bp);
> +		return;
> +	}
> +}
> +
> +static const struct xfs_buf_ops xfs_scrub_da_btree_buf_ops = {
> +	.name = "xfs_scrub_da_btree",
> +	.verify_read = xfs_scrub_da_btree_read_verify,
> +	.verify_write = xfs_scrub_da_btree_write_verify,
> +};
> +
> +/* Check a block's sibling pointers. */
> +STATIC int
> +xfs_scrub_da_btree_block_check_siblings(
> +	struct xfs_scrub_da_btree	*ds,
> +	int				level,
> +	struct xfs_da_blkinfo		*hdr)
> +{
> +	xfs_dablk_t			forw;
> +	xfs_dablk_t			back;
> +	int				retval;
> +	int				error = 0;
> +
> +	forw = be32_to_cpu(hdr->forw);
> +	back = be32_to_cpu(hdr->back);
> +
> +	/* Top level blocks should not have sibling pointers. */
> +	if (level == 0) {
> +		XFS_SCRUB_DA_CHECK(ds, forw == 0);
> +		XFS_SCRUB_DA_CHECK(ds, back == 0);
> +		return error;
> +	}
> +
> +	/* Check back (left) pointer. */
> +	if (back != 0) {
> +		/* Move the alternate cursor back one block. */
> +		ds->state->altpath = ds->state->path;
> +		error = xfs_da3_path_shift(ds->state, &ds->state->altpath,
> +				0, false, &retval);
> +		XFS_SCRUB_DA_OP_ERROR_GOTO(ds, &error, out);
> +		XFS_SCRUB_DA_GOTO(ds, retval == 0, verify_forw);
> +		XFS_SCRUB_DA_CHECK(ds,
> +				ds->state->altpath.blk[level].blkno == back);
> +		xfs_trans_brelse(ds->dargs.trans,
> +				ds->state->altpath.blk[level].bp);
> +	}
> +
> +verify_forw:
> +	/* Check forw (right) pointer. */
> +	if (!error && forw != 0) {
> +		/* Move the alternate cursor forward one block. */
> +		ds->state->altpath = ds->state->path;
> +		error = xfs_da3_path_shift(ds->state, &ds->state->altpath,
> +				1, false, &retval);
> +		XFS_SCRUB_DA_OP_ERROR_GOTO(ds, &error, out);
> +		XFS_SCRUB_DA_GOTO(ds, retval == 0, out);
> +		XFS_SCRUB_DA_CHECK(ds,
> +				ds->state->altpath.blk[level].blkno == forw);
> +		xfs_trans_brelse(ds->dargs.trans,
> +				ds->state->altpath.blk[level].bp);
> +	}
> +out:
> +	memset(&ds->state->altpath, 0, sizeof(ds->state->altpath));
> +	return error;
> +}
> +
> +/* Load a dir/attribute block from a btree. */
> +STATIC int
> +xfs_scrub_da_btree_block(
> +	struct xfs_scrub_da_btree	*ds,
> +	int				level,
> +	xfs_dablk_t			blkno)
> +{
> +	struct xfs_da_state_blk		*blk;
> +	struct xfs_da_intnode		*node;
> +	struct xfs_da_node_entry	*btree;
> +	struct xfs_da3_blkinfo		*hdr3;
> +	struct xfs_da_args		*dargs = &ds->dargs;
> +	struct xfs_inode		*ip = ds->dargs.dp;
> +	xfs_ino_t			owner;
> +	int				*pmaxrecs;
> +	struct xfs_da3_icnode_hdr	nodehdr;
> +	int				error;
> +
> +	blk = &ds->state->path.blk[level];
> +	ds->state->path.active = level + 1;
> +
> +	/* Release old block. */
> +	if (blk->bp) {
> +		xfs_trans_brelse(dargs->trans, blk->bp);
> +		blk->bp = NULL;
> +	}
> +
> +	/* Check the pointer. */
> +	blk->blkno = blkno;
> +	error = xfs_scrub_da_btree_ptr(ds, level, blkno);
> +	if (error) {
> +		blk->blkno = 0;
> +		goto out;
> +	}
> +
> +	/* Read the buffer. */
> +	error = xfs_da_read_buf(dargs->trans, dargs->dp, blk->blkno, -2,
> +			&blk->bp, dargs->whichfork,
> +			&xfs_scrub_da_btree_buf_ops);
> +	XFS_SCRUB_DA_OP_ERROR_GOTO(ds, &error, out_nobuf);
> +
> +	/* It's ok for a directory not to have a da btree in it. */
> +	if (ds->dargs.whichfork == XFS_DATA_FORK && level == 0 &&
> +			blk->bp == NULL)
> +		goto out_nobuf;
> +	XFS_SCRUB_DA_GOTO(ds, blk->bp != NULL, out_nobuf);
> +
> +	hdr3 = blk->bp->b_addr;
> +	blk->magic = be16_to_cpu(hdr3->hdr.magic);
> +	pmaxrecs = &ds->maxrecs[level];
> +
> +	/* Check the owner. */
> +	if (xfs_sb_version_hascrc(&ip->i_mount->m_sb)) {
> +		owner = be64_to_cpu(hdr3->owner);
> +		error = -EFSCORRUPTED;
> +		XFS_SCRUB_DA_GOTO(ds, owner == ip->i_ino, out);
> +	}
> +
> +	/* Check the siblings. */
> +	error = xfs_scrub_da_btree_block_check_siblings(ds, level, &hdr3->hdr);
> +	if (error)
> +		goto out;
> +
> +	/* Interpret the buffer. */
> +	error = -EFSCORRUPTED;
> +	switch (blk->magic) {
> +	case XFS_ATTR_LEAF_MAGIC:
> +	case XFS_ATTR3_LEAF_MAGIC:
> +		xfs_trans_buf_set_type(dargs->trans, blk->bp,
> +				XFS_BLFT_ATTR_LEAF_BUF);
> +		blk->magic = XFS_ATTR_LEAF_MAGIC;
> +		blk->hashval = xfs_attr_leaf_lasthash(blk->bp, pmaxrecs);
> +		XFS_SCRUB_DA_CHECK(ds, ds->tree_level == 0);
> +		break;
> +	case XFS_DIR2_LEAFN_MAGIC:
> +	case XFS_DIR3_LEAFN_MAGIC:
> +		xfs_trans_buf_set_type(dargs->trans, blk->bp,
> +				XFS_BLFT_DIR_LEAFN_BUF);
> +		blk->magic = XFS_DIR2_LEAFN_MAGIC;
> +		blk->hashval = xfs_dir2_leaf_lasthash(ip, blk->bp, pmaxrecs);
> +		XFS_SCRUB_DA_CHECK(ds, ds->tree_level == 0);
> +		break;
> +	case XFS_DIR2_LEAF1_MAGIC:
> +	case XFS_DIR3_LEAF1_MAGIC:
> +		xfs_trans_buf_set_type(dargs->trans, blk->bp,
> +				XFS_BLFT_DIR_LEAF1_BUF);
> +		blk->magic = XFS_DIR2_LEAF1_MAGIC;
> +		blk->hashval = xfs_dir2_leaf_lasthash(ip, blk->bp, pmaxrecs);
> +		XFS_SCRUB_DA_CHECK(ds, ds->tree_level == 0);
> +		break;
> +	case XFS_DA_NODE_MAGIC:
> +	case XFS_DA3_NODE_MAGIC:
> +		xfs_trans_buf_set_type(dargs->trans, blk->bp,
> +				XFS_BLFT_DA_NODE_BUF);
> +		blk->magic = XFS_DA_NODE_MAGIC;
> +		node = blk->bp->b_addr;
> +		ip->d_ops->node_hdr_from_disk(&nodehdr, node);
> +		btree = ip->d_ops->node_tree_p(node);
> +		*pmaxrecs = nodehdr.count;
> +		blk->hashval = be32_to_cpu(btree[*pmaxrecs - 1].hashval);
> +		if (level == 0) {
> +			XFS_SCRUB_DA_GOTO(ds,
> +					nodehdr.level < XFS_DA_NODE_MAXDEPTH,
> +					out);
> +			ds->tree_level = nodehdr.level;
> +		} else
> +			XFS_SCRUB_DA_GOTO(ds, ds->tree_level == nodehdr.level,
> +					out);
> +		break;
> +	default:
> +		XFS_SCRUB_DA_CHECK(ds, false);
> +		xfs_trans_brelse(dargs->trans, blk->bp);
> +		blk->bp = NULL;
> +		blk->blkno = 0;
> +		break;
> +	}
> +	error = 0;
> +
> +out:
> +	return error;
> +out_nobuf:
> +	blk->blkno = 0;
> +	return error;
> +}
> +
> +/* Visit all nodes and leaves of a da btree. */
> +int
> +xfs_scrub_da_btree(
> +	struct xfs_scrub_context	*sc,
> +	int				whichfork,
> +	xfs_scrub_da_btree_rec_fn	scrub_fn)
> +{
> +	struct xfs_scrub_da_btree	ds;
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_da_state_blk		*blks;
> +	struct xfs_da_node_entry	*btree;
> +	void				*rec;
> +	xfs_dablk_t			blkno;
> +	bool				is_attr;
> +	int				level;
> +	int				error;
> +
> +	memset(&ds, 0, sizeof(ds));
> +	/* Skip short format data structures; no btree to scan. */
> +	if (XFS_IFORK_FORMAT(sc->ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
> +	    XFS_IFORK_FORMAT(sc->ip, whichfork) != XFS_DINODE_FMT_BTREE)
> +		return 0;
> +
> +	/* Set up initial da state. */
> +	is_attr = whichfork == XFS_ATTR_FORK;
> +	ds.dargs.geo = is_attr ? mp->m_attr_geo : mp->m_dir_geo;
> +	ds.dargs.dp = sc->ip;
> +	ds.dargs.whichfork = whichfork;
> +	ds.dargs.trans = sc->tp;
> +	ds.dargs.op_flags = XFS_DA_OP_OKNOENT;
> +	ds.state = xfs_da_state_alloc();
> +	ds.state->args = &ds.dargs;
> +	ds.state->mp = mp;
> +	ds.type = is_attr ? "attr" : "dir";
> +	ds.sc = sc;
> +	blkno = ds.lowest = is_attr ? 0 : ds.dargs.geo->leafblk;
> +	ds.highest = is_attr ? 0 : ds.dargs.geo->freeblk;
> +	level = 0;
> +
> +	/* Find the root of the da tree, if present. */
> +	blks = ds.state->path.blk;
> +	error = xfs_scrub_da_btree_block(&ds, level, blkno);
> +	if (error)
> +		goto out_state;
> +	if (blks[level].bp == NULL)
> +		goto out_state;
> +
> +	blks[level].index = 0;
> +	while (level >= 0 && level < XFS_DA_NODE_MAXDEPTH) {
> +		/* Handle leaf block. */
> +		if (blks[level].magic != XFS_DA_NODE_MAGIC) {
> +			/* End of leaf, pop back towards the root. */
> +			if (blks[level].index >= ds.maxrecs[level]) {
> +				if (level > 0)
> +					blks[level - 1].index++;
> +				ds.tree_level++;
> +				level--;
> +				continue;
> +			}
> +
> +			/* Dispatch record scrubbing. */
> +			rec = xfs_scrub_da_btree_entry(&ds, level,
> +					blks[level].index);
> +			error = scrub_fn(&ds, level, rec);
> +			if (error < 0 ||
> +			    error == XFS_BTREE_QUERY_RANGE_ABORT)
> +				break;
> +			if (xfs_scrub_should_terminate(&error))
> +				break;
> +
> +			blks[level].index++;
> +			continue;
> +		}
> +
> +		btree = xfs_scrub_da_btree_entry(&ds, level, blks[level].index);
> +
> +		/* End of node, pop back towards the root. */
> +		if (blks[level].index >= ds.maxrecs[level]) {
> +			if (level > 0)
> +				blks[level - 1].index++;
> +			ds.tree_level++;
> +			level--;
> +			continue;
> +		}
> +
> +		/* Hashes in order for scrub? */
> +		error = xfs_scrub_da_btree_hash(&ds, level, &btree->hashval);
> +		if (error)
> +			goto out;
> +
> +		/* Drill another level deeper. */
> +		blkno = be32_to_cpu(btree->before);
> +		level++;
> +		ds.tree_level--;
> +		error = xfs_scrub_da_btree_block(&ds, level, blkno);
> +		if (error)
> +			goto out;
> +		if (blks[level].bp == NULL)
> +			goto out;
> +
> +		blks[level].index = 0;
> +	}
> +
> +out:
> +	/* Release all the buffers we're tracking. */
> +	for (level = 0; level < XFS_DA_NODE_MAXDEPTH; level++) {
> +		if (blks[level].bp == NULL)
> +			continue;
> +		xfs_trans_brelse(sc->tp, blks[level].bp);
> +		blks[level].bp = NULL;
> +	}
> +
> +out_state:
> +	xfs_da_state_free(ds.state);
> +	return error;
> +}
> diff --git a/fs/xfs/scrub/dabtree.h b/fs/xfs/scrub/dabtree.h
> new file mode 100644
> index 0000000..1302d67
> --- /dev/null
> +++ b/fs/xfs/scrub/dabtree.h
> @@ -0,0 +1,62 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#ifndef __XFS_REPAIR_DABTREE_H__
> +#define __XFS_REPAIR_DABTREE_H__
> +
> +/* dir/attr btree */
> +
> +struct xfs_scrub_da_btree {
> +	struct xfs_da_args		dargs;
> +	xfs_dahash_t			hashes[XFS_DA_NODE_MAXDEPTH];
> +	int				maxrecs[XFS_DA_NODE_MAXDEPTH];
> +	struct xfs_da_state		*state;
> +	const char			*type;
> +	struct xfs_scrub_context	*sc;
> +	xfs_dablk_t			lowest;
> +	xfs_dablk_t			highest;
> +	int				tree_level;
> +};
> +
> +typedef void *(*xfs_da_leaf_ents_fn)(void *);
> +typedef int (*xfs_scrub_da_btree_rec_fn)(struct xfs_scrub_da_btree *ds,
> +		int level, void *rec);
> +
> +#define XFS_SCRUB_DA_CHECK(ds, fs_ok) \
> +	XFS_SCRUB_DATA_CHECK((ds)->sc, (ds)->dargs.whichfork, \
> +			xfs_dir2_da_to_db((ds)->dargs.geo, \
> +			(ds)->state->path.blk[level].blkno), (ds)->type, \
> +			fs_ok)
> +#define XFS_SCRUB_DA_GOTO(ds, fs_ok, label) \
> +	XFS_SCRUB_DATA_GOTO((ds)->sc, (ds)->dargs.whichfork, \
> +			xfs_dir2_da_to_db((ds)->dargs.geo, \
> +			(ds)->state->path.blk[level].blkno), (ds)->type, \
> +			fs_ok, label)
> +#define XFS_SCRUB_DA_OP_ERROR_GOTO(ds, error, label) \
> +	XFS_SCRUB_FILE_OP_ERROR_GOTO((ds)->sc, (ds)->dargs.whichfork, \
> +			xfs_dir2_da_to_db((ds)->dargs.geo, \
> +			(ds)->state->path.blk[level].blkno), (ds)->type, \
> +			(error), label)
> +
> +int xfs_scrub_da_btree_hash(struct xfs_scrub_da_btree *ds, int level,
> +			    __be32 *hashp);
> +int xfs_scrub_da_btree(struct xfs_scrub_context *sc, int whichfork,
> +		       xfs_scrub_da_btree_rec_fn scrub_fn);
> +
> +#endif /* __XFS_REPAIR_DABTREE_H__ */
>
>
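
For anyone following along, the ordering rule that xfs_scrub_da_btree_hash() enforces in the patch quoted above can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct hash_walk, hash_in_order() and SKETCH_MAXDEPTH are hypothetical names that stand in for the scrub context and XFS_DA_NODE_MAXDEPTH, and the hashes are assumed to be already converted to CPU byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAXDEPTH	5	/* hypothetical stand-in for XFS_DA_NODE_MAXDEPTH */

/* Hypothetical walk state: the last hash seen at each tree level. */
struct hash_walk {
	uint32_t	last_hash[SKETCH_MAXDEPTH];
};

/*
 * Return true if 'hash' keeps the keys at 'level' in non-decreasing
 * order and is no larger than the key of the parent entry that points
 * at this block.  parent_hash is ignored at level 0, since the root
 * has no parent.
 */
static bool
hash_in_order(
	struct hash_walk	*w,
	int			level,
	uint32_t		hash,
	uint32_t		parent_hash)
{
	bool			ok;

	assert(level >= 0 && level < SKETCH_MAXDEPTH);

	/* Compare against the previous key, then remember this one. */
	ok = hash >= w->last_hash[level];
	w->last_hash[level] = hash;

	if (level > 0 && hash > parent_hash)
		ok = false;
	return ok;
}
```

As in the patch, the remembered hash is updated even when the check fails, so one out-of-order key does not necessarily cascade into a failure for every key after it.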


* Re: [PATCH 16/22] xfs: scrub directory metadata
  2017-07-21  4:40 ` [PATCH 16/22] xfs: scrub directory metadata Darrick J. Wong
@ 2017-07-23 17:51   ` Allison Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 17:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 7/20/2017 9:40 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Scrub the hash tree and all the entries in a directory.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile        |    1
>  fs/xfs/libxfs/xfs_fs.h |    3
>  fs/xfs/scrub/common.c  |   37 ++++++
>  fs/xfs/scrub/common.h  |    4 +
>  fs/xfs/scrub/dir.c     |  291 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_trace.h     |    3
>  6 files changed, 337 insertions(+), 2 deletions(-)
>  create mode 100644 fs/xfs/scrub/dir.c
>
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 5f9e121..c568d0d 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -146,6 +146,7 @@ xfs-y				+= $(addprefix scrub/, \
>  				   btree.o \
>  				   common.o \
>  				   dabtree.o \
> +				   dir.o \
>  				   ialloc.o \
>  				   inode.o \
>  				   metabufs.o \
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index d762277..e646b5f 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -497,7 +497,8 @@ struct xfs_scrub_metadata {
>  #define XFS_SCRUB_TYPE_BMBTD	13	/* data fork block mapping */
>  #define XFS_SCRUB_TYPE_BMBTA	14	/* attr fork block mapping */
>  #define XFS_SCRUB_TYPE_BMBTC	15	/* CoW fork block mapping */
> -#define XFS_SCRUB_TYPE_MAX	15
> +#define XFS_SCRUB_TYPE_DIR	16	/* directory */
> +#define XFS_SCRUB_TYPE_MAX	16
>
>  /* i: repair this metadata */
>  #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index da3c006..92627e9 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -633,6 +633,39 @@ xfs_scrub_get_inode(
>  	return 0;
>  }
>
> +/* Set us up to scrub a file's contents. */
> +int
> +xfs_scrub_setup_inode_contents(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip,
> +	unsigned int			resblks)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	int				error;
> +
> +	error = xfs_scrub_get_inode(sc, ip);
> +	if (error)
> +		return error;
> +
> +	/* Got the inode, lock it and we're ready to go. */
> +	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
> +	xfs_ilock(sc->ip, sc->ilock_flags);
> +	error = xfs_scrub_trans_alloc(sc->sm, mp, &M_RES(mp)->tr_itruncate,
> +			resblks, 0, 0, &sc->tp);
> +	if (error)
> +		goto out_unlock;
> +	sc->ilock_flags |= XFS_ILOCK_EXCL;
> +	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
I have the same question about the locks here that came up in patch 13.

Otherwise I think the rest looks fine.
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> +
> +	return 0;
> +out_unlock:
> +	xfs_iunlock(sc->ip, sc->ilock_flags);
> +	if (sc->ip != ip)
> +		IRELE(sc->ip);
> +	sc->ip = NULL;
> +	return error;
> +}
> +
>  /* Scrub setup and teardown */
>
>  /* Free all the resources and finish the transactions. */
> @@ -807,6 +840,10 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
>  		.setup	= xfs_scrub_setup_inode_bmap,
>  		.scrub	= xfs_scrub_bmap_cow,
>  	},
> +	{ /* directory */
> +		.setup	= xfs_scrub_setup_directory,
> +		.scrub	= xfs_scrub_directory,
> +	},
>  };
>
>  /* Dispatch metadata scrubbing. */
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index 1025466..7baaa2d 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -206,6 +206,8 @@ int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
>  int xfs_scrub_setup_ag_btree(struct xfs_scrub_context *sc,
>  			     struct xfs_inode *ip, bool force_log);
>  int xfs_scrub_get_inode(struct xfs_scrub_context *sc, struct xfs_inode *ip_in);
> +int xfs_scrub_setup_inode_contents(struct xfs_scrub_context *sc,
> +				   struct xfs_inode *ip, unsigned int resblks);
>
>  #define SETUP_FN(name) int name(struct xfs_scrub_context *sc, struct xfs_inode *ip)
>  SETUP_FN(xfs_scrub_setup_fs);
> @@ -218,6 +220,7 @@ SETUP_FN(xfs_scrub_setup_ag_refcountbt);
>  SETUP_FN(xfs_scrub_setup_inode);
>  SETUP_FN(xfs_scrub_setup_inode_bmap_data);
>  SETUP_FN(xfs_scrub_setup_inode_bmap);
> +SETUP_FN(xfs_scrub_setup_directory);
>  #undef SETUP_FN
>
>  /* Metadata scrubbers */
> @@ -239,6 +242,7 @@ SCRUB_FN(xfs_scrub_inode);
>  SCRUB_FN(xfs_scrub_bmap_data);
>  SCRUB_FN(xfs_scrub_bmap_attr);
>  SCRUB_FN(xfs_scrub_bmap_cow);
> +SCRUB_FN(xfs_scrub_directory);
>  #undef SCRUB_FN
>
>  #endif	/* __XFS_REPAIR_COMMON_H__ */
> diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
> new file mode 100644
> index 0000000..1d6b5329
> --- /dev/null
> +++ b/fs/xfs/scrub/dir.c
> @@ -0,0 +1,291 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_sb.h"
> +#include "xfs_inode.h"
> +#include "xfs_icache.h"
> +#include "xfs_itable.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
> +#include "xfs_dir2.h"
> +#include "xfs_dir2_priv.h"
> +#include "scrub/common.h"
> +#include "scrub/dabtree.h"
> +
> +/* Set us up to scrub directories. */
> +int
> +xfs_scrub_setup_directory(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	return xfs_scrub_setup_inode_contents(sc, ip, 0);
> +}
> +
> +/* Directories */
> +
> +/* Scrub a directory entry. */
> +
> +struct xfs_scrub_dir_ctx {
> +	struct dir_context		dc;
> +	struct xfs_scrub_context	*sc;
> +};
> +
> +#define XFS_SCRUB_DIR_CHECK(fs_ok) \
> +	XFS_SCRUB_DATA_CHECK(sdc->sc, XFS_DATA_FORK, offset, "dir", fs_ok)
> +#define XFS_SCRUB_DIR_GOTO(fs_ok, label) \
> +	XFS_SCRUB_DATA_GOTO(sdc->sc, XFS_DATA_FORK, offset, "dir", fs_ok, label)
> +#define XFS_SCRUB_DIR_OP_ERROR_GOTO(label) \
> +	XFS_SCRUB_FILE_OP_ERROR_GOTO(sdc->sc, XFS_DATA_FORK, offset, "dir", &error, label)
> +/* Check that an inode's mode matches a given DT_ type. */
> +STATIC int
> +xfs_scrub_dir_check_ftype(
> +	struct xfs_scrub_dir_ctx	*sdc,
> +	xfs_fileoff_t			offset,
> +	xfs_ino_t			inum,
> +	int				dtype)
> +{
> +	struct xfs_mount		*mp = sdc->sc->mp;
> +	struct xfs_inode		*ip;
> +	int				ino_dtype;
> +	int				error = 0;
> +
> +	if (!xfs_sb_version_hasftype(&mp->m_sb)) {
> +		XFS_SCRUB_DIR_CHECK(dtype == DT_UNKNOWN || dtype == DT_DIR);
> +		goto out;
> +	}
> +
> +	error = xfs_iget(mp, sdc->sc->tp, inum, 0, 0, &ip);
> +	XFS_SCRUB_OP_ERROR_GOTO(sdc->sc,
> +			XFS_INO_TO_AGNO(mp, inum),
> +			XFS_INO_TO_AGBNO(mp, inum),
> +			"inode", &error, out);
> +	/* Convert mode to the DT_* values that dir_emit uses. */
> +	ino_dtype = (VFS_I(ip)->i_mode & S_IFMT) >> 12;
> +	XFS_SCRUB_DIR_CHECK(ino_dtype == dtype);
> +	IRELE(ip);
> +out:
> +	return error;
> +}
> +
> +/* Scrub a single directory entry. */
> +STATIC int
> +xfs_scrub_dir_actor(
> +	struct dir_context		*dc,
> +	const char			*name,
> +	int				namelen,
> +	loff_t				pos,
> +	u64				ino,
> +	unsigned			type)
> +{
> +	struct xfs_mount		*mp;
> +	struct xfs_inode		*ip;
> +	struct xfs_scrub_dir_ctx	*sdc;
> +	struct xfs_name			xname;
> +	xfs_ino_t			lookup_ino;
> +	xfs_dablk_t			offset;
> +	int				error = 0;
> +
> +	sdc = container_of(dc, struct xfs_scrub_dir_ctx, dc);
> +	ip = sdc->sc->ip;
> +	mp = ip->i_mount;
> +	offset = xfs_dir2_db_to_da(mp->m_dir_geo,
> +			xfs_dir2_dataptr_to_db(mp->m_dir_geo, pos));
> +
> +	/* Does this inode number make sense? */
> +	XFS_SCRUB_DIR_GOTO(xfs_dir_ino_validate(mp, ino) == 0, out);
> +	XFS_SCRUB_DIR_GOTO(!xfs_internal_inum(mp, ino), out);
> +
> +	/* Verify that we can look up this name by hash. */
> +	xname.name = name;
> +	xname.len = namelen;
> +	xname.type = XFS_DIR3_FT_UNKNOWN;
> +
> +	error = xfs_dir_lookup(sdc->sc->tp, ip, &xname, &lookup_ino, NULL);
> +	XFS_SCRUB_DIR_OP_ERROR_GOTO(fail_xref);
> +	XFS_SCRUB_DIR_GOTO(lookup_ino == ino, out);
> +
> +	if (!strncmp(".", name, namelen)) {
> +		/* If this is "." then check that the inum matches the dir. */
> +		if (xfs_sb_version_hasftype(&mp->m_sb))
> +			XFS_SCRUB_DIR_CHECK(type == DT_DIR);
> +		XFS_SCRUB_DIR_CHECK(ino == ip->i_ino);
> +	} else if (!strncmp("..", name, namelen)) {
> +		/*
> +		 * If this is ".." in the root inode, check that the inum
> +		 * matches this dir.
> +		 */
> +		if (xfs_sb_version_hasftype(&mp->m_sb))
> +			XFS_SCRUB_DIR_CHECK(type == DT_DIR);
> +		if (ip->i_ino == mp->m_sb.sb_rootino)
> +			XFS_SCRUB_DIR_CHECK(ino == ip->i_ino);
> +	}
> +	if (error)
> +		goto out;
> +
> +	/* Verify the file type. */
> +	error = xfs_scrub_dir_check_ftype(sdc, offset, lookup_ino, type);
> +	if (error)
> +		goto out;
> +out:
> +	return error;
> +fail_xref:
> +	return error ? error : -EFSCORRUPTED;
> +}
> +#undef XFS_SCRUB_DIR_OP_ERROR_GOTO
> +#undef XFS_SCRUB_DIR_GOTO
> +#undef XFS_SCRUB_DIR_CHECK
> +
> +#define XFS_SCRUB_DIRENT_CHECK(fs_ok) \
> +	XFS_SCRUB_DATA_CHECK(ds->sc, XFS_DATA_FORK, rec_bno, "dir", fs_ok)
> +#define XFS_SCRUB_DIRENT_GOTO(fs_ok, label) \
> +	XFS_SCRUB_DATA_GOTO(ds->sc, XFS_DATA_FORK, rec_bno, "dir", fs_ok, label)
> +#define XFS_SCRUB_DIRENT_OP_ERROR_GOTO(label) \
> +	XFS_SCRUB_FILE_OP_ERROR_GOTO(ds->sc, XFS_DATA_FORK, rec_bno, "dir", &error, label)
> +/* Scrub a directory btree record. */
> +STATIC int
> +xfs_scrub_dir_rec(
> +	struct xfs_scrub_da_btree	*ds,
> +	int				level,
> +	void				*rec)
> +{
> +	struct xfs_mount		*mp = ds->state->mp;
> +	struct xfs_dir2_leaf_entry	*ent = rec;
> +	struct xfs_inode		*dp = ds->dargs.dp;
> +	struct xfs_dir2_data_entry	*dent;
> +	struct xfs_buf			*bp;
> +	xfs_ino_t			ino;
> +	xfs_dablk_t			rec_bno;
> +	xfs_dir2_db_t			db;
> +	xfs_dir2_data_aoff_t		off;
> +	xfs_dir2_dataptr_t		ptr;
> +	xfs_dahash_t			calc_hash;
> +	xfs_dahash_t			hash;
> +	unsigned int			tag;
> +	int				error;
> +
> +	/* Check the hash of the entry. */
> +	error = xfs_scrub_da_btree_hash(ds, level, &ent->hashval);
> +	if (error)
> +		goto out;
> +
> +	/* Valid hash pointer? */
> +	ptr = be32_to_cpu(ent->address);
> +	if (ptr == 0)
> +		return 0;
> +
> +	/* Find the directory entry's location. */
> +	db = xfs_dir2_dataptr_to_db(mp->m_dir_geo, ptr);
> +	off = xfs_dir2_dataptr_to_off(mp->m_dir_geo, ptr);
> +	rec_bno = xfs_dir2_db_to_da(mp->m_dir_geo, db);
> +
> +	XFS_SCRUB_DA_GOTO(ds, rec_bno < mp->m_dir_geo->leafblk, out);
> +	error = xfs_dir3_data_read(ds->dargs.trans, dp, rec_bno, -2, &bp);
> +	XFS_SCRUB_DIRENT_OP_ERROR_GOTO(out);
> +	XFS_SCRUB_DIRENT_GOTO(bp != NULL, out);
> +
> +	/* Retrieve the entry and check it. */
> +	dent = (struct xfs_dir2_data_entry *)(((char *)bp->b_addr) + off);
> +	ino = be64_to_cpu(dent->inumber);
> +	hash = be32_to_cpu(ent->hashval);
> +	tag = be16_to_cpup(dp->d_ops->data_entry_tag_p(dent));
> +	XFS_SCRUB_DIRENT_CHECK(xfs_dir_ino_validate(mp, ino) == 0);
> +	XFS_SCRUB_DIRENT_CHECK(!xfs_internal_inum(mp, ino));
> +	XFS_SCRUB_DIRENT_CHECK(tag == off);
> +	XFS_SCRUB_DIRENT_GOTO(dent->namelen < MAXNAMELEN, out_relse);
> +	calc_hash = xfs_da_hashname(dent->name, dent->namelen);
> +	XFS_SCRUB_DIRENT_CHECK(calc_hash == hash);
> +
> +out_relse:
> +	xfs_trans_brelse(ds->dargs.trans, bp);
> +out:
> +	return error;
> +}
> +#undef XFS_SCRUB_DIRENT_OP_ERROR_GOTO
> +#undef XFS_SCRUB_DIRENT_GOTO
> +#undef XFS_SCRUB_DIRENT_CHECK
> +
> +/* Scrub a whole directory. */
> +int
> +xfs_scrub_directory(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_scrub_dir_ctx	sdc = {
> +		.dc.actor = xfs_scrub_dir_actor,
> +		.dc.pos = 0,
> +	};
> +	struct xfs_mount		*mp = sc->mp;
> +	size_t				bufsize;
> +	loff_t				oldpos;
> +	int				error = 0;
> +
> +	if (!S_ISDIR(VFS_I(sc->ip)->i_mode))
> +		return -ENOENT;
> +
> +	/* Plausible size? */
> +	XFS_SCRUB_INO_GOTO(sc, sc->ip->i_ino, NULL, "inode",
> +			sc->ip->i_d.di_size >= xfs_dir2_sf_hdr_size(0), out);
> +
> +	/* Check directory tree structure */
> +	error = xfs_scrub_da_btree(sc, XFS_DATA_FORK, xfs_scrub_dir_rec);
> +	if (error)
> +		return error;
> +
> +	/* Check that every dirent we see can also be looked up by hash. */
> +	bufsize = (size_t)min_t(loff_t, 32768, sc->ip->i_d.di_size);
> +	sdc.sc = sc;
> +
> +	/*
> +	 * Look up every name in this directory by hash.
> +	 *
> +	 * The VFS grabs a read or write lock via i_rwsem before it reads
> +	 * or writes to a directory.  If we've gotten this far we've
> +	 * already obtained IOLOCK_EXCL, which (since 4.10) is the same as
> +	 * getting a write lock on i_rwsem.  Therefore, it is safe for us
> +	 * to drop the ILOCK here in order to reuse the _readdir and
> +	 * _dir_lookup routines, which do their own ILOCK locking.
> +	 */
> +	oldpos = 0;
> +	sc->ilock_flags &= ~XFS_ILOCK_EXCL;
> +	xfs_iunlock(sc->ip, XFS_ILOCK_EXCL);
> +	while (true) {
> +		error = xfs_readdir(sc->tp, sc->ip, &sdc.dc, bufsize);
> +		XFS_SCRUB_OP_ERROR_GOTO(sc,
> +				XFS_INO_TO_AGNO(mp, sc->ip->i_ino),
> +				XFS_INO_TO_AGBNO(mp, sc->ip->i_ino),
> +				"inode", &error, out);
> +		if (oldpos == sdc.dc.pos)
> +			break;
> +		oldpos = sdc.dc.pos;
> +	}
> +
> +out:
> +	return error;
> +}
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index edfa4c7..ccd27ec 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3327,7 +3327,8 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
>  	{ XFS_SCRUB_TYPE_INODE,		"inode" }, \
>  	{ XFS_SCRUB_TYPE_BMBTD,		"bmapbtd" }, \
>  	{ XFS_SCRUB_TYPE_BMBTA,		"bmapbta" }, \
> -	{ XFS_SCRUB_TYPE_BMBTC,		"bmapbtc" }
> +	{ XFS_SCRUB_TYPE_BMBTC,		"bmapbtc" }, \
> +	{ XFS_SCRUB_TYPE_DIR,		"dir" }
>  DECLARE_EVENT_CLASS(xfs_scrub_class,
>  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
>  		 int error),
>
>
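
The file type check in xfs_scrub_dir_check_ftype() quoted above relies on the DT_* values being the file-type bits of the mode shifted right by 12. A minimal userspace sketch of that conversion follows; mode_to_dtype() is a hypothetical helper name, not something from the patch, and the example assumes a Linux/glibc environment where <dirent.h> exposes the DT_* constants.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* expose DT_* and S_IF* on glibc */
#endif
#include <assert.h>
#include <dirent.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Convert an inode mode to the DT_* value that a readdir actor sees.
 * Only the S_IFMT file-type bits matter; permission bits are masked off.
 */
static unsigned char
mode_to_dtype(
	mode_t		mode)
{
	return (mode & S_IFMT) >> 12;
}

/* Does the type recorded in a dirent agree with the inode's mode? */
static int
dtype_matches(
	mode_t		mode,
	unsigned char	dtype)
{
	return mode_to_dtype(mode) == dtype;
}
```

For example, dtype_matches(S_IFDIR | 0755, DT_DIR) holds, while a dirent that claims DT_REG for the same inode would be flagged.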


* Re: [PATCH 17/22] xfs: scrub directory freespace
  2017-07-21  4:40 ` [PATCH 17/22] xfs: scrub directory freespace Darrick J. Wong
@ 2017-07-23 17:55   ` Allison Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 17:55 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Looks fine; not much has changed since v7.
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
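
As a reading aid, here is a rough userspace sketch of the rule stated in the comment above xfs_scrub_directory_check_free_entry() in the patch quoted below: a free entry is acceptable if its length matches an in-use bestfree slot, or if it is no longer than the smallest slot in the table. The names struct free_slot, free_entry_ok() and SKETCH_FD_COUNT are hypothetical stand-ins, and the fields are assumed to be in CPU byte order already.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FD_COUNT	3	/* stand-in for XFS_DIR2_DATA_FD_COUNT */

/* Hypothetical CPU-endian copy of one bestfree slot. */
struct free_slot {
	uint16_t	offset;		/* 0 means the slot is unused */
	uint16_t	length;
};

/*
 * Is this free entry either recorded in the bestfree table, or no
 * larger than the smallest entry the table holds?
 */
static bool
free_entry_ok(
	const struct free_slot	*bf,
	uint16_t		dup_length)
{
	unsigned int		smallest = UINT_MAX;
	int			i;

	for (i = 0; i < SKETCH_FD_COUNT; i++) {
		if (bf[i].offset && bf[i].length == dup_length)
			return true;
		/* Track the running minimum across all slots. */
		if (bf[i].length < smallest)
			smallest = bf[i].length;
	}

	return dup_length <= smallest;
}
```

Note that an unused slot has length zero, so when the table is not full, any nonzero free entry that is missing from it fails the check. That matches the idea that the table should list free space whenever it has room.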

On 7/20/2017 9:40 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Check the free space information in a directory.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/scrub/dir.c |  337 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 337 insertions(+)
>
>
> diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
> index 1d6b5329..661aec3 100644
> --- a/fs/xfs/scrub/dir.c
> +++ b/fs/xfs/scrub/dir.c
> @@ -232,6 +232,338 @@ xfs_scrub_dir_rec(
>  #undef XFS_SCRUB_DIRENT_GOTO
>  #undef XFS_SCRUB_DIRENT_CHECK
>
> +#define XFS_SCRUB_DIR_BLOCK_CHECK(fs_ok) \
> +	XFS_SCRUB_DATA_CHECK(sc, XFS_DATA_FORK, \
> +		lblk << mp->m_sb.sb_blocklog, "dir", fs_ok)
> +#define XFS_SCRUB_DIR_BLOCK_GOTO(fs_ok, label) \
> +	XFS_SCRUB_DATA_GOTO(sc, XFS_DATA_FORK, \
> +		lblk << mp->m_sb.sb_blocklog, "dir", fs_ok, label)
> +#define XFS_SCRUB_DIR_BLOCK_OP_ERROR_GOTO(label) \
> +	XFS_SCRUB_FILE_OP_ERROR_GOTO(sc, XFS_DATA_FORK, \
> +		lblk << mp->m_sb.sb_blocklog, "dir", &error, label)
> +/* Is this free entry either in the bestfree or smaller than all of them? */
> +static inline bool
> +xfs_scrub_directory_check_free_entry(
> +	struct xfs_dir2_data_free	*bf,
> +	struct xfs_dir2_data_unused	*dup)
> +{
> +	struct xfs_dir2_data_free	*dfp;
> +	unsigned int			smallest;
> +
> +	smallest = -1U;
> +	for (dfp = &bf[0]; dfp < &bf[XFS_DIR2_DATA_FD_COUNT]; dfp++) {
> +		if (dfp->offset &&
> +		    be16_to_cpu(dfp->length) == be16_to_cpu(dup->length))
> +			return true;
> +		if (smallest > be16_to_cpu(dfp->length))
> +			smallest = be16_to_cpu(dfp->length);
> +	}
> +
> +	return be16_to_cpu(dup->length) <= smallest;
> +}
> +
> +/* Check free space info in a directory data block. */
> +STATIC int
> +xfs_scrub_directory_data_bestfree(
> +	struct xfs_scrub_context	*sc,
> +	xfs_dablk_t			lblk,
> +	bool				is_block)
> +{
> +	struct xfs_dir2_data_unused	*dup;
> +	struct xfs_dir2_data_free	*dfp;
> +	struct xfs_buf			*bp;
> +	struct xfs_dir2_data_free	*bf;
> +	struct xfs_mount		*mp = sc->mp;
> +	char				*ptr;
> +	char				*endptr;
> +	u16				tag;
> +	int				newlen;
> +	int				offset;
> +	int				error;
> +
> +	if (is_block) {
> +		/* dir block format */
> +		XFS_SCRUB_DIR_BLOCK_CHECK(lblk ==
> +				XFS_B_TO_FSBT(mp, XFS_DIR2_DATA_OFFSET));
> +		error = xfs_dir3_block_read(sc->tp, sc->ip, &bp);
> +	} else {
> +		/* dir data format */
> +		error = xfs_dir3_data_read(sc->tp, sc->ip, lblk, -1, &bp);
> +	}
> +	XFS_SCRUB_DIR_BLOCK_OP_ERROR_GOTO(out);
> +
> +	/* Do the bestfrees correspond to actual free space? */
> +	bf = sc->ip->d_ops->data_bestfree_p(bp->b_addr);
> +	for (dfp = &bf[0]; dfp < &bf[XFS_DIR2_DATA_FD_COUNT]; dfp++) {
> +		offset = be16_to_cpu(dfp->offset);
> +		XFS_SCRUB_DIR_BLOCK_GOTO(offset < BBTOB(bp->b_length), nextloop);
> +		if (!offset)
> +			continue;
> +		dup = (struct xfs_dir2_data_unused *)(bp->b_addr + offset);
> +		tag = be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup));
> +
> +		XFS_SCRUB_DIR_BLOCK_CHECK(dup->freetag ==
> +				cpu_to_be16(XFS_DIR2_DATA_FREE_TAG));
> +		XFS_SCRUB_DIR_BLOCK_CHECK(be16_to_cpu(dup->length) ==
> +				be16_to_cpu(dfp->length));
> +		XFS_SCRUB_DIR_BLOCK_CHECK(tag ==
> +				((char *)dup - (char *)bp->b_addr));
> +nextloop:;
> +	}
> +
> +	/* Make sure the bestfrees are actually the best free spaces. */
> +	ptr = (char *)sc->ip->d_ops->data_entry_p(bp->b_addr);
> +	if (is_block) {
> +		struct xfs_dir2_block_tail	*btp;
> +
> +		btp = xfs_dir2_block_tail_p(sc->mp->m_dir_geo, bp->b_addr);
> +		endptr = (char *)xfs_dir2_block_leaf_p(btp);
> +	} else
> +		endptr = (char *)bp->b_addr + BBTOB(bp->b_length);
> +	while (ptr < endptr) {
> +		dup = (struct xfs_dir2_data_unused *)ptr;
> +		/* Skip real entries */
> +		if (dup->freetag != cpu_to_be16(XFS_DIR2_DATA_FREE_TAG)) {
> +			struct xfs_dir2_data_entry	*dep;
> +
> +			dep = (struct xfs_dir2_data_entry *)ptr;
> +			newlen = sc->ip->d_ops->data_entsize(dep->namelen);
> +			XFS_SCRUB_DIR_BLOCK_GOTO(newlen > 0, out_buf);
> +			ptr += newlen;
> +			XFS_SCRUB_DIR_BLOCK_CHECK(ptr <= endptr);
> +			continue;
> +		}
> +
> +		/* Spot check this free entry */
> +		tag = be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup));
> +		XFS_SCRUB_DIR_BLOCK_CHECK(tag ==
> +				((char *)dup - (char *)bp->b_addr));
> +
> +		/*
> +		 * Either this entry is a bestfree or it's smaller than
> +		 * any of the bestfrees.
> +		 */
> +		XFS_SCRUB_DIR_BLOCK_CHECK(
> +				xfs_scrub_directory_check_free_entry(bf, dup));
> +
> +		/* Move on. */
> +		newlen = be16_to_cpu(dup->length);
> +		XFS_SCRUB_DIR_BLOCK_GOTO(newlen > 0, out_buf);
> +		ptr += newlen;
> +		XFS_SCRUB_DIR_BLOCK_CHECK(ptr <= endptr);
> +	}
> +out_buf:
> +	xfs_trans_brelse(sc->tp, bp);
> +out:
> +	return error;
> +}
> +
> +/* Is this the longest free entry in the block? */
> +static inline bool
> +xfs_scrub_directory_check_freesp(
> +	struct xfs_inode		*dp,
> +	struct xfs_buf			*dbp,
> +	unsigned int			len)
> +{
> +	struct xfs_dir2_data_free	*bf;
> +	struct xfs_dir2_data_free	*dfp;
> +	unsigned int			longest = 0;
> +	int				offset;
> +
> +	bf = dp->d_ops->data_bestfree_p(dbp->b_addr);
> +	for (dfp = &bf[0]; dfp < &bf[XFS_DIR2_DATA_FD_COUNT]; dfp++) {
> +		offset = be16_to_cpu(dfp->offset);
> +		if (!offset)
> +			continue;
> +		if (longest < be16_to_cpu(dfp->length))
> +			longest = be16_to_cpu(dfp->length);
> +	}
> +
> +	return longest == len;
> +}
> +
> +/* Check free space info in a directory leaf1 block. */
> +STATIC int
> +xfs_scrub_directory_leaf1_bestfree(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_da_args		*args,
> +	xfs_dablk_t			lblk)
> +{
> +	struct xfs_dir2_leaf_tail	*ltp;
> +	struct xfs_buf			*dbp;
> +	struct xfs_buf			*bp;
> +	struct xfs_mount		*mp = sc->mp;
> +	__be16				*bestp;
> +	__u16				best;
> +	int				i;
> +	int				error;
> +
> +	/* Read the free space block */
> +	error = xfs_dir3_leaf_read(sc->tp, sc->ip, lblk, -1, &bp);
> +	XFS_SCRUB_DIR_BLOCK_OP_ERROR_GOTO(out);
> +
> +	/* Check all the entries. */
> +	ltp = xfs_dir2_leaf_tail_p(mp->m_dir_geo, bp->b_addr);
> +	bestp = xfs_dir2_leaf_bests_p(ltp);
> +	for (i = 0; i < be32_to_cpu(ltp->bestcount); i++, bestp++) {
> +		best = be16_to_cpu(*bestp);
> +		if (best == NULLDATAOFF)
> +			continue;
> +		error = xfs_dir3_data_read(sc->tp, sc->ip,
> +				i * args->geo->fsbcount, -1, &dbp);
> +		XFS_SCRUB_DIR_BLOCK_OP_ERROR_GOTO(skip_buf);
> +		XFS_SCRUB_DIR_BLOCK_CHECK(
> +				xfs_scrub_directory_check_freesp(sc->ip, dbp,
> +					best));
> +		xfs_trans_brelse(sc->tp, dbp);
> +skip_buf:
> +		;
> +	}
> +out:
> +	return error;
> +}
> +
> +/* Check free space info in a directory freespace block. */
> +STATIC int
> +xfs_scrub_directory_free_bestfree(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_da_args		*args,
> +	xfs_dablk_t			lblk)
> +{
> +	struct xfs_dir3_icfree_hdr	freehdr;
> +	struct xfs_buf			*dbp;
> +	struct xfs_buf			*bp;
> +	struct xfs_mount		*mp = sc->mp;
> +	__be16				*bestp;
> +	__u16				best;
> +	int				i;
> +	int				error;
> +
> +	/* Read the free space block */
> +	error = xfs_dir2_free_read(sc->tp, sc->ip, lblk, &bp);
> +	XFS_SCRUB_DIR_BLOCK_OP_ERROR_GOTO(out);
> +
> +	/* Check all the entries. */
> +	sc->ip->d_ops->free_hdr_from_disk(&freehdr, bp->b_addr);
> +	bestp = sc->ip->d_ops->free_bests_p(bp->b_addr);
> +	for (i = 0; i < freehdr.nvalid; i++, bestp++) {
> +		best = be16_to_cpu(*bestp);
> +		if (best == NULLDATAOFF)
> +			continue;
> +		error = xfs_dir3_data_read(sc->tp, sc->ip,
> +				(freehdr.firstdb + i) * args->geo->fsbcount,
> +				-1, &dbp);
> +		XFS_SCRUB_DIR_BLOCK_OP_ERROR_GOTO(skip_buf);
> +		XFS_SCRUB_DIR_BLOCK_CHECK(
> +				xfs_scrub_directory_check_freesp(sc->ip, dbp,
> +					best));
> +		xfs_trans_brelse(sc->tp, dbp);
> +skip_buf:
> +		;
> +	}
> +out:
> +	return error;
> +}
> +
> +#define for_each_extent_dablk(lblk, args, got) \
> +		for ((lblk) = roundup((xfs_dablk_t)(got)->br_startoff, (args)->geo->fsbcount); \
> +		     (lblk) < (got)->br_startoff + (got)->br_blockcount; \
> +		     (lblk) += (args)->geo->fsbcount)
> +/* Check free space information in directories. */
> +STATIC int
> +xfs_scrub_directory_blocks(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_bmbt_irec		got;
> +	struct xfs_da_args		args;
> +	struct xfs_ifork		*ifp;
> +	struct xfs_mount		*mp = sc->mp;
> +	xfs_fileoff_t			leaf_lblk;
> +	xfs_fileoff_t			free_lblk;
> +	xfs_fileoff_t			lblk;
> +	xfs_extnum_t			idx;
> +	bool				found;
> +	int				is_block = 0;
> +	int				error;
> +
> +	/* Ignore local format directories. */
> +	if (sc->ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS &&
> +	    sc->ip->i_d.di_format != XFS_DINODE_FMT_BTREE)
> +		return 0;
> +
> +	ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK);
> +	lblk = XFS_B_TO_FSB(mp, XFS_DIR2_DATA_OFFSET);
> +	leaf_lblk = XFS_B_TO_FSB(mp, XFS_DIR2_LEAF_OFFSET);
> +	free_lblk = XFS_B_TO_FSB(mp, XFS_DIR2_FREE_OFFSET);
> +
> +	/* Is this a block dir? */
> +	args.dp = sc->ip;
> +	args.geo = mp->m_dir_geo;
> +	args.trans = sc->tp;
> +	error = xfs_dir2_isblock(&args, &is_block);
> +	XFS_SCRUB_DIR_BLOCK_OP_ERROR_GOTO(out);
> +
> +	/* Iterate all the data extents in the directory... */
> +	found = xfs_iext_lookup_extent(sc->ip, ifp, lblk, &idx, &got);
> +	while (found) {
> +		/* No more data blocks... */
> +		if (got.br_startoff >= leaf_lblk)
> +			break;
> +
> +		/* Check each data block's bestfree data */
> +		for_each_extent_dablk(lblk, &args, &got) {
> +			error = xfs_scrub_directory_data_bestfree(sc, lblk,
> +					is_block);
> +			if (error)
> +				goto out;
> +		}
> +
> +		found = xfs_iext_get_extent(ifp, ++idx, &got);
> +	}
> +
> +	/* Look for a leaf1 block, which has free info. */
> +	if (xfs_iext_lookup_extent(sc->ip, ifp, leaf_lblk, &idx, &got) &&
> +	    got.br_startoff == leaf_lblk &&
> +	    got.br_blockcount == args.geo->fsbcount &&
> +	    !xfs_iext_get_extent(ifp, ++idx, &got)) {
> +		XFS_SCRUB_DIR_BLOCK_GOTO(!is_block, not_leaf1);
> +		error = xfs_scrub_directory_leaf1_bestfree(sc, &args,
> +				leaf_lblk);
> +		if (error)
> +			goto out;
> +	}
> +not_leaf1:
> +
> +	/* Scan for free blocks */
> +	lblk = free_lblk;
> +	found = xfs_iext_lookup_extent(sc->ip, ifp, lblk, &idx, &got);
> +	while (found) {
> +		/*
> +		 * Dirs can't have blocks mapped above 2^32.
> +		 * Single-block dirs shouldn't even be here.
> +		 */
> +		lblk = got.br_startoff;
> +		XFS_SCRUB_DIR_BLOCK_GOTO(!(lblk & ~((1ULL << 32) - 1ULL)), out);
> +		XFS_SCRUB_DIR_BLOCK_GOTO(!is_block, nextfree);
> +
> +		/* Check each dir free block's bestfree data */
> +		for_each_extent_dablk(lblk, &args, &got) {
> +			error = xfs_scrub_directory_free_bestfree(sc, &args,
> +					lblk);
> +			if (error)
> +				goto out;
> +		}
> +
> +nextfree:
> +		found = xfs_iext_get_extent(ifp, ++idx, &got);
> +	}
> +out:
> +	return error;
> +}
> +#undef for_each_extent_dablk
> +#undef XFS_SCRUB_DIR_BLOCK_OP_ERROR_GOTO
> +#undef XFS_SCRUB_DIR_BLOCK_CHECK
> +
>  /* Scrub a whole directory. */
>  int
>  xfs_scrub_directory(
> @@ -258,6 +590,11 @@ xfs_scrub_directory(
>  	if (error)
>  		return error;
>
> +	/* Check the freespace. */
> +	error = xfs_scrub_directory_blocks(sc);
> +	if (error)
> +		return error;
> +
>  	/* Check that every dirent we see can also be looked up by hash. */
>  	bufsize = (size_t)min_t(loff_t, 32768, sc->ip->i_d.di_size);
>  	sdc.sc = sc;
>
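
For readers following the review, here is a minimal userspace sketch of the
invariant that xfs_scrub_directory_check_free_entry() is meant to enforce: a
free region is acceptable if it has the same length as an in-use bestfree
slot, or if it is no longer than the shortest slot.  The struct, the function
name, and the CPU-endian fields are illustrative assumptions, not the kernel's
definitions.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BESTFREE_COUNT	3	/* stand-in for XFS_DIR2_DATA_FD_COUNT */

/* Hypothetical CPU-endian stand-in for struct xfs_dir2_data_free. */
struct bestfree_slot {
	uint16_t	offset;	/* 0 means the slot is unused */
	uint16_t	length;
};

/*
 * Return true if a free region of free_len bytes is consistent with the
 * bestfree table: it either matches an in-use slot, or it is no longer
 * than the shortest slot (unused slots count as length 0).
 */
bool
free_entry_ok(
	const struct bestfree_slot	bf[BESTFREE_COUNT],
	uint16_t			free_len)
{
	unsigned int			smallest = UINT_MAX;
	size_t				i;

	for (i = 0; i < BESTFREE_COUNT; i++) {
		if (bf[i].offset && bf[i].length == free_len)
			return true;
		/* Track the minimum slot length seen so far. */
		if (bf[i].length < smallest)
			smallest = bf[i].length;
	}

	return free_len <= smallest;
}
```

With slots of length 64, 32 and 16, a 48-byte free region is rejected by this
sketch: it is longer than the shortest slot, so it should have displaced that
slot in the table.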

* Re: [PATCH 18/22] xfs: scrub extended attributes
  2017-07-21  4:40 ` [PATCH 18/22] xfs: scrub extended attributes Darrick J. Wong
@ 2017-07-23 17:57   ` Allison Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 17:57 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

On 7/20/2017 9:40 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Scrub the hash tree, keys, and values in an extended attribute structure.
> Refactor the attribute code to use the transaction if the caller supplied
> one to avoid buffer deadlocks.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile        |    1
>  fs/xfs/libxfs/xfs_fs.h |    3 -
>  fs/xfs/scrub/attr.c    |  212 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/common.c  |    8 ++
>  fs/xfs/scrub/common.h  |    3 +
>  fs/xfs/xfs_trace.h     |    3 -
>  6 files changed, 228 insertions(+), 2 deletions(-)
>  create mode 100644 fs/xfs/scrub/attr.c
>
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index c568d0d..da64bef 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -142,6 +142,7 @@ ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
>  xfs-y				+= $(addprefix scrub/, \
>  				   agheader.o \
>  				   alloc.o \
> +				   attr.o \
>  				   bmap.o \
>  				   btree.o \
>  				   common.o \
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index e646b5f..2f553ed 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -498,7 +498,8 @@ struct xfs_scrub_metadata {
>  #define XFS_SCRUB_TYPE_BMBTA	14	/* attr fork block mapping */
>  #define XFS_SCRUB_TYPE_BMBTC	15	/* CoW fork block mapping */
>  #define XFS_SCRUB_TYPE_DIR	16	/* directory */
> -#define XFS_SCRUB_TYPE_MAX	16
> +#define XFS_SCRUB_TYPE_XATTR	17	/* extended attribute */
> +#define XFS_SCRUB_TYPE_MAX	17
>
>  /* i: repair this metadata */
>  #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
> new file mode 100644
> index 0000000..f6a4b59
> --- /dev/null
> +++ b/fs/xfs/scrub/attr.c
> @@ -0,0 +1,212 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_sb.h"
> +#include "xfs_inode.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
> +#include "xfs_dir2.h"
> +#include "xfs_attr.h"
> +#include "xfs_attr_leaf.h"
> +#include "scrub/common.h"
> +#include "scrub/dabtree.h"
> +
> +#include <linux/posix_acl_xattr.h>
> +#include <linux/xattr.h>
> +
> +/* Set us up to scrub an inode's extended attributes. */
> +int
> +xfs_scrub_setup_xattr(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	/* Allocate the buffer without the inode lock held. */
> +	sc->buf = kmem_zalloc_large(XATTR_SIZE_MAX, KM_SLEEP);
> +	if (!sc->buf)
> +		return -ENOMEM;
> +
> +	return xfs_scrub_setup_inode_contents(sc, ip, 0);
> +}
> +
> +/* Extended Attributes */
> +
> +struct xfs_scrub_xattr {
> +	struct xfs_attr_list_context	context;
> +	struct xfs_scrub_context	*sc;
> +};
> +
> +#define XFS_SCRUB_ATTR_CHECK(fs_ok) \
> +	XFS_SCRUB_DATA_CHECK(sx->sc, XFS_ATTR_FORK, args.blkno, "attr", fs_ok)
> +#define XFS_SCRUB_ATTR_OP_ERROR_GOTO(label) \
> +	XFS_SCRUB_FILE_OP_ERROR_GOTO(sx->sc, XFS_ATTR_FORK, args.blkno, "attr", &error, label)
> +/* Check that an extended attribute key can be looked up by hash. */
> +static void
> +xfs_scrub_xattr_listent(
> +	struct xfs_attr_list_context	*context,
> +	int				flags,
> +	unsigned char			*name,
> +	int				namelen,
> +	int				valuelen)
> +{
> +	struct xfs_scrub_xattr		*sx;
> +	struct xfs_da_args		args = {0};
> +	int				error = 0;
> +
> +	sx = container_of(context, struct xfs_scrub_xattr, context);
> +
> +	args.flags = ATTR_KERNOTIME;
> +	if (flags & XFS_ATTR_ROOT)
> +		args.flags |= ATTR_ROOT;
> +	else if (flags & XFS_ATTR_SECURE)
> +		args.flags |= ATTR_SECURE;
> +	args.geo = context->dp->i_mount->m_attr_geo;
> +	args.whichfork = XFS_ATTR_FORK;
> +	args.dp = context->dp;
> +	args.name = name;
> +	args.namelen = namelen;
> +	args.hashval = xfs_da_hashname(args.name, args.namelen);
> +	args.trans = context->tp;
> +	args.value = sx->sc->buf;
> +	args.valuelen = XATTR_SIZE_MAX;
> +
> +	error = xfs_attr_get_ilocked(context->dp, &args);
> +	if (error == -EEXIST)
> +		error = 0;
> +	XFS_SCRUB_ATTR_OP_ERROR_GOTO(fail_xref);
> +	XFS_SCRUB_ATTR_CHECK(args.valuelen == valuelen);
> +
> +fail_xref:
> +	return;
> +}
> +#undef XFS_SCRUB_ATTR_OP_ERROR_GOTO
> +#undef XFS_SCRUB_ATTR_CHECK
> +
> +/* Scrub an attribute btree record. */
> +STATIC int
> +xfs_scrub_xattr_rec(
> +	struct xfs_scrub_da_btree	*ds,
> +	int				level,
> +	void				*rec)
> +{
> +	struct xfs_mount		*mp = ds->state->mp;
> +	struct xfs_attr_leaf_entry	*ent = rec;
> +	struct xfs_da_state_blk		*blk;
> +	struct xfs_attr_leaf_name_local	*lentry;
> +	struct xfs_attr_leaf_name_remote	*rentry;
> +	struct xfs_buf			*bp;
> +	xfs_dahash_t			calc_hash;
> +	xfs_dahash_t			hash;
> +	int				nameidx;
> +	int				hdrsize;
> +	unsigned int			badflags;
> +	int				error;
> +
> +	blk = &ds->state->path.blk[level];
> +
> +	/* Check the hash of the entry. */
> +	error = xfs_scrub_da_btree_hash(ds, level, &ent->hashval);
> +	if (error)
> +		goto out;
> +
> +	/* Find the attr entry's location. */
> +	bp = blk->bp;
> +	hdrsize = xfs_attr3_leaf_hdr_size(bp->b_addr);
> +	nameidx = be16_to_cpu(ent->nameidx);
> +	XFS_SCRUB_DA_GOTO(ds, nameidx >= hdrsize, out);
> +	XFS_SCRUB_DA_GOTO(ds, nameidx < mp->m_attr_geo->blksize, out);
> +
> +	/* Retrieve the entry and check it. */
> +	hash = be32_to_cpu(ent->hashval);
> +	badflags = ~(XFS_ATTR_LOCAL | XFS_ATTR_ROOT | XFS_ATTR_SECURE |
> +			XFS_ATTR_INCOMPLETE);
> +	XFS_SCRUB_DA_CHECK(ds, (ent->flags & badflags) == 0);
> +	if (ent->flags & XFS_ATTR_LOCAL) {
> +		lentry = (struct xfs_attr_leaf_name_local *)
> +				(((char *)bp->b_addr) + nameidx);
> +		XFS_SCRUB_DA_GOTO(ds, lentry->namelen < MAXNAMELEN, out);
> +		calc_hash = xfs_da_hashname(lentry->nameval, lentry->namelen);
> +	} else {
> +		rentry = (struct xfs_attr_leaf_name_remote *)
> +				(((char *)bp->b_addr) + nameidx);
> +		XFS_SCRUB_DA_GOTO(ds, rentry->namelen < MAXNAMELEN, out);
> +		calc_hash = xfs_da_hashname(rentry->name, rentry->namelen);
> +	}
> +	XFS_SCRUB_DA_CHECK(ds, calc_hash == hash);
> +
> +out:
> +	return error;
> +}
> +
> +/* Scrub the extended attribute metadata. */
> +int
> +xfs_scrub_xattr(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_scrub_xattr		sx = { 0 };
> +	struct attrlist_cursor_kern	cursor = { 0 };
> +	struct xfs_mount		*mp = sc->mp;
> +	int				error = 0;
> +
> +	if (!xfs_inode_hasattr(sc->ip))
> +		return -ENOENT;
> +
> +	memset(&sx, 0, sizeof(sx));
> +	/* Check attribute tree structure */
> +	error = xfs_scrub_da_btree(sc, XFS_ATTR_FORK, xfs_scrub_xattr_rec);
> +	if (error)
> +		goto out;
> +
> +	/* Check that every attr key can also be looked up by hash. */
> +	sx.context.dp = sc->ip;
> +	sx.context.cursor = &cursor;
> +	sx.context.resynch = 1;
> +	sx.context.put_listent = xfs_scrub_xattr_listent;
> +	sx.context.tp = sc->tp;
> +	sx.sc = sc;
> +
> +	/*
> +	 * Look up every xattr in this file by name.
> +	 *
> +	 * The VFS only locks i_rwsem when modifying attrs, so keep all
> +	 * three locks held because that's the only way to ensure we're
> +	 * the only thread poking into the da btree.  We traverse the da
> +	 * btree while holding a leaf buffer locked for the xattr name
> +	 * iteration, which doesn't really follow the usual buffer
> +	 * locking order.
> +	 */
> +	error = xfs_attr_list_int_ilocked(&sx.context);
> +	XFS_SCRUB_OP_ERROR_GOTO(sc,
> +			XFS_INO_TO_AGNO(mp, sc->ip->i_ino),
> +			XFS_INO_TO_AGBNO(mp, sc->ip->i_ino),
> +			"inode", &error, out);
> +out:
> +	return error;
> +}
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 92627e9..a47c654 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -686,6 +686,10 @@ xfs_scrub_teardown(
>  			IRELE(sc->ip);
>  		sc->ip = NULL;
>  	}
> +	if (sc->buf) {
> +		kmem_free(sc->buf);
> +		sc->buf = NULL;
> +	}
>  	return error;
>  }
>
> @@ -844,6 +848,10 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
>  		.setup	= xfs_scrub_setup_directory,
>  		.scrub	= xfs_scrub_directory,
>  	},
> +	{ /* extended attributes */
> +		.setup	= xfs_scrub_setup_xattr,
> +		.scrub	= xfs_scrub_xattr,
> +	},
>  };
>
>  /* Dispatch metadata scrubbing. */
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index 7baaa2d..1cfe0cc 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -52,6 +52,7 @@ struct xfs_scrub_context {
>  	const struct xfs_scrub_meta_fns	*fns;
>  	struct xfs_trans		*tp;
>  	struct xfs_inode		*ip;
> +	void				*buf;
>  	uint				ilock_flags;
>  	bool				try_harder;
>
> @@ -221,6 +222,7 @@ SETUP_FN(xfs_scrub_setup_inode);
>  SETUP_FN(xfs_scrub_setup_inode_bmap_data);
>  SETUP_FN(xfs_scrub_setup_inode_bmap);
>  SETUP_FN(xfs_scrub_setup_directory);
> +SETUP_FN(xfs_scrub_setup_xattr);
>  #undef SETUP_FN
>
>  /* Metadata scrubbers */
> @@ -243,6 +245,7 @@ SCRUB_FN(xfs_scrub_bmap_data);
>  SCRUB_FN(xfs_scrub_bmap_attr);
>  SCRUB_FN(xfs_scrub_bmap_cow);
>  SCRUB_FN(xfs_scrub_directory);
> +SCRUB_FN(xfs_scrub_xattr);
>  #undef SCRUB_FN
>
>  #endif	/* __XFS_REPAIR_COMMON_H__ */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index ccd27ec..fe4b313 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3328,7 +3328,8 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
>  	{ XFS_SCRUB_TYPE_BMBTD,		"bmapbtd" }, \
>  	{ XFS_SCRUB_TYPE_BMBTA,		"bmapbta" }, \
>  	{ XFS_SCRUB_TYPE_BMBTC,		"bmapbtc" }, \
> -	{ XFS_SCRUB_TYPE_DIR,		"dir" }
> +	{ XFS_SCRUB_TYPE_DIR,		"dir" }, \
> +	{ XFS_SCRUB_TYPE_XATTR,		"xattr" }
>  DECLARE_EVENT_CLASS(xfs_scrub_class,
>  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
>  		 int error),
>
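
The per-record checks in xfs_scrub_xattr_rec() boil down to three tests: the
name offset lies inside the leaf block past the header, no unknown flag bits
are set, and the hash recomputed from the name matches the stored hash.  A
rough userspace sketch follows.  Everything prefixed demo_ or DEMO_ is
hypothetical, and the hash function is a generic FNV-1a stand-in chosen only
to keep the example self-contained; it is not xfs_da_hashname().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits standing in for the XFS_ATTR_* on-disk flags. */
#define DEMO_ATTR_LOCAL		(1u << 0)
#define DEMO_ATTR_ROOT		(1u << 1)
#define DEMO_ATTR_SECURE	(1u << 2)
#define DEMO_ATTR_INCOMPLETE	(1u << 7)
#define DEMO_ATTR_KNOWN		(DEMO_ATTR_LOCAL | DEMO_ATTR_ROOT | \
				 DEMO_ATTR_SECURE | DEMO_ATTR_INCOMPLETE)

/* Hypothetical CPU-endian stand-in for struct xfs_attr_leaf_entry. */
struct demo_attr_entry {
	uint32_t	hashval;
	uint16_t	nameidx;
	uint8_t		flags;
};

/* Stand-in name hash (FNV-1a).  This is NOT the XFS hash function. */
uint32_t
demo_hashname(
	const uint8_t	*name,
	size_t		namelen)
{
	uint32_t	hash = 2166136261u;
	size_t		i;

	for (i = 0; i < namelen; i++) {
		hash ^= name[i];
		hash *= 16777619u;
	}
	return hash;
}

/* Apply the three record checks described above. */
bool
demo_attr_entry_ok(
	const struct demo_attr_entry	*ent,
	const uint8_t			*name,
	size_t				namelen,
	size_t				hdrsize,
	size_t				blksize)
{
	/* The name must start after the header and inside the block. */
	if (ent->nameidx < hdrsize || ent->nameidx >= blksize)
		return false;

	/* No flag bits outside the known set. */
	if (ent->flags & ~DEMO_ATTR_KNOWN)
		return false;

	/* The stored hash must match the hash of the name. */
	return demo_hashname(name, namelen) == ent->hashval;
}
```

In the patch each failed test only marks the metadata as corrupt and the
scrub carries on; the sketch collapses that into a single boolean for
brevity.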

* Re: [PATCH 19/22] xfs: scrub symbolic links
  2017-07-21  4:40 ` [PATCH 19/22] xfs: scrub symbolic links Darrick J. Wong
@ 2017-07-23 17:59   ` Allison Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 17:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Looks good.  Looks like you caught some bugs along the way
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

On 7/20/2017 9:40 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Create the infrastructure to scrub symbolic link data.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile        |    1 +
>  fs/xfs/libxfs/xfs_fs.h |    3 +-
>  fs/xfs/scrub/common.c  |    4 ++
>  fs/xfs/scrub/common.h  |    2 +
>  fs/xfs/scrub/symlink.c |   94 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_trace.h     |    3 +-
>  6 files changed, 105 insertions(+), 2 deletions(-)
>  create mode 100644 fs/xfs/scrub/symlink.c
>
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index da64bef..3d862b9 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -153,5 +153,6 @@ xfs-y				+= $(addprefix scrub/, \
>  				   metabufs.o \
>  				   refcount.o \
>  				   rmap.o \
> +				   symlink.o \
>  				   )
>  endif
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 2f553ed..95d9ce9 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -499,7 +499,8 @@ struct xfs_scrub_metadata {
>  #define XFS_SCRUB_TYPE_BMBTC	15	/* CoW fork block mapping */
>  #define XFS_SCRUB_TYPE_DIR	16	/* directory */
>  #define XFS_SCRUB_TYPE_XATTR	17	/* extended attribute */
> -#define XFS_SCRUB_TYPE_MAX	17
> +#define XFS_SCRUB_TYPE_SYMLINK	18	/* symbolic link */
> +#define XFS_SCRUB_TYPE_MAX	18
>
>  /* i: repair this metadata */
>  #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index a47c654..4003c2f 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -852,6 +852,10 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
>  		.setup	= xfs_scrub_setup_xattr,
>  		.scrub	= xfs_scrub_xattr,
>  	},
> +	{ /* symbolic link */
> +		.setup	= xfs_scrub_setup_symlink,
> +		.scrub	= xfs_scrub_symlink,
> +	},
>  };
>
>  /* Dispatch metadata scrubbing. */
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index 1cfe0cc..6d02a64 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -223,6 +223,7 @@ SETUP_FN(xfs_scrub_setup_inode_bmap_data);
>  SETUP_FN(xfs_scrub_setup_inode_bmap);
>  SETUP_FN(xfs_scrub_setup_directory);
>  SETUP_FN(xfs_scrub_setup_xattr);
> +SETUP_FN(xfs_scrub_setup_symlink);
>  #undef SETUP_FN
>
>  /* Metadata scrubbers */
> @@ -246,6 +247,7 @@ SCRUB_FN(xfs_scrub_bmap_attr);
>  SCRUB_FN(xfs_scrub_bmap_cow);
>  SCRUB_FN(xfs_scrub_directory);
>  SCRUB_FN(xfs_scrub_xattr);
> +SCRUB_FN(xfs_scrub_symlink);
>  #undef SCRUB_FN
>
>  #endif	/* __XFS_REPAIR_COMMON_H__ */
> diff --git a/fs/xfs/scrub/symlink.c b/fs/xfs/scrub/symlink.c
> new file mode 100644
> index 0000000..75537e9
> --- /dev/null
> +++ b/fs/xfs/scrub/symlink.c
> @@ -0,0 +1,94 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_sb.h"
> +#include "xfs_inode.h"
> +#include "xfs_inode_fork.h"
> +#include "xfs_symlink.h"
> +#include "scrub/common.h"
> +
> +/* Set us up to scrub a symbolic link. */
> +int
> +xfs_scrub_setup_symlink(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	/* Allocate the buffer without the inode lock held. */
> +	sc->buf = kmem_zalloc_large(XFS_SYMLINK_MAXLEN + 1, KM_SLEEP);
> +	if (!sc->buf)
> +		return -ENOMEM;
> +
> +	return xfs_scrub_setup_inode_contents(sc, ip, 0);
> +}
> +
> +/* Symbolic links. */
> +
> +#define XFS_SCRUB_SYMLINK_CHECK(fs_ok) \
> +	XFS_SCRUB_INO_CHECK(sc, ip->i_ino, NULL, "symlink", fs_ok)
> +#define XFS_SCRUB_SYMLINK_GOTO(fs_ok, label) \
> +	XFS_SCRUB_INO_GOTO(sc, ip->i_ino, NULL, "symlink", fs_ok, label)
> +int
> +xfs_scrub_symlink(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_inode		*ip = sc->ip;
> +	struct xfs_ifork		*ifp;
> +	loff_t				len;
> +	int				error = 0;
> +
> +	if (!S_ISLNK(VFS_I(ip)->i_mode))
> +		return -ENOENT;
> +	ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
> +	len = ip->i_d.di_size;
> +
> +	/* Plausible size? */
> +	XFS_SCRUB_SYMLINK_GOTO(len <= XFS_SYMLINK_MAXLEN, out);
> +	XFS_SCRUB_SYMLINK_GOTO(len > 0, out);
> +
> +	/* Inline symlink? */
> +	if (ifp->if_flags & XFS_IFINLINE) {
> +		XFS_SCRUB_SYMLINK_GOTO(len > 0, out);
> +		XFS_SCRUB_SYMLINK_CHECK(len <= XFS_IFORK_DSIZE(ip));
> +		XFS_SCRUB_SYMLINK_CHECK(len <= strnlen(ifp->if_u1.if_data,
> +				XFS_IFORK_DSIZE(ip)));
> +		goto out;
> +	}
> +
> +	/* Remote symlink; must read the contents. */
> +	error = xfs_readlink_bmap_ilocked(sc->ip, sc->buf);
> +	XFS_SCRUB_FILE_OP_ERROR_GOTO(sc, XFS_DATA_FORK, 0, "symlink",
> +			&error, out);
> +	XFS_SCRUB_SYMLINK_CHECK(len <= strnlen(sc->buf, XFS_SYMLINK_MAXLEN));
> +out:
> +	return error;
> +}
> +#undef XFS_SCRUB_SYMLINK_GOTO
> +#undef XFS_SCRUB_SYMLINK_CHECK
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index fe4b313..39824f8 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3329,7 +3329,8 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
>  	{ XFS_SCRUB_TYPE_BMBTA,		"bmapbta" }, \
>  	{ XFS_SCRUB_TYPE_BMBTC,		"bmapbtc" }, \
>  	{ XFS_SCRUB_TYPE_DIR,		"dir" }, \
> -	{ XFS_SCRUB_TYPE_XATTR,		"xattr" }
> +	{ XFS_SCRUB_TYPE_XATTR,		"xattr" }, \
> +	{ XFS_SCRUB_TYPE_SYMLINK,	"symlink" }
>  DECLARE_EVENT_CLASS(xfs_scrub_class,
>  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
>  		 int error),
>
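
The symlink checks reduce to two conditions: the recorded size is plausible
(greater than zero and no larger than the maximum), and the stored target
holds at least that many bytes before its first NUL.  A small userspace
sketch under those assumptions is below.  demo_symlink_ok() and
DEMO_SYMLINK_MAXLEN are hypothetical names, and the 1024-byte limit is
assumed to mirror XFS_SYMLINK_MAXLEN.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SYMLINK_MAXLEN	1024	/* assumed value of XFS_SYMLINK_MAXLEN */

/*
 * Return true if a symlink whose inode records a size of len bytes is
 * consistent with the target stored in a buffer of bufsize bytes.
 */
bool
demo_symlink_ok(
	const char	*target,
	size_t		bufsize,
	long long	len)
{
	const char	*nul;
	size_t		stored;

	/* Plausible size? */
	if (len <= 0 || len > DEMO_SYMLINK_MAXLEN)
		return false;

	/* Bounded length of the stored target, like strnlen(). */
	nul = memchr(target, '\0', bufsize);
	stored = nul ? (size_t)(nul - target) : bufsize;

	/* The target must not be shorter than the recorded size. */
	return (size_t)len <= stored;
}
```

For example, a recorded size of 9 against the 8-byte target "/usr/lib" fails
the second condition, which is the truncated-target case the scrubber looks
for.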

* Re: [PATCH 20/22] xfs: scrub parent pointers
  2017-07-21  4:40 ` [PATCH 20/22] xfs: scrub parent pointers Darrick J. Wong
@ 2017-07-23 18:03   ` Allison Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 18:03 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 7/20/2017 9:40 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Scrub parent pointers, sort of.  For directories, we can ride the
> '..' entry up to the parent to confirm that there's at most one
> dentry that points back to this directory.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile        |    1
>  fs/xfs/libxfs/xfs_fs.h |    3 -
>  fs/xfs/scrub/common.c  |    4 +
>  fs/xfs/scrub/common.h  |    2
>  fs/xfs/scrub/parent.c  |  252 ++++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 261 insertions(+), 1 deletion(-)
>  create mode 100644 fs/xfs/scrub/parent.c
>
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 3d862b9..e73cdc2 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -151,6 +151,7 @@ xfs-y				+= $(addprefix scrub/, \
>  				   ialloc.o \
>  				   inode.o \
>  				   metabufs.o \
> +				   parent.o \
>  				   refcount.o \
>  				   rmap.o \
>  				   symlink.o \
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 95d9ce9..4ad3056 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -500,7 +500,8 @@ struct xfs_scrub_metadata {
>  #define XFS_SCRUB_TYPE_DIR	16	/* directory */
>  #define XFS_SCRUB_TYPE_XATTR	17	/* extended attribute */
>  #define XFS_SCRUB_TYPE_SYMLINK	18	/* symbolic link */
> -#define XFS_SCRUB_TYPE_MAX	18
> +#define XFS_SCRUB_TYPE_PARENT	19	/* parent pointers */
> +#define XFS_SCRUB_TYPE_MAX	19
>
>  /* i: repair this metadata */
>  #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 4003c2f..6f701c6 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -856,6 +856,10 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
>  		.setup	= xfs_scrub_setup_symlink,
>  		.scrub	= xfs_scrub_symlink,
>  	},
> +	{ /* parent pointers */
> +		.setup	= xfs_scrub_setup_parent,
> +		.scrub	= xfs_scrub_parent,
> +	},
>  };
>
>  /* Dispatch metadata scrubbing. */
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index 6d02a64..1873a31 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -223,6 +223,7 @@ SETUP_FN(xfs_scrub_setup_inode_bmap_data);
>  SETUP_FN(xfs_scrub_setup_inode_bmap);
>  SETUP_FN(xfs_scrub_setup_directory);
>  SETUP_FN(xfs_scrub_setup_xattr);
> +SETUP_FN(xfs_scrub_setup_parent);
>  SETUP_FN(xfs_scrub_setup_symlink);
>  #undef SETUP_FN
>
> @@ -247,6 +248,7 @@ SCRUB_FN(xfs_scrub_bmap_attr);
>  SCRUB_FN(xfs_scrub_bmap_cow);
>  SCRUB_FN(xfs_scrub_directory);
>  SCRUB_FN(xfs_scrub_xattr);
> +SCRUB_FN(xfs_scrub_parent);
>  SCRUB_FN(xfs_scrub_symlink);
>  #undef SCRUB_FN
>
> diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
> new file mode 100644
> index 0000000..e604885
> --- /dev/null
> +++ b/fs/xfs/scrub/parent.c
> @@ -0,0 +1,252 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_sb.h"
> +#include "xfs_inode.h"
> +#include "xfs_icache.h"
> +#include "xfs_dir2.h"
> +#include "xfs_dir2_priv.h"
> +#include "scrub/common.h"
> +
> +/* Set us up to scrub parents. */
> +int
> +xfs_scrub_setup_parent(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	return xfs_scrub_setup_inode_contents(sc, ip, 0);
> +}
> +
> +/* Parent pointers */
> +
> +/* Look for an entry in a parent pointing to this inode. */
> +
> +struct xfs_scrub_parent_ctx {
> +	struct dir_context		dc;
> +	xfs_ino_t			ino;
> +	xfs_nlink_t			nr;
> +};
> +
> +/* Look for a single entry in a directory pointing to an inode. */
> +STATIC int
> +xfs_scrub_parent_actor(
> +	struct dir_context		*dc,
> +	const char			*name,
> +	int				namelen,
> +	loff_t				pos,
> +	u64				ino,
> +	unsigned			type)
> +{
> +	struct xfs_scrub_parent_ctx	*spc;
> +
> +	spc = container_of(dc, struct xfs_scrub_parent_ctx, dc);
> +	if (spc->ino == ino)
> +		spc->nr++;
> +	return 0;
> +}
> +
> +/* Count the number of dentries in the parent dir that point to this inode. */
> +STATIC int
> +xfs_scrub_parent_count_parent_dentries(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*parent,
> +	xfs_nlink_t			*nr)
> +{
> +	struct xfs_scrub_parent_ctx	spc = {
> +		.dc.actor = xfs_scrub_parent_actor,
> +		.dc.pos = 0,
> +		.ino = sc->ip->i_ino,
> +		.nr = 0,
> +	};
> +	struct xfs_ifork		*ifp;
> +	size_t				bufsize;
> +	loff_t				oldpos;
> +	uint				lock_mode;
> +	int				error;
> +
> +	/*
> +	 * Load the parent directory's extent map.  A regular directory
> +	 * open would start readahead (and thus load the extent map)
> +	 * before we even got to a readdir call, but this isn't
> +	 * guaranteed here.
> +	 */
> +	lock_mode = xfs_ilock_data_map_shared(parent);
> +	ifp = XFS_IFORK_PTR(parent, XFS_DATA_FORK);
> +	if (XFS_IFORK_FORMAT(parent, XFS_DATA_FORK) == XFS_DINODE_FMT_BTREE &&
> +	    !(ifp->if_flags & XFS_IFEXTENTS)) {
> +		error = xfs_iread_extents(sc->tp, parent, XFS_DATA_FORK);
> +		if (error) {
> +			xfs_iunlock(parent, lock_mode);
> +			return error;
> +		}
> +	}
> +	xfs_iunlock(parent, lock_mode);
> +
> +	/*
> +	 * Iterate the parent dir to confirm that there is
> +	 * exactly one entry pointing back to the inode being
> +	 * scanned.
> +	 */
> +	bufsize = (size_t)min_t(loff_t, 32768, parent->i_d.di_size);
Where does the 32768 come from?  A max number of entries for the parent 
maybe? Rest looks good otherwise.
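
For reference, the quoted expression clamps a byte count rather than an
entry count: di_size is the directory's size in bytes, so the value handed
to xfs_readdir() is never more than 32 KiB. I believe the regular readdir
path uses the same cap as a buffer-size estimate, but treat that as an
assumption rather than something this patch states. A minimal userspace
sketch of the clamp, using a hypothetical helper name (readdir_bufsize):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace helper mirroring
 * min_t(loff_t, 32768, parent->i_d.di_size) from the quoted patch.
 * The input is a directory size in bytes, not a number of entries.
 */
static size_t readdir_bufsize(int64_t dir_size_bytes)
{
	const int64_t cap = 32768;	/* 32 KiB upper bound */

	return (size_t)(dir_size_bytes < cap ? dir_size_bytes : cap);
}
```

With this, a 4096-byte directory gets a 4096-byte estimate, while a
1 MiB directory is capped at 32768.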

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> +	oldpos = 0;
> +	while (true) {
> +		error = xfs_readdir(sc->tp, parent, &spc.dc, bufsize);
> +		if (error)
> +			goto out;
> +		if (oldpos == spc.dc.pos)
> +			break;
> +		oldpos = spc.dc.pos;
> +	}
> +	*nr = spc.nr;
> +out:
> +	return error;
> +}
> +
> +/* Scrub a parent pointer. */
> +#define XFS_SCRUB_PARENT_CHECK(fs_ok) \
> +	XFS_SCRUB_DATA_CHECK(sc, XFS_DATA_FORK, 0, "parent", fs_ok)
> +#define XFS_SCRUB_PARENT_GOTO(fs_ok, label) \
> +	XFS_SCRUB_DATA_GOTO(sc, XFS_DATA_FORK, 0, "parent", fs_ok, label)
> +#define XFS_SCRUB_PARENT_OP_ERROR_GOTO(label) \
> +	XFS_SCRUB_FILE_OP_ERROR_GOTO(sc, XFS_DATA_FORK, 0, "parent", \
> +		&error, label)
> +int
> +xfs_scrub_parent(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_inode		*dp = NULL;
> +	xfs_ino_t			dnum;
> +	xfs_nlink_t			nr;
> +	int				tries = 0;
> +	int				error;
> +
> +	/*
> +	 * If we're a directory, check that the '..' link points up to
> +	 * a directory that has one entry pointing to us.
> +	 */
> +	if (!S_ISDIR(VFS_I(sc->ip)->i_mode))
> +		return -ENOENT;
> +
> +	/*
> +	 * The VFS grabs a read or write lock via i_rwsem before it reads
> +	 * or writes to a directory.  If we've gotten this far we've
> +	 * already obtained IOLOCK_EXCL, which (since 4.10) is the same as
> +	 * getting a write lock on i_rwsem.  Therefore, it is safe for us
> +	 * to drop the ILOCK here in order to do directory lookups.
> +	 */
> +	sc->ilock_flags &= ~(XFS_ILOCK_EXCL | XFS_MMAPLOCK_EXCL);
> +	xfs_iunlock(sc->ip, XFS_ILOCK_EXCL | XFS_MMAPLOCK_EXCL);
> +
> +	/* Look up '..' */
> +	error = xfs_dir_lookup(sc->tp, sc->ip, &xfs_name_dotdot, &dnum, NULL);
> +	XFS_SCRUB_PARENT_OP_ERROR_GOTO(out);
> +
> +	/* Is this the root dir?  Then '..' must point to itself. */
> +	if (sc->ip == mp->m_rootip) {
> +		XFS_SCRUB_PARENT_CHECK(sc->ip->i_ino == mp->m_sb.sb_rootino);
> +		XFS_SCRUB_PARENT_CHECK(dnum == sc->ip->i_ino);
> +		return 0;
> +	}
> +
> +try_again:
> +	/* Otherwise, '..' must not point to ourselves. */
> +	XFS_SCRUB_PARENT_GOTO(sc->ip->i_ino != dnum, out);
> +
> +	error = xfs_iget(mp, sc->tp, dnum, 0, 0, &dp);
> +	XFS_SCRUB_PARENT_OP_ERROR_GOTO(out);
> +	XFS_SCRUB_PARENT_GOTO(dp != sc->ip, out_rele);
> +
> +	/*
> +	 * We prefer to keep the inode locked while we lock and search
> +	 * its alleged parent for a forward reference.  However, this
> +	 * child -> parent scheme can deadlock with the parent -> child
> +	 * scheme that is normally used.  Therefore, if we can lock the
> +	 * parent, just validate the references and get out.
> +	 */
> +	if (xfs_ilock_nowait(dp, XFS_IOLOCK_SHARED)) {
> +		error = xfs_scrub_parent_count_parent_dentries(sc, dp, &nr);
> +		XFS_SCRUB_PARENT_OP_ERROR_GOTO(out_unlock);
> +		XFS_SCRUB_PARENT_CHECK(nr == 1);
> +		goto out_unlock;
> +	}
> +
> +	/*
> +	 * The game changes if we get here.  We failed to lock the parent,
> +	 * so we're going to try to verify both pointers while only holding
> +	 * one lock so as to avoid deadlocking with something that's actually
> +	 * trying to traverse down the directory tree.
> +	 */
> +	xfs_iunlock(sc->ip, sc->ilock_flags);
> +	sc->ilock_flags = 0;
> +	xfs_ilock(dp, XFS_IOLOCK_SHARED);
> +
> +	/* Go looking for our dentry. */
> +	error = xfs_scrub_parent_count_parent_dentries(sc, dp, &nr);
> +	XFS_SCRUB_PARENT_OP_ERROR_GOTO(out_unlock);
> +
> +	/* Drop the parent lock, relock this inode. */
> +	xfs_iunlock(dp, XFS_IOLOCK_SHARED);
> +	sc->ilock_flags = XFS_IOLOCK_EXCL;
> +	xfs_ilock(sc->ip, sc->ilock_flags);
> +
> +	/* Look up '..' to see if the inode changed. */
> +	error = xfs_dir_lookup(sc->tp, sc->ip, &xfs_name_dotdot, &dnum, NULL);
> +	XFS_SCRUB_PARENT_OP_ERROR_GOTO(out_rele);
> +
> +	/* Drat, parent changed.  Try again! */
> +	if (dnum != dp->i_ino) {
> +		IRELE(dp);
> +		tries++;
> +		if (tries < 20)
> +			goto try_again;
> +		XFS_SCRUB_INCOMPLETE(sc, "parent", false);
> +		goto out;
> +	}
> +	IRELE(dp);
> +
> +	/*
> +	 * '..' didn't change, so check that there was only one entry
> +	 * for us in the parent.
> +	 */
> +	XFS_SCRUB_PARENT_CHECK(nr == 1);
> +	goto out;
> +
> +out_unlock:
> +	xfs_iunlock(dp, XFS_IOLOCK_SHARED);
> +out_rele:
> +	IRELE(dp);
> +out:
> +	return error;
> +}
> +#undef XFS_SCRUB_PARENT_OP_ERROR_GOTO
> +#undef XFS_SCRUB_PARENT_GOTO
> +#undef XFS_SCRUB_PARENT_CHECK
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


* Re: [PATCH 21/22] xfs: scrub realtime bitmap/summary
  2017-07-21  4:40 ` [PATCH 21/22] xfs: scrub realtime bitmap/summary Darrick J. Wong
@ 2017-07-23 18:05   ` Allison Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 18:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Looks ok to me.  Mostly just scaffolding and setup.

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

On 7/20/2017 9:40 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Perform simple tests of the realtime bitmap and summary.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile            |    2 +
>  fs/xfs/libxfs/xfs_format.h |    5 ++
>  fs/xfs/libxfs/xfs_fs.h     |    4 +-
>  fs/xfs/scrub/agheader.c    |    1
>  fs/xfs/scrub/common.c      |   15 +++++++
>  fs/xfs/scrub/common.h      |    3 +
>  fs/xfs/scrub/rtbitmap.c    |  101 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_trace.h         |    4 +-
>  8 files changed, 133 insertions(+), 2 deletions(-)
>  create mode 100644 fs/xfs/scrub/rtbitmap.c
>
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index e73cdc2..86fc94c 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -156,4 +156,6 @@ xfs-y				+= $(addprefix scrub/, \
>  				   rmap.o \
>  				   symlink.o \
>  				   )
> +
> +xfs-$(CONFIG_XFS_RT)		+= scrub/rtbitmap.o
>  endif
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index 154c3dd..d4d9bef 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -315,6 +315,11 @@ static inline bool xfs_sb_good_version(struct xfs_sb *sbp)
>  	return false;
>  }
>
> +static inline bool xfs_sb_version_hasrealtime(struct xfs_sb *sbp)
> +{
> +	return sbp->sb_rblocks > 0;
> +}
> +
>  /*
>   * Detect a mismatched features2 field.  Older kernels read/wrote
>   * this into the wrong slot, so to be safe we keep them in sync.
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 4ad3056..83121fc 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -501,7 +501,9 @@ struct xfs_scrub_metadata {
>  #define XFS_SCRUB_TYPE_XATTR	17	/* extended attribute */
>  #define XFS_SCRUB_TYPE_SYMLINK	18	/* symbolic link */
>  #define XFS_SCRUB_TYPE_PARENT	19	/* parent pointers */
> -#define XFS_SCRUB_TYPE_MAX	19
> +#define XFS_SCRUB_TYPE_RTBITMAP	20	/* realtime bitmap */
> +#define XFS_SCRUB_TYPE_RTSUM	21	/* realtime summary */
> +#define XFS_SCRUB_TYPE_MAX	21
>
>  /* i: repair this metadata */
>  #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
> index 137d2ad..8048a63 100644
> --- a/fs/xfs/scrub/agheader.c
> +++ b/fs/xfs/scrub/agheader.c
> @@ -247,6 +247,7 @@ xfs_scrub_superblock(
>  	XFS_SCRUB_SB_FEAT(metauuid);
>  	XFS_SCRUB_SB_FEAT(rmapbt);
>  	XFS_SCRUB_SB_FEAT(reflink);
> +	XFS_SCRUB_SB_FEAT(realtime);
>  #undef XFS_SCRUB_SB_FEAT
>
>  #define XFS_SCRUB_SB_FEAT_PREEN(fn) \
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 6f701c6..6e40fa6 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -860,6 +860,21 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
>  		.setup	= xfs_scrub_setup_parent,
>  		.scrub	= xfs_scrub_parent,
>  	},
> +#ifdef CONFIG_XFS_RT
> +	{ /* realtime bitmap */
> +		.setup	= xfs_scrub_setup_rt,
> +		.scrub	= xfs_scrub_rtbitmap,
> +		.has	= xfs_sb_version_hasrealtime,
> +	},
> +	{ /* realtime summary */
> +		.setup	= xfs_scrub_setup_rt,
> +		.scrub	= xfs_scrub_rtsummary,
> +		.has	= xfs_sb_version_hasrealtime,
> +	},
> +#else
> +	{ NULL },
> +	{ NULL },
> +#endif
>  };
>
>  /* Dispatch metadata scrubbing. */
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index 1873a31..43a74f0 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -225,6 +225,7 @@ SETUP_FN(xfs_scrub_setup_directory);
>  SETUP_FN(xfs_scrub_setup_xattr);
>  SETUP_FN(xfs_scrub_setup_parent);
>  SETUP_FN(xfs_scrub_setup_symlink);
> +SETUP_FN(xfs_scrub_setup_rt);
>  #undef SETUP_FN
>
>  /* Metadata scrubbers */
> @@ -250,6 +251,8 @@ SCRUB_FN(xfs_scrub_directory);
>  SCRUB_FN(xfs_scrub_xattr);
>  SCRUB_FN(xfs_scrub_parent);
>  SCRUB_FN(xfs_scrub_symlink);
> +SCRUB_FN(xfs_scrub_rtbitmap);
> +SCRUB_FN(xfs_scrub_rtsummary);
>  #undef SCRUB_FN
>
>  #endif	/* __XFS_REPAIR_COMMON_H__ */
> diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c
> new file mode 100644
> index 0000000..b061066
> --- /dev/null
> +++ b/fs/xfs/scrub/rtbitmap.c
> @@ -0,0 +1,101 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_sb.h"
> +#include "xfs_alloc.h"
> +#include "xfs_rtalloc.h"
> +#include "xfs_inode.h"
> +#include "scrub/common.h"
> +
> +/* Set us up with the realtime metadata locked. */
> +int
> +xfs_scrub_setup_rt(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	int				lockmode;
> +	int				error = 0;
> +
> +	if (sc->sm->sm_agno || sc->sm->sm_ino || sc->sm->sm_gen)
> +		return -EINVAL;
> +
> +	error = xfs_scrub_setup_fs(sc, ip);
> +	if (error)
> +		return error;
> +
> +	lockmode = XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP;
> +	xfs_ilock(mp->m_rbmip, lockmode);
> +	xfs_trans_ijoin(sc->tp, mp->m_rbmip, lockmode);
> +
> +	return 0;
> +}
> +
> +/* Realtime bitmap. */
> +
> +#define XFS_SCRUB_RTBITMAP_CHECK(fs_ok) \
> +	XFS_SCRUB_CHECK(sc, bp, "rtbitmap", fs_ok)
> +#define XFS_SCRUB_RTBITMAP_OP_ERROR_GOTO(error, label) \
> +	XFS_SCRUB_OP_ERROR_GOTO(sc, 0, 0, "rtbitmap", error, label)
> +/* Scrub a free extent record from the realtime bitmap. */
> +STATIC int
> +xfs_scrub_rtbitmap_helper(
> +	struct xfs_trans		*tp,
> +	struct xfs_rtalloc_rec		*rec,
> +	void				*priv)
> +{
> +	return 0;
> +}
> +
> +/* Scrub the realtime bitmap. */
> +int
> +xfs_scrub_rtbitmap(
> +	struct xfs_scrub_context	*sc)
> +{
> +	int				error;
> +
> +	error = xfs_rtalloc_query_all(sc->tp, xfs_scrub_rtbitmap_helper, NULL);
> +	XFS_SCRUB_RTBITMAP_OP_ERROR_GOTO(&error, out);
> +
> +out:
> +	return error;
> +}
> +#undef XFS_SCRUB_RTBITMAP_OP_ERROR_GOTO
> +#undef XFS_SCRUB_RTBITMAP_CHECK
> +
> +/* Scrub the realtime summary. */
> +int
> +xfs_scrub_rtsummary(
> +	struct xfs_scrub_context	*sc)
> +{
> +	/* XXX: implement this some day */
> +	return -ENOENT;
> +}
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 39824f8..1be7b00 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3330,7 +3330,9 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
>  	{ XFS_SCRUB_TYPE_BMBTC,		"bmapbtc" }, \
>  	{ XFS_SCRUB_TYPE_DIR,		"dir" }, \
>  	{ XFS_SCRUB_TYPE_XATTR,		"xattr" }, \
> -	{ XFS_SCRUB_TYPE_SYMLINK,	"symlink" }
> +	{ XFS_SCRUB_TYPE_SYMLINK,	"symlink" }, \
> +	{ XFS_SCRUB_TYPE_RTBITMAP,	"rtbitmap" }, \
> +	{ XFS_SCRUB_TYPE_RTSUM,		"rtsummary" }
>  DECLARE_EVENT_CLASS(xfs_scrub_class,
>  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
>  		 int error),
>
>


* Re: [PATCH 22/22] xfs: scrub quota information
  2017-07-21  4:40 ` [PATCH 22/22] xfs: scrub quota information Darrick J. Wong
@ 2017-07-23 18:07   ` Allison Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Allison Henderson @ 2017-07-23 18:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

On 7/20/2017 9:40 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Perform some quick sanity testing of the disk quota information.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile        |    1
>  fs/xfs/libxfs/xfs_fs.h |    5 +
>  fs/xfs/scrub/common.c  |   18 +++
>  fs/xfs/scrub/common.h  |    2
>  fs/xfs/scrub/quota.c   |  274 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_trace.h     |    5 +
>  6 files changed, 303 insertions(+), 2 deletions(-)
>  create mode 100644 fs/xfs/scrub/quota.c
>
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 86fc94c..010a90f 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -158,4 +158,5 @@ xfs-y				+= $(addprefix scrub/, \
>  				   )
>
>  xfs-$(CONFIG_XFS_RT)		+= scrub/rtbitmap.o
> +xfs-$(CONFIG_XFS_QUOTA)		+= scrub/quota.o
>  endif
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 83121fc..444e286 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -503,7 +503,10 @@ struct xfs_scrub_metadata {
>  #define XFS_SCRUB_TYPE_PARENT	19	/* parent pointers */
>  #define XFS_SCRUB_TYPE_RTBITMAP	20	/* realtime bitmap */
>  #define XFS_SCRUB_TYPE_RTSUM	21	/* realtime summary */
> -#define XFS_SCRUB_TYPE_MAX	21
> +#define XFS_SCRUB_TYPE_UQUOTA	22	/* user quotas */
> +#define XFS_SCRUB_TYPE_GQUOTA	23	/* group quotas */
> +#define XFS_SCRUB_TYPE_PQUOTA	24	/* project quotas */
> +#define XFS_SCRUB_TYPE_MAX	24
>
>  /* i: repair this metadata */
>  #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 6e40fa6..62884a8 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -875,6 +875,24 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
>  	{ NULL },
>  	{ NULL },
>  #endif
> +#ifdef CONFIG_XFS_QUOTA
> +	{ /* user quota */
> +		.setup = xfs_scrub_setup_quota,
> +		.scrub = xfs_scrub_quota,
> +	},
> +	{ /* group quota */
> +		.setup = xfs_scrub_setup_quota,
> +		.scrub = xfs_scrub_quota,
> +	},
> +	{ /* project quota */
> +		.setup = xfs_scrub_setup_quota,
> +		.scrub = xfs_scrub_quota,
> +	},
> +#else
> +	{ NULL },
> +	{ NULL },
> +	{ NULL },
> +#endif
>  };
>
>  /* Dispatch metadata scrubbing. */
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index 43a74f0..fcb3764 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -226,6 +226,7 @@ SETUP_FN(xfs_scrub_setup_xattr);
>  SETUP_FN(xfs_scrub_setup_parent);
>  SETUP_FN(xfs_scrub_setup_symlink);
>  SETUP_FN(xfs_scrub_setup_rt);
> +SETUP_FN(xfs_scrub_setup_quota);
>  #undef SETUP_FN
>
>  /* Metadata scrubbers */
> @@ -253,6 +254,7 @@ SCRUB_FN(xfs_scrub_parent);
>  SCRUB_FN(xfs_scrub_symlink);
>  SCRUB_FN(xfs_scrub_rtbitmap);
>  SCRUB_FN(xfs_scrub_rtsummary);
> +SCRUB_FN(xfs_scrub_quota);
>  #undef SCRUB_FN
>
>  #endif	/* __XFS_REPAIR_COMMON_H__ */
> diff --git a/fs/xfs/scrub/quota.c b/fs/xfs/scrub/quota.c
> new file mode 100644
> index 0000000..117b8b6
> --- /dev/null
> +++ b/fs/xfs/scrub/quota.c
> @@ -0,0 +1,274 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_sb.h"
> +#include "xfs_inode.h"
> +#include "xfs_inode_fork.h"
> +#include "xfs_bmap.h"
> +#include "xfs_quota.h"
> +#include "xfs_qm.h"
> +#include "xfs_dquot.h"
> +#include "xfs_dquot_item.h"
> +#include "scrub/common.h"
> +
> +/* Convert a scrub type code to a DQ flag, or return 0 if error. */
> +static inline uint
> +xfs_scrub_quota_to_dqtype(
> +	struct xfs_scrub_context	*sc)
> +{
> +	switch (sc->sm->sm_type) {
> +	case XFS_SCRUB_TYPE_UQUOTA:
> +		return XFS_DQ_USER;
> +	case XFS_SCRUB_TYPE_GQUOTA:
> +		return XFS_DQ_GROUP;
> +	case XFS_SCRUB_TYPE_PQUOTA:
> +		return XFS_DQ_PROJ;
> +	default:
> +		return 0;
> +	}
> +}
> +
> +/* Set us up to scrub a quota. */
> +int
> +xfs_scrub_setup_quota(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	uint				dqtype;
> +
> +	if (sc->sm->sm_agno || sc->sm->sm_ino || sc->sm->sm_gen)
> +		return -EINVAL;
> +
> +	dqtype = xfs_scrub_quota_to_dqtype(sc);
> +	if (dqtype == 0)
> +		return -EINVAL;
> +	return 0;
> +}
> +
> +/* Quotas. */
> +
> +#define XFS_SCRUB_QUOTA_CHECK(fs_ok) \
> +	XFS_SCRUB_DATA_CHECK(sc, XFS_DATA_FORK, id, tag, fs_ok)
> +#define XFS_SCRUB_QUOTA_WARN(fs_ok) \
> +	XFS_SCRUB_DATA_WARN(sc, XFS_DATA_FORK, id, tag, fs_ok)
> +#define XFS_SCRUB_QUOTA_GOTO(fs_ok, label) \
> +	XFS_SCRUB_DATA_GOTO(sc, XFS_DATA_FORK, id, tag, fs_ok, label)
> +#define XFS_SCRUB_QUOTA_OP_ERR(label) \
> +	XFS_SCRUB_FILE_OP_ERROR_GOTO(sc, XFS_DATA_FORK, id, tag, &error, label)
> +/* Scrub the fields in an individual quota item. */
> +STATIC void
> +xfs_scrub_quota_item(
> +	struct xfs_scrub_context	*sc,
> +	const char			*tag,
> +	uint				dqtype,
> +	struct xfs_dquot		*dq,
> +	xfs_dqid_t			id)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_disk_dquot		*d = &dq->q_core;
> +	unsigned long long		bsoft;
> +	unsigned long long		isoft;
> +	unsigned long long		rsoft;
> +	unsigned long long		bhard;
> +	unsigned long long		ihard;
> +	unsigned long long		rhard;
> +	unsigned long long		bcount;
> +	unsigned long long		icount;
> +	unsigned long long		rcount;
> +	xfs_ino_t			inodes;
> +
> +	/* Did we get the dquot we wanted? */
> +	XFS_SCRUB_QUOTA_CHECK(id <= be32_to_cpu(d->d_id));
> +	XFS_SCRUB_QUOTA_CHECK(dqtype ==
> +			(d->d_flags & XFS_DQ_ALLTYPES));
> +
> +	/* Check the limits. */
> +	bhard = be64_to_cpu(d->d_blk_hardlimit);
> +	ihard = be64_to_cpu(d->d_ino_hardlimit);
> +	rhard = be64_to_cpu(d->d_rtb_hardlimit);
> +
> +	bsoft = be64_to_cpu(d->d_blk_softlimit);
> +	isoft = be64_to_cpu(d->d_ino_softlimit);
> +	rsoft = be64_to_cpu(d->d_rtb_softlimit);
> +
> +	inodes = XFS_AGINO_TO_INO(mp, mp->m_sb.sb_agcount, 0);
> +
> +	/*
> +	 * Warn if the limits are larger than the fs.  Administrators
> +	 * can do this, though in production this seems suspect.
> +	 */
> +	XFS_SCRUB_QUOTA_WARN(bhard <= mp->m_sb.sb_dblocks);
> +	XFS_SCRUB_QUOTA_WARN(ihard <= inodes);
> +	XFS_SCRUB_QUOTA_WARN(rhard <= mp->m_sb.sb_rblocks);
> +
> +	XFS_SCRUB_QUOTA_WARN(bsoft <= mp->m_sb.sb_dblocks);
> +	XFS_SCRUB_QUOTA_WARN(isoft <= inodes);
> +	XFS_SCRUB_QUOTA_WARN(rsoft <= mp->m_sb.sb_rblocks);
> +
> +	/* Soft limit must be less than the hard limit. */
> +	XFS_SCRUB_QUOTA_CHECK(bsoft <= bhard);
> +	XFS_SCRUB_QUOTA_CHECK(isoft <= ihard);
> +	XFS_SCRUB_QUOTA_CHECK(rsoft <= rhard);
> +
> +	/* Check the resource counts. */
> +	bcount = be64_to_cpu(d->d_bcount);
> +	icount = be64_to_cpu(d->d_icount);
> +	rcount = be64_to_cpu(d->d_rtbcount);
> +	inodes = percpu_counter_sum(&mp->m_icount);
> +
> +	/*
> +	 * Check that usage doesn't exceed physical limits.  However, on
> +	 * a reflink filesystem we're allowed to exceed physical space
> +	 * if there are no quota limits.
> +	 */
> +	if (xfs_sb_version_hasreflink(&mp->m_sb))
> +		XFS_SCRUB_QUOTA_WARN(bcount <= mp->m_sb.sb_dblocks);
> +	else
> +		XFS_SCRUB_QUOTA_CHECK(bcount <= mp->m_sb.sb_dblocks);
> +	XFS_SCRUB_QUOTA_CHECK(icount <= inodes);
> +	XFS_SCRUB_QUOTA_CHECK(rcount <= mp->m_sb.sb_rblocks);
> +
> +	/*
> +	 * We can violate the hard limits if the admin suddenly sets a
> +	 * lower limit than the actual usage.  However, we flag it for
> +	 * admin review.
> +	 */
> +	XFS_SCRUB_QUOTA_WARN(id == 0 || bhard == 0 || bcount <= bhard);
> +	XFS_SCRUB_QUOTA_WARN(id == 0 || ihard == 0 || icount <= ihard);
> +	XFS_SCRUB_QUOTA_WARN(id == 0 || rhard == 0 || rcount <= rhard);
> +}
> +
> +/* Scrub all of a quota type's items. */
> +int
> +xfs_scrub_quota(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_bmbt_irec		irec = { 0 };
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_inode		*ip;
> +	const char			*tag = NULL;
> +	struct xfs_quotainfo		*qi = mp->m_quotainfo;
> +	struct xfs_dquot		*dq;
> +	xfs_fileoff_t			max_dqid_off;
> +	xfs_fileoff_t			off = 0;
> +	xfs_dqid_t			id = 0;
> +	uint				dqtype;
> +	int				nimaps;
> +	int				error;
> +
> +	if (!XFS_IS_QUOTA_RUNNING(mp) || !XFS_IS_QUOTA_ON(mp))
> +		return -ENOENT;
> +
> +	dqtype = xfs_scrub_quota_to_dqtype(sc);
> +	switch (dqtype) {
> +	case XFS_DQ_USER:
> +		tag = "usrquota";
> +		break;
> +	case XFS_DQ_GROUP:
> +		tag = "grpquota";
> +		break;
> +	case XFS_DQ_PROJ:
> +		tag = "prjquota";
> +		break;
> +	default:
> +		ASSERT(0);
> +	}
> +
> +	mutex_lock(&qi->qi_quotaofflock);
> +	if (!xfs_this_quota_on(sc->mp, dqtype)) {
> +		error = -ENOENT;
> +		goto out;
> +	}
> +
> +	/* Attach to the quota inode and set sc->ip so that reporting works. */
> +	ip = xfs_quota_inode(sc->mp, dqtype);
> +	sc->ip = ip;
> +
> +	/* Look for problem extents. */
> +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> +	max_dqid_off = ((xfs_dqid_t)-1) / qi->qi_dqperchunk;
> +	while (1) {
> +		if (xfs_scrub_should_terminate(&error))
> +			break;
> +
> +		off = irec.br_startoff + irec.br_blockcount;
> +		nimaps = 1;
> +		error = xfs_bmapi_read(ip, off, -1, &irec, &nimaps,
> +				XFS_BMAPI_ENTIRE);
> +		XFS_SCRUB_QUOTA_OP_ERR(out_unlock);
> +		if (!nimaps)
> +			break;
> +		if (irec.br_startblock == HOLESTARTBLOCK)
> +			continue;
> +
> +		/*
> +		 * Unwritten extents or blocks mapped above the highest
> +		 * quota id shouldn't happen.
> +		 */
> +		XFS_SCRUB_QUOTA_GOTO(!isnullstartblock(irec.br_startblock),
> +				next_extent);
> +		XFS_SCRUB_QUOTA_GOTO(irec.br_startoff <= max_dqid_off,
> +				next_extent);
> +		XFS_SCRUB_QUOTA_GOTO(irec.br_startoff + irec.br_blockcount <=
> +				max_dqid_off + 1, next_extent);
> +next_extent:;
> +	}
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +
> +	/* Check all the quota items. */
> +	while (id < ((xfs_dqid_t)-1ULL)) {
> +		if (xfs_scrub_should_terminate(&error))
> +			break;
> +
> +		error = xfs_qm_dqget(mp, NULL, id, dqtype, XFS_QMOPT_DQNEXT,
> +				&dq);
> +		if (error == -ENOENT)
> +			break;
> +		XFS_SCRUB_QUOTA_OP_ERR(out);
> +
> +		xfs_scrub_quota_item(sc, tag, dqtype, dq, id);
> +
> +		id = be32_to_cpu(dq->q_core.d_id) + 1;
> +		xfs_qm_dqput(dq);
> +	}
> +	goto out;
> +
> +out_unlock:
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +out:
> +	sc->ip = NULL;
> +	mutex_unlock(&qi->qi_quotaofflock);
> +	return error;
> +}
> +#undef XFS_SCRUB_QUOTA_OP_ERR
> +#undef XFS_SCRUB_QUOTA_GOTO
> +#undef XFS_SCRUB_QUOTA_WARN
> +#undef XFS_SCRUB_QUOTA_CHECK
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 1be7b00..9f71cb9 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3332,7 +3332,10 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
>  	{ XFS_SCRUB_TYPE_XATTR,		"xattr" }, \
>  	{ XFS_SCRUB_TYPE_SYMLINK,	"symlink" }, \
>  	{ XFS_SCRUB_TYPE_RTBITMAP,	"rtbitmap" }, \
> -	{ XFS_SCRUB_TYPE_RTSUM,		"rtsummary" }
> +	{ XFS_SCRUB_TYPE_RTSUM,		"rtsummary" }, \
> +	{ XFS_SCRUB_TYPE_UQUOTA,	"usrquota" }, \
> +	{ XFS_SCRUB_TYPE_GQUOTA,	"grpquota" }, \
> +	{ XFS_SCRUB_TYPE_PQUOTA,	"prjquota" }
>  DECLARE_EVENT_CLASS(xfs_scrub_class,
>  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
>  		 int error),
>
>


* Re: [PATCH 01/22] xfs: query the per-AG reservation counters
  2017-07-21  4:38 ` [PATCH 01/22] xfs: query the per-AG reservation counters Darrick J. Wong
  2017-07-23 16:16   ` Allison Henderson
@ 2017-07-23 22:25   ` Dave Chinner
  2017-07-24 19:07     ` Darrick J. Wong
  1 sibling, 1 reply; 63+ messages in thread
From: Dave Chinner @ 2017-07-23 22:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jul 20, 2017 at 09:38:35PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Establish an ioctl for userspace to query the original and current
> per-AG reservation counts.  This will be used by xfs_scrub to
> check that the vfs counters are at least somewhat sane.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_fs.h |   10 ++++++++++
>  fs/xfs/xfs_fsops.c     |   29 +++++++++++++++++++++++++++++
>  fs/xfs/xfs_fsops.h     |    2 ++
>  fs/xfs/xfs_ioctl.c     |   16 ++++++++++++++++
>  fs/xfs/xfs_ioctl32.c   |    1 +
>  5 files changed, 58 insertions(+)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 8c61f21..5dedab9 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -469,6 +469,15 @@ typedef struct xfs_swapext
>  #define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH		0x2	/* don't flush log nor data */
>  
>  /*
> + * AG reserved block counters
> + */
> +struct xfs_fsop_ag_resblks {
> +	__u64 resblks;		/* blocks reserved now */

		current_reservation

> +	__u64 resblks_orig;	/* blocks reserved at mount time */

		mount_reservation;

> +	__u64 reserved[2];
> +};

Also, any new structure we pass to userspace should be versioned
from the start. At minimum, add a flags field so that we can later
tell userspace what the reserved space means.
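
A minimal sketch of what such a layout might look like, written as a
userspace stand-in. Every name here (ag_resblks_v, the flag bit, the
check helper) is hypothetical and not taken from the patch; it only
illustrates the idea of a flags word guarding the reserved fields:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bit; nothing like this exists in the posted patch. */
#define AG_RESBLKS_FLAG_EXAMPLE	(1U << 0)
#define AG_RESBLKS_KNOWN_FLAGS	(AG_RESBLKS_FLAG_EXAMPLE)

/* Userspace stand-in for the ioctl structure, with a flags word added. */
struct ag_resblks_v {
	uint32_t flags;			/* says which fields are meaningful */
	uint32_t pad;			/* keeps 64-bit fields aligned; must be zero */
	uint64_t current_reservation;	/* blocks reserved now */
	uint64_t mount_reservation;	/* blocks reserved at mount time */
	uint64_t reserved[2];		/* must be zero until a flag defines them */
};

/* Return 0 if the structure uses only known bits and zeroed padding, else -1. */
static int ag_resblks_check(const struct ag_resblks_v *req)
{
	if (req->flags & ~AG_RESBLKS_KNOWN_FLAGS)
		return -1;
	if (req->pad || req->reserved[0] || req->reserved[1])
		return -1;
	return 0;
}
```

Rejecting unknown flag bits and nonzero reserved words up front is what
lets the reserved space be given a meaning later without ambiguity.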

> +
> +/* Query the per-AG reservations to see how many blocks we have reserved. */
> +int
> +xfs_fs_get_ag_reserve_blocks(
> +	struct xfs_mount		*mp,
> +	struct xfs_fsop_ag_resblks	*out)
> +{
> +	struct xfs_ag_resv		*r;
> +	struct xfs_perag		*pag;
> +	xfs_agnumber_t			agno;
> +
> +	out->resblks = 0;
> +	out->resblks_orig = 0;
> +	out->reserved[0] = out->reserved[1] = 0;

memset() the structure so we don't forget in future to zero it
properly.
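
Roughly the following, sketched in userspace with a stand-in structure
and a hypothetical helper name (ag_resblks_fill); the real function
would still walk the per-AG reservations to compute the two totals:

```c
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for the structure in the quoted patch. */
struct ag_resblks {
	uint64_t resblks;	/* blocks reserved now */
	uint64_t resblks_orig;	/* blocks reserved at mount time */
	uint64_t reserved[2];
};

/* Zero the whole structure first, then set only what was computed. */
static void ag_resblks_fill(struct ag_resblks *out, uint64_t now, uint64_t orig)
{
	memset(out, 0, sizeof(*out));
	out->resblks = now;
	out->resblks_orig = orig;
}
```

Any field added to the structure later is then zeroed without anyone
having to remember to touch this function.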

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 03/22] xfs: create an ioctl to scrub AG metadata
  2017-07-21  4:38 ` [PATCH 03/22] xfs: create an ioctl to scrub AG metadata Darrick J. Wong
  2017-07-23 16:37   ` Allison Henderson
@ 2017-07-23 23:45   ` Dave Chinner
  2017-07-24 21:14     ` Darrick J. Wong
  1 sibling, 1 reply; 63+ messages in thread
From: Dave Chinner @ 2017-07-23 23:45 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jul 20, 2017 at 09:38:47PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create an ioctl that can be used to scrub internal filesystem metadata.
> The new ioctl takes the metadata type, an (optional) AG number, an
> (optional) inode number and generation, and a flags argument.  This will
> be used by the upcoming XFS online scrub tool.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Ok, I'm starting completely cold on this code (never seen it before)
so there's a few things about the patch series that hit me straight
away.

1. I can't really review the previous tracepoint patch because I
have no context of how they are used or what is being passed into
them. Tracepoint patches really need to be added when there's
already context available to verify them. Can you split that patch up
so that tracepoints are introduced with the code that uses them?

2. Macros. Ugh.

3. Lots of different, disjoint bits of infrastructure in this patch,
but I have no clear idea how it gets used yet. Makes it hard to
review....

Hence I think this patch also needs to be broken up into individual
infrastructure operations, with the patch description describing
what each operation is used for. That way when it comes to adding
the actual metadata scrub code, the reviewer knows how the pieces
all go together and what the functions being called are supposed to
do and return...

Once I understand the infrastructure and how it is supposed to drive
all the other bits, then larger patches for scrubbing individual
structures are fine, but large patches for infrastructure just lead
to long, long emails and missed problems...

> +/* metadata scrubbing */
> +struct xfs_scrub_metadata {
> +	__u32 sm_type;		/* What to check? */
> +	__u32 sm_flags;		/* flags; see below. */
> +	__u64 sm_ino;		/* inode number. */
> +	__u32 sm_gen;		/* inode generation. */
> +	__u32 sm_agno;		/* ag number. */
> +	__u64 sm_reserved[5];	/* pad to 64 bytes */
> +};
> +
> +/*
> + * Metadata types and flags for scrub operation.
> + */
> +#define XFS_SCRUB_TYPE_TEST	0	/* dummy to test ioctl */
> +#define XFS_SCRUB_TYPE_MAX	0
> +
> +/* i: repair this metadata */
> +#define XFS_SCRUB_FLAG_REPAIR		(1 << 0)

If you're going to document a direction, do it in the variable name,
not the comment. XFS_SCRUB_IFLAG_REPAIR, XFS_SCRUB_OFLAG_CORRUPT,
etc. Especially as you don't separate the definitions of i/o flags,
future flags are going to intertwine i/o in the definitions and
it's not going to be obvious from a quick look where a flag should
be used...
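
A sketch of how the naming could look with the direction encoded in the
identifier and the two groups kept apart. The names below drop the XFS_
prefix to make clear they are illustrative; the masking mirrors the input
checks in the quoted xfs_scrub_metadata().

```c
#include <stdint.h>

/* Input flags: set by userspace, read by the kernel. */
#define SCRUB_IFLAG_REPAIR	(1u << 0)
#define SCRUB_FLAGS_IN		(SCRUB_IFLAG_REPAIR)

/* Output flags: set by the kernel, read by userspace. */
#define SCRUB_OFLAG_CORRUPT	(1u << 1)
#define SCRUB_OFLAG_PREEN	(1u << 2)
#define SCRUB_FLAGS_OUT		(SCRUB_OFLAG_CORRUPT | SCRUB_OFLAG_PREEN)

/*
 * Clear stale output bits, then reject anything that is not a known
 * input flag.  Returns 0 on success, -1 for unknown flags.
 */
static int scrub_check_flags(uint32_t *flags)
{
	*flags &= ~SCRUB_FLAGS_OUT;
	if (*flags & ~SCRUB_FLAGS_IN)
		return -1;
	return 0;
}
```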

> +/* o: metadata object needs repair */
> +#define XFS_SCRUB_FLAG_CORRUPT		(1 << 1)
> +/* o: metadata object could be optimized */
> +#define XFS_SCRUB_FLAG_PREEN		(1 << 2)

What does "could be optimised" mean?

> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> new file mode 100644
> index 0000000..6931793
> --- /dev/null
> +++ b/fs/xfs/scrub/common.c
> @@ -0,0 +1,533 @@

Rather than "common.c", shouldn't this be named "scrub.c" to
indicate it's the entry point/main infrastructure file? "common"
usually indicates library/shared functions, not ioctl entry points..

[...]

> + * We use a bit of trickery with transactions to avoid buffer deadlocks
> + * if there is a cycle in the metadata.  The basic problem is that
> + * travelling down a btree involves locking the current buffer at each
> + * tree level.  If a pointer should somehow point back to a buffer that
> + * we've already examined, we will deadlock due to the second buffer
> + * locking attempt.  Note however that grabbing a buffer in transaction
> + * context links the locked buffer to the transaction.  If we try to
> + * re-grab the buffer in the context of the same transaction, we avoid
> + * the second lock attempt and continue.  Between the verifier and the
> + * scrubber, something will notice that something is amiss and report
> + * the corruption.  Therefore, each scrubber will allocate an empty
> + * transaction, attach buffers to it, and cancel the transaction at the
> + * end of the scrub run.  Cancelling a non-dirty transaction simply
> + * unlocks the buffers.

This whole chunk of trickery should definitely be in its own patch...
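
For anyone skimming the quoted comment, here is a toy userspace model of
the re-grab behaviour it describes. Every type and function below is
hypothetical and much simpler than the real buffer and transaction code;
it only shows why a second grab by the same transaction does not try to
take the lock again.

```c
#include <stddef.h>

struct toy_trans { int id; };

struct toy_buf {
	struct toy_trans *owner;	/* transaction holding the lock, or NULL */
	int recursion;			/* extra grabs by the owner */
};

/*
 * Returns 0 if the buffer was grabbed, -1 if another transaction holds
 * it (where real code would block, and a self-referencing tree walked
 * without a transaction would deadlock).
 */
static int toy_trans_get_buf(struct toy_trans *tp, struct toy_buf *bp)
{
	if (bp->owner == tp) {
		bp->recursion++;	/* already ours: no second lock attempt */
		return 0;
	}
	if (bp->owner)
		return -1;
	bp->owner = tp;
	return 0;
}

/* Cancelling a clean transaction just drops its buffer locks. */
static void toy_trans_cancel(struct toy_trans *tp, struct toy_buf *bufs,
			     size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (bufs[i].owner == tp) {
			bufs[i].owner = NULL;
			bufs[i].recursion = 0;
		}
	}
}
```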

> + * There are four pieces of data that scrub can communicate to
> + * userspace.  The first is the error code (errno), which can be used to
> + * communicate operational errors in performing the scrub.  There are
> + * also three flags that can be set in the scrub context.  If the data
> + * structure itself is corrupt, the CORRUPT flag will be set.  If
> + * the metadata is correct but otherwise suboptimal, the PREEN flag
> + * will be set.
> + */
> +
> +struct xfs_scrub_meta_fns {
> +	int		(*setup)(struct xfs_scrub_context *,
> +				 struct xfs_inode *);
> +	int		(*scrub)(struct xfs_scrub_context *);
> +	bool		(*has)(struct xfs_sb *);
> +};

What's this structure do? And why "fns" rather than "ops" as we
normally call operation callout structures like this?

> +/* Check for operational errors. */
> +bool
> +xfs_scrub_op_ok(
> +	struct xfs_scrub_context	*sc,
> +	xfs_agnumber_t			agno,
> +	xfs_agblock_t			bno,
> +	const char			*type,
> +	int				*error,
> +	const char			*func,
> +	int				line)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +
> +	switch (*error) {
> +	case 0:
> +		return true;
> +	case -EDEADLOCK:
> +		/* Used to restart an op with deadlock avoidance. */
> +		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
> +		break;
> +	case -EFSBADCRC:
> +	case -EFSCORRUPTED:
> +		/* Note the badness but don't abort. */
> +		sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
> +		*error = 0;
> +		/* fall through */
> +	default:
> +		trace_xfs_scrub_op_error(mp, agno, bno, type, *error, func,
> +				line);
> +		break;
> +	}
> +	return false;
> +}

These look like boilerplate functions that aren't used yet, which
makes it harder to see all the actual ioctl code that is supposed
to be introduced in this patch. Again, I'm not sure how they are
supposed to be used, so I can't actually review this code yet....

Also, it looks like we're passing func/line to tracing functions,
which further implies wrapper macros to insert them. We've avoided
this with all the other tracing functions by passing
__return_address and/or _THIS_IP_ to the tracing function. Doing so
in all this boilerplate checking code gets rid of the macros.


> +/* Dummy scrubber */
> +
> +int
> +xfs_scrub_dummy(
> +	struct xfs_scrub_context	*sc)
> +{
> +	if (sc->sm->sm_ino || sc->sm->sm_agno)
> +		return -EINVAL;
> +	if (sc->sm->sm_gen & XFS_SCRUB_FLAG_CORRUPT)
> +		sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
> +	if (sc->sm->sm_gen & XFS_SCRUB_FLAG_PREEN)
> +		sc->sm->sm_flags |= XFS_SCRUB_FLAG_PREEN;
> +	if (sc->sm->sm_gen & XFS_SCRUB_FLAG_XFAIL)
> +		sc->sm->sm_flags |= XFS_SCRUB_FLAG_XFAIL;
> +	if (sc->sm->sm_gen & XFS_SCRUB_FLAG_XCORRUPT)
> +		sc->sm->sm_flags |= XFS_SCRUB_FLAG_XCORRUPT;
> +	if (sc->sm->sm_gen & ~XFS_SCRUB_FLAGS_OUT)
> +		return -ENOENT;
> +
> +	return 0;
> +}

What's the purpose of the dummy? Does it get removed later?

> +/* Per-scrubber setup functions */
> +
> +/* Set us up with a transaction and an empty context. */
> +int
> +xfs_scrub_setup_fs(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	return xfs_scrub_trans_alloc(sc->sm, sc->mp,
> +			&M_RES(sc->mp)->tr_itruncate, 0, 0, 0, &sc->tp);
> +}

Why are you using a truncate reservation for this transaction?

Better question: what initial conditions is a setup function
supposed to create for (I'm guessing here) a scrubber function to
run?

> +
> +/* Scrub setup and teardown */
> +
> +/* Free all the resources and finish the transactions. */
> +STATIC int
> +xfs_scrub_teardown(
> +	struct xfs_scrub_context	*sc,
> +	int				error)
> +{
> +	if (sc->tp) {
> +		xfs_trans_cancel(sc->tp);
> +		sc->tp = NULL;
> +	}
> +	return error;
> +}

So we have scrub function specific setup, but only a global
teardown? Does that mean scrubber functions are not supposed to
allocate any memory for keeping state and cross references? i.e.
they can only store what they can keep attached to a transaction
handle?

> +
> +/* Perform common scrub context initialization. */
> +STATIC int
> +xfs_scrub_setup(
> +	struct xfs_inode		*ip,
> +	struct xfs_scrub_context	*sc,
> +	const struct xfs_scrub_meta_fns	*fns,
> +	struct xfs_scrub_metadata	*sm,
> +	bool				try_harder)
> +{
> +	memset(sc, 0, sizeof(*sc));
> +	sc->mp = ip->i_mount;
> +	sc->sm = sm;
> +	sc->fns = fns;
> +	sc->try_harder = try_harder;
> +
> +	return sc->fns->setup(sc, ip);
> +}

Does this really need a wrapper function? It means the main function is
somewhat convoluted....

> +
> +/* Scrubbing dispatch. */
> +
> +static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
> +	{ /* dummy verifier */
> +		.setup	= xfs_scrub_setup_fs,
> +		.scrub	= xfs_scrub_dummy,
> +	},
> +};
> +
> +/* Dispatch metadata scrubbing. */
> +int
> +xfs_scrub_metadata(
> +	struct xfs_inode		*ip,
> +	struct xfs_scrub_metadata	*sm)
> +{
> +	struct xfs_scrub_context	sc;
> +	struct xfs_mount		*mp = ip->i_mount;
> +	const struct xfs_scrub_meta_fns	*fns;
> +	bool				try_harder = false;
> +	int				error = 0;
> +
> +	trace_xfs_scrub(ip, sm, error);

	memset(&sc, 0, sizeof(sc));
	sc.mp = ip->i_mount;
	sc.sm = sm;
	sc.try_harder = false;

And we reference everything thru the scrub context from here on.
I find it a bit confusing to hide sm inside sc, and everywhere else
references sc.sm, yet later on in this function we make the
assumption that the structure pointed to by sm is the one that the
scrubber is actually modifying when it is running. e.g. the
xfs_scrub_found_corruption() call at the end. Better to make it
clear all the code is working on the same structure...

> +
> +	/* Forbidden if we are shut down or mounted norecovery. */
> +	error = -ESHUTDOWN;
> +	if (XFS_FORCED_SHUTDOWN(mp))
> +		goto out;

Shutdown conditions should return EFSCORRUPTED.

> +	error = -ENOTRECOVERABLE;
> +	if (mp->m_flags & XFS_MOUNT_NORECOVERY)
> +		goto out;

Same for read only mounts, yes?

Or do we allow scrub on read only, but not repair or whatever
"optimisation" is?

> +	/* Check our inputs. */
> +	error = -EINVAL;
> +	sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT;
> +	if (sm->sm_flags & ~XFS_SCRUB_FLAGS_IN)
> +		goto out;
> +	if (memchr_inv(sm->sm_reserved, 0, sizeof(sm->sm_reserved)))
> +		goto out;
> +
> +	/* Do we know about this type of metadata? */
> +	error = -ENOENT;
> +	if (sm->sm_type > XFS_SCRUB_TYPE_MAX)
> +		goto out;
> +	fns = &meta_scrub_fns[sm->sm_type];

	sc.fns = &meta_scrub_fns[sm->sm_type];
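
Putting the two suggestions together, a self-contained userspace sketch
of the dispatch might read as follows. The struct and function names are
simplified stand-ins (including "ops" rather than "fns"), and the bounds
check is derived from the table size here, which is an assumption rather
than what the patch does.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

struct scrub_request { uint32_t type; uint32_t flags; };
struct scrub_context;

struct scrub_ops {
	int (*setup)(struct scrub_context *sc);
	int (*scrub)(struct scrub_context *sc);
};

struct scrub_context {
	struct scrub_request	*sm;
	const struct scrub_ops	*ops;
	int			try_harder;
};

static int dummy_setup(struct scrub_context *sc) { (void)sc; return 0; }
static int dummy_scrub(struct scrub_context *sc) { (void)sc; return 0; }

static const struct scrub_ops scrub_ops_table[] = {
	{ .setup = dummy_setup, .scrub = dummy_scrub },	/* type 0: dummy */
};
#define SCRUB_TYPE_NR \
	(sizeof(scrub_ops_table) / sizeof(scrub_ops_table[0]))

/* Initialise the context in place and reach everything through it. */
static int scrub_dispatch(struct scrub_request *sm)
{
	struct scrub_context	sc;
	int			error;

	memset(&sc, 0, sizeof(sc));
	sc.sm = sm;

	/* Do we know about this type of metadata? */
	if (sm->type >= SCRUB_TYPE_NR)
		return -ENOENT;
	sc.ops = &scrub_ops_table[sm->type];

	error = sc.ops->setup(&sc);
	if (error)
		return error;
	return sc.ops->scrub(&sc);
}
```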

> new file mode 100644
> index 0000000..4f3113a
> --- /dev/null
> +++ b/fs/xfs/scrub/common.h

scrub.h

> @@ -0,0 +1,179 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#ifndef __XFS_REPAIR_COMMON_H__
> +#define __XFS_REPAIR_COMMON_H__
> +
> +/* Did we find something broken? */
> +static inline bool xfs_scrub_found_corruption(struct xfs_scrub_metadata *sm)
> +{
> +	return sm->sm_flags & (XFS_SCRUB_FLAG_CORRUPT |
> +			       XFS_SCRUB_FLAG_XCORRUPT);
> +}

Unless this is going to get more complex, a single call wrapper is
not necessary.

> +/*
> + * Grab a transaction.  If we're going to repair something, we need to
> + * ensure there's enough reservation to make all the changes.  If not,
> + * we can use an empty transaction.
> + */
> +static inline int
> +xfs_scrub_trans_alloc(
> +	struct xfs_scrub_metadata	*sm,
> +	struct xfs_mount		*mp,
> +	struct xfs_trans_res		*resp,
> +	uint				blocks,
> +	uint				rtextents,
> +	uint				flags,
> +	struct xfs_trans		**tpp)
> +{
> +	return xfs_trans_alloc_empty(mp, tpp);
> +}

If we only ever use an empty transaction here, then can we get rid
of the wrapper, too?

> +
> +/* Check for operational errors. */
> +bool xfs_scrub_op_ok(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
> +		     xfs_agblock_t bno, const char *type, int *error,
> +		     const char	*func, int line);
> +#define XFS_SCRUB_OP_ERROR_GOTO(sc, agno, bno, type, error, label) \
> +	do { \
> +		if (!xfs_scrub_op_ok((sc), (agno), (bno), (type), \
> +				(error), __func__, __LINE__)) \
> +			goto label; \
> +	} while (0)

Ok, I thought this is where the func/line variables were going. Can we
please try to avoid these macros? It's not much extra work to do
this:

	if (!xfs_scrub_op_ok(sc, agno, bno, type, error, _THIS_IP_))
		goto label;

But it's much nicer to read than shouty macros, it doesn't hide
goto's in macros, and it provides exactly the same debug info to the
tracing code as the macro.
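
A userspace approximation of that pattern, assuming GNU C extensions are
available. THIS_IP below is a local stand-in written in the style of the
kernel's _THIS_IP_, and last_traced_ip stands in for a tracepoint; none
of these names come from the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Address of the current location (GNU C: local label + label address). */
#define THIS_IP	({ __label__ here; here: (unsigned long)&&here; })

static unsigned long last_traced_ip;	/* stand-in for a tracepoint */

/* Returns true if *error is zero; otherwise records where the check was. */
static bool scrub_op_ok(int *error, unsigned long ip)
{
	if (*error == 0)
		return true;
	last_traced_ip = ip;
	return false;
}

/* The goto stays visible at the call site instead of hiding in a macro. */
static int do_one_check(int error)
{
	if (!scrub_op_ok(&error, THIS_IP))
		goto out;
	return 0;
out:
	return error;
}
```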

[snip more macros]

> +/* Setup functions */
> +
> +#define SETUP_FN(name) int name(struct xfs_scrub_context *sc, struct xfs_inode *ip)
> +SETUP_FN(xfs_scrub_setup_fs);
> +#undef SETUP_FN

Please, no. This sort of construct is highly unfriendly to grep and
cscope. It costs us nothing extra to define the names in full, but
it makes finding the code so much easier...


> --- /dev/null
> +++ b/fs/xfs/scrub/xfs_scrub.h
> @@ -0,0 +1,29 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#ifndef __XFS_SCRUB_H__
> +#define __XFS_SCRUB_H__
> +
> +#ifndef CONFIG_XFS_ONLINE_SCRUB
> +# define xfs_scrub_metadata(ip, sm)	(-ENOTTY)
> +#else
> +int xfs_scrub_metadata(struct xfs_inode *ip, struct xfs_scrub_metadata *sm);
> +#endif /* CONFIG_XFS_ONLINE_SCRUB */
> +
> +#endif	/* __XFS_SCRUB_H__ */

This ioctl module stub code should be separated out into its own
patch with all the other code that enables building scrub as a
module.

...
> index 2e7e193..d4de29b 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3312,7 +3312,7 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
>  
>  /* scrub */
>  #define XFS_SCRUB_TYPE_DESC \
> -	{ 0, NULL }
> +	{ XFS_SCRUB_TYPE_TEST,		"dummy" }
>  DECLARE_EVENT_CLASS(xfs_scrub_class,
>  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
>  		 int error),
> @@ -3330,6 +3330,11 @@ DECLARE_EVENT_CLASS(xfs_scrub_class,
>  	TP_fast_assign(
>  		__entry->dev = ip->i_mount->m_super->s_dev;
>  		__entry->ino = ip->i_ino;
> +		__entry->type = sm->sm_type;
> +		__entry->agno = sm->sm_agno;
> +		__entry->inum = sm->sm_ino;
> +		__entry->gen = sm->sm_gen;
> +		__entry->flags = sm->sm_flags;
>  		__entry->error = error;
>  	),
>  	TP_printk("dev %d:%d ino %llu type %s agno %u inum %llu gen %u flags 0x%x error %d",

So the tracepoints in the previous patch are dependent on structures
that have not yet been defined? Doesn't that break compilation?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 04/22] xfs: generic functions to scrub metadata and btrees
  2017-07-21  4:38 ` [PATCH 04/22] xfs: generic functions to scrub metadata and btrees Darrick J. Wong
  2017-07-23 16:40   ` Allison Henderson
@ 2017-07-24  1:05   ` Dave Chinner
  2017-07-24 21:58     ` Darrick J. Wong
  1 sibling, 1 reply; 63+ messages in thread
From: Dave Chinner @ 2017-07-24  1:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jul 20, 2017 at 09:38:54PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create a function that walks a btree, checking the integrity of each
> btree block (headers, keys, records) and calling back to the caller
> to perform further checks on the records.  Add some helper functions
> so that we report detailed scrub errors in a uniform manner in dmesg.
> These are helper functions for subsequent patches.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile       |    1 
>  fs/xfs/scrub/btree.c  |  658 +++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/btree.h  |   95 +++++++
>  fs/xfs/scrub/common.c |  169 +++++++++++++
>  fs/xfs/scrub/common.h |   30 ++
>  5 files changed, 953 insertions(+)
>  create mode 100644 fs/xfs/scrub/btree.c
>  create mode 100644 fs/xfs/scrub/btree.h
.....
> +/* btree scrubbing */
> +
> +const char * const btree_types[] = {
> +	[XFS_BTNUM_BNO]		= "bnobt",
> +	[XFS_BTNUM_CNT]		= "cntbt",
> +	[XFS_BTNUM_RMAP]	= "rmapbt",
> +	[XFS_BTNUM_BMAP]	= "bmapbt",
> +	[XFS_BTNUM_INO]		= "inobt",
> +	[XFS_BTNUM_FINO]	= "finobt",
> +	[XFS_BTNUM_REFC]	= "refcountbt",
> +};

Don't we already have that defined somewhere?

> +
> +/* Format the trace parameters for the tree cursor. */
> +static inline void
> +xfs_scrub_btree_format(
> +	struct xfs_btree_cur		*cur,
> +	int				level,
> +	char				*bt_type,
> +	size_t				type_len,
> +	char				*bt_ptr,
> +	size_t				ptr_len,
> +	xfs_fsblock_t			*fsbno)
> +{
> +	char				*type = NULL;
> +	struct xfs_btree_block		*block;
> +	struct xfs_buf			*bp;

hmmm - complex text formatting just for trace point output,
which are rarely going to be used in production? Not sure how I feel
about that yet.

Also, the function is way too big for being an inline function. I'd
be tempted to mark it noinline so the compiler doesn't blow out the
size of the code unnecessarily with automatic inlining of static
functions.

(I haven't reviewed the formatting for sanity).

> +/* Check for btree corruption. */
> +bool
> +xfs_scrub_btree_ok(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_btree_cur		*cur,
> +	int				level,
> +	bool				fs_ok,
> +	const char			*check,
> +	const char			*func,
> +	int				line)
> +{
> +	char				bt_ptr[24];
> +	char				bt_type[48];
> +	xfs_fsblock_t			fsbno;
> +
> +	if (fs_ok)
> +		return fs_ok;
> +
> +	sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
> +	xfs_scrub_btree_format(cur, level, bt_type, 48, bt_ptr, 24, &fsbno);

Ok, magic numbers for buffer lengths. Please use #defines for these
with an explanation of why the chosen lengths are sufficient for the
information they'll be used to hold.
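
Something along these lines, as a userspace sketch. The output formats
are hypothetical (the patch's real formatting is not quoted here), so the
lengths are only correct for these example formats; the point is that
each length is derived from what gets printed.

```c
#include <stdint.h>
#include <stdio.h>

/* "0x" + 16 hex digits of a 64-bit pointer + NUL */
#define BT_PTR_STRLEN	(2 + 16 + 1)
/* longest type name "refcountbt" (10) + " level " (7) + 2 digits + NUL */
#define BT_TYPE_STRLEN	(10 + 7 + 2 + 1)

/* Returns the number of characters needed, as snprintf() does. */
static int format_bt_ptr(char *buf, size_t len, uint64_t ptr)
{
	return snprintf(buf, len, "0x%016llx", (unsigned long long)ptr);
}

static int format_bt_type(char *buf, size_t len, const char *name, int level)
{
	return snprintf(buf, len, "%s level %d", name, level);
}
```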

> +	trace_xfs_scrub_btree_error(cur->bc_mp, bt_type, bt_ptr,
> +			XFS_FSB_TO_AGNO(cur->bc_mp, fsbno),
> +			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno),
> +			check, func, line);

hmmmm - tracepoints are conditional, but the formatting call isn't.
Can this formatting be called/run from inside the tracepoint code
itself?
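
To illustrate the concern, a toy userspace version in which the
formatting cost is only paid when tracing is switched on. The
trace_enabled flag, the counters and the flag value are all stand-ins;
how this would be wired into real tracepoint code is left open.

```c
#include <stdbool.h>
#include <stdio.h>

static bool trace_enabled;		/* stand-in for "tracepoint is on" */
static unsigned int format_calls;	/* how often we paid for formatting */
static char trace_line[64];		/* stand-in for the trace buffer */

static void format_location(char *buf, size_t len,
			    unsigned int agno, unsigned int agbno)
{
	format_calls++;
	snprintf(buf, len, "agno %u agbno %u", agno, agbno);
}

/* Always flag the corruption, but only format when someone is listening. */
static void report_btree_error(unsigned int *flags,
			       unsigned int agno, unsigned int agbno)
{
	*flags |= 1u << 1;	/* hypothetical "corrupt" output flag */
	if (!trace_enabled)
		return;
	format_location(trace_line, sizeof(trace_line), agno, agbno);
}
```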

> +
> +/*
> + * Make sure this record is in order and doesn't stray outside of the parent
> + * keys.
> + */
> +STATIC int
> +xfs_scrub_btree_rec(
> +	struct xfs_scrub_btree	*bs)
> +{
> +	struct xfs_btree_cur	*cur = bs->cur;
> +	union xfs_btree_rec	*rec;
> +	union xfs_btree_key	key;
> +	union xfs_btree_key	hkey;
> +	union xfs_btree_key	*keyp;
> +	struct xfs_btree_block	*block;
> +	struct xfs_btree_block	*keyblock;
> +	struct xfs_buf		*bp;
> +
> +	block = xfs_btree_get_block(cur, 0, &bp);
> +	rec = xfs_btree_rec_addr(cur, cur->bc_ptrs[0], block);
> +
> +	if (bp)
> +		trace_xfs_scrub_btree_rec(cur->bc_mp,
> +				XFS_FSB_TO_AGNO(cur->bc_mp,
> +					XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn)),
> +				XFS_FSB_TO_AGBNO(cur->bc_mp,
> +					XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn)),
> +				cur->bc_btnum, 0, cur->bc_nlevels,
> +				cur->bc_ptrs[0]);
> +	else if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
> +		trace_xfs_scrub_btree_rec(cur->bc_mp,
> +				XFS_INO_TO_AGNO(cur->bc_mp,
> +					cur->bc_private.b.ip->i_ino),
> +				XFS_INO_TO_AGBNO(cur->bc_mp,
> +					cur->bc_private.b.ip->i_ino),
> +				cur->bc_btnum, 0, cur->bc_nlevels,
> +				cur->bc_ptrs[0]);
> +	else
> +		trace_xfs_scrub_btree_rec(cur->bc_mp,
> +				NULLAGNUMBER, NULLAGBLOCK,
> +				cur->bc_btnum, 0, cur->bc_nlevels,
> +				cur->bc_ptrs[0]);

Hmmm - there's more code in the trace calls than there is in the
scrubbing code. Is this really all necessary? I can see code
getting changed in future but not the tracepoints, similar to how
comment updates get missed...

> +	/* If this isn't the first record, are they in order? */
> +	XFS_SCRUB_BTREC_CHECK(bs, bs->firstrec ||
> +			cur->bc_ops->recs_inorder(cur, &bs->lastrec, rec));

So, I go look at the macro:

#define XFS_SCRUB_BTREC_CHECK(bs, fs_ok) \
	xfs_scrub_btree_ok((bs)->sc, (bs)->cur, 0, (fs_ok), #fs_ok, \
			   __func__, __LINE__)

I find this:

	/* If this isn't the first record, are they in order? */
	if (!bs->firstrec &&
	    !cur->bc_ops->recs_inorder(cur, &bs->lastrec, rec))
		xfs_scrub_btree_error(bs->sc, cur, 0, "Record order", _THIS_IP_);

A lot easier to read, understand and maintain because I don't have
to go look at a macro to find out what it actually does and what happens
if the records aren't in order....
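
The open-coded form of the ordering check can be exercised in isolation
with a small userspace sketch. The record type and its ordering rule are
made up for the example (an extent-like record that must start at or
after the end of the previous one); the firstrec/lastrec handling follows
the quoted scrubber state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct rec { uint32_t startblock; uint32_t blockcount; };

/* In order if this record starts at or after the end of the previous. */
static bool recs_inorder(const struct rec *prev, const struct rec *next)
{
	return (uint64_t)prev->startblock + prev->blockcount <=
	       next->startblock;
}

/* Count out-of-order records; the first has nothing to compare against. */
static unsigned int count_order_errors(const struct rec *recs, size_t nr)
{
	bool		firstrec = true;
	struct rec	lastrec = { 0, 0 };
	unsigned int	errors = 0;

	for (size_t i = 0; i < nr; i++) {
		if (!firstrec && !recs_inorder(&lastrec, &recs[i]))
			errors++;	/* a scrubber would flag corruption */
		firstrec = false;
		lastrec = recs[i];
	}
	return errors;
}
```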

> +/* Check a btree pointer. */
> +static int
> +xfs_scrub_btree_ptr(
> +	struct xfs_scrub_btree		*bs,
> +	int				level,
> +	union xfs_btree_ptr		*ptr)
> +{
> +	struct xfs_btree_cur		*cur = bs->cur;
> +	xfs_daddr_t			daddr;
> +	xfs_daddr_t			eofs;
> +
> +	if ((cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) &&
> +			level == cur->bc_nlevels) {
> +		if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
> +			XFS_SCRUB_BTKEY_GOTO(bs, level, ptr->l == 0, corrupt);
> +		} else {
> +			XFS_SCRUB_BTKEY_GOTO(bs, level, ptr->s == 0, corrupt);
> +		}
> +		return 0;
> +	}
> +
> +	if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
> +		XFS_SCRUB_BTKEY_GOTO(bs, level,
> +				ptr->l != cpu_to_be64(NULLFSBLOCK), corrupt);
> +
> +		daddr = XFS_FSB_TO_DADDR(cur->bc_mp, be64_to_cpu(ptr->l));
> +	} else {
> +		XFS_SCRUB_BTKEY_GOTO(bs, level,
> +				cur->bc_private.a.agno != NULLAGNUMBER, corrupt);
> +		XFS_SCRUB_BTKEY_GOTO(bs, level,
> +				ptr->s != cpu_to_be32(NULLAGBLOCK), corrupt);
> +

Need to check the ptr points to an agbno within the AG size.

Also:
	why no tracing on ptr values?
	check the ptr points to an agno within agcount
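
A userspace sketch of those extra bounds checks for a short pointer. The
geometry struct is a simplified stand-in for the superblock fields, and
the helper names are hypothetical; the last AG is allowed to be shorter
than the others.

```c
#include <stdbool.h>
#include <stdint.h>

#define NULLAGBLOCK_SKETCH	((uint32_t)-1)

struct geom {
	uint32_t agcount;	/* number of AGs */
	uint32_t agblocks;	/* blocks in every AG except possibly the last */
	uint64_t dblocks;	/* total data blocks in the filesystem */
};

/* Size of a given AG in blocks; caller guarantees agno < agcount. */
static uint32_t ag_size(const struct geom *g, uint32_t agno)
{
	if (agno < g->agcount - 1)
		return g->agblocks;
	return (uint32_t)(g->dblocks - (uint64_t)agno * g->agblocks);
}

/* Does this (agno, agbno) pair point inside the filesystem? */
static bool short_ptr_ok(const struct geom *g, uint32_t agno, uint32_t agbno)
{
	if (agno >= g->agcount)
		return false;
	if (agbno == NULLAGBLOCK_SKETCH)
		return false;
	return agbno < ag_size(g, agno);
}
```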


> +		daddr = XFS_AGB_TO_DADDR(cur->bc_mp, cur->bc_private.a.agno,
> +				be32_to_cpu(ptr->s));
> +	}
> +	eofs = XFS_FSB_TO_BB(cur->bc_mp, cur->bc_mp->m_sb.sb_dblocks);
> +	XFS_SCRUB_BTKEY_GOTO(bs, level, daddr != 0, corrupt);
> +	XFS_SCRUB_BTKEY_GOTO(bs, level, daddr < eofs, corrupt);
> +
> +	return 0;
> +
> +corrupt:
> +	return -EFSCORRUPTED;
> +}
> +
> +/* Check the siblings of a large format btree block. */
> +STATIC int
> +xfs_scrub_btree_lblock_check_siblings(
> +	struct xfs_scrub_btree		*bs,
> +	struct xfs_btree_block		*block)
> +{
> +	struct xfs_btree_block		*pblock;
> +	struct xfs_buf			*pbp;
> +	struct xfs_btree_cur		*ncur = NULL;
> +	union xfs_btree_ptr		*pp;
> +	xfs_fsblock_t			leftsib;
> +	xfs_fsblock_t			rightsib;
> +	xfs_fsblock_t			fsbno;
> +	int				level;
> +	int				success;
> +	int				error = 0;
> +
> +	leftsib = be64_to_cpu(block->bb_u.l.bb_leftsib);
> +	rightsib = be64_to_cpu(block->bb_u.l.bb_rightsib);
> +	level = xfs_btree_get_level(block);
> +
> +	/* Root block should never have siblings. */
> +	if (level == bs->cur->bc_nlevels - 1) {
> +		XFS_SCRUB_BTKEY_CHECK(bs, level, leftsib == NULLFSBLOCK);
> +		XFS_SCRUB_BTKEY_CHECK(bs, level, rightsib == NULLFSBLOCK);
> +		return error;
> +	}

This is where the macros force us into silly patterns and blow out
the code size.

	if (level == bs->cur->bc_nlevels - 1 &&
	    (leftsib != NULLFSBLOCK || rightsib != NULLFSBLOCK)) {
		/* error trace call */
		return error;
	}
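
Pulled out as a predicate, the combined condition is easy to test on its
own. This is a userspace sketch with a stand-in NULL value, not code from
the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define NULL_SIB	((uint64_t)-1)	/* stand-in for NULLFSBLOCK */

/* True if the root block carries sibling pointers it must not have. */
static bool root_has_siblings(int level, int nlevels,
			      uint64_t leftsib, uint64_t rightsib)
{
	return level == nlevels - 1 &&
	       (leftsib != NULL_SIB || rightsib != NULL_SIB);
}
```

Note that a clean root falls through to the sibling checks further down
with this form, which is harmless because both of its sibling pointers
are NULL and those checks are skipped.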


> +	/* Does the left sibling match the parent level left block? */
> +	if (leftsib != NULLFSBLOCK) {
> +		error = xfs_btree_dup_cursor(bs->cur, &ncur);
> +		if (error)
> +			return error;
> +		error = xfs_btree_decrement(ncur, level + 1, &success);
> +		XFS_SCRUB_BTKEY_OP_ERROR_GOTO(bs, level + 1, &error, out_cur);

Hmmm - if I read that right, there's a goto out_cur on error hidden
in this macro....

> +		XFS_SCRUB_BTKEY_GOTO(bs, level, success, out_cur);
> +
> +		pblock = xfs_btree_get_block(ncur, level + 1, &pbp);
> +		pp = xfs_btree_ptr_addr(ncur, ncur->bc_ptrs[level + 1], pblock);
> +		if (!xfs_scrub_btree_ptr(bs, level + 1, pp)) {
> +			fsbno = be64_to_cpu(pp->l);
> +			XFS_SCRUB_BTKEY_CHECK(bs, level, fsbno == leftsib);
> +		}
> +
> +		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
> +		ncur = NULL;
> +	}
> +
> +	/* Does the right sibling match the parent level right block? */
> +	if (!error && rightsib != NULLFSBLOCK) {

So when would error ever be non-zero here?

This is one of the reasons I really don't like all the macros in
this code - it unnecessarily obfuscates the checks being done and
the code flow....

> +/* Check the siblings of a small format btree block. */
> +STATIC int
> +xfs_scrub_btree_sblock_check_siblings(
> +	struct xfs_scrub_btree		*bs,
> +	struct xfs_btree_block		*block)
> +{
> +	struct xfs_btree_block		*pblock;
> +	struct xfs_buf			*pbp;
> +	struct xfs_btree_cur		*ncur = NULL;
> +	union xfs_btree_ptr		*pp;
> +	xfs_agblock_t			leftsib;
> +	xfs_agblock_t			rightsib;
> +	xfs_agblock_t			agbno;
> +	int				level;
> +	int				success;
> +	int				error = 0;
> +
> +	leftsib = be32_to_cpu(block->bb_u.s.bb_leftsib);
> +	rightsib = be32_to_cpu(block->bb_u.s.bb_rightsib);
> +	level = xfs_btree_get_level(block);
> +
> +	/* Root block should never have siblings. */
> +	if (level == bs->cur->bc_nlevels - 1) {
> +		XFS_SCRUB_BTKEY_CHECK(bs, level, leftsib == NULLAGBLOCK);
> +		XFS_SCRUB_BTKEY_CHECK(bs, level, rightsib == NULLAGBLOCK);
> +		return error;
> +	}
> +
> +	/* Does the left sibling match the parent level left block? */
> +	if (leftsib != NULLAGBLOCK) {
> +		error = xfs_btree_dup_cursor(bs->cur, &ncur);
> +		if (error)
> +			return error;
> +		error = xfs_btree_decrement(ncur, level + 1, &success);
> +		XFS_SCRUB_BTKEY_OP_ERROR_GOTO(bs, level + 1, &error, out_cur);
> +		XFS_SCRUB_BTKEY_GOTO(bs, level, success, verify_rightsib);

Why is this different to the lblock checks?

FWIW, this is one of the reasons for abstracting the sblock/lblock
code as much as possible - the core operations should be identical,
so apart from header decode/encode operations, the rest of the code
should be shared...

> +
> +		pblock = xfs_btree_get_block(ncur, level + 1, &pbp);
> +		pp = xfs_btree_ptr_addr(ncur, ncur->bc_ptrs[level + 1], pblock);
> +		if (!xfs_scrub_btree_ptr(bs, level + 1, pp)) {
> +			agbno = be32_to_cpu(pp->s);
> +			XFS_SCRUB_BTKEY_CHECK(bs, level, agbno == leftsib);
> +		}
> +
> +		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
> +		ncur = NULL;
> +	}
> +
> +verify_rightsib:
> +	if (ncur) {
> +		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
> +		ncur = NULL;
> +	}
> +
> +	/* Does the right sibling match the parent level right block? */
> +	if (rightsib != NULLAGBLOCK) {

No "if (!error ...)" check here - I'm thinking there's some factoring
needed to reduce the code duplication going on...

> +/*
> + * Visit all nodes and leaves of a btree.  Check that all pointers and
> + * records are in order, that the keys reflect the records, and use a callback
> + * so that the caller can verify individual records.  The callback is the same
> + * as the one for xfs_btree_query_range, so therefore this function also
> + * returns XFS_BTREE_QUERY_RANGE_ABORT, zero, or a negative error code.
> + */
> +int
> +xfs_scrub_btree(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_btree_cur		*cur,
> +	xfs_scrub_btree_rec_fn		scrub_fn,
> +	struct xfs_owner_info		*oinfo,
> +	void				*private)
> +{
> +	struct xfs_scrub_btree		bs = {0};
> +	union xfs_btree_ptr		ptr;
> +	union xfs_btree_ptr		*pp;
> +	union xfs_btree_rec		*recp;
> +	struct xfs_btree_block		*block;
> +	int				level;
> +	struct xfs_buf			*bp;
> +	int				i;
> +	int				error = 0;
> +
> +	/* Finish filling out the scrub state */

	/* Initialise the scrub state */

> +	bs.cur = cur;
> +	bs.scrub_rec = scrub_fn;
> +	bs.oinfo = oinfo;
> +	bs.firstrec = true;
> +	bs.private = private;
> +	bs.sc = sc;
> +	for (i = 0; i < XFS_BTREE_MAXLEVELS; i++)
> +		bs.firstkey[i] = true;
> +	INIT_LIST_HEAD(&bs.to_check);
> +
> +	if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
> +		bs.check_siblings_fn = xfs_scrub_btree_lblock_check_siblings;
> +	else
> +		bs.check_siblings_fn = xfs_scrub_btree_sblock_check_siblings;

I'm thinking now that maybe a "get sibling from block" is what is
necessary here, so there can be a shared check function....
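
Roughly what I have in mind, as a userspace sketch: decode the
format-specific header once, then run one shared check. All names are
hypothetical and the fields are kept in host byte order for simplicity,
so the be32/be64 conversions of the real code are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define NULL_LONG	((uint64_t)-1)
#define NULL_SHORT	((uint32_t)-1)

struct blk {
	bool long_ptrs;
	union {
		struct { uint64_t leftsib, rightsib; } l;
		struct { uint32_t leftsib, rightsib; } s;
	} u;
};

struct sibs {
	bool		has_left, has_right;
	uint64_t	left, right;
};

/* The only format-specific step: decode the header into a common form. */
static void get_siblings(const struct blk *b, struct sibs *out)
{
	if (b->long_ptrs) {
		out->has_left = b->u.l.leftsib != NULL_LONG;
		out->has_right = b->u.l.rightsib != NULL_LONG;
		out->left = b->u.l.leftsib;
		out->right = b->u.l.rightsib;
	} else {
		out->has_left = b->u.s.leftsib != NULL_SHORT;
		out->has_right = b->u.s.rightsib != NULL_SHORT;
		out->left = b->u.s.leftsib;
		out->right = b->u.s.rightsib;
	}
}

/*
 * Shared check: a sibling, if present, must match the neighbouring
 * pointer found in the parent; a missing parent pointer is an error.
 */
static bool sibling_matches(bool has_sib, uint64_t sib,
			    bool has_parent_ptr, uint64_t parent_ptr)
{
	if (!has_sib)
		return true;
	return has_parent_ptr && parent_ptr == sib;
}
```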

> +	/* Don't try to check a tree with a height we can't handle. */
> +	XFS_SCRUB_BTREC_GOTO(&bs, cur->bc_nlevels > 0, out_badcursor);
> +	XFS_SCRUB_BTREC_GOTO(&bs, cur->bc_nlevels <= XFS_BTREE_MAXLEVELS,
> +			out_badcursor);

More single checks that are doubled up...

....
> +out:
> +	/*
> +	 * If we don't end this function with the cursor pointing at a record
> +	 * block, a subsequent non-error cursor deletion will not release
> +	 * node-level buffers, causing a buffer leak.  This is quite possible
> +	 * with a zero-results scrubbing run, so release the buffers if we
> +	 * aren't pointing at a record.
> +	 */
> +	if (cur->bc_bufs[0] == NULL) {
> +		for (i = 0; i < cur->bc_nlevels; i++) {
> +			if (cur->bc_bufs[i]) {
> +				xfs_trans_brelse(cur->bc_tp, cur->bc_bufs[i]);
> +				cur->bc_bufs[i] = NULL;
> +				cur->bc_ptrs[i] = 0;
> +				cur->bc_ra[i] = 0;
> +			}
> +		}
> +	}

I think cursor deletion should be made to handle this case, rather
than special casing it here....

> +struct xfs_scrub_btree;
> +typedef int (*xfs_scrub_btree_rec_fn)(
> +	struct xfs_scrub_btree	*bs,
> +	union xfs_btree_rec	*rec);
> +
> +struct xfs_scrub_btree {
> +	/* caller-provided scrub state */
> +	struct xfs_scrub_context	*sc;
> +	struct xfs_btree_cur		*cur;
> +	xfs_scrub_btree_rec_fn		scrub_rec;
> +	struct xfs_owner_info		*oinfo;
> +	void				*private;
> +
> +	/* internal scrub state */
> +	union xfs_btree_rec		lastrec;
> +	bool				firstrec;
> +	union xfs_btree_key		lastkey[XFS_BTREE_MAXLEVELS];
> +	bool				firstkey[XFS_BTREE_MAXLEVELS];
> +	struct list_head		to_check;
> +	int				(*check_siblings_fn)(
> +						struct xfs_scrub_btree *,
> +						struct xfs_btree_block *);
> +};

This looks like maybe another ops style structure should be used. We've
got a xfs_scrub_btree_rec_fn() and check_siblings_fn() operations -
maybe these should be pushed into the generic libxfs btree ops
vectors?

> +int xfs_scrub_btree(struct xfs_scrub_context *sc, struct xfs_btree_cur *cur,
> +		    xfs_scrub_btree_rec_fn scrub_fn,
> +		    struct xfs_owner_info *oinfo, void *private);
> +
> +#endif /* __XFS_REPAIR_BTREE_H__ */
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 6931793..331aa14 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -43,6 +43,7 @@
>  #include "xfs_rmap_btree.h"
>  #include "scrub/xfs_scrub.h"
>  #include "scrub/common.h"
> +#include "scrub/btree.h"
>  
>  /*
>   * Online Scrub and Repair
> @@ -367,6 +368,172 @@ xfs_scrub_incomplete(
>  	return fs_ok;
>  }
>  
> +/* AG scrubbing */
> +

All this from here down doesn't seem related to scrubbing a btree?
It's infrastructure for scanning AGs, but I don't see where it is
called from - it looks unused at this point. I think it should be
separated from the btree validation into its own patchset and put
before the individual btree verification code...

I haven't really looked at this AG scrubbing code in depth because I
can't tell how it fits into the code that is supposed to call it
yet. I've read it, but without knowing how it's called I can't tell
if the abstractions are just convoluted or whether they are
necessary due to the infrastructure eventually ending up with
multiple different call sites for some of the functions...

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 05/22] xfs: scrub in-memory metadata buffers
  2017-07-21  4:39 ` [PATCH 05/22] xfs: scrub in-memory metadata buffers Darrick J. Wong
  2017-07-23 16:48   ` Allison Henderson
@ 2017-07-24  1:43   ` Dave Chinner
  2017-07-24 22:36     ` Darrick J. Wong
  1 sibling, 1 reply; 63+ messages in thread
From: Dave Chinner @ 2017-07-24  1:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jul 20, 2017 at 09:39:00PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Call the verifier function for all in-memory metadata buffers, looking
> for memory corruption either due to bad memory or coding bugs.

How does this fit into the bigger picture? We can't do an exhaustive
search of the in memory buffer cache, because access is racy w.r.t.
the life cycle of in memory buffers.

Also, if we are doing a full scrub, we're going to hit and then
check the cached in-memory buffers anyway, so I'm missing the
context that explains why this code is necessary.

>  #endif	/* __XFS_REPAIR_COMMON_H__ */
> diff --git a/fs/xfs/scrub/metabufs.c b/fs/xfs/scrub/metabufs.c
> new file mode 100644
> index 0000000..63faaa6
> --- /dev/null
> +++ b/fs/xfs/scrub/metabufs.c
> @@ -0,0 +1,177 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_trace.h"
> +#include "xfs_sb.h"
> +#include "scrub/common.h"
> +
> +/* We only iterate buffers one by one, so we don't need any setup. */
> +int
> +xfs_scrub_setup_metabufs(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	return 0;
> +}
> +
> +#define XFS_SCRUB_METABUFS_TOO_MANY_RETRIES	10
> +struct xfs_scrub_metabufs_info {
> +	struct xfs_scrub_context	*sc;
> +	unsigned int			retries;
> +};

So do we get 10 retries per buffer, or 10 retries across an entire
scan?

> +/* In-memory buffer corruption. */
> +
> +#define XFS_SCRUB_BUF_OP_ERROR_GOTO(label) \
> +	XFS_SCRUB_OP_ERROR_GOTO(smi->sc, \
> +			xfs_daddr_to_agno(smi->sc->mp, bp->b_bn), \
> +			xfs_daddr_to_agbno(smi->sc->mp, bp->b_bn), "buf", \
> +			&error, label)

Nested macros - yuck!

> +STATIC int
> +xfs_scrub_metabufs_scrub_buf(
> +	struct xfs_scrub_metabufs_info	*smi,
> +	struct xfs_buf			*bp)
> +{
> +	int				olderror;
> +	int				error = 0;
> +
> +	/*
> +	 * We hold the rcu lock during the rhashtable walk, so we can't risk
> +	 * having the log forced due to a stale buffer by xfs_buf_lock.
> +	 */
> +	if (bp->b_flags & XBF_STALE)
> +		return 0;
> +
> +	atomic_inc(&bp->b_hold);

This looks wrong. I think it can race with reclaim because we don't
hold the pag->pag_buf_lock. i.e.  xfs_buf_rele() does this:

	release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);

to prevent lookups - which are done under the pag->pag_buf_lock -
from finding the buffer while it has a zero hold count and may be
removed from the cache and freed.

Further, if we are going to iterate the cache, I'd much prefer the
iteration code to be in fs/xfs/xfs_buf.c - nothing outside that file
should be touching core buffer cache structures....

FWIW, the LRU walks already handle this problem by the fact the LRU
owns a hold count on the buffer. So it may be better to do this via
a LRU walk rather than a hashtable walk....
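
To illustrate the hazard itself rather than either fix suggested
above, here is a userspace C11 sketch of a "take a reference only if
the count is still nonzero" helper.  The kernel's
atomic_inc_not_zero() has this general shape; the toy_* names are
hypothetical, and whether such a helper would be sufficient for the
XFS buffer cache is not something this sketch claims.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_buf {
	atomic_int hold;	/* stand-in for a buffer hold count */
};

/*
 * Take a reference only if the count is still nonzero.  A plain
 * increment could resurrect an object whose last reference was just
 * dropped and which is about to be freed.
 */
static bool toy_buf_tryhold(struct toy_buf *bp)
{
	int old = atomic_load(&bp->hold);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&bp->hold, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool toy_buf_rele(struct toy_buf *bp)
{
	return atomic_fetch_sub(&bp->hold, 1) == 1;
}
```

A walker built on this would skip any buffer for which
toy_buf_tryhold() returns false instead of touching it.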

[snip]

> +	olderror = bp->b_error;
> +	if (bp->b_fspriv)
> +		bp->b_ops->verify_write(bp);

Should we be recalculating the CRC on buffers we aren't about to 
be writing to disk? Should we be verifying a buffer that has a
non-zero error value on it?

> +	else
> +		bp->b_ops->verify_read(bp);
> +	error = bp->b_error;
> +	bp->b_error = olderror;
> +
> +	/* Mark any corruption errors we might find. */
> +	XFS_SCRUB_BUF_OP_ERROR_GOTO(out_unlock);

Ah, what? Why does this need a goto? And why doesn't it report the
error that was found? (bloody macros!).

> +out_unlock:
> +	xfs_buf_unlock(bp);
> +out_dec:
> +	atomic_dec(&bp->b_hold);
> +	return error;
> +}
> +#undef XFS_SCRUB_BUF_OP_ERROR_GOTO

Oh, that's the macro defined above the function. Which I paid little
attention to other than it called another macro. Now I realise that
it (ab)uses local variables without them being passed into the
macro. Yup, another reason we need to get rid of the macros from
this code....
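
For comparison, a minimal sketch of the same kind of check written as
a function, assuming nothing about the real scrub helpers: everything
it reads or writes is a parameter, and the caller decides what to do
with the result.  All toy_* names and the block number 42 are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_ctx { int nr_errors; };

/*
 * Function form of an "op error" check.  Returns true if the caller
 * may continue, false if *error holds a failure that was recorded.
 */
static bool toy_op_ok(struct toy_ctx *sc, unsigned agno, unsigned bno,
		      const char *type, int *error)
{
	(void)agno; (void)bno; (void)type;	/* would feed a tracepoint */
	if (*error == 0)
		return true;
	sc->nr_errors++;
	return false;
}

static int toy_check_buf(struct toy_ctx *sc, int verify_result)
{
	int error = verify_result;

	if (!toy_op_ok(sc, 0, 42, "buf", &error))
		return error;	/* explicit control flow, no hidden goto */
	return 0;
}
```

The call site now shows which variables are involved and where
control goes on failure, which is exactly what the macro hid.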

> +	struct xfs_scrub_metabufs_info	*smi,
> +	struct rhashtable_iter		*iter)
> +{
> +	struct xfs_buf			*bp;
> +	int				error = 0;
> +
> +	do {
> +		if (xfs_scrub_should_terminate(&error))
> +			break;
> +
> +		bp = rhashtable_walk_next(iter);
> +		if (IS_ERR(bp))
> +			return PTR_ERR(bp);
> +		else if (bp == NULL)
> +			return 0;
> +
> +		error = xfs_scrub_metabufs_scrub_buf(smi, bp);
> +	} while (error != 0);
> +
> +	return error;
> +}
> +
> +/* Try to walk the buffers in this AG in order to scrub them. */
> +int
> +xfs_scrub_metabufs(

Ah, please put an "_ag_" in this so it's clear it's only scrubbing a
single AG. This is hidden deep inside the scrub context, so it took
me a little bit of back tracking to understand that this wasn't a
global scan....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 01/22] xfs: query the per-AG reservation counters
  2017-07-23 22:25   ` Dave Chinner
@ 2017-07-24 19:07     ` Darrick J. Wong
  0 siblings, 0 replies; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-24 19:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Jul 24, 2017 at 08:25:16AM +1000, Dave Chinner wrote:
> On Thu, Jul 20, 2017 at 09:38:35PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Establish an ioctl for userspace to query the original and current
> > per-AG reservation counts.  This will be used by xfs_scrub to
> > check that the vfs counters are at least somewhat sane.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_fs.h |   10 ++++++++++
> >  fs/xfs/xfs_fsops.c     |   29 +++++++++++++++++++++++++++++
> >  fs/xfs/xfs_fsops.h     |    2 ++
> >  fs/xfs/xfs_ioctl.c     |   16 ++++++++++++++++
> >  fs/xfs/xfs_ioctl32.c   |    1 +
> >  5 files changed, 58 insertions(+)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> > index 8c61f21..5dedab9 100644
> > --- a/fs/xfs/libxfs/xfs_fs.h
> > +++ b/fs/xfs/libxfs/xfs_fs.h
> > @@ -469,6 +469,15 @@ typedef struct xfs_swapext
> >  #define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH		0x2	/* don't flush log nor data */
> >  
> >  /*
> > + * AG reserved block counters
> > + */
> > +struct xfs_fsop_ag_resblks {
> > +	__u64 resblks;		/* blocks reserved now */
> 
> 		current_reservation
> 
> > +	__u64 resblks_orig;	/* blocks reserved at mount time */
> 
> 		mount_reservation;
> 
> > +	__u64 reserved[2];

So long as I'm going to add a flags field I might as well bump this
whole structure up to a full 64 bytes.

struct xfs_fsop_ag_resblks {
	__u32 flags;			/* output flags, none defined now */
	__u32 pad;			/* zero */
	__u64 current_resv;		/* blocks reserved now */
	__u64 mount_resv;		/* blocks reserved at mount time */
	__u64 reserved[5];		/* zero */
};


> > +};
> 
> Also, any new structure we pass to userspace should be versioned
> from the start. At minimum, add a flags field so we can, in future, tell
> userspace what the reserved space means.

Will do.

> > +
> > +/* Query the per-AG reservations to see how many blocks we have reserved. */
> > +int
> > +xfs_fs_get_ag_reserve_blocks(
> > +	struct xfs_mount		*mp,
> > +	struct xfs_fsop_ag_resblks	*out)
> > +{
> > +	struct xfs_ag_resv		*r;
> > +	struct xfs_perag		*pag;
> > +	xfs_agnumber_t			agno;
> > +
> > +	out->resblks = 0;
> > +	out->resblks_orig = 0;
> > +	out->reserved[0] = out->reserved[1] = 0;
> 
> memset() the structure so we don't forget in future to zero it
> properly.

Ok.
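
A userspace sketch of both points, assuming the 64-byte layout
proposed above: uint32_t/uint64_t stand in for __u32/__u64, the 32-bit
padding member is named pad here because a C struct cannot have two
members called reserved, and toy_fill_resblks() is a hypothetical
helper, not the function from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for the proposed layout; names are illustrative. */
struct toy_ag_resblks {
	uint32_t flags;		/* output flags, none defined yet */
	uint32_t pad;		/* must be zero */
	uint64_t current_resv;	/* blocks reserved now */
	uint64_t mount_resv;	/* blocks reserved at mount time */
	uint64_t reserved[5];	/* must be zero */
};

/* Catch accidental ABI size changes at compile time. */
_Static_assert(sizeof(struct toy_ag_resblks) == 64, "ABI size changed");

/* Zero the whole structure first, then fill in only what is known. */
static void toy_fill_resblks(struct toy_ag_resblks *out,
			     uint64_t cur, uint64_t orig)
{
	memset(out, 0, sizeof(*out));
	out->current_resv = cur;
	out->mount_resv = orig;
}
```

Because the memset covers the whole structure, a field added to the
reserved area later starts out zeroed without anyone having to
remember to initialise it.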

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [PATCH 13/22] xfs: scrub inodes
  2017-07-23 17:38   ` Allison Henderson
@ 2017-07-24 20:02     ` Darrick J. Wong
  0 siblings, 0 replies; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-24 20:02 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Sun, Jul 23, 2017 at 10:38:08AM -0700, Allison Henderson wrote:
> 
> 
> On 7/20/2017 9:39 PM, Darrick J. Wong wrote:
> >From: Darrick J. Wong <darrick.wong@oracle.com>
> >
> >Scrub the fields within an inode.
> >
> >Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> >---
> > fs/xfs/Makefile        |    1
> > fs/xfs/libxfs/xfs_fs.h |    3
> > fs/xfs/scrub/common.c  |   64 +++++++++
> > fs/xfs/scrub/common.h  |    4 +
> > fs/xfs/scrub/inode.c   |  326 ++++++++++++++++++++++++++++++++++++++++++++++++
> > fs/xfs/xfs_trace.h     |    3
> > 6 files changed, 397 insertions(+), 4 deletions(-)
> > create mode 100644 fs/xfs/scrub/inode.c
> >
> >
> >diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> >index 1b1972b..2ba33ad 100644
> >--- a/fs/xfs/Makefile
> >+++ b/fs/xfs/Makefile
> >@@ -145,6 +145,7 @@ xfs-y				+= $(addprefix scrub/, \
> > 				   btree.o \
> > 				   common.o \
> > 				   ialloc.o \
> >+				   inode.o \
> > 				   metabufs.o \
> > 				   refcount.o \
> > 				   rmap.o \
> >diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> >index 3253de9..277b528 100644
> >--- a/fs/xfs/libxfs/xfs_fs.h
> >+++ b/fs/xfs/libxfs/xfs_fs.h
> >@@ -493,7 +493,8 @@ struct xfs_scrub_metadata {
> > #define XFS_SCRUB_TYPE_FINOBT	9	/* free inode btree */
> > #define XFS_SCRUB_TYPE_RMAPBT	10	/* reverse mapping btree */
> > #define XFS_SCRUB_TYPE_REFCNTBT	11	/* reference count btree */
> >-#define XFS_SCRUB_TYPE_MAX	11
> >+#define XFS_SCRUB_TYPE_INODE	12	/* inode record */
> >+#define XFS_SCRUB_TYPE_MAX	12
> >
> > /* i: repair this metadata */
> > #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> >diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> >index 71a980e..066fd3e 100644
> >--- a/fs/xfs/scrub/common.c
> >+++ b/fs/xfs/scrub/common.c
> >@@ -31,6 +31,8 @@
> > #include "xfs_trace.h"
> > #include "xfs_sb.h"
> > #include "xfs_inode.h"
> >+#include "xfs_icache.h"
> >+#include "xfs_itable.h"
> > #include "xfs_alloc.h"
> > #include "xfs_alloc_btree.h"
> > #include "xfs_bmap.h"
> >@@ -584,12 +586,60 @@ xfs_scrub_setup_fs(
> > 			&M_RES(sc->mp)->tr_itruncate, 0, 0, 0, &sc->tp);
> > }
> >
> >+/*
> >+ * Given an inode and the scrub control structure, grab either the
> >+ * inode referenced in the control structure or the inode passed in.
> >+ * The inode is not locked.
> >+ */
> >+int
> >+xfs_scrub_get_inode(
> >+	struct xfs_scrub_context	*sc,
> >+	struct xfs_inode		*ip_in)
> >+{
> >+	struct xfs_mount		*mp = sc->mp;
> >+	struct xfs_inode		*ips = NULL;
> >+	int				error;
> >+
> >+	if (sc->sm->sm_agno || (sc->sm->sm_gen && !sc->sm->sm_ino))
> >+		return -EINVAL;
> >+
> >+	/* We want to scan the inode we already had opened. */
> >+	if (sc->sm->sm_ino == 0 || sc->sm->sm_ino == ip_in->i_ino) {
> >+		sc->ip = ip_in;
> >+		return 0;
> >+	}
> >+
> >+	/* Look up the inode, see if the generation number matches. */
> >+	if (xfs_internal_inum(mp, sc->sm->sm_ino))
> >+		return -ENOENT;
> >+	error = xfs_iget(mp, NULL, sc->sm->sm_ino, XFS_IGET_UNTRUSTED,
> >+			0, &ips);
> >+	if (error == -ENOENT || error == -EINVAL) {
> >+		/* inode doesn't exist... */
> >+		return -ENOENT;
> >+	} else if (error) {
> >+		trace_xfs_scrub_op_error(mp,
> >+				XFS_INO_TO_AGNO(mp, sc->sm->sm_ino),
> >+				XFS_INO_TO_AGBNO(mp, sc->sm->sm_ino),
> >+				"inode", error, __func__, __LINE__);
> >+		return error;
> >+	}
> >+	if (VFS_I(ips)->i_generation != sc->sm->sm_gen) {
> >+		IRELE(ips);
> >+		return -ENOENT;
> >+	}
> >+
> >+	sc->ip = ips;
> >+	return 0;
> >+}
> >+
> > /* Scrub setup and teardown */
> >
> > /* Free all the resources and finish the transactions. */
> > STATIC int
> > xfs_scrub_teardown(
> > 	struct xfs_scrub_context	*sc,
> >+	struct xfs_inode		*ip_in,
> > 	int				error)
> > {
> > 	xfs_scrub_ag_free(sc, &sc->sa);
> >@@ -597,6 +647,12 @@ xfs_scrub_teardown(
> > 		xfs_trans_cancel(sc->tp);
> > 		sc->tp = NULL;
> > 	}
> >+	if (sc->ip) {
> >+		xfs_iunlock(sc->ip, sc->ilock_flags);
> >+		if (sc->ip != ip_in)
> >+			IRELE(sc->ip);
> >+		sc->ip = NULL;
> >+	}
> > 	return error;
> > }
> >
> >@@ -735,6 +791,10 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
> > 		.scrub	= xfs_scrub_refcountbt,
> > 		.has	= xfs_sb_version_hasreflink,
> > 	},
> >+	{ /* inode record */
> >+		.setup	= xfs_scrub_setup_inode,
> >+		.scrub	= xfs_scrub_inode,
> >+	},
> > };
> >
> > /* Dispatch metadata scrubbing. */
> >@@ -808,7 +868,7 @@ xfs_scrub_metadata(
> > 		 * Tear down everything we hold, then set up again with
> > 		 * preparation for worst-case scenarios.
> > 		 */
> >-		error = xfs_scrub_teardown(&sc, 0);
> >+		error = xfs_scrub_teardown(&sc, ip, 0);
> > 		if (error)
> > 			goto out;
> > 		try_harder = true;
> >@@ -820,7 +880,7 @@ xfs_scrub_metadata(
> > 		xfs_alert_ratelimited(mp, "Corruption detected during scrub.");
> >
> > out_teardown:
> >-	error = xfs_scrub_teardown(&sc, error);
> >+	error = xfs_scrub_teardown(&sc, ip, error);
> > out:
> > 	trace_xfs_scrub_done(ip, sm, error);
> > 	return error;
> >diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> >index 1f9ba8c6..5caa6c9 100644
> >--- a/fs/xfs/scrub/common.h
> >+++ b/fs/xfs/scrub/common.h
> >@@ -52,6 +52,7 @@ struct xfs_scrub_context {
> > 	const struct xfs_scrub_meta_fns	*fns;
> > 	struct xfs_trans		*tp;
> > 	struct xfs_inode		*ip;
> >+	uint				ilock_flags;
> > 	bool				try_harder;
> >
> > 	/* State tracking for single-AG operations. */
> >@@ -204,6 +205,7 @@ int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
> >
> > int xfs_scrub_setup_ag_btree(struct xfs_scrub_context *sc,
> > 			     struct xfs_inode *ip, bool force_log);
> >+int xfs_scrub_get_inode(struct xfs_scrub_context *sc, struct xfs_inode *ip_in);
> >
> > #define SETUP_FN(name) int name(struct xfs_scrub_context *sc, struct xfs_inode *ip)
> > SETUP_FN(xfs_scrub_setup_fs);
> >@@ -213,6 +215,7 @@ SETUP_FN(xfs_scrub_setup_ag_allocbt);
> > SETUP_FN(xfs_scrub_setup_ag_iallocbt);
> > SETUP_FN(xfs_scrub_setup_ag_rmapbt);
> > SETUP_FN(xfs_scrub_setup_ag_refcountbt);
> >+SETUP_FN(xfs_scrub_setup_inode);
> > #undef SETUP_FN
> >
> > /* Metadata scrubbers */
> >@@ -230,6 +233,7 @@ SCRUB_FN(xfs_scrub_inobt);
> > SCRUB_FN(xfs_scrub_finobt);
> > SCRUB_FN(xfs_scrub_rmapbt);
> > SCRUB_FN(xfs_scrub_refcountbt);
> >+SCRUB_FN(xfs_scrub_inode);
> > #undef SCRUB_FN
> >
> > #endif	/* __XFS_REPAIR_COMMON_H__ */
> >diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
> >new file mode 100644
> >index 0000000..6e1e037
> >--- /dev/null
> >+++ b/fs/xfs/scrub/inode.c
> >@@ -0,0 +1,326 @@
> >+/*
> >+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
> >+ *
> >+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
> >+ *
> >+ * This program is free software; you can redistribute it and/or
> >+ * modify it under the terms of the GNU General Public License
> >+ * as published by the Free Software Foundation; either version 2
> >+ * of the License, or (at your option) any later version.
> >+ *
> >+ * This program is distributed in the hope that it would be useful,
> >+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >+ * GNU General Public License for more details.
> >+ *
> >+ * You should have received a copy of the GNU General Public License
> >+ * along with this program; if not, write the Free Software Foundation,
> >+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> >+ */
> >+#include "xfs.h"
> >+#include "xfs_fs.h"
> >+#include "xfs_shared.h"
> >+#include "xfs_format.h"
> >+#include "xfs_trans_resv.h"
> >+#include "xfs_mount.h"
> >+#include "xfs_defer.h"
> >+#include "xfs_btree.h"
> >+#include "xfs_bit.h"
> >+#include "xfs_log_format.h"
> >+#include "xfs_trans.h"
> >+#include "xfs_trace.h"
> >+#include "xfs_sb.h"
> >+#include "xfs_inode.h"
> >+#include "xfs_icache.h"
> >+#include "xfs_inode_buf.h"
> >+#include "xfs_inode_fork.h"
> >+#include "xfs_ialloc.h"
> >+#include "xfs_log.h"
> >+#include "xfs_trans_priv.h"
> >+#include "xfs_reflink.h"
> >+#include "scrub/common.h"
> >+
> >+/* Set us up with an inode. */
> >+int
> >+xfs_scrub_setup_inode(
> >+	struct xfs_scrub_context	*sc,
> >+	struct xfs_inode		*ip)
> >+{
> >+	struct xfs_mount		*mp = sc->mp;
> >+	int				error;
> >+
> >+	/*
> >+	 * Try to get the inode.  If the verifiers fail, we try again
> >+	 * in raw mode.
> >+	 */
> >+	error = xfs_scrub_get_inode(sc, ip);
> >+	switch (error) {
> >+	case 0:
> >+		break;
> >+	case -EFSCORRUPTED:
> >+	case -EFSBADCRC:
> >+		/* Push everything out of the log onto disk prior to check. */
> >+		error = _xfs_log_force(mp, XFS_LOG_SYNC, NULL);
> >+		if (error)
> >+			return error;
> >+		xfs_ail_push_all_sync(mp->m_ail);
> >+		return 0;
> >+	default:
> >+		return error;
> >+	}
> >+
> >+	/* Got the inode, lock it and we're ready to go. */
> >+	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
> >+	xfs_ilock(sc->ip, sc->ilock_flags);
>
> Is this lock....
> 
> >+	error = xfs_scrub_trans_alloc(sc->sm, mp, &M_RES(mp)->tr_itruncate,
> >+			0, 0, 0, &sc->tp);
> >+	if (error)
> >+		goto out_unlock;
> >+	sc->ilock_flags |= XFS_ILOCK_EXCL;
> >+	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
>
> .... and then this lock, supposed to be locking twice like this?  Did you
> maybe mean for the second one to be an unlock?  Also did you mean to call it
> with sc->ilock_flags as the flags like the first call does?

They're three different locks, taken in nesting order.  First we take
the io and mmap locks so that no other threads can start IO activities,
then we take the ilock so that we have exclusive access to the inode
fields / extent map.

Strictly speaking, this /could/ theoretically take the shared version
of the first two locks during a scrub operation since we don't modify
anything (we need the exclusive ilock to load the extent map), but for
now I think it's simpler to obtain totally exclusive access to the
metadata we're checking.  For repair we of course need exclusive access
to all three levels.
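
The ordering and the role of ilock_flags can be sketched in userspace
as below.  This is only a model: the TOY_* bits, toy_ilock() and
toy_setup() are hypothetical, and alloc_error stands in for a failed
transaction allocation between the two locking steps.

```c
#include <assert.h>

/* Hypothetical flag bits, one per lock level. */
#define TOY_IOLOCK	(1u << 0)
#define TOY_MMAPLOCK	(1u << 1)
#define TOY_ILOCK	(1u << 2)

struct toy_inode { unsigned held; };

static void toy_ilock(struct toy_inode *ip, unsigned flags)
{
	assert((ip->held & flags) == 0);	/* never take a level twice */
	ip->held |= flags;
}

static void toy_iunlock(struct toy_inode *ip, unsigned flags)
{
	assert((ip->held & flags) == flags);	/* only drop what is held */
	ip->held &= ~flags;
}

/*
 * Take the outer two levels, run a step that may fail, then take the
 * innermost level.  *ilock_flags always records exactly what must be
 * dropped later.
 */
static int toy_setup(struct toy_inode *ip, unsigned *ilock_flags,
		     int alloc_error)
{
	*ilock_flags = TOY_IOLOCK | TOY_MMAPLOCK;
	toy_ilock(ip, *ilock_flags);

	if (alloc_error) {
		toy_iunlock(ip, *ilock_flags);
		*ilock_flags = 0;
		return alloc_error;
	}

	*ilock_flags |= TOY_ILOCK;
	toy_ilock(ip, TOY_ILOCK);
	return 0;
}
```

The second toy_ilock() call passes only the new bit, while the
accumulated mask is what a later unlock uses, which mirrors the shape
of the two xfs_ilock() calls being asked about.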

--D

> 
> Other than that it looks good.  Looks like you caught a few extra bugs since
> the last revision.
> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
> 
> >+
> >+	return error;
> >+out_unlock:
> >+	xfs_iunlock(sc->ip, sc->ilock_flags);
> >+	if (sc->ip != ip)
> >+		IRELE(sc->ip);
> >+	sc->ip = NULL;
> >+	return error;
> >+}
> >+
> >+/* Inode core */
> >+
> >+#define XFS_SCRUB_INODE_CHECK(fs_ok) \
> >+	XFS_SCRUB_INO_CHECK(sc, ino, bp, "inode", fs_ok)
> >+#define XFS_SCRUB_INODE_GOTO(fs_ok, label) \
> >+	XFS_SCRUB_INO_GOTO(sc, ino, bp, "inode", fs_ok, label)
> >+#define XFS_SCRUB_INODE_OP_ERROR_GOTO(label) \
> >+	XFS_SCRUB_OP_ERROR_GOTO(sc, XFS_INO_TO_AGNO(mp, ino), \
> >+			XFS_INO_TO_AGBNO(mp, ino), "inode", &error, label)
> >+#define XFS_SCRUB_INODE_PREEN(fs_ok) \
> >+	XFS_SCRUB_INO_PREEN(sc, bp, "inode", fs_ok)
> >+/* Scrub an inode. */
> >+int
> >+xfs_scrub_inode(
> >+	struct xfs_scrub_context	*sc)
> >+{
> >+	struct xfs_imap			imap;
> >+	struct xfs_dinode		di;
> >+	struct xfs_mount		*mp = sc->mp;
> >+	struct xfs_buf			*bp = NULL;
> >+	struct xfs_dinode		*dip;
> >+	xfs_ino_t			ino;
> >+	unsigned long long		isize;
> >+	uint64_t			flags2;
> >+	uint32_t			nextents;
> >+	uint32_t			extsize;
> >+	uint32_t			cowextsize;
> >+	uint16_t			flags;
> >+	uint16_t			mode;
> >+	bool				has_shared;
> >+	int				error = 0;
> >+
> >+	/* Did we get the in-core inode, or are we doing this manually? */
> >+	if (sc->ip) {
> >+		ino = sc->ip->i_ino;
> >+		xfs_inode_to_disk(sc->ip, &di, 0);
> >+		dip = &di;
> >+	} else {
> >+		/* Map & read inode. */
> >+		ino = sc->sm->sm_ino;
> >+		error = xfs_imap(mp, sc->tp, ino, &imap, XFS_IGET_UNTRUSTED);
> >+		if (error == -EINVAL) {
> >+			/*
> >+			 * Inode could have gotten deleted out from under us;
> >+			 * just forget about it.
> >+			 */
> >+			error = -ENOENT;
> >+			goto out;
> >+		}
> >+		XFS_SCRUB_INODE_OP_ERROR_GOTO(out);
> >+
> >+		error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> >+				imap.im_blkno, imap.im_len, XBF_UNMAPPED, &bp,
> >+				NULL);
> >+		XFS_SCRUB_INODE_OP_ERROR_GOTO(out);
> >+
> >+		/* Is this really the inode we want? */
> >+		bp->b_ops = &xfs_inode_buf_ops;
> >+		dip = xfs_buf_offset(bp, imap.im_boffset);
> >+		error = xfs_dinode_verify(mp, ino, dip) ? 0 : -EFSCORRUPTED;
> >+		XFS_SCRUB_INODE_OP_ERROR_GOTO(out);
> >+		XFS_SCRUB_INODE_GOTO(
> >+				xfs_dinode_good_version(mp, dip->di_version),
> >+				out);
> >+		if (be32_to_cpu(dip->di_gen) != sc->sm->sm_gen) {
> >+			error = -ENOENT;
> >+			goto out;
> >+		}
> >+	}
> >+
> >+	flags = be16_to_cpu(dip->di_flags);
> >+	if (dip->di_version >= 3)
> >+		flags2 = be64_to_cpu(dip->di_flags2);
> >+	else
> >+		flags2 = 0;
> >+
> >+	/* di_mode */
> >+	mode = be16_to_cpu(dip->di_mode);
> >+	XFS_SCRUB_INODE_CHECK(!(mode & ~(S_IALLUGO | S_IFMT)));
> >+
> >+	/* v1/v2 fields */
> >+	switch (dip->di_version) {
> >+	case 1:
> >+		XFS_SCRUB_INODE_CHECK(dip->di_nlink == 0);
> >+		XFS_SCRUB_INODE_CHECK(dip->di_mode || !sc->ip);
> >+		XFS_SCRUB_INODE_CHECK(dip->di_projid_lo == 0);
> >+		XFS_SCRUB_INODE_CHECK(dip->di_projid_hi == 0);
> >+		break;
> >+	case 2:
> >+	case 3:
> >+		XFS_SCRUB_INODE_CHECK(dip->di_onlink == 0);
> >+		XFS_SCRUB_INODE_CHECK(dip->di_mode || !sc->ip);
> >+		XFS_SCRUB_INODE_CHECK(dip->di_projid_hi == 0 ||
> >+				xfs_sb_version_hasprojid32bit(&mp->m_sb));
> >+		break;
> >+	default:
> >+		ASSERT(0);
> >+		break;
> >+	}
> >+
> >+	/* di_format */
> >+	switch (dip->di_format) {
> >+	case XFS_DINODE_FMT_DEV:
> >+		XFS_SCRUB_INODE_CHECK(S_ISCHR(mode) || S_ISBLK(mode) ||
> >+				      S_ISFIFO(mode) || S_ISSOCK(mode));
> >+		break;
> >+	case XFS_DINODE_FMT_LOCAL:
> >+		XFS_SCRUB_INODE_CHECK(S_ISDIR(mode) || S_ISLNK(mode));
> >+		break;
> >+	case XFS_DINODE_FMT_EXTENTS:
> >+		XFS_SCRUB_INODE_CHECK(S_ISREG(mode) || S_ISDIR(mode) ||
> >+				      S_ISLNK(mode));
> >+		break;
> >+	case XFS_DINODE_FMT_BTREE:
> >+		XFS_SCRUB_INODE_CHECK(S_ISREG(mode) || S_ISDIR(mode));
> >+		break;
> >+	case XFS_DINODE_FMT_UUID:
> >+	default:
> >+		XFS_SCRUB_INODE_CHECK(false);
> >+		break;
> >+	}
> >+
> >+	/* di_size */
> >+	isize = be64_to_cpu(dip->di_size);
> >+	XFS_SCRUB_INODE_CHECK(!(isize & (1ULL << 63)));
> >+	if (!S_ISDIR(mode) && !S_ISREG(mode) && !S_ISLNK(mode))
> >+		XFS_SCRUB_INODE_CHECK(isize == 0);
> >+
> >+	/* di_nblocks */
> >+	if (flags2 & XFS_DIFLAG2_REFLINK) {
> >+		; /* nblocks can exceed dblocks */
> >+	} else if (flags & XFS_DIFLAG_REALTIME) {
> >+		XFS_SCRUB_INODE_CHECK(be64_to_cpu(dip->di_nblocks) <
> >+				mp->m_sb.sb_dblocks + mp->m_sb.sb_rblocks);
> >+	} else {
> >+		XFS_SCRUB_INODE_CHECK(be64_to_cpu(dip->di_nblocks) <
> >+				mp->m_sb.sb_dblocks);
> >+	}
> >+
> >+	/* di_extsize */
> >+	if (flags & XFS_DIFLAG_EXTSIZE) {
> >+		extsize = be32_to_cpu(dip->di_extsize);
> >+		XFS_SCRUB_INODE_CHECK(extsize > 0);
> >+		XFS_SCRUB_INODE_CHECK(extsize <= MAXEXTLEN);
> >+		XFS_SCRUB_INODE_CHECK(extsize <= mp->m_sb.sb_agblocks / 2 ||
> >+				(flags & XFS_DIFLAG_REALTIME));
> >+	}
> >+
> >+	/* di_flags */
> >+	XFS_SCRUB_INODE_CHECK(!(flags & XFS_DIFLAG_IMMUTABLE) ||
> >+			      !(flags & XFS_DIFLAG_APPEND));
> >+
> >+	XFS_SCRUB_INODE_CHECK(!(flags & XFS_DIFLAG_FILESTREAM) ||
> >+			      !(flags & XFS_DIFLAG_REALTIME));
> >+
> >+	/* di_nextents */
> >+	nextents = be32_to_cpu(dip->di_nextents);
> >+	switch (dip->di_format) {
> >+	case XFS_DINODE_FMT_EXTENTS:
> >+		XFS_SCRUB_INODE_CHECK(nextents <=
> >+			XFS_DFORK_DSIZE(dip, mp) / sizeof(struct xfs_bmbt_rec));
> >+		break;
> >+	case XFS_DINODE_FMT_BTREE:
> >+		XFS_SCRUB_INODE_CHECK(nextents >
> >+			XFS_DFORK_DSIZE(dip, mp) / sizeof(struct xfs_bmbt_rec));
> >+		break;
> >+	case XFS_DINODE_FMT_LOCAL:
> >+	case XFS_DINODE_FMT_DEV:
> >+	case XFS_DINODE_FMT_UUID:
> >+	default:
> >+		XFS_SCRUB_INODE_CHECK(nextents == 0);
> >+		break;
> >+	}
> >+
> >+	/* di_anextents */
> >+	nextents = be16_to_cpu(dip->di_anextents);
> >+	switch (dip->di_aformat) {
> >+	case XFS_DINODE_FMT_EXTENTS:
> >+		XFS_SCRUB_INODE_CHECK(nextents <=
> >+			XFS_DFORK_ASIZE(dip, mp) / sizeof(struct xfs_bmbt_rec));
> >+		break;
> >+	case XFS_DINODE_FMT_BTREE:
> >+		XFS_SCRUB_INODE_CHECK(nextents >
> >+			XFS_DFORK_ASIZE(dip, mp) / sizeof(struct xfs_bmbt_rec));
> >+		break;
> >+	case XFS_DINODE_FMT_LOCAL:
> >+	case XFS_DINODE_FMT_DEV:
> >+	case XFS_DINODE_FMT_UUID:
> >+	default:
> >+		XFS_SCRUB_INODE_CHECK(nextents == 0);
> >+		break;
> >+	}
> >+
> >+	/* di_forkoff */
> >+	XFS_SCRUB_INODE_CHECK(XFS_DFORK_APTR(dip) <
> >+			(char *)dip + mp->m_sb.sb_inodesize);
> >+	XFS_SCRUB_INODE_CHECK(dip->di_anextents == 0 || dip->di_forkoff);
> >+
> >+	/* di_aformat */
> >+	XFS_SCRUB_INODE_CHECK(dip->di_aformat == XFS_DINODE_FMT_LOCAL ||
> >+			      dip->di_aformat == XFS_DINODE_FMT_EXTENTS ||
> >+			      dip->di_aformat == XFS_DINODE_FMT_BTREE);
> >+
> >+	/* di_cowextsize */
> >+	if (flags2 & XFS_DIFLAG2_COWEXTSIZE) {
> >+		cowextsize = be32_to_cpu(dip->di_cowextsize);
> >+		XFS_SCRUB_INODE_CHECK(xfs_sb_version_hasreflink(&mp->m_sb));
> >+		XFS_SCRUB_INODE_CHECK(cowextsize > 0);
> >+		XFS_SCRUB_INODE_CHECK(cowextsize <= MAXEXTLEN);
> >+		XFS_SCRUB_INODE_CHECK(cowextsize <= mp->m_sb.sb_agblocks / 2);
> >+	}
> >+
> >+	/* Now let's do the things that require a live inode. */
> >+	if (!sc->ip)
> >+		goto out;
> >+
> >+	/*
> >+	 * Does this inode have the reflink flag set but no shared extents?
> >+	 * Set the preening flag if this is the case.
> >+	 */
> >+	if (xfs_is_reflink_inode(sc->ip)) {
> >+		error = xfs_reflink_inode_has_shared_extents(sc->tp, sc->ip,
> >+				&has_shared);
> >+		XFS_SCRUB_INODE_OP_ERROR_GOTO(out);
> >+		XFS_SCRUB_INODE_PREEN(has_shared == true);
> >+	}
> >+
> >+out:
> >+	if (bp)
> >+		xfs_trans_brelse(sc->tp, bp);
> >+	return error;
> >+}
> >+#undef XFS_SCRUB_INODE_PREEN
> >+#undef XFS_SCRUB_INODE_OP_ERROR_GOTO
> >+#undef XFS_SCRUB_INODE_GOTO
> >+#undef XFS_SCRUB_INODE_CHECK
> >diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> >index 6c0281b..950e2c8 100644
> >--- a/fs/xfs/xfs_trace.h
> >+++ b/fs/xfs/xfs_trace.h
> >@@ -3323,7 +3323,8 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
> > 	{ XFS_SCRUB_TYPE_INOBT,		"inobt" }, \
> > 	{ XFS_SCRUB_TYPE_FINOBT,	"finobt" }, \
> > 	{ XFS_SCRUB_TYPE_RMAPBT,	"rmapbt" }, \
> >-	{ XFS_SCRUB_TYPE_REFCNTBT,	"refcountbt" }
> >+	{ XFS_SCRUB_TYPE_REFCNTBT,	"refcountbt" }, \
> >+	{ XFS_SCRUB_TYPE_INODE,		"inode" }
> > DECLARE_EVENT_CLASS(xfs_scrub_class,
> > 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
> > 		 int error),
> >

* Re: [PATCH 14/22] xfs: scrub inode block mappings
  2017-07-23 17:41   ` Allison Henderson
@ 2017-07-24 20:05     ` Darrick J. Wong
  0 siblings, 0 replies; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-24 20:05 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Sun, Jul 23, 2017 at 10:41:48AM -0700, Allison Henderson wrote:
> 
> 
> On 7/20/2017 9:40 PM, Darrick J. Wong wrote:
> >From: Darrick J. Wong <darrick.wong@oracle.com>
> >
> >Scrub an individual inode's block mappings to make sure they make sense.
> >
> >Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> >---
> > fs/xfs/Makefile        |    1
> > fs/xfs/libxfs/xfs_fs.h |    5 +
> > fs/xfs/scrub/bmap.c    |  378 ++++++++++++++++++++++++++++++++++++++++++++++++
> > fs/xfs/scrub/common.c  |   12 ++
> > fs/xfs/scrub/common.h  |    5 +
> > fs/xfs/xfs_trace.h     |    5 +
> > 6 files changed, 404 insertions(+), 2 deletions(-)
> > create mode 100644 fs/xfs/scrub/bmap.c
> >
> >
> >diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> >index 2ba33ad..89c67e1a 100644
> >--- a/fs/xfs/Makefile
> >+++ b/fs/xfs/Makefile
> >@@ -142,6 +142,7 @@ ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
> > xfs-y				+= $(addprefix scrub/, \
> > 				   agheader.o \
> > 				   alloc.o \
> >+				   bmap.o \
> > 				   btree.o \
> > 				   common.o \
> > 				   ialloc.o \
> >diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> >index 277b528..d762277 100644
> >--- a/fs/xfs/libxfs/xfs_fs.h
> >+++ b/fs/xfs/libxfs/xfs_fs.h
> >@@ -494,7 +494,10 @@ struct xfs_scrub_metadata {
> > #define XFS_SCRUB_TYPE_RMAPBT	10	/* reverse mapping btree */
> > #define XFS_SCRUB_TYPE_REFCNTBT	11	/* reference count btree */
> > #define XFS_SCRUB_TYPE_INODE	12	/* inode record */
> >-#define XFS_SCRUB_TYPE_MAX	12
> >+#define XFS_SCRUB_TYPE_BMBTD	13	/* data fork block mapping */
> >+#define XFS_SCRUB_TYPE_BMBTA	14	/* attr fork block mapping */
> >+#define XFS_SCRUB_TYPE_BMBTC	15	/* CoW fork block mapping */
> >+#define XFS_SCRUB_TYPE_MAX	15
> >
> > /* i: repair this metadata */
> > #define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> >diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
> >new file mode 100644
> >index 0000000..731f026
> >--- /dev/null
> >+++ b/fs/xfs/scrub/bmap.c
> >@@ -0,0 +1,378 @@
> >+/*
> >+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
> >+ *
> >+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
> >+ *
> >+ * This program is free software; you can redistribute it and/or
> >+ * modify it under the terms of the GNU General Public License
> >+ * as published by the Free Software Foundation; either version 2
> >+ * of the License, or (at your option) any later version.
> >+ *
> >+ * This program is distributed in the hope that it would be useful,
> >+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >+ * GNU General Public License for more details.
> >+ *
> >+ * You should have received a copy of the GNU General Public License
> >+ * along with this program; if not, write the Free Software Foundation,
> >+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> >+ */
> >+#include "xfs.h"
> >+#include "xfs_fs.h"
> >+#include "xfs_shared.h"
> >+#include "xfs_format.h"
> >+#include "xfs_trans_resv.h"
> >+#include "xfs_mount.h"
> >+#include "xfs_defer.h"
> >+#include "xfs_btree.h"
> >+#include "xfs_bit.h"
> >+#include "xfs_log_format.h"
> >+#include "xfs_trans.h"
> >+#include "xfs_trace.h"
> >+#include "xfs_sb.h"
> >+#include "xfs_inode.h"
> >+#include "xfs_inode_fork.h"
> >+#include "xfs_bmap.h"
> >+#include "xfs_bmap_util.h"
> >+#include "xfs_bmap_btree.h"
> >+#include "xfs_rmap.h"
> >+#include "scrub/common.h"
> >+#include "scrub/btree.h"
> >+
> >+/* Set us up with an inode's bmap. */
> >+STATIC int
> >+__xfs_scrub_setup_inode_bmap(
> >+	struct xfs_scrub_context	*sc,
> >+	struct xfs_inode		*ip,
> >+	bool				flush_data)
> >+{
> >+	struct xfs_mount		*mp = sc->mp;
> >+	int				error;
> >+
> >+	error = xfs_scrub_get_inode(sc, ip);
> >+	if (error)
> >+		return error;
> >+
> >+	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
> >+	xfs_ilock(sc->ip, sc->ilock_flags);
> >+
> >+	/*
> >+	 * We don't want any ephemeral data fork updates sitting around
> >+	 * while we inspect block mappings, so wait for directio to finish
> >+	 * and flush dirty data if we have delalloc reservations.
> >+	 */
> >+	if (S_ISREG(VFS_I(sc->ip)->i_mode) && flush_data) {
> >+		inode_dio_wait(VFS_I(sc->ip));
> >+		error = filemap_write_and_wait(VFS_I(sc->ip)->i_mapping);
> >+		if (error)
> >+			goto out_unlock;
> >+		error = invalidate_inode_pages2(VFS_I(sc->ip)->i_mapping);
> >+		if (error)
> >+			goto out_unlock;
> >+	}
> >+
> >+	/* Got the inode, lock it and we're ready to go. */
> >+	error = xfs_scrub_trans_alloc(sc->sm, mp, &M_RES(mp)->tr_itruncate,
> >+			0, 0, 0, &sc->tp);
> >+	if (error)
> >+		goto out_unlock;
> >+	sc->ilock_flags |= XFS_ILOCK_EXCL;
> >+	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
> >+
> >+	return 0;
> >+out_unlock:
> >+	xfs_iunlock(sc->ip, sc->ilock_flags);
> >+	if (sc->ip != ip)
> >+		IRELE(sc->ip);
> >+	sc->ip = NULL;
> >+	return error;
> >+}
> >+
> >+/* Set us up to scrub the data fork. */
> >+int
> >+xfs_scrub_setup_inode_bmap_data(
> >+	struct xfs_scrub_context	*sc,
> >+	struct xfs_inode		*ip)
> >+{
> >+	return __xfs_scrub_setup_inode_bmap(sc, ip, true);
> >+}
> >+
> >+/* Set us up to scrub the attr or CoW fork. */
> >+int
> >+xfs_scrub_setup_inode_bmap(
> >+	struct xfs_scrub_context	*sc,
> >+	struct xfs_inode		*ip)
> >+{
> >+	return __xfs_scrub_setup_inode_bmap(sc, ip, false);
> >+}
> >+
> >+/*
> >+ * Inode fork block mapping (BMBT) scrubber.
> >+ * More complex than the others because we have to scrub
> >+ * all the extents regardless of whether or not the fork
> >+ * is in btree format.
> >+ */
> >+
> >+struct xfs_scrub_bmap_info {
> >+	struct xfs_scrub_context	*sc;
> >+	const char			*type;
> >+	xfs_daddr_t			eofs;
> >+	xfs_fileoff_t			lastoff;
> >+	bool				is_rt;
> >+	bool				is_shared;
> >+	int				whichfork;
> >+};
> >+
> >+#define XFS_SCRUB_BMAP_CHECK(fs_ok) \
> >+	XFS_SCRUB_INO_CHECK(info->sc, info->sc->ip->i_ino, bp, info->type, fs_ok)
> >+#define XFS_SCRUB_BMAP_GOTO(fs_ok, label) \
> >+	XFS_SCRUB_INO_GOTO(info->sc, info->sc->ip->i_ino, bp, info->type, fs_ok, label)
> >+#define XFS_SCRUB_BMAP_OP_ERROR_GOTO(label) \
> >+	XFS_SCRUB_OP_ERROR_GOTO(info->sc, agno, 0, "bmap", &error, label)
> >+/* Scrub a single extent record. */
> >+STATIC int
> >+xfs_scrub_bmap_extent(
> >+	struct xfs_inode		*ip,
> >+	struct xfs_btree_cur		*cur,
> >+	struct xfs_scrub_bmap_info	*info,
> >+	struct xfs_bmbt_irec		*irec)
> >+{
> >+	struct xfs_scrub_ag		sa = { 0 };
> >+	struct xfs_mount		*mp = info->sc->mp;
> >+	struct xfs_buf			*bp = NULL;
> >+	xfs_daddr_t			daddr;
> >+	xfs_daddr_t			dlen;
> >+	xfs_fsblock_t			bno;
> >+	xfs_agnumber_t			agno;
> >+	int				error = 0;
> >+
> >+	if (cur)
> >+		xfs_btree_get_block(cur, 0, &bp);
> >+
> >+	XFS_SCRUB_BMAP_CHECK(irec->br_startoff >= info->lastoff);
> >+	XFS_SCRUB_BMAP_CHECK(irec->br_startblock != HOLESTARTBLOCK);
> >+	XFS_SCRUB_BMAP_CHECK(!isnullstartblock(irec->br_startblock));
> >+
> >+	/* Actual mapping, so check the block ranges. */
> >+	if (info->is_rt) {
> >+		daddr = XFS_FSB_TO_BB(mp, irec->br_startblock);
> >+		agno = NULLAGNUMBER;
> >+		bno = irec->br_startblock;
> >+	} else {
> >+		daddr = XFS_FSB_TO_DADDR(mp, irec->br_startblock);
> >+		agno = XFS_FSB_TO_AGNO(mp, irec->br_startblock);
> >+		XFS_SCRUB_BMAP_GOTO(agno < mp->m_sb.sb_agcount, out);
> >+		bno = XFS_FSB_TO_AGBNO(mp, irec->br_startblock);
> >+		XFS_SCRUB_BMAP_CHECK(bno < mp->m_sb.sb_agblocks);
> >+	}
> >+	dlen = XFS_FSB_TO_BB(mp, irec->br_blockcount);
> >+	XFS_SCRUB_BMAP_CHECK(irec->br_blockcount > 0);
> >+	XFS_SCRUB_BMAP_CHECK(irec->br_blockcount <= MAXEXTLEN);
> >+	XFS_SCRUB_BMAP_CHECK(daddr < info->eofs);
> >+	XFS_SCRUB_BMAP_CHECK(daddr + dlen <= info->eofs);
> >+	XFS_SCRUB_BMAP_CHECK(irec->br_state != XFS_EXT_UNWRITTEN ||
> >+			xfs_sb_version_hasextflgbit(&mp->m_sb));
> >+	if (error)
> >+		goto out;
> >+
> >+	/* Set ourselves up for cross-referencing later. */
> >+	if (!info->is_rt) {
> >+		error = xfs_scrub_ag_init(info->sc, agno, &sa);
> >+		XFS_SCRUB_BMAP_OP_ERROR_GOTO(out);
> >+	}
> >+
> >+	xfs_scrub_ag_free(info->sc, &sa);
> >+out:
> >+	info->lastoff = irec->br_startoff + irec->br_blockcount;
> >+	return error;
> >+}
> >+#undef XFS_SCRUB_BMAP_OP_ERROR_GOTO
> >+#undef XFS_SCRUB_BMAP_GOTO
> >+
> >+/* Scrub a bmbt record. */
> >+STATIC int
> >+xfs_scrub_bmapbt_helper(
> >+	struct xfs_scrub_btree		*bs,
> >+	union xfs_btree_rec		*rec)
> >+{
> >+	struct xfs_bmbt_rec_host	ihost;
> >+	struct xfs_bmbt_irec		irec;
> >+	struct xfs_scrub_bmap_info	*info = bs->private;
> >+	struct xfs_inode		*ip = bs->cur->bc_private.b.ip;
> >+	struct xfs_buf			*bp = NULL;
> >+	struct xfs_btree_block		*block;
> >+	uint64_t			owner;
> >+	int				i;
> >+
> >+	/*
> >+	 * Check the owners of the btree blocks up to the level below
> >+	 * the root since the verifiers don't do that.
> >+	 */
> >+	if (xfs_sb_version_hascrc(&bs->cur->bc_mp->m_sb) &&
> >+	    bs->cur->bc_ptrs[0] == 1) {
> >+		for (i = 0; i < bs->cur->bc_nlevels - 1; i++) {
> >+			block = xfs_btree_get_block(bs->cur, i, &bp);
> >+			owner = be64_to_cpu(block->bb_u.l.bb_owner);
> >+			XFS_SCRUB_BMAP_CHECK(owner == ip->i_ino);
> >+		}
> >+	}
> >+
> >+	/* Set up the in-core record and scrub it. */
> >+	ihost.l0 = be64_to_cpu(rec->bmbt.l0);
> >+	ihost.l1 = be64_to_cpu(rec->bmbt.l1);
> >+	xfs_bmbt_get_all(&ihost, &irec);
> >+	return xfs_scrub_bmap_extent(ip, bs->cur, info, &irec);
> >+}
> >+#undef XFS_SCRUB_BMAP_CHECK
> >+
> >+#define XFS_SCRUB_FORK_CHECK(fs_ok) \
> >+	XFS_SCRUB_INO_CHECK(sc, ip->i_ino, NULL, info.type, fs_ok)
> >+#define XFS_SCRUB_FORK_GOTO(fs_ok, label) \
> >+	XFS_SCRUB_INO_GOTO(sc, ip->i_ino, NULL, info.type, fs_ok, label)
> >+#define XFS_SCRUB_FORK_OP_ERROR_GOTO(label) \
> >+	XFS_SCRUB_OP_ERROR_GOTO(sc, \
> >+			XFS_INO_TO_AGNO(mp, ip->i_ino), \
> >+			XFS_INO_TO_AGBNO(mp, ip->i_ino), \
> >+			info.type, &error, label)
> >+/* Scrub an inode fork's block mappings. */
> >+STATIC int
> >+xfs_scrub_bmap(
> >+	struct xfs_scrub_context	*sc,
> >+	int				whichfork)
> >+{
> >+	struct xfs_bmbt_irec		irec;
> >+	struct xfs_scrub_bmap_info	info = {0};
> >+	struct xfs_owner_info		oinfo;
> >+	struct xfs_mount		*mp = sc->mp;
> >+	struct xfs_inode		*ip = sc->ip;
> >+	struct xfs_ifork		*ifp;
> >+	struct xfs_btree_cur		*cur;
> >+	xfs_fileoff_t			endoff;
> >+	xfs_extnum_t			idx;
> >+	bool				found;
> >+	int				error = 0;
> >+	int				err2 = 0;
> >+
> >+	switch (whichfork) {
> >+	case XFS_DATA_FORK:
> >+		info.type = "data fork";
> >+		break;
> >+	case XFS_ATTR_FORK:
> >+		info.type = "attr fork";
> >+		break;
> >+	case XFS_COW_FORK:
> >+		info.type = "CoW fork";
> >+		break;
> >+	}
> >+	ifp = XFS_IFORK_PTR(ip, whichfork);
> >+
> >+	info.is_rt = whichfork == XFS_DATA_FORK && XFS_IS_REALTIME_INODE(ip);
> >+	info.eofs = XFS_FSB_TO_BB(mp, info.is_rt ? mp->m_sb.sb_rblocks :
> >+					      mp->m_sb.sb_dblocks);
> >+	info.whichfork = whichfork;
> >+	info.is_shared = whichfork == XFS_DATA_FORK && xfs_is_reflink_inode(ip);
> >+	info.sc = sc;
> >+
> >+	switch (whichfork) {
> >+	case XFS_COW_FORK:
> >+		/* Non-existent CoW forks are ignorable. */
> >+		if (!ifp)
> >+			goto out_unlock;
> >+		/* No CoW forks on non-reflink inodes/filesystems. */
> >+		XFS_SCRUB_FORK_GOTO(xfs_is_reflink_inode(ip), out_unlock);
> >+		break;
> >+	case XFS_ATTR_FORK:
> >+		if (!ifp)
> >+			goto out_unlock;
> >+		XFS_SCRUB_FORK_CHECK(xfs_sb_version_hasattr(&mp->m_sb) ||
> >+				     xfs_sb_version_hasattr2(&mp->m_sb));
> >+		break;
> >+	}
> >+
> >+	/* Check the fork values */
> >+	switch (XFS_IFORK_FORMAT(ip, whichfork)) {
> >+	case XFS_DINODE_FMT_UUID:
> >+	case XFS_DINODE_FMT_DEV:
> >+	case XFS_DINODE_FMT_LOCAL:
> >+		/* No mappings to check. */
> >+		goto out_unlock;
> >+	case XFS_DINODE_FMT_EXTENTS:
> >+		XFS_SCRUB_FORK_GOTO(ifp->if_flags & XFS_IFEXTENTS, out_unlock);
> >+		break;
> >+	case XFS_DINODE_FMT_BTREE:
> >+		XFS_SCRUB_FORK_CHECK(whichfork != XFS_COW_FORK);
> >+		/* Scan the btree records. */
> >+		cur = xfs_bmbt_init_cursor(mp, sc->tp, ip, whichfork);
> >+		xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork);
> >+		err2 = xfs_scrub_btree(sc, cur, xfs_scrub_bmapbt_helper,
> >+				&oinfo, &info);
> >+		xfs_btree_del_cursor(cur, err2 ? XFS_BTREE_ERROR :
> >+						 XFS_BTREE_NOERROR);
> >+		if (err2 == -EDEADLOCK)
> >+			return err2;
> >+		else if (err2)
> >+			goto out_unlock;
> >+		break;
> >+	default:
> >+		XFS_SCRUB_FORK_GOTO(false, out_unlock);
> >+		break;
> >+	}
> >+
> >+	/* Extent data is in memory, so scrub that. */
> >+
> >+	/* Find the offset of the last extent in the mapping. */
> >+	error = xfs_bmap_last_offset(ip, &endoff, whichfork);
> >+	XFS_SCRUB_FORK_OP_ERROR_GOTO(out_unlock);
> >+
> >+	/* Scrub extent records. */
> >+	info.lastoff = 0;
> >+	ifp = XFS_IFORK_PTR(ip, whichfork);
> >+	for (found = xfs_iext_lookup_extent(ip, ifp, 0, &idx, &irec);
> >+	     found;
> Did you mean to have the found; without an assignment here?  Not sure if
> that was intentional or a typo.

It's a second-clause-of-a-for-loop null test; I can change it to "found
!= NULL" to be more obvious.

--D

> Otherwise looks good.
> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
> >+	     found = xfs_iext_get_extent(ifp, ++idx, &irec)) {
> >+		if (xfs_scrub_should_terminate(&error))
> >+			break;
> >+		if (isnullstartblock(irec.br_startblock))
> >+			continue;
> >+		XFS_SCRUB_FORK_CHECK(irec.br_startoff < endoff);
> >+		err2 = xfs_scrub_bmap_extent(ip, NULL, &info, &irec);
> >+		if (err2 == -EDEADLOCK)
> >+			return err2;
> >+		else if (!error && err2)
> >+			error = err2;
> >+	}
> >+
> >+out_unlock:
> >+	if (error == 0 && err2 != 0)
> >+		error = err2;
> >+	return error;
> >+}
> >+#undef XFS_SCRUB_FORK_CHECK
> >+#undef XFS_SCRUB_FORK_GOTO
> >+
> >+/* Scrub an inode's data fork. */
> >+int
> >+xfs_scrub_bmap_data(
> >+	struct xfs_scrub_context	*sc)
> >+{
> >+	return xfs_scrub_bmap(sc, XFS_DATA_FORK);
> >+}
> >+
> >+/* Scrub an inode's attr fork. */
> >+int
> >+xfs_scrub_bmap_attr(
> >+	struct xfs_scrub_context	*sc)
> >+{
> >+	return xfs_scrub_bmap(sc, XFS_ATTR_FORK);
> >+}
> >+
> >+/* Scrub an inode's CoW fork. */
> >+int
> >+xfs_scrub_bmap_cow(
> >+	struct xfs_scrub_context	*sc)
> >+{
> >+	if (!xfs_is_reflink_inode(sc->ip))
> >+		return -ENOENT;
> >+
> >+	return xfs_scrub_bmap(sc, XFS_COW_FORK);
> >+}
> >diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> >index 066fd3e..da3c006 100644
> >--- a/fs/xfs/scrub/common.c
> >+++ b/fs/xfs/scrub/common.c
> >@@ -795,6 +795,18 @@ static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
> > 		.setup	= xfs_scrub_setup_inode,
> > 		.scrub	= xfs_scrub_inode,
> > 	},
> >+	{ /* inode data fork */
> >+		.setup	= xfs_scrub_setup_inode_bmap_data,
> >+		.scrub	= xfs_scrub_bmap_data,
> >+	},
> >+	{ /* inode attr fork */
> >+		.setup	= xfs_scrub_setup_inode_bmap,
> >+		.scrub	= xfs_scrub_bmap_attr,
> >+	},
> >+	{ /* inode CoW fork */
> >+		.setup	= xfs_scrub_setup_inode_bmap,
> >+		.scrub	= xfs_scrub_bmap_cow,
> >+	},
> > };
> >
> > /* Dispatch metadata scrubbing. */
> >diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> >index 5caa6c9..1025466 100644
> >--- a/fs/xfs/scrub/common.h
> >+++ b/fs/xfs/scrub/common.h
> >@@ -216,6 +216,8 @@ SETUP_FN(xfs_scrub_setup_ag_iallocbt);
> > SETUP_FN(xfs_scrub_setup_ag_rmapbt);
> > SETUP_FN(xfs_scrub_setup_ag_refcountbt);
> > SETUP_FN(xfs_scrub_setup_inode);
> >+SETUP_FN(xfs_scrub_setup_inode_bmap_data);
> >+SETUP_FN(xfs_scrub_setup_inode_bmap);
> > #undef SETUP_FN
> >
> > /* Metadata scrubbers */
> >@@ -234,6 +236,9 @@ SCRUB_FN(xfs_scrub_finobt);
> > SCRUB_FN(xfs_scrub_rmapbt);
> > SCRUB_FN(xfs_scrub_refcountbt);
> > SCRUB_FN(xfs_scrub_inode);
> >+SCRUB_FN(xfs_scrub_bmap_data);
> >+SCRUB_FN(xfs_scrub_bmap_attr);
> >+SCRUB_FN(xfs_scrub_bmap_cow);
> > #undef SCRUB_FN
> >
> > #endif	/* __XFS_REPAIR_COMMON_H__ */
> >diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> >index 950e2c8..edfa4c7 100644
> >--- a/fs/xfs/xfs_trace.h
> >+++ b/fs/xfs/xfs_trace.h
> >@@ -3324,7 +3324,10 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
> > 	{ XFS_SCRUB_TYPE_FINOBT,	"finobt" }, \
> > 	{ XFS_SCRUB_TYPE_RMAPBT,	"rmapbt" }, \
> > 	{ XFS_SCRUB_TYPE_REFCNTBT,	"refcountbt" }, \
> >-	{ XFS_SCRUB_TYPE_INODE,		"inode" }
> >+	{ XFS_SCRUB_TYPE_INODE,		"inode" }, \
> >+	{ XFS_SCRUB_TYPE_BMBTD,		"bmapbtd" }, \
> >+	{ XFS_SCRUB_TYPE_BMBTA,		"bmapbta" }, \
> >+	{ XFS_SCRUB_TYPE_BMBTC,		"bmapbtc" }
> > DECLARE_EVENT_CLASS(xfs_scrub_class,
> > 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
> > 		 int error),
> >
> >--
> >To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> >the body of a message to majordomo@vger.kernel.org
> >More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 03/22] xfs: create an ioctl to scrub AG metadata
  2017-07-23 23:45   ` Dave Chinner
@ 2017-07-24 21:14     ` Darrick J. Wong
  0 siblings, 0 replies; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-24 21:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Jul 24, 2017 at 09:45:39AM +1000, Dave Chinner wrote:
> On Thu, Jul 20, 2017 at 09:38:47PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create an ioctl that can be used to scrub internal filesystem metadata.
> > The new ioctl takes the metadata type, an (optional) AG number, an
> > (optional) inode number and generation, and a flags argument.  This will
> > be used by the upcoming XFS online scrub tool.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Ok, I'm starting completely cold on this code (never seen it before)
> so there's a few things about the patch series that hit me straight
> away.
> 
> 1. I can't really review the previous tracepoint patch because I
> have no context of how they are used or what is being passed into
> them. Tracepoint patches really need to be added when there's
> already context available to verify them.  Can you split that patch up
> so that tracepoints are introduced with the code that uses them?

Yes, that can be done.

> 2. Macros. Ugh.

Yes.  If I changed the __func__/__LINE__ to _THIS_IP_ and stopped
stringifying the checks, I guess we could save a fair amount of rodata
in the resulting xfs.ko file, and as a bonus (as you point out later) we
don't need the macros anymore.

> 3. Lots of different, disjoint bits of infrastructure in this patch,
> but I have no clear idea how it gets used yet. Makes it hard to
> review....
> 
> Hence I think this patch also needs to be broken up into individual
> infrastructure operations, with the patch description describing
> what each operation is used for. That way when it comes to adding
> the actual metadata scrub code, the reviewer knows how the pieces
> all go together and what the functions being called are supposed to
> do and return...
> 
> Once I understand the infrastructure and how it is supposed to drive
> all the other bits, then larger patches for scrubbing individual
> structures are fine, but large patches for infrastructure just lead
> to long, long emails and missed problems...

Yes.  It has been difficult finding the right balance between an
eye-watering number of small patches vs. not letting everything
coagulate into this mess of a patch.  But yes, this one can be broken up
into pieces.

> > +/* metadata scrubbing */
> > +struct xfs_scrub_metadata {
> > +	__u32 sm_type;		/* What to check? */
> > +	__u32 sm_flags;		/* flags; see below. */
> > +	__u64 sm_ino;		/* inode number. */
> > +	__u32 sm_gen;		/* inode generation. */
> > +	__u32 sm_agno;		/* ag number. */
> > +	__u64 sm_reserved[5];	/* pad to 64 bytes */
> > +};
> > +
> > +/*
> > + * Metadata types and flags for scrub operation.
> > + */
> > +#define XFS_SCRUB_TYPE_TEST	0	/* dummy to test ioctl */
> > +#define XFS_SCRUB_TYPE_MAX	0
> > +
> > +/* i: repair this metadata */
> > +#define XFS_SCRUB_FLAG_REPAIR		(1 << 0)
> 
> If you're going to document a direction, do it in the variable name,
> not the comment. XFS_SCRUB_IFLAG_REPAIR, XFS_SCRUB_OFLAG_CORRUPT,
> etc. Especially as you don't separate the definitions of i/o flags,
> future flags are going to intertwine i/o in the definitions and
> it's not going to be obvious from a quick look where a flag should
> be used...

Fair enough.
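
For illustration, here is a minimal userspace sketch of what direction-tagged
flag names could look like, together with the input validation the ioctl
performs.  The SCRUB_IFLAG_*/SCRUB_OFLAG_* names, the struct name and
scrub_check_inputs() are hypothetical and only mirror the suggestion above;
they are not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names following the IFLAG/OFLAG suggestion. */
#define SCRUB_IFLAG_REPAIR	(1u << 0)	/* in: repair this metadata */
#define SCRUB_OFLAG_CORRUPT	(1u << 1)	/* out: object needs repair */
#define SCRUB_OFLAG_PREEN	(1u << 2)	/* out: object could be optimized */

#define SCRUB_FLAGS_IN		(SCRUB_IFLAG_REPAIR)
#define SCRUB_FLAGS_OUT		(SCRUB_OFLAG_CORRUPT | SCRUB_OFLAG_PREEN)

/* Same field layout as the patch: 24 bytes of fields + 40 bytes of padding. */
struct scrub_metadata {
	uint32_t	sm_type;
	uint32_t	sm_flags;
	uint64_t	sm_ino;
	uint32_t	sm_gen;
	uint32_t	sm_agno;
	uint64_t	sm_reserved[5];
};
static_assert(sizeof(struct scrub_metadata) == 64, "ioctl struct must be 64 bytes");

/* Clear stale output flags, then reject unknown input flags and dirty padding. */
static int scrub_check_inputs(struct scrub_metadata *sm)
{
	size_t	i;

	sm->sm_flags &= ~SCRUB_FLAGS_OUT;
	if (sm->sm_flags & ~SCRUB_FLAGS_IN)
		return -EINVAL;
	for (i = 0; i < 5; i++)
		if (sm->sm_reserved[i] != 0)
			return -EINVAL;
	return 0;
}
```

With the direction in the name, the two masks read as plain lists of IFLAGs
and OFLAGs, so it is harder to put a new flag in the wrong one.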

> > +/* o: metadata object needs repair */
> > +#define XFS_SCRUB_FLAG_CORRUPT		(1 << 1)
> > +/* o: metadata object could be optimized */
> > +#define XFS_SCRUB_FLAG_PREEN		(1 << 2)
> 
> What does "could be optimised" mean?

Metadata that is not corrupt but could be improved anyway.  The primary
example of this is an inode that has the REFLINK flag set but no longer
shares any blocks.  The inode isn't corrupt, but we can speed up write
operations by clearing the flag.

> > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > new file mode 100644
> > index 0000000..6931793
> > --- /dev/null
> > +++ b/fs/xfs/scrub/common.c
> > @@ -0,0 +1,533 @@
> 
> Rather than "common.c", shouldn't this be named "scrub.c" to
> indicate it's the entry point/main infrastructure file? "common"
> usually indicates library/shared functions, not ioctl entry points..

Everything up to xfs_scrub_teardown is all common code.  I do however
see your point that the actual ioctl dispatching code could go in a
separate scrub/scrub.c file.

The huge top of file comment should also move over to scrub.c.

> [...]
> 
> > + * We use a bit of trickery with transactions to avoid buffer deadlocks
> > + * if there is a cycle in the metadata.  The basic problem is that
> > + * travelling down a btree involves locking the current buffer at each
> > + * tree level.  If a pointer should somehow point back to a buffer that
> > + * we've already examined, we will deadlock due to the second buffer
> > + * locking attempt.  Note however that grabbing a buffer in transaction
> > + * context links the locked buffer to the transaction.  If we try to
> > + * re-grab the buffer in the context of the same transaction, we avoid
> > + * the second lock attempt and continue.  Between the verifier and the
> > + * scrubber, something will notice that something is amiss and report
> > + * the corruption.  Therefore, each scrubber will allocate an empty
> > + * transaction, attach buffers to it, and cancel the transaction at the
> > + * end of the scrub run.  Cancelling a non-dirty transaction simply
> > + * unlocks the buffers.
> 
> This whole chunk of trickery should definitely be in its own patch...

<nod> Though, most of the machinery to implement that trickery was
already merged to fix another deadlock, so really the only new thing
here is writing about it in a comment.
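
To make the idea concrete, here is a toy userspace model of it.  Everything
in it (toy_buf, toy_trans, toy_walk) is invented for illustration and has
nothing to do with the real xfs_buf/xfs_trans code.  One deliberate
simplification: the toy walker treats a re-grab as proof of a cycle and stops,
whereas the real scrubbers carry on and rely on the verifiers and checks to
flag the corruption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HELD	8

struct toy_buf {
	int	id;
	bool	locked;
	int	next;	/* index of the next block; points backwards if corrupt */
};

struct toy_trans {
	struct toy_buf	*held[MAX_HELD];
	size_t		nheld;
};

/*
 * Grab a buffer in transaction context.  Returns true if the buffer was
 * newly locked, false if it was already attached to this transaction, in
 * which case the second lock attempt (the deadlock) is skipped.
 */
static bool toy_trans_get_buf(struct toy_trans *tp, struct toy_buf *bp)
{
	size_t	i;

	for (i = 0; i < tp->nheld; i++)
		if (tp->held[i] == bp)
			return false;
	assert(!bp->locked);		/* a real lock would block here */
	assert(tp->nheld < MAX_HELD);
	bp->locked = true;
	tp->held[tp->nheld++] = bp;
	return true;
}

/* Cancelling a clean transaction just unlocks everything attached to it. */
static void toy_trans_cancel(struct toy_trans *tp)
{
	while (tp->nheld > 0)
		tp->held[--tp->nheld]->locked = false;
}

/* Walk a chain of blocks; return the number visited, or -1 on a cycle. */
static int toy_walk(struct toy_trans *tp, struct toy_buf *bufs, int nbufs,
		    int start)
{
	int	visited = 0;
	int	cur = start;

	while (cur >= 0 && cur < nbufs) {
		if (!toy_trans_get_buf(tp, &bufs[cur]))
			return -1;	/* pointer leads back to a held buffer */
		visited++;
		cur = bufs[cur].next;
	}
	return visited;
}
```

Without the lookup in toy_trans_get_buf(), the walk over a chain whose last
block points back at the first would try to lock block 0 a second time and
never return.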

> > + * There are four pieces of data that scrub can communicate to
> > + * userspace.  The first is the error code (errno), which can be used to
> > + * communicate operational errors in performing the scrub.  There are
> > + * also three flags that can be set in the scrub context.  If the data
> > + * structure itself is corrupt, the CORRUPT flag will be set.  If
> > + * the metadata is correct but otherwise suboptimal, the PREEN flag
> > + * will be set.
> > + */
> > +
> > +struct xfs_scrub_meta_fns {
> > +	int		(*setup)(struct xfs_scrub_context *,
> > +				 struct xfs_inode *);
> > +	int		(*scrub)(struct xfs_scrub_context *);
> > +	bool		(*has)(struct xfs_sb *);
> > +};
> 
> What's this structure do? And why "fns" rather than "ops" as we
> normally call operation callout structures like this?

It's purely for function pointer dispatch, so _ops it is.
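
Roughly what the dispatch looks like with the rename, as a standalone
sketch.  The scrub_ctx struct, the setup_fs/scrub_test bodies and
scrub_dispatch() are simplified stand-ins made up for this example; only the
shape (a table of ops indexed by type, bounds-checked against TYPE_MAX) comes
from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct scrub_ctx {
	unsigned int	type;
	int		setup_done;
};

/* One entry per metadata type: how to set up, and how to scrub. */
struct scrub_meta_ops {
	int	(*setup)(struct scrub_ctx *sc);
	int	(*scrub)(struct scrub_ctx *sc);
};

static int setup_fs(struct scrub_ctx *sc)
{
	sc->setup_done = 1;
	return 0;
}

static int scrub_test(struct scrub_ctx *sc)
{
	return sc->setup_done ? 0 : -EINVAL;
}

#define SCRUB_TYPE_TEST	0
#define SCRUB_TYPE_MAX	0

static const struct scrub_meta_ops meta_scrub_ops[] = {
	[SCRUB_TYPE_TEST] = { .setup = setup_fs, .scrub = scrub_test },
};
static_assert(sizeof(meta_scrub_ops) / sizeof(meta_scrub_ops[0]) ==
	      SCRUB_TYPE_MAX + 1, "ops table must cover every type");

/* Look up the ops for the requested type and run setup, then scrub. */
static int scrub_dispatch(struct scrub_ctx *sc)
{
	const struct scrub_meta_ops	*ops;
	int				error;

	if (sc->type > SCRUB_TYPE_MAX)
		return -ENOENT;
	ops = &meta_scrub_ops[sc->type];
	error = ops->setup(sc);
	if (error)
		return error;
	return ops->scrub(sc);
}
```

The static_assert is an addition of this sketch; it catches the case where
TYPE_MAX is bumped without adding a table entry.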

> > +/* Check for operational errors. */
> > +bool
> > +xfs_scrub_op_ok(
> > +	struct xfs_scrub_context	*sc,
> > +	xfs_agnumber_t			agno,
> > +	xfs_agblock_t			bno,
> > +	const char			*type,
> > +	int				*error,
> > +	const char			*func,
> > +	int				line)
> > +{
> > +	struct xfs_mount		*mp = sc->mp;
> > +
> > +	switch (*error) {
> > +	case 0:
> > +		return true;
> > +	case -EDEADLOCK:
> > +		/* Used to restart an op with deadlock avoidance. */
> > +		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
> > +		break;
> > +	case -EFSBADCRC:
> > +	case -EFSCORRUPTED:
> > +		/* Note the badness but don't abort. */
> > +		sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
> > +		*error = 0;
> > +		/* fall through */
> > +	default:
> > +		trace_xfs_scrub_op_error(mp, agno, bno, type, *error, func,
> > +				line);
> > +		break;
> > +	}
> > +	return false;
> > +}
> 
> These look like boilerplate functions that aren't used yet, which
> makes it harder to see all the actual ioctl code that is supposed
> to be introduced in this patch. Again, I'm not sure how they are
> supposed to be used, so I can't actually review this code yet....

Originally all these bits were scattered into the actual "xfs: scrub
XXXX" patches that first used them, but this bloated those patches up.
Seeing as these helper functions are mostly variations on a theme ("do X
for some block", "do X for an inode", "do X for an inode fork block"), I
can separate them into separate pieces:

xfs_scrub: helpers to deal with operational errors during scrub
xfs_scrub: helpers to record metadata corruption
etc.

> Also, it looks like we're passing func/line to tracing functions,
> which further implies wrapper macros to insert them. We've avoided
> this with all the other tracing functions by passing
> __return_address and/or _THIS_IP_ to the tracing function. Doing so
> in all this boilerplate checking code gets rid of the macros.

Good point.

Do you know why the ftrace calls use _RET_IP_ but the rest of xfs uses
__return_address?  They're the same except that _RET_IP_ casts to
unsigned long.

> > +/* Dummy scrubber */
> > +
> > +int
> > +xfs_scrub_dummy(
> > +	struct xfs_scrub_context	*sc)
> > +{
> > +	if (sc->sm->sm_ino || sc->sm->sm_agno)
> > +		return -EINVAL;
> > +	if (sc->sm->sm_gen & XFS_SCRUB_FLAG_CORRUPT)
> > +		sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
> > +	if (sc->sm->sm_gen & XFS_SCRUB_FLAG_PREEN)
> > +		sc->sm->sm_flags |= XFS_SCRUB_FLAG_PREEN;
> > +	if (sc->sm->sm_gen & XFS_SCRUB_FLAG_XFAIL)
> > +		sc->sm->sm_flags |= XFS_SCRUB_FLAG_XFAIL;
> > +	if (sc->sm->sm_gen & XFS_SCRUB_FLAG_XCORRUPT)
> > +		sc->sm->sm_flags |= XFS_SCRUB_FLAG_XCORRUPT;
> > +	if (sc->sm->sm_gen & ~XFS_SCRUB_FLAGS_OUT)
> > +		return -ENOENT;
> > +
> > +	return 0;
> > +}
> 
> What's the purpose of the dummy? Does it get removed later?

xfstests uses it to figure out if scrub and repair are supported for a
given mount.  It probably ought to be called xfs_scrub_test since it's
not really a /dummy/ anymore.

> > +/* Per-scrubber setup functions */
> > +
> > +/* Set us up with a transaction and an empty context. */
> > +int
> > +xfs_scrub_setup_fs(
> > +	struct xfs_scrub_context	*sc,
> > +	struct xfs_inode		*ip)
> > +{
> > +	return xfs_scrub_trans_alloc(sc->sm, sc->mp,
> > +			&M_RES(sc->mp)->tr_itruncate, 0, 0, 0, &sc->tp);
> > +}
> 
> Why are you using a truncate reservation for this transaction?

When I get to the online repair patches, we'll want the biggest rolling
transaction we can get our hands on, which is tr_itruncate.

> Better question: what initial conditions is a setup function
> supposed to create for (I'm guessing here) a scrubber function to
> run?

Allocate any memory we might need (sc->buf gets added in the xattr
scrubber), create dummy transaction, lock whatever metadata we're
evaluating (AG headers, inode, etc.).

> > +
> > +/* Scrub setup and teardown */
> > +
> > +/* Free all the resources and finish the transactions. */
> > +STATIC int
> > +xfs_scrub_teardown(
> > +	struct xfs_scrub_context	*sc,
> > +	int				error)
> > +{
> > +	if (sc->tp) {
> > +		xfs_trans_cancel(sc->tp);
> > +		sc->tp = NULL;
> > +	}
> > +	return error;
> > +}
> 
> So we have scrub function specific setup, but only a global
> teardown? Does that mean scrubber functions are not supposed to
> allocate any memory for keeping state and cross references? i.e.
> they can only store what they can keep attached to a transaction
> handle?

We'll add a 'void *buf' pointer in the xattr scrubber for allocating a
big chunk of memory, attaching it to the scrub context, and cleaning it
up during teardown.
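
In other words, something along these lines.  This is a userspace sketch
with malloc/free standing in for the kernel allocators, and the function and
field names are placeholders rather than what the xattr patch actually uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct scrub_ctx {
	void	*tp;	/* stands in for the empty transaction */
	void	*buf;	/* scratch memory owned by the scrubber */
};

static int toy_transaction;	/* dummy object to point sc->tp at */

/* Per-scrubber setup: allocate scratch memory and grab a "transaction". */
static int scrub_setup_with_buf(struct scrub_ctx *sc, size_t bufsize)
{
	sc->buf = malloc(bufsize);
	if (!sc->buf)
		return -ENOMEM;
	sc->tp = &toy_transaction;
	return 0;
}

/*
 * Global teardown: release whatever any setup function may have attached
 * to the context, and pass the scrub's error code through unchanged.
 */
static int scrub_teardown(struct scrub_ctx *sc, int error)
{
	if (sc->tp)
		sc->tp = NULL;	/* kernel code would cancel the transaction */
	free(sc->buf);		/* free(NULL) is a no-op */
	sc->buf = NULL;
	return error;
}
```

Because teardown only looks at fields of the context, one global teardown
can serve every scrubber, including those that never set sc->buf.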

> > +
> > +/* Perform common scrub context initialization. */
> > +STATIC int
> > +xfs_scrub_setup(
> > +	struct xfs_inode		*ip,
> > +	struct xfs_scrub_context	*sc,
> > +	const struct xfs_scrub_meta_fns	*fns,
> > +	struct xfs_scrub_metadata	*sm,
> > +	bool				try_harder)
> > +{
> > +	memset(sc, 0, sizeof(*sc));
> > +	sc->mp = ip->i_mount;
> > +	sc->sm = sm;
> > +	sc->fns = fns;
> > +	sc->try_harder = try_harder;
> > +
> > +	return sc->fns->setup(sc, ip);
> > +}
> 
> Does this really need a wrapper function? It means main function is
> somewhat convoluted....

Probably not.  This function used to be much longer.

> > +
> > +/* Scrubbing dispatch. */
> > +
> > +static const struct xfs_scrub_meta_fns meta_scrub_fns[] = {
> > +	{ /* dummy verifier */
> > +		.setup	= xfs_scrub_setup_fs,
> > +		.scrub	= xfs_scrub_dummy,
> > +	},
> > +};
> > +
> > +/* Dispatch metadata scrubbing. */
> > +int
> > +xfs_scrub_metadata(
> > +	struct xfs_inode		*ip,
> > +	struct xfs_scrub_metadata	*sm)
> > +{
> > +	struct xfs_scrub_context	sc;
> > +	struct xfs_mount		*mp = ip->i_mount;
> > +	const struct xfs_scrub_meta_fns	*fns;
> > +	bool				try_harder = false;
> > +	int				error = 0;
> > +
> > +	trace_xfs_scrub(ip, sm, error);
> 
> 	memset(&sc, 0, sizeof(sc));
> 	sc.mp = ip->i_mount;
> 	sc.sm = sm;
> 	sc.try_harder = false;
> 
> And we reference everything thru the scrub context from here on.
> I find it a bit confusing to hide sm inside sc, and everywhere else
> references sc.sm, yet later on in this function we make the
> assumption that the structure pointed to by sm is the one that the
> scrubber is actually modifying when it is running. e.g. the
> xfs_scrub_found_corruption() call at the end. Better to make it
> clear all the code is working on the same structure...

Ok, good point.

> > +
> > +	/* Forbidden if we are shut down or mounted norecovery. */
> > +	error = -ESHUTDOWN;
> > +	if (XFS_FORCED_SHUTDOWN(mp))
> > +		goto out;
> 
> Shutdown conditions should return EFSCORRUPTED.

The reason why I picked ESHUTDOWN (and not EFSCORRUPTED) was so that I
could capture EFSCORRUPTEDs accidentally escaping from the kernel code.

> > +	error = -ENOTRECOVERABLE;
> > +	if (mp->m_flags & XFS_MOUNT_NORECOVERY)
> > +		goto out;
> 
> Same for read only mounts, yes?
> 
> Or do we allow scrub on read only, but not repair or whatever
> "optimisation" is?

Correct.

> > +	/* Check our inputs. */
> > +	error = -EINVAL;
> > +	sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT;
> > +	if (sm->sm_flags & ~XFS_SCRUB_FLAGS_IN)
> > +		goto out;
> > +	if (memchr_inv(sm->sm_reserved, 0, sizeof(sm->sm_reserved)))
> > +		goto out;
> > +
> > +	/* Do we know about this type of metadata? */
> > +	error = -ENOENT;
> > +	if (sm->sm_type > XFS_SCRUB_TYPE_MAX)
> > +		goto out;
> > +	fns = &meta_scrub_fns[sm->sm_type];
> 
> 	sc.fns = &meta_scrub_fns[sm->sm_type];
> 
> > new file mode 100644
> > index 0000000..4f3113a
> > --- /dev/null
> > +++ b/fs/xfs/scrub/common.h
> 
> scrub.h
> 
> > @@ -0,0 +1,179 @@
> > +/*
> > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > + *
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation,
> > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > + */
> > +#ifndef __XFS_REPAIR_COMMON_H__
> > +#define __XFS_REPAIR_COMMON_H__
> > +
> > +/* Did we find something broken? */
> > +static inline bool xfs_scrub_found_corruption(struct xfs_scrub_metadata *sm)
> > +{
> > +	return sm->sm_flags & (XFS_SCRUB_FLAG_CORRUPT |
> > +			       XFS_SCRUB_FLAG_XCORRUPT);
> > +}
> 
> Unless this is going to get more complex, a single call wrapper is
> not necessary.
> 
> > +/*
> > + * Grab a transaction.  If we're going to repair something, we need to
> > + * ensure there's enough reservation to make all the changes.  If not,
> > + * we can use an empty transaction.
> > + */
> > +static inline int
> > +xfs_scrub_trans_alloc(
> > +	struct xfs_scrub_metadata	*sm,
> > +	struct xfs_mount		*mp,
> > +	struct xfs_trans_res		*resp,
> > +	uint				blocks,
> > +	uint				rtextents,
> > +	uint				flags,
> > +	struct xfs_trans		**tpp)
> > +{
> > +	return xfs_trans_alloc_empty(mp, tpp);
> > +}
> 
> If we only ever use an empty transaction here, then can we get rid
> of the wrapper, too?

This eventually becomes:

if (xfs_scrub_must_repair(sm))
	return xfs_trans_alloc(mp, resp, blocks, rtextents, flags, tpp);

return xfs_trans_alloc_empty(mp, tpp);

> > +
> > +/* Check for operational errors. */
> > +bool xfs_scrub_op_ok(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
> > +		     xfs_agblock_t bno, const char *type, int *error,
> > +		     const char	*func, int line);
> > +#define XFS_SCRUB_OP_ERROR_GOTO(sc, agno, bno, type, error, label) \
> > +	do { \
> > +		if (!xfs_scrub_op_ok((sc), (agno), (bno), (type), \
> > +				(error), __func__, __LINE__)) \
> > +			goto label; \
> > +	} while (0)
> 
> Ok, I thought this is where the func/line variables were going. Can we
> please try to avoid these macros? It's not much extra work to do
> this:
> 
> 	if (!xfs_scrub_op_ok(sc, agno, bno, type, error, _THIS_IP_))
> 		goto label;
> 
> But it's much nicer to read than shouty macros, it doesn't hide
> goto's in macros, and it provides exactly the same debug info to the
> tracing code as the macro.

Ok.  We ought to be able to reference __return_address from inside
xfs_scrub_op_ok directly, right?  In which case we don't even need the
ugly _THIS_IP_ hanging off the end of the argument list?
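
A minimal userspace sketch of that idea, assuming GCC or clang (for
__builtin_return_address and the noinline attribute).  demo_op_ok and
last_caller are hypothetical names, not the scrub code:

```c
#include <stddef.h>

/* Address recorded by the most recent failed check, NULL if none. */
static void *last_caller;

/*
 * The helper must not be inlined, otherwise the "return address" is that
 * of whoever called the function it was inlined into.
 */
__attribute__((noinline))
static int demo_op_ok(int error)
{
	if (error == 0)
		return 1;
	/* Level 0 is the instruction following the call in our caller. */
	last_caller = __builtin_return_address(0);
	return 0;
}
```

One caveat with this approach: the address identifies the immediate
caller only, so a check made through an intermediate helper reports the
helper rather than the original call site.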

> [snip more macros]
> 
> > +/* Setup functions */
> > +
> > +#define SETUP_FN(name) int name(struct xfs_scrub_context *sc, struct xfs_inode *ip)
> > +SETUP_FN(xfs_scrub_setup_fs);
> > +#undef SETUP_FN
> 
> Please, no. This sort of construct is highly unfriendly to grep and
> cscope. It costs us nothing extra to define the names in full, but
> it makes finding the code so much easier...

ok.

> > --- /dev/null
> > +++ b/fs/xfs/scrub/xfs_scrub.h
> > @@ -0,0 +1,29 @@
> > +/*
> > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > + *
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation,
> > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > + */
> > +#ifndef __XFS_SCRUB_H__
> > +#define __XFS_SCRUB_H__
> > +
> > +#ifndef CONFIG_XFS_ONLINE_SCRUB
> > +# define xfs_scrub_metadata(ip, sm)	(-ENOTTY)
> > +#else
> > +int xfs_scrub_metadata(struct xfs_inode *ip, struct xfs_scrub_metadata *sm);
> > +#endif /* CONFIG_XFS_ONLINE_SCRUB */
> > +
> > +#endif	/* __XFS_SCRUB_H__ */
> 
> > This ioctl module stub code should be separated out into its own
> patch with all the other code that enables building scrub as a
> module.

Optional feature, not a separate module.

Though.

Maybe I should send along my RFCRAP patch that actually makes scrub its
own module?  Though I consider the EXPORT_SYMBOL_GPL(xfs...) to be
rather gross.

> ...
> > index 2e7e193..d4de29b 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -3312,7 +3312,7 @@ DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
> >  
> >  /* scrub */
> >  #define XFS_SCRUB_TYPE_DESC \
> > -	{ 0, NULL }
> > +	{ XFS_SCRUB_TYPE_TEST,		"dummy" }
> >  DECLARE_EVENT_CLASS(xfs_scrub_class,
> >  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
> >  		 int error),
> > @@ -3330,6 +3330,11 @@ DECLARE_EVENT_CLASS(xfs_scrub_class,
> >  	TP_fast_assign(
> >  		__entry->dev = ip->i_mount->m_super->s_dev;
> >  		__entry->ino = ip->i_ino;
> > +		__entry->type = sm->sm_type;
> > +		__entry->agno = sm->sm_agno;
> > +		__entry->inum = sm->sm_ino;
> > +		__entry->gen = sm->sm_gen;
> > +		__entry->flags = sm->sm_flags;
> >  		__entry->error = error;
> >  	),
> >  	TP_printk("dev %d:%d ino %llu type %s agno %u inum %llu gen %u flags 0x%x error %d",
> 
> So the tracepoints in the previous patch are dependent on structures
> > that have not yet been defined? Doesn't that break compilation?

No, because we only start to dereference the pointer from here on.
There's a "struct xfs_scrub_metadata;" at the top to declare the type,
but if there's no deref then the compiler doesn't care.
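
A small standalone sketch of the language rule being relied on here.
The demo_* names are hypothetical; the point is only that a pointer to
an incomplete type can be passed around and stored, and that the full
definition is needed only once a member is accessed:

```c
#include <stddef.h>

struct demo_metadata;		/* incomplete type: size and members unknown */

static const struct demo_metadata *demo_last_seen;

/* Legal with only the forward declaration: the pointer is copied, not used. */
static int demo_trace(const struct demo_metadata *sm, int error)
{
	demo_last_seen = sm;
	return error;
}

/* The full definition can arrive later (here: further down the file). */
struct demo_metadata {
	unsigned int	sm_type;
	unsigned int	sm_flags;
};

/* From this point on, dereferencing is allowed. */
static unsigned int demo_trace_type(const struct demo_metadata *sm)
{
	return sm->sm_type;
}
```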

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH 04/22] xfs: generic functions to scrub metadata and btrees
  2017-07-24  1:05   ` Dave Chinner
@ 2017-07-24 21:58     ` Darrick J. Wong
  2017-07-24 23:15       ` Dave Chinner
  0 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-24 21:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Jul 24, 2017 at 11:05:49AM +1000, Dave Chinner wrote:
> On Thu, Jul 20, 2017 at 09:38:54PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create a function that walks a btree, checking the integrity of each
> > btree block (headers, keys, records) and calling back to the caller
> > to perform further checks on the records.  Add some helper functions
> > so that we report detailed scrub errors in a uniform manner in dmesg.
> > These are helper functions for subsequent patches.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/Makefile       |    1 
> >  fs/xfs/scrub/btree.c  |  658 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/scrub/btree.h  |   95 +++++++
> >  fs/xfs/scrub/common.c |  169 +++++++++++++
> >  fs/xfs/scrub/common.h |   30 ++
> >  5 files changed, 953 insertions(+)
> >  create mode 100644 fs/xfs/scrub/btree.c
> >  create mode 100644 fs/xfs/scrub/btree.h
> .....
> > +/* btree scrubbing */
> > +
> > +const char * const btree_types[] = {
> > +	[XFS_BTNUM_BNO]		= "bnobt",
> > +	[XFS_BTNUM_CNT]		= "cntbt",
> > +	[XFS_BTNUM_RMAP]	= "rmapbt",
> > +	[XFS_BTNUM_BMAP]	= "bmapbt",
> > +	[XFS_BTNUM_INO]		= "inobt",
> > +	[XFS_BTNUM_FINO]	= "finobt",
> > +	[XFS_BTNUM_REFC]	= "refcountbt",
> > +};
> 
> > Don't we already have that defined somewhere?

No...?  I can't find anything.

> > +
> > +/* Format the trace parameters for the tree cursor. */
> > +static inline void
> > +xfs_scrub_btree_format(
> > +	struct xfs_btree_cur		*cur,
> > +	int				level,
> > +	char				*bt_type,
> > +	size_t				type_len,
> > +	char				*bt_ptr,
> > +	size_t				ptr_len,
> > +	xfs_fsblock_t			*fsbno)
> > +{
> > +	char				*type = NULL;
> > +	struct xfs_btree_block		*block;
> > +	struct xfs_buf			*bp;
> 
> hmmm - complex text formatting just for trace point output,
> which are rarely going to be used in production? Not sure how I feel
> about that yet.

I haven't decided if a future version of xfs_scrub will want to capture
check error details from the kernel via ftrace or something.

However, as this function is only called from two places I could just
replace all the string handling bits in favor of directly tracing on
XFS_BTNUM_*/XFS_*_FORK/inode numbers via different tracepoints.

> Also, the function is way too big for being an inline function. I'd
> be tempted to mark it noinline so the compiler doesn't blow out the
> size of the code unnecessarily with automatic inlining of static
> functions.

ok.

> (I haven't reviewed the formatting for sanity).
> 
> > +/* Check for btree corruption. */
> > +bool
> > +xfs_scrub_btree_ok(
> > +	struct xfs_scrub_context	*sc,
> > +	struct xfs_btree_cur		*cur,
> > +	int				level,
> > +	bool				fs_ok,
> > +	const char			*check,
> > +	const char			*func,
> > +	int				line)
> > +{
> > +	char				bt_ptr[24];
> > +	char				bt_type[48];
> > +	xfs_fsblock_t			fsbno;
> > +
> > +	if (fs_ok)
> > +		return fs_ok;
> > +
> > +	sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
> > +	xfs_scrub_btree_format(cur, level, bt_type, 48, bt_ptr, 24, &fsbno);
> 
> Ok, magic numbers for buffer lengths. Please use #defines for these
> with an explanation of why the chosen lengths are sufficient for the
> information they'll be used to hold.
> 
> > +	trace_xfs_scrub_btree_error(cur->bc_mp, bt_type, bt_ptr,
> > +			XFS_FSB_TO_AGNO(cur->bc_mp, fsbno),
> > +			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno),
> > +			check, func, line);
> 
> hmmmm - tracepoints are conditional, but the formatting call isn't.
> Can this formatting be called/run from inside the tracepoint code
> itself?

I don't know of a way to find out if a given set of tracepoint(s) are
enabled.  Fortunately the formatting call only happens if corruption is
detected.

> > +
> > +/*
> > + * Make sure this record is in order and doesn't stray outside of the parent
> > + * keys.
> > + */
> > +STATIC int
> > +xfs_scrub_btree_rec(
> > +	struct xfs_scrub_btree	*bs)
> > +{
> > +	struct xfs_btree_cur	*cur = bs->cur;
> > +	union xfs_btree_rec	*rec;
> > +	union xfs_btree_key	key;
> > +	union xfs_btree_key	hkey;
> > +	union xfs_btree_key	*keyp;
> > +	struct xfs_btree_block	*block;
> > +	struct xfs_btree_block	*keyblock;
> > +	struct xfs_buf		*bp;
> > +
> > +	block = xfs_btree_get_block(cur, 0, &bp);
> > +	rec = xfs_btree_rec_addr(cur, cur->bc_ptrs[0], block);
> > +
> > +	if (bp)
> > +		trace_xfs_scrub_btree_rec(cur->bc_mp,
> > +				XFS_FSB_TO_AGNO(cur->bc_mp,
> > +					XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn)),
> > +				XFS_FSB_TO_AGBNO(cur->bc_mp,
> > +					XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn)),
> > +				cur->bc_btnum, 0, cur->bc_nlevels,
> > +				cur->bc_ptrs[0]);
> > +	else if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
> > +		trace_xfs_scrub_btree_rec(cur->bc_mp,
> > +				XFS_INO_TO_AGNO(cur->bc_mp,
> > +					cur->bc_private.b.ip->i_ino),
> > +				XFS_INO_TO_AGBNO(cur->bc_mp,
> > +					cur->bc_private.b.ip->i_ino),
> > +				cur->bc_btnum, 0, cur->bc_nlevels,
> > +				cur->bc_ptrs[0]);
> > +	else
> > +		trace_xfs_scrub_btree_rec(cur->bc_mp,
> > +				NULLAGNUMBER, NULLAGBLOCK,
> > +				cur->bc_btnum, 0, cur->bc_nlevels,
> > +				cur->bc_ptrs[0]);
> 
> Hmmm - there's more code in the trace calls than there is in the
> scrubbing code. Is this really all necessary? I can see code
> getting changed in future but not the tracepoints, similar to how
> comment updates get missed...

I've found it useful when analyzing the scrub-fuzz xfstests to be able
to pinpoint exactly what record in a btree hit some bug or other.

> > +	/* If this isn't the first record, are they in order? */
> > +	XFS_SCRUB_BTREC_CHECK(bs, bs->firstrec ||
> > +			cur->bc_ops->recs_inorder(cur, &bs->lastrec, rec));
> 
> So, I go look at the macro:
> 
> > #define XFS_SCRUB_BTREC_CHECK(bs, fs_ok) \
> 	xfs_scrub_btree_ok((bs)->sc, (bs)->cur, 0, (fs_ok), #fs_ok, \
> 			   __func__, __LINE__)
> 
> I find this:
> 
> 	/* If this isn't the first record, are they in order? */
> 	if (!(bs->firstrec &&
> 	     cur->bc_ops->recs_inorder(cur, &bs->lastrec, rec)))
> 		xfs_scrub_btree_error(bs->sc, cur, 0, "Record order", _THIS_IP_)
> 
> A lot easier to read, understand and maintain because I don't have
> > to go look at a macro to find out what it actually does and what happens
> if the records aren't in order....
> 
> > +/* Check a btree pointer. */
> > +static int
> > +xfs_scrub_btree_ptr(
> > +	struct xfs_scrub_btree		*bs,
> > +	int				level,
> > +	union xfs_btree_ptr		*ptr)
> > +{
> > +	struct xfs_btree_cur		*cur = bs->cur;
> > +	xfs_daddr_t			daddr;
> > +	xfs_daddr_t			eofs;
> > +
> > +	if ((cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) &&
> > +			level == cur->bc_nlevels) {
> > +		if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
> > +			XFS_SCRUB_BTKEY_GOTO(bs, level, ptr->l == 0, corrupt);
> > +		} else {
> > +			XFS_SCRUB_BTKEY_GOTO(bs, level, ptr->s == 0, corrupt);
> > +		}
> > +		return 0;
> > +	}
> > +
> > +	if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
> > +		XFS_SCRUB_BTKEY_GOTO(bs, level,
> > +				ptr->l != cpu_to_be64(NULLFSBLOCK), corrupt);
> > +
> > +		daddr = XFS_FSB_TO_DADDR(cur->bc_mp, be64_to_cpu(ptr->l));
> > +	} else {
> > +		XFS_SCRUB_BTKEY_GOTO(bs, level,
> > +				cur->bc_private.a.agno != NULLAGNUMBER, corrupt);
> > +		XFS_SCRUB_BTKEY_GOTO(bs, level,
> > +				ptr->s != cpu_to_be32(NULLAGBLOCK), corrupt);
> > +
> 
> Need to check the ptr points to an agbno within the AG size.
> 
> Also:
> 	why no tracing on ptr values?
> 	check the ptr points to an agbno within the AG size.
> 	check the ptr points to an agno within agcount
> 
> 
> > +		daddr = XFS_AGB_TO_DADDR(cur->bc_mp, cur->bc_private.a.agno,
> > +				be32_to_cpu(ptr->s));
> > +	}
> > +	eofs = XFS_FSB_TO_BB(cur->bc_mp, cur->bc_mp->m_sb.sb_dblocks);
> > +	XFS_SCRUB_BTKEY_GOTO(bs, level, daddr != 0, corrupt);
> > +	XFS_SCRUB_BTKEY_GOTO(bs, level, daddr < eofs, corrupt);
> > +
> > +	return 0;
> > +
> > +corrupt:
> > +	return -EFSCORRUPTED;
> > +}
> > +
> > +/* Check the siblings of a large format btree block. */
> > +STATIC int
> > +xfs_scrub_btree_lblock_check_siblings(
> > +	struct xfs_scrub_btree		*bs,
> > +	struct xfs_btree_block		*block)
> > +{
> > +	struct xfs_btree_block		*pblock;
> > +	struct xfs_buf			*pbp;
> > +	struct xfs_btree_cur		*ncur = NULL;
> > +	union xfs_btree_ptr		*pp;
> > +	xfs_fsblock_t			leftsib;
> > +	xfs_fsblock_t			rightsib;
> > +	xfs_fsblock_t			fsbno;
> > +	int				level;
> > +	int				success;
> > +	int				error = 0;
> > +
> > +	leftsib = be64_to_cpu(block->bb_u.l.bb_leftsib);
> > +	rightsib = be64_to_cpu(block->bb_u.l.bb_rightsib);
> > +	level = xfs_btree_get_level(block);
> > +
> > +	/* Root block should never have siblings. */
> > +	if (level == bs->cur->bc_nlevels - 1) {
> > +		XFS_SCRUB_BTKEY_CHECK(bs, level, leftsib == NULLFSBLOCK);
> > +		XFS_SCRUB_BTKEY_CHECK(bs, level, rightsib == NULLFSBLOCK);
> > +		return error;
> > +	}
> 
> This is where the macros force us into silly patterns and blow out
> the code size.
> 
> 	if (level == bs->cur->bc_nlevels - 1 &&
> 	    (leftsib != NULLFSBLOCK || rightsib != NULLFSBLOCK) {
> 		/* error trace call */
> 		return error;
> 	}

We're also losing information here.  With the previous code you can tell
between leftsib and rightsib which one is corrupt, whereas with the
suggested replacement you can only tell that at least one of them is
broken.

I want to avoid the situation where scrub flags a corruption, the user
runs ftrace to give us __return_address, and we go looking for that only
to find that it points to a line of code that tests two different
fields.
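
To spell out the information loss, here is a minimal sketch comparing
the two shapes.  The demo_* names and DEMO_* constants are hypothetical
stand-ins (DEMO_NULLBLOCK plays the role of NULLFSBLOCK):

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NULLBLOCK	UINT64_MAX	/* stand-in for NULLFSBLOCK */
#define DEMO_BAD_LEFT	(1u << 0)
#define DEMO_BAD_RIGHT	(1u << 1)

/* One check per field: the result says which sibling pointer is bad. */
static unsigned int demo_root_check_split(uint64_t leftsib, uint64_t rightsib)
{
	unsigned int bad = 0;

	if (leftsib != DEMO_NULLBLOCK)
		bad |= DEMO_BAD_LEFT;
	if (rightsib != DEMO_NULLBLOCK)
		bad |= DEMO_BAD_RIGHT;
	return bad;
}

/* Combined check: only says that at least one of the two is bad. */
static bool demo_root_check_combined(uint64_t leftsib, uint64_t rightsib)
{
	return leftsib != DEMO_NULLBLOCK || rightsib != DEMO_NULLBLOCK;
}
```

With a bad right sibling only, the split version returns DEMO_BAD_RIGHT
while the combined version returns the same "true" it would for a bad
left sibling.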

> > +	/* Does the left sibling match the parent level left block? */
> > +	if (leftsib != NULLFSBLOCK) {
> > +		error = xfs_btree_dup_cursor(bs->cur, &ncur);
> > +		if (error)
> > +			return error;
> > +		error = xfs_btree_decrement(ncur, level + 1, &success);
> > +		XFS_SCRUB_BTKEY_OP_ERROR_GOTO(bs, level + 1, &error, out_cur);
> 
> Hmmm - if I read that right, there's a goto out_cur on error hidden
> in this macro....

No more hidden than XFS_WANT_CORRUPTED_GOTO. :)

But ok, I can rework these as:

if (!xfs_scrub_btree_op_ok(bs->sc, bs->cur, level + 1, &error))
	goto out_cur;

> > +		XFS_SCRUB_BTKEY_GOTO(bs, level, success, out_cur);
> > +
> > +		pblock = xfs_btree_get_block(ncur, level + 1, &pbp);
> > +		pp = xfs_btree_ptr_addr(ncur, ncur->bc_ptrs[level + 1], pblock);
> > +		if (!xfs_scrub_btree_ptr(bs, level + 1, pp)) {
> > +			fsbno = be64_to_cpu(pp->l);
> > +			XFS_SCRUB_BTKEY_CHECK(bs, level, fsbno == leftsib);
> > +		}
> > +
> > +		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
> > +		ncur = NULL;
> > +	}
> > +
> > +	/* Does the right sibling match the parent level right block? */
> > +	if (!error && rightsib != NULLFSBLOCK) {
> 
> So when would error ever be non-zero here?
> 
> This is one of the reasons I really don't like all the macros in
> this code - it unnecessarily obfuscates the checks being done and
> the code flow....

It's unnecessary, you're right.

> > +/* Check the siblings of a small format btree block. */
> > +STATIC int
> > +xfs_scrub_btree_sblock_check_siblings(
> > +	struct xfs_scrub_btree		*bs,
> > +	struct xfs_btree_block		*block)
> > +{
> > +	struct xfs_btree_block		*pblock;
> > +	struct xfs_buf			*pbp;
> > +	struct xfs_btree_cur		*ncur = NULL;
> > +	union xfs_btree_ptr		*pp;
> > +	xfs_agblock_t			leftsib;
> > +	xfs_agblock_t			rightsib;
> > +	xfs_agblock_t			agbno;
> > +	int				level;
> > +	int				success;
> > +	int				error = 0;
> > +
> > +	leftsib = be32_to_cpu(block->bb_u.s.bb_leftsib);
> > +	rightsib = be32_to_cpu(block->bb_u.s.bb_rightsib);
> > +	level = xfs_btree_get_level(block);
> > +
> > +	/* Root block should never have siblings. */
> > +	if (level == bs->cur->bc_nlevels - 1) {
> > +		XFS_SCRUB_BTKEY_CHECK(bs, level, leftsib == NULLAGBLOCK);
> > +		XFS_SCRUB_BTKEY_CHECK(bs, level, rightsib == NULLAGBLOCK);
> > +		return error;
> > +	}
> > +
> > +	/* Does the left sibling match the parent level left block? */
> > +	if (leftsib != NULLAGBLOCK) {
> > +		error = xfs_btree_dup_cursor(bs->cur, &ncur);
> > +		if (error)
> > +			return error;
> > +		error = xfs_btree_decrement(ncur, level + 1, &success);
> > +		XFS_SCRUB_BTKEY_OP_ERROR_GOTO(bs, level + 1, &error, out_cur);
> > +		XFS_SCRUB_BTKEY_GOTO(bs, level, success, verify_rightsib);
> 
> Why is this different to the lblock checks?
> 
> FWIW, this is one of the reasons for abstracting the sblock/lblock
> code as much as possible - the core operations should be identical,
> so apart from header decode/encode operations, the rest of the code
> should be shared...

Yeah.

> > +
> > +		pblock = xfs_btree_get_block(ncur, level + 1, &pbp);
> > +		pp = xfs_btree_ptr_addr(ncur, ncur->bc_ptrs[level + 1], pblock);
> > +		if (!xfs_scrub_btree_ptr(bs, level + 1, pp)) {
> > +			agbno = be32_to_cpu(pp->s);
> > +			XFS_SCRUB_BTKEY_CHECK(bs, level, agbno == leftsib);
> > +		}
> > +
> > +		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
> > +		ncur = NULL;
> > +	}
> > +
> > +verify_rightsib:
> > +	if (ncur) {
> > +		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
> > +		ncur = NULL;
> > +	}
> > +
> > +	/* Does the right sibling match the parent level right block? */
> > +	if (rightsib != NULLAGBLOCK) {
> 
> > No "if (!error ...)" check here - I'm thinking there's some factoring
> needed here to reduce the code duplication going on here...

Ok, will do.  These two functions can be rewritten as a single function
that uses cur->bc_ops->diff_two_keys.  Then we discard
bs->check_siblings_fn.
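
One possible shape for the merged function, sketched in userspace along
the lines of the "get sibling from block" idea rather than the key
comparison route.  All demo_* names are hypothetical, and endian
conversion is deliberately left out:

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NULL64	UINT64_MAX	/* stand-in for NULLFSBLOCK */
#define DEMO_NULL32	UINT32_MAX	/* stand-in for NULLAGBLOCK */

struct demo_block {
	bool	long_ptrs;	/* models XFS_BTREE_LONG_PTRS on the cursor */
	union {
		struct { uint64_t leftsib, rightsib; } l;
		struct { uint32_t leftsib, rightsib; } s;
	} u;
};

/* Widen a short pointer, mapping the 32-bit null onto the 64-bit null. */
static uint64_t demo_widen(uint32_t agbno)
{
	return agbno == DEMO_NULL32 ? DEMO_NULL64 : agbno;
}

/* The only format-specific part: decode the sibling pointers. */
static void demo_get_siblings(const struct demo_block *b,
			      uint64_t *left, uint64_t *right)
{
	if (b->long_ptrs) {
		*left = b->u.l.leftsib;
		*right = b->u.l.rightsib;
	} else {
		*left = demo_widen(b->u.s.leftsib);
		*right = demo_widen(b->u.s.rightsib);
	}
}

/* Shared check: a root block must have no siblings.  Returns 0 if clean. */
static int demo_check_root_siblings(const struct demo_block *b)
{
	uint64_t left, right;

	demo_get_siblings(b, &left, &right);
	return (left == DEMO_NULL64 && right == DEMO_NULL64) ? 0 : -1;
}
```

Everything after the decode step is then written once, so the lblock
and sblock paths cannot drift apart the way the goto targets did above.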

> > +/*
> > + * Visit all nodes and leaves of a btree.  Check that all pointers and
> > + * records are in order, that the keys reflect the records, and use a callback
> > + * so that the caller can verify individual records.  The callback is the same
> > + * as the one for xfs_btree_query_range, so therefore this function also
> > + * returns XFS_BTREE_QUERY_RANGE_ABORT, zero, or a negative error code.
> > + */
> > +int
> > +xfs_scrub_btree(
> > +	struct xfs_scrub_context	*sc,
> > +	struct xfs_btree_cur		*cur,
> > +	xfs_scrub_btree_rec_fn		scrub_fn,
> > +	struct xfs_owner_info		*oinfo,
> > +	void				*private)
> > +{
> > +	struct xfs_scrub_btree		bs = {0};
> > +	union xfs_btree_ptr		ptr;
> > +	union xfs_btree_ptr		*pp;
> > +	union xfs_btree_rec		*recp;
> > +	struct xfs_btree_block		*block;
> > +	int				level;
> > +	struct xfs_buf			*bp;
> > +	int				i;
> > +	int				error = 0;
> > +
> > +	/* Finish filling out the scrub state */
> 
> 	/* Initialise the scrub state */
> 
> > +	bs.cur = cur;
> > +	bs.scrub_rec = scrub_fn;
> > +	bs.oinfo = oinfo;
> > +	bs.firstrec = true;
> > +	bs.private = private;
> > +	bs.sc = sc;
> > +	for (i = 0; i < XFS_BTREE_MAXLEVELS; i++)
> > +		bs.firstkey[i] = true;
> > +	INIT_LIST_HEAD(&bs.to_check);
> > +
> > +	if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
> > +		bs.check_siblings_fn = xfs_scrub_btree_lblock_check_siblings;
> > +	else
> > +		bs.check_siblings_fn = xfs_scrub_btree_sblock_check_siblings;
> 
> I'm thinking now that maybe a "get sibling from block" is what is
> necessary here, so there can be a shared check function....
> 
> > +	/* Don't try to check a tree with a height we can't handle. */
> > +	XFS_SCRUB_BTREC_GOTO(&bs, cur->bc_nlevels > 0, out_badcursor);
> > +	XFS_SCRUB_BTREC_GOTO(&bs, cur->bc_nlevels <= XFS_BTREE_MAXLEVELS,
> > +			out_badcursor);
> 
> More single checks that are doubled up...
> 
> ....
> > +out:
> > +	/*
> > +	 * If we don't end this function with the cursor pointing at a record
> > +	 * block, a subsequent non-error cursor deletion will not release
> > +	 * node-level buffers, causing a buffer leak.  This is quite possible
> > +	 * with a zero-results scrubbing run, so release the buffers if we
> > +	 * aren't pointing at a record.
> > +	 */
> > +	if (cur->bc_bufs[0] == NULL) {
> > +		for (i = 0; i < cur->bc_nlevels; i++) {
> > +			if (cur->bc_bufs[i]) {
> > +				xfs_trans_brelse(cur->bc_tp, cur->bc_bufs[i]);
> > +				cur->bc_bufs[i] = NULL;
> > +				cur->bc_ptrs[i] = 0;
> > +				cur->bc_ra[i] = 0;
> > +			}
> > +		}
> > +	}
> 
> I think cursor deletion should be made to handle this case, rather
> than special casing it here....

Yeah.  I think it's safe (now) because we have xfs_scrub_ag_btcur_free
which tears down all the cursors in BTREE_ERROR mode.

> > +struct xfs_scrub_btree;
> > +typedef int (*xfs_scrub_btree_rec_fn)(
> > +	struct xfs_scrub_btree	*bs,
> > +	union xfs_btree_rec	*rec);
> > +
> > +struct xfs_scrub_btree {
> > +	/* caller-provided scrub state */
> > +	struct xfs_scrub_context	*sc;
> > +	struct xfs_btree_cur		*cur;
> > +	xfs_scrub_btree_rec_fn		scrub_rec;
> > +	struct xfs_owner_info		*oinfo;
> > +	void				*private;
> > +
> > +	/* internal scrub state */
> > +	union xfs_btree_rec		lastrec;
> > +	bool				firstrec;
> > +	union xfs_btree_key		lastkey[XFS_BTREE_MAXLEVELS];
> > +	bool				firstkey[XFS_BTREE_MAXLEVELS];
> > +	struct list_head		to_check;
> > +	int				(*check_siblings_fn)(
> > +						struct xfs_scrub_btree *,
> > +						struct xfs_btree_block *);
> > +};
> 
> This looks like maybe another ops style structure should be used. We've
> got a xfs_scrub_btree_rec_fn() and check_siblings_fn() operations -
> maybe these should be pushed into the generic libxfs btree ops
> vectors?
> 
> > +int xfs_scrub_btree(struct xfs_scrub_context *sc, struct xfs_btree_cur *cur,
> > +		    xfs_scrub_btree_rec_fn scrub_fn,
> > +		    struct xfs_owner_info *oinfo, void *private);
> > +
> > +#endif /* __XFS_REPAIR_BTREE_H__ */
> > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > index 6931793..331aa14 100644
> > --- a/fs/xfs/scrub/common.c
> > +++ b/fs/xfs/scrub/common.c
> > @@ -43,6 +43,7 @@
> >  #include "xfs_rmap_btree.h"
> >  #include "scrub/xfs_scrub.h"
> >  #include "scrub/common.h"
> > +#include "scrub/btree.h"
> >  
> >  /*
> >   * Online Scrub and Repair
> > @@ -367,6 +368,172 @@ xfs_scrub_incomplete(
> >  	return fs_ok;
> >  }
> >  
> > +/* AG scrubbing */
> > +
> 
> All this from here down doesn't seem related to scrubbing a btree?
> It's infrastructure for scanning AGs, but I don't see where it is
> called from - it looks unused at this point. I think it should be
> > separated from the btree validation into its own patchset and put
> before the individual btree verification code...
> 
> I haven't really looked at this AG scrubbing code in depth because I
> can't tell how it fits into the code that is supposed to call it
> yet. I've read it, but without knowing how it's called I can't tell
> if the abstractions are just convoluted or whether they are
> necessary due to the infrastructure eventually ending up with
> multiple different call sites for some of the functions...

Lock AGI/AGF/AGFL, initialize cursors for all AG btree types, pick the
cursor we want and scan the AG, and use the other btree cursors to
cross-reference each record we see during the scan.

I regret now deferring the posting of the cross-referencing patches,
because some of the overdone bits in this patchset are done to avoid
code churn when that part gets added.  That'll make it (I hope) a little
more obvious why some things are the way they are.

--D

> 
> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH 05/22] xfs: scrub in-memory metadata buffers
  2017-07-24  1:43   ` Dave Chinner
@ 2017-07-24 22:36     ` Darrick J. Wong
  2017-07-24 23:38       ` Dave Chinner
  0 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-24 22:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Jul 24, 2017 at 11:43:27AM +1000, Dave Chinner wrote:
> On Thu, Jul 20, 2017 at 09:39:00PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Call the verifier function for all in-memory metadata buffers, looking
> > for memory corruption either due to bad memory or coding bugs.
> 
> How does this fit into the bigger picture? We can't do an exhaustive
> search of the in memory buffer cache, because access is racy w.r.t.
> the life cycle of in memory buffers.
> 
> Also, if we are doing a full scrub, we're going to hit and then
> check the cached in-memory buffers anyway, so I'm missing the
> context that explains why this code is necessary.

Before we start scanning the filesystem (which could lead to clean
buffers being pushed out of memory and later reread), we want to check
the buffers that have been sitting around in memory to see if they've
mutated since the last time the verifiers ran.

> >  #endif	/* __XFS_REPAIR_COMMON_H__ */
> > diff --git a/fs/xfs/scrub/metabufs.c b/fs/xfs/scrub/metabufs.c
> > new file mode 100644
> > index 0000000..63faaa6
> > --- /dev/null
> > +++ b/fs/xfs/scrub/metabufs.c
> > @@ -0,0 +1,177 @@
> > +/*
> > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > + *
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation,
> > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > + */
> > +#include "xfs.h"
> > +#include "xfs_fs.h"
> > +#include "xfs_shared.h"
> > +#include "xfs_format.h"
> > +#include "xfs_trans_resv.h"
> > +#include "xfs_mount.h"
> > +#include "xfs_defer.h"
> > +#include "xfs_btree.h"
> > +#include "xfs_bit.h"
> > +#include "xfs_log_format.h"
> > +#include "xfs_trans.h"
> > +#include "xfs_trace.h"
> > +#include "xfs_sb.h"
> > +#include "scrub/common.h"
> > +
> > +/* We only iterate buffers one by one, so we don't need any setup. */
> > +int
> > +xfs_scrub_setup_metabufs(
> > +	struct xfs_scrub_context	*sc,
> > +	struct xfs_inode		*ip)
> > +{
> > +	return 0;
> > +}
> > +
> > +#define XFS_SCRUB_METABUFS_TOO_MANY_RETRIES	10
> > +struct xfs_scrub_metabufs_info {
> > +	struct xfs_scrub_context	*sc;
> > +	unsigned int			retries;
> > +};
> 
> So do we get 10 retries per buffer, or 10 retries across an entire
> scan?

Ten per scan.
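
In other words the counter lives in the per-scan info structure and is
shared by every buffer visited.  A minimal sketch of that accounting,
with hypothetical demo_* names and an assumed "refuse once the budget
is spent" policy (the exact comparison in the patch may differ):

```c
#define DEMO_TOO_MANY_RETRIES	10

/* One of these exists per scan, not per buffer. */
struct demo_scan_info {
	unsigned int	retries;
};

/* Returns 0 if another retry is allowed, -1 once the budget is spent. */
static int demo_scan_retry(struct demo_scan_info *smi)
{
	if (smi->retries >= DEMO_TOO_MANY_RETRIES)
		return -1;
	smi->retries++;
	return 0;
}
```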

> > +/* In-memory buffer corruption. */
> > +
> > +#define XFS_SCRUB_BUF_OP_ERROR_GOTO(label) \
> > +	XFS_SCRUB_OP_ERROR_GOTO(smi->sc, \
> > +			xfs_daddr_to_agno(smi->sc->mp, bp->b_bn), \
> > +			xfs_daddr_to_agbno(smi->sc->mp, bp->b_bn), "buf", \
> > +			&error, label)
> 
> Nested macros - yuck!
> 
> > +STATIC int
> > +xfs_scrub_metabufs_scrub_buf(
> > +	struct xfs_scrub_metabufs_info	*smi,
> > +	struct xfs_buf			*bp)
> > +{
> > +	int				olderror;
> > +	int				error = 0;
> > +
> > +	/*
> > +	 * We hold the rcu lock during the rhashtable walk, so we can't risk
> > +	 * having the log forced due to a stale buffer by xfs_buf_lock.
> > +	 */
> > +	if (bp->b_flags & XBF_STALE)
> > +		return 0;
> > +
> > +	atomic_inc(&bp->b_hold);
> 
> This looks wrong. I think it can race with reclaim because we don't
> hold the pag->pag_buf_lock. i.e.  xfs_buf_rele() does this:
> 
> 	release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
> 
> to prevent lookups - which are done under the pag->pag_buf_lock -
> from finding the buffer while it has a zero hold count and may be
> removed from the cache and freed.

I could be misunderstanding rhashtable here -- as I understand it, the
rhashtable_walk_start function calls rcu_read_lock and doesn't release
it until we call rhashtable_walk_stop.  The rhashtable lookup, insert,
and remove functions each call rcu_read_lock before fiddling with the
hashtable internals.  I /think/ this means that even if the scrubber
gets ahold of a buffer with zero b_hold that's being xfs_buf_rele'd,
that concurrent xfs_buf_rele will be waiting for rcu_read_lock, and
therefore the buffer cannot be freed until the walk stops.
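
For reference, the usual way to avoid resurrecting an object whose hold
count has already hit zero is an "increment unless zero" operation (the
kernel provides atomic_inc_not_zero() for this).  The sketch below
shows the pattern with C11 atomics and hypothetical demo_* names.  It
is not what the patch does, and it does not by itself settle whether
the memory is still valid under the RCU read lock:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_buf {
	atomic_int	hold;	/* reference count; 0 means being torn down */
};

/* Take a reference only if someone else still holds one. */
static bool demo_buf_try_hold(struct demo_buf *bp)
{
	int old = atomic_load(&bp->hold);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&bp->hold, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool demo_buf_rele(struct demo_buf *bp)
{
	return atomic_fetch_sub(&bp->hold, 1) == 1;
}
```

A walker using this would simply skip any buffer for which the try-hold
fails, instead of bumping a count that release has already driven to
zero.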

> Further, if we are going to iterate the cache, I'd much prefer the
> iteration code to be in fs/xfs/xfs_buf.c - nothing outside that file
> should be touching core buffer cache structures....
> 
> FWIW, the LRU walks already handle this problem by the fact the LRU
> owns a hold count on the buffer. So it may be better to do this via
> a LRU walk rather than a hashtable walk....

Yeah, once we clear up the above I'll move it to xfs_buf.c.

> [snip]
> 
> > +	olderror = bp->b_error;
> > +	if (bp->b_fspriv)
> > +		bp->b_ops->verify_write(bp);
> 
> Should we be recalculating the CRC on buffers we aren't about to 
> be writing to disk?

I don't think it causes any harm to recalculate the crc early.  If the
buffer is dirty and corrupt we can't fix it, and writeout will flag it
and shut down the fs regardless, so it likely doesn't matter.

> Should we be verifying a buffer that has a non-zero error value on it?

No.

> > +	else
> > +		bp->b_ops->verify_read(bp);
> > +	error = bp->b_error;
> > +	bp->b_error = olderror;
> > +
> > +	/* Mark any corruption errors we might find. */
> > +	XFS_SCRUB_BUF_OP_ERROR_GOTO(out_unlock);
> 
> Ah, what? Why does this need a goto? And why doesn't it report the
> error that was found? (bloody macros!).
> 
> > +out_unlock:
> > +	xfs_buf_unlock(bp);
> > +out_dec:
> > +	atomic_dec(&bp->b_hold);
> > +	return error;
> > +}
> > +#undef XFS_SCRUB_BUF_OP_ERROR_GOTO
> 
> Oh, that's the macro defined above the function. Which I paid little
> attention to other than it called another macro. Now I realise that
> it (ab)uses local variables without them being passed into the
> macro. Yup, another reason we need to get rid of the macros from
> this code....

Ok.

> > +	struct xfs_scrub_metabufs_info	*smi,
> > +	struct rhashtable_iter		*iter)
> > +{
> > +	struct xfs_buf			*bp;
> > +	int				error = 0;
> > +
> > +	do {
> > +		if (xfs_scrub_should_terminate(&error))
> > +			break;
> > +
> > +		bp = rhashtable_walk_next(iter);
> > +		if (IS_ERR(bp))
> > +			return PTR_ERR(bp);
> > +		else if (bp == NULL)
> > +			return 0;
> > +
> > +		error = xfs_scrub_metabufs_scrub_buf(smi, bp);
> > +	} while (error != 0);
> > +
> > +	return error;
> > +}
> > +
> > +/* Try to walk the buffers in this AG in order to scrub them. */
> > +int
> > +xfs_scrub_metabufs(
> 
> Ah, please put an "_ag_" in this so it's clear it's only scrubbing a
> single AG. This is hidden deep inside the scrub context, so it took
> me a little bit of back tracking to understand that this wasn't a
> global scan....

Ok.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH 04/22] xfs: generic functions to scrub metadata and btrees
  2017-07-24 21:58     ` Darrick J. Wong
@ 2017-07-24 23:15       ` Dave Chinner
  2017-07-25  0:39         ` Darrick J. Wong
  0 siblings, 1 reply; 63+ messages in thread
From: Dave Chinner @ 2017-07-24 23:15 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jul 24, 2017 at 02:58:10PM -0700, Darrick J. Wong wrote:
> On Mon, Jul 24, 2017 at 11:05:49AM +1000, Dave Chinner wrote:
> > > +	struct xfs_btree_cur		*cur,
> > > +	int				level,
> > > +	bool				fs_ok,
> > > +	const char			*check,
> > > +	const char			*func,
> > > +	int				line)
> > > +{
> > > +	char				bt_ptr[24];
> > > +	char				bt_type[48];
> > > +	xfs_fsblock_t			fsbno;
> > > +
> > > +	if (fs_ok)
> > > +		return fs_ok;
> > > +
> > > +	sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
> > > +	xfs_scrub_btree_format(cur, level, bt_type, 48, bt_ptr, 24, &fsbno);
> > 
> > Ok, magic numbers for buffer lengths. Please use #defines for these
> > with an explanation of why the chosen lengths are sufficient for the
> > information they'll be used to hold.
> > 
> > > +	trace_xfs_scrub_btree_error(cur->bc_mp, bt_type, bt_ptr,
> > > +			XFS_FSB_TO_AGNO(cur->bc_mp, fsbno),
> > > +			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno),
> > > +			check, func, line);
> > 
> > hmmmm - tracepoints are conditional, but the formatting call isn't.
> > Can this formatting be called/run from inside the tracepoint code
> > itself?
> 
> I don't know of a way to find out if a given set of tracepoint(s) are
> enabled.  Fortunately the formatting call only happens if corruption is
> detected.

I was thinking more that the function is called from within the
tracepoint code itself. Tracepoints can have code embedded in
them...
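
A rough userspace sketch of that idea, not kernel code: the expensive
formatting moves inside the trace hook, so it only runs when the hook is
enabled. Every name here (sketch_trace_enabled, sketch_format_loc,
sketch_trace_btree_error) is hypothetical; a real tracepoint would do
this work in its own assignment code rather than behind a global flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins: a global on/off switch and a call counter. */
static bool sketch_trace_enabled;
static int sketch_format_calls;
static char sketch_last_event[96];

/* The expensive part: turn a btree location into a string. */
static void sketch_format_loc(char *buf, size_t len, int level,
			      unsigned long long fsbno)
{
	sketch_format_calls++;
	snprintf(buf, len, "level %d block %llu", level, fsbno);
}

/* The "tracepoint": formatting happens only once we know it is enabled. */
static void sketch_trace_btree_error(int level, unsigned long long fsbno)
{
	char loc[64];

	if (!sketch_trace_enabled)
		return;
	sketch_format_loc(loc, sizeof(loc), level, fsbno);
	snprintf(sketch_last_event, sizeof(sketch_last_event),
		 "btree error: %s", loc);
}
```

With the switch off, the caller pays for one branch and nothing else.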

> 
> > > +
> > > +/*
> > > + * Make sure this record is in order and doesn't stray outside of the parent
> > > + * keys.
> > > + */
> > > +STATIC int
> > > +xfs_scrub_btree_rec(
> > > +	struct xfs_scrub_btree	*bs)
> > > +{
> > > +	struct xfs_btree_cur	*cur = bs->cur;
> > > +	union xfs_btree_rec	*rec;
> > > +	union xfs_btree_key	key;
> > > +	union xfs_btree_key	hkey;
> > > +	union xfs_btree_key	*keyp;
> > > +	struct xfs_btree_block	*block;
> > > +	struct xfs_btree_block	*keyblock;
> > > +	struct xfs_buf		*bp;
> > > +
> > > +	block = xfs_btree_get_block(cur, 0, &bp);
> > > +	rec = xfs_btree_rec_addr(cur, cur->bc_ptrs[0], block);
> > > +
> > > +	if (bp)
> > > +		trace_xfs_scrub_btree_rec(cur->bc_mp,
> > > +				XFS_FSB_TO_AGNO(cur->bc_mp,
> > > +					XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn)),
> > > +				XFS_FSB_TO_AGBNO(cur->bc_mp,
> > > +					XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn)),
> > > +				cur->bc_btnum, 0, cur->bc_nlevels,
> > > +				cur->bc_ptrs[0]);
> > > +	else if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
> > > +		trace_xfs_scrub_btree_rec(cur->bc_mp,
> > > +				XFS_INO_TO_AGNO(cur->bc_mp,
> > > +					cur->bc_private.b.ip->i_ino),
> > > +				XFS_INO_TO_AGBNO(cur->bc_mp,
> > > +					cur->bc_private.b.ip->i_ino),
> > > +				cur->bc_btnum, 0, cur->bc_nlevels,
> > > +				cur->bc_ptrs[0]);
> > > +	else
> > > +		trace_xfs_scrub_btree_rec(cur->bc_mp,
> > > +				NULLAGNUMBER, NULLAGBLOCK,
> > > +				cur->bc_btnum, 0, cur->bc_nlevels,
> > > +				cur->bc_ptrs[0]);
> > 
> > Hmmm - there's more code in the trace calls than there is in the
> > scrubbing code. Is this really all necessary? I can see code
> > getting changed in future but not the tracepoints, similar to how
> > comment updates get missed...
> 
> I've found it useful when analyzing the scrub-fuzz xfstests to be able
> to pinpoint exactly what record in a btree hit some bug or other.

Sure, I'm not questioning whether it has some use, more just pondering
the complexity required to emit them and whether there's a better
way. I mean, the only difference is the agno/agbno pair being traced,
so wouldn't it make more sense to trace an opaque 64 bit number here
and do:

	if (bp)
		num = bp->b_bn;
	else if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
		num = cur->bc_private.b.ip->i_ino;
	else
		num = NULLFSBLOCK;
	trace_xfs_scrub_btree_rec(cur->bc_mp, num, cur->bc_btnum, 0,
				  cur->bc_nlevels, cur->bc_ptrs[0]);

That's much simpler and easier to maintain, but provides the same
info. It's also only one trace event callout, so the code size
should go down as well...
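
For illustration only, here is the same selection as a self-contained
userspace sketch. The sketch_* structs are simplified, hypothetical
stand-ins for xfs_buf, xfs_inode and xfs_btree_cur, and
SKETCH_NULLFSBLOCK is an assumed all-ones sentinel; the point is just
that one helper picks the number and one trace call consumes it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sentinel for "no location"; stands in for NULLFSBLOCK. */
#define SKETCH_NULLFSBLOCK	UINT64_MAX

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct sketch_buf {
	uint64_t		b_bn;		/* disk address of the buffer */
};

struct sketch_inode {
	uint64_t		i_ino;		/* inode number */
};

struct sketch_cur {
	bool			root_in_inode;	/* like XFS_BTREE_ROOT_IN_INODE */
	struct sketch_inode	*ip;		/* only valid if root_in_inode */
};

/* Pick the single opaque 64-bit number that identifies this record's home. */
static uint64_t sketch_trace_id(const struct sketch_cur *cur,
				const struct sketch_buf *bp)
{
	if (bp)
		return bp->b_bn;
	if (cur->root_in_inode)
		return cur->ip->i_ino;
	return SKETCH_NULLFSBLOCK;
}
```

The consumer of the trace has to know from the other fields whether the
number is a disk address or an inode number, which is the price of the
simpler call site.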

> > > +	leftsib = be64_to_cpu(block->bb_u.l.bb_leftsib);
> > > +	rightsib = be64_to_cpu(block->bb_u.l.bb_rightsib);
> > > +	level = xfs_btree_get_level(block);
> > > +
> > > +	/* Root block should never have siblings. */
> > > +	if (level == bs->cur->bc_nlevels - 1) {
> > > +		XFS_SCRUB_BTKEY_CHECK(bs, level, leftsib == NULLFSBLOCK);
> > > +		XFS_SCRUB_BTKEY_CHECK(bs, level, rightsib == NULLFSBLOCK);
> > > +		return error;
> > > +	}
> > 
> > This is where the macros force us into silly patterns and blow out
> > the code size.
> > 
> > 	if (level == bs->cur->bc_nlevels - 1 &&
> > 	    (leftsib != NULLFSBLOCK || rightsib != NULLFSBLOCK)) {
> > 		/* error trace call */
> > 		return error;
> > 	}
> 
> We're also losing information here.  With the previous code you can tell
> between leftsib and rightsib which one is corrupt, whereas with the
> suggested replacement you can only tell that at least one of them is
> broken.

Sure, but that's mostly irrelevant detail because...

> I want to avoid the situation where scrub flags a corruption, the user
> runs ftrace to give us __return_address, and we go looking for that only
> to find that it points to a line of code that tests two different
> fields.

... the fact is you have to go look at the root block header that is
corrupt to analyse the extent of the corruption and eventually
repair it. When it comes to analysing a corruption, you don't just
look at the one field that has been flagged - you look at all the
metadata in the block to determine the extent of the corruption.

If you know a sibling pointer in a root block is corrupt, then the
moment you look at the block header it's *obvious* which sibling
pointer is corrupt. i.e. the one that is not NULLFSBLOCK. It really
doesn't matter what is reported as the error from scrubbing because
the corrupted items are trivially observable.

It's all well and good to scan every piece of metadata for validity,
but that doesn't mean we shouldn't think about what makes sense to
report/log. It's going to be easy to drown the logs in corruption
reports and make it impossible to find the needle that first caused
it.

All I'm saying is blind verbosity like this is an indication that we
haven't thought about how to group or classify corruptions
sufficiently. Yes, we need to be able to detect all corrupted
fields, but we need to recognise that many corruptions are
essentially the same problem distributed across multiple fields and
they'll all be repaired by a single repair action.

In this case, it doesn't matter if it is left or right sibling corruption
because the analysis/repair work we need to do to correct either the
left or right pointer is the same. i.e. we need to walk and validate
the entire chain from both ends to repair a single bad pointer.
Hence it doesn't matter if the left or right sibling is bad, the
action we need to take to repair it is the same because we don't
know if we're sitting on a wholly disconnected part of the chain
(i.e. nasty level of tree corruption) or whether just that link got
stomped on. i.e. bad sibling is an indication that both siblings may
be invalid, not just the one we detected as bad....

So, yeah, "sibling corruption" means we need to check the entire
sibling chain across all blocks in the btree level, not just in the
direction of the bad pointer.
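
As a minimal sketch of treating this as one class of corruption, the
root-block rule can be a single predicate that reports "sibling
corruption" without saying which side tripped it. The function name and
the SKETCH_NULLFSBLOCK sentinel are assumptions made for this example,
not names from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sentinel for "no sibling"; stands in for NULLFSBLOCK. */
#define SKETCH_NULLFSBLOCK	UINT64_MAX

/*
 * Return true if the sibling pointers are acceptable for a block at this
 * level.  Only the root (level == nlevels - 1) is constrained here: it must
 * have neither a left nor a right sibling.  Either pointer being set is
 * reported as the same failure, because the follow-up work is the same.
 */
static bool sketch_root_siblings_ok(int level, int nlevels,
				    uint64_t leftsib, uint64_t rightsib)
{
	if (level != nlevels - 1)
		return true;	/* not the root; this rule does not apply */

	return leftsib == SKETCH_NULLFSBLOCK &&
	       rightsib == SKETCH_NULLFSBLOCK;
}
```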

> > > +	/* Does the left sibling match the parent level left block? */
> > > +	if (leftsib != NULLFSBLOCK) {
> > > +		error = xfs_btree_dup_cursor(bs->cur, &ncur);
> > > +		if (error)
> > > +			return error;
> > > +		error = xfs_btree_decrement(ncur, level + 1, &success);
> > > +		XFS_SCRUB_BTKEY_OP_ERROR_GOTO(bs, level + 1, &error, out_cur);
> > 
> > Hmmm - if I read that right, there's a goto out_cur on error hidden
> > in this macro....
> 
> No more hidden than XFS_WANT_CORRUPTED_GOTO. :)

Right, but I've been wanting to get rid of those XFS_WANT_CORRUPTED
macros for a long, long time... :/

> > > +
> > > +		pblock = xfs_btree_get_block(ncur, level + 1, &pbp);
> > > +		pp = xfs_btree_ptr_addr(ncur, ncur->bc_ptrs[level + 1], pblock);
> > > +		if (!xfs_scrub_btree_ptr(bs, level + 1, pp)) {
> > > +			agbno = be32_to_cpu(pp->s);
> > > +			XFS_SCRUB_BTKEY_CHECK(bs, level, agbno == leftsib);
> > > +		}
> > > +
> > > +		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
> > > +		ncur = NULL;
> > > +	}
> > > +
> > > +verify_rightsib:
> > > +	if (ncur) {
> > > +		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
> > > +		ncur = NULL;
> > > +	}
> > > +
> > > +	/* Does the right sibling match the parent level right block? */
> > > +	if (rightsib != NULLAGBLOCK) {
> > 
> > No "if (!error ...) check here - I'm thinking there's some factoring
> > needed here to reduce the code duplication going on here...
> 
> Ok, will do.  These two functions can be rewritten as a single function
> that uses cur->bc_ops->diff_two_keys.  Then we discard
> bs->check_siblings_fn.

Excellent!

> > > +/* AG scrubbing */
> > > +
> > 
> > All this from here down doesn't seem related to scrubbing a btree?
> > It's infrastructure for scanning AGs, but I don't see where it is
> > called from - it looks unused at this point. I think it should be
> > separated from the btree validation into its own patchset and put
> > before the individual btree verification code...
> > 
> > I haven't really looked at this AG scrubbing code in depth because I
> > can't tell how it fits into the code that is supposed to call it
> > yet. I've read it, but without knowing how it's called I can't tell
> > if the abstractions are just convoluted or whether they are
> > necessary due to the infrastructure eventually ending up with
> > multiple different call sites for some of the functions...
> 
> Lock AGI/AGF/AGFL, initialize cursors for all AG btree types, pick the
> cursor we want and scan the AG, and use the other btree cursors to
> cross-reference each record we see during the scan.
> 
> I regret now deferring the posting of the cross-referencing patches,
> because some of the overdone bits in this patchset are done to avoid
> code churn when that part gets added.  That'll make it (I hope) a little
> more obvious why some things are the way they are.

Yeah, it's not easy to split large chunks of work up and maintain a
split on a moving codebase.  I don't want you to split these patches
up into fine grained patches because that's just unmanageable, but I
think it's worthwhile to split out the bits that don't obviously
appear to be related.

For this case, I don't think the follow on patch series would make
any difference to my comments here, because going from btree
verification code to AG setup infrastructure in a single patch is
quite a shift in context. Patch splits on context switch boundaries
really do help reviewing large chunks of new code :P

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 05/22] xfs: scrub in-memory metadata buffers
  2017-07-24 22:36     ` Darrick J. Wong
@ 2017-07-24 23:38       ` Dave Chinner
  2017-07-25  0:14         ` Darrick J. Wong
  0 siblings, 1 reply; 63+ messages in thread
From: Dave Chinner @ 2017-07-24 23:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jul 24, 2017 at 03:36:54PM -0700, Darrick J. Wong wrote:
> On Mon, Jul 24, 2017 at 11:43:27AM +1000, Dave Chinner wrote:
> > On Thu, Jul 20, 2017 at 09:39:00PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Call the verifier function for all in-memory metadata buffers, looking
> > > for memory corruption either due to bad memory or coding bugs.
> > 
> > How does this fit into the bigger picture? We can't do an exhaustive
> > search of the in memory buffer cache, because access is racy w.r.t.
> > the life cycle of in memory buffers.
> > 
> > Also, if we are doing a full scrub, we're going to hit and then
> > check the cached in-memory buffers anyway, so I'm missing the
> > context that explains why this code is necessary.
> 
> Before we start scanning the filesystem (which could lead to clean
> buffers being pushed out of memory and later reread), we want to check
> the buffers that have been sitting around in memory to see if they've
> mutated since the last time the verifiers ran.

I'm not sure we need a special cache walk to do this.

My thinking is that if the buffers get pushed out of memory, the
verifier will be run at that time, so we don't need to run the
verifier before a scrub to avoid problems here.

Further, if we read the buffer as part of the scrub and it's found
in cache, then if the scrub finds a corruption we'll know it
happened between the last verifier invocation and the scrub.

If the buffer is not in cache and scrub reads the metadata from
disk, then the verifier should fire on read if the item is corrupt
coming off disk. If the verifier doesn't find corruption in this
case but scrub does, then we've got to think about whether the
verifier has sufficient coverage.

> > > +#define XFS_SCRUB_METABUFS_TOO_MANY_RETRIES	10
> > > +struct xfs_scrub_metabufs_info {
> > > +	struct xfs_scrub_context	*sc;
> > > +	unsigned int			retries;
> > > +};
> > 
> > So do we get 10 retries per buffer, or 10 retries across an entire
> > scan?
> 
> Ten per scan.

That will prevent large/active filesystems from being scanned
completely, I think.

> > > +STATIC int
> > > +xfs_scrub_metabufs_scrub_buf(
> > > +	struct xfs_scrub_metabufs_info	*smi,
> > > +	struct xfs_buf			*bp)
> > > +{
> > > +	int				olderror;
> > > +	int				error = 0;
> > > +
> > > +	/*
> > > +	 * We hold the rcu lock during the rhashtable walk, so we can't risk
> > > +	 * having the log forced due to a stale buffer by xfs_buf_lock.
> > > +	 */
> > > +	if (bp->b_flags & XBF_STALE)
> > > +		return 0;
> > > +
> > > +	atomic_inc(&bp->b_hold);
> > 
> > This looks wrong. I think it can race with reclaim because we don't
> > hold the pag->pag_buf_lock. i.e.  xfs_buf_rele() does this:
> > 
> > 	release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
> > 
> > to prevent lookups - which are done under the pag->pag_buf_lock -
> > from finding the buffer while it has a zero hold count and may be
> > removed from the cache and freed.
> 
> I could be misunderstanding rhashtable here -- as I understand it, the
> rhashtable_walk_start function calls rcu_read_lock and doesn't release
> it until we call rhashtable_walk_stop.  The rhashtable lookup, insert,
> and remove functions each call rcu_read_lock before fiddling with the
> hashtable internals.  I /think/ this means that even if the scrubber
> gets ahold of a buffer with zero b_hold that's being xfs_buf_rele'd,
> that concurrent xfs_buf_rele will be waiting for rcu_read_lock, and
> therefore the buffer cannot be freed until the walk stops.

No, we're not using RCU to protect object life cycles, and RCU also
doesn't protect the rhashtable walks for xfs_bufs because the
xfs_bufs are not freed by a RCU callback at the end of the grace
period.

FYI, when we moved to the rhashtable code, Lucas Stach also provided
a RCU lookup patch which I didn't merge.  It turns out that the RCU
freeing of xfs_bufs has more overhead than the potential CPU usage
saved by removing lock contention in lookups:

https://www.spinics.net/lists/linux-xfs/msg02186.html

IOWs, the per-ag rhashtable has low enough contention
characteristics that the infrastructure overhead of lockless lookups
results in a net performance loss and so the cache index
insert/lookup/remove code is still protected by
pag->pag_buf_lock....

The LRU walk doesn't need the pag->pag_buf_lock because all the
objects on the LRU already have a reference. Hence it can be walked
without affecting lookup/insert/remove performance of the cache...
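
To make the hold-count rule concrete, here is a small userspace sketch
using C11 atomics. It is not the xfs_buf code: struct sketch_obj, its
spin lock and the sketch_* helpers are all made up for the example. It
only models the two rules described above: the final 1 -> 0 drop happens
with the lock held, and lookups take their hold under that same lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical cached object: a hold count plus its cache membership. */
struct sketch_obj {
	atomic_flag	lock;		/* stands in for pag->pag_buf_lock */
	atomic_int	hold;		/* stands in for bp->b_hold */
	bool		in_cache;	/* protected by lock */
};

static void sketch_lock(struct sketch_obj *o)
{
	while (atomic_flag_test_and_set(&o->lock))
		;	/* spin */
}

static void sketch_unlock(struct sketch_obj *o)
{
	atomic_flag_clear(&o->lock);
}

static void sketch_init(struct sketch_obj *o)
{
	atomic_flag_clear(&o->lock);
	atomic_init(&o->hold, 1);
	o->in_cache = true;
}

/* Drop one hold; return true with the lock held if that was the last one. */
static bool sketch_dec_and_lock(struct sketch_obj *o)
{
	int old = atomic_load(&o->hold);

	/* Fast path: not the last hold, so no lock is needed. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(&o->hold, &old, old - 1))
			return false;
	}

	/* Slow path: the 1 -> 0 transition may only happen under the lock. */
	sketch_lock(o);
	if (atomic_fetch_sub(&o->hold, 1) == 1)
		return true;
	sketch_unlock(o);
	return false;
}

static void sketch_release(struct sketch_obj *o)
{
	if (sketch_dec_and_lock(o)) {
		o->in_cache = false;	/* unhash; a real cache would free here */
		sketch_unlock(o);
	}
}

/* Lookup takes its hold under the same lock, so it never sees a dying object. */
static bool sketch_lookup(struct sketch_obj *o)
{
	bool found;

	sketch_lock(o);
	found = o->in_cache;
	if (found)
		atomic_fetch_add(&o->hold, 1);
	sketch_unlock(o);
	return found;
}
```

A walker that bumps the hold count without taking the lock has no such
guarantee: it can increment a count that has already reached zero on an
object that is about to be torn down.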

> > [snip]
> > 
> > > +	olderror = bp->b_error;
> > > +	if (bp->b_fspriv)
> > > +		bp->b_ops->verify_write(bp);
> > 
> > Should we be recalculating the CRC on buffers we aren't about to 
> > be writing to disk?
> 
> I don't think it causes any harm to recalculate the crc early.  If the
> buffer is dirty and corrupt we can't fix it and write out will flag it
> and shut down the fs anyway, so it likely doesn't matter anyway.

ok.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 05/22] xfs: scrub in-memory metadata buffers
  2017-07-24 23:38       ` Dave Chinner
@ 2017-07-25  0:14         ` Darrick J. Wong
  2017-07-25  3:32           ` Dave Chinner
  0 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-25  0:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jul 25, 2017 at 09:38:13AM +1000, Dave Chinner wrote:
> On Mon, Jul 24, 2017 at 03:36:54PM -0700, Darrick J. Wong wrote:
> > On Mon, Jul 24, 2017 at 11:43:27AM +1000, Dave Chinner wrote:
> > > On Thu, Jul 20, 2017 at 09:39:00PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Call the verifier function for all in-memory metadata buffers, looking
> > > > for memory corruption either due to bad memory or coding bugs.
> > > 
> > > How does this fit into the bigger picture? We can't do an exhaustive
> > > search of the in memory buffer cache, because access is racy w.r.t.
> > > the life cycle of in memory buffers.
> > > 
> > > Also, if we are doing a full scrub, we're going to hit and then
> > > check the cached in-memory buffers anyway, so I'm missing the
> > > context that explains why this code is necessary.
> > 
> > Before we start scanning the filesystem (which could lead to clean
> > buffers being pushed out of memory and later reread), we want to check
> > the buffers that have been sitting around in memory to see if they've
> > mutated since the last time the verifiers ran.
> 
> I'm not sure we need a special cache walk to do this.
> 
> My thinking is that if the buffers get pushed out of memory, the
> verifier will be run at that time, so we don't need to run the
> verifier before a scrub to avoid problems here.

Agreed.

> Further, if we read the buffer as part of the scrub and it's found
> in cache, then if the scrub finds a corruption we'll know it
> happened between the last verifier invocation and the scrub.

Hm.  Prior to the introduction of the metabufs scanner a few weeks ago, 
I had thought it sufficient to assume that memory won't get corrupt, so
as long as the read verifier ran at /some/ point in the past we didn't
need to recheck now.

What if we scrap the metabufs scanner and adapt the read verifier
function pointer to allow scrub to bypass the crc check and return the
_THIS_IP_ from any failing structural test?  Then scrubbers can call the
read verifier directly and extract failure info directly.
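
A very rough userspace sketch of what I mean, with everything invented
for the example: struct sketch_hdr, the limits and the magic value are
arbitrary, and __LINE__ stands in for _THIS_IP_ so that it builds
outside the kernel. The structural checks live in one function that
does no crc work and tells the caller which test failed.

```c
#include <stdint.h>

/* Hypothetical on-disk header; the fields and limits are made up. */
struct sketch_hdr {
	uint32_t	magic;
	uint16_t	level;
	uint16_t	numrecs;
};

#define SKETCH_MAGIC		0x534b4348u	/* arbitrary value */
#define SKETCH_MAXLEVELS	9
#define SKETCH_MAXRECS		100

/*
 * Structural checks only, no checksum.  Returns 0 if the header passes,
 * otherwise the source line of the check that failed (a stand-in for
 * returning the failing instruction address).
 */
static int sketch_verify_struct(const struct sketch_hdr *h)
{
	if (h->magic != SKETCH_MAGIC)
		return __LINE__;
	if (h->level >= SKETCH_MAXLEVELS)
		return __LINE__;
	if (h->numrecs > SKETCH_MAXRECS)
		return __LINE__;
	return 0;
}
```

The read path would still wrap this with the crc check; scrub would
call the structural part on its own.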

> If the buffer is not in cache and scrub reads the metadata from
> disk, then the verifier should fire on read if the item is corrupt
> coming off disk. If the verifier doesn't find corruption in this
> case but scrub does, then we've got to think about whether the
> verifier has sufficient coverage.

Scrub has more comprehensive checks (or it will when xref comes along)
so this is likely to happen, fyi.

> > > > +#define XFS_SCRUB_METABUFS_TOO_MANY_RETRIES	10
> > > > +struct xfs_scrub_metabufs_info {
> > > > +	struct xfs_scrub_context	*sc;
> > > > +	unsigned int			retries;
> > > > +};
> > > 
> > > So do we get 10 retries per buffer, or 10 retries across an entire
> > > scan?
> > 
> > Ten per scan.
> 
> That will prevent large/active filesystems from being scanned
> completely, I think.

Perhaps.  Though at this point I think I'll just drop this patch entirely
in favor of re-calling the verifiers....

--D

> > > > +STATIC int
> > > > +xfs_scrub_metabufs_scrub_buf(
> > > > +	struct xfs_scrub_metabufs_info	*smi,
> > > > +	struct xfs_buf			*bp)
> > > > +{
> > > > +	int				olderror;
> > > > +	int				error = 0;
> > > > +
> > > > +	/*
> > > > +	 * We hold the rcu lock during the rhashtable walk, so we can't risk
> > > > +	 * having the log forced due to a stale buffer by xfs_buf_lock.
> > > > +	 */
> > > > +	if (bp->b_flags & XBF_STALE)
> > > > +		return 0;
> > > > +
> > > > +	atomic_inc(&bp->b_hold);
> > > 
> > > This looks wrong. I think it can race with reclaim because we don't
> > > hold the pag->pag_buf_lock. i.e.  xfs_buf_rele() does this:
> > > 
> > > 	release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
> > > 
> > > to prevent lookups - which are done under the pag->pag_buf_lock -
> > > from finding the buffer while it has a zero hold count and may be
> > > removed from the cache and freed.
> > 
> > I could be misunderstanding rhashtable here -- as I understand it, the
> > rhashtable_walk_start function calls rcu_read_lock and doesn't release
> > it until we call rhashtable_walk_stop.  The rhashtable lookup, insert,
> > and remove functions each call rcu_read_lock before fiddling with the
> > hashtable internals.  I /think/ this means that even if the scrubber
> > gets ahold of a buffer with zero b_hold that's being xfs_buf_rele'd,
> > that concurrent xfs_buf_rele will be waiting for rcu_read_lock, and
> > therefore the buffer cannot be freed until the walk stops.
> 
> No, we're not using RCU to protect object life cycles, and RCU also
> doesn't protect the rhashtable walks for xfs_bufs because the
> xfs_bufs are not freed by a RCU callback at the end of the grace
> period.
> 
> FYI, when we moved to the rhashtable code, Lucas Stach also provided
> a RCU lookup patch which I didn't merge.  It turns out that the RCU
> freeing of xfs_bufs has more overhead than the potential CPU usage
> saved by removing lock contention in lookups:
> 
> https://www.spinics.net/lists/linux-xfs/msg02186.html
> 
> IOWs, the per-ag rhashtable has low enough contention
> characteristics that the infrastructure overhead of lockless lookups
> results in a net performance loss and so the cache index
> insert/lookup/remove code is still protected by
> pag->pag_buf_lock....
> 
> The LRU walk doesn't need the pag->pag_buf_lock because all the
> objects on the LRU already have a reference. Hence it can be walked
> without affecting lookup/insert/remove performance of the cache...
> 
> > > [snip]
> > > 
> > > > +	olderror = bp->b_error;
> > > > +	if (bp->b_fspriv)
> > > > +		bp->b_ops->verify_write(bp);
> > > 
> > > Should we be recalculating the CRC on buffers we aren't about to 
> > > be writing to disk?
> > 
> > I don't think it causes any harm to recalculate the crc early.  If the
> > buffer is dirty and corrupt we can't fix it and write out will flag it
> > and shut down the fs anyway, so it likely doesn't matter anyway.
> 
> ok.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 04/22] xfs: generic functions to scrub metadata and btrees
  2017-07-24 23:15       ` Dave Chinner
@ 2017-07-25  0:39         ` Darrick J. Wong
  0 siblings, 0 replies; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-25  0:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jul 25, 2017 at 09:15:42AM +1000, Dave Chinner wrote:
> On Mon, Jul 24, 2017 at 02:58:10PM -0700, Darrick J. Wong wrote:
> > On Mon, Jul 24, 2017 at 11:05:49AM +1000, Dave Chinner wrote:
> > > > +	struct xfs_btree_cur		*cur,
> > > > +	int				level,
> > > > +	bool				fs_ok,
> > > > +	const char			*check,
> > > > +	const char			*func,
> > > > +	int				line)
> > > > +{
> > > > +	char				bt_ptr[24];
> > > > +	char				bt_type[48];
> > > > +	xfs_fsblock_t			fsbno;
> > > > +
> > > > +	if (fs_ok)
> > > > +		return fs_ok;
> > > > +
> > > > +	sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
> > > > +	xfs_scrub_btree_format(cur, level, bt_type, 48, bt_ptr, 24, &fsbno);
> > > 
> > > Ok, magic numbers for buffer lengths. Please use #defines for these
> > > with an explanation of why the chosen lengths are sufficient for the
> > > information they'll be used to hold.
> > > 
> > > > +	trace_xfs_scrub_btree_error(cur->bc_mp, bt_type, bt_ptr,
> > > > +			XFS_FSB_TO_AGNO(cur->bc_mp, fsbno),
> > > > +			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno),
> > > > +			check, func, line);
> > > 
> > > hmmmm - tracepoints are conditional, but the formatting call isn't.
> > > Can this formatting be called/run from inside the tracepoint code
> > > itself?
> > 
> > I don't know of a way to find out if a given set of tracepoint(s) are
> > enabled.  Fortunately the formatting call only happens if corruption is
> > detected.
> 
> I was thinking more that the function is called from within the
> tracepoint code itself. Tracepoints can have code embedded in
> them...

Oh.  I'll look into that.

> > 
> > > > +
> > > > +/*
> > > > + * Make sure this record is in order and doesn't stray outside of the parent
> > > > + * keys.
> > > > + */
> > > > +STATIC int
> > > > +xfs_scrub_btree_rec(
> > > > +	struct xfs_scrub_btree	*bs)
> > > > +{
> > > > +	struct xfs_btree_cur	*cur = bs->cur;
> > > > +	union xfs_btree_rec	*rec;
> > > > +	union xfs_btree_key	key;
> > > > +	union xfs_btree_key	hkey;
> > > > +	union xfs_btree_key	*keyp;
> > > > +	struct xfs_btree_block	*block;
> > > > +	struct xfs_btree_block	*keyblock;
> > > > +	struct xfs_buf		*bp;
> > > > +
> > > > +	block = xfs_btree_get_block(cur, 0, &bp);
> > > > +	rec = xfs_btree_rec_addr(cur, cur->bc_ptrs[0], block);
> > > > +
> > > > +	if (bp)
> > > > +		trace_xfs_scrub_btree_rec(cur->bc_mp,
> > > > +				XFS_FSB_TO_AGNO(cur->bc_mp,
> > > > +					XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn)),
> > > > +				XFS_FSB_TO_AGBNO(cur->bc_mp,
> > > > +					XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn)),
> > > > +				cur->bc_btnum, 0, cur->bc_nlevels,
> > > > +				cur->bc_ptrs[0]);
> > > > +	else if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
> > > > +		trace_xfs_scrub_btree_rec(cur->bc_mp,
> > > > +				XFS_INO_TO_AGNO(cur->bc_mp,
> > > > +					cur->bc_private.b.ip->i_ino),
> > > > +				XFS_INO_TO_AGBNO(cur->bc_mp,
> > > > +					cur->bc_private.b.ip->i_ino),
> > > > +				cur->bc_btnum, 0, cur->bc_nlevels,
> > > > +				cur->bc_ptrs[0]);
> > > > +	else
> > > > +		trace_xfs_scrub_btree_rec(cur->bc_mp,
> > > > +				NULLAGNUMBER, NULLAGBLOCK,
> > > > +				cur->bc_btnum, 0, cur->bc_nlevels,
> > > > +				cur->bc_ptrs[0]);
> > > 
> > > Hmmm - there's more code in the trace calls than there is in the
> > > scrubbing code. Is this really all necessary? I can see code
> > > getting changed in future but not the tracepoints, similar to how
> > > comment updates get missed...
> > 
> > I've found it useful when analyzing the scrub-fuzz xfstests to be able
> > to pinpoint exactly what record in a btree hit some bug or other.
> 
> Sure, I'm not questioning whether it has some use, more just pondering
> the complexity required to emit them and whether there's a better
> way. I mean, the only difference is the agno/agbno pair being traced,
> so wouldn't it make more sense to trace an opaque 64 bit number here
> and do:
> 
> 	if (bp)
> 		num = bp->b_bn;
> 	else if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
> 		num = ip->i_ino;
> 	else
> 		num = NULLFSBLOCK;
> 	trace_xfs_scrub_btree_rec(cur->bc_mp, num, cur->bc_btnum, 0,
> 				  cur->bc_nlevels, cur->bc_ptrs[0]);
> 
> That's much simpler and easier to maintain, but provides the same
> info. It's also only one trace event callout, so the code size
> should go down as well...

<nod>

> > > > +	leftsib = be64_to_cpu(block->bb_u.l.bb_leftsib);
> > > > +	rightsib = be64_to_cpu(block->bb_u.l.bb_rightsib);
> > > > +	level = xfs_btree_get_level(block);
> > > > +
> > > > +	/* Root block should never have siblings. */
> > > > +	if (level == bs->cur->bc_nlevels - 1) {
> > > > +		XFS_SCRUB_BTKEY_CHECK(bs, level, leftsib == NULLFSBLOCK);
> > > > +		XFS_SCRUB_BTKEY_CHECK(bs, level, rightsib == NULLFSBLOCK);
> > > > +		return error;
> > > > +	}
> > > 
> > > This is where the macros force us into silly patterns and blow out
> > > the code size.
> > > 
> > > 	if (level == bs->cur->bc_nlevels - 1 &&
> > > 	    (leftsib != NULLFSBLOCK || rightsib != NULLFSBLOCK)) {
> > > 		/* error trace call */
> > > 		return error;
> > > 	}
> > 
> > We're also losing information here.  With the previous code you can tell
> > between leftsib and rightsib which one is corrupt, whereas with the
> > suggested replacement you can only tell that at least one of them is
> > broken.
> 
> Sure, but that's mostly irrelevant detail because...
> 
> > I want to avoid the situation where scrub flags a corruption, the user
> > runs ftrace to give us __return_address, and we go looking for that only
> > to find that it points to a line of code that tests two different
> > fields.
> 
> ... the fact is you have to go look at the root block header that is
> corrupt to analyse the extent of the corruption and eventually
> repair it. When it comes to analysing a corruption, you don't just
> look at the one field that has been flagged - you look at all the
> metadata in the block to determine the extent of the corruption.
> 
> If you know a sibling pointer in a root block is corrupt, then the
> moment you look at the block header it's *obvious* which sibling
> pointer is corrupt. i.e. the one that is not NULLFSBLOCK. It really
> doesn't matter what is reported as the error from scrubbing because
> the corrupted items are trivially observable.
> 
> It's all well and good to scan every piece of metadata for validity,
> but that doesn't mean we shouldn't think about what makes sense to
> report/log. It's going to be easy to drown the logs in corruption
> reports and make it impossible to find the needle that first caused
> it.
> 
> All I'm saying is blind verbosity like this is an indication that we
> haven't thought about how to group or classify corruptions
> sufficiently. Yes, we need to be able to detect all corrupted
> fields, but we need to recognise that many corruptions are
> essentially the same problem distributed across multiple fields and
> they'll all be repaired by a single repair action.
> 
> In this case, it doesn't matter if it is left or right sibling corruption
> because the analysis/repair work we need to do to correct either the
> left or right pointer is the same. i.e. we need to walk and validate
> the entire chain from both ends to repair a single bad pointer.
> Hence it doesn't matter if the left or right sibling is bad, the
> action we need to take to repair it is the same because we don't
> know if we're sitting on a wholly disconnected part of the chain
> (i.e. nasty level of tree corruption) or whether just that link got
> stomped on. i.e. bad sibling is an indication that both siblings may
> be invalid, not just the one we detected as bad....
> 
> So, yeah, "sibling corruption" means we need to check the entire
> sibling chain across all blocks in the btree level, not just in the
> direction of the bad pointer.

On some level it doesn't matter at all, since we return a single bit to
userspace, and repair just nukes the whole data structure and rebuilds
it from scratch...

...so I can go combine adjacent checks when I demacro the code; should
we decide for some reason to hurl floods of trace data back to userspace
we can always re-separate them.

> > > > +	/* Does the left sibling match the parent level left block? */
> > > > +	if (leftsib != NULLFSBLOCK) {
> > > > +		error = xfs_btree_dup_cursor(bs->cur, &ncur);
> > > > +		if (error)
> > > > +			return error;
> > > > +		error = xfs_btree_decrement(ncur, level + 1, &success);
> > > > +		XFS_SCRUB_BTKEY_OP_ERROR_GOTO(bs, level + 1, &error, out_cur);
> > > 
> > > Hmmm - if I read that right, there's a goto out_cur on error hidden
> > > in this macro....
> > 
> > No more hidden than XFS_WANT_CORRUPTED_GOTO. :)
> 
> Right, but I've been wanting to get rid of those XFS_WANT_CORRUPTED
> macros for a long, long time... :/
> 
> > > > +
> > > > +		pblock = xfs_btree_get_block(ncur, level + 1, &pbp);
> > > > +		pp = xfs_btree_ptr_addr(ncur, ncur->bc_ptrs[level + 1], pblock);
> > > > +		if (!xfs_scrub_btree_ptr(bs, level + 1, pp)) {
> > > > +			agbno = be32_to_cpu(pp->s);
> > > > +			XFS_SCRUB_BTKEY_CHECK(bs, level, agbno == leftsib);

FWIW I forgot to mention in my reply that the key pointer scrubber
doesn't actually have to check if the pointer is non-null but valid,
because at some point during scrub we'll try to use the pointer and if
it points somewhere bad we'll notice when we either get an io error or a
block that doesn't match the contents we want.

> > > > +		}
> > > > +
> > > > +		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
> > > > +		ncur = NULL;
> > > > +	}
> > > > +
> > > > +verify_rightsib:
> > > > +	if (ncur) {
> > > > +		xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
> > > > +		ncur = NULL;
> > > > +	}
> > > > +
> > > > +	/* Does the right sibling match the parent level right block? */
> > > > +	if (rightsib != NULLAGBLOCK) {
> > > 
> > > No "if (!error ...)" check here - I'm thinking there's some factoring
> > > needed here to reduce the code duplication going on here...
> > 
> > Ok, will do.  These two functions can be rewritten as a single function
> > that uses cur->bc_ops->diff_two_keys.  Then we discard
> > bs->check_siblings_fn.
> 
> Excellent!
> 
> > > > +/* AG scrubbing */
> > > > +
> > > 
> > > All this from here down doesn't seem related to scrubbing a btree?
> > > It's infrastructure for scanning AGs, but I don't see where it is
> > > called from - it looks unused at this point. I think it should be
> > > separated from the btree validation into its own patchset and put
> > > before the individual btree verification code...
> > > 
> > > I haven't really looked at this AG scrubbing code in depth because I
> > > can't tell how it fits into the code that is supposed to call it
> > > yet. I've read it, but without knowing how it's called I can't tell
> > > if the abstractions are just convoluted or whether they are
> > > necessary due to the infrastructure eventually ending up with
> > > multiple different call sites for some of the functions...
> > 
> > Lock AGI/AGF/AGFL, initialize cursors for all AG btree types, pick the
> > cursor we want and scan the AG, and use the other btree cursors to
> > cross-reference each record we see during the scan.
> > 
> > I regret now deferring the posting of the cross-referencing patches,
> > because some of the overdone bits in this patchset are done to avoid
> > code churn when that part gets added.  That'll make it (I hope) a little
> > more obvious why some things are the way they are.
> 
> Yeah, it's not easy to split large chunks of work up and maintain a
> split on a moving codebase.  I don't want you to split these patches
> up into fine grained patches because that's just unmanageable, but I
> think it's worthwhile to split out the bits that don't obviously
> appear to be related.

Yeah, I might have overreacted some; looking at the tracepoint, helper,
and btree scrub patches I think I only have to add a handful of patches.

> For this case, I don't think the follow on patch series would make
> any difference to my comments here, because going from btree
> verification code to AG setup infrastructure in a single patch is
> quite a shift in context. Patch splits on context switch boundaries
> really do help reviewing large chunks of new code :P

Yeah, sorry about that.  I got confused by my own code and thought we
were still talking about the helpers patch, not the btree scrub
framework patch.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH 05/22] xfs: scrub in-memory metadata buffers
  2017-07-25  0:14         ` Darrick J. Wong
@ 2017-07-25  3:32           ` Dave Chinner
  2017-07-25  5:27             ` Darrick J. Wong
  0 siblings, 1 reply; 63+ messages in thread
From: Dave Chinner @ 2017-07-25  3:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jul 24, 2017 at 05:14:33PM -0700, Darrick J. Wong wrote:
> On Tue, Jul 25, 2017 at 09:38:13AM +1000, Dave Chinner wrote:
> > On Mon, Jul 24, 2017 at 03:36:54PM -0700, Darrick J. Wong wrote:
> > > On Mon, Jul 24, 2017 at 11:43:27AM +1000, Dave Chinner wrote:
> > > > On Thu, Jul 20, 2017 at 09:39:00PM -0700, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > 
> > > > > Call the verifier function for all in-memory metadata buffers, looking
> > > > > for memory corruption either due to bad memory or coding bugs.
> > > > 
> > > > How does this fit into the bigger picture? We can't do an exhaustive
> > > > search of the in memory buffer cache, because access is racy w.r.t.
> > > > the life cycle of in memory buffers.
> > > > 
> > > > Also, if we are doing a full scrub, we're going to hit and then
> > > > check the cached in-memory buffers anyway, so I'm missing the
> > > > context that explains why this code is necessary.
> > > 
> > > Before we start scanning the filesystem (which could lead to clean
> > > buffers being pushed out of memory and later reread), we want to check
> > > the buffers that have been sitting around in memory to see if they've
> > > mutated since the last time the verifiers ran.
> > 
> > I'm not sure we need a special cache walk to do this.
> > 
> > My thinking is that if the buffers get pushed out of memory, the
> > verifier will be run at that time, so we don't need to run the
> > verifier before a scrub to avoid problems here.
> 
> Agreed.
> 
> > Further, if we read the buffer as part of the scrub and it's found
> > in cache, then if the scrub finds a corruption we'll know it
> > happened between the last verifier invocation and the scrub.
> 
> Hm.  Prior to the introduction of the metabufs scanner a few weeks ago, 
> I had thought it sufficient to assume that memory won't get corrupt, so
> as long as the read verifier ran at /some/ point in the past we didn't
> need to recheck now.
> 
> What if we scrap the metabufs scanner and adapt the read verifier
> function pointer to allow scrub to bypass the crc check and return the
> _THIS_IP_ from any failing structural test?  Then scrubbers can call the
> read verifier directly and extract failure info directly.

Yeah, that would work - rather than adapting the .read_verify op
we currently have, maybe a new op .read_verify_nocrc could be added?
That would mostly just be a different wrapper around the existing
verify functions that are shared between the read and write
verifiers...
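
A rough sketch of that shape, purely as illustration: struct buf,
buf_cksum() and the toy ops table below are hypothetical and much
simpler than xfs_buf and xfs_buf_ops.  The thread talks about returning
_THIS_IP_ from the failing test; the sketch returns a string naming the
test so that it stays portable outside the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much-simplified stand-in for a metadata buffer. */
struct buf {
	uint32_t magic;
	uint32_t crc;
	uint32_t nrecs;
	uint32_t maxrecs;
};

#define BUF_MAGIC 0x58465342u	/* arbitrary value for this sketch */

/* Toy checksum; a real verifier would checksum the whole block. */
static uint32_t buf_cksum(const struct buf *bp)
{
	return bp->magic ^ bp->nrecs ^ bp->maxrecs;
}

/*
 * Structural checks shared by every caller.  Returns NULL if the
 * buffer looks sane, otherwise the name of the first failing test.
 */
static const char *buf_verify_structure(const struct buf *bp)
{
	if (bp->magic != BUF_MAGIC)
		return "bad magic";
	if (bp->nrecs > bp->maxrecs)
		return "nrecs > maxrecs";
	return NULL;
}

/* Read-side wrapper: CRC first, then the shared structure checks. */
static const char *buf_read_verify(const struct buf *bp)
{
	if (bp->crc != buf_cksum(bp))
		return "bad crc";
	return buf_verify_structure(bp);
}

struct buf_ops {
	const char *(*read_verify)(const struct buf *bp);
	const char *(*verify_structure)(const struct buf *bp);	/* no crc */
};

static const struct buf_ops toy_buf_ops = {
	.read_verify		= buf_read_verify,
	.verify_structure	= buf_verify_structure,
};
```

Scrub would call the no-crc entry point on a buffer it already holds,
so a stale checksum on a dirty in-memory buffer does not get reported
as corruption.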

> > If the buffer is not in cache and scrub reads the metadata from
> > disk, then the verifier should fire on read if the item is corrupt
> > coming off disk. If the verifier doesn't find corruption in this
> > case but scrub does, then we've got to think about whether the
> > verifier has sufficient coverage.
> 
> Scrub has more comprehensive checks (or it will when xref comes along)
> so this is likely to happen, fyi.

Yup, I expect it will. :) I also expect this to point out where the
verifiers can be improved, because I'm sure we haven't caught all
the "obviously wrong" cases in the verifiers yet...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 06/22] xfs: scrub the backup superblocks
  2017-07-21  4:39 ` [PATCH 06/22] xfs: scrub the backup superblocks Darrick J. Wong
  2017-07-23 16:50   ` Allison Henderson
@ 2017-07-25  4:05   ` Dave Chinner
  2017-07-25  5:42     ` Darrick J. Wong
  1 sibling, 1 reply; 63+ messages in thread
From: Dave Chinner @ 2017-07-25  4:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jul 20, 2017 at 09:39:07PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Ensure that the geometry presented in the backup superblocks matches
> the primary superblock so that repair can recover the filesystem if
> that primary gets corrupted.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
....
> +int
> +xfs_scrub_setup_ag_header(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +
> +	if (sc->sm->sm_agno >= mp->m_sb.sb_agcount ||
> +	    sc->sm->sm_ino || sc->sm->sm_gen)
> +		return -EINVAL;
> +	return xfs_scrub_setup_fs(sc, ip);
> +}

Could we create a superblock buffer here that contains just the bits
we expect the secondary superblocks to have up to date (everything
else should be zero!), and then just use a memcmp() on the raw
secondary superblock buffer?

If there is a difference, then we can dig further to find what's
wrong?
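
Something along these lines is what I have in mind.  This is only a
sketch: the 512-byte sector size, the field offsets and struct geom are
invented for illustration and do not match the real on-disk layout, and
it copies host-endian values where the real superblock is big-endian.

```c
#include <stdint.h>
#include <string.h>

#define SB_SECTOR_SIZE 512	/* assumed sector size for this sketch */

/* Hypothetical field offsets within the sector; not the real layout. */
enum { OFF_BLOCKSIZE = 0, OFF_DBLOCKS = 8, OFF_AGCOUNT = 16 };

/* The handful of fields we expect every backup to carry. */
struct geom {
	uint32_t blocksize;
	uint64_t dblocks;
	uint32_t agcount;
};

/* Build the sector we expect to see: known fields set, all else zero. */
static void build_expected(uint8_t *buf, const struct geom *g)
{
	memset(buf, 0, SB_SECTOR_SIZE);
	memcpy(buf + OFF_BLOCKSIZE, &g->blocksize, sizeof(g->blocksize));
	memcpy(buf + OFF_DBLOCKS, &g->dblocks, sizeof(g->dblocks));
	memcpy(buf + OFF_AGCOUNT, &g->agcount, sizeof(g->agcount));
}

/*
 * Fast path is a single memcmp().  Only on a mismatch do we dig for
 * the first differing byte.  Returns -1 if the sectors are identical.
 */
static int first_mismatch(const uint8_t *expected, const uint8_t *ondisk)
{
	if (memcmp(expected, ondisk, SB_SECTOR_SIZE) == 0)
		return -1;
	for (int i = 0; i < SB_SECTOR_SIZE; i++)
		if (expected[i] != ondisk[i])
			return i;
	return -1;
}
```

Because the expected buffer is zeroed first, the same comparison also
catches garbage in unused fields and in the space past the end of the
defined superblock.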

> +/* Superblock */
> +
> +#define XFS_SCRUB_SB_CHECK(fs_ok) \
> +	XFS_SCRUB_CHECK(sc, bp, "superblock", fs_ok)
> +#define XFS_SCRUB_SB_PREEN(fs_ok) \
> +	XFS_SCRUB_PREEN(sc, bp, "superblock", fs_ok)

I don't understand from reading the code why some fields are checked
and others are preened. A comment explaining this would be helpful.

> +#define XFS_SCRUB_SB_OP_ERROR_GOTO(label) \
> +	XFS_SCRUB_OP_ERROR_GOTO(sc, agno, 0, "superblock", &error, out)
> +/* Scrub the filesystem superblock. */
> +int
> +xfs_scrub_superblock(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_buf			*bp;
> +	struct xfs_sb			sb;
> +	xfs_agnumber_t			agno;
> +	uint32_t			v2_ok;
> +	int				error;
> +
> +	agno = sc->sm->sm_agno;
> +
> +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> +		  XFS_AGB_TO_DADDR(mp, agno, XFS_SB_BLOCK(mp)),
> +		  XFS_FSS_TO_BB(mp, 1), 0, &bp, &xfs_sb_buf_ops);
> +	if (error) {
> +		trace_xfs_scrub_block_error(mp, agno, XFS_SB_BLOCK(mp),
> +				"superblock", "error != 0", __func__, __LINE__);
> +		error = 0;
> +		sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
> +		goto out;
> +	}
> +
> +	/*
> +	 * The in-core sb is a more up-to-date copy of AG 0's sb,
> +	 * so there's no point in comparing the two.
> +	 */
> +	if (agno == 0)
> +		goto out;

Check this before reading the sb buffer?

> +	xfs_sb_from_disk(&sb, XFS_BUF_TO_SBP(bp));

Ok, there's a problem here - the on-disk superblock needs all unused
fields, empty space and feature bit conditional fields to be zero on
disk. Unused and feature dependent fields aren't necessarily zero in
memory, so we're not really scrubbing the on-disk superblock here.

Also, all the space between the end of the defined superblock and
the end of the superblock sector must be zero, so scrubbing needs to
verify that, too.


> +
> +	/* Verify the geometries match. */
> +#define XFS_SCRUB_SB_FIELD(fn) \
> +		XFS_SCRUB_SB_CHECK(sb.sb_##fn == mp->m_sb.sb_##fn)
> +#define XFS_PREEN_SB_FIELD(fn) \
> +		XFS_SCRUB_SB_PREEN(sb.sb_##fn == mp->m_sb.sb_##fn)
> +	XFS_SCRUB_SB_FIELD(blocksize);
> +	XFS_SCRUB_SB_FIELD(dblocks);
> +	XFS_SCRUB_SB_FIELD(rblocks);
> +	XFS_SCRUB_SB_FIELD(rextents);
> +	XFS_SCRUB_SB_PREEN(uuid_equal(&sb.sb_uuid, &mp->m_sb.sb_uuid));

Isn't this dependent on the xfs_sb_version_hasmetauuid() feature?
Regardless, I think this should be part of the checks done based on
that feature bit below...

....

> +	if (xfs_sb_version_hascrc(&mp->m_sb)) {
> +		XFS_SCRUB_SB_CHECK(!xfs_sb_has_compat_feature(&sb,
> +				XFS_SB_FEAT_COMPAT_UNKNOWN));
> +		XFS_SCRUB_SB_CHECK(!xfs_sb_has_ro_compat_feature(&sb,
> +				XFS_SB_FEAT_RO_COMPAT_UNKNOWN));
> +		XFS_SCRUB_SB_CHECK(!xfs_sb_has_incompat_feature(&sb,
> +				XFS_SB_FEAT_INCOMPAT_UNKNOWN));
> +		XFS_SCRUB_SB_CHECK(!xfs_sb_has_incompat_log_feature(&sb,
> +				XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN));
> +		XFS_SCRUB_SB_FIELD(spino_align);
> +		XFS_PREEN_SB_FIELD(pquotino);
> +	}

else all these fields should be zero on disk.

> +	if (xfs_sb_version_hasmetauuid(&mp->m_sb)) {
> +		XFS_SCRUB_SB_CHECK(uuid_equal(&sb.sb_meta_uuid,
> +					&mp->m_sb.sb_meta_uuid));
> +		XFS_SCRUB_SB_CHECK(uuid_equal(&sb.sb_uuid,
> +					&mp->m_sb.sb_uuid));
> +	} else
> +		XFS_SCRUB_SB_CHECK(uuid_equal(&sb.sb_uuid,
> +					&mp->m_sb.sb_meta_uuid));

That's checking in-memory state is valid, not that the on-disk
sb_meta_uuid field is zero for this case.

> +#undef XFS_SCRUB_SB_FIELD
> +
> +#define XFS_SCRUB_SB_FEAT(fn) \
> +		XFS_SCRUB_SB_CHECK(xfs_sb_version_has##fn(&sb) == \
> +		xfs_sb_version_has##fn(&mp->m_sb))
> +	XFS_SCRUB_SB_FEAT(align);
> +	XFS_SCRUB_SB_FEAT(dalign);
> +	XFS_SCRUB_SB_FEAT(logv2);
> +	XFS_SCRUB_SB_FEAT(extflgbit);
> +	XFS_SCRUB_SB_FEAT(sector);
> +	XFS_SCRUB_SB_FEAT(asciici);
> +	XFS_SCRUB_SB_FEAT(morebits);
> +	XFS_SCRUB_SB_FEAT(lazysbcount);
> +	XFS_SCRUB_SB_FEAT(crc);
> +	XFS_SCRUB_SB_FEAT(_pquotino);
> +	XFS_SCRUB_SB_FEAT(ftype);
> +	XFS_SCRUB_SB_FEAT(finobt);
> +	XFS_SCRUB_SB_FEAT(sparseinodes);
> +	XFS_SCRUB_SB_FEAT(metauuid);
> +	XFS_SCRUB_SB_FEAT(rmapbt);
> +	XFS_SCRUB_SB_FEAT(reflink);
> +#undef XFS_SCRUB_SB_FEAT

Do we need bit by bit feature checks? It's trivial to look up the
mismatched bits from just the raw values....
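
As a sketch of what I mean, assuming the feature words are plain 32-bit
masks (the helper names here are made up): XOR the two raw values and
any set bit in the result is a feature that differs between the primary
and the backup.

```c
#include <stdint.h>

/* Bits that differ between the primary and backup feature words. */
static uint32_t feature_mismatch(uint32_t primary, uint32_t backup)
{
	return primary ^ backup;
}

/* Index of the lowest differing bit, or -1 if the words match. */
static int lowest_mismatched_bit(uint32_t diff)
{
	for (int bit = 0; bit < 32; bit++)
		if (diff & (UINT32_C(1) << bit))
			return bit;
	return -1;
}
```

One comparison per feature word flags the corruption; decoding which
bit it was can be left to whoever reads the trace.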

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 05/22] xfs: scrub in-memory metadata buffers
  2017-07-25  3:32           ` Dave Chinner
@ 2017-07-25  5:27             ` Darrick J. Wong
  0 siblings, 0 replies; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-25  5:27 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jul 25, 2017 at 01:32:03PM +1000, Dave Chinner wrote:
> On Mon, Jul 24, 2017 at 05:14:33PM -0700, Darrick J. Wong wrote:
> > On Tue, Jul 25, 2017 at 09:38:13AM +1000, Dave Chinner wrote:
> > > On Mon, Jul 24, 2017 at 03:36:54PM -0700, Darrick J. Wong wrote:
> > > > On Mon, Jul 24, 2017 at 11:43:27AM +1000, Dave Chinner wrote:
> > > > > On Thu, Jul 20, 2017 at 09:39:00PM -0700, Darrick J. Wong wrote:
> > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > 
> > > > > > Call the verifier function for all in-memory metadata buffers, looking
> > > > > > for memory corruption either due to bad memory or coding bugs.
> > > > > 
> > > > > How does this fit into the bigger picture? We can't do an exhaustive
> > > > > search of the in memory buffer cache, because access is racy w.r.t.
> > > > > the life cycle of in memory buffers.
> > > > > 
> > > > > Also, if we are doing a full scrub, we're going to hit and then
> > > > > check the cached in-memory buffers anyway, so I'm missing the
> > > > > context that explains why this code is necessary.
> > > > 
> > > > Before we start scanning the filesystem (which could lead to clean
> > > > buffers being pushed out of memory and later reread), we want to check
> > > > the buffers that have been sitting around in memory to see if they've
> > > > mutated since the last time the verifiers ran.
> > > 
> > > I'm not sure we need a special cache walk to do this.
> > > 
> > > My thinking is that if the buffers get pushed out of memory, the
> > > verifier will be run at that time, so we don't need to run the
> > > verifier before a scrub to avoid problems here.
> > 
> > Agreed.
> > 
> > > Further, if we read the buffer as part of the scrub and it's found
> > > in cache, then if the scrub finds a corruption we'll know it
> > > happened between the last verifier invocation and the scrub.
> > 
> > Hm.  Prior to the introduction of the metabufs scanner a few weeks ago, 
> > I had thought it sufficient to assume that memory won't get corrupt, so
> > as long as the read verifier ran at /some/ point in the past we didn't
> > need to recheck now.
> > 
> > What if we scrap the metabufs scanner and adapt the read verifier
> > function pointer to allow scrub to bypass the crc check and return the
> > _THIS_IP_ from any failing structural test?  Then scrubbers can call the
> > read verifier directly and extract failure info directly.
> 
> Yeah, that would work - rather than adapting the .read_verify op
> we currently have, maybe a new op .read_verify_nocrc could be added?
> That would mostly just be a different wrapper around the existing
> verify functions that are shared between the read and write
> verifiers...

I was going to call them ->verify_structure() or something like that.

> > > If the buffer is not in cache and scrub reads the metadata from
> > > disk, then the verifier should fire on read if the item is corrupt
> > > coming off disk. If the verifier doesn't find corruption in this
> > > case but scrub does, then we've got to think about whether the
> > > verifier has sufficient coverage.
> > 
> > Scrub has more comprehensive checks (or it will when xref comes along)
> > so this is likely to happen, fyi.
> 
> Yup, I expect it will. :) I also expect this to point out where the
> verifiers can be improved, because I'm sure we haven't caught all
> the "obviously wrong" cases in the verifiers yet...

Well yes. :)  I have been running the "dangerous_fuzzers dangerous_scrub
dangerous_repair" tests again, as they fuzz everything to see what scrub
complains about vs. what repair actually tries to fix.

Current wtf list:

- do we need to check all the padding areas? does repair?
- problems with repair not resetting di_nlink when it moves stuff to l+f?
  (x/380)
- repair doesn't correct remote symlink blocks
- scrub doesn't barf on free block problems?? (x/396)
- repair doesn't notice attr remote value block problems? (x/404)
- XFS_WANT_CORRUPTED_ are verifiers, but should not trigger asserts?
- xfs_repair segfaults when freetag is too big

(Actually, I already sent a fix a couple of days ago for that last one,
but nobody has reviewed it.)

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH 06/22] xfs: scrub the backup superblocks
  2017-07-25  4:05   ` Dave Chinner
@ 2017-07-25  5:42     ` Darrick J. Wong
  0 siblings, 0 replies; 63+ messages in thread
From: Darrick J. Wong @ 2017-07-25  5:42 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jul 25, 2017 at 02:05:00PM +1000, Dave Chinner wrote:
> On Thu, Jul 20, 2017 at 09:39:07PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Ensure that the geometry presented in the backup superblocks matches
> > the primary superblock so that repair can recover the filesystem if
> > that primary gets corrupted.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ....
> > +int
> > +xfs_scrub_setup_ag_header(
> > +	struct xfs_scrub_context	*sc,
> > +	struct xfs_inode		*ip)
> > +{
> > +	struct xfs_mount		*mp = sc->mp;
> > +
> > +	if (sc->sm->sm_agno >= mp->m_sb.sb_agcount ||
> > +	    sc->sm->sm_ino || sc->sm->sm_gen)
> > +		return -EINVAL;
> > +	return xfs_scrub_setup_fs(sc, ip);
> > +}
> 
> Could we create a superblock buffer here that contains just the bits
> we expect the secondary superblocks to have up to date (everything
> else should be zero!), and then just use a memcmp() on the raw
> secondary superblock buffer?
> 
> If there is a difference, then we can dig further to find what's
> wrong?

Sure.

> > +/* Superblock */
> > +
> > +#define XFS_SCRUB_SB_CHECK(fs_ok) \
> > +	XFS_SCRUB_CHECK(sc, bp, "superblock", fs_ok)
> > +#define XFS_SCRUB_SB_PREEN(fs_ok) \
> > +	XFS_SCRUB_PREEN(sc, bp, "superblock", fs_ok)
> 
> I don't understand from reading the code why some fields are checked
> and others are preened. A comment explaining this would be helpful.

Ok.

/* 
 * Superblock fields that are set at mkfs time are checked.
 * Fields in super block 0 that can be updated after mkfs and
 * not copied to the backup superblocks are preened.
 */
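
In other words, roughly the following.  This is a standalone sketch,
not the patch: struct toy_sb and the FLAG_* names are invented here
(FLAG_CORRUPT plays the role of XFS_SCRUB_FLAG_CORRUPT, and the preen
flag name is a guess).  The field choice follows the patch, where
blocksize and dblocks are checked and pquotino is preened.

```c
#include <stdint.h>

#define FLAG_CORRUPT	(1u << 0)	/* hypothetical: must be repaired */
#define FLAG_PREEN	(1u << 1)	/* hypothetical: could be tidied up */

/* Hypothetical cut-down superblock with three representative fields. */
struct toy_sb {
	uint32_t blocksize;	/* set at mkfs time */
	uint64_t dblocks;	/* checked by the patch */
	uint64_t pquotino;	/* may change after mkfs on sb 0 only */
};

/* Compare a backup superblock against the primary. */
static uint32_t
scrub_backup_sb(const struct toy_sb *primary, const struct toy_sb *backup)
{
	uint32_t flags = 0;

	/* Checked fields: a mismatch means the backup is unusable. */
	if (backup->blocksize != primary->blocksize)
		flags |= FLAG_CORRUPT;
	if (backup->dblocks != primary->dblocks)
		flags |= FLAG_CORRUPT;

	/* Preened field: staleness is expected, so only note it. */
	if (backup->pquotino != primary->pquotino)
		flags |= FLAG_PREEN;

	return flags;
}
```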

> 
> > +#define XFS_SCRUB_SB_OP_ERROR_GOTO(label) \
> > +	XFS_SCRUB_OP_ERROR_GOTO(sc, agno, 0, "superblock", &error, out)
> > +/* Scrub the filesystem superblock. */
> > +int
> > +xfs_scrub_superblock(
> > +	struct xfs_scrub_context	*sc)
> > +{
> > +	struct xfs_mount		*mp = sc->mp;
> > +	struct xfs_buf			*bp;
> > +	struct xfs_sb			sb;
> > +	xfs_agnumber_t			agno;
> > +	uint32_t			v2_ok;
> > +	int				error;
> > +
> > +	agno = sc->sm->sm_agno;
> > +
> > +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> > +		  XFS_AGB_TO_DADDR(mp, agno, XFS_SB_BLOCK(mp)),
> > +		  XFS_FSS_TO_BB(mp, 1), 0, &bp, &xfs_sb_buf_ops);
> > +	if (error) {
> > +		trace_xfs_scrub_block_error(mp, agno, XFS_SB_BLOCK(mp),
> > +				"superblock", "error != 0", __func__, __LINE__);
> > +		error = 0;
> > +		sc->sm->sm_flags |= XFS_SCRUB_FLAG_CORRUPT;
> > +		goto out;
> > +	}
> > +
> > +	/*
> > +	 * The in-core sb is a more up-to-date copy of AG 0's sb,
> > +	 * so there's no point in comparing the two.
> > +	 */
> > +	if (agno == 0)
> > +		goto out;
> 
> Check this before reading the sb buffer?
> 
> > +	xfs_sb_from_disk(&sb, XFS_BUF_TO_SBP(bp));
> 
> Ok, there's a problem here - the on-disk superblock needs all unused
> fields, empty space and feature bit conditional fields to be zero on
> disk. Unused and feature dependent fields aren't necessarily zero in
> memory, so we're not really scrubbing the on-disk superblock here.

Ok.

> Also, all the space between the end of the defined superblock and
> the end of the superblock sector must be zero, so scrubbing needs to
> verify that, too.

Ok.

> 
> > +
> > +	/* Verify the geometries match. */
> > +#define XFS_SCRUB_SB_FIELD(fn) \
> > +		XFS_SCRUB_SB_CHECK(sb.sb_##fn == mp->m_sb.sb_##fn)
> > +#define XFS_PREEN_SB_FIELD(fn) \
> > +		XFS_SCRUB_SB_PREEN(sb.sb_##fn == mp->m_sb.sb_##fn)
> > +	XFS_SCRUB_SB_FIELD(blocksize);
> > +	XFS_SCRUB_SB_FIELD(dblocks);
> > +	XFS_SCRUB_SB_FIELD(rblocks);
> > +	XFS_SCRUB_SB_FIELD(rextents);
> > +	XFS_SCRUB_SB_PREEN(uuid_equal(&sb.sb_uuid, &mp->m_sb.sb_uuid));
> 
> Isn't this dependent on the xfs_sb_version_hasmetauuid() feature?
> Regardless, I think this should be part of the checks done based on
> that feature bit below...

I don't think it's dependent on hasmetauuid.  sb_uuid is the admin-set
uuid, which ought to be the same on all supers, right?  So that if we
set a new uuid, break sb 0, and have repair fix the fs, the uuid won't
suddenly shift.

Versus sb_meta_uuid, which /has/ to match on all supers if hasmetauuid is set.
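
Spelled out as a sketch, with the zero-on-disk rule from your later
comment folded in.  The toy_* types and helpers are invented for
illustration, the backup struct stands for the raw on-disk contents,
and the single bool return glosses over the check vs. preen
distinction.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } toy_uuid_t;

/* Hypothetical cut-down superblock holding only the UUID state. */
struct toy_sb {
	toy_uuid_t	uuid;		/* admin-visible uuid */
	toy_uuid_t	meta_uuid;	/* only meaningful with metauuid */
	bool		has_metauuid;
};

static bool toy_uuid_equal(const toy_uuid_t *a, const toy_uuid_t *b)
{
	return memcmp(a->b, b->b, sizeof(a->b)) == 0;
}

static bool toy_uuid_is_zero(const toy_uuid_t *u)
{
	static const toy_uuid_t zero;	/* static storage is all zeroes */

	return toy_uuid_equal(u, &zero);
}

/* True if the backup's UUID fields are consistent with the primary. */
static bool
backup_uuids_ok(const struct toy_sb *primary, const struct toy_sb *backup)
{
	/* sb_uuid should be the same on every superblock, feature or not. */
	if (!toy_uuid_equal(&backup->uuid, &primary->uuid))
		return false;

	/* With the feature set, sb_meta_uuid must match as well. */
	if (primary->has_metauuid)
		return toy_uuid_equal(&backup->meta_uuid, &primary->meta_uuid);

	/* Without it, the on-disk field is unused and should be zero. */
	return toy_uuid_is_zero(&backup->meta_uuid);
}
```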

> ....
> 
> > +	if (xfs_sb_version_hascrc(&mp->m_sb)) {
> > +		XFS_SCRUB_SB_CHECK(!xfs_sb_has_compat_feature(&sb,
> > +				XFS_SB_FEAT_COMPAT_UNKNOWN));
> > +		XFS_SCRUB_SB_CHECK(!xfs_sb_has_ro_compat_feature(&sb,
> > +				XFS_SB_FEAT_RO_COMPAT_UNKNOWN));
> > +		XFS_SCRUB_SB_CHECK(!xfs_sb_has_incompat_feature(&sb,
> > +				XFS_SB_FEAT_INCOMPAT_UNKNOWN));
> > +		XFS_SCRUB_SB_CHECK(!xfs_sb_has_incompat_log_feature(&sb,
> > +				XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN));
> > +		XFS_SCRUB_SB_FIELD(spino_align);
> > +		XFS_PREEN_SB_FIELD(pquotino);
> > +	}
> 
> else all these fields should be zero on disk.

Ok.

> > +	if (xfs_sb_version_hasmetauuid(&mp->m_sb)) {
> > +		XFS_SCRUB_SB_CHECK(uuid_equal(&sb.sb_meta_uuid,
> > +					&mp->m_sb.sb_meta_uuid));
> > +		XFS_SCRUB_SB_CHECK(uuid_equal(&sb.sb_uuid,
> > +					&mp->m_sb.sb_uuid));
> > +	} else
> > +		XFS_SCRUB_SB_CHECK(uuid_equal(&sb.sb_uuid,
> > +					&mp->m_sb.sb_meta_uuid));
> 
> That's checking in-memory state is valid, not that the on-disk
> sb_meta_uuid field is zero for this case.

Eeyuck, that needs some TLC indeed.

> > +#undef XFS_SCRUB_SB_FIELD
> > +
> > +#define XFS_SCRUB_SB_FEAT(fn) \
> > +		XFS_SCRUB_SB_CHECK(xfs_sb_version_has##fn(&sb) == \
> > +		xfs_sb_version_has##fn(&mp->m_sb))
> > +	XFS_SCRUB_SB_FEAT(align);
> > +	XFS_SCRUB_SB_FEAT(dalign);
> > +	XFS_SCRUB_SB_FEAT(logv2);
> > +	XFS_SCRUB_SB_FEAT(extflgbit);
> > +	XFS_SCRUB_SB_FEAT(sector);
> > +	XFS_SCRUB_SB_FEAT(asciici);
> > +	XFS_SCRUB_SB_FEAT(morebits);
> > +	XFS_SCRUB_SB_FEAT(lazysbcount);
> > +	XFS_SCRUB_SB_FEAT(crc);
> > +	XFS_SCRUB_SB_FEAT(_pquotino);
> > +	XFS_SCRUB_SB_FEAT(ftype);
> > +	XFS_SCRUB_SB_FEAT(finobt);
> > +	XFS_SCRUB_SB_FEAT(sparseinodes);
> > +	XFS_SCRUB_SB_FEAT(metauuid);
> > +	XFS_SCRUB_SB_FEAT(rmapbt);
> > +	XFS_SCRUB_SB_FEAT(reflink);
> > +#undef XFS_SCRUB_SB_FEAT
> 
> Do we need bit by bit feature checks? It's trivial to look up the
> mismatched bits from just the raw values....

We could forgo this.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


Thread overview: 63+ messages
2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
2017-07-21  4:38 ` [PATCH 01/22] xfs: query the per-AG reservation counters Darrick J. Wong
2017-07-23 16:16   ` Allison Henderson
2017-07-23 22:25   ` Dave Chinner
2017-07-24 19:07     ` Darrick J. Wong
2017-07-21  4:38 ` [PATCH 02/22] xfs: add scrub tracepoints Darrick J. Wong
2017-07-23 16:23   ` Allison Henderson
2017-07-21  4:38 ` [PATCH 03/22] xfs: create an ioctl to scrub AG metadata Darrick J. Wong
2017-07-23 16:37   ` Allison Henderson
2017-07-23 23:45   ` Dave Chinner
2017-07-24 21:14     ` Darrick J. Wong
2017-07-21  4:38 ` [PATCH 04/22] xfs: generic functions to scrub metadata and btrees Darrick J. Wong
2017-07-23 16:40   ` Allison Henderson
2017-07-24  1:05   ` Dave Chinner
2017-07-24 21:58     ` Darrick J. Wong
2017-07-24 23:15       ` Dave Chinner
2017-07-25  0:39         ` Darrick J. Wong
2017-07-21  4:39 ` [PATCH 05/22] xfs: scrub in-memory metadata buffers Darrick J. Wong
2017-07-23 16:48   ` Allison Henderson
2017-07-24  1:43   ` Dave Chinner
2017-07-24 22:36     ` Darrick J. Wong
2017-07-24 23:38       ` Dave Chinner
2017-07-25  0:14         ` Darrick J. Wong
2017-07-25  3:32           ` Dave Chinner
2017-07-25  5:27             ` Darrick J. Wong
2017-07-21  4:39 ` [PATCH 06/22] xfs: scrub the backup superblocks Darrick J. Wong
2017-07-23 16:50   ` Allison Henderson
2017-07-25  4:05   ` Dave Chinner
2017-07-25  5:42     ` Darrick J. Wong
2017-07-21  4:39 ` [PATCH 07/22] xfs: scrub AGF and AGFL Darrick J. Wong
2017-07-23 16:59   ` Allison Henderson
2017-07-21  4:39 ` [PATCH 08/22] xfs: scrub the AGI Darrick J. Wong
2017-07-23 17:02   ` Allison Henderson
2017-07-21  4:39 ` [PATCH 09/22] xfs: scrub free space btrees Darrick J. Wong
2017-07-23 17:09   ` Allison Henderson
2017-07-21  4:39 ` [PATCH 10/22] xfs: scrub inode btrees Darrick J. Wong
2017-07-23 17:15   ` Allison Henderson
2017-07-21  4:39 ` [PATCH 11/22] xfs: scrub rmap btrees Darrick J. Wong
2017-07-23 17:21   ` Allison Henderson
2017-07-21  4:39 ` [PATCH 12/22] xfs: scrub refcount btrees Darrick J. Wong
2017-07-23 17:25   ` Allison Henderson
2017-07-21  4:39 ` [PATCH 13/22] xfs: scrub inodes Darrick J. Wong
2017-07-23 17:38   ` Allison Henderson
2017-07-24 20:02     ` Darrick J. Wong
2017-07-21  4:40 ` [PATCH 14/22] xfs: scrub inode block mappings Darrick J. Wong
2017-07-23 17:41   ` Allison Henderson
2017-07-24 20:05     ` Darrick J. Wong
2017-07-21  4:40 ` [PATCH 15/22] xfs: scrub directory/attribute btrees Darrick J. Wong
2017-07-23 17:45   ` Allison Henderson
2017-07-21  4:40 ` [PATCH 16/22] xfs: scrub directory metadata Darrick J. Wong
2017-07-23 17:51   ` Allison Henderson
2017-07-21  4:40 ` [PATCH 17/22] xfs: scrub directory freespace Darrick J. Wong
2017-07-23 17:55   ` Allison Henderson
2017-07-21  4:40 ` [PATCH 18/22] xfs: scrub extended attributes Darrick J. Wong
2017-07-23 17:57   ` Allison Henderson
2017-07-21  4:40 ` [PATCH 19/22] xfs: scrub symbolic links Darrick J. Wong
2017-07-23 17:59   ` Allison Henderson
2017-07-21  4:40 ` [PATCH 20/22] xfs: scrub parent pointers Darrick J. Wong
2017-07-23 18:03   ` Allison Henderson
2017-07-21  4:40 ` [PATCH 21/22] xfs: scrub realtime bitmap/summary Darrick J. Wong
2017-07-23 18:05   ` Allison Henderson
2017-07-21  4:40 ` [PATCH 22/22] xfs: scrub quota information Darrick J. Wong
2017-07-23 18:07   ` Allison Henderson
