* [PATCH v12 00/30] xfs: online scrub support
@ 2017-10-12  1:40 Darrick J. Wong
  2017-10-12  1:40 ` [PATCH 01/30] xfs: return a distinct error code value for IGET_INCORE cache misses Darrick J. Wong
                   ` (29 more replies)
  0 siblings, 30 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:40 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

Hi all,

This is the twelfth revision of a patchset that adds kernel support to
XFS for online metadata scrubbing and repair.  There aren't any on-disk
format changes.  Changes since v11: I addressed all of Dave Chinner's
review comments by adding more helper functions and de-grossifying the
scrub code, and rebased to 4.14-rc4.  I have been performing daily
online scrubs of my XFS filesystems for several months now, with
surprisingly few problems.

Online scrub/repair support consists of four major pieces -- first, an
ioctl that maps physical extents to their owners (GETFSMAP; already in
4.12); second, various in-kernel metadata scrubbing ioctls to examine
metadata records and cross-reference them with other filesystem
metadata; third, an in-kernel mechanism for rebuilding damaged metadata
objects and btrees; and fourth, a userspace component to coordinate
scrubbing and repair operations.

This new utility, xfs_scrub, is separate from the existing offline
xfs_repair tool.  The program uses various XFS ioctls to iterate all XFS
metadata and asks the kernel to check the metadata and repair it if
necessary.

While I understand that reviewer bandwidth is limited, I would like to
get this series prepped for 4.15, if possible.  I have isolated the
scrub code such that it can be compiled out entirely, in the hopes that
we can stabilize the code while not exposing regular users to riskier
code.

If you're going to start using this mess, you probably ought to just
pull from my git trees.  The kernel patches[1] should apply against
4.14-rc4.  xfsprogs[2] and xfstests[3] can be found in their usual
places.  The git trees contain all four series' worth of changes.

This is an extraordinary way to eat your data.  Enjoy!
Comments and questions are, as always, welcome.

--D

[1] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=djwong-devel
[2] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=djwong-devel
[3] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=djwong-devel


* [PATCH 01/30] xfs: return a distinct error code value for IGET_INCORE cache misses
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
@ 2017-10-12  1:40 ` Darrick J. Wong
  2017-10-12  5:25   ` Dave Chinner
  2017-10-12  1:40 ` [PATCH 02/30] xfs: create block pointer check functions Darrick J. Wong
                   ` (28 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:40 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, Brian Foster

From: Darrick J. Wong <darrick.wong@oracle.com>

For an XFS_IGET_INCORE iget operation, if the inode isn't in the cache,
return ENODATA so that we don't confuse it with the pre-existing ENOENT
cases (inode is in cache, but freed).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_icache.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 3422711..43005fb 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -610,7 +610,7 @@ xfs_iget(
 	} else {
 		rcu_read_unlock();
 		if (flags & XFS_IGET_INCORE) {
-			error = -ENOENT;
+			error = -ENODATA;
 			goto out_error_or_again;
 		}
 		XFS_STATS_INC(mp, xs_ig_missed);
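
To illustrate the distinction for anyone following along, here is a
minimal userspace sketch, not the kernel code.  The toy_inode structure
and toy_iget_incore() are hypothetical stand-ins for the inode cache and
an XFS_IGET_INCORE lookup; only the two error codes come from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one slot of the in-core inode cache. */
struct toy_inode {
	unsigned long	ino;
	bool		freed;	/* still cached, but the inode was freed */
};

/*
 * Hypothetical cache-only lookup.  Returns 0 and sets *ipp on a hit,
 * -ENOENT if the inode is cached but freed, and -ENODATA if the inode
 * is not in the cache at all.
 */
static int
toy_iget_incore(
	struct toy_inode	*cache,
	size_t			nr,
	unsigned long		ino,
	struct toy_inode	**ipp)
{
	size_t			i;

	*ipp = NULL;
	for (i = 0; i < nr; i++) {
		if (cache[i].ino != ino)
			continue;
		if (cache[i].freed)
			return -ENOENT;
		*ipp = &cache[i];
		return 0;
	}
	return -ENODATA;	/* cache miss */
}
```

With this split, a caller that only wants cached inodes can treat
-ENODATA as "nothing in core to examine" while still handling -ENOENT
as "in cache, but freed".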



* [PATCH 02/30] xfs: create block pointer check functions
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
  2017-10-12  1:40 ` [PATCH 01/30] xfs: return a distinct error code value for IGET_INCORE cache misses Darrick J. Wong
@ 2017-10-12  1:40 ` Darrick J. Wong
  2017-10-12  5:28   ` Dave Chinner
  2017-10-12  1:41 ` [PATCH 03/30] xfs: refactor btree pointer checks Darrick J. Wong
                   ` (27 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:40 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create some helper functions to check that a block pointer points
within the filesystem (or AG) and doesn't point at static metadata.
We will use this for scrub.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c    |   49 ++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_alloc.h    |    4 +++
 fs/xfs/libxfs/xfs_rtbitmap.c |   12 ++++++++++
 fs/xfs/xfs_rtalloc.h         |    2 ++
 4 files changed, 67 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 744dcae..bd3a943 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2923,3 +2923,52 @@ xfs_alloc_query_all(
 	query.fn = fn;
 	return xfs_btree_query_all(cur, xfs_alloc_query_range_helper, &query);
 }
+
+/* Find the size of the AG, in blocks. */
+xfs_agblock_t
+xfs_ag_block_count(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	ASSERT(agno < mp->m_sb.sb_agcount);
+
+	if (agno < mp->m_sb.sb_agcount - 1)
+		return mp->m_sb.sb_agblocks;
+	return mp->m_sb.sb_dblocks - (agno * mp->m_sb.sb_agblocks);
+}
+
+/*
+ * Verify that an AG block number pointer neither points outside the AG
+ * nor points at static metadata.
+ */
+bool
+xfs_verify_agbno_ptr(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno)
+{
+	xfs_agblock_t		eoag;
+
+	eoag = xfs_ag_block_count(mp, agno);
+	if (agbno >= eoag)
+		return false;
+	if (agbno <= XFS_AGFL_BLOCK(mp))
+		return false;
+	return true;
+}
+
+/*
+ * Verify that an FS block number pointer neither points outside the
+ * filesystem nor points at static AG metadata.
+ */
+bool
+xfs_verify_fsbno_ptr(
+	struct xfs_mount	*mp,
+	xfs_fsblock_t		fsbno)
+{
+	xfs_agnumber_t		agno = XFS_FSB_TO_AGNO(mp, fsbno);
+
+	if (agno >= mp->m_sb.sb_agcount)
+		return false;
+	return xfs_verify_agbno_ptr(mp, agno, XFS_FSB_TO_AGBNO(mp, fsbno));
+}
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index ef26edc..3185807 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -232,5 +232,9 @@ int xfs_alloc_query_range(struct xfs_btree_cur *cur,
 		xfs_alloc_query_range_fn fn, void *priv);
 int xfs_alloc_query_all(struct xfs_btree_cur *cur, xfs_alloc_query_range_fn fn,
 		void *priv);
+xfs_agblock_t xfs_ag_block_count(struct xfs_mount *mp, xfs_agnumber_t agno);
+bool xfs_verify_agbno_ptr(struct xfs_mount *mp, xfs_agnumber_t agno,
+		xfs_agblock_t agbno);
+bool xfs_verify_fsbno_ptr(struct xfs_mount *mp, xfs_fsblock_t fsbno);
 
 #endif	/* __XFS_ALLOC_H__ */
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
index 5d4e43e..0a49348 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -1086,3 +1086,15 @@ xfs_rtalloc_query_all(
 
 	return xfs_rtalloc_query_range(tp, &keys[0], &keys[1], fn, priv);
 }
+
+/*
+ * Verify that a realtime block number pointer doesn't point off the
+ * end of the realtime device.
+ */
+bool
+xfs_verify_rtbno_ptr(
+	struct xfs_mount	*mp,
+	xfs_rtblock_t		rtbno)
+{
+	return rtbno < mp->m_sb.sb_rblocks;
+}
diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h
index 79defa7..11b8554 100644
--- a/fs/xfs/xfs_rtalloc.h
+++ b/fs/xfs/xfs_rtalloc.h
@@ -138,6 +138,7 @@ int xfs_rtalloc_query_range(struct xfs_trans *tp,
 int xfs_rtalloc_query_all(struct xfs_trans *tp,
 			  xfs_rtalloc_query_range_fn fn,
 			  void *priv);
+bool xfs_verify_rtbno_ptr(struct xfs_mount *mp, xfs_rtblock_t rtbno);
 #else
 # define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb)    (ENOSYS)
 # define xfs_rtfree_extent(t,b,l)                       (ENOSYS)
@@ -146,6 +147,7 @@ int xfs_rtalloc_query_all(struct xfs_trans *tp,
 # define xfs_rtalloc_query_range(t,l,h,f,p)             (ENOSYS)
 # define xfs_rtalloc_query_all(t,f,p)                   (ENOSYS)
 # define xfs_rtbuf_get(m,t,b,i,p)                       (ENOSYS)
+# define xfs_verify_rtbno_ptr(m, r)			(false)
 static inline int		/* error */
 xfs_rtmount_init(
 	xfs_mount_t	*mp)	/* file system mount structure */
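
As a worked example of the AG size arithmetic and the pointer range
check, here is a small userspace sketch.  The toy_geom structure and the
toy_* functions are hypothetical; agfl_block stands in for
XFS_AGFL_BLOCK(mp), and the logic is only assumed to mirror the helpers
in the diff above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the superblock geometry. */
struct toy_geom {
	uint32_t	agcount;	/* number of AGs */
	uint32_t	agblocks;	/* blocks in every AG but the last */
	uint64_t	dblocks;	/* total data device blocks */
	uint32_t	agfl_block;	/* last static AG header block */
};

/* Size of an AG in blocks; only the last AG may be short. */
static uint32_t
toy_ag_block_count(
	const struct toy_geom	*g,
	uint32_t		agno)
{
	if (agno < g->agcount - 1)
		return g->agblocks;
	return (uint32_t)(g->dblocks - (uint64_t)agno * g->agblocks);
}

/* A valid AG block lies past the static headers and inside the AG. */
static bool
toy_verify_agbno(
	const struct toy_geom	*g,
	uint32_t		agno,
	uint32_t		agbno)
{
	if (agbno >= toy_ag_block_count(g, agno))
		return false;
	if (agbno <= g->agfl_block)
		return false;
	return true;
}
```

For a filesystem with 4 AGs of 1000 blocks and 3500 data blocks in
total, the last AG holds 3500 - 3 * 1000 = 500 blocks, so block 499 of
AG 3 passes and block 500 does not.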



* [PATCH 03/30] xfs: refactor btree pointer checks
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
  2017-10-12  1:40 ` [PATCH 01/30] xfs: return a distinct error code value for IGET_INCORE cache misses Darrick J. Wong
  2017-10-12  1:40 ` [PATCH 02/30] xfs: create block pointer check functions Darrick J. Wong
@ 2017-10-12  1:41 ` Darrick J. Wong
  2017-10-12  5:51   ` Dave Chinner
  2017-10-12  1:41 ` [PATCH 04/30] xfs: refactor btree block header checking functions Darrick J. Wong
                   ` (26 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:41 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Refactor the btree pointer checks so that we can call them from the
scrub code without logging errors to dmesg.  Preserve the existing error
reporting for regular operations.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c  |    4 +--
 fs/xfs/libxfs/xfs_btree.c |   70 +++++++++++++++++++++------------------------
 fs/xfs/libxfs/xfs_btree.h |   13 +++++++-
 3 files changed, 45 insertions(+), 42 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 044a363..b4cbd1a 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -657,8 +657,8 @@ xfs_bmap_btree_to_extents(
 	cbno = be64_to_cpu(*pp);
 	*logflagsp = 0;
 #ifdef DEBUG
-	if ((error = xfs_btree_check_lptr(cur, cbno, 1)))
-		return error;
+	XFS_WANT_CORRUPTED_RETURN(cur->bc_mp,
+			xfs_btree_check_lptr(cur, cbno, 1));
 #endif
 	error = xfs_btree_read_bufl(mp, tp, cbno, 0, &cbp, XFS_BMAP_BTREE_REF,
 				&xfs_bmbt_buf_ops);
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 5bfb882..e7e033a 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -177,59 +177,53 @@ xfs_btree_check_block(
 		return xfs_btree_check_sblock(cur, block, level, bp);
 }
 
-/*
- * Check that (long) pointer is ok.
- */
-int					/* error (0 or EFSCORRUPTED) */
+/* Check that this long pointer is valid and points within the fs. */
+bool
 xfs_btree_check_lptr(
-	struct xfs_btree_cur	*cur,	/* btree cursor */
-	xfs_fsblock_t		bno,	/* btree block disk address */
-	int			level)	/* btree block level */
+	struct xfs_btree_cur	*cur,
+	xfs_fsblock_t		fsbno,
+	int			level)
 {
-	XFS_WANT_CORRUPTED_RETURN(cur->bc_mp,
-		level > 0 &&
-		bno != NULLFSBLOCK &&
-		XFS_FSB_SANITY_CHECK(cur->bc_mp, bno));
-	return 0;
+	if (level <= 0)
+		return false;
+	return xfs_verify_fsbno_ptr(cur->bc_mp, fsbno);
 }
 
-#ifdef DEBUG
-/*
- * Check that (short) pointer is ok.
- */
-STATIC int				/* error (0 or EFSCORRUPTED) */
+/* Check that this short pointer is valid and points within the AG. */
+bool
 xfs_btree_check_sptr(
-	struct xfs_btree_cur	*cur,	/* btree cursor */
-	xfs_agblock_t		bno,	/* btree block disk address */
-	int			level)	/* btree block level */
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		agbno,
+	int			level)
 {
-	xfs_agblock_t		agblocks = cur->bc_mp->m_sb.sb_agblocks;
-
-	XFS_WANT_CORRUPTED_RETURN(cur->bc_mp,
-		level > 0 &&
-		bno != NULLAGBLOCK &&
-		bno != 0 &&
-		bno < agblocks);
-	return 0;
+	if (level <= 0)
+		return false;
+	return xfs_verify_agbno_ptr(cur->bc_mp, cur->bc_private.a.agno, agbno);
 }
 
+#ifdef DEBUG
 /*
- * Check that block ptr is ok.
+ * Check that a given (indexed) btree pointer at a certain level of a
+ * btree is valid and doesn't point past where it should.
  */
-STATIC int				/* error (0 or EFSCORRUPTED) */
+int
 xfs_btree_check_ptr(
-	struct xfs_btree_cur	*cur,	/* btree cursor */
-	union xfs_btree_ptr	*ptr,	/* btree block disk address */
-	int			index,	/* offset from ptr to check */
-	int			level)	/* btree block level */
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr,
+	int			index,
+	int			level)
 {
 	if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
-		return xfs_btree_check_lptr(cur,
-				be64_to_cpu((&ptr->l)[index]), level);
+		XFS_WANT_CORRUPTED_RETURN(cur->bc_mp,
+				xfs_btree_check_lptr(cur,
+					be64_to_cpu((&ptr->l)[index]), level));
 	} else {
-		return xfs_btree_check_sptr(cur,
-				be32_to_cpu((&ptr->s)[index]), level);
+		XFS_WANT_CORRUPTED_RETURN(cur->bc_mp,
+				xfs_btree_check_sptr(cur,
+					be32_to_cpu((&ptr->s)[index]), level));
 	}
+
+	return 0;
 }
 #endif
 
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index f2a88c3..8f52eda 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -269,10 +269,19 @@ xfs_btree_check_block(
 /*
  * Check that (long) pointer is ok.
  */
-int					/* error (0 or EFSCORRUPTED) */
+bool					/* true if the pointer is ok */
 xfs_btree_check_lptr(
 	struct xfs_btree_cur	*cur,	/* btree cursor */
-	xfs_fsblock_t		ptr,	/* btree block disk address */
+	xfs_fsblock_t		fsbno,	/* btree block disk address */
+	int			level);	/* btree block level */
+
+/*
+ * Check that (short) pointer is ok.
+ */
+bool					/* true if the pointer is ok */
+xfs_btree_check_sptr(
+	struct xfs_btree_cur	*cur,	/* btree cursor */
+	xfs_agblock_t		agbno,	/* btree block disk address */
 	int			level);	/* btree block level */
 
 /*
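
The shape of this refactoring, a quiet boolean predicate plus a noisy
wrapper that turns failure into an error code, can be sketched in
userspace as follows.  All toy_* names and TOY_EFSCORRUPTED are
hypothetical, and fprintf() merely stands in for the kernel's
corruption reporting.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical error value standing in for EFSCORRUPTED. */
#define TOY_EFSCORRUPTED	117

/* Quiet predicate: safe to call from a checker that does its own reporting. */
static bool
toy_check_sptr(
	uint32_t	ag_blocks,	/* size of the AG in blocks */
	uint32_t	first_data,	/* first block past static headers */
	uint32_t	agbno,
	int		level)
{
	if (level <= 0)
		return false;
	return agbno >= first_data && agbno < ag_blocks;
}

/* Noisy wrapper: keeps the log-and-return-error behavior for regular callers. */
static int
toy_check_ptr(
	uint32_t	ag_blocks,
	uint32_t	first_data,
	uint32_t	agbno,
	int		level)
{
	if (!toy_check_sptr(ag_blocks, first_data, agbno, level)) {
		fprintf(stderr, "corrupt btree pointer %u at level %d\n",
				(unsigned int)agbno, level);
		return -TOY_EFSCORRUPTED;
	}
	return 0;
}
```

A scrubber-like caller would use toy_check_sptr() directly and record
the result itself, while existing callers keep going through
toy_check_ptr().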



* [PATCH 04/30] xfs: refactor btree block header checking functions
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (2 preceding siblings ...)
  2017-10-12  1:41 ` [PATCH 03/30] xfs: refactor btree pointer checks Darrick J. Wong
@ 2017-10-12  1:41 ` Darrick J. Wong
  2017-10-13  1:01   ` Dave Chinner
  2017-10-16 19:48   ` [PATCH v2 " Darrick J. Wong
  2017-10-12  1:41 ` [PATCH 05/30] xfs: create inode pointer verifiers Darrick J. Wong
                   ` (25 subsequent siblings)
  29 siblings, 2 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:41 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Refactor the btree block header checks to have an internal function that
returns the address of the failing check without logging errors.  The
scrubber will call the internal function, while the external version
will maintain the current logging behavior.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_btree.c |  166 +++++++++++++++++++++++++++------------------
 fs/xfs/libxfs/xfs_btree.h |    5 +
 fs/xfs/xfs_linux.h        |    7 ++
 3 files changed, 110 insertions(+), 68 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index e7e033a..2266a5a 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -63,44 +63,61 @@ xfs_btree_magic(
 	return magic;
 }
 
-STATIC int				/* error (0 or EFSCORRUPTED) */
-xfs_btree_check_lblock(
-	struct xfs_btree_cur	*cur,	/* btree cursor */
-	struct xfs_btree_block	*block,	/* btree long form block pointer */
-	int			level,	/* level of the btree block */
-	struct xfs_buf		*bp)	/* buffer for block, if any */
+/*
+ * Check a long btree block header.  Return the address of the failing check,
+ * or NULL if everything is ok.
+ */
+void *
+__xfs_btree_check_lblock(
+	struct xfs_btree_cur	*cur,
+	struct xfs_btree_block	*block,
+	int			level,
+	struct xfs_buf		*bp)
 {
-	int			lblock_ok = 1; /* block passes checks */
-	struct xfs_mount	*mp;	/* file system mount point */
+	struct xfs_mount	*mp = cur->bc_mp;
 	xfs_btnum_t		btnum = cur->bc_btnum;
-	int			crc;
-
-	mp = cur->bc_mp;
-	crc = xfs_sb_version_hascrc(&mp->m_sb);
+	int			crc = xfs_sb_version_hascrc(&mp->m_sb);
 
 	if (crc) {
-		lblock_ok = lblock_ok &&
-			uuid_equal(&block->bb_u.l.bb_uuid,
-				   &mp->m_sb.sb_meta_uuid) &&
-			block->bb_u.l.bb_blkno == cpu_to_be64(
-				bp ? bp->b_bn : XFS_BUF_DADDR_NULL);
+		if (!uuid_equal(&block->bb_u.l.bb_uuid, &mp->m_sb.sb_meta_uuid))
+			return __this_address;
+		if (block->bb_u.l.bb_blkno !=
+		    cpu_to_be64(bp ? bp->b_bn : XFS_BUF_DADDR_NULL))
+			return __this_address;
 	}
 
-	lblock_ok = lblock_ok &&
-		be32_to_cpu(block->bb_magic) == xfs_btree_magic(crc, btnum) &&
-		be16_to_cpu(block->bb_level) == level &&
-		be16_to_cpu(block->bb_numrecs) <=
-			cur->bc_ops->get_maxrecs(cur, level) &&
-		block->bb_u.l.bb_leftsib &&
-		(block->bb_u.l.bb_leftsib == cpu_to_be64(NULLFSBLOCK) ||
-		 XFS_FSB_SANITY_CHECK(mp,
-			be64_to_cpu(block->bb_u.l.bb_leftsib))) &&
-		block->bb_u.l.bb_rightsib &&
-		(block->bb_u.l.bb_rightsib == cpu_to_be64(NULLFSBLOCK) ||
-		 XFS_FSB_SANITY_CHECK(mp,
-			be64_to_cpu(block->bb_u.l.bb_rightsib)));
-
-	if (unlikely(XFS_TEST_ERROR(!lblock_ok, mp,
+	if (be32_to_cpu(block->bb_magic) != xfs_btree_magic(crc, btnum))
+		return __this_address;
+	if (be16_to_cpu(block->bb_level) != level)
+		return __this_address;
+	if (be16_to_cpu(block->bb_numrecs) >
+	    cur->bc_ops->get_maxrecs(cur, level))
+		return __this_address;
+	if (block->bb_u.l.bb_leftsib != cpu_to_be64(NULLFSBLOCK) &&
+	    !xfs_btree_check_lptr(cur, be64_to_cpu(block->bb_u.l.bb_leftsib),
+			level + 1))
+		return __this_address;
+	if (block->bb_u.l.bb_rightsib != cpu_to_be64(NULLFSBLOCK) &&
+	    !xfs_btree_check_lptr(cur, be64_to_cpu(block->bb_u.l.bb_rightsib),
+			level + 1))
+		return __this_address;
+
+	return NULL;
+}
+
+/* Check a long btree block header. */
+int
+xfs_btree_check_lblock(
+	struct xfs_btree_cur	*cur,
+	struct xfs_btree_block	*block,
+	int			level,
+	struct xfs_buf		*bp)
+{
+	struct xfs_mount	*mp = cur->bc_mp;
+	void			*failed_at;
+
+	failed_at = __xfs_btree_check_lblock(cur, block, level, bp);
+	if (unlikely(XFS_TEST_ERROR(failed_at != NULL, mp,
 			XFS_ERRTAG_BTREE_CHECK_LBLOCK))) {
 		if (bp)
 			trace_xfs_btree_corrupt(bp, _RET_IP_);
@@ -110,48 +127,61 @@ xfs_btree_check_lblock(
 	return 0;
 }
 
-STATIC int				/* error (0 or EFSCORRUPTED) */
-xfs_btree_check_sblock(
-	struct xfs_btree_cur	*cur,	/* btree cursor */
-	struct xfs_btree_block	*block,	/* btree short form block pointer */
-	int			level,	/* level of the btree block */
-	struct xfs_buf		*bp)	/* buffer containing block */
+/*
+ * Check a short btree block header.  Return the address of the failing check,
+ * or NULL if everything is ok.
+ */
+void *
+__xfs_btree_check_sblock(
+	struct xfs_btree_cur	*cur,
+	struct xfs_btree_block	*block,
+	int			level,
+	struct xfs_buf		*bp)
 {
-	struct xfs_mount	*mp;	/* file system mount point */
-	struct xfs_buf		*agbp;	/* buffer for ag. freespace struct */
-	struct xfs_agf		*agf;	/* ag. freespace structure */
-	xfs_agblock_t		agflen;	/* native ag. freespace length */
-	int			sblock_ok = 1; /* block passes checks */
+	struct xfs_mount	*mp = cur->bc_mp;
 	xfs_btnum_t		btnum = cur->bc_btnum;
-	int			crc;
-
-	mp = cur->bc_mp;
-	crc = xfs_sb_version_hascrc(&mp->m_sb);
-	agbp = cur->bc_private.a.agbp;
-	agf = XFS_BUF_TO_AGF(agbp);
-	agflen = be32_to_cpu(agf->agf_length);
+	int			crc = xfs_sb_version_hascrc(&mp->m_sb);
 
 	if (crc) {
-		sblock_ok = sblock_ok &&
-			uuid_equal(&block->bb_u.s.bb_uuid,
-				   &mp->m_sb.sb_meta_uuid) &&
-			block->bb_u.s.bb_blkno == cpu_to_be64(
-				bp ? bp->b_bn : XFS_BUF_DADDR_NULL);
+		if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_meta_uuid))
+			return __this_address;
+		if (block->bb_u.s.bb_blkno !=
+		    cpu_to_be64(bp ? bp->b_bn : XFS_BUF_DADDR_NULL))
+			return __this_address;
 	}
 
-	sblock_ok = sblock_ok &&
-		be32_to_cpu(block->bb_magic) == xfs_btree_magic(crc, btnum) &&
-		be16_to_cpu(block->bb_level) == level &&
-		be16_to_cpu(block->bb_numrecs) <=
-			cur->bc_ops->get_maxrecs(cur, level) &&
-		(block->bb_u.s.bb_leftsib == cpu_to_be32(NULLAGBLOCK) ||
-		 be32_to_cpu(block->bb_u.s.bb_leftsib) < agflen) &&
-		block->bb_u.s.bb_leftsib &&
-		(block->bb_u.s.bb_rightsib == cpu_to_be32(NULLAGBLOCK) ||
-		 be32_to_cpu(block->bb_u.s.bb_rightsib) < agflen) &&
-		block->bb_u.s.bb_rightsib;
-
-	if (unlikely(XFS_TEST_ERROR(!sblock_ok, mp,
+	if (be32_to_cpu(block->bb_magic) != xfs_btree_magic(crc, btnum))
+		return __this_address;
+	if (be16_to_cpu(block->bb_level) != level)
+		return __this_address;
+	if (be16_to_cpu(block->bb_numrecs) >
+	    cur->bc_ops->get_maxrecs(cur, level))
+		return __this_address;
+	if (block->bb_u.s.bb_leftsib != cpu_to_be32(NULLAGBLOCK) &&
+	    !xfs_btree_check_sptr(cur, be32_to_cpu(block->bb_u.s.bb_leftsib),
+			level + 1))
+		return __this_address;
+	if (block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK) &&
+	    !xfs_btree_check_sptr(cur, be32_to_cpu(block->bb_u.s.bb_rightsib),
+			level + 1))
+		return __this_address;
+
+	return NULL;
+}
+
+/* Check a short btree block header. */
+STATIC int
+xfs_btree_check_sblock(
+	struct xfs_btree_cur	*cur,
+	struct xfs_btree_block	*block,
+	int			level,
+	struct xfs_buf		*bp)
+{
+	struct xfs_mount	*mp = cur->bc_mp;
+	void			*failed_at;
+
+	failed_at = __xfs_btree_check_sblock(cur, block, level, bp);
+	if (unlikely(XFS_TEST_ERROR(failed_at != NULL, mp,
 			XFS_ERRTAG_BTREE_CHECK_SBLOCK))) {
 		if (bp)
 			trace_xfs_btree_corrupt(bp, _RET_IP_);
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 8f52eda..baf7064 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -255,6 +255,11 @@ typedef struct xfs_btree_cur
  */
 #define	XFS_BUF_TO_BLOCK(bp)	((struct xfs_btree_block *)((bp)->b_addr))
 
+/* Internal long and short btree block checks. */
+void *__xfs_btree_check_lblock(struct xfs_btree_cur *cur,
+		struct xfs_btree_block *block, int level, struct xfs_buf *bp);
+void *__xfs_btree_check_sblock(struct xfs_btree_cur *cur,
+		struct xfs_btree_block *block, int level, struct xfs_buf *bp);
 
 /*
  * Check that block header is ok.
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index dcd1292..b825953 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -142,6 +142,13 @@ typedef __u32			xfs_nlink_t;
 #define SYNCHRONIZE()	barrier()
 #define __return_address __builtin_return_address(0)
 
+/*
+ * Return the address of a label.  Use asm volatile so that the optimizer
+ * won't try anything stupid like refactoring the error jumpouts into a
+ * single return, which throws off the reported address.
+ */
+#define __this_address  ({ __label__ __here; __here: asm volatile(""); &&__here; })
+
 #define XFS_PROJID_DEFAULT	0
 
 #define MIN(a,b)	(min(a,b))
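
For readers unfamiliar with the label-address trick, here is a small
userspace sketch of a checker that returns the address of the failing
check.  It relies on the GNU C extensions for local labels, statement
expressions and labels as values, so it assumes gcc or clang.  The
toy_* names, the header layout and the constants are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Evaluate to the address of a label placed right here.  The empty asm
 * volatile is meant to discourage the compiler from merging the
 * separate return sites.
 */
#define toy_this_address \
	({ __label__ here; here: __asm__ volatile(""); &&here; })

#define TOY_MAGIC	0x41425442u
#define TOY_MAXRECS	16

/* Hypothetical, much reduced btree block header. */
struct toy_hdr {
	uint32_t	magic;
	uint16_t	level;
	uint16_t	numrecs;
};

/* Return the address of the failing check, or NULL if all checks pass. */
static void *
toy_check_hdr(
	const struct toy_hdr	*h,
	int			level)
{
	if (h->magic != TOY_MAGIC)
		return toy_this_address;
	if (h->level != level)
		return toy_this_address;
	if (h->numrecs > TOY_MAXRECS)
		return toy_this_address;
	return NULL;
}
```

The caller decides what to do with a non-NULL result: log it, trace it,
or just note that the block failed verification.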



* [PATCH 05/30] xfs: create inode pointer verifiers
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (3 preceding siblings ...)
  2017-10-12  1:41 ` [PATCH 04/30] xfs: refactor btree block header checking functions Darrick J. Wong
@ 2017-10-12  1:41 ` Darrick J. Wong
  2017-10-12 20:23   ` Darrick J. Wong
  2017-10-16 19:49   ` [PATCH v2 " Darrick J. Wong
  2017-10-12  1:41 ` [PATCH 06/30] xfs: create an ioctl to scrub AG metadata Darrick J. Wong
                   ` (24 subsequent siblings)
  29 siblings, 2 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:41 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create some helper functions to check that inode pointers point to
somewhere within the filesystem and not at the static AG metadata.
Move xfs_internal_inum and create a directory inode check function.
We will use these functions in scrub and elsewhere.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_dir2.c   |   19 ++--------
 fs/xfs/libxfs/xfs_ialloc.c |   81 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_ialloc.h |    7 ++++
 fs/xfs/xfs_itable.c        |   10 -----
 fs/xfs/xfs_itable.h        |    2 -
 5 files changed, 91 insertions(+), 28 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index ccf9783..ee4d2a3 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -30,6 +30,7 @@
 #include "xfs_bmap.h"
 #include "xfs_dir2.h"
 #include "xfs_dir2_priv.h"
+#include "xfs_ialloc.h"
 #include "xfs_error.h"
 #include "xfs_trace.h"
 
@@ -202,22 +203,8 @@ xfs_dir_ino_validate(
 	xfs_mount_t	*mp,
 	xfs_ino_t	ino)
 {
-	xfs_agblock_t	agblkno;
-	xfs_agino_t	agino;
-	xfs_agnumber_t	agno;
-	int		ino_ok;
-	int		ioff;
-
-	agno = XFS_INO_TO_AGNO(mp, ino);
-	agblkno = XFS_INO_TO_AGBNO(mp, ino);
-	ioff = XFS_INO_TO_OFFSET(mp, ino);
-	agino = XFS_OFFBNO_TO_AGINO(mp, agblkno, ioff);
-	ino_ok =
-		agno < mp->m_sb.sb_agcount &&
-		agblkno < mp->m_sb.sb_agblocks &&
-		agblkno != 0 &&
-		ioff < (1 << mp->m_sb.sb_inopblog) &&
-		XFS_AGINO_TO_INO(mp, agno, agino) == ino;
+	bool		ino_ok = xfs_verify_dir_ino_ptr(mp, ino);
+
 	if (unlikely(XFS_TEST_ERROR(!ino_ok, mp, XFS_ERRTAG_DIR_INO_VALIDATE))) {
 		xfs_warn(mp, "Invalid inode number 0x%Lx",
 				(unsigned long long) ino);
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 988bb3f..da3652b 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -2664,3 +2664,84 @@ xfs_ialloc_pagi_init(
 		xfs_trans_brelse(tp, bp);
 	return 0;
 }
+
+/* Calculate the first and last possible inode number in an AG. */
+void
+xfs_ialloc_aginode_range(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_agino_t		*first,
+	xfs_agino_t		*last)
+{
+	xfs_agblock_t		eoag;
+
+	eoag = xfs_ag_block_count(mp, agno);
+	*first = round_up(XFS_OFFBNO_TO_AGINO(mp, XFS_AGFL_BLOCK(mp) + 1, 0),
+			XFS_INODES_PER_CHUNK);
+	*last = round_down(XFS_OFFBNO_TO_AGINO(mp, eoag, 0),
+			XFS_INODES_PER_CHUNK) - 1;
+}
+
+/*
+ * Verify that an AG inode number pointer neither points outside the AG
+ * nor points at static metadata.
+ */
+bool
+xfs_verify_agino_ptr(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_agino_t		agino)
+{
+	xfs_agino_t		first;
+	xfs_agino_t		last;
+	int			ioff;
+
+	ioff = XFS_AGINO_TO_OFFSET(mp, agino);
+	xfs_ialloc_aginode_range(mp, agno, &first, &last);
+	return agino >= first && agino <= last &&
+	       ioff < (1 << mp->m_sb.sb_inopblog);
+}
+
+/*
+ * Verify that an FS inode number pointer neither points outside the
+ * filesystem nor points at static AG metadata.
+ */
+bool
+xfs_verify_ino_ptr(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino)
+{
+	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, ino);
+	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, ino);
+
+	if (agno >= mp->m_sb.sb_agcount)
+		return false;
+	if (XFS_AGINO_TO_INO(mp, agno, agino) != ino)
+		return false;
+	return xfs_verify_agino_ptr(mp, agno, agino);
+}
+
+/* Is this an internal inode number? */
+bool
+xfs_internal_inum(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino)
+{
+	return ino == mp->m_sb.sb_rbmino || ino == mp->m_sb.sb_rsumino ||
+		(xfs_sb_version_hasquota(&mp->m_sb) &&
+		 xfs_is_quota_inode(&mp->m_sb, ino));
+}
+
+/*
+ * Verify that a directory entry's inode number doesn't point at an internal
+ * inode, empty space, or static AG metadata.
+ */
+bool
+xfs_verify_dir_ino_ptr(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino)
+{
+	if (xfs_internal_inum(mp, ino))
+		return false;
+	return xfs_verify_ino_ptr(mp, ino);
+}
diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
index b32cfb5..904d69a 100644
--- a/fs/xfs/libxfs/xfs_ialloc.h
+++ b/fs/xfs/libxfs/xfs_ialloc.h
@@ -173,5 +173,12 @@ void xfs_inobt_btrec_to_irec(struct xfs_mount *mp, union xfs_btree_rec *rec,
 		struct xfs_inobt_rec_incore *irec);
 
 int xfs_ialloc_cluster_alignment(struct xfs_mount *mp);
+void xfs_ialloc_aginode_range(struct xfs_mount *mp, xfs_agnumber_t agno,
+		xfs_agino_t *first, xfs_agino_t *last);
+bool xfs_verify_agino_ptr(struct xfs_mount *mp, xfs_agnumber_t agno,
+		xfs_agino_t agino);
+bool xfs_verify_ino_ptr(struct xfs_mount *mp, xfs_ino_t ino);
+bool xfs_internal_inum(struct xfs_mount *mp, xfs_ino_t ino);
+bool xfs_verify_dir_ino_ptr(struct xfs_mount *mp, xfs_ino_t ino);
 
 #endif	/* __XFS_IALLOC_H__ */
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index c393a2f..0172d0b 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -31,16 +31,6 @@
 #include "xfs_trace.h"
 #include "xfs_icache.h"
 
-int
-xfs_internal_inum(
-	xfs_mount_t	*mp,
-	xfs_ino_t	ino)
-{
-	return (ino == mp->m_sb.sb_rbmino || ino == mp->m_sb.sb_rsumino ||
-		(xfs_sb_version_hasquota(&mp->m_sb) &&
-		 xfs_is_quota_inode(&mp->m_sb, ino)));
-}
-
 /*
  * Return stat information for one inode.
  * Return 0 if ok, else errno.
diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
index 17e86e0..6ea8b39 100644
--- a/fs/xfs/xfs_itable.h
+++ b/fs/xfs/xfs_itable.h
@@ -96,6 +96,4 @@ xfs_inumbers(
 	void			__user *buffer, /* buffer with inode info */
 	inumbers_fmt_pf		formatter);
 
-int xfs_internal_inum(struct xfs_mount *mp, xfs_ino_t ino);
-
 #endif	/* __XFS_ITABLE_H__ */
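
The inode number checks can be illustrated with a simplified userspace
sketch.  It assumes an inode number is the AG number shifted left by
(agblklog + inopblog) bits, ORed with the AG inode number, and that
every AG is the same size.  The toy_igeom structure and the toy_*
functions are hypothetical and only approximate the helpers above.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_INODES_PER_CHUNK	64u

/* Hypothetical, simplified geometry; every AG is assumed the same size. */
struct toy_igeom {
	uint32_t	agcount;	/* number of AGs */
	uint32_t	agblocks;	/* blocks per AG */
	uint32_t	agblklog;	/* log2(agblocks), rounded up */
	uint32_t	inopblog;	/* log2(inodes per block) */
	uint32_t	first_data;	/* first block past the AG headers */
};

/* First and last AG inode numbers that could hold a real inode. */
static void
toy_aginode_range(
	const struct toy_igeom	*g,
	uint32_t		*first,
	uint32_t		*last)
{
	uint32_t		lo = g->first_data << g->inopblog;
	uint32_t		hi = g->agblocks << g->inopblog;

	/* Round the start up and the end down to inode chunk boundaries. */
	*first = (lo + TOY_INODES_PER_CHUNK - 1) / TOY_INODES_PER_CHUNK *
			TOY_INODES_PER_CHUNK;
	*last = hi / TOY_INODES_PER_CHUNK * TOY_INODES_PER_CHUNK - 1;
}

/* Check that an inode number lands inside the filesystem. */
static bool
toy_verify_ino(
	const struct toy_igeom	*g,
	uint64_t		ino)
{
	uint32_t		agino_bits = g->agblklog + g->inopblog;
	uint32_t		agno = (uint32_t)(ino >> agino_bits);
	uint32_t		agino = (uint32_t)(ino &
						((1ULL << agino_bits) - 1));
	uint32_t		first, last;

	if (agno >= g->agcount)
		return false;
	/* Reject numbers with stray high bits that the split dropped. */
	if ((((uint64_t)agno << agino_bits) | agino) != ino)
		return false;
	toy_aginode_range(g, &first, &last);
	return agino >= first && agino <= last;
}
```

With 16 inodes per block, 1000-block AGs and 4 header blocks, the valid
AG inode range works out to 64 through 15999.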



* [PATCH 06/30] xfs: create an ioctl to scrub AG metadata
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (4 preceding siblings ...)
  2017-10-12  1:41 ` [PATCH 05/30] xfs: create inode pointer verifiers Darrick J. Wong
@ 2017-10-12  1:41 ` Darrick J. Wong
  2017-10-16  0:08   ` Dave Chinner
  2017-10-12  1:41 ` [PATCH 07/30] xfs: dispatch metadata scrub subcommands Darrick J. Wong
                   ` (23 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:41 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create an ioctl that can be used to scrub internal filesystem metadata.
The new ioctl takes the metadata type, an (optional) AG number, an
(optional) inode number and generation, and a flags argument.  This will
be used by the upcoming XFS online scrub tool.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Kconfig           |   17 ++++++++++++++
 fs/xfs/Makefile          |   11 +++++++++
 fs/xfs/libxfs/xfs_fs.h   |   53 +++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c     |   54 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.h     |   25 +++++++++++++++++++++
 fs/xfs/scrub/trace.c     |   41 +++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/trace.h     |   33 ++++++++++++++++++++++++++++
 fs/xfs/scrub/xfs_scrub.h |   29 +++++++++++++++++++++++++
 fs/xfs/xfs_ioctl.c       |   28 ++++++++++++++++++++++++
 fs/xfs/xfs_ioctl32.c     |    1 +
 10 files changed, 292 insertions(+)
 create mode 100644 fs/xfs/scrub/scrub.c
 create mode 100644 fs/xfs/scrub/scrub.h
 create mode 100644 fs/xfs/scrub/trace.c
 create mode 100644 fs/xfs/scrub/trace.h
 create mode 100644 fs/xfs/scrub/xfs_scrub.h


diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
index 1b98cfa..f42fcf1 100644
--- a/fs/xfs/Kconfig
+++ b/fs/xfs/Kconfig
@@ -71,6 +71,23 @@ config XFS_RT
 
 	  If unsure, say N.
 
+config XFS_ONLINE_SCRUB
+	bool "XFS online metadata check support"
+	default n
+	depends on XFS_FS
+	help
+	  If you say Y here you will be able to check metadata on a
+	  mounted XFS filesystem.  This feature is intended to reduce
+	  filesystem downtime by supplementing xfs_repair.  The key
+	  advantage here is to look for problems proactively so that
+	  they can be dealt with in a controlled manner.
+
+	  This feature is considered EXPERIMENTAL.  Use with caution!
+
+	  See the xfs_scrub man page in section 8 for additional information.
+
+	  If unsure, say N.
+
 config XFS_WARN
 	bool "XFS Verbose Warnings"
 	depends on XFS_FS && !XFS_DEBUG
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index dbc33e0..f4312bc 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -138,3 +138,14 @@ xfs-$(CONFIG_XFS_POSIX_ACL)	+= xfs_acl.o
 xfs-$(CONFIG_SYSCTL)		+= xfs_sysctl.o
 xfs-$(CONFIG_COMPAT)		+= xfs_ioctl32.o
 xfs-$(CONFIG_EXPORTFS_BLOCK_OPS)	+= xfs_pnfs.o
+
+# online scrub/repair
+ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
+
+# Tracepoints like to blow up, so build that before everything else
+
+xfs-y				+= $(addprefix scrub/, \
+				   trace.o \
+				   scrub.o \
+				   )
+endif
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 8c61f21..3b4a36e 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -468,6 +468,58 @@ typedef struct xfs_swapext
 #define XFS_FSOP_GOING_FLAGS_LOGFLUSH		0x1	/* flush log but not data */
 #define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH		0x2	/* don't flush log nor data */
 
+/* metadata scrubbing */
+struct xfs_scrub_metadata {
+	__u32 sm_type;		/* What to check? */
+	__u32 sm_flags;		/* flags; see below. */
+	__u64 sm_ino;		/* inode number. */
+	__u32 sm_gen;		/* inode generation. */
+	__u32 sm_agno;		/* ag number. */
+	__u64 sm_reserved[5];	/* pad to 64 bytes */
+};
+
+/*
+ * Metadata types and flags for scrub operation.
+ */
+
+/* Scrub subcommands. */
+
+/* Number of scrub subcommands. */
+#define XFS_SCRUB_TYPE_NR	0
+
+/* i: Repair this metadata. */
+#define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
+
+/* o: Metadata object needs repair. */
+#define XFS_SCRUB_OFLAG_CORRUPT		(1 << 1)
+
+/*
+ * o: Metadata object could be optimized.  It's not corrupt, but
+ *    we could improve on it somehow.
+ */
+#define XFS_SCRUB_OFLAG_PREEN		(1 << 2)
+
+/* o: Cross-referencing failed. */
+#define XFS_SCRUB_OFLAG_XFAIL		(1 << 3)
+
+/* o: Metadata object disagrees with cross-referenced metadata. */
+#define XFS_SCRUB_OFLAG_XCORRUPT	(1 << 4)
+
+/* o: Scan was not complete. */
+#define XFS_SCRUB_OFLAG_INCOMPLETE	(1 << 5)
+
+/* o: Metadata object looked funny but isn't corrupt. */
+#define XFS_SCRUB_OFLAG_WARNING		(1 << 6)
+
+#define XFS_SCRUB_FLAGS_IN	(XFS_SCRUB_IFLAG_REPAIR)
+#define XFS_SCRUB_FLAGS_OUT	(XFS_SCRUB_OFLAG_CORRUPT | \
+				 XFS_SCRUB_OFLAG_PREEN | \
+				 XFS_SCRUB_OFLAG_XFAIL | \
+				 XFS_SCRUB_OFLAG_XCORRUPT | \
+				 XFS_SCRUB_OFLAG_INCOMPLETE | \
+				 XFS_SCRUB_OFLAG_WARNING)
+#define XFS_SCRUB_FLAGS_ALL	(XFS_SCRUB_FLAGS_IN | XFS_SCRUB_FLAGS_OUT)
+
 /*
  * ioctl limits
  */
@@ -511,6 +563,7 @@ typedef struct xfs_swapext
 #define XFS_IOC_ZERO_RANGE	_IOW ('X', 57, struct xfs_flock64)
 #define XFS_IOC_FREE_EOFBLOCKS	_IOR ('X', 58, struct xfs_fs_eofblocks)
 /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
+#define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
 
 /*
  * ioctl commands that replace IRIX syssgi()'s
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
new file mode 100644
index 0000000..5db2a6f
--- /dev/null
+++ b/fs/xfs/scrub/scrub.c
@@ -0,0 +1,54 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_alloc.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_refcount.h"
+#include "xfs_refcount_btree.h"
+#include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/trace.h"
+
+/* Dispatch metadata scrubbing. */
+int
+xfs_scrub_metadata(
+	struct xfs_inode		*ip,
+	struct xfs_scrub_metadata	*sm)
+{
+	return -EOPNOTSUPP;
+}
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
new file mode 100644
index 0000000..eb1cd9d
--- /dev/null
+++ b/fs/xfs/scrub/scrub.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef __XFS_SCRUB_SCRUB_H__
+#define __XFS_SCRUB_SCRUB_H__
+
+/* Metadata scrubbers */
+
+#endif	/* __XFS_SCRUB_SCRUB_H__ */
diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c
new file mode 100644
index 0000000..c59fd41
--- /dev/null
+++ b/fs/xfs/scrub/trace.c
@@ -0,0 +1,41 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_da_format.h"
+#include "xfs_defer.h"
+#include "xfs_inode.h"
+#include "xfs_btree.h"
+#include "xfs_trans.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+
+/*
+ * We include this last to have the helpers above available for the trace
+ * event implementations.
+ */
+#define CREATE_TRACE_POINTS
+#include "scrub/trace.h"
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
new file mode 100644
index 0000000..a95a7c8
--- /dev/null
+++ b/fs/xfs/scrub/trace.h
@@ -0,0 +1,33 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM xfs_scrub
+
+#if !defined(_TRACE_XFS_SCRUB_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_XFS_SCRUB_TRACE_H
+
+#include <linux/tracepoint.h>
+
+#endif /* _TRACE_XFS_SCRUB_TRACE_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE scrub/trace
+#include <trace/define_trace.h>
diff --git a/fs/xfs/scrub/xfs_scrub.h b/fs/xfs/scrub/xfs_scrub.h
new file mode 100644
index 0000000..e00e0ea
--- /dev/null
+++ b/fs/xfs/scrub/xfs_scrub.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef __XFS_SCRUB_H__
+#define __XFS_SCRUB_H__
+
+#ifndef CONFIG_XFS_ONLINE_SCRUB
+# define xfs_scrub_metadata(ip, sm)	(-ENOTTY)
+#else
+int xfs_scrub_metadata(struct xfs_inode *ip, struct xfs_scrub_metadata *sm);
+#endif /* CONFIG_XFS_ONLINE_SCRUB */
+
+#endif	/* __XFS_SCRUB_H__ */
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index aa75389..6ff012f 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -44,6 +44,7 @@
 #include "xfs_btree.h"
 #include <linux/fsmap.h>
 #include "xfs_fsmap.h"
+#include "scrub/xfs_scrub.h"
 
 #include <linux/capability.h>
 #include <linux/cred.h>
@@ -1703,6 +1704,30 @@ xfs_ioc_getfsmap(
 	return 0;
 }
 
+STATIC int
+xfs_ioc_scrub_metadata(
+	struct xfs_inode		*ip,
+	void				__user *arg)
+{
+	struct xfs_scrub_metadata	scrub;
+	int				error;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (copy_from_user(&scrub, arg, sizeof(scrub)))
+		return -EFAULT;
+
+	error = xfs_scrub_metadata(ip, &scrub);
+	if (error)
+		return error;
+
+	if (copy_to_user(arg, &scrub, sizeof(scrub)))
+		return -EFAULT;
+
+	return 0;
+}
+
 int
 xfs_ioc_swapext(
 	xfs_swapext_t	*sxp)
@@ -1886,6 +1911,9 @@ xfs_file_ioctl(
 	case FS_IOC_GETFSMAP:
 		return xfs_ioc_getfsmap(ip, arg);
 
+	case XFS_IOC_SCRUB_METADATA:
+		return xfs_ioc_scrub_metadata(ip, arg);
+
 	case XFS_IOC_FD_TO_HANDLE:
 	case XFS_IOC_PATH_TO_HANDLE:
 	case XFS_IOC_PATH_TO_FSHANDLE: {
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index fa0bc4d..35c79e2 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -556,6 +556,7 @@ xfs_file_compat_ioctl(
 	case XFS_IOC_ERROR_INJECTION:
 	case XFS_IOC_ERROR_CLEARALL:
 	case FS_IOC_GETFSMAP:
+	case XFS_IOC_SCRUB_METADATA:
 		return xfs_file_ioctl(filp, cmd, p);
 #ifndef BROKEN_X86_ALIGNMENT
 	/* These are handled fine if no alignment issues */



* [PATCH 07/30] xfs: dispatch metadata scrub subcommands
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (5 preceding siblings ...)
  2017-10-12  1:41 ` [PATCH 06/30] xfs: create an ioctl to scrub AG metadata Darrick J. Wong
@ 2017-10-12  1:41 ` Darrick J. Wong
  2017-10-16  0:26   ` Dave Chinner
  2017-10-12  1:41 ` [PATCH 08/30] xfs: probe the scrub ioctl Darrick J. Wong
                   ` (22 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:41 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create structures needed to hold scrubbing context and dispatch incoming
commands to the individual scrubbers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/scrub.c |  192 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.h |   24 ++++++
 fs/xfs/scrub/trace.h |   43 +++++++++++
 3 files changed, 258 insertions(+), 1 deletion(-)
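
The dispatch flow in this patch (look up the ops entry, set up, scrub,
tear down, and retry once if the scrubber asks us to try harder) can
be modelled in plain userspace C.  This is only a sketch of the control
flow under simplifying assumptions: every model_* name is hypothetical,
-EDEADLK stands in for the kernel's -EDEADLOCK, and the counters exist
purely so the behaviour can be observed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the scrub context; counters are for observation only. */
struct model_ctx {
	bool	try_harder;
	int	setups;
	int	teardowns;
};

struct model_ops {
	int	(*setup)(struct model_ctx *);
	int	(*scrub)(struct model_ctx *);
};

static int model_setup(struct model_ctx *c)
{
	c->setups++;
	return 0;
}

static int model_scrub_ok(struct model_ctx *c)
{
	(void)c;
	return 0;
}

/* Pretend scrubber that only succeeds once try_harder is set. */
static int model_scrub_needs_retry(struct model_ctx *c)
{
	return c->try_harder ? 0 : -EDEADLK;
}

static const struct model_ops model_table[] = {
	{ .setup = model_setup, .scrub = model_scrub_ok },
	{ .setup = model_setup, .scrub = model_scrub_needs_retry },
	{ .setup = NULL, .scrub = NULL },	/* known type, no scrubber */
};
#define MODEL_TYPE_NR	(sizeof(model_table) / sizeof(model_table[0]))

static int model_dispatch(unsigned int type, struct model_ctx *c)
{
	const struct model_ops	*ops;
	bool			try_harder = false;
	int			error;

	assert(c != NULL);

	/* Do we know about this type of metadata? */
	if (type >= MODEL_TYPE_NR)
		return -ENOENT;
	ops = &model_table[type];
	if (ops->scrub == NULL)
		return -ENOENT;

retry:
	c->try_harder = try_harder;
	error = ops->setup(c);
	if (!error)
		error = ops->scrub(c);
	c->teardowns++;		/* always release what setup acquired */

	/* Retry exactly once with worst-case preparation. */
	if (error == -EDEADLK && !try_harder) {
		try_harder = true;
		goto retry;
	}
	return error;
}
```

The point of the model is that a scrubber gets at most two attempts,
and each attempt is bracketed by its own setup and teardown.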


diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 5db2a6f..75c318b 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -44,11 +44,201 @@
 #include "scrub/scrub.h"
 #include "scrub/trace.h"
 
+/*
+ * Online Scrub and Repair
+ *
+ * Traditionally, XFS (the kernel driver) did not know how to check or
+ * repair on-disk data structures.  That task was left to the xfs_check
+ * and xfs_repair tools, both of which require taking the filesystem
+ * offline for a thorough but time consuming examination.  Online
+ * scrub & repair, on the other hand, enables us to check the metadata
+ * for obvious errors while carefully stepping around the filesystem's
+ * ongoing operations, locking rules, etc.
+ *
+ * Given that most XFS metadata consist of records stored in a btree,
+ * most of the checking functions iterate the btree blocks themselves
+ * looking for irregularities.  When a record block is encountered, each
+ * record can be checked for obviously bad values.  Record values can
+ * also be cross-referenced against other btrees to look for potential
+ * misunderstandings between pieces of metadata.
+ *
+ * It is expected that the checkers responsible for per-AG metadata
+ * structures will lock the AG headers (AGI, AGF, AGFL), iterate the
+ * metadata structure, and perform any relevant cross-referencing before
+ * unlocking the AG and returning the results to userspace.  These
+ * scrubbers must not keep an AG locked for too long to avoid tying up
+ * the block and inode allocators.
+ *
+ * Block maps and b-trees rooted in an inode present a special challenge
+ * because they can involve extents from any AG.  The general scrubber
+ * structure of lock -> check -> xref -> unlock still holds, but AG
+ * locking order rules /must/ be obeyed to avoid deadlocks.  The
+ * ordering rule, of course, is that we must lock in increasing AG
+ * order.  Helper functions are provided to track which AG headers we've
+ * already locked.  If we detect an imminent locking order violation, we
+ * can signal a potential deadlock, in which case the scrubber can jump
+ * out to the top level, lock all the AGs in order, and retry the scrub.
+ *
+ * For file data (directories, extended attributes, symlinks) scrub, we
+ * can simply lock the inode and walk the data.  For btree data
+ * (directories and attributes) we follow the same btree-scrubbing
+ * strategy outlined previously to check the records.
+ *
+ * We use a bit of trickery with transactions to avoid buffer deadlocks
+ * if there is a cycle in the metadata.  The basic problem is that
+ * travelling down a btree involves locking the current buffer at each
+ * tree level.  If a pointer should somehow point back to a buffer that
+ * we've already examined, we will deadlock due to the second buffer
+ * locking attempt.  Note however that grabbing a buffer in transaction
+ * context links the locked buffer to the transaction.  If we try to
+ * re-grab the buffer in the context of the same transaction, we avoid
+ * the second lock attempt and continue.  Between the verifier and the
+ * scrubber, something will notice that something is amiss and report
+ * the corruption.  Therefore, each scrubber will allocate an empty
+ * transaction, attach buffers to it, and cancel the transaction at the
+ * end of the scrub run.  Cancelling a non-dirty transaction simply
+ * unlocks the buffers.
+ *
+ * Scrub communicates results to userspace in two ways.  The first is
+ * the error code (errno), which reports operational errors encountered
+ * while performing the scrub.  The second is the set of output flags
+ * (XFS_SCRUB_OFLAG_*) returned in sm_flags.  If the data structure
+ * itself is corrupt, the CORRUPT flag will be set.  If the metadata is
+ * correct but otherwise suboptimal, the PREEN flag will be set.  The
+ * remaining output flags are documented in xfs_fs.h.
+ */
+
+/* Scrub setup and teardown */
+
+/* Free all the resources and finish the transactions. */
+STATIC int
+xfs_scrub_teardown(
+	struct xfs_scrub_context	*sc,
+	int				error)
+{
+	if (sc->tp) {
+		xfs_trans_cancel(sc->tp);
+		sc->tp = NULL;
+	}
+	return error;
+}
+
+/* Scrubbing dispatch. */
+
+static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
+};
+
+/* This isn't a stable feature, warn once per day. */
+static inline void
+xfs_scrub_experimental_warning(
+	struct xfs_mount	*mp)
+{
+	static struct ratelimit_state scrub_warning = RATELIMIT_STATE_INIT(
+			"xfs_scrub_warning", 86400 * HZ, 1);
+	ratelimit_set_flags(&scrub_warning, RATELIMIT_MSG_ON_RELEASE);
+
+	if (__ratelimit(&scrub_warning))
+		xfs_alert(mp,
+"EXPERIMENTAL online scrub feature in use. Use at your own risk!");
+}
+
 /* Dispatch metadata scrubbing. */
 int
 xfs_scrub_metadata(
 	struct xfs_inode		*ip,
 	struct xfs_scrub_metadata	*sm)
 {
-	return -EOPNOTSUPP;
+	struct xfs_scrub_context	sc;
+	struct xfs_mount		*mp = ip->i_mount;
+	const struct xfs_scrub_meta_ops	*ops;
+	bool				try_harder = false;
+	int				error = 0;
+
+	trace_xfs_scrub_start(ip, sm, error);
+
+	/* Forbidden if we are shut down or mounted norecovery. */
+	error = -ESHUTDOWN;
+	if (XFS_FORCED_SHUTDOWN(mp))
+		goto out;
+	error = -ENOTRECOVERABLE;
+	if (mp->m_flags & XFS_MOUNT_NORECOVERY)
+		goto out;
+
+	/* Check our inputs. */
+	error = -EINVAL;
+	sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT;
+	if (sm->sm_flags & ~XFS_SCRUB_FLAGS_IN)
+		goto out;
+	if (memchr_inv(sm->sm_reserved, 0, sizeof(sm->sm_reserved)))
+		goto out;
+
+	/* Do we know about this type of metadata? */
+	error = -ENOENT;
+	if (sm->sm_type >= XFS_SCRUB_TYPE_NR)
+		goto out;
+	ops = &meta_scrub_ops[sm->sm_type];
+	if (ops->scrub == NULL)
+		goto out;
+
+	/*
+	 * We won't scrub any filesystem that doesn't have the ability
+	 * to record unwritten extents.  The option was made default in
+	 * 2003, removed from mkfs in 2007, and cannot be disabled in
+	 * v5, so if we find a filesystem without this flag it's either
+	 * really old or totally unsupported.  Avoid it either way.
+	 * We also don't support v1-v3 filesystems, which aren't
+	 * mountable.
+	 */
+	error = -EOPNOTSUPP;
+	if (!xfs_sb_version_hasextflgbit(&mp->m_sb))
+		goto out;
+
+	/* Does this fs even support this type of metadata? */
+	error = -ENOENT;
+	if (ops->has && !ops->has(&mp->m_sb))
+		goto out;
+
+	/* We don't know how to repair anything yet. */
+	error = -EOPNOTSUPP;
+	if (sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
+		goto out;
+
+	xfs_scrub_experimental_warning(mp);
+
+retry_op:
+	/* Set up for the operation. */
+	memset(&sc, 0, sizeof(sc));
+	sc.mp = ip->i_mount;
+	sc.sm = sm;
+	sc.ops = ops;
+	sc.try_harder = try_harder;
+	error = sc.ops->setup(&sc, ip);
+	if (error)
+		goto out_teardown;
+
+	/* Scrub for errors. */
+	error = sc.ops->scrub(&sc);
+	if (!try_harder && error == -EDEADLOCK) {
+		/*
+		 * Scrubbers return -EDEADLOCK to mean 'try harder'.
+		 * Tear down everything we hold, then set up again with
+		 * preparation for worst-case scenarios.
+		 */
+		error = xfs_scrub_teardown(&sc, 0);
+		if (error)
+			goto out;
+		try_harder = true;
+		goto retry_op;
+	} else if (error)
+		goto out_teardown;
+
+	if (sc.sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
+			       XFS_SCRUB_OFLAG_XCORRUPT))
+		xfs_alert_ratelimited(mp, "Corruption detected during scrub.");
+
+out_teardown:
+	error = xfs_scrub_teardown(&sc, error);
+out:
+	trace_xfs_scrub_done(ip, sm, error);
+	return error;
 }
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index eb1cd9d..ef7b50e 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -20,6 +20,30 @@
 #ifndef __XFS_SCRUB_SCRUB_H__
 #define __XFS_SCRUB_SCRUB_H__
 
+struct xfs_scrub_context;
+
+struct xfs_scrub_meta_ops {
+	/* Acquire whatever resources are needed for the operation. */
+	int		(*setup)(struct xfs_scrub_context *,
+				 struct xfs_inode *);
+
+	/* Examine metadata for errors. */
+	int		(*scrub)(struct xfs_scrub_context *);
+
+	/* Decide if we even have this piece of metadata. */
+	bool		(*has)(struct xfs_sb *);
+};
+
+struct xfs_scrub_context {
+	/* General scrub state. */
+	struct xfs_mount		*mp;
+	struct xfs_scrub_metadata	*sm;
+	const struct xfs_scrub_meta_ops	*ops;
+	struct xfs_trans		*tp;
+	struct xfs_inode		*ip;
+	bool				try_harder;
+};
+
 /* Metadata scrubbers */
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index a95a7c8..688517e 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -25,6 +25,49 @@
 
 #include <linux/tracepoint.h>
 
+DECLARE_EVENT_CLASS(xfs_scrub_class,
+	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
+		 int error),
+	TP_ARGS(ip, sm, error),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(unsigned int, type)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_ino_t, inum)
+		__field(unsigned int, gen)
+		__field(unsigned int, flags)
+		__field(int, error)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->type = sm->sm_type;
+		__entry->agno = sm->sm_agno;
+		__entry->inum = sm->sm_ino;
+		__entry->gen = sm->sm_gen;
+		__entry->flags = sm->sm_flags;
+		__entry->error = error;
+	),
+	TP_printk("dev %d:%d ino %llu type %u agno %u inum %llu gen %u flags 0x%x error %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->type,
+		  __entry->agno,
+		  __entry->inum,
+		  __entry->gen,
+		  __entry->flags,
+		  __entry->error)
+)
+#define DEFINE_SCRUB_EVENT(name) \
+DEFINE_EVENT(xfs_scrub_class, name, \
+	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm, \
+		 int error), \
+	TP_ARGS(ip, sm, error))
+
+DEFINE_SCRUB_EVENT(xfs_scrub_start);
+DEFINE_SCRUB_EVENT(xfs_scrub_done);
+
 #endif /* _TRACE_XFS_SCRUB_TRACE_H */
 
 #undef TRACE_INCLUDE_PATH



* [PATCH 08/30] xfs: probe the scrub ioctl
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (6 preceding siblings ...)
  2017-10-12  1:41 ` [PATCH 07/30] xfs: dispatch metadata scrub subcommands Darrick J. Wong
@ 2017-10-12  1:41 ` Darrick J. Wong
  2017-10-16  0:39   ` Dave Chinner
  2017-10-12  1:41 ` [PATCH 09/30] xfs: create helpers to record and deal with scrub problems Darrick J. Wong
                   ` (21 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:41 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create a probe scrubber with id 0.  This will be used by xfs_scrub to
probe the kernel's abilities to scrub (and repair) the metadata.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 +
 fs/xfs/libxfs/xfs_fs.h |    3 ++
 fs/xfs/scrub/common.c  |   59 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.h  |   39 ++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c   |   19 +++++++++++++++
 fs/xfs/scrub/scrub.h   |    1 +
 fs/xfs/scrub/trace.c   |    1 +
 7 files changed, 122 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/common.c
 create mode 100644 fs/xfs/scrub/common.h
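
As a rough guide to how a userspace caller might interpret the result
of a probe, the sketch below maps the errno values visible in this
series onto a verdict.  The enum, both function names and the grouping
of error codes are assumptions for illustration.  Only the individual
return codes and the "ino and agno must be zero" rule come from the
patches themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical verdicts a scrub tool might derive from a probe call. */
enum probe_verdict {
	PROBE_AVAILABLE,	/* ioctl succeeded */
	PROBE_NO_IOCTL,		/* ENOTTY: scrub not compiled in */
	PROBE_UNSUPPORTED,	/* EOPNOTSUPP: e.g. repair requested */
	PROBE_BAD_REQUEST,	/* EINVAL: bad flags or nonzero fields */
	PROBE_UNKNOWN_TYPE,	/* ENOENT: no such scrubber */
	PROBE_OTHER_ERROR,
};

/* Mirrors the probe scrubber's check: ino and agno must be zero. */
static int model_probe_check(uint64_t ino, uint32_t agno)
{
	if (ino || agno)
		return -EINVAL;
	return 0;
}

/* Classify a positive errno value as seen by the ioctl caller. */
static enum probe_verdict classify_probe_errno(int err)
{
	assert(err >= 0);

	switch (err) {
	case 0:
		return PROBE_AVAILABLE;
	case ENOTTY:
		return PROBE_NO_IOCTL;
	case EOPNOTSUPP:
		return PROBE_UNSUPPORTED;
	case EINVAL:
		return PROBE_BAD_REQUEST;
	case ENOENT:
		return PROBE_UNKNOWN_TYPE;
	default:
		return PROBE_OTHER_ERROR;
	}
}
```

Because the repair flag is rejected with EOPNOTSUPP at this point in
the series, a probe with the repair flag set would land in
PROBE_UNSUPPORTED even when plain scrubbing works.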


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index f4312bc..ca14595 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -146,6 +146,7 @@ ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
 
 xfs-y				+= $(addprefix scrub/, \
 				   trace.o \
+				   common.o \
 				   scrub.o \
 				   )
 endif
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 3b4a36e..765f91e 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -483,9 +483,10 @@ struct xfs_scrub_metadata {
  */
 
 /* Scrub subcommands. */
+#define XFS_SCRUB_TYPE_PROBE	0	/* presence test ioctl */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	0
+#define XFS_SCRUB_TYPE_NR	1
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
new file mode 100644
index 0000000..d2c8f94
--- /dev/null
+++ b/fs/xfs/scrub/common.c
@@ -0,0 +1,59 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_alloc.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_refcount.h"
+#include "xfs_refcount_btree.h"
+#include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+
+/* Common code for the metadata scrubbers. */
+
+/* Per-scrubber setup functions */
+
+/* Set us up with a transaction and an empty context. */
+int
+xfs_scrub_setup_fs(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_trans_alloc(sc->sm, sc->mp, &sc->tp);
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
new file mode 100644
index 0000000..f3d5865
--- /dev/null
+++ b/fs/xfs/scrub/common.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef __XFS_SCRUB_COMMON_H__
+#define __XFS_SCRUB_COMMON_H__
+
+/*
+ * Grab an empty transaction so that we can re-grab locked buffers if
+ * one of our btrees turns out to be cyclic.
+ */
+static inline int
+xfs_scrub_trans_alloc(
+	struct xfs_scrub_metadata	*sm,
+	struct xfs_mount		*mp,
+	struct xfs_trans		**tpp)
+{
+	return xfs_trans_alloc_empty(mp, tpp);
+}
+
+/* Setup functions */
+int xfs_scrub_setup_fs(struct xfs_scrub_context *sc, struct xfs_inode *ip);
+
+#endif	/* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 75c318b..92eac98 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -42,6 +42,7 @@
 #include "xfs_rmap_btree.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
+#include "scrub/common.h"
 #include "scrub/trace.h"
 
 /*
@@ -108,6 +109,20 @@
  * will be set.
  */
 
+/*
+ * Scrub probe -- userspace uses this to probe if we're willing to
+ * scrub or repair a given mountpoint.
+ */
+int
+xfs_scrub_probe(
+	struct xfs_scrub_context	*sc)
+{
+	if (sc->sm->sm_ino || sc->sm->sm_agno)
+		return -EINVAL;
+
+	return 0;
+}
+
 /* Scrub setup and teardown */
 
 /* Free all the resources and finish the transactions. */
@@ -126,6 +141,10 @@ xfs_scrub_teardown(
 /* Scrubbing dispatch. */
 
 static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
+	{ /* ioctl presence test */
+		.setup	= xfs_scrub_setup_fs,
+		.scrub	= xfs_scrub_probe,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index ef7b50e..b7b9422 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -45,5 +45,6 @@ struct xfs_scrub_context {
 };
 
 /* Metadata scrubbers */
+int xfs_scrub_probe(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c
index c59fd41..88b5ccb 100644
--- a/fs/xfs/scrub/trace.c
+++ b/fs/xfs/scrub/trace.c
@@ -32,6 +32,7 @@
 #include "xfs_trans.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
+#include "scrub/common.h"
 
 /*
  * We include this last to have the helpers above available for the trace



* [PATCH 09/30] xfs: create helpers to record and deal with scrub problems
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (7 preceding siblings ...)
  2017-10-12  1:41 ` [PATCH 08/30] xfs: probe the scrub ioctl Darrick J. Wong
@ 2017-10-12  1:41 ` Darrick J. Wong
  2017-10-16  0:40   ` Dave Chinner
  2017-10-12  1:41 ` [PATCH 10/30] xfs: create helpers to scrub a metadata btree Darrick J. Wong
                   ` (20 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:41 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create helper functions to record CRC and corruption problems, and
deal with any other runtime errors that arise.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/common.c |  190 +++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.h |   23 +++++
 fs/xfs/scrub/trace.h  |  215 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 428 insertions(+)
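
The three-way contract of the *_process_error() helpers (no error,
operational error, verifier error) is easier to see in a stripped-down
model.  This is a sketch only: the model_* names are hypothetical, the
MODEL_EFS* constants are arbitrary stand-ins for the kernel-internal
codes, and the tracepoints are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Arbitrary stand-ins for the kernel's EFSBADCRC and EFSCORRUPTED. */
#define MODEL_EFSBADCRC		1001
#define MODEL_EFSCORRUPTED	1002

/* Same bit position as XFS_SCRUB_OFLAG_CORRUPT. */
#define MODEL_OFLAG_CORRUPT	(1u << 1)

/*
 * Returns true if the caller may continue with its next check.
 * Verifier errors are converted into the CORRUPT flag and *error is
 * cleared; any other error is left in *error for the caller to return.
 */
static bool model_process_error(uint32_t *sm_flags, int *error)
{
	assert(sm_flags != NULL && error != NULL);

	switch (*error) {
	case 0:
		return true;
	case -MODEL_EFSBADCRC:
	case -MODEL_EFSCORRUPTED:
		/* Note the badness but don't abort. */
		*sm_flags |= MODEL_OFLAG_CORRUPT;
		*error = 0;
		break;
	default:
		break;
	}
	return false;
}

/* Typical caller shape: stop this check, hand back whatever is left. */
static int model_check_step(uint32_t *sm_flags, int step_result)
{
	int error = step_result;

	if (!model_process_error(sm_flags, &error))
		return error;
	/* ... further checks would go here ... */
	return 0;
}
```

Note that a verifier error makes the step return zero with the flag
set, so corruption is reported through sm_flags, not through errno.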


diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index d2c8f94..709d491 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -47,6 +47,196 @@
 
 /* Common code for the metadata scrubbers. */
 
+/*
+ * Handling operational errors.
+ *
+ * The *_process_error() family of functions is used to process error return
+ * codes from functions called as part of a scrub operation.
+ *
+ * If there's no error, we return true to tell the caller that it's ok
+ * to move on to the next check in its list.
+ *
+ * For non-verifier errors (e.g. ENOMEM) we return false to tell the
+ * caller that something bad happened, and we preserve *error so that
+ * the caller can return the *error up the stack to userspace.
+ *
+ * Verifier errors (EFSBADCRC/EFSCORRUPTED) are recorded by setting
+ * OFLAG_CORRUPT in sm_flags and the *error is cleared.  In other words,
+ * we track verifier errors (and failed scrub checks) via OFLAG_CORRUPT,
+ * not via return codes.  We return false to tell the caller that
+ * something bad happened.  Since the error has been cleared, the caller
+ * will (presumably) return that zero and scrubbing will move on to
+ * whatever's next.
+ *
+ * ftrace can be used to record the precise metadata location and the
+ * approximate code location of the failed operation.
+ */
+
+/* Check for operational errors. */
+bool
+xfs_scrub_process_error(
+	struct xfs_scrub_context	*sc,
+	xfs_agnumber_t			agno,
+	xfs_agblock_t			bno,
+	int				*error)
+{
+	switch (*error) {
+	case 0:
+		return true;
+	case -EDEADLOCK:
+		/* Used to restart an op with deadlock avoidance. */
+		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
+		break;
+	case -EFSBADCRC:
+	case -EFSCORRUPTED:
+		/* Note the badness but don't abort. */
+		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+		*error = 0;
+		/* fall through */
+	default:
+		trace_xfs_scrub_op_error(sc, agno, bno, *error,
+				__return_address);
+		break;
+	}
+	return false;
+}
+
+/* Check for operational errors for a file offset. */
+bool
+xfs_scrub_fblock_process_error(
+	struct xfs_scrub_context	*sc,
+	int				whichfork,
+	xfs_fileoff_t			offset,
+	int				*error)
+{
+	switch (*error) {
+	case 0:
+		return true;
+	case -EDEADLOCK:
+		/* Used to restart an op with deadlock avoidance. */
+		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
+		break;
+	case -EFSBADCRC:
+	case -EFSCORRUPTED:
+		/* Note the badness but don't abort. */
+		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+		*error = 0;
+		/* fall through */
+	default:
+		trace_xfs_scrub_file_op_error(sc, whichfork, offset, *error,
+				__return_address);
+		break;
+	}
+	return false;
+}
+
+/*
+ * Handling scrub corruption/optimization/warning checks.
+ *
+ * The *_set_{corrupt,preen,warning}() family of functions are used to
+ * record the presence of metadata that is incorrect (corrupt), could be
+ * optimized somehow (preen), or should be flagged for administrative
+ * review but is not incorrect (warn).
+ *
+ * ftrace can be used to record the precise metadata location and
+ * approximate code location of the failed check.
+ */
+
+/* Record a block which could be optimized. */
+void
+xfs_scrub_block_set_preen(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp)
+{
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_PREEN;
+	trace_xfs_scrub_block_preen(sc, bp->b_bn, __return_address);
+}
+
+/*
+ * Record an inode which could be optimized.  The trace data will
+ * include the block given by bp if bp is given; otherwise it will use
+ * the block location of the inode record itself.
+ */
+void
+xfs_scrub_ino_set_preen(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp)
+{
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_PREEN;
+	trace_xfs_scrub_ino_preen(sc, sc->ip->i_ino, bp ? bp->b_bn : 0,
+			__return_address);
+}
+
+/* Record a corrupt block. */
+void
+xfs_scrub_block_set_corrupt(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp)
+{
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+	trace_xfs_scrub_block_error(sc, bp->b_bn, __return_address);
+}
+
+/*
+ * Record a corrupt inode.  The trace data will include the block given
+ * by bp if bp is given; otherwise it will use the block location of the
+ * inode record itself.
+ */
+void
+xfs_scrub_ino_set_corrupt(
+	struct xfs_scrub_context	*sc,
+	xfs_ino_t			ino,
+	struct xfs_buf			*bp)
+{
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+	trace_xfs_scrub_ino_error(sc, ino, bp ? bp->b_bn : 0, __return_address);
+}
+
+/* Record corruption in a block indexed by a file fork. */
+void
+xfs_scrub_fblock_set_corrupt(
+	struct xfs_scrub_context	*sc,
+	int				whichfork,
+	xfs_fileoff_t			offset)
+{
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+	trace_xfs_scrub_fblock_error(sc, whichfork, offset, __return_address);
+}
+
+/*
+ * Warn about an inode that needs administrative review but is not
+ * incorrect.
+ */
+void
+xfs_scrub_ino_set_warning(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp)
+{
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_WARNING;
+	trace_xfs_scrub_ino_warning(sc, sc->ip->i_ino, bp ? bp->b_bn : 0,
+			__return_address);
+}
+
+/* Warn about a block indexed by a file fork that needs review. */
+void
+xfs_scrub_fblock_set_warning(
+	struct xfs_scrub_context	*sc,
+	int				whichfork,
+	xfs_fileoff_t			offset)
+{
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_WARNING;
+	trace_xfs_scrub_fblock_warning(sc, whichfork, offset, __return_address);
+}
+
+/* Signal an incomplete scrub. */
+void
+xfs_scrub_set_incomplete(
+	struct xfs_scrub_context	*sc)
+{
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_INCOMPLETE;
+	trace_xfs_scrub_incomplete(sc, __return_address);
+}
+
 /* Per-scrubber setup functions */
 
 /* Set us up with a transaction and an empty context. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index f3d5865..a7c3361 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -33,6 +33,29 @@ xfs_scrub_trans_alloc(
 	return xfs_trans_alloc_empty(mp, tpp);
 }
 
+bool xfs_scrub_process_error(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
+		xfs_agblock_t bno, int *error);
+bool xfs_scrub_fblock_process_error(struct xfs_scrub_context *sc, int whichfork,
+		xfs_fileoff_t offset, int *error);
+
+void xfs_scrub_block_set_preen(struct xfs_scrub_context *sc,
+		struct xfs_buf *bp);
+void xfs_scrub_ino_set_preen(struct xfs_scrub_context *sc, struct xfs_buf *bp);
+
+void xfs_scrub_block_set_corrupt(struct xfs_scrub_context *sc,
+		struct xfs_buf *bp);
+void xfs_scrub_ino_set_corrupt(struct xfs_scrub_context *sc, xfs_ino_t ino,
+		struct xfs_buf *bp);
+void xfs_scrub_fblock_set_corrupt(struct xfs_scrub_context *sc, int whichfork,
+		xfs_fileoff_t offset);
+
+void xfs_scrub_ino_set_warning(struct xfs_scrub_context *sc,
+		struct xfs_buf *bp);
+void xfs_scrub_fblock_set_warning(struct xfs_scrub_context *sc, int whichfork,
+		xfs_fileoff_t offset);
+
+void xfs_scrub_set_incomplete(struct xfs_scrub_context *sc);
+
 /* Setup functions */
 int xfs_scrub_setup_fs(struct xfs_scrub_context *sc, struct xfs_inode *ip);
 
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 688517e..d970659 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -24,6 +24,7 @@
 #define _TRACE_XFS_SCRUB_TRACE_H
 
 #include <linux/tracepoint.h>
+#include "xfs_bit.h"
 
 DECLARE_EVENT_CLASS(xfs_scrub_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
@@ -67,6 +68,220 @@ DEFINE_EVENT(xfs_scrub_class, name, \
 
 DEFINE_SCRUB_EVENT(xfs_scrub_start);
 DEFINE_SCRUB_EVENT(xfs_scrub_done);
+DEFINE_SCRUB_EVENT(xfs_scrub_deadlock_retry);
+
+TRACE_EVENT(xfs_scrub_op_error,
+	TP_PROTO(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
+		 xfs_agblock_t bno, int error, void *ret_ip),
+	TP_ARGS(sc, agno, bno, error, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned int, type)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__field(int, error)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->type = sc->sm->sm_type;
+		__entry->agno = agno;
+		__entry->bno = bno;
+		__entry->error = error;
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d type %u agno %u agbno %u error %d ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->type,
+		  __entry->agno,
+		  __entry->bno,
+		  __entry->error,
+		  __entry->ret_ip)
+);
+
+TRACE_EVENT(xfs_scrub_file_op_error,
+	TP_PROTO(struct xfs_scrub_context *sc, int whichfork,
+		 xfs_fileoff_t offset, int error, void *ret_ip),
+	TP_ARGS(sc, whichfork, offset, error, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(int, whichfork)
+		__field(unsigned int, type)
+		__field(xfs_fileoff_t, offset)
+		__field(int, error)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = sc->ip->i_mount->m_super->s_dev;
+		__entry->ino = sc->ip->i_ino;
+		__entry->whichfork = whichfork;
+		__entry->type = sc->sm->sm_type;
+		__entry->offset = offset;
+		__entry->error = error;
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d ino %llu fork %d type %u offset %llu error %d ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->whichfork,
+		  __entry->type,
+		  __entry->offset,
+		  __entry->error,
+		  __entry->ret_ip)
+);
+
+DECLARE_EVENT_CLASS(xfs_scrub_block_error_class,
+	TP_PROTO(struct xfs_scrub_context *sc, xfs_daddr_t daddr, void *ret_ip),
+	TP_ARGS(sc, daddr, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned int, type)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		xfs_fsblock_t	fsbno;
+		xfs_agnumber_t	agno;
+		xfs_agblock_t	bno;
+
+		fsbno = XFS_DADDR_TO_FSB(sc->mp, daddr);
+		agno = XFS_FSB_TO_AGNO(sc->mp, fsbno);
+		bno = XFS_FSB_TO_AGBNO(sc->mp, fsbno);
+
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->type = sc->sm->sm_type;
+		__entry->agno = agno;
+		__entry->bno = bno;
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d type %u agno %u agbno %u ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->type,
+		  __entry->agno,
+		  __entry->bno,
+		  __entry->ret_ip)
+)
+
+#define DEFINE_SCRUB_BLOCK_ERROR_EVENT(name) \
+DEFINE_EVENT(xfs_scrub_block_error_class, name, \
+	TP_PROTO(struct xfs_scrub_context *sc, xfs_daddr_t daddr, \
+		 void *ret_ip), \
+	TP_ARGS(sc, daddr, ret_ip))
+
+DEFINE_SCRUB_BLOCK_ERROR_EVENT(xfs_scrub_block_error);
+DEFINE_SCRUB_BLOCK_ERROR_EVENT(xfs_scrub_block_preen);
+
+DECLARE_EVENT_CLASS(xfs_scrub_ino_error_class,
+	TP_PROTO(struct xfs_scrub_context *sc, xfs_ino_t ino, xfs_daddr_t daddr,
+		 void *ret_ip),
+	TP_ARGS(sc, ino, daddr, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(unsigned int, type)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		xfs_fsblock_t	fsbno;
+		xfs_agnumber_t	agno;
+		xfs_agblock_t	bno;
+
+		if (daddr) {
+			fsbno = XFS_DADDR_TO_FSB(sc->mp, daddr);
+			agno = XFS_FSB_TO_AGNO(sc->mp, fsbno);
+			bno = XFS_FSB_TO_AGBNO(sc->mp, fsbno);
+		} else {
+			agno = XFS_INO_TO_AGNO(sc->mp, ino);
+			bno = XFS_AGINO_TO_AGBNO(sc->mp,
+					XFS_INO_TO_AGINO(sc->mp, ino));
+		}
+
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->ino = ino;
+		__entry->type = sc->sm->sm_type;
+		__entry->agno = agno;
+		__entry->bno = bno;
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d ino %llu type %u agno %u agbno %u ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->type,
+		  __entry->agno,
+		  __entry->bno,
+		  __entry->ret_ip)
+)
+
+#define DEFINE_SCRUB_INO_ERROR_EVENT(name) \
+DEFINE_EVENT(xfs_scrub_ino_error_class, name, \
+	TP_PROTO(struct xfs_scrub_context *sc, xfs_ino_t ino, \
+		 xfs_daddr_t daddr, void *ret_ip), \
+	TP_ARGS(sc, ino, daddr, ret_ip))
+
+DEFINE_SCRUB_INO_ERROR_EVENT(xfs_scrub_ino_error);
+DEFINE_SCRUB_INO_ERROR_EVENT(xfs_scrub_ino_preen);
+DEFINE_SCRUB_INO_ERROR_EVENT(xfs_scrub_ino_warning);
+
+DECLARE_EVENT_CLASS(xfs_scrub_fblock_error_class,
+	TP_PROTO(struct xfs_scrub_context *sc, int whichfork,
+		 xfs_fileoff_t offset, void *ret_ip),
+	TP_ARGS(sc, whichfork, offset, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(int, whichfork)
+		__field(unsigned int, type)
+		__field(xfs_fileoff_t, offset)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = sc->ip->i_mount->m_super->s_dev;
+		__entry->ino = sc->ip->i_ino;
+		__entry->whichfork = whichfork;
+		__entry->type = sc->sm->sm_type;
+		__entry->offset = offset;
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d ino %llu fork %d type %u offset %llu ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->whichfork,
+		  __entry->type,
+		  __entry->offset,
+		  __entry->ret_ip)
+);
+
+#define DEFINE_SCRUB_FBLOCK_ERROR_EVENT(name) \
+DEFINE_EVENT(xfs_scrub_fblock_error_class, name, \
+	TP_PROTO(struct xfs_scrub_context *sc, int whichfork, \
+		 xfs_fileoff_t offset, void *ret_ip), \
+	TP_ARGS(sc, whichfork, offset, ret_ip))
+
+DEFINE_SCRUB_FBLOCK_ERROR_EVENT(xfs_scrub_fblock_error);
+DEFINE_SCRUB_FBLOCK_ERROR_EVENT(xfs_scrub_fblock_warning);
+
+TRACE_EVENT(xfs_scrub_incomplete,
+	TP_PROTO(struct xfs_scrub_context *sc, void *ret_ip),
+	TP_ARGS(sc, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned int, type)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->type = sc->sm->sm_type;
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d type %u ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->type,
+		  __entry->ret_ip)
+);
 
 #endif /* _TRACE_XFS_SCRUB_TRACE_H */
 



* [PATCH 10/30] xfs: create helpers to scrub a metadata btree
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (8 preceding siblings ...)
  2017-10-12  1:41 ` [PATCH 09/30] xfs: create helpers to record and deal with scrub problems Darrick J. Wong
@ 2017-10-12  1:41 ` Darrick J. Wong
  2017-10-16  0:56   ` Dave Chinner
  2017-10-12  1:41 ` [PATCH 11/30] xfs: scrub the shape of " Darrick J. Wong
                   ` (19 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:41 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create helper functions and tracepoints to deal with errors while
scrubbing a metadata btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile      |    1 
 fs/xfs/scrub/btree.c |  114 +++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/btree.h |   57 +++++++++++++++++
 fs/xfs/scrub/trace.c |   17 +++++
 fs/xfs/scrub/trace.h |  163 ++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 352 insertions(+)
 create mode 100644 fs/xfs/scrub/btree.c
 create mode 100644 fs/xfs/scrub/btree.h
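
xfs_scrub_btree() is only a stub in this patch, so the following is a guess
at how the record callback and the firstrec/lastrec fields in struct
xfs_scrub_btree might fit together once the walk is filled in.  It is a
userspace model, not the real thing: records are plain ints, the ordering
rule (strictly increasing) is an assumption, and every model_* name is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_btree;
typedef int (*model_rec_fn)(struct model_btree *bs, const int *rec);

struct model_btree {
	/* caller-provided scrub state */
	model_rec_fn	scrub_rec;
	void		*private;

	/* internal scrub state */
	int		lastrec;
	bool		firstrec;
	bool		corrupt;	/* stands in for OFLAG_CORRUPT */
};

/* Walk the records in order, flag misordering, then call the callback. */
static int
model_scrub_records(struct model_btree *bs, const int *recs, size_t nr)
{
	size_t	i;
	int	error;

	bs->firstrec = true;
	for (i = 0; i < nr; i++) {
		/* Nothing to compare the very first record against. */
		if (!bs->firstrec && recs[i] <= bs->lastrec)
			bs->corrupt = true;
		bs->firstrec = false;
		bs->lastrec = recs[i];

		/* A nonzero return from the callback stops the walk. */
		error = bs->scrub_rec(bs, &recs[i]);
		if (error)
			return error;
	}
	return 0;
}

/* Example callback: count the records seen via the private pointer. */
static int
model_count_rec(struct model_btree *bs, const int *rec)
{
	(void)rec;
	(*(size_t *)bs->private)++;
	return 0;
}
```

As in the helpers above, a misordered record only sets the corrupt flag and
the walk carries on; only a nonzero return from the callback ends it early.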


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index ca14595..5888b9f 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -146,6 +146,7 @@ ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
 
 xfs-y				+= $(addprefix scrub/, \
 				   trace.o \
+				   btree.o \
 				   common.o \
 				   scrub.o \
 				   )
diff --git a/fs/xfs/scrub/btree.c b/fs/xfs/scrub/btree.c
new file mode 100644
index 0000000..28539081
--- /dev/null
+++ b/fs/xfs/scrub/btree.c
@@ -0,0 +1,114 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_alloc.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+
+/* btree scrubbing */
+
+/*
+ * Check for btree operation errors.  See the section about handling
+ * operational errors in common.c.
+ */
+bool
+xfs_scrub_btree_process_error(
+	struct xfs_scrub_context	*sc,
+	struct xfs_btree_cur		*cur,
+	int				level,
+	int				*error)
+{
+	if (*error == 0)
+		return true;
+
+	switch (*error) {
+	case -EDEADLOCK:
+		/* Used to restart an op with deadlock avoidance. */
+		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
+		break;
+	case -EFSBADCRC:
+	case -EFSCORRUPTED:
+		/* Note the badness but don't abort. */
+		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+		*error = 0;
+		/* fall through */
+	default:
+		if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
+			trace_xfs_scrub_ifork_btree_op_error(sc, cur, level,
+					*error, __return_address);
+		else
+			trace_xfs_scrub_btree_op_error(sc, cur, level,
+					*error, __return_address);
+		break;
+	}
+	return false;
+}
+
+/* Record btree block corruption. */
+void
+xfs_scrub_btree_set_corrupt(
+	struct xfs_scrub_context	*sc,
+	struct xfs_btree_cur		*cur,
+	int				level)
+{
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+
+	if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
+		trace_xfs_scrub_ifork_btree_error(sc, cur, level,
+				__return_address);
+	else
+		trace_xfs_scrub_btree_error(sc, cur, level,
+				__return_address);
+}
+
+/*
+ * Visit all nodes and leaves of a btree.  Check that all pointers and
+ * records are in order, that the keys reflect the records, and use a callback
+ * so that the caller can verify individual records.  The callback is the same
+ * as the one for xfs_btree_query_range, so therefore this function also
+ * returns XFS_BTREE_QUERY_RANGE_ABORT, zero, or a negative error code.
+ */
+int
+xfs_scrub_btree(
+	struct xfs_scrub_context	*sc,
+	struct xfs_btree_cur		*cur,
+	xfs_scrub_btree_rec_fn		scrub_fn,
+	struct xfs_owner_info		*oinfo,
+	void				*private)
+{
+	int				error = -EOPNOTSUPP;
+
+	xfs_scrub_btree_process_error(sc, cur, 0, &error);
+	return error;
+}
diff --git a/fs/xfs/scrub/btree.h b/fs/xfs/scrub/btree.h
new file mode 100644
index 0000000..4de825a6
--- /dev/null
+++ b/fs/xfs/scrub/btree.h
@@ -0,0 +1,57 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef __XFS_SCRUB_BTREE_H__
+#define __XFS_SCRUB_BTREE_H__
+
+/* btree scrub */
+
+/* Check for btree operation errors. */
+bool xfs_scrub_btree_process_error(struct xfs_scrub_context *sc,
+		struct xfs_btree_cur *cur, int level, int *error);
+
+/* Record btree block corruption. */
+void xfs_scrub_btree_set_corrupt(struct xfs_scrub_context *sc,
+		struct xfs_btree_cur *cur, int level);
+
+struct xfs_scrub_btree;
+typedef int (*xfs_scrub_btree_rec_fn)(
+	struct xfs_scrub_btree	*bs,
+	union xfs_btree_rec	*rec);
+
+struct xfs_scrub_btree {
+	/* caller-provided scrub state */
+	struct xfs_scrub_context	*sc;
+	struct xfs_btree_cur		*cur;
+	xfs_scrub_btree_rec_fn		scrub_rec;
+	struct xfs_owner_info		*oinfo;
+	void				*private;
+
+	/* internal scrub state */
+	union xfs_btree_rec		lastrec;
+	bool				firstrec;
+	union xfs_btree_key		lastkey[XFS_BTREE_MAXLEVELS];
+	bool				firstkey[XFS_BTREE_MAXLEVELS];
+	struct list_head		to_check;
+};
+int xfs_scrub_btree(struct xfs_scrub_context *sc, struct xfs_btree_cur *cur,
+		    xfs_scrub_btree_rec_fn scrub_fn,
+		    struct xfs_owner_info *oinfo, void *private);
+
+#endif /* __XFS_SCRUB_BTREE_H__ */
diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c
index 88b5ccb..472080e 100644
--- a/fs/xfs/scrub/trace.c
+++ b/fs/xfs/scrub/trace.c
@@ -30,10 +30,27 @@
 #include "xfs_inode.h"
 #include "xfs_btree.h"
 #include "xfs_trans.h"
+#include "xfs_bit.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 
+/* Figure out which block the btree cursor was pointing to. */
+static inline xfs_fsblock_t
+xfs_scrub_btree_cur_fsbno(
+	struct xfs_btree_cur		*cur,
+	int				level)
+{
+	if (level < cur->bc_nlevels && cur->bc_bufs[level])
+		return XFS_DADDR_TO_FSB(cur->bc_mp, cur->bc_bufs[level]->b_bn);
+	else if (level == cur->bc_nlevels - 1 &&
+		 cur->bc_flags & XFS_BTREE_LONG_PTRS)
+		return XFS_INO_TO_FSB(cur->bc_mp, cur->bc_private.b.ip->i_ino);
+	else if (!(cur->bc_flags & XFS_BTREE_LONG_PTRS))
+		return XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno, 0);
+	return NULLFSBLOCK;
+}
+
 /*
  * We include this last to have the helpers above available for the trace
  * event implementations.
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index d970659..147ea0b 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -283,6 +283,169 @@ TRACE_EVENT(xfs_scrub_incomplete,
 		  __entry->ret_ip)
 );
 
+TRACE_EVENT(xfs_scrub_btree_op_error,
+	TP_PROTO(struct xfs_scrub_context *sc, struct xfs_btree_cur *cur,
+		 int level, int error, void *ret_ip),
+	TP_ARGS(sc, cur, level, error, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned int, type)
+		__field(xfs_btnum_t, btnum)
+		__field(int, level)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__field(int, ptr)
+		__field(int, error)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		xfs_fsblock_t fsbno = xfs_scrub_btree_cur_fsbno(cur, level);
+
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->type = sc->sm->sm_type;
+		__entry->btnum = cur->bc_btnum;
+		__entry->level = level;
+		__entry->agno = XFS_FSB_TO_AGNO(cur->bc_mp, fsbno);
+		__entry->bno = XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno);
+		__entry->ptr = cur->bc_ptrs[level];
+		__entry->error = error;
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d type %u btnum %d level %d ptr %d agno %u agbno %u error %d ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->type,
+		  __entry->btnum,
+		  __entry->level,
+		  __entry->ptr,
+		  __entry->agno,
+		  __entry->bno,
+		  __entry->error,
+		  __entry->ret_ip)
+);
+
+TRACE_EVENT(xfs_scrub_ifork_btree_op_error,
+	TP_PROTO(struct xfs_scrub_context *sc, struct xfs_btree_cur *cur,
+		 int level, int error, void *ret_ip),
+	TP_ARGS(sc, cur, level, error, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(int, whichfork)
+		__field(unsigned int, type)
+		__field(xfs_btnum_t, btnum)
+		__field(int, level)
+		__field(int, ptr)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__field(int, error)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		xfs_fsblock_t fsbno = xfs_scrub_btree_cur_fsbno(cur, level);
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->ino = sc->ip->i_ino;
+		__entry->whichfork = cur->bc_private.b.whichfork;
+		__entry->type = sc->sm->sm_type;
+		__entry->btnum = cur->bc_btnum;
+		__entry->level = level;
+		__entry->ptr = cur->bc_ptrs[level];
+		__entry->agno = XFS_FSB_TO_AGNO(cur->bc_mp, fsbno);
+		__entry->bno = XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno);
+		__entry->error = error;
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d ino %llu fork %d type %u btnum %d level %d ptr %d agno %u agbno %u error %d ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->whichfork,
+		  __entry->type,
+		  __entry->btnum,
+		  __entry->level,
+		  __entry->ptr,
+		  __entry->agno,
+		  __entry->bno,
+		  __entry->error,
+		  __entry->ret_ip)
+);
+
+TRACE_EVENT(xfs_scrub_btree_error,
+	TP_PROTO(struct xfs_scrub_context *sc, struct xfs_btree_cur *cur,
+		 int level, void *ret_ip),
+	TP_ARGS(sc, cur, level, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned int, type)
+		__field(xfs_btnum_t, btnum)
+		__field(int, level)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__field(int, ptr)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		xfs_fsblock_t fsbno = xfs_scrub_btree_cur_fsbno(cur, level);
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->type = sc->sm->sm_type;
+		__entry->btnum = cur->bc_btnum;
+		__entry->level = level;
+		__entry->agno = XFS_FSB_TO_AGNO(cur->bc_mp, fsbno);
+		__entry->bno = XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno);
+		__entry->ptr = cur->bc_ptrs[level];
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d type %u btnum %d level %d ptr %d agno %u agbno %u ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->type,
+		  __entry->btnum,
+		  __entry->level,
+		  __entry->ptr,
+		  __entry->agno,
+		  __entry->bno,
+		  __entry->ret_ip)
+);
+
+TRACE_EVENT(xfs_scrub_ifork_btree_error,
+	TP_PROTO(struct xfs_scrub_context *sc, struct xfs_btree_cur *cur,
+		 int level, void *ret_ip),
+	TP_ARGS(sc, cur, level, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(int, whichfork)
+		__field(unsigned int, type)
+		__field(xfs_btnum_t, btnum)
+		__field(int, level)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__field(int, ptr)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		xfs_fsblock_t fsbno = xfs_scrub_btree_cur_fsbno(cur, level);
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->ino = sc->ip->i_ino;
+		__entry->whichfork = cur->bc_private.b.whichfork;
+		__entry->type = sc->sm->sm_type;
+		__entry->btnum = cur->bc_btnum;
+		__entry->level = level;
+		__entry->agno = XFS_FSB_TO_AGNO(cur->bc_mp, fsbno);
+		__entry->bno = XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno);
+		__entry->ptr = cur->bc_ptrs[level];
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d ino %llu fork %d type %u btnum %d level %d ptr %d agno %u agbno %u ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->whichfork,
+		  __entry->type,
+		  __entry->btnum,
+		  __entry->level,
+		  __entry->ptr,
+		  __entry->agno,
+		  __entry->bno,
+		  __entry->ret_ip)
+);
+
 #endif /* _TRACE_XFS_SCRUB_TRACE_H */
 
 #undef TRACE_INCLUDE_PATH



* [PATCH 11/30] xfs: scrub the shape of a metadata btree
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (9 preceding siblings ...)
  2017-10-12  1:41 ` [PATCH 10/30] xfs: create helpers to scrub a metadata btree Darrick J. Wong
@ 2017-10-12  1:41 ` Darrick J. Wong
  2017-10-16  1:29   ` Dave Chinner
  2017-10-12  1:42 ` [PATCH 12/30] xfs: scrub btree keys and records Darrick J. Wong
                   ` (18 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:41 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create a function that can check the shape of a btree -- each block
passes basic inspection and all the pointers look ok.  In the next patch
we'll add the ability to check the actual keys and records stored within
the btree.  Add some helper functions so that we report detailed scrub
errors in a uniform manner in dmesg.  These are helper functions for
subsequent patches.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_btree.c |   16 +++
 fs/xfs/libxfs/xfs_btree.h |    7 +
 fs/xfs/scrub/btree.c      |  249 ++++++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/scrub/common.h     |   18 +++
 4 files changed, 283 insertions(+), 7 deletions(-)
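
To make the sibling check easier to review, here is a small userspace model
of the idea: a block's left/right sibling pointer should equal the pointer in
the neighbouring slot at the parent level.  This is a sketch under
simplifying assumptions, not the kernel implementation: the parent level is
flattened into one array (the real code steps a duplicated cursor, which
handles crossing parent blocks), pointers are already in CPU byte order, and
the model_* names and MODEL_NULLPTR are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NULLPTR	UINT64_MAX	/* stands in for a null btree pointer */

/* Signed difference of two block pointers; zero means they match. */
static int64_t
model_diff_two_ptrs(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b);
}

/*
 * parent_ptrs[] lists every child pointer at the parent level, in order;
 * idx is the slot that points at the block under test.  direction is -1
 * for the left sibling and 1 for the right sibling.  Returns true if the
 * block's sibling pointer is consistent with the parent level.
 */
static bool
model_sibling_ok(const uint64_t *parent_ptrs, int nr, int idx, int direction,
		 uint64_t sibling)
{
	int	nidx = idx + direction;

	/* A null sibling pointer is not cross-checked, as in the patch. */
	if (sibling == MODEL_NULLPTR)
		return true;

	/* A non-null sibling with no neighbouring parent slot is corrupt. */
	if (nidx < 0 || nidx >= nr)
		return false;

	return model_diff_two_ptrs(parent_ptrs[nidx], sibling) == 0;
}
```

In the patch, each "return false" case corresponds to a call to
xfs_scrub_btree_set_corrupt() rather than an error return, so a bad sibling
pointer marks the metadata corrupt without aborting the scrub.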


diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 2266a5a..dc23407 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -1051,7 +1051,7 @@ xfs_btree_setbuf(
 	}
 }
 
-STATIC int
+bool
 xfs_btree_ptr_is_null(
 	struct xfs_btree_cur	*cur,
 	union xfs_btree_ptr	*ptr)
@@ -1076,7 +1076,7 @@ xfs_btree_set_ptr_null(
 /*
  * Get/set/init sibling pointers
  */
-STATIC void
+void
 xfs_btree_get_sibling(
 	struct xfs_btree_cur	*cur,
 	struct xfs_btree_block	*block,
@@ -4938,3 +4938,15 @@ xfs_btree_count_blocks(
 	return xfs_btree_visit_blocks(cur, xfs_btree_count_blocks_helper,
 			blocks);
 }
+
+/* Compare two btree pointers. */
+int64_t
+xfs_btree_diff_two_ptrs(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_ptr	*a,
+	const union xfs_btree_ptr	*b)
+{
+	if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
+		return (int64_t)be64_to_cpu(a->l) - be64_to_cpu(b->l);
+	return (int64_t)be32_to_cpu(a->s) - be32_to_cpu(b->s);
+}
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index baf7064..a8431bc 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -531,5 +531,12 @@ int xfs_btree_lookup_get_block(struct xfs_btree_cur *cur, int level,
 		union xfs_btree_ptr *pp, struct xfs_btree_block **blkp);
 struct xfs_btree_block *xfs_btree_get_block(struct xfs_btree_cur *cur,
 		int level, struct xfs_buf **bpp);
+bool xfs_btree_ptr_is_null(struct xfs_btree_cur *cur, union xfs_btree_ptr *ptr);
+int64_t xfs_btree_diff_two_ptrs(struct xfs_btree_cur *cur,
+				const union xfs_btree_ptr *a,
+				const union xfs_btree_ptr *b);
+void xfs_btree_get_sibling(struct xfs_btree_cur *cur,
+			   struct xfs_btree_block *block,
+			   union xfs_btree_ptr *ptr, int lr);
 
 #endif	/* __XFS_BTREE_H__ */
diff --git a/fs/xfs/scrub/btree.c b/fs/xfs/scrub/btree.c
index 28539081..68dec6a 100644
--- a/fs/xfs/scrub/btree.c
+++ b/fs/xfs/scrub/btree.c
@@ -93,11 +93,170 @@ xfs_scrub_btree_set_corrupt(
 }
 
 /*
+ * Check a btree pointer.  Returns true if it's ok to use this pointer.
+ * Callers do not need to set the corrupt flag.
+ */
+static bool
+xfs_scrub_btree_ptr_ok(
+	struct xfs_scrub_btree		*bs,
+	int				level,
+	union xfs_btree_ptr		*ptr)
+{
+	bool				res;
+
+	/* A btree rooted in an inode has no block pointer to the root. */
+	if ((bs->cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) &&
+	    level == bs->cur->bc_nlevels)
+		return true;
+
+	/* Otherwise, check the pointers. */
+	if (bs->cur->bc_flags & XFS_BTREE_LONG_PTRS) {
+		res = xfs_btree_check_lptr(bs->cur, be64_to_cpu(ptr->l), level);
+		if (!res)
+			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, level);
+	} else {
+		res = xfs_btree_check_sptr(bs->cur, be32_to_cpu(ptr->s), level);
+		if (!res)
+			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, level);
+	}
+
+	return res;
+}
+
+/* Check that a btree block's sibling matches what we expect it to be. */
+STATIC int
+xfs_scrub_btree_block_check_sibling(
+	struct xfs_scrub_btree		*bs,
+	int				level,
+	int				direction,
+	union xfs_btree_ptr		*sibling)
+{
+	struct xfs_btree_cur		*cur = bs->cur;
+	struct xfs_btree_block		*pblock;
+	struct xfs_buf			*pbp;
+	struct xfs_btree_cur		*ncur = NULL;
+	union xfs_btree_ptr		*pp;
+	int				success;
+	int				error;
+
+	if (xfs_btree_ptr_is_null(cur, sibling))
+		return 0;
+
+	error = xfs_btree_dup_cursor(cur, &ncur);
+	if (!xfs_scrub_btree_process_error(bs->sc, cur, level + 1, &error) ||
+	    !ncur)
+		return error;
+
+	if (direction > 0)
+		error = xfs_btree_increment(ncur, level + 1, &success);
+	else
+		error = xfs_btree_decrement(ncur, level + 1, &success);
+	if (!xfs_scrub_btree_process_error(bs->sc, cur, level + 1, &error))
+		goto out;
+	if (!success) {
+		xfs_scrub_btree_set_corrupt(bs->sc, cur, level + 1);
+		goto out;
+	}
+
+	pblock = xfs_btree_get_block(ncur, level + 1, &pbp);
+	pp = xfs_btree_ptr_addr(ncur, ncur->bc_ptrs[level + 1], pblock);
+	if (!xfs_scrub_btree_ptr_ok(bs, level + 1, pp))
+		goto out;
+
+	if (xfs_btree_diff_two_ptrs(cur, pp, sibling))
+		xfs_scrub_btree_set_corrupt(bs->sc, cur, level);
+out:
+	xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
+	return error;
+}
+
+/* Check the siblings of a btree block. */
+STATIC int
+xfs_scrub_btree_block_check_siblings(
+	struct xfs_scrub_btree		*bs,
+	struct xfs_btree_block		*block)
+{
+	struct xfs_btree_cur		*cur = bs->cur;
+	union xfs_btree_ptr		leftsib;
+	union xfs_btree_ptr		rightsib;
+	int				level;
+	int				error = 0;
+
+	xfs_btree_get_sibling(cur, block, &leftsib, XFS_BB_LEFTSIB);
+	xfs_btree_get_sibling(cur, block, &rightsib, XFS_BB_RIGHTSIB);
+	level = xfs_btree_get_level(block);
+
+	/* Root block should never have siblings. */
+	if (level == cur->bc_nlevels - 1) {
+		if (!xfs_btree_ptr_is_null(cur, &leftsib) ||
+		    !xfs_btree_ptr_is_null(cur, &rightsib))
+			xfs_scrub_btree_set_corrupt(bs->sc, cur, level);
+		goto out;
+	}
+
+	/*
+	 * Do the left & right sibling pointers match the adjacent
+	 * parent level pointers?
+	 * (These functions absorb error codes for us.)
+	 */
+	error = xfs_scrub_btree_block_check_sibling(bs, level, -1, &leftsib);
+	if (error)
+		return error;
+	error = xfs_scrub_btree_block_check_sibling(bs, level, 1, &rightsib);
+	if (error)
+		return error;
+out:
+	return error;
+}
+
+/*
+ * Grab and scrub a btree block given a btree pointer.  Returns block
+ * and buffer pointers (if applicable) if they're ok to use.
+ */
+STATIC int
+xfs_scrub_btree_get_block(
+	struct xfs_scrub_btree		*bs,
+	int				level,
+	union xfs_btree_ptr		*pp,
+	struct xfs_btree_block		**pblock,
+	struct xfs_buf			**pbp)
+{
+	void				*failed_at;
+	int				error;
+
+	error = xfs_btree_lookup_get_block(bs->cur, level, pp, pblock);
+	if (!xfs_scrub_btree_process_error(bs->sc, bs->cur, level, &error) ||
+	    !*pblock)
+		return error;
+
+	xfs_btree_get_block(bs->cur, level, pbp);
+	if (bs->cur->bc_flags & XFS_BTREE_LONG_PTRS) {
+		failed_at = __xfs_btree_check_lblock(bs->cur, *pblock,
+				level, *pbp);
+		if (failed_at) {
+			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, level);
+			return 0;
+		}
+	} else {
+		failed_at = __xfs_btree_check_sblock(bs->cur, *pblock,
+				 level, *pbp);
+		if (failed_at) {
+			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, level);
+			return 0;
+		}
+	}
+
+	/*
+	 * Check the block's siblings; this function absorbs error codes
+	 * for us.
+	 */
+	return xfs_scrub_btree_block_check_siblings(bs, *pblock);
+}
+
+/*
  * Visit all nodes and leaves of a btree.  Check that all pointers and
  * records are in order, that the keys reflect the records, and use a callback
- * so that the caller can verify individual records.  The callback is the same
- * as the one for xfs_btree_query_range, so therefore this function also
- * returns XFS_BTREE_QUERY_RANGE_ABORT, zero, or a negative error code.
+ * so that the caller can verify individual records.
  */
 int
 xfs_scrub_btree(
@@ -107,8 +266,88 @@ xfs_scrub_btree(
 	struct xfs_owner_info		*oinfo,
 	void				*private)
 {
-	int				error = -EOPNOTSUPP;
+	struct xfs_scrub_btree		bs = {0};
+	union xfs_btree_ptr		ptr;
+	union xfs_btree_ptr		*pp;
+	struct xfs_btree_block		*block;
+	int				level;
+	struct xfs_buf			*bp;
+	int				i;
+	int				error = 0;
+
+	/* Initialize scrub state */
+	bs.cur = cur;
+	bs.scrub_rec = scrub_fn;
+	bs.oinfo = oinfo;
+	bs.firstrec = true;
+	bs.private = private;
+	bs.sc = sc;
+	for (i = 0; i < XFS_BTREE_MAXLEVELS; i++)
+		bs.firstkey[i] = true;
+	INIT_LIST_HEAD(&bs.to_check);
+
+	/* Don't try to check a tree with a height we can't handle. */
+	if (cur->bc_nlevels > XFS_BTREE_MAXLEVELS) {
+		xfs_scrub_btree_set_corrupt(sc, cur, 0);
+		goto out;
+	}
+
+	/*
+	 * Load the root of the btree.  The helper function absorbs
+	 * error codes for us.
+	 */
+	level = cur->bc_nlevels - 1;
+	cur->bc_ops->init_ptr_from_cur(cur, &ptr);
+	if (!xfs_scrub_btree_ptr_ok(&bs, cur->bc_nlevels, &ptr))
+		goto out;
+	error = xfs_scrub_btree_get_block(&bs, level, &ptr, &block, &bp);
+	if (error)
+		goto out;
+
+	cur->bc_ptrs[level] = 1;
+
+	while (level < cur->bc_nlevels) {
+		block = xfs_btree_get_block(cur, level, &bp);
+
+		if (level == 0) {
+			/* End of leaf, pop back towards the root. */
+			if (cur->bc_ptrs[level] >
+			    be16_to_cpu(block->bb_numrecs)) {
+				if (level < cur->bc_nlevels - 1)
+					cur->bc_ptrs[level + 1]++;
+				level++;
+				continue;
+			}
+
+			if (xfs_scrub_should_terminate(sc, &error))
+				break;
+
+			cur->bc_ptrs[level]++;
+			continue;
+		}
+
+		/* End of node, pop back towards the root. */
+		if (cur->bc_ptrs[level] > be16_to_cpu(block->bb_numrecs)) {
+			if (level < cur->bc_nlevels - 1)
+				cur->bc_ptrs[level + 1]++;
+			level++;
+			continue;
+		}
+
+		/* Drill another level deeper. */
+		pp = xfs_btree_ptr_addr(cur, cur->bc_ptrs[level], block);
+		if (!xfs_scrub_btree_ptr_ok(&bs, level, pp)) {
+			cur->bc_ptrs[level]++;
+			continue;
+		}
+		level--;
+		error = xfs_scrub_btree_get_block(&bs, level, pp, &block, &bp);
+		if (!xfs_scrub_btree_process_error(sc, cur, level, &error))
+			goto out;
+
+		cur->bc_ptrs[level] = 1;
+	}
 
-	xfs_scrub_btree_process_error(sc, cur, 0, &error);
+out:
 	return error;
 }
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index a7c3361..414bbb8 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -21,6 +21,24 @@
 #define __XFS_SCRUB_COMMON_H__
 
 /*
+ * We /could/ terminate a scrub/repair operation early.  If we're not
+ * in a good place to continue (fatal signal, etc.) then bail out.
+ * Note that we're careful not to make any judgements about *error.
+ */
+static inline bool
+xfs_scrub_should_terminate(
+	struct xfs_scrub_context	*sc,
+	int				*error)
+{
+	if (fatal_signal_pending(current)) {
+		if (*error == 0)
+			*error = -EAGAIN;
+		return true;
+	}
+	return false;
+}
+
+/*
  * Grab an empty transaction so that we can re-grab locked buffers if
  * one of our btrees turns out to be cyclic.
  */
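
The loop in xfs_scrub_btree() above is an iterative depth-first walk that
keeps one 1-based index per level (bc_ptrs[]) instead of recursing.  A
minimal userspace sketch of that idea follows.  Everything in it (struct
blk, walk(), MAXLEVELS, MAXRECS) is made up for illustration and is not
part of this patch or of XFS; the real code also checks pointers, keys
and siblings along the way.

```c
#include <assert.h>

#define MAXLEVELS 4
#define MAXRECS   4

/* Hypothetical in-memory block; not the XFS on-disk layout. */
struct blk {
	int		level;			/* 0 == leaf */
	int		numrecs;
	int		recs[MAXRECS];		/* leaf payload */
	struct blk	*kids[MAXRECS];		/* node children */
};

/* Visit every leaf record in order; returns how many were seen. */
static int walk(struct blk *root, int nlevels, int *out, int outmax)
{
	struct blk	*cur[MAXLEVELS];	/* block at each level */
	int		ptrs[MAXLEVELS];	/* 1-based index at each level */
	int		level = nlevels - 1;
	int		n = 0;

	/* Don't try to walk a tree with a height we can't handle. */
	assert(nlevels >= 1 && nlevels <= MAXLEVELS);

	cur[level] = root;
	ptrs[level] = 1;

	while (level < nlevels) {
		struct blk *b = cur[level];

		assert(b->level == level);

		/* Ran off the end of this block: pop back towards the root. */
		if (ptrs[level] > b->numrecs) {
			if (level < nlevels - 1)
				ptrs[level + 1]++;
			level++;
			continue;
		}

		if (level == 0) {
			/* Leaf: hand the record to the caller. */
			if (n < outmax)
				out[n] = b->recs[ptrs[0] - 1];
			n++;
			ptrs[0]++;
			continue;
		}

		/* Drill another level deeper. */
		cur[level - 1] = b->kids[ptrs[level] - 1];
		level--;
		ptrs[level] = 1;
	}
	return n;
}
```

Popping a level bumps the parent's index, so the next pass through the
parent either descends into the next child or pops again.  The walk ends
when the root itself is popped and level reaches nlevels.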



* [PATCH 12/30] xfs: scrub btree keys and records
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (10 preceding siblings ...)
  2017-10-12  1:41 ` [PATCH 11/30] xfs: scrub the shape of " Darrick J. Wong
@ 2017-10-12  1:42 ` Darrick J. Wong
  2017-10-16  1:31   ` Dave Chinner
  2017-10-12  1:42 ` [PATCH 13/30] xfs: create helpers to scan an allocation group Darrick J. Wong
                   ` (17 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:42 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Teach the btree scrubber to check that keys and records are in the
right order, and to call out to our record iterator to check each
individual record.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/btree.c |  110 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/trace.h |   45 ++++++++++++++++++++
 2 files changed, 154 insertions(+), 1 deletion(-)
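
To illustrate the two checks that xfs_scrub_btree_rec() makes, here is a
small userspace sketch, assuming a simplified record that describes an
extent.  struct rec, struct keys and check_leaf() are hypothetical names
used only for this example.  The real code delegates ordering to
bc_ops->recs_inorder and only checks the high key for overlapping
btrees; this sketch always checks both bounds and assumes len >= 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: an extent [start, start + len). */
struct rec {
	unsigned int	start;
	unsigned int	len;
};

/* Hypothetical parent low/high key pair covering one leaf block. */
struct keys {
	unsigned int	low;
	unsigned int	high;
};

/* Count problems in one leaf: disorder, or records outside the parent keys. */
static int check_leaf(const struct rec *recs, size_t nr,
		      const struct keys *parent)
{
	int	bad = 0;
	size_t	i;

	assert(recs && parent);

	for (i = 0; i < nr; i++) {
		unsigned int lo = recs[i].start;
		unsigned int hi = recs[i].start + recs[i].len - 1;

		/* If this isn't the first record, are they in order? */
		if (i > 0 && recs[i - 1].start >= lo)
			bad++;
		/* Is this at least as large as the parent low key? */
		if (lo < parent->low)
			bad++;
		/* Is this no larger than the parent high key? */
		if (hi > parent->high)
			bad++;
	}
	return bad;
}
```

The scrubber itself does not count problems; it sets the corrupt flag
and keeps going.  Returning a count here just makes the sketch easy to
test.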


diff --git a/fs/xfs/scrub/btree.c b/fs/xfs/scrub/btree.c
index 68dec6a..0cd591f 100644
--- a/fs/xfs/scrub/btree.c
+++ b/fs/xfs/scrub/btree.c
@@ -93,6 +93,101 @@ xfs_scrub_btree_set_corrupt(
 }
 
 /*
+ * Make sure this record is in order and doesn't stray outside of the parent
+ * keys.
+ */
+STATIC void
+xfs_scrub_btree_rec(
+	struct xfs_scrub_btree	*bs)
+{
+	struct xfs_btree_cur	*cur = bs->cur;
+	union xfs_btree_rec	*rec;
+	union xfs_btree_key	key;
+	union xfs_btree_key	hkey;
+	union xfs_btree_key	*keyp;
+	struct xfs_btree_block	*block;
+	struct xfs_btree_block	*keyblock;
+	struct xfs_buf		*bp;
+
+	block = xfs_btree_get_block(cur, 0, &bp);
+	rec = xfs_btree_rec_addr(cur, cur->bc_ptrs[0], block);
+
+	trace_xfs_scrub_btree_rec(bs->sc, cur, 0);
+
+	/* If this isn't the first record, are they in order? */
+	if (!bs->firstrec && !cur->bc_ops->recs_inorder(cur, &bs->lastrec, rec))
+		xfs_scrub_btree_set_corrupt(bs->sc, cur, 0);
+	bs->firstrec = false;
+	memcpy(&bs->lastrec, rec, cur->bc_ops->rec_len);
+
+	if (cur->bc_nlevels == 1)
+		return;
+
+	/* Is this at least as large as the parent low key? */
+	cur->bc_ops->init_key_from_rec(&key, rec);
+	keyblock = xfs_btree_get_block(cur, 1, &bp);
+	keyp = xfs_btree_key_addr(cur, cur->bc_ptrs[1], keyblock);
+	if (cur->bc_ops->diff_two_keys(cur, &key, keyp) < 0)
+		xfs_scrub_btree_set_corrupt(bs->sc, cur, 1);
+
+	if (!(cur->bc_flags & XFS_BTREE_OVERLAPPING))
+		return;
+
+	/* Is this no larger than the parent high key? */
+	cur->bc_ops->init_high_key_from_rec(&hkey, rec);
+	keyp = xfs_btree_high_key_addr(cur, cur->bc_ptrs[1], keyblock);
+	if (cur->bc_ops->diff_two_keys(cur, keyp, &hkey) < 0)
+		xfs_scrub_btree_set_corrupt(bs->sc, cur, 1);
+}
+
+/*
+ * Make sure this key is in order and doesn't stray outside of the parent
+ * keys.
+ */
+STATIC void
+xfs_scrub_btree_key(
+	struct xfs_scrub_btree	*bs,
+	int			level)
+{
+	struct xfs_btree_cur	*cur = bs->cur;
+	union xfs_btree_key	*key;
+	union xfs_btree_key	*keyp;
+	struct xfs_btree_block	*block;
+	struct xfs_btree_block	*keyblock;
+	struct xfs_buf		*bp;
+
+	block = xfs_btree_get_block(cur, level, &bp);
+	key = xfs_btree_key_addr(cur, cur->bc_ptrs[level], block);
+
+	trace_xfs_scrub_btree_key(bs->sc, cur, level);
+
+	/* If this isn't the first key, are they in order? */
+	if (!bs->firstkey[level] &&
+	    !cur->bc_ops->keys_inorder(cur, &bs->lastkey[level], key))
+		xfs_scrub_btree_set_corrupt(bs->sc, cur, level);
+	bs->firstkey[level] = false;
+	memcpy(&bs->lastkey[level], key, cur->bc_ops->key_len);
+
+	if (level + 1 >= cur->bc_nlevels)
+		return;
+
+	/* Is this at least as large as the parent low key? */
+	keyblock = xfs_btree_get_block(cur, level + 1, &bp);
+	keyp = xfs_btree_key_addr(cur, cur->bc_ptrs[level + 1], keyblock);
+	if (cur->bc_ops->diff_two_keys(cur, key, keyp) < 0)
+		xfs_scrub_btree_set_corrupt(bs->sc, cur, level);
+
+	if (!(cur->bc_flags & XFS_BTREE_OVERLAPPING))
+		return;
+
+	/* Is this no larger than the parent high key? */
+	key = xfs_btree_high_key_addr(cur, cur->bc_ptrs[level], block);
+	keyp = xfs_btree_high_key_addr(cur, cur->bc_ptrs[level + 1], keyblock);
+	if (cur->bc_ops->diff_two_keys(cur, keyp, key) < 0)
+		xfs_scrub_btree_set_corrupt(bs->sc, cur, level);
+}
+
+/*
  * Check a btree pointer.  Returns true if it's ok to use this pointer.
  * Callers do not need to set the corrupt flag.
  */
@@ -269,6 +364,7 @@ xfs_scrub_btree(
 	struct xfs_scrub_btree		bs = {0};
 	union xfs_btree_ptr		ptr;
 	union xfs_btree_ptr		*pp;
+	union xfs_btree_rec		*recp;
 	struct xfs_btree_block		*block;
 	int				level;
 	struct xfs_buf			*bp;
@@ -319,7 +415,16 @@ xfs_scrub_btree(
 				continue;
 			}
 
-			if (xfs_scrub_should_terminate(sc, &error))
+			/* Records in order for scrub? */
+			xfs_scrub_btree_rec(&bs);
+
+			/* Call out to the record checker. */
+			recp = xfs_btree_rec_addr(cur, cur->bc_ptrs[0], block);
+			error = bs.scrub_rec(&bs, recp);
+			if (error)
+				break;
+			if (xfs_scrub_should_terminate(sc, &error) ||
+			    (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT))
 				break;
 
 			cur->bc_ptrs[level]++;
@@ -334,6 +439,9 @@ xfs_scrub_btree(
 			continue;
 		}
 
+		/* Keys in order for scrub? */
+		xfs_scrub_btree_key(&bs, level);
+
 		/* Drill another level deeper. */
 		pp = xfs_btree_ptr_addr(cur, cur->bc_ptrs[level], block);
 		if (!xfs_scrub_btree_ptr_ok(&bs, level, pp)) {
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 147ea0b..c4ebfb5 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -446,6 +446,51 @@ TRACE_EVENT(xfs_scrub_ifork_btree_error,
 		  __entry->ret_ip)
 );
 
+DECLARE_EVENT_CLASS(xfs_scrub_sbtree_class,
+	TP_PROTO(struct xfs_scrub_context *sc, struct xfs_btree_cur *cur,
+		 int level),
+	TP_ARGS(sc, cur, level),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(int, type)
+		__field(xfs_btnum_t, btnum)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__field(int, level)
+		__field(int, nlevels)
+		__field(int, ptr)
+	),
+	TP_fast_assign(
+		xfs_fsblock_t fsbno = xfs_scrub_btree_cur_fsbno(cur, level);
+
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->type = sc->sm->sm_type;
+		__entry->btnum = cur->bc_btnum;
+		__entry->agno = XFS_FSB_TO_AGNO(cur->bc_mp, fsbno);
+		__entry->bno = XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno);
+		__entry->level = level;
+		__entry->nlevels = cur->bc_nlevels;
+		__entry->ptr = cur->bc_ptrs[level];
+	),
+	TP_printk("dev %d:%d type %u btnum %d agno %u agbno %u level %d nlevels %d ptr %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->type,
+		  __entry->btnum,
+		  __entry->agno,
+		  __entry->bno,
+		  __entry->level,
+		  __entry->nlevels,
+		  __entry->ptr)
+)
+#define DEFINE_SCRUB_SBTREE_EVENT(name) \
+DEFINE_EVENT(xfs_scrub_sbtree_class, name, \
+	TP_PROTO(struct xfs_scrub_context *sc, struct xfs_btree_cur *cur, \
+		 int level), \
+	TP_ARGS(sc, cur, level))
+
+DEFINE_SCRUB_SBTREE_EVENT(xfs_scrub_btree_rec);
+DEFINE_SCRUB_SBTREE_EVENT(xfs_scrub_btree_key);
+
 #endif /* _TRACE_XFS_SCRUB_TRACE_H */
 
 #undef TRACE_INCLUDE_PATH



* [PATCH 13/30] xfs: create helpers to scan an allocation group
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (11 preceding siblings ...)
  2017-10-12  1:42 ` [PATCH 12/30] xfs: scrub btree keys and records Darrick J. Wong
@ 2017-10-12  1:42 ` Darrick J. Wong
  2017-10-16  1:32   ` Dave Chinner
  2017-10-12  1:42 ` [PATCH 14/30] xfs: scrub the secondary superblocks Darrick J. Wong
                   ` (16 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:42 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Add some helpers to enable us to lock an AG's headers, create btree
cursors for all btrees in that allocation group, and clean up
afterwards.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/common.c |  179 +++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.h |   10 +++
 fs/xfs/scrub/scrub.c  |    4 +
 fs/xfs/scrub/scrub.h  |   21 ++++++
 4 files changed, 214 insertions(+)
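
The helpers in this patch follow a common pattern: grab resources in a
fixed order, leave partial state behind on failure, and rely on one
teardown function that is safe to call no matter how far setup got.  A
rough userspace sketch of that pattern, assuming malloc'd placeholders
in place of buffers, is below.  struct ag_ctx, grab(), ag_init(),
ag_free(), NO_AG and the fail_at parameter are all invented for the
example.

```c
#include <assert.h>
#include <stdlib.h>

#define NO_AG	(-1)

/* Hypothetical stand-in for struct xfs_scrub_ag. */
struct ag_ctx {
	int	agno;
	void	*agi;
	void	*agf;
	void	*agfl;
};

/* Pretend to read a header; "fail" simulates a read error. */
static void *grab(int fail)
{
	return fail ? NULL : malloc(1);
}

/*
 * Grab headers in AGI, AGF, AGFL order.  The context must start out
 * zeroed.  fail_at (1-3) simulates an error at that step; on failure we
 * return without cleaning up and leave that to ag_free().
 */
static int ag_init(struct ag_ctx *sa, int agno, int fail_at)
{
	sa->agno = agno;
	sa->agi = grab(fail_at == 1);
	if (!sa->agi)
		return -1;
	sa->agf = grab(fail_at == 2);
	if (!sa->agf)
		return -1;
	sa->agfl = grab(fail_at == 3);
	if (!sa->agfl)
		return -1;
	return 0;
}

/* Release in reverse order; safe on a partial or already-freed context. */
static void ag_free(struct ag_ctx *sa)
{
	if (sa->agfl) {
		free(sa->agfl);
		sa->agfl = NULL;
	}
	if (sa->agf) {
		free(sa->agf);
		sa->agf = NULL;
	}
	if (sa->agi) {
		free(sa->agi);
		sa->agi = NULL;
	}
	sa->agno = NO_AG;
}
```

Because ag_free() NULLs each pointer as it goes, calling it twice is
harmless, which is what lets a generic teardown path call it
unconditionally.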


diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 709d491..cd6fada 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -44,6 +44,7 @@
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/trace.h"
+#include "scrub/btree.h"
 
 /* Common code for the metadata scrubbers. */
 
@@ -237,6 +238,184 @@ xfs_scrub_set_incomplete(
 	trace_xfs_scrub_incomplete(sc, __return_address);
 }
 
+/*
+ * AG scrubbing
+ *
+ * These helpers facilitate locking an allocation group's header
+ * buffers, setting up cursors for all btrees that are present, and
+ * cleaning everything up once we're through.
+ */
+
+/*
+ * Grab all the headers for an AG.
+ *
+ * The headers should be released by xfs_scrub_ag_free, but as a
+ * fail-safe we attach all the buffers we grab to the scrub transaction
+ * so they'll all be freed when we cancel it.
+ */
+int
+xfs_scrub_ag_read_headers(
+	struct xfs_scrub_context	*sc,
+	xfs_agnumber_t			agno,
+	struct xfs_buf			**agi,
+	struct xfs_buf			**agf,
+	struct xfs_buf			**agfl)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	error = xfs_ialloc_read_agi(mp, sc->tp, agno, agi);
+	if (error)
+		goto out;
+
+	error = xfs_alloc_read_agf(mp, sc->tp, agno, 0, agf);
+	if (error)
+		goto out;
+	if (!*agf) {
+		error = -ENOMEM;
+		goto out;
+	}
+
+	error = xfs_alloc_read_agfl(mp, sc->tp, agno, agfl);
+	if (error)
+		goto out;
+
+out:
+	return error;
+}
+
+/* Release all the AG btree cursors. */
+void
+xfs_scrub_ag_btcur_free(
+	struct xfs_scrub_ag		*sa)
+{
+	if (sa->refc_cur)
+		xfs_btree_del_cursor(sa->refc_cur, XFS_BTREE_ERROR);
+	if (sa->rmap_cur)
+		xfs_btree_del_cursor(sa->rmap_cur, XFS_BTREE_ERROR);
+	if (sa->fino_cur)
+		xfs_btree_del_cursor(sa->fino_cur, XFS_BTREE_ERROR);
+	if (sa->ino_cur)
+		xfs_btree_del_cursor(sa->ino_cur, XFS_BTREE_ERROR);
+	if (sa->cnt_cur)
+		xfs_btree_del_cursor(sa->cnt_cur, XFS_BTREE_ERROR);
+	if (sa->bno_cur)
+		xfs_btree_del_cursor(sa->bno_cur, XFS_BTREE_ERROR);
+
+	sa->refc_cur = NULL;
+	sa->rmap_cur = NULL;
+	sa->fino_cur = NULL;
+	sa->ino_cur = NULL;
+	sa->bno_cur = NULL;
+	sa->cnt_cur = NULL;
+}
+
+/* Initialize all the btree cursors for an AG. */
+int
+xfs_scrub_ag_btcur_init(
+	struct xfs_scrub_context	*sc,
+	struct xfs_scrub_ag		*sa)
+{
+	struct xfs_mount		*mp = sc->mp;
+	xfs_agnumber_t			agno = sa->agno;
+
+	if (sa->agf_bp) {
+		/* Set up a bnobt cursor for cross-referencing. */
+		sa->bno_cur = xfs_allocbt_init_cursor(mp, sc->tp, sa->agf_bp,
+				agno, XFS_BTNUM_BNO);
+		if (!sa->bno_cur)
+			goto err;
+
+		/* Set up a cntbt cursor for cross-referencing. */
+		sa->cnt_cur = xfs_allocbt_init_cursor(mp, sc->tp, sa->agf_bp,
+				agno, XFS_BTNUM_CNT);
+		if (!sa->cnt_cur)
+			goto err;
+	}
+
+	/* Set up an inobt cursor for cross-referencing. */
+	if (sa->agi_bp) {
+		sa->ino_cur = xfs_inobt_init_cursor(mp, sc->tp, sa->agi_bp,
+					agno, XFS_BTNUM_INO);
+		if (!sa->ino_cur)
+			goto err;
+	}
+
+	/* Set up a finobt cursor for cross-referencing. */
+	if (sa->agi_bp && xfs_sb_version_hasfinobt(&mp->m_sb)) {
+		sa->fino_cur = xfs_inobt_init_cursor(mp, sc->tp, sa->agi_bp,
+				agno, XFS_BTNUM_FINO);
+		if (!sa->fino_cur)
+			goto err;
+	}
+
+	/* Set up a rmapbt cursor for cross-referencing. */
+	if (sa->agf_bp && xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		sa->rmap_cur = xfs_rmapbt_init_cursor(mp, sc->tp, sa->agf_bp,
+				agno);
+		if (!sa->rmap_cur)
+			goto err;
+	}
+
+	/* Set up a refcountbt cursor for cross-referencing. */
+	if (sa->agf_bp && xfs_sb_version_hasreflink(&mp->m_sb)) {
+		sa->refc_cur = xfs_refcountbt_init_cursor(mp, sc->tp,
+				sa->agf_bp, agno, NULL);
+		if (!sa->refc_cur)
+			goto err;
+	}
+
+	return 0;
+err:
+	return -ENOMEM;
+}
+
+/* Release the AG header context and btree cursors. */
+void
+xfs_scrub_ag_free(
+	struct xfs_scrub_context	*sc,
+	struct xfs_scrub_ag		*sa)
+{
+	xfs_scrub_ag_btcur_free(sa);
+	if (sa->agfl_bp) {
+		xfs_trans_brelse(sc->tp, sa->agfl_bp);
+		sa->agfl_bp = NULL;
+	}
+	if (sa->agf_bp) {
+		xfs_trans_brelse(sc->tp, sa->agf_bp);
+		sa->agf_bp = NULL;
+	}
+	if (sa->agi_bp) {
+		xfs_trans_brelse(sc->tp, sa->agi_bp);
+		sa->agi_bp = NULL;
+	}
+	sa->agno = NULLAGNUMBER;
+}
+
+/*
+ * For scrub, grab the AGI and the AGF headers, in that order.  Locking
+ * order requires us to get the AGI before the AGF.  We use the
+ * transaction to avoid deadlocking on crosslinked metadata buffers;
+ * either the caller passes one in (bmap scrub) or we have to create a
+ * transaction ourselves.
+ */
+int
+xfs_scrub_ag_init(
+	struct xfs_scrub_context	*sc,
+	xfs_agnumber_t			agno,
+	struct xfs_scrub_ag		*sa)
+{
+	int				error;
+
+	sa->agno = agno;
+	error = xfs_scrub_ag_read_headers(sc, agno, &sa->agi_bp,
+			&sa->agf_bp, &sa->agfl_bp);
+	if (error)
+		return error;
+
+	return xfs_scrub_ag_btcur_init(sc, sa);
+}
+
 /* Per-scrubber setup functions */
 
 /* Set us up with a transaction and an empty context. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 414bbb8..aca39b5 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -77,4 +77,14 @@ void xfs_scrub_set_incomplete(struct xfs_scrub_context *sc);
 /* Setup functions */
 int xfs_scrub_setup_fs(struct xfs_scrub_context *sc, struct xfs_inode *ip);
 
+void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
+int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
+		      struct xfs_scrub_ag *sa);
+int xfs_scrub_ag_read_headers(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
+			      struct xfs_buf **agi, struct xfs_buf **agf,
+			      struct xfs_buf **agfl);
+void xfs_scrub_ag_btcur_free(struct xfs_scrub_ag *sa);
+int xfs_scrub_ag_btcur_init(struct xfs_scrub_context *sc,
+			    struct xfs_scrub_ag *sa);
+
 #endif	/* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 92eac98..3a98060 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -44,6 +44,8 @@
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/trace.h"
+#include "scrub/scrub.h"
+#include "scrub/btree.h"
 
 /*
  * Online Scrub and Repair
@@ -131,6 +133,7 @@ xfs_scrub_teardown(
 	struct xfs_scrub_context	*sc,
 	int				error)
 {
+	xfs_scrub_ag_free(sc, &sc->sa);
 	if (sc->tp) {
 		xfs_trans_cancel(sc->tp);
 		sc->tp = NULL;
@@ -231,6 +234,7 @@ xfs_scrub_metadata(
 	sc.sm = sm;
 	sc.ops = ops;
 	sc.try_harder = try_harder;
+	sc.sa.agno = NULLAGNUMBER;
 	error = sc.ops->setup(&sc, ip);
 	if (error)
 		goto out_teardown;
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index b7b9422..1385295 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -34,6 +34,24 @@ struct xfs_scrub_meta_ops {
 	bool		(*has)(struct xfs_sb *);
 };
 
+/* Buffer pointers and btree cursors for an entire AG. */
+struct xfs_scrub_ag {
+	xfs_agnumber_t			agno;
+
+	/* AG btree roots */
+	struct xfs_buf			*agf_bp;
+	struct xfs_buf			*agfl_bp;
+	struct xfs_buf			*agi_bp;
+
+	/* AG btrees */
+	struct xfs_btree_cur		*bno_cur;
+	struct xfs_btree_cur		*cnt_cur;
+	struct xfs_btree_cur		*ino_cur;
+	struct xfs_btree_cur		*fino_cur;
+	struct xfs_btree_cur		*rmap_cur;
+	struct xfs_btree_cur		*refc_cur;
+};
+
 struct xfs_scrub_context {
 	/* General scrub state. */
 	struct xfs_mount		*mp;
@@ -42,6 +60,9 @@ struct xfs_scrub_context {
 	struct xfs_trans		*tp;
 	struct xfs_inode		*ip;
 	bool				try_harder;
+
+	/* State tracking for single-AG operations. */
+	struct xfs_scrub_ag		sa;
 };
 
 /* Metadata scrubbers */



* [PATCH 14/30] xfs: scrub the secondary superblocks
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (12 preceding siblings ...)
  2017-10-12  1:42 ` [PATCH 13/30] xfs: create helpers to scan an allocation group Darrick J. Wong
@ 2017-10-12  1:42 ` Darrick J. Wong
  2017-10-16  5:16   ` Dave Chinner
  2017-10-12  1:42 ` [PATCH 15/30] xfs: scrub AGF and AGFL Darrick J. Wong
                   ` (15 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:42 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Ensure that the geometry presented in the backup superblocks matches
the primary superblock so that repair can recover the filesystem if
that primary gets corrupted.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile         |    1 
 fs/xfs/libxfs/xfs_fs.h  |    3 
 fs/xfs/scrub/agheader.c |  330 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.h   |    2 
 fs/xfs/scrub/scrub.c    |    4 +
 fs/xfs/scrub/scrub.h    |    1 
 6 files changed, 340 insertions(+), 1 deletion(-)
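
The feature-bit checks in xfs_scrub_superblock() all have the same
shape: build a big-endian mask, compare the masked on-disk field against
the masked in-core field, and flag either "corrupt" (bits fixed at mkfs
time) or "preen" (bits that can change later).  A minimal userspace
sketch, assuming made-up flag values, is below.  FEAT_MKFS_A,
FEAT_MKFS_B, FEAT_LATE_C, FLAG_CORRUPT, FLAG_PREEN, to_be16() and
check_features() are hypothetical and do not correspond to real XFS
constants or helpers.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_CORRUPT	(1u << 0)
#define FLAG_PREEN	(1u << 1)

/* Hypothetical feature bits; not the real XFS_SB_VERSION_* values. */
#define FEAT_MKFS_A	0x0010u		/* fixed at mkfs time */
#define FEAT_MKFS_B	0x0020u		/* fixed at mkfs time */
#define FEAT_LATE_C	0x0100u		/* may be set after mkfs */

/* Portable host-to-big-endian conversion for 16-bit values. */
static uint16_t to_be16(uint16_t v)
{
	union {
		uint8_t		b[2];
		uint16_t	raw;
	} u;

	u.b[0] = (uint8_t)(v >> 8);
	u.b[1] = (uint8_t)(v & 0xff);
	return u.raw;
}

/* Compare an on-disk (big-endian) field against the in-core copy. */
static unsigned int check_features(uint16_t disk_be, uint16_t incore)
{
	unsigned int	flags = 0;
	uint16_t	mask;

	/* Bits that are set at mkfs time must match exactly. */
	mask = to_be16(FEAT_MKFS_A | FEAT_MKFS_B);
	if ((disk_be & mask) != (to_be16(incore) & mask))
		flags |= FLAG_CORRUPT;

	/* Bits that can be set later only merit a preen. */
	mask = to_be16(FEAT_LATE_C);
	if ((disk_be & mask) != (to_be16(incore) & mask))
		flags |= FLAG_PREEN;

	return flags;
}
```

Converting the mask and the in-core value to disk byte order, rather
than converting the disk value to host order, keeps every comparison in
one byte order, as the patch does.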
 create mode 100644 fs/xfs/scrub/agheader.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 5888b9f..e92d04d 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -146,6 +146,7 @@ ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
 
 xfs-y				+= $(addprefix scrub/, \
 				   trace.o \
+				   agheader.o \
 				   btree.o \
 				   common.o \
 				   scrub.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 765f91e..8543cbb 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -484,9 +484,10 @@ struct xfs_scrub_metadata {
 
 /* Scrub subcommands. */
 #define XFS_SCRUB_TYPE_PROBE	0	/* presence test ioctl */
+#define XFS_SCRUB_TYPE_SB	1	/* superblock */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	1
+#define XFS_SCRUB_TYPE_NR	2
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
new file mode 100644
index 0000000..aa1025f
--- /dev/null
+++ b/fs/xfs/scrub/agheader.c
@@ -0,0 +1,330 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+
+/*
+ * Set up scrub to check all the static metadata in each AG.
+ * This means the SB, AGF, AGI, and AGFL headers.
+ */
+int
+xfs_scrub_setup_ag_header(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	struct xfs_mount		*mp = sc->mp;
+
+	if (sc->sm->sm_agno >= mp->m_sb.sb_agcount ||
+	    sc->sm->sm_ino || sc->sm->sm_gen)
+		return -EINVAL;
+	return xfs_scrub_setup_fs(sc, ip);
+}
+
+/* Superblock */
+
+/*
+ * Scrub the filesystem superblock.
+ *
+ * Note: We do /not/ attempt to check AG 0's superblock.  Mount is
+ * responsible for validating all the geometry information in sb 0, so
+ * if the filesystem is capable of initiating online scrub, then clearly
+ * sb 0 is ok and we can use its information to check everything else.
+ */
+int
+xfs_scrub_superblock(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*bp;
+	struct xfs_dsb			*sb;
+	xfs_agnumber_t			agno;
+	uint32_t			v2_ok;
+	__be32				features_mask;
+	int				error;
+	__be16				vernum_mask;
+
+	agno = sc->sm->sm_agno;
+	if (agno == 0)
+		return 0;
+
+	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+		  XFS_AGB_TO_DADDR(mp, agno, XFS_SB_BLOCK(mp)),
+		  XFS_FSS_TO_BB(mp, 1), 0, &bp, &xfs_sb_buf_ops);
+	if (!xfs_scrub_process_error(sc, agno, XFS_SB_BLOCK(mp), &error))
+		return error;
+
+	sb = XFS_BUF_TO_SBP(bp);
+
+	/*
+	 * Verify the geometries match.  Fields that are permanently
+	 * set by mkfs are checked; fields that can be updated later
+	 * (and are not propagated to backup superblocks) are preen
+	 * checked.
+	 */
+	if (sb->sb_blocksize != cpu_to_be32(mp->m_sb.sb_blocksize))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_dblocks != cpu_to_be64(mp->m_sb.sb_dblocks))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_rblocks != cpu_to_be64(mp->m_sb.sb_rblocks))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_rextents != cpu_to_be64(mp->m_sb.sb_rextents))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (!uuid_equal(&sb->sb_uuid, &mp->m_sb.sb_uuid))
+		xfs_scrub_block_set_preen(sc, bp);
+
+	if (sb->sb_logstart != cpu_to_be64(mp->m_sb.sb_logstart))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_rootino != cpu_to_be64(mp->m_sb.sb_rootino))
+		xfs_scrub_block_set_preen(sc, bp);
+
+	if (sb->sb_rbmino != cpu_to_be64(mp->m_sb.sb_rbmino))
+		xfs_scrub_block_set_preen(sc, bp);
+
+	if (sb->sb_rsumino != cpu_to_be64(mp->m_sb.sb_rsumino))
+		xfs_scrub_block_set_preen(sc, bp);
+
+	if (sb->sb_rextsize != cpu_to_be32(mp->m_sb.sb_rextsize))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_agblocks != cpu_to_be32(mp->m_sb.sb_agblocks))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_agcount != cpu_to_be32(mp->m_sb.sb_agcount))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_rbmblocks != cpu_to_be32(mp->m_sb.sb_rbmblocks))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_logblocks != cpu_to_be32(mp->m_sb.sb_logblocks))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	/* Check sb_versionnum bits that are set at mkfs time. */
+	vernum_mask = cpu_to_be16(~XFS_SB_VERSION_OKBITS |
+				  XFS_SB_VERSION_NUMBITS |
+				  XFS_SB_VERSION_ALIGNBIT |
+				  XFS_SB_VERSION_DALIGNBIT |
+				  XFS_SB_VERSION_SHAREDBIT |
+				  XFS_SB_VERSION_LOGV2BIT |
+				  XFS_SB_VERSION_SECTORBIT |
+				  XFS_SB_VERSION_EXTFLGBIT |
+				  XFS_SB_VERSION_DIRV2BIT);
+	if ((sb->sb_versionnum & vernum_mask) !=
+	    (cpu_to_be16(mp->m_sb.sb_versionnum) & vernum_mask))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	/* Check sb_versionnum bits that can be set after mkfs time. */
+	vernum_mask = cpu_to_be16(XFS_SB_VERSION_ATTRBIT |
+				  XFS_SB_VERSION_NLINKBIT |
+				  XFS_SB_VERSION_QUOTABIT);
+	if ((sb->sb_versionnum & vernum_mask) !=
+	    (cpu_to_be16(mp->m_sb.sb_versionnum) & vernum_mask))
+		xfs_scrub_block_set_preen(sc, bp);
+
+	if (sb->sb_sectsize != cpu_to_be16(mp->m_sb.sb_sectsize))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_inodesize != cpu_to_be16(mp->m_sb.sb_inodesize))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_inopblock != cpu_to_be16(mp->m_sb.sb_inopblock))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (memcmp(sb->sb_fname, mp->m_sb.sb_fname, sizeof(sb->sb_fname)))
+		xfs_scrub_block_set_preen(sc, bp);
+
+	if (sb->sb_blocklog != mp->m_sb.sb_blocklog)
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_sectlog != mp->m_sb.sb_sectlog)
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_inodelog != mp->m_sb.sb_inodelog)
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_inopblog != mp->m_sb.sb_inopblog)
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_agblklog != mp->m_sb.sb_agblklog)
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_rextslog != mp->m_sb.sb_rextslog)
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_imax_pct != mp->m_sb.sb_imax_pct)
+		xfs_scrub_block_set_preen(sc, bp);
+
+	/*
+	 * Skip the summary counters since we track them in memory anyway.
+	 * sb_icount, sb_ifree, sb_fdblocks, sb_frextents
+	 */
+
+	if (sb->sb_uquotino != cpu_to_be64(mp->m_sb.sb_uquotino))
+		xfs_scrub_block_set_preen(sc, bp);
+
+	if (sb->sb_gquotino != cpu_to_be64(mp->m_sb.sb_gquotino))
+		xfs_scrub_block_set_preen(sc, bp);
+
+	/*
+	 * Skip the quota flags since repair will force quotacheck.
+	 * sb_qflags
+	 */
+
+	if (sb->sb_flags != mp->m_sb.sb_flags)
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_shared_vn != mp->m_sb.sb_shared_vn)
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_inoalignmt != cpu_to_be32(mp->m_sb.sb_inoalignmt))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_unit != cpu_to_be32(mp->m_sb.sb_unit))
+		xfs_scrub_block_set_preen(sc, bp);
+
+	if (sb->sb_width != cpu_to_be32(mp->m_sb.sb_width))
+		xfs_scrub_block_set_preen(sc, bp);
+
+	if (sb->sb_dirblklog != mp->m_sb.sb_dirblklog)
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_logsectlog != mp->m_sb.sb_logsectlog)
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_logsectsize != cpu_to_be16(mp->m_sb.sb_logsectsize))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (sb->sb_logsunit != cpu_to_be32(mp->m_sb.sb_logsunit))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	/* Do we see any invalid bits in sb_features2? */
+	if (!xfs_sb_version_hasmorebits(&mp->m_sb)) {
+		if (sb->sb_features2 != 0)
+			xfs_scrub_block_set_corrupt(sc, bp);
+	} else {
+		v2_ok = XFS_SB_VERSION2_OKBITS;
+		if (XFS_SB_VERSION_NUM(&mp->m_sb) >= XFS_SB_VERSION_5)
+			v2_ok |= XFS_SB_VERSION2_CRCBIT;
+
+		if (!!(sb->sb_features2 & cpu_to_be32(~v2_ok)))
+			xfs_scrub_block_set_corrupt(sc, bp);
+
+		if (sb->sb_features2 != sb->sb_bad_features2)
+			xfs_scrub_block_set_preen(sc, bp);
+	}
+
+	/* Check sb_features2 flags that are set at mkfs time. */
+	features_mask = cpu_to_be32(XFS_SB_VERSION2_LAZYSBCOUNTBIT |
+				    XFS_SB_VERSION2_PROJID32BIT |
+				    XFS_SB_VERSION2_CRCBIT |
+				    XFS_SB_VERSION2_FTYPE);
+	if ((sb->sb_features2 & features_mask) !=
+	    (cpu_to_be32(mp->m_sb.sb_features2) & features_mask))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	/* Check sb_features2 flags that can be set after mkfs time. */
+	features_mask = cpu_to_be32(XFS_SB_VERSION2_ATTR2BIT);
+	if ((sb->sb_features2 & features_mask) !=
+	    (cpu_to_be32(mp->m_sb.sb_features2) & features_mask))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	if (!xfs_sb_version_hascrc(&mp->m_sb)) {
+		/* all v5 fields must be zero */
+		if (memchr_inv(&sb->sb_features_compat, 0,
+				sizeof(struct xfs_dsb) -
+				offsetof(struct xfs_dsb, sb_features_compat)))
+			xfs_scrub_block_set_corrupt(sc, bp);
+	} else {
+		/* Check compat flags; all are set at mkfs time. */
+		features_mask = cpu_to_be32(XFS_SB_FEAT_COMPAT_UNKNOWN);
+		if ((sb->sb_features_compat & features_mask) !=
+		    (cpu_to_be32(mp->m_sb.sb_features_compat) & features_mask))
+			xfs_scrub_block_set_corrupt(sc, bp);
+
+		/* Check ro compat flags; all are set at mkfs time. */
+		features_mask = cpu_to_be32(XFS_SB_FEAT_RO_COMPAT_UNKNOWN |
+					    XFS_SB_FEAT_RO_COMPAT_FINOBT |
+					    XFS_SB_FEAT_RO_COMPAT_RMAPBT |
+					    XFS_SB_FEAT_RO_COMPAT_REFLINK);
+		if ((sb->sb_features_ro_compat & features_mask) !=
+		    (cpu_to_be32(mp->m_sb.sb_features_ro_compat) &
+		     features_mask))
+			xfs_scrub_block_set_corrupt(sc, bp);
+
+		/* Check incompat flags; all are set at mkfs time. */
+		features_mask = cpu_to_be32(XFS_SB_FEAT_INCOMPAT_UNKNOWN |
+					    XFS_SB_FEAT_INCOMPAT_FTYPE |
+					    XFS_SB_FEAT_INCOMPAT_SPINODES |
+					    XFS_SB_FEAT_INCOMPAT_META_UUID);
+		if ((sb->sb_features_incompat & features_mask) !=
+		    (cpu_to_be32(mp->m_sb.sb_features_incompat) &
+		     features_mask))
+			xfs_scrub_block_set_corrupt(sc, bp);
+
+		/* Check log incompat flags; all are set at mkfs time. */
+		features_mask = cpu_to_be32(XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN);
+		if ((sb->sb_features_log_incompat & features_mask) !=
+		    (cpu_to_be32(mp->m_sb.sb_features_log_incompat) &
+		     features_mask))
+			xfs_scrub_block_set_corrupt(sc, bp);
+
+		/* Don't care about sb_crc */
+
+		if (sb->sb_spino_align != cpu_to_be32(mp->m_sb.sb_spino_align))
+			xfs_scrub_block_set_corrupt(sc, bp);
+
+		if (sb->sb_pquotino != cpu_to_be64(mp->m_sb.sb_pquotino))
+			xfs_scrub_block_set_preen(sc, bp);
+
+		/* Don't care about sb_lsn */
+	}
+
+	if (xfs_sb_version_hasmetauuid(&mp->m_sb)) {
+		/* The metadata UUID must be the same for all supers */
+		if (!uuid_equal(&sb->sb_meta_uuid, &mp->m_sb.sb_meta_uuid))
+			xfs_scrub_block_set_corrupt(sc, bp);
+	}
+
+	/* Everything else must be zero. */
+	if (memchr_inv(sb + 1, 0,
+			BBTOB(bp->b_length) - sizeof(struct xfs_dsb)))
+		xfs_scrub_block_set_corrupt(sc, bp);
+
+	return error;
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index aca39b5..b0a5adf 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -76,6 +76,8 @@ void xfs_scrub_set_incomplete(struct xfs_scrub_context *sc);
 
 /* Setup functions */
 int xfs_scrub_setup_fs(struct xfs_scrub_context *sc, struct xfs_inode *ip);
+int xfs_scrub_setup_ag_header(struct xfs_scrub_context *sc,
+			      struct xfs_inode *ip);
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 3a98060..702812b 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -148,6 +148,10 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_fs,
 		.scrub	= xfs_scrub_probe,
 	},
+	{ /* superblock */
+		.setup	= xfs_scrub_setup_ag_header,
+		.scrub	= xfs_scrub_superblock,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 1385295..13e3f9b 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -67,5 +67,6 @@ struct xfs_scrub_context {
 
 /* Metadata scrubbers */
 int xfs_scrub_tester(struct xfs_scrub_context *sc);
+int xfs_scrub_superblock(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */



* [PATCH 15/30] xfs: scrub AGF and AGFL
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (13 preceding siblings ...)
  2017-10-12  1:42 ` [PATCH 14/30] xfs: scrub the secondary superblocks Darrick J. Wong
@ 2017-10-12  1:42 ` Darrick J. Wong
  2017-10-16  2:18   ` Dave Chinner
  2017-10-12  1:42 ` [PATCH 16/30] xfs: scrub the AGI Darrick J. Wong
                   ` (14 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:42 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Check the block references in the AGF and AGFL headers to make sure
they make sense.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_fs.h  |    4 +
 fs/xfs/scrub/agheader.c |  184 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.c   |   28 ++++++-
 fs/xfs/scrub/common.h   |    4 +
 fs/xfs/scrub/scrub.c    |    8 ++
 fs/xfs/scrub/scrub.h    |    2 +
 6 files changed, 223 insertions(+), 7 deletions(-)
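
Note on the AGFL geometry: the free list is a circular array, with
agf_flfirst and agf_fllast as inclusive indices and agf_flcount as the
number of active slots.  The following is a minimal userspace sketch of
that arithmetic, not the kernel code.  The names agfl_active_slots,
agfl_walk and agfl_sum_cb are hypothetical, plain uint32_t stands in for
the on-disk types, and the `last >= first` test follows the convention
used by xfs_scrub_walk_agfl.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): number of active slots in
 * a circular free list of agfl_size slots, where first and last are the
 * inclusive indices of the oldest and newest entries.
 */
static uint32_t
agfl_active_slots(uint32_t first, uint32_t last, uint32_t agfl_size)
{
	assert(first < agfl_size && last < agfl_size);

	/* No wrap: the active entries are first..last. */
	if (last >= first)
		return last - first + 1;

	/* Wrapped: first..agfl_size-1, then 0..last. */
	return agfl_size - first + last + 1;
}

/*
 * Hypothetical walker: call fn on every active slot in list order and
 * return the number of slots visited.  A count of zero means the list
 * is empty, so nothing is visited.
 */
static uint32_t
agfl_walk(const uint32_t *slots, uint32_t first, uint32_t last,
	  uint32_t count, uint32_t agfl_size,
	  void (*fn)(uint32_t bno, void *priv), void *priv)
{
	uint32_t i = first;
	uint32_t seen = 0;

	assert(first < agfl_size && last < agfl_size);

	if (count == 0)
		return 0;

	for (;;) {
		fn(slots[i], priv);
		seen++;
		if (i == last)
			break;
		/* Step to the next slot, wrapping at the end of the array. */
		i = (i + 1) % agfl_size;
	}
	return seen;
}

/* Example callback: add up the block numbers that were visited. */
static void
agfl_sum_cb(uint32_t bno, void *priv)
{
	*(uint64_t *)priv += bno;
}
```

A header is self-consistent in this model when the stored count is zero
or equals agfl_active_slots(first, last, agfl_size).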


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 8543cbb..aeb2a66 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -485,9 +485,11 @@ struct xfs_scrub_metadata {
 /* Scrub subcommands. */
 #define XFS_SCRUB_TYPE_PROBE	0	/* presence test ioctl */
 #define XFS_SCRUB_TYPE_SB	1	/* superblock */
+#define XFS_SCRUB_TYPE_AGF	2	/* AG free header */
+#define XFS_SCRUB_TYPE_AGFL	3	/* AG free list */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	2
+#define XFS_SCRUB_TYPE_NR	4
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
index aa1025f..594ef34 100644
--- a/fs/xfs/scrub/agheader.c
+++ b/fs/xfs/scrub/agheader.c
@@ -30,6 +30,7 @@
 #include "xfs_trans.h"
 #include "xfs_sb.h"
 #include "xfs_inode.h"
+#include "xfs_alloc.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
@@ -52,6 +53,65 @@ xfs_scrub_setup_ag_header(
 	return xfs_scrub_setup_fs(sc, ip);
 }
 
+/* Walk all the blocks in the AGFL. */
+int
+xfs_scrub_walk_agfl(
+	struct xfs_scrub_context	*sc,
+	int				(*fn)(struct xfs_scrub_context *,
+					      xfs_agblock_t bno, void *),
+	void				*priv)
+{
+	struct xfs_agf			*agf;
+	__be32				*agfl_bno;
+	struct xfs_mount		*mp = sc->mp;
+	unsigned int			flfirst;
+	unsigned int			fllast;
+	int				i;
+	int				error;
+
+	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, sc->sa.agfl_bp);
+	flfirst = be32_to_cpu(agf->agf_flfirst);
+	fllast = be32_to_cpu(agf->agf_fllast);
+
+	/* Nothing to walk in an empty AGFL. */
+	if (agf->agf_flcount == cpu_to_be32(0))
+		return 0;
+
+	/* first to last is a consecutive list. */
+	if (fllast >= flfirst) {
+		for (i = flfirst; i <= fllast; i++) {
+			error = fn(sc, be32_to_cpu(agfl_bno[i]), priv);
+			if (error)
+				return error;
+			if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+				return error;
+		}
+
+		return 0;
+	}
+
+	/* first to the end */
+	for (i = flfirst; i < XFS_AGFL_SIZE(mp); i++) {
+		error = fn(sc, be32_to_cpu(agfl_bno[i]), priv);
+		if (error)
+			return error;
+		if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+			return error;
+	}
+
+	/* the start to last. */
+	for (i = 0; i <= fllast; i++) {
+		error = fn(sc, be32_to_cpu(agfl_bno[i]), priv);
+		if (error)
+			return error;
+		if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+			return error;
+	}
+
+	return 0;
+}
+
 /* Superblock */
 
 /*
@@ -328,3 +388,127 @@ xfs_scrub_superblock(
 
 	return error;
 }
+
+/* AGF */
+
+/* Scrub the AGF. */
+int
+xfs_scrub_agf(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_agf			*agf;
+	xfs_agnumber_t			agno;
+	xfs_agblock_t			agbno;
+	xfs_agblock_t			eoag;
+	xfs_agblock_t			agfl_first;
+	xfs_agblock_t			agfl_last;
+	xfs_agblock_t			agfl_count;
+	xfs_agblock_t			fl_count;
+	int				level;
+	int				error = 0;
+
+	agno = sc->sa.agno = sc->sm->sm_agno;
+	error = xfs_scrub_ag_read_headers(sc, agno, &sc->sa.agi_bp,
+			&sc->sa.agf_bp, &sc->sa.agfl_bp);
+	if (!xfs_scrub_process_error(sc, agno, XFS_AGF_BLOCK(sc->mp), &error))
+		goto out;
+
+	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+
+	/* Check the AG length */
+	eoag = be32_to_cpu(agf->agf_length);
+	if (eoag != xfs_ag_block_count(mp, agno))
+		xfs_scrub_block_set_corrupt(sc, sc->sa.agf_bp);
+
+	/* Check the AGF btree roots and levels */
+	agbno = be32_to_cpu(agf->agf_roots[XFS_BTNUM_BNO]);
+	if (!xfs_verify_agbno_ptr(mp, agno, agbno))
+		xfs_scrub_block_set_corrupt(sc, sc->sa.agf_bp);
+
+	agbno = be32_to_cpu(agf->agf_roots[XFS_BTNUM_CNT]);
+	if (!xfs_verify_agbno_ptr(mp, agno, agbno))
+		xfs_scrub_block_set_corrupt(sc, sc->sa.agf_bp);
+
+	level = be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNO]);
+	if (level <= 0 || level > XFS_BTREE_MAXLEVELS)
+		xfs_scrub_block_set_corrupt(sc, sc->sa.agf_bp);
+
+	level = be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNT]);
+	if (level <= 0 || level > XFS_BTREE_MAXLEVELS)
+		xfs_scrub_block_set_corrupt(sc, sc->sa.agf_bp);
+
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		agbno = be32_to_cpu(agf->agf_roots[XFS_BTNUM_RMAP]);
+		if (!xfs_verify_agbno_ptr(mp, agno, agbno))
+			xfs_scrub_block_set_corrupt(sc, sc->sa.agf_bp);
+
+		level = be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]);
+		if (level <= 0 || level > XFS_BTREE_MAXLEVELS)
+			xfs_scrub_block_set_corrupt(sc, sc->sa.agf_bp);
+	}
+
+	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+		agbno = be32_to_cpu(agf->agf_refcount_root);
+		if (!xfs_verify_agbno_ptr(mp, agno, agbno))
+			xfs_scrub_block_set_corrupt(sc, sc->sa.agf_bp);
+
+		level = be32_to_cpu(agf->agf_refcount_level);
+		if (level <= 0 || level > XFS_BTREE_MAXLEVELS)
+			xfs_scrub_block_set_corrupt(sc, sc->sa.agf_bp);
+	}
+
+	/* Check the AGFL counters */
+	agfl_first = be32_to_cpu(agf->agf_flfirst);
+	agfl_last = be32_to_cpu(agf->agf_fllast);
+	agfl_count = be32_to_cpu(agf->agf_flcount);
+	if (agfl_last >= agfl_first)
+		fl_count = agfl_last - agfl_first + 1;
+	else
+		fl_count = XFS_AGFL_SIZE(mp) - agfl_first + agfl_last + 1;
+	if (agfl_count != 0 && fl_count != agfl_count)
+		xfs_scrub_block_set_corrupt(sc, sc->sa.agf_bp);
+
+out:
+	return error;
+}
+
+/* AGFL */
+
+/* Scrub an AGFL block. */
+STATIC int
+xfs_scrub_agfl_block(
+	struct xfs_scrub_context	*sc,
+	xfs_agblock_t			agbno,
+	void				*priv)
+{
+	struct xfs_mount		*mp = sc->mp;
+	xfs_agnumber_t			agno = sc->sa.agno;
+
+	if (!xfs_verify_agbno_ptr(mp, agno, agbno))
+		xfs_scrub_block_set_corrupt(sc, sc->sa.agfl_bp);
+
+	return 0;
+}
+
+/* Scrub the AGFL. */
+int
+xfs_scrub_agfl(
+	struct xfs_scrub_context	*sc)
+{
+	xfs_agnumber_t			agno;
+	int				error;
+
+	agno = sc->sa.agno = sc->sm->sm_agno;
+	error = xfs_scrub_ag_read_headers(sc, agno, &sc->sa.agi_bp,
+			&sc->sa.agf_bp, &sc->sa.agfl_bp);
+	if (!xfs_scrub_process_error(sc, agno, XFS_AGFL_BLOCK(sc->mp), &error))
+		goto out;
+	if (!sc->sa.agf_bp)
+		return -EFSCORRUPTED;
+
+	/* Check the blocks in the AGFL. */
+	return xfs_scrub_walk_agfl(sc, xfs_scrub_agfl_block, NULL);
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index cd6fada..f0bb9dd 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -246,6 +246,26 @@ xfs_scrub_set_incomplete(
  * cleaning everything up once we're through.
  */
 
+/* Decide if we want to return an AG header read failure. */
+static inline bool
+want_ag_read_header_failure(
+	struct xfs_scrub_context	*sc,
+	unsigned int			type)
+{
+	/* Return all AG header read failures when scanning btrees. */
+	if (sc->sm->sm_type != XFS_SCRUB_TYPE_AGF &&
+	    sc->sm->sm_type != XFS_SCRUB_TYPE_AGFL)
+		return true;
+	/*
+	 * If we're scanning a given type of AG header, we only want to
+	 * see read failures from that specific header.  We'd like the
+	 * other headers to cross-check them, but this isn't required.
+	 */
+	if (sc->sm->sm_type == type)
+		return true;
+	return false;
+}
+
 /*
  * Grab all the headers for an AG.
  *
@@ -269,15 +289,11 @@ xfs_scrub_ag_read_headers(
 		goto out;
 
 	error = xfs_alloc_read_agf(mp, sc->tp, agno, 0, agf);
-	if (error)
-		goto out;
-	if (!*agf) {
-		error = -ENOMEM;
+	if (error && want_ag_read_header_failure(sc, XFS_SCRUB_TYPE_AGF))
 		goto out;
-	}
 
 	error = xfs_alloc_read_agfl(mp, sc->tp, agno, agfl);
-	if (error)
+	if (error && want_ag_read_header_failure(sc, XFS_SCRUB_TYPE_AGFL))
 		goto out;
 
 out:
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index b0a5adf..251a195 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -88,5 +88,9 @@ int xfs_scrub_ag_read_headers(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
 void xfs_scrub_ag_btcur_free(struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_btcur_init(struct xfs_scrub_context *sc,
 			    struct xfs_scrub_ag *sa);
+int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
+			int (*fn)(struct xfs_scrub_context *, xfs_agblock_t bno,
+				  void *),
+			void *priv);
 
 #endif	/* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 702812b..8d0d5c8 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -152,6 +152,14 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_ag_header,
 		.scrub	= xfs_scrub_superblock,
 	},
+	{ /* agf */
+		.setup	= xfs_scrub_setup_ag_header,
+		.scrub	= xfs_scrub_agf,
+	},
+	{ /* agfl */
+		.setup	= xfs_scrub_setup_ag_header,
+		.scrub	= xfs_scrub_agfl,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 13e3f9b..50f8641 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -68,5 +68,7 @@ struct xfs_scrub_context {
 /* Metadata scrubbers */
 int xfs_scrub_tester(struct xfs_scrub_context *sc);
 int xfs_scrub_superblock(struct xfs_scrub_context *sc);
+int xfs_scrub_agf(struct xfs_scrub_context *sc);
+int xfs_scrub_agfl(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */



* [PATCH 16/30] xfs: scrub the AGI
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (14 preceding siblings ...)
  2017-10-12  1:42 ` [PATCH 15/30] xfs: scrub AGF and AGFL Darrick J. Wong
@ 2017-10-12  1:42 ` Darrick J. Wong
  2017-10-16  2:19   ` Dave Chinner
  2017-10-12  1:42 ` [PATCH 17/30] xfs: scrub free space btrees Darrick J. Wong
                   ` (13 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:42 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Add a forgotten check to the AGI verifier, then wire up the scrub
infrastructure to check the AGI contents.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_fs.h  |    3 +-
 fs/xfs/scrub/agheader.c |   82 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.c   |    5 ++-
 fs/xfs/scrub/scrub.c    |    4 ++
 fs/xfs/scrub/scrub.h    |    1 +
 5 files changed, 92 insertions(+), 3 deletions(-)
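
Note on the AGI counter checks: the scrubber wants the inode count to fit
inside the range of inode numbers the AG can hold, the free count not to
exceed the inode count, and each inode pointer to be either the null
marker or inside that range.  A minimal userspace sketch of those rules
follows.  The helper names and AGINO_NULL are hypothetical stand-ins
(the kernel uses NULLAGINO and xfs_verify_agino_ptr), and the 64-bit
span is an assumption of this sketch to avoid 32-bit wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's NULLAGINO "no inode" marker. */
#define AGINO_NULL	UINT32_MAX

/*
 * Hypothetical helper: are the AGI inode counters plausible for an AG
 * whose valid inode numbers are first_agino..last_agino inclusive?
 */
static bool
agi_counters_ok(uint32_t first_agino, uint32_t last_agino,
		uint32_t icount, uint32_t freecount)
{
	uint64_t span;

	assert(first_agino <= last_agino);

	/* Widen before adding one so a full 32-bit range cannot wrap. */
	span = (uint64_t)last_agino - first_agino + 1;

	/* Cannot have more inodes than inode numbers. */
	if (icount > span)
		return false;

	/* Cannot have more free inodes than inodes. */
	if (icount < freecount)
		return false;

	return true;
}

/*
 * Hypothetical helper: an inode pointer is acceptable if it is the null
 * marker or lies within the AG's valid inode range.
 */
static bool
agi_ptr_ok(uint32_t agino, uint32_t first_agino, uint32_t last_agino)
{
	if (agino == AGINO_NULL)
		return true;
	return agino >= first_agino && agino <= last_agino;
}
```

The same pointer rule would apply to agi_newino, agi_dirino and every
unlinked bucket.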


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index aeb2a66..1e326dd 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -487,9 +487,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_SB	1	/* superblock */
 #define XFS_SCRUB_TYPE_AGF	2	/* AG free header */
 #define XFS_SCRUB_TYPE_AGFL	3	/* AG free list */
+#define XFS_SCRUB_TYPE_AGI	4	/* AG inode header */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	4
+#define XFS_SCRUB_TYPE_NR	5
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
index 594ef34..3e181c3 100644
--- a/fs/xfs/scrub/agheader.c
+++ b/fs/xfs/scrub/agheader.c
@@ -31,6 +31,7 @@
 #include "xfs_sb.h"
 #include "xfs_inode.h"
 #include "xfs_alloc.h"
+#include "xfs_ialloc.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
@@ -512,3 +513,84 @@ xfs_scrub_agfl(
 out:
 	return error;
 }
+
+/* AGI */
+
+/* Scrub the AGI. */
+int
+xfs_scrub_agi(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_agi			*agi;
+	xfs_agnumber_t			agno;
+	xfs_agblock_t			agbno;
+	xfs_agblock_t			eoag;
+	xfs_agino_t			agino;
+	xfs_agino_t			first_agino;
+	xfs_agino_t			last_agino;
+	xfs_agino_t			icount;
+	int				i;
+	int				level;
+	int				error = 0;
+
+	agno = sc->sa.agno = sc->sm->sm_agno;
+	error = xfs_scrub_ag_read_headers(sc, agno, &sc->sa.agi_bp,
+			&sc->sa.agf_bp, &sc->sa.agfl_bp);
+	if (!xfs_scrub_process_error(sc, agno, XFS_AGI_BLOCK(sc->mp), &error))
+		goto out;
+
+	agi = XFS_BUF_TO_AGI(sc->sa.agi_bp);
+
+	/* Check the AG length */
+	eoag = be32_to_cpu(agi->agi_length);
+	if (eoag != xfs_ag_block_count(mp, agno))
+		xfs_scrub_block_set_corrupt(sc, sc->sa.agi_bp);
+
+	/* Check btree roots and levels */
+	agbno = be32_to_cpu(agi->agi_root);
+	if (!xfs_verify_agbno_ptr(mp, agno, agbno))
+		xfs_scrub_block_set_corrupt(sc, sc->sa.agi_bp);
+
+	level = be32_to_cpu(agi->agi_level);
+	if (level <= 0 || level > XFS_BTREE_MAXLEVELS)
+		xfs_scrub_block_set_corrupt(sc, sc->sa.agi_bp);
+
+	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
+		agbno = be32_to_cpu(agi->agi_free_root);
+		if (!xfs_verify_agbno_ptr(mp, agno, agbno))
+			xfs_scrub_block_set_corrupt(sc, sc->sa.agi_bp);
+
+		level = be32_to_cpu(agi->agi_free_level);
+		if (level <= 0 || level > XFS_BTREE_MAXLEVELS)
+			xfs_scrub_block_set_corrupt(sc, sc->sa.agi_bp);
+	}
+
+	/* Check inode counters */
+	xfs_ialloc_aginode_range(mp, agno, &first_agino, &last_agino);
+	icount = be32_to_cpu(agi->agi_count);
+	if (icount > last_agino - first_agino + 1 ||
+	    icount < be32_to_cpu(agi->agi_freecount))
+		xfs_scrub_block_set_corrupt(sc, sc->sa.agi_bp);
+
+	/* Check inode pointers */
+	agino = be32_to_cpu(agi->agi_newino);
+	if (agino != NULLAGINO && !xfs_verify_agino_ptr(mp, agno, agino))
+		xfs_scrub_block_set_corrupt(sc, sc->sa.agi_bp);
+
+	agino = be32_to_cpu(agi->agi_dirino);
+	if (agino != NULLAGINO && !xfs_verify_agino_ptr(mp, agno, agino))
+		xfs_scrub_block_set_corrupt(sc, sc->sa.agi_bp);
+
+	/* Check unlinked inode buckets */
+	for (i = 0; i < XFS_AGI_UNLINKED_BUCKETS; i++) {
+		agino = be32_to_cpu(agi->agi_unlinked[i]);
+		if (agino == NULLAGINO)
+			continue;
+		if (!xfs_verify_agino_ptr(mp, agno, agino))
+			xfs_scrub_block_set_corrupt(sc, sc->sa.agi_bp);
+	}
+
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index f0bb9dd..b0ba14c 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -254,7 +254,8 @@ want_ag_read_header_failure(
 {
 	/* Return all AG header read failures when scanning btrees. */
 	if (sc->sm->sm_type != XFS_SCRUB_TYPE_AGF &&
-	    sc->sm->sm_type != XFS_SCRUB_TYPE_AGFL)
+	    sc->sm->sm_type != XFS_SCRUB_TYPE_AGFL &&
+	    sc->sm->sm_type != XFS_SCRUB_TYPE_AGI)
 		return true;
 	/*
 	 * If we're scanning a given type of AG header, we only want to
@@ -285,7 +286,7 @@ xfs_scrub_ag_read_headers(
 	int				error;
 
 	error = xfs_ialloc_read_agi(mp, sc->tp, agno, agi);
-	if (error)
+	if (error && want_ag_read_header_failure(sc, XFS_SCRUB_TYPE_AGI))
 		goto out;
 
 	error = xfs_alloc_read_agf(mp, sc->tp, agno, 0, agf);
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 8d0d5c8..07c45d6 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -160,6 +160,10 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_ag_header,
 		.scrub	= xfs_scrub_agfl,
 	},
+	{ /* agi */
+		.setup	= xfs_scrub_setup_ag_header,
+		.scrub	= xfs_scrub_agi,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 50f8641..09952c2 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -70,5 +70,6 @@ int xfs_scrub_tester(struct xfs_scrub_context *sc);
 int xfs_scrub_superblock(struct xfs_scrub_context *sc);
 int xfs_scrub_agf(struct xfs_scrub_context *sc);
 int xfs_scrub_agfl(struct xfs_scrub_context *sc);
+int xfs_scrub_agi(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */



* [PATCH 17/30] xfs: scrub free space btrees
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (15 preceding siblings ...)
  2017-10-12  1:42 ` [PATCH 16/30] xfs: scrub the AGI Darrick J. Wong
@ 2017-10-12  1:42 ` Darrick J. Wong
  2017-10-16  2:25   ` Dave Chinner
  2017-10-12  1:42 ` [PATCH 18/30] xfs: scrub inode btrees Darrick J. Wong
                   ` (12 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:42 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Check the extent records of the free space btrees to ensure that the
values look sane.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    4 +-
 fs/xfs/scrub/alloc.c   |  103 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.c  |   16 +++++++
 fs/xfs/scrub/common.h  |    6 +++
 fs/xfs/scrub/scrub.c   |    8 ++++
 fs/xfs/scrub/scrub.h   |    2 +
 7 files changed, 139 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/alloc.c
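
Note on the record check: a free space record is a (startblock,
blockcount) pair, and the helper rejects it if the length is zero, if
start + length wraps around 32 bits, or if either end of the extent
falls outside the AG.  Below is a minimal userspace sketch of that test.
free_extent_ok, min_agbno and ag_blocks are hypothetical names; the
patch itself delegates the range test to xfs_verify_agbno_ptr, whose
exact bounds are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: is the free extent [bno, bno + len) plausible for
 * an AG whose usable blocks are min_agbno..ag_blocks-1 inclusive?
 */
static bool
free_extent_ok(uint32_t bno, uint32_t len, uint32_t min_agbno,
	       uint32_t ag_blocks)
{
	assert(min_agbno < ag_blocks);

	/*
	 * Unsigned arithmetic wraps, so bno + len <= bno catches both a
	 * zero-length record and a 32-bit overflow in one comparison.
	 */
	if ((uint32_t)(bno + len) <= bno)
		return false;

	/* The first block must be inside the AG. */
	if (bno < min_agbno || bno >= ag_blocks)
		return false;

	/* So must the last block; no wrap is possible at this point. */
	if (bno + len - 1 >= ag_blocks)
		return false;

	return true;
}
```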


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index e92d04d..84ac733 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -147,6 +147,7 @@ ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
 xfs-y				+= $(addprefix scrub/, \
 				   trace.o \
 				   agheader.o \
+				   alloc.o \
 				   btree.o \
 				   common.o \
 				   scrub.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 1e326dd..1e23d13 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -488,9 +488,11 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_AGF	2	/* AG free header */
 #define XFS_SCRUB_TYPE_AGFL	3	/* AG free list */
 #define XFS_SCRUB_TYPE_AGI	4	/* AG inode header */
+#define XFS_SCRUB_TYPE_BNOBT	5	/* freesp by block btree */
+#define XFS_SCRUB_TYPE_CNTBT	6	/* freesp by length btree */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	5
+#define XFS_SCRUB_TYPE_NR	7
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/alloc.c b/fs/xfs/scrub/alloc.c
new file mode 100644
index 0000000..87db6a8
--- /dev/null
+++ b/fs/xfs/scrub/alloc.c
@@ -0,0 +1,103 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_alloc.h"
+#include "xfs_rmap.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+
+/*
+ * Set us up to scrub free space btrees.
+ */
+int
+xfs_scrub_setup_ag_allocbt(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_ag_btree(sc, ip, false);
+}
+
+/* Free space btree scrubber. */
+
+/* Scrub a bnobt/cntbt record. */
+STATIC int
+xfs_scrub_allocbt_helper(
+	struct xfs_scrub_btree		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
+	xfs_agblock_t			bno;
+	xfs_extlen_t			len;
+	int				error = 0;
+
+	bno = be32_to_cpu(rec->alloc.ar_startblock);
+	len = be32_to_cpu(rec->alloc.ar_blockcount);
+
+	if (bno + len <= bno ||
+	    !xfs_verify_agbno_ptr(mp, agno, bno) ||
+	    !xfs_verify_agbno_ptr(mp, agno, bno + len - 1))
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+	return error;
+}
+
+/* Scrub the freespace btrees for some AG. */
+STATIC int
+xfs_scrub_allocbt(
+	struct xfs_scrub_context	*sc,
+	xfs_btnum_t			which)
+{
+	struct xfs_owner_info		oinfo;
+	struct xfs_btree_cur		*cur;
+
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
+	cur = which == XFS_BTNUM_BNO ? sc->sa.bno_cur : sc->sa.cnt_cur;
+	return xfs_scrub_btree(sc, cur, xfs_scrub_allocbt_helper,
+			&oinfo, NULL);
+}
+
+int
+xfs_scrub_bnobt(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_allocbt(sc, XFS_BTNUM_BNO);
+}
+
+int
+xfs_scrub_cntbt(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_allocbt(sc, XFS_BTNUM_CNT);
+}
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index b0ba14c..018127a 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -443,3 +443,19 @@ xfs_scrub_setup_fs(
 {
 	return xfs_scrub_trans_alloc(sc->sm, sc->mp, &sc->tp);
 }
+
+/* Set us up with AG headers and btree cursors. */
+int
+xfs_scrub_setup_ag_btree(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip,
+	bool				force_log)
+{
+	int				error;
+
+	error = xfs_scrub_setup_ag_header(sc, ip);
+	if (error)
+		return error;
+
+	return xfs_scrub_ag_init(sc, sc->sm->sm_agno, &sc->sa);
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 251a195..372a844 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -78,6 +78,9 @@ void xfs_scrub_set_incomplete(struct xfs_scrub_context *sc);
 int xfs_scrub_setup_fs(struct xfs_scrub_context *sc, struct xfs_inode *ip);
 int xfs_scrub_setup_ag_header(struct xfs_scrub_context *sc,
 			      struct xfs_inode *ip);
+int xfs_scrub_setup_ag_allocbt(struct xfs_scrub_context *sc,
+			       struct xfs_inode *ip);
+
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
@@ -93,4 +96,7 @@ int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
 				  void *),
 			void *priv);
 
+int xfs_scrub_setup_ag_btree(struct xfs_scrub_context *sc,
+			     struct xfs_inode *ip, bool force_log);
+
 #endif	/* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 07c45d6..b3a5e9d 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -164,6 +164,14 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_ag_header,
 		.scrub	= xfs_scrub_agi,
 	},
+	{ /* bnobt */
+		.setup	= xfs_scrub_setup_ag_allocbt,
+		.scrub	= xfs_scrub_bnobt,
+	},
+	{ /* cntbt */
+		.setup	= xfs_scrub_setup_ag_allocbt,
+		.scrub	= xfs_scrub_cntbt,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 09952c2..a4af99c 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -71,5 +71,7 @@ int xfs_scrub_superblock(struct xfs_scrub_context *sc);
 int xfs_scrub_agf(struct xfs_scrub_context *sc);
 int xfs_scrub_agfl(struct xfs_scrub_context *sc);
 int xfs_scrub_agi(struct xfs_scrub_context *sc);
+int xfs_scrub_bnobt(struct xfs_scrub_context *sc);
+int xfs_scrub_cntbt(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */



* [PATCH 18/30] xfs: scrub inode btrees
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (16 preceding siblings ...)
  2017-10-12  1:42 ` [PATCH 17/30] xfs: scrub free space btrees Darrick J. Wong
@ 2017-10-12  1:42 ` Darrick J. Wong
  2017-10-16  2:55   ` Dave Chinner
  2017-10-17  0:11   ` [PATCH v2 " Darrick J. Wong
  2017-10-12  1:42 ` [PATCH 19/30] xfs: scrub rmap btrees Darrick J. Wong
                   ` (11 subsequent siblings)
  29 siblings, 2 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:42 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Check the records of the inode btrees to make sure that the values
make sense given the inode records themselves.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile            |    1 
 fs/xfs/libxfs/xfs_format.h |    2 
 fs/xfs/libxfs/xfs_fs.h     |    4 -
 fs/xfs/scrub/common.c      |   29 ++++
 fs/xfs/scrub/common.h      |    3 
 fs/xfs/scrub/ialloc.c      |  333 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c       |    9 +
 fs/xfs/scrub/scrub.h       |    2 
 8 files changed, 381 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/ialloc.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 84ac733..82326b7 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -150,6 +150,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   alloc.o \
 				   btree.o \
 				   common.o \
+				   ialloc.o \
 				   scrub.o \
 				   )
 endif
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 23229f0..154c3dd 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -518,7 +518,7 @@ static inline int xfs_sb_version_hasftype(struct xfs_sb *sbp)
 		 (sbp->sb_features2 & XFS_SB_VERSION2_FTYPE));
 }
 
-static inline int xfs_sb_version_hasfinobt(xfs_sb_t *sbp)
+static inline bool xfs_sb_version_hasfinobt(xfs_sb_t *sbp)
 {
 	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
 		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_FINOBT);
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 1e23d13..74df6ec 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -490,9 +490,11 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_AGI	4	/* AG inode header */
 #define XFS_SCRUB_TYPE_BNOBT	5	/* freesp by block btree */
 #define XFS_SCRUB_TYPE_CNTBT	6	/* freesp by length btree */
+#define XFS_SCRUB_TYPE_INOBT	7	/* inode btree */
+#define XFS_SCRUB_TYPE_FINOBT	8	/* free inode btree */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	7
+#define XFS_SCRUB_TYPE_NR	9
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 018127a..39165c3 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -40,6 +40,8 @@
 #include "xfs_refcount_btree.h"
 #include "xfs_rmap.h"
 #include "xfs_rmap_btree.h"
+#include "xfs_log.h"
+#include "xfs_trans_priv.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
@@ -451,11 +453,38 @@ xfs_scrub_setup_ag_btree(
 	struct xfs_inode		*ip,
 	bool				force_log)
 {
+	struct xfs_mount		*mp = sc->mp;
 	int				error;
 
+	/*
+	 * If the caller asks us to checkpoint the log, do so.  This
+	 * expensive operation should be performed infrequently and only
+	 * as a last resort.  Any caller that sets force_log should
+	 * document why they need to do so.
+	 */
+	if (force_log) {
+		error = xfs_scrub_checkpoint_log(mp);
+		if (error)
+			return error;
+	}
+
 	error = xfs_scrub_setup_ag_header(sc, ip);
 	if (error)
 		return error;
 
 	return xfs_scrub_ag_init(sc, sc->sm->sm_agno, &sc->sa);
 }
+
+/* Push everything out of the log onto disk. */
+int
+xfs_scrub_checkpoint_log(
+	struct xfs_mount	*mp)
+{
+	int			error;
+
+	error = _xfs_log_force(mp, XFS_LOG_SYNC, NULL);
+	if (error)
+		return error;
+	xfs_ail_push_all_sync(mp->m_ail);
+	return 0;
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 372a844..17830b8 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -73,6 +73,7 @@ void xfs_scrub_fblock_set_warning(struct xfs_scrub_context *sc, int whichfork,
 		xfs_fileoff_t offset);
 
 void xfs_scrub_set_incomplete(struct xfs_scrub_context *sc);
+int xfs_scrub_checkpoint_log(struct xfs_mount *mp);
 
 /* Setup functions */
 int xfs_scrub_setup_fs(struct xfs_scrub_context *sc, struct xfs_inode *ip);
@@ -80,6 +81,8 @@ int xfs_scrub_setup_ag_header(struct xfs_scrub_context *sc,
 			      struct xfs_inode *ip);
 int xfs_scrub_setup_ag_allocbt(struct xfs_scrub_context *sc,
 			       struct xfs_inode *ip);
+int xfs_scrub_setup_ag_iallocbt(struct xfs_scrub_context *sc,
+				struct xfs_inode *ip);
 
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
new file mode 100644
index 0000000..669fad4
--- /dev/null
+++ b/fs/xfs/scrub/ialloc.c
@@ -0,0 +1,333 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_alloc.h"
+#include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_icache.h"
+#include "xfs_rmap.h"
+#include "xfs_log.h"
+#include "xfs_trans_priv.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+
+/*
+ * Set us up to scrub inode btrees.
+ * If we detect a discrepancy between the inobt and the inode,
+ * try again after forcing logged inode cores out to disk.
+ */
+int
+xfs_scrub_setup_ag_iallocbt(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_ag_btree(sc, ip, sc->try_harder);
+}
+
+/* Inode btree scrubber. */
+
+/* Is this chunk worth checking? */
+STATIC bool
+xfs_scrub_iallocbt_chunk(
+	struct xfs_scrub_btree		*bs,
+	struct xfs_inobt_rec_incore	*irec,
+	xfs_agino_t			agino,
+	xfs_extlen_t			len)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
+	xfs_agblock_t			bno;
+
+	bno = XFS_AGINO_TO_AGBNO(mp, agino);
+	if (bno + len <= bno ||
+	    !xfs_verify_agbno_ptr(mp, agno, bno) ||
+	    !xfs_verify_agbno_ptr(mp, agno, bno + len - 1))
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+	return true;
+}
+
+/* Count the number of free inodes. */
+static unsigned int
+xfs_scrub_iallocbt_freecount(
+	xfs_inofree_t			freemask)
+{
+	BUILD_BUG_ON(sizeof(freemask) != sizeof(__u64));
+	return hweight64(freemask);
+}
+
+/* Check a particular inode with ir_free. */
+STATIC int
+xfs_scrub_iallocbt_check_cluster_freemask(
+	struct xfs_scrub_btree		*bs,
+	xfs_ino_t			fsino,
+	xfs_agino_t			chunkino,
+	xfs_agino_t			clusterino,
+	struct xfs_inobt_rec_incore	*irec,
+	struct xfs_buf			*bp)
+{
+	struct xfs_dinode		*dip;
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	bool				freemask_ok;
+	bool				inuse;
+	int				error;
+
+	dip = xfs_buf_offset(bp, clusterino * mp->m_sb.sb_inodesize);
+	if (be16_to_cpu(dip->di_magic) != XFS_DINODE_MAGIC ||
+	    (dip->di_version >= 3 &&
+	     be64_to_cpu(dip->di_ino) != fsino + clusterino)) {
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+		goto out;
+	}
+
+	freemask_ok = (irec->ir_free & XFS_INOBT_MASK(chunkino + clusterino));
+	error = xfs_icache_inode_is_allocated(mp, bs->cur->bc_tp,
+			fsino + clusterino, &inuse);
+	if (error == -ENODATA) {
+		/* Not cached, just read the disk buffer */
+		freemask_ok ^= !!(dip->di_mode);
+		if (!bs->sc->try_harder && !freemask_ok)
+			return -EDEADLOCK;
+	} else if (error < 0) {
+		/*
+		 * Inode is only half assembled, or there was an IO error,
+		 * or the verifier failed, so don't bother trying to check.
+		 * The inode scrubber can deal with this.
+		 */
+		freemask_ok = true;
+	} else {
+		/* Inode is all there. */
+		freemask_ok ^= inuse;
+	}
+	if (!freemask_ok)
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+out:
+	return 0;
+}
+
+/* Make sure the free mask is consistent with what the inodes think. */
+STATIC int
+xfs_scrub_iallocbt_check_freemask(
+	struct xfs_scrub_btree		*bs,
+	struct xfs_inobt_rec_incore	*irec)
+{
+	struct xfs_owner_info		oinfo;
+	struct xfs_imap			imap;
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_dinode		*dip;
+	struct xfs_buf			*bp;
+	xfs_ino_t			fsino;
+	xfs_agino_t			nr_inodes;
+	xfs_agino_t			agino;
+	xfs_agino_t			chunkino;
+	xfs_agino_t			clusterino;
+	xfs_agblock_t			agbno;
+	int				blks_per_cluster;
+	uint16_t			holemask;
+	uint16_t			ir_holemask;
+	int				error = 0;
+
+	/* Make sure the freemask matches the inode records. */
+	blks_per_cluster = xfs_icluster_size_fsb(mp);
+	nr_inodes = XFS_OFFBNO_TO_AGINO(mp, blks_per_cluster, 0);
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INODES);
+
+	for (agino = irec->ir_startino;
+	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;
+	     agino += blks_per_cluster * mp->m_sb.sb_inopblock) {
+		fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
+		chunkino = agino - irec->ir_startino;
+		agbno = XFS_AGINO_TO_AGBNO(mp, agino);
+
+		/* Compute the holemask mask for this cluster. */
+		for (clusterino = 0, holemask = 0; clusterino < nr_inodes;
+		     clusterino += XFS_INODES_PER_HOLEMASK_BIT)
+			holemask |= XFS_INOBT_MASK((chunkino + clusterino) /
+					XFS_INODES_PER_HOLEMASK_BIT);
+
+		/* The whole cluster must be a hole or not a hole. */
+		ir_holemask = (irec->ir_holemask & holemask);
+		if (ir_holemask != holemask && ir_holemask != 0) {
+			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+			continue;
+		}
+
+		/* If any part of this is a hole, skip it. */
+		if (ir_holemask)
+			continue;
+
+		/* Grab the inode cluster buffer. */
+		imap.im_blkno = XFS_AGB_TO_DADDR(mp, bs->cur->bc_private.a.agno,
+				agbno);
+		imap.im_len = XFS_FSB_TO_BB(mp, blks_per_cluster);
+		imap.im_boffset = 0;
+
+		error = xfs_imap_to_bp(mp, bs->cur->bc_tp, &imap,
+				&dip, &bp, 0, 0);
+		if (!xfs_scrub_btree_process_error(bs->sc, bs->cur, 0, &error))
+			continue;
+
+		/* Which inodes are free? */
+		for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
+			error = xfs_scrub_iallocbt_check_cluster_freemask(bs,
+					fsino, chunkino, clusterino, irec, bp);
+			if (error) {
+				xfs_trans_brelse(bs->cur->bc_tp, bp);
+				return error;
+			}
+		}
+
+		xfs_trans_brelse(bs->cur->bc_tp, bp);
+	}
+
+	return error;
+}
+
+/* Scrub an inobt/finobt record. */
+STATIC int
+xfs_scrub_iallocbt_helper(
+	struct xfs_scrub_btree		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_inobt_rec_incore	irec;
+	uint64_t			holes;
+	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
+	xfs_agino_t			agino;
+	xfs_agblock_t			agbno;
+	xfs_extlen_t			len;
+	int				holecount;
+	int				i;
+	int				error = 0;
+	unsigned int			real_freecount;
+	uint16_t			holemask;
+
+	xfs_inobt_btrec_to_irec(mp, rec, &irec);
+
+	if (irec.ir_count > XFS_INODES_PER_CHUNK ||
+	    irec.ir_freecount > XFS_INODES_PER_CHUNK)
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+	real_freecount = irec.ir_freecount +
+			(XFS_INODES_PER_CHUNK - irec.ir_count);
+	if (real_freecount != xfs_scrub_iallocbt_freecount(irec.ir_free))
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+	agino = irec.ir_startino;
+	/* Record has to be properly aligned within the AG. */
+	if (!xfs_verify_agino_ptr(mp, agno, agino) ||
+	    !xfs_verify_agino_ptr(mp, agno, agino + XFS_INODES_PER_CHUNK - 1)) {
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+		goto out;
+	}
+
+	/* Make sure this record is aligned to cluster and inoalignmt size. */
+	agbno = XFS_AGINO_TO_AGBNO(mp, irec.ir_startino);
+	if ((agbno & (xfs_ialloc_cluster_alignment(mp) - 1)) ||
+	    (agbno & (xfs_icluster_size_fsb(mp) - 1)))
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+	/* Handle non-sparse inodes */
+	if (!xfs_inobt_issparse(irec.ir_holemask)) {
+		len = XFS_B_TO_FSB(mp,
+				XFS_INODES_PER_CHUNK * mp->m_sb.sb_inodesize);
+		if (irec.ir_count != XFS_INODES_PER_CHUNK)
+			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+		if (!xfs_scrub_iallocbt_chunk(bs, &irec, agino, len))
+			goto out;
+		goto check_freemask;
+	}
+
+	/* Check each chunk of a sparse inode cluster. */
+	holemask = irec.ir_holemask;
+	holecount = 0;
+	len = XFS_B_TO_FSB(mp,
+			XFS_INODES_PER_HOLEMASK_BIT * mp->m_sb.sb_inodesize);
+	holes = ~xfs_inobt_irec_to_allocmask(&irec);
+	if ((holes & irec.ir_free) != holes ||
+	    irec.ir_freecount > irec.ir_count)
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+	for (i = 0; i < XFS_INOBT_HOLEMASK_BITS; i++) {
+		if (holemask & 1)
+			holecount += XFS_INODES_PER_HOLEMASK_BIT;
+		else if (!xfs_scrub_iallocbt_chunk(bs, &irec, agino, len))
+			break;
+		holemask >>= 1;
+		agino += XFS_INODES_PER_HOLEMASK_BIT;
+	}
+
+	if (holecount > XFS_INODES_PER_CHUNK ||
+	    holecount + irec.ir_count != XFS_INODES_PER_CHUNK)
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+check_freemask:
+	error = xfs_scrub_iallocbt_check_freemask(bs, &irec);
+	if (error)
+		goto out;
+
+out:
+	return error;
+}
+
+/* Scrub the inode btrees for some AG. */
+STATIC int
+xfs_scrub_iallocbt(
+	struct xfs_scrub_context	*sc,
+	xfs_btnum_t			which)
+{
+	struct xfs_btree_cur		*cur;
+	struct xfs_owner_info		oinfo;
+
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT);
+	cur = which == XFS_BTNUM_INO ? sc->sa.ino_cur : sc->sa.fino_cur;
+	return xfs_scrub_btree(sc, cur, xfs_scrub_iallocbt_helper,
+			&oinfo, NULL);
+}
+
+int
+xfs_scrub_inobt(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_iallocbt(sc, XFS_BTNUM_INO);
+}
+
+int
+xfs_scrub_finobt(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_iallocbt(sc, XFS_BTNUM_FINO);
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index b3a5e9d..e70b421 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -172,6 +172,15 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_ag_allocbt,
 		.scrub	= xfs_scrub_cntbt,
 	},
+	{ /* inobt */
+		.setup	= xfs_scrub_setup_ag_iallocbt,
+		.scrub	= xfs_scrub_inobt,
+	},
+	{ /* finobt */
+		.setup	= xfs_scrub_setup_ag_iallocbt,
+		.scrub	= xfs_scrub_finobt,
+		.has	= xfs_sb_version_hasfinobt,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index a4af99c..5d97453 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -73,5 +73,7 @@ int xfs_scrub_agfl(struct xfs_scrub_context *sc);
 int xfs_scrub_agi(struct xfs_scrub_context *sc);
 int xfs_scrub_bnobt(struct xfs_scrub_context *sc);
 int xfs_scrub_cntbt(struct xfs_scrub_context *sc);
+int xfs_scrub_inobt(struct xfs_scrub_context *sc);
+int xfs_scrub_finobt(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
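
For readers following along, the free-count check in
xfs_scrub_iallocbt_helper() can be sketched in userspace roughly as
below.  This is only an illustration, not part of the patch: the
demo_* names and the struct are made up, INODES_PER_CHUNK is assumed to
mirror XFS_INODES_PER_CHUNK (64), and the sketch returns a bool where
the kernel code would flag the record corrupt and keep going.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror XFS_INODES_PER_CHUNK: one record covers 64 inodes. */
#define INODES_PER_CHUNK	64

/* Hypothetical stand-in for a few fields of struct xfs_inobt_rec_incore. */
struct demo_inobt_rec {
	uint8_t		count;		/* inodes physically allocated */
	uint8_t		freecount;	/* allocated inodes that are free */
	uint64_t	free;		/* bit set: inode is free or in a hole */
};

/* Portable popcount; the kernel code uses hweight64() for this. */
static unsigned int demo_popcount64(uint64_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Return true if the record's counters agree with its free mask. */
static bool demo_freecount_ok(const struct demo_inobt_rec *rec)
{
	unsigned int real_freecount;

	assert(rec);
	if (rec->count > INODES_PER_CHUNK ||
	    rec->freecount > INODES_PER_CHUNK)
		return false;

	/* Inodes that fall in sparse holes are marked free in the mask too. */
	real_freecount = rec->freecount + (INODES_PER_CHUNK - rec->count);
	return real_freecount == demo_popcount64(rec->free);
}
```

A fully allocated chunk with three free inodes therefore needs exactly
three bits set in the mask, while a half-sparse chunk with two free
inodes needs 32 + 2 = 34 bits set.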



* [PATCH 19/30] xfs: scrub rmap btrees
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (17 preceding siblings ...)
  2017-10-12  1:42 ` [PATCH 18/30] xfs: scrub inode btrees Darrick J. Wong
@ 2017-10-12  1:42 ` Darrick J. Wong
  2017-10-16  3:01   ` Dave Chinner
  2017-10-12  1:42 ` [PATCH 20/30] xfs: scrub refcount btrees Darrick J. Wong
                   ` (10 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:42 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Check the reverse mapping records to make sure that the contents
make sense.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    3 +
 fs/xfs/scrub/common.h  |    2 +
 fs/xfs/scrub/rmap.c    |  138 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c   |    5 ++
 fs/xfs/scrub/scrub.h   |    1 
 6 files changed, 149 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/rmap.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 82326b7..5a64f8d 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -151,6 +151,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   btree.o \
 				   common.o \
 				   ialloc.o \
+				   rmap.o \
 				   scrub.o \
 				   )
 endif
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 74df6ec..fb1d997 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -492,9 +492,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_CNTBT	6	/* freesp by length btree */
 #define XFS_SCRUB_TYPE_INOBT	7	/* inode btree */
 #define XFS_SCRUB_TYPE_FINOBT	8	/* free inode btree */
+#define XFS_SCRUB_TYPE_RMAPBT	9	/* reverse mapping btree */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	9
+#define XFS_SCRUB_TYPE_NR	10
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 17830b8..7922775 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -83,6 +83,8 @@ int xfs_scrub_setup_ag_allocbt(struct xfs_scrub_context *sc,
 			       struct xfs_inode *ip);
 int xfs_scrub_setup_ag_iallocbt(struct xfs_scrub_context *sc,
 				struct xfs_inode *ip);
+int xfs_scrub_setup_ag_rmapbt(struct xfs_scrub_context *sc,
+			      struct xfs_inode *ip);
 
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
diff --git a/fs/xfs/scrub/rmap.c b/fs/xfs/scrub/rmap.c
new file mode 100644
index 0000000..14401f7
--- /dev/null
+++ b/fs/xfs/scrub/rmap.c
@@ -0,0 +1,138 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_alloc.h"
+#include "xfs_ialloc.h"
+#include "xfs_rmap.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+
+/*
+ * Set us up to scrub reverse mapping btrees.
+ */
+int
+xfs_scrub_setup_ag_rmapbt(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_ag_btree(sc, ip, false);
+}
+
+/* Reverse-mapping scrubber. */
+
+/* Scrub an rmapbt record. */
+STATIC int
+xfs_scrub_rmapbt_helper(
+	struct xfs_scrub_btree		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_rmap_irec		irec;
+	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
+	bool				non_inode;
+	bool				is_unwritten;
+	bool				is_bmbt;
+	bool				is_attr;
+	int				error;
+
+	error = xfs_rmap_btrec_to_irec(rec, &irec);
+	if (!xfs_scrub_btree_process_error(bs->sc, bs->cur, 0, &error))
+		goto out;
+
+	/* Check extent. */
+	if (irec.rm_startblock + irec.rm_blockcount <= irec.rm_startblock)
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+	if (irec.rm_owner == XFS_RMAP_OWN_FS) {
+		/*
+		 * xfs_verify_agbno_ptr returns false for static fs metadata.
+		 * Since that only exists at the start of the AG, validate
+		 * that by hand.
+		 */
+		if (irec.rm_startblock != 0 ||
+		    irec.rm_blockcount != XFS_AGFL_BLOCK(mp) + 1)
+			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+	} else {
+		/*
+		 * Otherwise we must point somewhere past the static metadata
+		 * but before the end of the FS.  Run the regular check.
+		 */
+		if (!xfs_verify_agbno_ptr(mp, agno, irec.rm_startblock) ||
+		    !xfs_verify_agbno_ptr(mp, agno, irec.rm_startblock +
+				irec.rm_blockcount - 1))
+			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+	}
+
+	/* Check flags. */
+	non_inode = XFS_RMAP_NON_INODE_OWNER(irec.rm_owner);
+	is_bmbt = irec.rm_flags & XFS_RMAP_BMBT_BLOCK;
+	is_attr = irec.rm_flags & XFS_RMAP_ATTR_FORK;
+	is_unwritten = irec.rm_flags & XFS_RMAP_UNWRITTEN;
+
+	if (is_bmbt && irec.rm_offset != 0)
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+	if (non_inode && irec.rm_offset != 0)
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+	if (is_unwritten && (is_bmbt || non_inode || is_attr))
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+	if (non_inode && (is_bmbt || is_unwritten || is_attr))
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+	if (!non_inode) {
+		if (!xfs_verify_ino_ptr(mp, irec.rm_owner))
+			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+	} else {
+		/* Non-inode owner within the magic values? */
+		if (irec.rm_owner <= XFS_RMAP_OWN_MIN ||
+		    irec.rm_owner > XFS_RMAP_OWN_FS)
+			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+	}
+out:
+	return error;
+}
+
+/* Scrub the rmap btree for some AG. */
+int
+xfs_scrub_rmapbt(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_owner_info		oinfo;
+
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
+	return xfs_scrub_btree(sc, sc->sa.rmap_cur, xfs_scrub_rmapbt_helper,
+			&oinfo, NULL);
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index e70b421..9239cb3 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -181,6 +181,11 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.scrub	= xfs_scrub_finobt,
 		.has	= xfs_sb_version_hasfinobt,
 	},
+	{ /* rmapbt */
+		.setup	= xfs_scrub_setup_ag_rmapbt,
+		.scrub	= xfs_scrub_rmapbt,
+		.has	= xfs_sb_version_hasrmapbt,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 5d97453..0d1e78b 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -75,5 +75,6 @@ int xfs_scrub_bnobt(struct xfs_scrub_context *sc);
 int xfs_scrub_cntbt(struct xfs_scrub_context *sc);
 int xfs_scrub_inobt(struct xfs_scrub_context *sc);
 int xfs_scrub_finobt(struct xfs_scrub_context *sc);
+int xfs_scrub_rmapbt(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
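
The flag rules that xfs_scrub_rmapbt_helper() enforces can be
summarized in a small userspace sketch.  Again this is not part of the
patch: the DEMO_RMAP_* bit values and struct demo_rmap are hypothetical
stand-ins (the real XFS_RMAP_* definitions live in the kernel headers),
and the owner and extent range checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits standing in for the XFS_RMAP_* flags. */
#define DEMO_RMAP_ATTR_FORK	(1u << 0)
#define DEMO_RMAP_BMBT_BLOCK	(1u << 1)
#define DEMO_RMAP_UNWRITTEN	(1u << 2)

/* Hypothetical, trimmed-down view of a reverse mapping record. */
struct demo_rmap {
	bool		non_inode;	/* owner is a special, non-inode owner */
	uint64_t	offset;		/* file offset of the mapping */
	unsigned int	flags;		/* DEMO_RMAP_* bits */
};

/* Return true if the offset and flags form a combination the scrubber accepts. */
static bool demo_rmap_flags_ok(const struct demo_rmap *r)
{
	bool is_attr, is_bmbt, is_unwritten;

	assert(r);
	is_attr = r->flags & DEMO_RMAP_ATTR_FORK;
	is_bmbt = r->flags & DEMO_RMAP_BMBT_BLOCK;
	is_unwritten = r->flags & DEMO_RMAP_UNWRITTEN;

	/* bmbt blocks and non-inode owners never carry a file offset. */
	if (is_bmbt && r->offset != 0)
		return false;
	if (r->non_inode && r->offset != 0)
		return false;

	/* Only data fork extents of real inodes can be unwritten. */
	if (is_unwritten && (is_bmbt || r->non_inode || is_attr))
		return false;

	/* Non-inode owners cannot have any of the per-file flags set. */
	if (r->non_inode && (is_bmbt || is_unwritten || is_attr))
		return false;

	return true;
}
```

An unwritten data fork extent at a nonzero offset passes; a bmbt block
with a nonzero offset, or a non-inode owner with the attr fork flag,
does not.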



* [PATCH 20/30] xfs: scrub refcount btrees
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (18 preceding siblings ...)
  2017-10-12  1:42 ` [PATCH 19/30] xfs: scrub rmap btrees Darrick J. Wong
@ 2017-10-12  1:42 ` Darrick J. Wong
  2017-10-16  3:02   ` Dave Chinner
  2017-10-12  1:43 ` [PATCH 21/30] xfs: scrub inodes Darrick J. Wong
                   ` (9 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:42 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Plumb in the pieces necessary to check the refcount btree.  If rmap is
available, check the reference count by performing an interval query
against the rmapbt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile         |    1 
 fs/xfs/libxfs/xfs_fs.h  |    3 +
 fs/xfs/scrub/common.h   |    2 +
 fs/xfs/scrub/refcount.c |   99 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c    |    5 ++
 fs/xfs/scrub/scrub.h    |    1 
 6 files changed, 110 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/refcount.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 5a64f8d..a7c5752 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -151,6 +151,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   btree.o \
 				   common.o \
 				   ialloc.o \
+				   refcount.o \
 				   rmap.o \
 				   scrub.o \
 				   )
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index fb1d997..b3f992c 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -493,9 +493,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_INOBT	7	/* inode btree */
 #define XFS_SCRUB_TYPE_FINOBT	8	/* free inode btree */
 #define XFS_SCRUB_TYPE_RMAPBT	9	/* reverse mapping btree */
+#define XFS_SCRUB_TYPE_REFCNTBT	10	/* reference count btree */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	10
+#define XFS_SCRUB_TYPE_NR	11
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 7922775..610e956 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -85,6 +85,8 @@ int xfs_scrub_setup_ag_iallocbt(struct xfs_scrub_context *sc,
 				struct xfs_inode *ip);
 int xfs_scrub_setup_ag_rmapbt(struct xfs_scrub_context *sc,
 			      struct xfs_inode *ip);
+int xfs_scrub_setup_ag_refcountbt(struct xfs_scrub_context *sc,
+				  struct xfs_inode *ip);
 
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c
new file mode 100644
index 0000000..2aab775
--- /dev/null
+++ b/fs/xfs/scrub/refcount.c
@@ -0,0 +1,99 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_alloc.h"
+#include "xfs_rmap.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+
+/*
+ * Set us up to scrub reference count btrees.
+ */
+int
+xfs_scrub_setup_ag_refcountbt(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_ag_btree(sc, ip, false);
+}
+
+/* Reference count btree scrubber. */
+
+/* Scrub a refcountbt record. */
+STATIC int
+xfs_scrub_refcountbt_helper(
+	struct xfs_scrub_btree		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
+	xfs_agblock_t			bno;
+	xfs_extlen_t			len;
+	xfs_nlink_t			refcount;
+	bool				has_cowflag;
+	int				error = 0;
+
+	bno = be32_to_cpu(rec->refc.rc_startblock);
+	len = be32_to_cpu(rec->refc.rc_blockcount);
+	refcount = be32_to_cpu(rec->refc.rc_refcount);
+
+	/* Only CoW records can have refcount == 1. */
+	has_cowflag = (bno & XFS_REFC_COW_START);
+	if ((refcount == 1 && !has_cowflag) || (refcount != 1 && has_cowflag))
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+	/* Check the extent. */
+	bno &= ~XFS_REFC_COW_START;
+	if (bno + len <= bno ||
+	    !xfs_verify_agbno_ptr(mp, agno, bno) ||
+	    !xfs_verify_agbno_ptr(mp, agno, bno + len - 1))
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+	if (refcount == 0)
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+	return error;
+}
+
+/* Scrub the refcount btree for some AG. */
+int
+xfs_scrub_refcountbt(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_owner_info		oinfo;
+
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_REFC);
+	return xfs_scrub_btree(sc, sc->sa.refc_cur, xfs_scrub_refcountbt_helper,
+			&oinfo, NULL);
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 9239cb3..10c9078 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -186,6 +186,11 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.scrub	= xfs_scrub_rmapbt,
 		.has	= xfs_sb_version_hasrmapbt,
 	},
+	{ /* refcountbt */
+		.setup	= xfs_scrub_setup_ag_refcountbt,
+		.scrub	= xfs_scrub_refcountbt,
+		.has	= xfs_sb_version_hasreflink,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 0d1e78b..1c80bf5 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -76,5 +76,6 @@ int xfs_scrub_cntbt(struct xfs_scrub_context *sc);
 int xfs_scrub_inobt(struct xfs_scrub_context *sc);
 int xfs_scrub_finobt(struct xfs_scrub_context *sc);
 int xfs_scrub_rmapbt(struct xfs_scrub_context *sc);
+int xfs_scrub_refcountbt(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
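
The record checks in xfs_scrub_refcountbt_helper() boil down to three
rules, sketched below in userspace.  This is not part of the patch and
makes some assumptions: DEMO_REFC_COW_START is assumed to mirror
XFS_REFC_COW_START (the high bit of the start block), struct
demo_refc_rec is made up, and a plain "fits inside ag_blocks" test
stands in for xfs_verify_agbno_ptr(), which also rejects the static AG
headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror XFS_REFC_COW_START: high bit marks a CoW staging extent. */
#define DEMO_REFC_COW_START	(1u << 31)

/* Hypothetical CPU-endian view of a refcount btree record. */
struct demo_refc_rec {
	uint32_t	startblock;	/* may carry DEMO_REFC_COW_START */
	uint32_t	blockcount;
	uint32_t	refcount;
};

/* Return true if the record passes the basic sanity checks. */
static bool demo_refc_ok(const struct demo_refc_rec *rec, uint32_t ag_blocks)
{
	uint32_t	bno;
	uint32_t	len;
	bool		has_cowflag;

	assert(rec);
	bno = rec->startblock;
	len = rec->blockcount;

	/* CoW staging records have refcount == 1, and nothing else does. */
	has_cowflag = bno & DEMO_REFC_COW_START;
	if ((rec->refcount == 1) != has_cowflag)
		return false;

	/* Strip the flag, then reject empty or wrapping extents. */
	bno &= ~DEMO_REFC_COW_START;
	if (bno + len <= bno)
		return false;

	/* Simplified bounds check; the kernel uses xfs_verify_agbno_ptr(). */
	if (bno + len > ag_blocks)
		return false;

	return rec->refcount != 0;
}
```

The "bno + len <= bno" comparison is the same idiom the inobt and rmapbt
scrubbers use: with unsigned arithmetic it catches both a zero-length
extent and one whose end wraps past the top of the 32-bit block space.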



* [PATCH 21/30] xfs: scrub inodes
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (19 preceding siblings ...)
  2017-10-12  1:42 ` [PATCH 20/30] xfs: scrub refcount btrees Darrick J. Wong
@ 2017-10-12  1:43 ` Darrick J. Wong
  2017-10-12 22:32   ` Darrick J. Wong
  2017-10-17  0:13   ` [PATCH v2 " Darrick J. Wong
  2017-10-12  1:43 ` [PATCH 22/30] xfs: scrub inode block mappings Darrick J. Wong
                   ` (8 subsequent siblings)
  29 siblings, 2 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:43 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Scrub the fields within an inode.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    3 
 fs/xfs/scrub/common.c  |   54 ++++
 fs/xfs/scrub/common.h  |    3 
 fs/xfs/scrub/inode.c   |  607 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c   |   18 +
 fs/xfs/scrub/scrub.h   |    2 
 7 files changed, 685 insertions(+), 3 deletions(-)
 create mode 100644 fs/xfs/scrub/inode.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index a7c5752..28e14b7 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -151,6 +151,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   btree.o \
 				   common.o \
 				   ialloc.o \
+				   inode.o \
 				   refcount.o \
 				   rmap.o \
 				   scrub.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index b3f992c..f8463e0 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -494,9 +494,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_FINOBT	8	/* free inode btree */
 #define XFS_SCRUB_TYPE_RMAPBT	9	/* reverse mapping btree */
 #define XFS_SCRUB_TYPE_REFCNTBT	10	/* reference count btree */
+#define XFS_SCRUB_TYPE_INODE	11	/* inode record */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	11
+#define XFS_SCRUB_TYPE_NR	12
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 39165c3..415c6a9 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -30,6 +30,8 @@
 #include "xfs_trans.h"
 #include "xfs_sb.h"
 #include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_itable.h"
 #include "xfs_alloc.h"
 #include "xfs_alloc_btree.h"
 #include "xfs_bmap.h"
@@ -488,3 +490,55 @@ xfs_scrub_checkpoint_log(
 	xfs_ail_push_all_sync(mp->m_ail);
 	return 0;
 }
+
+/*
+ * Given an inode and the scrub control structure, grab either the
+ * inode referenced in the control structure or the inode passed in.
+ * The inode is not locked.
+ */
+int
+xfs_scrub_get_inode(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip_in)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_inode		*ip = NULL;
+	int				error;
+
+	/*
+	 * If userspace passed us an AG number or a generation number
+	 * without an inode number, they haven't got a clue so bail out
+	 * immediately.
+	 */
+	if (sc->sm->sm_agno || (sc->sm->sm_gen && !sc->sm->sm_ino))
+		return -EINVAL;
+
+	/* We want to scan the inode we already had opened. */
+	if (sc->sm->sm_ino == 0 || sc->sm->sm_ino == ip_in->i_ino) {
+		sc->ip = ip_in;
+		return 0;
+	}
+
+	/* Look up the inode, see if the generation number matches. */
+	if (xfs_internal_inum(mp, sc->sm->sm_ino))
+		return -ENOENT;
+	error = xfs_iget(mp, NULL, sc->sm->sm_ino,
+			XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE, 0, &ip);
+	if (error == -ENOENT || error == -EINVAL) {
+		/* inode doesn't exist... */
+		return -ENOENT;
+	} else if (error) {
+		trace_xfs_scrub_op_error(sc,
+				XFS_INO_TO_AGNO(mp, sc->sm->sm_ino),
+				XFS_INO_TO_AGBNO(mp, sc->sm->sm_ino),
+				error, __return_address);
+		return error;
+	}
+	if (VFS_I(ip)->i_generation != sc->sm->sm_gen) {
+		iput(VFS_I(ip));
+		return -ENOENT;
+	}
+
+	sc->ip = ip;
+	return 0;
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 610e956..fcec11e 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -87,6 +87,8 @@ int xfs_scrub_setup_ag_rmapbt(struct xfs_scrub_context *sc,
 			      struct xfs_inode *ip);
 int xfs_scrub_setup_ag_refcountbt(struct xfs_scrub_context *sc,
 				  struct xfs_inode *ip);
+int xfs_scrub_setup_inode(struct xfs_scrub_context *sc,
+			  struct xfs_inode *ip);
 
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
@@ -105,5 +107,6 @@ int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
 
 int xfs_scrub_setup_ag_btree(struct xfs_scrub_context *sc,
 			     struct xfs_inode *ip, bool force_log);
+int xfs_scrub_get_inode(struct xfs_scrub_context *sc, struct xfs_inode *ip_in);
 
 #endif	/* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
new file mode 100644
index 0000000..aa1c549
--- /dev/null
+++ b/fs/xfs/scrub/inode.c
@@ -0,0 +1,607 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_inode_buf.h"
+#include "xfs_inode_fork.h"
+#include "xfs_ialloc.h"
+#include "xfs_da_format.h"
+#include "xfs_reflink.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+
+/*
+ * Grab total control of the inode metadata.  It doesn't matter here if
+ * the file data is still changing; exclusive access to the metadata is
+ * the goal.
+ */
+int
+xfs_scrub_setup_inode(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	/*
+	 * Try to get the inode.  If the verifiers fail, we try again
+	 * in raw mode.
+	 */
+	error = xfs_scrub_get_inode(sc, ip);
+	switch (error) {
+	case 0:
+		break;
+	case -EFSCORRUPTED:
+	case -EFSBADCRC:
+		return 0;
+	default:
+		return error;
+	}
+
+	/* Got the inode, lock it and we're ready to go. */
+	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+	error = xfs_scrub_trans_alloc(sc->sm, mp, &sc->tp);
+	if (error)
+		goto out;
+	sc->ilock_flags |= XFS_ILOCK_EXCL;
+	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
+
+out:
+	/* scrub teardown will unlock and release the inode for us */
+	return error;
+}
+
+/* Inode core */
+
+/*
+ * di_extsize hint validation is somewhat cumbersome. Rules are:
+ *
+ * 1. extent size hint is only valid for directories and regular files
+ * 2. DIFLAG_EXTSIZE is only valid for regular files
+ * 3. DIFLAG_EXTSZINHERIT is only valid for directories.
+ * 4. extsize hint of 0 turns off hints, clears inode flags.
+ * 5. either flag must be set if extsize != 0
+ * 6. Extent size must be a multiple of the appropriate block size.
+ * 7. extent size hint cannot be longer than maximum extent length
+ * 8. for non-realtime files, the extent size hint must be limited
+ *    to half the AG size to avoid alignment extending the extent
+ *    beyond the limits of the AG.
+ */
+STATIC void
+xfs_scrub_inode_extsize(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp,
+	struct xfs_dinode		*dip,
+	xfs_ino_t			ino,
+	uint16_t			mode,
+	uint16_t			flags)
+{
+	struct xfs_mount		*mp = sc->mp;
+	bool				rt_flag;
+	bool				hint_flag;
+	bool				inherit_flag;
+	uint32_t			extsize;
+	uint32_t			extsize_bytes;
+	uint32_t			blocksize_bytes;
+
+	rt_flag = (flags & XFS_DIFLAG_REALTIME);
+	hint_flag = (flags & XFS_DIFLAG_EXTSIZE);
+	inherit_flag = (flags & XFS_DIFLAG_EXTSZINHERIT);
+	extsize = be32_to_cpu(dip->di_extsize);
+	extsize_bytes = XFS_FSB_TO_B(sc->mp, extsize);
+
+	if (rt_flag)
+		blocksize_bytes = mp->m_sb.sb_rextsize << mp->m_sb.sb_blocklog;
+	else
+		blocksize_bytes = mp->m_sb.sb_blocksize;
+
+	if ((hint_flag || inherit_flag) && (!S_ISDIR(mode) && !S_ISREG(mode)))
+		goto bad;
+
+	if (hint_flag && !S_ISREG(mode))
+		goto bad;
+
+	if (inherit_flag && !S_ISDIR(mode))
+		goto bad;
+
+	if ((hint_flag || inherit_flag) && extsize == 0)
+		goto bad;
+
+	if (!(hint_flag || inherit_flag) && extsize != 0)
+		goto bad;
+
+	if (extsize_bytes % blocksize_bytes)
+		goto bad;
+
+	if (extsize > MAXEXTLEN)
+		goto bad;
+
+	if (!rt_flag && extsize > mp->m_sb.sb_agblocks / 2)
+		goto bad;
+
+	return;
+bad:
+	xfs_scrub_ino_set_corrupt(sc, ino, bp);
+}
+
+/*
+ * di_cowextsize hint validation is somewhat cumbersome. Rules are:
+ *
+ * 1. flag requires reflink feature
+ * 2. cow extent size hint is only valid for directories and regular files
+ * 3. cow extsize hint of 0 turns off hints, clears inode flags.
+ * 4. the flag must be set if cow extsize != 0
+ * 5. flag cannot be set for rt files
+ * 6. Extent size must be a multiple of the appropriate block size.
+ * 7. extent size hint cannot be longer than maximum extent length
+ * 8. the extent size hint must be limited
+ *    to half the AG size to avoid alignment extending the extent
+ *    beyond the limits of the AG.
+ */
+STATIC void
+xfs_scrub_inode_cowextsize(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp,
+	struct xfs_dinode		*dip,
+	xfs_ino_t			ino,
+	uint16_t			mode,
+	uint16_t			flags,
+	uint64_t			flags2)
+{
+	struct xfs_mount		*mp = sc->mp;
+	bool				rt_flag;
+	bool				hint_flag;
+	uint32_t			extsize;
+	uint32_t			extsize_bytes;
+
+	rt_flag = (flags & XFS_DIFLAG_REALTIME);
+	hint_flag = (flags2 & XFS_DIFLAG2_COWEXTSIZE);
+	extsize = be32_to_cpu(dip->di_cowextsize);
+	extsize_bytes = XFS_FSB_TO_B(sc->mp, extsize);
+
+	if (hint_flag && !xfs_sb_version_hasreflink(&mp->m_sb))
+		goto bad;
+
+	if (hint_flag && (!S_ISDIR(mode) && !S_ISREG(mode)))
+		goto bad;
+
+	if (hint_flag && extsize == 0)
+		goto bad;
+
+	if (!hint_flag && extsize != 0)
+		goto bad;
+
+	if (hint_flag && rt_flag)
+		goto bad;
+
+	if (extsize_bytes % mp->m_sb.sb_blocksize)
+		goto bad;
+
+	if (extsize > MAXEXTLEN)
+		goto bad;
+
+	if (extsize > mp->m_sb.sb_agblocks / 2)
+		goto bad;
+
+	return;
+bad:
+	xfs_scrub_ino_set_corrupt(sc, ino, bp);
+}
+
+/* Make sure the di_flags make sense for the inode. */
+STATIC void
+xfs_scrub_inode_flags(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp,
+	struct xfs_dinode		*dip,
+	xfs_ino_t			ino,
+	uint16_t			mode,
+	uint16_t			flags)
+{
+	struct xfs_mount		*mp = sc->mp;
+
+	if (flags & ~XFS_DIFLAG_ANY)
+		goto bad;
+
+	/* rt flags require rt device */
+	if ((flags & (XFS_DIFLAG_REALTIME | XFS_DIFLAG_RTINHERIT)) &&
+	    !mp->m_rtdev_targp)
+		goto bad;
+
+	/* new rt bitmap flag only valid for rbmino */
+	if ((flags & XFS_DIFLAG_NEWRTBM) && ino != mp->m_sb.sb_rbmino)
+		goto bad;
+
+	/* directory-only flags */
+	if ((flags & (XFS_DIFLAG_RTINHERIT |
+		     XFS_DIFLAG_EXTSZINHERIT |
+		     XFS_DIFLAG_PROJINHERIT |
+		     XFS_DIFLAG_NOSYMLINKS)) &&
+	    !S_ISDIR(mode))
+		goto bad;
+
+	/* file-only flags */
+	if ((flags & (XFS_DIFLAG_REALTIME | XFS_DIFLAG_EXTSIZE)) &&
+	    !S_ISREG(mode))
+		goto bad;
+
+	/* filestreams and rt make no sense */
+	if ((flags & XFS_DIFLAG_FILESTREAM) && (flags & XFS_DIFLAG_REALTIME))
+		goto bad;
+
+	return;
+bad:
+	xfs_scrub_ino_set_corrupt(sc, ino, bp);
+}
+
+/* Make sure the di_flags2 make sense for the inode. */
+STATIC void
+xfs_scrub_inode_flags2(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp,
+	struct xfs_dinode		*dip,
+	xfs_ino_t			ino,
+	uint16_t			mode,
+	uint64_t			flags2)
+{
+	struct xfs_mount		*mp = sc->mp;
+
+	if (flags2 & ~XFS_DIFLAG2_ANY)
+		goto bad;
+
+	/* reflink flag requires reflink feature */
+	if ((flags2 & XFS_DIFLAG2_REFLINK) &&
+	    !xfs_sb_version_hasreflink(&mp->m_sb))
+		goto bad;
+
+	/* cowextsize flag is checked w.r.t. mode separately */
+
+	/* file-only flags */
+	if ((flags2 & (XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK)) &&
+	    !S_ISREG(mode))
+		goto bad;
+
+	/* dax and reflink make no sense, currently */
+	if ((flags2 & XFS_DIFLAG2_DAX) && (flags2 & XFS_DIFLAG2_REFLINK))
+		goto bad;
+
+	return;
+bad:
+	xfs_scrub_ino_set_corrupt(sc, ino, bp);
+}
+
+/* Scrub all the ondisk inode fields. */
+STATIC void
+xfs_scrub_dinode(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp,
+	struct xfs_dinode		*dip,
+	xfs_ino_t			ino)
+{
+	struct xfs_mount		*mp = sc->mp;
+	size_t				fork_recs;
+	unsigned long long		isize;
+	uint64_t			flags2;
+	uint32_t			nextents;
+	uint16_t			flags;
+	uint16_t			mode;
+
+	flags = be16_to_cpu(dip->di_flags);
+	if (dip->di_version >= 3)
+		flags2 = be64_to_cpu(dip->di_flags2);
+	else
+		flags2 = 0;
+
+	/* di_mode */
+	mode = be16_to_cpu(dip->di_mode);
+	if (mode & ~(S_IALLUGO | S_IFMT))
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+
+	/* v1/v2 fields */
+	switch (dip->di_version) {
+	case 1:
+		/*
+		 * We autoconvert v1 inodes into v2 inodes on writeout,
+		 * so just mark this inode for preening.
+		 */
+		xfs_scrub_ino_set_preen(sc, bp);
+		break;
+	case 2:
+	case 3:
+		if (dip->di_onlink != 0)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+
+		if (dip->di_mode == 0 && sc->ip)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+
+		if (dip->di_projid_hi != 0 &&
+		    !xfs_sb_version_hasprojid32bit(&mp->m_sb))
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	default:
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		return;
+	}
+
+	/*
+	 * di_uid/di_gid -- -1 isn't invalid, but there's no way that
+	 * userspace could have created that.
+	 */
+	if (dip->di_uid == cpu_to_be32(-1U) ||
+	    dip->di_gid == cpu_to_be32(-1U))
+		xfs_scrub_ino_set_warning(sc, bp);
+
+	/* di_format */
+	switch (dip->di_format) {
+	case XFS_DINODE_FMT_DEV:
+		if (!S_ISCHR(mode) && !S_ISBLK(mode) &&
+		    !S_ISFIFO(mode) && !S_ISSOCK(mode))
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	case XFS_DINODE_FMT_LOCAL:
+		if (!S_ISDIR(mode) && !S_ISLNK(mode))
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	case XFS_DINODE_FMT_EXTENTS:
+		if (!S_ISREG(mode) && !S_ISDIR(mode) && !S_ISLNK(mode))
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		if (!S_ISREG(mode) && !S_ISDIR(mode))
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	case XFS_DINODE_FMT_UUID:
+	default:
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	}
+
+	/*
+	 * di_size.  xfs_dinode_verify checks for things that screw up
+	 * the VFS such as the upper bit being set and zero-length
+	 * symlinks/directories, but we can do more here.
+	 */
+	isize = be64_to_cpu(dip->di_size);
+	if (isize & (1ULL << 63))
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+
+	/* Devices, fifos, and sockets must have zero size */
+	if (!S_ISDIR(mode) && !S_ISREG(mode) && !S_ISLNK(mode) && isize != 0)
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+
+	/* Directories can't be larger than the data section size (32G) */
+	if (S_ISDIR(mode) && (isize == 0 || isize >= XFS_DIR2_SPACE_SIZE))
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+
+	/* Symlinks can't be larger than SYMLINK_MAXLEN */
+	if (S_ISLNK(mode) && (isize == 0 || isize >= XFS_SYMLINK_MAXLEN))
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+
+	/*
+	 * Warn if the running kernel can't handle the kinds of offsets
+	 * needed to deal with the file size.  In other words, if the
+	 * pagecache can't cache all the blocks in this file due to
+	 * overly large offsets, flag the inode for admin review.
+	 */
+	if (isize >= mp->m_super->s_maxbytes)
+		xfs_scrub_ino_set_warning(sc, bp);
+
+	/* di_nblocks */
+	if (flags2 & XFS_DIFLAG2_REFLINK) {
+		; /* nblocks can exceed dblocks */
+	} else if (flags & XFS_DIFLAG_REALTIME) {
+		/*
+		 * nblocks is the sum of data extents (in the rtdev),
+		 * attr extents (in the datadev), and both forks' bmbt
+		 * blocks (in the datadev).  This clumsy check is the
+		 * best we can do without cross-referencing with the
+		 * inode forks.
+		 */
+		if (be64_to_cpu(dip->di_nblocks) >=
+		    mp->m_sb.sb_dblocks + mp->m_sb.sb_rblocks)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+	} else {
+		if (be64_to_cpu(dip->di_nblocks) >= mp->m_sb.sb_dblocks)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+	}
+
+	xfs_scrub_inode_flags(sc, bp, dip, ino, mode, flags);
+
+	xfs_scrub_inode_extsize(sc, bp, dip, ino, mode, flags);
+
+	/* di_nextents */
+	nextents = be32_to_cpu(dip->di_nextents);
+	fork_recs =  XFS_DFORK_DSIZE(dip, mp) / sizeof(struct xfs_bmbt_rec);
+	switch (dip->di_format) {
+	case XFS_DINODE_FMT_EXTENTS:
+		if (nextents > fork_recs)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		if (nextents <= fork_recs)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	default:
+		if (nextents != 0)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	}
+
+	/* di_forkoff */
+	if (XFS_DFORK_APTR(dip) >= (char *)dip + mp->m_sb.sb_inodesize)
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+	if (dip->di_anextents != 0 && dip->di_forkoff == 0)
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+	if (dip->di_forkoff == 0 && dip->di_aformat != XFS_DINODE_FMT_EXTENTS)
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+
+	/* di_aformat */
+	if (dip->di_aformat != XFS_DINODE_FMT_LOCAL &&
+	    dip->di_aformat != XFS_DINODE_FMT_EXTENTS &&
+	    dip->di_aformat != XFS_DINODE_FMT_BTREE)
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+
+	/* di_anextents */
+	nextents = be16_to_cpu(dip->di_anextents);
+	fork_recs =  XFS_DFORK_ASIZE(dip, mp) / sizeof(struct xfs_bmbt_rec);
+	switch (dip->di_aformat) {
+	case XFS_DINODE_FMT_EXTENTS:
+		if (nextents > fork_recs)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		if (nextents <= fork_recs)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	default:
+		if (nextents != 0)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+	}
+
+	if (flags2)
+		xfs_scrub_inode_flags2(sc, bp, dip, ino, mode, flags2);
+
+	xfs_scrub_inode_cowextsize(sc, bp, dip, ino, mode, flags, flags2);
+}
+
+/* Map and read a raw inode. */
+STATIC int
+xfs_scrub_inode_map_raw(
+	struct xfs_scrub_context	*sc,
+	xfs_ino_t			ino,
+	struct xfs_buf			**bpp,
+	struct xfs_dinode		**dipp)
+{
+	struct xfs_imap			imap;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*bp;
+	struct xfs_dinode		*dip;
+	int				error;
+
+	error = xfs_imap(mp, sc->tp, ino, &imap, XFS_IGET_UNTRUSTED);
+	if (error == -EINVAL) {
+		/*
+		 * Inode could have gotten deleted out from under us;
+		 * just forget about it.
+		 */
+		error = -ENOENT;
+		goto out;
+	}
+	if (!xfs_scrub_process_error(sc, XFS_INO_TO_AGNO(mp, ino),
+			XFS_INO_TO_AGBNO(mp, ino), &error))
+		goto out;
+
+	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+			imap.im_blkno, imap.im_len, XBF_UNMAPPED, &bp,
+			NULL);
+	if (!xfs_scrub_process_error(sc, XFS_INO_TO_AGNO(mp, ino),
+			XFS_INO_TO_AGBNO(mp, ino), &error))
+		goto out;
+
+	/* Is this really an inode? */
+	bp->b_ops = &xfs_inode_buf_ops;
+	dip = xfs_buf_offset(bp, imap.im_boffset);
+	if (!xfs_dinode_verify(mp, ino, dip) ||
+	    !xfs_dinode_good_version(mp, dip->di_version)) {
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		goto out;
+	}
+
+	/* ...and is it the one we asked for? */
+	if (be32_to_cpu(dip->di_gen) != sc->sm->sm_gen) {
+		error = -ENOENT;
+		goto out;
+	}
+
+	*dipp = dip;
+	*bpp = bp;
+out:
+	return error;
+}
+
+/* Scrub an inode. */
+int
+xfs_scrub_inode(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_dinode		di;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*bp = NULL;
+	struct xfs_dinode		*dip;
+	xfs_ino_t			ino;
+
+	bool				has_shared;
+	int				error = 0;
+
+	/* Did we get the in-core inode, or are we doing this manually? */
+	if (sc->ip) {
+		ino = sc->ip->i_ino;
+		xfs_inode_to_disk(sc->ip, &di, 0);
+		dip = &di;
+	} else {
+		/* Map & read inode. */
+		ino = sc->sm->sm_ino;
+		error = xfs_scrub_inode_map_raw(sc, ino, &bp, &dip);
+		if (error || !bp)
+			goto out;
+	}
+
+	xfs_scrub_dinode(sc, bp, dip, ino);
+	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+		goto out;
+
+	/* Now let's do the things that require a live inode. */
+	if (!sc->ip)
+		goto out;
+
+	/*
+	 * Does this inode have the reflink flag set but no shared extents?
+	 * Set the preening flag if this is the case.
+	 */
+	if (xfs_is_reflink_inode(sc->ip)) {
+		error = xfs_reflink_inode_has_shared_extents(sc->tp, sc->ip,
+				&has_shared);
+		if (!xfs_scrub_process_error(sc, XFS_INO_TO_AGNO(mp, ino),
+				XFS_INO_TO_AGBNO(mp, ino), &error))
+			goto out;
+		if (!has_shared)
+			xfs_scrub_ino_set_preen(sc, bp);
+	}
+
+out:
+	if (bp)
+		xfs_trans_brelse(sc->tp, bp);
+	return error;
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 10c9078..ab4209c 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -30,6 +30,8 @@
 #include "xfs_trans.h"
 #include "xfs_sb.h"
 #include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_itable.h"
 #include "xfs_alloc.h"
 #include "xfs_alloc_btree.h"
 #include "xfs_bmap.h"
@@ -131,6 +133,7 @@ xfs_scrub_probe(
 STATIC int
 xfs_scrub_teardown(
 	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip_in,
 	int				error)
 {
 	xfs_scrub_ag_free(sc, &sc->sa);
@@ -138,6 +141,13 @@ xfs_scrub_teardown(
 		xfs_trans_cancel(sc->tp);
 		sc->tp = NULL;
 	}
+	if (sc->ip) {
+		xfs_iunlock(sc->ip, sc->ilock_flags);
+		if (sc->ip != ip_in &&
+		    !xfs_internal_inum(sc->mp, sc->ip->i_ino))
+			iput(VFS_I(sc->ip));
+		sc->ip = NULL;
+	}
 	return error;
 }
 
@@ -191,6 +201,10 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.scrub	= xfs_scrub_refcountbt,
 		.has	= xfs_sb_version_hasreflink,
 	},
+	{ /* inode record */
+		.setup	= xfs_scrub_setup_inode,
+		.scrub	= xfs_scrub_inode,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
@@ -290,7 +304,7 @@ xfs_scrub_metadata(
 		 * Tear down everything we hold, then set up again with
 		 * preparation for worst-case scenarios.
 		 */
-		error = xfs_scrub_teardown(&sc, 0);
+		error = xfs_scrub_teardown(&sc, ip, 0);
 		if (error)
 			goto out;
 		try_harder = true;
@@ -303,7 +317,7 @@ xfs_scrub_metadata(
 		xfs_alert_ratelimited(mp, "Corruption detected during scrub.");
 
 out_teardown:
-	error = xfs_scrub_teardown(&sc, error);
+	error = xfs_scrub_teardown(&sc, ip, error);
 out:
 	trace_xfs_scrub_done(ip, sm, error);
 	return error;
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 1c80bf5..ec635d4 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -59,6 +59,7 @@ struct xfs_scrub_context {
 	const struct xfs_scrub_meta_ops	*ops;
 	struct xfs_trans		*tp;
 	struct xfs_inode		*ip;
+	uint				ilock_flags;
 	bool				try_harder;
 
 	/* State tracking for single-AG operations. */
@@ -77,5 +78,6 @@ int xfs_scrub_inobt(struct xfs_scrub_context *sc);
 int xfs_scrub_finobt(struct xfs_scrub_context *sc);
 int xfs_scrub_rmapbt(struct xfs_scrub_context *sc);
 int xfs_scrub_refcountbt(struct xfs_scrub_context *sc);
+int xfs_scrub_inode(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
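
For readers following along, the di_extsize rules listed in the comment
above xfs_scrub_inode_extsize() can be sketched in plain userspace C.
This is not part of the patch and is only a rough illustration under
stated assumptions: struct geom, enum ftype, SKETCH_MAXEXTLEN and
extsize_hint_ok() are hypothetical names, the geometry fields stand in
for sb_blocksize, sb_rextsize and sb_agblocks, and the 21-bit length
limit is assumed to match MAXEXTLEN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock geometry; not a kernel type. */
struct geom {
	uint32_t	blocksize;	/* fs block size, in bytes */
	uint32_t	rextsize;	/* realtime extent size, in fs blocks */
	uint32_t	agblocks;	/* allocation group size, in fs blocks */
};

/* Hypothetical file type tag replacing the S_ISREG/S_ISDIR mode tests. */
enum ftype { FT_REG, FT_DIR, FT_OTHER };

/* Assumed to mirror MAXEXTLEN (21-bit extent length). */
#define SKETCH_MAXEXTLEN	((1U << 21) - 1)

/* Return true if an extent size hint (in fs blocks) obeys the rules. */
static bool
extsize_hint_ok(
	const struct geom	*g,
	enum ftype		type,
	bool			rt_flag,
	bool			hint_flag,
	bool			inherit_flag,
	uint32_t		extsize)
{
	uint64_t		unit_bytes;

	/* Allocation unit: one rt extent for rt files, else one fs block. */
	unit_bytes = rt_flag ? (uint64_t)g->rextsize * g->blocksize :
			       g->blocksize;
	assert(unit_bytes != 0);

	/* EXTSIZE is for regular files, EXTSZINHERIT for directories. */
	if (hint_flag && type != FT_REG)
		return false;
	if (inherit_flag && type != FT_DIR)
		return false;

	/* A flag is set if and only if the hint is nonzero. */
	if ((hint_flag || inherit_flag) != (extsize != 0))
		return false;

	/* The hint must be a whole number of allocation units. */
	if (((uint64_t)extsize * g->blocksize) % unit_bytes)
		return false;

	/* The hint cannot exceed the maximum extent length. */
	if (extsize > SKETCH_MAXEXTLEN)
		return false;

	/* Non-rt hints are capped at half an AG. */
	if (!rt_flag && extsize > g->agblocks / 2)
		return false;

	return true;
}
```

The two per-type tests also cover the "directories and regular files
only" rule, since any other file type fails whichever flag is set.  The
patch keeps that rule as a separate test; the sketch folds it in only
for brevity.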



* [PATCH 22/30] xfs: scrub inode block mappings
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (20 preceding siblings ...)
  2017-10-12  1:43 ` [PATCH 21/30] xfs: scrub inodes Darrick J. Wong
@ 2017-10-12  1:43 ` Darrick J. Wong
  2017-10-16  3:26   ` Dave Chinner
  2017-10-12  1:43 ` [PATCH 23/30] xfs: scrub directory/attribute btrees Darrick J. Wong
                   ` (7 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:43 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Scrub an individual inode's block mappings to make sure they make sense.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    5 +
 fs/xfs/scrub/bmap.c    |  362 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.h  |    5 +
 fs/xfs/scrub/scrub.c   |   12 ++
 fs/xfs/scrub/scrub.h   |    3 
 6 files changed, 386 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/bmap.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 28e14b7..5a77489 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -148,6 +148,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   trace.o \
 				   agheader.o \
 				   alloc.o \
+				   bmap.o \
 				   btree.o \
 				   common.o \
 				   ialloc.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index f8463e0..02ae58b 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -495,9 +495,12 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_RMAPBT	9	/* reverse mapping btree */
 #define XFS_SCRUB_TYPE_REFCNTBT	10	/* reference count btree */
 #define XFS_SCRUB_TYPE_INODE	11	/* inode record */
+#define XFS_SCRUB_TYPE_BMBTD	12	/* data fork block mapping */
+#define XFS_SCRUB_TYPE_BMBTA	13	/* attr fork block mapping */
+#define XFS_SCRUB_TYPE_BMBTC	14	/* CoW fork block mapping */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	12
+#define XFS_SCRUB_TYPE_NR	15
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
new file mode 100644
index 0000000..3955933
--- /dev/null
+++ b/fs/xfs/scrub/bmap.c
@@ -0,0 +1,362 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_inode_fork.h"
+#include "xfs_alloc.h"
+#include "xfs_rtalloc.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_rmap.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+
+/* Set us up with an inode's bmap. */
+int
+xfs_scrub_setup_inode_bmap(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	error = xfs_scrub_get_inode(sc, ip);
+	if (error)
+		goto out;
+
+	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+
+	/*
+	 * We don't want any ephemeral data fork updates sitting around
+	 * while we inspect block mappings, so wait for directio to finish
+	 * and flush dirty data if we have delalloc reservations.
+	 */
+	if (S_ISREG(VFS_I(sc->ip)->i_mode) &&
+	    sc->sm->sm_type == XFS_SCRUB_TYPE_BMBTD) {
+		inode_dio_wait(VFS_I(sc->ip));
+		error = filemap_write_and_wait(VFS_I(sc->ip)->i_mapping);
+		if (error)
+			goto out;
+
+		/* Drop the page cache if we're repairing block mappings. */
+		if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) {
+			error = invalidate_inode_pages2(
+					VFS_I(sc->ip)->i_mapping);
+			if (error)
+				goto out;
+		}
+	}
+
+	/* Got the inode, lock it and we're ready to go. */
+	error = xfs_scrub_trans_alloc(sc->sm, mp, &sc->tp);
+	if (error)
+		goto out;
+	sc->ilock_flags |= XFS_ILOCK_EXCL;
+	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
+
+out:
+	/* scrub teardown will unlock and release the inode */
+	return error;
+}
+
+/*
+ * Inode fork block mapping (BMBT) scrubber.
+ * More complex than the others because we have to scrub
+ * all the extents regardless of whether or not the fork
+ * is in btree format.
+ */
+
+struct xfs_scrub_bmap_info {
+	struct xfs_scrub_context	*sc;
+	xfs_fileoff_t			lastoff;
+	bool				is_rt;
+	bool				is_shared;
+	int				whichfork;
+};
+
+/* Scrub a single extent record. */
+STATIC int
+xfs_scrub_bmap_extent(
+	struct xfs_inode		*ip,
+	struct xfs_btree_cur		*cur,
+	struct xfs_scrub_bmap_info	*info,
+	struct xfs_bmbt_irec		*irec)
+{
+	struct xfs_mount		*mp = info->sc->mp;
+	struct xfs_buf			*bp = NULL;
+	int				error = 0;
+
+	if (cur)
+		xfs_btree_get_block(cur, 0, &bp);
+
+	/*
+	 * Check for out-of-order extents.  This record could have come
+	 * from the incore list, for which there is no ordering check.
+	 */
+	if (irec->br_startoff < info->lastoff)
+		xfs_scrub_fblock_set_corrupt(info->sc, info->whichfork,
+				irec->br_startoff);
+
+	/* There should never be a "hole" extent in either extent list. */
+	if (irec->br_startblock == HOLESTARTBLOCK)
+		xfs_scrub_fblock_set_corrupt(info->sc, info->whichfork,
+				irec->br_startoff);
+
+	/*
+	 * Check for delalloc extents.  We never iterate the ones in the
+	 * in-core extent scan, and we should never see these in the bmbt.
+	 */
+	if (isnullstartblock(irec->br_startblock))
+		xfs_scrub_fblock_set_corrupt(info->sc, info->whichfork,
+				irec->br_startoff);
+
+	/* Make sure the extent points to a valid place. */
+	if (irec->br_startblock + irec->br_blockcount <= irec->br_startblock)
+		xfs_scrub_fblock_set_corrupt(info->sc, info->whichfork,
+				irec->br_startoff);
+	if (info->is_rt &&
+	    (!xfs_verify_rtbno_ptr(mp, irec->br_startblock) ||
+	     !xfs_verify_rtbno_ptr(mp, irec->br_startblock +
+				irec->br_blockcount - 1)))
+		xfs_scrub_fblock_set_corrupt(info->sc, info->whichfork,
+				irec->br_startoff);
+	if (!info->is_rt &&
+	    (!xfs_verify_fsbno_ptr(mp, irec->br_startblock) ||
+	     !xfs_verify_fsbno_ptr(mp, irec->br_startblock +
+				irec->br_blockcount - 1)))
+		xfs_scrub_fblock_set_corrupt(info->sc, info->whichfork,
+				irec->br_startoff);
+
+	if (irec->br_state == XFS_EXT_UNWRITTEN &&
+	    !xfs_sb_version_hasextflgbit(&mp->m_sb))
+		xfs_scrub_fblock_set_corrupt(info->sc, info->whichfork,
+				irec->br_startoff);
+
+	info->lastoff = irec->br_startoff + irec->br_blockcount;
+	return error;
+}
+
+/* Scrub a bmbt record. */
+STATIC int
+xfs_scrub_bmapbt_helper(
+	struct xfs_scrub_btree		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_bmbt_rec_host	ihost;
+	struct xfs_bmbt_irec		irec;
+	struct xfs_scrub_bmap_info	*info = bs->private;
+	struct xfs_inode		*ip = bs->cur->bc_private.b.ip;
+	struct xfs_buf			*bp = NULL;
+	struct xfs_btree_block		*block;
+	uint64_t			owner;
+	int				i;
+
+	/*
+	 * Check the owners of the btree blocks up to the level below
+	 * the root since the verifiers don't do that.
+	 */
+	if (xfs_sb_version_hascrc(&bs->cur->bc_mp->m_sb) &&
+	    bs->cur->bc_ptrs[0] == 1) {
+		for (i = 0; i < bs->cur->bc_nlevels - 1; i++) {
+			block = xfs_btree_get_block(bs->cur, i, &bp);
+			owner = be64_to_cpu(block->bb_u.l.bb_owner);
+			if (owner != ip->i_ino)
+				xfs_scrub_fblock_set_corrupt(bs->sc,
+						info->whichfork, 0);
+		}
+	}
+
+	/* Set up the in-core record and scrub it. */
+	ihost.l0 = be64_to_cpu(rec->bmbt.l0);
+	ihost.l1 = be64_to_cpu(rec->bmbt.l1);
+	xfs_bmbt_get_all(&ihost, &irec);
+	return xfs_scrub_bmap_extent(ip, bs->cur, info, &irec);
+}
+
+/* Scan the btree records. */
+STATIC int
+xfs_scrub_bmap_btree(
+	struct xfs_scrub_context	*sc,
+	int				whichfork,
+	struct xfs_scrub_bmap_info	*info)
+{
+	struct xfs_owner_info		oinfo;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_inode		*ip = sc->ip;
+	struct xfs_btree_cur		*cur;
+	int				error;
+
+	cur = xfs_bmbt_init_cursor(mp, sc->tp, ip, whichfork);
+	xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork);
+	error = xfs_scrub_btree(sc, cur, xfs_scrub_bmapbt_helper, &oinfo, info);
+	xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR :
+					  XFS_BTREE_NOERROR);
+	return error;
+}
+
+/*
+ * Scrub an inode fork's block mappings.
+ *
+ * First we scan every record in every btree block, if applicable.
+ * Then we unconditionally scan the incore extent cache.
+ */
+STATIC int
+xfs_scrub_bmap(
+	struct xfs_scrub_context	*sc,
+	int				whichfork)
+{
+	struct xfs_bmbt_irec		irec;
+	struct xfs_scrub_bmap_info	info = {0};
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_inode		*ip = sc->ip;
+	struct xfs_ifork		*ifp;
+	xfs_fileoff_t			endoff;
+	xfs_extnum_t			idx;
+	bool				found;
+	int				error = 0;
+
+	ifp = XFS_IFORK_PTR(ip, whichfork);
+
+	info.is_rt = whichfork == XFS_DATA_FORK && XFS_IS_REALTIME_INODE(ip);
+	info.whichfork = whichfork;
+	info.is_shared = whichfork == XFS_DATA_FORK && xfs_is_reflink_inode(ip);
+	info.sc = sc;
+
+	switch (whichfork) {
+	case XFS_COW_FORK:
+		/* Non-existent CoW forks are ignorable. */
+		if (!ifp)
+			goto out;
+		/* No CoW forks on non-reflink inodes/filesystems. */
+		if (!xfs_is_reflink_inode(ip)) {
+			xfs_scrub_ino_set_corrupt(sc, sc->ip->i_ino, NULL);
+			goto out;
+		}
+		break;
+	case XFS_ATTR_FORK:
+		if (!ifp)
+			goto out;
+		if (!xfs_sb_version_hasattr(&mp->m_sb) &&
+		    !xfs_sb_version_hasattr2(&mp->m_sb))
+			xfs_scrub_ino_set_corrupt(sc, sc->ip->i_ino, NULL);
+		break;
+	}
+
+	/* Check the fork values */
+	switch (XFS_IFORK_FORMAT(ip, whichfork)) {
+	case XFS_DINODE_FMT_UUID:
+	case XFS_DINODE_FMT_DEV:
+	case XFS_DINODE_FMT_LOCAL:
+		/* No mappings to check. */
+		goto out;
+	case XFS_DINODE_FMT_EXTENTS:
+		if (!(ifp->if_flags & XFS_IFEXTENTS)) {
+			xfs_scrub_fblock_set_corrupt(sc, whichfork, 0);
+			goto out;
+		}
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		if (whichfork == XFS_COW_FORK) {
+			xfs_scrub_fblock_set_corrupt(sc, whichfork, 0);
+			goto out;
+		}
+
+		error = xfs_scrub_bmap_btree(sc, whichfork, &info);
+		if (error)
+			goto out;
+		break;
+	default:
+		xfs_scrub_fblock_set_corrupt(sc, whichfork, 0);
+		goto out;
+	}
+
+	/* Extent data is in memory, so scrub that. */
+
+	/* Find the offset of the last extent in the mapping. */
+	error = xfs_bmap_last_offset(ip, &endoff, whichfork);
+	if (!xfs_scrub_fblock_process_error(sc, whichfork, 0, &error))
+		goto out;
+
+	/* Scrub extent records. */
+	info.lastoff = 0;
+	ifp = XFS_IFORK_PTR(ip, whichfork);
+	for (found = xfs_iext_lookup_extent(ip, ifp, 0, &idx, &irec);
+	     found != 0;
+	     found = xfs_iext_get_extent(ifp, ++idx, &irec)) {
+		if (xfs_scrub_should_terminate(sc, &error))
+			break;
+		if (isnullstartblock(irec.br_startblock))
+			continue;
+		if (irec.br_startoff >= endoff) {
+			xfs_scrub_fblock_set_corrupt(sc, whichfork,
+					irec.br_startoff);
+			goto out;
+		}
+		error = xfs_scrub_bmap_extent(ip, NULL, &info, &irec);
+		if (error)
+			goto out;
+	}
+
+out:
+	return error;
+}
+
+/* Scrub an inode's data fork. */
+int
+xfs_scrub_bmap_data(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_bmap(sc, XFS_DATA_FORK);
+}
+
+/* Scrub an inode's attr fork. */
+int
+xfs_scrub_bmap_attr(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_bmap(sc, XFS_ATTR_FORK);
+}
+
+/* Scrub an inode's CoW fork. */
+int
+xfs_scrub_bmap_cow(
+	struct xfs_scrub_context	*sc)
+{
+	if (!xfs_is_reflink_inode(sc->ip))
+		return -ENOENT;
+
+	return xfs_scrub_bmap(sc, XFS_COW_FORK);
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index fcec11e..b3cf4a2 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -89,7 +89,10 @@ int xfs_scrub_setup_ag_refcountbt(struct xfs_scrub_context *sc,
 				  struct xfs_inode *ip);
 int xfs_scrub_setup_inode(struct xfs_scrub_context *sc,
 			  struct xfs_inode *ip);
-
+int xfs_scrub_setup_inode_bmap(struct xfs_scrub_context *sc,
+			       struct xfs_inode *ip);
+int xfs_scrub_setup_inode_bmap_data(struct xfs_scrub_context *sc,
+				    struct xfs_inode *ip);
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index ab4209c..b20fdd3 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -205,6 +205,18 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_inode,
 		.scrub	= xfs_scrub_inode,
 	},
+	{ /* inode data fork */
+		.setup	= xfs_scrub_setup_inode_bmap,
+		.scrub	= xfs_scrub_bmap_data,
+	},
+	{ /* inode attr fork */
+		.setup	= xfs_scrub_setup_inode_bmap,
+		.scrub	= xfs_scrub_bmap_attr,
+	},
+	{ /* inode CoW fork */
+		.setup	= xfs_scrub_setup_inode_bmap,
+		.scrub	= xfs_scrub_bmap_cow,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index ec635d4..8920ccf 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -79,5 +79,8 @@ int xfs_scrub_finobt(struct xfs_scrub_context *sc);
 int xfs_scrub_rmapbt(struct xfs_scrub_context *sc);
 int xfs_scrub_refcountbt(struct xfs_scrub_context *sc);
 int xfs_scrub_inode(struct xfs_scrub_context *sc);
+int xfs_scrub_bmap_data(struct xfs_scrub_context *sc);
+int xfs_scrub_bmap_attr(struct xfs_scrub_context *sc);
+int xfs_scrub_bmap_cow(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */



* [PATCH 23/30] xfs: scrub directory/attribute btrees
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (21 preceding siblings ...)
  2017-10-12  1:43 ` [PATCH 22/30] xfs: scrub inode block mappings Darrick J. Wong
@ 2017-10-12  1:43 ` Darrick J. Wong
  2017-10-16  4:13   ` Dave Chinner
  2017-10-12  1:43 ` [PATCH 24/30] xfs: scrub directory metadata Darrick J. Wong
                   ` (6 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:43 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, Fengguang Wu

From: Darrick J. Wong <darrick.wong@oracle.com>

Provide a way to check the shape and scrub the hashes and records
in a directory or extended attribute btree.  These are helper functions
for the directory & attribute scrubbers in subsequent patches.
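
As a rough illustration of the hash checks described above, here is a
minimal userspace sketch.  It is not the kernel code: struct hash_walk,
check_hash() and MAX_DEPTH are hypothetical names invented for this
example.  It only shows the two invariants being tested: hashes seen at
one level must not decrease, and a child's hash must not exceed the hash
stored in the parent entry that points to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_DEPTH 5	/* illustrative depth limit, chosen for this sketch */

/* Hypothetical stand-in for the per-level state a scrubber tracks. */
struct hash_walk {
	uint32_t	last_hash[MAX_DEPTH];	/* last hash seen per level */
	bool		corrupt;		/* set when an invariant fails */
};

/*
 * Check one hash value at the given level.  parent_hash is the hash
 * stored in the parent's entry for this block; it is ignored at level 0.
 */
static void
check_hash(struct hash_walk *hw, int level, uint32_t hash,
	   uint32_t parent_hash)
{
	assert(level >= 0 && level < MAX_DEPTH);

	/* Hashes within a level must be in non-decreasing order. */
	if (hash < hw->last_hash[level])
		hw->corrupt = true;
	hw->last_hash[level] = hash;

	/* A child hash must be no larger than its parent's hash. */
	if (level > 0 && parent_hash < hash)
		hw->corrupt = true;
}
```

Like the scrubber, the sketch records the problem in a flag and keeps
going instead of returning an error, so a caller can finish the walk and
report corruption once at the end.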

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[fengguang: remove unneeded variable to store return value]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/scrub/dabtree.c |  570 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/dabtree.h |   58 +++++
 3 files changed, 629 insertions(+)
 create mode 100644 fs/xfs/scrub/dabtree.c
 create mode 100644 fs/xfs/scrub/dabtree.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 5a77489..b48437f 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -151,6 +151,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   bmap.o \
 				   btree.o \
 				   common.o \
+				   dabtree.o \
 				   ialloc.o \
 				   inode.o \
 				   refcount.o \
diff --git a/fs/xfs/scrub/dabtree.c b/fs/xfs/scrub/dabtree.c
new file mode 100644
index 0000000..672d273
--- /dev/null
+++ b/fs/xfs/scrub/dabtree.c
@@ -0,0 +1,570 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_inode_fork.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_attr_leaf.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+#include "scrub/dabtree.h"
+
+/* Directory/Attribute Btree */
+
+/*
+ * Check for da btree operation errors.  See the section about handling
+ * operational errors in common.c.
+ */
+bool
+xfs_scrub_da_process_error(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	int				*error)
+{
+	struct xfs_scrub_context	*sc = ds->sc;
+
+	if (*error == 0)
+		return true;
+
+	switch (*error) {
+	case -EDEADLOCK:
+		/* Used to restart an op with deadlock avoidance. */
+		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
+		break;
+	case -EFSBADCRC:
+	case -EFSCORRUPTED:
+		/* Note the badness but don't abort. */
+		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+		*error = 0;
+		/* fall through */
+	default:
+		trace_xfs_scrub_file_op_error(sc, ds->dargs.whichfork,
+				xfs_dir2_da_to_db(ds->dargs.geo,
+					ds->state->path.blk[level].blkno),
+				*error, __return_address);
+		break;
+	}
+	return false;
+}
+
+/*
+ * Record da btree corruption.  See the section about handling
+ * operational errors in common.c.
+ */
+void
+xfs_scrub_da_set_corrupt(
+	struct xfs_scrub_da_btree	*ds,
+	int				level)
+{
+	struct xfs_scrub_context	*sc = ds->sc;
+
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+
+	trace_xfs_scrub_fblock_error(sc, ds->dargs.whichfork,
+			xfs_dir2_da_to_db(ds->dargs.geo,
+				ds->state->path.blk[level].blkno),
+			__return_address);
+}
+
+/* Find an entry at a certain level in a da btree. */
+STATIC void *
+xfs_scrub_da_btree_entry(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	int				rec)
+{
+	char				*ents;
+	struct xfs_da_state_blk		*blk;
+	void				*baddr;
+
+	/* Dispatch the entry finding function. */
+	blk = &ds->state->path.blk[level];
+	baddr = blk->bp->b_addr;
+	switch (blk->magic) {
+	case XFS_ATTR_LEAF_MAGIC:
+	case XFS_ATTR3_LEAF_MAGIC:
+		ents = (char *)xfs_attr3_leaf_entryp(baddr);
+		return ents + (rec * sizeof(struct xfs_attr_leaf_entry));
+	case XFS_DIR2_LEAFN_MAGIC:
+	case XFS_DIR3_LEAFN_MAGIC:
+		ents = (char *)ds->dargs.dp->d_ops->leaf_ents_p(baddr);
+		return ents + (rec * sizeof(struct xfs_dir2_leaf_entry));
+	case XFS_DIR2_LEAF1_MAGIC:
+	case XFS_DIR3_LEAF1_MAGIC:
+		ents = (char *)ds->dargs.dp->d_ops->leaf_ents_p(baddr);
+		return ents + (rec * sizeof(struct xfs_dir2_leaf_entry));
+	case XFS_DA_NODE_MAGIC:
+	case XFS_DA3_NODE_MAGIC:
+		ents = (char *)ds->dargs.dp->d_ops->node_tree_p(baddr);
+		return ents + (rec * sizeof(struct xfs_da_node_entry));
+	}
+
+	return NULL;
+}
+
+/* Scrub a da btree hash (key). */
+int
+xfs_scrub_da_btree_hash(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	__be32				*hashp)
+{
+	struct xfs_da_state_blk		*blks;
+	struct xfs_da_node_entry	*entry;
+	xfs_dahash_t			hash;
+	xfs_dahash_t			parent_hash;
+
+	/* Is this hash in order? */
+	hash = be32_to_cpu(*hashp);
+	if (hash < ds->hashes[level])
+		xfs_scrub_da_set_corrupt(ds, level);
+	ds->hashes[level] = hash;
+
+	if (level == 0)
+		return 0;
+
+	/* Is this hash no larger than the parent hash? */
+	blks = ds->state->path.blk;
+	entry = xfs_scrub_da_btree_entry(ds, level - 1, blks[level - 1].index);
+	parent_hash = be32_to_cpu(entry->hashval);
+	if (parent_hash < hash)
+		xfs_scrub_da_set_corrupt(ds, level);
+
+	return 0;
+}
+
+/*
+ * Check a da btree pointer.  Returns true if it's ok to use this
+ * pointer.
+ */
+STATIC bool
+xfs_scrub_da_btree_ptr_ok(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	xfs_dablk_t			blkno)
+{
+	if (blkno < ds->lowest || (ds->highest != 0 && blkno >= ds->highest)) {
+		xfs_scrub_da_set_corrupt(ds, level);
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * The da btree scrubber can handle leaf1 blocks as a degenerate
+ * form of leafn blocks.  Since the regular da code doesn't handle
+ * leaf1, we must multiplex the verifiers.
+ */
+static void
+xfs_scrub_da_btree_read_verify(
+	struct xfs_buf		*bp)
+{
+	struct xfs_da_blkinfo	*info = bp->b_addr;
+
+	switch (be16_to_cpu(info->magic)) {
+	case XFS_DIR2_LEAF1_MAGIC:
+	case XFS_DIR3_LEAF1_MAGIC:
+		bp->b_ops = &xfs_dir3_leaf1_buf_ops;
+		bp->b_ops->verify_read(bp);
+		return;
+	default:
+		/*
+		 * xfs_da3_node_buf_ops already know how to handle
+		 * DA*_NODE, ATTR*_LEAF, and DIR*_LEAFN blocks.
+		 */
+		bp->b_ops = &xfs_da3_node_buf_ops;
+		bp->b_ops->verify_read(bp);
+		return;
+	}
+}
+static void
+xfs_scrub_da_btree_write_verify(
+	struct xfs_buf		*bp)
+{
+	struct xfs_da_blkinfo	*info = bp->b_addr;
+
+	switch (be16_to_cpu(info->magic)) {
+	case XFS_DIR2_LEAF1_MAGIC:
+	case XFS_DIR3_LEAF1_MAGIC:
+		bp->b_ops = &xfs_dir3_leaf1_buf_ops;
+		bp->b_ops->verify_write(bp);
+		return;
+	default:
+		/*
+		 * xfs_da3_node_buf_ops already know how to handle
+		 * DA*_NODE, ATTR*_LEAF, and DIR*_LEAFN blocks.
+		 */
+		bp->b_ops = &xfs_da3_node_buf_ops;
+		bp->b_ops->verify_write(bp);
+		return;
+	}
+}
+
+static const struct xfs_buf_ops xfs_scrub_da_btree_buf_ops = {
+	.name = "xfs_scrub_da_btree",
+	.verify_read = xfs_scrub_da_btree_read_verify,
+	.verify_write = xfs_scrub_da_btree_write_verify,
+};
+
+/* Check a block's sibling. */
+STATIC int
+xfs_scrub_da_btree_block_check_sibling(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	int				direction,
+	xfs_dablk_t			sibling)
+{
+	int				retval;
+	int				error;
+
+	if (!sibling)
+		return 0;
+
+	/* Move the alternate cursor one block in the direction given. */
+	memcpy(&ds->state->altpath, &ds->state->path,
+			sizeof(ds->state->altpath));
+	error = xfs_da3_path_shift(ds->state, &ds->state->altpath,
+			direction, false, &retval);
+	if (!xfs_scrub_da_process_error(ds, level, &error))
+		return error;
+	if (retval) {
+		xfs_scrub_da_set_corrupt(ds, level);
+		return error;
+	}
+
+	if (ds->state->altpath.blk[level].blkno != sibling)
+		xfs_scrub_da_set_corrupt(ds, level);
+	xfs_trans_brelse(ds->dargs.trans, ds->state->altpath.blk[level].bp);
+	return error;
+}
+
+/* Check a block's sibling pointers. */
+STATIC int
+xfs_scrub_da_btree_block_check_siblings(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	struct xfs_da_blkinfo		*hdr)
+{
+	xfs_dablk_t			forw;
+	xfs_dablk_t			back;
+	int				error = 0;
+
+	forw = be32_to_cpu(hdr->forw);
+	back = be32_to_cpu(hdr->back);
+
+	/* Top level blocks should not have sibling pointers. */
+	if (level == 0) {
+		if (forw != 0 || back != 0)
+			xfs_scrub_da_set_corrupt(ds, level);
+		return 0;
+	}
+
+	/*
+	 * Check back (left) and forw (right) pointers.  These functions
+	 * absorb error codes for us.
+	 */
+	error = xfs_scrub_da_btree_block_check_sibling(ds, level, 0, back);
+	if (error)
+		goto out;
+	error = xfs_scrub_da_btree_block_check_sibling(ds, level, 1, forw);
+
+out:
+	memset(&ds->state->altpath, 0, sizeof(ds->state->altpath));
+	return error;
+}
+
+/* Load a dir/attribute block from a btree. */
+STATIC int
+xfs_scrub_da_btree_block(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	xfs_dablk_t			blkno)
+{
+	struct xfs_da_state_blk		*blk;
+	struct xfs_da_intnode		*node;
+	struct xfs_da_node_entry	*btree;
+	struct xfs_da3_blkinfo		*hdr3;
+	struct xfs_da_args		*dargs = &ds->dargs;
+	struct xfs_inode		*ip = ds->dargs.dp;
+	xfs_ino_t			owner;
+	int				*pmaxrecs;
+	struct xfs_da3_icnode_hdr	nodehdr;
+	int				error = 0;
+
+	blk = &ds->state->path.blk[level];
+	ds->state->path.active = level + 1;
+
+	/* Release old block. */
+	if (blk->bp) {
+		xfs_trans_brelse(dargs->trans, blk->bp);
+		blk->bp = NULL;
+	}
+
+	/* Check the pointer. */
+	blk->blkno = blkno;
+	if (!xfs_scrub_da_btree_ptr_ok(ds, level, blkno))
+		goto out_nobuf;
+
+	/* Read the buffer. */
+	error = xfs_da_read_buf(dargs->trans, dargs->dp, blk->blkno, -2,
+			&blk->bp, dargs->whichfork,
+			&xfs_scrub_da_btree_buf_ops);
+	if (!xfs_scrub_da_process_error(ds, level, &error))
+		goto out_nobuf;
+
+	/*
+	 * We didn't find a dir btree root block, which means that
+	 * there's no LEAF1/LEAFN tree (at least not where it's supposed
+	 * to be), so jump out now.
+	 */
+	if (ds->dargs.whichfork == XFS_DATA_FORK && level == 0 &&
+			blk->bp == NULL)
+		goto out_nobuf;
+
+	/* It's /not/ ok for attr trees not to have a da btree. */
+	if (blk->bp == NULL) {
+		xfs_scrub_da_set_corrupt(ds, level);
+		goto out_nobuf;
+	}
+
+	hdr3 = blk->bp->b_addr;
+	blk->magic = be16_to_cpu(hdr3->hdr.magic);
+	pmaxrecs = &ds->maxrecs[level];
+
+	/* Check the owner. */
+	if (xfs_sb_version_hascrc(&ip->i_mount->m_sb)) {
+		owner = be64_to_cpu(hdr3->owner);
+		if (owner != ip->i_ino)
+			xfs_scrub_da_set_corrupt(ds, level);
+	}
+
+	/* Check the siblings. */
+	error = xfs_scrub_da_btree_block_check_siblings(ds, level, &hdr3->hdr);
+	if (error)
+		goto out;
+
+	/* Interpret the buffer. */
+	switch (blk->magic) {
+	case XFS_ATTR_LEAF_MAGIC:
+	case XFS_ATTR3_LEAF_MAGIC:
+		xfs_trans_buf_set_type(dargs->trans, blk->bp,
+				XFS_BLFT_ATTR_LEAF_BUF);
+		blk->magic = XFS_ATTR_LEAF_MAGIC;
+		blk->hashval = xfs_attr_leaf_lasthash(blk->bp, pmaxrecs);
+		if (ds->tree_level != 0)
+			xfs_scrub_da_set_corrupt(ds, level);
+		break;
+	case XFS_DIR2_LEAFN_MAGIC:
+	case XFS_DIR3_LEAFN_MAGIC:
+		xfs_trans_buf_set_type(dargs->trans, blk->bp,
+				XFS_BLFT_DIR_LEAFN_BUF);
+		blk->magic = XFS_DIR2_LEAFN_MAGIC;
+		blk->hashval = xfs_dir2_leaf_lasthash(ip, blk->bp, pmaxrecs);
+		if (ds->tree_level != 0)
+			xfs_scrub_da_set_corrupt(ds, level);
+		break;
+	case XFS_DIR2_LEAF1_MAGIC:
+	case XFS_DIR3_LEAF1_MAGIC:
+		xfs_trans_buf_set_type(dargs->trans, blk->bp,
+				XFS_BLFT_DIR_LEAF1_BUF);
+		blk->magic = XFS_DIR2_LEAF1_MAGIC;
+		blk->hashval = xfs_dir2_leaf_lasthash(ip, blk->bp, pmaxrecs);
+		if (ds->tree_level != 0)
+			xfs_scrub_da_set_corrupt(ds, level);
+		break;
+	case XFS_DA_NODE_MAGIC:
+	case XFS_DA3_NODE_MAGIC:
+		xfs_trans_buf_set_type(dargs->trans, blk->bp,
+				XFS_BLFT_DA_NODE_BUF);
+		blk->magic = XFS_DA_NODE_MAGIC;
+		node = blk->bp->b_addr;
+		ip->d_ops->node_hdr_from_disk(&nodehdr, node);
+		btree = ip->d_ops->node_tree_p(node);
+		*pmaxrecs = nodehdr.count;
+		blk->hashval = be32_to_cpu(btree[*pmaxrecs - 1].hashval);
+		if (level == 0) {
+			if (nodehdr.level >= XFS_DA_NODE_MAXDEPTH) {
+				xfs_scrub_da_set_corrupt(ds, level);
+				goto out_freebp;
+			}
+			ds->tree_level = nodehdr.level;
+		} else {
+			if (ds->tree_level != nodehdr.level) {
+				xfs_scrub_da_set_corrupt(ds, level);
+				goto out_freebp;
+			}
+		}
+		break;
+	default:
+		xfs_scrub_da_set_corrupt(ds, level);
+		goto out_freebp;
+	}
+
+out:
+	return error;
+out_freebp:
+	xfs_trans_brelse(dargs->trans, blk->bp);
+	blk->bp = NULL;
+out_nobuf:
+	blk->blkno = 0;
+	return error;
+}
+
+/* Visit all nodes and leaves of a da btree. */
+int
+xfs_scrub_da_btree(
+	struct xfs_scrub_context	*sc,
+	int				whichfork,
+	xfs_scrub_da_btree_rec_fn	scrub_fn)
+{
+	struct xfs_scrub_da_btree	ds = {};
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_da_state_blk		*blks;
+	struct xfs_da_node_entry	*key;
+	void				*rec;
+	xfs_dablk_t			blkno;
+	int				level;
+	int				error;
+
+	/* Skip short format data structures; no btree to scan. */
+	if (XFS_IFORK_FORMAT(sc->ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
+	    XFS_IFORK_FORMAT(sc->ip, whichfork) != XFS_DINODE_FMT_BTREE)
+		return 0;
+
+	/* Set up initial da state. */
+	ds.dargs.dp = sc->ip;
+	ds.dargs.whichfork = whichfork;
+	ds.dargs.trans = sc->tp;
+	ds.dargs.op_flags = XFS_DA_OP_OKNOENT;
+	ds.state = xfs_da_state_alloc();
+	ds.state->args = &ds.dargs;
+	ds.state->mp = mp;
+	ds.sc = sc;
+	if (whichfork == XFS_ATTR_FORK) {
+		ds.dargs.geo = mp->m_attr_geo;
+		ds.lowest = 0;
+		ds.highest = 0;
+	} else {
+		ds.dargs.geo = mp->m_dir_geo;
+		ds.lowest = ds.dargs.geo->leafblk;
+		ds.highest = ds.dargs.geo->freeblk;
+	}
+	blkno = ds.lowest;
+	level = 0;
+
+	/* Find the root of the da tree, if present. */
+	blks = ds.state->path.blk;
+	error = xfs_scrub_da_btree_block(&ds, level, blkno);
+	if (error)
+		goto out_state;
+	/*
+	 * We didn't find a block at ds.lowest, which means that there's
+	 * no LEAF1/LEAFN tree (at least not where it's supposed to be),
+	 * so jump out now.
+	 */
+	if (blks[level].bp == NULL)
+		goto out_state;
+
+	blks[level].index = 0;
+	while (level >= 0 && level < XFS_DA_NODE_MAXDEPTH) {
+		/* Handle leaf block. */
+		if (blks[level].magic != XFS_DA_NODE_MAGIC) {
+			/* End of leaf, pop back towards the root. */
+			if (blks[level].index >= ds.maxrecs[level]) {
+				if (level > 0)
+					blks[level - 1].index++;
+				ds.tree_level++;
+				level--;
+				continue;
+			}
+
+			/* Dispatch record scrubbing. */
+			rec = xfs_scrub_da_btree_entry(&ds, level,
+					blks[level].index);
+			error = scrub_fn(&ds, level, rec);
+			if (error)
+				break;
+			if (xfs_scrub_should_terminate(sc, &error) ||
+			    (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT))
+				break;
+
+			blks[level].index++;
+			continue;
+		}
+
+
+		/* End of node, pop back towards the root. */
+		if (blks[level].index >= ds.maxrecs[level]) {
+			if (level > 0)
+				blks[level - 1].index++;
+			ds.tree_level++;
+			level--;
+			continue;
+		}
+
+		/* Hashes in order for scrub? */
+		key = xfs_scrub_da_btree_entry(&ds, level, blks[level].index);
+		error = xfs_scrub_da_btree_hash(&ds, level, &key->hashval);
+		if (error)
+			goto out;
+
+		/* Drill another level deeper. */
+		blkno = be32_to_cpu(key->before);
+		level++;
+		ds.tree_level--;
+		error = xfs_scrub_da_btree_block(&ds, level, blkno);
+		if (error)
+			goto out;
+		if (blks[level].bp == NULL)
+			goto out;
+
+		blks[level].index = 0;
+	}
+
+out:
+	/* Release all the buffers we're tracking. */
+	for (level = 0; level < XFS_DA_NODE_MAXDEPTH; level++) {
+		if (blks[level].bp == NULL)
+			continue;
+		xfs_trans_brelse(sc->tp, blks[level].bp);
+		blks[level].bp = NULL;
+	}
+
+out_state:
+	xfs_da_state_free(ds.state);
+	return error;
+}
diff --git a/fs/xfs/scrub/dabtree.h b/fs/xfs/scrub/dabtree.h
new file mode 100644
index 0000000..2a766de1f
--- /dev/null
+++ b/fs/xfs/scrub/dabtree.h
@@ -0,0 +1,58 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef __XFS_SCRUB_DABTREE_H__
+#define __XFS_SCRUB_DABTREE_H__
+
+/* dir/attr btree */
+
+struct xfs_scrub_da_btree {
+	struct xfs_da_args		dargs;
+	xfs_dahash_t			hashes[XFS_DA_NODE_MAXDEPTH];
+	int				maxrecs[XFS_DA_NODE_MAXDEPTH];
+	struct xfs_da_state		*state;
+	struct xfs_scrub_context	*sc;
+
+	/*
+	 * Lowest and highest directory block address in which we expect
+	 * to find dir/attr btree node blocks.  For a directory this
+	 * (presumably) means between LEAF_OFFSET and FREE_OFFSET; for
+	 * attributes there is no limit.
+	 */
+	xfs_dablk_t			lowest;
+	xfs_dablk_t			highest;
+
+	int				tree_level;
+};
+
+typedef int (*xfs_scrub_da_btree_rec_fn)(struct xfs_scrub_da_btree *ds,
+		int level, void *rec);
+
+/* Check for da btree operation errors. */
+bool xfs_scrub_da_process_error(struct xfs_scrub_da_btree *ds, int level, int *error);
+
+/* Check for da btree corruption. */
+void xfs_scrub_da_set_corrupt(struct xfs_scrub_da_btree *ds, int level);
+
+int xfs_scrub_da_btree_hash(struct xfs_scrub_da_btree *ds, int level,
+			    __be32 *hashp);
+int xfs_scrub_da_btree(struct xfs_scrub_context *sc, int whichfork,
+		       xfs_scrub_da_btree_rec_fn scrub_fn);
+
+#endif /* __XFS_SCRUB_DABTREE_H__ */
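
The lowest/highest bounds in struct xfs_scrub_da_btree can be read as a
half-open range check on block addresses.  The following is a small
userspace sketch of that rule, assuming plain uint32_t block numbers;
dablk_in_range() is a hypothetical name used only here, and a highest
value of zero is taken to mean "no upper limit", as for attr forks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if blkno lies in [lowest, highest).  A highest value of
 * zero disables the upper bound.
 */
static bool
dablk_in_range(uint32_t blkno, uint32_t lowest, uint32_t highest)
{
	/* Caller-supplied bounds must describe a non-empty range. */
	assert(highest == 0 || lowest < highest);

	if (blkno < lowest)
		return false;
	if (highest != 0 && blkno >= highest)
		return false;
	return true;
}
```

With bounds of 4 and 8, block 4 is accepted and block 8 is rejected;
with both bounds zero, every block address is accepted.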



* [PATCH 24/30] xfs: scrub directory metadata
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (22 preceding siblings ...)
  2017-10-12  1:43 ` [PATCH 23/30] xfs: scrub directory/attribute btrees Darrick J. Wong
@ 2017-10-12  1:43 ` Darrick J. Wong
  2017-10-16  4:29   ` Dave Chinner
  2017-10-17  0:14   ` [PATCH v2 " Darrick J. Wong
  2017-10-12  1:43 ` [PATCH 25/30] xfs: scrub directory freespace Darrick J. Wong
                   ` (5 subsequent siblings)
  29 siblings, 2 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:43 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Scrub the hash tree and all the entries in a directory.
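
One of the per-entry checks compares the file type recorded in a dirent
with the type implied by the target inode's mode.  A minimal userspace
sketch of that comparison follows, assuming the usual Linux encoding in
which the DT_* values equal the S_IFMT bits shifted right by 12;
mode_to_dtype() and ftype_matches() are hypothetical helper names, not
kernel functions.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* expose DT_* and S_IF* on glibc */
#endif
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <sys/stat.h>

/* The shift below relies on this relationship between the encodings. */
static_assert(DT_DIR == (S_IFDIR >> 12), "DT_* must mirror S_IFMT >> 12");

/* Convert an inode mode to the DT_* value a dirent would carry. */
static unsigned char
mode_to_dtype(mode_t mode)
{
	return (mode & S_IFMT) >> 12;
}

/* Does the dirent's recorded type agree with the inode's mode? */
static bool
ftype_matches(mode_t mode, unsigned char dtype)
{
	return mode_to_dtype(mode) == dtype;
}
```

A mismatch here would be reported as corruption of the directory block
holding the entry, not as an operational error.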

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    3 
 fs/xfs/scrub/common.c  |   28 ++++
 fs/xfs/scrub/common.h  |    4 +
 fs/xfs/scrub/dir.c     |  318 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c   |    4 +
 fs/xfs/scrub/scrub.h   |    1 
 7 files changed, 358 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/dir.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index b48437f..69aa88e 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -152,6 +152,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   btree.o \
 				   common.o \
 				   dabtree.o \
+				   dir.o \
 				   ialloc.o \
 				   inode.o \
 				   refcount.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 02ae58b..b16d004 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -498,9 +498,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_BMBTD	12	/* data fork block mapping */
 #define XFS_SCRUB_TYPE_BMBTA	13	/* attr fork block mapping */
 #define XFS_SCRUB_TYPE_BMBTC	14	/* CoW fork block mapping */
+#define XFS_SCRUB_TYPE_DIR	15	/* directory */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	15
+#define XFS_SCRUB_TYPE_NR	16
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 415c6a9..318dd97 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -542,3 +542,31 @@ xfs_scrub_get_inode(
 	sc->ip = ip;
 	return 0;
 }
+
+/* Set us up to scrub a file's contents. */
+int
+xfs_scrub_setup_inode_contents(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip,
+	unsigned int			resblks)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	error = xfs_scrub_get_inode(sc, ip);
+	if (error)
+		return error;
+
+	/* Got the inode, lock it and we're ready to go. */
+	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+	error = xfs_scrub_trans_alloc(sc->sm, mp, &sc->tp);
+	if (error)
+		goto out;
+	sc->ilock_flags |= XFS_ILOCK_EXCL;
+	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
+
+out:
+	/* scrub teardown will unlock and release the inode for us */
+	return error;
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index b3cf4a2..7cd4a78 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -93,6 +93,8 @@ int xfs_scrub_setup_inode_bmap(struct xfs_scrub_context *sc,
 			       struct xfs_inode *ip);
 int xfs_scrub_setup_inode_bmap_data(struct xfs_scrub_context *sc,
 				    struct xfs_inode *ip);
+int xfs_scrub_setup_directory(struct xfs_scrub_context *sc,
+			      struct xfs_inode *ip);
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
@@ -111,5 +113,7 @@ int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
 int xfs_scrub_setup_ag_btree(struct xfs_scrub_context *sc,
 			     struct xfs_inode *ip, bool force_log);
 int xfs_scrub_get_inode(struct xfs_scrub_context *sc, struct xfs_inode *ip_in);
+int xfs_scrub_setup_inode_contents(struct xfs_scrub_context *sc,
+				   struct xfs_inode *ip, unsigned int resblks);
 
 #endif	/* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
new file mode 100644
index 0000000..e2a8f90
--- /dev/null
+++ b/fs/xfs/scrub/dir.c
@@ -0,0 +1,318 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_itable.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_ialloc.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+#include "scrub/dabtree.h"
+
+/* Set us up to scrub directories. */
+int
+xfs_scrub_setup_directory(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_inode_contents(sc, ip, 0);
+}
+
+/* Directories */
+
+/* Scrub a directory entry. */
+
+struct xfs_scrub_dir_ctx {
+	/* VFS fill-directory iterator */
+	struct dir_context		dir_iter;
+
+	struct xfs_scrub_context	*sc;
+};
+
+/* Check that an inode's mode matches a given DT_ type. */
+STATIC int
+xfs_scrub_dir_check_ftype(
+	struct xfs_scrub_dir_ctx	*sdc,
+	xfs_fileoff_t			offset,
+	xfs_ino_t			inum,
+	int				dtype)
+{
+	struct xfs_mount		*mp = sdc->sc->mp;
+	struct xfs_inode		*ip;
+	int				ino_dtype;
+	int				error = 0;
+
+	if (!xfs_sb_version_hasftype(&mp->m_sb)) {
+		if (dtype != DT_UNKNOWN && dtype != DT_DIR)
+			xfs_scrub_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
+					offset);
+		goto out;
+	}
+
+	error = xfs_iget(mp, sdc->sc->tp, inum, XFS_IGET_DONTCACHE, 0, &ip);
+	if (!xfs_scrub_fblock_process_error(sdc->sc, XFS_DATA_FORK, offset,
+			&error))
+		goto out;
+
+	/* Convert mode to the DT_* values that dir_emit uses. */
+	ino_dtype = (VFS_I(ip)->i_mode & S_IFMT) >> 12;
+	if (ino_dtype != dtype)
+		xfs_scrub_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK, offset);
+	iput(VFS_I(ip));
+out:
+	return error;
+}
+
+/*
+ * Scrub a single directory entry.
+ *
+ * We use the VFS directory iterator (i.e. readdir) to call this
+ * function for every directory entry in a directory.  Once we're here,
+ * we check the inode number to make sure it's sane, then we check that
+ * we can look up this filename.  Finally, we check the ftype.
+ */
+STATIC int
+xfs_scrub_dir_actor(
+	struct dir_context		*dir_iter,
+	const char			*name,
+	int				namelen,
+	loff_t				pos,
+	u64				ino,
+	unsigned			type)
+{
+	struct xfs_mount		*mp;
+	struct xfs_inode		*ip;
+	struct xfs_scrub_dir_ctx	*sdc;
+	struct xfs_name			xname;
+	xfs_ino_t			lookup_ino;
+	xfs_dablk_t			offset;
+	int				error = 0;
+
+	sdc = container_of(dir_iter, struct xfs_scrub_dir_ctx, dir_iter);
+	ip = sdc->sc->ip;
+	mp = ip->i_mount;
+	offset = xfs_dir2_db_to_da(mp->m_dir_geo,
+			xfs_dir2_dataptr_to_db(mp->m_dir_geo, pos));
+
+	/* Does this inode number make sense? */
+	if (!xfs_verify_dir_ino_ptr(mp, ino)) {
+		xfs_scrub_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK, offset);
+		goto out;
+	}
+
+	if (!strncmp(".", name, namelen)) {
+		/* If this is "." then check that the inum matches the dir. */
+		if (xfs_sb_version_hasftype(&mp->m_sb) && type != DT_DIR)
+			xfs_scrub_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
+					offset);
+		if (ino != ip->i_ino)
+			xfs_scrub_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
+					offset);
+	} else if (!strncmp("..", name, namelen)) {
+		/*
+		 * If this is ".." in the root inode, check that the inum
+		 * matches this dir.
+		 */
+		if (xfs_sb_version_hasftype(&mp->m_sb) && type != DT_DIR)
+			xfs_scrub_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
+					offset);
+		if (ip->i_ino == mp->m_sb.sb_rootino && ino != ip->i_ino)
+			xfs_scrub_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
+					offset);
+	}
+
+	/* Verify that we can look up this name by hash. */
+	xname.name = name;
+	xname.len = namelen;
+	xname.type = XFS_DIR3_FT_UNKNOWN;
+
+	error = xfs_dir_lookup(sdc->sc->tp, ip, &xname, &lookup_ino, NULL);
+	if (!xfs_scrub_fblock_process_error(sdc->sc, XFS_DATA_FORK, offset,
+			&error))
+		goto fail_xref;
+	if (lookup_ino != ino) {
+		xfs_scrub_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK, offset);
+		goto out;
+	}
+
+	/* Verify the file type.  This function absorbs error codes. */
+	error = xfs_scrub_dir_check_ftype(sdc, offset, lookup_ino, type);
+	if (error)
+		goto out;
+out:
+	return error;
+fail_xref:
+	return error;
+}
+
+/* Scrub a directory btree record. */
+STATIC int
+xfs_scrub_dir_rec(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	void				*rec)
+{
+	struct xfs_mount		*mp = ds->state->mp;
+	struct xfs_dir2_leaf_entry	*ent = rec;
+	struct xfs_inode		*dp = ds->dargs.dp;
+	struct xfs_dir2_data_entry	*dent;
+	struct xfs_buf			*bp;
+	xfs_ino_t			ino;
+	xfs_dablk_t			rec_bno;
+	xfs_dir2_db_t			db;
+	xfs_dir2_data_aoff_t		off;
+	xfs_dir2_dataptr_t		ptr;
+	xfs_dahash_t			calc_hash;
+	xfs_dahash_t			hash;
+	unsigned int			tag;
+	int				error;
+
+	/* Check the hash of the entry. */
+	error = xfs_scrub_da_btree_hash(ds, level, &ent->hashval);
+	if (error)
+		goto out;
+
+	/* Valid hash pointer? */
+	ptr = be32_to_cpu(ent->address);
+	if (ptr == 0)
+		return 0;
+
+	/* Find the directory entry's location. */
+	db = xfs_dir2_dataptr_to_db(mp->m_dir_geo, ptr);
+	off = xfs_dir2_dataptr_to_off(mp->m_dir_geo, ptr);
+	rec_bno = xfs_dir2_db_to_da(mp->m_dir_geo, db);
+
+	if (rec_bno >= mp->m_dir_geo->leafblk) {
+		xfs_scrub_da_set_corrupt(ds, level);
+		goto out;
+	}
+	error = xfs_dir3_data_read(ds->dargs.trans, dp, rec_bno, -2, &bp);
+	if (!xfs_scrub_fblock_process_error(ds->sc, XFS_DATA_FORK, rec_bno,
+			&error))
+		goto out;
+	if (!bp) {
+		xfs_scrub_fblock_set_corrupt(ds->sc, XFS_DATA_FORK, rec_bno);
+		goto out;
+	}
+
+	/* Retrieve the entry, sanity check it, and compare hashes. */
+	dent = (struct xfs_dir2_data_entry *)(((char *)bp->b_addr) + off);
+	ino = be64_to_cpu(dent->inumber);
+	hash = be32_to_cpu(ent->hashval);
+	tag = be16_to_cpup(dp->d_ops->data_entry_tag_p(dent));
+	if (!xfs_verify_dir_ino_ptr(mp, ino) || tag != off)
+		xfs_scrub_fblock_set_corrupt(ds->sc, XFS_DATA_FORK, rec_bno);
+	if (dent->namelen == 0) {
+		xfs_scrub_fblock_set_corrupt(ds->sc, XFS_DATA_FORK, rec_bno);
+		goto out_relse;
+	}
+	calc_hash = xfs_da_hashname(dent->name, dent->namelen);
+	if (calc_hash != hash)
+		xfs_scrub_fblock_set_corrupt(ds->sc, XFS_DATA_FORK, rec_bno);
+
+out_relse:
+	xfs_trans_brelse(ds->dargs.trans, bp);
+out:
+	return error;
+}
+
+/* Scrub a whole directory. */
+int
+xfs_scrub_directory(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_scrub_dir_ctx	sdc = {
+		.dir_iter.actor = xfs_scrub_dir_actor,
+		.dir_iter.pos = 0,
+		.sc = sc,
+	};
+	size_t				bufsize;
+	loff_t				oldpos;
+	int				error = 0;
+
+	if (!S_ISDIR(VFS_I(sc->ip)->i_mode))
+		return -ENOENT;
+
+	/* Plausible size? */
+	if (sc->ip->i_d.di_size < xfs_dir2_sf_hdr_size(0)) {
+		xfs_scrub_ino_set_corrupt(sc, sc->ip->i_ino, NULL);
+		goto out;
+	}
+
+	/* Check directory tree structure */
+	error = xfs_scrub_da_btree(sc, XFS_DATA_FORK, xfs_scrub_dir_rec);
+	if (error)
+		return error;
+
+	/*
+	 * Check that every dirent we see can also be looked up by hash.
+	 * Userspace usually asks for a 32k buffer, so we will too.
+	 */
+	bufsize = (size_t)min_t(loff_t, 32768, sc->ip->i_d.di_size);
+
+	/*
+	 * Look up every name in this directory by hash.
+	 *
+	 * Use the xfs_readdir function to call xfs_scrub_dir_actor on
+	 * every directory entry in this directory.  In _actor, we check
+	 * the name, inode number, and ftype (if applicable) of the
+	 * entry.  xfs_readdir uses the VFS filldir functions to provide
+	 * iteration context.
+	 *
+	 * The VFS grabs a read or write lock via i_rwsem before it reads
+	 * or writes to a directory.  If we've gotten this far we've
+	 * already obtained IOLOCK_EXCL, which (since 4.10) is the same as
+	 * getting a write lock on i_rwsem.  Therefore, it is safe for us
+	 * to drop the ILOCK here in order to reuse the _readdir and
+	 * _dir_lookup routines, which do their own ILOCK locking.
+	 */
+	oldpos = 0;
+	sc->ilock_flags &= ~XFS_ILOCK_EXCL;
+	xfs_iunlock(sc->ip, XFS_ILOCK_EXCL);
+	while (true) {
+		error = xfs_readdir(sc->tp, sc->ip, &sdc.dir_iter, bufsize);
+		if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0,
+				&error))
+			goto out;
+		if (oldpos == sdc.dir_iter.pos)
+			break;
+		oldpos = sdc.dir_iter.pos;
+	}
+
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index b20fdd3..4a44727 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -217,6 +217,10 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_inode_bmap,
 		.scrub	= xfs_scrub_bmap_cow,
 	},
+	{ /* directory */
+		.setup	= xfs_scrub_setup_directory,
+		.scrub	= xfs_scrub_directory,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 8920ccf..844506e 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -82,5 +82,6 @@ int xfs_scrub_inode(struct xfs_scrub_context *sc);
 int xfs_scrub_bmap_data(struct xfs_scrub_context *sc);
 int xfs_scrub_bmap_attr(struct xfs_scrub_context *sc);
 int xfs_scrub_bmap_cow(struct xfs_scrub_context *sc);
+int xfs_scrub_directory(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
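
For anyone following the readdir loop at the end of xfs_scrub_directory:
it keeps calling xfs_readdir until the iterator position stops advancing.
Below is a rough userspace sketch of that termination pattern only.
fake_dir, fake_iter, fake_readdir and walk_dir are made-up names for
illustration and do not exist in the kernel; the real code also runs the
actor on every entry and handles errors through the scrub helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a directory with a fixed number of entries. */
struct fake_dir {
	long long	nr_entries;	/* total entries in the directory */
	long long	visited;	/* entries handed to the "actor" so far */
};

/* Hypothetical iterator state; only the position matters here. */
struct fake_iter {
	long long	pos;
};

/*
 * Emit up to @batch entries starting at iter->pos and advance the
 * position.  Returns 0, like a readdir call that hit no errors.
 */
static int
fake_readdir(struct fake_dir *dir, struct fake_iter *iter, size_t batch)
{
	assert(dir != NULL && iter != NULL);
	while (batch-- > 0 && iter->pos < dir->nr_entries) {
		dir->visited++;
		iter->pos++;
	}
	return 0;
}

/* Call fake_readdir until the position stops moving; return the call count. */
static int
walk_dir(struct fake_dir *dir, size_t batch)
{
	struct fake_iter	iter = { .pos = 0 };
	long long		oldpos = 0;
	int			calls = 0;

	for (;;) {
		if (fake_readdir(dir, &iter, batch))
			break;
		calls++;
		/* No progress since the last call means we are done. */
		if (oldpos == iter.pos)
			break;
		oldpos = iter.pos;
	}
	return calls;
}
```

Note that the loop always makes one extra call after the last entry has
been consumed, because that is the only way it learns that the position
no longer moves.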



* [PATCH 25/30] xfs: scrub directory freespace
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (23 preceding siblings ...)
  2017-10-12  1:43 ` [PATCH 24/30] xfs: scrub directory metadata Darrick J. Wong
@ 2017-10-12  1:43 ` Darrick J. Wong
  2017-10-16  4:49   ` Dave Chinner
  2017-10-17  1:10   ` [PATCH v2 " Darrick J. Wong
  2017-10-12  1:43 ` [PATCH 26/30] xfs: scrub extended attributes Darrick J. Wong
                   ` (4 subsequent siblings)
  29 siblings, 2 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:43 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Check the free space information in a directory.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/dir.c |  425 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 425 insertions(+)


diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index e2a8f90..a41310f 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -250,6 +250,426 @@ xfs_scrub_dir_rec(
 	return error;
 }
 
+/*
+ * Is this unused entry either in the bestfree or smaller than all of them?
+ * We assume the bestfrees are sorted longest to shortest, and that there
+ * aren't any bogus entries.
+ */
+static inline void
+xfs_scrub_directory_check_free_entry(
+	struct xfs_scrub_context	*sc,
+	xfs_dablk_t			lblk,
+	struct xfs_dir2_data_free	*bf,
+	struct xfs_dir2_data_unused	*dup)
+{
+	struct xfs_dir2_data_free	*dfp;
+	unsigned int			dup_length;
+
+	dup_length = be16_to_cpu(dup->length);
+
+	/* Unused entry is shorter than any of the bestfrees */
+	if (dup_length < be16_to_cpu(bf[XFS_DIR2_DATA_FD_COUNT - 1].length))
+		return;
+
+	for (dfp = &bf[XFS_DIR2_DATA_FD_COUNT - 1]; dfp >= bf; dfp--)
+		if (dup_length == be16_to_cpu(dfp->length))
+			return;
+
+	/* Unused entry should be in the bestfrees but wasn't found. */
+	xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+}
+
+/* Check free space info in a directory data block. */
+STATIC int
+xfs_scrub_directory_data_bestfree(
+	struct xfs_scrub_context	*sc,
+	xfs_dablk_t			lblk,
+	bool				is_block)
+{
+	struct xfs_dir2_data_unused	*dup;
+	struct xfs_dir2_data_free	*dfp;
+	struct xfs_buf			*bp;
+	struct xfs_dir2_data_free	*bf;
+	struct xfs_mount		*mp = sc->mp;
+	const struct xfs_dir_ops	*d_ops;
+	char				*ptr;
+	char				*endptr;
+	u16				tag;
+	unsigned int			nr_bestfrees = 0;
+	unsigned int			nr_frees = 0;
+	unsigned int			smallest_bestfree;
+	int				newlen;
+	int				offset;
+	int				error;
+
+	d_ops = sc->ip->d_ops;
+
+	if (is_block) {
+		/* dir block format */
+		if (lblk != XFS_B_TO_FSBT(mp, XFS_DIR2_DATA_OFFSET))
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+		error = xfs_dir3_block_read(sc->tp, sc->ip, &bp);
+	} else {
+		/* dir data format */
+		error = xfs_dir3_data_read(sc->tp, sc->ip, lblk, -1, &bp);
+	}
+	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
+		goto out;
+
+	/* Do the bestfrees correspond to actual free space? */
+	bf = d_ops->data_bestfree_p(bp->b_addr);
+	smallest_bestfree = UINT_MAX;
+	for (dfp = &bf[0]; dfp < &bf[XFS_DIR2_DATA_FD_COUNT]; dfp++) {
+		offset = be16_to_cpu(dfp->offset);
+		if (offset == 0)
+			continue;
+		if (offset >= mp->m_dir_geo->blksize) {
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+			goto out_buf;
+		}
+		dup = (struct xfs_dir2_data_unused *)(bp->b_addr + offset);
+		tag = be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup));
+
+		/* bestfree doesn't match the entry it points at? */
+		if (dup->freetag != cpu_to_be16(XFS_DIR2_DATA_FREE_TAG) ||
+		    be16_to_cpu(dup->length) != be16_to_cpu(dfp->length) ||
+		    tag != ((char *)dup - (char *)bp->b_addr)) {
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+			goto out_buf;
+		}
+
+		/* bestfree records should be ordered largest to smallest */
+		if (smallest_bestfree < be16_to_cpu(dfp->length)) {
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+			goto out_buf;
+		}
+
+		smallest_bestfree = be16_to_cpu(dfp->length);
+		nr_bestfrees++;
+	}
+
+	/* Make sure the bestfrees are actually the best free spaces. */
+	ptr = (char *)d_ops->data_entry_p(bp->b_addr);
+	if (is_block) {
+		struct xfs_dir2_block_tail	*btp;
+
+		btp = xfs_dir2_block_tail_p(mp->m_dir_geo, bp->b_addr);
+		endptr = (char *)xfs_dir2_block_leaf_p(btp);
+	} else
+		endptr = (char *)bp->b_addr + BBTOB(bp->b_length);
+	while (ptr < endptr) {
+		dup = (struct xfs_dir2_data_unused *)ptr;
+		/* Skip real entries */
+		if (dup->freetag != cpu_to_be16(XFS_DIR2_DATA_FREE_TAG)) {
+			struct xfs_dir2_data_entry	*dep;
+
+			dep = (struct xfs_dir2_data_entry *)ptr;
+			newlen = d_ops->data_entsize(dep->namelen);
+			if (newlen <= 0) {
+				xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK,
+						lblk);
+				goto out_buf;
+			}
+			ptr += newlen;
+			if (endptr < ptr)
+				xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK,
+					      lblk);
+			continue;
+		}
+
+		/* Spot check this free entry */
+		tag = be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup));
+		if (tag != ((char *)dup - (char *)bp->b_addr))
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+
+		/*
+		 * Either this entry is a bestfree or it's smaller than
+		 * any of the bestfrees.
+		 */
+		xfs_scrub_directory_check_free_entry(sc, lblk, bf, dup);
+
+		/* Move on. */
+		newlen = be16_to_cpu(dup->length);
+		if (newlen <= 0) {
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+			goto out_buf;
+		}
+		ptr += newlen;
+		if (endptr < ptr)
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+		else
+			nr_frees++;
+	}
+
+	/* Did we see at least as many free slots as there are bestfrees? */
+	if (nr_frees < nr_bestfrees)
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+out_buf:
+	xfs_trans_brelse(sc->tp, bp);
+out:
+	return error;
+}
+
+/*
+ * Does the free space length in the free space index block ($len) match
+ * the longest length in the directory data block's bestfree array?
+ * Assume that we've already checked that the data block's bestfree
+ * array is in order.
+ */
+static inline void
+xfs_scrub_directory_check_freesp(
+	struct xfs_scrub_context	*sc,
+	xfs_dablk_t			lblk,
+	struct xfs_buf			*dbp,
+	unsigned int			len)
+{
+	struct xfs_dir2_data_free	*bf;
+	struct xfs_dir2_data_free	*dfp;
+	int				offset;
+
+	if (len == 0)
+		return;
+
+	bf = sc->ip->d_ops->data_bestfree_p(dbp->b_addr);
+	for (dfp = &bf[0]; dfp < &bf[XFS_DIR2_DATA_FD_COUNT]; dfp++) {
+		offset = be16_to_cpu(dfp->offset);
+		if (offset == 0)
+			break;
+		if (len == be16_to_cpu(dfp->length))
+			return;
+		/* Didn't find the best length in the bestfree data */
+		break;
+	}
+
+	xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+}
+
+/* Check free space info in a directory leaf1 block. */
+STATIC int
+xfs_scrub_directory_leaf1_bestfree(
+	struct xfs_scrub_context	*sc,
+	struct xfs_da_args		*args,
+	xfs_dablk_t			lblk)
+{
+	struct xfs_dir2_leaf_tail	*ltp;
+	struct xfs_buf			*dbp;
+	struct xfs_buf			*bp;
+	struct xfs_mount		*mp = sc->mp;
+	__be16				*bestp;
+	__u16				best;
+	int				i;
+	int				error;
+
+	/*
+	 * Read the leaf1 block.  The verifier will check for hash
+	 * value ordering problems and check the stale entry count.
+	 */
+	error = xfs_dir3_leaf_read(sc->tp, sc->ip, lblk, -1, &bp);
+	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
+		goto out;
+
+	/* Check all the entries. */
+	ltp = xfs_dir2_leaf_tail_p(mp->m_dir_geo, bp->b_addr);
+	bestp = xfs_dir2_leaf_bests_p(ltp);
+	for (i = 0; i < be32_to_cpu(ltp->bestcount); i++, bestp++) {
+		best = be16_to_cpu(*bestp);
+		if (best == NULLDATAOFF)
+			continue;
+		error = xfs_dir3_data_read(sc->tp, sc->ip,
+				i * args->geo->fsbcount, -1, &dbp);
+		if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk,
+				&error))
+			continue;
+		xfs_scrub_directory_check_freesp(sc, lblk, dbp, best);
+		xfs_trans_brelse(sc->tp, dbp);
+	}
+out:
+	return error;
+}
+
+/* Check free space info in a directory freespace block. */
+STATIC int
+xfs_scrub_directory_free_bestfree(
+	struct xfs_scrub_context	*sc,
+	struct xfs_da_args		*args,
+	xfs_dablk_t			lblk)
+{
+	struct xfs_dir3_icfree_hdr	freehdr;
+	struct xfs_buf			*dbp;
+	struct xfs_buf			*bp;
+	__be16				*bestp;
+	__u16				best;
+	int				i;
+	int				error;
+
+	/* Read the free space block */
+	error = xfs_dir2_free_read(sc->tp, sc->ip, lblk, &bp);
+	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
+		goto out;
+
+	/* Check all the entries. */
+	sc->ip->d_ops->free_hdr_from_disk(&freehdr, bp->b_addr);
+	bestp = sc->ip->d_ops->free_bests_p(bp->b_addr);
+	for (i = 0; i < freehdr.nvalid; i++, bestp++) {
+		best = be16_to_cpu(*bestp);
+		if (best == NULLDATAOFF)
+			continue;
+		error = xfs_dir3_data_read(sc->tp, sc->ip,
+				(freehdr.firstdb + i) * args->geo->fsbcount,
+				-1, &dbp);
+		if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk,
+				&error))
+			continue;
+		xfs_scrub_directory_check_freesp(sc, lblk, dbp, best);
+		xfs_trans_brelse(sc->tp, dbp);
+	}
+out:
+	return error;
+}
+
+/* Check free space information in directories. */
+STATIC int
+xfs_scrub_directory_blocks(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_bmbt_irec		got;
+	struct xfs_da_args		args;
+	struct xfs_ifork		*ifp;
+	struct xfs_mount		*mp = sc->mp;
+	xfs_fileoff_t			leaf_lblk;
+	xfs_fileoff_t			free_lblk;
+	xfs_fileoff_t			lblk;
+	xfs_extnum_t			idx;
+	bool				found;
+	int				is_block = 0;
+	int				error;
+
+	/* Ignore local format directories. */
+	if (sc->ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS &&
+	    sc->ip->i_d.di_format != XFS_DINODE_FMT_BTREE)
+		return 0;
+
+	ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK);
+	lblk = XFS_B_TO_FSB(mp, XFS_DIR2_DATA_OFFSET);
+	leaf_lblk = XFS_B_TO_FSB(mp, XFS_DIR2_LEAF_OFFSET);
+	free_lblk = XFS_B_TO_FSB(mp, XFS_DIR2_FREE_OFFSET);
+
+	/* Is this a block dir? */
+	args.dp = sc->ip;
+	args.geo = mp->m_dir_geo;
+	args.trans = sc->tp;
+	error = xfs_dir2_isblock(&args, &is_block);
+	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
+		goto out;
+
+	/* Iterate all the data extents in the directory... */
+	found = xfs_iext_lookup_extent(sc->ip, ifp, lblk, &idx, &got);
+	while (found) {
+		/* Block directories only have a single block at offset 0. */
+		if (is_block &&
+		    (got.br_startoff > 0 ||
+		     got.br_blockcount != args.geo->fsbcount)) {
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK,
+					got.br_startoff);
+			break;
+		}
+
+		/* No more data blocks... */
+		if (got.br_startoff >= leaf_lblk)
+			break;
+
+		/*
+		 * Check each data block's bestfree data.
+		 *
+		 * Iterate all the fsbcount-aligned block offsets in
+		 * this directory.  The directory block reading code is
+		 * smart enough to do its own bmap lookups to handle
+		 * discontiguous directory blocks.  When we're done
+		 * with the extent record, re-query the bmap at the
+		 * next fsbcount-aligned offset to avoid redundant
+		 * block checks.
+		 */
+		for (lblk = roundup((xfs_dablk_t)got.br_startoff,
+				args.geo->fsbcount);
+		     lblk < got.br_startoff + got.br_blockcount;
+		     lblk += args.geo->fsbcount) {
+			error = xfs_scrub_directory_data_bestfree(sc, lblk,
+					is_block);
+			if (error)
+				goto out;
+		}
+		lblk = roundup((xfs_dablk_t)got.br_startoff + got.br_blockcount,
+				args.geo->fsbcount);
+		found = xfs_iext_lookup_extent(sc->ip, ifp, lblk, &idx, &got);
+	}
+
+	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+		goto out;
+
+	/* Look for a leaf1 block, which has free info. */
+	if (xfs_iext_lookup_extent(sc->ip, ifp, leaf_lblk, &idx, &got) &&
+	    got.br_startoff == leaf_lblk &&
+	    got.br_blockcount == args.geo->fsbcount &&
+	    !xfs_iext_get_extent(ifp, ++idx, &got)) {
+		if (is_block) {
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+			goto out;
+		}
+		error = xfs_scrub_directory_leaf1_bestfree(sc, &args,
+				leaf_lblk);
+		if (error)
+			goto out;
+	}
+
+	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+		goto out;
+
+	/* Scan for free blocks */
+	lblk = free_lblk;
+	found = xfs_iext_lookup_extent(sc->ip, ifp, lblk, &idx, &got);
+	while (found) {
+		/*
+		 * Dirs can't have blocks mapped above 2^32.
+		 * Single-block dirs shouldn't even be here.
+		 */
+		lblk = got.br_startoff;
+		if (lblk & ~0xFFFFFFFFULL) {
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+			goto out;
+		}
+		if (is_block) {
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+			goto out;
+		}
+
+		/*
+		 * Check each dir free block's bestfree data.
+		 *
+		 * Iterate all the fsbcount-aligned block offsets in
+		 * this directory.  The directory block reading code is
+		 * smart enough to do its own bmap lookups to handle
+		 * discontiguous directory blocks.  When we're done
+		 * with the extent record, re-query the bmap at the
+		 * next fsbcount-aligned offset to avoid redundant
+		 * block checks.
+		 */
+		for (lblk = roundup((xfs_dablk_t)got.br_startoff,
+				args.geo->fsbcount);
+		     lblk < got.br_startoff + got.br_blockcount;
+		     lblk += args.geo->fsbcount) {
+			error = xfs_scrub_directory_free_bestfree(sc, &args,
+					lblk);
+			if (error)
+				goto out;
+		}
+		lblk = roundup((xfs_dablk_t)got.br_startoff + got.br_blockcount,
+				args.geo->fsbcount);
+		found = xfs_iext_lookup_extent(sc->ip, ifp, lblk, &idx, &got);
+	}
+out:
+	return error;
+}
+
 /* Scrub a whole directory. */
 int
 xfs_scrub_directory(
@@ -278,6 +698,11 @@ xfs_scrub_directory(
 	if (error)
 		return error;
 
+	/* Check the freespace. */
+	error = xfs_scrub_directory_blocks(sc);
+	if (error)
+		return error;
+
 	/*
 	 * Check that every dirent we see can also be looked up by hash.
 	 * Userspace usually asks for a 32k buffer, so we will too.
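
To make the two bestfree rules in this patch easier to picture, here is a
small userspace model.  It assumes host-endian values and a simplified
struct bestfree in place of struct xfs_dir2_data_free; bestfree_sorted
and free_entry_ok are hypothetical names, not kernel functions.  The
first helper mirrors the "ordered largest to smallest" check, the second
mirrors the "either in the bestfree or smaller than all of them" check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FD_COUNT	3	/* same count as XFS_DIR2_DATA_FD_COUNT */

/* Host-endian stand-in for struct xfs_dir2_data_free. */
struct bestfree {
	unsigned int	offset;	/* 0 means the slot is unused */
	unsigned int	length;
};

/* Are the used bestfree slots ordered longest to shortest? */
static bool
bestfree_sorted(const struct bestfree *bf)
{
	unsigned int	smallest = (unsigned int)-1;
	size_t		i;

	assert(bf != NULL);
	for (i = 0; i < FD_COUNT; i++) {
		if (bf[i].offset == 0)
			continue;
		if (smallest < bf[i].length)
			return false;
		smallest = bf[i].length;
	}
	return true;
}

/*
 * A free region of @len bytes is acceptable if it is shorter than the
 * last bestfree slot or if its length matches one of the slots.
 */
static bool
free_entry_ok(const struct bestfree *bf, unsigned int len)
{
	size_t	i;

	assert(bf != NULL);
	if (len < bf[FD_COUNT - 1].length)
		return true;
	for (i = 0; i < FD_COUNT; i++)
		if (bf[i].length == len)
			return true;
	return false;
}
```

Like the patch, the second helper compares lengths only, so it relies on
the array already having passed the ordering check.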


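The extent walk in xfs_scrub_directory_blocks visits only the
fsbcount-aligned offsets that fall inside each mapping.  A minimal sketch
of that arithmetic, assuming plain integers instead of xfs_fileoff_t and
using the hypothetical helpers round_up_to and count_dir_blocks:

```c
#include <assert.h>

/* Round @x up to the next multiple of @y, as the kernel's roundup() does. */
static unsigned long long
round_up_to(unsigned long long x, unsigned long long y)
{
	return ((x + y - 1) / y) * y;
}

/*
 * Count the directory blocks that start inside the extent
 * [startoff, startoff + blockcount), where each directory block spans
 * @fsbcount filesystem blocks.
 */
static unsigned int
count_dir_blocks(unsigned long long startoff, unsigned long long blockcount,
		 unsigned int fsbcount)
{
	unsigned long long	lblk;
	unsigned int		nr = 0;

	assert(fsbcount > 0);
	for (lblk = round_up_to(startoff, fsbcount);
	     lblk < startoff + blockcount;
	     lblk += fsbcount)
		nr++;
	return nr;
}
```

An extent that begins and ends between two aligned offsets contributes
no blocks of its own; the directory block that covers it is checked when
the walk reaches the extent containing that block's first fsblock.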

* [PATCH 26/30] xfs: scrub extended attributes
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (24 preceding siblings ...)
  2017-10-12  1:43 ` [PATCH 25/30] xfs: scrub directory freespace Darrick J. Wong
@ 2017-10-12  1:43 ` Darrick J. Wong
  2017-10-16  4:50   ` Dave Chinner
  2017-10-12  1:43 ` [PATCH 27/30] xfs: scrub symbolic links Darrick J. Wong
                   ` (3 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:43 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Scrub the hash tree, keys, and values in an extended attribute structure.
Refactor the attribute code to use the transaction if the caller supplied
one to avoid buffer deadlocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    3 -
 fs/xfs/scrub/attr.c    |  239 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.h  |    2 
 fs/xfs/scrub/scrub.c   |    8 ++
 fs/xfs/scrub/scrub.h   |    2 
 fs/xfs/xfs_attr.h      |    5 +
 fs/xfs/xfs_attr_list.c |    7 +
 8 files changed, 264 insertions(+), 3 deletions(-)
 create mode 100644 fs/xfs/scrub/attr.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 69aa88e..4d46399 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -148,6 +148,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   trace.o \
 				   agheader.o \
 				   alloc.o \
+				   attr.o \
 				   bmap.o \
 				   btree.o \
 				   common.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index b16d004..0834ce6 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -499,9 +499,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_BMBTA	13	/* attr fork block mapping */
 #define XFS_SCRUB_TYPE_BMBTC	14	/* CoW fork block mapping */
 #define XFS_SCRUB_TYPE_DIR	15	/* directory */
+#define XFS_SCRUB_TYPE_XATTR	16	/* extended attribute */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	16
+#define XFS_SCRUB_TYPE_NR	17
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
new file mode 100644
index 0000000..69a4104
--- /dev/null
+++ b/fs/xfs/scrub/attr.c
@@ -0,0 +1,239 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2.h"
+#include "xfs_attr.h"
+#include "xfs_attr_leaf.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/dabtree.h"
+#include "scrub/trace.h"
+
+#include <linux/posix_acl_xattr.h>
+#include <linux/xattr.h>
+
+/* Set us up to scrub an inode's extended attributes. */
+int
+xfs_scrub_setup_xattr(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	/* Allocate the buffer without the inode lock held. */
+	sc->buf = kmem_zalloc_large(XATTR_SIZE_MAX, KM_SLEEP);
+	if (!sc->buf)
+		return -ENOMEM;
+
+	return xfs_scrub_setup_inode_contents(sc, ip, 0);
+}
+
+/* Extended Attributes */
+
+struct xfs_scrub_xattr {
+	struct xfs_attr_list_context	context;
+	struct xfs_scrub_context	*sc;
+};
+
+/*
+ * Check that an extended attribute key can be looked up by hash.
+ *
+ * We use the XFS attribute list iterator (i.e. xfs_attr_list_int_ilocked)
+ * to call this function for every attribute key in an inode.  Once
+ * we're here, we load the attribute value to see if any errors happen,
+ * or if we get more or less data than we expected.
+ */
+static void
+xfs_scrub_xattr_listent(
+	struct xfs_attr_list_context	*context,
+	int				flags,
+	unsigned char			*name,
+	int				namelen,
+	int				valuelen)
+{
+	struct xfs_scrub_xattr		*sx;
+	struct xfs_da_args		args = {0};
+	int				error = 0;
+
+	sx = container_of(context, struct xfs_scrub_xattr, context);
+
+	if (flags & XFS_ATTR_INCOMPLETE) {
+		/* Incomplete attr key, just mark the inode for preening. */
+		xfs_scrub_ino_set_preen(sx->sc, NULL);
+		return;
+	}
+
+	args.flags = ATTR_KERNOTIME;
+	if (flags & XFS_ATTR_ROOT)
+		args.flags |= ATTR_ROOT;
+	else if (flags & XFS_ATTR_SECURE)
+		args.flags |= ATTR_SECURE;
+	args.geo = context->dp->i_mount->m_attr_geo;
+	args.whichfork = XFS_ATTR_FORK;
+	args.dp = context->dp;
+	args.name = name;
+	args.namelen = namelen;
+	args.hashval = xfs_da_hashname(args.name, args.namelen);
+	args.trans = context->tp;
+	args.value = sx->sc->buf;
+	args.valuelen = XATTR_SIZE_MAX;
+
+	error = xfs_attr_get_ilocked(context->dp, &args);
+	if (error == -EEXIST)
+		error = 0;
+	if (!xfs_scrub_fblock_process_error(sx->sc, XFS_ATTR_FORK, args.blkno,
+			&error))
+		goto fail_xref;
+	if (args.valuelen != valuelen)
+		xfs_scrub_fblock_set_corrupt(sx->sc, XFS_ATTR_FORK,
+					     args.blkno);
+
+fail_xref:
+	return;
+}
+
+/* Scrub an attribute btree record. */
+STATIC int
+xfs_scrub_xattr_rec(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	void				*rec)
+{
+	struct xfs_mount		*mp = ds->state->mp;
+	struct xfs_attr_leaf_entry	*ent = rec;
+	struct xfs_da_state_blk		*blk;
+	struct xfs_attr_leaf_name_local	*lentry;
+	struct xfs_attr_leaf_name_remote	*rentry;
+	struct xfs_buf			*bp;
+	xfs_dahash_t			calc_hash;
+	xfs_dahash_t			hash;
+	int				nameidx;
+	int				hdrsize;
+	unsigned int			badflags;
+	int				error;
+
+	blk = &ds->state->path.blk[level];
+
+	/* Check the hash of the entry. */
+	error = xfs_scrub_da_btree_hash(ds, level, &ent->hashval);
+	if (error)
+		goto out;
+
+	/* Find the attr entry's location. */
+	bp = blk->bp;
+	hdrsize = xfs_attr3_leaf_hdr_size(bp->b_addr);
+	nameidx = be16_to_cpu(ent->nameidx);
+	if (nameidx < hdrsize || nameidx >= mp->m_attr_geo->blksize) {
+		xfs_scrub_da_set_corrupt(ds, level);
+		goto out;
+	}
+
+	/* Retrieve the entry and check it. */
+	hash = be32_to_cpu(ent->hashval);
+	badflags = ~(XFS_ATTR_LOCAL | XFS_ATTR_ROOT | XFS_ATTR_SECURE |
+			XFS_ATTR_INCOMPLETE);
+	if ((ent->flags & badflags) != 0)
+		xfs_scrub_da_set_corrupt(ds, level);
+	if (ent->flags & XFS_ATTR_LOCAL) {
+		lentry = (struct xfs_attr_leaf_name_local *)
+				(((char *)bp->b_addr) + nameidx);
+		if (lentry->namelen <= 0) {
+			xfs_scrub_da_set_corrupt(ds, level);
+			goto out;
+		}
+		calc_hash = xfs_da_hashname(lentry->nameval, lentry->namelen);
+	} else {
+		rentry = (struct xfs_attr_leaf_name_remote *)
+				(((char *)bp->b_addr) + nameidx);
+		if (rentry->namelen <= 0) {
+			xfs_scrub_da_set_corrupt(ds, level);
+			goto out;
+		}
+		calc_hash = xfs_da_hashname(rentry->name, rentry->namelen);
+	}
+	if (calc_hash != hash)
+		xfs_scrub_da_set_corrupt(ds, level);
+
+out:
+	return error;
+}
+
+/* Scrub the extended attribute metadata. */
+int
+xfs_scrub_xattr(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_scrub_xattr		sx = { 0 };
+	struct attrlist_cursor_kern	cursor = { 0 };
+	int				error = 0;
+
+	if (!xfs_inode_hasattr(sc->ip))
+		return -ENOENT;
+
+	memset(&sx, 0, sizeof(sx));
+	/* Check attribute tree structure */
+	error = xfs_scrub_da_btree(sc, XFS_ATTR_FORK, xfs_scrub_xattr_rec);
+	if (error)
+		goto out;
+
+	/* Check that every attr key can also be looked up by hash. */
+	sx.context.dp = sc->ip;
+	sx.context.cursor = &cursor;
+	sx.context.resynch = 1;
+	sx.context.put_listent = xfs_scrub_xattr_listent;
+	sx.context.tp = sc->tp;
+	sx.context.flags = ATTR_INCOMPLETE;
+	sx.sc = sc;
+
+	/*
+	 * Look up every xattr in this file by name.
+	 *
+	 * Use the backend implementation of xfs_attr_list to call
+	 * xfs_scrub_xattr_listent on every attribute key in this inode.
+	 * In other words, we use the same iterator/callback mechanism
+	 * that listattr uses to scrub extended attributes, though in our
+	 * _listent function, we check the value of the attribute.
+	 *
+	 * The VFS only locks i_rwsem when modifying attrs, so keep all
+	 * three locks held because that's the only way to ensure we're
+	 * the only thread poking into the da btree.  We traverse the da
+	 * btree while holding a leaf buffer locked for the xattr name
+	 * iteration, which doesn't really follow the usual buffer
+	 * locking order.
+	 */
+	error = xfs_attr_list_int_ilocked(&sx.context);
+	if (!xfs_scrub_fblock_process_error(sc, XFS_ATTR_FORK, 0, &error))
+		goto out;
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 7cd4a78..b938429 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -95,6 +95,8 @@ int xfs_scrub_setup_inode_bmap_data(struct xfs_scrub_context *sc,
 				    struct xfs_inode *ip);
 int xfs_scrub_setup_directory(struct xfs_scrub_context *sc,
 			      struct xfs_inode *ip);
+int xfs_scrub_setup_xattr(struct xfs_scrub_context *sc,
+			  struct xfs_inode *ip);
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 4a44727..7ad9f54 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -148,6 +148,10 @@ xfs_scrub_teardown(
 			iput(VFS_I(sc->ip));
 		sc->ip = NULL;
 	}
+	if (sc->buf) {
+		kmem_free(sc->buf);
+		sc->buf = NULL;
+	}
 	return error;
 }
 
@@ -221,6 +225,10 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_directory,
 		.scrub	= xfs_scrub_directory,
 	},
+	{ /* extended attributes */
+		.setup	= xfs_scrub_setup_xattr,
+		.scrub	= xfs_scrub_xattr,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 844506e..d31ff58 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -59,6 +59,7 @@ struct xfs_scrub_context {
 	const struct xfs_scrub_meta_ops	*ops;
 	struct xfs_trans		*tp;
 	struct xfs_inode		*ip;
+	void				*buf;
 	uint				ilock_flags;
 	bool				try_harder;
 
@@ -83,5 +84,6 @@ int xfs_scrub_bmap_data(struct xfs_scrub_context *sc);
 int xfs_scrub_bmap_attr(struct xfs_scrub_context *sc);
 int xfs_scrub_bmap_cow(struct xfs_scrub_context *sc);
 int xfs_scrub_directory(struct xfs_scrub_context *sc);
+int xfs_scrub_xattr(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
diff --git a/fs/xfs/xfs_attr.h b/fs/xfs/xfs_attr.h
index 5d5a5e2..d07bf27 100644
--- a/fs/xfs/xfs_attr.h
+++ b/fs/xfs/xfs_attr.h
@@ -48,6 +48,8 @@ struct xfs_attr_list_context;
 #define ATTR_KERNOTIME	0x1000	/* [kernel] don't update inode timestamps */
 #define ATTR_KERNOVAL	0x2000	/* [kernel] get attr size only, not value */
 
+#define ATTR_INCOMPLETE	0x4000	/* [kernel] return INCOMPLETE attr keys */
+
 #define XFS_ATTR_FLAGS \
 	{ ATTR_DONTFOLLOW, 	"DONTFOLLOW" }, \
 	{ ATTR_ROOT,		"ROOT" }, \
@@ -56,7 +58,8 @@ struct xfs_attr_list_context;
 	{ ATTR_CREATE,		"CREATE" }, \
 	{ ATTR_REPLACE,		"REPLACE" }, \
 	{ ATTR_KERNOTIME,	"KERNOTIME" }, \
-	{ ATTR_KERNOVAL,	"KERNOVAL" }
+	{ ATTR_KERNOVAL,	"KERNOVAL" }, \
+	{ ATTR_INCOMPLETE,	"INCOMPLETE" }
 
 /*
  * The maximum size (into the kernel or returned from the kernel) of an
diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
index 7740c8a..5816786 100644
--- a/fs/xfs/xfs_attr_list.c
+++ b/fs/xfs/xfs_attr_list.c
@@ -407,7 +407,8 @@ xfs_attr3_leaf_list_int(
 			cursor->offset = 0;
 		}
 
-		if (entry->flags & XFS_ATTR_INCOMPLETE)
+		if ((entry->flags & XFS_ATTR_INCOMPLETE) &&
+		    !(context->flags & ATTR_INCOMPLETE))
 			continue;		/* skip incomplete entries */
 
 		if (entry->flags & XFS_ATTR_LOCAL) {
@@ -583,6 +584,10 @@ xfs_attr_list(
 	    (cursor->hashval || cursor->blkno || cursor->offset))
 		return -EINVAL;
 
+	/* Only internal consumers can retrieve incomplete attrs. */
+	if (flags & ATTR_INCOMPLETE)
+		return -EINVAL;
+
 	/*
 	 * Check for a properly aligned buffer.
 	 */
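
The per-record checks in xfs_scrub_xattr_rec boil down to two simple
predicates before the hash comparison: the entry may only carry known
flag bits, and the name offset has to land after the leaf header and
inside the block.  A userspace sketch follows; the EX_ATTR_* names and
both helpers are illustrative only, and the real flag definitions live
in xfs_da_format.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits; see xfs_da_format.h for the real definitions. */
#define EX_ATTR_LOCAL		(1u << 0)
#define EX_ATTR_ROOT		(1u << 1)
#define EX_ATTR_SECURE		(1u << 2)
#define EX_ATTR_INCOMPLETE	(1u << 7)

#define EX_ATTR_KNOWN	(EX_ATTR_LOCAL | EX_ATTR_ROOT | EX_ATTR_SECURE | \
			 EX_ATTR_INCOMPLETE)

/* Does the entry carry only flag bits that we know about? */
static bool
attr_flags_ok(unsigned int flags)
{
	return (flags & ~EX_ATTR_KNOWN) == 0;
}

/* Does the name offset land after the header and inside the block? */
static bool
attr_nameidx_ok(unsigned int nameidx, unsigned int hdrsize,
		unsigned int blksize)
{
	assert(hdrsize < blksize);
	return nameidx >= hdrsize && nameidx < blksize;
}
```

In the patch a bad flag only marks the block corrupt and the checks
continue, whereas a bad name offset stops processing of that record,
since the name cannot be read safely.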



* [PATCH 27/30] xfs: scrub symbolic links
  2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
                   ` (25 preceding siblings ...)
  2017-10-12  1:43 ` [PATCH 26/30] xfs: scrub extended attributes Darrick J. Wong
@ 2017-10-12  1:43 ` Darrick J. Wong
  2017-10-16  4:52   ` Dave Chinner
  2017-10-12  1:43 ` [PATCH 28/30] xfs: scrub directory parent pointers Darrick J. Wong
                   ` (2 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:43 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create the infrastructure to scrub symbolic link data.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 +
 fs/xfs/libxfs/xfs_fs.h |    3 +-
 fs/xfs/scrub/common.h  |    2 +
 fs/xfs/scrub/scrub.c   |    4 ++
 fs/xfs/scrub/scrub.h   |    1 +
 fs/xfs/scrub/symlink.c |   92 ++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 102 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/symlink.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 4d46399..28637a6 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -159,5 +159,6 @@ xfs-y				+= $(addprefix scrub/, \
 				   refcount.o \
 				   rmap.o \
 				   scrub.o \
+				   symlink.o \
 				   )
 endif
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 0834ce6..bb8bcd0 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -500,9 +500,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_BMBTC	14	/* CoW fork block mapping */
 #define XFS_SCRUB_TYPE_DIR	15	/* directory */
 #define XFS_SCRUB_TYPE_XATTR	16	/* extended attribute */
+#define XFS_SCRUB_TYPE_SYMLINK	17	/* symbolic link */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	17
+#define XFS_SCRUB_TYPE_NR	18
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index b938429..b71c1a8 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -97,6 +97,8 @@ int xfs_scrub_setup_directory(struct xfs_scrub_context *sc,
 			      struct xfs_inode *ip);
 int xfs_scrub_setup_xattr(struct xfs_scrub_context *sc,
 			  struct xfs_inode *ip);
+int xfs_scrub_setup_symlink(struct xfs_scrub_context *sc,
+			    struct xfs_inode *ip);
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 7ad9f54..fbf6696 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -229,6 +229,10 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_xattr,
 		.scrub	= xfs_scrub_xattr,
 	},
+	{ /* symbolic link */
+		.setup	= xfs_scrub_setup_symlink,
+		.scrub	= xfs_scrub_symlink,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index d31ff58..dc4ed8d 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -85,5 +85,6 @@ int xfs_scrub_bmap_attr(struct xfs_scrub_context *sc);
 int xfs_scrub_bmap_cow(struct xfs_scrub_context *sc);
 int xfs_scrub_directory(struct xfs_scrub_context *sc);
 int xfs_scrub_xattr(struct xfs_scrub_context *sc);
+int xfs_scrub_symlink(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
diff --git a/fs/xfs/scrub/symlink.c b/fs/xfs/scrub/symlink.c
new file mode 100644
index 0000000..3aa3d60
--- /dev/null
+++ b/fs/xfs/scrub/symlink.c
@@ -0,0 +1,92 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_inode_fork.h"
+#include "xfs_symlink.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+
+/* Set us up to scrub a symbolic link. */
+int
+xfs_scrub_setup_symlink(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	/* Allocate the buffer without the inode lock held. */
+	sc->buf = kmem_zalloc_large(XFS_SYMLINK_MAXLEN + 1, KM_SLEEP);
+	if (!sc->buf)
+		return -ENOMEM;
+
+	return xfs_scrub_setup_inode_contents(sc, ip, 0);
+}
+
+/* Symbolic links. */
+
+int
+xfs_scrub_symlink(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_inode		*ip = sc->ip;
+	struct xfs_ifork		*ifp;
+	loff_t				len;
+	int				error = 0;
+
+	if (!S_ISLNK(VFS_I(ip)->i_mode))
+		return -ENOENT;
+	ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+	len = ip->i_d.di_size;
+
+	/* Plausible size? */
+	if (len > XFS_SYMLINK_MAXLEN || len <= 0) {
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		goto out;
+	}
+
+	/* Inline symlink? */
+	if (ifp->if_flags & XFS_IFINLINE) {
+		if (len > XFS_IFORK_DSIZE(ip) ||
+		    len > strnlen(ifp->if_u1.if_data, XFS_IFORK_DSIZE(ip)))
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		goto out;
+	}
+
+	/* Remote symlink; must read the contents. */
+	error = xfs_readlink_bmap_ilocked(sc->ip, sc->buf);
+	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
+		goto out;
+	if (strnlen(sc->buf, XFS_SYMLINK_MAXLEN) < len)
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+out:
+	return error;
+}
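
For readers following the length checks in the symlink scrubber above, here
is a minimal userspace sketch of the same idea. It is not the kernel code:
sketch_symlink_ok() and SKETCH_SYMLINK_MAXLEN are hypothetical names, and the
limit of 1024 is assumed to match XFS_SYMLINK_MAXLEN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed stand-in for XFS_SYMLINK_MAXLEN. */
#define SKETCH_SYMLINK_MAXLEN	1024

/*
 * Return true if a symlink target with recorded length len looks sane.
 * buf holds the target bytes; bufsize is how many bytes may be read.
 */
static bool
sketch_symlink_ok(
	const char	*buf,
	size_t		bufsize,
	long long	len)
{
	assert(buf != NULL);

	/* Plausible size?  Empty and oversized targets are rejected. */
	if (len <= 0 || len > SKETCH_SYMLINK_MAXLEN)
		return false;

	/* The recorded length cannot exceed the space holding the target. */
	if ((size_t)len > bufsize)
		return false;

	/* A NUL inside the first len bytes means the target is too short. */
	return memchr(buf, '\0', (size_t)len) == NULL;
}
```

As in the patch, a stored string longer than the recorded length is not
flagged; only a string shorter than the inode size counts as corruption.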



* [PATCH 28/30] xfs: scrub directory parent pointers
From: Darrick J. Wong @ 2017-10-12  1:43 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Scrub parent pointers, sort of.  For directories, we can ride the
'..' entry up to the parent to confirm that there's exactly one
dentry that points back to this directory (or none at all if the
directory has been unlinked).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    3 -
 fs/xfs/scrub/common.h  |    2 
 fs/xfs/scrub/parent.c  |  277 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c   |    4 +
 fs/xfs/scrub/scrub.h   |    1 
 6 files changed, 287 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/parent.c
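
The back-reference count described in the commit message can be sketched in
plain userspace C. This is only an illustration under stated assumptions:
struct sketch_dirent is a hypothetical flattened directory entry, and the
functions below stand in for the readdir-based walk in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened directory entry; not an XFS structure. */
struct sketch_dirent {
	const char	*name;
	uint64_t	ino;
};

/* Count the entries in a parent directory that point at child_ino. */
static unsigned int
sketch_count_backrefs(
	const struct sketch_dirent	*ents,
	size_t				nr,
	uint64_t			child_ino)
{
	unsigned int			nlink = 0;
	size_t				i;

	assert(ents != NULL || nr == 0);
	for (i = 0; i < nr; i++)
		if (ents[i].ino == child_ino)
			nlink++;
	return nlink;
}

/*
 * An unlinked directory (link count zero) must have no entry in its
 * parent; any other directory must have exactly one.
 */
static bool
sketch_parent_ok(
	const struct sketch_dirent	*ents,
	size_t				nr,
	uint64_t			child_ino,
	unsigned int			child_nlink)
{
	unsigned int			expected = child_nlink == 0 ? 0 : 1;

	return sketch_count_backrefs(ents, nr, child_ino) == expected;
}
```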


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 28637a6..2193a54 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -156,6 +156,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   dir.o \
 				   ialloc.o \
 				   inode.o \
+				   parent.o \
 				   refcount.o \
 				   rmap.o \
 				   scrub.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index bb8bcd0..7444094 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -501,9 +501,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_DIR	15	/* directory */
 #define XFS_SCRUB_TYPE_XATTR	16	/* extended attribute */
 #define XFS_SCRUB_TYPE_SYMLINK	17	/* symbolic link */
+#define XFS_SCRUB_TYPE_PARENT	18	/* parent pointers */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	18
+#define XFS_SCRUB_TYPE_NR	19
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index b71c1a8..0542e7d 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -99,6 +99,8 @@ int xfs_scrub_setup_xattr(struct xfs_scrub_context *sc,
 			  struct xfs_inode *ip);
 int xfs_scrub_setup_symlink(struct xfs_scrub_context *sc,
 			    struct xfs_inode *ip);
+int xfs_scrub_setup_parent(struct xfs_scrub_context *sc,
+			   struct xfs_inode *ip);
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
new file mode 100644
index 0000000..9ba3f0d
--- /dev/null
+++ b/fs/xfs/scrub/parent.c
@@ -0,0 +1,277 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_ialloc.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+
+/* Set us up to scrub parents. */
+int
+xfs_scrub_setup_parent(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_inode_contents(sc, ip, 0);
+}
+
+/* Parent pointers */
+
+/* Look for an entry in a parent pointing to this inode. */
+
+struct xfs_scrub_parent_ctx {
+	struct dir_context		dc;
+	xfs_ino_t			ino;
+	xfs_nlink_t			nlink;
+};
+
+/* Look for a single entry in a directory pointing to an inode. */
+STATIC int
+xfs_scrub_parent_actor(
+	struct dir_context		*dc,
+	const char			*name,
+	int				namelen,
+	loff_t				pos,
+	u64				ino,
+	unsigned			type)
+{
+	struct xfs_scrub_parent_ctx	*spc;
+
+	spc = container_of(dc, struct xfs_scrub_parent_ctx, dc);
+	if (spc->ino == ino)
+		spc->nlink++;
+	return 0;
+}
+
+/* Count the number of dentries in the parent dir that point to this inode. */
+STATIC int
+xfs_scrub_parent_count_parent_dentries(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*parent,
+	xfs_nlink_t			*nlink)
+{
+	struct xfs_scrub_parent_ctx	spc = {
+		.dc.actor = xfs_scrub_parent_actor,
+		.dc.pos = 0,
+		.ino = sc->ip->i_ino,
+		.nlink = 0,
+	};
+	struct xfs_ifork		*ifp;
+	size_t				bufsize;
+	loff_t				oldpos;
+	uint				lock_mode;
+	int				error;
+
+	/*
+	 * Load the parent directory's extent map.  A regular directory
+	 * open would start readahead (and thus load the extent map)
+	 * before we even got to a readdir call, but this isn't
+	 * guaranteed here.
+	 */
+	lock_mode = xfs_ilock_data_map_shared(parent);
+	ifp = XFS_IFORK_PTR(parent, XFS_DATA_FORK);
+	if (XFS_IFORK_FORMAT(parent, XFS_DATA_FORK) == XFS_DINODE_FMT_BTREE &&
+	    !(ifp->if_flags & XFS_IFEXTENTS)) {
+		error = xfs_iread_extents(sc->tp, parent, XFS_DATA_FORK);
+		if (error) {
+			xfs_iunlock(parent, lock_mode);
+			return error;
+		}
+	}
+	xfs_iunlock(parent, lock_mode);
+
+	/*
+	 * Iterate the parent dir to confirm that there is
+	 * exactly one entry pointing back to the inode being
+	 * scanned.
+	 */
+	bufsize = (size_t)min_t(loff_t, 32768, parent->i_d.di_size);
+	oldpos = 0;
+	while (true) {
+		error = xfs_readdir(sc->tp, parent, &spc.dc, bufsize);
+		if (error)
+			goto out;
+		if (oldpos == spc.dc.pos)
+			break;
+		oldpos = spc.dc.pos;
+	}
+	*nlink = spc.nlink;
+out:
+	return error;
+}
+
+/* Scrub a parent pointer. */
+int
+xfs_scrub_parent(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_inode		*dp = NULL;
+	xfs_ino_t			dnum;
+	xfs_nlink_t			expected_nlink;
+	xfs_nlink_t			nlink;
+	int				tries = 0;
+	int				error;
+
+	/*
+	 * If we're a directory, check that the '..' link points up to
+	 * a directory that has one entry pointing to us.
+	 */
+	if (!S_ISDIR(VFS_I(sc->ip)->i_mode))
+		return -ENOENT;
+
+	/* We're not a special inode, are we? */
+	if (!xfs_verify_dir_ino_ptr(mp, sc->ip->i_ino)) {
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		goto out;
+	}
+
+	/*
+	 * If we're an unlinked directory, the parent /won't/ have a link
+	 * to us.  Otherwise, it should have one link.
+	 */
+	expected_nlink = VFS_I(sc->ip)->i_nlink == 0 ? 0 : 1;
+
+	/*
+	 * The VFS grabs a read or write lock via i_rwsem before it reads
+	 * or writes to a directory.  If we've gotten this far we've
+	 * already obtained IOLOCK_EXCL, which (since 4.10) is the same as
+	 * getting a write lock on i_rwsem.  Therefore, it is safe for us
+	 * to drop the ILOCK and MMAPLOCK here in order to do directory lookups.
+	 */
+	sc->ilock_flags &= ~(XFS_ILOCK_EXCL | XFS_MMAPLOCK_EXCL);
+	xfs_iunlock(sc->ip, XFS_ILOCK_EXCL | XFS_MMAPLOCK_EXCL);
+
+	/* Look up '..' */
+	error = xfs_dir_lookup(sc->tp, sc->ip, &xfs_name_dotdot, &dnum, NULL);
+	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
+		goto out;
+	if (!xfs_verify_dir_ino_ptr(mp, dnum)) {
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		goto out;
+	}
+
+	/* Is this the root dir?  Then '..' must point to itself. */
+	if (sc->ip == mp->m_rootip) {
+		if (sc->ip->i_ino != mp->m_sb.sb_rootino ||
+		    sc->ip->i_ino != dnum)
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		return 0;
+	}
+
+try_again:
+	/* Otherwise, '..' must not point to ourselves. */
+	if (sc->ip->i_ino == dnum) {
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		goto out;
+	}
+
+	error = xfs_iget(mp, sc->tp, dnum, XFS_IGET_DONTCACHE, 0, &dp);
+	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
+		goto out;
+	if (dp == sc->ip) {
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		goto out_rele;
+	}
+
+	/*
+	 * We prefer to keep the inode locked while we lock and search
+	 * its alleged parent for a forward reference.  However, this
+	 * child -> parent scheme can deadlock with the parent -> child
+	 * scheme that is normally used.  Therefore, if we can lock the
+	 * parent, just validate the references and get out.
+	 */
+	if (xfs_ilock_nowait(dp, XFS_IOLOCK_SHARED)) {
+		error = xfs_scrub_parent_count_parent_dentries(sc, dp, &nlink);
+		if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0,
+				&error))
+			goto out_unlock;
+		if (nlink != expected_nlink)
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		goto out_unlock;
+	}
+
+	/*
+	 * The game changes if we get here.  We failed to lock the parent,
+	 * so we're going to try to verify both pointers while only holding
+	 * one lock so as to avoid deadlocking with something that's actually
+	 * trying to traverse down the directory tree.
+	 */
+	xfs_iunlock(sc->ip, sc->ilock_flags);
+	sc->ilock_flags = 0;
+	xfs_ilock(dp, XFS_IOLOCK_SHARED);
+
+	/* Go looking for our dentry. */
+	error = xfs_scrub_parent_count_parent_dentries(sc, dp, &nlink);
+	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
+		goto out_unlock;
+
+	/* Drop the parent lock, relock this inode. */
+	xfs_iunlock(dp, XFS_IOLOCK_SHARED);
+	sc->ilock_flags = XFS_IOLOCK_EXCL;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+
+	/* Look up '..' to see if the inode changed. */
+	error = xfs_dir_lookup(sc->tp, sc->ip, &xfs_name_dotdot, &dnum, NULL);
+	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
+		goto out_rele;
+
+	/* Drat, parent changed.  Try again! */
+	if (dnum != dp->i_ino) {
+		iput(VFS_I(dp));
+		tries++;
+		if (tries < 20)
+			goto try_again;
+		xfs_scrub_set_incomplete(sc);
+		goto out;
+	}
+	iput(VFS_I(dp));
+
+	/*
+	 * '..' didn't change, so check that there was only one entry
+	 * for us in the parent.
+	 */
+	if (nlink != expected_nlink)
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+	goto out;
+
+out_unlock:
+	xfs_iunlock(dp, XFS_IOLOCK_SHARED);
+out_rele:
+	iput(VFS_I(dp));
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index fbf6696..8ecc3a1 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -233,6 +233,10 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_symlink,
 		.scrub	= xfs_scrub_symlink,
 	},
+	{ /* parent pointers */
+		.setup	= xfs_scrub_setup_parent,
+		.scrub	= xfs_scrub_parent,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index dc4ed8d..a264810 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -86,5 +86,6 @@ int xfs_scrub_bmap_cow(struct xfs_scrub_context *sc);
 int xfs_scrub_directory(struct xfs_scrub_context *sc);
 int xfs_scrub_xattr(struct xfs_scrub_context *sc);
 int xfs_scrub_symlink(struct xfs_scrub_context *sc);
+int xfs_scrub_parent(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
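
The locking dance in xfs_scrub_parent() above (try the parent lock while
holding the child, otherwise drop the child, block on the parent, relock the
child and revalidate '..') can be modelled roughly as follows. This is a
sketch, not the kernel's locking: struct sketch_node and the sketch_*()
helpers are hypothetical, and C11 atomic_flag spinning stands in for the
inode locks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical directory node: a lock flag plus the index '..' names. */
struct sketch_node {
	atomic_flag	lock;
	int		parent;
};

enum sketch_result { SKETCH_FAST, SKETCH_SLOW, SKETCH_INCOMPLETE };

static bool
sketch_trylock(struct sketch_node *n)
{
	/* test_and_set returns the old value; false means we got the lock. */
	return !atomic_flag_test_and_set(&n->lock);
}

static void
sketch_lock(struct sketch_node *n)
{
	while (atomic_flag_test_and_set(&n->lock))
		;	/* spin; a real lock would sleep */
}

static void
sketch_unlock(struct sketch_node *n)
{
	atomic_flag_clear(&n->lock);
}

/* The caller holds the child's lock on entry and again on return. */
static enum sketch_result
sketch_check_parent(struct sketch_node *nodes, int child, int max_tries)
{
	int	tries;

	assert(child >= 0);
	for (tries = 0; tries < max_tries; tries++) {
		int	p = nodes[child].parent;

		/* Fast path: holding the child, only try the parent. */
		if (sketch_trylock(&nodes[p])) {
			/* ...scan the parent for the back reference... */
			sketch_unlock(&nodes[p]);
			return SKETCH_FAST;
		}

		/* Slow path: never block on the parent with the child held. */
		sketch_unlock(&nodes[child]);
		sketch_lock(&nodes[p]);
		/* ...scan the parent for the back reference... */
		sketch_unlock(&nodes[p]);
		sketch_lock(&nodes[child]);

		/* The child was unlocked for a while; did '..' move? */
		if (nodes[child].parent == p)
			return SKETCH_SLOW;
	}
	return SKETCH_INCOMPLETE;
}
```

Bounding the retries (the patch uses 20) trades completeness for forward
progress: a directory that keeps getting renamed is reported as incompletely
checked instead of stalling the scrub.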



* [PATCH 29/30] xfs: scrub realtime bitmap/summary
From: Darrick J. Wong @ 2017-10-12  1:43 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Perform simple tests of the realtime bitmap.  The realtime summary
scrubber is only a stub for now and returns -ENOENT.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile            |    2 +
 fs/xfs/libxfs/xfs_format.h |    5 ++
 fs/xfs/libxfs/xfs_fs.h     |    4 +-
 fs/xfs/scrub/common.h      |    9 ++++
 fs/xfs/scrub/rtbitmap.c    |  108 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c       |   10 ++++
 fs/xfs/scrub/scrub.h       |   15 ++++++
 7 files changed, 152 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/rtbitmap.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 2193a54..9ce581e 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -162,4 +162,6 @@ xfs-y				+= $(addprefix scrub/, \
 				   scrub.o \
 				   symlink.o \
 				   )
+
+xfs-$(CONFIG_XFS_RT)		+= scrub/rtbitmap.o
 endif
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 154c3dd..d4d9bef 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -315,6 +315,11 @@ static inline bool xfs_sb_good_version(struct xfs_sb *sbp)
 	return false;
 }
 
+static inline bool xfs_sb_version_hasrealtime(struct xfs_sb *sbp)
+{
+	return sbp->sb_rblocks > 0;
+}
+
 /*
  * Detect a mismatched features2 field.  Older kernels read/wrote
  * this into the wrong slot, so to be safe we keep them in sync.
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 7444094..f8bac92 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -502,9 +502,11 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_XATTR	16	/* extended attribute */
 #define XFS_SCRUB_TYPE_SYMLINK	17	/* symbolic link */
 #define XFS_SCRUB_TYPE_PARENT	18	/* parent pointers */
+#define XFS_SCRUB_TYPE_RTBITMAP	19	/* realtime bitmap */
+#define XFS_SCRUB_TYPE_RTSUM	20	/* realtime summary */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	19
+#define XFS_SCRUB_TYPE_NR	21
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 0542e7d..5b561e2 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -101,6 +101,15 @@ int xfs_scrub_setup_symlink(struct xfs_scrub_context *sc,
 			    struct xfs_inode *ip);
 int xfs_scrub_setup_parent(struct xfs_scrub_context *sc,
 			   struct xfs_inode *ip);
+#ifdef CONFIG_XFS_RT
+int xfs_scrub_setup_rt(struct xfs_scrub_context *sc, struct xfs_inode *ip);
+#else
+static inline int
+xfs_scrub_setup_rt(struct xfs_scrub_context *sc, struct xfs_inode *ip)
+{
+	return -ENOENT;
+}
+#endif
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c
new file mode 100644
index 0000000..66d4252
--- /dev/null
+++ b/fs/xfs/scrub/rtbitmap.c
@@ -0,0 +1,108 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_alloc.h"
+#include "xfs_rtalloc.h"
+#include "xfs_inode.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+
+/* Set us up with the realtime metadata locked. */
+int
+xfs_scrub_setup_rt(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				error = 0;
+
+	/*
+	 * If userspace gave us an AG number or inode data, they don't
+	 * know what they're doing.  Get out.
+	 */
+	if (sc->sm->sm_agno || sc->sm->sm_ino || sc->sm->sm_gen)
+		return -EINVAL;
+
+	error = xfs_scrub_setup_fs(sc, ip);
+	if (error)
+		return error;
+
+	sc->ilock_flags = XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP;
+	sc->ip = mp->m_rbmip;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+
+	return 0;
+}
+
+/* Realtime bitmap. */
+
+/* Scrub a free extent record from the realtime bitmap. */
+STATIC int
+xfs_scrub_rtbitmap_helper(
+	struct xfs_trans		*tp,
+	struct xfs_rtalloc_rec		*rec,
+	void				*priv)
+{
+	struct xfs_scrub_context	*sc = priv;
+
+	if (rec->ar_startblock + rec->ar_blockcount <= rec->ar_startblock ||
+	    !xfs_verify_rtbno_ptr(sc->mp, rec->ar_startblock) ||
+	    !xfs_verify_rtbno_ptr(sc->mp, rec->ar_startblock +
+			rec->ar_blockcount - 1))
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+	return 0;
+}
+
+/* Scrub the realtime bitmap. */
+int
+xfs_scrub_rtbitmap(
+	struct xfs_scrub_context	*sc)
+{
+	int				error;
+
+	error = xfs_rtalloc_query_all(sc->tp, xfs_scrub_rtbitmap_helper, sc);
+	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
+		goto out;
+
+out:
+	return error;
+}
+
+/* Scrub the realtime summary. */
+int
+xfs_scrub_rtsummary(
+	struct xfs_scrub_context	*sc)
+{
+	/* XXX: implement this some day */
+	return -ENOENT;
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 8ecc3a1..09fc59d 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -237,6 +237,16 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_parent,
 		.scrub	= xfs_scrub_parent,
 	},
+	{ /* realtime bitmap */
+		.setup	= xfs_scrub_setup_rt,
+		.scrub	= xfs_scrub_rtbitmap,
+		.has	= xfs_sb_version_hasrealtime,
+	},
+	{ /* realtime summary */
+		.setup	= xfs_scrub_setup_rt,
+		.scrub	= xfs_scrub_rtsummary,
+		.has	= xfs_sb_version_hasrealtime,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index a264810..9aff4e2 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -87,5 +87,20 @@ int xfs_scrub_directory(struct xfs_scrub_context *sc);
 int xfs_scrub_xattr(struct xfs_scrub_context *sc);
 int xfs_scrub_symlink(struct xfs_scrub_context *sc);
 int xfs_scrub_parent(struct xfs_scrub_context *sc);
+#ifdef CONFIG_XFS_RT
+int xfs_scrub_rtbitmap(struct xfs_scrub_context *sc);
+int xfs_scrub_rtsummary(struct xfs_scrub_context *sc);
+#else
+static inline int
+xfs_scrub_rtbitmap(struct xfs_scrub_context *sc)
+{
+	return -ENOENT;
+}
+static inline int
+xfs_scrub_rtsummary(struct xfs_scrub_context *sc)
+{
+	return -ENOENT;
+}
+#endif
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */



* [PATCH 30/30] xfs: scrub quota information
From: Darrick J. Wong @ 2017-10-12  1:44 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Perform some quick sanity testing of the disk quota information.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    5 +
 fs/xfs/scrub/common.h  |    9 +
 fs/xfs/scrub/quota.c   |  301 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c   |   12 ++
 fs/xfs/scrub/scrub.h   |    9 +
 6 files changed, 336 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/quota.c
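
The limit and usage checks in this patch sort each resource (blocks, inodes,
realtime blocks) into "fine", "flag for admin review" or "corrupt". The
sketch below condenses that policy for a single resource. It is an
illustration under assumptions, not the patch's code: the names are
hypothetical, and may_overcommit stands in for the reflink case, which in the
patch applies to the block count only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_verdict { SKETCH_OK, SKETCH_WARN, SKETCH_CORRUPT };

/* Classify one resource of one dquot; corruption outranks warnings. */
static enum sketch_verdict
sketch_check_resource(
	uint64_t	soft,		/* soft limit */
	uint64_t	hard,		/* hard limit, 0 means none */
	uint64_t	count,		/* current usage */
	uint64_t	fs_max,		/* what the filesystem can hold */
	uint32_t	id,		/* quota id, 0 is exempt from limits */
	bool		may_overcommit)	/* usage may exceed fs_max */
{
	enum sketch_verdict	v = SKETCH_OK;

	/* A soft limit above the hard limit is never valid. */
	if (soft > hard)
		return SKETCH_CORRUPT;

	/* Usage beyond the filesystem is corrupt unless sharing allows it. */
	if (count > fs_max && !may_overcommit)
		return SKETCH_CORRUPT;

	/* Limits or shared usage larger than the fs: legal but suspect. */
	if (hard > fs_max || count > fs_max)
		v = SKETCH_WARN;

	/* An admin may lower a limit below current usage; flag it. */
	if (id != 0 && hard != 0 && count > hard)
		v = SKETCH_WARN;

	return v;
}
```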


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 9ce581e..3152469 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -164,4 +164,5 @@ xfs-y				+= $(addprefix scrub/, \
 				   )
 
 xfs-$(CONFIG_XFS_RT)		+= scrub/rtbitmap.o
+xfs-$(CONFIG_XFS_QUOTA)		+= scrub/quota.o
 endif
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index f8bac92..b9092410 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -504,9 +504,12 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_PARENT	18	/* parent pointers */
 #define XFS_SCRUB_TYPE_RTBITMAP	19	/* realtime bitmap */
 #define XFS_SCRUB_TYPE_RTSUM	20	/* realtime summary */
+#define XFS_SCRUB_TYPE_UQUOTA	21	/* user quotas */
+#define XFS_SCRUB_TYPE_GQUOTA	22	/* group quotas */
+#define XFS_SCRUB_TYPE_PQUOTA	23	/* project quotas */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	21
+#define XFS_SCRUB_TYPE_NR	24
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 5b561e2..0409ec2 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -110,6 +110,15 @@ xfs_scrub_setup_rt(struct xfs_scrub_context *sc, struct xfs_inode *ip)
 	return -ENOENT;
 }
 #endif
+#ifdef CONFIG_XFS_QUOTA
+int xfs_scrub_setup_quota(struct xfs_scrub_context *sc, struct xfs_inode *ip);
+#else
+static inline int
+xfs_scrub_setup_quota(struct xfs_scrub_context *sc, struct xfs_inode *ip)
+{
+	return -ENOENT;
+}
+#endif
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
diff --git a/fs/xfs/scrub/quota.c b/fs/xfs/scrub/quota.c
new file mode 100644
index 0000000..e4a9d4e
--- /dev/null
+++ b/fs/xfs/scrub/quota.c
@@ -0,0 +1,301 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_inode_fork.h"
+#include "xfs_alloc.h"
+#include "xfs_bmap.h"
+#include "xfs_quota.h"
+#include "xfs_qm.h"
+#include "xfs_dquot.h"
+#include "xfs_dquot_item.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+
+/* Convert a scrub type code to a DQ flag, or return 0 if error. */
+static inline uint
+xfs_scrub_quota_to_dqtype(
+	struct xfs_scrub_context	*sc)
+{
+	switch (sc->sm->sm_type) {
+	case XFS_SCRUB_TYPE_UQUOTA:
+		return XFS_DQ_USER;
+	case XFS_SCRUB_TYPE_GQUOTA:
+		return XFS_DQ_GROUP;
+	case XFS_SCRUB_TYPE_PQUOTA:
+		return XFS_DQ_PROJ;
+	default:
+		return 0;
+	}
+}
+
+/* Set us up to scrub a quota. */
+int
+xfs_scrub_setup_quota(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	uint				dqtype;
+
+	/*
+	 * If userspace gave us an AG number or inode data, they don't
+	 * know what they're doing.  Get out.
+	 */
+	if (sc->sm->sm_agno || sc->sm->sm_ino || sc->sm->sm_gen)
+		return -EINVAL;
+
+	dqtype = xfs_scrub_quota_to_dqtype(sc);
+	if (dqtype == 0)
+		return -EINVAL;
+	if (!xfs_this_quota_on(sc->mp, dqtype))
+		return -ENOENT;
+	return 0;
+}
+
+/* Quotas. */
+
+/* Scrub the fields in an individual quota item. */
+STATIC void
+xfs_scrub_quota_item(
+	struct xfs_scrub_context	*sc,
+	uint				dqtype,
+	struct xfs_dquot		*dq,
+	xfs_dqid_t			id)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_disk_dquot		*d = &dq->q_core;
+	struct xfs_quotainfo		*qi = mp->m_quotainfo;
+	xfs_fileoff_t			offset;
+	unsigned long long		bsoft;
+	unsigned long long		isoft;
+	unsigned long long		rsoft;
+	unsigned long long		bhard;
+	unsigned long long		ihard;
+	unsigned long long		rhard;
+	unsigned long long		bcount;
+	unsigned long long		icount;
+	unsigned long long		rcount;
+	xfs_ino_t			fs_icount;
+
+	offset = id / qi->qi_dqperchunk;
+
+	/*
+	 * We fed $id and DQNEXT into the xfs_qm_dqget call, which means
+	 * that the actual dquot we got must either have the same id or
+	 * the next higher id.
+	 */
+	if (id > be32_to_cpu(d->d_id))
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
+
+	/* Did we get the dquot type we wanted? */
+	if (dqtype != (d->d_flags & XFS_DQ_ALLTYPES))
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
+
+	/* Check the limits. */
+	bhard = be64_to_cpu(d->d_blk_hardlimit);
+	ihard = be64_to_cpu(d->d_ino_hardlimit);
+	rhard = be64_to_cpu(d->d_rtb_hardlimit);
+
+	bsoft = be64_to_cpu(d->d_blk_softlimit);
+	isoft = be64_to_cpu(d->d_ino_softlimit);
+	rsoft = be64_to_cpu(d->d_rtb_softlimit);
+
+	/*
+	 * Warn if the hard limits are larger than the fs.
+	 * Administrators can do this, though in production this seems
+	 * suspect, which is why we flag it for review.
+	 *
+	 * Complain about corruption if the soft limit is greater than
+	 * the hard limit.
+	 */
+	if (bhard > mp->m_sb.sb_dblocks)
+		xfs_scrub_fblock_set_warning(sc, XFS_DATA_FORK, offset);
+	if (bsoft > bhard)
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
+
+	if (ihard > mp->m_maxicount)
+		xfs_scrub_fblock_set_warning(sc, XFS_DATA_FORK, offset);
+	if (isoft > ihard)
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
+
+	if (rhard > mp->m_sb.sb_rblocks)
+		xfs_scrub_fblock_set_warning(sc, XFS_DATA_FORK, offset);
+	if (rsoft > rhard)
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
+
+	/* Check the resource counts. */
+	bcount = be64_to_cpu(d->d_bcount);
+	icount = be64_to_cpu(d->d_icount);
+	rcount = be64_to_cpu(d->d_rtbcount);
+	fs_icount = percpu_counter_sum(&mp->m_icount);
+
+	/*
+	 * Check that usage doesn't exceed physical limits.  However, on
+	 * a reflink filesystem we're allowed to exceed physical space
+	 * if there are no quota limits.
+	 */
+	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+		if (mp->m_sb.sb_dblocks < bcount)
+			xfs_scrub_fblock_set_warning(sc, XFS_DATA_FORK,
+					offset);
+	} else {
+		if (mp->m_sb.sb_dblocks < bcount)
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK,
+					offset);
+	}
+	if (icount > fs_icount || rcount > mp->m_sb.sb_rblocks)
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
+
+	/*
+	 * We can violate the hard limits if the admin suddenly sets a
+	 * lower limit than the actual usage.  However, we flag it for
+	 * admin review.
+	 */
+	if (id != 0 && bhard != 0 && bcount > bhard)
+		xfs_scrub_fblock_set_warning(sc, XFS_DATA_FORK, offset);
+	if (id != 0 && ihard != 0 && icount > ihard)
+		xfs_scrub_fblock_set_warning(sc, XFS_DATA_FORK, offset);
+	if (id != 0 && rhard != 0 && rcount > rhard)
+		xfs_scrub_fblock_set_warning(sc, XFS_DATA_FORK, offset);
+}
+
+/* Scrub all of a quota type's items. */
+int
+xfs_scrub_quota(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_bmbt_irec		irec = { 0 };
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_inode		*ip;
+	struct xfs_quotainfo		*qi = mp->m_quotainfo;
+	struct xfs_dquot		*dq;
+	xfs_fileoff_t			max_dqid_off;
+	xfs_fileoff_t			off = 0;
+	xfs_dqid_t			id = 0;
+	uint				dqtype;
+	int				nimaps;
+	int				error;
+
+	if (!XFS_IS_QUOTA_RUNNING(mp) || !XFS_IS_QUOTA_ON(mp))
+		return -ENOENT;
+
+	mutex_lock(&qi->qi_quotaofflock);
+	dqtype = xfs_scrub_quota_to_dqtype(sc);
+	if (!xfs_this_quota_on(sc->mp, dqtype)) {
+		error = -ENOENT;
+		goto out_unlock_quota;
+	}
+
+	/* Attach to the quota inode and set sc->ip so that reporting works. */
+	ip = xfs_quota_inode(sc->mp, dqtype);
+	sc->ip = ip;
+
+	/* Look for problem extents. */
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	if (ip->i_d.di_flags & XFS_DIFLAG_REALTIME) {
+		xfs_scrub_ino_set_corrupt(sc, sc->ip->i_ino, NULL);
+		goto out_unlock_inode;
+	}
+	max_dqid_off = ((xfs_dqid_t)-1) / qi->qi_dqperchunk;
+	while (1) {
+		if (xfs_scrub_should_terminate(sc, &error))
+			break;
+
+		off = irec.br_startoff + irec.br_blockcount;
+		nimaps = 1;
+		error = xfs_bmapi_read(ip, off, -1, &irec, &nimaps,
+				XFS_BMAPI_ENTIRE);
+		if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, off,
+				&error))
+			goto out_unlock_inode;
+		if (!nimaps)
+			break;
+		if (irec.br_startblock == HOLESTARTBLOCK)
+			continue;
+
+		/* Check the extent record doesn't point to crap. */
+		if (irec.br_startblock + irec.br_blockcount <=
+		    irec.br_startblock)
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK,
+					irec.br_startoff);
+		if (!xfs_verify_fsbno_ptr(mp, irec.br_startblock) ||
+		    !xfs_verify_fsbno_ptr(mp, irec.br_startblock +
+					irec.br_blockcount - 1))
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK,
+					irec.br_startoff);
+
+		/*
+		 * Unwritten extents or blocks mapped above the highest
+		 * quota id shouldn't happen.
+		 */
+		if (isnullstartblock(irec.br_startblock) ||
+		    irec.br_startoff > max_dqid_off ||
+		    irec.br_startoff + irec.br_blockcount > max_dqid_off + 1)
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, off);
+	}
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+		goto out;
+
+	/* Check all the quota items. */
+	while (id < ((xfs_dqid_t)-1ULL)) {
+		if (xfs_scrub_should_terminate(sc, &error))
+			break;
+
+		error = xfs_qm_dqget(mp, NULL, id, dqtype, XFS_QMOPT_DQNEXT,
+				&dq);
+		if (error == -ENOENT)
+			break;
+		if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK,
+				id * qi->qi_dqperchunk, &error))
+			break;
+
+		xfs_scrub_quota_item(sc, dqtype, dq, id);
+
+		id = be32_to_cpu(dq->q_core.d_id) + 1;
+		xfs_qm_dqput(dq);
+		if (!id)
+			break;
+	}
+
+out:
+	/* We set sc->ip earlier, so make sure we clear it now. */
+	sc->ip = NULL;
+out_unlock_quota:
+	mutex_unlock(&qi->qi_quotaofflock);
+	return error;
+
+out_unlock_inode:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	goto out;
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 09fc59d..5815da3 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -247,6 +247,18 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.scrub	= xfs_scrub_rtsummary,
 		.has	= xfs_sb_version_hasrealtime,
 	},
+	{ /* user quota */
+		.setup = xfs_scrub_setup_quota,
+		.scrub = xfs_scrub_quota,
+	},
+	{ /* group quota */
+		.setup = xfs_scrub_setup_quota,
+		.scrub = xfs_scrub_quota,
+	},
+	{ /* project quota */
+		.setup = xfs_scrub_setup_quota,
+		.scrub = xfs_scrub_quota,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 9aff4e2..e9ec041 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -102,5 +102,14 @@ xfs_scrub_rtsummary(struct xfs_scrub_context *sc)
 	return -ENOENT;
 }
 #endif
+#ifdef CONFIG_XFS_QUOTA
+int xfs_scrub_quota(struct xfs_scrub_context *sc);
+#else
+static inline int
+xfs_scrub_quota(struct xfs_scrub_context *sc)
+{
+	return -ENOENT;
+}
+#endif
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
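
A minimal standalone sketch of the extent sanity checks that the mapping loop in xfs_scrub_quota performs, for readers who want to try the arithmetic outside the kernel. The demo_* names are hypothetical and not part of the patch, and the sketch assumes one filesystem block per dquot chunk.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if [start, start + len) wraps past the end of the 64-bit block
 * space, or if len is zero.  Mirrors the "start + len <= start" test.
 */
static inline bool demo_extent_wraps(uint64_t start, uint64_t len)
{
	return start + len <= start;
}

/*
 * Highest quota-file offset that can hold a dquot for a 32-bit id,
 * given the number of dquots per chunk (assumed: one block per chunk).
 */
static inline uint64_t demo_max_dqid_off(uint32_t dqperchunk)
{
	return UINT32_MAX / dqperchunk;
}

/* True if any part of the mapping lies above the highest quota id. */
static inline bool demo_extent_past_max_dqid(uint64_t startoff, uint64_t len,
					     uint64_t max_off)
{
	return startoff > max_off || startoff + len > max_off + 1;
}
```

Taking 30 dquots per chunk as an assumed example value, a one-block mapping at the last valid offset passes the range test, while a two-block mapping starting at that offset is flagged.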


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH 01/30] xfs: return a distinct error code value for IGET_INCORE cache misses
  2017-10-12  1:40 ` [PATCH 01/30] xfs: return a distinct error code value for IGET_INCORE cache misses Darrick J. Wong
@ 2017-10-12  5:25   ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-12  5:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, Brian Foster

On Wed, Oct 11, 2017 at 06:40:49PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> For an XFS_IGET_INCORE iget operation, if the inode isn't in the cache,
> return ENODATA so that we don't confuse it with the pre-existing ENOENT
> cases (inode is in cache, but freed).
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Brian Foster <bfoster@redhat.com>

Reviewed-by: Dave Chinner <dchinner@redhat.com>
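
The distinction drawn in the commit message can be sketched as a toy lookup. This is only an illustration under stated assumptions: demo_iget_incore and the demo_state enum are hypothetical stand-ins, not the real xfs_iget, and how a caller reacts to each code is a guess rather than something the patch specifies.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical states an inode can be in for an in-core-only lookup. */
enum demo_state {
	DEMO_NOT_CACHED,	/* no in-core inode at all */
	DEMO_CACHED_FREED,	/* in cache, but the inode has been freed */
	DEMO_CACHED_LIVE,	/* in cache and usable */
};

/* Toy model of an INCORE lookup returning distinct negative errnos. */
static inline int demo_iget_incore(enum demo_state state)
{
	switch (state) {
	case DEMO_NOT_CACHED:
		return -ENODATA;	/* cache miss: the new, distinct code */
	case DEMO_CACHED_FREED:
		return -ENOENT;		/* pre-existing meaning is preserved */
	default:
		return 0;
	}
}

/* A caller can now tell "not cached" apart from "cached but freed". */
static inline bool demo_is_cache_miss(int error)
{
	return error == -ENODATA;
}
```

With a single -ENOENT for both cases, demo_is_cache_miss could not be written; that is the confusion the patch avoids.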

> ---
>  fs/xfs/xfs_icache.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 3422711..43005fb 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -610,7 +610,7 @@ xfs_iget(
>  	} else {
>  		rcu_read_unlock();
>  		if (flags & XFS_IGET_INCORE) {
> -			error = -ENOENT;
> +			error = -ENODATA;
>  			goto out_error_or_again;
>  		}
>  		XFS_STATS_INC(mp, xs_ig_missed);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 02/30] xfs: create block pointer check functions
  2017-10-12  1:40 ` [PATCH 02/30] xfs: create block pointer check functions Darrick J. Wong
@ 2017-10-12  5:28   ` Dave Chinner
  2017-10-12  5:48     ` Dave Chinner
  0 siblings, 1 reply; 99+ messages in thread
From: Dave Chinner @ 2017-10-12  5:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:40:55PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create some helper functions to check that a block pointer points
> within the filesystem (or AG) and doesn't point at static metadata.
> We will use this for scrub.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
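
For readers following along, here is a rough userspace model of the two AG-level checks in this patch: the last AG may be shorter than the rest, and block numbers at or below the AGFL block are static metadata. The demo_geom struct, the demo_* functions, and the geometry values are all made up for illustration; the real helpers take a struct xfs_mount.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down filesystem geometry. */
struct demo_geom {
	uint64_t	dblocks;	/* total data blocks in the filesystem */
	uint32_t	agblocks;	/* blocks per full-sized AG */
	uint32_t	agcount;	/* number of AGs */
	uint32_t	agfl_block;	/* last block of static AG headers */
};

/* Example geometry: three full AGs and a short fourth AG of 500 blocks. */
static const struct demo_geom demo_fs = {
	.dblocks = 3500, .agblocks = 1000, .agcount = 4, .agfl_block = 3,
};

/* Size of an AG in blocks; only the last AG can be short. */
static inline uint32_t demo_ag_block_count(const struct demo_geom *g,
					   uint32_t agno)
{
	if (agno < g->agcount - 1)
		return g->agblocks;
	return (uint32_t)(g->dblocks - (uint64_t)agno * g->agblocks);
}

/* Reject pointers past the end of the AG or into the static headers. */
static inline bool demo_verify_agbno(const struct demo_geom *g,
				     uint32_t agno, uint32_t agbno)
{
	if (agbno >= demo_ag_block_count(g, agno))
		return false;
	if (agbno <= g->agfl_block)
		return false;
	return true;
}
```

In this made-up geometry, block 499 of the last AG is acceptable but block 500 is not, even though block 500 would be fine in any of the full-sized AGs.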

> ---
>  fs/xfs/libxfs/xfs_alloc.c    |   49 ++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_alloc.h    |    4 +++
>  fs/xfs/libxfs/xfs_rtbitmap.c |   12 ++++++++++
>  fs/xfs/xfs_rtalloc.h         |    2 ++
>  4 files changed, 67 insertions(+)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 744dcae..bd3a943 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -2923,3 +2923,52 @@ xfs_alloc_query_all(
>  	query.fn = fn;
>  	return xfs_btree_query_all(cur, xfs_alloc_query_range_helper, &query);
>  }
> +
> +/* Find the size of the AG, in blocks. */
> +xfs_agblock_t
> +xfs_ag_block_count(
> +	struct xfs_mount	*mp,
> +	xfs_agnumber_t		agno)
> +{
> +	ASSERT(agno < mp->m_sb.sb_agcount);
> +
> +	if (agno < mp->m_sb.sb_agcount - 1)
> +		return mp->m_sb.sb_agblocks;
> +	return mp->m_sb.sb_dblocks - (agno * mp->m_sb.sb_agblocks);
> +}
> +
> +/*
> + * Verify that an AG block number pointer neither points outside the AG
> + * nor points at static metadata.
> + */
> +bool
> +xfs_verify_agbno_ptr(
> +	struct xfs_mount	*mp,
> +	xfs_agnumber_t		agno,
> +	xfs_agblock_t		agbno)
> +{
> +	xfs_agblock_t		eoag;
> +
> +	eoag = xfs_ag_block_count(mp, agno);
> +	if (agbno >= eoag)
> +		return false;
> +	if (agbno <= XFS_AGFL_BLOCK(mp))
> +		return false;
> +	return true;
> +}
> +
> +/*
> + * Verify that an FS block number pointer neither points outside the
> + * filesystem nor points at static AG metadata.
> + */
> +bool
> +xfs_verify_fsbno_ptr(
> +	struct xfs_mount	*mp,
> +	xfs_fsblock_t		fsbno)
> +{
> +	xfs_agnumber_t		agno = XFS_FSB_TO_AGNO(mp, fsbno);
> +
> +	if (agno >= mp->m_sb.sb_agcount)
> +		return false;
> +	return xfs_verify_agbno_ptr(mp, agno, XFS_FSB_TO_AGBNO(mp, fsbno));
> +}
> diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> index ef26edc..3185807 100644
> --- a/fs/xfs/libxfs/xfs_alloc.h
> +++ b/fs/xfs/libxfs/xfs_alloc.h
> @@ -232,5 +232,9 @@ int xfs_alloc_query_range(struct xfs_btree_cur *cur,
>  		xfs_alloc_query_range_fn fn, void *priv);
>  int xfs_alloc_query_all(struct xfs_btree_cur *cur, xfs_alloc_query_range_fn fn,
>  		void *priv);
> +xfs_agblock_t xfs_ag_block_count(struct xfs_mount *mp, xfs_agnumber_t agno);
> +bool xfs_verify_agbno_ptr(struct xfs_mount *mp, xfs_agnumber_t agno,
> +		xfs_agblock_t agbno);
> +bool xfs_verify_fsbno_ptr(struct xfs_mount *mp, xfs_fsblock_t fsbno);
>  
>  #endif	/* __XFS_ALLOC_H__ */
> diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
> index 5d4e43e..0a49348 100644
> --- a/fs/xfs/libxfs/xfs_rtbitmap.c
> +++ b/fs/xfs/libxfs/xfs_rtbitmap.c
> @@ -1086,3 +1086,15 @@ xfs_rtalloc_query_all(
>  
>  	return xfs_rtalloc_query_range(tp, &keys[0], &keys[1], fn, priv);
>  }
> +
> +/*
> + * Verify that a realtime block number pointer doesn't point off the
> + * end of the realtime device.
> + */
> +bool
> +xfs_verify_rtbno_ptr(
> +	struct xfs_mount	*mp,
> +	xfs_rtblock_t		rtbno)
> +{
> +	return rtbno < mp->m_sb.sb_rblocks;
> +}
> diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h
> index 79defa7..11b8554 100644
> --- a/fs/xfs/xfs_rtalloc.h
> +++ b/fs/xfs/xfs_rtalloc.h
> @@ -138,6 +138,7 @@ int xfs_rtalloc_query_range(struct xfs_trans *tp,
>  int xfs_rtalloc_query_all(struct xfs_trans *tp,
>  			  xfs_rtalloc_query_range_fn fn,
>  			  void *priv);
> +bool xfs_verify_rtbno_ptr(struct xfs_mount *mp, xfs_rtblock_t rtbno);
>  #else
>  # define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb)    (ENOSYS)
>  # define xfs_rtfree_extent(t,b,l)                       (ENOSYS)
> @@ -146,6 +147,7 @@ int xfs_rtalloc_query_all(struct xfs_trans *tp,
>  # define xfs_rtalloc_query_range(t,l,h,f,p)             (ENOSYS)
>  # define xfs_rtalloc_query_all(t,f,p)                   (ENOSYS)
>  # define xfs_rtbuf_get(m,t,b,i,p)                       (ENOSYS)
> +# define xfs_verify_rtbno_ptr(m, r)			(false)
>  static inline int		/* error */
>  xfs_rtmount_init(
>  	xfs_mount_t	*mp)	/* file system mount structure */
> 

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 02/30] xfs: create block pointer check functions
  2017-10-12  5:28   ` Dave Chinner
@ 2017-10-12  5:48     ` Dave Chinner
  2017-10-16 19:46       ` Darrick J. Wong
  0 siblings, 1 reply; 99+ messages in thread
From: Dave Chinner @ 2017-10-12  5:48 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Oct 12, 2017 at 04:28:52PM +1100, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 06:40:55PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create some helper functions to check that a block pointer points
> > within the filesystem (or AG) and doesn't point at static metadata.
> > We will use this for scrub.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Looks fine.

Now that I've thought about it and seen a bit more code....

> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c    |   49 ++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/libxfs/xfs_alloc.h    |    4 +++
> >  fs/xfs/libxfs/xfs_rtbitmap.c |   12 ++++++++++
> >  fs/xfs/xfs_rtalloc.h         |    2 ++
> >  4 files changed, 67 insertions(+)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index 744dcae..bd3a943 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -2923,3 +2923,52 @@ xfs_alloc_query_all(
> >  	query.fn = fn;
> >  	return xfs_btree_query_all(cur, xfs_alloc_query_range_helper, &query);
> >  }
> > +
> > +/* Find the size of the AG, in blocks. */
> > +xfs_agblock_t
> > +xfs_ag_block_count(
> > +	struct xfs_mount	*mp,
> > +	xfs_agnumber_t		agno)
> > +{
> > +	ASSERT(agno < mp->m_sb.sb_agcount);
> > +
> > +	if (agno < mp->m_sb.sb_agcount - 1)
> > +		return mp->m_sb.sb_agblocks;
> > +	return mp->m_sb.sb_dblocks - (agno * mp->m_sb.sb_agblocks);
> > +}
> > +
> > +/*
> > + * Verify that an AG block number pointer neither points outside the AG
> > + * nor points at static metadata.
> > + */
> > +bool
> > +xfs_verify_agbno_ptr(

You can probably drop the "_ptr" suffix from these because I don't
think we ever try to check/validate the agbno/fsbno of the static
metadata....

Some of the code just reads a bit weird with the "_ptr" suffix
in it...

Still consider it reviewed....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 03/30] xfs: refactor btree pointer checks
  2017-10-12  1:41 ` [PATCH 03/30] xfs: refactor btree pointer checks Darrick J. Wong
@ 2017-10-12  5:51   ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-12  5:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:41:01PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Refactor the btree pointer checks so that we can call them from the
> scrub code without logging errors to dmesg.  Preserve the existing error
> reporting for regular operations.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c  |    4 +--
>  fs/xfs/libxfs/xfs_btree.c |   70 +++++++++++++++++++++------------------------
>  fs/xfs/libxfs/xfs_btree.h |   13 +++++++-
>  3 files changed, 45 insertions(+), 42 deletions(-)

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 05/30] xfs: create inode pointer verifiers
  2017-10-12  1:41 ` [PATCH 05/30] xfs: create inode pointer verifiers Darrick J. Wong
@ 2017-10-12 20:23   ` Darrick J. Wong
  2017-10-13  5:22     ` Dave Chinner
  2017-10-16 19:49   ` [PATCH v2 " Darrick J. Wong
  1 sibling, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12 20:23 UTC (permalink / raw)
  To: linux-xfs

On Wed, Oct 11, 2017 at 06:41:15PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create some helper functions to check that inode pointers point to
> somewhere within the filesystem and not at the static AG metadata.
> Move xfs_internal_inum and create a directory inode check function.
> We will use these functions in scrub and elsewhere.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_dir2.c   |   19 ++--------
>  fs/xfs/libxfs/xfs_ialloc.c |   81 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_ialloc.h |    7 ++++
>  fs/xfs/xfs_itable.c        |   10 -----
>  fs/xfs/xfs_itable.h        |    2 -
>  5 files changed, 91 insertions(+), 28 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
> index ccf9783..ee4d2a3 100644
> --- a/fs/xfs/libxfs/xfs_dir2.c
> +++ b/fs/xfs/libxfs/xfs_dir2.c
> @@ -30,6 +30,7 @@
>  #include "xfs_bmap.h"
>  #include "xfs_dir2.h"
>  #include "xfs_dir2_priv.h"
> +#include "xfs_ialloc.h"
>  #include "xfs_error.h"
>  #include "xfs_trace.h"
>  
> @@ -202,22 +203,8 @@ xfs_dir_ino_validate(
>  	xfs_mount_t	*mp,
>  	xfs_ino_t	ino)
>  {
> -	xfs_agblock_t	agblkno;
> -	xfs_agino_t	agino;
> -	xfs_agnumber_t	agno;
> -	int		ino_ok;
> -	int		ioff;
> -
> -	agno = XFS_INO_TO_AGNO(mp, ino);
> -	agblkno = XFS_INO_TO_AGBNO(mp, ino);
> -	ioff = XFS_INO_TO_OFFSET(mp, ino);
> -	agino = XFS_OFFBNO_TO_AGINO(mp, agblkno, ioff);
> -	ino_ok =
> -		agno < mp->m_sb.sb_agcount &&
> -		agblkno < mp->m_sb.sb_agblocks &&
> -		agblkno != 0 &&
> -		ioff < (1 << mp->m_sb.sb_inopblog) &&
> -		XFS_AGINO_TO_INO(mp, agno, agino) == ino;
> +	bool		ino_ok = xfs_verify_dir_ino_ptr(mp, ino);
> +
>  	if (unlikely(XFS_TEST_ERROR(!ino_ok, mp, XFS_ERRTAG_DIR_INO_VALIDATE))) {
>  		xfs_warn(mp, "Invalid inode number 0x%Lx",
>  				(unsigned long long) ino);
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index 988bb3f..da3652b 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -2664,3 +2664,84 @@ xfs_ialloc_pagi_init(
>  		xfs_trans_brelse(tp, bp);
>  	return 0;
>  }
> +
> +/* Calculate the first and last possible inode number in an AG. */
> +void
> +xfs_ialloc_aginode_range(
> +	struct xfs_mount	*mp,
> +	xfs_agnumber_t		agno,
> +	xfs_agino_t		*first,
> +	xfs_agino_t		*last)
> +{
> +	xfs_agblock_t		eoag;
> +
> +	eoag = xfs_ag_block_count(mp, agno);
> +	*first = round_up(XFS_OFFBNO_TO_AGINO(mp, XFS_AGFL_BLOCK(mp) + 1, 0),
> +			XFS_INODES_PER_CHUNK);
> +	*last = round_down(XFS_OFFBNO_TO_AGINO(mp, eoag, 0),
> +			XFS_INODES_PER_CHUNK) - 1;

This is incorrect; we allocate inode chunks aligned to
xfs_ialloc_cluster_alignment blocks, which doesn't necessarily result in
ir_startino being aligned to XFS_INODES_PER_CHUNK.

I think the correct code is this:

	/* Calculate the first inode. */
	bno = round_up(XFS_AGFL_BLOCK(mp) + 1,
			xfs_ialloc_cluster_alignment(mp));
	*first = XFS_OFFBNO_TO_AGINO(mp, bno, 0);

	/* Calculate the last inode. */
	bno = round_down(eoag, xfs_ialloc_cluster_alignment(mp));
	*last = XFS_OFFBNO_TO_AGINO(mp, bno, 0) - 1;

...which unfortunately I didn't realize until trying to play with
nondefault geometry options (1k blocks, no sparse inodes).

--D
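
To make the difference concrete, here is a small standalone model of the two ways of computing the inode range. It is a sketch only: the demo_* helpers are hypothetical, the shift stands in for XFS_OFFBNO_TO_AGINO at offset 0, and the geometry used in the note below (2 inodes per block, cluster alignment of 8 blocks) is an assumed example rather than a value taken from a real filesystem.

```c
#include <stdint.h>

#define DEMO_INODES_PER_CHUNK	64u

/* Round x up or down to a multiple of align (align must be nonzero). */
static inline uint32_t demo_round_up(uint32_t x, uint32_t align)
{
	return ((x + align - 1) / align) * align;
}

static inline uint32_t demo_round_down(uint32_t x, uint32_t align)
{
	return (x / align) * align;
}

/* AG inode number of the first inode in block bno. */
static inline uint32_t demo_bno_to_agino(uint32_t bno, unsigned int inopblog)
{
	return bno << inopblog;
}

/* As posted: round the inode number itself to a full chunk. */
static inline uint32_t demo_first_by_chunk(uint32_t first_bno,
					   unsigned int inopblog)
{
	return demo_round_up(demo_bno_to_agino(first_bno, inopblog),
			     DEMO_INODES_PER_CHUNK);
}

static inline uint32_t demo_last_by_chunk(uint32_t eoag, unsigned int inopblog)
{
	return demo_round_down(demo_bno_to_agino(eoag, inopblog),
			       DEMO_INODES_PER_CHUNK) - 1;
}

/* As proposed: round the block number to the cluster alignment first. */
static inline uint32_t demo_first_by_cluster(uint32_t first_bno,
					     uint32_t align,
					     unsigned int inopblog)
{
	return demo_bno_to_agino(demo_round_up(first_bno, align), inopblog);
}

static inline uint32_t demo_last_by_cluster(uint32_t eoag, uint32_t align,
					    unsigned int inopblog)
{
	return demo_bno_to_agino(demo_round_down(eoag, align), inopblog) - 1;
}
```

With 2 inodes per block (inopblog = 1), an alignment of 8 blocks, and the first usable block at 5, the chunk-rounded version puts the first inode at 64 while the cluster-rounded version puts it at 16, so a chunk legitimately allocated at agino 16 would be rejected by the range as posted.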

> +}
> +
> +/*
> + * Verify that an AG inode number pointer neither points outside the AG
> + * nor points at static metadata.
> + */
> +bool
> +xfs_verify_agino_ptr(
> +	struct xfs_mount	*mp,
> +	xfs_agnumber_t		agno,
> +	xfs_agino_t		agino)
> +{
> +	xfs_agino_t		first;
> +	xfs_agino_t		last;
> +	int			ioff;
> +
> +	ioff = XFS_AGINO_TO_OFFSET(mp, agino);
> +	xfs_ialloc_aginode_range(mp, agno, &first, &last);
> +	return agino >= first && agino <= last &&
> +	       ioff < (1 << mp->m_sb.sb_inopblog);
> +}
> +
> +/*
> + * Verify that an FS inode number pointer neither points outside the
> + * filesystem nor points at static AG metadata.
> + */
> +bool
> +xfs_verify_ino_ptr(
> +	struct xfs_mount	*mp,
> +	xfs_ino_t		ino)
> +{
> +	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, ino);
> +	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, ino);
> +
> +	if (agno >= mp->m_sb.sb_agcount)
> +		return false;
> +	if (XFS_AGINO_TO_INO(mp, agno, agino) != ino)
> +		return false;
> +	return xfs_verify_agino_ptr(mp, agno, agino);
> +}
> +
> +/* Is this an internal inode number? */
> +bool
> +xfs_internal_inum(
> +	struct xfs_mount	*mp,
> +	xfs_ino_t		ino)
> +{
> +	return ino == mp->m_sb.sb_rbmino || ino == mp->m_sb.sb_rsumino ||
> +		(xfs_sb_version_hasquota(&mp->m_sb) &&
> +		 xfs_is_quota_inode(&mp->m_sb, ino));
> +}
> +
> +/*
> + * Verify that a directory entry's inode number doesn't point at an internal
> + * inode, empty space, or static AG metadata.
> + */
> +bool
> +xfs_verify_dir_ino_ptr(
> +	struct xfs_mount	*mp,
> +	xfs_ino_t		ino)
> +{
> +	if (xfs_internal_inum(mp, ino))
> +		return false;
> +	return xfs_verify_ino_ptr(mp, ino);
> +}
> diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
> index b32cfb5..904d69a 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.h
> +++ b/fs/xfs/libxfs/xfs_ialloc.h
> @@ -173,5 +173,12 @@ void xfs_inobt_btrec_to_irec(struct xfs_mount *mp, union xfs_btree_rec *rec,
>  		struct xfs_inobt_rec_incore *irec);
>  
>  int xfs_ialloc_cluster_alignment(struct xfs_mount *mp);
> +void xfs_ialloc_aginode_range(struct xfs_mount *mp, xfs_agnumber_t agno,
> +		xfs_agino_t *first, xfs_agino_t *last);
> +bool xfs_verify_agino_ptr(struct xfs_mount *mp, xfs_agnumber_t agno,
> +		xfs_agino_t agino);
> +bool xfs_verify_ino_ptr(struct xfs_mount *mp, xfs_ino_t ino);
> +bool xfs_internal_inum(struct xfs_mount *mp, xfs_ino_t ino);
> +bool xfs_verify_dir_ino_ptr(struct xfs_mount *mp, xfs_ino_t ino);
>  
>  #endif	/* __XFS_IALLOC_H__ */
> diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> index c393a2f..0172d0b 100644
> --- a/fs/xfs/xfs_itable.c
> +++ b/fs/xfs/xfs_itable.c
> @@ -31,16 +31,6 @@
>  #include "xfs_trace.h"
>  #include "xfs_icache.h"
>  
> -int
> -xfs_internal_inum(
> -	xfs_mount_t	*mp,
> -	xfs_ino_t	ino)
> -{
> -	return (ino == mp->m_sb.sb_rbmino || ino == mp->m_sb.sb_rsumino ||
> -		(xfs_sb_version_hasquota(&mp->m_sb) &&
> -		 xfs_is_quota_inode(&mp->m_sb, ino)));
> -}
> -
>  /*
>   * Return stat information for one inode.
>   * Return 0 if ok, else errno.
> diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
> index 17e86e0..6ea8b39 100644
> --- a/fs/xfs/xfs_itable.h
> +++ b/fs/xfs/xfs_itable.h
> @@ -96,6 +96,4 @@ xfs_inumbers(
>  	void			__user *buffer, /* buffer with inode info */
>  	inumbers_fmt_pf		formatter);
>  
> -int xfs_internal_inum(struct xfs_mount *mp, xfs_ino_t ino);
> -
>  #endif	/* __XFS_ITABLE_H__ */
> 

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 21/30] xfs: scrub inodes
  2017-10-12  1:43 ` [PATCH 21/30] xfs: scrub inodes Darrick J. Wong
@ 2017-10-12 22:32   ` Darrick J. Wong
  2017-10-16  3:16     ` Dave Chinner
  2017-10-17  0:13   ` [PATCH v2 " Darrick J. Wong
  1 sibling, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-12 22:32 UTC (permalink / raw)
  To: linux-xfs

On Wed, Oct 11, 2017 at 06:43:00PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Scrub the fields within an inode.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile        |    1 
>  fs/xfs/libxfs/xfs_fs.h |    3 
>  fs/xfs/scrub/common.c  |   54 ++++
>  fs/xfs/scrub/common.h  |    3 
>  fs/xfs/scrub/inode.c   |  607 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/scrub.c   |   18 +
>  fs/xfs/scrub/scrub.h   |    2 
>  7 files changed, 685 insertions(+), 3 deletions(-)
>  create mode 100644 fs/xfs/scrub/inode.c
> 
> 
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index a7c5752..28e14b7 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -151,6 +151,7 @@ xfs-y				+= $(addprefix scrub/, \
>  				   btree.o \
>  				   common.o \
>  				   ialloc.o \
> +				   inode.o \
>  				   refcount.o \
>  				   rmap.o \
>  				   scrub.o \
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index b3f992c..f8463e0 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -494,9 +494,10 @@ struct xfs_scrub_metadata {
>  #define XFS_SCRUB_TYPE_FINOBT	8	/* free inode btree */
>  #define XFS_SCRUB_TYPE_RMAPBT	9	/* reverse mapping btree */
>  #define XFS_SCRUB_TYPE_REFCNTBT	10	/* reference count btree */
> +#define XFS_SCRUB_TYPE_INODE	11	/* inode record */
>  
>  /* Number of scrub subcommands. */
> -#define XFS_SCRUB_TYPE_NR	11
> +#define XFS_SCRUB_TYPE_NR	12
>  
>  /* i: Repair this metadata. */
>  #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 39165c3..415c6a9 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -30,6 +30,8 @@
>  #include "xfs_trans.h"
>  #include "xfs_sb.h"
>  #include "xfs_inode.h"
> +#include "xfs_icache.h"
> +#include "xfs_itable.h"
>  #include "xfs_alloc.h"
>  #include "xfs_alloc_btree.h"
>  #include "xfs_bmap.h"
> @@ -488,3 +490,55 @@ xfs_scrub_checkpoint_log(
>  	xfs_ail_push_all_sync(mp->m_ail);
>  	return 0;
>  }
> +
> +/*
> + * Given an inode and the scrub control structure, grab either the
> + * inode referenced in the control structure or the inode passed in.
> + * The inode is not locked.
> + */
> +int
> +xfs_scrub_get_inode(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip_in)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_inode		*ip = NULL;
> +	int				error;
> +
> +	/*
> +	 * If userspace passed us an AG number or a generation number
> +	 * without an inode number, they haven't got a clue so bail out
> +	 * immediately.
> +	 */
> +	if (sc->sm->sm_agno || (sc->sm->sm_gen && !sc->sm->sm_ino))
> +		return -EINVAL;
> +
> +	/* We want to scan the inode we already had opened. */
> +	if (sc->sm->sm_ino == 0 || sc->sm->sm_ino == ip_in->i_ino) {
> +		sc->ip = ip_in;
> +		return 0;
> +	}
> +
> +	/* Look up the inode, see if the generation number matches. */
> +	if (xfs_internal_inum(mp, sc->sm->sm_ino))
> +		return -ENOENT;
> +	error = xfs_iget(mp, NULL, sc->sm->sm_ino,
> +			XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE, 0, &ip);
> +	if (error == -ENOENT || error == -EINVAL) {
> +		/* inode doesn't exist... */
> +		return -ENOENT;
> +	} else if (error) {
> +		trace_xfs_scrub_op_error(sc,
> +				XFS_INO_TO_AGNO(mp, sc->sm->sm_ino),
> +				XFS_INO_TO_AGBNO(mp, sc->sm->sm_ino),
> +				error, __return_address);
> +		return error;
> +	}
> +	if (VFS_I(ip)->i_generation != sc->sm->sm_gen) {
> +		iput(VFS_I(ip));
> +		return -ENOENT;
> +	}
> +
> +	sc->ip = ip;
> +	return 0;
> +}
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index 610e956..fcec11e 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -87,6 +87,8 @@ int xfs_scrub_setup_ag_rmapbt(struct xfs_scrub_context *sc,
>  			      struct xfs_inode *ip);
>  int xfs_scrub_setup_ag_refcountbt(struct xfs_scrub_context *sc,
>  				  struct xfs_inode *ip);
> +int xfs_scrub_setup_inode(struct xfs_scrub_context *sc,
> +			  struct xfs_inode *ip);
>  
>  
>  void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
> @@ -105,5 +107,6 @@ int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
>  
>  int xfs_scrub_setup_ag_btree(struct xfs_scrub_context *sc,
>  			     struct xfs_inode *ip, bool force_log);
> +int xfs_scrub_get_inode(struct xfs_scrub_context *sc, struct xfs_inode *ip_in);
>  
>  #endif	/* __XFS_SCRUB_COMMON_H__ */
> diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
> new file mode 100644
> index 0000000..aa1c549
> --- /dev/null
> +++ b/fs/xfs/scrub/inode.c
> @@ -0,0 +1,607 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_sb.h"
> +#include "xfs_inode.h"
> +#include "xfs_icache.h"
> +#include "xfs_inode_buf.h"
> +#include "xfs_inode_fork.h"
> +#include "xfs_ialloc.h"
> +#include "xfs_da_format.h"
> +#include "xfs_reflink.h"
> +#include "scrub/xfs_scrub.h"
> +#include "scrub/scrub.h"
> +#include "scrub/common.h"
> +#include "scrub/trace.h"
> +
> +/*
> + * Grab total control of the inode metadata.  It doesn't matter here if
> + * the file data is still changing; exclusive access to the metadata is
> + * the goal.
> + */
> +int
> +xfs_scrub_setup_inode(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	int				error;
> +
> +	/*
> +	 * Try to get the inode.  If the verifiers fail, we try again
> +	 * in raw mode.
> +	 */
> +	error = xfs_scrub_get_inode(sc, ip);
> +	switch (error) {
> +	case 0:
> +		break;
> +	case -EFSCORRUPTED:
> +	case -EFSBADCRC:
> +		return 0;
> +	default:
> +		return error;
> +	}
> +
> +	/* Got the inode, lock it and we're ready to go. */
> +	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
> +	xfs_ilock(sc->ip, sc->ilock_flags);
> +	error = xfs_scrub_trans_alloc(sc->sm, mp, &sc->tp);
> +	if (error)
> +		goto out;
> +	sc->ilock_flags |= XFS_ILOCK_EXCL;
> +	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
> +
> +out:
> +	/* scrub teardown will unlock and release the inode for us */
> +	return error;
> +}
> +
> +/* Inode core */
> +
> +/*
> + * di_extsize hint validation is somewhat cumbersome. Rules are:
> + *
> + * 1. extent size hint is only valid for directories and regular files
> + * 2. DIFLAG_EXTSIZE is only valid for regular files
> + * 3. DIFLAG_EXTSZINHERIT is only valid for directories.
> + * 4. extsize hint of 0 turns off hints, clears inode flags.
> + * 5. either flag must be set if extsize != 0
> + * 6. Extent size must be a multiple of the appropriate block size.
> + * 7. extent size hint cannot be longer than maximum extent length
> + * 8. for non-realtime files, the extent size hint must be limited
> + *    to half the AG size to avoid alignment extending the extent
> + *    beyond the limits of the AG.
> + */
> +STATIC void
> +xfs_scrub_inode_extsize(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_buf			*bp,
> +	struct xfs_dinode		*dip,
> +	xfs_ino_t			ino,
> +	uint16_t			mode,
> +	uint16_t			flags)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	bool				rt_flag;
> +	bool				hint_flag;
> +	bool				inherit_flag;
> +	uint32_t			extsize;
> +	uint32_t			extsize_bytes;
> +	uint32_t			blocksize_bytes;
> +
> +	rt_flag = (flags & XFS_DIFLAG_REALTIME);
> +	hint_flag = (flags & XFS_DIFLAG_EXTSIZE);
> +	inherit_flag = (flags & XFS_DIFLAG_EXTSZINHERIT);
> +	extsize = be32_to_cpu(dip->di_extsize);
> +	extsize_bytes = XFS_FSB_TO_B(sc->mp, extsize);
> +
> +	if (rt_flag)
> +		blocksize_bytes = mp->m_sb.sb_rextsize << mp->m_sb.sb_blocklog;
> +	else
> +		blocksize_bytes = mp->m_sb.sb_blocksize;
> +
> +	if ((hint_flag || inherit_flag) && (!S_ISDIR(mode) && !S_ISREG(mode)))
> +		goto bad;
> +
> +	if (hint_flag && !S_ISREG(mode))
> +		goto bad;
> +
> +	if (inherit_flag && !S_ISDIR(mode))
> +		goto bad;
> +
> +	if ((hint_flag || inherit_flag) && extsize == 0)
> +		goto bad;
> +
> +	if (!(hint_flag || inherit_flag) && extsize != 0)
> +		goto bad;
> +
> +	if (extsize_bytes % blocksize_bytes)
> +		goto bad;
> +
> +	if (extsize > MAXEXTLEN)
> +		goto bad;
> +
> +	if (!rt_flag && extsize > mp->m_sb.sb_agblocks / 2)
> +		goto bad;
> +
> +	return;
> +bad:
> +	xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +}
> +
> +/*
> + * di_cowextsize hint validation is somewhat cumbersome. Rules are:
> + *
> + * 1. flag requires reflink feature
> + * 2. cow extent size hint is only valid for directories and regular files
> + * 3. cow extsize hint of 0 turns off hints, clears inode flags.
> + * 4. either flag must be set if cow extsize != 0
> + * 5. flag cannot be set for rt files
> + * 6. Extent size must be a multiple of the appropriate block size.
> + * 7. extent size hint cannot be longer than maximum extent length
> + * 8. the extent size hint must be limited
> + *    to half the AG size to avoid alignment extending the extent
> + *    beyond the limits of the AG.
> + */
> +STATIC void
> +xfs_scrub_inode_cowextsize(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_buf			*bp,
> +	struct xfs_dinode		*dip,
> +	xfs_ino_t			ino,
> +	uint16_t			mode,
> +	uint16_t			flags,
> +	uint64_t			flags2)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	bool				rt_flag;
> +	bool				hint_flag;
> +	uint32_t			extsize;
> +	uint32_t			extsize_bytes;
> +
> +	rt_flag = (flags & XFS_DIFLAG_REALTIME);
> +	hint_flag = (flags2 & XFS_DIFLAG2_COWEXTSIZE);
> +	extsize = be32_to_cpu(dip->di_extsize);

Doh, this ought to be extsize = be32_to_cpu(dip->di_cowextsize); will fix.

--D

> +	extsize_bytes = XFS_FSB_TO_B(sc->mp, extsize);
> +
> +	if (hint_flag && !xfs_sb_version_hasreflink(&mp->m_sb))
> +		goto bad;
> +
> +	if (hint_flag && (!S_ISDIR(mode) && !S_ISREG(mode)))
> +		goto bad;
> +
> +	if (hint_flag && extsize == 0)
> +		goto bad;
> +
> +	if (!hint_flag && extsize != 0)
> +		goto bad;
> +
> +	if (hint_flag && rt_flag)
> +		goto bad;
> +
> +	if (extsize_bytes % mp->m_sb.sb_blocksize)
> +		goto bad;
> +
> +	if (extsize > MAXEXTLEN)
> +		goto bad;
> +
> +	if (extsize > mp->m_sb.sb_agblocks / 2)
> +		goto bad;
> +
> +	return;
> +bad:
> +	xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +}
> +
> +/* Make sure the di_flags make sense for the inode. */
> +STATIC void
> +xfs_scrub_inode_flags(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_buf			*bp,
> +	struct xfs_dinode		*dip,
> +	xfs_ino_t			ino,
> +	uint16_t			mode,
> +	uint16_t			flags)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +
> +	if (flags & ~XFS_DIFLAG_ANY)
> +		goto bad;
> +
> +	/* rt flags require rt device */
> +	if ((flags & (XFS_DIFLAG_REALTIME | XFS_DIFLAG_RTINHERIT)) &&
> +	    !mp->m_rtdev_targp)
> +		goto bad;
> +
> +	/* new rt bitmap flag only valid for rbmino */
> +	if ((flags & XFS_DIFLAG_NEWRTBM) && ino != mp->m_sb.sb_rbmino)
> +		goto bad;
> +
> +	/* directory-only flags */
> +	if ((flags & (XFS_DIFLAG_RTINHERIT |
> +		     XFS_DIFLAG_EXTSZINHERIT |
> +		     XFS_DIFLAG_PROJINHERIT |
> +		     XFS_DIFLAG_NOSYMLINKS)) &&
> +	    !S_ISDIR(mode))
> +		goto bad;
> +
> +	/* file-only flags */
> +	if ((flags & (XFS_DIFLAG_REALTIME | FS_XFLAG_EXTSIZE)) &&
> +	    !S_ISREG(mode))
> +		goto bad;
> +
> +	/* filestreams and rt make no sense */
> +	if ((flags & XFS_DIFLAG_FILESTREAM) && (flags & XFS_DIFLAG_REALTIME))
> +		goto bad;
> +
> +	return;
> +bad:
> +	xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +}
> +
> +/* Make sure the di_flags2 make sense for the inode. */
> +STATIC void
> +xfs_scrub_inode_flags2(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_buf			*bp,
> +	struct xfs_dinode		*dip,
> +	xfs_ino_t			ino,
> +	uint16_t			mode,
> +	uint64_t			flags2)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +
> +	if (flags2 & ~XFS_DIFLAG2_ANY)
> +		goto bad;
> +
> +	/* reflink flag requires reflink feature */
> +	if ((flags2 & XFS_DIFLAG2_REFLINK) &&
> +	    !xfs_sb_version_hasreflink(&mp->m_sb))
> +		goto bad;
> +
> +	/* cowextsize flag is checked w.r.t. mode separately */
> +
> +	/* file-only flags */
> +	if ((flags2 & (XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK)) &&
> +	    !S_ISREG(mode))
> +		goto bad;
> +
> +	/* dax and reflink make no sense, currently */
> +	if ((flags2 & XFS_DIFLAG2_DAX) && (flags2 & XFS_DIFLAG2_REFLINK))
> +		goto bad;
> +
> +	return;
> +bad:
> +	xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +}
> +
> +/* Scrub all the ondisk inode fields. */
> +STATIC void
> +xfs_scrub_dinode(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_buf			*bp,
> +	struct xfs_dinode		*dip,
> +	xfs_ino_t			ino)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	size_t				fork_recs;
> +	unsigned long long		isize;
> +	uint64_t			flags2;
> +	uint32_t			nextents;
> +	uint16_t			flags;
> +	uint16_t			mode;
> +
> +	flags = be16_to_cpu(dip->di_flags);
> +	if (dip->di_version >= 3)
> +		flags2 = be64_to_cpu(dip->di_flags2);
> +	else
> +		flags2 = 0;
> +
> +	/* di_mode */
> +	mode = be16_to_cpu(dip->di_mode);
> +	if (mode & ~(S_IALLUGO | S_IFMT))
> +		xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +
> +	/* v1/v2 fields */
> +	switch (dip->di_version) {
> +	case 1:
> +		/*
> +		 * We autoconvert v1 inodes into v2 inodes on writeout,
> +		 * so just mark this inode for preening.
> +		 */
> +		xfs_scrub_ino_set_preen(sc, bp);
> +		break;
> +	case 2:
> +	case 3:
> +		if (dip->di_onlink != 0)
> +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +
> +		if (dip->di_mode == 0 && sc->ip)
> +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +
> +		if (dip->di_projid_hi != 0 &&
> +		    !xfs_sb_version_hasprojid32bit(&mp->m_sb))
> +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +		break;
> +	default:
> +		xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +		return;
> +	}
> +
> +	/*
> +	 * di_uid/di_gid -- -1 isn't invalid, but there's no way that
> +	 * userspace could have created that.
> +	 */
> +	if (dip->di_uid == cpu_to_be32(-1U) ||
> +	    dip->di_gid == cpu_to_be32(-1U))
> +		xfs_scrub_ino_set_warning(sc, bp);
> +
> +	/* di_format */
> +	switch (dip->di_format) {
> +	case XFS_DINODE_FMT_DEV:
> +		if (!S_ISCHR(mode) && !S_ISBLK(mode) &&
> +		    !S_ISFIFO(mode) && !S_ISSOCK(mode))
> +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +		break;
> +	case XFS_DINODE_FMT_LOCAL:
> +		if (!S_ISDIR(mode) && !S_ISLNK(mode))
> +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +		break;
> +	case XFS_DINODE_FMT_EXTENTS:
> +		if (!S_ISREG(mode) && !S_ISDIR(mode) && !S_ISLNK(mode))
> +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +		break;
> +	case XFS_DINODE_FMT_BTREE:
> +		if (!S_ISREG(mode) && !S_ISDIR(mode))
> +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +		break;
> +	case XFS_DINODE_FMT_UUID:
> +	default:
> +		xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +		break;
> +	}
> +
> +	/*
> +	 * di_size.  xfs_dinode_verify checks for things that screw up
> +	 * the VFS such as the upper bit being set and zero-length
> +	 * symlinks/directories, but we can do more here.
> +	 */
> +	isize = be64_to_cpu(dip->di_size);
> +	if (isize & (1ULL << 63))
> +		xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +
> +	/* Devices, fifos, and sockets must have zero size */
> +	if (!S_ISDIR(mode) && !S_ISREG(mode) && !S_ISLNK(mode) && isize != 0)
> +		xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +
> +	/* Directories can't be larger than the data section size (32G) */
> +	if (S_ISDIR(mode) && (isize == 0 || isize >= XFS_DIR2_SPACE_SIZE))
> +		xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +
> +	/* Symlinks can't be larger than SYMLINK_MAXLEN */
> +	if (S_ISLNK(mode) && (isize == 0 || isize >= XFS_SYMLINK_MAXLEN))
> +		xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +
> +	/*
> +	 * Warn if the running kernel can't handle the kinds of offsets
> +	 * needed to deal with the file size.  In other words, if the
> +	 * pagecache can't cache all the blocks in this file due to
> +	 * overly large offsets, flag the inode for admin review.
> +	 */
> +	if (isize >= mp->m_super->s_maxbytes)
> +		xfs_scrub_ino_set_warning(sc, bp);
> +
> +	/* di_nblocks */
> +	if (flags2 & XFS_DIFLAG2_REFLINK) {
> +		; /* nblocks can exceed dblocks */
> +	} else if (flags & XFS_DIFLAG_REALTIME) {
> +		/*
> +		 * nblocks is the sum of data extents (in the rtdev),
> +		 * attr extents (in the datadev), and both forks' bmbt
> +		 * blocks (in the datadev).  This clumsy check is the
> +		 * best we can do without cross-referencing with the
> +		 * inode forks.
> +		 */
> +		if (be64_to_cpu(dip->di_nblocks) >=
> +		    mp->m_sb.sb_dblocks + mp->m_sb.sb_rblocks)
> +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +	} else {
> +		if (be64_to_cpu(dip->di_nblocks) >= mp->m_sb.sb_dblocks)
> +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +	}
> +
> +	xfs_scrub_inode_flags(sc, bp, dip, ino, mode, flags);
> +
> +	xfs_scrub_inode_extsize(sc, bp, dip, ino, mode, flags);
> +
> +	/* di_nextents */
> +	nextents = be32_to_cpu(dip->di_nextents);
> +	fork_recs =  XFS_DFORK_DSIZE(dip, mp) / sizeof(struct xfs_bmbt_rec);
> +	switch (dip->di_format) {
> +	case XFS_DINODE_FMT_EXTENTS:
> +		if (nextents > fork_recs)
> +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +		break;
> +	case XFS_DINODE_FMT_BTREE:
> +		if (nextents <= fork_recs)
> +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +		break;
> +	default:
> +		if (nextents != 0)
> +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +		break;
> +	}
> +
> +	/* di_forkoff */
> +	if (XFS_DFORK_APTR(dip) >= (char *)dip + mp->m_sb.sb_inodesize)
> +		xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +	if (dip->di_anextents != 0 && dip->di_forkoff == 0)
> +		xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +	if (dip->di_forkoff == 0 && dip->di_aformat != XFS_DINODE_FMT_EXTENTS)
> +		xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +
> +	/* di_aformat */
> +	if (dip->di_aformat != XFS_DINODE_FMT_LOCAL &&
> +	    dip->di_aformat != XFS_DINODE_FMT_EXTENTS &&
> +	    dip->di_aformat != XFS_DINODE_FMT_BTREE)
> +		xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +
> +	/* di_anextents */
> +	nextents = be16_to_cpu(dip->di_anextents);
> +	fork_recs =  XFS_DFORK_ASIZE(dip, mp) / sizeof(struct xfs_bmbt_rec);
> +	switch (dip->di_aformat) {
> +	case XFS_DINODE_FMT_EXTENTS:
> +		if (nextents > fork_recs)
> +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +		break;
> +	case XFS_DINODE_FMT_BTREE:
> +		if (nextents <= fork_recs)
> +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +		break;
> +	default:
> +		if (nextents != 0)
> +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +	}
> +
> +	if (flags2)
> +		xfs_scrub_inode_flags2(sc, bp, dip, ino, mode, flags2);
> +
> +	xfs_scrub_inode_cowextsize(sc, bp, dip, ino, mode, flags, flags2);
> +}
> +
> +/* Map and read a raw inode. */
> +STATIC int
> +xfs_scrub_inode_map_raw(
> +	struct xfs_scrub_context	*sc,
> +	xfs_ino_t			ino,
> +	struct xfs_buf			**bpp,
> +	struct xfs_dinode		**dipp)
> +{
> +	struct xfs_imap			imap;
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_buf			*bp;
> +	struct xfs_dinode		*dip;
> +	int				error;
> +
> +	error = xfs_imap(mp, sc->tp, ino, &imap, XFS_IGET_UNTRUSTED);
> +	if (error == -EINVAL) {
> +		/*
> +		 * Inode could have gotten deleted out from under us;
> +		 * just forget about it.
> +		 */
> +		error = -ENOENT;
> +		goto out;
> +	}
> +	if (!xfs_scrub_process_error(sc, XFS_INO_TO_AGNO(mp, ino),
> +			XFS_INO_TO_AGBNO(mp, ino), &error))
> +		goto out;
> +
> +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> +			imap.im_blkno, imap.im_len, XBF_UNMAPPED, &bp,
> +			NULL);
> +	if (!xfs_scrub_process_error(sc, XFS_INO_TO_AGNO(mp, ino),
> +			XFS_INO_TO_AGBNO(mp, ino), &error))
> +		goto out;
> +
> +	/* Is this really an inode? */
> +	bp->b_ops = &xfs_inode_buf_ops;
> +	dip = xfs_buf_offset(bp, imap.im_boffset);
> +	if (!xfs_dinode_verify(mp, ino, dip) ||
> +	    !xfs_dinode_good_version(mp, dip->di_version)) {
> +		xfs_scrub_ino_set_corrupt(sc, ino, bp);
> +		goto out;
> +	}
> +
> +	/* ...and is it the one we asked for? */
> +	if (be32_to_cpu(dip->di_gen) != sc->sm->sm_gen) {
> +		error = -ENOENT;
> +		goto out;
> +	}
> +
> +	*dipp = dip;
> +	*bpp = bp;
> +out:
> +	return error;
> +}
> +
> +/* Scrub an inode. */
> +int
> +xfs_scrub_inode(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_dinode		di;
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_buf			*bp = NULL;
> +	struct xfs_dinode		*dip;
> +	xfs_ino_t			ino;
> +
> +	bool				has_shared;
> +	int				error = 0;
> +
> +	/* Did we get the in-core inode, or are we doing this manually? */
> +	if (sc->ip) {
> +		ino = sc->ip->i_ino;
> +		xfs_inode_to_disk(sc->ip, &di, 0);
> +		dip = &di;
> +	} else {
> +		/* Map & read inode. */
> +		ino = sc->sm->sm_ino;
> +		error = xfs_scrub_inode_map_raw(sc, ino, &bp, &dip);
> +		if (error)
> +			goto out;
> +	}
> +
> +	xfs_scrub_dinode(sc, bp, dip, ino);
> +	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> +		goto out;
> +
> +	/* Now let's do the things that require a live inode. */
> +	if (!sc->ip)
> +		goto out;
> +
> +	/*
> +	 * Does this inode have the reflink flag set but no shared extents?
> +	 * Set the preening flag if this is the case.
> +	 */
> +	if (xfs_is_reflink_inode(sc->ip)) {
> +		error = xfs_reflink_inode_has_shared_extents(sc->tp, sc->ip,
> +				&has_shared);
> +		if (!xfs_scrub_process_error(sc, XFS_INO_TO_AGNO(mp, ino),
> +				XFS_INO_TO_AGBNO(mp, ino), &error))
> +			goto out;
> +		if (!has_shared)
> +			xfs_scrub_ino_set_preen(sc, bp);
> +	}
> +
> +out:
> +	if (bp)
> +		xfs_trans_brelse(sc->tp, bp);
> +	return error;
> +}
> diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> index 10c9078..ab4209c 100644
> --- a/fs/xfs/scrub/scrub.c
> +++ b/fs/xfs/scrub/scrub.c
> @@ -30,6 +30,8 @@
>  #include "xfs_trans.h"
>  #include "xfs_sb.h"
>  #include "xfs_inode.h"
> +#include "xfs_icache.h"
> +#include "xfs_itable.h"
>  #include "xfs_alloc.h"
>  #include "xfs_alloc_btree.h"
>  #include "xfs_bmap.h"
> @@ -131,6 +133,7 @@ xfs_scrub_probe(
>  STATIC int
>  xfs_scrub_teardown(
>  	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip_in,
>  	int				error)
>  {
>  	xfs_scrub_ag_free(sc, &sc->sa);
> @@ -138,6 +141,13 @@ xfs_scrub_teardown(
>  		xfs_trans_cancel(sc->tp);
>  		sc->tp = NULL;
>  	}
> +	if (sc->ip) {
> +		xfs_iunlock(sc->ip, sc->ilock_flags);
> +		if (sc->ip != ip_in &&
> +		    !xfs_internal_inum(sc->mp, sc->ip->i_ino))
> +			iput(VFS_I(sc->ip));
> +		sc->ip = NULL;
> +	}
>  	return error;
>  }
>  
> @@ -191,6 +201,10 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
>  		.scrub	= xfs_scrub_refcountbt,
>  		.has	= xfs_sb_version_hasreflink,
>  	},
> +	{ /* inode record */
> +		.setup	= xfs_scrub_setup_inode,
> +		.scrub	= xfs_scrub_inode,
> +	},
>  };
>  
>  /* This isn't a stable feature, warn once per day. */
> @@ -290,7 +304,7 @@ xfs_scrub_metadata(
>  		 * Tear down everything we hold, then set up again with
>  		 * preparation for worst-case scenarios.
>  		 */
> -		error = xfs_scrub_teardown(&sc, 0);
> +		error = xfs_scrub_teardown(&sc, ip, 0);
>  		if (error)
>  			goto out;
>  		try_harder = true;
> @@ -303,7 +317,7 @@ xfs_scrub_metadata(
>  		xfs_alert_ratelimited(mp, "Corruption detected during scrub.");
>  
>  out_teardown:
> -	error = xfs_scrub_teardown(&sc, error);
> +	error = xfs_scrub_teardown(&sc, ip, error);
>  out:
>  	trace_xfs_scrub_done(ip, sm, error);
>  	return error;
> diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
> index 1c80bf5..ec635d4 100644
> --- a/fs/xfs/scrub/scrub.h
> +++ b/fs/xfs/scrub/scrub.h
> @@ -59,6 +59,7 @@ struct xfs_scrub_context {
>  	const struct xfs_scrub_meta_ops	*ops;
>  	struct xfs_trans		*tp;
>  	struct xfs_inode		*ip;
> +	uint				ilock_flags;
>  	bool				try_harder;
>  
>  	/* State tracking for single-AG operations. */
> @@ -77,5 +78,6 @@ int xfs_scrub_inobt(struct xfs_scrub_context *sc);
>  int xfs_scrub_finobt(struct xfs_scrub_context *sc);
>  int xfs_scrub_rmapbt(struct xfs_scrub_context *sc);
>  int xfs_scrub_refcountbt(struct xfs_scrub_context *sc);
> +int xfs_scrub_inode(struct xfs_scrub_context *sc);
>  
>  #endif	/* __XFS_SCRUB_SCRUB_H__ */
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH 04/30] xfs: refactor btree block header checking functions
  2017-10-12  1:41 ` [PATCH 04/30] xfs: refactor btree block header checking functions Darrick J. Wong
@ 2017-10-13  1:01   ` Dave Chinner
  2017-10-13 21:15     ` Darrick J. Wong
  2017-10-16 19:48   ` [PATCH v2 " Darrick J. Wong
  1 sibling, 1 reply; 99+ messages in thread
From: Dave Chinner @ 2017-10-13  1:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:41:07PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Refactor the btree block header checks to have an internal function that
> returns the address of the failing check without logging errors.  The
> scrubber will call the internal function, while the external version
> will maintain the current logging behavior.

.....

> diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> index 8f52eda..baf7064 100644
> --- a/fs/xfs/libxfs/xfs_btree.h
> +++ b/fs/xfs/libxfs/xfs_btree.h
> @@ -255,6 +255,11 @@ typedef struct xfs_btree_cur
>   */
>  #define	XFS_BUF_TO_BLOCK(bp)	((struct xfs_btree_block *)((bp)->b_addr))
>  
> +/* Internal long and short btree block checks. */
> +void *__xfs_btree_check_lblock(struct xfs_btree_cur *cur,
> +		struct xfs_btree_block *block, int level, struct xfs_buf *bp);
> +void *__xfs_btree_check_sblock(struct xfs_btree_cur *cur,
> +		struct xfs_btree_block *block, int level, struct xfs_buf *bp);

/*
 * Internal long and short btree block checks. They return NULL if
 * the block is OK, otherwise they return the address of the failed
 * check.
 */

>  
>  /*
>   * Check that block header is ok.
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index dcd1292..b825953 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -142,6 +142,13 @@ typedef __u32			xfs_nlink_t;
>  #define SYNCHRONIZE()	barrier()
>  #define __return_address __builtin_return_address(0)
>  
> +/*
> + * Return the address of a label.  Use asm volatile so that the optimizer
> + * won't try anything stupid like refactoring the error jumpouts into a
> + * single return, which throws off the reported address.
> + */
> +#define __this_address  ({ __label__ __here; __here: asm volatile(""); &&__here; })

I think this should probably use barrier() rather than an asm
statement - can you check that this works correctly with gcc?  Other
compilers won't work with an asm statement (llvm/intel) but should
DTRT with a compiler barrier intrinsic...
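
As a rough user-space sketch of what I mean (an illustration only, not the patch): barrier() is defined locally the way gcc builds of the kernel define it, struct fake_block, FAKE_MAGIC and fake_check_block() are hypothetical, and the macro relies on the GNU C extensions the original already uses (statement expressions, local labels, labels as values).

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's compiler barrier. */
#define barrier()	__asm__ __volatile__("" : : : "memory")

/* Same shape as the macro under review, with barrier() replacing the bare asm. */
#define __this_address	({ __label__ __here; __here: barrier(); &&__here; })

/* Hypothetical block header, for illustration only. */
struct fake_block {
	unsigned int	magic;
	unsigned int	level;
};

#define FAKE_MAGIC	0x46414b45u

/*
 * Internal check: returns NULL if the block is OK, otherwise the
 * address of the failed check.
 */
static void *fake_check_block(const struct fake_block *block, unsigned int level)
{
	assert(block != NULL);

	if (block->magic != FAKE_MAGIC)
		return __this_address;
	if (block->level != level)
		return __this_address;
	return NULL;
}
```

A logging wrapper can then print the returned pointer, and the scrubber can call the internal function directly and stay quiet. Whether the two failure addresses stay distinct after optimization is exactly the thing to check in the generated asm.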

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 05/30] xfs: create inode pointer verifiers
  2017-10-12 20:23   ` Darrick J. Wong
@ 2017-10-13  5:22     ` Dave Chinner
  2017-10-13 16:16       ` Darrick J. Wong
  0 siblings, 1 reply; 99+ messages in thread
From: Dave Chinner @ 2017-10-13  5:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Oct 12, 2017 at 01:23:03PM -0700, Darrick J. Wong wrote:
> On Wed, Oct 11, 2017 at 06:41:15PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create some helper functions to check that inode pointers point to
> > somewhere within the filesystem and not at the static AG metadata.
> > Move xfs_internal_inum and create a directory inode check function.
> > We will use these functions in scrub and elsewhere.
....
> > diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> > index 988bb3f..da3652b 100644
> > --- a/fs/xfs/libxfs/xfs_ialloc.c
> > +++ b/fs/xfs/libxfs/xfs_ialloc.c
> > @@ -2664,3 +2664,84 @@ xfs_ialloc_pagi_init(
> >  		xfs_trans_brelse(tp, bp);
> >  	return 0;
> >  }
> > +
> > +/* Calculate the first and last possible inode number in an AG. */
> > +void
> > +xfs_ialloc_aginode_range(

agino_range?

> > +	struct xfs_mount	*mp,
> > +	xfs_agnumber_t		agno,
> > +	xfs_agino_t		*first,
> > +	xfs_agino_t		*last)
> > +{
> > +	xfs_agblock_t		eoag;
> > +
> > +	eoag = xfs_ag_block_count(mp, agno);
> > +	*first = round_up(XFS_OFFBNO_TO_AGINO(mp, XFS_AGFL_BLOCK(mp) + 1, 0),
> > +			XFS_INODES_PER_CHUNK);
> > +	*last = round_down(XFS_OFFBNO_TO_AGINO(mp, eoag, 0),
> > +			XFS_INODES_PER_CHUNK) - 1;
> 
> This is incorrect; we allocate inode chunks aligned to
> xfs_ialloc_cluster_alignment blocks, which doesn't necessarily result in
> ir_startino being aligned to XFS_INODES_PER_CHUNK.

*nod*

> I think the correct code is this:
> 
> 	/* Calculate the first inode. */
> 	bno = round_up(XFS_AGFL_BLOCK(mp) + 1,
> 			xfs_ialloc_cluster_alignment(mp));
> 	*first = XFS_OFFBNO_TO_AGINO(mp, bno, 0);

*nod*

> 	/* Calculate the last inode. */
> 	bno = round_down(eoag, xfs_ialloc_cluster_alignment(mp));
> 	*last = XFS_OFFBNO_TO_AGINO(mp, bno, 0) - 1;

Bit tricky - I'm not sure that this will give the same inode number
in all cases as rounding down to last valid chunk start offset and
then adding (MAX(XFS_INODES_PER_CHUNK, inodes-per-block) - 1) to
it....
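
For reference, the posted and the proposed calculations can be compared in a small user-space model. This is a sketch under stated assumptions: struct fake_geom and every value in it are made up, bno_to_agino() is a simplified model of XFS_OFFBNO_TO_AGINO(mp, bno, 0), the alignment is assumed to be a power of two, and the chunk-start-plus-MAX() variant I describe above is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define INODES_PER_CHUNK	64u	/* models XFS_INODES_PER_CHUNK */

/* Hypothetical AG geometry; every field is an illustrative parameter. */
struct fake_geom {
	uint32_t	first_free_bno;	/* stands in for XFS_AGFL_BLOCK(mp) + 1 */
	uint32_t	eoag;		/* AG length in blocks */
	uint32_t	cluster_align;	/* inode cluster alignment in blocks */
	uint32_t	inopblog;	/* log2(inodes per block) */
};

/* Power-of-two rounding, in the spirit of the kernel's round_up(). */
static uint32_t rnd_up(uint32_t x, uint32_t y)
{
	assert(y != 0 && (y & (y - 1)) == 0);
	return (x + y - 1) & ~(y - 1);
}

/* Power-of-two rounding, in the spirit of the kernel's round_down(). */
static uint32_t rnd_down(uint32_t x, uint32_t y)
{
	assert(y != 0 && (y & (y - 1)) == 0);
	return x & ~(y - 1);
}

/* Simplified model of XFS_OFFBNO_TO_AGINO(mp, bno, 0). */
static uint32_t bno_to_agino(const struct fake_geom *g, uint32_t bno)
{
	return bno << g->inopblog;
}

/* As posted: round the inode number to a multiple of the chunk size. */
static void range_as_posted(const struct fake_geom *g,
			    uint32_t *first, uint32_t *last)
{
	*first = rnd_up(bno_to_agino(g, g->first_free_bno), INODES_PER_CHUNK);
	*last = rnd_down(bno_to_agino(g, g->eoag), INODES_PER_CHUNK) - 1;
}

/* As proposed: round the block number to the cluster alignment first. */
static void range_as_proposed(const struct fake_geom *g,
			      uint32_t *first, uint32_t *last)
{
	uint32_t	bno;

	bno = rnd_up(g->first_free_bno, g->cluster_align);
	*first = bno_to_agino(g, bno);
	bno = rnd_down(g->eoag, g->cluster_align);
	*last = bno_to_agino(g, bno) - 1;
}
```

With made-up numbers (first free block 4, 1000 blocks in the AG, alignment 8, 4 inodes per block) the posted version gives [64, 3967] and the proposed one gives [32, 3999], so the two really do disagree on small-block geometries.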

> ...which unfortunately I didn't realize until trying to play with
> nondefault geometry options (1k blocks, no sparse inodes).

Ok, that might explain a bunch of inode noise on my 1k block size
test runs...

> > +/*
> > + * Verify that an AG inode number pointer neither points outside the AG
> > + * nor points at static metadata.
> > + */
> > +bool
> > +xfs_verify_agino_ptr(

Again, I'd probably drop the _ptr suffix here.

> > +	struct xfs_mount	*mp,
> > +	xfs_agnumber_t		agno,
> > +	xfs_agino_t		agino)
> > +{
> > +	xfs_agino_t		first;
> > +	xfs_agino_t		last;
> > +	int			ioff;
> > +
> > +	ioff = XFS_AGINO_TO_OFFSET(mp, agino);
> > +	xfs_ialloc_aginode_range(mp, agno, &first, &last);
> > +	return agino >= first && agino <= last &&
> > +	       ioff < (1 << mp->m_sb.sb_inopblog);

This ioff check will always evaluate as true, yes?

ioff	= XFS_AGINO_TO_OFFSET(i)
	= ((i) & XFS_INO_MASK(XFS_INO_OFFSET_BITS(mp)))
	= (i & XFS_INO_MASK((mp)->m_sb.sb_inopblog))
	= (i & (uint32_t)((1ULL << (mp)->m_sb.sb_inopblog)) - 1)

say sb_inopblog = 8:

ioff	= (i & 0xFF)

And so:

	ioff < (1 << 8)

will always be true.

So I'm not sure this check is needed?
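
The same argument as a quick user-space check, in case it helps. This is only a sketch: ino_mask() and agino_to_offset() are simplified models of XFS_INO_MASK() and XFS_AGINO_TO_OFFSET() that take sb_inopblog as a plain parameter, and offset_check_always_true() is a made-up helper that samples the 32-bit agino space.

```c
#include <assert.h>
#include <stdint.h>

/* Model of XFS_INO_MASK(k): a mask with the low k bits set. */
static uint32_t ino_mask(unsigned int k)
{
	assert(k < 32);
	return (uint32_t)((1ULL << k) - 1);
}

/* Model of XFS_AGINO_TO_OFFSET(mp, agino) for a given sb_inopblog. */
static uint32_t agino_to_offset(uint32_t agino, unsigned int inopblog)
{
	return agino & ino_mask(inopblog);
}

/*
 * Sample the agino space (including 0 and 0xFFFFFFFF) and report
 * whether "ioff < (1 << inopblog)" ever fails.  Returns 1 if it
 * held for every sample, 0 otherwise.
 */
static int offset_check_always_true(unsigned int inopblog)
{
	uint64_t	i;

	for (i = 0; i <= UINT32_MAX; i += 65537) {
		uint32_t	ioff = agino_to_offset((uint32_t)i, inopblog);

		if (ioff >= (1u << inopblog))
			return 0;
	}
	return 1;
}
```

Since the mask keeps only the low sb_inopblog bits, the largest possible ioff is (1 << sb_inopblog) - 1, which is why the comparison can never fail.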

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 05/30] xfs: create inode pointer verifiers
  2017-10-13  5:22     ` Dave Chinner
@ 2017-10-13 16:16       ` Darrick J. Wong
  0 siblings, 0 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-13 16:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Oct 13, 2017 at 04:22:20PM +1100, Dave Chinner wrote:
> On Thu, Oct 12, 2017 at 01:23:03PM -0700, Darrick J. Wong wrote:
> > On Wed, Oct 11, 2017 at 06:41:15PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Create some helper functions to check that inode pointers point to
> > > somewhere within the filesystem and not at the static AG metadata.
> > > Move xfs_internal_inum and create a directory inode check function.
> > > We will use these functions in scrub and elsewhere.
> ....
> > > diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> > > index 988bb3f..da3652b 100644
> > > --- a/fs/xfs/libxfs/xfs_ialloc.c
> > > +++ b/fs/xfs/libxfs/xfs_ialloc.c
> > > @@ -2664,3 +2664,84 @@ xfs_ialloc_pagi_init(
> > >  		xfs_trans_brelse(tp, bp);
> > >  	return 0;
> > >  }
> > > +
> > > +/* Calculate the first and last possible inode number in an AG. */
> > > +void
> > > +xfs_ialloc_aginode_range(
> 
> agino_range?
> 
> > > +	struct xfs_mount	*mp,
> > > +	xfs_agnumber_t		agno,
> > > +	xfs_agino_t		*first,
> > > +	xfs_agino_t		*last)
> > > +{
> > > +	xfs_agblock_t		eoag;
> > > +
> > > +	eoag = xfs_ag_block_count(mp, agno);
> > > +	*first = round_up(XFS_OFFBNO_TO_AGINO(mp, XFS_AGFL_BLOCK(mp) + 1, 0),
> > > +			XFS_INODES_PER_CHUNK);
> > > +	*last = round_down(XFS_OFFBNO_TO_AGINO(mp, eoag, 0),
> > > +			XFS_INODES_PER_CHUNK) - 1;
> > 
> > This is incorrect; we allocate inode chunks aligned to
> > xfs_ialloc_cluster_alignment blocks, which doesn't necessarily result in
> > ir_startino being aligned to XFS_INODES_PER_CHUNK.
> 
> *nod*
> 
> > I think the correct code is this:
> > 
> > 	/* Calculate the first inode. */
> > 	bno = round_up(XFS_AGFL_BLOCK(mp) + 1,
> > 			xfs_ialloc_cluster_alignment(mp));
> > 	*first = XFS_OFFBNO_TO_AGINO(mp, bno, 0);
> 
> *nod*
> 
> > 	/* Calculate the last inode. */
> > 	bno = round_down(eoag, xfs_ialloc_cluster_alignment(mp));
> > 	*last = XFS_OFFBNO_TO_AGINO(mp, bno, 0) - 1;
> 
> Bit tricky - I'm not sure that this will give the same inode number
> in all cases as rounding down to last valid chunk start offset and
> then adding (MAX(XFS_INODES_PER_CHUNK, inodes-per-block) - 1) to
> it....
> 
> > ...which unfortunately I didn't realize until trying to play with
> > nondefault geometry options (1k blocks, no sparse inodes).
> 
> Ok, that might explain a bunch of inode noise on my 1k block size
> test runs...
> 
> > > +/*
> > > + * Verify that an AG inode number pointer neither points outside the AG
> > > + * nor points at static metadata.
> > > + */
> > > +bool
> > > +xfs_verify_agino_ptr(
> 
> Again, I'd probably drop the _ptr suffix here.
> 
> > > +	struct xfs_mount	*mp,
> > > +	xfs_agnumber_t		agno,
> > > +	xfs_agino_t		agino)
> > > +{
> > > +	xfs_agino_t		first;
> > > +	xfs_agino_t		last;
> > > +	int			ioff;
> > > +
> > > +	ioff = XFS_AGINO_TO_OFFSET(mp, agino);
> > > +	xfs_ialloc_aginode_range(mp, agno, &first, &last);
> > > +	return agino >= first && agino <= last &&
> > > +	       ioff < (1 << mp->m_sb.sb_inopblog);
> 
> This ioff check will always evaluate as true, yes?
> 
> ioff	= XFS_AGINO_TO_OFFSET(i)
> 	= ((i) & XFS_INO_MASK(XFS_INO_OFFSET_BITS(mp)))
> 	= (i & XFS_INO_MASK((mp)->m_sb.sb_inopblog))
> 	= (i & (uint32_t)((1ULL << (mp)->m_sb.sb_inopblog)) - 1)
> 
> say sb_inopblog = 8:
> 
> ioff	= (i & 0xFF)
> 
> And so:
> 
> 	ioff < (1 << 8)
> 
> will always be true.
> 
> So I'm not sure this check is needed?

Indeed not.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH 04/30] xfs: refactor btree block header checking functions
  2017-10-13  1:01   ` Dave Chinner
@ 2017-10-13 21:15     ` Darrick J. Wong
  0 siblings, 0 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-13 21:15 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Oct 13, 2017 at 12:01:22PM +1100, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 06:41:07PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Refactor the btree block header checks to have an internal function that
> > returns the address of the failing check without logging errors.  The
> > scrubber will call the internal function, while the external version
> > will maintain the current logging behavior.
> 
> .....
> 
> > diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> > index 8f52eda..baf7064 100644
> > --- a/fs/xfs/libxfs/xfs_btree.h
> > +++ b/fs/xfs/libxfs/xfs_btree.h
> > @@ -255,6 +255,11 @@ typedef struct xfs_btree_cur
> >   */
> >  #define	XFS_BUF_TO_BLOCK(bp)	((struct xfs_btree_block *)((bp)->b_addr))
> >  
> > +/* Internal long and short btree block checks. */
> > +void *__xfs_btree_check_lblock(struct xfs_btree_cur *cur,
> > +		struct xfs_btree_block *block, int level, struct xfs_buf *bp);
> > +void *__xfs_btree_check_sblock(struct xfs_btree_cur *cur,
> > +		struct xfs_btree_block *block, int level, struct xfs_buf *bp);
> 
> /*
>  * Internal long and short btree block checks. They return NULL if
>  * the block is OK, otherwise they return the address of the failed
>  * check.
>  */

Ok.

> 
> >  
> >  /*
> >   * Check that block header is ok.
> > diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> > index dcd1292..b825953 100644
> > --- a/fs/xfs/xfs_linux.h
> > +++ b/fs/xfs/xfs_linux.h
> > @@ -142,6 +142,13 @@ typedef __u32			xfs_nlink_t;
> >  #define SYNCHRONIZE()	barrier()
> >  #define __return_address __builtin_return_address(0)
> >  
> > +/*
> > + * Return the address of a label.  Use asm volatile so that the optimizer
> > + * won't try anything stupid like refactoring the error jumpouts into a
> > + * single return, which throws off the reported address.
> > + */
> > +#define __this_address  ({ __label__ __here; __here: asm volatile(""); &&__here; })
> 
> I think this should probably use barrier() rather than an asm
> statement - can you check that this works correctly with gcc?  Other
> compilers won't work with an asm statement (llvm/intel) but should
> DTRT with a compiler barrier intrinsic...

Ok.  The asm output is identical under asm volatile/barrier, at least on gcc.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH 06/30] xfs: create an ioctl to scrub AG metadata
  2017-10-12  1:41 ` [PATCH 06/30] xfs: create an ioctl to scrub AG metadata Darrick J. Wong
@ 2017-10-16  0:08   ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  0:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:41:21PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create an ioctl that can be used to scrub internal filesystem metadata.
> The new ioctl takes the metadata type, an (optional) AG number, an
> (optional) inode number and generation, and a flags argument.  This will
> be used by the upcoming XFS online scrub tool.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 07/30] xfs: dispatch metadata scrub subcommands
  2017-10-12  1:41 ` [PATCH 07/30] xfs: dispatch metadata scrub subcommands Darrick J. Wong
@ 2017-10-16  0:26   ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  0:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:41:31PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create structures needed to hold scrubbing context and dispatch incoming
> commands to the individual scrubbers.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 08/30] xfs: probe the scrub ioctl
  2017-10-12  1:41 ` [PATCH 08/30] xfs: probe the scrub ioctl Darrick J. Wong
@ 2017-10-16  0:39   ` Dave Chinner
  2017-10-16 19:54     ` Darrick J. Wong
  0 siblings, 1 reply; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  0:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:41:37PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create a probe scrubber with id 0.  This will be used by xfs_scrub to
> probe the kernel's abilities to scrub (and repair) the metadata.

This no longer returns anything to userspace to indicate
capabilities. I can see that the previous patch checks for
valid/invalid input flags, so we have unknown feature
checking in place, just not obviously through the probe function
implementation. Can you expand this comment a little to explain
where the supported feature checks occur and so all that is required
here is a stub that does nothing?
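
Something along these lines is the split I have in mind, as a user-space sketch only. All names here (FAKE_SCRUB_*, struct fake_scrub_req, fake_scrub_probe(), fake_scrub_dispatch()) are hypothetical, and the specific error codes are assumptions rather than a statement of what the previous patch returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_SCRUB_TYPE_PROBE	0u		/* hypothetical: probe has id 0 */
#define FAKE_SCRUB_IFLAG_REPAIR	(1u << 0)	/* hypothetical input flag */
#define FAKE_SCRUB_FLAGS_IN	(FAKE_SCRUB_IFLAG_REPAIR)

/* Hypothetical request as it would arrive from userspace. */
struct fake_scrub_req {
	uint32_t	type;
	uint32_t	flags;
};

/* The probe scrubber itself is a stub: there is nothing to examine. */
static int fake_scrub_probe(const struct fake_scrub_req *req)
{
	(void)req;
	return 0;
}

/*
 * Common entry point.  Feature detection happens here: flags or types
 * this build does not know about are rejected before any scrubber runs.
 */
static int fake_scrub_dispatch(const struct fake_scrub_req *req)
{
	assert(req != NULL);

	if (req->flags & ~FAKE_SCRUB_FLAGS_IN)
		return -EINVAL;		/* assumed error code for unknown flags */
	if (req->type != FAKE_SCRUB_TYPE_PROBE)
		return -ENOENT;		/* assumed error code for unknown types */
	return fake_scrub_probe(req);
}
```

Userspace then learns what the kernel supports from whether a probe call with a given flag set succeeds, which is why the stub has nothing to return.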

Otherwise, consider it:

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 09/30] xfs: create helpers to record and deal with scrub problems
  2017-10-12  1:41 ` [PATCH 09/30] xfs: create helpers to record and deal with scrub problems Darrick J. Wong
@ 2017-10-16  0:40   ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  0:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:41:44PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create helper functions to record crc and corruption problems, and
> deal with any other runtime errors that arise.

xfs_scrub_process_error() makes much more sense :)

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 10/30] xfs: create helpers to scrub a metadata btree
  2017-10-12  1:41 ` [PATCH 10/30] xfs: create helpers to scrub a metadata btree Darrick J. Wong
@ 2017-10-16  0:56   ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  0:56 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:41:50PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create helper functions and tracepoints to deal with errors while
> scrubbing a metadata btree.

looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 11/30] xfs: scrub the shape of a metadata btree
  2017-10-12  1:41 ` [PATCH 11/30] xfs: scrub the shape of " Darrick J. Wong
@ 2017-10-16  1:29   ` Dave Chinner
  2017-10-16 20:09     ` Darrick J. Wong
  0 siblings, 1 reply; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  1:29 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:41:56PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create a function that can check the shape of a btree -- each block
> passes basic inspection and all the pointers look ok.  In the next patch
> we'll add the ability to check the actual keys and records stored within
> the btree.  Add some helper functions so that we report detailed scrub
> errors in a uniform manner in dmesg.  These are helper functions for
> subsequent patches.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Minor thing:

>  /*
> + * Check a btree pointer.  Returns true if it's ok to use this pointer.
> + * Callers do not need to set the corrupt flag.
> + */
> +static bool
> +xfs_scrub_btree_ptr_ok(
> +	struct xfs_scrub_btree		*bs,
> +	int				level,
> +	union xfs_btree_ptr		*ptr)
> +{
> +	bool				res;
> +
> +	/* A btree rooted in an inode has no block pointer to the root. */
> +	if ((bs->cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) &&
> +	    level == bs->cur->bc_nlevels)
> +		return true;
> +
> +	/* Otherwise, check the pointers. */
> +	if (bs->cur->bc_flags & XFS_BTREE_LONG_PTRS) {
> +		res = xfs_btree_check_lptr(bs->cur, be64_to_cpu(ptr->l), level);
> +		if (!res)
> +			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, level);
> +	} else {
> +		res = xfs_btree_check_sptr(bs->cur, be32_to_cpu(ptr->s), level);
> +		if (!res)
> +			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, level);
> +	}

We should already know what type of btree we are scrubbing, so I
think this can be simplified to a single
xfs_scrub_btree_set_corrupt() tracepoint.
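
One reading of this suggestion is to let the branches only pick the
pointer checker and to report corruption in a single place afterwards.
The following is a rough userspace sketch of that shape, not kernel
code: every sketch_* name, the flag value and the "nonzero and within
max_block" validity rule are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; not the real XFS types. */
#define SKETCH_LONG_PTRS	(1u << 0)

struct sketch_cursor {
	unsigned int	flags;
	uint64_t	max_block;	/* valid pointers are nonzero and <= this */
};

/* Counts how often the stand-in for the "set corrupt" hook was called. */
static int sketch_corrupt_reports;

static void sketch_set_corrupt(void)
{
	sketch_corrupt_reports++;
}

/* Stand-in for the long (64-bit) pointer check. */
static bool sketch_check_lptr(const struct sketch_cursor *cur, uint64_t ptr)
{
	return ptr != 0 && ptr <= cur->max_block;
}

/* Stand-in for the short (32-bit) pointer check. */
static bool sketch_check_sptr(const struct sketch_cursor *cur, uint32_t ptr)
{
	return ptr != 0 && ptr <= cur->max_block;
}

/* Pick the checker by pointer width, then report corruption in one place. */
static bool sketch_ptr_ok(const struct sketch_cursor *cur, uint64_t ptr)
{
	bool res;

	if (cur->flags & SKETCH_LONG_PTRS)
		res = sketch_check_lptr(cur, ptr);
	else
		res = sketch_check_sptr(cur, (uint32_t)ptr);
	if (!res)
		sketch_set_corrupt();
	return res;
}
```

With this layout there is exactly one corruption report per failed
check regardless of pointer width.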

> +STATIC int
> +xfs_scrub_btree_get_block(
> +	struct xfs_scrub_btree		*bs,
> +	int				level,
> +	union xfs_btree_ptr		*pp,
> +	struct xfs_btree_block		**pblock,
> +	struct xfs_buf			**pbp)
> +{
> +	void				*failed_at;
> +	int				error;
> +
> +	error = xfs_btree_lookup_get_block(bs->cur, level, pp, pblock);
> +	if (!xfs_scrub_btree_process_error(bs->sc, bs->cur, level, &error) ||
> +	    !pblock)
> +		return error;
> +
> +	xfs_btree_get_block(bs->cur, level, pbp);
> +	if (bs->cur->bc_flags & XFS_BTREE_LONG_PTRS) {
> +		failed_at = __xfs_btree_check_lblock(bs->cur, *pblock,
> +				level, *pbp);
> +		if (failed_at) {
> +			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, level);
> +			return 0;
> +		}
> +	} else {
> +		failed_at = __xfs_btree_check_sblock(bs->cur, *pblock,
> +				 level, *pbp);
> +		if (failed_at) {
> +			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, level);
> +			return 0;
> +		}
> +	}

And same here.

> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index a7c3361..414bbb8 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -21,6 +21,24 @@
>  #define __XFS_SCRUB_COMMON_H__
>  
>  /*
> + * We /could/ terminate a scrub/repair operation early.  If we're not
> + * in a good place to continue (fatal signal, etc.) then bail out.
> + * Note that we're careful not to make any judgements about *error.
> + */
> +static inline bool
> +xfs_scrub_should_terminate(
> +	struct xfs_scrub_context	*sc,
> +	int				*error)
> +{
> +	if (fatal_signal_pending(current)) {
> +		if (*error == 0)
> +			*error = -EAGAIN;
> +		return true;
> +	}
> +	return false;
> +}

Probably should move that to the original scrub infrastructure
patch.

Otherwise looks fine.


Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 12/30] xfs: scrub btree keys and records
  2017-10-12  1:42 ` [PATCH 12/30] xfs: scrub btree keys and records Darrick J. Wong
@ 2017-10-16  1:31   ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  1:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:42:02PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Add to the btree scrubber the ability to check that the keys and
> records are in the right order and actually call out to our record
> iterator to do actual checking of the records.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 13/30] xfs: create helpers to scan an allocation group
  2017-10-12  1:42 ` [PATCH 13/30] xfs: create helpers to scan an allocation group Darrick J. Wong
@ 2017-10-16  1:32   ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  1:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:42:08PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Add some helpers to enable us to lock an AG's headers, create btree
> cursors for all btrees in that allocation group, and clean up
> afterwards.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 15/30] xfs: scrub AGF and AGFL
  2017-10-12  1:42 ` [PATCH 15/30] xfs: scrub AGF and AGFL Darrick J. Wong
@ 2017-10-16  2:18   ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  2:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:42:22PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Check the block references in the AGF and AGFL headers to make sure
> they make sense.

Looks good. Much simpler without all the duplicated AG header
reading. :P

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 16/30] xfs: scrub the AGI
  2017-10-12  1:42 ` [PATCH 16/30] xfs: scrub the AGI Darrick J. Wong
@ 2017-10-16  2:19   ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  2:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:42:28PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Add a forgotten check to the AGI verifier, then wire up the scrub
> infrastructure to check the AGI contents.

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 17/30] xfs: scrub free space btrees
  2017-10-12  1:42 ` [PATCH 17/30] xfs: scrub free space btrees Darrick J. Wong
@ 2017-10-16  2:25   ` Dave Chinner
  2017-10-16 20:36     ` Darrick J. Wong
  0 siblings, 1 reply; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  2:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:42:35PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Check the extent records of the free space btrees to ensure that the values
> look sane.

Minor thing:

> +/* Scrub a bnobt/cntbt record. */
> +STATIC int
> +xfs_scrub_allocbt_helper(

xfs_scrub_allocbt_rec()

Reads much more nicely with this name. :P

Otherwise, consider it:

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 18/30] xfs: scrub inode btrees
  2017-10-12  1:42 ` [PATCH 18/30] xfs: scrub inode btrees Darrick J. Wong
@ 2017-10-16  2:55   ` Dave Chinner
  2017-10-16 22:16     ` Darrick J. Wong
  2017-10-17  0:11   ` [PATCH v2 " Darrick J. Wong
  1 sibling, 1 reply; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  2:55 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:42:41PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Check the records of the inode btrees to make sure that the values
> make sense given the inode records themselves.

.....

I think maybe I missed this first time around...

> +/* Check a particular inode with ir_free. */
> +STATIC int
> +xfs_scrub_iallocbt_check_cluster_freemask(
> +	struct xfs_scrub_btree		*bs,
> +	xfs_ino_t			fsino,
> +	xfs_agino_t			chunkino,
> +	xfs_agino_t			clusterino,
> +	struct xfs_inobt_rec_incore	*irec,
> +	struct xfs_buf			*bp)
> +{
> +	struct xfs_dinode		*dip;
> +	struct xfs_mount		*mp = bs->cur->bc_mp;
> +	bool				freemask_ok;
> +	bool				inuse;
> +	int				error;
> +
> +	dip = xfs_buf_offset(bp, clusterino * mp->m_sb.sb_inodesize);
> +	if (be16_to_cpu(dip->di_magic) != XFS_DINODE_MAGIC ||
> +	    (dip->di_version >= 3 &&
> +	     be64_to_cpu(dip->di_ino) != fsino + clusterino)) {
> +		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
> +		goto out;
> +	}
> +
> +	freemask_ok = (irec->ir_free & XFS_INOBT_MASK(chunkino + clusterino));

Ok, so if the inode is free, the corresponding bit in the mask
will be set....

> +	error = xfs_icache_inode_is_allocated(mp, bs->cur->bc_tp,
> +			fsino + clusterino, &inuse);
> +	if (error == -ENODATA) {
> +		/* Not cached, just read the disk buffer */
> +		freemask_ok ^= !!(dip->di_mode);

And this uses the lowest bit of the mask? How does that work?

/me spends 10 minutes trying to work out this function before he
realises that freemask_ok is a boolean, so the initial freemask
bit is collapsed down to a single bit.

Ok, that's definitely unexpected and not obvious from the code or
comments. The name "freemask_ok" is misleading the way it's used.
The first time it is set it means "inode is free", then after this
operation it means "inode matches free mask"....

> +		if (!bs->sc->try_harder && !freemask_ok)
> +			return -EDEADLOCK;
> +	} else if (error < 0) {
> +		/*
> +		 * Inode is only half assembled, or there was an IO error,
> +		 * or the verifier failed, so don't bother trying to check.
> +		 * The inode scrubber can deal with this.
> +		 */
> +		freemask_ok = true;

And here it means "we didn't check the free mask"

> +	} else {
> +		/* Inode is all there. */
> +		freemask_ok ^= inuse;

And here it means "inode matches free mask" again....

> +	}
> +	if (!freemask_ok)
> +		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
> +out:
> +	return 0;

Can we rewrite this to be a little more obvious?

	bool		inode_is_free = false;
	bool		freemask_ok;
....
	if (irec->ir_free & XFS_INOBT_MASK(chunkino + clusterino))
		inode_is_free = true;
....
	if (error == -ENODATA) {
		freemask_ok = inode_is_free ^ !!(dip->di_mode);
....
	else if (error < 0) {
		/*
		 * Inode is only half assembled, .....
		 */
		goto out;
	} else {
		freemask_ok = inode_is_free ^ inuse;
	}

That's a lot more obvious what the code is checking...
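
To spell out why the XOR works once the two meanings are separated,
here is a small userspace model. It is only a sketch: the sketch_*
names are hypothetical, and free_mask merely plays the role of ir_free
with the bit index assumed to be below 64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the free mask: bit N set means inode N of the
 * chunk is recorded as free.  The caller must pass index < 64.
 */
static bool sketch_inode_is_free(uint64_t free_mask, unsigned int index)
{
	return (free_mask >> index) & 1;
}

/*
 * The record is consistent when exactly one of "recorded as free" and
 * "actually in use" holds, which is what the XOR expresses.
 */
static bool sketch_freemask_ok(uint64_t free_mask, unsigned int index,
			       bool inuse)
{
	bool inode_is_free = sketch_inode_is_free(free_mask, index);

	return inode_is_free ^ inuse;
}
```

A free inode that is in use, or an allocated inode that is not in use,
both make the XOR false, which is the case that should be flagged.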


> +/* Scrub an inobt/finobt record. */
> +STATIC int
> +xfs_scrub_iallocbt_helper(

s/helper/rec/

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 19/30] xfs: scrub rmap btrees
  2017-10-12  1:42 ` [PATCH 19/30] xfs: scrub rmap btrees Darrick J. Wong
@ 2017-10-16  3:01   ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  3:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:42:47PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Check the reverse mapping records to make sure that the contents
> make sense.

Minor:

> +/* Scrub an rmapbt record. */
> +STATIC int
> +xfs_scrub_rmapbt_helper(

s/helper/rec/

otherwise:

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 20/30] xfs: scrub refcount btrees
  2017-10-12  1:42 ` [PATCH 20/30] xfs: scrub refcount btrees Darrick J. Wong
@ 2017-10-16  3:02   ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  3:02 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:42:53PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Plumb in the pieces necessary to check the refcount btree.  If rmap is
> available, check the reference count by performing an interval query
> against the rmapbt.

....

> +/* Scrub a refcountbt record. */
> +STATIC int
> +xfs_scrub_refcountbt_helper(

s/helper/rec/

Apart from that

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 21/30] xfs: scrub inodes
  2017-10-12 22:32   ` Darrick J. Wong
@ 2017-10-16  3:16     ` Dave Chinner
  2017-10-16 22:08       ` Darrick J. Wong
  0 siblings, 1 reply; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  3:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Oct 12, 2017 at 03:32:50PM -0700, Darrick J. Wong wrote:
> On Wed, Oct 11, 2017 at 06:43:00PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Scrub the fields within an inode.

.....

> > +
> > +/*
> > + * Given an inode and the scrub control structure, grab either the
> > + * inode referenced in the control structure or the inode passed in.
> > + * The inode is not locked.
> > + */
> > +int
> > +xfs_scrub_get_inode(
> > +	struct xfs_scrub_context	*sc,
> > +	struct xfs_inode		*ip_in)
> > +{
> > +	struct xfs_mount		*mp = sc->mp;
> > +	struct xfs_inode		*ip = NULL;
> > +	int				error;
> > +
> > +	/*
> > +	 * If userspace passed us an AG number or a generation number
> > +	 * without an inode number, they haven't got a clue so bail out
> > +	 * immediately.
> > +	 */
> > +	if (sc->sm->sm_agno || (sc->sm->sm_gen && !sc->sm->sm_ino))
> > +		return -EINVAL;
> > +
> > +	/* We want to scan the inode we already had opened. */
> > +	if (sc->sm->sm_ino == 0 || sc->sm->sm_ino == ip_in->i_ino) {
> > +		sc->ip = ip_in;
> > +		return 0;
> > +	}
> > +
> > +	/* Look up the inode, see if the generation number matches. */
> > +	if (xfs_internal_inum(mp, sc->sm->sm_ino))
> > +		return -ENOENT;
> > +	error = xfs_iget(mp, NULL, sc->sm->sm_ino,
> > +			XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE, 0, &ip);
> > +	if (error == -ENOENT || error == -EINVAL) {
> > +		/* inode doesn't exist... */
> > +		return -ENOENT;
> > +	} else if (error) {
> > +		trace_xfs_scrub_op_error(sc,
> > +				XFS_INO_TO_AGNO(mp, sc->sm->sm_ino),
> > +				XFS_INO_TO_AGBNO(mp, sc->sm->sm_ino),
> > +				error, __return_address);
> > +		return error;
> > +	}
> > +	if (VFS_I(ip)->i_generation != sc->sm->sm_gen) {
> > +		iput(VFS_I(ip));
> > +		return -ENOENT;
> > +	}
> > +
> > +	sc->ip = ip;
> > +	return 0;
> > +}

Much nicer with the way everything is clearly spelled out :P

> > +/* Inode core */
> > +
> > +/*
> > + * di_extsize hint validation is somewhat cumbersome. Rules are:
> > + *
> > + * 1. extent size hint is only valid for directories and regular files
> > + * 2. DIFLAG_EXTSIZE is only valid for regular files
> > + * 3. DIFLAG_EXTSZINHERIT is only valid for directories.
> > + * 4. extsize hint of 0 turns off hints, clears inode flags.
> > + * 5. either flag must be set if extsize != 0
> > + * 6. Extent size must be a multiple of the appropriate block size.
> > + * 7. extent size hint cannot be longer than maximum extent length
> > + * 8. for non-realtime files, the extent size hint must be limited
> > + *    to half the AG size to avoid alignment extending the extent
> > + *    beyond the limits of the AG.
> > + */
> > +STATIC void
> > +xfs_scrub_inode_extsize(
> > +	struct xfs_scrub_context	*sc,
> > +	struct xfs_buf			*bp,
> > +	struct xfs_dinode		*dip,
> > +	xfs_ino_t			ino,
> > +	uint16_t			mode,
> > +	uint16_t			flags)
> > +{
> > +	struct xfs_mount		*mp = sc->mp;
> > +	bool				rt_flag;
> > +	bool				hint_flag;
> > +	bool				inherit_flag;
> > +	uint32_t			extsize;
> > +	uint32_t			extsize_bytes;
> > +	uint32_t			blocksize_bytes;
> > +
> > +	rt_flag = (flags & XFS_DIFLAG_REALTIME);
> > +	hint_flag = (flags & XFS_DIFLAG_EXTSIZE);
> > +	inherit_flag = (flags & XFS_DIFLAG_EXTSZINHERIT);
> > +	extsize = be32_to_cpu(dip->di_extsize);
> > +	extsize_bytes = XFS_FSB_TO_B(sc->mp, extsize);
> > +
> > +	if (rt_flag)
> > +		blocksize_bytes = mp->m_sb.sb_rextsize << mp->m_sb.sb_blocklog;
> > +	else
> > +		blocksize_bytes = mp->m_sb.sb_blocksize;
> > +
> > +	if ((hint_flag || inherit_flag) && (!S_ISDIR(mode) && !S_ISREG(mode)))

Logic is correct but reads funny:

	if ((hint_flag || inherit_flag) &&
	    !(S_ISREG(mode) || S_ISDIR(mode)))
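
The two spellings are equivalent by De Morgan's law. A quick userspace
check, as a sketch only (the sketch_* helpers are made up for this
illustration and take the flags as plain booleans):

```c
#define _DEFAULT_SOURCE		/* expose the S_IF* constants on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* The condition as written in the patch. */
static bool sketch_cond_original(bool hint_flag, bool inherit_flag,
				 mode_t mode)
{
	return (hint_flag || inherit_flag) &&
	       (!S_ISDIR(mode) && !S_ISREG(mode));
}

/* The condition as reworded in the review. */
static bool sketch_cond_reworded(bool hint_flag, bool inherit_flag,
				 mode_t mode)
{
	return (hint_flag || inherit_flag) &&
	       !(S_ISREG(mode) || S_ISDIR(mode));
}

/* True if both forms agree for every flag pair and a few file types. */
static bool sketch_conditions_agree(void)
{
	static const mode_t modes[] = { S_IFREG, S_IFDIR, S_IFLNK, S_IFIFO };
	size_t i;
	int hint, inherit;

	for (i = 0; i < sizeof(modes) / sizeof(modes[0]); i++)
		for (hint = 0; hint <= 1; hint++)
			for (inherit = 0; inherit <= 1; inherit++)
				if (sketch_cond_original(hint, inherit, modes[i]) !=
				    sketch_cond_reworded(hint, inherit, modes[i]))
					return false;
	return true;
}
```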

> > +/*
> > + * di_cowextsize hint validation is somewhat cumbersome. Rules are:
> > + *
> > + * 1. flag requires reflink feature
> > + * 2. cow extent size hint is only valid for directories and regular files
> > + * 3. cow extsize hint of 0 turns off hints, clears inode flags.
> > + * 4. either flag must be set if cow extsize != 0
> > + * 5. flag cannot be set for rt files
> > + * 6. Extent size must be a multiple of the appropriate block size.
> > + * 7. extent size hint cannot be longer than maximum extent length
> > + * 8. the extent size hint must be limited
> > + *    to half the AG size to avoid alignment extending the extent
> > + *    beyond the limits of the AG.
> > + */

Perhaps this comment doesn't need duplicating for a 3rd time. Maybe
for both di_extsize and di_cowextsize just say:

/*
 * Extent size hints have explicit rules. They are documented at
 * xfs_ioctl_setattr_check_extsize() - these functions need to be
 * kept in sync with each other.
 */

> > +STATIC void
> > +xfs_scrub_inode_cowextsize(
> > +	struct xfs_scrub_context	*sc,
> > +	struct xfs_buf			*bp,
> > +	struct xfs_dinode		*dip,
> > +	xfs_ino_t			ino,
> > +	uint16_t			mode,
> > +	uint16_t			flags,
> > +	uint64_t			flags2)
> > +{
> > +	struct xfs_mount		*mp = sc->mp;
> > +	bool				rt_flag;
> > +	bool				hint_flag;
> > +	uint32_t			extsize;
> > +	uint32_t			extsize_bytes;
> > +
> > +	rt_flag = (flags & XFS_DIFLAG_REALTIME);
> > +	hint_flag = (flags2 & XFS_DIFLAG2_COWEXTSIZE);
> > +	extsize = be32_to_cpu(dip->di_extsize);
> 
> Doh, this ought to be extsize = be32_to_cpu(dip->di_cowextsize); will fix.

Yup, with that fix in place all the spurious inode warnings I was
getting went away.

> > +/* Map and read a raw inode. */
> > +STATIC int
> > +xfs_scrub_inode_map_raw(
> > +	struct xfs_scrub_context	*sc,
> > +	xfs_ino_t			ino,
> > +	struct xfs_buf			**bpp,
> > +	struct xfs_dinode		**dipp)
> > +{
> > +	struct xfs_imap			imap;
> > +	struct xfs_mount		*mp = sc->mp;
> > +	struct xfs_buf			*bp;
> > +	struct xfs_dinode		*dip;
> > +	int				error;
> > +
> > +	error = xfs_imap(mp, sc->tp, ino, &imap, XFS_IGET_UNTRUSTED);
> > +	if (error == -EINVAL) {
> > +		/*
> > +		 * Inode could have gotten deleted out from under us;
> > +		 * just forget about it.
> > +		 */
> > +		error = -ENOENT;
> > +		goto out;
> > +	}
> > +	if (!xfs_scrub_process_error(sc, XFS_INO_TO_AGNO(mp, ino),
> > +			XFS_INO_TO_AGBNO(mp, ino), &error))
> > +		goto out;
> > +
> > +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> > +			imap.im_blkno, imap.im_len, XBF_UNMAPPED, &bp,
> > +			NULL);
> > +	if (!xfs_scrub_process_error(sc, XFS_INO_TO_AGNO(mp, ino),
> > +			XFS_INO_TO_AGBNO(mp, ino), &error))
> > +		goto out;
> > +
> > +	/* Is this really an inode? */
> > +	bp->b_ops = &xfs_inode_buf_ops;

A comment here on why we skip the read verifier when pulling in the
inode buffer would be nice.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 22/30] xfs: scrub inode block mappings
  2017-10-12  1:43 ` [PATCH 22/30] xfs: scrub inode block mappings Darrick J. Wong
@ 2017-10-16  3:26   ` Dave Chinner
  2017-10-16 20:43     ` Darrick J. Wong
  0 siblings, 1 reply; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  3:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:43:06PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Scrub an individual inode's block mappings to make sure they make sense.

....

> +/* Set us up with an inode's bmap. */
> +int
> +xfs_scrub_setup_inode_bmap(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	int				error;
> +
> +	error = xfs_scrub_get_inode(sc, ip);
> +	if (error)
> +		goto out;
> +
> +	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
> +	xfs_ilock(sc->ip, sc->ilock_flags);
> +
> +	/*
> +	 * We don't want any ephemeral data fork updates sitting around
> +	 * while we inspect block mappings, so wait for directio to finish
> +	 * and flush dirty data if we have delalloc reservations.
> +	 */
> +	if (S_ISREG(VFS_I(sc->ip)->i_mode) &&
> +	    sc->sm->sm_type == XFS_SCRUB_TYPE_BMBTD) {
> +		inode_dio_wait(VFS_I(sc->ip));
> +		error = filemap_write_and_wait(VFS_I(sc->ip)->i_mapping);
> +		if (error)
> +			goto out;
> +
> +		/* Drop the page cache if we're repairing block mappings. */
> +		if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) {
> +			error = invalidate_inode_pages2(
> +					VFS_I(sc->ip)->i_mapping);
> +			if (error)
> +				goto out;

I'll point this out just to say I've seen it. It's a little out of
place for this patch set, but it's harmless.

> +/* Scrub a bmbt record. */
> +STATIC int
> +xfs_scrub_bmapbt_helper(

s/helper/rec/

> + *
> + * First we scan every record in every btree block, if applicable.
> + * Then we unconditionally scan the incore extent cache.
> + */
> +STATIC int
> +xfs_scrub_bmap(
> +	struct xfs_scrub_context	*sc,
> +	int				whichfork)
> +{
> +	struct xfs_bmbt_irec		irec;
> +	struct xfs_scrub_bmap_info	info = {0};
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_inode		*ip = sc->ip;
> +	struct xfs_ifork		*ifp;
> +	xfs_fileoff_t			endoff;
> +	xfs_extnum_t			idx;
> +	bool				found;
> +	int				error = 0;
> +
> +	ifp = XFS_IFORK_PTR(ip, whichfork);
> +
> +	info.is_rt = whichfork == XFS_DATA_FORK && XFS_IS_REALTIME_INODE(ip);
> +	info.whichfork = whichfork;
> +	info.is_shared = whichfork == XFS_DATA_FORK && xfs_is_reflink_inode(ip);
> +	info.sc = sc;
> +
> +	switch (whichfork) {
> +	case XFS_COW_FORK:
> +		/* Non-existent CoW forks are ignorable. */
> +		if (!ifp)
> +			goto out;
> +		/* No CoW forks on non-reflink inodes/filesystems. */
> +		if (!xfs_is_reflink_inode(ip)) {
> +			xfs_scrub_ino_set_corrupt(sc, sc->ip->i_ino, NULL);
> +			goto out;
> +		}
> +		break;
> +	case XFS_ATTR_FORK:
> +		if (!ifp)
> +			goto out;
> +		if (!xfs_sb_version_hasattr(&mp->m_sb) &&
> +		    !xfs_sb_version_hasattr2(&mp->m_sb))
> +			xfs_scrub_ino_set_corrupt(sc, sc->ip->i_ino, NULL);
> +		break;
> +	}

Missing a default option here for other values. Some compilers will
warn about this.
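
A minimal sketch of what an explicit default arm could look like, in
plain userspace C. The SKETCH_* constants and the function name are
hypothetical, and asserting that only the data fork reaches the
default arm is an assumption, not something the patch states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fork identifiers; not the real XFS constants. */
#define SKETCH_DATA_FORK	0
#define SKETCH_ATTR_FORK	1
#define SKETCH_COW_FORK		2

/* Returns true if the fork needs extra feature checks before scrubbing. */
static bool sketch_fork_needs_feature_check(int whichfork)
{
	switch (whichfork) {
	case SKETCH_COW_FORK:
	case SKETCH_ATTR_FORK:
		return true;
	default:
		/* Assumed: only the data fork should end up here. */
		assert(whichfork == SKETCH_DATA_FORK);
		return false;
	}
}
```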

Otherwise this looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 23/30] xfs: scrub directory/attribute btrees
  2017-10-12  1:43 ` [PATCH 23/30] xfs: scrub directory/attribute btrees Darrick J. Wong
@ 2017-10-16  4:13   ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  4:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, Fengguang Wu

On Wed, Oct 11, 2017 at 06:43:12PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Provide a way to check the shape and scrub the hashes and records
> in a directory or extended attribute btree.  These are helper functions
> for the directory & attribute scrubbers in subsequent patches.

looks good

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 24/30] xfs: scrub directory metadata
  2017-10-12  1:43 ` [PATCH 24/30] xfs: scrub directory metadata Darrick J. Wong
@ 2017-10-16  4:29   ` Dave Chinner
  2017-10-16 20:46     ` Darrick J. Wong
  2017-10-17  0:14   ` [PATCH v2 " Darrick J. Wong
  1 sibling, 1 reply; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  4:29 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:43:19PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Scrub the hash tree and all the entries in a directory.

.....
> +/* Check that an inode's mode matches a given DT_ type. */
> +STATIC int
> +xfs_scrub_dir_check_ftype(
> +	struct xfs_scrub_dir_ctx	*sdc,
> +	xfs_fileoff_t			offset,
> +	xfs_ino_t			inum,
> +	int				dtype)
> +{
> +	struct xfs_mount		*mp = sdc->sc->mp;
> +	struct xfs_inode		*ip;
> +	int				ino_dtype;
> +	int				error = 0;
> +
> +	if (!xfs_sb_version_hasftype(&mp->m_sb)) {
> +		if (dtype != DT_UNKNOWN && dtype != DT_DIR)
> +			xfs_scrub_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
> +					offset);
> +		goto out;
> +	}
> +
> +	error = xfs_iget(mp, sdc->sc->tp, inum, XFS_IGET_DONTCACHE, 0, &ip);
> +	if (!xfs_scrub_fblock_process_error(sdc->sc, XFS_DATA_FORK, offset,
> +			&error))
> +		goto out;
> +
> +	/* Convert mode to the DT_* values that dir_emit uses. */
> +	ino_dtype = (VFS_I(ip)->i_mode & S_IFMT) >> 12;

xfs_mode_to_ftype() ?
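
For reference, the open-coded shift in the quoted hunk relies on the
DT_* values being the S_IFMT bits shifted right by 12. The userspace
sketch below only demonstrates that arithmetic; it says nothing about
what xfs_mode_to_ftype() returns, and sketch_mode_to_dtype() is a
made-up name.

```c
#define _DEFAULT_SOURCE		/* expose DT_* and S_IF* on glibc */
#include <assert.h>
#include <dirent.h>
#include <sys/stat.h>

/* Same arithmetic as the quoted patch: file-type bits shifted down by 12. */
static int sketch_mode_to_dtype(mode_t mode)
{
	return (mode & S_IFMT) >> 12;
}
```

A named helper documents the intent at the call site, whereas the bare
shift leaves the reader to work out the bit layout.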

Otherwise it looks ok.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 25/30] xfs: scrub directory freespace
  2017-10-12  1:43 ` [PATCH 25/30] xfs: scrub directory freespace Darrick J. Wong
@ 2017-10-16  4:49   ` Dave Chinner
  2017-10-16 22:37     ` Darrick J. Wong
  2017-10-17  1:10   ` [PATCH v2 " Darrick J. Wong
  1 sibling, 1 reply; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  4:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:43:26PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Check the free space information in a directory.

....

> diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
> index e2a8f90..a41310f 100644
> --- a/fs/xfs/scrub/dir.c
> +++ b/fs/xfs/scrub/dir.c
> @@ -250,6 +250,426 @@ xfs_scrub_dir_rec(
>  	return error;
>  }
>  
> +/*
> + * Is this unused entry either in the bestfree or smaller than all of them?
> + * We assume the bestfrees are sorted longest to shortest, and that there
> + * aren't any bogus entries.

s/We assume/We've already checked/

> + */
> +static inline void
> +xfs_scrub_directory_check_free_entry(
> +	struct xfs_scrub_context	*sc,
> +	xfs_dablk_t			lblk,
> +	struct xfs_dir2_data_free	*bf,
> +	struct xfs_dir2_data_unused	*dup)

....

> +	while (ptr < endptr) {
> +		dup = (struct xfs_dir2_data_unused *)ptr;
> +		/* Skip real entries */
> +		if (dup->freetag != cpu_to_be16(XFS_DIR2_DATA_FREE_TAG)) {
> +			struct xfs_dir2_data_entry	*dep;
> +
> +			dep = (struct xfs_dir2_data_entry *)ptr;
> +			newlen = d_ops->data_entsize(dep->namelen);
> +			if (newlen <= 0) {
> +				xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK,
> +						lblk);
> +				goto out_buf;
> +			}
> +			ptr += newlen;
> +			if (endptr < ptr)
> +				xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK,
> +					      lblk);
> +			continue;
> +		}
> +
> +		/* Spot check this free entry */
> +		tag = be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup));
> +		if (tag != ((char *)dup - (char *)bp->b_addr))
> +			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
> +
> +		/*
> +		 * Either this entry is a bestfree or it's smaller than
> +		 * any of the bestfrees.
> +		 */
> +		xfs_scrub_directory_check_free_entry(sc, lblk, bf, dup);
> +
> +		/* Move on. */
> +		newlen = be16_to_cpu(dup->length);
> +		if (newlen <= 0) {
> +			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
> +			goto out_buf;
> +		}
> +		ptr += newlen;
> +		if (endptr < ptr)
> +			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);

I'd prefer this matches the loop logic order. ie.

		if (ptr >= endptr)

> +		else
> +			nr_frees++;
> +	}
> +
> +	/* Did we see at least as many free slots as there are bestfrees? */
> +	if (nr_frees < nr_bestfrees)
> +		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
> +out_buf:
> +	xfs_trans_brelse(sc->tp, bp);
> +out:
> +	return error;
> +}
> +
> +/*
> + * Does the free space length in the free space index block ($len) match
> + * the longest length in the directory data block's bestfree array?
> + * Assume that we've already checked that the data block's bestfree
> + * array is in order.
> + */
> +static inline void
> +xfs_scrub_directory_check_freesp(

No need for inline here, the compiler will do that automatically if
appropriate.

> +	struct xfs_scrub_context	*sc,
> +	xfs_dablk_t			lblk,
> +	struct xfs_buf			*dbp,
> +	unsigned int			len)
> +{
> +	struct xfs_dir2_data_free	*bf;
> +	struct xfs_dir2_data_free	*dfp;
> +	int				offset;
> +
> +	if (len == 0)
> +		return;
> +
> +	bf = sc->ip->d_ops->data_bestfree_p(dbp->b_addr);
> +	for (dfp = &bf[0]; dfp < &bf[XFS_DIR2_DATA_FD_COUNT]; dfp++) {
> +		offset = be16_to_cpu(dfp->offset);
> +		if (offset == 0)
> +			break;
> +		if (len == be16_to_cpu(dfp->length))
> +			return;
> +		/* Didn't find the best length in the bestfree data */
> +		break;
> +	}
> +
> +	xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
> +}
> +
> +/* Check free space info in a directory leaf1 block. */
> +STATIC int
> +xfs_scrub_directory_leaf1_bestfree(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_da_args		*args,
> +	xfs_dablk_t			lblk)
> +{
> +	struct xfs_dir2_leaf_tail	*ltp;
> +	struct xfs_buf			*dbp;
> +	struct xfs_buf			*bp;
> +	struct xfs_mount		*mp = sc->mp;
> +	__be16				*bestp;
> +	__u16				best;
> +	int				i;
> +	int				error;
> +
> +	/*
> +	 * Read the free space block.  The verifier will check for hash
> +	 * value ordering problems and check the stale entry count.
> +	 */
> +	error = xfs_dir3_leaf_read(sc->tp, sc->ip, lblk, -1, &bp);
> +	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
> +		goto out;
> +
> +	/* Check all the entries. */
> +	ltp = xfs_dir2_leaf_tail_p(mp->m_dir_geo, bp->b_addr);
> +	bestp = xfs_dir2_leaf_bests_p(ltp);
> +	for (i = 0; i < be32_to_cpu(ltp->bestcount); i++, bestp++) {
> +		best = be16_to_cpu(*bestp);
> +		if (best == NULLDATAOFF)
> +			continue;

Count stale entries, check if matches hdr->stale ?

> +		error = xfs_dir3_data_read(sc->tp, sc->ip,
> +				i * args->geo->fsbcount, -1, &dbp);
> +		if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk,
> +				&error))
> +			continue;
> +		xfs_scrub_directory_check_freesp(sc, lblk, dbp, best);
> +		xfs_trans_brelse(sc->tp, dbp);
> +	}
> +out:
> +	return error;
> +}
> +
> +/* Check free space info in a directory freespace block. */
> +STATIC int
> +xfs_scrub_directory_free_bestfree(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_da_args		*args,
> +	xfs_dablk_t			lblk)
> +{
> +	struct xfs_dir3_icfree_hdr	freehdr;
> +	struct xfs_buf			*dbp;
> +	struct xfs_buf			*bp;
> +	__be16				*bestp;
> +	__be16				best;
> +	int				i;
> +	int				error;
> +
> +	/* Read the free space block */
> +	error = xfs_dir2_free_read(sc->tp, sc->ip, lblk, &bp);
> +	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
> +		goto out;
> +
> +	/* Check all the entries. */
> +	sc->ip->d_ops->free_hdr_from_disk(&freehdr, bp->b_addr);
> +	bestp = sc->ip->d_ops->free_bests_p(bp->b_addr);
> +	for (i = 0; i < freehdr.nvalid; i++, bestp++) {
> +		best = be16_to_cpu(*bestp);
> +		if (best == NULLDATAOFF)
> +			continue;

Count stale entries, check freehdr.nvalid + stale = freehdr.nused?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 26/30] xfs: scrub extended attributes
  2017-10-12  1:43 ` [PATCH 26/30] xfs: scrub extended attributes Darrick J. Wong
@ 2017-10-16  4:50   ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  4:50 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:43:32PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Scrub the hash tree, keys, and values in an extended attribute structure.
> Refactor the attribute code to use the transaction if the caller supplied
> one to avoid buffer deadlocks.

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 27/30] xfs: scrub symbolic links
  2017-10-12  1:43 ` [PATCH 27/30] xfs: scrub symbolic links Darrick J. Wong
@ 2017-10-16  4:52   ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  4:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:43:41PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create the infrastructure to scrub symbolic link data.

looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 28/30] xfs: scrub directory parent pointers
  2017-10-12  1:43 ` [PATCH 28/30] xfs: scrub directory parent pointers Darrick J. Wong
@ 2017-10-16  5:09   ` Dave Chinner
  2017-10-16 21:46     ` Darrick J. Wong
  2017-10-17  0:16   ` [PATCH v2 " Darrick J. Wong
  1 sibling, 1 reply; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  5:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:43:47PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Scrub parent pointers, sort of.  For directories, we can ride the
> '..' entry up to the parent to confirm that there's at most one
> dentry that points back to this directory.

....

> +/* Count the number of dentries in the parent dir that point to this inode. */
> +STATIC int
> +xfs_scrub_parent_count_parent_dentries(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*parent,
> +	xfs_nlink_t			*nlink)
> +{
> +	struct xfs_scrub_parent_ctx	spc = {
> +		.dc.actor = xfs_scrub_parent_actor,
> +		.dc.pos = 0,
> +		.ino = sc->ip->i_ino,
> +		.nlink = 0,
> +	};
> +	struct xfs_ifork		*ifp;
> +	size_t				bufsize;
> +	loff_t				oldpos;
> +	uint				lock_mode;
> +	int				error;
> +
> +	/*
> +	 * Load the parent directory's extent map.  A regular directory
> +	 * open would start readahead (and thus load the extent map)
> +	 * before we even got to a readdir call, but this isn't
> +	 * guaranteed here.
> +	 */
> +	lock_mode = xfs_ilock_data_map_shared(parent);
> +	ifp = XFS_IFORK_PTR(parent, XFS_DATA_FORK);
> +	if (XFS_IFORK_FORMAT(parent, XFS_DATA_FORK) == XFS_DINODE_FMT_BTREE &&
> +	    !(ifp->if_flags & XFS_IFEXTENTS)) {
> +		error = xfs_iread_extents(sc->tp, parent, XFS_DATA_FORK);
> +		if (error) {
> +			xfs_iunlock(parent, lock_mode);
> +			return error;
> +		}
> +	}
> +	xfs_iunlock(parent, lock_mode);

Why not just do what xfs_dir_open() does? i.e.

        /*
         * If there are any blocks, read-ahead block 0 as we're almost
         * certain to have the next operation be a read there.
         */
        mode = xfs_ilock_data_map_shared(ip);
        if (ip->i_d.di_nextents > 0)
                error = xfs_dir3_data_readahead(ip, 0, -1);
        xfs_iunlock(ip, mode);

> +	/*
> +	 * Iterate the parent dir to confirm that there is
> +	 * exactly one entry pointing back to the inode being
> +	 * scanned.
> +	 */
> +	bufsize = (size_t)min_t(loff_t, 32768, parent->i_d.di_size);

Perhaps we need a define for that 32k magic number now it's being
used in multiple places?
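
One way the define could look, as a standalone sketch. The macro and function names are hypothetical (the thread does not pick one), and the negative-size guard is an addition for the userspace example, not something the quoted code does.

```c
#include <stddef.h>

/* Hypothetical name for the 32k readdir buffer cap. */
#define TOY_READDIR_BUFSIZE	((size_t)32768)

/*
 * Clamp the readdir buffer to the directory size, mirroring the
 * min_t(loff_t, 32768, di_size) expression in the quoted patch.
 */
static size_t
toy_readdir_bufsize(
	long long	dir_size)
{
	/* Guard added for this sketch only: treat a bogus size as empty. */
	if (dir_size < 0)
		return 0;
	if ((unsigned long long)dir_size < TOY_READDIR_BUFSIZE)
		return (size_t)dir_size;
	return TOY_READDIR_BUFSIZE;
}
```

Every caller that currently spells out 32768 would then use the one macro, so the cap can be changed in a single place.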

> +	oldpos = 0;
> +	while (true) {
> +		error = xfs_readdir(sc->tp, parent, &spc.dc, bufsize);
> +		if (error)
> +			goto out;
> +		if (oldpos == spc.dc.pos)
> +			break;
> +		oldpos = spc.dc.pos;
> +	}
> +	*nlink = spc.nlink;
> +out:
> +	return error;
> +}
> +
> +/* Scrub a parent pointer. */
> +int
> +xfs_scrub_parent(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_inode		*dp = NULL;
> +	xfs_ino_t			dnum;
> +	xfs_nlink_t			expected_nlink;
> +	xfs_nlink_t			nlink;
> +	int				tries = 0;
> +	int				error;
> +
> +	/*
> +	 * If we're a directory, check that the '..' link points up to
> +	 * a directory that has one entry pointing to us.
> +	 */
> +	if (!S_ISDIR(VFS_I(sc->ip)->i_mode))
> +		return -ENOENT;
> +
> +	/* We're not a special inode, are we? */
> +	if (!xfs_verify_dir_ino_ptr(mp, sc->ip->i_ino)) {
> +		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
> +		goto out;
> +	}
> +
> +	/*
> +	 * If we're an unlinked directory, the parent /won't/ have a link
> +	 * to us.  Otherwise, it should have one link.
> +	 */
> +	expected_nlink = VFS_I(sc->ip)->i_nlink == 0 ? 0 : 1;
> +
> +	/*
> +	 * The VFS grabs a read or write lock via i_rwsem before it reads
> +	 * or writes to a directory.  If we've gotten this far we've
> +	 * already obtained IOLOCK_EXCL, which (since 4.10) is the same as
> +	 * getting a write lock on i_rwsem.  Therefore, it is safe for us
> +	 * to drop the ILOCK here in order to do directory lookups.
> +	 */
> +	sc->ilock_flags &= ~(XFS_ILOCK_EXCL | XFS_MMAPLOCK_EXCL);
> +	xfs_iunlock(sc->ip, XFS_ILOCK_EXCL | XFS_MMAPLOCK_EXCL);
> +
> +	/* Look up '..' */
> +	error = xfs_dir_lookup(sc->tp, sc->ip, &xfs_name_dotdot, &dnum, NULL);
> +	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
> +		goto out;
> +	if (!xfs_verify_dir_ino_ptr(mp, dnum)) {
> +		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
> +		goto out;
> +	}
> +
> +	/* Is this the root dir?  Then '..' must point to itself. */
> +	if (sc->ip == mp->m_rootip) {
> +		if (sc->ip->i_ino != mp->m_sb.sb_rootino ||
> +		    sc->ip->i_ino != dnum)
> +			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
> +		return 0;
> +	}

All good to here.

> +try_again:
> +	/* Otherwise, '..' must not point to ourselves. */
> +	if (sc->ip->i_ino == dnum) {
> +		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
> +		goto out;
> +	}
> +
> +	error = xfs_iget(mp, sc->tp, dnum, XFS_IGET_DONTCACHE, 0, &dp);
> +	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
> +		goto out;
> +	if (dp == sc->ip) {
> +		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
> +		goto out_rele;
> +	}
> +
> +	/*
> +	 * We prefer to keep the inode locked while we lock and search
> +	 * its alleged parent for a forward reference.  However, this
> +	 * child -> parent scheme can deadlock with the parent -> child
> +	 * scheme that is normally used.  Therefore, if we can lock the
> +	 * parent, just validate the references and get out.
> +	 */
> +	if (xfs_ilock_nowait(dp, XFS_IOLOCK_SHARED)) {
> +		error = xfs_scrub_parent_count_parent_dentries(sc, dp, &nlink);
> +		if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0,
> +				&error))
> +			goto out_unlock;
> +		if (nlink != expected_nlink)
> +			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
> +		goto out_unlock;
> +	}
> +
> +	/*
> +	 * The game changes if we get here.  We failed to lock the parent,
> +	 * so we're going to try to verify both pointers while only holding
> +	 * one lock so as to avoid deadlocking with something that's actually
> +	 * trying to traverse down the directory tree.
> +	 */
> +	xfs_iunlock(sc->ip, sc->ilock_flags);
> +	sc->ilock_flags = 0;
> +	xfs_ilock(dp, XFS_IOLOCK_SHARED);
> +
> +	/* Go looking for our dentry. */
> +	error = xfs_scrub_parent_count_parent_dentries(sc, dp, &nlink);
> +	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
> +		goto out_unlock;
> +
> +	/* Drop the parent lock, relock this inode. */
> +	xfs_iunlock(dp, XFS_IOLOCK_SHARED);
> +	sc->ilock_flags = XFS_IOLOCK_EXCL;
> +	xfs_ilock(sc->ip, sc->ilock_flags);
> +
> +	/* Look up '..' to see if the inode changed. */
> +	error = xfs_dir_lookup(sc->tp, sc->ip, &xfs_name_dotdot, &dnum, NULL);
> +	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
> +		goto out_rele;
> +
> +	/* Drat, parent changed.  Try again! */
> +	if (dnum != dp->i_ino) {
> +		iput(VFS_I(dp));
> +		tries++;
> +		if (tries < 20)
> +			goto try_again;
> +		xfs_scrub_set_incomplete(sc);
> +		goto out;
> +	}
> +	iput(VFS_I(dp));

Can you factor this into a loop and function?

	do {
		valid = xfs_scrub_parent_validate(&error);
		if (error)
			goto out_unlock;
	} while (!valid && ++retries < 20);
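
The loop above can be exercised outside the kernel with a stub validator. This is only a sketch of the control flow: the callback type, the names, and the "return 1 for incomplete" convention are all assumptions made for the example, standing in for xfs_scrub_set_incomplete() and the real locking work.

```c
#include <stdbool.h>

#define TOY_MAX_TRIES	20

/*
 * Hypothetical validator: sets *error on hard failure and returns true
 * once the parent was checked without changing underneath us.
 */
typedef bool (*toy_validate_fn)(void *ctx, int *error);

/*
 * Returns 0 if validation succeeded, the validator's error if it failed,
 * or 1 if we ran out of tries (the "incomplete" case).
 */
static int
toy_validate_with_retries(
	toy_validate_fn	validate,
	void		*ctx)
{
	int		error = 0;
	int		tries = 0;
	bool		valid;

	do {
		valid = validate(ctx, &error);
		if (error)
			return error;
	} while (!valid && ++tries < TOY_MAX_TRIES);

	return valid ? 0 : 1;
}

/* Example validator: reports "parent changed" until *remaining hits zero. */
static bool
toy_racy_validate(
	void		*ctx,
	int		*error)
{
	int		*remaining = ctx;

	*error = 0;
	if (*remaining > 0) {
		(*remaining)--;
		return false;
	}
	return true;
}
```

Keeping the lock juggling inside the validate function leaves the caller with a single exit path for errors and a bounded number of attempts.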

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 29/30] xfs: scrub realtime bitmap/summary
  2017-10-12  1:43 ` [PATCH 29/30] xfs: scrub realtime bitmap/summary Darrick J. Wong
@ 2017-10-16  5:11   ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  5:11 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:43:54PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Perform simple tests of the realtime bitmap and summary.

....

> +/* Realtime bitmap. */
> +
> +/* Scrub a free extent record from the realtime bitmap. */
> +STATIC int
> +xfs_scrub_rtbitmap_helper(

s/helper/rec/

Otherwise good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 30/30] xfs: scrub quota information
  2017-10-12  1:44 ` [PATCH 30/30] xfs: scrub quota information Darrick J. Wong
@ 2017-10-16  5:12   ` Dave Chinner
  2017-10-17  1:11     ` Darrick J. Wong
  0 siblings, 1 reply; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  5:12 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:44:00PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Perform some quick sanity testing of the disk quota information.

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 14/30] xfs: scrub the secondary superblocks
  2017-10-12  1:42 ` [PATCH 14/30] xfs: scrub the secondary superblocks Darrick J. Wong
@ 2017-10-16  5:16   ` Dave Chinner
  2017-10-20 23:34     ` Darrick J. Wong
  0 siblings, 1 reply; 99+ messages in thread
From: Dave Chinner @ 2017-10-16  5:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Oct 11, 2017 at 06:42:16PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Ensure that the geometry presented in the backup superblocks matches
> the primary superblock so that repair can recover the filesystem if
> that primary gets corrupted.

I've noticed that scrub on certain fstests will report PREEN for
secondary superblocks and repair thinks there is nothing wrong and
doesn't fix them. I'm not sure which field it's complaining about,
but at this point I don't see this as a blocker. Follow up patches
would be fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 02/30] xfs: create block pointer check functions
  2017-10-12  5:48     ` Dave Chinner
@ 2017-10-16 19:46       ` Darrick J. Wong
  0 siblings, 0 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-16 19:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Oct 12, 2017 at 04:48:56PM +1100, Dave Chinner wrote:
> On Thu, Oct 12, 2017 at 04:28:52PM +1100, Dave Chinner wrote:
> > On Wed, Oct 11, 2017 at 06:40:55PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Create some helper functions to check that a block pointer points
> > > within the filesystem (or AG) and doesn't point at static metadata.
> > > We will use this for scrub.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Looks fine
> 
> now that I think about it and seen a bit more code....
> 
> > 
> > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > 
> > > ---
> > >  fs/xfs/libxfs/xfs_alloc.c    |   49 ++++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/libxfs/xfs_alloc.h    |    4 +++
> > >  fs/xfs/libxfs/xfs_rtbitmap.c |   12 ++++++++++
> > >  fs/xfs/xfs_rtalloc.h         |    2 ++
> > >  4 files changed, 67 insertions(+)
> > > 
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > > index 744dcae..bd3a943 100644
> > > --- a/fs/xfs/libxfs/xfs_alloc.c
> > > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > > @@ -2923,3 +2923,52 @@ xfs_alloc_query_all(
> > >  	query.fn = fn;
> > >  	return xfs_btree_query_all(cur, xfs_alloc_query_range_helper, &query);
> > >  }
> > > +
> > > +/* Find the size of the AG, in blocks. */
> > > +xfs_agblock_t
> > > +xfs_ag_block_count(
> > > +	struct xfs_mount	*mp,
> > > +	xfs_agnumber_t		agno)
> > > +{
> > > +	ASSERT(agno < mp->m_sb.sb_agcount);
> > > +
> > > +	if (agno < mp->m_sb.sb_agcount - 1)
> > > +		return mp->m_sb.sb_agblocks;
> > > +	return mp->m_sb.sb_dblocks - (agno * mp->m_sb.sb_agblocks);
> > > +}
> > > +
> > > +/*
> > > + * Verify that an AG block number pointer neither points outside the AG
> > > + * nor points at static metadata.
> > > + */
> > > +bool
> > > +xfs_verify_agbno_ptr(
> 
> You can probably drop the "_ptr" suffix from these because I don't
> think we ever try to check/validate the agbno/fsbno of the static
> metadata....
> 
> Some of the code just reads a bit weird with the "_ptr" suffix
> in it...

I wrangled with the name for a while too -- a generic block number could
refer to any part of the AG, whereas a block number in a metadata
structure is a pointer and should never point to static metadata, hence
the _ptr suffix.  On the other hand, some of the block pointers can be
NULL{FS,AG}BLOCK and others can't, and we don't check that here so it's
not quite a pointer check either.

Meh.

xfs_verify_agbno() it is.  Anyone reading the comments will figure out
why xfs_verify_agbno(mp, agno, XFS_AGFL_BLOCK(mp)) == false.
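
For readers following along, here is a toy model of that check. It is a sketch under stated assumptions, not the patch: the geometry struct and every toy_ name are hypothetical, and it assumes the static AG headers occupy blocks 0 through the AGFL block, so any pointer at or below that block is rejected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock geometry fields used here. */
struct toy_geom {
	uint32_t	agcount;	/* number of AGs */
	uint32_t	agblocks;	/* blocks in every AG but the last */
	uint64_t	dblocks;	/* total data blocks in the filesystem */
	uint32_t	agfl_block;	/* assumed last static header block */
};

/* Size of an AG in blocks; only the last AG may be short. */
static uint32_t
toy_ag_block_count(
	const struct toy_geom	*g,
	uint32_t		agno)
{
	if (agno < g->agcount - 1)
		return g->agblocks;
	/* Widen before multiplying so the product cannot wrap. */
	return (uint32_t)(g->dblocks - (uint64_t)agno * g->agblocks);
}

/* True if agbno is inside the AG and past the static headers. */
static bool
toy_verify_agbno(
	const struct toy_geom	*g,
	uint32_t		agno,
	uint32_t		agbno)
{
	if (agno >= g->agcount)
		return false;
	if (agbno >= toy_ag_block_count(g, agno))
		return false;
	if (agbno <= g->agfl_block)
		return false;
	return true;
}
```

With this model, asking about the AGFL block itself returns false, which is the behaviour described above.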

> Still consider it reviewed....

Ok.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 99+ messages in thread

* [PATCH v2 04/30] xfs: refactor btree block header checking functions
  2017-10-12  1:41 ` [PATCH 04/30] xfs: refactor btree block header checking functions Darrick J. Wong
  2017-10-13  1:01   ` Dave Chinner
@ 2017-10-16 19:48   ` Darrick J. Wong
  2017-10-16 23:36     ` Dave Chinner
  1 sibling, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-16 19:48 UTC (permalink / raw)
  To: linux-xfs; +Cc: Dave Chinner

Refactor the btree block header checks to have an internal function that
returns the address of the failing check without logging errors.  The
scrubber will call the internal function, while the external version
will maintain the current logging behavior.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: fix raw use of asm volatile in __this_address and lengthen comments
---
 fs/xfs/libxfs/xfs_btree.c |  166 +++++++++++++++++++++++++++------------------
 fs/xfs/libxfs/xfs_btree.h |    8 ++
 fs/xfs/xfs_linux.h        |    7 ++
 3 files changed, 113 insertions(+), 68 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index ae19f24..67a5b0f 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -63,44 +63,61 @@ xfs_btree_magic(
 	return magic;
 }
 
-STATIC int				/* error (0 or EFSCORRUPTED) */
-xfs_btree_check_lblock(
-	struct xfs_btree_cur	*cur,	/* btree cursor */
-	struct xfs_btree_block	*block,	/* btree long form block pointer */
-	int			level,	/* level of the btree block */
-	struct xfs_buf		*bp)	/* buffer for block, if any */
+/*
+ * Check a long btree block header.  Return the address of the failing check,
+ * or NULL if everything is ok.
+ */
+void *
+__xfs_btree_check_lblock(
+	struct xfs_btree_cur	*cur,
+	struct xfs_btree_block	*block,
+	int			level,
+	struct xfs_buf		*bp)
 {
-	int			lblock_ok = 1; /* block passes checks */
-	struct xfs_mount	*mp;	/* file system mount point */
+	struct xfs_mount	*mp = cur->bc_mp;
 	xfs_btnum_t		btnum = cur->bc_btnum;
-	int			crc;
-
-	mp = cur->bc_mp;
-	crc = xfs_sb_version_hascrc(&mp->m_sb);
+	int			crc = xfs_sb_version_hascrc(&mp->m_sb);
 
 	if (crc) {
-		lblock_ok = lblock_ok &&
-			uuid_equal(&block->bb_u.l.bb_uuid,
-				   &mp->m_sb.sb_meta_uuid) &&
-			block->bb_u.l.bb_blkno == cpu_to_be64(
-				bp ? bp->b_bn : XFS_BUF_DADDR_NULL);
+		if (!uuid_equal(&block->bb_u.l.bb_uuid, &mp->m_sb.sb_meta_uuid))
+			return __this_address;
+		if (block->bb_u.l.bb_blkno !=
+		    cpu_to_be64(bp ? bp->b_bn : XFS_BUF_DADDR_NULL))
+			return __this_address;
 	}
 
-	lblock_ok = lblock_ok &&
-		be32_to_cpu(block->bb_magic) == xfs_btree_magic(crc, btnum) &&
-		be16_to_cpu(block->bb_level) == level &&
-		be16_to_cpu(block->bb_numrecs) <=
-			cur->bc_ops->get_maxrecs(cur, level) &&
-		block->bb_u.l.bb_leftsib &&
-		(block->bb_u.l.bb_leftsib == cpu_to_be64(NULLFSBLOCK) ||
-		 XFS_FSB_SANITY_CHECK(mp,
-			be64_to_cpu(block->bb_u.l.bb_leftsib))) &&
-		block->bb_u.l.bb_rightsib &&
-		(block->bb_u.l.bb_rightsib == cpu_to_be64(NULLFSBLOCK) ||
-		 XFS_FSB_SANITY_CHECK(mp,
-			be64_to_cpu(block->bb_u.l.bb_rightsib)));
-
-	if (unlikely(XFS_TEST_ERROR(!lblock_ok, mp,
+	if (be32_to_cpu(block->bb_magic) != xfs_btree_magic(crc, btnum))
+		return __this_address;
+	if (be16_to_cpu(block->bb_level) != level)
+		return __this_address;
+	if (be16_to_cpu(block->bb_numrecs) >
+	    cur->bc_ops->get_maxrecs(cur, level))
+		return __this_address;
+	if (block->bb_u.l.bb_leftsib != cpu_to_be64(NULLFSBLOCK) &&
+	    !xfs_btree_check_lptr(cur, be64_to_cpu(block->bb_u.l.bb_leftsib),
+			level + 1))
+		return __this_address;
+	if (block->bb_u.l.bb_rightsib != cpu_to_be64(NULLFSBLOCK) &&
+	    !xfs_btree_check_lptr(cur, be64_to_cpu(block->bb_u.l.bb_rightsib),
+			level + 1))
+		return __this_address;
+
+	return NULL;
+}
+
+/* Check a long btree block header. */
+int
+xfs_btree_check_lblock(
+	struct xfs_btree_cur	*cur,
+	struct xfs_btree_block	*block,
+	int			level,
+	struct xfs_buf		*bp)
+{
+	struct xfs_mount	*mp = cur->bc_mp;
+	void			*failed_at;
+
+	failed_at = __xfs_btree_check_lblock(cur, block, level, bp);
+	if (unlikely(XFS_TEST_ERROR(failed_at != NULL, mp,
 			XFS_ERRTAG_BTREE_CHECK_LBLOCK))) {
 		if (bp)
 			trace_xfs_btree_corrupt(bp, _RET_IP_);
@@ -110,48 +127,61 @@ xfs_btree_check_lblock(
 	return 0;
 }
 
-STATIC int				/* error (0 or EFSCORRUPTED) */
-xfs_btree_check_sblock(
-	struct xfs_btree_cur	*cur,	/* btree cursor */
-	struct xfs_btree_block	*block,	/* btree short form block pointer */
-	int			level,	/* level of the btree block */
-	struct xfs_buf		*bp)	/* buffer containing block */
+/*
+ * Check a short btree block header.  Return the address of the failing check,
+ * or NULL if everything is ok.
+ */
+void *
+__xfs_btree_check_sblock(
+	struct xfs_btree_cur	*cur,
+	struct xfs_btree_block	*block,
+	int			level,
+	struct xfs_buf		*bp)
 {
-	struct xfs_mount	*mp;	/* file system mount point */
-	struct xfs_buf		*agbp;	/* buffer for ag. freespace struct */
-	struct xfs_agf		*agf;	/* ag. freespace structure */
-	xfs_agblock_t		agflen;	/* native ag. freespace length */
-	int			sblock_ok = 1; /* block passes checks */
+	struct xfs_mount	*mp = cur->bc_mp;
 	xfs_btnum_t		btnum = cur->bc_btnum;
-	int			crc;
-
-	mp = cur->bc_mp;
-	crc = xfs_sb_version_hascrc(&mp->m_sb);
-	agbp = cur->bc_private.a.agbp;
-	agf = XFS_BUF_TO_AGF(agbp);
-	agflen = be32_to_cpu(agf->agf_length);
+	int			crc = xfs_sb_version_hascrc(&mp->m_sb);
 
 	if (crc) {
-		sblock_ok = sblock_ok &&
-			uuid_equal(&block->bb_u.s.bb_uuid,
-				   &mp->m_sb.sb_meta_uuid) &&
-			block->bb_u.s.bb_blkno == cpu_to_be64(
-				bp ? bp->b_bn : XFS_BUF_DADDR_NULL);
+		if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_meta_uuid))
+			return __this_address;
+		if (block->bb_u.s.bb_blkno !=
+		    cpu_to_be64(bp ? bp->b_bn : XFS_BUF_DADDR_NULL))
+			return __this_address;
 	}
 
-	sblock_ok = sblock_ok &&
-		be32_to_cpu(block->bb_magic) == xfs_btree_magic(crc, btnum) &&
-		be16_to_cpu(block->bb_level) == level &&
-		be16_to_cpu(block->bb_numrecs) <=
-			cur->bc_ops->get_maxrecs(cur, level) &&
-		(block->bb_u.s.bb_leftsib == cpu_to_be32(NULLAGBLOCK) ||
-		 be32_to_cpu(block->bb_u.s.bb_leftsib) < agflen) &&
-		block->bb_u.s.bb_leftsib &&
-		(block->bb_u.s.bb_rightsib == cpu_to_be32(NULLAGBLOCK) ||
-		 be32_to_cpu(block->bb_u.s.bb_rightsib) < agflen) &&
-		block->bb_u.s.bb_rightsib;
-
-	if (unlikely(XFS_TEST_ERROR(!sblock_ok, mp,
+	if (be32_to_cpu(block->bb_magic) != xfs_btree_magic(crc, btnum))
+		return __this_address;
+	if (be16_to_cpu(block->bb_level) != level)
+		return __this_address;
+	if (be16_to_cpu(block->bb_numrecs) >
+	    cur->bc_ops->get_maxrecs(cur, level))
+		return __this_address;
+	if (block->bb_u.s.bb_leftsib != cpu_to_be32(NULLAGBLOCK) &&
+	    !xfs_btree_check_sptr(cur, be32_to_cpu(block->bb_u.s.bb_leftsib),
+			level + 1))
+		return __this_address;
+	if (block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK) &&
+	    !xfs_btree_check_sptr(cur, be32_to_cpu(block->bb_u.s.bb_rightsib),
+			level + 1))
+		return __this_address;
+
+	return NULL;
+}
+
+/* Check a short btree block header. */
+STATIC int
+xfs_btree_check_sblock(
+	struct xfs_btree_cur	*cur,
+	struct xfs_btree_block	*block,
+	int			level,
+	struct xfs_buf		*bp)
+{
+	struct xfs_mount	*mp = cur->bc_mp;
+	void			*failed_at;
+
+	failed_at = __xfs_btree_check_sblock(cur, block, level, bp);
+	if (unlikely(XFS_TEST_ERROR(failed_at != NULL, mp,
 			XFS_ERRTAG_BTREE_CHECK_SBLOCK))) {
 		if (bp)
 			trace_xfs_btree_corrupt(bp, _RET_IP_);
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 8f52eda..43694e4 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -255,6 +255,14 @@ typedef struct xfs_btree_cur
  */
 #define	XFS_BUF_TO_BLOCK(bp)	((struct xfs_btree_block *)((bp)->b_addr))
 
+/*
+ * Internal long and short btree block checks.  They return NULL if the
+ * block is ok or the address of the failed check otherwise.
+ */
+void *__xfs_btree_check_lblock(struct xfs_btree_cur *cur,
+		struct xfs_btree_block *block, int level, struct xfs_buf *bp);
+void *__xfs_btree_check_sblock(struct xfs_btree_cur *cur,
+		struct xfs_btree_block *block, int level, struct xfs_buf *bp);
 
 /*
  * Check that block header is ok.
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index dcd1292..00a5efe 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -142,6 +142,13 @@ typedef __u32			xfs_nlink_t;
 #define SYNCHRONIZE()	barrier()
 #define __return_address __builtin_return_address(0)
 
+/*
+ * Return the address of a label.  Use barrier() so that the optimizer
+ * won't reorder code to refactor the error jumpouts into a single
+ * return, which throws off the reported address.
+ */
+#define __this_address	({ __label__ __here; __here: barrier(); &&__here; })
+
 #define XFS_PROJID_DEFAULT	0
 
 #define MIN(a,b)	(min(a,b))

^ permalink raw reply related	[flat|nested] 99+ messages in thread
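
The __this_address idea from the patch above can be tried in userspace. The following is a sketch, assuming a compiler with the GNU C extensions the macro relies on (statement expressions, local labels, and labels as values); toy_barrier() is a stand-in for the kernel's barrier(), and the header struct and magic number are made up for the example.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's barrier(): a compiler-only fence. */
#define toy_barrier()	__asm__ __volatile__("" ::: "memory")

/* Same shape as the macro in the patch; requires GNU C extensions. */
#define toy_this_address \
	({ __label__ __here; __here: toy_barrier(); &&__here; })

/* Made-up block header and magic value for the example. */
struct toy_hdr {
	unsigned int	magic;
	unsigned int	level;
};

#define TOY_MAGIC	0x41425442U

/* Return NULL if the header is ok, else the address of the failed check. */
static void *
toy_check_hdr(
	const struct toy_hdr	*hdr,
	unsigned int		level)
{
	if (hdr->magic != TOY_MAGIC)
		return toy_this_address;
	if (hdr->level != level)
		return toy_this_address;
	return NULL;
}
```

A caller can print the returned pointer with %p and resolve it against the binary's symbols to see which check fired, which is how the scrubber is expected to use the kernel version.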

* [PATCH v2 05/30] xfs: create inode pointer verifiers
  2017-10-12  1:41 ` [PATCH 05/30] xfs: create inode pointer verifiers Darrick J. Wong
  2017-10-12 20:23   ` Darrick J. Wong
@ 2017-10-16 19:49   ` Darrick J. Wong
  2017-10-16 23:53     ` Dave Chinner
  1 sibling, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-16 19:49 UTC (permalink / raw)
  To: linux-xfs; +Cc: Dave Chinner

Create some helper functions to check that inode pointers point to
somewhere within the filesystem and not at the static AG metadata.
Move xfs_internal_inum and create a directory inode check function.
We will use these functions in scrub and elsewhere.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: fix names, fix incorrect results returned from agino_range, remove
unnecessary tests
---
 fs/xfs/libxfs/xfs_dir2.c   |   19 +--------
 fs/xfs/libxfs/xfs_ialloc.c |   90 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_ialloc.h |    7 +++
 fs/xfs/xfs_itable.c        |   10 -----
 fs/xfs/xfs_itable.h        |    2 -
 5 files changed, 100 insertions(+), 28 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index ccf9783..ee5e916 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -30,6 +30,7 @@
 #include "xfs_bmap.h"
 #include "xfs_dir2.h"
 #include "xfs_dir2_priv.h"
+#include "xfs_ialloc.h"
 #include "xfs_error.h"
 #include "xfs_trace.h"
 
@@ -202,22 +203,8 @@ xfs_dir_ino_validate(
 	xfs_mount_t	*mp,
 	xfs_ino_t	ino)
 {
-	xfs_agblock_t	agblkno;
-	xfs_agino_t	agino;
-	xfs_agnumber_t	agno;
-	int		ino_ok;
-	int		ioff;
-
-	agno = XFS_INO_TO_AGNO(mp, ino);
-	agblkno = XFS_INO_TO_AGBNO(mp, ino);
-	ioff = XFS_INO_TO_OFFSET(mp, ino);
-	agino = XFS_OFFBNO_TO_AGINO(mp, agblkno, ioff);
-	ino_ok =
-		agno < mp->m_sb.sb_agcount &&
-		agblkno < mp->m_sb.sb_agblocks &&
-		agblkno != 0 &&
-		ioff < (1 << mp->m_sb.sb_inopblog) &&
-		XFS_AGINO_TO_INO(mp, agno, agino) == ino;
+	bool		ino_ok = xfs_verify_dir_ino(mp, ino);
+
 	if (unlikely(XFS_TEST_ERROR(!ino_ok, mp, XFS_ERRTAG_DIR_INO_VALIDATE))) {
 		xfs_warn(mp, "Invalid inode number 0x%Lx",
 				(unsigned long long) ino);
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index dfd6439..e11f8af 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -2664,3 +2664,93 @@ xfs_ialloc_pagi_init(
 		xfs_trans_brelse(tp, bp);
 	return 0;
 }
+
+/* Calculate the first and last possible inode number in an AG. */
+void
+xfs_ialloc_agino_range(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_agino_t		*first,
+	xfs_agino_t		*last)
+{
+	xfs_agblock_t		bno;
+	xfs_agblock_t		eoag;
+
+	eoag = xfs_ag_block_count(mp, agno);
+
+	/*
+	 * Calculate the first inode, which will be in the first
+	 * cluster-aligned block after the AGFL.
+	 */
+	bno = round_up(XFS_AGFL_BLOCK(mp) + 1,
+			xfs_ialloc_cluster_alignment(mp));
+	*first = XFS_OFFBNO_TO_AGINO(mp, bno, 0);
+
+	/*
+	 * Calculate the last inode, which will be at the end of the
+	 * last (aligned) cluster that can be allocated in the AG.
+	 */
+	bno = round_down(eoag, xfs_ialloc_cluster_alignment(mp));
+	*last = XFS_OFFBNO_TO_AGINO(mp, bno, 0) - 1;
+}
+
+/*
+ * Verify that an AG inode number pointer neither points outside the AG
+ * nor points at static metadata.
+ */
+bool
+xfs_verify_agino(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_agino_t		agino)
+{
+	xfs_agino_t		first;
+	xfs_agino_t		last;
+
+	xfs_ialloc_agino_range(mp, agno, &first, &last);
+	return agino >= first && agino <= last;
+}
+
+/*
+ * Verify that an FS inode number pointer neither points outside the
+ * filesystem nor points at static AG metadata.
+ */
+bool
+xfs_verify_ino(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino)
+{
+	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, ino);
+	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, ino);
+
+	if (agno >= mp->m_sb.sb_agcount)
+		return false;
+	if (XFS_AGINO_TO_INO(mp, agno, agino) != ino)
+		return false;
+	return xfs_verify_agino(mp, agno, agino);
+}
+
+/* Is this an internal inode number? */
+bool
+xfs_internal_inum(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino)
+{
+	return ino == mp->m_sb.sb_rbmino || ino == mp->m_sb.sb_rsumino ||
+		(xfs_sb_version_hasquota(&mp->m_sb) &&
+		 xfs_is_quota_inode(&mp->m_sb, ino));
+}
+
+/*
+ * Verify that a directory entry's inode number doesn't point at an internal
+ * inode, empty space, or static AG metadata.
+ */
+bool
+xfs_verify_dir_ino(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino)
+{
+	if (xfs_internal_inum(mp, ino))
+		return false;
+	return xfs_verify_ino(mp, ino);
+}
diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
index b32cfb5..d2bdcd5 100644
--- a/fs/xfs/libxfs/xfs_ialloc.h
+++ b/fs/xfs/libxfs/xfs_ialloc.h
@@ -173,5 +173,12 @@ void xfs_inobt_btrec_to_irec(struct xfs_mount *mp, union xfs_btree_rec *rec,
 		struct xfs_inobt_rec_incore *irec);
 
 int xfs_ialloc_cluster_alignment(struct xfs_mount *mp);
+void xfs_ialloc_agino_range(struct xfs_mount *mp, xfs_agnumber_t agno,
+		xfs_agino_t *first, xfs_agino_t *last);
+bool xfs_verify_agino(struct xfs_mount *mp, xfs_agnumber_t agno,
+		xfs_agino_t agino);
+bool xfs_verify_ino(struct xfs_mount *mp, xfs_ino_t ino);
+bool xfs_internal_inum(struct xfs_mount *mp, xfs_ino_t ino);
+bool xfs_verify_dir_ino(struct xfs_mount *mp, xfs_ino_t ino);
 
 #endif	/* __XFS_IALLOC_H__ */
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index c393a2f..0172d0b 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -31,16 +31,6 @@
 #include "xfs_trace.h"
 #include "xfs_icache.h"
 
-int
-xfs_internal_inum(
-	xfs_mount_t	*mp,
-	xfs_ino_t	ino)
-{
-	return (ino == mp->m_sb.sb_rbmino || ino == mp->m_sb.sb_rsumino ||
-		(xfs_sb_version_hasquota(&mp->m_sb) &&
-		 xfs_is_quota_inode(&mp->m_sb, ino)));
-}
-
 /*
  * Return stat information for one inode.
  * Return 0 if ok, else errno.
diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
index 17e86e0..6ea8b39 100644
--- a/fs/xfs/xfs_itable.h
+++ b/fs/xfs/xfs_itable.h
@@ -96,6 +96,4 @@ xfs_inumbers(
 	void			__user *buffer, /* buffer with inode info */
 	inumbers_fmt_pf		formatter);
 
-int xfs_internal_inum(struct xfs_mount *mp, xfs_ino_t ino);
-
 #endif	/* __XFS_ITABLE_H__ */

^ permalink raw reply related	[flat|nested] 99+ messages in thread
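
To see what xfs_ialloc_agino_range() and xfs_verify_agino() in the patch above compute, here is a toy version with made-up geometry. Everything named toy_ is hypothetical; the AGFL block number, cluster alignment, and inodes-per-block values are inputs chosen for illustration, and the (block << inopblog) expression assumes the same packing as XFS_OFFBNO_TO_AGINO with offset 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-AG geometry for the example. */
struct toy_ino_geom {
	uint32_t	ag_blocks;	/* blocks in this AG */
	uint32_t	agfl_block;	/* last static header block */
	uint32_t	cluster_align;	/* inode cluster alignment, in blocks */
	uint32_t	inopblog;	/* log2(inodes per block) */
};

static uint32_t
toy_round_up(uint32_t x, uint32_t a)
{
	return ((x + a - 1) / a) * a;
}

static uint32_t
toy_round_down(uint32_t x, uint32_t a)
{
	return (x / a) * a;
}

/* Compute the first and last possible inode numbers within the AG. */
static void
toy_agino_range(
	const struct toy_ino_geom	*g,
	uint32_t			*first,
	uint32_t			*last)
{
	uint32_t			bno;

	/* First cluster-aligned block after the static headers. */
	bno = toy_round_up(g->agfl_block + 1, g->cluster_align);
	*first = bno << g->inopblog;

	/* End of the last aligned cluster that fits in the AG. */
	bno = toy_round_down(g->ag_blocks, g->cluster_align);
	*last = (bno << g->inopblog) - 1;
}

/* True if agino falls inside [first, last]. */
static bool
toy_verify_agino(
	const struct toy_ino_geom	*g,
	uint32_t			agino)
{
	uint32_t			first;
	uint32_t			last;

	toy_agino_range(g, &first, &last);
	return agino >= first && agino <= last;
}
```

With 1000 blocks per AG, an alignment of 4 blocks, and 16 inodes per block, the valid range is 64 through 15999; a trailing partial cluster is excluded by the round-down.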

* Re: [PATCH 08/30] xfs: probe the scrub ioctl
  2017-10-16  0:39   ` Dave Chinner
@ 2017-10-16 19:54     ` Darrick J. Wong
  2017-10-16 23:05       ` Dave Chinner
  0 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-16 19:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 11:39:12AM +1100, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 06:41:37PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create a probe scrubber with id 0.  This will be used by xfs_scrub to
> > probe the kernel's abilities to scrub (and repair) the metadata.
> 
> This no longer returns anything to userspace to indicate
> capabilities. I can see that the previous patch checks for
> valid/invalid input flags, so we have unknown feature
> checking in place, just not obviously through the probe function
> implementation. Can you expand this comment a little to explain
> where the supported feature checks occur and so all that is required
> here is a stub that does nothing?

Ok.  I propose:

"Create a probe scrubber with id 0.  This will be used by xfs_scrub to
probe the kernel's abilities to scrub (and repair) the metadata.  We do
this by validating the ioctl inputs from userspace, preparing the
filesystem for a scrub (or a repair) operation, and immediately
returning to userspace.  Userspace can use the returned errno and
structure state to decide (in broad terms) if scrub/repair are supported
by the running kernel."

--D

> 
> Otherwise, consider it:
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 11/30] xfs: scrub the shape of a metadata btree
  2017-10-16  1:29   ` Dave Chinner
@ 2017-10-16 20:09     ` Darrick J. Wong
  0 siblings, 0 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-16 20:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 12:29:59PM +1100, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 06:41:56PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create a function that can check the shape of a btree -- each block
> > passes basic inspection and all the pointers look ok.  In the next patch
> > we'll add the ability to check the actual keys and records stored within
> > the btree.  Add some helper functions so that we report detailed scrub
> > errors in a uniform manner in dmesg.  These are helper functions for
> > subsequent patches.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Minor thing:
> 
> >  /*
> > + * Check a btree pointer.  Returns true if it's ok to use this pointer.
> > + * Callers do not need to set the corrupt flag.
> > + */
> > +static bool
> > +xfs_scrub_btree_ptr_ok(
> > +	struct xfs_scrub_btree		*bs,
> > +	int				level,
> > +	union xfs_btree_ptr		*ptr)
> > +{
> > +	bool				res;
> > +
> > +	/* A btree rooted in an inode has no block pointer to the root. */
> > +	if ((bs->cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) &&
> > +	    level == bs->cur->bc_nlevels)
> > +		return true;
> > +
> > +	/* Otherwise, check the pointers. */
> > +	if (bs->cur->bc_flags & XFS_BTREE_LONG_PTRS) {
> > +		res = xfs_btree_check_lptr(bs->cur, be64_to_cpu(ptr->l), level);
> > +		if (!res)
> > +			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, level);
> > +	} else {
> > +		res = xfs_btree_check_sptr(bs->cur, be32_to_cpu(ptr->s), level);
> > +		if (!res)
> > +			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, level);
> > +	}
> 
> We should already know what type of btree we are scrubbing, so I
> think this can be simplified to a single
> xfs_scrub_btree_set_corrupt() tracepoint.

Ok to both.
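
A minimal userspace sketch of the simplification, with both pointer
widths funnelling into one corruption report.  Every demo_* name is a
hypothetical stand-in and the range checks are placeholders, not the
real xfs_btree_check_lptr/sptr logic:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the btree cursor plus scrub state. */
struct demo_cursor {
	bool		long_ptrs;	/* models XFS_BTREE_LONG_PTRS */
	uint64_t	max_block;	/* valid pointers are below this */
	int		corrupt_count;	/* models the corrupt flag/tracepoint */
};

/* Placeholder check for a 64-bit pointer (assumption, not kernel logic). */
static bool demo_check_lptr(struct demo_cursor *cur, uint64_t ptr)
{
	return ptr != 0 && ptr < cur->max_block;
}

/* Placeholder check for a 32-bit pointer (assumption, not kernel logic). */
static bool demo_check_sptr(struct demo_cursor *cur, uint32_t ptr)
{
	return ptr != 0 && ptr < cur->max_block;
}

static void demo_set_corrupt(struct demo_cursor *cur)
{
	cur->corrupt_count++;
}

/* Returns true if the pointer is ok; corruption is flagged in one place. */
static bool demo_ptr_ok(struct demo_cursor *cur, uint64_t ptr)
{
	bool	res;

	if (cur->long_ptrs)
		res = demo_check_lptr(cur, ptr);
	else
		res = demo_check_sptr(cur, (uint32_t)ptr);
	if (!res)
		demo_set_corrupt(cur);
	return res;
}
```

The same shape would apply to the block checks in the get_block helper:
pick the checker by pointer width, then test the result once.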

> > +STATIC int
> > +xfs_scrub_btree_get_block(
> > +	struct xfs_scrub_btree		*bs,
> > +	int				level,
> > +	union xfs_btree_ptr		*pp,
> > +	struct xfs_btree_block		**pblock,
> > +	struct xfs_buf			**pbp)
> > +{
> > +	void				*failed_at;
> > +	int				error;
> > +
> > +	error = xfs_btree_lookup_get_block(bs->cur, level, pp, pblock);
> > +	if (!xfs_scrub_btree_process_error(bs->sc, bs->cur, level, &error) ||
> > +	    !pblock)
> > +		return error;
> > +
> > +	xfs_btree_get_block(bs->cur, level, pbp);
> > +	if (bs->cur->bc_flags & XFS_BTREE_LONG_PTRS) {
> > +		failed_at = __xfs_btree_check_lblock(bs->cur, *pblock,
> > +				level, *pbp);
> > +		if (failed_at) {
> > +			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, level);
> > +			return 0;
> > +		}
> > +	} else {
> > +		failed_at = __xfs_btree_check_sblock(bs->cur, *pblock,
> > +				 level, *pbp);
> > +		if (failed_at) {
> > +			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, level);
> > +			return 0;
> > +		}
> > +	}
> 
> And same here.
> 
> > diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> > index a7c3361..414bbb8 100644
> > --- a/fs/xfs/scrub/common.h
> > +++ b/fs/xfs/scrub/common.h
> > @@ -21,6 +21,24 @@
> >  #define __XFS_SCRUB_COMMON_H__
> >  
> >  /*
> > + * We /could/ terminate a scrub/repair operation early.  If we're not
> > + * in a good place to continue (fatal signal, etc.) then bail out.
> > + * Note that we're careful not to make any judgements about *error.
> > + */
> > +static inline bool
> > +xfs_scrub_should_terminate(
> > +	struct xfs_scrub_context	*sc,
> > +	int				*error)
> > +{
> > +	if (fatal_signal_pending(current)) {
> > +		if (*error == 0)
> > +			*error = -EAGAIN;
> > +		return true;
> > +	}
> > +	return false;
> > +}
> 
> Probably should move that to the original scrub infrastructure
> patch.

Will do.  It's in this patch because (iirc) this is the first time
anything uses it.

--D

> Otherwise looks fine.
> 
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 17/30] xfs: scrub free space btrees
  2017-10-16  2:25   ` Dave Chinner
@ 2017-10-16 20:36     ` Darrick J. Wong
  0 siblings, 0 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-16 20:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 01:25:58PM +1100, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 06:42:35PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Check the extent records of the free space btrees to ensure that the values
> > look sane.
> 
> Minor thing:
> 
> > +/* Scrub a bnobt/cntbt record. */
> > +STATIC int
> > +xfs_scrub_allocbt_helper(
> 
> xfs_scrub_allocbt_rec()
> 
> Reads much more nicely with this name. :P

<nod> Will change as I go through the patches.

--D

> Otherwise, consider it:
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 22/30] xfs: scrub inode block mappings
  2017-10-16  3:26   ` Dave Chinner
@ 2017-10-16 20:43     ` Darrick J. Wong
  0 siblings, 0 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-16 20:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 02:26:01PM +1100, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 06:43:06PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Scrub an individual inode's block mappings to make sure they make sense.
> 
> ....
> 
> > +/* Set us up with an inode's bmap. */
> > +int
> > +xfs_scrub_setup_inode_bmap(
> > +	struct xfs_scrub_context	*sc,
> > +	struct xfs_inode		*ip)
> > +{
> > +	struct xfs_mount		*mp = sc->mp;
> > +	int				error;
> > +
> > +	error = xfs_scrub_get_inode(sc, ip);
> > +	if (error)
> > +		goto out;
> > +
> > +	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
> > +	xfs_ilock(sc->ip, sc->ilock_flags);
> > +
> > +	/*
> > +	 * We don't want any ephemeral data fork updates sitting around
> > +	 * while we inspect block mappings, so wait for directio to finish
> > +	 * and flush dirty data if we have delalloc reservations.
> > +	 */
> > +	if (S_ISREG(VFS_I(sc->ip)->i_mode) &&
> > +	    sc->sm->sm_type == XFS_SCRUB_TYPE_BMBTD) {
> > +		inode_dio_wait(VFS_I(sc->ip));
> > +		error = filemap_write_and_wait(VFS_I(sc->ip)->i_mapping);
> > +		if (error)
> > +			goto out;
> > +
> > +		/* Drop the page cache if we're repairing block mappings. */
> > +		if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) {
> > +			error = invalidate_inode_pages2(
> > +					VFS_I(sc->ip)->i_mapping);
> > +			if (error)
> > +				goto out;
> 
> I'll point this out just to say I've seen it. It's a little out of
> place for this patch set, but it's harmless.

Oops, I'll move this to the repair patches since I'm already reworking
this patch anyway.

> > +/* Scrub a bmbt record. */
> > +STATIC int
> > +xfs_scrub_bmapbt_helper(
> 
> s/helper/rec/
> 
> > + *
> > + * First we scan every record in every btree block, if applicable.
> > + * Then we unconditionally scan the incore extent cache.
> > + */
> > +STATIC int
> > +xfs_scrub_bmap(
> > +	struct xfs_scrub_context	*sc,
> > +	int				whichfork)
> > +{
> > +	struct xfs_bmbt_irec		irec;
> > +	struct xfs_scrub_bmap_info	info = {0};
> > +	struct xfs_mount		*mp = sc->mp;
> > +	struct xfs_inode		*ip = sc->ip;
> > +	struct xfs_ifork		*ifp;
> > +	xfs_fileoff_t			endoff;
> > +	xfs_extnum_t			idx;
> > +	bool				found;
> > +	int				error = 0;
> > +
> > +	ifp = XFS_IFORK_PTR(ip, whichfork);
> > +
> > +	info.is_rt = whichfork == XFS_DATA_FORK && XFS_IS_REALTIME_INODE(ip);
> > +	info.whichfork = whichfork;
> > +	info.is_shared = whichfork == XFS_DATA_FORK && xfs_is_reflink_inode(ip);
> > +	info.sc = sc;
> > +
> > +	switch (whichfork) {
> > +	case XFS_COW_FORK:
> > +		/* Non-existent CoW forks are ignorable. */
> > +		if (!ifp)
> > +			goto out;
> > +		/* No CoW forks on non-reflink inodes/filesystems. */
> > +		if (!xfs_is_reflink_inode(ip)) {
> > +			xfs_scrub_ino_set_corrupt(sc, sc->ip->i_ino, NULL);
> > +			goto out;
> > +		}
> > +		break;
> > +	case XFS_ATTR_FORK:
> > +		if (!ifp)
> > +			goto out;
> > +		if (!xfs_sb_version_hasattr(&mp->m_sb) &&
> > +		    !xfs_sb_version_hasattr2(&mp->m_sb))
> > +			xfs_scrub_ino_set_corrupt(sc, sc->ip->i_ino, NULL);
> > +		break;
> > +	}
> 
> Missing a default option here for other values. Some compilers will
> warn about this.

Ok.
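
Roughly what the switch looks like with an explicit default, as a
standalone sketch.  The DEMO_* constants and the function name are
made up for illustration and only model the "is a missing fork
ignorable" part of the decision:

```c
/* Hypothetical fork identifiers standing in for XFS_*_FORK. */
enum { DEMO_DATA_FORK, DEMO_ATTR_FORK, DEMO_COW_FORK };

/* Returns 1 if a missing fork of this type is ignorable, else 0. */
static int demo_missing_fork_is_ignorable(int whichfork)
{
	switch (whichfork) {
	case DEMO_COW_FORK:
	case DEMO_ATTR_FORK:
		/* These forks are optional, so a missing one is fine. */
		return 1;
	default:
		/* Data fork or any other value: nothing to skip. */
		return 0;
	}
}
```

With the default arm in place, compilers that warn about switches
lacking one (for example gcc with -Wswitch-default) stay quiet.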

> Otherwise this looks fine.
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 24/30] xfs: scrub directory metadata
  2017-10-16  4:29   ` Dave Chinner
@ 2017-10-16 20:46     ` Darrick J. Wong
  0 siblings, 0 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-16 20:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 03:29:47PM +1100, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 06:43:19PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Scrub the hash tree and all the entries in a directory.
> 
> .....
> > +/* Check that an inode's mode matches a given DT_ type. */
> > +STATIC int
> > +xfs_scrub_dir_check_ftype(
> > +	struct xfs_scrub_dir_ctx	*sdc,
> > +	xfs_fileoff_t			offset,
> > +	xfs_ino_t			inum,
> > +	int				dtype)
> > +{
> > +	struct xfs_mount		*mp = sdc->sc->mp;
> > +	struct xfs_inode		*ip;
> > +	int				ino_dtype;
> > +	int				error = 0;
> > +
> > +	if (!xfs_sb_version_hasftype(&mp->m_sb)) {
> > +		if (dtype != DT_UNKNOWN && dtype != DT_DIR)
> > +			xfs_scrub_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
> > +					offset);
> > +		goto out;
> > +	}
> > +
> > +	error = xfs_iget(mp, sdc->sc->tp, inum, XFS_IGET_DONTCACHE, 0, &ip);
> > +	if (!xfs_scrub_fblock_process_error(sdc->sc, XFS_DATA_FORK, offset,
> > +			&error))
> > +		goto out;
> > +
> > +	/* Convert mode to the DT_* values that dir_emit uses. */
> > +	ino_dtype = (VFS_I(ip)->i_mode & S_IFMT) >> 12;
> 
> xfs_mode_to_ftype() ?

Yep, will fix.
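
For reference, the open-coded shift works on Linux because the DT_*
values are the S_IFMT file type bits shifted right by 12.  A small
userspace sketch of that relationship (demo_mode_to_dtype is a
hypothetical name, and this is not the xfs helper itself):

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* expose DT_* and S_IFMT on glibc */
#endif
#include <dirent.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Convert an inode mode to the DT_* value that readdir reports. */
static unsigned char demo_mode_to_dtype(mode_t mode)
{
	return (unsigned char)((mode & S_IFMT) >> 12);
}
```

Going through a named helper keeps the magic shift out of the scrub
code, which is the point of the review comment.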

--D

> Otherwise it looks ok.
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 28/30] xfs: scrub directory parent pointers
  2017-10-16  5:09   ` Dave Chinner
@ 2017-10-16 21:46     ` Darrick J. Wong
  2017-10-16 23:30       ` Dave Chinner
  0 siblings, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-16 21:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 04:09:28PM +1100, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 06:43:47PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Scrub parent pointers, sort of.  For directories, we can ride the
> > '..' entry up to the parent to confirm that there's at most one
> > dentry that points back to this directory.
> 
> ....
> 
> > +/* Count the number of dentries in the parent dir that point to this inode. */
> > +STATIC int
> > +xfs_scrub_parent_count_parent_dentries(
> > +	struct xfs_scrub_context	*sc,
> > +	struct xfs_inode		*parent,
> > +	xfs_nlink_t			*nlink)
> > +{
> > +	struct xfs_scrub_parent_ctx	spc = {
> > +		.dc.actor = xfs_scrub_parent_actor,
> > +		.dc.pos = 0,
> > +		.ino = sc->ip->i_ino,
> > +		.nlink = 0,
> > +	};
> > +	struct xfs_ifork		*ifp;
> > +	size_t				bufsize;
> > +	loff_t				oldpos;
> > +	uint				lock_mode;
> > +	int				error;
> > +
> > +	/*
> > +	 * Load the parent directory's extent map.  A regular directory
> > +	 * open would start readahead (and thus load the extent map)
> > +	 * before we even got to a readdir call, but this isn't
> > +	 * guaranteed here.
> > +	 */
> > +	lock_mode = xfs_ilock_data_map_shared(parent);
> > +	ifp = XFS_IFORK_PTR(parent, XFS_DATA_FORK);
> > +	if (XFS_IFORK_FORMAT(parent, XFS_DATA_FORK) == XFS_DINODE_FMT_BTREE &&
> > +	    !(ifp->if_flags & XFS_IFEXTENTS)) {
> > +		error = xfs_iread_extents(sc->tp, parent, XFS_DATA_FORK);
> > +		if (error) {
> > +			xfs_iunlock(parent, lock_mode);
> > +			return error;
> > +		}
> > +	}
> > +	xfs_iunlock(parent, lock_mode);
> 
> Why not just do what xfs_dir_open() does? i.e.
> 
>         /*
>          * If there are any blocks, read-ahead block 0 as we're almost
>          * certain to have the next operation be a read there.
>          */
>         mode = xfs_ilock_data_map_shared(ip);
>         if (ip->i_d.di_nextents > 0)
>                 error = xfs_dir3_data_readahead(ip, 0, -1);
>         xfs_iunlock(ip, mode);

Ok.

> > +	/*
> > +	 * Iterate the parent dir to confirm that there is
> > +	 * exactly one entry pointing back to the inode being
> > +	 * scanned.
> > +	 */
> > +	bufsize = (size_t)min_t(loff_t, 32768, parent->i_d.di_size);
> 
> Perhaps we need a define for that 32k magic number now it's being
> used in multiple places?

Magic indeed; glibc uses the minimum of (4 * BUFSIZ) or (sizeof(struct
dirent)); musl uses a static 2k buffer; dietlibc uses (PAGE_SIZE -
sizeof(structure header))...

...what would we call it?

/*
 * The Linux API doesn't pass the total size of the buffer we read into
 * down to the filesystem.  With the filldir concept it's not
 * needed for correct information, but the XFS dir2 leaf code wants an
 * estimate of the buffer size to calculate its readahead window and
 * size the buffers used for mapping to physical blocks.
 *
 * Try to give it an estimate that's good enough, maybe at some point we
 * can change the ->readdir prototype to include the buffer size.  For
 * now we use the current glibc buffer size.
 */
#define XFS_DEFAULT_READDIR_BUFSIZE	(32768)

(As a side question, do we want to bump this up to a full pagesize on
architectures that have 64k pages?  I'd probably just leave it, but
let's see if anyone running those architectures complains or sends in a
patch?)
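
A sketch of how the callers might then size the buffer, clamping the
estimate to the directory size.  DEMO_READDIR_BUFSIZE and
demo_readdir_bufsize are placeholder names, and the sketch assumes the
directory size is never negative:

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder for the proposed define; same 32k value as above. */
#define DEMO_READDIR_BUFSIZE	(32768)

/*
 * Pick the readdir buffer size estimate: the directory size if it is
 * smaller than the default, otherwise the default.  Assumes
 * dir_size >= 0.
 */
static size_t demo_readdir_bufsize(int64_t dir_size)
{
	if (dir_size < DEMO_READDIR_BUFSIZE)
		return (size_t)dir_size;
	return DEMO_READDIR_BUFSIZE;
}
```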

> > +	oldpos = 0;
> > +	while (true) {
> > +		error = xfs_readdir(sc->tp, parent, &spc.dc, bufsize);
> > +		if (error)
> > +			goto out;
> > +		if (oldpos == spc.dc.pos)
> > +			break;
> > +		oldpos = spc.dc.pos;
> > +	}
> > +	*nlink = spc.nlink;
> > +out:
> > +	return error;
> > +}
> > +
> > +/* Scrub a parent pointer. */
> > +int
> > +xfs_scrub_parent(
> > +	struct xfs_scrub_context	*sc)
> > +{
> > +	struct xfs_mount		*mp = sc->mp;
> > +	struct xfs_inode		*dp = NULL;
> > +	xfs_ino_t			dnum;
> > +	xfs_nlink_t			expected_nlink;
> > +	xfs_nlink_t			nlink;
> > +	int				tries = 0;
> > +	int				error;
> > +
> > +	/*
> > +	 * If we're a directory, check that the '..' link points up to
> > +	 * a directory that has one entry pointing to us.
> > +	 */
> > +	if (!S_ISDIR(VFS_I(sc->ip)->i_mode))
> > +		return -ENOENT;
> > +
> > +	/* We're not a special inode, are we? */
> > +	if (!xfs_verify_dir_ino_ptr(mp, sc->ip->i_ino)) {
> > +		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
> > +		goto out;
> > +	}
> > +
> > +	/*
> > +	 * If we're an unlinked directory, the parent /won't/ have a link
> > +	 * to us.  Otherwise, it should have one link.
> > +	 */
> > +	expected_nlink = VFS_I(sc->ip)->i_nlink == 0 ? 0 : 1;
> > +
> > +	/*
> > +	 * The VFS grabs a read or write lock via i_rwsem before it reads
> > +	 * or writes to a directory.  If we've gotten this far we've
> > +	 * already obtained IOLOCK_EXCL, which (since 4.10) is the same as
> > +	 * getting a write lock on i_rwsem.  Therefore, it is safe for us
> > +	 * to drop the ILOCK here in order to do directory lookups.
> > +	 */
> > +	sc->ilock_flags &= ~(XFS_ILOCK_EXCL | XFS_MMAPLOCK_EXCL);
> > +	xfs_iunlock(sc->ip, XFS_ILOCK_EXCL | XFS_MMAPLOCK_EXCL);
> > +
> > +	/* Look up '..' */
> > +	error = xfs_dir_lookup(sc->tp, sc->ip, &xfs_name_dotdot, &dnum, NULL);
> > +	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
> > +		goto out;
> > +	if (!xfs_verify_dir_ino_ptr(mp, dnum)) {
> > +		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
> > +		goto out;
> > +	}
> > +
> > +	/* Is this the root dir?  Then '..' must point to itself. */
> > +	if (sc->ip == mp->m_rootip) {
> > +		if (sc->ip->i_ino != mp->m_sb.sb_rootino ||
> > +		    sc->ip->i_ino != dnum)
> > +			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
> > +		return 0;
> > +	}
> 
> All good to here.
> 
> > +try_again:
> > +	/* Otherwise, '..' must not point to ourselves. */
> > +	if (sc->ip->i_ino == dnum) {
> > +		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
> > +		goto out;
> > +	}
> > +
> > +	error = xfs_iget(mp, sc->tp, dnum, XFS_IGET_DONTCACHE, 0, &dp);
> > +	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
> > +		goto out;
> > +	if (dp == sc->ip) {
> > +		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
> > +		goto out_rele;
> > +	}
> > +
> > +	/*
> > +	 * We prefer to keep the inode locked while we lock and search
> > +	 * its alleged parent for a forward reference.  However, this
> > +	 * child -> parent scheme can deadlock with the parent -> child
> > +	 * scheme that is normally used.  Therefore, if we can lock the
> > +	 * parent, just validate the references and get out.
> > +	 */
> > +	if (xfs_ilock_nowait(dp, XFS_IOLOCK_SHARED)) {
> > +		error = xfs_scrub_parent_count_parent_dentries(sc, dp, &nlink);
> > +		if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0,
> > +				&error))
> > +			goto out_unlock;
> > +		if (nlink != expected_nlink)
> > +			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
> > +		goto out_unlock;
> > +	}
> > +
> > +	/*
> > +	 * The game changes if we get here.  We failed to lock the parent,
> > +	 * so we're going to try to verify both pointers while only holding
> > +	 * one lock so as to avoid deadlocking with something that's actually
> > +	 * trying to traverse down the directory tree.
> > +	 */
> > +	xfs_iunlock(sc->ip, sc->ilock_flags);
> > +	sc->ilock_flags = 0;
> > +	xfs_ilock(dp, XFS_IOLOCK_SHARED);
> > +
> > +	/* Go looking for our dentry. */
> > +	error = xfs_scrub_parent_count_parent_dentries(sc, dp, &nlink);
> > +	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
> > +		goto out_unlock;
> > +
> > +	/* Drop the parent lock, relock this inode. */
> > +	xfs_iunlock(dp, XFS_IOLOCK_SHARED);
> > +	sc->ilock_flags = XFS_IOLOCK_EXCL;
> > +	xfs_ilock(sc->ip, sc->ilock_flags);
> > +
> > +	/* Look up '..' to see if the inode changed. */
> > +	error = xfs_dir_lookup(sc->tp, sc->ip, &xfs_name_dotdot, &dnum, NULL);
> > +	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
> > +		goto out_rele;
> > +
> > +	/* Drat, parent changed.  Try again! */
> > +	if (dnum != dp->i_ino) {
> > +		iput(VFS_I(dp));
> > +		tries++;
> > +		if (tries < 20)
> > +			goto try_again;
> > +		xfs_scrub_set_incomplete(sc);
> > +		goto out;
> > +	}
> > +	iput(VFS_I(dp));
> 
> Can you factor this into a loop and function?
> 
> 	do {
> 		valid = xfs_scrub_parent_validate(&error)
> 		if (error)
> 			goto out_unlock;
> 	} while (!valid && ++retries < 20)

Ok.
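
Something along these lines, as a standalone model of the control flow
only.  The demo_* names and the "parent changes N times" state are
invented for illustration; the real validate function would do the
locking and lookups from the patch:

```c
#include <stdbool.h>

#define DEMO_MAX_TRIES	20

/* Hypothetical state: the parent stops changing after N attempts. */
struct demo_state {
	int	changes_left;	/* how many more times '..' will change */
	int	calls;		/* number of validation passes made */
};

/* One validation pass; returns true once the result is stable. */
static bool demo_parent_validate(struct demo_state *st, int *error)
{
	st->calls++;
	*error = 0;
	if (st->changes_left > 0) {
		st->changes_left--;
		return false;	/* parent changed, caller should retry */
	}
	return true;
}

/* Returns 0 if validated, 1 if incomplete, or a negative error. */
static int demo_scrub_parent(struct demo_state *st)
{
	bool	valid;
	int	error;
	int	tries = 0;

	do {
		valid = demo_parent_validate(st, &error);
		if (error)
			return error;
	} while (!valid && ++tries < DEMO_MAX_TRIES);

	return valid ? 0 : 1;
}
```

The "incomplete" return models the xfs_scrub_set_incomplete() call
that the goto version makes after running out of tries.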

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 21/30] xfs: scrub inodes
  2017-10-16  3:16     ` Dave Chinner
@ 2017-10-16 22:08       ` Darrick J. Wong
  0 siblings, 0 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-16 22:08 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 02:16:47PM +1100, Dave Chinner wrote:
> On Thu, Oct 12, 2017 at 03:32:50PM -0700, Darrick J. Wong wrote:
> > On Wed, Oct 11, 2017 at 06:43:00PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Scrub the fields within an inode.
> 
> .....
> 
> > > +
> > > +/*
> > > + * Given an inode and the scrub control structure, grab either the
> > > + * inode referenced in the control structure or the inode passed in.
> > > + * The inode is not locked.
> > > + */
> > > +int
> > > +xfs_scrub_get_inode(
> > > +	struct xfs_scrub_context	*sc,
> > > +	struct xfs_inode		*ip_in)
> > > +{
> > > +	struct xfs_mount		*mp = sc->mp;
> > > +	struct xfs_inode		*ip = NULL;
> > > +	int				error;
> > > +
> > > +	/*
> > > +	 * If userspace passed us an AG number or a generation number
> > > +	 * without an inode number, they haven't got a clue so bail out
> > > +	 * immediately.
> > > +	 */
> > > +	if (sc->sm->sm_agno || (sc->sm->sm_gen && !sc->sm->sm_ino))
> > > +		return -EINVAL;
> > > +
> > > +	/* We want to scan the inode we already had opened. */
> > > +	if (sc->sm->sm_ino == 0 || sc->sm->sm_ino == ip_in->i_ino) {
> > > +		sc->ip = ip_in;
> > > +		return 0;
> > > +	}
> > > +
> > > +	/* Look up the inode, see if the generation number matches. */
> > > +	if (xfs_internal_inum(mp, sc->sm->sm_ino))
> > > +		return -ENOENT;
> > > +	error = xfs_iget(mp, NULL, sc->sm->sm_ino,
> > > +			XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE, 0, &ip);
> > > +	if (error == -ENOENT || error == -EINVAL) {
> > > +		/* inode doesn't exist... */
> > > +		return -ENOENT;
> > > +	} else if (error) {
> > > +		trace_xfs_scrub_op_error(sc,
> > > +				XFS_INO_TO_AGNO(mp, sc->sm->sm_ino),
> > > +				XFS_INO_TO_AGBNO(mp, sc->sm->sm_ino),
> > > +				error, __return_address);
> > > +		return error;
> > > +	}
> > > +	if (VFS_I(ip)->i_generation != sc->sm->sm_gen) {
> > > +		iput(VFS_I(ip));
> > > +		return -ENOENT;
> > > +	}
> > > +
> > > +	sc->ip = ip;
> > > +	return 0;
> > > +}
> 
> Much nicer with the way everything is clearly spelled out :P
> 
> > > +/* Inode core */
> > > +
> > > +/*
> > > + * di_extsize hint validation is somewhat cumbersome. Rules are:
> > > + *
> > > + * 1. extent size hint is only valid for directories and regular files
> > > + * 2. DIFLAG_EXTSIZE is only valid for regular files
> > > + * 3. DIFLAG_EXTSZINHERIT is only valid for directories.
> > > + * 4. extsize hint of 0 turns off hints, clears inode flags.
> > > + * 5. either flag must be set if extsize != 0
> > > + * 6. Extent size must be a multiple of the appropriate block size.
> > > + * 7. extent size hint cannot be longer than maximum extent length
> > > + * 8. for non-realtime files, the extent size hint must be limited
> > > + *    to half the AG size to avoid alignment extending the extent
> > > + *    beyond the limits of the AG.
> > > + */
> > > +STATIC void
> > > +xfs_scrub_inode_extsize(
> > > +	struct xfs_scrub_context	*sc,
> > > +	struct xfs_buf			*bp,
> > > +	struct xfs_dinode		*dip,
> > > +	xfs_ino_t			ino,
> > > +	uint16_t			mode,
> > > +	uint16_t			flags)
> > > +{
> > > +	struct xfs_mount		*mp = sc->mp;
> > > +	bool				rt_flag;
> > > +	bool				hint_flag;
> > > +	bool				inherit_flag;
> > > +	uint32_t			extsize;
> > > +	uint32_t			extsize_bytes;
> > > +	uint32_t			blocksize_bytes;
> > > +
> > > +	rt_flag = (flags & XFS_DIFLAG_REALTIME);
> > > +	hint_flag = (flags & XFS_DIFLAG_EXTSIZE);
> > > +	inherit_flag = (flags & XFS_DIFLAG_EXTSZINHERIT);
> > > +	extsize = be32_to_cpu(dip->di_extsize);
> > > +	extsize_bytes = XFS_FSB_TO_B(sc->mp, extsize);
> > > +
> > > +	if (rt_flag)
> > > +		blocksize_bytes = mp->m_sb.sb_rextsize << mp->m_sb.sb_blocklog;
> > > +	else
> > > +		blocksize_bytes = mp->m_sb.sb_blocksize;
> > > +
> > > +	if ((hint_flag || inherit_flag) && (!S_ISDIR(mode) && !S_ISREG(mode)))
> 
> Logic is correct but reads funny:
> 
> 	if ((hint_flag || inherit_flag) &&
> 	    !(S_ISREG(mode) || S_ISDIR(mode)))

Ok.  Fixed this and the cowextsize.
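
The two forms are the same predicate by De Morgan's law.  A tiny
standalone check, with hypothetical function names and plain booleans
standing in for the flag and S_ISDIR/S_ISREG tests:

```c
#include <stdbool.h>

/* Original form: flag set, and neither a directory nor a regular file. */
static bool demo_bad_hint_v1(bool has_flag, bool is_dir, bool is_reg)
{
	return has_flag && (!is_dir && !is_reg);
}

/* Suggested form: flag set, and not (regular file or directory). */
static bool demo_bad_hint_v2(bool has_flag, bool is_dir, bool is_reg)
{
	return has_flag && !(is_reg || is_dir);
}
```

Both return true only when a hint flag is set on something that is
neither a regular file nor a directory.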

> > > +/*
> > > + * di_cowextsize hint validation is somewhat cumbersome. Rules are:
> > > + *
> > > + * 1. flag requires reflink feature
> > > + * 2. cow extent size hint is only valid for directories and regular files
> > > + * 3. cow extsize hint of 0 turns off hints, clears inode flags.
> > > + * 4. either flag must be set if cow extsize != 0
> > > + * 5. flag cannot be set for rt files
> > > + * 6. Extent size must be a multiple of the appropriate block size.
> > > + * 7. extent size hint cannot be longer than maximum extent length
> > > + * 8. the extent size hint must be limited
> > > + *    to half the AG size to avoid alignment extending the extent
> > > + *    beyond the limits of the AG.
> > > + */
> 
> Perhaps this comment doesn't need duplicating for a 3rd time. Maybe
> for both di_extsize and di_cowextsize just say:
> 
> /*
>  * Extent size hints have explicit rules. They are documented at
>  * xfs_ioctl_setattr_check_extsize() - these functions need to be
>  * kept in sync with each other.
>  */

Ok.  I've also amended the comment at xfs_ioctl_setattr_check_extsize to
remind people to keep the scrub version in sync.

> > > +STATIC void
> > > +xfs_scrub_inode_cowextsize(
> > > +	struct xfs_scrub_context	*sc,
> > > +	struct xfs_buf			*bp,
> > > +	struct xfs_dinode		*dip,
> > > +	xfs_ino_t			ino,
> > > +	uint16_t			mode,
> > > +	uint16_t			flags,
> > > +	uint64_t			flags2)
> > > +{
> > > +	struct xfs_mount		*mp = sc->mp;
> > > +	bool				rt_flag;
> > > +	bool				hint_flag;
> > > +	uint32_t			extsize;
> > > +	uint32_t			extsize_bytes;
> > > +
> > > +	rt_flag = (flags & XFS_DIFLAG_REALTIME);
> > > +	hint_flag = (flags2 & XFS_DIFLAG2_COWEXTSIZE);
> > > +	extsize = be32_to_cpu(dip->di_extsize);
> > 
> > Doh, this ought to be extsize = be32_to_cpu(dip->di_cowextsize); will fix.
> 
> Yup, with that fix in place all the spurious inode warnings I was
> getting went away.
> 
> > > +/* Map and read a raw inode. */
> > > +STATIC int
> > > +xfs_scrub_inode_map_raw(
> > > +	struct xfs_scrub_context	*sc,
> > > +	xfs_ino_t			ino,
> > > +	struct xfs_buf			**bpp,
> > > +	struct xfs_dinode		**dipp)
> > > +{
> > > +	struct xfs_imap			imap;
> > > +	struct xfs_mount		*mp = sc->mp;
> > > +	struct xfs_buf			*bp;
> > > +	struct xfs_dinode		*dip;
> > > +	int				error;
> > > +
> > > +	error = xfs_imap(mp, sc->tp, ino, &imap, XFS_IGET_UNTRUSTED);
> > > +	if (error == -EINVAL) {
> > > +		/*
> > > +		 * Inode could have gotten deleted out from under us;
> > > +		 * just forget about it.
> > > +		 */
> > > +		error = -ENOENT;
> > > +		goto out;
> > > +	}
> > > +	if (!xfs_scrub_process_error(sc, XFS_INO_TO_AGNO(mp, ino),
> > > +			XFS_INO_TO_AGBNO(mp, ino), &error))
> > > +		goto out;
> > > +
> > > +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> > > +			imap.im_blkno, imap.im_len, XBF_UNMAPPED, &bp,
> > > +			NULL);
> > > +	if (!xfs_scrub_process_error(sc, XFS_INO_TO_AGNO(mp, ino),
> > > +			XFS_INO_TO_AGBNO(mp, ino), &error))
> > > +		goto out;
> > > +
> > > +	/* Is this really an inode? */
> > > +	bp->b_ops = &xfs_inode_buf_ops;
> 
> A comment here on why we skip the read verifier when pulling in the
> inode buffer would be nice.

/*
 * Is this really an inode?  We disabled verifiers in the above
 * xfs_trans_read_buf call because the inode buffer verifier
 * fails on /any/ inode record in the inode cluster with a bad
 * magic or version number, not just the one that we're
 * checking.  Therefore, grab the buffer unconditionally, attach
 * the inode verifiers by hand, and run the inode verifier only
 * on the one inode we want.
 */

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 18/30] xfs: scrub inode btrees
  2017-10-16  2:55   ` Dave Chinner
@ 2017-10-16 22:16     ` Darrick J. Wong
  0 siblings, 0 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-16 22:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 01:55:59PM +1100, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 06:42:41PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Check the records of the inode btrees to make sure that the values
> > make sense given the inode records themselves.
> 
> .....
> 
> I think maybe I missed this first time around...
> 
> > +/* Check a particular inode with ir_free. */
> > +STATIC int
> > +xfs_scrub_iallocbt_check_cluster_freemask(
> > +	struct xfs_scrub_btree		*bs,
> > +	xfs_ino_t			fsino,
> > +	xfs_agino_t			chunkino,
> > +	xfs_agino_t			clusterino,
> > +	struct xfs_inobt_rec_incore	*irec,
> > +	struct xfs_buf			*bp)
> > +{
> > +	struct xfs_dinode		*dip;
> > +	struct xfs_mount		*mp = bs->cur->bc_mp;
> > +	bool				freemask_ok;
> > +	bool				inuse;
> > +	int				error;
> > +
> > +	dip = xfs_buf_offset(bp, clusterino * mp->m_sb.sb_inodesize);
> > +	if (be16_to_cpu(dip->di_magic) != XFS_DINODE_MAGIC ||
> > +	    (dip->di_version >= 3 &&
> > +	     be64_to_cpu(dip->di_ino) != fsino + clusterino)) {
> > +		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
> > +		goto out;
> > +	}
> > +
> > +	freemask_ok = (irec->ir_free & XFS_INOBT_MASK(chunkino + clusterino));
> 
> Ok, so if the inode is free, the corresponding bit in the mask
> will be set....
> 
> > +	error = xfs_icache_inode_is_allocated(mp, bs->cur->bc_tp,
> > +			fsino + clusterino, &inuse);
> > +	if (error == -ENODATA) {
> > +		/* Not cached, just read the disk buffer */
> > +		freemask_ok ^= !!(dip->di_mode);
> 
> And this uses the lowest bit of the mask? How does that work?
> 
> /me spends 10 minutes trying to work out this function before he
> realises that freemask_ok is a boolean, so the initial freemask
> bit is collapsed down to a single bit.
> 
> Ok, that's definitely unexpected and not obvious from the code or
> comments. The name "freemask_ok" is misleading the way it's used.
> The first time it is set it means "inode is free", then after this
> operation it means "inode matches free mask"....

Ugh, sorry.  Thank you for complaining about this!

> 
> > +		if (!bs->sc->try_harder && !freemask_ok)
> > +			return -EDEADLOCK;
> > +	} else if (error < 0) {
> > +		/*
> > +		 * Inode is only half assembled, or there was an IO error,
> > +		 * or the verifier failed, so don't bother trying to check.
> > +		 * The inode scrubber can deal with this.
> > +		 */
> > +		freemask_ok = true;
> 
> And here it means "we didn't check the free mask"
> 
> > +	} else {
> > +		/* Inode is all there. */
> > +		freemask_ok ^= inuse;
> 
> And here it means "inode matches free mask" again....
> 
> > +	}
> > +	if (!freemask_ok)
> > +		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
> > +out:
> > +	return 0;
> 
> Can we rewrite this to be a little more obvious?
> 
> 	bool		inode_is_free = false;
> 	bool		freemask_ok;
> ....
> 	if (irec->ir_free & XFS_INOBT_MASK(chunkino + clusterino))
> 		inode_is_free = true;
> ....
> 	if (error == -ENODATA) {
> 		freemask_ok = inode_is_free ^ !!(dip->di_mode);
> ....
> 	else if (error < 0) {
> 		/*
> 		 * Inode is only half assembled, .....
> 		 */
> 		goto out;
> 	} else {
> 		freemask_ok = inode_is_free ^ inuse;
> 	}
> 
> That's a lot more obvious what the code is checking...

Done.
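
A userspace model of the reworked decision, to show the three
outcomes.  The demo_* names and the verdict enum are hypothetical, and
the try_harder/-EDEADLOCK retry path is left out to keep it short:

```c
#include <errno.h>
#include <stdbool.h>

enum demo_verdict { DEMO_OK, DEMO_CORRUPT, DEMO_SKIPPED };

/*
 * inode_is_free: the ir_free bit for this inode is set.
 * lookup_error:  result of the inode cache lookup (0, -ENODATA, other).
 * inuse:         cache says the inode is allocated (lookup_error == 0).
 * mode_nonzero:  on-disk di_mode is nonzero (used on a cache miss).
 */
static enum demo_verdict
demo_check_freemask(bool inode_is_free, int lookup_error, bool inuse,
		    bool mode_nonzero)
{
	bool	freemask_ok;

	if (lookup_error == -ENODATA) {
		/* Not cached, fall back to the disk buffer. */
		freemask_ok = inode_is_free ^ mode_nonzero;
	} else if (lookup_error < 0) {
		/* Could not tell; leave it to the inode scrubber. */
		return DEMO_SKIPPED;
	} else {
		freemask_ok = inode_is_free ^ inuse;
	}
	return freemask_ok ? DEMO_OK : DEMO_CORRUPT;
}
```

An inode that is marked free and also in use (or marked allocated and
not in use) comes out as corrupt; a failed lookup is skipped rather
than being folded into the same boolean.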

> 
> > +/* Scrub an inobt/finobt record. */
> > +STATIC int
> > +xfs_scrub_iallocbt_helper(
> 
> s/helper/rec/

Done.

--D

> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH 25/30] xfs: scrub directory freespace
  2017-10-16  4:49   ` Dave Chinner
@ 2017-10-16 22:37     ` Darrick J. Wong
  2017-10-16 23:11       ` Darrick J. Wong
  2017-10-16 23:14       ` Dave Chinner
  0 siblings, 2 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-16 22:37 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 03:49:21PM +1100, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 06:43:26PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Check the free space information in a directory.
> 
> ....
> 
> > diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
> > index e2a8f90..a41310f 100644
> > --- a/fs/xfs/scrub/dir.c
> > +++ b/fs/xfs/scrub/dir.c
> > @@ -250,6 +250,426 @@ xfs_scrub_dir_rec(
> >  	return error;
> >  }
> >  
> > +/*
> > + * Is this unused entry either in the bestfree or smaller than all of them?
> > + * We assume the bestfrees are sorted longest to shortest, and that there
> > + * aren't any bogus entries.
> 
> s/We assume/We've already checked/

Ok.

> > + */
> > +static inline void
> > +xfs_scrub_directory_check_free_entry(
> > +	struct xfs_scrub_context	*sc,
> > +	xfs_dablk_t			lblk,
> > +	struct xfs_dir2_data_free	*bf,
> > +	struct xfs_dir2_data_unused	*dup)
> 
> ....
> 
> > +	while (ptr < endptr) {
> > +		dup = (struct xfs_dir2_data_unused *)ptr;
> > +		/* Skip real entries */
> > +		if (dup->freetag != cpu_to_be16(XFS_DIR2_DATA_FREE_TAG)) {
> > +			struct xfs_dir2_data_entry	*dep;
> > +
> > +			dep = (struct xfs_dir2_data_entry *)ptr;
> > +			newlen = d_ops->data_entsize(dep->namelen);
> > +			if (newlen <= 0) {
> > +				xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK,
> > +						lblk);
> > +				goto out_buf;
> > +			}
> > +			ptr += newlen;
> > +			if (endptr < ptr)
> > +				xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK,
> > +					      lblk);
> > +			continue;
> > +		}
> > +
> > +		/* Spot check this free entry */
> > +		tag = be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup));
> > +		if (tag != ((char *)dup - (char *)bp->b_addr))
> > +			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
> > +
> > +		/*
> > +		 * Either this entry is a bestfree or it's smaller than
> > +		 * any of the bestfrees.
> > +		 */
> > +		xfs_scrub_directory_check_free_entry(sc, lblk, bf, dup);
> > +
> > +		/* Move on. */
> > +		newlen = be16_to_cpu(dup->length);
> > +		if (newlen <= 0) {
> > +			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
> > +			goto out_buf;
> > +		}
> > +		ptr += newlen;
> > +		if (endptr < ptr)
> > +			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
> 
> I'd prefer this matches the loop logic order. ie.
> 
> 		if (ptr >= endptr)
> 

Ok.  That check can be lifted out of the loop anyway.
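
Roughly what I mean by lifting the check, as a standalone sketch rather
than the real directory code.  It assumes a made-up layout where every
record starts with a native-endian 16-bit length covering the whole
record; walk_records() and that layout are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Walk a buffer of variable-length records (hypothetical layout: each
 * record begins with a 16-bit native-endian length that includes the
 * length field itself).  Returns true only if the records tile the
 * buffer exactly.
 */
static bool
walk_records(const unsigned char *buf, size_t size, unsigned int *nr_recs)
{
	size_t		off = 0;
	uint16_t	len;

	*nr_recs = 0;
	while (off < size) {
		/* Not enough room left to even read a length field. */
		if (size - off < sizeof(len))
			return false;
		memcpy(&len, buf + off, sizeof(len));
		/* A zero length would never advance; treat it as corrupt. */
		if (len == 0)
			return false;
		off += len;
		(*nr_recs)++;
	}

	/* The overrun check happens once, after the loop. */
	return off == size;
}
```

The sketch uses offsets instead of raw pointers so that stepping past
the end of the buffer never forms an out-of-range pointer.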

> > +		else
> > +			nr_frees++;
> > +	}
> > +
> > +	/* Did we see at least as many free slots as there are bestfrees? */
> > +	if (nr_frees < nr_bestfrees)
> > +		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
> > +out_buf:
> > +	xfs_trans_brelse(sc->tp, bp);
> > +out:
> > +	return error;
> > +}
> > +
> > +/*
> > + * Does the free space length in the free space index block ($len) match
> > + * the longest length in the directory data block's bestfree array?
> > + * Assume that we've already checked that the data block's bestfree
> > + * array is in order.
> > + */
> > +static inline void
> > +xfs_scrub_directory_check_freesp(
> 
> No need for inline here, the compiler will do that automatically if
> appropriate.

Ok.  I'll make it STATIC like everything else here.

> > +	struct xfs_scrub_context	*sc,
> > +	xfs_dablk_t			lblk,
> > +	struct xfs_buf			*dbp,
> > +	unsigned int			len)
> > +{
> > +	struct xfs_dir2_data_free	*bf;
> > +	struct xfs_dir2_data_free	*dfp;
> > +	int				offset;
> > +
> > +	if (len == 0)
> > +		return;
> > +
> > +	bf = sc->ip->d_ops->data_bestfree_p(dbp->b_addr);
> > +	for (dfp = &bf[0]; dfp < &bf[XFS_DIR2_DATA_FD_COUNT]; dfp++) {
> > +		offset = be16_to_cpu(dfp->offset);
> > +		if (offset == 0)
> > +			break;
> > +		if (len == be16_to_cpu(dfp->length))
> > +			return;
> > +		/* Didn't find the best length in the bestfree data */
> > +		break;
> > +	}
> > +
> > +	xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
> > +}
> > +
> > +/* Check free space info in a directory leaf1 block. */
> > +STATIC int
> > +xfs_scrub_directory_leaf1_bestfree(
> > +	struct xfs_scrub_context	*sc,
> > +	struct xfs_da_args		*args,
> > +	xfs_dablk_t			lblk)
> > +{
> > +	struct xfs_dir2_leaf_tail	*ltp;
> > +	struct xfs_buf			*dbp;
> > +	struct xfs_buf			*bp;
> > +	struct xfs_mount		*mp = sc->mp;
> > +	__be16				*bestp;
> > +	__u16				best;
> > +	int				i;
> > +	int				error;
> > +
> > +	/*
> > +	 * Read the free space block.  The verifier will check for hash
> > +	 * value ordering problems and check the stale entry count.
> > +	 */
> > +	error = xfs_dir3_leaf_read(sc->tp, sc->ip, lblk, -1, &bp);
> > +	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
> > +		goto out;
> > +
> > +	/* Check all the entries. */
> > +	ltp = xfs_dir2_leaf_tail_p(mp->m_dir_geo, bp->b_addr);
> > +	bestp = xfs_dir2_leaf_bests_p(ltp);
> > +	for (i = 0; i < be32_to_cpu(ltp->bestcount); i++, bestp++) {
> > +		best = be16_to_cpu(*bestp);
> > +		if (best == NULLDATAOFF)
> > +			continue;
> 
> Count stale entries, check if matches hdr->stale ?

This should already be done by xfs_dir3_leaf_read ->
xfs_dir3_leaf1_read_verify -> __read_verify -> xfs_dir3_leaf_verify ->
xfs_dir3_leaf_check_int, right?

> 
> > +		error = xfs_dir3_data_read(sc->tp, sc->ip,
> > +				i * args->geo->fsbcount, -1, &dbp);
> > +		if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk,
> > +				&error))
> > +			continue;
> > +		xfs_scrub_directory_check_freesp(sc, lblk, dbp, best);
> > +		xfs_trans_brelse(sc->tp, dbp);
> > +	}
> > +out:
> > +	return error;
> > +}
> > +
> > +/* Check free space info in a directory freespace block. */
> > +STATIC int
> > +xfs_scrub_directory_free_bestfree(
> > +	struct xfs_scrub_context	*sc,
> > +	struct xfs_da_args		*args,
> > +	xfs_dablk_t			lblk)
> > +{
> > +	struct xfs_dir3_icfree_hdr	freehdr;
> > +	struct xfs_buf			*dbp;
> > +	struct xfs_buf			*bp;
> > +	__be16				*bestp;
> > +	__be16				best;
> > +	int				i;
> > +	int				error;
> > +
> > +	/* Read the free space block */
> > +	error = xfs_dir2_free_read(sc->tp, sc->ip, lblk, &bp);
> > +	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
> > +		goto out;
> > +
> > +	/* Check all the entries. */
> > +	sc->ip->d_ops->free_hdr_from_disk(&freehdr, bp->b_addr);
> > +	bestp = sc->ip->d_ops->free_bests_p(bp->b_addr);
> > +	for (i = 0; i < freehdr.nvalid; i++, bestp++) {
> > +		best = be16_to_cpu(*bestp);
> > +		if (best == NULLDATAOFF)
> > +			continue;
> 
> Count stale entries, check freehdr.nvalid + stale = freehdr.nused?

Aha, yes, that needs to be checked.

Also that for loop needs to be terminated on i < freehdr.nused.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH 08/30] xfs: probe the scrub ioctl
  2017-10-16 19:54     ` Darrick J. Wong
@ 2017-10-16 23:05       ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-16 23:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 12:54:36PM -0700, Darrick J. Wong wrote:
> On Mon, Oct 16, 2017 at 11:39:12AM +1100, Dave Chinner wrote:
> > On Wed, Oct 11, 2017 at 06:41:37PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Create a probe scrubber with id 0.  This will be used by xfs_scrub to
> > > probe the kernel's abilities to scrub (and repair) the metadata.
> > 
> > > This no longer returns anything to userspace to indicate
> > capabilities. I can see that the previous patch checks for
> > valid/invalid input flags, so we have unknown feature
> > checking in place, just not obviously through the probe function
> > implementation. Can you expand this comment a little to explain
> > where the supported feature checks occur and so all that is required
> > here is a stub that does nothing?
> 
> Ok.  I propose:
> 
> "Create a probe scrubber with id 0.  This will be used by xfs_scrub to
> probe the kernel's abilities to scrub (and repair) the metadata.  We do
> this by validating the ioctl inputs from userspace, preparing the
> filesystem for a scrub (or a repair) operation, and immediately
> returning to userspace.  Userspace can use the returned errno and
> structure state to decide (in broad terms) if scrub/repair are supported
> by the running kernel."

Yup, that works for me. :P
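
As a rough illustration of "use the returned errno to decide", a
caller might classify the probe result along these lines.  This is a
sketch under assumptions: classify_probe() and the enum are invented
names, ENOTTY is the usual errno for an ioctl the kernel does not
recognise, and treating EOPNOTSUPP as "not supported" is a guess at
what a kernel built without scrub would return.

```c
#include <errno.h>

/* Hypothetical verdicts; not taken from xfs_scrub. */
enum probe_verdict {
	PROBE_SUPPORTED,
	PROBE_UNSUPPORTED,
	PROBE_INCONCLUSIVE,
};

/*
 * ret is the ioctl return value, err is the errno captured right after
 * the call.  Anything other than the two "not supported" codes is left
 * inconclusive so the caller can report it instead of guessing.
 */
static enum probe_verdict
classify_probe(int ret, int err)
{
	if (ret == 0)
		return PROBE_SUPPORTED;
	if (err == ENOTTY || err == EOPNOTSUPP)
		return PROBE_UNSUPPORTED;
	return PROBE_INCONCLUSIVE;
}
```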

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 25/30] xfs: scrub directory freespace
  2017-10-16 22:37     ` Darrick J. Wong
@ 2017-10-16 23:11       ` Darrick J. Wong
  2017-10-16 23:14       ` Dave Chinner
  1 sibling, 0 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-16 23:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 03:37:08PM -0700, Darrick J. Wong wrote:
> On Mon, Oct 16, 2017 at 03:49:21PM +1100, Dave Chinner wrote:
> > On Wed, Oct 11, 2017 at 06:43:26PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Check the free space information in a directory.
> > 
> > ....
> > 
> > > diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
> > > index e2a8f90..a41310f 100644
> > > --- a/fs/xfs/scrub/dir.c
> > > +++ b/fs/xfs/scrub/dir.c
> > > @@ -250,6 +250,426 @@ xfs_scrub_dir_rec(
> > >  	return error;
> > >  }
> > >  
> > > +/*
> > > + * Is this unused entry either in the bestfree or smaller than all of them?
> > > + * We assume the bestfrees are sorted longest to shortest, and that there
> > > + * aren't any bogus entries.
> > 
> > s/We assume/We've already checked/
> 
> Ok.
> 
> > > + */
> > > +static inline void
> > > +xfs_scrub_directory_check_free_entry(
> > > +	struct xfs_scrub_context	*sc,
> > > +	xfs_dablk_t			lblk,
> > > +	struct xfs_dir2_data_free	*bf,
> > > +	struct xfs_dir2_data_unused	*dup)
> > 
> > ....
> > 
> > > +	while (ptr < endptr) {
> > > +		dup = (struct xfs_dir2_data_unused *)ptr;
> > > +		/* Skip real entries */
> > > +		if (dup->freetag != cpu_to_be16(XFS_DIR2_DATA_FREE_TAG)) {
> > > +			struct xfs_dir2_data_entry	*dep;
> > > +
> > > +			dep = (struct xfs_dir2_data_entry *)ptr;
> > > +			newlen = d_ops->data_entsize(dep->namelen);
> > > +			if (newlen <= 0) {
> > > +				xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK,
> > > +						lblk);
> > > +				goto out_buf;
> > > +			}
> > > +			ptr += newlen;
> > > +			if (endptr < ptr)
> > > +				xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK,
> > > +					      lblk);
> > > +			continue;
> > > +		}
> > > +
> > > +		/* Spot check this free entry */
> > > +		tag = be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup));
> > > +		if (tag != ((char *)dup - (char *)bp->b_addr))
> > > +			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
> > > +
> > > +		/*
> > > +		 * Either this entry is a bestfree or it's smaller than
> > > +		 * any of the bestfrees.
> > > +		 */
> > > +		xfs_scrub_directory_check_free_entry(sc, lblk, bf, dup);
> > > +
> > > +		/* Move on. */
> > > +		newlen = be16_to_cpu(dup->length);
> > > +		if (newlen <= 0) {
> > > +			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
> > > +			goto out_buf;
> > > +		}
> > > +		ptr += newlen;
> > > +		if (endptr < ptr)
> > > +			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
> > 
> > I'd prefer this matches the loop logic order. ie.
> > 
> > 		if (ptr >= endptr)
> > 
> 
> Ok.  That check can be lifted out of the loop anyway.
> 
> > > +		else
> > > +			nr_frees++;
> > > +	}
> > > +
> > > +	/* Did we see at least as many free slots as there are bestfrees? */
> > > +	if (nr_frees < nr_bestfrees)
> > > +		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
> > > +out_buf:
> > > +	xfs_trans_brelse(sc->tp, bp);
> > > +out:
> > > +	return error;
> > > +}
> > > +
> > > +/*
> > > + * Does the free space length in the free space index block ($len) match
> > > + * the longest length in the directory data block's bestfree array?
> > > + * Assume that we've already checked that the data block's bestfree
> > > + * array is in order.
> > > + */
> > > +static inline void
> > > +xfs_scrub_directory_check_freesp(
> > 
> > No need for inline here, the compiler will do that automatically if
> > appropriate.
> 
> Ok.  I'll make it STATIC like everything else here.
> 
> > > +	struct xfs_scrub_context	*sc,
> > > +	xfs_dablk_t			lblk,
> > > +	struct xfs_buf			*dbp,
> > > +	unsigned int			len)
> > > +{
> > > +	struct xfs_dir2_data_free	*bf;
> > > +	struct xfs_dir2_data_free	*dfp;
> > > +	int				offset;
> > > +
> > > +	if (len == 0)
> > > +		return;
> > > +
> > > +	bf = sc->ip->d_ops->data_bestfree_p(dbp->b_addr);
> > > +	for (dfp = &bf[0]; dfp < &bf[XFS_DIR2_DATA_FD_COUNT]; dfp++) {
> > > +		offset = be16_to_cpu(dfp->offset);
> > > +		if (offset == 0)
> > > +			break;
> > > +		if (len == be16_to_cpu(dfp->length))
> > > +			return;
> > > +		/* Didn't find the best length in the bestfree data */
> > > +		break;
> > > +	}
> > > +
> > > +	xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
> > > +}
> > > +
> > > +/* Check free space info in a directory leaf1 block. */
> > > +STATIC int
> > > +xfs_scrub_directory_leaf1_bestfree(
> > > +	struct xfs_scrub_context	*sc,
> > > +	struct xfs_da_args		*args,
> > > +	xfs_dablk_t			lblk)
> > > +{
> > > +	struct xfs_dir2_leaf_tail	*ltp;
> > > +	struct xfs_buf			*dbp;
> > > +	struct xfs_buf			*bp;
> > > +	struct xfs_mount		*mp = sc->mp;
> > > +	__be16				*bestp;
> > > +	__u16				best;
> > > +	int				i;
> > > +	int				error;
> > > +
> > > +	/*
> > > +	 * Read the free space block.  The verifier will check for hash
> > > +	 * value ordering problems and check the stale entry count.
> > > +	 */
> > > +	error = xfs_dir3_leaf_read(sc->tp, sc->ip, lblk, -1, &bp);
> > > +	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
> > > +		goto out;
> > > +
> > > +	/* Check all the entries. */
> > > +	ltp = xfs_dir2_leaf_tail_p(mp->m_dir_geo, bp->b_addr);
> > > +	bestp = xfs_dir2_leaf_bests_p(ltp);
> > > +	for (i = 0; i < be32_to_cpu(ltp->bestcount); i++, bestp++) {
> > > +		best = be16_to_cpu(*bestp);
> > > +		if (best == NULLDATAOFF)
> > > +			continue;
> > 
> > Count stale entries, check if matches hdr->stale ?
> 
> This should already be done by xfs_dir3_leaf_read ->
> xfs_dir3_leaf1_read_verify -> __read_verify -> xfs_dir3_leaf_verify ->
> xfs_dir3_leaf_check_int, right?
> 
> > 
> > > +		error = xfs_dir3_data_read(sc->tp, sc->ip,
> > > +				i * args->geo->fsbcount, -1, &dbp);
> > > +		if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk,
> > > +				&error))
> > > +			continue;
> > > +		xfs_scrub_directory_check_freesp(sc, lblk, dbp, best);
> > > +		xfs_trans_brelse(sc->tp, dbp);
> > > +	}
> > > +out:
> > > +	return error;
> > > +}
> > > +
> > > +/* Check free space info in a directory freespace block. */
> > > +STATIC int
> > > +xfs_scrub_directory_free_bestfree(
> > > +	struct xfs_scrub_context	*sc,
> > > +	struct xfs_da_args		*args,
> > > +	xfs_dablk_t			lblk)
> > > +{
> > > +	struct xfs_dir3_icfree_hdr	freehdr;
> > > +	struct xfs_buf			*dbp;
> > > +	struct xfs_buf			*bp;
> > > +	__be16				*bestp;
> > > +	__be16				best;
> > > +	int				i;
> > > +	int				error;
> > > +
> > > +	/* Read the free space block */
> > > +	error = xfs_dir2_free_read(sc->tp, sc->ip, lblk, &bp);
> > > +	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
> > > +		goto out;
> > > +
> > > +	/* Check all the entries. */
> > > +	sc->ip->d_ops->free_hdr_from_disk(&freehdr, bp->b_addr);
> > > +	bestp = sc->ip->d_ops->free_bests_p(bp->b_addr);
> > > +	for (i = 0; i < freehdr.nvalid; i++, bestp++) {
> > > +		best = be16_to_cpu(*bestp);
> > > +		if (best == NULLDATAOFF)
> > > +			continue;
> > 
> > Count stale entries, check freehdr.nvalid + stale = freehdr.nused?
> 
> Aha, yes, that needs to be checked.
> 
> Also that for loop needs to be terminated on i < freehdr.nused.

(Er... NAK on that last sentence; it's the documentation that is wrong.)

--D

> 
> --D
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com


* Re: [PATCH 25/30] xfs: scrub directory freespace
  2017-10-16 22:37     ` Darrick J. Wong
  2017-10-16 23:11       ` Darrick J. Wong
@ 2017-10-16 23:14       ` Dave Chinner
  2017-10-16 23:38         ` Darrick J. Wong
  1 sibling, 1 reply; 99+ messages in thread
From: Dave Chinner @ 2017-10-16 23:14 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 03:37:08PM -0700, Darrick J. Wong wrote:
> On Mon, Oct 16, 2017 at 03:49:21PM +1100, Dave Chinner wrote:
> > On Wed, Oct 11, 2017 at 06:43:26PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Check the free space information in a directory.
> > 
> > ....
> > 
> > > +/* Check free space info in a directory leaf1 block. */
> > > +STATIC int
> > > +xfs_scrub_directory_leaf1_bestfree(
> > > +	struct xfs_scrub_context	*sc,
> > > +	struct xfs_da_args		*args,
> > > +	xfs_dablk_t			lblk)
> > > +{
> > > +	struct xfs_dir2_leaf_tail	*ltp;
> > > +	struct xfs_buf			*dbp;
> > > +	struct xfs_buf			*bp;
> > > +	struct xfs_mount		*mp = sc->mp;
> > > +	__be16				*bestp;
> > > +	__u16				best;
> > > +	int				i;
> > > +	int				error;
> > > +
> > > +	/*
> > > +	 * Read the free space block.  The verifier will check for hash
> > > +	 * value ordering problems and check the stale entry count.
> > > +	 */
> > > +	error = xfs_dir3_leaf_read(sc->tp, sc->ip, lblk, -1, &bp);
> > > +	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
> > > +		goto out;
> > > +
> > > +	/* Check all the entries. */
> > > +	ltp = xfs_dir2_leaf_tail_p(mp->m_dir_geo, bp->b_addr);
> > > +	bestp = xfs_dir2_leaf_bests_p(ltp);
> > > +	for (i = 0; i < be32_to_cpu(ltp->bestcount); i++, bestp++) {
> > > +		best = be16_to_cpu(*bestp);
> > > +		if (best == NULLDATAOFF)
> > > +			continue;
> > 
> > Count stale entries, check if matches hdr->stale ?
> 
> This should already be done by xfs_dir3_leaf_read ->
> xfs_dir3_leaf1_read_verify -> __read_verify -> xfs_dir3_leaf_verify ->
> xfs_dir3_leaf_check_int, right?

True. However, it's simple to do and we're already iterating over
all the structures necessary and detecting the case necessary to
check it, so it doesn't hurt and might catch in-core corruptions
before they hit the write verifier?

And it makes it do the same checks as the free_bestfree checks
below....

> > > +		if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk,
> > > +				&error))
> > > +			continue;
> > > +		xfs_scrub_directory_check_freesp(sc, lblk, dbp, best);
> > > +		xfs_trans_brelse(sc->tp, dbp);
> > > +	}
> > > +out:
> > > +	return error;
> > > +}
> > > +
> > > +/* Check free space info in a directory freespace block. */
> > > +STATIC int
> > > +xfs_scrub_directory_free_bestfree(
> > > +	struct xfs_scrub_context	*sc,
> > > +	struct xfs_da_args		*args,
> > > +	xfs_dablk_t			lblk)
> > > +{
> > > +	struct xfs_dir3_icfree_hdr	freehdr;
> > > +	struct xfs_buf			*dbp;
> > > +	struct xfs_buf			*bp;
> > > +	__be16				*bestp;
> > > +	__be16				best;
> > > +	int				i;
> > > +	int				error;
> > > +
> > > +	/* Read the free space block */
> > > +	error = xfs_dir2_free_read(sc->tp, sc->ip, lblk, &bp);
> > > +	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
> > > +		goto out;
> > > +
> > > +	/* Check all the entries. */
> > > +	sc->ip->d_ops->free_hdr_from_disk(&freehdr, bp->b_addr);
> > > +	bestp = sc->ip->d_ops->free_bests_p(bp->b_addr);
> > > +	for (i = 0; i < freehdr.nvalid; i++, bestp++) {
> > > +		best = be16_to_cpu(*bestp);
> > > +		if (best == NULLDATAOFF)
> > > +			continue;
> > 
> > Count stale entries, check freehdr.nvalid + stale = freehdr.nused?
> 
> Aha, yes, that needs to be checked.
> 
> Also that for loop needs to be terminated on i < freehdr.nused.

Good catch - I missed that, too :P

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 28/30] xfs: scrub directory parent pointers
  2017-10-16 21:46     ` Darrick J. Wong
@ 2017-10-16 23:30       ` Dave Chinner
  2017-10-16 23:58         ` Darrick J. Wong
  0 siblings, 1 reply; 99+ messages in thread
From: Dave Chinner @ 2017-10-16 23:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 02:46:01PM -0700, Darrick J. Wong wrote:
> On Mon, Oct 16, 2017 at 04:09:28PM +1100, Dave Chinner wrote:
> > On Wed, Oct 11, 2017 at 06:43:47PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Scrub parent pointers, sort of.  For directories, we can ride the
> > > '..' entry up to the parent to confirm that there's at most one
> > > dentry that points back to this directory.
> > 
> > ....
> > 
> > > +/* Count the number of dentries in the parent dir that point to this inode. */
> > > +STATIC int
> > > +xfs_scrub_parent_count_parent_dentries(
> > > +	struct xfs_scrub_context	*sc,
> > > +	struct xfs_inode		*parent,
> > > +	xfs_nlink_t			*nlink)
> > > +{
> > > +	struct xfs_scrub_parent_ctx	spc = {
> > > +		.dc.actor = xfs_scrub_parent_actor,
> > > +		.dc.pos = 0,
> > > +		.ino = sc->ip->i_ino,
> > > +		.nlink = 0,
> > > +	};
> > > +	struct xfs_ifork		*ifp;
> > > +	size_t				bufsize;
> > > +	loff_t				oldpos;
> > > +	uint				lock_mode;
> > > +	int				error;
> > > +
> > > +	/*
> > > +	 * Load the parent directory's extent map.  A regular directory
> > > +	 * open would start readahead (and thus load the extent map)
> > > +	 * before we even got to a readdir call, but this isn't
> > > +	 * guaranteed here.
> > > +	 */
> > > +	lock_mode = xfs_ilock_data_map_shared(parent);
> > > +	ifp = XFS_IFORK_PTR(parent, XFS_DATA_FORK);
> > > +	if (XFS_IFORK_FORMAT(parent, XFS_DATA_FORK) == XFS_DINODE_FMT_BTREE &&
> > > +	    !(ifp->if_flags & XFS_IFEXTENTS)) {
> > > +		error = xfs_iread_extents(sc->tp, parent, XFS_DATA_FORK);
> > > +		if (error) {
> > > +			xfs_iunlock(parent, lock_mode);
> > > +			return error;
> > > +		}
> > > +	}
> > > +	xfs_iunlock(parent, lock_mode);
> > 
> > Why not just do what xfs_dir_open() does? i.e.
> > 
> >         /*
> >          * If there are any blocks, read-ahead block 0 as we're almost
> >          * certain to have the next operation be a read there.
> >          */
> >         mode = xfs_ilock_data_map_shared(ip);
> >         if (ip->i_d.di_nextents > 0)
> >                 error = xfs_dir3_data_readahead(ip, 0, -1);
> >         xfs_iunlock(ip, mode);
> 
> Ok.
> 
> > > +	/*
> > > +	 * Iterate the parent dir to confirm that there is
> > > +	 * exactly one entry pointing back to the inode being
> > > +	 * scanned.
> > > +	 */
> > > +	bufsize = (size_t)min_t(loff_t, 32768, parent->i_d.di_size);
> > 
> > Perhaps we need a define for that 32k magic number now it's being
> > used in multiple places?
> 
> Magic indeed; glibc uses the minimum of (4 * BUFSIZ) or (sizeof(struct
> dirent)); musl uses a static 2k buffer; dietlibc uses (PAGE_SIZE -
> sizeof(structure header))...
> 
> ...what would we call it?
> 
> /*
>  * The Linux API doesn't pass the total size of the buffer we read
>  * into down to the filesystem.  With the filldir concept it's not
>  * needed for correct information, but the XFS dir2 leaf code wants an
>  * estimate of the buffer size to calculate its readahead window and
>  * size the buffers used for mapping to physical blocks.
>  *
>  * Try to give it an estimate that's good enough, maybe at some point we
>  * can change the ->readdir prototype to include the buffer size.  For
>  * now we use the current glibc buffer size.
>  */
> #define XFS_DEFAULT_READDIR_BUFSIZE	(32768)

That looks fine, though I think XFS_READDIR_BUFSIZE (or purple!) is
sufficient.
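
Something like the following is what I have in mind, written as a
userspace sketch rather than kernel code.  XFS_READDIR_BUFSIZE is the
name proposed above; readdir_bufsize() is a hypothetical helper standing
in for the min_t() expression in the patch, and clamping a negative size
to zero is an assumption of the sketch, not something the patch does.

```c
#include <stddef.h>

/* Estimate of the userspace readdir buffer size, per the comment above. */
#define XFS_READDIR_BUFSIZE	(32768)

/*
 * Hypothetical helper: cap the buffer estimate at XFS_READDIR_BUFSIZE,
 * or use the directory size if that is smaller.  Negative sizes are
 * clamped to zero here purely to keep the sketch well defined.
 */
static size_t
readdir_bufsize(long long dir_size)
{
	if (dir_size < 0)
		return 0;
	if (dir_size < XFS_READDIR_BUFSIZE)
		return (size_t)dir_size;
	return XFS_READDIR_BUFSIZE;
}
```

With one define, all the callers that currently open-code 32768 would
share the same estimate.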

> (As a side question, do we want to bump this up to a full pagesize on
> architectures that have 64k pages?  I'd probably just leave it, but
> let's see if anyone running those architectures complains or sends in a
> patch?)

If it was to be dynamic, it should be determined by the directory
block size, not the arch page size.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH v2 04/30] xfs: refactor btree block header checking functions
  2017-10-16 19:48   ` [PATCH v2 " Darrick J. Wong
@ 2017-10-16 23:36     ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-16 23:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 12:48:03PM -0700, Darrick J. Wong wrote:
> Refactor the btree block header checks to have an internal function that
> returns the address of the failing check without logging errors.  The
> scrubber will call the internal function, while the external version
> will maintain the current logging behavior.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 25/30] xfs: scrub directory freespace
  2017-10-16 23:14       ` Dave Chinner
@ 2017-10-16 23:38         ` Darrick J. Wong
  0 siblings, 0 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-16 23:38 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Oct 17, 2017 at 10:14:13AM +1100, Dave Chinner wrote:
> On Mon, Oct 16, 2017 at 03:37:08PM -0700, Darrick J. Wong wrote:
> > On Mon, Oct 16, 2017 at 03:49:21PM +1100, Dave Chinner wrote:
> > > On Wed, Oct 11, 2017 at 06:43:26PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Check the free space information in a directory.
> > > 
> > > ....
> > > 
> > > > +/* Check free space info in a directory leaf1 block. */
> > > > +STATIC int
> > > > +xfs_scrub_directory_leaf1_bestfree(
> > > > +	struct xfs_scrub_context	*sc,
> > > > +	struct xfs_da_args		*args,
> > > > +	xfs_dablk_t			lblk)
> > > > +{
> > > > +	struct xfs_dir2_leaf_tail	*ltp;
> > > > +	struct xfs_buf			*dbp;
> > > > +	struct xfs_buf			*bp;
> > > > +	struct xfs_mount		*mp = sc->mp;
> > > > +	__be16				*bestp;
> > > > +	__u16				best;
> > > > +	int				i;
> > > > +	int				error;
> > > > +
> > > > +	/*
> > > > +	 * Read the free space block.  The verifier will check for hash
> > > > +	 * value ordering problems and check the stale entry count.
> > > > +	 */
> > > > +	error = xfs_dir3_leaf_read(sc->tp, sc->ip, lblk, -1, &bp);
> > > > +	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
> > > > +		goto out;
> > > > +
> > > > +	/* Check all the entries. */
> > > > +	ltp = xfs_dir2_leaf_tail_p(mp->m_dir_geo, bp->b_addr);
> > > > +	bestp = xfs_dir2_leaf_bests_p(ltp);
> > > > +	for (i = 0; i < be32_to_cpu(ltp->bestcount); i++, bestp++) {
> > > > +		best = be16_to_cpu(*bestp);
> > > > +		if (best == NULLDATAOFF)
> > > > +			continue;
> > > 
> > > Count stale entries, check if matches hdr->stale ?
> > 
> > This should already be done by xfs_dir3_leaf_read ->
> > xfs_dir3_leaf1_read_verify -> __read_verify -> xfs_dir3_leaf_verify ->
> > xfs_dir3_leaf_check_int, right?
> 
> True. However, it's simple to do and we're already iterating over
> all the structures necessary and detecting the case necessary to
> check it, so it doesn't hurt and might catch in-core corruptions
> before they hit the write verifier?

Yes, though the verifier rework series will (in addition to printing
instruction pointer addresses of the failing verifier checks) expose the
structure verification code via a new b_ops function pointer, and then
enhance scrub to call it.

(That said, from a completeness standpoint I don't mind duplicating the
code...)

> And it makes it do the same checks as the free_bestfree checks
> below....
> 
> > > > +		if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk,
> > > > +				&error))
> > > > +			continue;
> > > > +		xfs_scrub_directory_check_freesp(sc, lblk, dbp, best);
> > > > +		xfs_trans_brelse(sc->tp, dbp);
> > > > +	}
> > > > +out:
> > > > +	return error;
> > > > +}
> > > > +
> > > > +/* Check free space info in a directory freespace block. */
> > > > +STATIC int
> > > > +xfs_scrub_directory_free_bestfree(
> > > > +	struct xfs_scrub_context	*sc,
> > > > +	struct xfs_da_args		*args,
> > > > +	xfs_dablk_t			lblk)
> > > > +{
> > > > +	struct xfs_dir3_icfree_hdr	freehdr;
> > > > +	struct xfs_buf			*dbp;
> > > > +	struct xfs_buf			*bp;
> > > > +	__be16				*bestp;
> > > > +	__be16				best;
> > > > +	int				i;
> > > > +	int				error;
> > > > +
> > > > +	/* Read the free space block */
> > > > +	error = xfs_dir2_free_read(sc->tp, sc->ip, lblk, &bp);
> > > > +	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
> > > > +		goto out;
> > > > +
> > > > +	/* Check all the entries. */
> > > > +	sc->ip->d_ops->free_hdr_from_disk(&freehdr, bp->b_addr);
> > > > +	bestp = sc->ip->d_ops->free_bests_p(bp->b_addr);
> > > > +	for (i = 0; i < freehdr.nvalid; i++, bestp++) {
> > > > +		best = be16_to_cpu(*bestp);
> > > > +		if (best == NULLDATAOFF)
> > > > +			continue;
> > > 
> > > Count stale entries, check freehdr.nvalid + stale = freehdr.nused?
> > 
> > Aha, yes, that needs to be checked.
> > 
> > Also that for loop needs to be terminated on i < freehdr.nused.
> 
> Good catch - I missed that, too :P

Well.... the documentation says that nused is the number of entries that
have been filled out and nvalid is the number of entries that aren't
0xFFFF, but....

fhdr.firstdb = 0
fhdr.nvalid = 11
fhdr.nused = 9
fbests[0-10] = 0:0x60 1:0x8 3:0x10 4:0xffff 5:0x10 6:0x8 7:0x10 8:0xffff
9:0x180 10:0xe10

So the documentation is wrong w.r.t. whatever libxfs writes to disk. :)

"The freeindex’s hdr.nvalid should always be the same as the number of
allocated data directory blocks containing name/inode data and will
always be less than or equal to hdr.nused. The value of hdr.nused should
be the same as the index of the last data directory block plus one (i.e.
when the last data block is freed, nused and nvalid are decremented)."
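
Going by the dump rather than the documentation, the cross-check would
look something like the sketch below.  It assumes nvalid is the number
of slots in the bests array and nused is the number of slots that are
not 0xffff, which is what the dump suggests; the struct and function
names are invented for illustration and are not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Marker for "no data block here"; local name for the sketch. */
#define SKETCH_NULLDATAOFF	0xffffU

/* Hypothetical stand-in for the in-core free block header. */
struct sketch_free_hdr {
	unsigned int	nvalid;	/* assumed: number of slots in bests[] */
	unsigned int	nused;	/* assumed: slots that are not 0xffff */
};

/*
 * Count the 0xffff slots while walking all nvalid entries, then check
 * that used + stale adds up to the number of slots.
 */
static bool
free_counts_ok(const struct sketch_free_hdr *hdr, const uint16_t *bests)
{
	unsigned int	stale = 0;
	unsigned int	i;

	for (i = 0; i < hdr->nvalid; i++) {
		if (bests[i] == SKETCH_NULLDATAOFF)
			stale++;
	}
	return hdr->nused + stale == hdr->nvalid;
}
```

For the dump above that is nused = 9 plus the two listed 0xffff slots,
which matches nvalid = 11.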

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH v2 05/30] xfs: create inode pointer verifiers
  2017-10-16 19:49   ` [PATCH v2 " Darrick J. Wong
@ 2017-10-16 23:53     ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-16 23:53 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 12:49:32PM -0700, Darrick J. Wong wrote:
> Create some helper functions to check that inode pointers point to
> somewhere within the filesystem and not at the static AG metadata.
> Move xfs_internal_inum and create a directory inode check function.
> We will use these functions in scrub and elsewhere.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: fix names, fix incorrect results returned from agino_range, remove
> unnecessary tests

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 28/30] xfs: scrub directory parent pointers
  2017-10-16 23:30       ` Dave Chinner
@ 2017-10-16 23:58         ` Darrick J. Wong
  0 siblings, 0 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-16 23:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Oct 17, 2017 at 10:30:17AM +1100, Dave Chinner wrote:
> On Mon, Oct 16, 2017 at 02:46:01PM -0700, Darrick J. Wong wrote:
> > On Mon, Oct 16, 2017 at 04:09:28PM +1100, Dave Chinner wrote:
> > > On Wed, Oct 11, 2017 at 06:43:47PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Scrub parent pointers, sort of.  For directories, we can ride the
> > > > '..' entry up to the parent to confirm that there's at most one
> > > > dentry that points back to this directory.
> > > 
> > > ....
> > > 
> > > > +/* Count the number of dentries in the parent dir that point to this inode. */
> > > > +STATIC int
> > > > +xfs_scrub_parent_count_parent_dentries(
> > > > +	struct xfs_scrub_context	*sc,
> > > > +	struct xfs_inode		*parent,
> > > > +	xfs_nlink_t			*nlink)
> > > > +{
> > > > +	struct xfs_scrub_parent_ctx	spc = {
> > > > +		.dc.actor = xfs_scrub_parent_actor,
> > > > +		.dc.pos = 0,
> > > > +		.ino = sc->ip->i_ino,
> > > > +		.nlink = 0,
> > > > +	};
> > > > +	struct xfs_ifork		*ifp;
> > > > +	size_t				bufsize;
> > > > +	loff_t				oldpos;
> > > > +	uint				lock_mode;
> > > > +	int				error;
> > > > +
> > > > +	/*
> > > > +	 * Load the parent directory's extent map.  A regular directory
> > > > +	 * open would start readahead (and thus load the extent map)
> > > > +	 * before we even got to a readdir call, but this isn't
> > > > +	 * guaranteed here.
> > > > +	 */
> > > > +	lock_mode = xfs_ilock_data_map_shared(parent);
> > > > +	ifp = XFS_IFORK_PTR(parent, XFS_DATA_FORK);
> > > > +	if (XFS_IFORK_FORMAT(parent, XFS_DATA_FORK) == XFS_DINODE_FMT_BTREE &&
> > > > +	    !(ifp->if_flags & XFS_IFEXTENTS)) {
> > > > +		error = xfs_iread_extents(sc->tp, parent, XFS_DATA_FORK);
> > > > +		if (error) {
> > > > +			xfs_iunlock(parent, lock_mode);
> > > > +			return error;
> > > > +		}
> > > > +	}
> > > > +	xfs_iunlock(parent, lock_mode);
> > > 
> > > Why not just do what xfs_dir_open() does? i.e.
> > > 
> > >         /*
> > >          * If there are any blocks, read-ahead block 0 as we're almost
> > >          * certain to have the next operation be a read there.
> > >          */
> > >         mode = xfs_ilock_data_map_shared(ip);
> > >         if (ip->i_d.di_nextents > 0)
> > >                 error = xfs_dir3_data_readahead(ip, 0, -1);
> > >         xfs_iunlock(ip, mode);
> > 
> > Ok.
> > 
> > > > +	/*
> > > > +	 * Iterate the parent dir to confirm that there is
> > > > +	 * exactly one entry pointing back to the inode being
> > > > +	 * scanned.
> > > > +	 */
> > > > +	bufsize = (size_t)min_t(loff_t, 32768, parent->i_d.di_size);
> > > 
> > > Perhaps we need a define for that 32k magic number now it's being
> > > used in multiple places?
> > 
> > Magic indeed; glibc uses the maximum of (4 * BUFSIZ) or (sizeof(struct
> > dirent)); musl uses a static 2k buffer; dietlibc uses (PAGE_SIZE -
> > sizeof(structure header))...
> > 
> > ...what would we call it?
> > 
> > /*
> >  * The Linux API doesn't pass down the total size of the buffer we read
> >  * into down to the filesystem.  With the filldir concept it's not
> >  * needed for correct information, but the XFS dir2 leaf code wants an
> >  * estimate of the buffer size to calculate its readahead window and
> >  * size the buffers used for mapping to physical blocks.
> >  *
> >  * Try to give it an estimate that's good enough, maybe at some point we
> >  * can change the ->readdir prototype to include the buffer size.  For
> >  * now we use the current glibc buffer size.
> >  */
> > #define XFS_DEFAULT_READDIR_BUFSIZE	(32768)
> 
> That looks fine, though I think XFS_READDIR_BUFSIZE (or purple!) is
> sufficient.

I like the shorter name, done.
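
For reference, the effect of the named constant on the call site quoted above can be sketched in plain userspace C.  This is only an illustration: readdir_bufsize_sketch() is a hypothetical helper standing in for the min_t() expression in the patch, and dir_size stands in for parent->i_d.di_size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Estimated userspace readdir buffer size, per the comment quoted above. */
#define XFS_READDIR_BUFSIZE	(32768)

/*
 * Hypothetical helper: never ask for a readdir buffer larger than the
 * directory itself, and cap it at XFS_READDIR_BUFSIZE.
 */
static size_t
readdir_bufsize_sketch(int64_t dir_size)
{
	assert(dir_size >= 0);
	if (dir_size < XFS_READDIR_BUFSIZE)
		return (size_t)dir_size;
	return XFS_READDIR_BUFSIZE;
}
```

A 4k directory would get a 4k estimate, while anything at or above 32k gets the 32k cap.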

> > (As a side question, do we want to bump this up to a full pagesize on
> > architectures that have 64k pages?  I'd probably just leave it, but
> > let's see if anyone running those architectures complains or sends in a
> > patch?)
> 
> If it was to be dynamic, it should be determined by the directory
> block size, not the arch page size.

<nod>

--D

> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com


* [PATCH v2 18/30] xfs: scrub inode btrees
  2017-10-12  1:42 ` [PATCH 18/30] xfs: scrub inode btrees Darrick J. Wong
  2017-10-16  2:55   ` Dave Chinner
@ 2017-10-17  0:11   ` Darrick J. Wong
  2017-10-17 21:59     ` Dave Chinner
  1 sibling, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-17  0:11 UTC (permalink / raw)
  To: linux-xfs; +Cc: Dave Chinner

Check the records of the inode btrees to make sure that the values
make sense given the inode records themselves.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: fix insane freemask variable usage, shorten helper function names
---
 fs/xfs/Makefile            |    1 
 fs/xfs/libxfs/xfs_format.h |    2 
 fs/xfs/libxfs/xfs_fs.h     |    4 -
 fs/xfs/scrub/common.c      |   29 ++++
 fs/xfs/scrub/common.h      |    3 
 fs/xfs/scrub/ialloc.c      |  334 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c       |    9 +
 fs/xfs/scrub/scrub.h       |    2 
 8 files changed, 382 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/ialloc.c

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 84ac733..82326b7 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -150,6 +150,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   alloc.o \
 				   btree.o \
 				   common.o \
+				   ialloc.o \
 				   scrub.o \
 				   )
 endif
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 23229f0..154c3dd 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -518,7 +518,7 @@ static inline int xfs_sb_version_hasftype(struct xfs_sb *sbp)
 		 (sbp->sb_features2 & XFS_SB_VERSION2_FTYPE));
 }
 
-static inline int xfs_sb_version_hasfinobt(xfs_sb_t *sbp)
+static inline bool xfs_sb_version_hasfinobt(xfs_sb_t *sbp)
 {
 	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
 		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_FINOBT);
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 1e23d13..74df6ec 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -490,9 +490,11 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_AGI	4	/* AG inode header */
 #define XFS_SCRUB_TYPE_BNOBT	5	/* freesp by block btree */
 #define XFS_SCRUB_TYPE_CNTBT	6	/* freesp by length btree */
+#define XFS_SCRUB_TYPE_INOBT	7	/* inode btree */
+#define XFS_SCRUB_TYPE_FINOBT	8	/* free inode btree */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	7
+#define XFS_SCRUB_TYPE_NR	9
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 018127a..39165c3 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -40,6 +40,8 @@
 #include "xfs_refcount_btree.h"
 #include "xfs_rmap.h"
 #include "xfs_rmap_btree.h"
+#include "xfs_log.h"
+#include "xfs_trans_priv.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
@@ -451,11 +453,38 @@ xfs_scrub_setup_ag_btree(
 	struct xfs_inode		*ip,
 	bool				force_log)
 {
+	struct xfs_mount		*mp = sc->mp;
 	int				error;
 
+	/*
+	 * If the caller asks us to checkpoint the log, do so.  This
+	 * expensive operation should be performed infrequently and only
+	 * as a last resort.  Any caller that sets force_log should
+	 * document why they need to do so.
+	 */
+	if (force_log) {
+		error = xfs_scrub_checkpoint_log(mp);
+		if (error)
+			return error;
+	}
+
 	error = xfs_scrub_setup_ag_header(sc, ip);
 	if (error)
 		return error;
 
 	return xfs_scrub_ag_init(sc, sc->sm->sm_agno, &sc->sa);
 }
+
+/* Push everything out of the log onto disk. */
+int
+xfs_scrub_checkpoint_log(
+	struct xfs_mount	*mp)
+{
+	int			error;
+
+	error = _xfs_log_force(mp, XFS_LOG_SYNC, NULL);
+	if (error)
+		return error;
+	xfs_ail_push_all_sync(mp->m_ail);
+	return 0;
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 372a844..17830b8 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -73,6 +73,7 @@ void xfs_scrub_fblock_set_warning(struct xfs_scrub_context *sc, int whichfork,
 		xfs_fileoff_t offset);
 
 void xfs_scrub_set_incomplete(struct xfs_scrub_context *sc);
+int xfs_scrub_checkpoint_log(struct xfs_mount *mp);
 
 /* Setup functions */
 int xfs_scrub_setup_fs(struct xfs_scrub_context *sc, struct xfs_inode *ip);
@@ -80,6 +81,8 @@ int xfs_scrub_setup_ag_header(struct xfs_scrub_context *sc,
 			      struct xfs_inode *ip);
 int xfs_scrub_setup_ag_allocbt(struct xfs_scrub_context *sc,
 			       struct xfs_inode *ip);
+int xfs_scrub_setup_ag_iallocbt(struct xfs_scrub_context *sc,
+				struct xfs_inode *ip);
 
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
new file mode 100644
index 0000000..90e7911
--- /dev/null
+++ b/fs/xfs/scrub/ialloc.c
@@ -0,0 +1,334 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_alloc.h"
+#include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_icache.h"
+#include "xfs_rmap.h"
+#include "xfs_log.h"
+#include "xfs_trans_priv.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+
+/*
+ * Set us up to scrub inode btrees.
+ * If we detect a discrepancy between the inobt and the inode,
+ * try again after forcing logged inode cores out to disk.
+ */
+int
+xfs_scrub_setup_ag_iallocbt(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_ag_btree(sc, ip, sc->try_harder);
+}
+
+/* Inode btree scrubber. */
+
+/* Is this chunk worth checking? */
+STATIC bool
+xfs_scrub_iallocbt_chunk(
+	struct xfs_scrub_btree		*bs,
+	struct xfs_inobt_rec_incore	*irec,
+	xfs_agino_t			agino,
+	xfs_extlen_t			len)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
+	xfs_agblock_t			bno;
+
+	bno = XFS_AGINO_TO_AGBNO(mp, agino);
+	if (bno + len <= bno ||
+	    !xfs_verify_agbno(mp, agno, bno) ||
+	    !xfs_verify_agbno(mp, agno, bno + len - 1))
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+	return true;
+}
+
+/* Count the number of free inodes. */
+static unsigned int
+xfs_scrub_iallocbt_freecount(
+	xfs_inofree_t			freemask)
+{
+	BUILD_BUG_ON(sizeof(freemask) != sizeof(__u64));
+	return hweight64(freemask);
+}
+
+/* Check a particular inode with ir_free. */
+STATIC int
+xfs_scrub_iallocbt_check_cluster_freemask(
+	struct xfs_scrub_btree		*bs,
+	xfs_ino_t			fsino,
+	xfs_agino_t			chunkino,
+	xfs_agino_t			clusterino,
+	struct xfs_inobt_rec_incore	*irec,
+	struct xfs_buf			*bp)
+{
+	struct xfs_dinode		*dip;
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	bool				inode_is_free = false;
+	bool				freemask_ok;
+	bool				inuse;
+	int				error;
+
+	dip = xfs_buf_offset(bp, clusterino * mp->m_sb.sb_inodesize);
+	if (be16_to_cpu(dip->di_magic) != XFS_DINODE_MAGIC ||
+	    (dip->di_version >= 3 &&
+	     be64_to_cpu(dip->di_ino) != fsino + clusterino)) {
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+		goto out;
+	}
+
+	if (irec->ir_free & XFS_INOBT_MASK(chunkino + clusterino))
+		inode_is_free = true;
+	error = xfs_icache_inode_is_allocated(mp, bs->cur->bc_tp,
+			fsino + clusterino, &inuse);
+	if (error == -ENODATA) {
+		/* Not cached, just read the disk buffer */
+		freemask_ok = inode_is_free ^ !!(dip->di_mode);
+		if (!bs->sc->try_harder && !freemask_ok)
+			return -EDEADLOCK;
+	} else if (error < 0) {
+		/*
+		 * Inode is only half assembled, or there was an IO error,
+		 * or the verifier failed, so don't bother trying to check.
+		 * The inode scrubber can deal with this.
+		 */
+		goto out;
+	} else {
+		/* Inode is all there. */
+		freemask_ok = inode_is_free ^ inuse;
+	}
+	if (!freemask_ok)
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+out:
+	return 0;
+}
+
+/* Make sure the free mask is consistent with what the inodes think. */
+STATIC int
+xfs_scrub_iallocbt_check_freemask(
+	struct xfs_scrub_btree		*bs,
+	struct xfs_inobt_rec_incore	*irec)
+{
+	struct xfs_owner_info		oinfo;
+	struct xfs_imap			imap;
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_dinode		*dip;
+	struct xfs_buf			*bp;
+	xfs_ino_t			fsino;
+	xfs_agino_t			nr_inodes;
+	xfs_agino_t			agino;
+	xfs_agino_t			chunkino;
+	xfs_agino_t			clusterino;
+	xfs_agblock_t			agbno;
+	int				blks_per_cluster;
+	uint16_t			holemask;
+	uint16_t			ir_holemask;
+	int				error = 0;
+
+	/* Make sure the freemask matches the inode records. */
+	blks_per_cluster = xfs_icluster_size_fsb(mp);
+	nr_inodes = XFS_OFFBNO_TO_AGINO(mp, blks_per_cluster, 0);
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INODES);
+
+	for (agino = irec->ir_startino;
+	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;
+	     agino += blks_per_cluster * mp->m_sb.sb_inopblock) {
+		fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
+		chunkino = agino - irec->ir_startino;
+		agbno = XFS_AGINO_TO_AGBNO(mp, agino);
+
+		/* Compute the holemask mask for this cluster. */
+		for (clusterino = 0, holemask = 0; clusterino < nr_inodes;
+		     clusterino += XFS_INODES_PER_HOLEMASK_BIT)
+			holemask |= XFS_INOBT_MASK((chunkino + clusterino) /
+					XFS_INODES_PER_HOLEMASK_BIT);
+
+		/* The whole cluster must be a hole or not a hole. */
+		ir_holemask = (irec->ir_holemask & holemask);
+		if (ir_holemask != holemask && ir_holemask != 0) {
+			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+			continue;
+		}
+
+		/* If any part of this is a hole, skip it. */
+		if (ir_holemask)
+			continue;
+
+		/* Grab the inode cluster buffer. */
+		imap.im_blkno = XFS_AGB_TO_DADDR(mp, bs->cur->bc_private.a.agno,
+				agbno);
+		imap.im_len = XFS_FSB_TO_BB(mp, blks_per_cluster);
+		imap.im_boffset = 0;
+
+		error = xfs_imap_to_bp(mp, bs->cur->bc_tp, &imap,
+				&dip, &bp, 0, 0);
+		if (!xfs_scrub_btree_process_error(bs->sc, bs->cur, 0, &error))
+			continue;
+
+		/* Which inodes are free? */
+		for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
+			error = xfs_scrub_iallocbt_check_cluster_freemask(bs,
+					fsino, chunkino, clusterino, irec, bp);
+			if (error) {
+				xfs_trans_brelse(bs->cur->bc_tp, bp);
+				return error;
+			}
+		}
+
+		xfs_trans_brelse(bs->cur->bc_tp, bp);
+	}
+
+	return error;
+}
+
+/* Scrub an inobt/finobt record. */
+STATIC int
+xfs_scrub_iallocbt_rec(
+	struct xfs_scrub_btree		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_inobt_rec_incore	irec;
+	uint64_t			holes;
+	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
+	xfs_agino_t			agino;
+	xfs_agblock_t			agbno;
+	xfs_extlen_t			len;
+	int				holecount;
+	int				i;
+	int				error = 0;
+	unsigned int			real_freecount;
+	uint16_t			holemask;
+
+	xfs_inobt_btrec_to_irec(mp, rec, &irec);
+
+	if (irec.ir_count > XFS_INODES_PER_CHUNK ||
+	    irec.ir_freecount > XFS_INODES_PER_CHUNK)
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+	real_freecount = irec.ir_freecount +
+			(XFS_INODES_PER_CHUNK - irec.ir_count);
+	if (real_freecount != xfs_scrub_iallocbt_freecount(irec.ir_free))
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+	agino = irec.ir_startino;
+	/* Record has to be properly aligned within the AG. */
+	if (!xfs_verify_agino(mp, agno, agino) ||
+	    !xfs_verify_agino(mp, agno, agino + XFS_INODES_PER_CHUNK - 1)) {
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+		goto out;
+	}
+
+	/* Make sure this record is aligned to cluster and inoalignmt size. */
+	agbno = XFS_AGINO_TO_AGBNO(mp, irec.ir_startino);
+	if ((agbno & (xfs_ialloc_cluster_alignment(mp) - 1)) ||
+	    (agbno & (xfs_icluster_size_fsb(mp) - 1)))
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+	/* Handle non-sparse inodes */
+	if (!xfs_inobt_issparse(irec.ir_holemask)) {
+		len = XFS_B_TO_FSB(mp,
+				XFS_INODES_PER_CHUNK * mp->m_sb.sb_inodesize);
+		if (irec.ir_count != XFS_INODES_PER_CHUNK)
+			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+		if (!xfs_scrub_iallocbt_chunk(bs, &irec, agino, len))
+			goto out;
+		goto check_freemask;
+	}
+
+	/* Check each chunk of a sparse inode cluster. */
+	holemask = irec.ir_holemask;
+	holecount = 0;
+	len = XFS_B_TO_FSB(mp,
+			XFS_INODES_PER_HOLEMASK_BIT * mp->m_sb.sb_inodesize);
+	holes = ~xfs_inobt_irec_to_allocmask(&irec);
+	if ((holes & irec.ir_free) != holes ||
+	    irec.ir_freecount > irec.ir_count)
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+	for (i = 0; i < XFS_INOBT_HOLEMASK_BITS; i++) {
+		if (holemask & 1)
+			holecount += XFS_INODES_PER_HOLEMASK_BIT;
+		else if (!xfs_scrub_iallocbt_chunk(bs, &irec, agino, len))
+			break;
+		holemask >>= 1;
+		agino += XFS_INODES_PER_HOLEMASK_BIT;
+	}
+
+	if (holecount > XFS_INODES_PER_CHUNK ||
+	    holecount + irec.ir_count != XFS_INODES_PER_CHUNK)
+		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
+
+check_freemask:
+	error = xfs_scrub_iallocbt_check_freemask(bs, &irec);
+	if (error)
+		goto out;
+
+out:
+	return error;
+}
+
+/* Scrub the inode btrees for some AG. */
+STATIC int
+xfs_scrub_iallocbt(
+	struct xfs_scrub_context	*sc,
+	xfs_btnum_t			which)
+{
+	struct xfs_btree_cur		*cur;
+	struct xfs_owner_info		oinfo;
+
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT);
+	cur = which == XFS_BTNUM_INO ? sc->sa.ino_cur : sc->sa.fino_cur;
+	return xfs_scrub_btree(sc, cur, xfs_scrub_iallocbt_rec, &oinfo, NULL);
+}
+
+int
+xfs_scrub_inobt(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_iallocbt(sc, XFS_BTNUM_INO);
+}
+
+int
+xfs_scrub_finobt(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_iallocbt(sc, XFS_BTNUM_FINO);
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index d69ab7f..3c04913 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -182,6 +182,15 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_ag_allocbt,
 		.scrub	= xfs_scrub_cntbt,
 	},
+	{ /* inobt */
+		.setup	= xfs_scrub_setup_ag_iallocbt,
+		.scrub	= xfs_scrub_inobt,
+	},
+	{ /* finobt */
+		.setup	= xfs_scrub_setup_ag_iallocbt,
+		.scrub	= xfs_scrub_finobt,
+		.has	= xfs_sb_version_hasfinobt,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index a4af99c..5d97453 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -73,5 +73,7 @@ int xfs_scrub_agfl(struct xfs_scrub_context *sc);
 int xfs_scrub_agi(struct xfs_scrub_context *sc);
 int xfs_scrub_bnobt(struct xfs_scrub_context *sc);
 int xfs_scrub_cntbt(struct xfs_scrub_context *sc);
+int xfs_scrub_inobt(struct xfs_scrub_context *sc);
+int xfs_scrub_finobt(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
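
The counter arithmetic that xfs_scrub_iallocbt_rec() applies to each record can be condensed into a small userspace sketch.  This is an illustration under stated assumptions, not the kernel code: struct inobt_rec_sketch, popcount64() and inobt_rec_counts_ok() are hypothetical names, and the allocmask field is assumed to hold what xfs_inobt_irec_to_allocmask() would return (one bit per inode, set when the inode is backed by disk space).

```c
#include <assert.h>
#include <stdint.h>

#define INODES_PER_CHUNK	64	/* mirrors XFS_INODES_PER_CHUNK */

/* Hypothetical, simplified view of an in-core inobt record. */
struct inobt_rec_sketch {
	uint8_t		count;		/* allocated inodes (ir_count) */
	uint8_t		freecount;	/* allocated but free (ir_freecount) */
	uint64_t	free;		/* free bitmap (ir_free) */
	uint64_t	allocmask;	/* assumed: 1 = inode has disk space */
};

/* Portable population count, standing in for hweight64(). */
static unsigned int
popcount64(uint64_t v)
{
	unsigned int	n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Return 1 if the record's counters and bitmaps agree with each other. */
static int
inobt_rec_counts_ok(const struct inobt_rec_sketch *r)
{
	uint64_t	holes = ~r->allocmask;

	if (r->count > INODES_PER_CHUNK || r->freecount > r->count)
		return 0;
	/* Holes plus allocated inodes must cover the whole chunk. */
	if (popcount64(holes) + r->count != INODES_PER_CHUNK)
		return 0;
	/* Every hole must be marked free in the free bitmap. */
	if ((holes & r->free) != holes)
		return 0;
	/* Free bits = free allocated inodes + inodes in holes. */
	return popcount64(r->free) ==
	       (unsigned int)r->freecount + (INODES_PER_CHUNK - r->count);
}
```

For a non-sparse record allocmask is all ones, so the hole checks reduce to count == 64 and the last comparison reduces to popcount(free) == freecount.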


* [PATCH v2 21/30] xfs: scrub inodes
  2017-10-12  1:43 ` [PATCH 21/30] xfs: scrub inodes Darrick J. Wong
  2017-10-12 22:32   ` Darrick J. Wong
@ 2017-10-17  0:13   ` Darrick J. Wong
  2017-10-17 22:01     ` Dave Chinner
  1 sibling, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-17  0:13 UTC (permalink / raw)
  To: linux-xfs; +Cc: Dave Chinner

Scrub the fields within an inode.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: fix illegible logic, enhance comments
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    3 
 fs/xfs/scrub/common.c  |   54 ++++
 fs/xfs/scrub/common.h  |    3 
 fs/xfs/scrub/inode.c   |  605 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c   |   18 +
 fs/xfs/scrub/scrub.h   |    2 
 fs/xfs/xfs_ioctl.c     |    4 
 8 files changed, 687 insertions(+), 3 deletions(-)
 create mode 100644 fs/xfs/scrub/inode.c

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index a7c5752..28e14b7 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -151,6 +151,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   btree.o \
 				   common.o \
 				   ialloc.o \
+				   inode.o \
 				   refcount.o \
 				   rmap.o \
 				   scrub.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index b3f992c..f8463e0 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -494,9 +494,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_FINOBT	8	/* free inode btree */
 #define XFS_SCRUB_TYPE_RMAPBT	9	/* reverse mapping btree */
 #define XFS_SCRUB_TYPE_REFCNTBT	10	/* reference count btree */
+#define XFS_SCRUB_TYPE_INODE	11	/* inode record */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	11
+#define XFS_SCRUB_TYPE_NR	12
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 39165c3..415c6a9 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -30,6 +30,8 @@
 #include "xfs_trans.h"
 #include "xfs_sb.h"
 #include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_itable.h"
 #include "xfs_alloc.h"
 #include "xfs_alloc_btree.h"
 #include "xfs_bmap.h"
@@ -488,3 +490,55 @@ xfs_scrub_checkpoint_log(
 	xfs_ail_push_all_sync(mp->m_ail);
 	return 0;
 }
+
+/*
+ * Given an inode and the scrub control structure, grab either the
+ * inode referenced in the control structure or the inode passed in.
+ * The inode is not locked.
+ */
+int
+xfs_scrub_get_inode(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip_in)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_inode		*ip = NULL;
+	int				error;
+
+	/*
+	 * If userspace passed us an AG number or a generation number
+	 * without an inode number, they haven't got a clue so bail out
+	 * immediately.
+	 */
+	if (sc->sm->sm_agno || (sc->sm->sm_gen && !sc->sm->sm_ino))
+		return -EINVAL;
+
+	/* We want to scan the inode we already had opened. */
+	if (sc->sm->sm_ino == 0 || sc->sm->sm_ino == ip_in->i_ino) {
+		sc->ip = ip_in;
+		return 0;
+	}
+
+	/* Look up the inode, see if the generation number matches. */
+	if (xfs_internal_inum(mp, sc->sm->sm_ino))
+		return -ENOENT;
+	error = xfs_iget(mp, NULL, sc->sm->sm_ino,
+			XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE, 0, &ip);
+	if (error == -ENOENT || error == -EINVAL) {
+		/* inode doesn't exist... */
+		return -ENOENT;
+	} else if (error) {
+		trace_xfs_scrub_op_error(sc,
+				XFS_INO_TO_AGNO(mp, sc->sm->sm_ino),
+				XFS_INO_TO_AGBNO(mp, sc->sm->sm_ino),
+				error, __return_address);
+		return error;
+	}
+	if (VFS_I(ip)->i_generation != sc->sm->sm_gen) {
+		iput(VFS_I(ip));
+		return -ENOENT;
+	}
+
+	sc->ip = ip;
+	return 0;
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 610e956..fcec11e 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -87,6 +87,8 @@ int xfs_scrub_setup_ag_rmapbt(struct xfs_scrub_context *sc,
 			      struct xfs_inode *ip);
 int xfs_scrub_setup_ag_refcountbt(struct xfs_scrub_context *sc,
 				  struct xfs_inode *ip);
+int xfs_scrub_setup_inode(struct xfs_scrub_context *sc,
+			  struct xfs_inode *ip);
 
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
@@ -105,5 +107,6 @@ int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
 
 int xfs_scrub_setup_ag_btree(struct xfs_scrub_context *sc,
 			     struct xfs_inode *ip, bool force_log);
+int xfs_scrub_get_inode(struct xfs_scrub_context *sc, struct xfs_inode *ip_in);
 
 #endif	/* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
new file mode 100644
index 0000000..36c144e
--- /dev/null
+++ b/fs/xfs/scrub/inode.c
@@ -0,0 +1,605 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_inode_buf.h"
+#include "xfs_inode_fork.h"
+#include "xfs_ialloc.h"
+#include "xfs_da_format.h"
+#include "xfs_reflink.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+
+/*
+ * Grab total control of the inode metadata.  It doesn't matter here if
+ * the file data is still changing; exclusive access to the metadata is
+ * the goal.
+ */
+int
+xfs_scrub_setup_inode(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	/*
+	 * Try to get the inode.  If the verifiers fail, we try again
+	 * in raw mode.
+	 */
+	error = xfs_scrub_get_inode(sc, ip);
+	switch (error) {
+	case 0:
+		break;
+	case -EFSCORRUPTED:
+	case -EFSBADCRC:
+		return 0;
+	default:
+		return error;
+	}
+
+	/* Got the inode, lock it and we're ready to go. */
+	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+	error = xfs_scrub_trans_alloc(sc->sm, mp, &sc->tp);
+	if (error)
+		goto out;
+	sc->ilock_flags |= XFS_ILOCK_EXCL;
+	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
+
+out:
+	/* scrub teardown will unlock and release the inode for us */
+	return error;
+}
+
+/* Inode core */
+
+/*
+ * Validate di_extsize hint.
+ *
+ * The rules are documented at xfs_ioctl_setattr_check_extsize().
+ * These functions must be kept in sync with each other.
+ */
+STATIC void
+xfs_scrub_inode_extsize(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp,
+	struct xfs_dinode		*dip,
+	xfs_ino_t			ino,
+	uint16_t			mode,
+	uint16_t			flags)
+{
+	struct xfs_mount		*mp = sc->mp;
+	bool				rt_flag;
+	bool				hint_flag;
+	bool				inherit_flag;
+	uint32_t			extsize;
+	uint32_t			extsize_bytes;
+	uint32_t			blocksize_bytes;
+
+	rt_flag = (flags & XFS_DIFLAG_REALTIME);
+	hint_flag = (flags & XFS_DIFLAG_EXTSIZE);
+	inherit_flag = (flags & XFS_DIFLAG_EXTSZINHERIT);
+	extsize = be32_to_cpu(dip->di_extsize);
+	extsize_bytes = XFS_FSB_TO_B(sc->mp, extsize);
+
+	if (rt_flag)
+		blocksize_bytes = mp->m_sb.sb_rextsize << mp->m_sb.sb_blocklog;
+	else
+		blocksize_bytes = mp->m_sb.sb_blocksize;
+
+	if ((hint_flag || inherit_flag) && !(S_ISDIR(mode) || S_ISREG(mode)))
+		goto bad;
+
+	if (hint_flag && !S_ISREG(mode))
+		goto bad;
+
+	if (inherit_flag && !S_ISDIR(mode))
+		goto bad;
+
+	if ((hint_flag || inherit_flag) && extsize == 0)
+		goto bad;
+
+	if (!(hint_flag || inherit_flag) && extsize != 0)
+		goto bad;
+
+	if (extsize_bytes % blocksize_bytes)
+		goto bad;
+
+	if (extsize > MAXEXTLEN)
+		goto bad;
+
+	if (!rt_flag && extsize > mp->m_sb.sb_agblocks / 2)
+		goto bad;
+
+	return;
+bad:
+	xfs_scrub_ino_set_corrupt(sc, ino, bp);
+}
+
+/*
+ * Validate di_cowextsize hint.
+ *
+ * The rules are documented at xfs_ioctl_setattr_check_cowextsize().
+ * These functions must be kept in sync with each other.
+ */
+STATIC void
+xfs_scrub_inode_cowextsize(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp,
+	struct xfs_dinode		*dip,
+	xfs_ino_t			ino,
+	uint16_t			mode,
+	uint16_t			flags,
+	uint64_t			flags2)
+{
+	struct xfs_mount		*mp = sc->mp;
+	bool				rt_flag;
+	bool				hint_flag;
+	uint32_t			extsize;
+	uint32_t			extsize_bytes;
+
+	rt_flag = (flags & XFS_DIFLAG_REALTIME);
+	hint_flag = (flags2 & XFS_DIFLAG2_COWEXTSIZE);
+	extsize = be32_to_cpu(dip->di_cowextsize);
+	extsize_bytes = XFS_FSB_TO_B(sc->mp, extsize);
+
+	if (hint_flag && !xfs_sb_version_hasreflink(&mp->m_sb))
+		goto bad;
+
+	if (hint_flag && !(S_ISDIR(mode) || S_ISREG(mode)))
+		goto bad;
+
+	if (hint_flag && extsize == 0)
+		goto bad;
+
+	if (!hint_flag && extsize != 0)
+		goto bad;
+
+	if (hint_flag && rt_flag)
+		goto bad;
+
+	if (extsize_bytes % mp->m_sb.sb_blocksize)
+		goto bad;
+
+	if (extsize > MAXEXTLEN)
+		goto bad;
+
+	if (extsize > mp->m_sb.sb_agblocks / 2)
+		goto bad;
+
+	return;
+bad:
+	xfs_scrub_ino_set_corrupt(sc, ino, bp);
+}
+
+/* Make sure the di_flags make sense for the inode. */
+STATIC void
+xfs_scrub_inode_flags(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp,
+	struct xfs_dinode		*dip,
+	xfs_ino_t			ino,
+	uint16_t			mode,
+	uint16_t			flags)
+{
+	struct xfs_mount		*mp = sc->mp;
+
+	if (flags & ~XFS_DIFLAG_ANY)
+		goto bad;
+
+	/* rt flags require rt device */
+	if ((flags & (XFS_DIFLAG_REALTIME | XFS_DIFLAG_RTINHERIT)) &&
+	    !mp->m_rtdev_targp)
+		goto bad;
+
+	/* new rt bitmap flag only valid for rbmino */
+	if ((flags & XFS_DIFLAG_NEWRTBM) && ino != mp->m_sb.sb_rbmino)
+		goto bad;
+
+	/* directory-only flags */
+	if ((flags & (XFS_DIFLAG_RTINHERIT |
+		     XFS_DIFLAG_EXTSZINHERIT |
+		     XFS_DIFLAG_PROJINHERIT |
+		     XFS_DIFLAG_NOSYMLINKS)) &&
+	    !S_ISDIR(mode))
+		goto bad;
+
+	/* file-only flags */
+	if ((flags & (XFS_DIFLAG_REALTIME | FS_XFLAG_EXTSIZE)) &&
+	    !S_ISREG(mode))
+		goto bad;
+
+	/* filestreams and rt make no sense */
+	if ((flags & XFS_DIFLAG_FILESTREAM) && (flags & XFS_DIFLAG_REALTIME))
+		goto bad;
+
+	return;
+bad:
+	xfs_scrub_ino_set_corrupt(sc, ino, bp);
+}
+
+/* Make sure the di_flags2 make sense for the inode. */
+STATIC void
+xfs_scrub_inode_flags2(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp,
+	struct xfs_dinode		*dip,
+	xfs_ino_t			ino,
+	uint16_t			mode,
+	uint16_t			flags,
+	uint64_t			flags2)
+{
+	struct xfs_mount		*mp = sc->mp;
+
+	if (flags2 & ~XFS_DIFLAG2_ANY)
+		goto bad;
+
+	/* reflink flag requires reflink feature */
+	if ((flags2 & XFS_DIFLAG2_REFLINK) &&
+	    !xfs_sb_version_hasreflink(&mp->m_sb))
+		goto bad;
+
+	/* cowextsize flag is checked w.r.t. mode separately */
+
+	/* file-only flags */
+	if ((flags2 & (XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK)) &&
+	    !S_ISREG(mode))
+		goto bad;
+
+	/* realtime and reflink make no sense, currently */
+	if ((flags & XFS_DIFLAG_REALTIME) && (flags2 & XFS_DIFLAG2_REFLINK))
+		goto bad;
+
+	/* dax and reflink make no sense, currently */
+	if ((flags2 & XFS_DIFLAG2_DAX) && (flags2 & XFS_DIFLAG2_REFLINK))
+		goto bad;
+
+	return;
+bad:
+	xfs_scrub_ino_set_corrupt(sc, ino, bp);
+}
+
+/* Scrub all the ondisk inode fields. */
+STATIC void
+xfs_scrub_dinode(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp,
+	struct xfs_dinode		*dip,
+	xfs_ino_t			ino)
+{
+	struct xfs_mount		*mp = sc->mp;
+	size_t				fork_recs;
+	unsigned long long		isize;
+	uint64_t			flags2;
+	uint32_t			nextents;
+	uint16_t			flags;
+	uint16_t			mode;
+
+	flags = be16_to_cpu(dip->di_flags);
+	if (dip->di_version >= 3)
+		flags2 = be64_to_cpu(dip->di_flags2);
+	else
+		flags2 = 0;
+
+	/* di_mode */
+	mode = be16_to_cpu(dip->di_mode);
+	if (mode & ~(S_IALLUGO | S_IFMT))
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+
+	/* v1/v2 fields */
+	switch (dip->di_version) {
+	case 1:
+		/*
+		 * We autoconvert v1 inodes into v2 inodes on writeout,
+		 * so just mark this inode for preening.
+		 */
+		xfs_scrub_ino_set_preen(sc, bp);
+		break;
+	case 2:
+	case 3:
+		if (dip->di_onlink != 0)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+
+		if (dip->di_mode == 0 && sc->ip)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+
+		if (dip->di_projid_hi != 0 &&
+		    !xfs_sb_version_hasprojid32bit(&mp->m_sb))
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	default:
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		return;
+	}
+
+	/*
+	 * di_uid/di_gid -- -1 isn't invalid, but there's no way that
+	 * userspace could have created that.
+	 */
+	if (dip->di_uid == cpu_to_be32(-1U) ||
+	    dip->di_gid == cpu_to_be32(-1U))
+		xfs_scrub_ino_set_warning(sc, bp);
+
+	/* di_format */
+	switch (dip->di_format) {
+	case XFS_DINODE_FMT_DEV:
+		if (!S_ISCHR(mode) && !S_ISBLK(mode) &&
+		    !S_ISFIFO(mode) && !S_ISSOCK(mode))
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	case XFS_DINODE_FMT_LOCAL:
+		if (!S_ISDIR(mode) && !S_ISLNK(mode))
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	case XFS_DINODE_FMT_EXTENTS:
+		if (!S_ISREG(mode) && !S_ISDIR(mode) && !S_ISLNK(mode))
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		if (!S_ISREG(mode) && !S_ISDIR(mode))
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	case XFS_DINODE_FMT_UUID:
+	default:
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	}
+
+	/*
+	 * di_size.  xfs_dinode_verify checks for things that screw up
+	 * the VFS such as the upper bit being set and zero-length
+	 * symlinks/directories, but we can do more here.
+	 */
+	isize = be64_to_cpu(dip->di_size);
+	if (isize & (1ULL << 63))
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+
+	/* Devices, fifos, and sockets must have zero size */
+	if (!S_ISDIR(mode) && !S_ISREG(mode) && !S_ISLNK(mode) && isize != 0)
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+
+	/* Directories can't be larger than the data section size (32G) */
+	if (S_ISDIR(mode) && (isize == 0 || isize >= XFS_DIR2_SPACE_SIZE))
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+
+	/* Symlinks can't be larger than SYMLINK_MAXLEN */
+	if (S_ISLNK(mode) && (isize == 0 || isize >= XFS_SYMLINK_MAXLEN))
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+
+	/*
+	 * Warn if the running kernel can't handle the kinds of offsets
+	 * needed to deal with the file size.  In other words, if the
+	 * pagecache can't cache all the blocks in this file due to
+	 * overly large offsets, flag the inode for admin review.
+	 */
+	if (isize >= mp->m_super->s_maxbytes)
+		xfs_scrub_ino_set_warning(sc, bp);
+
+	/* di_nblocks */
+	if (flags2 & XFS_DIFLAG2_REFLINK) {
+		; /* nblocks can exceed dblocks */
+	} else if (flags & XFS_DIFLAG_REALTIME) {
+		/*
+		 * nblocks is the sum of data extents (in the rtdev),
+		 * attr extents (in the datadev), and both forks' bmbt
+		 * blocks (in the datadev).  This clumsy check is the
+		 * best we can do without cross-referencing with the
+		 * inode forks.
+		 */
+		if (be64_to_cpu(dip->di_nblocks) >=
+		    mp->m_sb.sb_dblocks + mp->m_sb.sb_rblocks)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+	} else {
+		if (be64_to_cpu(dip->di_nblocks) >= mp->m_sb.sb_dblocks)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+	}
+
+	xfs_scrub_inode_flags(sc, bp, dip, ino, mode, flags);
+
+	xfs_scrub_inode_extsize(sc, bp, dip, ino, mode, flags);
+
+	/* di_nextents */
+	nextents = be32_to_cpu(dip->di_nextents);
+	fork_recs = XFS_DFORK_DSIZE(dip, mp) / sizeof(struct xfs_bmbt_rec);
+	switch (dip->di_format) {
+	case XFS_DINODE_FMT_EXTENTS:
+		if (nextents > fork_recs)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		if (nextents <= fork_recs)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	default:
+		if (nextents != 0)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	}
+
+	/* di_forkoff */
+	if (XFS_DFORK_APTR(dip) >= (char *)dip + mp->m_sb.sb_inodesize)
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+	if (dip->di_anextents != 0 && dip->di_forkoff == 0)
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+	if (dip->di_forkoff == 0 && dip->di_aformat != XFS_DINODE_FMT_EXTENTS)
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+
+	/* di_aformat */
+	if (dip->di_aformat != XFS_DINODE_FMT_LOCAL &&
+	    dip->di_aformat != XFS_DINODE_FMT_EXTENTS &&
+	    dip->di_aformat != XFS_DINODE_FMT_BTREE)
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+
+	/* di_anextents */
+	nextents = be16_to_cpu(dip->di_anextents);
+	fork_recs = XFS_DFORK_ASIZE(dip, mp) / sizeof(struct xfs_bmbt_rec);
+	switch (dip->di_aformat) {
+	case XFS_DINODE_FMT_EXTENTS:
+		if (nextents > fork_recs)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		if (nextents <= fork_recs)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		break;
+	default:
+		if (nextents != 0)
+			xfs_scrub_ino_set_corrupt(sc, ino, bp);
+	}
+
+	if (dip->di_version >= 3) {
+		xfs_scrub_inode_flags2(sc, bp, dip, ino, mode, flags, flags2);
+		xfs_scrub_inode_cowextsize(sc, bp, dip, ino, mode, flags,
+				flags2);
+	}
+}
+
+/* Map and read a raw inode. */
+STATIC int
+xfs_scrub_inode_map_raw(
+	struct xfs_scrub_context	*sc,
+	xfs_ino_t			ino,
+	struct xfs_buf			**bpp,
+	struct xfs_dinode		**dipp)
+{
+	struct xfs_imap			imap;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*bp;
+	struct xfs_dinode		*dip;
+	int				error;
+
+	error = xfs_imap(mp, sc->tp, ino, &imap, XFS_IGET_UNTRUSTED);
+	if (error == -EINVAL) {
+		/*
+		 * Inode could have gotten deleted out from under us;
+		 * just forget about it.
+		 */
+		error = -ENOENT;
+		goto out;
+	}
+	if (!xfs_scrub_process_error(sc, XFS_INO_TO_AGNO(mp, ino),
+			XFS_INO_TO_AGBNO(mp, ino), &error))
+		goto out;
+
+	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+			imap.im_blkno, imap.im_len, XBF_UNMAPPED, &bp,
+			NULL);
+	if (!xfs_scrub_process_error(sc, XFS_INO_TO_AGNO(mp, ino),
+			XFS_INO_TO_AGBNO(mp, ino), &error))
+		goto out;
+
+	/*
+	 * Is this really an inode?  We disabled verifiers in the above
+	 * xfs_trans_read_buf call because the inode buffer verifier
+	 * fails on /any/ inode record in the inode cluster with a bad
+	 * magic or version number, not just the one that we're
+	 * checking.  Therefore, grab the buffer unconditionally, attach
+	 * the inode verifiers by hand, and run the inode verifier only
+	 * on the one inode we want.
+	 */
+	bp->b_ops = &xfs_inode_buf_ops;
+	dip = xfs_buf_offset(bp, imap.im_boffset);
+	if (!xfs_dinode_verify(mp, ino, dip) ||
+	    !xfs_dinode_good_version(mp, dip->di_version)) {
+		xfs_scrub_ino_set_corrupt(sc, ino, bp);
+		goto out;
+	}
+
+	/* ...and is it the one we asked for? */
+	if (be32_to_cpu(dip->di_gen) != sc->sm->sm_gen) {
+		error = -ENOENT;
+		goto out;
+	}
+
+	*dipp = dip;
+	*bpp = bp;
+out:
+	return error;
+}
+
+/* Scrub an inode. */
+int
+xfs_scrub_inode(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_dinode		di;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*bp = NULL;
+	struct xfs_dinode		*dip;
+	xfs_ino_t			ino;
+
+	bool				has_shared;
+	int				error = 0;
+
+	/* Did we get the in-core inode, or are we doing this manually? */
+	if (sc->ip) {
+		ino = sc->ip->i_ino;
+		xfs_inode_to_disk(sc->ip, &di, 0);
+		dip = &di;
+	} else {
+		/* Map & read inode. */
+		ino = sc->sm->sm_ino;
+		error = xfs_scrub_inode_map_raw(sc, ino, &bp, &dip);
+		if (error)
+			goto out;
+	}
+
+	xfs_scrub_dinode(sc, bp, dip, ino);
+	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+		goto out;
+
+	/* Now let's do the things that require a live inode. */
+	if (!sc->ip)
+		goto out;
+
+	/*
+	 * Does this inode have the reflink flag set but no shared extents?
+	 * Set the preening flag if this is the case.
+	 */
+	if (xfs_is_reflink_inode(sc->ip)) {
+		error = xfs_reflink_inode_has_shared_extents(sc->tp, sc->ip,
+				&has_shared);
+		if (!xfs_scrub_process_error(sc, XFS_INO_TO_AGNO(mp, ino),
+				XFS_INO_TO_AGBNO(mp, ino), &error))
+			goto out;
+		if (!has_shared)
+			xfs_scrub_ino_set_preen(sc, bp);
+	}
+
+out:
+	if (bp)
+		xfs_trans_brelse(sc->tp, bp);
+	return error;
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index dbf717fdf..c271d96 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -30,6 +30,8 @@
 #include "xfs_trans.h"
 #include "xfs_sb.h"
 #include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_itable.h"
 #include "xfs_alloc.h"
 #include "xfs_alloc_btree.h"
 #include "xfs_bmap.h"
@@ -141,6 +143,7 @@ xfs_scrub_probe(
 STATIC int
 xfs_scrub_teardown(
 	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip_in,
 	int				error)
 {
 	xfs_scrub_ag_free(sc, &sc->sa);
@@ -148,6 +151,13 @@ xfs_scrub_teardown(
 		xfs_trans_cancel(sc->tp);
 		sc->tp = NULL;
 	}
+	if (sc->ip) {
+		xfs_iunlock(sc->ip, sc->ilock_flags);
+		if (sc->ip != ip_in &&
+		    !xfs_internal_inum(sc->mp, sc->ip->i_ino))
+			iput(VFS_I(sc->ip));
+		sc->ip = NULL;
+	}
 	return error;
 }
 
@@ -201,6 +211,10 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.scrub	= xfs_scrub_refcountbt,
 		.has	= xfs_sb_version_hasreflink,
 	},
+	{ /* inode record */
+		.setup	= xfs_scrub_setup_inode,
+		.scrub	= xfs_scrub_inode,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
@@ -300,7 +314,7 @@ xfs_scrub_metadata(
 		 * Tear down everything we hold, then set up again with
 		 * preparation for worst-case scenarios.
 		 */
-		error = xfs_scrub_teardown(&sc, 0);
+		error = xfs_scrub_teardown(&sc, ip, 0);
 		if (error)
 			goto out;
 		try_harder = true;
@@ -313,7 +327,7 @@ xfs_scrub_metadata(
 		xfs_alert_ratelimited(mp, "Corruption detected during scrub.");
 
 out_teardown:
-	error = xfs_scrub_teardown(&sc, error);
+	error = xfs_scrub_teardown(&sc, ip, error);
 out:
 	trace_xfs_scrub_done(ip, sm, error);
 	return error;
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 1c80bf5..ec635d4 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -59,6 +59,7 @@ struct xfs_scrub_context {
 	const struct xfs_scrub_meta_ops	*ops;
 	struct xfs_trans		*tp;
 	struct xfs_inode		*ip;
+	uint				ilock_flags;
 	bool				try_harder;
 
 	/* State tracking for single-AG operations. */
@@ -77,5 +78,6 @@ int xfs_scrub_inobt(struct xfs_scrub_context *sc);
 int xfs_scrub_finobt(struct xfs_scrub_context *sc);
 int xfs_scrub_rmapbt(struct xfs_scrub_context *sc);
 int xfs_scrub_refcountbt(struct xfs_scrub_context *sc);
+int xfs_scrub_inode(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 6ff012f..5a84f58 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1202,6 +1202,8 @@ xfs_ioctl_setattr_get_trans(
  * 8. for non-realtime files, the extent size hint must be limited
  *    to half the AG size to avoid alignment extending the extent beyond the
  *    limits of the AG.
+ *
+ * Please keep this function in sync with xfs_scrub_inode_extsize.
  */
 static int
 xfs_ioctl_setattr_check_extsize(
@@ -1258,6 +1260,8 @@ xfs_ioctl_setattr_check_extsize(
  * 5. Extent size must be a multiple of the appropriate block size.
  * 6. The extent size hint must be limited to half the AG size to avoid
  *    alignment extending the extent beyond the limits of the AG.
+ *
+ * Please keep this function in sync with xfs_scrub_inode_cowextsize.
  */
 static int
 xfs_ioctl_setattr_check_cowextsize(


* [PATCH v2 24/30] xfs: scrub directory metadata
  2017-10-12  1:43 ` [PATCH 24/30] xfs: scrub directory metadata Darrick J. Wong
  2017-10-16  4:29   ` Dave Chinner
@ 2017-10-17  0:14   ` Darrick J. Wong
  2017-10-17 22:06     ` Dave Chinner
  1 sibling, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-17  0:14 UTC (permalink / raw)
  To: linux-xfs; +Cc: Dave Chinner

Scrub the hash tree and all the entries in a directory.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: use helpers to extract DT_ codes, #define buffer size magic numbers
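
For anyone who wants to see the ftype rule without the kernel plumbing,
here is a rough userspace sketch.  Nothing in it comes from the patch:
the SK_* constants and both function names are made up for
illustration, and the constants simply mirror the usual Linux S_IF* and
DT_* values.  The real check goes through xfs_mode_to_ftype() and
xfs_dir3_get_dtype() and needs an xfs_iget of the target inode.

```c
#include <stdbool.h>

/* Assumed values, mirroring the usual Linux S_IF* mode bits (octal). */
#define SK_S_IFMT	0170000
#define SK_S_IFIFO	0010000
#define SK_S_IFCHR	0020000
#define SK_S_IFDIR	0040000
#define SK_S_IFBLK	0060000
#define SK_S_IFREG	0100000
#define SK_S_IFLNK	0120000
#define SK_S_IFSOCK	0140000

/* Assumed values, mirroring the DT_* codes that readdir reports. */
enum sk_dtype {
	SK_DT_UNKNOWN	= 0,
	SK_DT_FIFO	= 1,
	SK_DT_CHR	= 2,
	SK_DT_DIR	= 4,
	SK_DT_BLK	= 6,
	SK_DT_REG	= 8,
	SK_DT_LNK	= 10,
	SK_DT_SOCK	= 12,
};

/* Hypothetical helper: which DT_ code should an inode with this mode have? */
static enum sk_dtype
sk_mode_to_dtype(
	unsigned int		mode)
{
	switch (mode & SK_S_IFMT) {
	case SK_S_IFREG:	return SK_DT_REG;
	case SK_S_IFDIR:	return SK_DT_DIR;
	case SK_S_IFCHR:	return SK_DT_CHR;
	case SK_S_IFBLK:	return SK_DT_BLK;
	case SK_S_IFIFO:	return SK_DT_FIFO;
	case SK_S_IFSOCK:	return SK_DT_SOCK;
	case SK_S_IFLNK:	return SK_DT_LNK;
	default:		return SK_DT_UNKNOWN;
	}
}

/*
 * Without the ftype feature a dirent may only report DT_UNKNOWN or
 * DT_DIR; with it, the dirent's type must match the inode's mode.
 */
static bool
sk_dirent_type_ok(
	bool			has_ftype,
	enum sk_dtype		dtype,
	unsigned int		mode)
{
	if (!has_ftype)
		return dtype == SK_DT_UNKNOWN || dtype == SK_DT_DIR;
	return sk_mode_to_dtype(mode) == dtype;
}
```

A caller would pass the d_type that readdir handed back together with
the st_mode of the inode the entry points to; a false return is the
case the scrubber would flag as corruption.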
---
 fs/xfs/Makefile           |    1 
 fs/xfs/libxfs/xfs_dir2.c  |    4 -
 fs/xfs/libxfs/xfs_fs.h    |    3 
 fs/xfs/scrub/common.c     |   28 ++++
 fs/xfs/scrub/common.h     |    4 +
 fs/xfs/scrub/dir.c        |  320 +++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c      |    4 +
 fs/xfs/scrub/scrub.h      |    1 
 fs/xfs/xfs_dir2_readdir.c |    2 
 fs/xfs/xfs_file.c         |    2 
 fs/xfs/xfs_mount.h        |   17 ++
 11 files changed, 382 insertions(+), 4 deletions(-)
 create mode 100644 fs/xfs/scrub/dir.c

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index b48437f..69aa88e 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -152,6 +152,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   btree.o \
 				   common.o \
 				   dabtree.o \
+				   dir.o \
 				   ialloc.o \
 				   inode.o \
 				   refcount.o \
diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index ee5e916..41ea6d4 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -39,7 +39,9 @@ struct xfs_name xfs_name_dotdot = { (unsigned char *)"..", 2, XFS_DIR3_FT_DIR };
 /*
  * Convert inode mode to directory entry filetype
  */
-unsigned char xfs_mode_to_ftype(int mode)
+unsigned char
+xfs_mode_to_ftype(
+	int		mode)
 {
 	switch (mode & S_IFMT) {
 	case S_IFREG:
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 02ae58b..b16d004 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -498,9 +498,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_BMBTD	12	/* data fork block mapping */
 #define XFS_SCRUB_TYPE_BMBTA	13	/* attr fork block mapping */
 #define XFS_SCRUB_TYPE_BMBTC	14	/* CoW fork block mapping */
+#define XFS_SCRUB_TYPE_DIR	15	/* directory */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	15
+#define XFS_SCRUB_TYPE_NR	16
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 415c6a9..318dd97 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -542,3 +542,31 @@ xfs_scrub_get_inode(
 	sc->ip = ip;
 	return 0;
 }
+
+/* Set us up to scrub a file's contents. */
+int
+xfs_scrub_setup_inode_contents(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip,
+	unsigned int			resblks)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	error = xfs_scrub_get_inode(sc, ip);
+	if (error)
+		return error;
+
+	/* Got the inode, lock it and we're ready to go. */
+	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+	error = xfs_scrub_trans_alloc(sc->sm, mp, &sc->tp);
+	if (error)
+		goto out;
+	sc->ilock_flags |= XFS_ILOCK_EXCL;
+	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
+
+out:
+	/* scrub teardown will unlock and release the inode for us */
+	return error;
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index b3cf4a2..7cd4a78 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -93,6 +93,8 @@ int xfs_scrub_setup_inode_bmap(struct xfs_scrub_context *sc,
 			       struct xfs_inode *ip);
 int xfs_scrub_setup_inode_bmap_data(struct xfs_scrub_context *sc,
 				    struct xfs_inode *ip);
+int xfs_scrub_setup_directory(struct xfs_scrub_context *sc,
+			      struct xfs_inode *ip);
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
@@ -111,5 +113,7 @@ int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
 int xfs_scrub_setup_ag_btree(struct xfs_scrub_context *sc,
 			     struct xfs_inode *ip, bool force_log);
 int xfs_scrub_get_inode(struct xfs_scrub_context *sc, struct xfs_inode *ip_in);
+int xfs_scrub_setup_inode_contents(struct xfs_scrub_context *sc,
+				   struct xfs_inode *ip, unsigned int resblks);
 
 #endif	/* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
new file mode 100644
index 0000000..ffdaf60
--- /dev/null
+++ b/fs/xfs/scrub/dir.c
@@ -0,0 +1,320 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_itable.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_ialloc.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+#include "scrub/dabtree.h"
+
+/* Set us up to scrub directories. */
+int
+xfs_scrub_setup_directory(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_inode_contents(sc, ip, 0);
+}
+
+/* Directories */
+
+/* Scrub a directory entry. */
+
+struct xfs_scrub_dir_ctx {
+	/* VFS fill-directory iterator */
+	struct dir_context		dir_iter;
+
+	struct xfs_scrub_context	*sc;
+};
+
+/* Check that an inode's mode matches a given DT_ type. */
+STATIC int
+xfs_scrub_dir_check_ftype(
+	struct xfs_scrub_dir_ctx	*sdc,
+	xfs_fileoff_t			offset,
+	xfs_ino_t			inum,
+	int				dtype)
+{
+	struct xfs_mount		*mp = sdc->sc->mp;
+	struct xfs_inode		*ip;
+	int				ino_dtype;
+	int				error = 0;
+
+	if (!xfs_sb_version_hasftype(&mp->m_sb)) {
+		if (dtype != DT_UNKNOWN && dtype != DT_DIR)
+			xfs_scrub_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
+					offset);
+		goto out;
+	}
+
+	error = xfs_iget(mp, sdc->sc->tp, inum, XFS_IGET_DONTCACHE, 0, &ip);
+	if (!xfs_scrub_fblock_process_error(sdc->sc, XFS_DATA_FORK, offset,
+			&error))
+		goto out;
+
+	/* Convert mode to the DT_* values that dir_emit uses. */
+	ino_dtype = xfs_dir3_get_dtype(mp,
+			xfs_mode_to_ftype(VFS_I(ip)->i_mode));
+	if (ino_dtype != dtype)
+		xfs_scrub_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK, offset);
+	iput(VFS_I(ip));
+out:
+	return error;
+}
+
+/*
+ * Scrub a single directory entry.
+ *
+ * We use the VFS directory iterator (i.e. readdir) to call this
+ * function for every directory entry in a directory.  Once we're here,
+ * we check the inode number to make sure it's sane, then we check that
+ * we can look up this filename.  Finally, we check the ftype.
+ */
+STATIC int
+xfs_scrub_dir_actor(
+	struct dir_context		*dir_iter,
+	const char			*name,
+	int				namelen,
+	loff_t				pos,
+	u64				ino,
+	unsigned			type)
+{
+	struct xfs_mount		*mp;
+	struct xfs_inode		*ip;
+	struct xfs_scrub_dir_ctx	*sdc;
+	struct xfs_name			xname;
+	xfs_ino_t			lookup_ino;
+	xfs_dablk_t			offset;
+	int				error = 0;
+
+	sdc = container_of(dir_iter, struct xfs_scrub_dir_ctx, dir_iter);
+	ip = sdc->sc->ip;
+	mp = ip->i_mount;
+	offset = xfs_dir2_db_to_da(mp->m_dir_geo,
+			xfs_dir2_dataptr_to_db(mp->m_dir_geo, pos));
+
+	/* Does this inode number make sense? */
+	if (!xfs_verify_dir_ino(mp, ino)) {
+		xfs_scrub_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK, offset);
+		goto out;
+	}
+
+	if (!strncmp(".", name, namelen)) {
+		/* If this is "." then check that the inum matches the dir. */
+		if (xfs_sb_version_hasftype(&mp->m_sb) && type != DT_DIR)
+			xfs_scrub_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
+					offset);
+		if (ino != ip->i_ino)
+			xfs_scrub_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
+					offset);
+	} else if (!strncmp("..", name, namelen)) {
+		/*
+		 * If this is ".." in the root inode, check that the inum
+		 * matches this dir.
+		 */
+		if (xfs_sb_version_hasftype(&mp->m_sb) && type != DT_DIR)
+			xfs_scrub_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
+					offset);
+		if (ip->i_ino == mp->m_sb.sb_rootino && ino != ip->i_ino)
+			xfs_scrub_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
+					offset);
+	}
+
+	/* Verify that we can look up this name by hash. */
+	xname.name = name;
+	xname.len = namelen;
+	xname.type = XFS_DIR3_FT_UNKNOWN;
+
+	error = xfs_dir_lookup(sdc->sc->tp, ip, &xname, &lookup_ino, NULL);
+	if (!xfs_scrub_fblock_process_error(sdc->sc, XFS_DATA_FORK, offset,
+			&error))
+		goto fail_xref;
+	if (lookup_ino != ino) {
+		xfs_scrub_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK, offset);
+		goto out;
+	}
+
+	/* Verify the file type.  This function absorbs error codes. */
+	error = xfs_scrub_dir_check_ftype(sdc, offset, lookup_ino, type);
+	if (error)
+		goto out;
+out:
+	return error;
+fail_xref:
+	return error;
+}
+
+/* Scrub a directory btree record. */
+STATIC int
+xfs_scrub_dir_rec(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	void				*rec)
+{
+	struct xfs_mount		*mp = ds->state->mp;
+	struct xfs_dir2_leaf_entry	*ent = rec;
+	struct xfs_inode		*dp = ds->dargs.dp;
+	struct xfs_dir2_data_entry	*dent;
+	struct xfs_buf			*bp;
+	xfs_ino_t			ino;
+	xfs_dablk_t			rec_bno;
+	xfs_dir2_db_t			db;
+	xfs_dir2_data_aoff_t		off;
+	xfs_dir2_dataptr_t		ptr;
+	xfs_dahash_t			calc_hash;
+	xfs_dahash_t			hash;
+	unsigned int			tag;
+	int				error;
+
+	/* Check the hash of the entry. */
+	error = xfs_scrub_da_btree_hash(ds, level, &ent->hashval);
+	if (error)
+		goto out;
+
+	/* Valid hash pointer? */
+	ptr = be32_to_cpu(ent->address);
+	if (ptr == 0)
+		return 0;
+
+	/* Find the directory entry's location. */
+	db = xfs_dir2_dataptr_to_db(mp->m_dir_geo, ptr);
+	off = xfs_dir2_dataptr_to_off(mp->m_dir_geo, ptr);
+	rec_bno = xfs_dir2_db_to_da(mp->m_dir_geo, db);
+
+	if (rec_bno >= mp->m_dir_geo->leafblk) {
+		xfs_scrub_da_set_corrupt(ds, level);
+		goto out;
+	}
+	error = xfs_dir3_data_read(ds->dargs.trans, dp, rec_bno, -2, &bp);
+	if (!xfs_scrub_fblock_process_error(ds->sc, XFS_DATA_FORK, rec_bno,
+			&error))
+		goto out;
+	if (!bp) {
+		xfs_scrub_fblock_set_corrupt(ds->sc, XFS_DATA_FORK, rec_bno);
+		goto out;
+	}
+
+	/* Retrieve the entry, sanity check it, and compare hashes. */
+	dent = (struct xfs_dir2_data_entry *)(((char *)bp->b_addr) + off);
+	ino = be64_to_cpu(dent->inumber);
+	hash = be32_to_cpu(ent->hashval);
+	tag = be16_to_cpup(dp->d_ops->data_entry_tag_p(dent));
+	if (!xfs_verify_dir_ino(mp, ino) || tag != off)
+		xfs_scrub_fblock_set_corrupt(ds->sc, XFS_DATA_FORK, rec_bno);
+	if (dent->namelen == 0) {
+		xfs_scrub_fblock_set_corrupt(ds->sc, XFS_DATA_FORK, rec_bno);
+		goto out_relse;
+	}
+	calc_hash = xfs_da_hashname(dent->name, dent->namelen);
+	if (calc_hash != hash)
+		xfs_scrub_fblock_set_corrupt(ds->sc, XFS_DATA_FORK, rec_bno);
+
+out_relse:
+	xfs_trans_brelse(ds->dargs.trans, bp);
+out:
+	return error;
+}
+
+/* Scrub a whole directory. */
+int
+xfs_scrub_directory(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_scrub_dir_ctx	sdc = {
+		.dir_iter.actor = xfs_scrub_dir_actor,
+		.dir_iter.pos = 0,
+		.sc = sc,
+	};
+	size_t				bufsize;
+	loff_t				oldpos;
+	int				error = 0;
+
+	if (!S_ISDIR(VFS_I(sc->ip)->i_mode))
+		return -ENOENT;
+
+	/* Plausible size? */
+	if (sc->ip->i_d.di_size < xfs_dir2_sf_hdr_size(0)) {
+		xfs_scrub_ino_set_corrupt(sc, sc->ip->i_ino, NULL);
+		goto out;
+	}
+
+	/* Check directory tree structure */
+	error = xfs_scrub_da_btree(sc, XFS_DATA_FORK, xfs_scrub_dir_rec);
+	if (error)
+		return error;
+
+	/*
+	 * Check that every dirent we see can also be looked up by hash.
+	 * Userspace usually asks for a 32k buffer, so we will too.
+	 */
+	bufsize = (size_t)min_t(loff_t, XFS_READDIR_BUFSIZE,
+			sc->ip->i_d.di_size);
+
+	/*
+	 * Look up every name in this directory by hash.
+	 *
+	 * Use the xfs_readdir function to call xfs_scrub_dir_actor on
+	 * every directory entry in this directory.  In _actor, we check
+	 * the name, inode number, and ftype (if applicable) of the
+	 * entry.  xfs_readdir uses the VFS filldir functions to provide
+	 * iteration context.
+	 *
+	 * The VFS grabs a read or write lock via i_rwsem before it reads
+	 * or writes to a directory.  If we've gotten this far we've
+	 * already obtained IOLOCK_EXCL, which (since 4.10) is the same as
+	 * getting a write lock on i_rwsem.  Therefore, it is safe for us
+	 * to drop the ILOCK here in order to reuse the _readdir and
+	 * _dir_lookup routines, which do their own ILOCK locking.
+	 */
+	oldpos = 0;
+	sc->ilock_flags &= ~XFS_ILOCK_EXCL;
+	xfs_iunlock(sc->ip, XFS_ILOCK_EXCL);
+	while (true) {
+		error = xfs_readdir(sc->tp, sc->ip, &sdc.dir_iter, bufsize);
+		if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0,
+				&error))
+			goto out;
+		if (oldpos == sdc.dir_iter.pos)
+			break;
+		oldpos = sdc.dir_iter.pos;
+	}
+
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 0979a8c..004f52d 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -227,6 +227,10 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_inode_bmap,
 		.scrub	= xfs_scrub_bmap_cow,
 	},
+	{ /* directory */
+		.setup	= xfs_scrub_setup_directory,
+		.scrub	= xfs_scrub_directory,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 8920ccf..844506e 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -82,5 +82,6 @@ int xfs_scrub_inode(struct xfs_scrub_context *sc);
 int xfs_scrub_bmap_data(struct xfs_scrub_context *sc);
 int xfs_scrub_bmap_attr(struct xfs_scrub_context *sc);
 int xfs_scrub_bmap_cow(struct xfs_scrub_context *sc);
+int xfs_scrub_directory(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
index ba2638d..238e365 100644
--- a/fs/xfs/xfs_dir2_readdir.c
+++ b/fs/xfs/xfs_dir2_readdir.c
@@ -41,7 +41,7 @@ static unsigned char xfs_dir3_filetype_table[] = {
 	DT_FIFO, DT_SOCK, DT_LNK, DT_WHT,
 };
 
-static unsigned char
+unsigned char
 xfs_dir3_get_dtype(
 	struct xfs_mount	*mp,
 	uint8_t			filetype)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 56d0e52..64e4c43 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -979,7 +979,7 @@ xfs_file_readdir(
 	 * point we can change the ->readdir prototype to include the
 	 * buffer size.  For now we use the current glibc buffer size.
 	 */
-	bufsize = (size_t)min_t(loff_t, 32768, ip->i_d.di_size);
+	bufsize = (size_t)min_t(loff_t, XFS_READDIR_BUFSIZE, ip->i_d.di_size);
 
 	return xfs_readdir(NULL, ip, ctx, bufsize);
 }
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index e0792d0..0ae0b92 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -446,4 +446,21 @@ int	xfs_zero_extent(struct xfs_inode *ip, xfs_fsblock_t start_fsb,
 struct xfs_error_cfg * xfs_error_get_cfg(struct xfs_mount *mp,
 		int error_class, int error);
 
+/*
+ * The Linux API doesn't pass the total size of the buffer we
+ * read into down to the filesystem.  With the filldir concept
+ * it's not needed for correct information, but the XFS dir2 leaf
+ * code wants an estimate of the buffer size to calculate its
+ * readahead window and size the buffers used for mapping to
+ * physical blocks.
+ *
+ * Try to give it an estimate that's good enough, maybe at some
+ * point we can change the ->readdir prototype to include the
+ * buffer size.  For now we use the current glibc buffer size.
+ * musl libc hardcodes 2k and dietlibc uses PAGE_SIZE.
+ */
+#define XFS_READDIR_BUFSIZE	(32768)
+
+unsigned char xfs_dir3_get_dtype(struct xfs_mount *mp, uint8_t filetype);
+
 #endif	/* __XFS_MOUNT_H__ */


* [PATCH v2 28/30] xfs: scrub directory parent pointers
  2017-10-12  1:43 ` [PATCH 28/30] xfs: scrub directory parent pointers Darrick J. Wong
  2017-10-16  5:09   ` Dave Chinner
@ 2017-10-17  0:16   ` Darrick J. Wong
  2017-10-17 22:11     ` Dave Chinner
  1 sibling, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-17  0:16 UTC (permalink / raw)
  To: linux-xfs; +Cc: Dave Chinner

Scrub parent pointers, sort of.  For directories, we can ride the
'..' entry up to the parent to confirm that there's at most one
dentry that points back to this directory.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: refactor the single-ilock-retry code into a separate validate function
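
To make the "at most one dentry" idea concrete, here is a small
userspace sketch of the counting step.  It is only an illustration
under stated assumptions: struct sk_dirent and both functions are
hypothetical, the parent's entries are assumed to be already loaded
into an array, and skipping "." and ".." is a simplification of this
sketch rather than something taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for one entry of the parent directory. */
struct sk_dirent {
	const char		*name;
	uint64_t		ino;
};

/* Count the parent's entries that point at child_ino, skipping "." and "..". */
static size_t
sk_count_backrefs(
	const struct sk_dirent	*ents,
	size_t			nr,
	uint64_t		child_ino)
{
	size_t			count = 0;
	size_t			i;

	for (i = 0; i < nr; i++) {
		if (!strcmp(ents[i].name, ".") || !strcmp(ents[i].name, ".."))
			continue;
		if (ents[i].ino == child_ino)
			count++;
	}
	return count;
}

/* Directories cannot be hard linked, so a second back reference is bad. */
static bool
sk_parent_ok(
	const struct sk_dirent	*ents,
	size_t			nr,
	uint64_t		child_ino)
{
	return sk_count_backrefs(ents, nr, child_ino) <= 1;
}
```

The in-kernel version has to do this walk while juggling inode locks on
both the child and the parent, which is where most of the complexity in
this patch comes from; the sketch ignores locking entirely.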
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    3 
 fs/xfs/scrub/common.h  |    2 
 fs/xfs/scrub/parent.c  |  309 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c   |    4 +
 fs/xfs/scrub/scrub.h   |    1 
 6 files changed, 319 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/parent.c

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 28637a6..2193a54 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -156,6 +156,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   dir.o \
 				   ialloc.o \
 				   inode.o \
+				   parent.o \
 				   refcount.o \
 				   rmap.o \
 				   scrub.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index bb8bcd0..7444094 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -501,9 +501,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_DIR	15	/* directory */
 #define XFS_SCRUB_TYPE_XATTR	16	/* extended attribute */
 #define XFS_SCRUB_TYPE_SYMLINK	17	/* symbolic link */
+#define XFS_SCRUB_TYPE_PARENT	18	/* parent pointers */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	18
+#define XFS_SCRUB_TYPE_NR	19
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index b71c1a8..0542e7d 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -99,6 +99,8 @@ int xfs_scrub_setup_xattr(struct xfs_scrub_context *sc,
 			  struct xfs_inode *ip);
 int xfs_scrub_setup_symlink(struct xfs_scrub_context *sc,
 			    struct xfs_inode *ip);
+int xfs_scrub_setup_parent(struct xfs_scrub_context *sc,
+			   struct xfs_inode *ip);
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
new file mode 100644
index 0000000..c4a78a3
--- /dev/null
+++ b/fs/xfs/scrub/parent.c
@@ -0,0 +1,309 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_ialloc.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+
+/* Set us up to scrub parents. */
+int
+xfs_scrub_setup_parent(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_inode_contents(sc, ip, 0);
+}
+
+/* Parent pointers */
+
+/* Look for an entry in a parent pointing to this inode. */
+
+struct xfs_scrub_parent_ctx {
+	struct dir_context		dc;
+	xfs_ino_t			ino;
+	xfs_nlink_t			nlink;
+};
+
+/* Look for a single entry in a directory pointing to an inode. */
+STATIC int
+xfs_scrub_parent_actor(
+	struct dir_context		*dc,
+	const char			*name,
+	int				namelen,
+	loff_t				pos,
+	u64				ino,
+	unsigned			type)
+{
+	struct xfs_scrub_parent_ctx	*spc;
+
+	spc = container_of(dc, struct xfs_scrub_parent_ctx, dc);
+	if (spc->ino == ino)
+		spc->nlink++;
+	return 0;
+}
+
+/* Count the number of dentries in the parent dir that point to this inode. */
+STATIC int
+xfs_scrub_parent_count_parent_dentries(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*parent,
+	xfs_nlink_t			*nlink)
+{
+	struct xfs_scrub_parent_ctx	spc = {
+		.dc.actor = xfs_scrub_parent_actor,
+		.dc.pos = 0,
+		.ino = sc->ip->i_ino,
+		.nlink = 0,
+	};
+	size_t				bufsize;
+	loff_t				oldpos;
+	uint				lock_mode;
+	int				error = 0;
+
+	/*
+	 * If there are any blocks, read-ahead block 0 as we're almost
+	 * certain to have the next operation be a read there.  This is
+	 * how we guarantee that the parent's extent map has been loaded,
+	 * if there is one.
+	 */
+	lock_mode = xfs_ilock_data_map_shared(parent);
+	if (parent->i_d.di_nextents > 0)
+		error = xfs_dir3_data_readahead(parent, 0, -1);
+	xfs_iunlock(parent, lock_mode);
+	if (error)
+		return error;
+
+	/*
+	 * Iterate the parent dir to confirm that there is
+	 * exactly one entry pointing back to the inode being
+	 * scanned.
+	 */
+	bufsize = (size_t)min_t(loff_t, XFS_READDIR_BUFSIZE,
+			parent->i_d.di_size);
+	oldpos = 0;
+	while (true) {
+		error = xfs_readdir(sc->tp, parent, &spc.dc, bufsize);
+		if (error)
+			goto out;
+		if (oldpos == spc.dc.pos)
+			break;
+		oldpos = spc.dc.pos;
+	}
+	*nlink = spc.nlink;
+out:
+	return error;
+}
+
+/*
+ * Given the inode number of the alleged parent of the inode being
+ * scrubbed, try to validate that the parent has exactly one directory
+ * entry pointing back to the inode being scrubbed.
+ */
+STATIC int
+xfs_scrub_parent_validate(
+	struct xfs_scrub_context	*sc,
+	xfs_ino_t			dnum,
+	bool				*try_again)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_inode		*dp = NULL;
+	xfs_nlink_t			expected_nlink;
+	xfs_nlink_t			nlink;
+	int				error;
+
+	*try_again = false;
+
+	/* '..' must not point to ourselves. */
+	if (sc->ip->i_ino == dnum) {
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		goto out;
+	}
+
+	/*
+	 * If we're an unlinked directory, the parent /won't/ have a link
+	 * to us.  Otherwise, it should have one link.
+	 */
+	expected_nlink = VFS_I(sc->ip)->i_nlink == 0 ? 0 : 1;
+
+	/* Grab this parent inode. */
+	error = xfs_iget(mp, sc->tp, dnum, XFS_IGET_DONTCACHE, 0, &dp);
+	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
+		goto out;
+	if (dp == sc->ip) {
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		goto out_rele;
+	}
+
+	/*
+	 * We prefer to keep the inode locked while we lock and search
+	 * its alleged parent for a forward reference.  If we can grab
+	 * the iolock, validate the pointers and we're done.  We must
+	 * use nowait here to avoid an ABBA deadlock on the parent and
+	 * the child inodes.
+	 */
+	if (xfs_ilock_nowait(dp, XFS_IOLOCK_SHARED)) {
+		error = xfs_scrub_parent_count_parent_dentries(sc, dp, &nlink);
+		if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0,
+				&error))
+			goto out_unlock;
+		if (nlink != expected_nlink)
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		goto out_unlock;
+	}
+
+	/*
+	 * The game changes if we get here.  We failed to lock the parent,
+	 * so we're going to try to verify both pointers while only holding
+	 * one lock so as to avoid deadlocking with something that's actually
+	 * trying to traverse down the directory tree.
+	 */
+	xfs_iunlock(sc->ip, sc->ilock_flags);
+	sc->ilock_flags = 0;
+	xfs_ilock(dp, XFS_IOLOCK_SHARED);
+
+	/* Go looking for our dentry. */
+	error = xfs_scrub_parent_count_parent_dentries(sc, dp, &nlink);
+	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
+		goto out_unlock;
+
+	/* Drop the parent lock, relock this inode. */
+	xfs_iunlock(dp, XFS_IOLOCK_SHARED);
+	sc->ilock_flags = XFS_IOLOCK_EXCL;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+
+	/*
+	 * If we're an unlinked directory, the parent /won't/ have a link
+	 * to us.  Otherwise, it should have one link.  We have to re-set
+	 * it here because we dropped the lock on sc->ip.
+	 */
+	expected_nlink = VFS_I(sc->ip)->i_nlink == 0 ? 0 : 1;
+
+	/* Look up '..' to see if the inode changed. */
+	error = xfs_dir_lookup(sc->tp, sc->ip, &xfs_name_dotdot, &dnum, NULL);
+	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
+		goto out_rele;
+
+	/* Drat, parent changed.  Try again! */
+	if (dnum != dp->i_ino) {
+		iput(VFS_I(dp));
+		*try_again = true;
+		return 0;
+	}
+	iput(VFS_I(dp));
+
+	/*
+	 * '..' didn't change, so check that there was only one entry
+	 * for us in the parent.
+	 */
+	if (nlink != expected_nlink)
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+	return error;
+
+out_unlock:
+	xfs_iunlock(dp, XFS_IOLOCK_SHARED);
+out_rele:
+	iput(VFS_I(dp));
+out:
+	return error;
+}
+
+/* Scrub a parent pointer. */
+int
+xfs_scrub_parent(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_mount		*mp = sc->mp;
+	xfs_ino_t			dnum;
+	bool				try_again;
+	int				tries = 0;
+	int				error;
+
+	/*
+	 * If we're a directory, check that the '..' link points up to
+	 * a directory that has one entry pointing to us.
+	 */
+	if (!S_ISDIR(VFS_I(sc->ip)->i_mode))
+		return -ENOENT;
+
+	/* We're not a special inode, are we? */
+	if (!xfs_verify_dir_ino(mp, sc->ip->i_ino)) {
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		goto out;
+	}
+
+	/*
+	 * The VFS grabs a read or write lock via i_rwsem before it reads
+	 * or writes to a directory.  If we've gotten this far we've
+	 * already obtained IOLOCK_EXCL, which (since 4.10) is the same as
+	 * getting a write lock on i_rwsem.  Therefore, it is safe for us
+	 * to drop the ILOCK here in order to do directory lookups.
+	 */
+	sc->ilock_flags &= ~(XFS_ILOCK_EXCL | XFS_MMAPLOCK_EXCL);
+	xfs_iunlock(sc->ip, XFS_ILOCK_EXCL | XFS_MMAPLOCK_EXCL);
+
+	/* Look up '..' */
+	error = xfs_dir_lookup(sc->tp, sc->ip, &xfs_name_dotdot, &dnum, NULL);
+	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
+		goto out;
+	if (!xfs_verify_dir_ino(mp, dnum)) {
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		goto out;
+	}
+
+	/* Is this the root dir?  Then '..' must point to itself. */
+	if (sc->ip == mp->m_rootip) {
+		if (sc->ip->i_ino != mp->m_sb.sb_rootino ||
+		    sc->ip->i_ino != dnum)
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		goto out;
+	}
+
+	do {
+		error = xfs_scrub_parent_validate(sc, dnum, &try_again);
+		if (error)
+			goto out;
+	} while (try_again && ++tries < 20);
+
+	/*
+	 * We gave it our best shot but failed, so mark this scrub
+	 * incomplete.  Userspace can decide if it wants to try again.
+	 */
+	if (try_again && tries == 20)
+		xfs_scrub_set_incomplete(sc);
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 0a7276b..c6ca402 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -243,6 +243,10 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_symlink,
 		.scrub	= xfs_scrub_symlink,
 	},
+	{ /* parent pointers */
+		.setup	= xfs_scrub_setup_parent,
+		.scrub	= xfs_scrub_parent,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index dc4ed8d..a264810 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -86,5 +86,6 @@ int xfs_scrub_bmap_cow(struct xfs_scrub_context *sc);
 int xfs_scrub_directory(struct xfs_scrub_context *sc);
 int xfs_scrub_xattr(struct xfs_scrub_context *sc);
 int xfs_scrub_symlink(struct xfs_scrub_context *sc);
+int xfs_scrub_parent(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
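
For readers following the locking comments in xfs_scrub_parent_validate(),
here is a minimal userspace sketch of the same try-lock-then-back-off idea.
It is not the kernel code: pthread mutexes stand in for the XFS IOLOCK, and
struct node, nlinks_to_child, and check_with_backoff() are hypothetical
names invented for this illustration.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode with one lock and a parent link. */
struct node {
	pthread_mutex_t	lock;
	struct node	*parent;
	int		nlinks_to_child;	/* entries pointing at the child */
};

/*
 * Called with child->lock held.  Returns true if the check completed,
 * false if the parent changed while the child was unlocked and the
 * caller should retry.  *count receives the parent's entry count.
 */
static bool
check_with_backoff(struct node *child, struct node *parent, int *count)
{
	/* Fast path: take the parent lock without blocking. */
	if (pthread_mutex_trylock(&parent->lock) == 0) {
		*count = parent->nlinks_to_child;
		pthread_mutex_unlock(&parent->lock);
		return true;
	}

	/*
	 * Slow path: never block on the parent while holding the child,
	 * since a top-down walker may hold the parent and want the child.
	 */
	pthread_mutex_unlock(&child->lock);
	pthread_mutex_lock(&parent->lock);
	*count = parent->nlinks_to_child;
	pthread_mutex_unlock(&parent->lock);
	pthread_mutex_lock(&child->lock);

	/* The child was unlocked for a while; revalidate its parent. */
	return child->parent == parent;
}
```

A caller would loop while this returns false and give up after a bounded
number of attempts, much as xfs_scrub_parent() stops after 20 tries and
marks the scrub incomplete.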

* [PATCH v2 25/30] xfs: scrub directory freespace
  2017-10-12  1:43 ` [PATCH 25/30] xfs: scrub directory freespace Darrick J. Wong
  2017-10-16  4:49   ` Dave Chinner
@ 2017-10-17  1:10   ` Darrick J. Wong
  2017-10-17 22:08     ` Dave Chinner
  1 sibling, 1 reply; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-17  1:10 UTC (permalink / raw)
  To: linux-xfs; +Cc: Dave Chinner

Check the free space information in a directory.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: make the freespace and leaf checks more complete
---
 fs/xfs/scrub/dir.c |  474 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 474 insertions(+)

diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index ffdaf60..21c50e4 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -251,6 +251,475 @@ xfs_scrub_dir_rec(
 	return error;
 }
 
+/*
+ * Is this unused entry either in the bestfree or smaller than all of
+ * them?  We've already checked that the bestfrees are sorted longest to
+ * shortest, and that there aren't any bogus entries.
+ */
+STATIC void
+xfs_scrub_directory_check_free_entry(
+	struct xfs_scrub_context	*sc,
+	xfs_dablk_t			lblk,
+	struct xfs_dir2_data_free	*bf,
+	struct xfs_dir2_data_unused	*dup)
+{
+	struct xfs_dir2_data_free	*dfp;
+	unsigned int			dup_length;
+
+	dup_length = be16_to_cpu(dup->length);
+
+	/* Unused entry is shorter than any of the bestfrees */
+	if (dup_length < be16_to_cpu(bf[XFS_DIR2_DATA_FD_COUNT - 1].length))
+		return;
+
+	for (dfp = &bf[XFS_DIR2_DATA_FD_COUNT - 1]; dfp >= bf; dfp--)
+		if (dup_length == be16_to_cpu(dfp->length))
+			return;
+
+	/* Unused entry should be in the bestfrees but wasn't found. */
+	xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+}
+
+/* Check free space info in a directory data block. */
+STATIC int
+xfs_scrub_directory_data_bestfree(
+	struct xfs_scrub_context	*sc,
+	xfs_dablk_t			lblk,
+	bool				is_block)
+{
+	struct xfs_dir2_data_unused	*dup;
+	struct xfs_dir2_data_free	*dfp;
+	struct xfs_buf			*bp;
+	struct xfs_dir2_data_free	*bf;
+	struct xfs_mount		*mp = sc->mp;
+	const struct xfs_dir_ops	*d_ops;
+	char				*ptr;
+	char				*endptr;
+	u16				tag;
+	unsigned int			nr_bestfrees = 0;
+	unsigned int			nr_frees = 0;
+	unsigned int			smallest_bestfree;
+	int				newlen;
+	int				offset;
+	int				error;
+
+	d_ops = sc->ip->d_ops;
+
+	if (is_block) {
+		/* dir block format */
+		if (lblk != XFS_B_TO_FSBT(mp, XFS_DIR2_DATA_OFFSET))
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+		error = xfs_dir3_block_read(sc->tp, sc->ip, &bp);
+	} else {
+		/* dir data format */
+		error = xfs_dir3_data_read(sc->tp, sc->ip, lblk, -1, &bp);
+	}
+	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
+		goto out;
+
+	/* Do the bestfrees correspond to actual free space? */
+	bf = d_ops->data_bestfree_p(bp->b_addr);
+	smallest_bestfree = UINT_MAX;
+	for (dfp = &bf[0]; dfp < &bf[XFS_DIR2_DATA_FD_COUNT]; dfp++) {
+		offset = be16_to_cpu(dfp->offset);
+		if (offset == 0)
+			continue;
+		if (offset >= mp->m_dir_geo->blksize) {
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+			goto out_buf;
+		}
+		dup = (struct xfs_dir2_data_unused *)(bp->b_addr + offset);
+		tag = be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup));
+
+		/* bestfree doesn't match the entry it points at? */
+		if (dup->freetag != cpu_to_be16(XFS_DIR2_DATA_FREE_TAG) ||
+		    be16_to_cpu(dup->length) != be16_to_cpu(dfp->length) ||
+		    tag != ((char *)dup - (char *)bp->b_addr)) {
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+			goto out_buf;
+		}
+
+		/* bestfree records should be ordered largest to smallest */
+		if (smallest_bestfree < be16_to_cpu(dfp->length)) {
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+			goto out_buf;
+		}
+
+		smallest_bestfree = be16_to_cpu(dfp->length);
+		nr_bestfrees++;
+	}
+
+	/* Make sure the bestfrees are actually the best free spaces. */
+	ptr = (char *)d_ops->data_entry_p(bp->b_addr);
+	if (is_block) {
+		struct xfs_dir2_block_tail	*btp;
+
+		btp = xfs_dir2_block_tail_p(mp->m_dir_geo, bp->b_addr);
+		endptr = (char *)xfs_dir2_block_leaf_p(btp);
+	} else
+		endptr = (char *)bp->b_addr + BBTOB(bp->b_length);
+	while (ptr < endptr) {
+		dup = (struct xfs_dir2_data_unused *)ptr;
+		/* Skip real entries */
+		if (dup->freetag != cpu_to_be16(XFS_DIR2_DATA_FREE_TAG)) {
+			struct xfs_dir2_data_entry	*dep;
+
+			dep = (struct xfs_dir2_data_entry *)ptr;
+			newlen = d_ops->data_entsize(dep->namelen);
+			if (newlen <= 0) {
+				xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK,
+						lblk);
+				goto out_buf;
+			}
+			ptr += newlen;
+			continue;
+		}
+
+		/* Spot check this free entry */
+		tag = be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup));
+		if (tag != ((char *)dup - (char *)bp->b_addr))
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+
+		/*
+		 * Either this entry is a bestfree or it's smaller than
+		 * any of the bestfrees.
+		 */
+		xfs_scrub_directory_check_free_entry(sc, lblk, bf, dup);
+
+		/* Move on. */
+		newlen = be16_to_cpu(dup->length);
+		if (newlen <= 0) {
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+			goto out_buf;
+		}
+		ptr += newlen;
+		if (ptr <= endptr)
+			nr_frees++;
+	}
+
+	/* Did we go off the end? */
+	if (ptr > endptr)
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+
+	/* Did we see at least as many free slots as there are bestfrees? */
+	if (nr_frees < nr_bestfrees)
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+out_buf:
+	xfs_trans_brelse(sc->tp, bp);
+out:
+	return error;
+}
+
+/*
+ * Does the free space length in the free space index block ($len) match
+ * the longest length in the directory data block's bestfree array?
+ * Assume that we've already checked that the data block's bestfree
+ * array is in order.
+ */
+STATIC void
+xfs_scrub_directory_check_freesp(
+	struct xfs_scrub_context	*sc,
+	xfs_dablk_t			lblk,
+	struct xfs_buf			*dbp,
+	unsigned int			len)
+{
+	struct xfs_dir2_data_free	*bf;
+	struct xfs_dir2_data_free	*dfp;
+	int				offset;
+
+	if (len == 0)
+		return;
+
+	bf = sc->ip->d_ops->data_bestfree_p(dbp->b_addr);
+	for (dfp = &bf[0]; dfp < &bf[XFS_DIR2_DATA_FD_COUNT]; dfp++) {
+		offset = be16_to_cpu(dfp->offset);
+		if (offset == 0)
+			break;
+		if (len == be16_to_cpu(dfp->length))
+			return;
+		/* Didn't find the best length in the bestfree data */
+		break;
+	}
+
+	xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+}
+
+/* Check free space info in a directory leaf1 block. */
+STATIC int
+xfs_scrub_directory_leaf1_bestfree(
+	struct xfs_scrub_context	*sc,
+	struct xfs_da_args		*args,
+	xfs_dablk_t			lblk)
+{
+	struct xfs_dir3_icleaf_hdr	leafhdr;
+	struct xfs_dir2_leaf_entry	*ents;
+	struct xfs_dir2_leaf_tail	*ltp;
+	struct xfs_dir2_leaf		*leaf;
+	struct xfs_buf			*dbp;
+	struct xfs_buf			*bp;
+	struct xfs_mount		*mp = sc->mp;
+	const struct xfs_dir_ops	*d_ops = sc->ip->d_ops;
+	__be16				*bestp;
+	__u16				best;
+	__u32				hash;
+	__u32				lasthash = 0;
+	__u32				bestcount;
+	unsigned int			stale = 0;
+	int				i;
+	int				error;
+
+	/* Read the free space block. */
+	error = xfs_dir3_leaf_read(sc->tp, sc->ip, lblk, -1, &bp);
+	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
+		goto out;
+
+	leaf = bp->b_addr;
+	d_ops->leaf_hdr_from_disk(&leafhdr, leaf);
+	ents = d_ops->leaf_ents_p(leaf);
+	ltp = xfs_dir2_leaf_tail_p(mp->m_dir_geo, leaf);
+	bestcount = be32_to_cpu(ltp->bestcount);
+	bestp = xfs_dir2_leaf_bests_p(ltp);
+
+	/*
+	 * There should be as many bestfree slots as there are dir data
+	 * blocks that can fit under i_size.
+	 */
+	if (bestcount != XFS_B_TO_FSB(mp, sc->ip->i_d.di_size)) {
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+		goto out;
+	}
+
+	/* Is the leaf count even remotely sane? */
+	if (leafhdr.count > d_ops->leaf_max_ents(mp->m_dir_geo)) {
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+		goto out;
+	}
+
+	/* Leaves and bests don't overlap in leaf format. */
+	if ((char *)&ents[leafhdr.count] > (char *)xfs_dir2_leaf_bests_p(ltp)) {
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+		goto out;
+	}
+
+	/* Check hash value order, count stale entries.  */
+	for (i = 0; i < leafhdr.count; i++) {
+		hash = be32_to_cpu(ents[i].hashval);
+		if (i > 0 && lasthash > hash)
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+		lasthash = hash;
+		if (ents[i].address == cpu_to_be32(XFS_DIR2_NULL_DATAPTR))
+			stale++;
+	}
+	if (leafhdr.stale != stale)
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+
+	/* Check all the bestfree entries. */
+	for (i = 0; i < be32_to_cpu(ltp->bestcount); i++, bestp++) {
+		best = be16_to_cpu(*bestp);
+		if (best == NULLDATAOFF)
+			continue;
+		error = xfs_dir3_data_read(sc->tp, sc->ip,
+				i * args->geo->fsbcount, -1, &dbp);
+		if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk,
+				&error))
+			continue;
+		xfs_scrub_directory_check_freesp(sc, lblk, dbp, best);
+		xfs_trans_brelse(sc->tp, dbp);
+	}
+out:
+	return error;
+}
+
+/* Check free space info in a directory freespace block. */
+STATIC int
+xfs_scrub_directory_free_bestfree(
+	struct xfs_scrub_context	*sc,
+	struct xfs_da_args		*args,
+	xfs_dablk_t			lblk)
+{
+	struct xfs_dir3_icfree_hdr	freehdr;
+	struct xfs_buf			*dbp;
+	struct xfs_buf			*bp;
+	__be16				*bestp;
+	__u16				best;
+	unsigned int			stale = 0;
+	int				i;
+	int				error;
+
+	/* Read the free space block */
+	error = xfs_dir2_free_read(sc->tp, sc->ip, lblk, &bp);
+	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
+		goto out;
+
+	/* Check all the entries. */
+	sc->ip->d_ops->free_hdr_from_disk(&freehdr, bp->b_addr);
+	bestp = sc->ip->d_ops->free_bests_p(bp->b_addr);
+	for (i = 0; i < freehdr.nvalid; i++, bestp++) {
+		best = be16_to_cpu(*bestp);
+		if (best == NULLDATAOFF) {
+			stale++;
+			continue;
+		}
+		error = xfs_dir3_data_read(sc->tp, sc->ip,
+				(freehdr.firstdb + i) * args->geo->fsbcount,
+				-1, &dbp);
+		if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk,
+				&error))
+			continue;
+		xfs_scrub_directory_check_freesp(sc, lblk, dbp, best);
+		xfs_trans_brelse(sc->tp, dbp);
+	}
+
+	if (freehdr.nused + stale != freehdr.nvalid)
+		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+out:
+	return error;
+}
+
+/* Check free space information in directories. */
+STATIC int
+xfs_scrub_directory_blocks(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_bmbt_irec		got;
+	struct xfs_da_args		args;
+	struct xfs_ifork		*ifp;
+	struct xfs_mount		*mp = sc->mp;
+	xfs_fileoff_t			leaf_lblk;
+	xfs_fileoff_t			free_lblk;
+	xfs_fileoff_t			lblk;
+	xfs_extnum_t			idx;
+	xfs_dablk_t			dabno;
+	bool				found;
+	int				is_block = 0;
+	int				error;
+
+	/* Ignore local format directories. */
+	if (sc->ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS &&
+	    sc->ip->i_d.di_format != XFS_DINODE_FMT_BTREE)
+		return 0;
+
+	ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK);
+	lblk = XFS_B_TO_FSB(mp, XFS_DIR2_DATA_OFFSET);
+	leaf_lblk = XFS_B_TO_FSB(mp, XFS_DIR2_LEAF_OFFSET);
+	free_lblk = XFS_B_TO_FSB(mp, XFS_DIR2_FREE_OFFSET);
+
+	/* Is this a block dir? */
+	args.dp = sc->ip;
+	args.geo = mp->m_dir_geo;
+	args.trans = sc->tp;
+	error = xfs_dir2_isblock(&args, &is_block);
+	if (!xfs_scrub_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
+		goto out;
+
+	/* Iterate all the data extents in the directory... */
+	found = xfs_iext_lookup_extent(sc->ip, ifp, lblk, &idx, &got);
+	while (found) {
+		/* Block directories only have a single block at offset 0. */
+		if (is_block &&
+		    (got.br_startoff > 0 ||
+		     got.br_blockcount != args.geo->fsbcount)) {
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK,
+					got.br_startoff);
+			break;
+		}
+
+		/* No more data blocks... */
+		if (got.br_startoff >= leaf_lblk)
+			break;
+
+		/*
+		 * Check each data block's bestfree data.
+		 *
+		 * Iterate all the fsbcount-aligned block offsets in
+		 * this directory.  The directory block reading code is
+		 * smart enough to do its own bmap lookups to handle
+		 * discontiguous directory blocks.  When we're done
+		 * with the extent record, re-query the bmap at the
+		 * next fsbcount-aligned offset to avoid redundant
+		 * block checks.
+		 */
+		for (lblk = roundup((xfs_dablk_t)got.br_startoff,
+				args.geo->fsbcount);
+		     lblk < got.br_startoff + got.br_blockcount;
+		     lblk += args.geo->fsbcount) {
+			error = xfs_scrub_directory_data_bestfree(sc, lblk,
+					is_block);
+			if (error)
+				goto out;
+		}
+		dabno = got.br_startoff + got.br_blockcount;
+		lblk = roundup(dabno, args.geo->fsbcount);
+		found = xfs_iext_lookup_extent(sc->ip, ifp, lblk, &idx, &got);
+	}
+
+	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+		goto out;
+
+	/* Look for a leaf1 block, which has free info. */
+	if (xfs_iext_lookup_extent(sc->ip, ifp, leaf_lblk, &idx, &got) &&
+	    got.br_startoff == leaf_lblk &&
+	    got.br_blockcount == args.geo->fsbcount &&
+	    !xfs_iext_get_extent(ifp, ++idx, &got)) {
+		if (is_block) {
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+			goto out;
+		}
+		error = xfs_scrub_directory_leaf1_bestfree(sc, &args,
+				leaf_lblk);
+		if (error)
+			goto out;
+	}
+
+	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+		goto out;
+
+	/* Scan for free blocks */
+	lblk = free_lblk;
+	found = xfs_iext_lookup_extent(sc->ip, ifp, lblk, &idx, &got);
+	while (found) {
+		/*
+		 * Dirs can't have blocks mapped above 2^32.
+		 * Single-block dirs shouldn't even be here.
+		 */
+		lblk = got.br_startoff;
+		if (lblk & ~0xFFFFFFFFULL) {
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+			goto out;
+		}
+		if (is_block) {
+			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
+			goto out;
+		}
+
+		/*
+		 * Check each dir free block's bestfree data.
+		 *
+		 * Iterate all the fsbcount-aligned block offsets in
+		 * this directory.  The directory block reading code is
+		 * smart enough to do its own bmap lookups to handle
+		 * discontiguous directory blocks.  When we're done
+		 * with the extent record, re-query the bmap at the
+		 * next fsbcount-aligned offset to avoid redundant
+		 * block checks.
+		 */
+		for (lblk = roundup((xfs_dablk_t)got.br_startoff,
+				args.geo->fsbcount);
+		     lblk < got.br_startoff + got.br_blockcount;
+		     lblk += args.geo->fsbcount) {
+			error = xfs_scrub_directory_free_bestfree(sc, &args,
+					lblk);
+			if (error)
+				goto out;
+		}
+		dabno = got.br_startoff + got.br_blockcount;
+		lblk = roundup(dabno, args.geo->fsbcount);
+		found = xfs_iext_lookup_extent(sc->ip, ifp, lblk, &idx, &got);
+	}
+out:
+	return error;
+}
+
 /* Scrub a whole directory. */
 int
 xfs_scrub_directory(
@@ -279,6 +748,11 @@ xfs_scrub_directory(
 	if (error)
 		return error;
 
+	/* Check the freespace. */
+	error = xfs_scrub_directory_blocks(sc);
+	if (error)
+		return error;
+
 	/*
 	 * Check that every dirent we see can also be looked up by hash.
 	 * Userspace usually asks for a 32k buffer, so we will too.
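
The rule enforced by xfs_scrub_directory_check_free_entry() can be tried
out in isolation.  The following is a rough sketch only, assuming
host-endian values and a three-slot table; struct bestfree, FD_COUNT, and
free_entry_ok() are hypothetical names, not kernel interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

#define FD_COUNT 3	/* stand-in for XFS_DIR2_DATA_FD_COUNT */

/* Hypothetical host-endian version of a bestfree slot. */
struct bestfree {
	uint16_t	offset;	/* 0 means the slot is unused */
	uint16_t	length;
};

/*
 * bf[] is assumed sorted longest to shortest.  A free region of the
 * given length is consistent with the table if it is shorter than the
 * smallest slot, or if some slot records the same length.
 */
static bool
free_entry_ok(const struct bestfree bf[FD_COUNT], uint16_t length)
{
	int	i;

	if (length < bf[FD_COUNT - 1].length)
		return true;
	for (i = 0; i < FD_COUNT; i++)
		if (bf[i].length == length)
			return true;
	return false;
}
```

As in the patch, the match is on length alone, so a free region that
happens to share a length with a bestfree slot at a different offset
still passes this particular check.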

* Re: [PATCH 30/30] xfs: scrub quota information
  2017-10-16  5:12   ` Dave Chinner
@ 2017-10-17  1:11     ` Darrick J. Wong
  0 siblings, 0 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-17  1:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 04:12:14PM +1100, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 06:44:00PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Perform some quick sanity testing of the disk quota information.
> 
> Looks good.
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>

Hey, you got all the way to the last patch!  Thanks a lot for the review!

--D

> 
> -- 
> Dave Chinner
> david@fromorbit.com

* Re: [PATCH v2 18/30] xfs: scrub inode btrees
  2017-10-17  0:11   ` [PATCH v2 " Darrick J. Wong
@ 2017-10-17 21:59     ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-17 21:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 05:11:17PM -0700, Darrick J. Wong wrote:
> Check the records of the inode btrees to make sure that the values
> make sense given the inode records themselves.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: fix insane freemask variable usage, shorten helper function names

Looks good to me.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH v2 21/30] xfs: scrub inodes
  2017-10-17  0:13   ` [PATCH v2 " Darrick J. Wong
@ 2017-10-17 22:01     ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-17 22:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 05:13:00PM -0700, Darrick J. Wong wrote:
> Scrub the fields within an inode.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: fix illegible logic, enhance comments
> ---

Good to go.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH v2 24/30] xfs: scrub directory metadata
  2017-10-17  0:14   ` [PATCH v2 " Darrick J. Wong
@ 2017-10-17 22:06     ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-17 22:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 05:14:33PM -0700, Darrick J. Wong wrote:
> Scrub the hash tree and all the entries in a directory.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: use helpers to extract DT_ codes, #define buffer size magic numbers
> ---
....
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index e0792d0..0ae0b92 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -446,4 +446,21 @@ int	xfs_zero_extent(struct xfs_inode *ip, xfs_fsblock_t start_fsb,
>  struct xfs_error_cfg * xfs_error_get_cfg(struct xfs_mount *mp,
>  		int error_class, int error);
>  
> +/*
> + * The Linux API doesn't pass the total size of the buffer we
> + * read into down to the filesystem.  With the filldir concept
> + * it's not needed for correct information, but the XFS dir2 leaf
> + * code wants an estimate of the buffer size to calculate its
> + * readahead window and size the buffers used for mapping to
> + * physical blocks.
> + *
> + * Try to give it an estimate that's good enough; maybe at some
> + * point we can change the ->readdir prototype to include the
> + * buffer size.  For now we use the current glibc buffer size.
> + * musl libc hardcodes 2k and dietlibc uses PAGE_SIZE.
> + */
> +#define XFS_READDIR_BUFSIZE	(32768)
> +
> +unsigned char xfs_dir3_get_dtype(struct xfs_mount *mp, uint8_t filetype);
> +
>  #endif	/* __XFS_MOUNT_H__ */

I think these belong in xfs_dir2.h, next to the declaration of
xfs_mode_to_ftype()....

Other than that,

Reviewed-by: Dave Chinner <dchinner@redhat.com>

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH v2 25/30] xfs: scrub directory freespace
  2017-10-17  1:10   ` [PATCH v2 " Darrick J. Wong
@ 2017-10-17 22:08     ` Dave Chinner
  2017-10-17 23:51       ` Darrick J. Wong
  0 siblings, 1 reply; 99+ messages in thread
From: Dave Chinner @ 2017-10-17 22:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 06:10:45PM -0700, Darrick J. Wong wrote:
> Check the free space information in a directory.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: make the freespace and leaf checks more complete
> ---
.....
> +	/* Make sure the bestfrees are actually the best free spaces. */
> +	ptr = (char *)d_ops->data_entry_p(bp->b_addr);
> +	if (is_block) {
> +		struct xfs_dir2_block_tail	*btp;
> +
> +		btp = xfs_dir2_block_tail_p(mp->m_dir_geo, bp->b_addr);
> +		endptr = (char *)xfs_dir2_block_leaf_p(btp);
> +	} else
> +		endptr = (char *)bp->b_addr + BBTOB(bp->b_length);
> +	while (ptr < endptr) {
> +		dup = (struct xfs_dir2_data_unused *)ptr;
> +		/* Skip real entries */
> +		if (dup->freetag != cpu_to_be16(XFS_DIR2_DATA_FREE_TAG)) {
> +			struct xfs_dir2_data_entry	*dep;
> +
> +			dep = (struct xfs_dir2_data_entry *)ptr;
> +			newlen = d_ops->data_entsize(dep->namelen);
> +			if (newlen <= 0) {
> +				xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK,
> +						lblk);
> +				goto out_buf;
> +			}
> +			ptr += newlen;
> +			continue;
> +		}
> +
> +		/* Spot check this free entry */
> +		tag = be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup));
> +		if (tag != ((char *)dup - (char *)bp->b_addr))
> +			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
> +
> +		/*
> +		 * Either this entry is a bestfree or it's smaller than
> +		 * any of the bestfrees.
> +		 */
> +		xfs_scrub_directory_check_free_entry(sc, lblk, bf, dup);
> +
> +		/* Move on. */
> +		newlen = be16_to_cpu(dup->length);
> +		if (newlen <= 0) {
> +			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
> +			goto out_buf;
> +		}
> +		ptr += newlen;
> +		if (ptr <= endptr)
> +			nr_frees++;
> +	}
> +
> +	/* Did we go off the end? */
> +	if (ptr > endptr)
> +		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);

ptr >= endptr?
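
For reference, the exit condition in question can be tried out with a
standalone sketch.  When a "while (pos < end)" loop exits normally,
pos >= end always holds, so only a strict overshoot says that the last
record ran past the end.  The helper below is hypothetical and walks
plain record lengths rather than directory entries.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Walk variable-length records whose sizes are given in lens[].
 * Returns true if the records end exactly at 'end', false if a zero
 * length is seen or the last record overshoots the end.
 */
static bool
walk_ends_cleanly(const size_t *lens, size_t nr, size_t end)
{
	size_t	pos = 0;
	size_t	i = 0;

	while (pos < end) {
		if (i == nr || lens[i] == 0)
			return false;	/* ran out of records or no progress */
		pos += lens[i++];
	}

	/* Here pos >= end always holds; only pos > end is an error. */
	return pos == end;
}
```

With lengths 16, 32, 16 and an end of 64 the walk ends cleanly; with
16, 32, 24 the last record lands at 72 and the walk is rejected.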

Otherwise looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH v2 28/30] xfs: scrub directory parent pointers
  2017-10-17  0:16   ` [PATCH v2 " Darrick J. Wong
@ 2017-10-17 22:11     ` Dave Chinner
  0 siblings, 0 replies; 99+ messages in thread
From: Dave Chinner @ 2017-10-17 22:11 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 05:16:03PM -0700, Darrick J. Wong wrote:
> Scrub parent pointers, sort of.  For directories, we can ride the
> '..' entry up to the parent to confirm that there's at most one
> dentry that points back to this directory.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: refactor the single-ilock-retry code into a separate validate function

Looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 25/30] xfs: scrub directory freespace
  2017-10-17 22:08     ` Dave Chinner
@ 2017-10-17 23:51       ` Darrick J. Wong
  0 siblings, 0 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-17 23:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Oct 18, 2017 at 09:08:35AM +1100, Dave Chinner wrote:
> On Mon, Oct 16, 2017 at 06:10:45PM -0700, Darrick J. Wong wrote:
> > Check the free space information in a directory.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > v2: make the freespace and leaf checks more complete
> > ---
> .....
> > +	/* Make sure the bestfrees are actually the best free spaces. */
> > +	ptr = (char *)d_ops->data_entry_p(bp->b_addr);
> > +	if (is_block) {
> > +		struct xfs_dir2_block_tail	*btp;
> > +
> > +		btp = xfs_dir2_block_tail_p(mp->m_dir_geo, bp->b_addr);
> > +		endptr = (char *)xfs_dir2_block_leaf_p(btp);
> > +	} else
> > +		endptr = (char *)bp->b_addr + BBTOB(bp->b_length);

/* Iterate the entries, stopping when we hit or go past the end. */

> > +	while (ptr < endptr) {
> > +		dup = (struct xfs_dir2_data_unused *)ptr;
> > +		/* Skip real entries */
> > +		if (dup->freetag != cpu_to_be16(XFS_DIR2_DATA_FREE_TAG)) {
> > +			struct xfs_dir2_data_entry	*dep;
> > +
> > +			dep = (struct xfs_dir2_data_entry *)ptr;
> > +			newlen = d_ops->data_entsize(dep->namelen);
> > +			if (newlen <= 0) {
> > +				xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK,
> > +						lblk);
> > +				goto out_buf;
> > +			}
> > +			ptr += newlen;
> > +			continue;
> > +		}
> > +
> > +		/* Spot check this free entry */
> > +		tag = be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup));
> > +		if (tag != ((char *)dup - (char *)bp->b_addr))
> > +			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
> > +
> > +		/*
> > +		 * Either this entry is a bestfree or it's smaller than
> > +		 * any of the bestfrees.
> > +		 */
> > +		xfs_scrub_directory_check_free_entry(sc, lblk, bf, dup);
> > +
> > +		/* Move on. */
> > +		newlen = be16_to_cpu(dup->length);
> > +		if (newlen <= 0) {
> > +			xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
> > +			goto out_buf;
> > +		}
> > +		ptr += newlen;
> > +		if (ptr <= endptr)
> > +			nr_frees++;
> > +	}
> > +
> > +	/* Did we go off the end? */

/* We're required to fill all the space. */

--D

> > +	if (ptr > endptr)
> > +		xfs_scrub_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
> 
> ptr >= endptr?
> 
> Otherwise looks good.
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread
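
The exchange above turns on whether the post-loop test should be
`ptr > endptr` or `ptr >= endptr`.  Going by Darrick's replacement comment
("We're required to fill all the space"), landing exactly on the end is
the valid outcome and only an overshoot is corruption.  The following is a
minimal userspace sketch of that termination logic, not the kernel code:
the record layout, the `walk_records` helper and the `walk_result` codes
are all assumptions made up for illustration, and offsets are used instead
of raw pointers so that overshooting the buffer stays well defined in
standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch; not kernel identifiers. */
enum walk_result { WALK_OK, WALK_BAD_LEN, WALK_OVERRUN };

/*
 * Toy record layout (an assumption, far simpler than the XFS directory
 * format): every record starts with a 2-byte big-endian length that
 * covers the whole record, header included.
 */
static uint16_t rec_len(const unsigned char *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Walk the records in buf[0..end) and count the ones that fit. */
static enum walk_result walk_records(const unsigned char *buf, size_t end,
				     unsigned int *nr)
{
	size_t off = 0;

	*nr = 0;
	/* Iterate the records, stopping when we hit or go past the end. */
	while (off < end) {
		uint16_t len;

		/* Not enough room left to even read a length header. */
		if (end - off < 2)
			return WALK_OVERRUN;

		len = rec_len(buf + off);
		/* A length shorter than the header would never advance sanely. */
		if (len < 2)
			return WALK_BAD_LEN;

		off += len;
		if (off <= end)
			(*nr)++;
	}

	/*
	 * off == end: the records tile the region exactly, which is what a
	 * well-formed block looks like.  off > end: the last record claims
	 * more space than exists.
	 */
	return off > end ? WALK_OVERRUN : WALK_OK;
}
```

With this shape, testing `off >= end` after the loop would flag every
well-formed buffer, because the loop can only exit once `off` has reached
or passed `end`.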

* Re: [PATCH 14/30] xfs: scrub the secondary superblocks
  2017-10-16  5:16   ` Dave Chinner
@ 2017-10-20 23:34     ` Darrick J. Wong
  0 siblings, 0 replies; 99+ messages in thread
From: Darrick J. Wong @ 2017-10-20 23:34 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Oct 16, 2017 at 04:16:44PM +1100, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 06:42:16PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Ensure that the geometry presented in the backup superblocks matches
> > the primary superblock so that repair can recover the filesystem if
> > that primary gets corrupted.
> 
> I've noticed that scrub on certain fstests will report PREEN for
> secondary superblocks and repair thinks there is nothing wrong and
> doesn't fix them. I'm not sure which field it's complaining about,
> but at this point I don't see this as a blocker. Follow up patches
> would be fine.

These are the sb fields that currently trigger preen reports:

rootino rbmino rsumino imax_pct uquotino gquotino pquotino unit width
uuid fname

rootino: This will always be the first inode in the first possible
inode cluster in the filesystem, right?  mkfs copies rootino to the last
and the middle superblock, though if the location is fixed then we don't
strictly need to propagate it, do we?

rbmino/rsumino: have fixed ino numbers (rootino + 1 and + 2,
respectively) and are always rebuilt by repair.

uquotino/gquotino/pquotino can be totally rebuilt by quotacheck, right?
So if repair zaps these (and it does) either as part of a secondary ->
primary rebuild or for any other reason then it doesn't matter.

imax_pct: only set by mkfs and copied to all the other sb's.  This
might be mislabeled as a preen.

uuid/fname: seems to be set in all sb copies by xfs_db.  Maybe this one
is mislabeled?

unit/width: will be copied into sb 0 from mount options.  We only lose
information if repair recovers the primary sb and the mount options
don't include unit/width information.

Now for sb_versionnum, sb_version2, and the v5 feature bits:

versionnum: quotabit is turned on any time we mount with quota, and
quota has to be activated with mount options, right?

nlinkbit is always turned on at mount time...

versionnum attrbit: turned on any time we set an attr, and nothing much
(other than setting attr/attr2 bits in the sb) seems to depend on whether
or not this is set.

features2 attr2bit: I think the attr2 bit can be controlled by mounting
with 'attr2' and 'noattr2', right?  So, similar to unit/width we only
lose information if repair recovers a superblock and the mount options
don't include attr2 information, and even then XFS seems to like to turn
it on except in the noattr2 case.

So I get the impression that I might be able to turn uuid/fname/imax_pct
into regular _corrupt checks since (AFAICT) xfsprogs actually updates
them correctly, and simply drop the other fields from checking.

--D

> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 99+ messages in thread
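
The tentative conclusion of the message above is to treat a mismatch in
uuid, fname or imax_pct as corruption and to stop comparing the other
listed fields.  The following is a minimal sketch of what such a check
could look like, assuming that proposal is adopted as stated; it is not
the scrub code from the patch.  `struct toy_sb`, `check_secondary` and
the `sb_verdict` values are hypothetical names, and the structure holds
only the fields named in the message, with sizes chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical verdicts for this sketch. */
enum sb_verdict { SB_CLEAN, SB_CORRUPT };

/* Simplified stand-in for a superblock; not the on-disk layout. */
struct toy_sb {
	uint64_t	rootino, rbmino, rsumino;
	uint64_t	uquotino, gquotino, pquotino;
	uint32_t	unit, width;
	uint8_t		imax_pct;
	uint8_t		uuid[16];
	char		fname[12];
};

/* Compare a secondary superblock against the primary. */
static enum sb_verdict check_secondary(const struct toy_sb *primary,
				       const struct toy_sb *secondary)
{
	/*
	 * Fields that the message says userspace tools keep in sync across
	 * all copies: a mismatch is treated as corruption.
	 */
	if (secondary->imax_pct != primary->imax_pct)
		return SB_CORRUPT;
	if (memcmp(secondary->uuid, primary->uuid, sizeof(primary->uuid)))
		return SB_CORRUPT;
	if (memcmp(secondary->fname, primary->fname, sizeof(primary->fname)))
		return SB_CORRUPT;

	/*
	 * rootino, rbmino, rsumino, the quota inodes and unit/width are
	 * deliberately not compared: per the message they are either at
	 * fixed locations, rebuilt by repair or quotacheck, or refreshed
	 * from mount options.
	 */
	return SB_CLEAN;
}
```

Under this sketch a secondary whose quota inode numbers or stripe
geometry differ from the primary still comes back `SB_CLEAN`, while a
differing `imax_pct`, `uuid` or `fname` returns `SB_CORRUPT`.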

end of thread, other threads:[~2017-10-20 23:34 UTC | newest]

Thread overview: 99+ messages
2017-10-12  1:40 [PATCH v12 00/30] xfs: online scrub support Darrick J. Wong
2017-10-12  1:40 ` [PATCH 01/30] xfs: return a distinct error code value for IGET_INCORE cache misses Darrick J. Wong
2017-10-12  5:25   ` Dave Chinner
2017-10-12  1:40 ` [PATCH 02/30] xfs: create block pointer check functions Darrick J. Wong
2017-10-12  5:28   ` Dave Chinner
2017-10-12  5:48     ` Dave Chinner
2017-10-16 19:46       ` Darrick J. Wong
2017-10-12  1:41 ` [PATCH 03/30] xfs: refactor btree pointer checks Darrick J. Wong
2017-10-12  5:51   ` Dave Chinner
2017-10-12  1:41 ` [PATCH 04/30] xfs: refactor btree block header checking functions Darrick J. Wong
2017-10-13  1:01   ` Dave Chinner
2017-10-13 21:15     ` Darrick J. Wong
2017-10-16 19:48   ` [PATCH v2 " Darrick J. Wong
2017-10-16 23:36     ` Dave Chinner
2017-10-12  1:41 ` [PATCH 05/30] xfs: create inode pointer verifiers Darrick J. Wong
2017-10-12 20:23   ` Darrick J. Wong
2017-10-13  5:22     ` Dave Chinner
2017-10-13 16:16       ` Darrick J. Wong
2017-10-16 19:49   ` [PATCH v2 " Darrick J. Wong
2017-10-16 23:53     ` Dave Chinner
2017-10-12  1:41 ` [PATCH 06/30] xfs: create an ioctl to scrub AG metadata Darrick J. Wong
2017-10-16  0:08   ` Dave Chinner
2017-10-12  1:41 ` [PATCH 07/30] xfs: dispatch metadata scrub subcommands Darrick J. Wong
2017-10-16  0:26   ` Dave Chinner
2017-10-12  1:41 ` [PATCH 08/30] xfs: probe the scrub ioctl Darrick J. Wong
2017-10-16  0:39   ` Dave Chinner
2017-10-16 19:54     ` Darrick J. Wong
2017-10-16 23:05       ` Dave Chinner
2017-10-12  1:41 ` [PATCH 09/30] xfs: create helpers to record and deal with scrub problems Darrick J. Wong
2017-10-16  0:40   ` Dave Chinner
2017-10-12  1:41 ` [PATCH 10/30] xfs: create helpers to scrub a metadata btree Darrick J. Wong
2017-10-16  0:56   ` Dave Chinner
2017-10-12  1:41 ` [PATCH 11/30] xfs: scrub the shape of " Darrick J. Wong
2017-10-16  1:29   ` Dave Chinner
2017-10-16 20:09     ` Darrick J. Wong
2017-10-12  1:42 ` [PATCH 12/30] xfs: scrub btree keys and records Darrick J. Wong
2017-10-16  1:31   ` Dave Chinner
2017-10-12  1:42 ` [PATCH 13/30] xfs: create helpers to scan an allocation group Darrick J. Wong
2017-10-16  1:32   ` Dave Chinner
2017-10-12  1:42 ` [PATCH 14/30] xfs: scrub the secondary superblocks Darrick J. Wong
2017-10-16  5:16   ` Dave Chinner
2017-10-20 23:34     ` Darrick J. Wong
2017-10-12  1:42 ` [PATCH 15/30] xfs: scrub AGF and AGFL Darrick J. Wong
2017-10-16  2:18   ` Dave Chinner
2017-10-12  1:42 ` [PATCH 16/30] xfs: scrub the AGI Darrick J. Wong
2017-10-16  2:19   ` Dave Chinner
2017-10-12  1:42 ` [PATCH 17/30] xfs: scrub free space btrees Darrick J. Wong
2017-10-16  2:25   ` Dave Chinner
2017-10-16 20:36     ` Darrick J. Wong
2017-10-12  1:42 ` [PATCH 18/30] xfs: scrub inode btrees Darrick J. Wong
2017-10-16  2:55   ` Dave Chinner
2017-10-16 22:16     ` Darrick J. Wong
2017-10-17  0:11   ` [PATCH v2 " Darrick J. Wong
2017-10-17 21:59     ` Dave Chinner
2017-10-12  1:42 ` [PATCH 19/30] xfs: scrub rmap btrees Darrick J. Wong
2017-10-16  3:01   ` Dave Chinner
2017-10-12  1:42 ` [PATCH 20/30] xfs: scrub refcount btrees Darrick J. Wong
2017-10-16  3:02   ` Dave Chinner
2017-10-12  1:43 ` [PATCH 21/30] xfs: scrub inodes Darrick J. Wong
2017-10-12 22:32   ` Darrick J. Wong
2017-10-16  3:16     ` Dave Chinner
2017-10-16 22:08       ` Darrick J. Wong
2017-10-17  0:13   ` [PATCH v2 " Darrick J. Wong
2017-10-17 22:01     ` Dave Chinner
2017-10-12  1:43 ` [PATCH 22/30] xfs: scrub inode block mappings Darrick J. Wong
2017-10-16  3:26   ` Dave Chinner
2017-10-16 20:43     ` Darrick J. Wong
2017-10-12  1:43 ` [PATCH 23/30] xfs: scrub directory/attribute btrees Darrick J. Wong
2017-10-16  4:13   ` Dave Chinner
2017-10-12  1:43 ` [PATCH 24/30] xfs: scrub directory metadata Darrick J. Wong
2017-10-16  4:29   ` Dave Chinner
2017-10-16 20:46     ` Darrick J. Wong
2017-10-17  0:14   ` [PATCH v2 " Darrick J. Wong
2017-10-17 22:06     ` Dave Chinner
2017-10-12  1:43 ` [PATCH 25/30] xfs: scrub directory freespace Darrick J. Wong
2017-10-16  4:49   ` Dave Chinner
2017-10-16 22:37     ` Darrick J. Wong
2017-10-16 23:11       ` Darrick J. Wong
2017-10-16 23:14       ` Dave Chinner
2017-10-16 23:38         ` Darrick J. Wong
2017-10-17  1:10   ` [PATCH v2 " Darrick J. Wong
2017-10-17 22:08     ` Dave Chinner
2017-10-17 23:51       ` Darrick J. Wong
2017-10-12  1:43 ` [PATCH 26/30] xfs: scrub extended attributes Darrick J. Wong
2017-10-16  4:50   ` Dave Chinner
2017-10-12  1:43 ` [PATCH 27/30] xfs: scrub symbolic links Darrick J. Wong
2017-10-16  4:52   ` Dave Chinner
2017-10-12  1:43 ` [PATCH 28/30] xfs: scrub directory parent pointers Darrick J. Wong
2017-10-16  5:09   ` Dave Chinner
2017-10-16 21:46     ` Darrick J. Wong
2017-10-16 23:30       ` Dave Chinner
2017-10-16 23:58         ` Darrick J. Wong
2017-10-17  0:16   ` [PATCH v2 " Darrick J. Wong
2017-10-17 22:11     ` Dave Chinner
2017-10-12  1:43 ` [PATCH 29/30] xfs: scrub realtime bitmap/summary Darrick J. Wong
2017-10-16  5:11   ` Dave Chinner
2017-10-12  1:44 ` [PATCH 30/30] xfs: scrub quota information Darrick J. Wong
2017-10-16  5:12   ` Dave Chinner
2017-10-17  1:11     ` Darrick J. Wong
