All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] xfs: inode scrubber fixes
@ 2019-01-01  2:08 Darrick J. Wong
  2019-01-01  2:08 ` [PATCH 1/8] xfs: never try to scrub more than 64 inodes per inobt record Darrick J. Wong
                   ` (7 more replies)
  0 siblings, 8 replies; 30+ messages in thread
From: Darrick J. Wong @ 2019-01-01  2:08 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

Hi all,

Here are fixes for some problems with the inode btree scrub code, namely
that the existing code does not handle the case where a single inode
cluster is mapped by multiple inobt records.

Patch 1 corrects a condition where we needed to clamp the number of
inodes checked for a given inobt record to the inode chunk size.

Patches 2-3 move the inobt record alignment checks to a separate
function and enhance the function to check that when we have more than
one inobt record per cluster we actually check that *all* of the
necessary records are present and in the correct order.  This patch
includes fixes for the finobt alignment false positives recently
reported by Chandan.

Patches 4-6 reorganize the inobt free data checks to deal with the
"multiple inobt records per icluster" situation.  In restructuring the
code to do so, we also rename variables and functions to be less
confusing about what they're there for.  We also fix the 'is the inode
free?' check to calculate dinode buffer offsets correctly in the
"multiple inobt records per icluster" situation.

Patch 7 aborts the xattr scrub loop if there are pending fatal signals.

Patch 8 checks that for any directory or attr fork there are no extent
maps that stretch beyond what a xfs_dablk_t can map.

If you're going to start using this mess, you probably ought to just
pull from my git trees.  The kernel patches[1] should apply against
4.20.

Comments and questions are, as always, welcome.

--D

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/8] xfs: never try to scrub more than 64 inodes per inobt record
  2019-01-01  2:08 [PATCH 0/8] xfs: inode scrubber fixes Darrick J. Wong
@ 2019-01-01  2:08 ` Darrick J. Wong
  2019-01-02 12:39   ` Carlos Maiolino
  2019-01-04 18:30   ` Brian Foster
  2019-01-01  2:08 ` [PATCH 2/8] xfs: check the ir_startino alignment directly Darrick J. Wong
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 30+ messages in thread
From: Darrick J. Wong @ 2019-01-01  2:08 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Make sure we never check more than XFS_INODES_PER_CHUNK inodes for any
given inobt record since there can be more than one inobt record mapped
to an inode cluster.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/ialloc.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index 882dc56c5c21..fd431682db0b 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -203,7 +203,8 @@ xchk_iallocbt_check_freemask(
 	int				error = 0;
 
 	/* Make sure the freemask matches the inode records. */
-	nr_inodes = mp->m_inodes_per_cluster;
+	nr_inodes = min_t(unsigned int, XFS_INODES_PER_CHUNK,
+			mp->m_inodes_per_cluster);
 
 	for (agino = irec->ir_startino;
 	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/8] xfs: check the ir_startino alignment directly
  2019-01-01  2:08 [PATCH 0/8] xfs: inode scrubber fixes Darrick J. Wong
  2019-01-01  2:08 ` [PATCH 1/8] xfs: never try to scrub more than 64 inodes per inobt record Darrick J. Wong
@ 2019-01-01  2:08 ` Darrick J. Wong
  2019-01-04 18:31   ` Brian Foster
  2019-01-01  2:08 ` [PATCH 3/8] xfs: check inobt record alignment on big block filesystems Darrick J. Wong
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2019-01-01  2:08 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

In xchk_iallocbt_rec, check the alignment of ir_startino by converting
the inode cluster block alignment into units of inodes instead of the
other way around (converting ir_startino to blocks).  This prevents us
from tripping over off-by-one errors in ir_startino which are obscured
by the inode -> block conversion.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/ialloc.c |   45 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 39 insertions(+), 6 deletions(-)


diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index fd431682db0b..5082331d6c03 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -265,6 +265,42 @@ xchk_iallocbt_check_freemask(
 	return error;
 }
 
+/* Make sure this inode btree record is aligned properly. */
+STATIC void
+xchk_iallocbt_rec_alignment(
+	struct xchk_btree		*bs,
+	struct xfs_inobt_rec_incore	*irec)
+{
+	struct xfs_mount		*mp = bs->sc->mp;
+
+	/*
+	 * finobt records have different positioning requirements than inobt
+	 * records: each finobt record must have a corresponding inobt record.
+	 * That is checked in the xref function, so for now we only catch the
+	 * obvious case where the record isn't even chunk-aligned.
+	 *
+	 * Note also that if a fs block contains more than a single chunk of
+	 * inodes, we will have finobt records only for those chunks containing
+	 * free inodes.
+	 */
+	if (bs->cur->bc_btnum == XFS_BTNUM_FINO) {
+		if (irec->ir_startino & (XFS_INODES_PER_CHUNK - 1))
+			xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
+		return;
+	}
+
+	/* inobt records must be aligned to cluster and inoalignmnt size. */
+	if (irec->ir_startino & (mp->m_cluster_align_inodes - 1)) {
+		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
+		return;
+	}
+
+	if (irec->ir_startino & (mp->m_inodes_per_cluster - 1)) {
+		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
+		return;
+	}
+}
+
 /* Scrub an inobt/finobt record. */
 STATIC int
 xchk_iallocbt_rec(
@@ -277,7 +313,6 @@ xchk_iallocbt_rec(
 	uint64_t			holes;
 	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
 	xfs_agino_t			agino;
-	xfs_agblock_t			agbno;
 	xfs_extlen_t			len;
 	int				holecount;
 	int				i;
@@ -304,11 +339,9 @@ xchk_iallocbt_rec(
 		goto out;
 	}
 
-	/* Make sure this record is aligned to cluster and inoalignmnt size. */
-	agbno = XFS_AGINO_TO_AGBNO(mp, irec.ir_startino);
-	if ((agbno & (mp->m_cluster_align - 1)) ||
-	    (agbno & (mp->m_blocks_per_cluster - 1)))
-		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
+	xchk_iallocbt_rec_alignment(bs, &irec);
+	if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+		goto out;
 
 	iabt->inodes += irec.ir_count;
 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/8] xfs: check inobt record alignment on big block filesystems
  2019-01-01  2:08 [PATCH 0/8] xfs: inode scrubber fixes Darrick J. Wong
  2019-01-01  2:08 ` [PATCH 1/8] xfs: never try to scrub more than 64 inodes per inobt record Darrick J. Wong
  2019-01-01  2:08 ` [PATCH 2/8] xfs: check the ir_startino alignment directly Darrick J. Wong
@ 2019-01-01  2:08 ` Darrick J. Wong
  2019-01-04 18:31   ` Brian Foster
  2019-01-01  2:08 ` [PATCH 4/8] xfs: hoist inode cluster checks out of loop Darrick J. Wong
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2019-01-01  2:08 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, Dave Chinner

From: Darrick J. Wong <darrick.wong@oracle.com>

On a big block filesystem, there may be multiple inobt records covering
a single inode cluster.  These records obviously won't be aligned to
cluster alignment rules, and they must cover the entire cluster.  Teach
scrub to check for these things.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/scrub/ialloc.c |   41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)


diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index 5082331d6c03..5e593e4292b2 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -47,6 +47,12 @@ xchk_setup_ag_iallocbt(
 struct xchk_iallocbt {
 	/* Number of inodes we see while scanning inobt. */
 	unsigned long long	inodes;
+
+	/* Expected next startino, for big block filesystems. */
+	xfs_agino_t		next_startino;
+
+	/* Expected end of the current inode cluster. */
+	xfs_agino_t		next_cluster_ino;
 };
 
 /*
@@ -272,6 +278,7 @@ xchk_iallocbt_rec_alignment(
 	struct xfs_inobt_rec_incore	*irec)
 {
 	struct xfs_mount		*mp = bs->sc->mp;
+	struct xchk_iallocbt		*iabt = bs->private;
 
 	/*
 	 * finobt records have different positioning requirements than inobt
@@ -289,6 +296,27 @@ xchk_iallocbt_rec_alignment(
 		return;
 	}
 
+	if (iabt->next_startino != NULLAGINO) {
+		/*
+		 * We're midway through a cluster of inodes that is mapped by
+		 * multiple inobt records.  Did we get the record for the next
+		 * irec in the sequence?
+		 */
+		if (irec->ir_startino != iabt->next_startino) {
+			xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
+			return;
+		}
+
+		iabt->next_startino += XFS_INODES_PER_CHUNK;
+
+		/* Are we done with the cluster? */
+		if (iabt->next_startino >= iabt->next_cluster_ino) {
+			iabt->next_startino = NULLAGINO;
+			iabt->next_cluster_ino = NULLAGINO;
+		}
+		return;
+	}
+
 	/* inobt records must be aligned to cluster and inoalignmnt size. */
 	if (irec->ir_startino & (mp->m_cluster_align_inodes - 1)) {
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
@@ -299,6 +327,17 @@ xchk_iallocbt_rec_alignment(
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 		return;
 	}
+
+	if (mp->m_inodes_per_cluster <= XFS_INODES_PER_CHUNK)
+		return;
+
+	/*
+	 * If this is the start of an inode cluster that can be mapped by
+	 * multiple inobt records, the next inobt record must follow exactly
+	 * after this one.
+	 */
+	iabt->next_startino = irec->ir_startino + XFS_INODES_PER_CHUNK;
+	iabt->next_cluster_ino = irec->ir_startino + mp->m_inodes_per_cluster;
 }
 
 /* Scrub an inobt/finobt record. */
@@ -463,6 +502,8 @@ xchk_iallocbt(
 	struct xfs_btree_cur	*cur;
 	struct xchk_iallocbt	iabt = {
 		.inodes		= 0,
+		.next_startino	= NULLAGINO,
+		.next_cluster_ino = NULLAGINO,
 	};
 	int			error;
 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/8] xfs: hoist inode cluster checks out of loop
  2019-01-01  2:08 [PATCH 0/8] xfs: inode scrubber fixes Darrick J. Wong
                   ` (2 preceding siblings ...)
  2019-01-01  2:08 ` [PATCH 3/8] xfs: check inobt record alignment on big block filesystems Darrick J. Wong
@ 2019-01-01  2:08 ` Darrick J. Wong
  2019-01-04 18:31   ` Brian Foster
  2019-01-01  2:08 ` [PATCH 5/8] xfs: clean up the inode cluster checking in the inobt scrub Darrick J. Wong
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2019-01-01  2:08 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, Dave Chinner

From: Darrick J. Wong <darrick.wong@oracle.com>

Hoist the inode cluster checks out of the inobt record check loop into
a separate function in preparation for refactoring of that loop.  No
functional changes here; that's in the next patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/scrub/ialloc.c |  119 +++++++++++++++++++++++++++----------------------
 1 file changed, 65 insertions(+), 54 deletions(-)


diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index 5e593e4292b2..9af01ea0258d 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -188,19 +188,19 @@ xchk_iallocbt_check_cluster_freemask(
 	return 0;
 }
 
-/* Make sure the free mask is consistent with what the inodes think. */
+/* Check an inode cluster. */
 STATIC int
-xchk_iallocbt_check_freemask(
+xchk_iallocbt_check_cluster(
 	struct xchk_btree		*bs,
-	struct xfs_inobt_rec_incore	*irec)
+	struct xfs_inobt_rec_incore	*irec,
+	xfs_agino_t			agino)
 {
 	struct xfs_imap			imap;
 	struct xfs_mount		*mp = bs->cur->bc_mp;
 	struct xfs_dinode		*dip;
 	struct xfs_buf			*bp;
 	xfs_ino_t			fsino;
-	xfs_agino_t			nr_inodes;
-	xfs_agino_t			agino;
+	unsigned int			nr_inodes;
 	xfs_agino_t			chunkino;
 	xfs_agino_t			clusterino;
 	xfs_agblock_t			agbno;
@@ -212,60 +212,71 @@ xchk_iallocbt_check_freemask(
 	nr_inodes = min_t(unsigned int, XFS_INODES_PER_CHUNK,
 			mp->m_inodes_per_cluster);
 
-	for (agino = irec->ir_startino;
-	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;
-	     agino += mp->m_inodes_per_cluster) {
-		fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
-		chunkino = agino - irec->ir_startino;
-		agbno = XFS_AGINO_TO_AGBNO(mp, agino);
-
-		/* Compute the holemask mask for this cluster. */
-		for (clusterino = 0, holemask = 0; clusterino < nr_inodes;
-		     clusterino += XFS_INODES_PER_HOLEMASK_BIT)
-			holemask |= XFS_INOBT_MASK((chunkino + clusterino) /
-					XFS_INODES_PER_HOLEMASK_BIT);
-
-		/* The whole cluster must be a hole or not a hole. */
-		ir_holemask = (irec->ir_holemask & holemask);
-		if (ir_holemask != holemask && ir_holemask != 0) {
-			xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
-			continue;
-		}
+	fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
+	chunkino = agino - irec->ir_startino;
+	agbno = XFS_AGINO_TO_AGBNO(mp, agino);
 
-		/* If any part of this is a hole, skip it. */
-		if (ir_holemask) {
-			xchk_xref_is_not_owned_by(bs->sc, agbno,
-					mp->m_blocks_per_cluster,
-					&XFS_RMAP_OINFO_INODES);
-			continue;
-		}
+	/* Compute the holemask mask for this cluster. */
+	for (clusterino = 0, holemask = 0; clusterino < nr_inodes;
+	     clusterino += XFS_INODES_PER_HOLEMASK_BIT)
+		holemask |= XFS_INOBT_MASK((chunkino + clusterino) /
+				XFS_INODES_PER_HOLEMASK_BIT);
 
-		xchk_xref_is_owned_by(bs->sc, agbno, mp->m_blocks_per_cluster,
+	/* The whole cluster must be a hole or not a hole. */
+	ir_holemask = (irec->ir_holemask & holemask);
+	if (ir_holemask != holemask && ir_holemask != 0) {
+		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
+		return 0;
+	}
+
+	/* If any part of this is a hole, skip it. */
+	if (ir_holemask) {
+		xchk_xref_is_not_owned_by(bs->sc, agbno,
+				mp->m_blocks_per_cluster,
 				&XFS_RMAP_OINFO_INODES);
+		return 0;
+	}
 
-		/* Grab the inode cluster buffer. */
-		imap.im_blkno = XFS_AGB_TO_DADDR(mp, bs->cur->bc_private.a.agno,
-				agbno);
-		imap.im_len = XFS_FSB_TO_BB(mp, mp->m_blocks_per_cluster);
-		imap.im_boffset = 0;
-
-		error = xfs_imap_to_bp(mp, bs->cur->bc_tp, &imap,
-				&dip, &bp, 0, 0);
-		if (!xchk_btree_xref_process_error(bs->sc, bs->cur, 0,
-				&error))
-			continue;
-
-		/* Which inodes are free? */
-		for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
-			error = xchk_iallocbt_check_cluster_freemask(bs,
-					fsino, chunkino, clusterino, irec, bp);
-			if (error) {
-				xfs_trans_brelse(bs->cur->bc_tp, bp);
-				return error;
-			}
-		}
+	xchk_xref_is_owned_by(bs->sc, agbno, mp->m_blocks_per_cluster,
+			&XFS_RMAP_OINFO_INODES);
+
+	/* Grab the inode cluster buffer. */
+	imap.im_blkno = XFS_AGB_TO_DADDR(mp, bs->cur->bc_private.a.agno, agbno);
+	imap.im_len = XFS_FSB_TO_BB(mp, mp->m_blocks_per_cluster);
+	imap.im_boffset = 0;
 
-		xfs_trans_brelse(bs->cur->bc_tp, bp);
+	error = xfs_imap_to_bp(mp, bs->cur->bc_tp, &imap, &dip, &bp, 0, 0);
+	if (!xchk_btree_xref_process_error(bs->sc, bs->cur, 0, &error))
+		return 0;
+
+	/* Which inodes are free? */
+	for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
+		error = xchk_iallocbt_check_cluster_freemask(bs, fsino,
+				chunkino, clusterino, irec, bp);
+		if (error)
+			break;
+	}
+
+	xfs_trans_brelse(bs->cur->bc_tp, bp);
+	return error;
+}
+
+/* Make sure the free mask is consistent with what the inodes think. */
+STATIC int
+xchk_iallocbt_check_freemask(
+	struct xchk_btree		*bs,
+	struct xfs_inobt_rec_incore	*irec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	xfs_agino_t			agino;
+	int				error = 0;
+
+	for (agino = irec->ir_startino;
+	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;
+	     agino += mp->m_blocks_per_cluster * mp->m_sb.sb_inopblock) {
+		error = xchk_iallocbt_check_cluster(bs, irec, agino);
+		if (error)
+			break;
 	}
 
 	return error;

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 5/8] xfs: clean up the inode cluster checking in the inobt scrub
  2019-01-01  2:08 [PATCH 0/8] xfs: inode scrubber fixes Darrick J. Wong
                   ` (3 preceding siblings ...)
  2019-01-01  2:08 ` [PATCH 4/8] xfs: hoist inode cluster checks out of loop Darrick J. Wong
@ 2019-01-01  2:08 ` Darrick J. Wong
  2019-01-04 18:32   ` Brian Foster
  2019-01-01  2:09 ` [PATCH 6/8] xfs: scrub big block inode btrees correctly Darrick J. Wong
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2019-01-01  2:08 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

The code to check inobt records against inode clusters is a mess of
poorly named variables and unnecessary parameters.  Clean the
unnecessary inode number parameters out of _check_cluster_freemask in
favor of computing them inside the function instead of making the caller
do it.  In xchk_iallocbt_check_cluster, rename the variables to make it
more obvious just what chunk_ino and cluster_ino represent.

Add a tracepoint to make it easier to track each inode cluster as we
scrub it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/ialloc.c |  162 ++++++++++++++++++++++++++++++++-----------------
 fs/xfs/scrub/trace.h  |   45 ++++++++++++++
 2 files changed, 151 insertions(+), 56 deletions(-)


diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index 9af01ea0258d..2f6c2d7fa3fd 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -134,41 +134,70 @@ xchk_iallocbt_freecount(
 	return hweight64(freemask);
 }
 
-/* Check a particular inode with ir_free. */
+/*
+ * Check that an inode's allocation status matches ir_free in the inobt
+ * record.  First we try querying the in-core inode state, and if the inode
+ * isn't loaded we examine the on-disk inode directly.
+ *
+ * Since there can be 1:M and M:1 mappings between inobt records and inode
+ * clusters, we pass in the inode location information as an inobt record;
+ * the index of an inode cluster within the inobt record (as well as the
+ * cluster buffer itself); and the index of the inode within the cluster.
+ *
+ * @irec is the inobt record.
+ * @cluster_base is the inode offset of the cluster within the @irec.
+ * @cluster_bp is the cluster buffer.
+ * @cluster_index is the inode offset within the inode cluster.
+ */
 STATIC int
-xchk_iallocbt_check_cluster_freemask(
+xchk_iallocbt_check_cluster_ifree(
 	struct xchk_btree		*bs,
-	xfs_ino_t			fsino,
-	xfs_agino_t			chunkino,
-	xfs_agino_t			clusterino,
 	struct xfs_inobt_rec_incore	*irec,
-	struct xfs_buf			*bp)
+	unsigned int			cluster_base,
+	struct xfs_buf			*cluster_bp,
+	unsigned int			cluster_index)
 {
-	struct xfs_dinode		*dip;
 	struct xfs_mount		*mp = bs->cur->bc_mp;
-	bool				inode_is_free = false;
+	struct xfs_dinode		*dip;
+	xfs_ino_t			fsino;
+	xfs_agino_t			agino;
+	unsigned int			offset;
+	bool				irec_free;
+	bool				ino_inuse;
 	bool				freemask_ok;
-	bool				inuse;
-	int				error = 0;
+	int				error;
 
 	if (xchk_should_terminate(bs->sc, &error))
 		return error;
 
-	dip = xfs_buf_offset(bp, clusterino * mp->m_sb.sb_inodesize);
+	/*
+	 * Given an inobt record, an offset of a cluster within the record,
+	 * and an offset of an inode within a cluster, compute which fs inode
+	 * we're talking about and the offset of the inode record within the
+	 * inode buffer.
+	 */
+	agino = irec->ir_startino + cluster_base + cluster_index;
+	fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
+	offset = cluster_index * mp->m_sb.sb_inodesize;
+	if (offset >= BBTOB(cluster_bp->b_length)) {
+		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
+		goto out;
+	}
+	dip = xfs_buf_offset(cluster_bp, offset);
+	irec_free = (irec->ir_free & XFS_INOBT_MASK(cluster_base +
+						    cluster_index));
+
 	if (be16_to_cpu(dip->di_magic) != XFS_DINODE_MAGIC ||
-	    (dip->di_version >= 3 &&
-	     be64_to_cpu(dip->di_ino) != fsino + clusterino)) {
+	    (dip->di_version >= 3 && be64_to_cpu(dip->di_ino) != fsino)) {
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 		goto out;
 	}
 
-	if (irec->ir_free & XFS_INOBT_MASK(chunkino + clusterino))
-		inode_is_free = true;
-	error = xfs_icache_inode_is_allocated(mp, bs->cur->bc_tp,
-			fsino + clusterino, &inuse);
+	error = xfs_icache_inode_is_allocated(mp, bs->cur->bc_tp, fsino,
+			&ino_inuse);
 	if (error == -ENODATA) {
 		/* Not cached, just read the disk buffer */
-		freemask_ok = inode_is_free ^ !!(dip->di_mode);
+		freemask_ok = irec_free ^ !!(dip->di_mode);
 		if (!bs->sc->try_harder && !freemask_ok)
 			return -EDEADLOCK;
 	} else if (error < 0) {
@@ -180,7 +209,7 @@ xchk_iallocbt_check_cluster_freemask(
 		goto out;
 	} else {
 		/* Inode is all there. */
-		freemask_ok = inode_is_free ^ inuse;
+		freemask_ok = irec_free ^ ino_inuse;
 	}
 	if (!freemask_ok)
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
@@ -188,43 +217,57 @@ xchk_iallocbt_check_cluster_freemask(
 	return 0;
 }
 
-/* Check an inode cluster. */
+/*
+ * Check that the holemask and freemask of a hypothetical inode cluster match
+ * what's actually on disk.  If sparse inodes are enabled, the cluster does
+ * not actually have to map to inodes if the corresponding holemask bit is set.
+ *
+ * @cluster_base is the first inode in the cluster within the @irec.
+ */
 STATIC int
 xchk_iallocbt_check_cluster(
 	struct xchk_btree		*bs,
 	struct xfs_inobt_rec_incore	*irec,
-	xfs_agino_t			agino)
+	unsigned int			cluster_base)
 {
 	struct xfs_imap			imap;
 	struct xfs_mount		*mp = bs->cur->bc_mp;
 	struct xfs_dinode		*dip;
-	struct xfs_buf			*bp;
-	xfs_ino_t			fsino;
+	struct xfs_buf			*cluster_bp;
 	unsigned int			nr_inodes;
-	xfs_agino_t			chunkino;
-	xfs_agino_t			clusterino;
+	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
 	xfs_agblock_t			agbno;
-	uint16_t			holemask;
+	unsigned int			cluster_index;
+	uint16_t			cluster_mask = 0;
 	uint16_t			ir_holemask;
 	int				error = 0;
 
-	/* Make sure the freemask matches the inode records. */
 	nr_inodes = min_t(unsigned int, XFS_INODES_PER_CHUNK,
 			mp->m_inodes_per_cluster);
 
-	fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
-	chunkino = agino - irec->ir_startino;
-	agbno = XFS_AGINO_TO_AGBNO(mp, agino);
+	/* Map this inode cluster */
+	agbno = XFS_AGINO_TO_AGBNO(mp, irec->ir_startino + cluster_base);
 
-	/* Compute the holemask mask for this cluster. */
-	for (clusterino = 0, holemask = 0; clusterino < nr_inodes;
-	     clusterino += XFS_INODES_PER_HOLEMASK_BIT)
-		holemask |= XFS_INOBT_MASK((chunkino + clusterino) /
+	/* Compute a bitmask for this cluster that can be used for holemask. */
+	for (cluster_index = 0;
+	     cluster_index < nr_inodes;
+	     cluster_index += XFS_INODES_PER_HOLEMASK_BIT)
+		cluster_mask |= XFS_INOBT_MASK((cluster_base + cluster_index) /
 				XFS_INODES_PER_HOLEMASK_BIT);
 
+	ir_holemask = (irec->ir_holemask & cluster_mask);
+	imap.im_blkno = XFS_AGB_TO_DADDR(mp, agno, agbno);
+	imap.im_len = XFS_FSB_TO_BB(mp, mp->m_blocks_per_cluster);
+	imap.im_boffset = 0;
+
+	trace_xchk_iallocbt_check_cluster(mp, agno, irec->ir_startino,
+			imap.im_blkno, imap.im_len, cluster_base, nr_inodes,
+			cluster_mask, ir_holemask,
+			XFS_INO_TO_OFFSET(mp, irec->ir_startino +
+					  cluster_base));
+
 	/* The whole cluster must be a hole or not a hole. */
-	ir_holemask = (irec->ir_holemask & holemask);
-	if (ir_holemask != holemask && ir_holemask != 0) {
+	if (ir_holemask != cluster_mask && ir_holemask != 0) {
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 		return 0;
 	}
@@ -241,40 +284,47 @@ xchk_iallocbt_check_cluster(
 			&XFS_RMAP_OINFO_INODES);
 
 	/* Grab the inode cluster buffer. */
-	imap.im_blkno = XFS_AGB_TO_DADDR(mp, bs->cur->bc_private.a.agno, agbno);
-	imap.im_len = XFS_FSB_TO_BB(mp, mp->m_blocks_per_cluster);
-	imap.im_boffset = 0;
-
-	error = xfs_imap_to_bp(mp, bs->cur->bc_tp, &imap, &dip, &bp, 0, 0);
+	error = xfs_imap_to_bp(mp, bs->cur->bc_tp, &imap, &dip, &cluster_bp,
+			0, 0);
 	if (!xchk_btree_xref_process_error(bs->sc, bs->cur, 0, &error))
-		return 0;
+		return error;
 
-	/* Which inodes are free? */
-	for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
-		error = xchk_iallocbt_check_cluster_freemask(bs, fsino,
-				chunkino, clusterino, irec, bp);
+	/* Check free status of each inode within this cluster. */
+	for (cluster_index = 0; cluster_index < nr_inodes; cluster_index++) {
+		error = xchk_iallocbt_check_cluster_ifree(bs, irec,
+				cluster_base, cluster_bp, cluster_index);
 		if (error)
 			break;
 	}
 
-	xfs_trans_brelse(bs->cur->bc_tp, bp);
+	xfs_trans_brelse(bs->cur->bc_tp, cluster_bp);
 	return error;
 }
 
-/* Make sure the free mask is consistent with what the inodes think. */
+/*
+ * For all the inode clusters that could map to this inobt record, make sure
+ * that the holemask makes sense and that the allocation status of each inode
+ * matches the freemask.
+ */
 STATIC int
-xchk_iallocbt_check_freemask(
+xchk_iallocbt_check_clusters(
 	struct xchk_btree		*bs,
 	struct xfs_inobt_rec_incore	*irec)
 {
-	struct xfs_mount		*mp = bs->cur->bc_mp;
-	xfs_agino_t			agino;
+	unsigned int			cluster_base;
 	int				error = 0;
 
-	for (agino = irec->ir_startino;
-	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;
-	     agino += mp->m_blocks_per_cluster * mp->m_sb.sb_inopblock) {
-		error = xchk_iallocbt_check_cluster(bs, irec, agino);
+	/*
+	 * For the common case where this inobt record maps to multiple inode
+	 * clusters this will call _check_cluster for each cluster.
+	 *
+	 * For the case that multiple inobt records map to a single cluster,
+	 * this will call _check_cluster once.
+	 */
+	for (cluster_base = 0;
+	     cluster_base < XFS_INODES_PER_CHUNK;
+	     cluster_base += bs->sc->mp->m_inodes_per_cluster) {
+		error = xchk_iallocbt_check_cluster(bs, irec, cluster_base);
 		if (error)
 			break;
 	}
@@ -431,7 +481,7 @@ xchk_iallocbt_rec(
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 
 check_freemask:
-	error = xchk_iallocbt_check_freemask(bs, &irec);
+	error = xchk_iallocbt_check_clusters(bs, &irec);
 	if (error)
 		goto out;
 
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 8344b14031ef..3c83e8b3b39c 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -545,6 +545,51 @@ TRACE_EVENT(xchk_xref_error,
 		  __entry->ret_ip)
 );
 
+TRACE_EVENT(xchk_iallocbt_check_cluster,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+		 xfs_agino_t startino, xfs_daddr_t map_daddr,
+		 unsigned short map_len, unsigned int chunk_ino,
+		 unsigned int nr_inodes, uint16_t cluster_mask,
+		 uint16_t holemask, unsigned int cluster_ino),
+	TP_ARGS(mp, agno, startino, map_daddr, map_len, chunk_ino, nr_inodes,
+		cluster_mask, holemask, cluster_ino),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agino_t, startino)
+		__field(xfs_daddr_t, map_daddr)
+		__field(unsigned short, map_len)
+		__field(unsigned int, chunk_ino)
+		__field(unsigned int, nr_inodes)
+		__field(unsigned int, cluster_ino)
+		__field(uint16_t, cluster_mask)
+		__field(uint16_t, holemask)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->startino = startino;
+		__entry->map_daddr = map_daddr;
+		__entry->map_len = map_len;
+		__entry->chunk_ino = chunk_ino;
+		__entry->nr_inodes = nr_inodes;
+		__entry->cluster_mask = cluster_mask;
+		__entry->holemask = holemask;
+		__entry->cluster_ino = cluster_ino;
+	),
+	TP_printk("dev %d:%d agno %d startino %u daddr 0x%llx len %d chunkino %u nr_inodes %u cluster_mask 0x%x holemask 0x%x cluster_ino %u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->startino,
+		  __entry->map_daddr,
+		  __entry->map_len,
+		  __entry->chunk_ino,
+		  __entry->nr_inodes,
+		  __entry->cluster_mask,
+		  __entry->holemask,
+		  __entry->cluster_ino)
+)
+
 /* repair tracepoints */
 #if IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR)
 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 6/8] xfs: scrub big block inode btrees correctly
  2019-01-01  2:08 [PATCH 0/8] xfs: inode scrubber fixes Darrick J. Wong
                   ` (4 preceding siblings ...)
  2019-01-01  2:08 ` [PATCH 5/8] xfs: clean up the inode cluster checking in the inobt scrub Darrick J. Wong
@ 2019-01-01  2:09 ` Darrick J. Wong
  2019-01-04 18:38   ` Brian Foster
  2019-01-01  2:09 ` [PATCH 7/8] xfs: abort xattr scrub if fatal signals are pending Darrick J. Wong
  2019-01-01  2:09 ` [PATCH 8/8] xfs: scrub should flag dir/attr offsets that aren't mappable with xfs_dablk_t Darrick J. Wong
  7 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2019-01-01  2:09 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Teach scrub how to handle the case that there are one or more inobt
records covering a given inode cluster.  This fixes the operation on big
block filesystems (e.g. 64k blocks, 512 byte inodes).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/ialloc.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index 2f6c2d7fa3fd..1be6a5ebd61e 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -162,6 +162,7 @@ xchk_iallocbt_check_cluster_ifree(
 	xfs_ino_t			fsino;
 	xfs_agino_t			agino;
 	unsigned int			offset;
+	unsigned int			cluster_buf_base;
 	bool				irec_free;
 	bool				ino_inuse;
 	bool				freemask_ok;
@@ -174,11 +175,14 @@ xchk_iallocbt_check_cluster_ifree(
 	 * Given an inobt record, an offset of a cluster within the record,
 	 * and an offset of an inode within a cluster, compute which fs inode
 	 * we're talking about and the offset of the inode record within the
-	 * inode buffer.
+	 * inode buffer, being careful about inobt records that don't align
+	 * with the start of the inode buffer when block sizes are large.
 	 */
 	agino = irec->ir_startino + cluster_base + cluster_index;
 	fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
-	offset = cluster_index * mp->m_sb.sb_inodesize;
+	cluster_buf_base = XFS_INO_TO_OFFSET(mp, irec->ir_startino +
+					     cluster_base);
+	offset = (cluster_buf_base + cluster_index) * mp->m_sb.sb_inodesize;
 	if (offset >= BBTOB(cluster_bp->b_length)) {
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 		goto out;

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 7/8] xfs: abort xattr scrub if fatal signals are pending
  2019-01-01  2:08 [PATCH 0/8] xfs: inode scrubber fixes Darrick J. Wong
                   ` (5 preceding siblings ...)
  2019-01-01  2:09 ` [PATCH 6/8] xfs: scrub big block inode btrees correctly Darrick J. Wong
@ 2019-01-01  2:09 ` Darrick J. Wong
  2019-01-04 18:39   ` Brian Foster
  2019-01-01  2:09 ` [PATCH 8/8] xfs: scrub should flag dir/attr offsets that aren't mappable with xfs_dablk_t Darrick J. Wong
  7 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2019-01-01  2:09 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

The extended attribute scrubber should abort the "read all attrs" loop
if there's a fatal signal pending on the process.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/attr.c |    5 +++++
 1 file changed, 5 insertions(+)


diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 81d5e90547a1..9960bc5b5d76 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -82,6 +82,11 @@ xchk_xattr_listent(
 
 	sx = container_of(context, struct xchk_xattr, context);
 
+	if (xchk_should_terminate(sx->sc, &error)) {
+		context->seen_enough = 1;
+		return;
+	}
+
 	if (flags & XFS_ATTR_INCOMPLETE) {
 		/* Incomplete attr key, just mark the inode for preening. */
 		xchk_ino_set_preen(sx->sc, context->dp->i_ino);

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 8/8] xfs: scrub should flag dir/attr offsets that aren't mappable with xfs_dablk_t
  2019-01-01  2:08 [PATCH 0/8] xfs: inode scrubber fixes Darrick J. Wong
                   ` (6 preceding siblings ...)
  2019-01-01  2:09 ` [PATCH 7/8] xfs: abort xattr scrub if fatal signals are pending Darrick J. Wong
@ 2019-01-01  2:09 ` Darrick J. Wong
  2019-01-04 18:39   ` Brian Foster
  7 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2019-01-01  2:09 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Teach scrub to flag extent maps that exceed the range that can be mapped
with a xfs_dablk_t.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_types.c |   11 +++++++++++
 fs/xfs/libxfs/xfs_types.h |    1 +
 fs/xfs/scrub/bmap.c       |   27 +++++++++++++++++++++++++++
 3 files changed, 39 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_types.c b/fs/xfs/libxfs/xfs_types.c
index 3306fc42cfad..8e03e88a2f1a 100644
--- a/fs/xfs/libxfs/xfs_types.c
+++ b/fs/xfs/libxfs/xfs_types.c
@@ -204,3 +204,14 @@ xfs_verify_icount(
 	xfs_icount_range(mp, &min, &max);
 	return icount >= min && icount <= max;
 }
+
+/* Sanity-checking of dir/attr block offsets. */
+bool
+xfs_verify_dablk(
+	struct xfs_mount	*mp,
+	xfs_fileoff_t		dabno)
+{
+	xfs_dablk_t		max_dablk = -1U;
+
+	return dabno < max_dablk;
+}
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index 8f02855a019a..704b4f308780 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -188,5 +188,6 @@ bool xfs_internal_inum(struct xfs_mount *mp, xfs_ino_t ino);
 bool xfs_verify_dir_ino(struct xfs_mount *mp, xfs_ino_t ino);
 bool xfs_verify_rtbno(struct xfs_mount *mp, xfs_rtblock_t rtbno);
 bool xfs_verify_icount(struct xfs_mount *mp, unsigned long long icount);
+bool xfs_verify_dablk(struct xfs_mount *mp, xfs_fileoff_t off);
 
 #endif	/* __XFS_TYPES_H__ */
diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
index e1d11f3223e3..a703cd58a90e 100644
--- a/fs/xfs/scrub/bmap.c
+++ b/fs/xfs/scrub/bmap.c
@@ -281,6 +281,31 @@ xchk_bmap_extent_xref(
 	xchk_ag_free(info->sc, &info->sc->sa);
 }
 
+/*
+ * Directories and attr forks should never have blocks that can't be addressed
+ * by a xfs_dablk_t.
+ */
+STATIC void
+xchk_bmap_dirattr_extent(
+	struct xfs_inode	*ip,
+	struct xchk_bmap_info	*info,
+	struct xfs_bmbt_irec	*irec)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_fileoff_t		off;
+
+	if (!S_ISDIR(VFS_I(ip)->i_mode) && info->whichfork != XFS_ATTR_FORK)
+		return;
+
+	if (!xfs_verify_dablk(mp, irec->br_startoff))
+		xchk_fblock_set_corrupt(info->sc, info->whichfork,
+				irec->br_startoff);
+
+	off = irec->br_startoff + irec->br_blockcount - 1;
+	if (!xfs_verify_dablk(mp, off))
+		xchk_fblock_set_corrupt(info->sc, info->whichfork, off);
+}
+
 /* Scrub a single extent record. */
 STATIC int
 xchk_bmap_extent(
@@ -305,6 +330,8 @@ xchk_bmap_extent(
 		xchk_fblock_set_corrupt(info->sc, info->whichfork,
 				irec->br_startoff);
 
+	xchk_bmap_dirattr_extent(ip, info, irec);
+
 	/* There should never be a "hole" extent in either extent list. */
 	if (irec->br_startblock == HOLESTARTBLOCK)
 		xchk_fblock_set_corrupt(info->sc, info->whichfork,

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/8] xfs: never try to scrub more than 64 inodes per inobt record
  2019-01-01  2:08 ` [PATCH 1/8] xfs: never try to scrub more than 64 inodes per inobt record Darrick J. Wong
@ 2019-01-02 12:39   ` Carlos Maiolino
  2019-01-02 13:30     ` Chandan Rajendra
  2019-01-04 18:30   ` Brian Foster
  1 sibling, 1 reply; 30+ messages in thread
From: Carlos Maiolino @ 2019-01-02 12:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Dec 31, 2018 at 06:08:33PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Make sure we never check more than XFS_INODES_PER_CHUNK inodes for any
> given inobt record since there can be more than one inobt record mapped
> to an inode cluster.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/scrub/ialloc.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> 
> diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> index 882dc56c5c21..fd431682db0b 100644
> --- a/fs/xfs/scrub/ialloc.c
> +++ b/fs/xfs/scrub/ialloc.c
> @@ -203,7 +203,8 @@ xchk_iallocbt_check_freemask(
>  	int				error = 0;
>  
>  	/* Make sure the freemask matches the inode records. */
> -	nr_inodes = mp->m_inodes_per_cluster;
> +	nr_inodes = min_t(unsigned int, XFS_INODES_PER_CHUNK,
> +			mp->m_inodes_per_cluster);

Pardon me if this doesn't make sense, but, this looks like a good time to catch
a possible corruption?! If mp->m_inodes_per_cluster is > XFS_INODES_PER_CHUNK
something is terribly wrong and we could report it here instead of max it out to
XFS_INODES_PER_CHUNK, but I haven't studied the scrub code that deep yet to see
if my suggestion makes sense or not :)


>  
>  	for (agino = irec->ir_startino;
>  	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;
> 

-- 
Carlos

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/8] xfs: never try to scrub more than 64 inodes per inobt record
  2019-01-02 12:39   ` Carlos Maiolino
@ 2019-01-02 13:30     ` Chandan Rajendra
  0 siblings, 0 replies; 30+ messages in thread
From: Chandan Rajendra @ 2019-01-02 13:30 UTC (permalink / raw)
  To: Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs

On Wednesday, January 2, 2019 6:09:29 PM IST Carlos Maiolino wrote:
> On Mon, Dec 31, 2018 at 06:08:33PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Make sure we never check more than XFS_INODES_PER_CHUNK inodes for any
> > given inobt record since there can be more than one inobt record mapped
> > to an inode cluster.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/scrub/ialloc.c |    3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > 
> > diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> > index 882dc56c5c21..fd431682db0b 100644
> > --- a/fs/xfs/scrub/ialloc.c
> > +++ b/fs/xfs/scrub/ialloc.c
> > @@ -203,7 +203,8 @@ xchk_iallocbt_check_freemask(
> >  	int				error = 0;
> >  
> >  	/* Make sure the freemask matches the inode records. */
> > -	nr_inodes = mp->m_inodes_per_cluster;
> > +	nr_inodes = min_t(unsigned int, XFS_INODES_PER_CHUNK,
> > +			mp->m_inodes_per_cluster);
> 
> Pardon me if this doesn't make sense, but, this looks like a good time to catch
> a possible corruption?! If mp->m_inodes_per_cluster is > XFS_INODES_PER_CHUNK
> something is terribly wrong and we could report it here instead of max it out to
> XFS_INODES_PER_CHUNK, but I haven't studied the scrub code that deep yet to see
> if my suggestion makes sense or not :)

With 64k block size, we would have 128 inodes per cluster i.e. more than one
chunk can be accommodated within a cluster.

-- 
chandan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/8] xfs: never try to scrub more than 64 inodes per inobt record
  2019-01-01  2:08 ` [PATCH 1/8] xfs: never try to scrub more than 64 inodes per inobt record Darrick J. Wong
  2019-01-02 12:39   ` Carlos Maiolino
@ 2019-01-04 18:30   ` Brian Foster
  1 sibling, 0 replies; 30+ messages in thread
From: Brian Foster @ 2019-01-04 18:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Dec 31, 2018 at 06:08:33PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Make sure we never check more than XFS_INODES_PER_CHUNK inodes for any
> given inobt record since there can be more than one inobt record mapped
> to an inode cluster.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/scrub/ialloc.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> 
> diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> index 882dc56c5c21..fd431682db0b 100644
> --- a/fs/xfs/scrub/ialloc.c
> +++ b/fs/xfs/scrub/ialloc.c
> @@ -203,7 +203,8 @@ xchk_iallocbt_check_freemask(
>  	int				error = 0;
>  
>  	/* Make sure the freemask matches the inode records. */
> -	nr_inodes = mp->m_inodes_per_cluster;
> +	nr_inodes = min_t(unsigned int, XFS_INODES_PER_CHUNK,
> +			mp->m_inodes_per_cluster);
>  

I'm wondering why we wouldn't also use nr_inodes in the loop immediately
below instead of ->m_inodes_per_cluster, but it looks like this all
changes in the following patches.

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  	for (agino = irec->ir_startino;
>  	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/8] xfs: check the ir_startino alignment directly
  2019-01-01  2:08 ` [PATCH 2/8] xfs: check the ir_startino alignment directly Darrick J. Wong
@ 2019-01-04 18:31   ` Brian Foster
  2019-01-04 20:59     ` Darrick J. Wong
  0 siblings, 1 reply; 30+ messages in thread
From: Brian Foster @ 2019-01-04 18:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Dec 31, 2018 at 06:08:39PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> In xchk_iallocbt_rec, check the alignment of ir_startino by converting
> the inode cluster block alignment into units of inodes instead of the
> other way around (converting ir_startino to blocks).  This prevents us
> from tripping over off-by-one errors in ir_startino which are obscured
> by the inode -> block conversion.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/scrub/ialloc.c |   45 +++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 39 insertions(+), 6 deletions(-)
> 
> 
> diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> index fd431682db0b..5082331d6c03 100644
> --- a/fs/xfs/scrub/ialloc.c
> +++ b/fs/xfs/scrub/ialloc.c
> @@ -265,6 +265,42 @@ xchk_iallocbt_check_freemask(
>  	return error;
>  }
>  
> +/* Make sure this inode btree record is aligned properly. */
> +STATIC void
> +xchk_iallocbt_rec_alignment(
> +	struct xchk_btree		*bs,
> +	struct xfs_inobt_rec_incore	*irec)
> +{
> +	struct xfs_mount		*mp = bs->sc->mp;
> +
> +	/*
> +	 * finobt records have different positioning requirements than inobt
> +	 * records: each finobt record must have a corresponding inobt record.
> +	 * That is checked in the xref function, so for now we only catch the
> +	 * obvious case where the record isn't even chunk-aligned.
> +	 *
> +	 * Note also that if a fs block contains more than a single chunk of
> +	 * inodes, we will have finobt records only for those chunks containing
> +	 * free inodes.
> +	 */
> +	if (bs->cur->bc_btnum == XFS_BTNUM_FINO) {
> +		if (irec->ir_startino & (XFS_INODES_PER_CHUNK - 1))
> +			xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> +		return;
> +	}

Is the above really a finobt only check? Couldn't we run this
sanity check against all records and skip the following for finobt?

Otherwise seems fine:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +
> +	/* inobt records must be aligned to cluster and inoalignmnt size. */
> +	if (irec->ir_startino & (mp->m_cluster_align_inodes - 1)) {
> +		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> +		return;
> +	}
> +
> +	if (irec->ir_startino & (mp->m_inodes_per_cluster - 1)) {
> +		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> +		return;
> +	}
> +}
> +
>  /* Scrub an inobt/finobt record. */
>  STATIC int
>  xchk_iallocbt_rec(
> @@ -277,7 +313,6 @@ xchk_iallocbt_rec(
>  	uint64_t			holes;
>  	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
>  	xfs_agino_t			agino;
> -	xfs_agblock_t			agbno;
>  	xfs_extlen_t			len;
>  	int				holecount;
>  	int				i;
> @@ -304,11 +339,9 @@ xchk_iallocbt_rec(
>  		goto out;
>  	}
>  
> -	/* Make sure this record is aligned to cluster and inoalignmnt size. */
> -	agbno = XFS_AGINO_TO_AGBNO(mp, irec.ir_startino);
> -	if ((agbno & (mp->m_cluster_align - 1)) ||
> -	    (agbno & (mp->m_blocks_per_cluster - 1)))
> -		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> +	xchk_iallocbt_rec_alignment(bs, &irec);
> +	if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> +		goto out;
>  
>  	iabt->inodes += irec.ir_count;
>  
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/8] xfs: check inobt record alignment on big block filesystems
  2019-01-01  2:08 ` [PATCH 3/8] xfs: check inobt record alignment on big block filesystems Darrick J. Wong
@ 2019-01-04 18:31   ` Brian Foster
  0 siblings, 0 replies; 30+ messages in thread
From: Brian Foster @ 2019-01-04 18:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, Dave Chinner

On Mon, Dec 31, 2018 at 06:08:45PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> On a big block filesystem, there may be multiple inobt records covering
> a single inode cluster.  These records obviously won't be aligned to
> cluster alignment rules, and they must cover the entire cluster.  Teach
> scrub to check for these things.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/scrub/ialloc.c |   41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
> 
> 
> diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> index 5082331d6c03..5e593e4292b2 100644
> --- a/fs/xfs/scrub/ialloc.c
> +++ b/fs/xfs/scrub/ialloc.c
> @@ -47,6 +47,12 @@ xchk_setup_ag_iallocbt(
>  struct xchk_iallocbt {
>  	/* Number of inodes we see while scanning inobt. */
>  	unsigned long long	inodes;
> +
> +	/* Expected next startino, for big block filesystems. */
> +	xfs_agino_t		next_startino;
> +
> +	/* Expected end of the current inode cluster. */
> +	xfs_agino_t		next_cluster_ino;
>  };
>  
>  /*
> @@ -272,6 +278,7 @@ xchk_iallocbt_rec_alignment(
>  	struct xfs_inobt_rec_incore	*irec)
>  {
>  	struct xfs_mount		*mp = bs->sc->mp;
> +	struct xchk_iallocbt		*iabt = bs->private;
>  
>  	/*
>  	 * finobt records have different positioning requirements than inobt
> @@ -289,6 +296,27 @@ xchk_iallocbt_rec_alignment(
>  		return;
>  	}
>  
> +	if (iabt->next_startino != NULLAGINO) {
> +		/*
> +		 * We're midway through a cluster of inodes that is mapped by
> +		 * multiple inobt records.  Did we get the record for the next
> +		 * irec in the sequence?
> +		 */
> +		if (irec->ir_startino != iabt->next_startino) {
> +			xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> +			return;
> +		}
> +
> +		iabt->next_startino += XFS_INODES_PER_CHUNK;
> +
> +		/* Are we done with the cluster? */
> +		if (iabt->next_startino >= iabt->next_cluster_ino) {
> +			iabt->next_startino = NULLAGINO;
> +			iabt->next_cluster_ino = NULLAGINO;
> +		}
> +		return;
> +	}
> +
>  	/* inobt records must be aligned to cluster and inoalignmnt size. */
>  	if (irec->ir_startino & (mp->m_cluster_align_inodes - 1)) {
>  		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> @@ -299,6 +327,17 @@ xchk_iallocbt_rec_alignment(
>  		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
>  		return;
>  	}
> +
> +	if (mp->m_inodes_per_cluster <= XFS_INODES_PER_CHUNK)
> +		return;
> +
> +	/*
> +	 * If this is the start of an inode cluster that can be mapped by
> +	 * multiple inobt records, the next inobt record must follow exactly
> +	 * after this one.
> +	 */
> +	iabt->next_startino = irec->ir_startino + XFS_INODES_PER_CHUNK;
> +	iabt->next_cluster_ino = irec->ir_startino + mp->m_inodes_per_cluster;
>  }
>  
>  /* Scrub an inobt/finobt record. */
> @@ -463,6 +502,8 @@ xchk_iallocbt(
>  	struct xfs_btree_cur	*cur;
>  	struct xchk_iallocbt	iabt = {
>  		.inodes		= 0,
> +		.next_startino	= NULLAGINO,
> +		.next_cluster_ino = NULLAGINO,
>  	};
>  	int			error;
>  
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/8] xfs: hoist inode cluster checks out of loop
  2019-01-01  2:08 ` [PATCH 4/8] xfs: hoist inode cluster checks out of loop Darrick J. Wong
@ 2019-01-04 18:31   ` Brian Foster
  0 siblings, 0 replies; 30+ messages in thread
From: Brian Foster @ 2019-01-04 18:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, Dave Chinner

On Mon, Dec 31, 2018 at 06:08:52PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Hoist the inode cluster checks out of the inobt record check loop into
> a separate function in preparation for refactoring of that loop.  No
> functional changes here; that's in the next patch.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/scrub/ialloc.c |  119 +++++++++++++++++++++++++++----------------------
>  1 file changed, 65 insertions(+), 54 deletions(-)
> 
> 
> diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> index 5e593e4292b2..9af01ea0258d 100644
> --- a/fs/xfs/scrub/ialloc.c
> +++ b/fs/xfs/scrub/ialloc.c
...
> +/* Make sure the free mask is consistent with what the inodes think. */
> +STATIC int
> +xchk_iallocbt_check_freemask(
> +	struct xchk_btree		*bs,
> +	struct xfs_inobt_rec_incore	*irec)
> +{
> +	struct xfs_mount		*mp = bs->cur->bc_mp;
> +	xfs_agino_t			agino;
> +	int				error = 0;
> +
> +	for (agino = irec->ir_startino;
> +	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;
> +	     agino += mp->m_blocks_per_cluster * mp->m_sb.sb_inopblock) {

->m_inodes_per_cluster ?

Otherwise looks fine:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +		error = xchk_iallocbt_check_cluster(bs, irec, agino);
> +		if (error)
> +			break;
>  	}
>  
>  	return error;
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/8] xfs: clean up the inode cluster checking in the inobt scrub
  2019-01-01  2:08 ` [PATCH 5/8] xfs: clean up the inode cluster checking in the inobt scrub Darrick J. Wong
@ 2019-01-04 18:32   ` Brian Foster
  2019-01-04 22:02     ` Darrick J. Wong
  0 siblings, 1 reply; 30+ messages in thread
From: Brian Foster @ 2019-01-04 18:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Dec 31, 2018 at 06:08:58PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> The code to check inobt records against inode clusters is a mess of
> poorly named variables and unnecessary parameters.  Clean the
> unnecessary inode number parameters out of _check_cluster_freemask in
> favor of computing them inside the function instead of making the caller
> do it.  In xchk_iallocbt_check_cluster, rename the variables to make it
> more obvious just what chunk_ino and cluster_ino represent.
> 
> Add a tracepoint to make it easier to track each inode cluster as we
> scrub it.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/scrub/ialloc.c |  162 ++++++++++++++++++++++++++++++++-----------------
>  fs/xfs/scrub/trace.h  |   45 ++++++++++++++
>  2 files changed, 151 insertions(+), 56 deletions(-)
> 
> 
> diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> index 9af01ea0258d..2f6c2d7fa3fd 100644
> --- a/fs/xfs/scrub/ialloc.c
> +++ b/fs/xfs/scrub/ialloc.c
> @@ -134,41 +134,70 @@ xchk_iallocbt_freecount(
>  	return hweight64(freemask);
>  }
>  
> -/* Check a particular inode with ir_free. */
> +/*
> + * Check that an inode's allocation status matches ir_free in the inobt
> + * record.  First we try querying the in-core inode state, and if the inode
> + * isn't loaded we examine the on-disk inode directly.
> + *
> + * Since there can be 1:M and M:1 mappings between inobt records and inode
> + * clusters, we pass in the inode location information as an inobt record;
> + * the index of an inode cluster within the inobt record (as well as the
> + * cluster buffer itself); and the index of the inode within the cluster.
> + *
> + * @irec is the inobt record.
> + * @cluster_base is the inode offset of the cluster within the @irec.
> + * @cluster_bp is the cluster buffer.
> + * @cluster_index is the inode offset within the inode cluster.
> + */
>  STATIC int
> -xchk_iallocbt_check_cluster_freemask(
> +xchk_iallocbt_check_cluster_ifree(
>  	struct xchk_btree		*bs,
> -	xfs_ino_t			fsino,
> -	xfs_agino_t			chunkino,
> -	xfs_agino_t			clusterino,
>  	struct xfs_inobt_rec_incore	*irec,
> -	struct xfs_buf			*bp)
> +	unsigned int			cluster_base,
> +	struct xfs_buf			*cluster_bp,
> +	unsigned int			cluster_index)
>  {
> -	struct xfs_dinode		*dip;
>  	struct xfs_mount		*mp = bs->cur->bc_mp;
> -	bool				inode_is_free = false;
> +	struct xfs_dinode		*dip;
> +	xfs_ino_t			fsino;
> +	xfs_agino_t			agino;
> +	unsigned int			offset;
> +	bool				irec_free;
> +	bool				ino_inuse;
>  	bool				freemask_ok;
> -	bool				inuse;
> -	int				error = 0;
> +	int				error;
>  
>  	if (xchk_should_terminate(bs->sc, &error))
>  		return error;
>  
> -	dip = xfs_buf_offset(bp, clusterino * mp->m_sb.sb_inodesize);
> +	/*
> +	 * Given an inobt record, an offset of a cluster within the record,
> +	 * and an offset of an inode within a cluster, compute which fs inode
> +	 * we're talking about and the offset of the inode record within the

Do you mean "offset of the inode within the buffer?"

> +	 * inode buffer.
> +	 */
> +	agino = irec->ir_startino + cluster_base + cluster_index;
> +	fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
> +	offset = cluster_index * mp->m_sb.sb_inodesize;
> +	if (offset >= BBTOB(cluster_bp->b_length)) {
> +		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> +		goto out;
> +	}
...
> @@ -241,40 +284,47 @@ xchk_iallocbt_check_cluster(
...
>  STATIC int
> -xchk_iallocbt_check_freemask(
> +xchk_iallocbt_check_clusters(
>  	struct xchk_btree		*bs,
>  	struct xfs_inobt_rec_incore	*irec)
>  {
> -	struct xfs_mount		*mp = bs->cur->bc_mp;
> -	xfs_agino_t			agino;
> +	unsigned int			cluster_base;
>  	int				error = 0;
>  
> -	for (agino = irec->ir_startino;
> -	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;
> -	     agino += mp->m_blocks_per_cluster * mp->m_sb.sb_inopblock) {
> -		error = xchk_iallocbt_check_cluster(bs, irec, agino);
> +	/*
> +	 * For the common case where this inobt record maps to multiple inode
> +	 * clusters this will call _check_cluster for each cluster.
> +	 *
> +	 * For the case that multiple inobt records map to a single cluster,
> +	 * this will call _check_cluster once.
> +	 */
> +	for (cluster_base = 0;
> +	     cluster_base < XFS_INODES_PER_CHUNK;
> +	     cluster_base += bs->sc->mp->m_inodes_per_cluster) {

I see this puts ->m_inodes_per_cluster back here, not sure why it
changed back and forth but Ok.

> +		error = xchk_iallocbt_check_cluster(bs, irec, cluster_base);
>  		if (error)
>  			break;
>  	}
> @@ -431,7 +481,7 @@ xchk_iallocbt_rec(
>  		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
>  
>  check_freemask:

Perhaps we should rename the label as well?

Nits aside:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> -	error = xchk_iallocbt_check_freemask(bs, &irec);
> +	error = xchk_iallocbt_check_clusters(bs, &irec);
>  	if (error)
>  		goto out;
>  
> diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
> index 8344b14031ef..3c83e8b3b39c 100644
> --- a/fs/xfs/scrub/trace.h
> +++ b/fs/xfs/scrub/trace.h
> @@ -545,6 +545,51 @@ TRACE_EVENT(xchk_xref_error,
>  		  __entry->ret_ip)
>  );
>  
> +TRACE_EVENT(xchk_iallocbt_check_cluster,
> +	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
> +		 xfs_agino_t startino, xfs_daddr_t map_daddr,
> +		 unsigned short map_len, unsigned int chunk_ino,
> +		 unsigned int nr_inodes, uint16_t cluster_mask,
> +		 uint16_t holemask, unsigned int cluster_ino),
> +	TP_ARGS(mp, agno, startino, map_daddr, map_len, chunk_ino, nr_inodes,
> +		cluster_mask, holemask, cluster_ino),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_agnumber_t, agno)
> +		__field(xfs_agino_t, startino)
> +		__field(xfs_daddr_t, map_daddr)
> +		__field(unsigned short, map_len)
> +		__field(unsigned int, chunk_ino)
> +		__field(unsigned int, nr_inodes)
> +		__field(unsigned int, cluster_ino)
> +		__field(uint16_t, cluster_mask)
> +		__field(uint16_t, holemask)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = mp->m_super->s_dev;
> +		__entry->agno = agno;
> +		__entry->startino = startino;
> +		__entry->map_daddr = map_daddr;
> +		__entry->map_len = map_len;
> +		__entry->chunk_ino = chunk_ino;
> +		__entry->nr_inodes = nr_inodes;
> +		__entry->cluster_mask = cluster_mask;
> +		__entry->holemask = holemask;
> +		__entry->cluster_ino = cluster_ino;
> +	),
> +	TP_printk("dev %d:%d agno %d startino %u daddr 0x%llx len %d chunkino %u nr_inodes %u cluster_mask 0x%x holemask 0x%x cluster_ino %u",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->agno,
> +		  __entry->startino,
> +		  __entry->map_daddr,
> +		  __entry->map_len,
> +		  __entry->chunk_ino,
> +		  __entry->nr_inodes,
> +		  __entry->cluster_mask,
> +		  __entry->holemask,
> +		  __entry->cluster_ino)
> +)
> +
>  /* repair tracepoints */
>  #if IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR)
>  
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/8] xfs: scrub big block inode btrees correctly
  2019-01-01  2:09 ` [PATCH 6/8] xfs: scrub big block inode btrees correctly Darrick J. Wong
@ 2019-01-04 18:38   ` Brian Foster
  2019-01-05  0:29     ` Darrick J. Wong
  0 siblings, 1 reply; 30+ messages in thread
From: Brian Foster @ 2019-01-04 18:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Dec 31, 2018 at 06:09:10PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Teach scrub how to handle the case that there are one or more inobt
> records covering a given inode cluster.  This fixes the operation on big
> block filesystems (e.g. 64k blocks, 512 byte inodes).
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/scrub/ialloc.c |    8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> index 2f6c2d7fa3fd..1be6a5ebd61e 100644
> --- a/fs/xfs/scrub/ialloc.c
> +++ b/fs/xfs/scrub/ialloc.c
> @@ -162,6 +162,7 @@ xchk_iallocbt_check_cluster_ifree(
>  	xfs_ino_t			fsino;
>  	xfs_agino_t			agino;
>  	unsigned int			offset;
> +	unsigned int			cluster_buf_base;
>  	bool				irec_free;
>  	bool				ino_inuse;
>  	bool				freemask_ok;
> @@ -174,11 +175,14 @@ xchk_iallocbt_check_cluster_ifree(
>  	 * Given an inobt record, an offset of a cluster within the record,
>  	 * and an offset of an inode within a cluster, compute which fs inode
>  	 * we're talking about and the offset of the inode record within the
> -	 * inode buffer.
> +	 * inode buffer, being careful about inobt records that don't align
> +	 * with the start of the inode buffer when block sizes are large.
>  	 */
>  	agino = irec->ir_startino + cluster_base + cluster_index;
>  	fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
> -	offset = cluster_index * mp->m_sb.sb_inodesize;
> +	cluster_buf_base = XFS_INO_TO_OFFSET(mp, irec->ir_startino +
> +					     cluster_base);
> +	offset = (cluster_buf_base + cluster_index) * mp->m_sb.sb_inodesize;

So cluster_buf_base should always be 0 unless the record itself is not
cluster aligned (i.e., the second record in a large FSB), right? If so,
cluster_base should also be 0 in any case where cluster_buf_base != 0.
I'm wondering if that can be made a little more explicit, or perhaps
self-documented with an assert.

Thinking more about it, it's kind of confusing to me either way because
cluster_index isn't really a cluster index in this case, rather it's
more of a record index because it doesn't account for the fact that the
record is a subset of the cluster. Hmmm... could we simplify this by
setting imap.im_boffset appropriately in the caller such that dip always
points to either the first inode in the buffer or first inode in the
record, then just pass in dip directly and let the caller increment it
in the associated loop? Maybe that's something for another patch..

Brian

>  	if (offset >= BBTOB(cluster_bp->b_length)) {
>  		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
>  		goto out;
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/8] xfs: abort xattr scrub if fatal signals are pending
  2019-01-01  2:09 ` [PATCH 7/8] xfs: abort xattr scrub if fatal signals are pending Darrick J. Wong
@ 2019-01-04 18:39   ` Brian Foster
  0 siblings, 0 replies; 30+ messages in thread
From: Brian Foster @ 2019-01-04 18:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Dec 31, 2018 at 06:09:17PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> The extended attribute scrubber should abort the "read all attrs" loop
> if there's a fatal signal pending on the process.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/scrub/attr.c |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> 
> diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
> index 81d5e90547a1..9960bc5b5d76 100644
> --- a/fs/xfs/scrub/attr.c
> +++ b/fs/xfs/scrub/attr.c
> @@ -82,6 +82,11 @@ xchk_xattr_listent(
>  
>  	sx = container_of(context, struct xchk_xattr, context);
>  
> +	if (xchk_should_terminate(sx->sc, &error)) {
> +		context->seen_enough = 1;
> +		return;
> +	}
> +
>  	if (flags & XFS_ATTR_INCOMPLETE) {
>  		/* Incomplete attr key, just mark the inode for preening. */
>  		xchk_ino_set_preen(sx->sc, context->dp->i_ino);
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 8/8] xfs: scrub should flag dir/attr offsets that aren't mappable with xfs_dablk_t
  2019-01-01  2:09 ` [PATCH 8/8] xfs: scrub should flag dir/attr offsets that aren't mappable with xfs_dablk_t Darrick J. Wong
@ 2019-01-04 18:39   ` Brian Foster
  2019-01-04 23:09     ` Darrick J. Wong
  0 siblings, 1 reply; 30+ messages in thread
From: Brian Foster @ 2019-01-04 18:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Dec 31, 2018 at 06:09:23PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Teach scrub to flag extent maps that exceed the range that can be mapped
> with a xfs_dablk_t.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_types.c |   11 +++++++++++
>  fs/xfs/libxfs/xfs_types.h |    1 +
>  fs/xfs/scrub/bmap.c       |   27 +++++++++++++++++++++++++++
>  3 files changed, 39 insertions(+)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_types.c b/fs/xfs/libxfs/xfs_types.c
> index 3306fc42cfad..8e03e88a2f1a 100644
> --- a/fs/xfs/libxfs/xfs_types.c
> +++ b/fs/xfs/libxfs/xfs_types.c
> @@ -204,3 +204,14 @@ xfs_verify_icount(
>  	xfs_icount_range(mp, &min, &max);
>  	return icount >= min && icount <= max;
>  }
> +
> +/* Sanity-checking of dir/attr block offsets. */
> +bool
> +xfs_verify_dablk(
> +	struct xfs_mount	*mp,
> +	xfs_fileoff_t		dabno)
> +{
> +	xfs_dablk_t		max_dablk = -1U;
> +
> +	return dabno < max_dablk;

Should that be <= ?

Brian

> +}
> diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
> index 8f02855a019a..704b4f308780 100644
> --- a/fs/xfs/libxfs/xfs_types.h
> +++ b/fs/xfs/libxfs/xfs_types.h
> @@ -188,5 +188,6 @@ bool xfs_internal_inum(struct xfs_mount *mp, xfs_ino_t ino);
>  bool xfs_verify_dir_ino(struct xfs_mount *mp, xfs_ino_t ino);
>  bool xfs_verify_rtbno(struct xfs_mount *mp, xfs_rtblock_t rtbno);
>  bool xfs_verify_icount(struct xfs_mount *mp, unsigned long long icount);
> +bool xfs_verify_dablk(struct xfs_mount *mp, xfs_fileoff_t off);
>  
>  #endif	/* __XFS_TYPES_H__ */
> diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
> index e1d11f3223e3..a703cd58a90e 100644
> --- a/fs/xfs/scrub/bmap.c
> +++ b/fs/xfs/scrub/bmap.c
> @@ -281,6 +281,31 @@ xchk_bmap_extent_xref(
>  	xchk_ag_free(info->sc, &info->sc->sa);
>  }
>  
> +/*
> + * Directories and attr forks should never have blocks that can't be addressed
> + * by a xfs_dablk_t.
> + */
> +STATIC void
> +xchk_bmap_dirattr_extent(
> +	struct xfs_inode	*ip,
> +	struct xchk_bmap_info	*info,
> +	struct xfs_bmbt_irec	*irec)
> +{
> +	struct xfs_mount	*mp = ip->i_mount;
> +	xfs_fileoff_t		off;
> +
> +	if (!S_ISDIR(VFS_I(ip)->i_mode) && info->whichfork != XFS_ATTR_FORK)
> +		return;
> +
> +	if (!xfs_verify_dablk(mp, irec->br_startoff))
> +		xchk_fblock_set_corrupt(info->sc, info->whichfork,
> +				irec->br_startoff);
> +
> +	off = irec->br_startoff + irec->br_blockcount - 1;
> +	if (!xfs_verify_dablk(mp, off))
> +		xchk_fblock_set_corrupt(info->sc, info->whichfork, off);
> +}
> +
>  /* Scrub a single extent record. */
>  STATIC int
>  xchk_bmap_extent(
> @@ -305,6 +330,8 @@ xchk_bmap_extent(
>  		xchk_fblock_set_corrupt(info->sc, info->whichfork,
>  				irec->br_startoff);
>  
> +	xchk_bmap_dirattr_extent(ip, info, irec);
> +
>  	/* There should never be a "hole" extent in either extent list. */
>  	if (irec->br_startblock == HOLESTARTBLOCK)
>  		xchk_fblock_set_corrupt(info->sc, info->whichfork,
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/8] xfs: check the ir_startino alignment directly
  2019-01-04 18:31   ` Brian Foster
@ 2019-01-04 20:59     ` Darrick J. Wong
  2019-01-07 13:45       ` Brian Foster
  0 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2019-01-04 20:59 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Jan 04, 2019 at 01:31:04PM -0500, Brian Foster wrote:
> On Mon, Dec 31, 2018 at 06:08:39PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > In xchk_iallocbt_rec, check the alignment of ir_startino by converting
> > the inode cluster block alignment into units of inodes instead of the
> > other way around (converting ir_startino to blocks).  This prevents us
> > from tripping over off-by-one errors in ir_startino which are obscured
> > by the inode -> block conversion.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/scrub/ialloc.c |   45 +++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 39 insertions(+), 6 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> > index fd431682db0b..5082331d6c03 100644
> > --- a/fs/xfs/scrub/ialloc.c
> > +++ b/fs/xfs/scrub/ialloc.c
> > @@ -265,6 +265,42 @@ xchk_iallocbt_check_freemask(
> >  	return error;
> >  }
> >  
> > +/* Make sure this inode btree record is aligned properly. */
> > +STATIC void
> > +xchk_iallocbt_rec_alignment(
> > +	struct xchk_btree		*bs,
> > +	struct xfs_inobt_rec_incore	*irec)
> > +{
> > +	struct xfs_mount		*mp = bs->sc->mp;
> > +
> > +	/*
> > +	 * finobt records have different positioning requirements than inobt
> > +	 * records: each finobt record must have a corresponding inobt record.
> > +	 * That is checked in the xref function, so for now we only catch the
> > +	 * obvious case where the record isn't even chunk-aligned.
> > +	 *
> > +	 * Note also that if a fs block contains more than a single chunk of
> > +	 * inodes, we will have finobt records only for those chunks containing
> > +	 * free inodes.
> > +	 */
> > +	if (bs->cur->bc_btnum == XFS_BTNUM_FINO) {
> > +		if (irec->ir_startino & (XFS_INODES_PER_CHUNK - 1))
> > +			xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > +		return;
> > +	}
> 
> Is the above really a finobt only check? Couldn't we run this
> sanity check against all records and skip the following for finobt?

Uhoh, it occurs to me that in the 4kblock !spinodes case we can have
inobt records (and therefore finobt records) that are aligned to
m_cluster_align_inodes, and that value can be less than 64.  So I think
this has to be something along the lines of:

imask = min(XFS_INODES_PER_CHUNK, mp->m_cluster_align_inodes) - 1;
if (irec->ir_startino & imask)
	/* set corrupt... */

Hmm, and testing seems to bear this out.  New patch forthcoming.

> Otherwise seems fine:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>

I've wondered in recent days if this is even necessary at all -- when
we're asked to check the inobt we check the ir_startino alignment of all
those records, so really the only thing we need is the existing check
that for each finobt record there's also an inobt record with the same
ir_startino.  OTOH I guess we shouldn't really assume that the calling
process already checked the inobt or that it didn't change between calls.

--D

> > +
> > +	/* inobt records must be aligned to cluster and inoalignmnt size. */
> > +	if (irec->ir_startino & (mp->m_cluster_align_inodes - 1)) {
> > +		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > +		return;
> > +	}
> > +
> > +	if (irec->ir_startino & (mp->m_inodes_per_cluster - 1)) {
> > +		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > +		return;
> > +	}
> > +}
> > +
> >  /* Scrub an inobt/finobt record. */
> >  STATIC int
> >  xchk_iallocbt_rec(
> > @@ -277,7 +313,6 @@ xchk_iallocbt_rec(
> >  	uint64_t			holes;
> >  	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
> >  	xfs_agino_t			agino;
> > -	xfs_agblock_t			agbno;
> >  	xfs_extlen_t			len;
> >  	int				holecount;
> >  	int				i;
> > @@ -304,11 +339,9 @@ xchk_iallocbt_rec(
> >  		goto out;
> >  	}
> >  
> > -	/* Make sure this record is aligned to cluster and inoalignmnt size. */
> > -	agbno = XFS_AGINO_TO_AGBNO(mp, irec.ir_startino);
> > -	if ((agbno & (mp->m_cluster_align - 1)) ||
> > -	    (agbno & (mp->m_blocks_per_cluster - 1)))
> > -		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > +	xchk_iallocbt_rec_alignment(bs, &irec);
> > +	if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> > +		goto out;
> >  
> >  	iabt->inodes += irec.ir_count;
> >  
> > 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/8] xfs: clean up the inode cluster checking in the inobt scrub
  2019-01-04 18:32   ` Brian Foster
@ 2019-01-04 22:02     ` Darrick J. Wong
  0 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2019-01-04 22:02 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Jan 04, 2019 at 01:32:39PM -0500, Brian Foster wrote:
> On Mon, Dec 31, 2018 at 06:08:58PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > The code to check inobt records against inode clusters is a mess of
> > poorly named variables and unnecessary parameters.  Clean the
> > unnecessary inode number parameters out of _check_cluster_freemask in
> > favor of computing them inside the function instead of making the caller
> > do it.  In xchk_iallocbt_check_cluster, rename the variables to make it
> > more obvious just what chunk_ino and cluster_ino represent.
> > 
> > Add a tracepoint to make it easier to track each inode cluster as we
> > scrub it.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/scrub/ialloc.c |  162 ++++++++++++++++++++++++++++++++-----------------
> >  fs/xfs/scrub/trace.h  |   45 ++++++++++++++
> >  2 files changed, 151 insertions(+), 56 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> > index 9af01ea0258d..2f6c2d7fa3fd 100644
> > --- a/fs/xfs/scrub/ialloc.c
> > +++ b/fs/xfs/scrub/ialloc.c
> > @@ -134,41 +134,70 @@ xchk_iallocbt_freecount(
> >  	return hweight64(freemask);
> >  }
> >  
> > -/* Check a particular inode with ir_free. */
> > +/*
> > + * Check that an inode's allocation status matches ir_free in the inobt
> > + * record.  First we try querying the in-core inode state, and if the inode
> > + * isn't loaded we examine the on-disk inode directly.
> > + *
> > + * Since there can be 1:M and M:1 mappings between inobt records and inode
> > + * clusters, we pass in the inode location information as an inobt record;
> > + * the index of an inode cluster within the inobt record (as well as the
> > + * cluster buffer itself); and the index of the inode within the cluster.
> > + *
> > + * @irec is the inobt record.
> > + * @cluster_base is the inode offset of the cluster within the @irec.
> > + * @cluster_bp is the cluster buffer.
> > + * @cluster_index is the inode offset within the inode cluster.
> > + */
> >  STATIC int
> > -xchk_iallocbt_check_cluster_freemask(
> > +xchk_iallocbt_check_cluster_ifree(
> >  	struct xchk_btree		*bs,
> > -	xfs_ino_t			fsino,
> > -	xfs_agino_t			chunkino,
> > -	xfs_agino_t			clusterino,
> >  	struct xfs_inobt_rec_incore	*irec,
> > -	struct xfs_buf			*bp)
> > +	unsigned int			cluster_base,
> > +	struct xfs_buf			*cluster_bp,
> > +	unsigned int			cluster_index)
> >  {
> > -	struct xfs_dinode		*dip;
> >  	struct xfs_mount		*mp = bs->cur->bc_mp;
> > -	bool				inode_is_free = false;
> > +	struct xfs_dinode		*dip;
> > +	xfs_ino_t			fsino;
> > +	xfs_agino_t			agino;
> > +	unsigned int			offset;
> > +	bool				irec_free;
> > +	bool				ino_inuse;
> >  	bool				freemask_ok;
> > -	bool				inuse;
> > -	int				error = 0;
> > +	int				error;
> >  
> >  	if (xchk_should_terminate(bs->sc, &error))
> >  		return error;
> >  
> > -	dip = xfs_buf_offset(bp, clusterino * mp->m_sb.sb_inodesize);
> > +	/*
> > +	 * Given an inobt record, an offset of a cluster within the record,
> > +	 * and an offset of an inode within a cluster, compute which fs inode
> > +	 * we're talking about and the offset of the inode record within the
> 
> Do you mean "offset of the inode within the buffer?"

Yep.

> > +	 * inode buffer.
> > +	 */
> > +	agino = irec->ir_startino + cluster_base + cluster_index;
> > +	fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
> > +	offset = cluster_index * mp->m_sb.sb_inodesize;
> > +	if (offset >= BBTOB(cluster_bp->b_length)) {
> > +		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > +		goto out;
> > +	}
> ...
> > @@ -241,40 +284,47 @@ xchk_iallocbt_check_cluster(
> ...
> >  STATIC int
> > -xchk_iallocbt_check_freemask(
> > +xchk_iallocbt_check_clusters(
> >  	struct xchk_btree		*bs,
> >  	struct xfs_inobt_rec_incore	*irec)
> >  {
> > -	struct xfs_mount		*mp = bs->cur->bc_mp;
> > -	xfs_agino_t			agino;
> > +	unsigned int			cluster_base;
> >  	int				error = 0;
> >  
> > -	for (agino = irec->ir_startino;
> > -	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;
> > -	     agino += mp->m_blocks_per_cluster * mp->m_sb.sb_inopblock) {
> > -		error = xchk_iallocbt_check_cluster(bs, irec, agino);
> > +	/*
> > +	 * For the common case where this inobt record maps to multiple inode
> > +	 * clusters this will call _check_cluster for each cluster.
> > +	 *
> > +	 * For the case that multiple inobt records map to a single cluster,
> > +	 * this will call _check_cluster once.
> > +	 */
> > +	for (cluster_base = 0;
> > +	     cluster_base < XFS_INODES_PER_CHUNK;
> > +	     cluster_base += bs->sc->mp->m_inodes_per_cluster) {
> 
> I see this puts ->m_inodes_per_cluster back here, not sure why it
> changed back and forth but Ok.

I'll fix it up before I commit them.  Evidently I forgot to change all
the patches when I created inodes_per_cluster out of (blocks_per_cluster
* inopblock).

> > +		error = xchk_iallocbt_check_cluster(bs, irec, cluster_base);
> >  		if (error)
> >  			break;
> >  	}
> > @@ -431,7 +481,7 @@ xchk_iallocbt_rec(
> >  		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> >  
> >  check_freemask:
> 
> Perhaps we should rename the label as well?

Yep.

> Nits aside:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> > -	error = xchk_iallocbt_check_freemask(bs, &irec);
> > +	error = xchk_iallocbt_check_clusters(bs, &irec);
> >  	if (error)
> >  		goto out;
> >  
> > diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
> > index 8344b14031ef..3c83e8b3b39c 100644
> > --- a/fs/xfs/scrub/trace.h
> > +++ b/fs/xfs/scrub/trace.h
> > @@ -545,6 +545,51 @@ TRACE_EVENT(xchk_xref_error,
> >  		  __entry->ret_ip)
> >  );
> >  
> > +TRACE_EVENT(xchk_iallocbt_check_cluster,
> > +	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
> > +		 xfs_agino_t startino, xfs_daddr_t map_daddr,
> > +		 unsigned short map_len, unsigned int chunk_ino,
> > +		 unsigned int nr_inodes, uint16_t cluster_mask,
> > +		 uint16_t holemask, unsigned int cluster_ino),
> > +	TP_ARGS(mp, agno, startino, map_daddr, map_len, chunk_ino, nr_inodes,
> > +		cluster_mask, holemask, cluster_ino),
> > +	TP_STRUCT__entry(
> > +		__field(dev_t, dev)
> > +		__field(xfs_agnumber_t, agno)
> > +		__field(xfs_agino_t, startino)
> > +		__field(xfs_daddr_t, map_daddr)
> > +		__field(unsigned short, map_len)
> > +		__field(unsigned int, chunk_ino)
> > +		__field(unsigned int, nr_inodes)
> > +		__field(unsigned int, cluster_ino)
> > +		__field(uint16_t, cluster_mask)
> > +		__field(uint16_t, holemask)
> > +	),
> > +	TP_fast_assign(
> > +		__entry->dev = mp->m_super->s_dev;
> > +		__entry->agno = agno;
> > +		__entry->startino = startino;
> > +		__entry->map_daddr = map_daddr;
> > +		__entry->map_len = map_len;
> > +		__entry->chunk_ino = chunk_ino;
> > +		__entry->nr_inodes = nr_inodes;
> > +		__entry->cluster_mask = cluster_mask;
> > +		__entry->holemask = holemask;
> > +		__entry->cluster_ino = cluster_ino;
> > +	),
> > +	TP_printk("dev %d:%d agno %d startino %u daddr 0x%llx len %d chunkino %u nr_inodes %u cluster_mask 0x%x holemask 0x%x cluster_ino %u",
> > +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> > +		  __entry->agno,
> > +		  __entry->startino,
> > +		  __entry->map_daddr,
> > +		  __entry->map_len,
> > +		  __entry->chunk_ino,
> > +		  __entry->nr_inodes,
> > +		  __entry->cluster_mask,
> > +		  __entry->holemask,
> > +		  __entry->cluster_ino)
> > +)
> > +
> >  /* repair tracepoints */
> >  #if IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR)
> >  
> > 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 8/8] xfs: scrub should flag dir/attr offsets that aren't mappable with xfs_dablk_t
  2019-01-04 18:39   ` Brian Foster
@ 2019-01-04 23:09     ` Darrick J. Wong
  0 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2019-01-04 23:09 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Jan 04, 2019 at 01:39:31PM -0500, Brian Foster wrote:
> On Mon, Dec 31, 2018 at 06:09:23PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Teach scrub to flag extent maps that exceed the range that can be mapped
> > with a xfs_dablk_t.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_types.c |   11 +++++++++++
> >  fs/xfs/libxfs/xfs_types.h |    1 +
> >  fs/xfs/scrub/bmap.c       |   27 +++++++++++++++++++++++++++
> >  3 files changed, 39 insertions(+)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_types.c b/fs/xfs/libxfs/xfs_types.c
> > index 3306fc42cfad..8e03e88a2f1a 100644
> > --- a/fs/xfs/libxfs/xfs_types.c
> > +++ b/fs/xfs/libxfs/xfs_types.c
> > @@ -204,3 +204,14 @@ xfs_verify_icount(
> >  	xfs_icount_range(mp, &min, &max);
> >  	return icount >= min && icount <= max;
> >  }
> > +
> > +/* Sanity-checking of dir/attr block offsets. */
> > +bool
> > +xfs_verify_dablk(
> > +	struct xfs_mount	*mp,
> > +	xfs_fileoff_t		dabno)
> > +{
> > +	xfs_dablk_t		max_dablk = -1U;
> > +
> > +	return dabno < max_dablk;
> 
> Should that be <= ?

Yeah.  Thanks for catching that.

/me notes that xfs_repair doesn't seem to check dablk offsets at all.

Anyway, thanks for the review!  I'll resend the patches that need a v2
early next week.

--D

> Brian
> 
> > +}
> > diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
> > index 8f02855a019a..704b4f308780 100644
> > --- a/fs/xfs/libxfs/xfs_types.h
> > +++ b/fs/xfs/libxfs/xfs_types.h
> > @@ -188,5 +188,6 @@ bool xfs_internal_inum(struct xfs_mount *mp, xfs_ino_t ino);
> >  bool xfs_verify_dir_ino(struct xfs_mount *mp, xfs_ino_t ino);
> >  bool xfs_verify_rtbno(struct xfs_mount *mp, xfs_rtblock_t rtbno);
> >  bool xfs_verify_icount(struct xfs_mount *mp, unsigned long long icount);
> > +bool xfs_verify_dablk(struct xfs_mount *mp, xfs_fileoff_t off);
> >  
> >  #endif	/* __XFS_TYPES_H__ */
> > diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
> > index e1d11f3223e3..a703cd58a90e 100644
> > --- a/fs/xfs/scrub/bmap.c
> > +++ b/fs/xfs/scrub/bmap.c
> > @@ -281,6 +281,31 @@ xchk_bmap_extent_xref(
> >  	xchk_ag_free(info->sc, &info->sc->sa);
> >  }
> >  
> > +/*
> > + * Directories and attr forks should never have blocks that can't be addressed
> > + * by a xfs_dablk_t.
> > + */
> > +STATIC void
> > +xchk_bmap_dirattr_extent(
> > +	struct xfs_inode	*ip,
> > +	struct xchk_bmap_info	*info,
> > +	struct xfs_bmbt_irec	*irec)
> > +{
> > +	struct xfs_mount	*mp = ip->i_mount;
> > +	xfs_fileoff_t		off;
> > +
> > +	if (!S_ISDIR(VFS_I(ip)->i_mode) && info->whichfork != XFS_ATTR_FORK)
> > +		return;
> > +
> > +	if (!xfs_verify_dablk(mp, irec->br_startoff))
> > +		xchk_fblock_set_corrupt(info->sc, info->whichfork,
> > +				irec->br_startoff);
> > +
> > +	off = irec->br_startoff + irec->br_blockcount - 1;
> > +	if (!xfs_verify_dablk(mp, off))
> > +		xchk_fblock_set_corrupt(info->sc, info->whichfork, off);
> > +}
> > +
> >  /* Scrub a single extent record. */
> >  STATIC int
> >  xchk_bmap_extent(
> > @@ -305,6 +330,8 @@ xchk_bmap_extent(
> >  		xchk_fblock_set_corrupt(info->sc, info->whichfork,
> >  				irec->br_startoff);
> >  
> > +	xchk_bmap_dirattr_extent(ip, info, irec);
> > +
> >  	/* There should never be a "hole" extent in either extent list. */
> >  	if (irec->br_startblock == HOLESTARTBLOCK)
> >  		xchk_fblock_set_corrupt(info->sc, info->whichfork,
> > 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/8] xfs: scrub big block inode btrees correctly
  2019-01-04 18:38   ` Brian Foster
@ 2019-01-05  0:29     ` Darrick J. Wong
  2019-01-07 13:45       ` Brian Foster
  0 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2019-01-05  0:29 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Jan 04, 2019 at 01:38:56PM -0500, Brian Foster wrote:
> On Mon, Dec 31, 2018 at 06:09:10PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Teach scrub how to handle the case that there are one or more inobt
> > records covering a given inode cluster.  This fixes the operation on big
> > block filesystems (e.g. 64k blocks, 512 byte inodes).
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/scrub/ialloc.c |    8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> > index 2f6c2d7fa3fd..1be6a5ebd61e 100644
> > --- a/fs/xfs/scrub/ialloc.c
> > +++ b/fs/xfs/scrub/ialloc.c
> > @@ -162,6 +162,7 @@ xchk_iallocbt_check_cluster_ifree(
> >  	xfs_ino_t			fsino;
> >  	xfs_agino_t			agino;
> >  	unsigned int			offset;
> > +	unsigned int			cluster_buf_base;
> >  	bool				irec_free;
> >  	bool				ino_inuse;
> >  	bool				freemask_ok;
> > @@ -174,11 +175,14 @@ xchk_iallocbt_check_cluster_ifree(
> >  	 * Given an inobt record, an offset of a cluster within the record,
> >  	 * and an offset of an inode within a cluster, compute which fs inode
> >  	 * we're talking about and the offset of the inode record within the
> > -	 * inode buffer.
> > +	 * inode buffer, being careful about inobt records that don't align
> > +	 * with the start of the inode buffer when block sizes are large.
> >  	 */
> >  	agino = irec->ir_startino + cluster_base + cluster_index;
> >  	fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
> > -	offset = cluster_index * mp->m_sb.sb_inodesize;
> > +	cluster_buf_base = XFS_INO_TO_OFFSET(mp, irec->ir_startino +
> > +					     cluster_base);
> > +	offset = (cluster_buf_base + cluster_index) * mp->m_sb.sb_inodesize;
> 
> So cluster_buf_base should always be 0 unless the record itself is not
> cluster aligned (i.e., the second record in a large FSB), right?

Right.  For the (n > 0)th inobt record for a large FSB, ir_startino's
block offset will be less than inopblock, so cluster_buf_base will be
greater than zero.

> If so, cluster_base should also be 0 in any case where
> cluster_buf_base != 0.  I'm wondering if that can be made a little
> more explicit, or perhaps self-documented with an assert.

Right.  For the more typical case where there are multiple clusters for
each inobt record, ir_startino's block offset will always point to the
start of an inode cluster (and therefore cluster_buf_base will be zero)
but cluster_base will increment as we visit each cluster in the inobt.

I think this code can become:

	cluster_buf_base = XFS_INO_TO_OFFSET(mp, irec->ir_startino);
	ASSERT(cluster_buf_base == 0 || cluster_base == 0);
	offset = (cluster_buf_base + cluster_index) << mp->m_sb.sb_inopblog;

But I'll go feed that to the test machines before I commit to that. :)

> Thinking more about it, it's kind of confusing to me either way because
> cluster_index isn't really a cluster index in this case, rather it's
> more of a record index because it doesn't account for the fact that the
> record is a subset of the cluster. Hmmm... could we simplify this by
> setting imap.im_boffset appropriately in the caller such that dip always
> points to either the first inode in the buffer or first inode in the
> record, then just pass in dip directly and let the caller increment it
> in the associated loop? Maybe that's something for another patch..

I think that would be possible...

--D

> Brian
> 
> >  	if (offset >= BBTOB(cluster_bp->b_length)) {
> >  		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> >  		goto out;
> > 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/8] xfs: check the ir_startino alignment directly
  2019-01-04 20:59     ` Darrick J. Wong
@ 2019-01-07 13:45       ` Brian Foster
  2019-01-08  1:43         ` Darrick J. Wong
  0 siblings, 1 reply; 30+ messages in thread
From: Brian Foster @ 2019-01-07 13:45 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Fri, Jan 04, 2019 at 12:59:06PM -0800, Darrick J. Wong wrote:
> On Fri, Jan 04, 2019 at 01:31:04PM -0500, Brian Foster wrote:
> > On Mon, Dec 31, 2018 at 06:08:39PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > In xchk_iallocbt_rec, check the alignment of ir_startino by converting
> > > the inode cluster block alignment into units of inodes instead of the
> > > other way around (converting ir_startino to blocks).  This prevents us
> > > from tripping over off-by-one errors in ir_startino which are obscured
> > > by the inode -> block conversion.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  fs/xfs/scrub/ialloc.c |   45 +++++++++++++++++++++++++++++++++++++++------
> > >  1 file changed, 39 insertions(+), 6 deletions(-)
> > > 
> > > 
> > > diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> > > index fd431682db0b..5082331d6c03 100644
> > > --- a/fs/xfs/scrub/ialloc.c
> > > +++ b/fs/xfs/scrub/ialloc.c
> > > @@ -265,6 +265,42 @@ xchk_iallocbt_check_freemask(
> > >  	return error;
> > >  }
> > >  
> > > +/* Make sure this inode btree record is aligned properly. */
> > > +STATIC void
> > > +xchk_iallocbt_rec_alignment(
> > > +	struct xchk_btree		*bs,
> > > +	struct xfs_inobt_rec_incore	*irec)
> > > +{
> > > +	struct xfs_mount		*mp = bs->sc->mp;
> > > +
> > > +	/*
> > > +	 * finobt records have different positioning requirements than inobt
> > > +	 * records: each finobt record must have a corresponding inobt record.
> > > +	 * That is checked in the xref function, so for now we only catch the
> > > +	 * obvious case where the record isn't even chunk-aligned.
> > > +	 *
> > > +	 * Note also that if a fs block contains more than a single chunk of
> > > +	 * inodes, we will have finobt records only for those chunks containing
> > > +	 * free inodes.
> > > +	 */
> > > +	if (bs->cur->bc_btnum == XFS_BTNUM_FINO) {
> > > +		if (irec->ir_startino & (XFS_INODES_PER_CHUNK - 1))
> > > +			xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > +		return;
> > > +	}
> > 
> > Is the above really a finobt only check? Couldn't we run this
> > sanity check against all records and skip the following for finobt?
> 
> Uhoh, it occurs to me that in the 4kblock !spinodes case we can have
> inobt records (and therefore finobt records) that are aligned to
> m_cluster_align_inodes, and that value can be less than 64.  So I think
> this has to be something along the lines of:
> 

Er, yeah. IIRC spinodes required slightly larger inode chunk alignment
than traditionally required in order to ensure we don't create
conflicting/overlapping sparse records. A quick look at mkfs shows that
the cluster size basically defines the sparse chunk size and the chunk
alignment == chunk size.

> imask = min(XFS_INODES_PER_CHUNK, mp->m_cluster_align_inodes) - 1;
> if (irec->ir_startino & imask)
> 	/* set corrupt... */
> 
> Hmm, and testing seems to bear this out.  New patch forthcoming.
> 

Ok, I take it the min() is required because m_cluster_align_inodes can
be multiple records in the large FSB case. If so, I wonder if it would
be more simple to use m_cluster_align, but otherwise a one-liner comment
couldn't hurt.

> > Otherwise seems fine:
> > 
> > Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> I've wondered in recent days if this is even necessary at all -- when
> we're asked to check the inobt we check the ir_startino alignment of all
> those records, so really the only thing we need is the existing check
> that for each finobt record there's also an inobt record with the same
> ir_startino.  OTOH I guess we shouldn't really assume that the calling
> process already checked the inobt or that it didn't change between calls.
> 

I guess it makes sense to verify the applicable records match, but in
general I agree that "1. verify the inobt 2. verify the finobt is a
subset of the inobt" is probably sufficient (and more elegant logic).

Brian

> --D
> 
> > > +
> > > +	/* inobt records must be aligned to cluster and inoalignmnt size. */
> > > +	if (irec->ir_startino & (mp->m_cluster_align_inodes - 1)) {
> > > +		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > +		return;
> > > +	}
> > > +
> > > +	if (irec->ir_startino & (mp->m_inodes_per_cluster - 1)) {
> > > +		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > +		return;
> > > +	}
> > > +}
> > > +
> > >  /* Scrub an inobt/finobt record. */
> > >  STATIC int
> > >  xchk_iallocbt_rec(
> > > @@ -277,7 +313,6 @@ xchk_iallocbt_rec(
> > >  	uint64_t			holes;
> > >  	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
> > >  	xfs_agino_t			agino;
> > > -	xfs_agblock_t			agbno;
> > >  	xfs_extlen_t			len;
> > >  	int				holecount;
> > >  	int				i;
> > > @@ -304,11 +339,9 @@ xchk_iallocbt_rec(
> > >  		goto out;
> > >  	}
> > >  
> > > -	/* Make sure this record is aligned to cluster and inoalignmnt size. */
> > > -	agbno = XFS_AGINO_TO_AGBNO(mp, irec.ir_startino);
> > > -	if ((agbno & (mp->m_cluster_align - 1)) ||
> > > -	    (agbno & (mp->m_blocks_per_cluster - 1)))
> > > -		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > +	xchk_iallocbt_rec_alignment(bs, &irec);
> > > +	if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> > > +		goto out;
> > >  
> > >  	iabt->inodes += irec.ir_count;
> > >  
> > > 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/8] xfs: scrub big block inode btrees correctly
  2019-01-05  0:29     ` Darrick J. Wong
@ 2019-01-07 13:45       ` Brian Foster
  2019-01-08  2:03         ` Darrick J. Wong
  0 siblings, 1 reply; 30+ messages in thread
From: Brian Foster @ 2019-01-07 13:45 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Fri, Jan 04, 2019 at 04:29:16PM -0800, Darrick J. Wong wrote:
> On Fri, Jan 04, 2019 at 01:38:56PM -0500, Brian Foster wrote:
> > On Mon, Dec 31, 2018 at 06:09:10PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Teach scrub how to handle the case that there are one or more inobt
> > > records covering a given inode cluster.  This fixes the operation on big
> > > block filesystems (e.g. 64k blocks, 512 byte inodes).
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  fs/xfs/scrub/ialloc.c |    8 ++++++--
> > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > 
> > > diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> > > index 2f6c2d7fa3fd..1be6a5ebd61e 100644
> > > --- a/fs/xfs/scrub/ialloc.c
> > > +++ b/fs/xfs/scrub/ialloc.c
> > > @@ -162,6 +162,7 @@ xchk_iallocbt_check_cluster_ifree(
> > >  	xfs_ino_t			fsino;
> > >  	xfs_agino_t			agino;
> > >  	unsigned int			offset;
> > > +	unsigned int			cluster_buf_base;
> > >  	bool				irec_free;
> > >  	bool				ino_inuse;
> > >  	bool				freemask_ok;
> > > @@ -174,11 +175,14 @@ xchk_iallocbt_check_cluster_ifree(
> > >  	 * Given an inobt record, an offset of a cluster within the record,
> > >  	 * and an offset of an inode within a cluster, compute which fs inode
> > >  	 * we're talking about and the offset of the inode record within the
> > > -	 * inode buffer.
> > > +	 * inode buffer, being careful about inobt records that don't align
> > > +	 * with the start of the inode buffer when block sizes are large.
> > >  	 */
> > >  	agino = irec->ir_startino + cluster_base + cluster_index;
> > >  	fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
> > > -	offset = cluster_index * mp->m_sb.sb_inodesize;
> > > +	cluster_buf_base = XFS_INO_TO_OFFSET(mp, irec->ir_startino +
> > > +					     cluster_base);
> > > +	offset = (cluster_buf_base + cluster_index) * mp->m_sb.sb_inodesize;
> > 
> > So cluster_buf_base should always be 0 unless the record itself is not
> > cluster aligned (i.e., the second record in a large FSB), right?
> 
> Right.  For the (n > 0)th inobt record for a large FSB, ir_startino's
> block offset will be less than inopblock, so cluster_buf_base will be
> greater than zero.
> 
> > If so, cluster_base should also be 0 in any case where
> > cluster_buf_base != 0.  I'm wondering if that can be made a little
> > more explicit, or perhaps self-documented with an assert.
> 
> Right.  For the more typical case where there are multiple clusters for
> each inobt record, ir_startino's block offset will always point to the
> start of an inode cluster (and therefore cluster_buf_base will be zero)
> but cluster_base will increment as we visit each cluster in the inobt.
> 
> I think this code can become:
> 
> 	cluster_buf_base = XFS_INO_TO_OFFSET(mp, irec->ir_startino);
> 	ASSERT(cluster_buf_base == 0 || cluster_base == 0);
> 	offset = (cluster_buf_base + cluster_index) << mp->m_sb.sb_inopblog;
> 
> But I'll go feed that to the test machines before I commit to that. :)
> 
> > Thinking more about it, it's kind of confusing to me either way because
> > cluster_index isn't really a cluster index in this case, rather it's
> > more of a record index because it doesn't account for the fact that the
> > record is a subset of the cluster. Hmmm... could we simplify this by
> > setting imap.im_boffset appropriately in the caller such that dip always
> > points to either the first inode in the buffer or first inode in the
> > record, then just pass in dip directly and let the caller increment it
> > in the associated loop? Maybe that's something for another patch..
> 
> I think that would be possible...
> 

If so, I think it's worthwhile because then you'd only need to pass the
record, the inode index and the dip pointer to this function. AFAICT
you'd just need to pull up the cluster_buf_base logic here (along with
a useful comment) to the im_boffset assignment in the caller.

Brian

> --D
> 
> > Brian
> > 
> > >  	if (offset >= BBTOB(cluster_bp->b_length)) {
> > >  		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > >  		goto out;
> > > 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/8] xfs: check the ir_startino alignment directly
  2019-01-07 13:45       ` Brian Foster
@ 2019-01-08  1:43         ` Darrick J. Wong
  2019-01-08 12:47           ` Brian Foster
  0 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2019-01-08  1:43 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Jan 07, 2019 at 08:45:31AM -0500, Brian Foster wrote:
> On Fri, Jan 04, 2019 at 12:59:06PM -0800, Darrick J. Wong wrote:
> > On Fri, Jan 04, 2019 at 01:31:04PM -0500, Brian Foster wrote:
> > > On Mon, Dec 31, 2018 at 06:08:39PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > In xchk_iallocbt_rec, check the alignment of ir_startino by converting
> > > > the inode cluster block alignment into units of inodes instead of the
> > > > other way around (converting ir_startino to blocks).  This prevents us
> > > > from tripping over off-by-one errors in ir_startino which are obscured
> > > > by the inode -> block conversion.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > >  fs/xfs/scrub/ialloc.c |   45 +++++++++++++++++++++++++++++++++++++++------
> > > >  1 file changed, 39 insertions(+), 6 deletions(-)
> > > > 
> > > > 
> > > > diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> > > > index fd431682db0b..5082331d6c03 100644
> > > > --- a/fs/xfs/scrub/ialloc.c
> > > > +++ b/fs/xfs/scrub/ialloc.c
> > > > @@ -265,6 +265,42 @@ xchk_iallocbt_check_freemask(
> > > >  	return error;
> > > >  }
> > > >  
> > > > +/* Make sure this inode btree record is aligned properly. */
> > > > +STATIC void
> > > > +xchk_iallocbt_rec_alignment(
> > > > +	struct xchk_btree		*bs,
> > > > +	struct xfs_inobt_rec_incore	*irec)
> > > > +{
> > > > +	struct xfs_mount		*mp = bs->sc->mp;
> > > > +
> > > > +	/*
> > > > +	 * finobt records have different positioning requirements than inobt
> > > > +	 * records: each finobt record must have a corresponding inobt record.
> > > > +	 * That is checked in the xref function, so for now we only catch the
> > > > +	 * obvious case where the record isn't even chunk-aligned.
> > > > +	 *
> > > > +	 * Note also that if a fs block contains more than a single chunk of
> > > > +	 * inodes, we will have finobt records only for those chunks containing
> > > > +	 * free inodes.
> > > > +	 */
> > > > +	if (bs->cur->bc_btnum == XFS_BTNUM_FINO) {
> > > > +		if (irec->ir_startino & (XFS_INODES_PER_CHUNK - 1))
> > > > +			xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > +		return;
> > > > +	}
> > > 
> > > Is the above really a finobt only check? Couldn't we run this
> > > sanity check against all records and skip the following for finobt?
> > 
> > Uhoh, it occurs to me that in the 4kblock !spinodes case we can have
> > inobt records (and therefore finobt records) that are aligned to
> > m_cluster_align_inodes, and that value can be less than 64.  So I think
> > this has to be something along the lines of:
> > 
> 
> Er, yeah. IIRC spinodes required slightly larger inode chunk alignment
> than traditionally required in order to ensure we don't create
> conflicting/overlapping sparse records. A quick look at mkfs shows that
> the cluster size basically defines the sparse chunk size and the chunk
> alignment == chunk size.

<nod>

> > imask = min(XFS_INODES_PER_CHUNK, mp->m_cluster_align_inodes) - 1;
> > if (irec->ir_startino & imask)
> > 	/* set corrupt... */
> > 
> > Hmm, and testing seems to bear this out.  New patch forthcoming.
> > 
> 
> Ok, I take it the min() is required because m_cluster_align_inodes can
> be multiple records in the large FSB case. If so, I wonder if it would
> be more simple to use m_cluster_align,

I don't see how that would work -- m_cluster_align is in units of
blocks, not inodes.

> but otherwise a one-liner comment couldn't hurt.

At the moment the comment says:

	/*
	 * finobt records have different positioning requirements than inobt
	 * records: each finobt record must have a corresponding inobt record.
	 * That is checked in the xref function, so for now we only catch the
	 * obvious case where the record isn't at all aligned properly.
	 *
	 * Note that if a fs block contains more than a single chunk of inodes,
	 * we will have finobt records only for those chunks containing free
	 * inodes, and therefore expect chunk alignment of finobt records.
	 * Otherwise, we expect that the finobt record is aligned to the
	 * cluster alignment as told by the superblock.
	 */

--D

> > > Otherwise seems fine:
> > > 
> > > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > 
> > I've wondered in recent days if this is even necessary at all -- when
> > we're asked to check the inobt we check the ir_startino alignment of all
> > those records, so really the only thing we need is the existing check
> > that for each finobt record there's also an inobt record with the same
> > ir_startino.  OTOH I guess we shouldn't really assume that the calling
> > process already checked the inobt or that it didn't change between calls.
> > 
> 
> I guess it makes sense to verify the applicable records match, but in
> general I agree that "1. verify the inobt 2. verify the finobt is a
> subset of the inobt" is probably sufficient (and more elegant logic).
> 
> Brian
> 
> > --D
> > 
> > > > +
> > > > +	/* inobt records must be aligned to cluster and inoalignmnt size. */
> > > > +	if (irec->ir_startino & (mp->m_cluster_align_inodes - 1)) {
> > > > +		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > +		return;
> > > > +	}
> > > > +
> > > > +	if (irec->ir_startino & (mp->m_inodes_per_cluster - 1)) {
> > > > +		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > +		return;
> > > > +	}
> > > > +}
> > > > +
> > > >  /* Scrub an inobt/finobt record. */
> > > >  STATIC int
> > > >  xchk_iallocbt_rec(
> > > > @@ -277,7 +313,6 @@ xchk_iallocbt_rec(
> > > >  	uint64_t			holes;
> > > >  	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
> > > >  	xfs_agino_t			agino;
> > > > -	xfs_agblock_t			agbno;
> > > >  	xfs_extlen_t			len;
> > > >  	int				holecount;
> > > >  	int				i;
> > > > @@ -304,11 +339,9 @@ xchk_iallocbt_rec(
> > > >  		goto out;
> > > >  	}
> > > >  
> > > > -	/* Make sure this record is aligned to cluster and inoalignmnt size. */
> > > > -	agbno = XFS_AGINO_TO_AGBNO(mp, irec.ir_startino);
> > > > -	if ((agbno & (mp->m_cluster_align - 1)) ||
> > > > -	    (agbno & (mp->m_blocks_per_cluster - 1)))
> > > > -		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > +	xchk_iallocbt_rec_alignment(bs, &irec);
> > > > +	if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> > > > +		goto out;
> > > >  
> > > >  	iabt->inodes += irec.ir_count;
> > > >  
> > > > 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/8] xfs: scrub big block inode btrees correctly
  2019-01-07 13:45       ` Brian Foster
@ 2019-01-08  2:03         ` Darrick J. Wong
  0 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2019-01-08  2:03 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Jan 07, 2019 at 08:45:55AM -0500, Brian Foster wrote:
> On Fri, Jan 04, 2019 at 04:29:16PM -0800, Darrick J. Wong wrote:
> > On Fri, Jan 04, 2019 at 01:38:56PM -0500, Brian Foster wrote:
> > > On Mon, Dec 31, 2018 at 06:09:10PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Teach scrub how to handle the case that there are one or more inobt
> > > > records covering a given inode cluster.  This fixes the operation on big
> > > > block filesystems (e.g. 64k blocks, 512 byte inodes).
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > >  fs/xfs/scrub/ialloc.c |    8 ++++++--
> > > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > > 
> > > > 
> > > > diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> > > > index 2f6c2d7fa3fd..1be6a5ebd61e 100644
> > > > --- a/fs/xfs/scrub/ialloc.c
> > > > +++ b/fs/xfs/scrub/ialloc.c
> > > > @@ -162,6 +162,7 @@ xchk_iallocbt_check_cluster_ifree(
> > > >  	xfs_ino_t			fsino;
> > > >  	xfs_agino_t			agino;
> > > >  	unsigned int			offset;
> > > > +	unsigned int			cluster_buf_base;
> > > >  	bool				irec_free;
> > > >  	bool				ino_inuse;
> > > >  	bool				freemask_ok;
> > > > @@ -174,11 +175,14 @@ xchk_iallocbt_check_cluster_ifree(
> > > >  	 * Given an inobt record, an offset of a cluster within the record,
> > > >  	 * and an offset of an inode within a cluster, compute which fs inode
> > > >  	 * we're talking about and the offset of the inode record within the
> > > > -	 * inode buffer.
> > > > +	 * inode buffer, being careful about inobt records that don't align
> > > > +	 * with the start of the inode buffer when block sizes are large.
> > > >  	 */
> > > >  	agino = irec->ir_startino + cluster_base + cluster_index;
> > > >  	fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
> > > > -	offset = cluster_index * mp->m_sb.sb_inodesize;
> > > > +	cluster_buf_base = XFS_INO_TO_OFFSET(mp, irec->ir_startino +
> > > > +					     cluster_base);
> > > > +	offset = (cluster_buf_base + cluster_index) * mp->m_sb.sb_inodesize;
> > > 
> > > So cluster_buf_base should always be 0 unless the record itself is not
> > > cluster aligned (i.e., the second record in a large FSB), right?
> > 
> > Right.  For the (n > 0)th inobt record for a large FSB, ir_startino's
> > block offset will be less than inopblock, so cluster_buf_base will be
> > greater than zero.
> > 
> > > If so, cluster_base should also be 0 in any case where
> > > cluster_buf_base != 0.  I'm wondering if that can be made a little
> > > more explicit, or perhaps self-documented with an assert.
> > 
> > Right.  For the more typical case where there are multiple clusters for
> > each inobt record, ir_startino's block offset will always point to the
> > start of an inode cluster (and therefore cluster_buf_base will be zero)
> > but cluster_base will increment as we visit each cluster in the inobt.
> > 
> > I think this code can become:
> > 
> > 	cluster_buf_base = XFS_INO_TO_OFFSET(mp, irec->ir_startino);
> > 	ASSERT(cluster_buf_base == 0 || cluster_base == 0);
> > 	offset = (cluster_buf_base + cluster_index) << mp->m_sb.sb_inopblog;
> > 
> > But I'll go feed that to the test machines before I commit to that. :)
> > 
> > > Thinking more about it, it's kind of confusing to me either way because
> > > cluster_index isn't really a cluster index in this case, rather it's
> > > more of a record index because it doesn't account for the fact that the
> > > record is a subset of the cluster. Hmmm... could we simplify this by
> > > setting imap.im_boffset appropriately in the caller such that dip always
> > > points to either the first inode in the buffer or first inode in the
> > > record, then just pass in dip directly and let the caller increment it
> > > in the associated loop? Maybe that's something for another patch..
> > 
> > I think that would be possible...
> > 
> 
> If so, I think it's worthwhile because then you'd only need to pass the
> record, the inode index and the dip pointer to this function. AFAICT
> you'd just need to pull up the cluster_buf_base logic here (along with
> a useful comment) to the im_boffset assignment in the caller.

<nod> It looks like it cleans things up quite a bit.

--D

> Brian
> 
> > --D
> > 
> > > Brian
> > > 
> > > >  	if (offset >= BBTOB(cluster_bp->b_length)) {
> > > >  		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > >  		goto out;
> > > > 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/8] xfs: check the ir_startino alignment directly
  2019-01-08  1:43         ` Darrick J. Wong
@ 2019-01-08 12:47           ` Brian Foster
  2019-01-08 18:28             ` Darrick J. Wong
  0 siblings, 1 reply; 30+ messages in thread
From: Brian Foster @ 2019-01-08 12:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jan 07, 2019 at 05:43:07PM -0800, Darrick J. Wong wrote:
> On Mon, Jan 07, 2019 at 08:45:31AM -0500, Brian Foster wrote:
> > On Fri, Jan 04, 2019 at 12:59:06PM -0800, Darrick J. Wong wrote:
> > > On Fri, Jan 04, 2019 at 01:31:04PM -0500, Brian Foster wrote:
> > > > On Mon, Dec 31, 2018 at 06:08:39PM -0800, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > 
> > > > > In xchk_iallocbt_rec, check the alignment of ir_startino by converting
> > > > > the inode cluster block alignment into units of inodes instead of the
> > > > > other way around (converting ir_startino to blocks).  This prevents us
> > > > > from tripping over off-by-one errors in ir_startino which are obscured
> > > > > by the inode -> block conversion.
> > > > > 
> > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > ---
> > > > >  fs/xfs/scrub/ialloc.c |   45 +++++++++++++++++++++++++++++++++++++++------
> > > > >  1 file changed, 39 insertions(+), 6 deletions(-)
> > > > > 
> > > > > 
> > > > > diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> > > > > index fd431682db0b..5082331d6c03 100644
> > > > > --- a/fs/xfs/scrub/ialloc.c
> > > > > +++ b/fs/xfs/scrub/ialloc.c
> > > > > @@ -265,6 +265,42 @@ xchk_iallocbt_check_freemask(
> > > > >  	return error;
> > > > >  }
> > > > >  
> > > > > +/* Make sure this inode btree record is aligned properly. */
> > > > > +STATIC void
> > > > > +xchk_iallocbt_rec_alignment(
> > > > > +	struct xchk_btree		*bs,
> > > > > +	struct xfs_inobt_rec_incore	*irec)
> > > > > +{
> > > > > +	struct xfs_mount		*mp = bs->sc->mp;
> > > > > +
> > > > > +	/*
> > > > > +	 * finobt records have different positioning requirements than inobt
> > > > > +	 * records: each finobt record must have a corresponding inobt record.
> > > > > +	 * That is checked in the xref function, so for now we only catch the
> > > > > +	 * obvious case where the record isn't even chunk-aligned.
> > > > > +	 *
> > > > > +	 * Note also that if a fs block contains more than a single chunk of
> > > > > +	 * inodes, we will have finobt records only for those chunks containing
> > > > > +	 * free inodes.
> > > > > +	 */
> > > > > +	if (bs->cur->bc_btnum == XFS_BTNUM_FINO) {
> > > > > +		if (irec->ir_startino & (XFS_INODES_PER_CHUNK - 1))
> > > > > +			xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > > +		return;
> > > > > +	}
> > > > 
> > > > Is the above really a finobt only check? Couldn't we run this
> > > > sanity check against all records and skip the following for finobt?
> > > 
> > > Uhoh, it occurs to me that in the 4kblock !spinodes case we can have
> > > inobt records (and therefore finobt records) that are aligned to
> > > m_cluster_align_inodes, and that value can be less than 64.  So I think
> > > this has to be something along the lines of:
> > > 
> > 
> > Er, yeah. IIRC spinodes required slightly larger inode chunk alignment
> > than traditionally required in order to ensure we don't create
> > conflicting/overlapping sparse records. A quick look at mkfs shows that
> > the cluster size basically defines the sparse chunk size and the chunk
> > alignment == chunk size.
> 
> <nod>
> 
> > > imask = min(XFS_INODES_PER_CHUNK, mp->m_cluster_align_inodes) - 1;
> > > if (irec->ir_startino & imask)
> > > 	/* set corrupt... */
> > > 
> > > Hmm, and testing seems to bear this out.  New patch forthcoming.
> > > 
> > 
> > Ok, I take it the min() is required because m_cluster_align_inodes can
> > be multiple records in the large FSB case. If so, I wonder if it would
> > be more simple to use m_cluster_align,
> 
> I don't see how that would work -- m_cluster_align is in units of
> blocks, not inodes.
> 

Convert the startino to the agbno and check that..? It's ultimately just
a nit, but my comment is that:

	... = min(XFS_INODES_PER_CHUNK, mp->m_cluster_align_inodes) - 1;

... gives me a brief wtf because I'm not sure what chunk size has to do
with record alignment. Then I stare at it, go look at how
->m_cluster_align and friends are calculated, read the comment below and
work out that we're basically special casing conversion of the startino
of a multi-record per large FSB to block granularity. Note that part of
my temporary confusion here is probably just that I'm not used to seeing
cluster alignment in inode units..

> > but otherwise a one-liner comment couldn't hurt.
> 
> At the moment the comment says:
> 
> 	/*
> 	 * finobt records have different positioning requirements than inobt
> 	 * records: each finobt record must have a corresponding inobt record.
> 	 * That is checked in the xref function, so for now we only catch the
> 	 * obvious case where the record isn't at all aligned properly.
> 	 *
> 	 * Note that if a fs block contains more than a single chunk of inodes,
> 	 * we will have finobt records only for those chunks containing free
> 	 * inodes, and therefore expect chunk alignment of finobt records.
> 	 * Otherwise, we expect that the finobt record is aligned to the
> 	 * cluster alignment as told by the superblock.
> 	 */
> 

As opposed to something like:

	agbno = XFS_AGINO_TO_AGBNO(mp, irec->ir_startino);
	if (agbno & (mp->m_cluster_align - 1))
		...

... which IMO is self explanatory: make sure the start block of the
inode chunk is properly aligned. Am I missing some reason why we can't
do that?

Brian

> --D
> 
> > > > Otherwise seems fine:
> > > > 
> > > > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > > 
> > > I've wondered in recent days if this is even necessary at all -- when
> > > we're asked to check the inobt we check the ir_startino alignment of all
> > > those records, so really the only thing we need is the existing check
> > > that for each finobt record there's also an inobt record with the same
> > > ir_startino.  OTOH I guess we shouldn't really assume that the calling
> > > process already checked the inobt or that it didn't change between calls.
> > > 
> > 
> > I guess it makes sense to verify the applicable records match, but in
> > general I agree that "1. verify the inobt 2. verify the finobt is a
> > subset of the inobt" is probably sufficient (and more elegant logic).
> > 
> > Brian
> > 
> > > --D
> > > 
> > > > > +
> > > > > +	/* inobt records must be aligned to cluster and inoalignmnt size. */
> > > > > +	if (irec->ir_startino & (mp->m_cluster_align_inodes - 1)) {
> > > > > +		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > > +		return;
> > > > > +	}
> > > > > +
> > > > > +	if (irec->ir_startino & (mp->m_inodes_per_cluster - 1)) {
> > > > > +		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > > +		return;
> > > > > +	}
> > > > > +}
> > > > > +
> > > > >  /* Scrub an inobt/finobt record. */
> > > > >  STATIC int
> > > > >  xchk_iallocbt_rec(
> > > > > @@ -277,7 +313,6 @@ xchk_iallocbt_rec(
> > > > >  	uint64_t			holes;
> > > > >  	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
> > > > >  	xfs_agino_t			agino;
> > > > > -	xfs_agblock_t			agbno;
> > > > >  	xfs_extlen_t			len;
> > > > >  	int				holecount;
> > > > >  	int				i;
> > > > > @@ -304,11 +339,9 @@ xchk_iallocbt_rec(
> > > > >  		goto out;
> > > > >  	}
> > > > >  
> > > > > -	/* Make sure this record is aligned to cluster and inoalignmnt size. */
> > > > > -	agbno = XFS_AGINO_TO_AGBNO(mp, irec.ir_startino);
> > > > > -	if ((agbno & (mp->m_cluster_align - 1)) ||
> > > > > -	    (agbno & (mp->m_blocks_per_cluster - 1)))
> > > > > -		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > > +	xchk_iallocbt_rec_alignment(bs, &irec);
> > > > > +	if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> > > > > +		goto out;
> > > > >  
> > > > >  	iabt->inodes += irec.ir_count;
> > > > >  
> > > > > 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/8] xfs: check the ir_startino alignment directly
  2019-01-08 12:47           ` Brian Foster
@ 2019-01-08 18:28             ` Darrick J. Wong
  2019-01-08 19:00               ` Brian Foster
  0 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2019-01-08 18:28 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Tue, Jan 08, 2019 at 07:47:23AM -0500, Brian Foster wrote:
> On Mon, Jan 07, 2019 at 05:43:07PM -0800, Darrick J. Wong wrote:
> > On Mon, Jan 07, 2019 at 08:45:31AM -0500, Brian Foster wrote:
> > > On Fri, Jan 04, 2019 at 12:59:06PM -0800, Darrick J. Wong wrote:
> > > > On Fri, Jan 04, 2019 at 01:31:04PM -0500, Brian Foster wrote:
> > > > > On Mon, Dec 31, 2018 at 06:08:39PM -0800, Darrick J. Wong wrote:
> > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > 
> > > > > > In xchk_iallocbt_rec, check the alignment of ir_startino by converting
> > > > > > the inode cluster block alignment into units of inodes instead of the
> > > > > > other way around (converting ir_startino to blocks).  This prevents us
> > > > > > from tripping over off-by-one errors in ir_startino which are obscured
> > > > > > by the inode -> block conversion.
> > > > > > 
> > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > ---
> > > > > >  fs/xfs/scrub/ialloc.c |   45 +++++++++++++++++++++++++++++++++++++++------
> > > > > >  1 file changed, 39 insertions(+), 6 deletions(-)
> > > > > > 
> > > > > > 
> > > > > > diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> > > > > > index fd431682db0b..5082331d6c03 100644
> > > > > > --- a/fs/xfs/scrub/ialloc.c
> > > > > > +++ b/fs/xfs/scrub/ialloc.c
> > > > > > @@ -265,6 +265,42 @@ xchk_iallocbt_check_freemask(
> > > > > >  	return error;
> > > > > >  }
> > > > > >  
> > > > > > +/* Make sure this inode btree record is aligned properly. */
> > > > > > +STATIC void
> > > > > > +xchk_iallocbt_rec_alignment(
> > > > > > +	struct xchk_btree		*bs,
> > > > > > +	struct xfs_inobt_rec_incore	*irec)
> > > > > > +{
> > > > > > +	struct xfs_mount		*mp = bs->sc->mp;
> > > > > > +
> > > > > > +	/*
> > > > > > +	 * finobt records have different positioning requirements than inobt
> > > > > > +	 * records: each finobt record must have a corresponding inobt record.
> > > > > > +	 * That is checked in the xref function, so for now we only catch the
> > > > > > +	 * obvious case where the record isn't even chunk-aligned.
> > > > > > +	 *
> > > > > > +	 * Note also that if a fs block contains more than a single chunk of
> > > > > > +	 * inodes, we will have finobt records only for those chunks containing
> > > > > > +	 * free inodes.
> > > > > > +	 */
> > > > > > +	if (bs->cur->bc_btnum == XFS_BTNUM_FINO) {
> > > > > > +		if (irec->ir_startino & (XFS_INODES_PER_CHUNK - 1))
> > > > > > +			xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > > > +		return;
> > > > > > +	}
> > > > > 
> > > > > Is the above really a finobt only check? Couldn't we run this
> > > > > sanity check against all records and skip the following for finobt?
> > > > 
> > > > Uhoh, it occurs to me that in the 4kblock !spinodes case we can have
> > > > inobt records (and therefore finobt records) that are aligned to
> > > > m_cluster_align_inodes, and that value can be less than 64.  So I think
> > > > this has to be something along the lines of:
> > > > 
> > > 
> > > Er, yeah. IIRC spinodes required slightly larger inode chunk alignment
> > > than traditionally required in order to ensure we don't create
> > > conflicting/overlapping sparse records. A quick look at mkfs shows that
> > > the cluster size basically defines the sparse chunk size and the chunk
> > > alignment == chunk size.
> > 
> > <nod>
> > 
> > > > imask = min(XFS_INODES_PER_CHUNK, mp->m_cluster_align_inodes) - 1;
> > > > if (irec->ir_startino & imask)
> > > > 	/* set corrupt... */
> > > > 
> > > > Hmm, and testing seems to bear this out.  New patch forthcoming.
> > > > 
> > > 
> > > Ok, I take it the min() is required because m_cluster_align_inodes can
> > > be multiple records in the large FSB case. If so, I wonder if it would
> > > be more simple to use m_cluster_align,
> > 
> > I don't see how that would work -- m_cluster_align is in units of
> > blocks, not inodes.
> > 
> 
> Convert the startino to the agbno and check that..? It's ultimately just
> a nit, but my comment is that:
> 
> 	... = min(XFS_INODES_PER_CHUNK, mp->m_cluster_align_inodes) - 1;
> 
> ... gives me a brief wtf because I'm not sure what chunk size has to do
> with record alignment. Then I stare at it, go look at how
> ->m_cluster_align and friends are calculated, read the comment below and
> work out that we're basically special casing conversion of the startino
> of a multi-record per large FSB to block granularity. Note that part of
> my temporary confusion here is probably just that I'm not used to seeing
> cluster alignment in inode units..

Aha... now I think I know the source of the confusion (see below):

> > > but otherwise a one-liner comment couldn't hurt.
> > 
> > At the moment the comment says:
> > 
> > 	/*
> > 	 * finobt records have different positioning requirements than inobt
> > 	 * records: each finobt record must have a corresponding inobt record.
> > 	 * That is checked in the xref function, so for now we only catch the
> > 	 * obvious case where the record isn't at all aligned properly.
> > 	 *
> > 	 * Note that if a fs block contains more than a single chunk of inodes,
> > 	 * we will have finobt records only for those chunks containing free
> > 	 * inodes, and therefore expect chunk alignment of finobt records.
> > 	 * Otherwise, we expect that the finobt record is aligned to the
> > 	 * cluster alignment as told by the superblock.
> > 	 */
> > 
> 
> As opposed to something like:
> 
> 	agbno = XFS_AGINO_TO_AGBNO(mp, irec->ir_startino);
> 	if (agbno & (mp->m_cluster_align - 1))
> 		...
> 
> ... which IMO is self explanatory: make sure the start block of the
> inode chunk is properly aligned. Am I missing some reason why we can't
> do that?

AGINO_TO_AGBNO right-shifts the low bits off ir_startino before the mask
test, which means we won't catch corruptions such as (ir_startino == 97)
if we're only examining block numbers derived from inode numbers.

I'll add an additional comment to the alignment check function to
explain why we have to perform the alignment checks at inode
granularity, not block granularity.

--D

> Brian
> 
> > --D
> > 
> > > > > Otherwise seems fine:
> > > > > 
> > > > > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > > > 
> > > > I've wondered in recent days if this is even necessary at all -- when
> > > > we're asked to check the inobt we check the ir_startino alignment of all
> > > > those records, so really the only thing we need is the existing check
> > > > that for each finobt record there's also an inobt record with the same
> > > > ir_startino.  OTOH I guess we shouldn't really assume that the calling
> > > > process already checked the inobt or that it didn't change between calls.
> > > > 
> > > 
> > > I guess it makes sense to verify the applicable records match, but in
> > > general I agree that "1. verify the inobt 2. verify the finobt is a
> > > subset of the inobt" is probably sufficient (and more elegant logic).
> > > 
> > > Brian
> > > 
> > > > --D
> > > > 
> > > > > > +
> > > > > > +	/* inobt records must be aligned to cluster and inoalignmnt size. */
> > > > > > +	if (irec->ir_startino & (mp->m_cluster_align_inodes - 1)) {
> > > > > > +		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > > > +		return;
> > > > > > +	}
> > > > > > +
> > > > > > +	if (irec->ir_startino & (mp->m_inodes_per_cluster - 1)) {
> > > > > > +		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > > > +		return;
> > > > > > +	}
> > > > > > +}
> > > > > > +
> > > > > >  /* Scrub an inobt/finobt record. */
> > > > > >  STATIC int
> > > > > >  xchk_iallocbt_rec(
> > > > > > @@ -277,7 +313,6 @@ xchk_iallocbt_rec(
> > > > > >  	uint64_t			holes;
> > > > > >  	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
> > > > > >  	xfs_agino_t			agino;
> > > > > > -	xfs_agblock_t			agbno;
> > > > > >  	xfs_extlen_t			len;
> > > > > >  	int				holecount;
> > > > > >  	int				i;
> > > > > > @@ -304,11 +339,9 @@ xchk_iallocbt_rec(
> > > > > >  		goto out;
> > > > > >  	}
> > > > > >  
> > > > > > -	/* Make sure this record is aligned to cluster and inoalignmnt size. */
> > > > > > -	agbno = XFS_AGINO_TO_AGBNO(mp, irec.ir_startino);
> > > > > > -	if ((agbno & (mp->m_cluster_align - 1)) ||
> > > > > > -	    (agbno & (mp->m_blocks_per_cluster - 1)))
> > > > > > -		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > > > +	xchk_iallocbt_rec_alignment(bs, &irec);
> > > > > > +	if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> > > > > > +		goto out;
> > > > > >  
> > > > > >  	iabt->inodes += irec.ir_count;
> > > > > >  
> > > > > > 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/8] xfs: check the ir_startino alignment directly
  2019-01-08 18:28             ` Darrick J. Wong
@ 2019-01-08 19:00               ` Brian Foster
  0 siblings, 0 replies; 30+ messages in thread
From: Brian Foster @ 2019-01-08 19:00 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Jan 08, 2019 at 10:28:30AM -0800, Darrick J. Wong wrote:
> On Tue, Jan 08, 2019 at 07:47:23AM -0500, Brian Foster wrote:
> > On Mon, Jan 07, 2019 at 05:43:07PM -0800, Darrick J. Wong wrote:
> > > On Mon, Jan 07, 2019 at 08:45:31AM -0500, Brian Foster wrote:
> > > > On Fri, Jan 04, 2019 at 12:59:06PM -0800, Darrick J. Wong wrote:
> > > > > On Fri, Jan 04, 2019 at 01:31:04PM -0500, Brian Foster wrote:
> > > > > > On Mon, Dec 31, 2018 at 06:08:39PM -0800, Darrick J. Wong wrote:
> > > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > > 
> > > > > > > In xchk_iallocbt_rec, check the alignment of ir_startino by converting
> > > > > > > the inode cluster block alignment into units of inodes instead of the
> > > > > > > other way around (converting ir_startino to blocks).  This prevents us
> > > > > > > from tripping over off-by-one errors in ir_startino which are obscured
> > > > > > > by the inode -> block conversion.
> > > > > > > 
> > > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > > ---
> > > > > > >  fs/xfs/scrub/ialloc.c |   45 +++++++++++++++++++++++++++++++++++++++------
> > > > > > >  1 file changed, 39 insertions(+), 6 deletions(-)
> > > > > > > 
> > > > > > > 
> > > > > > > diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> > > > > > > index fd431682db0b..5082331d6c03 100644
> > > > > > > --- a/fs/xfs/scrub/ialloc.c
> > > > > > > +++ b/fs/xfs/scrub/ialloc.c
> > > > > > > @@ -265,6 +265,42 @@ xchk_iallocbt_check_freemask(
> > > > > > >  	return error;
> > > > > > >  }
> > > > > > >  
> > > > > > > +/* Make sure this inode btree record is aligned properly. */
> > > > > > > +STATIC void
> > > > > > > +xchk_iallocbt_rec_alignment(
> > > > > > > +	struct xchk_btree		*bs,
> > > > > > > +	struct xfs_inobt_rec_incore	*irec)
> > > > > > > +{
> > > > > > > +	struct xfs_mount		*mp = bs->sc->mp;
> > > > > > > +
> > > > > > > +	/*
> > > > > > > +	 * finobt records have different positioning requirements than inobt
> > > > > > > +	 * records: each finobt record must have a corresponding inobt record.
> > > > > > > +	 * That is checked in the xref function, so for now we only catch the
> > > > > > > +	 * obvious case where the record isn't even chunk-aligned.
> > > > > > > +	 *
> > > > > > > +	 * Note also that if a fs block contains more than a single chunk of
> > > > > > > +	 * inodes, we will have finobt records only for those chunks containing
> > > > > > > +	 * free inodes.
> > > > > > > +	 */
> > > > > > > +	if (bs->cur->bc_btnum == XFS_BTNUM_FINO) {
> > > > > > > +		if (irec->ir_startino & (XFS_INODES_PER_CHUNK - 1))
> > > > > > > +			xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > > > > +		return;
> > > > > > > +	}
> > > > > > 
> > > > > > Is the above really a finobt only check? Couldn't we run this
> > > > > > sanity check against all records and skip the following for finobt?
> > > > > 
> > > > > Uhoh, it occurs to me that in the 4kblock !spinodes case we can have
> > > > > inobt records (and therefore finobt records) that are aligned to
> > > > > m_cluster_align_inodes, and that value can be less than 64.  So I think
> > > > > this has to be something along the lines of:
> > > > > 
> > > > 
> > > > Er, yeah. IIRC spinodes required slightly larger inode chunk alignment
> > > > than traditionally required in order to ensure we don't create
> > > > conflicting/overlapping sparse records. A quick look at mkfs shows that
> > > > the cluster size basically defines the sparse chunk size and the chunk
> > > > alignment == chunk size.
> > > 
> > > <nod>
> > > 
> > > > > imask = min(XFS_INODES_PER_CHUNK, mp->m_cluster_align_inodes) - 1;
> > > > > if (irec->ir_startino & imask)
> > > > > 	/* set corrupt... */
> > > > > 
> > > > > Hmm, and testing seems to bear this out.  New patch forthcoming.
> > > > > 
> > > > 
> > > > Ok, I take it the min() is required because m_cluster_align_inodes can
> > > > be multiple records in the large FSB case. If so, I wonder if it would
> > > > be more simple to use m_cluster_align,
> > > 
> > > I don't see how that would work -- m_cluster_align is in units of
> > > blocks, not inodes.
> > > 
> > 
> > Convert the startino to the agbno and check that..? It's ultimately just
> > a nit, but my comment is that:
> > 
> > 	... = min(XFS_INODES_PER_CHUNK, mp->m_cluster_align_inodes) - 1;
> > 
> > ... gives me a brief wtf because I'm not sure what chunk size has to do
> > with record alignment. Then I stare at it, go look at how
> > ->m_cluster_align and friends are calculated, read the comment below and
> > work out that we're basically special casing conversion of the startino
> > of a multi-record per large FSB to block granularity. Note that part of
> > my temporary confusion here is probably just that I'm not used to seeing
> > cluster alignment in inode units..
> 
> Aha... now I think I know the source of the confusion (see below):
> 
> > > > but otherwise a one-liner comment couldn't hurt.
> > > 
> > > At the moment the comment says:
> > > 
> > > 	/*
> > > 	 * finobt records have different positioning requirements than inobt
> > > 	 * records: each finobt record must have a corresponding inobt record.
> > > 	 * That is checked in the xref function, so for now we only catch the
> > > 	 * obvious case where the record isn't at all aligned properly.
> > > 	 *
> > > 	 * Note that if a fs block contains more than a single chunk of inodes,
> > > 	 * we will have finobt records only for those chunks containing free
> > > 	 * inodes, and therefore expect chunk alignment of finobt records.
> > > 	 * Otherwise, we expect that the finobt record is aligned to the
> > > 	 * cluster alignment as told by the superblock.
> > > 	 */
> > > 
> > 
> > As opposed to something like:
> > 
> > 	agbno = XFS_AGINO_TO_AGBNO(mp, irec->ir_startino);
> > 	if (agbno & (mp->m_cluster_align - 1))
> > 		...
> > 
> > ... which IMO is self explanatory: make sure the start block of the
> > inode chunk is properly aligned. Am I missing some reason why we can't
> > do that?
> 
> AGINO_TO_AGBNO right-shifts the low bits off ir_startino before the mask
> test, which means we won't catch corruptions such as (ir_startino == 97)
> if we're only examining block numbers derived from inode numbers.
> 
> I'll add an additional comment to the alignment check function to
> explain why we have to perform the alignment checks at inode
> granularity, not block granularity.
> 

Ok, I see. One other quick thought.. do we still lose that information
if we just convert back and forth? I.e., convert agino to agbno/offset,
check alignment etc., convert agbno/offset back to agino and make sure
it matches the startino..?

That aside, yeah.. a couple sentences that explain precisely what you
did above clear this up quite a bit.

Brian

> --D
> 
> > Brian
> > 
> > > --D
> > > 
> > > > > > Otherwise seems fine:
> > > > > > 
> > > > > > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > > > > 
> > > > > I've wondered in recent days if this is even necessary at all -- when
> > > > > we're asked to check the inobt we check the ir_startino alignment of all
> > > > > those records, so really the only thing we need is the existing check
> > > > > that for each finobt record there's also an inobt record with the same
> > > > > ir_startino.  OTOH I guess we shouldn't really assume that the calling
> > > > > process already checked the inobt or that it didn't change between calls.
> > > > > 
> > > > 
> > > > I guess it makes sense to verify the applicable records match, but in
> > > > general I agree that "1. verify the inobt 2. verify the finobt is a
> > > > subset of the inobt" is probably sufficient (and more elegant logic).
> > > > 
> > > > Brian
> > > > 
> > > > > --D
> > > > > 
> > > > > > > +
> > > > > > > +	/* inobt records must be aligned to cluster and inoalignmnt size. */
> > > > > > > +	if (irec->ir_startino & (mp->m_cluster_align_inodes - 1)) {
> > > > > > > +		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > > > > +		return;
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	if (irec->ir_startino & (mp->m_inodes_per_cluster - 1)) {
> > > > > > > +		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > > > > +		return;
> > > > > > > +	}
> > > > > > > +}
> > > > > > > +
> > > > > > >  /* Scrub an inobt/finobt record. */
> > > > > > >  STATIC int
> > > > > > >  xchk_iallocbt_rec(
> > > > > > > @@ -277,7 +313,6 @@ xchk_iallocbt_rec(
> > > > > > >  	uint64_t			holes;
> > > > > > >  	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
> > > > > > >  	xfs_agino_t			agino;
> > > > > > > -	xfs_agblock_t			agbno;
> > > > > > >  	xfs_extlen_t			len;
> > > > > > >  	int				holecount;
> > > > > > >  	int				i;
> > > > > > > @@ -304,11 +339,9 @@ xchk_iallocbt_rec(
> > > > > > >  		goto out;
> > > > > > >  	}
> > > > > > >  
> > > > > > > -	/* Make sure this record is aligned to cluster and inoalignmnt size. */
> > > > > > > -	agbno = XFS_AGINO_TO_AGBNO(mp, irec.ir_startino);
> > > > > > > -	if ((agbno & (mp->m_cluster_align - 1)) ||
> > > > > > > -	    (agbno & (mp->m_blocks_per_cluster - 1)))
> > > > > > > -		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > > > > > > +	xchk_iallocbt_rec_alignment(bs, &irec);
> > > > > > > +	if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> > > > > > > +		goto out;
> > > > > > >  
> > > > > > >  	iabt->inodes += irec.ir_count;
> > > > > > >  
> > > > > > > 

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2019-01-08 19:00 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-01  2:08 [PATCH 0/8] xfs: inode scrubber fixes Darrick J. Wong
2019-01-01  2:08 ` [PATCH 1/8] xfs: never try to scrub more than 64 inodes per inobt record Darrick J. Wong
2019-01-02 12:39   ` Carlos Maiolino
2019-01-02 13:30     ` Chandan Rajendra
2019-01-04 18:30   ` Brian Foster
2019-01-01  2:08 ` [PATCH 2/8] xfs: check the ir_startino alignment directly Darrick J. Wong
2019-01-04 18:31   ` Brian Foster
2019-01-04 20:59     ` Darrick J. Wong
2019-01-07 13:45       ` Brian Foster
2019-01-08  1:43         ` Darrick J. Wong
2019-01-08 12:47           ` Brian Foster
2019-01-08 18:28             ` Darrick J. Wong
2019-01-08 19:00               ` Brian Foster
2019-01-01  2:08 ` [PATCH 3/8] xfs: check inobt record alignment on big block filesystems Darrick J. Wong
2019-01-04 18:31   ` Brian Foster
2019-01-01  2:08 ` [PATCH 4/8] xfs: hoist inode cluster checks out of loop Darrick J. Wong
2019-01-04 18:31   ` Brian Foster
2019-01-01  2:08 ` [PATCH 5/8] xfs: clean up the inode cluster checking in the inobt scrub Darrick J. Wong
2019-01-04 18:32   ` Brian Foster
2019-01-04 22:02     ` Darrick J. Wong
2019-01-01  2:09 ` [PATCH 6/8] xfs: scrub big block inode btrees correctly Darrick J. Wong
2019-01-04 18:38   ` Brian Foster
2019-01-05  0:29     ` Darrick J. Wong
2019-01-07 13:45       ` Brian Foster
2019-01-08  2:03         ` Darrick J. Wong
2019-01-01  2:09 ` [PATCH 7/8] xfs: abort xattr scrub if fatal signals are pending Darrick J. Wong
2019-01-04 18:39   ` Brian Foster
2019-01-01  2:09 ` [PATCH 8/8] xfs: scrub should flag dir/attr offsets that aren't mappable with xfs_dablk_t Darrick J. Wong
2019-01-04 18:39   ` Brian Foster
2019-01-04 23:09     ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.