linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/9] xfs-5.0: inode scrubber fixes
@ 2018-11-28 23:26 Darrick J. Wong
  2018-11-28 23:27 ` [PATCH 1/9] xfs: count inode blocks correctly in inobt scrub Darrick J. Wong
                   ` (9 more replies)
  0 siblings, 10 replies; 18+ messages in thread
From: Darrick J. Wong @ 2018-11-28 23:26 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

Hi all,

Here are fixes for some problems with the inode btree scrub code, namely
that the existing code does not handle the case where a single inode
cluster is mapped by multiple inobt records.

Patch 1 teaches the inode record block counting code to handle the case
where there's more than one inobt record per inode cluster.  We do this
by counting inodes and converting to blocks only at the end.

Patch 2 corrects a condition where we needed to clamp the number of
inodes checked for a given inobt record to the inode chunk size.

Patches 3-4 move the inobt record alignment checks to a separate
function and enhance the function to check that when we have more than
one inobt record per cluster we actually check that *all* of the
necessary records are present and in the correct order.

Patches 5-7 reorganize the inobt free data checks to deal with the
"multiple inobt records per icluster" situation.  In restructuring the
code to do so, we also rename variables and functions to be less
confusing about what they're there for.  We also fix the 'is the inode
free?' check to calculate dinode buffer offsets correctly in the
"multiple inobt records per icluster" situation.

Patch 8 aborts the xattr scrub loop if there are pending fatal signals.

Patch 9 checks that for any directory or attr fork there are no extent
maps that stretch beyond what a xfs_dablk_t can map.

If you're going to start using this mess, you probably ought to just
pull from my git trees.  The kernel patches[1] should apply against
4.20-rc4.

Comments and questions are, as always, welcome.

--D

[1] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=djwong-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/9] xfs: count inode blocks correctly in inobt scrub
  2018-11-28 23:26 [PATCH 0/9] xfs-5.0: inode scrubber fixes Darrick J. Wong
@ 2018-11-28 23:27 ` Darrick J. Wong
  2018-11-28 23:27 ` [PATCH 2/9] xfs: never try to scrub more than 64 inodes per inobt record Darrick J. Wong
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2018-11-28 23:27 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, Dave Chinner

From: Darrick J. Wong <darrick.wong@oracle.com>

A big block filesystem might require more than one inobt record to cover
all the inodes in the block.  In these cases it is not correct to round
the irec count up to the nearest block because this causes us to
overestimate the number of inode blocks we expect to find.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/scrub/ialloc.c |   22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)


diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index 9b5287a0e8ba..882dc56c5c21 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -44,6 +44,11 @@ xchk_setup_ag_iallocbt(
 
 /* Inode btree scrubber. */
 
+struct xchk_iallocbt {
+	/* Number of inodes we see while scanning inobt. */
+	unsigned long long	inodes;
+};
+
 /*
  * If we're checking the finobt, cross-reference with the inobt.
  * Otherwise we're checking the inobt; if there is an finobt, make sure
@@ -266,7 +271,7 @@ xchk_iallocbt_rec(
 	union xfs_btree_rec		*rec)
 {
 	struct xfs_mount		*mp = bs->cur->bc_mp;
-	xfs_filblks_t			*inode_blocks = bs->private;
+	struct xchk_iallocbt		*iabt = bs->private;
 	struct xfs_inobt_rec_incore	irec;
 	uint64_t			holes;
 	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
@@ -304,8 +309,7 @@ xchk_iallocbt_rec(
 	    (agbno & (mp->m_blocks_per_cluster - 1)))
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 
-	*inode_blocks += XFS_B_TO_FSB(mp,
-			irec.ir_count * mp->m_sb.sb_inodesize);
+	iabt->inodes += irec.ir_count;
 
 	/* Handle non-sparse inodes */
 	if (!xfs_inobt_issparse(irec.ir_holemask)) {
@@ -397,9 +401,10 @@ STATIC void
 xchk_iallocbt_xref_rmap_inodes(
 	struct xfs_scrub	*sc,
 	int			which,
-	xfs_filblks_t		inode_blocks)
+	unsigned long long	inodes)
 {
 	xfs_filblks_t		blocks;
+	xfs_filblks_t		inode_blocks;
 	int			error;
 
 	if (!sc->sa.rmap_cur || xchk_skip_xref(sc->sm))
@@ -410,6 +415,7 @@ xchk_iallocbt_xref_rmap_inodes(
 			&XFS_RMAP_OINFO_INODES, &blocks);
 	if (!xchk_should_check_xref(sc, &error, &sc->sa.rmap_cur))
 		return;
+	inode_blocks = XFS_B_TO_FSB(sc->mp, inodes * sc->mp->m_sb.sb_inodesize);
 	if (blocks != inode_blocks)
 		xchk_btree_xref_set_corrupt(sc, sc->sa.rmap_cur, 0);
 }
@@ -421,12 +427,14 @@ xchk_iallocbt(
 	xfs_btnum_t		which)
 {
 	struct xfs_btree_cur	*cur;
-	xfs_filblks_t		inode_blocks = 0;
+	struct xchk_iallocbt	iabt = {
+		.inodes		= 0,
+	};
 	int			error;
 
 	cur = which == XFS_BTNUM_INO ? sc->sa.ino_cur : sc->sa.fino_cur;
 	error = xchk_btree(sc, cur, xchk_iallocbt_rec, &XFS_RMAP_OINFO_INOBT,
-			&inode_blocks);
+			&iabt);
 	if (error)
 		return error;
 
@@ -440,7 +448,7 @@ xchk_iallocbt(
 	 * to inode chunks with free inodes.
 	 */
 	if (which == XFS_BTNUM_INO)
-		xchk_iallocbt_xref_rmap_inodes(sc, which, inode_blocks);
+		xchk_iallocbt_xref_rmap_inodes(sc, which, iabt.inodes);
 
 	return error;
 }

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/9] xfs: never try to scrub more than 64 inodes per inobt record
  2018-11-28 23:26 [PATCH 0/9] xfs-5.0: inode scrubber fixes Darrick J. Wong
  2018-11-28 23:27 ` [PATCH 1/9] xfs: count inode blocks correctly in inobt scrub Darrick J. Wong
@ 2018-11-28 23:27 ` Darrick J. Wong
  2018-11-28 23:27 ` [PATCH 3/9] xfs: check the ir_startino alignment directly Darrick J. Wong
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2018-11-28 23:27 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Make sure we never check more than XFS_INODES_PER_CHUNK inodes for any
given inobt record since there can be more than one inobt record mapped
to an inode cluster.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/ialloc.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index 882dc56c5c21..fd431682db0b 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -203,7 +203,8 @@ xchk_iallocbt_check_freemask(
 	int				error = 0;
 
 	/* Make sure the freemask matches the inode records. */
-	nr_inodes = mp->m_inodes_per_cluster;
+	nr_inodes = min_t(unsigned int, XFS_INODES_PER_CHUNK,
+			mp->m_inodes_per_cluster);
 
 	for (agino = irec->ir_startino;
 	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/9] xfs: check the ir_startino alignment directly
  2018-11-28 23:26 [PATCH 0/9] xfs-5.0: inode scrubber fixes Darrick J. Wong
  2018-11-28 23:27 ` [PATCH 1/9] xfs: count inode blocks correctly in inobt scrub Darrick J. Wong
  2018-11-28 23:27 ` [PATCH 2/9] xfs: never try to scrub more than 64 inodes per inobt record Darrick J. Wong
@ 2018-11-28 23:27 ` Darrick J. Wong
  2018-11-28 23:27 ` [PATCH 4/9] xfs: check inobt record alignment on big block filesystems Darrick J. Wong
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2018-11-28 23:27 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

In xchk_iallocbt_rec, check the alignment of ir_startino by converting
the inode cluster block alignment into units of inodes instead of the
other way around (converting ir_startino to blocks).  This prevents us
from tripping over off-by-one errors in ir_startino which are obscured
by the inode -> block conversion.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/ialloc.c |   30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)


diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index fd431682db0b..ad241807b49a 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -265,6 +265,27 @@ xchk_iallocbt_check_freemask(
 	return error;
 }
 
+/* Make sure this record is aligned to cluster and inoalignmnt size. */
+STATIC void
+xchk_iallocbt_rec_alignment(
+	struct xchk_btree		*bs,
+	struct xfs_inobt_rec_incore	*irec)
+{
+	xfs_agino_t			imask;
+
+	imask = bs->sc->mp->m_cluster_align_inodes - 1;
+	if (irec->ir_startino & imask) {
+		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
+		return;
+	}
+
+	imask = bs->sc->mp->m_inodes_per_cluster - 1;
+	if (irec->ir_startino & imask) {
+		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
+		return;
+	}
+}
+
 /* Scrub an inobt/finobt record. */
 STATIC int
 xchk_iallocbt_rec(
@@ -277,7 +298,6 @@ xchk_iallocbt_rec(
 	uint64_t			holes;
 	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
 	xfs_agino_t			agino;
-	xfs_agblock_t			agbno;
 	xfs_extlen_t			len;
 	int				holecount;
 	int				i;
@@ -304,11 +324,9 @@ xchk_iallocbt_rec(
 		goto out;
 	}
 
-	/* Make sure this record is aligned to cluster and inoalignmnt size. */
-	agbno = XFS_AGINO_TO_AGBNO(mp, irec.ir_startino);
-	if ((agbno & (mp->m_cluster_align - 1)) ||
-	    (agbno & (mp->m_blocks_per_cluster - 1)))
-		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
+	xchk_iallocbt_rec_alignment(bs, &irec);
+	if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+		goto out;
 
 	iabt->inodes += irec.ir_count;
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/9] xfs: check inobt record alignment on big block filesystems
  2018-11-28 23:26 [PATCH 0/9] xfs-5.0: inode scrubber fixes Darrick J. Wong
                   ` (2 preceding siblings ...)
  2018-11-28 23:27 ` [PATCH 3/9] xfs: check the ir_startino alignment directly Darrick J. Wong
@ 2018-11-28 23:27 ` Darrick J. Wong
  2018-11-28 23:27 ` [PATCH 5/9] xfs: hoist inode cluster checks out of loop Darrick J. Wong
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2018-11-28 23:27 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, Dave Chinner

From: Darrick J. Wong <darrick.wong@oracle.com>

On a big block filesystem, there may be multiple inobt records covering
a single inode cluster.  These records obviously won't be aligned to
cluster alignment rules, and they must cover the entire cluster.  Teach
scrub to check for these things.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/scrub/ialloc.c |   42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)


diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index ad241807b49a..b03f5f04b056 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -47,6 +47,12 @@ xchk_setup_ag_iallocbt(
 struct xchk_iallocbt {
 	/* Number of inodes we see while scanning inobt. */
 	unsigned long long	inodes;
+
+	/* Expected next startino, for big block filesystems. */
+	xfs_agino_t		next_startino;
+
+	/* Expected end of the current inode cluster. */
+	xfs_agino_t		next_cluster_ino;
 };
 
 /*
@@ -271,8 +277,30 @@ xchk_iallocbt_rec_alignment(
 	struct xchk_btree		*bs,
 	struct xfs_inobt_rec_incore	*irec)
 {
+	struct xchk_iallocbt		*iabt = bs->private;
 	xfs_agino_t			imask;
 
+	if (iabt->next_startino != NULLAGINO) {
+		/*
+		 * We're midway through a cluster of inodes that is mapped by
+		 * multiple inobt records.  Did we get the record for the next
+		 * irec in the sequence?
+		 */
+		if (irec->ir_startino != iabt->next_startino) {
+			xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
+			return;
+		}
+
+		iabt->next_startino += XFS_INODES_PER_CHUNK;
+
+		/* Are we done with the cluster? */
+		if (iabt->next_startino >= iabt->next_cluster_ino) {
+			iabt->next_startino = NULLAGINO;
+			iabt->next_cluster_ino = NULLAGINO;
+		}
+		return;
+	}
+
 	imask = bs->sc->mp->m_cluster_align_inodes - 1;
 	if (irec->ir_startino & imask) {
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
@@ -284,6 +312,18 @@ xchk_iallocbt_rec_alignment(
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 		return;
 	}
+
+	if (bs->sc->mp->m_inodes_per_cluster <= XFS_INODES_PER_CHUNK)
+		return;
+
+	/*
+	 * If this is the start of an inode cluster that can be mapped by
+	 * multiple inobt records, the next inobt record must follow exactly
+	 * after this one.
+	 */
+	iabt->next_startino = irec->ir_startino + XFS_INODES_PER_CHUNK;
+	iabt->next_cluster_ino = irec->ir_startino +
+			bs->sc->mp->m_inodes_per_cluster;
 }
 
 /* Scrub an inobt/finobt record. */
@@ -448,6 +488,8 @@ xchk_iallocbt(
 	struct xfs_btree_cur	*cur;
 	struct xchk_iallocbt	iabt = {
 		.inodes		= 0,
+		.next_startino	= NULLAGINO,
+		.next_cluster_ino = NULLAGINO,
 	};
 	int			error;
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/9] xfs: hoist inode cluster checks out of loop
  2018-11-28 23:26 [PATCH 0/9] xfs-5.0: inode scrubber fixes Darrick J. Wong
                   ` (3 preceding siblings ...)
  2018-11-28 23:27 ` [PATCH 4/9] xfs: check inobt record alignment on big block filesystems Darrick J. Wong
@ 2018-11-28 23:27 ` Darrick J. Wong
  2018-11-28 23:27 ` [PATCH 6/9] xfs: clean up the inode cluster checking in the inobt scrub Darrick J. Wong
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2018-11-28 23:27 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, Dave Chinner

From: Darrick J. Wong <darrick.wong@oracle.com>

Hoist the inode cluster checks out of the inobt record check loop into
a separate function in preparation for refactoring of that loop.  No
functional changes here; that's in the next patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/scrub/ialloc.c |  119 +++++++++++++++++++++++++++----------------------
 1 file changed, 65 insertions(+), 54 deletions(-)


diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index b03f5f04b056..475f930e0daa 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -188,19 +188,19 @@ xchk_iallocbt_check_cluster_freemask(
 	return 0;
 }
 
-/* Make sure the free mask is consistent with what the inodes think. */
+/* Check an inode cluster. */
 STATIC int
-xchk_iallocbt_check_freemask(
+xchk_iallocbt_check_cluster(
 	struct xchk_btree		*bs,
-	struct xfs_inobt_rec_incore	*irec)
+	struct xfs_inobt_rec_incore	*irec,
+	xfs_agino_t			agino)
 {
 	struct xfs_imap			imap;
 	struct xfs_mount		*mp = bs->cur->bc_mp;
 	struct xfs_dinode		*dip;
 	struct xfs_buf			*bp;
 	xfs_ino_t			fsino;
-	xfs_agino_t			nr_inodes;
-	xfs_agino_t			agino;
+	unsigned int			nr_inodes;
 	xfs_agino_t			chunkino;
 	xfs_agino_t			clusterino;
 	xfs_agblock_t			agbno;
@@ -212,60 +212,71 @@ xchk_iallocbt_check_freemask(
 	nr_inodes = min_t(unsigned int, XFS_INODES_PER_CHUNK,
 			mp->m_inodes_per_cluster);
 
-	for (agino = irec->ir_startino;
-	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;
-	     agino += mp->m_inodes_per_cluster) {
-		fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
-		chunkino = agino - irec->ir_startino;
-		agbno = XFS_AGINO_TO_AGBNO(mp, agino);
-
-		/* Compute the holemask mask for this cluster. */
-		for (clusterino = 0, holemask = 0; clusterino < nr_inodes;
-		     clusterino += XFS_INODES_PER_HOLEMASK_BIT)
-			holemask |= XFS_INOBT_MASK((chunkino + clusterino) /
-					XFS_INODES_PER_HOLEMASK_BIT);
-
-		/* The whole cluster must be a hole or not a hole. */
-		ir_holemask = (irec->ir_holemask & holemask);
-		if (ir_holemask != holemask && ir_holemask != 0) {
-			xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
-			continue;
-		}
+	fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
+	chunkino = agino - irec->ir_startino;
+	agbno = XFS_AGINO_TO_AGBNO(mp, agino);
 
-		/* If any part of this is a hole, skip it. */
-		if (ir_holemask) {
-			xchk_xref_is_not_owned_by(bs->sc, agbno,
-					mp->m_blocks_per_cluster,
-					&XFS_RMAP_OINFO_INODES);
-			continue;
-		}
+	/* Compute the holemask mask for this cluster. */
+	for (clusterino = 0, holemask = 0; clusterino < nr_inodes;
+	     clusterino += XFS_INODES_PER_HOLEMASK_BIT)
+		holemask |= XFS_INOBT_MASK((chunkino + clusterino) /
+				XFS_INODES_PER_HOLEMASK_BIT);
 
-		xchk_xref_is_owned_by(bs->sc, agbno, mp->m_blocks_per_cluster,
+	/* The whole cluster must be a hole or not a hole. */
+	ir_holemask = (irec->ir_holemask & holemask);
+	if (ir_holemask != holemask && ir_holemask != 0) {
+		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
+		return 0;
+	}
+
+	/* If any part of this is a hole, skip it. */
+	if (ir_holemask) {
+		xchk_xref_is_not_owned_by(bs->sc, agbno,
+				mp->m_blocks_per_cluster,
 				&XFS_RMAP_OINFO_INODES);
+		return 0;
+	}
 
-		/* Grab the inode cluster buffer. */
-		imap.im_blkno = XFS_AGB_TO_DADDR(mp, bs->cur->bc_private.a.agno,
-				agbno);
-		imap.im_len = XFS_FSB_TO_BB(mp, mp->m_blocks_per_cluster);
-		imap.im_boffset = 0;
-
-		error = xfs_imap_to_bp(mp, bs->cur->bc_tp, &imap,
-				&dip, &bp, 0, 0);
-		if (!xchk_btree_xref_process_error(bs->sc, bs->cur, 0,
-				&error))
-			continue;
-
-		/* Which inodes are free? */
-		for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
-			error = xchk_iallocbt_check_cluster_freemask(bs,
-					fsino, chunkino, clusterino, irec, bp);
-			if (error) {
-				xfs_trans_brelse(bs->cur->bc_tp, bp);
-				return error;
-			}
-		}
+	xchk_xref_is_owned_by(bs->sc, agbno, mp->m_blocks_per_cluster,
+			&XFS_RMAP_OINFO_INODES);
+
+	/* Grab the inode cluster buffer. */
+	imap.im_blkno = XFS_AGB_TO_DADDR(mp, bs->cur->bc_private.a.agno, agbno);
+	imap.im_len = XFS_FSB_TO_BB(mp, mp->m_blocks_per_cluster);
+	imap.im_boffset = 0;
 
-		xfs_trans_brelse(bs->cur->bc_tp, bp);
+	error = xfs_imap_to_bp(mp, bs->cur->bc_tp, &imap, &dip, &bp, 0, 0);
+	if (!xchk_btree_xref_process_error(bs->sc, bs->cur, 0, &error))
+		return 0;
+
+	/* Which inodes are free? */
+	for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
+		error = xchk_iallocbt_check_cluster_freemask(bs, fsino,
+				chunkino, clusterino, irec, bp);
+		if (error)
+			break;
+	}
+
+	xfs_trans_brelse(bs->cur->bc_tp, bp);
+	return error;
+}
+
+/* Make sure the free mask is consistent with what the inodes think. */
+STATIC int
+xchk_iallocbt_check_freemask(
+	struct xchk_btree		*bs,
+	struct xfs_inobt_rec_incore	*irec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	xfs_agino_t			agino;
+	int				error = 0;
+
+	for (agino = irec->ir_startino;
+	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;
+	     agino += mp->m_blocks_per_cluster * mp->m_sb.sb_inopblock) {
+		error = xchk_iallocbt_check_cluster(bs, irec, agino);
+		if (error)
+			break;
 	}
 
 	return error;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/9] xfs: clean up the inode cluster checking in the inobt scrub
  2018-11-28 23:26 [PATCH 0/9] xfs-5.0: inode scrubber fixes Darrick J. Wong
                   ` (4 preceding siblings ...)
  2018-11-28 23:27 ` [PATCH 5/9] xfs: hoist inode cluster checks out of loop Darrick J. Wong
@ 2018-11-28 23:27 ` Darrick J. Wong
  2018-12-19  8:31   ` Chandan Rajendra
  2018-11-28 23:27 ` [PATCH 7/9] xfs: scrub big block inode btrees correctly Darrick J. Wong
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Darrick J. Wong @ 2018-11-28 23:27 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

The code to check inobt records against inode clusters is a mess of
poorly named variables and unnecessary parameters.  Clean the
unnecessary inode number parameters out of _check_cluster_freemask in
favor of computing them inside the function instead of making the caller
do it.  In xchk_iallocbt_check_cluster, rename the variables to make it
more obvious just what chunk_ino and cluster_ino represent.

Add a tracepoint to make it easier to track each inode cluster as we
scrub it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/ialloc.c |  162 ++++++++++++++++++++++++++++++++-----------------
 fs/xfs/scrub/trace.h  |   45 ++++++++++++++
 2 files changed, 151 insertions(+), 56 deletions(-)


diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index 475f930e0daa..1d79f12dfce5 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -134,41 +134,70 @@ xchk_iallocbt_freecount(
 	return hweight64(freemask);
 }
 
-/* Check a particular inode with ir_free. */
+/*
+ * Check that an inode's allocation status matches ir_free in the inobt
+ * record.  First we try querying the in-core inode state, and if the inode
+ * isn't loaded we examine the on-disk inode directly.
+ *
+ * Since there can be 1:M and M:1 mappings between inobt records and inode
+ * clusters, we pass in the inode location information as an inobt record;
+ * the index of an inode cluster within the inobt record (as well as the
+ * cluster buffer itself); and the index of the inode within the cluster.
+ *
+ * @irec is the inobt record.
+ * @cluster_base is the inode offset of the cluster within the @irec.
+ * @cluster_bp is the cluster buffer.
+ * @cluster_index is the inode offset within the inode cluster.
+ */
 STATIC int
-xchk_iallocbt_check_cluster_freemask(
+xchk_iallocbt_check_cluster_ifree(
 	struct xchk_btree		*bs,
-	xfs_ino_t			fsino,
-	xfs_agino_t			chunkino,
-	xfs_agino_t			clusterino,
 	struct xfs_inobt_rec_incore	*irec,
-	struct xfs_buf			*bp)
+	unsigned int			cluster_base,
+	struct xfs_buf			*cluster_bp,
+	unsigned int			cluster_index)
 {
-	struct xfs_dinode		*dip;
 	struct xfs_mount		*mp = bs->cur->bc_mp;
-	bool				inode_is_free = false;
+	struct xfs_dinode		*dip;
+	xfs_ino_t			fsino;
+	xfs_agino_t			agino;
+	unsigned int			offset;
+	bool				irec_free;
+	bool				ino_inuse;
 	bool				freemask_ok;
-	bool				inuse;
-	int				error = 0;
+	int				error;
 
 	if (xchk_should_terminate(bs->sc, &error))
 		return error;
 
-	dip = xfs_buf_offset(bp, clusterino * mp->m_sb.sb_inodesize);
+	/*
+	 * Given an inobt record, an offset of a cluster within the record,
+	 * and an offset of an inode within a cluster, compute which fs inode
+	 * we're talking about and the offset of the inode record within the
+	 * inode buffer.
+	 */
+	agino = irec->ir_startino + cluster_base + cluster_index;
+	fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
+	offset = cluster_index * mp->m_sb.sb_inodesize;
+	if (offset >= BBTOB(cluster_bp->b_length)) {
+		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
+		goto out;
+	}
+	dip = xfs_buf_offset(cluster_bp, offset);
+	irec_free = (irec->ir_free & XFS_INOBT_MASK(cluster_base +
+						    cluster_index));
+
 	if (be16_to_cpu(dip->di_magic) != XFS_DINODE_MAGIC ||
-	    (dip->di_version >= 3 &&
-	     be64_to_cpu(dip->di_ino) != fsino + clusterino)) {
+	    (dip->di_version >= 3 && be64_to_cpu(dip->di_ino) != fsino)) {
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 		goto out;
 	}
 
-	if (irec->ir_free & XFS_INOBT_MASK(chunkino + clusterino))
-		inode_is_free = true;
-	error = xfs_icache_inode_is_allocated(mp, bs->cur->bc_tp,
-			fsino + clusterino, &inuse);
+	error = xfs_icache_inode_is_allocated(mp, bs->cur->bc_tp, fsino,
+			&ino_inuse);
 	if (error == -ENODATA) {
 		/* Not cached, just read the disk buffer */
-		freemask_ok = inode_is_free ^ !!(dip->di_mode);
+		freemask_ok = irec_free ^ !!(dip->di_mode);
 		if (!bs->sc->try_harder && !freemask_ok)
 			return -EDEADLOCK;
 	} else if (error < 0) {
@@ -180,7 +209,7 @@ xchk_iallocbt_check_cluster_freemask(
 		goto out;
 	} else {
 		/* Inode is all there. */
-		freemask_ok = inode_is_free ^ inuse;
+		freemask_ok = irec_free ^ ino_inuse;
 	}
 	if (!freemask_ok)
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
@@ -188,43 +217,57 @@ xchk_iallocbt_check_cluster_freemask(
 	return 0;
 }
 
-/* Check an inode cluster. */
+/*
+ * Check that the holemask and freemask of a hypothetical inode cluster match
+ * what's actually on disk.  If sparse inodes are enabled, the cluster does
+ * not actually have to map to inodes if the corresponding holemask bit is set.
+ *
+ * @cluster_base is the first inode in the cluster within the @irec.
+ */
 STATIC int
 xchk_iallocbt_check_cluster(
 	struct xchk_btree		*bs,
 	struct xfs_inobt_rec_incore	*irec,
-	xfs_agino_t			agino)
+	unsigned int			cluster_base)
 {
 	struct xfs_imap			imap;
 	struct xfs_mount		*mp = bs->cur->bc_mp;
 	struct xfs_dinode		*dip;
-	struct xfs_buf			*bp;
-	xfs_ino_t			fsino;
+	struct xfs_buf			*cluster_bp;
 	unsigned int			nr_inodes;
-	xfs_agino_t			chunkino;
-	xfs_agino_t			clusterino;
+	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
 	xfs_agblock_t			agbno;
-	uint16_t			holemask;
+	unsigned int			cluster_index;
+	uint16_t			cluster_mask = 0;
 	uint16_t			ir_holemask;
 	int				error = 0;
 
-	/* Make sure the freemask matches the inode records. */
 	nr_inodes = min_t(unsigned int, XFS_INODES_PER_CHUNK,
 			mp->m_inodes_per_cluster);
 
-	fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
-	chunkino = agino - irec->ir_startino;
-	agbno = XFS_AGINO_TO_AGBNO(mp, agino);
+	/* Map this inode cluster */
+	agbno = XFS_AGINO_TO_AGBNO(mp, irec->ir_startino + cluster_base);
 
-	/* Compute the holemask mask for this cluster. */
-	for (clusterino = 0, holemask = 0; clusterino < nr_inodes;
-	     clusterino += XFS_INODES_PER_HOLEMASK_BIT)
-		holemask |= XFS_INOBT_MASK((chunkino + clusterino) /
+	/* Compute a bitmask for this cluster that can be used for holemask. */
+	for (cluster_index = 0;
+	     cluster_index < nr_inodes;
+	     cluster_index += XFS_INODES_PER_HOLEMASK_BIT)
+		cluster_mask |= XFS_INOBT_MASK((cluster_base + cluster_index) /
 				XFS_INODES_PER_HOLEMASK_BIT);
 
+	ir_holemask = (irec->ir_holemask & cluster_mask);
+	imap.im_blkno = XFS_AGB_TO_DADDR(mp, agno, agbno);
+	imap.im_len = XFS_FSB_TO_BB(mp, mp->m_blocks_per_cluster);
+	imap.im_boffset = 0;
+
+	trace_xchk_iallocbt_check_cluster(mp, agno, irec->ir_startino,
+			imap.im_blkno, imap.im_len, cluster_base, nr_inodes,
+			cluster_mask, ir_holemask,
+			XFS_INO_TO_OFFSET(mp, irec->ir_startino +
+					  cluster_base));
+
 	/* The whole cluster must be a hole or not a hole. */
-	ir_holemask = (irec->ir_holemask & holemask);
-	if (ir_holemask != holemask && ir_holemask != 0) {
+	if (ir_holemask != cluster_mask && ir_holemask != 0) {
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 		return 0;
 	}
@@ -241,40 +284,47 @@ xchk_iallocbt_check_cluster(
 			&XFS_RMAP_OINFO_INODES);
 
 	/* Grab the inode cluster buffer. */
-	imap.im_blkno = XFS_AGB_TO_DADDR(mp, bs->cur->bc_private.a.agno, agbno);
-	imap.im_len = XFS_FSB_TO_BB(mp, mp->m_blocks_per_cluster);
-	imap.im_boffset = 0;
-
-	error = xfs_imap_to_bp(mp, bs->cur->bc_tp, &imap, &dip, &bp, 0, 0);
+	error = xfs_imap_to_bp(mp, bs->cur->bc_tp, &imap, &dip, &cluster_bp,
+			0, 0);
 	if (!xchk_btree_xref_process_error(bs->sc, bs->cur, 0, &error))
-		return 0;
+		return error;
 
-	/* Which inodes are free? */
-	for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
-		error = xchk_iallocbt_check_cluster_freemask(bs, fsino,
-				chunkino, clusterino, irec, bp);
+	/* Check free status of each inode within this cluster. */
+	for (cluster_index = 0; cluster_index < nr_inodes; cluster_index++) {
+		error = xchk_iallocbt_check_cluster_ifree(bs, irec,
+				cluster_base, cluster_bp, cluster_index);
 		if (error)
 			break;
 	}
 
-	xfs_trans_brelse(bs->cur->bc_tp, bp);
+	xfs_trans_brelse(bs->cur->bc_tp, cluster_bp);
 	return error;
 }
 
-/* Make sure the free mask is consistent with what the inodes think. */
+/*
+ * For all the inode clusters that could map to this inobt record, make sure
+ * that the holemask makes sense and that the allocation status of each inode
+ * matches the freemask.
+ */
 STATIC int
-xchk_iallocbt_check_freemask(
+xchk_iallocbt_check_clusters(
 	struct xchk_btree		*bs,
 	struct xfs_inobt_rec_incore	*irec)
 {
-	struct xfs_mount		*mp = bs->cur->bc_mp;
-	xfs_agino_t			agino;
+	unsigned int			cluster_base;
 	int				error = 0;
 
-	for (agino = irec->ir_startino;
-	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;
-	     agino += mp->m_blocks_per_cluster * mp->m_sb.sb_inopblock) {
-		error = xchk_iallocbt_check_cluster(bs, irec, agino);
+	/*
+	 * For the common case where this inobt record maps to multiple inode
+	 * clusters this will call _check_cluster for each cluster.
+	 *
+	 * For the case that multiple inobt records map to a single cluster,
+	 * this will call _check_cluster once.
+	 */
+	for (cluster_base = 0;
+	     cluster_base < XFS_INODES_PER_CHUNK;
+	     cluster_base += bs->sc->mp->m_inodes_per_cluster) {
+		error = xchk_iallocbt_check_cluster(bs, irec, cluster_base);
 		if (error)
 			break;
 	}
@@ -417,7 +467,7 @@ xchk_iallocbt_rec(
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 
 check_freemask:
-	error = xchk_iallocbt_check_freemask(bs, &irec);
+	error = xchk_iallocbt_check_clusters(bs, &irec);
 	if (error)
 		goto out;
 
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 4e20f0e48232..f1c144c6dcc5 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -480,6 +480,51 @@ TRACE_EVENT(xchk_xref_error,
 		  __entry->ret_ip)
 );
 
+TRACE_EVENT(xchk_iallocbt_check_cluster,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+		 xfs_agino_t startino, xfs_daddr_t map_daddr,
+		 unsigned short map_len, unsigned int chunk_ino,
+		 unsigned int nr_inodes, uint16_t cluster_mask,
+		 uint16_t holemask, unsigned int cluster_ino),
+	TP_ARGS(mp, agno, startino, map_daddr, map_len, chunk_ino, nr_inodes,
+		cluster_mask, holemask, cluster_ino),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agino_t, startino)
+		__field(xfs_daddr_t, map_daddr)
+		__field(unsigned short, map_len)
+		__field(unsigned int, chunk_ino)
+		__field(unsigned int, nr_inodes)
+		__field(unsigned int, cluster_ino)
+		__field(uint16_t, cluster_mask)
+		__field(uint16_t, holemask)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->startino = startino;
+		__entry->map_daddr = map_daddr;
+		__entry->map_len = map_len;
+		__entry->chunk_ino = chunk_ino;
+		__entry->nr_inodes = nr_inodes;
+		__entry->cluster_mask = cluster_mask;
+		__entry->holemask = holemask;
+		__entry->cluster_ino = cluster_ino;
+	),
+	TP_printk("dev %d:%d agno %d startino %u daddr 0x%llx len %d chunkino %u nr_inodes %u cluster_mask 0x%x holemask 0x%x cluster_ino %u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->startino,
+		  __entry->map_daddr,
+		  __entry->map_len,
+		  __entry->chunk_ino,
+		  __entry->nr_inodes,
+		  __entry->cluster_mask,
+		  __entry->holemask,
+		  __entry->cluster_ino)
+)
+
 /* repair tracepoints */
 #if IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR)
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 7/9] xfs: scrub big block inode btrees correctly
  2018-11-28 23:26 [PATCH 0/9] xfs-5.0: inode scrubber fixes Darrick J. Wong
                   ` (5 preceding siblings ...)
  2018-11-28 23:27 ` [PATCH 6/9] xfs: clean up the inode cluster checking in the inobt scrub Darrick J. Wong
@ 2018-11-28 23:27 ` Darrick J. Wong
  2018-11-28 23:27 ` [PATCH 8/9] xfs: abort xattr scrub if fatal signals are pending Darrick J. Wong
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2018-11-28 23:27 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Teach scrub how to handle the case that there are one or more inobt
records covering a given inode cluster.  This fixes the operation on big
block filesystems (e.g. 64k blocks, 512 byte inodes).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/ialloc.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index 1d79f12dfce5..26bcf359b326 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -162,6 +162,7 @@ xchk_iallocbt_check_cluster_ifree(
 	xfs_ino_t			fsino;
 	xfs_agino_t			agino;
 	unsigned int			offset;
+	unsigned int			cluster_buf_base;
 	bool				irec_free;
 	bool				ino_inuse;
 	bool				freemask_ok;
@@ -174,11 +175,14 @@ xchk_iallocbt_check_cluster_ifree(
 	 * Given an inobt record, an offset of a cluster within the record,
 	 * and an offset of an inode within a cluster, compute which fs inode
 	 * we're talking about and the offset of the inode record within the
-	 * inode buffer.
+	 * inode buffer, being careful about inobt records that don't align
+	 * with the start of the inode buffer when block sizes are large.
 	 */
 	agino = irec->ir_startino + cluster_base + cluster_index;
 	fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
-	offset = cluster_index * mp->m_sb.sb_inodesize;
+	cluster_buf_base = XFS_INO_TO_OFFSET(mp, irec->ir_startino +
+					     cluster_base);
+	offset = (cluster_buf_base + cluster_index) * mp->m_sb.sb_inodesize;
 	if (offset >= BBTOB(cluster_bp->b_length)) {
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 		goto out;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 8/9] xfs: abort xattr scrub if fatal signals are pending
  2018-11-28 23:26 [PATCH 0/9] xfs-5.0: inode scrubber fixes Darrick J. Wong
                   ` (6 preceding siblings ...)
  2018-11-28 23:27 ` [PATCH 7/9] xfs: scrub big block inode btrees correctly Darrick J. Wong
@ 2018-11-28 23:27 ` Darrick J. Wong
  2018-11-28 23:27 ` [PATCH 9/9] xfs: scrub should flag dir/attr offsets that aren't mappable with xfs_dablk_t Darrick J. Wong
  2018-12-19  8:31 ` [PATCH 0/9] xfs-5.0: inode scrubber fixes Chandan Rajendra
  9 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2018-11-28 23:27 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

The extended attribute scrubber should abort the "read all attrs" loop
if there's a fatal signal pending on the process.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/attr.c |    5 +++++
 1 file changed, 5 insertions(+)


diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 81d5e90547a1..9960bc5b5d76 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -82,6 +82,11 @@ xchk_xattr_listent(
 
 	sx = container_of(context, struct xchk_xattr, context);
 
+	if (xchk_should_terminate(sx->sc, &error)) {
+		context->seen_enough = 1;
+		return;
+	}
+
 	if (flags & XFS_ATTR_INCOMPLETE) {
 		/* Incomplete attr key, just mark the inode for preening. */
 		xchk_ino_set_preen(sx->sc, context->dp->i_ino);

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 9/9] xfs: scrub should flag dir/attr offsets that aren't mappable with xfs_dablk_t
  2018-11-28 23:26 [PATCH 0/9] xfs-5.0: inode scrubber fixes Darrick J. Wong
                   ` (7 preceding siblings ...)
  2018-11-28 23:27 ` [PATCH 8/9] xfs: abort xattr scrub if fatal signals are pending Darrick J. Wong
@ 2018-11-28 23:27 ` Darrick J. Wong
  2018-12-19  8:31 ` [PATCH 0/9] xfs-5.0: inode scrubber fixes Chandan Rajendra
  9 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2018-11-28 23:27 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Teach scrub to flag extent maps that exceed the range that can be mapped
with a xfs_dablk_t.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_types.c |   11 +++++++++++
 fs/xfs/libxfs/xfs_types.h |    1 +
 fs/xfs/scrub/bmap.c       |   27 +++++++++++++++++++++++++++
 3 files changed, 39 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_types.c b/fs/xfs/libxfs/xfs_types.c
index 3306fc42cfad..8e03e88a2f1a 100644
--- a/fs/xfs/libxfs/xfs_types.c
+++ b/fs/xfs/libxfs/xfs_types.c
@@ -204,3 +204,14 @@ xfs_verify_icount(
 	xfs_icount_range(mp, &min, &max);
 	return icount >= min && icount <= max;
 }
+
+/* Sanity-checking of dir/attr block offsets. */
+bool
+xfs_verify_dablk(
+	struct xfs_mount	*mp,
+	xfs_fileoff_t		dabno)
+{
+	xfs_dablk_t		max_dablk = -1U;
+
+	return dabno < max_dablk;
+}
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index b9e6c89284c3..05e8fa558f3e 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -166,5 +166,6 @@ bool xfs_internal_inum(struct xfs_mount *mp, xfs_ino_t ino);
 bool xfs_verify_dir_ino(struct xfs_mount *mp, xfs_ino_t ino);
 bool xfs_verify_rtbno(struct xfs_mount *mp, xfs_rtblock_t rtbno);
 bool xfs_verify_icount(struct xfs_mount *mp, unsigned long long icount);
+bool xfs_verify_dablk(struct xfs_mount *mp, xfs_fileoff_t off);
 
 #endif	/* __XFS_TYPES_H__ */
diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
index e1d11f3223e3..a703cd58a90e 100644
--- a/fs/xfs/scrub/bmap.c
+++ b/fs/xfs/scrub/bmap.c
@@ -281,6 +281,31 @@ xchk_bmap_extent_xref(
 	xchk_ag_free(info->sc, &info->sc->sa);
 }
 
+/*
+ * Directories and attr forks should never have blocks that can't be addressed
+ * by a xfs_dablk_t.
+ */
+STATIC void
+xchk_bmap_dirattr_extent(
+	struct xfs_inode	*ip,
+	struct xchk_bmap_info	*info,
+	struct xfs_bmbt_irec	*irec)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_fileoff_t		off;
+
+	if (!S_ISDIR(VFS_I(ip)->i_mode) && info->whichfork != XFS_ATTR_FORK)
+		return;
+
+	if (!xfs_verify_dablk(mp, irec->br_startoff))
+		xchk_fblock_set_corrupt(info->sc, info->whichfork,
+				irec->br_startoff);
+
+	off = irec->br_startoff + irec->br_blockcount - 1;
+	if (!xfs_verify_dablk(mp, off))
+		xchk_fblock_set_corrupt(info->sc, info->whichfork, off);
+}
+
 /* Scrub a single extent record. */
 STATIC int
 xchk_bmap_extent(
@@ -305,6 +330,8 @@ xchk_bmap_extent(
 		xchk_fblock_set_corrupt(info->sc, info->whichfork,
 				irec->br_startoff);
 
+	xchk_bmap_dirattr_extent(ip, info, irec);
+
 	/* There should never be a "hole" extent in either extent list. */
 	if (irec->br_startblock == HOLESTARTBLOCK)
 		xchk_fblock_set_corrupt(info->sc, info->whichfork,

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/9] xfs: clean up the inode cluster checking in the inobt scrub
  2018-11-28 23:27 ` [PATCH 6/9] xfs: clean up the inode cluster checking in the inobt scrub Darrick J. Wong
@ 2018-12-19  8:31   ` Chandan Rajendra
  0 siblings, 0 replies; 18+ messages in thread
From: Chandan Rajendra @ 2018-12-19  8:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Hi Darrick,

Trivial review comments inlined below,

On Thursday, November 29, 2018 4:57:36 AM IST Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> The code to check inobt records against inode clusters is a mess of
> poorly named variables and unnecessary parameters.  Clean the
> unnecessary inode number parameters out of _check_cluster_freemask in
> favor of computing them inside the function instead of making the caller
> do it.  In xchk_iallocbt_check_cluster, rename the variables to make it
> more obvious just what chunk_ino and cluster_ino represent.
> 
> Add a tracepoint to make it easier to track each inode cluster as we
> scrub it.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/scrub/ialloc.c |  162 ++++++++++++++++++++++++++++++++-----------------
>  fs/xfs/scrub/trace.h  |   45 ++++++++++++++
>  2 files changed, 151 insertions(+), 56 deletions(-)
> 
> 
> diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> index 475f930e0daa..1d79f12dfce5 100644
> --- a/fs/xfs/scrub/ialloc.c
> +++ b/fs/xfs/scrub/ialloc.c
> @@ -134,41 +134,70 @@ xchk_iallocbt_freecount(
>  	return hweight64(freemask);
>  }
> 
> -/* Check a particular inode with ir_free. */
> +/*
> + * Check that an inode's allocation status matches ir_free in the inobt
> + * record.  First we try querying the in-core inode state, and if the inode
> + * isn't loaded we examine the on-disk inode directly.
> + *
> + * Since there can be 1:M and M:1 mappings between inobt records and inode
> + * clusters, we pass in the inode location information as an inobt record;
> + * the index of an inode cluster within the inobt record (as well as the
> + * cluster buffer itself); and the index of the inode within the cluster.
> + *
> + * @irec is the inobt record.
> + * @cluster_base is the inode offset of the cluster within the @irec.
> + * @cluster_bp is the cluster buffer.
> + * @cluster_index is the inode offset within the inode cluster.
> + */
>  STATIC int
> -xchk_iallocbt_check_cluster_freemask(
> +xchk_iallocbt_check_cluster_ifree(
>  	struct xchk_btree		*bs,
> -	xfs_ino_t			fsino,
> -	xfs_agino_t			chunkino,
> -	xfs_agino_t			clusterino,
>  	struct xfs_inobt_rec_incore	*irec,
> -	struct xfs_buf			*bp)
> +	unsigned int			cluster_base,
> +	struct xfs_buf			*cluster_bp,
> +	unsigned int			cluster_index)
>  {
> -	struct xfs_dinode		*dip;
>  	struct xfs_mount		*mp = bs->cur->bc_mp;
> -	bool				inode_is_free = false;
> +	struct xfs_dinode		*dip;
> +	xfs_ino_t			fsino;
> +	xfs_agino_t			agino;
> +	unsigned int			offset;
> +	bool				irec_free;
> +	bool				ino_inuse;
>  	bool				freemask_ok;
> -	bool				inuse;
> -	int				error = 0;
> +	int				error;
> 
>  	if (xchk_should_terminate(bs->sc, &error))
>  		return error;
> 
> -	dip = xfs_buf_offset(bp, clusterino * mp->m_sb.sb_inodesize);
> +	/*
> +	 * Given an inobt record, an offset of a cluster within the record,
> +	 * and an offset of an inode within a cluster, compute which fs inode
> +	 * we're talking about and the offset of the inode record within the
> +	 * inode buffer.
> +	 */
> +	agino = irec->ir_startino + cluster_base + cluster_index;
> +	fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
> +	offset = cluster_index * mp->m_sb.sb_inodesize;

I noticed that the 64k block size case has been fixed in the next patch.

> +	if (offset >= BBTOB(cluster_bp->b_length)) {
> +		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> +		goto out;
> +	}
> +	dip = xfs_buf_offset(cluster_bp, offset);
> +	irec_free = (irec->ir_free & XFS_INOBT_MASK(cluster_base +
> +						    cluster_index));
> +
>  	if (be16_to_cpu(dip->di_magic) != XFS_DINODE_MAGIC ||
> -	    (dip->di_version >= 3 &&
> -	     be64_to_cpu(dip->di_ino) != fsino + clusterino)) {
> +	    (dip->di_version >= 3 && be64_to_cpu(dip->di_ino) != fsino)) {
>  		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
>  		goto out;
>  	}
> 
> -	if (irec->ir_free & XFS_INOBT_MASK(chunkino + clusterino))
> -		inode_is_free = true;
> -	error = xfs_icache_inode_is_allocated(mp, bs->cur->bc_tp,
> -			fsino + clusterino, &inuse);
> +	error = xfs_icache_inode_is_allocated(mp, bs->cur->bc_tp, fsino,
> +			&ino_inuse);
>  	if (error == -ENODATA) {
>  		/* Not cached, just read the disk buffer */
> -		freemask_ok = inode_is_free ^ !!(dip->di_mode);
> +		freemask_ok = irec_free ^ !!(dip->di_mode);
>  		if (!bs->sc->try_harder && !freemask_ok)
>  			return -EDEADLOCK;
>  	} else if (error < 0) {
> @@ -180,7 +209,7 @@ xchk_iallocbt_check_cluster_freemask(
>  		goto out;
>  	} else {
>  		/* Inode is all there. */
> -		freemask_ok = inode_is_free ^ inuse;
> +		freemask_ok = irec_free ^ ino_inuse;
>  	}
>  	if (!freemask_ok)
>  		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> @@ -188,43 +217,57 @@ xchk_iallocbt_check_cluster_freemask(
>  	return 0;
>  }
> 
> -/* Check an inode cluster. */
> +/*
> + * Check that the holemask and freemask of a hypothetical inode cluster match
> + * what's actually on disk.  If sparse inodes are enabled, the cluster does
> + * not actually have to map to inodes if the corresponding holemask bit is set.
> + *
> + * @cluster_base is the first inode in the cluster within the @irec.
> + */
>  STATIC int
>  xchk_iallocbt_check_cluster(
>  	struct xchk_btree		*bs,
>  	struct xfs_inobt_rec_incore	*irec,
> -	xfs_agino_t			agino)
> +	unsigned int			cluster_base)
>  {
>  	struct xfs_imap			imap;
>  	struct xfs_mount		*mp = bs->cur->bc_mp;
>  	struct xfs_dinode		*dip;
> -	struct xfs_buf			*bp;
> -	xfs_ino_t			fsino;
> +	struct xfs_buf			*cluster_bp;
>  	unsigned int			nr_inodes;
> -	xfs_agino_t			chunkino;
> -	xfs_agino_t			clusterino;
> +	xfs_agnumber_t			agno = bs->cur->bc_private.a.agno;
>  	xfs_agblock_t			agbno;
> -	uint16_t			holemask;
> +	unsigned int			cluster_index;
> +	uint16_t			cluster_mask = 0;
>  	uint16_t			ir_holemask;
>  	int				error = 0;
> 
> -	/* Make sure the freemask matches the inode records. */
>  	nr_inodes = min_t(unsigned int, XFS_INODES_PER_CHUNK,
>  			mp->m_inodes_per_cluster);
> 
> -	fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
> -	chunkino = agino - irec->ir_startino;
> -	agbno = XFS_AGINO_TO_AGBNO(mp, agino);
> +	/* Map this inode cluster */
> +	agbno = XFS_AGINO_TO_AGBNO(mp, irec->ir_startino + cluster_base);
> 
> -	/* Compute the holemask mask for this cluster. */
> -	for (clusterino = 0, holemask = 0; clusterino < nr_inodes;
> -	     clusterino += XFS_INODES_PER_HOLEMASK_BIT)
> -		holemask |= XFS_INOBT_MASK((chunkino + clusterino) /
> +	/* Compute a bitmask for this cluster that can be used for holemask. */
> +	for (cluster_index = 0;
> +	     cluster_index < nr_inodes;
> +	     cluster_index += XFS_INODES_PER_HOLEMASK_BIT)
> +		cluster_mask |= XFS_INOBT_MASK((cluster_base + cluster_index) /
>  				XFS_INODES_PER_HOLEMASK_BIT);
> 
> +	ir_holemask = (irec->ir_holemask & cluster_mask);
> +	imap.im_blkno = XFS_AGB_TO_DADDR(mp, agno, agbno);
> +	imap.im_len = XFS_FSB_TO_BB(mp, mp->m_blocks_per_cluster);
> +	imap.im_boffset = 0;
> +
> +	trace_xchk_iallocbt_check_cluster(mp, agno, irec->ir_startino,
> +			imap.im_blkno, imap.im_len, cluster_base, nr_inodes,
> +			cluster_mask, ir_holemask,
> +			XFS_INO_TO_OFFSET(mp, irec->ir_startino +
> +					  cluster_base));
> +
>  	/* The whole cluster must be a hole or not a hole. */

IMHO, "The whole cluster must be a hole or not *have* a hole in it" probably
conveys the meaning clearly.

> -	ir_holemask = (irec->ir_holemask & holemask);
> -	if (ir_holemask != holemask && ir_holemask != 0) {
> +	if (ir_holemask != cluster_mask && ir_holemask != 0) {
>  		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
>  		return 0;
>  	}
> @@ -241,40 +284,47 @@ xchk_iallocbt_check_cluster(
>  			&XFS_RMAP_OINFO_INODES);
> 
>  	/* Grab the inode cluster buffer. */
> -	imap.im_blkno = XFS_AGB_TO_DADDR(mp, bs->cur->bc_private.a.agno, agbno);
> -	imap.im_len = XFS_FSB_TO_BB(mp, mp->m_blocks_per_cluster);
> -	imap.im_boffset = 0;
> -
> -	error = xfs_imap_to_bp(mp, bs->cur->bc_tp, &imap, &dip, &bp, 0, 0);
> +	error = xfs_imap_to_bp(mp, bs->cur->bc_tp, &imap, &dip, &cluster_bp,
> +			0, 0);
>  	if (!xchk_btree_xref_process_error(bs->sc, bs->cur, 0, &error))
> -		return 0;
> +		return error;
> 
> -	/* Which inodes are free? */
> -	for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
> -		error = xchk_iallocbt_check_cluster_freemask(bs, fsino,
> -				chunkino, clusterino, irec, bp);
> +	/* Check free status of each inode within this cluster. */
> +	for (cluster_index = 0; cluster_index < nr_inodes; cluster_index++) {
> +		error = xchk_iallocbt_check_cluster_ifree(bs, irec,
> +				cluster_base, cluster_bp, cluster_index);
>  		if (error)
>  			break;
>  	}
> 
> -	xfs_trans_brelse(bs->cur->bc_tp, bp);
> +	xfs_trans_brelse(bs->cur->bc_tp, cluster_bp);
>  	return error;
>  }
> 
> -/* Make sure the free mask is consistent with what the inodes think. */
> +/*
> + * For all the inode clusters that could map to this inobt record, make sure
> + * that the holemask makes sense and that the allocation status of each inode
> + * matches the freemask.
> + */
>  STATIC int
> -xchk_iallocbt_check_freemask(
> +xchk_iallocbt_check_clusters(
>  	struct xchk_btree		*bs,
>  	struct xfs_inobt_rec_incore	*irec)
>  {
> -	struct xfs_mount		*mp = bs->cur->bc_mp;
> -	xfs_agino_t			agino;
> +	unsigned int			cluster_base;
>  	int				error = 0;
> 
> -	for (agino = irec->ir_startino;
> -	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;
> -	     agino += mp->m_blocks_per_cluster * mp->m_sb.sb_inopblock) {
> -		error = xchk_iallocbt_check_cluster(bs, irec, agino);
> +	/*
> +	 * For the common case where this inobt record maps to multiple inode
> +	 * clusters this will call _check_cluster for each cluster.
> +	 *
> +	 * For the case that multiple inobt records map to a single cluster,
> +	 * this will call _check_cluster once.
> +	 */
> +	for (cluster_base = 0;
> +	     cluster_base < XFS_INODES_PER_CHUNK;
> +	     cluster_base += bs->sc->mp->m_inodes_per_cluster) {
> +		error = xchk_iallocbt_check_cluster(bs, irec, cluster_base);
>  		if (error)
>  			break;
>  	}
> @@ -417,7 +467,7 @@ xchk_iallocbt_rec(
>  		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> 
>  check_freemask:

May be the above label should be renamed to "check_clusters".

> -	error = xchk_iallocbt_check_freemask(bs, &irec);
> +	error = xchk_iallocbt_check_clusters(bs, &irec);
>  	if (error)
>  		goto out;
> 

-- 
chandan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/9] xfs-5.0: inode scrubber fixes
  2018-11-28 23:26 [PATCH 0/9] xfs-5.0: inode scrubber fixes Darrick J. Wong
                   ` (8 preceding siblings ...)
  2018-11-28 23:27 ` [PATCH 9/9] xfs: scrub should flag dir/attr offsets that aren't mappable with xfs_dablk_t Darrick J. Wong
@ 2018-12-19  8:31 ` Chandan Rajendra
  2018-12-19 13:15   ` Chandan Rajendra
  9 siblings, 1 reply; 18+ messages in thread
From: Chandan Rajendra @ 2018-12-19  8:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thursday, November 29, 2018 4:56:58 AM IST Darrick J. Wong wrote:
> Hi all,
> 
> Here are fixes for some problems with the inode btree scrub code, namely
> that the existing code does not handle the case where a single inode
> cluster is mapped by multiple inobt records.
> 
> Patch 1 teaches the inode record block counting code to handle the case
> where there's more than one inobt record per inode cluster.  We do this
> by counting inodes and converting to blocks only at the end.
> 
> Patch 2 corrects a condition where we needed to clamp the number of
> inodes checked for a given inobt record to the inode chunk size.
> 
> Patches 3-4 move the inobt record alignment checks to a separate
> function and enhance the function to check that when we have more than
> one inobt record per cluster we actually check that *all* of the
> necessary records are present and in the correct order.
> 
> Patches 5-7 reorganize the inobt free data checks to deal with the
> "multiple inobt records per icluster" situation.  In restructuring the
> code to do so, we also rename variables and functions to be less
> confusing about what they're there for.  We also fix the 'is the inode
> free?' check to calculate dinode buffer offsets correctly in the
> "multiple inobt records per icluster" situation.
> 
> Patch 8 aborts the xattr scrub loop if there are pending fatal signals.
> 
> Patch 9 checks that for any directory or attr fork there are no extent
> maps that stretch beyond what a xfs_dablk_t can map.
> 
> If you're going to start using this mess, you probably ought to just
> pull from my git trees.  The kernel patches[1] should apply against
> 4.20-rc4.
> 
> Comments and questions are, as always, welcome.

Hi Darrick,

I reviewed patches 1 through 7. The fixes look good.

I am not well versed with the XFS code that deals with xattrs &
directories. Hence I could not review patches 8 and 9.

I did a simple scrub test on a 64k blocksized filesystem running a kernel
having the following commit as the HEAD,

commit 6c4e1579b332e52566c11416e0dd0fa91a3b8f70 (HEAD -> djwong-devel, djwong-xfs-linux/djwong-devel)
Author: Darrick J. Wong <darrick.wong@oracle.com>
Date:   Thu Oct 18 17:35:49 2018 -0700

    xfs: repair quotas
    
    Fix anything that causes the quota verifiers to fail.
    
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


Xfsprogs had the following as the topmost commit,

commit 633eec2a893c3be9796dad188144b00e084560ec (HEAD -> djwong-devel, djwong/djwong-devel)
Author: Darrick J. Wong <darrick.wong@oracle.com>
Date:   Tue Nov 13 17:38:31 2018 -0800

    xfs: repair extended attributes
    
    If the extended attributes look bad, try to sift through the rubble to
    find whatever keys/values we can, zap the attr tree, and re-add the
    values.
    
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


The test filesystem had 2000 files with each being 64k in size.

root@ubuntu:~# xfs_scrub -d -n -v /mnt/
EXPERIMENTAL xfs_scrub program in use! Use at your own risk!
Phase 1: Find filesystem geometry.
/mnt/: using 8 threads to scrub.
Phase 2: Check internal metadata.
Info: AG 1 superblock: Optimization is possible. (scrub.c line 269)
Info: AG 3 superblock: Optimization is possible. (scrub.c line 269)
Info: AG 2 superblock: Optimization is possible. (scrub.c line 269)
Error: AG 0 free inode btree: Repairs are required. (scrub.c line 253)
Phase 3: Scan all inodes.
Phase 5: Check directory tree.
Info: /mnt/: Filesystem has errors, skipping connectivity checks. (phase5.c line 295)
Phase 7: Check summary counters.
163.9MiB data used;  1.9K inodes used.
152.7MiB data found; 1.9K inodes found.
1.9K inodes counted; 1.9K inodes checked.
/mnt/: errors found: 1
/mnt/: Re-run xfs_scrub without -n.

Looks like we have a bug when scrubbing the Free inode btree. I will work on
figuring out the root cause and also plan to execute scrub tests shipped with
xfstests.

-- 
chandan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/9] xfs-5.0: inode scrubber fixes
  2018-12-19  8:31 ` [PATCH 0/9] xfs-5.0: inode scrubber fixes Chandan Rajendra
@ 2018-12-19 13:15   ` Chandan Rajendra
  2018-12-19 20:33     ` Darrick J. Wong
  0 siblings, 1 reply; 18+ messages in thread
From: Chandan Rajendra @ 2018-12-19 13:15 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wednesday, December 19, 2018 2:01:13 PM IST Chandan Rajendra wrote:
> On Thursday, November 29, 2018 4:56:58 AM IST Darrick J. Wong wrote:
> > Hi all,
> > 
> > Here are fixes for some problems with the inode btree scrub code, namely
> > that the existing code does not handle the case where a single inode
> > cluster is mapped by multiple inobt records.
> > 
> > Patch 1 teaches the inode record block counting code to handle the case
> > where there's more than one inobt record per inode cluster.  We do this
> > by counting inodes and converting to blocks only at the end.
> > 
> > Patch 2 corrects a condition where we needed to clamp the number of
> > inodes checked for a given inobt record to the inode chunk size.
> > 
> > Patches 3-4 move the inobt record alignment checks to a separate
> > function and enhance the function to check that when we have more than
> > one inobt record per cluster we actually check that *all* of the
> > necessary records are present and in the correct order.
> > 
> > Patches 5-7 reorganize the inobt free data checks to deal with the
> > "multiple inobt records per icluster" situation.  In restructuring the
> > code to do so, we also rename variables and functions to be less
> > confusing about what they're there for.  We also fix the 'is the inode
> > free?' check to calculate dinode buffer offsets correctly in the
> > "multiple inobt records per icluster" situation.
> > 
> > Patch 8 aborts the xattr scrub loop if there are pending fatal signals.
> > 
> > Patch 9 checks that for any directory or attr fork there are no extent
> > maps that stretch beyond what a xfs_dablk_t can map.
> > 
> > If you're going to start using this mess, you probably ought to just
> > pull from my git trees.  The kernel patches[1] should apply against
> > 4.20-rc4.
> > 
> > Comments and questions are, as always, welcome.
> 
> Hi Darrick,
> 
> I reviewed patches 1 through 7. The fixes look good.
> 
> I am not well versed with the XFS code that deals with xattrs &
> directories. Hence I could not review patches 8 and 9.
> 
> I did a simple scrub test on a 64k blocksized filesystem running a kernel
> having the following commit as the HEAD,
> 
> commit 6c4e1579b332e52566c11416e0dd0fa91a3b8f70 (HEAD -> djwong-devel, djwong-xfs-linux/djwong-devel)
> Author: Darrick J. Wong <darrick.wong@oracle.com>
> Date:   Thu Oct 18 17:35:49 2018 -0700
> 
>     xfs: repair quotas
>     
>     Fix anything that causes the quota verifiers to fail.
>     
>     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> 
> Xfsprogs had the following as the topmost commit,
> 
> commit 633eec2a893c3be9796dad188144b00e084560ec (HEAD -> djwong-devel, djwong/djwong-devel)
> Author: Darrick J. Wong <darrick.wong@oracle.com>
> Date:   Tue Nov 13 17:38:31 2018 -0800
> 
>     xfs: repair extended attributes
>     
>     If the extended attributes look bad, try to sift through the rubble to
>     find whatever keys/values we can, zap the attr tree, and re-add the
>     values.
>     
>     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> 
> The test filesystem had 2000 files with each being 64k in size.
> 
> root@ubuntu:~# xfs_scrub -d -n -v /mnt/
> EXPERIMENTAL xfs_scrub program in use! Use at your own risk!
> Phase 1: Find filesystem geometry.
> /mnt/: using 8 threads to scrub.
> Phase 2: Check internal metadata.
> Info: AG 1 superblock: Optimization is possible. (scrub.c line 269)
> Info: AG 3 superblock: Optimization is possible. (scrub.c line 269)
> Info: AG 2 superblock: Optimization is possible. (scrub.c line 269)
> Error: AG 0 free inode btree: Repairs are required. (scrub.c line 253)
> Phase 3: Scan all inodes.
> Phase 5: Check directory tree.
> Info: /mnt/: Filesystem has errors, skipping connectivity checks. (phase5.c line 295)
> Phase 7: Check summary counters.
> 163.9MiB data used;  1.9K inodes used.
> 152.7MiB data found; 1.9K inodes found.
> 1.9K inodes counted; 1.9K inodes checked.
> /mnt/: errors found: 1
> /mnt/: Re-run xfs_scrub without -n.
> 
> Looks like we have a bug when scrubbing the Free inode btree. I will work on
> figuring out the root cause and also plan to execute scrub tests shipped with
> xfstests.
> 
> 

We do have an unaligned finobt record,

xfs_scrub 27247 [004] 18014.774762: probe:xchk_iallocbt_rec_alignment: (c00000000075941c) ir_startino=176960 ig_cluster_align_inodes=128 imask_u32=127
        c00000000075941c xchk_iallocbt_rec_alignment+0x10c (/boot/vmlinux)
        c00000000075966c xchk_iallocbt_rec+0x14c (/boot/vmlinux)
                       0 [unknown] ([unknown])
        c000000000752968 xchk_btree+0x288 (/boot/vmlinux)
        c0000000007598c4 xchk_iallocbt+0x64 (/boot/vmlinux)
        c00000000075dc1c xfs_scrub_metadata+0x4ac (/boot/vmlinux)
        c0000000006e4b88 xfs_ioc_scrub_metadata+0x68 (/boot/vmlinux)
        c0000000006e7d5c xfs_file_ioctl+0xbbc (/boot/vmlinux)
        c000000000425e04 do_vfs_ioctl+0xd4 (/boot/vmlinux)
        c000000000426874 ksys_ioctl+0x64 (/boot/vmlinux)
        c000000000426928 __se_sys_ioctl+0x28 (/boot/vmlinux)
        c00000000000bae4 system_call+0x5c (/boot/vmlinux)
            7ffff77d7694 __GI___ioctl+0x114 (/lib/powerpc64le-linux-gnu/libc-2.27.so)
               10000e4e0 xfs_check_metadata+0x100 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
               10000e484 xfs_check_metadata+0xa4 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
               10000ec48 xfs_scrub_metadata+0xd8 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
               100008dac xfs_scan_ag_metadata+0x12c (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
               1000198c8 workqueue_thread+0x108 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
            7ffff7f2885c start_thread+0x10c (/lib/powerpc64le-linux-gnu/libpthread-2.27.so)
            7ffff77e9028 __clone+0x98 (/lib/powerpc64le-linux-gnu/libc-2.27.so)

Maybe this is due to a bug in the finobt code. I will continue debugging and
report my findings.

-- 
chandan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/9] xfs-5.0: inode scrubber fixes
  2018-12-19 13:15   ` Chandan Rajendra
@ 2018-12-19 20:33     ` Darrick J. Wong
  2018-12-31 11:39       ` Chandan Rajendra
  0 siblings, 1 reply; 18+ messages in thread
From: Darrick J. Wong @ 2018-12-19 20:33 UTC (permalink / raw)
  To: Chandan Rajendra; +Cc: linux-xfs

On Wed, Dec 19, 2018 at 06:45:26PM +0530, Chandan Rajendra wrote:
> On Wednesday, December 19, 2018 2:01:13 PM IST Chandan Rajendra wrote:
> > On Thursday, November 29, 2018 4:56:58 AM IST Darrick J. Wong wrote:
> > > Hi all,
> > > 
> > > Here are fixes for some problems with the inode btree scrub code, namely
> > > that the existing code does not handle the case where a single inode
> > > cluster is mapped by multiple inobt records.
> > > 
> > > Patch 1 teaches the inode record block counting code to handle the case
> > > where there's more than one inobt record per inode cluster.  We do this
> > > by counting inodes and converting to blocks only at the end.
> > > 
> > > Patch 2 corrects a condition where we needed to clamp the number of
> > > inodes checked for a given inobt record to the inode chunk size.
> > > 
> > > Patches 3-4 move the inobt record alignment checks to a separate
> > > function and enhance the function to check that when we have more than
> > > one inobt record per cluster we actually check that *all* of the
> > > necessary records are present and in the correct order.
> > > 
> > > Patches 5-7 reorganize the inobt free data checks to deal with the
> > > "multiple inobt records per icluster" situation.  In restructuring the
> > > code to do so, we also rename variables and functions to be less
> > > confusing about what they're there for.  We also fix the 'is the inode
> > > free?' check to calculate dinode buffer offsets correctly in the
> > > "multiple inobt records per icluster" situation.
> > > 
> > > Patch 8 aborts the xattr scrub loop if there are pending fatal signals.
> > > 
> > > Patch 9 checks that for any directory or attr fork there are no extent
> > > maps that stretch beyond what a xfs_dablk_t can map.
> > > 
> > > If you're going to start using this mess, you probably ought to just
> > > pull from my git trees.  The kernel patches[1] should apply against
> > > 4.20-rc4.
> > > 
> > > Comments and questions are, as always, welcome.
> > 
> > Hi Darrick,
> > 
> > I reviewed patches 1 through 7. The fixes look good.
> > 
> > I am not well versed with the XFS code that deals with xattrs &
> > directories. Hence I could not review patches 8 and 9.
> > 
> > I did a simple scrub test on a 64k blocksized filesystem running a kernel
> > having the following commit as the HEAD,
> > 
> > commit 6c4e1579b332e52566c11416e0dd0fa91a3b8f70 (HEAD -> djwong-devel, djwong-xfs-linux/djwong-devel)
> > Author: Darrick J. Wong <darrick.wong@oracle.com>
> > Date:   Thu Oct 18 17:35:49 2018 -0700
> > 
> >     xfs: repair quotas
> >     
> >     Fix anything that causes the quota verifiers to fail.
> >     
> >     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > 
> > Xfsprogs had the following as the topmost commit,
> > 
> > commit 633eec2a893c3be9796dad188144b00e084560ec (HEAD -> djwong-devel, djwong/djwong-devel)
> > Author: Darrick J. Wong <darrick.wong@oracle.com>
> > Date:   Tue Nov 13 17:38:31 2018 -0800
> > 
> >     xfs: repair extended attributes
> >     
> >     If the extended attributes look bad, try to sift through the rubble to
> >     find whatever keys/values we can, zap the attr tree, and re-add the
> >     values.
> >     
> >     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > 
> > The test filesystem had 2000 files with each being 64k in size.
> > 
> > root@ubuntu:~# xfs_scrub -d -n -v /mnt/
> > EXPERIMENTAL xfs_scrub program in use! Use at your own risk!
> > Phase 1: Find filesystem geometry.
> > /mnt/: using 8 threads to scrub.
> > Phase 2: Check internal metadata.
> > Info: AG 1 superblock: Optimization is possible. (scrub.c line 269)
> > Info: AG 3 superblock: Optimization is possible. (scrub.c line 269)
> > Info: AG 2 superblock: Optimization is possible. (scrub.c line 269)
> > Error: AG 0 free inode btree: Repairs are required. (scrub.c line 253)
> > Phase 3: Scan all inodes.
> > Phase 5: Check directory tree.
> > Info: /mnt/: Filesystem has errors, skipping connectivity checks. (phase5.c line 295)
> > Phase 7: Check summary counters.
> > 163.9MiB data used;  1.9K inodes used.
> > 152.7MiB data found; 1.9K inodes found.
> > 1.9K inodes counted; 1.9K inodes checked.
> > /mnt/: errors found: 1
> > /mnt/: Re-run xfs_scrub without -n.
> > 
> > Looks like we have a bug when scrubbing the Free inode btree. I will work on
> > figuring out the root cause and also plan to execute scrub tests shipped with
> > xfstests.
> > 
> > 
> 
> We do have an unaligned finobt record,
> 
> xfs_scrub 27247 [004] 18014.774762: probe:xchk_iallocbt_rec_alignment: (c00000000075941c) ir_startino=176960 ig_cluster_align_inodes=128 imask_u32=127
>         c00000000075941c xchk_iallocbt_rec_alignment+0x10c (/boot/vmlinux)
>         c00000000075966c xchk_iallocbt_rec+0x14c (/boot/vmlinux)
>                        0 [unknown] ([unknown])
>         c000000000752968 xchk_btree+0x288 (/boot/vmlinux)
>         c0000000007598c4 xchk_iallocbt+0x64 (/boot/vmlinux)
>         c00000000075dc1c xfs_scrub_metadata+0x4ac (/boot/vmlinux)
>         c0000000006e4b88 xfs_ioc_scrub_metadata+0x68 (/boot/vmlinux)
>         c0000000006e7d5c xfs_file_ioctl+0xbbc (/boot/vmlinux)
>         c000000000425e04 do_vfs_ioctl+0xd4 (/boot/vmlinux)
>         c000000000426874 ksys_ioctl+0x64 (/boot/vmlinux)
>         c000000000426928 __se_sys_ioctl+0x28 (/boot/vmlinux)
>         c00000000000bae4 system_call+0x5c (/boot/vmlinux)
>             7ffff77d7694 __GI___ioctl+0x114 (/lib/powerpc64le-linux-gnu/libc-2.27.so)
>                10000e4e0 xfs_check_metadata+0x100 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
>                10000e484 xfs_check_metadata+0xa4 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
>                10000ec48 xfs_scrub_metadata+0xd8 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
>                100008dac xfs_scan_ag_metadata+0x12c (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
>                1000198c8 workqueue_thread+0x108 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
>             7ffff7f2885c start_thread+0x10c (/lib/powerpc64le-linux-gnu/libpthread-2.27.so)
>             7ffff77e9028 __clone+0x98 (/lib/powerpc64le-linux-gnu/libc-2.27.so)
> 
> Maybe this is due to a bug in the finobt code. I will continue debugging and
> report my findings.

Ok.  Feel free to send along a metadump if you have one.

--D

> -- 
> chandan
> 
> 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/9] xfs-5.0: inode scrubber fixes
  2018-12-19 20:33     ` Darrick J. Wong
@ 2018-12-31 11:39       ` Chandan Rajendra
  2018-12-31 18:51         ` Darrick J. Wong
  0 siblings, 1 reply; 18+ messages in thread
From: Chandan Rajendra @ 2018-12-31 11:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Chandan Rajendra, linux-xfs

[-- Attachment #1: Type: text/plain, Size: 12220 bytes --]

On Thursday, December 20, 2018 2:03:59 AM IST Darrick J. Wong wrote:
> On Wed, Dec 19, 2018 at 06:45:26PM +0530, Chandan Rajendra wrote:
> > On Wednesday, December 19, 2018 2:01:13 PM IST Chandan Rajendra wrote:
> > > On Thursday, November 29, 2018 4:56:58 AM IST Darrick J. Wong wrote:
> > > > Hi all,
> > > > 
> > > > Here are fixes for some problems with the inode btree scrub code, namely
> > > > that the existing code does not handle the case where a single inode
> > > > cluster is mapped by multiple inobt records.
> > > > 
> > > > Patch 1 teaches the inode record block counting code to handle the case
> > > > where there's more than one inobt record per inode cluster.  We do this
> > > > by counting inodes and converting to blocks only at the end.
> > > > 
> > > > Patch 2 corrects a condition where we needed to clamp the number of
> > > > inodes checked for a given inobt record to the inode chunk size.
> > > > 
> > > > Patches 3-4 move the inobt record alignment checks to a separate
> > > > function and enhance the function to check that when we have more than
> > > > one inobt record per cluster we actually check that *all* of the
> > > > necessary records are present and in the correct order.
> > > > 
> > > > Patches 5-7 reorganize the inobt free data checks to deal with the
> > > > "multiple inobt records per icluster" situation.  In restructuring the
> > > > code to do so, we also rename variables and functions to be less
> > > > confusing about what they're there for.  We also fix the 'is the inode
> > > > free?' check to calculate dinode buffer offsets correctly in the
> > > > "multiple inobt records per icluster" situation.
> > > > 
> > > > Patch 8 aborts the xattr scrub loop if there are pending fatal signals.
> > > > 
> > > > Patch 9 checks that for any directory or attr fork there are no extent
> > > > maps that stretch beyond what a xfs_dablk_t can map.
> > > > 
> > > > If you're going to start using this mess, you probably ought to just
> > > > pull from my git trees.  The kernel patches[1] should apply against
> > > > 4.20-rc4.
> > > > 
> > > > Comments and questions are, as always, welcome.
> > > 
> > > Hi Darrick,
> > > 
> > > I reviewed patches 1 through 7. The fixes look good.
> > > 
> > > I am not well versed with the XFS code that deals with xattrs &
> > > directories. Hence I could not review patches 8 and 9.
> > > 
> > > I did a simple scrub test on a 64k blocksized filesystem running a kernel
> > > having the following commit as the HEAD,
> > > 
> > > commit 6c4e1579b332e52566c11416e0dd0fa91a3b8f70 (HEAD -> djwong-devel, djwong-xfs-linux/djwong-devel)
> > > Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > Date:   Thu Oct 18 17:35:49 2018 -0700
> > > 
> > >     xfs: repair quotas
> > >     
> > >     Fix anything that causes the quota verifiers to fail.
> > >     
> > >     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > 
> > > Xfsprogs had the following as the topmost commit,
> > > 
> > > commit 633eec2a893c3be9796dad188144b00e084560ec (HEAD -> djwong-devel, djwong/djwong-devel)
> > > Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > Date:   Tue Nov 13 17:38:31 2018 -0800
> > > 
> > >     xfs: repair extended attributes
> > >     
> > >     If the extended attributes look bad, try to sift through the rubble to
> > >     find whatever keys/values we can, zap the attr tree, and re-add the
> > >     values.
> > >     
> > >     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > 
> > > The test filesystem had 2000 files with each being 64k in size.
> > > 
> > > root@ubuntu:~# xfs_scrub -d -n -v /mnt/
> > > EXPERIMENTAL xfs_scrub program in use! Use at your own risk!
> > > Phase 1: Find filesystem geometry.
> > > /mnt/: using 8 threads to scrub.
> > > Phase 2: Check internal metadata.
> > > Info: AG 1 superblock: Optimization is possible. (scrub.c line 269)
> > > Info: AG 3 superblock: Optimization is possible. (scrub.c line 269)
> > > Info: AG 2 superblock: Optimization is possible. (scrub.c line 269)
> > > Error: AG 0 free inode btree: Repairs are required. (scrub.c line 253)
> > > Phase 3: Scan all inodes.
> > > Phase 5: Check directory tree.
> > > Info: /mnt/: Filesystem has errors, skipping connectivity checks. (phase5.c line 295)
> > > Phase 7: Check summary counters.
> > > 163.9MiB data used;  1.9K inodes used.
> > > 152.7MiB data found; 1.9K inodes found.
> > > 1.9K inodes counted; 1.9K inodes checked.
> > > /mnt/: errors found: 1
> > > /mnt/: Re-run xfs_scrub without -n.
> > > 
> > > Looks like we have a bug when scrubbing the Free inode btree. I will work on
> > > figuring out the root cause and also plan to execute scrub tests shipped with
> > > xfstests.
> > > 
> > > 
> > 
> > We do have an unaligned finobt record,
> > 
> > xfs_scrub 27247 [004] 18014.774762: probe:xchk_iallocbt_rec_alignment: (c00000000075941c) ir_startino=176960 ig_cluster_align_inodes=128 imask_u32=127
> >         c00000000075941c xchk_iallocbt_rec_alignment+0x10c (/boot/vmlinux)
> >         c00000000075966c xchk_iallocbt_rec+0x14c (/boot/vmlinux)
> >                        0 [unknown] ([unknown])
> >         c000000000752968 xchk_btree+0x288 (/boot/vmlinux)
> >         c0000000007598c4 xchk_iallocbt+0x64 (/boot/vmlinux)
> >         c00000000075dc1c xfs_scrub_metadata+0x4ac (/boot/vmlinux)
> >         c0000000006e4b88 xfs_ioc_scrub_metadata+0x68 (/boot/vmlinux)
> >         c0000000006e7d5c xfs_file_ioctl+0xbbc (/boot/vmlinux)
> >         c000000000425e04 do_vfs_ioctl+0xd4 (/boot/vmlinux)
> >         c000000000426874 ksys_ioctl+0x64 (/boot/vmlinux)
> >         c000000000426928 __se_sys_ioctl+0x28 (/boot/vmlinux)
> >         c00000000000bae4 system_call+0x5c (/boot/vmlinux)
> >             7ffff77d7694 __GI___ioctl+0x114 (/lib/powerpc64le-linux-gnu/libc-2.27.so)
> >                10000e4e0 xfs_check_metadata+0x100 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
> >                10000e484 xfs_check_metadata+0xa4 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
> >                10000ec48 xfs_scrub_metadata+0xd8 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
> >                100008dac xfs_scan_ag_metadata+0x12c (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
> >                1000198c8 workqueue_thread+0x108 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
> >             7ffff7f2885c start_thread+0x10c (/lib/powerpc64le-linux-gnu/libpthread-2.27.so)
> >             7ffff77e9028 __clone+0x98 (/lib/powerpc64le-linux-gnu/libc-2.27.so)
> > 
> > Maybe this is due to a bug in the finobt code. I will continue debugging and
> > report my findings.
> 
> Ok.  Feel free to send along a metadump if you have one.
> 


Indeed, XFS does not align finobt records according to inode cluster size.

More precisely, For big block filesystems, When freeing inodes (please refer
to xfs_difree_inobt()), XFS does not delete inobt records due to the following
code evaluating to false,

        if (!(mp->m_flags & XFS_MOUNT_IKEEP) &&
            rec.ir_free == XFS_INOBT_ALL_FREE &&
            mp->m_sb.sb_inopblock <= XFS_INODES_PER_CHUNK) {

However there is no such conditional check when deleting finobt records.
Hence I wrote the following patch,

---
 fs/xfs/libxfs/xfs_ialloc.c | 85 +++++++++++++++++++++++++++-----------
 fs/xfs/scrub/ialloc.c      |  2 +
 2 files changed, 64 insertions(+), 23 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index f32be0c85f93..8d3edd758413 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -1399,14 +1399,19 @@ xfs_dialloc_ag_finobt_near(
 	struct xfs_btree_cur		*lcur = *ocur;	/* left search cursor */
 	struct xfs_btree_cur		*rcur;	/* right search cursor */
 	struct xfs_inobt_rec_incore	rrec;
+	xfs_agino_t			agino;
 	int				error;
 	int				i, j;
 
-	error = xfs_inobt_lookup(lcur, pagino, XFS_LOOKUP_LE, &i);
-	if (error)
-		return error;
+	agino = pagino;
+	while (1) {
+		error = xfs_inobt_lookup(lcur, agino, XFS_LOOKUP_LE, &i);
+		if (error)
+			return error;
+
+		if (i != 1)
+			break;
 
-	if (i == 1) {
 		error = xfs_inobt_get_rec(lcur, rec, &i);
 		if (error)
 			return error;
@@ -1417,23 +1422,39 @@ xfs_dialloc_ag_finobt_near(
 		 * only tracks chunks with at least one free inode, so record
 		 * existence is enough.
 		 */
-		if (pagino >= rec->ir_startino &&
-		    pagino < (rec->ir_startino + XFS_INODES_PER_CHUNK))
-			return 0;
+		if (rec->ir_freecount) {
+			if (pagino >= rec->ir_startino &&
+				pagino < (rec->ir_startino + XFS_INODES_PER_CHUNK))
+				return 0;
+			else
+				break;
+		}
+
+		agino = rec->ir_startino - 1;
 	}
 
 	error = xfs_btree_dup_cursor(lcur, &rcur);
 	if (error)
 		return error;
 
-	error = xfs_inobt_lookup(rcur, pagino, XFS_LOOKUP_GE, &j);
-	if (error)
-		goto error_rcur;
-	if (j == 1) {
+	agino = pagino;
+	while (1) {
+		error = xfs_inobt_lookup(rcur, agino, XFS_LOOKUP_GE, &j);
+		if (error)
+			goto error_rcur;
+
+		if (j != 1)
+			break;
+
 		error = xfs_inobt_get_rec(rcur, &rrec, &j);
 		if (error)
 			goto error_rcur;
 		XFS_WANT_CORRUPTED_GOTO(lcur->bc_mp, j == 1, error_rcur);
+
+		if (rrec.ir_freecount)
+			break;
+
+		agino = rrec.ir_startino + XFS_INODES_PER_CHUNK;
 	}
 
 	XFS_WANT_CORRUPTED_GOTO(lcur->bc_mp, i == 1 || j == 1, error_rcur);
@@ -1477,6 +1498,7 @@ xfs_dialloc_ag_finobt_newino(
 	struct xfs_btree_cur		*cur,
 	struct xfs_inobt_rec_incore	*rec)
 {
+	xfs_agino_t agino;
 	int error;
 	int i;
 
@@ -1490,22 +1512,32 @@ xfs_dialloc_ag_finobt_newino(
 			if (error)
 				return error;
 			XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
-			return 0;
+
+			if (rec->ir_freecount)
+				return 0;
 		}
 	}
 
 	/*
 	 * Find the first inode available in the AG.
 	 */
-	error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE, &i);
-	if (error)
-		return error;
-	XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
+	agino = 0;
+	while (1) {
+		error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_GE, &i);
+		if (error)
+			return error;
+		XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
 
-	error = xfs_inobt_get_rec(cur, rec, &i);
-	if (error)
-		return error;
-	XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
+		error = xfs_inobt_get_rec(cur, rec, &i);
+		if (error)
+			return error;
+		XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
+
+		if (rec->ir_freecount)
+			break;
+
+		agino += XFS_INODES_PER_CHUNK;
+	}
 
 	return 0;
 }
@@ -1615,10 +1647,17 @@ xfs_dialloc_ag(
 	 */
 	rec.ir_free &= ~XFS_INOBT_MASK(offset);
 	rec.ir_freecount--;
-	if (rec.ir_freecount)
+	if (rec.ir_freecount) {
 		error = xfs_inobt_update(cur, &rec);
-	else
-		error = xfs_btree_delete(cur, &i);
+	} else {
+		if (!(mp->m_flags & XFS_MOUNT_IKEEP) &&
+			/* TODO: Add a condition to check hole mask */
+			mp->m_sb.sb_inopblock <= XFS_INODES_PER_CHUNK)
+			error = xfs_btree_delete(cur, &i);
+		else
+			error = xfs_inobt_update(cur, &rec);
+	}
+
 	if (error)
 		goto error_cur;
 
diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index 9269f5158cdc..838161f25db4 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -79,9 +79,11 @@ xchk_iallocbt_chunk_xref_other(
 	error = xfs_ialloc_has_inode_record(*pcur, agino, agino, &has_irec);
 	if (!xchk_should_check_xref(sc, &error, pcur))
 		return;
+#if 0
 	if (((irec->ir_freecount > 0 && !has_irec) ||
 	     (irec->ir_freecount == 0 && has_irec)))
 		xchk_btree_xref_set_corrupt(sc, *pcur, 0);
+#endif
 }
 
 /* Cross-reference with the other btrees. */
-- 

The "cross reference" check had to be disabled since the above patch now
causes finobt entries with freecount value set to be 0 in big block
filesystem instances.

The following trivial tests have been executed on a kernel compiled with the
above patch,

1. For both 4k and 64k block sized filesystems,
   1. Create 2000 files and execute xfs_scrub.
2. Repeat step 1 with sparse inode feature disabled.
3. Build linux kernel on the test filesystem.

I am planning to test it more thoroughly after I get review comments for the
code changes provided above.

I have attached xz compressed image of xfs_metadump output of the test
filesystem along with this mail.

PS: Apologies for the delayed response. It took me some time to understand the
relevant code and write the patch provided above.

-- 
chandan

[-- Attachment #2: scrub-64k.bin.xz --]
[-- Type: application/x-xz, Size: 130024 bytes --]

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/9] xfs-5.0: inode scrubber fixes
  2018-12-31 11:39       ` Chandan Rajendra
@ 2018-12-31 18:51         ` Darrick J. Wong
  2019-01-01 15:29           ` Chandan Rajendra
  0 siblings, 1 reply; 18+ messages in thread
From: Darrick J. Wong @ 2018-12-31 18:51 UTC (permalink / raw)
  To: Chandan Rajendra; +Cc: Chandan Rajendra, linux-xfs

On Mon, Dec 31, 2018 at 05:09:47PM +0530, Chandan Rajendra wrote:
> On Thursday, December 20, 2018 2:03:59 AM IST Darrick J. Wong wrote:
> > On Wed, Dec 19, 2018 at 06:45:26PM +0530, Chandan Rajendra wrote:
> > > On Wednesday, December 19, 2018 2:01:13 PM IST Chandan Rajendra wrote:
> > > > On Thursday, November 29, 2018 4:56:58 AM IST Darrick J. Wong wrote:
> > > > > Hi all,
> > > > > 
> > > > > Here are fixes for some problems with the inode btree scrub code, namely
> > > > > that the existing code does not handle the case where a single inode
> > > > > cluster is mapped by multiple inobt records.
> > > > > 
> > > > > Patch 1 teaches the inode record block counting code to handle the case
> > > > > where there's more than one inobt record per inode cluster.  We do this
> > > > > by counting inodes and converting to blocks only at the end.
> > > > > 
> > > > > Patch 2 corrects a condition where we needed to clamp the number of
> > > > > inodes checked for a given inobt record to the inode chunk size.
> > > > > 
> > > > > Patches 3-4 move the inobt record alignment checks to a separate
> > > > > function and enhance the function to check that when we have more than
> > > > > one inobt record per cluster we actually check that *all* of the
> > > > > necessary records are present and in the correct order.
> > > > > 
> > > > > Patches 5-7 reorganize the inobt free data checks to deal with the
> > > > > "multiple inobt records per icluster" situation.  In restructuring the
> > > > > code to do so, we also rename variables and functions to be less
> > > > > confusing about what they're there for.  We also fix the 'is the inode
> > > > > free?' check to calculate dinode buffer offsets correctly in the
> > > > > "multiple inobt records per icluster" situation.
> > > > > 
> > > > > Patch 8 aborts the xattr scrub loop if there are pending fatal signals.
> > > > > 
> > > > > Patch 9 checks that for any directory or attr fork there are no extent
> > > > > maps that stretch beyond what a xfs_dablk_t can map.
> > > > > 
> > > > > If you're going to start using this mess, you probably ought to just
> > > > > pull from my git trees.  The kernel patches[1] should apply against
> > > > > 4.20-rc4.
> > > > > 
> > > > > Comments and questions are, as always, welcome.
> > > > 
> > > > Hi Darrick,
> > > > 
> > > > I reviewed patches 1 through 7. The fixes look good.
> > > > 
> > > > I am not well versed with the XFS code that deals with xattrs &
> > > > directories. Hence I could not review patches 8 and 9.
> > > > 
> > > > I did a simple scrub test on a 64k blocksized filesystem running a kernel
> > > > having the following commit as the HEAD,
> > > > 
> > > > commit 6c4e1579b332e52566c11416e0dd0fa91a3b8f70 (HEAD -> djwong-devel, djwong-xfs-linux/djwong-devel)
> > > > Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > > Date:   Thu Oct 18 17:35:49 2018 -0700
> > > > 
> > > >     xfs: repair quotas
> > > >     
> > > >     Fix anything that causes the quota verifiers to fail.
> > > >     
> > > >     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > 
> > > > Xfsprogs had the following as the topmost commit,
> > > > 
> > > > commit 633eec2a893c3be9796dad188144b00e084560ec (HEAD -> djwong-devel, djwong/djwong-devel)
> > > > Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > > Date:   Tue Nov 13 17:38:31 2018 -0800
> > > > 
> > > >     xfs: repair extended attributes
> > > >     
> > > >     If the extended attributes look bad, try to sift through the rubble to
> > > >     find whatever keys/values we can, zap the attr tree, and re-add the
> > > >     values.
> > > >     
> > > >     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > 
> > > > The test filesystem had 2000 files with each being 64k in size.
> > > > 
> > > > root@ubuntu:~# xfs_scrub -d -n -v /mnt/
> > > > EXPERIMENTAL xfs_scrub program in use! Use at your own risk!
> > > > Phase 1: Find filesystem geometry.
> > > > /mnt/: using 8 threads to scrub.
> > > > Phase 2: Check internal metadata.
> > > > Info: AG 1 superblock: Optimization is possible. (scrub.c line 269)
> > > > Info: AG 3 superblock: Optimization is possible. (scrub.c line 269)
> > > > Info: AG 2 superblock: Optimization is possible. (scrub.c line 269)
> > > > Error: AG 0 free inode btree: Repairs are required. (scrub.c line 253)
> > > > Phase 3: Scan all inodes.
> > > > Phase 5: Check directory tree.
> > > > Info: /mnt/: Filesystem has errors, skipping connectivity checks. (phase5.c line 295)
> > > > Phase 7: Check summary counters.
> > > > 163.9MiB data used;  1.9K inodes used.
> > > > 152.7MiB data found; 1.9K inodes found.
> > > > 1.9K inodes counted; 1.9K inodes checked.
> > > > /mnt/: errors found: 1
> > > > /mnt/: Re-run xfs_scrub without -n.
> > > > 
> > > > Looks like we have a bug when scrubbing the Free inode btree. I will work on
> > > > figuring out the root cause and also plan to execute scrub tests shipped with
> > > > xfstests.
> > > > 
> > > > 
> > > 
> > > We do have an unaligned finobt record,
> > > 
> > > xfs_scrub 27247 [004] 18014.774762: probe:xchk_iallocbt_rec_alignment: (c00000000075941c) ir_startino=176960 ig_cluster_align_inodes=128 imask_u32=127
> > >         c00000000075941c xchk_iallocbt_rec_alignment+0x10c (/boot/vmlinux)
> > >         c00000000075966c xchk_iallocbt_rec+0x14c (/boot/vmlinux)
> > >                        0 [unknown] ([unknown])
> > >         c000000000752968 xchk_btree+0x288 (/boot/vmlinux)
> > >         c0000000007598c4 xchk_iallocbt+0x64 (/boot/vmlinux)
> > >         c00000000075dc1c xfs_scrub_metadata+0x4ac (/boot/vmlinux)
> > >         c0000000006e4b88 xfs_ioc_scrub_metadata+0x68 (/boot/vmlinux)
> > >         c0000000006e7d5c xfs_file_ioctl+0xbbc (/boot/vmlinux)
> > >         c000000000425e04 do_vfs_ioctl+0xd4 (/boot/vmlinux)
> > >         c000000000426874 ksys_ioctl+0x64 (/boot/vmlinux)
> > >         c000000000426928 __se_sys_ioctl+0x28 (/boot/vmlinux)
> > >         c00000000000bae4 system_call+0x5c (/boot/vmlinux)
> > >             7ffff77d7694 __GI___ioctl+0x114 (/lib/powerpc64le-linux-gnu/libc-2.27.so)
> > >                10000e4e0 xfs_check_metadata+0x100 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
> > >                10000e484 xfs_check_metadata+0xa4 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
> > >                10000ec48 xfs_scrub_metadata+0xd8 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
> > >                100008dac xfs_scan_ag_metadata+0x12c (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
> > >                1000198c8 workqueue_thread+0x108 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
> > >             7ffff7f2885c start_thread+0x10c (/lib/powerpc64le-linux-gnu/libpthread-2.27.so)
> > >             7ffff77e9028 __clone+0x98 (/lib/powerpc64le-linux-gnu/libc-2.27.so)
> > > 
> > > Maybe this is due to a bug in the finobt code. I will continue debugging and
> > > report my findings.
> > 
> > Ok.  Feel free to send along a metadump if you have one.
> > 
> 
> 
> Indeed, XFS does not align finobt records according to inode cluster size.
> 
> More precisely, For big block filesystems, When freeing inodes (please refer
> to xfs_difree_inobt()), XFS does not delete inobt records due to the following
> code evaluating to false,
> 
>         if (!(mp->m_flags & XFS_MOUNT_IKEEP) &&
>             rec.ir_free == XFS_INOBT_ALL_FREE &&
>             mp->m_sb.sb_inopblock <= XFS_INODES_PER_CHUNK) {
> 
> However there is no such conditional check when deleting finobt records.
> Hence I wrote the following patch,
> 
> ---
>  fs/xfs/libxfs/xfs_ialloc.c | 85 +++++++++++++++++++++++++++-----------
>  fs/xfs/scrub/ialloc.c      |  2 +
>  2 files changed, 64 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index f32be0c85f93..8d3edd758413 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -1399,14 +1399,19 @@ xfs_dialloc_ag_finobt_near(
>  	struct xfs_btree_cur		*lcur = *ocur;	/* left search cursor */
>  	struct xfs_btree_cur		*rcur;	/* right search cursor */
>  	struct xfs_inobt_rec_incore	rrec;
> +	xfs_agino_t			agino;
>  	int				error;
>  	int				i, j;
>  
> -	error = xfs_inobt_lookup(lcur, pagino, XFS_LOOKUP_LE, &i);
> -	if (error)
> -		return error;
> +	agino = pagino;
> +	while (1) {
> +		error = xfs_inobt_lookup(lcur, agino, XFS_LOOKUP_LE, &i);
> +		if (error)
> +			return error;
> +
> +		if (i != 1)
> +			break;
>  
> -	if (i == 1) {
>  		error = xfs_inobt_get_rec(lcur, rec, &i);
>  		if (error)
>  			return error;
> @@ -1417,23 +1422,39 @@ xfs_dialloc_ag_finobt_near(
>  		 * only tracks chunks with at least one free inode, so record
>  		 * existence is enough.
>  		 */
> -		if (pagino >= rec->ir_startino &&
> -		    pagino < (rec->ir_startino + XFS_INODES_PER_CHUNK))
> -			return 0;
> +		if (rec->ir_freecount) {
> +			if (pagino >= rec->ir_startino &&
> +				pagino < (rec->ir_startino + XFS_INODES_PER_CHUNK))
> +				return 0;
> +			else
> +				break;
> +		}
> +
> +		agino = rec->ir_startino - 1;
>  	}
>  
>  	error = xfs_btree_dup_cursor(lcur, &rcur);
>  	if (error)
>  		return error;
>  
> -	error = xfs_inobt_lookup(rcur, pagino, XFS_LOOKUP_GE, &j);
> -	if (error)
> -		goto error_rcur;
> -	if (j == 1) {
> +	agino = pagino;
> +	while (1) {
> +		error = xfs_inobt_lookup(rcur, agino, XFS_LOOKUP_GE, &j);
> +		if (error)
> +			goto error_rcur;
> +
> +		if (j != 1)
> +			break;
> +
>  		error = xfs_inobt_get_rec(rcur, &rrec, &j);
>  		if (error)
>  			goto error_rcur;
>  		XFS_WANT_CORRUPTED_GOTO(lcur->bc_mp, j == 1, error_rcur);
> +
> +		if (rrec.ir_freecount)
> +			break;
> +
> +		agino = rrec.ir_startino + XFS_INODES_PER_CHUNK;
>  	}
>  
>  	XFS_WANT_CORRUPTED_GOTO(lcur->bc_mp, i == 1 || j == 1, error_rcur);
> @@ -1477,6 +1498,7 @@ xfs_dialloc_ag_finobt_newino(
>  	struct xfs_btree_cur		*cur,
>  	struct xfs_inobt_rec_incore	*rec)
>  {
> +	xfs_agino_t agino;
>  	int error;
>  	int i;
>  
> @@ -1490,22 +1512,32 @@ xfs_dialloc_ag_finobt_newino(
>  			if (error)
>  				return error;
>  			XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
> -			return 0;
> +
> +			if (rec->ir_freecount)
> +				return 0;
>  		}
>  	}
>  
>  	/*
>  	 * Find the first inode available in the AG.
>  	 */
> -	error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE, &i);
> -	if (error)
> -		return error;
> -	XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
> +	agino = 0;
> +	while (1) {
> +		error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_GE, &i);
> +		if (error)
> +			return error;
> +		XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
>  
> -	error = xfs_inobt_get_rec(cur, rec, &i);
> -	if (error)
> -		return error;
> -	XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
> +		error = xfs_inobt_get_rec(cur, rec, &i);
> +		if (error)
> +			return error;
> +		XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
> +
> +		if (rec->ir_freecount)
> +			break;
> +
> +		agino += XFS_INODES_PER_CHUNK;
> +	}
>  
>  	return 0;
>  }
> @@ -1615,10 +1647,17 @@ xfs_dialloc_ag(
>  	 */
>  	rec.ir_free &= ~XFS_INOBT_MASK(offset);
>  	rec.ir_freecount--;
> -	if (rec.ir_freecount)
> +	if (rec.ir_freecount) {
>  		error = xfs_inobt_update(cur, &rec);
> -	else
> -		error = xfs_btree_delete(cur, &i);
> +	} else {
> +		if (!(mp->m_flags & XFS_MOUNT_IKEEP) &&
> +			/* TODO: Add a condition to check hole mask */
> +			mp->m_sb.sb_inopblock <= XFS_INODES_PER_CHUNK)
> +			error = xfs_btree_delete(cur, &i);
> +		else
> +			error = xfs_inobt_update(cur, &rec);
> +	}
> +
>  	if (error)
>  		goto error_cur;
>  
> diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> index 9269f5158cdc..838161f25db4 100644
> --- a/fs/xfs/scrub/ialloc.c
> +++ b/fs/xfs/scrub/ialloc.c
> @@ -79,9 +79,11 @@ xchk_iallocbt_chunk_xref_other(
>  	error = xfs_ialloc_has_inode_record(*pcur, agino, agino, &has_irec);
>  	if (!xchk_should_check_xref(sc, &error, pcur))
>  		return;
> +#if 0
>  	if (((irec->ir_freecount > 0 && !has_irec) ||
>  	     (irec->ir_freecount == 0 && has_irec)))
>  		xchk_btree_xref_set_corrupt(sc, *pcur, 0);
> +#endif
>  }
>  
>  /* Cross-reference with the other btrees. */
> -- 
> 
> The "cross reference" check had to be disabled since the above patch now
> causes finobt entries with freecount value set to be 0 in big block
> filesystem instances.

I don't think this is a good idea, because this now breaks the notion
that the free inode btree contains only records for where there actually
/are/ free inodes.  AFAICT, the entire point of the finobt is to be a
condensed version of the inobt that only points to the free inodes, to
save time while creating files.  If we start writing ir_freecount == 0
finobt records now, what will older kernels think of this?

Furthermore, for these 'big block' filesystems we've already been
writing out individual finobt records that aren't aligned to anything
more than the cluster size.  These filesystems are already out in the
wild, so it is scrub that must adapt to this circumstance.

I think it's better to change xchk_iallocbt_rec_alignment to have a
different check at the top when we're checking finobt records:

/*
 * finobt records have different positioning requirements than inobt
 * records: each finobt record must have a corresponding inobt record.
 * That is checked in the xref function, so for now we only catch the
 * obvious case where the record isn't even chunk-aligned.
 *
 * Note also that if a fs block contains more than a single chunk of
 * inodes, we will have finobt records only for those chunks containing
 * free inodes.
 */
if (bs->cur->bc_btnum == XFS_BTNUM_FINO) {
	imask = XFS_INODES_PER_CHUNK - 1;
	if (irec->ir_startino & imask)
		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
	return;
}

> The following trivial tests have been executed on a kernel compiled with the
> above patch,
> 
> 1. For both 4k and 64k block sized filesystems,
>    1. Create 2000 files and execute xfs_scrub.
> 2. Repeat step 1 with sparse inode feature disabled.
> 3. Build linux kernel on the test filesystem.
> 
> I am planning to test it more thoroughly after I get review comments for the
> code changes provided above.
> 
> I have attached xz compressed image of xfs_metadump output of the test
> filesystem along with this mail.
> 
> PS: Apologies for the delayed response. It took me some time to understand the
> relevant code and write the patch provided above.

No worries, most of us have been on vacation for a week+ anyway.

--D

> -- 
> chandan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/9] xfs-5.0: inode scrubber fixes
  2018-12-31 18:51         ` Darrick J. Wong
@ 2019-01-01 15:29           ` Chandan Rajendra
  0 siblings, 0 replies; 18+ messages in thread
From: Chandan Rajendra @ 2019-01-01 15:29 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Chandan Rajendra, linux-xfs

On Tuesday, January 1, 2019 12:21:37 AM IST Darrick J. Wong wrote:
> On Mon, Dec 31, 2018 at 05:09:47PM +0530, Chandan Rajendra wrote:
> > On Thursday, December 20, 2018 2:03:59 AM IST Darrick J. Wong wrote:
> > > On Wed, Dec 19, 2018 at 06:45:26PM +0530, Chandan Rajendra wrote:
> > > > On Wednesday, December 19, 2018 2:01:13 PM IST Chandan Rajendra wrote:
> > > > > On Thursday, November 29, 2018 4:56:58 AM IST Darrick J. Wong wrote:
> > > > > > Hi all,
> > > > > > 
> > > > > > Here are fixes for some problems with the inode btree scrub code, namely
> > > > > > that the existing code does not handle the case where a single inode
> > > > > > cluster is mapped by multiple inobt records.
> > > > > > 
> > > > > > Patch 1 teaches the inode record block counting code to handle the case
> > > > > > where there's more than one inobt record per inode cluster.  We do this
> > > > > > by counting inodes and converting to blocks only at the end.
> > > > > > 
> > > > > > Patch 2 corrects a condition where we needed to clamp the number of
> > > > > > inodes checked for a given inobt record to the inode chunk size.
> > > > > > 
> > > > > > Patches 3-4 move the inobt record alignment checks to a separate
> > > > > > function and enhance the function to check that when we have more than
> > > > > > one inobt record per cluster we actually check that *all* of the
> > > > > > necessary records are present and in the correct order.
> > > > > > 
> > > > > > Patches 5-7 reorganize the inobt free data checks to deal with the
> > > > > > "multiple inobt records per icluster" situation.  In restructuring the
> > > > > > code to do so, we also rename variables and functions to be less
> > > > > > confusing about what they're there for.  We also fix the 'is the inode
> > > > > > free?' check to calculate dinode buffer offsets correctly in the
> > > > > > "multiple inobt records per icluster" situation.
> > > > > > 
> > > > > > Patch 8 aborts the xattr scrub loop if there are pending fatal signals.
> > > > > > 
> > > > > > Patch 9 checks that for any directory or attr fork there are no extent
> > > > > > maps that stretch beyond what a xfs_dablk_t can map.
> > > > > > 
> > > > > > If you're going to start using this mess, you probably ought to just
> > > > > > pull from my git trees.  The kernel patches[1] should apply against
> > > > > > 4.20-rc4.
> > > > > > 
> > > > > > Comments and questions are, as always, welcome.
> > > > > 
> > > > > Hi Darrick,
> > > > > 
> > > > > I reviewed patches 1 through 7. The fixes look good.
> > > > > 
> > > > > I am not well versed with the XFS code that deals with xattrs &
> > > > > directories. Hence I could not review patches 8 and 9.
> > > > > 
> > > > > I did a simple scrub test on a 64k blocksized filesystem running a kernel
> > > > > having the following commit as the HEAD,
> > > > > 
> > > > > commit 6c4e1579b332e52566c11416e0dd0fa91a3b8f70 (HEAD -> djwong-devel, djwong-xfs-linux/djwong-devel)
> > > > > Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > Date:   Thu Oct 18 17:35:49 2018 -0700
> > > > > 
> > > > >     xfs: repair quotas
> > > > >     
> > > > >     Fix anything that causes the quota verifiers to fail.
> > > > >     
> > > > >     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > 
> > > > > 
> > > > > Xfsprogs had the following as the topmost commit,
> > > > > 
> > > > > commit 633eec2a893c3be9796dad188144b00e084560ec (HEAD -> djwong-devel, djwong/djwong-devel)
> > > > > Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > Date:   Tue Nov 13 17:38:31 2018 -0800
> > > > > 
> > > > >     xfs: repair extended attributes
> > > > >     
> > > > >     If the extended attributes look bad, try to sift through the rubble to
> > > > >     find whatever keys/values we can, zap the attr tree, and re-add the
> > > > >     values.
> > > > >     
> > > > >     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > 
> > > > > 
> > > > > The test filesystem had 2000 files with each being 64k in size.
> > > > > 
> > > > > root@ubuntu:~# xfs_scrub -d -n -v /mnt/
> > > > > EXPERIMENTAL xfs_scrub program in use! Use at your own risk!
> > > > > Phase 1: Find filesystem geometry.
> > > > > /mnt/: using 8 threads to scrub.
> > > > > Phase 2: Check internal metadata.
> > > > > Info: AG 1 superblock: Optimization is possible. (scrub.c line 269)
> > > > > Info: AG 3 superblock: Optimization is possible. (scrub.c line 269)
> > > > > Info: AG 2 superblock: Optimization is possible. (scrub.c line 269)
> > > > > Error: AG 0 free inode btree: Repairs are required. (scrub.c line 253)
> > > > > Phase 3: Scan all inodes.
> > > > > Phase 5: Check directory tree.
> > > > > Info: /mnt/: Filesystem has errors, skipping connectivity checks. (phase5.c line 295)
> > > > > Phase 7: Check summary counters.
> > > > > 163.9MiB data used;  1.9K inodes used.
> > > > > 152.7MiB data found; 1.9K inodes found.
> > > > > 1.9K inodes counted; 1.9K inodes checked.
> > > > > /mnt/: errors found: 1
> > > > > /mnt/: Re-run xfs_scrub without -n.
> > > > > 
> > > > > Looks like we have a bug when scrubbing the Free inode btree. I will work on
> > > > > figuring out the root cause and also plan to execute scrub tests shipped with
> > > > > xfstests.
> > > > > 
> > > > > 
> > > > 
> > > > We do have an unaligned finobt record,
> > > > 
> > > > xfs_scrub 27247 [004] 18014.774762: probe:xchk_iallocbt_rec_alignment: (c00000000075941c) ir_startino=176960 ig_cluster_align_inodes=128 imask_u32=127
> > > >         c00000000075941c xchk_iallocbt_rec_alignment+0x10c (/boot/vmlinux)
> > > >         c00000000075966c xchk_iallocbt_rec+0x14c (/boot/vmlinux)
> > > >                        0 [unknown] ([unknown])
> > > >         c000000000752968 xchk_btree+0x288 (/boot/vmlinux)
> > > >         c0000000007598c4 xchk_iallocbt+0x64 (/boot/vmlinux)
> > > >         c00000000075dc1c xfs_scrub_metadata+0x4ac (/boot/vmlinux)
> > > >         c0000000006e4b88 xfs_ioc_scrub_metadata+0x68 (/boot/vmlinux)
> > > >         c0000000006e7d5c xfs_file_ioctl+0xbbc (/boot/vmlinux)
> > > >         c000000000425e04 do_vfs_ioctl+0xd4 (/boot/vmlinux)
> > > >         c000000000426874 ksys_ioctl+0x64 (/boot/vmlinux)
> > > >         c000000000426928 __se_sys_ioctl+0x28 (/boot/vmlinux)
> > > >         c00000000000bae4 system_call+0x5c (/boot/vmlinux)
> > > >             7ffff77d7694 __GI___ioctl+0x114 (/lib/powerpc64le-linux-gnu/libc-2.27.so)
> > > >                10000e4e0 xfs_check_metadata+0x100 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
> > > >                10000e484 xfs_check_metadata+0xa4 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
> > > >                10000ec48 xfs_scrub_metadata+0xd8 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
> > > >                100008dac xfs_scan_ag_metadata+0x12c (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
> > > >                1000198c8 workqueue_thread+0x108 (/root/repos/xfsprogs-dev/scrub/xfs_scrub)
> > > >             7ffff7f2885c start_thread+0x10c (/lib/powerpc64le-linux-gnu/libpthread-2.27.so)
> > > >             7ffff77e9028 __clone+0x98 (/lib/powerpc64le-linux-gnu/libc-2.27.so)
> > > > 
> > > > Maybe this is due to a bug in the finobt code. I will continue debugging and
> > > > report my findings.
> > > 
> > > Ok.  Feel free to send along a metadump if you have one.
> > > 
> > 
> > 
> > Indeed, XFS does not align finobt records according to inode cluster size.
> > 
> > More precisely, For big block filesystems, When freeing inodes (please refer
> > to xfs_difree_inobt()), XFS does not delete inobt records due to the following
> > code evaluating to false,
> > 
> >         if (!(mp->m_flags & XFS_MOUNT_IKEEP) &&
> >             rec.ir_free == XFS_INOBT_ALL_FREE &&
> >             mp->m_sb.sb_inopblock <= XFS_INODES_PER_CHUNK) {
> > 
> > However there is no such conditional check when deleting finobt records.
> > Hence I wrote the following patch,
> > 
> > ---
> >  fs/xfs/libxfs/xfs_ialloc.c | 85 +++++++++++++++++++++++++++-----------
> >  fs/xfs/scrub/ialloc.c      |  2 +
> >  2 files changed, 64 insertions(+), 23 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> > index f32be0c85f93..8d3edd758413 100644
> > --- a/fs/xfs/libxfs/xfs_ialloc.c
> > +++ b/fs/xfs/libxfs/xfs_ialloc.c
> > @@ -1399,14 +1399,19 @@ xfs_dialloc_ag_finobt_near(
> >  	struct xfs_btree_cur		*lcur = *ocur;	/* left search cursor */
> >  	struct xfs_btree_cur		*rcur;	/* right search cursor */
> >  	struct xfs_inobt_rec_incore	rrec;
> > +	xfs_agino_t			agino;
> >  	int				error;
> >  	int				i, j;
> >  
> > -	error = xfs_inobt_lookup(lcur, pagino, XFS_LOOKUP_LE, &i);
> > -	if (error)
> > -		return error;
> > +	agino = pagino;
> > +	while (1) {
> > +		error = xfs_inobt_lookup(lcur, agino, XFS_LOOKUP_LE, &i);
> > +		if (error)
> > +			return error;
> > +
> > +		if (i != 1)
> > +			break;
> >  
> > -	if (i == 1) {
> >  		error = xfs_inobt_get_rec(lcur, rec, &i);
> >  		if (error)
> >  			return error;
> > @@ -1417,23 +1422,39 @@ xfs_dialloc_ag_finobt_near(
> >  		 * only tracks chunks with at least one free inode, so record
> >  		 * existence is enough.
> >  		 */
> > -		if (pagino >= rec->ir_startino &&
> > -		    pagino < (rec->ir_startino + XFS_INODES_PER_CHUNK))
> > -			return 0;
> > +		if (rec->ir_freecount) {
> > +			if (pagino >= rec->ir_startino &&
> > +				pagino < (rec->ir_startino + XFS_INODES_PER_CHUNK))
> > +				return 0;
> > +			else
> > +				break;
> > +		}
> > +
> > +		agino = rec->ir_startino - 1;
> >  	}
> >  
> >  	error = xfs_btree_dup_cursor(lcur, &rcur);
> >  	if (error)
> >  		return error;
> >  
> > -	error = xfs_inobt_lookup(rcur, pagino, XFS_LOOKUP_GE, &j);
> > -	if (error)
> > -		goto error_rcur;
> > -	if (j == 1) {
> > +	agino = pagino;
> > +	while (1) {
> > +		error = xfs_inobt_lookup(rcur, agino, XFS_LOOKUP_GE, &j);
> > +		if (error)
> > +			goto error_rcur;
> > +
> > +		if (j != 1)
> > +			break;
> > +
> >  		error = xfs_inobt_get_rec(rcur, &rrec, &j);
> >  		if (error)
> >  			goto error_rcur;
> >  		XFS_WANT_CORRUPTED_GOTO(lcur->bc_mp, j == 1, error_rcur);
> > +
> > +		if (rrec.ir_freecount)
> > +			break;
> > +
> > +		agino = rrec.ir_startino + XFS_INODES_PER_CHUNK;
> >  	}
> >  
> >  	XFS_WANT_CORRUPTED_GOTO(lcur->bc_mp, i == 1 || j == 1, error_rcur);
> > @@ -1477,6 +1498,7 @@ xfs_dialloc_ag_finobt_newino(
> >  	struct xfs_btree_cur		*cur,
> >  	struct xfs_inobt_rec_incore	*rec)
> >  {
> > +	xfs_agino_t agino;
> >  	int error;
> >  	int i;
> >  
> > @@ -1490,22 +1512,32 @@ xfs_dialloc_ag_finobt_newino(
> >  			if (error)
> >  				return error;
> >  			XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
> > -			return 0;
> > +
> > +			if (rec->ir_freecount)
> > +				return 0;
> >  		}
> >  	}
> >  
> >  	/*
> >  	 * Find the first inode available in the AG.
> >  	 */
> > -	error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE, &i);
> > -	if (error)
> > -		return error;
> > -	XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
> > +	agino = 0;
> > +	while (1) {
> > +		error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_GE, &i);
> > +		if (error)
> > +			return error;
> > +		XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
> >  
> > -	error = xfs_inobt_get_rec(cur, rec, &i);
> > -	if (error)
> > -		return error;
> > -	XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
> > +		error = xfs_inobt_get_rec(cur, rec, &i);
> > +		if (error)
> > +			return error;
> > +		XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
> > +
> > +		if (rec->ir_freecount)
> > +			break;
> > +
> > +		agino += XFS_INODES_PER_CHUNK;
> > +	}
> >  
> >  	return 0;
> >  }
> > @@ -1615,10 +1647,17 @@ xfs_dialloc_ag(
> >  	 */
> >  	rec.ir_free &= ~XFS_INOBT_MASK(offset);
> >  	rec.ir_freecount--;
> > -	if (rec.ir_freecount)
> > +	if (rec.ir_freecount) {
> >  		error = xfs_inobt_update(cur, &rec);
> > -	else
> > -		error = xfs_btree_delete(cur, &i);
> > +	} else {
> > +		if (!(mp->m_flags & XFS_MOUNT_IKEEP) &&
> > +			/* TODO: Add a condition to check hole mask */
> > +			mp->m_sb.sb_inopblock <= XFS_INODES_PER_CHUNK)
> > +			error = xfs_btree_delete(cur, &i);
> > +		else
> > +			error = xfs_inobt_update(cur, &rec);
> > +	}
> > +
> >  	if (error)
> >  		goto error_cur;
> >  
> > diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
> > index 9269f5158cdc..838161f25db4 100644
> > --- a/fs/xfs/scrub/ialloc.c
> > +++ b/fs/xfs/scrub/ialloc.c
> > @@ -79,9 +79,11 @@ xchk_iallocbt_chunk_xref_other(
> >  	error = xfs_ialloc_has_inode_record(*pcur, agino, agino, &has_irec);
> >  	if (!xchk_should_check_xref(sc, &error, pcur))
> >  		return;
> > +#if 0
> >  	if (((irec->ir_freecount > 0 && !has_irec) ||
> >  	     (irec->ir_freecount == 0 && has_irec)))
> >  		xchk_btree_xref_set_corrupt(sc, *pcur, 0);
> > +#endif
> >  }
> >  
> >  /* Cross-reference with the other btrees. */
> 
> I don't think this is a good idea, because this now breaks the notion
> that the free inode btree contains only records for where there actually
> /are/ free inodes.  AFAICT, the entire point of the finobt is to be a
> condensed version of the inobt that only points to the free inodes, to
> save time while creating files.  If we start writing ir_freecount == 0
> finobt records now, what will older kernels think of this?
> 
> Furthermore, for these 'big block' filesystems we've already been
> writing out individual finobt records that aren't aligned to anything
> more than the cluster size.  These filesystems are already out in the
> wild, so it is scrub that must adapt to this circumstance.
> 
> I think it's better to change xchk_iallocbt_rec_alignment to have a
> different check at the top when we're checking finobt records:
> 
> /*
>  * finobt records have different positioning requirements than inobt
>  * records: each finobt record must have a corresponding inobt record.
>  * That is checked in the xref function, so for now we only catch the
>  * obvious case where the record isn't even chunk-aligned.
>  *
>  * Note also that if a fs block contains more than a single chunk of
>  * inodes, we will have finobt records only for those chunks containing
>  * free inodes.
>  */
> if (bs->cur->bc_btnum == XFS_BTNUM_FINO) {
> 	imask = XFS_INODES_PER_CHUNK - 1;
> 	if (irec->ir_startino & imask)
> 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> 	return;
> }
> 

The above solution looks perfect and passes the simple "create 2000 files" and
"execute xfs_scrub" on 64k block sized filesystem. I will execute the scrub
tests that are part of fstests suite and let you know if there is a failure.

-- 
chandan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 5/9] xfs: hoist inode cluster checks out of loop
  2018-11-15  6:04 [PATCH v2 0/9] xfs-5.0: inode btree scrub fixes Darrick J. Wong
@ 2018-11-15  6:04 ` Darrick J. Wong
  0 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2018-11-15  6:04 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, Dave Chinner

From: Darrick J. Wong <darrick.wong@oracle.com>

Hoist the inode cluster checks out of the inobt record check loop into
a separate function in preparation for refactoring of that loop.  No
functional changes here; that's in the next patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/scrub/ialloc.c |  119 +++++++++++++++++++++++++++----------------------
 1 file changed, 65 insertions(+), 54 deletions(-)


diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index 5a33661e0656..475f930e0daa 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -188,19 +188,19 @@ xchk_iallocbt_check_cluster_freemask(
 	return 0;
 }
 
-/* Make sure the free mask is consistent with what the inodes think. */
+/* Check an inode cluster. */
 STATIC int
-xchk_iallocbt_check_freemask(
+xchk_iallocbt_check_cluster(
 	struct xchk_btree		*bs,
-	struct xfs_inobt_rec_incore	*irec)
+	struct xfs_inobt_rec_incore	*irec,
+	xfs_agino_t			agino)
 {
 	struct xfs_imap			imap;
 	struct xfs_mount		*mp = bs->cur->bc_mp;
 	struct xfs_dinode		*dip;
 	struct xfs_buf			*bp;
 	xfs_ino_t			fsino;
-	xfs_agino_t			nr_inodes;
-	xfs_agino_t			agino;
+	unsigned int			nr_inodes;
 	xfs_agino_t			chunkino;
 	xfs_agino_t			clusterino;
 	xfs_agblock_t			agbno;
@@ -212,60 +212,71 @@ xchk_iallocbt_check_freemask(
 	nr_inodes = min_t(unsigned int, XFS_INODES_PER_CHUNK,
 			mp->m_inodes_per_cluster);
 
-	for (agino = irec->ir_startino;
-	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;
-	     agino += mp->m_blocks_per_cluster * mp->m_sb.sb_inopblock) {
-		fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
-		chunkino = agino - irec->ir_startino;
-		agbno = XFS_AGINO_TO_AGBNO(mp, agino);
-
-		/* Compute the holemask mask for this cluster. */
-		for (clusterino = 0, holemask = 0; clusterino < nr_inodes;
-		     clusterino += XFS_INODES_PER_HOLEMASK_BIT)
-			holemask |= XFS_INOBT_MASK((chunkino + clusterino) /
-					XFS_INODES_PER_HOLEMASK_BIT);
-
-		/* The whole cluster must be a hole or not a hole. */
-		ir_holemask = (irec->ir_holemask & holemask);
-		if (ir_holemask != holemask && ir_holemask != 0) {
-			xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
-			continue;
-		}
+	fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
+	chunkino = agino - irec->ir_startino;
+	agbno = XFS_AGINO_TO_AGBNO(mp, agino);
 
-		/* If any part of this is a hole, skip it. */
-		if (ir_holemask) {
-			xchk_xref_is_not_owned_by(bs->sc, agbno,
-					mp->m_blocks_per_cluster,
-					&XFS_RMAP_OINFO_INODES);
-			continue;
-		}
+	/* Compute the holemask mask for this cluster. */
+	for (clusterino = 0, holemask = 0; clusterino < nr_inodes;
+	     clusterino += XFS_INODES_PER_HOLEMASK_BIT)
+		holemask |= XFS_INOBT_MASK((chunkino + clusterino) /
+				XFS_INODES_PER_HOLEMASK_BIT);
+
+	/* The whole cluster must be a hole or not a hole. */
+	ir_holemask = (irec->ir_holemask & holemask);
+	if (ir_holemask != holemask && ir_holemask != 0) {
+		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
+		return 0;
+	}
 
-		xchk_xref_is_owned_by(bs->sc, agbno, mp->m_blocks_per_cluster,
+	/* If any part of this is a hole, skip it. */
+	if (ir_holemask) {
+		xchk_xref_is_not_owned_by(bs->sc, agbno,
+				mp->m_blocks_per_cluster,
 				&XFS_RMAP_OINFO_INODES);
+		return 0;
+	}
 
-		/* Grab the inode cluster buffer. */
-		imap.im_blkno = XFS_AGB_TO_DADDR(mp, bs->cur->bc_private.a.agno,
-				agbno);
-		imap.im_len = XFS_FSB_TO_BB(mp, mp->m_blocks_per_cluster);
-		imap.im_boffset = 0;
-
-		error = xfs_imap_to_bp(mp, bs->cur->bc_tp, &imap,
-				&dip, &bp, 0, 0);
-		if (!xchk_btree_xref_process_error(bs->sc, bs->cur, 0,
-				&error))
-			continue;
-
-		/* Which inodes are free? */
-		for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
-			error = xchk_iallocbt_check_cluster_freemask(bs,
-					fsino, chunkino, clusterino, irec, bp);
-			if (error) {
-				xfs_trans_brelse(bs->cur->bc_tp, bp);
-				return error;
-			}
-		}
+	xchk_xref_is_owned_by(bs->sc, agbno, mp->m_blocks_per_cluster,
+			&XFS_RMAP_OINFO_INODES);
+
+	/* Grab the inode cluster buffer. */
+	imap.im_blkno = XFS_AGB_TO_DADDR(mp, bs->cur->bc_private.a.agno, agbno);
+	imap.im_len = XFS_FSB_TO_BB(mp, mp->m_blocks_per_cluster);
+	imap.im_boffset = 0;
 
-		xfs_trans_brelse(bs->cur->bc_tp, bp);
+	error = xfs_imap_to_bp(mp, bs->cur->bc_tp, &imap, &dip, &bp, 0, 0);
+	if (!xchk_btree_xref_process_error(bs->sc, bs->cur, 0, &error))
+		return 0;
+
+	/* Which inodes are free? */
+	for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
+		error = xchk_iallocbt_check_cluster_freemask(bs, fsino,
+				chunkino, clusterino, irec, bp);
+		if (error)
+			break;
+	}
+
+	xfs_trans_brelse(bs->cur->bc_tp, bp);
+	return error;
+}
+
+/* Make sure the free mask is consistent with what the inodes think. */
+STATIC int
+xchk_iallocbt_check_freemask(
+	struct xchk_btree		*bs,
+	struct xfs_inobt_rec_incore	*irec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	xfs_agino_t			agino;
+	int				error = 0;
+
+	for (agino = irec->ir_startino;
+	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;
+	     agino += mp->m_blocks_per_cluster * mp->m_sb.sb_inopblock) {
+		error = xchk_iallocbt_check_cluster(bs, irec, agino);
+		if (error)
+			break;
 	}
 
 	return error;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-01-01 15:29 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-28 23:26 [PATCH 0/9] xfs-5.0: inode scrubber fixes Darrick J. Wong
2018-11-28 23:27 ` [PATCH 1/9] xfs: count inode blocks correctly in inobt scrub Darrick J. Wong
2018-11-28 23:27 ` [PATCH 2/9] xfs: never try to scrub more than 64 inodes per inobt record Darrick J. Wong
2018-11-28 23:27 ` [PATCH 3/9] xfs: check the ir_startino alignment directly Darrick J. Wong
2018-11-28 23:27 ` [PATCH 4/9] xfs: check inobt record alignment on big block filesystems Darrick J. Wong
2018-11-28 23:27 ` [PATCH 5/9] xfs: hoist inode cluster checks out of loop Darrick J. Wong
2018-11-28 23:27 ` [PATCH 6/9] xfs: clean up the inode cluster checking in the inobt scrub Darrick J. Wong
2018-12-19  8:31   ` Chandan Rajendra
2018-11-28 23:27 ` [PATCH 7/9] xfs: scrub big block inode btrees correctly Darrick J. Wong
2018-11-28 23:27 ` [PATCH 8/9] xfs: abort xattr scrub if fatal signals are pending Darrick J. Wong
2018-11-28 23:27 ` [PATCH 9/9] xfs: scrub should flag dir/attr offsets that aren't mappable with xfs_dablk_t Darrick J. Wong
2018-12-19  8:31 ` [PATCH 0/9] xfs-5.0: inode scrubber fixes Chandan Rajendra
2018-12-19 13:15   ` Chandan Rajendra
2018-12-19 20:33     ` Darrick J. Wong
2018-12-31 11:39       ` Chandan Rajendra
2018-12-31 18:51         ` Darrick J. Wong
2019-01-01 15:29           ` Chandan Rajendra
  -- strict thread matches above, loose matches on Subject: below --
2018-11-15  6:04 [PATCH v2 0/9] xfs-5.0: inode btree scrub fixes Darrick J. Wong
2018-11-15  6:04 ` [PATCH 5/9] xfs: hoist inode cluster checks out of loop Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).