All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6 V2] xfs: more verifications!
@ 2018-06-05  6:24 Dave Chinner
  2018-06-05  6:24 ` [PATCH 1/6] xfs: catch bad stripe alignment configurations Dave Chinner
                   ` (5 more replies)
  0 siblings, 6 replies; 36+ messages in thread
From: Dave Chinner @ 2018-06-05  6:24 UTC (permalink / raw)
  To: linux-xfs

Hi folks,

Updated version of this patchset. This version adds COW extent size
hint verification and moves the corruption error conversion to
ESTALE out of xfs_imap_to_bp() and up to xfs_nfs_get_inode() where
ESTALE should be returned.

And, oh, I just realised I forgot the overflow checks in the btree
record verification - I'll send an update for that in line.

Cheers,

Dave.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/6] xfs: catch bad stripe alignment configurations
  2018-06-05  6:24 [PATCH 0/6 V2] xfs: more verifications! Dave Chinner
@ 2018-06-05  6:24 ` Dave Chinner
  2018-06-05  9:27   ` Carlos Maiolino
  2018-06-05  6:24 ` [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier Dave Chinner
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: Dave Chinner @ 2018-06-05  6:24 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When stripe alignments are invalid, data alignment algorithms in the
allocator may not work correctly. Ensure we catch superblocks with
invalid stripe alignment setups at mount time. These data alignment
mismatches are now detected at mount time like this:

XFS (loop0): SB stripe unit sanity check failed
XFS (loop0): Metadata corruption detected at xfs_sb_read_verify+0xab/0x110, xfs_sb block 0xffffffffffffffff
XFS (loop0): Unmount and run xfs_repair
XFS (loop0): First 128 bytes of corrupted metadata buffer:
0000000091c2de02: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 10 00  XFSB............
0000000023bff869: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000000cdd8c893: 17 32 37 15 ff ca 46 3d 9a 17 d3 33 04 b5 f1 a2  .27...F=...3....
000000009fd2844f: 00 00 00 00 00 00 00 04 00 00 00 00 00 00 06 d0  ................
0000000088e9b0bb: 00 00 00 00 00 00 06 d1 00 00 00 00 00 00 06 d2  ................
00000000ff233a20: 00 00 00 01 00 00 10 00 00 00 00 01 00 00 00 00  ................
000000009db0ac8b: 00 00 03 60 e1 34 02 00 08 00 00 02 00 00 00 00  ...`.4..........
00000000f7022460: 00 00 00 00 00 00 00 00 0c 09 0b 01 0c 00 00 19  ................
XFS (loop0): SB validate failed with error -117.

And the mount fails.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_sb.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 54c5e14fdfda..fe9448862667 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -266,6 +266,22 @@ xfs_mount_validate_sb(
 		return -EFSCORRUPTED;
 	}
 
+	if (sbp->sb_unit) {
+		if (!xfs_sb_version_hasdalign(sbp) ||
+		    sbp->sb_unit > sbp->sb_width ||
+		    (sbp->sb_width % sbp->sb_unit) != 0) {
+			xfs_notice(mp, "SB stripe unit sanity check failed");
+			return -EFSCORRUPTED;
+		}
+	} else if (xfs_sb_version_hasdalign(sbp)) {
+		xfs_notice(mp, "SB stripe alignment sanity check failed");
+		return -EFSCORRUPTED;
+	} else if (sbp->sb_width) {
+		xfs_notice(mp, "SB stripe width sanity check failed");
+		return -EFSCORRUPTED;
+	}
+
+
 	if (xfs_sb_version_hascrc(&mp->m_sb) &&
 	    sbp->sb_blocksize < XFS_MIN_CRC_BLOCKSIZE) {
 		xfs_notice(mp, "v5 SB sanity check failed");
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier
  2018-06-05  6:24 [PATCH 0/6 V2] xfs: more verifications! Dave Chinner
  2018-06-05  6:24 ` [PATCH 1/6] xfs: catch bad stripe alignment configurations Dave Chinner
@ 2018-06-05  6:24 ` Dave Chinner
  2018-06-05  9:53   ` Carlos Maiolino
                     ` (2 more replies)
  2018-06-05  6:24 ` [PATCH 3/6] xfs: verify COW " Dave Chinner
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 36+ messages in thread
From: Dave Chinner @ 2018-06-05  6:24 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

There are rules for vald extent size hints. We enforce them when
applications set them, but fuzzers violate those rules and that
screws us over.

This results in alignment assertion failures when setting up
allocations such as this in direct IO:

XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
....
Call Trace:
 xfs_bmap_btalloc+0x415/0x910
 xfs_bmapi_write+0x71c/0x12e0
 xfs_iomap_write_direct+0x2a9/0x420
 xfs_file_iomap_begin+0x4dc/0xa70
 iomap_apply+0x43/0x100
 iomap_file_buffered_write+0x62/0x90
 xfs_file_buffered_aio_write+0xba/0x300
 __vfs_write+0xd5/0x150
 vfs_write+0xb6/0x180
 ksys_write+0x45/0xa0
 do_syscall_64+0x5a/0x180
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

And from xfs_db:

core.extsize = 10380288

Which is not an integer multiple of the block size, and so violates
Rule #7 for setting extent size hints. Validate extent size hint
rules in the inode verifier to catch this.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_inode_buf.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index f5fff1ccb61d..be197c91307b 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -385,6 +385,7 @@ xfs_dinode_verify(
 	xfs_ino_t		ino,
 	struct xfs_dinode	*dip)
 {
+	xfs_failaddr_t		fa;
 	uint16_t		mode;
 	uint16_t		flags;
 	uint64_t		flags2;
@@ -501,6 +502,12 @@ xfs_dinode_verify(
 			return __this_address;
 	}
 
+	/* extent size hint validation */
+	fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize),
+					mode, be32_to_cpu(dip->di_flags));
+	if (fa)
+		return fa;
+
 	/* only version 3 or greater inodes are extensively verified here */
 	if (dip->di_version < 3)
 		return NULL;
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 3/6] xfs: verify COW extent size hint is valid in inode verifier
  2018-06-05  6:24 [PATCH 0/6 V2] xfs: more verifications! Dave Chinner
  2018-06-05  6:24 ` [PATCH 1/6] xfs: catch bad stripe alignment configurations Dave Chinner
  2018-06-05  6:24 ` [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier Dave Chinner
@ 2018-06-05  6:24 ` Dave Chinner
  2018-06-05 10:00   ` Carlos Maiolino
  2018-06-05 17:09   ` Darrick J. Wong
  2018-06-05  6:24 ` [PATCH 4/6] xfs: validate btree records on retreival Dave Chinner
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 36+ messages in thread
From: Dave Chinner @ 2018-06-05  6:24 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

There are rules for vald extent size hints. We enforce them when
applications set them, but fuzzers violate those rules and that
screws us over. Validate COW extent size hint rules in the inode
verifier to catch this.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_inode_buf.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index be197c91307b..ea64be7cbd98 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -504,7 +504,7 @@ xfs_dinode_verify(
 
 	/* extent size hint validation */
 	fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize),
-					mode, be32_to_cpu(dip->di_flags));
+					mode, flags);
 	if (fa)
 		return fa;
 
@@ -516,7 +516,7 @@ xfs_dinode_verify(
 
 	/* don't allow reflink/cowextsize if we don't have reflink */
 	if ((flags2 & (XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE)) &&
-            !xfs_sb_version_hasreflink(&mp->m_sb))
+	     !xfs_sb_version_hasreflink(&mp->m_sb))
 		return __this_address;
 
 	/* only regular files get reflink */
@@ -531,6 +531,12 @@ xfs_dinode_verify(
 	if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags2 & XFS_DIFLAG2_DAX))
 		return __this_address;
 
+	/* COW extent size hint validation */
+	fa = xfs_inode_validate_cowextsize(mp, be32_to_cpu(dip->di_cowextsize),
+					mode, flags, flags2);
+	if (fa)
+		return fa;
+
 	return NULL;
 }
 
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 4/6] xfs: validate btree records on retreival
  2018-06-05  6:24 [PATCH 0/6 V2] xfs: more verifications! Dave Chinner
                   ` (2 preceding siblings ...)
  2018-06-05  6:24 ` [PATCH 3/6] xfs: verify COW " Dave Chinner
@ 2018-06-05  6:24 ` Dave Chinner
  2018-06-05  6:40   ` [PATCH 4/6 v2] " Dave Chinner
  2018-06-05  6:24 ` [PATCH 5/6] xfs: verify root inode more thoroughly Dave Chinner
  2018-06-05  6:24 ` [PATCH 6/6] xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode() Dave Chinner
  5 siblings, 1 reply; 36+ messages in thread
From: Dave Chinner @ 2018-06-05  6:24 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

So we don't check the validity of records as we walk the btree. When
there are corrupt records in the free space btree (e.g. zero
startblock/length or beyond EOAG) we just blindly use it and things
go bad from there. That leads to assert failures on debug kernels
like this:

XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_alloc.c, line: 450
....
Call Trace:
 xfs_alloc_fixup_trees+0x368/0x5c0
 xfs_alloc_ag_vextent_near+0x79a/0xe20
 xfs_alloc_ag_vextent+0x1d3/0x330
 xfs_alloc_vextent+0x5e9/0x870

Or crashes like this:

XFS (loop0): xfs_buf_find: daddr 0x7fb28 out of range, EOFS 0x8000
.....
BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
....
Call Trace:
 xfs_bmap_add_extent_hole_real+0x67d/0x930
 xfs_bmapi_write+0x934/0xc90
 xfs_da_grow_inode_int+0x27e/0x2f0
 xfs_dir2_grow_inode+0x55/0x130
 xfs_dir2_sf_to_block+0x94/0x5d0
 xfs_dir2_sf_addname+0xd0/0x590
 xfs_dir_createname+0x168/0x1a0
 xfs_rename+0x658/0x9b0

By checking that free space records pulled from the trees are
within the valid range, we catch many of these corruptions before
they can do damage.

This is a generic btree record checking deficiency. We need to
validate the records we fetch from all the different btrees before
we use them to catch corruptions like this.

This patch results in a corrupt record emitting an error message and
returning -EFSCORRUPTED, and the higher layers catch that and abort:

 XFS (loop0): Size Freespace BTree record corruption in AG 0 detected!
 XFS (loop0): start block 0x0 block count 0x0
 XFS (loop0): Internal error xfs_trans_cancel at line 1012 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x42a/0x670
 .....
 Call Trace:
  dump_stack+0x85/0xcb
  xfs_trans_cancel+0x19f/0x1c0
  xfs_create+0x42a/0x670
  xfs_generic_create+0x1f6/0x2c0
  vfs_create+0xf9/0x180
  do_mknodat+0x1f9/0x210
  do_syscall_64+0x5a/0x180
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
.....
 XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1013 of file fs/xfs/xfs_trans.c.  Return address = ffffffff81500868
 XFS (loop0): Corruption of in-memory data detected.  Shutting down filesystem

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c    | 18 ++++++++++++++--
 fs/xfs/libxfs/xfs_ialloc.c   | 31 ++++++++++++++++++++++++++-
 fs/xfs/libxfs/xfs_refcount.c | 41 +++++++++++++++++++++++++++++++-----
 fs/xfs/libxfs/xfs_rmap.c     | 34 +++++++++++++++++++++++++++++-
 4 files changed, 115 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index a52ffb835b16..e9caa3d18f89 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -215,6 +215,8 @@ xfs_alloc_get_rec(
 	xfs_extlen_t		*len,	/* output: length of extent */
 	int			*stat)	/* output: success/failure */
 {
+	struct xfs_mount	*mp = cur->bc_mp;
+	xfs_agnumber_t		agno = cur->bc_private.a.agno;
 	union xfs_btree_rec	*rec;
 	int			error;
 
@@ -222,12 +224,24 @@ xfs_alloc_get_rec(
 	if (error || !(*stat))
 		return error;
 	if (rec->alloc.ar_blockcount == 0)
-		return -EFSCORRUPTED;
+		goto out_bad_rec;
 
 	*bno = be32_to_cpu(rec->alloc.ar_startblock);
 	*len = be32_to_cpu(rec->alloc.ar_blockcount);
 
-	return error;
+	if (!xfs_verify_agbno(mp, agno, *bno) ||
+	    !xfs_verify_agbno(mp, agno, *bno + *len - 1))
+		goto out_bad_rec;
+
+	return 0;
+
+out_bad_rec:
+	xfs_warn(mp,
+		"%s Freespace BTree record corruption in AG %d detected!",
+		cur->bc_btnum == XFS_BTNUM_BNO ? "Block" : "Size", agno);
+	xfs_warn(mp,
+		"start block 0x%x block count 0x%x", *bno, *len);
+	return -EFSCORRUPTED;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index ec5ea02b5553..3f551eb29157 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -121,16 +121,45 @@ xfs_inobt_get_rec(
 	struct xfs_inobt_rec_incore	*irec,
 	int				*stat)
 {
+	struct xfs_mount		*mp = cur->bc_mp;
+	xfs_agnumber_t			agno = cur->bc_private.a.agno;
 	union xfs_btree_rec		*rec;
 	int				error;
+	uint64_t			realfree;
 
 	error = xfs_btree_get_rec(cur, &rec, stat);
 	if (error || *stat == 0)
 		return error;
 
-	xfs_inobt_btrec_to_irec(cur->bc_mp, rec, irec);
+	xfs_inobt_btrec_to_irec(mp, rec, irec);
+
+	if (!xfs_verify_agino(mp, agno, irec->ir_startino))
+		goto out_bad_rec;
+	if (irec->ir_count < XFS_INODES_PER_HOLEMASK_BIT ||
+	    irec->ir_count > XFS_INODES_PER_CHUNK)
+		goto out_bad_rec;
+	if (irec->ir_freecount > XFS_INODES_PER_CHUNK)
+		goto out_bad_rec;
+
+	/* if there are no holes, return the first available offset */
+	if (!xfs_inobt_issparse(irec->ir_holemask))
+		realfree = irec->ir_free;
+	else
+		realfree = irec->ir_free & xfs_inobt_irec_to_allocmask(irec);
+	if (hweight64(realfree) != irec->ir_freecount)
+		goto out_bad_rec;
 
 	return 0;
+
+out_bad_rec:
+	xfs_warn(mp,
+		"%s Inode BTree record corruption in AG %d detected!",
+		cur->bc_btnum == XFS_BTNUM_INO ? "Used" : "Free", agno);
+	xfs_warn(mp,
+"start inode 0x%x, count 0x%x, free 0x%x freemask 0x%llx, holemask 0x%x",
+		irec->ir_startino, irec->ir_count, irec->ir_freecount,
+		irec->ir_free, irec->ir_holemask);
+	return -EFSCORRUPTED;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index ed5704c7dcf5..bd778e8b809f 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -111,16 +111,47 @@ xfs_refcount_get_rec(
 	struct xfs_refcount_irec	*irec,
 	int				*stat)
 {
+	struct xfs_mount		*mp = cur->bc_mp;
+	xfs_agnumber_t			agno = cur->bc_private.a.agno;
 	union xfs_btree_rec		*rec;
 	int				error;
+	xfs_agblock_t			realstart;
 
 	error = xfs_btree_get_rec(cur, &rec, stat);
-	if (!error && *stat == 1) {
-		xfs_refcount_btrec_to_irec(rec, irec);
-		trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno,
-				irec);
+	if (error || !*stat)
+		return error;
+
+	xfs_refcount_btrec_to_irec(rec, irec);
+
+	agno = cur->bc_private.a.agno;
+	if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN)
+		goto out_bad_rec;
+
+	/* handle special COW-staging state */
+	realstart = irec->rc_startblock;
+	if (realstart & XFS_REFC_COW_START) {
+		if (irec->rc_refcount != 1)
+			goto out_bad_rec;
+		realstart &= ~XFS_REFC_COW_START;
 	}
-	return error;
+
+	if (!xfs_verify_agbno(mp, agno, realstart))
+		goto out_bad_rec;
+	if (!xfs_verify_agbno(mp, agno, realstart + irec->rc_blockcount - 1))
+		goto out_bad_rec;
+	if (irec->rc_refcount == 0 || irec->rc_refcount > MAXREFCOUNT)
+		goto out_bad_rec;
+
+	trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno, irec);
+	return 0;
+
+out_bad_rec:
+	xfs_warn(mp,
+		"Refcount BTree record corruption in AG %d detected!", agno);
+	xfs_warn(mp,
+		"Start block 0x%x, block count 0x%x, references 0x%x",
+		irec->rc_startblock, irec->rc_blockcount, irec->rc_refcount);
+	return -EFSCORRUPTED;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 87711f9af625..69cfc92039e7 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -191,6 +191,8 @@ xfs_rmap_get_rec(
 	struct xfs_rmap_irec	*irec,
 	int			*stat)
 {
+	struct xfs_mount	*mp = cur->bc_mp;
+	xfs_agnumber_t		agno = cur->bc_private.a.agno;
 	union xfs_btree_rec	*rec;
 	int			error;
 
@@ -198,7 +200,37 @@ xfs_rmap_get_rec(
 	if (error || !*stat)
 		return error;
 
-	return xfs_rmap_btrec_to_irec(rec, irec);
+	if (xfs_rmap_btrec_to_irec(rec, irec))
+		goto out_bad_rec;
+
+	if (irec->rm_blockcount == 0)
+		goto out_bad_rec;
+	if (irec->rm_startblock <= XFS_AGFL_BLOCK(mp)) {
+		if (irec->rm_owner != XFS_RMAP_OWN_FS)
+			goto out_bad_rec;
+	} else {
+		if (!xfs_verify_agbno(mp, agno, irec->rm_startblock))
+			goto out_bad_rec;
+		if (!xfs_verify_agbno(mp, agno,
+				irec->rm_startblock + irec->rm_blockcount - 1))
+			goto out_bad_rec;
+	}
+
+	if (irec->rm_owner == 0)
+		goto out_bad_rec;
+	if (irec->rm_owner > XFS_MAXINUMBER &&
+	    irec->rm_owner <= XFS_RMAP_OWN_MIN)
+		goto out_bad_rec;
+
+	return 0;
+out_bad_rec:
+	xfs_warn(mp,
+		"RMAP BTree record corruption in AG %d detected!", agno);
+	xfs_warn(mp,
+		"Owner 0x%llx, flags 0x%x, start block 0x%x block count 0x%x",
+		irec->rm_owner, irec->rm_flags, irec->rm_startblock,
+		irec->rm_blockcount);
+	return -EFSCORRUPTED;
 }
 
 struct xfs_find_left_neighbor_info {
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 5/6] xfs: verify root inode more thoroughly
  2018-06-05  6:24 [PATCH 0/6 V2] xfs: more verifications! Dave Chinner
                   ` (3 preceding siblings ...)
  2018-06-05  6:24 ` [PATCH 4/6] xfs: validate btree records on retreival Dave Chinner
@ 2018-06-05  6:24 ` Dave Chinner
  2018-06-05 10:50   ` Carlos Maiolino
  2018-06-05 17:10   ` Darrick J. Wong
  2018-06-05  6:24 ` [PATCH 6/6] xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode() Dave Chinner
  5 siblings, 2 replies; 36+ messages in thread
From: Dave Chinner @ 2018-06-05  6:24 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When looking up the root inode at mount time, we don't actually do
any verification to check that the inode is allocated and accounted
for correctly in the INOBT. Make the checks on the root inode more
robust by making it an untrusted lookup. This forces the inode
lookup to use the inode btree to verify the inode is allocated
and mapped correctly to disk. This will also have the effect of
catching a significant number of AGI/INOBT related corruptions in
AG 0 at mount time.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_mount.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 189fa7b615d3..a3378252baa1 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -862,9 +862,12 @@ xfs_mountfs(
 	 * Get and sanity-check the root inode.
 	 * Save the pointer to it in the mount structure.
 	 */
-	error = xfs_iget(mp, NULL, sbp->sb_rootino, 0, XFS_ILOCK_EXCL, &rip);
+	error = xfs_iget(mp, NULL, sbp->sb_rootino, XFS_IGET_UNTRUSTED,
+			 XFS_ILOCK_EXCL, &rip);
 	if (error) {
-		xfs_warn(mp, "failed to read root inode");
+		xfs_warn(mp,
+			"Failed to read root inode 0x%llx, error %d",
+			sbp->sb_rootino, -error);
 		goto out_log_dealloc;
 	}
 
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 6/6] xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode()
  2018-06-05  6:24 [PATCH 0/6 V2] xfs: more verifications! Dave Chinner
                   ` (4 preceding siblings ...)
  2018-06-05  6:24 ` [PATCH 5/6] xfs: verify root inode more thoroughly Dave Chinner
@ 2018-06-05  6:24 ` Dave Chinner
  2018-06-05 11:12   ` Carlos Maiolino
  2018-06-05 17:11   ` Darrick J. Wong
  5 siblings, 2 replies; 36+ messages in thread
From: Dave Chinner @ 2018-06-05  6:24 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

In xfs_imap_to_bp(), we convert a -EFSCORRUPTED error to -EINVAL if
we are doing an untrusted lookup. This is done because we need
failed filehandle lookups to report -ESTALE to the caller, and it
does this by converting -EINVAL and -ENOENT errors to -ESTALE.

The squashing of EFSCORRUPTED in imap_to_bp makes it impossible for
for xfs_iget(UNTRUSTED) callers to determine the difference between
"inode does not exist" and "corruption detected during lookup". We
realy need that distinction in places calling xfS_iget(UNTRUSTED),
so move the filehandle error case handling all the way out to
xfs_nfs_get_inode() where it is needed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_inode_buf.c |  5 -----
 fs/xfs/xfs_export.c           | 15 ++++++++++++---
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index ea64be7cbd98..0a22c39325cd 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -189,11 +189,6 @@ xfs_imap_to_bp(
 			ASSERT(buf_flags & XBF_TRYLOCK);
 			return error;
 		}
-
-		if (error == -EFSCORRUPTED &&
-		    (iget_flags & XFS_IGET_UNTRUSTED))
-			return -EINVAL;
-
 		xfs_warn(mp, "%s: xfs_trans_read_buf() returned error %d.",
 			__func__, error);
 		return error;
diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c
index 7af7d6faa709..3cf4682e2510 100644
--- a/fs/xfs/xfs_export.c
+++ b/fs/xfs/xfs_export.c
@@ -128,15 +128,24 @@ xfs_nfs_get_inode(
 	 */
 	error = xfs_iget(mp, NULL, ino, XFS_IGET_UNTRUSTED, 0, &ip);
 	if (error) {
+
 		/*
 		 * EINVAL means the inode cluster doesn't exist anymore.
-		 * This implies the filehandle is stale, so we should
-		 * translate it here.
+		 * EFSCORRUPTED means the metadata pointing to the inode cluster
+		 * or the inode cluster itself is corrupt.  This implies the
+		 * filehandle is stale, so we should translate it here.
 		 * We don't use ESTALE directly down the chain to not
 		 * confuse applications using bulkstat that expect EINVAL.
 		 */
-		if (error == -EINVAL || error == -ENOENT)
+		switch (error) {
+		case -EINVAL:
+		case -ENOENT:
+		case -EFSCORRUPTED:
 			error = -ESTALE;
+			break;
+		default:
+			break;
+		}
 		return ERR_PTR(error);
 	}
 
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 4/6 v2] xfs: validate btree records on retreival
  2018-06-05  6:24 ` [PATCH 4/6] xfs: validate btree records on retreival Dave Chinner
@ 2018-06-05  6:40   ` Dave Chinner
  2018-06-05 10:42     ` Carlos Maiolino
                       ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Dave Chinner @ 2018-06-05  6:40 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

So we don't check the validity of records as we walk the btree. When
there are corrupt records in the free space btree (e.g. zero
startblock/length or beyond EOAG) we just blindly use it and things
go bad from there. That leads to assert failures on debug kernels
like this:

XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_alloc.c, line: 450
....
Call Trace:
 xfs_alloc_fixup_trees+0x368/0x5c0
 xfs_alloc_ag_vextent_near+0x79a/0xe20
 xfs_alloc_ag_vextent+0x1d3/0x330
 xfs_alloc_vextent+0x5e9/0x870

Or crashes like this:

XFS (loop0): xfs_buf_find: daddr 0x7fb28 out of range, EOFS 0x8000
.....
BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
....
Call Trace:
 xfs_bmap_add_extent_hole_real+0x67d/0x930
 xfs_bmapi_write+0x934/0xc90
 xfs_da_grow_inode_int+0x27e/0x2f0
 xfs_dir2_grow_inode+0x55/0x130
 xfs_dir2_sf_to_block+0x94/0x5d0
 xfs_dir2_sf_addname+0xd0/0x590
 xfs_dir_createname+0x168/0x1a0
 xfs_rename+0x658/0x9b0

By checking that free space records pulled from the trees are 
within the valid range, we catch many of these corruptions before
they can do damage.

This is a generic btree record checking deficiency. We need to
validate the records we fetch from all the different btrees before
we use them to catch corruptions like this.

This patch results in a corrupt record emitting an error message and
returning -EFSCORRUPTED, and the higher layers catch that and abort:

 XFS (loop0): Size Freespace BTree record corruption in AG 0 detected!
 XFS (loop0): start block 0x0 block count 0x0
 XFS (loop0): Internal error xfs_trans_cancel at line 1012 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x42a/0x670
 .....
 Call Trace:
  dump_stack+0x85/0xcb
  xfs_trans_cancel+0x19f/0x1c0
  xfs_create+0x42a/0x670
  xfs_generic_create+0x1f6/0x2c0
  vfs_create+0xf9/0x180
  do_mknodat+0x1f9/0x210
  do_syscall_64+0x5a/0x180
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
.....
 XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1013 of file fs/xfs/xfs_trans.c.  Return address = ffffffff81500868
 XFS (loop0): Corruption of in-memory data detected.  Shutting down filesystem

Signed-off-by: Dave Chinner <dchinner@redhat.com>

---
V2:
- now with overflow checks
- slightly more robust rmap checks that validate AG header region
  properly.

 fs/xfs/libxfs/xfs_alloc.c    | 22 ++++++++++++++++++++--
 fs/xfs/libxfs/xfs_ialloc.c   | 31 +++++++++++++++++++++++++++++-
 fs/xfs/libxfs/xfs_refcount.c | 45 +++++++++++++++++++++++++++++++++++++++-----
 fs/xfs/libxfs/xfs_rmap.c     | 40 ++++++++++++++++++++++++++++++++++++++-
 4 files changed, 129 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index a52ffb835b16..1db50cfc0212 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -215,6 +215,8 @@ xfs_alloc_get_rec(
 	xfs_extlen_t		*len,	/* output: length of extent */
 	int			*stat)	/* output: success/failure */
 {
+	struct xfs_mount	*mp = cur->bc_mp;
+	xfs_agnumber_t		agno = cur->bc_private.a.agno;
 	union xfs_btree_rec	*rec;
 	int			error;
 
@@ -222,12 +224,28 @@ xfs_alloc_get_rec(
 	if (error || !(*stat))
 		return error;
 	if (rec->alloc.ar_blockcount == 0)
-		return -EFSCORRUPTED;
+		goto out_bad_rec;
 
 	*bno = be32_to_cpu(rec->alloc.ar_startblock);
 	*len = be32_to_cpu(rec->alloc.ar_blockcount);
 
-	return error;
+	/* check for valid extent range, including overflow */
+	if (!xfs_verify_agbno(mp, agno, *bno))
+		goto out_bad_rec;
+	if (*bno > *bno + *len)
+		goto out_bad_rec;
+	if (!xfs_verify_agbno(mp, agno, *bno + *len - 1))
+		goto out_bad_rec;
+
+	return 0;
+
+out_bad_rec:
+	xfs_warn(mp,
+		"%s Freespace BTree record corruption in AG %d detected!",
+		cur->bc_btnum == XFS_BTNUM_BNO ? "Block" : "Size", agno);
+	xfs_warn(mp,
+		"start block 0x%x block count 0x%x", *bno, *len);
+	return -EFSCORRUPTED;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index ec5ea02b5553..3f551eb29157 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -121,16 +121,45 @@ xfs_inobt_get_rec(
 	struct xfs_inobt_rec_incore	*irec,
 	int				*stat)
 {
+	struct xfs_mount		*mp = cur->bc_mp;
+	xfs_agnumber_t			agno = cur->bc_private.a.agno;
 	union xfs_btree_rec		*rec;
 	int				error;
+	uint64_t			realfree;
 
 	error = xfs_btree_get_rec(cur, &rec, stat);
 	if (error || *stat == 0)
 		return error;
 
-	xfs_inobt_btrec_to_irec(cur->bc_mp, rec, irec);
+	xfs_inobt_btrec_to_irec(mp, rec, irec);
+
+	if (!xfs_verify_agino(mp, agno, irec->ir_startino))
+		goto out_bad_rec;
+	if (irec->ir_count < XFS_INODES_PER_HOLEMASK_BIT ||
+	    irec->ir_count > XFS_INODES_PER_CHUNK)
+		goto out_bad_rec;
+	if (irec->ir_freecount > XFS_INODES_PER_CHUNK)
+		goto out_bad_rec;
+
+	/* if there are no holes, return the first available offset */
+	if (!xfs_inobt_issparse(irec->ir_holemask))
+		realfree = irec->ir_free;
+	else
+		realfree = irec->ir_free & xfs_inobt_irec_to_allocmask(irec);
+	if (hweight64(realfree) != irec->ir_freecount)
+		goto out_bad_rec;
 
 	return 0;
+
+out_bad_rec:
+	xfs_warn(mp,
+		"%s Inode BTree record corruption in AG %d detected!",
+		cur->bc_btnum == XFS_BTNUM_INO ? "Used" : "Free", agno);
+	xfs_warn(mp,
+"start inode 0x%x, count 0x%x, free 0x%x freemask 0x%llx, holemask 0x%x",
+		irec->ir_startino, irec->ir_count, irec->ir_freecount,
+		irec->ir_free, irec->ir_holemask);
+	return -EFSCORRUPTED;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index ed5704c7dcf5..9a2a2004af24 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -111,16 +111,51 @@ xfs_refcount_get_rec(
 	struct xfs_refcount_irec	*irec,
 	int				*stat)
 {
+	struct xfs_mount		*mp = cur->bc_mp;
+	xfs_agnumber_t			agno = cur->bc_private.a.agno;
 	union xfs_btree_rec		*rec;
 	int				error;
+	xfs_agblock_t			realstart;
 
 	error = xfs_btree_get_rec(cur, &rec, stat);
-	if (!error && *stat == 1) {
-		xfs_refcount_btrec_to_irec(rec, irec);
-		trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno,
-				irec);
+	if (error || !*stat)
+		return error;
+
+	xfs_refcount_btrec_to_irec(rec, irec);
+
+	agno = cur->bc_private.a.agno;
+	if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN)
+		goto out_bad_rec;
+
+	/* handle special COW-staging state */
+	realstart = irec->rc_startblock;
+	if (realstart & XFS_REFC_COW_START) {
+		if (irec->rc_refcount != 1)
+			goto out_bad_rec;
+		realstart &= ~XFS_REFC_COW_START;
 	}
-	return error;
+
+	/* check for valid extent range, including overflow */
+	if (!xfs_verify_agbno(mp, agno, realstart))
+		goto out_bad_rec;
+	if (realstart > realstart + irec->rc_blockcount)
+		goto out_bad_rec;
+	if (!xfs_verify_agbno(mp, agno, realstart + irec->rc_blockcount - 1))
+		goto out_bad_rec;
+
+	if (irec->rc_refcount == 0 || irec->rc_refcount > MAXREFCOUNT)
+		goto out_bad_rec;
+
+	trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno, irec);
+	return 0;
+
+out_bad_rec:
+	xfs_warn(mp,
+		"Refcount BTree record corruption in AG %d detected!", agno);
+	xfs_warn(mp,
+		"Start block 0x%x, block count 0x%x, references 0x%x",
+		irec->rc_startblock, irec->rc_blockcount, irec->rc_refcount);
+	return -EFSCORRUPTED;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 87711f9af625..e794bcd6591a 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -191,6 +191,8 @@ xfs_rmap_get_rec(
 	struct xfs_rmap_irec	*irec,
 	int			*stat)
 {
+	struct xfs_mount	*mp = cur->bc_mp;
+	xfs_agnumber_t		agno = cur->bc_private.a.agno;
 	union xfs_btree_rec	*rec;
 	int			error;
 
@@ -198,7 +200,43 @@ xfs_rmap_get_rec(
 	if (error || !*stat)
 		return error;
 
-	return xfs_rmap_btrec_to_irec(rec, irec);
+	if (xfs_rmap_btrec_to_irec(rec, irec))
+		goto out_bad_rec;
+
+	if (irec->rm_blockcount == 0)
+		goto out_bad_rec;
+	if (irec->rm_startblock <= XFS_AGFL_BLOCK(mp)) {
+		if (irec->rm_owner != XFS_RMAP_OWN_FS)
+			goto out_bad_rec;
+		if (irec->rm_blockcount != XFS_AGFL_BLOCK(mp) + 1)
+			goto out_bad_rec;
+	} else {
+		/* check for valid extent range, including overflow */
+		if (!xfs_verify_agbno(mp, agno, irec->rm_startblock))
+			goto out_bad_rec;
+		if (irec->rm_startblock >
+				irec->rm_startblock + irec->rm_blockcount)
+			goto out_bad_rec;
+		if (!xfs_verify_agbno(mp, agno,
+				irec->rm_startblock + irec->rm_blockcount - 1))
+			goto out_bad_rec;
+	}
+
+	if (irec->rm_owner == 0)
+		goto out_bad_rec;
+	if (irec->rm_owner > XFS_MAXINUMBER &&
+	    irec->rm_owner <= XFS_RMAP_OWN_MIN)
+		goto out_bad_rec;
+
+	return 0;
+out_bad_rec:
+	xfs_warn(mp,
+		"RMAP BTree record corruption in AG %d detected!", agno);
+	xfs_warn(mp,
+		"Owner 0x%llx, flags 0x%x, start block 0x%x block count 0x%x",
+		irec->rm_owner, irec->rm_flags, irec->rm_startblock,
+		irec->rm_blockcount);
+	return -EFSCORRUPTED;
 }
 
 struct xfs_find_left_neighbor_info {

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/6] xfs: catch bad stripe alignment configurations
  2018-06-05  6:24 ` [PATCH 1/6] xfs: catch bad stripe alignment configurations Dave Chinner
@ 2018-06-05  9:27   ` Carlos Maiolino
  0 siblings, 0 replies; 36+ messages in thread
From: Carlos Maiolino @ 2018-06-05  9:27 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 04:24:18PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When stripe alignments are invalid, data alignment algorithms in the
> allocator may not work correctly. Ensure we catch superblocks with
> invalid stripe alignment setups at mount time. These data alignment
> mismatches are now detected at mount time like this:
> 
> XFS (loop0): SB stripe unit sanity check failed
> XFS (loop0): Metadata corruption detected at xfs_sb_read_verify+0xab/0x110, xfs_sb block 0xffffffffffffffff
> XFS (loop0): Unmount and run xfs_repair
> XFS (loop0): First 128 bytes of corrupted metadata buffer:
> 0000000091c2de02: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 10 00  XFSB............
> 0000000023bff869: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> 00000000cdd8c893: 17 32 37 15 ff ca 46 3d 9a 17 d3 33 04 b5 f1 a2  .27...F=...3....
> 000000009fd2844f: 00 00 00 00 00 00 00 04 00 00 00 00 00 00 06 d0  ................
> 0000000088e9b0bb: 00 00 00 00 00 00 06 d1 00 00 00 00 00 00 06 d2  ................
> 00000000ff233a20: 00 00 00 01 00 00 10 00 00 00 00 01 00 00 00 00  ................
> 000000009db0ac8b: 00 00 03 60 e1 34 02 00 08 00 00 02 00 00 00 00  ...`.4..........
> 00000000f7022460: 00 00 00 00 00 00 00 00 0c 09 0b 01 0c 00 00 19  ................
> XFS (loop0): SB validate failed with error -117.
> 
> And the mount fails.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_sb.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)

Looks good, you can add if you want:

Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
> 
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index 54c5e14fdfda..fe9448862667 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -266,6 +266,22 @@ xfs_mount_validate_sb(
>  		return -EFSCORRUPTED;
>  	}
>  
> +	if (sbp->sb_unit) {
> +		if (!xfs_sb_version_hasdalign(sbp) ||
> +		    sbp->sb_unit > sbp->sb_width ||
> +		    (sbp->sb_width % sbp->sb_unit) != 0) {
> +			xfs_notice(mp, "SB stripe unit sanity check failed");
> +			return -EFSCORRUPTED;
> +		}
> +	} else if (xfs_sb_version_hasdalign(sbp)) {
> +		xfs_notice(mp, "SB stripe alignment sanity check failed");
> +		return -EFSCORRUPTED;
> +	} else if (sbp->sb_width) {
> +		xfs_notice(mp, "SB stripe width sanity check failed");
> +		return -EFSCORRUPTED;
> +	}
> +
> +
>  	if (xfs_sb_version_hascrc(&mp->m_sb) &&
>  	    sbp->sb_blocksize < XFS_MIN_CRC_BLOCKSIZE) {
>  		xfs_notice(mp, "v5 SB sanity check failed");
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Carlos

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier
  2018-06-05  6:24 ` [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier Dave Chinner
@ 2018-06-05  9:53   ` Carlos Maiolino
  2018-06-05 22:56     ` Dave Chinner
  2018-06-05 17:10   ` Darrick J. Wong
  2018-07-24  6:39   ` Eric Sandeen
  2 siblings, 1 reply; 36+ messages in thread
From: Carlos Maiolino @ 2018-06-05  9:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 04:24:19PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> There are rules for vald extent size hints. We enforce them when
> applications set them, but fuzzers violate those rules and that
> screws us over.
> 
> This results in alignment assertion failures when setting up
> allocations such as this in direct IO:
> 
> XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
> ....
> Call Trace:
>  xfs_bmap_btalloc+0x415/0x910
>  xfs_bmapi_write+0x71c/0x12e0
>  xfs_iomap_write_direct+0x2a9/0x420
>  xfs_file_iomap_begin+0x4dc/0xa70
>  iomap_apply+0x43/0x100
>  iomap_file_buffered_write+0x62/0x90
>  xfs_file_buffered_aio_write+0xba/0x300
>  __vfs_write+0xd5/0x150
>  vfs_write+0xb6/0x180
>  ksys_write+0x45/0xa0
>  do_syscall_64+0x5a/0x180
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> And from xfs_db:
> 
> core.extsize = 10380288
> 
> Which is not an integer multiple of the block size, and so violates
> Rule #7 for setting extent size hints. Validate extent size hint
> rules in the inode verifier to catch this.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_inode_buf.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index f5fff1ccb61d..be197c91307b 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -385,6 +385,7 @@ xfs_dinode_verify(
>  	xfs_ino_t		ino,
>  	struct xfs_dinode	*dip)
>  {
> +	xfs_failaddr_t		fa;

Weren't we getting rid of typedefs? To be honest the typedef here gives more
clarity to the code than void* directly, so, I'm ok with it anyway, I'm just
curious is some typedefs are going to be kept.

Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

>  	uint16_t		mode;
>  	uint16_t		flags;
>  	uint64_t		flags2;
> @@ -501,6 +502,12 @@ xfs_dinode_verify(
>  			return __this_address;
>  	}
>  
> +	/* extent size hint validation */
> +	fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize),
> +					mode, be32_to_cpu(dip->di_flags));
> +	if (fa)
> +		return fa;
> +
>  	/* only version 3 or greater inodes are extensively verified here */
>  	if (dip->di_version < 3)
>  		return NULL;
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Carlos

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/6] xfs: verify COW extent size hint is valid in inode verifier
  2018-06-05  6:24 ` [PATCH 3/6] xfs: verify COW " Dave Chinner
@ 2018-06-05 10:00   ` Carlos Maiolino
  2018-06-05 17:09   ` Darrick J. Wong
  1 sibling, 0 replies; 36+ messages in thread
From: Carlos Maiolino @ 2018-06-05 10:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 04:24:20PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> There are rules for vald extent size hints. We enforce them when
> applications set them, but fuzzers violate those rules and that
> screws us over. Validate COW extent size hint rules in the inode
> verifier to catch this.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_inode_buf.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 

Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index be197c91307b..ea64be7cbd98 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -504,7 +504,7 @@ xfs_dinode_verify(
>  
>  	/* extent size hint validation */
>  	fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize),
> -					mode, be32_to_cpu(dip->di_flags));
> +					mode, flags);
>  	if (fa)
>  		return fa;
>  
> @@ -516,7 +516,7 @@ xfs_dinode_verify(
>  
>  	/* don't allow reflink/cowextsize if we don't have reflink */
>  	if ((flags2 & (XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE)) &&
> -            !xfs_sb_version_hasreflink(&mp->m_sb))
> +	     !xfs_sb_version_hasreflink(&mp->m_sb))
>  		return __this_address;
>  
>  	/* only regular files get reflink */
> @@ -531,6 +531,12 @@ xfs_dinode_verify(
>  	if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags2 & XFS_DIFLAG2_DAX))
>  		return __this_address;
>  
> +	/* COW extent size hint validation */
> +	fa = xfs_inode_validate_cowextsize(mp, be32_to_cpu(dip->di_cowextsize),
> +					mode, flags, flags2);
> +	if (fa)
> +		return fa;
> +
>  	return NULL;
>  }
>  
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Carlos

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 4/6 v2] xfs: validate btree records on retreival
  2018-06-05  6:40   ` [PATCH 4/6 v2] " Dave Chinner
@ 2018-06-05 10:42     ` Carlos Maiolino
  2018-06-05 23:00       ` Dave Chinner
  2018-06-05 17:47     ` Darrick J. Wong
  2018-06-06  1:21     ` [PATCH 4/6 v3] " Dave Chinner
  2 siblings, 1 reply; 36+ messages in thread
From: Carlos Maiolino @ 2018-06-05 10:42 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

>  	union xfs_btree_rec	*rec;
>  	int			error;
>  
> @@ -222,12 +224,28 @@ xfs_alloc_get_rec(
>  	if (error || !(*stat))
>  		return error;
>  	if (rec->alloc.ar_blockcount == 0)
> -		return -EFSCORRUPTED;
> +		goto out_bad_rec;
>  
>  	*bno = be32_to_cpu(rec->alloc.ar_startblock);
>  	*len = be32_to_cpu(rec->alloc.ar_blockcount);
>  
> -	return error;
> +	/* check for valid extent range, including overflow */
> +	if (!xfs_verify_agbno(mp, agno, *bno))
> +		goto out_bad_rec;
> +	if (*bno > *bno + *len)
> +		goto out_bad_rec;
> +	if (!xfs_verify_agbno(mp, agno, *bno + *len - 1))
> +		goto out_bad_rec;
> +
> +	return 0;
> +
> +out_bad_rec:
> +	xfs_warn(mp,
> +		"%s Freespace BTree record corruption in AG %d detected!",
> +		cur->bc_btnum == XFS_BTNUM_BNO ? "Block" : "Size", agno);
> +	xfs_warn(mp,
> +		"start block 0x%x block count 0x%x", *bno, *len);
> +	return -EFSCORRUPTED;
>  }
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index ec5ea02b5553..3f551eb29157 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -121,16 +121,45 @@ xfs_inobt_get_rec(
>  	struct xfs_inobt_rec_incore	*irec,
>  	int				*stat)
>  {
> +	struct xfs_mount		*mp = cur->bc_mp;
> +	xfs_agnumber_t			agno = cur->bc_private.a.agno;
>  	union xfs_btree_rec		*rec;
>  	int				error;
> +	uint64_t			realfree;
>  
>  	error = xfs_btree_get_rec(cur, &rec, stat);
>  	if (error || *stat == 0)
>  		return error;
>  
> -	xfs_inobt_btrec_to_irec(cur->bc_mp, rec, irec);
> +	xfs_inobt_btrec_to_irec(mp, rec, irec);
> +
> +	if (!xfs_verify_agino(mp, agno, irec->ir_startino))
> +		goto out_bad_rec;
> +	if (irec->ir_count < XFS_INODES_PER_HOLEMASK_BIT ||
> +	    irec->ir_count > XFS_INODES_PER_CHUNK)
> +		goto out_bad_rec;
> +	if (irec->ir_freecount > XFS_INODES_PER_CHUNK)
> +		goto out_bad_rec;
> +
> +	/* if there are no holes, return the first available offset */
> +	if (!xfs_inobt_issparse(irec->ir_holemask))
> +		realfree = irec->ir_free;
> +	else
> +		realfree = irec->ir_free & xfs_inobt_irec_to_allocmask(irec);
> +	if (hweight64(realfree) != irec->ir_freecount)
> +		goto out_bad_rec;
>  
>  	return 0;
> +
> +out_bad_rec:
> +	xfs_warn(mp,
> +		"%s Inode BTree record corruption in AG %d detected!",
> +		cur->bc_btnum == XFS_BTNUM_INO ? "Used" : "Free", agno);
> +	xfs_warn(mp,
> +"start inode 0x%x, count 0x%x, free 0x%x freemask 0x%llx, holemask 0x%x",
> +		irec->ir_startino, irec->ir_count, irec->ir_freecount,
> +		irec->ir_free, irec->ir_holemask);
> +	return -EFSCORRUPTED;
>  }
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
> index ed5704c7dcf5..9a2a2004af24 100644
> --- a/fs/xfs/libxfs/xfs_refcount.c
> +++ b/fs/xfs/libxfs/xfs_refcount.c
> @@ -111,16 +111,51 @@ xfs_refcount_get_rec(
>  	struct xfs_refcount_irec	*irec,
>  	int				*stat)
>  {
> +	struct xfs_mount		*mp = cur->bc_mp;
> +	xfs_agnumber_t			agno = cur->bc_private.a.agno;
>  	union xfs_btree_rec		*rec;
>  	int				error;
> +	xfs_agblock_t			realstart;
>  
>  	error = xfs_btree_get_rec(cur, &rec, stat);
> -	if (!error && *stat == 1) {
> -		xfs_refcount_btrec_to_irec(rec, irec);
> -		trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno,
> -				irec);
> +	if (error || !*stat)
> +		return error;
> +
> +	xfs_refcount_btrec_to_irec(rec, irec);
> +
> +	agno = cur->bc_private.a.agno;
> +	if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN)
> +		goto out_bad_rec;
> +
> +	/* handle special COW-staging state */
> +	realstart = irec->rc_startblock;
> +	if (realstart & XFS_REFC_COW_START) {
> +		if (irec->rc_refcount != 1)
> +			goto out_bad_rec;
> +		realstart &= ~XFS_REFC_COW_START;
>  	}
> -	return error;
> +
> +	/* check for valid extent range, including overflow */
> +	if (!xfs_verify_agbno(mp, agno, realstart))
> +		goto out_bad_rec;
> +	if (realstart > realstart + irec->rc_blockcount)

I am not sure if I'm right, but I thought this ought to be ">="?

Other than that, you can add:

Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

-- 
Carlos

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/6] xfs: verify root inode more thoroughly
  2018-06-05  6:24 ` [PATCH 5/6] xfs: verify root inode more thoroughly Dave Chinner
@ 2018-06-05 10:50   ` Carlos Maiolino
  2018-06-05 17:10   ` Darrick J. Wong
  1 sibling, 0 replies; 36+ messages in thread
From: Carlos Maiolino @ 2018-06-05 10:50 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 04:24:22PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When looking up the root inode at mount time, we don't actually do
> any verification to check that the inode is allocated and accounted
> for correctly in the INOBT. Make the checks on the root inode more
> robust by making it an untrusted lookup. This forces the inode
> lookup to use the inode btree to verify the inode is allocated
> and mapped correctly to disk. This will also have the effect of
> catching a significant number of AGI/INOBT related corruptions in
> AG 0 at mount time.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

> ---
>  fs/xfs/xfs_mount.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 189fa7b615d3..a3378252baa1 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -862,9 +862,12 @@ xfs_mountfs(
>  	 * Get and sanity-check the root inode.
>  	 * Save the pointer to it in the mount structure.
>  	 */
> -	error = xfs_iget(mp, NULL, sbp->sb_rootino, 0, XFS_ILOCK_EXCL, &rip);
> +	error = xfs_iget(mp, NULL, sbp->sb_rootino, XFS_IGET_UNTRUSTED,
> +			 XFS_ILOCK_EXCL, &rip);
>  	if (error) {
> -		xfs_warn(mp, "failed to read root inode");
> +		xfs_warn(mp,
> +			"Failed to read root inode 0x%llx, error %d",
> +			sbp->sb_rootino, -error);
>  		goto out_log_dealloc;
>  	}
>  
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Carlos

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/6] xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode()
  2018-06-05  6:24 ` [PATCH 6/6] xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode() Dave Chinner
@ 2018-06-05 11:12   ` Carlos Maiolino
  2018-06-05 17:11   ` Darrick J. Wong
  1 sibling, 0 replies; 36+ messages in thread
From: Carlos Maiolino @ 2018-06-05 11:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 04:24:23PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> In xfs_imap_to_bp(), we convert a -EFSCORRUPTED error to -EINVAL if
> we are doing an untrusted lookup. This is done because we need
> failed filehandle lookups to report -ESTALE to the caller, and it
> does this by converting -EINVAL and -ENOENT errors to -ESTALE.
> 
> The squashing of EFSCORRUPTED in imap_to_bp makes it impossible for
> for xfs_iget(UNTRUSTED) callers to determine the difference between
> "inode does not exist" and "corruption detected during lookup". We
> realy need that distinction in places calling xfS_iget(UNTRUSTED),
> so move the filehandle error case handling all the way out to
> xfs_nfs_get_inode() where it is needed.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---


Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>


>  fs/xfs/libxfs/xfs_inode_buf.c |  5 -----
>  fs/xfs/xfs_export.c           | 15 ++++++++++++---
>  2 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index ea64be7cbd98..0a22c39325cd 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -189,11 +189,6 @@ xfs_imap_to_bp(
>  			ASSERT(buf_flags & XBF_TRYLOCK);
>  			return error;
>  		}
> -
> -		if (error == -EFSCORRUPTED &&
> -		    (iget_flags & XFS_IGET_UNTRUSTED))
> -			return -EINVAL;
> -
>  		xfs_warn(mp, "%s: xfs_trans_read_buf() returned error %d.",
>  			__func__, error);
>  		return error;
> diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c
> index 7af7d6faa709..3cf4682e2510 100644
> --- a/fs/xfs/xfs_export.c
> +++ b/fs/xfs/xfs_export.c
> @@ -128,15 +128,24 @@ xfs_nfs_get_inode(
>  	 */
>  	error = xfs_iget(mp, NULL, ino, XFS_IGET_UNTRUSTED, 0, &ip);
>  	if (error) {
> +
>  		/*
>  		 * EINVAL means the inode cluster doesn't exist anymore.
> -		 * This implies the filehandle is stale, so we should
> -		 * translate it here.
> +		 * EFSCORRUPTED means the metadata pointing to the inode cluster
> +		 * or the inode cluster itself is corrupt.  This implies the
> +		 * filehandle is stale, so we should translate it here.
>  		 * We don't use ESTALE directly down the chain to not
>  		 * confuse applications using bulkstat that expect EINVAL.
>  		 */
> -		if (error == -EINVAL || error == -ENOENT)
> +		switch (error) {
> +		case -EINVAL:
> +		case -ENOENT:
> +		case -EFSCORRUPTED:
>  			error = -ESTALE;
> +			break;
> +		default:
> +			break;
> +		}
>  		return ERR_PTR(error);
>  	}
>  
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Carlos

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/6] xfs: verify COW extent size hint is valid in inode verifier
  2018-06-05  6:24 ` [PATCH 3/6] xfs: verify COW " Dave Chinner
  2018-06-05 10:00   ` Carlos Maiolino
@ 2018-06-05 17:09   ` Darrick J. Wong
  1 sibling, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2018-06-05 17:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 04:24:20PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> There are rules for vald extent size hints. We enforce them when
> applications set them, but fuzzers violate those rules and that
> screws us over. Validate COW extent size hint rules in the inode
> verifier to catch this.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_inode_buf.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index be197c91307b..ea64be7cbd98 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -504,7 +504,7 @@ xfs_dinode_verify(
>  
>  	/* extent size hint validation */
>  	fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize),
> -					mode, be32_to_cpu(dip->di_flags));
> +					mode, flags);
>  	if (fa)
>  		return fa;
>  
> @@ -516,7 +516,7 @@ xfs_dinode_verify(
>  
>  	/* don't allow reflink/cowextsize if we don't have reflink */
>  	if ((flags2 & (XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE)) &&
> -            !xfs_sb_version_hasreflink(&mp->m_sb))
> +	     !xfs_sb_version_hasreflink(&mp->m_sb))

These two bits belong in the previous patch, but I'll fix them on the
way in.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

>  		return __this_address;
>  
>  	/* only regular files get reflink */
> @@ -531,6 +531,12 @@ xfs_dinode_verify(
>  	if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags2 & XFS_DIFLAG2_DAX))
>  		return __this_address;
>  
> +	/* COW extent size hint validation */
> +	fa = xfs_inode_validate_cowextsize(mp, be32_to_cpu(dip->di_cowextsize),
> +					mode, flags, flags2);
> +	if (fa)
> +		return fa;
> +
>  	return NULL;
>  }
>  
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier
  2018-06-05  6:24 ` [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier Dave Chinner
  2018-06-05  9:53   ` Carlos Maiolino
@ 2018-06-05 17:10   ` Darrick J. Wong
  2018-06-07 16:16     ` Darrick J. Wong
  2018-07-24  6:39   ` Eric Sandeen
  2 siblings, 1 reply; 36+ messages in thread
From: Darrick J. Wong @ 2018-06-05 17:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 04:24:19PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> There are rules for vald extent size hints. We enforce them when
> applications set them, but fuzzers violate those rules and that
> screws us over.
> 
> This results in alignment assertion failures when setting up
> allocations such as this in direct IO:
> 
> XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
> ....
> Call Trace:
>  xfs_bmap_btalloc+0x415/0x910
>  xfs_bmapi_write+0x71c/0x12e0
>  xfs_iomap_write_direct+0x2a9/0x420
>  xfs_file_iomap_begin+0x4dc/0xa70
>  iomap_apply+0x43/0x100
>  iomap_file_buffered_write+0x62/0x90
>  xfs_file_buffered_aio_write+0xba/0x300
>  __vfs_write+0xd5/0x150
>  vfs_write+0xb6/0x180
>  ksys_write+0x45/0xa0
>  do_syscall_64+0x5a/0x180
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> And from xfs_db:
> 
> core.extsize = 10380288
> 
> Which is not an integer multiple of the block size, and so violates
> Rule #7 for setting extent size hints. Validate extent size hint
> rules in the inode verifier to catch this.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks ok modulo my comments in the next patch,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/libxfs/xfs_inode_buf.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index f5fff1ccb61d..be197c91307b 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -385,6 +385,7 @@ xfs_dinode_verify(
>  	xfs_ino_t		ino,
>  	struct xfs_dinode	*dip)
>  {
> +	xfs_failaddr_t		fa;
>  	uint16_t		mode;
>  	uint16_t		flags;
>  	uint64_t		flags2;
> @@ -501,6 +502,12 @@ xfs_dinode_verify(
>  			return __this_address;
>  	}
>  
> +	/* extent size hint validation */
> +	fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize),
> +					mode, be32_to_cpu(dip->di_flags));
> +	if (fa)
> +		return fa;
> +
>  	/* only version 3 or greater inodes are extensively verified here */
>  	if (dip->di_version < 3)
>  		return NULL;
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/6] xfs: verify root inode more thoroughly
  2018-06-05  6:24 ` [PATCH 5/6] xfs: verify root inode more thoroughly Dave Chinner
  2018-06-05 10:50   ` Carlos Maiolino
@ 2018-06-05 17:10   ` Darrick J. Wong
  1 sibling, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2018-06-05 17:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 04:24:22PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When looking up the root inode at mount time, we don't actually do
> any verification to check that the inode is allocated and accounted
> for correctly in the INOBT. Make the checks on the root inode more
> robust by making it an untrusted lookup. This forces the inode
> lookup to use the inode btree to verify the inode is allocated
> and mapped correctly to disk. This will also have the effect of
> catching a significant number of AGI/INOBT related corruptions in
> AG 0 at mount time.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_mount.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 189fa7b615d3..a3378252baa1 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -862,9 +862,12 @@ xfs_mountfs(
>  	 * Get and sanity-check the root inode.
>  	 * Save the pointer to it in the mount structure.
>  	 */
> -	error = xfs_iget(mp, NULL, sbp->sb_rootino, 0, XFS_ILOCK_EXCL, &rip);
> +	error = xfs_iget(mp, NULL, sbp->sb_rootino, XFS_IGET_UNTRUSTED,
> +			 XFS_ILOCK_EXCL, &rip);
>  	if (error) {
> -		xfs_warn(mp, "failed to read root inode");
> +		xfs_warn(mp,
> +			"Failed to read root inode 0x%llx, error %d",
> +			sbp->sb_rootino, -error);
>  		goto out_log_dealloc;
>  	}
>  
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/6] xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode()
  2018-06-05  6:24 ` [PATCH 6/6] xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode() Dave Chinner
  2018-06-05 11:12   ` Carlos Maiolino
@ 2018-06-05 17:11   ` Darrick J. Wong
  1 sibling, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2018-06-05 17:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 04:24:23PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> In xfs_imap_to_bp(), we convert a -EFSCORRUPTED error to -EINVAL if
> we are doing an untrusted lookup. This is done because we need
> failed filehandle lookups to report -ESTALE to the caller, and it
> does this by converting -EINVAL and -ENOENT errors to -ESTALE.
> 
> The squashing of EFSCORRUPTED in imap_to_bp makes it impossible for
> for xfs_iget(UNTRUSTED) callers to determine the difference between
> "inode does not exist" and "corruption detected during lookup". We
> realy need that distinction in places calling xfS_iget(UNTRUSTED),
> so move the filehandle error case handling all the way out to
> xfs_nfs_get_inode() where it is needed.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Woot!
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/libxfs/xfs_inode_buf.c |  5 -----
>  fs/xfs/xfs_export.c           | 15 ++++++++++++---
>  2 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index ea64be7cbd98..0a22c39325cd 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -189,11 +189,6 @@ xfs_imap_to_bp(
>  			ASSERT(buf_flags & XBF_TRYLOCK);
>  			return error;
>  		}
> -
> -		if (error == -EFSCORRUPTED &&
> -		    (iget_flags & XFS_IGET_UNTRUSTED))
> -			return -EINVAL;
> -
>  		xfs_warn(mp, "%s: xfs_trans_read_buf() returned error %d.",
>  			__func__, error);
>  		return error;
> diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c
> index 7af7d6faa709..3cf4682e2510 100644
> --- a/fs/xfs/xfs_export.c
> +++ b/fs/xfs/xfs_export.c
> @@ -128,15 +128,24 @@ xfs_nfs_get_inode(
>  	 */
>  	error = xfs_iget(mp, NULL, ino, XFS_IGET_UNTRUSTED, 0, &ip);
>  	if (error) {
> +
>  		/*
>  		 * EINVAL means the inode cluster doesn't exist anymore.
> -		 * This implies the filehandle is stale, so we should
> -		 * translate it here.
> +		 * EFSCORRUPTED means the metadata pointing to the inode cluster
> +		 * or the inode cluster itself is corrupt.  This implies the
> +		 * filehandle is stale, so we should translate it here.
>  		 * We don't use ESTALE directly down the chain to not
>  		 * confuse applications using bulkstat that expect EINVAL.
>  		 */
> -		if (error == -EINVAL || error == -ENOENT)
> +		switch (error) {
> +		case -EINVAL:
> +		case -ENOENT:
> +		case -EFSCORRUPTED:
>  			error = -ESTALE;
> +			break;
> +		default:
> +			break;
> +		}
>  		return ERR_PTR(error);
>  	}
>  
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 4/6 v2] xfs: validate btree records on retreival
  2018-06-05  6:40   ` [PATCH 4/6 v2] " Dave Chinner
  2018-06-05 10:42     ` Carlos Maiolino
@ 2018-06-05 17:47     ` Darrick J. Wong
  2018-06-05 23:02       ` Dave Chinner
  2018-06-06  1:21     ` [PATCH 4/6 v3] " Dave Chinner
  2 siblings, 1 reply; 36+ messages in thread
From: Darrick J. Wong @ 2018-06-05 17:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 04:40:43PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> So we don't check the validity of records as we walk the btree. When
> there are corrupt records in the free space btree (e.g. zero
> startblock/length or beyond EOAG) we just blindly use it and things
> go bad from there. That leads to assert failures on debug kernels
> like this:
> 
> XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_alloc.c, line: 450
> ....
> Call Trace:
>  xfs_alloc_fixup_trees+0x368/0x5c0
>  xfs_alloc_ag_vextent_near+0x79a/0xe20
>  xfs_alloc_ag_vextent+0x1d3/0x330
>  xfs_alloc_vextent+0x5e9/0x870
> 
> Or crashes like this:
> 
> XFS (loop0): xfs_buf_find: daddr 0x7fb28 out of range, EOFS 0x8000
> .....
> BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
> ....
> Call Trace:
>  xfs_bmap_add_extent_hole_real+0x67d/0x930
>  xfs_bmapi_write+0x934/0xc90
>  xfs_da_grow_inode_int+0x27e/0x2f0
>  xfs_dir2_grow_inode+0x55/0x130
>  xfs_dir2_sf_to_block+0x94/0x5d0
>  xfs_dir2_sf_addname+0xd0/0x590
>  xfs_dir_createname+0x168/0x1a0
>  xfs_rename+0x658/0x9b0
> 
> By checking that free space records pulled from the trees are 
> within the valid range, we catch many of these corruptions before
> they can do damage.
> 
> This is a generic btree record checking deficiency. We need to
> validate the records we fetch from all the different btrees before
> we use them to catch corruptions like this.
> 
> This patch results in a corrupt record emitting an error message and
> returning -EFSCORRUPTED, and the higher layers catch that and abort:
> 
>  XFS (loop0): Size Freespace BTree record corruption in AG 0 detected!
>  XFS (loop0): start block 0x0 block count 0x0
>  XFS (loop0): Internal error xfs_trans_cancel at line 1012 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x42a/0x670
>  .....
>  Call Trace:
>   dump_stack+0x85/0xcb
>   xfs_trans_cancel+0x19f/0x1c0
>   xfs_create+0x42a/0x670
>   xfs_generic_create+0x1f6/0x2c0
>   vfs_create+0xf9/0x180
>   do_mknodat+0x1f9/0x210
>   do_syscall_64+0x5a/0x180
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> .....
>  XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1013 of file fs/xfs/xfs_trans.c.  Return address = ffffffff81500868
>  XFS (loop0): Corruption of in-memory data detected.  Shutting down filesystem
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> ---
> V2:
> - now with overflow checks
> - slightly more robust rmap checks that validate AG header region
>   properly.
> 
>  fs/xfs/libxfs/xfs_alloc.c    | 22 ++++++++++++++++++++--
>  fs/xfs/libxfs/xfs_ialloc.c   | 31 +++++++++++++++++++++++++++++-
>  fs/xfs/libxfs/xfs_refcount.c | 45 +++++++++++++++++++++++++++++++++++++++-----
>  fs/xfs/libxfs/xfs_rmap.c     | 40 ++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 129 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index a52ffb835b16..1db50cfc0212 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -215,6 +215,8 @@ xfs_alloc_get_rec(
>  	xfs_extlen_t		*len,	/* output: length of extent */
>  	int			*stat)	/* output: success/failure */
>  {
> +	struct xfs_mount	*mp = cur->bc_mp;
> +	xfs_agnumber_t		agno = cur->bc_private.a.agno;
>  	union xfs_btree_rec	*rec;
>  	int			error;
>  
> @@ -222,12 +224,28 @@ xfs_alloc_get_rec(
>  	if (error || !(*stat))
>  		return error;
>  	if (rec->alloc.ar_blockcount == 0)
> -		return -EFSCORRUPTED;
> +		goto out_bad_rec;
>  
>  	*bno = be32_to_cpu(rec->alloc.ar_startblock);
>  	*len = be32_to_cpu(rec->alloc.ar_blockcount);
>  
> -	return error;
> +	/* check for valid extent range, including overflow */
> +	if (!xfs_verify_agbno(mp, agno, *bno))
> +		goto out_bad_rec;
> +	if (*bno > *bno + *len)
> +		goto out_bad_rec;
> +	if (!xfs_verify_agbno(mp, agno, *bno + *len - 1))
> +		goto out_bad_rec;
> +
> +	return 0;
> +
> +out_bad_rec:
> +	xfs_warn(mp,
> +		"%s Freespace BTree record corruption in AG %d detected!",
> +		cur->bc_btnum == XFS_BTNUM_BNO ? "Block" : "Size", agno);
> +	xfs_warn(mp,
> +		"start block 0x%x block count 0x%x", *bno, *len);
> +	return -EFSCORRUPTED;
>  }
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index ec5ea02b5553..3f551eb29157 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -121,16 +121,45 @@ xfs_inobt_get_rec(
>  	struct xfs_inobt_rec_incore	*irec,
>  	int				*stat)
>  {
> +	struct xfs_mount		*mp = cur->bc_mp;
> +	xfs_agnumber_t			agno = cur->bc_private.a.agno;
>  	union xfs_btree_rec		*rec;
>  	int				error;
> +	uint64_t			realfree;
>  
>  	error = xfs_btree_get_rec(cur, &rec, stat);
>  	if (error || *stat == 0)
>  		return error;
>  
> -	xfs_inobt_btrec_to_irec(cur->bc_mp, rec, irec);
> +	xfs_inobt_btrec_to_irec(mp, rec, irec);
> +
> +	if (!xfs_verify_agino(mp, agno, irec->ir_startino))
> +		goto out_bad_rec;
> +	if (irec->ir_count < XFS_INODES_PER_HOLEMASK_BIT ||
> +	    irec->ir_count > XFS_INODES_PER_CHUNK)
> +		goto out_bad_rec;
> +	if (irec->ir_freecount > XFS_INODES_PER_CHUNK)
> +		goto out_bad_rec;
> +
> +	/* if there are no holes, return the first available offset */
> +	if (!xfs_inobt_issparse(irec->ir_holemask))
> +		realfree = irec->ir_free;
> +	else
> +		realfree = irec->ir_free & xfs_inobt_irec_to_allocmask(irec);
> +	if (hweight64(realfree) != irec->ir_freecount)
> +		goto out_bad_rec;
>  
>  	return 0;
> +
> +out_bad_rec:
> +	xfs_warn(mp,
> +		"%s Inode BTree record corruption in AG %d detected!",
> +		cur->bc_btnum == XFS_BTNUM_INO ? "Used" : "Free", agno);
> +	xfs_warn(mp,
> +"start inode 0x%x, count 0x%x, free 0x%x freemask 0x%llx, holemask 0x%x",
> +		irec->ir_startino, irec->ir_count, irec->ir_freecount,
> +		irec->ir_free, irec->ir_holemask);
> +	return -EFSCORRUPTED;
>  }
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
> index ed5704c7dcf5..9a2a2004af24 100644
> --- a/fs/xfs/libxfs/xfs_refcount.c
> +++ b/fs/xfs/libxfs/xfs_refcount.c
> @@ -111,16 +111,51 @@ xfs_refcount_get_rec(
>  	struct xfs_refcount_irec	*irec,
>  	int				*stat)
>  {
> +	struct xfs_mount		*mp = cur->bc_mp;
> +	xfs_agnumber_t			agno = cur->bc_private.a.agno;
>  	union xfs_btree_rec		*rec;
>  	int				error;
> +	xfs_agblock_t			realstart;
>  
>  	error = xfs_btree_get_rec(cur, &rec, stat);
> -	if (!error && *stat == 1) {
> -		xfs_refcount_btrec_to_irec(rec, irec);
> -		trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno,
> -				irec);
> +	if (error || !*stat)
> +		return error;
> +
> +	xfs_refcount_btrec_to_irec(rec, irec);
> +
> +	agno = cur->bc_private.a.agno;
> +	if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN)
> +		goto out_bad_rec;
> +
> +	/* handle special COW-staging state */
> +	realstart = irec->rc_startblock;
> +	if (realstart & XFS_REFC_COW_START) {
> +		if (irec->rc_refcount != 1)
> +			goto out_bad_rec;
> +		realstart &= ~XFS_REFC_COW_START;
>  	}

If the record does not have the cow "flag" set then the refcount cannot
be 1, so this needs an else clause:

} else {
	if (irec->rc_refcount == 1)
		goto out_bad_rec;
}

> -	return error;
> +
> +	/* check for valid extent range, including overflow */
> +	if (!xfs_verify_agbno(mp, agno, realstart))
> +		goto out_bad_rec;
> +	if (realstart > realstart + irec->rc_blockcount)
> +		goto out_bad_rec;
> +	if (!xfs_verify_agbno(mp, agno, realstart + irec->rc_blockcount - 1))
> +		goto out_bad_rec;
> +
> +	if (irec->rc_refcount == 0 || irec->rc_refcount > MAXREFCOUNT)
> +		goto out_bad_rec;
> +
> +	trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno, irec);
> +	return 0;
> +
> +out_bad_rec:
> +	xfs_warn(mp,
> +		"Refcount BTree record corruption in AG %d detected!", agno);
> +	xfs_warn(mp,
> +		"Start block 0x%x, block count 0x%x, references 0x%x",
> +		irec->rc_startblock, irec->rc_blockcount, irec->rc_refcount);
> +	return -EFSCORRUPTED;
>  }
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
> index 87711f9af625..e794bcd6591a 100644
> --- a/fs/xfs/libxfs/xfs_rmap.c
> +++ b/fs/xfs/libxfs/xfs_rmap.c
> @@ -191,6 +191,8 @@ xfs_rmap_get_rec(
>  	struct xfs_rmap_irec	*irec,
>  	int			*stat)
>  {
> +	struct xfs_mount	*mp = cur->bc_mp;
> +	xfs_agnumber_t		agno = cur->bc_private.a.agno;
>  	union xfs_btree_rec	*rec;
>  	int			error;
>  
> @@ -198,7 +200,43 @@ xfs_rmap_get_rec(
>  	if (error || !*stat)
>  		return error;
>  
> -	return xfs_rmap_btrec_to_irec(rec, irec);
> +	if (xfs_rmap_btrec_to_irec(rec, irec))
> +		goto out_bad_rec;
> +
> +	if (irec->rm_blockcount == 0)
> +		goto out_bad_rec;
> +	if (irec->rm_startblock <= XFS_AGFL_BLOCK(mp)) {
> +		if (irec->rm_owner != XFS_RMAP_OWN_FS)
> +			goto out_bad_rec;
> +		if (irec->rm_blockcount != XFS_AGFL_BLOCK(mp) + 1)
> +			goto out_bad_rec;
> +	} else {
> +		/* check for valid extent range, including overflow */
> +		if (!xfs_verify_agbno(mp, agno, irec->rm_startblock))
> +			goto out_bad_rec;
> +		if (irec->rm_startblock >
> +				irec->rm_startblock + irec->rm_blockcount)
> +			goto out_bad_rec;
> +		if (!xfs_verify_agbno(mp, agno,
> +				irec->rm_startblock + irec->rm_blockcount - 1))
> +			goto out_bad_rec;
> +	}
> +
> +	if (irec->rm_owner == 0)
> +		goto out_bad_rec;
> +	if (irec->rm_owner > XFS_MAXINUMBER &&
> +	    irec->rm_owner <= XFS_RMAP_OWN_MIN)

rm_owner should be (between XFS_RMAP_OWN_FS and XFS_RMAP_OWN_MIN) or
(pass xfs_verify_fsino()) since we cannot have inodes between EOFS and
MAXINUMBER.

> +		goto out_bad_rec;
> +

Should we check the rmap flags here?

> +	return 0;
> +out_bad_rec:
> +	xfs_warn(mp,
> +		"RMAP BTree record corruption in AG %d detected!", agno);

"RMap" ?  Or "Reverse Mapping"?

(Consistent capitalization blah blah...)

--D

> +	xfs_warn(mp,
> +		"Owner 0x%llx, flags 0x%x, start block 0x%x block count 0x%x",
> +		irec->rm_owner, irec->rm_flags, irec->rm_startblock,
> +		irec->rm_blockcount);
> +	return -EFSCORRUPTED;
>  }
>  
>  struct xfs_find_left_neighbor_info {
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier
  2018-06-05  9:53   ` Carlos Maiolino
@ 2018-06-05 22:56     ` Dave Chinner
  0 siblings, 0 replies; 36+ messages in thread
From: Dave Chinner @ 2018-06-05 22:56 UTC (permalink / raw)
  To: linux-xfs

On Tue, Jun 05, 2018 at 11:53:59AM +0200, Carlos Maiolino wrote:
> On Tue, Jun 05, 2018 at 04:24:19PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > There are rules for vald extent size hints. We enforce them when
> > applications set them, but fuzzers violate those rules and that
> > screws us over.
> > 
> > This results in alignment assertion failures when setting up
> > allocations such as this in direct IO:
> > 
> > XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
> > ....
> > Call Trace:
> >  xfs_bmap_btalloc+0x415/0x910
> >  xfs_bmapi_write+0x71c/0x12e0
> >  xfs_iomap_write_direct+0x2a9/0x420
> >  xfs_file_iomap_begin+0x4dc/0xa70
> >  iomap_apply+0x43/0x100
> >  iomap_file_buffered_write+0x62/0x90
> >  xfs_file_buffered_aio_write+0xba/0x300
> >  __vfs_write+0xd5/0x150
> >  vfs_write+0xb6/0x180
> >  ksys_write+0x45/0xa0
> >  do_syscall_64+0x5a/0x180
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > And from xfs_db:
> > 
> > core.extsize = 10380288
> > 
> > Which is not an integer multiple of the block size, and so violates
> > Rule #7 for setting extent size hints. Validate extent size hint
> > rules in the inode verifier to catch this.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_inode_buf.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> > index f5fff1ccb61d..be197c91307b 100644
> > --- a/fs/xfs/libxfs/xfs_inode_buf.c
> > +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> > @@ -385,6 +385,7 @@ xfs_dinode_verify(
> >  	xfs_ino_t		ino,
> >  	struct xfs_dinode	*dip)
> >  {
> > +	xfs_failaddr_t		fa;
> 
> Weren't we getting rid of typedefs?

Unneeded typedefs, yes.  e.g. typedef struct foo { } foo_t; serve no
useful purpose, so we get rid of them where appropriate.

> To be honest the typedef here gives more
> clarity to the code than void* directly, so, I'm ok with it anyway, I'm just
> curious is some typedefs are going to be kept.

Right, xfs_failaddr_t is a useful typedef - it tells us that this
variable will hold an instruction pointer related to the failure
that was detected, which is something a void * can't tell us.

It's all about context :P

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 4/6 v2] xfs: validate btree records on retreival
  2018-06-05 10:42     ` Carlos Maiolino
@ 2018-06-05 23:00       ` Dave Chinner
  0 siblings, 0 replies; 36+ messages in thread
From: Dave Chinner @ 2018-06-05 23:00 UTC (permalink / raw)
  To: linux-xfs

On Tue, Jun 05, 2018 at 12:42:07PM +0200, Carlos Maiolino wrote:
> > +	if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN)
> > +		goto out_bad_rec;
> > +
> > +	/* handle special COW-staging state */
> > +	realstart = irec->rc_startblock;
> > +	if (realstart & XFS_REFC_COW_START) {
> > +		if (irec->rc_refcount != 1)
> > +			goto out_bad_rec;
> > +		realstart &= ~XFS_REFC_COW_START;
> >  	}
> > -	return error;
> > +
> > +	/* check for valid extent range, including overflow */
> > +	if (!xfs_verify_agbno(mp, agno, realstart))
> > +		goto out_bad_rec;
> > +	if (realstart > realstart + irec->rc_blockcount)
> 
> I am not sure if I'm right, but I thought this ought to be ">="?

We've already caught zero length and block count greater than
2^32-1, so if the above is true we've wrapped through zero during
the addition. But we can never add 0 or 2^32 to realstart here, so
the "==" condition will not occur and we don't need to check for
it....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 4/6 v2] xfs: validate btree records on retreival
  2018-06-05 17:47     ` Darrick J. Wong
@ 2018-06-05 23:02       ` Dave Chinner
  0 siblings, 0 replies; 36+ messages in thread
From: Dave Chinner @ 2018-06-05 23:02 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 10:47:16AM -0700, Darrick J. Wong wrote:
> On Tue, Jun 05, 2018 at 04:40:43PM +1000, Dave Chinner wrote:
> > @@ -111,16 +111,51 @@ xfs_refcount_get_rec(
> >  	struct xfs_refcount_irec	*irec,
> >  	int				*stat)
> >  {
> > +	struct xfs_mount		*mp = cur->bc_mp;
> > +	xfs_agnumber_t			agno = cur->bc_private.a.agno;
> >  	union xfs_btree_rec		*rec;
> >  	int				error;
> > +	xfs_agblock_t			realstart;
> >  
> >  	error = xfs_btree_get_rec(cur, &rec, stat);
> > -	if (!error && *stat == 1) {
> > -		xfs_refcount_btrec_to_irec(rec, irec);
> > -		trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno,
> > -				irec);
> > +	if (error || !*stat)
> > +		return error;
> > +
> > +	xfs_refcount_btrec_to_irec(rec, irec);
> > +
> > +	agno = cur->bc_private.a.agno;
> > +	if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN)
> > +		goto out_bad_rec;
> > +
> > +	/* handle special COW-staging state */
> > +	realstart = irec->rc_startblock;
> > +	if (realstart & XFS_REFC_COW_START) {
> > +		if (irec->rc_refcount != 1)
> > +			goto out_bad_rec;
> > +		realstart &= ~XFS_REFC_COW_START;
> >  	}
> 
> If the record does not have the cow "flag" set then the refcount cannot
> be 1, so this needs an else clause:
> 
> } else {
> 	if (irec->rc_refcount == 1)
> 		goto out_bad_rec;
> }

Ah, yes, that is true, though it should be refcount < 2 because a
zero value is invalid, too.

> > +	if (irec->rm_owner == 0)
> > +		goto out_bad_rec;
> > +	if (irec->rm_owner > XFS_MAXINUMBER &&
> > +	    irec->rm_owner <= XFS_RMAP_OWN_MIN)
> 
> rm_owner should be (between XFS_RMAP_OWN_FS and XFS_RMAP_OWN_MIN) or
> (pass xfs_verify_fsino()) since we cannot have inodes between EOFS and
> MAXINUMBER.

Yes, that is better. I should have noticed xfs_verify_fsino() - my
bad.
>
> > +		goto out_bad_rec;
> > +
> 
> Should we check the rmap flags here?

They are checked in xfs_refcount_btrec_to_irec(), and it returns
-EFSCORRUPTED if invalid flags are set.

> > +	return 0;
> > +out_bad_rec:
> > +	xfs_warn(mp,
> > +		"RMAP BTree record corruption in AG %d detected!", agno);
> 
> "RMap" ?  Or "Reverse Mapping"?
> 
> (Consistent capitalization blah blah...)

I'll use the latter.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 4/6 v3] xfs: validate btree records on retreival
  2018-06-05  6:40   ` [PATCH 4/6 v2] " Dave Chinner
  2018-06-05 10:42     ` Carlos Maiolino
  2018-06-05 17:47     ` Darrick J. Wong
@ 2018-06-06  1:21     ` Dave Chinner
  2 siblings, 0 replies; 36+ messages in thread
From: Dave Chinner @ 2018-06-06  1:21 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

So we don't check the validity of records as we walk the btree. When
there are corrupt records in the free space btree (e.g. zero
startblock/length or beyond EOAG) we just blindly use it and things
go bad from there. That leads to assert failures on debug kernels
like this:

XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_alloc.c, line: 450
....
Call Trace:
 xfs_alloc_fixup_trees+0x368/0x5c0
 xfs_alloc_ag_vextent_near+0x79a/0xe20
 xfs_alloc_ag_vextent+0x1d3/0x330
 xfs_alloc_vextent+0x5e9/0x870

Or crashes like this:

XFS (loop0): xfs_buf_find: daddr 0x7fb28 out of range, EOFS 0x8000
.....
BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
....
Call Trace:
 xfs_bmap_add_extent_hole_real+0x67d/0x930
 xfs_bmapi_write+0x934/0xc90
 xfs_da_grow_inode_int+0x27e/0x2f0
 xfs_dir2_grow_inode+0x55/0x130
 xfs_dir2_sf_to_block+0x94/0x5d0
 xfs_dir2_sf_addname+0xd0/0x590
 xfs_dir_createname+0x168/0x1a0
 xfs_rename+0x658/0x9b0

By checking that free space records pulled from the trees are 
within the valid range, we catch many of these corruptions before
they can do damage.

This is a generic btree record checking deficiency. We need to
validate the records we fetch from all the different btrees before
we use them to catch corruptions like this.

This patch results in a corrupt record emitting an error message and
returning -EFSCORRUPTED, and the higher layers catch that and abort:

 XFS (loop0): Size Freespace BTree record corruption in AG 0 detected!
 XFS (loop0): start block 0x0 block count 0x0
 XFS (loop0): Internal error xfs_trans_cancel at line 1012 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x42a/0x670
 .....
 Call Trace:
  dump_stack+0x85/0xcb
  xfs_trans_cancel+0x19f/0x1c0
  xfs_create+0x42a/0x670
  xfs_generic_create+0x1f6/0x2c0
  vfs_create+0xf9/0x180
  do_mknodat+0x1f9/0x210
  do_syscall_64+0x5a/0x180
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
.....
 XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1013 of file fs/xfs/xfs_trans.c.  Return address = ffffffff81500868
 XFS (loop0): Corruption of in-memory data detected.  Shutting down filesystem

Signed-off-by: Dave Chinner <dchinner@redhat.com>

---
V3:
- add refcount range check for non-COW extents
- fixed rmap owner range check
- fixed rmap error message format "Reverse Mapping BTree..."

 fs/xfs/libxfs/xfs_alloc.c    | 22 +++++++++++++++++++--
 fs/xfs/libxfs/xfs_ialloc.c   | 31 ++++++++++++++++++++++++++++-
 fs/xfs/libxfs/xfs_refcount.c | 47 +++++++++++++++++++++++++++++++++++++++-----
 fs/xfs/libxfs/xfs_rmap.c     | 41 +++++++++++++++++++++++++++++++++++++-
 4 files changed, 132 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index a52ffb835b16..1db50cfc0212 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -215,6 +215,8 @@ xfs_alloc_get_rec(
 	xfs_extlen_t		*len,	/* output: length of extent */
 	int			*stat)	/* output: success/failure */
 {
+	struct xfs_mount	*mp = cur->bc_mp;
+	xfs_agnumber_t		agno = cur->bc_private.a.agno;
 	union xfs_btree_rec	*rec;
 	int			error;
 
@@ -222,12 +224,28 @@ xfs_alloc_get_rec(
 	if (error || !(*stat))
 		return error;
 	if (rec->alloc.ar_blockcount == 0)
-		return -EFSCORRUPTED;
+		goto out_bad_rec;
 
 	*bno = be32_to_cpu(rec->alloc.ar_startblock);
 	*len = be32_to_cpu(rec->alloc.ar_blockcount);
 
-	return error;
+	/* check for valid extent range, including overflow */
+	if (!xfs_verify_agbno(mp, agno, *bno))
+		goto out_bad_rec;
+	if (*bno > *bno + *len)
+		goto out_bad_rec;
+	if (!xfs_verify_agbno(mp, agno, *bno + *len - 1))
+		goto out_bad_rec;
+
+	return 0;
+
+out_bad_rec:
+	xfs_warn(mp,
+		"%s Freespace BTree record corruption in AG %d detected!",
+		cur->bc_btnum == XFS_BTNUM_BNO ? "Block" : "Size", agno);
+	xfs_warn(mp,
+		"start block 0x%x block count 0x%x", *bno, *len);
+	return -EFSCORRUPTED;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index ec5ea02b5553..3f551eb29157 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -121,16 +121,45 @@ xfs_inobt_get_rec(
 	struct xfs_inobt_rec_incore	*irec,
 	int				*stat)
 {
+	struct xfs_mount		*mp = cur->bc_mp;
+	xfs_agnumber_t			agno = cur->bc_private.a.agno;
 	union xfs_btree_rec		*rec;
 	int				error;
+	uint64_t			realfree;
 
 	error = xfs_btree_get_rec(cur, &rec, stat);
 	if (error || *stat == 0)
 		return error;
 
-	xfs_inobt_btrec_to_irec(cur->bc_mp, rec, irec);
+	xfs_inobt_btrec_to_irec(mp, rec, irec);
+
+	if (!xfs_verify_agino(mp, agno, irec->ir_startino))
+		goto out_bad_rec;
+	if (irec->ir_count < XFS_INODES_PER_HOLEMASK_BIT ||
+	    irec->ir_count > XFS_INODES_PER_CHUNK)
+		goto out_bad_rec;
+	if (irec->ir_freecount > XFS_INODES_PER_CHUNK)
+		goto out_bad_rec;
+
+	/* if there are no holes, return the first available offset */
+	if (!xfs_inobt_issparse(irec->ir_holemask))
+		realfree = irec->ir_free;
+	else
+		realfree = irec->ir_free & xfs_inobt_irec_to_allocmask(irec);
+	if (hweight64(realfree) != irec->ir_freecount)
+		goto out_bad_rec;
 
 	return 0;
+
+out_bad_rec:
+	xfs_warn(mp,
+		"%s Inode BTree record corruption in AG %d detected!",
+		cur->bc_btnum == XFS_BTNUM_INO ? "Used" : "Free", agno);
+	xfs_warn(mp,
+"start inode 0x%x, count 0x%x, free 0x%x freemask 0x%llx, holemask 0x%x",
+		irec->ir_startino, irec->ir_count, irec->ir_freecount,
+		irec->ir_free, irec->ir_holemask);
+	return -EFSCORRUPTED;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index ed5704c7dcf5..9dda6fd0bb13 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -111,16 +111,53 @@ xfs_refcount_get_rec(
 	struct xfs_refcount_irec	*irec,
 	int				*stat)
 {
+	struct xfs_mount		*mp = cur->bc_mp;
+	xfs_agnumber_t			agno = cur->bc_private.a.agno;
 	union xfs_btree_rec		*rec;
 	int				error;
+	xfs_agblock_t			realstart;
 
 	error = xfs_btree_get_rec(cur, &rec, stat);
-	if (!error && *stat == 1) {
-		xfs_refcount_btrec_to_irec(rec, irec);
-		trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno,
-				irec);
+	if (error || !*stat)
+		return error;
+
+	xfs_refcount_btrec_to_irec(rec, irec);
+
+	agno = cur->bc_private.a.agno;
+	if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN)
+		goto out_bad_rec;
+
+	/* handle special COW-staging state */
+	realstart = irec->rc_startblock;
+	if (realstart & XFS_REFC_COW_START) {
+		if (irec->rc_refcount != 1)
+			goto out_bad_rec;
+		realstart &= ~XFS_REFC_COW_START;
+	} else if (irec->rc_refcount < 2) {
+		goto out_bad_rec;
 	}
-	return error;
+
+	/* check for valid extent range, including overflow */
+	if (!xfs_verify_agbno(mp, agno, realstart))
+		goto out_bad_rec;
+	if (realstart > realstart + irec->rc_blockcount)
+		goto out_bad_rec;
+	if (!xfs_verify_agbno(mp, agno, realstart + irec->rc_blockcount - 1))
+		goto out_bad_rec;
+
+	if (irec->rc_refcount == 0 || irec->rc_refcount > MAXREFCOUNT)
+		goto out_bad_rec;
+
+	trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno, irec);
+	return 0;
+
+out_bad_rec:
+	xfs_warn(mp,
+		"Refcount BTree record corruption in AG %d detected!", agno);
+	xfs_warn(mp,
+		"Start block 0x%x, block count 0x%x, references 0x%x",
+		irec->rc_startblock, irec->rc_blockcount, irec->rc_refcount);
+	return -EFSCORRUPTED;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 87711f9af625..d4460b0d2d81 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -27,6 +27,7 @@
 #include "xfs_extent_busy.h"
 #include "xfs_bmap.h"
 #include "xfs_inode.h"
+#include "xfs_ialloc.h"
 
 /*
  * Lookup the first record less than or equal to [bno, len, owner, offset]
@@ -191,6 +192,8 @@ xfs_rmap_get_rec(
 	struct xfs_rmap_irec	*irec,
 	int			*stat)
 {
+	struct xfs_mount	*mp = cur->bc_mp;
+	xfs_agnumber_t		agno = cur->bc_private.a.agno;
 	union xfs_btree_rec	*rec;
 	int			error;
 
@@ -198,7 +201,43 @@ xfs_rmap_get_rec(
 	if (error || !*stat)
 		return error;
 
-	return xfs_rmap_btrec_to_irec(rec, irec);
+	if (xfs_rmap_btrec_to_irec(rec, irec))
+		goto out_bad_rec;
+
+	if (irec->rm_blockcount == 0)
+		goto out_bad_rec;
+	if (irec->rm_startblock <= XFS_AGFL_BLOCK(mp)) {
+		if (irec->rm_owner != XFS_RMAP_OWN_FS)
+			goto out_bad_rec;
+		if (irec->rm_blockcount != XFS_AGFL_BLOCK(mp) + 1)
+			goto out_bad_rec;
+	} else {
+		/* check for valid extent range, including overflow */
+		if (!xfs_verify_agbno(mp, agno, irec->rm_startblock))
+			goto out_bad_rec;
+		if (irec->rm_startblock >
+				irec->rm_startblock + irec->rm_blockcount)
+			goto out_bad_rec;
+		if (!xfs_verify_agbno(mp, agno,
+				irec->rm_startblock + irec->rm_blockcount - 1))
+			goto out_bad_rec;
+	}
+
+	if (!(xfs_verify_ino(mp, irec->rm_owner) ||
+	      (irec->rm_owner <= XFS_RMAP_OWN_FS &&
+	       irec->rm_owner >= XFS_RMAP_OWN_MIN)))
+		goto out_bad_rec;
+
+	return 0;
+out_bad_rec:
+	xfs_warn(mp,
+		"Reverse Mapping BTree record corruption in AG %d detected!",
+		agno);
+	xfs_warn(mp,
+		"Owner 0x%llx, flags 0x%x, start block 0x%x block count 0x%x",
+		irec->rm_owner, irec->rm_flags, irec->rm_startblock,
+		irec->rm_blockcount);
+	return -EFSCORRUPTED;
 }
 
 struct xfs_find_left_neighbor_info {
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier
  2018-06-05 17:10   ` Darrick J. Wong
@ 2018-06-07 16:16     ` Darrick J. Wong
  2018-06-08  1:10       ` Dave Chinner
  0 siblings, 1 reply; 36+ messages in thread
From: Darrick J. Wong @ 2018-06-07 16:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 10:10:15AM -0700, Darrick J. Wong wrote:
> On Tue, Jun 05, 2018 at 04:24:19PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > There are rules for vald extent size hints. We enforce them when
> > applications set them, but fuzzers violate those rules and that
> > screws us over.
> > 
> > This results in alignment assertion failures when setting up
> > allocations such as this in direct IO:
> > 
> > XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
> > ....
> > Call Trace:
> >  xfs_bmap_btalloc+0x415/0x910
> >  xfs_bmapi_write+0x71c/0x12e0
> >  xfs_iomap_write_direct+0x2a9/0x420
> >  xfs_file_iomap_begin+0x4dc/0xa70
> >  iomap_apply+0x43/0x100
> >  iomap_file_buffered_write+0x62/0x90
> >  xfs_file_buffered_aio_write+0xba/0x300
> >  __vfs_write+0xd5/0x150
> >  vfs_write+0xb6/0x180
> >  ksys_write+0x45/0xa0
> >  do_syscall_64+0x5a/0x180
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > And from xfs_db:
> > 
> > core.extsize = 10380288
> > 
> > Which is not an integer multiple of the block size, and so violates
> > Rule #7 for setting extent size hints. Validate extent size hint
> > rules in the inode verifier to catch this.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> Looks ok modulo my comments in the next patch,
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

FWIW when I applied this to xfsprogs I saw an xfs/033 regression:

Phase 6 - check inode connectivity...
reinitializing root directory
Metadata corruption detected at 0x5555555c60e0, inode 0x80 dinode

fatal error -- could not iget root inode -- error - 117
[Inferior 1 (process 1178) exited with code 01]
(gdb) l *(0x5555555c60e0)
0x5555555c60e0 is in libxfs_inode_validate_extsize (xfs_inode_buf.c:729).

We fail the inode verifier while trying to _iget the root inode so that
we can reinitialize it; I suspect phase 3 is going to need to check the
extent size hints and clear them.

--D

> --D
> 
> > ---
> >  fs/xfs/libxfs/xfs_inode_buf.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> > index f5fff1ccb61d..be197c91307b 100644
> > --- a/fs/xfs/libxfs/xfs_inode_buf.c
> > +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> > @@ -385,6 +385,7 @@ xfs_dinode_verify(
> >  	xfs_ino_t		ino,
> >  	struct xfs_dinode	*dip)
> >  {
> > +	xfs_failaddr_t		fa;
> >  	uint16_t		mode;
> >  	uint16_t		flags;
> >  	uint64_t		flags2;
> > @@ -501,6 +502,12 @@ xfs_dinode_verify(
> >  			return __this_address;
> >  	}
> >  
> > +	/* extent size hint validation */
> > +	fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize),
> > +					mode, be32_to_cpu(dip->di_flags));
> > +	if (fa)
> > +		return fa;
> > +
> >  	/* only version 3 or greater inodes are extensively verified here */
> >  	if (dip->di_version < 3)
> >  		return NULL;
> > -- 
> > 2.17.0
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier
  2018-06-07 16:16     ` Darrick J. Wong
@ 2018-06-08  1:10       ` Dave Chinner
  2018-06-08  1:23         ` Darrick J. Wong
  0 siblings, 1 reply; 36+ messages in thread
From: Dave Chinner @ 2018-06-08  1:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jun 07, 2018 at 09:16:31AM -0700, Darrick J. Wong wrote:
> On Tue, Jun 05, 2018 at 10:10:15AM -0700, Darrick J. Wong wrote:
> > On Tue, Jun 05, 2018 at 04:24:19PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > There are rules for vald extent size hints. We enforce them when
> > > applications set them, but fuzzers violate those rules and that
> > > screws us over.
> > > 
> > > This results in alignment assertion failures when setting up
> > > allocations such as this in direct IO:
> > > 
> > > XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
> > > ....
> > > Call Trace:
> > >  xfs_bmap_btalloc+0x415/0x910
> > >  xfs_bmapi_write+0x71c/0x12e0
> > >  xfs_iomap_write_direct+0x2a9/0x420
> > >  xfs_file_iomap_begin+0x4dc/0xa70
> > >  iomap_apply+0x43/0x100
> > >  iomap_file_buffered_write+0x62/0x90
> > >  xfs_file_buffered_aio_write+0xba/0x300
> > >  __vfs_write+0xd5/0x150
> > >  vfs_write+0xb6/0x180
> > >  ksys_write+0x45/0xa0
> > >  do_syscall_64+0x5a/0x180
> > >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > 
> > > And from xfs_db:
> > > 
> > > core.extsize = 10380288
> > > 
> > > Which is not an integer multiple of the block size, and so violates
> > > Rule #7 for setting extent size hints. Validate extent size hint
> > > rules in the inode verifier to catch this.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > 
> > Looks ok modulo my comments in the next patch,
> > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> FWIW when I applied this to xfsprogs I saw an xfs/033 regression:
> 
> Phase 6 - check inode connectivity...
> reinitializing root directory
> Metadata corruption detected at 0x5555555c60e0, inode 0x80 dinode
> 
> fatal error -- could not iget root inode -- error - 117
> [Inferior 1 (process 1178) exited with code 01]
> (gdb) l *(0x5555555c60e0)
> 0x5555555c60e0 is in libxfs_inode_validate_extsize (xfs_inode_buf.c:729).
> 
> We fail the inode verifier while trying to _iget the root inode so that
> we can reinitialize it; I suspect phase 3 is going to need to check the
> extent size hints and clear them.

I'm actually quite happy to see that the continual process of
hardening the kernel verifiers has got to the point where we are
starting to expose deficiencies in xfs_repair.

Can I wait for the xfsprogs libxfs-4.18-sync branch to pick up these
verifier changes before looking at what repair needs to do to avoid
it? I don't want to do a forced context switch to
debugging/enhancing userspace code right at this moment....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier
  2018-06-08  1:10       ` Dave Chinner
@ 2018-06-08  1:23         ` Darrick J. Wong
  2018-06-08  2:23           ` Eric Sandeen
  0 siblings, 1 reply; 36+ messages in thread
From: Darrick J. Wong @ 2018-06-08  1:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Jun 08, 2018 at 11:10:39AM +1000, Dave Chinner wrote:
> On Thu, Jun 07, 2018 at 09:16:31AM -0700, Darrick J. Wong wrote:
> > On Tue, Jun 05, 2018 at 10:10:15AM -0700, Darrick J. Wong wrote:
> > > On Tue, Jun 05, 2018 at 04:24:19PM +1000, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > There are rules for vald extent size hints. We enforce them when
> > > > applications set them, but fuzzers violate those rules and that
> > > > screws us over.
> > > > 
> > > > This results in alignment assertion failures when setting up
> > > > allocations such as this in direct IO:
> > > > 
> > > > XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
> > > > ....
> > > > Call Trace:
> > > >  xfs_bmap_btalloc+0x415/0x910
> > > >  xfs_bmapi_write+0x71c/0x12e0
> > > >  xfs_iomap_write_direct+0x2a9/0x420
> > > >  xfs_file_iomap_begin+0x4dc/0xa70
> > > >  iomap_apply+0x43/0x100
> > > >  iomap_file_buffered_write+0x62/0x90
> > > >  xfs_file_buffered_aio_write+0xba/0x300
> > > >  __vfs_write+0xd5/0x150
> > > >  vfs_write+0xb6/0x180
> > > >  ksys_write+0x45/0xa0
> > > >  do_syscall_64+0x5a/0x180
> > > >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > 
> > > > And from xfs_db:
> > > > 
> > > > core.extsize = 10380288
> > > > 
> > > > Which is not an integer multiple of the block size, and so violates
> > > > Rule #7 for setting extent size hints. Validate extent size hint
> > > > rules in the inode verifier to catch this.
> > > > 
> > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Looks ok modulo my comments in the next patch,
> > > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > FWIW when I applied this to xfsprogs I saw an xfs/033 regression:
> > 
> > Phase 6 - check inode connectivity...
> > reinitializing root directory
> > Metadata corruption detected at 0x5555555c60e0, inode 0x80 dinode
> > 
> > fatal error -- could not iget root inode -- error - 117
> > [Inferior 1 (process 1178) exited with code 01]
> > (gdb) l *(0x5555555c60e0)
> > 0x5555555c60e0 is in libxfs_inode_validate_extsize (xfs_inode_buf.c:729).
> > 
> > We fail the inode verifier while trying to _iget the root inode so that
> > we can reinitialize it; I suspect phase 3 is going to need to check the
> > extent size hints and clear them.
> 
> I'm actually quite happy to see that the continual process of
> hardening the kernel verifiers has got to the point where we are
> starting to expose deficiencies in xfs_repair.
> 
> Can I wait for the xfsprogs libxfs-4.18-sync branch to pick up these
> verifier changes before looking at what repair needs to do to avoid
> it? I don't want to do a forced context switch to
> debugging/enhancing userspace code right at this moment....

That's ultimately up to Eric, but since fixing it is nontrivial surgery
on xfs_repair (and the verifier update patch doesn't itself break the
build) I'd be fine with fixing it after the 4.18 sync goes in.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier
  2018-06-08  1:23         ` Darrick J. Wong
@ 2018-06-08  2:23           ` Eric Sandeen
  0 siblings, 0 replies; 36+ messages in thread
From: Eric Sandeen @ 2018-06-08  2:23 UTC (permalink / raw)
  To: Darrick J. Wong, Dave Chinner; +Cc: linux-xfs



On 6/7/18 8:23 PM, Darrick J. Wong wrote:
> On Fri, Jun 08, 2018 at 11:10:39AM +1000, Dave Chinner wrote:
>> On Thu, Jun 07, 2018 at 09:16:31AM -0700, Darrick J. Wong wrote:
>>> On Tue, Jun 05, 2018 at 10:10:15AM -0700, Darrick J. Wong wrote:
>>>> On Tue, Jun 05, 2018 at 04:24:19PM +1000, Dave Chinner wrote:
>>>>> From: Dave Chinner <dchinner@redhat.com>
>>>>>
>>>>> There are rules for vald extent size hints. We enforce them when
>>>>> applications set them, but fuzzers violate those rules and that
>>>>> screws us over.
>>>>>
>>>>> This results in alignment assertion failures when setting up
>>>>> allocations such as this in direct IO:
>>>>>
>>>>> XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
>>>>> ....
>>>>> Call Trace:
>>>>>  xfs_bmap_btalloc+0x415/0x910
>>>>>  xfs_bmapi_write+0x71c/0x12e0
>>>>>  xfs_iomap_write_direct+0x2a9/0x420
>>>>>  xfs_file_iomap_begin+0x4dc/0xa70
>>>>>  iomap_apply+0x43/0x100
>>>>>  iomap_file_buffered_write+0x62/0x90
>>>>>  xfs_file_buffered_aio_write+0xba/0x300
>>>>>  __vfs_write+0xd5/0x150
>>>>>  vfs_write+0xb6/0x180
>>>>>  ksys_write+0x45/0xa0
>>>>>  do_syscall_64+0x5a/0x180
>>>>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>>
>>>>> And from xfs_db:
>>>>>
>>>>> core.extsize = 10380288
>>>>>
>>>>> Which is not an integer multiple of the block size, and so violates
>>>>> Rule #7 for setting extent size hints. Validate extent size hint
>>>>> rules in the inode verifier to catch this.
>>>>>
>>>>> Signed-off-by: Dave Chinner <dchinner@redhat.com>
>>>>
>>>> Looks ok modulo my comments in the next patch,
>>>> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
>>>
>>> FWIW when I applied this to xfsprogs I saw an xfs/033 regression:
>>>
>>> Phase 6 - check inode connectivity...
>>> reinitializing root directory
>>> Metadata corruption detected at 0x5555555c60e0, inode 0x80 dinode
>>>
>>> fatal error -- could not iget root inode -- error - 117
>>> [Inferior 1 (process 1178) exited with code 01]
>>> (gdb) l *(0x5555555c60e0)
>>> 0x5555555c60e0 is in libxfs_inode_validate_extsize (xfs_inode_buf.c:729).
>>>
>>> We fail the inode verifier while trying to _iget the root inode so that
>>> we can reinitialize it; I suspect phase 3 is going to need to check the
>>> extent size hints and clear them.
>>
>> I'm actually quite happy to see that the continual process of
>> hardening the kernel verifiers has got to the point where we are
>> starting to expose deficiencies in xfs_repair.
>>
>> Can I wait for the xfsprogs libxfs-4.18-sync branch to pick up these
>> verifier changes before looking at what repair needs to do to avoid
>> it? I don't want to do a forced context switch to
>> debugging/enhancing userspace code right at this moment....
> 
> That's ultimately up to Eric, but since fixing it is nontrivial surgery
> on xfs_repair (and the verifier update patch doesn't itself break the
> build) I'd be fine with fixing it after the 4.18 sync goes in.
> 
> --D

I think that getting it into the kernel and even into the xfsprogs/libxfs
tree for 4.18 is fine as long as we are sure a repair fix will be forthcoming
before 4.18 is done as long as it doesn't blow up regression testing /too/
much...  This kernel<->libxfs<->application coordination
can get a bit chicken-and-eggy sometimes.

I guess this kernel change means that only a latest xfs_repair will make
a latest kernel happy; I guess that's fairly normal.

-Eric


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier
  2018-06-05  6:24 ` [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier Dave Chinner
  2018-06-05  9:53   ` Carlos Maiolino
  2018-06-05 17:10   ` Darrick J. Wong
@ 2018-07-24  6:39   ` Eric Sandeen
  2018-07-24 16:43     ` Darrick J. Wong
  2 siblings, 1 reply; 36+ messages in thread
From: Eric Sandeen @ 2018-07-24  6:39 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

On 6/4/18 11:24 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> There are rules for vald extent size hints. We enforce them when
> applications set them, but fuzzers violate those rules and that
> screws us over.
> 
> This results in alignment assertion failures when setting up
> allocations such as this in direct IO:
> 
> XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
> ....
> Call Trace:
>  xfs_bmap_btalloc+0x415/0x910
>  xfs_bmapi_write+0x71c/0x12e0
>  xfs_iomap_write_direct+0x2a9/0x420
>  xfs_file_iomap_begin+0x4dc/0xa70
>  iomap_apply+0x43/0x100
>  iomap_file_buffered_write+0x62/0x90
>  xfs_file_buffered_aio_write+0xba/0x300
>  __vfs_write+0xd5/0x150
>  vfs_write+0xb6/0x180
>  ksys_write+0x45/0xa0
>  do_syscall_64+0x5a/0x180
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> And from xfs_db:
> 
> core.extsize = 10380288
> 
> Which is not an integer multiple of the block size, and so violates
> Rule #7 for setting extent size hints. Validate extent size hint
> rules in the inode verifier to catch this.

So, I think that if I do:

# mkfs.xfs -f -m crc=0 $TEST_DEV
# ./check xfs/229
# ./check xfs/229

I trip the verifier, because I end up with freed inodes on disk with an
extent size hints but zeroed flags.  

xfs_ifree sets di_flags = 0 but doesn't clear di_extsize; xfs_inode_validate_extsize
says if extsize !=0 and the hint flag is set, it fails

Anyone else see this?

(crc=0 needed because that causes us to actually reread the inode chunks
in xfs_iread vs. /* shortcut IO on inode allocation if possible */

-Eric

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier
  2018-07-24  6:39   ` Eric Sandeen
@ 2018-07-24 16:43     ` Darrick J. Wong
  2018-08-20 15:06       ` Brian Foster
  0 siblings, 1 reply; 36+ messages in thread
From: Darrick J. Wong @ 2018-07-24 16:43 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Dave Chinner, linux-xfs

On Mon, Jul 23, 2018 at 11:39:53PM -0700, Eric Sandeen wrote:
> On 6/4/18 11:24 PM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > There are rules for vald extent size hints. We enforce them when
> > applications set them, but fuzzers violate those rules and that
> > screws us over.
> > 
> > This results in alignment assertion failures when setting up
> > allocations such as this in direct IO:
> > 
> > XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
> > ....
> > Call Trace:
> >  xfs_bmap_btalloc+0x415/0x910
> >  xfs_bmapi_write+0x71c/0x12e0
> >  xfs_iomap_write_direct+0x2a9/0x420
> >  xfs_file_iomap_begin+0x4dc/0xa70
> >  iomap_apply+0x43/0x100
> >  iomap_file_buffered_write+0x62/0x90
> >  xfs_file_buffered_aio_write+0xba/0x300
> >  __vfs_write+0xd5/0x150
> >  vfs_write+0xb6/0x180
> >  ksys_write+0x45/0xa0
> >  do_syscall_64+0x5a/0x180
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > And from xfs_db:
> > 
> > core.extsize = 10380288
> > 
> > Which is not an integer multiple of the block size, and so violates
> > Rule #7 for setting extent size hints. Validate extent size hint
> > rules in the inode verifier to catch this.
> 
> So, I think that if I do:
> 
> # mkfs.xfs -f -m crc=0 $TEST_DEV
> # ./check xfs/229
> # ./check xfs/229
> 
> I trip the verifier, because I end up with freed inodes on disk with an
> extent size hints but zeroed flags.  
> 
> xfs_ifree sets di_flags = 0 but doesn't clear di_extsize; xfs_inode_validate_extsize
> says if extsize !=0 and the hint flag is set, it fails
> 
> Anyone else see this?

Yeah, I think I just hit this on the TEST_DEV in xfs/242.

git blame says I lifted the code from the scrub code, and I probably
wrote the code having read the ioctl code (which clears the extsize
field if the iflag isn't set).

> (crc=0 needed because that causes us to actually reread the inode chunks
> in xfs_iread vs. /* shortcut IO on inode allocation if possible */

Hmmm, so a v5 fs mounted with ikeep will also read an inode chunk when
creating an inode.  It looks like we do that (instead of zeroing the
incore inode and setting a random i_generation) to preserve the existing
generation number?

In any case, it's pretty clear that kernels have been writing out freed
inode cores with di_mode == 0, di_flags == 0, and di_extsize == (some
number) so we clearly can't have that in the verifier.  It looks like we
only examine di_extsize if either EXTSZ flag are set, so it's not
causing incorrect behavior.  Maybe it can be a preening fix in
scrub/repair.

--D

> -Eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier
  2018-07-24 16:43     ` Darrick J. Wong
@ 2018-08-20 15:06       ` Brian Foster
  2018-08-20 15:27         ` Eric Sandeen
  0 siblings, 1 reply; 36+ messages in thread
From: Brian Foster @ 2018-08-20 15:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Eric Sandeen, Dave Chinner, linux-xfs

On Tue, Jul 24, 2018 at 09:43:46AM -0700, Darrick J. Wong wrote:
> On Mon, Jul 23, 2018 at 11:39:53PM -0700, Eric Sandeen wrote:
> > On 6/4/18 11:24 PM, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > There are rules for vald extent size hints. We enforce them when
> > > applications set them, but fuzzers violate those rules and that
> > > screws us over.
> > > 
> > > This results in alignment assertion failures when setting up
> > > allocations such as this in direct IO:
> > > 
> > > XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
> > > ....
> > > Call Trace:
> > >  xfs_bmap_btalloc+0x415/0x910
> > >  xfs_bmapi_write+0x71c/0x12e0
> > >  xfs_iomap_write_direct+0x2a9/0x420
> > >  xfs_file_iomap_begin+0x4dc/0xa70
> > >  iomap_apply+0x43/0x100
> > >  iomap_file_buffered_write+0x62/0x90
> > >  xfs_file_buffered_aio_write+0xba/0x300
> > >  __vfs_write+0xd5/0x150
> > >  vfs_write+0xb6/0x180
> > >  ksys_write+0x45/0xa0
> > >  do_syscall_64+0x5a/0x180
> > >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > 
> > > And from xfs_db:
> > > 
> > > core.extsize = 10380288
> > > 
> > > Which is not an integer multiple of the block size, and so violates
> > > Rule #7 for setting extent size hints. Validate extent size hint
> > > rules in the inode verifier to catch this.
> > 
> > So, I think that if I do:
> > 
> > # mkfs.xfs -f -m crc=0 $TEST_DEV
> > # ./check xfs/229
> > # ./check xfs/229
> > 
> > I trip the verifier, because I end up with freed inodes on disk with an
> > extent size hints but zeroed flags.  
> > 
> > xfs_ifree sets di_flags = 0 but doesn't clear di_extsize; xfs_inode_validate_extsize
> > says if extsize !=0 and the hint flag is set, it fails
> > 
> > Anyone else see this?
> 
> Yeah, I think I just hit this on the TEST_DEV in xfs/242.
> 
> git blame says I lifted the code from the scrub code, and I probably
> wrote the code having read the ioctl code (which clears the extsize
> field if the iflag isn't set).
> 
> > (crc=0 needed because that causes us to actually reread the inode chunks
> > in xfs_iread vs. /* shortcut IO on inode allocation if possible */
> 
> Hmmm, so a v5 fs mounted with ikeep will also read an inode chunk when
> creating an inode.  It looks like we do that (instead of zeroing the
> incore inode and setting a random i_generation) to preserve the existing
> generation number?
> 
> In any case, it's pretty clear that kernels have been writing out freed
> inode cores with di_mode == 0, di_flags == 0, and di_extsize == (some
> number) so we clearly can't have that in the verifier.  It looks like we
> only examine di_extsize if either EXTSZ flag are set, so it's not
> causing incorrect behavior.  Maybe it can be a preening fix in
> scrub/repair.
> 

I just stumbled on this problem with xfs/229 that Eric reported. I'm
confused by the comment above regarding this not causing incorrect
behavior. This patch adds extsize verification to the inode verifier,
which trips up the inode allocation code if the (free) on-disk inode
still happens to have an old extsize hint set. This causes a filesystem
shutdown where we'd otherwise subsequently reset di_extsize to zero as
part of the inode allocation path (xfs_ialloc()). Shouldn't we filter
out this particular (!hint && extsize != 0) check for free inodes (or
something of that nature)?

FWIW, I reproduce this with xfs/229 with crc=0 or crc=1+ikeep.

Brian

> --D
> 
> > -Eric
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier
  2018-08-20 15:06       ` Brian Foster
@ 2018-08-20 15:27         ` Eric Sandeen
  2018-08-20 15:36           ` Darrick J. Wong
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Sandeen @ 2018-08-20 15:27 UTC (permalink / raw)
  To: Brian Foster, Darrick J. Wong; +Cc: Dave Chinner, linux-xfs



On 8/20/18 10:06 AM, Brian Foster wrote:
> On Tue, Jul 24, 2018 at 09:43:46AM -0700, Darrick J. Wong wrote:
>> On Mon, Jul 23, 2018 at 11:39:53PM -0700, Eric Sandeen wrote:
>>> On 6/4/18 11:24 PM, Dave Chinner wrote:
>>>> From: Dave Chinner <dchinner@redhat.com>
>>>>
>>>> There are rules for vald extent size hints. We enforce them when
>>>> applications set them, but fuzzers violate those rules and that
>>>> screws us over.
>>>>
>>>> This results in alignment assertion failures when setting up
>>>> allocations such as this in direct IO:
>>>>
>>>> XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
>>>> ....
>>>> Call Trace:
>>>>  xfs_bmap_btalloc+0x415/0x910
>>>>  xfs_bmapi_write+0x71c/0x12e0
>>>>  xfs_iomap_write_direct+0x2a9/0x420
>>>>  xfs_file_iomap_begin+0x4dc/0xa70
>>>>  iomap_apply+0x43/0x100
>>>>  iomap_file_buffered_write+0x62/0x90
>>>>  xfs_file_buffered_aio_write+0xba/0x300
>>>>  __vfs_write+0xd5/0x150
>>>>  vfs_write+0xb6/0x180
>>>>  ksys_write+0x45/0xa0
>>>>  do_syscall_64+0x5a/0x180
>>>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>
>>>> And from xfs_db:
>>>>
>>>> core.extsize = 10380288
>>>>
>>>> Which is not an integer multiple of the block size, and so violates
>>>> Rule #7 for setting extent size hints. Validate extent size hint
>>>> rules in the inode verifier to catch this.
>>>
>>> So, I think that if I do:
>>>
>>> # mkfs.xfs -f -m crc=0 $TEST_DEV
>>> # ./check xfs/229
>>> # ./check xfs/229
>>>
>>> I trip the verifier, because I end up with freed inodes on disk with an
>>> extent size hints but zeroed flags.  
>>>
>>> xfs_ifree sets di_flags = 0 but doesn't clear di_extsize; xfs_inode_validate_extsize
>>> says if extsize !=0 and the hint flag is set, it fails
>>>
>>> Anyone else see this?
>>
>> Yeah, I think I just hit this on the TEST_DEV in xfs/242.
>>
>> git blame says I lifted the code from the scrub code, and I probably
>> wrote the code having read the ioctl code (which clears the extsize
>> field if the iflag isn't set).
>>
>>> (crc=0 needed because that causes us to actually reread the inode chunks
>>> in xfs_iread vs. /* shortcut IO on inode allocation if possible */
>>
>> Hmmm, so a v5 fs mounted with ikeep will also read an inode chunk when
>> creating an inode.  It looks like we do that (instead of zeroing the
>> incore inode and setting a random i_generation) to preserve the existing
>> generation number?
>>
>> In any case, it's pretty clear that kernels have been writing out freed
>> inode cores with di_mode == 0, di_flags == 0, and di_extsize == (some
>> number) so we clearly can't have that in the verifier.  It looks like we
>> only examine di_extsize if either EXTSZ flag are set, so it's not
>> causing incorrect behavior.  Maybe it can be a preening fix in
>> scrub/repair.
>>
> 
> I just stumbled on this problem with xfs/229 that Eric reported. I'm
> confused by the comment above regarding this not causing incorrect
> behavior.

I think Darrick meant that having a nonzero extent size hint on disk
won't cause incorrect behavior because "we only examine di_extsize if
either EXTSZ flag are set"

-Eric

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier
  2018-08-20 15:27         ` Eric Sandeen
@ 2018-08-20 15:36           ` Darrick J. Wong
  2018-08-20 15:59             ` Brian Foster
  0 siblings, 1 reply; 36+ messages in thread
From: Darrick J. Wong @ 2018-08-20 15:36 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Brian Foster, Dave Chinner, linux-xfs

On Mon, Aug 20, 2018 at 10:27:42AM -0500, Eric Sandeen wrote:
> 
> 
> On 8/20/18 10:06 AM, Brian Foster wrote:
> > On Tue, Jul 24, 2018 at 09:43:46AM -0700, Darrick J. Wong wrote:
> >> On Mon, Jul 23, 2018 at 11:39:53PM -0700, Eric Sandeen wrote:
> >>> On 6/4/18 11:24 PM, Dave Chinner wrote:
> >>>> From: Dave Chinner <dchinner@redhat.com>
> >>>>
> >>>> There are rules for vald extent size hints. We enforce them when
> >>>> applications set them, but fuzzers violate those rules and that
> >>>> screws us over.
> >>>>
> >>>> This results in alignment assertion failures when setting up
> >>>> allocations such as this in direct IO:
> >>>>
> >>>> XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
> >>>> ....
> >>>> Call Trace:
> >>>>  xfs_bmap_btalloc+0x415/0x910
> >>>>  xfs_bmapi_write+0x71c/0x12e0
> >>>>  xfs_iomap_write_direct+0x2a9/0x420
> >>>>  xfs_file_iomap_begin+0x4dc/0xa70
> >>>>  iomap_apply+0x43/0x100
> >>>>  iomap_file_buffered_write+0x62/0x90
> >>>>  xfs_file_buffered_aio_write+0xba/0x300
> >>>>  __vfs_write+0xd5/0x150
> >>>>  vfs_write+0xb6/0x180
> >>>>  ksys_write+0x45/0xa0
> >>>>  do_syscall_64+0x5a/0x180
> >>>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >>>>
> >>>> And from xfs_db:
> >>>>
> >>>> core.extsize = 10380288
> >>>>
> >>>> Which is not an integer multiple of the block size, and so violates
> >>>> Rule #7 for setting extent size hints. Validate extent size hint
> >>>> rules in the inode verifier to catch this.
> >>>
> >>> So, I think that if I do:
> >>>
> >>> # mkfs.xfs -f -m crc=0 $TEST_DEV
> >>> # ./check xfs/229
> >>> # ./check xfs/229
> >>>
> >>> I trip the verifier, because I end up with freed inodes on disk with an
> >>> extent size hints but zeroed flags.  
> >>>
> >>> xfs_ifree sets di_flags = 0 but doesn't clear di_extsize; xfs_inode_validate_extsize
> >>> says if extsize !=0 and the hint flag is set, it fails
> >>>
> >>> Anyone else see this?
> >>
> >> Yeah, I think I just hit this on the TEST_DEV in xfs/242.
> >>
> >> git blame says I lifted the code from the scrub code, and I probably
> >> wrote the code having read the ioctl code (which clears the extsize
> >> field if the iflag isn't set).
> >>
> >>> (crc=0 needed because that causes us to actually reread the inode chunks
> >>> in xfs_iread vs. /* shortcut IO on inode allocation if possible */
> >>
> >> Hmmm, so a v5 fs mounted with ikeep will also read an inode chunk when
> >> creating an inode.  It looks like we do that (instead of zeroing the
> >> incore inode and setting a random i_generation) to preserve the existing
> >> generation number?
> >>
> >> In any case, it's pretty clear that kernels have been writing out freed
> >> inode cores with di_mode == 0, di_flags == 0, and di_extsize == (some
> >> number) so we clearly can't have that in the verifier.  It looks like we
> >> only examine di_extsize if either EXTSZ flag are set, so it's not
> >> causing incorrect behavior.  Maybe it can be a preening fix in
> >> scrub/repair.
> >>
> > 
> > I just stumbled on this problem with xfs/229 that Eric reported. I'm
> > confused by the comment above regarding this not causing incorrect
> > behavior.
> 
> I think Darrick meant that having a nonzero extent size hint on disk
> won't cause incorrect behavior because "we only examine di_extsize if
> either EXTSZ flag are set"

Yeah, he probably did. :)

I think Brian's suggestion of

if (i_mode != 0 && !hint && extsize != 0)
	barf_error();

sounds reasonable (having not tested that at all).

--D

> -Eric

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier
  2018-08-20 15:36           ` Darrick J. Wong
@ 2018-08-20 15:59             ` Brian Foster
  2018-08-20 22:15               ` Dave Chinner
  0 siblings, 1 reply; 36+ messages in thread
From: Brian Foster @ 2018-08-20 15:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Eric Sandeen, Dave Chinner, linux-xfs

On Mon, Aug 20, 2018 at 08:36:26AM -0700, Darrick J. Wong wrote:
> On Mon, Aug 20, 2018 at 10:27:42AM -0500, Eric Sandeen wrote:
> > 
> > 
> > On 8/20/18 10:06 AM, Brian Foster wrote:
> > > On Tue, Jul 24, 2018 at 09:43:46AM -0700, Darrick J. Wong wrote:
> > >> On Mon, Jul 23, 2018 at 11:39:53PM -0700, Eric Sandeen wrote:
> > >>> On 6/4/18 11:24 PM, Dave Chinner wrote:
> > >>>> From: Dave Chinner <dchinner@redhat.com>
> > >>>>
> > >>>> There are rules for vald extent size hints. We enforce them when
> > >>>> applications set them, but fuzzers violate those rules and that
> > >>>> screws us over.
> > >>>>
> > >>>> This results in alignment assertion failures when setting up
> > >>>> allocations such as this in direct IO:
> > >>>>
> > >>>> XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
> > >>>> ....
> > >>>> Call Trace:
> > >>>>  xfs_bmap_btalloc+0x415/0x910
> > >>>>  xfs_bmapi_write+0x71c/0x12e0
> > >>>>  xfs_iomap_write_direct+0x2a9/0x420
> > >>>>  xfs_file_iomap_begin+0x4dc/0xa70
> > >>>>  iomap_apply+0x43/0x100
> > >>>>  iomap_file_buffered_write+0x62/0x90
> > >>>>  xfs_file_buffered_aio_write+0xba/0x300
> > >>>>  __vfs_write+0xd5/0x150
> > >>>>  vfs_write+0xb6/0x180
> > >>>>  ksys_write+0x45/0xa0
> > >>>>  do_syscall_64+0x5a/0x180
> > >>>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >>>>
> > >>>> And from xfs_db:
> > >>>>
> > >>>> core.extsize = 10380288
> > >>>>
> > >>>> Which is not an integer multiple of the block size, and so violates
> > >>>> Rule #7 for setting extent size hints. Validate extent size hint
> > >>>> rules in the inode verifier to catch this.
> > >>>
> > >>> So, I think that if I do:
> > >>>
> > >>> # mkfs.xfs -f -m crc=0 $TEST_DEV
> > >>> # ./check xfs/229
> > >>> # ./check xfs/229
> > >>>
> > >>> I trip the verifier, because I end up with freed inodes on disk with an
> > >>> extent size hints but zeroed flags.  
> > >>>
> > >>> xfs_ifree sets di_flags = 0 but doesn't clear di_extsize; xfs_inode_validate_extsize
> > >>> says if extsize !=0 and the hint flag is set, it fails
> > >>>
> > >>> Anyone else see this?
> > >>
> > >> Yeah, I think I just hit this on the TEST_DEV in xfs/242.
> > >>
> > >> git blame says I lifted the code from the scrub code, and I probably
> > >> wrote the code having read the ioctl code (which clears the extsize
> > >> field if the iflag isn't set).
> > >>
> > >>> (crc=0 needed because that causes us to actually reread the inode chunks
> > >>> in xfs_iread vs. /* shortcut IO on inode allocation if possible */
> > >>
> > >> Hmmm, so a v5 fs mounted with ikeep will also read an inode chunk when
> > >> creating an inode.  It looks like we do that (instead of zeroing the
> > >> incore inode and setting a random i_generation) to preserve the existing
> > >> generation number?
> > >>
> > >> In any case, it's pretty clear that kernels have been writing out freed
> > >> inode cores with di_mode == 0, di_flags == 0, and di_extsize == (some
> > >> number) so we clearly can't have that in the verifier.  It looks like we
> > >> only examine di_extsize if either EXTSZ flag are set, so it's not
> > >> causing incorrect behavior.  Maybe it can be a preening fix in
> > >> scrub/repair.
> > >>
> > > 
> > > I just stumbled on this problem with xfs/229 that Eric reported. I'm
> > > confused by the comment above regarding this not causing incorrect
> > > behavior.
> > 
> > I think Darrick meant that having a nonzero extent size hint on disk
> > won't cause incorrect behavior because "we only examine di_extsize if
> > either EXTSZ flag are set"
> 
> Yeah, he probably did. :)
> 

Got it, thanks.

> I think Brian's suggestion of
> 
> if (i_mode != 0 && !hint && extsize != 0)
> 	barf_error();
> 
> sounds reasonable (having not tested that at all).
> 

I'll run it through xfstests and get it posted if nothing else fails.

BTW, do we have a similar issue with the cowextsize hint (assuming
v5+ikeep)? It looks like it's cleared similarly in xfs_ialloc(), but I'm
not sure if it's cleared somewhere else on free...

Brian

> --D
> 
> > -Eric

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier
  2018-08-20 15:59             ` Brian Foster
@ 2018-08-20 22:15               ` Dave Chinner
  2018-08-21 10:56                 ` Brian Foster
  0 siblings, 1 reply; 36+ messages in thread
From: Dave Chinner @ 2018-08-20 22:15 UTC (permalink / raw)
  To: Brian Foster; +Cc: Darrick J. Wong, Eric Sandeen, linux-xfs

On Mon, Aug 20, 2018 at 11:59:18AM -0400, Brian Foster wrote:
> On Mon, Aug 20, 2018 at 08:36:26AM -0700, Darrick J. Wong wrote:
> > On Mon, Aug 20, 2018 at 10:27:42AM -0500, Eric Sandeen wrote:
> > > On 8/20/18 10:06 AM, Brian Foster wrote:
> > > > On Tue, Jul 24, 2018 at 09:43:46AM -0700, Darrick J. Wong wrote:
> > > >> On Mon, Jul 23, 2018 at 11:39:53PM -0700, Eric Sandeen wrote:
> > > >>> On 6/4/18 11:24 PM, Dave Chinner wrote:
> > > >>>> From: Dave Chinner <dchinner@redhat.com>
> > > >>>>
> > > >>>> There are rules for vald extent size hints. We enforce them when
> > > >>>> applications set them, but fuzzers violate those rules and that
> > > >>>> screws us over.
> > > >>>>
> > > >>>> This results in alignment assertion failures when setting up
> > > >>>> allocations such as this in direct IO:
> > > >>>>
> > > >>>> XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
> > > >>>> ....
> > > >>>> Call Trace:
> > > >>>>  xfs_bmap_btalloc+0x415/0x910
> > > >>>>  xfs_bmapi_write+0x71c/0x12e0
> > > >>>>  xfs_iomap_write_direct+0x2a9/0x420
> > > >>>>  xfs_file_iomap_begin+0x4dc/0xa70
> > > >>>>  iomap_apply+0x43/0x100
> > > >>>>  iomap_file_buffered_write+0x62/0x90
> > > >>>>  xfs_file_buffered_aio_write+0xba/0x300
> > > >>>>  __vfs_write+0xd5/0x150
> > > >>>>  vfs_write+0xb6/0x180
> > > >>>>  ksys_write+0x45/0xa0
> > > >>>>  do_syscall_64+0x5a/0x180
> > > >>>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > >>>>
> > > >>>> And from xfs_db:
> > > >>>>
> > > >>>> core.extsize = 10380288
> > > >>>>
> > > >>>> Which is not an integer multiple of the block size, and so violates
> > > >>>> Rule #7 for setting extent size hints. Validate extent size hint
> > > >>>> rules in the inode verifier to catch this.
> > > >>>
> > > >>> So, I think that if I do:
> > > >>>
> > > >>> # mkfs.xfs -f -m crc=0 $TEST_DEV
> > > >>> # ./check xfs/229
> > > >>> # ./check xfs/229
> > > >>>
> > > >>> I trip the verifier, because I end up with freed inodes on disk with an
> > > >>> extent size hints but zeroed flags.  
> > > >>>
> > > >>> xfs_ifree sets di_flags = 0 but doesn't clear di_extsize; xfs_inode_validate_extsize
> > > >>> says if extsize !=0 and the hint flag is set, it fails
> > > >>>
> > > >>> Anyone else see this?
> > > >>
> > > >> Yeah, I think I just hit this on the TEST_DEV in xfs/242.
> > > >>
> > > >> git blame says I lifted the code from the scrub code, and I probably
> > > >> wrote the code having read the ioctl code (which clears the extsize
> > > >> field if the iflag isn't set).
> > > >>
> > > >>> (crc=0 needed because that causes us to actually reread the inode chunks
> > > >>> in xfs_iread vs. /* shortcut IO on inode allocation if possible */
> > > >>
> > > >> Hmmm, so a v5 fs mounted with ikeep will also read an inode chunk when
> > > >> creating an inode.  It looks like we do that (instead of zeroing the
> > > >> incore inode and setting a random i_generation) to preserve the existing
> > > >> generation number?
> > > >>
> > > >> In any case, it's pretty clear that kernels have been writing out freed
> > > >> inode cores with di_mode == 0, di_flags == 0, and di_extsize == (some
> > > >> number) so we clearly can't have that in the verifier.  It looks like we
> > > >> only examine di_extsize if either EXTSZ flag are set, so it's not
> > > >> causing incorrect behavior.  Maybe it can be a preening fix in
> > > >> scrub/repair.
> > > >>
> > > > 
> > > > I just stumbled on this problem with xfs/229 that Eric reported. I'm
> > > > confused by the comment above regarding this not causing incorrect
> > > > behavior.
> > > 
> > > I think Darrick meant that having a nonzero extent size hint on disk
> > > won't cause incorrect behavior because "we only examine di_extsize if
> > > either EXTSZ flag are set"
> > 
> > Yeah, he probably did. :)
> > 
> 
> Got it, thanks.
> 
> > I think Brian's suggestion of
> > 
> > if (i_mode != 0 && !hint && extsize != 0)
> > 	barf_error();
> > 
> > sounds reasonable (having not tested that at all).
> > 
> 
> I'll run it through xfstests and get it posted if nothing else fails.
> 
> BTW, do we have a similar issue with the cowextsize hint (assuming
> v5+ikeep)? It looks like it's cleared similarly in xfs_ialloc(), but I'm
> not sure if it's cleared somewhere else on free...

We should clear them on free now, so that we can draw a line in the
sand for when we can have verifiers check it. e.g. when the next
feature bit gets introduced, filesystems with that feature bit set
can also verify the extent size hints are zero on freed inodes
because we know that kernels supporting that feature always zero
them on free....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier
  2018-08-20 22:15               ` Dave Chinner
@ 2018-08-21 10:56                 ` Brian Foster
  2018-08-22  0:41                   ` Dave Chinner
  0 siblings, 1 reply; 36+ messages in thread
From: Brian Foster @ 2018-08-21 10:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, Eric Sandeen, linux-xfs

On Tue, Aug 21, 2018 at 08:15:06AM +1000, Dave Chinner wrote:
> On Mon, Aug 20, 2018 at 11:59:18AM -0400, Brian Foster wrote:
> > On Mon, Aug 20, 2018 at 08:36:26AM -0700, Darrick J. Wong wrote:
> > > On Mon, Aug 20, 2018 at 10:27:42AM -0500, Eric Sandeen wrote:
> > > > On 8/20/18 10:06 AM, Brian Foster wrote:
> > > > > On Tue, Jul 24, 2018 at 09:43:46AM -0700, Darrick J. Wong wrote:
> > > > >> On Mon, Jul 23, 2018 at 11:39:53PM -0700, Eric Sandeen wrote:
> > > > >>> On 6/4/18 11:24 PM, Dave Chinner wrote:
> > > > >>>> From: Dave Chinner <dchinner@redhat.com>
> > > > >>>>
> > > > >>>> There are rules for vald extent size hints. We enforce them when
> > > > >>>> applications set them, but fuzzers violate those rules and that
> > > > >>>> screws us over.
> > > > >>>>
> > > > >>>> This results in alignment assertion failures when setting up
> > > > >>>> allocations such as this in direct IO:
> > > > >>>>
> > > > >>>> XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
> > > > >>>> ....
> > > > >>>> Call Trace:
> > > > >>>>  xfs_bmap_btalloc+0x415/0x910
> > > > >>>>  xfs_bmapi_write+0x71c/0x12e0
> > > > >>>>  xfs_iomap_write_direct+0x2a9/0x420
> > > > >>>>  xfs_file_iomap_begin+0x4dc/0xa70
> > > > >>>>  iomap_apply+0x43/0x100
> > > > >>>>  iomap_file_buffered_write+0x62/0x90
> > > > >>>>  xfs_file_buffered_aio_write+0xba/0x300
> > > > >>>>  __vfs_write+0xd5/0x150
> > > > >>>>  vfs_write+0xb6/0x180
> > > > >>>>  ksys_write+0x45/0xa0
> > > > >>>>  do_syscall_64+0x5a/0x180
> > > > >>>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > >>>>
> > > > >>>> And from xfs_db:
> > > > >>>>
> > > > >>>> core.extsize = 10380288
> > > > >>>>
> > > > >>>> Which is not an integer multiple of the block size, and so violates
> > > > >>>> Rule #7 for setting extent size hints. Validate extent size hint
> > > > >>>> rules in the inode verifier to catch this.
> > > > >>>
> > > > >>> So, I think that if I do:
> > > > >>>
> > > > >>> # mkfs.xfs -f -m crc=0 $TEST_DEV
> > > > >>> # ./check xfs/229
> > > > >>> # ./check xfs/229
> > > > >>>
> > > > >>> I trip the verifier, because I end up with freed inodes on disk with an
> > > > >>> extent size hints but zeroed flags.  
> > > > >>>
> > > > >>> xfs_ifree sets di_flags = 0 but doesn't clear di_extsize; xfs_inode_validate_extsize
> > > > >>> says if extsize !=0 and the hint flag is set, it fails
> > > > >>>
> > > > >>> Anyone else see this?
> > > > >>
> > > > >> Yeah, I think I just hit this on the TEST_DEV in xfs/242.
> > > > >>
> > > > >> git blame says I lifted the code from the scrub code, and I probably
> > > > >> wrote the code having read the ioctl code (which clears the extsize
> > > > >> field if the iflag isn't set).
> > > > >>
> > > > >>> (crc=0 needed because that causes us to actually reread the inode chunks
> > > > >>> in xfs_iread vs. /* shortcut IO on inode allocation if possible */
> > > > >>
> > > > >> Hmmm, so a v5 fs mounted with ikeep will also read an inode chunk when
> > > > >> creating an inode.  It looks like we do that (instead of zeroing the
> > > > >> incore inode and setting a random i_generation) to preserve the existing
> > > > >> generation number?
> > > > >>
> > > > >> In any case, it's pretty clear that kernels have been writing out freed
> > > > >> inode cores with di_mode == 0, di_flags == 0, and di_extsize == (some
> > > > >> number) so we clearly can't have that in the verifier.  It looks like we
> > > > >> only examine di_extsize if either EXTSZ flag are set, so it's not
> > > > >> causing incorrect behavior.  Maybe it can be a preening fix in
> > > > >> scrub/repair.
> > > > >>
> > > > > 
> > > > > I just stumbled on this problem with xfs/229 that Eric reported. I'm
> > > > > confused by the comment above regarding this not causing incorrect
> > > > > behavior.
> > > > 
> > > > I think Darrick meant that having a nonzero extent size hint on disk
> > > > won't cause incorrect behavior because "we only examine di_extsize if
> > > > either EXTSZ flag are set"
> > > 
> > > Yeah, he probably did. :)
> > > 
> > 
> > Got it, thanks.
> > 
> > > I think Brian's suggestion of
> > > 
> > > if (i_mode != 0 && !hint && extsize != 0)
> > > 	barf_error();
> > > 
> > > sounds reasonable (having not tested that at all).
> > > 
> > 
> > I'll run it through xfstests and get it posted if nothing else fails.
> > 
> > BTW, do we have a similar issue with the cowextsize hint (assuming
> > v5+ikeep)? It looks like it's cleared similarly in xfs_ialloc(), but I'm
> > not sure if it's cleared somewhere else on free...
> 

I should note for the list that we've since determined this was already
fixed in v4.18 [1]. The patch ended up in a common base branch between
what is used for upstream pull requests and XFS' for-next, being left
out of the latter just by accident.

[1] d4a34e1655 ("xfs: properly handle free inodes in extent hint
validators")

> We should clear them on free now, so that we can draw a line in the
> sand for when we can have verifiers check it. e.g. when the next
> feature bit gets introduced, filesystems with that feature bit set
> can also verify the extent size hints are zero on freed inodes
> because we know that kernels supporting that feature always zero
> them on free....
> 

That seems fine (and harmless) to me if the goal is ultimately to have
this content clear on-disk. It keeps things consistent for verifiers,
scrub, repair, etc. to not have some bits with required initialized
values and others where we need to accommodate stale data.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier
  2018-08-21 10:56                 ` Brian Foster
@ 2018-08-22  0:41                   ` Dave Chinner
  0 siblings, 0 replies; 36+ messages in thread
From: Dave Chinner @ 2018-08-22  0:41 UTC (permalink / raw)
  To: Brian Foster; +Cc: Darrick J. Wong, Eric Sandeen, linux-xfs

On Tue, Aug 21, 2018 at 06:56:45AM -0400, Brian Foster wrote:
> On Tue, Aug 21, 2018 at 08:15:06AM +1000, Dave Chinner wrote:
> > We should clear them on free now, so that we can draw a line in the
> > sand for when we can have verifiers check it. e.g. when the next
> > feature bit gets introduced, filesystems with that feature bit set
> > can also verify the extent size hints are zero on freed inodes
> > because we know that kernels supporting that feature always zero
> > them on free....
> > 
> 
> That seems fine (and harmless) to me if the goal is ultimately to have
> this content clear on-disk. It keeps things consistent for verifiers,
> scrub, repair, etc. to not have some bits with required initialized
> values and others where we need to accommodate stale data.

*nod*

One of my original goals for all the online repair work was to
ensure that everything on disk was clean, sanitised and not
ambiguous in any way. The verifiers are a mechanism that helps us
identify things that aren't clean or sanitised, and working out how
to verify those things identifies ambiguities we need to fix. Such
as this one with freed inodes.

As I said to Eric on #xfs:

<dchinner_> We're well beyond the point where we trust a single field
	    in an on-disk structure to be the One True Source of Truth
<dchinner_> we want everything on disk to be correct and clean
<dchinner_> so we can make decisions like this correctly via
	    multi-field verifications


-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2018-08-22  4:03 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-05  6:24 [PATCH 0/6 V2] xfs: more verifications! Dave Chinner
2018-06-05  6:24 ` [PATCH 1/6] xfs: catch bad stripe alignment configurations Dave Chinner
2018-06-05  9:27   ` Carlos Maiolino
2018-06-05  6:24 ` [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier Dave Chinner
2018-06-05  9:53   ` Carlos Maiolino
2018-06-05 22:56     ` Dave Chinner
2018-06-05 17:10   ` Darrick J. Wong
2018-06-07 16:16     ` Darrick J. Wong
2018-06-08  1:10       ` Dave Chinner
2018-06-08  1:23         ` Darrick J. Wong
2018-06-08  2:23           ` Eric Sandeen
2018-07-24  6:39   ` Eric Sandeen
2018-07-24 16:43     ` Darrick J. Wong
2018-08-20 15:06       ` Brian Foster
2018-08-20 15:27         ` Eric Sandeen
2018-08-20 15:36           ` Darrick J. Wong
2018-08-20 15:59             ` Brian Foster
2018-08-20 22:15               ` Dave Chinner
2018-08-21 10:56                 ` Brian Foster
2018-08-22  0:41                   ` Dave Chinner
2018-06-05  6:24 ` [PATCH 3/6] xfs: verify COW " Dave Chinner
2018-06-05 10:00   ` Carlos Maiolino
2018-06-05 17:09   ` Darrick J. Wong
2018-06-05  6:24 ` [PATCH 4/6] xfs: validate btree records on retreival Dave Chinner
2018-06-05  6:40   ` [PATCH 4/6 v2] " Dave Chinner
2018-06-05 10:42     ` Carlos Maiolino
2018-06-05 23:00       ` Dave Chinner
2018-06-05 17:47     ` Darrick J. Wong
2018-06-05 23:02       ` Dave Chinner
2018-06-06  1:21     ` [PATCH 4/6 v3] " Dave Chinner
2018-06-05  6:24 ` [PATCH 5/6] xfs: verify root inode more thoroughly Dave Chinner
2018-06-05 10:50   ` Carlos Maiolino
2018-06-05 17:10   ` Darrick J. Wong
2018-06-05  6:24 ` [PATCH 6/6] xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode() Dave Chinner
2018-06-05 11:12   ` Carlos Maiolino
2018-06-05 17:11   ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.