All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/10] xfsprogs August 2015 patchbomb
@ 2015-08-15  1:43 Darrick J. Wong
  2015-08-15  1:43 ` [PATCH 01/10] libxfs: readahead of dir3 data blocks should use the read verifier Darrick J. Wong
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-15  1:43 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Hi all,

This is a rollup of various fixes for xfsprogs 4.2.0-rc1.

The first patch fixes the dir3 data block verifier to use the standard
read verifiers during readahead.  Just a resync with the kernel, since
nothing in xfsprogs uses libxfs readahead.

The second patch fixes xfs_db's inode command not to crash when the
user tries to navigate to a corrupt inode.

The third patch fixes a bug in xfs_repair wherein if xfs_repair fixes
a broken xattr block and later decides to unmap the block, the
"repaired" flag inadvertently prohibits the unmapping of that block.

The fourth patch fixes sign handling mistakes when dealing with the
b_error field in a xfs_buf -- error values in this field are negative,
so all checks and assignments must be as well.

The fifth patch forces prefetch to mark corrupt bmbt blocks UNCHECKED
so that the regular bmbt examination will fix the bad CRC if it
doesn't take any other action against the block.  Without this, a
corruption in the unused area will trigger a kernel error yet never
get fixed by repair.

The sixth patch fixes an obscure problem in xfs_repair -- when
prefetch is enabled and there exists a directory with multiple
corrupted blocks, it's possible that both corrupt blocks will be read
in and marked UNCHECKED.  If either the first corrupt block is so
badly damaged so as to cause the directory to be erased, or if the
second corrupt block is an index block (which is automatically
unmapped when the directory is rebuilt), the second block's buffer
will remain UNCHECKED.  If the block is then allocated to something
else (say lost+found), the next readbuf of the block will fail because
nobody clears UNCHECKED and CRCs are only set during writebufr, which
only happens when repair finishes its examination.

Patch 7 implements a 'reflink' and 'dedupe' command in xfs_io.  This
will be used in future xfstests to test reflink and dedupe features of
btrfs and xfs filesystems.

Patch 8 fixes xfs_db/blocktrash to not fail write verification when
corrupting a block and allows trashing of log and symlink blocks.

Patch 9 enhances the blocktrash command with a '-z' option that
trashes the block at the top of the cursor stack and doesn't require
blockget to have been run.

Patch 10 implements blockget for v5 filesystems.  This is a second try
at a previous patch which didn't quite catch all the new magic numbers
and had some problems iterating directory index data.

I've tested these xfsprogs changes against the for-next branch as of
8/03.  The rmap/reflink patches will be dealt with separately.

Scary rewound github repo with everything attached:
https://github.com/djwong/xfsprogs

Comments and questions are, as always, welcome.

--D

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 01/10] libxfs: readahead of dir3 data blocks should use the read verifier
  2015-08-15  1:43 [PATCH 00/10] xfsprogs August 2015 patchbomb Darrick J. Wong
@ 2015-08-15  1:43 ` Darrick J. Wong
  2015-08-17 18:31   ` Eric Sandeen
  2015-08-15  1:43 ` [PATCH 02/10] xfs_db: don't crash on a corrupt inode Darrick J. Wong
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-15  1:43 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

In the dir3 data block readahead function, use the regular read
verifier to check the block's CRC and spot-check the block contents
instead of calling the spot-checking routine directly.  This prevents
corrupted directory data blocks from being read into the kernel, which
can lead to garbage ls output and directory loops (if say one of the
entries contains invalid characters).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/xfs_dir2_data.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_dir2_data.c b/libxfs/xfs_dir2_data.c
index c475ba8..466e096 100644
--- a/libxfs/xfs_dir2_data.c
+++ b/libxfs/xfs_dir2_data.c
@@ -250,7 +250,8 @@ xfs_dir3_data_reada_verify(
 		return;
 	case cpu_to_be32(XFS_DIR2_DATA_MAGIC):
 	case cpu_to_be32(XFS_DIR3_DATA_MAGIC):
-		xfs_dir3_data_verify(bp);
+		bp->b_ops = &xfs_dir3_block_buf_ops;
+		bp->b_ops->verify_read(bp);
 		return;
 	default:
 		xfs_buf_ioerror(bp, -EFSCORRUPTED);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 02/10] xfs_db: don't crash on a corrupt inode
  2015-08-15  1:43 [PATCH 00/10] xfsprogs August 2015 patchbomb Darrick J. Wong
  2015-08-15  1:43 ` [PATCH 01/10] libxfs: readahead of dir3 data blocks should use the read verifier Darrick J. Wong
@ 2015-08-15  1:43 ` Darrick J. Wong
  2015-08-17 18:52   ` Eric Sandeen
  2015-08-15  1:43 ` [PATCH 03/10] xfs_repair: ignore "repaired" flag after we decide to clear xattr block Darrick J. Wong
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-15  1:43 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

If the user selects a corrupt inode via the 'inode XXX' command, the
read verifier will fail and the io cursor at the top of the ring will
not have any data attached.  When this is the case, we cannot
dereference the NULL pointer or xfs_db will crash.  Therefore, check
the buffer pointer before using it.

It's arguable that we ought to retry the read without the verifiers
if the inode is corrupt or fails CRC, since this /is/ a debugging
tool, and maybe you wanted the contents anyway.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/inode.c |    2 ++
 1 file changed, 2 insertions(+)


diff --git a/db/inode.c b/db/inode.c
index e86dabd..64b263b 100644
--- a/db/inode.c
+++ b/db/inode.c
@@ -682,6 +682,8 @@ set_cur_inode(
 	set_cur(&typtab[TYP_INODE], XFS_AGB_TO_DADDR(mp, agno, cluster_agbno),
 		numblks, DB_RING_IGN, NULL);
 	off_cur(offset << mp->m_sb.sb_inodelog, mp->m_sb.sb_inodesize);
+	if (!iocur_top->data)
+		return;
 	dip = iocur_top->data;
 	iocur_top->ino_buf = 1;
 	iocur_top->ino = ino;

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 03/10] xfs_repair: ignore "repaired" flag after we decide to clear xattr block
  2015-08-15  1:43 [PATCH 00/10] xfsprogs August 2015 patchbomb Darrick J. Wong
  2015-08-15  1:43 ` [PATCH 01/10] libxfs: readahead of dir3 data blocks should use the read verifier Darrick J. Wong
  2015-08-15  1:43 ` [PATCH 02/10] xfs_db: don't crash on a corrupt inode Darrick J. Wong
@ 2015-08-15  1:43 ` Darrick J. Wong
  2015-08-17 19:20   ` Eric Sandeen
  2015-08-15  1:44 ` [PATCH 04/10] xfs_repair: fix broken EFSBADCRC/EFSCORRUPTED usage with buffer errors Darrick J. Wong
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-15  1:43 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

If in the course of examining extended attribute block contents we
first decide to repair an entry (*repair = 1) but secondly decide to
clear the whole block, set *repair = 0 because the clearing action
only happens if *repair == 0.  Put another way, if we're nuking a
block, don't pretend like we've fixed it too.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/attr_repair.c |   14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)


diff --git a/repair/attr_repair.c b/repair/attr_repair.c
index 62f80e7..2bd9334 100644
--- a/repair/attr_repair.c
+++ b/repair/attr_repair.c
@@ -1311,6 +1311,13 @@ process_leaf_attr_block(
 		* we can add it then.
 		*/
 	}
+	/*
+	 * If we're just going to zap the block, don't pretend like we
+	 * repaired it, because repairing the block stops the clear
+	 * operation.
+	 */
+	if (clearit)
+		*repair = 0;
 	if (*repair)
 		xfs_attr3_leaf_hdr_to_disk(mp->m_attr_geo, leaf, &leafhdr);
 
@@ -1524,6 +1531,7 @@ process_longform_attr(
 	xfs_dahash_t	next_hashval;
 	int		repairlinks = 0;
 	struct xfs_attr3_icleaf_hdr leafhdr;
+	int		error;
 
 	*repair = 0;
 
@@ -1604,12 +1612,16 @@ process_longform_attr(
 			libxfs_writebuf(bp, 0);
 		} else
 			libxfs_putbuf(bp);
-		return (process_node_attr(mp, ino, dip, blkmap)); /* + repair */
+		error = process_node_attr(mp, ino, dip, blkmap); /* + repair */
+		if (error)
+			*repair = 0;
+		return error;
 	default:
 		do_warn(
 	_("bad attribute leaf magic # %#x for dir ino %" PRIu64 "\n"),
 			be16_to_cpu(leaf->hdr.info.magic), ino);
 		libxfs_putbuf(bp);
+		*repair = 0;
 		return(1);
 	}
 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 04/10] xfs_repair: fix broken EFSBADCRC/EFSCORRUPTED usage with buffer errors
  2015-08-15  1:43 [PATCH 00/10] xfsprogs August 2015 patchbomb Darrick J. Wong
                   ` (2 preceding siblings ...)
  2015-08-15  1:43 ` [PATCH 03/10] xfs_repair: ignore "repaired" flag after we decide to clear xattr block Darrick J. Wong
@ 2015-08-15  1:44 ` Darrick J. Wong
  2015-08-17 19:51   ` Eric Sandeen
  2015-08-15  1:44 ` [PATCH 05/10] xfs_repair: force not-so-bad bmbt blocks back through the verifier Darrick J. Wong
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-15  1:44 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

When we encounter CRC or verifier errors, bp->b_error is set to
-EFSBADCRC and -EFSCORRUPTED; note the negative sign.  For whatever
reason, repair and db use the positive versions, and therefore fail to
notice the error, so fix all the broken uses.

Note however that the db and repair turn the negative codes returned
by libxfs into positive codes that can be used with strerror.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/attr.c     |    4 ++--
 db/dir2.c     |    4 ++--
 db/io.c       |    4 ++--
 db/io.h       |    2 +-
 repair/dir2.c |    2 +-
 repair/scan.c |   12 ++++++------
 6 files changed, 14 insertions(+), 14 deletions(-)


diff --git a/db/attr.c b/db/attr.c
index 897834b..5e69100 100644
--- a/db/attr.c
+++ b/db/attr.c
@@ -554,7 +554,7 @@ xfs_attr3_db_read_verify(
 		break;
 	default:
 		dbprintf(_("Unknown attribute buffer type!\n"));
-		xfs_buf_ioerror(bp, EFSCORRUPTED);
+		xfs_buf_ioerror(bp, -EFSCORRUPTED);
 		return;
 	}
 verify:
@@ -566,7 +566,7 @@ xfs_attr3_db_write_verify(
 	struct xfs_buf		*bp)
 {
 	dbprintf(_("Writing unknown attribute buffer type!\n"));
-	xfs_buf_ioerror(bp, EFSCORRUPTED);
+	xfs_buf_ioerror(bp, -EFSCORRUPTED);
 }
 
 const struct xfs_buf_ops xfs_attr3_db_buf_ops = {
diff --git a/db/dir2.c b/db/dir2.c
index 7f69e6f..cc76662 100644
--- a/db/dir2.c
+++ b/db/dir2.c
@@ -1021,7 +1021,7 @@ xfs_dir3_db_read_verify(
 		break;
 	default:
 		dbprintf(_("Unknown directory buffer type!\n"));
-		xfs_buf_ioerror(bp, EFSCORRUPTED);
+		xfs_buf_ioerror(bp, -EFSCORRUPTED);
 		return;
 	}
 verify:
@@ -1033,7 +1033,7 @@ xfs_dir3_db_write_verify(
 	struct xfs_buf		*bp)
 {
 	dbprintf(_("Writing unknown directory buffer type!\n"));
-	xfs_buf_ioerror(bp, EFSCORRUPTED);
+	xfs_buf_ioerror(bp, -EFSCORRUPTED);
 }
 
 const struct xfs_buf_ops xfs_dir3_db_buf_ops = {
diff --git a/db/io.c b/db/io.c
index 9fa52b8..9452e07 100644
--- a/db/io.c
+++ b/db/io.c
@@ -535,8 +535,8 @@ set_cur(
 	 * Keep the buffer even if the verifier says it is corrupted.
 	 * We're a diagnostic tool, after all.
 	 */
-	if (!bp || (bp->b_error && bp->b_error != EFSCORRUPTED &&
-				   bp->b_error != EFSBADCRC))
+	if (!bp || (bp->b_error && bp->b_error != -EFSCORRUPTED &&
+				   bp->b_error != -EFSBADCRC))
 		return;
 	iocur_top->buf = bp->b_addr;
 	iocur_top->bp = bp;
diff --git a/db/io.h b/db/io.h
index 31d96b4..6201d7b 100644
--- a/db/io.h
+++ b/db/io.h
@@ -75,6 +75,6 @@ iocur_crc_valid()
 		return -1;
 	if (iocur_top->bp->b_flags & LIBXFS_B_UNCHECKED)
 		return -1;
-	return (iocur_top->bp->b_error != EFSBADCRC &&
+	return (iocur_top->bp->b_error != -EFSBADCRC &&
 		(!iocur_top->ino_buf || iocur_top->ino_crc_ok));
 }
diff --git a/repair/dir2.c b/repair/dir2.c
index 187e069..a5646f8 100644
--- a/repair/dir2.c
+++ b/repair/dir2.c
@@ -199,7 +199,7 @@ _("bad dir magic number 0x%x in inode %" PRIu64 " bno = %u\n"),
 			goto error_out;
 		}
 		/* corrupt node; rebuild the dir. */
-		if (bp->b_error == EFSBADCRC || bp->b_error == EFSCORRUPTED) {
+		if (bp->b_error == -EFSBADCRC || bp->b_error == -EFSCORRUPTED) {
 			do_warn(
 _("corrupt tree block %u for directory inode %" PRIu64 "\n"),
 				bno, da_cursor->ino);
diff --git a/repair/scan.c b/repair/scan.c
index 58f45eb..1e7a4da 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -82,7 +82,7 @@ scan_sbtree(
 		do_error(_("can't read btree block %d/%d\n"), agno, root);
 		return;
 	}
-	if (bp->b_error == EFSBADCRC || bp->b_error == EFSCORRUPTED) {
+	if (bp->b_error == -EFSBADCRC || bp->b_error == -EFSCORRUPTED) {
 		do_warn(_("btree block %d/%d is suspect, error %d\n"),
 			agno, root, bp->b_error);
 		suspect = 1;
@@ -145,7 +145,7 @@ scan_lbtree(
 	 * is a corruption or not and whether it got corrected and so needs
 	 * writing back. CRC errors always imply we need to write the block.
 	 */
-	if (bp->b_error == EFSBADCRC) {
+	if (bp->b_error == -EFSBADCRC) {
 		do_warn(_("btree block %d/%d is suspect, error %d\n"),
 			XFS_FSB_TO_AGNO(mp, root),
 			XFS_FSB_TO_AGBNO(mp, root), bp->b_error);
@@ -1432,7 +1432,7 @@ scan_freelist(
 		do_abort(_("can't read agfl block for ag %d\n"), agno);
 		return;
 	}
-	if (agflbuf->b_error == EFSBADCRC)
+	if (agflbuf->b_error == -EFSBADCRC)
 		do_warn(_("agfl has bad CRC for ag %d\n"), agno);
 
 	freelist = XFS_BUF_TO_AGFL_BNO(mp, agflbuf);
@@ -1705,9 +1705,9 @@ scan_ag(
 	 * immediately, though.
 	 */
 	if (!no_modify) {
-		agi_dirty += (agibuf->b_error == EFSBADCRC);
-		agf_dirty += (agfbuf->b_error == EFSBADCRC);
-		sb_dirty += (sbbuf->b_error == EFSBADCRC);
+		agi_dirty += (agibuf->b_error == -EFSBADCRC);
+		agf_dirty += (agfbuf->b_error == -EFSBADCRC);
+		sb_dirty += (sbbuf->b_error == -EFSBADCRC);
 	}
 
 	if (agi_dirty && !no_modify)

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 05/10] xfs_repair: force not-so-bad bmbt blocks back through the verifier
  2015-08-15  1:43 [PATCH 00/10] xfsprogs August 2015 patchbomb Darrick J. Wong
                   ` (3 preceding siblings ...)
  2015-08-15  1:44 ` [PATCH 04/10] xfs_repair: fix broken EFSBADCRC/EFSCORRUPTED usage with buffer errors Darrick J. Wong
@ 2015-08-15  1:44 ` Darrick J. Wong
  2015-08-17 21:14   ` Eric Sandeen
  2015-08-15  1:44 ` [PATCH 06/10] xfs_repair: mark unreachable prefetched metadata blocks stale Darrick J. Wong
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-15  1:44 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

If during prefetch we encounter a bmbt block that fails the CRC check
due to corruption in the unused part of the block, force the buffer
back through the non-prefetch verifiers later so that the CRC is
updated.  Otherwise, the bad checksum goes unfixed and the kernel will
still flag the bmbt block as invalid.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/prefetch.c |    8 ++++++++
 1 file changed, 8 insertions(+)


diff --git a/repair/prefetch.c b/repair/prefetch.c
index 8b261ae..fc7097f 100644
--- a/repair/prefetch.c
+++ b/repair/prefetch.c
@@ -276,6 +276,14 @@ pf_scan_lbtree(
 
 	XFS_BUF_SET_PRIORITY(bp, isadir ? B_DIR_BMAP : B_BMAP);
 
+	/*
+	 * Make this bmbt buffer go back through the verifiers later so that
+	 * we correct checksum errors stemming from bitflips in the unused
+	 * parts of the bmbt block.
+	 */
+	if (bp->b_error == -EFSBADCRC || bp->b_error == -EFSCORRUPTED)
+		bp->b_flags |= LIBXFS_B_UNCHECKED;
+
 	rc = (*func)(XFS_BUF_TO_BLOCK(bp), level - 1, isadir, args);
 
 	libxfs_putbuf(bp);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 06/10] xfs_repair: mark unreachable prefetched metadata blocks stale
  2015-08-15  1:43 [PATCH 00/10] xfsprogs August 2015 patchbomb Darrick J. Wong
                   ` (4 preceding siblings ...)
  2015-08-15  1:44 ` [PATCH 05/10] xfs_repair: force not-so-bad bmbt blocks back through the verifier Darrick J. Wong
@ 2015-08-15  1:44 ` Darrick J. Wong
  2015-08-15  1:44 ` [PATCH 07/10] xfs_io: support reflinking and deduping file ranges Darrick J. Wong
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-15  1:44 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

When we're running xfs_repair with prefetch enabled, it's possible
that repair will decide to clear an inode without examining all
metadata blocks owned by that inode.  This leaves the unreferenced
prefetched buffers marked UNCHECKED, which will cause a subsequent CRC
error if the block is reallocated to a different structure and read
more than once.  Typically this happens when a large directory is
corrupted and lost+found has to grow to accomodate all the
disconnected inodes.

Therefore, clear the UNCHECKED flag and set the STALE flag to get rid
of the CRC errors and ensure that the blocks aren't written back out
to disk without first being marked dirty.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase4.c |    5 +++++
 repair/phase6.c |    4 ++++
 repair/scan.c   |   36 ++++++++++++++++++++++++++++++++++++
 repair/scan.h   |    2 ++
 4 files changed, 47 insertions(+)


diff --git a/repair/phase4.c b/repair/phase4.c
index e0571e8..13b2946 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -30,6 +30,7 @@
 #include "versions.h"
 #include "dir2.h"
 #include "progress.h"
+#include "scan.h"
 
 
 /*
@@ -300,6 +301,10 @@ phase4(xfs_mount_t *mp)
 	 * already in phase 3.
 	 */
 	process_ags(mp);
+
+	/* Mark stale anything we didn't get to. */
+	mark_unchecked_buffers_stale();
+
 	print_final_rpt();
 
 	/*
diff --git a/repair/phase6.c b/repair/phase6.c
index 467f119..5edaa30 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -29,6 +29,7 @@
 #include "dinode.h"
 #include "progress.h"
 #include "versions.h"
+#include "scan.h"
 
 static struct cred		zerocr;
 static struct fsxattr 		zerofsx;
@@ -3312,6 +3313,9 @@ _("        - resetting contents of realtime bitmap and summary inodes\n"));
 	 */
 	traverse_ags(mp);
 
+	/* Mark stale anything we didn't get to. */
+	mark_unchecked_buffers_stale();
+
 	/*
 	 * any directories that had updated ".." entries, rebuild them now
 	 */
diff --git a/repair/scan.c b/repair/scan.c
index 1e7a4da..431fd24 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -29,6 +29,7 @@
 #include "bmap.h"
 #include "progress.h"
 #include "threads.h"
+#include "cache.h"
 
 static xfs_mount_t	*mp = NULL;
 
@@ -1804,3 +1805,38 @@ scan_ags(
 	}
 }
 
+static void
+mark_buf_stale(
+	struct cache_node	*cn)
+{
+	struct xfs_buf		*bp = (struct xfs_buf *)cn;
+
+	if (bp->b_flags & LIBXFS_B_UNCHECKED) {
+		bp->b_flags &= ~LIBXFS_B_UNCHECKED;
+		bp->b_flags |= LIBXFS_B_STALE;
+	}
+}
+
+/*
+ * Find unchecked buffers and mark them checked and stale.
+ *
+ * When prefetch is enabled, buffers will be marked unchecked if they fail
+ * verification.  Actually examining the block clears the unchecked flag, so
+ * any buffer still unchecked at the end of the examination represents an
+ * unreachable block.  A block that was reachable during prefetch but isn't
+ * by the end of the examination was owned by something that was freed as
+ * part of the exam.  Therefore, the buffer can be considered free.  Therefore,
+ * set the stale flag so that getbuf and readbuf know to zero the buffer
+ * contents the next time the buffer is accessed.
+ *
+ * This also fixes the problem that repair reports CRC errors if the block is
+ * subsequently allocated to something else, reinitialized, and re-read.  This
+ * can happen if a directory with a corrupt dir3 leaf block is erased and the
+ * leaf block gets reused to grow lost+found during phase 7.
+ */
+void
+mark_unchecked_buffers_stale(void)
+{
+	cache_walk(libxfs_bcache, mark_buf_stale);
+}
+
diff --git a/repair/scan.h b/repair/scan.h
index ea8c0bf..d232a54 100644
--- a/repair/scan.h
+++ b/repair/scan.h
@@ -70,4 +70,6 @@ scan_ags(
 	struct xfs_mount	*mp,
 	int			scan_threads);
 
+void mark_unchecked_buffers_stale(void);
+
 #endif /* _XR_SCAN_H */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 07/10] xfs_io: support reflinking and deduping file ranges
  2015-08-15  1:43 [PATCH 00/10] xfsprogs August 2015 patchbomb Darrick J. Wong
                   ` (5 preceding siblings ...)
  2015-08-15  1:44 ` [PATCH 06/10] xfs_repair: mark unreachable prefetched metadata blocks stale Darrick J. Wong
@ 2015-08-15  1:44 ` Darrick J. Wong
  2015-08-15  1:44 ` [PATCH 08/10] xfs_db: enable blocktrash for checksummed filesystems Darrick J. Wong
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-15  1:44 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Wire up xfs_io to use the XFS range clone and dedupe ioctls to make
files share data blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 io/Makefile       |    2 -
 io/dedupe.c       |  190 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 io/init.c         |    2 +
 io/io.h           |    3 +
 io/reflink.c      |  180 ++++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_fs.h   |   36 ++++++++++
 man/man8/xfs_io.8 |   67 +++++++++++++++++++
 7 files changed, 479 insertions(+), 1 deletion(-)
 create mode 100644 io/dedupe.c
 create mode 100644 io/reflink.c


diff --git a/io/Makefile b/io/Makefile
index a08a782..6c4810e 100644
--- a/io/Makefile
+++ b/io/Makefile
@@ -11,7 +11,7 @@ HFILES = init.h io.h
 CFILES = init.c \
 	attr.c bmap.c file.c freeze.c fsync.c getrusage.c imap.c link.c \
 	mmap.c open.c parent.c pread.c prealloc.c pwrite.c seek.c shutdown.c \
-	sync.c truncate.c
+	sync.c truncate.c reflink.c dedupe.c
 
 LLDLIBS = $(LIBXCMD) $(LIBHANDLE)
 LTDEPENDENCIES = $(LIBXCMD) $(LIBHANDLE)
diff --git a/io/dedupe.c b/io/dedupe.c
new file mode 100644
index 0000000..59c3d0f
--- /dev/null
+++ b/io/dedupe.c
@@ -0,0 +1,190 @@
+/*
+ * Copyright (c) 2015 Oracle, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include <sys/uio.h>
+#include <xfs/xfs.h>
+#include "command.h"
+#include "input.h"
+#include "init.h"
+#include "io.h"
+
+static cmdinfo_t dedupe_cmd;
+
+static void
+dedupe_help(void)
+{
+	printf(_(
+"\n"
+" Links a range of bytes (in block size increments) from a file into a range \n"
+" of bytes in the open file.  The contents of both file ranges must match.\n"
+"\n"
+" Example:\n"
+" 'dedupe some_file 0 4096 32768' - links 32768 bytes from some_file at \n"
+"                                    offset 0 to into the open file at \n"
+"                                    position 4096\n"
+"\n"
+" Reflink a range of blocks from a given input file to the open file.  Both\n"
+" files share the same range of physical disk blocks; a write to the shared\n"
+" range of either file should result in the write landing in a new block and\n"
+" that range of the file being remapped (i.e. copy-on-write).  Both files\n"
+" must reside on the same filesystem, and the contents of both ranges must\n"
+" match.\n"
+" -w   -- call fdatasync(2) at the end (included in timing results)\n"
+" -W   -- call fsync(2) at the end (included in timing results)\n"
+"\n"));
+}
+
+static int
+dedupe_f(
+	int		argc,
+	char		**argv)
+{
+	off64_t		soffset, doffset;
+	long long	count, total;
+	char		s1[64], s2[64], ts[64];
+	char		*infile;
+	int		Cflag, qflag, wflag, Wflag;
+	struct xfs_ioctl_file_extent_same_args	*args = NULL;
+	struct xfs_ioctl_file_extent_same_info	*info;
+	size_t		fsblocksize, fssectsize;
+	struct timeval	t1, t2;
+	int		c, fd = -1;
+
+	Cflag = qflag = wflag = Wflag = 0;
+	init_cvtnum(&fsblocksize, &fssectsize);
+
+	while ((c = getopt(argc, argv, "CqwW")) != EOF) {
+		switch (c) {
+		case 'C':
+			Cflag = 1;
+			break;
+		case 'q':
+			qflag = 1;
+			break;
+		case 'w':
+			wflag = 1;
+			break;
+		case 'W':
+			Wflag = 1;
+			break;
+		default:
+			return command_usage(&dedupe_cmd);
+		}
+	}
+	if (optind != argc - 4)
+		return command_usage(&dedupe_cmd);
+	infile = argv[optind];
+	optind++;
+	soffset = cvtnum(fsblocksize, fssectsize, argv[optind]);
+	if (soffset < 0) {
+		printf(_("non-numeric src offset argument -- %s\n"), argv[optind]);
+		return 0;
+	}
+	optind++;
+	doffset = cvtnum(fsblocksize, fssectsize, argv[optind]);
+	if (doffset < 0) {
+		printf(_("non-numeric dest offset argument -- %s\n"), argv[optind]);
+		return 0;
+	}
+	optind++;
+	count = cvtnum(fsblocksize, fssectsize, argv[optind]);
+	if (count < 1) {
+		printf(_("non-positive length argument -- %s\n"), argv[optind]);
+		return 0;
+	}
+
+	c = IO_READONLY;
+	fd = openfile(infile, NULL, c, 0);
+	if (fd < 0)
+		return 0;
+
+	gettimeofday(&t1, NULL);
+	args = calloc(1, sizeof(struct xfs_ioctl_file_extent_same_args) +
+			 sizeof(struct xfs_ioctl_file_extent_same_info));
+	if (!args)
+		goto done;
+	info = (struct xfs_ioctl_file_extent_same_info *)(args + 1);
+	args->logical_offset = soffset;
+	args->length = count;
+	args->dest_count = 1;
+	info->fd = file->fd;
+	info->logical_offset = doffset;
+	do {
+		c = ioctl(fd, XFS_IOC_FILE_EXTENT_SAME, args);
+		if (c)
+			break;
+		args->logical_offset += info->bytes_deduped;
+		info->logical_offset += info->bytes_deduped;
+		args->length -= info->bytes_deduped;
+	} while (c == 0 && info->status == 0 && info->bytes_deduped > 0);
+	if (c)
+		perror(_("dedupe ioctl"));
+	if (info->status < 0)
+		printf("dedupe: %s\n", _(strerror(-info->status)));
+	if (info->status == XFS_SAME_DATA_DIFFERS)
+		printf(_("Extents did not match.\n"));
+	if (c != 0 || info->status != 0)
+		goto done;
+	total = info->bytes_deduped;
+	c = 1;
+	if (Wflag)
+		fsync(file->fd);
+	if (wflag)
+		fdatasync(file->fd);
+	if (qflag)
+		goto done;
+	gettimeofday(&t2, NULL);
+	t2 = tsub(t2, t1);
+
+	/* Finally, report back -- -C gives a parsable format */
+	timestr(&t2, ts, sizeof(ts), Cflag ? VERBOSE_FIXED_TIME : 0);
+	if (!Cflag) {
+		cvtstr((double)total, s1, sizeof(s1));
+		cvtstr(tdiv((double)total, t2), s2, sizeof(s2));
+		printf(_("linked %lld/%lld bytes at offset %lld\n"),
+			total, count, (long long)doffset);
+		printf(_("%s, %d ops; %s (%s/sec and %.4f ops/sec)\n"),
+			s1, c, ts, s2, tdiv((double)c, t2));
+	} else {/* bytes,ops,time,bytes/sec,ops/sec */
+		printf("%lld,%d,%s,%.3f,%.3f\n",
+			total, c, ts,
+			tdiv((double)total, t2), tdiv((double)c, t2));
+	}
+done:
+	free(args);
+	close(fd);
+	return 0;
+}
+
+void
+dedupe_init(void)
+{
+	dedupe_cmd.name = "dedupe";
+	dedupe_cmd.altname = "dd";
+	dedupe_cmd.cfunc = dedupe_f;
+	dedupe_cmd.argmin = 4;
+	dedupe_cmd.argmax = -1;
+	dedupe_cmd.flags = CMD_NOMAP_OK | CMD_FOREIGN_OK;
+	dedupe_cmd.args =
+_("infile src_off dst_off len");
+	dedupe_cmd.oneline =
+		_("dedupes a number of bytes at a specified offset");
+	dedupe_cmd.help = dedupe_help;
+
+	add_command(&dedupe_cmd);
+}
diff --git a/io/init.c b/io/init.c
index 13f35c4..739371e 100644
--- a/io/init.c
+++ b/io/init.c
@@ -83,6 +83,8 @@ init_commands(void)
 	sync_init();
 	sync_range_init();
 	truncate_init();
+	reflink_init();
+	dedupe_init();
 }
 
 static int
diff --git a/io/io.h b/io/io.h
index b115e4a..ec8a530 100644
--- a/io/io.h
+++ b/io/io.h
@@ -161,3 +161,6 @@ extern void		readdir_init(void);
 #else
 #define readdir_init()		do { } while (0)
 #endif
+
+extern void		reflink_init(void);
+extern void		dedupe_init(void);
diff --git a/io/reflink.c b/io/reflink.c
new file mode 100644
index 0000000..fc2d2b9
--- /dev/null
+++ b/io/reflink.c
@@ -0,0 +1,180 @@
+/*
+ * Copyright (c) 2015 Oracle, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include <sys/uio.h>
+#include <xfs/xfs.h>
+#include "command.h"
+#include "input.h"
+#include "init.h"
+#include "io.h"
+
+static cmdinfo_t reflink_cmd;
+
+static void
+reflink_help(void)
+{
+	printf(_(
+"\n"
+" Links a range of bytes (in block size increments) from a file into a range \n"
+" of bytes in the open file.  The two extent ranges need not contain identical\n"
+" data. \n"
+"\n"
+" Example:\n"
+" 'reflink some_file 0 4096 32768' - links 32768 bytes from some_file at \n"
+"                                    offset 0 to into the open file at \n"
+"                                    position 4096\n"
+" 'reflink some_file' - links all bytes from some_file into the open file\n"
+"                       at position 0\n"
+"\n"
+" Reflink a range of blocks from a given input file to the open file.  Both\n"
+" files share the same range of physical disk blocks; a write to the shared\n"
+" range of either file should result in the write landing in a new block and\n"
+" that range of the file being remapped (i.e. copy-on-write).  Both files\n"
+" must reside on the same filesystem.\n"
+" -w   -- call fdatasync(2) at the end (included in timing results)\n"
+" -W   -- call fsync(2) at the end (included in timing results)\n"
+"\n"));
+}
+
+static int
+reflink_f(
+	int		argc,
+	char		**argv)
+{
+	off64_t		soffset, doffset;
+	long long	count = 0, total;
+	char		s1[64], s2[64], ts[64];
+	char		*infile = NULL;
+	int		Cflag, qflag, wflag, Wflag;
+	struct xfs_ioctl_clone_range_args	args;
+	size_t		fsblocksize, fssectsize;
+	struct timeval	t1, t2;
+	int		c, fd = -1;
+
+	Cflag = qflag = wflag = Wflag = 0;
+	init_cvtnum(&fsblocksize, &fssectsize);
+
+	while ((c = getopt(argc, argv, "CqwW")) != EOF) {
+		switch (c) {
+		case 'C':
+			Cflag = 1;
+			break;
+		case 'q':
+			qflag = 1;
+			break;
+		case 'w':
+			wflag = 1;
+			break;
+		case 'W':
+			Wflag = 1;
+			break;
+		default:
+			return command_usage(&reflink_cmd);
+		}
+	}
+	if (optind != argc - 4 && optind != argc - 1)
+		return command_usage(&reflink_cmd);
+	infile = argv[optind];
+	optind++;
+	if (optind == argc)
+		goto clone_all;
+	soffset = cvtnum(fsblocksize, fssectsize, argv[optind]);
+	if (soffset < 0) {
+		printf(_("non-numeric src offset argument -- %s\n"), argv[optind]);
+		return 0;
+	}
+	optind++;
+	doffset = cvtnum(fsblocksize, fssectsize, argv[optind]);
+	if (doffset < 0) {
+		printf(_("non-numeric dest offset argument -- %s\n"), argv[optind]);
+		return 0;
+	}
+	optind++;
+	count = cvtnum(fsblocksize, fssectsize, argv[optind]);
+	if (count < 1) {
+		printf(_("non-positive length argument -- %s\n"), argv[optind]);
+		return 0;
+	}
+
+clone_all:
+	c = IO_READONLY;
+	fd = openfile(infile, NULL, c, 0);
+	if (fd < 0)
+		return 0;
+
+	gettimeofday(&t1, NULL);
+	if (count) {
+		args.src_fd = fd;
+		args.src_offset = soffset;
+		args.src_length = count;
+		args.dest_offset = doffset;
+		c = ioctl(file->fd, XFS_IOC_CLONE_RANGE, &args);
+	} else {
+		c = ioctl(file->fd, XFS_IOC_CLONE, fd);
+	}
+	if (c < 0) {
+		perror(_("reflink"));
+		goto done;
+	}
+	total = count;
+	c = 1;
+	if (Wflag)
+		fsync(file->fd);
+	if (wflag)
+		fdatasync(file->fd);
+	if (qflag)
+		goto done;
+	gettimeofday(&t2, NULL);
+	t2 = tsub(t2, t1);
+
+	/* Finally, report back -- -C gives a parsable format */
+	timestr(&t2, ts, sizeof(ts), Cflag ? VERBOSE_FIXED_TIME : 0);
+	if (!Cflag) {
+		cvtstr((double)total, s1, sizeof(s1));
+		cvtstr(tdiv((double)total, t2), s2, sizeof(s2));
+		printf(_("linked %lld/%lld bytes at offset %lld\n"),
+			total, count, (long long)doffset);
+		printf(_("%s, %d ops; %s (%s/sec and %.4f ops/sec)\n"),
+			s1, c, ts, s2, tdiv((double)c, t2));
+	} else {/* bytes,ops,time,bytes/sec,ops/sec */
+		printf("%lld,%d,%s,%.3f,%.3f\n",
+			total, c, ts,
+			tdiv((double)total, t2), tdiv((double)c, t2));
+	}
+done:
+	close(fd);
+	return 0;
+}
+
+void
+reflink_init(void)
+{
+	reflink_cmd.name = "reflink";
+	reflink_cmd.altname = "rl";
+	reflink_cmd.cfunc = reflink_f;
+	reflink_cmd.argmin = 4;
+	reflink_cmd.argmax = -1;
+	reflink_cmd.flags = CMD_NOMAP_OK | CMD_FOREIGN_OK;
+	reflink_cmd.args =
+_("infile src_off dst_off len");
+	reflink_cmd.oneline =
+		_("reflinks a number of bytes at a specified offset");
+	reflink_cmd.help = reflink_help;
+
+	add_command(&reflink_cmd);
+}
diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 89689c6..0c922ad 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -559,6 +559,42 @@ typedef struct xfs_swapext
 #define XFS_IOC_GOINGDOWN	     _IOR ('X', 125, __uint32_t)
 /*	XFS_IOC_GETFSUUID ---------- deprecated 140	 */
 
+/* reflink ioctls; these MUST match the btrfs ioctl definitions */
+struct xfs_ioctl_clone_range_args {
+	__s64 src_fd;
+	__u64 src_offset;
+	__u64 src_length;
+	__u64 dest_offset;
+};
+
+#define XFS_SAME_DATA_DIFFERS	1
+/* For extent-same ioctl */
+struct xfs_ioctl_file_extent_same_info {
+	__s64 fd;		/* in - destination file */
+	__u64 logical_offset;	/* in - start of extent in destination */
+	__u64 bytes_deduped;	/* out - total # of bytes we were able
+				 * to dedupe from this file */
+	/* status of this dedupe operation:
+	 * 0 if dedup succeeds
+	 * < 0 for error
+	 * == XFS_SAME_DATA_DIFFERS if data differs
+	 */
+	__s32 status;		/* out - see above description */
+	__u32 reserved;
+};
+
+struct xfs_ioctl_file_extent_same_args {
+	__u64 logical_offset;	/* in - start of extent in source */
+	__u64 length;		/* in - length of extent */
+	__u16 dest_count;	/* in - total elements in info array */
+	__u16 reserved1;
+	__u32 reserved2;
+	struct xfs_ioctl_file_extent_same_info info[0];
+};
+
+#define XFS_IOC_CLONE		 _IOW (0x94, 9, int)
+#define XFS_IOC_CLONE_RANGE	 _IOW (0x94, 13, struct xfs_ioctl_clone_range_args)
+#define XFS_IOC_FILE_EXTENT_SAME _IOWR(0x94, 54, struct xfs_ioctl_file_extent_same_args)
 
 #ifndef HAVE_BBMACROS
 /*
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index 416206f..305335c 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -490,6 +490,73 @@ Recursively display all the specified segments starting at the specified
 .B \-s
 Display the starting lseek(2) offset. This offset will be a calculated value when
 both data and holes are displayed together or performing a recusively display.
+.RE
+.PD
+.TP
+.TP
+.BI "reflink  [ \-w ] [ \-W ] src_file [src_offset dst_offset length]"
+On filesystems that support the
+.B XFS_IOC_CLONE_RANGE
+or
+.B BTRFS_IOC_CLONE_RANGE
+ioctls, map
+.I length
+bytes at offset
+.I dst_offset
+in the open file to the same physical blocks that are mapped at offset
+.I src_offset
+in the file
+.I src_file
+, replacing any contents that may already have been there.  If a program
+writes into a reflinked block range of either file, the dirty blocks will be
+cloned, written to, and remapped ("copy on write") in the affected file,
+leaving the other file(s) unchanged.  If src_offset, dst_offset, and length
+are omitted, all contents of src_file will be reflinked into the open file.
+.RS 1.0i
+.PD 0
+.TP 0.4i
+.B \-w
+Call
+.BR fdatasync (2)
+after executing the ioctl.
+.TP
+.B \-W
+Call
+.BR fsync (2)
+after executing the command.
+.RE
+.PD
+.TP
+.TP
+.BI "dedupe  [ \-w ] [ \-W ] src_file src_offset dst_offset length"
+On filesystems that support the
+.B XFS_IOC_FILE_EXTENT_SAME
+or
+.B BTRFS_IOC_FILE_EXTENT_SAME
+ioctls, map
+.I length
+bytes at offset
+.I dst_offset
+in the open file to the same physical blocks that are mapped at offset
+.I src_offset
+in the file
+.I src_file
+, but only if the contents of both ranges are identical.  This is known as
+block-based deduplication.  If a program writes into a reflinked block range of
+either file, the dirty blocks will be cloned, written to, and remapped ("copy
+on write") in the affected file, leaving the other file(s) unchanged.
+.RS 1.0i
+.PD 0
+.TP 0.4i
+.B \-w
+Call
+.BR fdatasync (2)
+after executing the ioctl.
+.TP
+.B \-W
+Call
+.BR fsync (2)
+after executing the command.
 .TP
 
 .SH MEMORY MAPPED I/O COMMANDS

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 08/10] xfs_db: enable blocktrash for checksummed filesystems
  2015-08-15  1:43 [PATCH 00/10] xfsprogs August 2015 patchbomb Darrick J. Wong
                   ` (6 preceding siblings ...)
  2015-08-15  1:44 ` [PATCH 07/10] xfs_io: support reflinking and deduping file ranges Darrick J. Wong
@ 2015-08-15  1:44 ` Darrick J. Wong
  2015-08-18 19:26   ` Eric Sandeen
  2015-08-15  1:44 ` [PATCH 09/10] xfs_db: trash the block at the top of the cursor stack Darrick J. Wong
  2015-08-15  1:44 ` [PATCH 10/10] xfs_db: enable blockget for v5 filesystems Darrick J. Wong
  9 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-15  1:44 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Disable the write verifiers when we're trashing a block.  With this
in place, create a xfs fuzzer script that formats, populates, corrupts,
tries to use, repairs, and tries again to use a crash test xfs image.
Hopefully this will shake out some v5 filesystem bugs.

v2: Drop xfsfuzz, don't assume every block is an AGF when blocktrashing.
Don't trash log blocks by default, because that skews the blocktrash
heavily towards damaging only log blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/check.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)


diff --git a/db/check.c b/db/check.c
index afeea32..965d0f5 100644
--- a/db/check.c
+++ b/db/check.c
@@ -944,6 +944,7 @@ blocktrash_b(
 	int		mask;
 	int		newbit;
 	int		offset;
+	const struct xfs_buf_ops *stashed_ops;
 	static char	*modestr[] = {
 		N_("zeroed"), N_("set"), N_("flipped"), N_("randomized")
 	};
@@ -952,8 +953,10 @@ blocktrash_b(
 	offset = (int)(random() % (int)(mp->m_sb.sb_blocksize * NBBY));
 	newbit = 0;
 	push_cur();
-	set_cur(&typtab[DBM_UNKNOWN],
+	set_cur(NULL,
 		XFS_AGB_TO_DADDR(mp, agno, agbno), blkbb, DB_RING_IGN, NULL);
+	stashed_ops = iocur_top->bp->b_ops;
+	iocur_top->bp->b_ops = NULL;
 	if ((buf = iocur_top->data) == NULL) {
 		dbprintf(_("can't read block %u/%u for trashing\n"), agno, agbno);
 		pop_cur();
@@ -984,6 +987,7 @@ blocktrash_b(
 			buf[byte] &= ~mask;
 	}
 	write_cur();
+	iocur_top->bp->b_ops = stashed_ops;
 	pop_cur();
 	printf(_("blocktrash: %u/%u %s block %d bit%s starting %d:%d %s\n"),
 		agno, agbno, typename[type], len, len == 1 ? "" : "s",
@@ -1040,9 +1044,11 @@ blocktrash_f(
 		   (1 << DBM_BTINO) |
 		   (1 << DBM_DIR) |
 		   (1 << DBM_INODE) |
+		   (1 << DBM_LOG) |
 		   (1 << DBM_QUOTA) |
 		   (1 << DBM_RTBITMAP) |
 		   (1 << DBM_RTSUM) |
+		   (1 << DBM_SYMLINK) |
 		   (1 << DBM_SB);
 	while ((c = getopt(argc, argv, "0123n:s:t:x:y:")) != EOF) {
 		switch (c) {
@@ -1106,7 +1112,7 @@ blocktrash_f(
 		return 0;
 	}
 	if (tmask == 0)
-		tmask = goodmask;
+		tmask = goodmask & ~((1 << DBM_LOG) | (1 << DBM_SB));
 	lentab = xmalloc(sizeof(ltab_t));
 	lentab->min = lentab->max = min;
 	lentablen = 1;

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 09/10] xfs_db: trash the block at the top of the cursor stack
  2015-08-15  1:43 [PATCH 00/10] xfsprogs August 2015 patchbomb Darrick J. Wong
                   ` (7 preceding siblings ...)
  2015-08-15  1:44 ` [PATCH 08/10] xfs_db: enable blocktrash for checksummed filesystems Darrick J. Wong
@ 2015-08-15  1:44 ` Darrick J. Wong
  2015-08-18 19:59   ` Eric Sandeen
  2015-08-15  1:44 ` [PATCH 10/10] xfs_db: enable blockget for v5 filesystems Darrick J. Wong
  9 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-15  1:44 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Add a new -z option to blocktrash to make it trash the block that's at
the top of the stack, so that we can perform targeted fuzzing.  While
we're at it, prevent fuzzing off the end of the buffer and add a -o
parameter so that we can specify an offset to start fuzzing from.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/check.c        |   81 +++++++++++++++++++++++++++++++++++++++++------------
 man/man8/xfs_db.8 |   15 +++++++++-
 2 files changed, 77 insertions(+), 19 deletions(-)


diff --git a/db/check.c b/db/check.c
index 965d0f5..7c11b0b 100644
--- a/db/check.c
+++ b/db/check.c
@@ -930,8 +930,7 @@ typedef struct ltab {
 
 static void
 blocktrash_b(
-	xfs_agnumber_t	agno,
-	xfs_agblock_t	agbno,
+	int		offset,
 	dbm_t		type,
 	ltab_t		*ltabp,
 	int		mode)
@@ -943,23 +942,36 @@ blocktrash_b(
 	int		len;
 	int		mask;
 	int		newbit;
-	int		offset;
 	const struct xfs_buf_ops *stashed_ops;
 	static char	*modestr[] = {
 		N_("zeroed"), N_("set"), N_("flipped"), N_("randomized")
 	};
+	xfs_agnumber_t	agno;
+	xfs_agblock_t	agbno;
 
+	agno = XFS_FSB_TO_AGNO(mp, XFS_DADDR_TO_FSB(mp, iocur_top->bb));
+	agbno = XFS_FSB_TO_AGBNO(mp, XFS_DADDR_TO_FSB(mp, iocur_top->bb));
+	if (iocur_top->len == 0) {
+		dbprintf(_("zero-length block %u/%u buffer to trash??\n"),
+				agno, agbno);
+		return;
+	}
 	len = (int)((random() % (ltabp->max - ltabp->min + 1)) + ltabp->min);
-	offset = (int)(random() % (int)(mp->m_sb.sb_blocksize * NBBY));
+	/*
+	 * offset >= 0: start fuzzing at this exact offset.
+	 * offset < 0: pick an offset at least as high at -(offset + 1).
+	 */
+	if (offset < 0) {
+		offset = -(offset + 1);
+		offset = offset + (int)(random() % (int)((iocur_top->len - offset) * NBBY));
+	}
+	if (offset + len >= iocur_top->len * NBBY)
+		len = (iocur_top->len * NBBY) - offset;
 	newbit = 0;
-	push_cur();
-	set_cur(NULL,
-		XFS_AGB_TO_DADDR(mp, agno, agbno), blkbb, DB_RING_IGN, NULL);
 	stashed_ops = iocur_top->bp->b_ops;
 	iocur_top->bp->b_ops = NULL;
 	if ((buf = iocur_top->data) == NULL) {
 		dbprintf(_("can't read block %u/%u for trashing\n"), agno, agbno);
-		pop_cur();
 		return;
 	}
 	for (bitno = 0; bitno < len; bitno++) {
@@ -988,7 +1000,6 @@ blocktrash_b(
 	}
 	write_cur();
 	iocur_top->bp->b_ops = stashed_ops;
-	pop_cur();
 	printf(_("blocktrash: %u/%u %s block %d bit%s starting %d:%d %s\n"),
 		agno, agbno, typename[type], len, len == 1 ? "" : "s",
 		offset / NBBY, offset % NBBY, modestr[mode]);
@@ -1019,11 +1030,9 @@ blocktrash_f(
 	uint		seed;
 	int		sopt;
 	int		tmask;
+	bool		this_block = false;
+	int		offset = -1;
 
-	if (!dbmap) {
-		dbprintf(_("must run blockget first\n"));
-		return 0;
-	}
 	optind = 0;
 	count = 1;
 	min = 1;
@@ -1050,7 +1059,7 @@ blocktrash_f(
 		   (1 << DBM_RTSUM) |
 		   (1 << DBM_SYMLINK) |
 		   (1 << DBM_SB);
-	while ((c = getopt(argc, argv, "0123n:s:t:x:y:")) != EOF) {
+	while ((c = getopt(argc, argv, "0123n:o:s:t:x:y:z")) != EOF) {
 		switch (c) {
 		case '0':
 			mode = 0;
@@ -1071,6 +1080,21 @@ blocktrash_f(
 				return 0;
 			}
 			break;
+		case 'o': {
+			int relative = 0;
+			if (optarg[0] == '+') {
+				optarg++;
+				relative = 1;
+			}
+			offset = (int)strtol(optarg, &p, 0);
+			if (*p != '\0' || offset < 0) {
+				dbprintf(_("bad blocktrash offset %s\n"), optarg);
+				return 0;
+			}
+			if (relative)
+				offset = -offset - 1;
+			break;
+		}
 		case 's':
 			seed = (uint)strtoul(optarg, &p, 0);
 			sopt = 1;
@@ -1102,11 +1126,22 @@ blocktrash_f(
 				return 0;
 			}
 			break;
+		case 'z':
+			this_block = true;
+			break;
 		default:
 			dbprintf(_("bad option for blocktrash command\n"));
 			return 0;
 		}
 	}
+	if (!this_block && !dbmap) {
+		dbprintf(_("must run blockget first\n"));
+		return 0;
+	}
+	if (this_block && iocur_sp == 0) {
+		dbprintf(_("nothing on stack\n"));
+		return 0;
+	}
 	if (min > max) {
 		dbprintf(_("bad min/max for blocktrash command\n"));
 		return 0;
@@ -1125,6 +1160,14 @@ blocktrash_f(
 		} else
 			lentab[lentablen - 1].max = i;
 	}
+	if (!sopt)
+		dbprintf(_("blocktrash: seed %u\n"), seed);
+	srandom(seed);
+	if (this_block) {
+		blocktrash_b(offset, DBM_UNKNOWN, &lentab[random() % lentablen],
+				mode);
+		goto out;
+	}
 	for (blocks = 0, agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
 		for (agbno = 0, p = dbmap[agno];
 		     agbno < mp->m_sb.sb_agblocks;
@@ -1137,9 +1180,6 @@ blocktrash_f(
 		dbprintf(_("blocktrash: no matching blocks\n"));
 		goto out;
 	}
-	if (!sopt)
-		dbprintf(_("blocktrash: seed %u\n"), seed);
-	srandom(seed);
 	for (i = 0; i < count; i++) {
 		randb = (xfs_rfsblock_t)((((__int64_t)random() << 32) |
 					 random()) % blocks);
@@ -1153,8 +1193,13 @@ blocktrash_f(
 					continue;
 				if (bi++ < randb)
 					continue;
-				blocktrash_b(agno, agbno, (dbm_t)*p,
+				push_cur();
+				set_cur(NULL,
+					XFS_AGB_TO_DADDR(mp, agno, agbno),
+					blkbb, DB_RING_IGN, NULL);
+				blocktrash_b(offset, (dbm_t)*p,
 					&lentab[random() % lentablen], mode);
+				pop_cur();
 				done = 1;
 				break;
 			}
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index df54bb7..681efc4 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -232,7 +232,7 @@ enables verbose output. Messages will be printed for every block and
 inode processed.
 .RE
 .TP
-.BI "blocktrash [\-n " count "] [\-x " min "] [\-y " max "] [\-s " seed "] [\-0|1|2|3] [\-t " type "] ..."
+.BI "blocktrash [-z] [\-o " offset "] [\-n " count "] [\-x " min "] [\-y " max "] [\-s " seed "] [\-0|1|2|3] [\-t " type "] ..."
 Trash randomly selected filesystem metadata blocks.
 Trashing occurs to randomly selected bits in the chosen blocks.
 This command is available only in debugging versions of
@@ -259,6 +259,13 @@ supplies the
 .I count
 of block-trashings to perform (default 1).
 .TP
+.B \-o
+supplies the bit
+.I offset
+at which to start trashing the block.  If the value is preceded by a '+', the
+trashing will start at a randomly chosen offset that is larger than the value
+supplied.  The default is to randomly choose an offset anywhere in the block.
+.TP
 .B \-s
 supplies a
 .I seed
@@ -282,6 +289,12 @@ size of bit range to be trashed. The default value is 1.
 sets the
 .I maximum
 size of bit range to be trashed. The default value is 1024.
+.TP
+.B \-z
+trashes the block at the top of the stack.  It is not necessary to
+run
+.BI blockget
+if this option is supplied.
 .RE
 .TP
 .BI "blockuse [\-n] [\-c " count ]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 10/10] xfs_db: enable blockget for v5 filesystems
  2015-08-15  1:43 [PATCH 00/10] xfsprogs August 2015 patchbomb Darrick J. Wong
                   ` (8 preceding siblings ...)
  2015-08-15  1:44 ` [PATCH 09/10] xfs_db: trash the block at the top of the cursor stack Darrick J. Wong
@ 2015-08-15  1:44 ` Darrick J. Wong
  9 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-15  1:44 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Plumb in the necessary magic number checks and other fixups required
to handle v5 filesystems.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/check.c |  238 +++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 db/type.c  |    2 +
 2 files changed, 214 insertions(+), 26 deletions(-)


diff --git a/db/check.c b/db/check.c
index 7c11b0b..edfe62d 100644
--- a/db/check.c
+++ b/db/check.c
@@ -44,7 +44,7 @@ typedef enum {
 	DBM_FREE1,	DBM_FREE2,	DBM_FREELIST,	DBM_INODE,
 	DBM_LOG,	DBM_MISSING,	DBM_QUOTA,	DBM_RTBITMAP,
 	DBM_RTDATA,	DBM_RTFREE,	DBM_RTSUM,	DBM_SB,
-	DBM_SYMLINK,
+	DBM_SYMLINK,	DBM_BTFINO,
 	DBM_NDBM
 } dbm_t;
 
@@ -170,6 +170,7 @@ static const char	*typename[] = {
 	"rtsum",
 	"sb",
 	"symlink",
+	"btfino",
 	NULL
 };
 static int		verbose;
@@ -345,6 +346,9 @@ static void		scanfunc_cnt(struct xfs_btree_block *block, int level,
 static void		scanfunc_ino(struct xfs_btree_block *block, int level,
 				     xfs_agf_t *agf, xfs_agblock_t bno,
 				     int isroot);
+static void		scanfunc_fino(struct xfs_btree_block *block, int level,
+				     struct xfs_agf *agf, xfs_agblock_t bno,
+				     int isroot);
 static void		set_dbmap(xfs_agnumber_t agno, xfs_agblock_t agbno,
 				  xfs_extlen_t len, dbm_t type,
 				  xfs_agnumber_t c_agno, xfs_agblock_t c_agbno);
@@ -789,19 +793,6 @@ blockget_f(
 		return 0;
 	}
 
-	/*
-	 * XXX: check does not support CRC enabled filesystems. Return
-	 * immediately, silently, with success but  without doing anything here
-	 * initially so that xfstests can run without modification on metadata
-	 * enabled filesystems.
-	 *
-	 * XXX: ultimately we need to dump an error message here that xfstests
-	 * filters out, or we need to actually do the work to make check support
-	 * crc enabled filesystems.
-	 */
-	if (xfs_sb_version_hascrc(&mp->m_sb))
-		return 0;
-
 	if (!init(argc, argv)) {
 		if (serious_error)
 			exitcode = 3;
@@ -1058,6 +1049,7 @@ blocktrash_f(
 		   (1 << DBM_RTBITMAP) |
 		   (1 << DBM_RTSUM) |
 		   (1 << DBM_SYMLINK) |
+		   (1 << DBM_BTFINO) |
 		   (1 << DBM_SB);
 	while ((c = getopt(argc, argv, "0123n:o:s:t:x:y:z")) != EOF) {
 		switch (c) {
@@ -2267,7 +2259,9 @@ process_data_dir_v2(
 	data = iocur_top->data;
 	block = iocur_top->data;
 	if (be32_to_cpu(block->magic) != XFS_DIR2_BLOCK_MAGIC &&
-			be32_to_cpu(data->magic) != XFS_DIR2_DATA_MAGIC) {
+			be32_to_cpu(data->magic) != XFS_DIR2_DATA_MAGIC &&
+			be32_to_cpu(block->magic) != XFS_DIR3_BLOCK_MAGIC &&
+			be32_to_cpu(data->magic) != XFS_DIR3_DATA_MAGIC) {
 		if (!sflag || v)
 			dbprintf(_("bad directory data magic # %#x for dir ino "
 				 "%lld block %d\n"),
@@ -2278,7 +2272,8 @@ process_data_dir_v2(
 	db = xfs_dir2_da_to_db(mp->m_dir_geo, dabno);
 	bf = M_DIROPS(mp)->data_bestfree_p(data);
 	ptr = (char *)M_DIROPS(mp)->data_unused_p(data);
-	if (be32_to_cpu(block->magic) == XFS_DIR2_BLOCK_MAGIC) {
+	if (be32_to_cpu(block->magic) == XFS_DIR2_BLOCK_MAGIC ||
+	    be32_to_cpu(block->magic) == XFS_DIR3_BLOCK_MAGIC) {
 		btp = xfs_dir2_block_tail_p(mp->m_dir_geo, block);
 		lep = xfs_dir2_block_leaf_p(btp);
 		endptr = (char *)lep;
@@ -2424,7 +2419,8 @@ process_data_dir_v2(
 			(*dot)++;
 		}
 	}
-	if (be32_to_cpu(data->magic) == XFS_DIR2_BLOCK_MAGIC) {
+	if (be32_to_cpu(data->magic) == XFS_DIR2_BLOCK_MAGIC ||
+	    be32_to_cpu(data->magic) == XFS_DIR3_BLOCK_MAGIC) {
 		endptr = (char *)data + mp->m_dir_geo->blksize;
 		for (i = stale = 0; lep && i < be32_to_cpu(btp->count); i++) {
 			if ((char *)&lep[i] >= endptr) {
@@ -2456,7 +2452,8 @@ process_data_dir_v2(
 				id->ino, dabno);
 		error++;
 	}
-	if (be32_to_cpu(data->magic) == XFS_DIR2_BLOCK_MAGIC &&
+	if ((be32_to_cpu(data->magic) == XFS_DIR2_BLOCK_MAGIC ||
+	     be32_to_cpu(data->magic) == XFS_DIR3_BLOCK_MAGIC) &&
 	    count != be32_to_cpu(btp->count) - be32_to_cpu(btp->stale)) {
 		if (!sflag || v)
 			dbprintf(_("dir %lld block %d bad block tail count %d "
@@ -2465,7 +2462,8 @@ process_data_dir_v2(
 				be32_to_cpu(btp->stale));
 		error++;
 	}
-	if (be32_to_cpu(data->magic) == XFS_DIR2_BLOCK_MAGIC && 
+	if ((be32_to_cpu(data->magic) == XFS_DIR2_BLOCK_MAGIC ||
+	     be32_to_cpu(data->magic) == XFS_DIR2_BLOCK_MAGIC) &&
 					stale != be32_to_cpu(btp->stale)) {
 		if (!sflag || v)
 			dbprintf(_("dir %lld block %d bad stale tail count %d\n"),
@@ -3051,6 +3049,73 @@ process_leaf_node_dir_v2(
 }
 
 static void
+process_leaf_node_dir_v3_free(
+	inodata_t		*id,
+	int			v,
+	xfs_dablk_t		dabno,
+	freetab_t		*freetab)
+{
+	xfs_dir2_data_off_t	ent;
+	struct xfs_dir3_free	*free;
+	int			i;
+	int			maxent;
+	int			used;
+
+	free = iocur_top->data;
+	maxent = M_DIROPS(mp)->free_max_bests(mp->m_dir_geo);
+	if (be32_to_cpu(free->hdr.firstdb) != xfs_dir2_da_to_db(mp->m_dir_geo, 
+					dabno - mp->m_dir_geo->freeblk) * maxent) {
+		if (!sflag || v)
+			dbprintf(_("bad free block firstdb %d for dir ino %lld "
+				 "block %d\n"),
+				be32_to_cpu(free->hdr.firstdb), id->ino, dabno);
+		error++;
+		return;
+	}
+	if (be32_to_cpu(free->hdr.nvalid) > maxent || 
+				be32_to_cpu(free->hdr.nvalid) < 0 ||
+				be32_to_cpu(free->hdr.nused) > maxent || 
+				be32_to_cpu(free->hdr.nused) < 0 ||
+				be32_to_cpu(free->hdr.nused) > 
+					be32_to_cpu(free->hdr.nvalid)) {
+		if (!sflag || v)
+			dbprintf(_("bad free block nvalid/nused %d/%d for dir "
+				 "ino %lld block %d\n"),
+				be32_to_cpu(free->hdr.nvalid), 
+				be32_to_cpu(free->hdr.nused), id->ino, dabno);
+		error++;
+		return;
+	}
+	for (used = i = 0; i < be32_to_cpu(free->hdr.nvalid); i++) {
+		if (freetab->nents <= be32_to_cpu(free->hdr.firstdb) + i)
+			ent = NULLDATAOFF;
+		else
+			ent = freetab->ents[be32_to_cpu(free->hdr.firstdb) + i];
+		if (ent != be16_to_cpu(free->bests[i])) {
+			if (!sflag || v)
+				dbprintf(_("bad free block ent %d is %d should "
+					 "be %d for dir ino %lld block %d\n"),
+					i, be16_to_cpu(free->bests[i]), ent, 
+					id->ino, dabno);
+			error++;
+		}
+		if (be16_to_cpu(free->bests[i]) != NULLDATAOFF)
+			used++;
+		if (ent != NULLDATAOFF)
+			freetab->ents[be32_to_cpu(free->hdr.firstdb) + i] = 
+								NULLDATAOFF;
+	}
+	if (used != be32_to_cpu(free->hdr.nused)) {
+		if (!sflag || v)
+			dbprintf(_("bad free block nused %d should be %d for dir "
+				 "ino %lld block %d\n"),
+				be32_to_cpu(free->hdr.nused), used, id->ino, 
+				dabno);
+		error++;
+	}
+}
+
+static void
 process_leaf_node_dir_v2_free(
 	inodata_t		*id,
 	int			v,
@@ -3064,7 +3129,8 @@ process_leaf_node_dir_v2_free(
 	int			used;
 
 	free = iocur_top->data;
-	if (be32_to_cpu(free->hdr.magic) != XFS_DIR2_FREE_MAGIC) {
+	if (be32_to_cpu(free->hdr.magic) != XFS_DIR2_FREE_MAGIC &&
+	    be32_to_cpu(free->hdr.magic) != XFS_DIR3_FREE_MAGIC) {
 		if (!sflag || v)
 			dbprintf(_("bad free block magic # %#x for dir ino %lld "
 				 "block %d\n"),
@@ -3072,6 +3138,10 @@ process_leaf_node_dir_v2_free(
 		error++;
 		return;
 	}
+	if (be32_to_cpu(free->hdr.magic) == XFS_DIR3_FREE_MAGIC) {
+		process_leaf_node_dir_v3_free(id, v, dabno, freetab);
+		return;
+	}
 	maxent = M_DIROPS(mp)->free_max_bests(mp->m_dir_geo);
 	if (be32_to_cpu(free->hdr.firstdb) != xfs_dir2_da_to_db(mp->m_dir_geo, 
 					dabno - mp->m_dir_geo->freeblk) * maxent) {
@@ -3125,6 +3195,21 @@ process_leaf_node_dir_v2_free(
 	}
 }
 
+/*
+ * Get address of the bestcount field in the single-leaf block.
+ */
+static inline int
+xfs_dir3_leaf_ents_count(struct xfs_dir2_leaf *lp)
+{
+	if (lp->hdr.info.magic == cpu_to_be16(XFS_DIR3_LEAF1_MAGIC) ||
+	    lp->hdr.info.magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC)) {
+		struct xfs_dir3_leaf *lp3 = (struct xfs_dir3_leaf *)lp;
+
+		return be16_to_cpu(lp3->hdr.count);
+	}
+	return be16_to_cpu(lp->hdr.count);
+}
+
 static void
 process_leaf_node_dir_v2_int(
 	inodata_t		*id,
@@ -3135,6 +3220,7 @@ process_leaf_node_dir_v2_int(
 	int			i;
 	__be16			*lbp;
 	xfs_dir2_leaf_t		*leaf;
+	struct xfs_dir3_leaf	*leaf3 = NULL;
 	xfs_dir2_leaf_entry_t	*lep;
 	xfs_dir2_leaf_tail_t	*ltp;
 	xfs_da_intnode_t	*node;
@@ -3143,7 +3229,15 @@ process_leaf_node_dir_v2_int(
 
 	leaf = iocur_top->data;
 	switch (be16_to_cpu(leaf->hdr.info.magic)) {
+	case XFS_DIR3_LEAF1_MAGIC:
+	case XFS_DIR3_LEAFN_MAGIC:
+	case XFS_DA3_NODE_MAGIC:
+		leaf3 = iocur_top->data;
+		break;
+	}
+	switch (be16_to_cpu(leaf->hdr.info.magic)) {
 	case XFS_DIR2_LEAF1_MAGIC:
+	case XFS_DIR3_LEAF1_MAGIC:
 		if (be32_to_cpu(leaf->hdr.info.forw) || 
 					be32_to_cpu(leaf->hdr.info.back)) {
 			if (!sflag || v)
@@ -3183,10 +3277,12 @@ process_leaf_node_dir_v2_int(
 		}
 		break;
 	case XFS_DIR2_LEAFN_MAGIC:
+	case XFS_DIR3_LEAFN_MAGIC:
 		/* if it's at the root location then we can check the
 		 * pointers are null XXX */
 		break;
 	case XFS_DA_NODE_MAGIC:
+	case XFS_DA3_NODE_MAGIC:
 		node = iocur_top->data;
 		M_DIROPS(mp)->node_hdr_from_disk(&nodehdr, node);
 		if (nodehdr.level < 1 || nodehdr.level > XFS_DA_NODE_MAXDEPTH) {
@@ -3208,7 +3304,7 @@ process_leaf_node_dir_v2_int(
 		return;
 	}
 	lep = M_DIROPS(mp)->leaf_ents_p(leaf);
-	for (i = stale = 0; i < be16_to_cpu(leaf->hdr.count); i++) {
+	for (i = stale = 0; i < xfs_dir3_leaf_ents_count(leaf); i++) {
 		if (be32_to_cpu(lep[i].address) == XFS_DIR2_NULL_DATAPTR)
 			stale++;
 		else if (dir_hash_see(be32_to_cpu(lep[i].hashval), 
@@ -3221,7 +3317,14 @@ process_leaf_node_dir_v2_int(
 			error++;
 		}
 	}
-	if (stale != be16_to_cpu(leaf->hdr.stale)) {
+	if (leaf3 && stale != be16_to_cpu(leaf3->hdr.stale)) {
+		if (!sflag || v)
+			dbprintf(_("dir3 %lld block %d stale mismatch "
+				 "%d/%d\n"),
+				 id->ino, dabno, stale,
+				 be16_to_cpu(leaf3->hdr.stale));
+		error++;
+	} else if (!leaf && stale != be16_to_cpu(leaf->hdr.stale)) {
 		if (!sflag || v)
 			dbprintf(_("dir %lld block %d stale mismatch "
 				 "%d/%d\n"),
@@ -3808,6 +3911,12 @@ scan_ag(
 		be32_to_cpu(agi->agi_root),
 		be32_to_cpu(agi->agi_level),
 		1, scanfunc_ino, TYP_INOBT);
+	if (agi->agi_free_root) {
+		scan_sbtree(agf,
+			be32_to_cpu(agi->agi_free_root),
+			be32_to_cpu(agi->agi_free_level),
+			1, scanfunc_fino, TYP_FINOBT);
+	}
 	if (be32_to_cpu(agf->agf_freeblks) != agffreeblks) {
 		if (!sflag)
 			dbprintf(_("agf_freeblks %u, counted %u in ag %u\n"),
@@ -4007,7 +4116,8 @@ scanfunc_bmap(
 
 	agno = XFS_FSB_TO_AGNO(mp, bno);
 	agbno = XFS_FSB_TO_AGBNO(mp, bno);
-	if (be32_to_cpu(block->bb_magic) != XFS_BMAP_MAGIC) {
+	if (be32_to_cpu(block->bb_magic) != XFS_BMAP_MAGIC &&
+	    be32_to_cpu(block->bb_magic) != XFS_BMAP_CRC_MAGIC) {
 		if (!sflag || id->ilist || CHECK_BLIST(bno))
 			dbprintf(_("bad magic # %#x in inode %lld bmbt block "
 				 "%u/%u\n"),
@@ -4072,7 +4182,8 @@ scanfunc_bno(
 	xfs_agnumber_t		seqno = be32_to_cpu(agf->agf_seqno);
 	xfs_agblock_t		lastblock;
 
-	if (be32_to_cpu(block->bb_magic) != XFS_ABTB_MAGIC) {
+	if (be32_to_cpu(block->bb_magic) != XFS_ABTB_MAGIC &&
+	    be32_to_cpu(block->bb_magic) != XFS_ABTB_CRC_MAGIC) {
 		dbprintf(_("bad magic # %#x in btbno block %u/%u\n"),
 			be32_to_cpu(block->bb_magic), seqno, bno);
 		serious_error++;
@@ -4145,7 +4256,8 @@ scanfunc_cnt(
 	xfs_alloc_rec_t		*rp;
 	xfs_extlen_t		lastcount;
 
-	if (be32_to_cpu(block->bb_magic) != XFS_ABTC_MAGIC) {
+	if (be32_to_cpu(block->bb_magic) != XFS_ABTC_MAGIC &&
+	    be32_to_cpu(block->bb_magic) != XFS_ABTC_CRC_MAGIC) {
 		dbprintf(_("bad magic # %#x in btcnt block %u/%u\n"),
 			be32_to_cpu(block->bb_magic), seqno, bno);
 		serious_error++;
@@ -4225,7 +4337,8 @@ scanfunc_ino(
 	xfs_inobt_ptr_t		*pp;
 	xfs_inobt_rec_t		*rp;
 
-	if (be32_to_cpu(block->bb_magic) != XFS_IBT_MAGIC) {
+	if (be32_to_cpu(block->bb_magic) != XFS_IBT_MAGIC &&
+	    be32_to_cpu(block->bb_magic) != XFS_IBT_CRC_MAGIC) {
 		dbprintf(_("bad magic # %#x in inobt block %u/%u\n"),
 			be32_to_cpu(block->bb_magic), seqno, bno);
 		serious_error++;
@@ -4321,6 +4434,79 @@ scanfunc_ino(
 }
 
 static void
+scanfunc_fino(
+	struct xfs_btree_block	*block,
+	int			level,
+	struct xfs_agf		*agf,
+	xfs_agblock_t		bno,
+	int			isroot)
+{
+	xfs_agino_t		agino;
+	xfs_agnumber_t		seqno = be32_to_cpu(agf->agf_seqno);
+	int			i;
+	int			off;
+	xfs_inobt_ptr_t		*pp;
+	struct xfs_inobt_rec	*rp;
+
+	if (be32_to_cpu(block->bb_magic) != XFS_FIBT_MAGIC &&
+	    be32_to_cpu(block->bb_magic) != XFS_FIBT_CRC_MAGIC) {
+		dbprintf(_("bad magic # %#x in finobt block %u/%u\n"),
+			be32_to_cpu(block->bb_magic), seqno, bno);
+		serious_error++;
+		return;
+	}
+	if (be16_to_cpu(block->bb_level) != level) {
+		if (!sflag)
+			dbprintf(_("expected level %d got %d in finobt block "
+				 "%u/%u\n"),
+				level, be16_to_cpu(block->bb_level), seqno, bno);
+		error++;
+	}
+	set_dbmap(seqno, bno, 1, DBM_BTFINO, seqno, bno);
+	if (level == 0) {
+		if (be16_to_cpu(block->bb_numrecs) > mp->m_inobt_mxr[0] ||
+		    (isroot == 0 && be16_to_cpu(block->bb_numrecs) < mp->m_inobt_mnr[0])) {
+			dbprintf(_("bad btree nrecs (%u, min=%u, max=%u) in "
+				 "finobt block %u/%u\n"),
+				be16_to_cpu(block->bb_numrecs), mp->m_inobt_mnr[0],
+				mp->m_inobt_mxr[0], seqno, bno);
+			serious_error++;
+			return;
+		}
+		rp = XFS_INOBT_REC_ADDR(mp, block, 1);
+		for (i = 0; i < be16_to_cpu(block->bb_numrecs); i++) {
+			agino = be32_to_cpu(rp[i].ir_startino);
+			off = XFS_INO_TO_OFFSET(mp, agino);
+			if (off == 0) {
+				if ((sbversion & XFS_SB_VERSION_ALIGNBIT) &&
+				    mp->m_sb.sb_inoalignmt &&
+				    (XFS_INO_TO_AGBNO(mp, agino) %
+				     mp->m_sb.sb_inoalignmt))
+					sbversion &= ~XFS_SB_VERSION_ALIGNBIT;
+				check_set_dbmap(seqno, XFS_AGINO_TO_AGBNO(mp, agino),
+					(xfs_extlen_t)MAX(1,
+						XFS_INODES_PER_CHUNK >>
+						mp->m_sb.sb_inopblog),
+					DBM_INODE, DBM_INODE, seqno, bno);
+			}
+		}
+		return;
+	}
+	if (be16_to_cpu(block->bb_numrecs) > mp->m_inobt_mxr[1] ||
+	    (isroot == 0 && be16_to_cpu(block->bb_numrecs) < mp->m_inobt_mnr[1])) {
+		dbprintf(_("bad btree nrecs (%u, min=%u, max=%u) in finobt block "
+			 "%u/%u\n"),
+			be16_to_cpu(block->bb_numrecs), mp->m_inobt_mnr[1],
+			mp->m_inobt_mxr[1], seqno, bno);
+		serious_error++;
+		return;
+	}
+	pp = XFS_INOBT_PTR_ADDR(mp, block, 1, mp->m_inobt_mxr[1]);
+	for (i = 0; i < be16_to_cpu(block->bb_numrecs); i++)
+		scan_sbtree(agf, be32_to_cpu(pp[i]), level, 0, scanfunc_fino, TYP_FINOBT);
+}
+
+static void
 set_dbmap(
 	xfs_agnumber_t	agno,
 	xfs_agblock_t	agbno,
diff --git a/db/type.c b/db/type.c
index 5c60736..955986b 100644
--- a/db/type.c
+++ b/db/type.c
@@ -141,6 +141,8 @@ static const typ_t	__typtab_spcrc[] = {
 	{ TYP_SYMLINK, "symlink", handle_struct, symlink_crc_hfld,
 		&xfs_symlink_buf_ops },
 	{ TYP_TEXT, "text", handle_text, NULL, NULL },
+	{ TYP_FINOBT, "finobt", handle_struct, inobt_crc_hfld,
+		&xfs_inobt_buf_ops },
 	{ TYP_NONE, NULL }
 };
 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 01/10] libxfs: readahead of dir3 data blocks should use the read verifier
  2015-08-15  1:43 ` [PATCH 01/10] libxfs: readahead of dir3 data blocks should use the read verifier Darrick J. Wong
@ 2015-08-17 18:31   ` Eric Sandeen
  2015-08-17 20:30     ` Darrick J. Wong
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Sandeen @ 2015-08-17 18:31 UTC (permalink / raw)
  To: Darrick J. Wong, david; +Cc: xfs

On 8/14/15 8:43 PM, Darrick J. Wong wrote:
> In the dir3 data block readahead function, use the regular read
> verifier to check the block's CRC and spot-check the block contents
> instead of calling the spot-checking routine directly.  This prevents
> corrupted directory data blocks from being read into the kernel, which
> can lead to garbage ls output and directory loops (if say one of the
> entries contains invalid characters).
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  libxfs/xfs_dir2_data.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> 
> diff --git a/libxfs/xfs_dir2_data.c b/libxfs/xfs_dir2_data.c
> index c475ba8..466e096 100644
> --- a/libxfs/xfs_dir2_data.c
> +++ b/libxfs/xfs_dir2_data.c
> @@ -250,7 +250,8 @@ xfs_dir3_data_reada_verify(
>  		return;
>  	case cpu_to_be32(XFS_DIR2_DATA_MAGIC):
>  	case cpu_to_be32(XFS_DIR3_DATA_MAGIC):
> -		xfs_dir3_data_verify(bp);
> +		bp->b_ops = &xfs_dir3_block_buf_ops;
> +		bp->b_ops->verify_read(bp);

Shouldn't that be xfs_dir3_data_buf_ops ?

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 02/10] xfs_db: don't crash on a corrupt inode
  2015-08-15  1:43 ` [PATCH 02/10] xfs_db: don't crash on a corrupt inode Darrick J. Wong
@ 2015-08-17 18:52   ` Eric Sandeen
  2015-08-17 20:45     ` Darrick J. Wong
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Sandeen @ 2015-08-17 18:52 UTC (permalink / raw)
  To: Darrick J. Wong, david, darrick.wong; +Cc: xfs

On 8/14/15 8:43 PM, Darrick J. Wong wrote:
> If the user selects a corrupt inode via the 'inode XXX' command, the
> read verifier will fail and the io cursor at the top of the ring will
> not have any data attached.  When this is the case, we cannot
> dereference the NULL pointer or xfs_db will crash.  Therefore, check
> the buffer pointer before using it.
> 
> It's arguable that we ought to retry the read without the verifiers
> if the inode is corrupt or fails CRC, since this /is/ a debugging
> tool, and maybe you wanted the contents anyway.

I agree.  It seems like we should do that, though it probably needs to
be done across the board for all metadata types if it's going to be done.
Maybe something to add to the TODO?

> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  db/inode.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> 
> diff --git a/db/inode.c b/db/inode.c
> index e86dabd..64b263b 100644
> --- a/db/inode.c
> +++ b/db/inode.c
> @@ -682,6 +682,8 @@ set_cur_inode(
>  	set_cur(&typtab[TYP_INODE], XFS_AGB_TO_DADDR(mp, agno, cluster_agbno),
>  		numblks, DB_RING_IGN, NULL);
>  	off_cur(offset << mp->m_sb.sb_inodelog, mp->m_sb.sb_inodesize);

off_cur checks for iocur_top == NULL, and warns if it is, so that's good.
The user should have a clue about what's gone wrong, at least.

But, callers of set_cur_inode() are still going to crash often as not:

ablock_f:
        set_cur_inode(iocur_top->ino);
        haveattr = XFS_DFORK_Q((xfs_dinode_t *)iocur_top->data);

bmap:

        set_cur_inode(iocur_top->ino);
        nex = *nexp;
        *nexp = 0;
        ASSERT(nex > 0);
        dip = iocur_top->data;

bmap_f:

                set_cur_inode(iocur_top->ino);
                dip = iocur_top->data;

and a few more :(

Perhaps set_cur_inode() should return failure, so the caller knows to bail,
pop_cur if it needs to, etc?

-Eric

> +	if (!iocur_top->data)
> +		return;
>  	dip = iocur_top->data;
>  	iocur_top->ino_buf = 1;
>  	iocur_top->ino = ino;
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 03/10] xfs_repair: ignore "repaired" flag after we decide to clear xattr block
  2015-08-15  1:43 ` [PATCH 03/10] xfs_repair: ignore "repaired" flag after we decide to clear xattr block Darrick J. Wong
@ 2015-08-17 19:20   ` Eric Sandeen
  2015-08-17 20:50     ` Darrick J. Wong
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Sandeen @ 2015-08-17 19:20 UTC (permalink / raw)
  To: Darrick J. Wong, david, darrick.wong; +Cc: xfs

On 8/14/15 8:43 PM, Darrick J. Wong wrote:
> If in the course of examining extended attribute block contents we
> first decide to repair an entry (*repair = 1) but secondly decide to
> clear the whole block, set *repair = 0 because the clearing action
> only happens if *repair == 0.  Put another way, if we're nuking a
> block, don't pretend like we've fixed it too.

Hm, or what happens otherwise?

TBH this is making my brain hurt, chasing it all the way back up
the callchain.  It's *repair not *repaired; it's not saying we
fixed it, but that we should fix it, right?  It'll take me a while
to work out what all the callers do with the "clearit" return from
process_attr_leaf_block and with *repair.

Did you have a specific example of something going wrong?

thanks,
-Eric

> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  repair/attr_repair.c |   14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> 
> diff --git a/repair/attr_repair.c b/repair/attr_repair.c
> index 62f80e7..2bd9334 100644
> --- a/repair/attr_repair.c
> +++ b/repair/attr_repair.c
> @@ -1311,6 +1311,13 @@ process_leaf_attr_block(
>  		* we can add it then.
>  		*/
>  	}
> +	/*
> +	 * If we're just going to zap the block, don't pretend like we
> +	 * repaired it, because repairing the block stops the clear
> +	 * operation.
> +	 */
> +	if (clearit)
> +		*repair = 0;
>  	if (*repair)
>  		xfs_attr3_leaf_hdr_to_disk(mp->m_attr_geo, leaf, &leafhdr);
>  
> @@ -1524,6 +1531,7 @@ process_longform_attr(
>  	xfs_dahash_t	next_hashval;
>  	int		repairlinks = 0;
>  	struct xfs_attr3_icleaf_hdr leafhdr;
> +	int		error;
>  
>  	*repair = 0;
>  
> @@ -1604,12 +1612,16 @@ process_longform_attr(
>  			libxfs_writebuf(bp, 0);
>  		} else
>  			libxfs_putbuf(bp);
> -		return (process_node_attr(mp, ino, dip, blkmap)); /* + repair */
> +		error = process_node_attr(mp, ino, dip, blkmap); /* + repair */
> +		if (error)
> +			*repair = 0;
> +		return error;
>  	default:
>  		do_warn(
>  	_("bad attribute leaf magic # %#x for dir ino %" PRIu64 "\n"),
>  			be16_to_cpu(leaf->hdr.info.magic), ino);
>  		libxfs_putbuf(bp);
> +		*repair = 0;
>  		return(1);
>  	}
>  
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 04/10] xfs_repair: fix broken EFSBADCRC/EFSCORRUPTED usage with buffer errors
  2015-08-15  1:44 ` [PATCH 04/10] xfs_repair: fix broken EFSBADCRC/EFSCORRUPTED usage with buffer errors Darrick J. Wong
@ 2015-08-17 19:51   ` Eric Sandeen
  2015-08-17 19:57     ` Eric Sandeen
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Sandeen @ 2015-08-17 19:51 UTC (permalink / raw)
  To: Darrick J. Wong, david, darrick.wong; +Cc: xfs

On 8/14/15 8:44 PM, Darrick J. Wong wrote:
> When we encounter CRC or verifier errors, bp->b_error is set to
> -EFSBADCRC and -EFSCORRUPTED; note the negative sign.  For whatever
> reason, repair and db use the positive versions, and therefore fail to
> notice the error, so fix all the broken uses.
> 
> Note however that the db and repair turn the negative codes returned
> by libxfs into positive codes that can be used with strerror.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

This looks right, but I think there's more; see:

XFS_WANT_CORRUPTED_GOTO
XFS_WANT_CORRUPTED_RETURN

(these return negative errors in kernelspace)

and a bunch of stuff in libxlog/xfs_log_recover.c...

FWIW, I guess it's expected that non-libxfs xfsprogs might carry around
positive error numbers, but of course it has to be consistent...

commit 12b5319796439c9442414f82049201d3c740e059
Author: Dave Chinner <dchinner@redhat.com>
Date:   Fri Jul 31 08:33:21 2015 +1000

    libxfs: error negation rework
    
    The libxfs core in the kernel now returns negative error numbers one
    failure rather than positive numbers. This commit switches the
    libxfs core to use negative error numbers and converts all
    the libxfs function callers to negate the returned error so that
    none of the other codeneeds tobe changed at this time.
    
    This allows us to drive negative errors through the xfsprogs code
    base at our leisure rather than having to do it all right now.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>

So this is fine;

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

but there are a few more libxfs/ bits to fix, I guess.

-Eric

> ---
>  db/attr.c     |    4 ++--
>  db/dir2.c     |    4 ++--
>  db/io.c       |    4 ++--
>  db/io.h       |    2 +-
>  repair/dir2.c |    2 +-
>  repair/scan.c |   12 ++++++------
>  6 files changed, 14 insertions(+), 14 deletions(-)
> 
> 
> diff --git a/db/attr.c b/db/attr.c
> index 897834b..5e69100 100644
> --- a/db/attr.c
> +++ b/db/attr.c
> @@ -554,7 +554,7 @@ xfs_attr3_db_read_verify(
>  		break;
>  	default:
>  		dbprintf(_("Unknown attribute buffer type!\n"));
> -		xfs_buf_ioerror(bp, EFSCORRUPTED);
> +		xfs_buf_ioerror(bp, -EFSCORRUPTED);
>  		return;
>  	}
>  verify:
> @@ -566,7 +566,7 @@ xfs_attr3_db_write_verify(
>  	struct xfs_buf		*bp)
>  {
>  	dbprintf(_("Writing unknown attribute buffer type!\n"));
> -	xfs_buf_ioerror(bp, EFSCORRUPTED);
> +	xfs_buf_ioerror(bp, -EFSCORRUPTED);
>  }
>  
>  const struct xfs_buf_ops xfs_attr3_db_buf_ops = {
> diff --git a/db/dir2.c b/db/dir2.c
> index 7f69e6f..cc76662 100644
> --- a/db/dir2.c
> +++ b/db/dir2.c
> @@ -1021,7 +1021,7 @@ xfs_dir3_db_read_verify(
>  		break;
>  	default:
>  		dbprintf(_("Unknown directory buffer type!\n"));
> -		xfs_buf_ioerror(bp, EFSCORRUPTED);
> +		xfs_buf_ioerror(bp, -EFSCORRUPTED);
>  		return;
>  	}
>  verify:
> @@ -1033,7 +1033,7 @@ xfs_dir3_db_write_verify(
>  	struct xfs_buf		*bp)
>  {
>  	dbprintf(_("Writing unknown directory buffer type!\n"));
> -	xfs_buf_ioerror(bp, EFSCORRUPTED);
> +	xfs_buf_ioerror(bp, -EFSCORRUPTED);
>  }
>  
>  const struct xfs_buf_ops xfs_dir3_db_buf_ops = {
> diff --git a/db/io.c b/db/io.c
> index 9fa52b8..9452e07 100644
> --- a/db/io.c
> +++ b/db/io.c
> @@ -535,8 +535,8 @@ set_cur(
>  	 * Keep the buffer even if the verifier says it is corrupted.
>  	 * We're a diagnostic tool, after all.
>  	 */
> -	if (!bp || (bp->b_error && bp->b_error != EFSCORRUPTED &&
> -				   bp->b_error != EFSBADCRC))
> +	if (!bp || (bp->b_error && bp->b_error != -EFSCORRUPTED &&
> +				   bp->b_error != -EFSBADCRC))
>  		return;
>  	iocur_top->buf = bp->b_addr;
>  	iocur_top->bp = bp;
> diff --git a/db/io.h b/db/io.h
> index 31d96b4..6201d7b 100644
> --- a/db/io.h
> +++ b/db/io.h
> @@ -75,6 +75,6 @@ iocur_crc_valid()
>  		return -1;
>  	if (iocur_top->bp->b_flags & LIBXFS_B_UNCHECKED)
>  		return -1;
> -	return (iocur_top->bp->b_error != EFSBADCRC &&
> +	return (iocur_top->bp->b_error != -EFSBADCRC &&
>  		(!iocur_top->ino_buf || iocur_top->ino_crc_ok));
>  }
> diff --git a/repair/dir2.c b/repair/dir2.c
> index 187e069..a5646f8 100644
> --- a/repair/dir2.c
> +++ b/repair/dir2.c
> @@ -199,7 +199,7 @@ _("bad dir magic number 0x%x in inode %" PRIu64 " bno = %u\n"),
>  			goto error_out;
>  		}
>  		/* corrupt node; rebuild the dir. */
> -		if (bp->b_error == EFSBADCRC || bp->b_error == EFSCORRUPTED) {
> +		if (bp->b_error == -EFSBADCRC || bp->b_error == -EFSCORRUPTED) {
>  			do_warn(
>  _("corrupt tree block %u for directory inode %" PRIu64 "\n"),
>  				bno, da_cursor->ino);
> diff --git a/repair/scan.c b/repair/scan.c
> index 58f45eb..1e7a4da 100644
> --- a/repair/scan.c
> +++ b/repair/scan.c
> @@ -82,7 +82,7 @@ scan_sbtree(
>  		do_error(_("can't read btree block %d/%d\n"), agno, root);
>  		return;
>  	}
> -	if (bp->b_error == EFSBADCRC || bp->b_error == EFSCORRUPTED) {
> +	if (bp->b_error == -EFSBADCRC || bp->b_error == -EFSCORRUPTED) {
>  		do_warn(_("btree block %d/%d is suspect, error %d\n"),
>  			agno, root, bp->b_error);
>  		suspect = 1;
> @@ -145,7 +145,7 @@ scan_lbtree(
>  	 * is a corruption or not and whether it got corrected and so needs
>  	 * writing back. CRC errors always imply we need to write the block.
>  	 */
> -	if (bp->b_error == EFSBADCRC) {
> +	if (bp->b_error == -EFSBADCRC) {
>  		do_warn(_("btree block %d/%d is suspect, error %d\n"),
>  			XFS_FSB_TO_AGNO(mp, root),
>  			XFS_FSB_TO_AGBNO(mp, root), bp->b_error);
> @@ -1432,7 +1432,7 @@ scan_freelist(
>  		do_abort(_("can't read agfl block for ag %d\n"), agno);
>  		return;
>  	}
> -	if (agflbuf->b_error == EFSBADCRC)
> +	if (agflbuf->b_error == -EFSBADCRC)
>  		do_warn(_("agfl has bad CRC for ag %d\n"), agno);
>  
>  	freelist = XFS_BUF_TO_AGFL_BNO(mp, agflbuf);
> @@ -1705,9 +1705,9 @@ scan_ag(
>  	 * immediately, though.
>  	 */
>  	if (!no_modify) {
> -		agi_dirty += (agibuf->b_error == EFSBADCRC);
> -		agf_dirty += (agfbuf->b_error == EFSBADCRC);
> -		sb_dirty += (sbbuf->b_error == EFSBADCRC);
> +		agi_dirty += (agibuf->b_error == -EFSBADCRC);
> +		agf_dirty += (agfbuf->b_error == -EFSBADCRC);
> +		sb_dirty += (sbbuf->b_error == -EFSBADCRC);
>  	}
>  
>  	if (agi_dirty && !no_modify)
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 04/10] xfs_repair: fix broken EFSBADCRC/EFSCORRUPTED usage with buffer errors
  2015-08-17 19:51   ` Eric Sandeen
@ 2015-08-17 19:57     ` Eric Sandeen
  0 siblings, 0 replies; 25+ messages in thread
From: Eric Sandeen @ 2015-08-17 19:57 UTC (permalink / raw)
  To: Darrick J. Wong, david, darrick.wong; +Cc: xfs

On 8/17/15 2:51 PM, Eric Sandeen wrote:
> On 8/14/15 8:44 PM, Darrick J. Wong wrote:
>> When we encounter CRC or verifier errors, bp->b_error is set to
>> -EFSBADCRC and -EFSCORRUPTED; note the negative sign.  For whatever
>> reason, repair and db use the positive versions, and therefore fail to
>> notice the error, so fix all the broken uses.
>>
>> Note however that the db and repair turn the negative codes returned
>> by libxfs into positive codes that can be used with strerror.
>>
>> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> This looks right, but I think there's more; see:
> 
> XFS_WANT_CORRUPTED_GOTO
> XFS_WANT_CORRUPTED_RETURN
> 
> (these return negative errors in kernelspace)
> 
> and a bunch of stuff in libxlog/xfs_log_recover.c...

Ok, I guess libxlog/* was never switched to negative, and so far it looks
ok.

The XFS_WANT_CORRUPTED* macros seem like a problem, though.

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 01/10] libxfs: readahead of dir3 data blocks should use the read verifier
  2015-08-17 18:31   ` Eric Sandeen
@ 2015-08-17 20:30     ` Darrick J. Wong
  0 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-17 20:30 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

On Mon, Aug 17, 2015 at 01:31:23PM -0500, Eric Sandeen wrote:
> On 8/14/15 8:43 PM, Darrick J. Wong wrote:
> > In the dir3 data block readahead function, use the regular read
> > verifier to check the block's CRC and spot-check the block contents
> > instead of calling the spot-checking routine directly.  This prevents
> > corrupted directory data blocks from being read into the kernel, which
> > can lead to garbage ls output and directory loops (if say one of the
> > entries contains invalid characters).
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  libxfs/xfs_dir2_data.c |    3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > 
> > diff --git a/libxfs/xfs_dir2_data.c b/libxfs/xfs_dir2_data.c
> > index c475ba8..466e096 100644
> > --- a/libxfs/xfs_dir2_data.c
> > +++ b/libxfs/xfs_dir2_data.c
> > @@ -250,7 +250,8 @@ xfs_dir3_data_reada_verify(
> >  		return;
> >  	case cpu_to_be32(XFS_DIR2_DATA_MAGIC):
> >  	case cpu_to_be32(XFS_DIR3_DATA_MAGIC):
> > -		xfs_dir3_data_verify(bp);
> > +		bp->b_ops = &xfs_dir3_block_buf_ops;
> > +		bp->b_ops->verify_read(bp);
> 
> Shouldn't that be xfs_dir3_data_buf_ops ?

D'oh.  Yep, good catch.

--D

> 
> -Eric
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 02/10] xfs_db: don't crash on a corrupt inode
  2015-08-17 18:52   ` Eric Sandeen
@ 2015-08-17 20:45     ` Darrick J. Wong
  0 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-17 20:45 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Darrick J. Wong, xfs

On Mon, Aug 17, 2015 at 01:52:18PM -0500, Eric Sandeen wrote:
> On 8/14/15 8:43 PM, Darrick J. Wong wrote:
> > If the user selects a corrupt inode via the 'inode XXX' command, the
> > read verifier will fail and the io cursor at the top of the ring will
> > not have any data attached.  When this is the case, we cannot
> > dereference the NULL pointer or xfs_db will crash.  Therefore, check
> > the buffer pointer before using it.
> > 
> > It's arguable that we ought to retry the read without the verifiers
> > if the inode is corrupt or fails CRC, since this /is/ a debugging
> > tool, and maybe you wanted the contents anyway.
> 
> I agree.  It seems like we should do that, though it probably needs to
> be done across the board for all metadata types if it's going to be done.
> Maybe something to add to the TODO?

Hmm.  Looks like set_cur() wants to return buffer contents in the verifier
error case, except that the broken xfs_buf->b_error sign handling prevents
it from doing that... and the sign handling error is fixed later on in the
patch set.

So... that leaves hard IO errors, for which we still need this patch.

--D

> 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  db/inode.c |    2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > 
> > diff --git a/db/inode.c b/db/inode.c
> > index e86dabd..64b263b 100644
> > --- a/db/inode.c
> > +++ b/db/inode.c
> > @@ -682,6 +682,8 @@ set_cur_inode(
> >  	set_cur(&typtab[TYP_INODE], XFS_AGB_TO_DADDR(mp, agno, cluster_agbno),
> >  		numblks, DB_RING_IGN, NULL);
> >  	off_cur(offset << mp->m_sb.sb_inodelog, mp->m_sb.sb_inodesize);
> 
> off_cur checks for iocur_top == NULL, and warns if it is, so that's good.
> The user should have a clue about what's gone wrong, at least.
> 
> But, callers of set_cur_inode() are still going to crash often as not:
> 
> ablock_f:
>         set_cur_inode(iocur_top->ino);
>         haveattr = XFS_DFORK_Q((xfs_dinode_t *)iocur_top->data);
> 
> bmap:
> 
>         set_cur_inode(iocur_top->ino);
>         nex = *nexp;
>         *nexp = 0;
>         ASSERT(nex > 0);
>         dip = iocur_top->data;
> 
> bmap_f:
> 
>                 set_cur_inode(iocur_top->ino);
>                 dip = iocur_top->data;
> 
> and a few more :(
> 
> Perhaps set_cur_inode() should return failure, so the caller knows to bail,
> pop_cur if it needs to, etc?
> 
> -Eric
> 
> > +	if (!iocur_top->data)
> > +		return;
> >  	dip = iocur_top->data;
> >  	iocur_top->ino_buf = 1;
> >  	iocur_top->ino = ino;
> > 
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs
> > 
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 03/10] xfs_repair: ignore "repaired" flag after we decide to clear xattr block
  2015-08-17 19:20   ` Eric Sandeen
@ 2015-08-17 20:50     ` Darrick J. Wong
  0 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-17 20:50 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Darrick J. Wong, xfs

On Mon, Aug 17, 2015 at 02:20:32PM -0500, Eric Sandeen wrote:
> On 8/14/15 8:43 PM, Darrick J. Wong wrote:
> > If in the course of examining extended attribute block contents we
> > first decide to repair an entry (*repair = 1) but secondly decide to
> > clear the whole block, set *repair = 0 because the clearing action
> > only happens if *repair == 0.  Put another way, if we're nuking a
> > block, don't pretend like we've fixed it too.
> 
> Hm, or what happens otherwise?
> 
> TBH this is making my brain hurt, chasing it all the way back up
> the callchain.  It's *repair not *repaired; it's not saying we
> fixed it, but that we should fix it, right?  It'll take me a while
> to work out what all the callers do with the "clearit" return from
> process_attr_leaf_block and with *repair.

I thought *repair actually did mean *repaired (past tense).  But I might just
be confused?  xfs_attr.c:939 says:

do_warn(
	_("removing attribute entry %d for inode %" PRIu64 "\n"),
	i, ino);
tempentry = (xfs_attr_sf_entry_t *)
	((intptr_t) currententry +
	 XFS_ATTR_SF_ENTSIZE(currententry));
memmove(currententry,tempentry,remainingspace);
asf->hdr.count -= 1;
i--; /* no worries, it will wrap back to 0 */
*repair = 1;

...that looks like a repair to me?  Also see attr_repair.c:965:

	if (asf->hdr.count != i)  {
		if (no_modify)  {
			do_warn(
_("would have corrected attribute entry count in inode %" PRIu64 " from %d to %d\n"),
				ino, asf->hdr.count, i);
		} else  {
			do_warn(
_("corrected attribute entry count in inode %" PRIu64 ", was %d, now %d\n"),
				ino, asf->hdr.count, i);
			asf->hdr.count = i;
			*repair = 1;
		}
	}

...looks like we fix asf->hdr.count and then set *repair to 1.

> Did you have a specific example of something going wrong?

A few dozen runs of xfs/72[0-2] are pretty good at hitting this.

--D

> 
> thanks,
> -Eric
> 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  repair/attr_repair.c |   14 +++++++++++++-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> > 
> > 
> > diff --git a/repair/attr_repair.c b/repair/attr_repair.c
> > index 62f80e7..2bd9334 100644
> > --- a/repair/attr_repair.c
> > +++ b/repair/attr_repair.c
> > @@ -1311,6 +1311,13 @@ process_leaf_attr_block(
> >  		* we can add it then.
> >  		*/
> >  	}
> > +	/*
> > +	 * If we're just going to zap the block, don't pretend like we
> > +	 * repaired it, because repairing the block stops the clear
> > +	 * operation.
> > +	 */
> > +	if (clearit)
> > +		*repair = 0;
> >  	if (*repair)
> >  		xfs_attr3_leaf_hdr_to_disk(mp->m_attr_geo, leaf, &leafhdr);
> >  
> > @@ -1524,6 +1531,7 @@ process_longform_attr(
> >  	xfs_dahash_t	next_hashval;
> >  	int		repairlinks = 0;
> >  	struct xfs_attr3_icleaf_hdr leafhdr;
> > +	int		error;
> >  
> >  	*repair = 0;
> >  
> > @@ -1604,12 +1612,16 @@ process_longform_attr(
> >  			libxfs_writebuf(bp, 0);
> >  		} else
> >  			libxfs_putbuf(bp);
> > -		return (process_node_attr(mp, ino, dip, blkmap)); /* + repair */
> > +		error = process_node_attr(mp, ino, dip, blkmap); /* + repair */
> > +		if (error)
> > +			*repair = 0;
> > +		return error;
> >  	default:
> >  		do_warn(
> >  	_("bad attribute leaf magic # %#x for dir ino %" PRIu64 "\n"),
> >  			be16_to_cpu(leaf->hdr.info.magic), ino);
> >  		libxfs_putbuf(bp);
> > +		*repair = 0;
> >  		return(1);
> >  	}
> >  
> > 
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs
> > 
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 05/10] xfs_repair: force not-so-bad bmbt blocks back through the verifier
  2015-08-15  1:44 ` [PATCH 05/10] xfs_repair: force not-so-bad bmbt blocks back through the verifier Darrick J. Wong
@ 2015-08-17 21:14   ` Eric Sandeen
  2015-08-17 23:48     ` Darrick J. Wong
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Sandeen @ 2015-08-17 21:14 UTC (permalink / raw)
  To: Darrick J. Wong, david, darrick.wong; +Cc: xfs

On 8/14/15 8:44 PM, Darrick J. Wong wrote:
> If during prefetch we encounter a bmbt block that fails the CRC check
> due to corruption in the unused part of the block, force the buffer
> back through the non-prefetch verifiers later so that the CRC is
> updated.  Otherwise, the bad checksum goes unfixed and the kernel will
> still flag the bmbt block as invalid.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  repair/prefetch.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> 
> diff --git a/repair/prefetch.c b/repair/prefetch.c
> index 8b261ae..fc7097f 100644
> --- a/repair/prefetch.c
> +++ b/repair/prefetch.c
> @@ -276,6 +276,14 @@ pf_scan_lbtree(
>  
>  	XFS_BUF_SET_PRIORITY(bp, isadir ? B_DIR_BMAP : B_BMAP);
>  
> +	/*
> +	 * Make this bmbt buffer go back through the verifiers later so that
> +	 * we correct checksum errors stemming from bitflips in the unused
> +	 * parts of the bmbt block.
> +	 */
> +	if (bp->b_error == -EFSBADCRC || bp->b_error == -EFSCORRUPTED)
> +		bp->b_flags |= LIBXFS_B_UNCHECKED;

Hm, so why check EFSCORRUPTED?  If you're doing what the comment says, why
not just EFSBADCRC?  EFSCORRUPTED means that in-use portions are bad, no?

-Eric

> +
>  	rc = (*func)(XFS_BUF_TO_BLOCK(bp), level - 1, isadir, args);
>  
>  	libxfs_putbuf(bp);
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 05/10] xfs_repair: force not-so-bad bmbt blocks back through the verifier
  2015-08-17 21:14   ` Eric Sandeen
@ 2015-08-17 23:48     ` Darrick J. Wong
  0 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-17 23:48 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Darrick J. Wong, xfs

On Mon, Aug 17, 2015 at 04:14:27PM -0500, Eric Sandeen wrote:
> On 8/14/15 8:44 PM, Darrick J. Wong wrote:
> > If during prefetch we encounter a bmbt block that fails the CRC check
> > due to corruption in the unused part of the block, force the buffer
> > back through the non-prefetch verifiers later so that the CRC is
> > updated.  Otherwise, the bad checksum goes unfixed and the kernel will
> > still flag the bmbt block as invalid.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  repair/prefetch.c |    8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > 
> > diff --git a/repair/prefetch.c b/repair/prefetch.c
> > index 8b261ae..fc7097f 100644
> > --- a/repair/prefetch.c
> > +++ b/repair/prefetch.c
> > @@ -276,6 +276,14 @@ pf_scan_lbtree(
> >  
> >  	XFS_BUF_SET_PRIORITY(bp, isadir ? B_DIR_BMAP : B_BMAP);
> >  
> > +	/*
> > +	 * Make this bmbt buffer go back through the verifiers later so that
> > +	 * we correct checksum errors stemming from bitflips in the unused
> > +	 * parts of the bmbt block.
> > +	 */
> > +	if (bp->b_error == -EFSBADCRC || bp->b_error == -EFSCORRUPTED)
> > +		bp->b_flags |= LIBXFS_B_UNCHECKED;
> 
> Hm, so why check EFSCORRUPTED?  If you're doing what the comment says, why
> not just EFSBADCRC?  EFSCORRUPTED means that in-use portions are bad, no?

You're right, I only need to hook onto EFSBADCRC.  I probably got too used
to seeing checks for both everywhere. 8)

--D

> 
> -Eric
> 
> > +
> >  	rc = (*func)(XFS_BUF_TO_BLOCK(bp), level - 1, isadir, args);
> >  
> >  	libxfs_putbuf(bp);
> > 
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs
> > 
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 08/10] xfs_db: enable blocktrash for checksummed filesystems
  2015-08-15  1:44 ` [PATCH 08/10] xfs_db: enable blocktrash for checksummed filesystems Darrick J. Wong
@ 2015-08-18 19:26   ` Eric Sandeen
  2015-08-19 15:22     ` Darrick J. Wong
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Sandeen @ 2015-08-18 19:26 UTC (permalink / raw)
  To: Darrick J. Wong, david; +Cc: xfs

On 8/14/15 8:44 PM, Darrick J. Wong wrote:
> Disable the write verifiers when we're trashing a block.  With this
> in place, create a xfs fuzzer script that formats, populates, corrupts,
> tries to use, repairs, and tries again to use a crash test xfs image.
> Hopefully this will shake out some v5 filesystem bugs.

Maybe "we can create an xfs fuzzer script ..." (since it's not in this
patch)

> v2: Drop xfsfuzz, don't assume every block is an AGF when blocktrashing.
> Don't trash log blocks by default, because that skews the blocktrash
> heavily towards damaging only log blocks.

and skip DBM_SB by default as well, right?

And you added log blocks & symlinks to the allowed mask.

So I think something like:

Allow trashing of symlink & log blocks.
By default, do not trash superblocks (why?) or log blocks (because ...)

> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  db/check.c |   10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/db/check.c b/db/check.c
> index afeea32..965d0f5 100644
> --- a/db/check.c
> +++ b/db/check.c
> @@ -944,6 +944,7 @@ blocktrash_b(
>  	int		mask;
>  	int		newbit;
>  	int		offset;
> +	const struct xfs_buf_ops *stashed_ops;
>  	static char	*modestr[] = {
>  		N_("zeroed"), N_("set"), N_("flipped"), N_("randomized")
>  	};
> @@ -952,8 +953,10 @@ blocktrash_b(
>  	offset = (int)(random() % (int)(mp->m_sb.sb_blocksize * NBBY));
>  	newbit = 0;
>  	push_cur();
> -	set_cur(&typtab[DBM_UNKNOWN],
> +	set_cur(NULL,
>  		XFS_AGB_TO_DADDR(mp, agno, agbno), blkbb, DB_RING_IGN, NULL);

Ok, you talked about this back on 5/28/15.  Weird.

But calling it with NULL is odd, too; nothing else does that.  What about TYP_NONE,

        { TYP_NONE, NULL }

its ops are NULL, as well... does that work?
Huh, ok, no callers w/ TYP_NONE, either.  I guess NULL works.

> +	stashed_ops = iocur_top->bp->b_ops;
> +	iocur_top->bp->b_ops = NULL;
>  	if ((buf = iocur_top->data) == NULL) {
>  		dbprintf(_("can't read block %u/%u for trashing\n"), agno, agbno);
>  		pop_cur();
> @@ -984,6 +987,7 @@ blocktrash_b(
>  			buf[byte] &= ~mask;
>  	}
>  	write_cur();
> +	iocur_top->bp->b_ops = stashed_ops;

*nod*

>  	pop_cur();
>  	printf(_("blocktrash: %u/%u %s block %d bit%s starting %d:%d %s\n"),
>  		agno, agbno, typename[type], len, len == 1 ? "" : "s",
> @@ -1040,9 +1044,11 @@ blocktrash_f(
>  		   (1 << DBM_BTINO) |
>  		   (1 << DBM_DIR) |
>  		   (1 << DBM_INODE) |
> +		   (1 << DBM_LOG) |

Ok, so you allow log blocks to be specified,

>  		   (1 << DBM_QUOTA) |
>  		   (1 << DBM_RTBITMAP) |
>  		   (1 << DBM_RTSUM) |
> +		   (1 << DBM_SYMLINK) |

and symlink blocks too, but...

>  		   (1 << DBM_SB);
>  	while ((c = getopt(argc, argv, "0123n:s:t:x:y:")) != EOF) {
>  		switch (c) {
> @@ -1106,7 +1112,7 @@ blocktrash_f(
>  		return 0;
>  	}
>  	if (tmask == 0)
> -		tmask = goodmask;
> +		tmask = goodmask & ~((1 << DBM_LOG) | (1 << DBM_SB));

you disable log & superblocks by default if no mask is specified.

I'm not 100% sure why you want to change this, what did you run into,
in practice, if they were allowed?

If the change stays, then the xfs_db manpage needs an update:

"If no -t options are given then all metadata types can be trashed."

Thanks,
-Eric


>  	lentab = xmalloc(sizeof(ltab_t));
>  	lentab->min = lentab->max = min;
>  	lentablen = 1;
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 09/10] xfs_db: trash the block at the top of the cursor stack
  2015-08-15  1:44 ` [PATCH 09/10] xfs_db: trash the block at the top of the cursor stack Darrick J. Wong
@ 2015-08-18 19:59   ` Eric Sandeen
  2015-08-19 15:12     ` Darrick J. Wong
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Sandeen @ 2015-08-18 19:59 UTC (permalink / raw)
  To: Darrick J. Wong, david; +Cc: xfs

On 8/14/15 8:44 PM, Darrick J. Wong wrote:
> Add a new -z option to blocktrash to make it trash the block that's at
> the top of the stack, so that we can perform targeted fuzzing.  While
> we're at it, prevent fuzzing off the end of the buffer and add a -o
> parameter so that we can specify an offset to start fuzzing from.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  db/check.c        |   81 +++++++++++++++++++++++++++++++++++++++++------------
>  man/man8/xfs_db.8 |   15 +++++++++-
>  2 files changed, 77 insertions(+), 19 deletions(-)
> 
> 
> diff --git a/db/check.c b/db/check.c
> index 965d0f5..7c11b0b 100644
> --- a/db/check.c
> +++ b/db/check.c
> @@ -930,8 +930,7 @@ typedef struct ltab {
>  
>  static void
>  blocktrash_b(
> -	xfs_agnumber_t	agno,
> -	xfs_agblock_t	agbno,
> +	int		offset,

a comment about "offset into what?" might be nice

>  	dbm_t		type,
>  	ltab_t		*ltabp,
>  	int		mode)
> @@ -943,23 +942,36 @@ blocktrash_b(
>  	int		len;
>  	int		mask;
>  	int		newbit;
> -	int		offset;
>  	const struct xfs_buf_ops *stashed_ops;
>  	static char	*modestr[] = {
>  		N_("zeroed"), N_("set"), N_("flipped"), N_("randomized")
>  	};
> +	xfs_agnumber_t	agno;
> +	xfs_agblock_t	agbno;
>  
> +	agno = XFS_FSB_TO_AGNO(mp, XFS_DADDR_TO_FSB(mp, iocur_top->bb));
> +	agbno = XFS_FSB_TO_AGBNO(mp, XFS_DADDR_TO_FSB(mp, iocur_top->bb));
> +	if (iocur_top->len == 0) {
> +		dbprintf(_("zero-length block %u/%u buffer to trash??\n"),
> +				agno, agbno);
> +		return;
> +	}
>  	len = (int)((random() % (ltabp->max - ltabp->min + 1)) + ltabp->min);
> -	offset = (int)(random() % (int)(mp->m_sb.sb_blocksize * NBBY));
> +	/*
> +	 * offset >= 0: start fuzzing at this exact offset.

"this exact bit offset"

> +	 * offset < 0: pick an offset at least as high at -(offset + 1).

"a bit offset ..."

(units would be nice, in the comments.  I had assumed bytes, even though you have NBBY)

Ow, my brain.  ;)

Your manpage says:

> If the value is preceded by a '+', the
> +trashing will start at a randomly chosen offset that is larger than the value
> +supplied.

which seems sane.

If we supply +10,

+			if (relative)
+				offset = -offset - 1;

so now offset = -10 - 1, and we get -11.
Then:

> +	 */
> +	if (offset < 0) {
> +		offset = -(offset + 1);

now offset = -(-11 + 1) = -(-10) = 10.  Okay... so 10 or higher.

What's w/ the +/- 1?  

Why not just:

if (relative)
	offset = -offset;	/* i.e. -10 */

...

if (offset < 0) {
	offset = -offset;	/* i.e. 10 */
	offset = offset + (int)(random() % (int)((iocur_top->len - offset) * NBBY)); ...


> +		offset = offset + (int)(random() % (int)((iocur_top->len - offset) * NBBY));
> +	}
> +	if (offset + len >= iocur_top->len * NBBY)
> +		len = (iocur_top->len * NBBY) - offset;
>  	newbit = 0;
> -	push_cur();
> -	set_cur(NULL,
> -		XFS_AGB_TO_DADDR(mp, agno, agbno), blkbb, DB_RING_IGN, NULL);
>  	stashed_ops = iocur_top->bp->b_ops;
>  	iocur_top->bp->b_ops = NULL;
>  	if ((buf = iocur_top->data) == NULL) {
>  		dbprintf(_("can't read block %u/%u for trashing\n"), agno, agbno);
> -		pop_cur();
>  		return;
>  	}
>  	for (bitno = 0; bitno < len; bitno++) {
> @@ -988,7 +1000,6 @@ blocktrash_b(
>  	}
>  	write_cur();
>  	iocur_top->bp->b_ops = stashed_ops;
> -	pop_cur();
>  	printf(_("blocktrash: %u/%u %s block %d bit%s starting %d:%d %s\n"),
>  		agno, agbno, typename[type], len, len == 1 ? "" : "s",
>  		offset / NBBY, offset % NBBY, modestr[mode]);
> @@ -1019,11 +1030,9 @@ blocktrash_f(
>  	uint		seed;
>  	int		sopt;
>  	int		tmask;
> +	bool		this_block = false;
> +	int		offset = -1;
>  
> -	if (!dbmap) {
> -		dbprintf(_("must run blockget first\n"));
> -		return 0;
> -	}
>  	optind = 0;
>  	count = 1;
>  	min = 1;
> @@ -1050,7 +1059,7 @@ blocktrash_f(
>  		   (1 << DBM_RTSUM) |
>  		   (1 << DBM_SYMLINK) |
>  		   (1 << DBM_SB);
> -	while ((c = getopt(argc, argv, "0123n:s:t:x:y:")) != EOF) {
> +	while ((c = getopt(argc, argv, "0123n:o:s:t:x:y:z")) != EOF) {
>  		switch (c) {
>  		case '0':
>  			mode = 0;
> @@ -1071,6 +1080,21 @@ blocktrash_f(
>  				return 0;
>  			}
>  			break;
> +		case 'o': {
> +			int relative = 0;
> +			if (optarg[0] == '+') {
> +				optarg++;
> +				relative = 1;
> +			}
> +			offset = (int)strtol(optarg, &p, 0);
> +			if (*p != '\0' || offset < 0) {
> +				dbprintf(_("bad blocktrash offset %s\n"), optarg);
> +				return 0;
> +			}
> +			if (relative)
> +				offset = -offset - 1;
> +			break;
> +		}
>  		case 's':
>  			seed = (uint)strtoul(optarg, &p, 0);
>  			sopt = 1;
> @@ -1102,11 +1126,22 @@ blocktrash_f(
>  				return 0;
>  			}
>  			break;
> +		case 'z':
> +			this_block = true;
> +			break;

is there any mnemonic for 'z'?  Maybe 'c' for Current, or 'b' for (this one) Block?

Not that big a deal, just wondering.

>  		default:
>  			dbprintf(_("bad option for blocktrash command\n"));
>  			return 0;
>  		}
>  	}
> +	if (!this_block && !dbmap) {
> +		dbprintf(_("must run blockget first\n"));
> +		return 0;
> +	}
> +	if (this_block && iocur_sp == 0) {
> +		dbprintf(_("nothing on stack\n"));
> +		return 0;
> +	}
>  	if (min > max) {
>  		dbprintf(_("bad min/max for blocktrash command\n"));
>  		return 0;
> @@ -1125,6 +1160,14 @@ blocktrash_f(
>  		} else
>  			lentab[lentablen - 1].max = i;
>  	}
> +	if (!sopt)
> +		dbprintf(_("blocktrash: seed %u\n"), seed);

does this extra output break any xfstests?

# grep -r blocktrash common/ tests/xfs
# 

maybe not!

> +	srandom(seed);
> +	if (this_block) {
> +		blocktrash_b(offset, DBM_UNKNOWN, &lentab[random() % lentablen],
> +				mode);
> +		goto out;
> +	}
>  	for (blocks = 0, agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
>  		for (agbno = 0, p = dbmap[agno];
>  		     agbno < mp->m_sb.sb_agblocks;
> @@ -1137,9 +1180,6 @@ blocktrash_f(
>  		dbprintf(_("blocktrash: no matching blocks\n"));
>  		goto out;
>  	}
> -	if (!sopt)
> -		dbprintf(_("blocktrash: seed %u\n"), seed);
> -	srandom(seed);
>  	for (i = 0; i < count; i++) {
>  		randb = (xfs_rfsblock_t)((((__int64_t)random() << 32) |
>  					 random()) % blocks);
> @@ -1153,8 +1193,13 @@ blocktrash_f(
>  					continue;
>  				if (bi++ < randb)
>  					continue;
> -				blocktrash_b(agno, agbno, (dbm_t)*p,
> +				push_cur();
> +				set_cur(NULL,
> +					XFS_AGB_TO_DADDR(mp, agno, agbno),
> +					blkbb, DB_RING_IGN, NULL);
> +				blocktrash_b(offset, (dbm_t)*p,
>  					&lentab[random() % lentablen], mode);
> +				pop_cur();
>  				done = 1;
>  				break;
>  			}
> diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
> index df54bb7..681efc4 100644
> --- a/man/man8/xfs_db.8
> +++ b/man/man8/xfs_db.8
> @@ -232,7 +232,7 @@ enables verbose output. Messages will be printed for every block and
>  inode processed.
>  .RE
>  .TP
> -.BI "blocktrash [\-n " count "] [\-x " min "] [\-y " max "] [\-s " seed "] [\-0|1|2|3] [\-t " type "] ..."
> +.BI "blocktrash [-z] [\-o " offset "] [\-n " count "] [\-x " min "] [\-y " max "] [\-s " seed "] [\-0|1|2|3] [\-t " type "] ..."
>  Trash randomly selected filesystem metadata blocks.
>  Trashing occurs to randomly selected bits in the chosen blocks.
>  This command is available only in debugging versions of
> @@ -259,6 +259,13 @@ supplies the
>  .I count
>  of block-trashings to perform (default 1).
>  .TP
> +.B \-o
> +supplies the bit
> +.I offset
> +at which to start trashing the block.  If the value is preceded by a '+', the
> +trashing will start at a randomly chosen offset that is larger than the value
> +supplied.  The default is to randomly choose an offset anywhere in the block.
> +.TP
>  .B \-s
>  supplies a
>  .I seed
> @@ -282,6 +289,12 @@ size of bit range to be trashed. The default value is 1.
>  sets the
>  .I maximum
>  size of bit range to be trashed. The default value is 1024.
> +.TP
> +.B \-z
> +trashes the block at the top of the stack.  It is not necessary to
> +run
> +.BI blockget
> +if this option is supplied.
>  .RE
>  .TP
>  .BI "blockuse [\-n] [\-c " count ]
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 09/10] xfs_db: trash the block at the top of the cursor stack
  2015-08-18 19:59   ` Eric Sandeen
@ 2015-08-19 15:12     ` Darrick J. Wong
  0 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-19 15:12 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

On Tue, Aug 18, 2015 at 02:59:28PM -0500, Eric Sandeen wrote:
> On 8/14/15 8:44 PM, Darrick J. Wong wrote:
> > Add a new -z option to blocktrash to make it trash the block that's at
> > the top of the stack, so that we can perform targeted fuzzing.  While
> > we're at it, prevent fuzzing off the end of the buffer and add a -o
> > parameter so that we can specify an offset to start fuzzing from.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  db/check.c        |   81 +++++++++++++++++++++++++++++++++++++++++------------
> >  man/man8/xfs_db.8 |   15 +++++++++-
> >  2 files changed, 77 insertions(+), 19 deletions(-)
> > 
> > 
> > diff --git a/db/check.c b/db/check.c
> > index 965d0f5..7c11b0b 100644
> > --- a/db/check.c
> > +++ b/db/check.c
> > @@ -930,8 +930,7 @@ typedef struct ltab {
> >  
> >  static void
> >  blocktrash_b(
> > -	xfs_agnumber_t	agno,
> > -	xfs_agblock_t	agbno,
> > +	int		offset,
> 
> a comment about "offset into what?" might be nice
> 
> >  	dbm_t		type,
> >  	ltab_t		*ltabp,
> >  	int		mode)
> > @@ -943,23 +942,36 @@ blocktrash_b(
> >  	int		len;
> >  	int		mask;
> >  	int		newbit;
> > -	int		offset;
> >  	const struct xfs_buf_ops *stashed_ops;
> >  	static char	*modestr[] = {
> >  		N_("zeroed"), N_("set"), N_("flipped"), N_("randomized")
> >  	};
> > +	xfs_agnumber_t	agno;
> > +	xfs_agblock_t	agbno;
> >  
> > +	agno = XFS_FSB_TO_AGNO(mp, XFS_DADDR_TO_FSB(mp, iocur_top->bb));
> > +	agbno = XFS_FSB_TO_AGBNO(mp, XFS_DADDR_TO_FSB(mp, iocur_top->bb));
> > +	if (iocur_top->len == 0) {
> > +		dbprintf(_("zero-length block %u/%u buffer to trash??\n"),
> > +				agno, agbno);
> > +		return;
> > +	}
> >  	len = (int)((random() % (ltabp->max - ltabp->min + 1)) + ltabp->min);
> > -	offset = (int)(random() % (int)(mp->m_sb.sb_blocksize * NBBY));
> > +	/*
> > +	 * offset >= 0: start fuzzing at this exact offset.
> 
> "this exact bit offset"
> 
> > +	 * offset < 0: pick an offset at least as high at -(offset + 1).
> 
> "a bit offset ..."
> 
> (units would be nice, in the comments.  I had assumed bytes, even though you have NBBY)

Ok.

> Ow, my brain.  ;)
> 
> Your manpage says:
> 
> > If the value is preceded by a '+', the
> > +trashing will start at a randomly chosen offset that is larger than the value
> > +supplied.
> 
> which seems sane.
> 
> If we supply +10,
> 
> +			if (relative)
> +				offset = -offset - 1;
> 
> so now offset = -10 - 1, and we get -11.
> Then:
> 
> > +	 */
> > +	if (offset < 0) {
> > +		offset = -(offset + 1);
> 
> now offset = -(-11 + 1) = -(-10) = 10.  Okay... so 10 or higher.
> 
> What's w/ the +/- 1?  

It's to distinguish "start fuzzing at exactly zero" from "pick anywhere
(at least as high as bit offset zero) to start fuzzing".

> 
> Why not just:
> 
> if (relative)
> 	offset = -offset;	/* i.e. -10 */
> 
> ...
> 
> if (offset < 0) {
> 	offset = -offset;	/* i.e. 10 */
> 	offset = offset + (int)(random() % (int)((iocur_top->len - offset) * NBBY)); ...
> 
> 
> > +		offset = offset + (int)(random() % (int)((iocur_top->len - offset) * NBBY));
> > +	}
> > +	if (offset + len >= iocur_top->len * NBBY)
> > +		len = (iocur_top->len * NBBY) - offset;
> >  	newbit = 0;
> > -	push_cur();
> > -	set_cur(NULL,
> > -		XFS_AGB_TO_DADDR(mp, agno, agbno), blkbb, DB_RING_IGN, NULL);
> >  	stashed_ops = iocur_top->bp->b_ops;
> >  	iocur_top->bp->b_ops = NULL;
> >  	if ((buf = iocur_top->data) == NULL) {
> >  		dbprintf(_("can't read block %u/%u for trashing\n"), agno, agbno);
> > -		pop_cur();
> >  		return;
> >  	}
> >  	for (bitno = 0; bitno < len; bitno++) {
> > @@ -988,7 +1000,6 @@ blocktrash_b(
> >  	}
> >  	write_cur();
> >  	iocur_top->bp->b_ops = stashed_ops;
> > -	pop_cur();
> >  	printf(_("blocktrash: %u/%u %s block %d bit%s starting %d:%d %s\n"),
> >  		agno, agbno, typename[type], len, len == 1 ? "" : "s",
> >  		offset / NBBY, offset % NBBY, modestr[mode]);
> > @@ -1019,11 +1030,9 @@ blocktrash_f(
> >  	uint		seed;
> >  	int		sopt;
> >  	int		tmask;
> > +	bool		this_block = false;
> > +	int		offset = -1;
> >  
> > -	if (!dbmap) {
> > -		dbprintf(_("must run blockget first\n"));
> > -		return 0;
> > -	}
> >  	optind = 0;
> >  	count = 1;
> >  	min = 1;
> > @@ -1050,7 +1059,7 @@ blocktrash_f(
> >  		   (1 << DBM_RTSUM) |
> >  		   (1 << DBM_SYMLINK) |
> >  		   (1 << DBM_SB);
> > -	while ((c = getopt(argc, argv, "0123n:s:t:x:y:")) != EOF) {
> > +	while ((c = getopt(argc, argv, "0123n:o:s:t:x:y:z")) != EOF) {
> >  		switch (c) {
> >  		case '0':
> >  			mode = 0;
> > @@ -1071,6 +1080,21 @@ blocktrash_f(
> >  				return 0;
> >  			}
> >  			break;
> > +		case 'o': {
> > +			int relative = 0;
> > +			if (optarg[0] == '+') {
> > +				optarg++;
> > +				relative = 1;
> > +			}
> > +			offset = (int)strtol(optarg, &p, 0);
> > +			if (*p != '\0' || offset < 0) {
> > +				dbprintf(_("bad blocktrash offset %s\n"), optarg);
> > +				return 0;
> > +			}
> > +			if (relative)
> > +				offset = -offset - 1;
> > +			break;
> > +		}
> >  		case 's':
> >  			seed = (uint)strtoul(optarg, &p, 0);
> >  			sopt = 1;
> > @@ -1102,11 +1126,22 @@ blocktrash_f(
> >  				return 0;
> >  			}
> >  			break;
> > +		case 'z':
> > +			this_block = true;
> > +			break;
> 
> is there any mnemonic for 'z'?  Maybe 'c' for Current, or 'b' for (this one) Block?
> 
> Not that big a deal, just wondering.

-z for "Corrupt 'zis block, pleaze". :)

I was hoping it is less confusing than "-t" for "select eligible block types"
and "-T" for "corrupt this exact block"?

> 
> >  		default:
> >  			dbprintf(_("bad option for blocktrash command\n"));
> >  			return 0;
> >  		}
> >  	}
> > +	if (!this_block && !dbmap) {
> > +		dbprintf(_("must run blockget first\n"));
> > +		return 0;
> > +	}
> > +	if (this_block && iocur_sp == 0) {
> > +		dbprintf(_("nothing on stack\n"));
> > +		return 0;
> > +	}
> >  	if (min > max) {
> >  		dbprintf(_("bad min/max for blocktrash command\n"));
> >  		return 0;
> > @@ -1125,6 +1160,14 @@ blocktrash_f(
> >  		} else
> >  			lentab[lentablen - 1].max = i;
> >  	}
> > +	if (!sopt)
> > +		dbprintf(_("blocktrash: seed %u\n"), seed);
> 
> does this extra output break any xfstests?
> 
> # grep -r blocktrash common/ tests/xfs
> # 
> 
> maybe not!

<shrug> I made the same observation.

--D

> 
> > +	srandom(seed);
> > +	if (this_block) {
> > +		blocktrash_b(offset, DBM_UNKNOWN, &lentab[random() % lentablen],
> > +				mode);
> > +		goto out;
> > +	}
> >  	for (blocks = 0, agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> >  		for (agbno = 0, p = dbmap[agno];
> >  		     agbno < mp->m_sb.sb_agblocks;
> > @@ -1137,9 +1180,6 @@ blocktrash_f(
> >  		dbprintf(_("blocktrash: no matching blocks\n"));
> >  		goto out;
> >  	}
> > -	if (!sopt)
> > -		dbprintf(_("blocktrash: seed %u\n"), seed);
> > -	srandom(seed);
> >  	for (i = 0; i < count; i++) {
> >  		randb = (xfs_rfsblock_t)((((__int64_t)random() << 32) |
> >  					 random()) % blocks);
> > @@ -1153,8 +1193,13 @@ blocktrash_f(
> >  					continue;
> >  				if (bi++ < randb)
> >  					continue;
> > -				blocktrash_b(agno, agbno, (dbm_t)*p,
> > +				push_cur();
> > +				set_cur(NULL,
> > +					XFS_AGB_TO_DADDR(mp, agno, agbno),
> > +					blkbb, DB_RING_IGN, NULL);
> > +				blocktrash_b(offset, (dbm_t)*p,
> >  					&lentab[random() % lentablen], mode);
> > +				pop_cur();
> >  				done = 1;
> >  				break;
> >  			}
> > diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
> > index df54bb7..681efc4 100644
> > --- a/man/man8/xfs_db.8
> > +++ b/man/man8/xfs_db.8
> > @@ -232,7 +232,7 @@ enables verbose output. Messages will be printed for every block and
> >  inode processed.
> >  .RE
> >  .TP
> > -.BI "blocktrash [\-n " count "] [\-x " min "] [\-y " max "] [\-s " seed "] [\-0|1|2|3] [\-t " type "] ..."
> > +.BI "blocktrash [-z] [\-o " offset "] [\-n " count "] [\-x " min "] [\-y " max "] [\-s " seed "] [\-0|1|2|3] [\-t " type "] ..."
> >  Trash randomly selected filesystem metadata blocks.
> >  Trashing occurs to randomly selected bits in the chosen blocks.
> >  This command is available only in debugging versions of
> > @@ -259,6 +259,13 @@ supplies the
> >  .I count
> >  of block-trashings to perform (default 1).
> >  .TP
> > +.B \-o
> > +supplies the bit
> > +.I offset
> > +at which to start trashing the block.  If the value is preceded by a '+', the
> > +trashing will start at a randomly chosen offset that is larger than the value
> > +supplied.  The default is to randomly choose an offset anywhere in the block.
> > +.TP
> >  .B \-s
> >  supplies a
> >  .I seed
> > @@ -282,6 +289,12 @@ size of bit range to be trashed. The default value is 1.
> >  sets the
> >  .I maximum
> >  size of bit range to be trashed. The default value is 1024.
> > +.TP
> > +.B \-z
> > +trashes the block at the top of the stack.  It is not necessary to
> > +run
> > +.BI blockget
> > +if this option is supplied.
> >  .RE
> >  .TP
> >  .BI "blockuse [\-n] [\-c " count ]
> > 
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs
> > 
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 08/10] xfs_db: enable blocktrash for checksummed filesystems
  2015-08-18 19:26   ` Eric Sandeen
@ 2015-08-19 15:22     ` Darrick J. Wong
  0 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-19 15:22 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

On Tue, Aug 18, 2015 at 02:26:37PM -0500, Eric Sandeen wrote:
> On 8/14/15 8:44 PM, Darrick J. Wong wrote:
> > Disable the write verifiers when we're trashing a block.  With this
> > in place, create a xfs fuzzer script that formats, populates, corrupts,
> > tries to use, repairs, and tries again to use a crash test xfs image.
> > Hopefully this will shake out some v5 filesystem bugs.
> 
> Maybe "we can create an xfs fuzzer script ..." (since it's not in this
> patch)

"With this in place, the new fuzzers group in xfstests can format, populate..."

> > v2: Drop xfsfuzz, don't assume every block is an AGF when blocktrashing.
> > Don't trash log blocks by default, because that skews the blocktrash
> > heavily towards damaging only log blocks.

"v3: Fix changelog issues, allow trashing of log blocks and symlinks,
and require the caller to explicitly ask for trashing of log blocks
and super blocks.  Allowing log blocks by default skews the trashing
heavily in favor of (probably unused) log blocks, which doesn't help
us with fuzzing.  Furthermore, trashing the superblock results in a
time consuming sector by sector superblock hunt."

> 
> and skip DBM_SB by default as well, right?
> 
> And you added log blocks & symlinks to the allowed mask.
> 
> So I think something like:
> 
> Allow trashing of symlink & log blocks.
> By default, do not trash superblocks (why?) or log blocks (because ...)
> 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  db/check.c |   10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> > 
> > 
> > diff --git a/db/check.c b/db/check.c
> > index afeea32..965d0f5 100644
> > --- a/db/check.c
> > +++ b/db/check.c
> > @@ -944,6 +944,7 @@ blocktrash_b(
> >  	int		mask;
> >  	int		newbit;
> >  	int		offset;
> > +	const struct xfs_buf_ops *stashed_ops;
> >  	static char	*modestr[] = {
> >  		N_("zeroed"), N_("set"), N_("flipped"), N_("randomized")
> >  	};
> > @@ -952,8 +953,10 @@ blocktrash_b(
> >  	offset = (int)(random() % (int)(mp->m_sb.sb_blocksize * NBBY));
> >  	newbit = 0;
> >  	push_cur();
> > -	set_cur(&typtab[DBM_UNKNOWN],
> > +	set_cur(NULL,
> >  		XFS_AGB_TO_DADDR(mp, agno, agbno), blkbb, DB_RING_IGN, NULL);
> 
> Ok, you talked about this back on 5/28/15.  Weird.
> 
> But calling it with NULL is odd, too; nothing else does that.  What about TYP_NONE,
> 
>         { TYP_NONE, NULL }
> 
> its ops are NULL, as well... does that work?
> Huh, ok, no callers w/ TYP_NONE, either.  I guess NULL works.

I'm impressed you can remember that far back. :)

> > +	stashed_ops = iocur_top->bp->b_ops;
> > +	iocur_top->bp->b_ops = NULL;
> >  	if ((buf = iocur_top->data) == NULL) {
> >  		dbprintf(_("can't read block %u/%u for trashing\n"), agno, agbno);
> >  		pop_cur();
> > @@ -984,6 +987,7 @@ blocktrash_b(
> >  			buf[byte] &= ~mask;
> >  	}
> >  	write_cur();
> > +	iocur_top->bp->b_ops = stashed_ops;
> 
> *nod*
> 
> >  	pop_cur();
> >  	printf(_("blocktrash: %u/%u %s block %d bit%s starting %d:%d %s\n"),
> >  		agno, agbno, typename[type], len, len == 1 ? "" : "s",
> > @@ -1040,9 +1044,11 @@ blocktrash_f(
> >  		   (1 << DBM_BTINO) |
> >  		   (1 << DBM_DIR) |
> >  		   (1 << DBM_INODE) |
> > +		   (1 << DBM_LOG) |
> 
> Ok, so you allow log blocks to be specified,
> 
> >  		   (1 << DBM_QUOTA) |
> >  		   (1 << DBM_RTBITMAP) |
> >  		   (1 << DBM_RTSUM) |
> > +		   (1 << DBM_SYMLINK) |
> 
> and symlink blocks too, but...
> 
> >  		   (1 << DBM_SB);
> >  	while ((c = getopt(argc, argv, "0123n:s:t:x:y:")) != EOF) {
> >  		switch (c) {
> > @@ -1106,7 +1112,7 @@ blocktrash_f(
> >  		return 0;
> >  	}
> >  	if (tmask == 0)
> > -		tmask = goodmask;
> > +		tmask = goodmask & ~((1 << DBM_LOG) | (1 << DBM_SB));
> 
> you disable log & superblocks by default if no mask is specified.
> 
> I'm not 100% sure why you want to change this, what did you run into,
> in practice, if they were allowed?

I found that with log block trashing turned on, we'd most frequently
trash empty log blocks; if there wasn't anything in the log to get
replayed, the corruption wouldn't show up.  So, make the user
explicitly ask for log blocks to be included in the trash mask.

As for requiring the user to ask for superblocks to be in the trash
mask, that's mostly to prevent the behavior that if repair finds a
garbage superblock it'll scan every sector on the whole disk looking
for superblocks, which is pretty slow...

> If the change stays, then the xfs_db manpage needs an update:
> 
> "If no -t options are given then all metadata types can be trashed."

Ok.

--D

> 
> Thanks,
> -Eric
> 
> 
> >  	lentab = xmalloc(sizeof(ltab_t));
> >  	lentab->min = lentab->max = min;
> >  	lentablen = 1;
> > 
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs
> > 
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2015-08-19 15:23 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-15  1:43 [PATCH 00/10] xfsprogs August 2015 patchbomb Darrick J. Wong
2015-08-15  1:43 ` [PATCH 01/10] libxfs: readahead of dir3 data blocks should use the read verifier Darrick J. Wong
2015-08-17 18:31   ` Eric Sandeen
2015-08-17 20:30     ` Darrick J. Wong
2015-08-15  1:43 ` [PATCH 02/10] xfs_db: don't crash on a corrupt inode Darrick J. Wong
2015-08-17 18:52   ` Eric Sandeen
2015-08-17 20:45     ` Darrick J. Wong
2015-08-15  1:43 ` [PATCH 03/10] xfs_repair: ignore "repaired" flag after we decide to clear xattr block Darrick J. Wong
2015-08-17 19:20   ` Eric Sandeen
2015-08-17 20:50     ` Darrick J. Wong
2015-08-15  1:44 ` [PATCH 04/10] xfs_repair: fix broken EFSBADCRC/EFSCORRUPTED usage with buffer errors Darrick J. Wong
2015-08-17 19:51   ` Eric Sandeen
2015-08-17 19:57     ` Eric Sandeen
2015-08-15  1:44 ` [PATCH 05/10] xfs_repair: force not-so-bad bmbt blocks back through the verifier Darrick J. Wong
2015-08-17 21:14   ` Eric Sandeen
2015-08-17 23:48     ` Darrick J. Wong
2015-08-15  1:44 ` [PATCH 06/10] xfs_repair: mark unreachable prefetched metadata blocks stale Darrick J. Wong
2015-08-15  1:44 ` [PATCH 07/10] xfs_io: support reflinking and deduping file ranges Darrick J. Wong
2015-08-15  1:44 ` [PATCH 08/10] xfs_db: enable blocktrash for checksummed filesystems Darrick J. Wong
2015-08-18 19:26   ` Eric Sandeen
2015-08-19 15:22     ` Darrick J. Wong
2015-08-15  1:44 ` [PATCH 09/10] xfs_db: trash the block at the top of the cursor stack Darrick J. Wong
2015-08-18 19:59   ` Eric Sandeen
2015-08-19 15:12     ` Darrick J. Wong
2015-08-15  1:44 ` [PATCH 10/10] xfs_db: enable blockget for v5 filesystems Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.