linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v5 00/17] xfsprogs: online scrub/repair support
@ 2017-01-21  8:08 Darrick J. Wong
  2017-01-21  8:08 ` [PATCH 01/17] xfs_io: support the new getfsmap ioctl Darrick J. Wong
                   ` (15 more replies)
  0 siblings, 16 replies; 18+ messages in thread
From: Darrick J. Wong @ 2017-01-21  8:08 UTC
  To: sandeen, darrick.wong; +Cc: linux-xfs, linux-fsdevel

Hi all,

This is the fifth revision of a patchset that adds online metadata
scrubbing and repair support to the XFS userland tools.  There aren't any
on-disk format changes, and the main overview is in the cover letter for
the kernel patches.

In a departure from previous large submissions, I've decided only to
send patches for the userspace tools, and not for libxfs.  This avoids
flooding the list with nearly identical patches that were sent for the
kernel.  If you're interested in running the code, you can pull fully
functional code from my git branches.

If you're going to start using this mess, you probably ought to just
pull from my git trees for kernel[1], xfsprogs[2], and xfstests[3].
The kernel patches in the git trees should apply to 4.10-rc4; the xfsprogs
patches to for-next; and the xfstests patches to master.

This is an extraordinary way to eat your data.  Enjoy! 
Comments and questions are, as always, welcome.

--D

[1] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=djwong-devel
[2] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=djwong-devel
[3] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=djwong-devel


* [PATCH 01/17] xfs_io: support the new getfsmap ioctl
  2017-01-21  8:08 [PATCH v5 00/17] xfsprogs: online scrub/repair support Darrick J. Wong
@ 2017-01-21  8:08 ` Darrick J. Wong
  2017-01-21  8:08 ` [PATCH 02/17] xfsprogs: Space management tool Dave Chinner
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2017-01-21  8:08 UTC
  To: sandeen, darrick.wong; +Cc: linux-xfs, linux-fsdevel

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
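A note on units, since the listing can be confusing at first: the ioctl
reports physical offsets and lengths in bytes, while fsmap prints them in
512-byte basic blocks and, in verbose mode, splits data device addresses
into an AG number and an offset within that AG.  The sketch below mirrors
that arithmetic in standalone form.  BBSHIFT matches the shift behind
BTOBBT; struct ag_range, bytes_to_bb() and ag_range_of() are names made up
for this illustration and are not part of the patch.

```c
#include <stdint.h>

#define BBSHIFT	9	/* log2 of the 512-byte basic block size */

/* Convert a byte count or byte offset to basic blocks, truncating. */
static uint64_t
bytes_to_bb(uint64_t bytes)
{
	return bytes >> BBSHIFT;
}

/* Hypothetical holder for the "agno (start..end)" columns. */
struct ag_range {
	uint64_t	agno;		/* allocation group number */
	uint64_t	start_bb;	/* first basic block within the AG */
	uint64_t	end_bb;		/* last basic block within the AG */
};

/*
 * Split a byte extent on the data device into AG number and AG-relative
 * basic block range.  agblocks and blocksize come from the fs geometry.
 * Assumes length > 0 and agblocks * blocksize > 0.
 */
static struct ag_range
ag_range_of(uint64_t physical, uint64_t length,
	    uint32_t agblocks, uint32_t blocksize)
{
	uint64_t	bperag = (uint64_t)agblocks * blocksize;
	uint64_t	agoff = physical % bperag;
	struct ag_range	r;

	r.agno = physical / bperag;
	r.start_bb = bytes_to_bb(agoff);
	r.end_bb = bytes_to_bb(agoff + length - 1);
	return r;
}
```

For example, with 4096-byte blocks and 1000 blocks per AG, a 4096-byte
extent at byte 4104192 lands in AG 1 at basic blocks 16..23.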
 io/Makefile          |    2 
 io/copy_file_range.c |    2 
 io/encrypt.c         |    1 
 io/fsmap.c           |  553 ++++++++++++++++++++++++++++++++++++++++++++++++++
 io/init.c            |    8 +
 io/io.h              |    9 +
 io/open.c            |   21 ++
 io/pwrite.c          |    2 
 io/reflink.c         |    4 
 io/sendfile.c        |    2 
 man/man8/xfs_io.8    |   47 ++++
 11 files changed, 637 insertions(+), 14 deletions(-)
 create mode 100644 io/fsmap.c


diff --git a/io/Makefile b/io/Makefile
index 32df568..fd07596 100644
--- a/io/Makefile
+++ b/io/Makefile
@@ -9,7 +9,7 @@ LTCOMMAND = xfs_io
 LSRCFILES = xfs_bmap.sh xfs_freeze.sh xfs_mkfile.sh
 HFILES = init.h io.h
 CFILES = init.c \
-	attr.c bmap.c cowextsize.c encrypt.c file.c freeze.c fsync.c \
+	attr.c bmap.c cowextsize.c encrypt.c file.c freeze.c fsmap.c fsync.c \
 	getrusage.c imap.c link.c mmap.c open.c parent.c pread.c prealloc.c \
 	pwrite.c reflink.c seek.c shutdown.c sync.c truncate.c utimes.c
 
diff --git a/io/copy_file_range.c b/io/copy_file_range.c
index 249c649..d1dfc5a 100644
--- a/io/copy_file_range.c
+++ b/io/copy_file_range.c
@@ -121,7 +121,7 @@ copy_range_f(int argc, char **argv)
 	if (optind != argc - 1)
 		return command_usage(&copy_range_cmd);
 
-	fd = openfile(argv[optind], NULL, IO_READONLY, 0);
+	fd = openfile(argv[optind], NULL, IO_READONLY, 0, NULL);
 	if (fd < 0)
 		return 0;
 
diff --git a/io/encrypt.c b/io/encrypt.c
index d844c5e..26ab97c 100644
--- a/io/encrypt.c
+++ b/io/encrypt.c
@@ -20,6 +20,7 @@
 #include "platform_defs.h"
 #include "command.h"
 #include "init.h"
+#include "path.h"
 #include "io.h"
 
 #ifndef ARRAY_SIZE
diff --git a/io/fsmap.c b/io/fsmap.c
new file mode 100644
index 0000000..5ca9a51
--- /dev/null
+++ b/io/fsmap.c
@@ -0,0 +1,553 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "platform_defs.h"
+#include "command.h"
+#include "init.h"
+#include "path.h"
+#include "io.h"
+#include "input.h"
+
+static cmdinfo_t	fsmap_cmd;
+static dev_t		xfs_data_dev;
+
+static void
+fsmap_help(void)
+{
+	printf(_(
+"\n"
+" prints the block mapping for an XFS filesystem"
+"\n"
+" Example:\n"
+" 'fsmap -dv [-n nr] [startoff] [endoff]' - tabular format verbose map, including unwritten extents\n"
+"\n"
+" fsmap prints the map of disk blocks used by the whole filesystem.\n"
+" The map lists each extent used by files and metadata, as well as regions\n"
+" in the filesystem that are not in use (free space).\n"
+" By default, each line of the listing takes the following form:\n"
+"     extent: major:minor [startblock..endblock]: owner startoffset..endoffset length\n"
+" All the file offsets and disk blocks are in units of 512-byte blocks.\n"
+" -d -- query only the data device.\n"
+" -l -- query only the log device.\n"
+" -r -- query only the realtime device.\n"
+" -n -- query n extents.\n"
+" -v -- Verbose information, specify ag info.  Show flags legend on 2nd -v\n"
+"\n"));
+}
+
+static int
+numlen(
+	off64_t	val)
+{
+	off64_t	tmp;
+	int	len;
+
+	for (len = 0, tmp = val; tmp > 0; tmp = tmp/10)
+		len++;
+	return (len == 0 ? 1 : len);
+}
+
+static const char *
+special_owner(
+	__int64_t	owner)
+{
+	switch (owner) {
+	case FMR_OWN_FREE:
+		return _("free space");
+	case FMR_OWN_UNKNOWN:
+		return _("unknown");
+	case FMR_OWN_FS:
+		return _("static fs metadata");
+	case FMR_OWN_LOG:
+		return _("journalling log");
+	case FMR_OWN_AG:
+		return _("per-AG metadata");
+	case FMR_OWN_INOBT:
+		return _("inode btree");
+	case FMR_OWN_INODES:
+		return _("inodes");
+	case FMR_OWN_REFC:
+		return _("refcount btree");
+	case FMR_OWN_COW:
+		return _("cow reservation");
+	case FMR_OWN_DEFECTIVE:
+		return _("defective");
+	default:
+		return _("unknown");
+	}
+}
+
+static void
+dump_map(
+	unsigned long long	*nr,
+	struct fsmap_head	*head)
+{
+	unsigned long long	i;
+	struct fsmap		*p;
+
+	for (i = 0, p = head->fmh_recs; i < head->fmh_entries; i++, p++) {
+		printf("\t%llu: %u:%u [%lld..%lld]: ", i + (*nr),
+			major(p->fmr_device), minor(p->fmr_device),
+			(long long)BTOBBT(p->fmr_physical),
+			(long long)BTOBBT(p->fmr_physical + p->fmr_length - 1));
+		if (p->fmr_flags & FMR_OF_SPECIAL_OWNER)
+			printf("%s", special_owner(p->fmr_owner));
+		else if (p->fmr_flags & FMR_OF_EXTENT_MAP)
+			printf(_("inode %lld extent map"),
+				(long long) p->fmr_owner);
+		else
+			printf(_("inode %lld %lld..%lld"),
+				(long long)p->fmr_owner,
+				(long long)BTOBBT(p->fmr_offset),
+				(long long)BTOBBT(p->fmr_offset + p->fmr_length - 1));
+		printf(_(" %lld blocks\n"),
+			(long long)BTOBBT(p->fmr_length));
+	}
+
+	(*nr) += head->fmh_entries;
+}
+
+/*
+ * Verbose mode displays:
+ *   extent: major:minor [startblock..endblock]: startoffset..endoffset \
+ *	ag# (agoffset..agendoffset) totalbbs flags
+ */
+#define MINRANGE_WIDTH	16
+#define MINAG_WIDTH	2
+#define MINTOT_WIDTH	5
+#define NFLG		7		/* count of flags */
+#define	FLG_NULL	00000000	/* Null flag */
+#define	FLG_SHARED	01000000	/* shared extent */
+#define	FLG_ATTR_FORK	00100000	/* attribute fork */
+#define	FLG_PRE		00010000	/* Unwritten extent */
+#define	FLG_BSU		00001000	/* Not on begin of stripe unit  */
+#define	FLG_ESU		00000100	/* Not on end   of stripe unit  */
+#define	FLG_BSW		00000010	/* Not on begin of stripe width */
+#define	FLG_ESW		00000001	/* Not on end   of stripe width */
+static void
+dump_map_verbose(
+	unsigned long long	*nr,
+	struct fsmap_head	*head,
+	bool			*dumped_flags,
+	struct xfs_fsop_geom	*fsgeo)
+{
+	unsigned long long	i;
+	struct fsmap		*p;
+	int			agno;
+	off64_t			agoff, bperag;
+	int			foff_w, boff_w, aoff_w, tot_w, agno_w, own_w;
+	int			nr_w, dev_w;
+	char			rbuf[32], bbuf[32], abuf[32], obuf[32];
+	char			nbuf[32], dbuf[32], gbuf[32];
+	int			sunit, swidth;
+	int			flg = 0;
+
+	foff_w = boff_w = aoff_w = own_w = MINRANGE_WIDTH;
+	dev_w = 3;
+	nr_w = 4;
+	tot_w = MINTOT_WIDTH;
+	bperag = (off64_t)fsgeo->agblocks *
+		  (off64_t)fsgeo->blocksize;
+	sunit = (fsgeo->sunit * fsgeo->blocksize);
+	swidth = (fsgeo->swidth * fsgeo->blocksize);
+
+	/*
+	 * Go through the extents and figure out the width
+	 * needed for all columns.
+	 */
+	for (i = 0, p = head->fmh_recs; i < head->fmh_entries; i++, p++) {
+		if (p->fmr_flags & FMR_OF_PREALLOC ||
+		    p->fmr_flags & FMR_OF_ATTR_FORK ||
+		    p->fmr_flags & FMR_OF_SHARED)
+			flg = 1;
+		if (sunit &&
+		    (p->fmr_physical  % sunit != 0 ||
+		     ((p->fmr_physical + p->fmr_length) % sunit) != 0 ||
+		     p->fmr_physical % swidth != 0 ||
+		     ((p->fmr_physical + p->fmr_length) % swidth) != 0))
+			flg = 1;
+		if (flg)
+			*dumped_flags = true;
+		snprintf(nbuf, sizeof(nbuf), "%llu", (*nr) + i);
+		nr_w = max(nr_w, strlen(nbuf));
+		if (head->fmh_oflags & FMH_OF_DEV_T)
+			snprintf(dbuf, sizeof(dbuf), "%u:%u",
+				major(p->fmr_device),
+				minor(p->fmr_device));
+		else
+			snprintf(dbuf, sizeof(dbuf), "0x%x", p->fmr_device);
+		dev_w = max(dev_w, strlen(dbuf));
+		snprintf(bbuf, sizeof(bbuf), "[%lld..%lld]:",
+			(long long)BTOBBT(p->fmr_physical),
+			(long long)BTOBBT(p->fmr_physical + p->fmr_length - 1));
+		boff_w = max(boff_w, strlen(bbuf));
+		if (p->fmr_flags & FMR_OF_SPECIAL_OWNER)
+			own_w = max(own_w, strlen(special_owner(p->fmr_owner)));
+		else {
+			snprintf(obuf, sizeof(obuf), "%lld",
+				(long long)p->fmr_owner);
+			own_w = max(own_w, strlen(obuf));
+		}
+		if (p->fmr_flags & FMR_OF_EXTENT_MAP)
+			foff_w = max(foff_w, strlen(_("extent map")));
+		else if (p->fmr_flags & FMR_OF_SPECIAL_OWNER)
+			;
+		else {
+			snprintf(rbuf, sizeof(rbuf), "%lld..%lld",
+				(long long)BTOBBT(p->fmr_offset),
+				(long long)BTOBBT(p->fmr_offset + p->fmr_length - 1));
+			foff_w = max(foff_w, strlen(rbuf));
+		}
+		if (p->fmr_device == xfs_data_dev) {
+			agno = p->fmr_physical / bperag;
+			agoff = p->fmr_physical - (agno * bperag);
+			snprintf(abuf, sizeof(abuf),
+				"(%lld..%lld)",
+				(long long)BTOBBT(agoff),
+				(long long)BTOBBT(agoff + p->fmr_length - 1));
+		} else
+			abuf[0] = 0;
+		aoff_w = max(aoff_w, strlen(abuf));
+		tot_w = max(tot_w,
+			numlen(BTOBBT(p->fmr_length)));
+	}
+	agno_w = max(MINAG_WIDTH, numlen(fsgeo->agcount));
+	if (*nr == 0)
+		printf("%*s: %-*s %-*s %-*s %-*s %*s %-*s %*s%s\n",
+			nr_w, _("EXT"),
+			dev_w, _("DEV"),
+			boff_w, _("BLOCK-RANGE"),
+			own_w, _("OWNER"),
+			foff_w, _("FILE-OFFSET"),
+			agno_w, _("AG"),
+			aoff_w, _("AG-OFFSET"),
+			tot_w, _("TOTAL"),
+			flg ? _(" FLAGS") : "");
+	for (i = 0, p = head->fmh_recs; i < head->fmh_entries; i++, p++) {
+		flg = FLG_NULL;
+		if (p->fmr_flags & FMR_OF_PREALLOC)
+			flg |= FLG_PRE;
+		if (p->fmr_flags & FMR_OF_ATTR_FORK)
+			flg |= FLG_ATTR_FORK;
+		if (p->fmr_flags & FMR_OF_SHARED)
+			flg |= FLG_SHARED;
+		/*
+		 * If striping enabled, determine if extent starts/ends
+		 * on a stripe unit boundary.
+		 */
+		if (sunit) {
+			if (p->fmr_physical  % sunit != 0)
+				flg |= FLG_BSU;
+			if (((p->fmr_physical +
+			      p->fmr_length ) % sunit ) != 0)
+				flg |= FLG_ESU;
+			if (p->fmr_physical % swidth != 0)
+				flg |= FLG_BSW;
+			if (((p->fmr_physical +
+			      p->fmr_length ) % swidth ) != 0)
+				flg |= FLG_ESW;
+		}
+		if (head->fmh_oflags & FMH_OF_DEV_T)
+			snprintf(dbuf, sizeof(dbuf), "%u:%u",
+				major(p->fmr_device),
+				minor(p->fmr_device));
+		else
+			snprintf(dbuf, sizeof(dbuf), "0x%x", p->fmr_device);
+		snprintf(bbuf, sizeof(bbuf), "[%lld..%lld]:",
+			(long long)BTOBBT(p->fmr_physical),
+			(long long)BTOBBT(p->fmr_physical + p->fmr_length - 1));
+		if (p->fmr_flags & FMR_OF_SPECIAL_OWNER) {
+			snprintf(obuf, sizeof(obuf), "%s",
+				special_owner(p->fmr_owner));
+			snprintf(rbuf, sizeof(rbuf), " ");
+		} else {
+			snprintf(obuf, sizeof(obuf), "%lld",
+				(long long)p->fmr_owner);
+			snprintf(rbuf, sizeof(rbuf), "%lld..%lld",
+				(long long)BTOBBT(p->fmr_offset),
+				(long long)BTOBBT(p->fmr_offset + p->fmr_length - 1));
+		}
+		if (p->fmr_device == xfs_data_dev) {
+			agno = p->fmr_physical / bperag;
+			agoff = p->fmr_physical - (agno * bperag);
+			snprintf(abuf, sizeof(abuf),
+				"(%lld..%lld)",
+				(long long)BTOBBT(agoff),
+				(long long)BTOBBT(agoff + p->fmr_length - 1));
+			snprintf(gbuf, sizeof(gbuf),
+				"%lld",
+				(long long)agno);
+		} else {
+			abuf[0] = 0;
+			gbuf[0] = 0;
+		}
+		if (p->fmr_flags & FMR_OF_EXTENT_MAP)
+			printf("%*llu: %-*s %-*s %-*s %-*s %-*s %-*s %*lld\n",
+				nr_w, (*nr) + i,
+				dev_w, dbuf,
+				boff_w, bbuf,
+				own_w, obuf,
+				foff_w, _("extent map"),
+				agno_w, gbuf,
+				aoff_w, abuf,
+				tot_w, (long long)BTOBBT(p->fmr_length));
+		else {
+			printf("%*llu: %-*s %-*s %-*s %-*s", nr_w, (*nr) + i,
+				dev_w, dbuf, boff_w, bbuf, own_w, obuf,
+				foff_w, rbuf);
+			printf(" %-*s %-*s", agno_w, gbuf,
+				aoff_w, abuf);
+			printf(" %*lld", tot_w,
+				(long long)BTOBBT(p->fmr_length));
+			if (flg == FLG_NULL)
+				printf("\n");
+			else
+				printf(" %-*.*o\n", NFLG, NFLG, flg);
+		}
+	}
+
+	(*nr) += head->fmh_entries;
+}
+
+static void
+dump_verbose_key(void)
+{
+	printf(_(" FLAG Values:\n"));
+	printf(_("    %*.*o Shared extent\n"),
+		NFLG+1, NFLG+1, FLG_SHARED);
+	printf(_("    %*.*o Attribute fork\n"),
+		NFLG+1, NFLG+1, FLG_ATTR_FORK);
+	printf(_("    %*.*o Unwritten preallocated extent\n"),
+		NFLG+1, NFLG+1, FLG_PRE);
+	printf(_("    %*.*o Doesn't begin on stripe unit\n"),
+		NFLG+1, NFLG+1, FLG_BSU);
+	printf(_("    %*.*o Doesn't end   on stripe unit\n"),
+		NFLG+1, NFLG+1, FLG_ESU);
+	printf(_("    %*.*o Doesn't begin on stripe width\n"),
+		NFLG+1, NFLG+1, FLG_BSW);
+	printf(_("    %*.*o Doesn't end   on stripe width\n"),
+		NFLG+1, NFLG+1, FLG_ESW);
+}
+
+int
+fsmap_f(
+	int			argc,
+	char			**argv)
+{
+	struct fsmap		*p;
+	struct fsmap_head	*nhead;
+	struct fsmap_head	*head;
+	struct fsmap		*l, *h;
+	struct xfs_fsop_geom	fsgeo;
+	long long		start = 0;
+	long long		end = -1;
+	int			nmap_size;
+	int			map_size;
+	int			nflag = 0;
+	int			vflag = 0;
+	int			i = 0;
+	int			c;
+	unsigned long long	nr = 0;
+	size_t			fsblocksize, fssectsize;
+	struct fs_path		*fs;
+	static bool		tab_init;
+	bool			dumped_flags = false;
+	int			dflag, lflag, rflag;
+
+	init_cvtnum(&fsblocksize, &fssectsize);
+
+	dflag = lflag = rflag = 0;
+	while ((c = getopt(argc, argv, "dln:rv")) != EOF) {
+		switch (c) {
+		case 'd':	/* data device */
+			dflag = 1;
+			break;
+		case 'l':	/* log device */
+			lflag = 1;
+			break;
+		case 'n':	/* number of extents specified */
+			nflag = atoi(optarg);
+			break;
+		case 'r':	/* rt device */
+			rflag = 1;
+			break;
+		case 'v':	/* Verbose output */
+			vflag++;
+			break;
+		default:
+			return command_usage(&fsmap_cmd);
+		}
+	}
+
+	if (dflag + lflag + rflag > 1)
+		return command_usage(&fsmap_cmd);
+
+	if (argc > optind && dflag + lflag + rflag == 0)
+		return command_usage(&fsmap_cmd);
+
+	if (argc > optind) {
+		start = cvtnum(fsblocksize, fssectsize, argv[optind]);
+		if (start < 0) {
+			fprintf(stderr,
+				_("Bad rmap start_bblock %s.\n"),
+				argv[optind]);
+			return 0;
+		}
+		start <<= BBSHIFT;
+	}
+
+	if (argc > optind + 1) {
+		end = cvtnum(fsblocksize, fssectsize, argv[optind + 1]);
+		if (end < 0) {
+			fprintf(stderr,
+				_("Bad rmap end_bblock %s.\n"),
+				argv[optind + 1]);
+			return 0;
+		}
+		end <<= BBSHIFT;
+	}
+
+	if (vflag) {
+		c = xfsctl(file->name, file->fd, XFS_IOC_FSGEOMETRY_V1, &fsgeo);
+		if (c < 0) {
+			fprintf(stderr,
+				_("%s: can't get geometry [\"%s\"]: %s\n"),
+				progname, file->name, strerror(errno));
+			exitcode = 1;
+			return 0;
+		}
+	}
+
+	map_size = nflag ? nflag : 131072 / sizeof(struct fsmap);
+	head = malloc(fsmap_sizeof(map_size));
+	if (head == NULL) {
+		fprintf(stderr, _("%s: malloc of %zu bytes failed.\n"),
+			progname, fsmap_sizeof(map_size));
+		exitcode = 1;
+		return 0;
+	}
+
+	memset(head, 0, sizeof(*head));
+	l = head->fmh_keys;
+	h = head->fmh_keys + 1;
+	if (dflag) {
+		l->fmr_device = h->fmr_device = file->fs_path.fs_datadev;
+	} else if (lflag) {
+		l->fmr_device = h->fmr_device = file->fs_path.fs_logdev;
+	} else if (rflag) {
+		l->fmr_device = h->fmr_device = file->fs_path.fs_rtdev;
+	} else {
+		l->fmr_device = 0;
+		h->fmr_device = UINT_MAX;
+	}
+	l->fmr_physical = start;
+	h->fmr_physical = end;
+	h->fmr_owner = ULLONG_MAX;
+	h->fmr_flags = UINT_MAX;
+	h->fmr_offset = ULLONG_MAX;
+
+	/* Count mappings */
+	if (!nflag) {
+		head->fmh_count = 0;
+		i = xfsctl(file->name, file->fd, XFS_IOC_GETFSMAP, head);
+		if (i < 0) {
+			fprintf(stderr, _("%s: xfsctl(XFS_IOC_GETFSMAP)"
+				" iflags=0x%x [\"%s\"]: %s\n"),
+				progname, head->fmh_iflags, file->name,
+				strerror(errno));
+			free(head);
+			exitcode = 1;
+			return 0;
+		}
+		if (head->fmh_entries > map_size + 2) {
+			map_size = 11ULL * head->fmh_entries / 10;
+			nmap_size = map_size > INT_MAX ? INT_MAX : map_size;
+			nhead = realloc(head, fsmap_sizeof(nmap_size));
+			if (nhead == NULL) {
+				fprintf(stderr,
+					_("%s: cannot realloc %zu bytes\n"),
+					progname, fsmap_sizeof(nmap_size));
+			} else {
+				head = nhead;
+				map_size = nmap_size;
+			}
+		}
+	}
+
+	/*
+	 * If this is an XFS filesystem, remember the data device.
+	 * (We report AG number/block for data device extents on XFS).
+	 */
+	if (!tab_init) {
+		fs_table_initialise(0, NULL, 0, NULL);
+		tab_init = true;
+	}
+	fs = fs_table_lookup(file->name, FS_MOUNT_POINT);
+	xfs_data_dev = fs ? fs->fs_datadev : 0;
+
+	head->fmh_count = map_size;
+	do {
+		/* Get some extents */
+		i = xfsctl(file->name, file->fd, XFS_IOC_GETFSMAP, head);
+		if (i < 0) {
+			fprintf(stderr, _("%s: xfsctl(XFS_IOC_GETFSMAP)"
+				" iflags=0x%x [\"%s\"]: %s\n"),
+				progname, head->fmh_iflags, file->name,
+				strerror(errno));
+			free(head);
+			exitcode = 1;
+			return 0;
+		}
+
+		if (head->fmh_entries == 0)
+			break;
+
+		if (!vflag)
+			dump_map(&nr, head);
+		else
+			dump_map_verbose(&nr, head, &dumped_flags, &fsgeo);
+
+		p = &head->fmh_recs[head->fmh_entries - 1];
+		if (p->fmr_flags & FMR_OF_LAST)
+			break;
+
+		head->fmh_keys[0] = *p;
+	} while (true);
+
+	if (dumped_flags)
+		dump_verbose_key();
+
+	free(head);
+	return 0;
+}
+
+void
+fsmap_init(void)
+{
+	fsmap_cmd.name = "fsmap";
+	fsmap_cmd.cfunc = fsmap_f;
+	fsmap_cmd.argmin = 0;
+	fsmap_cmd.argmax = -1;
+	fsmap_cmd.flags = CMD_NOMAP_OK;
+	fsmap_cmd.args = _("[-d|-l|-r] [-v] [-n nx] [start] [end]");
+	fsmap_cmd.oneline = _("print filesystem mapping for a range of blocks");
+	fsmap_cmd.help = fsmap_help;
+
+	add_command(&fsmap_cmd);
+}
diff --git a/io/init.c b/io/init.c
index 06002e6..1532149 100644
--- a/io/init.c
+++ b/io/init.c
@@ -66,6 +66,7 @@ init_commands(void)
 	file_init();
 	flink_init();
 	freeze_init();
+	fsmap_init();
 	fsync_init();
 	getrusage_init();
 	help_init();
@@ -138,6 +139,7 @@ init(
 	char		*sp;
 	mode_t		mode = 0600;
 	xfs_fsop_geom_t	geometry = { 0 };
+	struct fs_path	fsp;
 
 	progname = basename(argv[0]);
 	setlocale(LC_ALL, "");
@@ -147,6 +149,7 @@ init(
 	pagesize = getpagesize();
 	gettimeofday(&stopwatch, NULL);
 
+	fs_table_initialise(0, NULL, 0, NULL);
 	while ((c = getopt(argc, argv, "ac:C:dFfim:p:nrRstTVx")) != EOF) {
 		switch (c) {
 		case 'a':
@@ -211,11 +214,12 @@ init(
 	}
 
 	while (optind < argc) {
-		if ((c = openfile(argv[optind], &geometry, flags, mode)) < 0)
+		c = openfile(argv[optind], &geometry, flags, mode, &fsp);
+		if (c < 0)
 			exit(1);
 		if (!platform_test_xfs_fd(c))
 			flags |= IO_FOREIGN;
-		if (addfile(argv[optind], c, &geometry, flags) < 0)
+		if (addfile(argv[optind], c, &geometry, flags, &fsp) < 0)
 			exit(1);
 		optind++;
 	}
diff --git a/io/io.h b/io/io.h
index c40aad0..c7100c9 100644
--- a/io/io.h
+++ b/io/io.h
@@ -17,6 +17,7 @@
  */
 
 #include "xfs.h"
+#include "path.h"
 
 /*
  * Read/write patterns (default is always "forward")
@@ -47,6 +48,7 @@ typedef struct fileio {
 	int		flags;		/* flags describing file state */
 	char		*name;		/* file name at time of open */
 	xfs_fsop_geom_t	geom;		/* XFS filesystem geometry */
+	struct fs_path	fs_path;	/* XFS path information */
 } fileio_t;
 
 extern fileio_t		*filetable;	/* open file table */
@@ -76,8 +78,10 @@ extern void *check_mapping_range(mmap_region_t *, off64_t, size_t, int);
  */
 
 extern off64_t		filesize(void);
-extern int		openfile(char *, xfs_fsop_geom_t *, int, mode_t);
-extern int		addfile(char *, int , xfs_fsop_geom_t *, int);
+extern int		openfile(char *, xfs_fsop_geom_t *, int, mode_t,
+				 struct fs_path *);
+extern int		addfile(char *, int , xfs_fsop_geom_t *, int,
+				struct fs_path *);
 extern void		printxattr(uint, int, int, const char *, int, int);
 
 extern unsigned int	recurse_all;
@@ -98,6 +102,7 @@ extern void		encrypt_init(void);
 extern void		file_init(void);
 extern void		flink_init(void);
 extern void		freeze_init(void);
+extern void		fsmap_init(void);
 extern void		fsync_init(void);
 extern void		getrusage_init(void);
 extern void		help_init(void);
diff --git a/io/open.c b/io/open.c
index a12f4a2..16bee82 100644
--- a/io/open.c
+++ b/io/open.c
@@ -144,8 +144,10 @@ openfile(
 	char		*path,
 	xfs_fsop_geom_t	*geom,
 	int		flags,
-	mode_t		mode)
+	mode_t		mode,
+	struct fs_path	*fs_path)
 {
+	struct fs_path	*fsp;
 	int		fd;
 	int		oflags;
 
@@ -210,6 +212,14 @@ openfile(
 			}
 		}
 	}
+
+	if (fs_path) {
+		fsp = fs_table_lookup(path, FS_MOUNT_POINT);
+		if (!fsp)
+			memset(fs_path, 0, sizeof(*fs_path));
+		else
+			*fs_path = *fsp;
+	}
 	return fd;
 }
 
@@ -218,7 +228,8 @@ addfile(
 	char		*name,
 	int		fd,
 	xfs_fsop_geom_t	*geometry,
-	int		flags)
+	int		flags,
+	struct fs_path	*fs_path)
 {
 	char		*filename;
 
@@ -246,6 +257,7 @@ addfile(
 	file->flags = flags;
 	file->name = filename;
 	file->geom = *geometry;
+	file->fs_path = *fs_path;
 	return 0;
 }
 
@@ -287,6 +299,7 @@ open_f(
 	char		*sp;
 	mode_t		mode = 0600;
 	xfs_fsop_geom_t	geometry = { 0 };
+	struct fs_path	fsp;
 
 	if (argc == 1) {
 		if (file)
@@ -349,14 +362,14 @@ open_f(
 		return -1;
 	}
 
-	fd = openfile(argv[optind], &geometry, flags, mode);
+	fd = openfile(argv[optind], &geometry, flags, mode, &fsp);
 	if (fd < 0)
 		return 0;
 
 	if (!platform_test_xfs_fd(fd))
 		flags |= IO_FOREIGN;
 
-	addfile(argv[optind], fd, &geometry, flags);
+	addfile(argv[optind], fd, &geometry, flags, &fsp);
 	return 0;
 }
 
diff --git a/io/pwrite.c b/io/pwrite.c
index 7c0bb7f..1c5dfca 100644
--- a/io/pwrite.c
+++ b/io/pwrite.c
@@ -357,7 +357,7 @@ pwrite_f(
 		return 0;
 
 	c = IO_READONLY | (dflag ? IO_DIRECT : 0);
-	if (infile && ((fd = openfile(infile, NULL, c, 0)) < 0))
+	if (infile && ((fd = openfile(infile, NULL, c, 0, NULL)) < 0))
 		return 0;
 
 	gettimeofday(&t1, NULL);
diff --git a/io/reflink.c b/io/reflink.c
index fe05d1e..f584e8f 100644
--- a/io/reflink.c
+++ b/io/reflink.c
@@ -154,7 +154,7 @@ dedupe_f(
 		return 0;
 	}
 
-	fd = openfile(infile, NULL, IO_READONLY, 0);
+	fd = openfile(infile, NULL, IO_READONLY, 0, NULL);
 	if (fd < 0)
 		return 0;
 
@@ -278,7 +278,7 @@ reflink_f(
 	}
 
 clone_all:
-	fd = openfile(infile, NULL, IO_READONLY, 0);
+	fd = openfile(infile, NULL, IO_READONLY, 0, NULL);
 	if (fd < 0)
 		return 0;
 
diff --git a/io/sendfile.c b/io/sendfile.c
index edd31c9..063fa7f 100644
--- a/io/sendfile.c
+++ b/io/sendfile.c
@@ -115,7 +115,7 @@ sendfile_f(
 
 	if (!infile)
 		fd = filetable[fd].fd;
-	else if ((fd = openfile(infile, NULL, IO_READONLY, 0)) < 0)
+	else if ((fd = openfile(infile, NULL, IO_READONLY, 0, NULL)) < 0)
 		return 0;
 
 	if (optind == argc - 2) {
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index 024f712..479c39b 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -301,6 +301,53 @@ ioctl.  Options behave as described in the
 .BR xfs_bmap (8)
 manual page.
 .TP
+.BI "fsmap [ \-d | \-l | \-r ] [ \-v ] [ \-n " nx " ] [ " start " ] [ " end " ]"
+Prints the mapping of disk blocks used by an XFS filesystem.  The map
+lists each extent used by files, allocation group metadata,
+journalling logs, and static filesystem metadata, as well as any
+regions that are unused.  Each line of the listings takes the
+following form:
+.PP
+.RS
+.IR extent ": " major ":" minor " [" startblock .. endblock "]: " owner " " startoffset .. endoffset " " length
+.PP
+Static filesystem metadata, allocation group metadata, btrees,
+journalling logs, and free space are marked by replacing the
+.IR startoffset .. endoffset
+with the appropriate marker.  All blocks, offsets, and lengths are specified
+in units of 512-byte blocks, no matter what the filesystem's block size is.
+.BI "The optional " start " and " end " arguments can be used to constrain
+the output to a particular range of disk blocks.
+.RE
+.RS 1.0i
+.PD 0
+.TP
+.BI \-n " num_extents"
+If this option is given,
+.B fsmap
+obtains the extent list of the filesystem in groups of
+.I num_extents
+extents. In the absence of
+.BR \-n ", " fsmap
+queries the system for the number of extents in the filesystem and uses that
+value to compute the group size.
+.TP
+.B \-v
+Shows verbose information. When this flag is specified, additional AG
+specific information is appended to each line in the following form:
+.IP
+.RS 1.2i
+.IR agno " (" startagblock .. endagblock ") " nblocks " " flags
+.RE
+.IP
+A second
+.B \-v
+option will print out the
+.I flags
+legend.
+.RE
+.PD
+.TP
 .BI "extsize [ \-R | \-D ] [ " value " ]"
 Display and/or modify the preferred extent size used when allocating
 space for the currently open file. If the

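For anyone reading the verbose FLAGS column: each flag occupies one octal
digit, so the column reads as a row of seven 0/1 positions.  The sketch
below reuses the FLG_* values from io/fsmap.c to show how the stripe
alignment digits come about.  alignment_flags() and format_flags() are
hypothetical helpers for illustration only, and unlike the patch this
version also guards against a zero stripe width.

```c
#include <stdint.h>
#include <stdio.h>

#define NFLG		7		/* count of flags */
#define FLG_SHARED	01000000	/* shared extent */
#define FLG_ATTR_FORK	00100000	/* attribute fork */
#define FLG_PRE		00010000	/* unwritten extent */
#define FLG_BSU		00001000	/* not on begin of stripe unit */
#define FLG_ESU		00000100	/* not on end of stripe unit */
#define FLG_BSW		00000010	/* not on begin of stripe width */
#define FLG_ESW		00000001	/* not on end of stripe width */

/* All arguments are in bytes; sunit/swidth of 0 means "no striping". */
static int
alignment_flags(uint64_t physical, uint64_t length,
		uint64_t sunit, uint64_t swidth)
{
	uint64_t	end = physical + length;
	int		flg = 0;

	if (sunit) {
		if (physical % sunit)
			flg |= FLG_BSU;
		if (end % sunit)
			flg |= FLG_ESU;
	}
	if (swidth) {
		if (physical % swidth)
			flg |= FLG_BSW;
		if (end % swidth)
			flg |= FLG_ESW;
	}
	return flg;
}

/* Render the flags as NFLG zero-padded octal digits. */
static void
format_flags(char *buf, size_t len, int flg)
{
	snprintf(buf, len, "%.*o", NFLG, (unsigned int)flg);
}
```

With a 64k stripe unit and a 256k stripe width, a 64k extent starting at
byte 65536 is unit-aligned at both ends but not width-aligned at either,
so only the last two digits are set.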


* [PATCH 02/17] xfsprogs: Space management tool
  2017-01-21  8:08 [PATCH v5 00/17] xfsprogs: online scrub/repair support Darrick J. Wong
  2017-01-21  8:08 ` [PATCH 01/17] xfs_io: support the new getfsmap ioctl Darrick J. Wong
@ 2017-01-21  8:08 ` Dave Chinner
  2017-01-22 16:46   ` James Bottomley
  2017-01-21  8:08 ` [PATCH 03/17] spaceman: add FITRIM support Dave Chinner
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 18+ messages in thread
From: Dave Chinner @ 2017-01-21  8:08 UTC
  To: sandeen, darrick.wong; +Cc: linux-xfs, linux-fsdevel

xfs_spaceman is intended as a diagnostic and control tool for space
management operations within XFS, such as examining free space,
managing allocation policies, and issuing block discards on free
space.

The tool is modelled on the xfs_io interface, allowing both
interactive and command line control of the tool, enabling it to be
used in scripts and automated management tools.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
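For readers unfamiliar with the xfs_io command model that this tool
borrows: each command registers a name, an optional short alias, a handler
and argument count limits, and the same table serves both interactive
input and -c on the command line.  The standalone sketch below shows that
general shape only.  It is not libxcmd's implementation; struct cmd,
find_cmd() and run_cmd() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

typedef int (*cmd_func)(int argc, char **argv);

/* Hypothetical, simplified command descriptor. */
struct cmd {
	const char	*name;		/* full command name */
	const char	*altname;	/* short alias, may be NULL */
	cmd_func	func;		/* handler */
	int		argmin;		/* minimum number of arguments */
	int		argmax;		/* maximum, or -1 for no limit */
};

static int print_calls;	/* lets a caller observe that the handler ran */

static int
print_f(int argc, char **argv)
{
	(void)argc;
	(void)argv;
	print_calls++;
	return 0;
}

static const struct cmd cmdtab[] = {
	{ "print", "p", print_f, 0, 0 },
};

/* Look a command up by name or alias; returns NULL if unknown. */
static const struct cmd *
find_cmd(const char *name)
{
	size_t	i;

	for (i = 0; i < sizeof(cmdtab) / sizeof(cmdtab[0]); i++) {
		if (strcmp(name, cmdtab[i].name) == 0)
			return &cmdtab[i];
		if (cmdtab[i].altname &&
		    strcmp(name, cmdtab[i].altname) == 0)
			return &cmdtab[i];
	}
	return NULL;
}

/* Returns -1 for an unknown command or a bad argument count. */
static int
run_cmd(int argc, char **argv)
{
	const struct cmd	*c;
	int			nargs = argc - 1;

	if (argc < 1)
		return -1;
	c = find_cmd(argv[0]);
	if (!c)
		return -1;
	if (nargs < c->argmin || (c->argmax != -1 && nargs > c->argmax))
		return -1;
	return c->func(argc, argv);
}
```

Keeping the argument limits in the table means the dispatcher can reject
malformed input before any handler runs, which is convenient for scripted
use.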
 Makefile          |    3 +
 spaceman/Makefile |   34 ++++++++++++
 spaceman/file.c   |  149 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 spaceman/init.c   |  119 ++++++++++++++++++++++++++++++++++++++++++
 spaceman/init.h   |   23 ++++++++
 spaceman/space.h  |   36 +++++++++++++
 6 files changed, 363 insertions(+), 1 deletion(-)
 create mode 100644 spaceman/Makefile
 create mode 100644 spaceman/file.c
 create mode 100644 spaceman/init.c
 create mode 100644 spaceman/init.h
 create mode 100644 spaceman/space.h


diff --git a/Makefile b/Makefile
index 6e45733..3a4872a 100644
--- a/Makefile
+++ b/Makefile
@@ -47,7 +47,7 @@ HDR_SUBDIRS = include libxfs
 DLIB_SUBDIRS = libxlog libxcmd libhandle
 LIB_SUBDIRS = libxfs $(DLIB_SUBDIRS)
 TOOL_SUBDIRS = copy db estimate fsck growfs io logprint mkfs quota \
-		mdrestore repair rtcp m4 man doc debian
+		mdrestore repair rtcp m4 man doc debian spaceman
 
 ifneq ("$(PKG_PLATFORM)","darwin")
 TOOL_SUBDIRS += fsr
@@ -88,6 +88,7 @@ quota: libxcmd
 repair: libxlog libxcmd
 copy: libxlog
 mkfs: libxcmd
+spaceman: libxcmd
 
 ifeq ($(HAVE_BUILDDEFS), yes)
 include $(BUILDRULES)
diff --git a/spaceman/Makefile b/spaceman/Makefile
new file mode 100644
index 0000000..ff8d23e
--- /dev/null
+++ b/spaceman/Makefile
@@ -0,0 +1,34 @@
+#
+# Copyright (c) 2012 Red Hat, Inc.  All Rights Reserved.
+#
+
+TOPDIR = ..
+include $(TOPDIR)/include/builddefs
+
+LTCOMMAND = xfs_spaceman
+HFILES = init.h space.h
+CFILES = init.c \
+	file.c
+
+LLDLIBS = $(LIBXCMD)
+LTDEPENDENCIES = $(LIBXCMD)
+LLDFLAGS = -static
+
+ifeq ($(ENABLE_READLINE),yes)
+LLDLIBS += $(LIBREADLINE) $(LIBTERMCAP)
+endif
+
+ifeq ($(ENABLE_EDITLINE),yes)
+LLDLIBS += $(LIBEDITLINE) $(LIBTERMCAP)
+endif
+
+default: depend $(LTCOMMAND)
+
+include $(BUILDRULES)
+
+install: default
+	$(INSTALL) -m 755 -d $(PKG_SBIN_DIR)
+	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
+install-dev:
+
+-include .dep
diff --git a/spaceman/file.c b/spaceman/file.c
new file mode 100644
index 0000000..d7ab05b
--- /dev/null
+++ b/spaceman/file.c
@@ -0,0 +1,149 @@
+/*
+ * Copyright (c) 2004-2005 Silicon Graphics, Inc.
+ * Copyright (c) 2012 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include "libxfs.h"
+#include <sys/mman.h>
+#include "command.h"
+#include "input.h"
+#include "init.h"
+#include "space.h"
+
+static cmdinfo_t print_cmd;
+
+fileio_t	*filetable;
+int		filecount;
+fileio_t	*file;
+
+static void
+print_fileio(
+	fileio_t	*file,
+	int		index,
+	int		braces)
+{
+	printf(_("%c%03d%c %-14s (%s,%s,%s%s%s)\n"),
+		braces? '[' : ' ', index, braces? ']' : ' ', file->name,
+		file->flags & O_SYNC ? _("sync") : _("non-sync"),
+		file->flags & O_DIRECT ? _("direct") : _("non-direct"),
+		(file->flags & O_ACCMODE) == O_RDONLY ? _("read-only") : _("read-write"),
+		file->flags & O_APPEND ? _(",append-only") : "",
+		file->flags & O_NONBLOCK ? _(",non-block") : "");
+}
+
+int
+filelist_f(void)
+{
+	int		i;
+
+	for (i = 0; i < filecount; i++)
+		print_fileio(&filetable[i], i, &filetable[i] == file);
+	return 0;
+}
+
+static int
+print_f(
+	int		argc,
+	char		**argv)
+{
+	filelist_f();
+	return 0;
+}
+
+int
+openfile(
+	char		*path,
+	xfs_fsop_geom_t	*geom,
+	int		flags,
+	mode_t		mode)
+{
+	int		fd;
+
+	fd = open(path, flags, mode);
+	if (fd < 0) {
+		if ((errno == EISDIR) && (flags & O_RDWR)) {
+			/* make it as if we asked for O_RDONLY & try again */
+			flags &= ~O_RDWR;
+			flags |= O_RDONLY;
+			fd = open(path, flags, mode);
+			if (fd < 0) {
+				perror(path);
+				return -1;
+			}
+		} else {
+			perror(path);
+			return -1;
+		}
+	}
+
+	if (xfsctl(path, fd, XFS_IOC_FSGEOMETRY, geom) < 0) {
+		perror("XFS_IOC_FSGEOMETRY");
+		close(fd);
+		return -1;
+	}
+	return fd;
+}
+
+int
+addfile(
+	char		*name,
+	int		fd,
+	xfs_fsop_geom_t	*geometry,
+	int		flags)
+{
+	char		*filename;
+
+	filename = strdup(name);
+	if (!filename) {
+		perror("strdup");
+		close(fd);
+		return -1;
+	}
+
+	/* Extend the table of currently open files */
+	filetable = (fileio_t *)realloc(filetable,	/* growing */
+					++filecount * sizeof(fileio_t));
+	if (!filetable) {
+		perror("realloc");
+		filecount = 0;
+		free(filename);
+		close(fd);
+		return -1;
+	}
+
+	/* Finally, make this the new active open file */
+	file = &filetable[filecount - 1];
+	file->fd = fd;
+	file->flags = flags;
+	file->name = filename;
+	file->geom = *geometry;
+	return 0;
+}
+
+void
+file_init(void)
+{
+	print_cmd.name = "print";
+	print_cmd.altname = "p";
+	print_cmd.cfunc = print_f;
+	print_cmd.argmin = 0;
+	print_cmd.argmax = 0;
+	print_cmd.flags = CMD_FLAG_ONESHOT;
+	print_cmd.oneline = _("list current open files");
+
+	add_command(&print_cmd);
+}
diff --git a/spaceman/init.c b/spaceman/init.c
new file mode 100644
index 0000000..ebe3b5a
--- /dev/null
+++ b/spaceman/init.c
@@ -0,0 +1,119 @@
+/*
+ * Copyright (c) 2012 Red Hat, Inc
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include "libxfs.h"
+#include "command.h"
+#include "input.h"
+#include "init.h"
+#include "space.h"
+
+char	*progname;
+int	exitcode;
+
+void
+usage(void)
+{
+	fprintf(stderr,
+		_("Usage: %s [-c cmd] file\n"),
+		progname);
+	exit(1);
+}
+
+static void
+init_commands(void)
+{
+	file_init();
+	help_init();
+	quit_init();
+}
+
+static int
+init_args_command(
+	int	index)
+{
+	if (index >= filecount)
+		return 0;
+	file = &filetable[index++];
+	return index;
+}
+
+static int
+init_check_command(
+	const cmdinfo_t	*ct)
+{
+	if (!(ct->flags & CMD_FLAG_ONESHOT))
+		return 0;
+	return 1;
+}
+
+void
+init(
+	int		argc,
+	char		**argv)
+{
+	int		c, flags = 0;
+	mode_t		mode = 0600;
+	xfs_fsop_geom_t	geometry = { 0 };
+
+	progname = basename(argv[0]);
+	setlocale(LC_ALL, "");
+	bindtextdomain(PACKAGE, LOCALEDIR);
+	textdomain(PACKAGE);
+
+	while ((c = getopt(argc, argv, "c:V")) != EOF) {
+		switch (c) {
+		case 'c':
+			add_user_command(optarg);
+			break;
+		case 'V':
+			printf(_("%s version %s\n"), progname, VERSION);
+			exit(0);
+		default:
+			usage();
+		}
+	}
+
+	if (optind == argc)
+		usage();
+
+	while (optind < argc) {
+		if ((c = openfile(argv[optind], &geometry, flags, mode)) < 0)
+			exit(1);
+		if (!platform_test_xfs_fd(c)) {
+			printf(_("Not an XFS filesystem!\n"));
+			exit(1);
+		}
+		if (addfile(argv[optind], c, &geometry, flags) < 0)
+			exit(1);
+		optind++;
+	}
+
+	init_commands();
+	add_command_iterator(init_args_command);
+	add_check_command(init_check_command);
+}
+
+int
+main(
+	int	argc,
+	char	**argv)
+{
+	init(argc, argv);
+	command_loop();
+	return exitcode;
+}
diff --git a/spaceman/init.h b/spaceman/init.h
new file mode 100644
index 0000000..165e4f5
--- /dev/null
+++ b/spaceman/init.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2012 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+extern char	*progname;
+extern int	exitcode;
+
+#define min(a,b)	(((a)<(b))?(a):(b))
+#define max(a,b)	(((a)>(b))?(a):(b))
diff --git a/spaceman/space.h b/spaceman/space.h
new file mode 100644
index 0000000..6e1bc52
--- /dev/null
+++ b/spaceman/space.h
@@ -0,0 +1,36 @@
+/*
+ * Copyright (c) 2012 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+typedef struct fileio {
+	int		fd;		/* open file descriptor */
+	int		flags;		/* flags describing file state */
+	char		*name;		/* file name at time of open */
+	xfs_fsop_geom_t	geom;		/* XFS filesystem geometry */
+} fileio_t;
+
+extern fileio_t		*filetable;	/* open file table */
+extern int		filecount;	/* number of open files */
+extern fileio_t		*file;		/* active file in file table */
+extern int filelist_f(void);
+
+extern int	openfile(char *, xfs_fsop_geom_t *, int, mode_t);
+extern int	addfile(char *, int , xfs_fsop_geom_t *, int);
+
+extern void	file_init(void);
+extern void	help_init(void);
+extern void	quit_init(void);



* [PATCH 03/17] spaceman: add FITRIM support
  2017-01-21  8:08 [PATCH v5 00/17] xfsprogs: online scrub/repair support Darrick J. Wong
  2017-01-21  8:08 ` [PATCH 01/17] xfs_io: support the new getfsmap ioctl Darrick J. Wong
  2017-01-21  8:08 ` [PATCH 02/17] xfsprogs: Space management tool Dave Chinner
@ 2017-01-21  8:08 ` Dave Chinner
  2017-01-21  8:08 ` [PATCH 04/17] spaceman: add new speculative prealloc control Dave Chinner
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2017-01-21  8:08 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, linux-fsdevel

Add support for discarding free space extents via the FITRIM
command. Make it easy to discard a single range, an entire AG or all
the freespace in the filesystem.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
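Reviewer's sketch, not part of the patch: the per-AG case turns an AG
number into a byte range before issuing FITRIM. The helper below is
hypothetical (ag_byte_range is not an xfsprogs function) and assumes
every AG is exactly agblocks long, which may not hold for the last AG.
It shows why the multiplication should be widened to 64 bits first.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the byte range covered by allocation
 * group 'agno'.  All three inputs are 32-bit in the XFS geometry, so
 * the product is widened before multiplying to avoid wraparound.
 */
void
ag_byte_range(
	uint32_t	agno,
	uint32_t	agblocks,
	uint32_t	blocksize,
	uint64_t	*start,
	uint64_t	*len)
{
	*len = (uint64_t)agblocks * blocksize;
	*start = (uint64_t)agno * *len;
}
```

With 4096-byte blocks and 1048576 blocks per AG, each AG spans 4GiB, so
a 32-bit product would already wrap for AG 1.
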
 spaceman/Makefile |    2 -
 spaceman/init.c   |    1 
 spaceman/space.h  |    1 
 spaceman/trim.c   |  139 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 142 insertions(+), 1 deletion(-)
 create mode 100644 spaceman/trim.c


diff --git a/spaceman/Makefile b/spaceman/Makefile
index ff8d23e..9fb9142 100644
--- a/spaceman/Makefile
+++ b/spaceman/Makefile
@@ -8,7 +8,7 @@ include $(TOPDIR)/include/builddefs
 LTCOMMAND = xfs_spaceman
 HFILES = init.h space.h
 CFILES = init.c \
-	file.c
+	file.c trim.c
 
 LLDLIBS = $(LIBXCMD)
 LTDEPENDENCIES = $(LIBXCMD)
diff --git a/spaceman/init.c b/spaceman/init.c
index ebe3b5a..8eb4cc7 100644
--- a/spaceman/init.c
+++ b/spaceman/init.c
@@ -40,6 +40,7 @@ init_commands(void)
 	file_init();
 	help_init();
 	quit_init();
+	trim_init();
 }
 
 static int
diff --git a/spaceman/space.h b/spaceman/space.h
index 6e1bc52..7b4f034 100644
--- a/spaceman/space.h
+++ b/spaceman/space.h
@@ -34,3 +34,4 @@ extern int	addfile(char *, int , xfs_fsop_geom_t *, int);
 extern void	file_init(void);
 extern void	help_init(void);
 extern void	quit_init(void);
+extern void	trim_init(void);
diff --git a/spaceman/trim.c b/spaceman/trim.c
new file mode 100644
index 0000000..9bf6565
--- /dev/null
+++ b/spaceman/trim.c
@@ -0,0 +1,139 @@
+/*
+ * Copyright (c) 2012 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include "libxfs.h"
+#include <linux/fs.h>
+#include "command.h"
+#include "init.h"
+#include "space.h"
+#include "input.h"
+
+#ifndef FITRIM
+#define FITRIM          _IOWR('X', 121, struct fstrim_range)    /* Trim */
+
+struct fstrim_range {
+	__u64 start;
+	__u64 len;
+	__u64 minlen;
+};
+#endif
+
+static cmdinfo_t trim_cmd;
+
+/*
+ * Discard (trim) free space in an XFS filesystem.
+ */
+static int
+trim_f(
+	int		argc,
+	char		**argv)
+{
+	struct fstrim_range trim = {0};
+	xfs_agnumber_t	agno = 0;
+	off64_t		offset = 0;
+	ssize_t		length = 0;
+	ssize_t		minlen = 0;
+	int		aflag = 0;
+	int		fflag = 0;
+	int		ret;
+	int		c;
+
+	while ((c = getopt(argc, argv, "a:fm:")) != EOF) {
+		switch (c) {
+		case 'a':
+			if (fflag)
+				return command_usage(&trim_cmd);
+			aflag = 1;
+			agno = atoi(optarg);
+			break;
+		case 'f':
+			if (aflag)
+				return command_usage(&trim_cmd);
+			fflag = 1;
+			break;
+		case 'm':
+			minlen = cvtnum(file->geom.blocksize,
+					file->geom.sectsize, optarg);
+			break;
+		default:
+			return command_usage(&trim_cmd);
+		}
+	}
+
+	if (optind != argc - 2 && !(aflag || fflag))
+		return command_usage(&trim_cmd);
+	if (optind != argc) {
+		offset = cvtnum(file->geom.blocksize, file->geom.sectsize,
+				argv[optind]);
+		length = cvtnum(file->geom.blocksize, file->geom.sectsize,
+				argv[optind + 1]);
+	} else if (aflag) {
+		length = (off64_t)file->geom.agblocks * file->geom.blocksize;
+		offset = (off64_t)agno * length;
+	} else {
+		offset = 0;
+		length = file->geom.datablocks * file->geom.blocksize;
+	}
+
+	trim.start = offset;
+	trim.len = length;
+	trim.minlen = minlen;
+
+	ret = ioctl(file->fd, FITRIM, (unsigned long)&trim);
+	if (ret < 0) {
+		fprintf(stderr, "%s: ioctl(FITRIM) [\"%s\"]: "
+			"%s\n", progname, file->name, strerror(errno));
+		exitcode = 1;
+		return 0;
+	}
+	return 0;
+}
+
+static void
+trim_help(void)
+{
+	printf(_(
+"\n"
+"Discard filesystem free space\n"
+"\n"
+"Options: [-m minlen] [-f]|[-a agno]|[offset length]\n"
+"\n"
+" -m minlen -- skip freespace extents smaller than minlen\n"
+" -f -- trim all the freespace in the entire filesystem\n"
+" -a agno -- trim all the freespace in the given AG agno\n"
+" offset length -- trim the freespace in the range {offset, length}\n"
+"\n"));
+
+}
+
+void
+trim_init(void)
+{
+	trim_cmd.name = "trim";
+	trim_cmd.altname = "tr";
+	trim_cmd.cfunc = trim_f;
+	trim_cmd.argmin = 1;
+	trim_cmd.argmax = 4;
+	trim_cmd.args = "[-m minlen] [-f]|[-a agno]|[offset length]\n";
+	trim_cmd.flags = CMD_FLAG_ONESHOT;
+	trim_cmd.oneline = _("Discard filesystem free space");
+	trim_cmd.help = trim_help;
+
+	add_command(&trim_cmd);
+}
+



* [PATCH 04/17] spaceman: add new speculative prealloc control
  2017-01-21  8:08 [PATCH v5 00/17] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (2 preceding siblings ...)
  2017-01-21  8:08 ` [PATCH 03/17] spaceman: add FITRIM support Dave Chinner
@ 2017-01-21  8:08 ` Dave Chinner
  2017-01-21  8:08 ` [PATCH 05/17] spaceman: AG state control Dave Chinner
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2017-01-21  8:08 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, linux-fsdevel

Add a control interface for purging speculative
preallocation via the new ioctls.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 spaceman/Makefile   |    2 -
 spaceman/init.c     |    1 
 spaceman/prealloc.c |  135 +++++++++++++++++++++++++++++++++++++++++++++++++++
 spaceman/space.h    |    1 
 4 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 spaceman/prealloc.c


diff --git a/spaceman/Makefile b/spaceman/Makefile
index 9fb9142..b1f1136 100644
--- a/spaceman/Makefile
+++ b/spaceman/Makefile
@@ -8,7 +8,7 @@ include $(TOPDIR)/include/builddefs
 LTCOMMAND = xfs_spaceman
 HFILES = init.h space.h
 CFILES = init.c \
-	file.c trim.c
+	file.c prealloc.c trim.c
 
 LLDLIBS = $(LIBXCMD)
 LTDEPENDENCIES = $(LIBXCMD)
diff --git a/spaceman/init.c b/spaceman/init.c
index 8eb4cc7..87ef27c 100644
--- a/spaceman/init.c
+++ b/spaceman/init.c
@@ -39,6 +39,7 @@ init_commands(void)
 {
 	file_init();
 	help_init();
+	prealloc_init();
 	quit_init();
 	trim_init();
 }
diff --git a/spaceman/prealloc.c b/spaceman/prealloc.c
new file mode 100644
index 0000000..645b772
--- /dev/null
+++ b/spaceman/prealloc.c
@@ -0,0 +1,135 @@
+/*
+ * Copyright (c) 2012 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include "libxfs.h"
+#include "command.h"
+#include "input.h"
+#include "init.h"
+#include "space.h"
+
+#ifndef XFS_IOC_FREE_EOFBLOCKS
+#define XFS_IOC_FREE_EOFBLOCKS _IOR ('X', 58, struct xfs_fs_eofblocks)
+
+#define XFS_EOFBLOCKS_VERSION           1
+struct xfs_fs_eofblocks {
+	__u32		eof_version;
+	__u32		eof_flags;
+	uid_t		eof_uid;
+	gid_t		eof_gid;
+	prid_t		eof_prid;
+	__u32		pad32;
+	__u64		eof_min_file_size;
+	__u64		pad64[12];
+};
+
+/* eof_flags values */
+#define XFS_EOF_FLAGS_SYNC		(1 << 0) /* sync/wait mode scan */
+#define XFS_EOF_FLAGS_UID		(1 << 1) /* filter by uid */
+#define XFS_EOF_FLAGS_GID		(1 << 2) /* filter by gid */
+#define XFS_EOF_FLAGS_PRID		(1 << 3) /* filter by project id */
+#define XFS_EOF_FLAGS_MINFILESIZE	(1 << 4) /* filter by min file size */
+
+#endif
+
+static cmdinfo_t prealloc_cmd;
+
+/*
+ * Control preallocation amounts.
+ */
+static int
+prealloc_f(
+	int	argc,
+	char	**argv)
+{
+	struct xfs_fs_eofblocks eofb = {0};
+	int	c;
+
+	eofb.eof_version = XFS_EOFBLOCKS_VERSION;
+
+	while ((c = getopt(argc, argv, "g:m:p:su:")) != EOF) {
+		switch (c) {
+		case 'g':
+			eofb.eof_flags |= XFS_EOF_FLAGS_GID;
+			eofb.eof_gid = atoi(optarg);
+			break;
+		case 'u':
+			eofb.eof_flags |= XFS_EOF_FLAGS_UID;
+			eofb.eof_uid = atoi(optarg);
+			break;
+		case 'p':
+			eofb.eof_flags |= XFS_EOF_FLAGS_PRID;
+			eofb.eof_prid = atoi(optarg);
+			break;
+		case 's':
+			eofb.eof_flags |= XFS_EOF_FLAGS_SYNC;
+			break;
+		case 'm':
+			eofb.eof_flags |= XFS_EOF_FLAGS_MINFILESIZE;
+			eofb.eof_min_file_size = cvtnum(file->geom.blocksize,
+							file->geom.sectsize,
+							optarg);
+			break;
+		case '?':
+		default:
+			return command_usage(&prealloc_cmd);
+		}
+	}
+	if (optind != argc)
+		return command_usage(&prealloc_cmd);
+
+	if (xfsctl(file->name, file->fd, XFS_IOC_FREE_EOFBLOCKS, &eofb) < 0) {
+		fprintf(stderr, _("%s: XFS_IOC_FREE_EOFBLOCKS on %s: %s\n"),
+			progname, file->name, strerror(errno));
+	}
+	return 0;
+}
+
+static void
+prealloc_help(void)
+{
+	printf(_(
+"\n"
+"Control speculative preallocation\n"
+"\n"
+"Options: [-s] [-ugp id] [-m minlen]\n"
+"\n"
+" -s -- synchronous flush - wait for flush to complete\n"
+" -u uid -- remove prealloc on files matching user <uid>\n"
+" -g gid -- remove prealloc on files matching group <gid>\n"
+" -p prid -- remove prealloc on files matching project <prid>\n"
+" -m minlen -- only consider files larger than <minlen>\n"
+"\n"));
+
+}
+
+void
+prealloc_init(void)
+{
+	prealloc_cmd.name = "prealloc";
+	prealloc_cmd.altname = "prealloc";
+	prealloc_cmd.cfunc = prealloc_f;
+	prealloc_cmd.argmin = 1;
+	prealloc_cmd.argmax = -1;
+	prealloc_cmd.args = "[-s] [-ugp id] [-m minlen]\n";
+	prealloc_cmd.flags = CMD_FLAG_ONESHOT;
+	prealloc_cmd.oneline = _("Control speculative preallocation");
+	prealloc_cmd.help = prealloc_help;
+
+	add_command(&prealloc_cmd);
+}
+
diff --git a/spaceman/space.h b/spaceman/space.h
index 7b4f034..0ae3116 100644
--- a/spaceman/space.h
+++ b/spaceman/space.h
@@ -33,5 +33,6 @@ extern int	addfile(char *, int , xfs_fsop_geom_t *, int);
 
 extern void	file_init(void);
 extern void	help_init(void);
+extern void	prealloc_init(void);
 extern void	quit_init(void);
 extern void	trim_init(void);



* [PATCH 05/17] spaceman: AG state control
  2017-01-21  8:08 [PATCH v5 00/17] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (3 preceding siblings ...)
  2017-01-21  8:08 ` [PATCH 04/17] spaceman: add new speculative prealloc control Dave Chinner
@ 2017-01-21  8:08 ` Dave Chinner
  2017-01-21  8:08 ` [PATCH 06/17] spaceman: Free space mapping command Dave Chinner
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2017-01-21  8:08 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, linux-fsdevel

Add support for a new allocation group state control ioctl. This
allows control of various AG parameters, such as whether inode
allocation is allowed in the AG, metadata preference, whether new
allocations are allowed, etc. This requires a new ioctl.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
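Reviewer's sketch, not part of the patch: one reading of the state
comment in ag.c is that NOALLOC blocks new allocations only, while
READONLY blocks both allocation and freeing. The flag values below are
copied from the patch's fallback definitions. The two predicate names
are hypothetical, and treating an offline AGF as permitting nothing is
an assumption, not something the patch states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the fallback definitions in this patch. */
#define XFS_AGFSTATE_ONLINE	(1 << 0)	/* AGF online */
#define XFS_AGFSTATE_NOALLOC	(1 << 1)	/* No new allocation */
#define XFS_AGFSTATE_READONLY	(1 << 2)	/* AGF is immutable */

/* Hypothetical: may new extents be allocated from this AG? */
bool
agf_can_allocate(
	uint32_t	state)
{
	if (!(state & XFS_AGFSTATE_ONLINE))
		return false;	/* assumption: offline permits nothing */
	return !(state & (XFS_AGFSTATE_NOALLOC | XFS_AGFSTATE_READONLY));
}

/* Hypothetical: may extents be freed back to this AG? */
bool
agf_can_free(
	uint32_t	state)
{
	if (!(state & XFS_AGFSTATE_ONLINE))
		return false;	/* assumption: offline permits nothing */
	return !(state & XFS_AGFSTATE_READONLY);
}
```

Under this reading, an AG being emptied ahead of a shrink would sit in
ONLINE|NOALLOC: frees still succeed, allocations do not.
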
 spaceman/Makefile |    2 
 spaceman/ag.c     |  221 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 222 insertions(+), 1 deletion(-)
 create mode 100644 spaceman/ag.c


diff --git a/spaceman/Makefile b/spaceman/Makefile
index b1f1136..08709b3 100644
--- a/spaceman/Makefile
+++ b/spaceman/Makefile
@@ -8,7 +8,7 @@ include $(TOPDIR)/include/builddefs
 LTCOMMAND = xfs_spaceman
 HFILES = init.h space.h
 CFILES = init.c \
-	file.c prealloc.c trim.c
+	ag.c file.c prealloc.c trim.c
 
 LLDLIBS = $(LIBXCMD)
 LTDEPENDENCIES = $(LIBXCMD)
diff --git a/spaceman/ag.c b/spaceman/ag.c
new file mode 100644
index 0000000..567fe7a
--- /dev/null
+++ b/spaceman/ag.c
@@ -0,0 +1,221 @@
+/*
+ * Copyright (c) 2012 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include "libxfs.h"
+#include <linux/dqblk_xfs.h>
+#include "command.h"
+#include "input.h"
+#include "init.h"
+#include "space.h"
+
+#ifndef XFS_IOC_AGCONTROL
+#define XFS_IOC_AGCONTROL _IOWR ('X', 60, struct xfs_agcontrol)
+
+#define XFS_AGCONTROL_VERSION		1
+struct xfs_agcontrol {
+	__u32		version;
+	__u32		flags;
+	__u32		agno;
+	__u32		state;
+	__u64		pad[8];
+};
+
+/* control flags */
+#define XFS_AGCONTROL_GETAGFSTATE	(1 << 0)	/* get AGF state */
+#define XFS_AGCONTROL_SETAGFSTATE	(1 << 1)	/* set AGF state */
+#define XFS_AGCONTROL_GETAGISTATE	(1 << 2)	/* get AGI state */
+#define XFS_AGCONTROL_SETAGISTATE	(1 << 3)	/* set AGI state */
+
+/* state flags */
+
+/*
+ * Inode and allocation states are split. AGF and AGI online state will move in
+ * sync as it is really a whole AG state. The no-allocation flags imply no new
+ * allocations, but inodes and extents can be removed. Readonly means no
+ * modification (alloc or free) is allowed. This is to allow different
+ * operations to be performed, e.g. emptying an AG in preparation for a shrink
+ * requires NOALLOC state, but an AG that has a corrupted freespace btree might
+ * be switched to READONLY until the freespace tree is rebuilt. An AGF/AGI in
+ * this corrupt/ro state will set the relevant corruption flag in the state
+ * field.
+ */
+#define XFS_AGFSTATE_ONLINE		(1 << 0)	/* AGF online */
+#define XFS_AGFSTATE_NOALLOC		(1 << 1)	/* No new allocation */
+#define XFS_AGFSTATE_READONLY		(1 << 2)	/* AGF is immutable */
+#define XFS_AGFSTATE_METADATA		(1 << 3)	/* metadata preferred */
+#define XFS_AGFSTATE_CORRUPT_BNO	(1 << 4)	/* bno freespace corrupt */
+#define XFS_AGFSTATE_CORRUPT_CNT	(1 << 5)	/* cnt freespace corrupt */
+#define XFS_AGFSTATE_CORRUPT_AGFL	(1 << 6)	/* AGFL freespace corrupt */
+
+#define XFS_AGISTATE_ONLINE		(1 << 0)	/* AGI online */
+#define XFS_AGISTATE_NOALLOC		(1 << 1)	/* No new allocation */
+#define XFS_AGISTATE_READONLY		(1 << 2)	/* AGI is immutable */
+#define XFS_AGISTATE_CORRUPT_TREE	(1 << 3)	/* AGI btree corrupt */
+
+#endif
+
+static cmdinfo_t agfctl_cmd;
+static cmdinfo_t agictl_cmd;
+
+static int
+agfctl_f(
+	int		argc,
+	char		**argv)
+{
+	struct xfs_agcontrol agctl = {0};
+	xfs_agnumber_t	agno;
+	int		gflag = 0;
+	int		c;
+
+	while ((c = getopt(argc, argv, "gs")) != EOF) {
+		switch (c) {
+		case 'g':
+			gflag = 1;
+			break;
+		default:
+			return command_usage(&agfctl_cmd);
+		}
+	}
+	if (optind != argc - 1)
+		return command_usage(&agfctl_cmd);
+
+	agno = atoi(argv[optind]);
+	if (agno >= file->geom.agcount) {
+		fprintf(stderr, _("%s: agno %d out of range (max %d)\n"),
+			progname, agno, file->geom.agcount);
+		exitcode = 1;
+		return 0;
+	}
+
+	agctl.version = XFS_AGCONTROL_VERSION;
+	agctl.agno = agno;
+	if (gflag)
+		agctl.flags = XFS_AGCONTROL_GETAGFSTATE;
+
+	if (xfsctl(file->name, file->fd, XFS_IOC_AGCONTROL, &agctl) < 0) {
+		fprintf(stderr, _("%s: XFS_IOC_AGCONTROL on %s: %s\n"),
+			progname, file->name, strerror(errno));
+	}
+	return 0;
+}
+
+static void
+agfctl_help(void)
+{
+	printf(_(
+"\n"
+"AGF state control\n"
+"\n"
+"Options: [-g] agno\n"
+"\n"
+" -g -- get state\n"
+" agno -- AG to operate on\n"
+"\n"));
+
+}
+
+void
+agfctl_init(void)
+{
+	agfctl_cmd.name = "agfctl";
+	agfctl_cmd.altname = "agfctl";
+	agfctl_cmd.cfunc = agfctl_f;
+	agfctl_cmd.argmin = 2;
+	agfctl_cmd.argmax = -1;
+	agfctl_cmd.args = "agno\n";
+	agfctl_cmd.flags = CMD_FLAG_ONESHOT;
+	agfctl_cmd.oneline = _("AGF state control");
+	agfctl_cmd.help = agfctl_help;
+
+	add_command(&agfctl_cmd);
+}
+
+static int
+agictl_f(
+	int		argc,
+	char		**argv)
+{
+	struct xfs_agcontrol agctl = {0};
+	xfs_agnumber_t	agno;
+	int		gflag = 0;
+	int		c;
+
+	while ((c = getopt(argc, argv, "gs")) != EOF) {
+		switch (c) {
+		case 'g':
+			gflag = 1;
+			break;
+		default:
+			return command_usage(&agictl_cmd);
+		}
+	}
+	if (optind != argc - 1)
+		return command_usage(&agictl_cmd);
+
+	agno = atoi(argv[optind]);
+	if (agno >= file->geom.agcount) {
+		fprintf(stderr, _("%s: agno %d out of range (max %d)\n"),
+			progname, agno, file->geom.agcount);
+		exitcode = 1;
+		return 0;
+	}
+
+	agctl.version = XFS_AGCONTROL_VERSION;
+	agctl.agno = agno;
+	if (gflag)
+		agctl.flags = XFS_AGCONTROL_GETAGISTATE;
+
+	if (xfsctl(file->name, file->fd, XFS_IOC_AGCONTROL, &agctl) < 0) {
+		fprintf(stderr, _("%s: XFS_IOC_AGCONTROL on %s: %s\n"),
+			progname, file->name, strerror(errno));
+		exitcode = 1;
+		return 0;
+	}
+	return 0;
+}
+
+static void
+agictl_help(void)
+{
+	printf(_(
+"\n"
+"AGI state control\n"
+"\n"
+"Options: [-g] agno\n"
+"\n"
+" -g -- get state\n"
+" agno -- AG to operate on\n"
+"\n"));
+
+}
+
+void
+agictl_init(void)
+{
+	agictl_cmd.name = "agictl";
+	agictl_cmd.altname = "agictl";
+	agictl_cmd.cfunc = agictl_f;
+	agictl_cmd.argmin = 2;
+	agictl_cmd.argmax = -1;
+	agictl_cmd.args = "agno\n";
+	agictl_cmd.flags = CMD_FLAG_ONESHOT;
+	agictl_cmd.oneline = _("AGI state control");
+	agictl_cmd.help = agictl_help;
+
+	add_command(&agictl_cmd);
+}



* [PATCH 06/17] spaceman: Free space mapping command
  2017-01-21  8:08 [PATCH v5 00/17] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (4 preceding siblings ...)
  2017-01-21  8:08 ` [PATCH 05/17] spaceman: AG state control Dave Chinner
@ 2017-01-21  8:08 ` Dave Chinner
  2017-01-21  8:08 ` [PATCH 07/17] xfs_spaceman: add a man page Darrick J. Wong
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2017-01-21  8:08 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, linux-fsdevel

Add a freespace mapping tool modelled on the xfs_db freesp command.
The advantage of this command over xfs_db is that it can be done
online and is coherent with concurrent modifications to the
filesystem.

This requires the kernel to support the XFS_IOC_GETFSMAP ioctl to map
free space indexes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick: port from FIEMAPFS to GETFSMAP]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
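Reviewer's sketch, not part of the patch: the freesp histogram sorts
each free extent into the first bucket whose upper bound is at least
the extent length. The standalone code below imitates the power-of-two
mode (-b) of histinit() and the lookup in addtohist(). The struct and
function names are hypothetical, and the fixed-capacity array stands in
for the realloc()-grown table that freesp.c uses.

```c
#include <stddef.h>

struct bucket {
	long long	low;	/* smallest extent length in bucket */
	long long	high;	/* largest extent length in bucket */
};

/*
 * Fill b[] with buckets whose lower bounds are 1, 2, 4, ... below
 * maxlen.  Each bucket ends just before the next one starts and the
 * last bucket ends at maxlen.  Returns the number of buckets built.
 */
size_t
build_pow2_buckets(
	struct bucket	*b,
	size_t		cap,
	long long	maxlen)
{
	size_t		n = 0;
	size_t		i;
	long long	low;

	for (low = 1; low < maxlen && n < cap; low *= 2)
		b[n++].low = low;
	for (i = 0; i < n; i++)
		b[i].high = (i + 1 < n) ? b[i + 1].low - 1 : maxlen;
	return n;
}

/* Return the index of the first bucket that can hold len, or -1. */
int
find_bucket(
	const struct bucket	*b,
	size_t			n,
	long long		len)
{
	size_t			i;

	for (i = 0; i < n; i++)
		if (b[i].high >= len)
			return (int)i;
	return -1;
}
```

For maxlen = 100 this builds seven buckets (1, 2-3, 4-7, 8-15, 16-31,
32-63, 64-100), and a 5-block extent is counted in the 4-7 bucket.
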
 spaceman/Makefile   |   15 ++
 spaceman/ag.c       |    1 
 spaceman/file.c     |   18 ++-
 spaceman/freesp.c   |  367 +++++++++++++++++++++++++++++++++++++++++++++++++++
 spaceman/init.c     |    9 +
 spaceman/prealloc.c |    1 
 spaceman/space.h    |   12 +-
 spaceman/trim.c     |    1 
 8 files changed, 416 insertions(+), 8 deletions(-)
 create mode 100644 spaceman/freesp.c


diff --git a/spaceman/Makefile b/spaceman/Makefile
index 08709b3..1370601 100644
--- a/spaceman/Makefile
+++ b/spaceman/Makefile
@@ -7,8 +7,12 @@ include $(TOPDIR)/include/builddefs
 
 LTCOMMAND = xfs_spaceman
 HFILES = init.h space.h
-CFILES = init.c \
-	ag.c file.c prealloc.c trim.c
+CFILES = ag.c \
+	 file.c \
+	 init.c \
+	 prealloc.c \
+	 trim.c
+
 
 LLDLIBS = $(LIBXCMD)
 LTDEPENDENCIES = $(LIBXCMD)
@@ -22,6 +26,13 @@ ifeq ($(ENABLE_EDITLINE),yes)
 LLDLIBS += $(LIBEDITLINE) $(LIBTERMCAP)
 endif
 
+ifeq ($(HAVE_FIEMAP),yes)
+CFILES += freesp.c
+LCFLAGS += -DHAVE_FIEMAP
+else
+LSRCFILES += freesp.c
+endif
+
 default: depend $(LTCOMMAND)
 
 include $(BUILDRULES)
diff --git a/spaceman/ag.c b/spaceman/ag.c
index 567fe7a..18ae8a4 100644
--- a/spaceman/ag.c
+++ b/spaceman/ag.c
@@ -21,6 +21,7 @@
 #include "command.h"
 #include "input.h"
 #include "init.h"
+#include "path.h"
 #include "space.h"
 
 #ifndef XFS_IOC_AGCONTROL
diff --git a/spaceman/file.c b/spaceman/file.c
index d7ab05b..1f00236 100644
--- a/spaceman/file.c
+++ b/spaceman/file.c
@@ -22,6 +22,7 @@
 #include "command.h"
 #include "input.h"
 #include "init.h"
+#include "path.h"
 #include "space.h"
 
 static cmdinfo_t print_cmd;
@@ -69,8 +70,10 @@ openfile(
 	char		*path,
 	xfs_fsop_geom_t	*geom,
 	int		flags,
-	mode_t		mode)
+	mode_t		mode,
+	struct fs_path	*fs_path)
 {
+	struct fs_path	*fsp;
 	int		fd;
 
 	fd = open(path, flags, mode);
@@ -95,6 +98,15 @@ openfile(
 		close(fd);
 		return -1;
 	}
+
+	if (fs_path) {
+		fsp = fs_table_lookup(path, FS_MOUNT_POINT);
+		if (!fsp) {
+			fprintf(stderr, _("Unable to find XFS information.\n"));
+			return -1;
+		}
+		*fs_path = *fsp;
+	}
 	return fd;
 }
 
@@ -103,7 +115,8 @@ addfile(
 	char		*name,
 	int		fd,
 	xfs_fsop_geom_t	*geometry,
-	int		flags)
+	int		flags,
+	struct fs_path	*fs_path)
 {
 	char		*filename;
 
@@ -131,6 +144,7 @@ addfile(
 	file->flags = flags;
 	file->name = filename;
 	file->geom = *geometry;
+	file->fs_path = *fs_path;
 	return 0;
 }
 
diff --git a/spaceman/freesp.c b/spaceman/freesp.c
new file mode 100644
index 0000000..ffe2fdb
--- /dev/null
+++ b/spaceman/freesp.c
@@ -0,0 +1,367 @@
+/*
+ * Copyright (c) 2000-2001,2005 Silicon Graphics, Inc.
+ * Copyright (c) 2012 Red Hat, Inc.
+ * Copyright (c) 2017 Oracle.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include "libxfs.h"
+#include <linux/fiemap.h>
+#include "command.h"
+#include "init.h"
+#include "path.h"
+#include "space.h"
+
+typedef struct histent
+{
+	int		low;
+	int		high;
+	long long	count;
+	long long	blocks;
+} histent_t;
+
+static int		agcount;
+static xfs_agnumber_t	*aglist;
+static int		dumpflag;
+static int		equalsize;
+static histent_t	*hist;
+static int		histcount;
+static int		multsize;
+static int		seen1;
+static int		summaryflag;
+static bool		rtflag;
+static long long	totblocks;
+static long long	totexts;
+
+static cmdinfo_t freesp_cmd;
+
+static void
+addhistent(
+	int	h)
+{
+	hist = realloc(hist, (histcount + 1) * sizeof(*hist));
+	if (h == 0)
+		h = 1;
+	hist[histcount].low = h;
+	hist[histcount].count = hist[histcount].blocks = 0;
+	histcount++;
+	if (h == 1)
+		seen1 = 1;
+}
+
+static void
+addtohist(
+	xfs_agnumber_t	agno,
+	xfs_agblock_t	agbno,
+	off64_t		len)
+{
+	int		i;
+
+	if (dumpflag)
+		printf("%8d %8d %8lld\n", agno, agbno, (long long)len);
+	totexts++;
+	totblocks += len;
+	for (i = 0; i < histcount; i++) {
+		if (hist[i].high >= len) {
+			hist[i].count++;
+			hist[i].blocks += len;
+			break;
+		}
+	}
+}
+
+static int
+hcmp(
+	const void	*a,
+	const void	*b)
+{
+	return ((histent_t *)a)->low - ((histent_t *)b)->low;
+}
+
+static void
+histinit(
+	int	maxlen)
+{
+	int	i;
+
+	if (equalsize) {
+		for (i = 1; i < maxlen; i += equalsize)
+			addhistent(i);
+	} else if (multsize) {
+		for (i = 1; i < maxlen; i *= multsize)
+			addhistent(i);
+	} else {
+		if (!seen1)
+			addhistent(1);
+		qsort(hist, histcount, sizeof(*hist), hcmp);
+	}
+	for (i = 0; i < histcount; i++) {
+		if (i < histcount - 1)
+			hist[i].high = hist[i + 1].low - 1;
+		else
+			hist[i].high = maxlen;
+	}
+}
+
+static void
+printhist(void)
+{
+	int	i;
+
+	printf("%7s %7s %7s %7s %6s\n",
+		_("from"), _("to"), _("extents"), _("blocks"), _("pct"));
+	for (i = 0; i < histcount; i++) {
+		if (hist[i].count)
+			printf("%7d %7d %7lld %7lld %6.2f\n", hist[i].low,
+				hist[i].high, hist[i].count, hist[i].blocks,
+				hist[i].blocks * 100.0 / totblocks);
+	}
+}
+
+static int
+inaglist(
+	xfs_agnumber_t	agno)
+{
+	int		i;
+
+	if (agcount == 0)
+		return 1;
+	for (i = 0; i < agcount; i++)
+		if (aglist[i] == agno)
+			return 1;
+	return 0;
+}
+
+#define NR_EXTENTS 128
+
+static void
+scan_ag(
+	xfs_agnumber_t		agno)
+{
+	struct fsmap_head	*fsmap;
+	struct fsmap		*extent;
+	struct fsmap		*l, *h;
+	struct fsmap		*p;
+	off64_t			blocksize = file->geom.blocksize;
+	off64_t			bperag;
+	off64_t			aglen;
+	xfs_agblock_t		agbno;
+	int			ret;
+	int			i;
+
+	bperag = (off64_t)file->geom.agblocks * blocksize;
+
+	fsmap = malloc(fsmap_sizeof(NR_EXTENTS));
+	if (!fsmap) {
+		fprintf(stderr, _("%s: fsmap malloc failed.\n"), progname);
+		exitcode = 1;
+		return;
+	}
+
+	memset(fsmap, 0, sizeof(*fsmap));
+	fsmap->fmh_count = NR_EXTENTS;
+	l = fsmap->fmh_keys;
+	h = fsmap->fmh_keys + 1;
+	if (agno != NULLAGNUMBER) {
+		l->fmr_physical = agno * bperag;
+		h->fmr_physical = ((agno + 1) * bperag) - 1;
+		l->fmr_device = h->fmr_device = file->fs_path.fs_datadev;
+	} else {
+		l->fmr_physical = 0;
+		h->fmr_physical = ULLONG_MAX;
+		l->fmr_device = h->fmr_device = file->fs_path.fs_rtdev;
+	}
+	h->fmr_owner = ULLONG_MAX;
+	h->fmr_flags = UINT_MAX;
+	h->fmr_offset = ULLONG_MAX;
+
+	while (true) {
+		ret = xfsctl(file->name, file->fd, XFS_IOC_GETFSMAP, fsmap);
+		if (ret < 0) {
+			fprintf(stderr, "%s: ioctl(XFS_IOC_GETFSMAP) [\"%s\"]: "
+				"%s\n", progname, file->name, strerror(errno));
+			free(fsmap);
+			exitcode = 1;
+			return;
+		}
+
+		/* No more extents to map, exit */
+		if (!fsmap->fmh_entries)
+			break;
+
+		for (i = 0, extent = fsmap->fmh_recs;
+		     i < fsmap->fmh_entries;
+		     i++, extent++) {
+			if (!(extent->fmr_flags & FMR_OF_SPECIAL_OWNER) ||
+			    extent->fmr_owner != FMR_OWN_FREE)
+				continue;
+			agbno = (extent->fmr_physical - (bperag * agno)) /
+								blocksize;
+			aglen = extent->fmr_length / blocksize;
+
+			addtohist(agno, agbno, aglen);
+		}
+
+		p = &fsmap->fmh_recs[fsmap->fmh_entries - 1];
+		if (p->fmr_flags & FMR_OF_LAST)
+			break;
+
+		fsmap->fmh_keys[0] = *p;
+	}
+}
+static void
+aglistadd(
+	char	*a)
+{
+	aglist = realloc(aglist, (agcount + 1) * sizeof(*aglist));
+	aglist[agcount] = (xfs_agnumber_t)atoi(a);
+	agcount++;
+}
+
+static int
+init(
+	int		argc,
+	char		**argv)
+{
+	int		c;
+	int		speced = 0;
+
+	agcount = dumpflag = equalsize = multsize = optind = 0;
+	histcount = seen1 = summaryflag = 0;
+	totblocks = totexts = 0;
+	aglist = NULL;
+	hist = NULL;
+	rtflag = false;
+	while ((c = getopt(argc, argv, "a:bde:h:m:rs")) != EOF) {
+		switch (c) {
+		case 'a':
+			aglistadd(optarg);
+			break;
+		case 'b':
+			if (speced)
+				return 0;
+			multsize = 2;
+			speced = 1;
+			break;
+		case 'd':
+			dumpflag = 1;
+			break;
+		case 'e':
+			if (speced)
+				return 0;
+			equalsize = atoi(optarg);
+			speced = 1;
+			break;
+		case 'h':
+			if (speced && !histcount)
+				return 0;
+			addhistent(atoi(optarg));
+			speced = 1;
+			break;
+		case 'm':
+			if (speced)
+				return 0;
+			multsize = atoi(optarg);
+			speced = 1;
+			break;
+		case 'r':
+			rtflag = true;
+			break;
+		case 's':
+			summaryflag = 1;
+			break;
+		case '?':
+			return 0;
+		}
+	}
+	if (optind != argc)
+		return 0;
+	if (!speced)
+		multsize = 2;
+	histinit(file->geom.agblocks);
+	return 1;
+}
+
+/*
+ * Report on freespace usage in xfs filesystem.
+ */
+static int
+freesp_f(
+	int		argc,
+	char		**argv)
+{
+	xfs_agnumber_t	agno;
+
+	if (!init(argc, argv))
+		return 0;
+	if (rtflag)
+		scan_ag(NULLAGNUMBER);
+	for (agno = 0; !rtflag && agno < file->geom.agcount; agno++)  {
+		if (inaglist(agno))
+			scan_ag(agno);
+	}
+	if (histcount)
+		printhist();
+	if (summaryflag) {
+		printf(_("total free extents %lld\n"), totexts);
+		printf(_("total free blocks %lld\n"), totblocks);
+		printf(_("average free extent size %g\n"),
+			totexts ? (double)totblocks / totexts : 0.0);
+	}
+	if (aglist)
+		free(aglist);
+	if (hist)
+		free(hist);
+	return 0;
+}
+
+static void
+freesp_help(void)
+{
+	printf(_(
+"\n"
+"Examine filesystem free space\n"
+"\n"
+"Options: [-bcds] [-a agno] [-e bsize] [-h h1]... [-m bmult]\n"
+"\n"
+" -b -- binary histogram bin size\n"
+" -d -- debug output\n"
+" -r -- display realtime device free space information\n"
+" -s -- emit freespace summary information\n"
+" -a agno -- scan only the given AG agno\n"
+" -e bsize -- use fixed histogram bin size of bsize\n"
+" -h h1 -- use custom histogram bin size of h1. Multiple specifications allowed.\n"
+" -m bmult -- use histogram bin size multiplier of bmult\n"
+"\n"));
+
+}
+
+void
+freesp_init(void)
+{
+	freesp_cmd.name = "freesp";
+	freesp_cmd.altname = "fsp";
+	freesp_cmd.cfunc = freesp_f;
+	freesp_cmd.argmin = 0;
+	freesp_cmd.argmax = -1;
+	freesp_cmd.args = "[-bdrs] [-a agno] [-e bsize] [-h h1]... [-m bmult]\n";
+	freesp_cmd.flags = CMD_FLAG_ONESHOT;
+	freesp_cmd.oneline = _("Examine filesystem free space");
+	freesp_cmd.help = freesp_help;
+
+	add_command(&freesp_cmd);
+}
+
diff --git a/spaceman/init.c b/spaceman/init.c
index 87ef27c..aef09dc 100644
--- a/spaceman/init.c
+++ b/spaceman/init.c
@@ -20,6 +20,7 @@
 #include "command.h"
 #include "input.h"
 #include "init.h"
+#include "path.h"
 #include "space.h"
 
 char	*progname;
@@ -38,6 +39,7 @@ static void
 init_commands(void)
 {
 	file_init();
+	freesp_init();
 	help_init();
 	prealloc_init();
 	quit_init();
@@ -71,12 +73,14 @@ init(
 	int		c, flags = 0;
 	mode_t		mode = 0600;
 	xfs_fsop_geom_t	geometry = { 0 };
+	struct fs_path	fsp;
 
 	progname = basename(argv[0]);
 	setlocale(LC_ALL, "");
 	bindtextdomain(PACKAGE, LOCALEDIR);
 	textdomain(PACKAGE);
 
+	fs_table_initialise(0, NULL, 0, NULL);
 	while ((c = getopt(argc, argv, "c:V")) != EOF) {
 		switch (c) {
 		case 'c':
@@ -94,13 +98,14 @@ init(
 		usage();
 
 	while (optind < argc) {
-		if ((c = openfile(argv[optind], &geometry, flags, mode)) < 0)
+		c = openfile(argv[optind], &geometry, flags, mode, &fsp);
+		if (c < 0)
 			exit(1);
 		if (!platform_test_xfs_fd(c)) {
 			printf(_("Not an XFS filesystem!\n"));
 			exit(1);
 		}
-		if (addfile(argv[optind], c, &geometry, flags) < 0)
+		if (addfile(argv[optind], c, &geometry, flags, &fsp) < 0)
 			exit(1);
 		optind++;
 	}
diff --git a/spaceman/prealloc.c b/spaceman/prealloc.c
index 645b772..ac7130c 100644
--- a/spaceman/prealloc.c
+++ b/spaceman/prealloc.c
@@ -20,6 +20,7 @@
 #include "command.h"
 #include "input.h"
 #include "init.h"
+#include "path.h"
 #include "space.h"
 
 #ifndef XFS_IOC_FREE_EOFBLOCKS
diff --git a/spaceman/space.h b/spaceman/space.h
index 0ae3116..872905d 100644
--- a/spaceman/space.h
+++ b/spaceman/space.h
@@ -21,6 +21,7 @@ typedef struct fileio {
 	int		flags;		/* flags describing file state */
 	char		*name;		/* file name at time of open */
 	xfs_fsop_geom_t	geom;		/* XFS filesystem geometry */
+	struct fs_path	fs_path;	/* XFS path information */
 } fileio_t;
 
 extern fileio_t		*filetable;	/* open file table */
@@ -28,11 +29,18 @@ extern int		filecount;	/* number of open files */
 extern fileio_t		*file;		/* active file in file table */
 extern int filelist_f(void);
 
-extern int	openfile(char *, xfs_fsop_geom_t *, int, mode_t);
-extern int	addfile(char *, int , xfs_fsop_geom_t *, int);
+extern int	openfile(char *, xfs_fsop_geom_t *, int, mode_t,
+			 struct fs_path *);
+extern int	addfile(char *, int , xfs_fsop_geom_t *, int, struct fs_path *);
 
 extern void	file_init(void);
 extern void	help_init(void);
 extern void	prealloc_init(void);
 extern void	quit_init(void);
 extern void	trim_init(void);
+
+#ifdef HAVE_FIEMAP
+extern void	freesp_init(void);
+#else
+static inline void freesp_init(void) {}
+#endif
diff --git a/spaceman/trim.c b/spaceman/trim.c
index 9bf6565..d1e5d82 100644
--- a/spaceman/trim.c
+++ b/spaceman/trim.c
@@ -20,6 +20,7 @@
 #include <linux/fs.h>
 #include "command.h"
 #include "init.h"
+#include "path.h"
 #include "space.h"
 #include "input.h"
 


^ permalink raw reply related	[flat|nested] 18+ messages in thread
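
For readers following the freesp histogram logic in the patch above, here is
a minimal standalone sketch of the default binning, assuming a multiplier of
two and the same "first bin whose high bound covers the length" rule that
addtohist() uses. The names struct bin, make_bins() and add_extent() are
hypothetical and exist only for illustration; they are not part of xfsprogs.

```c
#include <assert.h>

/* Hypothetical stand-in for the patch's histent_t. */
struct bin {
	long		low;	/* smallest extent length in this bin */
	long		high;	/* largest extent length in this bin */
	long long	count;	/* number of free extents seen */
	long long	blocks;	/* total blocks in those extents */
};

/*
 * Build bins whose lower bounds are 1, mult, mult^2, ... below maxlen.
 * Each bin's high bound is one less than the next bin's low bound; the
 * last bin extends to maxlen.  Returns the number of bins created.
 */
static int make_bins(struct bin *bins, int maxbins, long maxlen, int mult)
{
	int	n = 0;
	long	low;

	assert(mult >= 2);
	for (low = 1; low < maxlen && n < maxbins; low *= mult) {
		bins[n].low = low;
		bins[n].count = 0;
		bins[n].blocks = 0;
		n++;
	}
	for (int i = 0; i < n; i++)
		bins[i].high = (i < n - 1) ? bins[i + 1].low - 1 : maxlen;
	return n;
}

/*
 * Account one free extent of len blocks to the first bin that can hold
 * it.  Returns the bin index, or -1 if len exceeds every bin.
 */
static int add_extent(struct bin *bins, int n, long len)
{
	for (int i = 0; i < n; i++) {
		if (bins[i].high >= len) {
			bins[i].count++;
			bins[i].blocks += len;
			return i;
		}
	}
	return -1;
}
```

With maxlen = 100 and mult = 2 this produces seven bins (1, 2-3, 4-7, ...,
64-100), and a five-block extent is counted in the 4-7 bin.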

* [PATCH 07/17] xfs_spaceman: add a man page
  2017-01-21  8:08 [PATCH v5 00/17] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (5 preceding siblings ...)
  2017-01-21  8:08 ` [PATCH 06/17] spaceman: Free space mapping command Dave Chinner
@ 2017-01-21  8:08 ` Darrick J. Wong
  2017-01-21  8:08 ` [PATCH 08/17] xfs_spaceman: add group summary mode Darrick J. Wong
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2017-01-21  8:08 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, linux-fsdevel

Add a manual page describing xfs_spaceman's behavior.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 man/man8/xfs_spaceman.8 |  102 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)
 create mode 100644 man/man8/xfs_spaceman.8


diff --git a/man/man8/xfs_spaceman.8 b/man/man8/xfs_spaceman.8
new file mode 100644
index 0000000..c1d19c0
--- /dev/null
+++ b/man/man8/xfs_spaceman.8
@@ -0,0 +1,102 @@
+.TH xfs_spaceman 8
+.SH NAME
+xfs_spaceman \- show free space information about an XFS filesystem
+.SH SYNOPSIS
+.B xfs_spaceman
+[
+.B \-c
+.I cmd
+]
+.I file
+.br
+.B xfs_spaceman \-V
+.SH DESCRIPTION
+.B xfs_spaceman
+reports and controls free space usage in an XFS filesystem.
+.SH OPTIONS
+.TP 1.0i
+.BI \-c " cmd"
+.B xfs_spaceman
+commands may be run interactively (the default) or as arguments on
+the command line. Multiple
+.B \-c
+arguments may be given. The commands are run in the sequence given,
+then the program exits.
+
+.SH COMMANDS
+.TP
+.BI "freesp [ \-sr ] [ \-b | \-e bsize | \-h h1 [ \-m bmult ]] [-a agno]"
+With no arguments,
+.B freesp
+shows a histogram of all free space extents in the filesystem.
+The
+.B -b
+argument establishes that the histogram bins are successive powers of two.
+This is the default.
+The
+.BR -h " and " -m
+options can be used to specify a custom histogram bin size as well as a
+multiplication factor for subsequent bin sizes.
+The
+.BR -e
+option sets a fixed histogram bin size.
+The
+.BR -a " and " -r
+options constrain the free space information report to a particular AG
+or the realtime device, respectively.  The
+.B -a
+option may be specified multiple times.
+A summary of free space information will be printed if the
+.B -s
+option is given.
+.TP
+.BR "help [ " command " ]"
+Display a brief description of one or all commands.
+.TP
+.BI "prealloc [ \-ugp id ] [ \-m minlen ] [ \-s ]"
+Controls speculative preallocation.  The
+.BR -u ","
+.BR -g ","
+and
+.B -p
+options will clear all speculative preallocations for a given user,
+group, or project ID, respectively.
+The
+.B -m
+option causes the operation to ignore any file with a size smaller than
+.BR minlen "."
+The
+.B -s
+option will flush all dirty data and metadata to disk.
+.TP
+.B print
+Display a list of all open files.
+.TP
+.B quit
+Exit
+.BR xfs_spaceman .
+.TP
+.BI "trim [ \-f ] [ \-a agno ] [ \-m minlen ] [" " offset length " ]
+Instructs the underlying storage device to release all storage that may
+be backing free space in the filesystem.
+The
+.B -f
+option trims all free space in the entire filesystem.
+The
+.B -a
+option trims only the free space in a given AG.
+The
+.B -m
+option only trims free space extents that are longer than
+.IR minlen "."
+The
+.IR offset " and " length
+arguments trim all free space between
+.I offset
+and
+.IR "offset+length" "."
+The
+.BR -a " and " -f
+options are mutually exclusive with each other as well as with the
+.IR offset " and " length
+arguments.


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 08/17] xfs_spaceman: add group summary mode
  2017-01-21  8:08 [PATCH v5 00/17] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (6 preceding siblings ...)
  2017-01-21  8:08 ` [PATCH 07/17] xfs_spaceman: add a man page Darrick J. Wong
@ 2017-01-21  8:08 ` Darrick J. Wong
  2017-01-21  8:09 ` [PATCH 09/17] xfs_db: introduce fuzz command Darrick J. Wong
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2017-01-21  8:08 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, linux-fsdevel

Add a -g switch to show only a per-group summary.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 man/man8/xfs_spaceman.8 |    8 +++++++-
 spaceman/freesp.c       |   27 ++++++++++++++++++++++++---
 2 files changed, 31 insertions(+), 4 deletions(-)


diff --git a/man/man8/xfs_spaceman.8 b/man/man8/xfs_spaceman.8
index c1d19c0..a57a0c3 100644
--- a/man/man8/xfs_spaceman.8
+++ b/man/man8/xfs_spaceman.8
@@ -25,7 +25,7 @@ then the program exits.
 
 .SH COMMANDS
 .TP
-.BI "freesp [ \-sr ] [ \-b | \-e bsize | \-h h1 [ \-m bmult ]] [-a agno]"
+.BI "freesp [ \-srg ] [ \-b | \-e bsize | \-h h1 [ \-m bmult ]] [-a agno]"
 With no arguments,
 .B freesp
 shows a histogram of all free space extents in the filesystem.
@@ -49,6 +49,12 @@ option may be specified multiple times.
 A summary of free space information will be printed if the
 .B -s
 option is given.
+The
+.B -g
+option prints a brief per-AG summary of the free space found in each AG.
+If
+.B -r
+is specified, the summary covers the realtime device instead of the AGs.
 .TP
 .BR "help [ " command " ]"
 Display a brief description of one or all commands.
diff --git a/spaceman/freesp.c b/spaceman/freesp.c
index ffe2fdb..3073e24 100644
--- a/spaceman/freesp.c
+++ b/spaceman/freesp.c
@@ -42,6 +42,7 @@ static int		histcount;
 static int		multsize;
 static int		seen1;
 static int		summaryflag;
+static int		gflag;
 static bool		rtflag;
 static long long	totblocks;
 static long long	totexts;
@@ -159,6 +160,8 @@ scan_ag(
 	off64_t			bperag;
 	off64_t			aglen;
 	xfs_agblock_t		agbno;
+	unsigned long long	freeblks = 0;
+	unsigned long long	freeexts = 0;
 	int			ret;
 	int			i;
 
@@ -211,6 +214,8 @@ scan_ag(
 			agbno = (extent->fmr_physical - (bperag * agno)) /
 								blocksize;
 			aglen = extent->fmr_length / blocksize;
+			freeblks += aglen;
+			freeexts++;
 
 			addtohist(agno, agbno, aglen);
 		}
@@ -221,6 +226,15 @@ scan_ag(
 
 		fsmap->fmh_keys[0] = *p;
 	}
+
+	if (gflag) {
+		if (agno == NULLAGNUMBER)
+			printf(_("     rtdev %10llu %10llu\n"), freeexts,
+					freeblks);
+		else
+			printf(_("%10u %10llu %10llu\n"), agno, freeexts,
+					freeblks);
+	}
 }
 static void
 aglistadd(
@@ -245,7 +259,7 @@ init(
 	aglist = NULL;
 	hist = NULL;
 	rtflag = false;
-	while ((c = getopt(argc, argv, "a:bde:h:m:rs")) != EOF) {
+	while ((c = getopt(argc, argv, "a:bde:gh:m:rs")) != EOF) {
 		switch (c) {
 		case 'a':
 			aglistadd(optarg);
@@ -265,6 +279,10 @@ init(
 			equalsize = atoi(optarg);
 			speced = 1;
 			break;
+		case 'g':
+			histcount = 0;
+			gflag++;
+			break;
 		case 'h':
 			if (speced && !histcount)
 				return 0;
@@ -307,13 +325,15 @@ freesp_f(
 
 	if (!init(argc, argv))
 		return 0;
+	if (gflag)
+		printf(_("        AG    extents     blocks\n"));
 	if (rtflag)
 		scan_ag(NULLAGNUMBER);
 	for (agno = 0; !rtflag && agno < file->geom.agcount; agno++)  {
 		if (inaglist(agno))
 			scan_ag(agno);
 	}
-	if (histcount)
+	if (histcount && !gflag)
 		printhist();
 	if (summaryflag) {
 		printf(_("total free extents %lld\n"), totexts);
@@ -335,10 +355,11 @@ freesp_help(void)
 "\n"
 "Examine filesystem free space\n"
 "\n"
-"Options: [-bcds] [-a agno] [-e bsize] [-h h1]... [-m bmult]\n"
+"Options: [-bdgrs] [-a agno] [-e bsize] [-h h1]... [-m bmult]\n"
 "\n"
 " -b -- binary histogram bin size\n"
 " -d -- debug output\n"
+" -g -- print per-AG summary\n"
 " -r -- display realtime device free space information\n"
 " -s -- emit freespace summary information\n"
 " -a agno -- scan only the given AG agno\n"


^ permalink raw reply related	[flat|nested] 18+ messages in thread
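
The per-AG totals in the patch above depend on converting the byte-based
fmr_physical and fmr_length values into filesystem blocks relative to the
start of the AG. A small sketch of that arithmetic follows, assuming a data
device extent that lies entirely within the given AG. The struct ag_extent
type and the to_ag_blocks() helper are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: extent start and length in AG-relative blocks. */
struct ag_extent {
	uint64_t	agbno;	/* first block, counted from the start of the AG */
	uint64_t	aglen;	/* length in filesystem blocks */
};

/*
 * Convert a byte offset/length pair into AG-relative block units.
 * agblocks is the number of blocks per AG and blocksize the filesystem
 * block size in bytes, mirroring geom.agblocks and geom.blocksize.
 */
static struct ag_extent to_ag_blocks(uint64_t physical, uint64_t length,
				     uint32_t agno, uint32_t agblocks,
				     uint32_t blocksize)
{
	uint64_t		bperag = (uint64_t)agblocks * blocksize;
	struct ag_extent	e;

	assert(blocksize > 0);
	assert(physical >= bperag * agno);	/* extent must start in this AG */

	e.agbno = (physical - bperag * agno) / blocksize;
	e.aglen = length / blocksize;
	return e;
}
```

For example, with 1000 blocks per AG and 4096-byte blocks, a 12288-byte
extent at byte offset 8232960 is a three-block extent at block 10 of AG 2.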

* [PATCH 09/17] xfs_db: introduce fuzz command
  2017-01-21  8:08 [PATCH v5 00/17] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (7 preceding siblings ...)
  2017-01-21  8:08 ` [PATCH 08/17] xfs_spaceman: add group summary mode Darrick J. Wong
@ 2017-01-21  8:09 ` Darrick J. Wong
  2017-01-21  8:09 ` [PATCH 10/17] xfs_db: print attribute remote value blocks Darrick J. Wong
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2017-01-21  8:09 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, linux-fsdevel

Introduce a new 'fuzz' command to write creative values into
disk structure fields.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/Makefile       |    3 
 db/bit.c          |   17 +-
 db/bit.h          |    5 -
 db/command.c      |    2 
 db/fuzz.c         |  461 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 db/fuzz.h         |   21 ++
 db/io.c           |    9 +
 db/io.h           |    1 
 db/type.c         |   44 ++++-
 db/type.h         |    1 
 man/man8/xfs_db.8 |   55 ++++++
 11 files changed, 598 insertions(+), 21 deletions(-)
 create mode 100644 db/fuzz.c
 create mode 100644 db/fuzz.h


diff --git a/db/Makefile b/db/Makefile
index cdc0b99..feeacf6 100644
--- a/db/Makefile
+++ b/db/Makefile
@@ -12,7 +12,8 @@ HFILES = addr.h agf.h agfl.h agi.h attr.h attrshort.h bit.h block.h bmap.h \
 	dir2.h dir2sf.h dquot.h echo.h faddr.h field.h \
 	flist.h fprint.h frag.h freesp.h hash.h help.h init.h inode.h input.h \
 	io.h logformat.h malloc.h metadump.h output.h print.h quit.h sb.h \
-	 sig.h strvec.h text.h type.h write.h attrset.h symlink.h fsmap.h
+	 sig.h strvec.h text.h type.h write.h attrset.h symlink.h fsmap.h \
+	fuzz.h
 CFILES = $(HFILES:.h=.c)
 LSRCFILES = xfs_admin.sh xfs_ncheck.sh xfs_metadump.sh
 
diff --git a/db/bit.c b/db/bit.c
index 24872bf..3fcb085 100644
--- a/db/bit.c
+++ b/db/bit.c
@@ -19,13 +19,8 @@
 #include "libxfs.h"
 #include "bit.h"
 
-#undef setbit	/* defined in param.h on Linux */
-
-static int	getbit(char *ptr, int bit);
-static void	setbit(char *ptr, int bit, int val);
-
-static int
-getbit(
+int
+getbit_l(
 	char	*ptr,
 	int	bit)
 {
@@ -39,8 +34,8 @@ getbit(
 	return (*ptr & mask) >> shift;
 }
 
-static void
-setbit(
+void
+setbit_l(
 	char *ptr,
 	int  bit,
 	int  val)
@@ -106,7 +101,7 @@ getbitval(
 
 
 	for (i = 0, rval = 0LL; i < nbits; i++) {
-		if (getbit(p, bit + i)) {
+		if (getbit_l(p, bit + i)) {
 			/* If the last bit is on and we care about sign
 			 * bits and we don't have a full 64 bit
 			 * container, turn all bits on between the
@@ -162,7 +157,7 @@ setbitval(
 
 	if (bitoff % NBBY || nbits % NBBY) {
 		for (bit = 0; bit < nbits; bit++)
-			setbit(out, bit + bitoff, getbit(in, bit));
+			setbit_l(out, bit + bitoff, getbit_l(in, bit));
 	} else
 		memcpy(out + byteize(bitoff), in, byteize(nbits));
 }
diff --git a/db/bit.h b/db/bit.h
index 80ba24c..4506679 100644
--- a/db/bit.h
+++ b/db/bit.h
@@ -21,9 +21,12 @@
 #define	bitszof(x,y)	bitize(szof(x,y))
 #define	byteize(s)	((s) / NBBY)
 #define	bitoffs(s)	((s) % NBBY)
+#define	byteize_up(s)	(((s) + NBBY - 1) / NBBY)
 
 #define	BVUNSIGNED	0
 #define	BVSIGNED	1
 
 extern __int64_t	getbitval(void *obj, int bitoff, int nbits, int flags);
-extern void             setbitval(void *obuf, int bitoff, int nbits, void *ibuf);
+extern void		setbitval(void *obuf, int bitoff, int nbits, void *ibuf);
+extern int		getbit_l(char *ptr, int bit);
+extern void		setbit_l(char *ptr, int bit, int val);
diff --git a/db/command.c b/db/command.c
index 3d7cfd7..0eb4944 100644
--- a/db/command.c
+++ b/db/command.c
@@ -51,6 +51,7 @@
 #include "dquot.h"
 #include "fsmap.h"
 #include "crc.h"
+#include "fuzz.h"
 
 cmdinfo_t	*cmdtab;
 int		ncmds;
@@ -146,4 +147,5 @@ init_commands(void)
 	type_init();
 	write_init();
 	dquot_init();
+	fuzz_init();
 }
diff --git a/db/fuzz.c b/db/fuzz.c
new file mode 100644
index 0000000..061ecd1
--- /dev/null
+++ b/db/fuzz.c
@@ -0,0 +1,461 @@
+/*
+ * Copyright (c) 2000-2002,2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include "libxfs.h"
+#include <ctype.h>
+#include <time.h>
+#include "bit.h"
+#include "block.h"
+#include "command.h"
+#include "type.h"
+#include "faddr.h"
+#include "fprint.h"
+#include "field.h"
+#include "flist.h"
+#include "io.h"
+#include "init.h"
+#include "output.h"
+#include "print.h"
+#include "write.h"
+#include "malloc.h"
+
+static int	fuzz_f(int argc, char **argv);
+static void     fuzz_help(void);
+
+static const cmdinfo_t	fuzz_cmd =
+	{ "fuzz", NULL, fuzz_f, 0, -1, 0, N_("[-c] [-d] field fuzzcmd..."),
+	  N_("fuzz values on disk"), fuzz_help };
+
+void
+fuzz_init(void)
+{
+	if (!expert_mode)
+		return;
+
+	add_command(&fuzz_cmd);
+	srand48(clock());
+}
+
+static void
+fuzz_help(void)
+{
+	dbprintf(_(
+"\n"
+" The 'fuzz' command fuzzes fields in any on-disk data structure.  For\n"
+" block fuzzing, see the 'blocktrash' or 'write' commands.\n"
+"\n"
+" Examples:\n"
+"  Struct mode: 'fuzz core.uid zeroes'    - set an inode uid field to 0.\n"
+"               'fuzz crc ones'           - set a crc field to all ones.\n"
+"               'fuzz bno[11] firstbit'   - set the high bit of a block array.\n"
+"               'fuzz keys[5].startblock add'    - increase a btree key value.\n"
+"               'fuzz uuid random'        - randomize the superblock uuid.\n"
+"\n"
+" In data mode type 'fuzz' by itself for a list of specific commands.\n\n"
+" Specifying the -c option will allow writes of invalid (corrupt) data with\n"
+" an invalid CRC. Specifying the -d option will allow writes of invalid data,\n"
+" but still recalculate the CRC so we are forced to check and detect the\n"
+" invalid data appropriately.\n\n"
+));
+
+}
+
+static int
+fuzz_f(
+	int		argc,
+	char		**argv)
+{
+	pfunc_t	pf;
+	extern char *progname;
+	int c;
+	bool corrupt = false;	/* Allow write of bad data w/ invalid CRC */
+	bool invalid_data = false; /* Allow write of bad data w/ valid CRC */
+	struct xfs_buf_ops local_ops;
+	const struct xfs_buf_ops *stashed_ops = NULL;
+
+	if (x.isreadonly & LIBXFS_ISREADONLY) {
+		dbprintf(_("%s started in read only mode, fuzzing disabled\n"),
+			progname);
+		return 0;
+	}
+
+	if (cur_typ == NULL) {
+		dbprintf(_("no current type\n"));
+		return 0;
+	}
+
+	pf = cur_typ->pfunc;
+	if (pf == NULL) {
+		dbprintf(_("no handler function for type %s, fuzz unsupported.\n"),
+			 cur_typ->name);
+		return 0;
+	}
+
+	while ((c = getopt(argc, argv, "cd")) != EOF) {
+		switch (c) {
+		case 'c':
+			corrupt = true;
+			break;
+		case 'd':
+			invalid_data = true;
+			break;
+		default:
+			dbprintf(_("bad option for fuzz command\n"));
+			return 0;
+		}
+	}
+
+	if (corrupt && invalid_data) {
+		dbprintf(_("Cannot specify both -c and -d options\n"));
+		return 0;
+	}
+
+	if (invalid_data && iocur_top->typ->crc_off == TYP_F_NO_CRC_OFF &&
+			!iocur_top->ino_buf) {
+		dbprintf(_("Cannot recalculate CRCs on this type of object\n"));
+		return 0;
+	}
+
+	argc -= optind;
+	argv += optind;
+
+	/*
+	 * If the buffer has no verifier or we are using standard verifier
+	 * paths, then just fuzz it and return
+	 */
+	if (!iocur_top->bp->b_ops ||
+	    !(corrupt || invalid_data)) {
+		(*pf)(DB_FUZZ, cur_typ->fields, argc, argv);
+		return 0;
+	}
+
+
+	/* Temporarily remove write verifier to write bad data */
+	stashed_ops = iocur_top->bp->b_ops;
+	local_ops.verify_read = stashed_ops->verify_read;
+	iocur_top->bp->b_ops = &local_ops;
+
+	if (corrupt) {
+		local_ops.verify_write = xfs_dummy_verify;
+		dbprintf(_("Allowing fuzz of corrupted data and bad CRC\n"));
+	} else if (iocur_top->ino_buf) {
+		local_ops.verify_write = xfs_verify_recalc_inode_crc;
+		dbprintf(_("Allowing fuzz of corrupted inode with good CRC\n"));
+	} else { /* invalid data */
+		local_ops.verify_write = xfs_verify_recalc_crc;
+		dbprintf(_("Allowing fuzz of corrupted data with good CRC\n"));
+	}
+
+	(*pf)(DB_FUZZ, cur_typ->fields, argc, argv);
+
+	iocur_top->bp->b_ops = stashed_ops;
+
+	return 0;
+}
+
+/* Write zeroes to the field */
+static bool
+fuzz_zeroes(
+	void		*buf,
+	int		bitoff,
+	int		nbits)
+{
+	char		*out = buf;
+	int		bit;
+
+	if (bitoff % NBBY || nbits % NBBY) {
+		for (bit = 0; bit < nbits; bit++)
+			setbit_l(out, bit + bitoff, 0);
+	} else
+		memset(out + byteize(bitoff), 0, byteize(nbits));
+	return true;
+}
+
+/* Write ones to the field */
+static bool
+fuzz_ones(
+	void		*buf,
+	int		bitoff,
+	int		nbits)
+{
+	char		*out = buf;
+	int		bit;
+
+	if (bitoff % NBBY || nbits % NBBY) {
+		for (bit = 0; bit < nbits; bit++)
+			setbit_l(out, bit + bitoff, 1);
+	} else
+		memset(out + byteize(bitoff), 0xFF, byteize(nbits));
+	return true;
+}
+
+/* Flip the high bit in the (presumably big-endian) field */
+static bool
+fuzz_firstbit(
+	void		*buf,
+	int		bitoff,
+	int		nbits)
+{
+	setbit_l((char *)buf, bitoff, !getbit_l((char *)buf, bitoff));
+	return true;
+}
+
+/* Flip the low bit in the (presumably big-endian) field */
+static bool
+fuzz_lastbit(
+	void		*buf,
+	int		bitoff,
+	int		nbits)
+{
+	setbit_l((char *)buf, bitoff + nbits - 1,
+			!getbit_l((char *)buf, bitoff + nbits - 1));
+	return true;
+}
+
+/* Flip the middle bit in the (presumably big-endian) field */
+static bool
+fuzz_middlebit(
+	void		*buf,
+	int		bitoff,
+	int		nbits)
+{
+	setbit_l((char *)buf, bitoff + nbits / 2,
+			!getbit_l((char *)buf, bitoff + nbits / 2));
+	return true;
+}
+
+/* Format and shift a number into a buffer for setbitval. */
+static char *
+format_number(
+	uint64_t	val,
+	__be64		*out,
+	int		bit_length)
+{
+	int		offset;
+	char		*rbuf = (char *)out;
+
+	/*
+	 * If the length of the field is not a multiple of a byte, push
+	 * the bits up in the field, so the most significant field bit is
+	 * the most significant bit in the byte:
+	 *
+	 * before:
+	 * val  |----|----|----|----|----|--MM|mmmm|llll|
+	 * after
+	 * val  |----|----|----|----|----|MMmm|mmll|ll00|
+	 */
+	offset = bit_length % NBBY;
+	if (offset)
+		val <<= (NBBY - offset);
+
+	/*
+	 * convert to big endian and copy into the array
+	 * rbuf |----|----|----|----|----|MMmm|mmll|ll00|
+	 */
+	*out = cpu_to_be64(val);
+
+	/*
+	 * Align the array to point to the field in the array.
+	 *  rbuf  = |MMmm|mmll|ll00|
+	 */
+	offset = sizeof(__be64) - 1 - ((bit_length - 1) / sizeof(__be64));
+	return rbuf + offset;
+}
+
+/* Increase the value by some small prime number. */
+static bool
+fuzz_add(
+	void		*buf,
+	int		bitoff,
+	int		nbits)
+{
+	uint64_t	val;
+	__be64		out;
+	char		*b;
+
+	if (nbits > 64)
+		return false;
+
+	val = getbitval(buf, bitoff, nbits, BVUNSIGNED);
+	val += (nbits > 8 ? 2017 : 137);
+	b = format_number(val, &out, nbits);
+	setbitval(buf, bitoff, nbits, b);
+
+	return true;
+}
+
+/* Decrease the value by some small prime number. */
+static bool
+fuzz_sub(
+	void		*buf,
+	int		bitoff,
+	int		nbits)
+{
+	uint64_t	val;
+	__be64		out;
+	char		*b;
+
+	if (nbits > 64)
+		return false;
+
+	val = getbitval(buf, bitoff, nbits, BVUNSIGNED);
+	val -= (nbits > 8 ? 2017 : 137);
+	b = format_number(val, &out, nbits);
+	setbitval(buf, bitoff, nbits, b);
+
+	return true;
+}
+
+/* Randomize the field. */
+static bool
+fuzz_random(
+	void		*buf,
+	int		bitoff,
+	int		nbits)
+{
+	int		i, bytes;
+	char		*b, *rbuf;
+
+	bytes = byteize_up(nbits);
+	rbuf = b = malloc(bytes);
+	if (!b) {
+		perror("fuzz_random");
+		return false;
+	}
+
+	for (i = 0; i < bytes; i++)
+		*b++ = (char)lrand48();
+
+	setbitval(buf, bitoff, nbits, rbuf);
+	free(rbuf);
+
+	return true;
+}
+
+struct fuzzcmd {
+	const char	*verb;
+	bool		(*fn)(void *buf, int bitoff, int nbits);
+};
+
+/* Keep these verbs in sync with enum fuzzcmds. */
+static struct fuzzcmd fuzzverbs[] = {
+	{"zeroes",		fuzz_zeroes},
+	{"ones",		fuzz_ones},
+	{"firstbit",		fuzz_firstbit},
+	{"middlebit",		fuzz_middlebit},
+	{"lastbit",		fuzz_lastbit},
+	{"add",			fuzz_add},
+	{"sub",			fuzz_sub},
+	{"random",		fuzz_random},
+	{NULL,			NULL},
+};
+
+/* ARGSUSED */
+void
+fuzz_struct(
+	const field_t	*fields,
+	int		argc,
+	char		**argv)
+{
+	const ftattr_t	*fa;
+	flist_t		*fl;
+	flist_t		*sfl;
+	int		bit_length;
+	struct fuzzcmd	*fc;
+	bool		success;
+	int		parentoffset;
+
+	if (argc != 2) {
+		dbprintf(_("Usage: fuzz fieldname verb\n"));
+		dbprintf("Verbs: %s", fuzzverbs->verb);
+		for (fc = fuzzverbs + 1; fc->verb != NULL; fc++)
+			dbprintf(", %s", fc->verb);
+		dbprintf(".\n");
+		return;
+	}
+
+	fl = flist_scan(argv[0]);
+	if (!fl) {
+		dbprintf(_("unable to parse '%s'.\n"), argv[0]);
+		return;
+	}
+
+	/* Find our fuzz verb */
+	for (fc = fuzzverbs; fc->verb != NULL; fc++)
+		if (!strcmp(fc->verb, argv[1]))
+			break;
+	if (fc->fn == NULL) {
+		dbprintf(_("Unknown fuzz command '%s'.\n"), argv[1]);
+		return;
+	}
+
+	/* if we're a root field type, go down 1 layer to get field list */
+	if (fields->name[0] == '\0') {
+		fa = &ftattrtab[fields->ftyp];
+		ASSERT(fa->ftyp == fields->ftyp);
+		fields = fa->subfld;
+	}
+
+	/* run down the field list and set offsets into the data */
+	if (!flist_parse(fields, fl, iocur_top->data, 0)) {
+		flist_free(fl);
+		dbprintf(_("parsing error\n"));
+		return;
+	}
+
+	sfl = fl;
+	parentoffset = 0;
+	while (sfl->child) {
+		parentoffset = sfl->offset;
+		sfl = sfl->child;
+	}
+
+	/*
+	 * For structures, fsize * fcount tells us the size of the region we are
+	 * modifying, which is usually a single structure member and is pointed
+	 * to by the last child in the list.
+	 *
+	 * However, if the base structure is an array and we have a direct index
+	 * into the array (e.g. fuzz bno[5]) then we are returned a single
+	 * flist object with the offset pointing directly at the location we
+	 * need to modify. The length of the object we are modifying is then
+	 * determined by the size of the individual array entry (fsize) and the
+	 * indexes defined in the object, not the overall size of the array
+	 * (which is what fcount returns).
+	 */
+	bit_length = fsize(sfl->fld, iocur_top->data, parentoffset, 0);
+	if (sfl->fld->flags & FLD_ARRAY)
+		bit_length *= sfl->high - sfl->low + 1;
+	else
+		bit_length *= fcount(sfl->fld, iocur_top->data, parentoffset);
+
+	/* Fuzz the value */
+	success = fc->fn(iocur_top->data, sfl->offset, bit_length);
+	if (!success) {
+		dbprintf(_("unable to fuzz field '%s'\n"), argv[0]);
+		flist_free(fl);
+		return;
+	}
+
+	/* Write the fuzzed value back */
+	write_cur();
+
+	flist_print(fl);
+	print_flist(fl);
+	flist_free(fl);
+}
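
As an aside on the firstbit, middlebit and lastbit verbs in db/fuzz.c above:
a flip has to read the same bit that it writes. The sketch below illustrates
this under the assumption that bit 0 is the most significant bit of the first
byte, which is how db/bit.c appears to number bits. The helpers get_bit(),
set_bit() and flip_bit() are hypothetical stand-ins, not the xfs_db
functions.

```c
#include <assert.h>

#define BITS_PER_BYTE	8

/* Read one bit; bit 0 is the most significant bit of p[0]. */
static int get_bit(const unsigned char *p, int bit)
{
	assert(bit >= 0);
	return (p[bit / BITS_PER_BYTE] >>
		(BITS_PER_BYTE - 1 - bit % BITS_PER_BYTE)) & 1;
}

/* Set or clear one bit using the same MSB-first numbering. */
static void set_bit(unsigned char *p, int bit, int val)
{
	unsigned char	mask;

	assert(bit >= 0);
	mask = 1u << (BITS_PER_BYTE - 1 - bit % BITS_PER_BYTE);
	if (val)
		p[bit / BITS_PER_BYTE] |= mask;
	else
		p[bit / BITS_PER_BYTE] &= ~mask;
}

/* Invert a bit by reading and writing the same position. */
static void flip_bit(unsigned char *p, int bit)
{
	set_bit(p, bit, !get_bit(p, bit));
}
```

For a 16-bit field at bit offset 0 holding the bytes 00 01, flipping bit 0
(firstbit), bit 8 (middlebit) and bit 15 (lastbit) in turn leaves 80 80.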
diff --git a/db/fuzz.h b/db/fuzz.h
new file mode 100644
index 0000000..c203eb5
--- /dev/null
+++ b/db/fuzz.h
@@ -0,0 +1,21 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+extern void	fuzz_init(void);
+extern void	fuzz_struct(const field_t *fields, int argc, char **argv);
diff --git a/db/io.c b/db/io.c
index f398195..1f316d8 100644
--- a/db/io.c
+++ b/db/io.c
@@ -465,6 +465,15 @@ xfs_dummy_verify(
 }
 
 void
+xfs_verify_recalc_inode_crc(
+	struct xfs_buf *bp)
+{
+	ASSERT(iocur_top->ino_buf);
+	libxfs_dinode_calc_crc(mp, iocur_top->data);
+	iocur_top->ino_crc_ok = 1;
+}
+
+void
 xfs_verify_recalc_crc(
 	struct xfs_buf *bp)
 {
diff --git a/db/io.h b/db/io.h
index c69e9ce..12d96c2 100644
--- a/db/io.h
+++ b/db/io.h
@@ -64,6 +64,7 @@ extern void	set_cur(const struct typ *t, __int64_t d, int c, int ring_add,
 extern void     ring_add(void);
 extern void	set_iocur_type(const struct typ *t);
 extern void	xfs_dummy_verify(struct xfs_buf *bp);
+extern void	xfs_verify_recalc_inode_crc(struct xfs_buf *bp);
 extern void	xfs_verify_recalc_crc(struct xfs_buf *bp);
 
 /*
diff --git a/db/type.c b/db/type.c
index 10fa54e..adab10a 100644
--- a/db/type.c
+++ b/db/type.c
@@ -39,6 +39,7 @@
 #include "dir2.h"
 #include "text.h"
 #include "symlink.h"
+#include "fuzz.h"
 
 static const typ_t	*findtyp(char *name);
 static int		type_f(int argc, char **argv);
@@ -254,10 +255,17 @@ handle_struct(
 	int           argc,
 	char          **argv)
 {
-	if (action == DB_WRITE)
+	switch (action) {
+	case DB_FUZZ:
+		fuzz_struct(fields, argc, argv);
+		break;
+	case DB_WRITE:
 		write_struct(fields, argc, argv);
-	else
+		break;
+	case DB_READ:
 		print_struct(fields, argc, argv);
+		break;
+	}
 }
 
 void
@@ -267,10 +275,17 @@ handle_string(
 	int           argc,
 	char          **argv)
 {
-	if (action == DB_WRITE)
+	switch (action) {
+	case DB_WRITE:
 		write_string(fields, argc, argv);
-	else
+		break;
+	case DB_READ:
 		print_string(fields, argc, argv);
+		break;
+	case DB_FUZZ:
+		dbprintf(_("string fuzzing not supported.\n"));
+		break;
+	}
 }
 
 void
@@ -280,10 +295,17 @@ handle_block(
 	int           argc,
 	char          **argv)
 {
-	if (action == DB_WRITE)
+	switch (action) {
+	case DB_WRITE:
 		write_block(fields, argc, argv);
-	else
+		break;
+	case DB_READ:
 		print_block(fields, argc, argv);
+		break;
+	case DB_FUZZ:
+		dbprintf(_("use 'blocktrash' or 'write' to fuzz a block.\n"));
+		break;
+	}
 }
 
 void
@@ -293,6 +315,14 @@ handle_text(
 	int           argc,
 	char          **argv)
 {
-	if (action != DB_WRITE)
+	switch (action) {
+	case DB_FUZZ:
+		/* fall through */
+	case DB_WRITE:
+		dbprintf(_("text writing/fuzzing not supported.\n"));
+		break;
+	case DB_READ:
 		print_text(fields, argc, argv);
+		break;
+	}
 }
diff --git a/db/type.h b/db/type.h
index 87ff107..a50d705 100644
--- a/db/type.h
+++ b/db/type.h
@@ -30,6 +30,7 @@ typedef enum typnm
 	TYP_TEXT, TYP_FINOBT, TYP_NONE
 } typnm_t;
 
+#define DB_FUZZ  2
 #define DB_WRITE 1
 #define DB_READ  0
 
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index 460d89d..55e0629 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -594,6 +594,55 @@ in units of 512-byte blocks, no matter what the filesystem's block size is.
 .BI "The optional " start " and " end " arguments can be used to constrain
 the output to a particular range of disk blocks.
 .TP
+.BI "fuzz [\-c] [\-d] " "field action"
+Write garbage into a specific structure field on disk.
+Expert mode must be enabled to use this command.
+The operation happens immediately; there is no buffering.
+.IP
+The fuzz command can take the following
+.IR action "s"
+against a field:
+.RS 1.0i
+.TP 0.4i
+.B zeroes
+Clears all bits in the field.
+.TP 0.4i
+.B ones
+Sets all bits in the field.
+.TP 0.4i
+.B firstbit
+Flips the first bit in the field.
+For a scalar value, this is the highest bit.
+.TP 0.4i
+.B middlebit
+Flips the middle bit in the field.
+.TP 0.4i
+.B lastbit
+Flips the last bit in the field.
+For a scalar value, this is the lowest bit.
+.TP 0.4i
+.B add
+Adds a small value to a scalar field.
+.TP 0.4i
+.B sub
+Subtracts a small value from a scalar field.
+.TP 0.4i
+.B random
+Randomizes the contents of the field.
+.RE
+.IP
+The following switches affect the write behavior:
+.RS 1.0i
+.TP 0.4i
+.B \-c
+Skip write verifiers and CRC recalculation; allows invalid data to be written
+to disk.
+.TP 0.4i
+.B \-d
+Skip write verifiers but perform CRC recalculation; allows invalid data to be
+written to disk to test detection of invalid data.
+.RE
+.TP
 .BI hash " string
 Prints the hash value of
 .I string
@@ -755,7 +804,7 @@ and
 bits respectively, and their string equivalent reported
 (but no modifications are made).
 .TP
-.BI "write [\-c] [" "field value" "] ..."
+.BI "write [\-c] [\-d] [" "field value" "] ..."
 Write a value to disk.
 Specific fields can be set in structures (struct mode),
 or a block can be set to data values (data mode),
@@ -778,6 +827,10 @@ with no arguments gives more information on the allowed commands.
 .B \-c
 Skip write verifiers and CRC recalculation; allows invalid data to be written
 to disk.
+.TP 0.4i
+.B \-d
+Skip write verifiers but perform CRC recalculation; allows invalid data to be
+written to disk to test detection of invalid data.
 .RE
 .SH TYPES
 This section gives the fields in each structure type and their meanings.
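
The bit-level fuzz actions documented in the xfs_db.8 hunk above can be
illustrated with a small standalone sketch.  This is not the code from
db/fuzz.c; the enum, the function names, and the choice of nbits / 2 as the
"middle" bit are assumptions made for illustration, and the field is assumed
to be byte-aligned and stored big-endian (so the first bit is the highest bit
of a scalar).  The add, sub, and random actions are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical action codes mirroring the man page text, not xfs_db's own. */
enum fuzz_action {
	FUZZ_ZEROES,
	FUZZ_ONES,
	FUZZ_FIRSTBIT,
	FUZZ_MIDDLEBIT,
	FUZZ_LASTBIT
};

/* Flip bit number 'bit', counting from the most significant bit of buf[0]. */
static void
flip_bit(uint8_t *buf, size_t bit)
{
	buf[bit / 8] ^= (uint8_t)(0x80u >> (bit % 8));
}

/*
 * Apply one action to a byte-aligned field of 'len' bytes.
 * Returns 0 on success, -1 for an unknown action.
 */
static int
fuzz_field(uint8_t *buf, size_t len, enum fuzz_action action)
{
	size_t	nbits = len * 8;

	assert(buf != NULL && len > 0);

	switch (action) {
	case FUZZ_ZEROES:
		memset(buf, 0, len);		/* clear all bits */
		return 0;
	case FUZZ_ONES:
		memset(buf, 0xff, len);		/* set all bits */
		return 0;
	case FUZZ_FIRSTBIT:
		flip_bit(buf, 0);		/* highest bit of a scalar */
		return 0;
	case FUZZ_MIDDLEBIT:
		flip_bit(buf, nbits / 2);	/* assumed definition of "middle" */
		return 0;
	case FUZZ_LASTBIT:
		flip_bit(buf, nbits - 1);	/* lowest bit of a scalar */
		return 0;
	}
	return -1;
}
```

For a 4-byte field that starts out as all zeroes, FUZZ_FIRSTBIT leaves 0x80
in the first byte and FUZZ_LASTBIT leaves 0x01 in the last byte.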



* [PATCH 10/17] xfs_db: print attribute remote value blocks
  2017-01-21  8:08 [PATCH v5 00/17] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (8 preceding siblings ...)
  2017-01-21  8:09 ` [PATCH 09/17] xfs_db: introduce fuzz command Darrick J. Wong
@ 2017-01-21  8:09 ` Darrick J. Wong
  2017-01-21  8:09 ` [PATCH 11/17] xfs_db: write / fuzz bad values into dir/attr blocks with good CRCs Darrick J. Wong
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2017-01-21  8:09 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, linux-fsdevel

Teach xfs_db how to print the contents of xattr remote value blocks.
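
As a rough illustration of how many value bytes get printed for a v5 remote
block, here is a standalone sketch of the clamp that attr3_remote_count()
applies.  The function name and the SKETCH_RMT_HDR_SIZE constant are
assumptions for this sketch; the patch itself uses
sizeof(struct xfs_attr3_rmt_hdr) and the superblock's block size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed stand-in for sizeof(struct xfs_attr3_rmt_hdr); only used so that
 * this sketch compiles without the XFS headers.
 */
#define SKETCH_RMT_HDR_SIZE	56u

/*
 * Return how many value bytes to display: the byte count claimed by the
 * header, clamped so that header + value never runs past the block.
 */
static uint32_t
sketch_remote_value_bytes(uint32_t rm_bytes, uint32_t blocksize)
{
	assert(blocksize > SKETCH_RMT_HDR_SIZE);

	/* Widen before adding so a garbage rm_bytes cannot wrap around. */
	if ((uint64_t)rm_bytes + SKETCH_RMT_HDR_SIZE > blocksize)
		return blocksize - SKETCH_RMT_HDR_SIZE;
	return rm_bytes;
}
```

The clamp matters because xfs_db is routinely pointed at corrupt (or
deliberately fuzzed) blocks, where the header's byte count cannot be trusted.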

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/attr.c  |   59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 db/attr.h  |    1 +
 db/field.c |    3 +++
 db/field.h |    1 +
 4 files changed, 64 insertions(+)


diff --git a/db/attr.c b/db/attr.c
index e26ac67..0fffbc2 100644
--- a/db/attr.c
+++ b/db/attr.c
@@ -41,6 +41,9 @@ static int	attr_leaf_nvlist_offset(void *obj, int startoff, int idx);
 static int	attr_node_btree_count(void *obj, int startoff);
 static int	attr_node_hdr_count(void *obj, int startoff);
 
+static int	attr_remote_count(void *obj, int startoff);
+static int	attr3_remote_count(void *obj, int startoff);
+
 const field_t	attr_hfld[] = {
 	{ "", FLDT_ATTR, OI(0), C1, 0, TYP_NONE },
 	{ NULL }
@@ -53,6 +56,7 @@ const field_t	attr_flds[] = {
 	  FLD_COUNT, TYP_NONE },
 	{ "hdr", FLDT_ATTR_NODE_HDR, OI(NOFF(hdr)), attr_node_hdr_count,
 	  FLD_COUNT, TYP_NONE },
+	{ "data", FLDT_CHARNS, OI(0), attr_remote_count, FLD_COUNT, TYP_NONE },
 	{ "entries", FLDT_ATTR_LEAF_ENTRY, OI(LOFF(entries)),
 	  attr_leaf_entries_count, FLD_ARRAY|FLD_COUNT, TYP_NONE },
 	{ "btree", FLDT_ATTR_NODE_ENTRY, OI(NOFF(__btree)), attr_node_btree_count,
@@ -197,6 +201,33 @@ attr3_leaf_hdr_count(
 	return be16_to_cpu(leaf->hdr.info.hdr.magic) == XFS_ATTR3_LEAF_MAGIC;
 }
 
+static int
+attr_remote_count(
+	void		*obj,
+	int		startoff)
+{
+	if (attr_leaf_hdr_count(obj, startoff) == 0 &&
+	    attr_node_hdr_count(obj, startoff) == 0)
+		return mp->m_sb.sb_blocksize;
+	return 0;
+}
+
+static int
+attr3_remote_count(
+	void		*obj,
+	int		startoff)
+{
+	struct xfs_attr3_rmt_hdr	*hdr = obj;
+
+	ASSERT(startoff == 0);
+
+	if (hdr->rm_magic != cpu_to_be32(XFS_ATTR3_RMT_MAGIC))
+		return 0;
+	if (be32_to_cpu(hdr->rm_bytes) + sizeof(*hdr) > mp->m_sb.sb_blocksize)
+		return mp->m_sb.sb_blocksize - sizeof(*hdr);
+	return be32_to_cpu(hdr->rm_bytes);
+}
+
 typedef int (*attr_leaf_entry_walk_f)(struct xfs_attr_leafblock *,
 				      struct xfs_attr_leaf_entry *, int);
 static int
@@ -477,6 +508,17 @@ attr3_node_hdr_count(
 	return be16_to_cpu(node->hdr.info.hdr.magic) == XFS_DA3_NODE_MAGIC;
 }
 
+static int
+attr3_remote_hdr_count(
+	void			*obj,
+	int			startoff)
+{
+	struct xfs_attr3_rmt_hdr	*node = obj;
+
+	ASSERT(startoff == 0);
+	return be32_to_cpu(node->rm_magic) == XFS_ATTR3_RMT_MAGIC;
+}
+
 int
 attr_size(
 	void	*obj,
@@ -501,6 +543,8 @@ const field_t	attr3_flds[] = {
 	  FLD_COUNT, TYP_NONE },
 	{ "hdr", FLDT_DA3_NODE_HDR, OI(N3OFF(hdr)), attr3_node_hdr_count,
 	  FLD_COUNT, TYP_NONE },
+	{ "hdr", FLDT_ATTR3_REMOTE_HDR, OI(0), attr3_remote_hdr_count,
+	  FLD_COUNT, TYP_NONE },
 	{ "entries", FLDT_ATTR_LEAF_ENTRY, OI(L3OFF(entries)),
 	  attr3_leaf_entries_count, FLD_ARRAY|FLD_COUNT, TYP_NONE },
 	{ "btree", FLDT_ATTR_NODE_ENTRY, OI(N3OFF(__btree)),
@@ -523,6 +567,21 @@ const field_t	attr3_leaf_hdr_flds[] = {
 	{ NULL }
 };
 
+#define	RM3OFF(f)	bitize(offsetof(struct xfs_attr3_rmt_hdr, rm_ ## f))
+const struct field	attr3_remote_crc_flds[] = {
+	{ "magic", FLDT_UINT32X, OI(RM3OFF(magic)), C1, 0, TYP_NONE },
+	{ "offset", FLDT_UINT32D, OI(RM3OFF(offset)), C1, 0, TYP_NONE },
+	{ "bytes", FLDT_UINT32D, OI(RM3OFF(bytes)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(RM3OFF(crc)), C1, 0, TYP_NONE },
+	{ "uuid", FLDT_UUID, OI(RM3OFF(uuid)), C1, 0, TYP_NONE },
+	{ "owner", FLDT_INO, OI(RM3OFF(owner)), C1, 0, TYP_NONE },
+	{ "bno", FLDT_DFSBNO, OI(RM3OFF(blkno)), C1, 0, TYP_BMAPBTD },
+	{ "lsn", FLDT_UINT64X, OI(RM3OFF(lsn)), C1, 0, TYP_NONE },
+	{ "data", FLDT_CHARNS, OI(bitize(sizeof(struct xfs_attr3_rmt_hdr))),
+		attr3_remote_count, FLD_COUNT, TYP_NONE },
+	{ NULL }
+};
+
 /*
  * Special read verifier for attribute buffers. Detect the magic number
  * appropriately and set the correct verifier and call it.
diff --git a/db/attr.h b/db/attr.h
index bc3431f..d7bb579 100644
--- a/db/attr.h
+++ b/db/attr.h
@@ -30,6 +30,7 @@ extern const field_t	attr3_flds[];
 extern const field_t	attr3_hfld[];
 extern const field_t	attr3_leaf_hdr_flds[];
 extern const field_t	attr3_node_hdr_flds[];
+extern const field_t	attr3_remote_crc_flds[];
 
 extern int	attr_leaf_name_size(void *obj, int startoff, int idx);
 extern int	attr_size(void *obj, int startoff, int idx);
diff --git a/db/field.c b/db/field.c
index 1968dd5..e8bbbe3 100644
--- a/db/field.c
+++ b/db/field.c
@@ -97,6 +97,9 @@ const ftattr_t	ftattrtab[] = {
 	{ FLDT_ATTR3_NODE_HDR, "attr3_node_hdr", NULL,
 	  (char *)da3_node_hdr_flds, SI(bitsz(struct xfs_da3_node_hdr)),
 	  0, NULL, da3_node_hdr_flds },
+	{ FLDT_ATTR3_REMOTE_HDR, "attr3_remote_hdr", NULL,
+	  (char *)attr3_remote_crc_flds, attr_size, FTARG_SIZE, NULL,
+	  attr3_remote_crc_flds },
 
 	{ FLDT_BMAPBTA, "bmapbta", NULL, (char *)bmapbta_flds, btblock_size,
 	  FTARG_SIZE, NULL, bmapbta_flds },
diff --git a/db/field.h b/db/field.h
index 53616f1..e5a943b 100644
--- a/db/field.h
+++ b/db/field.h
@@ -46,6 +46,7 @@ typedef enum fldt	{
 	FLDT_ATTR3,
 	FLDT_ATTR3_LEAF_HDR,
 	FLDT_ATTR3_NODE_HDR,
+	FLDT_ATTR3_REMOTE_HDR,
 
 	FLDT_BMAPBTA,
 	FLDT_BMAPBTA_CRC,



* [PATCH 11/17] xfs_db: write / fuzz bad values into dir/attr blocks with good CRCs
  2017-01-21  8:08 [PATCH v5 00/17] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (9 preceding siblings ...)
  2017-01-21  8:09 ` [PATCH 10/17] xfs_db: print attribute remote value blocks Darrick J. Wong
@ 2017-01-21  8:09 ` Darrick J. Wong
  2017-01-21  8:09 ` [PATCH 12/17] xfs_io: provide an interface to the scrub ioctls Darrick J. Wong
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2017-01-21  8:09 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, linux-fsdevel

Extend typ_t to (optionally) store a pointer to a function to calculate
the CRC of the block, provide functions to do this for the dir3 and
attr3 types, and then wire up the fuzz and write commands so that we can
effectively fuzz directory and extended attribute block fields.
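
The dispatch this sets up can be sketched in isolation as follows.  All of
the sk_* names are hypothetical stand-ins (for typ_t, struct xfs_buf, and the
verifier functions), and the NULL check on set_crc is an extra safety net in
the sketch, not something the patch does.

```c
#include <assert.h>
#include <stddef.h>

#define SK_NO_CRC_OFF	(-1UL)	/* type has no CRC at a fixed offset */
#define SK_CRC_FUNC	(-2UL)	/* CRC location depends on the block magic */

struct sk_buf {
	unsigned int	which;	/* records which verifier ran */
};

typedef void (*sk_verify_fn)(struct sk_buf *);

/* Cut-down stand-in for typ_t: a CRC offset or sentinel, plus a callback. */
struct sk_typ {
	const char	*name;
	unsigned long	crc_off;
	sk_verify_fn	set_crc;
};

static void sk_dummy_verify(struct sk_buf *bp) { bp->which = 0; }
static void sk_recalc_crc(struct sk_buf *bp) { bp->which = 1; }
static void sk_dir3_set_crc(struct sk_buf *bp) { bp->which = 2; }

/*
 * Pick the write verifier the way write_f() and fuzz_f() do after this
 * patch: leave the CRC alone if asked to corrupt it, use the per-type
 * callback if the type has one, else recalculate at the fixed offset.
 */
static sk_verify_fn
sk_pick_write_verifier(const struct sk_typ *typ, int corrupt)
{
	assert(typ != NULL);

	if (corrupt)
		return sk_dummy_verify;
	if (typ->crc_off == SK_CRC_FUNC && typ->set_crc != NULL)
		return typ->set_crc;
	return sk_recalc_crc;
}
```

A sentinel in crc_off keeps the existing table entries unchanged; only the
dir3 and attr3 rows need the extra initializer.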

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 db/attr.c  |   32 ++++++++++++++++++++++++++++++++
 db/attr.h  |    1 +
 db/dir2.c  |   37 +++++++++++++++++++++++++++++++++++++
 db/dir2.h  |    1 +
 db/fuzz.c  |    3 +++
 db/type.c  |    8 ++++----
 db/type.h  |    2 ++
 db/write.c |    3 +++
 8 files changed, 83 insertions(+), 4 deletions(-)


diff --git a/db/attr.c b/db/attr.c
index 0fffbc2..5a97925 100644
--- a/db/attr.c
+++ b/db/attr.c
@@ -582,6 +582,38 @@ const struct field	attr3_remote_crc_flds[] = {
 	{ NULL }
 };
 
+/* Recompute the CRC, using the block magic to locate the CRC field. */
+void
+xfs_attr3_set_crc(
+	struct xfs_buf		*bp)
+{
+	__be32			magic32;
+	__be16			magic16;
+
+	magic32 = *(__be32 *)bp->b_addr;
+	magic16 = ((struct xfs_da_blkinfo *)bp->b_addr)->magic;
+
+	switch (magic16) {
+	case cpu_to_be16(XFS_ATTR3_LEAF_MAGIC):
+		xfs_buf_update_cksum(bp, XFS_ATTR3_LEAF_CRC_OFF);
+		return;
+	case cpu_to_be16(XFS_DA3_NODE_MAGIC):
+		xfs_buf_update_cksum(bp, XFS_DA3_NODE_CRC_OFF);
+		return;
+	default:
+		break;
+	}
+
+	switch (magic32) {
+	case cpu_to_be32(XFS_ATTR3_RMT_MAGIC):
+		xfs_buf_update_cksum(bp, XFS_ATTR3_RMT_CRC_OFF);
+		return;
+	default:
+		dbprintf(_("Unknown attribute buffer type!\n"));
+		break;
+	}
+}
+
 /*
  * Special read verifier for attribute buffers. Detect the magic number
  * appropriately and set the correct verifier and call it.
diff --git a/db/attr.h b/db/attr.h
index d7bb579..9ea7429 100644
--- a/db/attr.h
+++ b/db/attr.h
@@ -34,5 +34,6 @@ extern const field_t	attr3_remote_crc_flds[];
 
 extern int	attr_leaf_name_size(void *obj, int startoff, int idx);
 extern int	attr_size(void *obj, int startoff, int idx);
+extern void	xfs_attr3_set_crc(struct xfs_buf *bp);
 
 extern const struct xfs_buf_ops xfs_attr3_db_buf_ops;
diff --git a/db/dir2.c b/db/dir2.c
index 533f705..3e21a7b 100644
--- a/db/dir2.c
+++ b/db/dir2.c
@@ -981,6 +981,43 @@ const field_t	da3_node_hdr_flds[] = {
 	{ NULL }
 };
 
+/* Recompute the CRC, using the block magic to locate the CRC field. */
+void
+xfs_dir3_set_crc(
+	struct xfs_buf		*bp)
+{
+	__be32			magic32;
+	__be16			magic16;
+
+	magic32 = *(__be32 *)bp->b_addr;
+	magic16 = ((struct xfs_da_blkinfo *)bp->b_addr)->magic;
+
+	switch (magic32) {
+	case cpu_to_be32(XFS_DIR3_BLOCK_MAGIC):
+	case cpu_to_be32(XFS_DIR3_DATA_MAGIC):
+		xfs_buf_update_cksum(bp, XFS_DIR3_DATA_CRC_OFF);
+		return;
+	case cpu_to_be32(XFS_DIR3_FREE_MAGIC):
+		xfs_buf_update_cksum(bp, XFS_DIR3_FREE_CRC_OFF);
+		return;
+	default:
+		break;
+	}
+
+	switch (magic16) {
+	case cpu_to_be16(XFS_DIR3_LEAF1_MAGIC):
+	case cpu_to_be16(XFS_DIR3_LEAFN_MAGIC):
+		xfs_buf_update_cksum(bp, XFS_DIR3_LEAF_CRC_OFF);
+		return;
+	case cpu_to_be16(XFS_DA3_NODE_MAGIC):
+		xfs_buf_update_cksum(bp, XFS_DA3_NODE_CRC_OFF);
+		return;
+	default:
+		dbprintf(_("Unknown directory buffer type! %x %x\n"), magic32, magic16);
+		break;
+	}
+}
+
 /*
  * Special read verifier for directory buffers. Detect the magic number
  * appropriately and set the correct verifier and call it.
diff --git a/db/dir2.h b/db/dir2.h
index 0c2a62e..1b87cd2 100644
--- a/db/dir2.h
+++ b/db/dir2.h
@@ -60,5 +60,6 @@ static inline uint8_t *xfs_dir2_sf_inumberp(xfs_dir2_sf_entry_t *sfep)
 
 extern int	dir2_data_union_size(void *obj, int startoff, int idx);
 extern int	dir2_size(void *obj, int startoff, int idx);
+extern void	xfs_dir3_set_crc(struct xfs_buf *bp);
 
 extern const struct xfs_buf_ops xfs_dir3_db_buf_ops;
diff --git a/db/fuzz.c b/db/fuzz.c
index 061ecd1..f294331 100644
--- a/db/fuzz.c
+++ b/db/fuzz.c
@@ -156,6 +156,9 @@ fuzz_f(
 	} else if (iocur_top->ino_buf) {
 		local_ops.verify_write = xfs_verify_recalc_inode_crc;
 		dbprintf(_("Allowing fuzz of corrupted inode with good CRC\n"));
+	} else if (iocur_top->typ->crc_off == TYP_F_CRC_FUNC) {
+		local_ops.verify_write = iocur_top->typ->set_crc;
+		dbprintf(_("Allowing fuzz of corrupted data with good CRC\n"));
 	} else { /* invalid data */
 		local_ops.verify_write = xfs_verify_recalc_crc;
 		dbprintf(_("Allowing fuzz of corrupted data with good CRC\n"));
diff --git a/db/type.c b/db/type.c
index adab10a..740adc0 100644
--- a/db/type.c
+++ b/db/type.c
@@ -88,7 +88,7 @@ static const typ_t	__typtab_crc[] = {
 	{ TYP_AGI, "agi", handle_struct, agi_hfld, &xfs_agi_buf_ops,
 		XFS_AGI_CRC_OFF },
 	{ TYP_ATTR, "attr3", handle_struct, attr3_hfld,
-		&xfs_attr3_db_buf_ops, TYP_F_NO_CRC_OFF },
+		&xfs_attr3_db_buf_ops, TYP_F_CRC_FUNC, xfs_attr3_set_crc },
 	{ TYP_BMAPBTA, "bmapbta", handle_struct, bmapbta_crc_hfld,
 		&xfs_bmbt_buf_ops, XFS_BTREE_LBLOCK_CRC_OFF },
 	{ TYP_BMAPBTD, "bmapbtd", handle_struct, bmapbtd_crc_hfld,
@@ -103,7 +103,7 @@ static const typ_t	__typtab_crc[] = {
 		&xfs_refcountbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
 	{ TYP_DATA, "data", handle_block, NULL, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_DIR2, "dir3", handle_struct, dir3_hfld,
-		&xfs_dir3_db_buf_ops, TYP_F_NO_CRC_OFF },
+		&xfs_dir3_db_buf_ops, TYP_F_CRC_FUNC, xfs_dir3_set_crc },
 	{ TYP_DQBLK, "dqblk", handle_struct, dqblk_hfld,
 		&xfs_dquot_buf_ops, TYP_F_NO_CRC_OFF },
 	{ TYP_INOBT, "inobt", handle_struct, inobt_crc_hfld,
@@ -132,7 +132,7 @@ static const typ_t	__typtab_spcrc[] = {
 	{ TYP_AGI, "agi", handle_struct, agi_hfld, &xfs_agi_buf_ops ,
 		XFS_AGI_CRC_OFF },
 	{ TYP_ATTR, "attr3", handle_struct, attr3_hfld,
-		&xfs_attr3_db_buf_ops, TYP_F_NO_CRC_OFF },
+		&xfs_attr3_db_buf_ops, TYP_F_CRC_FUNC, xfs_attr3_set_crc },
 	{ TYP_BMAPBTA, "bmapbta", handle_struct, bmapbta_crc_hfld,
 		&xfs_bmbt_buf_ops, XFS_BTREE_LBLOCK_CRC_OFF },
 	{ TYP_BMAPBTD, "bmapbtd", handle_struct, bmapbtd_crc_hfld,
@@ -147,7 +147,7 @@ static const typ_t	__typtab_spcrc[] = {
 		&xfs_refcountbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
 	{ TYP_DATA, "data", handle_block, NULL, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_DIR2, "dir3", handle_struct, dir3_hfld,
-		&xfs_dir3_db_buf_ops, TYP_F_NO_CRC_OFF },
+		&xfs_dir3_db_buf_ops, TYP_F_CRC_FUNC, xfs_dir3_set_crc },
 	{ TYP_DQBLK, "dqblk", handle_struct, dqblk_hfld,
 		&xfs_dquot_buf_ops, TYP_F_NO_CRC_OFF },
 	{ TYP_INOBT, "inobt", handle_struct, inobt_spcrc_hfld,
diff --git a/db/type.h b/db/type.h
index a50d705..3971975 100644
--- a/db/type.h
+++ b/db/type.h
@@ -46,6 +46,8 @@ typedef struct typ
 	const struct xfs_buf_ops *bops;
 	unsigned long		crc_off;
 #define TYP_F_NO_CRC_OFF	(-1UL)
+#define TYP_F_CRC_FUNC		(-2UL)
+	void			(*set_crc)(struct xfs_buf *);
 } typ_t;
 extern const typ_t	*typtab, *cur_typ;
 
diff --git a/db/write.c b/db/write.c
index 5c83874..ea87b40 100644
--- a/db/write.c
+++ b/db/write.c
@@ -164,6 +164,9 @@ write_f(
 	if (corrupt) {
 		local_ops.verify_write = xfs_dummy_verify;
 		dbprintf(_("Allowing write of corrupted data and bad CRC\n"));
+	} else if (iocur_top->typ->crc_off == TYP_F_CRC_FUNC) {
+		local_ops.verify_write = iocur_top->typ->set_crc;
+		dbprintf(_("Allowing write of corrupted data with good CRC\n"));
 	} else { /* invalid data */
 		local_ops.verify_write = xfs_verify_recalc_crc;
 		dbprintf(_("Allowing write of corrupted data with good CRC\n"));



* [PATCH 12/17] xfs_io: provide an interface to the scrub ioctls
  2017-01-21  8:08 [PATCH v5 00/17] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (10 preceding siblings ...)
  2017-01-21  8:09 ` [PATCH 11/17] xfs_db: write / fuzz bad values into dir/attr blocks with good CRCs Darrick J. Wong
@ 2017-01-21  8:09 ` Darrick J. Wong
  2017-01-21  8:09 ` [PATCH 13/17] xfs_scrub: create online filesystem scrub program Darrick J. Wong
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2017-01-21  8:09 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, linux-fsdevel

Create a new xfs_io command to call the new XFS metadata scrub ioctl.
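
The argument handling in parse_args() boils down to a name lookup followed
by a per-class check on how many arguments follow the type name.  A
standalone sketch of just that logic, with hypothetical sk_* names and an
abbreviated table (so the indices here do not match XFS_SCRUB_TYPE_*):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sk_scrub_kind { SK_NONE, SK_PERAG, SK_FS, SK_INODE };

struct sk_scrub_descr {
	const char		*name;
	enum sk_scrub_kind	kind;
};

/* Abbreviated table; the patch lists every scrub type in ioctl order. */
static const struct sk_scrub_descr sk_scrubbers[] = {
	{ "dummy",	SK_NONE },
	{ "sb",		SK_PERAG },
	{ "agf",	SK_PERAG },
	{ "inode",	SK_INODE },
	{ "rtbitmap",	SK_FS },
	{ NULL,		SK_NONE },
};

/* Return the table index for 'name', or -1 if it is not a known type. */
static int
sk_find_type(const char *name)
{
	int	i;

	assert(name != NULL);
	for (i = 0; sk_scrubbers[i].name != NULL; i++)
		if (strcmp(sk_scrubbers[i].name, name) == 0)
			return i;
	return -1;
}

/* Check the number of arguments that follow the type name. */
static int
sk_args_ok(int type, int nargs)
{
	switch (sk_scrubbers[type].kind) {
	case SK_INODE:
		/* either the open file, or an explicit "ino gen" pair */
		return nargs == 0 || nargs == 2;
	case SK_PERAG:
	case SK_NONE:
		return nargs == 1;	/* exactly one AG number */
	case SK_FS:
		return nargs == 0;	/* no parameters allowed */
	}
	return 0;
}
```

With this table, "inode 128 13525" and a bare "inode" both pass the check,
while "inode 128" does not.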

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 io/Makefile       |    2 
 io/init.c         |    2 
 io/inject.c       |    4 -
 io/io.h           |    2 
 io/scrub.c        |  330 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 man/man8/xfs_io.8 |   20 +++
 6 files changed, 358 insertions(+), 2 deletions(-)
 create mode 100644 io/scrub.c


diff --git a/io/Makefile b/io/Makefile
index fd07596..383f5cb 100644
--- a/io/Makefile
+++ b/io/Makefile
@@ -11,7 +11,7 @@ HFILES = init.h io.h
 CFILES = init.c \
 	attr.c bmap.c cowextsize.c encrypt.c file.c freeze.c fsmap.c fsync.c \
 	getrusage.c imap.c link.c mmap.c open.c parent.c pread.c prealloc.c \
-	pwrite.c reflink.c seek.c shutdown.c sync.c truncate.c utimes.c
+	pwrite.c reflink.c scrub.c seek.c shutdown.c sync.c truncate.c utimes.c
 
 LLDLIBS = $(LIBXCMD) $(LIBHANDLE) $(LIBPTHREAD)
 LTDEPENDENCIES = $(LIBXCMD) $(LIBHANDLE)
diff --git a/io/init.c b/io/init.c
index 1532149..bef6f62 100644
--- a/io/init.c
+++ b/io/init.c
@@ -83,7 +83,9 @@ init_commands(void)
 	quit_init();
 	readdir_init();
 	reflink_init();
+	repair_init();
 	resblks_init();
+	scrub_init();
 	seek_init();
 	sendfile_init();
 	shutdown_init();
diff --git a/io/inject.c b/io/inject.c
index 25c7021..91b5fd7 100644
--- a/io/inject.c
+++ b/io/inject.c
@@ -86,7 +86,9 @@ error_tag(char *name)
 		{ XFS_ERRTAG_BMAP_FINISH_ONE,		"bmap_finish_one" },
 #define XFS_ERRTAG_AG_RESV_CRITICAL			27
 		{ XFS_ERRTAG_AG_RESV_CRITICAL,		"ag_resv_critical" },
-#define XFS_ERRTAG_MAX                                  28
+#define XFS_ERRTAG_FORCE_REPAIR				28
+		{ XFS_ERRTAG_FORCE_REPAIR,		"force_repair" },
+#define XFS_ERRTAG_MAX                                  29
 		{ XFS_ERRTAG_MAX,			NULL }
 	};
 	int	count;
diff --git a/io/io.h b/io/io.h
index c7100c9..3dd6296 100644
--- a/io/io.h
+++ b/io/io.h
@@ -178,3 +178,5 @@ extern void		readdir_init(void);
 extern void		reflink_init(void);
 
 extern void		cowextsize_init(void);
+extern void		scrub_init(void);
+extern void		repair_init(void);
diff --git a/io/scrub.c b/io/scrub.c
new file mode 100644
index 0000000..caa965e
--- /dev/null
+++ b/io/scrub.c
@@ -0,0 +1,330 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include <sys/uio.h>
+#include <xfs/xfs.h>
+#include "command.h"
+#include "input.h"
+#include "init.h"
+#include "path.h"
+#include "io.h"
+
+static struct cmdinfo scrub_cmd;
+static struct cmdinfo repair_cmd;
+
+/* Type info and names for the scrub types. */
+enum scrub_type {
+	ST_NONE,	/* disabled */
+	ST_PERAG,	/* per-AG metadata */
+	ST_FS,		/* per-FS metadata */
+	ST_INODE,	/* per-inode metadata */
+};
+
+/* These must correspond with XFS_SCRUB_TYPE_ */
+struct scrub_descr {
+	const char	*name;
+	enum scrub_type	type;
+};
+
+static const struct scrub_descr scrubbers[] = {
+	{"dummy",	ST_NONE},
+	{"sb",		ST_PERAG},
+	{"agf",		ST_PERAG},
+	{"agfl",	ST_PERAG},
+	{"agi",		ST_PERAG},
+	{"bnobt",	ST_PERAG},
+	{"cntbt",	ST_PERAG},
+	{"inobt",	ST_PERAG},
+	{"finobt",	ST_PERAG},
+	{"rmapbt",	ST_PERAG},
+	{"refcountbt",	ST_PERAG},
+	{"inode",	ST_INODE},
+	{"bmapbtd",	ST_INODE},
+	{"bmapbta",	ST_INODE},
+	{"bmapbtc",	ST_INODE},
+	{"directory",	ST_INODE},
+	{"xattr",	ST_INODE},
+	{"symlink",	ST_INODE},
+	{"rtbitmap",	ST_FS},
+	{"rtsummary",	ST_FS},
+	{NULL,		ST_NONE},
+};
+
+static void
+scrub_help(void)
+{
+	const struct scrub_descr	*d;
+
+	printf(_("\n\
+ Scrubs a piece of XFS filesystem metadata.  The first argument is the type\n\
+ of metadata to examine.  Allocation group number(s) can be specified to\n\
+ restrict the scrub operation to a subset of allocation groups.\n\
+ Certain metadata types do not take AG numbers.\n\
+\n\
+ Examples:\n\
+ 'scrub inobt 3' - scrubs the inode btree in AG 3.\n\
+ 'scrub bmapbtd 128 13525' - scrubs the extent map of inode 128 gen 13525.\n\
+\n\
+ Known metadata scrub types are:"));
+	for (d = scrubbers; d->name; d++)
+		printf(" %s", d->name);
+	printf("\n");
+}
+
+static void
+scrub_ioctl(
+	int				fd,
+	int				type,
+	uint64_t			control,
+	uint32_t			control2)
+{
+	struct xfs_scrub_metadata	meta;
+	const struct scrub_descr	*sc;
+	int				error;
+
+	sc = &scrubbers[type];
+	memset(&meta, 0, sizeof(meta));
+	meta.sm_type = type;
+	switch (sc->type) {
+	case ST_PERAG:
+		meta.sm_agno = control;
+		break;
+	case ST_INODE:
+		meta.sm_ino = control;
+		meta.sm_gen = control2;
+		break;
+	case ST_NONE:
+	case ST_FS:
+		/* no control parameters */
+		break;
+	}
+	meta.sm_flags = 0;
+
+	error = ioctl(fd, XFS_IOC_SCRUB_METADATA, &meta);
+	if (error)
+		perror("scrub");
+	if (meta.sm_flags & XFS_SCRUB_FLAG_CORRUPT)
+		printf(_("Corruption detected.\n"));
+	if (meta.sm_flags & XFS_SCRUB_FLAG_PREEN)
+		printf(_("Optimization possible.\n"));
+	if (meta.sm_flags & XFS_SCRUB_FLAG_XFAIL)
+		printf(_("Cross-referencing failed.\n"));
+	if (meta.sm_flags & XFS_SCRUB_FLAG_XCORRUPT)
+		printf(_("Corruption detected during cross-referencing.\n"));
+}
+
+static int
+parse_args(
+	int				argc,
+	char				**argv,
+	struct cmdinfo			*cmdinfo,
+	void				(*fn)(int, int, uint64_t, uint32_t))
+{
+	char				*p;
+	int				type = -1;
+	int				i, c;
+	uint64_t			control = 0;
+	uint32_t			control2 = 0;
+	const struct scrub_descr	*d = NULL;
+
+	while ((c = getopt(argc, argv, "")) != EOF) {
+		switch (c) {
+		default:
+			return command_usage(cmdinfo);
+		}
+	}
+	if (optind > argc - 1)
+		return command_usage(cmdinfo);
+
+	for (i = 0, d = scrubbers; d->name; i++, d++) {
+		if (strcmp(d->name, argv[optind]) == 0) {
+			type = i;
+			break;
+		}
+	}
+	optind++;
+
+	if (type < 0)
+		return command_usage(cmdinfo);
+
+	switch (d->type) {
+	case ST_INODE:
+		if (optind == argc) {
+			control = 0;
+			control2 = 0;
+		} else if (optind == argc - 2) {
+			control = strtoull(argv[optind], &p, 0);
+			if (*p != '\0') {
+				fprintf(stderr,
+					_("Bad inode number %s.\n"), argv[optind]);
+				return 0;
+			}
+			control2 = strtoul(argv[optind + 1], &p, 0);
+			if (*p != '\0') {
+				fprintf(stderr,
+					_("Bad generation number %s.\n"), argv[optind + 1]);
+				return 0;
+			}
+		} else {
+			fprintf(stderr,
+				_("Must specify inode number and generation.\n"));
+			return 0;
+		}
+		break;
+	case ST_PERAG:
+	case ST_NONE:
+		if (optind != argc - 1) {
+			fprintf(stderr,
+				_("Must specify AG number.\n"));
+			return 0;
+		}
+		control = strtoul(argv[optind], &p, 0);
+		if (*p != '\0') {
+			fprintf(stderr,
+				_("Bad AG number %s.\n"), argv[optind]);
+			return 0;
+		}
+		break;
+	default:
+		if (optind != argc) {
+			fprintf(stderr,
+				_("No parameters allowed.\n"));
+			return 0;
+		}
+	}
+	fn(file->fd, type, control, control2);
+
+	return 0;
+}
+
+static int
+scrub_f(
+	int				argc,
+	char				**argv)
+{
+	return parse_args(argc, argv, &scrub_cmd, scrub_ioctl);
+}
+
+void
+scrub_init(void)
+{
+	scrub_cmd.name = "scrub";
+	scrub_cmd.altname = "sc";
+	scrub_cmd.cfunc = scrub_f;
+	scrub_cmd.argmin = 1;
+	scrub_cmd.argmax = -1;
+	scrub_cmd.flags = CMD_NOMAP_OK;
+	scrub_cmd.args =
+_("type [agno | ino gen]");
+	scrub_cmd.oneline =
+		_("scrubs filesystem metadata");
+	scrub_cmd.help = scrub_help;
+
+	add_command(&scrub_cmd);
+}
+
+static void
+repair_help(void)
+{
+	const struct scrub_descr	*d;
+
+	printf(_("\n\
+ Repairs a piece of XFS filesystem metadata.  The first argument is the type\n\
+ of metadata to repair.  An allocation group number can be specified to\n\
+ restrict the repair operation to that allocation group.\n\
+ Certain metadata types do not take AG numbers.\n\
+\n\
+ Example:\n\
+ 'repair inobt 3' - repairs the inode btree in AG 3.\n\
+\n\
+ Known metadata repair types are:"));
+	for (d = scrubbers; d->name; d++)
+		printf(" %s", d->name);
+	printf("\n");
+}
+
+static void
+repair_ioctl(
+	int				fd,
+	int				type,
+	uint64_t			control,
+	uint32_t			control2)
+{
+	struct xfs_scrub_metadata	meta;
+	const struct scrub_descr	*sc;
+	int				error;
+
+	sc = &scrubbers[type];
+	memset(&meta, 0, sizeof(meta));
+	meta.sm_type = type;
+	switch (sc->type) {
+	case ST_PERAG:
+		meta.sm_agno = control;
+		break;
+	case ST_INODE:
+		meta.sm_ino = control;
+		meta.sm_gen = control2;
+		break;
+	case ST_NONE:
+	case ST_FS:
+		/* no control parameters */
+		break;
+	}
+	meta.sm_flags = XFS_SCRUB_FLAG_REPAIR;
+
+	error = ioctl(fd, XFS_IOC_SCRUB_METADATA, &meta);
+	if (error)
+		perror("repair");
+	if (meta.sm_flags & XFS_SCRUB_FLAG_CORRUPT)
+		printf(_("Corruption remains.\n"));
+	if (meta.sm_flags & XFS_SCRUB_FLAG_PREEN)
+		printf(_("Optimization possible.\n"));
+	if (meta.sm_flags & XFS_SCRUB_FLAG_XFAIL)
+		printf(_("Cross-referencing failed.\n"));
+	if (meta.sm_flags & XFS_SCRUB_FLAG_XCORRUPT)
+		printf(_("Corruption still detected during cross-referencing.\n"));
+}
+
+static int
+repair_f(
+	int				argc,
+	char				**argv)
+{
+	return parse_args(argc, argv, &repair_cmd, repair_ioctl);
+}
+
+void
+repair_init(void)
+{
+	if (!expert)
+		return;
+	repair_cmd.name = "repair";
+	repair_cmd.altname = "fix";
+	repair_cmd.cfunc = repair_f;
+	repair_cmd.argmin = 1;
+	repair_cmd.argmax = -1;
+	repair_cmd.flags = CMD_NOMAP_OK;
+	repair_cmd.args =
+_("type [agno | ino gen]");
+	repair_cmd.oneline =
+		_("repairs filesystem metadata");
+	repair_cmd.help = repair_help;
+
+	add_command(&repair_cmd);
+}
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index 479c39b..64f66b9 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -998,6 +998,26 @@ version of policy structure (numeric)
 .BR get_encpolicy
 On filesystems that support encryption, display the encryption policy of the
 current file.
+.TP
+.BI "scrub " type " [ " agnumber... " | " "ino" " " "gen" " ]"
+Scrub internal XFS filesystem metadata.  The
+.BI type
+parameter specifies which type of metadata to scrub.
+For AG metadata, AG numbers can optionally be specified to restrict the
+scrub operation to a particular set of allocation groups.
+By default, all allocation groups are scrubbed.
+For file metadata, the scrub is applied to the open file unless the
+inode number and generation number are specified.
+.TP
+.BI "repair " type " [ " agnumber... " | " "ino" " " "gen" " ]"
+Repair internal XFS filesystem metadata.  The
+.BI type
+parameter specifies which type of metadata to repair.
+For AG metadata, AG numbers can optionally be specified to restrict the
+repair operation to a particular set of allocation groups.
+By default, all allocation groups are repaired.
+For file metadata, the repair is applied to the open file unless the
+inode number and generation number are specified.
 
 .SH SEE ALSO
 .BR mkfs.xfs (8),



* [PATCH 13/17] xfs_scrub: create online filesystem scrub program
  2017-01-21  8:08 [PATCH v5 00/17] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (11 preceding siblings ...)
  2017-01-21  8:09 ` [PATCH 12/17] xfs_io: provide an interface to the scrub ioctls Darrick J. Wong
@ 2017-01-21  8:09 ` Darrick J. Wong
  2017-01-21  8:09 ` [PATCH 14/17] xfs_scrub: add generic VFS scrubber implementation Darrick J. Wong
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2017-01-21  8:09 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, linux-fsdevel

Create a filesystem scrubbing tool that walks the directory tree and
queries every file's extents, extended attributes, and stat data.  For
generic (non-XFS) filesystems, this depends on the kernel to do nearly
all the validation.  Optionally, we can (try to) read all the file
data.

This patch provides some helper components that will be used by the
various backends to walk the metadata, perform media scans, etc.  Actual
filesystem drivers will be in subsequent patches.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 Makefile              |    3 
 configure.ac          |   13 +
 include/builddefs.in  |   13 +
 m4/Makefile           |    1 
 m4/package_attrdev.m4 |   29 +
 m4/package_libcdev.m4 |  140 +++++++
 man/man8/xfs_scrub.8  |  127 ++++++
 scrub/Makefile        |   51 ++
 scrub/bitmap.c        |  425 ++++++++++++++++++++
 scrub/bitmap.h        |   42 ++
 scrub/disk.c          |  285 +++++++++++++
 scrub/disk.h          |   41 ++
 scrub/iocmd.c         |  412 +++++++++++++++++++
 scrub/iocmd.h         |   50 ++
 scrub/read_verify.c   |  314 +++++++++++++++
 scrub/read_verify.h   |   59 +++
 scrub/scrub.c         | 1055 +++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/scrub.h         |  192 +++++++++
 18 files changed, 3251 insertions(+), 1 deletion(-)
 create mode 100644 m4/package_attrdev.m4
 create mode 100644 man/man8/xfs_scrub.8
 create mode 100644 scrub/Makefile
 create mode 100644 scrub/bitmap.c
 create mode 100644 scrub/bitmap.h
 create mode 100644 scrub/disk.c
 create mode 100644 scrub/disk.h
 create mode 100644 scrub/iocmd.c
 create mode 100644 scrub/iocmd.h
 create mode 100644 scrub/read_verify.c
 create mode 100644 scrub/read_verify.h
 create mode 100644 scrub/scrub.c
 create mode 100644 scrub/scrub.h
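
Of the files listed above, scrub/disk.c builds a SCSI VERIFY(16) command
by hand.  The byte layout it relies on (opcode 0x8F, a big-endian 64-bit
LBA in bytes 2-9, a big-endian 32-bit block count in bytes 10-13) can be
sketched in isolation as follows.  build_verify16_cdb() is a hypothetical
helper name, not something the patch defines, and the sketch takes a
32-bit count to make explicit that only 32 bits fit in the CDB.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VERIFY16_OPCODE	0x8F
#define VERIFY16_CDBLEN	16

/*
 * Fill in a VERIFY(16) CDB.  Byte 1 (protection/DPO/BYTCHK flags),
 * byte 14 (group number) and byte 15 (control) are left as zero.
 */
static void
build_verify16_cdb(
	unsigned char	cdb[VERIFY16_CDBLEN],
	uint64_t	lba,
	uint32_t	nr_blocks)
{
	int		i;

	assert(cdb != NULL);
	memset(cdb, 0, VERIFY16_CDBLEN);
	cdb[0] = VERIFY16_OPCODE;

	/* Bytes 2..9: logical block address, most significant byte first. */
	for (i = 0; i < 8; i++)
		cdb[2 + i] = (lba >> (56 - 8 * i)) & 0xff;

	/* Bytes 10..13: verification length, most significant byte first. */
	for (i = 0; i < 4; i++)
		cdb[10 + i] = (nr_blocks >> (24 - 8 * i)) & 0xff;
}
```

A caller would hand the resulting buffer to the SG_IO ioctl through
sg_io_hdr.cmdp with dxfer_direction set to SG_DXFER_NONE, as
disk_scsi_verify() does in the diff below.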


diff --git a/Makefile b/Makefile
index 3a4872a..e6d79af 100644
--- a/Makefile
+++ b/Makefile
@@ -47,7 +47,7 @@ HDR_SUBDIRS = include libxfs
 DLIB_SUBDIRS = libxlog libxcmd libhandle
 LIB_SUBDIRS = libxfs $(DLIB_SUBDIRS)
 TOOL_SUBDIRS = copy db estimate fsck growfs io logprint mkfs quota \
-		mdrestore repair rtcp m4 man doc debian spaceman
+		mdrestore repair rtcp m4 man doc debian spaceman scrub
 
 ifneq ("$(PKG_PLATFORM)","darwin")
 TOOL_SUBDIRS += fsr
@@ -89,6 +89,7 @@ repair: libxlog libxcmd
 copy: libxlog
 mkfs: libxcmd
 spaceman: libxcmd
+scrub: libhandle libxcmd repair
 
 ifeq ($(HAVE_BUILDDEFS), yes)
 include $(BUILDRULES)
diff --git a/configure.ac b/configure.ac
index 3a4655f..70b5586 100644
--- a/configure.ac
+++ b/configure.ac
@@ -139,8 +139,21 @@ AC_HAVE_MNTENT
 AC_HAVE_FLS
 AC_HAVE_READDIR
 AC_HAVE_FSETXATTR
+AC_HAVE_FGETXATTR
+AC_HAVE_FLISTXATTR
+AC_HAVE_LLISTXATTR
 AC_HAVE_MREMAP
 AC_NEED_INTERNAL_FSXATTR
+AC_HAVE_MALLINFO
+AC_HAVE_SG_IO
+AC_HAVE_HDIO_GETGEO
+AC_HAVE_ATTRIBUTES_H
+AC_HAVE_ATTRIBUTES_MACROS
+AC_HAVE_ATTRIBUTES_STRUCTS
+AC_HAVE_OPENAT
+AC_HAVE_READLINKAT
+AC_HAVE_SYNCFS
+AC_HAVE_FSTATAT
 
 if test "$enable_blkid" = yes; then
 AC_HAVE_BLKID_TOPO
diff --git a/include/builddefs.in b/include/builddefs.in
index 612b547..2f2b9ad 100644
--- a/include/builddefs.in
+++ b/include/builddefs.in
@@ -109,8 +109,21 @@ HAVE_READDIR = @have_readdir@
 HAVE_MNTENT = @have_mntent@
 HAVE_FLS = @have_fls@
 HAVE_FSETXATTR = @have_fsetxattr@
+HAVE_FGETXATTR = @have_fgetxattr@
+HAVE_FLISTXATTR = @have_flistxattr@
+HAVE_LLISTXATTR = @have_llistxattr@
 HAVE_MREMAP = @have_mremap@
 NEED_INTERNAL_FSXATTR = @need_internal_fsxattr@
+HAVE_MALLINFO = @have_mallinfo@
+HAVE_SG_IO = @have_sg_io@
+HAVE_HDIO_GETGEO = @have_hdio_getgeo@
+HAVE_ATTRIBUTES_H = @have_attributes_h@
+HAVE_ATTRIBUTES_MACROS = @have_attributes_macros@
+HAVE_ATTRIBUTES_STRUCTS = @have_attributes_structs@
+HAVE_OPENAT = @have_openat@
+HAVE_READLINKAT = @have_readlinkat@
+HAVE_SYNCFS = @have_syncfs@
+HAVE_FSTATAT = @have_fstatat@
 
 GCCFLAGS = -funsigned-char -fno-strict-aliasing -Wall
 #	   -Wbitwise -Wno-transparent-union -Wno-old-initializer -Wno-decl
diff --git a/m4/Makefile b/m4/Makefile
index d282f0a..0c73f35 100644
--- a/m4/Makefile
+++ b/m4/Makefile
@@ -14,6 +14,7 @@ CONFIGURE = \
 
 LSRCFILES = \
 	manual_format.m4 \
+	package_attrdev.m4 \
 	package_blkid.m4 \
 	package_globals.m4 \
 	package_libcdev.m4 \
diff --git a/m4/package_attrdev.m4 b/m4/package_attrdev.m4
new file mode 100644
index 0000000..eb0e35b
--- /dev/null
+++ b/m4/package_attrdev.m4
@@ -0,0 +1,29 @@
+AC_DEFUN([AC_HAVE_ATTRIBUTES_H],
+  [ AC_CHECK_HEADERS(attr/attributes.h, [have_attributes_h=yes])
+    AC_SUBST(have_attributes_h)
+    if test "$have_attributes_h" != "yes"; then
+        echo
+        echo 'WARNING: attr/attributes.h does not exist.'
+        echo 'Install the extended attributes (attr) development package.'
+        echo 'Alternatively, run "make install-dev" from the attr source.'
+        echo
+    fi
+  ])
+
+AC_DEFUN([AC_HAVE_ATTRIBUTES_STRUCTS],
+  [ AC_CHECK_TYPES([struct attrlist_cursor, struct attr_multiop, struct attrlist_ent],
+    [have_attributes_structs=yes],,
+    [
+#include <sys/types.h>
+#include <attr/attributes.h>] )
+    AC_SUBST(have_attributes_structs)
+  ])
+
+AC_DEFUN([AC_HAVE_ATTRIBUTES_MACROS],
+  [ AC_TRY_LINK([
+#include <sys/types.h>
+#include <attr/attributes.h>],
+    [ int x = ATTR_SECURE; int y = ATTR_ROOT; int z = ATTR_TRUST; ATTR_ENTRY(0, 0); ],
+    [have_attributes_macros=yes])
+    AC_SUBST(have_attributes_macros)
+  ])
diff --git a/m4/package_libcdev.m4 b/m4/package_libcdev.m4
index 7d5a42d..5ff0713 100644
--- a/m4/package_libcdev.m4
+++ b/m4/package_libcdev.m4
@@ -236,6 +236,45 @@ AC_DEFUN([AC_HAVE_FSETXATTR],
   ])
 
 #
+# Check if we have a fgetxattr call (Mac OS X)
+#
+AC_DEFUN([AC_HAVE_FGETXATTR],
+  [ AC_CHECK_DECL([fgetxattr],
+       have_fgetxattr=yes,
+       [],
+       [#include <sys/types.h>
+        #include <attr/xattr.h>]
+       )
+    AC_SUBST(have_fgetxattr)
+  ])
+
+#
+# Check if we have a flistxattr call (Mac OS X)
+#
+AC_DEFUN([AC_HAVE_FLISTXATTR],
+  [ AC_CHECK_DECL([flistxattr],
+       have_flistxattr=yes,
+       [],
+       [#include <sys/types.h>
+        #include <attr/xattr.h>]
+       )
+    AC_SUBST(have_flistxattr)
+  ])
+
+#
+# Check if we have a llistxattr call (Mac OS X)
+#
+AC_DEFUN([AC_HAVE_LLISTXATTR],
+  [ AC_CHECK_DECL([llistxattr],
+       have_llistxattr=yes,
+       [],
+       [#include <sys/types.h>
+        #include <attr/xattr.h>]
+       )
+    AC_SUBST(have_llistxattr)
+  ])
+
+#
 # Check if there is mntent.h
 #
 AC_DEFUN([AC_HAVE_MNTENT],
@@ -277,3 +316,104 @@ AC_DEFUN([AC_NEED_INTERNAL_FSXATTR],
     )
     AC_SUBST(need_internal_fsxattr)
   ])
+
+#
+# Check if we have a mallinfo libc call
+#
+AC_DEFUN([AC_HAVE_MALLINFO],
+  [ AC_MSG_CHECKING([for mallinfo ])
+    AC_TRY_COMPILE([
+#include <malloc.h>
+    ], [
+         struct mallinfo test;
+
+         test.arena = 0; test.hblkhd = 0; test.uordblks = 0; test.fordblks = 0;
+         test = mallinfo();
+    ], have_mallinfo=yes
+       AC_MSG_RESULT(yes),
+       AC_MSG_RESULT(no))
+    AC_SUBST(have_mallinfo)
+  ])
+
+#
+# Check if we have the SG_IO ioctl
+#
+AC_DEFUN([AC_HAVE_SG_IO],
+  [ AC_MSG_CHECKING([for struct sg_io_hdr ])
+    AC_TRY_COMPILE([#include <scsi/sg.h>],
+    [
+         struct sg_io_hdr hdr;
+         ioctl(0, SG_IO, &hdr);
+    ], have_sg_io=yes
+       AC_MSG_RESULT(yes),
+       AC_MSG_RESULT(no))
+    AC_SUBST(have_sg_io)
+  ])
+
+#
+# Check if we have the HDIO_GETGEO ioctl
+#
+AC_DEFUN([AC_HAVE_HDIO_GETGEO],
+  [ AC_MSG_CHECKING([for struct hd_geometry ])
+    AC_TRY_COMPILE([#include <linux/hdreg.h>],
+    [
+         struct hd_geometry hdr;
+         ioctl(0, HDIO_GETGEO, &hdr);
+    ], have_hdio_getgeo=yes
+       AC_MSG_RESULT(yes),
+       AC_MSG_RESULT(no))
+    AC_SUBST(have_hdio_getgeo)
+  ])
+
+#
+# Check if we have a openat call
+#
+AC_DEFUN([AC_HAVE_OPENAT],
+  [ AC_CHECK_DECL([openat],
+       have_openat=yes,
+       [],
+       [#include <sys/types.h>
+        #include <sys/stat.h>
+        #include <fcntl.h>]
+       )
+    AC_SUBST(have_openat)
+  ])
+
+#
+# Check if we have a readlinkat call
+#
+AC_DEFUN([AC_HAVE_READLINKAT],
+  [ AC_CHECK_DECL([readlinkat],
+       have_readlinkat=yes,
+       [],
+       [#include <unistd.h>
+        #include <fcntl.h>]
+       )
+    AC_SUBST(have_readlinkat)
+  ])
+
+#
+# Check if we have a syncfs call
+#
+AC_DEFUN([AC_HAVE_SYNCFS],
+  [ AC_CHECK_DECL([syncfs],
+       have_syncfs=yes,
+       [],
+       [#define _GNU_SOURCE
+       #include <unistd.h>])
+    AC_SUBST(have_syncfs)
+  ])
+
+#
+# Check if we have a fstatat call
+#
+AC_DEFUN([AC_HAVE_FSTATAT],
+  [ AC_CHECK_DECL([fstatat],
+       have_fstatat=yes,
+       [],
+       [#define _GNU_SOURCE
+       #include <sys/types.h>
+       #include <sys/stat.h>
+       #include <unistd.h>])
+    AC_SUBST(have_fstatat)
+  ])
diff --git a/man/man8/xfs_scrub.8 b/man/man8/xfs_scrub.8
new file mode 100644
index 0000000..0ad1fb8
--- /dev/null
+++ b/man/man8/xfs_scrub.8
@@ -0,0 +1,127 @@
+.TH xfs_scrub 8
+.SH NAME
+xfs_scrub \- scrub the contents of an XFS filesystem
+.SH SYNOPSIS
+.B xfs_scrub
+[
+.B \-ademntTvVxy
+]
+.I mountpoint
+.br
+.B xfs_scrub \-V
+.SH DESCRIPTION
+.B xfs_scrub
+attempts to read and check all the metadata in a Linux filesystem.
+.PP
+If
+.B xfs_scrub
+does not detect an XFS filesystem, it will use a generic backend to
+scrub the filesystem.
+This involves walking the directory tree, querying the data and
+extended attribute extent maps, performing limited checks of directory
+and inode data, reading all of an inode's extended attributes,
+optionally reading all data in a file, and comparing the number of
+blocks and inodes seen against the reported counters.
+.PP
+If an XFS filesystem is detected, then
+.B xfs_scrub
+will ask the kernel to perform more rigorous scrubbing of the
+internal metadata.
+The in-kernel scrubbers also cross-reference each data structure's
+records against the other filesystem metadata.
+.PP
+This utility does not know how to correct all errors.
+If the tool cannot fix the detected errors, you must unmount the
+filesystem and run the appropriate repair tool.
+If this tool is run without either of the
+.B \-n
+or
+.B \-y
+options, then it will preen and optimize the filesystem when possible,
+though it will not try to fix errors.
+.SH OPTIONS
+.TP
+.BI \-a " errors"
+Abort if more than this many errors are found on the filesystem.
+.TP
+.B \-d
+Enable debugging mode, which augments error reports with the exact file
+and line where the scrub failure occurred.
+This also enables verbose mode.
+.TP
+.B \-e
+Specifies what happens when errors are detected.
+If
+.IR shutdown
+is given, the filesystem will be taken offline if errors are found.
+Not all backends can shut down a filesystem.
+If
+.IR continue
+is given, no action is taken if errors are found.
+This is the default.
+.TP
+.BI \-m " file"
+Search this file for mounted filesystems instead of /etc/mtab.
+.TP
+.B \-n
+Dry run, do not modify anything in the filesystem.  This disables
+all preening and optimization behaviors, and disables calling
+FITRIM on the free space after a successful run.
+.TP
+.BI \-t " fstype"
+Force the use of a particular type of filesystem scrubber.
+The current backends are:
+.IR xfs , " ext4" , " ext3", " ext2", " btrfs" ", and " generic "."
+Most filesystems will work just fine with the generic backend.
+.TP
+.B \-T
+Print timing and memory usage information for each phase.
+.TP
+.B \-v
+Enable verbose mode, which prints periodic status updates.
+.TP
+.B \-V
+Prints the version number and exits.
+.TP
+.B \-x
+Scrub file data.  This reads every block of every file on disk.
+If the filesystem reports file extent mappings or physical extent
+mappings and is backed by a block device,
+.B xfs_scrub
+will issue O_DIRECT reads to the block device directly.
+If the block device is a SCSI disk, it will issue READ VERIFY commands
+directly to the disk.
+.TP
+.B \-y
+Try to repair all filesystem errors.  If the errors cannot be fixed
+online, then the filesystem must be taken offline for repair.
+.SH EXIT CODE
+The exit code returned by
+.B xfs_scrub
+is the sum of the following conditions:
+.br
+\	0\	\-\ No errors
+.br
+\	4\	\-\ File system errors left uncorrected
+.br
+\	8\	\-\ Operational error
+.br
+\	16\	\-\ Usage or syntax error
+.br
+.SH CAVEATS
+.B xfs_scrub
+is an immature utility!
+The generic scrub backend walks the directory tree, reads file extents
+and data, and queries every extended attribute it can find.
+The generic scrub does not grab exclusive locks on the objects it is
+examining, nor does it have any way to cross-reference what it sees
+against the internal filesystem metadata.
+.PP
+The XFS backend takes advantage of in-kernel scrubbing to verify a
+given data structure with locks held.
+This can tie up the system for a while.
+.PP
+If errors are found, the filesystem should be taken offline and
+repaired.
+.SH SEE ALSO
+.BR xfs_repair (8).
diff --git a/scrub/Makefile b/scrub/Makefile
new file mode 100644
index 0000000..2aa359b
--- /dev/null
+++ b/scrub/Makefile
@@ -0,0 +1,51 @@
+#
+# Copyright (c) 2017 Oracle.  All Rights Reserved.
+#
+
+TOPDIR = ..
+include $(TOPDIR)/include/builddefs
+
+SCRUB_PREREQS=$(HAVE_FIEMAP)$(HAVE_ATTRIBUTES_H)$(HAVE_ATTRIBUTES_MACROS)$(HAVE_ATTRIBUTES_STRUCTS)$(HAVE_FGETXATTR)$(HAVE_FLISTXATTR)$(HAVE_LLISTXATTR)$(HAVE_OPENAT)$(HAVE_READLINKAT)$(HAVE_FSTATAT)
+
+ifeq ($(SCRUB_PREREQS),yesyesyesyesyesyesyesyesyesyes)
+LTCOMMAND = xfs_scrub
+INSTALL_SCRUB = install-scrub
+endif
+
+HFILES = scrub.h ../repair/threads.h read_verify.h iocmd.h bitmap.h disk.h
+CFILES = ../repair/avl64.c disk.c bitmap.c iocmd.c \
+	 read_verify.c scrub.c ../repair/threads.c
+
+LLDLIBS += $(LIBBLKID) $(LIBXFS) $(LIBXCMD) $(LIBUUID) $(LIBRT) $(LIBPTHREAD) $(LIBHANDLE)
+LTDEPENDENCIES += $(LIBXFS) $(LIBXCMD) $(LIBHANDLE)
+LLDFLAGS = -static-libtool-libs
+
+ifeq ($(HAVE_MALLINFO),yes)
+LCFLAGS += -DHAVE_MALLINFO
+endif
+
+ifeq ($(HAVE_SG_IO),yes)
+LCFLAGS += -DHAVE_SG_IO
+endif
+
+ifeq ($(HAVE_HDIO_GETGEO),yes)
+LCFLAGS += -DHAVE_HDIO_GETGEO
+endif
+
+ifeq ($(HAVE_SYNCFS),yes)
+LCFLAGS += -DHAVE_SYNCFS
+endif
+
+default: depend $(LTCOMMAND)
+
+include $(BUILDRULES)
+
+install: default $(INSTALL_SCRUB)
+
+install-scrub:
+	$(INSTALL) -m 755 -d $(PKG_ROOT_SBIN_DIR)
+	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_ROOT_SBIN_DIR)
+
+install-dev:
+
+-include .dep
diff --git a/scrub/bitmap.c b/scrub/bitmap.c
new file mode 100644
index 0000000..0146c49
--- /dev/null
+++ b/scrub/bitmap.c
@@ -0,0 +1,425 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "libxfs.h"
+#include "../repair/avl64.h"
+#include "bitmap.h"
+
+#define avl_for_each_range_safe(pos, n, l, first, last) \
+	for (pos = (first), n = pos->avl_nextino, l = (last)->avl_nextino; pos != (l); \
+			pos = n, n = pos ? pos->avl_nextino : NULL)
+
+#define avl_for_each_safe(tree, pos, n) \
+	for (pos = (tree)->avl_firstino, n = pos ? pos->avl_nextino : NULL; \
+			pos != NULL; \
+			pos = n, n = pos ? pos->avl_nextino : NULL)
+
+#define avl_for_each(tree, pos) \
+	for (pos = (tree)->avl_firstino; pos != NULL; pos = pos->avl_nextino)
+
+struct bitmap_node {
+	struct avl64node	btn_node;
+	uint64_t		btn_start;
+	uint64_t		btn_length;
+};
+
+static __uint64_t
+extent_start(
+	struct avl64node	*node)
+{
+	struct bitmap_node	*btn;
+
+	btn = container_of(node, struct bitmap_node, btn_node);
+	return btn->btn_start;
+}
+
+static __uint64_t
+extent_end(
+	struct avl64node	*node)
+{
+	struct bitmap_node	*btn;
+
+	btn = container_of(node, struct bitmap_node, btn_node);
+	return btn->btn_start + btn->btn_length;
+}
+
+static struct avl64ops bitmap_ops = {
+	extent_start,
+	extent_end,
+};
+
+/* Initialize an extent tree. */
+bool
+bitmap_init(
+	struct bitmap		*tree)
+{
+	tree->bt_tree = malloc(sizeof(struct avl64tree_desc));
+	if (!tree->bt_tree)
+		return false;
+
+	pthread_mutex_init(&tree->bt_lock, NULL);
+	avl64_init_tree(tree->bt_tree, &bitmap_ops);
+
+	return true;
+}
+
+/* Free an extent tree. */
+void
+bitmap_free(
+	struct bitmap		*tree)
+{
+	struct avl64node	*node;
+	struct avl64node	*n;
+	struct bitmap_node	*ext;
+
+	if (!tree->bt_tree)
+		return;
+
+	avl_for_each_safe(tree->bt_tree, node, n) {
+		ext = container_of(node, struct bitmap_node, btn_node);
+		free(ext);
+	}
+	free(tree->bt_tree);
+	tree->bt_tree = NULL;
+}
+
+/* Create a new extent. */
+static struct bitmap_node *
+bitmap_node_init(
+	uint64_t		start,
+	uint64_t		len)
+{
+	struct bitmap_node	*ext;
+
+	ext = malloc(sizeof(struct bitmap_node));
+	if (!ext)
+		return NULL;
+
+	ext->btn_node.avl_nextino = NULL;
+	ext->btn_start = start;
+	ext->btn_length = len;
+
+	return ext;
+}
+
+/* Add an extent (locked). */
+static bool
+__bitmap_add(
+	struct bitmap		*tree,
+	uint64_t		start,
+	uint64_t		length)
+{
+	struct avl64node	*firstn;
+	struct avl64node	*lastn;
+	struct avl64node	*pos;
+	struct avl64node	*n;
+	struct avl64node	*l;
+	struct bitmap_node	*ext;
+	uint64_t		new_start;
+	uint64_t		new_length;
+	struct avl64node	*node;
+	bool			res = true;
+
+	/* Find any existing nodes adjacent or within that range. */
+	avl64_findranges(tree->bt_tree, start - 1, start + length + 1,
+			&firstn, &lastn);
+
+	/* Nothing, just insert a new extent. */
+	if (firstn == NULL && lastn == NULL) {
+		ext = bitmap_node_init(start, length);
+		if (!ext)
+			return false;
+
+		node = avl64_insert(tree->bt_tree, &ext->btn_node);
+		if (node == NULL) {
+			free(ext);
+			errno = EEXIST;
+			return false;
+		}
+
+		return true;
+	}
+
+	ASSERT(firstn != NULL && lastn != NULL);
+	new_start = start;
+	new_length = length;
+
+	avl_for_each_range_safe(pos, n, l, firstn, lastn) {
+		ext = container_of(pos, struct bitmap_node, btn_node);
+
+		/* Bail if the new extent is contained within an old one. */
+		if (ext->btn_start <= start &&
+		    ext->btn_start + ext->btn_length >= start + length)
+			return res;
+
+		/* Check for overlapping and adjacent extents. */
+		if (ext->btn_start + ext->btn_length >= start ||
+		    ext->btn_start <= start + length) {
+			if (ext->btn_start < start) {
+				new_start = ext->btn_start;
+				new_length += ext->btn_length;
+			}
+
+			if (ext->btn_start + ext->btn_length >
+			    new_start + new_length)
+				new_length = ext->btn_start + ext->btn_length -
+						new_start;
+
+			avl64_delete(tree->bt_tree, pos);
+			free(ext);
+		}
+	}
+
+	ext = bitmap_node_init(new_start, new_length);
+	if (!ext)
+		return false;
+
+	node = avl64_insert(tree->bt_tree, &ext->btn_node);
+	if (node == NULL) {
+		free(ext);
+		errno = EEXIST;
+		return false;
+	}
+
+	return res;
+}
+
+/* Add an extent. */
+bool
+bitmap_add(
+	struct bitmap		*tree,
+	uint64_t		start,
+	uint64_t		length)
+{
+	bool			res;
+
+	pthread_mutex_lock(&tree->bt_lock);
+	res = __bitmap_add(tree, start, length);
+	pthread_mutex_unlock(&tree->bt_lock);
+
+	return res;
+}
+
+/* Remove an extent. */
+bool
+bitmap_remove(
+	struct bitmap		*tree,
+	uint64_t		start,
+	uint64_t		len)
+{
+	struct avl64node	*firstn;
+	struct avl64node	*lastn;
+	struct avl64node	*pos;
+	struct avl64node	*n;
+	struct avl64node	*l;
+	struct bitmap_node	*ext;
+	uint64_t		new_start;
+	uint64_t		new_length;
+	struct avl64node	*node;
+	int			stat;
+
+	pthread_mutex_lock(&tree->bt_lock);
+	/* Find any existing nodes over that range. */
+	avl64_findranges(tree->bt_tree, start, start + len, &firstn, &lastn);
+
+	/* Nothing, we're done. */
+	if (firstn == NULL && lastn == NULL) {
+		pthread_mutex_unlock(&tree->bt_lock);
+		return true;
+	}
+
+	ASSERT(firstn != NULL && lastn != NULL);
+
+	/* Delete or truncate everything in sight. */
+	avl_for_each_range_safe(pos, n, l, firstn, lastn) {
+		ext = container_of(pos, struct bitmap_node, btn_node);
+
+		stat = 0;
+		if (ext->btn_start < start)
+			stat |= 1;
+		if (ext->btn_start + ext->btn_length > start + len)
+			stat |= 2;
+		switch (stat) {
+		case 0:
+			/* Extent totally within range; delete. */
+			avl64_delete(tree->bt_tree, pos);
+			free(ext);
+			break;
+		case 1:
+			/* Extent is left-adjacent; truncate. */
+			ext->btn_length = start - ext->btn_start;
+			break;
+		case 2:
+			/* Extent is right-adjacent; move it. */
+			ext->btn_length = ext->btn_start + ext->btn_length -
+					(start + len);
+			ext->btn_start = start + len;
+			break;
+		case 3:
+			/* Extent overlaps both ends; split it in two. */
+			new_start = start + len;
+			new_length = ext->btn_start + ext->btn_length -
+					new_start;
+			ext->btn_length = start - ext->btn_start;
+			ext = bitmap_node_init(new_start, new_length);
+			node = ext ? avl64_insert(tree->bt_tree,
+					&ext->btn_node) : NULL;
+			if (node == NULL) {
+				int	err = ext ? EEXIST : ENOMEM;
+				free(ext);
+				errno = err;
+				pthread_mutex_unlock(&tree->bt_lock);
+				return false;
+			}
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&tree->bt_lock);
+	return true;
+}
+
+/* Iterate an extent tree. */
+bool
+bitmap_iterate(
+	struct bitmap		*tree,
+	bool			(*fn)(uint64_t, uint64_t, void *),
+	void			*arg)
+{
+	struct avl64node	*node;
+	struct bitmap_node	*ext;
+	bool			moveon = true;
+
+	pthread_mutex_lock(&tree->bt_lock);
+	avl_for_each(tree->bt_tree, node) {
+		ext = container_of(node, struct bitmap_node, btn_node);
+		moveon = fn(ext->btn_start, ext->btn_length, arg);
+		if (!moveon)
+			break;
+	}
+	pthread_mutex_unlock(&tree->bt_lock);
+
+	return moveon;
+}
+
+/* Do any extents overlap the given one?  (locked) */
+static bool
+__bitmap_has_extent(
+	struct bitmap		*tree,
+	uint64_t		start,
+	uint64_t		len)
+{
+	struct avl64node	*firstn;
+	struct avl64node	*lastn;
+
+	/* Find any existing nodes over that range. */
+	avl64_findranges(tree->bt_tree, start, start + len, &firstn, &lastn);
+
+	return firstn != NULL && lastn != NULL;
+}
+
+/* Do any extents overlap the given one? */
+bool
+bitmap_has_extent(
+	struct bitmap		*tree,
+	uint64_t		start,
+	uint64_t		len)
+{
+	bool			res;
+
+	pthread_mutex_lock(&tree->bt_lock);
+	res = __bitmap_has_extent(tree, start, len);
+	pthread_mutex_unlock(&tree->bt_lock);
+
+	return res;
+}
+
+/* Ensure that the extent is set, and return the old value. */
+bool
+bitmap_test_and_set(
+	struct bitmap		*tree,
+	uint64_t		start,
+	bool			*was_set)
+{
+	bool			res = true;
+
+	pthread_mutex_lock(&tree->bt_lock);
+	*was_set = __bitmap_has_extent(tree, start, 1);
+	if (!(*was_set))
+		res = __bitmap_add(tree, start, 1);
+	pthread_mutex_unlock(&tree->bt_lock);
+
+	return res;
+}
+
+/* Is it empty? */
+bool
+bitmap_empty(
+	struct bitmap		*tree)
+{
+	return tree->bt_tree->avl_firstino == NULL;
+}
+
+static bool
+merge_helper(
+	uint64_t		start,
+	uint64_t		length,
+	void			*arg)
+{
+	struct bitmap		*thistree = arg;
+
+	return __bitmap_add(thistree, start, length);
+}
+
+/* Merge another tree with this one. */
+bool
+bitmap_merge(
+	struct bitmap		*thistree,
+	struct bitmap		*tree)
+{
+	bool			res;
+
+	assert(thistree != tree);
+
+	pthread_mutex_lock(&thistree->bt_lock);
+	res = bitmap_iterate(tree, merge_helper, thistree);
+	pthread_mutex_unlock(&thistree->bt_lock);
+
+	return res;
+}
+
+static bool
+bitmap_dump_fn(
+	uint64_t		startblock,
+	uint64_t		blockcount,
+	void			*arg)
+{
+	printf("%"PRIu64":%"PRIu64"\n", startblock, blockcount);
+	return true;
+}
+
+/* Dump extent tree. */
+void
+bitmap_dump(
+	struct bitmap		*tree)
+{
+	printf("BITMAP DUMP %p\n", tree);
+	bitmap_iterate(tree, bitmap_dump_fn, NULL);
+	printf("BITMAP DUMP DONE\n");
+}
diff --git a/scrub/bitmap.h b/scrub/bitmap.h
new file mode 100644
index 0000000..e3702aa
--- /dev/null
+++ b/scrub/bitmap.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef BITMAP_H_
+#define BITMAP_H_
+
+struct bitmap {
+	pthread_mutex_t		bt_lock;
+	struct avl64tree_desc	*bt_tree;
+};
+
+bool bitmap_init(struct bitmap *tree);
+void bitmap_free(struct bitmap *tree);
+bool bitmap_add(struct bitmap *tree, uint64_t start, uint64_t length);
+bool bitmap_remove(struct bitmap *tree, uint64_t start,
+		uint64_t len);
+bool bitmap_iterate(struct bitmap *tree,
+		bool (*fn)(uint64_t, uint64_t, void *), void *arg);
+bool bitmap_has_extent(struct bitmap *tree, uint64_t start,
+		uint64_t len);
+bool bitmap_test_and_set(struct bitmap *tree, uint64_t start, bool *was_set);
+bool bitmap_empty(struct bitmap *tree);
+bool bitmap_merge(struct bitmap *thistree, struct bitmap *tree);
+void bitmap_dump(struct bitmap *tree);
+
+#endif /* BITMAP_H_ */
diff --git a/scrub/disk.c b/scrub/disk.c
new file mode 100644
index 0000000..c2b158a
--- /dev/null
+++ b/scrub/disk.c
@@ -0,0 +1,285 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "libxfs.h"
+#include <sys/statvfs.h>
+#include <sys/types.h>
+#include <dirent.h>
+#ifdef HAVE_SG_IO
+# include <scsi/sg.h>
+#endif
+#ifdef HAVE_HDIO_GETGEO
+# include <linux/hdreg.h>
+#endif
+#include "disk.h"
+#include "scrub.h"
+
+/* Figure out how many disk heads are available. */
+static unsigned int
+__disk_heads(
+	struct disk		*disk)
+{
+	int			iomin;
+	int			ioopt;
+	unsigned short		rot;
+	int			error;
+
+	/* If it's not a block device, throw all the CPUs at it. */
+	if (!S_ISBLK(disk->d_sb.st_mode))
+		return libxfs_nproc();
+
+	/* Non-rotational device?  Throw all the CPUs. */
+	rot = 1;
+	error = ioctl(disk->d_fd, BLKROTATIONAL, &rot);
+	if (error == 0 && rot == 0)
+		return libxfs_nproc();
+
+	/*
+	 * Sometimes we can infer the number of devices from the
+	 * min/optimal IO sizes.
+	 */
+	iomin = ioopt = 0;
+	if (ioctl(disk->d_fd, BLKIOMIN, &iomin) == 0 &&
+	    ioctl(disk->d_fd, BLKIOOPT, &ioopt) == 0 &&
+	    iomin > 0 && ioopt > 0) {
+		return min(libxfs_nproc(), max(1, ioopt / iomin));
+	}
+
+	/* Rotating device?  I guess? */
+	return 2;
+}
+
+/* Figure out how many disk heads are available. */
+unsigned int
+disk_heads(
+	struct disk		*disk)
+{
+	if (nr_threads < 0)
+		return __disk_heads(disk);
+	return min(__disk_heads(disk), nr_threads);
+}
+
+/* Execute a SCSI VERIFY(16).  We hope. */
+#ifdef HAVE_SG_IO
+# define SENSE_BUF_LEN		64
+# define VERIFY16_CMDLEN	16
+# define VERIFY16_CMD		0x8F
+
+# ifndef SG_FLAG_Q_AT_TAIL
+#  define SG_FLAG_Q_AT_TAIL	0x10
+# endif
+static int
+disk_scsi_verify(
+	struct disk		*disk,
+	uint64_t		startblock, /* lba */
+	uint64_t		blockcount) /* lba */
+{
+	struct sg_io_hdr	iohdr;
+	unsigned char		cdb[VERIFY16_CMDLEN];
+	unsigned char		sense[SENSE_BUF_LEN];
+	uint64_t		llba;
+	uint64_t		veri_len = blockcount;
+	int			error;
+
+	assert(!debug_tweak_on("XFS_SCRUB_NO_SCSI_VERIFY"));
+
+	llba = startblock + (disk->d_start >> BBSHIFT);
+
+	/* Borrowed from sg_verify */
+	cdb[0] = VERIFY16_CMD;
+	cdb[1] = 0; /* skip PI, DPO, and byte check. */
+	cdb[2] = (llba >> 56) & 0xff;
+	cdb[3] = (llba >> 48) & 0xff;
+	cdb[4] = (llba >> 40) & 0xff;
+	cdb[5] = (llba >> 32) & 0xff;
+	cdb[6] = (llba >> 24) & 0xff;
+	cdb[7] = (llba >> 16) & 0xff;
+	cdb[8] = (llba >> 8) & 0xff;
+	cdb[9] = llba & 0xff;
+	cdb[10] = (veri_len >> 24) & 0xff;
+	cdb[11] = (veri_len >> 16) & 0xff;
+	cdb[12] = (veri_len >> 8) & 0xff;
+	cdb[13] = veri_len & 0xff;
+	cdb[14] = 0;
+	cdb[15] = 0;
+	memset(sense, 0, SENSE_BUF_LEN);
+
+	/* v3 SG_IO */
+	memset(&iohdr, 0, sizeof(iohdr));
+	iohdr.interface_id = 'S';
+	iohdr.dxfer_direction = SG_DXFER_NONE;
+	iohdr.cmdp = cdb;
+	iohdr.cmd_len = VERIFY16_CMDLEN;
+	iohdr.sbp = sense;
+	iohdr.mx_sb_len = SENSE_BUF_LEN;
+	iohdr.flags |= SG_FLAG_Q_AT_TAIL;
+	iohdr.timeout = 30000; /* 30s */
+
+	error = ioctl(disk->d_fd, SG_IO, &iohdr);
+	if (error)
+		return error;
+
+	dbg_printf("VERIFY(16) fd %d lba %"PRIu64" len %"PRIu64" info %x "
+			"status %d masked %d msg %d host %d driver %d "
+			"duration %d resid %d\n",
+			disk->d_fd, startblock, blockcount, iohdr.info,
+			iohdr.status, iohdr.masked_status, iohdr.msg_status,
+			iohdr.host_status, iohdr.driver_status, iohdr.duration,
+			iohdr.resid);
+
+	if (iohdr.info & SG_INFO_CHECK) {
+		dbg_printf("status: msg %x host %x driver %x\n",
+				iohdr.msg_status, iohdr.host_status,
+				iohdr.driver_status);
+		errno = EIO;
+		return -1;
+	}
+
+	return error;
+}
+#else
+# define disk_scsi_verify(...)		(ENOTTY)
+#endif /* HAVE_SG_IO */
+
+/* Test whether the device accepts the SCSI VERIFY command. */
+static bool
+disk_can_scsi_verify(
+	struct disk		*disk)
+{
+	int			error;
+
+	if (debug_tweak_on("XFS_SCRUB_NO_SCSI_VERIFY"))
+		return false;
+
+	error = disk_scsi_verify(disk, 0, 1);
+	return error == 0;
+}
+
+/* Open a disk device and discover its geometry. */
+int
+disk_open(
+	const char		*pathname,
+	struct disk		*disk)
+{
+#ifdef HAVE_HDIO_GETGEO
+	struct hd_geometry	bdgeo;
+#endif
+	bool			suspicious_disk = false;
+	int			lba_sz;
+	int			error;
+
+	disk->d_fd = open(pathname, O_RDONLY | O_DIRECT | O_NOATIME);
+	if (disk->d_fd < 0)
+		return -1;
+
+	/* Try to get LBA size. */
+	error = ioctl(disk->d_fd, BLKSSZGET, &lba_sz);
+	if (error)
+		lba_sz = 512;
+	disk->d_lbalog = libxfs_log2_roundup(lba_sz);
+
+	/* Obtain disk's stat info. */
+	error = fstat(disk->d_fd, &disk->d_sb);
+	if (error) {
+		error = errno;
+		close(disk->d_fd);
+		errno = error;
+		disk->d_fd = -1;
+		return -1;
+	}
+
+	/* Determine bdev size, block size, and offset. */
+	if (S_ISBLK(disk->d_sb.st_mode)) {
+		error = ioctl(disk->d_fd, BLKGETSIZE64, &disk->d_size);
+		if (error)
+			disk->d_size = 0;
+		error = ioctl(disk->d_fd, BLKBSZGET, &disk->d_blksize);
+		if (error)
+			disk->d_blksize = 0;
+#ifdef HAVE_HDIO_GETGEO
+		error = ioctl(disk->d_fd, HDIO_GETGEO, &bdgeo);
+		if (!error) {
+			/*
+			 * dm devices will pass through ioctls, which means
+			 * we can't use SCSI VERIFY unless the start is 0.
+			 * Most dm devices don't set geometry (unlike scsi
+			 * and nvme) so use a zeroed out CHS to screen them
+			 * out.
+			 */
+			if (bdgeo.start != 0 &&
+			    (unsigned long long)bdgeo.heads * bdgeo.sectors *
+					bdgeo.cylinders == 0)
+				suspicious_disk = true;
+			disk->d_start = bdgeo.start << BBSHIFT;
+		} else
+#endif
+			disk->d_start = 0;
+	} else {
+		disk->d_size = disk->d_sb.st_size;
+		disk->d_blksize = disk->d_sb.st_blksize;
+		disk->d_start = 0;
+	}
+
+	/* Can we issue SCSI VERIFY? */
+	if (!suspicious_disk && disk_can_scsi_verify(disk))
+		disk->d_flags |= DISK_FLAG_SCSI_VERIFY;
+
+	return 0;
+}
+
+/* Close a disk device. */
+int
+disk_close(
+	struct disk		*disk)
+{
+	int			error = 0;
+
+	if (disk->d_fd >= 0)
+		error = close(disk->d_fd);
+	disk->d_fd = -1;
+	return error;
+}
+
+/* Is this device open? */
+bool
+disk_is_open(
+	struct disk		*disk)
+{
+	return disk->d_fd >= 0;
+}
+
+#define BTOLBAT(d, bytes)	((uint64_t)(bytes) >> (d)->d_lbalog)
+#define LBASIZE(d)		(1ULL << (d)->d_lbalog)
+#define BTOLBA(d, bytes)	(((uint64_t)(bytes) + LBASIZE(d) - 1) >> (d)->d_lbalog)
+
+/* Read-verify an extent of a disk device. */
+ssize_t
+disk_read_verify(
+	struct disk		*disk,
+	void			*buf,
+	uint64_t		start,
+	uint64_t		length)
+{
+	/* Convert to logical block size. */
+	if (disk->d_flags & DISK_FLAG_SCSI_VERIFY)
+		return disk_scsi_verify(disk, BTOLBAT(disk, start),
+				BTOLBA(disk, length));
+
+	return pread(disk->d_fd, buf, length, start);
+}
diff --git a/scrub/disk.h b/scrub/disk.h
new file mode 100644
index 0000000..8930075
--- /dev/null
+++ b/scrub/disk.h
@@ -0,0 +1,41 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef DISK_H_
+#define DISK_H_
+
+#define DISK_FLAG_SCSI_VERIFY	0x1
+struct disk {
+	struct stat	d_sb;
+	int		d_fd;
+	int		d_lbalog;
+	unsigned int	d_flags;
+	unsigned int	d_blksize;	/* bytes */
+	uint64_t	d_size;		/* bytes */
+	uint64_t	d_start;	/* bytes */
+};
+
+unsigned int disk_heads(struct disk *disk);
+bool disk_is_open(struct disk *disk);
+int disk_open(const char *pathname, struct disk *disk);
+int disk_close(struct disk *disk);
+ssize_t disk_read_verify(struct disk *disk, void *buf, uint64_t start,
+		uint64_t length);
+
+#endif /* DISK_H_ */
diff --git a/scrub/iocmd.c b/scrub/iocmd.c
new file mode 100644
index 0000000..de99e1b
--- /dev/null
+++ b/scrub/iocmd.c
@@ -0,0 +1,412 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "libxfs.h"
+#include <linux/fiemap.h>
+#include <sys/statvfs.h>
+#include <sys/types.h>
+#include <dirent.h>
+#include <sys/xattr.h>
+#include "../repair/threads.h"
+#include "disk.h"
+#include "scrub.h"
+#include "iocmd.h"
+
+#define NR_EXTENTS	512
+
+/* Scan a filesystem tree. */
+struct scan_fs_tree {
+	unsigned int		nr_dirs;
+	pthread_mutex_t		lock;
+	pthread_cond_t		wakeup;
+	struct stat		root_sb;
+	bool			moveon;
+	bool			(*dir_fn)(struct scrub_ctx *, const char *,
+					  int, void *);
+	bool			(*dirent_fn)(struct scrub_ctx *, const char *,
+					     int, struct dirent *,
+					     struct stat *, void *);
+	void			*arg;
+};
+
+/* Per-work-item scan context. */
+struct scan_fs_tree_dir {
+	char			*path;
+	struct scan_fs_tree	*sft;
+	bool			rootdir;
+};
+
+/* Scan a directory sub tree. */
+static void
+scan_fs_dir(
+	struct work_queue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->mp;
+	struct scan_fs_tree_dir	*sftd = arg;
+	struct scan_fs_tree	*sft = sftd->sft;
+	DIR			*dir;
+	struct dirent		*dirent;
+	char			newpath[PATH_MAX];
+	struct scan_fs_tree_dir	*new_sftd;
+	struct stat		sb;
+	int			dir_fd;
+	int			error;
+
+	/* Open the directory. */
+	dir_fd = open(sftd->path, O_RDONLY | O_NOATIME | O_NOFOLLOW | O_NOCTTY);
+	if (dir_fd < 0) {
+		if (errno != ENOENT)
+			str_errno(ctx, sftd->path);
+		goto out;
+	}
+
+	/* Caller-specific directory checks. */
+	if (sft->dir_fn && !sft->dir_fn(ctx, sftd->path, dir_fd, sft->arg)) {
+		sft->moveon = false;
+		goto out;
+	}
+
+	/* Caller-specific directory entry function on the rootdir. */
+	if (sftd->rootdir) {
+		/* Get the stat info for this directory entry. */
+		error = fstat(dir_fd, &sb);
+		if (error) {
+			str_errno(ctx, sftd->path);
+			goto out;
+		}
+		if (!sft->dirent_fn(ctx, sftd->path, dir_fd, NULL, &sb,
+				sft->arg)) {
+			sft->moveon = false;
+			goto out;
+		}
+	}
+
+	/* Iterate the directory entries. */
+	dir = fdopendir(dir_fd);
+	if (!dir) {
+		str_errno(ctx, sftd->path);
+		goto out;
+	}
+	rewinddir(dir);
+	for (dirent = readdir(dir); dirent != NULL; dirent = readdir(dir)) {
+		snprintf(newpath, PATH_MAX, "%s/%s", sftd->path,
+				dirent->d_name);
+
+		/* Get the stat info for this directory entry. */
+		error = fstatat(dir_fd, dirent->d_name, &sb,
+				AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW);
+		if (error) {
+			str_errno(ctx, newpath);
+			continue;
+		}
+
+		/* Ignore files on other filesystems. */
+		if (sb.st_dev != sft->root_sb.st_dev)
+			continue;
+
+		/* Caller-specific directory entry function. */
+		if (!sft->dirent_fn(ctx, newpath, dir_fd, dirent, &sb,
+				sft->arg)) {
+			sft->moveon = false;
+			break;
+		}
+
+		if (xfs_scrub_excessive_errors(ctx)) {
+			sft->moveon = false;
+			break;
+		}
+
+		/* If directory, call ourselves recursively. */
+		if (S_ISDIR(sb.st_mode) && strcmp(".", dirent->d_name) &&
+		    strcmp("..", dirent->d_name)) {
+			new_sftd = malloc(sizeof(struct scan_fs_tree_dir));
+			if (!new_sftd) {
+				str_errno(ctx, newpath);
+				sft->moveon = false;
+				break;
+			}
+			new_sftd->path = strdup(newpath);
+			new_sftd->sft = sft;
+			new_sftd->rootdir = false;
+			pthread_mutex_lock(&sft->lock);
+			sft->nr_dirs++;
+			pthread_mutex_unlock(&sft->lock);
+			queue_work(wq, scan_fs_dir, 0, new_sftd);
+		}
+	}
+
+	/* Close dir, go away. */
+	error = closedir(dir);
+	if (error)
+		str_errno(ctx, sftd->path);
+
+out:
+	pthread_mutex_lock(&sft->lock);
+	sft->nr_dirs--;
+	if (sft->nr_dirs == 0)
+		pthread_cond_signal(&sft->wakeup);
+	pthread_mutex_unlock(&sft->lock);
+
+	free(sftd->path);
+	free(sftd);
+}
+
+/* Scan the entire filesystem. */
+bool
+scan_fs_tree(
+	struct scrub_ctx	*ctx,
+	bool			(*dir_fn)(struct scrub_ctx *, const char *,
+					  int, void *),
+	bool			(*dirent_fn)(struct scrub_ctx *, const char *,
+						int, struct dirent *,
+						struct stat *, void *),
+	void			*arg)
+{
+	struct work_queue	wq;
+	struct scan_fs_tree	sft;
+	struct scan_fs_tree_dir	*sftd;
+
+	sft.moveon = true;
+	sft.nr_dirs = 1;
+	sft.root_sb = ctx->mnt_sb;
+	sft.dir_fn = dir_fn;
+	sft.dirent_fn = dirent_fn;
+	sft.arg = arg;
+	pthread_mutex_init(&sft.lock, NULL);
+	pthread_cond_init(&sft.wakeup, NULL);
+
+	sftd = malloc(sizeof(struct scan_fs_tree_dir));
+	if (!sftd) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+	sftd->path = strdup(ctx->mntpoint);
+	sftd->sft = &sft;
+	sftd->rootdir = true;
+
+	create_work_queue(&wq, (struct xfs_mount *)ctx, scrub_nproc(ctx));
+	queue_work(&wq, scan_fs_dir, 0, sftd);
+
+	pthread_mutex_lock(&sft.lock);
+	while (sft.nr_dirs > 0)
+		pthread_cond_wait(&sft.wakeup, &sft.lock);
+	pthread_mutex_unlock(&sft.lock);
+	destroy_work_queue(&wq);
+
+	return sft.moveon;
+}
+
+/* Check an inode's extents... the hard way. */
+static bool
+fibmap(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	int			fd,
+	bool			(*fn)(struct scrub_ctx *, const char *,
+				      struct fiemap_extent *, void *),
+	void			*arg)
+{
+	struct stat		sb;
+	struct fiemap_extent	extent = {0};
+	unsigned int		blk;
+	unsigned int		b;
+	unsigned int		blksz;
+	unsigned long long	physical;
+	off_t			numblocks;
+	bool			moveon = true;
+	int			error;
+
+	assert(scrub_has_fibmap(ctx));
+
+	error = fstat(fd, &sb);
+	if (error) {
+		str_errno(ctx, descr);
+		return false;
+	}
+
+	blksz = ctx->datadev.d_blksize;
+	numblocks = (sb.st_size + blksz - 1) / blksz;
+	if (numblocks > UINT_MAX)
+		numblocks = UINT_MAX;
+	extent.fe_flags = FIEMAP_EXTENT_MERGED;
+	for (blk = 0; blk < numblocks; blk++) {
+		b = blk;
+		error = ioctl(fd, FIBMAP, &b);
+		if (error) {
+			if (errno == EOPNOTSUPP || errno == EINVAL) {
+				str_warn(ctx, descr,
+_("data block FIEMAP/FIBMAP not supported, will not check extent map."));
+				ctx->quirks &= ~SCRUB_QUIRK_FIBMAP_WORKS;
+				return true;
+			}
+			str_errno(ctx, descr);
+			continue;
+		}
+
+		physical = (unsigned long long)b * blksz;
+		if (extent.fe_length > 0 &&
+		    physical == extent.fe_physical + extent.fe_length) {
+			/* Physically contiguous, just merge. */
+			extent.fe_length += blksz;
+		} else {
+			/* Emit extent if there is one. */
+			if (extent.fe_length > 0) {
+				moveon = fn(ctx, descr, &extent, arg);
+				if (!moveon)
+					break;
+			}
+			if (physical == 0) {
+				/* b == 0 means a hole... */
+				extent.fe_length = 0;
+			} else {
+				/* Start a new extent. */
+				extent.fe_physical = physical;
+				extent.fe_logical = (unsigned long long)blk * blksz;
+				extent.fe_length = blksz;
+			}
+		}
+
+		if (xfs_scrub_excessive_errors(ctx)) {
+			moveon = false;
+			break;
+		}
+	}
+
+	/* If there's an extent left over, emit it. */
+	if (moveon && extent.fe_length > 0) {
+		extent.fe_flags |= FIEMAP_EXTENT_LAST;
+		moveon = fn(ctx, descr, &extent, arg);
+	}
+
+	return moveon;
+}
+
+/* Call the FIEMAP ioctl on a file. */
+bool
+fiemap(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	int			fd,
+	bool			attr_fork,
+	bool			use_fibmap,
+	bool			(*fn)(struct scrub_ctx *, const char *,
+				      struct fiemap_extent *, void *),
+	void			*arg)
+{
+	struct fiemap		*fiemap;
+	struct fiemap_extent	*extent;
+	size_t			sz;
+	__u64			next_logical;
+	bool			moveon = true;
+	bool			last = false;
+	unsigned int		i;
+	int			error;
+
+	assert(attr_fork || (scrub_has_fiemap(ctx) || scrub_has_fibmap(ctx)));
+	assert(!attr_fork || scrub_has_fiemap_attr(ctx));
+
+	if (!attr_fork && !scrub_has_fiemap(ctx))
+		return use_fibmap ? fibmap(ctx, descr, fd, fn, arg) : false;
+	else if (attr_fork && !scrub_has_fiemap_attr(ctx))
+		return true;
+
+	sz = sizeof(struct fiemap) + sizeof(struct fiemap_extent) * NR_EXTENTS;
+	fiemap = calloc(1, sz);
+	if (!fiemap) {
+		str_errno(ctx, descr);
+		return false;
+	}
+
+	fiemap->fm_length = ~0ULL;
+	fiemap->fm_flags = FIEMAP_FLAG_SYNC;
+	if (attr_fork)
+		fiemap->fm_flags |= FIEMAP_FLAG_XATTR;
+	fiemap->fm_extent_count = NR_EXTENTS;
+	fiemap->fm_reserved = 0;
+	next_logical = 0;
+
+	while (!last) {
+		fiemap->fm_start = next_logical;
+		error = ioctl(fd, FS_IOC_FIEMAP, (unsigned long)fiemap);
+		if (error < 0 && (errno == EOPNOTSUPP || errno == EBADR)) {
+			if (attr_fork) {
+				str_warn(ctx, descr,
+_("extended attribute FIEMAP not supported, will not check extent map."));
+				ctx->quirks &= ~SCRUB_QUIRK_FIEMAP_ATTR_WORKS;
+			} else {
+				ctx->quirks &= ~SCRUB_QUIRK_FIEMAP_WORKS;
+			}
+			break;
+		}
+		if (error < 0) {
+			str_errno(ctx, descr);
+			break;
+		}
+
+		/* No more extents to map, exit */
+		if (!fiemap->fm_mapped_extents)
+			break;
+
+		for (i = 0; i < fiemap->fm_mapped_extents; i++) {
+			extent = &fiemap->fm_extents[i];
+
+			moveon = fn(ctx, descr, extent, arg);
+			if (!moveon)
+				goto out;
+
+			if (xfs_scrub_excessive_errors(ctx)) {
+				moveon = false;
+				goto out;
+			}
+
+			next_logical = extent->fe_logical + extent->fe_length;
+			if (extent->fe_flags & FIEMAP_EXTENT_LAST)
+				last = true;
+		}
+	}
+
+out:
+	free(fiemap);
+	return moveon;
+}
+
+#ifndef FITRIM
+struct fstrim_range {
+	__u64 start;
+	__u64 len;
+	__u64 minlen;
+};
+#define FITRIM		_IOWR('X', 121, struct fstrim_range)	/* Trim */
+#endif
+
+/* Call FITRIM to trim all the unused space in a filesystem. */
+void
+fstrim(
+	struct scrub_ctx	*ctx)
+{
+	struct fstrim_range	range = {0};
+	int			error;
+
+	range.len = ULLONG_MAX;
+	error = ioctl(ctx->mnt_fd, FITRIM, &range);
+	if (error && errno != EOPNOTSUPP && errno != ENOTTY)
+		perror(_("fstrim"));
+}
diff --git a/scrub/iocmd.h b/scrub/iocmd.h
new file mode 100644
index 0000000..047e5fc
--- /dev/null
+++ b/scrub/iocmd.h
@@ -0,0 +1,50 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef IOCMD_H_
+#define IOCMD_H_
+
+struct fiemap_extent;
+
+bool
+scan_fs_tree(
+	struct scrub_ctx	*ctx,
+	bool			(*dir_fn)(struct scrub_ctx *, const char *,
+					  int, void *),
+	bool			(*dirent_fn)(struct scrub_ctx *, const char *,
+						int, struct dirent *,
+						struct stat *, void *),
+	void			*arg);
+
+bool
+fiemap(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	int			fd,
+	bool			attr_fork,
+	bool			use_fibmap,
+	bool			(*fn)(struct scrub_ctx *, const char *,
+				      struct fiemap_extent *, void *),
+	void			*arg);
+
+void
+fstrim(
+	struct scrub_ctx	*ctx);
+
+#endif /* IOCMD_H_ */
diff --git a/scrub/read_verify.c b/scrub/read_verify.c
new file mode 100644
index 0000000..7a9994e
--- /dev/null
+++ b/scrub/read_verify.c
@@ -0,0 +1,314 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "libxfs.h"
+#include <sys/statvfs.h>
+#include <sys/types.h>
+#include <dirent.h>
+#include "disk.h"
+#include "scrub.h"
+#include "../repair/threads.h"
+#include "read_verify.h"
+
+/* How many bytes have we verified? */
+static pthread_mutex_t		verified_lock = PTHREAD_MUTEX_INITIALIZER;
+static unsigned long long	verified_bytes;
+
+/* Tolerate 64k holes in adjacent read verify requests. */
+#define IO_BATCH_LOCALITY	(65536)
+
+/* Create a thread pool to run read verifiers. */
+bool
+read_verify_pool_init(
+	struct read_verify_pool		*rvp,
+	struct scrub_ctx		*ctx,
+	void				*readbuf,
+	size_t				readbufsz,
+	size_t				min_io_sz,
+	read_verify_ioend_fn_t		ioend_fn,
+	unsigned int			nproc)
+{
+	rvp->rvp_readbuf = readbuf;
+	rvp->rvp_readbufsz = readbufsz;
+	rvp->rvp_ctx = ctx;
+	rvp->rvp_min_io_size = min_io_sz;
+	rvp->ioend_fn = ioend_fn;
+	rvp->rvp_nproc = nproc;
+	create_work_queue(&rvp->rvp_wq, (struct xfs_mount *)rvp, nproc);
+	return true;
+}
+
+/* How many bytes has this process verified? */
+unsigned long long
+read_verify_bytes(void)
+{
+	return verified_bytes;
+}
+
+/* Finish up any read verification work and tear it down. */
+void
+read_verify_pool_destroy(
+	struct read_verify_pool		*rvp)
+{
+	destroy_work_queue(&rvp->rvp_wq);
+	memset(&rvp->rvp_wq, 0, sizeof(struct work_queue));
+}
+
+/*
+ * Issue a read-verify IO in big batches.
+ */
+static void
+read_verify(
+	struct work_queue		*wq,
+	xfs_agnumber_t			agno,
+	void				*arg)
+{
+	struct read_verify		*rv = arg;
+	struct read_verify_pool		*rvp;
+	unsigned long long		verified = 0;
+	ssize_t				sz;
+	ssize_t				len;
+
+	rvp = (struct read_verify_pool *)wq->mp;
+	while (rv->io_length > 0) {
+		len = min(rv->io_length, rvp->rvp_readbufsz);
+		dbg_printf("diskverify %d %"PRIu64" %zd\n", rv->io_disk->d_fd,
+				rv->io_start, len);
+		sz = disk_read_verify(rv->io_disk, rvp->rvp_readbuf,
+				rv->io_start, len);
+		if (sz < 0) {
+			dbg_printf("IOERR %d %"PRIu64" %zd\n",
+					rv->io_disk->d_fd,
+					rv->io_start, len);
+			rvp->ioend_fn(rvp, rv->io_disk, rv->io_start,
+					rvp->rvp_min_io_size,
+					errno, rv->io_end_arg);
+			len = min(len, rvp->rvp_min_io_size);
+		}
+
+		verified += len;
+		rv->io_start += len;
+		rv->io_length -= len;
+	}
+
+	free(rv);
+	pthread_mutex_lock(&verified_lock);
+	verified_bytes += verified;
+	pthread_mutex_unlock(&verified_lock);
+}
+
+/* Queue a read verify request. */
+static void
+read_verify_queue(
+	struct read_verify_pool		*rvp,
+	struct read_verify		*rv)
+{
+	struct read_verify		*tmp;
+
+	dbg_printf("verify fd %d start %"PRIu64" len %"PRIu64"\n",
+			rv->io_disk->d_fd, rv->io_start, rv->io_length);
+
+	tmp = malloc(sizeof(struct read_verify));
+	if (!tmp) {
+		rvp->ioend_fn(rvp, rv->io_disk, rv->io_start, rv->io_length,
+				errno, rv->io_end_arg);
+		return;
+	}
+	*tmp = *rv;
+
+	queue_work(&rvp->rvp_wq, read_verify, 0, tmp);
+}
+
+/*
+ * Issue an IO request.  We'll batch subsequent requests if they're
+ * within 64k of each other.
+ */
+void
+read_verify_schedule(
+	struct read_verify_pool		*rvp,
+	struct read_verify		*rv,
+	struct disk			*disk,
+	uint64_t			start,
+	uint64_t			length,
+	void				*end_arg)
+{
+	uint64_t			ve_end;
+	uint64_t			io_end;
+
+	assert(rvp->rvp_readbuf);
+	ve_end = start + length;
+	io_end = rv->io_start + rv->io_length;
+
+	/*
+	 * If we have a stashed IO, we haven't changed fds, the error
+	 * reporting is the same, and the two extents are close,
+	 * we can combine them.
+	 */
+	if (rv->io_length > 0 && disk == rv->io_disk &&
+	    end_arg == rv->io_end_arg &&
+	    ((start >= rv->io_start && start <= io_end + IO_BATCH_LOCALITY) ||
+	     (rv->io_start >= start &&
+	      rv->io_start <= ve_end + IO_BATCH_LOCALITY))) {
+		rv->io_start = min(rv->io_start, start);
+		rv->io_length = max(ve_end, io_end) - rv->io_start;
+	} else  {
+		/* Otherwise, issue the stashed IO (if there is one) */
+		if (rv->io_length > 0)
+			read_verify_queue(rvp, rv);
+
+		/* Stash the new IO. */
+		rv->io_disk = disk;
+		rv->io_start = start;
+		rv->io_length = length;
+		rv->io_end_arg = end_arg;
+	}
+}
+
+/* Force any stashed IOs into the verifier. */
+void
+read_verify_force(
+	struct read_verify_pool		*rvp,
+	struct read_verify		*rv)
+{
+	assert(rvp->rvp_readbuf);
+	if (rv->io_length == 0)
+		return;
+
+	read_verify_queue(rvp, rv);
+	rv->io_length = 0;
+}
+
+/* Read all the data in a file. */
+bool
+read_verify_file(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	int			fd,
+	struct stat		*sb)
+{
+	off_t			data_end = 0;
+	off_t			data_start;
+	off_t			start;
+	ssize_t			sz;
+	size_t			count;
+	unsigned long long	verified = 0;
+	bool			reports_holes = true;
+	bool			direct_io = false;
+	bool			moveon = true;
+	int			flags;
+	int			error;
+
+	/*
+	 * Try to force the kernel to read file data from disk.  First
+	 * we try to set O_DIRECT.  If that fails, try to purge the page
+	 * cache.
+	 */
+	flags = fcntl(fd, F_GETFL);
+	error = fcntl(fd, F_SETFL, flags | O_DIRECT);
+	if (error)
+		posix_fadvise(fd, 0, sb->st_size, POSIX_FADV_DONTNEED);
+	else
+		direct_io = true;
+
+	/* See if SEEK_DATA/SEEK_HOLE work... */
+	data_start = lseek(fd, data_end, SEEK_DATA);
+	if (data_start < 0) {
+		/* ENXIO for SEEK_DATA means no file data anywhere. */
+		if (errno == ENXIO)
+			return true;
+		reports_holes = false;
+	}
+
+	if (reports_holes) {
+		data_end = lseek(fd, data_start, SEEK_HOLE);
+		if (data_end < 0)
+			reports_holes = false;
+	}
+
+	/* ...or just read everything if they don't. */
+	if (!reports_holes) {
+		data_start = 0;
+		data_end = sb->st_size;
+	}
+
+	if (!direct_io) {
+		posix_fadvise(fd, 0, sb->st_size, POSIX_FADV_SEQUENTIAL);
+		posix_fadvise(fd, 0, sb->st_size, POSIX_FADV_WILLNEED);
+	}
+	/* Read the non-hole areas. */
+	while (data_start < data_end) {
+		start = data_start;
+
+		if (direct_io && (start & (page_size - 1)))
+			start &= ~(page_size - 1);
+		count = min(IO_MAX_SIZE, data_end - start);
+		if (direct_io && (count & (page_size - 1)))
+			count = (count + page_size) & ~(page_size - 1);
+		sz = pread(fd, ctx->readbuf, count, start);
+		if (sz < 0) {
+			str_errno(ctx, descr);
+			break;
+		} else if (sz == 0) {
+			str_error(ctx, descr,
+_("Read zero bytes, expected %zu."),
+					count);
+			break;
+		} else if (sz != count && start + sz != data_end) {
+			str_warn(ctx, descr,
+_("Short read of %zd bytes, expected %zu."),
+					sz, count);
+		}
+		verified += sz;
+		data_start = start + sz;
+
+		if (xfs_scrub_excessive_errors(ctx)) {
+			moveon = false;
+			break;
+		}
+
+		if (data_start >= data_end && reports_holes) {
+			data_start = lseek(fd, data_end, SEEK_DATA);
+			if (data_start < 0) {
+				if (errno != ENXIO)
+					str_errno(ctx, descr);
+				break;
+			}
+			data_end = lseek(fd, data_start, SEEK_HOLE);
+			if (data_end < 0) {
+				if (errno != ENXIO)
+					str_errno(ctx, descr);
+				break;
+			}
+		}
+	}
+
+	/* Turn off O_DIRECT. */
+	if (direct_io) {
+		flags = fcntl(fd, F_GETFL);
+		error = fcntl(fd, F_SETFL, flags & ~O_DIRECT);
+		if (error)
+			str_errno(ctx, descr);
+	}
+
+	pthread_mutex_lock(&verified_lock);
+	verified_bytes += verified;
+	pthread_mutex_unlock(&verified_lock);
+
+	return moveon;
+}
diff --git a/scrub/read_verify.h b/scrub/read_verify.h
new file mode 100644
index 0000000..a10ba8c
--- /dev/null
+++ b/scrub/read_verify.h
@@ -0,0 +1,59 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef READ_VERIFY_H_
+#define READ_VERIFY_H_
+
+struct read_verify_pool;
+
+typedef void (*read_verify_ioend_fn_t)(struct read_verify_pool *rvp,
+		struct disk *disk, uint64_t start, uint64_t length,
+		int error, void *arg);
+typedef void (*read_verify_ioend_arg_free_fn_t)(void *arg);
+
+struct read_verify_pool {
+	struct work_queue	rvp_wq;
+	struct scrub_ctx	*rvp_ctx;
+	void			*rvp_readbuf;
+	read_verify_ioend_fn_t	ioend_fn;
+	read_verify_ioend_arg_free_fn_t	ioend_arg_free_fn;
+	size_t			rvp_readbufsz;		/* bytes */
+	size_t			rvp_min_io_size;	/* bytes */
+	int			rvp_nproc;
+};
+
+bool read_verify_pool_init(struct read_verify_pool *rvp, struct scrub_ctx *ctx,
+		void *readbuf, size_t readbufsz, size_t min_io_sz,
+		read_verify_ioend_fn_t ioend_fn, unsigned int nproc);
+void read_verify_pool_destroy(struct read_verify_pool *rvp);
+
+struct read_verify {
+	void			*io_end_arg;
+	struct disk		*io_disk;
+	uint64_t		io_start;	/* bytes */
+	uint64_t		io_length;	/* bytes */
+};
+
+void read_verify_schedule(struct read_verify_pool *rvp, struct read_verify *rv,
+		struct disk *disk, uint64_t start, uint64_t length,
+		void *end_arg);
+void read_verify_force(struct read_verify_pool *rvp, struct read_verify *rv);
+unsigned long long read_verify_bytes(void);
+
+#endif /* READ_VERIFY_H_ */
diff --git a/scrub/scrub.c b/scrub/scrub.c
new file mode 100644
index 0000000..5ed6559
--- /dev/null
+++ b/scrub/scrub.c
@@ -0,0 +1,1055 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "libxfs.h"
+#include <stdio.h>
+#include <mntent.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+#include <sys/statvfs.h>
+#include <sys/vfs.h>
+#include <fcntl.h>
+#include <dirent.h>
+#include "disk.h"
+#include "scrub.h"
+#include "../repair/threads.h"
+#include "read_verify.h"
+
+#define _PATH_PROC_MOUNTS	"/proc/mounts"
+
+bool				verbose;
+int				debug;
+bool				scrub_data;
+bool				dumpcore;
+bool				display_rusage;
+long				page_size;
+int				nr_threads = -1;
+enum errors_action		error_action = ERRORS_CONTINUE;
+static unsigned long		max_errors;
+
+static void __attribute__((noreturn))
+usage(void)
+{
+	fprintf(stderr, _("Usage: %s [OPTIONS] mountpoint\n"), progname);
+	fprintf(stderr, _("-a:\tStop after this many errors are found.\n"));
+	fprintf(stderr, _("-d:\tRun program in debug mode.\n"));
+	fprintf(stderr, _("-e:\tWhat to do if errors are found.\n"));
+	fprintf(stderr, _("-j:\tStart no more than this many threads.\n"));
+	fprintf(stderr, _("-m:\tPath to /etc/mtab.\n"));
+	fprintf(stderr, _("-n:\tDry run.  Do not modify anything.\n"));
+	fprintf(stderr, _("-t:\tUse this filesystem backend for scrubbing.\n"));
+	fprintf(stderr, _("-T:\tDisplay timing/usage information.\n"));
+	fprintf(stderr, _("-v:\tVerbose output.\n"));
+	fprintf(stderr, _("-V:\tPrint version.\n"));
+	fprintf(stderr, _("-x:\tScrub file data too.\n"));
+	fprintf(stderr, _("-y:\tRepair all errors.\n"));
+
+	exit(16);
+}
+
+/*
+ * Check if the argument is either the device name or mountpoint of a mounted
+ * filesystem.
+ */
+static bool
+find_mountpoint_check(
+	struct stat		*sb,
+	struct mntent		*t)
+{
+	struct stat		ms;
+
+	if (S_ISDIR(sb->st_mode)) {		/* mount point */
+		if (stat(t->mnt_dir, &ms) < 0)
+			return false;
+		if (sb->st_ino != ms.st_ino)
+			return false;
+		if (sb->st_dev != ms.st_dev)
+			return false;
+		/*
+		 * Since we can handle non-XFS filesystems, we don't
+		 * need to check that the device is accessible.
+		 * (The xfs_fsr version of this function does care.)
+		 */
+	} else {				/* device */
+		if (stat(t->mnt_fsname, &ms) < 0)
+			return false;
+		if (sb->st_rdev != ms.st_rdev)
+			return false;
+		/*
+		 * Make sure the mountpoint given by mtab is accessible
+		 * before using it.
+		 */
+		if (stat(t->mnt_dir, &ms) < 0)
+			return false;
+	}
+
+	return true;
+}
+
+/* Check that our alleged mountpoint is in mtab */
+static bool
+find_mountpoint(
+	char			*mtab,
+	struct scrub_ctx	*ctx)
+{
+	struct mntent_cursor	cursor;
+	struct mntent		*t = NULL;
+	bool			found = false;
+
+	if (platform_mntent_open(&cursor, mtab) != 0) {
+		fprintf(stderr, _("Error: can't get mntent entries.\n"));
+		exit(1);
+	}
+
+	while ((t = platform_mntent_next(&cursor)) != NULL) {
+		/*
+		 * Keep jotting down matching mount details; newer mounts are
+		 * towards the end of the file (hopefully).
+		 */
+		if (find_mountpoint_check(&ctx->mnt_sb, t)) {
+			ctx->mntpoint = strdup(t->mnt_dir);
+			ctx->mnt_type = strdup(t->mnt_type);
+			ctx->blkdev = strdup(t->mnt_fsname);
+			found = true;
+		}
+	}
+	platform_mntent_close(&cursor);
+	return found;
+}
+
+/* Too many errors? Bail out. */
+bool
+xfs_scrub_excessive_errors(
+	struct scrub_ctx	*ctx)
+{
+	bool			ret;
+
+	pthread_mutex_lock(&ctx->lock);
+	ret = max_errors > 0 && ctx->errors_found >= max_errors;
+	pthread_mutex_unlock(&ctx->lock);
+
+	return ret;
+}
+
+/* Get the name of the repair tool. */
+const char *
+repair_tool(
+	struct scrub_ctx	*ctx)
+{
+	if (ctx->ops->repair_tool)
+		return ctx->ops->repair_tool;
+
+	return "fsck";
+}
+
+/* Print a string and whatever error is stored in errno. */
+void
+__str_errno(
+	struct scrub_ctx	*ctx,
+	const char		*str,
+	const char		*file,
+	int			line)
+{
+	char			buf[DESCR_BUFSZ];
+
+	pthread_mutex_lock(&ctx->lock);
+	fprintf(stderr, "%s: %s.", str, strerror_r(errno, buf, DESCR_BUFSZ));
+	if (debug)
+		fprintf(stderr, " (%s line %d)", file, line);
+	fprintf(stderr, "\n");
+	ctx->errors_found++;
+	pthread_mutex_unlock(&ctx->lock);
+}
+
+/* Print a string and some error text. */
+void
+__str_error(
+	struct scrub_ctx	*ctx,
+	const char		*str,
+	const char		*file,
+	int			line,
+	const char		*format,
+	...)
+{
+	va_list			args;
+
+	pthread_mutex_lock(&ctx->lock);
+	fprintf(stderr, "%s: ", str);
+	va_start(args, format);
+	vfprintf(stderr, format, args);
+	va_end(args);
+	if (debug)
+		fprintf(stderr, " (%s line %d)", file, line);
+	fprintf(stderr, "\n");
+	ctx->errors_found++;
+	pthread_mutex_unlock(&ctx->lock);
+}
+
+/* Print a string and some warning text. */
+void
+__str_warn(
+	struct scrub_ctx	*ctx,
+	const char		*str,
+	const char		*file,
+	int			line,
+	const char		*format,
+	...)
+{
+	va_list			args;
+
+	pthread_mutex_lock(&ctx->lock);
+	fprintf(stderr, "%s: ", str);
+	va_start(args, format);
+	vfprintf(stderr, format, args);
+	va_end(args);
+	if (debug)
+		fprintf(stderr, " (%s line %d)", file, line);
+	fprintf(stderr, "\n");
+	ctx->warnings_found++;
+	pthread_mutex_unlock(&ctx->lock);
+}
+
+/* Print a string and some informational text. */
+void
+__str_info(
+	struct scrub_ctx	*ctx,
+	const char		*str,
+	const char		*file,
+	int			line,
+	const char		*format,
+	...)
+{
+	va_list			args;
+
+	pthread_mutex_lock(&ctx->lock);
+	fprintf(stdout, "%s: ", str);
+	va_start(args, format);
+	vfprintf(stdout, format, args);
+	va_end(args);
+	if (debug)
+		fprintf(stdout, " (%s line %d)", file, line);
+	fprintf(stdout, "\n");
+	fflush(stdout);
+	pthread_mutex_unlock(&ctx->lock);
+}
+
+/* Increment the repair count. */
+void
+__record_repair(
+	struct scrub_ctx	*ctx,
+	const char		*str,
+	const char		*file,
+	int			line,
+	const char		*format,
+	...)
+{
+	va_list			args;
+
+	pthread_mutex_lock(&ctx->lock);
+	fprintf(stderr, "%s: ", str);
+	va_start(args, format);
+	vfprintf(stderr, format, args);
+	va_end(args);
+	if (debug)
+		fprintf(stderr, " (%s line %d)", file, line);
+	fprintf(stderr, "\n");
+	ctx->repairs++;
+	pthread_mutex_unlock(&ctx->lock);
+}
+
+/* Increment the optimization (preening) count. */
+void
+__record_preen(
+	struct scrub_ctx	*ctx,
+	const char		*str,
+	const char		*file,
+	int			line,
+	const char		*format,
+	...)
+{
+	va_list			args;
+
+	pthread_mutex_lock(&ctx->lock);
+	if (debug || verbose) {
+		fprintf(stdout, "%s: ", str);
+		va_start(args, format);
+		vfprintf(stdout, format, args);
+		va_end(args);
+		if (debug)
+			fprintf(stdout, " (%s line %d)", file, line);
+		fprintf(stdout, "\n");
+		fflush(stdout);
+	}
+	ctx->preens++;
+	pthread_mutex_unlock(&ctx->lock);
+}
+
+/*
+ * FS-specific scrub operations profiles.
+ * generic_scrub_ops will be selected if nothing else is.
+ */
+static struct scrub_ops *scrub_impl[] = {
+	NULL
+};
+
+void __attribute__((noreturn))
+do_error(char const *msg, ...)
+{
+	va_list args;
+
+	fprintf(stderr, _("\nfatal error -- "));
+
+	va_start(args, msg);
+	vfprintf(stderr, msg, args);
+	if (dumpcore)
+		abort();
+	exit(1);
+}
+
+#define SCRUB_QUIRK_FNS(name, flagname) \
+bool \
+scrub_has_##name( \
+	struct scrub_ctx		*ctx) \
+{ \
+	return ctx->quirks & SCRUB_QUIRK_##flagname; \
+}
+SCRUB_QUIRK_FNS(fiemap,		FIEMAP_WORKS)
+SCRUB_QUIRK_FNS(fiemap_attr,	FIEMAP_ATTR_WORKS)
+SCRUB_QUIRK_FNS(fibmap,		FIBMAP_WORKS)
+SCRUB_QUIRK_FNS(shared_blocks,	SHARED_BLOCKS)
+SCRUB_QUIRK_FNS(unstable_inums,	UNSTABLE_INUM)
+
+/* How many threads to kick off? */
+unsigned int
+scrub_nproc(
+	struct scrub_ctx	*ctx)
+{
+	if (nr_threads < 0)
+		return ctx->nr_io_threads;
+	return min(ctx->nr_io_threads, nr_threads);
+}
+
+/* Decide if a value is within +/- (n/d) of a desired value. */
+bool
+within_range(
+	struct scrub_ctx	*ctx,
+	unsigned long long	value,
+	unsigned long long	desired,
+	unsigned long long	diff_threshold,
+	unsigned int		n,
+	unsigned int		d,
+	const char		*descr)
+{
+	assert(n < d);
+
+	/* Don't complain if difference does not exceed an absolute value. */
+	if (value < desired && desired - value < diff_threshold)
+		return true;
+	if (value > desired && value - desired < diff_threshold)
+		return true;
+
+	/* Complain if the difference exceeds a certain percentage. */
+	if (value < desired * (d - n) / d) {
+		str_warn(ctx, ctx->mntpoint,
+_("Found fewer %s than reported"), descr);
+		return false;
+	}
+	if (value > desired * (d + n) / d) {
+		str_warn(ctx, ctx->mntpoint,
+_("Found more %s than reported"), descr);
+		return false;
+	}
+	return true;
+}
+
+static double
+timeval_subtract(
+	struct timeval		*tv1,
+	struct timeval		*tv2)
+{
+	return ((tv1->tv_sec - tv2->tv_sec) +
+		((double) (tv1->tv_usec - tv2->tv_usec)) / 1000000);
+}
+
+/* Produce human readable disk space output. */
+double
+auto_space_units(
+	unsigned long long	bytes,
+	char			**units)
+{
+	if (debug > 1)
+		goto no_prefix;
+	if (bytes > (1ULL << 40)) {
+		*units = "TiB";
+		return (double)bytes / (1ULL << 40);
+	} else if (bytes > (1ULL << 30)) {
+		*units = "GiB";
+		return (double)bytes / (1ULL << 30);
+	} else if (bytes > (1ULL << 20)) {
+		*units = "MiB";
+		return (double)bytes / (1ULL << 20);
+	} else if (bytes > (1ULL << 10)) {
+		*units = "KiB";
+		return (double)bytes / (1ULL << 10);
+	} else {
+no_prefix:
+		*units = "B";
+		return bytes;
+	}
+}
+
+/* Produce human readable discrete number output. */
+double
+auto_units(
+	unsigned long long	number,
+	char			**units)
+{
+	if (debug > 1)
+		goto no_prefix;
+	if (number > 1000000000000ULL) {
+		*units = "T";
+		return number / 1000000000000.0;
+	} else if (number > 1000000000ULL) {
+		*units = "G";
+		return number / 1000000000.0;
+	} else if (number > 1000000ULL) {
+		*units = "M";
+		return number / 1000000.0;
+	} else if (number > 1000ULL) {
+		*units = "K";
+		return number / 1000.0;
+	} else {
+no_prefix:
+		*units = "";
+		return number;
+	}
+}
+
+/*
+ * Given a directory fd and (possibly) a dirent, open the file associated
+ * with the entry.  If the entry is null, just duplicate the dir_fd.
+ */
+int
+dirent_open(
+	int			dir_fd,
+	struct dirent		*dirent)
+{
+	if (!dirent)
+		return dup(dir_fd);
+	return openat(dir_fd, dirent->d_name,
+			O_RDONLY | O_NOATIME | O_NOFOLLOW | O_NOCTTY);
+}
+
+#ifndef RUSAGE_BOTH
+# define RUSAGE_BOTH		(-2)
+#endif
+
+/* Get resource usage for ourselves and all children. */
+int
+scrub_getrusage(
+	struct rusage		*usage)
+{
+	struct rusage		cusage;
+	int			err;
+
+	err = getrusage(RUSAGE_BOTH, usage);
+	if (!err)
+		return err;
+
+	err = getrusage(RUSAGE_SELF, usage);
+	if (err)
+		return err;
+
+	err = getrusage(RUSAGE_CHILDREN, &cusage);
+	if (err)
+		return err;
+
+	usage->ru_minflt += cusage.ru_minflt;
+	usage->ru_majflt += cusage.ru_majflt;
+	usage->ru_nswap += cusage.ru_nswap;
+	usage->ru_inblock += cusage.ru_inblock;
+	usage->ru_oublock += cusage.ru_oublock;
+	usage->ru_msgsnd += cusage.ru_msgsnd;
+	usage->ru_msgrcv += cusage.ru_msgrcv;
+	usage->ru_nsignals += cusage.ru_nsignals;
+	usage->ru_nvcsw += cusage.ru_nvcsw;
+	usage->ru_nivcsw += cusage.ru_nivcsw;
+	return 0;
+}
+
+struct phase_info {
+	struct rusage		ruse;
+	struct timeval		time;
+	unsigned long long	verified_bytes;
+	void			*brk_start;
+	const char		*tag;
+};
+
+/* Start tracking resource usage for a phase. */
+static bool
+phase_start(
+	struct phase_info	*pi,
+	const char		*tag,
+	const char		*descr)
+{
+	int			error;
+
+	error = scrub_getrusage(&pi->ruse);
+	if (error) {
+		perror(_("getrusage"));
+		return false;
+	}
+	pi->brk_start = sbrk(0);
+
+	error = gettimeofday(&pi->time, NULL);
+	if (error) {
+		perror(_("gettimeofday"));
+		return false;
+	}
+	pi->tag = tag;
+
+	pi->verified_bytes = read_verify_bytes();
+
+	if ((verbose || display_rusage) && descr) {
+		fprintf(stdout, _("%s%s\n"), pi->tag, descr);
+		fflush(stdout);
+	}
+	return true;
+}
+
+/* Report usage stats. */
+static bool
+phase_end(
+	struct phase_info	*pi)
+{
+	struct rusage		ruse_now;
+#ifdef HAVE_MALLINFO
+	struct mallinfo		mall_now;
+#endif
+	struct timeval		time_now;
+	double			dt;
+	unsigned long long	verified;
+	long			in, out;
+	long			io;
+	double			i, o, t;
+	double			din, dout, dtot;
+	char			*iu, *ou, *tu, *dinu, *doutu, *dtotu;
+	double			v, dv;
+	char			*vu, *dvu;
+	int			error;
+
+	if (!display_rusage)
+		return true;
+
+	error = gettimeofday(&time_now, NULL);
+	if (error) {
+		perror(_("gettimeofday"));
+		return false;
+	}
+	dt = timeval_subtract(&time_now, &pi->time);
+
+	error = scrub_getrusage(&ruse_now);
+	if (error) {
+		perror(_("getrusage"));
+		return false;
+	}
+
+#define kbytes(x)	(((unsigned long)(x) + 1023) / 1024)
+#ifdef HAVE_MALLINFO
+
+	mall_now = mallinfo();
+	fprintf(stdout, _("%sMemory used: %luk/%luk (%luk/%luk), "), pi->tag,
+		kbytes(mall_now.arena), kbytes(mall_now.hblkhd),
+		kbytes(mall_now.uordblks), kbytes(mall_now.fordblks));
+#else
+	fprintf(stdout, _("%sMemory used: %luk, "), pi->tag,
+		(unsigned long) kbytes(((char *) sbrk(0)) -
+				       ((char *) pi->brk_start)));
+#endif
+#undef kbytes
+
+	fprintf(stdout, _("time: %5.2f/%5.2f/%5.2fs\n"),
+		timeval_subtract(&time_now, &pi->time),
+		timeval_subtract(&ruse_now.ru_utime, &pi->ruse.ru_utime),
+		timeval_subtract(&ruse_now.ru_stime, &pi->ruse.ru_stime));
+
+	/* I/O usage */
+	in =  (ruse_now.ru_inblock - pi->ruse.ru_inblock) << BBSHIFT;
+	out = (ruse_now.ru_oublock - pi->ruse.ru_oublock) << BBSHIFT;
+	io = in + out;
+	if (io) {
+		i = auto_space_units(in, &iu);
+		o = auto_space_units(out, &ou);
+		t = auto_space_units(io, &tu);
+		din = auto_space_units(in / dt, &dinu);
+		dout = auto_space_units(out / dt, &doutu);
+		dtot = auto_space_units(io / dt, &dtotu);
+		fprintf(stdout,
+_("%sI/O: %.1f%s in, %.1f%s out, %.1f%s tot\n"),
+			pi->tag, i, iu, o, ou, t, tu);
+		fprintf(stdout,
+_("%sI/O rate: %.1f%s/s in, %.1f%s/s out, %.1f%s/s tot\n"),
+			pi->tag, din, dinu, dout, doutu, dtot, dtotu);
+	}
+
+	/* How many bytes were read-verified? */
+	verified = read_verify_bytes() - pi->verified_bytes;
+	if (verified) {
+		v = auto_space_units(verified, &vu);
+		dv = auto_space_units(verified / dt, &dvu);
+		fprintf(stdout, _("%sVerify: %.1f%s, rate: %.1f%s/s\n"),
+			pi->tag, v, vu, dv, dvu);
+	}
+	fflush(stdout);
+
+	return true;
+}
+
+/* Find filesystem geometry and perform any other setup functions. */
+static bool
+find_geo(
+	struct scrub_ctx	*ctx)
+{
+	bool			moveon;
+	int			error;
+
+	/*
+	 * Open the directory with O_NOATIME.  For mountpoints owned
+	 * by root, this should be sufficient to ensure that we have
+	 * CAP_SYS_ADMIN, which we probably need to do anything fancy
+	 * with the kernel's XFS driver.
+	 */
+	ctx->mnt_fd = open(ctx->mntpoint, O_RDONLY | O_NOATIME | O_DIRECTORY);
+	if (ctx->mnt_fd < 0) {
+		if (errno == EPERM)
+			str_info(ctx, ctx->mntpoint,
+_("Must be root to run scrub."));
+		else
+			str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+	error = disk_open(ctx->blkdev, &ctx->datadev);
+	if (error && errno != ENOENT)
+		str_errno(ctx, ctx->blkdev);
+
+	error = fstat(ctx->mnt_fd, &ctx->mnt_sb);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+	error = fstatvfs(ctx->mnt_fd, &ctx->mnt_sv);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+	error = fstatfs(ctx->mnt_fd, &ctx->mnt_sf);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+	if (disk_is_open(&ctx->datadev))
+		ctx->nr_io_threads = disk_heads(&ctx->datadev);
+	else
+		ctx->nr_io_threads = libxfs_nproc();
+	moveon = ctx->ops->scan_fs(ctx);
+	if (verbose) {
+		fprintf(stdout, _("%s: using %d threads to scrub.\n"),
+				ctx->mntpoint, scrub_nproc(ctx));
+		fflush(stdout);
+	}
+
+	return moveon;
+}
+
+struct scrub_phase {
+	char		*descr;
+	bool		(*fn)(struct scrub_ctx *);
+};
+
+/* Run the preening phase if there are no errors. */
+static bool
+preen(
+	struct scrub_ctx	*ctx)
+{
+	if (ctx->errors_found) {
+		str_info(ctx, ctx->mntpoint,
+_("Errors found, please re-run with -y."));
+		return true;
+	}
+
+	return ctx->ops->preen_fs(ctx);
+}
+
+/* Run all the phases of the scrubber. */
+#define REPAIR_DUMMY_FN		((void *)1)
+#define DATASCAN_DUMMY_FN	((void *)2)
+static bool
+run_scrub_phases(
+	struct scrub_ctx	*ctx)
+{
+	struct scrub_phase	phases[] = {
+		{_("Find filesystem geometry."),   find_geo},
+		{_("Check internal metadata."),	   ctx->ops->scan_metadata},
+		{_("Scan all inodes."),		   ctx->ops->scan_inodes},
+		{_("Check directory structure."),  ctx->ops->scan_fs_tree},
+		{NULL, REPAIR_DUMMY_FN},
+		{_("Verify data file integrity."), DATASCAN_DUMMY_FN},
+		{_("Check summary counters."),	   ctx->ops->check_summary},
+		{NULL, NULL},
+	};
+	struct phase_info	pi;
+	char			buf[DESCR_BUFSZ];
+	struct scrub_phase	*phase;
+	bool			moveon;
+	int			c;
+
+	/* Run all phases of the scrub tool. */
+	for (c = 1, phase = phases; phase->fn; phase++, c++) {
+		/* Inject the repair/preen function. */
+		if (phase->fn == REPAIR_DUMMY_FN) {
+			if (ctx->mode == SCRUB_MODE_PREEN) {
+				phase->descr = _("Preen filesystem.");
+				phase->fn = preen;
+			} else if (ctx->mode == SCRUB_MODE_REPAIR) {
+				phase->descr = _("Repair filesystem.");
+				phase->fn = ctx->ops->repair_fs;
+			}
+		} else if (phase->fn == DATASCAN_DUMMY_FN && scrub_data)
+			phase->fn = ctx->ops->scan_blocks;
+
+		if (phase->fn == REPAIR_DUMMY_FN ||
+		    phase->fn == DATASCAN_DUMMY_FN) {
+			c--;
+			continue;
+		} else if (phase->descr)
+			snprintf(buf, DESCR_BUFSZ, _("Phase %d: "), c);
+		else
+			buf[0] = 0;
+		moveon = phase_start(&pi, buf, phase->descr);
+		if (!moveon)
+			return false;
+		moveon = phase->fn(ctx);
+		if (!moveon)
+			return false;
+		moveon = phase_end(&pi);
+		if (!moveon)
+			return false;
+
+		/* Too many errors? */
+		if (xfs_scrub_excessive_errors(ctx))
+			return false;
+	}
+
+	return true;
+}
+
+/* Find an appropriate scrub backend. */
+static struct scrub_ops *
+find_ops(
+	const char		*mnt_type)
+{
+	struct scrub_ops	**ops;
+	struct scrub_ops	*op;
+	const char		*p;
+
+	for (ops = scrub_impl; *ops; ops++) {
+		op = *ops;
+		if (op->aliases) {
+			for (p = op->aliases; *p != 0; p += strlen(p) + 1) {
+				if (!strcmp(mnt_type, p))
+					return op;
+			}
+		}
+		if (!strcmp(mnt_type, op->name))
+			return op;
+	}
+
+	return NULL;
+}
+
+int
+main(
+	int			argc,
+	char			**argv)
+{
+	int			c;
+	char			*mtab = NULL;
+	struct scrub_ctx	ctx = {0};
+	struct phase_info	all_pi;
+	long			arg;
+	bool			ismnt;
+	bool			moveon = true;
+	static bool		injected;
+	int			ret;
+	int			error;
+
+	progname = basename(argv[0]);
+	setlocale(LC_ALL, "");
+	bindtextdomain(PACKAGE, LOCALEDIR);
+	textdomain(PACKAGE);
+
+	pthread_mutex_init(&ctx.lock, NULL);
+	ctx.datadev.d_fd = -1;
+	ctx.mode = SCRUB_MODE_DEFAULT;
+	while ((c = getopt(argc, argv, "a:de:j:m:nTt:vxVy")) != EOF) {
+		switch (c) {
+		case 'a':
+			max_errors = strtoull(optarg, NULL, 10);
+			if (errno) {
+				perror("max_errors");
+				usage();
+			}
+			break;
+		case 'd':
+			debug++;
+			dumpcore = true;
+			break;
+		case 'e':
+			if (!strcmp("continue", optarg))
+				error_action = ERRORS_CONTINUE;
+			else if (!strcmp("shutdown", optarg))
+				error_action = ERRORS_SHUTDOWN;
+			else
+				usage();
+			break;
+		case 'j':
+			arg = strtol(optarg, NULL, 10);
+			if (errno || arg < 0 || arg > INT_MAX) {
+				perror("nr_threads");
+				usage();
+			}
+			nr_threads = arg;
+			break;
+		case 'm':
+			mtab = optarg;
+			break;
+		case 'n':
+			if (ctx.mode != SCRUB_MODE_DEFAULT) {
+				fprintf(stderr,
+_("Only one of the options -n or -y may be specified.\n"));
+				return 1;
+			}
+			ctx.mode = SCRUB_MODE_DRY_RUN;
+			break;
+		case 't':
+			ctx.ops = find_ops(optarg);
+			break;
+		case 'T':
+			display_rusage = true;
+			break;
+		case 'v':
+			verbose = true;
+			break;
+		case 'x':
+			scrub_data = true;
+			break;
+		case 'V':
+			fprintf(stdout, _("%s version %s\n"), progname,
+					VERSION);
+			fflush(stdout);
+			exit(0);
+		case 'y':
+			if (ctx.mode != SCRUB_MODE_DEFAULT) {
+				fprintf(stderr,
+_("Only one of the options -n or -y may be specified.\n"));
+				return 1;
+			}
+			ctx.mode = SCRUB_MODE_REPAIR;
+			break;
+		case '?':
+			/* fall through */
+		default:
+			usage();
+		}
+	}
+
+	if (optind != argc - 1)
+		usage();
+
+	ctx.mntpoint = argv[optind];
+	if (!debug_tweak_on("XFS_SCRUB_NO_FIEMAP"))
+		ctx.quirks |= SCRUB_QUIRK_FIEMAP_WORKS |
+			      SCRUB_QUIRK_FIEMAP_ATTR_WORKS;
+	if (!debug_tweak_on("XFS_SCRUB_NO_FIBMAP"))
+		ctx.quirks |= SCRUB_QUIRK_FIBMAP_WORKS;
+
+	/* Find the mount record for the passed-in argument. */
+
+	if (stat(argv[optind], &ctx.mnt_sb) < 0) {
+		fprintf(stderr,
+			_("%s: could not stat: %s: %s\n"),
+			progname, argv[optind], strerror(errno));
+		ret = 16;
+		goto end;
+	}
+
+	/*
+	 * If the user did not specify an explicit mount table, try to use
+	 * /proc/mounts if it is available, else /etc/mtab.  We prefer
+	 * /proc/mounts because it is kernel controlled, while /etc/mtab
+	 * may contain garbage that userspace tools like pam_mounts wrote
+	 * into it.
+	 */
+	if (!mtab) {
+		if (access(_PATH_PROC_MOUNTS, R_OK) == 0)
+			mtab = _PATH_PROC_MOUNTS;
+		else
+			mtab = _PATH_MOUNTED;
+	}
+
+	ismnt = find_mountpoint(mtab, &ctx);
+	if (!ismnt) {
+		fprintf(stderr, _("%s: Not a mount point or block device.\n"),
+			ctx.mntpoint);
+		ret = 16;
+		goto end;
+	}
+
+	/* Find an appropriate scrub backend. */
+	if (!ctx.ops)
+		ctx.ops = find_ops(ctx.mnt_type);
+	if (verbose) {
+		fprintf(stdout,
+			_("%s: scrubbing %s filesystem with %s driver.\n"),
+			ctx.mntpoint, ctx.mnt_type, ctx.ops->name);
+		fflush(stdout);
+	}
+
+	/* Initialize overall phase stats. */
+	moveon = phase_start(&all_pi, "", NULL);
+	if (!moveon)
+		goto out;
+
+	/*
+	 * Does our backend support shutting down, if the user
+	 * wants errors=shutdown?
+	 */
+	if (error_action == ERRORS_SHUTDOWN && ctx.ops->shutdown_fs == NULL) {
+		fprintf(stderr,
+_("%s: %s driver does not support error shutdown!\n"),
+			ctx.mntpoint, ctx.ops->name);
+		goto out;
+	}
+
+	/* Does our backend support preen, if the user so requests? */
+	if (ctx.mode == SCRUB_MODE_PREEN && ctx.ops->preen_fs == NULL) {
+		fprintf(stderr,
+_("%s: %s driver does not support preening filesystem!\n"),
+			ctx.mntpoint, ctx.ops->name);
+		goto out;
+	}
+
+	/* Does our backend support repair, if the user so requests? */
+	if (ctx.mode == SCRUB_MODE_REPAIR && ctx.ops->repair_fs == NULL) {
+		fprintf(stderr,
+_("%s: %s driver does not support repairing filesystem!\n"),
+			ctx.mntpoint, ctx.ops->name);
+		goto out;
+	}
+
+	/* Set up a page-aligned buffer for read verification. */
+	page_size = sysconf(_SC_PAGESIZE);
+	if (page_size < 0) {
+		str_errno(&ctx, ctx.mntpoint);
+		goto out;
+	}
+
+	/* Try to allocate a read buffer if we don't have one. */
+	error = posix_memalign((void **)&ctx.readbuf, page_size,
+			IO_MAX_SIZE);
+	if (error || !ctx.readbuf) {
+		str_error(&ctx, ctx.mntpoint, "%s", strerror(error));
+		goto out;
+	}
+
+	/* Flush everything out to disk before we start. */
+	error = syncfs(ctx.mnt_fd);
+	if (error) {
+		str_errno(&ctx, ctx.mntpoint);
+		goto out;
+	}
+
+	if (debug_tweak_on("XFS_SCRUB_FORCE_REPAIR") && !injected) {
+		ctx.mode = SCRUB_MODE_REPAIR;
+		injected = true;
+	}
+
+	/* Scrub a filesystem. */
+	moveon = run_scrub_phases(&ctx);
+	if (!moveon)
+		goto out;
+
+out:
+	if (xfs_scrub_excessive_errors(&ctx))
+		str_info(&ctx, ctx.mntpoint, _("Too many errors; aborting."));
+
+	ret = 0;
+	if (!moveon)
+		ret |= 8;
+
+	/* Clean up scan data. */
+	moveon = ctx.ops->cleanup(&ctx);
+	if (!moveon)
+		ret |= 8;
+
+	if (ctx.repairs && ctx.preens)
+		fprintf(stdout,
+_("%s: %lu repairs and %lu optimizations made.\n"),
+			ctx.mntpoint, ctx.repairs, ctx.preens);
+	else if (ctx.repairs && ctx.preens == 0)
+		fprintf(stdout,
+_("%s: %lu repairs made.\n"),
+			ctx.mntpoint, ctx.repairs);
+	else if (ctx.repairs == 0 && ctx.preens)
+		fprintf(stdout,
+_("%s: %lu optimizations made.\n"),
+			ctx.mntpoint, ctx.preens);
+
+	if (ctx.errors_found && ctx.warnings_found)
+		fprintf(stderr,
+_("%s: %lu errors and %lu warnings found.  Unmount and run %s.\n"),
+			ctx.mntpoint, ctx.errors_found, ctx.warnings_found,
+			repair_tool(&ctx));
+	else if (ctx.errors_found && ctx.warnings_found == 0)
+		fprintf(stderr,
+_("%s: %lu errors found.  Unmount and run %s.\n"),
+			ctx.mntpoint, ctx.errors_found, repair_tool(&ctx));
+	else if (ctx.errors_found == 0 && ctx.warnings_found)
+		fprintf(stderr,
+_("%s: %lu warnings found.\n"),
+			ctx.mntpoint, ctx.warnings_found);
+	if (ctx.errors_found) {
+		if (error_action == ERRORS_SHUTDOWN)
+			ctx.ops->shutdown_fs(&ctx);
+		ret |= 4;
+	}
+	phase_end(&all_pi);
+	close(ctx.mnt_fd);
+	disk_close(&ctx.datadev);
+
+	free(ctx.blkdev);
+	free(ctx.readbuf);
+	free(ctx.mntpoint);
+	free(ctx.mnt_type);
+end:
+	return ret;
+}
diff --git a/scrub/scrub.h b/scrub/scrub.h
new file mode 100644
index 0000000..442371b
--- /dev/null
+++ b/scrub/scrub.h
@@ -0,0 +1,192 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef SCRUB_H_
+#define SCRUB_H_
+
+#define DESCR_BUFSZ		256
+
+/*
+ * Perform all IO in 32M chunks.  This cannot exceed 65536 sectors
+ * because that's the biggest SCSI VERIFY(16) we dare to send.
+ */
+#define IO_MAX_SIZE		33554432
+#define IO_MAX_SECTORS		(IO_MAX_SIZE >> BBSHIFT)
+
+struct scrub_ctx;
+
+struct scrub_ops {
+	const char	*name;
+	const char	*repair_tool;
+	const char	*aliases; /* null-separated string, end w/ two nulls */
+	bool (*cleanup)(struct scrub_ctx *ctx);
+	bool (*scan_fs)(struct scrub_ctx *ctx);
+	bool (*scan_inodes)(struct scrub_ctx *ctx);
+	bool (*check_dir)(struct scrub_ctx *ctx, const char *descr, int dir_fd);
+	bool (*check_inode)(struct scrub_ctx *ctx, const char *descr, int fd,
+			    struct stat *sb);
+	bool (*scan_extents)(struct scrub_ctx *ctx, const char *descr, int fd,
+			     struct stat *sb, bool attr_fork);
+	bool (*scan_xattrs)(struct scrub_ctx *ctx, const char *descr, int fd);
+	bool (*scan_special_xattrs)(struct scrub_ctx *ctx, const char *path);
+	bool (*scan_metadata)(struct scrub_ctx *ctx);
+	bool (*check_summary)(struct scrub_ctx *ctx);
+	bool (*scan_blocks)(struct scrub_ctx *ctx);
+	bool (*read_file)(struct scrub_ctx *ctx, const char *descr, int fd,
+			  struct stat *sb);
+	bool (*scan_fs_tree)(struct scrub_ctx *ctx);
+	bool (*preen_fs)(struct scrub_ctx *ctx);
+	bool (*repair_fs)(struct scrub_ctx *ctx);
+	void (*shutdown_fs)(struct scrub_ctx *ctx);
+};
+
+enum scrub_mode {
+	SCRUB_MODE_DRY_RUN,
+	SCRUB_MODE_PREEN,
+	SCRUB_MODE_REPAIR,
+};
+#define SCRUB_MODE_DEFAULT			SCRUB_MODE_PREEN
+
+#define SCRUB_QUIRK_FIEMAP_WORKS	(1UL << 0)
+#define SCRUB_QUIRK_FIEMAP_ATTR_WORKS	(1UL << 1)
+#define SCRUB_QUIRK_FIBMAP_WORKS	(1UL << 2)
+#define SCRUB_QUIRK_SHARED_BLOCKS	(1UL << 3)
+/* dirent/stat inode numbers do not match */
+#define SCRUB_QUIRK_UNSTABLE_INUM	(1UL << 4)
+
+bool scrub_has_fiemap(struct scrub_ctx *ctx);
+bool scrub_has_fiemap_attr(struct scrub_ctx *ctx);
+bool scrub_has_fibmap(struct scrub_ctx *ctx);
+bool scrub_has_shared_blocks(struct scrub_ctx *ctx);
+bool scrub_has_unstable_inums(struct scrub_ctx *ctx);
+
+struct scrub_ctx {
+	/* Immutable scrub state. */
+	struct scrub_ops	*ops;
+	char			*mntpoint;
+	char			*blkdev;
+	char			*mnt_type;
+	void			*readbuf;
+	int			mnt_fd;
+	enum scrub_mode		mode;
+	unsigned int		nr_io_threads;
+	struct disk		datadev;
+	struct stat		mnt_sb;
+	struct statvfs		mnt_sv;
+	struct statfs		mnt_sf;
+
+	/* Mutable scrub state; use lock. */
+	pthread_mutex_t		lock;
+	unsigned long		errors_found;
+	unsigned long		warnings_found;
+	unsigned long		repairs;
+	unsigned long		preens;
+	unsigned long		quirks;
+
+	void			*priv;
+};
+
+enum errors_action {
+	ERRORS_CONTINUE,
+	ERRORS_SHUTDOWN,
+};
+
+extern bool			verbose;
+extern int			debug;
+extern bool			scrub_data;
+extern long			page_size;
+extern enum errors_action	error_action;
+extern int			nr_threads;
+
+bool xfs_scrub_excessive_errors(struct scrub_ctx *ctx);
+
+void __str_errno(struct scrub_ctx *, const char *, const char *, int);
+void __str_error(struct scrub_ctx *, const char *, const char *, int,
+		 const char *, ...);
+void __str_warn(struct scrub_ctx *, const char *, const char *, int,
+		const char *, ...);
+void __str_info(struct scrub_ctx *, const char *, const char *, int,
+		const char *, ...);
+void __record_repair(struct scrub_ctx *, const char *, const char *, int,
+		const char *, ...);
+void __record_preen(struct scrub_ctx *, const char *, const char *, int,
+		const char *, ...);
+
+#define str_errno(ctx, str)		__str_errno(ctx, str, __FILE__, __LINE__)
+#define str_error(ctx, str, ...)	__str_error(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
+#define str_warn(ctx, str, ...)		__str_warn(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
+#define str_info(ctx, str, ...)		__str_info(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
+#define record_repair(ctx, str, ...)	__record_repair(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
+#define record_preen(ctx, str, ...)	__record_preen(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
+#define dbg_printf(fmt, ...)		do {if (debug > 1) {printf(fmt, __VA_ARGS__);}} while (0)
+
+#ifndef container_of
+# define container_of(ptr, type, member) ({			\
+	const typeof( ((type *)0)->member ) *__mptr = (ptr);	\
+		(type *)( (char *)__mptr - offsetof(type,member) );})
+#endif
+
+/* Is this debug tweak enabled? */
+static inline bool
+debug_tweak_on(
+	const char		*name)
+{
+	return debug && getenv(name) != NULL;
+}
+
+/* Generic implementations of the ops functions */
+bool generic_cleanup(struct scrub_ctx *ctx);
+bool generic_scan_fs(struct scrub_ctx *ctx);
+bool generic_scan_inodes(struct scrub_ctx *ctx);
+bool generic_check_dir(struct scrub_ctx *ctx, const char *descr, int dir_fd);
+bool generic_check_inode(struct scrub_ctx *ctx, const char *descr, int fd,
+			 struct stat *sb);
+bool generic_scan_extents(struct scrub_ctx *ctx, const char *descr, int fd,
+			  struct stat *sb, bool attr_fork);
+bool generic_scan_xattrs(struct scrub_ctx *ctx, const char *descr, int fd);
+bool generic_scan_special_xattrs(struct scrub_ctx *ctx, const char *path);
+bool generic_scan_metadata(struct scrub_ctx *ctx);
+bool generic_check_summary(struct scrub_ctx *ctx);
+bool read_verify_file(struct scrub_ctx *ctx, const char *descr, int fd,
+		      struct stat *sb);
+bool generic_scan_blocks(struct scrub_ctx *ctx);
+bool generic_scan_fs_tree(struct scrub_ctx *ctx);
+bool generic_preen_fs(struct scrub_ctx *ctx);
+
+/* Miscellaneous utility functions */
+unsigned int scrub_nproc(struct scrub_ctx *ctx);
+bool generic_check_directory(struct scrub_ctx *ctx, const char *descr,
+		int *pfd);
+bool within_range(struct scrub_ctx *ctx, unsigned long long value,
+		unsigned long long desired, unsigned long long diff_threshold,
+		unsigned int n, unsigned int d, const char *descr);
+double auto_space_units(unsigned long long bytes, char **units);
+double auto_units(unsigned long long number, char **units);
+const char *repair_tool(struct scrub_ctx *ctx);
+int dirent_open(int dir_fd, struct dirent *dirent);
+
+#ifndef HAVE_SYNCFS
+static inline int syncfs(int fd)
+{
+	sync();
+	return 0;
+}
+#endif
+
+#endif /* SCRUB_H_ */
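
The aliases field of struct scrub_ops above is documented as a
null-separated string that ends with two nulls, and find_ops() walks it
with strlen().  As a minimal sketch of that convention, the lookup could
be isolated as shown here.  Note that alias_matches() is a hypothetical
helper and the "ext3\0ext4\0" list in the note below is made up for
illustration; neither is part of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not in the patch): return true if fs_type
 * matches any entry of a null-separated alias list.  The list is
 * terminated by an empty string, i.e. two consecutive NUL bytes.
 */
bool
alias_matches(
	const char		*aliases,
	const char		*fs_type)
{
	const char		*p;

	assert(fs_type != NULL);

	/* A backend without aliases matches nothing here. */
	if (!aliases)
		return false;

	/* Step over each entry and its NUL until the empty entry. */
	for (p = aliases; *p != '\0'; p += strlen(p) + 1) {
		if (!strcmp(fs_type, p))
			return true;
	}
	return false;
}
```

A caller would pass a literal such as "ext3\0ext4\0".  A C string
literal already carries one implicit trailing NUL, so a single explicit
"\0" after the last alias is enough to produce the double-null
terminator that the loop relies on.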


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 14/17] xfs_scrub: add generic VFS scrubber implementation
  2017-01-21  8:08 [PATCH v5 00/17] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (12 preceding siblings ...)
  2017-01-21  8:09 ` [PATCH 13/17] xfs_scrub: create online filesystem scrub program Darrick J. Wong
@ 2017-01-21  8:09 ` Darrick J. Wong
  2017-01-21  8:09 ` [PATCH 16/17] xfs_scrub: add tweaks for specific non-XFS filesystems Darrick J. Wong
  2017-01-21  8:10 ` [PATCH 17/17] xfs_scrub: create a script to scrub all xfs filesystems Darrick J. Wong
  15 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2017-01-21  8:09 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, linux-fsdevel

First, add a generic scrubber implementation that uses standard VFS
interfaces to walk as much of a filesystem's metadata as possible.  We
can use this to scrub non-XFS filesystems, though the primary intent is
to provide a fallback for the XFS driver should some features be
unavailable.
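
As a rough illustration of what "standard VFS interfaces" means here,
the strategy comment in generic.c says each dirent's inode number and
type code are compared against the fstatat output.  A minimal sketch of
that comparison follows; mode_to_dtype() and dirent_agrees() are
hypothetical names chosen for this example and are not the functions
used by the patch, and treating DT_UNKNOWN as "nothing to compare" is an
assumption.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: map an st_mode file type to a d_type code. */
unsigned char
mode_to_dtype(
	mode_t			mode)
{
	switch (mode & S_IFMT) {
	case S_IFREG:	return DT_REG;
	case S_IFDIR:	return DT_DIR;
	case S_IFLNK:	return DT_LNK;
	case S_IFBLK:	return DT_BLK;
	case S_IFCHR:	return DT_CHR;
	case S_IFIFO:	return DT_FIFO;
	case S_IFSOCK:	return DT_SOCK;
	default:	return DT_UNKNOWN;
	}
}

/*
 * Hypothetical check: do the inode number and type code taken from a
 * directory entry agree with the stat data for the same name?
 */
bool
dirent_agrees(
	ino_t			d_ino,
	unsigned char		d_type,
	const struct stat	*sb)
{
	assert(sb != NULL);

	if (d_ino != sb->st_ino)
		return false;

	/* Assumption: DT_UNKNOWN means the fs reported no type. */
	if (d_type != DT_UNKNOWN && d_type != mode_to_dtype(sb->st_mode))
		return false;
	return true;
}
```

In a real walk the stat data would come from fstatat() on the directory
fd with AT_SYMLINK_NOFOLLOW, and a mismatch would only be reported on
filesystems that are not flagged with SCRUB_QUIRK_UNSTABLE_INUM.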

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile  |    2 
 scrub/generic.c | 1152 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/scrub.c   |    2 
 scrub/scrub.h   |    2 
 4 files changed, 1156 insertions(+), 2 deletions(-)
 create mode 100644 scrub/generic.c


diff --git a/scrub/Makefile b/scrub/Makefile
index 2aa359b..4639eed 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -13,7 +13,7 @@ INSTALL_SCRUB = install-scrub
 endif
 
 HFILES = scrub.h ../repair/threads.h read_verify.h iocmd.h
-CFILES = ../repair/avl64.c disk.c bitmap.c iocmd.c \
+CFILES = ../repair/avl64.c disk.c bitmap.c generic.c iocmd.c \
 	 read_verify.c scrub.c ../repair/threads.c
 
 LLDLIBS += $(LIBBLKID) $(LIBXFS) $(LIBXCMD) $(LIBUUID) $(LIBRT) $(LIBPTHREAD) $(LIBHANDLE)
diff --git a/scrub/generic.c b/scrub/generic.c
new file mode 100644
index 0000000..dbde103
--- /dev/null
+++ b/scrub/generic.c
@@ -0,0 +1,1152 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "libxfs.h"
+#include <linux/fiemap.h>
+#include <sys/statvfs.h>
+#include <sys/types.h>
+#include <dirent.h>
+#include <sys/xattr.h>
+#include "disk.h"
+#include "scrub.h"
+#include "iocmd.h"
+#include "../repair/threads.h"
+#include "read_verify.h"
+#include "bitmap.h"
+
+/*
+ * Generic Filesystem Scrub Strategy
+ *
+ * For a generic filesystem, we can only scrub the filesystem using the
+ * generic VFS APIs that are accessible to userspace.  This requirement
+ * reduces the effectiveness of the scrub because we can only scrub that
+ * which we can find through the directory tree namespace -- we won't be
+ * able to examine open unlinked files or any directory subtree that is
+ * also a mountpoint.
+ *
+ * The "find geometry" phase collects statfs/statvfs information and
+ * opens file descriptors to the mountpoint.  If the filesystem has a
+ * block device, a file descriptor is opened to that as well.
+ *
+ * The VFS has no mechanism to scrub internal metadata or to iterate
+ * inodes by inode number, so those phases do nothing.
+ *
+ * The "check directory structure" phase walks the directory tree
+ * looking for inodes.  Each directory is processed separately by thread
+ * pool workers.  For each entry in a directory, we scrub the following
+ * pieces of metadata:
+ *
+ *     - The dirent inode number is compared against the fstatat output.
+ *     - The dirent type code is also checked against the fstatat type.
+ *     - If it's a symlink, the target is read but not validated.
+ *     - If the entry is not a file or directory, the extended
+ *       attribute names and values are read via llistxattr.
+ *     - If the entry points to a file or directory, open the inode.
+ *       If not, we're done with the entry.
+ *     - The inode stat buffer is re-checked.
+ *     - The extent maps for file data and extended attribute data are
+ *       checked.
+ *     - Extended attributes are read.
+ *
+ * The "verify data file integrity" phase re-walks the directory tree
+ * for files.  If the filesystem supports FIEMAP and we have the block
+ * device open, the data extents are read directly from disk.  This step
+ * is optimized by buffering the disk extents in a bitmap and using the
+ * bitmap to issue large IOs; if there are errors, those are recorded
+ * and cross-referenced against the metadata to identify the affected
+ * files with a second walk/FIEMAP run.  If FIEMAP is unavailable, we
+ * fall back to using SEEK_DATA and SEEK_HOLE to direct-read file
+ * contents.  If even that fails, we direct-read the entire file.
+ *
+ * In the "check summary counters" phase, we tally up the blocks and
+ * inodes we saw and compare that to the statfs output.  This gives the
+ * user a rough estimate of how thorough the scrub was.
+ */
+
+#ifndef SEEK_DATA
+# define SEEK_DATA	3	/* seek to the next data */
+#endif
+
+#ifndef SEEK_HOLE
+# define SEEK_HOLE	4	/* seek to the next hole */
+#endif
+
+/* Routines to translate bad physical extents into file paths and offsets. */
+
+/* Report if this extent overlaps a bad region. */
+static bool
+report_verify_inode_fiemap(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	struct fiemap_extent	*extent,
+	void			*arg)
+{
+	struct bitmap	*tree = arg;
+
+	/* Skip non-real/non-aligned extents. */
+	if (extent->fe_flags & (FIEMAP_EXTENT_UNKNOWN |
+				FIEMAP_EXTENT_DELALLOC |
+				FIEMAP_EXTENT_ENCODED |
+				FIEMAP_EXTENT_NOT_ALIGNED |
+				FIEMAP_EXTENT_UNWRITTEN))
+		return true;
+
+	if (!bitmap_has_extent(tree, extent->fe_physical,
+			extent->fe_length))
+		return true;
+
+	str_error(ctx, descr,
+_("offset %llu failed read verification."), extent->fe_logical);
+
+	return true;
+}
+
+/* Iterate the extent mappings of a file to report errors. */
+static bool
+report_verify_fd(
+	struct scrub_ctx		*ctx,
+	const char			*descr,
+	int				fd,
+	void				*arg)
+{
+	/* data fork */
+	fiemap(ctx, descr, fd, false, false, report_verify_inode_fiemap, arg);
+
+	/* attr fork */
+	fiemap(ctx, descr, fd, true, false, report_verify_inode_fiemap, arg);
+
+	return true;
+}
+
+/* Scan the inode associated with a directory entry. */
+static bool
+report_verify_dirent(
+	struct scrub_ctx	*ctx,
+	const char		*path,
+	int			dir_fd,
+	struct dirent		*dirent,
+	struct stat		*sb,
+	void			*arg)
+{
+	bool			moveon;
+	int			fd;
+
+	/* Ignore anything that is not a regular file. */
+	if (!S_ISREG(sb->st_mode))
+		return true;
+	/* Ignore . and .. */
+	if (dirent && (!strcmp(".", dirent->d_name) ||
+		       !strcmp("..", dirent->d_name)))
+		return true;
+
+	/* Open the file */
+	fd = dirent_open(dir_fd, dirent);
+	if (fd < 0)
+		return true;
+
+	/* Go find the badness. */
+	moveon = report_verify_fd(ctx, path, fd, arg);
+	if (moveon)
+		goto out;
+
+out:
+	close(fd);
+
+	return moveon;
+}
+
+/* Given bad extent lists for the data device, find bad files. */
+static bool
+report_verify_errors(
+	struct scrub_ctx	*ctx,
+	struct bitmap		*d_bad)
+{
+	/* Scan the directory tree to get file paths. */
+	return scan_fs_tree(ctx, NULL, report_verify_dirent, d_bad);
+}
+
+/* Phase 1 */
+bool
+generic_scan_fs(
+	struct scrub_ctx	*ctx)
+{
+	/* If there's no disk device, forget FIEMAP. */
+	if (!disk_is_open(&ctx->datadev))
+		ctx->quirks &= ~(SCRUB_QUIRK_FIEMAP_WORKS |
+				 SCRUB_QUIRK_FIEMAP_ATTR_WORKS |
+				 SCRUB_QUIRK_FIBMAP_WORKS);
+
+	return true;
+}
+
+bool
+generic_cleanup(
+	struct scrub_ctx	*ctx)
+{
+	/* Nothing to do here. */
+	return true;
+}
+
+/* Phase 2 */
+bool
+generic_scan_metadata(
+	struct scrub_ctx	*ctx)
+{
+	/* Nothing to do here. */
+	return true;
+}
+
+/* Phase 3 */
+bool
+generic_scan_inodes(
+	struct scrub_ctx	*ctx)
+{
+	/* Nothing to do here. */
+	return true;
+}
+
+/* Phase 4 */
+
+/* Check all entries in a directory. */
+bool
+generic_check_dir(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	int			dir_fd)
+{
+	/* Nothing to do here. */
+	return true;
+}
+
+/* Check an extent for problems. */
+static bool
+check_fiemap_extent(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	struct fiemap_extent	*extent,
+	void			*arg)
+{
+	unsigned long long	eofs;
+
+	if (!disk_is_open(&ctx->datadev))
+		return true;
+	eofs = ctx->datadev.d_size;
+
+	if (extent->fe_length == 0)
+		str_error(ctx, descr,
+_("extent (%llu/%llu/%llu) has zero length."),
+			extent->fe_physical,
+			extent->fe_logical,
+			extent->fe_length);
+	if (extent->fe_physical > eofs)
+		str_error(ctx, descr,
+_("extent (%llu/%llu/%llu) starts past end of filesystem at %llu."),
+			extent->fe_physical,
+			extent->fe_logical,
+			extent->fe_length,
+			eofs);
+	if (extent->fe_physical + extent->fe_length > eofs ||
+	    extent->fe_physical + extent->fe_length < extent->fe_physical)
+		str_error(ctx, descr,
+_("extent (%llu/%llu/%llu) ends past end of filesystem at %llu."),
+			extent->fe_physical,
+			extent->fe_logical,
+			extent->fe_length,
+			eofs);
+	if (extent->fe_logical + extent->fe_length < extent->fe_logical)
+		str_error(ctx, descr,
+_("extent (%llu/%llu/%llu) overflows file offset."),
+			extent->fe_physical,
+			extent->fe_logical,
+			extent->fe_length);
+	return true;
+}
+
+/* Check an inode's extents. */
+bool
+generic_scan_extents(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	int			fd,
+	struct stat		*sb,
+	bool			attr_fork)
+{
+	/* FIEMAP only works for files. */
+	if (!S_ISREG(sb->st_mode))
+		return true;
+
+	/* Don't invoke FIEMAP if we don't support it. */
+	if (attr_fork && !scrub_has_fiemap_attr(ctx))
+		return true;
+	if (!attr_fork && !(scrub_has_fiemap(ctx) || scrub_has_fibmap(ctx)))
+		return true;
+
+	return fiemap(ctx, descr, fd, attr_fork, true,
+			check_fiemap_extent, NULL);
+}
+
+/* Check the fields of an inode. */
+bool
+generic_check_inode(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	int			fd,
+	struct stat		*sb)
+{
+	if (sb->st_nlink == 0)
+		str_error(ctx, descr,
+_("nlinks should not be 0."));
+
+	return true;
+}
+
+/* Does this file have extended attributes? */
+bool
+file_has_xattrs(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	int			fd)
+{
+	ssize_t			buf_sz;
+
+	buf_sz = flistxattr(fd, NULL, 0);
+	if (buf_sz == 0)
+		return false;
+	else if (buf_sz < 0) {
+		if (errno == EOPNOTSUPP || errno == ENODATA)
+			return false;
+		str_errno(ctx, descr);
+		return false;
+	}
+
+	return true;
+}
+
+/* Try to read all the extended attributes. */
+bool
+generic_scan_xattrs(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	int			fd)
+{
+	char			*buf = NULL;
+	char			*p;
+	ssize_t			buf_sz;
+	ssize_t			sz;
+	ssize_t			val_sz;
+	ssize_t			sz2;
+	bool			moveon = true;
+
+	buf_sz = flistxattr(fd, NULL, 0);
+	if (buf_sz == 0)
+		return true;
+	else if (buf_sz < 0) {
+		if (errno == EOPNOTSUPP || errno == ENODATA)
+			return true;
+		str_errno(ctx, descr);
+		return true;
+	}
+
+	buf = malloc(buf_sz);
+	if (!buf) {
+		str_errno(ctx, descr);
+		return false;
+	}
+
+	sz = flistxattr(fd, buf, buf_sz);
+	if (sz < 0) {
+		str_errno(ctx, descr);
+		goto out;
+	} else if (sz != buf_sz) {
+		str_error(ctx, descr,
+_("read %zd bytes of xattr names, expected %zd bytes."),
+				sz, buf_sz);
+	}
+
+	/* Read all the attrs and values. */
+	for (p = buf; p < buf + sz; p += strlen(p) + 1) {
+		val_sz = fgetxattr(fd, p, NULL, 0);
+		if (val_sz < 0) {
+			if (errno != EOPNOTSUPP && errno != ENODATA)
+				str_errno(ctx, descr);
+			continue;
+		}
+		sz2 = fgetxattr(fd, p, ctx->readbuf, val_sz);
+		if (sz2 < 0) {
+			str_errno(ctx, descr);
+			continue;
+		} else if (sz2 != val_sz)
+			str_error(ctx, descr,
+_("read %zd bytes from xattr %s value, expected %zd bytes."),
+					sz2, p, val_sz);
+	}
+out:
+	free(buf);
+	return moveon;
+}
+
+/* Try to read all the extended attributes of things that have no fd. */
+bool
+generic_scan_special_xattrs(
+	struct scrub_ctx	*ctx,
+	const char		*path)
+{
+	char			*buf = NULL;
+	char			*p;
+	ssize_t			buf_sz;
+	ssize_t			sz;
+	ssize_t			val_sz;
+	ssize_t			sz2;
+	bool			moveon = true;
+
+	buf_sz = llistxattr(path, NULL, 0);
+	if (buf_sz == 0)
+		return true;
+	else if (buf_sz < 0) {
+		if (errno == EOPNOTSUPP || errno == ENODATA)
+			return true;
+		str_errno(ctx, path);
+		return true;
+	}
+
+	buf = malloc(buf_sz);
+	if (!buf) {
+		str_errno(ctx, path);
+		return false;
+	}
+
+	sz = llistxattr(path, buf, buf_sz);
+	if (sz < 0) {
+		str_errno(ctx, path);
+		goto out;
+	} else if (sz != buf_sz) {
+		str_error(ctx, path,
+_("read %zd bytes of xattr names, expected %zd bytes."),
+				sz, buf_sz);
+	}
+
+	/* Read all the attrs and values. */
+	for (p = buf; p < buf + sz; p += strlen(p) + 1) {
+		val_sz = lgetxattr(path, p, NULL, 0);
+		if (val_sz < 0) {
+			str_errno(ctx, path);
+			continue;
+		}
+		sz2 = lgetxattr(path, p, ctx->readbuf, val_sz);
+		if (sz2 < 0) {
+			str_errno(ctx, path);
+			continue;
+		} else if (sz2 != val_sz)
+			str_error(ctx, path,
+_("read %zd bytes from xattr %s value, expected %zd bytes."),
+					sz2, p, val_sz);
+
+		if (xfs_scrub_excessive_errors(ctx)) {
+			moveon = false;
+			break;
+		}
+	}
+out:
+	free(buf);
+	return moveon;
+}
+
+/* Directory checking */
+#define CHECK_TYPE(type) \
+	case DT_##type: \
+		if (!S_IS##type(sb->st_mode)) { \
+			str_error(ctx, descr, \
+_("dirent d_type does not match inode mode 0x%x"), \
+				sb->st_mode & S_IFMT); \
+		} \
+		break;
+
+/* Ensure that the directory entry matches the stat info. */
+static bool
+generic_verify_dirent(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	struct dirent		*dirent,
+	struct stat		*sb)
+{
+	if (!scrub_has_unstable_inums(ctx) && dirent->d_ino != sb->st_ino) {
+		str_error(ctx, descr,
+_("inode numbers (%llu != %llu) do not match!"),
+			(unsigned long long)dirent->d_ino,
+			(unsigned long long)sb->st_ino);
+	}
+
+	switch (dirent->d_type) {
+	case DT_UNKNOWN:
+		break;
+	CHECK_TYPE(BLK)
+	CHECK_TYPE(CHR)
+	CHECK_TYPE(DIR)
+	CHECK_TYPE(FIFO)
+	CHECK_TYPE(LNK)
+	CHECK_TYPE(REG)
+	CHECK_TYPE(SOCK)
+	}
+
+	return true;
+}
+#undef CHECK_TYPE
+
+/* Scan the inode associated with a directory entry. */
+static bool
+check_dirent(
+	struct scrub_ctx	*ctx,
+	const char		*path,
+	int			dir_fd,
+	struct dirent		*dirent,
+	struct stat		*sb,
+	void			*arg)
+{
+	struct stat		fd_sb;
+	static char		linkbuf[PATH_MAX + 1];
+	ssize_t			len;
+	bool			moveon;
+	int			fd;
+	int			error;
+
+	/* No dirent for the rootdir; skip it. */
+	if (!dirent)
+		return true;
+
+	/* Check the directory entry itself. */
+	moveon = generic_verify_dirent(ctx, path, dirent, sb);
+	if (!moveon)
+		return moveon;
+
+	/* If symlink, read the target value. */
+	if (S_ISLNK(sb->st_mode)) {
+		len = readlinkat(dir_fd, dirent->d_name, linkbuf,
+				PATH_MAX);
+		if (len < 0)
+			str_errno(ctx, path);
+		else if (len > sb->st_size)
+			str_error(ctx, path,
+_("read %zd bytes from a %lld byte symlink?"),
+				len, (long long)sb->st_size);
+	}
+
+	/* Read the xattrs without a file descriptor. */
+	if (S_ISSOCK(sb->st_mode) || S_ISFIFO(sb->st_mode) ||
+	    S_ISBLK(sb->st_mode) || S_ISCHR(sb->st_mode) ||
+	    S_ISLNK(sb->st_mode)) {
+		moveon = ctx->ops->scan_special_xattrs(ctx, path);
+		if (!moveon)
+			return moveon;
+	}
+
+	/* If not dir or file, move on to the next dirent. */
+	if (!S_ISDIR(sb->st_mode) && !S_ISREG(sb->st_mode))
+		return true;
+
+	/* Open the file */
+	fd = openat(dir_fd, dirent->d_name,
+			O_RDONLY | O_NOATIME | O_NOFOLLOW | O_NOCTTY);
+	if (fd < 0) {
+		if (errno != ENOENT)
+			str_errno(ctx, path);
+		return true;
+	}
+
+	/* Did the fstatat and the open race? */
+	if (fstat(fd, &fd_sb) < 0) {
+		str_errno(ctx, path);
+		goto close;
+	}
+	if (fd_sb.st_ino != sb->st_ino || fd_sb.st_dev != sb->st_dev)
+		str_warn(ctx, path,
+_("inode changed out from under us!"));
+
+	/* Check the inode. */
+	moveon = ctx->ops->check_inode(ctx, path, fd, &fd_sb);
+	if (!moveon)
+		goto close;
+
+	/* Scan the extent maps. */
+	moveon = ctx->ops->scan_extents(ctx, path, fd, &fd_sb, false);
+	if (!moveon)
+		goto close;
+	if (file_has_xattrs(ctx, path, fd)) {
+		moveon = ctx->ops->scan_extents(ctx, path, fd, &fd_sb, true);
+		if (!moveon)
+			goto close;
+	}
+
+	/* Read all the extended attributes. */
+	moveon = ctx->ops->scan_xattrs(ctx, path, fd);
+	if (!moveon)
+		goto close;
+
+close:
+	/* Close file. */
+	error = close(fd);
+	if (error)
+		str_errno(ctx, path);
+
+	return moveon;
+}
+
+/*
+ * Check all the entries in a directory.
+ */
+bool
+generic_check_directory(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	int			*pfd)
+{
+	struct stat		sb;
+	DIR			*dir;
+	struct dirent		*dirent;
+	bool			moveon = true;
+	int			fd = *pfd;
+	int			error;
+
+	/* Iterate the directory entries. */
+	dir = fdopendir(fd);
+	if (!dir) {
+		str_errno(ctx, descr);
+		return true;
+	}
+	rewinddir(dir);
+
+	/* Iterate every directory entry. */
+	for (dirent = readdir(dir);
+	     dirent != NULL;
+	     dirent = readdir(dir)) {
+		error = fstatat(fd, dirent->d_name, &sb,
+				AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW);
+		if (error) {
+			str_errno(ctx, descr);
+			break;
+		}
+
+		/* Ignore files on other filesystems. */
+		if (sb.st_dev != ctx->mnt_sb.st_dev)
+			continue;
+
+		/* Check the type codes. */
+		moveon = generic_verify_dirent(ctx, descr, dirent, &sb);
+		if (!moveon)
+			break;
+
+		if (xfs_scrub_excessive_errors(ctx)) {
+			moveon = false;
+			break;
+		}
+	}
+
+	/* Close dir, go away. */
+	error = closedir(dir);
+	if (error)
+		str_errno(ctx, descr);
+	*pfd = -1;
+	return moveon;
+}
+
+/* Adapter for the check_dir thing. */
+static bool
+check_dir(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	int			dir_fd,
+	void			*arg)
+{
+	return ctx->ops->check_dir(ctx, descr, dir_fd);
+}
+
+/* Traverse the directory tree. */
+bool
+generic_scan_fs_tree(
+	struct scrub_ctx	*ctx)
+{
+	return scan_fs_tree(ctx, check_dir, check_dirent, NULL);
+}
+
+/* Phase 5 */
+
+struct read_verify_files {
+	struct scrub_ctx	*ctx;
+	struct bitmap		good;		/* bytes */
+	struct bitmap		bad;		/* bytes */
+	struct read_verify_pool	rvp;
+	struct read_verify	rv;
+	bool			use_fiemap;
+};
+
+/* Handle an io error while read verifying an extent. */
+void
+read_verify_fiemap_ioerr(
+	struct read_verify_pool		*rvp,
+	struct disk			*disk,
+	uint64_t			start,
+	uint64_t			length,
+	int				error,
+	void				*arg)
+{
+	struct read_verify_files	*rvf = arg;
+
+	bitmap_add(&rvf->bad, start, length);
+}
+
+/* Check an extent for data integrity problems. */
+bool
+read_verify_fiemap_extent(
+	struct scrub_ctx		*ctx,
+	const char			*descr,
+	struct fiemap_extent		*extent,
+	void				*arg)
+{
+	struct read_verify_files	*rvf = arg;
+
+	/* Skip non-real/non-aligned extents. */
+	if (extent->fe_flags & (FIEMAP_EXTENT_UNKNOWN |
+				FIEMAP_EXTENT_DELALLOC |
+				FIEMAP_EXTENT_ENCODED |
+				FIEMAP_EXTENT_NOT_ALIGNED |
+				FIEMAP_EXTENT_UNWRITTEN))
+		return true;
+
+	return bitmap_add(&rvf->good, extent->fe_physical,
+			extent->fe_length);
+}
+
+/* Scan the inode associated with a directory entry. */
+static bool
+read_verify_dirent(
+	struct scrub_ctx		*ctx,
+	const char			*path,
+	int				dir_fd,
+	struct dirent			*dirent,
+	struct stat			*sb,
+	void				*arg)
+{
+	struct stat			fd_sb;
+	struct read_verify_files	*rvf = arg;
+	bool				moveon = true;
+	int				fd;
+	int				error;
+
+	/* If not file, move on to the next dirent. */
+	if (!S_ISREG(sb->st_mode))
+		return true;
+
+	/* Open the file */
+	fd = openat(dir_fd, dirent->d_name,
+			O_RDONLY | O_NOATIME | O_NOFOLLOW | O_NOCTTY);
+	if (fd < 0) {
+		if (errno != ENOENT)
+			str_errno(ctx, path);
+		return true;
+	}
+
+	/* Did the fstatat and the open race? */
+	if (fstat(fd, &fd_sb) < 0) {
+		str_errno(ctx, path);
+		goto close;
+	}
+	if (fd_sb.st_ino != sb->st_ino || fd_sb.st_dev != sb->st_dev)
+		str_warn(ctx, path,
+_("inode changed out from under us!"));
+
+	/*
+	 * Either record the file extent map data for one big push later,
+	 * or read the file data the regular way.
+	 */
+	if (rvf->use_fiemap)
+		moveon = fiemap(ctx, path, fd, false, false,
+				read_verify_fiemap_extent, rvf);
+	else
+		moveon = ctx->ops->read_file(ctx, path, fd, &fd_sb);
+	if (!moveon)
+		goto close;
+
+close:
+	/* Close file. */
+	error = close(fd);
+	if (error)
+		str_errno(ctx, path);
+
+	return moveon;
+}
+
+static bool
+schedule_read_verify(
+	uint64_t			start,
+	uint64_t			length,
+	void				*arg)
+{
+	struct read_verify_files	*rvf = arg;
+
+	read_verify_schedule(&rvf->rvp, &rvf->rv, &rvf->ctx->datadev,
+			start, length, rvf);
+	return true;
+}
+
+/* Can we FIEMAP every block in a file? */
+static bool
+can_fiemap_all_file_blocks(
+	struct scrub_ctx		*ctx)
+{
+	return disk_is_open(&ctx->datadev) &&
+		scrub_has_fiemap(ctx) && scrub_has_fiemap_attr(ctx);
+}
+
+/* Scan all the data blocks, using FIEMAP to figure out what to verify. */
+bool
+generic_scan_blocks(
+	struct scrub_ctx		*ctx)
+{
+	struct read_verify_files	rvf = {0};
+	bool				moveon;
+
+	if (!scrub_data)
+		return true;
+
+	rvf.ctx = ctx;
+
+	/* If FIEMAP is unavailable, just use regular file pread. */
+	if (!can_fiemap_all_file_blocks(ctx))
+		return scan_fs_tree(ctx, NULL, read_verify_dirent, &rvf);
+
+	rvf.use_fiemap = true;
+	moveon = bitmap_init(&rvf.good);
+	if (!moveon) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	moveon = bitmap_init(&rvf.bad);
+	if (!moveon) {
+		str_errno(ctx, ctx->mntpoint);
+		goto out_good;
+	}
+
+	/* Collect all the extent maps. */
+	moveon = scan_fs_tree(ctx, NULL, read_verify_dirent, &rvf);
+	if (!moveon)
+		goto out_bad;
+
+	/* Run all the IO in batches. */
+	moveon = read_verify_pool_init(&rvf.rvp, ctx, ctx->readbuf, IO_MAX_SIZE,
+			ctx->mnt_sf.f_frsize, read_verify_fiemap_ioerr,
+			disk_heads(&ctx->datadev));
+	if (!moveon)
+		goto out_bad;
+	moveon = bitmap_iterate(&rvf.good, schedule_read_verify, &rvf);
+	if (!moveon)
+		goto out_pool;
+	read_verify_force(&rvf.rvp, &rvf.rv);
+	read_verify_pool_destroy(&rvf.rvp);
+
+	/* Scan the whole dir tree to see what matches the bad extents. */
+	if (!bitmap_empty(&rvf.bad))
+		moveon = report_verify_errors(ctx, &rvf.bad);
+
+	bitmap_free(&rvf.bad);
+	bitmap_free(&rvf.good);
+	return moveon;
+
+out_pool:
+	read_verify_pool_destroy(&rvf.rvp);
+out_bad:
+	bitmap_free(&rvf.bad);
+out_good:
+	bitmap_free(&rvf.good);
+
+	return moveon;
+}
+
+/* Phase 6 */
+struct summary_counts {
+	pthread_mutex_t		lock;
+	struct bitmap	dext;
+	struct bitmap	inob;	/* inode bitmap */
+	unsigned long long	inodes;	/* number of inodes */
+	unsigned long long	bytes;	/* bytes used */
+};
+
+struct inode_fork_summary {
+	struct bitmap	*tree;
+	unsigned long long	bytes;
+};
+
+/* Record data block extents in a bitmap. */
+bool
+generic_record_inode_summary_fiemap(
+	struct scrub_ctx		*ctx,
+	const char			*descr,
+	struct fiemap_extent		*extent,
+	void				*arg)
+{
+	struct inode_fork_summary	*ifs = arg;
+
+	/* Skip non-real/non-aligned extents. */
+	if (extent->fe_flags & (FIEMAP_EXTENT_UNKNOWN |
+				FIEMAP_EXTENT_DELALLOC |
+				FIEMAP_EXTENT_ENCODED |
+				FIEMAP_EXTENT_NOT_ALIGNED))
+		return true;
+
+	bitmap_add(ifs->tree, extent->fe_physical, extent->fe_length);
+	ifs->bytes += extent->fe_length;
+
+	return true;
+}
+
+/* Record the presence of an inode and its block usage. */
+static bool
+generic_record_inode_summary(
+	struct scrub_ctx		*ctx,
+	const char			*descr,
+	int				dir_fd,
+	struct dirent			*dirent,
+	struct stat			*sb,
+	void				*arg)
+{
+	struct summary_counts		*summary = arg;
+	struct stat			fd_sb;
+	struct inode_fork_summary	ifs;
+	unsigned long long		bs_bytes;
+	int				fd;
+	bool				has;
+	bool				moveon = true;
+
+	if (dirent && (strcmp(dirent->d_name, ".") == 0 ||
+		       strcmp(dirent->d_name, "..") == 0))
+		return true;
+
+	/* Detect hardlinked files. */
+	moveon = bitmap_test_and_set(&summary->inob, sb->st_ino, &has);
+	if (!moveon)
+		return moveon;
+	if (has)
+		return true;
+
+	bs_bytes = sb->st_blocks << BBSHIFT;
+
+	/* Record the inode.  If it's not a file, record the data usage too. */
+	pthread_mutex_lock(&summary->lock);
+	summary->inodes++;
+
+	/*
+	 * We can use fiemap and dext to figure out the correct block usage
+	 * for files that might share blocks.  If any of those conditions
+	 * are not met (non-file, fs doesn't support reflink, fiemap doesn't
+	 * work) then we just assume that the inode is the sole owner of its
+	 * blocks and use that to calculate the block usage.
+	 */
+	if (!can_fiemap_all_file_blocks(ctx) || !scrub_has_shared_blocks(ctx) ||
+	    !S_ISREG(sb->st_mode)) {
+		summary->bytes += bs_bytes;
+		pthread_mutex_unlock(&summary->lock);
+		return true;
+	}
+	pthread_mutex_unlock(&summary->lock);
+
+	/* Open the file */
+	fd = dirent_open(dir_fd, dirent);
+	if (fd < 0) {
+		if (errno != ENOENT)
+			str_errno(ctx, descr);
+		return true;
+	}
+
+	/* Did the fstatat and the open race? */
+	if (fstat(fd, &fd_sb) < 0) {
+		str_errno(ctx, descr);
+		goto close;
+	}
+
+	if (fd_sb.st_ino != sb->st_ino || fd_sb.st_dev != sb->st_dev)
+		str_warn(ctx, descr,
+_("inode changed out from under us!"));
+
+	ifs.tree = &summary->dext;
+	ifs.bytes = 0;
+	moveon = fiemap(ctx, descr, fd, false, false,
+			generic_record_inode_summary_fiemap, &ifs);
+	if (!moveon)
+		goto out_nofiemap;
+	if (file_has_xattrs(ctx, descr, fd)) {
+		moveon = fiemap(ctx, descr, fd, true, false,
+				generic_record_inode_summary_fiemap, &ifs);
+		if (!moveon)
+			goto out_nofiemap;
+	}
+
+	/*
+	 * bs_bytes tracks the number of bytes assigned to this file
+	 * for data, xattrs, and block mapping metadata.  ifs.bytes tracks
+	 * the data and xattr storage space used, so the diff between the
+	 * two is the space used for block mapping metadata.  Add that to
+	 * the data usage.
+	 */
+out_nofiemap:
+	pthread_mutex_lock(&summary->lock);
+	summary->bytes += bs_bytes - ifs.bytes;
+	pthread_mutex_unlock(&summary->lock);
+
+close:
+	close(fd);
+	return moveon;
+}
+
+/* Sum the bytes in each extent. */
+static bool
+generic_summary_count_helper(
+	uint64_t			start,
+	uint64_t			length,
+	void				*arg)
+{
+	unsigned long long		*count = arg;
+
+	*count += length;
+	return true;
+}
+
+/* Traverse the directory tree, counting inodes & blocks. */
+bool
+generic_check_summary(
+	struct scrub_ctx	*ctx)
+{
+	struct summary_counts	summary = {0};
+	struct stat		sb;
+	struct statvfs		sfs;
+	unsigned long long	fd;
+	unsigned long long	fi;
+	unsigned long long	sd;
+	unsigned long long	si;
+	unsigned long long	absdiff;
+	bool			complain = false;
+	bool			moveon;
+	int			error;
+
+	pthread_mutex_init(&summary.lock, NULL);
+
+	/* Flush everything out to disk before we start counting. */
+	error = syncfs(ctx->mnt_fd);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	/* Get the rootdir's summary stats. */
+	error = fstat(ctx->mnt_fd, &sb);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	moveon = bitmap_init(&summary.dext);
+	if (!moveon)
+		return moveon;
+
+	moveon = bitmap_init(&summary.inob);
+	if (!moveon)
+		return moveon;
+
+	/* Scan the rest of the filesystem. */
+	moveon = scan_fs_tree(ctx, NULL, generic_record_inode_summary,
+			&summary);
+	if (!moveon)
+		return moveon;
+
+	/* Summarize extent tree results. */
+	moveon = bitmap_iterate(&summary.dext,
+			generic_summary_count_helper, &summary.bytes);
+	if (!moveon)
+		return moveon;
+
+	bitmap_free(&summary.inob);
+	bitmap_free(&summary.dext);
+
+	/* Compare to statfs results. */
+	error = fstatvfs(ctx->mnt_fd, &sfs);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	/* Report on what we found. */
+	fd = (sfs.f_blocks - sfs.f_bfree) * sfs.f_frsize;
+	fi = sfs.f_files - sfs.f_ffree;
+	sd = summary.bytes;
+	si = summary.inodes;
+
+	/*
+	 * Complain if the counts are off by more than 10%, unless
+	 * the inaccuracy is less than 32MB worth of blocks or 100 inodes.
+	 * Ignore zero counters.
+	 */
+	absdiff = 1ULL << 25;
+	if (fd)
+		complain = !within_range(ctx, sd, fd, absdiff, 1, 10,
+				_("data blocks"));
+	if (fi)
+		complain |= !within_range(ctx, si, fi, 100, 1, 10, _("inodes"));
+
+	if (complain || verbose) {
+		double		b, i;
+		char		*bu, *iu;
+
+		b = auto_space_units(fd, &bu);
+		i = auto_units(fi, &iu);
+		fprintf(stdout, _("%.1f%s data used;  %.1f%s inodes used.\n"),
+				b, bu, i, iu);
+		b = auto_space_units(sd, &bu);
+		i = auto_units(si, &iu);
+		fprintf(stdout, _("%.1f%s data found; %.1f%s inodes found.\n"),
+				b, bu, i, iu);
+		fflush(stdout);
+	}
+
+	return true;
+}
+
+/* Phase 7: Preening filesystem. */
+bool
+generic_preen_fs(
+	struct scrub_ctx		*ctx)
+{
+	fstrim(ctx);
+	return true;
+}
+
+struct scrub_ops generic_scrub_ops = {
+	.name			= "generic",
+	.cleanup		= generic_cleanup,
+	.scan_fs		= generic_scan_fs,
+	.scan_inodes		= generic_scan_inodes,
+	.check_dir		= generic_check_dir,
+	.check_inode		= generic_check_inode,
+	.scan_extents		= generic_scan_extents,
+	.scan_xattrs		= generic_scan_xattrs,
+	.scan_special_xattrs	= generic_scan_special_xattrs,
+	.scan_metadata		= generic_scan_metadata,
+	.check_summary		= generic_check_summary,
+	.read_file		= read_verify_file,
+	.scan_blocks		= generic_scan_blocks,
+	.scan_fs_tree		= generic_scan_fs_tree,
+	.preen_fs		= generic_preen_fs,
+};
diff --git a/scrub/scrub.c b/scrub/scrub.c
index 5ed6559..7ed5374 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -783,7 +783,7 @@ find_ops(
 			return op;
 	}
 
-	return NULL;
+	return &generic_scrub_ops;
 }
 
 int
diff --git a/scrub/scrub.h b/scrub/scrub.h
index 442371b..6ab53c1 100644
--- a/scrub/scrub.h
+++ b/scrub/scrub.h
@@ -150,6 +150,8 @@ debug_tweak_on(
 	return debug && getenv(name) != NULL;
 }
 
+extern struct scrub_ops	generic_scrub_ops;
+
 /* Generic implementations of the ops functions */
 bool generic_cleanup(struct scrub_ctx *ctx);
 bool generic_scan_fs(struct scrub_ctx *ctx);



* [PATCH 16/17] xfs_scrub: add tweaks for specific non-XFS filesystems
  2017-01-21  8:08 [PATCH v5 00/17] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (13 preceding siblings ...)
  2017-01-21  8:09 ` [PATCH 14/17] xfs_scrub: add generic VFS scrubber implementation Darrick J. Wong
@ 2017-01-21  8:09 ` Darrick J. Wong
  2017-01-21  8:10 ` [PATCH 17/17] xfs_scrub: create a script to scrub all xfs filesystems Darrick J. Wong
  15 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2017-01-21  8:09 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, linux-fsdevel

Though xfs_scrub is primarily meant to perform online scrubbing of XFS
filesystems, it is mostly capable of scrubbing other filesystem types.
Some of them need small tweaks (quirk flags, or an external scrub
command in the case of btrfs), so provide them here.
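
For readers unfamiliar with the alias strings used by the profiles in this
patch, here is a minimal sketch of how a profile could be picked by
filesystem type.  It assumes that .aliases is a run of NUL-terminated names
ended by an empty string, which is what literals such as
"hfsplus\0iso9660\0vfat\0" suggest.  The names demo_ops, demo_ops_match and
demo_find_ops are hypothetical; this is not the find_ops() in scrub.c, only
an illustration of the lookup-with-fallback idea.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct scrub_ops. */
struct demo_ops {
	const char	*name;
	const char	*aliases;	/* "a\0b\0"; an empty string ends it */
};

/* Does fstype match the profile's name or one of its aliases? */
int
demo_ops_match(
	const struct demo_ops	*ops,
	const char		*fstype)
{
	const char		*p;

	if (strcmp(ops->name, fstype) == 0)
		return 1;
	if (!ops->aliases)
		return 0;

	/* Step over each NUL-terminated alias until the empty one. */
	for (p = ops->aliases; *p != '\0'; p += strlen(p) + 1)
		if (strcmp(p, fstype) == 0)
			return 1;
	return 0;
}

/* Return the first matching profile, or the fallback if none matches. */
const struct demo_ops *
demo_find_ops(
	const struct demo_ops	*const *impl,	/* NULL-terminated table */
	const struct demo_ops	*fallback,
	const char		*fstype)
{
	for (; *impl != NULL; impl++)
		if (demo_ops_match(*impl, fstype))
			return *impl;
	return fallback;
}
```

With a table holding an "xfs" entry and an entry whose aliases are
"hfsplus\0iso9660\0vfat\0", a lookup of "vfat" would return the second
entry, while an unlisted type such as "ext4" would return the fallback.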

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile  |    2 -
 scrub/non_xfs.c |  185 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/scrub.c   |    3 +
 scrub/scrub.h   |    3 +
 4 files changed, 192 insertions(+), 1 deletion(-)
 create mode 100644 scrub/non_xfs.c


diff --git a/scrub/Makefile b/scrub/Makefile
index d5d58de..42994ba 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -13,7 +13,7 @@ INSTALL_SCRUB = install-scrub
 endif
 
 HFILES = scrub.h ../repair/threads.h read_verify.h iocmd.h xfs_ioctl.h
-CFILES = ../repair/avl64.c disk.c bitmap.c generic.c iocmd.c \
+CFILES = ../repair/avl64.c disk.c bitmap.c generic.c iocmd.c non_xfs.c \
 	 read_verify.c scrub.c ../repair/threads.c xfs.c xfs_ioctl.c
 
 LLDLIBS += $(LIBBLKID) $(LIBXFS) $(LIBXCMD) $(LIBUUID) $(LIBRT) $(LIBPTHREAD) $(LIBHANDLE)
diff --git a/scrub/non_xfs.c b/scrub/non_xfs.c
new file mode 100644
index 0000000..aec9837
--- /dev/null
+++ b/scrub/non_xfs.c
@@ -0,0 +1,185 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "libxfs.h"
+#include <dirent.h>
+#include <sys/statvfs.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include "disk.h"
+#include "scrub.h"
+
+/* Stub scrubbers for non-XFS filesystems. */
+
+/* Read the btrfs geometry. */
+static bool
+btrfs_scan_fs(
+	struct scrub_ctx		*ctx)
+{
+	/*
+	 * btrfs is a volume manager, so we can't get meaningful block numbers
+	 * out of FIEMAP/FIBMAP.  It also checksums data, so raw device access
+	 * for file verify is impossible.  btrfs also supports reflink.
+	 */
+	ctx->quirks |= SCRUB_QUIRK_SHARED_BLOCKS;
+	disk_close(&ctx->datadev);
+	return generic_scan_fs(ctx);
+}
+
+/* Scrub all disk blocks using the btrfs scrub command. */
+static bool
+btrfs_scan_blocks(
+	struct scrub_ctx		*ctx)
+{
+	pid_t				pid;
+	pid_t				rpid;
+	char				*args[] = {"btrfs", "scrub", "start",
+						   "-B", "-f", "-q",
+						   ctx->mntpoint, NULL, NULL};
+	int				status;
+	int				err;
+
+	if (ctx->mode == SCRUB_MODE_DRY_RUN) {
+		args[6] = "-r";
+		args[7] = ctx->mntpoint;
+	}
+
+	pid = fork();
+	if (pid < 0)
+		str_errno(ctx, ctx->mntpoint);
+	else if (pid == 0) {
+		execvp(args[0], args);
+		_exit(255);
+	} else {
+		rpid = waitpid(pid, &status, 0);
+		while (rpid >= 0 && rpid != pid && !WIFEXITED(status) &&
+				!WIFSIGNALED(status)) {
+			rpid = waitpid(pid, &status, 0);
+		}
+		if (rpid < 0)
+			str_errno(ctx, ctx->mntpoint);
+		else if (WIFSIGNALED(status))
+			str_error(ctx, ctx->mntpoint,
+_("btrfs scrub died, signal %d"),
+					WTERMSIG(status));
+		else if (WIFEXITED(status)) {
+			err = WEXITSTATUS(status);
+			if (err == 0)
+				return true;
+			else if (err == 255)
+				str_error(ctx, ctx->mntpoint,
+_("btrfs scrub failed to run."));
+			else
+				str_error(ctx, ctx->mntpoint,
+_("btrfs scrub signalled corruption, error %d"),
+						err);
+		}
+	}
+
+	return true;
+}
+
+/* btrfs profile */
+struct scrub_ops btrfs_scrub_ops = {
+	.name			= "btrfs",
+	.cleanup		= generic_cleanup,
+	.scan_fs		= btrfs_scan_fs,
+	.scan_inodes		= generic_scan_inodes,
+	.check_dir		= generic_check_dir,
+	.check_inode		= generic_check_inode,
+	.scan_extents		= generic_scan_extents,
+	.scan_xattrs		= generic_scan_xattrs,
+	.scan_special_xattrs	= generic_scan_special_xattrs,
+	.scan_metadata		= generic_scan_metadata,
+	.check_summary		= generic_check_summary,
+	.read_file		= read_verify_file,
+	.scan_blocks		= btrfs_scan_blocks,
+	.scan_fs_tree		= generic_scan_fs_tree,
+	.preen_fs		= generic_preen_fs,
+};
+
+/*
+ * Generic FS scanner for filesystems that support shared blocks.
+ */
+static bool
+scan_fs_shared_blocks(
+	struct scrub_ctx		*ctx)
+{
+	ctx->quirks |= SCRUB_QUIRK_SHARED_BLOCKS;
+	return generic_scan_fs(ctx);
+}
+
+/* shared block filesystem profiles */
+struct scrub_ops shared_block_fs_scrub_ops = {
+	.name			= "shared block generic",
+	.aliases		= "ocfs2\0",
+	.cleanup		= generic_cleanup,
+	.scan_fs		= scan_fs_shared_blocks,
+	.scan_inodes		= generic_scan_inodes,
+	.check_dir		= generic_check_dir,
+	.check_inode		= generic_check_inode,
+	.scan_extents		= generic_scan_extents,
+	.scan_xattrs		= generic_scan_xattrs,
+	.scan_special_xattrs	= generic_scan_special_xattrs,
+	.scan_metadata		= generic_scan_metadata,
+	.check_summary		= generic_check_summary,
+	.read_file		= read_verify_file,
+	.scan_blocks		= generic_scan_blocks,
+	.scan_fs_tree		= generic_scan_fs_tree,
+	.preen_fs		= generic_preen_fs,
+};
+
+/*
+ * Generic FS scan for filesystems that don't present stable inode numbers
+ * between the directory entry and the stat buffer.
+ */
+static bool
+scan_fs_unstable_inum(
+	struct scrub_ctx		*ctx)
+{
+	/*
+	 * HFS+ implements hard links by creating a special hidden file
+	 * that redirects to the real file, so the inode numbers reported
+	 * in the dirent and the fstat buffers don't necessarily match.
+	 *
+	 * iso9660/vfat don't have stable dirent -> inode numbers.
+	 */
+	ctx->quirks |= SCRUB_QUIRK_UNSTABLE_INUM;
+	return generic_scan_fs(ctx);
+}
+
+/* unstable inum filesystem profile */
+struct scrub_ops unstable_inum_fs_scrub_ops = {
+	.name			= "unstable inum generic",
+	.aliases		= "hfsplus\0iso9660\0vfat\0",
+	.cleanup		= generic_cleanup,
+	.scan_fs		= scan_fs_unstable_inum,
+	.scan_inodes		= generic_scan_inodes,
+	.check_dir		= generic_check_dir,
+	.check_inode		= generic_check_inode,
+	.scan_extents		= generic_scan_extents,
+	.scan_xattrs		= generic_scan_xattrs,
+	.scan_special_xattrs	= generic_scan_special_xattrs,
+	.scan_metadata		= generic_scan_metadata,
+	.check_summary		= generic_check_summary,
+	.read_file		= read_verify_file,
+	.scan_blocks		= generic_scan_blocks,
+	.scan_fs_tree		= generic_scan_fs_tree,
+	.preen_fs		= generic_preen_fs,
+};
diff --git a/scrub/scrub.c b/scrub/scrub.c
index 1dcca66..bfcc1c2 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -309,6 +309,9 @@ __record_preen(
  */
 static struct scrub_ops *scrub_impl[] = {
 	&xfs_scrub_ops,
+	&btrfs_scrub_ops,
+	&shared_block_fs_scrub_ops,
+	&unstable_inum_fs_scrub_ops,
 	NULL
 };
 
diff --git a/scrub/scrub.h b/scrub/scrub.h
index e2376a5..8b23e7f 100644
--- a/scrub/scrub.h
+++ b/scrub/scrub.h
@@ -152,6 +152,9 @@ debug_tweak_on(
 
 extern struct scrub_ops	generic_scrub_ops;
 extern struct scrub_ops	xfs_scrub_ops;
+extern struct scrub_ops	btrfs_scrub_ops;
+extern struct scrub_ops	shared_block_fs_scrub_ops;
+extern struct scrub_ops	unstable_inum_fs_scrub_ops;
 
 /* Generic implementations of the ops functions */
 bool generic_cleanup(struct scrub_ctx *ctx);



* [PATCH 17/17] xfs_scrub: create a script to scrub all xfs filesystems
  2017-01-21  8:08 [PATCH v5 00/17] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (14 preceding siblings ...)
  2017-01-21  8:09 ` [PATCH 16/17] xfs_scrub: add tweaks for specific non-XFS filesystems Darrick J. Wong
@ 2017-01-21  8:10 ` Darrick J. Wong
  15 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2017-01-21  8:10 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, linux-fsdevel

Create an xfs_scrub_all command to find all XFS filesystems
and run an online scrub against them all.
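
As a rough illustration of the "find all XFS filesystems" step only, the
sketch below walks a mount table with getmntent(3) and visits every entry
of type "xfs".  This is an assumption made for illustration, written in C
to match the rest of the series; it is not how the xfs_scrub_all script in
this patch is implemented.  The names demo_mount_fn and
demo_foreach_xfs_mount are hypothetical.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Called once per XFS mount point; return 0 to stop the walk early. */
typedef int (*demo_mount_fn)(const char *mntpoint, void *arg);

/*
 * Walk a mount table (normally /proc/mounts) and call fn for every
 * entry whose filesystem type is "xfs".  Returns the number of XFS
 * entries visited, or -1 if the table could not be opened.
 */
int
demo_foreach_xfs_mount(
	const char	*mtab,
	demo_mount_fn	fn,
	void		*arg)
{
	FILE		*fp;
	struct mntent	*ent;
	int		count = 0;

	fp = setmntent(mtab, "r");
	if (!fp)
		return -1;

	while ((ent = getmntent(fp)) != NULL) {
		if (strcmp(ent->mnt_type, "xfs") != 0)
			continue;
		count++;
		if (fn && !fn(ent->mnt_dir, arg))
			break;
	}

	endmntent(fp);
	return count;
}
```

A caller would pass "/proc/mounts" and a callback that launches xfs_scrub
on each mount point it is handed.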

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 debian/control           |    3 +
 debian/rules             |    1 
 man/man8/xfs_scrub_all.8 |   32 +++++++++++
 scrub/Makefile           |   11 +++-
 scrub/xfs_scrub_all.in   |  133 ++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 176 insertions(+), 4 deletions(-)
 create mode 100644 man/man8/xfs_scrub_all.8
 create mode 100644 scrub/xfs_scrub_all.in
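
A note on the scheduling rule described in the man page below: a scrub
is started only when none of the filesystem's backing devices is in use
by a scrub that is already running.  What follows is a minimal
standalone sketch of that rule, not the code in this patch.  The
function pick_runnable() and the sample mount table are hypothetical
and exist only for illustration.

```python
def pick_runnable(pending, running_devs):
    """Return the mounts whose devices are all idle, claiming them as we go.

    pending:      dict of mountpoint -> set of device names (hypothetical layout)
    running_devs: set of device names currently being scrubbed (mutated here)
    """
    runnable = []
    # Sort only to make the order deterministic for this example.
    for mnt in sorted(pending):
        devs = pending[mnt]
        # A mount may start only if it shares no device with a running scrub.
        if devs.isdisjoint(running_devs):
            running_devs.update(devs)
            runnable.append(mnt)
    # Drop the scheduled mounts after the loop so the dict is not
    # modified while it is being iterated.
    for mnt in runnable:
        del pending[mnt]
    return runnable


if __name__ == '__main__':
    # Hypothetical mount table: / and /home share sda, /data spans two disks.
    pending = {'/': {'sda'}, '/home': {'sda'}, '/data': {'sdb', 'sdc'}}
    running = set()
    print(pick_runnable(pending, running))  # first batch: / and /data
    running -= {'sda'}                      # pretend the scrub of / finished
    print(pick_runnable(pending, running))  # /home can now start
```

In the sketch, /home has to wait for / because both live on sda, while
/data can start immediately.  A real scheduler would call the function
again each time a scrub finishes and releases its devices.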


diff --git a/debian/control b/debian/control
index ad81662..d7c0316 100644
--- a/debian/control
+++ b/debian/control
@@ -3,12 +3,13 @@ Section: admin
 Priority: optional
 Maintainer: XFS Development Team <linux-xfs@vger.kernel.org>
 Uploaders: Nathan Scott <nathans@debian.org>, Anibal Monsalve Salazar <anibal@debian.org>
-Build-Depends: uuid-dev, dh-autoreconf, debhelper (>= 5), gettext, libtool, libreadline-gplv2-dev | libreadline5-dev, libblkid-dev (>= 2.17), linux-libc-dev
+Build-Depends: uuid-dev, dh-autoreconf, debhelper (>= 5), gettext, libtool, libreadline-gplv2-dev | libreadline5-dev, libblkid-dev (>= 2.17), linux-libc-dev, dh-python
 Standards-Version: 3.9.1
 Homepage: http://xfs.org/
 
 Package: xfsprogs
 Depends: ${shlibs:Depends}, ${misc:Depends}
+Recommends: ${python3:Depends}, util-linux
 Provides: fsck-backend
 Suggests: xfsdump, acl, attr, quota
 Breaks: xfsdump (<< 3.0.0)
diff --git a/debian/rules b/debian/rules
index c673380..a870944 100755
--- a/debian/rules
+++ b/debian/rules
@@ -76,6 +76,7 @@ binary-arch: checkroot built
 	$(pkgdi)  $(MAKE) -C debian install-d-i
 	$(pkgme)  $(MAKE) dist
 	rmdir debian/xfslibs-dev/usr/share/doc/xfsprogs
+	dh_python3
 	dh_installdocs
 	dh_installchangelogs
 	dh_strip
diff --git a/man/man8/xfs_scrub_all.8 b/man/man8/xfs_scrub_all.8
new file mode 100644
index 0000000..5e1420b
--- /dev/null
+++ b/man/man8/xfs_scrub_all.8
@@ -0,0 +1,32 @@
+.TH xfs_scrub_all 8
+.SH NAME
+xfs_scrub_all \- scrub all mounted XFS filesystems
+.SH SYNOPSIS
+.B xfs_scrub_all
+.SH DESCRIPTION
+.B xfs_scrub_all
+attempts to read and check all the metadata on all mounted XFS filesystems.
+The online scrub is performed via the
+.B xfs_scrub
+tool, either by running it directly or by using systemd to start it
+in a restricted fashion.
+Mounted filesystems are mapped to physical storage devices so that scrub
+operations can be run in parallel so long as no two scrubbers access
+the same device simultaneously.
+.SH EXIT CODE
+The exit code returned by
+.B xfs_scrub_all
+is the sum of the following conditions:
+.br
+\	0\	\-\ No errors
+.br
+\	4\	\-\ File system errors left uncorrected
+.br
+\	8\	\-\ Operational error
+.br
+\	16\	\-\ Usage or syntax error
+.PP
+These are the same error codes returned by xfs_scrub.
+.br
+.SH SEE ALSO
+.BR xfs_scrub (8).
diff --git a/scrub/Makefile b/scrub/Makefile
index 42994ba..d7cea17 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -36,15 +36,20 @@ ifeq ($(HAVE_SYNCFS),yes)
 LCFLAGS += -DHAVE_SYNCFS
 endif
 
-default: depend $(LTCOMMAND)
+default: depend $(LTCOMMAND) xfs_scrub_all
+
+xfs_scrub_all: xfs_scrub_all.in
+	$(SED) -e "s|@sbindir@|$(PKG_ROOT_SBIN_DIR)|g" < $< > $@
+	chmod a+x $@
 
 include $(BUILDRULES)
 
-install: default $(INSTALL_SCRUB)
+install: $(INSTALL_SCRUB)
 
-install-scrub:
+install-scrub: default
 	$(INSTALL) -m 755 -d $(PKG_ROOT_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_ROOT_SBIN_DIR)
+	$(INSTALL) -m 755 xfs_scrub_all $(PKG_ROOT_SBIN_DIR)
 
 install-dev:
 
diff --git a/scrub/xfs_scrub_all.in b/scrub/xfs_scrub_all.in
new file mode 100644
index 0000000..d0a7825
--- /dev/null
+++ b/scrub/xfs_scrub_all.in
@@ -0,0 +1,133 @@
+#!/usr/bin/env python3
+
+# Run online scrubbers in parallel, but avoid thrashing.
+#
+# Copyright (C) 2017 Oracle.  All rights reserved.
+#
+# Author: Darrick J. Wong <darrick.wong@oracle.com>
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+
+import subprocess
+import json
+import threading
+import time
+import sys
+
+SUPPORTED_FS = set(['xfs', 'ext4', 'ext3', 'ext2'])
+retcode = 0
+terminate = False
+
+def find_mounts():
+	'''Map mountpoints to physical disks.'''
+
+	fs = {}
+	cmd=['lsblk', '-o', 'KNAME,TYPE,FSTYPE,MOUNTPOINT', '-J']
+	result = subprocess.Popen(cmd, stdout=subprocess.PIPE)
+	# communicate() drains the pipe, so a long listing cannot stall lsblk.
+	stdout, _ = result.communicate()
+	if result.returncode != 0:
+		return fs
+	output = stdout.decode('utf-8')
+	bdevdata = json.loads(output)
+	for bdev in bdevdata['blockdevices']:
+		if bdev['type'] == 'disk':
+			lastdisk = bdev['kname']
+		elif bdev['fstype'] is not None and bdev['fstype'] in SUPPORTED_FS:
+			mnt = bdev['mountpoint']
+			if mnt is not None:
+				if mnt in fs:
+					fs[mnt].add(lastdisk)
+				else:
+					fs[mnt] = set([lastdisk])
+	return fs
+
+def run_scrub(mnt, cond, running_devs, mntdevs, killfuncs):
+	'''Run a scrub process.'''
+	global retcode, terminate
+
+	print("Scrubbing %s..." % mnt)
+
+	# Invoke xfs_scrub manually
+	if not terminate:
+		cmd=['@sbindir@/xfs_scrub', '-dTvn', mnt]
+		try:
+			proc = subprocess.Popen(cmd)
+			fn = lambda: proc.terminate()
+			killfuncs.add(fn)
+			proc.wait()
+			try:
+				killfuncs.remove(fn)
+			except KeyError:
+				pass
+			print("Scrubbing %s done, (err=%d)" % (mnt, proc.returncode))
+			retcode |= proc.returncode
+		except:
+			print("Unable to start scrub tool.")
+
+	running_devs -= mntdevs
+	cond.acquire()
+	cond.notify()
+	cond.release()
+
+def main():
+	'''Find mounts, schedule scrub runs.'''
+	def thr(mnt, devs):
+		a = (mnt, cond, running_devs, devs, killfuncs)
+		thr = threading.Thread(target = run_scrub, args = a)
+		thr.start()
+	global retcode, terminate
+
+	fs = find_mounts()
+
+	# Schedule scrub jobs...
+	running_devs = set()
+	killfuncs = set()
+	cond = threading.Condition()
+	while len(fs) > 0:
+		if len(running_devs) == 0:
+			mnt, devs = fs.popitem()
+			running_devs.update(devs)
+			thr(mnt, devs)
+		poppers = set()
+		for mnt in fs:
+			devs = fs[mnt]
+			can_run = True
+			for dev in devs:
+				if dev in running_devs:
+					can_run = False
+					break
+			if can_run:
+				running_devs.update(devs)
+				poppers.add(mnt)
+				thr(mnt, devs)
+		for p in poppers:
+			fs.pop(p)
+		cond.acquire()
+		try:
+			cond.wait()
+		except KeyboardInterrupt:
+			terminate = True
+			print("Terminating...")
+			while len(killfuncs) > 0:
+				fn = killfuncs.pop()
+				fn()
+			fs = []
+		cond.release()
+
+	sys.exit(retcode)
+
+if __name__ == '__main__':
+	main()



* Re: [PATCH 02/17] xfsprogs: Space management tool
  2017-01-21  8:08 ` [PATCH 02/17] xfsprogs: Space management tool Dave Chinner
@ 2017-01-22 16:46   ` James Bottomley
  0 siblings, 0 replies; 18+ messages in thread
From: James Bottomley @ 2017-01-22 16:46 UTC (permalink / raw)
  To: Dave Chinner, sandeen, darrick.wong; +Cc: linux-xfs, linux-fsdevel

On Sat, 2017-01-21 at 00:08 -0800, Dave Chinner wrote:
> xfs_spaceman is intended as a diagnostic and control tool for space
> management operations within XFS. Operations like examining free
> space, managing allocation policies, issuing block discards on free
> space, etc.
> 
> The tool is modelled on the xfs_io interface, allowing both
> interactive and command line control of the tool, enabling it to be
> used in scripts and automated management tools.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Please don't do this: a forged from address is the most classic
indicator of spam and a significant number of email servers will pick
it up and discard the mail.  The way you send someone else's patch is
to send it from your own email, but at the top of the body have

From: Dave Chinner <dchinner@redhat.com>

And then you also need to append your own signoff because you're
retransmitting someone else's work.

James




end of thread, other threads:[~2017-01-22 16:49 UTC | newest]

Thread overview: 18+ messages
2017-01-21  8:08 [PATCH v5 00/17] xfsprogs: online scrub/repair support Darrick J. Wong
2017-01-21  8:08 ` [PATCH 01/17] xfs_io: support the new getfsmap ioctl Darrick J. Wong
2017-01-21  8:08 ` [PATCH 02/17] xfsprogs: Space management tool Dave Chinner
2017-01-22 16:46   ` James Bottomley
2017-01-21  8:08 ` [PATCH 03/17] spaceman: add FITRIM support Dave Chinner
2017-01-21  8:08 ` [PATCH 04/17] spaceman: add new speculative prealloc control Dave Chinner
2017-01-21  8:08 ` [PATCH 05/17] spaceman: AG state control Dave Chinner
2017-01-21  8:08 ` [PATCH 06/17] spaceman: Free space mapping command Dave Chinner
2017-01-21  8:08 ` [PATCH 07/17] xfs_spaceman: add a man page Darrick J. Wong
2017-01-21  8:08 ` [PATCH 08/17] xfs_spaceman: add group summary mode Darrick J. Wong
2017-01-21  8:09 ` [PATCH 09/17] xfs_db: introduce fuzz command Darrick J. Wong
2017-01-21  8:09 ` [PATCH 10/17] xfs_db: print attribute remote value blocks Darrick J. Wong
2017-01-21  8:09 ` [PATCH 11/17] xfs_db: write / fuzz bad values into dir/attr blocks with good CRCs Darrick J. Wong
2017-01-21  8:09 ` [PATCH 12/17] xfs_io: provide an interface to the scrub ioctls Darrick J. Wong
2017-01-21  8:09 ` [PATCH 13/17] xfs_scrub: create online filesystem scrub program Darrick J. Wong
2017-01-21  8:09 ` [PATCH 14/17] xfs_scrub: add generic VFS scrubber implementation Darrick J. Wong
2017-01-21  8:09 ` [PATCH 16/17] xfs_scrub: add tweaks for specific non-XFS filesystems Darrick J. Wong
2017-01-21  8:10 ` [PATCH 17/17] xfs_scrub: create a script to scrub all xfs filesystems Darrick J. Wong
