* [PATCH v11 00/27] xfsprogs: online scrub/repair support
@ 2018-01-06  1:51 Darrick J. Wong
  2018-01-06  1:51 ` [PATCH 01/27] xfs_scrub: create online filesystem scrub program Darrick J. Wong
                   ` (30 more replies)
  0 siblings, 31 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:51 UTC
  To: sandeen, darrick.wong; +Cc: linux-xfs

Hi all,

This is the eleventh revision of a patchset that adds support for online
metadata scrubbing and repair to the XFS userland tools.  Since v10 I've
rebased to the latest for-next, fixed some wonky error messages, and fixed
a few minor problems I found via code inspection.  However, this patch
series is more or less the same as v10.

We start by creating the basic shell of the program that can do argument
parsing and error reporting, create some abstractions for the XFS ioctls
that we use to iterate and scrub metadata, and then tie together all the
in-kernel scrubbing in separate scrub phases.

Next, we move on to checking the directory tree for connectivity and
naming problems and add the infrastructure to perform an (optional) scan
of the in-use parts of the disk media.  We also implement a minimal
preen -- if the fs checks out, we can try to run fstrim; and some basic
progress reporting if the program is running interactively.

Finally, we add some wrapper scripts to schedule scrubs of all the
mounted filesystems, and the systemd / cron infrastructure needed to
automatically scan everything once a week.  All of
this is disabled by default.  The systemd integration allows us to give
scrub exactly the privileges it needs while walling off the rest of the
system.

If you're going to start using this mess, you probably ought to just
pull from my git tree for xfsprogs[1].  This series relies on the
libfrog patches sent earlier.  Kernel support will appear in 4.15.

Comments and questions are, as always, welcome.

--D

[1] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=djwong-devel


* [PATCH 01/27] xfs_scrub: create online filesystem scrub program
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
@ 2018-01-06  1:51 ` Darrick J. Wong
  2018-01-12  0:16   ` Eric Sandeen
  2018-01-12  1:07   ` Eric Sandeen
  2018-01-06  1:51 ` [PATCH 02/27] xfs_scrub: common error handling Darrick J. Wong
                   ` (29 subsequent siblings)
  30 siblings, 2 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:51 UTC
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create the foundations of a filesystem scrubbing tool that asks the
kernel to inspect all metadata in the filesystem and (ultimately) to
repair anything that's broken.  Also create the man page for the
utility.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 .gitignore                   |    1 
 Makefile                     |    3 +
 man/man8/xfs_scrub.8         |  117 ++++++++++++++++++++++++++++++++++++++++++
 scrub/Makefile               |   42 +++++++++++++++
 scrub/common.c               |   20 +++++++
 scrub/common.h               |   23 ++++++++
 scrub/xfs_scrub.c            |  109 +++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.h            |   23 ++++++++
 tools/find-api-violations.sh |    2 -
 9 files changed, 338 insertions(+), 2 deletions(-)
 create mode 100644 man/man8/xfs_scrub.8
 create mode 100644 scrub/Makefile
 create mode 100644 scrub/common.c
 create mode 100644 scrub/common.h
 create mode 100644 scrub/xfs_scrub.c
 create mode 100644 scrub/xfs_scrub.h
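
A note on the EXIT CODE section of the man page in this patch: the status
is a bitwise OR of independent conditions, so a wrapper can test each bit
separately.  Here is a minimal sketch of that idea.  The SCRUB_RET_* names
and both helper functions are hypothetical (the patch only documents the
numeric values), and the counter-to-bit mapping is assumed to follow what
patch 03 does at the end of main().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the man page only documents the numeric values. */
enum scrub_ret {
	SCRUB_RET_SUCCESS	= 0,	/* no errors */
	SCRUB_RET_CORRUPT	= 1,	/* fs errors left uncorrected */
	SCRUB_RET_UNOPTIMIZED	= 2,	/* fs optimizations possible */
	SCRUB_RET_OPERROR	= 4,	/* operational error */
	SCRUB_RET_SYNTAX	= 8,	/* usage or syntax error */
};

/* Fold the per-run counters into a single exit status. */
int
scrub_exit_status(
	unsigned long long	errors_found,
	unsigned long long	warnings_found,
	unsigned long long	runtime_errors)
{
	int			ret = SCRUB_RET_SUCCESS;

	if (errors_found)
		ret |= SCRUB_RET_CORRUPT;
	if (warnings_found)
		ret |= SCRUB_RET_UNOPTIMIZED;
	if (runtime_errors)
		ret |= SCRUB_RET_OPERROR;
	return ret;
}

/* Did the run report this condition?  Takes exactly one SCRUB_RET_* bit. */
bool
scrub_status_has(
	int			status,
	enum scrub_ret		bit)
{
	assert(bit != SCRUB_RET_SUCCESS);
	return (status & bit) != 0;
}
```

Because every condition gets its own bit, a status of 5 means both
"errors left uncorrected" and "operational error", which a plain
enumeration of exit codes could not express.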


diff --git a/.gitignore b/.gitignore
index e839e2a..a3db640 100644
--- a/.gitignore
+++ b/.gitignore
@@ -68,6 +68,7 @@ cscope.*
 /repair/xfs_repair
 /rtcp/xfs_rtcp
 /spaceman/xfs_spaceman
+/scrub/xfs_scrub
 
 # generated crc files
 /libxfs/crc32selftest
diff --git a/Makefile b/Makefile
index 0dce80a..3bd0796 100644
--- a/Makefile
+++ b/Makefile
@@ -48,7 +48,7 @@ LIBFROG_SUBDIR = libfrog
 DLIB_SUBDIRS = libxlog libxcmd libhandle
 LIB_SUBDIRS = libxfs $(DLIB_SUBDIRS)
 TOOL_SUBDIRS = copy db estimate fsck growfs io logprint mkfs quota \
-		mdrestore repair rtcp m4 man doc debian spaceman
+		mdrestore repair rtcp m4 man doc debian spaceman scrub
 
 ifneq ("$(PKG_PLATFORM)","darwin")
 TOOL_SUBDIRS += fsr
@@ -91,6 +91,7 @@ repair: libxlog libxcmd
 copy: libxlog
 mkfs: libxcmd
 spaceman: libxcmd
+scrub: libhandle libxcmd
 
 ifeq ($(HAVE_BUILDDEFS), yes)
 include $(BUILDRULES)
diff --git a/man/man8/xfs_scrub.8 b/man/man8/xfs_scrub.8
new file mode 100644
index 0000000..95f4fea
--- /dev/null
+++ b/man/man8/xfs_scrub.8
@@ -0,0 +1,117 @@
+.TH xfs_scrub 8
+.SH NAME
+xfs_scrub \- scrub the contents of an XFS filesystem
+.SH SYNOPSIS
+.B xfs_scrub
+[
+.B \-abemnTvVxy
+]
+.I mount-point
+.br
+.B xfs_scrub \-V
+.SH DESCRIPTION
+.B xfs_scrub
+attempts to check and repair all metadata in a mounted XFS filesystem.
+.PP
+.B xfs_scrub
+asks the kernel to scrub all metadata objects in the filesystem.
+Metadata records are scanned for obviously bad values and then
+cross-referenced against other metadata.
+The goal is to establish a reasonable confidence about the consistency
+of the overall filesystem by examining the consistency of individual
+metadata records against the other metadata in the filesystem across the
+entire filesystem.
+Damaged metadata can be rebuilt from other metadata if there is
+sufficient redundancy (and no other corruption) in the metadata.
+.PP
+This utility does not know how to correct all errors.
+If the tool cannot fix the detected errors, you must unmount the
+filesystem and run
+.B xfs_repair
+to fix the problems.
+If this tool is not run with either of the
+.B \-n
+or
+.B \-y
+options, then it will optimize the filesystem when possible,
+but it will not try to fix errors.
+.SH OPTIONS
+.TP
+.BI \-a " errors"
+Abort if more than this many errors are found on the filesystem.
+.TP
+.B \-b
+Run in background mode.
+If the option is specified once, only run a single scrubbing thread at a
+time.
+If given more than once, an artificial delay of 100us is added to each
+scrub call to reduce CPU overhead even further.
+.TP
+.B \-e
+Specifies what happens when errors are detected.
+If
+.IR shutdown
+is given, the filesystem will be taken offline if errors are found.
+Not all backends can shut down a filesystem.
+If
+.IR continue
+is given, no action is taken if errors are found.
+This is the default.
+.TP
+.BI \-m " file"
+Search this file for mounted filesystems instead of /etc/mtab.
+.TP
+.B \-n
+Dry run, do not modify anything in the filesystem.
+This disables all preening and optimization behaviors, and disables
+calling FITRIM on the free space after a successful run.
+.TP
+.BI \-T
+Print timing and memory usage information for each phase.
+.TP
+.B \-v
+Enable verbose mode, which prints periodic status updates.
+.TP
+.B \-V
+Prints the version number and exits.
+.TP
+.B \-x
+Scrub all file data too.
+The block list will be sorted in disk order for better performance.
+.B xfs_scrub
+will issue O_DIRECT reads to the block device directly.
+If the block device is a SCSI disk, it will issue READ VERIFY commands
+directly to the disk.
+.TP
+.B \-y
+Try to repair all filesystem errors.
+If the errors cannot be fixed online, then the filesystem must be taken
+offline for repair.
+.SH EXIT CODE
+The exit code returned by
+.B xfs_scrub
+is the sum of the following conditions:
+.br
+\	0\	\-\ No errors
+.br
+\	1\	\-\ File system errors left uncorrected
+.br
+\	2\	\-\ File system optimizations possible
+.br
+\	4\	\-\ Operational error
+.br
+\	8\	\-\ Usage or syntax error
+.br
+.SH CAVEATS
+.B xfs_scrub
+is an immature utility!
+This program takes advantage of in-kernel scrubbing to verify a given
+data structure with locks held.
+This can tie up the system for a while.
+The kernel must support the BULKSTAT, FSGEOMETRY, FSCOUNTS, GET_RESBLKS,
+GETBMAPX, GETFSMAP, INUMBERS, and SCRUB_METADATA ioctls.
+.PP
+If errors are found and cannot be repaired, the filesystem must be taken
+offline and repaired.
+.SH SEE ALSO
+.BR xfs_repair (8).
diff --git a/scrub/Makefile b/scrub/Makefile
new file mode 100644
index 0000000..62cca3b
--- /dev/null
+++ b/scrub/Makefile
@@ -0,0 +1,42 @@
+#
+# Copyright (C) 2018 Oracle.  All Rights Reserved.
+#
+
+TOPDIR = ..
+include $(TOPDIR)/include/builddefs
+
+# On linux we get fsmap from the system or define it ourselves
+# so include this based on platform type.  If this reverts to only
+# the autoconf check w/o local definition, change to testing HAVE_GETFSMAP
+SCRUB_PREREQS=$(PKG_PLATFORM)
+
+ifeq ($(SCRUB_PREREQS),linux)
+LTCOMMAND = xfs_scrub
+INSTALL_SCRUB = install-scrub
+endif	# scrub_prereqs
+
+HFILES = \
+common.h \
+xfs_scrub.h
+
+CFILES = \
+common.c \
+xfs_scrub.c
+
+LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD)
+LTDEPENDENCIES += $(LIBHANDLE) $(LIBFROG)
+LLDFLAGS = -static
+
+default: depend $(LTCOMMAND)
+
+include $(BUILDRULES)
+
+install: default $(INSTALL_SCRUB)
+
+install-scrub:
+	$(INSTALL) -m 755 -d $(PKG_ROOT_SBIN_DIR)
+	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_ROOT_SBIN_DIR)
+
+install-dev:
+
+-include .dep
diff --git a/scrub/common.c b/scrub/common.c
new file mode 100644
index 0000000..0a58c16
--- /dev/null
+++ b/scrub/common.c
@@ -0,0 +1,20 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "common.h"
diff --git a/scrub/common.h b/scrub/common.h
new file mode 100644
index 0000000..1082296
--- /dev/null
+++ b/scrub/common.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_COMMON_H_
+#define XFS_SCRUB_COMMON_H_
+
+#endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
new file mode 100644
index 0000000..4f26855
--- /dev/null
+++ b/scrub/xfs_scrub.c
@@ -0,0 +1,109 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include "xfs_scrub.h"
+
+/*
+ * XFS Online Metadata Scrub (and Repair)
+ *
+ * The XFS scrubber uses custom XFS ioctls to probe more deeply into the
+ * internals of the filesystem.  It takes advantage of scrubbing ioctls
+ * to check all the records stored in a metadata object and to
+ * cross-reference those records against the other filesystem metadata.
+ *
+ * After the program gathers command line arguments to figure out
+ * exactly what the user wants the program to do, scrub
+ * execution is split up into several separate phases:
+ *
+ * The "find geometry" phase queries XFS for the filesystem geometry.
+ * The block devices for the data, realtime, and log devices are opened.
+ * Kernel ioctls are test-queried to see if they actually work (the scrub
+ * ioctl in particular), and any other filesystem-specific information
+ * is gathered.
+ *
+ * In the "check internal metadata" phase, we call the metadata scrub
+ * ioctl to check the filesystem's internal per-AG btrees.  This
+ * includes the AG superblock, AGF, AGFL, and AGI headers, freespace
+ * btrees, the regular and free inode btrees, the reverse mapping
+ * btrees, and the reference counting btrees.  If the realtime device is
+ * enabled, the realtime bitmap and reverse mapping btrees are checked.
+ * Quotas, if enabled, are also checked in this phase.
+ *
+ * Each AG (and the realtime device) has its metadata checked in a
+ * separate thread for better performance.  Errors in the internal
+ * metadata can be fixed here prior to the inode scan; refer to the
+ * section about the "repair filesystem" phase for more information.
+ *
+ * The "scan all inodes" phase uses BULKSTAT to scan all the inodes in
+ * an AG in disk order.  The BULKSTAT information provides enough
+ * information to construct a file handle that is used to check the
+ * following parts of every file:
+ *
+ *  - The inode record
+ *  - All three block forks (data, attr, CoW)
+ *  - If it's a symlink, the symlink target.
+ *  - If it's a directory, the directory entries.
+ *  - All extended attributes
+ *  - The parent pointer
+ *
+ * Multiple threads are started to check the inodes of each AG in
+ * parallel.  Errors in file metadata can be fixed here; see the section
+ * about the "repair filesystem" phase for more information.
+ *
+ * Next comes the (configurable) "repair filesystem" phase.  The user
+ * can instruct this program to fix all problems encountered; to fix
+ * only optimality problems and leave the corruptions; or not to touch
+ * the filesystem at all.  Any metadata repairs that did not succeed in
+ * the previous two phases are retried here; if there are uncorrectable
+ * errors, xfs_scrub stops here.
+ *
+ * The next phase is the "check directory tree" phase.  In this phase,
+ * every directory is opened (via file handle) to confirm that each
+ * directory is connected to the root.  Directory entries are checked
+ * for ambiguous Unicode normalization mappings, which is to say that we
+ * look for pairs of entries whose utf-8 strings normalize to the same
+ * code point sequence and map to different inodes, because that could
+ * be used to trick a user into opening the wrong file.  The names of
+ * extended attributes are checked for Unicode normalization collisions.
+ *
+ * In the "verify data file integrity" phase, we employ GETFSMAP to read
+ * the reverse-mappings of all AGs and issue direct-reads of the
+ * underlying disk blocks.  We rely on the underlying storage to have
+ * checksummed the data blocks appropriately.  Multiple threads are
+ * started to check each AG in parallel; a separate thread pool is used
+ * to handle the direct reads.
+ *
+ * In the "check summary counters" phase, use GETFSMAP to tally up the
+ * blocks and BULKSTAT to tally up the inodes we saw and compare that to
+ * the statfs output.  This gives the user a rough estimate of how
+ * thorough the scrub was.
+ */
+
+/* Program name; needed for libxcmd error reports. */
+char				*progname = "xfs_scrub";
+
+int
+main(
+	int			argc,
+	char			**argv)
+{
+	fprintf(stderr, "XXX: This program is not complete!\n");
+	return 4;
+}
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
new file mode 100644
index 0000000..ff9c24d
--- /dev/null
+++ b/scrub/xfs_scrub.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_XFS_SCRUB_H_
+#define XFS_SCRUB_XFS_SCRUB_H_
+
+#endif /* XFS_SCRUB_XFS_SCRUB_H_ */
diff --git a/tools/find-api-violations.sh b/tools/find-api-violations.sh
index 3b976d3..cb075ba 100755
--- a/tools/find-api-violations.sh
+++ b/tools/find-api-violations.sh
@@ -6,7 +6,7 @@
 
 # NOTE: This script doesn't look for API violations in function parameters.
 
-tool_dirs="copy db estimate fsck fsr growfs io logprint mdrestore mkfs quota repair rtcp"
+tool_dirs="copy db estimate fsck fsr growfs io logprint mdrestore mkfs quota repair rtcp scrub"
 
 # Calls to xfs_* functions in libxfs/*.c without the libxfs_ prefix
 find_possible_api_calls() {



* [PATCH 02/27] xfs_scrub: common error handling
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
  2018-01-06  1:51 ` [PATCH 01/27] xfs_scrub: create online filesystem scrub program Darrick J. Wong
@ 2018-01-06  1:51 ` Darrick J. Wong
  2018-01-12  1:15   ` Eric Sandeen
  2018-01-06  1:51 ` [PATCH 03/27] xfs_scrub: set up command line argument parsing Darrick J. Wong
                   ` (28 subsequent siblings)
  30 siblings, 1 reply; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:51 UTC
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Standardize how we record and report errors.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/common.c    |  141 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/common.h    |   28 +++++++++++
 scrub/xfs_scrub.c |    8 +++
 scrub/xfs_scrub.h |   12 +++++
 4 files changed, 189 insertions(+)
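
To illustrate the reporting pattern this patch standardizes: every report
takes the context lock, prints a severity prefix plus a description, bumps
a counter, and a wrapper macro supplies the call site.  The following is a
stripped-down sketch of that pattern and not the patch's code.  All demo_*
names are hypothetical stand-ins, and the sketch always prints the file and
line, whereas the patch only does so when debug is set.

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct scrub_ctx; only the error counters. */
struct demo_ctx {
	pthread_mutex_t		lock;
	unsigned long long	max_errors;	/* 0 means "no limit" */
	unsigned long long	errors_found;
};

void
demo_ctx_init(
	struct demo_ctx		*ctx,
	unsigned long long	max_errors)
{
	assert(ctx != NULL);
	pthread_mutex_init(&ctx->lock, NULL);
	ctx->max_errors = max_errors;
	ctx->errors_found = 0;
}

/* Print one error and count it; the lock keeps output lines intact. */
void
demo_error_at(
	struct demo_ctx		*ctx,
	const char		*descr,
	const char		*file,
	int			line,
	const char		*format,
	...)
{
	va_list			args;

	pthread_mutex_lock(&ctx->lock);
	fprintf(stderr, "Error: %s: ", descr);
	va_start(args, format);
	vfprintf(stderr, format, args);
	va_end(args);
	fprintf(stderr, " (%s line %d)\n", file, line);
	ctx->errors_found++;
	pthread_mutex_unlock(&ctx->lock);
}

/* Callers use the macro so that the call site is recorded automatically. */
#define demo_error(ctx, str, ...) \
	demo_error_at(ctx, str, __FILE__, __LINE__, __VA_ARGS__)

/* Too many errors?  The counter is read under the same lock. */
bool
demo_excessive_errors(
	struct demo_ctx		*ctx)
{
	bool			ret;

	pthread_mutex_lock(&ctx->lock);
	ret = ctx->max_errors > 0 && ctx->errors_found >= ctx->max_errors;
	pthread_mutex_unlock(&ctx->lock);
	return ret;
}
```

Holding one lock around both the output and the counter update means that
messages from concurrent scrub threads cannot interleave mid-line, and the
threshold check never observes a half-updated count.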


diff --git a/scrub/common.c b/scrub/common.c
index 0a58c16..3c89b7d 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -17,4 +17,145 @@
  * along with this program; if not, write the Free Software Foundation,
  * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+#include <stdio.h>
+#include <pthread.h>
+#include <stdbool.h>
+#include "platform_defs.h"
+#include "xfs.h"
+#include "xfs_scrub.h"
 #include "common.h"
+
+/*
+ * Reporting Status to the Console
+ *
+ * We aim for a roughly standard reporting format -- the severity of the
+ * status being reported, a textual description of the object being
+ * reported, and whatever the status happens to be.
+ *
+ * Errors are the most severe and reflect filesystem corruption.
+ * Warnings indicate that something is amiss and needs the attention of
+ * the administrator, but does not constitute a corruption.  Information
+ * is merely advisory.
+ */
+
+/* Too many errors? Bail out. */
+bool
+xfs_scrub_excessive_errors(
+	struct scrub_ctx	*ctx)
+{
+	bool			ret;
+
+	pthread_mutex_lock(&ctx->lock);
+	ret = ctx->max_errors > 0 && ctx->errors_found >= ctx->max_errors;
+	pthread_mutex_unlock(&ctx->lock);
+
+	return ret;
+}
+
+/* Print an error string and whatever error is stored in errno. */
+void
+__str_errno(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	const char		*file,
+	int			line)
+{
+	char			buf[DESCR_BUFSZ];
+
+	pthread_mutex_lock(&ctx->lock);
+	fprintf(stderr, _("Error: %s: %s."), descr,
+			strerror_r(errno, buf, DESCR_BUFSZ));
+	if (debug)
+		fprintf(stderr, _(" (%s line %d)"), file, line);
+	fprintf(stderr, "\n");
+	ctx->runtime_errors++;
+	pthread_mutex_unlock(&ctx->lock);
+}
+
+/* Print an error string and some error text. */
+void
+__str_error(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	const char		*file,
+	int			line,
+	const char		*format,
+	...)
+{
+	va_list			args;
+
+	pthread_mutex_lock(&ctx->lock);
+	fprintf(stderr, _("Error: %s: "), descr);
+	va_start(args, format);
+	vfprintf(stderr, format, args);
+	va_end(args);
+	if (debug)
+		fprintf(stderr, _(" (%s line %d)"), file, line);
+	fprintf(stderr, "\n");
+	ctx->errors_found++;
+	pthread_mutex_unlock(&ctx->lock);
+}
+
+/* Print a warning string and some warning text. */
+void
+__str_warn(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	const char		*file,
+	int			line,
+	const char		*format,
+	...)
+{
+	va_list			args;
+
+	pthread_mutex_lock(&ctx->lock);
+	fprintf(stderr, _("Warning: %s: "), descr);
+	va_start(args, format);
+	vfprintf(stderr, format, args);
+	va_end(args);
+	if (debug)
+		fprintf(stderr, _(" (%s line %d)"), file, line);
+	fprintf(stderr, "\n");
+	ctx->warnings_found++;
+	pthread_mutex_unlock(&ctx->lock);
+}
+
+/* Print an informational string and some informational text. */
+void
+__str_info(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	const char		*file,
+	int			line,
+	const char		*format,
+	...)
+{
+	va_list			args;
+
+	pthread_mutex_lock(&ctx->lock);
+	fprintf(stdout, _("Info: %s: "), descr);
+	va_start(args, format);
+	vfprintf(stdout, format, args);
+	va_end(args);
+	if (debug)
+		fprintf(stdout, _(" (%s line %d)"), file, line);
+	fprintf(stdout, "\n");
+	fflush(stdout);
+	pthread_mutex_unlock(&ctx->lock);
+}
+
+/* Catch fatal errors from pieces we import from xfs_repair. */
+void __attribute__((noreturn))
+do_error(char const *msg, ...)
+{
+	va_list args;
+
+	fprintf(stderr, _("\nfatal error -- "));
+
+	va_start(args, msg);
+	vfprintf(stderr, msg, args);
+	va_end(args);
+	if (dumpcore)
+		abort();
+	exit(1);
+}
diff --git a/scrub/common.h b/scrub/common.h
index 1082296..f620620 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -20,4 +20,32 @@
 #ifndef XFS_SCRUB_COMMON_H_
 #define XFS_SCRUB_COMMON_H_
 
+/*
+ * When reporting a defective metadata object to the console, this
+ * is the size of the buffer to use to store the description of that
+ * item.
+ */
+#define DESCR_BUFSZ	256
+
+bool xfs_scrub_excessive_errors(struct scrub_ctx *ctx);
+
+void __str_errno(struct scrub_ctx *ctx, const char *descr, const char *file,
+		 int line);
+void __str_error(struct scrub_ctx *ctx, const char *descr, const char *file,
+		 int line, const char *format, ...);
+void __str_warn(struct scrub_ctx *ctx, const char *descr, const char *file,
+		int line, const char *format, ...);
+void __str_info(struct scrub_ctx *ctx, const char *descr, const char *file,
+		int line, const char *format, ...);
+void __record_repair(struct scrub_ctx *ctx, const char *descr, const char *file,
+		int line, const char *format, ...);
+void __record_preen(struct scrub_ctx *ctx, const char *descr, const char *file,
+		int line, const char *format, ...);
+
+#define str_errno(ctx, str)		__str_errno(ctx, str, __FILE__, __LINE__)
+#define str_error(ctx, str, ...)	__str_error(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
+#define str_warn(ctx, str, ...)		__str_warn(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
+#define str_info(ctx, str, ...)		__str_info(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
+#define dbg_printf(fmt, ...)		{if (debug > 1) {printf(fmt, __VA_ARGS__);}}
+
 #endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 4f26855..10116a8 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -18,6 +18,8 @@
  * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
  */
 #include <stdio.h>
+#include <pthread.h>
+#include <stdbool.h>
 #include "xfs_scrub.h"
 
 /*
@@ -99,6 +101,12 @@
 /* Program name; needed for libxcmd error reports. */
 char				*progname = "xfs_scrub";
 
+/* Debug level; higher values mean more verbosity. */
+unsigned int			debug;
+
+/* Should we dump core if errors happen? */
+bool				dumpcore;
+
 int
 main(
 	int			argc,
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index ff9c24d..f19ac6b 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -20,4 +20,16 @@
 #ifndef XFS_SCRUB_XFS_SCRUB_H_
 #define XFS_SCRUB_XFS_SCRUB_H_
 
+extern unsigned int		debug;
+extern bool			dumpcore;
+
+struct scrub_ctx {
+	/* Mutable scrub state; use lock. */
+	pthread_mutex_t		lock;
+	unsigned long long	max_errors;
+	unsigned long long	runtime_errors;
+	unsigned long long	errors_found;
+	unsigned long long	warnings_found;
+};
+
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */



* [PATCH 03/27] xfs_scrub: set up command line argument parsing
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
  2018-01-06  1:51 ` [PATCH 01/27] xfs_scrub: create online filesystem scrub program Darrick J. Wong
  2018-01-06  1:51 ` [PATCH 02/27] xfs_scrub: common error handling Darrick J. Wong
@ 2018-01-06  1:51 ` Darrick J. Wong
  2018-01-11 23:39   ` Eric Sandeen
  2018-01-12  1:30   ` Eric Sandeen
  2018-01-06  1:51 ` [PATCH 04/27] xfs_scrub: dispatch the various phases of the scrub program Darrick J. Wong
                   ` (27 subsequent siblings)
  30 siblings, 2 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:51 UTC
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Parse command line options in order to set up the context in which we
will scrub the filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/common.h    |    8 ++
 scrub/xfs_scrub.c |  207 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.h |   34 +++++++++
 3 files changed, 249 insertions(+)


diff --git a/scrub/common.h b/scrub/common.h
index f620620..15a59bd 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -48,4 +48,12 @@ void __record_preen(struct scrub_ctx *ctx, const char *descr, const char *file,
 #define str_info(ctx, str, ...)		__str_info(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
 #define dbg_printf(fmt, ...)		{if (debug > 1) {printf(fmt, __VA_ARGS__);}}
 
+/* Is this debug tweak enabled? */
+static inline bool
+debug_tweak_on(
+	const char		*name)
+{
+	return debug && getenv(name) != NULL;
+}
+
 #endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 10116a8..9db3b41 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -20,7 +20,12 @@
 #include <stdio.h>
 #include <pthread.h>
 #include <stdbool.h>
+#include <stdlib.h>
+#include "platform_defs.h"
+#include "xfs.h"
+#include "input.h"
 #include "xfs_scrub.h"
+#include "common.h"
 
 /*
  * XFS Online Metadata Scrub (and Repair)
@@ -107,11 +112,213 @@ unsigned int			debug;
 /* Should we dump core if errors happen? */
 bool				dumpcore;
 
+/* Display resource usage at the end of each phase? */
+bool				display_rusage;
+
+/* Background mode; higher values insert more pauses between scrub calls. */
+unsigned int			bg_mode;
+
+/* Maximum number of processors available to us. */
+int				nproc;
+
+/* Number of threads we're allowed to use. */
+unsigned int			nr_threads;
+
+/* Verbosity; higher values print more information. */
+bool				verbose;
+
+/* Should we scrub the data blocks? */
+bool				scrub_data;
+
+/* Size of a memory page. */
+long				page_size;
+
+static void __attribute__((noreturn))
+usage(void)
+{
+	fprintf(stderr, _("Usage: %s [OPTIONS] mountpoint\n"), progname);
+	fprintf(stderr, _("-a:\tStop after this many errors are found.\n"));
+	fprintf(stderr, _("-b:\tBackground mode.\n"));
+	fprintf(stderr, _("-e:\tWhat to do if errors are found.\n"));
+	fprintf(stderr, _("-m:\tPath to /etc/mtab.\n"));
+	fprintf(stderr, _("-n:\tDry run.  Do not modify anything.\n"));
+	fprintf(stderr, _("-T:\tDisplay timing/usage information.\n"));
+	fprintf(stderr, _("-v:\tVerbose output.\n"));
+	fprintf(stderr, _("-V:\tPrint version.\n"));
+	fprintf(stderr, _("-x:\tScrub file data too.\n"));
+	fprintf(stderr, _("-y:\tRepair all errors.\n"));
+
+	exit(8);	/* usage or syntax error, per xfs_scrub(8) */
+}
+
 int
 main(
 	int			argc,
 	char			**argv)
 {
+	int			c;
+	char			*mtab = NULL;
+	char			*repairstr = "";
+	struct scrub_ctx	ctx = {0};
+	unsigned long long	total_errors;
+	bool			moveon = true;
+	static bool		injected;
+	int			ret = 0;
+
 	fprintf(stderr, "XXX: This program is not complete!\n");
 	return 4;
+
+	progname = basename(argv[0]);
+	setlocale(LC_ALL, "");
+	bindtextdomain(PACKAGE, LOCALEDIR);
+	textdomain(PACKAGE);
+
+	pthread_mutex_init(&ctx.lock, NULL);
+	ctx.mode = SCRUB_MODE_DEFAULT;
+	ctx.error_action = ERRORS_CONTINUE;
+	while ((c = getopt(argc, argv, "a:bde:m:nTvxVy")) != EOF) {
+		switch (c) {
+		case 'a':
+			ctx.max_errors = cvt_u64(optarg, 10);
+			if (errno) {
+				perror(optarg);
+				usage();
+			}
+			break;
+		case 'b':
+			nr_threads = 1;
+			bg_mode++;
+			break;
+		case 'd':
+			debug++;
+			dumpcore = true;
+			break;
+		case 'e':
+			if (!strcmp("continue", optarg))
+				ctx.error_action = ERRORS_CONTINUE;
+			else if (!strcmp("shutdown", optarg))
+				ctx.error_action = ERRORS_SHUTDOWN;
+			else
+				usage();
+			break;
+		case 'm':
+			mtab = optarg;
+			break;
+		case 'n':
+			if (ctx.mode != SCRUB_MODE_DEFAULT) {
+				fprintf(stderr,
+_("Only one of the options -n or -y may be specified.\n"));
+				return 1;
+			}
+			ctx.mode = SCRUB_MODE_DRY_RUN;
+			break;
+		case 'T':
+			display_rusage = true;
+			break;
+		case 'v':
+			verbose = true;
+			break;
+		case 'V':
+			fprintf(stdout, _("%s version %s\n"), progname,
+					VERSION);
+			fflush(stdout);
+			exit(0);
+		case 'x':
+			scrub_data = true;
+			break;
+		case 'y':
+			if (ctx.mode != SCRUB_MODE_DEFAULT) {
+				fprintf(stderr,
+_("Only one of the options -n or -y may be specified.\n"));
+				return 1;
+			}
+			ctx.mode = SCRUB_MODE_REPAIR;
+			break;
+		case '?':
+			/* fall through */
+		default:
+			usage();
+		}
+	}
+
+	/* Override thread count if debugging. */
+	if (debug_tweak_on("XFS_SCRUB_THREADS")) {
+		unsigned int	x;
+
+		x = cvt_u32(getenv("XFS_SCRUB_THREADS"), 10);
+		if (errno) {
+			perror("nr_threads");
+			usage();
+		}
+		nr_threads = x;
+	}
+
+	if (optind != argc - 1)
+		usage();
+
+	ctx.mntpoint = strdup(argv[optind]);
+
+	/*
+	 * If the user did not specify an explicit mount table, try to use
+	 * /proc/mounts if it is available, else /etc/mtab.  We prefer
+	 * /proc/mounts because it is kernel controlled, while /etc/mtab
+	 * may contain garbage that userspace tools like pam_mounts wrote
+	 * into it.
+	 */
+	if (!mtab) {
+		if (access(_PATH_PROC_MOUNTS, R_OK) == 0)
+			mtab = _PATH_PROC_MOUNTS;
+		else
+			mtab = _PATH_MOUNTED;
+	}
+
+	/* How many CPUs? */
+	nproc = sysconf(_SC_NPROCESSORS_ONLN);
+	if (nproc < 1)
+		nproc = 1;
+
+	/* Set up a page-aligned buffer for read verification. */
+	page_size = sysconf(_SC_PAGESIZE);
+	if (page_size < 0) {
+		str_errno(&ctx, ctx.mntpoint);
+		goto out;
+	}
+
+	if (debug_tweak_on("XFS_SCRUB_FORCE_REPAIR") && !injected) {
+		ctx.mode = SCRUB_MODE_REPAIR;
+		injected = true;
+	}
+
+	if (xfs_scrub_excessive_errors(&ctx))
+		str_info(&ctx, ctx.mntpoint, _("Too many errors; aborting."));
+
+	if (debug_tweak_on("XFS_SCRUB_FORCE_ERROR"))
+		str_error(&ctx, ctx.mntpoint, _("Injecting error."));
+
+out:
+	total_errors = ctx.errors_found + ctx.runtime_errors;
+	if (ctx.need_repair)
+		repairstr = _("  Unmount and run xfs_repair.");
+	if (total_errors && ctx.warnings_found)
+		fprintf(stderr,
+_("%s: %llu errors and %llu warnings found.%s\n"),
+			ctx.mntpoint, total_errors, ctx.warnings_found,
+			repairstr);
+	else if (total_errors && ctx.warnings_found == 0)
+		fprintf(stderr,
+_("%s: %llu errors found.%s\n"),
+			ctx.mntpoint, total_errors, repairstr);
+	else if (total_errors == 0 && ctx.warnings_found)
+		fprintf(stderr,
+_("%s: %llu warnings found.\n"),
+			ctx.mntpoint, ctx.warnings_found);
+	if (ctx.errors_found)
+		ret |= 1;
+	if (ctx.warnings_found)
+		ret |= 2;
+	if (ctx.runtime_errors)
+		ret |= 4;
+	free(ctx.mntpoint);
+
+	return ret;
 }
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index f19ac6b..03d6012 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -20,16 +20,50 @@
 #ifndef XFS_SCRUB_XFS_SCRUB_H_
 #define XFS_SCRUB_XFS_SCRUB_H_
 
+#define _PATH_PROC_MOUNTS	"/proc/mounts"
+
+extern unsigned int		nr_threads;
+extern unsigned int		bg_mode;
 extern unsigned int		debug;
+extern int			nproc;
+extern bool			display_rusage;
 extern bool			dumpcore;
+extern bool			verbose;
+extern bool			scrub_data;
+extern long			page_size;
+
+enum scrub_mode {
+	SCRUB_MODE_DRY_RUN,
+	SCRUB_MODE_PREEN,
+	SCRUB_MODE_REPAIR,
+};
+#define SCRUB_MODE_DEFAULT			SCRUB_MODE_PREEN
+
+enum error_action {
+	ERRORS_CONTINUE,
+	ERRORS_SHUTDOWN,
+};
 
 struct scrub_ctx {
+	/* Immutable scrub state. */
+
+	/* Strings we need for presentation */
+	char			*mntpoint;
+	char			*blkdev;
+
+	/* What does the user want us to do? */
+	enum scrub_mode		mode;
+
+	/* How does the user want us to react to errors? */
+	enum error_action	error_action;
+
 	/* Mutable scrub state; use lock. */
 	pthread_mutex_t		lock;
 	unsigned long long	max_errors;
 	unsigned long long	runtime_errors;
 	unsigned long long	errors_found;
 	unsigned long long	warnings_found;
+	bool			need_repair;
 };
 
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */



* [PATCH 04/27] xfs_scrub: dispatch the various phases of the scrub program
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (2 preceding siblings ...)
  2018-01-06  1:51 ` [PATCH 03/27] xfs_scrub: set up command line argument parsing Darrick J. Wong
@ 2018-01-06  1:51 ` Darrick J. Wong
  2018-01-06  1:51 ` [PATCH 05/27] xfs_scrub: figure out how many threads we're going to need Darrick J. Wong
                   ` (26 subsequent siblings)
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:51 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create the dispatching routines that we'll use to call out to each
separate phase of the program.
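
As a rough illustration of the dispatch idea (not the code in this patch),
a phase table can be a NULL-terminated array of function pointers that is
walked in order until one phase reports failure.  All demo_* names below
are hypothetical stand-ins, and struct demo_ctx only mimics the role of
struct scrub_ctx:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-run state; stands in for struct scrub_ctx. */
struct demo_ctx {
	unsigned int	phases_run;
	unsigned int	fail_at;	/* phase number that fails; 0 = none */
};

/* One table entry per phase; a NULL fn ends the table. */
struct demo_phase {
	const char	*descr;
	bool		(*fn)(struct demo_ctx *);
};

/* Hypothetical phase body: succeeds unless told to fail at this phase. */
static bool
demo_phase_fn(
	struct demo_ctx		*ctx)
{
	ctx->phases_run++;
	return ctx->phases_run != ctx->fail_at;
}

/* Run phases in order; return 0 on success or the failing phase number. */
static unsigned int
demo_run_phases(
	struct demo_ctx		*ctx)
{
	static const struct demo_phase phases[] = {
		{ "Find filesystem geometry.",	demo_phase_fn },
		{ "Check internal metadata.",	demo_phase_fn },
		{ "Scan all inodes.",		demo_phase_fn },
		{ NULL,				NULL },
	};
	const struct demo_phase	*sp;
	unsigned int		phase;

	for (phase = 1, sp = phases; sp->fn; sp++, phase++) {
		if (!sp->fn(ctx))
			return phase;
	}
	return 0;
}
```

Stopping at the first failing phase matches the intent that later phases
build on metadata already checked by earlier ones.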

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 configure.ac          |    1 
 include/builddefs.in  |    1 
 m4/package_libcdev.m4 |   18 +++
 scrub/Makefile        |    4 +
 scrub/common.c        |   63 +++++++++++
 scrub/common.h        |    4 +
 scrub/xfs_scrub.c     |  275 +++++++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 366 insertions(+)


diff --git a/configure.ac b/configure.ac
index f83d581..796a91b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -165,6 +165,7 @@ AC_HAVE_GETFSMAP
 AC_HAVE_STATFS_FLAGS
 AC_HAVE_MAP_SYNC
 AC_HAVE_DEVMAPPER
+AC_HAVE_MALLINFO
 
 if test "$enable_blkid" = yes; then
 AC_HAVE_BLKID_TOPO
diff --git a/include/builddefs.in b/include/builddefs.in
index 9470703..28cf0d8 100644
--- a/include/builddefs.in
+++ b/include/builddefs.in
@@ -119,6 +119,7 @@ HAVE_GETFSMAP = @have_getfsmap@
 HAVE_STATFS_FLAGS = @have_statfs_flags@
 HAVE_MAP_SYNC = @have_map_sync@
 HAVE_DEVMAPPER = @have_devmapper@
+HAVE_MALLINFO = @have_mallinfo@
 
 GCCFLAGS = -funsigned-char -fno-strict-aliasing -Wall
 #	   -Wbitwise -Wno-transparent-union -Wno-old-initializer -Wno-decl
diff --git a/m4/package_libcdev.m4 b/m4/package_libcdev.m4
index 71cedc5..d3955f0 100644
--- a/m4/package_libcdev.m4
+++ b/m4/package_libcdev.m4
@@ -344,3 +344,21 @@ AC_DEFUN([AC_HAVE_MAP_SYNC],
 	AC_MSG_RESULT(no))
     AC_SUBST(have_map_sync)
   ])
+
+#
+# Check if we have a mallinfo libc call
+#
+AC_DEFUN([AC_HAVE_MALLINFO],
+  [ AC_MSG_CHECKING([for mallinfo ])
+    AC_TRY_COMPILE([
+#include <malloc.h>
+    ], [
+         struct mallinfo test;
+
+         test.arena = 0; test.hblkhd = 0; test.uordblks = 0; test.fordblks = 0;
+         test = mallinfo();
+    ], have_mallinfo=yes
+       AC_MSG_RESULT(yes),
+       AC_MSG_RESULT(no))
+    AC_SUBST(have_mallinfo)
+  ])
diff --git a/scrub/Makefile b/scrub/Makefile
index 62cca3b..097ec84 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -27,6 +27,10 @@ LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD)
 LTDEPENDENCIES += $(LIBHANDLE) $(LIBFROG)
 LLDFLAGS = -static
 
+ifeq ($(HAVE_MALLINFO),yes)
+LCFLAGS += -DHAVE_MALLINFO
+endif
+
 default: depend $(LTCOMMAND)
 
 include $(BUILDRULES)
diff --git a/scrub/common.c b/scrub/common.c
index 3c89b7d..9880ab5 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -159,3 +159,66 @@ do_error(char const *msg, ...)
 		abort();
 	exit(1);
 }
+
+double
+timeval_subtract(
+	struct timeval		*tv1,
+	struct timeval		*tv2)
+{
+	return ((tv1->tv_sec - tv2->tv_sec) +
+		((double) (tv1->tv_usec - tv2->tv_usec)) / 1000000);
+}
+
+/* Produce human readable disk space output. */
+double
+auto_space_units(
+	unsigned long long	bytes,
+	char			**units)
+{
+	if (debug > 1)
+		goto no_prefix;
+	if (bytes > (1ULL << 40)) {
+		*units = "TiB";
+		return (double)bytes / (1ULL << 40);
+	} else if (bytes > (1ULL << 30)) {
+		*units = "GiB";
+		return (double)bytes / (1ULL << 30);
+	} else if (bytes > (1ULL << 20)) {
+		*units = "MiB";
+		return (double)bytes / (1ULL << 20);
+	} else if (bytes > (1ULL << 10)) {
+		*units = "KiB";
+		return (double)bytes / (1ULL << 10);
+	}
+
+no_prefix:
+	*units = "B";
+	return bytes;
+}
+
+/* Produce human readable discrete number output. */
+double
+auto_units(
+	unsigned long long	number,
+	char			**units)
+{
+	if (debug > 1)
+		goto no_prefix;
+	if (number > 1000000000000ULL) {
+		*units = "T";
+		return number / 1000000000000.0;
+	} else if (number > 1000000000ULL) {
+		*units = "G";
+		return number / 1000000000.0;
+	} else if (number > 1000000ULL) {
+		*units = "M";
+		return number / 1000000.0;
+	} else if (number > 1000ULL) {
+		*units = "K";
+		return number / 1000.0;
+	}
+
+no_prefix:
+	*units = "";
+	return number;
+}
diff --git a/scrub/common.h b/scrub/common.h
index 15a59bd..3afc616 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -56,4 +56,8 @@ debug_tweak_on(
 	return debug && getenv(name) != NULL;
 }
 
+double timeval_subtract(struct timeval *tv1, struct timeval *tv2);
+double auto_space_units(unsigned long long kilobytes, char **units);
+double auto_units(unsigned long long number, char **units);
+
 #endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 9db3b41..a9c185b 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -21,6 +21,8 @@
 #include <pthread.h>
 #include <stdbool.h>
 #include <stdlib.h>
+#include <sys/time.h>
+#include <sys/resource.h>
 #include "platform_defs.h"
 #include "xfs.h"
 #include "input.h"
@@ -151,6 +153,267 @@ usage(void)
 	exit(16);
 }
 
+#ifndef RUSAGE_BOTH
+# define RUSAGE_BOTH		(-2)
+#endif
+
+/* Get resource usage for ourselves and all children. */
+static int
+scrub_getrusage(
+	struct rusage		*usage)
+{
+	struct rusage		cusage;
+	int			err;
+
+	err = getrusage(RUSAGE_BOTH, usage);
+	if (!err)
+		return err;
+
+	err = getrusage(RUSAGE_SELF, usage);
+	if (err)
+		return err;
+
+	err = getrusage(RUSAGE_CHILDREN, &cusage);
+	if (err)
+		return err;
+
+	usage->ru_minflt += cusage.ru_minflt;
+	usage->ru_majflt += cusage.ru_majflt;
+	usage->ru_nswap += cusage.ru_nswap;
+	usage->ru_inblock += cusage.ru_inblock;
+	usage->ru_oublock += cusage.ru_oublock;
+	usage->ru_msgsnd += cusage.ru_msgsnd;
+	usage->ru_msgrcv += cusage.ru_msgrcv;
+	usage->ru_nsignals += cusage.ru_nsignals;
+	usage->ru_nvcsw += cusage.ru_nvcsw;
+	usage->ru_nivcsw += cusage.ru_nivcsw;
+	return 0;
+}
+
+/*
+ * Scrub Phase Dispatch
+ *
+ * The operations of the scrub program are split up into several
+ * different phases.  Each phase builds upon the metadata checked in the
+ * previous phase, which is to say that we may skip phase (X + 1) if our
+ * scans in phase (X) reveal corruption.  A phase may be skipped
+ * entirely.
+ */
+
+/* Resource usage for each phase. */
+struct phase_rusage {
+	struct rusage		ruse;
+	struct timeval		time;
+	unsigned long long	verified_bytes;
+	void			*brk_start;
+	const char		*descr;
+};
+
+/* Operations for each phase. */
+#define DATASCAN_DUMMY_FN	((void *)1)
+#define REPAIR_DUMMY_FN		((void *)2)
+struct phase_ops {
+	char		*descr;
+	bool		(*fn)(struct scrub_ctx *);
+	bool		must_run;
+};
+
+/* Start tracking resource usage for a phase. */
+static bool
+phase_start(
+	struct phase_rusage	*pi,
+	unsigned int		phase,
+	const char		*descr)
+{
+	int			error;
+
+	memset(pi, 0, sizeof(*pi));
+	error = scrub_getrusage(&pi->ruse);
+	if (error) {
+		perror(_("getrusage"));
+		return false;
+	}
+	pi->brk_start = sbrk(0);
+
+	error = gettimeofday(&pi->time, NULL);
+	if (error) {
+		perror(_("gettimeofday"));
+		return false;
+	}
+
+	pi->descr = descr;
+	if ((verbose || display_rusage) && descr) {
+		fprintf(stdout, _("Phase %u: %s\n"), phase, descr);
+		fflush(stdout);
+	}
+	return true;
+}
+
+/* Report usage stats. */
+static bool
+phase_end(
+	struct phase_rusage	*pi,
+	unsigned int		phase)
+{
+	struct rusage		ruse_now;
+#ifdef HAVE_MALLINFO
+	struct mallinfo		mall_now;
+#endif
+	struct timeval		time_now;
+	char			phasebuf[DESCR_BUFSZ];
+	double			dt;
+	unsigned long long	in, out;
+	unsigned long long	io;
+	double			i, o, t;
+	double			din, dout, dtot;
+	char			*iu, *ou, *tu, *dinu, *doutu, *dtotu;
+	int			error;
+
+	if (!display_rusage)
+		return true;
+
+	error = gettimeofday(&time_now, NULL);
+	if (error) {
+		perror(_("gettimeofday"));
+		return false;
+	}
+	dt = timeval_subtract(&time_now, &pi->time);
+
+	error = scrub_getrusage(&ruse_now);
+	if (error) {
+		perror(_("getrusage"));
+		return false;
+	}
+
+	if (phase)
+		snprintf(phasebuf, DESCR_BUFSZ, _("Phase %u: "), phase);
+	else
+		phasebuf[0] = 0;
+
+#define kbytes(x)	(((unsigned long)(x) + 1023) / 1024)
+#ifdef HAVE_MALLINFO
+
+	mall_now = mallinfo();
+	fprintf(stdout, _("%sMemory used: %luk/%luk (%luk/%luk), "),
+		phasebuf,
+		kbytes(mall_now.arena), kbytes(mall_now.hblkhd),
+		kbytes(mall_now.uordblks), kbytes(mall_now.fordblks));
+#else
+	fprintf(stdout, _("%sMemory used: %luk, "),
+		phasebuf,
+		(unsigned long) kbytes(((char *) sbrk(0)) -
+				       ((char *) pi->brk_start)));
+#endif
+#undef kbytes
+
+	fprintf(stdout, _("time: %5.2f/%5.2f/%5.2fs\n"),
+		timeval_subtract(&time_now, &pi->time),
+		timeval_subtract(&ruse_now.ru_utime, &pi->ruse.ru_utime),
+		timeval_subtract(&ruse_now.ru_stime, &pi->ruse.ru_stime));
+
+	/* I/O usage */
+	in =  ((unsigned long long)ruse_now.ru_inblock -
+			pi->ruse.ru_inblock) << BBSHIFT;
+	out = ((unsigned long long)ruse_now.ru_oublock -
+			pi->ruse.ru_oublock) << BBSHIFT;
+	io = in + out;
+	if (io) {
+		i = auto_space_units(in, &iu);
+		o = auto_space_units(out, &ou);
+		t = auto_space_units(io, &tu);
+		din = auto_space_units(in / dt, &dinu);
+		dout = auto_space_units(out / dt, &doutu);
+		dtot = auto_space_units(io / dt, &dtotu);
+		fprintf(stdout,
+_("%sI/O: %.1f%s in, %.1f%s out, %.1f%s tot\n"),
+			phasebuf, i, iu, o, ou, t, tu);
+		fprintf(stdout,
+_("%sI/O rate: %.1f%s/s in, %.1f%s/s out, %.1f%s/s tot\n"),
+			phasebuf, din, dinu, dout, doutu, dtot, dtotu);
+	}
+	fflush(stdout);
+
+	return true;
+}
+
+/* Run all the phases of the scrubber. */
+static bool
+run_scrub_phases(
+	struct scrub_ctx	*ctx)
+{
+	struct phase_ops phases[] =
+	{
+		{
+			.descr = _("Find filesystem geometry."),
+		},
+		{
+			.descr = _("Check internal metadata."),
+		},
+		{
+			.descr = _("Scan all inodes."),
+		},
+		{
+			.descr = _("Defer filesystem repairs."),
+			.fn = REPAIR_DUMMY_FN,
+		},
+		{
+			.descr = _("Check directory tree."),
+		},
+		{
+			.descr = _("Verify data file integrity."),
+			.fn = DATASCAN_DUMMY_FN,
+		},
+		{
+			.descr = _("Check summary counters."),
+		},
+		{
+			NULL
+		},
+	};
+	struct phase_rusage	pi;
+	struct phase_ops	*sp;
+	bool			moveon = true;
+	unsigned int		debug_phase = 0;
+	unsigned int		phase;
+
+	if (debug && debug_tweak_on("XFS_SCRUB_PHASE"))
+		debug_phase = atoi(getenv("XFS_SCRUB_PHASE"));
+
+	/* Run all phases of the scrub tool. */
+	for (phase = 1, sp = phases; sp->fn; sp++, phase++) {
+		/* Skip certain phases unless they're turned on. */
+		if (sp->fn == REPAIR_DUMMY_FN ||
+		    sp->fn == DATASCAN_DUMMY_FN)
+			continue;
+
+		/* Allow debug users to force a particular phase. */
+		if (debug_phase && phase != debug_phase && !sp->must_run)
+			continue;
+
+		/* Run this phase. */
+		moveon = phase_start(&pi, phase, sp->descr);
+		if (!moveon)
+			break;
+		moveon = sp->fn(ctx);
+		if (!moveon) {
+			str_info(ctx, ctx->mntpoint,
+_("Scrub aborted after phase %u."),
+					phase);
+			break;
+		}
+		moveon = phase_end(&pi, phase);
+		if (!moveon)
+			break;
+
+		/* Too many errors? */
+		moveon = !xfs_scrub_excessive_errors(ctx);
+		if (!moveon)
+			break;
+	}
+
+	return moveon;
+}
+
 int
 main(
 	int			argc,
@@ -160,6 +423,7 @@ main(
 	char			*mtab = NULL;
 	char			*repairstr = "";
 	struct scrub_ctx	ctx = {0};
+	struct phase_rusage	all_pi;
 	unsigned long long	total_errors;
 	bool			moveon = true;
 	static bool		injected;
@@ -272,6 +536,11 @@ _("Only one of the options -n or -y may be specified.\n"));
 			mtab = _PATH_MOUNTED;
 	}
 
+	/* Initialize overall phase stats. */
+	moveon = phase_start(&all_pi, 0, NULL);
+	if (!moveon)
+		goto out;
+
 	/* How many CPUs? */
 	nproc = sysconf(_SC_NPROCESSORS_ONLN);
 	if (nproc < 1)
@@ -289,6 +558,11 @@ _("Only one of the options -n or -y may be specified.\n"));
 		injected = true;
 	}
 
+	/* Scrub a filesystem. */
+	moveon = run_scrub_phases(&ctx);
+	if (!moveon)
+		ret |= 4;
+
 	if (xfs_scrub_excessive_errors(&ctx))
 		str_info(&ctx, ctx.mntpoint, _("Too many errors; aborting."));
 
@@ -318,6 +592,7 @@ _("%s: %llu warnings found.\n"),
 		ret |= 2;
 	if (ctx.runtime_errors)
 		ret |= 4;
+	phase_end(&all_pi, 0);
 	free(ctx.mntpoint);
 
 	return ret;



* [PATCH 05/27] xfs_scrub: figure out how many threads we're going to need
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (3 preceding siblings ...)
  2018-01-06  1:51 ` [PATCH 04/27] xfs_scrub: dispatch the various phases of the scrub program Darrick J. Wong
@ 2018-01-06  1:51 ` Darrick J. Wong
  2018-01-06  1:52 ` [PATCH 06/27] xfs_scrub: create an abstraction for a block device Darrick J. Wong
                   ` (25 subsequent siblings)
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:51 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create the plumbing to figure out how many threads we're going to want
to do all of our scrubbing.
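
A minimal sketch of the selection rule, assuming a user override of zero
means "not set" and that a workqueue thread count of zero means "run in
the calling thread".  The demo_* helpers are hypothetical and take plain
integers instead of a scrub context:

```c
/* Pick the thread count: a nonzero override wins over the default. */
static unsigned int
demo_nproc(
	unsigned int		override,
	unsigned int		detected)
{
	if (override)
		return override;
	return detected;
}

/*
 * Thread count for a workqueue: a single worker is collapsed to zero so
 * that the work runs in the main thread instead of a separate one.
 */
static unsigned int
demo_nproc_workqueue(
	unsigned int		override,
	unsigned int		detected)
{
	unsigned int		x;

	x = demo_nproc(override, detected);
	return x == 1 ? 0 : x;
}
```

For example, an override of 4 on an 8-CPU machine yields 4 workers, while
a single-CPU machine with no override yields 0 (no extra threads).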

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/common.c    |   26 ++++++++++++++++++++++++++
 scrub/common.h    |    2 ++
 scrub/xfs_scrub.h |    3 +++
 3 files changed, 31 insertions(+)


diff --git a/scrub/common.c b/scrub/common.c
index 9880ab5..75c6df5 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -222,3 +222,29 @@ auto_units(
 	*units = "";
 	return number;
 }
+
+/* How many threads to kick off? */
+unsigned int
+scrub_nproc(
+	struct scrub_ctx	*ctx)
+{
+	if (nr_threads)
+		return nr_threads;
+	return ctx->nr_io_threads;
+}
+
+/*
+ * How many threads to kick off for a workqueue?  If we only want one
+ * thread, save ourselves the overhead and just run it in the main thread.
+ */
+unsigned int
+scrub_nproc_workqueue(
+	struct scrub_ctx	*ctx)
+{
+	unsigned int		x;
+
+	x = scrub_nproc(ctx);
+	if (x == 1)
+		x = 0;
+	return x;
+}
diff --git a/scrub/common.h b/scrub/common.h
index 3afc616..41b3ea7 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -59,5 +59,7 @@ debug_tweak_on(
 double timeval_subtract(struct timeval *tv1, struct timeval *tv2);
 double auto_space_units(unsigned long long kilobytes, char **units);
 double auto_units(unsigned long long number, char **units);
+unsigned int scrub_nproc(struct scrub_ctx *ctx);
+unsigned int scrub_nproc_workqueue(struct scrub_ctx *ctx);
 
 #endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 03d6012..7f1dcb1 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -57,6 +57,9 @@ struct scrub_ctx {
 	/* How does the user want us to react to errors? */
 	enum error_action	error_action;
 
+	/* Number of threads for metadata scrubbing */
+	unsigned int		nr_io_threads;
+
 	/* Mutable scrub state; use lock. */
 	pthread_mutex_t		lock;
 	unsigned long long	max_errors;



* [PATCH 06/27] xfs_scrub: create an abstraction for a block device
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (4 preceding siblings ...)
  2018-01-06  1:51 ` [PATCH 05/27] xfs_scrub: figure out how many threads we're going to need Darrick J. Wong
@ 2018-01-06  1:52 ` Darrick J. Wong
  2018-01-11 23:24   ` Eric Sandeen
  2018-01-06  1:52 ` [PATCH 07/27] xfs_scrub: find XFS filesystem geometry Darrick J. Wong
                   ` (24 subsequent siblings)
  30 siblings, 1 reply; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:52 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create an abstraction to handle all of our low level disk operations.
We'll eventually use it to bind to a fs mount point and block device.
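
One of the low level decisions is how much parallel IO to aim at a
device.  The sketch below restates that heuristic as a pure function so
it can be read without the ioctl calls; demo_disk_heads and its
parameters are hypothetical, and the inputs are assumed to have been
queried already (device type, rotational flag, minimum and optimal IO
sizes):

```c
#include <stdbool.h>

/*
 * Estimate how many concurrent IOs to issue.  Non-block devices and
 * non-rotational devices get every CPU; otherwise the optimal/minimum
 * IO size ratio is used as a guess at the number of spindles, capped at
 * the CPU count; with no usable hint, assume two.
 */
static unsigned int
demo_disk_heads(
	unsigned int		ncpus,
	bool			is_blkdev,
	bool			rotational,
	int			iomin,
	int			ioopt)
{
	unsigned int		ratio;

	if (!is_blkdev || !rotational)
		return ncpus;

	if (iomin > 0 && ioopt > 0) {
		ratio = (unsigned int)(ioopt / iomin);
		if (ratio < 1)
			ratio = 1;
		return ratio < ncpus ? ratio : ncpus;
	}

	return 2;
}
```

With 8 CPUs and a rotating device reporting iomin=512 and ioopt=2048,
this gives 4.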

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile |    2 +
 scrub/disk.c   |  164 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/disk.h   |   39 +++++++++++++
 3 files changed, 205 insertions(+)
 create mode 100644 scrub/disk.c
 create mode 100644 scrub/disk.h


diff --git a/scrub/Makefile b/scrub/Makefile
index 097ec84..c3a9986 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -17,10 +17,12 @@ endif	# scrub_prereqs
 
 HFILES = \
 common.h \
+disk.h \
 xfs_scrub.h
 
 CFILES = \
 common.c \
+disk.c \
 xfs_scrub.c
 
 LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD)
diff --git a/scrub/disk.c b/scrub/disk.c
new file mode 100644
index 0000000..d4bf81f
--- /dev/null
+++ b/scrub/disk.c
@@ -0,0 +1,164 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <sys/statvfs.h>
+#include <sys/vfs.h>
+#include <linux/fs.h>
+#include "platform_defs.h"
+#include "libfrog.h"
+#include "xfs_scrub.h"
+#include "disk.h"
+
+/*
+ * Disk Abstraction
+ *
+ * These routines help us to discover the geometry of a block device,
+ * estimate the number of concurrent IOs that we can send to it, and
+ * abstract the process of performing read verification of disk blocks.
+ */
+
+/* Figure out how many disk heads are available. */
+static unsigned int
+__disk_heads(
+	struct disk		*disk)
+{
+	int			iomin;
+	int			ioopt;
+	unsigned short		rot;
+	int			error;
+
+	/* If it's not a block device, throw all the CPUs at it. */
+	if (!S_ISBLK(disk->d_sb.st_mode))
+		return nproc;
+
+	/* Non-rotational device?  Throw all the CPUs at it. */
+	rot = 1;
+	error = ioctl(disk->d_fd, BLKROTATIONAL, &rot);
+	if (error == 0 && rot == 0)
+		return nproc;
+
+	/*
+	 * Sometimes we can infer the number of devices from the
+	 * min/optimal IO sizes.
+	 */
+	iomin = ioopt = 0;
+	if (ioctl(disk->d_fd, BLKIOMIN, &iomin) == 0 &&
+	    ioctl(disk->d_fd, BLKIOOPT, &ioopt) == 0 &&
+	    iomin > 0 && ioopt > 0) {
+		return min(nproc, max(1, ioopt / iomin));
+	}
+
+	/* Rotating device?  I guess? */
+	return 2;
+}
+
+/* Figure out how many disk heads are available. */
+unsigned int
+disk_heads(
+	struct disk		*disk)
+{
+	if (nr_threads)
+		return nr_threads;
+	return __disk_heads(disk);
+}
+
+/* Open a disk device and discover its geometry. */
+struct disk *
+disk_open(
+	const char		*pathname)
+{
+	struct disk		*disk;
+	int			lba_sz;
+	int			error;
+
+	disk = calloc(1, sizeof(struct disk));
+	if (!disk)
+		return NULL;
+
+	disk->d_fd = open(pathname, O_RDONLY | O_DIRECT | O_NOATIME);
+	if (disk->d_fd < 0)
+		goto out_free;
+
+	/* Try to get LBA size. */
+	error = ioctl(disk->d_fd, BLKSSZGET, &lba_sz);
+	if (error)
+		lba_sz = 512;
+	disk->d_lbalog = log2_roundup(lba_sz);
+
+	/* Obtain disk's stat info. */
+	error = fstat(disk->d_fd, &disk->d_sb);
+	if (error)
+		goto out_close;
+
+	/* Determine bdev size, block size, and offset. */
+	if (S_ISBLK(disk->d_sb.st_mode)) {
+		error = ioctl(disk->d_fd, BLKGETSIZE64, &disk->d_size);
+		if (error)
+			disk->d_size = 0;
+		error = ioctl(disk->d_fd, BLKBSZGET, &disk->d_blksize);
+		if (error)
+			disk->d_blksize = 0;
+		disk->d_start = 0;
+	} else {
+		disk->d_size = disk->d_sb.st_size;
+		disk->d_blksize = disk->d_sb.st_blksize;
+		disk->d_start = 0;
+	}
+
+	return disk;
+out_close:
+	close(disk->d_fd);
+out_free:
+	free(disk);
+	return NULL;
+}
+
+/* Close a disk device. */
+int
+disk_close(
+	struct disk		*disk)
+{
+	int			error = 0;
+
+	if (disk->d_fd >= 0)
+		error = close(disk->d_fd);
+	disk->d_fd = -1;
+	free(disk);
+	return error;
+}
+
+/* Read-verify an extent of a disk device. */
+ssize_t
+disk_read_verify(
+	struct disk		*disk,
+	void			*buf,
+	uint64_t		start,
+	uint64_t		length)
+{
+	return pread(disk->d_fd, buf, length, start);
+}
diff --git a/scrub/disk.h b/scrub/disk.h
new file mode 100644
index 0000000..834678e
--- /dev/null
+++ b/scrub/disk.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_DISK_H_
+#define XFS_SCRUB_DISK_H_
+
+struct disk {
+	struct stat	d_sb;
+	int		d_fd;
+	int		d_lbalog;
+	unsigned int	d_flags;
+	unsigned int	d_blksize;	/* bytes */
+	uint64_t	d_size;		/* bytes */
+	uint64_t	d_start;	/* bytes */
+};
+
+unsigned int disk_heads(struct disk *disk);
+struct disk *disk_open(const char *pathname);
+int disk_close(struct disk *disk);
+ssize_t disk_read_verify(struct disk *disk, void *buf, uint64_t start,
+		uint64_t length);
+
+#endif /* XFS_SCRUB_DISK_H_ */



* [PATCH 07/27] xfs_scrub: find XFS filesystem geometry
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (5 preceding siblings ...)
  2018-01-06  1:52 ` [PATCH 06/27] xfs_scrub: create an abstraction for a block device Darrick J. Wong
@ 2018-01-06  1:52 ` Darrick J. Wong
  2018-01-06  1:52 ` [PATCH 08/27] xfs_scrub: add inode iteration functions Darrick J. Wong
                   ` (23 subsequent siblings)
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:52 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Discover the geometry of the XFS filesystem that we've been told to
scan, and set up some common functions that will be used by the
scrub phases.
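
To make the derived geometry values concrete, here is a small sketch of
how log2 quantities can be computed from block size, inode size and AG
size.  The demo_* helpers are hypothetical stand-ins written for this
example (they are not the library routines), and they assume nonzero
inputs:

```c
#include <stdint.h>

/* Index of the highest set bit, i.e. floor(log2(v)); -1 if v is zero. */
static int
demo_highbit32(
	uint32_t		v)
{
	int			bit = -1;

	while (v) {
		v >>= 1;
		bit++;
	}
	return bit;
}

/* Smallest n such that (1 << n) >= v, for v > 0. */
static int
demo_log2_roundup(
	uint32_t		v)
{
	int			bit = demo_highbit32(v);

	if (v & (v - 1))	/* not a power of two: round up */
		bit++;
	return bit;
}

/* Hypothetical container for the derived values. */
struct demo_geo_logs {
	int			blocklog;
	int			inodelog;
	int			inopblog;	/* log2(inodes per block) */
	int			agblklog;
};

static struct demo_geo_logs
demo_compute_logs(
	uint32_t		blocksize,
	uint32_t		inodesize,
	uint32_t		agblocks)
{
	struct demo_geo_logs	g;

	g.blocklog = demo_highbit32(blocksize);
	g.inodelog = demo_highbit32(inodesize);
	g.inopblog = g.blocklog - g.inodelog;
	g.agblklog = demo_log2_roundup(agblocks);
	return g;
}
```

For 4096-byte blocks and 512-byte inodes this gives blocklog=12,
inodelog=9 and inopblog=3, that is, eight inodes per block.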

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile    |    5 +
 scrub/common.c    |   72 +++++++++++++++++
 scrub/common.h    |   10 ++
 scrub/disk.c      |    3 +
 scrub/phase1.c    |  223 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.c |   35 ++++++++
 scrub/xfs_scrub.h |   29 +++++++
 7 files changed, 376 insertions(+), 1 deletion(-)
 create mode 100644 scrub/phase1.c


diff --git a/scrub/Makefile b/scrub/Makefile
index c3a9986..5239dae 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -23,6 +23,7 @@ xfs_scrub.h
 CFILES = \
 common.c \
 disk.c \
+phase1.c \
 xfs_scrub.c
 
 LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD)
@@ -33,6 +34,10 @@ ifeq ($(HAVE_MALLINFO),yes)
 LCFLAGS += -DHAVE_MALLINFO
 endif
 
+ifeq ($(HAVE_SYNCFS),yes)
+LCFLAGS += -DHAVE_SYNCFS
+endif
+
 default: depend $(LTCOMMAND)
 
 include $(BUILDRULES)
diff --git a/scrub/common.c b/scrub/common.c
index 75c6df5..252809d 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -20,8 +20,11 @@
 #include <stdio.h>
 #include <pthread.h>
 #include <stdbool.h>
+#include <sys/statvfs.h>
 #include "platform_defs.h"
 #include "xfs.h"
+#include "xfs_fs.h"
+#include "path.h"
 #include "xfs_scrub.h"
 #include "common.h"
 
@@ -248,3 +251,72 @@ scrub_nproc_workqueue(
 		x = 0;
 	return x;
 }
+
+/*
+ * Check if the argument is either the device name or mountpoint of a mounted
+ * filesystem.
+ */
+#define MNTTYPE_XFS	"xfs"
+static bool
+find_mountpoint_check(
+	struct stat		*sb,
+	struct mntent		*t)
+{
+	struct stat		ms;
+
+	if (S_ISDIR(sb->st_mode)) {		/* mount point */
+		if (stat(t->mnt_dir, &ms) < 0)
+			return false;
+		if (sb->st_ino != ms.st_ino)
+			return false;
+		if (sb->st_dev != ms.st_dev)
+			return false;
+		if (strcmp(t->mnt_type, MNTTYPE_XFS) != 0)
+			return false;
+	} else {				/* device */
+		if (stat(t->mnt_fsname, &ms) < 0)
+			return false;
+		if (sb->st_rdev != ms.st_rdev)
+			return false;
+		if (strcmp(t->mnt_type, MNTTYPE_XFS) != 0)
+			return false;
+		/*
+		 * Make sure the mountpoint given by mtab is accessible
+		 * before using it.
+		 */
+		if (stat(t->mnt_dir, &ms) < 0)
+			return false;
+	}
+
+	return true;
+}
+
+/* Check that our alleged mountpoint is in mtab */
+bool
+find_mountpoint(
+	char			*mtab,
+	struct scrub_ctx	*ctx)
+{
+	struct mntent_cursor	cursor;
+	struct mntent		*t = NULL;
+	bool			found = false;
+
+	if (platform_mntent_open(&cursor, mtab) != 0) {
+		fprintf(stderr, _("Error: can't get mntent entries.\n"));
+		exit(1);
+	}
+
+	while ((t = platform_mntent_next(&cursor)) != NULL) {
+		/*
+		 * Keep jotting down matching mount details; newer mounts are
+		 * towards the end of the file (hopefully).
+		 */
+		if (find_mountpoint_check(&ctx->mnt_sb, t)) {
+			ctx->mntpoint = strdup(t->mnt_dir);
+			ctx->blkdev = strdup(t->mnt_fsname);
+			found = true;
+		}
+	}
+	platform_mntent_close(&cursor);
+	return found;
+}
diff --git a/scrub/common.h b/scrub/common.h
index 41b3ea7..fed95df 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -62,4 +62,14 @@ double auto_units(unsigned long long number, char **units);
 unsigned int scrub_nproc(struct scrub_ctx *ctx);
 unsigned int scrub_nproc_workqueue(struct scrub_ctx *ctx);
 
+#ifndef HAVE_SYNCFS
+static inline int syncfs(int fd)
+{
+	sync();
+	return 0;
+}
+#endif
+
+bool find_mountpoint(char *mtab, struct scrub_ctx *ctx);
+
 #endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/disk.c b/scrub/disk.c
index d4bf81f..546a06c 100644
--- a/scrub/disk.c
+++ b/scrub/disk.c
@@ -31,6 +31,9 @@
 #include <linux/fs.h>
 #include "platform_defs.h"
 #include "libfrog.h"
+#include "xfs.h"
+#include "path.h"
+#include "xfs_fs.h"
 #include "xfs_scrub.h"
 #include "disk.h"
 
diff --git a/scrub/phase1.c b/scrub/phase1.c
new file mode 100644
index 0000000..65409d3
--- /dev/null
+++ b/scrub/phase1.c
@@ -0,0 +1,223 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <mntent.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+#include <sys/statvfs.h>
+#include <sys/vfs.h>
+#include <fcntl.h>
+#include <dirent.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <pthread.h>
+#include <errno.h>
+#include <linux/fs.h>
+#include "libfrog.h"
+#include "workqueue.h"
+#include "input.h"
+#include "path.h"
+#include "handle.h"
+#include "bitops.h"
+#include "xfs_arch.h"
+#include "xfs_format.h"
+#include "avl64.h"
+#include "list.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "disk.h"
+
+/* Phase 1: Find filesystem geometry (and clean up after) */
+
+/* Shut down the filesystem. */
+void
+xfs_shutdown_fs(
+	struct scrub_ctx		*ctx)
+{
+	int				flag;
+
+	flag = XFS_FSOP_GOING_FLAGS_LOGFLUSH;
+	str_info(ctx, ctx->mntpoint, _("Shutting down filesystem!"));
+	if (ioctl(ctx->mnt_fd, XFS_IOC_GOINGDOWN, &flag))
+		str_errno(ctx, ctx->mntpoint);
+}
+
+/* Clean up the XFS-specific state data. */
+bool
+xfs_cleanup_fs(
+	struct scrub_ctx	*ctx)
+{
+	if (ctx->fshandle)
+		free_handle(ctx->fshandle, ctx->fshandle_len);
+	if (ctx->rtdev)
+		disk_close(ctx->rtdev);
+	if (ctx->logdev)
+		disk_close(ctx->logdev);
+	if (ctx->datadev)
+		disk_close(ctx->datadev);
+	fshandle_destroy();
+	close(ctx->mnt_fd);
+	fs_table_destroy();
+
+	return true;
+}
+
+/*
+ * Bind to the mountpoint, read the XFS geometry, bind to the block devices.
+ * Anything we've already built will be cleaned up by xfs_cleanup_fs.
+ */
+bool
+xfs_setup_fs(
+	struct scrub_ctx		*ctx)
+{
+	struct fs_path			*fsp;
+	int				error;
+
+	/*
+	 * Open the directory with O_NOATIME.  For mountpoints owned
+	 * by root, this should be sufficient to ensure that we have
+	 * CAP_SYS_ADMIN, which we probably need to do anything fancy
+	 * with the (XFS driver) kernel.
+	 */
+	ctx->mnt_fd = open(ctx->mntpoint, O_RDONLY | O_NOATIME | O_DIRECTORY);
+	if (ctx->mnt_fd < 0) {
+		if (errno == EPERM)
+			str_info(ctx, ctx->mntpoint,
+_("Must be root to run scrub."));
+		else
+			str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	error = fstat(ctx->mnt_fd, &ctx->mnt_sb);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+	error = fstatvfs(ctx->mnt_fd, &ctx->mnt_sv);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+	error = fstatfs(ctx->mnt_fd, &ctx->mnt_sf);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	ctx->nr_io_threads = nproc;
+	if (verbose) {
+		fprintf(stdout, _("%s: using %d threads to scrub.\n"),
+				ctx->mntpoint, scrub_nproc(ctx));
+		fflush(stdout);
+	}
+
+	if (!platform_test_xfs_fd(ctx->mnt_fd)) {
+		str_error(ctx, ctx->mntpoint,
+_("Does not appear to be an XFS filesystem!"));
+		return false;
+	}
+
+	/*
+	 * Flush everything out to disk before we start checking.
+	 * This seems to reduce the incidence of stale file handle
+	 * errors when we open things by handle.
+	 */
+	error = syncfs(ctx->mnt_fd);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	/* Retrieve XFS geometry. */
+	error = ioctl(ctx->mnt_fd, XFS_IOC_FSGEOMETRY, &ctx->geo);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	ctx->agblklog = log2_roundup(ctx->geo.agblocks);
+	ctx->blocklog = highbit32(ctx->geo.blocksize);
+	ctx->inodelog = highbit32(ctx->geo.inodesize);
+	ctx->inopblog = ctx->blocklog - ctx->inodelog;
+
+	error = path_to_fshandle(ctx->mntpoint, &ctx->fshandle,
+			&ctx->fshandle_len);
+	if (error) {
+		perror(_("getting fshandle"));
+		return false;
+	}
+
+	/* Go find the XFS devices if we have a usable fsmap. */
+	fs_table_initialise(0, NULL, 0, NULL);
+	errno = 0;
+	fsp = fs_table_lookup(ctx->mntpoint, FS_MOUNT_POINT);
+	if (!fsp) {
+		str_error(ctx, ctx->mntpoint,
+_("Unable to find XFS information."));
+		return false;
+	}
+	memcpy(&ctx->fsinfo, fsp, sizeof(struct fs_path));
+
+	/* Did we find the log and rt devices, if they're present? */
+	if (ctx->geo.logstart == 0 && ctx->fsinfo.fs_log == NULL) {
+		str_error(ctx, ctx->mntpoint,
+_("Unable to find log device path."));
+		return false;
+	}
+	if (ctx->geo.rtblocks && ctx->fsinfo.fs_rt == NULL) {
+		str_error(ctx, ctx->mntpoint,
+_("Unable to find realtime device path."));
+		return false;
+	}
+
+	/* Open the raw devices. */
+	ctx->datadev = disk_open(ctx->fsinfo.fs_name);
+	if (!ctx->datadev) {
+		str_errno(ctx, ctx->fsinfo.fs_name);
+		return false;
+	}
+
+	if (ctx->fsinfo.fs_log) {
+		ctx->logdev = disk_open(ctx->fsinfo.fs_log);
+		if (!ctx->logdev) {
+			str_errno(ctx, ctx->fsinfo.fs_log);
+			return false;
+		}
+	}
+	if (ctx->fsinfo.fs_rt) {
+		ctx->rtdev = disk_open(ctx->fsinfo.fs_rt);
+		if (!ctx->rtdev) {
+			str_errno(ctx, ctx->fsinfo.fs_rt);
+			return false;
+		}
+	}
+
+	/*
+	 * Everything's set up, which means any failures recorded after
+	 * this point are most probably corruption errors (as opposed to
+	 * purely setup errors).
+	 */
+	ctx->need_repair = true;
+	return true;
+}
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index a9c185b..a733b8f 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -23,9 +23,12 @@
 #include <stdlib.h>
 #include <sys/time.h>
 #include <sys/resource.h>
+#include <sys/statvfs.h>
 #include "platform_defs.h"
 #include "xfs.h"
+#include "xfs_fs.h"
 #include "input.h"
+#include "path.h"
 #include "xfs_scrub.h"
 #include "common.h"
 
@@ -345,6 +348,8 @@ run_scrub_phases(
 	{
 		{
 			.descr = _("Find filesystem geometry."),
+			.fn = xfs_setup_fs,
+			.must_run = true,
 		},
 		{
 			.descr = _("Check internal metadata."),
@@ -426,6 +431,7 @@ main(
 	struct phase_rusage	all_pi;
 	unsigned long long	total_errors;
 	bool			moveon = true;
+	bool			ismnt;
 	static bool		injected;
 	int			ret = 0;
 
@@ -522,6 +528,15 @@ _("Only one of the options -n or -y may be specified.\n"));
 
 	ctx.mntpoint = strdup(argv[optind]);
 
+	/* Find the mount record for the passed-in argument. */
+	if (stat(argv[optind], &ctx.mnt_sb) < 0) {
+		fprintf(stderr,
+			_("%s: could not stat: %s: %s\n"),
+			progname, argv[optind], strerror(errno));
+		ret |= 8;
+		goto out;
+	}
+
 	/*
 	 * If the user did not specify an explicit mount table, try to use
 	 * /proc/mounts if it is available, else /etc/mtab.  We prefer
@@ -541,6 +556,15 @@ _("Only one of the options -n or -y may be specified.\n"));
 	if (!moveon)
 		goto out;
 
+	ismnt = find_mountpoint(mtab, &ctx);
+	if (!ismnt) {
+		fprintf(stderr,
+_("%s: Not an XFS mount point or block device.\n"),
+			ctx.mntpoint);
+		ret |= 8;
+		goto out;
+	}
+
 	/* How many CPUs? */
 	nproc = sysconf(_SC_NPROCESSORS_ONLN);
 	if (nproc < 1)
@@ -569,6 +593,11 @@ _("Only one of the options -n or -y may be specified.\n"));
 	if (debug_tweak_on("XFS_SCRUB_FORCE_ERROR"))
 		str_error(&ctx, ctx.mntpoint, _("Injecting error."));
 
+	/* Clean up scan data. */
+	moveon = xfs_cleanup_fs(&ctx);
+	if (!moveon)
+		ret |= 8;
+
 out:
 	total_errors = ctx.errors_found + ctx.runtime_errors;
 	if (ctx.need_repair)
@@ -586,13 +615,17 @@ _("%s: %llu errors found.%s\n"),
 		fprintf(stderr,
 _("%s: %llu warnings found.\n"),
 			ctx.mntpoint, ctx.warnings_found);
-	if (ctx.errors_found)
+	if (ctx.errors_found) {
+		if (ctx.error_action == ERRORS_SHUTDOWN)
+			xfs_shutdown_fs(&ctx);
 		ret |= 1;
+	}
 	if (ctx.warnings_found)
 		ret |= 2;
 	if (ctx.runtime_errors)
 		ret |= 4;
 	phase_end(&all_pi, 0);
+	free(ctx.blkdev);
 	free(ctx.mntpoint);
 
 	return ret;
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 7f1dcb1..2be7c65 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -51,15 +51,38 @@ struct scrub_ctx {
 	char			*mntpoint;
 	char			*blkdev;
 
+	/* Mountpoint info */
+	struct stat		mnt_sb;
+	struct statvfs		mnt_sv;
+	struct statfs		mnt_sf;
+
+	/* Open block devices */
+	struct disk		*datadev;
+	struct disk		*logdev;
+	struct disk		*rtdev;
+
 	/* What does the user want us to do? */
 	enum scrub_mode		mode;
 
 	/* How does the user want us to react to errors? */
 	enum error_action	error_action;
 
+	/* fd to filesystem mount point */
+	int			mnt_fd;
+
 	/* Number of threads for metadata scrubbing */
 	unsigned int		nr_io_threads;
 
+	/* XFS specific geometry */
+	struct xfs_fsop_geom	geo;
+	struct fs_path		fsinfo;
+	unsigned int		agblklog;
+	unsigned int		blocklog;
+	unsigned int		inodelog;
+	unsigned int		inopblog;
+	void			*fshandle;
+	size_t			fshandle_len;
+
 	/* Mutable scrub state; use lock. */
 	pthread_mutex_t		lock;
 	unsigned long long	max_errors;
@@ -67,6 +90,12 @@ struct scrub_ctx {
 	unsigned long long	errors_found;
 	unsigned long long	warnings_found;
 	bool			need_repair;
+	bool			preen_triggers[XFS_SCRUB_TYPE_NR];
 };
 
+/* Phase helper functions */
+void xfs_shutdown_fs(struct scrub_ctx *ctx);
+bool xfs_cleanup_fs(struct scrub_ctx *ctx);
+bool xfs_setup_fs(struct scrub_ctx *ctx);
+
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */
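
For readers following the geometry math in xfs_setup_fs(), here is a
minimal standalone sketch of how the four shift counts could be derived
from the geometry fields.  The sketch_* names and the sample numbers are
assumptions made for illustration; they stand in for the highbit32() and
log2_roundup() helpers and are not the xfsprogs implementations.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the highest set bit, or -1 if v is zero (stand-in helper). */
static int sketch_highbit32(uint32_t v)
{
	int bit = -1;

	while (v) {
		bit++;
		v >>= 1;
	}
	return bit;
}

/* Smallest n such that (1 << n) >= v (stand-in helper). */
static unsigned int sketch_log2_roundup(uint32_t v)
{
	unsigned int n = 0;

	while (n < 32 && ((uint64_t)1 << n) < v)
		n++;
	return n;
}

/* Hypothetical container for the shift counts kept in the scrub context. */
struct sketch_geo_logs {
	unsigned int agblklog;	/* bits needed for a block number in an AG */
	unsigned int blocklog;	/* log2 of the block size in bytes */
	unsigned int inodelog;	/* log2 of the inode size in bytes */
	unsigned int inopblog;	/* log2 of inodes per block */
};

static struct sketch_geo_logs
sketch_compute_logs(uint32_t agblocks, uint32_t blocksize, uint32_t inodesize)
{
	struct sketch_geo_logs l;

	l.agblklog = sketch_log2_roundup(agblocks);
	l.blocklog = (unsigned int)sketch_highbit32(blocksize);
	l.inodelog = (unsigned int)sketch_highbit32(inodesize);
	l.inopblog = l.blocklog - l.inodelog;
	return l;
}
```

With 4096-byte blocks and 512-byte inodes this gives inopblog = 3, i.e.
eight inodes per block.  agblklog is rounded up because the AG size need
not be a power of two.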



* [PATCH 08/27] xfs_scrub: add inode iteration functions
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (6 preceding siblings ...)
  2018-01-06  1:52 ` [PATCH 07/27] xfs_scrub: find XFS filesystem geometry Darrick J. Wong
@ 2018-01-06  1:52 ` Darrick J. Wong
  2018-01-06  1:52 ` [PATCH 09/27] xfs_scrub: add space map " Darrick J. Wong
                   ` (22 subsequent siblings)
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:52 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

These helpers enable userspace to count or iterate all inodes in a
filesystem.  The counting function uses INUMBERS, while the inode
iterator uses INUMBERS and BULKSTAT to iterate over every inode that
should be in the filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile |    2 
 scrub/inodes.c |  284 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/inodes.h |   32 ++++++
 3 files changed, 318 insertions(+)
 create mode 100644 scrub/inodes.c
 create mode 100644 scrub/inodes.h
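
The INUMBERS/BULKSTAT cross-check described above can be pictured with a
small standalone sketch.  It works on plain arrays instead of ioctl
results, and the function name and parameters are made up for this note;
it only shows the idea of comparing the allocation mask against what the
bulk stat actually returned, not the code in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INODES_PER_CHUNK	64

/*
 * Given one chunk's start inode and allocation mask (as INUMBERS would
 * report them) and the sorted inode numbers that a bulk stat returned,
 * record the allocated inodes that the bulk stat skipped.  missing[]
 * needs room for SKETCH_INODES_PER_CHUNK entries.  Returns the number
 * of skipped inodes.
 */
static size_t
sketch_find_skipped(
	uint64_t	startino,
	uint64_t	allocmask,
	const uint64_t	*got,
	size_t		ngot,
	uint64_t	*missing)
{
	size_t		next = 0;
	size_t		nmissing = 0;
	int		i;

	for (i = 0; i < SKETCH_INODES_PER_CHUNK; i++) {
		uint64_t	want = startino + i;

		/* Free slot in the chunk; nothing is expected here. */
		if (!(allocmask & (1ULL << i)))
			continue;

		/* The bulk stat returned the inode we expected. */
		if (next < ngot && got[next] == want) {
			next++;
			continue;
		}

		/* Allocated per the mask but absent from the results. */
		missing[nmissing++] = want;
	}
	return nmissing;
}
```

An inode reported as skipped is one that the inobt says is allocated but
that could not be loaded, which is the case where the iterator falls back
to a single-inode query and, failing that, a zeroed stat record.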


diff --git a/scrub/Makefile b/scrub/Makefile
index 5239dae..4d1c908 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -18,11 +18,13 @@ endif	# scrub_prereqs
 HFILES = \
 common.h \
 disk.h \
+inodes.h \
 xfs_scrub.h
 
 CFILES = \
 common.c \
 disk.c \
+inodes.c \
 phase1.c \
 xfs_scrub.c
 
diff --git a/scrub/inodes.c b/scrub/inodes.c
new file mode 100644
index 0000000..694bca7
--- /dev/null
+++ b/scrub/inodes.c
@@ -0,0 +1,284 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <pthread.h>
+#include <sys/statvfs.h>
+#include "platform_defs.h"
+#include "xfs.h"
+#include "xfs_arch.h"
+#include "xfs_format.h"
+#include "handle.h"
+#include "path.h"
+#include "workqueue.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "inodes.h"
+
+/*
+ * Iterate a range of inodes.
+ *
+ * This is a little more involved than repeatedly asking BULKSTAT for a
+ * buffer's worth of stat data for some number of inodes.  We want to
+ * scan as many of the inodes that the inobt thinks there are, including
+ * the ones that are broken, but if we ask for n inodes starting at x,
+ * BULKSTAT will skip the bad ones and fill from beyond the range (x + n).
+ *
+ * Therefore, we ask INUMBERS to return one inobt chunk's worth of inode
+ * bitmap information.  Then we try to BULKSTAT only the inodes that
+ * were present in that chunk, and compare what we got against what
+ * INUMBERS said was there.  If there's a mismatch, we know that we have
+ * an inode that fails the verifiers, so we can inject the bulkstat
+ * information to force the scrub code to deal with the broken inodes.
+ *
+ * If the iteration function returns ESTALE, that means that the inode
+ * has been deleted and possibly recreated since the BULKSTAT call.  We
+ * will refresh the stat information and try again up to 30 times before
+ * reporting the staleness as an error.
+ */
+
+/*
+ * Call into the filesystem for inode/bulkstat information and call our
+ * iterator function.  We'll try to fill the bulkstat information in
+ * batches, but we also can detect iget failures.
+ */
+static bool
+xfs_iterate_inodes_range(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	void			*fshandle,
+	uint64_t		first_ino,
+	uint64_t		last_ino,
+	xfs_inode_iter_fn	fn,
+	void			*arg)
+{
+	struct xfs_fsop_bulkreq	igrpreq = {0};
+	struct xfs_fsop_bulkreq	bulkreq = {0};
+	struct xfs_fsop_bulkreq	onereq = {0};
+	struct xfs_handle	handle;
+	struct xfs_inogrp	inogrp;
+	struct xfs_bstat	bstat[XFS_INODES_PER_CHUNK] = {0};
+	char			idescr[DESCR_BUFSZ];
+	char			buf[DESCR_BUFSZ];
+	struct xfs_bstat	*bs;
+	__u64			last_stale = first_ino - 1;
+	__u64			igrp_ino;
+	__u64			oneino;
+	__u64			ino;
+	__s32			bulklen = 0;
+	__s32			onelen = 0;
+	__s32			igrplen = 0;
+	bool			moveon = true;
+	int			i;
+	int			error;
+	int			stale_count = 0;
+
+	onereq.lastip  = &oneino;
+	onereq.icount  = 1;
+	onereq.ocount  = &onelen;
+
+	bulkreq.lastip  = &ino;
+	bulkreq.icount  = XFS_INODES_PER_CHUNK;
+	bulkreq.ubuffer = &bstat;
+	bulkreq.ocount  = &bulklen;
+
+	igrpreq.lastip  = &igrp_ino;
+	igrpreq.icount  = 1;
+	igrpreq.ubuffer = &inogrp;
+	igrpreq.ocount  = &igrplen;
+
+	memcpy(&handle.ha_fsid, fshandle, sizeof(handle.ha_fsid));
+	handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
+			sizeof(handle.ha_fid.fid_len);
+	handle.ha_fid.fid_pad = 0;
+
+	/* Find the inode chunk & alloc mask */
+	igrp_ino = first_ino;
+	error = ioctl(ctx->mnt_fd, XFS_IOC_FSINUMBERS, &igrpreq);
+	while (!error && igrplen) {
+		/* Load the inodes. */
+		ino = inogrp.xi_startino - 1;
+		bulkreq.icount = inogrp.xi_alloccount;
+		error = ioctl(ctx->mnt_fd, XFS_IOC_FSBULKSTAT, &bulkreq);
+		if (error)
+			str_warn(ctx, descr, "%s", strerror_r(errno,
+						buf, DESCR_BUFSZ));
+
+		/* Did we get exactly the inodes we expected? */
+		for (i = 0, bs = bstat; i < XFS_INODES_PER_CHUNK; i++) {
+			if (!(inogrp.xi_allocmask & (1ULL << i)))
+				continue;
+			if (bs->bs_ino == inogrp.xi_startino + i) {
+				bs++;
+				continue;
+			}
+
+			/* Load the one inode. */
+			oneino = inogrp.xi_startino + i;
+			onereq.ubuffer = bs;
+			error = ioctl(ctx->mnt_fd, XFS_IOC_FSBULKSTAT_SINGLE,
+					&onereq);
+			if (error || bs->bs_ino != inogrp.xi_startino + i) {
+				memset(bs, 0, sizeof(struct xfs_bstat));
+				bs->bs_ino = inogrp.xi_startino + i;
+				bs->bs_blksize = ctx->mnt_sv.f_frsize;
+			}
+			bs++;
+		}
+
+		/* Iterate all the inodes. */
+		for (i = 0, bs = bstat; i < inogrp.xi_alloccount; i++, bs++) {
+			if (bs->bs_ino > last_ino)
+				goto out;
+
+			handle.ha_fid.fid_ino = bs->bs_ino;
+			handle.ha_fid.fid_gen = bs->bs_gen;
+			error = fn(ctx, &handle, bs, arg);
+			switch (error) {
+			case 0:
+				break;
+			case ESTALE:
+				if (last_stale == inogrp.xi_startino)
+					stale_count++;
+				else {
+					last_stale = inogrp.xi_startino;
+					stale_count = 0;
+				}
+				if (stale_count < 30) {
+					igrp_ino = inogrp.xi_startino;
+					goto igrp_retry;
+				}
+				snprintf(idescr, DESCR_BUFSZ, "inode %"PRIu64,
+						(uint64_t)bs->bs_ino);
+				str_warn(ctx, idescr, "%s", strerror_r(error,
+						buf, DESCR_BUFSZ));
+				break;
+			case XFS_ITERATE_INODES_ABORT:
+				error = 0;
+				/* fall thru */
+			default:
+				moveon = false;
+				errno = error;
+				goto err;
+			}
+			if (xfs_scrub_excessive_errors(ctx)) {
+				moveon = false;
+				goto out;
+			}
+		}
+
+igrp_retry:
+		error = ioctl(ctx->mnt_fd, XFS_IOC_FSINUMBERS, &igrpreq);
+	}
+
+err:
+	if (error) {
+		str_errno(ctx, descr);
+		moveon = false;
+	}
+out:
+	return moveon;
+}
+
+/* BULKSTAT wrapper routines. */
+struct xfs_scan_inodes {
+	xfs_inode_iter_fn	fn;
+	void			*arg;
+	bool			moveon;
+};
+
+/* Scan all the inodes in an AG. */
+static void
+xfs_scan_ag_inodes(
+	struct workqueue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	struct xfs_scan_inodes	*si = arg;
+	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
+	char			descr[DESCR_BUFSZ];
+	uint64_t		ag_ino;
+	uint64_t		next_ag_ino;
+	bool			moveon;
+
+	snprintf(descr, DESCR_BUFSZ, _("dev %d:%d AG %u inodes"),
+				major(ctx->fsinfo.fs_datadev),
+				minor(ctx->fsinfo.fs_datadev),
+				agno);
+
+	ag_ino = (__u64)agno << (ctx->inopblog + ctx->agblklog);
+	next_ag_ino = (__u64)(agno + 1) << (ctx->inopblog + ctx->agblklog);
+
+	moveon = xfs_iterate_inodes_range(ctx, descr, ctx->fshandle, ag_ino,
+			next_ag_ino - 1, si->fn, si->arg);
+	if (!moveon)
+		si->moveon = false;
+}
+
+/* Scan all the inodes in a filesystem. */
+bool
+xfs_scan_all_inodes(
+	struct scrub_ctx	*ctx,
+	xfs_inode_iter_fn	fn,
+	void			*arg)
+{
+	struct xfs_scan_inodes	si;
+	xfs_agnumber_t		agno;
+	struct workqueue	wq;
+	int			ret;
+
+	si.moveon = true;
+	si.fn = fn;
+	si.arg = arg;
+
+	ret = workqueue_create(&wq, (struct xfs_mount *)ctx,
+			scrub_nproc_workqueue(ctx));
+	if (ret) {
+		str_error(ctx, ctx->mntpoint, _("Could not create workqueue."));
+		return false;
+	}
+
+	for (agno = 0; agno < ctx->geo.agcount; agno++) {
+		ret = workqueue_add(&wq, xfs_scan_ag_inodes, agno, &si);
+		if (ret) {
+			si.moveon = false;
+			str_error(ctx, ctx->mntpoint,
+_("Could not queue AG %u bulkstat work."), agno);
+			break;
+		}
+	}
+
+	workqueue_destroy(&wq);
+
+	return si.moveon;
+}
+
+/*
+ * Open a file by handle, returning either the fd or -1 on error.
+ */
+int
+xfs_open_handle(
+	struct xfs_handle	*handle)
+{
+	return open_by_fshandle(handle, sizeof(*handle),
+			O_RDONLY | O_NOATIME | O_NOFOLLOW | O_NOCTTY);
+}
diff --git a/scrub/inodes.h b/scrub/inodes.h
new file mode 100644
index 0000000..693cb05
--- /dev/null
+++ b/scrub/inodes.h
@@ -0,0 +1,32 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_INODES_H_
+#define XFS_SCRUB_INODES_H_
+
+typedef int (*xfs_inode_iter_fn)(struct scrub_ctx *ctx,
+		struct xfs_handle *handle, struct xfs_bstat *bs, void *arg);
+
+#define XFS_ITERATE_INODES_ABORT	(-1)
+bool xfs_scan_all_inodes(struct scrub_ctx *ctx, xfs_inode_iter_fn fn,
+		void *arg);
+
+int xfs_open_handle(struct xfs_handle *handle);
+
+#endif /* XFS_SCRUB_INODES_H_ */
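
As a worked example of the per-AG inode range that xfs_scan_ag_inodes()
hands to the iterator, the following sketch computes the first and last
inode number of an AG from the two shift counts.  The function name is
hypothetical and the numbers in the note below are sample values, not
taken from a real filesystem.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Inode numbers place the AG number above (inopblog + agblklog) low
 * bits, so the inodes of one AG form a contiguous numeric range.
 */
static void
sketch_ag_ino_range(
	uint32_t	agno,
	unsigned int	inopblog,
	unsigned int	agblklog,
	uint64_t	*first_ino,
	uint64_t	*last_ino)
{
	unsigned int	shift = inopblog + agblklog;

	*first_ino = (uint64_t)agno << shift;
	/* One less than the first inode number of the next AG. */
	*last_ino = ((uint64_t)(agno + 1) << shift) - 1;
}
```

For agno = 2 with inopblog = 3 and agblklog = 10, the shift is 13 bits and
the range runs from 16384 to 24575 inclusive.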



* [PATCH 09/27] xfs_scrub: add space map iteration functions
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (7 preceding siblings ...)
  2018-01-06  1:52 ` [PATCH 08/27] xfs_scrub: add inode iteration functions Darrick J. Wong
@ 2018-01-06  1:52 ` Darrick J. Wong
  2018-01-06  1:52 ` [PATCH 10/27] xfs_scrub: add file " Darrick J. Wong
                   ` (21 subsequent siblings)
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:52 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

These helpers enable userspace to iterate all the space map information
in a filesystem.  The iteration function uses GETFSMAP.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile   |    2 
 scrub/spacemap.c |  256 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/spacemap.h |   31 +++++++
 3 files changed, 289 insertions(+)
 create mode 100644 scrub/spacemap.c
 create mode 100644 scrub/spacemap.h
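
To make the per-AG query range concrete, here is a minimal sketch of how
a low/high key pair covering one AG could be built.  struct
sketch_fsmap_key is a cut-down, hypothetical stand-in for struct fsmap
(only the fields the keys use), and the device number in the example is
arbitrary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the key fields of struct fsmap. */
struct sketch_fsmap_key {
	uint32_t	device;
	uint32_t	flags;
	uint64_t	physical;	/* byte address on the device */
	uint64_t	owner;
	uint64_t	offset;
};

/* Fill keys[0] (low) and keys[1] (high) so they bracket one AG. */
static void
sketch_ag_keys(
	uint32_t		dev,
	uint32_t		agno,
	uint32_t		agblocks,
	uint32_t		blocksize,
	struct sketch_fsmap_key	keys[2])
{
	/* Widen before multiplying so large AGs do not overflow. */
	uint64_t		bperag = (uint64_t)agblocks * blocksize;

	memset(keys, 0, 2 * sizeof(keys[0]));

	/* Low key: first byte of the AG, every other field zero. */
	keys[0].device = dev;
	keys[0].physical = agno * bperag;

	/* High key: last byte of the AG, every other field maxed out. */
	keys[1].device = dev;
	keys[1].physical = (agno + 1) * bperag - 1;
	keys[1].owner = UINT64_MAX;
	keys[1].offset = UINT64_MAX;
	keys[1].flags = UINT32_MAX;
}
```

Zeroing the low key and maxing out the non-physical fields of the high key
means every record whose physical address falls inside the AG sorts
between the two keys, whatever its owner or offset.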


diff --git a/scrub/Makefile b/scrub/Makefile
index 4d1c908..24e0c44 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -19,6 +19,7 @@ HFILES = \
 common.h \
 disk.h \
 inodes.h \
+spacemap.h \
 xfs_scrub.h
 
 CFILES = \
@@ -26,6 +27,7 @@ common.c \
 disk.c \
 inodes.c \
 phase1.c \
+spacemap.c \
 xfs_scrub.c
 
 LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD)
diff --git a/scrub/spacemap.c b/scrub/spacemap.c
new file mode 100644
index 0000000..2dc6e2b
--- /dev/null
+++ b/scrub/spacemap.c
@@ -0,0 +1,256 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <string.h>
+#include <pthread.h>
+#include <sys/statvfs.h>
+#include "workqueue.h"
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "path.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "spacemap.h"
+
+/*
+ * Filesystem space map iterators.
+ *
+ * Logically, we call GETFSMAP to fetch a set of space map records and
+ * call a function to iterate over the records.  However, that's not
+ * what actually happens -- the work is split into separate items, with
+ * each AG, the realtime device, and the log device getting their own
+ * work items.  For an XFS with a realtime device and an external log,
+ * this means that we can have up to ($agcount + 2) threads running at
+ * once.
+ *
+ * This comes into play if we want to have per-workitem memory.  Maybe.
+ * XXX: do we really need all that?
+ */
+
+#define FSMAP_NR	65536
+
+/* Iterate all the fs block mappings between the two keys. */
+bool
+xfs_iterate_fsmap(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	struct fsmap		*keys,
+	xfs_fsmap_iter_fn	fn,
+	void			*arg)
+{
+	struct fsmap_head	*head;
+	struct fsmap		*p;
+	bool			moveon = true;
+	int			i;
+	int			error;
+
+	head = malloc(fsmap_sizeof(FSMAP_NR));
+	if (!head) {
+		str_errno(ctx, descr);
+		return false;
+	}
+
+	memset(head, 0, sizeof(*head));
+	memcpy(head->fmh_keys, keys, sizeof(struct fsmap) * 2);
+	head->fmh_count = FSMAP_NR;
+
+	while ((error = ioctl(ctx->mnt_fd, FS_IOC_GETFSMAP, head)) == 0) {
+		for (i = 0, p = head->fmh_recs;
+		     i < head->fmh_entries;
+		     i++, p++) {
+			moveon = fn(ctx, descr, p, arg);
+			if (!moveon)
+				goto out;
+			if (xfs_scrub_excessive_errors(ctx)) {
+				moveon = false;
+				goto out;
+			}
+		}
+
+		if (head->fmh_entries == 0)
+			break;
+		p = &head->fmh_recs[head->fmh_entries - 1];
+		if (p->fmr_flags & FMR_OF_LAST)
+			break;
+		fsmap_advance(head);
+	}
+
+	if (error) {
+		str_errno(ctx, descr);
+		moveon = false;
+	}
+out:
+	free(head);
+	return moveon;
+}
+
+/* GETFSMAP wrapper routines. */
+struct xfs_scan_blocks {
+	xfs_fsmap_iter_fn	fn;
+	void			*arg;
+	bool			moveon;
+};
+
+/* Iterate all the reverse mappings of an AG. */
+static void
+xfs_scan_ag_blocks(
+	struct workqueue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
+	struct xfs_scan_blocks	*sbx = arg;
+	char			descr[DESCR_BUFSZ];
+	struct fsmap		keys[2];
+	off64_t			bperag;
+	bool			moveon;
+
+	bperag = (off64_t)ctx->geo.agblocks *
+		 (off64_t)ctx->geo.blocksize;
+
+	snprintf(descr, DESCR_BUFSZ, _("dev %d:%d AG %u fsmap"),
+				major(ctx->fsinfo.fs_datadev),
+				minor(ctx->fsinfo.fs_datadev),
+				agno);
+
+	memset(keys, 0, sizeof(struct fsmap) * 2);
+	keys->fmr_device = ctx->fsinfo.fs_datadev;
+	keys->fmr_physical = agno * bperag;
+	(keys + 1)->fmr_device = ctx->fsinfo.fs_datadev;
+	(keys + 1)->fmr_physical = ((agno + 1) * bperag) - 1;
+	(keys + 1)->fmr_owner = ULLONG_MAX;
+	(keys + 1)->fmr_offset = ULLONG_MAX;
+	(keys + 1)->fmr_flags = UINT_MAX;
+
+	moveon = xfs_iterate_fsmap(ctx, descr, keys, sbx->fn, sbx->arg);
+	if (!moveon)
+		sbx->moveon = false;
+}
+
+/* Iterate all the reverse mappings of a standalone device. */
+static void
+xfs_scan_dev_blocks(
+	struct scrub_ctx	*ctx,
+	int			idx,
+	dev_t			dev,
+	struct xfs_scan_blocks	*sbx)
+{
+	struct fsmap		keys[2];
+	char			descr[DESCR_BUFSZ];
+	bool			moveon;
+
+	snprintf(descr, DESCR_BUFSZ, _("dev %d:%d fsmap"),
+			major(dev), minor(dev));
+
+	memset(keys, 0, sizeof(struct fsmap) * 2);
+	keys->fmr_device = dev;
+	(keys + 1)->fmr_device = dev;
+	(keys + 1)->fmr_physical = ULLONG_MAX;
+	(keys + 1)->fmr_owner = ULLONG_MAX;
+	(keys + 1)->fmr_offset = ULLONG_MAX;
+	(keys + 1)->fmr_flags = UINT_MAX;
+
+	moveon = xfs_iterate_fsmap(ctx, descr, keys, sbx->fn, sbx->arg);
+	if (!moveon)
+		sbx->moveon = false;
+}
+
+/* Iterate all the reverse mappings of the realtime device. */
+static void
+xfs_scan_rt_blocks(
+	struct workqueue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
+
+	xfs_scan_dev_blocks(ctx, agno, ctx->fsinfo.fs_rtdev, arg);
+}
+
+/* Iterate all the reverse mappings of the log device. */
+static void
+xfs_scan_log_blocks(
+	struct workqueue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
+
+	xfs_scan_dev_blocks(ctx, agno, ctx->fsinfo.fs_logdev, arg);
+}
+
+/* Scan all the blocks in a filesystem. */
+bool
+xfs_scan_all_spacemaps(
+	struct scrub_ctx	*ctx,
+	xfs_fsmap_iter_fn	fn,
+	void			*arg)
+{
+	struct workqueue	wq;
+	struct xfs_scan_blocks	sbx;
+	xfs_agnumber_t		agno;
+	int			ret;
+
+	sbx.moveon = true;
+	sbx.fn = fn;
+	sbx.arg = arg;
+
+	ret = workqueue_create(&wq, (struct xfs_mount *)ctx,
+			scrub_nproc_workqueue(ctx));
+	if (ret) {
+		str_error(ctx, ctx->mntpoint, _("Could not create workqueue."));
+		return false;
+	}
+	if (ctx->fsinfo.fs_rt) {
+		ret = workqueue_add(&wq, xfs_scan_rt_blocks,
+				ctx->geo.agcount + 1, &sbx);
+		if (ret) {
+			sbx.moveon = false;
+			str_error(ctx, ctx->mntpoint,
+_("Could not queue rtdev fsmap work."));
+			goto out;
+		}
+	}
+	if (ctx->fsinfo.fs_log) {
+		ret = workqueue_add(&wq, xfs_scan_log_blocks,
+				ctx->geo.agcount + 2, &sbx);
+		if (ret) {
+			sbx.moveon = false;
+			str_error(ctx, ctx->mntpoint,
+_("Could not queue logdev fsmap work."));
+			goto out;
+		}
+	}
+	for (agno = 0; agno < ctx->geo.agcount; agno++) {
+		ret = workqueue_add(&wq, xfs_scan_ag_blocks, agno, &sbx);
+		if (ret) {
+			sbx.moveon = false;
+			str_error(ctx, ctx->mntpoint,
+_("Could not queue AG %u fsmap work."), agno);
+			break;
+		}
+	}
+out:
+	workqueue_destroy(&wq);
+
+	return sbx.moveon;
+}
diff --git a/scrub/spacemap.h b/scrub/spacemap.h
new file mode 100644
index 0000000..9ee46f7
--- /dev/null
+++ b/scrub/spacemap.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_SPACEMAP_H_
+#define XFS_SCRUB_SPACEMAP_H_
+
+typedef bool (*xfs_fsmap_iter_fn)(struct scrub_ctx *ctx, const char *descr,
+		struct fsmap *fsr, void *arg);
+
+bool xfs_iterate_fsmap(struct scrub_ctx *ctx, const char *descr,
+		struct fsmap *keys, xfs_fsmap_iter_fn fn, void *arg);
+bool xfs_scan_all_spacemaps(struct scrub_ctx *ctx, xfs_fsmap_iter_fn fn,
+		void *arg);
+
+#endif /* XFS_SCRUB_SPACEMAP_H_ */



* [PATCH 10/27] xfs_scrub: add file space map iteration functions
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (8 preceding siblings ...)
  2018-01-06  1:52 ` [PATCH 09/27] xfs_scrub: add space map " Darrick J. Wong
@ 2018-01-06  1:52 ` Darrick J. Wong
  2018-01-11 23:19   ` Eric Sandeen
  2018-01-06  1:52 ` [PATCH 11/27] xfs_scrub: filesystem counter collection functions Darrick J. Wong
                   ` (20 subsequent siblings)
  30 siblings, 1 reply; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:52 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

These helpers enable userspace to iterate all the space map information
for a file.  The iteration function uses GETBMAPX.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile  |    2 +
 scrub/filemap.c |  158 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/filemap.h |   39 ++++++++++++++
 3 files changed, 199 insertions(+)
 create mode 100644 scrub/filemap.c
 create mode 100644 scrub/filemap.h
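
GETBMAPX works in 512-byte basic blocks while the iterator callback sees
byte units, and the query window has to move forward after each batch.
The sketch below illustrates both pieces of arithmetic on its own.  The
sketch_* helpers mirror what BTOBB()/BBTOB() are understood to do, and
struct sketch_query is a hypothetical stand-in for the offset/length
fields in the request header; none of this is the patch's code.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BBSHIFT	9			/* 512-byte basic blocks */
#define SKETCH_BBSIZE	(1ULL << SKETCH_BBSHIFT)

/* Bytes to basic blocks, rounding up. */
static uint64_t sketch_btobb(uint64_t bytes)
{
	return (bytes + SKETCH_BBSIZE - 1) >> SKETCH_BBSHIFT;
}

/* Basic blocks to bytes. */
static uint64_t sketch_bbtob(uint64_t bbs)
{
	return bbs << SKETCH_BBSHIFT;
}

/* Hypothetical query window, in basic blocks. */
struct sketch_query {
	int64_t		offset;
	int64_t		length;
};

/*
 * Move the window so the next request starts just past the last
 * extent returned, shrinking the remaining length by the same amount.
 */
static void
sketch_advance(
	struct sketch_query	*q,
	int64_t			last_offset,
	int64_t			last_length)
{
	int64_t			new_off = last_offset + last_length;

	q->length -= new_off - q->offset;
	q->offset = new_off;
}
```

Starting from a window of offset 0 and length 1000, a batch whose last
extent covers blocks 100 through 149 leaves a window of offset 150 and
length 850.  The sketch only covers a bounded length; how an open-ended
"to end of file" length behaves across batches is not modelled here.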


diff --git a/scrub/Makefile b/scrub/Makefile
index 24e0c44..a3534e6 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -18,6 +18,7 @@ endif	# scrub_prereqs
 HFILES = \
 common.h \
 disk.h \
+filemap.h \
 inodes.h \
 spacemap.h \
 xfs_scrub.h
@@ -25,6 +26,7 @@ xfs_scrub.h
 CFILES = \
 common.c \
 disk.c \
+filemap.c \
 inodes.c \
 phase1.c \
 spacemap.c \
diff --git a/scrub/filemap.c b/scrub/filemap.c
new file mode 100644
index 0000000..1c3c1cc
--- /dev/null
+++ b/scrub/filemap.c
@@ -0,0 +1,158 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/statvfs.h>
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "path.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "filemap.h"
+
+/*
+ * These routines provide a simple interface to query the block
+ * mappings of the fork of a given inode via GETBMAPX and call a
+ * function to iterate each mapping result.
+ */
+
+#define BMAP_NR		2048
+
+/* Iterate all the extent block mappings between the key and fork end. */
+bool
+xfs_iterate_filemaps(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	int			fd,
+	int			whichfork,
+	struct xfs_bmap		*key,
+	xfs_bmap_iter_fn	fn,
+	void			*arg)
+{
+	struct fsxattr		fsx;
+	struct getbmapx		*map;
+	struct getbmapx		*p;
+	struct xfs_bmap		bmap;
+	char			bmap_descr[DESCR_BUFSZ];
+	bool			moveon = true;
+	xfs_off_t		new_off;
+	int			getxattr_type;
+	int			i;
+	int			error;
+
+	switch (whichfork) {
+	case XFS_ATTR_FORK:
+		snprintf(bmap_descr, DESCR_BUFSZ, _("%s attr"), descr);
+		break;
+	case XFS_COW_FORK:
+		snprintf(bmap_descr, DESCR_BUFSZ, _("%s CoW"), descr);
+		break;
+	case XFS_DATA_FORK:
+		snprintf(bmap_descr, DESCR_BUFSZ, _("%s data"), descr);
+		break;
+	default:
+		abort();
+	}
+
+	map = calloc(BMAP_NR, sizeof(struct getbmapx));
+	if (!map) {
+		str_errno(ctx, bmap_descr);
+		return false;
+	}
+
+	map->bmv_offset = BTOBB(key->bm_offset);
+	map->bmv_block = BTOBB(key->bm_physical);
+	if (key->bm_length == 0)
+		map->bmv_length = ULLONG_MAX;
+	else
+		map->bmv_length = BTOBB(key->bm_length);
+	map->bmv_count = BMAP_NR;
+	map->bmv_iflags = BMV_IF_NO_DMAPI_READ | BMV_IF_PREALLOC |
+			  BMV_IF_NO_HOLES;
+	switch (whichfork) {
+	case XFS_ATTR_FORK:
+		getxattr_type = XFS_IOC_FSGETXATTRA;
+		map->bmv_iflags |= BMV_IF_ATTRFORK;
+		break;
+	case XFS_COW_FORK:
+		map->bmv_iflags |= BMV_IF_COWFORK;
+		getxattr_type = FS_IOC_FSGETXATTR;
+		break;
+	case XFS_DATA_FORK:
+		getxattr_type = FS_IOC_FSGETXATTR;
+		break;
+	default:
+		abort();
+	}
+
+	error = ioctl(fd, getxattr_type, &fsx);
+	if (error < 0) {
+		str_errno(ctx, bmap_descr);
+		moveon = false;
+		goto out;
+	}
+
+	while ((error = ioctl(fd, XFS_IOC_GETBMAPX, map)) == 0) {
+		for (i = 0, p = &map[i + 1]; i < map->bmv_entries; i++, p++) {
+			bmap.bm_offset = BBTOB(p->bmv_offset);
+			bmap.bm_physical = BBTOB(p->bmv_block);
+			bmap.bm_length = BBTOB(p->bmv_length);
+			bmap.bm_flags = p->bmv_oflags;
+			moveon = fn(ctx, bmap_descr, fd, whichfork, &fsx,
+					&bmap, arg);
+			if (!moveon)
+				goto out;
+			if (xfs_scrub_excessive_errors(ctx)) {
+				moveon = false;
+				goto out;
+			}
+		}
+
+		if (map->bmv_entries == 0)
+			break;
+		p = map + map->bmv_entries;
+		if (p->bmv_oflags & BMV_OF_LAST)
+			break;
+
+		new_off = p->bmv_offset + p->bmv_length;
+		map->bmv_length -= new_off - map->bmv_offset;
+		map->bmv_offset = new_off;
+	}
+
+	/*
+	 * Pre-reflink filesystems don't know about CoW forks, so don't
+	 * be too surprised if it fails.
+	 */
+	if (whichfork == XFS_COW_FORK && error && errno == EINVAL)
+		error = 0;
+
+	if (error)
+		str_errno(ctx, bmap_descr);
+out:
+	/* The key is input-only; a struct getbmapx does not fit in *key. */
+	free(map);
+	return moveon;
+}
diff --git a/scrub/filemap.h b/scrub/filemap.h
new file mode 100644
index 0000000..30d53d0
--- /dev/null
+++ b/scrub/filemap.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_FILEMAP_H_
+#define XFS_SCRUB_FILEMAP_H_
+
+/* inode fork block mapping */
+struct xfs_bmap {
+	uint64_t	bm_offset;	/* file offset of segment in bytes */
+	uint64_t	bm_physical;	/* physical starting byte  */
+	uint64_t	bm_length;	/* length of segment, bytes */
+	uint32_t	bm_flags;	/* output flags */
+};
+
+typedef bool (*xfs_bmap_iter_fn)(struct scrub_ctx *ctx, const char *descr,
+		int fd, int whichfork, struct fsxattr *fsx,
+		struct xfs_bmap *bmap, void *arg);
+
+bool xfs_iterate_filemaps(struct scrub_ctx *ctx, const char *descr, int fd,
+		int whichfork, struct xfs_bmap *key, xfs_bmap_iter_fn fn,
+		void *arg);
+
+#endif /* XFS_SCRUB_FILEMAP_H_ */
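
The GETBMAPX loop in xfs_iterate_filemaps() works in 512-byte basic blocks and re-issues the query from wherever the previous batch ended. Below is a minimal sketch of just that arithmetic, not the patch's code. Every sketch_* name is hypothetical, and the two conversion helpers only mirror what BTOBB (round up) and BBTOB are understood to do in the XFS headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the XFS basic-block macros (512-byte units). */
#define SKETCH_BBSHIFT	9
#define SKETCH_BBSIZE	(1ULL << SKETCH_BBSHIFT)

/* Bytes to basic blocks, rounding up. */
static inline uint64_t sketch_btobb(uint64_t bytes)
{
	return (bytes + SKETCH_BBSIZE - 1) >> SKETCH_BBSHIFT;
}

/* Basic blocks to bytes. */
static inline uint64_t sketch_bbtob(uint64_t bbs)
{
	return bbs << SKETCH_BBSHIFT;
}

/* Hypothetical query window, expressed in basic blocks. */
struct sketch_window {
	uint64_t	offset;		/* where the next query starts */
	uint64_t	length;		/* how much is left to query */
};

/*
 * Advance the window past the last mapping of a batch: the next query
 * starts where that mapping ended, and the remaining length shrinks by
 * the distance that was skipped.
 */
static void sketch_advance(struct sketch_window *w, uint64_t last_off,
			   uint64_t last_len)
{
	uint64_t	new_off = last_off + last_len;

	assert(new_off >= w->offset);
	assert(new_off - w->offset <= w->length);
	w->length -= new_off - w->offset;
	w->offset = new_off;
}
```

A caller would run one query, hand each record to its callback, and call sketch_advance() with the final record before querying again, stopping once a batch comes back empty or flagged as the last one.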



* [PATCH 11/27] xfs_scrub: filesystem counter collection functions
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (9 preceding siblings ...)
  2018-01-06  1:52 ` [PATCH 10/27] xfs_scrub: add file " Darrick J. Wong
@ 2018-01-06  1:52 ` Darrick J. Wong
  2018-01-06  1:52 ` [PATCH 12/27] xfs_scrub: wrap the scrub ioctl Darrick J. Wong
                   ` (19 subsequent siblings)
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:52 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Add a couple of helper functions to estimate the inode and block
counters on the filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile     |    2 
 scrub/fscounters.c |  212 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/fscounters.h |   29 +++++++
 3 files changed, 243 insertions(+)
 create mode 100644 scrub/fscounters.c
 create mode 100644 scrub/fscounters.h


diff --git a/scrub/Makefile b/scrub/Makefile
index a3534e6..5397339 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -19,6 +19,7 @@ HFILES = \
 common.h \
 disk.h \
 filemap.h \
+fscounters.h \
 inodes.h \
 spacemap.h \
 xfs_scrub.h
@@ -27,6 +28,7 @@ CFILES = \
 common.c \
 disk.c \
 filemap.c \
+fscounters.c \
 inodes.c \
 phase1.c \
 spacemap.c \
diff --git a/scrub/fscounters.c b/scrub/fscounters.c
new file mode 100644
index 0000000..4294bf3
--- /dev/null
+++ b/scrub/fscounters.c
@@ -0,0 +1,212 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <sys/statvfs.h>
+#include "platform_defs.h"
+#include "xfs.h"
+#include "xfs_arch.h"
+#include "xfs_fs.h"
+#include "xfs_format.h"
+#include "path.h"
+#include "workqueue.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "fscounters.h"
+
+/*
+ * Filesystem counter collection routines.  We can count the number of
+ * inodes in the filesystem, and we can estimate the block counters.
+ */
+
+/* Count the number of inodes in the filesystem. */
+
+/* INUMBERS wrapper routines. */
+struct xfs_count_inodes {
+	bool			moveon;
+	uint64_t		counters[0];
+};
+
+/*
+ * Count the number of inodes.  Use INUMBERS to figure out how many inodes
+ * exist in the filesystem, assuming we've already scrubbed that.
+ */
+static bool
+xfs_count_inodes_range(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	uint64_t		first_ino,
+	uint64_t		last_ino,
+	uint64_t		*count)
+{
+	struct xfs_fsop_bulkreq	igrpreq = {0};
+	struct xfs_inogrp	inogrp;
+	__u64			igrp_ino;
+	uint64_t		nr = 0;
+	__s32			igrplen = 0;
+	int			error;
+
+	ASSERT(!(first_ino & (XFS_INODES_PER_CHUNK - 1)));
+	ASSERT((last_ino & (XFS_INODES_PER_CHUNK - 1)));
+
+	igrpreq.lastip  = &igrp_ino;
+	igrpreq.icount  = 1;
+	igrpreq.ubuffer = &inogrp;
+	igrpreq.ocount  = &igrplen;
+
+	igrp_ino = first_ino;
+	error = ioctl(ctx->mnt_fd, XFS_IOC_FSINUMBERS, &igrpreq);
+	while (!error && igrplen && inogrp.xi_startino < last_ino) {
+		nr += inogrp.xi_alloccount;
+		error = ioctl(ctx->mnt_fd, XFS_IOC_FSINUMBERS, &igrpreq);
+	}
+
+	if (error) {
+		str_errno(ctx, descr);
+		return false;
+	}
+
+	*count = nr;
+	return true;
+}
+
+/* Scan all the inodes in an AG. */
+static void
+xfs_count_ag_inodes(
+	struct workqueue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	struct xfs_count_inodes	*ci = arg;
+	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
+	char			descr[DESCR_BUFSZ];
+	uint64_t		ag_ino;
+	uint64_t		next_ag_ino;
+	bool			moveon;
+
+	snprintf(descr, DESCR_BUFSZ, _("dev %d:%d AG %u inodes"),
+				major(ctx->fsinfo.fs_datadev),
+				minor(ctx->fsinfo.fs_datadev),
+				agno);
+
+	ag_ino = (__u64)agno << (ctx->inopblog + ctx->agblklog);
+	next_ag_ino = (__u64)(agno + 1) << (ctx->inopblog + ctx->agblklog);
+
+	moveon = xfs_count_inodes_range(ctx, descr, ag_ino, next_ag_ino - 1,
+			&ci->counters[agno]);
+	if (!moveon)
+		ci->moveon = false;
+}
+
+/* Count all the inodes in a filesystem. */
+bool
+xfs_count_all_inodes(
+	struct scrub_ctx	*ctx,
+	uint64_t		*count)
+{
+	struct xfs_count_inodes	*ci;
+	xfs_agnumber_t		agno;
+	struct workqueue	wq;
+	bool			moveon;
+	int			ret;
+
+	ci = calloc(1, sizeof(struct xfs_count_inodes) +
+			(ctx->geo.agcount * sizeof(uint64_t)));
+	if (!ci)
+		return false;
+	ci->moveon = true;
+
+	ret = workqueue_create(&wq, (struct xfs_mount *)ctx,
+			scrub_nproc_workqueue(ctx));
+	if (ret) {
+		moveon = false;
+		str_error(ctx, ctx->mntpoint, _("Could not create workqueue."));
+		goto out_free;
+	}
+	for (agno = 0; agno < ctx->geo.agcount; agno++) {
+		ret = workqueue_add(&wq, xfs_count_ag_inodes, agno, ci);
+		if (ret) {
+			moveon = false;
+			str_error(ctx, ctx->mntpoint,
+_("Could not queue AG %u icount work."), agno);
+			break;
+		}
+	}
+	workqueue_destroy(&wq);
+
+	for (agno = 0; agno < ctx->geo.agcount; agno++)
+		*count += ci->counters[agno];
+	moveon = ci->moveon && ret == 0;
+
+out_free:
+	free(ci);
+	return moveon;
+}
+
+/* Estimate the number of blocks and inodes in the filesystem. */
+bool
+xfs_scan_estimate_blocks(
+	struct scrub_ctx		*ctx,
+	unsigned long long		*d_blocks,
+	unsigned long long		*d_bfree,
+	unsigned long long		*r_blocks,
+	unsigned long long		*r_bfree,
+	unsigned long long		*f_files,
+	unsigned long long		*f_free)
+{
+	struct xfs_fsop_counts		fc;
+	struct xfs_fsop_resblks		rb = {0};
+	struct statvfs			sfs;
+	int				error;
+
+	/* Grab the fstatvfs counters, since it has to report accurately. */
+	error = fstatvfs(ctx->mnt_fd, &sfs);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	/* Fetch the filesystem counters. */
+	error = ioctl(ctx->mnt_fd, XFS_IOC_FSCOUNTS, &fc);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	/*
+	 * XFS reserves some blocks to prevent hard ENOSPC, so add those
+	 * blocks back to the free data counts.
+	 */
+	error = ioctl(ctx->mnt_fd, XFS_IOC_GET_RESBLKS, &rb);
+	if (error)
+		str_errno(ctx, ctx->mntpoint);
+	sfs.f_bfree += rb.resblks_avail;
+
+	*d_blocks = sfs.f_blocks + (ctx->geo.logstart ? ctx->geo.logblocks : 0);
+	*d_bfree = sfs.f_bfree;
+	*r_blocks = ctx->geo.rtblocks;
+	*r_bfree = fc.freertx;
+	*f_files = sfs.f_files;
+	*f_free = sfs.f_ffree;
+
+	return true;
+}
diff --git a/scrub/fscounters.h b/scrub/fscounters.h
new file mode 100644
index 0000000..40a4c05
--- /dev/null
+++ b/scrub/fscounters.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_FSCOUNTERS_H_
+#define XFS_SCRUB_FSCOUNTERS_H_
+
+bool xfs_scan_estimate_blocks(struct scrub_ctx *ctx,
+		unsigned long long *d_blocks, unsigned long long *d_bfree,
+		unsigned long long *r_blocks, unsigned long long *r_bfree,
+		unsigned long long *f_files, unsigned long long *f_free);
+bool xfs_count_all_inodes(struct scrub_ctx *ctx, uint64_t *count);
+
+#endif /* XFS_SCRUB_FSCOUNTERS_H_ */
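
xfs_count_ag_inodes() derives each AG's inode number range by shifting the AG number, and xfs_count_all_inodes() sums one counter slot per AG once the workers are done. A rough sketch of those two steps follows. The sketch_* names and the geometry struct are hypothetical; the patch reads inopblog and agblklog from struct scrub_ctx.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry holder for illustration only. */
struct sketch_geom {
	unsigned int	inopblog;	/* log2 of inodes per block */
	unsigned int	agblklog;	/* log2 of blocks per AG, rounded up */
};

/* First inode number that can belong to the given AG. */
static uint64_t sketch_ag_first_ino(const struct sketch_geom *g, uint64_t agno)
{
	assert(g->inopblog + g->agblklog < 64);
	return agno << (g->inopblog + g->agblklog);
}

/* Last inode number of the AG: one less than the next AG's first inode. */
static uint64_t sketch_ag_last_ino(const struct sketch_geom *g, uint64_t agno)
{
	return sketch_ag_first_ino(g, agno + 1) - 1;
}

/* Add up the per-AG slots after all workers have finished. */
static uint64_t sketch_sum_counters(const uint64_t *counters, uint32_t agcount)
{
	uint64_t	total = 0;
	uint32_t	i;

	for (i = 0; i < agcount; i++)
		total += counters[i];
	return total;
}
```

Giving every AG its own slot means the worker threads never write to the same location, so the final sum needs no locking as long as it runs after the workqueue has been torn down.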



* [PATCH 12/27] xfs_scrub: wrap the scrub ioctl
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (10 preceding siblings ...)
  2018-01-06  1:52 ` [PATCH 11/27] xfs_scrub: filesystem counter collection functions Darrick J. Wong
@ 2018-01-06  1:52 ` Darrick J. Wong
  2018-01-11 23:12   ` Eric Sandeen
  2018-01-06  1:52 ` [PATCH 13/27] xfs_scrub: scan filesystem and AG metadata Darrick J. Wong
                   ` (18 subsequent siblings)
  30 siblings, 1 reply; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:52 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create some wrappers to call the scrub ioctls.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile |    2 
 scrub/common.c |   19 ++
 scrub/common.h |    1 
 scrub/phase1.c |    8 +
 scrub/scrub.c  |  620 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/scrub.h  |   62 ++++++
 6 files changed, 712 insertions(+)
 create mode 100644 scrub/scrub.c
 create mode 100644 scrub/scrub.h


diff --git a/scrub/Makefile b/scrub/Makefile
index 5397339..915b801 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -21,6 +21,7 @@ disk.h \
 filemap.h \
 fscounters.h \
 inodes.h \
+scrub.h \
 spacemap.h \
 xfs_scrub.h
 
@@ -31,6 +32,7 @@ filemap.c \
 fscounters.c \
 inodes.c \
 phase1.c \
+scrub.c \
 spacemap.c \
 xfs_scrub.c
 
diff --git a/scrub/common.c b/scrub/common.c
index 252809d..eb602a8 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -320,3 +320,22 @@ find_mountpoint(
 	platform_mntent_close(&cursor);
 	return found;
 }
+
+/*
+ * Sleep for 100ms * however many -b we got past the initial one.
+ * This is an (albeit clumsy) way to throttle scrub activity.
+ */
+void
+background_sleep(void)
+{
+	unsigned long long	time;
+	struct timespec		tv;
+
+	if (bg_mode < 2)
+		return;
+
+	time = 100000 * (bg_mode - 1);
+	tv.tv_sec = time / 1000000;
+	tv.tv_nsec = (time % 1000000) * 1000;
+	nanosleep(&tv, NULL);
+}
diff --git a/scrub/common.h b/scrub/common.h
index fed95df..81e83c2 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -71,5 +71,6 @@ static inline int syncfs(int fd)
 #endif
 
 bool find_mountpoint(char *mtab, struct scrub_ctx *ctx);
+void background_sleep(void);
 
 #endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/phase1.c b/scrub/phase1.c
index 65409d3..d7a321f 100644
--- a/scrub/phase1.c
+++ b/scrub/phase1.c
@@ -46,6 +46,7 @@
 #include "xfs_scrub.h"
 #include "common.h"
 #include "disk.h"
+#include "scrub.h"
 
 /* Phase 1: Find filesystem geometry (and clean up after) */
 
@@ -168,6 +169,13 @@ _("Does not appear to be an XFS filesystem!"));
 		return false;
 	}
 
+	/* Do we have kernel-assisted metadata scrubbing? */
+	if (!xfs_can_scrub_fs_metadata(ctx) || !xfs_can_scrub_inode(ctx) ||
+	    !xfs_can_scrub_bmap(ctx) || !xfs_can_scrub_dir(ctx) ||
+	    !xfs_can_scrub_attr(ctx) || !xfs_can_scrub_symlink(ctx) ||
+	    !xfs_can_scrub_parent(ctx))
+		return false;
+
 	/* Go find the XFS devices if we have a usable fsmap. */
 	fs_table_initialise(0, NULL, 0, NULL);
 	errno = 0;
diff --git a/scrub/scrub.c b/scrub/scrub.c
new file mode 100644
index 0000000..98e7e0d
--- /dev/null
+++ b/scrub/scrub.c
@@ -0,0 +1,620 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/statvfs.h>
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "path.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "scrub.h"
+#include "xfs_errortag.h"
+
+/* Online scrub and repair wrappers. */
+
+/* Type info and names for the scrub types. */
+enum scrub_type {
+	ST_NONE,	/* disabled */
+	ST_AGHEADER,	/* per-AG header */
+	ST_PERAG,	/* per-AG metadata */
+	ST_FS,		/* per-FS metadata */
+	ST_INODE,	/* per-inode metadata */
+};
+struct scrub_descr {
+	const char	*name;
+	enum scrub_type	type;
+};
+
+/* These must correspond to XFS_SCRUB_TYPE_ */
+static const struct scrub_descr scrubbers[XFS_SCRUB_TYPE_NR] = {
+	[XFS_SCRUB_TYPE_PROBE] =
+		{"metadata",				ST_NONE},
+	[XFS_SCRUB_TYPE_SB] =
+		{"superblock",				ST_AGHEADER},
+	[XFS_SCRUB_TYPE_AGF] =
+		{"free space header",			ST_AGHEADER},
+	[XFS_SCRUB_TYPE_AGFL] =
+		{"free list",				ST_AGHEADER},
+	[XFS_SCRUB_TYPE_AGI] =
+		{"inode header",			ST_AGHEADER},
+	[XFS_SCRUB_TYPE_BNOBT] =
+		{"freesp by block btree",		ST_PERAG},
+	[XFS_SCRUB_TYPE_CNTBT] =
+		{"freesp by length btree",		ST_PERAG},
+	[XFS_SCRUB_TYPE_INOBT] =
+		{"inode btree",				ST_PERAG},
+	[XFS_SCRUB_TYPE_FINOBT] =
+		{"free inode btree",			ST_PERAG},
+	[XFS_SCRUB_TYPE_RMAPBT] =
+		{"reverse mapping btree",		ST_PERAG},
+	[XFS_SCRUB_TYPE_REFCNTBT] =
+		{"reference count btree",		ST_PERAG},
+	[XFS_SCRUB_TYPE_INODE] =
+		{"inode record",			ST_INODE},
+	[XFS_SCRUB_TYPE_BMBTD] =
+		{"data block map",			ST_INODE},
+	[XFS_SCRUB_TYPE_BMBTA] =
+		{"attr block map",			ST_INODE},
+	[XFS_SCRUB_TYPE_BMBTC] =
+		{"CoW block map",			ST_INODE},
+	[XFS_SCRUB_TYPE_DIR] =
+		{"directory entries",			ST_INODE},
+	[XFS_SCRUB_TYPE_XATTR] =
+		{"extended attributes",			ST_INODE},
+	[XFS_SCRUB_TYPE_SYMLINK] =
+		{"symbolic link",			ST_INODE},
+	[XFS_SCRUB_TYPE_PARENT] =
+		{"parent pointer",			ST_INODE},
+	[XFS_SCRUB_TYPE_RTBITMAP] =
+		{"realtime bitmap",			ST_FS},
+	[XFS_SCRUB_TYPE_RTSUM] =
+		{"realtime summary",			ST_FS},
+	[XFS_SCRUB_TYPE_UQUOTA] =
+		{"user quotas",				ST_FS},
+	[XFS_SCRUB_TYPE_GQUOTA] =
+		{"group quotas",			ST_FS},
+	[XFS_SCRUB_TYPE_PQUOTA] =
+		{"project quotas",			ST_FS},
+};
+
+/* Format a scrub description. */
+static void
+format_scrub_descr(
+	char				*buf,
+	size_t				buflen,
+	struct xfs_scrub_metadata	*meta,
+	const struct scrub_descr	*sc)
+{
+	switch (sc->type) {
+	case ST_AGHEADER:
+	case ST_PERAG:
+		snprintf(buf, buflen, _("AG %u %s"), meta->sm_agno,
+				_(sc->name));
+		break;
+	case ST_INODE:
+		snprintf(buf, buflen, _("Inode %"PRIu64" %s"),
+				(uint64_t)meta->sm_ino, _(sc->name));
+		break;
+	case ST_FS:
+		snprintf(buf, buflen, _("%s"), _(sc->name));
+		break;
+	case ST_NONE:
+		assert(0);
+		break;
+	}
+}
+
+/* Predicates for scrub flag state. */
+
+static inline bool is_corrupt(struct xfs_scrub_metadata *sm)
+{
+	return sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT;
+}
+
+static inline bool is_unoptimized(struct xfs_scrub_metadata *sm)
+{
+	return sm->sm_flags & XFS_SCRUB_OFLAG_PREEN;
+}
+
+static inline bool xref_failed(struct xfs_scrub_metadata *sm)
+{
+	return sm->sm_flags & XFS_SCRUB_OFLAG_XFAIL;
+}
+
+static inline bool xref_disagrees(struct xfs_scrub_metadata *sm)
+{
+	return sm->sm_flags & XFS_SCRUB_OFLAG_XCORRUPT;
+}
+
+static inline bool is_incomplete(struct xfs_scrub_metadata *sm)
+{
+	return sm->sm_flags & XFS_SCRUB_OFLAG_INCOMPLETE;
+}
+
+static inline bool is_suspicious(struct xfs_scrub_metadata *sm)
+{
+	return sm->sm_flags & XFS_SCRUB_OFLAG_WARNING;
+}
+
+/* Should we fix it? */
+static inline bool needs_repair(struct xfs_scrub_metadata *sm)
+{
+	return is_corrupt(sm) || xref_disagrees(sm);
+}
+
+/* Warn about strange circumstances after scrub. */
+static inline void
+xfs_scrub_warn_incomplete_scrub(
+	struct scrub_ctx		*ctx,
+	const char			*descr,
+	struct xfs_scrub_metadata	*meta)
+{
+	if (is_incomplete(meta))
+		str_info(ctx, descr, _("Check incomplete."));
+
+	if (is_suspicious(meta)) {
+		if (debug)
+			str_info(ctx, descr, _("Possibly suspect metadata."));
+		else
+			str_warn(ctx, descr, _("Possibly suspect metadata."));
+	}
+
+	if (xref_failed(meta))
+		str_info(ctx, descr, _("Cross-referencing failed."));
+}
+
+/* Do a read-only check of some metadata. */
+static enum check_outcome
+xfs_check_metadata(
+	struct scrub_ctx		*ctx,
+	int				fd,
+	struct xfs_scrub_metadata	*meta,
+	bool				is_inode)
+{
+	char				buf[DESCR_BUFSZ];
+	unsigned int			tries = 0;
+	int				code;
+	int				error;
+
+	assert(!debug_tweak_on("XFS_SCRUB_NO_KERNEL"));
+	assert(meta->sm_type < XFS_SCRUB_TYPE_NR);
+	format_scrub_descr(buf, DESCR_BUFSZ, meta, &scrubbers[meta->sm_type]);
+
+	dbg_printf("check %s flags %xh\n", buf, meta->sm_flags);
+retry:
+	error = ioctl(fd, XFS_IOC_SCRUB_METADATA, meta);
+	if (debug_tweak_on("XFS_SCRUB_FORCE_REPAIR") && !error)
+		meta->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+	if (error) {
+		code = errno;
+		switch (code) {
+		case ENOENT:
+			/* Metadata not present, just skip it. */
+			return CHECK_DONE;
+		case ESHUTDOWN:
+			/* FS already crashed, give up. */
+			str_error(ctx, buf,
+_("Filesystem is shut down, aborting."));
+			return CHECK_ABORT;
+		case EIO:
+		case ENOMEM:
+			/* Abort on I/O errors or insufficient memory. */
+			str_errno(ctx, buf);
+			return CHECK_ABORT;
+		case EDEADLOCK:
+		case EBUSY:
+		case EFSBADCRC:
+		case EFSCORRUPTED:
+			/*
+			 * The first two should never escape the kernel,
+			 * and the other two should be reported via sm_flags.
+			 */
+			str_error(ctx, buf,
+_("Kernel bug!  errno=%d"), code);
+			/* fall through */
+		default:
+			/* Operational error. */
+			str_errno(ctx, buf);
+			return CHECK_DONE;
+		}
+	}
+
+	/*
+	 * If the kernel says the test was incomplete or that there was
+	 * a cross-referencing discrepancy but no obvious corruption,
+	 * we'll try the scan again, just in case the fs was busy.
+	 * Only retry so many times.
+	 */
+	if (tries < 10 && (is_incomplete(meta) ||
+			   (xref_disagrees(meta) && !is_corrupt(meta)))) {
+		tries++;
+		goto retry;
+	}
+
+	/* Complain about incomplete or suspicious metadata. */
+	xfs_scrub_warn_incomplete_scrub(ctx, buf, meta);
+
+	/*
+	 * If we need repairs or there were discrepancies, schedule a
+	 * repair if desired, otherwise complain.
+	 */
+	if (is_corrupt(meta) || xref_disagrees(meta)) {
+		if (ctx->mode < SCRUB_MODE_REPAIR) {
+			str_error(ctx, buf,
+_("Repairs are required."));
+			return CHECK_DONE;
+		}
+
+		return CHECK_REPAIR;
+	}
+
+	/*
+	 * If we could optimize, schedule a repair if desired,
+	 * otherwise complain.
+	 */
+	if (is_unoptimized(meta)) {
+		if (ctx->mode < SCRUB_MODE_PREEN) {
+			if (!is_inode) {
+				/* AG or FS metadata, always warn. */
+				str_info(ctx, buf,
+_("Optimization is possible."));
+			} else if (!ctx->preen_triggers[meta->sm_type]) {
+				/* File metadata, only warn once per type. */
+				pthread_mutex_lock(&ctx->lock);
+				if (!ctx->preen_triggers[meta->sm_type])
+					ctx->preen_triggers[meta->sm_type] = true;
+				pthread_mutex_unlock(&ctx->lock);
+			}
+			return CHECK_DONE;
+		}
+
+		return CHECK_REPAIR;
+	}
+
+	/* Everything is ok. */
+	return CHECK_DONE;
+}
+
+/* Bulk-notify user about things that could be optimized. */
+void
+xfs_scrub_report_preen_triggers(
+	struct scrub_ctx		*ctx)
+{
+	int				i;
+
+	for (i = 0; i < XFS_SCRUB_TYPE_NR; i++) {
+		pthread_mutex_lock(&ctx->lock);
+		if (ctx->preen_triggers[i]) {
+			ctx->preen_triggers[i] = false;
+			pthread_mutex_unlock(&ctx->lock);
+			str_info(ctx, ctx->mntpoint,
+_("Optimizations of %s are possible."), scrubbers[i].name);
+		} else {
+			pthread_mutex_unlock(&ctx->lock);
+		}
+	}
+}
+
+/* Scrub metadata, saving corruption reports for later. */
+static bool
+xfs_scrub_metadata(
+	struct scrub_ctx		*ctx,
+	enum scrub_type			scrub_type,
+	xfs_agnumber_t			agno)
+{
+	struct xfs_scrub_metadata	meta = {0};
+	const struct scrub_descr	*sc;
+	enum check_outcome		fix;
+	int				type;
+
+	sc = scrubbers;
+	for (type = 0; type < XFS_SCRUB_TYPE_NR; type++, sc++) {
+		if (sc->type != scrub_type)
+			continue;
+
+		meta.sm_type = type;
+		meta.sm_flags = 0;
+		meta.sm_agno = agno;
+		background_sleep();
+
+		/* Check the item. */
+		fix = xfs_check_metadata(ctx, ctx->mnt_fd, &meta, false);
+		switch (fix) {
+		case CHECK_ABORT:
+			return false;
+		case CHECK_REPAIR:
+			/* fall through */
+		case CHECK_DONE:
+			continue;
+		case CHECK_RETRY:
+			abort();
+			break;
+		}
+	}
+
+	return true;
+}
+
+/*
+ * Scrub primary superblock.  This will be useful if we ever need to hook
+ * a filesystem-wide pre-scrub activity off of the sb 0 scrubber (which
+ * currently does nothing).
+ */
+bool
+xfs_scrub_primary_super(
+	struct scrub_ctx		*ctx)
+{
+	struct xfs_scrub_metadata	meta = {
+		.sm_type = XFS_SCRUB_TYPE_SB,
+	};
+	enum check_outcome		fix;
+
+	/* Check the item. */
+	fix = xfs_check_metadata(ctx, ctx->mnt_fd, &meta, false);
+	switch (fix) {
+	case CHECK_ABORT:
+		return false;
+	case CHECK_REPAIR:
+		/* fall through */
+	case CHECK_DONE:
+		return true;
+	case CHECK_RETRY:
+		abort();
+		break;
+	}
+
+	return true;
+}
+
+/* Scrub each AG's header blocks. */
+bool
+xfs_scrub_ag_headers(
+	struct scrub_ctx		*ctx,
+	xfs_agnumber_t			agno)
+{
+	return xfs_scrub_metadata(ctx, ST_AGHEADER, agno);
+}
+
+/* Scrub each AG's metadata btrees. */
+bool
+xfs_scrub_ag_metadata(
+	struct scrub_ctx		*ctx,
+	xfs_agnumber_t			agno)
+{
+	return xfs_scrub_metadata(ctx, ST_PERAG, agno);
+}
+
+/* Scrub whole-FS metadata btrees. */
+bool
+xfs_scrub_fs_metadata(
+	struct scrub_ctx		*ctx)
+{
+	return xfs_scrub_metadata(ctx, ST_FS, 0);
+}
+
+/* Scrub inode metadata. */
+static bool
+__xfs_scrub_file(
+	struct scrub_ctx		*ctx,
+	uint64_t			ino,
+	uint32_t			gen,
+	int				fd,
+	unsigned int			type)
+{
+	struct xfs_scrub_metadata	meta = {0};
+	enum check_outcome		fix;
+
+	assert(type < XFS_SCRUB_TYPE_NR);
+	assert(scrubbers[type].type == ST_INODE);
+
+	meta.sm_type = type;
+	meta.sm_ino = ino;
+	meta.sm_gen = gen;
+
+	/* Scrub the piece of metadata. */
+	fix = xfs_check_metadata(ctx, fd, &meta, true);
+	if (fix == CHECK_ABORT)
+		return false;
+	if (fix == CHECK_DONE)
+		return true;
+
+	return true;
+}
+
+bool
+xfs_scrub_inode_fields(
+	struct scrub_ctx	*ctx,
+	uint64_t		ino,
+	uint32_t		gen,
+	int			fd)
+{
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_INODE);
+}
+
+bool
+xfs_scrub_data_fork(
+	struct scrub_ctx	*ctx,
+	uint64_t		ino,
+	uint32_t		gen,
+	int			fd)
+{
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_BMBTD);
+}
+
+bool
+xfs_scrub_attr_fork(
+	struct scrub_ctx	*ctx,
+	uint64_t		ino,
+	uint32_t		gen,
+	int			fd)
+{
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_BMBTA);
+}
+
+bool
+xfs_scrub_cow_fork(
+	struct scrub_ctx	*ctx,
+	uint64_t		ino,
+	uint32_t		gen,
+	int			fd)
+{
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_BMBTC);
+}
+
+bool
+xfs_scrub_dir(
+	struct scrub_ctx	*ctx,
+	uint64_t		ino,
+	uint32_t		gen,
+	int			fd)
+{
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_DIR);
+}
+
+bool
+xfs_scrub_attr(
+	struct scrub_ctx	*ctx,
+	uint64_t		ino,
+	uint32_t		gen,
+	int			fd)
+{
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_XATTR);
+}
+
+bool
+xfs_scrub_symlink(
+	struct scrub_ctx	*ctx,
+	uint64_t		ino,
+	uint32_t		gen,
+	int			fd)
+{
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_SYMLINK);
+}
+
+bool
+xfs_scrub_parent(
+	struct scrub_ctx	*ctx,
+	uint64_t		ino,
+	uint32_t		gen,
+	int			fd)
+{
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_PARENT);
+}
+
+/* Test the availability of a kernel scrub command. */
+static bool
+__xfs_scrub_test(
+	struct scrub_ctx		*ctx,
+	unsigned int			type,
+	bool				repair)
+{
+	struct xfs_scrub_metadata	meta = {0};
+	int				error;
+
+	if (debug_tweak_on("XFS_SCRUB_NO_KERNEL"))
+		return false;
+
+	meta.sm_type = type;
+	if (repair)
+		meta.sm_flags |= XFS_SCRUB_IFLAG_REPAIR;
+	error = ioctl(ctx->mnt_fd, XFS_IOC_SCRUB_METADATA, &meta);
+	if (!error)
+		return true;
+	switch (errno) {
+	case EROFS:
+		str_info(ctx, ctx->mntpoint,
+_("Filesystem is mounted read-only; cannot proceed."));
+		return false;
+	case ENOTRECOVERABLE:
+		str_info(ctx, ctx->mntpoint,
+_("Filesystem is mounted norecovery; cannot proceed."));
+		return false;
+	case EOPNOTSUPP:
+	case ENOTTY:
+		str_info(ctx, ctx->mntpoint,
+_("Kernel %s %s facility is required."),
+				_(scrubbers[type].name),
+				repair ? _("repair") : _("scrub"));
+		return false;
+	case ENOENT:
+		/* Scrubber says not present on this fs; that's fine. */
+		return true;
+	default:
+		str_info(ctx, ctx->mntpoint, "%s", strerror(errno));
+		return true;
+	}
+	return error == 0 || (error && errno != EOPNOTSUPP && errno != ENOTTY);
+}
+
+bool
+xfs_can_scrub_fs_metadata(
+	struct scrub_ctx	*ctx)
+{
+	return __xfs_scrub_test(ctx, XFS_SCRUB_TYPE_PROBE, false);
+}
+
+bool
+xfs_can_scrub_inode(
+	struct scrub_ctx	*ctx)
+{
+	return __xfs_scrub_test(ctx, XFS_SCRUB_TYPE_INODE, false);
+}
+
+bool
+xfs_can_scrub_bmap(
+	struct scrub_ctx	*ctx)
+{
+	return __xfs_scrub_test(ctx, XFS_SCRUB_TYPE_BMBTD, false);
+}
+
+bool
+xfs_can_scrub_dir(
+	struct scrub_ctx	*ctx)
+{
+	return __xfs_scrub_test(ctx, XFS_SCRUB_TYPE_DIR, false);
+}
+
+bool
+xfs_can_scrub_attr(
+	struct scrub_ctx	*ctx)
+{
+	return __xfs_scrub_test(ctx, XFS_SCRUB_TYPE_XATTR, false);
+}
+
+bool
+xfs_can_scrub_symlink(
+	struct scrub_ctx	*ctx)
+{
+	return __xfs_scrub_test(ctx, XFS_SCRUB_TYPE_SYMLINK, false);
+}
+
+bool
+xfs_can_scrub_parent(
+	struct scrub_ctx	*ctx)
+{
+	return __xfs_scrub_test(ctx, XFS_SCRUB_TYPE_PARENT, false);
+}
diff --git a/scrub/scrub.h b/scrub/scrub.h
new file mode 100644
index 0000000..0b454df
--- /dev/null
+++ b/scrub/scrub.h
@@ -0,0 +1,62 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_SCRUB_H_
+#define XFS_SCRUB_SCRUB_H_
+
+/* Online scrub and repair. */
+enum check_outcome {
+	CHECK_DONE,	/* no further processing needed */
+	CHECK_REPAIR,	/* schedule this for repairs */
+	CHECK_ABORT,	/* end program */
+	CHECK_RETRY,	/* repair failed, try again later */
+};
+
+void xfs_scrub_report_preen_triggers(struct scrub_ctx *ctx);
+bool xfs_scrub_primary_super(struct scrub_ctx *ctx);
+bool xfs_scrub_ag_headers(struct scrub_ctx *ctx, xfs_agnumber_t agno);
+bool xfs_scrub_ag_metadata(struct scrub_ctx *ctx, xfs_agnumber_t agno);
+bool xfs_scrub_fs_metadata(struct scrub_ctx *ctx);
+
+bool xfs_can_scrub_fs_metadata(struct scrub_ctx *ctx);
+bool xfs_can_scrub_inode(struct scrub_ctx *ctx);
+bool xfs_can_scrub_bmap(struct scrub_ctx *ctx);
+bool xfs_can_scrub_dir(struct scrub_ctx *ctx);
+bool xfs_can_scrub_attr(struct scrub_ctx *ctx);
+bool xfs_can_scrub_symlink(struct scrub_ctx *ctx);
+bool xfs_can_scrub_parent(struct scrub_ctx *ctx);
+
+bool xfs_scrub_inode_fields(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
+		int fd);
+bool xfs_scrub_data_fork(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
+		int fd);
+bool xfs_scrub_attr_fork(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
+		int fd);
+bool xfs_scrub_cow_fork(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
+		int fd);
+bool xfs_scrub_dir(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
+		int fd);
+bool xfs_scrub_attr(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
+		int fd);
+bool xfs_scrub_symlink(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
+		int fd);
+bool xfs_scrub_parent(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
+		int fd);
+
+#endif /* XFS_SCRUB_SCRUB_H_ */



* [PATCH 13/27] xfs_scrub: scan filesystem and AG metadata
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (11 preceding siblings ...)
  2018-01-06  1:52 ` [PATCH 12/27] xfs_scrub: wrap the scrub ioctl Darrick J. Wong
@ 2018-01-06  1:52 ` Darrick J. Wong
  2018-01-06  1:52 ` [PATCH 14/27] xfs_scrub: thread-safe stats counter Darrick J. Wong
                   ` (17 subsequent siblings)
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:52 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Scrub the filesystem and per-AG metadata.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
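Note: the fan-out in xfs_scan_metadata() below (one work item per AG, with
every worker sharing a single "moveon" flag that is only ever cleared) can
be sketched on its own roughly as follows.  This is a minimal illustration
under stated assumptions, not the workqueue code from libfrog: it uses
plain pthreads instead of struct workqueue, and scan_all_ags(), struct
ag_work, and the two check_* callbacks are hypothetical names invented for
the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_AGS	64

/* Hypothetical per-AG work item; not part of xfs_scrub. */
struct ag_work {
	unsigned int	agno;
	bool		(*check)(unsigned int agno);
	atomic_bool	*moveon;
};

static void *
ag_worker(void *arg)
{
	struct ag_work	*w = arg;

	/* Workers only ever clear the flag; nobody sets it back to true. */
	if (!w->check(w->agno))
		atomic_store(w->moveon, false);
	return NULL;
}

/* Run check() once per AG in parallel; true only if every check passed. */
static bool
scan_all_ags(unsigned int agcount, bool (*check)(unsigned int agno))
{
	pthread_t	threads[SKETCH_MAX_AGS];
	struct ag_work	work[SKETCH_MAX_AGS];
	atomic_bool	moveon = true;
	unsigned int	started = 0;
	unsigned int	i;

	if (agcount > SKETCH_MAX_AGS)
		return false;

	for (i = 0; i < agcount; i++) {
		work[i].agno = i;
		work[i].check = check;
		work[i].moveon = &moveon;
		if (pthread_create(&threads[i], NULL, ag_worker, &work[i])) {
			atomic_store(&moveon, false);
			break;
		}
		started++;
	}

	/* Join everything we started, even after an error. */
	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);

	return atomic_load(&moveon);
}

/* Example checks used only for demonstration. */
static bool check_ok(unsigned int agno) { (void)agno; return true; }
static bool check_fail_ag2(unsigned int agno) { return agno != 2; }
```

Because the flag only moves from true to false, the order in which the
workers finish does not matter; the caller just has to wait for all of
them before reading it, which is what workqueue_destroy() does in the
patch.
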
 scrub/Makefile    |    1 
 scrub/phase2.c    |  133 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.c |    1 
 scrub/xfs_scrub.h |    1 
 4 files changed, 136 insertions(+)
 create mode 100644 scrub/phase2.c


diff --git a/scrub/Makefile b/scrub/Makefile
index 915b801..9edc933 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -32,6 +32,7 @@ filemap.c \
 fscounters.c \
 inodes.c \
 phase1.c \
+phase2.c \
 scrub.c \
 spacemap.c \
 xfs_scrub.c
diff --git a/scrub/phase2.c b/scrub/phase2.c
new file mode 100644
index 0000000..153ae02
--- /dev/null
+++ b/scrub/phase2.c
@@ -0,0 +1,133 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/statvfs.h>
+#include "xfs.h"
+#include "path.h"
+#include "workqueue.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "scrub.h"
+
+/* Phase 2: Check internal metadata. */
+
+/* Scrub each AG's metadata btrees. */
+static void
+xfs_scan_ag_metadata(
+	struct workqueue		*wq,
+	xfs_agnumber_t			agno,
+	void				*arg)
+{
+	struct scrub_ctx		*ctx = (struct scrub_ctx *)wq->wq_ctx;
+	bool				*pmoveon = arg;
+	bool				moveon;
+	char				descr[DESCR_BUFSZ];
+
+	snprintf(descr, DESCR_BUFSZ, _("AG %u"), agno);
+
+	/*
+	 * First we scrub and fix the AG headers, because we need
+	 * them to work well enough to check the AG btrees.
+	 */
+	moveon = xfs_scrub_ag_headers(ctx, agno);
+	if (!moveon)
+		goto err;
+
+	/* Now scrub the AG btrees. */
+	moveon = xfs_scrub_ag_metadata(ctx, agno);
+	if (!moveon)
+		goto err;
+
+	return;
+err:
+	*pmoveon = false;
+}
+
+/* Scrub whole-FS metadata btrees. */
+static void
+xfs_scan_fs_metadata(
+	struct workqueue		*wq,
+	xfs_agnumber_t			agno,
+	void				*arg)
+{
+	struct scrub_ctx		*ctx = (struct scrub_ctx *)wq->wq_ctx;
+	bool				*pmoveon = arg;
+	bool				moveon;
+
+	moveon = xfs_scrub_fs_metadata(ctx);
+	if (!moveon)
+		*pmoveon = false;
+}
+
+/* Scan all filesystem metadata. */
+bool
+xfs_scan_metadata(
+	struct scrub_ctx	*ctx)
+{
+	struct workqueue	wq;
+	xfs_agnumber_t		agno;
+	bool			moveon = true;
+	int			ret;
+
+	ret = workqueue_create(&wq, (struct xfs_mount *)ctx,
+			scrub_nproc_workqueue(ctx));
+	if (ret) {
+		str_error(ctx, ctx->mntpoint, _("Could not create workqueue."));
+		return false;
+	}
+
+	/*
+	 * In case we ever use the primary super scrubber to perform fs
+	 * upgrades (followed by a full scrub), do that before we launch
+	 * anything else.
+	 */
+	moveon = xfs_scrub_primary_super(ctx);
+	if (!moveon)
+		goto out;
+
+	for (agno = 0; moveon && agno < ctx->geo.agcount; agno++) {
+		ret = workqueue_add(&wq, xfs_scan_ag_metadata, agno, &moveon);
+		if (ret) {
+			moveon = false;
+			str_error(ctx, ctx->mntpoint,
+_("Could not queue AG %u scrub work."), agno);
+			goto out;
+		}
+	}
+
+	if (!moveon)
+		goto out;
+
+	ret = workqueue_add(&wq, xfs_scan_fs_metadata, 0, &moveon);
+	if (ret) {
+		moveon = false;
+		str_error(ctx, ctx->mntpoint,
+_("Could not queue filesystem scrub work."));
+		goto out;
+	}
+
+out:
+	workqueue_destroy(&wq);
+	return moveon;
+}
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index a733b8f..be52a98 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -353,6 +353,7 @@ run_scrub_phases(
 		},
 		{
 			.descr = _("Check internal metadata."),
+			.fn = xfs_scan_metadata,
 		},
 		{
 			.descr = _("Scan all inodes."),
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 2be7c65..4c3882b 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -97,5 +97,6 @@ struct scrub_ctx {
 void xfs_shutdown_fs(struct scrub_ctx *ctx);
 bool xfs_cleanup_fs(struct scrub_ctx *ctx);
 bool xfs_setup_fs(struct scrub_ctx *ctx);
+bool xfs_scan_metadata(struct scrub_ctx *ctx);
 
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */



* [PATCH 14/27] xfs_scrub: thread-safe stats counter
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (12 preceding siblings ...)
  2018-01-06  1:52 ` [PATCH 13/27] xfs_scrub: scan filesystem and AG metadata Darrick J. Wong
@ 2018-01-06  1:52 ` Darrick J. Wong
  2018-01-06  1:53 ` [PATCH 15/27] xfs_scrub: scan inodes Darrick J. Wong
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:52 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create a threaded stats counter that we'll use to track scan progress.
This includes things like how many disk blocks we've scanned,
or later how much progress we've made in each phase.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
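Note: for readers who want the idea without the two-layer ptvar/ptcounter
split, here is a condensed sketch of a per-thread counter.  It is an
illustration under stated assumptions, not the code in this patch: all of
the slot_counter_* names are hypothetical, the slots are plain uint64_t
values (no cacheline padding), and the demo reads the total only after
joining the workers so that the sum is exact.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names: a condensed sketch, not the libfrog code. */
struct slot_counter {
	pthread_key_t	key;
	pthread_mutex_t	lock;
	size_t		nr_used;
	size_t		nr_slots;
	uint64_t	slots[];
};

static struct slot_counter *
slot_counter_init(size_t nr)
{
	struct slot_counter	*sc;

	sc = calloc(1, sizeof(*sc) + nr * sizeof(uint64_t));
	if (!sc)
		return NULL;
	sc->nr_slots = nr;
	if (pthread_mutex_init(&sc->lock, NULL))
		goto out_free;
	if (pthread_key_create(&sc->key, NULL))
		goto out_mutex;
	return sc;
out_mutex:
	pthread_mutex_destroy(&sc->lock);
out_free:
	free(sc);
	return NULL;
}

static void
slot_counter_free(struct slot_counter *sc)
{
	pthread_key_delete(sc->key);
	pthread_mutex_destroy(&sc->lock);
	free(sc);
}

/* Add to the caller's slot; the lock is taken only on first use. */
static void
slot_counter_add(struct slot_counter *sc, uint64_t nr)
{
	uint64_t	*p = pthread_getspecific(sc->key);

	if (!p) {
		pthread_mutex_lock(&sc->lock);
		assert(sc->nr_used < sc->nr_slots);
		p = &sc->slots[sc->nr_used++];
		pthread_setspecific(sc->key, p);
		pthread_mutex_unlock(&sc->lock);
	}
	*p += nr;
}

/* Sum every slot handed out so far. */
static uint64_t
slot_counter_value(struct slot_counter *sc)
{
	uint64_t	sum = 0;
	size_t		i;

	pthread_mutex_lock(&sc->lock);
	for (i = 0; i < sc->nr_used; i++)
		sum += sc->slots[i];
	pthread_mutex_unlock(&sc->lock);
	return sum;
}

static void *
slot_counter_worker(void *arg)
{
	int		i;

	for (i = 0; i < 1000; i++)
		slot_counter_add(arg, 1);
	return NULL;
}

/* Start nthreads workers (at most 16), join them, return the total. */
static uint64_t
slot_counter_demo(unsigned int nthreads)
{
	pthread_t		tids[16];
	struct slot_counter	*sc;
	uint64_t		total;
	unsigned int		started = 0;
	unsigned int		i;

	if (nthreads > 16 || !(sc = slot_counter_init(nthreads)))
		return 0;
	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[i], NULL, slot_counter_worker, sc))
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	total = slot_counter_value(sc);
	slot_counter_free(sc);
	return total;
}
```

Each worker touches the mutex once, when it claims its slot; every later
add is an unlocked write to memory that no other thread writes.  Reading
the value while workers are still running would only give an approximate
sum, which matches the caveat in the counter.c comment.
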
 include/ptvar.h  |   32 +++++++++++++
 libfrog/Makefile |    1 
 libfrog/ptvar.c  |  133 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/Makefile   |    2 +
 scrub/counter.c  |  104 ++++++++++++++++++++++++++++++++++++++++++
 scrub/counter.h  |   29 ++++++++++++
 6 files changed, 301 insertions(+)
 create mode 100644 include/ptvar.h
 create mode 100644 libfrog/ptvar.c
 create mode 100644 scrub/counter.c
 create mode 100644 scrub/counter.h


diff --git a/include/ptvar.h b/include/ptvar.h
new file mode 100644
index 0000000..6308228
--- /dev/null
+++ b/include/ptvar.h
@@ -0,0 +1,32 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef LIBFROG_PERCPU_H_
+#define LIBFROG_PERCPU_H_
+
+struct ptvar;
+
+typedef bool (*ptvar_iter_fn)(struct ptvar *ptv, void *data, void *foreach_arg);
+
+struct ptvar *ptvar_init(size_t nr, size_t size);
+void ptvar_free(struct ptvar *ptv);
+void *ptvar_get(struct ptvar *ptv);
+bool ptvar_foreach(struct ptvar *ptv, ptvar_iter_fn fn, void *foreach_arg);
+
+#endif /* LIBFROG_PERCPU_H_ */
diff --git a/libfrog/Makefile b/libfrog/Makefile
index 4c15605..230b08f 100644
--- a/libfrog/Makefile
+++ b/libfrog/Makefile
@@ -16,6 +16,7 @@ convert.c \
 list_sort.c \
 paths.c \
 projects.c \
+ptvar.c \
 radix-tree.c \
 topology.c \
 util.c \
diff --git a/libfrog/ptvar.c b/libfrog/ptvar.c
new file mode 100644
index 0000000..3654706
--- /dev/null
+++ b/libfrog/ptvar.c
@@ -0,0 +1,133 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <assert.h>
+#include <pthread.h>
+#include <unistd.h>
+#include "platform_defs.h"
+#include "ptvar.h"
+
+/*
+ * Per-thread Variables
+ *
+ * This data structure manages a lockless per-thread variable.  We
+ * implement this by allocating an array of memory regions, and as each
+ * thread tries to acquire its own region, we hand out the array
+ * elements to each thread.  This way, each thread gets its own
+ * cacheline and (after the first access) doesn't have to contend for a
+ * lock for each access.
+ */
+struct ptvar {
+	pthread_key_t	key;
+	pthread_mutex_t	lock;
+	size_t		nr_used;
+	size_t		nr_counters;
+	size_t		data_size;
+	unsigned char	data[0];
+};
+#define PTVAR_SIZE(nr, sz) (sizeof(struct ptvar) + ((nr) * (sz)))
+
+/* Initialize per-thread counter. */
+struct ptvar *
+ptvar_init(
+	size_t		nr,
+	size_t		size)
+{
+	struct ptvar	*ptv;
+	int		ret;
+
+#ifdef _SC_LEVEL1_DCACHE_LINESIZE
+	/* Try to prevent cache pingpong by aligning to cacheline size. */
+	size = max(size, sysconf(_SC_LEVEL1_DCACHE_LINESIZE));
+#endif
+
+	ptv = malloc(PTVAR_SIZE(nr, size));
+	if (!ptv)
+		return NULL;
+	ptv->data_size = size;
+	ptv->nr_counters = nr;
+	ptv->nr_used = 0;
+	memset(ptv->data, 0, nr * size);
+	ret = pthread_mutex_init(&ptv->lock, NULL);
+	if (ret)
+		goto out;
+	ret = pthread_key_create(&ptv->key, NULL);
+	if (ret)
+		goto out_mutex;
+	return ptv;
+
+out_mutex:
+	pthread_mutex_destroy(&ptv->lock);
+out:
+	free(ptv);
+	return NULL;
+}
+
+/* Free per-thread counter. */
+void
+ptvar_free(
+	struct ptvar	*ptv)
+{
+	pthread_key_delete(ptv->key);
+	pthread_mutex_destroy(&ptv->lock);
+	free(ptv);
+}
+
+/* Get a reference to this thread's variable. */
+void *
+ptvar_get(
+	struct ptvar	*ptv)
+{
+	void		*p;
+
+	p = pthread_getspecific(ptv->key);
+	if (!p) {
+		pthread_mutex_lock(&ptv->lock);
+		assert(ptv->nr_used < ptv->nr_counters);
+		p = &ptv->data[(ptv->nr_used++) * ptv->data_size];
+		pthread_setspecific(ptv->key, p);
+		pthread_mutex_unlock(&ptv->lock);
+	}
+	return p;
+}
+
+/* Iterate all of the per-thread variables. */
+bool
+ptvar_foreach(
+	struct ptvar	*ptv,
+	ptvar_iter_fn	fn,
+	void		*foreach_arg)
+{
+	size_t		i;
+	bool		ret = true;
+
+	pthread_mutex_lock(&ptv->lock);
+	for (i = 0; i < ptv->nr_used; i++) {
+		ret = fn(ptv, &ptv->data[i * ptv->data_size], foreach_arg);
+		if (!ret)
+			break;
+	}
+	pthread_mutex_unlock(&ptv->lock);
+
+	return ret;
+}
diff --git a/scrub/Makefile b/scrub/Makefile
index 9edc933..30dbe54 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -17,6 +17,7 @@ endif	# scrub_prereqs
 
 HFILES = \
 common.h \
+counter.h \
 disk.h \
 filemap.h \
 fscounters.h \
@@ -27,6 +28,7 @@ xfs_scrub.h
 
 CFILES = \
 common.c \
+counter.c \
 disk.c \
 filemap.c \
 fscounters.c \
diff --git a/scrub/counter.c b/scrub/counter.c
new file mode 100644
index 0000000..ced3cf3
--- /dev/null
+++ b/scrub/counter.c
@@ -0,0 +1,104 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <assert.h>
+#include <pthread.h>
+#include "ptvar.h"
+#include "counter.h"
+
+/*
+ * Per-Thread Counters
+ *
+ * This is a global counter object that uses per-thread counters to
+ * count things without having to contend for a single shared lock.
+ * Provided we know the number of threads that will be accessing the
+ * counter, each thread gets its own thread-specific counter variable.
+ * Changing the value is fast, though retrieving the value is expensive
+ * and approximate.
+ */
+struct ptcounter {
+	struct ptvar	*var;
+};
+
+/* Initialize per-thread counter. */
+struct ptcounter *
+ptcounter_init(
+	size_t			nr)
+{
+	struct ptcounter	*p;
+
+	p = malloc(sizeof(struct ptcounter));
+	if (!p)
+		return NULL;
+	p->var = ptvar_init(nr, sizeof(uint64_t));
+	if (!p->var) {
+		free(p);
+		return NULL;
+	}
+	return p;
+}
+
+/* Free per-thread counter. */
+void
+ptcounter_free(
+	struct ptcounter	*ptc)
+{
+	ptvar_free(ptc->var);
+	free(ptc);
+}
+
+/* Add a quantity to the counter. */
+void
+ptcounter_add(
+	struct ptcounter	*ptc,
+	int64_t			nr)
+{
+	uint64_t		*p;
+
+	p = ptvar_get(ptc->var);
+	*p += nr;
+}
+
+static bool
+ptcounter_val_helper(
+	struct ptvar		*ptv,
+	void			*data,
+	void			*foreach_arg)
+{
+	uint64_t		*sum = foreach_arg;
+	uint64_t		*count = data;
+
+	*sum += *count;
+	return true;
+}
+
+/* Return the approximate value of this counter. */
+uint64_t
+ptcounter_value(
+	struct ptcounter	*ptc)
+{
+	uint64_t		sum = 0;
+
+	ptvar_foreach(ptc->var, ptcounter_val_helper, &sum);
+	return sum;
+}
diff --git a/scrub/counter.h b/scrub/counter.h
new file mode 100644
index 0000000..2aac795
--- /dev/null
+++ b/scrub/counter.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_COUNTER_H_
+#define XFS_SCRUB_COUNTER_H_
+
+struct ptcounter;
+struct ptcounter *ptcounter_init(size_t nr);
+void ptcounter_free(struct ptcounter *ptc);
+void ptcounter_add(struct ptcounter *ptc, int64_t nr);
+uint64_t ptcounter_value(struct ptcounter *ptc);
+
+#endif /* XFS_SCRUB_COUNTER_H_ */



* [PATCH 15/27] xfs_scrub: scan inodes
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (13 preceding siblings ...)
  2018-01-06  1:52 ` [PATCH 14/27] xfs_scrub: thread-safe stats counter Darrick J. Wong
@ 2018-01-06  1:53 ` Darrick J. Wong
  2018-01-06  1:53 ` [PATCH 16/27] xfs_scrub: check directory connectivity Darrick J. Wong
                   ` (15 subsequent siblings)
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:53 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Scan all the inodes in the filesystem for problems.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile    |    1 
 scrub/phase3.c    |  152 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.c |    1 
 scrub/xfs_scrub.h |    2 +
 4 files changed, 156 insertions(+)
 create mode 100644 scrub/phase3.c


diff --git a/scrub/Makefile b/scrub/Makefile
index 30dbe54..e0d15d8 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -35,6 +35,7 @@ fscounters.c \
 inodes.c \
 phase1.c \
 phase2.c \
+phase3.c \
 scrub.c \
 spacemap.c \
 xfs_scrub.c
diff --git a/scrub/phase3.c b/scrub/phase3.c
new file mode 100644
index 0000000..b3fc510
--- /dev/null
+++ b/scrub/phase3.c
@@ -0,0 +1,152 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/statvfs.h>
+#include "xfs.h"
+#include "path.h"
+#include "workqueue.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "counter.h"
+#include "inodes.h"
+#include "scrub.h"
+
+/* Phase 3: Scan all inodes. */
+
+/*
+ * Run a per-file metadata scanner.  We use the ino/gen interface to
+ * ensure that the inode we're checking matches what the inode scan
+ * told us to look at.
+ */
+static bool
+xfs_scrub_fd(
+	struct scrub_ctx	*ctx,
+	bool			(*fn)(struct scrub_ctx *, uint64_t,
+				      uint32_t, int),
+	struct xfs_bstat	*bs)
+{
+	return fn(ctx, bs->bs_ino, bs->bs_gen, ctx->mnt_fd);
+}
+
+struct scrub_inode_ctx {
+	struct ptcounter	*icount;
+	bool			moveon;
+};
+
+/* Verify the contents, xattrs, and extent maps of an inode. */
+static int
+xfs_scrub_inode(
+	struct scrub_ctx	*ctx,
+	struct xfs_handle	*handle,
+	struct xfs_bstat	*bstat,
+	void			*arg)
+{
+	struct scrub_inode_ctx	*ictx = arg;
+	struct ptcounter	*icount = ictx->icount;
+	bool			moveon = true;
+	int			fd = -1;
+
+	background_sleep();
+
+	/* Try to open the inode to pin it. */
+	if (S_ISREG(bstat->bs_mode)) {
+		fd = xfs_open_handle(handle);
+		/* Stale inode means we scan the whole cluster again. */
+		if (fd < 0 && errno == ESTALE)
+			return ESTALE;
+	}
+
+	/* Scrub the inode. */
+	moveon = xfs_scrub_fd(ctx, xfs_scrub_inode_fields, bstat);
+	if (!moveon)
+		goto out;
+
+	/* Scrub all block mappings. */
+	moveon = xfs_scrub_fd(ctx, xfs_scrub_data_fork, bstat);
+	if (!moveon)
+		goto out;
+	moveon = xfs_scrub_fd(ctx, xfs_scrub_attr_fork, bstat);
+	if (!moveon)
+		goto out;
+	moveon = xfs_scrub_fd(ctx, xfs_scrub_cow_fork, bstat);
+	if (!moveon)
+		goto out;
+
+	if (S_ISLNK(bstat->bs_mode)) {
+		/* Check symlink contents. */
+		moveon = xfs_scrub_symlink(ctx, bstat->bs_ino,
+				bstat->bs_gen, ctx->mnt_fd);
+	} else if (S_ISDIR(bstat->bs_mode)) {
+		/* Check the directory entries. */
+		moveon = xfs_scrub_fd(ctx, xfs_scrub_dir, bstat);
+	}
+	if (!moveon)
+		goto out;
+
+	/* Check all the extended attributes. */
+	moveon = xfs_scrub_fd(ctx, xfs_scrub_attr, bstat);
+	if (!moveon)
+		goto out;
+
+	/* Check parent pointers. */
+	moveon = xfs_scrub_fd(ctx, xfs_scrub_parent, bstat);
+	if (!moveon)
+		goto out;
+
+out:
+	ptcounter_add(icount, 1);
+	if (fd >= 0)
+		close(fd);
+	if (!moveon)
+		ictx->moveon = false;
+	return ictx->moveon ? 0 : XFS_ITERATE_INODES_ABORT;
+}
+
+/* Verify all the inodes in a filesystem. */
+bool
+xfs_scan_inodes(
+	struct scrub_ctx	*ctx)
+{
+	struct scrub_inode_ctx	ictx;
+	bool			ret;
+
+	ictx.moveon = true;
+	ictx.icount = ptcounter_init(scrub_nproc(ctx));
+	if (!ictx.icount) {
+		str_error(ctx, ctx->mntpoint, _("Could not create counter."));
+		return false;
+	}
+
+	ret = xfs_scan_all_inodes(ctx, xfs_scrub_inode, &ictx);
+	if (!ret)
+		ictx.moveon = false;
+	if (!ictx.moveon)
+		goto free;
+	xfs_scrub_report_preen_triggers(ctx);
+	ctx->inodes_checked = ptcounter_value(ictx.icount);
+
+free:
+	ptcounter_free(ictx.icount);
+	return ictx.moveon;
+}
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index be52a98..5bde6cf 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -357,6 +357,7 @@ run_scrub_phases(
 		},
 		{
 			.descr = _("Scan all inodes."),
+			.fn = xfs_scan_inodes,
 		},
 		{
 			.descr = _("Defer filesystem repairs."),
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 4c3882b..41d471b 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -89,6 +89,7 @@ struct scrub_ctx {
 	unsigned long long	runtime_errors;
 	unsigned long long	errors_found;
 	unsigned long long	warnings_found;
+	unsigned long long	inodes_checked;
 	bool			need_repair;
 	bool			preen_triggers[XFS_SCRUB_TYPE_NR];
 };
@@ -98,5 +99,6 @@ void xfs_shutdown_fs(struct scrub_ctx *ctx);
 bool xfs_cleanup_fs(struct scrub_ctx *ctx);
 bool xfs_setup_fs(struct scrub_ctx *ctx);
 bool xfs_scan_metadata(struct scrub_ctx *ctx);
+bool xfs_scan_inodes(struct scrub_ctx *ctx);
 
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */



* [PATCH 16/27] xfs_scrub: check directory connectivity
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (14 preceding siblings ...)
  2018-01-06  1:53 ` [PATCH 15/27] xfs_scrub: scan inodes Darrick J. Wong
@ 2018-01-06  1:53 ` Darrick J. Wong
  2018-01-06  1:53 ` [PATCH 17/27] xfs_scrub: warn about suspicious characters in directory/xattr names Darrick J. Wong
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:53 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Opening directories by file handle will cause the kernel to perform
parent lookups all the way to the root directory.  Take advantage of
this to ensure that directories actually connect to the root.  Some
day we'll have parent pointers and can make this more comprehensive.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
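Note: xfs_scrub_connections() below labels each inode as "inode N
(agno/agino)" by dividing the inode number by 2^(inopblog + agblklog).
Since the divisor is a power of two, the same split can be written as a
shift and a mask.  The sketch below shows that arithmetic in isolation;
struct ino_parts and split_ino() are hypothetical names used only here,
and the sketch assumes inopblog + agblklog is less than 64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper types; only the shift math mirrors the patch. */
struct ino_parts {
	uint32_t	agno;	/* allocation group number */
	uint32_t	agino;	/* inode number within that AG */
};

/*
 * Split a filesystem-wide inode number into its AG number (the high
 * bits) and its AG-relative inode number (the low inopblog + agblklog
 * bits).  Assumes inopblog + agblklog < 64.
 */
static struct ino_parts
split_ino(uint64_t ino, unsigned int inopblog, unsigned int agblklog)
{
	unsigned int		shift = inopblog + agblklog;
	struct ino_parts	parts;

	parts.agno = (uint32_t)(ino >> shift);
	parts.agino = (uint32_t)(ino & ((1ULL << shift) - 1));
	return parts;
}
```

For example, with inopblog = 3 and agblklog = 20 the low 23 bits are the
AG-relative inode number, so inode (2 << 23) + 131 lives in AG 2 as agino
131.
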
 scrub/Makefile    |    1 +
 scrub/phase5.c    |  101 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.c |    1 +
 scrub/xfs_scrub.h |    1 +
 4 files changed, 104 insertions(+)
 create mode 100644 scrub/phase5.c


diff --git a/scrub/Makefile b/scrub/Makefile
index e0d15d8..adb868e 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -36,6 +36,7 @@ inodes.c \
 phase1.c \
 phase2.c \
 phase3.c \
+phase5.c \
 scrub.c \
 spacemap.c \
 xfs_scrub.c
diff --git a/scrub/phase5.c b/scrub/phase5.c
new file mode 100644
index 0000000..0b161e3
--- /dev/null
+++ b/scrub/phase5.c
@@ -0,0 +1,101 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/statvfs.h>
+#include "xfs.h"
+#include "path.h"
+#include "workqueue.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "inodes.h"
+#include "scrub.h"
+
+/* Phase 5: Check directory connectivity. */
+
+/*
+ * Verify the connectivity of the directory tree.
+ * We know that the kernel's open-by-handle function will try to reconnect
+ * parents of an opened directory, so we'll accept that as sufficient.
+ */
+static int
+xfs_scrub_connections(
+	struct scrub_ctx	*ctx,
+	struct xfs_handle	*handle,
+	struct xfs_bstat	*bstat,
+	void			*arg)
+{
+	bool			*pmoveon = arg;
+	char			descr[DESCR_BUFSZ];
+	bool			moveon = true;
+	xfs_agnumber_t		agno;
+	xfs_agino_t		agino;
+	int			fd = -1;
+
+	agno = bstat->bs_ino / (1ULL << (ctx->inopblog + ctx->agblklog));
+	agino = bstat->bs_ino % (1ULL << (ctx->inopblog + ctx->agblklog));
+	snprintf(descr, DESCR_BUFSZ, _("inode %"PRIu64" (%u/%u)"),
+			(uint64_t)bstat->bs_ino, agno, agino);
+	background_sleep();
+
+	/* Open the dir, let the kernel try to reconnect it to the root. */
+	if (S_ISDIR(bstat->bs_mode)) {
+		fd = xfs_open_handle(handle);
+		if (fd < 0) {
+			if (errno == ESTALE)
+				return ESTALE;
+			str_errno(ctx, descr);
+			goto out;
+		}
+	}
+
+out:
+	if (fd >= 0)
+		close(fd);
+	if (!moveon)
+		*pmoveon = false;
+	return *pmoveon ? 0 : XFS_ITERATE_INODES_ABORT;
+}
+
+/* Check directory connectivity. */
+bool
+xfs_scan_connections(
+	struct scrub_ctx	*ctx)
+{
+	bool			moveon = true;
+	bool			ret;
+
+	if (ctx->errors_found) {
+		str_info(ctx, ctx->mntpoint,
+_("Filesystem has errors, skipping connectivity checks."));
+		return true;
+	}
+
+	ret = xfs_scan_all_inodes(ctx, xfs_scrub_connections, &moveon);
+	if (!ret)
+		moveon = false;
+	if (!moveon)
+		return false;
+	xfs_scrub_report_preen_triggers(ctx);
+	return true;
+}
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 5bde6cf..64517f4 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -365,6 +365,7 @@ run_scrub_phases(
 		},
 		{
 			.descr = _("Check directory tree."),
+			.fn = xfs_scan_connections,
 		},
 		{
 			.descr = _("Verify data file integrity."),
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 41d471b..c9f53d8 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -100,5 +100,6 @@ bool xfs_cleanup_fs(struct scrub_ctx *ctx);
 bool xfs_setup_fs(struct scrub_ctx *ctx);
 bool xfs_scan_metadata(struct scrub_ctx *ctx);
 bool xfs_scan_inodes(struct scrub_ctx *ctx);
+bool xfs_scan_connections(struct scrub_ctx *ctx);
 
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */



* [PATCH 17/27] xfs_scrub: warn about suspicious characters in directory/xattr names
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (15 preceding siblings ...)
  2018-01-06  1:53 ` [PATCH 16/27] xfs_scrub: check directory connectivity Darrick J. Wong
@ 2018-01-06  1:53 ` Darrick J. Wong
  2018-01-06  1:53 ` [PATCH 18/27] xfs_scrub: warn about normalized Unicode name collisions Darrick J. Wong
                   ` (13 subsequent siblings)
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:53 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Look for control characters and punctuation that interfere with shell
globbing in directory entry names and extended attribute key names.
Technically these aren't filesystem corruptions because names are
arbitrary sequences of bytes, but they've been known to cause problems
in the Unix environment so warn if we see them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
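Note: the escaping done by string_escape() below (printable bytes copied
through, everything else rewritten as \xNN) can be tried out standalone
with a sketch like the following.  This is an illustration, not the
function from the patch: escape_nonprinting() is a hypothetical name, it
sizes the buffer for the worst case plus the terminating NUL, and it
walks the input as unsigned char so that it does not depend on the
-funsigned-char build flag.  Which bytes count as printable depends on
the current locale; the default "C" locale is assumed here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a copy of the input with non-printing bytes written as \xNN.
 * Caller must free the result.  Hypothetical helper for illustration.
 */
static char *
escape_nonprinting(const char *in)
{
	const unsigned char	*p;
	char			*str;
	char			*q;

	/* Worst case: every byte becomes "\xNN" (4 chars), plus the NUL. */
	str = malloc(strlen(in) * 4 + 1);
	if (!str)
		return NULL;

	q = str;
	for (p = (const unsigned char *)in; *p != '\0'; p++) {
		if (isprint(*p))
			*q++ = (char)*p;
		else
			q += sprintf(q, "\\x%02x", *p);
	}
	*q = '\0';
	return str;
}
```

Passing "a<TAB>b<NEWLINE>" through this yields the 9-character string
a\x09b\x0a, which is safe to print in a warning message.
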
 configure.ac         |    2 +
 debian/control       |    2 -
 include/builddefs.in |    1 
 m4/Makefile          |    1 
 m4/package_attr.m4   |   23 ++++++
 scrub/Makefile       |    6 ++
 scrub/common.c       |   54 ++++++++++++++
 scrub/common.h       |    4 +
 scrub/phase5.c       |  192 ++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.h    |    1 
 10 files changed, 285 insertions(+), 1 deletion(-)
 create mode 100644 m4/package_attr.m4


diff --git a/configure.ac b/configure.ac
index 796a91b..e2e3f66 100644
--- a/configure.ac
+++ b/configure.ac
@@ -166,6 +166,8 @@ AC_HAVE_STATFS_FLAGS
 AC_HAVE_MAP_SYNC
 AC_HAVE_DEVMAPPER
 AC_HAVE_MALLINFO
+AC_PACKAGE_WANT_ATTRIBUTES_H
+AC_HAVE_LIBATTR
 
 if test "$enable_blkid" = yes; then
 AC_HAVE_BLKID_TOPO
diff --git a/debian/control b/debian/control
index f5980b2..f664a6b 100644
--- a/debian/control
+++ b/debian/control
@@ -3,7 +3,7 @@ Section: admin
 Priority: optional
 Maintainer: XFS Development Team <linux-xfs@vger.kernel.org>
 Uploaders: Nathan Scott <nathans@debian.org>, Anibal Monsalve Salazar <anibal@debian.org>
-Build-Depends: uuid-dev, dh-autoreconf, debhelper (>= 5), gettext, libtool, libreadline-gplv2-dev | libreadline5-dev, libblkid-dev (>= 2.17), linux-libc-dev, libdevmapper-dev
+Build-Depends: uuid-dev, dh-autoreconf, debhelper (>= 5), gettext, libtool, libreadline-gplv2-dev | libreadline5-dev, libblkid-dev (>= 2.17), linux-libc-dev, libdevmapper-dev, libattr1-dev
 Standards-Version: 3.9.1
 Homepage: https://xfs.wiki.kernel.org/
 
diff --git a/include/builddefs.in b/include/builddefs.in
index 28cf0d8..cc1b7e2 100644
--- a/include/builddefs.in
+++ b/include/builddefs.in
@@ -120,6 +120,7 @@ HAVE_STATFS_FLAGS = @have_statfs_flags@
 HAVE_MAP_SYNC = @have_map_sync@
 HAVE_DEVMAPPER = @have_devmapper@
 HAVE_MALLINFO = @have_mallinfo@
+HAVE_LIBATTR = @have_libattr@
 
 GCCFLAGS = -funsigned-char -fno-strict-aliasing -Wall
 #	   -Wbitwise -Wno-transparent-union -Wno-old-initializer -Wno-decl
diff --git a/m4/Makefile b/m4/Makefile
index 77f2edd..d5f1d2f 100644
--- a/m4/Makefile
+++ b/m4/Makefile
@@ -17,6 +17,7 @@ LSRCFILES = \
 	package_blkid.m4 \
 	package_devmapper.m4 \
 	package_globals.m4 \
+	package_attr.m4 \
 	package_libcdev.m4 \
 	package_pthread.m4 \
 	package_sanitizer.m4 \
diff --git a/m4/package_attr.m4 b/m4/package_attr.m4
new file mode 100644
index 0000000..4324923
--- /dev/null
+++ b/m4/package_attr.m4
@@ -0,0 +1,23 @@
+AC_DEFUN([AC_PACKAGE_WANT_ATTRIBUTES_H],
+  [
+    AC_CHECK_HEADERS(attr/attributes.h)
+  ])
+
+#
+# Check if we have an ATTR_ROOT flag and libattr structures
+#
+AC_DEFUN([AC_HAVE_LIBATTR],
+  [ AC_MSG_CHECKING([for struct attrlist_cursor])
+    AC_TRY_COMPILE([
+#include <sys/types.h>
+#include <attr/attributes.h>
+       ], [
+struct attrlist_cursor *cur;
+struct attrlist *list;
+struct attrlist_ent *ent;
+int flags = ATTR_ROOT;
+       ], have_libattr=yes
+          AC_MSG_RESULT(yes),
+          AC_MSG_RESULT(no))
+    AC_SUBST(have_libattr)
+  ])
diff --git a/scrub/Makefile b/scrub/Makefile
index adb868e..67ac6af 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -53,8 +53,14 @@ ifeq ($(HAVE_SYNCFS),yes)
 LCFLAGS += -DHAVE_SYNCFS
 endif
 
+ifeq ($(HAVE_LIBATTR),yes)
+LCFLAGS += -DHAVE_LIBATTR
+endif
+
 default: depend $(LTCOMMAND)
 
+phase5.o: $(TOPDIR)/include/builddefs
+
 include $(BUILDRULES)
 
 install: default $(INSTALL_SCRUB)
diff --git a/scrub/common.c b/scrub/common.c
index eb602a8..b02e7fc 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -339,3 +339,57 @@ background_sleep(void)
 	tv.tv_nsec = time % 1000000;
 	nanosleep(&tv, NULL);
 }
+
+/*
+ * Return the input string with non-printing bytes escaped.
+ * Caller must free the buffer.
+ */
+char *
+string_escape(
+	const char		*in)
+{
+	char			*str;
+	const char		*p;
+	char			*q;
+	int			x;
+
+	str = malloc(strlen(in) * 4 + 1);
+	if (!str)
+		return NULL;
+	for (p = in, q = str; *p != '\0'; p++) {
+		if (isprint(*p)) {
+			*q = *p;
+			q++;
+		} else {
+			x = sprintf(q, "\\x%02x", *p);
+			q += x;
+		}
+	}
+	*q = '\0';
+	return str;
+}
+
+/*
+ * Record another naming warning, and decide if it's worth
+ * complaining about.
+ */
+bool
+should_warn_about_name(
+	struct scrub_ctx	*ctx)
+{
+	bool			whine;
+	bool			res;
+
+	pthread_mutex_lock(&ctx->lock);
+	ctx->naming_warnings++;
+	whine = ctx->naming_warnings == TOO_MANY_NAME_WARNINGS;
+	res = ctx->naming_warnings < TOO_MANY_NAME_WARNINGS;
+	pthread_mutex_unlock(&ctx->lock);
+
+	if (whine && !(debug || verbose))
+		str_info(ctx, ctx->mntpoint,
+_("More than %u naming warnings, shutting up."),
+				TOO_MANY_NAME_WARNINGS);
+
+	return debug || verbose || res;
+}
diff --git a/scrub/common.h b/scrub/common.h
index 81e83c2..e5a13d8 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -72,5 +72,9 @@ static inline int syncfs(int fd)
 
 bool find_mountpoint(char *mtab, struct scrub_ctx *ctx);
 void background_sleep(void);
+char *string_escape(const char *in);
+
+#define TOO_MANY_NAME_WARNINGS	10000
+bool should_warn_about_name(struct scrub_ctx *ctx);
 
 #endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/phase5.c b/scrub/phase5.c
index 0b161e3..98d30f8 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -20,10 +20,15 @@
 #include <stdio.h>
 #include <stdint.h>
 #include <stdbool.h>
+#include <dirent.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <sys/statvfs.h>
+#ifdef HAVE_LIBATTR
+# include <attr/attributes.h>
+#endif
 #include "xfs.h"
+#include "handle.h"
 #include "path.h"
 #include "workqueue.h"
 #include "xfs_scrub.h"
@@ -34,6 +39,181 @@
 /* Phase 5: Check directory connectivity. */
 
 /*
+ * Warn about problematic bytes in a directory/attribute name.  That means
+ * terminal control characters and escape sequences, since that could be used
+ * to do something naughty to the user's computer and/or break scripts.  XFS
+ * doesn't consider any byte sequence invalid, so don't flag these as errors.
+ */
+static bool
+xfs_scrub_check_name(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	const char		*namedescr,
+	const char		*name)
+{
+	const char		*p;
+	bool			bad = false;
+	char			*errname;
+
+	/* Complain about zero length names. */
+	if (*name == '\0' && should_warn_about_name(ctx)) {
+		str_warn(ctx, descr, _("Zero length name found."));
+		return true;
+	}
+
+	/* control characters */
+	for (p = name; *p; p++) {
+		if ((*p >= 1 && *p <= 31) || *p == 127) {
+			bad = true;
+			break;
+		}
+	}
+
+	if (bad && should_warn_about_name(ctx)) {
+		errname = string_escape(name);
+		if (!errname) {
+			str_errno(ctx, descr);
+			return false;
+		}
+		str_info(ctx, descr,
+_("Control character found in %s name \"%s\"."),
+				namedescr, errname);
+		free(errname);
+	}
+
+	return true;
+}
+
+/*
+ * Iterate a directory looking for filenames with problematic
+ * characters.
+ */
+static bool
+xfs_scrub_scan_dirents(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	int			*fd)
+{
+	DIR			*dir;
+	struct dirent		*dentry;
+	bool			moveon = true;
+
+	dir = fdopendir(*fd);
+	if (!dir) {
+		str_errno(ctx, descr);
+		goto out;
+	}
+	*fd = -1; /* closedir will close *fd for us */
+
+	dentry = readdir(dir);
+	while (dentry) {
+		moveon = xfs_scrub_check_name(ctx, descr, _("directory"),
+				dentry->d_name);
+		if (!moveon)
+			break;
+		dentry = readdir(dir);
+	}
+
+	closedir(dir);
+out:
+	return moveon;
+}
+
+#ifdef HAVE_LIBATTR
+/* Routines to scan all of an inode's xattrs for name problems. */
+struct xfs_attr_ns {
+	int			flags;
+	const char		*name;
+};
+
+static const struct xfs_attr_ns attr_ns[] = {
+	{0,			"user"},
+	{ATTR_ROOT,		"system"},
+	{ATTR_SECURE,		"secure"},
+	{0, NULL},
+};
+
+/*
+ * Check all the xattr names in a particular namespace of a file handle
+ * for Unicode normalization problems or collisions.
+ */
+static bool
+xfs_scrub_scan_fhandle_namespace_xattrs(
+	struct scrub_ctx		*ctx,
+	const char			*descr,
+	struct xfs_handle		*handle,
+	const struct xfs_attr_ns	*attr_ns)
+{
+	struct attrlist_cursor		cur;
+	char				attrbuf[XFS_XATTR_LIST_MAX];
+	char				keybuf[NAME_MAX + 1];
+	struct attrlist			*attrlist = (struct attrlist *)attrbuf;
+	struct attrlist_ent		*ent;
+	bool				moveon = true;
+	int				i;
+	int				error;
+
+	memset(attrbuf, 0, XFS_XATTR_LIST_MAX);
+	memset(&cur, 0, sizeof(cur));
+	memset(keybuf, 0, NAME_MAX + 1);
+	error = attr_list_by_handle(handle, sizeof(*handle), attrbuf,
+			XFS_XATTR_LIST_MAX, attr_ns->flags, &cur);
+	while (!error) {
+		/* Examine the xattrs. */
+		for (i = 0; i < attrlist->al_count; i++) {
+			ent = ATTR_ENTRY(attrlist, i);
+			snprintf(keybuf, NAME_MAX, "%s.%s", attr_ns->name,
+					ent->a_name);
+			moveon = xfs_scrub_check_name(ctx, descr,
+					_("extended attribute"), keybuf);
+			if (!moveon)
+				goto out;
+		}
+
+		if (!attrlist->al_more)
+			break;
+		error = attr_list_by_handle(handle, sizeof(*handle), attrbuf,
+				XFS_XATTR_LIST_MAX, attr_ns->flags, &cur);
+	}
+	if (error && errno != ESTALE)
+		str_errno(ctx, descr);
+out:
+	return moveon;
+}
+
+/*
+ * Check all the xattr names in all the xattr namespaces for problematic
+ * characters.
+ */
+static bool
+xfs_scrub_scan_fhandle_xattrs(
+	struct scrub_ctx		*ctx,
+	const char			*descr,
+	struct xfs_handle		*handle)
+{
+	const struct xfs_attr_ns	*ns;
+	bool				moveon = true;
+
+	for (ns = attr_ns; ns->name; ns++) {
+		moveon = xfs_scrub_scan_fhandle_namespace_xattrs(ctx, descr,
+				handle, ns);
+		if (!moveon)
+			break;
+	}
+	return moveon;
+}
+#else
+static inline bool
+xfs_scrub_scan_fhandle_xattrs(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	struct xfs_handle	*handle)
+{
+	return true;
+}
+#endif /* HAVE_LIBATTR */
+
+/*
  * Verify the connectivity of the directory tree.
  * We know that the kernel's open-by-handle function will try to reconnect
  * parents of an opened directory, so we'll accept that as sufficient.
@@ -58,6 +238,11 @@ xfs_scrub_connections(
 			(uint64_t)bstat->bs_ino, agno, agino);
 	background_sleep();
 
+        /* Warn about naming problems in xattrs. */
+        moveon = xfs_scrub_scan_fhandle_xattrs(ctx, descr, handle);
+        if (!moveon)
+                goto out;
+
 	/* Open the dir, let the kernel try to reconnect it to the root. */
 	if (S_ISDIR(bstat->bs_mode)) {
 		fd = xfs_open_handle(handle);
@@ -69,6 +254,13 @@ xfs_scrub_connections(
 		}
 	}
 
+        /* Warn about naming problems in the directory entries. */
+        if (fd >= 0 && S_ISDIR(bstat->bs_mode)) {
+                moveon = xfs_scrub_scan_dirents(ctx, descr, &fd);
+                if (!moveon)
+                        goto out;
+        }
+
 out:
 	if (fd >= 0)
 		close(fd);
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index c9f53d8..66003e4 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -90,6 +90,7 @@ struct scrub_ctx {
 	unsigned long long	errors_found;
 	unsigned long long	warnings_found;
 	unsigned long long	inodes_checked;
+	unsigned long long	naming_warnings;
 	bool			need_repair;
 	bool			preen_triggers[XFS_SCRUB_TYPE_NR];
 };
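
The two name checks in this patch (finding control bytes, then escaping
a name for display) can be sketched in isolation as below.  This is a
minimal illustration and not the patch's code: has_control_char() and
escape_name() are hypothetical names, and the sketch tests an explicit
ASCII range where the patch calls isprint().  The buffer is sized for
the worst case of four output bytes per input byte plus the terminating
NUL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* True if the name contains an ASCII control byte (1-31 or 127). */
static bool has_control_char(const char *name)
{
	const unsigned char *p;

	/* The loop stops at NUL, so "*p <= 31" really means 1..31. */
	for (p = (const unsigned char *)name; *p != '\0'; p++)
		if (*p <= 31 || *p == 127)
			return true;
	return false;
}

/*
 * Return a copy of 'in' with every byte outside printable ASCII
 * (0x20-0x7e) rewritten as \xNN.  Caller must free the result.
 * (Hypothetical helper; the patch itself uses isprint().)
 */
static char *escape_name(const char *in)
{
	const unsigned char *p;
	char *str;
	char *q;

	assert(in != NULL);

	/* Worst case: 4 output bytes per input byte, plus the NUL. */
	str = malloc(strlen(in) * 4 + 1);
	if (!str)
		return NULL;

	for (p = (const unsigned char *)in, q = str; *p != '\0'; p++) {
		if (*p >= 0x20 && *p <= 0x7e)
			*q++ = (char)*p;
		else
			q += sprintf(q, "\\x%02x", (unsigned int)*p);
	}
	*q = '\0';
	return str;
}
```

A caller would typically run has_control_char() first and only pay for
escape_name() when a warning is actually going to be printed, which is
the order the patch follows.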


* [PATCH 18/27] xfs_scrub: warn about normalized Unicode name collisions
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (16 preceding siblings ...)
  2018-01-06  1:53 ` [PATCH 17/27] xfs_scrub: warn about suspicious characters in directory/xattr names Darrick J. Wong
@ 2018-01-06  1:53 ` Darrick J. Wong
  2018-01-16 23:52   ` Eric Sandeen
  2018-01-06  1:53 ` [PATCH 19/27] xfs_scrub: create a bitmap data structure Darrick J. Wong
                   ` (12 subsequent siblings)
  30 siblings, 1 reply; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:53 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Iterate all directory and xattr names to look for collisions amongst
Unicode-normalized names.  Such collisions are generally a sign of buggy
programs or of maliciously duplicated file names.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
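
For readers skimming the diff, the collision check can be sketched on
its own roughly as follows.  This is an illustration under stated
assumptions, not the code in scrub/unicrash.c: the names passed in are
assumed to be already NFKC-normalized (the patch does that with
u8_normalize()), the bucket count is fixed, the hash is a simple
stand-in, and struct seen_table, seen_add() and seen_free() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NR_BUCKETS	16	/* fixed here; the patch sizes this per inode */

struct seen_name {
	struct seen_name	*next;
	uint64_t		ino;
	char			name[];	/* normalized name, NUL-terminated */
};

struct seen_table {
	bool			compare_ino;	/* true: dirents, false: xattrs */
	struct seen_name	*buckets[NR_BUCKETS];
};

/* Stand-in string hash; any reasonable hash works for the sketch. */
static size_t bucket_of(const char *name)
{
	size_t h = 0;

	for (; *name != '\0'; name++)
		h = h * 31 + (unsigned char)*name;
	return h % NR_BUCKETS;
}

/*
 * Record a normalized name.  Returns false only if allocation fails.
 * *unique is false when the name matches an earlier entry, unless
 * compare_ino is set and both entries point at the same inode.
 */
static bool seen_add(struct seen_table *t, const char *name, uint64_t ino,
		     bool *unique)
{
	struct seen_name **nep;
	struct seen_name *ne;
	size_t len;

	assert(t != NULL && name != NULL && unique != NULL);

	nep = &t->buckets[bucket_of(name)];
	for (ne = *nep; ne != NULL; ne = ne->next) {
		if (strcmp(ne->name, name) == 0) {
			*unique = t->compare_ino ? ne->ino == ino : false;
			return true;
		}
		nep = &ne->next;
	}

	/* First time we see this name: append it to the bucket chain. */
	len = strlen(name);
	ne = malloc(sizeof(*ne) + len + 1);
	if (!ne)
		return false;
	ne->next = NULL;
	ne->ino = ino;
	memcpy(ne->name, name, len + 1);
	*nep = ne;
	*unique = true;
	return true;
}

/* Release every chained entry; the table itself is caller-owned. */
static void seen_free(struct seen_table *t)
{
	struct seen_name *ne;
	struct seen_name *next;
	size_t i;

	for (i = 0; i < NR_BUCKETS; i++) {
		for (ne = t->buckets[i]; ne != NULL; ne = next) {
			next = ne->next;
			free(ne);
		}
		t->buckets[i] = NULL;
	}
}
```

With compare_ino set, two dirents that normalize to the same name are
tolerated if they are hard links to one inode; for xattrs there is no
inode to compare, so any repeat counts as a collision.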
 configure.ac            |    2 
 debian/control          |    2 
 include/builddefs.in    |    2 
 m4/Makefile             |    1 
 m4/package_unistring.m4 |   19 ++
 scrub/Makefile          |   12 +
 scrub/common.c          |   20 ++
 scrub/common.h          |    3 
 scrub/phase5.c          |   53 +++++-
 scrub/unicrash.c        |  399 +++++++++++++++++++++++++++++++++++++++++++++++
 scrub/unicrash.h        |   49 ++++++
 scrub/xfs_scrub.c       |    2 
 12 files changed, 546 insertions(+), 18 deletions(-)
 create mode 100644 m4/package_unistring.m4
 create mode 100644 scrub/unicrash.c
 create mode 100644 scrub/unicrash.h


diff --git a/configure.ac b/configure.ac
index e2e3f66..fc44bd5 100644
--- a/configure.ac
+++ b/configure.ac
@@ -168,6 +168,8 @@ AC_HAVE_DEVMAPPER
 AC_HAVE_MALLINFO
 AC_PACKAGE_WANT_ATTRIBUTES_H
 AC_HAVE_LIBATTR
+AC_PACKAGE_WANT_UNINORM_H
+AC_HAVE_U8NORMALIZE
 
 if test "$enable_blkid" = yes; then
 AC_HAVE_BLKID_TOPO
diff --git a/debian/control b/debian/control
index f664a6b..36d1bd8 100644
--- a/debian/control
+++ b/debian/control
@@ -3,7 +3,7 @@ Section: admin
 Priority: optional
 Maintainer: XFS Development Team <linux-xfs@vger.kernel.org>
 Uploaders: Nathan Scott <nathans@debian.org>, Anibal Monsalve Salazar <anibal@debian.org>
-Build-Depends: uuid-dev, dh-autoreconf, debhelper (>= 5), gettext, libtool, libreadline-gplv2-dev | libreadline5-dev, libblkid-dev (>= 2.17), linux-libc-dev, libdevmapper-dev, libattr1-dev
+Build-Depends: uuid-dev, dh-autoreconf, debhelper (>= 5), gettext, libtool, libreadline-gplv2-dev | libreadline5-dev, libblkid-dev (>= 2.17), linux-libc-dev, libdevmapper-dev, libattr1-dev, libunistring-dev
 Standards-Version: 3.9.1
 Homepage: https://xfs.wiki.kernel.org/
 
diff --git a/include/builddefs.in b/include/builddefs.in
index cc1b7e2..1c264a0 100644
--- a/include/builddefs.in
+++ b/include/builddefs.in
@@ -36,6 +36,7 @@ LIBEDITLINE = @libeditline@
 LIBREADLINE = @libreadline@
 LIBBLKID = @libblkid@
 LIBDEVMAPPER = @libdevmapper@
+LIBUNISTRING = @libunistring@
 LIBXFS = $(TOPDIR)/libxfs/libxfs.la
 LIBFROG = $(TOPDIR)/libfrog/libfrog.la
 LIBXCMD = $(TOPDIR)/libxcmd/libxcmd.la
@@ -121,6 +122,7 @@ HAVE_MAP_SYNC = @have_map_sync@
 HAVE_DEVMAPPER = @have_devmapper@
 HAVE_MALLINFO = @have_mallinfo@
 HAVE_LIBATTR = @have_libattr@
+HAVE_U8NORMALIZE = @have_u8normalize@
 
 GCCFLAGS = -funsigned-char -fno-strict-aliasing -Wall
 #	   -Wbitwise -Wno-transparent-union -Wno-old-initializer -Wno-decl
diff --git a/m4/Makefile b/m4/Makefile
index d5f1d2f..61d617e 100644
--- a/m4/Makefile
+++ b/m4/Makefile
@@ -22,6 +22,7 @@ LSRCFILES = \
 	package_pthread.m4 \
 	package_sanitizer.m4 \
 	package_types.m4 \
+	package_unistring.m4 \
 	package_utilies.m4 \
 	package_uuiddev.m4 \
 	multilib.m4 \
diff --git a/m4/package_unistring.m4 b/m4/package_unistring.m4
new file mode 100644
index 0000000..9cbfcb0
--- /dev/null
+++ b/m4/package_unistring.m4
@@ -0,0 +1,19 @@
+AC_DEFUN([AC_PACKAGE_WANT_UNINORM_H],
+  [ AC_CHECK_HEADERS(uninorm.h)
+    if test $ac_cv_header_uninorm_h = no; then
+	AC_CHECK_HEADERS(uninorm.h,, [
+	echo
+	echo 'WARNING: could not find a valid uninorm.h header.'])
+    fi
+  ])
+
+AC_DEFUN([AC_HAVE_U8NORMALIZE],
+  [ AC_CHECK_LIB(unistring, u8_normalize,[
+	libunistring=-lunistring
+	have_u8normalize=yes
+    ],[
+	echo
+	echo 'WARNING: xfs_scrub will not be built with Unicode libraries.'])
+    AC_SUBST(libunistring)
+    AC_SUBST(have_u8normalize)
+  ])
diff --git a/scrub/Makefile b/scrub/Makefile
index 67ac6af..858bc40 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -24,6 +24,7 @@ fscounters.h \
 inodes.h \
 scrub.h \
 spacemap.h \
+unicrash.h \
 xfs_scrub.h
 
 CFILES = \
@@ -41,8 +42,8 @@ scrub.c \
 spacemap.c \
 xfs_scrub.c
 
-LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD)
-LTDEPENDENCIES += $(LIBHANDLE) $(LIBFROG)
+LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD) $(LIBUNISTRING)
+LTDEPENDENCIES += $(LIBHANDLE) $(LIBFROG) $(LIBUNISTRING)
 LLDFLAGS = -static
 
 ifeq ($(HAVE_MALLINFO),yes)
@@ -57,9 +58,14 @@ ifeq ($(HAVE_LIBATTR),yes)
 LCFLAGS += -DHAVE_LIBATTR
 endif
 
+ifeq ($(HAVE_U8NORMALIZE),yes)
+CFILES += unicrash.c
+LCFLAGS += -DHAVE_U8NORMALIZE
+endif
+
 default: depend $(LTCOMMAND)
 
-phase5.o: $(TOPDIR)/include/builddefs
+phase5.o unicrash.o xfs.o: $(TOPDIR)/include/builddefs
 
 include $(BUILDRULES)
 
diff --git a/scrub/common.c b/scrub/common.c
index b02e7fc..10c4017 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -75,6 +75,26 @@ __str_errno(
 	pthread_mutex_unlock(&ctx->lock);
 }
 
+/* Print a warning string and whatever error is stored in errno. */
+void
+__str_errno_warn(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	const char		*file,
+	int			line)
+{
+	char			buf[DESCR_BUFSZ];
+
+	pthread_mutex_lock(&ctx->lock);
+	fprintf(stderr, _("Warning: %s: %s."), descr,
+			strerror_r(errno, buf, DESCR_BUFSZ));
+	if (debug)
+		fprintf(stderr, _(" (%s line %d)"), file, line);
+	fprintf(stderr, "\n");
+	ctx->warnings_found++;
+	pthread_mutex_unlock(&ctx->lock);
+}
+
 /* Print an error string and some error text. */
 void
 __str_error(
diff --git a/scrub/common.h b/scrub/common.h
index e5a13d8..dd2070e 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -41,11 +41,14 @@ void __record_repair(struct scrub_ctx *ctx, const char *descr, const char *file,
 		int line, const char *format, ...);
 void __record_preen(struct scrub_ctx *ctx, const char *descr, const char *file,
 		int line, const char *format, ...);
+void __str_errno_warn(struct scrub_ctx *, const char *descr, const char *file,
+		      int line);
 
 #define str_errno(ctx, str)		__str_errno(ctx, str, __FILE__, __LINE__)
 #define str_error(ctx, str, ...)	__str_error(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
 #define str_warn(ctx, str, ...)		__str_warn(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
 #define str_info(ctx, str, ...)		__str_info(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
+#define str_errno_warn(ctx, str)	__str_errno_warn(ctx, str, __FILE__, __LINE__)
 #define dbg_printf(fmt, ...)		{if (debug > 1) {printf(fmt, __VA_ARGS__);}}
 
 /* Is this debug tweak enabled? */
diff --git a/scrub/phase5.c b/scrub/phase5.c
index 98d30f8..8b8aeed 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -35,6 +35,7 @@
 #include "common.h"
 #include "inodes.h"
 #include "scrub.h"
+#include "unicrash.h"
 
 /* Phase 5: Check directory connectivity. */
 
@@ -92,8 +93,10 @@ static bool
 xfs_scrub_scan_dirents(
 	struct scrub_ctx	*ctx,
 	const char		*descr,
-	int			*fd)
+	int			*fd,
+	struct xfs_bstat	*bstat)
 {
+	struct unicrash		*uc = NULL;
 	DIR			*dir;
 	struct dirent		*dentry;
 	bool			moveon = true;
@@ -105,15 +108,24 @@ xfs_scrub_scan_dirents(
 	}
 	*fd = -1; /* closedir will close *fd for us */
 
+	moveon = unicrash_dir_init(&uc, ctx, bstat);
+	if (!moveon)
+		goto out_unicrash;
+
 	dentry = readdir(dir);
 	while (dentry) {
 		moveon = xfs_scrub_check_name(ctx, descr, _("directory"),
 				dentry->d_name);
 		if (!moveon)
 			break;
+		moveon = unicrash_check_dir_name(uc, descr, dentry);
+		if (!moveon)
+			break;
 		dentry = readdir(dir);
 	}
+	unicrash_free(uc);
 
+out_unicrash:
 	closedir(dir);
 out:
 	return moveon;
@@ -142,6 +154,7 @@ xfs_scrub_scan_fhandle_namespace_xattrs(
 	struct scrub_ctx		*ctx,
 	const char			*descr,
 	struct xfs_handle		*handle,
+	struct xfs_bstat		*bstat,
 	const struct xfs_attr_ns	*attr_ns)
 {
 	struct attrlist_cursor		cur;
@@ -149,10 +162,15 @@ xfs_scrub_scan_fhandle_namespace_xattrs(
 	char				keybuf[NAME_MAX + 1];
 	struct attrlist			*attrlist = (struct attrlist *)attrbuf;
 	struct attrlist_ent		*ent;
+	struct unicrash			*uc = NULL;
 	bool				moveon = true;
 	int				i;
 	int				error;
 
+	moveon = unicrash_xattr_init(&uc, ctx, bstat);
+	if (!moveon)
+		return false;
+
 	memset(attrbuf, 0, XFS_XATTR_LIST_MAX);
 	memset(&cur, 0, sizeof(cur));
 	memset(keybuf, 0, NAME_MAX + 1);
@@ -168,6 +186,9 @@ xfs_scrub_scan_fhandle_namespace_xattrs(
 					_("extended attribute"), keybuf);
 			if (!moveon)
 				goto out;
+			moveon = unicrash_check_xattr_name(uc, descr, keybuf);
+			if (!moveon)
+				goto out;
 		}
 
 		if (!attrlist->al_more)
@@ -178,6 +199,7 @@ xfs_scrub_scan_fhandle_namespace_xattrs(
 	if (error && errno != ESTALE)
 		str_errno(ctx, descr);
 out:
+	unicrash_free(uc);
 	return moveon;
 }
 
@@ -189,14 +211,15 @@ static bool
 xfs_scrub_scan_fhandle_xattrs(
 	struct scrub_ctx		*ctx,
 	const char			*descr,
-	struct xfs_handle		*handle)
+	struct xfs_handle		*handle,
+	struct xfs_bstat		*bstat)
 {
 	const struct xfs_attr_ns	*ns;
 	bool				moveon = true;
 
 	for (ns = attr_ns; ns->name; ns++) {
 		moveon = xfs_scrub_scan_fhandle_namespace_xattrs(ctx, descr,
-				handle, ns);
+				handle, bstat, ns);
 		if (!moveon)
 			break;
 	}
@@ -217,6 +240,8 @@ xfs_scrub_scan_fhandle_xattrs(
  * Verify the connectivity of the directory tree.
  * We know that the kernel's open-by-handle function will try to reconnect
  * parents of an opened directory, so we'll accept that as sufficient.
+ *
+ * Check for potential Unicode collisions in names.
  */
 static int
 xfs_scrub_connections(
@@ -227,7 +252,7 @@ xfs_scrub_connections(
 {
 	bool			*pmoveon = arg;
 	char			descr[DESCR_BUFSZ];
-	bool			moveon = true;
+	bool			moveon;
 	xfs_agnumber_t		agno;
 	xfs_agino_t		agino;
 	int			fd = -1;
@@ -238,10 +263,10 @@ xfs_scrub_connections(
 			(uint64_t)bstat->bs_ino, agno, agino);
 	background_sleep();
 
-        /* Warn about naming problems in xattrs. */
-        moveon = xfs_scrub_scan_fhandle_xattrs(ctx, descr, handle);
-        if (!moveon)
-                goto out;
+	/* Warn about naming problems in xattrs. */
+	moveon = xfs_scrub_scan_fhandle_xattrs(ctx, descr, handle, bstat);
+	if (!moveon)
+		goto out;
 
 	/* Open the dir, let the kernel try to reconnect it to the root. */
 	if (S_ISDIR(bstat->bs_mode)) {
@@ -254,12 +279,12 @@ xfs_scrub_connections(
 		}
 	}
 
-        /* Warn about naming problems in the directory entries. */
-        if (fd >= 0 && S_ISDIR(bstat->bs_mode)) {
-                moveon = xfs_scrub_scan_dirents(ctx, descr, &fd);
-                if (!moveon)
-                        goto out;
-        }
+	/* Warn about naming problems in the directory entries. */
+	if (fd >= 0 && S_ISDIR(bstat->bs_mode)) {
+		moveon = xfs_scrub_scan_dirents(ctx, descr, &fd, bstat);
+		if (!moveon)
+			goto out;
+	}
 
 out:
 	if (fd >= 0)
diff --git a/scrub/unicrash.c b/scrub/unicrash.c
new file mode 100644
index 0000000..25d6701
--- /dev/null
+++ b/scrub/unicrash.c
@@ -0,0 +1,399 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <dirent.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/statvfs.h>
+#include <unistr.h>
+#include <uninorm.h>
+#include "xfs.h"
+#include "path.h"
+#include "xfs_scrub.h"
+#include "common.h"
+
+/*
+ * Detect collisions of Unicode-normalized names.
+ *
+ * Record all the name->ino mappings in a directory/xattr, with a twist!
+ * The twist is that we perform unicode normalization on every name we
+ * see, so that we can warn about a directory containing more than one
+ * directory entry that normalizes to the same Unicode string.  These
+ * entries are at best a sign of Unicode mishandling, or some sort of
+ * weird name substitution attack if the entries do not point to the
+ * same inode.  Warn if we see multiple dirents that do not all point to
+ * the same inode.
+ *
+ * For extended attributes we perform the same collision checks on the
+ * attribute name, though any collision is enough to trigger a warning.
+ *
+ * We flag these collisions as warnings and not errors because XFS
+ * treats names as a sequence of arbitrary nonzero bytes.  While a
+ * Unicode collision is not technically a filesystem corruption, we
+ * ought to say something if there's a possibility for misleading a
+ * user.
+ *
+ * To normalize, we use Unicode NFKC.  We use the composing
+ * normalization mode (e.g. "E WITH ACUTE" instead of "E" then "ACUTE")
+ * because that's what W3C (and in general Linux) uses.  This enables us
+ * to detect multiple object names that normalize to the same name and
+ * could be confusing to users.  Furthermore, we use the compatibility
+ * mode to detect names with compatible but different code points to
+ * strengthen those checks.
+ */
+
+struct name_entry {
+	struct name_entry	*next;
+	xfs_ino_t		ino;
+	size_t			uninamelen;
+	uint8_t			uniname[0];
+};
+#define NAME_ENTRY_SZ(nl)	(sizeof(struct name_entry) + 1 + \
+				 (nl * sizeof(uint8_t)))
+
+struct unicrash {
+	struct scrub_ctx	*ctx;
+	bool			compare_ino;
+	size_t			nr_buckets;
+	struct name_entry	*buckets[0];
+};
+#define UNICRASH_SZ(nr)		(sizeof(struct unicrash) + \
+				 (nr * sizeof(struct name_entry *)))
+
+/*
+ * We only care about validating utf8 collisions if the underlying
+ * system configuration says we're using utf8.  If the language
+ * specifier string used to output messages has ".UTF-8" somewhere in
+ * its name, then we conclude utf8 is in use.  Otherwise, no checking is
+ * performed.
+ *
+ * Most modern Linux systems default to utf8, so the only time this
+ * check will return false is if the administrator configured things
+ * this way or if things are so messed up there is no locale data at
+ * all.
+ */
+#define UTF8_STR		".UTF-8"
+#define UTF8_STRLEN		(sizeof(UTF8_STR) - 1)
+static bool
+is_utf8_locale(void)
+{
+	const char		*msg_locale;
+	static int		answer = -1;
+
+	if (answer != -1)
+		return answer;
+
+	msg_locale = setlocale(LC_MESSAGES, NULL);
+	if (msg_locale == NULL)
+		return false;
+
+	if (strstr(msg_locale, UTF8_STR) != NULL)
+		answer = 1;
+	else
+		answer = 0;
+	return answer;
+}
+
+/* Set up unicrash global state. */
+void
+unicrash_setup(void)
+{
+	is_utf8_locale();
+}
+
+/* Initialize the collision detector. */
+static bool
+unicrash_init(
+	struct unicrash		**ucp,
+	struct scrub_ctx	*ctx,
+	bool			compare_ino,
+	size_t			nr_buckets)
+{
+	struct unicrash		*p;
+
+	if (!is_utf8_locale()) {
+		*ucp = NULL;
+		return true;
+	}
+
+	if (nr_buckets > 65536)
+		nr_buckets = 65536;
+	else if (nr_buckets < 16)
+		nr_buckets = 16;
+
+	p = calloc(1, UNICRASH_SZ(nr_buckets));
+	if (!p)
+		return false;
+	p->ctx = ctx;
+	p->nr_buckets = nr_buckets;
+	p->compare_ino = compare_ino;
+	*ucp = p;
+
+	return true;
+}
+
+/* Initialize the collision detector for a directory. */
+bool
+unicrash_dir_init(
+	struct unicrash		**ucp,
+	struct scrub_ctx	*ctx,
+	struct xfs_bstat	*bstat)
+{
+	/*
+	 * Assume 64 bytes per dentry, clamp buckets between 16 and 64k.
+	 * Same general idea as dir_hash_init in xfs_repair.
+	 */
+	return unicrash_init(ucp, ctx, true, bstat->bs_size / 64);
+}
+
+/* Initialize the collision detector for an extended attribute. */
+bool
+unicrash_xattr_init(
+	struct unicrash		**ucp,
+	struct scrub_ctx	*ctx,
+	struct xfs_bstat	*bstat)
+{
+	/* Assume 16 attributes per extent for lack of a better idea. */
+	return unicrash_init(ucp, ctx, false, 16 * (1 + bstat->bs_aextents));
+}
+
+/* Free the crash detector. */
+void
+unicrash_free(
+	struct unicrash		*uc)
+{
+	struct name_entry	*ne;
+	struct name_entry	*x;
+	size_t			i;
+
+	if (!uc)
+		return;
+
+	for (i = 0; i < uc->nr_buckets; i++) {
+		for (ne = uc->buckets[i]; ne != NULL; ne = x) {
+			x = ne->next;
+			free(ne);
+		}
+	}
+	free(uc);
+}
+
+/* Steal the dirhash function from libxfs, avoid linking with libxfs. */
+
+#define rol32(x, y)		(((x) << (y)) | ((x) >> (32 - (y))))
+
+/*
+ * Implement a simple hash on a character string.
+ * Rotate the hash value by 7 bits, then XOR each character in.
+ * This is implemented with some source-level loop unrolling.
+ */
+static xfs_dahash_t
+unicrash_hashname(
+	const uint8_t		*name,
+	size_t			namelen)
+{
+	xfs_dahash_t		hash;
+
+	/*
+	 * Do four characters at a time as long as we can.
+	 */
+	for (hash = 0; namelen >= 4; namelen -= 4, name += 4)
+		hash = (name[0] << 21) ^ (name[1] << 14) ^ (name[2] << 7) ^
+		       (name[3] << 0) ^ rol32(hash, 7 * 4);
+
+	/*
+	 * Now do the rest of the characters.
+	 */
+	switch (namelen) {
+	case 3:
+		return (name[0] << 14) ^ (name[1] << 7) ^ (name[2] << 0) ^
+		       rol32(hash, 7 * 3);
+	case 2:
+		return (name[0] << 7) ^ (name[1] << 0) ^ rol32(hash, 7 * 2);
+	case 1:
+		return (name[0] << 0) ^ rol32(hash, 7 * 1);
+	default: /* case 0: */
+		return hash;
+	}
+}
+
+/*
+ * Normalize a name according to Unicode NFKC normalization rules.
+ * Returns true if the name was already normalized.
+ */
+static bool
+unicrash_normalize(
+	const char		*in,
+	uint8_t			*out,
+	size_t			outlen)
+{
+	size_t			inlen = strlen(in);
+
+	assert(inlen <= outlen);
+	if (!u8_normalize(UNINORM_NFKC, (const uint8_t *)in, inlen,
+			out, &outlen)) {
+		/* Didn't normalize, just return the same buffer. */
+		memcpy(out, in, inlen + 1);
+		return true;
+	}
+	out[outlen] = 0;
+	return outlen == inlen ? memcmp(in, out, inlen) == 0 : false;
+}
+
+/* Complain about Unicode problems. */
+static void
+unicrash_complain(
+	struct unicrash		*uc,
+	const char		*descr,
+	const char		*what,
+	bool			normal,
+	bool			unique,
+	const char		*name,
+	uint8_t			*uniname)
+{
+	char			*bad1 = NULL;
+	char			*bad2 = NULL;
+
+	bad1 = string_escape(name);
+	bad2 = string_escape((char *)uniname);
+
+	if (!normal && should_warn_about_name(uc->ctx))
+		str_info(uc->ctx, descr,
+_("Unicode name \"%s\" in %s should be normalized as \"%s\"."),
+				bad1, what, bad2);
+	if (!unique)
+		str_warn(uc->ctx, descr,
+_("Duplicate normalized Unicode name \"%s\" found in %s."),
+				bad1, what);
+
+	free(bad1);
+	free(bad2);
+}
+
+/*
+ * Try to add a name -> ino entry to the collision detector.  The name
+ * must be normalized according to Unicode NFKC normalization rules to
+ * detect byte-unique names that map to the same sequence of Unicode
+ * code points.
+ *
+ * Returns false only if memory allocation fails.  On success, *unique
+ * is true if there was no previous mapping or (for directories) the
+ * previous mapping points to the same inode, and false if a colliding
+ * record exists (for xattrs, any record with the same name).
+ */
+static bool
+unicrash_add(
+	struct unicrash		*uc,
+	uint8_t			*uniname,
+	xfs_ino_t		ino,
+	bool			*unique)
+{
+	struct name_entry	*ne;
+	struct name_entry	*x;
+	struct name_entry	**nep;
+	size_t			uninamelen = u8_strlen(uniname);
+	size_t			bucket;
+	xfs_dahash_t		hash;
+
+	/* Do we already know about that name? */
+	hash = unicrash_hashname(uniname, uninamelen);
+	bucket = hash % uc->nr_buckets;
+	for (nep = &uc->buckets[bucket], ne = *nep; ne != NULL; ne = x) {
+		if (u8_strcmp(uniname, ne->uniname) == 0) {
+			*unique = uc->compare_ino ? ne->ino == ino : false;
+			return true;
+		}
+		nep = &ne->next;
+		x = ne->next;
+	}
+
+	/* Remember that name. */
+	x = malloc(NAME_ENTRY_SZ(uninamelen));
+	if (!x)
+		return false;
+	x->next = NULL;
+	x->ino = ino;
+	x->uninamelen = uninamelen;
+	memcpy(x->uniname, uniname, uninamelen + 1);
+	*nep = x;
+	*unique = true;
+
+	return true;
+}
+
+/* Check a name for unicode normalization problems or collisions. */
+static bool
+__unicrash_check_name(
+	struct unicrash		*uc,
+	const char		*descr,
+	const char		*namedescr,
+	const char		*name,
+	xfs_ino_t		ino)
+{
+	uint8_t			uniname[(NAME_MAX * 2) + 1];
+	bool			moveon;
+	bool			normal;
+	bool			unique;
+
+	memset(uniname, 0, (NAME_MAX * 2) + 1);
+	normal = unicrash_normalize(name, uniname, NAME_MAX * 2);
+	moveon = unicrash_add(uc, uniname, ino, &unique);
+	if (!moveon)
+		return false;
+
+	if (normal && unique)
+		return true;
+
+	unicrash_complain(uc, descr, namedescr, normal, unique, name,
+			uniname);
+	return true;
+}
+
+/* Check a directory entry for unicode normalization problems or collisions. */
+bool
+unicrash_check_dir_name(
+	struct unicrash		*uc,
+	const char		*descr,
+	struct dirent		*dentry)
+{
+	if (!uc)
+		return true;
+	return __unicrash_check_name(uc, descr, _("directory"),
+			dentry->d_name, dentry->d_ino);
+}
+
+/*
+ * Check an extended attribute name for unicode normalization problems
+ * or collisions.
+ */
+bool
+unicrash_check_xattr_name(
+	struct unicrash		*uc,
+	const char		*descr,
+	const char		*attrname)
+{
+	if (!uc)
+		return true;
+	return __unicrash_check_name(uc, descr, _("extended attribute"),
+			attrname, 0);
+}
diff --git a/scrub/unicrash.h b/scrub/unicrash.h
new file mode 100644
index 0000000..d4561b7
--- /dev/null
+++ b/scrub/unicrash.h
@@ -0,0 +1,49 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_UNICRASH_H_
+#define XFS_SCRUB_UNICRASH_H_
+
+struct unicrash;
+
+/* Unicode name collision detection. */
+#ifdef HAVE_U8NORMALIZE
+
+struct dirent;
+
+void unicrash_setup(void);
+bool unicrash_dir_init(struct unicrash **ucp, struct scrub_ctx *ctx,
+		struct xfs_bstat *bstat);
+bool unicrash_xattr_init(struct unicrash **ucp, struct scrub_ctx *ctx,
+		struct xfs_bstat *bstat);
+void unicrash_free(struct unicrash *uc);
+bool unicrash_check_dir_name(struct unicrash *uc, const char *descr,
+		struct dirent *dirent);
+bool unicrash_check_xattr_name(struct unicrash *uc, const char *descr,
+		const char *attrname);
+#else
+# define unicrash_setup()
+# define unicrash_dir_init(u, c, b)		(true)
+# define unicrash_xattr_init(u, c, b)		(true)
+# define unicrash_free(u)			do {(u) = (u);} while (0)
+# define unicrash_check_dir_name(u, d, n)	(true)
+# define unicrash_check_xattr_name(u, d, n)	(true)
+#endif /* HAVE_U8NORMALIZE */
+
+#endif /* XFS_SCRUB_UNICRASH_H_ */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 64517f4..f7e4e37 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -31,6 +31,7 @@
 #include "path.h"
 #include "xfs_scrub.h"
 #include "common.h"
+#include "unicrash.h"
 
 /*
  * XFS Online Metadata Scrub (and Repair)
@@ -529,6 +530,7 @@ _("Only one of the options -n or -y may be specified.\n"));
 	if (optind != argc - 1)
 		usage();
 
+	unicrash_setup();
 	ctx.mntpoint = strdup(argv[optind]);
 
 	/* Find the mount record for the passed-in argument. */



* [PATCH 19/27] xfs_scrub: create a bitmap data structure
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (17 preceding siblings ...)
  2018-01-06  1:53 ` [PATCH 18/27] xfs_scrub: warn about normalized Unicode name collisions Darrick J. Wong
@ 2018-01-06  1:53 ` Darrick J. Wong
  2018-01-06  1:53 ` [PATCH 20/27] xfs_scrub: create infrastructure to read verify data blocks Darrick J. Wong
                   ` (11 subsequent siblings)
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:53 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create an efficient tree-based bitmap data structure.  We will use this
during the data block scan to record the LBAs of IO errors so that we
can report broken files to userspace.
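
As a rough illustration of the idea (not the code in this patch), the
sketch below keeps set ranges as extents and merges any that overlap or
touch.  It assumes a plain sorted array in place of the AVL tree, and
the names extset, extset_set and extset_test are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names; a sorted array stands in for the patch's AVL tree. */
struct ext {
	uint64_t	start;
	uint64_t	len;
};

struct extset {
	struct ext	*e;	/* sorted by start; never overlapping or touching */
	size_t		nr;
	size_t		cap;
};

/* Mark [start, start + len) as set, merging overlapping or adjacent extents. */
static bool
extset_set(struct extset *s, uint64_t start, uint64_t len)
{
	uint64_t	end = start + len;
	size_t		i = 0;
	size_t		j;

	assert(len > 0);

	/* Skip extents that end strictly before the new range begins. */
	while (i < s->nr && s->e[i].start + s->e[i].len < start)
		i++;

	/* Absorb every extent that overlaps or touches the new range. */
	for (j = i; j < s->nr && s->e[j].start <= end; j++) {
		if (s->e[j].start < start)
			start = s->e[j].start;
		if (s->e[j].start + s->e[j].len > end)
			end = s->e[j].start + s->e[j].len;
	}

	if (j == i) {
		/* Nothing to merge with: open a slot at position i. */
		if (s->nr == s->cap) {
			size_t		ncap = s->cap ? s->cap * 2 : 8;
			struct ext	*ne = realloc(s->e, ncap * sizeof(*ne));

			if (!ne)
				return false;
			s->e = ne;
			s->cap = ncap;
		}
		memmove(&s->e[i + 1], &s->e[i], (s->nr - i) * sizeof(*s->e));
		s->nr++;
	} else if (j - i > 1) {
		/* Several extents collapse into one: close the gap. */
		memmove(&s->e[i + 1], &s->e[j], (s->nr - j) * sizeof(*s->e));
		s->nr -= j - i - 1;
	}

	s->e[i].start = start;
	s->e[i].len = end - start;
	return true;
}

/* Is any part of [start, start + len) set? */
static bool
extset_test(const struct extset *s, uint64_t start, uint64_t len)
{
	size_t		i;

	for (i = 0; i < s->nr; i++)
		if (s->e[i].start < start + len &&
		    s->e[i].start + s->e[i].len > start)
			return true;
	return false;
}
```

Setting [10, 15) and [20, 25) and then [15, 20) leaves a single extent
[10, 25), which is why a sparse set of bad LBAs stays cheap to store.
The array version costs O(n) per insert; a balanced tree keeps that
logarithmic.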

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile |    2 
 scrub/bitmap.c |  410 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/bitmap.h |   38 +++++
 3 files changed, 450 insertions(+)
 create mode 100644 scrub/bitmap.c
 create mode 100644 scrub/bitmap.h


diff --git a/scrub/Makefile b/scrub/Makefile
index 858bc40..a9aaa99 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -16,6 +16,7 @@ INSTALL_SCRUB = install-scrub
 endif	# scrub_prereqs
 
 HFILES = \
+bitmap.h \
 common.h \
 counter.h \
 disk.h \
@@ -28,6 +29,7 @@ unicrash.h \
 xfs_scrub.h
 
 CFILES = \
+bitmap.c \
 common.c \
 counter.c \
 disk.c \
diff --git a/scrub/bitmap.c b/scrub/bitmap.c
new file mode 100644
index 0000000..a88fd0e
--- /dev/null
+++ b/scrub/bitmap.c
@@ -0,0 +1,410 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <assert.h>
+#include <inttypes.h>
+#include <pthread.h>
+#include "platform_defs.h"
+#include "avl64.h"
+#include "list.h"
+#include "bitmap.h"
+
+/*
+ * Space Efficient Bitmap
+ *
+ * Implements a space-efficient bitmap.  We use an AVL tree to manage
+ * extent records that tell us which ranges are set; the bitmap key is
+ * an arbitrary uint64_t.  The usual bitmap operations (set, clear,
+ * test, test and set) are supported, plus we can iterate set ranges.
+ */
+
+#define avl_for_each_range_safe(pos, n, l, first, last) \
+	for (pos = (first), n = pos->avl_nextino, l = (last)->avl_nextino; pos != (l); \
+			pos = n, n = pos ? pos->avl_nextino : NULL)
+
+#define avl_for_each_safe(tree, pos, n) \
+	for (pos = (tree)->avl_firstino, n = pos ? pos->avl_nextino : NULL; \
+			pos != NULL; \
+			pos = n, n = pos ? pos->avl_nextino : NULL)
+
+#define avl_for_each(tree, pos) \
+	for (pos = (tree)->avl_firstino; pos != NULL; pos = pos->avl_nextino)
+
+struct bitmap_node {
+	struct avl64node	btn_node;
+	uint64_t		btn_start;
+	uint64_t		btn_length;
+};
+
+static uint64_t
+extent_start(
+	struct avl64node	*node)
+{
+	struct bitmap_node	*btn;
+
+	btn = container_of(node, struct bitmap_node, btn_node);
+	return btn->btn_start;
+}
+
+static uint64_t
+extent_end(
+	struct avl64node	*node)
+{
+	struct bitmap_node	*btn;
+
+	btn = container_of(node, struct bitmap_node, btn_node);
+	return btn->btn_start + btn->btn_length;
+}
+
+static struct avl64ops bitmap_ops = {
+	extent_start,
+	extent_end,
+};
+
+/* Initialize a bitmap. */
+bool
+bitmap_init(
+	struct bitmap		**bmapp)
+{
+	struct bitmap		*bmap;
+
+	bmap = calloc(1, sizeof(struct bitmap));
+	if (!bmap)
+		return false;
+	bmap->bt_tree = malloc(sizeof(struct avl64tree_desc));
+	if (!bmap->bt_tree) {
+		free(bmap);
+		return false;
+	}
+
+	pthread_mutex_init(&bmap->bt_lock, NULL);
+	avl64_init_tree(bmap->bt_tree, &bitmap_ops);
+	*bmapp = bmap;
+
+	return true;
+}
+
+/* Free a bitmap. */
+void
+bitmap_free(
+	struct bitmap		**bmapp)
+{
+	struct bitmap		*bmap;
+	struct avl64node	*node;
+	struct avl64node	*n;
+	struct bitmap_node	*ext;
+
+	bmap = *bmapp;
+	avl_for_each_safe(bmap->bt_tree, node, n) {
+		ext = container_of(node, struct bitmap_node, btn_node);
+		free(ext);
+	}
+	free(bmap->bt_tree);
+	*bmapp = NULL;
+}
+
+/* Create a new bitmap extent node. */
+static struct bitmap_node *
+bitmap_node_init(
+	uint64_t		start,
+	uint64_t		len)
+{
+	struct bitmap_node	*ext;
+
+	ext = malloc(sizeof(struct bitmap_node));
+	if (!ext)
+		return NULL;
+
+	ext->btn_node.avl_nextino = NULL;
+	ext->btn_start = start;
+	ext->btn_length = len;
+
+	return ext;
+}
+
+/* Set a region of bits (locked). */
+static bool
+__bitmap_set(
+	struct bitmap		*bmap,
+	uint64_t		start,
+	uint64_t		length)
+{
+	struct avl64node	*firstn;
+	struct avl64node	*lastn;
+	struct avl64node	*pos;
+	struct avl64node	*n;
+	struct avl64node	*l;
+	struct bitmap_node	*ext;
+	uint64_t		new_start;
+	uint64_t		new_length;
+	struct avl64node	*node;
+	bool			res = true;
+
+	/* Find any existing nodes adjacent or within that range. */
+	avl64_findranges(bmap->bt_tree, start - 1, start + length + 1,
+			&firstn, &lastn);
+
+	/* Nothing, just insert a new extent. */
+	if (firstn == NULL && lastn == NULL) {
+		ext = bitmap_node_init(start, length);
+		if (!ext)
+			return false;
+
+		node = avl64_insert(bmap->bt_tree, &ext->btn_node);
+		if (node == NULL) {
+			free(ext);
+			errno = EEXIST;
+			return false;
+		}
+
+		return true;
+	}
+
+	assert(firstn != NULL && lastn != NULL);
+	new_start = start;
+	new_length = length;
+
+	avl_for_each_range_safe(pos, n, l, firstn, lastn) {
+		ext = container_of(pos, struct bitmap_node, btn_node);
+
+		/* Bail if the new extent is contained within an old one. */
+		if (ext->btn_start <= start &&
+		    ext->btn_start + ext->btn_length >= start + length)
+			return res;
+
+		/* Check for overlapping and adjacent extents. */
+		if (ext->btn_start + ext->btn_length >= start ||
+		    ext->btn_start <= start + length) {
+			if (ext->btn_start < start) {
+				new_start = ext->btn_start;
+				new_length += ext->btn_length;
+			}
+
+			if (ext->btn_start + ext->btn_length >
+			    new_start + new_length)
+				new_length = ext->btn_start + ext->btn_length -
+						new_start;
+
+			avl64_delete(bmap->bt_tree, pos);
+			free(ext);
+		}
+	}
+
+	ext = bitmap_node_init(new_start, new_length);
+	if (!ext)
+		return false;
+
+	node = avl64_insert(bmap->bt_tree, &ext->btn_node);
+	if (node == NULL) {
+		free(ext);
+		errno = EEXIST;
+		return false;
+	}
+
+	return res;
+}
+
+/* Set a region of bits. */
+bool
+bitmap_set(
+	struct bitmap		*bmap,
+	uint64_t		start,
+	uint64_t		length)
+{
+	bool			res;
+
+	pthread_mutex_lock(&bmap->bt_lock);
+	res = __bitmap_set(bmap, start, length);
+	pthread_mutex_unlock(&bmap->bt_lock);
+
+	return res;
+}
+
+#if 0	/* Unused, provided for completeness. */
+/* Clear a region of bits. */
+bool
+bitmap_clear(
+	struct bitmap		*bmap,
+	uint64_t		start,
+	uint64_t		len)
+{
+	struct avl64node	*firstn;
+	struct avl64node	*lastn;
+	struct avl64node	*pos;
+	struct avl64node	*n;
+	struct avl64node	*l;
+	struct bitmap_node	*ext;
+	uint64_t		new_start;
+	uint64_t		new_length;
+	struct avl64node	*node;
+	int			stat;
+
+	pthread_mutex_lock(&bmap->bt_lock);
+	/* Find any existing nodes over that range. */
+	avl64_findranges(bmap->bt_tree, start, start + len, &firstn, &lastn);
+
+	/* Nothing, we're done. */
+	if (firstn == NULL && lastn == NULL) {
+		pthread_mutex_unlock(&bmap->bt_lock);
+		return true;
+	}
+
+	assert(firstn != NULL && lastn != NULL);
+
+	/* Delete or truncate everything in sight. */
+	avl_for_each_range_safe(pos, n, l, firstn, lastn) {
+		ext = container_of(pos, struct bitmap_node, btn_node);
+
+		stat = 0;
+		if (ext->btn_start < start)
+			stat |= 1;
+		if (ext->btn_start + ext->btn_length > start + len)
+			stat |= 2;
+		switch (stat) {
+		case 0:
+			/* Extent totally within range; delete. */
+			avl64_delete(bmap->bt_tree, pos);
+			free(ext);
+			break;
+		case 1:
+			/* Extent is left-adjacent; truncate. */
+			ext->btn_length = start - ext->btn_start;
+			break;
+		case 2:
+			/* Extent is right-adjacent; move it. */
+			ext->btn_length = ext->btn_start + ext->btn_length -
+					(start + len);
+			ext->btn_start = start + len;
+			break;
+		case 3:
+			/* Extent overlaps both ends. */
+			new_start = start + len;
+			new_length = ext->btn_start + ext->btn_length -
+					new_start;
+			ext->btn_length = start - ext->btn_start;
+
+			ext = bitmap_node_init(new_start, new_length);
+			if (!ext)
+				return false;
+
+			node = avl64_insert(bmap->bt_tree, &ext->btn_node);
+			if (node == NULL) {
+				errno = EEXIST;
+				return false;
+			}
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&bmap->bt_lock);
+	return true;
+}
+#endif
+
+#ifdef DEBUG
+/* Iterate the set regions of this bitmap. */
+bool
+bitmap_iterate(
+	struct bitmap		*bmap,
+	bool			(*fn)(uint64_t, uint64_t, void *),
+	void			*arg)
+{
+	struct avl64node	*node;
+	struct bitmap_node	*ext;
+	bool			moveon = true;
+
+	pthread_mutex_lock(&bmap->bt_lock);
+	avl_for_each(bmap->bt_tree, node) {
+		ext = container_of(node, struct bitmap_node, btn_node);
+		moveon = fn(ext->btn_start, ext->btn_length, arg);
+		if (!moveon)
+			break;
+	}
+	pthread_mutex_unlock(&bmap->bt_lock);
+
+	return moveon;
+}
+#endif
+
+/* Do any bitmap extents overlap the given one?  (locked) */
+static bool
+__bitmap_test(
+	struct bitmap		*bmap,
+	uint64_t		start,
+	uint64_t		len)
+{
+	struct avl64node	*firstn;
+	struct avl64node	*lastn;
+
+	/* Find any existing nodes over that range. */
+	avl64_findranges(bmap->bt_tree, start, start + len, &firstn, &lastn);
+
+	return firstn != NULL && lastn != NULL;
+}
+
+/* Is any part of this range set? */
+bool
+bitmap_test(
+	struct bitmap		*bmap,
+	uint64_t		start,
+	uint64_t		len)
+{
+	bool			res;
+
+	pthread_mutex_lock(&bmap->bt_lock);
+	res = __bitmap_test(bmap, start, len);
+	pthread_mutex_unlock(&bmap->bt_lock);
+
+	return res;
+}
+
+/* Are none of the bits set? */
+bool
+bitmap_empty(
+	struct bitmap		*bmap)
+{
+	return bmap->bt_tree->avl_firstino == NULL;
+}
+
+#ifdef DEBUG
+static bool
+bitmap_dump_fn(
+	uint64_t		startblock,
+	uint64_t		blockcount,
+	void			*arg)
+{
+	printf("%"PRIu64":%"PRIu64"\n", startblock, blockcount);
+	return true;
+}
+
+/* Dump bitmap. */
+void
+bitmap_dump(
+	struct bitmap		*bmap)
+{
+	printf("BITMAP DUMP %p\n", bmap);
+	bitmap_iterate(bmap, bitmap_dump_fn, NULL);
+	printf("BITMAP DUMP DONE\n");
+}
+#endif
diff --git a/scrub/bitmap.h b/scrub/bitmap.h
new file mode 100644
index 0000000..e8dcd4f
--- /dev/null
+++ b/scrub/bitmap.h
@@ -0,0 +1,38 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_BITMAP_H_
+#define XFS_SCRUB_BITMAP_H_
+
+struct bitmap {
+	pthread_mutex_t		bt_lock;
+	struct avl64tree_desc	*bt_tree;
+};
+
+bool bitmap_init(struct bitmap **bmap);
+void bitmap_free(struct bitmap **bmap);
+bool bitmap_set(struct bitmap *bmap, uint64_t start, uint64_t length);
+bool bitmap_iterate(struct bitmap *bmap,
+		bool (*fn)(uint64_t, uint64_t, void *), void *arg);
+bool bitmap_test(struct bitmap *bmap, uint64_t start,
+		uint64_t len);
+bool bitmap_empty(struct bitmap *bmap);
+void bitmap_dump(struct bitmap *bmap);
+
+#endif /* XFS_SCRUB_BITMAP_H_ */



* [PATCH 20/27] xfs_scrub: create infrastructure to read verify data blocks
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (18 preceding siblings ...)
  2018-01-06  1:53 ` [PATCH 19/27] xfs_scrub: create a bitmap data structure Darrick J. Wong
@ 2018-01-06  1:53 ` Darrick J. Wong
  2018-01-06  1:53 ` [PATCH 21/27] xfs_scrub: scrub file " Darrick J. Wong
                   ` (10 subsequent siblings)
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:53 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Manage the scheduling, issuance, and reporting of data block
verification reads.  This enables us to combine adjacent (or nearly
adjacent) read requests, and to take advantage of high-IOPS devices by
issuing IO from multiple threads.
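
The combining rule can be sketched in isolation as below.  This is a
minimal illustration under assumed names (pending_io, pending_io_merge
and BATCH_LOCALITY are hypothetical), not the interface added by this
patch: a new request is folded into the stashed one when it starts
inside it or within the tolerated gap on either side.

```c
#include <stdbool.h>
#include <stdint.h>

#define BATCH_LOCALITY	65536ULL	/* tolerated gap between requests, bytes */

/* Hypothetical stand-in for a stashed read-verify request. */
struct pending_io {
	uint64_t	start;		/* bytes */
	uint64_t	length;		/* bytes; 0 means nothing is stashed */
};

/*
 * Try to fold [start, start + length) into the stashed request.
 * Returns true if merged; false means the caller must queue the
 * stashed request and then stash the new one.
 */
static bool
pending_io_merge(struct pending_io *p, uint64_t start, uint64_t length)
{
	uint64_t	req_end = start + length;
	uint64_t	p_end = p->start + p->length;
	bool		after;
	bool		before;

	if (p->length == 0)
		return false;

	/* New request begins inside the stash or shortly after it. */
	after = start >= p->start && start <= p_end + BATCH_LOCALITY;
	/* Stash begins inside the new request or shortly after it. */
	before = p->start >= start && p->start <= req_end + BATCH_LOCALITY;
	if (!after && !before)
		return false;

	if (start < p->start)
		p->start = start;
	p->length = (req_end > p_end ? req_end : p_end) - p->start;
	return true;
}
```

Note that a merged request also covers the gap between the two
extents, so a few unneeded blocks get read in exchange for issuing
fewer IOs.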

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile      |    2 
 scrub/read_verify.c |  268 +++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/read_verify.h |   50 ++++++++++
 scrub/xfs_scrub.h   |    3 +
 4 files changed, 323 insertions(+)
 create mode 100644 scrub/read_verify.c
 create mode 100644 scrub/read_verify.h


diff --git a/scrub/Makefile b/scrub/Makefile
index a9aaa99..3b3eb95 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -23,6 +23,7 @@ disk.h \
 filemap.h \
 fscounters.h \
 inodes.h \
+read_verify.h \
 scrub.h \
 spacemap.h \
 unicrash.h \
@@ -40,6 +41,7 @@ phase1.c \
 phase2.c \
 phase3.c \
 phase5.c \
+read_verify.c \
 scrub.c \
 spacemap.c \
 xfs_scrub.c
diff --git a/scrub/read_verify.c b/scrub/read_verify.c
new file mode 100644
index 0000000..244626d
--- /dev/null
+++ b/scrub/read_verify.c
@@ -0,0 +1,268 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <sys/statvfs.h>
+#include "workqueue.h"
+#include "path.h"
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "counter.h"
+#include "disk.h"
+#include "read_verify.h"
+
+/*
+ * Read Verify Pool
+ *
+ * Manages the data block read verification phase.  The caller schedules
+ * verification requests, which are then scheduled to be run by a thread
+ * pool worker.  Adjacent (or nearly adjacent) requests can be combined
+ * to reduce overhead when free space fragmentation is high.  The thread
+ * pool takes care of issuing multiple IOs to the device, if possible.
+ */
+
+/*
+ * Perform all IO in 32M chunks.  This cannot exceed 65536 sectors
+ * because that's the biggest SCSI VERIFY(16) we dare to send.
+ */
+#define RVP_IO_MAX_SIZE		(33554432)
+#define RVP_IO_MAX_SECTORS	(RVP_IO_MAX_SIZE >> BBSHIFT)
+
+/* Tolerate 64k holes in adjacent read verify requests. */
+#define RVP_IO_BATCH_LOCALITY	(65536)
+
+struct read_verify_pool {
+	struct workqueue	wq;		/* thread pool */
+	struct scrub_ctx	*ctx;		/* scrub context */
+	void			*readbuf;	/* read buffer */
+	struct ptcounter	*verified_bytes;
+	read_verify_ioerr_fn_t	ioerr_fn;	/* io error callback */
+	size_t			miniosz;	/* minimum io size, bytes */
+};
+
+/* Create a thread pool to run read verifiers. */
+struct read_verify_pool *
+read_verify_pool_init(
+	struct scrub_ctx		*ctx,
+	size_t				miniosz,
+	read_verify_ioerr_fn_t		ioerr_fn,
+	unsigned int			nproc)
+{
+	struct read_verify_pool		*rvp;
+	bool				ret;
+	int				error;
+
+	rvp = calloc(1, sizeof(struct read_verify_pool));
+	if (!rvp)
+		return NULL;
+
+	error = posix_memalign((void **)&rvp->readbuf, page_size,
+			RVP_IO_MAX_SIZE);
+	if (error || !rvp->readbuf)
+		goto out_free;
+	rvp->verified_bytes = ptcounter_init(nproc);
+	if (!rvp->verified_bytes)
+		goto out_buf;
+	rvp->miniosz = miniosz;
+	rvp->ctx = ctx;
+	rvp->ioerr_fn = ioerr_fn;
+	/* Run in the main thread if we only want one thread. */
+	if (nproc == 1)
+		nproc = 0;
+	ret = workqueue_create(&rvp->wq, (struct xfs_mount *)rvp, nproc);
+	if (ret)
+		goto out_counter;
+	return rvp;
+
+out_counter:
+	ptcounter_free(rvp->verified_bytes);
+out_buf:
+	free(rvp->readbuf);
+out_free:
+	free(rvp);
+	return NULL;
+}
+
+/* Finish up any read verification work. */
+void
+read_verify_pool_flush(
+	struct read_verify_pool		*rvp)
+{
+	workqueue_destroy(&rvp->wq);
+}
+
+/* Tear down a read verify pool that has already been flushed. */
+void
+read_verify_pool_destroy(
+	struct read_verify_pool		*rvp)
+{
+	ptcounter_free(rvp->verified_bytes);
+	free(rvp->readbuf);
+	free(rvp);
+}
+
+/*
+ * Issue a read-verify IO in big batches.
+ */
+static void
+read_verify(
+	struct workqueue		*wq,
+	xfs_agnumber_t			agno,
+	void				*arg)
+{
+	struct read_verify		*rv = arg;
+	struct read_verify_pool		*rvp;
+	unsigned long long		verified = 0;
+	ssize_t				sz;
+	ssize_t				len;
+
+	rvp = (struct read_verify_pool *)wq->wq_ctx;
+	while (rv->io_length > 0) {
+		len = min(rv->io_length, RVP_IO_MAX_SIZE);
+		dbg_printf("diskverify %d %"PRIu64" %zd\n", rv->io_disk->d_fd,
+				rv->io_start, len);
+		sz = disk_read_verify(rv->io_disk, rvp->readbuf,
+				rv->io_start, len);
+		if (sz < 0) {
+			dbg_printf("IOERR %d %"PRIu64" %zd\n",
+					rv->io_disk->d_fd,
+					rv->io_start, len);
+			/* IO error, so try the next logical block. */
+			len = rvp->miniosz;
+			rvp->ioerr_fn(rvp->ctx, rv->io_disk, rv->io_start, len,
+					errno, rv->io_end_arg);
+		}
+
+		verified += len;
+		rv->io_start += len;
+		rv->io_length -= len;
+	}
+
+	free(rv);
+	ptcounter_add(rvp->verified_bytes, verified);
+}
+
+/* Queue a read verify request. */
+static bool
+read_verify_queue(
+	struct read_verify_pool		*rvp,
+	struct read_verify		*rv)
+{
+	struct read_verify		*tmp;
+	bool				ret;
+
+	dbg_printf("verify fd %d start %"PRIu64" len %"PRIu64"\n",
+			rv->io_disk->d_fd, rv->io_start, rv->io_length);
+
+	tmp = malloc(sizeof(struct read_verify));
+	if (!tmp) {
+		rvp->ioerr_fn(rvp->ctx, rv->io_disk, rv->io_start,
+				rv->io_length, errno, rv->io_end_arg);
+		return true;
+	}
+	memcpy(tmp, rv, sizeof(*tmp));
+
+	ret = workqueue_add(&rvp->wq, read_verify, 0, tmp);
+	if (ret) {
+		str_error(rvp->ctx, rvp->ctx->mntpoint,
+_("Could not queue read-verify work."));
+		free(tmp);
+		return false;
+	}
+	rv->io_length = 0;
+	return true;
+}
+
+/*
+ * Issue an IO request.  We'll batch subsequent requests if they're
+ * within 64k of each other.
+ */
+bool
+read_verify_schedule_io(
+	struct read_verify_pool		*rvp,
+	struct read_verify		*rv,
+	struct disk			*disk,
+	uint64_t			start,
+	uint64_t			length,
+	void				*end_arg)
+{
+	uint64_t			req_end;
+	uint64_t			rv_end;
+
+	assert(rvp->readbuf);
+	req_end = start + length;
+	rv_end = rv->io_start + rv->io_length;
+
+	/*
+	 * If we have a stashed IO, we haven't changed fds, the error
+	 * reporting is the same, and the two extents are close,
+	 * we can combine them.
+	 */
+	if (rv->io_length > 0 && disk == rv->io_disk &&
+	    end_arg == rv->io_end_arg &&
+	    ((start >= rv->io_start && start <= rv_end + RVP_IO_BATCH_LOCALITY) ||
+	     (rv->io_start >= start &&
+	      rv->io_start <= req_end + RVP_IO_BATCH_LOCALITY))) {
+		rv->io_start = min(rv->io_start, start);
+		rv->io_length = max(req_end, rv_end) - rv->io_start;
+	} else  {
+		/* Otherwise, issue the stashed IO (if there is one) */
+		if (rv->io_length > 0 && !read_verify_queue(rvp, rv))
+			return false;
+
+		/* Stash the new IO. */
+		rv->io_disk = disk;
+		rv->io_start = start;
+		rv->io_length = length;
+		rv->io_end_arg = end_arg;
+	}
+
+	return true;
+}
+
+/* Force any stashed IOs into the verifier. */
+bool
+read_verify_force_io(
+	struct read_verify_pool		*rvp,
+	struct read_verify		*rv)
+{
+	bool				moveon;
+
+	assert(rvp->readbuf);
+	if (rv->io_length == 0)
+		return true;
+
+	moveon = read_verify_queue(rvp, rv);
+	if (moveon)
+		rv->io_length = 0;
+	return moveon;
+}
+
+/* How many bytes has this process verified? */
+uint64_t
+read_verify_bytes(
+	struct read_verify_pool		*rvp)
+{
+	return ptcounter_value(rvp->verified_bytes);
+}
diff --git a/scrub/read_verify.h b/scrub/read_verify.h
new file mode 100644
index 0000000..cea7a08
--- /dev/null
+++ b/scrub/read_verify.h
@@ -0,0 +1,50 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_READ_VERIFY_H_
+#define XFS_SCRUB_READ_VERIFY_H_
+
+struct scrub_ctx;
+struct read_verify_pool;
+
+/* Function called when an IO error happens. */
+typedef void (*read_verify_ioerr_fn_t)(struct scrub_ctx *ctx,
+		struct disk *disk, uint64_t start, uint64_t length,
+		int error, void *arg);
+
+struct read_verify_pool *read_verify_pool_init(struct scrub_ctx *ctx,
+		size_t miniosz, read_verify_ioerr_fn_t ioerr_fn,
+		unsigned int nproc);
+void read_verify_pool_flush(struct read_verify_pool *rvp);
+void read_verify_pool_destroy(struct read_verify_pool *rvp);
+
+struct read_verify {
+	void			*io_end_arg;
+	struct disk		*io_disk;
+	uint64_t		io_start;	/* bytes */
+	uint64_t		io_length;	/* bytes */
+};
+
+bool read_verify_schedule_io(struct read_verify_pool *rvp,
+		struct read_verify *rv, struct disk *disk, uint64_t start,
+		uint64_t length, void *end_arg);
+bool read_verify_force_io(struct read_verify_pool *rvp, struct read_verify *rv);
+uint64_t read_verify_bytes(struct read_verify_pool *rvp);
+
+#endif /* XFS_SCRUB_READ_VERIFY_H_ */
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 66003e4..31a927c 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -83,6 +83,9 @@ struct scrub_ctx {
 	void			*fshandle;
 	size_t			fshandle_len;
 
+	/* Data block read verification buffer */
+	void			*readbuf;
+
 	/* Mutable scrub state; use lock. */
 	pthread_mutex_t		lock;
 	unsigned long long	max_errors;



* [PATCH 21/27] xfs_scrub: scrub file data blocks
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (19 preceding siblings ...)
  2018-01-06  1:53 ` [PATCH 20/27] xfs_scrub: create infrastructure to read verify data blocks Darrick J. Wong
@ 2018-01-06  1:53 ` Darrick J. Wong
  2018-01-11 23:25   ` Eric Sandeen
  2018-01-06  1:53 ` [PATCH 22/27] xfs_scrub: optionally use SCSI READ VERIFY commands to scrub data blocks on disk Darrick J. Wong
                   ` (9 subsequent siblings)
  30 siblings, 1 reply; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:53 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Read all data blocks from the disk, hoping to catch IO errors.
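
For readers who want the shape of the read loop without the rest of
the series, here is a minimal sketch.  It assumes plain pread(2) on an
open file descriptor rather than the disk abstraction used by these
patches, and the names verify_stats and verify_range are hypothetical.
A failed read is recorded and only the smallest IO unit is skipped, so
the rest of the range still gets checked.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result counters for one verification pass. */
struct verify_stats {
	uint64_t	verified;	/* bytes read back successfully */
	uint64_t	bad_ranges;	/* number of miniosz units that failed */
};

/*
 * Read [start, start + length) from fd in chunks of at most bufsz
 * bytes.  On a failed read, count one bad unit and step forward by
 * miniosz bytes instead of abandoning the whole range.
 */
static void
verify_range(int fd, void *buf, size_t bufsz, size_t miniosz,
	     uint64_t start, uint64_t length, struct verify_stats *st)
{
	while (length > 0) {
		size_t	len = length < bufsz ? (size_t)length : bufsz;
		ssize_t	sz = pread(fd, buf, len, (off_t)start);

		if (sz <= 0) {
			/* IO error or unexpected EOF: note it and move on. */
			len = length < miniosz ? (size_t)length : miniosz;
			st->bad_ranges++;
		} else {
			/* Short reads are fine; advance by what we got. */
			len = (size_t)sz;
			st->verified += len;
		}
		start += len;
		length -= len;
	}
}
```

A caller would pass a buffer sized to the largest IO it wants to issue
and a miniosz matching the device's logical block size, then inspect
bad_ranges once the loop finishes.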

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 configure.ac          |    2 
 include/builddefs.in  |    2 
 m4/package_libcdev.m4 |   28 +++
 scrub/Makefile        |    7 -
 scrub/phase6.c        |  516 +++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/vfs.c           |  221 +++++++++++++++++++++
 scrub/vfs.h           |   31 +++
 scrub/xfs_scrub.c     |    4 
 scrub/xfs_scrub.h     |    2 
 9 files changed, 811 insertions(+), 2 deletions(-)
 create mode 100644 scrub/phase6.c
 create mode 100644 scrub/vfs.c
 create mode 100644 scrub/vfs.h


diff --git a/configure.ac b/configure.ac
index fc44bd5..8eda010 100644
--- a/configure.ac
+++ b/configure.ac
@@ -170,6 +170,8 @@ AC_PACKAGE_WANT_ATTRIBUTES_H
 AC_HAVE_LIBATTR
 AC_PACKAGE_WANT_UNINORM_H
 AC_HAVE_U8NORMALIZE
+AC_HAVE_OPENAT
+AC_HAVE_FSTATAT
 
 if test "$enable_blkid" = yes; then
 AC_HAVE_BLKID_TOPO
diff --git a/include/builddefs.in b/include/builddefs.in
index 1c264a0..2f8d33f 100644
--- a/include/builddefs.in
+++ b/include/builddefs.in
@@ -123,6 +123,8 @@ HAVE_DEVMAPPER = @have_devmapper@
 HAVE_MALLINFO = @have_mallinfo@
 HAVE_LIBATTR = @have_libattr@
 HAVE_U8NORMALIZE = @have_u8normalize@
+HAVE_OPENAT = @have_openat@
+HAVE_FSTATAT = @have_fstatat@
 
 GCCFLAGS = -funsigned-char -fno-strict-aliasing -Wall
 #	   -Wbitwise -Wno-transparent-union -Wno-old-initializer -Wno-decl
diff --git a/m4/package_libcdev.m4 b/m4/package_libcdev.m4
index d3955f0..e0abc12 100644
--- a/m4/package_libcdev.m4
+++ b/m4/package_libcdev.m4
@@ -362,3 +362,31 @@ AC_DEFUN([AC_HAVE_MALLINFO],
        AC_MSG_RESULT(no))
     AC_SUBST(have_mallinfo)
   ])
+
+#
+# Check if we have an openat call
+#
+AC_DEFUN([AC_HAVE_OPENAT],
+  [ AC_CHECK_DECL([openat],
+       have_openat=yes,
+       [],
+       [#include <sys/types.h>
+        #include <sys/stat.h>
+        #include <fcntl.h>]
+       )
+    AC_SUBST(have_openat)
+  ])
+
+#
+# Check if we have a fstatat call
+#
+AC_DEFUN([AC_HAVE_FSTATAT],
+  [ AC_CHECK_DECL([fstatat],
+       have_fstatat=yes,
+       [],
+       [#define _GNU_SOURCE
+       #include <sys/types.h>
+       #include <sys/stat.h>
+       #include <unistd.h>])
+    AC_SUBST(have_fstatat)
+  ])
diff --git a/scrub/Makefile b/scrub/Makefile
index 3b3eb95..4b70efa 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -8,9 +8,9 @@ include $(TOPDIR)/include/builddefs
 # On linux we get fsmap from the system or define it ourselves
 # so include this based on platform type.  If this reverts to only
 # the autoconf check w/o local definition, change to testing HAVE_GETFSMAP
-SCRUB_PREREQS=$(PKG_PLATFORM)
+SCRUB_PREREQS=$(PKG_PLATFORM)$(HAVE_OPENAT)$(HAVE_FSTATAT)
 
-ifeq ($(SCRUB_PREREQS),linux)
+ifeq ($(SCRUB_PREREQS),linuxyesyes)
 LTCOMMAND = xfs_scrub
 INSTALL_SCRUB = install-scrub
 endif	# scrub_prereqs
@@ -27,6 +27,7 @@ read_verify.h \
 scrub.h \
 spacemap.h \
 unicrash.h \
+vfs.h \
 xfs_scrub.h
 
 CFILES = \
@@ -41,9 +42,11 @@ phase1.c \
 phase2.c \
 phase3.c \
 phase5.c \
+phase6.c \
 read_verify.c \
 scrub.c \
 spacemap.c \
+vfs.c \
 xfs_scrub.c
 
 LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD) $(LIBUNISTRING)
diff --git a/scrub/phase6.c b/scrub/phase6.c
new file mode 100644
index 0000000..5ecb8dc
--- /dev/null
+++ b/scrub/phase6.c
@@ -0,0 +1,516 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <dirent.h>
+#include <sys/statvfs.h>
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "handle.h"
+#include "path.h"
+#include "ptvar.h"
+#include "workqueue.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "bitmap.h"
+#include "disk.h"
+#include "filemap.h"
+#include "inodes.h"
+#include "read_verify.h"
+#include "spacemap.h"
+#include "vfs.h"
+
+/*
+ * Phase 6: Verify data file integrity.
+ *
+ * Identify potential data block extents with GETFSMAP, then feed those
+ * extents to the read-verify pool to get the verify commands batched,
+ * issued, and (if there are problems) reported back to us.  If there
+ * are errors, we'll record the bad regions and (if available) use rmap
+ * to tell us if metadata are now corrupt.  Otherwise, we'll scan the
+ * whole directory tree looking for files that overlap the bad regions
+ * and report the paths of the now corrupt files.
+ */
+
+/* Find the disk for a given device identifier. */
+static struct disk *
+xfs_dev_to_disk(
+	struct scrub_ctx	*ctx,
+	dev_t			dev)
+{
+	if (dev == ctx->fsinfo.fs_datadev)
+		return ctx->datadev;
+	else if (dev == ctx->fsinfo.fs_logdev)
+		return ctx->logdev;
+	else if (dev == ctx->fsinfo.fs_rtdev)
+		return ctx->rtdev;
+	abort();
+}
+
+/* Find the device major/minor for a given disk. */
+static dev_t
+xfs_disk_to_dev(
+	struct scrub_ctx	*ctx,
+	struct disk		*disk)
+{
+	if (disk == ctx->datadev)
+		return ctx->fsinfo.fs_datadev;
+	else if (disk == ctx->logdev)
+		return ctx->fsinfo.fs_logdev;
+	else if (disk == ctx->rtdev)
+		return ctx->fsinfo.fs_rtdev;
+	abort();
+}
+
+struct owner_decode {
+	uint64_t		owner;
+	const char		*descr;
+};
+
+static const struct owner_decode special_owners[] = {
+	{XFS_FMR_OWN_FREE,	"free space"},
+	{XFS_FMR_OWN_UNKNOWN,	"unknown owner"},
+	{XFS_FMR_OWN_FS,	"static FS metadata"},
+	{XFS_FMR_OWN_LOG,	"journalling log"},
+	{XFS_FMR_OWN_AG,	"per-AG metadata"},
+	{XFS_FMR_OWN_INOBT,	"inode btree blocks"},
+	{XFS_FMR_OWN_INODES,	"inodes"},
+	{XFS_FMR_OWN_REFC,	"refcount btree"},
+	{XFS_FMR_OWN_COW,	"CoW staging"},
+	{XFS_FMR_OWN_DEFECTIVE,	"bad blocks"},
+	{0, NULL},
+};
+
+/* Decode a special owner. */
+static const char *
+xfs_decode_special_owner(
+	uint64_t			owner)
+{
+	const struct owner_decode	*od = special_owners;
+
+	while (od->descr) {
+		if (od->owner == owner)
+			return od->descr;
+		od++;
+	}
+
+	return NULL;
+}
+
+/* Routines to translate bad physical extents into file paths and offsets. */
+
+struct xfs_verify_error_info {
+	struct bitmap			*d_bad;		/* bytes */
+	struct bitmap			*r_bad;		/* bytes */
+};
+
+/* Report if this extent overlaps a bad region. */
+static bool
+xfs_report_verify_inode_bmap(
+	struct scrub_ctx		*ctx,
+	const char			*descr,
+	int				fd,
+	int				whichfork,
+	struct fsxattr			*fsx,
+	struct xfs_bmap			*bmap,
+	void				*arg)
+{
+	struct xfs_verify_error_info	*vei = arg;
+	struct bitmap			*bmp;
+
+	/* Only report errors for real extents. */
+	if (bmap->bm_flags & (BMV_OF_PREALLOC | BMV_OF_DELALLOC))
+		return true;
+
+	if (fsx->fsx_xflags & FS_XFLAG_REALTIME)
+		bmp = vei->r_bad;
+	else
+		bmp = vei->d_bad;
+
+	if (!bitmap_test(bmp, bmap->bm_physical, bmap->bm_length))
+		return true;
+
+	str_error(ctx, descr,
+_("offset %llu failed read verification."), bmap->bm_offset);
+	return true;
+}
+
+/* Iterate the extent mappings of a file to report errors. */
+static bool
+xfs_report_verify_fd(
+	struct scrub_ctx		*ctx,
+	const char			*descr,
+	int				fd,
+	void				*arg)
+{
+	struct xfs_bmap			key = {0};
+	bool				moveon;
+
+	/* data fork */
+	moveon = xfs_iterate_filemaps(ctx, descr, fd, XFS_DATA_FORK, &key,
+			xfs_report_verify_inode_bmap, arg);
+	if (!moveon)
+		return false;
+
+	/* attr fork */
+	moveon = xfs_iterate_filemaps(ctx, descr, fd, XFS_ATTR_FORK, &key,
+			xfs_report_verify_inode_bmap, arg);
+	if (!moveon)
+		return false;
+	return true;
+}
+
+/* Report read verify errors in unlinked (but still open) files. */
+static int
+xfs_report_verify_inode(
+	struct scrub_ctx		*ctx,
+	struct xfs_handle		*handle,
+	struct xfs_bstat		*bstat,
+	void				*arg)
+{
+	char				descr[DESCR_BUFSZ];
+	char				buf[DESCR_BUFSZ];
+	bool				moveon;
+	int				fd;
+	int				error;
+
+	snprintf(descr, DESCR_BUFSZ, _("inode %"PRIu64" (unlinked)"),
+			(uint64_t)bstat->bs_ino);
+
+	/* Ignore linked files and things we can't open. */
+	if (bstat->bs_nlink != 0)
+		return 0;
+	if (!S_ISREG(bstat->bs_mode) && !S_ISDIR(bstat->bs_mode))
+		return 0;
+
+	/* Try to open the inode. */
+	fd = xfs_open_handle(handle);
+	if (fd < 0) {
+		error = errno;
+		if (error == ESTALE)
+			return error;
+
+		str_warn(ctx, descr, "%s", strerror_r(error, buf, DESCR_BUFSZ));
+		return error;
+	}
+
+	/* Go find the badness. */
+	moveon = xfs_report_verify_fd(ctx, descr, fd, arg);
+	close(fd);
+
+	return moveon ? 0 : XFS_ITERATE_INODES_ABORT;
+}
+
+/* Scan a directory for matches in the read verify error list. */
+static bool
+xfs_report_verify_dir(
+	struct scrub_ctx	*ctx,
+	const char		*path,
+	int			dir_fd,
+	void			*arg)
+{
+	return xfs_report_verify_fd(ctx, path, dir_fd, arg);
+}
+
+/*
+ * Scan the inode associated with a directory entry for matches with
+ * the read verify error list.
+ */
+static bool
+xfs_report_verify_dirent(
+	struct scrub_ctx	*ctx,
+	const char		*path,
+	int			dir_fd,
+	struct dirent		*dirent,
+	struct stat		*sb,
+	void			*arg)
+{
+	bool			moveon;
+	int			fd;
+
+	/* Ignore things we can't open. */
+	if (!S_ISREG(sb->st_mode) && !S_ISDIR(sb->st_mode))
+		return true;
+
+	/* Ignore . and .. */
+	if (!strcmp(".", dirent->d_name) || !strcmp("..", dirent->d_name))
+		return true;
+
+	/*
+	 * Open the file named by this dirent relative to dir_fd so that
+	 * we can scan its extent mappings for bad blocks.  The directory
+	 * itself is handled separately by xfs_report_verify_dir.
+	 */
+	fd = openat(dir_fd, dirent->d_name,
+			O_RDONLY | O_NOATIME | O_NOFOLLOW | O_NOCTTY);
+	if (fd < 0)
+		return true;
+
+	/* Go find the badness. */
+	moveon = xfs_report_verify_fd(ctx, path, fd, arg);
+	if (moveon)
+		goto out;
+
+out:
+	close(fd);
+
+	return moveon;
+}
+
+/* Given bad extent lists for the data & rtdev, find bad files. */
+static bool
+xfs_report_verify_errors(
+	struct scrub_ctx		*ctx,
+	struct bitmap			*d_bad,
+	struct bitmap			*r_bad)
+{
+	struct xfs_verify_error_info	vei;
+	bool				moveon;
+
+	vei.d_bad = d_bad;
+	vei.r_bad = r_bad;
+
+	/* Scan the directory tree to get file paths. */
+	moveon = scan_fs_tree(ctx, xfs_report_verify_dir,
+			xfs_report_verify_dirent, &vei);
+	if (!moveon)
+		return false;
+
+	/* Scan for unlinked files. */
+	return xfs_scan_all_inodes(ctx, xfs_report_verify_inode, &vei);
+}
+
+/* Verify disk blocks with GETFSMAP */
+
+struct xfs_verify_extent {
+	struct read_verify_pool	*readverify;
+	struct ptvar		*rvstate;
+	struct bitmap		*d_bad;		/* bytes */
+	struct bitmap		*r_bad;		/* bytes */
+};
+
+/* Report an IO error resulting from read-verify based off getfsmap. */
+static bool
+xfs_check_rmap_error_report(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	struct fsmap		*map,
+	void			*arg)
+{
+	const char		*type;
+	char			buf[32];
+	uint64_t		err_physical = *(uint64_t *)arg;
+	uint64_t		err_off;
+
+	if (err_physical > map->fmr_physical)
+		err_off = err_physical - map->fmr_physical;
+	else
+		err_off = 0;
+
+	snprintf(buf, 32, _("disk offset %"PRIu64),
+			(uint64_t)BTOBB(map->fmr_physical + err_off));
+
+	if (map->fmr_flags & FMR_OF_SPECIAL_OWNER) {
+		type = xfs_decode_special_owner(map->fmr_owner);
+		str_error(ctx, buf,
+_("%s failed read verification."),
+				type);
+	}
+
+	/*
+	 * XXX: If we had a getparent() call we could report IO errors
+	 * efficiently.  Until then, we'll have to scan the dir tree
+	 * to find the bad file's pathname.
+	 */
+
+	return true;
+}
+
+/*
+ * Remember a read error for later, and see if rmap will tell us about the
+ * owner ahead of time.
+ */
+void
+xfs_check_rmap_ioerr(
+	struct scrub_ctx		*ctx,
+	struct disk			*disk,
+	uint64_t			start,
+	uint64_t			length,
+	int				error,
+	void				*arg)
+{
+	struct fsmap			keys[2];
+	char				descr[DESCR_BUFSZ];
+	struct xfs_verify_extent	*ve = arg;
+	struct bitmap			*tree;
+	dev_t				dev;
+	bool				moveon;
+
+	dev = xfs_disk_to_dev(ctx, disk);
+
+	/*
+	 * If we don't have parent pointers, save the bad extent for
+	 * later rescanning.
+	 */
+	if (dev == ctx->fsinfo.fs_datadev)
+		tree = ve->d_bad;
+	else if (dev == ctx->fsinfo.fs_rtdev)
+		tree = ve->r_bad;
+	else
+		tree = NULL;
+	if (tree) {
+		moveon = bitmap_set(tree, start, length);
+		if (!moveon)
+			str_errno(ctx, ctx->mntpoint);
+	}
+
+	snprintf(descr, DESCR_BUFSZ, _("dev %d:%d ioerr @ %"PRIu64":%"PRIu64" "),
+			major(dev), minor(dev), start, length);
+
+	/* Go figure out which blocks are bad from the fsmap. */
+	memset(keys, 0, sizeof(struct fsmap) * 2);
+	keys->fmr_device = dev;
+	keys->fmr_physical = start;
+	(keys + 1)->fmr_device = dev;
+	(keys + 1)->fmr_physical = start + length - 1;
+	(keys + 1)->fmr_owner = ULLONG_MAX;
+	(keys + 1)->fmr_offset = ULLONG_MAX;
+	(keys + 1)->fmr_flags = UINT_MAX;
+	xfs_iterate_fsmap(ctx, descr, keys, xfs_check_rmap_error_report,
+			&start);
+}
+
+/* Schedule a read-verify of a (data block) extent. */
+static bool
+xfs_check_rmap(
+	struct scrub_ctx		*ctx,
+	const char			*descr,
+	struct fsmap			*map,
+	void				*arg)
+{
+	struct xfs_verify_extent	*ve = arg;
+	struct disk			*disk;
+
+	dbg_printf("rmap dev %d:%d phys %"PRIu64" owner %"PRId64
+			" offset %"PRIu64" len %"PRIu64" flags 0x%x\n",
+			major(map->fmr_device), minor(map->fmr_device),
+			(uint64_t)map->fmr_physical, (int64_t)map->fmr_owner,
+			(uint64_t)map->fmr_offset, (uint64_t)map->fmr_length,
+			map->fmr_flags);
+
+	/* "Unknown" extents should be verified; they could be data. */
+	if ((map->fmr_flags & FMR_OF_SPECIAL_OWNER) &&
+			map->fmr_owner == XFS_FMR_OWN_UNKNOWN)
+		map->fmr_flags &= ~FMR_OF_SPECIAL_OWNER;
+
+	/*
+	 * We only care about read-verifying data extents that have been
+	 * written to disk.  This means we can skip "special" owners
+	 * (metadata), xattr blocks, unwritten extents, and extent maps.
+	 * These should all get checked elsewhere in the scrubber.
+	 */
+	if (map->fmr_flags & (FMR_OF_PREALLOC | FMR_OF_ATTR_FORK |
+			      FMR_OF_EXTENT_MAP | FMR_OF_SPECIAL_OWNER))
+		goto out;
+
+	/* XXX: Filter out directory data blocks. */
+
+	/* Schedule the read verify command for (eventual) running. */
+	disk = xfs_dev_to_disk(ctx, map->fmr_device);
+
+	read_verify_schedule_io(ve->readverify, ptvar_get(ve->rvstate), disk,
+			map->fmr_physical, map->fmr_length, ve);
+
+out:
+	/* Is this the last extent?  Fire off the read. */
+	if (map->fmr_flags & FMR_OF_LAST)
+		read_verify_force_io(ve->readverify, ptvar_get(ve->rvstate));
+
+	return true;
+}
+
+/*
+ * Read verify all the file data blocks in a filesystem.  Since XFS doesn't
+ * do data checksums, we trust that the underlying storage will pass back
+ * an IO error if it can't retrieve whatever we previously stored there.
+ * If we hit an IO error, we'll record the bad blocks in a bitmap and then
+ * scan the extent maps of the entire fs tree (and the unlinked inodes)
+ * to figure out which files are now broken.
+ */
+bool
+xfs_scan_blocks(
+	struct scrub_ctx		*ctx)
+{
+	struct xfs_verify_extent	ve;
+	bool				moveon;
+
+	ve.rvstate = ptvar_init(scrub_nproc(ctx), sizeof(struct read_verify));
+	if (!ve.rvstate) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	moveon = bitmap_init(&ve.d_bad);
+	if (!moveon) {
+		str_errno(ctx, ctx->mntpoint);
+		goto out_ve;
+	}
+
+	moveon = bitmap_init(&ve.r_bad);
+	if (!moveon) {
+		str_errno(ctx, ctx->mntpoint);
+		goto out_dbad;
+	}
+
+	ve.readverify = read_verify_pool_init(ctx, ctx->geo.blocksize,
+			xfs_check_rmap_ioerr, disk_heads(ctx->datadev));
+	if (!ve.readverify) {
+		moveon = false;
+		str_error(ctx, ctx->mntpoint,
+_("Could not create media verifier."));
+		goto out_rbad;
+	}
+	moveon = xfs_scan_all_spacemaps(ctx, xfs_check_rmap, &ve);
+	if (!moveon)
+		goto out_pool;
+	read_verify_pool_flush(ve.readverify);
+	ctx->bytes_checked += read_verify_bytes(ve.readverify);
+	read_verify_pool_destroy(ve.readverify);
+
+	/* Scan the whole dir tree to see what matches the bad extents. */
+	if (!bitmap_empty(ve.d_bad) || !bitmap_empty(ve.r_bad))
+		moveon = xfs_report_verify_errors(ctx, ve.d_bad, ve.r_bad);
+
+	bitmap_free(&ve.r_bad);
+	bitmap_free(&ve.d_bad);
+	ptvar_free(ve.rvstate);
+	return moveon;
+
+out_pool:
+	read_verify_pool_destroy(ve.readverify);
+out_rbad:
+	bitmap_free(&ve.r_bad);
+out_dbad:
+	bitmap_free(&ve.d_bad);
+out_ve:
+	ptvar_free(ve.rvstate);
+	return moveon;
+}
diff --git a/scrub/vfs.c b/scrub/vfs.c
new file mode 100644
index 0000000..6a51090
--- /dev/null
+++ b/scrub/vfs.c
@@ -0,0 +1,221 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <dirent.h>
+#include <sys/types.h>
+#include <sys/statvfs.h>
+#include "xfs.h"
+#include "handle.h"
+#include "path.h"
+#include "workqueue.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "vfs.h"
+
+/*
+ * Helper functions to assist in traversing a directory tree using regular
+ * VFS calls.
+ */
+
+/* Scan a filesystem tree. */
+struct scan_fs_tree {
+	unsigned int		nr_dirs;
+	pthread_mutex_t		lock;
+	pthread_cond_t		wakeup;
+	struct stat		root_sb;
+	bool			moveon;
+	scan_fs_tree_dir_fn	dir_fn;
+	scan_fs_tree_dirent_fn	dirent_fn;
+	void			*arg;
+};
+
+/* Per-work-item scan context. */
+struct scan_fs_tree_dir {
+	char			*path;
+	struct scan_fs_tree	*sft;
+	bool			rootdir;
+};
+
+/* Scan a directory sub tree. */
+static void
+scan_fs_dir(
+	struct workqueue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
+	struct scan_fs_tree_dir	*sftd = arg;
+	struct scan_fs_tree	*sft = sftd->sft;
+	DIR			*dir;
+	struct dirent		*dirent;
+	char			newpath[PATH_MAX];
+	struct scan_fs_tree_dir	*new_sftd;
+	struct stat		sb;
+	int			dir_fd;
+	int			error;
+
+	/* Open the directory. */
+	dir_fd = open(sftd->path, O_RDONLY | O_NOATIME | O_NOFOLLOW | O_NOCTTY);
+	if (dir_fd < 0) {
+		if (errno != ENOENT)
+			str_errno(ctx, sftd->path);
+		goto out;
+	}
+
+	/* Caller-specific directory checks. */
+	if (!sft->dir_fn(ctx, sftd->path, dir_fd, sft->arg)) {
+		sft->moveon = false;
+		goto out;
+	}
+
+	/* Iterate the directory entries. */
+	dir = fdopendir(dir_fd);
+	if (!dir) {
+		str_errno(ctx, sftd->path);
+		goto out;
+	}
+	rewinddir(dir);
+	for (dirent = readdir(dir); dirent != NULL; dirent = readdir(dir)) {
+		snprintf(newpath, PATH_MAX, "%s/%s", sftd->path,
+				dirent->d_name);
+
+		/* Get the stat info for this directory entry. */
+		error = fstatat(dir_fd, dirent->d_name, &sb,
+				AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW);
+		if (error) {
+			str_errno(ctx, newpath);
+			continue;
+		}
+
+		/* Ignore files on other filesystems. */
+		if (sb.st_dev != sft->root_sb.st_dev)
+			continue;
+
+		/* Caller-specific directory entry function. */
+		if (!sft->dirent_fn(ctx, newpath, dir_fd, dirent, &sb,
+				sft->arg)) {
+			sft->moveon = false;
+			break;
+		}
+
+		if (xfs_scrub_excessive_errors(ctx)) {
+			sft->moveon = false;
+			break;
+		}
+
+		/* If directory, call ourselves recursively. */
+		if (S_ISDIR(sb.st_mode) && strcmp(".", dirent->d_name) &&
+		    strcmp("..", dirent->d_name)) {
+			new_sftd = malloc(sizeof(struct scan_fs_tree_dir));
+			if (!new_sftd) {
+				str_errno(ctx, newpath);
+				sft->moveon = false;
+				break;
+			}
+			new_sftd->path = strdup(newpath);
+			new_sftd->sft = sft;
+			new_sftd->rootdir = false;
+			pthread_mutex_lock(&sft->lock);
+			sft->nr_dirs++;
+			pthread_mutex_unlock(&sft->lock);
+			error = workqueue_add(wq, scan_fs_dir, 0, new_sftd);
+			if (error) {
+				str_error(ctx, ctx->mntpoint,
+_("Could not queue subdirectory scan work."));
+				sft->moveon = false;
+				break;
+			}
+		}
+	}
+
+	/* Close dir, go away. */
+	error = closedir(dir);
+	if (error)
+		str_errno(ctx, sftd->path);
+
+out:
+	pthread_mutex_lock(&sft->lock);
+	sft->nr_dirs--;
+	if (sft->nr_dirs == 0)
+		pthread_cond_signal(&sft->wakeup);
+	pthread_mutex_unlock(&sft->lock);
+
+	free(sftd->path);
+	free(sftd);
+}
+
+/* Scan the entire filesystem. */
+bool
+scan_fs_tree(
+	struct scrub_ctx	*ctx,
+	scan_fs_tree_dir_fn	dir_fn,
+	scan_fs_tree_dirent_fn	dirent_fn,
+	void			*arg)
+{
+	struct workqueue	wq;
+	struct scan_fs_tree	sft;
+	struct scan_fs_tree_dir	*sftd;
+	int			ret;
+
+	sft.moveon = true;
+	sft.nr_dirs = 1;
+	sft.root_sb = ctx->mnt_sb;
+	sft.dir_fn = dir_fn;
+	sft.dirent_fn = dirent_fn;
+	sft.arg = arg;
+	pthread_mutex_init(&sft.lock, NULL);
+	pthread_cond_init(&sft.wakeup, NULL);
+
+	sftd = malloc(sizeof(struct scan_fs_tree_dir));
+	if (!sftd) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+	sftd->path = strdup(ctx->mntpoint);
+	sftd->sft = &sft;
+	sftd->rootdir = true;
+
+	ret = workqueue_create(&wq, (struct xfs_mount *)ctx,
+			scrub_nproc_workqueue(ctx));
+	if (ret) {
+		str_error(ctx, ctx->mntpoint, _("Could not create workqueue."));
+		goto out_free;
+	}
+	ret = workqueue_add(&wq, scan_fs_dir, 0, sftd);
+	if (ret) {
+		str_error(ctx, ctx->mntpoint,
+_("Could not queue directory scan work."));
+		goto out_free;
+	}
+
+	pthread_mutex_lock(&sft.lock);
+	while (sft.nr_dirs > 0)
+		pthread_cond_wait(&sft.wakeup, &sft.lock);
+	pthread_mutex_unlock(&sft.lock);
+	workqueue_destroy(&wq);
+
+	return sft.moveon;
+out_free:
+	free(sftd->path);
+	free(sftd);
+	return false;
+}
diff --git a/scrub/vfs.h b/scrub/vfs.h
new file mode 100644
index 0000000..100eb18
--- /dev/null
+++ b/scrub/vfs.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_VFS_H_
+#define XFS_SCRUB_VFS_H_
+
+typedef bool (*scan_fs_tree_dir_fn)(struct scrub_ctx *, const char *,
+		int, void *);
+typedef bool (*scan_fs_tree_dirent_fn)(struct scrub_ctx *, const char *,
+		int, struct dirent *, struct stat *, void *);
+
+bool scan_fs_tree(struct scrub_ctx *ctx, scan_fs_tree_dir_fn dir_fn,
+		scan_fs_tree_dirent_fn dirent_fn, void *arg);
+
+#endif /* XFS_SCRUB_VFS_H_ */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index f7e4e37..fa1d089 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -390,6 +390,10 @@ run_scrub_phases(
 
 	/* Run all phases of the scrub tool. */
 	for (phase = 1, sp = phases; sp->fn; sp++, phase++) {
+		/* Turn on certain phases if user said to. */
+		if (sp->fn == DATASCAN_DUMMY_FN && scrub_data)
+			sp->fn = xfs_scan_blocks;
+
 		/* Skip certain phases unless they're turned on. */
 		if (sp->fn == REPAIR_DUMMY_FN ||
 		    sp->fn == DATASCAN_DUMMY_FN)
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 31a927c..026631a 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -93,6 +93,7 @@ struct scrub_ctx {
 	unsigned long long	errors_found;
 	unsigned long long	warnings_found;
 	unsigned long long	inodes_checked;
+	unsigned long long	bytes_checked;
 	unsigned long long	naming_warnings;
 	bool			need_repair;
 	bool			preen_triggers[XFS_SCRUB_TYPE_NR];
@@ -105,5 +106,6 @@ bool xfs_setup_fs(struct scrub_ctx *ctx);
 bool xfs_scan_metadata(struct scrub_ctx *ctx);
 bool xfs_scan_inodes(struct scrub_ctx *ctx);
 bool xfs_scan_connections(struct scrub_ctx *ctx);
+bool xfs_scan_blocks(struct scrub_ctx *ctx);
 
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */



* [PATCH 22/27] xfs_scrub: optionally use SCSI READ VERIFY commands to scrub data blocks on disk
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (20 preceding siblings ...)
  2018-01-06  1:53 ` [PATCH 21/27] xfs_scrub: scrub file " Darrick J. Wong
@ 2018-01-06  1:53 ` Darrick J. Wong
  2018-01-06  1:53 ` [PATCH 23/27] xfs_scrub: check summary counters Darrick J. Wong
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:53 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

If we sense that we're talking to a raw SCSI disk, use the SCSI READ
VERIFY command to ask the disk to verify its media internally.  This can
sharply reduce the runtime of the data block verification phase on
devices whose internal bandwidth exceeds their link bandwidth.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
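For readers unfamiliar with the command layout, here is a minimal
standalone sketch of how the VERIFY(16) CDB is packed.  It is not part
of the patch; build_verify16_cdb is a hypothetical helper name used only
for illustration, and it assumes the same field layout that
disk_scsi_verify uses below (big-endian 64-bit LBA in bytes 2-9,
big-endian 32-bit length in bytes 10-13).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VERIFY16_CMD	0x8F
#define VERIFY16_CMDLEN	16

/* Hypothetical helper: fill a 16-byte CDB for a VERIFY(16) command. */
void
build_verify16_cdb(
	unsigned char	cdb[VERIFY16_CMDLEN],
	uint64_t	lba,
	uint32_t	nr_blocks)
{
	int		i;

	assert(cdb != NULL);
	memset(cdb, 0, VERIFY16_CMDLEN);
	cdb[0] = VERIFY16_CMD;
	/* cdb[1] stays 0: skip PI, DPO, and byte check. */

	/* Bytes 2-9: 64-bit starting LBA, most significant byte first. */
	for (i = 0; i < 8; i++)
		cdb[2 + i] = (lba >> (56 - 8 * i)) & 0xff;

	/* Bytes 10-13: 32-bit verification length, also big-endian. */
	for (i = 0; i < 4; i++)
		cdb[10 + i] = (nr_blocks >> (24 - 8 * i)) & 0xff;

	/* Bytes 14 and 15 remain zero, as in the patch. */
}
```

No data buffer is attached to the command (SG_DXFER_NONE in the patch),
which is why the verification costs no link bandwidth.
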
 configure.ac          |    2 +
 include/builddefs.in  |    2 +
 m4/package_libcdev.m4 |   30 ++++++++++
 scrub/Makefile        |    8 +++
 scrub/disk.c          |  146 +++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/disk.h          |    1 
 6 files changed, 188 insertions(+), 1 deletion(-)


diff --git a/configure.ac b/configure.ac
index 8eda010..bb032e5 100644
--- a/configure.ac
+++ b/configure.ac
@@ -172,6 +172,8 @@ AC_PACKAGE_WANT_UNINORM_H
 AC_HAVE_U8NORMALIZE
 AC_HAVE_OPENAT
 AC_HAVE_FSTATAT
+AC_HAVE_SG_IO
+AC_HAVE_HDIO_GETGEO
 
 if test "$enable_blkid" = yes; then
 AC_HAVE_BLKID_TOPO
diff --git a/include/builddefs.in b/include/builddefs.in
index 2f8d33f..d44faf9 100644
--- a/include/builddefs.in
+++ b/include/builddefs.in
@@ -125,6 +125,8 @@ HAVE_LIBATTR = @have_libattr@
 HAVE_U8NORMALIZE = @have_u8normalize@
 HAVE_OPENAT = @have_openat@
 HAVE_FSTATAT = @have_fstatat@
+HAVE_SG_IO = @have_sg_io@
+HAVE_HDIO_GETGEO = @have_hdio_getgeo@
 
 GCCFLAGS = -funsigned-char -fno-strict-aliasing -Wall
 #	   -Wbitwise -Wno-transparent-union -Wno-old-initializer -Wno-decl
diff --git a/m4/package_libcdev.m4 b/m4/package_libcdev.m4
index e0abc12..9258c27 100644
--- a/m4/package_libcdev.m4
+++ b/m4/package_libcdev.m4
@@ -390,3 +390,33 @@ AC_DEFUN([AC_HAVE_FSTATAT],
        #include <unistd.h>])
     AC_SUBST(have_fstatat)
   ])
+
+#
+# Check if we have the SG_IO ioctl
+#
+AC_DEFUN([AC_HAVE_SG_IO],
+  [ AC_MSG_CHECKING([for struct sg_io_hdr ])
+    AC_TRY_COMPILE([#include <scsi/sg.h>],
+    [
+         struct sg_io_hdr hdr;
+         ioctl(0, SG_IO, &hdr);
+    ], have_sg_io=yes
+       AC_MSG_RESULT(yes),
+       AC_MSG_RESULT(no))
+    AC_SUBST(have_sg_io)
+  ])
+
+#
+# Check if we have the HDIO_GETGEO ioctl
+#
+AC_DEFUN([AC_HAVE_HDIO_GETGEO],
+  [ AC_MSG_CHECKING([for struct hd_geometry ])
+    AC_TRY_COMPILE([#include <linux/hdreg.h>],
+    [
+         struct hd_geometry hdr;
+         ioctl(0, HDIO_GETGEO, &hdr);
+    ], have_hdio_getgeo=yes
+       AC_MSG_RESULT(yes),
+       AC_MSG_RESULT(no))
+    AC_SUBST(have_hdio_getgeo)
+  ])
diff --git a/scrub/Makefile b/scrub/Makefile
index 4b70efa..1fb6e84 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -70,6 +70,14 @@ CFILES += unicrash.c
 LCFLAGS += -DHAVE_U8NORMALIZE
 endif
 
+ifeq ($(HAVE_SG_IO),yes)
+LCFLAGS += -DHAVE_SG_IO
+endif
+
+ifeq ($(HAVE_HDIO_GETGEO),yes)
+LCFLAGS += -DHAVE_HDIO_GETGEO
+endif
+
 default: depend $(LTCOMMAND)
 
 phase5.o unicrash.o xfs.o: $(TOPDIR)/include/builddefs
diff --git a/scrub/disk.c b/scrub/disk.c
index 546a06c..35c5a76 100644
--- a/scrub/disk.c
+++ b/scrub/disk.c
@@ -29,12 +29,19 @@
 #include <sys/statvfs.h>
 #include <sys/vfs.h>
 #include <linux/fs.h>
+#ifdef HAVE_SG_IO
+# include <scsi/sg.h>
+#endif
+#ifdef HAVE_HDIO_GETGEO
+# include <linux/hdreg.h>
+#endif
 #include "platform_defs.h"
 #include "libfrog.h"
 #include "xfs.h"
 #include "path.h"
 #include "xfs_fs.h"
 #include "xfs_scrub.h"
+#include "common.h"
 #include "disk.h"
 
 /*
@@ -90,12 +97,119 @@ disk_heads(
 	return __disk_heads(disk);
 }
 
+/*
+ * Execute a SCSI VERIFY(16) to verify disk contents.
+ * For devices that support this command, this can sharply reduce the
+ * runtime of the data block verification phase if the storage device's
+ * internal bandwidth exceeds its link bandwidth.  However, it only
+ * works if we're talking to a raw SCSI device, and only if we trust the
+ * firmware.
+ */
+#ifdef HAVE_SG_IO
+# define SENSE_BUF_LEN		64
+# define VERIFY16_CMDLEN	16
+# define VERIFY16_CMD		0x8F
+
+# ifndef SG_FLAG_Q_AT_TAIL
+#  define SG_FLAG_Q_AT_TAIL	0x10
+# endif
+static int
+disk_scsi_verify(
+	struct disk		*disk,
+	uint64_t		startblock, /* lba */
+	uint64_t		blockcount) /* lba */
+{
+	struct sg_io_hdr	iohdr;
+	unsigned char		cdb[VERIFY16_CMDLEN];
+	unsigned char		sense[SENSE_BUF_LEN];
+	uint64_t		llba;
+	uint64_t		veri_len = blockcount;
+	int			error;
+
+	assert(!debug_tweak_on("XFS_SCRUB_NO_SCSI_VERIFY"));
+
+	llba = startblock + (disk->d_start >> BBSHIFT);
+
+	/* Borrowed from sg_verify */
+	cdb[0] = VERIFY16_CMD;
+	cdb[1] = 0; /* skip PI, DPO, and byte check. */
+	cdb[2] = (llba >> 56) & 0xff;
+	cdb[3] = (llba >> 48) & 0xff;
+	cdb[4] = (llba >> 40) & 0xff;
+	cdb[5] = (llba >> 32) & 0xff;
+	cdb[6] = (llba >> 24) & 0xff;
+	cdb[7] = (llba >> 16) & 0xff;
+	cdb[8] = (llba >> 8) & 0xff;
+	cdb[9] = llba & 0xff;
+	cdb[10] = (veri_len >> 24) & 0xff;
+	cdb[11] = (veri_len >> 16) & 0xff;
+	cdb[12] = (veri_len >> 8) & 0xff;
+	cdb[13] = veri_len & 0xff;
+	cdb[14] = 0;
+	cdb[15] = 0;
+	memset(sense, 0, SENSE_BUF_LEN);
+
+	/* v3 SG_IO */
+	memset(&iohdr, 0, sizeof(iohdr));
+	iohdr.interface_id = 'S';
+	iohdr.dxfer_direction = SG_DXFER_NONE;
+	iohdr.cmdp = cdb;
+	iohdr.cmd_len = VERIFY16_CMDLEN;
+	iohdr.sbp = sense;
+	iohdr.mx_sb_len = SENSE_BUF_LEN;
+	iohdr.flags |= SG_FLAG_Q_AT_TAIL;
+	iohdr.timeout = 30000; /* 30s */
+
+	error = ioctl(disk->d_fd, SG_IO, &iohdr);
+	if (error)
+		return error;
+
+	dbg_printf("VERIFY(16) fd %d lba %"PRIu64" len %"PRIu64" info %x "
+			"status %d masked %d msg %d host %d driver %d "
+			"duration %d resid %d\n",
+			disk->d_fd, startblock, blockcount, iohdr.info,
+			iohdr.status, iohdr.masked_status, iohdr.msg_status,
+			iohdr.host_status, iohdr.driver_status, iohdr.duration,
+			iohdr.resid);
+
+	if (iohdr.info & SG_INFO_CHECK) {
+		dbg_printf("status: msg %x host %x driver %x\n",
+				iohdr.msg_status, iohdr.host_status,
+				iohdr.driver_status);
+		errno = EIO;
+		return -1;
+	}
+
+	return error;
+}
+#else
+# define disk_scsi_verify(...)		(ENOTTY)
+#endif /* HAVE_SG_IO */
+
+/* Test whether the disk accepts the SCSI VERIFY command. */
+static bool
+disk_can_scsi_verify(
+	struct disk		*disk)
+{
+	int			error;
+
+	if (debug_tweak_on("XFS_SCRUB_NO_SCSI_VERIFY"))
+		return false;
+
+	error = disk_scsi_verify(disk, 0, 1);
+	return error == 0;
+}
+
 /* Open a disk device and discover its geometry. */
 struct disk *
 disk_open(
 	const char		*pathname)
 {
+#ifdef HAVE_HDIO_GETGEO
+	struct hd_geometry	bdgeo;
+#endif
 	struct disk		*disk;
+	bool			suspicious_disk = false;
 	int			lba_sz;
 	int			error;
 
@@ -126,13 +240,34 @@ disk_open(
 		error = ioctl(disk->d_fd, BLKBSZGET, &disk->d_blksize);
 		if (error)
 			disk->d_blksize = 0;
-		disk->d_start = 0;
+#ifdef HAVE_HDIO_GETGEO
+		error = ioctl(disk->d_fd, HDIO_GETGEO, &bdgeo);
+		if (!error) {
+			/*
+			 * dm devices will pass through ioctls, which means
+			 * we can't use SCSI VERIFY unless the start is 0.
+			 * Most dm devices don't set geometry (unlike scsi
+			 * and nvme) so use a zeroed out CHS to screen them
+			 * out.
+			 */
+			if (bdgeo.start != 0 &&
+			    (unsigned long long)bdgeo.heads * bdgeo.sectors *
+					bdgeo.cylinders == 0)
+				suspicious_disk = true;
+			disk->d_start = bdgeo.start << BBSHIFT;
+		} else
+#endif
+			disk->d_start = 0;
 	} else {
 		disk->d_size = disk->d_sb.st_size;
 		disk->d_blksize = disk->d_sb.st_blksize;
 		disk->d_start = 0;
 	}
 
+	/* Can we issue SCSI VERIFY? */
+	if (!suspicious_disk && disk_can_scsi_verify(disk))
+		disk->d_flags |= DISK_FLAG_SCSI_VERIFY;
+
 	return disk;
 out_close:
 	close(disk->d_fd);
@@ -155,6 +290,10 @@ disk_close(
 	return error;
 }
 
+#define BTOLBAT(d, bytes)	((uint64_t)(bytes) >> (d)->d_lbalog)
+#define LBASIZE(d)		(1ULL << (d)->d_lbalog)
+#define BTOLBA(d, bytes)	(((uint64_t)(bytes) + LBASIZE(d) - 1) >> (d)->d_lbalog)
+
 /* Read-verify an extent of a disk device. */
 ssize_t
 disk_read_verify(
@@ -163,5 +302,10 @@ disk_read_verify(
 	uint64_t		start,
 	uint64_t		length)
 {
+	/* Convert to logical block size. */
+	if (disk->d_flags & DISK_FLAG_SCSI_VERIFY)
+		return disk_scsi_verify(disk, BTOLBAT(disk, start),
+				BTOLBA(disk, length));
+
 	return pread(disk->d_fd, buf, length, start);
 }
diff --git a/scrub/disk.h b/scrub/disk.h
index 834678e..8a00144 100644
--- a/scrub/disk.h
+++ b/scrub/disk.h
@@ -20,6 +20,7 @@
 #ifndef XFS_SCRUB_DISK_H_
 #define XFS_SCRUB_DISK_H_
 
+#define DISK_FLAG_SCSI_VERIFY	0x1
 struct disk {
 	struct stat	d_sb;
 	int		d_fd;



* [PATCH 23/27] xfs_scrub: check summary counters
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (21 preceding siblings ...)
  2018-01-06  1:53 ` [PATCH 22/27] xfs_scrub: optionally use SCSI READ VERIFY commands to scrub data blocks on disk Darrick J. Wong
@ 2018-01-06  1:53 ` Darrick J. Wong
  2018-01-06  1:54 ` [PATCH 24/27] xfs_scrub: fstrim the free areas if there are no errors on the filesystem Darrick J. Wong
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:53 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Make sure the filesystem summary counters are somewhat close to what
we can find by scanning the filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
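To make "somewhat close" concrete, here is a minimal standalone sketch
of the tolerance test.  It mirrors the logic of within_range() from this
patch but drops the ctx and descr arguments; value_within_range is a
hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical standalone variant of within_range(). */
bool
value_within_range(
	unsigned long long	value,
	unsigned long long	desired,
	unsigned long long	abs_threshold,
	unsigned int		n,
	unsigned int		d)
{
	assert(n < d);

	/* Small absolute differences are always tolerated. */
	if (value < desired && desired - value < abs_threshold)
		return true;
	if (value > desired && value - desired < abs_threshold)
		return true;

	/* Otherwise the value must lie within desired * (1 +/- n/d). */
	if (value < desired * (d - n) / d)
		return false;
	if (value > desired * (d + n) / d)
		return false;

	return true;
}
```

With n = 1, d = 10, and abs_threshold = 2, a count of 105 against an
expected 100 passes, while a count of 120 does not.
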
 scrub/Makefile    |    1 
 scrub/common.c    |   28 ++++++
 scrub/common.h    |    4 +
 scrub/phase7.c    |  266 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.c |    5 -
 scrub/xfs_scrub.h |    1 
 6 files changed, 302 insertions(+), 3 deletions(-)
 create mode 100644 scrub/phase7.c


diff --git a/scrub/Makefile b/scrub/Makefile
index 1fb6e84..fd26624 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -43,6 +43,7 @@ phase2.c \
 phase3.c \
 phase5.c \
 phase6.c \
+phase7.c \
 read_verify.c \
 scrub.c \
 spacemap.c \
diff --git a/scrub/common.c b/scrub/common.c
index 10c4017..bcdb8c0 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -413,3 +413,31 @@ _("More than %u naming warnings, shutting up."),
 
 	return debug || verbose || res;
 }
+
+/* Decide if a value is within +/- (n/d) of a desired value. */
+bool
+within_range(
+	struct scrub_ctx	*ctx,
+	unsigned long long	value,
+	unsigned long long	desired,
+	unsigned long long	abs_threshold,
+	unsigned int		n,
+	unsigned int		d,
+	const char		*descr)
+{
+	assert(n < d);
+
+	/* Don't complain if difference does not exceed an absolute value. */
+	if (value < desired && desired - value < abs_threshold)
+		return true;
+	if (value > desired && value - desired < abs_threshold)
+		return true;
+
+	/* Complain if the difference exceeds a certain percentage. */
+	if (value < desired * (d - n) / d)
+		return false;
+	if (value > desired * (d + n) / d)
+		return false;
+
+	return true;
+}
diff --git a/scrub/common.h b/scrub/common.h
index dd2070e..bd67a17 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -80,4 +80,8 @@ char *string_escape(const char *in);
 #define TOO_MANY_NAME_WARNINGS	10000
 bool should_warn_about_name(struct scrub_ctx *ctx);
 
+bool within_range(struct scrub_ctx *ctx, unsigned long long value,
+		unsigned long long desired, unsigned long long abs_threshold,
+		unsigned int n, unsigned int d, const char *descr);
+
 #endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/phase7.c b/scrub/phase7.c
new file mode 100644
index 0000000..460ca8a
--- /dev/null
+++ b/scrub/phase7.c
@@ -0,0 +1,266 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <sys/statvfs.h>
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "path.h"
+#include "ptvar.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "fscounters.h"
+#include "spacemap.h"
+
+/* Phase 7: Check summary counters. */
+
+struct xfs_summary_counts {
+	unsigned long long	dbytes;		/* data dev bytes */
+	unsigned long long	rbytes;		/* rt dev bytes */
+	unsigned long long	next_phys;	/* next phys bytes we see? */
+	unsigned long long	agbytes;	/* freespace bytes */
+};
+
+/* Record block usage. */
+static bool
+xfs_record_block_summary(
+	struct scrub_ctx		*ctx,
+	const char			*descr,
+	struct fsmap			*fsmap,
+	void				*arg)
+{
+	struct xfs_summary_counts	*counts;
+	unsigned long long		len;
+
+	counts = ptvar_get((struct ptvar *)arg);
+	if (fsmap->fmr_device == ctx->fsinfo.fs_logdev)
+		return true;
+	if ((fsmap->fmr_flags & FMR_OF_SPECIAL_OWNER) &&
+	    fsmap->fmr_owner == XFS_FMR_OWN_FREE)
+		return true;
+
+	len = fsmap->fmr_length;
+
+	/* freesp btrees live in free space, need to adjust counters later. */
+	if ((fsmap->fmr_flags & FMR_OF_SPECIAL_OWNER) &&
+	    fsmap->fmr_owner == XFS_FMR_OWN_AG) {
+		counts->agbytes += fsmap->fmr_length;
+	}
+	if (fsmap->fmr_device == ctx->fsinfo.fs_rtdev) {
+		/* Count realtime extents. */
+		counts->rbytes += len;
+	} else {
+		/* Count datadev extents. */
+		if (counts->next_phys >= fsmap->fmr_physical + len)
+			return true;
+		else if (counts->next_phys > fsmap->fmr_physical)
+			len = fsmap->fmr_physical + len - counts->next_phys;
+		counts->dbytes += len;
+		counts->next_phys = fsmap->fmr_physical + fsmap->fmr_length;
+	}
+
+	return true;
+}
+
+/* Add all the summaries in the per-thread counter */
+static bool
+xfs_add_summaries(
+	struct ptvar			*ptv,
+	void				*data,
+	void				*arg)
+{
+	struct xfs_summary_counts	*total = arg;
+	struct xfs_summary_counts	*item = data;
+
+	total->dbytes += item->dbytes;
+	total->rbytes += item->rbytes;
+	total->agbytes += item->agbytes;
+	return true;
+}
+
+/*
+ * Count all inodes and blocks in the filesystem as told by GETFSMAP and
+ * BULKSTAT, and compare that to summary counters.  Since this is a live
+ * filesystem we'll be content if the summary counts are within 10% of
+ * what we observed.
+ */
+bool
+xfs_scan_summary(
+	struct scrub_ctx		*ctx)
+{
+	struct xfs_summary_counts	totalcount = {0};
+	struct ptvar			*ptvar;
+	unsigned long long		used_data;
+	unsigned long long		used_rt;
+	unsigned long long		used_files;
+	unsigned long long		stat_data;
+	unsigned long long		stat_rt;
+	uint64_t			counted_inodes = 0;
+	unsigned long long		absdiff;
+	unsigned long long		d_blocks;
+	unsigned long long		d_bfree;
+	unsigned long long		r_blocks;
+	unsigned long long		r_bfree;
+	unsigned long long		f_files;
+	unsigned long long		f_free;
+	bool				moveon;
+	bool				complain;
+	int				error;
+
+	/* Flush everything out to disk before we start counting. */
+	error = syncfs(ctx->mnt_fd);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	ptvar = ptvar_init(scrub_nproc(ctx), sizeof(struct xfs_summary_counts));
+	if (!ptvar) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	/* Use fsmap to count blocks. */
+	moveon = xfs_scan_all_spacemaps(ctx, xfs_record_block_summary, ptvar);
+	if (!moveon)
+		goto out_free;
+	moveon = ptvar_foreach(ptvar, xfs_add_summaries, &totalcount);
+	if (!moveon)
+		goto out_free;
+	ptvar_free(ptvar);
+
+	/* Scan the whole fs. */
+	moveon = xfs_count_all_inodes(ctx, &counted_inodes);
+	if (!moveon)
+		goto out;
+
+	moveon = xfs_scan_estimate_blocks(ctx, &d_blocks, &d_bfree, &r_blocks,
+			&r_bfree, &f_files, &f_free);
+	if (!moveon)
+		return moveon;
+
+	/*
+	 * If we counted blocks with fsmap, then dblocks includes
+	 * blocks for the AGFL and the freespace/rmap btrees.  The
+	 * filesystem treats them as "free", but since we scanned
+	 * them, we'll consider them used.
+	 */
+	d_bfree -= totalcount.agbytes >> ctx->blocklog;
+
+	/* Report on what we found. */
+	used_data = (d_blocks - d_bfree) << ctx->blocklog;
+	used_rt = (r_blocks - r_bfree) << ctx->blocklog;
+	used_files = f_files - f_free;
+	stat_data = totalcount.dbytes;
+	stat_rt = totalcount.rbytes;
+
+	/*
+	 * Complain if the counts are off by more than 10% unless
+	 * the inaccuracy is less than 32MB worth of blocks or 100 inodes.
+	 */
+	absdiff = 1ULL << 25;
+	complain = verbose;
+	complain |= !within_range(ctx, stat_data, used_data, absdiff, 1, 10,
+			_("data blocks"));
+	complain |= !within_range(ctx, stat_rt, used_rt, absdiff, 1, 10,
+			_("realtime blocks"));
+	complain |= !within_range(ctx, counted_inodes, used_files, 100, 1, 10,
+			_("inodes"));
+
+	if (complain) {
+		double		d, r, i;
+		char		*du, *ru, *iu;
+
+		if (used_rt || stat_rt) {
+			d = auto_space_units(used_data, &du);
+			r = auto_space_units(used_rt, &ru);
+			i = auto_units(used_files, &iu);
+			fprintf(stdout,
+_("%.1f%s data used;  %.1f%s realtime data used;  %.2f%s inodes used.\n"),
+					d, du, r, ru, i, iu);
+			d = auto_space_units(stat_data, &du);
+			r = auto_space_units(stat_rt, &ru);
+			i = auto_units(counted_inodes, &iu);
+			fprintf(stdout,
+_("%.1f%s data found; %.1f%s realtime data found; %.2f%s inodes found.\n"),
+					d, du, r, ru, i, iu);
+		} else {
+			d = auto_space_units(used_data, &du);
+			i = auto_units(used_files, &iu);
+			fprintf(stdout,
+_("%.1f%s data used;  %.1f%s inodes used.\n"),
+					d, du, i, iu);
+			d = auto_space_units(stat_data, &du);
+			i = auto_units(counted_inodes, &iu);
+			fprintf(stdout,
+_("%.1f%s data found; %.1f%s inodes found.\n"),
+					d, du, i, iu);
+		}
+		fflush(stdout);
+	}
+
+	/*
+	 * Complain if the checked inode counts are off, which
+	 * implies an incomplete check.
+	 */
+	if (verbose ||
+	    !within_range(ctx, counted_inodes, ctx->inodes_checked, 100, 1, 10,
+			_("checked inodes"))) {
+		double		i1, i2;
+		char		*i1u, *i2u;
+
+		i1 = auto_units(counted_inodes, &i1u);
+		i2 = auto_units(ctx->inodes_checked, &i2u);
+		fprintf(stdout,
+_("%.1f%s inodes counted; %.1f%s inodes checked.\n"),
+				i1, i1u, i2, i2u);
+		fflush(stdout);
+	}
+
+	/*
+	 * Complain if the checked block counts are off, which
+	 * implies an incomplete check.
+	 */
+	if (ctx->bytes_checked &&
+	    (verbose ||
+	     !within_range(ctx, used_data + used_rt,
+			ctx->bytes_checked, absdiff, 1, 10,
+			_("verified blocks")))) {
+		double		b1, b2;
+		char		*b1u, *b2u;
+
+		b1 = auto_space_units(used_data + used_rt, &b1u);
+		b2 = auto_space_units(ctx->bytes_checked, &b2u);
+		fprintf(stdout,
+_("%.1f%s data counted; %.1f%s data verified.\n"),
+				b1, b1u, b2, b2u);
+		fflush(stdout);
+	}
+
+	moveon = true;
+
+out:
+	return moveon;
+out_free:
+	ptvar_free(ptvar);
+	return moveon;
+}
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index fa1d089..bc40f3c 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -374,6 +374,8 @@ run_scrub_phases(
 		},
 		{
 			.descr = _("Check summary counters."),
+			.fn = xfs_scan_summary,
+			.must_run = true,
 		},
 		{
 			NULL
@@ -443,9 +445,6 @@ main(
 	static bool		injected;
 	int			ret = 0;
 
-	fprintf(stderr, "XXX: This program is not complete!\n");
-	return 4;
-
 	progname = basename(argv[0]);
 	setlocale(LC_ALL, "");
 	bindtextdomain(PACKAGE, LOCALEDIR);
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 026631a..a5cdba8 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -107,5 +107,6 @@ bool xfs_scan_metadata(struct scrub_ctx *ctx);
 bool xfs_scan_inodes(struct scrub_ctx *ctx);
 bool xfs_scan_connections(struct scrub_ctx *ctx);
 bool xfs_scan_blocks(struct scrub_ctx *ctx);
+bool xfs_scan_summary(struct scrub_ctx *ctx);
 
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */



* [PATCH 24/27] xfs_scrub: fstrim the free areas if there are no errors on the filesystem
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (22 preceding siblings ...)
  2018-01-06  1:53 ` [PATCH 23/27] xfs_scrub: check summary counters Darrick J. Wong
@ 2018-01-06  1:54 ` Darrick J. Wong
  2018-01-16 22:07   ` Eric Sandeen
  2018-01-06  1:54 ` [PATCH 25/27] xfs_scrub: progress indicator Darrick J. Wong
                   ` (6 subsequent siblings)
  30 siblings, 1 reply; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:54 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

If the filesystem scan comes out clean or fixes all the problems, call
fstrim to discard the free areas (useful on SSDs, thin-provisioned
volumes, and the like).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile    |    1 +
 scrub/phase4.c    |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/vfs.c       |   23 +++++++++++++++++++++++
 scrub/vfs.h       |    2 ++
 scrub/xfs_scrub.c |   26 +++++++++++++++++++++++++-
 scrub/xfs_scrub.h |    1 +
 6 files changed, 104 insertions(+), 1 deletion(-)
 create mode 100644 scrub/phase4.c


diff --git a/scrub/Makefile b/scrub/Makefile
index fd26624..91f99ff 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -41,6 +41,7 @@ inodes.c \
 phase1.c \
 phase2.c \
 phase3.c \
+phase4.c \
 phase5.c \
 phase6.c \
 phase7.c \
diff --git a/scrub/phase4.c b/scrub/phase4.c
new file mode 100644
index 0000000..dadf4de
--- /dev/null
+++ b/scrub/phase4.c
@@ -0,0 +1,52 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <dirent.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/statvfs.h>
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "list.h"
+#include "path.h"
+#include "workqueue.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "scrub.h"
+#include "vfs.h"
+
+/* Phase 4: Repair filesystem. */
+
+/* Fix everything that needs fixing. */
+bool
+xfs_repair_fs(
+	struct scrub_ctx		*ctx)
+{
+	bool				moveon = true;
+
+	pthread_mutex_lock(&ctx->lock);
+	if (moveon && ctx->errors_found == 0)
+		fstrim(ctx);
+	pthread_mutex_unlock(&ctx->lock);
+
+	return moveon;
+}
diff --git a/scrub/vfs.c b/scrub/vfs.c
index 6a51090..98d356f 100644
--- a/scrub/vfs.c
+++ b/scrub/vfs.c
@@ -219,3 +219,26 @@ _("Could not queue directory scan work."));
 	free(sftd);
 	return false;
 }
+
+#ifndef FITRIM
+struct fstrim_range {
+	__u64 start;
+	__u64 len;
+	__u64 minlen;
+};
+#define FITRIM		_IOWR('X', 121, struct fstrim_range)	/* Trim */
+#endif
+
+/* Call FITRIM to trim all the unused space in a filesystem. */
+void
+fstrim(
+	struct scrub_ctx	*ctx)
+{
+	struct fstrim_range	range = {0};
+	int			error;
+
+	range.len = ULLONG_MAX;
+	error = ioctl(ctx->mnt_fd, FITRIM, &range);
+	if (error && errno != EOPNOTSUPP && errno != ENOTTY)
+		perror(_("fstrim"));
+}
diff --git a/scrub/vfs.h b/scrub/vfs.h
index 100eb18..3305159 100644
--- a/scrub/vfs.h
+++ b/scrub/vfs.h
@@ -28,4 +28,6 @@ typedef bool (*scan_fs_tree_dirent_fn)(struct scrub_ctx *, const char *,
 bool scan_fs_tree(struct scrub_ctx *ctx, scan_fs_tree_dir_fn dir_fn,
 		scan_fs_tree_dirent_fn dirent_fn, void *arg);
 
+void fstrim(struct scrub_ctx *ctx);
+
 #endif /* XFS_SCRUB_VFS_H_ */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index bc40f3c..7809431 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -340,6 +340,20 @@ _("%sI/O rate: %.1f%s/s in, %.1f%s/s out, %.1f%s/s tot\n"),
 	return true;
 }
 
+/* Run the preening phase if there are no errors. */
+static bool
+preen(
+	struct scrub_ctx	*ctx)
+{
+	if (ctx->errors_found) {
+		str_info(ctx, ctx->mntpoint,
+_("Errors found, please re-run with -y."));
+		return true;
+	}
+
+	return xfs_repair_fs(ctx);
+}
+
 /* Run all the phases of the scrubber. */
 static bool
 run_scrub_phases(
@@ -393,8 +407,18 @@ run_scrub_phases(
 	/* Run all phases of the scrub tool. */
 	for (phase = 1, sp = phases; sp->fn; sp++, phase++) {
 		/* Turn on certain phases if user said to. */
-		if (sp->fn == DATASCAN_DUMMY_FN && scrub_data)
+		if (sp->fn == DATASCAN_DUMMY_FN && scrub_data) {
 			sp->fn = xfs_scan_blocks;
+		} else if (sp->fn == REPAIR_DUMMY_FN) {
+			if (ctx->mode == SCRUB_MODE_PREEN) {
+				sp->descr = _("Preen filesystem.");
+				sp->fn = preen;
+			} else if (ctx->mode == SCRUB_MODE_REPAIR) {
+				sp->descr = _("Repair filesystem.");
+				sp->fn = xfs_repair_fs;
+			}
+			sp->must_run = true;
+		}
 
 		/* Skip certain phases unless they're turned on. */
 		if (sp->fn == REPAIR_DUMMY_FN ||
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index a5cdba8..4a383f1 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -108,5 +108,6 @@ bool xfs_scan_inodes(struct scrub_ctx *ctx);
 bool xfs_scan_connections(struct scrub_ctx *ctx);
 bool xfs_scan_blocks(struct scrub_ctx *ctx);
 bool xfs_scan_summary(struct scrub_ctx *ctx);
+bool xfs_repair_fs(struct scrub_ctx *ctx);
 
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */



* [PATCH 25/27] xfs_scrub: progress indicator
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (23 preceding siblings ...)
  2018-01-06  1:54 ` [PATCH 24/27] xfs_scrub: fstrim the free areas if there are no errors on the filesystem Darrick J. Wong
@ 2018-01-06  1:54 ` Darrick J. Wong
  2018-01-11 23:27   ` Eric Sandeen
  2018-01-06  1:54 ` [PATCH 26/27] xfs_scrub: create a script to scrub all xfs filesystems Darrick J. Wong
                   ` (5 subsequent siblings)
  30 siblings, 1 reply; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:54 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

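Implement a progress indicator.  The new -C option takes a file
descriptor; if it refers to a tty we render a progress bar, otherwise we
emit numeric status lines in the same format as fsck -C.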

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 man/man8/xfs_scrub.8 |   11 ++
 scrub/Makefile       |    2 
 scrub/common.c       |   23 ++++-
 scrub/phase2.c       |   14 +++
 scrub/phase3.c       |   16 ++++
 scrub/phase4.c       |   19 ++++
 scrub/phase5.c       |    2 
 scrub/phase6.c       |   28 ++++++
 scrub/progress.c     |  222 ++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/progress.h     |   33 +++++++
 scrub/read_verify.c  |    2 
 scrub/scrub.c        |   28 ++++++
 scrub/xfs_scrub.c    |   59 +++++++++++++
 scrub/xfs_scrub.h    |   14 +++
 14 files changed, 463 insertions(+), 10 deletions(-)
 create mode 100644 scrub/progress.c
 create mode 100644 scrub/progress.h


diff --git a/man/man8/xfs_scrub.8 b/man/man8/xfs_scrub.8
index 95f4fea..dee9076 100644
--- a/man/man8/xfs_scrub.8
+++ b/man/man8/xfs_scrub.8
@@ -4,7 +4,7 @@ xfs_scrub \- scrub the contents of an XFS filesystem
 .SH SYNOPSIS
 .B xfs_scrub
 [
-.B \-abemnTvVxy
+.B \-abCemnTvVxy
 ]
 .I mount-point
 .br
@@ -47,6 +47,15 @@ time.
 If given more than once, an artificial delay of 100us is added to each
 scrub call to reduce CPU overhead even further.
 .TP
+.BI \-C " fd"
+This option causes xfs_scrub to write progress information to the
+specified file descriptor so that the progress of the filesystem check
+can be monitored.
+If the file descriptor is a tty, a fancy progress bar is rendered.
+Otherwise, a simple numeric status dump compatible with the
+.B fsck -C
+format is output.
+.TP
 .B \-e
 Specifies what happens when errors are detected.
 If
diff --git a/scrub/Makefile b/scrub/Makefile
index 91f99ff..7a80ff6 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -23,6 +23,7 @@ disk.h \
 filemap.h \
 fscounters.h \
 inodes.h \
+progress.h \
 read_verify.h \
 scrub.h \
 spacemap.h \
@@ -45,6 +46,7 @@ phase4.c \
 phase5.c \
 phase6.c \
 phase7.c \
+progress.c \
 read_verify.c \
 scrub.c \
 spacemap.c \
diff --git a/scrub/common.c b/scrub/common.c
index bcdb8c0..fa66ddc 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -27,6 +27,7 @@
 #include "path.h"
 #include "xfs_scrub.h"
 #include "common.h"
+#include "progress.h"
 
 /*
  * Reporting Status to the Console
@@ -55,6 +56,18 @@ xfs_scrub_excessive_errors(
 	return ret;
 }
 
+/* If stderr is a tty, clear to end of line to clean up progress bar. */
+static inline const char *stderr_start(void)
+{
+	return stderr_isatty ? CLEAR_EOL : "";
+}
+
+/* If stdout is a tty, clear to end of line to clean up progress bar. */
+static inline const char *stdout_start(void)
+{
+	return stdout_isatty ? CLEAR_EOL : "";
+}
+
 /* Print an error string and whatever error is stored in errno. */
 void
 __str_errno(
@@ -66,7 +79,7 @@ __str_errno(
 	char			buf[DESCR_BUFSZ];
 
 	pthread_mutex_lock(&ctx->lock);
-	fprintf(stderr, _("Error: %s: %s."), descr,
+	fprintf(stderr, _("%sError: %s: %s."), stderr_start(), descr,
 			strerror_r(errno, buf, DESCR_BUFSZ));
 	if (debug)
 		fprintf(stderr, _(" (%s line %d)"), file, line);
@@ -86,7 +99,7 @@ __str_errno_warn(
 	char			buf[DESCR_BUFSZ];
 
 	pthread_mutex_lock(&ctx->lock);
-	fprintf(stderr, _("Warning: %s: %s."), descr,
+	fprintf(stderr, _("%sWarning: %s: %s."), stderr_start(), descr,
 			strerror_r(errno, buf, DESCR_BUFSZ));
 	if (debug)
 		fprintf(stderr, _(" (%s line %d)"), file, line);
@@ -108,7 +121,7 @@ __str_error(
 	va_list			args;
 
 	pthread_mutex_lock(&ctx->lock);
-	fprintf(stderr, _("Error: %s: "), descr);
+	fprintf(stderr, _("%sError: %s: "), stderr_start(), descr);
 	va_start(args, format);
 	vfprintf(stderr, format, args);
 	va_end(args);
@@ -132,7 +145,7 @@ __str_warn(
 	va_list			args;
 
 	pthread_mutex_lock(&ctx->lock);
-	fprintf(stderr, _("Warning: %s: "), descr);
+	fprintf(stderr, _("%sWarning: %s: "), stderr_start(), descr);
 	va_start(args, format);
 	vfprintf(stderr, format, args);
 	va_end(args);
@@ -156,7 +169,7 @@ __str_info(
 	va_list			args;
 
 	pthread_mutex_lock(&ctx->lock);
-	fprintf(stdout, _("Info: %s: "), descr);
+	fprintf(stdout, _("%sInfo: %s: "), stdout_start(), descr);
 	va_start(args, format);
 	vfprintf(stdout, format, args);
 	va_end(args);
diff --git a/scrub/phase2.c b/scrub/phase2.c
index 153ae02..e8eb1ca 100644
--- a/scrub/phase2.c
+++ b/scrub/phase2.c
@@ -131,3 +131,17 @@ _("Could not queue filesystem scrub work."));
 	workqueue_destroy(&wq);
 	return moveon;
 }
+
+/* Estimate how much work we're going to do. */
+bool
+xfs_estimate_metadata_work(
+	struct scrub_ctx	*ctx,
+	uint64_t		*items,
+	unsigned int		*nr_threads,
+	int			*rshift)
+{
+	*items = xfs_scrub_estimate_ag_work(ctx);
+	*nr_threads = scrub_nproc(ctx);
+	*rshift = 0;
+	return true;
+}
diff --git a/scrub/phase3.c b/scrub/phase3.c
index b3fc510..43697c6 100644
--- a/scrub/phase3.c
+++ b/scrub/phase3.c
@@ -30,6 +30,7 @@
 #include "common.h"
 #include "counter.h"
 #include "inodes.h"
+#include "progress.h"
 #include "scrub.h"
 
 /* Phase 3: Scan all inodes. */
@@ -116,6 +117,7 @@ xfs_scrub_inode(
 
 out:
 	ptcounter_add(icount, 1);
+	progress_add(1);
 	if (fd >= 0)
 		close(fd);
 	if (!moveon)
@@ -150,3 +152,17 @@ xfs_scan_inodes(
 	ptcounter_free(ictx.icount);
 	return ictx.moveon;
 }
+
+/* Estimate how much work we're going to do. */
+bool
+xfs_estimate_inodes_work(
+	struct scrub_ctx	*ctx,
+	uint64_t		*items,
+	unsigned int		*nr_threads,
+	int			*rshift)
+{
+	*items = ctx->mnt_sv.f_files - ctx->mnt_sv.f_ffree;
+	*nr_threads = scrub_nproc(ctx);
+	*rshift = 0;
+	return true;
+}
diff --git a/scrub/phase4.c b/scrub/phase4.c
index dadf4de..43a654a 100644
--- a/scrub/phase4.c
+++ b/scrub/phase4.c
@@ -31,6 +31,7 @@
 #include "workqueue.h"
 #include "xfs_scrub.h"
 #include "common.h"
+#include "progress.h"
 #include "scrub.h"
 #include "vfs.h"
 
@@ -44,9 +45,25 @@ xfs_repair_fs(
 	bool				moveon = true;
 
 	pthread_mutex_lock(&ctx->lock);
-	if (moveon && ctx->errors_found == 0)
+	if (moveon && ctx->errors_found == 0) {
 		fstrim(ctx);
+		progress_add(1);
+	}
 	pthread_mutex_unlock(&ctx->lock);
 
 	return moveon;
 }
+
+/* Estimate how much work we're going to do. */
+bool
+xfs_estimate_repair_work(
+	struct scrub_ctx	*ctx,
+	uint64_t		*items,
+	unsigned int		*nr_threads,
+	int			*rshift)
+{
+	*items = 1;
+	*nr_threads = 1;
+	*rshift = 0;
+	return true;
+}
diff --git a/scrub/phase5.c b/scrub/phase5.c
index 8b8aeed..1ec8313 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -34,6 +34,7 @@
 #include "xfs_scrub.h"
 #include "common.h"
 #include "inodes.h"
+#include "progress.h"
 #include "scrub.h"
 #include "unicrash.h"
 
@@ -287,6 +288,7 @@ xfs_scrub_connections(
 	}
 
 out:
+	progress_add(1);
 	if (fd >= 0)
 		close(fd);
 	if (!moveon)
diff --git a/scrub/phase6.c b/scrub/phase6.c
index 5ecb8dc..d349730 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -33,6 +33,7 @@
 #include "bitmap.h"
 #include "disk.h"
 #include "filemap.h"
+#include "fscounters.h"
 #include "inodes.h"
 #include "read_verify.h"
 #include "spacemap.h"
@@ -514,3 +515,30 @@ _("Could not create media verifier."));
 	ptvar_free(ve.rvstate);
 	return moveon;
 }
+
+/* Estimate how much work we're going to do. */
+bool
+xfs_estimate_verify_work(
+	struct scrub_ctx	*ctx,
+	uint64_t		*items,
+	unsigned int		*nr_threads,
+	int			*rshift)
+{
+	unsigned long long	d_blocks;
+	unsigned long long	d_bfree;
+	unsigned long long	r_blocks;
+	unsigned long long	r_bfree;
+	unsigned long long	f_files;
+	unsigned long long	f_free;
+	bool			moveon;
+
+	moveon = xfs_scan_estimate_blocks(ctx, &d_blocks, &d_bfree,
+				&r_blocks, &r_bfree, &f_files, &f_free);
+	if (!moveon)
+		return moveon;
+
+	*items = ((d_blocks - d_bfree) + (r_blocks - r_bfree)) << ctx->blocklog;
+	*nr_threads = disk_heads(ctx->datadev);
+	*rshift = 20;
+	return moveon;
+}
diff --git a/scrub/progress.c b/scrub/progress.c
new file mode 100644
index 0000000..30b2152
--- /dev/null
+++ b/scrub/progress.c
@@ -0,0 +1,222 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "libxfs.h"
+#include <stdio.h>
+#include <dirent.h>
+#include <pthread.h>
+#include <sys/statvfs.h>
+#include "../repair/threads.h"
+#include "path.h"
+#include "disk.h"
+#include "read_verify.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "counter.h"
+#include "progress.h"
+
+/*
+ * Progress Tracking
+ *
+ * For scrub phases that expect to take a long time, this facility uses
+ * the threaded counter and some phase/state information to report the
+ * progress of a particular phase to a stream.  Each phase that wants
+ * progress information needs to set up the tracker with an estimate of
+ * the work to be done and periodic updates when work items finish.  In
+ * return, the progress tracker will print a pretty progress bar and
+ * twiddle to a tty, or a raw numeric output compatible with fsck -C.
+ */
+struct progress_tracker {
+	FILE			*fp;
+	const char		*tag;
+	struct ptcounter	*ptc;
+	uint64_t		max;
+	unsigned int		phase;
+	int			rshift;
+	int			twiddle;
+	bool			isatty;
+	bool			terminate;
+	pthread_t		thread;
+
+	/* static state */
+	pthread_mutex_t		lock;
+	pthread_cond_t		wakeup;
+};
+
+static struct progress_tracker pt = {
+	.lock			= PTHREAD_MUTEX_INITIALIZER,
+	.wakeup			= PTHREAD_COND_INITIALIZER,
+};
+
+/* Add some progress. */
+void
+progress_add(
+	uint64_t		x)
+{
+	if (pt.fp)
+		ptcounter_add(pt.ptc, x);
+}
+
+static const char twiddles[] = "|/-\\";
+
+static void
+progress_report(
+	uint64_t		sum)
+{
+	char			buf[81];
+	int			tag_len;
+	int			num_len;
+	int			pbar_len;
+	int			plen;
+
+	if (!pt.fp)
+		return;
+
+	if (sum > pt.max)
+		sum = pt.max;
+
+	/* Emulate fsck machine-readable output (phase, cur, max, label) */
+	if (!pt.isatty) {
+		snprintf(buf, sizeof(buf), _("%u %"PRIu64" %"PRIu64" %s"),
+				pt.phase, sum, pt.max, pt.tag);
+		fprintf(pt.fp, "%s\n", buf);
+		fflush(pt.fp);
+		return;
+	}
+
+	/* Interactive twiddle progress bar. */
+	if (debug) {
+		num_len = snprintf(buf, sizeof(buf),
+				"%c %"PRIu64"/%"PRIu64" (%.1f%%)",
+				twiddles[pt.twiddle],
+				sum >> pt.rshift,
+				pt.max >> pt.rshift,
+				100.0 * sum / pt.max);
+	} else {
+		num_len = snprintf(buf, sizeof(buf),
+				"%c (%.1f%%)",
+				twiddles[pt.twiddle],
+				100.0 * sum / pt.max);
+	}
+	memmove(buf + sizeof(buf) - (num_len + 1), buf, num_len + 1);
+	tag_len = snprintf(buf, sizeof(buf), _("Phase %u: |"), pt.phase);
+	pbar_len = sizeof(buf) - (num_len + 1 + tag_len);
+	plen = (int)((double)pbar_len * sum / pt.max);
+	memset(buf + tag_len, '=', plen);
+	memset(buf + tag_len + plen, ' ', pbar_len - plen);
+	pt.twiddle = (pt.twiddle + 1) % 4;
+	fprintf(pt.fp, "%c%s\r%c", START_IGNORE, buf, END_IGNORE);
+	fflush(pt.fp);
+}
+
+#define NSEC_PER_SEC	(1000000000)
+static void *
+progress_report_thread(void *arg)
+{
+	struct timespec		abstime;
+	int			ret;
+
+	pthread_mutex_lock(&pt.lock);
+	while (1) {
+		/* Every half second. */
+		ret = clock_gettime(CLOCK_REALTIME, &abstime);
+		if (ret)
+			break;
+		abstime.tv_nsec += NSEC_PER_SEC / 2;
+		if (abstime.tv_nsec >= NSEC_PER_SEC) {
+			abstime.tv_sec++;
+			abstime.tv_nsec -= NSEC_PER_SEC;
+		}
+		pthread_cond_timedwait(&pt.wakeup, &pt.lock, &abstime);
+		if (pt.terminate)
+			break;
+		progress_report(ptcounter_value(pt.ptc));
+	}
+	pthread_mutex_unlock(&pt.lock);
+	return NULL;
+}
+
+/* End a phase of progress reporting. */
+void
+progress_end_phase(void)
+{
+	if (!pt.fp)
+		return;
+
+	pthread_mutex_lock(&pt.lock);
+	pt.terminate = true;
+	pthread_mutex_unlock(&pt.lock);
+	pthread_cond_broadcast(&pt.wakeup);
+	pthread_join(pt.thread, NULL);
+
+	progress_report(pt.max);
+	ptcounter_free(pt.ptc);
+	pt.max = 0;
+	pt.ptc = NULL;
+	if (pt.fp) {
+		fprintf(pt.fp, CLEAR_EOL);
+		fflush(pt.fp);
+	}
+	pt.fp = NULL;
+}
+
+/* Set ourselves up to report progress. */
+bool
+progress_init_phase(
+	struct scrub_ctx	*ctx,
+	FILE			*fp,
+	unsigned int		phase,
+	uint64_t		max,
+	int			rshift,
+	unsigned int		nr_threads)
+{
+	int			ret;
+
+	assert(pt.fp == NULL);
+	if (fp == NULL || max == 0) {
+		pt.fp = NULL;
+		return true;
+	}
+	pt.fp = fp;
+	pt.isatty = isatty(fileno(fp));
+	pt.tag = ctx->mntpoint;
+	pt.max = max;
+	pt.phase = phase;
+	pt.rshift = rshift;
+	pt.twiddle = 0;
+	pt.terminate = false;
+
+	pt.ptc = ptcounter_init(nr_threads);
+	if (!pt.ptc)
+		goto out_max;
+
+	ret = pthread_create(&pt.thread, NULL, progress_report_thread, NULL);
+	if (ret)
+		goto out_ptcounter;
+
+	return true;
+
+out_ptcounter:
+	ptcounter_free(pt.ptc);
+	pt.ptc = NULL;
+out_max:
+	pt.max = 0;
+	pt.fp = NULL;
+	return false;
+}
diff --git a/scrub/progress.h b/scrub/progress.h
new file mode 100644
index 0000000..29a3e83
--- /dev/null
+++ b/scrub/progress.h
@@ -0,0 +1,33 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_PROGRESS_H_
+#define XFS_SCRUB_PROGRESS_H_
+
+#define CLEAR_EOL	"\033[K"
+#define START_IGNORE	'\001'
+#define END_IGNORE	'\002'
+
+bool progress_init_phase(struct scrub_ctx *ctx, FILE *progress_fp,
+			 unsigned int phase, uint64_t max, int rshift,
+			 unsigned int nr_threads);
+void progress_end_phase(void);
+void progress_add(uint64_t x);
+
+#endif /* XFS_SCRUB_PROGRESS_H_ */
diff --git a/scrub/read_verify.c b/scrub/read_verify.c
index 244626d..e816688 100644
--- a/scrub/read_verify.c
+++ b/scrub/read_verify.c
@@ -31,6 +31,7 @@
 #include "counter.h"
 #include "disk.h"
 #include "read_verify.h"
+#include "progress.h"
 
 /*
  * Read Verify Pool
@@ -154,6 +155,7 @@ read_verify(
 					errno, rv->io_end_arg);
 		}
 
+		progress_add(len);
 		verified += len;
 		rv->io_start += len;
 		rv->io_length -= len;
diff --git a/scrub/scrub.c b/scrub/scrub.c
index 98e7e0d..bc4eab4 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -31,6 +31,7 @@
 #include "path.h"
 #include "xfs_scrub.h"
 #include "common.h"
+#include "progress.h"
 #include "scrub.h"
 #include "xfs_errortag.h"
 
@@ -343,6 +344,7 @@ xfs_scrub_metadata(
 
 		/* Check the item. */
 		fix = xfs_check_metadata(ctx, ctx->mnt_fd, &meta, false);
+		progress_add(1);
 		switch (fix) {
 		case CHECK_ABORT:
 			return false;
@@ -416,6 +418,32 @@ xfs_scrub_fs_metadata(
 	return xfs_scrub_metadata(ctx, ST_FS, 0);
 }
 
+/* How many items do we have to check? */
+unsigned int
+xfs_scrub_estimate_ag_work(
+	struct scrub_ctx		*ctx)
+{
+	const struct scrub_descr	*sc;
+	int				type;
+	unsigned int			estimate = 0;
+
+	sc = scrubbers;
+	for (type = 0; type < XFS_SCRUB_TYPE_NR; type++, sc++) {
+		switch (sc->type) {
+		case ST_AGHEADER:
+		case ST_PERAG:
+			estimate += ctx->geo.agcount;
+			break;
+		case ST_FS:
+			estimate++;
+			break;
+		default:
+			break;
+		}
+	}
+	return estimate;
+}
+
 /* Scrub inode metadata. */
 static bool
 __xfs_scrub_file(
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 7809431..5750108 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -32,6 +32,7 @@
 #include "xfs_scrub.h"
 #include "common.h"
 #include "unicrash.h"
+#include "progress.h"
 
 /*
  * XFS Online Metadata Scrub (and Repair)
@@ -139,12 +140,17 @@ bool				scrub_data;
 /* Size of a memory page. */
 long				page_size;
 
+/* If stdout/stderr are ttys, we can use richer terminal control. */
+bool				stderr_isatty;
+bool				stdout_isatty;
+
 static void __attribute__((noreturn))
 usage(void)
 {
 	fprintf(stderr, _("Usage: %s [OPTIONS] mountpoint\n"), progname);
 	fprintf(stderr, _("-a:\tStop after this many errors are found.\n"));
 	fprintf(stderr, _("-b:\tBackground mode.\n"));
+	fprintf(stderr, _("-C:\tPrint progress information to this fd.\n"));
 	fprintf(stderr, _("-e:\tWhat to do if errors are found.\n"));
 	fprintf(stderr, _("-m:\tPath to /etc/mtab.\n"));
 	fprintf(stderr, _("-n:\tDry run.  Do not modify anything.\n"));
@@ -219,6 +225,8 @@ struct phase_rusage {
 struct phase_ops {
 	char		*descr;
 	bool		(*fn)(struct scrub_ctx *);
+	bool		(*estimate_work)(struct scrub_ctx *, uint64_t *,
+					 unsigned int *, int *);
 	bool		must_run;
 };
 
@@ -357,7 +365,8 @@ _("Errors found, please re-run with -y."));
 /* Run all the phases of the scrubber. */
 static bool
 run_scrub_phases(
-	struct scrub_ctx	*ctx)
+	struct scrub_ctx	*ctx,
+	FILE			*progress_fp)
 {
 	struct phase_ops phases[] =
 	{
@@ -369,22 +378,27 @@ run_scrub_phases(
 		{
 			.descr = _("Check internal metadata."),
 			.fn = xfs_scan_metadata,
+			.estimate_work = xfs_estimate_metadata_work,
 		},
 		{
 			.descr = _("Scan all inodes."),
 			.fn = xfs_scan_inodes,
+			.estimate_work = xfs_estimate_inodes_work,
 		},
 		{
 			.descr = _("Defer filesystem repairs."),
 			.fn = REPAIR_DUMMY_FN,
+			.estimate_work = xfs_estimate_repair_work,
 		},
 		{
 			.descr = _("Check directory tree."),
 			.fn = xfs_scan_connections,
+			.estimate_work = xfs_estimate_inodes_work,
 		},
 		{
 			.descr = _("Verify data file integrity."),
 			.fn = DATASCAN_DUMMY_FN,
+			.estimate_work = xfs_estimate_verify_work,
 		},
 		{
 			.descr = _("Check summary counters."),
@@ -397,9 +411,12 @@ run_scrub_phases(
 	};
 	struct phase_rusage	pi;
 	struct phase_ops	*sp;
+	uint64_t		max_work;
 	bool			moveon = true;
 	unsigned int		debug_phase = 0;
 	unsigned int		phase;
+	unsigned int		nr_threads;
+	int			rshift;
 
 	if (debug && debug_tweak_on("XFS_SCRUB_PHASE"))
 		debug_phase = atoi(getenv("XFS_SCRUB_PHASE"));
@@ -433,6 +450,18 @@ run_scrub_phases(
 		moveon = phase_start(&pi, phase, sp->descr);
 		if (!moveon)
 			break;
+		if (sp->estimate_work) {
+			moveon = sp->estimate_work(ctx, &max_work, &nr_threads,
+					&rshift);
+			if (!moveon)
+				break;
+			moveon = progress_init_phase(ctx, progress_fp, phase,
+					max_work, rshift, nr_threads);
+		} else {
+			moveon = progress_init_phase(ctx, NULL, phase, 0, 0, 0);
+		}
+		if (!moveon)
+			break;
 		moveon = sp->fn(ctx);
 		if (!moveon) {
 			str_info(ctx, ctx->mntpoint,
@@ -440,6 +469,7 @@ _("Scrub aborted after phase %d."),
 					phase);
 			break;
 		}
+		progress_end_phase();
 		moveon = phase_end(&pi, phase);
 		if (!moveon)
 			break;
@@ -461,6 +491,7 @@ main(
 	int			c;
 	char			*mtab = NULL;
 	char			*repairstr = "";
+	FILE			*progress_fp = NULL;
 	struct scrub_ctx	ctx = {0};
 	struct phase_rusage	all_pi;
 	unsigned long long	total_errors;
@@ -477,7 +508,7 @@ main(
 	pthread_mutex_init(&ctx.lock, NULL);
 	ctx.mode = SCRUB_MODE_DEFAULT;
 	ctx.error_action = ERRORS_CONTINUE;
-	while ((c = getopt(argc, argv, "a:bde:m:nTvxVy")) != EOF) {
+	while ((c = getopt(argc, argv, "a:bC:de:m:nTvxVy")) != EOF) {
 		switch (c) {
 		case 'a':
 			ctx.max_errors = cvt_u64(optarg, 10);
@@ -490,6 +521,19 @@ main(
 			nr_threads = 1;
 			bg_mode++;
 			break;
+		case 'C':
+			errno = 0;
+			ret = cvt_u32(optarg, 10);
+			if (errno) {
+				perror(optarg);
+				usage();
+			}
+			progress_fp = fdopen(ret, "w");
+			if (!progress_fp) {
+				perror(optarg);
+				usage();
+			}
+			break;
 		case 'd':
 			debug++;
 			dumpcore = true;
@@ -560,6 +604,13 @@ _("Only one of the options -n or -y may be specified.\n"));
 	unicrash_setup();
 	ctx.mntpoint = strdup(argv[optind]);
 
+	stdout_isatty = isatty(STDOUT_FILENO);
+	stderr_isatty = isatty(STDERR_FILENO);
+
+	/* If interactive, start the progress bar. */
+	if (stdout_isatty && !progress_fp)
+		progress_fp = fdopen(1, "w+");
+
 	/* Find the mount record for the passed-in argument. */
 	if (stat(argv[optind], &ctx.mnt_sb) < 0) {
 		fprintf(stderr,
@@ -615,7 +666,7 @@ _("%s: Not a XFS mount point or block device.\n"),
 	}
 
 	/* Scrub a filesystem. */
-	moveon = run_scrub_phases(&ctx);
+	moveon = run_scrub_phases(&ctx, progress_fp);
 	if (!moveon)
 		ret |= 4;
 
@@ -657,6 +708,8 @@ _("%s: %llu warnings found.\n"),
 	if (ctx.runtime_errors)
 		ret |= 4;
 	phase_end(&all_pi, 0);
+	if (progress_fp)
+		fclose(progress_fp);
 	free(ctx.blkdev);
 	free(ctx.mntpoint);
 
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 4a383f1..cda290c 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -31,6 +31,8 @@ extern bool			dumpcore;
 extern bool			verbose;
 extern bool			scrub_data;
 extern long			page_size;
+extern bool			stderr_isatty;
+extern bool			stdout_isatty;
 
 enum scrub_mode {
 	SCRUB_MODE_DRY_RUN,
@@ -110,4 +112,16 @@ bool xfs_scan_blocks(struct scrub_ctx *ctx);
 bool xfs_scan_summary(struct scrub_ctx *ctx);
 bool xfs_repair_fs(struct scrub_ctx *ctx);
 
+/* Progress estimator functions */
+uint64_t xfs_estimate_inodes(struct scrub_ctx *ctx);
+unsigned int xfs_scrub_estimate_ag_work(struct scrub_ctx *ctx);
+bool xfs_estimate_metadata_work(struct scrub_ctx *ctx, uint64_t *items,
+				unsigned int *nr_threads, int *rshift);
+bool xfs_estimate_inodes_work(struct scrub_ctx *ctx, uint64_t *items,
+			      unsigned int *nr_threads, int *rshift);
+bool xfs_estimate_repair_work(struct scrub_ctx *ctx, uint64_t *items,
+			      unsigned int *nr_threads, int *rshift);
+bool xfs_estimate_verify_work(struct scrub_ctx *ctx, uint64_t *items,
+			      unsigned int *nr_threads, int *rshift);
+
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */



* [PATCH 26/27] xfs_scrub: create a script to scrub all xfs filesystems
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (24 preceding siblings ...)
  2018-01-06  1:54 ` [PATCH 25/27] xfs_scrub: progress indicator Darrick J. Wong
@ 2018-01-06  1:54 ` Darrick J. Wong
  2018-01-06  1:54 ` [PATCH 27/27] xfs_scrub: integrate services with systemd Darrick J. Wong
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:54 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create an xfs_scrub_all command to find all XFS filesystems
and run an online scrub against them all.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 debian/control           |    3 +
 debian/rules             |    1 
 man/man8/xfs_scrub_all.8 |   32 ++++++++++
 scrub/Makefile           |   15 ++++
 scrub/xfs_scrub_all.in   |  154 ++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 201 insertions(+), 4 deletions(-)
 create mode 100644 man/man8/xfs_scrub_all.8
 create mode 100644 scrub/xfs_scrub_all.in
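
Note for reviewers: the scheduling rule used by xfs_scrub_all (start a
scrub only when none of its backing disks is already being scrubbed) can
be sketched in isolation as below.  This is a minimal sketch, not the
script itself: pick_runnable is a hypothetical helper, the mount-to-disk
map is made-up sample data, and mounts are visited in sorted order only
to keep the result deterministic.

```python
def pick_runnable(fs, running_devs):
    """Return the mounts that may start now.

    fs maps mountpoint -> set of disk names (hypothetical sample shape).
    running_devs is the set of disks currently being scrubbed; it is
    updated in place as mounts are claimed.
    """
    started = []
    for mnt in sorted(fs):
        devs = fs[mnt]
        # A mount may run only if it shares no disk with a running scrub.
        if devs.isdisjoint(running_devs):
            running_devs.update(devs)
            started.append(mnt)
    return started


if __name__ == '__main__':
    sample = {'/a': {'sda'}, '/b': {'sda', 'sdb'}, '/c': {'sdc'}}
    busy = set()
    print(pick_runnable(sample, busy))
```

In the sample, /b has to wait because it shares sda with /a, while /c
can start immediately on its own disk.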


diff --git a/debian/control b/debian/control
index 36d1bd8..801744b 100644
--- a/debian/control
+++ b/debian/control
@@ -3,12 +3,13 @@ Section: admin
 Priority: optional
 Maintainer: XFS Development Team <linux-xfs@vger.kernel.org>
 Uploaders: Nathan Scott <nathans@debian.org>, Anibal Monsalve Salazar <anibal@debian.org>
-Build-Depends: uuid-dev, dh-autoreconf, debhelper (>= 5), gettext, libtool, libreadline-gplv2-dev | libreadline5-dev, libblkid-dev (>= 2.17), linux-libc-dev, libdevmapper-dev, libattr1-dev, libunistring-dev
+Build-Depends: uuid-dev, dh-autoreconf, debhelper (>= 5), gettext, libtool, libreadline-gplv2-dev | libreadline5-dev, libblkid-dev (>= 2.17), linux-libc-dev, libdevmapper-dev, libattr1-dev, libunistring-dev, dh-python
 Standards-Version: 3.9.1
 Homepage: https://xfs.wiki.kernel.org/
 
 Package: xfsprogs
 Depends: ${shlibs:Depends}, ${misc:Depends}
+Recommends: ${python3:Depends}, util-linux
 Provides: fsck-backend
 Suggests: xfsdump, acl, attr, quota
 Breaks: xfsdump (<< 3.0.0)
diff --git a/debian/rules b/debian/rules
index baefdba..abb794e 100755
--- a/debian/rules
+++ b/debian/rules
@@ -76,6 +76,7 @@ binary-arch: checkroot built
 	$(pkgdi)  $(MAKE) -C debian install-d-i
 	$(pkgme)  $(MAKE) dist
 	rmdir debian/xfslibs-dev/usr/share/doc/xfsprogs
+	dh_python3
 	dh_installdocs
 	dh_installchangelogs
 	dh_strip
diff --git a/man/man8/xfs_scrub_all.8 b/man/man8/xfs_scrub_all.8
new file mode 100644
index 0000000..5e1420b
--- /dev/null
+++ b/man/man8/xfs_scrub_all.8
@@ -0,0 +1,32 @@
+.TH xfs_scrub_all 8
+.SH NAME
+xfs_scrub_all \- scrub all mounted XFS filesystems
+.SH SYNOPSIS
+.B xfs_scrub_all
+.SH DESCRIPTION
+.B xfs_scrub_all
+attempts to read and check all the metadata on all mounted XFS filesystems.
+The online scrub is performed via the
+.B xfs_scrub
+tool, either by running it directly or by using systemd to start it
+in a restricted fashion.
+Mounted filesystems are mapped to physical storage devices so that scrub
+operations can be run in parallel so long as no two scrubbers access
+the same device simultaneously.
+.SH EXIT CODE
+The exit code returned by
+.B xfs_scrub_all
+is the sum of the following conditions:
+.br
+\	0\	\-\ No errors
+.br
+\	4\	\-\ File system errors left uncorrected
+.br
+\	8\	\-\ Operational error
+.br
+\	16\	\-\ Usage or syntax error
+.PP
+These are the same error codes returned by xfs_scrub.
+.br
+.SH SEE ALSO
+.BR xfs_scrub (8).
diff --git a/scrub/Makefile b/scrub/Makefile
index 7a80ff6..f709606 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -13,6 +13,8 @@ SCRUB_PREREQS=$(PKG_PLATFORM)$(HAVE_OPENAT)$(HAVE_FSTATAT)
 ifeq ($(SCRUB_PREREQS),linuxyesyes)
 LTCOMMAND = xfs_scrub
 INSTALL_SCRUB = install-scrub
+XFS_SCRUB_ALL_PROG = xfs_scrub_all
+XFS_SCRUB_ARGS = -b -n
 endif	# scrub_prereqs
 
 HFILES = \
@@ -82,17 +84,24 @@ ifeq ($(HAVE_HDIO_GETGEO),yes)
 LCFLAGS += -DHAVE_HDIO_GETGEO
 endif
 
-default: depend $(LTCOMMAND)
+default: depend $(LTCOMMAND) $(XFS_SCRUB_ALL_PROG)
+
+xfs_scrub_all: xfs_scrub_all.in
+	@echo "    [SED]    $@"
+	$(Q)$(SED) -e "s|@sbindir@|$(PKG_ROOT_SBIN_DIR)|g" \
+		   -e "s|@scrub_args@|$(XFS_SCRUB_ARGS)|g" < $< > $@
+	$(Q)chmod a+x $@
 
 phase5.o unicrash.o xfs.o: $(TOPDIR)/include/builddefs
 
 include $(BUILDRULES)
 
-install: default $(INSTALL_SCRUB)
+install: $(INSTALL_SCRUB)
 
-install-scrub:
+install-scrub: default
 	$(INSTALL) -m 755 -d $(PKG_ROOT_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_ROOT_SBIN_DIR)
+	$(INSTALL) -m 755 $(XFS_SCRUB_ALL_PROG) $(PKG_ROOT_SBIN_DIR)
 
 install-dev:
 
diff --git a/scrub/xfs_scrub_all.in b/scrub/xfs_scrub_all.in
new file mode 100644
index 0000000..7738644
--- /dev/null
+++ b/scrub/xfs_scrub_all.in
@@ -0,0 +1,154 @@
+#!/usr/bin/env python3
+
+# Run online scrubbers in parallel, but avoid thrashing.
+#
+# Copyright (C) 2018 Oracle.  All rights reserved.
+#
+# Author: Darrick J. Wong <darrick.wong@oracle.com>
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+
+import subprocess
+import json
+import threading
+import time
+import sys
+
+retcode = 0
+terminate = False
+
+def find_mounts():
+	'''Map mountpoints to physical disks.'''
+
+	fs = {}
+	cmd=['lsblk', '-o', 'KNAME,TYPE,FSTYPE,MOUNTPOINT', '-J']
+	result = subprocess.Popen(cmd, stdout=subprocess.PIPE)
+	result.wait()
+	if result.returncode != 0:
+		return fs
+	sarray = [x.decode('utf-8') for x in result.stdout.readlines()]
+	output = ' '.join(sarray)
+	bdevdata = json.loads(output)
+	# The lsblk output had better be in disks-then-partitions order
+	for bdev in bdevdata['blockdevices']:
+		if bdev['type'] in ('disk', 'loop'):
+			lastdisk = bdev['kname']
+		if bdev['fstype'] == 'xfs':
+			mnt = bdev['mountpoint']
+			if mnt is None:
+				continue
+			if mnt in fs:
+				fs[mnt].add(lastdisk)
+			else:
+				fs[mnt] = set([lastdisk])
+	return fs
+
+def run_killable(cmd, stdout, killfuncs, kill_fn):
+	'''Run a killable program.  Returns program retcode or -1 if we can't start it.'''
+	try:
+		proc = subprocess.Popen(cmd, stdout = stdout)
+		real_kill_fn = lambda: kill_fn(proc)
+		killfuncs.add(real_kill_fn)
+		proc.wait()
+		try:
+			killfuncs.remove(real_kill_fn)
+		except:
+			pass
+		return proc.returncode
+	except:
+		return -1
+
+def run_scrub(mnt, cond, running_devs, mntdevs, killfuncs):
+	'''Run a scrub process.'''
+	global retcode, terminate
+
+	print("Scrubbing %s..." % mnt)
+	sys.stdout.flush()
+
+	try:
+		if terminate:
+			return
+
+		# Invoke xfs_scrub manually
+		cmd=['@sbindir@/xfs_scrub', '@scrub_args@', mnt]
+		ret = run_killable(cmd, None, killfuncs, \
+				lambda proc: proc.terminate())
+		if ret >= 0:
+			print("Scrubbing %s done, (err=%d)" % (mnt, ret))
+			sys.stdout.flush()
+			retcode |= ret
+			return
+
+		if terminate:
+			return
+
+		print("Unable to start scrub tool.")
+		sys.stdout.flush()
+	finally:
+		running_devs -= mntdevs
+		cond.acquire()
+		cond.notify()
+		cond.release()
+
+def main():
+	'''Find mounts, schedule scrub runs.'''
+	def thr(mnt, devs):
+		a = (mnt, cond, running_devs, devs, killfuncs)
+		thr = threading.Thread(target = run_scrub, args = a)
+		thr.start()
+	global retcode, terminate
+
+	fs = find_mounts()
+
+	# Schedule scrub jobs...
+	running_devs = set()
+	killfuncs = set()
+	cond = threading.Condition()
+	while len(fs) > 0:
+		if len(running_devs) == 0:
+			mnt, devs = fs.popitem()
+			running_devs.update(devs)
+			thr(mnt, devs)
+		poppers = set()
+		for mnt in fs:
+			devs = fs[mnt]
+			can_run = True
+			for dev in devs:
+				if dev in running_devs:
+					can_run = False
+					break
+			if can_run:
+				running_devs.update(devs)
+				poppers.add(mnt)
+				thr(mnt, devs)
+		for p in poppers:
+			fs.pop(p)
+		cond.acquire()
+		try:
+			cond.wait()
+		except KeyboardInterrupt:
+			terminate = True
+			print("Terminating...")
+			sys.stdout.flush()
+			while len(killfuncs) > 0:
+				fn = killfuncs.pop()
+				fn()
+			fs = []
+		cond.release()
+
+	sys.exit(retcode)
+
+if __name__ == '__main__':
+	main()



* [PATCH 27/27] xfs_scrub: integrate services with systemd
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (25 preceding siblings ...)
  2018-01-06  1:54 ` [PATCH 26/27] xfs_scrub: create a script to scrub all xfs filesystems Darrick J. Wong
@ 2018-01-06  1:54 ` Darrick J. Wong
  2018-01-06  3:50 ` [PATCH 07/27] xfs_scrub: find XFS filesystem geometry Darrick J. Wong
                   ` (3 subsequent siblings)
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  1:54 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create a systemd service unit so that we can run the online scrubber
under systemd with (somewhat) appropriate containment.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 .gitignore                       |    4 +++
 configure.ac                     |   15 +++++++++++
 include/builddefs.in             |    3 ++
 scrub/Makefile                   |   32 ++++++++++++++++++++++-
 scrub/xfs_scrub.c                |   25 ++++++++++++++++++
 scrub/xfs_scrub@.service.in      |   18 +++++++++++++
 scrub/xfs_scrub_all.cron.in      |    2 +
 scrub/xfs_scrub_all.in           |   53 ++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub_all.service.in   |    8 ++++++
 scrub/xfs_scrub_all.timer        |   11 ++++++++
 scrub/xfs_scrub_fail             |   26 +++++++++++++++++++
 scrub/xfs_scrub_fail@.service.in |   10 +++++++
 12 files changed, 206 insertions(+), 1 deletion(-)
 create mode 100644 scrub/xfs_scrub@.service.in
 create mode 100644 scrub/xfs_scrub_all.cron.in
 create mode 100644 scrub/xfs_scrub_all.service.in
 create mode 100644 scrub/xfs_scrub_all.timer
 create mode 100755 scrub/xfs_scrub_fail
 create mode 100644 scrub/xfs_scrub_fail@.service.in
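
For illustration only, the exit-code convention this patch applies when
SERVICE_MODE is set can be sketched as follows.  service_exit_code is a
hypothetical helper, not a function in the patch, and the two-second
delay that lets journald catch the summary output is left out.

```python
def service_exit_code(ret, is_service):
    """Map a scrub return code to the code reported to the caller.

    Nonzero codes are bumped by 150 in service mode so that they do not
    collide with (sysvinit) service return codes; zero stays zero.
    """
    if is_service and ret:
        return ret + 150
    return ret


if __name__ == '__main__':
    for ret in (0, 4):
        print(ret, service_exit_code(ret, True), service_exit_code(ret, False))
```

A clean run still exits with 0 under systemd, so only failures are
shifted into the 150+ range.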


diff --git a/.gitignore b/.gitignore
index a3db640..d887451 100644
--- a/.gitignore
+++ b/.gitignore
@@ -69,6 +69,10 @@ cscope.*
 /rtcp/xfs_rtcp
 /spaceman/xfs_spaceman
 /scrub/xfs_scrub
+/scrub/xfs_scrub@.service
+/scrub/xfs_scrub_all
+/scrub/xfs_scrub_all.service
+/scrub/xfs_scrub_fail@.service
 
 # generated crc files
 /libxfs/crc32selftest
diff --git a/configure.ac b/configure.ac
index bb032e5..f7840db 100644
--- a/configure.ac
+++ b/configure.ac
@@ -121,6 +121,21 @@ esac
 AC_SUBST([root_sbindir])
 AC_SUBST([root_libdir])
 
+# Where do systemd services go?
+pkg_systemdsystemunitdir="$(pkg-config --variable=systemdsystemunitdir systemd 2>/dev/null)"
+case "${pkg_systemdsystemunitdir}" in
+"")
+	systemdsystemunitdir=""
+	have_systemd=no
+	;;
+*)
+	systemdsystemunitdir="${pkg_systemdsystemunitdir}"
+	have_systemd=yes
+	;;
+esac
+AC_SUBST([have_systemd])
+AC_SUBST([systemdsystemunitdir])
+
 # Find localized files.  Don't descend into any "dot directories"
 # (like .git or .pc from quilt).  Strangely, the "-print" argument
 # to "find" is required, to avoid including such directories in the
diff --git a/include/builddefs.in b/include/builddefs.in
index d44faf9..4b4bf41 100644
--- a/include/builddefs.in
+++ b/include/builddefs.in
@@ -128,6 +128,9 @@ HAVE_FSTATAT = @have_fstatat@
 HAVE_SG_IO = @have_sg_io@
 HAVE_HDIO_GETGEO = @have_hdio_getgeo@
 
+HAVE_SYSTEMD = @have_systemd@
+SYSTEMDSYSTEMUNITDIR = @systemdsystemunitdir@
+
 GCCFLAGS = -funsigned-char -fno-strict-aliasing -Wall
 #	   -Wbitwise -Wno-transparent-union -Wno-old-initializer -Wno-decl
 
diff --git a/scrub/Makefile b/scrub/Makefile
index f709606..3e6f690 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -15,6 +15,16 @@ LTCOMMAND = xfs_scrub
 INSTALL_SCRUB = install-scrub
 XFS_SCRUB_ALL_PROG = xfs_scrub_all
 XFS_SCRUB_ARGS = -b -n
+ifeq ($(HAVE_SYSTEMD),yes)
+INSTALL_SCRUB += install-systemd
+SYSTEMDSERVICES = xfs_scrub@.service xfs_scrub_all.service xfs_scrub_all.timer xfs_scrub_fail@.service
+endif
+CRONSERVICES = xfs_scrub_all.cron
+CROND_DIR = /etc/cron.d
+
+# Disable all the crontabs for now
+CROND_DIR = $(PKG_LIB_DIR)/$(PKG_NAME)
+
 endif	# scrub_prereqs
 
 HFILES = \
@@ -84,7 +94,8 @@ ifeq ($(HAVE_HDIO_GETGEO),yes)
 LCFLAGS += -DHAVE_HDIO_GETGEO
 endif
 
-default: depend $(LTCOMMAND) $(XFS_SCRUB_ALL_PROG)
+default: depend $(LTCOMMAND) $(XFS_SCRUB_ALL_PROG) $(SYSTEMDSERVICES) \
+	$(CRONSERVICES)
 
 xfs_scrub_all: xfs_scrub_all.in
 	@echo "    [SED]    $@"
@@ -98,10 +109,29 @@ include $(BUILDRULES)
 
 install: $(INSTALL_SCRUB)
 
+%.service: %.service.in
+	@echo "    [SED]    $@"
+	$(Q)$(SED) -e "s|@sbindir@|$(PKG_ROOT_SBIN_DIR)|g" \
+		   -e "s|@scrub_args@|$(XFS_SCRUB_ARGS)|g" \
+		   -e "s|@pkg_lib_dir@|$(PKG_LIB_DIR)|g" \
+		   -e "s|@pkg_name@|$(PKG_NAME)|g" < $< > $@
+
+%.cron: %.cron.in
+	@echo "    [SED]    $@"
+	$(Q)$(SED) -e "s|@sbindir@|$(PKG_ROOT_SBIN_DIR)|g" < $< > $@
+
+install-systemd: default
+	$(INSTALL) -m 755 -d $(SYSTEMDSYSTEMUNITDIR)
+	$(INSTALL) -m 644 $(SYSTEMDSERVICES) $(SYSTEMDSYSTEMUNITDIR)
+	$(INSTALL) -m 755 -d $(PKG_LIB_DIR)/$(PKG_NAME)
+	$(INSTALL) -m 755 xfs_scrub_fail $(PKG_LIB_DIR)/$(PKG_NAME)
+
 install-scrub: default
 	$(INSTALL) -m 755 -d $(PKG_ROOT_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_ROOT_SBIN_DIR)
 	$(INSTALL) -m 755 $(XFS_SCRUB_ALL_PROG) $(PKG_ROOT_SBIN_DIR)
+	$(INSTALL) -m 755 -d $(CROND_DIR)
+	$(INSTALL) -m 644 $(CRONSERVICES) $(CROND_DIR)
 
 install-dev:
 
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 5750108..66c64a4 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -144,6 +144,12 @@ long				page_size;
 bool				stderr_isatty;
 bool				stdout_isatty;
 
+/*
+ * If we are running as a service, we need to be careful about what
+ * error codes we return to the calling process.
+ */
+bool				is_service;
+
 static void __attribute__((noreturn))
 usage(void)
 {
@@ -611,6 +617,9 @@ _("Only one of the options -n or -y may be specified.\n"));
 	if (stdout_isatty && !progress_fp)
 		progress_fp = fdopen(1, "w+");
 
+	if (getenv("SERVICE_MODE"))
+		is_service = true;
+
 	/* Find the mount record for the passed-in argument. */
 	if (stat(argv[optind], &ctx.mnt_sb) < 0) {
 		fprintf(stderr,
@@ -713,5 +722,21 @@ _("%s: %llu warnings found.\n"),
 	free(ctx.blkdev);
 	free(ctx.mntpoint);
 
+	/*
+	 * If we're running as a service, bump return code up by 150 to
+	 * avoid conflicting with (sysvinit) service return codes.
+	 */
+	if (is_service) {
+		/*
+		 * journald queries /proc as part of taking in log
+		 * messages; it uses this information to associate the
+		 * message with systemd units, etc.  This races with
+		 * process exit, so delay that a couple of seconds so
+		 * that we capture the summary outputs in the job log.
+		 */
+		sleep(2);
+		if (ret)
+			ret += 150;
+	}
 	return ret;
 }
diff --git a/scrub/xfs_scrub@.service.in b/scrub/xfs_scrub@.service.in
new file mode 100644
index 0000000..6b6992d
--- /dev/null
+++ b/scrub/xfs_scrub@.service.in
@@ -0,0 +1,18 @@
+[Unit]
+Description=Online XFS Metadata Check for %I
+OnFailure=xfs_scrub_fail@%i.service
+
+[Service]
+Type=oneshot
+WorkingDirectory=%I
+PrivateNetwork=true
+ProtectSystem=full
+ProtectHome=read-only
+PrivateTmp=yes
+AmbientCapabilities=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO
+NoNewPrivileges=yes
+User=nobody
+IOSchedulingClass=idle
+CPUSchedulingPolicy=idle
+Environment=SERVICE_MODE=1
+ExecStart=@sbindir@/xfs_scrub @scrub_args@ %I
diff --git a/scrub/xfs_scrub_all.cron.in b/scrub/xfs_scrub_all.cron.in
new file mode 100644
index 0000000..ec82236
--- /dev/null
+++ b/scrub/xfs_scrub_all.cron.in
@@ -0,0 +1,2 @@
+SERVICE_MODE=1
+10 3 * * 0 root test -e /run/systemd/system || @sbindir@/xfs_scrub_all
diff --git a/scrub/xfs_scrub_all.in b/scrub/xfs_scrub_all.in
index 7738644..27cdc32 100644
--- a/scrub/xfs_scrub_all.in
+++ b/scrub/xfs_scrub_all.in
@@ -25,10 +25,19 @@ import json
 import threading
 import time
 import sys
+import os
 
 retcode = 0
 terminate = False
 
+def DEVNULL():
+	'''Return /dev/null in subprocess writable format.'''
+	try:
+		from subprocess import DEVNULL
+		return DEVNULL
+	except ImportError:
+		return open(os.devnull, 'wb')
+
 def find_mounts():
 	'''Map mountpoints to physical disks.'''
 
@@ -55,6 +64,13 @@ def find_mounts():
 				fs[mnt] = set([lastdisk])
 	return fs
 
+def kill_systemd(unit, proc):
+	'''Kill systemd unit.'''
+	proc.terminate()
+	cmd=['systemctl', 'stop', unit]
+	x = subprocess.Popen(cmd)
+	x.wait()
+
 def run_killable(cmd, stdout, killfuncs, kill_fn):
 	'''Run a killable program.  Returns program retcode or -1 if we can't start it.'''
 	try:
@@ -81,6 +97,19 @@ def run_scrub(mnt, cond, running_devs, mntdevs, killfuncs):
 		if terminate:
 			return
 
+		# Try it the systemd way
+		cmd=['systemctl', 'start', 'xfs_scrub@%s' % mnt]
+		ret = run_killable(cmd, DEVNULL(), killfuncs, \
+				lambda proc: kill_systemd('xfs_scrub@%s' % mnt, proc))
+		if ret == 0 or ret == 1:
+			print("Scrubbing %s done, (err=%d)" % (mnt, ret))
+			sys.stdout.flush()
+			retcode |= ret
+			return
+
+		if terminate:
+			return
+
 		# Invoke xfs_scrub manually
 		cmd=['@sbindir@/xfs_scrub', '@scrub_args@', mnt]
 		ret = run_killable(cmd, None, killfuncs, \
@@ -112,6 +141,17 @@ def main():
 
 	fs = find_mounts()
 
+	# Tail the journal if we ourselves aren't a service...
+	journalthread = None
+	if 'SERVICE_MODE' not in os.environ:
+		try:
+			cmd=['journalctl', '--no-pager', '-q', '-S', 'now', \
+					'-f', '-u', 'xfs_scrub@*', '-o', \
+					'cat']
+			journalthread = subprocess.Popen(cmd)
+		except:
+			pass
+
 	# Schedule scrub jobs...
 	running_devs = set()
 	killfuncs = set()
@@ -148,6 +188,19 @@ def main():
 			fs = []
 		cond.release()
 
+	if journalthread is not None:
+		journalthread.terminate()
+
+	# journald queries /proc as part of taking in log
+	# messages; it uses this information to associate the
+	# message with systemd units, etc.  This races with
+	# process exit, so delay that a couple of seconds so
+	# that we capture the summary outputs in the job log.
+	if 'SERVICE_MODE' in os.environ:
+		time.sleep(2)
+		if retcode:
+			retcode += 150
+
 	sys.exit(retcode)
 
 if __name__ == '__main__':
diff --git a/scrub/xfs_scrub_all.service.in b/scrub/xfs_scrub_all.service.in
new file mode 100644
index 0000000..683804e
--- /dev/null
+++ b/scrub/xfs_scrub_all.service.in
@@ -0,0 +1,8 @@
+[Unit]
+Description=Online XFS Metadata Check for All Filesystems
+ConditionACPower=true
+
+[Service]
+Type=oneshot
+Environment=SERVICE_MODE=1
+ExecStart=@sbindir@/xfs_scrub_all
diff --git a/scrub/xfs_scrub_all.timer b/scrub/xfs_scrub_all.timer
new file mode 100644
index 0000000..2e4a33b
--- /dev/null
+++ b/scrub/xfs_scrub_all.timer
@@ -0,0 +1,11 @@
+[Unit]
+Description=Periodic XFS Online Metadata Check for All Filesystems
+
+[Timer]
+# Run on Sunday at 3:10am, to avoid running afoul of DST changes
+OnCalendar=Sun *-*-* 03:10:00
+RandomizedDelaySec=60
+Persistent=true
+
+[Install]
+WantedBy=timers.target
diff --git a/scrub/xfs_scrub_fail b/scrub/xfs_scrub_fail
new file mode 100755
index 0000000..36dd50e
--- /dev/null
+++ b/scrub/xfs_scrub_fail
@@ -0,0 +1,26 @@
+#!/bin/bash
+
+# Email logs of failed xfs_scrub unit runs
+
+mailer=/usr/sbin/sendmail
+recipient="$1"
+test -z "${recipient}" && exit 0
+mntpoint="$2"
+test -z "${mntpoint}" && exit 0
+hostname="$(hostname -f 2>/dev/null)"
+test -z "${hostname}" && hostname="${HOSTNAME}"
+if [ ! -x "${mailer}" ]; then
+	echo "${mailer}: Mailer program not found."
+	exit 1
+fi
+
+(cat << ENDL
+To: $1
+From: <xfs_scrub@${hostname}>
+Subject: xfs_scrub failure on ${mntpoint}
+
+So sorry, the automatic xfs_scrub of ${mntpoint} on ${hostname} failed.
+
+A log of what happened follows:
+ENDL
+systemctl status --full --lines 4294967295 "xfs_scrub@${mntpoint}") | "${mailer}" -t -i
diff --git a/scrub/xfs_scrub_fail@.service.in b/scrub/xfs_scrub_fail@.service.in
new file mode 100644
index 0000000..785f881
--- /dev/null
+++ b/scrub/xfs_scrub_fail@.service.in
@@ -0,0 +1,10 @@
+[Unit]
+Description=Online XFS Metadata Check Failure Reporting for %I
+
+[Service]
+Type=oneshot
+Environment=EMAIL_ADDR=root
+ExecStart=@pkg_lib_dir@/@pkg_name@/xfs_scrub_fail "${EMAIL_ADDR}" %I
+User=mail
+Group=mail
+SupplementaryGroups=systemd-journal



* [PATCH 07/27] xfs_scrub: find XFS filesystem geometry
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (26 preceding siblings ...)
  2018-01-06  1:54 ` [PATCH 27/27] xfs_scrub: integrate services with systemd Darrick J. Wong
@ 2018-01-06  3:50 ` Darrick J. Wong
  2018-01-12  4:17 ` [PATCH v11 00/27] xfsprogs: online scrub/repair support Eric Sandeen
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-06  3:50 UTC (permalink / raw)
  To: sandeen; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Discover the geometry of the XFS filesystem that we've been told to
scan, and set up some common functions that will be used by the
scrub phases.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile    |    5 +
 scrub/common.c    |   72 +++++++++++++++++
 scrub/common.h    |   10 ++
 scrub/disk.c      |    3 +
 scrub/phase1.c    |  223 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.c |   35 ++++++++
 scrub/xfs_scrub.h |   29 +++++++
 7 files changed, 376 insertions(+), 1 deletion(-)
 create mode 100644 scrub/phase1.c
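
As a rough illustration of the mtab walk in find_mountpoint: every
entry is examined and the last matching XFS entry wins, on the theory
that newer mounts sit towards the end of the file.  The sketch below is
simplified and makes assumptions: the C code compares stat(2) inode and
device numbers, whereas this version compares plain strings, and
find_last_xfs_mount and the sample entries are hypothetical.

```python
def find_last_xfs_mount(entries, target):
    """Return (mnt_dir, mnt_fsname) of the last XFS entry matching target.

    entries is an iterable of (fsname, mnt_dir, mnt_type) tuples in mtab
    order; target may be either a device name or a mountpoint.  Returns
    None when nothing matches.
    """
    found = None
    for fsname, mnt_dir, mnt_type in entries:
        if mnt_type != 'xfs':
            continue
        if target in (fsname, mnt_dir):
            # Keep overwriting: later entries shadow earlier ones.
            found = (mnt_dir, fsname)
    return found


if __name__ == '__main__':
    mtab = [
        ('/dev/sda1', '/mnt', 'xfs'),
        ('/dev/sdb1', '/mnt', 'ext4'),
        ('/dev/sdc1', '/mnt', 'xfs'),
    ]
    print(find_last_xfs_mount(mtab, '/mnt'))
```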

diff --git a/scrub/Makefile b/scrub/Makefile
index c3a9986..5239dae 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -23,6 +23,7 @@ xfs_scrub.h
 CFILES = \
 common.c \
 disk.c \
+phase1.c \
 xfs_scrub.c
 
 LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD)
@@ -33,6 +34,10 @@ ifeq ($(HAVE_MALLINFO),yes)
 LCFLAGS += -DHAVE_MALLINFO
 endif
 
+ifeq ($(HAVE_SYNCFS),yes)
+LCFLAGS += -DHAVE_SYNCFS
+endif
+
 default: depend $(LTCOMMAND)
 
 include $(BUILDRULES)
diff --git a/scrub/common.c b/scrub/common.c
index 75c6df5..252809d 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -20,8 +20,11 @@
 #include <stdio.h>
 #include <pthread.h>
 #include <stdbool.h>
+#include <sys/statvfs.h>
 #include "platform_defs.h"
 #include "xfs.h"
+#include "xfs_fs.h"
+#include "path.h"
 #include "xfs_scrub.h"
 #include "common.h"
 
@@ -248,3 +251,72 @@ scrub_nproc_workqueue(
 		x = 0;
 	return x;
 }
+
+/*
+ * Check if the argument is either the device name or mountpoint of a mounted
+ * filesystem.
+ */
+#define MNTTYPE_XFS	"xfs"
+static bool
+find_mountpoint_check(
+	struct stat		*sb,
+	struct mntent		*t)
+{
+	struct stat		ms;
+
+	if (S_ISDIR(sb->st_mode)) {		/* mount point */
+		if (stat(t->mnt_dir, &ms) < 0)
+			return false;
+		if (sb->st_ino != ms.st_ino)
+			return false;
+		if (sb->st_dev != ms.st_dev)
+			return false;
+		if (strcmp(t->mnt_type, MNTTYPE_XFS) != 0)
+			return false;
+	} else {				/* device */
+		if (stat(t->mnt_fsname, &ms) < 0)
+			return false;
+		if (sb->st_rdev != ms.st_rdev)
+			return false;
+		if (strcmp(t->mnt_type, MNTTYPE_XFS) != 0)
+			return false;
+		/*
+		 * Make sure the mountpoint given by mtab is accessible
+		 * before using it.
+		 */
+		if (stat(t->mnt_dir, &ms) < 0)
+			return false;
+	}
+
+	return true;
+}
+
+/* Check that our alleged mountpoint is in mtab */
+bool
+find_mountpoint(
+	char			*mtab,
+	struct scrub_ctx	*ctx)
+{
+	struct mntent_cursor	cursor;
+	struct mntent		*t = NULL;
+	bool			found = false;
+
+	if (platform_mntent_open(&cursor, mtab) != 0) {
+		fprintf(stderr, "Error: can't get mntent entries.\n");
+		exit(1);
+	}
+
+	while ((t = platform_mntent_next(&cursor)) != NULL) {
+		/*
+		 * Keep jotting down matching mount details; newer mounts are
+		 * towards the end of the file (hopefully).
+		 */
+		if (find_mountpoint_check(&ctx->mnt_sb, t)) {
+			ctx->mntpoint = strdup(t->mnt_dir);
+			ctx->blkdev = strdup(t->mnt_fsname);
+			found = true;
+		}
+	}
+	platform_mntent_close(&cursor);
+	return found;
+}
diff --git a/scrub/common.h b/scrub/common.h
index 41b3ea7..fed95df 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -62,4 +62,14 @@ double auto_units(unsigned long long number, char **units);
 unsigned int scrub_nproc(struct scrub_ctx *ctx);
 unsigned int scrub_nproc_workqueue(struct scrub_ctx *ctx);
 
+#ifndef HAVE_SYNCFS
+static inline int syncfs(int fd)
+{
+	sync();
+	return 0;
+}
+#endif
+
+bool find_mountpoint(char *mtab, struct scrub_ctx *ctx);
+
 #endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/disk.c b/scrub/disk.c
index d4bf81f..546a06c 100644
--- a/scrub/disk.c
+++ b/scrub/disk.c
@@ -31,6 +31,9 @@
 #include <linux/fs.h>
 #include "platform_defs.h"
 #include "libfrog.h"
+#include "xfs.h"
+#include "path.h"
+#include "xfs_fs.h"
 #include "xfs_scrub.h"
 #include "disk.h"
 
diff --git a/scrub/phase1.c b/scrub/phase1.c
new file mode 100644
index 0000000..65409d3
--- /dev/null
+++ b/scrub/phase1.c
@@ -0,0 +1,223 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <mntent.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+#include <sys/statvfs.h>
+#include <sys/vfs.h>
+#include <fcntl.h>
+#include <dirent.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <pthread.h>
+#include <errno.h>
+#include <linux/fs.h>
+#include "libfrog.h"
+#include "workqueue.h"
+#include "input.h"
+#include "path.h"
+#include "handle.h"
+#include "bitops.h"
+#include "xfs_arch.h"
+#include "xfs_format.h"
+#include "avl64.h"
+#include "list.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "disk.h"
+
+/* Phase 1: Find filesystem geometry (and clean up after) */
+
+/* Shut down the filesystem. */
+void
+xfs_shutdown_fs(
+	struct scrub_ctx		*ctx)
+{
+	int				flag;
+
+	flag = XFS_FSOP_GOING_FLAGS_LOGFLUSH;
+	str_info(ctx, ctx->mntpoint, _("Shutting down filesystem!"));
+	if (ioctl(ctx->mnt_fd, XFS_IOC_GOINGDOWN, &flag))
+		str_errno(ctx, ctx->mntpoint);
+}
+
+/* Clean up the XFS-specific state data. */
+bool
+xfs_cleanup_fs(
+	struct scrub_ctx	*ctx)
+{
+	if (ctx->fshandle)
+		free_handle(ctx->fshandle, ctx->fshandle_len);
+	if (ctx->rtdev)
+		disk_close(ctx->rtdev);
+	if (ctx->logdev)
+		disk_close(ctx->logdev);
+	if (ctx->datadev)
+		disk_close(ctx->datadev);
+	fshandle_destroy();
+	close(ctx->mnt_fd);
+	fs_table_destroy();
+
+	return true;
+}
+
+/*
+ * Bind to the mountpoint, read the XFS geometry, bind to the block devices.
+ * Anything we've already built will be cleaned up by xfs_cleanup_fs.
+ */
+bool
+xfs_setup_fs(
+	struct scrub_ctx		*ctx)
+{
+	struct fs_path			*fsp;
+	int				error;
+
+	/*
+	 * Open the directory with O_NOATIME.  For mountpoints owned
+	 * by root, this should be sufficient to ensure that we have
+	 * CAP_SYS_ADMIN, which we probably need to do anything fancy
+	 * with the (XFS driver) kernel.
+	 */
+	ctx->mnt_fd = open(ctx->mntpoint, O_RDONLY | O_NOATIME | O_DIRECTORY);
+	if (ctx->mnt_fd < 0) {
+		if (errno == EPERM)
+			str_info(ctx, ctx->mntpoint,
+_("Must be root to run scrub."));
+		else
+			str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	error = fstat(ctx->mnt_fd, &ctx->mnt_sb);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+	error = fstatvfs(ctx->mnt_fd, &ctx->mnt_sv);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+	error = fstatfs(ctx->mnt_fd, &ctx->mnt_sf);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	ctx->nr_io_threads = nproc;
+	if (verbose) {
+		fprintf(stdout, _("%s: using %d threads to scrub.\n"),
+				ctx->mntpoint, scrub_nproc(ctx));
+		fflush(stdout);
+	}
+
+	if (!platform_test_xfs_fd(ctx->mnt_fd)) {
+		str_error(ctx, ctx->mntpoint,
+_("Does not appear to be an XFS filesystem!"));
+		return false;
+	}
+
+	/*
+	 * Flush everything out to disk before we start checking.
+	 * This seems to reduce the incidence of stale file handle
+	 * errors when we open things by handle.
+	 */
+	error = syncfs(ctx->mnt_fd);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	/* Retrieve XFS geometry. */
+	error = ioctl(ctx->mnt_fd, XFS_IOC_FSGEOMETRY, &ctx->geo);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	ctx->agblklog = log2_roundup(ctx->geo.agblocks);
+	ctx->blocklog = highbit32(ctx->geo.blocksize);
+	ctx->inodelog = highbit32(ctx->geo.inodesize);
+	ctx->inopblog = ctx->blocklog - ctx->inodelog;
+
+	error = path_to_fshandle(ctx->mntpoint, &ctx->fshandle,
+			&ctx->fshandle_len);
+	if (error) {
+		perror(_("getting fshandle"));
+		return false;
+	}
+
+	/* Go find the XFS devices if we have a usable fsmap. */
+	fs_table_initialise(0, NULL, 0, NULL);
+	errno = 0;
+	fsp = fs_table_lookup(ctx->mntpoint, FS_MOUNT_POINT);
+	if (!fsp) {
+		str_error(ctx, ctx->mntpoint,
+_("Unable to find XFS information."));
+		return false;
+	}
+	memcpy(&ctx->fsinfo, fsp, sizeof(struct fs_path));
+
+	/* Did we find the log and rt devices, if they're present? */
+	if (ctx->geo.logstart == 0 && ctx->fsinfo.fs_log == NULL) {
+		str_error(ctx, ctx->mntpoint,
+_("Unable to find log device path."));
+		return false;
+	}
+	if (ctx->geo.rtblocks && ctx->fsinfo.fs_rt == NULL) {
+		str_error(ctx, ctx->mntpoint,
+_("Unable to find realtime device path."));
+		return false;
+	}
+
+	/* Open the raw devices. */
+	ctx->datadev = disk_open(ctx->fsinfo.fs_name);
+	if (!ctx->datadev) {
+		str_errno(ctx, ctx->fsinfo.fs_name);
+		return false;
+	}
+
+	if (ctx->fsinfo.fs_log) {
+		ctx->logdev = disk_open(ctx->fsinfo.fs_log);
+		if (!ctx->logdev) {
+			str_errno(ctx, ctx->fsinfo.fs_log);
+			return false;
+		}
+	}
+	if (ctx->fsinfo.fs_rt) {
+		ctx->rtdev = disk_open(ctx->fsinfo.fs_rt);
+		if (!ctx->rtdev) {
+			str_errno(ctx, ctx->fsinfo.fs_rt);
+			return false;
+		}
+	}
+
+	/*
+	 * Everything's set up, which means any failures recorded after
+	 * this point are most probably corruption errors (as opposed to
+	 * purely setup errors).
+	 */
+	ctx->need_repair = true;
+	return true;
+}
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index a9c185b..a733b8f 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -23,9 +23,12 @@
 #include <stdlib.h>
 #include <sys/time.h>
 #include <sys/resource.h>
+#include <sys/statvfs.h>
 #include "platform_defs.h"
 #include "xfs.h"
+#include "xfs_fs.h"
 #include "input.h"
+#include "path.h"
 #include "xfs_scrub.h"
 #include "common.h"
 
@@ -345,6 +348,8 @@ run_scrub_phases(
 	{
 		{
 			.descr = _("Find filesystem geometry."),
+			.fn = xfs_setup_fs,
+			.must_run = true,
 		},
 		{
 			.descr = _("Check internal metadata."),
@@ -426,6 +431,7 @@ main(
 	struct phase_rusage	all_pi;
 	unsigned long long	total_errors;
 	bool			moveon = true;
+	bool			ismnt;
 	static bool		injected;
 	int			ret = 0;
 
@@ -522,6 +528,15 @@ _("Only one of the options -n or -y may be specified.\n"));
 
 	ctx.mntpoint = strdup(argv[optind]);
 
+	/* Find the mount record for the passed-in argument. */
+	if (stat(argv[optind], &ctx.mnt_sb) < 0) {
+		fprintf(stderr,
+			_("%s: could not stat: %s: %s\n"),
+			progname, argv[optind], strerror(errno));
+		ret |= 8;
+		goto out;
+	}
+
 	/*
 	 * If the user did not specify an explicit mount table, try to use
 	 * /proc/mounts if it is available, else /etc/mtab.  We prefer
@@ -541,6 +556,15 @@ _("Only one of the options -n or -y may be specified.\n"));
 	if (!moveon)
 		goto out;
 
+	ismnt = find_mountpoint(mtab, &ctx);
+	if (!ismnt) {
+		fprintf(stderr,
+_("%s: Not an XFS mount point or block device.\n"),
+			ctx.mntpoint);
+		ret |= 8;
+		goto out;
+	}
+
 	/* How many CPUs? */
 	nproc = sysconf(_SC_NPROCESSORS_ONLN);
 	if (nproc < 1)
@@ -569,6 +593,11 @@ _("Only one of the options -n or -y may be specified.\n"));
 	if (debug_tweak_on("XFS_SCRUB_FORCE_ERROR"))
 		str_error(&ctx, ctx.mntpoint, _("Injecting error."));
 
+	/* Clean up scan data. */
+	moveon = xfs_cleanup_fs(&ctx);
+	if (!moveon)
+		ret |= 8;
+
 out:
 	total_errors = ctx.errors_found + ctx.runtime_errors;
 	if (ctx.need_repair)
@@ -586,13 +615,17 @@ _("%s: %llu errors found.%s\n"),
 		fprintf(stderr,
 _("%s: %llu warnings found.\n"),
 			ctx.mntpoint, ctx.warnings_found);
-	if (ctx.errors_found)
+	if (ctx.errors_found) {
+		if (ctx.error_action == ERRORS_SHUTDOWN)
+			xfs_shutdown_fs(&ctx);
 		ret |= 1;
+	}
 	if (ctx.warnings_found)
 		ret |= 2;
 	if (ctx.runtime_errors)
 		ret |= 4;
 	phase_end(&all_pi, 0);
+	free(ctx.blkdev);
 	free(ctx.mntpoint);
 
 	return ret;
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 7f1dcb1..2be7c65 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -51,15 +51,38 @@ struct scrub_ctx {
 	char			*mntpoint;
 	char			*blkdev;
 
+	/* Mountpoint info */
+	struct stat		mnt_sb;
+	struct statvfs		mnt_sv;
+	struct statfs		mnt_sf;
+
+	/* Open block devices */
+	struct disk		*datadev;
+	struct disk		*logdev;
+	struct disk		*rtdev;
+
 	/* What does the user want us to do? */
 	enum scrub_mode		mode;
 
 	/* How does the user want us to react to errors? */
 	enum error_action	error_action;
 
+	/* fd to filesystem mount point */
+	int			mnt_fd;
+
 	/* Number of threads for metadata scrubbing */
 	unsigned int		nr_io_threads;
 
+	/* XFS specific geometry */
+	struct xfs_fsop_geom	geo;
+	struct fs_path		fsinfo;
+	unsigned int		agblklog;
+	unsigned int		blocklog;
+	unsigned int		inodelog;
+	unsigned int		inopblog;
+	void			*fshandle;
+	size_t			fshandle_len;
+
 	/* Mutable scrub state; use lock. */
 	pthread_mutex_t		lock;
 	unsigned long long	max_errors;
@@ -67,6 +90,12 @@ struct scrub_ctx {
 	unsigned long long	errors_found;
 	unsigned long long	warnings_found;
 	bool			need_repair;
+	bool			preen_triggers[XFS_SCRUB_TYPE_NR];
 };
 
+/* Phase helper functions */
+void xfs_shutdown_fs(struct scrub_ctx *ctx);
+bool xfs_cleanup_fs(struct scrub_ctx *ctx);
+bool xfs_setup_fs(struct scrub_ctx *ctx);
+
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 12/27] xfs_scrub: wrap the scrub ioctl
  2018-01-06  1:52 ` [PATCH 12/27] xfs_scrub: wrap the scrub ioctl Darrick J. Wong
@ 2018-01-11 23:12   ` Eric Sandeen
  2018-01-12  0:28     ` Darrick J. Wong
  0 siblings, 1 reply; 61+ messages in thread
From: Eric Sandeen @ 2018-01-11 23:12 UTC (permalink / raw)
  To: Darrick J. Wong, sandeen; +Cc: linux-xfs

On 1/5/18 7:52 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create some wrappers to call the scrub ioctls.

> +/*
> + * Sleep for 100ms * however many -b we got past the initial one.
> + * This is an (albeit clumsy) way to throttle scrub activity.
> + */
> +void
> +background_sleep(void)
> +{
> +	unsigned long long	time;
> +	struct timespec		tv;
> +
> +	if (bg_mode < 2)
> +		return;
> +
> +	time = 100000 * (bg_mode - 1);

<coverity pass>

Probably want to cast the constant(s) to something larger if someone
issues -b $HUGE ... 100000ULL?
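
A minimal sketch of the wraparound in question, assuming bg_mode is a
32-bit unsigned counter (its declaration is not shown in this thread);
both helper names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: delay computed entirely in 32-bit arithmetic. */
static uint32_t delay_narrow(uint32_t bg_mode)
{
	/* The product wraps modulo 2^32 once it exceeds UINT32_MAX. */
	return 100000U * (bg_mode - 1);
}

/* Same computation with the constant widened to unsigned long long. */
static unsigned long long delay_wide(uint32_t bg_mode)
{
	/* The ULL constant forces a 64-bit multiplication. */
	return 100000ULL * (bg_mode - 1);
}
```

With (bg_mode - 1) == 42950 the narrow version wraps around while the
widened one does not.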

-Eric

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 10/27] xfs_scrub: add file space map iteration functions
  2018-01-06  1:52 ` [PATCH 10/27] xfs_scrub: add file " Darrick J. Wong
@ 2018-01-11 23:19   ` Eric Sandeen
  2018-01-12  0:24     ` Darrick J. Wong
  0 siblings, 1 reply; 61+ messages in thread
From: Eric Sandeen @ 2018-01-11 23:19 UTC (permalink / raw)
  To: Darrick J. Wong, sandeen; +Cc: linux-xfs

On 1/5/18 7:52 PM, Darrick J. Wong wrote:


> + * These routines provide a simple interface to query the block
> + * mappings of the fork of a given inode via GETBMAPX and call a
> + * function to iterate each mapping result.
> + */
> +
> +#define BMAP_NR		2048
> +
> +/* Iterate all the extent block mappings between the key and fork end. */
> +bool
> +xfs_iterate_filemaps(
> +	struct scrub_ctx	*ctx,
> +	const char		*descr,
> +	int			fd,
> +	int			whichfork,
> +	struct xfs_bmap		*key,

<coverity pass>

Ok key is an xfs_bmap:

/* inode fork block mapping */
struct xfs_bmap {
        uint64_t        bm_offset;      /* file offset of segment in bytes */
        uint64_t        bm_physical;    /* physical starting byte  */
        uint64_t        bm_length;      /* length of segment, bytes */
        uint32_t        bm_flags;       /* output flags */
};

> +	xfs_bmap_iter_fn	fn,
> +	void			*arg)
> +{
> +	struct fsxattr		fsx;
> +	struct getbmapx		*map
map is a getbmapx ...

struct getbmapx {
        __s64           bmv_offset;     /* file offset of segment in blocks */
        __s64           bmv_block;      /* starting block (64-bit daddr_t)  */
        __s64           bmv_length;     /* length of segment, blocks        */
        __s32           bmv_count;      /* # of entries in array incl. 1st  */
        __s32           bmv_entries;    /* # of entries filled in (output). */
        __s32           bmv_iflags;     /* input flags (1st structure)      */
        __s32           bmv_oflags;     /* output flags (after 1st structure)*/
        __s32           bmv_unused1;    /* future use                       */
        __s32           bmv_unused2;    /* future use                       */
};

...

> +out:
> +	memcpy(key, map, sizeof(struct getbmapx));

so I don't think that fits, right?
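
A quick way to check the mismatch at compile time, as a sketch only: the
two structs below are local stand-ins that mirror the layouts quoted
above (fixed-width types substituted for __s64/__s32), and
overflow_bytes() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct xfs_bmap as quoted above. */
struct bmap_demo {
	uint64_t	bm_offset;
	uint64_t	bm_physical;
	uint64_t	bm_length;
	uint32_t	bm_flags;
};

/* Stand-in for struct getbmapx as quoted above. */
struct getbmapx_demo {
	int64_t		bmv_offset;
	int64_t		bmv_block;
	int64_t		bmv_length;
	int32_t		bmv_count;
	int32_t		bmv_entries;
	int32_t		bmv_iflags;
	int32_t		bmv_oflags;
	int32_t		bmv_unused1;
	int32_t		bmv_unused2;
};

/* The source of the memcpy is larger than its destination. */
_Static_assert(sizeof(struct getbmapx_demo) > sizeof(struct bmap_demo),
	       "copying sizeof(getbmapx) bytes overruns an xfs_bmap");

/* How many bytes such a memcpy would write past the destination. */
static size_t overflow_bytes(void)
{
	return sizeof(struct getbmapx_demo) - sizeof(struct bmap_demo);
}
```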


-Eric

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 06/27] xfs_scrub: create an abstraction for a block device
  2018-01-06  1:52 ` [PATCH 06/27] xfs_scrub: create an abstraction for a block device Darrick J. Wong
@ 2018-01-11 23:24   ` Eric Sandeen
  2018-01-11 23:59     ` Darrick J. Wong
  0 siblings, 1 reply; 61+ messages in thread
From: Eric Sandeen @ 2018-01-11 23:24 UTC (permalink / raw)
  To: Darrick J. Wong, sandeen; +Cc: linux-xfs

On 1/5/18 7:52 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>

...

> +/*
> + * Disk Abstraction
> + *
> + * These routines help us to discover the geometry of a block device,
> + * estimate the amount of concurrent IOs that we can send to it, and
> + * abstract the process of performing read verification of disk blocks.
> + */
> +
> +/* Figure out how many disk heads are available. */
> +static unsigned int
> +__disk_heads(
> +	struct disk		*disk)
> +{
> +	int			iomin;
> +	int			ioopt;
> +	unsigned short		rot;
> +	int			error;
> +
> +	/* If it's not a block device, throw all the CPUs at it. */
> +	if (!S_ISBLK(disk->d_sb.st_mode))
> +		return nproc;
> +
> +	/* Non-rotational device?  Throw all the CPUs. */
> +	rot = 1;
> +	error = ioctl(disk->d_fd, BLKROTATIONAL, &rot);
> +	if (error == 0 && rot == 0)
> +		return nproc;

I needed

+#ifndef BLKROTATIONAL
+#define BLKROTATIONAL _IO(0x12,126)
+#endif

to make this compile on my not /that/ ancient (?) rhel6 box ;)

-Eric

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 21/27] xfs_scrub: scrub file data blocks
  2018-01-06  1:53 ` [PATCH 21/27] xfs_scrub: scrub file " Darrick J. Wong
@ 2018-01-11 23:25   ` Eric Sandeen
  2018-01-12  0:29     ` Darrick J. Wong
  0 siblings, 1 reply; 61+ messages in thread
From: Eric Sandeen @ 2018-01-11 23:25 UTC (permalink / raw)
  To: Darrick J. Wong, sandeen; +Cc: linux-xfs



On 1/5/18 7:53 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>

...

> +		/* Get the stat info for this directory entry. */
> +		error = fstatat(dir_fd, dirent->d_name, &sb,
> +				AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW);
> +		if (error) {
> +			str_errno(ctx, newpath);
> +			continue;

I needed:

+#ifndef AT_NO_AUTOMOUNT
+#define AT_NO_AUTOMOUNT 0x800
+#endif

here

-Eric

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 25/27] xfs_scrub: progress indicator
  2018-01-06  1:54 ` [PATCH 25/27] xfs_scrub: progress indicator Darrick J. Wong
@ 2018-01-11 23:27   ` Eric Sandeen
  2018-01-12  0:32     ` Darrick J. Wong
  0 siblings, 1 reply; 61+ messages in thread
From: Eric Sandeen @ 2018-01-11 23:27 UTC (permalink / raw)
  To: Darrick J. Wong, sandeen; +Cc: linux-xfs

On 1/5/18 7:54 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>


> +#define NSEC_PER_SEC	(1000000000)
> +static void *
> +progress_report_thread(void *arg)
> +{
> +	struct timespec		abstime;
> +	int			ret;
> +
> +	pthread_mutex_lock(&pt.lock);
> +	while (1) {
> +		/* Every half second. */
> +		ret = clock_gettime(CLOCK_REALTIME, &abstime);


My manpage says "link with -lrt" and to include <time.h>, this got me
going:

diff --git a/scrub/Makefile b/scrub/Makefile
index 3e6f690..0094d9d 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -67,7 +67,7 @@ xfs_scrub.c
 
 LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD) $(LIBUNISTRING)
 LTDEPENDENCIES += $(LIBHANDLE) $(LIBFROG) $(LIBUNISTRING)
-LLDFLAGS = -static
+LLDFLAGS = -static -lrt
 
 ifeq ($(HAVE_MALLINFO),yes)
 LCFLAGS += -DHAVE_MALLINFO
diff --git a/scrub/progress.c b/scrub/progress.c
index 30b2152..61b9c60 100644
--- a/scrub/progress.c
+++ b/scrub/progress.c
@@ -22,6 +22,7 @@
 #include <dirent.h>
 #include <pthread.h>
 #include <sys/statvfs.h>
+#include <time.h>
 #include "../repair/threads.h"
 #include "path.h"
 #include "disk.h"


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/27] xfs_scrub: set up command line argument parsing
  2018-01-06  1:51 ` [PATCH 03/27] xfs_scrub: set up command line argument parsing Darrick J. Wong
@ 2018-01-11 23:39   ` Eric Sandeen
  2018-01-12  1:53     ` Darrick J. Wong
  2018-01-12  1:30   ` Eric Sandeen
  1 sibling, 1 reply; 61+ messages in thread
From: Eric Sandeen @ 2018-01-11 23:39 UTC (permalink / raw)
  To: Darrick J. Wong, sandeen; +Cc: linux-xfs

On 1/5/18 7:51 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Parse command line options in order to set up the context in which we
> will scrub the filesystem.


> +static void __attribute__((noreturn))
> +usage(void)
> +{
> +	fprintf(stderr, _("Usage: %s [OPTIONS] mountpoint\n"), progname);
> +	fprintf(stderr, _("-a:\tStop after this many errors are found.\n"));
> +	fprintf(stderr, _("-b:\tBackground mode.\n"));

do you intentionally not document -d?
<same question for manpage>

> +	fprintf(stderr, _("-e:\tWhat to do if errors are found.\n"));
> +	fprintf(stderr, _("-m:\tPath to /etc/mtab.\n"));
> +	fprintf(stderr, _("-n:\tDry run.  Do not modify anything.\n"));
> +	fprintf(stderr, _("-T:\tDisplay timing/usage information.\n"));
> +	fprintf(stderr, _("-v:\tVerbose output.\n"));
> +	fprintf(stderr, _("-V:\tPrint version.\n"));
> +	fprintf(stderr, _("-x:\tScrub file data too.\n"));
> +	fprintf(stderr, _("-y:\tRepair all errors.\n"));
> +
> +	exit(16);
> +}

Could we make this more like xfs_repair usage() for consistency?

Usage: xfs_repair [options] device

Options:
  -f           The device is a file
  -L           Force log zeroing. Do this as a last resort.
  -l logdev    Specifies the device where the external log resides.
  -m maxmem    Maximum amount of memory to be used in megabytes.
  -n           No modify mode, just checks the filesystem for damage.
  -P           Disables prefetching.
  -r rtdev     Specifies the device where the realtime section resides.
  -v           Verbose output.
  -c subopts   Change filesystem parameters - use xfs_admin.
  -o subopts   Override default behaviour, refer to man page.
  -t interval  Reporting interval in seconds.
  -d           Repair dangerously.
  -V           Reports version and exits.

so maybe:

Usage: xfs_scrub [options] mountpoint

  -a count	Stop after this many errors are found.
  -b		Background mode.
  -C fd		Print progress information to this fd.
  -e behavior	What to do if errors are found. (shutdown|continue)
  -m path	Path to /etc/mtab.
  -n		Dry run.  Do not modify anything.
  -T		Display timing/usage information.
  -v		Verbose output.
  -V		Reports version and exits.
  -x		Scrub file data too.
  -y		Repair all errors.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 06/27] xfs_scrub: create an abstraction for a block device
  2018-01-11 23:24   ` Eric Sandeen
@ 2018-01-11 23:59     ` Darrick J. Wong
  2018-01-12  0:04       ` Eric Sandeen
  0 siblings, 1 reply; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-11 23:59 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: sandeen, linux-xfs

On Thu, Jan 11, 2018 at 05:24:58PM -0600, Eric Sandeen wrote:
> On 1/5/18 7:52 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> ...
> 
> > +/*
> > + * Disk Abstraction
> > + *
> > + * These routines help us to discover the geometry of a block device,
> > + * estimate the amount of concurrent IOs that we can send to it, and
> > + * abstract the process of performing read verification of disk blocks.
> > + */
> > +
> > +/* Figure out how many disk heads are available. */
> > +static unsigned int
> > +__disk_heads(
> > +	struct disk		*disk)
> > +{
> > +	int			iomin;
> > +	int			ioopt;
> > +	unsigned short		rot;
> > +	int			error;
> > +
> > +	/* If it's not a block device, throw all the CPUs at it. */
> > +	if (!S_ISBLK(disk->d_sb.st_mode))
> > +		return nproc;
> > +
> > +	/* Non-rotational device?  Throw all the CPUs. */
> > +	rot = 1;
> > +	error = ioctl(disk->d_fd, BLKROTATIONAL, &rot);
> > +	if (error == 0 && rot == 0)
> > +		return nproc;
> 
> I needed
> 
> +#ifndef BLKROTATIONAL
> +#define BLKROTATIONAL _IO(0x12,126)
> +#endif
> 
> to make this compile on my not /that/ ancient (?) rhel6 box ;)

Hmm... well, since I don't see backporting xfs kernel scrub to 2.6.32
maybe xfsprogs' build system should just turn off xfs_scrub on old
systems?

In any case, I #ifdef BLKROTATIONAL'd out the entire clause.

--D

> -Eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 06/27] xfs_scrub: create an abstraction for a block device
  2018-01-11 23:59     ` Darrick J. Wong
@ 2018-01-12  0:04       ` Eric Sandeen
  2018-01-12  1:27         ` Darrick J. Wong
  0 siblings, 1 reply; 61+ messages in thread
From: Eric Sandeen @ 2018-01-12  0:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On 1/11/18 5:59 PM, Darrick J. Wong wrote:
> On Thu, Jan 11, 2018 at 05:24:58PM -0600, Eric Sandeen wrote:
...

>>> +	/* Non-rotational device?  Throw all the CPUs. */
>>> +	rot = 1;
>>> +	error = ioctl(disk->d_fd, BLKROTATIONAL, &rot);
>>> +	if (error == 0 && rot == 0)
>>> +		return nproc;
>>
>> I needed
>>
>> +#ifndef BLKROTATIONAL
>> +#define BLKROTATIONAL _IO(0x12,126)
>> +#endif
>>
>> to make this compile on my not /that/ ancient (?) rhel6 box ;)
> 
> Hmm... well, since I don't see backporting xfs kernel scrub to 2.6.32
> maybe xfsprogs' build system should just turn off xfs_scrub on old
> systems?
> 
> In any case, I #ifdef BLKROTATIONAL'd out the entire clause.

ok.  well, other distros are making noise about using bleeding edge progs
w/ older distro kernels (hence the mkfs config file wishes) so it's probably
good to consider building against older environments.

Thanks,
-Eric

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 01/27] xfs_scrub: create online filesystem scrub program
  2018-01-06  1:51 ` [PATCH 01/27] xfs_scrub: create online filesystem scrub program Darrick J. Wong
@ 2018-01-12  0:16   ` Eric Sandeen
  2018-01-12  1:08     ` Darrick J. Wong
  2018-01-12  1:07   ` Eric Sandeen
  1 sibling, 1 reply; 61+ messages in thread
From: Eric Sandeen @ 2018-01-12  0:16 UTC (permalink / raw)
  To: Darrick J. Wong, sandeen; +Cc: linux-xfs

On 1/5/18 7:51 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>

<man page nitpicking>

> diff --git a/man/man8/xfs_scrub.8 b/man/man8/xfs_scrub.8
> new file mode 100644
> index 0000000..95f4fea
> --- /dev/null
> +++ b/man/man8/xfs_scrub.8
> @@ -0,0 +1,117 @@
> +.TH xfs_scrub 8
> +.SH NAME
> +xfs_scrub \- scrub the contents of an XFS filesystem
> +.SH SYNOPSIS
> +.B xfs_scrub
> +[
> +.B \-abemnTvVxy
               ^
> +]
> +.I mount-point

or block device?

> +.br
> +.B xfs_scrub \-V
                  ^

If V is special it probably shouldn't be in the first arg string?

Do you mean to hide the "-d" option?


> +.SH DESCRIPTION
> +.B xfs_scrub
> +attempts to check and repair all metadata in a mounted XFS filesystem.
> +.PP
> +.B xfs_scrub
> +asks the kernel to scrub all metadata objects in the filesystem.
> +Metadata records are scanned for obviously bad values and then
> +cross-referenced against other metadata.
> +The goal is to establish a threasonable confidence about the consistency

"reasonable"

> +of the overall filesystem by examining the consistency of individual
> +metadata records against the other metadata in the filesystem across the
> +entire filesystem.

Redundant, "examining the consistency of individual metadata records against
the other metadata in the filesystem."  would suffice.

> +Damaged metadata can be rebuilt from other metadata if there is
> +sufficient redundancy (and no other corruption) in the metadata.

Again redundant, maybe just "if there is sufficient redundancy within
other intact metadata?"

> +.PP
> +This utility does not know how to correct all errors.
> +If the tool cannot fix the detected errors, you must unmount the
> +filesystem and run
> +.B xfs_repair
> +to fix the problems.
> +If this tool is not run with either of the
> +.B \-n
> +or
> +.B \-y
> +options, then it will optimize the filesystem when possible,
> +but it will not try to fix errors.

I think the manpage needs to describe what this optimization might
involve, at least at a high level.  Will it fsr all my files? Will
it trim my free space?  Will it compact my directories?  Will it ...?
What exactly am I agreeing to here? :)

> +.SH OPTIONS
> +.TP
> +.BI \-a " errors"
> +Abort if more than this many errors are found on the filesystem.
> +.TP
> +.B \-b
> +Run in background mode.
> +If the option is specified once, only run a single scrubbing thread at a
> +time.
> +If given more than once, an artificial delay of 100us is added to each
> +scrub call to reduce CPU overhead even further.

I wonder, should it take a value instead of -bbbbbbbbb?

> +.TP
> +.B \-e
> +Specifies what happens when errors are detected.
> +If
> +.IR shutdown
> +is given, the filesystem will be taken offline if errors are found.
> +Not all backends can shut down a filesystem.

<user> what's a backend? </user>

> +If
> +.IR continue
> +is given, no action taken if errors are found.
> +This is the default.

<user> so how do I know what errors were found? </user>

> +.TP
> +.BI \-m " file"
> +Search this file for mounted filesystems instead of /etc/mtab.
> +.TP
> +.B \-n
> +Dry run, do not modify anything in the filesystem.
> +This disables all preening and optimization behaviors, and disables
> +calling FITRIM on the free space after a successful run.

what if I only want to disable FITRIM?  (-k?)
Oh, and it runs FITRIM?  Can you mention that more prominently
in the behavior description?  (and should it, given that we
have a tool for that purpose?)

> +.TP
> +.BI \-T
> +Print timing and memory usage information for each phase.
> +.TP
> +.B \-v
> +Enable verbose mode, which prints periodic status updates.
> +.TP
> +.B \-V
> +Prints the version number and exits.
> +.TP
> +.B \-x
> +Scrub all file data too.

colloquial?  maybe s/too/as well/ 

> +The block list will be sorted in disk order for better performance.

Cool, so when I'm done, my filesystem will have better performance if I use -x?
and none of my files will be corrupted!  ;)

The read order is probably an implementation detail that doesn't need to be in
the manpage.  It may be worth changing the description a bit to make it
clearer that the purpose is to determine readability of every file block?
I mean, that should probably be obvious, but ...

> +.B xfs_scrub
> +will issue O_DIRECT reads to the block device directly.
> +If the block device is a SCSI disk, it will issue READ VERIFY commands
> +directly to the disk.

+ These actions will confirm that all file data blocks can be read from storage.

or something?

> +.TP
> +.B \-y
> +Try to repair all filesystem errors.
> +If the errors cannot be fixed online, then the filesystem must be taken
> +offline for repair.
> +.SH EXIT CODE
> +The exit code returned by
> +.B xfs_scrub
> +is the sum of the following conditions:
> +.br
> +\	0\	\-\ No errors
> +.br
> +\	1\	\-\ File system errors left uncorrected
> +.br
> +\	2\	\-\ File system optimizations possible
> +.br
> +\	4\	\-\ Operational error
> +.br
> +\	8\	\-\ Usage or syntax error
> +.br
> +.SH CAVEATS
> +.B xfs_scrub
> +is an immature utility!

Might it damage my filesystem? ;)

> +This program takes advantage of in-kernel scrubbing to verify a given
> +data structure with locks held.
> +The kernel must support the BULKSTAT, FSGEOMETRY, FSCOUNTS, GET_RESBLKS,
> +GETBMAPX, GETFSMAP, INUMBERS, and SCRUB_METADATA ioctls.

Some of those ioctls are ancient and probably don't need to be specified...
Can you do anything at all without SCRUB_METADATA?  If not,
is SCRUB_METADATA sufficient to determine that the kernel has the rest
of what it needs?

> +This can tie up the system for a while.

Maybe that's a statement to go right after "locks held"

> +.PP
> +If errors are found and cannot be repaired, the filesystem must be taken
> +offline and repaired.

"unmounted and repaired" might be more specific?  *shrug*

> +.SH SEE ALSO
> +.BR xfs_repair (8).



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 10/27] xfs_scrub: add file space map iteration functions
  2018-01-11 23:19   ` Eric Sandeen
@ 2018-01-12  0:24     ` Darrick J. Wong
  0 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-12  0:24 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: sandeen, linux-xfs

On Thu, Jan 11, 2018 at 05:19:22PM -0600, Eric Sandeen wrote:
> On 1/5/18 7:52 PM, Darrick J. Wong wrote:
> 
> 
> > + * These routines provide a simple interface to query the block
> > + * mappings of the fork of a given inode via GETBMAPX and call a
> > + * function to iterate each mapping result.
> > + */
> > +
> > +#define BMAP_NR		2048
> > +
> > +/* Iterate all the extent block mappings between the key and fork end. */
> > +bool
> > +xfs_iterate_filemaps(
> > +	struct scrub_ctx	*ctx,
> > +	const char		*descr,
> > +	int			fd,
> > +	int			whichfork,
> > +	struct xfs_bmap		*key,
> 
> <coverity pass>
> 
> Ok key is an xfs_bmap:
> 
> /* inode fork block mapping */
> struct xfs_bmap {
>         uint64_t        bm_offset;      /* file offset of segment in bytes */
>         uint64_t        bm_physical;    /* physical starting byte  */
>         uint64_t        bm_length;      /* length of segment, bytes */
>         uint32_t        bm_flags;       /* output flags */
> };
> 
> > +	xfs_bmap_iter_fn	fn,
> > +	void			*arg)
> > +{
> > +	struct fsxattr		fsx;
> > +	struct getbmapx		*map
> map is a getbmapx ...
> 
> struct getbmapx {
>         __s64           bmv_offset;     /* file offset of segment in blocks */
>         __s64           bmv_block;      /* starting block (64-bit daddr_t)  */
>         __s64           bmv_length;     /* length of segment, blocks        */
>         __s32           bmv_count;      /* # of entries in array incl. 1st  */
>         __s32           bmv_entries;    /* # of entries filled in (output). */
>         __s32           bmv_iflags;     /* input flags (1st structure)      */
>         __s32           bmv_oflags;     /* output flags (after 1st structure)*/
>         __s32           bmv_unused1;    /* future use                       */
>         __s32           bmv_unused2;    /* future use                       */
> };
> 
> ...
> 
> > +out:
> > +	memcpy(key, map, sizeof(struct getbmapx));
> 
> so I don't think that fits, right?

I can't remember why this line is even needed, so away it goes.

--D

> 
> 
> -Eric

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 12/27] xfs_scrub: wrap the scrub ioctl
  2018-01-11 23:12   ` Eric Sandeen
@ 2018-01-12  0:28     ` Darrick J. Wong
  0 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-12  0:28 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: sandeen, linux-xfs

On Thu, Jan 11, 2018 at 05:12:49PM -0600, Eric Sandeen wrote:
> On 1/5/18 7:52 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create some wrappers to call the scrub ioctls.
> 
> > +/*
> > + * Sleep for 100ms * however many -b we got past the initial one.
> > + * This is an (albeit clumsy) way to throttle scrub activity.
> > + */
> > +void
> > +background_sleep(void)
> > +{
> > +	unsigned long long	time;
> > +	struct timespec		tv;
> > +
> > +	if (bg_mode < 2)
> > +		return;
> > +
> > +	time = 100000 * (bg_mode - 1);
> 
> <coverity pass>
> 
> Probably want to cast the constant(s) to something larger if someone
> issues -b $HUGE ... 100000ULL?

I suppose, though I doubt anyone will pass -b 42,950 times. 8-)

(-b doesn't take an argument)

Fixed.

--D

> -Eric

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 21/27] xfs_scrub: scrub file data blocks
  2018-01-11 23:25   ` Eric Sandeen
@ 2018-01-12  0:29     ` Darrick J. Wong
  0 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-12  0:29 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: sandeen, linux-xfs

On Thu, Jan 11, 2018 at 05:25:58PM -0600, Eric Sandeen wrote:
> 
> 
> On 1/5/18 7:53 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> ...
> 
> > +		/* Get the stat info for this directory entry. */
> > +		error = fstatat(dir_fd, dirent->d_name, &sb,
> > +				AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW);
> > +		if (error) {
> > +			str_errno(ctx, newpath);
> > +			continue;
> 
> I needed:
> 
> +#ifndef AT_NO_AUTOMOUNT
> +#define AT_NO_AUTOMOUNT 0x800
> +#endif

Fixed.

--D

> here
> 
> -Eric

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 25/27] xfs_scrub: progress indicator
  2018-01-11 23:27   ` Eric Sandeen
@ 2018-01-12  0:32     ` Darrick J. Wong
  0 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-12  0:32 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: sandeen, linux-xfs

On Thu, Jan 11, 2018 at 05:27:54PM -0600, Eric Sandeen wrote:
> On 1/5/18 7:54 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> 
> > +#define NSEC_PER_SEC	(1000000000)
> > +static void *
> > +progress_report_thread(void *arg)
> > +{
> > +	struct timespec		abstime;
> > +	int			ret;
> > +
> > +	pthread_mutex_lock(&pt.lock);
> > +	while (1) {
> > +		/* Every half second. */
> > +		ret = clock_gettime(CLOCK_REALTIME, &abstime);
> 
> 
> My manpage says "link with -lrt" and to include <time.h>, this got me
> going:
> 
> diff --git a/scrub/Makefile b/scrub/Makefile
> index 3e6f690..0094d9d 100644
> --- a/scrub/Makefile
> +++ b/scrub/Makefile
> @@ -67,7 +67,7 @@ xfs_scrub.c
>  
>  LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD) $(LIBUNISTRING)
>  LTDEPENDENCIES += $(LIBHANDLE) $(LIBFROG) $(LIBUNISTRING)
> -LLDFLAGS = -static
> +LLDFLAGS = -static -lrt

I added $(LIBRT) to the end of LLDLIBS/LTDEPENDENCIES since we already
defined it elsewhere in the autoconf goo for the benefit of the other
programs.

>  
>  ifeq ($(HAVE_MALLINFO),yes)
>  LCFLAGS += -DHAVE_MALLINFO
> diff --git a/scrub/progress.c b/scrub/progress.c
> index 30b2152..61b9c60 100644
> --- a/scrub/progress.c
> +++ b/scrub/progress.c
> @@ -22,6 +22,7 @@
>  #include <dirent.h>
>  #include <pthread.h>
>  #include <sys/statvfs.h>
> +#include <time.h>

Fixed.

--D

>  #include "../repair/threads.h"
>  #include "path.h"
>  #include "disk.h"
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 01/27] xfs_scrub: create online filesystem scrub program
  2018-01-06  1:51 ` [PATCH 01/27] xfs_scrub: create online filesystem scrub program Darrick J. Wong
  2018-01-12  0:16   ` Eric Sandeen
@ 2018-01-12  1:07   ` Eric Sandeen
  2018-01-12  1:10     ` Darrick J. Wong
  1 sibling, 1 reply; 61+ messages in thread
From: Eric Sandeen @ 2018-01-12  1:07 UTC (permalink / raw)
  To: Darrick J. Wong, sandeen; +Cc: linux-xfs



On 1/5/18 7:51 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create the foundations of a filesystem scrubbing tool that asks the
> kernel to inspect all metadata in the filesystem and (ultimately) to
> repair anything that's broken.  Also create the man page for the
> utility.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

...

> +/*
> + * XFS Online Metadata Scrub (and Repair)
> + *
> + * The XFS scrubber uses custom XFS ioctls to probe more deeply into the
> + * internals of the filesystem.  It takes advantage of scrubbing ioctls
> + * to check all the records stored in a metadata object and to
> + * cross-reference those records against the other filesystem metadata.
> + *
> + * After the program gathers command line arguments to figure out
> + * exactly what the user wants the program is going to do, scrub

* exactly what the user wants the program to do

or -

* exactly what the program is going to do

or -

* exactly what the user wants to do

:)

> + * execution is split up into several separate phases:
> + *
> + * The "find geometry" phase queries XFS for the filesystem geometry.
> + * The block devices for the data, realtime, and log devices are opened.
> + * Kernel ioctls are test-queried to see if they actually work (the scrub
> + * ioctl in particular), and any other filesystem-specific information
> + * is gathered.
> + *
> + * In the "check internal metadata" phase, we call the metadata scrub
> + * ioctl to check the filesystem's internal per-AG btrees.  This
> + * includes the AG superblock, AGF, AGFL, and AGI headers, freespace
> + * btrees, the regular and free inode btrees, the reverse mapping
> + * btrees, and the reference counting btrees.  If the realtime device is
> + * enabled, the realtime bitmap and reverse mapping btrees are enabled.

checked?

-Eric

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 01/27] xfs_scrub: create online filesystem scrub program
  2018-01-12  0:16   ` Eric Sandeen
@ 2018-01-12  1:08     ` Darrick J. Wong
  0 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-12  1:08 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: sandeen, linux-xfs

On Thu, Jan 11, 2018 at 06:16:02PM -0600, Eric Sandeen wrote:
> On 1/5/18 7:51 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> <man page nitpicking>
> 
> > diff --git a/man/man8/xfs_scrub.8 b/man/man8/xfs_scrub.8
> > new file mode 100644
> > index 0000000..95f4fea
> > --- /dev/null
> > +++ b/man/man8/xfs_scrub.8
> > @@ -0,0 +1,117 @@
> > +.TH xfs_scrub 8
> > +.SH NAME
> > +xfs_scrub \- scrub the contents of an XFS filesystem
> > +.SH SYNOPSIS
> > +.B xfs_scrub
> > +[
> > +.B \-abemnTvVxy
>                ^
> > +]
> > +.I mount-point
> 
> or block device?
> 
> > +.br
> > +.B xfs_scrub \-V
>                   ^
> 
> If V is special it probably shouldn't be in the first arg string?

Yes, fixed.

> Do you mean to hide the "-d" option?

-d turns on debug mode; I was going to keep that hidden from users.

> 
> > +.SH DESCRIPTION
> > +.B xfs_scrub
> > +attempts to check and repair all metadata in a mounted XFS filesystem.
> > +.PP
> > +.B xfs_scrub
> > +asks the kernel to scrub all metadata objects in the filesystem.
> > +Metadata records are scanned for obviously bad values and then
> > +cross-referenced against other metadata.
> > +The goal is to establish a threasonable confidence about the consistency
> 
> "reasonable"

Fixed.

> > +of the overall filesystem by examining the consistency of individual
> > +metadata records against the other metadata in the filesystem across the
> > +entire filesystem.
> 
> Redundant, "examining the consistency of individual metadata records against
> the other metadata in the filesystem."  would suffice.

Fixed.

> > +Damaged metadata can be rebuilt from other metadata if there is
> > +sufficient redundancy (and no other corruption) in the metadata.
> 
> Again redundant, maybe just "if there is sufficient redundancy within
> other intact metadata?"

"Damaged metadata can be rebuilt from other metadata if there exists
redundant data structures which are intact."

?

> > +.PP
> > +This utility does not know how to correct all errors.
> > +If the tool cannot fix the detected errors, you must unmount the
> > +filesystem and run
> > +.B xfs_repair
> > +to fix the problems.
> > +If this tool is not run with either of the
> > +.B \-n
> > +or
> > +.B \-y
> > +options, then it will optimize the filesystem when possible,
> > +but it will not try to fix errors.
> 
> I think the manpage needs to describe what this optimization might
> involve, at least at a high level.  Will it fsr all my files? Will
> it trim my free space?  Will it compact my directories?  Will it ...?
> What exactly am I agreeing to here? :)

"Optimizations may include, but are not limited to, activities such as
compacting metadata or bypassing shared block write checks for files
that no longer share blocks."

> > +.SH OPTIONS
> > +.TP
> > +.BI \-a " errors"
> > +Abort if more than this many errors are found on the filesystem.
> > +.TP
> > +.B \-b
> > +Run in background mode.
> > +If the option is specified once, only run a single scrubbing thread at a
> > +time.
> > +If given more than once, an artificial delay of 100us is added to each
> > +scrub call to reduce CPU overhead even further.
> 
> I wonder, should it take a value instead of -bbbbbbbbb?

More than ten -b and this program gets reallllly slow.  There are
currently six global fs checks, ten per-AG checks, and seven per-file
checks.  On my /home filesystem with 4M inodes and 32 AGs that adds up
to...

6 + (32 * 10) + (4M * 7) == ~28M scrub calls, or 324 days to perform
a scan.
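
The arithmetic, spelled out as a sketch.  It assumes "4M" means 4,000,000
inodes and that each scrub call sleeps a full second (ten 100ms steps, i.e.
eleven -b flags, going by the sleep comment in patch 12); both helper names
are made up.

```c
#include <stdint.h>

/* Hypothetical helper: total scrub ioctl calls for one full scan. */
static uint64_t
scrub_call_count(uint64_t fs_checks, uint64_t ag_checks,
		 uint64_t file_checks, uint64_t nr_ags, uint64_t nr_inodes)
{
	return fs_checks + nr_ags * ag_checks + nr_inodes * file_checks;
}

/* Hypothetical helper: whole days spent sleeping at delay_us per call. */
static uint64_t
throttle_days(uint64_t calls, uint64_t delay_us)
{
	return calls * delay_us / 1000000 / 86400;
}
```

scrub_call_count(6, 10, 7, 32, 4000000) comes to 28,000,326 calls, and
throttle_days() of that at 1,000,000us per call is 324.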

> > +.TP
> > +.B \-e
> > +Specifies what happens when errors are detected.
> > +If
> > +.IR shutdown
> > +is given, the filesystem will be taken offline if errors are found.
> > +Not all backends can shut down a filesystem.
> 
> <user> what's a backend? </user>

Leftover remnant from the days when this was a frankentool that could be
used to walk filesystems via the standard interfaces.  I removed this
sentence.

> > +If
> > +.IR continue
> > +is given, no action taken if errors are found.
> > +This is the default.
> 
> <user> so how do I know what errors were found? </user>

"Filesystem corruption and optimization opportunities will be logged to
the standard error stream."

I'll put that at the top.

> > +.TP
> > +.BI \-m " file"
> > +Search this file for mounted filesystems instead of /etc/mtab.
> > +.TP
> > +.B \-n
> > +Dry run, do not modify anything in the filesystem.
> > +This disables all preening and optimization behaviors, and disables
> > +calling FITRIM on the free space after a successful run.
> 
> what if I only want to disable FITRIM?  (-k?)

Oh all right. :)

> Oh, and it runs FITRIM?  Can you mention that more prominently
> in the behavior description?

I'll put it in the list of optimizations.

> (and should it, given that we have a tool for that purpose?)

Yes we have fstrim but I consider it too scary to run out of the
blue without checking the health of the free space info first.

> > +.TP
> > +.BI \-T
> > +Print timing and memory usage information for each phase.
> > +.TP
> > +.B \-v
> > +Enable verbose mode, which prints periodic status updates.
> > +.TP
> > +.B \-V
> > +Prints the version number and exits.
> > +.TP
> > +.B \-x
> > +Scrub all file data too.
> 
> colloquial?  maybe s/too/as well/ 

"Read all file data extents to look for disk errors."

> > +The block list will be sorted in disk order for better performance.
> 
> Cool, so when I'm done, my filesystem will have better performance if I use -x?
> and none of my files will be corrupted!  ;)
> 
> The read order is probably an implementation detail that doesn't need to be in
> the manpage.  It may be worth changing the description a bit to make it
> clearer that the purpose is to determine readability of every file block?
> I mean, that should probably be obvious, but ...

Eh, I'll just remove it.

> > +.B xfs_scrub
> > +will issue O_DIRECT reads to the block device directly.
> > +If the block device is a SCSI disk, it will issue READ VERIFY commands
> > +directly to the disk.
> 
> + These actions will confirm that all file data blocks can be read from storage.
> 
> or something?

Ok, added that verbatim.

> > +.TP
> > +.B \-y
> > +Try to repair all filesystem errors.
> > +If the errors cannot be fixed online, then the filesystem must be taken
> > +offline for repair.
> > +.SH EXIT CODE
> > +The exit code returned by
> > +.B xfs_scrub
> > +is the sum of the following conditions:
> > +.br
> > +\	0\	\-\ No errors
> > +.br
> > +\	1\	\-\ File system errors left uncorrected
> > +.br
> > +\	2\	\-\ File system optimizations possible
> > +.br
> > +\	4\	\-\ Operational error
> > +.br
> > +\	8\	\-\ Usage or syntax error
> > +.br
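
Since the exit code is a sum of the values above, a caller can test each
condition as a bit.  A small sketch; the macro names are invented for
illustration and only the numeric values come from the list.

```c
#include <stdbool.h>

/* Values from the EXIT CODE list; the names are hypothetical. */
#define SCRUB_RET_CORRUPT	1	/* errors left uncorrected */
#define SCRUB_RET_UNOPTIMIZED	2	/* optimizations possible */
#define SCRUB_RET_OPERROR	4	/* operational error */
#define SCRUB_RET_SYNTAX	8	/* usage or syntax error */

/* Hypothetical helper: does this exit status include the condition? */
static bool
scrub_ret_has(int status, int condition)
{
	return (status & condition) != 0;
}
```

An exit status of 5, for example, would mean uncorrected errors plus an
operational error.
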
> > +.SH CAVEATS
> > +.B xfs_scrub
> > +is an immature utility!
> 
> Might it damage my filesystem? ;)

It glides as softly as a piston!




...oh, are we not doing the monorail song?

> > +This program takes advantage of in-kernel scrubbing to verify a given
> > +data structure with locks held.

"This program takes advantage of in-kernel scrubbing to verify a given
data structure with locks held and can keep the filesystem busy for a
long time."

> > +The kernel must support the BULKSTAT, FSGEOMETRY, FSCOUNTS, GET_RESBLKS,
> > +GETBMAPX, GETFSMAP, INUMBERS, and SCRUB_METADATA ioctls.
> 
> Some of those ioctls are ancient and probably don't need to be specified...
> Can you do anything at all without SCRUB_METADATA?  If not,
> is SCRUB_METADATA sufficient to determine that the kernel has the rest
> of what it needs?

SCRUB_METADATA is enough, provided we don't get kernel-tinyfication'd.

> > +This can tie up the system for a while.
> 
> Maybe that's a statement to go right after "locks held"

Ok.

> > +.PP
> > +If errors are found and cannot be repaired, the filesystem must be taken
> > +offline and repaired.
> 
> "unmounted and repaired" might be more specific?  *shrug*

Ok.

--D

> > +.SH SEE ALSO
> > +.BR xfs_repair (8).
> 
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 01/27] xfs_scrub: create online filesystem scrub program
  2018-01-12  1:07   ` Eric Sandeen
@ 2018-01-12  1:10     ` Darrick J. Wong
  0 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-12  1:10 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: sandeen, linux-xfs

On Thu, Jan 11, 2018 at 07:07:43PM -0600, Eric Sandeen wrote:
> 
> 
> On 1/5/18 7:51 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create the foundations of a filesystem scrubbing tool that asks the
> > kernel to inspect all metadata in the filesystem and (ultimately) to
> > repair anything that's broken.  Also create the man page for the
> > utility.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> ...
> 
> > +/*
> > + * XFS Online Metadata Scrub (and Repair)
> > + *
> > + * The XFS scrubber uses custom XFS ioctls to probe more deeply into the
> > + * internals of the filesystem.  It takes advantage of scrubbing ioctls
> > + * to check all the records stored in a metadata object and to
> > + * cross-reference those records against the other filesystem metadata.
> > + *
> > + * After the program gathers command line arguments to figure out
> > + * exactly what the user wants the program is going to do, scrub
> 
> * exactly what the user wants the program to do
> 
> or -
> 
> * exactly what the program is going to do
> 
> or -
> 
> * exactly what the user wants to do
> 
> :)

The second.  The program can figure out what the program is going to do;
it has no idea what the user wants.

> > + * execution is split up into several separate phases:
> > + *
> > + * The "find geometry" phase queries XFS for the filesystem geometry.
> > + * The block devices for the data, realtime, and log devices are opened.
> > + * Kernel ioctls are test-queried to see if they actually work (the scrub
> > + * ioctl in particular), and any other filesystem-specific information
> > + * is gathered.
> > + *
> > + * In the "check internal metadata" phase, we call the metadata scrub
> > + * ioctl to check the filesystem's internal per-AG btrees.  This
> > + * includes the AG superblock, AGF, AGFL, and AGI headers, freespace
> > + * btrees, the regular and free inode btrees, the reverse mapping
> > + * btrees, and the reference counting btrees.  If the realtime device is
> > + * enabled, the realtime bitmap and reverse mapping btrees are enabled.
> 
> checked?

Fixed.

--D

> -Eric

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 02/27] xfs_scrub: common error handling
  2018-01-06  1:51 ` [PATCH 02/27] xfs_scrub: common error handling Darrick J. Wong
@ 2018-01-12  1:15   ` Eric Sandeen
  2018-01-12  1:23     ` Darrick J. Wong
  0 siblings, 1 reply; 61+ messages in thread
From: Eric Sandeen @ 2018-01-12  1:15 UTC (permalink / raw)
  To: Darrick J. Wong, sandeen; +Cc: linux-xfs

On 1/5/18 7:51 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Standardize how we record and report errors.
> 


> +/*
> + * Reporting Status to the Console
> + *
> + * We aim for a roughly standard reporting format -- the severity of the
> + * status being reported, a textual description of the objecting being

object?  (I mean, I suppose it might be objecting to your horribly
corrupted filesystem?) ;)

> + * reported, and whatever the status happens to be.
> + *
> + * Errors are the most severe and reflect filesystem corruption.
> + * Warnings indicate that something is amiss and needs the attention of
> + * the administrator, but does not constitute a corruption.  Information
> + * is merely advisory.
> + */
> +


>  /* Program name; needed for libxcmd error reports. */
>  char				*progname = "xfs_scrub";
>  
> +/* Debug level; higher values mean more verbosity. */
> +unsigned int			debug;
> +
> +/* Should we dump core if errors happen? */
> +bool				dumpcore;

not independent of debug right, but ... *shrug* 

-Eric

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 02/27] xfs_scrub: common error handling
  2018-01-12  1:15   ` Eric Sandeen
@ 2018-01-12  1:23     ` Darrick J. Wong
  0 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-12  1:23 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: sandeen, linux-xfs

On Thu, Jan 11, 2018 at 07:15:52PM -0600, Eric Sandeen wrote:
> On 1/5/18 7:51 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Standardize how we record and report errors.
> > 
> 
> 
> > +/*
> > + * Reporting Status to the Console
> > + *
> > + * We aim for a roughly standard reporting format -- the severity of the
> > + * status being reported, a textual description of the objecting being
> 
> object?  (I mean, I suppose it might be objecting to your horribly
> corrupted filesystem?) ;)

Fixed.

> > + * reported, and whatever the status happens to be.
> > + *
> > + * Errors are the most severe and reflect filesystem corruption.
> > + * Warnings indicate that something is amiss and needs the attention of
> > + * the administrator, but does not constitute a corruption.  Information
> > + * is merely advisory.
> > + */
> > +
> 
> 
> >  /* Program name; needed for libxcmd error reports. */
> >  char				*progname = "xfs_scrub";
> >  
> > +/* Debug level; higher values mean more verbosity. */
> > +unsigned int			debug;
> > +
> > +/* Should we dump core if errors happen? */
> > +bool				dumpcore;
> 
> not independent of debug right, but ... *shrug* 

Wart from the old days.  I'll gate core dumping on debug.

--D

> -Eric

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 06/27] xfs_scrub: create an abstraction for a block device
  2018-01-12  0:04       ` Eric Sandeen
@ 2018-01-12  1:27         ` Darrick J. Wong
  0 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-12  1:27 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: sandeen, linux-xfs

On Thu, Jan 11, 2018 at 06:04:38PM -0600, Eric Sandeen wrote:
> On 1/11/18 5:59 PM, Darrick J. Wong wrote:
> > On Thu, Jan 11, 2018 at 05:24:58PM -0600, Eric Sandeen wrote:
> ...
> 
> >>> +	/* Non-rotational device?  Throw all the CPUs. */
> >>> +	rot = 1;
> >>> +	error = ioctl(disk->d_fd, BLKROTATIONAL, &rot);
> >>> +	if (error == 0 && rot == 0)
> >>> +		return nproc;
> >>
> >> I needed
> >>
> >> +#ifndef BLKROTATIONAL
> >> +#define BLKROTATIONAL _IO(0x12,126)
> >> +#endif
> >>
> >> to make this compile on my not /that/ ancient (?) rhel6 box ;)
> > 
> > Hmm... well, since I don't see backporting xfs kernel scrub to 2.6.32
> > maybe xfsprogs' build system should just turn off xfs_scrub on old
> > systems?
> > 
> > In any case, I #ifdef BLKROTATIONAL'd out the entire clause.
> 
> ok.  well, other distros are making noise about using bleeding edge progs
> w/ older distro kernels (hence the mkfs config file wishes) so it's probably
> good to consider building against older environments.

<shrug> ok I can patch it in like that...

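A sketch of what "like that" could look like, using the fallback define
quoted above.  scrub_threads_for_disk() is a hypothetical name, returning 1
for rotational or unknown devices is my assumption about the fallback, and
as far as I know the kernel stores an unsigned short for this ioctl.

```c
#include <sys/ioctl.h>
#include <linux/fs.h>

/* Headers on older distros may not define this ioctl number. */
#ifndef BLKROTATIONAL
#define BLKROTATIONAL _IO(0x12, 126)
#endif

/*
 * Hypothetical helper: use every CPU for a non-rotational device, and
 * fall back to a single thread if the device is rotational or the ioctl
 * fails (for example on a kernel that does not implement it).
 */
static unsigned int
scrub_threads_for_disk(int fd, unsigned int nproc)
{
	unsigned short		rot = 1;

	if (ioctl(fd, BLKROTATIONAL, &rot) == 0 && rot == 0)
		return nproc;
	return 1;
}
```
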
--D

> Thanks,
> -Eric

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/27] xfs_scrub: set up command line argument parsing
  2018-01-06  1:51 ` [PATCH 03/27] xfs_scrub: set up command line argument parsing Darrick J. Wong
  2018-01-11 23:39   ` Eric Sandeen
@ 2018-01-12  1:30   ` Eric Sandeen
  2018-01-12  2:03     ` Darrick J. Wong
  1 sibling, 1 reply; 61+ messages in thread
From: Eric Sandeen @ 2018-01-12  1:30 UTC (permalink / raw)
  To: Darrick J. Wong, sandeen; +Cc: linux-xfs

On 1/5/18 7:51 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Parse command line options in order to set up the context in which we
> will scrub the filesystem.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  scrub/common.h    |    8 ++
>  scrub/xfs_scrub.c |  207 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  scrub/xfs_scrub.h |   34 +++++++++
>  3 files changed, 249 insertions(+)
> 
> 
> diff --git a/scrub/common.h b/scrub/common.h
> index f620620..15a59bd 100644
> --- a/scrub/common.h
> +++ b/scrub/common.h
> @@ -48,4 +48,12 @@ void __record_preen(struct scrub_ctx *ctx, const char *descr, const char *file,
>  #define str_info(ctx, str, ...)		__str_info(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
>  #define dbg_printf(fmt, ...)		{if (debug > 1) {printf(fmt, __VA_ARGS__);}}
>  
> +/* Is this debug tweak enabled? */
> +static inline bool
> +debug_tweak_on(
> +	const char		*name)
> +{
> +	return debug && getenv(name) != NULL;

since it's debug anyway, I wonder if printing
"FOO_BAR_TWEAK is on" would be useful here.

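Something along these lines, as a rough sketch.  The local "debug" variable
stands in for the global from xfs_scrub.c, the function is renamed so it
does not claim to be the patch's version, and the message text is only a
suggestion.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the global debug level declared in xfs_scrub.c. */
static unsigned int	debug;

/*
 * Hypothetical variant of debug_tweak_on(): same test as the patch, but
 * it also reports which tweak was picked up from the environment.
 */
static bool
debug_tweak_on_verbose(const char *name)
{
	bool			on = debug && getenv(name) != NULL;

	if (on)
		fprintf(stderr, "%s is on\n", name);
	return on;
}
```
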
> +}
> +
>  #endif /* XFS_SCRUB_COMMON_H_ */
> diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
> index 10116a8..9db3b41 100644
> --- a/scrub/xfs_scrub.c
> +++ b/scrub/xfs_scrub.c
> @@ -20,7 +20,12 @@
>  #include <stdio.h>
>  #include <pthread.h>
>  #include <stdbool.h>
> +#include <stdlib.h>
> +#include "platform_defs.h"
> +#include "xfs.h"
> +#include "input.h"
>  #include "xfs_scrub.h"
> +#include "common.h"
>  
>  /*
>   * XFS Online Metadata Scrub (and Repair)
> @@ -107,11 +112,213 @@ unsigned int			debug;
>  /* Should we dump core if errors happen? */
>  bool				dumpcore;
>  
> +/* Display resource usage at the end of each phase? */
> +bool				display_rusage;
> +
> +/* Background mode; higher values insert more pauses between scrub calls. */
> +unsigned int			bg_mode;
> +
> +/* Maximum number of processors available to us. */
> +int				nproc;
> +
> +/* Number of threads we're allowed to use. */
> +unsigned int			nr_threads;
> +
> +/* Verbosity; higher values print more information. */
> +bool				verbose;
> +
> +/* Should we scrub the data blocks? */
> +bool				scrub_data;
> +
> +/* Size of a memory page. */
> +long				page_size;
> +
> +static void __attribute__((noreturn))
> +usage(void)
> +{
> +	fprintf(stderr, _("Usage: %s [OPTIONS] mountpoint\n"), progname);
> +	fprintf(stderr, _("-a:\tStop after this many errors are found.\n"));
> +	fprintf(stderr, _("-b:\tBackground mode.\n"));
> +	fprintf(stderr, _("-e:\tWhat to do if errors are found.\n"));
> +	fprintf(stderr, _("-m:\tPath to /etc/mtab.\n"));
> +	fprintf(stderr, _("-n:\tDry run.  Do not modify anything.\n"));
> +	fprintf(stderr, _("-T:\tDisplay timing/usage information.\n"));
> +	fprintf(stderr, _("-v:\tVerbose output.\n"));
> +	fprintf(stderr, _("-V:\tPrint version.\n"));
> +	fprintf(stderr, _("-x:\tScrub file data too.\n"));
> +	fprintf(stderr, _("-y:\tRepair all errors.\n"));
> +
> +	exit(16);
> +}
> +
>  int
>  main(
>  	int			argc,
>  	char			**argv)
>  {
> +	int			c;
> +	char			*mtab = NULL;
> +	char			*repairstr = "";
> +	struct scrub_ctx	ctx = {0};
> +	unsigned long long	total_errors;
> +	bool			moveon = true;
> +	static bool		injected;
> +	int			ret = 0;
> +
>  	fprintf(stderr, "XXX: This program is not complete!\n");
>  	return 4;
> +
> +	progname = basename(argv[0]);
> +	setlocale(LC_ALL, "");
> +	bindtextdomain(PACKAGE, LOCALEDIR);
> +	textdomain(PACKAGE);
> +
> +	pthread_mutex_init(&ctx.lock, NULL);
> +	ctx.mode = SCRUB_MODE_DEFAULT;
> +	ctx.error_action = ERRORS_CONTINUE;
> +	while ((c = getopt(argc, argv, "a:bde:m:nTvxVy")) != EOF) {
> +		switch (c) {
> +		case 'a':
> +			ctx.max_errors = cvt_u64(optarg, 10);
> +			if (errno) {
> +				perror(optarg);
> +				usage();
> +			}
> +			break;
> +		case 'b':
> +			nr_threads = 1;
> +			bg_mode++;
> +			break;
> +		case 'd':
> +			debug++;
> +			dumpcore = true;
> +			break;
> +		case 'e':
> +			if (!strcmp("continue", optarg))
> +				ctx.error_action = ERRORS_CONTINUE;
> +			else if (!strcmp("shutdown", optarg))
> +				ctx.error_action = ERRORS_SHUTDOWN;
> +			else
> +				usage();

Nothing tells me what I did wrong here,

# scrub/xfs_scrub -e make_it_so /mnt/test
Usage: xfs_scrub [OPTIONS] mountpoint
-a:	Stop after this many errors are found.
-b:	Background mode.
-C:	Print progress information to this fd.
-e:	What to do if errors are found.
...

I told it what to do... what's wrong?
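
One possible shape for this, as a sketch: name the rejected value before
falling back to usage().  The enum is a stand-in for whatever xfs_scrub.h
defines, and both parse_error_action() and the message wording are
hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for the error-action values used by the patch. */
enum error_action {
	ERRORS_CONTINUE,
	ERRORS_SHUTDOWN,
};

/*
 * Hypothetical helper: parse the -e argument.  On an unknown value,
 * say what was wrong, leave *action untouched, and return false so
 * the caller can still print the usage text.
 */
static bool
parse_error_action(const char *arg, enum error_action *action)
{
	if (!strcmp("continue", arg)) {
		*action = ERRORS_CONTINUE;
		return true;
	}
	if (!strcmp("shutdown", arg)) {
		*action = ERRORS_SHUTDOWN;
		return true;
	}
	fprintf(stderr, "Unknown error behavior \"%s\".\n", arg);
	return false;
}
```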

> +			break;
> +		case 'm':
> +			mtab = optarg;
> +			break;
> +		case 'n':
> +			if (ctx.mode != SCRUB_MODE_DEFAULT) {
> +				fprintf(stderr,
> +_("Only one of the options -n or -y may be specified.\n"));
> +				return 1;
> +			}
> +			ctx.mode = SCRUB_MODE_DRY_RUN;
> +			break;
> +		case 'T':
> +			display_rusage = true;
> +			break;
> +		case 'v':
> +			verbose = true;
> +			break;
> +		case 'V':
> +			fprintf(stdout, _("%s version %s\n"), progname,
> +					VERSION);
> +			fflush(stdout);
> +			exit(0);
> +		case 'x':
> +			scrub_data = true;
> +			break;
> +		case 'y':
> +			if (ctx.mode != SCRUB_MODE_DEFAULT) {
> +				fprintf(stderr,
> +_("Only one of the options -n or -y may be specified.\n"));
> +				return 1;
> +			}
> +			ctx.mode = SCRUB_MODE_REPAIR;
> +			break;
> +		case '?':

'?' isn't in the getopt string ...

# scrub/xfs_scrub ?
xfs_scrub: could not stat: ?: No such file or directory


> +			/* fall through */
> +		default:
> +			usage();
> +		}
> +	}
> +
> +	/* Override thread count if debugger */
> +	if (debug_tweak_on("XFS_SCRUB_THREADS")) {

can you document all these tweaks somewhere near the top in a comment?

> +		unsigned int	x;
> +
> +		x = cvt_u32(getenv("XFS_SCRUB_THREADS"), 10);
> +		if (errno) {
> +			perror("nr_threads");
> +			usage();
> +		}
> +		nr_threads = x;
> +	}
> +
> +	if (optind != argc - 1)
> +		usage();
> +
> +	ctx.mntpoint = strdup(argv[optind]);
> +
> +	/*
> +	 * If the user did not specify an explicit mount table, try to use
> +	 * /proc/mounts if it is available, else /etc/mtab.  We prefer
> +	 * /proc/mounts because it is kernel controlled, while /etc/mtab
> +	 * may contain garbage that userspace tools like pam_mounts wrote
> +	 * into it.
> +	 */
> +	if (!mtab) {
> +		if (access(_PATH_PROC_MOUNTS, R_OK) == 0)
> +			mtab = _PATH_PROC_MOUNTS;
> +		else
> +			mtab = _PATH_MOUNTED;
> +	}
> +
> +	/* How many CPUs? */
> +	nproc = sysconf(_SC_NPROCESSORS_ONLN);
> +	if (nproc < 1)
> +		nproc = 1;
> +
> +	/* Set up a page-aligned buffer for read verification. */
> +	page_size = sysconf(_SC_PAGESIZE);
> +	if (page_size < 0) {
> +		str_errno(&ctx, ctx.mntpoint);
> +		goto out;
> +	}
> +
> +	if (debug_tweak_on("XFS_SCRUB_FORCE_REPAIR") && !injected) {
> +		ctx.mode = SCRUB_MODE_REPAIR;
> +		injected = true;
> +	}

what is "injected" used for?  How could it already be set?

> +
> +	if (xfs_scrub_excessive_errors(&ctx))
> +		str_info(&ctx, ctx.mntpoint, _("Too many errors; aborting."));

wait wut?  oh right, you'll add $DO_STUFF above this in later patches ;)


Rest looks ok

-Eric

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/27] xfs_scrub: set up command line argument parsing
  2018-01-11 23:39   ` Eric Sandeen
@ 2018-01-12  1:53     ` Darrick J. Wong
  0 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-12  1:53 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: sandeen, linux-xfs

On Thu, Jan 11, 2018 at 05:39:38PM -0600, Eric Sandeen wrote:
> On 1/5/18 7:51 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Parse command line options in order to set up the context in which we
> > will scrub the filesystem.
> 
> 
> > +static void __attribute__((noreturn))
> > +usage(void)
> > +{
> > +	fprintf(stderr, _("Usage: %s [OPTIONS] mountpoint\n"), progname);
> > +	fprintf(stderr, _("-a:\tStop after this many errors are found.\n"));
> > +	fprintf(stderr, _("-b:\tBackground mode.\n"));
> 
> do you intentionally not document -d?
> <same question for manpage>

Debug mode, so yes.

> > +	fprintf(stderr, _("-e:\tWhat to do if errors are found.\n"));
> > +	fprintf(stderr, _("-m:\tPath to /etc/mtab.\n"));
> > +	fprintf(stderr, _("-n:\tDry run.  Do not modify anything.\n"));
> > +	fprintf(stderr, _("-T:\tDisplay timing/usage information.\n"));
> > +	fprintf(stderr, _("-v:\tVerbose output.\n"));
> > +	fprintf(stderr, _("-V:\tPrint version.\n"));
> > +	fprintf(stderr, _("-x:\tScrub file data too.\n"));
> > +	fprintf(stderr, _("-y:\tRepair all errors.\n"));
> > +
> > +	exit(16);
> > +}
> 
> Could we make this more like xfs_repair usage() for consistency?
> 
> Usage: xfs_repair [options] device
> 
> Options:
>   -f           The device is a file
>   -L           Force log zeroing. Do this as a last resort.
>   -l logdev    Specifies the device where the external log resides.
>   -m maxmem    Maximum amount of memory to be used in megabytes.
>   -n           No modify mode, just checks the filesystem for damage.
>   -P           Disables prefetching.
>   -r rtdev     Specifies the device where the realtime section resides.
>   -v           Verbose output.
>   -c subopts   Change filesystem parameters - use xfs_admin.
>   -o subopts   Override default behaviour, refer to man page.
>   -t interval  Reporting interval in seconds.
>   -d           Repair dangerously.
>   -V           Reports version and exits.
> 
> so maybe:
> 
> Usage: xfs_scrub [options] mountpoint
> 
>   -a count	Stop after this many errors are found.
>   -b		Background mode.
>   -C fd		Print progress information to this fd.
>   -e behavior	What to do if errors are found. (shutdown|continue)
>   -m path	Path to /etc/mtab.
>   -n		Dry run.  Do not modify anything.
>   -T		Display timing/usage information.
>   -v		Verbose output.
>   -V		Reports version and exits.
>   -x		Scrub file data too.
>   -y		Repair all errors.

Ok.  Assuming you meant to indent everything by two spaces and make it
obvious which switches take parameters.
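
For what it's worth, a minimal sketch of a table-driven usage() in that
layout, assuming a two-space indent and a 13-column switch field as in the
xfs_repair output quoted above.  The struct and helper names are made up
for illustration and are not part of the patch, and the _() translation
wrappers are left out to keep the sketch self-contained:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical option table row; arg is NULL if the switch takes no parameter. */
struct usage_opt {
	const char	*flag;
	const char	*arg;
	const char	*help;
};

/* Option text taken from the proposal quoted above. */
static const struct usage_opt usage_opts[] = {
	{ "-a", "count",	"Stop after this many errors are found." },
	{ "-b", NULL,		"Background mode." },
	{ "-C", "fd",		"Print progress information to this fd." },
	{ "-e", "behavior",	"What to do if errors are found. (shutdown|continue)" },
	{ "-m", "path",		"Path to /etc/mtab." },
	{ "-n", NULL,		"Dry run.  Do not modify anything." },
	{ "-T", NULL,		"Display timing/usage information." },
	{ "-v", NULL,		"Verbose output." },
	{ "-V", NULL,		"Reports version and exits." },
	{ "-x", NULL,		"Scrub file data too." },
	{ "-y", NULL,		"Repair all errors." },
};

#define NR_USAGE_OPTS	(sizeof(usage_opts) / sizeof(usage_opts[0]))

/* Format one row: two-space indent, then switch and parameter padded to 13 columns. */
static int
format_usage_opt(
	char			*buf,
	size_t			len,
	const struct usage_opt	*opt)
{
	char			sw[32];

	snprintf(sw, sizeof(sw), "%s %s", opt->flag, opt->arg ? opt->arg : "");
	return snprintf(buf, len, "  %-13s%s\n", sw, opt->help);
}

/* Print the whole usage text to the given stream. */
static void
print_usage(
	FILE			*fp,
	const char		*progname)
{
	char			line[128];
	size_t			i;

	fprintf(fp, "Usage: %s [options] mountpoint\n\n", progname);
	for (i = 0; i < NR_USAGE_OPTS; i++) {
		format_usage_opt(line, sizeof(line), &usage_opts[i]);
		fputs(line, fp);
	}
}
```

Keeping the switch, its parameter, and the help text in one row makes it
hard to forget the parameter name when a new option is added; the real
usage() would still call exit(16) after printing.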

--D



* Re: [PATCH 03/27] xfs_scrub: set up command line argument parsing
  2018-01-12  1:30   ` Eric Sandeen
@ 2018-01-12  2:03     ` Darrick J. Wong
  0 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-12  2:03 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: sandeen, linux-xfs

On Thu, Jan 11, 2018 at 07:30:11PM -0600, Eric Sandeen wrote:
> On 1/5/18 7:51 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Parse command line options in order to set up the context in which we
> > will scrub the filesystem.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  scrub/common.h    |    8 ++
> >  scrub/xfs_scrub.c |  207 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  scrub/xfs_scrub.h |   34 +++++++++
> >  3 files changed, 249 insertions(+)
> > 
> > 
> > diff --git a/scrub/common.h b/scrub/common.h
> > index f620620..15a59bd 100644
> > --- a/scrub/common.h
> > +++ b/scrub/common.h
> > @@ -48,4 +48,12 @@ void __record_preen(struct scrub_ctx *ctx, const char *descr, const char *file,
> >  #define str_info(ctx, str, ...)		__str_info(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
> >  #define dbg_printf(fmt, ...)		{if (debug > 1) {printf(fmt, __VA_ARGS__);}}
> >  
> > +/* Is this debug tweak enabled? */
> > +static inline bool
> > +debug_tweak_on(
> > +	const char		*name)
> > +{
> > +	return debug && getenv(name) != NULL;
> 
> since it's debug anyway, I wonder if printing
> "FOO_BAR_TWEAK is on" would be useful here.
> 
> > +}
> > +
> >  #endif /* XFS_SCRUB_COMMON_H_ */
> > diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
> > index 10116a8..9db3b41 100644
> > --- a/scrub/xfs_scrub.c
> > +++ b/scrub/xfs_scrub.c
> > @@ -20,7 +20,12 @@
> >  #include <stdio.h>
> >  #include <pthread.h>
> >  #include <stdbool.h>
> > +#include <stdlib.h>
> > +#include "platform_defs.h"
> > +#include "xfs.h"
> > +#include "input.h"
> >  #include "xfs_scrub.h"
> > +#include "common.h"
> >  
> >  /*
> >   * XFS Online Metadata Scrub (and Repair)
> > @@ -107,11 +112,213 @@ unsigned int			debug;
> >  /* Should we dump core if errors happen? */
> >  bool				dumpcore;
> >  
> > +/* Display resource usage at the end of each phase? */
> > +bool				display_rusage;
> > +
> > +/* Background mode; higher values insert more pauses between scrub calls. */
> > +unsigned int			bg_mode;
> > +
> > +/* Maximum number of processors available to us. */
> > +int				nproc;
> > +
> > +/* Number of threads we're allowed to use. */
> > +unsigned int			nr_threads;
> > +
> > +/* Verbosity; higher values print more information. */
> > +bool				verbose;
> > +
> > +/* Should we scrub the data blocks? */
> > +bool				scrub_data;
> > +
> > +/* Size of a memory page. */
> > +long				page_size;
> > +
> > +static void __attribute__((noreturn))
> > +usage(void)
> > +{
> > +	fprintf(stderr, _("Usage: %s [OPTIONS] mountpoint\n"), progname);
> > +	fprintf(stderr, _("-a:\tStop after this many errors are found.\n"));
> > +	fprintf(stderr, _("-b:\tBackground mode.\n"));
> > +	fprintf(stderr, _("-e:\tWhat to do if errors are found.\n"));
> > +	fprintf(stderr, _("-m:\tPath to /etc/mtab.\n"));
> > +	fprintf(stderr, _("-n:\tDry run.  Do not modify anything.\n"));
> > +	fprintf(stderr, _("-T:\tDisplay timing/usage information.\n"));
> > +	fprintf(stderr, _("-v:\tVerbose output.\n"));
> > +	fprintf(stderr, _("-V:\tPrint version.\n"));
> > +	fprintf(stderr, _("-x:\tScrub file data too.\n"));
> > +	fprintf(stderr, _("-y:\tRepair all errors.\n"));
> > +
> > +	exit(16);
> > +}
> > +
> >  int
> >  main(
> >  	int			argc,
> >  	char			**argv)
> >  {
> > +	int			c;
> > +	char			*mtab = NULL;
> > +	char			*repairstr = "";
> > +	struct scrub_ctx	ctx = {0};
> > +	unsigned long long	total_errors;
> > +	bool			moveon = true;
> > +	static bool		injected;
> > +	int			ret = 0;
> > +
> >  	fprintf(stderr, "XXX: This program is not complete!\n");
> >  	return 4;
> > +
> > +	progname = basename(argv[0]);
> > +	setlocale(LC_ALL, "");
> > +	bindtextdomain(PACKAGE, LOCALEDIR);
> > +	textdomain(PACKAGE);
> > +
> > +	pthread_mutex_init(&ctx.lock, NULL);
> > +	ctx.mode = SCRUB_MODE_DEFAULT;
> > +	ctx.error_action = ERRORS_CONTINUE;
> > +	while ((c = getopt(argc, argv, "a:bde:m:nTvxVy")) != EOF) {
> > +		switch (c) {
> > +		case 'a':
> > +			ctx.max_errors = cvt_u64(optarg, 10);
> > +			if (errno) {
> > +				perror(optarg);
> > +				usage();
> > +			}
> > +			break;
> > +		case 'b':
> > +			nr_threads = 1;
> > +			bg_mode++;
> > +			break;
> > +		case 'd':
> > +			debug++;
> > +			dumpcore = true;
> > +			break;
> > +		case 'e':
> > +			if (!strcmp("continue", optarg))
> > +				ctx.error_action = ERRORS_CONTINUE;
> > +			else if (!strcmp("shutdown", optarg))
> > +				ctx.error_action = ERRORS_SHUTDOWN;
> > +			else
> > +				usage();
> 
> Nothing tells me what I did wrong here,
> 
> # scrub/xfs_scrub -e make_it_so /mnt/test
> Usage: xfs_scrub [OPTIONS] mountpoint
> -a:	Stop after this many errors are found.
> -b:	Background mode.
> -C:	Print progress information to this fd.
> -e:	What to do if errors are found.
> ...
> 
> I told it what to do... what's wrong?

How about printing: Unknown error behavior "$optarg".
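
A minimal sketch of how the -e parsing could report the bad value before
falling back to usage().  The enum name and the helper are assumptions for
illustration; the patch only shows the two ERRORS_* constants:

```c
#include <stdio.h>
#include <string.h>

/* Assumed enum name; the patch only shows the two ERRORS_* constants. */
enum error_action {
	ERRORS_CONTINUE,
	ERRORS_SHUTDOWN,
};

/*
 * Parse the argument of -e.  Returns 0 and sets *action on success; on an
 * unrecognized value, complains on stderr and returns -1 so that the caller
 * can still print the usage text afterwards.
 */
static int
parse_error_action(
	const char		*arg,
	enum error_action	*action)
{
	if (!strcmp("continue", arg)) {
		*action = ERRORS_CONTINUE;
		return 0;
	}
	if (!strcmp("shutdown", arg)) {
		*action = ERRORS_SHUTDOWN;
		return 0;
	}
	fprintf(stderr, "Unknown error behavior \"%s\".\n", arg);
	return -1;
}
```

With this, "-e make_it_so" would print the complaint first and the option
list second, so the user can see which argument was rejected.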

> > +			break;
> > +		case 'm':
> > +			mtab = optarg;
> > +			break;
> > +		case 'n':
> > +			if (ctx.mode != SCRUB_MODE_DEFAULT) {
> > +				fprintf(stderr,
> > +_("Only one of the options -n or -y may be specified.\n"));
> > +				return 1;
> > +			}
> > +			ctx.mode = SCRUB_MODE_DRY_RUN;
> > +			break;
> > +		case 'T':
> > +			display_rusage = true;
> > +			break;
> > +		case 'v':
> > +			verbose = true;
> > +			break;
> > +		case 'V':
> > +			fprintf(stdout, _("%s version %s\n"), progname,
> > +					VERSION);
> > +			fflush(stdout);
> > +			exit(0);
> > +		case 'x':
> > +			scrub_data = true;
> > +			break;
> > +		case 'y':
> > +			if (ctx.mode != SCRUB_MODE_DEFAULT) {
> > +				fprintf(stderr,
> > +_("Only one of the options -n or -y may be specified.\n"));
> > +				return 1;
> > +			}
> > +			ctx.mode = SCRUB_MODE_REPAIR;
> > +			break;
> > +		case '?':
> 
> '?' isn't in the getopt string ...

The getopt manpage says it returns '?' for an unknown parameter, so I
provide the specific case here so that nobody can accidentally add a
second (case '?') statement.

IOWs, it's a defensive move.
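
To illustrate the getopt behavior, here is a small sketch (the helper name
is hypothetical) that counts how often getopt() hands back '?' for an
argument vector.  An unrecognized switch such as -Z yields '?', whereas a
bare "?" operand is not an option at all and is left for the caller as a
positional argument, which is why the run quoted below ends up trying to
stat it:

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/*
 * Count how many times getopt() reports '?' for the given argument vector.
 * opterr is cleared so that getopt() does not print its own diagnostics.
 */
static int
count_getopt_errors(
	int		argc,
	char		**argv,
	const char	*optstring)
{
	int		c;
	int		errors = 0;

	optind = 1;
	opterr = 0;
	while ((c = getopt(argc, argv, optstring)) != -1) {
		if (c == '?')
			errors++;
	}
	return errors;
}
```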

> # scrub/xfs_scrub ?
> xfs_scrub: could not stat: ?: No such file or directory
> 
> 
> > +			/* fall through */
> > +		default:
> > +			usage();
> > +		}
> > +	}
> > +
> > +	/* Override thread count if debugging. */
> > +	if (debug_tweak_on("XFS_SCRUB_THREADS")) {
> 
> can you document all these tweaks somewhere near the top in a comment?

/*
 * Known debug tweaks (pass -d and set the environment variable):
 * XFS_SCRUB_FORCE_ERROR	-- pretend all metadata is corrupt
 * XFS_SCRUB_FORCE_REPAIR	-- repair all metadata even if it's ok
 * XFS_SCRUB_NO_KERNEL		-- pretend there is no kernel ioctl
 * XFS_SCRUB_NO_SCSI_VERIFY	-- disable SCSI VERIFY (if present)
 * XFS_SCRUB_PHASE		-- run only this scrub phase
 * XFS_SCRUB_THREADS		-- start exactly this number of threads
 */
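
A rough sketch of the "FOO_BAR_TWEAK is on" idea suggested earlier in this
review, assuming a plain message on stderr is acceptable.  The debug
variable here is a local stand-in for the global that -d increments, and
the message text is an assumption:

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the global debug level that -d increments in xfs_scrub.c. */
static unsigned int	debug;

/* Is this debug tweak enabled?  If so, say so on stderr. */
static bool
debug_tweak_on(
	const char	*name)
{
	bool		on = debug && getenv(name) != NULL;

	if (on)
		fprintf(stderr, "%s is on\n", name);
	return on;
}
```

Note that this prints on every call, and the helper is called from hot
paths, so a real version would probably want to report each tweak only
once.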

> > +		unsigned int	x;
> > +
> > +		x = cvt_u32(getenv("XFS_SCRUB_THREADS"), 10);
> > +		if (errno) {
> > +			perror("nr_threads");
> > +			usage();
> > +		}
> > +		nr_threads = x;
> > +	}
> > +
> > +	if (optind != argc - 1)
> > +		usage();
> > +
> > +	ctx.mntpoint = strdup(argv[optind]);
> > +
> > +	/*
> > +	 * If the user did not specify an explicit mount table, try to use
> > +	 * /proc/mounts if it is available, else /etc/mtab.  We prefer
> > +	 * /proc/mounts because it is kernel controlled, while /etc/mtab
> > +	 * may contain garbage that userspace tools like pam_mounts wrote
> > +	 * into it.
> > +	 */
> > +	if (!mtab) {
> > +		if (access(_PATH_PROC_MOUNTS, R_OK) == 0)
> > +			mtab = _PATH_PROC_MOUNTS;
> > +		else
> > +			mtab = _PATH_MOUNTED;
> > +	}
> > +
> > +	/* How many CPUs? */
> > +	nproc = sysconf(_SC_NPROCESSORS_ONLN);
> > +	if (nproc < 1)
> > +		nproc = 1;
> > +
> > +	/* Set up a page-aligned buffer for read verification. */
> > +	page_size = sysconf(_SC_PAGESIZE);
> > +	if (page_size < 0) {
> > +		str_errno(&ctx, ctx.mntpoint);
> > +		goto out;
> > +	}
> > +
> > +	if (debug_tweak_on("XFS_SCRUB_FORCE_REPAIR") && !injected) {
> > +		ctx.mode = SCRUB_MODE_REPAIR;
> > +		injected = true;
> > +	}
> 
> what is "injected" used for?  How could it already be set?

Not needed here, will remove.

> > +
> > +	if (xfs_scrub_excessive_errors(&ctx))
> > +		str_info(&ctx, ctx.mntpoint, _("Too many errors; aborting."));
> 
> wait wut?  oh right, you'll add $DO_STUFF above this in later patches ;)
> 
> 
> Rest looks ok

Ok.

--D

> 
> -Eric


* Re: [PATCH v11 00/27] xfsprogs: online scrub/repair support
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (27 preceding siblings ...)
  2018-01-06  3:50 ` [PATCH 07/27] xfs_scrub: find XFS filesystem geometry Darrick J. Wong
@ 2018-01-12  4:17 ` Eric Sandeen
  2018-01-17  1:31   ` Darrick J. Wong
  2018-01-16 19:21 ` [PATCH 28/27] xfs_scrub: wire up repair ioctl Darrick J. Wong
  2018-01-16 19:21 ` [PATCH 29/27] xfs_scrub: schedule and manage repairs to the filesystem Darrick J. Wong
  30 siblings, 1 reply; 61+ messages in thread
From: Eric Sandeen @ 2018-01-12  4:17 UTC (permalink / raw)
  To: Darrick J. Wong, sandeen; +Cc: linux-xfs

On 1/5/18 7:51 PM, Darrick J. Wong wrote:
> Hi all,
> 
> This is the eleventh revision of a patchset that adds to XFS userland tools
> support for online metadata scrubbing and repair.  Since v10 I've rebased
> to the latest for-next, fixed some wonky error messages, and fixed a few
> minor problems I found via code inspection.  However, this patch series is
> more or less the same as v10.

General note rather than finding the patches they came from ;)

these can be made static and in some cases removed from header files,
and/or ... hm, some aren't used at all.

  'bitmap_dump' is unique to scrub/bitmap.o  (function)
  'bitmap_iterate' is unique to scrub/bitmap.o  (function)
  'do_error' is unique to scrub/common.o  (function)
  'display_rusage' is unique to scrub/xfs_scrub.o  (global variable)
  'is_service' is unique to scrub/xfs_scrub.o  (global variable)
  'progname' is unique to scrub/xfs_scrub.o  (global variable)
  'scrub_data' is unique to scrub/xfs_scrub.o  (global variable)
  'xfs_check_rmap_ioerr' is unique to scrub/phase6.o  (function)


bitmap_dump (and so bitmap_iterate) are unused
do_error is unused as well?

-Eric


* [PATCH 28/27] xfs_scrub: wire up repair ioctl
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (28 preceding siblings ...)
  2018-01-12  4:17 ` [PATCH v11 00/27] xfsprogs: online scrub/repair support Eric Sandeen
@ 2018-01-16 19:21 ` Darrick J. Wong
  2018-01-16 19:21 ` [PATCH 29/27] xfs_scrub: schedule and manage repairs to the filesystem Darrick J. Wong
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-16 19:21 UTC (permalink / raw)
  To: sandeen; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create the mechanism we need to actually call the kernel's online repair
functionality.  The interface will consume a repair description; the
descriptor management will follow in the next patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/common.c    |   51 +++++++++++++++++++
 scrub/common.h    |    2 +
 scrub/phase1.c    |   15 ++++++
 scrub/phase2.c    |    1 
 scrub/phase3.c    |    1 
 scrub/phase5.c    |    1 
 scrub/scrub.c     |  139 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/scrub.h     |   20 ++++++++
 scrub/xfs_scrub.h |    2 +
 9 files changed, 232 insertions(+)

diff --git a/scrub/common.c b/scrub/common.c
index 48ee01c..bd3e939 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -180,6 +180,57 @@ __str_info(
 	pthread_mutex_unlock(&ctx->lock);
 }
 
+/* Increment the repair count. */
+void
+__record_repair(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	const char		*file,
+	int			line,
+	const char		*format,
+	...)
+{
+	va_list			args;
+
+	pthread_mutex_lock(&ctx->lock);
+	fprintf(stderr, _("Repaired: %s: "), descr);
+	va_start(args, format);
+	vfprintf(stderr, format, args);
+	va_end(args);
+	if (debug)
+		fprintf(stderr, _(" (%s line %d)"), file, line);
+	fprintf(stderr, "\n");
+	ctx->repairs++;
+	pthread_mutex_unlock(&ctx->lock);
+}
+
+/* Increment the optimization (preening) count. */
+void
+__record_preen(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	const char		*file,
+	int			line,
+	const char		*format,
+	...)
+{
+	va_list			args;
+
+	pthread_mutex_lock(&ctx->lock);
+	if (debug || verbose) {
+		fprintf(stdout, _("Optimized: %s: "), descr);
+		va_start(args, format);
+		vfprintf(stdout, format, args);
+		va_end(args);
+		if (debug)
+			fprintf(stdout, _(" (%s line %d)"), file, line);
+		fprintf(stdout, "\n");
+		fflush(stdout);
+	}
+	ctx->preens++;
+	pthread_mutex_unlock(&ctx->lock);
+}
+
 /* Catch fatal errors from pieces we import from xfs_repair. */
 void __attribute__((noreturn))
 do_error(char const *msg, ...)
diff --git a/scrub/common.h b/scrub/common.h
index bd67a17..ea3bc3f 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -49,6 +49,8 @@ void __str_errno_warn(struct scrub_ctx *, const char *descr, const char *file,
 #define str_warn(ctx, str, ...)		__str_warn(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
 #define str_info(ctx, str, ...)		__str_info(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
 #define str_errno_warn(ctx, str)	__str_errno_warn(ctx, str, __FILE__, __LINE__)
+#define record_repair(ctx, str, ...)	__record_repair(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
+#define record_preen(ctx, str, ...)	__record_preen(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
 #define dbg_printf(fmt, ...)		{if (debug > 1) {printf(fmt, __VA_ARGS__);}}
 
 /* Is this debug tweak enabled? */
diff --git a/scrub/phase1.c b/scrub/phase1.c
index d7a321f..3a2fbd7 100644
--- a/scrub/phase1.c
+++ b/scrub/phase1.c
@@ -176,6 +176,21 @@ _("Does not appear to be an XFS filesystem!"));
 	    !xfs_can_scrub_parent(ctx))
 		return false;
 
+	/* Do we have kernel-assisted metadata repair? */
+	if (ctx->mode != SCRUB_MODE_DRY_RUN && !xfs_can_repair(ctx)) {
+		if (ctx->mode == SCRUB_MODE_PREEN) {
+			/* w/o repair, demote preen to dry run. */
+			if (debug || verbose)
+				str_info(ctx, ctx->mntpoint,
+_("Metadata repairing not supported; demoting to scan mode.")
+						);
+			ctx->mode = SCRUB_MODE_DRY_RUN;
+		} else {
+			/* Repair mode w/o repair; abort. */
+			return false;
+		}
+	}
+
 	/* Go find the XFS devices if we have a usable fsmap. */
 	fs_table_initialise(0, NULL, 0, NULL);
 	errno = 0;
diff --git a/scrub/phase2.c b/scrub/phase2.c
index e8eb1ca..32e2752 100644
--- a/scrub/phase2.c
+++ b/scrub/phase2.c
@@ -24,6 +24,7 @@
 #include <sys/stat.h>
 #include <sys/statvfs.h>
 #include "xfs.h"
+#include "list.h"
 #include "path.h"
 #include "workqueue.h"
 #include "xfs_scrub.h"
diff --git a/scrub/phase3.c b/scrub/phase3.c
index 43697c6..f4117b0 100644
--- a/scrub/phase3.c
+++ b/scrub/phase3.c
@@ -24,6 +24,7 @@
 #include <sys/stat.h>
 #include <sys/statvfs.h>
 #include "xfs.h"
+#include "list.h"
 #include "path.h"
 #include "workqueue.h"
 #include "xfs_scrub.h"
diff --git a/scrub/phase5.c b/scrub/phase5.c
index fc3308b..703b279 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -29,6 +29,7 @@
 #endif
 #include "xfs.h"
 #include "handle.h"
+#include "list.h"
 #include "path.h"
 #include "workqueue.h"
 #include "xfs_scrub.h"
diff --git a/scrub/scrub.c b/scrub/scrub.c
index bc4eab4..5729b9b 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -28,6 +28,7 @@
 #include <sys/statvfs.h>
 #include "xfs.h"
 #include "xfs_fs.h"
+#include "list.h"
 #include "path.h"
 #include "xfs_scrub.h"
 #include "common.h"
@@ -561,10 +562,20 @@ __xfs_scrub_test(
 	bool				repair)
 {
 	struct xfs_scrub_metadata	meta = {0};
+	struct xfs_error_injection	inject;
+	static bool			injected;
 	int				error;
 
 	if (debug_tweak_on("XFS_SCRUB_NO_KERNEL"))
 		return false;
+	if (debug_tweak_on("XFS_SCRUB_FORCE_REPAIR") && !injected) {
+		inject.fd = ctx->mnt_fd;
+		inject.errtag = XFS_ERRTAG_FORCE_SCRUB_REPAIR;
+		error = ioctl(ctx->mnt_fd,
+				XFS_IOC_ERROR_INJECTION, &inject);
+		if (error == 0)
+			injected = true;
+	}
 
 	meta.sm_type = type;
 	if (repair)
@@ -646,3 +657,131 @@ xfs_can_scrub_parent(
 {
 	return __xfs_scrub_test(ctx, XFS_SCRUB_TYPE_PARENT, false);
 }
+
+bool
+xfs_can_repair(
+	struct scrub_ctx	*ctx)
+{
+	return __xfs_scrub_test(ctx, XFS_SCRUB_TYPE_PROBE, true);
+}
+
+/* General repair routines. */
+
+/* Repair some metadata. */
+enum check_outcome
+xfs_repair_metadata(
+	struct scrub_ctx		*ctx,
+	int				fd,
+	struct repair_item		*ri,
+	unsigned int			repair_flags)
+{
+	char				buf[DESCR_BUFSZ];
+	struct xfs_scrub_metadata	meta = { 0 };
+	struct xfs_scrub_metadata	oldm;
+	int				error;
+
+	assert(ri->type < XFS_SCRUB_TYPE_NR);
+	assert(!debug_tweak_on("XFS_SCRUB_NO_KERNEL"));
+	meta.sm_type = ri->type;
+	meta.sm_flags = ri->flags | XFS_SCRUB_IFLAG_REPAIR;
+	switch (scrubbers[ri->type].type) {
+	case ST_AGHEADER:
+	case ST_PERAG:
+		meta.sm_agno = ri->agno;
+		break;
+	case ST_INODE:
+		meta.sm_ino = ri->ino;
+		meta.sm_gen = ri->gen;
+		break;
+	default:
+		break;
+	}
+
+	/*
+	 * If this is a preen operation but we're only repairing
+	 * critical items, defer the preening until later.
+	 */
+	if (!needs_repair(&meta) && (repair_flags & XRM_REPAIR_ONLY))
+		return CHECK_RETRY;
+
+	memcpy(&oldm, &meta, sizeof(oldm));
+	format_scrub_descr(buf, DESCR_BUFSZ, &meta, &scrubbers[meta.sm_type]);
+
+	if (needs_repair(&meta))
+		str_info(ctx, buf, _("Attempting repair."));
+	else if (debug || verbose)
+		str_info(ctx, buf, _("Attempting optimization."));
+
+	error = ioctl(fd, XFS_IOC_SCRUB_METADATA, &meta);
+	/*
+	 * If the caller doesn't want us to complain, tell the caller to
+	 * requeue the repair for later and don't say a thing.
+	 */
+	if (!(repair_flags & XRM_NOFIX_COMPLAIN) &&
+	    (error || needs_repair(&meta)))
+		return CHECK_RETRY;
+	if (error) {
+		switch (errno) {
+		case EDEADLOCK:
+		case EBUSY:
+			/* Filesystem is busy, try again later. */
+			if (debug || verbose)
+				str_info(ctx, buf,
+_("Filesystem is busy, deferring repair."));
+			return CHECK_RETRY;
+		case ESHUTDOWN:
+			/* Filesystem is already shut down, abort. */
+			str_error(ctx, buf,
+_("Filesystem is shut down, aborting."));
+			return CHECK_ABORT;
+		case ENOTTY:
+		case EOPNOTSUPP:
+			/*
+			 * If we forced repairs, don't complain if kernel
+			 * doesn't know how to fix.
+			 */
+			if (debug_tweak_on("XFS_SCRUB_FORCE_REPAIR"))
+				return CHECK_DONE;
+			/* fall through */
+		case EINVAL:
+			/* Kernel doesn't know how to repair this? */
+			str_error(ctx, buf,
+_("Don't know how to fix; offline repair required."));
+			return CHECK_DONE;
+		case EROFS:
+			/* Read-only filesystem, can't fix. */
+			if (verbose || debug || needs_repair(&oldm))
+				str_info(ctx, buf,
+_("Read-only filesystem; cannot make changes."));
+			return CHECK_DONE;
+		case ENOENT:
+			/* Metadata not present, just skip it. */
+			return CHECK_DONE;
+		case ENOMEM:
+		case ENOSPC:
+			/* Don't care if preen fails due to low resources. */
+			if (is_unoptimized(&oldm) && !needs_repair(&oldm))
+				return CHECK_DONE;
+			/* fall through */
+		default:
+			/* Operational error. */
+			str_errno(ctx, buf);
+			return CHECK_DONE;
+		}
+	}
+	if (repair_flags & XRM_NOFIX_COMPLAIN)
+		xfs_scrub_warn_incomplete_scrub(ctx, buf, &meta);
+	if (needs_repair(&meta)) {
+		/* Still broken, try again or fix offline. */
+		if (repair_flags & XRM_NOFIX_COMPLAIN)
+			str_error(ctx, buf,
+_("Repair unsuccessful; offline repair required."));
+	} else {
+		/* Clean operation, no corruption detected. */
+		if (needs_repair(&oldm))
+			record_repair(ctx, buf, _("Repairs successful."));
+		else
+			record_preen(ctx, buf, _("Optimization successful."));
+	}
+	return CHECK_DONE;
+}
diff --git a/scrub/scrub.h b/scrub/scrub.h
index 0b454df..1c44fba 100644
--- a/scrub/scrub.h
+++ b/scrub/scrub.h
@@ -41,6 +41,7 @@ bool xfs_can_scrub_dir(struct scrub_ctx *ctx);
 bool xfs_can_scrub_attr(struct scrub_ctx *ctx);
 bool xfs_can_scrub_symlink(struct scrub_ctx *ctx);
 bool xfs_can_scrub_parent(struct scrub_ctx *ctx);
+bool xfs_can_repair(struct scrub_ctx *ctx);
 
 bool xfs_scrub_inode_fields(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
 		int fd);
@@ -59,4 +60,23 @@ bool xfs_scrub_symlink(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
 bool xfs_scrub_parent(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
 		int fd);
 
+/* Repair parameters are the scrub inputs and retry count. */
+struct repair_item {
+	struct list_head	list;
+	__u64			ino;
+	__u32			type;
+	__u32			flags;
+	__u32			gen;
+	__u32			agno;
+};
+
+/* Only perform repairs; leave optimization-only actions for later. */
+#define XRM_REPAIR_ONLY		(1U << 0)
+
+/* Complain if still broken even after fix. */
+#define XRM_NOFIX_COMPLAIN	(1U << 1)
+
+enum check_outcome xfs_repair_metadata(struct scrub_ctx *ctx, int fd,
+		struct repair_item *ri, unsigned int repair_flags);
+
 #endif /* XFS_SCRUB_SCRUB_H_ */
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 9b5e490..83b8ae2 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -97,6 +97,8 @@ struct scrub_ctx {
 	unsigned long long	inodes_checked;
 	unsigned long long	bytes_checked;
 	unsigned long long	naming_warnings;
+	unsigned long long	repairs;
+	unsigned long long	preens;
 	bool			need_repair;
 	bool			preen_triggers[XFS_SCRUB_TYPE_NR];
 };
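
As a reading aid for the errno handling in xfs_repair_metadata() above,
here is a condensed sketch of how the ioctl errors map onto outcomes.  It
is not part of the patch: the enum definition is assumed (the patch uses
the CHECK_* names but does not show their declaration), the helper name is
hypothetical, and all of the logging and the XRM_* flag handling is left
out:

```c
#include <errno.h>

/* EDEADLOCK is Linux-specific; fall back to EDEADLK where it is missing. */
#ifndef EDEADLOCK
#define EDEADLOCK	EDEADLK
#endif

/* Assumed definition; the patch uses these names but does not show the enum. */
enum check_outcome {
	CHECK_DONE,
	CHECK_ABORT,
	CHECK_RETRY,
};

/* Map an errno from the repair ioctl onto what the caller should do next. */
static enum check_outcome
classify_repair_errno(
	int	error)
{
	switch (error) {
	case EDEADLOCK:
	case EBUSY:
		/* Filesystem is busy; requeue the repair for later. */
		return CHECK_RETRY;
	case ESHUTDOWN:
		/* Filesystem is already shut down; stop everything. */
		return CHECK_ABORT;
	default:
		/* Reported (or deliberately ignored) and not retried. */
		return CHECK_DONE;
	}
}
```

Only the busy cases are worth retrying and only a shutdown aborts the run;
every other error is final for that repair item, whether it was reported
to the user or skipped quietly.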


* [PATCH 29/27] xfs_scrub: schedule and manage repairs to the filesystem
  2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
                   ` (29 preceding siblings ...)
  2018-01-16 19:21 ` [PATCH 28/27] xfs_scrub: wire up repair ioctl Darrick J. Wong
@ 2018-01-16 19:21 ` Darrick J. Wong
  30 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-16 19:21 UTC (permalink / raw)
  To: sandeen; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Teach xfs_scrub to remember scrub requests that failed (or indicated
that optimization is a possibility) as repair requests that can be
deferred until later.  Add a new repair phase that deals with the
repair requests.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 man/man8/xfs_scrub.8 |   27 ++++-
 scrub/Makefile       |    2 
 scrub/phase1.c       |    7 +
 scrub/phase2.c       |   59 +++++++++-
 scrub/phase3.c       |   42 +++++--
 scrub/phase4.c       |   76 ++++++++++++-
 scrub/repair.c       |  299 ++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/repair.h       |   55 +++++++++
 scrub/scrub.c        |  107 +++++++++++++-----
 scrub/scrub.h        |   32 +++--
 scrub/xfs_scrub.c    |   22 ++++
 scrub/xfs_scrub.h    |    1 
 12 files changed, 667 insertions(+), 62 deletions(-)
 create mode 100644 scrub/repair.c
 create mode 100644 scrub/repair.h

diff --git a/man/man8/xfs_scrub.8 b/man/man8/xfs_scrub.8
index 4c394a5..ce5d876 100644
--- a/man/man8/xfs_scrub.8
+++ b/man/man8/xfs_scrub.8
@@ -114,9 +114,27 @@ Instructing the underlying storage to discard unused extents via the
 .B FITRIM
 ioctl.
 .SH REPAIRS
-This program currently does not support making any repairs.
-Corruptions can only be fixed by unmounting the filesystem and running
-.BR xfs_repair (8).
+Repairs are performed by calling into the kernel.
+This limits the scope of repair activities to rebuilding primary data
+structures from secondary data structures, or secondary structures from
+primary structures.
+The existence of secondary data structures may require features that can
+only be turned on from
+.BR mkfs.xfs (8).
+If errors cannot be repaired, the filesystem must be
+unmounted and
+.BR xfs_repair (8)
+run.
+Repairs supported by the kernel include, but are not limited to:
+.IP \[bu] 2
+Reconstructing extent allocation data from the reverse mapping data.
+.IP \[bu]
+Reconstructing reverse mapping data from primary extent allocation data.
+.IP \[bu]
+Scheduling a quotacheck for the next mount.
+.PP
+If corrupt metadata is successfully repaired, this program will log that
+a repair has succeeded instead of a corruption report.
 .SH EXIT CODE
 The exit code returned by
 .B xfs_scrub
@@ -140,8 +158,5 @@ This program takes advantage of in-kernel scrubbing to verify a given
 data structure with locks held and can keep the filesystem busy for a
 long time.
 The kernel must be new enough to support the SCRUB_METADATA ioctl.
-.PP
-If errors are found and cannot be repaired, the filesystem must be
-unmounted and repaired.
 .SH SEE ALSO
 .BR xfs_repair (8).
diff --git a/scrub/Makefile b/scrub/Makefile
index 597b2eb..7cdada2 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -37,6 +37,7 @@ fscounters.h \
 inodes.h \
 progress.h \
 read_verify.h \
+repair.h \
 scrub.h \
 spacemap.h \
 unicrash.h \
@@ -60,6 +61,7 @@ phase6.c \
 phase7.c \
 progress.c \
 read_verify.c \
+repair.c \
 scrub.c \
 spacemap.c \
 vfs.c \
diff --git a/scrub/phase1.c b/scrub/phase1.c
index 3a2fbd7..f7d01d1 100644
--- a/scrub/phase1.c
+++ b/scrub/phase1.c
@@ -47,6 +47,7 @@
 #include "common.h"
 #include "disk.h"
 #include "scrub.h"
+#include "repair.h"
 
 /* Phase 1: Find filesystem geometry (and clean up after) */
 
@@ -68,6 +69,7 @@ bool
 xfs_cleanup_fs(
 	struct scrub_ctx	*ctx)
 {
+	xfs_repair_lists_free(&ctx->repair_lists);
 	if (ctx->fshandle)
 		free_handle(ctx->fshandle, ctx->fshandle_len);
 	if (ctx->rtdev)
@@ -157,6 +159,11 @@ _("Does not appear to be an XFS filesystem!"));
 		return false;
 	}
 
+	if (!xfs_repair_lists_alloc(ctx->geo.agcount, &ctx->repair_lists)) {
+		str_error(ctx, ctx->mntpoint, _("Not enough memory."));
+		return false;
+	}
+
 	ctx->agblklog = log2_roundup(ctx->geo.agblocks);
 	ctx->blocklog = highbit32(ctx->geo.blocksize);
 	ctx->inodelog = highbit32(ctx->geo.inodesize);
diff --git a/scrub/phase2.c b/scrub/phase2.c
index 32e2752..5669f0a 100644
--- a/scrub/phase2.c
+++ b/scrub/phase2.c
@@ -30,6 +30,7 @@
 #include "xfs_scrub.h"
 #include "common.h"
 #include "scrub.h"
+#include "repair.h"
 
 /* Phase 2: Check internal metadata. */
 
@@ -42,24 +43,65 @@ xfs_scan_ag_metadata(
 {
 	struct scrub_ctx		*ctx = (struct scrub_ctx *)wq->wq_ctx;
 	bool				*pmoveon = arg;
+	struct xfs_repair_list		repairs;
+	struct xfs_repair_list		repair_now;
+	unsigned long long		broken_primaries;
+	unsigned long long		broken_secondaries;
 	bool				moveon;
 	char				descr[DESCR_BUFSZ];
 
+	xfs_repair_list_init(&repairs);
+	xfs_repair_list_init(&repair_now);
 	snprintf(descr, DESCR_BUFSZ, _("AG %u"), agno);
 
 	/*
 	 * First we scrub and fix the AG headers, because we need
 	 * them to work well enough to check the AG btrees.
 	 */
-	moveon = xfs_scrub_ag_headers(ctx, agno);
+	moveon = xfs_scrub_ag_headers(ctx, agno, &repairs);
+	if (!moveon)
+		goto err;
+
+	/* Repair header damage. */
+	moveon = xfs_quick_repair(ctx, agno, &repairs);
 	if (!moveon)
 		goto err;
 
 	/* Now scrub the AG btrees. */
-	moveon = xfs_scrub_ag_metadata(ctx, agno);
+	moveon = xfs_scrub_ag_metadata(ctx, agno, &repairs);
+	if (!moveon)
+		goto err;
+
+	/*
+	 * Figure out if we need to perform early fixing.  The only
+	 * reason we need to do this is if the inobt is broken, which
+	 * prevents phase 3 (inode scan) from running.  We can rebuild
+	 * the inobt from rmapbt data, but if the rmapbt is broken even
+	 * at this early phase then we are sunk.
+	 */
+	broken_secondaries = 0;
+	broken_primaries = 0;
+	xfs_repair_find_mustfix(&repairs, &repair_now,
+			&broken_primaries, &broken_secondaries);
+	if (broken_secondaries && !debug_tweak_on("XFS_SCRUB_FORCE_REPAIR")) {
+		if (broken_primaries)
+			str_info(ctx, descr,
+_("Corrupt primary and secondary block mapping metadata."));
+		else
+			str_info(ctx, descr,
+_("Corrupt secondary block mapping metadata."));
+		str_info(ctx, descr,
+_("Filesystem might not be repairable."));
+	}
+
+	/* Repair (inode) btree damage. */
+	moveon = xfs_quick_repair(ctx, agno, &repair_now);
 	if (!moveon)
 		goto err;
 
+	/* Everything else gets fixed during phase 4. */
+	xfs_defer_repairs(ctx, agno, &repairs);
+
 	return;
 err:
 	*pmoveon = false;
@@ -74,11 +116,15 @@ xfs_scan_fs_metadata(
 {
 	struct scrub_ctx		*ctx = (struct scrub_ctx *)wq->wq_ctx;
 	bool				*pmoveon = arg;
+	struct xfs_repair_list		repairs;
 	bool				moveon;
 
-	moveon = xfs_scrub_fs_metadata(ctx);
+	xfs_repair_list_init(&repairs);
+	moveon = xfs_scrub_fs_metadata(ctx, &repairs);
 	if (!moveon)
 		*pmoveon = false;
+
+	xfs_defer_repairs(ctx, agno, &repairs);
 }
 
 /* Scan all filesystem metadata. */
@@ -86,6 +132,7 @@ bool
 xfs_scan_metadata(
 	struct scrub_ctx	*ctx)
 {
+	struct xfs_repair_list	repairs;
 	struct workqueue	wq;
 	xfs_agnumber_t		agno;
 	bool			moveon = true;
@@ -103,7 +150,11 @@ xfs_scan_metadata(
 	 * upgrades (followed by a full scrub), do that before we launch
 	 * anything else.
 	 */
-	moveon = xfs_scrub_primary_super(ctx);
+	xfs_repair_list_init(&repairs);
+	moveon = xfs_scrub_primary_super(ctx, &repairs);
+	if (!moveon)
+		return moveon;
+	moveon = xfs_quick_repair(ctx, 0, &repairs);
 	if (!moveon)
 		return moveon;
 
diff --git a/scrub/phase3.c b/scrub/phase3.c
index f4117b0..7fb0120 100644
--- a/scrub/phase3.c
+++ b/scrub/phase3.c
@@ -33,6 +33,7 @@
 #include "inodes.h"
 #include "progress.h"
 #include "scrub.h"
+#include "repair.h"
 
 /* Phase 3: Scan all inodes. */
 
@@ -45,10 +46,11 @@ static bool
 xfs_scrub_fd(
 	struct scrub_ctx	*ctx,
 	bool			(*fn)(struct scrub_ctx *, uint64_t,
-				      uint32_t, int),
-	struct xfs_bstat	*bs)
+				      uint32_t, int, struct xfs_repair_list *),
+	struct xfs_bstat	*bs,
+	struct xfs_repair_list	*rl)
 {
-	return fn(ctx, bs->bs_ino, bs->bs_gen, ctx->mnt_fd);
+	return fn(ctx, bs->bs_ino, bs->bs_gen, ctx->mnt_fd, rl);
 }
 
 struct scrub_inode_ctx {
@@ -64,11 +66,15 @@ xfs_scrub_inode(
 	struct xfs_bstat	*bstat,
 	void			*arg)
 {
+	struct xfs_repair_list	repairs;
 	struct scrub_inode_ctx	*ictx = arg;
 	struct ptcounter	*icount = ictx->icount;
+	xfs_agnumber_t		agno;
 	bool			moveon = true;
 	int			fd = -1;
 
+	xfs_repair_list_init(&repairs);
+	agno = bstat->bs_ino / (1ULL << (ctx->inopblog + ctx->agblklog));
 	background_sleep();
 
 	/* Try to open the inode to pin it. */
@@ -80,45 +86,59 @@ xfs_scrub_inode(
 	}
 
 	/* Scrub the inode. */
-	moveon = xfs_scrub_fd(ctx, xfs_scrub_inode_fields, bstat);
+	moveon = xfs_scrub_fd(ctx, xfs_scrub_inode_fields, bstat, &repairs);
+	if (!moveon)
+		goto out;
+
+	moveon = xfs_quick_repair(ctx, agno, &repairs);
 	if (!moveon)
 		goto out;
 
 	/* Scrub all block mappings. */
-	moveon = xfs_scrub_fd(ctx, xfs_scrub_data_fork, bstat);
+	moveon = xfs_scrub_fd(ctx, xfs_scrub_data_fork, bstat, &repairs);
 	if (!moveon)
 		goto out;
-	moveon = xfs_scrub_fd(ctx, xfs_scrub_attr_fork, bstat);
+	moveon = xfs_scrub_fd(ctx, xfs_scrub_attr_fork, bstat, &repairs);
 	if (!moveon)
 		goto out;
-	moveon = xfs_scrub_fd(ctx, xfs_scrub_cow_fork, bstat);
+	moveon = xfs_scrub_fd(ctx, xfs_scrub_cow_fork, bstat, &repairs);
+	if (!moveon)
+		goto out;
+
+	moveon = xfs_quick_repair(ctx, agno, &repairs);
 	if (!moveon)
 		goto out;
 
 	if (S_ISLNK(bstat->bs_mode)) {
 		/* Check symlink contents. */
 		moveon = xfs_scrub_symlink(ctx, bstat->bs_ino,
-				bstat->bs_gen, ctx->mnt_fd);
+				bstat->bs_gen, ctx->mnt_fd, &repairs);
 	} else if (S_ISDIR(bstat->bs_mode)) {
 		/* Check the directory entries. */
-		moveon = xfs_scrub_fd(ctx, xfs_scrub_dir, bstat);
+		moveon = xfs_scrub_fd(ctx, xfs_scrub_dir, bstat, &repairs);
 	}
 	if (!moveon)
 		goto out;
 
 	/* Check all the extended attributes. */
-	moveon = xfs_scrub_fd(ctx, xfs_scrub_attr, bstat);
+	moveon = xfs_scrub_fd(ctx, xfs_scrub_attr, bstat, &repairs);
 	if (!moveon)
 		goto out;
 
 	/* Check parent pointers. */
-	moveon = xfs_scrub_fd(ctx, xfs_scrub_parent, bstat);
+	moveon = xfs_scrub_fd(ctx, xfs_scrub_parent, bstat, &repairs);
+	if (!moveon)
+		goto out;
+
+	/* Try to repair the file while it's open. */
+	moveon = xfs_quick_repair(ctx, agno, &repairs);
 	if (!moveon)
 		goto out;
 
 out:
 	ptcounter_add(icount, 1);
 	progress_add(1);
+	xfs_defer_repairs(ctx, agno, &repairs);
 	if (fd >= 0)
 		close(fd);
 	if (!moveon)
diff --git a/scrub/phase4.c b/scrub/phase4.c
index 9c81069..b502238 100644
--- a/scrub/phase4.c
+++ b/scrub/phase4.c
@@ -33,16 +33,82 @@
 #include "common.h"
 #include "progress.h"
 #include "scrub.h"
+#include "repair.h"
 #include "vfs.h"
 
 /* Phase 4: Repair filesystem. */
 
+/* Fix all the problems in our per-AG list. */
+static void
+xfs_repair_ag(
+	struct workqueue		*wq,
+	xfs_agnumber_t			agno,
+	void				*priv)
+{
+	struct scrub_ctx		*ctx = (struct scrub_ctx *)wq->wq_ctx;
+	bool				*pmoveon = priv;
+	struct xfs_repair_list		*repairs;
+	size_t				unfixed;
+	size_t				new_unfixed;
+	unsigned int			flags = 0;
+	bool				moveon;
+
+	repairs = &ctx->repair_lists[agno];
+	unfixed = xfs_repair_list_length(repairs);
+
+	/* Repair anything broken until we fail to make progress. */
+	do {
+		moveon = xfs_repair_list_now(ctx, ctx->mnt_fd, repairs, flags);
+		if (!moveon) {
+			*pmoveon = false;
+			return;
+		}
+		new_unfixed = xfs_repair_list_length(repairs);
+		if (new_unfixed == unfixed)
+			break;
+		unfixed = new_unfixed;
+	} while (unfixed > 0 && *pmoveon);
+
+	if (!*pmoveon)
+		return;
+
+	/* Try once more, but this time complain if we can't fix things. */
+	flags |= XRML_NOFIX_COMPLAIN;
+	moveon = xfs_repair_list_now(ctx, ctx->mnt_fd, repairs, flags);
+	if (!moveon)
+		*pmoveon = false;
+}
+
 /* Fix everything that needs fixing. */
 bool
 xfs_repair_fs(
 	struct scrub_ctx		*ctx)
 {
+	struct workqueue		wq;
+	xfs_agnumber_t			agno;
 	bool				moveon = true;
+	int				ret;
+
+	ret = workqueue_create(&wq, (struct xfs_mount *)ctx,
+			scrub_nproc_workqueue(ctx));
+	if (ret) {
+		str_error(ctx, ctx->mntpoint, _("Could not create workqueue."));
+		return false;
+	}
+	for (agno = 0; agno < ctx->geo.agcount; agno++) {
+		if (xfs_repair_list_length(&ctx->repair_lists[agno]) > 0) {
+			ret = workqueue_add(&wq, xfs_repair_ag, agno, &moveon);
+			if (ret) {
+				moveon = false;
+				str_error(ctx, ctx->mntpoint,
+_("Could not queue repair work."));
+				break;
+			}
+		}
+		if (!moveon)
+			break;
+	}
+	workqueue_destroy(&wq);
 
 	pthread_mutex_lock(&ctx->lock);
 	if (moveon && ctx->errors_found == 0 && want_fstrim) {
@@ -62,8 +128,14 @@ xfs_estimate_repair_work(
 	unsigned int		*nr_threads,
 	int			*rshift)
 {
-	*items = 1;
-	*nr_threads = 1;
+	xfs_agnumber_t		agno;
+	size_t			need_fixing = 0;
+
+	for (agno = 0; agno < ctx->geo.agcount; agno++)
+		need_fixing += xfs_repair_list_length(&ctx->repair_lists[agno]);
+	need_fixing++;
+	*items = need_fixing;
+	*nr_threads = scrub_nproc(ctx) + 1;
 	*rshift = 0;
 	return true;
 }
diff --git a/scrub/repair.c b/scrub/repair.c
new file mode 100644
index 0000000..4a6d7b7
--- /dev/null
+++ b/scrub/repair.c
@@ -0,0 +1,299 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/statvfs.h>
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "list.h"
+#include "path.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "scrub.h"
+#include "repair.h"
+
+/*
+ * Prioritize repair items in order of how long we can wait.
+ * 0 = do it now, 10000 = do it later.
+ *
+ * To minimize the amount of repair work, we want to prioritize metadata
+ * objects by perceived corruptness.  If CORRUPT is set, the fields are
+ * just plain bad; try fixing that first.  Otherwise if XCORRUPT is set,
+ * the fields could be bad, but the xref data could also be bad; we'll
+ * try fixing that next.  Finally, if XFAIL is set, some other metadata
+ * structure failed validation during xref, so we'll recheck this
+ * metadata last since it was probably fine.
+ *
+ * For metadata that lie in the critical path of checking other metadata
+ * (superblock, AG{F,I,FL}, inobt) we scrub and fix those things before
+ * we even get to handling their dependencies, so things should progress
+ * in order.
+ */
+
+/* Sort repair items in severity order. */
+static int
+PRIO(
+	struct repair_item	*ri,
+	int			order)
+{
+	if (ri->flags & XFS_SCRUB_OFLAG_CORRUPT)
+		return order;
+	else if (ri->flags & XFS_SCRUB_OFLAG_XCORRUPT)
+		return 100 + order;
+	else if (ri->flags & XFS_SCRUB_OFLAG_XFAIL)
+		return 200 + order;
+	else if (ri->flags & XFS_SCRUB_OFLAG_PREEN)
+		return 300 + order;
+	abort();
+}
+
+/* Sort the repair items in dependency order. */
+static int
+xfs_repair_item_priority(
+	struct repair_item	*ri)
+{
+	switch (ri->type) {
+	case XFS_SCRUB_TYPE_SB:
+	case XFS_SCRUB_TYPE_AGF:
+	case XFS_SCRUB_TYPE_AGFL:
+	case XFS_SCRUB_TYPE_AGI:
+	case XFS_SCRUB_TYPE_BNOBT:
+	case XFS_SCRUB_TYPE_CNTBT:
+	case XFS_SCRUB_TYPE_INOBT:
+	case XFS_SCRUB_TYPE_FINOBT:
+	case XFS_SCRUB_TYPE_REFCNTBT:
+	case XFS_SCRUB_TYPE_RMAPBT:
+	case XFS_SCRUB_TYPE_INODE:
+	case XFS_SCRUB_TYPE_BMBTD:
+	case XFS_SCRUB_TYPE_BMBTA:
+	case XFS_SCRUB_TYPE_BMBTC:
+		return PRIO(ri, ri->type - 1);
+	case XFS_SCRUB_TYPE_DIR:
+	case XFS_SCRUB_TYPE_XATTR:
+	case XFS_SCRUB_TYPE_SYMLINK:
+	case XFS_SCRUB_TYPE_PARENT:
+		return PRIO(ri, XFS_SCRUB_TYPE_DIR);
+	case XFS_SCRUB_TYPE_RTBITMAP:
+	case XFS_SCRUB_TYPE_RTSUM:
+		return PRIO(ri, XFS_SCRUB_TYPE_RTBITMAP);
+	case XFS_SCRUB_TYPE_UQUOTA:
+	case XFS_SCRUB_TYPE_GQUOTA:
+	case XFS_SCRUB_TYPE_PQUOTA:
+		return PRIO(ri, XFS_SCRUB_TYPE_UQUOTA);
+	}
+	abort();
+}
+
+/* Make sure that btrees get repaired before headers. */
+static int
+xfs_repair_item_compare(
+	void				*priv,
+	struct list_head		*a,
+	struct list_head		*b)
+{
+	struct repair_item		*ra;
+	struct repair_item		*rb;
+
+	ra = container_of(a, struct repair_item, list);
+	rb = container_of(b, struct repair_item, list);
+
+	return xfs_repair_item_priority(ra) - xfs_repair_item_priority(rb);
+}
+
+/*
+ * Figure out which AG metadata must be fixed before we can move on
+ * to the inode scan.
+ */
+void
+xfs_repair_find_mustfix(
+	struct xfs_repair_list		*repairs,
+	struct xfs_repair_list		*repair_now,
+	unsigned long long		*broken_primaries,
+	unsigned long long		*broken_secondaries)
+{
+	struct repair_item		*n;
+	struct repair_item		*ri;
+
+	list_for_each_entry_safe(ri, n, &repairs->list, list) {
+		switch (ri->type) {
+		case XFS_SCRUB_TYPE_RMAPBT:
+			(*broken_secondaries)++;
+			break;
+		case XFS_SCRUB_TYPE_FINOBT:
+		case XFS_SCRUB_TYPE_INOBT:
+			repairs->nr--;
+			list_del(&ri->list);
+			list_add_tail(&ri->list, &repair_now->list);
+			repair_now->nr++;
+			/* fall through */
+		case XFS_SCRUB_TYPE_BNOBT:
+		case XFS_SCRUB_TYPE_CNTBT:
+		case XFS_SCRUB_TYPE_REFCNTBT:
+			(*broken_primaries)++;
+			break;
+		default:
+			abort();
+			break;
+		}
+	}
+}
+
+/* Allocate a certain number of repair lists for the scrub context. */
+bool
+xfs_repair_lists_alloc(
+	size_t				nr,
+	struct xfs_repair_list		**listsp)
+{
+	struct xfs_repair_list		*lists;
+	xfs_agnumber_t			agno;
+
+	lists = calloc(nr, sizeof(struct xfs_repair_list));
+	if (!lists)
+		return false;
+
+	for (agno = 0; agno < nr; agno++)
+		xfs_repair_list_init(&lists[agno]);
+	*listsp = lists;
+
+	return true;
+}
+
+/* Free the repair lists. */
+void
+xfs_repair_lists_free(
+	struct xfs_repair_list		**listsp)
+{
+	free(*listsp);
+	*listsp = NULL;
+}
+
+/* Initialize repair list */
+void
+xfs_repair_list_init(
+	struct xfs_repair_list		*rl)
+{
+	INIT_LIST_HEAD(&rl->list);
+	rl->nr = 0;
+	rl->sorted = false;
+}
+
+/* Number of repairs in this list. */
+size_t
+xfs_repair_list_length(
+	struct xfs_repair_list		*rl)
+{
+	return rl->nr;
+}
+
+/* Add to the list of repairs. */
+void
+xfs_repair_list_add(
+	struct xfs_repair_list		*rl,
+	struct repair_item		*ri)
+{
+	list_add_tail(&ri->list, &rl->list);
+	rl->nr++;
+	rl->sorted = false;
+}
+
+/* Splice two repair lists. */
+void
+xfs_repair_list_splice(
+	struct xfs_repair_list		*dest,
+	struct xfs_repair_list		*src)
+{
+	if (src->nr == 0)
+		return;
+
+	list_splice_tail_init(&src->list, &dest->list);
+	dest->nr += src->nr;
+	src->nr = 0;
+	dest->sorted = false;
+}
+
+/* Repair everything on this list. */
+bool
+xfs_repair_list_now(
+	struct scrub_ctx		*ctx,
+	int				fd,
+	struct xfs_repair_list		*rl,
+	unsigned int			repair_flags)
+{
+	struct repair_item		*ri;
+	struct repair_item		*n;
+	enum check_outcome		fix;
+
+	if (!rl->sorted) {
+		list_sort(NULL, &rl->list, xfs_repair_item_compare);
+		rl->sorted = true;
+	}
+
+	list_for_each_entry_safe(ri, n, &rl->list, list) {
+		fix = xfs_repair_metadata(ctx, fd, ri, repair_flags);
+		switch (fix) {
+		case CHECK_DONE:
+			rl->nr--;
+			list_del(&ri->list);
+			free(ri);
+			continue;
+		case CHECK_ABORT:
+			return false;
+		case CHECK_RETRY:
+			continue;
+		case CHECK_REPAIR:
+			abort();
+		}
+	}
+
+	return !xfs_scrub_excessive_errors(ctx);
+}
+
+/* Defer all the repairs until phase 4. */
+void
+xfs_defer_repairs(
+	struct scrub_ctx		*ctx,
+	xfs_agnumber_t			agno,
+	struct xfs_repair_list		*rl)
+{
+	ASSERT(agno < ctx->geo.agcount);
+
+	xfs_repair_list_splice(&ctx->repair_lists[agno], rl);
+}
+
+/* Quickly try to repair AG metadata; broken things are remembered for later. */
+bool
+xfs_quick_repair(
+	struct scrub_ctx		*ctx,
+	xfs_agnumber_t			agno,
+	struct xfs_repair_list		*rl)
+{
+	bool				moveon;
+
+	moveon = xfs_repair_list_now(ctx, ctx->mnt_fd, rl, XRML_REPAIR_ONLY);
+	if (!moveon)
+		return moveon;
+
+	xfs_defer_repairs(ctx, agno, rl);
+	return true;
+}
diff --git a/scrub/repair.h b/scrub/repair.h
new file mode 100644
index 0000000..3ae15ef
--- /dev/null
+++ b/scrub/repair.h
@@ -0,0 +1,55 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_REPAIR_H_
+#define XFS_SCRUB_REPAIR_H_
+
+struct xfs_repair_list {
+	struct list_head	list;
+	size_t			nr;
+	bool			sorted;
+};
+
+bool xfs_repair_lists_alloc(size_t nr, struct xfs_repair_list **listsp);
+void xfs_repair_lists_free(struct xfs_repair_list **listsp);
+
+void xfs_repair_list_init(struct xfs_repair_list *rl);
+size_t xfs_repair_list_length(struct xfs_repair_list *rl);
+void xfs_repair_list_add(struct xfs_repair_list *dest,
+		struct repair_item *item);
+void xfs_repair_list_splice(struct xfs_repair_list *dest,
+		struct xfs_repair_list *src);
+
+void xfs_repair_find_mustfix(struct xfs_repair_list *repairs,
+		struct xfs_repair_list *repair_now,
+		unsigned long long *broken_primaries,
+		unsigned long long *broken_secondaries);
+
+/* Passed through to xfs_repair_metadata() */
+#define XRML_REPAIR_ONLY	(XRM_REPAIR_ONLY)
+#define XRML_NOFIX_COMPLAIN	(XRM_NOFIX_COMPLAIN)
+
+bool xfs_repair_list_now(struct scrub_ctx *ctx, int fd,
+		struct xfs_repair_list *repair_list, unsigned int repair_flags);
+void xfs_defer_repairs(struct scrub_ctx *ctx, xfs_agnumber_t agno,
+		struct xfs_repair_list *rl);
+bool xfs_quick_repair(struct scrub_ctx *ctx, xfs_agnumber_t agno,
+		struct xfs_repair_list *rl);
+
+#endif /* XFS_SCRUB_REPAIR_H_ */
diff --git a/scrub/scrub.c b/scrub/scrub.c
index 5729b9b..55e8b98 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -35,6 +35,7 @@
 #include "progress.h"
 #include "scrub.h"
 #include "xfs_errortag.h"
+#include "repair.h"
 
 /* Online scrub and repair wrappers. */
 
@@ -321,12 +322,47 @@ _("Optimizations of %s are possible."), scrubbers[i].name);
 	}
 }
 
+/* Save a scrub context for later repairs. */
+bool
+xfs_scrub_save_repair(
+	struct scrub_ctx		*ctx,
+	struct xfs_repair_list		*rl,
+	struct xfs_scrub_metadata	*meta)
+{
+	struct repair_item		*ri;
+
+	/* Schedule this item for later repairs. */
+	ri = malloc(sizeof(struct repair_item));
+	if (!ri) {
+		str_errno(ctx, _("repair list"));
+		return false;
+	}
+	ri->type = meta->sm_type;
+	ri->flags = meta->sm_flags;
+	switch (scrubbers[meta->sm_type].type) {
+	case ST_AGHEADER:
+	case ST_PERAG:
+		ri->agno = meta->sm_agno;
+		break;
+	case ST_INODE:
+		ri->ino = meta->sm_ino;
+		ri->gen = meta->sm_gen;
+		break;
+	default:
+		break;
+	}
+
+	xfs_repair_list_add(rl, ri);
+	return true;
+}
+
 /* Scrub metadata, saving corruption reports for later. */
 static bool
 xfs_scrub_metadata(
 	struct scrub_ctx		*ctx,
 	enum scrub_type			scrub_type,
-	xfs_agnumber_t			agno)
+	xfs_agnumber_t			agno,
+	struct xfs_repair_list		*rl)
 {
 	struct xfs_scrub_metadata	meta = {0};
 	const struct scrub_descr	*sc;
@@ -350,6 +386,8 @@ xfs_scrub_metadata(
 		case CHECK_ABORT:
 			return false;
 		case CHECK_REPAIR:
+			if (!xfs_scrub_save_repair(ctx, rl, &meta))
+				return false;
 			/* fall through */
 		case CHECK_DONE:
 			continue;
@@ -369,7 +407,8 @@ xfs_scrub_metadata(
  */
 bool
 xfs_scrub_primary_super(
-	struct scrub_ctx		*ctx)
+	struct scrub_ctx		*ctx,
+	struct xfs_repair_list		*repair_list)
 {
 	struct xfs_scrub_metadata	meta = {
 		.sm_type = XFS_SCRUB_TYPE_SB,
@@ -382,6 +421,8 @@ xfs_scrub_primary_super(
 	case CHECK_ABORT:
 		return false;
 	case CHECK_REPAIR:
+		if (!xfs_scrub_save_repair(ctx, repair_list, &meta))
+			return false;
 		/* fall through */
 	case CHECK_DONE:
 		return true;
@@ -397,26 +438,29 @@ xfs_scrub_primary_super(
 bool
 xfs_scrub_ag_headers(
 	struct scrub_ctx		*ctx,
-	xfs_agnumber_t			agno)
+	xfs_agnumber_t			agno,
+	struct xfs_repair_list		*rl)
 {
-	return xfs_scrub_metadata(ctx, ST_AGHEADER, agno);
+	return xfs_scrub_metadata(ctx, ST_AGHEADER, agno, rl);
 }
 
 /* Scrub each AG's metadata btrees. */
 bool
 xfs_scrub_ag_metadata(
 	struct scrub_ctx		*ctx,
-	xfs_agnumber_t			agno)
+	xfs_agnumber_t			agno,
+	struct xfs_repair_list		*rl)
 {
-	return xfs_scrub_metadata(ctx, ST_PERAG, agno);
+	return xfs_scrub_metadata(ctx, ST_PERAG, agno, rl);
 }
 
 /* Scrub whole-FS metadata btrees. */
 bool
 xfs_scrub_fs_metadata(
-	struct scrub_ctx		*ctx)
+	struct scrub_ctx		*ctx,
+	struct xfs_repair_list		*rl)
 {
-	return xfs_scrub_metadata(ctx, ST_FS, 0);
+	return xfs_scrub_metadata(ctx, ST_FS, 0, rl);
 }
 
 /* How many items do we have to check? */
@@ -452,7 +496,8 @@ __xfs_scrub_file(
 	uint64_t			ino,
 	uint32_t			gen,
 	int				fd,
-	unsigned int			type)
+	unsigned int			type,
+	struct xfs_repair_list		*rl)
 {
 	struct xfs_scrub_metadata	meta = {0};
 	enum check_outcome		fix;
@@ -471,7 +516,7 @@ __xfs_scrub_file(
 	if (fix == CHECK_DONE)
 		return true;
 
-	return true;
+	return xfs_scrub_save_repair(ctx, rl, &meta);
 }
 
 bool
@@ -479,9 +524,10 @@ xfs_scrub_inode_fields(
 	struct scrub_ctx	*ctx,
 	uint64_t		ino,
 	uint32_t		gen,
-	int			fd)
+	int			fd,
+	struct xfs_repair_list	*rl)
 {
-	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_INODE);
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_INODE, rl);
 }
 
 bool
@@ -489,9 +535,10 @@ xfs_scrub_data_fork(
 	struct scrub_ctx	*ctx,
 	uint64_t		ino,
 	uint32_t		gen,
-	int			fd)
+	int			fd,
+	struct xfs_repair_list	*rl)
 {
-	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_BMBTD);
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_BMBTD, rl);
 }
 
 bool
@@ -499,9 +546,10 @@ xfs_scrub_attr_fork(
 	struct scrub_ctx	*ctx,
 	uint64_t		ino,
 	uint32_t		gen,
-	int			fd)
+	int			fd,
+	struct xfs_repair_list	*rl)
 {
-	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_BMBTA);
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_BMBTA, rl);
 }
 
 bool
@@ -509,9 +557,10 @@ xfs_scrub_cow_fork(
 	struct scrub_ctx	*ctx,
 	uint64_t		ino,
 	uint32_t		gen,
-	int			fd)
+	int			fd,
+	struct xfs_repair_list	*rl)
 {
-	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_BMBTC);
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_BMBTC, rl);
 }
 
 bool
@@ -519,9 +568,10 @@ xfs_scrub_dir(
 	struct scrub_ctx	*ctx,
 	uint64_t		ino,
 	uint32_t		gen,
-	int			fd)
+	int			fd,
+	struct xfs_repair_list	*rl)
 {
-	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_DIR);
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_DIR, rl);
 }
 
 bool
@@ -529,9 +579,10 @@ xfs_scrub_attr(
 	struct scrub_ctx	*ctx,
 	uint64_t		ino,
 	uint32_t		gen,
-	int			fd)
+	int			fd,
+	struct xfs_repair_list	*rl)
 {
-	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_XATTR);
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_XATTR, rl);
 }
 
 bool
@@ -539,9 +590,10 @@ xfs_scrub_symlink(
 	struct scrub_ctx	*ctx,
 	uint64_t		ino,
 	uint32_t		gen,
-	int			fd)
+	int			fd,
+	struct xfs_repair_list	*rl)
 {
-	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_SYMLINK);
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_SYMLINK, rl);
 }
 
 bool
@@ -549,9 +601,10 @@ xfs_scrub_parent(
 	struct scrub_ctx	*ctx,
 	uint64_t		ino,
 	uint32_t		gen,
-	int			fd)
+	int			fd,
+	struct xfs_repair_list	*rl)
 {
-	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_PARENT);
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_PARENT, rl);
 }
 
 /* Test the availability of a kernel scrub command. */
@@ -773,7 +826,7 @@ _("Read-only filesystem; cannot make changes."));
 		xfs_scrub_warn_incomplete_scrub(ctx, buf, &meta);
 	if (needs_repair(&meta)) {
 		/* Still broken, try again or fix offline. */
-		if (repair_flags & XRM_NOFIX_COMPLAIN)
+		if ((repair_flags & XRM_NOFIX_COMPLAIN) || debug)
 			str_error(ctx, buf,
 _("Repair unsuccessful; offline repair required."));
 	} else {
diff --git a/scrub/scrub.h b/scrub/scrub.h
index 1c44fba..22ac89a 100644
--- a/scrub/scrub.h
+++ b/scrub/scrub.h
@@ -28,11 +28,19 @@ enum check_outcome {
 	CHECK_RETRY,	/* repair failed, try again later */
 };
 
+struct repair_item;
+
 void xfs_scrub_report_preen_triggers(struct scrub_ctx *ctx);
-bool xfs_scrub_primary_super(struct scrub_ctx *ctx);
-bool xfs_scrub_ag_headers(struct scrub_ctx *ctx, xfs_agnumber_t agno);
-bool xfs_scrub_ag_metadata(struct scrub_ctx *ctx, xfs_agnumber_t agno);
-bool xfs_scrub_fs_metadata(struct scrub_ctx *ctx);
+bool xfs_scrub_primary_super(struct scrub_ctx *ctx,
+		struct xfs_repair_list *repair_list);
+bool xfs_scrub_ag_headers(struct scrub_ctx *ctx, xfs_agnumber_t agno,
+		struct xfs_repair_list *repair_list);
+bool xfs_scrub_ag_metadata(struct scrub_ctx *ctx, xfs_agnumber_t agno,
+		struct xfs_repair_list *repair_list);
+bool xfs_scrub_fs_metadata(struct scrub_ctx *ctx,
+		struct xfs_repair_list *repair_list);
+enum check_outcome xfs_repair_metadata(struct scrub_ctx *ctx, int fd,
+		struct repair_item *ri, unsigned int flags);
 
 bool xfs_can_scrub_fs_metadata(struct scrub_ctx *ctx);
 bool xfs_can_scrub_inode(struct scrub_ctx *ctx);
@@ -44,21 +52,21 @@ bool xfs_can_scrub_parent(struct scrub_ctx *ctx);
 bool xfs_can_repair(struct scrub_ctx *ctx);
 
 bool xfs_scrub_inode_fields(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
-		int fd);
+		int fd, struct xfs_repair_list *repair_list);
 bool xfs_scrub_data_fork(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
-		int fd);
+		int fd, struct xfs_repair_list *repair_list);
 bool xfs_scrub_attr_fork(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
-		int fd);
+		int fd, struct xfs_repair_list *repair_list);
 bool xfs_scrub_cow_fork(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
-		int fd);
+		int fd, struct xfs_repair_list *repair_list);
 bool xfs_scrub_dir(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
-		int fd);
+		int fd, struct xfs_repair_list *repair_list);
 bool xfs_scrub_attr(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
-		int fd);
+		int fd, struct xfs_repair_list *repair_list);
 bool xfs_scrub_symlink(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
-		int fd);
+		int fd, struct xfs_repair_list *repair_list);
 bool xfs_scrub_parent(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
-		int fd);
+		int fd, struct xfs_repair_list *repair_list);
 
 /* Repair parameters are the scrub inputs and retry count. */
 struct repair_item {
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index b5ce4c6..b9dd4d9 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -88,6 +88,15 @@
  * the previous two phases are retried here; if there are uncorrectable
  * errors, xfs_scrub stops here.
  *
+ * To perform the actual repairs, we iterate all the items on the per-AG
+ * repair list and ask the kernel to repair them.  Items which are
+ * successfully repaired are removed from the list.  If an item is not
+ * repaired successfully (or the kernel asks us to try again), we retry
+ * the repairs until there is nothing left to fix or we fail to make
+ * forward progress.  In that event, the unrepaired items are recorded
+ * as errors.  If there are no errors at this point, we call FSTRIM on
+ * the filesystem.
+ *
  * The next phase is the "check directory tree" phase.  In this phase,
  * every directory is opened (via file handle) to confirm that each
  * directory is connected to the root.  Directory entries are checked
@@ -707,6 +716,19 @@ _("%s: Not a XFS mount point or block device.\n"),
 		ret |= 8;
 
 out:
+	if (ctx.repairs && ctx.preens)
+		fprintf(stdout,
+_("%s: %llu repairs and %llu optimizations made.\n"),
+			ctx.mntpoint, ctx.repairs, ctx.preens);
+	else if (ctx.repairs && ctx.preens == 0)
+		fprintf(stdout,
+_("%s: %llu repairs made.\n"),
+			ctx.mntpoint, ctx.repairs);
+	else if (ctx.repairs == 0 && ctx.preens)
+		fprintf(stdout,
+_("%s: %llu optimizations made.\n"),
+			ctx.mntpoint, ctx.preens);
+
 	total_errors = ctx.errors_found + ctx.runtime_errors;
 	if (ctx.need_repair)
 		repairstr = _("  Unmount and run xfs_repair.");
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 83b8ae2..bd21642 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -90,6 +90,7 @@ struct scrub_ctx {
 
 	/* Mutable scrub state; use lock. */
 	pthread_mutex_t		lock;
+	struct xfs_repair_list	*repair_lists;
 	unsigned long long	max_errors;
 	unsigned long long	runtime_errors;
 	unsigned long long	errors_found;
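
The retry policy described in the phase 4 comment above (ask for repairs, drop whatever got fixed, stop once a pass makes no forward progress) can be sketched in isolation roughly as follows. This is a simplified illustration under stated assumptions, not the patch's code: `struct demo_item`, `demo_repair_pass()` and `demo_repair_until_stuck()` are hypothetical names, and the kernel repair call is replaced by a simple countdown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one queued repair (not the real struct repair_item). */
struct demo_item {
	int	attempts_needed;	/* passes until it is fixed; negative = never */
	bool	fixed;
};

/* Make one pass over the items; return how many are still unfixed. */
static size_t
demo_repair_pass(
	struct demo_item	*items,
	size_t			nr)
{
	size_t			unfixed = 0;
	size_t			i;

	for (i = 0; i < nr; i++) {
		if (items[i].fixed)
			continue;
		if (items[i].attempts_needed > 0)
			items[i].attempts_needed--;
		if (items[i].attempts_needed == 0)
			items[i].fixed = true;
		else
			unfixed++;
	}
	return unfixed;
}

/*
 * Retry until nothing is left or a pass fails to shrink the backlog.
 * Assumes no item is marked fixed on entry.  Returns the number of
 * items that stayed broken.
 */
static size_t
demo_repair_until_stuck(
	struct demo_item	*items,
	size_t			nr)
{
	size_t			unfixed = nr;
	size_t			new_unfixed;

	while (unfixed > 0) {
		new_unfixed = demo_repair_pass(items, nr);
		if (new_unfixed == unfixed)
			break;
		unfixed = new_unfixed;
	}
	return unfixed;
}
```

Note that "progress" here means only that the backlog got shorter, so a pass that fixes nothing ends the loop even if some item might have succeeded on a later attempt. Whatever remains is what a final "complain" pass would report.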


* Re: [PATCH 24/27] xfs_scrub: fstrim the free areas if there are no errors on the filesystem
  2018-01-06  1:54 ` [PATCH 24/27] xfs_scrub: fstrim the free areas if there are no errors on the filesystem Darrick J. Wong
@ 2018-01-16 22:07   ` Eric Sandeen
  2018-01-16 22:23     ` Darrick J. Wong
  0 siblings, 1 reply; 61+ messages in thread
From: Eric Sandeen @ 2018-01-16 22:07 UTC (permalink / raw)
  To: Darrick J. Wong, sandeen; +Cc: linux-xfs

On 1/5/18 7:54 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> If the filesystem scan comes out clean or fixes all the problems, call
> fstrim to clean out the free areas (if it's an ssd/thinp/whatever).

Is this the right patch header for this patch?

Oh ok, this adds a "repair phase" which is really only implementing
preen for now, which is really only fstrimming at this point.

so:

preen()
  if no errors
     xfs_repair_fs() (IMHO odd to call "repair" on a clean filesystem?)
       fstrim

So I guess what was confusing to me is that you do "preen" work under
"repair" functions.  I get it that they might all be lumped together
in pending work now, but I'm still wrapping my head around what does
and doesn't happen in various modes, and how to recognize that in
the code...


> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  scrub/Makefile    |    1 +
>  scrub/phase4.c    |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  scrub/vfs.c       |   23 +++++++++++++++++++++++
>  scrub/vfs.h       |    2 ++
>  scrub/xfs_scrub.c |   26 +++++++++++++++++++++++++-
>  scrub/xfs_scrub.h |    1 +
>  6 files changed, 104 insertions(+), 1 deletion(-)
>  create mode 100644 scrub/phase4.c
> 
> 
> diff --git a/scrub/Makefile b/scrub/Makefile
> index fd26624..91f99ff 100644
> --- a/scrub/Makefile
> +++ b/scrub/Makefile
> @@ -41,6 +41,7 @@ inodes.c \
>  phase1.c \
>  phase2.c \
>  phase3.c \
> +phase4.c \
>  phase5.c \
>  phase6.c \
>  phase7.c \
> diff --git a/scrub/phase4.c b/scrub/phase4.c
> new file mode 100644
> index 0000000..dadf4de
> --- /dev/null
> +++ b/scrub/phase4.c
> @@ -0,0 +1,52 @@
> +/*
> + * Copyright (C) 2018 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include <stdio.h>
> +#include <stdint.h>
> +#include <stdbool.h>
> +#include <dirent.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <sys/statvfs.h>
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "list.h"
> +#include "path.h"
> +#include "workqueue.h"
> +#include "xfs_scrub.h"
> +#include "common.h"
> +#include "scrub.h"
> +#include "vfs.h"
> +
> +/* Phase 4: Repair filesystem. */
> +
> +/* Fix everything that needs fixing. */
> +bool
> +xfs_repair_fs(
> +	struct scrub_ctx		*ctx)
> +{
> +	bool				moveon = true;
> +
> +	pthread_mutex_lock(&ctx->lock);
> +	if (moveon && ctx->errors_found == 0)
> +		fstrim(ctx);
> +	pthread_mutex_unlock(&ctx->lock);
> +
> +	return moveon;
> +}
> diff --git a/scrub/vfs.c b/scrub/vfs.c
> index 6a51090..98d356f 100644
> --- a/scrub/vfs.c
> +++ b/scrub/vfs.c
> @@ -219,3 +219,26 @@ _("Could not queue directory scan work."));
>  	free(sftd);
>  	return false;
>  }
> +
> +#ifndef FITRIM
> +struct fstrim_range {
> +	__u64 start;
> +	__u64 len;
> +	__u64 minlen;
> +};
> +#define FITRIM		_IOWR('X', 121, struct fstrim_range)	/* Trim */
> +#endif

(I wonder if we should move all of this "if it ain't available, define it"
stuff into a single header file at some point...)
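
A minimal sketch of what such a shared fallback header might look like, assuming a Linux build. The file name and the `DEMO_COMPAT_IOCTLS_H_` guard are hypothetical; only the `FITRIM` fallback itself mirrors the hunk quoted above.

```c
/* compat_ioctls.h: hypothetical single home for fallback definitions. */
#ifndef DEMO_COMPAT_IOCTLS_H_
#define DEMO_COMPAT_IOCTLS_H_

#include <stdint.h>
#include <sys/ioctl.h>
#ifdef __linux__
#include <linux/fs.h>	/* provides FITRIM on reasonably recent kernels */
#endif

/* Only fall back to a local definition when the system headers lack one. */
#ifndef FITRIM
struct fstrim_range {
	uint64_t	start;
	uint64_t	len;
	uint64_t	minlen;
};
#define FITRIM		_IOWR('X', 121, struct fstrim_range)	/* Trim */
#endif

#endif /* DEMO_COMPAT_IOCTLS_H_ */
```

Each .c file would then include this one header instead of carrying its own `#ifndef` block.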

> +
> +/* Call FITRIM to trim all the unused space in a filesystem. */
> +void
> +fstrim(
> +	struct scrub_ctx	*ctx)
> +{
> +	struct fstrim_range	range = {0};
> +	int			error;
> +
> +	range.len = ULLONG_MAX;
> +	error = ioctl(ctx->mnt_fd, FITRIM, &range);
> +	if (error && errno != EOPNOTSUPP && errno != ENOTTY)
> +		perror(_("fstrim"));
> +}

Still wondering if we should have an option to skip this, given some devices'
horrific performance under fstrim, and/or some other desire to keep an image
whole.
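
A minimal sketch of such an opt-out, assuming a Linux system where `<linux/fs.h>` provides `FITRIM`. The name `maybe_fstrim()` is hypothetical, and `want_fstrim` is modelled here as a plain global that some command-line switch would clear; the phase4.c hunk in the later patch of this series does test a variable of that name before trimming, but how it gets set is not shown there.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* FITRIM, struct fstrim_range */

/* Hypothetical knob; imagine a command-line option clearing it. */
static bool	want_fstrim = true;

/* Returns true if FITRIM was attempted, false if the user opted out. */
static bool
maybe_fstrim(
	int			mnt_fd)
{
	struct fstrim_range	range = {0};
	int			error;

	if (!want_fstrim)
		return false;

	/* Ask for everything; the filesystem clamps the range itself. */
	range.len = ULLONG_MAX;
	error = ioctl(mnt_fd, FITRIM, &range);
	if (error && errno != EOPNOTSUPP && errno != ENOTTY)
		perror("fstrim");
	return true;
}
```

Keeping the check inside the helper means every caller honours the opt-out without having to remember it.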

> diff --git a/scrub/vfs.h b/scrub/vfs.h
> index 100eb18..3305159 100644
> --- a/scrub/vfs.h
> +++ b/scrub/vfs.h
> @@ -28,4 +28,6 @@ typedef bool (*scan_fs_tree_dirent_fn)(struct scrub_ctx *, const char *,
>  bool scan_fs_tree(struct scrub_ctx *ctx, scan_fs_tree_dir_fn dir_fn,
>  		scan_fs_tree_dirent_fn dirent_fn, void *arg);
>  
> +void fstrim(struct scrub_ctx *ctx);
> +
>  #endif /* XFS_SCRUB_VFS_H_ */
> diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
> index bc40f3c..7809431 100644
> --- a/scrub/xfs_scrub.c
> +++ b/scrub/xfs_scrub.c
> @@ -340,6 +340,20 @@ _("%sI/O rate: %.1f%s/s in, %.1f%s/s out, %.1f%s/s tot\n"),
>  	return true;
>  }
>  
> +/* Run the preening phase if there are no errors. */
> +static bool
> +preen(
> +	struct scrub_ctx	*ctx)
> +{
> +	if (ctx->errors_found) {
> +		str_info(ctx, ctx->mntpoint,
> +_("Errors found, please re-run with -y."));
> +		return true;
> +	}
> +
> +	return xfs_repair_fs(ctx);
> +}
> +
>  /* Run all the phases of the scrubber. */
>  static bool
>  run_scrub_phases(
> @@ -393,8 +407,18 @@ run_scrub_phases(
>  	/* Run all phases of the scrub tool. */
>  	for (phase = 1, sp = phases; sp->fn; sp++, phase++) {
>  		/* Turn on certain phases if user said to. */
> -		if (sp->fn == DATASCAN_DUMMY_FN && scrub_data)
> +		if (sp->fn == DATASCAN_DUMMY_FN && scrub_data) {
>  			sp->fn = xfs_scan_blocks;
> +		} else if (sp->fn == REPAIR_DUMMY_FN) {
> +			if (ctx->mode == SCRUB_MODE_PREEN) {
> +				sp->descr = _("Preen filesystem.");
> +				sp->fn = preen;
> +			} else if (ctx->mode == SCRUB_MODE_REPAIR) {
> +				sp->descr = _("Repair filesystem.");
> +				sp->fn = xfs_repair_fs;
> +			}
> +			sp->must_run = true;

if must_run is always true here, should it just be initialized in
the structure along w/ the other must_run phases?
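
Roughly, that would look like the sketch below. It is hypothetical: `struct demo_phase`, `enum demo_mode` and the `demo_*` functions are stand-ins for the real phase table, and whether a statically set `must_run` interacts badly with the skip logic for a placeholder that is never resolved is not checked here.

```c
#include <stdbool.h>
#include <stddef.h>

struct scrub_ctx;	/* opaque for this sketch */

/* Hypothetical cut-down phase descriptor; field names follow the hunk above. */
struct demo_phase {
	const char	*descr;
	bool		(*fn)(struct scrub_ctx *);
	bool		must_run;
};

enum demo_mode { DEMO_MODE_DRY_RUN, DEMO_MODE_PREEN, DEMO_MODE_REPAIR };

static bool demo_preen(struct scrub_ctx *ctx) { (void)ctx; return true; }
static bool demo_repair(struct scrub_ctx *ctx) { (void)ctx; return true; }

/* must_run is set once in the table rather than inside the phase loop. */
static struct demo_phase demo_repair_phase = {
	.descr		= "Repair filesystem.",
	.fn		= NULL,		/* placeholder, resolved at run time */
	.must_run	= true,
};

/* The loop body then only has to pick the function and description. */
static void
demo_select_repair_fn(
	struct demo_phase	*sp,
	enum demo_mode		mode)
{
	if (mode == DEMO_MODE_PREEN) {
		sp->descr = "Preen filesystem.";
		sp->fn = demo_preen;
	} else if (mode == DEMO_MODE_REPAIR) {
		sp->descr = "Repair filesystem.";
		sp->fn = demo_repair;
	}
}
```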

> +		}
>  
>  		/* Skip certain phases unless they're turned on. */
>  		if (sp->fn == REPAIR_DUMMY_FN ||
> diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
> index a5cdba8..4a383f1 100644
> --- a/scrub/xfs_scrub.h
> +++ b/scrub/xfs_scrub.h
> @@ -108,5 +108,6 @@ bool xfs_scan_inodes(struct scrub_ctx *ctx);
>  bool xfs_scan_connections(struct scrub_ctx *ctx);
>  bool xfs_scan_blocks(struct scrub_ctx *ctx);
>  bool xfs_scan_summary(struct scrub_ctx *ctx);
> +bool xfs_repair_fs(struct scrub_ctx *ctx);
>  
>  #endif /* XFS_SCRUB_XFS_SCRUB_H_ */
> 


* Re: [PATCH 24/27] xfs_scrub: fstrim the free areas if there are no errors on the filesystem
  2018-01-16 22:07   ` Eric Sandeen
@ 2018-01-16 22:23     ` Darrick J. Wong
  0 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-16 22:23 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: sandeen, linux-xfs

On Tue, Jan 16, 2018 at 04:07:44PM -0600, Eric Sandeen wrote:
> On 1/5/18 7:54 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > If the filesystem scan comes out clean or fixes all the problems, call
> > fstrim to clean out the free areas (if it's an ssd/thinp/whatever).
> 
> Is this the right patch header for this patch?
> 
> Oh ok, this adds a "repair phase" which is really only implementing
> preen for now, which is really only fstrimming at this point.
> 
> so:
> 
> preen()
>   if no errors
>      xfs_repair_fs() (IMHO odd to call "repair" on a clean filesystem?)
>        fstrim
> 
> So I guess what was confusing to me is that you do "preen" work under
> "repair" functions.  I get it that they might all be lumped together
> in pending work now, but I'm still wrapping my head around what does
> and doesn't happen in various modes, and how to recognize that in
> the code...

Based on our extended IRC conversations I was planning to rename the
"repair list" to "action items" so that we could have a
xfs_process_action_items() that actually takes care of issuing the
repair calls, then we could have two wrappers:

xfs_repair_fs() -> xfs_process_action_items(); fstrim();
xfs_preen_fs() -> if (!ctx->errors_found) xfs_process_action_items()
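
A minimal sketch of the two wrappers outlined above, not the code that eventually landed. The trimmed-down `struct scrub_ctx` and its two call counters are assumptions for illustration only, and `xfs_process_action_items()` and `fstrim()` are stubs that record the call instead of issuing repairs or FITRIM.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down context; the real struct scrub_ctx has more. */
struct scrub_ctx {
	unsigned long	errors_found;
	unsigned int	items_processed;	/* illustration only */
	unsigned int	trims_issued;		/* illustration only */
};

/* Stub for the helper that would issue the queued repair calls. */
bool
xfs_process_action_items(struct scrub_ctx *ctx)
{
	ctx->items_processed++;
	return true;
}

/* Stub for the FITRIM wrapper. */
void
fstrim(struct scrub_ctx *ctx)
{
	ctx->trims_issued++;
}

/* Repair mode: work through the action items, then trim free space. */
bool
xfs_repair_fs(struct scrub_ctx *ctx)
{
	bool	moveon = xfs_process_action_items(ctx);

	fstrim(ctx);
	return moveon;
}

/* Preen mode: only touch the action items if the scan found no errors. */
bool
xfs_preen_fs(struct scrub_ctx *ctx)
{
	if (ctx->errors_found)
		return true;
	return xfs_process_action_items(ctx);
}
```

With this split, the mode check in run_scrub_phases() only has to pick one of the two wrappers, and neither wrapper needs to know about the other.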

> 
> 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  scrub/Makefile    |    1 +
> >  scrub/phase4.c    |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  scrub/vfs.c       |   23 +++++++++++++++++++++++
> >  scrub/vfs.h       |    2 ++
> >  scrub/xfs_scrub.c |   26 +++++++++++++++++++++++++-
> >  scrub/xfs_scrub.h |    1 +
> >  6 files changed, 104 insertions(+), 1 deletion(-)
> >  create mode 100644 scrub/phase4.c
> > 
> > 
> > diff --git a/scrub/Makefile b/scrub/Makefile
> > index fd26624..91f99ff 100644
> > --- a/scrub/Makefile
> > +++ b/scrub/Makefile
> > @@ -41,6 +41,7 @@ inodes.c \
> >  phase1.c \
> >  phase2.c \
> >  phase3.c \
> > +phase4.c \
> >  phase5.c \
> >  phase6.c \
> >  phase7.c \
> > diff --git a/scrub/phase4.c b/scrub/phase4.c
> > new file mode 100644
> > index 0000000..dadf4de
> > --- /dev/null
> > +++ b/scrub/phase4.c
> > @@ -0,0 +1,52 @@
> > +/*
> > + * Copyright (C) 2018 Oracle.  All Rights Reserved.
> > + *
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation,
> > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > + */
> > +#include <stdio.h>
> > +#include <stdint.h>
> > +#include <stdbool.h>
> > +#include <dirent.h>
> > +#include <sys/types.h>
> > +#include <sys/stat.h>
> > +#include <sys/statvfs.h>
> > +#include "xfs.h"
> > +#include "xfs_fs.h"
> > +#include "list.h"
> > +#include "path.h"
> > +#include "workqueue.h"
> > +#include "xfs_scrub.h"
> > +#include "common.h"
> > +#include "scrub.h"
> > +#include "vfs.h"
> > +
> > +/* Phase 4: Repair filesystem. */
> > +
> > +/* Fix everything that needs fixing. */
> > +bool
> > +xfs_repair_fs(
> > +	struct scrub_ctx		*ctx)
> > +{
> > +	bool				moveon = true;
> > +
> > +	pthread_mutex_lock(&ctx->lock);
> > +	if (moveon && ctx->errors_found == 0)
> > +		fstrim(ctx);
> > +	pthread_mutex_unlock(&ctx->lock);
> > +
> > +	return moveon;
> > +}
> > diff --git a/scrub/vfs.c b/scrub/vfs.c
> > index 6a51090..98d356f 100644
> > --- a/scrub/vfs.c
> > +++ b/scrub/vfs.c
> > @@ -219,3 +219,26 @@ _("Could not queue directory scan work."));
> >  	free(sftd);
> >  	return false;
> >  }
> > +
> > +#ifndef FITRIM
> > +struct fstrim_range {
> > +	__u64 start;
> > +	__u64 len;
> > +	__u64 minlen;
> > +};
> > +#define FITRIM		_IOWR('X', 121, struct fstrim_range)	/* Trim */
> > +#endif
> 
> (I wonder if we should move all these "if it ain't available define it"
> stuff into a single header file at some point...)

Yeah, probably....

> > +
> > +/* Call FITRIM to trim all the unused space in a filesystem. */
> > +void
> > +fstrim(
> > +	struct scrub_ctx	*ctx)
> > +{
> > +	struct fstrim_range	range = {0};
> > +	int			error;
> > +
> > +	range.len = ULLONG_MAX;
> > +	error = ioctl(ctx->mnt_fd, FITRIM, &range);
> > +	if (error && errno != EOPNOTSUPP && errno != ENOTTY)
> > +		perror(_("fstrim"));
> > +}
> 
> still wondering if we should have an option to skip this, given some devices'
> horrific performance under fstrim, and/or another desire to keep an image
> whole.

I already added it in my dev tree.  -k turns off FITRIM.
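
The dev-tree change itself is not shown in this thread. As a rough sketch of how such an opt-out could gate the ioctl, the following assumes a hypothetical global `want_fstrim` (imagine it being cleared when -k is given) and a hypothetical helper `fstrim_fd()` that takes the mount fd directly instead of a scrub context.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* FITRIM, struct fstrim_range */

/* Hypothetical flag; assumed to be cleared when the user passes -k. */
bool want_fstrim = true;

/*
 * Trim all free space of the filesystem open at mnt_fd, unless disabled.
 * Returns 0 if the trim ran, was skipped, or is unsupported; -1 otherwise.
 */
int
fstrim_fd(int mnt_fd)
{
	struct fstrim_range	range = {0};

	if (!want_fstrim)
		return 0;

	range.len = ULLONG_MAX;
	if (ioctl(mnt_fd, FITRIM, &range) == 0)
		return 0;

	/* Devices or filesystems without discard support are not an error. */
	if (errno == EOPNOTSUPP || errno == ENOTTY)
		return 0;

	perror("fstrim");
	return -1;
}
```

Checking the flag before building the request keeps the skip path free of any ioctl traffic, which matters for the slow-discard devices mentioned above.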

> > diff --git a/scrub/vfs.h b/scrub/vfs.h
> > index 100eb18..3305159 100644
> > --- a/scrub/vfs.h
> > +++ b/scrub/vfs.h
> > @@ -28,4 +28,6 @@ typedef bool (*scan_fs_tree_dirent_fn)(struct scrub_ctx *, const char *,
> >  bool scan_fs_tree(struct scrub_ctx *ctx, scan_fs_tree_dir_fn dir_fn,
> >  		scan_fs_tree_dirent_fn dirent_fn, void *arg);
> >  
> > +void fstrim(struct scrub_ctx *ctx);
> > +
> >  #endif /* XFS_SCRUB_VFS_H_ */
> > diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
> > index bc40f3c..7809431 100644
> > --- a/scrub/xfs_scrub.c
> > +++ b/scrub/xfs_scrub.c
> > @@ -340,6 +340,20 @@ _("%sI/O rate: %.1f%s/s in, %.1f%s/s out, %.1f%s/s tot\n"),
> >  	return true;
> >  }
> >  
> > +/* Run the preening phase if there are no errors. */
> > +static bool
> > +preen(
> > +	struct scrub_ctx	*ctx)
> > +{
> > +	if (ctx->errors_found) {
> > +		str_info(ctx, ctx->mntpoint,
> > +_("Errors found, please re-run with -y."));
> > +		return true;
> > +	}
> > +
> > +	return xfs_repair_fs(ctx);
> > +}
> > +
> >  /* Run all the phases of the scrubber. */
> >  static bool
> >  run_scrub_phases(
> > @@ -393,8 +407,18 @@ run_scrub_phases(
> >  	/* Run all phases of the scrub tool. */
> >  	for (phase = 1, sp = phases; sp->fn; sp++, phase++) {
> >  		/* Turn on certain phases if user said to. */
> > -		if (sp->fn == DATASCAN_DUMMY_FN && scrub_data)
> > +		if (sp->fn == DATASCAN_DUMMY_FN && scrub_data) {
> >  			sp->fn = xfs_scan_blocks;
> > +		} else if (sp->fn == REPAIR_DUMMY_FN) {
> > +			if (ctx->mode == SCRUB_MODE_PREEN) {
> > +				sp->descr = _("Preen filesystem.");
> > +				sp->fn = preen;
> > +			} else if (ctx->mode == SCRUB_MODE_REPAIR) {
> > +				sp->descr = _("Repair filesystem.");
> > +				sp->fn = xfs_repair_fs;
> > +			}
> > +			sp->must_run = true;
> 
> if must_run is always true here, should it just be initialized in
> the structure along w/ the other must_run phases?

Ok.

--D
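
A sketch of what initializing must_run in the phase table might look like. The `struct phase_ops` layout, the `repair_placeholder()` marker function (standing in for REPAIR_DUMMY_FN), `SCRUB_MODE_OTHER`, and `select_repair_phase()` are all assumptions for illustration; only the mode names PREEN and REPAIR and the two descriptions come from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* SCRUB_MODE_OTHER is a hypothetical catch-all for every other mode. */
enum scrub_mode { SCRUB_MODE_OTHER, SCRUB_MODE_PREEN, SCRUB_MODE_REPAIR };

/* Hypothetical, trimmed-down context. */
struct scrub_ctx {
	enum scrub_mode	mode;
};

typedef bool (*phase_fn)(struct scrub_ctx *);

/* Hypothetical layout of one entry in the phase table. */
struct phase_ops {
	const char	*descr;
	phase_fn	fn;
	bool		must_run;
};

/* Marker and stub phase functions; none of them does real work here. */
bool repair_placeholder(struct scrub_ctx *ctx) { (void)ctx; return true; }
bool preen(struct scrub_ctx *ctx) { (void)ctx; return true; }
bool xfs_repair_fs(struct scrub_ctx *ctx) { (void)ctx; return true; }

/* must_run is set once, in the table, rather than inside the phase loop. */
struct phase_ops phases[] = {
	{ .descr = "Repair filesystem.", .fn = repair_placeholder, .must_run = true },
	{ .descr = NULL, .fn = NULL, .must_run = false },
};

/* Swap the placeholder for the function that matches the selected mode. */
void
select_repair_phase(struct scrub_ctx *ctx, struct phase_ops *sp)
{
	if (sp->fn != repair_placeholder)
		return;
	if (ctx->mode == SCRUB_MODE_PREEN) {
		sp->descr = "Preen filesystem.";
		sp->fn = preen;
	} else if (ctx->mode == SCRUB_MODE_REPAIR) {
		sp->descr = "Repair filesystem.";
		sp->fn = xfs_repair_fs;
	}
}
```

In any other mode the placeholder stays in place, so the existing "skip certain phases" check can still drop the phase.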

> > +		}
> >  
> >  		/* Skip certain phases unless they're turned on. */
> >  		if (sp->fn == REPAIR_DUMMY_FN ||
> > diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
> > index a5cdba8..4a383f1 100644
> > --- a/scrub/xfs_scrub.h
> > +++ b/scrub/xfs_scrub.h
> > @@ -108,5 +108,6 @@ bool xfs_scan_inodes(struct scrub_ctx *ctx);
> >  bool xfs_scan_connections(struct scrub_ctx *ctx);
> >  bool xfs_scan_blocks(struct scrub_ctx *ctx);
> >  bool xfs_scan_summary(struct scrub_ctx *ctx);
> > +bool xfs_repair_fs(struct scrub_ctx *ctx);
> >  
> >  #endif /* XFS_SCRUB_XFS_SCRUB_H_ */
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH 18/27] xfs_scrub: warn about normalized Unicode name collisions
  2018-01-06  1:53 ` [PATCH 18/27] xfs_scrub: warn about normalized Unicode name collisions Darrick J. Wong
@ 2018-01-16 23:52   ` Eric Sandeen
  2018-01-16 23:57     ` Eric Sandeen
  2018-01-16 23:59     ` Darrick J. Wong
  0 siblings, 2 replies; 61+ messages in thread
From: Eric Sandeen @ 2018-01-16 23:52 UTC (permalink / raw)
  To: Darrick J. Wong, sandeen; +Cc: linux-xfs


> +/* Print a warning string and whatever error is stored in errno. */
> +void
> +__str_errno_warn(
> +	struct scrub_ctx	*ctx,
> +	const char		*descr,
> +	const char		*file,
> +	int			line)
> +{
> +	char			buf[DESCR_BUFSZ];
> +
> +	pthread_mutex_lock(&ctx->lock);
> +	fprintf(stderr, _("Warning: %s: %s."), descr,
> +			strerror_r(errno, buf, DESCR_BUFSZ));
> +	if (debug)
> +		fprintf(stderr, _(" (%s line %d)"), file, line);
> +	fprintf(stderr, "\n");
> +	ctx->warnings_found++;
> +	pthread_mutex_unlock(&ctx->lock);
> +}
> +

Oh hello, unused-new-6th-printing-variant!  ;)

It took a lot of careful peering at, and scrolling around, to figure
out what all these different __str_ variants do.

Can we collapse all these str_foo_bar things down into a function
that makes logical choices based on what's passed in?  Here's what
I was playing with, see if it actually implements what you want
and if it's any better, and yeah, long lines sorry.

common.h:

void __str_out(struct scrub_ctx *, const char *descr, int level, int error,
		const char *file, int line, const char *format, ...);

#define S_ERROR	0
#define S_WARN	1
#define S_INFO	2

#define str_errno(ctx, str)		__str_out(ctx, str, S_ERROR, errno, __FILE__, __LINE__, NULL)
#define str_error(ctx, str, ...)	__str_out(ctx, str, S_ERROR, 0,     __FILE__, __LINE__, __VA_ARGS__)
#define str_errno_warn(ctx, str)	__str_out(ctx, str, S_WARN,  errno, __FILE__, __LINE__, NULL)
#define str_warn(ctx, str, ...)		__str_out(ctx, str, S_WARN,  0,     __FILE__, __LINE__, __VA_ARGS__)
#define str_info(ctx, str, ...)		__str_out(ctx, str, S_INFO,  0,     __FILE__, __LINE__, __VA_ARGS__)

/* note, could rationalize those names a bit, maybe just str_errno -> str_errno_error? */

common.c:

/* If stdout/stderr is a tty, clear to end of line to clean up progress bar. */
static inline const char *str_start(FILE *stream)
{
	if (stream == stderr)
		return stderr_isatty ? CLEAR_EOL : "";
	else
		return stdout_isatty ? CLEAR_EOL : "";
}

static const char *err_str[] = {
        "Error",
        "Warning",
        "Info",
};

/* Print a warning string and some warning text. */
void
__str_out(
	struct scrub_ctx	*ctx,
	const char		*descr,
	int			level,
	int			error,
	const char		*file,
	int			line,
	const char		*format,
	...)
{
	FILE			*stream = stderr;
	va_list			args;
	char                    buf[DESCR_BUFSZ];

	/* print strerror or format of choice but not both */
	if (error && format)
		abort();

	if (level >= S_INFO)
		stream = stdout;

	pthread_mutex_lock(&ctx->lock);
	if (errno)
		fprintf(stream, _("%s%s: %s: %s."),
				str_start(stream), err_str[level], descr,
				strerror_r(errno, buf, DESCR_BUFSZ));
	else {
		fprintf(stream, _("%s%s: %s: "),
				str_start(stream), err_str[level], descr);

		va_start(args, format);
		vfprintf(stream, format, args);
		va_end(args);
	}

	if (debug)
		fprintf(stream, _(" (%s line %d)"), file, line);
	fprintf(stream, "\n");
	if (stream == stdout)
		fflush(stream);

	if (errno)	/* A syscall failed */
		ctx->runtime_errors++;
	else if (level == S_ERROR)
		ctx->errors_found++;
	else if (level == S_WARN)
		ctx->warnings_found++;

	pthread_mutex_unlock(&ctx->lock);
}


* Re: [PATCH 18/27] xfs_scrub: warn about normalized Unicode name collisions
  2018-01-16 23:52   ` Eric Sandeen
@ 2018-01-16 23:57     ` Eric Sandeen
  2018-01-16 23:59     ` Darrick J. Wong
  1 sibling, 0 replies; 61+ messages in thread
From: Eric Sandeen @ 2018-01-16 23:57 UTC (permalink / raw)
  To: Eric Sandeen, Darrick J. Wong; +Cc: linux-xfs

On 1/16/18 5:52 PM, Eric Sandeen wrote:

...

> /* Print a warning string and some warning text. */
> void
> __str_out(
> 	struct scrub_ctx	*ctx,
> 	const char		*descr,
> 	int			level,
> 	int			error,
> 	const char		*file,
> 	int			line,
> 	const char		*format,
> 	...)
> {
> 	FILE			*stream = stderr;
> 	va_list			args;
> 	char                    buf[DESCR_BUFSZ];
> 
> 	/* print strerror or format of choice but not both */
> 	if (error && format)
> 		abort();
> 
> 	if (level >= S_INFO)
> 		stream = stdout;
> 
> 	pthread_mutex_lock(&ctx->lock);

oops this and every other "errno" below should be error, sorry:

> 	if (errno)
> 		fprintf(stream, _("%s%s: %s: %s."),
> 				str_start(stream), err_str[level], descr,
> 				strerror_r(errno, buf, DESCR_BUFSZ));

-Eric
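
With that correction applied (testing the `error` argument rather than the global `errno`), the unified helper could look roughly like the sketch below. It is a self-contained approximation, not the proposed patch: the helper is named `str_out()` here, the context struct is trimmed down, the tty line-clearing and gettext wrappers are left out, the abort() on "both error and format" is replaced by letting `error` win, and `strerror()` under the context lock stands in for `strerror_r()`.

```c
#include <errno.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

enum { S_ERROR, S_WARN, S_INFO };

/* Hypothetical, trimmed-down context for this sketch. */
struct scrub_ctx {
	pthread_mutex_t	lock;
	unsigned long	runtime_errors;
	unsigned long	errors_found;
	unsigned long	warnings_found;
};

bool debug;	/* assumed global, as in the quoted code */

static const char *err_str[] = { "Error", "Warning", "Info" };

/* Print one message at the given level and bump the matching counter. */
void
str_out(struct scrub_ctx *ctx, const char *descr, int level, int error,
	const char *file, int line, const char *format, ...)
{
	FILE	*stream = (level >= S_INFO) ? stdout : stderr;
	va_list	args;

	pthread_mutex_lock(&ctx->lock);
	fprintf(stream, "%s: %s: ", err_str[level], descr);
	if (error) {
		/* Calls to strerror() are serialized by ctx->lock here. */
		fprintf(stream, "%s.", strerror(error));
	} else if (format) {
		va_start(args, format);
		vfprintf(stream, format, args);
		va_end(args);
	}
	if (debug)
		fprintf(stream, " (%s line %d)", file, line);
	fprintf(stream, "\n");
	if (stream == stdout)
		fflush(stream);

	if (error)			/* a syscall failed */
		ctx->runtime_errors++;
	else if (level == S_ERROR)
		ctx->errors_found++;
	else if (level == S_WARN)
		ctx->warnings_found++;
	pthread_mutex_unlock(&ctx->lock);
}

/* errno is read at the call site, so later library calls cannot clobber it. */
#define str_errno(ctx, str) \
	str_out(ctx, str, S_ERROR, errno, __FILE__, __LINE__, NULL)
#define str_warn(ctx, str, ...) \
	str_out(ctx, str, S_WARN, 0, __FILE__, __LINE__, __VA_ARGS__)
#define str_info(ctx, str, ...) \
	str_out(ctx, str, S_INFO, 0, __FILE__, __LINE__, __VA_ARGS__)
```

Because the counters key off the `error` argument, a stale nonzero `errno` left behind by an unrelated call no longer turns a plain warning into a runtime error.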


* Re: [PATCH 18/27] xfs_scrub: warn about normalized Unicode name collisions
  2018-01-16 23:52   ` Eric Sandeen
  2018-01-16 23:57     ` Eric Sandeen
@ 2018-01-16 23:59     ` Darrick J. Wong
  1 sibling, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-16 23:59 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: sandeen, linux-xfs

On Tue, Jan 16, 2018 at 05:52:10PM -0600, Eric Sandeen wrote:
> 
> > +/* Print a warning string and whatever error is stored in errno. */
> > +void
> > +__str_errno_warn(
> > +	struct scrub_ctx	*ctx,
> > +	const char		*descr,
> > +	const char		*file,
> > +	int			line)
> > +{
> > +	char			buf[DESCR_BUFSZ];
> > +
> > +	pthread_mutex_lock(&ctx->lock);
> > +	fprintf(stderr, _("Warning: %s: %s."), descr,
> > +			strerror_r(errno, buf, DESCR_BUFSZ));
> > +	if (debug)
> > +		fprintf(stderr, _(" (%s line %d)"), file, line);
> > +	fprintf(stderr, "\n");
> > +	ctx->warnings_found++;
> > +	pthread_mutex_unlock(&ctx->lock);
> > +}
> > +
> 
> Oh hello, unused-new-6th-printing-variant!  ;)
> 
> It took a lot of careful peering at, and scrolling around, to figure
> out what all these different __str_ variants do.
> 
> Can we collapse all these str_foo_bar things down into a function
> that makes logical choices based on what's passed in?  Here's what
> I was playing with, see if it actually implements what you want
> and if it's any better, and yeah, long lines sorry.
> 
> common.h:
> 
> void __str_out(struct scrub_ctx *, const char *descr, int level, int error,
> 		const char *file, int line, const char *format, ...);
> 
> #define S_ERROR	0
> #define S_WARN	1
> #define S_INFO	2
> 
> #define str_errno(ctx, str)		__str_out(ctx, str, S_ERROR, errno, __FILE__, __LINE__, NULL)
> #define str_error(ctx, str, ...)	__str_out(ctx, str, S_ERROR, 0,     __FILE__, __LINE__, __VA_ARGS__)
> #define str_errno_warn(ctx, str)	__str_out(ctx, str, S_WARN,  errno, __FILE__, __LINE__, NULL)
> #define str_warn(ctx, str, ...)		__str_out(ctx, str, S_WARN,  0,     __FILE__, __LINE__, __VA_ARGS__)
> #define str_info(ctx, str, ...)		__str_out(ctx, str, S_INFO,  0,     __FILE__, __LINE__, __VA_ARGS__)
> 
> /* note, could rationalize those names a bit, maybe just str_errno -> str_errno_error? */
> 
> common.c:
> 
> /* If stdout/stderr is a tty, clear to end of line to clean up progress bar. */
> static inline const char *str_start(FILE *stream)
> {
> 	if (stream == stderr)
> 		return stderr_isatty ? CLEAR_EOL : "";
> 	else
> 		return stdout_isatty ? CLEAR_EOL : "";
> }
> 
> static const char *err_str[] = {
>         "Error",
>         "Warning",
>         "Info",
> };
> 
> /* Print a warning string and some warning text. */
> void
> __str_out(
> 	struct scrub_ctx	*ctx,
> 	const char		*descr,
> 	int			level,
> 	int			error,
> 	const char		*file,
> 	int			line,
> 	const char		*format,
> 	...)
> {
> 	FILE			*stream = stderr;
> 	va_list			args;
> 	char                    buf[DESCR_BUFSZ];
> 
> 	/* print strerror or format of choice but not both */
> 	if (error && format)
> 		abort();
> 
> 	if (level >= S_INFO)
> 		stream = stdout;
> 
> 	pthread_mutex_lock(&ctx->lock);
> 	if (errno)
> 		fprintf(stream, _("%s%s: %s: %s."),
> 				str_start(stream), err_str[level], descr,
> 				strerror_r(errno, buf, DESCR_BUFSZ));
> 	else {
> 		fprintf(stream, _("%s%s: %s: "),
> 				str_start(stream), err_str[level], descr);
> 
> 		va_start(args, format);
> 		vfprintf(stream, format, args);
> 		va_end(args);
> 	}
> 
> 	if (debug)
> 		fprintf(stream, _(" (%s line %d)"), file, line);
> 	fprintf(stream, "\n");
> 	if (stream == stdout)
> 		fflush(stream);
> 
> 	if (errno)	/* A syscall failed */
> 		ctx->runtime_errors++;
> 	else if (level == S_ERROR)
> 		ctx->errors_found++;
> 	else if (level == S_WARN)
> 		ctx->warnings_found++;
> 
> 	pthread_mutex_unlock(&ctx->lock);

Yes, the whole thing could get unified into a single helper like this.

--D

> }


* Re: [PATCH v11 00/27] xfsprogs: online scrub/repair support
  2018-01-12  4:17 ` [PATCH v11 00/27] xfsprogs: online scrub/repair support Eric Sandeen
@ 2018-01-17  1:31   ` Darrick J. Wong
  0 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-01-17  1:31 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: sandeen, linux-xfs

On Thu, Jan 11, 2018 at 10:17:28PM -0600, Eric Sandeen wrote:
> On 1/5/18 7:51 PM, Darrick J. Wong wrote:
> > Hi all,
> > 
> > This is the eleventh revision of a patchset that adds to XFS userland tools
> > support for online metadata scrubbing and repair.  Since v10 I've rebased
> > to the latest for-next, fixed some wonky error messages, and fixed a few
> > minor problems I found via code inspection.  However, this patch series is
> > more or less the same as v10.
> 
> General note rather than finding the patches they came from ;)
> 
> these can be made static and in some cases removed from header files,
> and/or ... hm, some aren't used at all.
> 
>   'bitmap_dump' is unique to scrub/bitmap.o  (function)
>   'bitmap_iterate' is unique to scrub/bitmap.o  (function)

These only exist #ifdef DEBUG

>   'do_error' is unique to scrub/common.o  (function)

Unused, removed.

>   'display_rusage' is unique to scrub/xfs_scrub.o  (global variable)
>   'is_service' is unique to scrub/xfs_scrub.o  (global variable)
>   'scrub_data' is unique to scrub/xfs_scrub.o  (global variable)
>   'xfs_check_rmap_ioerr' is unique to scrub/phase6.o  (function)

Ok, these have been made local to the file.

>   'progname' is unique to scrub/xfs_scrub.o  (global variable)

I thought we needed to have this for libxfs?

Ah, right, we don't link against libxfs anymore. :)

--D

> bitmap_dump (and so bitmap_iterate) are unused
> do_error is unused as well?
> 
> -Eric


* [PATCH 06/27] xfs_scrub: create an abstraction for a block device
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
@ 2017-11-17 21:00 ` Darrick J. Wong
  0 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:00 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create an abstraction to handle all of our low level disk operations.
We'll eventually use it to bind to a fs mount point and block device.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile |    2 +
 scrub/disk.c   |  164 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/disk.h   |   39 +++++++++++++
 3 files changed, 205 insertions(+)
 create mode 100644 scrub/disk.c
 create mode 100644 scrub/disk.h


diff --git a/scrub/Makefile b/scrub/Makefile
index ac0af94..f810790 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -17,10 +17,12 @@ endif	# scrub_prereqs
 
 HFILES = \
 common.h \
+disk.h \
 xfs_scrub.h
 
 CFILES = \
 common.c \
+disk.c \
 xfs_scrub.c
 
 LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD)
diff --git a/scrub/disk.c b/scrub/disk.c
new file mode 100644
index 0000000..fe91842
--- /dev/null
+++ b/scrub/disk.c
@@ -0,0 +1,164 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <sys/statvfs.h>
+#include <sys/vfs.h>
+#include <linux/fs.h>
+#include "platform_defs.h"
+#include "libfrog.h"
+#include "xfs_scrub.h"
+#include "disk.h"
+
+/*
+ * Disk Abstraction
+ *
+ * These routines help us to discover the geometry of a block device,
+ * estimate the number of concurrent IOs that we can send to it, and
+ * abstract the process of performing read verification of disk blocks.
+ */
+
+/* Figure out how many disk heads are available. */
+static unsigned int
+__disk_heads(
+	struct disk		*disk)
+{
+	int			iomin;
+	int			ioopt;
+	unsigned short		rot;
+	int			error;
+
+	/* If it's not a block device, throw all the CPUs at it. */
+	if (!S_ISBLK(disk->d_sb.st_mode))
+		return nproc;
+
+	/* Non-rotational device?  Throw all the CPUs. */
+	rot = 1;
+	error = ioctl(disk->d_fd, BLKROTATIONAL, &rot);
+	if (error == 0 && rot == 0)
+		return nproc;
+
+	/*
+	 * Sometimes we can infer the number of devices from the
+	 * min/optimal IO sizes.
+	 */
+	iomin = ioopt = 0;
+	if (ioctl(disk->d_fd, BLKIOMIN, &iomin) == 0 &&
+	    ioctl(disk->d_fd, BLKIOOPT, &ioopt) == 0 &&
+	    iomin > 0 && ioopt > 0) {
+		return min(nproc, max(1, ioopt / iomin));
+	}
+
+	/* Rotating device?  I guess? */
+	return 2;
+}
+
+/* Figure out how many disk heads are available. */
+unsigned int
+disk_heads(
+	struct disk		*disk)
+{
+	if (nr_threads)
+		return nr_threads;
+	return __disk_heads(disk);
+}
+
+/* Open a disk device and discover its geometry. */
+struct disk *
+disk_open(
+	const char		*pathname)
+{
+	struct disk		*disk;
+	int			lba_sz;
+	int			error;
+
+	disk = calloc(1, sizeof(struct disk));
+	if (!disk)
+		return NULL;
+
+	disk->d_fd = open(pathname, O_RDONLY | O_DIRECT | O_NOATIME);
+	if (disk->d_fd < 0)
+		goto out_free;
+
+	/* Try to get LBA size. */
+	error = ioctl(disk->d_fd, BLKSSZGET, &lba_sz);
+	if (error)
+		lba_sz = 512;
+	disk->d_lbalog = log2_roundup(lba_sz);
+
+	/* Obtain disk's stat info. */
+	error = fstat(disk->d_fd, &disk->d_sb);
+	if (error)
+		goto out_close;
+
+	/* Determine bdev size, block size, and offset. */
+	if (S_ISBLK(disk->d_sb.st_mode)) {
+		error = ioctl(disk->d_fd, BLKGETSIZE64, &disk->d_size);
+		if (error)
+			disk->d_size = 0;
+		error = ioctl(disk->d_fd, BLKBSZGET, &disk->d_blksize);
+		if (error)
+			disk->d_blksize = 0;
+		disk->d_start = 0;
+	} else {
+		disk->d_size = disk->d_sb.st_size;
+		disk->d_blksize = disk->d_sb.st_blksize;
+		disk->d_start = 0;
+	}
+
+	return disk;
+out_close:
+	close(disk->d_fd);
+out_free:
+	free(disk);
+	return NULL;
+}
+
+/* Close a disk device. */
+int
+disk_close(
+	struct disk		*disk)
+{
+	int			error = 0;
+
+	if (disk->d_fd >= 0)
+		error = close(disk->d_fd);
+	disk->d_fd = -1;
+	free(disk);
+	return error;
+}
+
+/* Read-verify an extent of a disk device. */
+ssize_t
+disk_read_verify(
+	struct disk		*disk,
+	void			*buf,
+	uint64_t		start,
+	uint64_t		length)
+{
+	return pread(disk->d_fd, buf, length, start);
+}
diff --git a/scrub/disk.h b/scrub/disk.h
new file mode 100644
index 0000000..4331300
--- /dev/null
+++ b/scrub/disk.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_DISK_H_
+#define XFS_SCRUB_DISK_H_
+
+struct disk {
+	struct stat	d_sb;
+	int		d_fd;
+	int		d_lbalog;
+	unsigned int	d_flags;
+	unsigned int	d_blksize;	/* bytes */
+	uint64_t	d_size;		/* bytes */
+	uint64_t	d_start;	/* bytes */
+};
+
+unsigned int disk_heads(struct disk *disk);
+struct disk *disk_open(const char *pathname);
+int disk_close(struct disk *disk);
+ssize_t disk_read_verify(struct disk *disk, void *buf, uint64_t start,
+		uint64_t length);
+
+#endif /* XFS_SCRUB_DISK_H_ */
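
The thread-count heuristic in __disk_heads() can be restated as a pure function, which makes the decision order easier to see and to test without a block device. This is only a sketch: `estimate_heads()` and its parameters are hypothetical, with the ioctl results passed in as plain values (`rotational` should be true when BLKROTATIONAL fails, matching the patch's default of rot = 1).

```c
#include <stdbool.h>

/*
 * Hypothetical restatement of the __disk_heads() heuristic.
 * iomin/ioopt are the minimum and optimal I/O sizes in bytes, or 0
 * if they could not be queried.
 */
unsigned int
estimate_heads(bool is_blkdev, bool rotational, int iomin, int ioopt,
	       unsigned int nproc)
{
	unsigned int	ratio;

	/* Regular files and non-rotational devices get every CPU. */
	if (!is_blkdev || !rotational)
		return nproc;

	/* A striped array often reports ioopt as a multiple of iomin. */
	if (iomin > 0 && ioopt > 0) {
		ratio = (unsigned int)(ioopt / iomin);
		if (ratio < 1)
			ratio = 1;
		return ratio < nproc ? ratio : nproc;
	}

	/* Plain rotating disk with no further hints: a small fixed guess. */
	return 2;
}
```

For example, a rotational device reporting iomin = 4096 and ioopt = 16384 on an 8-CPU machine yields 4 threads, while the same device with no I/O size hints yields 2.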



end of thread, other threads:[~2018-01-17  1:36 UTC | newest]

Thread overview: 61+ messages
2018-01-06  1:51 [PATCH v11 00/27] xfsprogs: online scrub/repair support Darrick J. Wong
2018-01-06  1:51 ` [PATCH 01/27] xfs_scrub: create online filesystem scrub program Darrick J. Wong
2018-01-12  0:16   ` Eric Sandeen
2018-01-12  1:08     ` Darrick J. Wong
2018-01-12  1:07   ` Eric Sandeen
2018-01-12  1:10     ` Darrick J. Wong
2018-01-06  1:51 ` [PATCH 02/27] xfs_scrub: common error handling Darrick J. Wong
2018-01-12  1:15   ` Eric Sandeen
2018-01-12  1:23     ` Darrick J. Wong
2018-01-06  1:51 ` [PATCH 03/27] xfs_scrub: set up command line argument parsing Darrick J. Wong
2018-01-11 23:39   ` Eric Sandeen
2018-01-12  1:53     ` Darrick J. Wong
2018-01-12  1:30   ` Eric Sandeen
2018-01-12  2:03     ` Darrick J. Wong
2018-01-06  1:51 ` [PATCH 04/27] xfs_scrub: dispatch the various phases of the scrub program Darrick J. Wong
2018-01-06  1:51 ` [PATCH 05/27] xfs_scrub: figure out how many threads we're going to need Darrick J. Wong
2018-01-06  1:52 ` [PATCH 06/27] xfs_scrub: create an abstraction for a block device Darrick J. Wong
2018-01-11 23:24   ` Eric Sandeen
2018-01-11 23:59     ` Darrick J. Wong
2018-01-12  0:04       ` Eric Sandeen
2018-01-12  1:27         ` Darrick J. Wong
2018-01-06  1:52 ` [PATCH 07/27] xfs_scrub: find XFS filesystem geometry Darrick J. Wong
2018-01-06  1:52 ` [PATCH 08/27] xfs_scrub: add inode iteration functions Darrick J. Wong
2018-01-06  1:52 ` [PATCH 09/27] xfs_scrub: add space map " Darrick J. Wong
2018-01-06  1:52 ` [PATCH 10/27] xfs_scrub: add file " Darrick J. Wong
2018-01-11 23:19   ` Eric Sandeen
2018-01-12  0:24     ` Darrick J. Wong
2018-01-06  1:52 ` [PATCH 11/27] xfs_scrub: filesystem counter collection functions Darrick J. Wong
2018-01-06  1:52 ` [PATCH 12/27] xfs_scrub: wrap the scrub ioctl Darrick J. Wong
2018-01-11 23:12   ` Eric Sandeen
2018-01-12  0:28     ` Darrick J. Wong
2018-01-06  1:52 ` [PATCH 13/27] xfs_scrub: scan filesystem and AG metadata Darrick J. Wong
2018-01-06  1:52 ` [PATCH 14/27] xfs_scrub: thread-safe stats counter Darrick J. Wong
2018-01-06  1:53 ` [PATCH 15/27] xfs_scrub: scan inodes Darrick J. Wong
2018-01-06  1:53 ` [PATCH 16/27] xfs_scrub: check directory connectivity Darrick J. Wong
2018-01-06  1:53 ` [PATCH 17/27] xfs_scrub: warn about suspicious characters in directory/xattr names Darrick J. Wong
2018-01-06  1:53 ` [PATCH 18/27] xfs_scrub: warn about normalized Unicode name collisions Darrick J. Wong
2018-01-16 23:52   ` Eric Sandeen
2018-01-16 23:57     ` Eric Sandeen
2018-01-16 23:59     ` Darrick J. Wong
2018-01-06  1:53 ` [PATCH 19/27] xfs_scrub: create a bitmap data structure Darrick J. Wong
2018-01-06  1:53 ` [PATCH 20/27] xfs_scrub: create infrastructure to read verify data blocks Darrick J. Wong
2018-01-06  1:53 ` [PATCH 21/27] xfs_scrub: scrub file " Darrick J. Wong
2018-01-11 23:25   ` Eric Sandeen
2018-01-12  0:29     ` Darrick J. Wong
2018-01-06  1:53 ` [PATCH 22/27] xfs_scrub: optionally use SCSI READ VERIFY commands to scrub data blocks on disk Darrick J. Wong
2018-01-06  1:53 ` [PATCH 23/27] xfs_scrub: check summary counters Darrick J. Wong
2018-01-06  1:54 ` [PATCH 24/27] xfs_scrub: fstrim the free areas if there are no errors on the filesystem Darrick J. Wong
2018-01-16 22:07   ` Eric Sandeen
2018-01-16 22:23     ` Darrick J. Wong
2018-01-06  1:54 ` [PATCH 25/27] xfs_scrub: progress indicator Darrick J. Wong
2018-01-11 23:27   ` Eric Sandeen
2018-01-12  0:32     ` Darrick J. Wong
2018-01-06  1:54 ` [PATCH 26/27] xfs_scrub: create a script to scrub all xfs filesystems Darrick J. Wong
2018-01-06  1:54 ` [PATCH 27/27] xfs_scrub: integrate services with systemd Darrick J. Wong
2018-01-06  3:50 ` [PATCH 07/27] xfs_scrub: find XFS filesystem geometry Darrick J. Wong
2018-01-12  4:17 ` [PATCH v11 00/27] xfsprogs: online scrub/repair support Eric Sandeen
2018-01-17  1:31   ` Darrick J. Wong
2018-01-16 19:21 ` [PATCH 28/27] xfs_scrub: wire up repair ioctl Darrick J. Wong
2018-01-16 19:21 ` [PATCH 29/27] xfs_scrub: schedule and manage repairs to the filesystem Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
2017-11-17 21:00 ` [PATCH 06/27] xfs_scrub: create an abstraction for a block device Darrick J. Wong
