* [PATCH v10 00/27] xfs: online scrub support
@ 2017-09-21  0:17 Darrick J. Wong
  2017-09-21  0:17 ` [PATCH 01/27] xfs: return a distinct error code value for IGET_INCORE cache misses Darrick J. Wong
                   ` (27 more replies)
  0 siblings, 28 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:17 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

Hi all,

This is the tenth revision of a patchset that adds to XFS kernel support
for online metadata scrubbing and repair.  There aren't any on-disk
format changes.  Changes since v9 include minor bug fixes and rebasing
to 4.14.  I have been performing daily online scrubs of my XFS
filesystems for several months now, with surprisingly few problems.

Online scrub/repair support consists of four major pieces -- first, an
ioctl that maps physical extents to their owners (GETFSMAP; already in
4.12); second, various in-kernel metadata scrubbing ioctls to examine
metadata records and cross-reference them with other filesystem
metadata; third, an in-kernel mechanism for rebuilding damaged metadata
objects and btrees; and fourth, a userspace component to coordinate
scrubbing and repair operations.

This new utility, xfs_scrub, is separate from the existing offline
xfs_repair tool.  The program uses various XFS ioctls to iterate all XFS
metadata and asks the kernel to check the metadata and repair it if
necessary.

While I understand that reviewer bandwidth is limited, I would like to
get this series prepped for 4.15, if possible.  I have isolated the
scrub code such that it can be compiled out entirely, in the hopes that
we can stabilize the code while not exposing regular users to riskier
code.

If you're going to start using this mess, you probably ought to just
pull from my git trees.  The kernel patches[1] should apply against
4.14-rc1.  xfsprogs[2] and xfstests[3] can be found in their usual
places.  The git trees contain all four series' worth of changes.

This is an extraordinary way to eat your data.  Enjoy! 
Comments and questions are, as always, welcome.

--D

[1] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=djwong-devel
[2] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=djwong-devel
[3] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=djwong-devel


* [PATCH 01/27] xfs: return a distinct error code value for IGET_INCORE cache misses
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
@ 2017-09-21  0:17 ` Darrick J. Wong
  2017-09-21 14:36   ` Brian Foster
  2017-09-21  0:17 ` [PATCH 02/27] xfs: query the per-AG reservation counters Darrick J. Wong
                   ` (26 subsequent siblings)
  27 siblings, 1 reply; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:17 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

For an XFS_IGET_INCORE iget operation, if the inode isn't in the cache,
return ENODATA so that we don't confuse it with the pre-existing ENOENT
cases (inode is in cache, but freed).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
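Note (not part of the patch): a minimal sketch of how a caller might tell
the two error codes apart once this change is applied.  The enum and the
classify_incore_lookup() helper are hypothetical names used only for
illustration; only the ENODATA and ENOENT meanings come from the commit
message above.

```c
#include <errno.h>

/* Hypothetical classification of an XFS_IGET_INCORE-style lookup result. */
enum incore_state {
	INCORE_PRESENT,		/* lookup succeeded */
	INCORE_NOT_CACHED,	/* -ENODATA: inode is not in the cache */
	INCORE_FREED,		/* -ENOENT: inode is in the cache, but freed */
	INCORE_OTHER_ERROR,	/* any other negative errno */
};

/* Map a kernel-style return value (0 or -errno) to a state. */
static enum incore_state
classify_incore_lookup(int error)
{
	switch (error) {
	case 0:
		return INCORE_PRESENT;
	case -ENODATA:
		return INCORE_NOT_CACHED;
	case -ENOENT:
		return INCORE_FREED;
	default:
		return INCORE_OTHER_ERROR;
	}
}
```

Before this patch both the "not cached" and "freed" cases would have
produced -ENOENT, so a caller like this could not have separated them.
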
 fs/xfs/xfs_icache.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 3422711..43005fb 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -610,7 +610,7 @@ xfs_iget(
 	} else {
 		rcu_read_unlock();
 		if (flags & XFS_IGET_INCORE) {
-			error = -ENOENT;
+			error = -ENODATA;
 			goto out_error_or_again;
 		}
 		XFS_STATS_INC(mp, xs_ig_missed);



* [PATCH 02/27] xfs: query the per-AG reservation counters
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
  2017-09-21  0:17 ` [PATCH 01/27] xfs: return a distinct error code value for IGET_INCORE cache misses Darrick J. Wong
@ 2017-09-21  0:17 ` Darrick J. Wong
  2017-09-21 14:36   ` Brian Foster
  2017-09-21  0:17 ` [PATCH 03/27] xfs: create an ioctl to scrub AG metadata Darrick J. Wong
                   ` (25 subsequent siblings)
  27 siblings, 1 reply; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:17 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Establish an ioctl for userspace to query the original and current
per-AG reservation counts.  This will be used by xfs_scrub to
check that the vfs counters are at least somewhat sane.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
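Note (not part of the patch): a standalone sketch of the aggregation that
xfs_fs_get_ag_reserve_blocks() performs in this patch, written against
plain userspace structures so it can be compiled outside the kernel.
struct ag_resv_sample, struct ag_sample, struct ag_resblks and
sum_ag_reservations() are stand-ins invented here; the real code walks
struct xfs_perag via xfs_perag_get() and reads struct xfs_ag_resv.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the two counters the patch reads from struct xfs_ag_resv. */
struct ag_resv_sample {
	uint64_t asked;		/* blocks requested at mount time */
	uint64_t reserved;	/* blocks still reserved now */
};

/* Stand-in for one AG: its metadata and AGFL reservations. */
struct ag_sample {
	struct ag_resv_sample metadata;
	struct ag_resv_sample agfl;
};

/* Same field layout as struct xfs_fsop_ag_resblks in the patch. */
struct ag_resblks {
	uint32_t ar_flags;
	uint32_t ar_reserved;
	uint64_t ar_current_resv;
	uint64_t ar_mount_resv;
	uint64_t ar_reserved2[5];
};

/* Zero the output, then add up both reservation types for every AG. */
static void
sum_ag_reservations(const struct ag_sample *ags, size_t agcount,
		    struct ag_resblks *out)
{
	size_t agno;

	memset(out, 0, sizeof(*out));
	for (agno = 0; agno < agcount; agno++) {
		out->ar_current_resv += ags[agno].metadata.reserved +
					ags[agno].agfl.reserved;
		out->ar_mount_resv += ags[agno].metadata.asked +
				      ags[agno].agfl.asked;
	}
}
```

The output is a filesystem-wide total, not a per-AG breakdown, which
matches the single pair of counters in the ioctl structure.
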
 fs/xfs/libxfs/xfs_fs.h |   12 ++++++++++++
 fs/xfs/xfs_fsops.c     |   26 ++++++++++++++++++++++++++
 fs/xfs/xfs_fsops.h     |    2 ++
 fs/xfs/xfs_ioctl.c     |   24 ++++++++++++++++++++++++
 fs/xfs/xfs_ioctl32.c   |    1 +
 5 files changed, 65 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 8c61f21..2c26c38 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -469,6 +469,17 @@ typedef struct xfs_swapext
 #define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH		0x2	/* don't flush log nor data */
 
 /*
+ * AG reserved block counters
+ */
+struct xfs_fsop_ag_resblks {
+	__u32 ar_flags;			/* output flags, none defined now */
+	__u32 ar_reserved;		/* zero */
+	__u64 ar_current_resv;		/* blocks reserved now */
+	__u64 ar_mount_resv;		/* blocks reserved at mount time */
+	__u64 ar_reserved2[5];		/* zero */
+};
+
+/*
  * ioctl limits
  */
 #ifdef XATTR_LIST_MAX
@@ -543,6 +554,7 @@ typedef struct xfs_swapext
 #define XFS_IOC_ATTRMULTI_BY_HANDLE  _IOW ('X', 123, struct xfs_fsop_attrmulti_handlereq)
 #define XFS_IOC_FSGEOMETRY	     _IOR ('X', 124, struct xfs_fsop_geom)
 #define XFS_IOC_GOINGDOWN	     _IOR ('X', 125, uint32_t)
+#define XFS_IOC_GET_AG_RESBLKS	     _IOR ('X', 126, struct xfs_fsop_ag_resblks)
 /*	XFS_IOC_GETFSUUID ---------- deprecated 140	 */
 
 
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 8f22fc5..50fb3a2 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -44,6 +44,7 @@
 #include "xfs_filestream.h"
 #include "xfs_rmap.h"
 #include "xfs_ag_resv.h"
+#include "xfs_fs.h"
 
 /*
  * File system operations
@@ -1046,3 +1047,28 @@ xfs_fs_unreserve_ag_blocks(
 
 	return error;
 }
+
+/* Query the per-AG reservations to see how many blocks we have reserved. */
+int
+xfs_fs_get_ag_reserve_blocks(
+	struct xfs_mount		*mp,
+	struct xfs_fsop_ag_resblks	*out)
+{
+	struct xfs_ag_resv		*r;
+	struct xfs_perag		*pag;
+	xfs_agnumber_t			agno;
+
+	memset(out, 0, sizeof(struct xfs_fsop_ag_resblks));
+	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+		pag = xfs_perag_get(mp, agno);
+		r = xfs_perag_resv(pag, XFS_AG_RESV_METADATA);
+		out->ar_current_resv += r->ar_reserved;
+		out->ar_mount_resv += r->ar_asked;
+		r = xfs_perag_resv(pag, XFS_AG_RESV_AGFL);
+		out->ar_current_resv += r->ar_reserved;
+		out->ar_mount_resv += r->ar_asked;
+		xfs_perag_put(pag);
+	}
+
+	return 0;
+}
diff --git a/fs/xfs/xfs_fsops.h b/fs/xfs/xfs_fsops.h
index 2954c13..c8f5e26 100644
--- a/fs/xfs/xfs_fsops.h
+++ b/fs/xfs/xfs_fsops.h
@@ -25,6 +25,8 @@ extern int xfs_fs_counts(xfs_mount_t *mp, xfs_fsop_counts_t *cnt);
 extern int xfs_reserve_blocks(xfs_mount_t *mp, uint64_t *inval,
 				xfs_fsop_resblks_t *outval);
 extern int xfs_fs_goingdown(xfs_mount_t *mp, uint32_t inflags);
+extern int xfs_fs_get_ag_reserve_blocks(struct xfs_mount *mp,
+		struct xfs_fsop_ag_resblks *out);
 
 extern int xfs_fs_reserve_ag_blocks(struct xfs_mount *mp);
 extern int xfs_fs_unreserve_ag_blocks(struct xfs_mount *mp);
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 5049e8a..44dc178 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1782,6 +1782,27 @@ xfs_ioc_swapext(
 	return error;
 }
 
+static int
+xfs_ioc_get_ag_reserve_blocks(
+	struct xfs_mount		*mp,
+	void __user			*arg)
+{
+	struct xfs_fsop_ag_resblks	out;
+	int				error;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	error = xfs_fs_get_ag_reserve_blocks(mp, &out);
+	if (error)
+		return error;
+
+	if (copy_to_user(arg, &out, sizeof(out)))
+		return -EFAULT;
+
+	return 0;
+}
+
 /*
  * Note: some of the ioctl's return positive numbers as a
  * byte count indicating success, such as readlink_by_handle.
@@ -1987,6 +2008,9 @@ xfs_file_ioctl(
 		return 0;
 	}
 
+	case XFS_IOC_GET_AG_RESBLKS:
+		return xfs_ioc_get_ag_reserve_blocks(mp, arg);
+
 	case XFS_IOC_FSGROWFSDATA: {
 		xfs_growfs_data_t in;
 
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index fa0bc4d..e8b4de3 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -556,6 +556,7 @@ xfs_file_compat_ioctl(
 	case XFS_IOC_ERROR_INJECTION:
 	case XFS_IOC_ERROR_CLEARALL:
 	case FS_IOC_GETFSMAP:
+	case XFS_IOC_GET_AG_RESBLKS:
 		return xfs_file_ioctl(filp, cmd, p);
 #ifndef BROKEN_X86_ALIGNMENT
 	/* These are handled fine if no alignment issues */



* [PATCH 03/27] xfs: create an ioctl to scrub AG metadata
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
  2017-09-21  0:17 ` [PATCH 01/27] xfs: return a distinct error code value for IGET_INCORE cache misses Darrick J. Wong
  2017-09-21  0:17 ` [PATCH 02/27] xfs: query the per-AG reservation counters Darrick J. Wong
@ 2017-09-21  0:17 ` Darrick J. Wong
  2017-09-21 14:36   ` Brian Foster
  2017-09-21  0:18 ` [PATCH 04/27] xfs: dispatch metadata scrub subcommands Darrick J. Wong
                   ` (24 subsequent siblings)
  27 siblings, 1 reply; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:17 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create an ioctl that can be used to scrub internal filesystem metadata.
The new ioctl takes the metadata type, an (optional) AG number, an
(optional) inode number and generation, and a flags argument.  This will
be used by the upcoming XFS online scrub tool.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
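Note (not part of the patch): a rough sketch of how a userspace caller
might drive the new ioctl.  The structure layout, flag values and ioctl
number mirror the xfs_fs.h hunk below, but the names struct
scrub_metadata, SCRUB_*, scrub_check() and scrub_found_corruption() are
local to this example and are not taken from xfsprogs.  With only this
patch applied the kernel side returns -EOPNOTSUPP (or -ENOTTY if
CONFIG_XFS_ONLINE_SCRUB is disabled), and callers without CAP_SYS_ADMIN
get -EPERM.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local copy of struct xfs_scrub_metadata from this patch (64 bytes). */
struct scrub_metadata {
	uint32_t sm_type;		/* what to check */
	uint32_t sm_flags;		/* input and output flags */
	uint64_t sm_ino;		/* inode number */
	uint32_t sm_gen;		/* inode generation */
	uint32_t sm_agno;		/* AG number */
	uint64_t sm_reserved[5];	/* padding, kept zero */
};

/* Output flags, same bit positions as XFS_SCRUB_OFLAG_* in the patch. */
#define SCRUB_OFLAG_CORRUPT	(1U << 1)
#define SCRUB_OFLAG_PREEN	(1U << 2)
#define SCRUB_OFLAG_XFAIL	(1U << 3)
#define SCRUB_OFLAG_XCORRUPT	(1U << 4)
#define SCRUB_OFLAG_INCOMPLETE	(1U << 5)
#define SCRUB_OFLAG_WARNING	(1U << 6)

/* Same encoding as XFS_IOC_SCRUB_METADATA in the patch. */
#define SCRUB_IOC_METADATA	_IOWR('X', 60, struct scrub_metadata)

/*
 * Ask the kernel to check one metadata object.  Returns 0 and fills
 * *flags_out on success, or a negative errno if the ioctl fails.
 */
static int
scrub_check(int fd, uint32_t type, uint32_t agno, uint32_t *flags_out)
{
	struct scrub_metadata sm;

	memset(&sm, 0, sizeof(sm));	/* also zeroes the reserved fields */
	sm.sm_type = type;
	sm.sm_agno = agno;
	if (ioctl(fd, SCRUB_IOC_METADATA, &sm) < 0)
		return -errno;
	*flags_out = sm.sm_flags;
	return 0;
}

/* True if the object or its cross-references were reported corrupt. */
static int
scrub_found_corruption(uint32_t flags)
{
	return (flags & (SCRUB_OFLAG_CORRUPT | SCRUB_OFLAG_XCORRUPT)) != 0;
}
```

The fd would be any open file on the mounted XFS filesystem.  The
operational result (errno) and the health verdict (sm_flags) are kept
separate here, as they are in the ioctl itself.
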
 fs/xfs/Kconfig           |   17 ++++++++++++++
 fs/xfs/Makefile          |   11 +++++++++
 fs/xfs/libxfs/xfs_fs.h   |   53 +++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c     |   54 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.h     |   25 +++++++++++++++++++++
 fs/xfs/scrub/trace.c     |   41 +++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/trace.h     |   33 ++++++++++++++++++++++++++++
 fs/xfs/scrub/xfs_scrub.h |   29 +++++++++++++++++++++++++
 fs/xfs/xfs_ioctl.c       |   28 ++++++++++++++++++++++++
 fs/xfs/xfs_ioctl32.c     |    1 +
 10 files changed, 292 insertions(+)
 create mode 100644 fs/xfs/scrub/scrub.c
 create mode 100644 fs/xfs/scrub/scrub.h
 create mode 100644 fs/xfs/scrub/trace.c
 create mode 100644 fs/xfs/scrub/trace.h
 create mode 100644 fs/xfs/scrub/xfs_scrub.h


diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
index 1b98cfa..f42fcf1 100644
--- a/fs/xfs/Kconfig
+++ b/fs/xfs/Kconfig
@@ -71,6 +71,23 @@ config XFS_RT
 
 	  If unsure, say N.
 
+config XFS_ONLINE_SCRUB
+	bool "XFS online metadata check support"
+	default n
+	depends on XFS_FS
+	help
+	  If you say Y here you will be able to check metadata on a
+	  mounted XFS filesystem.  This feature is intended to reduce
+	  filesystem downtime by supplementing xfs_repair.  The key
+	  advantage here is to look for problems proactively so that
+	  they can be dealt with in a controlled manner.
+
+	  This feature is considered EXPERIMENTAL.  Use with caution!
+
+	  See the xfs_scrub man page in section 8 for additional information.
+
+	  If unsure, say N.
+
 config XFS_WARN
 	bool "XFS Verbose Warnings"
 	depends on XFS_FS && !XFS_DEBUG
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index dbc33e0..f4312bc 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -138,3 +138,14 @@ xfs-$(CONFIG_XFS_POSIX_ACL)	+= xfs_acl.o
 xfs-$(CONFIG_SYSCTL)		+= xfs_sysctl.o
 xfs-$(CONFIG_COMPAT)		+= xfs_ioctl32.o
 xfs-$(CONFIG_EXPORTFS_BLOCK_OPS)	+= xfs_pnfs.o
+
+# online scrub/repair
+ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
+
+# Tracepoints like to blow up, so build that before everything else
+
+xfs-y				+= $(addprefix scrub/, \
+				   trace.o \
+				   scrub.o \
+				   )
+endif
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 2c26c38..a4b4c8c 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -468,6 +468,58 @@ typedef struct xfs_swapext
 #define XFS_FSOP_GOING_FLAGS_LOGFLUSH		0x1	/* flush log but not data */
 #define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH		0x2	/* don't flush log nor data */
 
+/* metadata scrubbing */
+struct xfs_scrub_metadata {
+	__u32 sm_type;		/* What to check? */
+	__u32 sm_flags;		/* flags; see below. */
+	__u64 sm_ino;		/* inode number. */
+	__u32 sm_gen;		/* inode generation. */
+	__u32 sm_agno;		/* ag number. */
+	__u64 sm_reserved[5];	/* pad to 64 bytes */
+};
+
+/*
+ * Metadata types and flags for scrub operation.
+ */
+
+/* Scrub subcommands. */
+
+/* Number of scrub subcommands. */
+#define XFS_SCRUB_TYPE_NR	0
+
+/* i: Repair this metadata. */
+#define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
+
+/* o: Metadata object needs repair. */
+#define XFS_SCRUB_OFLAG_CORRUPT		(1 << 1)
+
+/*
+ * o: Metadata object could be optimized.  It's not corrupt, but
+ *    we could improve on it somehow.
+ */
+#define XFS_SCRUB_OFLAG_PREEN		(1 << 2)
+
+/* o: Cross-referencing failed. */
+#define XFS_SCRUB_OFLAG_XFAIL		(1 << 3)
+
+/* o: Metadata object disagrees with cross-referenced metadata. */
+#define XFS_SCRUB_OFLAG_XCORRUPT	(1 << 4)
+
+/* o: Scan was not complete. */
+#define XFS_SCRUB_OFLAG_INCOMPLETE	(1 << 5)
+
+/* o: Metadata object looked funny but isn't corrupt. */
+#define XFS_SCRUB_OFLAG_WARNING		(1 << 6)
+
+#define XFS_SCRUB_FLAGS_IN	(XFS_SCRUB_IFLAG_REPAIR)
+#define XFS_SCRUB_FLAGS_OUT	(XFS_SCRUB_OFLAG_CORRUPT | \
+				 XFS_SCRUB_OFLAG_PREEN | \
+				 XFS_SCRUB_OFLAG_XFAIL | \
+				 XFS_SCRUB_OFLAG_XCORRUPT | \
+				 XFS_SCRUB_OFLAG_INCOMPLETE | \
+				 XFS_SCRUB_OFLAG_WARNING)
+#define XFS_SCRUB_FLAGS_ALL	(XFS_SCRUB_FLAGS_IN | XFS_SCRUB_FLAGS_OUT)
+
 /*
  * AG reserved block counters
  */
@@ -522,6 +574,7 @@ struct xfs_fsop_ag_resblks {
 #define XFS_IOC_ZERO_RANGE	_IOW ('X', 57, struct xfs_flock64)
 #define XFS_IOC_FREE_EOFBLOCKS	_IOR ('X', 58, struct xfs_fs_eofblocks)
 /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
+#define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
 
 /*
  * ioctl commands that replace IRIX syssgi()'s
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
new file mode 100644
index 0000000..5db2a6f
--- /dev/null
+++ b/fs/xfs/scrub/scrub.c
@@ -0,0 +1,54 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_alloc.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_refcount.h"
+#include "xfs_refcount_btree.h"
+#include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/trace.h"
+
+/* Dispatch metadata scrubbing. */
+int
+xfs_scrub_metadata(
+	struct xfs_inode		*ip,
+	struct xfs_scrub_metadata	*sm)
+{
+	return -EOPNOTSUPP;
+}
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
new file mode 100644
index 0000000..eb1cd9d
--- /dev/null
+++ b/fs/xfs/scrub/scrub.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef __XFS_SCRUB_SCRUB_H__
+#define __XFS_SCRUB_SCRUB_H__
+
+/* Metadata scrubbers */
+
+#endif	/* __XFS_SCRUB_SCRUB_H__ */
diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c
new file mode 100644
index 0000000..c59fd41
--- /dev/null
+++ b/fs/xfs/scrub/trace.c
@@ -0,0 +1,41 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_da_format.h"
+#include "xfs_defer.h"
+#include "xfs_inode.h"
+#include "xfs_btree.h"
+#include "xfs_trans.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+
+/*
+ * We include this last to have the helpers above available for the trace
+ * event implementations.
+ */
+#define CREATE_TRACE_POINTS
+#include "scrub/trace.h"
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
new file mode 100644
index 0000000..a95a7c8
--- /dev/null
+++ b/fs/xfs/scrub/trace.h
@@ -0,0 +1,33 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM xfs_scrub
+
+#if !defined(_TRACE_XFS_SCRUB_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_XFS_SCRUB_TRACE_H
+
+#include <linux/tracepoint.h>
+
+#endif /* _TRACE_XFS_SCRUB_TRACE_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE scrub/trace
+#include <trace/define_trace.h>
diff --git a/fs/xfs/scrub/xfs_scrub.h b/fs/xfs/scrub/xfs_scrub.h
new file mode 100644
index 0000000..e00e0ea
--- /dev/null
+++ b/fs/xfs/scrub/xfs_scrub.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef __XFS_SCRUB_H__
+#define __XFS_SCRUB_H__
+
+#ifndef CONFIG_XFS_ONLINE_SCRUB
+# define xfs_scrub_metadata(ip, sm)	(-ENOTTY)
+#else
+int xfs_scrub_metadata(struct xfs_inode *ip, struct xfs_scrub_metadata *sm);
+#endif /* CONFIG_XFS_ONLINE_SCRUB */
+
+#endif	/* __XFS_SCRUB_H__ */
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 44dc178..ab7a7f8 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -44,6 +44,7 @@
 #include "xfs_btree.h"
 #include <linux/fsmap.h>
 #include "xfs_fsmap.h"
+#include "scrub/xfs_scrub.h"
 
 #include <linux/capability.h>
 #include <linux/cred.h>
@@ -1702,6 +1703,30 @@ xfs_ioc_getfsmap(
 	return 0;
 }
 
+STATIC int
+xfs_ioc_scrub_metadata(
+	struct xfs_inode		*ip,
+	void				__user *arg)
+{
+	struct xfs_scrub_metadata	scrub;
+	int				error;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (copy_from_user(&scrub, arg, sizeof(scrub)))
+		return -EFAULT;
+
+	error = xfs_scrub_metadata(ip, &scrub);
+	if (error)
+		return error;
+
+	if (copy_to_user(arg, &scrub, sizeof(scrub)))
+		return -EFAULT;
+
+	return 0;
+}
+
 int
 xfs_ioc_swapext(
 	xfs_swapext_t	*sxp)
@@ -1906,6 +1931,9 @@ xfs_file_ioctl(
 	case FS_IOC_GETFSMAP:
 		return xfs_ioc_getfsmap(ip, arg);
 
+	case XFS_IOC_SCRUB_METADATA:
+		return xfs_ioc_scrub_metadata(ip, arg);
+
 	case XFS_IOC_FD_TO_HANDLE:
 	case XFS_IOC_PATH_TO_HANDLE:
 	case XFS_IOC_PATH_TO_FSHANDLE: {
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index e8b4de3..972d4bd 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -557,6 +557,7 @@ xfs_file_compat_ioctl(
 	case XFS_IOC_ERROR_CLEARALL:
 	case FS_IOC_GETFSMAP:
 	case XFS_IOC_GET_AG_RESBLKS:
+	case XFS_IOC_SCRUB_METADATA:
 		return xfs_file_ioctl(filp, cmd, p);
 #ifndef BROKEN_X86_ALIGNMENT
 	/* These are handled fine if no alignment issues */



* [PATCH 04/27] xfs: dispatch metadata scrub subcommands
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (2 preceding siblings ...)
  2017-09-21  0:17 ` [PATCH 03/27] xfs: create an ioctl to scrub AG metadata Darrick J. Wong
@ 2017-09-21  0:18 ` Darrick J. Wong
  2017-09-21 14:37   ` Brian Foster
  2017-09-21  0:18 ` [PATCH 05/27] xfs: test the scrub ioctl Darrick J. Wong
                   ` (23 subsequent siblings)
  27 siblings, 1 reply; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:18 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create structures needed to hold scrubbing context and dispatch incoming
commands to the individual scrubbers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
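Note (not part of the patch): a simplified userspace model of the
dispatch and retry logic added to xfs_scrub_metadata() below.  It keeps
only the ops table lookup and the single "try harder" retry on
-EDEADLOCK; transactions, tracing and input validation are left out.
The demo_* scrubbers, struct scrub_ctx, struct scrub_ops and
dispatch_scrub() are hypothetical and exist only to exercise the control
flow.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

struct scrub_ctx {
	bool try_harder;	/* set on the second attempt */
};

struct scrub_ops {
	int (*setup)(struct scrub_ctx *sc);
	int (*scrub)(struct scrub_ctx *sc);
};

static int demo_setup(struct scrub_ctx *sc)
{
	(void)sc;
	return 0;
}

/* Succeeds only once the dispatcher has asked it to try harder. */
static int demo_scrub_picky(struct scrub_ctx *sc)
{
	return sc->try_harder ? 0 : -EDEADLOCK;
}

/* Always reports a potential deadlock, even when trying harder. */
static int demo_scrub_stuck(struct scrub_ctx *sc)
{
	(void)sc;
	return -EDEADLOCK;
}

static const struct scrub_ops demo_ops[] = {
	{ .setup = demo_setup, .scrub = demo_scrub_picky },
	{ .setup = demo_setup, .scrub = demo_scrub_stuck },
};
#define DEMO_TYPE_NR	(sizeof(demo_ops) / sizeof(demo_ops[0]))

/* Look up the scrubber for 'type' and run it, retrying at most once. */
static int
dispatch_scrub(unsigned int type, int *attempts)
{
	struct scrub_ctx sc;
	const struct scrub_ops *ops;
	bool try_harder = false;
	int error;

	*attempts = 0;
	if (type >= DEMO_TYPE_NR)
		return -ENOENT;		/* unknown metadata type */
	ops = &demo_ops[type];

retry_op:
	(*attempts)++;
	memset(&sc, 0, sizeof(sc));
	sc.try_harder = try_harder;
	error = ops->setup(&sc);
	if (error)
		return error;

	error = ops->scrub(&sc);
	if (!try_harder && error == -EDEADLOCK) {
		/* The real code tears down held resources here. */
		try_harder = true;
		goto retry_op;
	}
	return error;
}
```

Because the retry is guarded by !try_harder, a scrubber that keeps
returning -EDEADLOCK is run twice and the error is then passed back to
the caller instead of looping.
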
 fs/xfs/scrub/scrub.c |  172 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.h |   19 ++++++
 fs/xfs/scrub/trace.h |   43 +++++++++++++
 3 files changed, 233 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 5db2a6f..7cf518e 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -44,11 +44,181 @@
 #include "scrub/scrub.h"
 #include "scrub/trace.h"
 
+/*
+ * Online Scrub and Repair
+ *
+ * Traditionally, XFS (the kernel driver) did not know how to check or
+ * repair on-disk data structures.  That task was left to the xfs_check
+ * and xfs_repair tools, both of which require taking the filesystem
+ * offline for a thorough but time consuming examination.  Online
+ * scrub & repair, on the other hand, enables us to check the metadata
+ * for obvious errors while carefully stepping around the filesystem's
+ * ongoing operations, locking rules, etc.
+ *
+ * Given that most XFS metadata consist of records stored in a btree,
+ * most of the checking functions iterate the btree blocks themselves
+ * looking for irregularities.  When a record block is encountered, each
+ * record can be checked for obviously bad values.  Record values can
+ * also be cross-referenced against other btrees to look for potential
+ * misunderstandings between pieces of metadata.
+ *
+ * It is expected that the checkers responsible for per-AG metadata
+ * structures will lock the AG headers (AGI, AGF, AGFL), iterate the
+ * metadata structure, and perform any relevant cross-referencing before
+ * unlocking the AG and returning the results to userspace.  These
+ * scrubbers must not keep an AG locked for too long to avoid tying up
+ * the block and inode allocators.
+ *
+ * Block maps and b-trees rooted in an inode present a special challenge
+ * because they can involve extents from any AG.  The general scrubber
+ * structure of lock -> check -> xref -> unlock still holds, but AG
+ * locking order rules /must/ be obeyed to avoid deadlocks.  The
+ * ordering rule, of course, is that we must lock in increasing AG
+ * order.  Helper functions are provided to track which AG headers we've
+ * already locked.  If we detect an imminent locking order violation, we
+ * can signal a potential deadlock, in which case the scrubber can jump
+ * out to the top level, lock all the AGs in order, and retry the scrub.
+ *
+ * For file data (directories, extended attributes, symlinks) scrub, we
+ * can simply lock the inode and walk the data.  For btree data
+ * (directories and attributes) we follow the same btree-scrubbing
+ * strategy outlined previously to check the records.
+ *
+ * We use a bit of trickery with transactions to avoid buffer deadlocks
+ * if there is a cycle in the metadata.  The basic problem is that
+ * travelling down a btree involves locking the current buffer at each
+ * tree level.  If a pointer should somehow point back to a buffer that
+ * we've already examined, we will deadlock due to the second buffer
+ * locking attempt.  Note however that grabbing a buffer in transaction
+ * context links the locked buffer to the transaction.  If we try to
+ * re-grab the buffer in the context of the same transaction, we avoid
+ * the second lock attempt and continue.  Between the verifier and the
+ * scrubber, something will notice that something is amiss and report
+ * the corruption.  Therefore, each scrubber will allocate an empty
+ * transaction, attach buffers to it, and cancel the transaction at the
+ * end of the scrub run.  Cancelling a non-dirty transaction simply
+ * unlocks the buffers.
+ *
+ * There are two kinds of data that scrub can communicate to
+ * userspace.  The first is the error code (errno), which can be used to
+ * communicate operational errors in performing the scrub.  The second
+ * is the set of XFS_SCRUB_OFLAG_* output flags in sm_flags.  If the
+ * data structure itself is corrupt, the CORRUPT flag will be set.  If
+ * the metadata is correct but otherwise suboptimal, the PREEN flag
+ * will be set.  The remaining output flags are described in xfs_fs.h.
+ */
+
+/* Scrub setup and teardown */
+
+/* Free all the resources and finish the transactions. */
+STATIC int
+xfs_scrub_teardown(
+	struct xfs_scrub_context	*sc,
+	int				error)
+{
+	if (sc->tp) {
+		xfs_trans_cancel(sc->tp);
+		sc->tp = NULL;
+	}
+	return error;
+}
+
+/* Scrubbing dispatch. */
+
+static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
+};
+
 /* Dispatch metadata scrubbing. */
 int
 xfs_scrub_metadata(
 	struct xfs_inode		*ip,
 	struct xfs_scrub_metadata	*sm)
 {
-	return -EOPNOTSUPP;
+	struct xfs_scrub_context	sc;
+	struct xfs_mount		*mp = ip->i_mount;
+	const struct xfs_scrub_meta_ops	*ops;
+	bool				try_harder = false;
+	int				error = 0;
+
+	trace_xfs_scrub_start(ip, sm, error);
+
+	/* Forbidden if we are shut down or mounted norecovery. */
+	error = -ESHUTDOWN;
+	if (XFS_FORCED_SHUTDOWN(mp))
+		goto out;
+	error = -ENOTRECOVERABLE;
+	if (mp->m_flags & XFS_MOUNT_NORECOVERY)
+		goto out;
+
+	/* Check our inputs. */
+	error = -EINVAL;
+	sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT;
+	if (sm->sm_flags & ~XFS_SCRUB_FLAGS_IN)
+		goto out;
+	if (memchr_inv(sm->sm_reserved, 0, sizeof(sm->sm_reserved)))
+		goto out;
+
+	/* Do we know about this type of metadata? */
+	error = -ENOENT;
+	if (sm->sm_type >= XFS_SCRUB_TYPE_NR)
+		goto out;
+	ops = &meta_scrub_ops[sm->sm_type];
+	if (ops->scrub == NULL)
+		goto out;
+
+	/* Does this fs even support this type of metadata? */
+	if (ops->has && !ops->has(&mp->m_sb))
+		goto out;
+
+	/* We don't know how to repair anything yet. */
+	error = -EOPNOTSUPP;
+	if (sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
+		goto out;
+
+	/* This isn't a stable feature.  Use with care. */
+	{
+		static bool warned;
+
+		if (!warned)
+			xfs_alert(mp,
+	"EXPERIMENTAL online scrub feature in use. Use at your own risk!");
+		warned = true;
+	}
+
+retry_op:
+	/* Set up for the operation. */
+	memset(&sc, 0, sizeof(sc));
+	sc.mp = ip->i_mount;
+	sc.sm = sm;
+	sc.ops = ops;
+	sc.try_harder = try_harder;
+	error = sc.ops->setup(&sc, ip);
+	if (error)
+		goto out_teardown;
+
+	/* Scrub for errors. */
+	error = sc.ops->scrub(&sc);
+	if (!try_harder && error == -EDEADLOCK) {
+		/*
+		 * Scrubbers return -EDEADLOCK to mean 'try harder'.
+		 * Tear down everything we hold, then set up again with
+		 * preparation for worst-case scenarios.
+		 */
+		error = xfs_scrub_teardown(&sc, 0);
+		if (error)
+			goto out;
+		try_harder = true;
+		goto retry_op;
+	} else if (error)
+		goto out_teardown;
+
+	if (sc.sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
+			       XFS_SCRUB_OFLAG_XCORRUPT))
+		xfs_alert_ratelimited(mp, "Corruption detected during scrub.");
+
+out_teardown:
+	error = xfs_scrub_teardown(&sc, error);
+out:
+	trace_xfs_scrub_done(ip, sm, error);
+	return error;
 }
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index eb1cd9d..b271b2a 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -20,6 +20,25 @@
 #ifndef __XFS_SCRUB_SCRUB_H__
 #define __XFS_SCRUB_SCRUB_H__
 
+struct xfs_scrub_context;
+
+struct xfs_scrub_meta_ops {
+	int		(*setup)(struct xfs_scrub_context *,
+				 struct xfs_inode *);
+	int		(*scrub)(struct xfs_scrub_context *);
+	bool		(*has)(struct xfs_sb *);
+};
+
+struct xfs_scrub_context {
+	/* General scrub state. */
+	struct xfs_mount		*mp;
+	struct xfs_scrub_metadata	*sm;
+	const struct xfs_scrub_meta_ops	*ops;
+	struct xfs_trans		*tp;
+	struct xfs_inode		*ip;
+	bool				try_harder;
+};
+
 /* Metadata scrubbers */
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index a95a7c8..688517e 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -25,6 +25,49 @@
 
 #include <linux/tracepoint.h>
 
+DECLARE_EVENT_CLASS(xfs_scrub_class,
+	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
+		 int error),
+	TP_ARGS(ip, sm, error),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(unsigned int, type)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_ino_t, inum)
+		__field(unsigned int, gen)
+		__field(unsigned int, flags)
+		__field(int, error)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->type = sm->sm_type;
+		__entry->agno = sm->sm_agno;
+		__entry->inum = sm->sm_ino;
+		__entry->gen = sm->sm_gen;
+		__entry->flags = sm->sm_flags;
+		__entry->error = error;
+	),
+	TP_printk("dev %d:%d ino %llu type %u agno %u inum %llu gen %u flags 0x%x error %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->type,
+		  __entry->agno,
+		  __entry->inum,
+		  __entry->gen,
+		  __entry->flags,
+		  __entry->error)
+)
+#define DEFINE_SCRUB_EVENT(name) \
+DEFINE_EVENT(xfs_scrub_class, name, \
+	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm, \
+		 int error), \
+	TP_ARGS(ip, sm, error))
+
+DEFINE_SCRUB_EVENT(xfs_scrub_start);
+DEFINE_SCRUB_EVENT(xfs_scrub_done);
+
 #endif /* _TRACE_XFS_SCRUB_TRACE_H */
 
 #undef TRACE_INCLUDE_PATH
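Reviewer note on the dispatch loop in scrub.c above: a scrubber may return -EDEADLOCK once to request a second, more conservative pass. The control flow can be modelled in userspace as below. This is only a sketch: every demo_* name is hypothetical, teardown is reduced to a counter, and the POSIX name EDEADLK stands in for the patch's EDEADLOCK.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

struct demo_ctx;

/* Hypothetical stand-in for struct xfs_scrub_meta_ops. */
struct demo_ops {
	int	(*setup)(struct demo_ctx *ctx);
	int	(*scrub)(struct demo_ctx *ctx);
};

/* Counters that survive the per-attempt memset of the context. */
struct demo_stats {
	int	setups;
	int	teardowns;
};

/* Hypothetical stand-in for struct xfs_scrub_context. */
struct demo_ctx {
	const struct demo_ops	*ops;
	struct demo_stats	*stats;
	bool			try_harder;
};

/* Release whatever setup acquired; pass the caller's error through. */
int demo_teardown(struct demo_ctx *ctx, int error)
{
	ctx->stats->teardowns++;
	return error;
}

/* Run one scrubber, retrying at most once if it asks for it. */
int demo_dispatch(const struct demo_ops *ops, struct demo_stats *stats)
{
	struct demo_ctx	ctx;
	bool		try_harder = false;
	int		error;

	assert(ops && ops->setup && ops->scrub && stats);
retry_op:
	/* Start each attempt from a clean context. */
	memset(&ctx, 0, sizeof(ctx));
	ctx.ops = ops;
	ctx.stats = stats;
	ctx.try_harder = try_harder;
	error = ctx.ops->setup(&ctx);
	if (error)
		goto out_teardown;

	error = ctx.ops->scrub(&ctx);
	if (!try_harder && error == -EDEADLK) {
		/* First attempt asked for a retry: drop everything, go again. */
		error = demo_teardown(&ctx, 0);
		if (error)
			return error;
		try_harder = true;
		goto retry_op;
	}

out_teardown:
	return demo_teardown(&ctx, error);
}

/* Hypothetical setup hook that only counts invocations. */
int demo_setup(struct demo_ctx *ctx)
{
	ctx->stats->setups++;
	return 0;
}

/* Hypothetical scrubber that succeeds only on the conservative pass. */
int demo_picky_scrub(struct demo_ctx *ctx)
{
	return ctx->try_harder ? 0 : -EDEADLK;
}

/* Hypothetical scrubber that always asks for a retry. */
int demo_stuck_scrub(struct demo_ctx *ctx)
{
	(void)ctx;
	return -EDEADLK;
}
```

Because the retry is gated on !try_harder, a scrubber that keeps returning the retry code is attempted exactly twice and the error is then handed back to the caller.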



* [PATCH 05/27] xfs: test the scrub ioctl
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (3 preceding siblings ...)
  2017-09-21  0:18 ` [PATCH 04/27] xfs: dispatch metadata scrub subcommands Darrick J. Wong
@ 2017-09-21  0:18 ` Darrick J. Wong
  2017-09-21  6:04   ` Dave Chinner
  2017-09-21  0:18 ` [PATCH 06/27] xfs: create helpers to record and deal with scrub problems Darrick J. Wong
                   ` (22 subsequent siblings)
  27 siblings, 1 reply; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:18 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create a test scrubber with id 0.  This will be used by xfs_scrub to
probe the kernel's abilities to scrub (and repair) the metadata.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
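Reviewer note: the test scrubber added below echoes output flags that the caller requests through sm_gen back into sm_flags, which lets userspace exercise its flag handling without damaged metadata. A minimal userspace model follows. It is a sketch under assumptions: the demo_* names are hypothetical, the struct is a trimmed-down mirror of struct xfs_scrub_metadata, and the flag values are assumed for illustration (the real definitions live in xfs_fs.h).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed flag values, for illustration only. */
#define DEMO_OFLAG_CORRUPT	(1u << 1)
#define DEMO_OFLAG_PREEN	(1u << 2)
#define DEMO_OFLAG_XFAIL	(1u << 3)
#define DEMO_OFLAG_XCORRUPT	(1u << 4)
#define DEMO_OFLAG_INCOMPLETE	(1u << 5)
#define DEMO_OFLAG_WARNING	(1u << 6)
#define DEMO_FLAGS_OUT		(DEMO_OFLAG_CORRUPT | DEMO_OFLAG_PREEN | \
				 DEMO_OFLAG_XFAIL | DEMO_OFLAG_XCORRUPT | \
				 DEMO_OFLAG_INCOMPLETE | DEMO_OFLAG_WARNING)

/* Hypothetical, trimmed-down mirror of struct xfs_scrub_metadata. */
struct demo_scrub_metadata {
	uint32_t	sm_type;
	uint32_t	sm_flags;
	uint64_t	sm_ino;
	uint32_t	sm_gen;
	uint32_t	sm_agno;
};

/* Model of the test scrubber: validate inputs, then echo flags. */
int demo_scrub_tester(struct demo_scrub_metadata *sm)
{
	assert(sm);

	/* The probe takes no inode or AG argument. */
	if (sm->sm_ino || sm->sm_agno)
		return -EINVAL;

	/* Echo every recognised output flag requested through sm_gen. */
	sm->sm_flags |= sm->sm_gen & DEMO_FLAGS_OUT;

	/* Unknown bits are rejected, but only after the echo above. */
	if (sm->sm_gen & ~DEMO_FLAGS_OUT)
		return -ENOENT;
	return 0;
}
```

Collapsing the per-flag if statements into one mask is a simplification made for the sketch; the ordering (echo first, reject unknown bits afterwards) follows the patch.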
 fs/xfs/Makefile        |    1 +
 fs/xfs/libxfs/xfs_fs.h |    3 ++
 fs/xfs/scrub/common.c  |   60 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.h  |   44 +++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c   |   33 ++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.h   |    1 +
 fs/xfs/scrub/trace.c   |    1 +
 7 files changed, 142 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/common.c
 create mode 100644 fs/xfs/scrub/common.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index f4312bc..ca14595 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -146,6 +146,7 @@ ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
 
 xfs-y				+= $(addprefix scrub/, \
 				   trace.o \
+				   common.o \
 				   scrub.o \
 				   )
 endif
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index a4b4c8c..5105bad 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -483,9 +483,10 @@ struct xfs_scrub_metadata {
  */
 
 /* Scrub subcommands. */
+#define XFS_SCRUB_TYPE_TEST	0	/* presence test ioctl */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	0
+#define XFS_SCRUB_TYPE_NR	1
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
new file mode 100644
index 0000000..13ccb36
--- /dev/null
+++ b/fs/xfs/scrub/common.c
@@ -0,0 +1,60 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_alloc.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_refcount.h"
+#include "xfs_refcount_btree.h"
+#include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+
+/* Common code for the metadata scrubbers. */
+
+/* Per-scrubber setup functions */
+
+/* Set us up with a transaction and an empty context. */
+int
+xfs_scrub_setup_fs(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_trans_alloc(sc->sm, sc->mp,
+			&M_RES(sc->mp)->tr_itruncate, 0, 0, 0, &sc->tp);
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
new file mode 100644
index 0000000..b97df8c
--- /dev/null
+++ b/fs/xfs/scrub/common.h
@@ -0,0 +1,44 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef __XFS_SCRUB_COMMON_H__
+#define __XFS_SCRUB_COMMON_H__
+
+/*
+ * Grab a transaction.  If we're going to repair something, we need to
+ * ensure there's enough reservation to make all the changes.  If not,
+ * we can use an empty transaction.
+ */
+static inline int
+xfs_scrub_trans_alloc(
+	struct xfs_scrub_metadata	*sm,
+	struct xfs_mount		*mp,
+	struct xfs_trans_res		*resp,
+	uint				blocks,
+	uint				rtextents,
+	uint				flags,
+	struct xfs_trans		**tpp)
+{
+	return xfs_trans_alloc_empty(mp, tpp);
+}
+
+/* Setup functions */
+int xfs_scrub_setup_fs(struct xfs_scrub_context *sc, struct xfs_inode *ip);
+
+#endif	/* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 7cf518e..7936a23 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -42,6 +42,7 @@
 #include "xfs_rmap_btree.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
+#include "scrub/common.h"
 #include "scrub/trace.h"
 
 /*
@@ -108,6 +109,34 @@
  * will be set.
  */
 
+/*
+ * Test scrubber -- userspace uses this to probe if we're willing to
+ * scrub or repair a given mountpoint.
+ */
+int
+xfs_scrub_tester(
+	struct xfs_scrub_context	*sc)
+{
+	if (sc->sm->sm_ino || sc->sm->sm_agno)
+		return -EINVAL;
+	if (sc->sm->sm_gen & XFS_SCRUB_OFLAG_CORRUPT)
+		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+	if (sc->sm->sm_gen & XFS_SCRUB_OFLAG_PREEN)
+		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_PREEN;
+	if (sc->sm->sm_gen & XFS_SCRUB_OFLAG_XFAIL)
+		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_XFAIL;
+	if (sc->sm->sm_gen & XFS_SCRUB_OFLAG_XCORRUPT)
+		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_XCORRUPT;
+	if (sc->sm->sm_gen & XFS_SCRUB_OFLAG_INCOMPLETE)
+		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_INCOMPLETE;
+	if (sc->sm->sm_gen & XFS_SCRUB_OFLAG_WARNING)
+		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_WARNING;
+	if (sc->sm->sm_gen & ~XFS_SCRUB_FLAGS_OUT)
+		return -ENOENT;
+
+	return 0;
+}
+
 /* Scrub setup and teardown */
 
 /* Free all the resources and finish the transactions. */
@@ -126,6 +155,10 @@ xfs_scrub_teardown(
 /* Scrubbing dispatch. */
 
 static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
+	{ /* ioctl presence test */
+		.setup	= xfs_scrub_setup_fs,
+		.scrub	= xfs_scrub_tester,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index b271b2a..2528039 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -40,5 +40,6 @@ struct xfs_scrub_context {
 };
 
 /* Metadata scrubbers */
+int xfs_scrub_tester(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c
index c59fd41..88b5ccb 100644
--- a/fs/xfs/scrub/trace.c
+++ b/fs/xfs/scrub/trace.c
@@ -32,6 +32,7 @@
 #include "xfs_trans.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
+#include "scrub/common.h"
 
 /*
  * We include this last to have the helpers above available for the trace



* [PATCH 06/27] xfs: create helpers to record and deal with scrub problems
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (4 preceding siblings ...)
  2017-09-21  0:18 ` [PATCH 05/27] xfs: test the scrub ioctl Darrick J. Wong
@ 2017-09-21  0:18 ` Darrick J. Wong
  2017-09-22  7:16   ` Dave Chinner
  2017-09-21  0:18 ` [PATCH 07/27] xfs: create helpers to scrub a metadata btree Darrick J. Wong
                   ` (21 subsequent siblings)
  27 siblings, 1 reply; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:18 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create helper functions to record CRC and corruption problems, and
deal with any other runtime errors that arise.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
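Reviewer note: the helpers below follow two patterns. The *_op_ok() family classifies the error code of an operation the scrubber depends on, turning corruption errors into an output flag instead of an abort. The *_check_ok() family records a flag when a predicate fails and hands the predicate back so the caller can branch on it. A userspace sketch of both follows. The demo_* names are hypothetical, the flag value is assumed, the traced counter stands in for the error tracepoints, and EBADMSG/EUCLEAN are used because XFS defines EFSBADCRC and EFSCORRUPTED as aliases for them on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef EUCLEAN
#define EUCLEAN	117	/* fallback for platforms whose errno.h lacks it */
#endif

#define DEMO_OFLAG_CORRUPT	(1u << 1)	/* assumed value */

/* Hypothetical stand-in for the scrub context. */
struct demo_ctx {
	unsigned int	sm_flags;	/* output flags reported to userspace */
	int		traced;		/* stand-in for the error tracepoints */
};

/*
 * Classify the result of an operation the scrubber depends on.
 * Returns true only if there was no error at all.
 */
bool demo_op_ok(struct demo_ctx *ctx, int *error)
{
	assert(ctx && error);

	switch (*error) {
	case 0:
		return true;
	case -EDEADLK:
		/* Retry request: leave *error alone for the dispatcher. */
		break;
	case -EBADMSG:		/* EFSBADCRC in the kernel */
	case -EUCLEAN:		/* EFSCORRUPTED in the kernel */
		/* Corruption is a finding, not a failure: flag it, clear it. */
		ctx->sm_flags |= DEMO_OFLAG_CORRUPT;
		*error = 0;
		/* fall through */
	default:
		ctx->traced++;
		break;
	}
	return false;
}

/* Record corruption when a metadata check fails; pass the verdict on. */
bool demo_check_ok(struct demo_ctx *ctx, bool fs_ok)
{
	assert(ctx);

	if (fs_ok)
		return fs_ok;

	ctx->sm_flags |= DEMO_OFLAG_CORRUPT;
	ctx->traced++;
	return fs_ok;
}
```

A caller would typically write "if (!demo_op_ok(ctx, &error)) goto out;" after each operation. When the cause was corruption, error has already been cleared to zero, so the scrub call itself succeeds and the finding is reported through the flags.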
 fs/xfs/scrub/common.c |  243 +++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.h |   39 ++++++++
 fs/xfs/scrub/trace.h  |  193 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 475 insertions(+)


diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 13ccb36..cf3f1365 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -47,6 +47,249 @@
 
 /* Common code for the metadata scrubbers. */
 
+/* Check for operational errors. */
+bool
+xfs_scrub_op_ok(
+	struct xfs_scrub_context	*sc,
+	xfs_agnumber_t			agno,
+	xfs_agblock_t			bno,
+	int				*error)
+{
+	switch (*error) {
+	case 0:
+		return true;
+	case -EDEADLOCK:
+		/* Used to restart an op with deadlock avoidance. */
+		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
+		break;
+	case -EFSBADCRC:
+	case -EFSCORRUPTED:
+		/* Note the badness but don't abort. */
+		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+		*error = 0;
+		/* fall through */
+	default:
+		trace_xfs_scrub_op_error(sc, agno, bno, *error,
+				__return_address);
+		break;
+	}
+	return false;
+}
+
+/* Check for operational errors for a file offset. */
+bool
+xfs_scrub_fblock_op_ok(
+	struct xfs_scrub_context	*sc,
+	int				whichfork,
+	xfs_fileoff_t			offset,
+	int				*error)
+{
+	switch (*error) {
+	case 0:
+		return true;
+	case -EDEADLOCK:
+		/* Used to restart an op with deadlock avoidance. */
+		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
+		break;
+	case -EFSBADCRC:
+	case -EFSCORRUPTED:
+		/* Note the badness but don't abort. */
+		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+		*error = 0;
+		/* fall through */
+	default:
+		trace_xfs_scrub_file_op_error(sc, whichfork, offset, *error,
+				__return_address);
+		break;
+	}
+	return false;
+}
+
+/* Check for metadata block optimization possibilities. */
+bool
+xfs_scrub_block_preen_ok(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp,
+	bool				fs_ok)
+{
+	struct xfs_mount		*mp = sc->mp;
+	xfs_fsblock_t			fsbno;
+	xfs_agnumber_t			agno;
+	xfs_agblock_t			bno;
+
+	if (fs_ok)
+		return fs_ok;
+
+	fsbno = XFS_DADDR_TO_FSB(mp, bp->b_bn);
+	agno = XFS_FSB_TO_AGNO(mp, fsbno);
+	bno = XFS_FSB_TO_AGBNO(mp, fsbno);
+
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_PREEN;
+	trace_xfs_scrub_block_preen(sc, agno, bno, __return_address);
+	return fs_ok;
+}
+
+/* Check for inode metadata optimization possibilities. */
+bool
+xfs_scrub_ino_preen_ok(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp,
+	bool				fs_ok)
+{
+	struct xfs_inode		*ip = sc->ip;
+	struct xfs_mount		*mp = sc->mp;
+	xfs_fsblock_t			fsbno;
+	xfs_agnumber_t			agno;
+	xfs_agblock_t			bno;
+
+	if (fs_ok)
+		return fs_ok;
+
+	if (bp) {
+		fsbno = XFS_DADDR_TO_FSB(mp, bp->b_bn);
+		agno = XFS_FSB_TO_AGNO(mp, fsbno);
+		bno = XFS_FSB_TO_AGBNO(mp, fsbno);
+	} else {
+		agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
+		bno = XFS_INO_TO_AGBNO(mp, ip->i_ino);
+	}
+
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_PREEN;
+	trace_xfs_scrub_ino_preen(sc, ip->i_ino, agno, bno, __return_address);
+	return fs_ok;
+}
+
+/* Check for metadata block corruption. */
+bool
+xfs_scrub_block_check_ok(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp,
+	bool				fs_ok)
+{
+	struct xfs_mount		*mp = sc->mp;
+	xfs_fsblock_t			fsbno;
+	xfs_agnumber_t			agno;
+	xfs_agblock_t			bno;
+
+	if (fs_ok)
+		return fs_ok;
+
+	fsbno = XFS_DADDR_TO_FSB(mp, bp->b_bn);
+	agno = XFS_FSB_TO_AGNO(mp, fsbno);
+	bno = XFS_FSB_TO_AGBNO(mp, fsbno);
+
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+	trace_xfs_scrub_block_error(sc, agno, bno, __return_address);
+	return fs_ok;
+}
+
+/* Check for inode metadata corruption. */
+bool
+xfs_scrub_ino_check_ok(
+	struct xfs_scrub_context	*sc,
+	xfs_ino_t			ino,
+	struct xfs_buf			*bp,
+	bool				fs_ok)
+{
+	struct xfs_inode		*ip = sc->ip;
+	struct xfs_mount		*mp = sc->mp;
+	xfs_fsblock_t			fsbno;
+	xfs_agnumber_t			agno;
+	xfs_agblock_t			bno;
+
+	if (fs_ok)
+		return fs_ok;
+
+	if (bp) {
+		fsbno = XFS_DADDR_TO_FSB(mp, bp->b_bn);
+		agno = XFS_FSB_TO_AGNO(mp, fsbno);
+		bno = XFS_FSB_TO_AGBNO(mp, fsbno);
+	} else {
+		agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
+		bno = XFS_INO_TO_AGBNO(mp, ip->i_ino);
+	}
+
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+	trace_xfs_scrub_ino_error(sc, ino, agno, bno, __return_address);
+	return fs_ok;
+}
+
+/* Check for file fork block corruption. */
+bool
+xfs_scrub_fblock_check_ok(
+	struct xfs_scrub_context	*sc,
+	int				whichfork,
+	xfs_fileoff_t			offset,
+	bool				fs_ok)
+{
+	if (fs_ok)
+		return fs_ok;
+
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+	trace_xfs_scrub_fblock_error(sc, whichfork, offset, __return_address);
+	return fs_ok;
+}
+
+/* Check for inode metadata non-corruption problems. */
+bool
+xfs_scrub_ino_warn_ok(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp,
+	bool				fs_ok)
+{
+	struct xfs_inode		*ip = sc->ip;
+	struct xfs_mount		*mp = sc->mp;
+	xfs_fsblock_t			fsbno;
+	xfs_agnumber_t			agno;
+	xfs_agblock_t			bno;
+
+	if (fs_ok)
+		return fs_ok;
+
+	if (bp) {
+		fsbno = XFS_DADDR_TO_FSB(mp, bp->b_bn);
+		agno = XFS_FSB_TO_AGNO(mp, fsbno);
+		bno = XFS_FSB_TO_AGBNO(mp, fsbno);
+	} else {
+		agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
+		bno = XFS_INO_TO_AGBNO(mp, ip->i_ino);
+	}
+
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_WARNING;
+	trace_xfs_scrub_ino_warning(sc, ip->i_ino, agno, bno, __return_address);
+	return fs_ok;
+}
+
+/* Check for file fork block non-corruption problems. */
+bool
+xfs_scrub_fblock_warn_ok(
+	struct xfs_scrub_context	*sc,
+	int				whichfork,
+	xfs_fileoff_t			offset,
+	bool				fs_ok)
+{
+	if (fs_ok)
+		return fs_ok;
+
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_WARNING;
+	trace_xfs_scrub_fblock_warning(sc, whichfork, offset, __return_address);
+	return fs_ok;
+}
+
+/* Signal an incomplete scrub. */
+bool
+xfs_scrub_check_thoroughness(
+	struct xfs_scrub_context	*sc,
+	bool				fs_ok)
+{
+	if (fs_ok)
+		return fs_ok;
+
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_INCOMPLETE;
+	trace_xfs_scrub_incomplete(sc, __return_address);
+	return fs_ok;
+}
+
 /* Per-scrubber setup functions */
 
 /* Set us up with a transaction and an empty context. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index b97df8c..e1bb14b 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -38,6 +38,45 @@ xfs_scrub_trans_alloc(
 	return xfs_trans_alloc_empty(mp, tpp);
 }
 
+/* Check for operational errors for a block check. */
+bool xfs_scrub_op_ok(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
+		     xfs_agblock_t bno, int *error);
+
+/* Check for operational errors for a file offset check. */
+bool xfs_scrub_fblock_op_ok(struct xfs_scrub_context *sc, int whichfork,
+			  xfs_fileoff_t offset, int *error);
+
+/* Check for metadata block optimization possibilities. */
+bool xfs_scrub_block_preen_ok(struct xfs_scrub_context *sc, struct xfs_buf *bp,
+			      bool fs_ok);
+
+/* Check for inode metadata optimization possibilities. */
+bool xfs_scrub_ino_preen_ok(struct xfs_scrub_context *sc, struct xfs_buf *bp,
+			    bool fs_ok);
+
+/* Check for metadata block corruption. */
+bool xfs_scrub_block_check_ok(struct xfs_scrub_context *sc, struct xfs_buf *bp,
+			      bool fs_ok);
+
+/* Check for inode metadata corruption. */
+bool xfs_scrub_ino_check_ok(struct xfs_scrub_context *sc, xfs_ino_t ino,
+			    struct xfs_buf *bp, bool fs_ok);
+
+/* Check for file fork block corruption. */
+bool xfs_scrub_fblock_check_ok(struct xfs_scrub_context *sc, int whichfork,
+			       xfs_fileoff_t offset, bool fs_ok);
+
+/* Check for inode metadata non-corruption problems. */
+bool xfs_scrub_ino_warn_ok(struct xfs_scrub_context *sc, struct xfs_buf *bp,
+			   bool fs_ok);
+
+/* Check for file fork block non-corruption problems. */
+bool xfs_scrub_fblock_warn_ok(struct xfs_scrub_context *sc, int whichfork,
+			      xfs_fileoff_t offset, bool fs_ok);
+
+/* Signal an incomplete scrub. */
+bool xfs_scrub_check_thoroughness(struct xfs_scrub_context *sc, bool fs_ok);
+
 /* Setup functions */
 int xfs_scrub_setup_fs(struct xfs_scrub_context *sc, struct xfs_inode *ip);
 
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 688517e..8d67a85 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -67,6 +67,199 @@ DEFINE_EVENT(xfs_scrub_class, name, \
 
 DEFINE_SCRUB_EVENT(xfs_scrub_start);
 DEFINE_SCRUB_EVENT(xfs_scrub_done);
+DEFINE_SCRUB_EVENT(xfs_scrub_deadlock_retry);
+
+TRACE_EVENT(xfs_scrub_op_error,
+	TP_PROTO(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
+		 xfs_agblock_t bno, int error, void *ret_ip),
+	TP_ARGS(sc, agno, bno, error, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned int, type)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__field(int, error)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->type = sc->sm->sm_type;
+		__entry->agno = agno;
+		__entry->bno = bno;
+		__entry->error = error;
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d type %u agno %u agbno %u error %d ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->type,
+		  __entry->agno,
+		  __entry->bno,
+		  __entry->error,
+		  __entry->ret_ip)
+);
+
+TRACE_EVENT(xfs_scrub_file_op_error,
+	TP_PROTO(struct xfs_scrub_context *sc, int whichfork,
+		 xfs_fileoff_t offset, int error, void *ret_ip),
+	TP_ARGS(sc, whichfork, offset, error, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(int, whichfork)
+		__field(unsigned int, type)
+		__field(xfs_fileoff_t, offset)
+		__field(int, error)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = sc->ip->i_mount->m_super->s_dev;
+		__entry->ino = sc->ip->i_ino;
+		__entry->whichfork = whichfork;
+		__entry->type = sc->sm->sm_type;
+		__entry->offset = offset;
+		__entry->error = error;
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d ino %llu fork %d type %u offset %llu error %d ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->whichfork,
+		  __entry->type,
+		  __entry->offset,
+		  __entry->error,
+		  __entry->ret_ip)
+);
+
+DECLARE_EVENT_CLASS(xfs_scrub_block_error_class,
+	TP_PROTO(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
+		 xfs_agblock_t bno, void *ret_ip),
+	TP_ARGS(sc, agno, bno, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned int, type)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->type = sc->sm->sm_type;
+		__entry->agno = agno;
+		__entry->bno = bno;
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d type %u agno %u agbno %u ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->type,
+		  __entry->agno,
+		  __entry->bno,
+		  __entry->ret_ip)
+)
+
+#define DEFINE_SCRUB_BLOCK_ERROR_EVENT(name) \
+DEFINE_EVENT(xfs_scrub_block_error_class, name, \
+	TP_PROTO(struct xfs_scrub_context *sc, xfs_agnumber_t agno, \
+		 xfs_agblock_t bno, void *ret_ip), \
+	TP_ARGS(sc, agno, bno, ret_ip))
+
+DEFINE_SCRUB_BLOCK_ERROR_EVENT(xfs_scrub_block_error);
+DEFINE_SCRUB_BLOCK_ERROR_EVENT(xfs_scrub_block_preen);
+
+DECLARE_EVENT_CLASS(xfs_scrub_ino_error_class,
+	TP_PROTO(struct xfs_scrub_context *sc, xfs_ino_t ino,
+		 xfs_agnumber_t agno, xfs_agblock_t bno, void *ret_ip),
+	TP_ARGS(sc, ino, agno, bno, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(unsigned int, type)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->ino = ino;
+		__entry->type = sc->sm->sm_type;
+		__entry->agno = agno;
+		__entry->bno = bno;
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d ino %llu type %u agno %u agbno %u ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->type,
+		  __entry->agno,
+		  __entry->bno,
+		  __entry->ret_ip)
+)
+
+#define DEFINE_SCRUB_INO_ERROR_EVENT(name) \
+DEFINE_EVENT(xfs_scrub_ino_error_class, name, \
+	TP_PROTO(struct xfs_scrub_context *sc, xfs_ino_t ino, \
+		 xfs_agnumber_t agno, xfs_agblock_t bno, void *ret_ip), \
+	TP_ARGS(sc, ino, agno, bno, ret_ip))
+
+DEFINE_SCRUB_INO_ERROR_EVENT(xfs_scrub_ino_error);
+DEFINE_SCRUB_INO_ERROR_EVENT(xfs_scrub_ino_preen);
+DEFINE_SCRUB_INO_ERROR_EVENT(xfs_scrub_ino_warning);
+
+DECLARE_EVENT_CLASS(xfs_scrub_fblock_error_class,
+	TP_PROTO(struct xfs_scrub_context *sc, int whichfork,
+		 xfs_fileoff_t offset, void *ret_ip),
+	TP_ARGS(sc, whichfork, offset, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(int, whichfork)
+		__field(unsigned int, type)
+		__field(xfs_fileoff_t, offset)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = sc->ip->i_mount->m_super->s_dev;
+		__entry->ino = sc->ip->i_ino;
+		__entry->whichfork = whichfork;
+		__entry->type = sc->sm->sm_type;
+		__entry->offset = offset;
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d ino %llu fork %d type %u offset %llu ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->whichfork,
+		  __entry->type,
+		  __entry->offset,
+		  __entry->ret_ip)
+);
+
+#define DEFINE_SCRUB_FBLOCK_ERROR_EVENT(name) \
+DEFINE_EVENT(xfs_scrub_fblock_error_class, name, \
+	TP_PROTO(struct xfs_scrub_context *sc, int whichfork, \
+		 xfs_fileoff_t offset, void *ret_ip), \
+	TP_ARGS(sc, whichfork, offset, ret_ip))
+
+DEFINE_SCRUB_FBLOCK_ERROR_EVENT(xfs_scrub_fblock_error);
+DEFINE_SCRUB_FBLOCK_ERROR_EVENT(xfs_scrub_fblock_warning);
+
+TRACE_EVENT(xfs_scrub_incomplete,
+	TP_PROTO(struct xfs_scrub_context *sc, void *ret_ip),
+	TP_ARGS(sc, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned int, type)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->type = sc->sm->sm_type;
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d type %u ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->type,
+		  __entry->ret_ip)
+);
 
 #endif /* _TRACE_XFS_SCRUB_TRACE_H */
 



* [PATCH 07/27] xfs: create helpers to scrub a metadata btree
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (5 preceding siblings ...)
  2017-09-21  0:18 ` [PATCH 06/27] xfs: create helpers to record and deal with scrub problems Darrick J. Wong
@ 2017-09-21  0:18 ` Darrick J. Wong
  2017-09-22  7:23   ` Dave Chinner
  2017-09-21  0:18 ` [PATCH 08/27] xfs: scrub the shape of " Darrick J. Wong
                   ` (20 subsequent siblings)
  27 siblings, 1 reply; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:18 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create helper functions and tracepoints to deal with errors while
scrubbing a metadata btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
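Reviewer note: xfs_scrub_btree() is only a stub in this patch, but struct xfs_scrub_btree already carries a per-record callback plus lastrec/firstrec state. One way the record-ordering part of such a walk might work is sketched below in userspace. Everything here is an assumption made for illustration: the demo_* names are hypothetical, a sorted int array stands in for the leaf records, and the flag value is assumed. It is not the implementation that later patches add.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_OFLAG_CORRUPT	(1u << 1)	/* assumed value */

struct demo_btree;
typedef int (*demo_rec_fn)(struct demo_btree *bs, const int *rec);

/* Hypothetical, flattened analogue of struct xfs_scrub_btree. */
struct demo_btree {
	demo_rec_fn	scrub_rec;	/* caller-provided record check */
	void		*private;	/* caller-provided state */
	unsigned int	sm_flags;	/* output flags */
	int		lastrec;	/* previous record seen */
	bool		firstrec;	/* true until one record is seen */
};

/*
 * Walk an array standing in for the leaf records of a btree.
 * Flags out-of-order records and hands each record to the callback.
 * Returns zero or the first nonzero value returned by the callback.
 */
int demo_scrub_records(struct demo_btree *bs, const int *recs, size_t nr)
{
	size_t	i;
	int	error;

	assert(bs && bs->scrub_rec && (recs || nr == 0));

	bs->firstrec = true;
	for (i = 0; i < nr; i++) {
		/* Records must be strictly increasing. */
		if (!bs->firstrec && recs[i] <= bs->lastrec)
			bs->sm_flags |= DEMO_OFLAG_CORRUPT;
		bs->firstrec = false;
		bs->lastrec = recs[i];

		error = bs->scrub_rec(bs, &recs[i]);
		if (error)
			return error;
	}
	return 0;
}

/* Example callback: count the records visited via the private pointer. */
int demo_count_rec(struct demo_btree *bs, const int *rec)
{
	(void)rec;
	(*(int *)bs->private)++;
	return 0;
}
```

Note that an ordering violation only sets the corruption flag; the walk continues so that the callback still sees every record.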
 fs/xfs/Makefile      |    1 
 fs/xfs/scrub/btree.c |  114 +++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/btree.h |   57 ++++++++++++++++++
 fs/xfs/scrub/trace.c |   14 ++++
 fs/xfs/scrub/trace.h |  162 ++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 348 insertions(+)
 create mode 100644 fs/xfs/scrub/btree.c
 create mode 100644 fs/xfs/scrub/btree.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index ca14595..5888b9f 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -146,6 +146,7 @@ ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
 
 xfs-y				+= $(addprefix scrub/, \
 				   trace.o \
+				   btree.o \
 				   common.o \
 				   scrub.o \
 				   )
diff --git a/fs/xfs/scrub/btree.c b/fs/xfs/scrub/btree.c
new file mode 100644
index 0000000..adf5d09
--- /dev/null
+++ b/fs/xfs/scrub/btree.c
@@ -0,0 +1,114 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_alloc.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+
+/* btree scrubbing */
+
+/* Check for btree operation errors. */
+bool
+xfs_scrub_btree_op_ok(
+	struct xfs_scrub_context	*sc,
+	struct xfs_btree_cur		*cur,
+	int				level,
+	int				*error)
+{
+	if (*error == 0)
+		return true;
+
+	switch (*error) {
+	case -EDEADLOCK:
+		/* Used to restart an op with deadlock avoidance. */
+		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
+		break;
+	case -EFSBADCRC:
+	case -EFSCORRUPTED:
+		/* Note the badness but don't abort. */
+		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+		*error = 0;
+		/* fall through */
+	default:
+		if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
+			trace_xfs_scrub_ifork_btree_op_error(sc, cur, level,
+					*error, __return_address);
+		else
+			trace_xfs_scrub_btree_op_error(sc, cur, level,
+					*error, __return_address);
+		break;
+	}
+	return false;
+}
+
+/* Check for btree corruption. */
+bool
+xfs_scrub_btree_check_ok(
+	struct xfs_scrub_context	*sc,
+	struct xfs_btree_cur		*cur,
+	int				level,
+	bool				fs_ok)
+{
+	if (fs_ok)
+		return fs_ok;
+
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+
+	if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
+		trace_xfs_scrub_ifork_btree_error(sc, cur, level,
+				__return_address);
+	else
+		trace_xfs_scrub_btree_error(sc, cur, level,
+				__return_address);
+	return fs_ok;
+}
+
+/*
+ * Visit all nodes and leaves of a btree.  Check that all pointers and
+ * records are in order, that the keys reflect the records, and use a callback
+ * so that the caller can verify individual records.  The callback is the same
+ * as the one for xfs_btree_query_range, so this function also
+ * returns XFS_BTREE_QUERY_RANGE_ABORT, zero, or a negative error code.
+ */
+int
+xfs_scrub_btree(
+	struct xfs_scrub_context	*sc,
+	struct xfs_btree_cur		*cur,
+	xfs_scrub_btree_rec_fn		scrub_fn,
+	struct xfs_owner_info		*oinfo,
+	void				*private)
+{
+	/* Stub: the btree walk itself is not implemented in this patch. */
+	return -EOPNOTSUPP;
+}
diff --git a/fs/xfs/scrub/btree.h b/fs/xfs/scrub/btree.h
new file mode 100644
index 0000000..133e33d
--- /dev/null
+++ b/fs/xfs/scrub/btree.h
@@ -0,0 +1,57 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef __XFS_SCRUB_BTREE_H__
+#define __XFS_SCRUB_BTREE_H__
+
+/* btree scrub */
+
+/* Check for btree operation errors. */
+bool xfs_scrub_btree_op_ok(struct xfs_scrub_context *sc,
+			   struct xfs_btree_cur *cur, int level, int *error);
+
+/* Check for btree corruption. */
+bool xfs_scrub_btree_check_ok(struct xfs_scrub_context *sc,
+			      struct xfs_btree_cur *cur, int level, bool fs_ok);
+
+struct xfs_scrub_btree;
+typedef int (*xfs_scrub_btree_rec_fn)(
+	struct xfs_scrub_btree	*bs,
+	union xfs_btree_rec	*rec);
+
+struct xfs_scrub_btree {
+	/* caller-provided scrub state */
+	struct xfs_scrub_context	*sc;
+	struct xfs_btree_cur		*cur;
+	xfs_scrub_btree_rec_fn		scrub_rec;
+	struct xfs_owner_info		*oinfo;
+	void				*private;
+
+	/* internal scrub state */
+	union xfs_btree_rec		lastrec;
+	bool				firstrec;
+	union xfs_btree_key		lastkey[XFS_BTREE_MAXLEVELS];
+	bool				firstkey[XFS_BTREE_MAXLEVELS];
+	struct list_head		to_check;
+};
+int xfs_scrub_btree(struct xfs_scrub_context *sc, struct xfs_btree_cur *cur,
+		    xfs_scrub_btree_rec_fn scrub_fn,
+		    struct xfs_owner_info *oinfo, void *private);
+
+#endif /* __XFS_SCRUB_BTREE_H__ */
diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c
index 88b5ccb..ee00115 100644
--- a/fs/xfs/scrub/trace.c
+++ b/fs/xfs/scrub/trace.c
@@ -30,10 +30,24 @@
 #include "xfs_inode.h"
 #include "xfs_btree.h"
 #include "xfs_trans.h"
+#include "xfs_bit.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 
+/* Figure out which block the btree cursor was pointing to. */
+static inline xfs_fsblock_t
+xfs_scrub_btree_cur_fsbno(
+	struct xfs_btree_cur		*cur,
+	int				level)
+{
+	if (level < cur->bc_nlevels && cur->bc_bufs[level])
+		return XFS_DADDR_TO_FSB(cur->bc_mp, cur->bc_bufs[level]->b_bn);
+	else if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
+		return XFS_INO_TO_FSB(cur->bc_mp, cur->bc_private.b.ip->i_ino);
+	return XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno, 0);
+}
+
 /*
  * We include this last to have the helpers above available for the trace
  * event implementations.
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 8d67a85..78f96b0 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -261,6 +261,168 @@ TRACE_EVENT(xfs_scrub_incomplete,
 		  __entry->ret_ip)
 );
 
+TRACE_EVENT(xfs_scrub_btree_op_error,
+	TP_PROTO(struct xfs_scrub_context *sc, struct xfs_btree_cur *cur,
+		 int level, int error, void *ret_ip),
+	TP_ARGS(sc, cur, level, error, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned int, type)
+		__field(xfs_btnum_t, btnum)
+		__field(int, level)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__field(int, ptr)
+		__field(int, error)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		xfs_fsblock_t fsbno = xfs_scrub_btree_cur_fsbno(cur, level);
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->type = sc->sm->sm_type;
+		__entry->btnum = cur->bc_btnum;
+		__entry->level = level;
+		__entry->agno = XFS_FSB_TO_AGNO(cur->bc_mp, fsbno);
+		__entry->bno = XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno);
+		__entry->ptr = cur->bc_ptrs[level];
+		__entry->error = error;
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d type %u btnum %d level %d ptr %d agno %u agbno %u error %d ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->type,
+		  __entry->btnum,
+		  __entry->level,
+		  __entry->ptr,
+		  __entry->agno,
+		  __entry->bno,
+		  __entry->error,
+		  __entry->ret_ip)
+);
+
+TRACE_EVENT(xfs_scrub_ifork_btree_op_error,
+	TP_PROTO(struct xfs_scrub_context *sc, struct xfs_btree_cur *cur,
+		 int level, int error, void *ret_ip),
+	TP_ARGS(sc, cur, level, error, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(int, whichfork)
+		__field(unsigned int, type)
+		__field(xfs_btnum_t, btnum)
+		__field(int, level)
+		__field(int, ptr)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__field(int, error)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		xfs_fsblock_t fsbno = xfs_scrub_btree_cur_fsbno(cur, level);
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->ino = sc->ip->i_ino;
+		__entry->whichfork = cur->bc_private.b.whichfork;
+		__entry->type = sc->sm->sm_type;
+		__entry->btnum = cur->bc_btnum;
+		__entry->level = level;
+		__entry->ptr = cur->bc_ptrs[level];
+		__entry->agno = XFS_FSB_TO_AGNO(cur->bc_mp, fsbno);
+		__entry->bno = XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno);
+		__entry->error = error;
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d ino %llu fork %d type %u btnum %d level %d ptr %d agno %u agbno %u error %d ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->whichfork,
+		  __entry->type,
+		  __entry->btnum,
+		  __entry->level,
+		  __entry->ptr,
+		  __entry->agno,
+		  __entry->bno,
+		  __entry->error,
+		  __entry->ret_ip)
+);
+
+TRACE_EVENT(xfs_scrub_btree_error,
+	TP_PROTO(struct xfs_scrub_context *sc, struct xfs_btree_cur *cur,
+		 int level, void *ret_ip),
+	TP_ARGS(sc, cur, level, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned int, type)
+		__field(xfs_btnum_t, btnum)
+		__field(int, level)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__field(int, ptr)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		xfs_fsblock_t fsbno = xfs_scrub_btree_cur_fsbno(cur, level);
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->type = sc->sm->sm_type;
+		__entry->btnum = cur->bc_btnum;
+		__entry->level = level;
+		__entry->agno = XFS_FSB_TO_AGNO(cur->bc_mp, fsbno);
+		__entry->bno = XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno);
+		__entry->ptr = cur->bc_ptrs[level];
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d type %u btnum %d level %d ptr %d agno %u agbno %u ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->type,
+		  __entry->btnum,
+		  __entry->level,
+		  __entry->ptr,
+		  __entry->agno,
+		  __entry->bno,
+		  __entry->ret_ip)
+);
+
+TRACE_EVENT(xfs_scrub_ifork_btree_error,
+	TP_PROTO(struct xfs_scrub_context *sc, struct xfs_btree_cur *cur,
+		 int level, void *ret_ip),
+	TP_ARGS(sc, cur, level, ret_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(int, whichfork)
+		__field(unsigned int, type)
+		__field(xfs_btnum_t, btnum)
+		__field(int, level)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__field(int, ptr)
+		__field(void *, ret_ip)
+	),
+	TP_fast_assign(
+		xfs_fsblock_t fsbno = xfs_scrub_btree_cur_fsbno(cur, level);
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->ino = sc->ip->i_ino;
+		__entry->whichfork = cur->bc_private.b.whichfork;
+		__entry->type = sc->sm->sm_type;
+		__entry->btnum = cur->bc_btnum;
+		__entry->level = level;
+		__entry->agno = XFS_FSB_TO_AGNO(cur->bc_mp, fsbno);
+		__entry->bno = XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno);
+		__entry->ptr = cur->bc_ptrs[level];
+		__entry->ret_ip = ret_ip;
+	),
+	TP_printk("dev %d:%d ino %llu fork %d type %u btnum %d level %d ptr %d agno %u agbno %u ret_ip %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->whichfork,
+		  __entry->type,
+		  __entry->btnum,
+		  __entry->level,
+		  __entry->ptr,
+		  __entry->agno,
+		  __entry->bno,
+		  __entry->ret_ip)
+);
+
 #endif /* _TRACE_XFS_SCRUB_TRACE_H */
 
 #undef TRACE_INCLUDE_PATH



* [PATCH 08/27] xfs: scrub the shape of a metadata btree
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (6 preceding siblings ...)
  2017-09-21  0:18 ` [PATCH 07/27] xfs: create helpers to scrub a metadata btree Darrick J. Wong
@ 2017-09-21  0:18 ` Darrick J. Wong
  2017-09-22 15:22   ` Brian Foster
  2017-09-21  0:18 ` [PATCH 09/27] xfs: scrub btree keys and records Darrick J. Wong
                   ` (19 subsequent siblings)
  27 siblings, 1 reply; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:18 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create a function that can check the shape of a btree, i.e. that each
block passes basic inspection and that all the pointers look ok.  In the
next patch we'll add the ability to check the actual keys and records
stored within the btree.  Also add some helper functions, used by
subsequent patches, so that we report detailed scrub errors in a uniform
manner in dmesg.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_btree.c |   16 +++
 fs/xfs/libxfs/xfs_btree.h |    7 +
 fs/xfs/scrub/btree.c      |  236 +++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.h     |   13 ++
 4 files changed, 268 insertions(+), 4 deletions(-)
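
For reviewers who want to convince themselves that the loop in
xfs_scrub_btree() terminates and visits everything once: below is a
minimal userspace sketch of the same walk.  It is not kernel code.
toy_block and toy_walk are made-up names, and the ptrs[]/blocks[] arrays
merely stand in for cur->bc_ptrs[] and cur->bc_bufs[].  As in the patch,
the per-level index is 1-based, so "ptr > numrecs" means the block is
exhausted and we pop back towards the root.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAXLEVELS	4
#define TOY_MAXRECS	4

/* Hypothetical in-memory block; level 0 is a leaf. */
struct toy_block {
	int			level;
	int			numrecs;
	int			recs[TOY_MAXRECS];	/* leaf payload */
	struct toy_block	*ptrs[TOY_MAXRECS];	/* node children */
};

/*
 * Visit every leaf record once, left to right, without recursion.
 * Returns the number of records visited and adds their values to *sum.
 */
static int
toy_walk(struct toy_block *root, int nlevels, long *sum)
{
	struct toy_block	*blocks[TOY_MAXLEVELS];
	int			ptrs[TOY_MAXLEVELS];
	int			level = nlevels - 1;
	int			visited = 0;

	/* Don't try to walk a tree with a height we can't handle. */
	assert(nlevels > 0 && nlevels <= TOY_MAXLEVELS);
	assert(root->level == level);

	/* Load the root. */
	blocks[level] = root;
	ptrs[level] = 1;

	while (level < nlevels) {
		struct toy_block *block = blocks[level];

		/* End of block, pop back towards the root. */
		if (ptrs[level] > block->numrecs) {
			if (level < nlevels - 1)
				ptrs[level + 1]++;
			level++;
			continue;
		}

		if (level == 0) {
			/* Leaf: consume one record and advance. */
			*sum += block->recs[ptrs[0] - 1];
			visited++;
			ptrs[0]++;
			continue;
		}

		/* Node: drill another level deeper. */
		blocks[level - 1] = block->ptrs[ptrs[level] - 1];
		level--;
		ptrs[level] = 1;
	}
	return visited;
}
```

The loop ends when popping past the root pushes level up to nlevels.
The sketch leaves out the pointer and sibling checks that the patch runs
before each descent.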


diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 5bfb882..c4d8b47 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -1027,7 +1027,7 @@ xfs_btree_setbuf(
 	}
 }
 
-STATIC int
+bool
 xfs_btree_ptr_is_null(
 	struct xfs_btree_cur	*cur,
 	union xfs_btree_ptr	*ptr)
@@ -1052,7 +1052,7 @@ xfs_btree_set_ptr_null(
 /*
  * Get/set/init sibling pointers
  */
-STATIC void
+void
 xfs_btree_get_sibling(
 	struct xfs_btree_cur	*cur,
 	struct xfs_btree_block	*block,
@@ -4914,3 +4914,15 @@ xfs_btree_count_blocks(
 	return xfs_btree_visit_blocks(cur, xfs_btree_count_blocks_helper,
 			blocks);
 }
+
+/* Compare two btree pointers. */
+int64_t
+xfs_btree_diff_two_ptrs(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_ptr	*a,
+	const union xfs_btree_ptr	*b)
+{
+	if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
+		return (int64_t)be64_to_cpu(a->l) - be64_to_cpu(b->l);
+	return (int64_t)be32_to_cpu(a->s) - be32_to_cpu(b->s);
+}
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index f2a88c3..0daf524 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -517,5 +517,12 @@ int xfs_btree_lookup_get_block(struct xfs_btree_cur *cur, int level,
 		union xfs_btree_ptr *pp, struct xfs_btree_block **blkp);
 struct xfs_btree_block *xfs_btree_get_block(struct xfs_btree_cur *cur,
 		int level, struct xfs_buf **bpp);
+bool xfs_btree_ptr_is_null(struct xfs_btree_cur *cur, union xfs_btree_ptr *ptr);
+int64_t xfs_btree_diff_two_ptrs(struct xfs_btree_cur *cur,
+				const union xfs_btree_ptr *a,
+				const union xfs_btree_ptr *b);
+void xfs_btree_get_sibling(struct xfs_btree_cur *cur,
+			   struct xfs_btree_block *block,
+			   union xfs_btree_ptr *ptr, int lr);
 
 #endif	/* __XFS_BTREE_H__ */
diff --git a/fs/xfs/scrub/btree.c b/fs/xfs/scrub/btree.c
index adf5d09..a9c2bf3 100644
--- a/fs/xfs/scrub/btree.c
+++ b/fs/xfs/scrub/btree.c
@@ -94,6 +94,152 @@ xfs_scrub_btree_check_ok(
 	return fs_ok;
 }
 
+/* Check a btree pointer. */
+STATIC int
+xfs_scrub_btree_ptr(
+	struct xfs_scrub_btree		*bs,
+	int				level,
+	union xfs_btree_ptr		*ptr)
+{
+	struct xfs_btree_cur		*cur = bs->cur;
+	xfs_daddr_t			daddr;
+	xfs_daddr_t			eofs;
+
+	if (!xfs_scrub_btree_check_ok(bs->sc, cur, level,
+			!xfs_btree_ptr_is_null(cur, ptr)))
+		goto corrupt;
+	if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
+		daddr = XFS_FSB_TO_DADDR(cur->bc_mp, be64_to_cpu(ptr->l));
+	} else {
+		if (!xfs_scrub_btree_check_ok(bs->sc, cur, level,
+				cur->bc_private.a.agno != NULLAGNUMBER))
+			goto corrupt;
+
+		daddr = XFS_AGB_TO_DADDR(cur->bc_mp, cur->bc_private.a.agno,
+				be32_to_cpu(ptr->s));
+	}
+	eofs = XFS_FSB_TO_BB(cur->bc_mp, cur->bc_mp->m_sb.sb_dblocks);
+	if (!xfs_scrub_btree_check_ok(bs->sc, cur, level,
+			daddr != 0 && daddr < eofs))
+		goto corrupt;
+
+	return 0;
+
+corrupt:
+	return -EFSCORRUPTED;
+}
+
+/* Check that a btree block's sibling matches what we expect. */
+STATIC int
+xfs_scrub_btree_block_check_sibling(
+	struct xfs_scrub_btree		*bs,
+	int				level,
+	int				direction,
+	union xfs_btree_ptr		*sibling)
+{
+	struct xfs_btree_cur		*cur = bs->cur;
+	struct xfs_btree_block		*pblock;
+	struct xfs_buf			*pbp;
+	struct xfs_btree_cur		*ncur;
+	union xfs_btree_ptr		*pp;
+	int				success;
+	int				error;
+
+	if (xfs_btree_ptr_is_null(cur, sibling))
+		return 0;
+
+	error = xfs_btree_dup_cursor(cur, &ncur);
+	if (error)
+		return error;
+
+	if (direction > 0)
+		error = xfs_btree_increment(ncur, level + 1, &success);
+	else
+		error = xfs_btree_decrement(ncur, level + 1, &success);
+	if (!xfs_scrub_btree_op_ok(bs->sc, cur, level + 1, &error) ||
+	    !xfs_scrub_btree_check_ok(bs->sc, cur, level + 1, success))
+		goto out;
+
+	pblock = xfs_btree_get_block(ncur, level + 1, &pbp);
+	pp = xfs_btree_ptr_addr(ncur, ncur->bc_ptrs[level + 1], pblock);
+	error = xfs_scrub_btree_ptr(bs, level + 1, pp);
+	if (error) {
+		/*
+		 * _scrub_btree_ptr already recorded a garbage sibling.
+		 * Don't let the EFSCORRUPTED bubble up and prevent more
+		 * scanning of the data structure.
+		 */
+		error = 0;
+		goto out;
+	}
+
+	xfs_scrub_btree_check_ok(bs->sc, cur, level,
+			!xfs_btree_diff_two_ptrs(cur, pp, sibling));
+out:
+	xfs_btree_del_cursor(ncur, XFS_BTREE_ERROR);
+	return error;
+}
+
+/* Check the siblings of a btree block. */
+STATIC int
+xfs_scrub_btree_block_check_siblings(
+	struct xfs_scrub_btree		*bs,
+	struct xfs_btree_block		*block)
+{
+	struct xfs_btree_cur		*cur = bs->cur;
+	union xfs_btree_ptr		leftsib;
+	union xfs_btree_ptr		rightsib;
+	int				level;
+	int				error = 0;
+
+	xfs_btree_get_sibling(cur, block, &leftsib, XFS_BB_LEFTSIB);
+	xfs_btree_get_sibling(cur, block, &rightsib, XFS_BB_RIGHTSIB);
+	level = xfs_btree_get_level(block);
+
+	/* Root block should never have siblings. */
+	if (level == cur->bc_nlevels - 1) {
+		xfs_scrub_btree_check_ok(bs->sc, cur, level,
+				xfs_btree_ptr_is_null(cur, &leftsib) &&
+				xfs_btree_ptr_is_null(cur, &rightsib));
+		goto out;
+	}
+
+	/* Does the left sibling match the parent level left block? */
+	error = xfs_scrub_btree_block_check_sibling(bs, level, -1, &leftsib);
+	if (error)
+		return error;
+
+	/* Does the right sibling match the parent level right block? */
+	error = xfs_scrub_btree_block_check_sibling(bs, level, 1, &rightsib);
+	if (error)
+		return error;
+out:
+	return error;
+}
+
+/* Grab and scrub a btree block. */
+STATIC int
+xfs_scrub_btree_block(
+	struct xfs_scrub_btree		*bs,
+	int				level,
+	union xfs_btree_ptr		*pp,
+	struct xfs_btree_block		**pblock,
+	struct xfs_buf			**pbp)
+{
+	int				error;
+
+	error = xfs_btree_lookup_get_block(bs->cur, level, pp, pblock);
+	if (error)
+		return error;
+
+	xfs_btree_get_block(bs->cur, level, pbp);
+	error = xfs_btree_check_block(bs->cur, *pblock, level, *pbp);
+	if (error)
+		return error;
+
+	return xfs_scrub_btree_block_check_siblings(bs, *pblock);
+}
+
 /*
  * Visit all nodes and leaves of a btree.  Check that all pointers and
  * records are in order, that the keys reflect the records, and use a callback
@@ -109,6 +255,92 @@ xfs_scrub_btree(
 	struct xfs_owner_info		*oinfo,
 	void				*private)
 {
-	xfs_scrub_btree_op_ok(sc, cur, 0, false);
-	return -EOPNOTSUPP;
+	struct xfs_scrub_btree		bs = {0};
+	union xfs_btree_ptr		ptr;
+	union xfs_btree_ptr		*pp;
+	struct xfs_btree_block		*block;
+	int				level;
+	struct xfs_buf			*bp;
+	int				i;
+	int				error = 0;
+
+	/* Initialize scrub state */
+	bs.cur = cur;
+	bs.scrub_rec = scrub_fn;
+	bs.oinfo = oinfo;
+	bs.firstrec = true;
+	bs.private = private;
+	bs.sc = sc;
+	for (i = 0; i < XFS_BTREE_MAXLEVELS; i++)
+		bs.firstkey[i] = true;
+	INIT_LIST_HEAD(&bs.to_check);
+
+	/* Don't try to check a tree with a height we can't handle. */
+	if (!xfs_scrub_btree_check_ok(sc, cur, 0, cur->bc_nlevels > 0 &&
+			cur->bc_nlevels <= XFS_BTREE_MAXLEVELS))
+		goto out;
+
+	/* Make sure the root isn't in the superblock. */
+	if (!(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)) {
+		cur->bc_ops->init_ptr_from_cur(cur, &ptr);
+		error = xfs_scrub_btree_ptr(&bs, cur->bc_nlevels, &ptr);
+		if (!xfs_scrub_btree_op_ok(sc, cur, cur->bc_nlevels - 1, &error))
+			goto out;
+	}
+
+	/* Load the root of the btree. */
+	level = cur->bc_nlevels - 1;
+	cur->bc_ops->init_ptr_from_cur(cur, &ptr);
+	error = xfs_scrub_btree_block(&bs, level, &ptr, &block, &bp);
+	if (!xfs_scrub_btree_op_ok(sc, cur, cur->bc_nlevels - 1, &error))
+		goto out;
+
+	cur->bc_ptrs[level] = 1;
+
+	while (level < cur->bc_nlevels) {
+		block = xfs_btree_get_block(cur, level, &bp);
+
+		if (level == 0) {
+			/* End of leaf, pop back towards the root. */
+			if (cur->bc_ptrs[level] >
+			    be16_to_cpu(block->bb_numrecs)) {
+				if (level < cur->bc_nlevels - 1)
+					cur->bc_ptrs[level + 1]++;
+				level++;
+				continue;
+			}
+
+			if (xfs_scrub_should_terminate(&error))
+				break;
+
+			cur->bc_ptrs[level]++;
+			continue;
+		}
+
+		/* End of node, pop back towards the root. */
+		if (cur->bc_ptrs[level] > be16_to_cpu(block->bb_numrecs)) {
+			if (level < cur->bc_nlevels - 1)
+				cur->bc_ptrs[level + 1]++;
+			level++;
+			continue;
+		}
+
+		/* Drill another level deeper. */
+		pp = xfs_btree_ptr_addr(cur, cur->bc_ptrs[level], block);
+		error = xfs_scrub_btree_ptr(&bs, level, pp);
+		if (error) {
+			error = 0;
+			cur->bc_ptrs[level]++;
+			continue;
+		}
+		level--;
+		error = xfs_scrub_btree_block(&bs, level, pp, &block, &bp);
+		if (!xfs_scrub_btree_op_ok(sc, cur, level, &error))
+			goto out;
+
+		cur->bc_ptrs[level] = 1;
+	}
+
+out:
+	return error;
 }
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index e1bb14b..9920488 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -20,6 +20,19 @@
 #ifndef __XFS_SCRUB_COMMON_H__
 #define __XFS_SCRUB_COMMON_H__
 
+/* Should we end the scrub early? */
+static inline bool
+xfs_scrub_should_terminate(
+	int		*error)
+{
+	if (fatal_signal_pending(current)) {
+		if (*error == 0)
+			*error = -EAGAIN;
+		return true;
+	}
+	return false;
+}
+
 /*
  * Grab a transaction.  If we're going to repair something, we need to
  * ensure there's enough reservation to make all the changes.  If not,



* [PATCH 09/27] xfs: scrub btree keys and records
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (7 preceding siblings ...)
  2017-09-21  0:18 ` [PATCH 08/27] xfs: scrub the shape of " Darrick J. Wong
@ 2017-09-21  0:18 ` Darrick J. Wong
  2017-09-21  0:18 ` [PATCH 10/27] xfs: create helpers to scan an allocation group Darrick J. Wong
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:18 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Add to the btree scrubber the ability to check that the keys and
records are in the right order, and call out to our record iterator to
do the actual checking of the records.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/btree.c |  115 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/trace.h |   44 +++++++++++++++++++
 2 files changed, 159 insertions(+)
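
To illustrate the two kinds of checks added here (ordering against the
previous record, then bounds against the parent's low and high keys),
here is a small userspace sketch.  It is not the kernel code.  toy_rec
and toy_check_leaf are hypothetical, and the "in order" rule used below
(the previous extent must end at or before the next one starts) is an
assumption made for the example; the real rule comes from each btree's
->recs_inorder and ->diff_two_keys.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: an extent covering [start, start + len). */
struct toy_rec {
	uint32_t	start;
	uint32_t	len;
};

/*
 * Count the problems in one leaf: records that are out of order, and
 * records that stray outside the parent's [parent_low, parent_high] keys.
 */
static int
toy_check_leaf(const struct toy_rec *recs, size_t nr,
	       uint32_t parent_low, uint32_t parent_high)
{
	uint64_t	last_end = 0;
	bool		firstrec = true;
	int		bad = 0;
	size_t		i;

	for (i = 0; i < nr; i++) {
		uint64_t	low = recs[i].start;
		uint64_t	end = low + recs[i].len;

		assert(recs[i].len > 0);

		/* If this isn't the first record, are they in order? */
		if (!firstrec && last_end > low)
			bad++;
		firstrec = false;
		last_end = end;

		/* Is this at least as large as the parent low key? */
		if (low < parent_low)
			bad++;

		/* Is this no larger than the parent high key? */
		if (end - 1 > parent_high)
			bad++;
	}
	return bad;
}
```

Like xfs_scrub_btree_rec(), the sketch notes a problem and keeps going
rather than bailing out on the first bad record, so that one scrub pass
reports as much as it can.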


diff --git a/fs/xfs/scrub/btree.c b/fs/xfs/scrub/btree.c
index a9c2bf3..0ae56f5 100644
--- a/fs/xfs/scrub/btree.c
+++ b/fs/xfs/scrub/btree.c
@@ -94,6 +94,104 @@ xfs_scrub_btree_check_ok(
 	return fs_ok;
 }
 
+/*
+ * Make sure this record is in order and doesn't stray outside of the parent
+ * keys.
+ */
+STATIC int
+xfs_scrub_btree_rec(
+	struct xfs_scrub_btree	*bs)
+{
+	struct xfs_btree_cur	*cur = bs->cur;
+	union xfs_btree_rec	*rec;
+	union xfs_btree_key	key;
+	union xfs_btree_key	hkey;
+	union xfs_btree_key	*keyp;
+	struct xfs_btree_block	*block;
+	struct xfs_btree_block	*keyblock;
+	struct xfs_buf		*bp;
+
+	block = xfs_btree_get_block(cur, 0, &bp);
+	rec = xfs_btree_rec_addr(cur, cur->bc_ptrs[0], block);
+
+	trace_xfs_scrub_btree_rec(bs->sc, cur, 0);
+
+	/* If this isn't the first record, are they in order? */
+	xfs_scrub_btree_check_ok(bs->sc, cur, 0, bs->firstrec ||
+			cur->bc_ops->recs_inorder(cur, &bs->lastrec, rec));
+	bs->firstrec = false;
+	memcpy(&bs->lastrec, rec, cur->bc_ops->rec_len);
+
+	if (cur->bc_nlevels == 1)
+		return 0;
+
+	/* Is this at least as large as the parent low key? */
+	cur->bc_ops->init_key_from_rec(&key, rec);
+	keyblock = xfs_btree_get_block(cur, 1, &bp);
+	keyp = xfs_btree_key_addr(cur, cur->bc_ptrs[1], keyblock);
+	xfs_scrub_btree_check_ok(bs->sc, cur, 1,
+			cur->bc_ops->diff_two_keys(cur, &key, keyp) >= 0);
+
+	if (!(cur->bc_flags & XFS_BTREE_OVERLAPPING))
+		return 0;
+
+	/* Is this no larger than the parent high key? */
+	cur->bc_ops->init_high_key_from_rec(&hkey, rec);
+	keyp = xfs_btree_high_key_addr(cur, cur->bc_ptrs[1], keyblock);
+	xfs_scrub_btree_check_ok(bs->sc, cur, 1,
+			cur->bc_ops->diff_two_keys(cur, keyp, &hkey) >= 0);
+
+	return 0;
+}
+
+/*
+ * Make sure this key is in order and doesn't stray outside of the parent
+ * keys.
+ */
+STATIC int
+xfs_scrub_btree_key(
+	struct xfs_scrub_btree	*bs,
+	int			level)
+{
+	struct xfs_btree_cur	*cur = bs->cur;
+	union xfs_btree_key	*key;
+	union xfs_btree_key	*keyp;
+	struct xfs_btree_block	*block;
+	struct xfs_btree_block	*keyblock;
+	struct xfs_buf		*bp;
+
+	block = xfs_btree_get_block(cur, level, &bp);
+	key = xfs_btree_key_addr(cur, cur->bc_ptrs[level], block);
+
+	trace_xfs_scrub_btree_key(bs->sc, cur, level);
+
+	/* If this isn't the first key, are they in order? */
+	xfs_scrub_btree_check_ok(bs->sc, cur, level, bs->firstkey[level] ||
+			cur->bc_ops->keys_inorder(cur, &bs->lastkey[level], key));
+	bs->firstkey[level] = false;
+	memcpy(&bs->lastkey[level], key, cur->bc_ops->key_len);
+
+	if (level + 1 >= cur->bc_nlevels)
+		return 0;
+
+	/* Is this at least as large as the parent low key? */
+	keyblock = xfs_btree_get_block(cur, level + 1, &bp);
+	keyp = xfs_btree_key_addr(cur, cur->bc_ptrs[level + 1], keyblock);
+	xfs_scrub_btree_check_ok(bs->sc, cur, level,
+			cur->bc_ops->diff_two_keys(cur, key, keyp) >= 0);
+
+	if (!(cur->bc_flags & XFS_BTREE_OVERLAPPING))
+		return 0;
+
+	/* Is this no larger than the parent high key? */
+	key = xfs_btree_high_key_addr(cur, cur->bc_ptrs[level], block);
+	keyp = xfs_btree_high_key_addr(cur, cur->bc_ptrs[level + 1], keyblock);
+	xfs_scrub_btree_check_ok(bs->sc, cur, level,
+			cur->bc_ops->diff_two_keys(cur, keyp, key) >= 0);
+
+	return 0;
+}
+
 /* Check a btree pointer. */
 static int
 xfs_scrub_btree_ptr(
@@ -258,6 +356,7 @@ xfs_scrub_btree(
 	struct xfs_scrub_btree		bs = {0};
 	union xfs_btree_ptr		ptr;
 	union xfs_btree_ptr		*pp;
+	union xfs_btree_rec		*recp;
 	struct xfs_btree_block		*block;
 	int				level;
 	struct xfs_buf			*bp;
@@ -310,6 +409,17 @@ xfs_scrub_btree(
 				continue;
 			}
 
+			/* Records in order for scrub? */
+			error = xfs_scrub_btree_rec(&bs);
+			if (error)
+				goto out;
+
+			/* Call out to the record checker. */
+			recp = xfs_btree_rec_addr(cur, cur->bc_ptrs[0], block);
+			error = bs.scrub_rec(&bs, recp);
+			if (error < 0 ||
+			    error == XFS_BTREE_QUERY_RANGE_ABORT)
+				break;
 			if (xfs_scrub_should_terminate(&error))
 				break;
 
@@ -325,6 +435,11 @@ xfs_scrub_btree(
 			continue;
 		}
 
+		/* Keys in order for scrub? */
+		error = xfs_scrub_btree_key(&bs, level);
+		if (error)
+			goto out;
+
 		/* Drill another level deeper. */
 		pp = xfs_btree_ptr_addr(cur, cur->bc_ptrs[level], block);
 		error = xfs_scrub_btree_ptr(&bs, level, pp);
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 78f96b0..a78c8d1 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -423,6 +423,50 @@ TRACE_EVENT(xfs_scrub_ifork_btree_error,
 		  __entry->ret_ip)
 );
 
+DECLARE_EVENT_CLASS(xfs_scrub_sbtree_class,
+	TP_PROTO(struct xfs_scrub_context *sc, struct xfs_btree_cur *cur,
+		 int level),
+	TP_ARGS(sc, cur, level),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned int, type)
+		__field(xfs_btnum_t, btnum)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, bno)
+		__field(int, level)
+		__field(int, nlevels)
+		__field(int, ptr)
+	),
+	TP_fast_assign(
+		xfs_fsblock_t fsbno = xfs_scrub_btree_cur_fsbno(cur, level);
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->type = sc->sm->sm_type;
+		__entry->btnum = cur->bc_btnum;
+		__entry->agno = XFS_FSB_TO_AGNO(cur->bc_mp, fsbno);
+		__entry->bno = XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno);
+		__entry->level = level;
+		__entry->nlevels = cur->bc_nlevels;
+		__entry->ptr = cur->bc_ptrs[level];
+	),
+	TP_printk("dev %d:%d type %u btnum %d agno %u agbno %u level %d nlevels %d ptr %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->type,
+		  __entry->btnum,
+		  __entry->agno,
+		  __entry->bno,
+		  __entry->level,
+		  __entry->nlevels,
+		  __entry->ptr)
+)
+#define DEFINE_SCRUB_SBTREE_EVENT(name) \
+DEFINE_EVENT(xfs_scrub_sbtree_class, name, \
+	TP_PROTO(struct xfs_scrub_context *sc, struct xfs_btree_cur *cur, \
+		 int level), \
+	TP_ARGS(sc, cur, level))
+
+DEFINE_SCRUB_SBTREE_EVENT(xfs_scrub_btree_rec);
+DEFINE_SCRUB_SBTREE_EVENT(xfs_scrub_btree_key);
+
 #endif /* _TRACE_XFS_SCRUB_TRACE_H */
 
 #undef TRACE_INCLUDE_PATH



* [PATCH 10/27] xfs: create helpers to scan an allocation group
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (8 preceding siblings ...)
  2017-09-21  0:18 ` [PATCH 09/27] xfs: scrub btree keys and records Darrick J. Wong
@ 2017-09-21  0:18 ` Darrick J. Wong
  2017-09-21  0:18 ` [PATCH 11/27] xfs: scrub the backup superblocks Darrick J. Wong
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:18 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Add some helpers to enable us to lock an AG's headers, create btree
cursors for all btrees in that allocation group, and clean up
afterwards.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/common.c |  173 +++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.h |   10 +++
 fs/xfs/scrub/scrub.c  |    4 +
 fs/xfs/scrub/scrub.h  |   21 ++++++
 4 files changed, 208 insertions(+)
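
The lifetime rule these helpers rely on is that xfs_scrub_ag_init() may
fail halfway and leave cleanup to the caller, because
xfs_scrub_teardown() always calls xfs_scrub_ag_free(), which copes with
NULL members.  A minimal userspace sketch of that pattern follows.  All
names (toy_ag, toy_grab, toy_ag_init, toy_ag_free) are hypothetical, and
the fail_at parameter exists only to simulate a failure at a given step.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define TOY_NULLAG	(-1)

/* Hypothetical stand-ins for the AG header buffers and one cursor. */
struct toy_ag {
	int	agno;
	int	*agi;
	int	*agf;
	int	*cur;
};

/* Pretend to read a header or set up a cursor; fail at step fail_at. */
static int *
toy_grab(int step, int fail_at)
{
	if (step == fail_at)
		return NULL;
	return calloc(1, sizeof(int));
}

/* Release whatever was set up; safe on a partial or empty context. */
static void
toy_ag_free(struct toy_ag *sa)
{
	assert(sa != NULL);

	/* Cursor first, then the headers, the reverse of setup order. */
	free(sa->cur);
	sa->cur = NULL;
	free(sa->agf);
	sa->agf = NULL;
	free(sa->agi);
	sa->agi = NULL;
	sa->agno = TOY_NULLAG;
}

/* Grab the AGI, then the AGF, then a cursor; no unwinding on error. */
static int
toy_ag_init(struct toy_ag *sa, int agno, int fail_at)
{
	sa->agno = agno;

	sa->agi = toy_grab(1, fail_at);
	if (!sa->agi)
		return -ENOMEM;

	sa->agf = toy_grab(2, fail_at);
	if (!sa->agf)
		return -ENOMEM;

	sa->cur = toy_grab(3, fail_at);
	if (!sa->cur)
		return -ENOMEM;

	return 0;
}
```

Since the free routine resets every member, calling it twice, or on a
context that was never initialized beyond agno = TOY_NULLAG, is harmless.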


diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index cf3f1365..1e80f23 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -44,6 +44,7 @@
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/trace.h"
+#include "scrub/btree.h"
 
 /* Common code for the metadata scrubbers. */
 
@@ -290,6 +291,178 @@ xfs_scrub_check_thoroughness(
 	return fs_ok;
 }
 
+/*
+ * AG scrubbing
+ *
+ * These helpers facilitate locking an allocation group's header
+ * buffers, setting up cursors for all btrees that are present, and
+ * cleaning everything up once we're through.
+ */
+
+/* Grab all the headers for an AG. */
+int
+xfs_scrub_ag_read_headers(
+	struct xfs_scrub_context	*sc,
+	xfs_agnumber_t			agno,
+	struct xfs_buf			**agi,
+	struct xfs_buf			**agf,
+	struct xfs_buf			**agfl)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	error = xfs_ialloc_read_agi(mp, sc->tp, agno, agi);
+	if (error)
+		goto out;
+
+	error = xfs_alloc_read_agf(mp, sc->tp, agno, 0, agf);
+	if (error)
+		goto out;
+	if (!*agf) {
+		error = -ENOMEM;
+		goto out;
+	}
+
+	error = xfs_alloc_read_agfl(mp, sc->tp, agno, agfl);
+	if (error)
+		goto out;
+
+out:
+	return error;
+}
+
+/* Release all the AG btree cursors. */
+void
+xfs_scrub_ag_btcur_free(
+	struct xfs_scrub_ag		*sa)
+{
+	if (sa->refc_cur)
+		xfs_btree_del_cursor(sa->refc_cur, XFS_BTREE_ERROR);
+	if (sa->rmap_cur)
+		xfs_btree_del_cursor(sa->rmap_cur, XFS_BTREE_ERROR);
+	if (sa->fino_cur)
+		xfs_btree_del_cursor(sa->fino_cur, XFS_BTREE_ERROR);
+	if (sa->ino_cur)
+		xfs_btree_del_cursor(sa->ino_cur, XFS_BTREE_ERROR);
+	if (sa->cnt_cur)
+		xfs_btree_del_cursor(sa->cnt_cur, XFS_BTREE_ERROR);
+	if (sa->bno_cur)
+		xfs_btree_del_cursor(sa->bno_cur, XFS_BTREE_ERROR);
+
+	sa->refc_cur = NULL;
+	sa->rmap_cur = NULL;
+	sa->fino_cur = NULL;
+	sa->ino_cur = NULL;
+	sa->bno_cur = NULL;
+	sa->cnt_cur = NULL;
+}
+
+/* Initialize all the btree cursors for an AG. */
+int
+xfs_scrub_ag_btcur_init(
+	struct xfs_scrub_context	*sc,
+	struct xfs_scrub_ag		*sa)
+{
+	struct xfs_mount		*mp = sc->mp;
+	xfs_agnumber_t			agno = sa->agno;
+
+	if (sa->agf_bp) {
+		/* Set up a bnobt cursor for cross-referencing. */
+		sa->bno_cur = xfs_allocbt_init_cursor(mp, sc->tp, sa->agf_bp,
+				agno, XFS_BTNUM_BNO);
+		if (!sa->bno_cur)
+			goto err;
+
+		/* Set up a cntbt cursor for cross-referencing. */
+		sa->cnt_cur = xfs_allocbt_init_cursor(mp, sc->tp, sa->agf_bp,
+				agno, XFS_BTNUM_CNT);
+		if (!sa->cnt_cur)
+			goto err;
+	}
+
+	/* Set up an inobt cursor for cross-referencing. */
+	if (sa->agi_bp) {
+		sa->ino_cur = xfs_inobt_init_cursor(mp, sc->tp, sa->agi_bp,
+					agno, XFS_BTNUM_INO);
+		if (!sa->ino_cur)
+			goto err;
+	}
+
+	/* Set up a finobt cursor for cross-referencing. */
+	if (sa->agi_bp && xfs_sb_version_hasfinobt(&mp->m_sb)) {
+		sa->fino_cur = xfs_inobt_init_cursor(mp, sc->tp, sa->agi_bp,
+				agno, XFS_BTNUM_FINO);
+		if (!sa->fino_cur)
+			goto err;
+	}
+
+	/* Set up a rmapbt cursor for cross-referencing. */
+	if (sa->agf_bp && xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		sa->rmap_cur = xfs_rmapbt_init_cursor(mp, sc->tp, sa->agf_bp,
+				agno);
+		if (!sa->rmap_cur)
+			goto err;
+	}
+
+	/* Set up a refcountbt cursor for cross-referencing. */
+	if (sa->agf_bp && xfs_sb_version_hasreflink(&mp->m_sb)) {
+		sa->refc_cur = xfs_refcountbt_init_cursor(mp, sc->tp,
+				sa->agf_bp, agno, NULL);
+		if (!sa->refc_cur)
+			goto err;
+	}
+
+	return 0;
+err:
+	return -ENOMEM;
+}
+
+/* Release the AG header context and btree cursors. */
+void
+xfs_scrub_ag_free(
+	struct xfs_scrub_context	*sc,
+	struct xfs_scrub_ag		*sa)
+{
+	xfs_scrub_ag_btcur_free(sa);
+	if (sa->agfl_bp) {
+		xfs_trans_brelse(sc->tp, sa->agfl_bp);
+		sa->agfl_bp = NULL;
+	}
+	if (sa->agf_bp) {
+		xfs_trans_brelse(sc->tp, sa->agf_bp);
+		sa->agf_bp = NULL;
+	}
+	if (sa->agi_bp) {
+		xfs_trans_brelse(sc->tp, sa->agi_bp);
+		sa->agi_bp = NULL;
+	}
+	sa->agno = NULLAGNUMBER;
+}
+
+/*
+ * For scrub, grab the AGI and the AGF headers, in that order.  Locking
+ * order requires us to get the AGI before the AGF.  We use the
+ * transaction to avoid deadlocking on crosslinked metadata buffers;
+ * either the caller passes one in (bmap scrub) or we have to create a
+ * transaction ourselves.
+ */
+int
+xfs_scrub_ag_init(
+	struct xfs_scrub_context	*sc,
+	xfs_agnumber_t			agno,
+	struct xfs_scrub_ag		*sa)
+{
+	int				error;
+
+	sa->agno = agno;
+	error = xfs_scrub_ag_read_headers(sc, agno, &sa->agi_bp,
+			&sa->agf_bp, &sa->agfl_bp);
+	if (error)
+		return error;
+
+	return xfs_scrub_ag_btcur_init(sc, sa);
+}
+
 /* Per-scrubber setup functions */
 
 /* Set us up with a transaction and an empty context. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 9920488..105b7ad 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -93,4 +93,14 @@ bool xfs_scrub_check_thoroughness(struct xfs_scrub_context *sc, bool fs_ok);
 /* Setup functions */
 int xfs_scrub_setup_fs(struct xfs_scrub_context *sc, struct xfs_inode *ip);
 
+void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
+int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
+		      struct xfs_scrub_ag *sa);
+int xfs_scrub_ag_read_headers(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
+			      struct xfs_buf **agi, struct xfs_buf **agf,
+			      struct xfs_buf **agfl);
+void xfs_scrub_ag_btcur_free(struct xfs_scrub_ag *sa);
+int xfs_scrub_ag_btcur_init(struct xfs_scrub_context *sc,
+			    struct xfs_scrub_ag *sa);
+
 #endif	/* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 7936a23..bfc53b6 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -44,6 +44,8 @@
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/trace.h"
+#include "scrub/scrub.h"
+#include "scrub/btree.h"
 
 /*
  * Online Scrub and Repair
@@ -145,6 +147,7 @@ xfs_scrub_teardown(
 	struct xfs_scrub_context	*sc,
 	int				error)
 {
+	xfs_scrub_ag_free(sc, &sc->sa);
 	if (sc->tp) {
 		xfs_trans_cancel(sc->tp);
 		sc->tp = NULL;
@@ -225,6 +228,7 @@ xfs_scrub_metadata(
 	sc.sm = sm;
 	sc.ops = ops;
 	sc.try_harder = try_harder;
+	sc.sa.agno = NULLAGNUMBER;
 	error = sc.ops->setup(&sc, ip);
 	if (error)
 		goto out_teardown;
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 2528039..291444f 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -29,6 +29,24 @@ struct xfs_scrub_meta_ops {
 	bool		(*has)(struct xfs_sb *);
 };
 
+/* Buffer pointers and btree cursors for an entire AG. */
+struct xfs_scrub_ag {
+	xfs_agnumber_t			agno;
+
+	/* AG btree roots */
+	struct xfs_buf			*agf_bp;
+	struct xfs_buf			*agfl_bp;
+	struct xfs_buf			*agi_bp;
+
+	/* AG btrees */
+	struct xfs_btree_cur		*bno_cur;
+	struct xfs_btree_cur		*cnt_cur;
+	struct xfs_btree_cur		*ino_cur;
+	struct xfs_btree_cur		*fino_cur;
+	struct xfs_btree_cur		*rmap_cur;
+	struct xfs_btree_cur		*refc_cur;
+};
+
 struct xfs_scrub_context {
 	/* General scrub state. */
 	struct xfs_mount		*mp;
@@ -37,6 +55,9 @@ struct xfs_scrub_context {
 	struct xfs_trans		*tp;
 	struct xfs_inode		*ip;
 	bool				try_harder;
+
+	/* State tracking for single-AG operations. */
+	struct xfs_scrub_ag		sa;
 };
 
 /* Metadata scrubbers */



* [PATCH 11/27] xfs: scrub the backup superblocks
From: Darrick J. Wong @ 2017-09-21  0:18 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Ensure that the geometry presented in the backup superblocks matches
the primary superblock so that repair can recover the filesystem if
that primary gets corrupted.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile         |    1 
 fs/xfs/libxfs/xfs_fs.h  |    3 
 fs/xfs/scrub/agheader.c |  326 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.h   |    2 
 fs/xfs/scrub/scrub.c    |    4 +
 fs/xfs/scrub/scrub.h    |    1 
 6 files changed, 336 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/agheader.c
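
Note: the split between "check" and "preen" comparisons described in the
commit message can be sketched in userspace roughly as follows.  This is
only an illustration under stated assumptions: struct demo_sb, the DEMO_*
flags and the DEMO_VERSION_FIXED_BITS mask are hypothetical stand-ins and
are not the kernel's xfs_dsb layout or its XFS_SB_VERSION_* masks.  Fields
that must match the primary raise a corruption flag; fields that may
legitimately lag behind in a backup only raise a preen flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for the superblock geometry. */
struct demo_sb {
	uint32_t	blocksize;	/* must match the primary */
	uint32_t	agcount;	/* must match the primary */
	uint64_t	rootino;	/* may lag behind in a backup */
	uint16_t	versionnum;	/* mix of fixed and mutable bits */
};

#define DEMO_CORRUPT		(1u << 0)	/* a fixed field differs */
#define DEMO_PREEN		(1u << 1)	/* a mutable field differs */

/* Hypothetical mask of version bits that are fixed at mkfs time. */
#define DEMO_VERSION_FIXED_BITS	0x000fu

/* Compare a backup against the primary and classify any mismatches. */
static unsigned int
demo_compare_sb(
	const struct demo_sb	*primary,
	const struct demo_sb	*backup)
{
	unsigned int		fixed = DEMO_VERSION_FIXED_BITS;
	unsigned int		mutable_bits = 0xffffu & ~fixed;
	unsigned int		flags = 0;

	assert(primary && backup);

	/* Geometry fields: any difference is treated as corruption. */
	if (backup->blocksize != primary->blocksize)
		flags |= DEMO_CORRUPT;
	if (backup->agcount != primary->agcount)
		flags |= DEMO_CORRUPT;

	/* Fields updated after mkfs: a difference only suggests preening. */
	if (backup->rootino != primary->rootino)
		flags |= DEMO_PREEN;

	/* Masked compare: look only at the bits that belong to each class. */
	if ((backup->versionnum & fixed) != (primary->versionnum & fixed))
		flags |= DEMO_CORRUPT;
	if ((backup->versionnum & mutable_bits) !=
	    (primary->versionnum & mutable_bits))
		flags |= DEMO_PREEN;

	return flags;
}
```

A caller would hand in the in-memory primary and one decoded backup and
act on the returned flags; the real scrubber instead compares on-disk
big-endian fields against cpu_to_be*() conversions of the in-core values.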


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 5888b9f..e92d04d 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -146,6 +146,7 @@ ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
 
 xfs-y				+= $(addprefix scrub/, \
 				   trace.o \
+				   agheader.o \
 				   btree.o \
 				   common.o \
 				   scrub.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 5105bad..b98bba2 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -484,9 +484,10 @@ struct xfs_scrub_metadata {
 
 /* Scrub subcommands. */
 #define XFS_SCRUB_TYPE_TEST	0	/* presence test ioctl */
+#define XFS_SCRUB_TYPE_SB	1	/* superblock */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	1
+#define XFS_SCRUB_TYPE_NR	2
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
new file mode 100644
index 0000000..c58a5ef
--- /dev/null
+++ b/fs/xfs/scrub/agheader.c
@@ -0,0 +1,326 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+
+/* Set us up to check an AG header. */
+int
+xfs_scrub_setup_ag_header(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	struct xfs_mount		*mp = sc->mp;
+
+	if (sc->sm->sm_agno >= mp->m_sb.sb_agcount ||
+	    sc->sm->sm_ino || sc->sm->sm_gen)
+		return -EINVAL;
+	return xfs_scrub_setup_fs(sc, ip);
+}
+
+/* Superblock */
+
+/* Scrub the filesystem superblock. */
+int
+xfs_scrub_superblock(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*bp;
+	struct xfs_dsb			*sb;
+	xfs_agnumber_t			agno;
+	uint32_t			v2_ok;
+	__be32				features_mask;
+	int				error;
+	__be16				vernum_mask;
+
+	agno = sc->sm->sm_agno;
+	if (agno == 0)
+		return 0;
+
+	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+		  XFS_AGB_TO_DADDR(mp, agno, XFS_SB_BLOCK(mp)),
+		  XFS_FSS_TO_BB(mp, 1), 0, &bp, &xfs_sb_buf_ops);
+	if (!xfs_scrub_op_ok(sc, agno, XFS_SB_BLOCK(mp), &error))
+		return error;
+
+	sb = XFS_BUF_TO_SBP(bp);
+
+	/*
+	 * Verify the geometries match.  Fields that are permanently
+	 * set by mkfs are checked; fields that can be updated later
+	 * (and are not propagated to backup superblocks) are preen
+	 * checked.
+	 */
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_blocksize == cpu_to_be32(mp->m_sb.sb_blocksize));
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_dblocks == cpu_to_be64(mp->m_sb.sb_dblocks));
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_rblocks == cpu_to_be64(mp->m_sb.sb_rblocks));
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_rextents == cpu_to_be64(mp->m_sb.sb_rextents));
+
+	xfs_scrub_block_preen_ok(sc, bp,
+			uuid_equal(&sb->sb_uuid, &mp->m_sb.sb_uuid));
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_logstart == cpu_to_be64(mp->m_sb.sb_logstart));
+
+	xfs_scrub_block_preen_ok(sc, bp,
+			sb->sb_rootino == cpu_to_be64(mp->m_sb.sb_rootino));
+
+	xfs_scrub_block_preen_ok(sc, bp,
+			sb->sb_rbmino == cpu_to_be64(mp->m_sb.sb_rbmino));
+
+	xfs_scrub_block_preen_ok(sc, bp,
+			sb->sb_rsumino == cpu_to_be64(mp->m_sb.sb_rsumino));
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_rextsize == cpu_to_be32(mp->m_sb.sb_rextsize));
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_agblocks == cpu_to_be32(mp->m_sb.sb_agblocks));
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_agcount == cpu_to_be32(mp->m_sb.sb_agcount));
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_rbmblocks == cpu_to_be32(mp->m_sb.sb_rbmblocks));
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_logblocks == cpu_to_be32(mp->m_sb.sb_logblocks));
+
+	/* Check sb_versionnum bits that are set at mkfs time. */
+	vernum_mask = cpu_to_be16(~XFS_SB_VERSION_OKBITS |
+				  XFS_SB_VERSION_NUMBITS |
+				  XFS_SB_VERSION_ALIGNBIT |
+				  XFS_SB_VERSION_DALIGNBIT |
+				  XFS_SB_VERSION_SHAREDBIT |
+				  XFS_SB_VERSION_LOGV2BIT |
+				  XFS_SB_VERSION_SECTORBIT |
+				  XFS_SB_VERSION_EXTFLGBIT |
+				  XFS_SB_VERSION_DIRV2BIT);
+	xfs_scrub_block_check_ok(sc, bp,
+			(sb->sb_versionnum & vernum_mask) ==
+			(cpu_to_be16(mp->m_sb.sb_versionnum) & vernum_mask));
+
+	/* Check sb_versionnum bits that can be set after mkfs time. */
+	vernum_mask = cpu_to_be16(XFS_SB_VERSION_ATTRBIT |
+				  XFS_SB_VERSION_NLINKBIT |
+				  XFS_SB_VERSION_QUOTABIT);
+	xfs_scrub_block_preen_ok(sc, bp,
+			(sb->sb_versionnum & vernum_mask) ==
+			(cpu_to_be16(mp->m_sb.sb_versionnum) & vernum_mask));
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_sectsize == cpu_to_be16(mp->m_sb.sb_sectsize));
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_inodesize == cpu_to_be16(mp->m_sb.sb_inodesize));
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_inopblock == cpu_to_be16(mp->m_sb.sb_inopblock));
+
+	xfs_scrub_block_preen_ok(sc, bp,
+			!memcmp(sb->sb_fname, mp->m_sb.sb_fname,
+				sizeof(sb->sb_fname)));
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_blocklog == mp->m_sb.sb_blocklog);
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_sectlog == mp->m_sb.sb_sectlog);
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_inodelog == mp->m_sb.sb_inodelog);
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_inopblog == mp->m_sb.sb_inopblog);
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_agblklog == mp->m_sb.sb_agblklog);
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_rextslog == mp->m_sb.sb_rextslog);
+
+	xfs_scrub_block_preen_ok(sc, bp,
+			sb->sb_imax_pct == mp->m_sb.sb_imax_pct);
+
+	/*
+	 * Skip the summary counters since we track them in memory anyway.
+	 * sb_icount, sb_ifree, sb_fdblocks, sb_frextents
+	 */
+
+	xfs_scrub_block_preen_ok(sc, bp,
+			sb->sb_uquotino == cpu_to_be64(mp->m_sb.sb_uquotino));
+
+	xfs_scrub_block_preen_ok(sc, bp,
+			sb->sb_gquotino == cpu_to_be64(mp->m_sb.sb_gquotino));
+
+	/*
+	 * Skip the quota flags since repair will force quotacheck.
+	 * sb_qflags
+	 */
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_flags == mp->m_sb.sb_flags);
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_shared_vn == mp->m_sb.sb_shared_vn);
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_inoalignmt == cpu_to_be32(mp->m_sb.sb_inoalignmt));
+
+	xfs_scrub_block_preen_ok(sc, bp,
+			sb->sb_unit == cpu_to_be32(mp->m_sb.sb_unit));
+
+	xfs_scrub_block_preen_ok(sc, bp,
+			sb->sb_width == cpu_to_be32(mp->m_sb.sb_width));
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_dirblklog == mp->m_sb.sb_dirblklog);
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_logsectlog == mp->m_sb.sb_logsectlog);
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_logsectsize ==
+			cpu_to_be16(mp->m_sb.sb_logsectsize));
+
+	xfs_scrub_block_check_ok(sc, bp,
+			sb->sb_logsunit == cpu_to_be32(mp->m_sb.sb_logsunit));
+
+	/* Do we see any invalid bits in sb_features2? */
+	if (!xfs_sb_version_hasmorebits(&mp->m_sb)) {
+		xfs_scrub_block_check_ok(sc, bp, sb->sb_features2 == 0);
+	} else {
+		v2_ok = XFS_SB_VERSION2_OKBITS;
+		if (XFS_SB_VERSION_NUM(&mp->m_sb) >= XFS_SB_VERSION_5)
+			v2_ok |= XFS_SB_VERSION2_CRCBIT;
+
+		xfs_scrub_block_check_ok(sc, bp,
+				!(sb->sb_features2 & cpu_to_be32(~v2_ok)));
+
+		xfs_scrub_block_preen_ok(sc, bp,
+				sb->sb_features2 == sb->sb_bad_features2);
+	}
+
+	/* Check sb_features2 flags that are set at mkfs time. */
+	features_mask = cpu_to_be32(XFS_SB_VERSION2_LAZYSBCOUNTBIT |
+				    XFS_SB_VERSION2_PROJID32BIT |
+				    XFS_SB_VERSION2_CRCBIT |
+				    XFS_SB_VERSION2_FTYPE);
+	xfs_scrub_block_check_ok(sc, bp,
+			(sb->sb_features2 & features_mask) ==
+			(cpu_to_be32(mp->m_sb.sb_features2) & features_mask));
+
+	/* Check sb_features2 flags that can be set after mkfs time. */
+	features_mask = cpu_to_be32(XFS_SB_VERSION2_ATTR2BIT);
+	xfs_scrub_block_check_ok(sc, bp,
+			(sb->sb_features2 & features_mask) ==
+			(cpu_to_be32(mp->m_sb.sb_features2) & features_mask));
+
+	if (!xfs_sb_version_hascrc(&mp->m_sb)) {
+		/* all v5 fields must be zero */
+		xfs_scrub_block_check_ok(sc, bp,
+				!memchr_inv(&sb->sb_features_compat, 0,
+					sizeof(struct xfs_dsb) -
+					offsetof(struct xfs_dsb,
+						sb_features_compat)));
+	} else {
+		/* Check compat flags; all are set at mkfs time. */
+		features_mask = cpu_to_be32(XFS_SB_FEAT_COMPAT_UNKNOWN);
+		xfs_scrub_block_check_ok(sc, bp,
+				(sb->sb_features_compat & features_mask) ==
+				(cpu_to_be32(mp->m_sb.sb_features_compat) &
+					features_mask));
+
+		/* Check ro compat flags; all are set at mkfs time. */
+		features_mask = cpu_to_be32(XFS_SB_FEAT_RO_COMPAT_UNKNOWN |
+					    XFS_SB_FEAT_RO_COMPAT_FINOBT |
+					    XFS_SB_FEAT_RO_COMPAT_RMAPBT |
+					    XFS_SB_FEAT_RO_COMPAT_REFLINK);
+		xfs_scrub_block_check_ok(sc, bp,
+				(sb->sb_features_ro_compat & features_mask) ==
+				(cpu_to_be32(mp->m_sb.sb_features_ro_compat) &
+					features_mask));
+
+		/* Check incompat flags; all are set at mkfs time. */
+		features_mask = cpu_to_be32(XFS_SB_FEAT_INCOMPAT_UNKNOWN |
+					    XFS_SB_FEAT_INCOMPAT_FTYPE |
+					    XFS_SB_FEAT_INCOMPAT_SPINODES |
+					    XFS_SB_FEAT_INCOMPAT_META_UUID);
+		xfs_scrub_block_check_ok(sc, bp,
+				(sb->sb_features_incompat & features_mask) ==
+				(cpu_to_be32(mp->m_sb.sb_features_incompat) &
+					features_mask));
+
+		/* Check log incompat flags; all are set at mkfs time. */
+		features_mask = cpu_to_be32(XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN);
+		xfs_scrub_block_check_ok(sc, bp,
+				(sb->sb_features_log_incompat & features_mask) ==
+				(cpu_to_be32(mp->m_sb.sb_features_log_incompat) &
+					features_mask));
+
+		/* Don't care about sb_crc */
+
+		xfs_scrub_block_check_ok(sc, bp,
+				sb->sb_spino_align ==
+				cpu_to_be32(mp->m_sb.sb_spino_align));
+
+		xfs_scrub_block_preen_ok(sc, bp,
+				sb->sb_pquotino ==
+				cpu_to_be64(mp->m_sb.sb_pquotino));
+
+		/* Don't care about sb_lsn */
+	}
+
+	if (xfs_sb_version_hasmetauuid(&mp->m_sb)) {
+		/* The metadata UUID must be the same for all supers */
+		xfs_scrub_block_check_ok(sc, bp,
+				uuid_equal(&sb->sb_meta_uuid,
+					&mp->m_sb.sb_meta_uuid));
+	}
+
+	/* Everything else must be zero. */
+	xfs_scrub_block_check_ok(sc, bp,
+			!memchr_inv(sb + 1, 0,
+				BBTOB(bp->b_length) - sizeof(struct xfs_dsb)));
+
+	return error;
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 105b7ad..4c7c308 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -92,6 +92,8 @@ bool xfs_scrub_check_thoroughness(struct xfs_scrub_context *sc, bool fs_ok);
 
 /* Setup functions */
 int xfs_scrub_setup_fs(struct xfs_scrub_context *sc, struct xfs_inode *ip);
+int xfs_scrub_setup_ag_header(struct xfs_scrub_context *sc,
+			      struct xfs_inode *ip);
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index bfc53b6..8e08224 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -162,6 +162,10 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_fs,
 		.scrub	= xfs_scrub_tester,
 	},
+	{ /* superblock */
+		.setup	= xfs_scrub_setup_ag_header,
+		.scrub	= xfs_scrub_superblock,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 291444f..027f62e 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -62,5 +62,6 @@ struct xfs_scrub_context {
 
 /* Metadata scrubbers */
 int xfs_scrub_tester(struct xfs_scrub_context *sc);
+int xfs_scrub_superblock(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */



* [PATCH 12/27] xfs: scrub AGF and AGFL
From: Darrick J. Wong @ 2017-09-21  0:18 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Check the block references in the AGF and AGFL headers to make sure
they make sense.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_fs.h  |    4 +
 fs/xfs/scrub/agheader.c |  220 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.c   |   60 +++++++++++++
 fs/xfs/scrub/common.h   |    6 +
 fs/xfs/scrub/scrub.c    |    8 ++
 fs/xfs/scrub/scrub.h    |    2 
 6 files changed, 299 insertions(+), 1 deletion(-)
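
Note: the AGFL is a circular array indexed by a first and a last slot, so
both the walk and the entry count have to handle wraparound.  A minimal
userspace sketch of that logic, assuming a non-empty list and a
hypothetical fixed size DEMO_AGFL_SIZE (the kernel derives the real size
from the mount geometry); all demo_* names are invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_AGFL_SIZE	8u	/* hypothetical number of AGFL slots */

typedef int (*demo_agfl_fn)(uint32_t bno, void *priv);

/* Number of active slots between first and last, inclusive. */
static unsigned int
demo_agfl_count(
	unsigned int	first,
	unsigned int	last)
{
	assert(first < DEMO_AGFL_SIZE && last < DEMO_AGFL_SIZE);

	/* No wrap: a single-entry list (first == last) counts as one. */
	if (last >= first)
		return last - first + 1;
	/* Wrapped: first..end of array, then start of array..last. */
	return DEMO_AGFL_SIZE - first + last + 1;
}

/* Visit every active slot in order; stop at the first callback error. */
static int
demo_walk_agfl(
	const uint32_t	*slots,
	unsigned int	first,
	unsigned int	last,
	demo_agfl_fn	fn,
	void		*priv)
{
	unsigned int	i = first;
	int		error;

	assert(first < DEMO_AGFL_SIZE && last < DEMO_AGFL_SIZE);

	for (;;) {
		error = fn(slots[i], priv);
		if (error)
			return error;
		if (i == last)
			break;
		i = (i + 1) % DEMO_AGFL_SIZE;
	}
	return 0;
}

/* Example callback: accumulate the block numbers that were visited. */
static int
demo_sum_block(
	uint32_t	bno,
	void		*priv)
{
	*(uint64_t *)priv += bno;
	return 0;
}
```

With first = 6 and last = 1 the walk visits slots 6, 7, 0 and 1, which is
the same order the two-loop form in the patch produces.  The caller is
expected to skip the walk entirely when the free list count is zero.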


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index b98bba2..a22d90f 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -485,9 +485,11 @@ struct xfs_scrub_metadata {
 /* Scrub subcommands. */
 #define XFS_SCRUB_TYPE_TEST	0	/* presence test ioctl */
 #define XFS_SCRUB_TYPE_SB	1	/* superblock */
+#define XFS_SCRUB_TYPE_AGF	2	/* AG free header */
+#define XFS_SCRUB_TYPE_AGFL	3	/* AG free list */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	2
+#define XFS_SCRUB_TYPE_NR	4
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
index c58a5ef..c2cc2af 100644
--- a/fs/xfs/scrub/agheader.c
+++ b/fs/xfs/scrub/agheader.c
@@ -49,6 +49,72 @@ xfs_scrub_setup_ag_header(
 	return xfs_scrub_setup_fs(sc, ip);
 }
 
+/* Find the size of the AG, in blocks. */
+static inline xfs_agblock_t
+xfs_scrub_ag_blocks(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	ASSERT(agno < mp->m_sb.sb_agcount);
+
+	if (agno < mp->m_sb.sb_agcount - 1)
+		return mp->m_sb.sb_agblocks;
+	return mp->m_sb.sb_dblocks - (agno * mp->m_sb.sb_agblocks);
+}
+
+/* Walk all the blocks in the AGFL. */
+int
+xfs_scrub_walk_agfl(
+	struct xfs_scrub_context	*sc,
+	int				(*fn)(struct xfs_scrub_context *,
+					      xfs_agblock_t bno, void *),
+	void				*priv)
+{
+	struct xfs_agf			*agf;
+	__be32				*agfl_bno;
+	struct xfs_mount		*mp = sc->mp;
+	unsigned int			flfirst;
+	unsigned int			fllast;
+	int				i;
+	int				error;
+
+	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, sc->sa.agfl_bp);
+	flfirst = be32_to_cpu(agf->agf_flfirst);
+	fllast = be32_to_cpu(agf->agf_fllast);
+
+	/* Skip an empty AGFL. */
+	if (agf->agf_flcount == cpu_to_be32(0))
+		return 0;
+
+	/* first to last is a consecutive list. */
+	if (fllast >= flfirst) {
+		for (i = flfirst; i <= fllast; i++) {
+			error = fn(sc, be32_to_cpu(agfl_bno[i]), priv);
+			if (error)
+				return error;
+		}
+
+		return 0;
+	}
+
+	/* first to the end */
+	for (i = flfirst; i < XFS_AGFL_SIZE(mp); i++) {
+		error = fn(sc, be32_to_cpu(agfl_bno[i]), priv);
+		if (error)
+			return error;
+	}
+
+	/* the start to last. */
+	for (i = 0; i <= fllast; i++) {
+		error = fn(sc, be32_to_cpu(agfl_bno[i]), priv);
+		if (error)
+			return error;
+	}
+
+	return 0;
+}
+
 /* Superblock */
 
 /* Scrub the filesystem superblock. */
@@ -324,3 +390,157 @@ xfs_scrub_superblock(
 
 	return error;
 }
+
+/* AGF */
+
+/* Scrub the AGF. */
+int
+xfs_scrub_agf(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_agf			*agf;
+	xfs_daddr_t			daddr;
+	xfs_daddr_t			eofs;
+	xfs_agnumber_t			agno;
+	xfs_agblock_t			agbno;
+	xfs_agblock_t			eoag;
+	xfs_agblock_t			agfl_first;
+	xfs_agblock_t			agfl_last;
+	xfs_agblock_t			agfl_count;
+	xfs_agblock_t			fl_count;
+	int				level;
+	int				error = 0;
+
+	agno = sc->sm->sm_agno;
+	error = xfs_scrub_load_ag_headers(sc, agno, XFS_SCRUB_TYPE_AGF);
+	if (!xfs_scrub_op_ok(sc, agno, XFS_AGF_BLOCK(sc->mp), &error))
+		goto out;
+
+	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+	eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks);
+
+	/* Check the AG length */
+	eoag = be32_to_cpu(agf->agf_length);
+	xfs_scrub_block_check_ok(sc, sc->sa.agf_bp,
+			eoag == xfs_scrub_ag_blocks(mp, agno));
+
+	/* Check the AGF btree roots and levels */
+	agbno = be32_to_cpu(agf->agf_roots[XFS_BTNUM_BNO]);
+	daddr = XFS_AGB_TO_DADDR(mp, agno, agbno);
+	xfs_scrub_block_check_ok(sc, sc->sa.agf_bp,
+			agbno > XFS_AGI_BLOCK(mp) &&
+			agbno < mp->m_sb.sb_agblocks &&
+			agbno < eoag && daddr < eofs);
+
+	agbno = be32_to_cpu(agf->agf_roots[XFS_BTNUM_CNT]);
+	daddr = XFS_AGB_TO_DADDR(mp, agno, agbno);
+	xfs_scrub_block_check_ok(sc, sc->sa.agf_bp,
+			agbno > XFS_AGI_BLOCK(mp) &&
+			agbno < mp->m_sb.sb_agblocks &&
+			agbno < eoag && daddr < eofs);
+
+	level = be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNO]);
+	xfs_scrub_block_check_ok(sc, sc->sa.agf_bp,
+			level > 0 && level <= XFS_BTREE_MAXLEVELS);
+
+	level = be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNT]);
+	xfs_scrub_block_check_ok(sc, sc->sa.agf_bp,
+			level > 0 && level <= XFS_BTREE_MAXLEVELS);
+
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		agbno = be32_to_cpu(agf->agf_roots[XFS_BTNUM_RMAP]);
+		daddr = XFS_AGB_TO_DADDR(mp, agno, agbno);
+		xfs_scrub_block_check_ok(sc, sc->sa.agf_bp,
+				agbno > XFS_AGI_BLOCK(mp) &&
+				agbno < mp->m_sb.sb_agblocks &&
+				agbno < eoag && daddr < eofs);
+
+		level = be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]);
+		xfs_scrub_block_check_ok(sc, sc->sa.agf_bp,
+				level > 0 && level <= XFS_BTREE_MAXLEVELS);
+	}
+
+	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+		agbno = be32_to_cpu(agf->agf_refcount_root);
+		daddr = XFS_AGB_TO_DADDR(mp, agno, agbno);
+		xfs_scrub_block_check_ok(sc, sc->sa.agf_bp,
+				agbno > XFS_AGI_BLOCK(mp) &&
+				agbno < mp->m_sb.sb_agblocks &&
+				agbno < eoag && daddr < eofs);
+
+		level = be32_to_cpu(agf->agf_refcount_level);
+		xfs_scrub_block_check_ok(sc, sc->sa.agf_bp,
+				level > 0 && level <= XFS_BTREE_MAXLEVELS);
+	}
+
+	/* Check the AGFL counters */
+	agfl_first = be32_to_cpu(agf->agf_flfirst);
+	agfl_last = be32_to_cpu(agf->agf_fllast);
+	agfl_count = be32_to_cpu(agf->agf_flcount);
+	if (agfl_last >= agfl_first)
+		fl_count = agfl_last - agfl_first + 1;
+	else
+		fl_count = XFS_AGFL_SIZE(mp) - agfl_first + agfl_last + 1;
+	xfs_scrub_block_check_ok(sc, sc->sa.agf_bp,
+			agfl_count == 0 || fl_count == agfl_count);
+
+out:
+	return error;
+}
+
+/* AGFL */
+
+struct xfs_scrub_agfl {
+	xfs_agblock_t			eoag;
+	xfs_daddr_t			eofs;
+};
+
+/* Scrub an AGFL block. */
+STATIC int
+xfs_scrub_agfl_block(
+	struct xfs_scrub_context	*sc,
+	xfs_agblock_t			agbno,
+	void				*priv)
+{
+	struct xfs_mount		*mp = sc->mp;
+	xfs_agnumber_t			agno = sc->sa.agno;
+	struct xfs_scrub_agfl		*sagfl = priv;
+	int				error = 0;
+
+	xfs_scrub_block_check_ok(sc, sc->sa.agfl_bp,
+			agbno > XFS_AGI_BLOCK(mp) &&
+			agbno < mp->m_sb.sb_agblocks &&
+			agbno < sagfl->eoag &&
+			XFS_AGB_TO_DADDR(mp, agno, agbno) < sagfl->eofs);
+
+	return error;
+}
+
+/* Scrub the AGFL. */
+int
+xfs_scrub_agfl(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_scrub_agfl		sagfl;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_agf			*agf;
+	xfs_agnumber_t			agno;
+	int				error;
+
+	agno = sc->sm->sm_agno;
+	error = xfs_scrub_load_ag_headers(sc, agno, XFS_SCRUB_TYPE_AGFL);
+	if (!xfs_scrub_op_ok(sc, agno, XFS_AGFL_BLOCK(sc->mp), &error))
+		goto out;
+	if (!sc->sa.agf_bp)
+		return -EFSCORRUPTED;
+
+	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+	sagfl.eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks);
+	sagfl.eoag = be32_to_cpu(agf->agf_length);
+
+	/* Check the blocks in the AGFL. */
+	return xfs_scrub_walk_agfl(sc, xfs_scrub_agfl_block, &sagfl);
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 1e80f23..e0e611d 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -463,6 +463,66 @@ xfs_scrub_ag_init(
 	return xfs_scrub_ag_btcur_init(sc, sa);
 }
 
+/*
+ * Load and verify an AG header for further AG header examination.
+ * If this header is not the target of the examination, don't return
+ * the buffer if a runtime or verifier error occurs.
+ */
+STATIC int
+xfs_scrub_load_ag_header(
+	struct xfs_scrub_context	*sc,
+	xfs_daddr_t			daddr,
+	struct xfs_buf			**bpp,
+	const struct xfs_buf_ops	*ops,
+	bool				is_target)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	*bpp = NULL;
+	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+			XFS_AG_DADDR(mp, sc->sa.agno, daddr),
+			XFS_FSS_TO_BB(mp, 1), 0, bpp, ops);
+	return is_target ? error : 0;
+}
+
+/*
+ * Load as many of the AG headers and btree cursors as we can for an
+ * examination and cross-reference of an AG header.
+ */
+int
+xfs_scrub_load_ag_headers(
+	struct xfs_scrub_context	*sc,
+	xfs_agnumber_t			agno,
+	unsigned int			type)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	ASSERT(type == XFS_SCRUB_TYPE_AGF || type == XFS_SCRUB_TYPE_AGFL);
+	memset(&sc->sa, 0, sizeof(sc->sa));
+	sc->sa.agno = agno;
+
+	error = xfs_scrub_load_ag_header(sc, XFS_AGI_DADDR(mp),
+			&sc->sa.agi_bp, &xfs_agi_buf_ops, false);
+	if (error)
+		return error;
+
+	error = xfs_scrub_load_ag_header(sc, XFS_AGF_DADDR(mp),
+			&sc->sa.agf_bp, &xfs_agf_buf_ops,
+			type == XFS_SCRUB_TYPE_AGF);
+	if (error)
+		return error;
+
+	error = xfs_scrub_load_ag_header(sc, XFS_AGFL_DADDR(mp),
+			&sc->sa.agfl_bp, &xfs_agfl_buf_ops,
+			type == XFS_SCRUB_TYPE_AGFL);
+	if (error)
+		return error;
+
+	return 0;
+}
+
 /* Per-scrubber setup functions */
 
 /* Set us up with a transaction and an empty context. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 4c7c308..4d9a56c 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -104,5 +104,11 @@ int xfs_scrub_ag_read_headers(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
 void xfs_scrub_ag_btcur_free(struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_btcur_init(struct xfs_scrub_context *sc,
 			    struct xfs_scrub_ag *sa);
+int xfs_scrub_load_ag_headers(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
+			      unsigned int type);
+int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
+			int (*fn)(struct xfs_scrub_context *, xfs_agblock_t bno,
+				  void *),
+			void *priv);
 
 #endif	/* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 8e08224..1afce7b 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -166,6 +166,14 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_ag_header,
 		.scrub	= xfs_scrub_superblock,
 	},
+	{ /* agf */
+		.setup	= xfs_scrub_setup_ag_header,
+		.scrub	= xfs_scrub_agf,
+	},
+	{ /* agfl */
+		.setup	= xfs_scrub_setup_ag_header,
+		.scrub	= xfs_scrub_agfl,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 027f62e..cb31a01 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -63,5 +63,7 @@ struct xfs_scrub_context {
 /* Metadata scrubbers */
 int xfs_scrub_tester(struct xfs_scrub_context *sc);
 int xfs_scrub_superblock(struct xfs_scrub_context *sc);
+int xfs_scrub_agf(struct xfs_scrub_context *sc);
+int xfs_scrub_agfl(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */



* [PATCH 13/27] xfs: scrub the AGI
From: Darrick J. Wong @ 2017-09-21  0:18 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Add a forgotten check to the AGI verifier, then wire up the scrub
infrastructure to check the AGI contents.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_fs.h  |    3 +-
 fs/xfs/scrub/agheader.c |   88 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.c   |    6 ++-
 fs/xfs/scrub/scrub.c    |    4 ++
 fs/xfs/scrub/scrub.h    |    1 +
 5 files changed, 99 insertions(+), 3 deletions(-)
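
Note: two of the AGI checks are easy to illustrate outside the kernel:
the expected AG length (the last AG may be shorter than the rest) and the
range test applied to inode pointers, where a null sentinel is always
acceptable.  The sketch below uses hypothetical demo_* names and plain
integers rather than the kernel's xfs_agblock_t/xfs_agino_t types and
NULLAGINO, so treat it as an approximation of the idea only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NULLAGINO	((uint32_t)-1)	/* hypothetical "no inode" marker */

/* Expected length of an AG in blocks; only the last AG may be short. */
static uint32_t
demo_ag_blocks(
	uint64_t	dblocks,
	uint32_t	agblocks,
	uint32_t	agcount,
	uint32_t	agno)
{
	assert(agno < agcount);

	if (agno < agcount - 1)
		return agblocks;
	/* Widen before multiplying so the product cannot wrap. */
	return (uint32_t)(dblocks - (uint64_t)agno * agblocks);
}

/* An inode pointer is fine if it is null or inside [first, last]. */
static bool
demo_agino_ok(
	uint32_t	agino,
	uint32_t	first,
	uint32_t	last)
{
	return agino == DEMO_NULLAGINO ||
	       (agino >= first && agino <= last);
}

/* Count unlinked-bucket heads that point outside the valid range. */
static size_t
demo_bad_buckets(
	const uint32_t	*buckets,
	size_t		nr,
	uint32_t	first,
	uint32_t	last)
{
	size_t		bad = 0;
	size_t		i;

	for (i = 0; i < nr; i++) {
		if (!demo_agino_ok(buckets[i], first, last))
			bad++;
	}
	return bad;
}
```

For a filesystem of 1000 blocks split into 4 AGs of 300 blocks, the
helper reports 300 blocks for AGs 0 through 2 and 100 blocks for AG 3.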


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index a22d90f..2ac0049 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -487,9 +487,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_SB	1	/* superblock */
 #define XFS_SCRUB_TYPE_AGF	2	/* AG free header */
 #define XFS_SCRUB_TYPE_AGFL	3	/* AG free list */
+#define XFS_SCRUB_TYPE_AGI	4	/* AG inode header */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	4
+#define XFS_SCRUB_TYPE_NR	5
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
index c2cc2af..cfbbe9e 100644
--- a/fs/xfs/scrub/agheader.c
+++ b/fs/xfs/scrub/agheader.c
@@ -544,3 +544,91 @@ xfs_scrub_agfl(
 out:
 	return error;
 }
+
+/* AGI */
+
+/* Scrub the AGI. */
+int
+xfs_scrub_agi(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_agi			*agi;
+	xfs_daddr_t			daddr;
+	xfs_daddr_t			eofs;
+	xfs_agnumber_t			agno;
+	xfs_agblock_t			agbno;
+	xfs_agblock_t			eoag;
+	xfs_agino_t			agino;
+	xfs_agino_t			first_agino;
+	xfs_agino_t			last_agino;
+	int				i;
+	int				level;
+	int				error = 0;
+
+	agno = sc->sm->sm_agno;
+	error = xfs_scrub_load_ag_headers(sc, agno, XFS_SCRUB_TYPE_AGI);
+	if (!xfs_scrub_op_ok(sc, agno, XFS_AGI_BLOCK(sc->mp), &error))
+		goto out;
+
+	agi = XFS_BUF_TO_AGI(sc->sa.agi_bp);
+	eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks);
+
+	/* Check the AG length */
+	eoag = be32_to_cpu(agi->agi_length);
+	xfs_scrub_block_check_ok(sc, sc->sa.agi_bp,
+			eoag == xfs_scrub_ag_blocks(mp, agno));
+
+	/* Check btree roots and levels */
+	agbno = be32_to_cpu(agi->agi_root);
+	daddr = XFS_AGB_TO_DADDR(mp, agno, agbno);
+	xfs_scrub_block_check_ok(sc, sc->sa.agi_bp,
+			agbno > XFS_AGI_BLOCK(mp) &&
+			agbno < mp->m_sb.sb_agblocks &&
+			agbno < eoag && daddr < eofs);
+
+	level = be32_to_cpu(agi->agi_level);
+	xfs_scrub_block_check_ok(sc, sc->sa.agi_bp,
+			level > 0 && level <= XFS_BTREE_MAXLEVELS);
+
+	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
+		agbno = be32_to_cpu(agi->agi_free_root);
+		daddr = XFS_AGB_TO_DADDR(mp, agno, agbno);
+		xfs_scrub_block_check_ok(sc, sc->sa.agi_bp,
+				agbno > XFS_AGI_BLOCK(mp) &&
+				agbno < mp->m_sb.sb_agblocks &&
+				agbno < eoag && daddr < eofs);
+
+		level = be32_to_cpu(agi->agi_free_level);
+		xfs_scrub_block_check_ok(sc, sc->sa.agi_bp,
+				level > 0 && level <= XFS_BTREE_MAXLEVELS);
+	}
+
+	/* Check inode counters */
+	first_agino = XFS_OFFBNO_TO_AGINO(mp, XFS_AGI_BLOCK(mp) + 1, 0);
+	last_agino = XFS_OFFBNO_TO_AGINO(mp, eoag + 1, 0) - 1;
+	agino = be32_to_cpu(agi->agi_count);
+	xfs_scrub_block_check_ok(sc, sc->sa.agi_bp,
+			agino <= last_agino - first_agino + 1 &&
+			agino >= be32_to_cpu(agi->agi_freecount));
+
+	/* Check inode pointers */
+	agino = be32_to_cpu(agi->agi_newino);
+	xfs_scrub_block_check_ok(sc, sc->sa.agi_bp, agino == NULLAGINO ||
+			(agino >= first_agino && agino <= last_agino));
+	agino = be32_to_cpu(agi->agi_dirino);
+	xfs_scrub_block_check_ok(sc, sc->sa.agi_bp, agino == NULLAGINO ||
+			(agino >= first_agino && agino <= last_agino));
+
+	/* Check unlinked inode buckets */
+	for (i = 0; i < XFS_AGI_UNLINKED_BUCKETS; i++) {
+		agino = be32_to_cpu(agi->agi_unlinked[i]);
+		if (agino == NULLAGINO)
+			continue;
+		xfs_scrub_block_check_ok(sc, sc->sa.agi_bp,
+				(agino >= first_agino && agino <= last_agino));
+	}
+
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index e0e611d..b62f084 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -499,12 +499,14 @@ xfs_scrub_load_ag_headers(
 	struct xfs_mount		*mp = sc->mp;
 	int				error;
 
-	ASSERT(type == XFS_SCRUB_TYPE_AGF || type == XFS_SCRUB_TYPE_AGFL);
+	ASSERT(type == XFS_SCRUB_TYPE_AGF || type == XFS_SCRUB_TYPE_AGFL ||
+	       type == XFS_SCRUB_TYPE_AGI);
 	memset(&sc->sa, 0, sizeof(sc->sa));
 	sc->sa.agno = agno;
 
 	error = xfs_scrub_load_ag_header(sc, XFS_AGI_DADDR(mp),
-			&sc->sa.agi_bp, &xfs_agi_buf_ops, false);
+			&sc->sa.agi_bp, &xfs_agi_buf_ops,
+			type == XFS_SCRUB_TYPE_AGI);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 1afce7b..5fc34a6 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -174,6 +174,10 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_ag_header,
 		.scrub	= xfs_scrub_agfl,
 	},
+	{ /* agi */
+		.setup	= xfs_scrub_setup_ag_header,
+		.scrub	= xfs_scrub_agi,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index cb31a01..00020a3 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -65,5 +65,6 @@ int xfs_scrub_tester(struct xfs_scrub_context *sc);
 int xfs_scrub_superblock(struct xfs_scrub_context *sc);
 int xfs_scrub_agf(struct xfs_scrub_context *sc);
 int xfs_scrub_agfl(struct xfs_scrub_context *sc);
+int xfs_scrub_agi(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */



* [PATCH 14/27] xfs: scrub free space btrees
From: Darrick J. Wong @ 2017-09-21  0:19 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Check the extent records in the free space btrees to ensure that the
values look sane.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    4 +-
 fs/xfs/scrub/alloc.c   |  108 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.c  |   16 +++++++
 fs/xfs/scrub/common.h  |    6 +++
 fs/xfs/scrub/scrub.c   |    8 ++++
 fs/xfs/scrub/scrub.h   |    2 +
 7 files changed, 144 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/alloc.c
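
Reader's note: the record check in xfs_scrub_allocbt_helper() below boils
down to a bounds test on (startblock, blockcount) against both the
superblock AG size and the AGF length.  The following is only a userspace
sketch of that test, not kernel code; extent_in_ag() and its parameter
names are hypothetical and stand in for the fields the patch reads.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical standalone model of the bnobt/cntbt record check.
 * sb_agblocks and agf_length stand in for mp->m_sb.sb_agblocks and
 * be32_to_cpu(agf->agf_length) in the patch.
 */
static bool extent_in_ag(uint32_t bno, uint32_t len,
			 uint32_t sb_agblocks, uint32_t agf_length)
{
	/* Widen before adding so bno + len cannot wrap at 32 bits. */
	uint64_t rec_end = (uint64_t)bno + len;

	return bno < sb_agblocks &&
	       bno < agf_length &&
	       len != 0 &&
	       rec_end <= sb_agblocks &&
	       rec_end <= agf_length;
}
```

The 64-bit rec_end matters: with 32-bit arithmetic a corrupt record with a
huge blockcount could wrap around and appear to end inside the AG.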


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index e92d04d..84ac733 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -147,6 +147,7 @@ ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
 xfs-y				+= $(addprefix scrub/, \
 				   trace.o \
 				   agheader.o \
+				   alloc.o \
 				   btree.o \
 				   common.o \
 				   scrub.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 2ac0049..55230b8 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -488,9 +488,11 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_AGF	2	/* AG free header */
 #define XFS_SCRUB_TYPE_AGFL	3	/* AG free list */
 #define XFS_SCRUB_TYPE_AGI	4	/* AG inode header */
+#define XFS_SCRUB_TYPE_BNOBT	5	/* freesp by block btree */
+#define XFS_SCRUB_TYPE_CNTBT	6	/* freesp by length btree */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	5
+#define XFS_SCRUB_TYPE_NR	7
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/alloc.c b/fs/xfs/scrub/alloc.c
new file mode 100644
index 0000000..f0e2386
--- /dev/null
+++ b/fs/xfs/scrub/alloc.c
@@ -0,0 +1,108 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_rmap.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+
+/*
+ * Set us up to scrub free space btrees.
+ * Push everything out of the log so that the busy extent list is empty.
+ */
+int
+xfs_scrub_setup_ag_allocbt(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_ag_btree(sc, ip, sc->try_harder);
+}
+
+/* Free space btree scrubber. */
+
+/* Scrub a bnobt/cntbt record. */
+STATIC int
+xfs_scrub_allocbt_helper(
+	struct xfs_scrub_btree		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_agf			*agf;
+	unsigned long long		rec_end;
+	xfs_agblock_t			bno;
+	xfs_extlen_t			len;
+	int				error = 0;
+
+	bno = be32_to_cpu(rec->alloc.ar_startblock);
+	len = be32_to_cpu(rec->alloc.ar_blockcount);
+	agf = XFS_BUF_TO_AGF(bs->sc->sa.agf_bp);
+	rec_end = (unsigned long long)bno + len;
+
+	xfs_scrub_btree_check_ok(bs->sc, bs->cur, 0,
+			bno < mp->m_sb.sb_agblocks &&
+			bno < be32_to_cpu(agf->agf_length) &&
+			len != 0 &&
+			rec_end <= mp->m_sb.sb_agblocks &&
+			rec_end <= be32_to_cpu(agf->agf_length));
+
+	return error;
+}
+
+/* Scrub the freespace btrees for some AG. */
+STATIC int
+xfs_scrub_allocbt(
+	struct xfs_scrub_context	*sc,
+	xfs_btnum_t			which)
+{
+	struct xfs_owner_info		oinfo;
+	struct xfs_btree_cur		*cur;
+
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
+	cur = which == XFS_BTNUM_BNO ? sc->sa.bno_cur : sc->sa.cnt_cur;
+	return xfs_scrub_btree(sc, cur, xfs_scrub_allocbt_helper,
+			&oinfo, NULL);
+}
+
+int
+xfs_scrub_bnobt(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_allocbt(sc, XFS_BTNUM_BNO);
+}
+
+int
+xfs_scrub_cntbt(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_allocbt(sc, XFS_BTNUM_CNT);
+}
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index b62f084..a3e185c 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -536,3 +536,19 @@ xfs_scrub_setup_fs(
 	return xfs_scrub_trans_alloc(sc->sm, sc->mp,
 			&M_RES(sc->mp)->tr_itruncate, 0, 0, 0, &sc->tp);
 }
+
+/* Set us up with AG headers and btree cursors. */
+int
+xfs_scrub_setup_ag_btree(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip,
+	bool				force_log)
+{
+	int				error;
+
+	error = xfs_scrub_setup_ag_header(sc, ip);
+	if (error)
+		return error;
+
+	return xfs_scrub_ag_init(sc, sc->sm->sm_agno, &sc->sa);
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 4d9a56c..d70f470 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -94,6 +94,9 @@ bool xfs_scrub_check_thoroughness(struct xfs_scrub_context *sc, bool fs_ok);
 int xfs_scrub_setup_fs(struct xfs_scrub_context *sc, struct xfs_inode *ip);
 int xfs_scrub_setup_ag_header(struct xfs_scrub_context *sc,
 			      struct xfs_inode *ip);
+int xfs_scrub_setup_ag_allocbt(struct xfs_scrub_context *sc,
+			       struct xfs_inode *ip);
+
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
@@ -111,4 +114,7 @@ int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
 				  void *),
 			void *priv);
 
+int xfs_scrub_setup_ag_btree(struct xfs_scrub_context *sc,
+			     struct xfs_inode *ip, bool force_log);
+
 #endif	/* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 5fc34a6..e41778c 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -178,6 +178,14 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_ag_header,
 		.scrub	= xfs_scrub_agi,
 	},
+	{ /* bnobt */
+		.setup	= xfs_scrub_setup_ag_allocbt,
+		.scrub	= xfs_scrub_bnobt,
+	},
+	{ /* cntbt */
+		.setup	= xfs_scrub_setup_ag_allocbt,
+		.scrub	= xfs_scrub_cntbt,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 00020a3..3f1a77c 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -66,5 +66,7 @@ int xfs_scrub_superblock(struct xfs_scrub_context *sc);
 int xfs_scrub_agf(struct xfs_scrub_context *sc);
 int xfs_scrub_agfl(struct xfs_scrub_context *sc);
 int xfs_scrub_agi(struct xfs_scrub_context *sc);
+int xfs_scrub_bnobt(struct xfs_scrub_context *sc);
+int xfs_scrub_cntbt(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */



* [PATCH 15/27] xfs: scrub inode btrees
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (13 preceding siblings ...)
  2017-09-21  0:19 ` [PATCH 14/27] xfs: scrub free space btrees Darrick J. Wong
@ 2017-09-21  0:19 ` Darrick J. Wong
  2017-09-21  0:19 ` [PATCH 16/27] xfs: scrub rmap btrees Darrick J. Wong
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:19 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Check the records of the inode btrees to make sure that the values
make sense given the inode records themselves.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile            |    1 
 fs/xfs/libxfs/xfs_format.h |    2 
 fs/xfs/libxfs/xfs_fs.h     |    4 -
 fs/xfs/scrub/common.h      |    2 
 fs/xfs/scrub/ialloc.c      |  332 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c       |    9 +
 fs/xfs/scrub/scrub.h       |    2 
 7 files changed, 350 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/ialloc.c
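
Reader's note: two of the inobt record checks below are pure arithmetic
on the record fields and can be modelled outside the kernel.  This is a
sketch under assumptions, not the patch's code: the function names are
hypothetical, INODES_PER_CHUNK mirrors XFS_INODES_PER_CHUNK (64), and the
parameters stand in for ir_count, ir_freecount and ir_free.

```c
#include <stdbool.h>
#include <stdint.h>

#define INODES_PER_CHUNK 64	/* mirrors XFS_INODES_PER_CHUNK */

/* Count set bits in the free mask, one bit per inode in the chunk. */
static unsigned int count_free_bits(uint64_t freemask)
{
	unsigned int ret = 0;

	while (freemask) {
		ret += freemask & 1;
		freemask >>= 1;
	}
	return ret;
}

/*
 * As the checks in the patch imply, inodes that fall in holes are marked
 * free in ir_free but are not counted in ir_freecount, so add them back
 * before comparing against the number of set bits.
 */
static bool freecount_consistent(unsigned int ir_count,
				 unsigned int ir_freecount,
				 uint64_t ir_free)
{
	if (ir_count > INODES_PER_CHUNK || ir_freecount > INODES_PER_CHUNK)
		return false;
	return ir_freecount + (INODES_PER_CHUNK - ir_count) ==
	       count_free_bits(ir_free);
}

/*
 * A free-mask bit and the inode's in-use state must disagree: a set bit
 * means "free", so the inode must not be in use, and vice versa.
 * The caller must pass index < INODES_PER_CHUNK.
 */
static bool freemask_bit_ok(uint64_t ir_free, unsigned int index, bool inuse)
{
	bool marked_free = (ir_free >> index) & 1;

	return marked_free != inuse;
}
```

freemask_bit_ok() is the same test the patch expresses as
"freemask_ok ^= inuse": the result is true only when exactly one of the
two conditions holds.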


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 84ac733..82326b7 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -150,6 +150,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   alloc.o \
 				   btree.o \
 				   common.o \
+				   ialloc.o \
 				   scrub.o \
 				   )
 endif
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 23229f0..154c3dd 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -518,7 +518,7 @@ static inline int xfs_sb_version_hasftype(struct xfs_sb *sbp)
 		 (sbp->sb_features2 & XFS_SB_VERSION2_FTYPE));
 }
 
-static inline int xfs_sb_version_hasfinobt(xfs_sb_t *sbp)
+static inline bool xfs_sb_version_hasfinobt(xfs_sb_t *sbp)
 {
 	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
 		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_FINOBT);
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 55230b8..c8e6d89 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -490,9 +490,11 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_AGI	4	/* AG inode header */
 #define XFS_SCRUB_TYPE_BNOBT	5	/* freesp by block btree */
 #define XFS_SCRUB_TYPE_CNTBT	6	/* freesp by length btree */
+#define XFS_SCRUB_TYPE_INOBT	7	/* inode btree */
+#define XFS_SCRUB_TYPE_FINOBT	8	/* free inode btree */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	7
+#define XFS_SCRUB_TYPE_NR	9
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index d70f470..8fc0619 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -96,6 +96,8 @@ int xfs_scrub_setup_ag_header(struct xfs_scrub_context *sc,
 			      struct xfs_inode *ip);
 int xfs_scrub_setup_ag_allocbt(struct xfs_scrub_context *sc,
 			       struct xfs_inode *ip);
+int xfs_scrub_setup_ag_iallocbt(struct xfs_scrub_context *sc,
+				struct xfs_inode *ip);
 
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
new file mode 100644
index 0000000..cb0cb8b
--- /dev/null
+++ b/fs/xfs/scrub/ialloc.c
@@ -0,0 +1,332 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_icache.h"
+#include "xfs_rmap.h"
+#include "xfs_log.h"
+#include "xfs_trans_priv.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+
+/*
+ * Set us up to scrub inode btrees.
+ * If we detect a discrepancy between the inobt and the inode,
+ * try again after forcing logged inode cores out to disk.
+ */
+int
+xfs_scrub_setup_ag_iallocbt(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_ag_btree(sc, ip, sc->try_harder);
+}
+
+/* Inode btree scrubber. */
+
+/* Scrub a chunk of an inobt record. */
+STATIC bool
+xfs_scrub_iallocbt_chunk(
+	struct xfs_scrub_btree		*bs,
+	struct xfs_inobt_rec_incore	*irec,
+	xfs_agino_t			agino,
+	xfs_extlen_t			len)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_agf			*agf;
+	unsigned long long		rec_end;
+	xfs_agblock_t			eoag;
+	xfs_agblock_t			bno;
+
+	agf = XFS_BUF_TO_AGF(bs->sc->sa.agf_bp);
+	eoag = be32_to_cpu(agf->agf_length);
+	bno = XFS_AGINO_TO_AGBNO(mp, agino);
+	rec_end = (unsigned long long)bno + len;
+
+	return xfs_scrub_btree_check_ok(bs->sc, bs->cur, 0,
+			bno < mp->m_sb.sb_agblocks && bno < eoag &&
+			rec_end <= mp->m_sb.sb_agblocks && rec_end <= eoag);
+}
+
+/* Count the number of free inodes. */
+static unsigned int
+xfs_scrub_iallocbt_freecount(
+	xfs_inofree_t			freemask)
+{
+	int				bits = XFS_INODES_PER_CHUNK;
+	unsigned int			ret = 0;
+
+	while (bits--) {
+		if (freemask & 1)
+			ret++;
+		freemask >>= 1;
+	}
+
+	return ret;
+}
+
+/* Check a particular inode with ir_free. */
+STATIC int
+xfs_scrub_iallocbt_check_cluster_freemask(
+	struct xfs_scrub_btree		*bs,
+	xfs_ino_t			fsino,
+	xfs_agino_t			chunkino,
+	xfs_agino_t			clusterino,
+	struct xfs_inobt_rec_incore	*irec,
+	struct xfs_buf			*bp)
+{
+	struct xfs_dinode		*dip;
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	bool				freemask_ok;
+	bool				inuse;
+	int				error;
+
+	dip = xfs_buf_offset(bp, clusterino * mp->m_sb.sb_inodesize);
+	if (!xfs_scrub_btree_check_ok(bs->sc, bs->cur, 0,
+			be16_to_cpu(dip->di_magic) == XFS_DINODE_MAGIC &&
+			(dip->di_version < 3 || be64_to_cpu(dip->di_ino) ==
+				fsino + clusterino)))
+		goto out;
+
+	freemask_ok = !!(irec->ir_free & XFS_INOBT_MASK(chunkino + clusterino));
+	error = xfs_icache_inode_is_allocated(mp, bs->cur->bc_tp,
+			fsino + clusterino, &inuse);
+	if (error == -ENODATA) {
+		/* Not cached, just read the disk buffer */
+		freemask_ok ^= !!(dip->di_mode);
+		if (!bs->sc->try_harder && !freemask_ok)
+			return -EDEADLOCK;
+	} else if (error < 0) {
+		/* Inode is only half assembled, don't bother. */
+		freemask_ok = true;
+	} else {
+		/* Inode is all there. */
+		freemask_ok ^= inuse;
+	}
+	xfs_scrub_btree_check_ok(bs->sc, bs->cur, 0, freemask_ok);
+out:
+	return 0;
+}
+
+/* Make sure the free mask is consistent with what the inodes think. */
+STATIC int
+xfs_scrub_iallocbt_check_freemask(
+	struct xfs_scrub_btree		*bs,
+	struct xfs_inobt_rec_incore	*irec)
+{
+	struct xfs_owner_info		oinfo;
+	struct xfs_imap			imap;
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_dinode		*dip;
+	struct xfs_buf			*bp;
+	xfs_ino_t			fsino;
+	xfs_agino_t			nr_inodes;
+	xfs_agino_t			agino;
+	xfs_agino_t			chunkino;
+	xfs_agino_t			clusterino;
+	xfs_agblock_t			agbno;
+	int				blks_per_cluster;
+	uint16_t			holemask;
+	uint16_t			ir_holemask;
+	int				error = 0;
+
+	/* Make sure the freemask matches the inode records. */
+	blks_per_cluster = xfs_icluster_size_fsb(mp);
+	nr_inodes = XFS_OFFBNO_TO_AGINO(mp, blks_per_cluster, 0);
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INODES);
+
+	for (agino = irec->ir_startino;
+	     agino < irec->ir_startino + XFS_INODES_PER_CHUNK;
+	     agino += blks_per_cluster * mp->m_sb.sb_inopblock) {
+		fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino);
+		chunkino = agino - irec->ir_startino;
+		agbno = XFS_AGINO_TO_AGBNO(mp, agino);
+
+		/* Compute the holemask mask for this cluster. */
+		for (clusterino = 0, holemask = 0; clusterino < nr_inodes;
+		     clusterino += XFS_INODES_PER_HOLEMASK_BIT)
+			holemask |= XFS_INOBT_MASK((chunkino + clusterino) /
+					XFS_INODES_PER_HOLEMASK_BIT);
+
+		/* The whole cluster must be a hole or not a hole. */
+		ir_holemask = (irec->ir_holemask & holemask);
+		xfs_scrub_btree_check_ok(bs->sc, bs->cur, 0,
+				ir_holemask == holemask || ir_holemask == 0);
+
+		/* If any part of this is a hole, skip it. */
+		if (ir_holemask)
+			continue;
+
+		/* Grab the inode cluster buffer. */
+		imap.im_blkno = XFS_AGB_TO_DADDR(mp, bs->cur->bc_private.a.agno,
+				agbno);
+		imap.im_len = XFS_FSB_TO_BB(mp, blks_per_cluster);
+		imap.im_boffset = 0;
+
+		error = xfs_imap_to_bp(mp, bs->cur->bc_tp, &imap,
+				&dip, &bp, 0, 0);
+		if (!xfs_scrub_btree_op_ok(bs->sc, bs->cur, 0, &error))
+			continue;
+
+		/* Which inodes are free? */
+		for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
+			error = xfs_scrub_iallocbt_check_cluster_freemask(bs,
+					fsino, chunkino, clusterino, irec, bp);
+			if (error) {
+				xfs_trans_brelse(bs->cur->bc_tp, bp);
+				return error;
+			}
+		}
+
+		xfs_trans_brelse(bs->cur->bc_tp, bp);
+	}
+
+	return error;
+}
+
+/* Scrub an inobt/finobt record. */
+STATIC int
+xfs_scrub_iallocbt_helper(
+	struct xfs_scrub_btree		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_agi			*agi;
+	struct xfs_inobt_rec_incore	irec;
+	uint64_t			holes;
+	xfs_agino_t			agino;
+	xfs_agblock_t			agbno;
+	xfs_extlen_t			len;
+	int				holecount;
+	int				i;
+	int				error = 0;
+	unsigned int			real_freecount;
+	uint16_t			holemask;
+
+	xfs_inobt_btrec_to_irec(mp, rec, &irec);
+
+	xfs_scrub_btree_check_ok(bs->sc, bs->cur, 0,
+			irec.ir_count <= XFS_INODES_PER_CHUNK &&
+			irec.ir_freecount <= XFS_INODES_PER_CHUNK);
+	real_freecount = irec.ir_freecount +
+			(XFS_INODES_PER_CHUNK - irec.ir_count);
+	xfs_scrub_btree_check_ok(bs->sc, bs->cur, 0,
+		real_freecount == xfs_scrub_iallocbt_freecount(irec.ir_free));
+
+	agi = XFS_BUF_TO_AGI(bs->sc->sa.agi_bp);
+	agino = irec.ir_startino;
+	agbno = XFS_AGINO_TO_AGBNO(mp, irec.ir_startino);
+	if (!xfs_scrub_btree_check_ok(bs->sc, bs->cur, 0,
+			agbno < be32_to_cpu(agi->agi_length)))
+		goto out;
+	xfs_scrub_btree_check_ok(bs->sc, bs->cur, 0,
+			!(agbno & (xfs_ialloc_cluster_alignment(mp) - 1)) &&
+			!(agbno & (xfs_icluster_size_fsb(mp) - 1)));
+
+	/* Handle non-sparse inodes */
+	if (!xfs_inobt_issparse(irec.ir_holemask)) {
+		len = XFS_B_TO_FSB(mp,
+				XFS_INODES_PER_CHUNK * mp->m_sb.sb_inodesize);
+		xfs_scrub_btree_check_ok(bs->sc, bs->cur, 0,
+				irec.ir_count == XFS_INODES_PER_CHUNK);
+
+		if (!xfs_scrub_iallocbt_chunk(bs, &irec, agino, len))
+			goto out;
+		goto check_freemask;
+	}
+
+	/* Check each chunk of a sparse inode cluster. */
+	holemask = irec.ir_holemask;
+	holecount = 0;
+	len = XFS_B_TO_FSB(mp,
+			XFS_INODES_PER_HOLEMASK_BIT * mp->m_sb.sb_inodesize);
+	holes = ~xfs_inobt_irec_to_allocmask(&irec);
+	xfs_scrub_btree_check_ok(bs->sc, bs->cur, 0,
+			(holes & irec.ir_free) == holes &&
+			irec.ir_freecount <= irec.ir_count);
+
+	for (i = 0; i < XFS_INOBT_HOLEMASK_BITS; holemask >>= 1,
+			i++, agino += XFS_INODES_PER_HOLEMASK_BIT) {
+		if (holemask & 1) {
+			holecount += XFS_INODES_PER_HOLEMASK_BIT;
+			continue;
+		}
+
+		if (!xfs_scrub_iallocbt_chunk(bs, &irec, agino, len))
+			break;
+	}
+
+	xfs_scrub_btree_check_ok(bs->sc, bs->cur, 0,
+			holecount <= XFS_INODES_PER_CHUNK &&
+			holecount + irec.ir_count == XFS_INODES_PER_CHUNK);
+
+check_freemask:
+	error = xfs_scrub_iallocbt_check_freemask(bs, &irec);
+	if (error)
+		goto out;
+
+out:
+	return error;
+}
+
+/* Scrub the inode btrees for some AG. */
+STATIC int
+xfs_scrub_iallocbt(
+	struct xfs_scrub_context	*sc,
+	xfs_btnum_t			which)
+{
+	struct xfs_btree_cur		*cur;
+	struct xfs_owner_info		oinfo;
+
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT);
+	cur = which == XFS_BTNUM_INO ? sc->sa.ino_cur : sc->sa.fino_cur;
+	return xfs_scrub_btree(sc, cur, xfs_scrub_iallocbt_helper,
+			&oinfo, NULL);
+}
+
+int
+xfs_scrub_inobt(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_iallocbt(sc, XFS_BTNUM_INO);
+}
+
+int
+xfs_scrub_finobt(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_iallocbt(sc, XFS_BTNUM_FINO);
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index e41778c..9304bb3 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -186,6 +186,15 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_ag_allocbt,
 		.scrub	= xfs_scrub_cntbt,
 	},
+	{ /* inobt */
+		.setup	= xfs_scrub_setup_ag_iallocbt,
+		.scrub	= xfs_scrub_inobt,
+	},
+	{ /* finobt */
+		.setup	= xfs_scrub_setup_ag_iallocbt,
+		.scrub	= xfs_scrub_finobt,
+		.has	= xfs_sb_version_hasfinobt,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 3f1a77c..55c9dde 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -68,5 +68,7 @@ int xfs_scrub_agfl(struct xfs_scrub_context *sc);
 int xfs_scrub_agi(struct xfs_scrub_context *sc);
 int xfs_scrub_bnobt(struct xfs_scrub_context *sc);
 int xfs_scrub_cntbt(struct xfs_scrub_context *sc);
+int xfs_scrub_inobt(struct xfs_scrub_context *sc);
+int xfs_scrub_finobt(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */



* [PATCH 16/27] xfs: scrub rmap btrees
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (14 preceding siblings ...)
  2017-09-21  0:19 ` [PATCH 15/27] xfs: scrub inode btrees Darrick J. Wong
@ 2017-09-21  0:19 ` Darrick J. Wong
  2017-09-21  0:19 ` [PATCH 17/27] xfs: scrub refcount btrees Darrick J. Wong
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:19 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Check the reverse mapping records to make sure that the contents
make sense.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    3 +
 fs/xfs/scrub/common.h  |    2 +
 fs/xfs/scrub/rmap.c    |  130 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c   |    5 ++
 fs/xfs/scrub/scrub.h   |    1 
 6 files changed, 141 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/rmap.c
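
Reader's note: the "Check flags" step in xfs_scrub_rmapbt_helper() below
packs four implications into one expression.  The sketch here restates
them as a standalone predicate; rmap_flags_ok() is a hypothetical name
and the boolean parameters stand in for the values the patch derives
from rm_owner and rm_flags.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the rmap flag rules checked by the patch:
 *  - bmbt blocks and non-inode owners must have offset 0;
 *  - unwritten extents cannot be bmbt, non-inode, or attr fork;
 *  - non-inode owners cannot carry any of the three flags.
 */
static bool rmap_flags_ok(bool non_inode, bool is_bmbt, bool is_attr,
			  bool is_unwritten, uint64_t offset)
{
	return (!is_bmbt || offset == 0) &&
	       (!non_inode || offset == 0) &&
	       (!is_unwritten || !(is_bmbt || non_inode || is_attr)) &&
	       (!non_inode || !(is_bmbt || is_unwritten || is_attr));
}
```

Each clause has the form "!A || B", which is the usual way to write
"A implies B" in C.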


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 82326b7..5a64f8d 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -151,6 +151,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   btree.o \
 				   common.o \
 				   ialloc.o \
+				   rmap.o \
 				   scrub.o \
 				   )
 endif
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index c8e6d89..74aa021 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -492,9 +492,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_CNTBT	6	/* freesp by length btree */
 #define XFS_SCRUB_TYPE_INOBT	7	/* inode btree */
 #define XFS_SCRUB_TYPE_FINOBT	8	/* free inode btree */
+#define XFS_SCRUB_TYPE_RMAPBT	9	/* reverse mapping btree */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	9
+#define XFS_SCRUB_TYPE_NR	10
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 8fc0619..9075a5c 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -98,6 +98,8 @@ int xfs_scrub_setup_ag_allocbt(struct xfs_scrub_context *sc,
 			       struct xfs_inode *ip);
 int xfs_scrub_setup_ag_iallocbt(struct xfs_scrub_context *sc,
 				struct xfs_inode *ip);
+int xfs_scrub_setup_ag_rmapbt(struct xfs_scrub_context *sc,
+			      struct xfs_inode *ip);
 
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
diff --git a/fs/xfs/scrub/rmap.c b/fs/xfs/scrub/rmap.c
new file mode 100644
index 0000000..7331ecf
--- /dev/null
+++ b/fs/xfs/scrub/rmap.c
@@ -0,0 +1,130 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_rmap.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+
+/*
+ * Set us up to scrub reverse mapping btrees.
+ */
+int
+xfs_scrub_setup_ag_rmapbt(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_ag_btree(sc, ip, false);
+}
+
+/* Reverse-mapping scrubber. */
+
+/* Scrub an rmapbt record. */
+STATIC int
+xfs_scrub_rmapbt_helper(
+	struct xfs_scrub_btree		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_agf			*agf;
+	struct xfs_rmap_irec		irec;
+	unsigned long long		rec_end;
+	xfs_agblock_t			eoag;
+	bool				non_inode;
+	bool				is_unwritten;
+	bool				is_bmbt;
+	bool				is_attr;
+	int				error;
+
+	error = xfs_rmap_btrec_to_irec(rec, &irec);
+	if (!xfs_scrub_btree_op_ok(bs->sc, bs->cur, 0, &error))
+		goto out;
+
+	/* Check extent. */
+	agf = XFS_BUF_TO_AGF(bs->sc->sa.agf_bp);
+	eoag = be32_to_cpu(agf->agf_length);
+	rec_end = (unsigned long long)irec.rm_startblock + irec.rm_blockcount;
+
+	xfs_scrub_btree_check_ok(bs->sc, bs->cur, 0,
+			irec.rm_startblock < mp->m_sb.sb_agblocks &&
+			irec.rm_startblock < eoag &&
+			irec.rm_blockcount != 0 &&
+			rec_end <= mp->m_sb.sb_agblocks &&
+			rec_end <= eoag);
+
+	/* Check flags. */
+	non_inode = XFS_RMAP_NON_INODE_OWNER(irec.rm_owner);
+	is_bmbt = irec.rm_flags & XFS_RMAP_BMBT_BLOCK;
+	is_attr = irec.rm_flags & XFS_RMAP_ATTR_FORK;
+	is_unwritten = irec.rm_flags & XFS_RMAP_UNWRITTEN;
+
+	xfs_scrub_btree_check_ok(bs->sc, bs->cur, 0,
+			(!is_bmbt || irec.rm_offset == 0) &&
+			(!non_inode || irec.rm_offset == 0) &&
+			(!is_unwritten || !(is_bmbt || non_inode || is_attr)) &&
+			(!non_inode || !(is_bmbt || is_unwritten || is_attr)));
+
+	/* Owner inode within an AG? */
+	xfs_scrub_btree_check_ok(bs->sc, bs->cur, 0, non_inode ||
+			(XFS_INO_TO_AGNO(mp, irec.rm_owner) <
+							mp->m_sb.sb_agcount &&
+			 XFS_AGINO_TO_AGBNO(mp,
+				XFS_INO_TO_AGINO(mp, irec.rm_owner)) <
+							mp->m_sb.sb_agblocks));
+	/* Owner inode within the FS? */
+	xfs_scrub_btree_check_ok(bs->sc, bs->cur, 0, non_inode ||
+			XFS_AGB_TO_DADDR(mp,
+				XFS_INO_TO_AGNO(mp, irec.rm_owner),
+				XFS_AGINO_TO_AGBNO(mp,
+					XFS_INO_TO_AGINO(mp, irec.rm_owner))) <
+			XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks));
+
+	/* Non-inode owner within the magic values? */
+	xfs_scrub_btree_check_ok(bs->sc, bs->cur, 0, !non_inode ||
+			(irec.rm_owner > XFS_RMAP_OWN_MIN &&
+			 irec.rm_owner <= XFS_RMAP_OWN_FS));
+out:
+	return error;
+}
+
+/* Scrub the rmap btree for some AG. */
+int
+xfs_scrub_rmapbt(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_owner_info		oinfo;
+
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
+	return xfs_scrub_btree(sc, sc->sa.rmap_cur, xfs_scrub_rmapbt_helper,
+			&oinfo, NULL);
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 9304bb3..a418269 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -195,6 +195,11 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.scrub	= xfs_scrub_finobt,
 		.has	= xfs_sb_version_hasfinobt,
 	},
+	{ /* rmapbt */
+		.setup	= xfs_scrub_setup_ag_rmapbt,
+		.scrub	= xfs_scrub_rmapbt,
+		.has	= xfs_sb_version_hasrmapbt,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 55c9dde..f602ded 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -70,5 +70,6 @@ int xfs_scrub_bnobt(struct xfs_scrub_context *sc);
 int xfs_scrub_cntbt(struct xfs_scrub_context *sc);
 int xfs_scrub_inobt(struct xfs_scrub_context *sc);
 int xfs_scrub_finobt(struct xfs_scrub_context *sc);
+int xfs_scrub_rmapbt(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */



* [PATCH 17/27] xfs: scrub refcount btrees
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (15 preceding siblings ...)
  2017-09-21  0:19 ` [PATCH 16/27] xfs: scrub rmap btrees Darrick J. Wong
@ 2017-09-21  0:19 ` Darrick J. Wong
  2017-09-21  0:19 ` [PATCH 18/27] xfs: scrub inodes Darrick J. Wong
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:19 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Plumb in the pieces necessary to check the refcount btree.  If rmap is
available, check the reference count by performing an interval query
against the rmapbt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile         |    1 
 fs/xfs/libxfs/xfs_fs.h  |    3 +
 fs/xfs/scrub/common.h   |    2 +
 fs/xfs/scrub/refcount.c |   99 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c    |    5 ++
 fs/xfs/scrub/scrub.h    |    1 
 6 files changed, 110 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/refcount.c
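
Reader's note: the first check in xfs_scrub_refcountbt_helper() below
ties the CoW flag in rc_startblock to the reference count.  A minimal
userspace sketch of that rule follows.  The helper names are hypothetical,
and REFC_COW_START is assumed to mirror XFS_REFC_COW_START as the top bit
of the 32-bit start block.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror XFS_REFC_COW_START: high bit of the start block. */
#define REFC_COW_START	(UINT32_C(1) << 31)

/*
 * The patch accepts a record only if the CoW flag is set exactly when
 * the refcount is 1 (a CoW staging extent), and clear otherwise.
 */
static bool refc_cowflag_ok(uint32_t raw_startblock, uint32_t refcount)
{
	bool has_cowflag = (raw_startblock & REFC_COW_START) != 0;

	return (refcount == 1 && has_cowflag) ||
	       (refcount != 1 && !has_cowflag);
}

/* Strip the flag to recover the AG block number for bounds checks. */
static uint32_t refc_agbno(uint32_t raw_startblock)
{
	return raw_startblock & ~REFC_COW_START;
}
```

The flag has to be masked off before the start block is compared against
the AG size, which is why the patch clears it before computing rec_end.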


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 5a64f8d..a7c5752 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -151,6 +151,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   btree.o \
 				   common.o \
 				   ialloc.o \
+				   refcount.o \
 				   rmap.o \
 				   scrub.o \
 				   )
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 74aa021..450a692 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -493,9 +493,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_INOBT	7	/* inode btree */
 #define XFS_SCRUB_TYPE_FINOBT	8	/* free inode btree */
 #define XFS_SCRUB_TYPE_RMAPBT	9	/* reverse mapping btree */
+#define XFS_SCRUB_TYPE_REFCNTBT	10	/* reference count btree */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	10
+#define XFS_SCRUB_TYPE_NR	11
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 9075a5c..e50368d 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -100,6 +100,8 @@ int xfs_scrub_setup_ag_iallocbt(struct xfs_scrub_context *sc,
 				struct xfs_inode *ip);
 int xfs_scrub_setup_ag_rmapbt(struct xfs_scrub_context *sc,
 			      struct xfs_inode *ip);
+int xfs_scrub_setup_ag_refcountbt(struct xfs_scrub_context *sc,
+				  struct xfs_inode *ip);
 
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c
new file mode 100644
index 0000000..86e6759
--- /dev/null
+++ b/fs/xfs/scrub/refcount.c
@@ -0,0 +1,99 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_rmap.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+
+/*
+ * Set us up to scrub reference count btrees.
+ */
+int
+xfs_scrub_setup_ag_refcountbt(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_ag_btree(sc, ip, false);
+}
+
+/* Reference count btree scrubber. */
+
+/* Scrub a refcountbt record. */
+STATIC int
+xfs_scrub_refcountbt_helper(
+	struct xfs_scrub_btree		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_mount		*mp = bs->cur->bc_mp;
+	struct xfs_agf			*agf;
+	struct xfs_refcount_irec	irec;
+	unsigned long long		rec_end;
+	xfs_agblock_t			eoag;
+	bool				has_cowflag;
+	int				error = 0;
+
+	irec.rc_startblock = be32_to_cpu(rec->refc.rc_startblock);
+	irec.rc_blockcount = be32_to_cpu(rec->refc.rc_blockcount);
+	irec.rc_refcount = be32_to_cpu(rec->refc.rc_refcount);
+	agf = XFS_BUF_TO_AGF(bs->sc->sa.agf_bp);
+	eoag = be32_to_cpu(agf->agf_length);
+
+	has_cowflag = !!(irec.rc_startblock & XFS_REFC_COW_START);
+	xfs_scrub_btree_check_ok(bs->sc, bs->cur, 0,
+			(irec.rc_refcount == 1 && has_cowflag) ||
+			(irec.rc_refcount != 1 && !has_cowflag));
+	irec.rc_startblock &= ~XFS_REFC_COW_START;
+	rec_end = (unsigned long long)irec.rc_startblock + irec.rc_blockcount;
+	xfs_scrub_btree_check_ok(bs->sc, bs->cur, 0,
+			irec.rc_startblock < mp->m_sb.sb_agblocks &&
+			irec.rc_startblock < eoag &&
+			irec.rc_blockcount != 0 &&
+			rec_end <= mp->m_sb.sb_agblocks &&
+			rec_end <= eoag &&
+			irec.rc_refcount >= 1);
+
+	return error;
+}
+
+/* Scrub the refcount btree for some AG. */
+int
+xfs_scrub_refcountbt(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_owner_info		oinfo;
+
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_REFC);
+	return xfs_scrub_btree(sc, sc->sa.refc_cur, xfs_scrub_refcountbt_helper,
+			&oinfo, NULL);
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index a418269..4af2096 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -200,6 +200,11 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.scrub	= xfs_scrub_rmapbt,
 		.has	= xfs_sb_version_hasrmapbt,
 	},
+	{ /* refcountbt */
+		.setup	= xfs_scrub_setup_ag_refcountbt,
+		.scrub	= xfs_scrub_refcountbt,
+		.has	= xfs_sb_version_hasreflink,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index f602ded..4cc4b84 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -71,5 +71,6 @@ int xfs_scrub_cntbt(struct xfs_scrub_context *sc);
 int xfs_scrub_inobt(struct xfs_scrub_context *sc);
 int xfs_scrub_finobt(struct xfs_scrub_context *sc);
 int xfs_scrub_rmapbt(struct xfs_scrub_context *sc);
+int xfs_scrub_refcountbt(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */


* [PATCH 18/27] xfs: scrub inodes
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (16 preceding siblings ...)
  2017-09-21  0:19 ` [PATCH 17/27] xfs: scrub refcount btrees Darrick J. Wong
@ 2017-09-21  0:19 ` Darrick J. Wong
  2017-09-21  0:19 ` [PATCH 19/27] xfs: scrub inode block mappings Darrick J. Wong
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:19 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

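Scrub the fields within an inode: check the mode, fork formats, size,
block and extent counts, extent size hints, and flags for internal
consistency.  If the in-core inode cannot be loaded because the
verifiers fail, fall back to reading the on-disk inode directly.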

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
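
As a rough illustration of the kind of field check this patch performs,
the di_extsize rule can be written as a small standalone predicate.
This is a sketch under assumptions, not the kernel code: the DEMO_*
flag bits and demo_extsize_ok() are hypothetical names with
illustrative values, and DEMO_MAXEXTLEN merely stands in for the
21-bit extent length limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits and limit; stand-ins, not kernel definitions. */
#define DEMO_DIFLAG_REALTIME		(1U << 0)
#define DEMO_DIFLAG_EXTSZINHERIT	(1U << 1)
#define DEMO_DIFLAG_EXTSIZE		(1U << 2)
#define DEMO_MAXEXTLEN			((1U << 21) - 1)

static_assert(DEMO_MAXEXTLEN == 0x1fffff, "extent length is 21 bits");

/*
 * Return true if an extent size hint is consistent with the inode flags:
 * - without either hint flag, the hint must be zero;
 * - with a hint flag, the hint must be nonzero, fit in an extent, and
 *   be at most half an AG unless the file lives on the realtime device.
 */
static bool
demo_extsize_ok(
	uint16_t	flags,
	uint32_t	extsize,
	uint32_t	agblocks)
{
	if (!(flags & (DEMO_DIFLAG_EXTSIZE | DEMO_DIFLAG_EXTSZINHERIT)))
		return extsize == 0;

	if (extsize == 0 || extsize > DEMO_MAXEXTLEN)
		return false;

	return extsize <= agblocks / 2 ||
	       (flags & DEMO_DIFLAG_REALTIME) != 0;
}
```

For example, demo_extsize_ok(DEMO_DIFLAG_EXTSIZE, 600, 1000) is false
because 600 blocks is more than half of a 1000-block AG, while the same
hint with DEMO_DIFLAG_REALTIME also set is accepted.
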
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    3 
 fs/xfs/scrub/common.c  |   49 +++++++
 fs/xfs/scrub/common.h  |    3 
 fs/xfs/scrub/inode.c   |  342 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c   |   17 ++
 fs/xfs/scrub/scrub.h   |    2 
 7 files changed, 414 insertions(+), 3 deletions(-)
 create mode 100644 fs/xfs/scrub/inode.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index a7c5752..28e14b7 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -151,6 +151,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   btree.o \
 				   common.o \
 				   ialloc.o \
+				   inode.o \
 				   refcount.o \
 				   rmap.o \
 				   scrub.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 450a692..2051c3d 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -494,9 +494,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_FINOBT	8	/* free inode btree */
 #define XFS_SCRUB_TYPE_RMAPBT	9	/* reverse mapping btree */
 #define XFS_SCRUB_TYPE_REFCNTBT	10	/* reference count btree */
+#define XFS_SCRUB_TYPE_INODE	11	/* inode record */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	11
+#define XFS_SCRUB_TYPE_NR	12
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index a3e185c..2a1d456 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -30,6 +30,8 @@
 #include "xfs_trans.h"
 #include "xfs_sb.h"
 #include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_itable.h"
 #include "xfs_alloc.h"
 #include "xfs_alloc_btree.h"
 #include "xfs_bmap.h"
@@ -552,3 +554,50 @@ xfs_scrub_setup_ag_btree(
 
 	return xfs_scrub_ag_init(sc, sc->sm->sm_agno, &sc->sa);
 }
+
+/*
+ * Given an inode and the scrub control structure, grab either the
+ * inode referenced in the control structure or the inode passed in.
+ * The inode is not locked.
+ */
+int
+xfs_scrub_get_inode(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip_in)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_inode		*ips = NULL;
+	int				error;
+
+	if (sc->sm->sm_agno || (sc->sm->sm_gen && !sc->sm->sm_ino))
+		return -EINVAL;
+
+	/* We want to scan the inode we already had opened. */
+	if (sc->sm->sm_ino == 0 || sc->sm->sm_ino == ip_in->i_ino) {
+		sc->ip = ip_in;
+		return 0;
+	}
+
+	/* Look up the inode, see if the generation number matches. */
+	if (xfs_internal_inum(mp, sc->sm->sm_ino))
+		return -ENOENT;
+	error = xfs_iget(mp, NULL, sc->sm->sm_ino, XFS_IGET_UNTRUSTED,
+			0, &ips);
+	if (error == -ENOENT || error == -EINVAL) {
+		/* inode doesn't exist... */
+		return -ENOENT;
+	} else if (error) {
+		trace_xfs_scrub_op_error(sc,
+				XFS_INO_TO_AGNO(mp, sc->sm->sm_ino),
+				XFS_INO_TO_AGBNO(mp, sc->sm->sm_ino),
+				error, __return_address);
+		return error;
+	}
+	if (VFS_I(ips)->i_generation != sc->sm->sm_gen) {
+		iput(VFS_I(ips));
+		return -ENOENT;
+	}
+
+	sc->ip = ips;
+	return 0;
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index e50368d..ae6b557 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -102,6 +102,8 @@ int xfs_scrub_setup_ag_rmapbt(struct xfs_scrub_context *sc,
 			      struct xfs_inode *ip);
 int xfs_scrub_setup_ag_refcountbt(struct xfs_scrub_context *sc,
 				  struct xfs_inode *ip);
+int xfs_scrub_setup_inode(struct xfs_scrub_context *sc,
+			  struct xfs_inode *ip);
 
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
@@ -122,5 +124,6 @@ int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
 
 int xfs_scrub_setup_ag_btree(struct xfs_scrub_context *sc,
 			     struct xfs_inode *ip, bool force_log);
+int xfs_scrub_get_inode(struct xfs_scrub_context *sc, struct xfs_inode *ip_in);
 
 #endif	/* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
new file mode 100644
index 0000000..63d29f23
--- /dev/null
+++ b/fs/xfs/scrub/inode.c
@@ -0,0 +1,342 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_inode_buf.h"
+#include "xfs_inode_fork.h"
+#include "xfs_ialloc.h"
+#include "xfs_log.h"
+#include "xfs_trans_priv.h"
+#include "xfs_reflink.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+
+/* Set us up with an inode. */
+int
+xfs_scrub_setup_inode(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	/*
+	 * Try to get the inode.  If the verifiers fail, we try again
+	 * in raw mode.
+	 */
+	error = xfs_scrub_get_inode(sc, ip);
+	switch (error) {
+	case 0:
+		break;
+	case -EFSCORRUPTED:
+	case -EFSBADCRC:
+		/* Push everything out of the log onto disk prior to check. */
+		error = _xfs_log_force(mp, XFS_LOG_SYNC, NULL);
+		if (error)
+			return error;
+		xfs_ail_push_all_sync(mp->m_ail);
+		return 0;
+	default:
+		return error;
+	}
+
+	/* Got the inode, lock it and we're ready to go. */
+	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+	error = xfs_scrub_trans_alloc(sc->sm, mp, &M_RES(mp)->tr_itruncate,
+			0, 0, 0, &sc->tp);
+	if (error)
+		goto out_unlock;
+	sc->ilock_flags |= XFS_ILOCK_EXCL;
+	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
+
+	return error;
+out_unlock:
+	xfs_iunlock(sc->ip, sc->ilock_flags);
+	if (sc->ip != ip)
+		iput(VFS_I(sc->ip));
+	sc->ip = NULL;
+	return error;
+}
+
+/* Inode core */
+
+/* Scrub an inode. */
+int
+xfs_scrub_inode(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_imap			imap;
+	struct xfs_dinode		di;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*bp = NULL;
+	struct xfs_dinode		*dip;
+	xfs_ino_t			ino;
+	size_t				fork_recs;
+	unsigned long long		isize;
+	uint64_t			flags2;
+	uint32_t			nextents;
+	uint32_t			extsize;
+	uint32_t			cowextsize;
+	uint16_t			flags;
+	uint16_t			mode;
+	bool				has_shared;
+	int				error = 0;
+
+	/* Did we get the in-core inode, or are we doing this manually? */
+	if (sc->ip) {
+		ino = sc->ip->i_ino;
+		xfs_inode_to_disk(sc->ip, &di, 0);
+		dip = &di;
+	} else {
+		/* Map & read inode. */
+		ino = sc->sm->sm_ino;
+		error = xfs_imap(mp, sc->tp, ino, &imap, XFS_IGET_UNTRUSTED);
+		if (error == -EINVAL) {
+			/*
+			 * Inode could have gotten deleted out from under us;
+			 * just forget about it.
+			 */
+			error = -ENOENT;
+			goto out;
+		}
+		if (!xfs_scrub_op_ok(sc, XFS_INO_TO_AGNO(mp, ino),
+				XFS_INO_TO_AGBNO(mp, ino), &error))
+			goto out;
+
+		error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+				imap.im_blkno, imap.im_len, XBF_UNMAPPED, &bp,
+				NULL);
+		if (!xfs_scrub_op_ok(sc, XFS_INO_TO_AGNO(mp, ino),
+				XFS_INO_TO_AGBNO(mp, ino), &error))
+			goto out;
+
+		/* Is this really the inode we want? */
+		bp->b_ops = &xfs_inode_buf_ops;
+		dip = xfs_buf_offset(bp, imap.im_boffset);
+		if (!xfs_scrub_ino_check_ok(sc, ino, bp,
+				xfs_dinode_verify(mp, ino, dip) &&
+				xfs_dinode_good_version(mp, dip->di_version)))
+			goto out;
+		if (be32_to_cpu(dip->di_gen) != sc->sm->sm_gen) {
+			error = -ENOENT;
+			goto out;
+		}
+	}
+
+	flags = be16_to_cpu(dip->di_flags);
+	if (dip->di_version >= 3)
+		flags2 = be64_to_cpu(dip->di_flags2);
+	else
+		flags2 = 0;
+
+	/* di_mode */
+	mode = be16_to_cpu(dip->di_mode);
+	xfs_scrub_ino_check_ok(sc, ino, bp, !(mode & ~(S_IALLUGO | S_IFMT)));
+
+	/* v1/v2 fields */
+	switch (dip->di_version) {
+	case 1:
+		xfs_scrub_ino_check_ok(sc, ino, bp,
+				dip->di_nlink == 0 &&
+				(dip->di_mode || !sc->ip) &&
+				dip->di_projid_lo == 0 &&
+				dip->di_projid_hi == 0);
+		break;
+	case 2:
+	case 3:
+		xfs_scrub_ino_check_ok(sc, ino, bp,
+				dip->di_onlink == 0 &&
+				(dip->di_mode || !sc->ip) &&
+				(dip->di_projid_hi == 0 ||
+				 xfs_sb_version_hasprojid32bit(&mp->m_sb)));
+		break;
+	default:
+		ASSERT(0);
+		break;
+	}
+
+	/*
+	 * di_uid/di_gid -- -1 isn't invalid, but there's no way that
+	 * userspace could have created that.
+	 */
+	xfs_scrub_ino_warn_ok(sc, bp,
+			dip->di_uid != cpu_to_be32(-1U) &&
+			dip->di_gid != cpu_to_be32(-1U));
+
+	/* di_format */
+	switch (dip->di_format) {
+	case XFS_DINODE_FMT_DEV:
+		xfs_scrub_ino_check_ok(sc, ino, bp,
+				S_ISCHR(mode) || S_ISBLK(mode) ||
+				S_ISFIFO(mode) || S_ISSOCK(mode));
+		break;
+	case XFS_DINODE_FMT_LOCAL:
+		xfs_scrub_ino_check_ok(sc, ino, bp,
+				S_ISDIR(mode) || S_ISLNK(mode));
+		break;
+	case XFS_DINODE_FMT_EXTENTS:
+		xfs_scrub_ino_check_ok(sc, ino, bp, S_ISREG(mode) ||
+				S_ISDIR(mode) || S_ISLNK(mode));
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		xfs_scrub_ino_check_ok(sc, ino, bp, S_ISREG(mode) ||
+				S_ISDIR(mode));
+		break;
+	case XFS_DINODE_FMT_UUID:
+	default:
+		xfs_scrub_ino_check_ok(sc, ino, bp, false);
+		break;
+	}
+
+	/* di_size */
+	isize = be64_to_cpu(dip->di_size);
+	xfs_scrub_ino_check_ok(sc, ino, bp, !(isize & (1ULL << 63)));
+	if (!S_ISDIR(mode) && !S_ISREG(mode) && !S_ISLNK(mode))
+		xfs_scrub_ino_check_ok(sc, ino, bp, isize == 0);
+
+	/* di_nblocks */
+	if (flags2 & XFS_DIFLAG2_REFLINK) {
+		; /* nblocks can exceed dblocks */
+	} else if (flags & XFS_DIFLAG_REALTIME) {
+		xfs_scrub_ino_check_ok(sc, ino, bp,
+				be64_to_cpu(dip->di_nblocks) <
+				mp->m_sb.sb_dblocks + mp->m_sb.sb_rblocks);
+	} else {
+		xfs_scrub_ino_check_ok(sc, ino, bp,
+				be64_to_cpu(dip->di_nblocks) <
+				mp->m_sb.sb_dblocks);
+	}
+
+	/* di_extsize */
+	extsize = be32_to_cpu(dip->di_extsize);
+	if (flags & (XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT)) {
+		xfs_scrub_ino_check_ok(sc, ino, bp,
+				extsize > 0 &&
+				extsize <= MAXEXTLEN &&
+				(extsize <= mp->m_sb.sb_agblocks / 2 ||
+				 (flags & XFS_DIFLAG_REALTIME)));
+	} else {
+		xfs_scrub_ino_check_ok(sc, ino, bp, extsize == 0);
+	}
+
+	/* di_flags */
+	xfs_scrub_ino_check_ok(sc, ino, bp,
+			(!(flags & XFS_DIFLAG_IMMUTABLE) ||
+			 !(flags & XFS_DIFLAG_APPEND)) &&
+			(!(flags & XFS_DIFLAG_FILESTREAM) ||
+			 !(flags & XFS_DIFLAG_REALTIME)));
+
+	/* di_nextents */
+	nextents = be32_to_cpu(dip->di_nextents);
+	fork_recs = XFS_DFORK_DSIZE(dip, mp) / sizeof(struct xfs_bmbt_rec);
+	switch (dip->di_format) {
+	case XFS_DINODE_FMT_EXTENTS:
+		xfs_scrub_ino_check_ok(sc, ino, bp, nextents <= fork_recs);
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		xfs_scrub_ino_check_ok(sc, ino, bp, nextents > fork_recs);
+		break;
+	case XFS_DINODE_FMT_LOCAL:
+	case XFS_DINODE_FMT_DEV:
+	case XFS_DINODE_FMT_UUID:
+	default:
+		xfs_scrub_ino_check_ok(sc, ino, bp, nextents == 0);
+		break;
+	}
+
+	/* di_anextents */
+	nextents = be16_to_cpu(dip->di_anextents);
+	fork_recs = XFS_DFORK_ASIZE(dip, mp) / sizeof(struct xfs_bmbt_rec);
+	switch (dip->di_aformat) {
+	case XFS_DINODE_FMT_EXTENTS:
+		xfs_scrub_ino_check_ok(sc, ino, bp, nextents <= fork_recs);
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		xfs_scrub_ino_check_ok(sc, ino, bp, nextents > fork_recs);
+		break;
+	case XFS_DINODE_FMT_LOCAL:
+	case XFS_DINODE_FMT_DEV:
+	case XFS_DINODE_FMT_UUID:
+	default:
+		xfs_scrub_ino_check_ok(sc, ino, bp, nextents == 0);
+		break;
+	}
+
+	/* di_forkoff */
+	xfs_scrub_ino_check_ok(sc, ino, bp,
+			XFS_DFORK_APTR(dip) <
+				(char *)dip + mp->m_sb.sb_inodesize &&
+			(dip->di_anextents == 0 || dip->di_forkoff));
+
+	/* di_aformat */
+	xfs_scrub_ino_check_ok(sc, ino, bp,
+			dip->di_aformat == XFS_DINODE_FMT_LOCAL ||
+			dip->di_aformat == XFS_DINODE_FMT_EXTENTS ||
+			dip->di_aformat == XFS_DINODE_FMT_BTREE);
+
+	/* di_cowextsize */
+	cowextsize = be32_to_cpu(dip->di_cowextsize);
+	if (flags2 & XFS_DIFLAG2_COWEXTSIZE) {
+		xfs_scrub_ino_check_ok(sc, ino, bp,
+				xfs_sb_version_hasreflink(&mp->m_sb) &&
+				cowextsize > 0 &&
+				cowextsize <= MAXEXTLEN &&
+				cowextsize <= mp->m_sb.sb_agblocks / 2);
+	} else {
+		xfs_scrub_ino_check_ok(sc, ino, bp, cowextsize == 0);
+	}
+
+	/* Now let's do the things that require a live inode. */
+	if (!sc->ip)
+		goto out;
+
+	/*
+	 * Does this inode have the reflink flag set but no shared extents?
+	 * Set the preening flag if this is the case.
+	 */
+	if (xfs_is_reflink_inode(sc->ip)) {
+		error = xfs_reflink_inode_has_shared_extents(sc->tp, sc->ip,
+				&has_shared);
+		if (!xfs_scrub_op_ok(sc, XFS_INO_TO_AGNO(mp, ino),
+				XFS_INO_TO_AGBNO(mp, ino), &error))
+			goto out;
+		xfs_scrub_ino_preen_ok(sc, bp, has_shared == true);
+	}
+
+out:
+	if (bp)
+		xfs_trans_brelse(sc->tp, bp);
+	return error;
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 4af2096..b88f86e 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -30,6 +30,8 @@
 #include "xfs_trans.h"
 #include "xfs_sb.h"
 #include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_itable.h"
 #include "xfs_alloc.h"
 #include "xfs_alloc_btree.h"
 #include "xfs_bmap.h"
@@ -145,6 +147,7 @@ xfs_scrub_tester(
 STATIC int
 xfs_scrub_teardown(
 	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip_in,
 	int				error)
 {
 	xfs_scrub_ag_free(sc, &sc->sa);
@@ -152,6 +155,12 @@ xfs_scrub_teardown(
 		xfs_trans_cancel(sc->tp);
 		sc->tp = NULL;
 	}
+	if (sc->ip) {
+		xfs_iunlock(sc->ip, sc->ilock_flags);
+		if (sc->ip != ip_in)
+			iput(VFS_I(sc->ip));
+		sc->ip = NULL;
+	}
 	return error;
 }
 
@@ -205,6 +214,10 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.scrub	= xfs_scrub_refcountbt,
 		.has	= xfs_sb_version_hasreflink,
 	},
+	{ /* inode record */
+		.setup	= xfs_scrub_setup_inode,
+		.scrub	= xfs_scrub_inode,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
@@ -284,7 +297,7 @@ xfs_scrub_metadata(
 		 * Tear down everything we hold, then set up again with
 		 * preparation for worst-case scenarios.
 		 */
-		error = xfs_scrub_teardown(&sc, 0);
+		error = xfs_scrub_teardown(&sc, ip, 0);
 		if (error)
 			goto out;
 		try_harder = true;
@@ -297,7 +310,7 @@ xfs_scrub_metadata(
 		xfs_alert_ratelimited(mp, "Corruption detected during scrub.");
 
 out_teardown:
-	error = xfs_scrub_teardown(&sc, error);
+	error = xfs_scrub_teardown(&sc, ip, error);
 out:
 	trace_xfs_scrub_done(ip, sm, error);
 	return error;
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 4cc4b84..fed8ed1 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -54,6 +54,7 @@ struct xfs_scrub_context {
 	const struct xfs_scrub_meta_ops	*ops;
 	struct xfs_trans		*tp;
 	struct xfs_inode		*ip;
+	uint				ilock_flags;
 	bool				try_harder;
 
 	/* State tracking for single-AG operations. */
@@ -72,5 +73,6 @@ int xfs_scrub_inobt(struct xfs_scrub_context *sc);
 int xfs_scrub_finobt(struct xfs_scrub_context *sc);
 int xfs_scrub_rmapbt(struct xfs_scrub_context *sc);
 int xfs_scrub_refcountbt(struct xfs_scrub_context *sc);
+int xfs_scrub_inode(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */


* [PATCH 19/27] xfs: scrub inode block mappings
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (17 preceding siblings ...)
  2017-09-21  0:19 ` [PATCH 18/27] xfs: scrub inodes Darrick J. Wong
@ 2017-09-21  0:19 ` Darrick J. Wong
  2017-09-21  0:19 ` [PATCH 20/27] xfs: scrub directory/attribute btrees Darrick J. Wong
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:19 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

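Scrub the block mappings in an individual inode's data, attr, and CoW
forks: check that the extents are in order, non-empty, and within the
bounds of the device they map to.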

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
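
To make the per-extent checks easier to picture, here is a simplified
userspace sketch of the ordering and bounds logic.  It is an
illustration only: struct demo_extent and demo_first_bad_extent() are
hypothetical, everything is measured in filesystem blocks rather than
the basic-block disk addresses the patch uses, and the wrap check on
the file offset is an addition of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mapping; all units are filesystem blocks. */
struct demo_extent {
	uint64_t	startoff;	/* offset within the file */
	uint64_t	startblock;	/* block on the device */
	uint64_t	blockcount;	/* length of the mapping */
};

/*
 * Walk the mappings in order and return the index of the first one
 * that fails a check, or nr if they all pass.  eofs is the number of
 * blocks on the device.
 */
static size_t
demo_first_bad_extent(
	const struct demo_extent	*ext,
	size_t				nr,
	uint64_t			eofs)
{
	uint64_t	lastoff = 0;
	size_t		i;

	assert(ext != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		/* Must not start before the previous mapping ended. */
		if (ext[i].startoff < lastoff)
			return i;
		/* Zero-length mappings are never valid. */
		if (ext[i].blockcount == 0)
			return i;
		/* The file offset range must not wrap (sketch-only check). */
		if (ext[i].startoff > UINT64_MAX - ext[i].blockcount)
			return i;
		/* The mapped blocks must end at or before the device end. */
		if (ext[i].startblock >= eofs ||
		    ext[i].blockcount > eofs - ext[i].startblock)
			return i;
		lastoff = ext[i].startoff + ext[i].blockcount;
	}
	return nr;
}
```

Tracking lastoff across iterations mirrors info->lastoff in the patch:
each mapping only needs to be compared against the end of the one
before it, so a single pass is enough to catch overlaps and misordered
records.
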
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    5 +
 fs/xfs/scrub/bmap.c    |  358 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.h  |    5 +
 fs/xfs/scrub/scrub.c   |   12 ++
 fs/xfs/scrub/scrub.h   |    3 
 6 files changed, 382 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/bmap.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 28e14b7..5a77489 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -148,6 +148,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   trace.o \
 				   agheader.o \
 				   alloc.o \
+				   bmap.o \
 				   btree.o \
 				   common.o \
 				   ialloc.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 2051c3d..27e9f90 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -495,9 +495,12 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_RMAPBT	9	/* reverse mapping btree */
 #define XFS_SCRUB_TYPE_REFCNTBT	10	/* reference count btree */
 #define XFS_SCRUB_TYPE_INODE	11	/* inode record */
+#define XFS_SCRUB_TYPE_BMBTD	12	/* data fork block mapping */
+#define XFS_SCRUB_TYPE_BMBTA	13	/* attr fork block mapping */
+#define XFS_SCRUB_TYPE_BMBTC	14	/* CoW fork block mapping */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	12
+#define XFS_SCRUB_TYPE_NR	15
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
new file mode 100644
index 0000000..353ed4f
--- /dev/null
+++ b/fs/xfs/scrub/bmap.c
@@ -0,0 +1,358 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_inode_fork.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_rmap.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+
+/* Set us up with an inode's bmap. */
+STATIC int
+__xfs_scrub_setup_inode_bmap(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip,
+	bool				flush_data)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	error = xfs_scrub_get_inode(sc, ip);
+	if (error)
+		return error;
+
+	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+
+	/*
+	 * We don't want any ephemeral data fork updates sitting around
+	 * while we inspect block mappings, so wait for directio to finish
+	 * and flush dirty data if we have delalloc reservations.
+	 */
+	if (S_ISREG(VFS_I(sc->ip)->i_mode) && flush_data) {
+		inode_dio_wait(VFS_I(sc->ip));
+		error = filemap_write_and_wait(VFS_I(sc->ip)->i_mapping);
+		if (error)
+			goto out_unlock;
+		error = invalidate_inode_pages2(VFS_I(sc->ip)->i_mapping);
+		if (error)
+			goto out_unlock;
+	}
+
+	/* Allocate a transaction, then take the ILOCK. */
+	error = xfs_scrub_trans_alloc(sc->sm, mp, &M_RES(mp)->tr_itruncate,
+			0, 0, 0, &sc->tp);
+	if (error)
+		goto out_unlock;
+	sc->ilock_flags |= XFS_ILOCK_EXCL;
+	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
+
+	return 0;
+out_unlock:
+	xfs_iunlock(sc->ip, sc->ilock_flags);
+	if (sc->ip != ip)
+		iput(VFS_I(sc->ip));
+	sc->ip = NULL;
+	return error;
+}
+
+/* Set us up to scrub the data fork. */
+int
+xfs_scrub_setup_inode_bmap_data(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return __xfs_scrub_setup_inode_bmap(sc, ip, true);
+}
+
+/* Set us up to scrub the attr or CoW fork. */
+int
+xfs_scrub_setup_inode_bmap(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return __xfs_scrub_setup_inode_bmap(sc, ip, false);
+}
+
+/*
+ * Inode fork block mapping (BMBT) scrubber.
+ * More complex than the others because we have to scrub
+ * all the extents regardless of whether or not the fork
+ * is in btree format.
+ */
+
+struct xfs_scrub_bmap_info {
+	struct xfs_scrub_context	*sc;
+	xfs_daddr_t			eofs;
+	xfs_fileoff_t			lastoff;
+	bool				is_rt;
+	bool				is_shared;
+	int				whichfork;
+};
+
+/* Scrub a single extent record. */
+STATIC int
+xfs_scrub_bmap_extent(
+	struct xfs_inode		*ip,
+	struct xfs_btree_cur		*cur,
+	struct xfs_scrub_bmap_info	*info,
+	struct xfs_bmbt_irec		*irec)
+{
+	struct xfs_scrub_ag		sa = { 0 };
+	struct xfs_mount		*mp = info->sc->mp;
+	struct xfs_buf			*bp = NULL;
+	xfs_daddr_t			daddr;
+	xfs_daddr_t			dlen;
+	xfs_fsblock_t			bno;
+	xfs_agnumber_t			agno;
+	int				error = 0;
+
+	if (cur)
+		xfs_btree_get_block(cur, 0, &bp);
+
+	xfs_scrub_fblock_check_ok(info->sc, info->whichfork, irec->br_startoff,
+			irec->br_startoff >= info->lastoff &&
+			irec->br_startblock != HOLESTARTBLOCK &&
+			!isnullstartblock(irec->br_startblock));
+
+	/* Actual mapping, so check the block ranges. */
+	if (info->is_rt) {
+		daddr = XFS_FSB_TO_BB(mp, irec->br_startblock);
+		agno = NULLAGNUMBER;
+		bno = irec->br_startblock;
+	} else {
+		daddr = XFS_FSB_TO_DADDR(mp, irec->br_startblock);
+		agno = XFS_FSB_TO_AGNO(mp, irec->br_startblock);
+		if (!xfs_scrub_fblock_check_ok(info->sc, info->whichfork,
+				irec->br_startoff, agno < mp->m_sb.sb_agcount))
+			goto out;
+		bno = XFS_FSB_TO_AGBNO(mp, irec->br_startblock);
+		xfs_scrub_fblock_check_ok(info->sc, info->whichfork,
+			irec->br_startoff, bno < mp->m_sb.sb_agblocks);
+	}
+	dlen = XFS_FSB_TO_BB(mp, irec->br_blockcount);
+	xfs_scrub_fblock_check_ok(info->sc, info->whichfork, irec->br_startoff,
+			irec->br_blockcount > 0 &&
+			irec->br_blockcount <= MAXEXTLEN &&
+			daddr < info->eofs &&
+			daddr + dlen <= info->eofs &&
+			(irec->br_state != XFS_EXT_UNWRITTEN ||
+			 xfs_sb_version_hasextflgbit(&mp->m_sb)));
+
+	/* Set ourselves up for cross-referencing later. */
+	if (!info->is_rt) {
+		error = xfs_scrub_ag_init(info->sc, agno, &sa);
+		if (!xfs_scrub_fblock_op_ok(info->sc, info->whichfork,
+				irec->br_startoff, &error))
+			goto out;
+	}
+
+	xfs_scrub_ag_free(info->sc, &sa);
+out:
+	info->lastoff = irec->br_startoff + irec->br_blockcount;
+	return error;
+}
+
+/* Scrub a bmbt record. */
+STATIC int
+xfs_scrub_bmapbt_helper(
+	struct xfs_scrub_btree		*bs,
+	union xfs_btree_rec		*rec)
+{
+	struct xfs_bmbt_rec_host	ihost;
+	struct xfs_bmbt_irec		irec;
+	struct xfs_scrub_bmap_info	*info = bs->private;
+	struct xfs_inode		*ip = bs->cur->bc_private.b.ip;
+	struct xfs_buf			*bp = NULL;
+	struct xfs_btree_block		*block;
+	uint64_t			owner;
+	int				i;
+
+	/*
+	 * Check the owners of the btree blocks up to the level below
+	 * the root since the verifiers don't do that.
+	 */
+	if (xfs_sb_version_hascrc(&bs->cur->bc_mp->m_sb) &&
+	    bs->cur->bc_ptrs[0] == 1) {
+		for (i = 0; i < bs->cur->bc_nlevels - 1; i++) {
+			block = xfs_btree_get_block(bs->cur, i, &bp);
+			owner = be64_to_cpu(block->bb_u.l.bb_owner);
+			xfs_scrub_fblock_check_ok(bs->sc, info->whichfork, 0,
+					owner == ip->i_ino);
+		}
+	}
+
+	/* Set up the in-core record and scrub it. */
+	ihost.l0 = be64_to_cpu(rec->bmbt.l0);
+	ihost.l1 = be64_to_cpu(rec->bmbt.l1);
+	xfs_bmbt_get_all(&ihost, &irec);
+	return xfs_scrub_bmap_extent(ip, bs->cur, info, &irec);
+}
+
+/* Scrub an inode fork's block mappings. */
+STATIC int
+xfs_scrub_bmap(
+	struct xfs_scrub_context	*sc,
+	int				whichfork)
+{
+	struct xfs_bmbt_irec		irec;
+	struct xfs_scrub_bmap_info	info = {0};
+	struct xfs_owner_info		oinfo;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_inode		*ip = sc->ip;
+	struct xfs_ifork		*ifp;
+	struct xfs_btree_cur		*cur;
+	xfs_fileoff_t			endoff;
+	xfs_extnum_t			idx;
+	bool				found;
+	int				error = 0;
+
+	ifp = XFS_IFORK_PTR(ip, whichfork);
+
+	info.is_rt = whichfork == XFS_DATA_FORK && XFS_IS_REALTIME_INODE(ip);
+	info.eofs = XFS_FSB_TO_BB(mp, info.is_rt ? mp->m_sb.sb_rblocks :
+					      mp->m_sb.sb_dblocks);
+	info.whichfork = whichfork;
+	info.is_shared = whichfork == XFS_DATA_FORK && xfs_is_reflink_inode(ip);
+	info.sc = sc;
+
+	switch (whichfork) {
+	case XFS_COW_FORK:
+		/* Non-existent CoW forks are ignorable. */
+		if (!ifp)
+			goto out_unlock;
+		/* No CoW forks on non-reflink inodes/filesystems. */
+		if (!xfs_scrub_ino_check_ok(sc, sc->ip->i_ino, NULL,
+				xfs_is_reflink_inode(ip)))
+			goto out_unlock;
+		break;
+	case XFS_ATTR_FORK:
+		if (!ifp)
+			goto out_unlock;
+		xfs_scrub_ino_check_ok(sc, sc->ip->i_ino, NULL,
+				xfs_sb_version_hasattr(&mp->m_sb) ||
+				xfs_sb_version_hasattr2(&mp->m_sb));
+		break;
+	}
+
+	/* Check the fork values */
+	switch (XFS_IFORK_FORMAT(ip, whichfork)) {
+	case XFS_DINODE_FMT_UUID:
+	case XFS_DINODE_FMT_DEV:
+	case XFS_DINODE_FMT_LOCAL:
+		/* No mappings to check. */
+		goto out_unlock;
+	case XFS_DINODE_FMT_EXTENTS:
+		if (!xfs_scrub_fblock_check_ok(sc, whichfork, 0,
+				ifp->if_flags & XFS_IFEXTENTS))
+			goto out_unlock;
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		ASSERT(whichfork != XFS_COW_FORK);
+
+		/* Scan the btree records. */
+		cur = xfs_bmbt_init_cursor(mp, sc->tp, ip, whichfork);
+		xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork);
+		error = xfs_scrub_btree(sc, cur, xfs_scrub_bmapbt_helper,
+				&oinfo, &info);
+		xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR :
+						  XFS_BTREE_NOERROR);
+		if (error == -EDEADLOCK)
+			return error;
+		else if (error)
+			goto out_unlock;
+		break;
+	default:
+		xfs_scrub_fblock_check_ok(sc, whichfork, 0, false);
+		goto out_unlock;
+	}
+
+	/* Extent data is in memory, so scrub that. */
+
+	/* Find the offset of the last extent in the mapping. */
+	error = xfs_bmap_last_offset(ip, &endoff, whichfork);
+	if (!xfs_scrub_fblock_op_ok(sc, whichfork, 0, &error))
+		goto out_unlock;
+
+	/* Scrub extent records. */
+	info.lastoff = 0;
+	ifp = XFS_IFORK_PTR(ip, whichfork);
+	for (found = xfs_iext_lookup_extent(ip, ifp, 0, &idx, &irec);
+	     found != 0;
+	     found = xfs_iext_get_extent(ifp, ++idx, &irec)) {
+		if (xfs_scrub_should_terminate(&error))
+			break;
+		if (isnullstartblock(irec.br_startblock))
+			continue;
+		if (!xfs_scrub_fblock_check_ok(sc, whichfork, irec.br_startoff,
+				irec.br_startoff < endoff))
+			goto out_unlock;
+		error = xfs_scrub_bmap_extent(ip, NULL, &info, &irec);
+		if (error == -EDEADLOCK)
+			return error;
+	}
+
+out_unlock:
+	return error;
+}
+
+/* Scrub an inode's data fork. */
+int
+xfs_scrub_bmap_data(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_bmap(sc, XFS_DATA_FORK);
+}
+
+/* Scrub an inode's attr fork. */
+int
+xfs_scrub_bmap_attr(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_scrub_bmap(sc, XFS_ATTR_FORK);
+}
+
+/* Scrub an inode's CoW fork. */
+int
+xfs_scrub_bmap_cow(
+	struct xfs_scrub_context	*sc)
+{
+	if (!xfs_is_reflink_inode(sc->ip))
+		return -ENOENT;
+
+	return xfs_scrub_bmap(sc, XFS_COW_FORK);
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index ae6b557..79c00ad 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -104,7 +104,10 @@ int xfs_scrub_setup_ag_refcountbt(struct xfs_scrub_context *sc,
 				  struct xfs_inode *ip);
 int xfs_scrub_setup_inode(struct xfs_scrub_context *sc,
 			  struct xfs_inode *ip);
-
+int xfs_scrub_setup_inode_bmap(struct xfs_scrub_context *sc,
+			       struct xfs_inode *ip);
+int xfs_scrub_setup_inode_bmap_data(struct xfs_scrub_context *sc,
+				    struct xfs_inode *ip);
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index b88f86e..cd896e5 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -218,6 +218,18 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_inode,
 		.scrub	= xfs_scrub_inode,
 	},
+	{ /* inode data fork */
+		.setup	= xfs_scrub_setup_inode_bmap_data,
+		.scrub	= xfs_scrub_bmap_data,
+	},
+	{ /* inode attr fork */
+		.setup	= xfs_scrub_setup_inode_bmap,
+		.scrub	= xfs_scrub_bmap_attr,
+	},
+	{ /* inode CoW fork */
+		.setup	= xfs_scrub_setup_inode_bmap,
+		.scrub	= xfs_scrub_bmap_cow,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index fed8ed1..75323e4 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -74,5 +74,8 @@ int xfs_scrub_finobt(struct xfs_scrub_context *sc);
 int xfs_scrub_rmapbt(struct xfs_scrub_context *sc);
 int xfs_scrub_refcountbt(struct xfs_scrub_context *sc);
 int xfs_scrub_inode(struct xfs_scrub_context *sc);
+int xfs_scrub_bmap_data(struct xfs_scrub_context *sc);
+int xfs_scrub_bmap_attr(struct xfs_scrub_context *sc);
+int xfs_scrub_bmap_cow(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
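
The scrub.c hunk above extends meta_scrub_ops, a table of setup/scrub
pairs that the dispatcher indexes by scrub type.  A minimal userspace
sketch of that kind of table-driven dispatch follows.  Every name here
(ex_ctx, ex_ops, ex_table, ex_dispatch) is hypothetical, and the error
returned for an out-of-range type is this sketch's own choice, not
something taken from the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xfs_scrub_context. */
struct ex_ctx {
	int	is_reflink;	/* pretend inode flag */
	int	checked;	/* set by a scrubber that actually ran */
};

/* One table entry: a setup hook and a scrub hook. */
struct ex_ops {
	int	(*setup)(struct ex_ctx *ctx);
	int	(*scrub)(struct ex_ctx *ctx);
};

static int ex_setup(struct ex_ctx *ctx)
{
	ctx->checked = 0;
	return 0;
}

static int ex_scrub_data(struct ex_ctx *ctx)
{
	ctx->checked = 1;
	return 0;
}

/* Like xfs_scrub_bmap_cow: nothing to check unless reflink is set. */
static int ex_scrub_cow(struct ex_ctx *ctx)
{
	if (!ctx->is_reflink)
		return -ENOENT;
	ctx->checked = 1;
	return 0;
}

enum { EX_TYPE_DATA, EX_TYPE_COW, EX_TYPE_NR };

static const struct ex_ops ex_table[EX_TYPE_NR] = {
	[EX_TYPE_DATA]	= { .setup = ex_setup, .scrub = ex_scrub_data },
	[EX_TYPE_COW]	= { .setup = ex_setup, .scrub = ex_scrub_cow },
};

/* Look up the handler pair by type, run setup, then scrub. */
static int ex_dispatch(struct ex_ctx *ctx, unsigned int type)
{
	int	error;

	if (type >= EX_TYPE_NR)
		return -ENOENT;	/* assumption: sketch-only error choice */
	error = ex_table[type].setup(ctx);
	if (error)
		return error;
	return ex_table[type].scrub(ctx);
}
```

Adding a new scrubber then only means appending one table entry and one
type number, which is what the later patches in this series do.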



* [PATCH 20/27] xfs: scrub directory/attribute btrees
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (18 preceding siblings ...)
  2017-09-21  0:19 ` [PATCH 19/27] xfs: scrub inode block mappings Darrick J. Wong
@ 2017-09-21  0:19 ` Darrick J. Wong
  2017-09-21  0:19 ` [PATCH 21/27] xfs: scrub directory metadata Darrick J. Wong
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:19 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, Fengguang Wu

From: Darrick J. Wong <darrick.wong@oracle.com>

Provide a way to check the shape of a directory or extended attribute
btree (block pointers, sibling links, block owners, and tree levels) and
to scrub its hashes and records.  These are helper functions for the
directory and attribute scrubbers in subsequent patches.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[fengguang: remove unneeded variable to store return value]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/scrub/dabtree.c |  539 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/dabtree.h |   52 +++++
 3 files changed, 592 insertions(+)
 create mode 100644 fs/xfs/scrub/dabtree.c
 create mode 100644 fs/xfs/scrub/dabtree.h
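
The hash check in xfs_scrub_da_btree_hash() reduces to two comparisons:
a hash must not be smaller than the previous hash seen at the same
level, and below the root it must not exceed the parent's hash.  A
minimal userspace sketch of that rule, with hypothetical names
(ex_hash_state, ex_check_hash) and an assumed depth limit, might look
like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_MAXDEPTH	5	/* assumed depth limit for this sketch */

struct ex_hash_state {
	uint32_t	last[EX_MAXDEPTH];	/* last hash seen per level */
	bool		corrupt;		/* sticky corruption flag */
};

/*
 * Record one hash at the given level.  Flags corruption if the hash
 * goes backwards within the level, or exceeds the parent's hash when
 * level > 0.  Like the scrubber, it notes the problem and keeps going.
 */
static void ex_check_hash(struct ex_hash_state *hs, int level,
			  uint32_t hash, uint32_t parent_hash)
{
	assert(level >= 0 && level < EX_MAXDEPTH);

	/* Is this hash in order? */
	if (hash < hs->last[level])
		hs->corrupt = true;
	hs->last[level] = hash;

	/* Is this hash no larger than the parent hash? */
	if (level > 0 && hash > parent_hash)
		hs->corrupt = true;
}
```

The flag is sticky on purpose: one bad hash marks the structure corrupt
without stopping the rest of the walk.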


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 5a77489..b48437f 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -151,6 +151,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   bmap.o \
 				   btree.o \
 				   common.o \
+				   dabtree.o \
 				   ialloc.o \
 				   inode.o \
 				   refcount.o \
diff --git a/fs/xfs/scrub/dabtree.c b/fs/xfs/scrub/dabtree.c
new file mode 100644
index 0000000..832774a
--- /dev/null
+++ b/fs/xfs/scrub/dabtree.c
@@ -0,0 +1,539 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_inode_fork.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_attr_leaf.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+#include "scrub/dabtree.h"
+
+/* Directory/Attribute Btree */
+
+/* Check for da btree operation errors. */
+bool
+xfs_scrub_da_op_ok(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	int				*error)
+{
+	struct xfs_scrub_context	*sc = ds->sc;
+
+	if (*error == 0)
+		return true;
+
+	switch (*error) {
+	case -EDEADLOCK:
+		/* Used to restart an op with deadlock avoidance. */
+		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
+		break;
+	case -EFSBADCRC:
+	case -EFSCORRUPTED:
+		/* Note the badness but don't abort. */
+		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+		*error = 0;
+		/* fall through */
+	default:
+		trace_xfs_scrub_file_op_error(sc, ds->dargs.whichfork,
+				xfs_dir2_da_to_db(ds->dargs.geo,
+					ds->state->path.blk[level].blkno),
+				*error, __return_address);
+		break;
+	}
+	return false;
+}
+
+/* Check for da btree corruption. */
+bool
+xfs_scrub_da_check_ok(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	bool				fs_ok)
+{
+	struct xfs_scrub_context	*sc = ds->sc;
+
+	if (fs_ok)
+		return fs_ok;
+
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+
+	trace_xfs_scrub_fblock_error(sc, ds->dargs.whichfork,
+			xfs_dir2_da_to_db(ds->dargs.geo,
+				ds->state->path.blk[level].blkno),
+			__return_address);
+	return fs_ok;
+}
+
+/* Find an entry at a certain level in a da btree. */
+STATIC void *
+xfs_scrub_da_btree_entry(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	int				rec)
+{
+	char				*ents;
+	void				*(*fn)(void *);
+	size_t				sz;
+	struct xfs_da_state_blk		*blk;
+
+	/* Dispatch the entry finding function. */
+	blk = &ds->state->path.blk[level];
+	switch (blk->magic) {
+	case XFS_ATTR_LEAF_MAGIC:
+	case XFS_ATTR3_LEAF_MAGIC:
+		fn = (xfs_da_leaf_ents_fn)xfs_attr3_leaf_entryp;
+		sz = sizeof(struct xfs_attr_leaf_entry);
+		break;
+	case XFS_DIR2_LEAFN_MAGIC:
+	case XFS_DIR3_LEAFN_MAGIC:
+		fn = (xfs_da_leaf_ents_fn)ds->dargs.dp->d_ops->leaf_ents_p;
+		sz = sizeof(struct xfs_dir2_leaf_entry);
+		break;
+	case XFS_DIR2_LEAF1_MAGIC:
+	case XFS_DIR3_LEAF1_MAGIC:
+		fn = (xfs_da_leaf_ents_fn)ds->dargs.dp->d_ops->leaf_ents_p;
+		sz = sizeof(struct xfs_dir2_leaf_entry);
+		break;
+	case XFS_DA_NODE_MAGIC:
+	case XFS_DA3_NODE_MAGIC:
+		fn = (xfs_da_leaf_ents_fn)ds->dargs.dp->d_ops->node_tree_p;
+		sz = sizeof(struct xfs_da_node_entry);
+		break;
+	default:
+		return NULL;
+	}
+
+	ents = fn(blk->bp->b_addr);
+	return ents + (sz * rec);
+}
+
+/* Scrub a da btree hash (key). */
+int
+xfs_scrub_da_btree_hash(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	__be32				*hashp)
+{
+	struct xfs_da_state_blk		*blks;
+	struct xfs_da_node_entry	*btree;
+	xfs_dahash_t			hash;
+	xfs_dahash_t			parent_hash;
+
+	/* Is this hash in order? */
+	hash = be32_to_cpu(*hashp);
+	xfs_scrub_da_check_ok(ds, level, hash >= ds->hashes[level]);
+	ds->hashes[level] = hash;
+
+	if (level == 0)
+		return 0;
+
+	/* Is this hash no larger than the parent hash? */
+	blks = ds->state->path.blk;
+	btree = xfs_scrub_da_btree_entry(ds, level - 1, blks[level - 1].index);
+	parent_hash = be32_to_cpu(btree->hashval);
+	xfs_scrub_da_check_ok(ds, level, hash <= parent_hash);
+
+	return 0;
+}
+
+/* Scrub a da btree pointer. */
+STATIC int
+xfs_scrub_da_btree_ptr(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	xfs_dablk_t			blkno)
+{
+	xfs_scrub_da_check_ok(ds, level, blkno >= ds->lowest &&
+			(ds->highest == 0 || blkno < ds->highest));
+
+	return 0;
+}
+
+/*
+ * The da btree scrubber can handle leaf1 blocks as a degenerate
+ * form of da btree.  Since the regular da code doesn't handle
+ * leaf1, we must multiplex the verifiers.
+ */
+static void
+xfs_scrub_da_btree_read_verify(
+	struct xfs_buf		*bp)
+{
+	struct xfs_da_blkinfo	*info = bp->b_addr;
+
+	switch (be16_to_cpu(info->magic)) {
+	case XFS_DIR2_LEAF1_MAGIC:
+	case XFS_DIR3_LEAF1_MAGIC:
+		bp->b_ops = &xfs_dir3_leaf1_buf_ops;
+		bp->b_ops->verify_read(bp);
+		return;
+	default:
+		bp->b_ops = &xfs_da3_node_buf_ops;
+		bp->b_ops->verify_read(bp);
+		return;
+	}
+}
+static void
+xfs_scrub_da_btree_write_verify(
+	struct xfs_buf	*bp)
+{
+	struct xfs_da_blkinfo	*info = bp->b_addr;
+
+	switch (be16_to_cpu(info->magic)) {
+	case XFS_DIR2_LEAF1_MAGIC:
+	case XFS_DIR3_LEAF1_MAGIC:
+		bp->b_ops = &xfs_dir3_leaf1_buf_ops;
+		bp->b_ops->verify_write(bp);
+		return;
+	default:
+		bp->b_ops = &xfs_da3_node_buf_ops;
+		bp->b_ops->verify_write(bp);
+		return;
+	}
+}
+
+static const struct xfs_buf_ops xfs_scrub_da_btree_buf_ops = {
+	.name = "xfs_scrub_da_btree",
+	.verify_read = xfs_scrub_da_btree_read_verify,
+	.verify_write = xfs_scrub_da_btree_write_verify,
+};
+
+/* Check a block's sibling. */
+STATIC int
+xfs_scrub_da_btree_block_check_sibling(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	int				direction,
+	xfs_dablk_t			sibling)
+{
+	int				retval;
+	int				error;
+
+	if (!sibling)
+		return 0;
+
+	/* Move the alternate cursor one block in the direction given. */
+	memcpy(&ds->state->altpath, &ds->state->path,
+			sizeof(ds->state->altpath));
+	error = xfs_da3_path_shift(ds->state, &ds->state->altpath,
+			direction, false, &retval);
+	if (!xfs_scrub_da_op_ok(ds, level, &error) ||
+	    !xfs_scrub_da_check_ok(ds, level, retval == 0))
+		return error;
+
+	xfs_scrub_da_check_ok(ds, level,
+			ds->state->altpath.blk[level].blkno == sibling);
+	xfs_trans_brelse(ds->dargs.trans, ds->state->altpath.blk[level].bp);
+	return error;
+}
+
+/* Check a block's sibling pointers. */
+STATIC int
+xfs_scrub_da_btree_block_check_siblings(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	struct xfs_da_blkinfo		*hdr)
+{
+	xfs_dablk_t			forw;
+	xfs_dablk_t			back;
+	int				error = 0;
+
+	forw = be32_to_cpu(hdr->forw);
+	back = be32_to_cpu(hdr->back);
+
+	/* Top level blocks should not have sibling pointers. */
+	if (level == 0) {
+		xfs_scrub_da_check_ok(ds, level, forw == 0 && back == 0);
+		return error;
+	}
+
+	/* Check back (left) pointer. */
+	error = xfs_scrub_da_btree_block_check_sibling(ds, level, 0, back);
+	if (error)
+		goto out;
+
+	/* Check forw (right) pointer. */
+	error = xfs_scrub_da_btree_block_check_sibling(ds, level, 1, forw);
+
+out:
+	memset(&ds->state->altpath, 0, sizeof(ds->state->altpath));
+	return error;
+}
+
+/* Load a dir/attribute block from a btree. */
+STATIC int
+xfs_scrub_da_btree_block(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	xfs_dablk_t			blkno)
+{
+	struct xfs_da_state_blk		*blk;
+	struct xfs_da_intnode		*node;
+	struct xfs_da_node_entry	*btree;
+	struct xfs_da3_blkinfo		*hdr3;
+	struct xfs_da_args		*dargs = &ds->dargs;
+	struct xfs_inode		*ip = ds->dargs.dp;
+	xfs_ino_t			owner;
+	int				*pmaxrecs;
+	struct xfs_da3_icnode_hdr	nodehdr;
+	int				error;
+
+	blk = &ds->state->path.blk[level];
+	ds->state->path.active = level + 1;
+
+	/* Release old block. */
+	if (blk->bp) {
+		xfs_trans_brelse(dargs->trans, blk->bp);
+		blk->bp = NULL;
+	}
+
+	/* Check the pointer. */
+	blk->blkno = blkno;
+	error = xfs_scrub_da_btree_ptr(ds, level, blkno);
+	if (error) {
+		blk->blkno = 0;
+		goto out;
+	}
+
+	/* Read the buffer. */
+	error = xfs_da_read_buf(dargs->trans, dargs->dp, blk->blkno, -2,
+			&blk->bp, dargs->whichfork,
+			&xfs_scrub_da_btree_buf_ops);
+	if (!xfs_scrub_da_op_ok(ds, level, &error))
+		goto out_nobuf;
+
+	/* It's ok for a directory not to have a da btree in it. */
+	if (ds->dargs.whichfork == XFS_DATA_FORK && level == 0 &&
+			blk->bp == NULL)
+		goto out_nobuf;
+	if (!xfs_scrub_da_check_ok(ds, level, blk->bp != NULL))
+		goto out_nobuf;
+
+	hdr3 = blk->bp->b_addr;
+	blk->magic = be16_to_cpu(hdr3->hdr.magic);
+	pmaxrecs = &ds->maxrecs[level];
+
+	/* Check the owner. */
+	if (xfs_sb_version_hascrc(&ip->i_mount->m_sb)) {
+		owner = be64_to_cpu(hdr3->owner);
+		error = -EFSCORRUPTED;
+		if (!xfs_scrub_da_check_ok(ds, level, owner == ip->i_ino))
+			goto out;
+	}
+
+	/* Check the siblings. */
+	error = xfs_scrub_da_btree_block_check_siblings(ds, level, &hdr3->hdr);
+	if (error)
+		goto out;
+
+	/* Interpret the buffer. */
+	error = -EFSCORRUPTED;
+	switch (blk->magic) {
+	case XFS_ATTR_LEAF_MAGIC:
+	case XFS_ATTR3_LEAF_MAGIC:
+		xfs_trans_buf_set_type(dargs->trans, blk->bp,
+				XFS_BLFT_ATTR_LEAF_BUF);
+		blk->magic = XFS_ATTR_LEAF_MAGIC;
+		blk->hashval = xfs_attr_leaf_lasthash(blk->bp, pmaxrecs);
+		xfs_scrub_da_check_ok(ds, level, ds->tree_level == 0);
+		break;
+	case XFS_DIR2_LEAFN_MAGIC:
+	case XFS_DIR3_LEAFN_MAGIC:
+		xfs_trans_buf_set_type(dargs->trans, blk->bp,
+				XFS_BLFT_DIR_LEAFN_BUF);
+		blk->magic = XFS_DIR2_LEAFN_MAGIC;
+		blk->hashval = xfs_dir2_leaf_lasthash(ip, blk->bp, pmaxrecs);
+		xfs_scrub_da_check_ok(ds, level, ds->tree_level == 0);
+		break;
+	case XFS_DIR2_LEAF1_MAGIC:
+	case XFS_DIR3_LEAF1_MAGIC:
+		xfs_trans_buf_set_type(dargs->trans, blk->bp,
+				XFS_BLFT_DIR_LEAF1_BUF);
+		blk->magic = XFS_DIR2_LEAF1_MAGIC;
+		blk->hashval = xfs_dir2_leaf_lasthash(ip, blk->bp, pmaxrecs);
+		xfs_scrub_da_check_ok(ds, level, ds->tree_level == 0);
+		break;
+	case XFS_DA_NODE_MAGIC:
+	case XFS_DA3_NODE_MAGIC:
+		xfs_trans_buf_set_type(dargs->trans, blk->bp,
+				XFS_BLFT_DA_NODE_BUF);
+		blk->magic = XFS_DA_NODE_MAGIC;
+		node = blk->bp->b_addr;
+		ip->d_ops->node_hdr_from_disk(&nodehdr, node);
+		btree = ip->d_ops->node_tree_p(node);
+		*pmaxrecs = nodehdr.count;
+		blk->hashval = be32_to_cpu(btree[*pmaxrecs - 1].hashval);
+		if (level == 0) {
+			if (!xfs_scrub_da_check_ok(ds, level,
+					nodehdr.level < XFS_DA_NODE_MAXDEPTH))
+				goto out;
+			ds->tree_level = nodehdr.level;
+		} else
+			if (!xfs_scrub_da_check_ok(ds, level,
+					ds->tree_level == nodehdr.level))
+				goto out;
+		break;
+	default:
+		xfs_scrub_da_check_ok(ds, level, false);
+		xfs_trans_brelse(dargs->trans, blk->bp);
+		blk->bp = NULL;
+		blk->blkno = 0;
+		break;
+	}
+	error = 0;
+
+out:
+	return error;
+out_nobuf:
+	blk->blkno = 0;
+	return error;
+}
+
+/* Visit all nodes and leaves of a da btree. */
+int
+xfs_scrub_da_btree(
+	struct xfs_scrub_context	*sc,
+	int				whichfork,
+	xfs_scrub_da_btree_rec_fn	scrub_fn)
+{
+	struct xfs_scrub_da_btree	ds;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_da_state_blk		*blks;
+	struct xfs_da_node_entry	*key;
+	void				*rec;
+	xfs_dablk_t			blkno;
+	bool				is_attr;
+	int				level;
+	int				error;
+
+	memset(&ds, 0, sizeof(ds));
+	/* Skip short format data structures; no btree to scan. */
+	if (XFS_IFORK_FORMAT(sc->ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
+	    XFS_IFORK_FORMAT(sc->ip, whichfork) != XFS_DINODE_FMT_BTREE)
+		return 0;
+
+	/* Set up initial da state. */
+	is_attr = whichfork == XFS_ATTR_FORK;
+	ds.dargs.geo = is_attr ? mp->m_attr_geo : mp->m_dir_geo;
+	ds.dargs.dp = sc->ip;
+	ds.dargs.whichfork = whichfork;
+	ds.dargs.trans = sc->tp;
+	ds.dargs.op_flags = XFS_DA_OP_OKNOENT;
+	ds.state = xfs_da_state_alloc();
+	ds.state->args = &ds.dargs;
+	ds.state->mp = mp;
+	ds.sc = sc;
+	blkno = ds.lowest = is_attr ? 0 : ds.dargs.geo->leafblk;
+	ds.highest = is_attr ? 0 : ds.dargs.geo->freeblk;
+	level = 0;
+
+	/* Find the root of the da tree, if present. */
+	blks = ds.state->path.blk;
+	error = xfs_scrub_da_btree_block(&ds, level, blkno);
+	if (error)
+		goto out_state;
+	if (blks[level].bp == NULL)
+		goto out_state;
+
+	blks[level].index = 0;
+	while (level >= 0 && level < XFS_DA_NODE_MAXDEPTH) {
+		/* Handle leaf block. */
+		if (blks[level].magic != XFS_DA_NODE_MAGIC) {
+			/* End of leaf, pop back towards the root. */
+			if (blks[level].index >= ds.maxrecs[level]) {
+				if (level > 0)
+					blks[level - 1].index++;
+				ds.tree_level++;
+				level--;
+				continue;
+			}
+
+			/* Dispatch record scrubbing. */
+			rec = xfs_scrub_da_btree_entry(&ds, level,
+					blks[level].index);
+			error = scrub_fn(&ds, level, rec);
+			if (error < 0 ||
+			    error == XFS_BTREE_QUERY_RANGE_ABORT)
+				break;
+			if (xfs_scrub_should_terminate(&error))
+				break;
+
+			blks[level].index++;
+			continue;
+		}
+
+
+		/* End of node, pop back towards the root. */
+		if (blks[level].index >= ds.maxrecs[level]) {
+			if (level > 0)
+				blks[level - 1].index++;
+			ds.tree_level++;
+			level--;
+			continue;
+		}
+
+		/* Hashes in order for scrub? */
+		key = xfs_scrub_da_btree_entry(&ds, level, blks[level].index);
+		error = xfs_scrub_da_btree_hash(&ds, level, &key->hashval);
+		if (error)
+			goto out;
+
+		/* Drill another level deeper. */
+		blkno = be32_to_cpu(key->before);
+		level++;
+		ds.tree_level--;
+		error = xfs_scrub_da_btree_block(&ds, level, blkno);
+		if (error)
+			goto out;
+		if (blks[level].bp == NULL)
+			goto out;
+
+		blks[level].index = 0;
+	}
+
+out:
+	/* Release all the buffers we're tracking. */
+	for (level = 0; level < XFS_DA_NODE_MAXDEPTH; level++) {
+		if (blks[level].bp == NULL)
+			continue;
+		xfs_trans_brelse(sc->tp, blks[level].bp);
+		blks[level].bp = NULL;
+	}
+
+out_state:
+	xfs_da_state_free(ds.state);
+	return error;
+}
diff --git a/fs/xfs/scrub/dabtree.h b/fs/xfs/scrub/dabtree.h
new file mode 100644
index 0000000..75254f3
--- /dev/null
+++ b/fs/xfs/scrub/dabtree.h
@@ -0,0 +1,52 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef __XFS_SCRUB_DABTREE_H__
+#define __XFS_SCRUB_DABTREE_H__
+
+/* dir/attr btree */
+
+struct xfs_scrub_da_btree {
+	struct xfs_da_args		dargs;
+	xfs_dahash_t			hashes[XFS_DA_NODE_MAXDEPTH];
+	int				maxrecs[XFS_DA_NODE_MAXDEPTH];
+	struct xfs_da_state		*state;
+	struct xfs_scrub_context	*sc;
+	xfs_dablk_t			lowest;
+	xfs_dablk_t			highest;
+	int				tree_level;
+};
+
+typedef void *(*xfs_da_leaf_ents_fn)(void *);
+typedef int (*xfs_scrub_da_btree_rec_fn)(struct xfs_scrub_da_btree *ds,
+		int level, void *rec);
+
+/* Check for da btree operation errors. */
+bool xfs_scrub_da_op_ok(struct xfs_scrub_da_btree *ds, int level, int *error);
+
+/* Check for da btree corruption. */
+bool xfs_scrub_da_check_ok(struct xfs_scrub_da_btree *ds, int level,
+			   bool fs_ok);
+
+int xfs_scrub_da_btree_hash(struct xfs_scrub_da_btree *ds, int level,
+			    __be32 *hashp);
+int xfs_scrub_da_btree(struct xfs_scrub_context *sc, int whichfork,
+		       xfs_scrub_da_btree_rec_fn scrub_fn);
+
+#endif /* __XFS_SCRUB_DABTREE_H__ */
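
The main loop of xfs_scrub_da_btree() above is an iterative depth-first
walk: it keeps one cursor index per level, drills down through interior
nodes, visits each leaf record, and pops back toward the root when a
block is exhausted.  A minimal sketch of the same walk over a toy
in-memory tree is given below.  The types and names (ex_block, ex_walk)
are hypothetical, and the toy tree uses direct pointers where the real
code reads buffers from disk.

```c
#include <assert.h>
#include <stddef.h>

#define EX_MAXDEPTH	5	/* assumed depth limit for this sketch */
#define EX_MAXRECS	4

struct ex_block {
	int			is_leaf;
	int			nrecs;
	int			recs[EX_MAXRECS];	/* leaf payload */
	const struct ex_block	*child[EX_MAXRECS];	/* node pointers */
};

/*
 * Visit every leaf record in order, storing up to max_out of them in
 * out[].  Returns the number of records seen, or -1 if the tree is
 * deeper than EX_MAXDEPTH.
 */
static int ex_walk(const struct ex_block *root, int *out, int max_out)
{
	const struct ex_block	*path[EX_MAXDEPTH];
	int			index[EX_MAXDEPTH];
	int			level = 0;
	int			n = 0;

	path[0] = root;
	index[0] = 0;
	while (level >= 0) {
		const struct ex_block	*blk = path[level];

		/* End of block: advance the parent, pop toward the root. */
		if (index[level] >= blk->nrecs) {
			if (level > 0)
				index[level - 1]++;
			level--;
			continue;
		}

		/* Leaf block: visit the record and move on. */
		if (blk->is_leaf) {
			if (n < max_out)
				out[n] = blk->recs[index[level]];
			n++;
			index[level]++;
			continue;
		}

		/* Interior node: drill one level deeper. */
		if (level + 1 >= EX_MAXDEPTH)
			return -1;
		path[level + 1] = blk->child[index[level]];
		index[level + 1] = 0;
		level++;
	}
	return n;
}
```

Keeping the path in fixed-size arrays bounds the stack use, which
matters more in kernel code than it does in this sketch.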



* [PATCH 21/27] xfs: scrub directory metadata
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (19 preceding siblings ...)
  2017-09-21  0:19 ` [PATCH 20/27] xfs: scrub directory/attribute btrees Darrick J. Wong
@ 2017-09-21  0:19 ` Darrick J. Wong
  2017-09-21  0:19 ` [PATCH 22/27] xfs: scrub directory freespace Darrick J. Wong
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:19 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Scrub the hash tree and all the entries in a directory.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    3 
 fs/xfs/scrub/common.c  |   33 +++++
 fs/xfs/scrub/common.h  |    4 +
 fs/xfs/scrub/dir.c     |  289 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c   |    4 +
 fs/xfs/scrub/scrub.h   |    1 
 7 files changed, 334 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/dir.c
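
xfs_scrub_dir_check_ftype() in this patch converts an inode mode to a
DT_* value by masking with S_IFMT and shifting right by 12.  A small
sketch of that arithmetic follows.  The EX_* constants restate the
values used by the Linux headers so that the block stands alone, and
the helper names are hypothetical.

```c
#include <assert.h>

/* File type bits of a mode, restated locally for this sketch. */
#define EX_S_IFMT	0170000
#define EX_S_IFDIR	0040000
#define EX_S_IFREG	0100000
#define EX_S_IFLNK	0120000

/* Directory entry types, restated locally for this sketch. */
#define EX_DT_UNKNOWN	0
#define EX_DT_DIR	4
#define EX_DT_REG	8
#define EX_DT_LNK	10

/* Keep only the file type bits, then shift them down to a DT_ value. */
static int ex_mode_to_dtype(unsigned int mode)
{
	return (mode & EX_S_IFMT) >> 12;
}

/* Does the dirent's recorded type agree with the inode's mode? */
static int ex_ftype_matches(unsigned int mode, int dtype)
{
	return ex_mode_to_dtype(mode) == dtype;
}
```

For example, a directory with mode 040755 maps to 4 (EX_DT_DIR), so a
dirent that records a regular file for it would be flagged as a
mismatch.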


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index b48437f..69aa88e 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -152,6 +152,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   btree.o \
 				   common.o \
 				   dabtree.o \
+				   dir.o \
 				   ialloc.o \
 				   inode.o \
 				   refcount.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 27e9f90..8d2bea5 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -498,9 +498,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_BMBTD	12	/* data fork block mapping */
 #define XFS_SCRUB_TYPE_BMBTA	13	/* attr fork block mapping */
 #define XFS_SCRUB_TYPE_BMBTC	14	/* CoW fork block mapping */
+#define XFS_SCRUB_TYPE_DIR	15	/* directory */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	15
+#define XFS_SCRUB_TYPE_NR	16
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 2a1d456..a332610 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -601,3 +601,36 @@ xfs_scrub_get_inode(
 	sc->ip = ips;
 	return 0;
 }
+
+/* Set us up to scrub a file's contents. */
+int
+xfs_scrub_setup_inode_contents(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip,
+	unsigned int			resblks)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	error = xfs_scrub_get_inode(sc, ip);
+	if (error)
+		return error;
+
+	/* Got the inode, lock it and we're ready to go. */
+	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+	error = xfs_scrub_trans_alloc(sc->sm, mp, &M_RES(mp)->tr_itruncate,
+			resblks, 0, 0, &sc->tp);
+	if (error)
+		goto out_unlock;
+	sc->ilock_flags |= XFS_ILOCK_EXCL;
+	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
+
+	return 0;
+out_unlock:
+	xfs_iunlock(sc->ip, sc->ilock_flags);
+	if (sc->ip != ip)
+		iput(VFS_I(sc->ip));
+	sc->ip = NULL;
+	return error;
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 79c00ad..82b8056 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -108,6 +108,8 @@ int xfs_scrub_setup_inode_bmap(struct xfs_scrub_context *sc,
 			       struct xfs_inode *ip);
 int xfs_scrub_setup_inode_bmap_data(struct xfs_scrub_context *sc,
 				    struct xfs_inode *ip);
+int xfs_scrub_setup_directory(struct xfs_scrub_context *sc,
+			      struct xfs_inode *ip);
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
@@ -128,5 +130,7 @@ int xfs_scrub_walk_agfl(struct xfs_scrub_context *sc,
 int xfs_scrub_setup_ag_btree(struct xfs_scrub_context *sc,
 			     struct xfs_inode *ip, bool force_log);
 int xfs_scrub_get_inode(struct xfs_scrub_context *sc, struct xfs_inode *ip_in);
+int xfs_scrub_setup_inode_contents(struct xfs_scrub_context *sc,
+				   struct xfs_inode *ip, unsigned int resblks);
 
 #endif	/* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
new file mode 100644
index 0000000..57adca5
--- /dev/null
+++ b/fs/xfs/scrub/dir.c
@@ -0,0 +1,289 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_itable.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+#include "scrub/dabtree.h"
+
+/* Set us up to scrub directories. */
+int
+xfs_scrub_setup_directory(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_inode_contents(sc, ip, 0);
+}
+
+/* Directories */
+
+/* Scrub a directory entry. */
+
+struct xfs_scrub_dir_ctx {
+	struct dir_context		dc;
+	struct xfs_scrub_context	*sc;
+};
+
+/* Check that an inode's mode matches a given DT_ type. */
+STATIC int
+xfs_scrub_dir_check_ftype(
+	struct xfs_scrub_dir_ctx	*sdc,
+	xfs_fileoff_t			offset,
+	xfs_ino_t			inum,
+	int				dtype)
+{
+	struct xfs_mount		*mp = sdc->sc->mp;
+	struct xfs_inode		*ip;
+	int				ino_dtype;
+	int				error = 0;
+
+	if (!xfs_sb_version_hasftype(&mp->m_sb)) {
+		xfs_scrub_fblock_check_ok(sdc->sc, XFS_DATA_FORK, offset,
+				dtype == DT_UNKNOWN || dtype == DT_DIR);
+		goto out;
+	}
+
+	error = xfs_iget(mp, sdc->sc->tp, inum, 0, 0, &ip);
+	if (!xfs_scrub_fblock_op_ok(sdc->sc, XFS_DATA_FORK, offset, &error))
+		goto out;
+
+	/* Convert mode to the DT_* values that dir_emit uses. */
+	ino_dtype = (VFS_I(ip)->i_mode & S_IFMT) >> 12;
+	xfs_scrub_fblock_check_ok(sdc->sc, XFS_DATA_FORK, offset,
+			ino_dtype == dtype);
+	iput(VFS_I(ip));
+out:
+	return error;
+}
+
+/* Scrub a single directory entry. */
+STATIC int
+xfs_scrub_dir_actor(
+	struct dir_context		*dc,
+	const char			*name,
+	int				namelen,
+	loff_t				pos,
+	u64				ino,
+	unsigned			type)
+{
+	struct xfs_mount		*mp;
+	struct xfs_inode		*ip;
+	struct xfs_scrub_dir_ctx	*sdc;
+	struct xfs_name			xname;
+	xfs_ino_t			lookup_ino;
+	xfs_dablk_t			offset;
+	int				error = 0;
+
+	sdc = container_of(dc, struct xfs_scrub_dir_ctx, dc);
+	ip = sdc->sc->ip;
+	mp = ip->i_mount;
+	offset = xfs_dir2_db_to_da(mp->m_dir_geo,
+			xfs_dir2_dataptr_to_db(mp->m_dir_geo, pos));
+
+	/* Does this inode number make sense? */
+	if (!xfs_scrub_fblock_check_ok(sdc->sc, XFS_DATA_FORK, offset,
+			xfs_dir_ino_validate(mp, ino) == 0 &&
+			!xfs_internal_inum(mp, ino)))
+		goto out;
+
+	/* Verify that we can look up this name by hash. */
+	xname.name = name;
+	xname.len = namelen;
+	xname.type = XFS_DIR3_FT_UNKNOWN;
+
+	error = xfs_dir_lookup(sdc->sc->tp, ip, &xname, &lookup_ino, NULL);
+	if (!xfs_scrub_fblock_op_ok(sdc->sc, XFS_DATA_FORK, offset, &error))
+		goto fail_xref;
+	if (!xfs_scrub_fblock_check_ok(sdc->sc, XFS_DATA_FORK, offset,
+			lookup_ino == ino))
+		goto out;
+
+	if (!strncmp(".", name, namelen)) {
+		/* If this is "." then check that the inum matches the dir. */
+		xfs_scrub_fblock_check_ok(sdc->sc, XFS_DATA_FORK, offset,
+				(!xfs_sb_version_hasftype(&mp->m_sb) ||
+				 type == DT_DIR) &&
+				ino == ip->i_ino);
+	} else if (!strncmp("..", name, namelen)) {
+		/*
+		 * If this is ".." in the root inode, check that the inum
+		 * matches this dir.
+		 */
+		xfs_scrub_fblock_check_ok(sdc->sc, XFS_DATA_FORK, offset,
+				(!xfs_sb_version_hasftype(&mp->m_sb) ||
+				 type == DT_DIR) &&
+				(ip->i_ino != mp->m_sb.sb_rootino ||
+				 ino == ip->i_ino));
+	}
+
+	/* Verify the file type. */
+	error = xfs_scrub_dir_check_ftype(sdc, offset, lookup_ino, type);
+	if (error)
+		goto out;
+out:
+	return error;
+fail_xref:
+	return error ? error : -EFSCORRUPTED;
+}
+
+/* Scrub a directory btree record. */
+STATIC int
+xfs_scrub_dir_rec(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	void				*rec)
+{
+	struct xfs_mount		*mp = ds->state->mp;
+	struct xfs_dir2_leaf_entry	*ent = rec;
+	struct xfs_inode		*dp = ds->dargs.dp;
+	struct xfs_dir2_data_entry	*dent;
+	struct xfs_buf			*bp;
+	xfs_ino_t			ino;
+	xfs_dablk_t			rec_bno;
+	xfs_dir2_db_t			db;
+	xfs_dir2_data_aoff_t		off;
+	xfs_dir2_dataptr_t		ptr;
+	xfs_dahash_t			calc_hash;
+	xfs_dahash_t			hash;
+	unsigned int			tag;
+	int				error;
+
+	/* Check the hash of the entry. */
+	error = xfs_scrub_da_btree_hash(ds, level, &ent->hashval);
+	if (error)
+		goto out;
+
+	/* Valid hash pointer? */
+	ptr = be32_to_cpu(ent->address);
+	if (ptr == 0)
+		return 0;
+
+	/* Find the directory entry's location. */
+	db = xfs_dir2_dataptr_to_db(mp->m_dir_geo, ptr);
+	off = xfs_dir2_dataptr_to_off(mp->m_dir_geo, ptr);
+	rec_bno = xfs_dir2_db_to_da(mp->m_dir_geo, db);
+
+	if (!xfs_scrub_da_check_ok(ds, level, rec_bno < mp->m_dir_geo->leafblk))
+		goto out;
+	error = xfs_dir3_data_read(ds->dargs.trans, dp, rec_bno, -2, &bp);
+	if (!xfs_scrub_fblock_op_ok(ds->sc, XFS_DATA_FORK, rec_bno, &error) ||
+	    !xfs_scrub_fblock_check_ok(ds->sc, XFS_DATA_FORK, rec_bno,
+			bp != NULL))
+		goto out;
+
+	/* Retrieve the entry and check it. */
+	dent = (struct xfs_dir2_data_entry *)(((char *)bp->b_addr) + off);
+	ino = be64_to_cpu(dent->inumber);
+	hash = be32_to_cpu(ent->hashval);
+	tag = be16_to_cpup(dp->d_ops->data_entry_tag_p(dent));
+	xfs_scrub_fblock_check_ok(ds->sc, XFS_DATA_FORK, rec_bno,
+			xfs_dir_ino_validate(mp, ino) == 0 &&
+			!xfs_internal_inum(mp, ino) &&
+			tag == off);
+	if (!xfs_scrub_fblock_check_ok(ds->sc, XFS_DATA_FORK, rec_bno,
+			dent->namelen > 0))
+		goto out_relse;
+	calc_hash = xfs_da_hashname(dent->name, dent->namelen);
+	xfs_scrub_fblock_check_ok(ds->sc, XFS_DATA_FORK, rec_bno,
+			calc_hash == hash);
+
+out_relse:
+	xfs_trans_brelse(ds->dargs.trans, bp);
+out:
+	return error;
+}
+
+/* Scrub a whole directory. */
+int
+xfs_scrub_directory(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_scrub_dir_ctx	sdc = {
+		.dc.actor = xfs_scrub_dir_actor,
+		.dc.pos = 0,
+	};
+	size_t				bufsize;
+	loff_t				oldpos;
+	int				error = 0;
+
+	if (!S_ISDIR(VFS_I(sc->ip)->i_mode))
+		return -ENOENT;
+
+	/* Plausible size? */
+	if (!xfs_scrub_ino_check_ok(sc, sc->ip->i_ino, NULL,
+			sc->ip->i_d.di_size >= xfs_dir2_sf_hdr_size(0)))
+		goto out;
+
+	/* Check directory tree structure */
+	error = xfs_scrub_da_btree(sc, XFS_DATA_FORK, xfs_scrub_dir_rec);
+	if (error)
+		return error;
+
+	/*
+	 * Check that every dirent we see can also be looked up by hash.
+	 * Userspace usually asks for a 32k buffer, so we will too.
+	 */
+	bufsize = (size_t)min_t(loff_t, 32768, sc->ip->i_d.di_size);
+	sdc.sc = sc;
+
+	/*
+	 * Look up every name in this directory by hash.
+	 *
+	 * The VFS grabs a read or write lock via i_rwsem before it reads
+	 * or writes to a directory.  If we've gotten this far we've
+	 * already obtained IOLOCK_EXCL, which (since 4.10) is the same as
+	 * getting a write lock on i_rwsem.  Therefore, it is safe for us
+	 * to drop the ILOCK here in order to reuse the _readdir and
+	 * _dir_lookup routines, which do their own ILOCK locking.
+	 */
+	oldpos = 0;
+	sc->ilock_flags &= ~XFS_ILOCK_EXCL;
+	xfs_iunlock(sc->ip, XFS_ILOCK_EXCL);
+	while (true) {
+		error = xfs_readdir(sc->tp, sc->ip, &sdc.dc, bufsize);
+		if (!xfs_scrub_fblock_op_ok(sc, XFS_DATA_FORK, 0, &error))
+			goto out;
+		if (oldpos == sdc.dc.pos)
+			break;
+		oldpos = sdc.dc.pos;
+	}
+
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index cd896e5..760492a 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -230,6 +230,10 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_inode_bmap,
 		.scrub	= xfs_scrub_bmap_cow,
 	},
+	{ /* directory */
+		.setup	= xfs_scrub_setup_directory,
+		.scrub	= xfs_scrub_directory,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 75323e4..4c348e8 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -77,5 +77,6 @@ int xfs_scrub_inode(struct xfs_scrub_context *sc);
 int xfs_scrub_bmap_data(struct xfs_scrub_context *sc);
 int xfs_scrub_bmap_attr(struct xfs_scrub_context *sc);
 int xfs_scrub_bmap_cow(struct xfs_scrub_context *sc);
+int xfs_scrub_directory(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */



* [PATCH 22/27] xfs: scrub directory freespace
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (20 preceding siblings ...)
  2017-09-21  0:19 ` [PATCH 21/27] xfs: scrub directory metadata Darrick J. Wong
@ 2017-09-21  0:19 ` Darrick J. Wong
  2017-09-21  0:20 ` [PATCH 23/27] xfs: scrub extended attributes Darrick J. Wong
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:19 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Check the free space information in a directory.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/dir.c |  339 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 339 insertions(+)
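
Note for reviewers: the invariant checked for each free region in a data
block is that it either appears in the bestfree table or is no longer than
the smallest bestfree slot.  Below is a minimal userspace sketch of that
rule, not the kernel code.  struct bestfree, FD_COUNT and free_entry_ok()
are hypothetical names invented for illustration, FD_COUNT is assumed to
be 3, and the fields are plain host-endian integers rather than on-disk
big-endian values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for XFS_DIR2_DATA_FD_COUNT; hypothetical name. */
#define FD_COUNT	3

/* Simplified, host-endian model of one bestfree slot (hypothetical). */
struct bestfree {
	uint16_t	offset;		/* 0 means the slot is unused */
	uint16_t	length;		/* length of the free region */
};

/*
 * Return true if a free region of free_len bytes is consistent with the
 * bestfree table: either an in-use slot records exactly that length, or
 * the region is no longer than the smallest slot.
 */
static bool
free_entry_ok(
	const struct bestfree	bf[FD_COUNT],
	uint16_t		free_len)
{
	unsigned int		smallest = UINT16_MAX;
	int			i;

	for (i = 0; i < FD_COUNT; i++) {
		if (bf[i].offset && bf[i].length == free_len)
			return true;
		/* Track the minimum length seen across all slots. */
		if (bf[i].length < smallest)
			smallest = bf[i].length;
	}

	return free_len <= smallest;
}
```

With slots of length 64, 32 and 16, a 32-byte region matches a slot, an
8-byte region is acceptable because it is smaller than every slot, and a
48-byte region is flagged because it should have displaced the 16-byte
slot.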


diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index 57adca5..54d20e7 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -229,6 +229,340 @@ xfs_scrub_dir_rec(
 	return error;
 }
 
+/* Is this free entry either in the bestfree or smaller than all of them? */
+static inline bool
+xfs_scrub_directory_check_free_entry(
+	struct xfs_dir2_data_free	*bf,
+	struct xfs_dir2_data_unused	*dup)
+{
+	struct xfs_dir2_data_free	*dfp;
+	unsigned int			smallest;
+
+	smallest = -1U;
+	for (dfp = &bf[0]; dfp < &bf[XFS_DIR2_DATA_FD_COUNT]; dfp++) {
+		if (dfp->offset &&
+		    be16_to_cpu(dfp->length) == be16_to_cpu(dup->length))
+			return true;
+		if (smallest > be16_to_cpu(dfp->length))
+			smallest = be16_to_cpu(dfp->length);
+	}
+
+	return be16_to_cpu(dup->length) <= smallest;
+}
+
+/* Check free space info in a directory data block. */
+STATIC int
+xfs_scrub_directory_data_bestfree(
+	struct xfs_scrub_context	*sc,
+	xfs_dablk_t			lblk,
+	bool				is_block)
+{
+	struct xfs_dir2_data_unused	*dup;
+	struct xfs_dir2_data_free	*dfp;
+	struct xfs_buf			*bp;
+	struct xfs_dir2_data_free	*bf;
+	struct xfs_mount		*mp = sc->mp;
+	char				*ptr;
+	char				*endptr;
+	u16				tag;
+	int				newlen;
+	int				offset;
+	int				error;
+
+	if (is_block) {
+		/* dir block format */
+		xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, lblk, lblk ==
+				XFS_B_TO_FSBT(mp, XFS_DIR2_DATA_OFFSET));
+		error = xfs_dir3_block_read(sc->tp, sc->ip, &bp);
+	} else {
+		/* dir data format */
+		error = xfs_dir3_data_read(sc->tp, sc->ip, lblk, -1, &bp);
+	}
+	if (!xfs_scrub_fblock_op_ok(sc, XFS_DATA_FORK, lblk, &error))
+		goto out;
+
+	/* Do the bestfrees correspond to actual free space? */
+	bf = sc->ip->d_ops->data_bestfree_p(bp->b_addr);
+	for (dfp = &bf[0]; dfp < &bf[XFS_DIR2_DATA_FD_COUNT]; dfp++) {
+		offset = be16_to_cpu(dfp->offset);
+		if (!xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, lblk,
+				offset < BBTOB(bp->b_length)) || !offset)
+			continue;
+		dup = (struct xfs_dir2_data_unused *)(bp->b_addr + offset);
+		tag = be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup));
+
+		xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, lblk,
+				dup->freetag ==
+					cpu_to_be16(XFS_DIR2_DATA_FREE_TAG) &&
+				be16_to_cpu(dup->length) ==
+					be16_to_cpu(dfp->length) &&
+				tag == ((char *)dup - (char *)bp->b_addr));
+	}
+
+	/* Make sure the bestfrees are actually the best free spaces. */
+	ptr = (char *)sc->ip->d_ops->data_entry_p(bp->b_addr);
+	if (is_block) {
+		struct xfs_dir2_block_tail	*btp;
+
+		btp = xfs_dir2_block_tail_p(sc->mp->m_dir_geo, bp->b_addr);
+		endptr = (char *)xfs_dir2_block_leaf_p(btp);
+	} else
+		endptr = (char *)bp->b_addr + BBTOB(bp->b_length);
+	while (ptr < endptr) {
+		dup = (struct xfs_dir2_data_unused *)ptr;
+		/* Skip real entries */
+		if (dup->freetag != cpu_to_be16(XFS_DIR2_DATA_FREE_TAG)) {
+			struct xfs_dir2_data_entry	*dep;
+
+			dep = (struct xfs_dir2_data_entry *)ptr;
+			newlen = sc->ip->d_ops->data_entsize(dep->namelen);
+			if (!xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, lblk,
+					newlen > 0))
+				goto out_buf;
+			ptr += newlen;
+			xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, lblk,
+					ptr <= endptr);
+			continue;
+		}
+
+		/* Spot check this free entry */
+		tag = be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup));
+		xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, lblk,
+				tag == ((char *)dup - (char *)bp->b_addr));
+
+		/*
+		 * Either this entry is a bestfree or it's smaller than
+		 * any of the bestfrees.
+		 */
+		xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, lblk,
+				xfs_scrub_directory_check_free_entry(bf, dup));
+
+		/* Move on. */
+		newlen = be16_to_cpu(dup->length);
+		if (!xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, lblk,
+				newlen > 0))
+			goto out_buf;
+		ptr += newlen;
+		xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, lblk,
+				ptr <= endptr);
+	}
+out_buf:
+	xfs_trans_brelse(sc->tp, bp);
+out:
+	return error;
+}
+
+/* Is this the longest free entry in the block? */
+static inline bool
+xfs_scrub_directory_check_freesp(
+	struct xfs_inode		*dp,
+	struct xfs_buf			*dbp,
+	unsigned int			len)
+{
+	struct xfs_dir2_data_free	*bf;
+	struct xfs_dir2_data_free	*dfp;
+	unsigned int			longest = 0;
+	int				offset;
+
+	bf = dp->d_ops->data_bestfree_p(dbp->b_addr);
+	for (dfp = &bf[0]; dfp < &bf[XFS_DIR2_DATA_FD_COUNT]; dfp++) {
+		offset = be16_to_cpu(dfp->offset);
+		if (!offset)
+			continue;
+		if (longest < be16_to_cpu(dfp->length))
+			longest = be16_to_cpu(dfp->length);
+	}
+
+	return longest == len;
+}
+
+/* Check free space info in a directory leaf1 block. */
+STATIC int
+xfs_scrub_directory_leaf1_bestfree(
+	struct xfs_scrub_context	*sc,
+	struct xfs_da_args		*args,
+	xfs_dablk_t			lblk)
+{
+	struct xfs_dir2_leaf_tail	*ltp;
+	struct xfs_buf			*dbp;
+	struct xfs_buf			*bp;
+	struct xfs_mount		*mp = sc->mp;
+	__be16				*bestp;
+	__u16				best;
+	int				i;
+	int				error;
+
+	/* Read the free space block */
+	error = xfs_dir3_leaf_read(sc->tp, sc->ip, lblk, -1, &bp);
+	if (!xfs_scrub_fblock_op_ok(sc, XFS_DATA_FORK, lblk, &error))
+		goto out;
+
+	/* Check all the entries. */
+	ltp = xfs_dir2_leaf_tail_p(mp->m_dir_geo, bp->b_addr);
+	bestp = xfs_dir2_leaf_bests_p(ltp);
+	for (i = 0; i < be32_to_cpu(ltp->bestcount); i++, bestp++) {
+		best = be16_to_cpu(*bestp);
+		if (best == NULLDATAOFF)
+			continue;
+		error = xfs_dir3_data_read(sc->tp, sc->ip,
+				i * args->geo->fsbcount, -1, &dbp);
+		if (!xfs_scrub_fblock_op_ok(sc, XFS_DATA_FORK, lblk, &error))
+			continue;
+		xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, lblk,
+				xfs_scrub_directory_check_freesp(sc->ip, dbp,
+					best));
+		xfs_trans_brelse(sc->tp, dbp);
+	}
+out:
+	return error;
+}
+
+/* Check free space info in a directory freespace block. */
+STATIC int
+xfs_scrub_directory_free_bestfree(
+	struct xfs_scrub_context	*sc,
+	struct xfs_da_args		*args,
+	xfs_dablk_t			lblk)
+{
+	struct xfs_dir3_icfree_hdr	freehdr;
+	struct xfs_buf			*dbp;
+	struct xfs_buf			*bp;
+	__be16				*bestp;
+	__u16				best;
+	int				i;
+	int				error;
+
+	/* Read the free space block */
+	error = xfs_dir2_free_read(sc->tp, sc->ip, lblk, &bp);
+	if (!xfs_scrub_fblock_op_ok(sc, XFS_DATA_FORK, lblk, &error))
+		goto out;
+
+	/* Check all the entries. */
+	sc->ip->d_ops->free_hdr_from_disk(&freehdr, bp->b_addr);
+	bestp = sc->ip->d_ops->free_bests_p(bp->b_addr);
+	for (i = 0; i < freehdr.nvalid; i++, bestp++) {
+		best = be16_to_cpu(*bestp);
+		if (best == NULLDATAOFF)
+			continue;
+		error = xfs_dir3_data_read(sc->tp, sc->ip,
+				(freehdr.firstdb + i) * args->geo->fsbcount,
+				-1, &dbp);
+		if (!xfs_scrub_fblock_op_ok(sc, XFS_DATA_FORK, lblk, &error))
+			continue;
+		xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, lblk,
+				xfs_scrub_directory_check_freesp(sc->ip, dbp,
+					best));
+		xfs_trans_brelse(sc->tp, dbp);
+	}
+out:
+	return error;
+}
+
+/* Check free space information in directories. */
+STATIC int
+xfs_scrub_directory_blocks(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_bmbt_irec		got;
+	struct xfs_da_args		args;
+	struct xfs_ifork		*ifp;
+	struct xfs_mount		*mp = sc->mp;
+	xfs_fileoff_t			leaf_lblk;
+	xfs_fileoff_t			free_lblk;
+	xfs_fileoff_t			lblk;
+	xfs_extnum_t			idx;
+	bool				found;
+	int				is_block = 0;
+	int				error;
+
+	/* Ignore local format directories. */
+	if (sc->ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS &&
+	    sc->ip->i_d.di_format != XFS_DINODE_FMT_BTREE)
+		return 0;
+
+	ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK);
+	lblk = XFS_B_TO_FSB(mp, XFS_DIR2_DATA_OFFSET);
+	leaf_lblk = XFS_B_TO_FSB(mp, XFS_DIR2_LEAF_OFFSET);
+	free_lblk = XFS_B_TO_FSB(mp, XFS_DIR2_FREE_OFFSET);
+
+	/* Is this a block dir? */
+	args.dp = sc->ip;
+	args.geo = mp->m_dir_geo;
+	args.trans = sc->tp;
+	error = xfs_dir2_isblock(&args, &is_block);
+	if (!xfs_scrub_fblock_op_ok(sc, XFS_DATA_FORK, lblk, &error))
+		goto out;
+
+	/* Iterate all the data extents in the directory... */
+	found = xfs_iext_lookup_extent(sc->ip, ifp, lblk, &idx, &got);
+	while (found) {
+		/* No more data blocks... */
+		if (got.br_startoff >= leaf_lblk)
+			break;
+
+		/* Check each data block's bestfree data */
+		for (lblk = roundup((xfs_dablk_t)got.br_startoff,
+				args.geo->fsbcount);
+		     lblk < got.br_startoff + got.br_blockcount;
+		     lblk += args.geo->fsbcount) {
+			error = xfs_scrub_directory_data_bestfree(sc, lblk,
+					is_block);
+			if (error)
+				goto out;
+		}
+
+		found = xfs_iext_get_extent(ifp, ++idx, &got);
+	}
+
+	/* Look for a leaf1 block, which has free info. */
+	if (xfs_iext_lookup_extent(sc->ip, ifp, leaf_lblk, &idx, &got) &&
+	    got.br_startoff == leaf_lblk &&
+	    got.br_blockcount == args.geo->fsbcount &&
+	    !xfs_iext_get_extent(ifp, ++idx, &got)) {
+		if (!xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, lblk,
+				!is_block))
+			goto not_leaf1;
+		error = xfs_scrub_directory_leaf1_bestfree(sc, &args,
+				leaf_lblk);
+		if (error)
+			goto out;
+	}
+not_leaf1:
+
+	/* Scan for free blocks */
+	lblk = free_lblk;
+	found = xfs_iext_lookup_extent(sc->ip, ifp, lblk, &idx, &got);
+	while (found) {
+		/*
+		 * Dirs can't have blocks mapped above 2^32.
+		 * Single-block dirs shouldn't even be here.
+		 */
+		lblk = got.br_startoff;
+		if (!xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, lblk,
+				!(lblk & ~((1ULL << 32) - 1ULL))))
+			goto out;
+		if (!xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, lblk,
+				!is_block))
+			goto nextfree;
+
+		/* Check each dir free block's bestfree data */
+		for (lblk = roundup((xfs_dablk_t)got.br_startoff,
+				args.geo->fsbcount);
+		     lblk < got.br_startoff + got.br_blockcount;
+		     lblk += args.geo->fsbcount) {
+			error = xfs_scrub_directory_free_bestfree(sc, &args,
+					lblk);
+			if (error)
+				goto out;
+		}
+
+nextfree:
+		found = xfs_iext_get_extent(ifp, ++idx, &got);
+	}
+out:
+	return error;
+}
+
 /* Scrub a whole directory. */
 int
 xfs_scrub_directory(
@@ -255,6 +589,11 @@ xfs_scrub_directory(
 	if (error)
 		return error;
 
+	/* Check the freespace. */
+	error = xfs_scrub_directory_blocks(sc);
+	if (error)
+		return error;
+
 	/*
 	 * Check that every dirent we see can also be looked up by hash.
 	 * Userspace usually asks for a 32k buffer, so we will too.



* [PATCH 23/27] xfs: scrub extended attributes
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (21 preceding siblings ...)
  2017-09-21  0:19 ` [PATCH 22/27] xfs: scrub directory freespace Darrick J. Wong
@ 2017-09-21  0:20 ` Darrick J. Wong
  2017-09-21  0:20 ` [PATCH 24/27] xfs: scrub symbolic links Darrick J. Wong
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:20 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Scrub the hash tree, keys, and values in an extended attribute structure.
Refactor the attribute code to use the transaction if the caller supplied
one to avoid buffer deadlocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    3 -
 fs/xfs/scrub/attr.c    |  210 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.h  |    2 
 fs/xfs/scrub/scrub.c   |    8 ++
 fs/xfs/scrub/scrub.h   |    2 
 6 files changed, 225 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/attr.c
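
Note for reviewers: the per-record check in this patch boils down to two
plausibility tests before the name hash is recomputed: the name offset
must land between the leaf header and the end of the block, and no
unknown flag bits may be set.  The following is a rough userspace sketch
of those two tests only.  Everything prefixed ex_ or EX_ is hypothetical,
and the flag bit values are made up for the example; they are not the
on-disk XFS values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; NOT the on-disk XFS values. */
#define EX_ATTR_LOCAL		(1u << 0)
#define EX_ATTR_ROOT		(1u << 1)
#define EX_ATTR_SECURE		(1u << 2)
#define EX_ATTR_INCOMPLETE	(1u << 7)
#define EX_ATTR_KNOWN		(EX_ATTR_LOCAL | EX_ATTR_ROOT | \
				 EX_ATTR_SECURE | EX_ATTR_INCOMPLETE)

/* Simplified, host-endian model of an attr leaf entry (hypothetical). */
struct ex_leaf_entry {
	uint32_t	hashval;
	uint16_t	nameidx;	/* byte offset of the name in the block */
	uint8_t		flags;
};

/*
 * Return true if the entry's name offset lies in [hdrsize, blksize) and
 * only known flag bits are set.
 */
static bool
ex_entry_plausible(
	const struct ex_leaf_entry	*ent,
	unsigned int			hdrsize,
	unsigned int			blksize)
{
	if (ent->nameidx < hdrsize || ent->nameidx >= blksize)
		return false;
	return (ent->flags & ~EX_ATTR_KNOWN) == 0;
}
```

An entry whose nameidx points into the header, or one carrying a flag bit
outside the known mask, is rejected before any name bytes are read.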


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 69aa88e..4d46399 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -148,6 +148,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   trace.o \
 				   agheader.o \
 				   alloc.o \
+				   attr.o \
 				   bmap.o \
 				   btree.o \
 				   common.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 8d2bea5..d31d743 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -499,9 +499,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_BMBTA	13	/* attr fork block mapping */
 #define XFS_SCRUB_TYPE_BMBTC	14	/* CoW fork block mapping */
 #define XFS_SCRUB_TYPE_DIR	15	/* directory */
+#define XFS_SCRUB_TYPE_XATTR	16	/* extended attribute */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	16
+#define XFS_SCRUB_TYPE_NR	17
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
new file mode 100644
index 0000000..4c31e10
--- /dev/null
+++ b/fs/xfs/scrub/attr.c
@@ -0,0 +1,210 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2.h"
+#include "xfs_attr.h"
+#include "xfs_attr_leaf.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/dabtree.h"
+#include "scrub/trace.h"
+
+#include <linux/posix_acl_xattr.h>
+#include <linux/xattr.h>
+
+/* Set us up to scrub an inode's extended attributes. */
+int
+xfs_scrub_setup_xattr(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	/* Allocate the buffer without the inode lock held. */
+	sc->buf = kmem_zalloc_large(XATTR_SIZE_MAX, KM_SLEEP);
+	if (!sc->buf)
+		return -ENOMEM;
+
+	return xfs_scrub_setup_inode_contents(sc, ip, 0);
+}
+
+/* Extended Attributes */
+
+struct xfs_scrub_xattr {
+	struct xfs_attr_list_context	context;
+	struct xfs_scrub_context	*sc;
+};
+
+/* Check that an extended attribute key can be looked up by hash. */
+static void
+xfs_scrub_xattr_listent(
+	struct xfs_attr_list_context	*context,
+	int				flags,
+	unsigned char			*name,
+	int				namelen,
+	int				valuelen)
+{
+	struct xfs_scrub_xattr		*sx;
+	struct xfs_da_args		args = {0};
+	int				error = 0;
+
+	sx = container_of(context, struct xfs_scrub_xattr, context);
+
+	args.flags = ATTR_KERNOTIME;
+	if (flags & XFS_ATTR_ROOT)
+		args.flags |= ATTR_ROOT;
+	else if (flags & XFS_ATTR_SECURE)
+		args.flags |= ATTR_SECURE;
+	args.geo = context->dp->i_mount->m_attr_geo;
+	args.whichfork = XFS_ATTR_FORK;
+	args.dp = context->dp;
+	args.name = name;
+	args.namelen = namelen;
+	args.hashval = xfs_da_hashname(args.name, args.namelen);
+	args.trans = context->tp;
+	args.value = sx->sc->buf;
+	args.valuelen = XATTR_SIZE_MAX;
+
+	error = xfs_attr_get_ilocked(context->dp, &args);
+	if (error == -EEXIST)
+		error = 0;
+	if (!xfs_scrub_fblock_op_ok(sx->sc, XFS_ATTR_FORK, args.blkno, &error))
+		goto fail_xref;
+	xfs_scrub_fblock_check_ok(sx->sc, XFS_ATTR_FORK, args.blkno,
+			args.valuelen == valuelen);
+
+fail_xref:
+	return;
+}
+
+/* Scrub an attribute btree record. */
+STATIC int
+xfs_scrub_xattr_rec(
+	struct xfs_scrub_da_btree	*ds,
+	int				level,
+	void				*rec)
+{
+	struct xfs_mount		*mp = ds->state->mp;
+	struct xfs_attr_leaf_entry	*ent = rec;
+	struct xfs_da_state_blk		*blk;
+	struct xfs_attr_leaf_name_local	*lentry;
+	struct xfs_attr_leaf_name_remote	*rentry;
+	struct xfs_buf			*bp;
+	xfs_dahash_t			calc_hash;
+	xfs_dahash_t			hash;
+	int				nameidx;
+	int				hdrsize;
+	unsigned int			badflags;
+	int				error;
+
+	blk = &ds->state->path.blk[level];
+
+	/* Check the hash of the entry. */
+	error = xfs_scrub_da_btree_hash(ds, level, &ent->hashval);
+	if (error)
+		goto out;
+
+	/* Find the attr entry's location. */
+	bp = blk->bp;
+	hdrsize = xfs_attr3_leaf_hdr_size(bp->b_addr);
+	nameidx = be16_to_cpu(ent->nameidx);
+	if (!xfs_scrub_da_check_ok(ds, level, nameidx >= hdrsize &&
+			nameidx < mp->m_attr_geo->blksize))
+		goto out;
+
+	/* Retrieve the entry and check it. */
+	hash = be32_to_cpu(ent->hashval);
+	badflags = ~(XFS_ATTR_LOCAL | XFS_ATTR_ROOT | XFS_ATTR_SECURE |
+			XFS_ATTR_INCOMPLETE);
+	xfs_scrub_da_check_ok(ds, level, (ent->flags & badflags) == 0);
+	if (ent->flags & XFS_ATTR_LOCAL) {
+		lentry = (struct xfs_attr_leaf_name_local *)
+				(((char *)bp->b_addr) + nameidx);
+		if (!xfs_scrub_da_check_ok(ds, level, lentry->namelen > 0))
+			goto out;
+		calc_hash = xfs_da_hashname(lentry->nameval, lentry->namelen);
+	} else {
+		rentry = (struct xfs_attr_leaf_name_remote *)
+				(((char *)bp->b_addr) + nameidx);
+		if (!xfs_scrub_da_check_ok(ds, level, rentry->namelen > 0))
+			goto out;
+		calc_hash = xfs_da_hashname(rentry->name, rentry->namelen);
+	}
+	xfs_scrub_da_check_ok(ds, level, calc_hash == hash);
+
+out:
+	return error;
+}
+
+/* Scrub the extended attribute metadata. */
+int
+xfs_scrub_xattr(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_scrub_xattr		sx = { 0 };
+	struct attrlist_cursor_kern	cursor = { 0 };
+	int				error = 0;
+
+	if (!xfs_inode_hasattr(sc->ip))
+		return -ENOENT;
+
+	memset(&sx, 0, sizeof(sx));
+	/* Check attribute tree structure */
+	error = xfs_scrub_da_btree(sc, XFS_ATTR_FORK, xfs_scrub_xattr_rec);
+	if (error)
+		goto out;
+
+	/* Check that every attr key can also be looked up by hash. */
+	sx.context.dp = sc->ip;
+	sx.context.cursor = &cursor;
+	sx.context.resynch = 1;
+	sx.context.put_listent = xfs_scrub_xattr_listent;
+	sx.context.tp = sc->tp;
+	sx.sc = sc;
+
+	/*
+	 * Look up every xattr in this file by name.
+	 *
+	 * The VFS only locks i_rwsem when modifying attrs, so keep all
+	 * three locks held because that's the only way to ensure we're
+	 * the only thread poking into the da btree.  We traverse the da
+	 * btree while holding a leaf buffer locked for the xattr name
+	 * iteration, which doesn't really follow the usual buffer
+	 * locking order.
+	 */
+	error = xfs_attr_list_int_ilocked(&sx.context);
+	if (!xfs_scrub_fblock_op_ok(sc, XFS_ATTR_FORK, 0, &error))
+		goto out;
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 82b8056..f0a33b3 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -110,6 +110,8 @@ int xfs_scrub_setup_inode_bmap_data(struct xfs_scrub_context *sc,
 				    struct xfs_inode *ip);
 int xfs_scrub_setup_directory(struct xfs_scrub_context *sc,
 			      struct xfs_inode *ip);
+int xfs_scrub_setup_xattr(struct xfs_scrub_context *sc,
+			  struct xfs_inode *ip);
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 760492a..046b3e5 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -161,6 +161,10 @@ xfs_scrub_teardown(
 			iput(VFS_I(sc->ip));
 		sc->ip = NULL;
 	}
+	if (sc->buf) {
+		kmem_free(sc->buf);
+		sc->buf = NULL;
+	}
 	return error;
 }
 
@@ -234,6 +238,10 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_directory,
 		.scrub	= xfs_scrub_directory,
 	},
+	{ /* extended attributes */
+		.setup	= xfs_scrub_setup_xattr,
+		.scrub	= xfs_scrub_xattr,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 4c348e8..b440e26 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -54,6 +54,7 @@ struct xfs_scrub_context {
 	const struct xfs_scrub_meta_ops	*ops;
 	struct xfs_trans		*tp;
 	struct xfs_inode		*ip;
+	void				*buf;
 	uint				ilock_flags;
 	bool				try_harder;
 
@@ -78,5 +79,6 @@ int xfs_scrub_bmap_data(struct xfs_scrub_context *sc);
 int xfs_scrub_bmap_attr(struct xfs_scrub_context *sc);
 int xfs_scrub_bmap_cow(struct xfs_scrub_context *sc);
 int xfs_scrub_directory(struct xfs_scrub_context *sc);
+int xfs_scrub_xattr(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */



* [PATCH 24/27] xfs: scrub symbolic links
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (22 preceding siblings ...)
  2017-09-21  0:20 ` [PATCH 23/27] xfs: scrub extended attributes Darrick J. Wong
@ 2017-09-21  0:20 ` Darrick J. Wong
  2017-09-21  0:20 ` [PATCH 25/27] xfs: scrub parent pointers Darrick J. Wong
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:20 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create the infrastructure to scrub symbolic link data.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 +
 fs/xfs/libxfs/xfs_fs.h |    3 +-
 fs/xfs/scrub/common.h  |    2 +
 fs/xfs/scrub/scrub.c   |    4 ++
 fs/xfs/scrub/scrub.h   |    1 +
 fs/xfs/scrub/symlink.c |   92 ++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 102 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/symlink.c
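
Note for reviewers: the symlink check compares the recorded inode size
against the target string that is actually stored.  Here is a minimal
userspace sketch of that comparison, under stated assumptions: the
ex_ names are hypothetical, EX_SYMLINK_MAXLEN is assumed to be 1024 for
the example, and memchr() is used in place of the kernel's strnlen().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed maximum target length for this sketch; hypothetical name. */
#define EX_SYMLINK_MAXLEN	1024

/* Bounded string length built on standard memchr(). */
static size_t
ex_strnlen(
	const char	*s,
	size_t		max)
{
	const char	*nul = memchr(s, '\0', max);

	return nul ? (size_t)(nul - s) : max;
}

/*
 * Return true if the recorded length is nonzero, within the maximum,
 * and no longer than the string found in the buffer.
 */
static bool
ex_symlink_len_ok(
	const char	*buf,
	size_t		bufsize,
	long long	len)
{
	if (len <= 0 || len > EX_SYMLINK_MAXLEN)
		return false;
	if ((size_t)len > bufsize)
		return false;
	return (size_t)len <= ex_strnlen(buf, bufsize);
}
```

Like the patch, the sketch only requires that the recorded length not
exceed the stored string, so a recorded length shorter than the string
still passes.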


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 4d46399..28637a6 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -159,5 +159,6 @@ xfs-y				+= $(addprefix scrub/, \
 				   refcount.o \
 				   rmap.o \
 				   scrub.o \
+				   symlink.o \
 				   )
 endif
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index d31d743..955eea7 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -500,9 +500,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_BMBTC	14	/* CoW fork block mapping */
 #define XFS_SCRUB_TYPE_DIR	15	/* directory */
 #define XFS_SCRUB_TYPE_XATTR	16	/* extended attribute */
+#define XFS_SCRUB_TYPE_SYMLINK	17	/* symbolic link */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	17
+#define XFS_SCRUB_TYPE_NR	18
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index f0a33b3..eeb6bc4 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -112,6 +112,8 @@ int xfs_scrub_setup_directory(struct xfs_scrub_context *sc,
 			      struct xfs_inode *ip);
 int xfs_scrub_setup_xattr(struct xfs_scrub_context *sc,
 			  struct xfs_inode *ip);
+int xfs_scrub_setup_symlink(struct xfs_scrub_context *sc,
+			    struct xfs_inode *ip);
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 046b3e5..0185a61 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -242,6 +242,10 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_xattr,
 		.scrub	= xfs_scrub_xattr,
 	},
+	{ /* symbolic link */
+		.setup	= xfs_scrub_setup_symlink,
+		.scrub	= xfs_scrub_symlink,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index b440e26..89dfb66 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -80,5 +80,6 @@ int xfs_scrub_bmap_attr(struct xfs_scrub_context *sc);
 int xfs_scrub_bmap_cow(struct xfs_scrub_context *sc);
 int xfs_scrub_directory(struct xfs_scrub_context *sc);
 int xfs_scrub_xattr(struct xfs_scrub_context *sc);
+int xfs_scrub_symlink(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
diff --git a/fs/xfs/scrub/symlink.c b/fs/xfs/scrub/symlink.c
new file mode 100644
index 0000000..e3b5d35
--- /dev/null
+++ b/fs/xfs/scrub/symlink.c
@@ -0,0 +1,92 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_inode_fork.h"
+#include "xfs_symlink.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+
+/* Set us up to scrub a symbolic link. */
+int
+xfs_scrub_setup_symlink(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	/* Allocate the buffer without the inode lock held. */
+	sc->buf = kmem_zalloc_large(XFS_SYMLINK_MAXLEN + 1, KM_SLEEP);
+	if (!sc->buf)
+		return -ENOMEM;
+
+	return xfs_scrub_setup_inode_contents(sc, ip, 0);
+}
+
+/* Symbolic links. */
+
+int
+xfs_scrub_symlink(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_inode		*ip = sc->ip;
+	struct xfs_ifork		*ifp;
+	loff_t				len;
+	int				error = 0;
+
+	if (!S_ISLNK(VFS_I(ip)->i_mode))
+		return -ENOENT;
+	ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+	len = ip->i_d.di_size;
+
+	/* Plausible size? */
+	if (!xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, 0,
+			len <= XFS_SYMLINK_MAXLEN && len > 0))
+		goto out;
+
+	/* Inline symlink? */
+	if (ifp->if_flags & XFS_IFINLINE) {
+		xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, 0,
+				len <= XFS_IFORK_DSIZE(ip) &&
+				len <= strnlen(ifp->if_u1.if_data,
+					XFS_IFORK_DSIZE(ip)));
+		goto out;
+	}
+
+	/* Remote symlink; must read the contents. */
+	error = xfs_readlink_bmap_ilocked(sc->ip, sc->buf);
+	if (!xfs_scrub_fblock_op_ok(sc, XFS_DATA_FORK, 0, &error))
+		goto out;
+	xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, 0,
+			len <= strnlen(sc->buf, XFS_SYMLINK_MAXLEN));
+out:
+	return error;
+}



* [PATCH 25/27] xfs: scrub parent pointers
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (23 preceding siblings ...)
  2017-09-21  0:20 ` [PATCH 24/27] xfs: scrub symbolic links Darrick J. Wong
@ 2017-09-21  0:20 ` Darrick J. Wong
  2017-09-21  0:20 ` [PATCH 26/27] xfs: scrub realtime bitmap/summary Darrick J. Wong
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:20 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Scrub parent pointers, sort of.  For directories, we can ride the
'..' entry up to the parent to confirm that there's at most one
dentry that points back to this directory.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    3 -
 fs/xfs/scrub/common.h  |    2 
 fs/xfs/scrub/parent.c  |  262 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c   |    4 +
 fs/xfs/scrub/scrub.h   |    1 
 6 files changed, 272 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/parent.c
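
Note for reviewers: the idea described above is to walk the parent's
entries and count how many point back at the directory being scrubbed,
then confirm the count is at most one.  The following is a rough
userspace sketch of that counting step only.  struct ex_dirent and the
ex_ functions are hypothetical, and an in-memory array stands in for
reading the parent directory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified directory entry for this sketch (hypothetical). */
struct ex_dirent {
	uint64_t	ino;
	const char	*name;
};

/* Count the entries in a parent listing that reference child_ino. */
static unsigned int
ex_count_backrefs(
	const struct ex_dirent	*ents,
	size_t			nr,
	uint64_t		child_ino)
{
	unsigned int		count = 0;
	size_t			i;

	for (i = 0; i < nr; i++) {
		if (ents[i].ino == child_ino)
			count++;
	}
	return count;
}

/* A directory should be referenced by at most one entry in its parent. */
static bool
ex_parent_ok(
	const struct ex_dirent	*ents,
	size_t			nr,
	uint64_t		child_ino)
{
	return ex_count_backrefs(ents, nr, child_ino) <= 1;
}
```

A parent listing that names the same directory inode twice fails the
check, which is the kind of inconsistency this scrubber is meant to flag.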


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 28637a6..2193a54 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -156,6 +156,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   dir.o \
 				   ialloc.o \
 				   inode.o \
+				   parent.o \
 				   refcount.o \
 				   rmap.o \
 				   scrub.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 955eea7..e3d10b4 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -501,9 +501,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_DIR	15	/* directory */
 #define XFS_SCRUB_TYPE_XATTR	16	/* extended attribute */
 #define XFS_SCRUB_TYPE_SYMLINK	17	/* symbolic link */
+#define XFS_SCRUB_TYPE_PARENT	18	/* parent pointers */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	18
+#define XFS_SCRUB_TYPE_NR	19
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index eeb6bc4..29c3c29 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -114,6 +114,8 @@ int xfs_scrub_setup_xattr(struct xfs_scrub_context *sc,
 			  struct xfs_inode *ip);
 int xfs_scrub_setup_symlink(struct xfs_scrub_context *sc,
 			    struct xfs_inode *ip);
+int xfs_scrub_setup_parent(struct xfs_scrub_context *sc,
+			   struct xfs_inode *ip);
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
new file mode 100644
index 0000000..ec99ebc
--- /dev/null
+++ b/fs/xfs/scrub/parent.c
@@ -0,0 +1,262 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+
+/* Set us up to scrub parents. */
+int
+xfs_scrub_setup_parent(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	return xfs_scrub_setup_inode_contents(sc, ip, 0);
+}
+
+/* Parent pointers */
+
+/* Look for an entry in a parent pointing to this inode. */
+
+struct xfs_scrub_parent_ctx {
+	struct dir_context		dc;
+	xfs_ino_t			ino;
+	xfs_nlink_t			nr;
+};
+
+/* Look for a single entry in a directory pointing to an inode. */
+STATIC int
+xfs_scrub_parent_actor(
+	struct dir_context		*dc,
+	const char			*name,
+	int				namelen,
+	loff_t				pos,
+	u64				ino,
+	unsigned			type)
+{
+	struct xfs_scrub_parent_ctx	*spc;
+
+	spc = container_of(dc, struct xfs_scrub_parent_ctx, dc);
+	if (spc->ino == ino)
+		spc->nr++;
+	return 0;
+}
+
+/* Count the number of dentries in the parent dir that point to this inode. */
+STATIC int
+xfs_scrub_parent_count_parent_dentries(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*parent,
+	xfs_nlink_t			*nr)
+{
+	struct xfs_scrub_parent_ctx	spc = {
+		.dc.actor = xfs_scrub_parent_actor,
+		.dc.pos = 0,
+		.ino = sc->ip->i_ino,
+		.nr = 0,
+	};
+	struct xfs_ifork		*ifp;
+	size_t				bufsize;
+	loff_t				oldpos;
+	uint				lock_mode;
+	int				error;
+
+	/*
+	 * Load the parent directory's extent map.  A regular directory
+	 * open would start readahead (and thus load the extent map)
+	 * before we even got to a readdir call, but this isn't
+	 * guaranteed here.
+	 */
+	lock_mode = xfs_ilock_data_map_shared(parent);
+	ifp = XFS_IFORK_PTR(parent, XFS_DATA_FORK);
+	if (XFS_IFORK_FORMAT(parent, XFS_DATA_FORK) == XFS_DINODE_FMT_BTREE &&
+	    !(ifp->if_flags & XFS_IFEXTENTS)) {
+		error = xfs_iread_extents(sc->tp, parent, XFS_DATA_FORK);
+		if (error) {
+			xfs_iunlock(parent, lock_mode);
+			return error;
+		}
+	}
+	xfs_iunlock(parent, lock_mode);
+
+	/*
+	 * Iterate the parent dir to confirm that there is
+	 * exactly one entry pointing back to the inode being
+	 * scanned.
+	 */
+	bufsize = (size_t)min_t(loff_t, 32768, parent->i_d.di_size);
+	oldpos = 0;
+	while (true) {
+		error = xfs_readdir(sc->tp, parent, &spc.dc, bufsize);
+		if (error)
+			goto out;
+		if (oldpos == spc.dc.pos)
+			break;
+		oldpos = spc.dc.pos;
+	}
+	*nr = spc.nr;
+out:
+	return error;
+}
+
+/* Scrub a parent pointer. */
+int
+xfs_scrub_parent(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_inode		*dp = NULL;
+	xfs_ino_t			dnum;
+	xfs_nlink_t			expected_nr;
+	xfs_nlink_t			nr;
+	int				tries = 0;
+	int				error;
+
+	/*
+	 * If we're a directory, check that the '..' link points up to
+	 * a directory that has one entry pointing to us.
+	 */
+	if (!S_ISDIR(VFS_I(sc->ip)->i_mode))
+		return -ENOENT;
+
+	/*
+	 * If we're an unlinked directory, the parent /won't/ have a link
+	 * to us.  Otherwise, it should have one link.
+	 */
+	expected_nr = VFS_I(sc->ip)->i_nlink == 0 ? 0 : 1;
+
+	/*
+	 * The VFS grabs a read or write lock via i_rwsem before it reads
+	 * or writes to a directory.  If we've gotten this far we've
+	 * already obtained IOLOCK_EXCL, which (since 4.10) is the same as
+	 * getting a write lock on i_rwsem.  Therefore, it is safe for us
+	 * to drop the ILOCK here in order to do directory lookups.
+	 */
+	sc->ilock_flags &= ~(XFS_ILOCK_EXCL | XFS_MMAPLOCK_EXCL);
+	xfs_iunlock(sc->ip, XFS_ILOCK_EXCL | XFS_MMAPLOCK_EXCL);
+
+	/* Look up '..' */
+	error = xfs_dir_lookup(sc->tp, sc->ip, &xfs_name_dotdot, &dnum, NULL);
+	if (!xfs_scrub_fblock_op_ok(sc, XFS_DATA_FORK, 0, &error))
+		goto out;
+
+	/* Is this the root dir?  Then '..' must point to itself. */
+	if (sc->ip == mp->m_rootip) {
+		xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, 0,
+				sc->ip->i_ino == mp->m_sb.sb_rootino &&
+				dnum == sc->ip->i_ino);
+		return 0;
+	}
+
+try_again:
+	/* Otherwise, '..' must not point to ourselves. */
+	if (!xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, 0,
+			sc->ip->i_ino != dnum))
+		goto out;
+
+	error = xfs_iget(mp, sc->tp, dnum, 0, 0, &dp);
+	if (!xfs_scrub_fblock_op_ok(sc, XFS_DATA_FORK, 0, &error))
+		goto out;
+	if (!xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, 0,
+			dp != sc->ip))
+		goto out_rele;
+
+	/*
+	 * We prefer to keep the inode locked while we lock and search
+	 * its alleged parent for a forward reference.  However, this
+	 * child -> parent scheme can deadlock with the parent -> child
+	 * scheme that is normally used.  Therefore, if we can lock the
+	 * parent, just validate the references and get out.
+	 */
+	if (xfs_ilock_nowait(dp, XFS_IOLOCK_SHARED)) {
+		error = xfs_scrub_parent_count_parent_dentries(sc, dp, &nr);
+		if (!xfs_scrub_fblock_op_ok(sc, XFS_DATA_FORK, 0, &error))
+			goto out_unlock;
+		xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, 0,
+				nr == expected_nr);
+		goto out_unlock;
+	}
+
+	/*
+	 * The game changes if we get here.  We failed to lock the parent,
+	 * so we're going to try to verify both pointers while only holding
+	 * one lock so as to avoid deadlocking with something that's actually
+	 * trying to traverse down the directory tree.
+	 */
+	xfs_iunlock(sc->ip, sc->ilock_flags);
+	sc->ilock_flags = 0;
+	xfs_ilock(dp, XFS_IOLOCK_SHARED);
+
+	/* Go looking for our dentry. */
+	error = xfs_scrub_parent_count_parent_dentries(sc, dp, &nr);
+	if (!xfs_scrub_fblock_op_ok(sc, XFS_DATA_FORK, 0, &error))
+		goto out_unlock;
+
+	/* Drop the parent lock, relock this inode. */
+	xfs_iunlock(dp, XFS_IOLOCK_SHARED);
+	sc->ilock_flags = XFS_IOLOCK_EXCL;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+
+	/* Look up '..' to see if the inode changed. */
+	error = xfs_dir_lookup(sc->tp, sc->ip, &xfs_name_dotdot, &dnum, NULL);
+	if (!xfs_scrub_fblock_op_ok(sc, XFS_DATA_FORK, 0, &error))
+		goto out_rele;
+
+	/* Drat, parent changed.  Try again! */
+	if (dnum != dp->i_ino) {
+		iput(VFS_I(dp));
+		tries++;
+		if (tries < 20)
+			goto try_again;
+		xfs_scrub_check_thoroughness(sc, false);
+		goto out;
+	}
+	iput(VFS_I(dp));
+
+	/*
+	 * '..' didn't change, so check that there was only one entry
+	 * for us in the parent.
+	 */
+	xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, 0, nr == expected_nr);
+	goto out;
+
+out_unlock:
+	xfs_iunlock(dp, XFS_IOLOCK_SHARED);
+out_rele:
+	iput(VFS_I(dp));
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 0185a61..6dae908 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -246,6 +246,10 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_symlink,
 		.scrub	= xfs_scrub_symlink,
 	},
+	{ /* parent pointers */
+		.setup	= xfs_scrub_setup_parent,
+		.scrub	= xfs_scrub_parent,
+	},
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 89dfb66..3363df9 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -81,5 +81,6 @@ int xfs_scrub_bmap_cow(struct xfs_scrub_context *sc);
 int xfs_scrub_directory(struct xfs_scrub_context *sc);
 int xfs_scrub_xattr(struct xfs_scrub_context *sc);
 int xfs_scrub_symlink(struct xfs_scrub_context *sc);
+int xfs_scrub_parent(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */


* [PATCH 26/27] xfs: scrub realtime bitmap/summary
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (24 preceding siblings ...)
  2017-09-21  0:20 ` [PATCH 25/27] xfs: scrub parent pointers Darrick J. Wong
@ 2017-09-21  0:20 ` Darrick J. Wong
  2017-09-21  0:20 ` [PATCH 27/27] xfs: scrub quota information Darrick J. Wong
  2017-09-22  3:27 ` [PATCH] man: describe the metadata scrubbing ioctl Darrick J. Wong
  27 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:20 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Perform simple tests of the realtime bitmap and summary.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile            |    2 +
 fs/xfs/libxfs/xfs_format.h |    5 ++
 fs/xfs/libxfs/xfs_fs.h     |    4 +-
 fs/xfs/scrub/common.h      |    1 
 fs/xfs/scrub/rtbitmap.c    |   98 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c       |   15 +++++++
 fs/xfs/scrub/scrub.h       |    2 +
 7 files changed, 126 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/rtbitmap.c
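
The scrub.c hunk below adds { NULL } placeholders when CONFIG_XFS_RT is
off so that the table index keeps matching the XFS_SCRUB_TYPE_* number.
A rough userspace sketch of that idea follows.  The names (fs_sim,
meta_ops, dispatch, the SIM_TYPE_* values) are hypothetical, and the
-ENOENT returned for missing or unsupported entries is an assumption,
since the dispatcher is not part of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type codes; the table below is indexed by these. */
enum {
	SIM_TYPE_PARENT = 0,
	SIM_TYPE_RTBITMAP = 1,
	SIM_TYPE_RTSUM = 2,
	SIM_TYPE_NR = 3,
};

/* Hypothetical stand-in for the superblock. */
struct fs_sim {
	unsigned long	rblocks;	/* realtime blocks; 0 means no rt device */
};

struct meta_ops {
	int	(*scrub)(const struct fs_sim *fs);
	bool	(*has)(const struct fs_sim *fs);	/* optional feature test */
};

static int scrub_ok(const struct fs_sim *fs)
{
	(void)fs;
	return 0;
}

static bool has_realtime(const struct fs_sim *fs)
{
	return fs->rblocks > 0;
}

/* Positional table; the last slot mimics a scrubber that was compiled out. */
static const struct meta_ops ops[SIM_TYPE_NR] = {
	{ .scrub = scrub_ok },				/* parent */
	{ .scrub = scrub_ok, .has = has_realtime },	/* rt bitmap */
	{ NULL },					/* rt summary placeholder */
};

static int dispatch(unsigned int type, const struct fs_sim *fs)
{
	const struct meta_ops *o;

	if (type >= SIM_TYPE_NR)
		return -ENOENT;
	o = &ops[type];
	if (o->scrub == NULL)		/* placeholder slot */
		return -ENOENT;
	if (o->has && !o->has(fs))	/* feature absent on this fs */
		return -ENOENT;
	return o->scrub(fs);
}
```

Without the placeholder, every later entry would shift down by one slot
and answer to the wrong type code.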


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 2193a54..9ce581e 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -162,4 +162,6 @@ xfs-y				+= $(addprefix scrub/, \
 				   scrub.o \
 				   symlink.o \
 				   )
+
+xfs-$(CONFIG_XFS_RT)		+= scrub/rtbitmap.o
 endif
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 154c3dd..d4d9bef 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -315,6 +315,11 @@ static inline bool xfs_sb_good_version(struct xfs_sb *sbp)
 	return false;
 }
 
+static inline bool xfs_sb_version_hasrealtime(struct xfs_sb *sbp)
+{
+	return sbp->sb_rblocks > 0;
+}
+
 /*
  * Detect a mismatched features2 field.  Older kernels read/wrote
  * this into the wrong slot, so to be safe we keep them in sync.
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index e3d10b4..3f37156 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -502,9 +502,11 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_XATTR	16	/* extended attribute */
 #define XFS_SCRUB_TYPE_SYMLINK	17	/* symbolic link */
 #define XFS_SCRUB_TYPE_PARENT	18	/* parent pointers */
+#define XFS_SCRUB_TYPE_RTBITMAP	19	/* realtime bitmap */
+#define XFS_SCRUB_TYPE_RTSUM	20	/* realtime summary */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	19
+#define XFS_SCRUB_TYPE_NR	21
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 29c3c29..871204a 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -116,6 +116,7 @@ int xfs_scrub_setup_symlink(struct xfs_scrub_context *sc,
 			    struct xfs_inode *ip);
 int xfs_scrub_setup_parent(struct xfs_scrub_context *sc,
 			   struct xfs_inode *ip);
+int xfs_scrub_setup_rt(struct xfs_scrub_context *sc, struct xfs_inode *ip);
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c
new file mode 100644
index 0000000..e15f8f3
--- /dev/null
+++ b/fs/xfs/scrub/rtbitmap.c
@@ -0,0 +1,98 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_alloc.h"
+#include "xfs_rtalloc.h"
+#include "xfs_inode.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+
+/* Set us up with the realtime metadata locked. */
+int
+xfs_scrub_setup_rt(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	struct xfs_mount		*mp = sc->mp;
+	int				lockmode;
+	int				error = 0;
+
+	if (sc->sm->sm_agno || sc->sm->sm_ino || sc->sm->sm_gen)
+		return -EINVAL;
+
+	error = xfs_scrub_setup_fs(sc, ip);
+	if (error)
+		return error;
+
+	lockmode = XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP;
+	xfs_ilock(mp->m_rbmip, lockmode);
+	xfs_trans_ijoin(sc->tp, mp->m_rbmip, lockmode);
+
+	return 0;
+}
+
+/* Realtime bitmap. */
+
+/* Scrub a free extent record from the realtime bitmap. */
+STATIC int
+xfs_scrub_rtbitmap_helper(
+	struct xfs_trans		*tp,
+	struct xfs_rtalloc_rec		*rec,
+	void				*priv)
+{
+	return 0;
+}
+
+/* Scrub the realtime bitmap. */
+int
+xfs_scrub_rtbitmap(
+	struct xfs_scrub_context	*sc)
+{
+	int				error;
+
+	error = xfs_rtalloc_query_all(sc->tp, xfs_scrub_rtbitmap_helper, NULL);
+	if (!xfs_scrub_fblock_op_ok(sc, XFS_DATA_FORK, 0, &error))
+		goto out;
+
+out:
+	return error;
+}
+
+/* Scrub the realtime summary. */
+int
+xfs_scrub_rtsummary(
+	struct xfs_scrub_context	*sc)
+{
+	/* XXX: implement this some day */
+	return -ENOENT;
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 6dae908..23b010d 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -250,6 +250,21 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_parent,
 		.scrub	= xfs_scrub_parent,
 	},
+#ifdef CONFIG_XFS_RT
+	{ /* realtime bitmap */
+		.setup	= xfs_scrub_setup_rt,
+		.scrub	= xfs_scrub_rtbitmap,
+		.has	= xfs_sb_version_hasrealtime,
+	},
+	{ /* realtime summary */
+		.setup	= xfs_scrub_setup_rt,
+		.scrub	= xfs_scrub_rtsummary,
+		.has	= xfs_sb_version_hasrealtime,
+	},
+#else
+	{ NULL },
+	{ NULL },
+#endif
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 3363df9..97fd03f 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -82,5 +82,7 @@ int xfs_scrub_directory(struct xfs_scrub_context *sc);
 int xfs_scrub_xattr(struct xfs_scrub_context *sc);
 int xfs_scrub_symlink(struct xfs_scrub_context *sc);
 int xfs_scrub_parent(struct xfs_scrub_context *sc);
+int xfs_scrub_rtbitmap(struct xfs_scrub_context *sc);
+int xfs_scrub_rtsummary(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */


* [PATCH 27/27] xfs: scrub quota information
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (25 preceding siblings ...)
  2017-09-21  0:20 ` [PATCH 26/27] xfs: scrub realtime bitmap/summary Darrick J. Wong
@ 2017-09-21  0:20 ` Darrick J. Wong
  2017-09-22  3:27 ` [PATCH] man: describe the metadata scrubbing ioctl Darrick J. Wong
  27 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21  0:20 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Perform some quick sanity testing of the disk quota information.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    5 +
 fs/xfs/scrub/common.h  |    1 
 fs/xfs/scrub/quota.c   |  258 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c   |   18 +++
 fs/xfs/scrub/scrub.h   |    1 
 6 files changed, 283 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/quota.c
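
The per-dquot checks in quota.c boil down to two predicates per
resource: the soft limit may not exceed the hard limit (reported as
corruption), and usage above a hard limit is only a warning, with id 0
and unset (zero) hard limits exempt.  A small userspace sketch of those
two predicates as this patch writes them; struct limits and both
function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-resource view of a dquot (blocks, inodes or rt blocks). */
struct limits {
	uint64_t	soft;
	uint64_t	hard;
	uint64_t	count;	/* current usage */
};

/* Corruption check: the soft limit may not exceed the hard limit. */
static bool soft_within_hard(const struct limits *l)
{
	return l->soft <= l->hard;
}

/*
 * Warning check: usage should not exceed the hard limit.  Id 0 and a
 * zero hard limit are exempt, mirroring the conditions in the patch.
 */
static bool usage_within_hard(uint32_t id, const struct limits *l)
{
	return id == 0 || l->hard == 0 || l->count <= l->hard;
}
```

An administrator lowering a limit below current usage trips only the
second predicate, which is why the patch reports it as a warning rather
than as corruption.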


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 9ce581e..3152469 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -164,4 +164,5 @@ xfs-y				+= $(addprefix scrub/, \
 				   )
 
 xfs-$(CONFIG_XFS_RT)		+= scrub/rtbitmap.o
+xfs-$(CONFIG_XFS_QUOTA)		+= scrub/quota.o
 endif
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 3f37156..fba0105 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -504,9 +504,12 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_PARENT	18	/* parent pointers */
 #define XFS_SCRUB_TYPE_RTBITMAP	19	/* realtime bitmap */
 #define XFS_SCRUB_TYPE_RTSUM	20	/* realtime summary */
+#define XFS_SCRUB_TYPE_UQUOTA	21	/* user quotas */
+#define XFS_SCRUB_TYPE_GQUOTA	22	/* group quotas */
+#define XFS_SCRUB_TYPE_PQUOTA	23	/* project quotas */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	21
+#define XFS_SCRUB_TYPE_NR	24
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 871204a..a4a2ca5 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -117,6 +117,7 @@ int xfs_scrub_setup_symlink(struct xfs_scrub_context *sc,
 int xfs_scrub_setup_parent(struct xfs_scrub_context *sc,
 			   struct xfs_inode *ip);
 int xfs_scrub_setup_rt(struct xfs_scrub_context *sc, struct xfs_inode *ip);
+int xfs_scrub_setup_quota(struct xfs_scrub_context *sc, struct xfs_inode *ip);
 
 void xfs_scrub_ag_free(struct xfs_scrub_context *sc, struct xfs_scrub_ag *sa);
 int xfs_scrub_ag_init(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
diff --git a/fs/xfs/scrub/quota.c b/fs/xfs/scrub/quota.c
new file mode 100644
index 0000000..302d5a8
--- /dev/null
+++ b/fs/xfs/scrub/quota.c
@@ -0,0 +1,258 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_inode_fork.h"
+#include "xfs_bmap.h"
+#include "xfs_quota.h"
+#include "xfs_qm.h"
+#include "xfs_dquot.h"
+#include "xfs_dquot_item.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+
+/* Convert a scrub type code to a DQ flag, or return 0 if error. */
+static inline uint
+xfs_scrub_quota_to_dqtype(
+	struct xfs_scrub_context	*sc)
+{
+	switch (sc->sm->sm_type) {
+	case XFS_SCRUB_TYPE_UQUOTA:
+		return XFS_DQ_USER;
+	case XFS_SCRUB_TYPE_GQUOTA:
+		return XFS_DQ_GROUP;
+	case XFS_SCRUB_TYPE_PQUOTA:
+		return XFS_DQ_PROJ;
+	default:
+		return 0;
+	}
+}
+
+/* Set us up to scrub a quota. */
+int
+xfs_scrub_setup_quota(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	uint				dqtype;
+
+	if (sc->sm->sm_agno || sc->sm->sm_ino || sc->sm->sm_gen)
+		return -EINVAL;
+
+	dqtype = xfs_scrub_quota_to_dqtype(sc);
+	if (dqtype == 0)
+		return -EINVAL;
+	return 0;
+}
+
+/* Quotas. */
+
+/* Scrub the fields in an individual quota item. */
+STATIC void
+xfs_scrub_quota_item(
+	struct xfs_scrub_context	*sc,
+	uint				dqtype,
+	struct xfs_dquot		*dq,
+	xfs_dqid_t			id)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_disk_dquot		*d = &dq->q_core;
+	struct xfs_quotainfo		*qi = mp->m_quotainfo;
+	xfs_fileoff_t			offset;
+	unsigned long long		bsoft;
+	unsigned long long		isoft;
+	unsigned long long		rsoft;
+	unsigned long long		bhard;
+	unsigned long long		ihard;
+	unsigned long long		rhard;
+	unsigned long long		bcount;
+	unsigned long long		icount;
+	unsigned long long		rcount;
+	xfs_ino_t			inodes;
+
+	/* Did we get the dquot we wanted? */
+	offset = id * qi->qi_dqperchunk;
+	xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, offset,
+			id <= be32_to_cpu(d->d_id) &&
+			dqtype == (d->d_flags & XFS_DQ_ALLTYPES));
+
+	/* Check the limits. */
+	bhard = be64_to_cpu(d->d_blk_hardlimit);
+	ihard = be64_to_cpu(d->d_ino_hardlimit);
+	rhard = be64_to_cpu(d->d_rtb_hardlimit);
+
+	bsoft = be64_to_cpu(d->d_blk_softlimit);
+	isoft = be64_to_cpu(d->d_ino_softlimit);
+	rsoft = be64_to_cpu(d->d_rtb_softlimit);
+
+	inodes = XFS_AGINO_TO_INO(mp, mp->m_sb.sb_agcount, 0);
+
+	/*
+	 * Warn if the limits are larger than the fs.  Administrators
+	 * can do this, though in production this seems suspect.
+	 */
+	xfs_scrub_fblock_warn_ok(sc, XFS_DATA_FORK, offset,
+			bhard <= mp->m_sb.sb_dblocks &&
+			ihard <= inodes &&
+			rhard <= mp->m_sb.sb_rblocks &&
+			bsoft <= mp->m_sb.sb_dblocks &&
+			isoft <= inodes &&
+			rsoft <= mp->m_sb.sb_rblocks);
+
+	/* Soft limit must not exceed the hard limit. */
+	xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, offset,
+			bsoft <= bhard &&
+			isoft <= ihard &&
+			rsoft <= rhard);
+
+	/* Check the resource counts. */
+	bcount = be64_to_cpu(d->d_bcount);
+	icount = be64_to_cpu(d->d_icount);
+	rcount = be64_to_cpu(d->d_rtbcount);
+	inodes = percpu_counter_sum(&mp->m_icount);
+
+	/*
+	 * Check that usage doesn't exceed physical limits.  However, on
+	 * a reflink filesystem we're allowed to exceed physical space
+	 * if there are no quota limits.
+	 */
+	if (xfs_sb_version_hasreflink(&mp->m_sb))
+		xfs_scrub_fblock_warn_ok(sc, XFS_DATA_FORK, offset,
+				bcount <= mp->m_sb.sb_dblocks);
+	else
+		xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, offset,
+				bcount <= mp->m_sb.sb_dblocks);
+	xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, offset,
+			icount <= inodes && rcount <= mp->m_sb.sb_rblocks);
+
+	/*
+	 * We can violate the hard limits if the admin suddenly sets a
+	 * lower limit than the actual usage.  However, we flag it for
+	 * admin review.
+	 */
+	xfs_scrub_fblock_warn_ok(sc, XFS_DATA_FORK, offset,
+			(id == 0 || bhard == 0 || bcount <= bhard) &&
+			(id == 0 || ihard == 0 || icount <= ihard) &&
+			(id == 0 || rhard == 0 || rcount <= rhard));
+}
+
+/* Scrub all of a quota type's items. */
+int
+xfs_scrub_quota(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_bmbt_irec		irec = { 0 };
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_inode		*ip;
+	struct xfs_quotainfo		*qi = mp->m_quotainfo;
+	struct xfs_dquot		*dq;
+	xfs_fileoff_t			max_dqid_off;
+	xfs_fileoff_t			off = 0;
+	xfs_dqid_t			id = 0;
+	uint				dqtype;
+	int				nimaps;
+	int				error;
+
+	if (!XFS_IS_QUOTA_RUNNING(mp) || !XFS_IS_QUOTA_ON(mp))
+		return -ENOENT;
+
+	mutex_lock(&qi->qi_quotaofflock);
+	dqtype = xfs_scrub_quota_to_dqtype(sc);
+	if (!xfs_this_quota_on(sc->mp, dqtype)) {
+		error = -ENOENT;
+		goto out;
+	}
+
+	/* Attach to the quota inode and set sc->ip so that reporting works. */
+	ip = xfs_quota_inode(sc->mp, dqtype);
+	sc->ip = ip;
+
+	/* Look for problem extents. */
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	max_dqid_off = ((xfs_dqid_t)-1) / qi->qi_dqperchunk;
+	while (1) {
+		if (xfs_scrub_should_terminate(&error))
+			break;
+
+		off = irec.br_startoff + irec.br_blockcount;
+		nimaps = 1;
+		error = xfs_bmapi_read(ip, off, -1, &irec, &nimaps,
+				XFS_BMAPI_ENTIRE);
+		if (!xfs_scrub_fblock_op_ok(sc, XFS_DATA_FORK, off, &error))
+			goto out_unlock;
+		if (!nimaps)
+			break;
+		if (irec.br_startblock == HOLESTARTBLOCK)
+			continue;
+
+		/*
+		 * Unwritten extents or blocks mapped above the highest
+		 * quota id shouldn't happen.
+		 */
+		xfs_scrub_fblock_check_ok(sc, XFS_DATA_FORK, off,
+				!isnullstartblock(irec.br_startblock) &&
+				irec.br_startoff <= max_dqid_off &&
+				irec.br_startoff + irec.br_blockcount <=
+					max_dqid_off + 1);
+	}
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	/* Check all the quota items. */
+	while (id < ((xfs_dqid_t)-1ULL)) {
+		if (xfs_scrub_should_terminate(&error))
+			break;
+
+		error = xfs_qm_dqget(mp, NULL, id, dqtype, XFS_QMOPT_DQNEXT,
+				&dq);
+		if (error == -ENOENT)
+			break;
+		if (!xfs_scrub_fblock_op_ok(sc, XFS_DATA_FORK,
+				id * qi->qi_dqperchunk, &error))
+			goto out;
+
+		xfs_scrub_quota_item(sc, dqtype, dq, id);
+
+		id = be32_to_cpu(dq->q_core.d_id) + 1;
+		xfs_qm_dqput(dq);
+		if (!id)
+			break;
+	}
+	goto out;
+
+out_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+out:
+	sc->ip = NULL;
+	mutex_unlock(&qi->qi_quotaofflock);
+	return error;
+}
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 23b010d..8cddfcd 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -265,6 +265,24 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 	{ NULL },
 	{ NULL },
 #endif
+#ifdef CONFIG_XFS_QUOTA
+	{ /* user quota */
+		.setup = xfs_scrub_setup_quota,
+		.scrub = xfs_scrub_quota,
+	},
+	{ /* group quota */
+		.setup = xfs_scrub_setup_quota,
+		.scrub = xfs_scrub_quota,
+	},
+	{ /* project quota */
+		.setup = xfs_scrub_setup_quota,
+		.scrub = xfs_scrub_quota,
+	},
+#else
+	{ NULL },
+	{ NULL },
+	{ NULL },
+#endif
 };
 
 /* Dispatch metadata scrubbing. */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 97fd03f..3218664 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -84,5 +84,6 @@ int xfs_scrub_symlink(struct xfs_scrub_context *sc);
 int xfs_scrub_parent(struct xfs_scrub_context *sc);
 int xfs_scrub_rtbitmap(struct xfs_scrub_context *sc);
 int xfs_scrub_rtsummary(struct xfs_scrub_context *sc);
+int xfs_scrub_quota(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */


* Re: [PATCH 05/27] xfs: test the scrub ioctl
  2017-09-21  0:18 ` [PATCH 05/27] xfs: test the scrub ioctl Darrick J. Wong
@ 2017-09-21  6:04   ` Dave Chinner
  2017-09-21 18:14     ` Darrick J. Wong
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Chinner @ 2017-09-21  6:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Sep 20, 2017 at 05:18:08PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create a test scrubber with id 0.  This will be used by xfs_scrub to
> probe the kernel's abilities to scrub (and repair) the metadata.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile        |    1 +
>  fs/xfs/libxfs/xfs_fs.h |    3 ++
>  fs/xfs/scrub/common.c  |   60 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/common.h  |   44 +++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/scrub.c   |   33 ++++++++++++++++++++++++++
>  fs/xfs/scrub/scrub.h   |    1 +
>  fs/xfs/scrub/trace.c   |    1 +
>  7 files changed, 142 insertions(+), 1 deletion(-)
>  create mode 100644 fs/xfs/scrub/common.c
>  create mode 100644 fs/xfs/scrub/common.h
> 
> 
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index f4312bc..ca14595 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -146,6 +146,7 @@ ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
>  
>  xfs-y				+= $(addprefix scrub/, \
>  				   trace.o \
> +				   common.o \
>  				   scrub.o \
>  				   )
>  endif
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index a4b4c8c..5105bad 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -483,9 +483,10 @@ struct xfs_scrub_metadata {
>   */
>  
>  /* Scrub subcommands. */
> +#define XFS_SCRUB_TYPE_TEST	0	/* presence test ioctl */

Shouldn't we call this a "probe" - as in "probe for support" - so it
doesn't get confused with "use this to test whether scrub works"?

>  /* Number of scrub subcommands. */
> -#define XFS_SCRUB_TYPE_NR	0
> +#define XFS_SCRUB_TYPE_NR	1
>  
>  /* i: Repair this metadata. */
>  #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> new file mode 100644
> index 0000000..13ccb36
> --- /dev/null
> +++ b/fs/xfs/scrub/common.c
> @@ -0,0 +1,60 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_sb.h"
> +#include "xfs_inode.h"
> +#include "xfs_alloc.h"
> +#include "xfs_alloc_btree.h"
> +#include "xfs_bmap.h"
> +#include "xfs_bmap_btree.h"
> +#include "xfs_ialloc.h"
> +#include "xfs_ialloc_btree.h"
> +#include "xfs_refcount.h"
> +#include "xfs_refcount_btree.h"
> +#include "xfs_rmap.h"
> +#include "xfs_rmap_btree.h"
> +#include "scrub/xfs_scrub.h"
> +#include "scrub/scrub.h"
> +#include "scrub/common.h"
> +#include "scrub/trace.h"
> +
> +/* Common code for the metadata scrubbers. */
> +
> +/* Per-scrubber setup functions */
> +
> +/* Set us up with a transaction and an empty context. */
> +int
> +xfs_scrub_setup_fs(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_inode		*ip)
> +{
> +	return xfs_scrub_trans_alloc(sc->sm, sc->mp,
> +			&M_RES(sc->mp)->tr_itruncate, 0, 0, 0, &sc->tp);
> +}

Using the truncate transaction reservation really needs explaining
here....

.....
>  
> +/*
> + * Test scrubber -- userspace uses this to probe if we're willing to
> + * scrub or repair a given mountpoint.
> + */

Yup, definitely should be called xfs_scrub_probe()....

> +int
> +xfs_scrub_tester(
> +	struct xfs_scrub_context	*sc)
> +{
> +	if (sc->sm->sm_ino || sc->sm->sm_agno)
> +		return -EINVAL;
> +	if (sc->sm->sm_gen & XFS_SCRUB_OFLAG_CORRUPT)
> +		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
> +	if (sc->sm->sm_gen & XFS_SCRUB_OFLAG_PREEN)
> +		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_PREEN;
> +	if (sc->sm->sm_gen & XFS_SCRUB_OFLAG_XFAIL)
> +		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_XFAIL;
> +	if (sc->sm->sm_gen & XFS_SCRUB_OFLAG_XCORRUPT)
> +		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_XCORRUPT;
> +	if (sc->sm->sm_gen & XFS_SCRUB_OFLAG_INCOMPLETE)
> +		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_INCOMPLETE;
> +	if (sc->sm->sm_gen & XFS_SCRUB_OFLAG_WARNING)
> +		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_WARNING;
> +	if (sc->sm->sm_gen & ~XFS_SCRUB_FLAGS_OUT)
> +		return -ENOENT;

Shouldn't this check be first? If not, add a comment to explain why?

Also, I find that really hard to parse because it's so dense and
so much is repeated over and over again (makes my pattern matching
brain cells scream). It's copying the exact same flags from
sc->sm->sm_gen to sc->sm->sm_flags, so why not something like:


	struct xfs_scrub_m...	*sm = sc->sm;

	if (sm->sm_ino || sm->sm_agno)
		return -EINVAL;

	sm->sm_flags = sm->sm_gen & XFS_SCRUB_FLAGS_OUT;
	if (sm->sm_gen & ~XFS_SCRUB_FLAGS_OUT)
		return -ENOENT;
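
As a rough userspace check of the idea above, the sketch below compares the
per-flag copy from the patch with a single masked copy. The flag values are
taken from the xfs_fs.h hunk in patch 03; the two helper names are
hypothetical and exist only for this illustration. Note one assumption: the
masked helper ORs into the existing flags so that it matches the patch bit
for bit, whereas the plain assignment suggested above would also clear any
input bits already present in sm_flags.

```c
#include <stdint.h>

/* Output flag values as defined in the xfs_fs.h hunk of patch 03. */
#define XFS_SCRUB_OFLAG_CORRUPT		(1u << 1)
#define XFS_SCRUB_OFLAG_PREEN		(1u << 2)
#define XFS_SCRUB_OFLAG_XFAIL		(1u << 3)
#define XFS_SCRUB_OFLAG_XCORRUPT	(1u << 4)
#define XFS_SCRUB_OFLAG_INCOMPLETE	(1u << 5)
#define XFS_SCRUB_OFLAG_WARNING		(1u << 6)
#define XFS_SCRUB_FLAGS_OUT	(XFS_SCRUB_OFLAG_CORRUPT | \
				 XFS_SCRUB_OFLAG_PREEN | \
				 XFS_SCRUB_OFLAG_XFAIL | \
				 XFS_SCRUB_OFLAG_XCORRUPT | \
				 XFS_SCRUB_OFLAG_INCOMPLETE | \
				 XFS_SCRUB_OFLAG_WARNING)

/* Hypothetical helper: the patch's approach, one test per output flag. */
static uint32_t copy_flags_per_bit(uint32_t gen, uint32_t flags)
{
	if (gen & XFS_SCRUB_OFLAG_CORRUPT)
		flags |= XFS_SCRUB_OFLAG_CORRUPT;
	if (gen & XFS_SCRUB_OFLAG_PREEN)
		flags |= XFS_SCRUB_OFLAG_PREEN;
	if (gen & XFS_SCRUB_OFLAG_XFAIL)
		flags |= XFS_SCRUB_OFLAG_XFAIL;
	if (gen & XFS_SCRUB_OFLAG_XCORRUPT)
		flags |= XFS_SCRUB_OFLAG_XCORRUPT;
	if (gen & XFS_SCRUB_OFLAG_INCOMPLETE)
		flags |= XFS_SCRUB_OFLAG_INCOMPLETE;
	if (gen & XFS_SCRUB_OFLAG_WARNING)
		flags |= XFS_SCRUB_OFLAG_WARNING;
	return flags;
}

/* Hypothetical helper: copy every valid output bit with one mask. */
static uint32_t copy_flags_masked(uint32_t gen, uint32_t flags)
{
	return flags | (gen & XFS_SCRUB_FLAGS_OUT);
}

/* Nonzero if gen carries bits outside the output mask (the -ENOENT case). */
static int has_unknown_bits(uint32_t gen)
{
	return (gen & ~XFS_SCRUB_FLAGS_OUT) != 0;
}
```

Calling both copy helpers with the same arguments should give the same
result for any input, which is the property the shorter form relies on.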

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 01/27] xfs: return a distinct error code value for IGET_INCORE cache misses
  2017-09-21  0:17 ` [PATCH 01/27] xfs: return a distinct error code value for IGET_INCORE cache misses Darrick J. Wong
@ 2017-09-21 14:36   ` Brian Foster
  0 siblings, 0 replies; 51+ messages in thread
From: Brian Foster @ 2017-09-21 14:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Sep 20, 2017 at 05:17:37PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> For an XFS_IGET_INCORE iget operation, if the inode isn't in the cache,
> return ENODATA so that we don't confuse it with the pre-existing ENOENT
> cases (inode is in cache, but freed).
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_icache.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 3422711..43005fb 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -610,7 +610,7 @@ xfs_iget(
>  	} else {
>  		rcu_read_unlock();
>  		if (flags & XFS_IGET_INCORE) {
> -			error = -ENOENT;
> +			error = -ENODATA;
>  			goto out_error_or_again;
>  		}
>  		XFS_STATS_INC(mp, xs_ig_missed);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 02/27] xfs: query the per-AG reservation counters
  2017-09-21  0:17 ` [PATCH 02/27] xfs: query the per-AG reservation counters Darrick J. Wong
@ 2017-09-21 14:36   ` Brian Foster
  2017-09-21 17:30     ` Darrick J. Wong
  0 siblings, 1 reply; 51+ messages in thread
From: Brian Foster @ 2017-09-21 14:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Sep 20, 2017 at 05:17:43PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Establish an ioctl for userspace to query the original and current
> per-AG reservation counts.  This will be used by xfs_scrub to
> check that the vfs counters are at least somewhat sane.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

The code seems fine, but I'm wondering how much of this would remain as
is based on Dave's potential switch of the perag res stuff to be a
hidden reservation. That itself still needs to be posted/reviewed/etc.,
so might it be smarter to defer defining an interface to export this
stuff until the core mechanism is worked out?

That aside, I'm also wondering how/what userspace scrub does with the
overall total values as opposed to individual AG reservation values. Are
the current "ask" values expected to be uniform across each AG..? For
example, what happens if one AG has a bogus "current" value for whatever
reason (perhaps this is more relevant for both values if we do actually
end up with on-disk reservations)?
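
To make the concern concrete, here is a small userspace sketch of how a
filesystem-wide sum can look sane while one AG is off. The struct, field
and function names are hypothetical, and the rule that a current
reservation larger than the mount-time ask is "bogus" is only an assumption
made for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-AG sample; names are illustrative only. */
struct ag_resv_sample {
	uint64_t current_resv;	/* blocks reserved now */
	uint64_t mount_resv;	/* blocks asked for at mount time */
};

/* Sum the samples across all AGs, as the proposed ioctl does. */
static void sum_resv(const struct ag_resv_sample *ags, size_t nr,
		     struct ag_resv_sample *total)
{
	size_t i;

	total->current_resv = 0;
	total->mount_resv = 0;
	for (i = 0; i < nr; i++) {
		total->current_resv += ags[i].current_resv;
		total->mount_resv += ags[i].mount_resv;
	}
}

/*
 * Return the index of the first AG whose current reservation exceeds its
 * mount-time ask (assumed here to be an inconsistency), or -1 if none.
 */
static long first_suspect_ag(const struct ag_resv_sample *ags, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (ags[i].current_resv > ags[i].mount_resv)
			return (long)i;
	return -1;
}
```

With two AGs reporting {10, 100} and {190, 100}, the totals come out as
200 reserved against 200 asked, which passes a total-only comparison even
though the second AG fails the per-AG one.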

Brian

>  fs/xfs/libxfs/xfs_fs.h |   12 ++++++++++++
>  fs/xfs/xfs_fsops.c     |   26 ++++++++++++++++++++++++++
>  fs/xfs/xfs_fsops.h     |    2 ++
>  fs/xfs/xfs_ioctl.c     |   24 ++++++++++++++++++++++++
>  fs/xfs/xfs_ioctl32.c   |    1 +
>  5 files changed, 65 insertions(+)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 8c61f21..2c26c38 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -469,6 +469,17 @@ typedef struct xfs_swapext
>  #define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH		0x2	/* don't flush log nor data */
>  
>  /*
> + * AG reserved block counters
> + */
> +struct xfs_fsop_ag_resblks {
> +	__u32 ar_flags;			/* output flags, none defined now */
> +	__u32 ar_reserved;		/* zero */
> +	__u64 ar_current_resv;		/* blocks reserved now */
> +	__u64 ar_mount_resv;		/* blocks reserved at mount time */
> +	__u64 ar_reserved2[5];		/* zero */
> +};
> +
> +/*
>   * ioctl limits
>   */
>  #ifdef XATTR_LIST_MAX
> @@ -543,6 +554,7 @@ typedef struct xfs_swapext
>  #define XFS_IOC_ATTRMULTI_BY_HANDLE  _IOW ('X', 123, struct xfs_fsop_attrmulti_handlereq)
>  #define XFS_IOC_FSGEOMETRY	     _IOR ('X', 124, struct xfs_fsop_geom)
>  #define XFS_IOC_GOINGDOWN	     _IOR ('X', 125, uint32_t)
> +#define XFS_IOC_GET_AG_RESBLKS	     _IOR ('X', 126, struct xfs_fsop_ag_resblks)
>  /*	XFS_IOC_GETFSUUID ---------- deprecated 140	 */
>  
>  
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 8f22fc5..50fb3a2 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -44,6 +44,7 @@
>  #include "xfs_filestream.h"
>  #include "xfs_rmap.h"
>  #include "xfs_ag_resv.h"
> +#include "xfs_fs.h"
>  
>  /*
>   * File system operations
> @@ -1046,3 +1047,28 @@ xfs_fs_unreserve_ag_blocks(
>  
>  	return error;
>  }
> +
> +/* Query the per-AG reservations to see how many blocks we have reserved. */
> +int
> +xfs_fs_get_ag_reserve_blocks(
> +	struct xfs_mount		*mp,
> +	struct xfs_fsop_ag_resblks	*out)
> +{
> +	struct xfs_ag_resv		*r;
> +	struct xfs_perag		*pag;
> +	xfs_agnumber_t			agno;
> +
> +	memset(out, 0, sizeof(struct xfs_fsop_ag_resblks));
> +	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> +		pag = xfs_perag_get(mp, agno);
> +		r = xfs_perag_resv(pag, XFS_AG_RESV_METADATA);
> +		out->ar_current_resv += r->ar_reserved;
> +		out->ar_mount_resv += r->ar_asked;
> +		r = xfs_perag_resv(pag, XFS_AG_RESV_AGFL);
> +		out->ar_current_resv += r->ar_reserved;
> +		out->ar_mount_resv += r->ar_asked;
> +		xfs_perag_put(pag);
> +	}
> +
> +	return 0;
> +}
> diff --git a/fs/xfs/xfs_fsops.h b/fs/xfs/xfs_fsops.h
> index 2954c13..c8f5e26 100644
> --- a/fs/xfs/xfs_fsops.h
> +++ b/fs/xfs/xfs_fsops.h
> @@ -25,6 +25,8 @@ extern int xfs_fs_counts(xfs_mount_t *mp, xfs_fsop_counts_t *cnt);
>  extern int xfs_reserve_blocks(xfs_mount_t *mp, uint64_t *inval,
>  				xfs_fsop_resblks_t *outval);
>  extern int xfs_fs_goingdown(xfs_mount_t *mp, uint32_t inflags);
> +extern int xfs_fs_get_ag_reserve_blocks(struct xfs_mount *mp,
> +		struct xfs_fsop_ag_resblks *out);
>  
>  extern int xfs_fs_reserve_ag_blocks(struct xfs_mount *mp);
>  extern int xfs_fs_unreserve_ag_blocks(struct xfs_mount *mp);
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 5049e8a..44dc178 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1782,6 +1782,27 @@ xfs_ioc_swapext(
>  	return error;
>  }
>  
> +static int
> +xfs_ioc_get_ag_reserve_blocks(
> +	struct xfs_mount		*mp,
> +	void __user			*arg)
> +{
> +	struct xfs_fsop_ag_resblks	out;
> +	int				error;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	error = xfs_fs_get_ag_reserve_blocks(mp, &out);
> +	if (error)
> +		return error;
> +
> +	if (copy_to_user(arg, &out, sizeof(out)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
>  /*
>   * Note: some of the ioctl's return positive numbers as a
>   * byte count indicating success, such as readlink_by_handle.
> @@ -1987,6 +2008,9 @@ xfs_file_ioctl(
>  		return 0;
>  	}
>  
> +	case XFS_IOC_GET_AG_RESBLKS:
> +		return xfs_ioc_get_ag_reserve_blocks(mp, arg);
> +
>  	case XFS_IOC_FSGROWFSDATA: {
>  		xfs_growfs_data_t in;
>  
> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> index fa0bc4d..e8b4de3 100644
> --- a/fs/xfs/xfs_ioctl32.c
> +++ b/fs/xfs/xfs_ioctl32.c
> @@ -556,6 +556,7 @@ xfs_file_compat_ioctl(
>  	case XFS_IOC_ERROR_INJECTION:
>  	case XFS_IOC_ERROR_CLEARALL:
>  	case FS_IOC_GETFSMAP:
> +	case XFS_IOC_GET_AG_RESBLKS:
>  		return xfs_file_ioctl(filp, cmd, p);
>  #ifndef BROKEN_X86_ALIGNMENT
>  	/* These are handled fine if no alignment issues */
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/27] xfs: create an ioctl to scrub AG metadata
  2017-09-21  0:17 ` [PATCH 03/27] xfs: create an ioctl to scrub AG metadata Darrick J. Wong
@ 2017-09-21 14:36   ` Brian Foster
  2017-09-21 17:35     ` Darrick J. Wong
  0 siblings, 1 reply; 51+ messages in thread
From: Brian Foster @ 2017-09-21 14:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Sep 20, 2017 at 05:17:56PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create an ioctl that can be used to scrub internal filesystem metadata.
> The new ioctl takes the metadata type, an (optional) AG number, an
> (optional) inode number and generation, and a flags argument.  This will
> be used by the upcoming XFS online scrub tool.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Kconfig           |   17 ++++++++++++++
>  fs/xfs/Makefile          |   11 +++++++++
>  fs/xfs/libxfs/xfs_fs.h   |   53 +++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/scrub.c     |   54 ++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/scrub.h     |   25 +++++++++++++++++++++
>  fs/xfs/scrub/trace.c     |   41 +++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/trace.h     |   33 ++++++++++++++++++++++++++++
>  fs/xfs/scrub/xfs_scrub.h |   29 +++++++++++++++++++++++++
>  fs/xfs/xfs_ioctl.c       |   28 ++++++++++++++++++++++++
>  fs/xfs/xfs_ioctl32.c     |    1 +
>  10 files changed, 292 insertions(+)
>  create mode 100644 fs/xfs/scrub/scrub.c
>  create mode 100644 fs/xfs/scrub/scrub.h
>  create mode 100644 fs/xfs/scrub/trace.c
>  create mode 100644 fs/xfs/scrub/trace.h
>  create mode 100644 fs/xfs/scrub/xfs_scrub.h
> 
> 

The code looks sane, though I think I need to understand the error codes
a bit better. Perhaps once I get further into the series..
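
For reference while reading on, the sketch below collects the error codes
that the ioctl can hand back according to patches 03 and 04 as posted. The
helper name and the hint strings are hypothetical; only the errno values
and the conditions that produce them come from the quoted code. Linux errno
names are assumed.

```c
#include <errno.h>

/*
 * Hypothetical helper, not part of the series: map a positive errno from
 * XFS_IOC_SCRUB_METADATA to a short hint, following the checks in the
 * ioctl handler and in xfs_scrub_metadata() as posted.
 */
static const char *scrub_errno_hint(int err)
{
	switch (err) {
	case 0:
		return "scrub ran; inspect sm_flags";
	case EPERM:
		return "caller lacks CAP_SYS_ADMIN";
	case EFAULT:
		return "bad user buffer";
	case ENOTTY:
		return "scrub not compiled in";
	case ESHUTDOWN:
		return "filesystem is shut down";
	case ENOTRECOVERABLE:
		return "mounted norecovery";
	case EINVAL:
		return "bad flags or nonzero reserved fields";
	case ENOENT:
		return "unknown or unsupported metadata type";
	case EOPNOTSUPP:
		return "repair not supported yet";
	default:
		/* Anything else comes from the individual scrubber. */
		return "operational error from the scrubber";
	}
}
```

The split this suggests is that errno reports whether the scrub could run
at all, while findings about the metadata itself travel in the output bits
of sm_flags.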

Brian

> diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
> index 1b98cfa..f42fcf1 100644
> --- a/fs/xfs/Kconfig
> +++ b/fs/xfs/Kconfig
> @@ -71,6 +71,23 @@ config XFS_RT
>  
>  	  If unsure, say N.
>  
> +config XFS_ONLINE_SCRUB
> +	bool "XFS online metadata check support"
> +	default n
> +	depends on XFS_FS
> +	help
> +	  If you say Y here you will be able to check metadata on a
> +	  mounted XFS filesystem.  This feature is intended to reduce
> +	  filesystem downtime by supplementing xfs_repair.  The key
> +	  advantage here is to look for problems proactively so that
> +	  they can be dealt with in a controlled manner.
> +
> +	  This feature is considered EXPERIMENTAL.  Use with caution!
> +
> +	  See the xfs_scrub man page in section 8 for additional information.
> +
> +	  If unsure, say N.
> +
>  config XFS_WARN
>  	bool "XFS Verbose Warnings"
>  	depends on XFS_FS && !XFS_DEBUG
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index dbc33e0..f4312bc 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -138,3 +138,14 @@ xfs-$(CONFIG_XFS_POSIX_ACL)	+= xfs_acl.o
>  xfs-$(CONFIG_SYSCTL)		+= xfs_sysctl.o
>  xfs-$(CONFIG_COMPAT)		+= xfs_ioctl32.o
>  xfs-$(CONFIG_EXPORTFS_BLOCK_OPS)	+= xfs_pnfs.o
> +
> +# online scrub/repair
> +ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
> +
> +# Tracepoints like to blow up, so build that before everything else
> +
> +xfs-y				+= $(addprefix scrub/, \
> +				   trace.o \
> +				   scrub.o \
> +				   )
> +endif
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 2c26c38..a4b4c8c 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -468,6 +468,58 @@ typedef struct xfs_swapext
>  #define XFS_FSOP_GOING_FLAGS_LOGFLUSH		0x1	/* flush log but not data */
>  #define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH		0x2	/* don't flush log nor data */
>  
> +/* metadata scrubbing */
> +struct xfs_scrub_metadata {
> +	__u32 sm_type;		/* What to check? */
> +	__u32 sm_flags;		/* flags; see below. */
> +	__u64 sm_ino;		/* inode number. */
> +	__u32 sm_gen;		/* inode generation. */
> +	__u32 sm_agno;		/* ag number. */
> +	__u64 sm_reserved[5];	/* pad to 64 bytes */
> +};
> +
> +/*
> + * Metadata types and flags for scrub operation.
> + */
> +
> +/* Scrub subcommands. */
> +
> +/* Number of scrub subcommands. */
> +#define XFS_SCRUB_TYPE_NR	0
> +
> +/* i: Repair this metadata. */
> +#define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
> +
> +/* o: Metadata object needs repair. */
> +#define XFS_SCRUB_OFLAG_CORRUPT		(1 << 1)
> +
> +/*
> + * o: Metadata object could be optimized.  It's not corrupt, but
> + *    we could improve on it somehow.
> + */
> +#define XFS_SCRUB_OFLAG_PREEN		(1 << 2)
> +
> +/* o: Cross-referencing failed. */
> +#define XFS_SCRUB_OFLAG_XFAIL		(1 << 3)
> +
> +/* o: Metadata object disagrees with cross-referenced metadata. */
> +#define XFS_SCRUB_OFLAG_XCORRUPT	(1 << 4)
> +
> +/* o: Scan was not complete. */
> +#define XFS_SCRUB_OFLAG_INCOMPLETE	(1 << 5)
> +
> +/* o: Metadata object looked funny but isn't corrupt. */
> +#define XFS_SCRUB_OFLAG_WARNING		(1 << 6)
> +
> +#define XFS_SCRUB_FLAGS_IN	(XFS_SCRUB_IFLAG_REPAIR)
> +#define XFS_SCRUB_FLAGS_OUT	(XFS_SCRUB_OFLAG_CORRUPT | \
> +				 XFS_SCRUB_OFLAG_PREEN | \
> +				 XFS_SCRUB_OFLAG_XFAIL | \
> +				 XFS_SCRUB_OFLAG_XCORRUPT | \
> +				 XFS_SCRUB_OFLAG_INCOMPLETE | \
> +				 XFS_SCRUB_OFLAG_WARNING)
> +#define XFS_SCRUB_FLAGS_ALL	(XFS_SCRUB_FLAGS_IN | XFS_SCRUB_FLAGS_OUT)
> +
>  /*
>   * AG reserved block counters
>   */
> @@ -522,6 +574,7 @@ struct xfs_fsop_ag_resblks {
>  #define XFS_IOC_ZERO_RANGE	_IOW ('X', 57, struct xfs_flock64)
>  #define XFS_IOC_FREE_EOFBLOCKS	_IOR ('X', 58, struct xfs_fs_eofblocks)
>  /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
> +#define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
>  
>  /*
>   * ioctl commands that replace IRIX syssgi()'s
> diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> new file mode 100644
> index 0000000..5db2a6f
> --- /dev/null
> +++ b/fs/xfs/scrub/scrub.c
> @@ -0,0 +1,54 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_sb.h"
> +#include "xfs_inode.h"
> +#include "xfs_alloc.h"
> +#include "xfs_alloc_btree.h"
> +#include "xfs_bmap.h"
> +#include "xfs_bmap_btree.h"
> +#include "xfs_ialloc.h"
> +#include "xfs_ialloc_btree.h"
> +#include "xfs_refcount.h"
> +#include "xfs_refcount_btree.h"
> +#include "xfs_rmap.h"
> +#include "xfs_rmap_btree.h"
> +#include "scrub/xfs_scrub.h"
> +#include "scrub/scrub.h"
> +#include "scrub/trace.h"
> +
> +/* Dispatch metadata scrubbing. */
> +int
> +xfs_scrub_metadata(
> +	struct xfs_inode		*ip,
> +	struct xfs_scrub_metadata	*sm)
> +{
> +	return -EOPNOTSUPP;
> +}
> diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
> new file mode 100644
> index 0000000..eb1cd9d
> --- /dev/null
> +++ b/fs/xfs/scrub/scrub.h
> @@ -0,0 +1,25 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#ifndef __XFS_SCRUB_SCRUB_H__
> +#define __XFS_SCRUB_SCRUB_H__
> +
> +/* Metadata scrubbers */
> +
> +#endif	/* __XFS_SCRUB_SCRUB_H__ */
> diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c
> new file mode 100644
> index 0000000..c59fd41
> --- /dev/null
> +++ b/fs/xfs/scrub/trace.c
> @@ -0,0 +1,41 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_da_format.h"
> +#include "xfs_defer.h"
> +#include "xfs_inode.h"
> +#include "xfs_btree.h"
> +#include "xfs_trans.h"
> +#include "scrub/xfs_scrub.h"
> +#include "scrub/scrub.h"
> +
> +/*
> + * We include this last to have the helpers above available for the trace
> + * event implementations.
> + */
> +#define CREATE_TRACE_POINTS
> +#include "scrub/trace.h"
> diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
> new file mode 100644
> index 0000000..a95a7c8
> --- /dev/null
> +++ b/fs/xfs/scrub/trace.h
> @@ -0,0 +1,33 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM xfs_scrub
> +
> +#if !defined(_TRACE_XFS_SCRUB_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_XFS_SCRUB_TRACE_H
> +
> +#include <linux/tracepoint.h>
> +
> +#endif /* _TRACE_XFS_SCRUB_TRACE_H */
> +
> +#undef TRACE_INCLUDE_PATH
> +#define TRACE_INCLUDE_PATH .
> +#define TRACE_INCLUDE_FILE scrub/trace
> +#include <trace/define_trace.h>
> diff --git a/fs/xfs/scrub/xfs_scrub.h b/fs/xfs/scrub/xfs_scrub.h
> new file mode 100644
> index 0000000..e00e0ea
> --- /dev/null
> +++ b/fs/xfs/scrub/xfs_scrub.h
> @@ -0,0 +1,29 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#ifndef __XFS_SCRUB_H__
> +#define __XFS_SCRUB_H__
> +
> +#ifndef CONFIG_XFS_ONLINE_SCRUB
> +# define xfs_scrub_metadata(ip, sm)	(-ENOTTY)
> +#else
> +int xfs_scrub_metadata(struct xfs_inode *ip, struct xfs_scrub_metadata *sm);
> +#endif /* CONFIG_XFS_ONLINE_SCRUB */
> +
> +#endif	/* __XFS_SCRUB_H__ */
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 44dc178..ab7a7f8 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -44,6 +44,7 @@
>  #include "xfs_btree.h"
>  #include <linux/fsmap.h>
>  #include "xfs_fsmap.h"
> +#include "scrub/xfs_scrub.h"
>  
>  #include <linux/capability.h>
>  #include <linux/cred.h>
> @@ -1702,6 +1703,30 @@ xfs_ioc_getfsmap(
>  	return 0;
>  }
>  
> +STATIC int
> +xfs_ioc_scrub_metadata(
> +	struct xfs_inode		*ip,
> +	void				__user *arg)
> +{
> +	struct xfs_scrub_metadata	scrub;
> +	int				error;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	if (copy_from_user(&scrub, arg, sizeof(scrub)))
> +		return -EFAULT;
> +
> +	error = xfs_scrub_metadata(ip, &scrub);
> +	if (error)
> +		return error;
> +
> +	if (copy_to_user(arg, &scrub, sizeof(scrub)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
>  int
>  xfs_ioc_swapext(
>  	xfs_swapext_t	*sxp)
> @@ -1906,6 +1931,9 @@ xfs_file_ioctl(
>  	case FS_IOC_GETFSMAP:
>  		return xfs_ioc_getfsmap(ip, arg);
>  
> +	case XFS_IOC_SCRUB_METADATA:
> +		return xfs_ioc_scrub_metadata(ip, arg);
> +
>  	case XFS_IOC_FD_TO_HANDLE:
>  	case XFS_IOC_PATH_TO_HANDLE:
>  	case XFS_IOC_PATH_TO_FSHANDLE: {
> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> index e8b4de3..972d4bd 100644
> --- a/fs/xfs/xfs_ioctl32.c
> +++ b/fs/xfs/xfs_ioctl32.c
> @@ -557,6 +557,7 @@ xfs_file_compat_ioctl(
>  	case XFS_IOC_ERROR_CLEARALL:
>  	case FS_IOC_GETFSMAP:
>  	case XFS_IOC_GET_AG_RESBLKS:
> +	case XFS_IOC_SCRUB_METADATA:
>  		return xfs_file_ioctl(filp, cmd, p);
>  #ifndef BROKEN_X86_ALIGNMENT
>  	/* These are handled fine if no alignment issues */
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 04/27] xfs: dispatch metadata scrub subcommands
  2017-09-21  0:18 ` [PATCH 04/27] xfs: dispatch metadata scrub subcommands Darrick J. Wong
@ 2017-09-21 14:37   ` Brian Foster
  2017-09-21 18:08     ` Darrick J. Wong
  0 siblings, 1 reply; 51+ messages in thread
From: Brian Foster @ 2017-09-21 14:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Sep 20, 2017 at 05:18:02PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create structures needed to hold scrubbing context and dispatch incoming
> commands to the individual scrubbers.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/scrub/scrub.c |  172 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/scrub.h |   19 ++++++
>  fs/xfs/scrub/trace.h |   43 +++++++++++++
>  3 files changed, 233 insertions(+), 1 deletion(-)
> 
> 
> diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> index 5db2a6f..7cf518e 100644
> --- a/fs/xfs/scrub/scrub.c
> +++ b/fs/xfs/scrub/scrub.c
> @@ -44,11 +44,181 @@
>  #include "scrub/scrub.h"
>  #include "scrub/trace.h"
>  
> +/*
> + * Online Scrub and Repair
> + *
> + * Traditionally, XFS (the kernel driver) did not know how to check or
> + * repair on-disk data structures.  That task was left to the xfs_check
> + * and xfs_repair tools, both of which require taking the filesystem
> + * offline for a thorough but time consuming examination.  Online
> + * scrub & repair, on the other hand, enables us to check the metadata
> + * for obvious errors while carefully stepping around the filesystem's
> + * ongoing operations, locking rules, etc.
> + *
> + * Given that most XFS metadata consist of records stored in a btree,
> + * most of the checking functions iterate the btree blocks themselves
> + * looking for irregularities.  When a record block is encountered, each
> + * record can be checked for obviously bad values.  Record values can
> + * also be cross-referenced against other btrees to look for potential
> + * misunderstandings between pieces of metadata.
> + *
> + * It is expected that the checkers responsible for per-AG metadata
> + * structures will lock the AG headers (AGI, AGF, AGFL), iterate the
> + * metadata structure, and perform any relevant cross-referencing before
> + * unlocking the AG and returning the results to userspace.  These
> + * scrubbers must not keep an AG locked for too long to avoid tying up
> + * the block and inode allocators.
> + *
> + * Block maps and b-trees rooted in an inode present a special challenge
> + * because they can involve extents from any AG.  The general scrubber
> + * structure of lock -> check -> xref -> unlock still holds, but AG
> + * locking order rules /must/ be obeyed to avoid deadlocks.  The
> + * ordering rule, of course, is that we must lock in increasing AG
> + * order.  Helper functions are provided to track which AG headers we've
> + * already locked.  If we detect an imminent locking order violation, we
> + * can signal a potential deadlock, in which case the scrubber can jump
> + * out to the top level, lock all the AGs in order, and retry the scrub.
> + *
> + * For file data (directories, extended attributes, symlinks) scrub, we
> + * can simply lock the inode and walk the data.  For btree data
> + * (directories and attributes) we follow the same btree-scrubbing
> + * strategy outlined previously to check the records.
> + *
> + * We use a bit of trickery with transactions to avoid buffer deadlocks
> + * if there is a cycle in the metadata.  The basic problem is that
> + * travelling down a btree involves locking the current buffer at each
> + * tree level.  If a pointer should somehow point back to a buffer that
> + * we've already examined, we will deadlock due to the second buffer
> + * locking attempt.  Note however that grabbing a buffer in transaction
> + * context links the locked buffer to the transaction.  If we try to
> + * re-grab the buffer in the context of the same transaction, we avoid
> + * the second lock attempt and continue.  Between the verifier and the
> + * scrubber, something will notice that something is amiss and report
> + * the corruption.  Therefore, each scrubber will allocate an empty
> + * transaction, attach buffers to it, and cancel the transaction at the
> + * end of the scrub run.  Cancelling a non-dirty transaction simply
> + * unlocks the buffers.
> + *
> + * There are four pieces of data that scrub can communicate to
> + * userspace.  The first is the error code (errno), which can be used to
> + * communicate operational errors in performing the scrub.  There are
> + * also three flags that can be set in the scrub context.  If the data
> + * structure itself is corrupt, the CORRUPT flag will be set.  If
> + * the metadata is correct but otherwise suboptimal, the PREEN flag
> + * will be set.

Did you mean to describe other flags here?
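
For context, patch 03 defines six output flags, not three. A minimal
userspace sketch of decoding them follows; the bit positions are copied
from the xfs_fs.h hunk in patch 03, while the table, the helper name and
the "A|B" output format are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct oflag_name {
	uint32_t	flag;
	const char	*name;
};

/* Bit positions follow the XFS_SCRUB_OFLAG_* defines in patch 03. */
static const struct oflag_name oflag_names[] = {
	{ 1u << 1, "CORRUPT" },
	{ 1u << 2, "PREEN" },
	{ 1u << 3, "XFAIL" },
	{ 1u << 4, "XCORRUPT" },
	{ 1u << 5, "INCOMPLETE" },
	{ 1u << 6, "WARNING" },
};

/*
 * Hypothetical helper: write the names of the set output flags into buf,
 * separated by '|'. Bits that are not output flags are ignored.
 */
static void describe_oflags(uint32_t flags, char *buf, size_t len)
{
	size_t i, used = 0;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < sizeof(oflag_names) / sizeof(oflag_names[0]); i++) {
		int n;

		if (!(flags & oflag_names[i].flag))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", oflag_names[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			/* Out of room: drop the partial name and stop. */
			buf[used] = '\0';
			break;
		}
		used += (size_t)n;
	}
}
```

A caller would pass the sm_flags value returned by the ioctl along with a
buffer large enough for all six names.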

> + */
> +
> +/* Scrub setup and teardown */
> +
> +/* Free all the resources and finish the transactions. */
> +STATIC int
> +xfs_scrub_teardown(
> +	struct xfs_scrub_context	*sc,
> +	int				error)

What's the purpose of passing error just to return it?

> +{
> +	if (sc->tp) {
> +		xfs_trans_cancel(sc->tp);
> +		sc->tp = NULL;
> +	}
> +	return error;
> +}
> +
> +/* Scrubbing dispatch. */
> +
> +static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
> +};
> +
>  /* Dispatch metadata scrubbing. */
>  int
>  xfs_scrub_metadata(
>  	struct xfs_inode		*ip,
>  	struct xfs_scrub_metadata	*sm)
>  {
> -	return -EOPNOTSUPP;
> +	struct xfs_scrub_context	sc;
> +	struct xfs_mount		*mp = ip->i_mount;
> +	const struct xfs_scrub_meta_ops	*ops;
> +	bool				try_harder = false;
> +	int				error = 0;
> +
> +	trace_xfs_scrub_start(ip, sm, error);
> +
> +	/* Forbidden if we are shut down or mounted norecovery. */
> +	error = -ESHUTDOWN;
> +	if (XFS_FORCED_SHUTDOWN(mp))
> +		goto out;
> +	error = -ENOTRECOVERABLE;
> +	if (mp->m_flags & XFS_MOUNT_NORECOVERY)
> +		goto out;
> +
> +	/* Check our inputs. */
> +	error = -EINVAL;
> +	sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT;
> +	if (sm->sm_flags & ~XFS_SCRUB_FLAGS_IN)
> +		goto out;
> +	if (memchr_inv(sm->sm_reserved, 0, sizeof(sm->sm_reserved)))
> +		goto out;
> +
> +	/* Do we know about this type of metadata? */
> +	error = -ENOENT;
> +	if (sm->sm_type >= XFS_SCRUB_TYPE_NR)
> +		goto out;
> +	ops = &meta_scrub_ops[sm->sm_type];
> +	if (ops->scrub == NULL)
> +		goto out;
> +
> +	/* Does this fs even support this type of metadata? */
> +	if (ops->has && !ops->has(&mp->m_sb))
> +		goto out;
> +
> +	/* We don't know how to repair anything yet. */
> +	error = -EOPNOTSUPP;
> +	if (sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
> +		goto out;
> +
> +	/* This isn't a stable feature.  Use with care. */
> +	{
> +		static bool warned;
> +
> +		if (!warned)
> +			xfs_alert(mp,
> +	"EXPERIMENTAL online scrub feature in use. Use at your own risk!");
> +		warned = true;
> +	}
> +
> +retry_op:
> +	/* Set up for the operation. */
> +	memset(&sc, 0, sizeof(sc));
> +	sc.mp = ip->i_mount;
> +	sc.sm = sm;
> +	sc.ops = ops;
> +	sc.try_harder = try_harder;
> +	error = sc.ops->setup(&sc, ip);
> +	if (error)
> +		goto out_teardown;
> +
> +	/* Scrub for errors. */
> +	error = sc.ops->scrub(&sc);
> +	if (!try_harder && error == -EDEADLOCK) {
> +		/*
> +		 * Scrubbers return -EDEADLOCK to mean 'try harder'.
> +		 * Tear down everything we hold, then set up again with
> +		 * preparation for worst-case scenarios.
> +		 */
> +		error = xfs_scrub_teardown(&sc, 0);
> +		if (error)
> +			goto out;
> +		try_harder = true;
> +		goto retry_op;
> +	} else if (error)
> +		goto out_teardown;
> +
> +	if (sc.sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
> +			       XFS_SCRUB_OFLAG_XCORRUPT))
> +		xfs_alert_ratelimited(mp, "Corruption detected during scrub.");
> +
> +out_teardown:
> +	error = xfs_scrub_teardown(&sc, error);
> +out:
> +	trace_xfs_scrub_done(ip, sm, error);
> +	return error;
>  }
> diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
> index eb1cd9d..b271b2a 100644
> --- a/fs/xfs/scrub/scrub.h
> +++ b/fs/xfs/scrub/scrub.h
> @@ -20,6 +20,25 @@
>  #ifndef __XFS_SCRUB_SCRUB_H__
>  #define __XFS_SCRUB_SCRUB_H__
>  
> +struct xfs_scrub_context;
> +
> +struct xfs_scrub_meta_ops {
> +	int		(*setup)(struct xfs_scrub_context *,
> +				 struct xfs_inode *);
> +	int		(*scrub)(struct xfs_scrub_context *);
> +	bool		(*has)(struct xfs_sb *);

I assume 'has' is to identify whether a particular mount supports a
particular feature. I suppose a better name would be nice here, or
perhaps just a comment to outline the purpose of each callout.
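
A rough sketch of what per-callout comments might look like, going only by
how the dispatcher quoted above uses each pointer.  The comment wording, the
scrub_type_present() helper, and the example_scrub() stub are hypothetical
and not part of the patch:

```c
#include <stdbool.h>
#include <stddef.h>

struct xfs_scrub_context;
struct xfs_inode;
struct xfs_sb;

struct xfs_scrub_meta_ops {
	/* Acquire whatever resources the scrubber needs before it runs. */
	int	(*setup)(struct xfs_scrub_context *, struct xfs_inode *);

	/* Examine the metadata for errors. */
	int	(*scrub)(struct xfs_scrub_context *);

	/*
	 * Does this filesystem have this type of metadata at all?
	 * A NULL pointer is treated as "always present".
	 */
	bool	(*has)(struct xfs_sb *);
};

/* Hypothetical stub scrubber, only here so the helper can be exercised. */
static int example_scrub(struct xfs_scrub_context *sc)
{
	(void)sc;
	return 0;
}

/*
 * Hypothetical helper mirroring the two checks the dispatcher makes
 * before it returns -ENOENT: a scrub function must exist, and ->has,
 * if provided, must say yes.
 */
static bool scrub_type_present(const struct xfs_scrub_meta_ops *ops,
			       struct xfs_sb *sb)
{
	return ops->scrub != NULL && (!ops->has || ops->has(sb));
}
```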

Brian

> +};
> +
> +struct xfs_scrub_context {
> +	/* General scrub state. */
> +	struct xfs_mount		*mp;
> +	struct xfs_scrub_metadata	*sm;
> +	const struct xfs_scrub_meta_ops	*ops;
> +	struct xfs_trans		*tp;
> +	struct xfs_inode		*ip;
> +	bool				try_harder;
> +};
> +
>  /* Metadata scrubbers */
>  
>  #endif	/* __XFS_SCRUB_SCRUB_H__ */
> diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
> index a95a7c8..688517e 100644
> --- a/fs/xfs/scrub/trace.h
> +++ b/fs/xfs/scrub/trace.h
> @@ -25,6 +25,49 @@
>  
>  #include <linux/tracepoint.h>
>  
> +DECLARE_EVENT_CLASS(xfs_scrub_class,
> +	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
> +		 int error),
> +	TP_ARGS(ip, sm, error),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_ino_t, ino)
> +		__field(unsigned int, type)
> +		__field(xfs_agnumber_t, agno)
> +		__field(xfs_ino_t, inum)
> +		__field(unsigned int, gen)
> +		__field(unsigned int, flags)
> +		__field(int, error)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = ip->i_mount->m_super->s_dev;
> +		__entry->ino = ip->i_ino;
> +		__entry->type = sm->sm_type;
> +		__entry->agno = sm->sm_agno;
> +		__entry->inum = sm->sm_ino;
> +		__entry->gen = sm->sm_gen;
> +		__entry->flags = sm->sm_flags;
> +		__entry->error = error;
> +	),
> +	TP_printk("dev %d:%d ino %llu type %u agno %u inum %llu gen %u flags 0x%x error %d",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->ino,
> +		  __entry->type,
> +		  __entry->agno,
> +		  __entry->inum,
> +		  __entry->gen,
> +		  __entry->flags,
> +		  __entry->error)
> +)
> +#define DEFINE_SCRUB_EVENT(name) \
> +DEFINE_EVENT(xfs_scrub_class, name, \
> +	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm, \
> +		 int error), \
> +	TP_ARGS(ip, sm, error))
> +
> +DEFINE_SCRUB_EVENT(xfs_scrub_start);
> +DEFINE_SCRUB_EVENT(xfs_scrub_done);
> +
>  #endif /* _TRACE_XFS_SCRUB_TRACE_H */
>  
>  #undef TRACE_INCLUDE_PATH
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH 02/27] xfs: query the per-AG reservation counters
  2017-09-21 14:36   ` Brian Foster
@ 2017-09-21 17:30     ` Darrick J. Wong
  0 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21 17:30 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Sep 21, 2017 at 10:36:31AM -0400, Brian Foster wrote:
> On Wed, Sep 20, 2017 at 05:17:43PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Establish an ioctl for userspace to query the original and current
> > per-AG reservation counts.  This will be used by xfs_scrub to
> > check that the vfs counters are at least somewhat sane.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> 
> The code seems fine, but I'm wondering how much of this would remain as
> is based on Dave's potential switch of the perag res stuff to be a
> hidden reservation. That itself still needs to be posted/reviewed/etc.,
> so might it be smarter to defer defining an interface to export this
> stuff until the core mechanism is worked out?

I think Dave's vision is that we should permanently steal the AG
reservations from the max fs size:

(device size - static metadata - static per-ag reservations)

In which case the ioctl isn't needed, since the statfs counters would
simply never report the per-AG reservations at all.

> That aside, I'm also wondering how/what userspace scrub does with the
> overall total values as opposed to individual AG reservation values.

To a rough approximation, xfs_scrub (phase 7) compares the fs geometry
data (used blocks, used inodes) against what it found via getfsmap and
bulkstat and warns about a potentially insufficient scrub if the counts
are off by more than 10% from what the fs reports.  The per-AG summary
counter ar_current_resv is added to the statvfs.f_bfree counter since
reserved blocks don't show up in getfsmap.

Since per-AG reservations never use more than 9% of the fs space in the
worst case (1k blocks), and usually 2% (4k blocks), the current 10%
tolerance built into xfs_scrub is probably good enough for now.
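
As a minimal sketch of the comparison described above (this is not the
actual xfs_scrub code; the helper names and the integer arithmetic are
assumptions made for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: used blocks as the filesystem reports them, with
 * the per-AG reservation (ar_current_resv) added to f_bfree because
 * reserved blocks do not show up in getfsmap.
 */
static uint64_t reported_used_blocks(uint64_t f_blocks, uint64_t f_bfree,
				     uint64_t ar_current_resv)
{
	uint64_t free_blocks = f_bfree + ar_current_resv;

	return f_blocks > free_blocks ? f_blocks - free_blocks : 0;
}

/*
 * Hypothetical helper: true if the observed count is within pct percent
 * of the reported count.  The multiplication can overflow for counts
 * near 2^64; a real implementation would need to guard against that.
 */
static bool within_tolerance(uint64_t reported, uint64_t observed,
			     unsigned int pct)
{
	uint64_t diff = reported > observed ? reported - observed
					    : observed - reported;

	return diff * 100 <= reported * pct;
}
```

With a 10% tolerance, an observed count of 1050 against a reported 1000
would pass, while 1200 would trigger the warning.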

Anyway, I plan to withdraw this patch.

> Are the current "ask" values expected to be uniform across each AG..?

scrub doesn't care.

> For example, what happens if one AG has a bogus "current" value for
> whatever reason (perhaps this is more relevant for both values if we
> do actually end up with on-disk reservations)?

If it exceeds scrub's tolerances it'll warn about the discrepancy.
There's probably no way to fix the summary counters without an
xfs_repair run, unless it's acceptable to freeze the whole fs.

Not going to try to do that. :)

--D

> 
> Brian
> 
> >  fs/xfs/libxfs/xfs_fs.h |   12 ++++++++++++
> >  fs/xfs/xfs_fsops.c     |   26 ++++++++++++++++++++++++++
> >  fs/xfs/xfs_fsops.h     |    2 ++
> >  fs/xfs/xfs_ioctl.c     |   24 ++++++++++++++++++++++++
> >  fs/xfs/xfs_ioctl32.c   |    1 +
> >  5 files changed, 65 insertions(+)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> > index 8c61f21..2c26c38 100644
> > --- a/fs/xfs/libxfs/xfs_fs.h
> > +++ b/fs/xfs/libxfs/xfs_fs.h
> > @@ -469,6 +469,17 @@ typedef struct xfs_swapext
> >  #define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH		0x2	/* don't flush log nor data */
> >  
> >  /*
> > + * AG reserved block counters
> > + */
> > +struct xfs_fsop_ag_resblks {
> > +	__u32 ar_flags;			/* output flags, none defined now */
> > +	__u32 ar_reserved;		/* zero */
> > +	__u64 ar_current_resv;		/* blocks reserved now */
> > +	__u64 ar_mount_resv;		/* blocks reserved at mount time */
> > +	__u64 ar_reserved2[5];		/* zero */
> > +};
> > +
> > +/*
> >   * ioctl limits
> >   */
> >  #ifdef XATTR_LIST_MAX
> > @@ -543,6 +554,7 @@ typedef struct xfs_swapext
> >  #define XFS_IOC_ATTRMULTI_BY_HANDLE  _IOW ('X', 123, struct xfs_fsop_attrmulti_handlereq)
> >  #define XFS_IOC_FSGEOMETRY	     _IOR ('X', 124, struct xfs_fsop_geom)
> >  #define XFS_IOC_GOINGDOWN	     _IOR ('X', 125, uint32_t)
> > +#define XFS_IOC_GET_AG_RESBLKS	     _IOR ('X', 126, struct xfs_fsop_ag_resblks)
> >  /*	XFS_IOC_GETFSUUID ---------- deprecated 140	 */
> >  
> >  
> > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > index 8f22fc5..50fb3a2 100644
> > --- a/fs/xfs/xfs_fsops.c
> > +++ b/fs/xfs/xfs_fsops.c
> > @@ -44,6 +44,7 @@
> >  #include "xfs_filestream.h"
> >  #include "xfs_rmap.h"
> >  #include "xfs_ag_resv.h"
> > +#include "xfs_fs.h"
> >  
> >  /*
> >   * File system operations
> > @@ -1046,3 +1047,28 @@ xfs_fs_unreserve_ag_blocks(
> >  
> >  	return error;
> >  }
> > +
> > +/* Query the per-AG reservations to see how many blocks we have reserved. */
> > +int
> > +xfs_fs_get_ag_reserve_blocks(
> > +	struct xfs_mount		*mp,
> > +	struct xfs_fsop_ag_resblks	*out)
> > +{
> > +	struct xfs_ag_resv		*r;
> > +	struct xfs_perag		*pag;
> > +	xfs_agnumber_t			agno;
> > +
> > +	memset(out, 0, sizeof(struct xfs_fsop_ag_resblks));
> > +	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> > +		pag = xfs_perag_get(mp, agno);
> > +		r = xfs_perag_resv(pag, XFS_AG_RESV_METADATA);
> > +		out->ar_current_resv += r->ar_reserved;
> > +		out->ar_mount_resv += r->ar_asked;
> > +		r = xfs_perag_resv(pag, XFS_AG_RESV_AGFL);
> > +		out->ar_current_resv += r->ar_reserved;
> > +		out->ar_mount_resv += r->ar_asked;
> > +		xfs_perag_put(pag);
> > +	}
> > +
> > +	return 0;
> > +}
> > diff --git a/fs/xfs/xfs_fsops.h b/fs/xfs/xfs_fsops.h
> > index 2954c13..c8f5e26 100644
> > --- a/fs/xfs/xfs_fsops.h
> > +++ b/fs/xfs/xfs_fsops.h
> > @@ -25,6 +25,8 @@ extern int xfs_fs_counts(xfs_mount_t *mp, xfs_fsop_counts_t *cnt);
> >  extern int xfs_reserve_blocks(xfs_mount_t *mp, uint64_t *inval,
> >  				xfs_fsop_resblks_t *outval);
> >  extern int xfs_fs_goingdown(xfs_mount_t *mp, uint32_t inflags);
> > +extern int xfs_fs_get_ag_reserve_blocks(struct xfs_mount *mp,
> > +		struct xfs_fsop_ag_resblks *out);
> >  
> >  extern int xfs_fs_reserve_ag_blocks(struct xfs_mount *mp);
> >  extern int xfs_fs_unreserve_ag_blocks(struct xfs_mount *mp);
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index 5049e8a..44dc178 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -1782,6 +1782,27 @@ xfs_ioc_swapext(
> >  	return error;
> >  }
> >  
> > +static int
> > +xfs_ioc_get_ag_reserve_blocks(
> > +	struct xfs_mount		*mp,
> > +	void __user			*arg)
> > +{
> > +	struct xfs_fsop_ag_resblks	out;
> > +	int				error;
> > +
> > +	if (!capable(CAP_SYS_ADMIN))
> > +		return -EPERM;
> > +
> > +	error = xfs_fs_get_ag_reserve_blocks(mp, &out);
> > +	if (error)
> > +		return error;
> > +
> > +	if (copy_to_user(arg, &out, sizeof(out)))
> > +		return -EFAULT;
> > +
> > +	return 0;
> > +}
> > +
> >  /*
> >   * Note: some of the ioctl's return positive numbers as a
> >   * byte count indicating success, such as readlink_by_handle.
> > @@ -1987,6 +2008,9 @@ xfs_file_ioctl(
> >  		return 0;
> >  	}
> >  
> > +	case XFS_IOC_GET_AG_RESBLKS:
> > +		return xfs_ioc_get_ag_reserve_blocks(mp, arg);
> > +
> >  	case XFS_IOC_FSGROWFSDATA: {
> >  		xfs_growfs_data_t in;
> >  
> > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> > index fa0bc4d..e8b4de3 100644
> > --- a/fs/xfs/xfs_ioctl32.c
> > +++ b/fs/xfs/xfs_ioctl32.c
> > @@ -556,6 +556,7 @@ xfs_file_compat_ioctl(
> >  	case XFS_IOC_ERROR_INJECTION:
> >  	case XFS_IOC_ERROR_CLEARALL:
> >  	case FS_IOC_GETFSMAP:
> > +	case XFS_IOC_GET_AG_RESBLKS:
> >  		return xfs_file_ioctl(filp, cmd, p);
> >  #ifndef BROKEN_X86_ALIGNMENT
> >  	/* These are handled fine if no alignment issues */
> > 


* Re: [PATCH 03/27] xfs: create an ioctl to scrub AG metadata
  2017-09-21 14:36   ` Brian Foster
@ 2017-09-21 17:35     ` Darrick J. Wong
  2017-09-21 17:52       ` Brian Foster
  0 siblings, 1 reply; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21 17:35 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Sep 21, 2017 at 10:36:39AM -0400, Brian Foster wrote:
> On Wed, Sep 20, 2017 at 05:17:56PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create an ioctl that can be used to scrub internal filesystem metadata.
> > The new ioctl takes the metadata type, an (optional) AG number, an
> > (optional) inode number and generation, and a flags argument.  This will
> > be used by the upcoming XFS online scrub tool.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/Kconfig           |   17 ++++++++++++++
> >  fs/xfs/Makefile          |   11 +++++++++
> >  fs/xfs/libxfs/xfs_fs.h   |   53 +++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/scrub/scrub.c     |   54 ++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/scrub/scrub.h     |   25 +++++++++++++++++++++
> >  fs/xfs/scrub/trace.c     |   41 +++++++++++++++++++++++++++++++++++
> >  fs/xfs/scrub/trace.h     |   33 ++++++++++++++++++++++++++++
> >  fs/xfs/scrub/xfs_scrub.h |   29 +++++++++++++++++++++++++
> >  fs/xfs/xfs_ioctl.c       |   28 ++++++++++++++++++++++++
> >  fs/xfs/xfs_ioctl32.c     |    1 +
> >  10 files changed, 292 insertions(+)
> >  create mode 100644 fs/xfs/scrub/scrub.c
> >  create mode 100644 fs/xfs/scrub/scrub.h
> >  create mode 100644 fs/xfs/scrub/trace.c
> >  create mode 100644 fs/xfs/scrub/trace.h
> >  create mode 100644 fs/xfs/scrub/xfs_scrub.h
> > 
> > 
> 
> The code looks sane, though I think I need to understand the error codes
> a bit better. Perhaps once I get further into the series..

Yes, the ioctl needs documentation, though I'm unclear on where an
appropriate place to put it would be.  The ioctl is not intended for
general consumption, so man-pages.git seems inappropriate.  I was thinking
either in xfs_fs.h directly, or perhaps in xfsprogs' man pages?

(We don't seem to document the xfs ioctls afaict?)
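
Until that documentation exists, here is a rough userspace sketch of how a
caller might drive the ioctl.  The structure, flag values and ioctl number
are copied from the xfs_fs.h hunk below (with stdint types in place of
__u32/__u64); the scrub_one() wrapper and its return convention are
hypothetical.  Note that with this patch alone the kernel side always
returns -EOPNOTSUPP, and the structure is only copied back on success:

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Copied from the patch so this sketch stands alone; 64 bytes total. */
struct xfs_scrub_metadata {
	uint32_t sm_type;		/* What to check? */
	uint32_t sm_flags;		/* flags; see below. */
	uint64_t sm_ino;		/* inode number. */
	uint32_t sm_gen;		/* inode generation. */
	uint32_t sm_agno;		/* ag number. */
	uint64_t sm_reserved[5];	/* must be zero */
};

#define XFS_SCRUB_OFLAG_CORRUPT		(1 << 1)
#define XFS_SCRUB_OFLAG_XCORRUPT	(1 << 4)
#define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)

/*
 * Hypothetical wrapper: scrub one AG-level metadata object.
 * Returns 0 if nothing was flagged, 1 if corruption was flagged, or
 * -1 with errno set if the ioctl itself failed.
 */
static int scrub_one(int fd, uint32_t type, uint32_t agno)
{
	struct xfs_scrub_metadata sm;

	/* Reserved fields and unused flags must be zero. */
	memset(&sm, 0, sizeof(sm));
	sm.sm_type = type;
	sm.sm_agno = agno;

	if (ioctl(fd, XFS_IOC_SCRUB_METADATA, &sm) < 0)
		return -1;

	return (sm.sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
			       XFS_SCRUB_OFLAG_XCORRUPT)) ? 1 : 0;
}
```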

--D

> 
> Brian
> 
> > diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
> > index 1b98cfa..f42fcf1 100644
> > --- a/fs/xfs/Kconfig
> > +++ b/fs/xfs/Kconfig
> > @@ -71,6 +71,23 @@ config XFS_RT
> >  
> >  	  If unsure, say N.
> >  
> > +config XFS_ONLINE_SCRUB
> > +	bool "XFS online metadata check support"
> > +	default n
> > +	depends on XFS_FS
> > +	help
> > +	  If you say Y here you will be able to check metadata on a
> > +	  mounted XFS filesystem.  This feature is intended to reduce
> > +	  filesystem downtime by supplementing xfs_repair.  The key
> > +	  advantage here is to look for problems proactively so that
> > +	  they can be dealt with in a controlled manner.
> > +
> > +	  This feature is considered EXPERIMENTAL.  Use with caution!
> > +
> > +	  See the xfs_scrub man page in section 8 for additional information.
> > +
> > +	  If unsure, say N.
> > +
> >  config XFS_WARN
> >  	bool "XFS Verbose Warnings"
> >  	depends on XFS_FS && !XFS_DEBUG
> > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > index dbc33e0..f4312bc 100644
> > --- a/fs/xfs/Makefile
> > +++ b/fs/xfs/Makefile
> > @@ -138,3 +138,14 @@ xfs-$(CONFIG_XFS_POSIX_ACL)	+= xfs_acl.o
> >  xfs-$(CONFIG_SYSCTL)		+= xfs_sysctl.o
> >  xfs-$(CONFIG_COMPAT)		+= xfs_ioctl32.o
> >  xfs-$(CONFIG_EXPORTFS_BLOCK_OPS)	+= xfs_pnfs.o
> > +
> > +# online scrub/repair
> > +ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
> > +
> > +# Tracepoints like to blow up, so build that before everything else
> > +
> > +xfs-y				+= $(addprefix scrub/, \
> > +				   trace.o \
> > +				   scrub.o \
> > +				   )
> > +endif
> > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> > index 2c26c38..a4b4c8c 100644
> > --- a/fs/xfs/libxfs/xfs_fs.h
> > +++ b/fs/xfs/libxfs/xfs_fs.h
> > @@ -468,6 +468,58 @@ typedef struct xfs_swapext
> >  #define XFS_FSOP_GOING_FLAGS_LOGFLUSH		0x1	/* flush log but not data */
> >  #define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH		0x2	/* don't flush log nor data */
> >  
> > +/* metadata scrubbing */
> > +struct xfs_scrub_metadata {
> > +	__u32 sm_type;		/* What to check? */
> > +	__u32 sm_flags;		/* flags; see below. */
> > +	__u64 sm_ino;		/* inode number. */
> > +	__u32 sm_gen;		/* inode generation. */
> > +	__u32 sm_agno;		/* ag number. */
> > +	__u64 sm_reserved[5];	/* pad to 64 bytes */
> > +};
> > +
> > +/*
> > + * Metadata types and flags for scrub operation.
> > + */
> > +
> > +/* Scrub subcommands. */
> > +
> > +/* Number of scrub subcommands. */
> > +#define XFS_SCRUB_TYPE_NR	0
> > +
> > +/* i: Repair this metadata. */
> > +#define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
> > +
> > +/* o: Metadata object needs repair. */
> > +#define XFS_SCRUB_OFLAG_CORRUPT		(1 << 1)
> > +
> > +/*
> > + * o: Metadata object could be optimized.  It's not corrupt, but
> > + *    we could improve on it somehow.
> > + */
> > +#define XFS_SCRUB_OFLAG_PREEN		(1 << 2)
> > +
> > +/* o: Cross-referencing failed. */
> > +#define XFS_SCRUB_OFLAG_XFAIL		(1 << 3)
> > +
> > +/* o: Metadata object disagrees with cross-referenced metadata. */
> > +#define XFS_SCRUB_OFLAG_XCORRUPT	(1 << 4)
> > +
> > +/* o: Scan was not complete. */
> > +#define XFS_SCRUB_OFLAG_INCOMPLETE	(1 << 5)
> > +
> > +/* o: Metadata object looked funny but isn't corrupt. */
> > +#define XFS_SCRUB_OFLAG_WARNING		(1 << 6)
> > +
> > +#define XFS_SCRUB_FLAGS_IN	(XFS_SCRUB_IFLAG_REPAIR)
> > +#define XFS_SCRUB_FLAGS_OUT	(XFS_SCRUB_OFLAG_CORRUPT | \
> > +				 XFS_SCRUB_OFLAG_PREEN | \
> > +				 XFS_SCRUB_OFLAG_XFAIL | \
> > +				 XFS_SCRUB_OFLAG_XCORRUPT | \
> > +				 XFS_SCRUB_OFLAG_INCOMPLETE | \
> > +				 XFS_SCRUB_OFLAG_WARNING)
> > +#define XFS_SCRUB_FLAGS_ALL	(XFS_SCRUB_FLAGS_IN | XFS_SCRUB_FLAGS_OUT)
> > +
> >  /*
> >   * AG reserved block counters
> >   */
> > @@ -522,6 +574,7 @@ struct xfs_fsop_ag_resblks {
> >  #define XFS_IOC_ZERO_RANGE	_IOW ('X', 57, struct xfs_flock64)
> >  #define XFS_IOC_FREE_EOFBLOCKS	_IOR ('X', 58, struct xfs_fs_eofblocks)
> >  /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
> > +#define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
> >  
> >  /*
> >   * ioctl commands that replace IRIX syssgi()'s
> > diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> > new file mode 100644
> > index 0000000..5db2a6f
> > --- /dev/null
> > +++ b/fs/xfs/scrub/scrub.c
> > @@ -0,0 +1,54 @@
> > +/*
> > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > + *
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation,
> > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > + */
> > +#include "xfs.h"
> > +#include "xfs_fs.h"
> > +#include "xfs_shared.h"
> > +#include "xfs_format.h"
> > +#include "xfs_trans_resv.h"
> > +#include "xfs_mount.h"
> > +#include "xfs_defer.h"
> > +#include "xfs_btree.h"
> > +#include "xfs_bit.h"
> > +#include "xfs_log_format.h"
> > +#include "xfs_trans.h"
> > +#include "xfs_sb.h"
> > +#include "xfs_inode.h"
> > +#include "xfs_alloc.h"
> > +#include "xfs_alloc_btree.h"
> > +#include "xfs_bmap.h"
> > +#include "xfs_bmap_btree.h"
> > +#include "xfs_ialloc.h"
> > +#include "xfs_ialloc_btree.h"
> > +#include "xfs_refcount.h"
> > +#include "xfs_refcount_btree.h"
> > +#include "xfs_rmap.h"
> > +#include "xfs_rmap_btree.h"
> > +#include "scrub/xfs_scrub.h"
> > +#include "scrub/scrub.h"
> > +#include "scrub/trace.h"
> > +
> > +/* Dispatch metadata scrubbing. */
> > +int
> > +xfs_scrub_metadata(
> > +	struct xfs_inode		*ip,
> > +	struct xfs_scrub_metadata	*sm)
> > +{
> > +	return -EOPNOTSUPP;
> > +}
> > diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
> > new file mode 100644
> > index 0000000..eb1cd9d
> > --- /dev/null
> > +++ b/fs/xfs/scrub/scrub.h
> > @@ -0,0 +1,25 @@
> > +/*
> > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > + *
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation,
> > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > + */
> > +#ifndef __XFS_SCRUB_SCRUB_H__
> > +#define __XFS_SCRUB_SCRUB_H__
> > +
> > +/* Metadata scrubbers */
> > +
> > +#endif	/* __XFS_SCRUB_SCRUB_H__ */
> > diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c
> > new file mode 100644
> > index 0000000..c59fd41
> > --- /dev/null
> > +++ b/fs/xfs/scrub/trace.c
> > @@ -0,0 +1,41 @@
> > +/*
> > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > + *
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation,
> > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > + */
> > +#include "xfs.h"
> > +#include "xfs_fs.h"
> > +#include "xfs_shared.h"
> > +#include "xfs_format.h"
> > +#include "xfs_log_format.h"
> > +#include "xfs_trans_resv.h"
> > +#include "xfs_mount.h"
> > +#include "xfs_defer.h"
> > +#include "xfs_da_format.h"
> > +#include "xfs_defer.h"
> > +#include "xfs_inode.h"
> > +#include "xfs_btree.h"
> > +#include "xfs_trans.h"
> > +#include "scrub/xfs_scrub.h"
> > +#include "scrub/scrub.h"
> > +
> > +/*
> > + * We include this last to have the helpers above available for the trace
> > + * event implementations.
> > + */
> > +#define CREATE_TRACE_POINTS
> > +#include "scrub/trace.h"
> > diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
> > new file mode 100644
> > index 0000000..a95a7c8
> > --- /dev/null
> > +++ b/fs/xfs/scrub/trace.h
> > @@ -0,0 +1,33 @@
> > +/*
> > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > + *
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation,
> > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > + */
> > +#undef TRACE_SYSTEM
> > +#define TRACE_SYSTEM xfs_scrub
> > +
> > +#if !defined(_TRACE_XFS_SCRUB_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
> > +#define _TRACE_XFS_SCRUB_TRACE_H
> > +
> > +#include <linux/tracepoint.h>
> > +
> > +#endif /* _TRACE_XFS_SCRUB_TRACE_H */
> > +
> > +#undef TRACE_INCLUDE_PATH
> > +#define TRACE_INCLUDE_PATH .
> > +#define TRACE_INCLUDE_FILE scrub/trace
> > +#include <trace/define_trace.h>
> > diff --git a/fs/xfs/scrub/xfs_scrub.h b/fs/xfs/scrub/xfs_scrub.h
> > new file mode 100644
> > index 0000000..e00e0ea
> > --- /dev/null
> > +++ b/fs/xfs/scrub/xfs_scrub.h
> > @@ -0,0 +1,29 @@
> > +/*
> > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > + *
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation,
> > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > + */
> > +#ifndef __XFS_SCRUB_H__
> > +#define __XFS_SCRUB_H__
> > +
> > +#ifndef CONFIG_XFS_ONLINE_SCRUB
> > +# define xfs_scrub_metadata(ip, sm)	(-ENOTTY)
> > +#else
> > +int xfs_scrub_metadata(struct xfs_inode *ip, struct xfs_scrub_metadata *sm);
> > +#endif /* CONFIG_XFS_ONLINE_SCRUB */
> > +
> > +#endif	/* __XFS_SCRUB_H__ */
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index 44dc178..ab7a7f8 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -44,6 +44,7 @@
> >  #include "xfs_btree.h"
> >  #include <linux/fsmap.h>
> >  #include "xfs_fsmap.h"
> > +#include "scrub/xfs_scrub.h"
> >  
> >  #include <linux/capability.h>
> >  #include <linux/cred.h>
> > @@ -1702,6 +1703,30 @@ xfs_ioc_getfsmap(
> >  	return 0;
> >  }
> >  
> > +STATIC int
> > +xfs_ioc_scrub_metadata(
> > +	struct xfs_inode		*ip,
> > +	void				__user *arg)
> > +{
> > +	struct xfs_scrub_metadata	scrub;
> > +	int				error;
> > +
> > +	if (!capable(CAP_SYS_ADMIN))
> > +		return -EPERM;
> > +
> > +	if (copy_from_user(&scrub, arg, sizeof(scrub)))
> > +		return -EFAULT;
> > +
> > +	error = xfs_scrub_metadata(ip, &scrub);
> > +	if (error)
> > +		return error;
> > +
> > +	if (copy_to_user(arg, &scrub, sizeof(scrub)))
> > +		return -EFAULT;
> > +
> > +	return 0;
> > +}
> > +
> >  int
> >  xfs_ioc_swapext(
> >  	xfs_swapext_t	*sxp)
> > @@ -1906,6 +1931,9 @@ xfs_file_ioctl(
> >  	case FS_IOC_GETFSMAP:
> >  		return xfs_ioc_getfsmap(ip, arg);
> >  
> > +	case XFS_IOC_SCRUB_METADATA:
> > +		return xfs_ioc_scrub_metadata(ip, arg);
> > +
> >  	case XFS_IOC_FD_TO_HANDLE:
> >  	case XFS_IOC_PATH_TO_HANDLE:
> >  	case XFS_IOC_PATH_TO_FSHANDLE: {
> > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> > index e8b4de3..972d4bd 100644
> > --- a/fs/xfs/xfs_ioctl32.c
> > +++ b/fs/xfs/xfs_ioctl32.c
> > @@ -557,6 +557,7 @@ xfs_file_compat_ioctl(
> >  	case XFS_IOC_ERROR_CLEARALL:
> >  	case FS_IOC_GETFSMAP:
> >  	case XFS_IOC_GET_AG_RESBLKS:
> > +	case XFS_IOC_SCRUB_METADATA:
> >  		return xfs_file_ioctl(filp, cmd, p);
> >  #ifndef BROKEN_X86_ALIGNMENT
> >  	/* These are handled fine if no alignment issues */
> > 


* Re: [PATCH 03/27] xfs: create an ioctl to scrub AG metadata
  2017-09-21 17:35     ` Darrick J. Wong
@ 2017-09-21 17:52       ` Brian Foster
  2017-09-22  3:26         ` Darrick J. Wong
  0 siblings, 1 reply; 51+ messages in thread
From: Brian Foster @ 2017-09-21 17:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Sep 21, 2017 at 10:35:10AM -0700, Darrick J. Wong wrote:
> On Thu, Sep 21, 2017 at 10:36:39AM -0400, Brian Foster wrote:
> > On Wed, Sep 20, 2017 at 05:17:56PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Create an ioctl that can be used to scrub internal filesystem metadata.
> > > The new ioctl takes the metadata type, an (optional) AG number, an
> > > (optional) inode number and generation, and a flags argument.  This will
> > > be used by the upcoming XFS online scrub tool.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  fs/xfs/Kconfig           |   17 ++++++++++++++
> > >  fs/xfs/Makefile          |   11 +++++++++
> > >  fs/xfs/libxfs/xfs_fs.h   |   53 +++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/scrub/scrub.c     |   54 ++++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/scrub/scrub.h     |   25 +++++++++++++++++++++
> > >  fs/xfs/scrub/trace.c     |   41 +++++++++++++++++++++++++++++++++++
> > >  fs/xfs/scrub/trace.h     |   33 ++++++++++++++++++++++++++++
> > >  fs/xfs/scrub/xfs_scrub.h |   29 +++++++++++++++++++++++++
> > >  fs/xfs/xfs_ioctl.c       |   28 ++++++++++++++++++++++++
> > >  fs/xfs/xfs_ioctl32.c     |    1 +
> > >  10 files changed, 292 insertions(+)
> > >  create mode 100644 fs/xfs/scrub/scrub.c
> > >  create mode 100644 fs/xfs/scrub/scrub.h
> > >  create mode 100644 fs/xfs/scrub/trace.c
> > >  create mode 100644 fs/xfs/scrub/trace.h
> > >  create mode 100644 fs/xfs/scrub/xfs_scrub.h
> > > 
> > > 
> > 
> > The code looks sane, though I think I need to understand the error codes
> > a bit better. Perhaps once I get further into the series..
> 
> Yes, the ioctl needs documentation, though I'm unclear on where an
> appropriate place to put it would be.  The ioctl is not intended for
> general consumption, so man-pages.git seems inappropriate.  I was thinking
> either in xfs_fs.h directly, or perhaps in xfsprogs' man pages?
> 

For now (while all of this is still experimental and until you get a
better answer :P), it seems reasonable enough to me to add basic
interface documentation to the kernel source. Then we at least have the
content and can move it out into a manpage or something later, if
needed.
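
As a starting point for that content, here is a sketch of the error codes
as I read them from the dispatcher and ioctl wrapper quoted in this thread.
The scrub_errno_meaning() helper and its wording are hypothetical; it takes
the positive errno value userspace would see after the ioctl fails:

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: describe an errno from XFS_IOC_SCRUB_METADATA. */
static const char *scrub_errno_meaning(int err)
{
	switch (err) {
	case 0:
		return "scrub ran; inspect sm_flags for the result";
	case EPERM:
		return "caller lacks CAP_SYS_ADMIN";
	case ENOTTY:
		return "kernel built without CONFIG_XFS_ONLINE_SCRUB";
	case ESHUTDOWN:
		return "filesystem is shut down";
	case ENOTRECOVERABLE:
		return "filesystem is mounted norecovery";
	case EINVAL:
		return "unknown input flags or nonzero reserved fields";
	case ENOENT:
		return "unknown scrub type, or this fs lacks that metadata";
	case EOPNOTSUPP:
		return "repair was requested but is not implemented";
	default:
		return "error passed up from scrub setup, scrub, or teardown";
	}
}
```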

Brian

> (We don't seem to document the xfs ioctls afaict?)
> 
> --D
> 
> > 
> > Brian
> > 
> > > diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
> > > index 1b98cfa..f42fcf1 100644
> > > --- a/fs/xfs/Kconfig
> > > +++ b/fs/xfs/Kconfig
> > > @@ -71,6 +71,23 @@ config XFS_RT
> > >  
> > >  	  If unsure, say N.
> > >  
> > > +config XFS_ONLINE_SCRUB
> > > +	bool "XFS online metadata check support"
> > > +	default n
> > > +	depends on XFS_FS
> > > +	help
> > > +	  If you say Y here you will be able to check metadata on a
> > > +	  mounted XFS filesystem.  This feature is intended to reduce
> > > +	  filesystem downtime by supplementing xfs_repair.  The key
> > > +	  advantage here is to look for problems proactively so that
> > > +	  they can be dealt with in a controlled manner.
> > > +
> > > +	  This feature is considered EXPERIMENTAL.  Use with caution!
> > > +
> > > +	  See the xfs_scrub man page in section 8 for additional information.
> > > +
> > > +	  If unsure, say N.
> > > +
> > >  config XFS_WARN
> > >  	bool "XFS Verbose Warnings"
> > >  	depends on XFS_FS && !XFS_DEBUG
> > > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > > index dbc33e0..f4312bc 100644
> > > --- a/fs/xfs/Makefile
> > > +++ b/fs/xfs/Makefile
> > > @@ -138,3 +138,14 @@ xfs-$(CONFIG_XFS_POSIX_ACL)	+= xfs_acl.o
> > >  xfs-$(CONFIG_SYSCTL)		+= xfs_sysctl.o
> > >  xfs-$(CONFIG_COMPAT)		+= xfs_ioctl32.o
> > >  xfs-$(CONFIG_EXPORTFS_BLOCK_OPS)	+= xfs_pnfs.o
> > > +
> > > +# online scrub/repair
> > > +ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
> > > +
> > > +# Tracepoints like to blow up, so build that before everything else
> > > +
> > > +xfs-y				+= $(addprefix scrub/, \
> > > +				   trace.o \
> > > +				   scrub.o \
> > > +				   )
> > > +endif
> > > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> > > index 2c26c38..a4b4c8c 100644
> > > --- a/fs/xfs/libxfs/xfs_fs.h
> > > +++ b/fs/xfs/libxfs/xfs_fs.h
> > > @@ -468,6 +468,58 @@ typedef struct xfs_swapext
> > >  #define XFS_FSOP_GOING_FLAGS_LOGFLUSH		0x1	/* flush log but not data */
> > >  #define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH		0x2	/* don't flush log nor data */
> > >  
> > > +/* metadata scrubbing */
> > > +struct xfs_scrub_metadata {
> > > +	__u32 sm_type;		/* What to check? */
> > > +	__u32 sm_flags;		/* flags; see below. */
> > > +	__u64 sm_ino;		/* inode number. */
> > > +	__u32 sm_gen;		/* inode generation. */
> > > +	__u32 sm_agno;		/* ag number. */
> > > +	__u64 sm_reserved[5];	/* pad to 64 bytes */
> > > +};
> > > +
> > > +/*
> > > + * Metadata types and flags for scrub operation.
> > > + */
> > > +
> > > +/* Scrub subcommands. */
> > > +
> > > +/* Number of scrub subcommands. */
> > > +#define XFS_SCRUB_TYPE_NR	0
> > > +
> > > +/* i: Repair this metadata. */
> > > +#define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
> > > +
> > > +/* o: Metadata object needs repair. */
> > > +#define XFS_SCRUB_OFLAG_CORRUPT		(1 << 1)
> > > +
> > > +/*
> > > + * o: Metadata object could be optimized.  It's not corrupt, but
> > > + *    we could improve on it somehow.
> > > + */
> > > +#define XFS_SCRUB_OFLAG_PREEN		(1 << 2)
> > > +
> > > +/* o: Cross-referencing failed. */
> > > +#define XFS_SCRUB_OFLAG_XFAIL		(1 << 3)
> > > +
> > > +/* o: Metadata object disagrees with cross-referenced metadata. */
> > > +#define XFS_SCRUB_OFLAG_XCORRUPT	(1 << 4)
> > > +
> > > +/* o: Scan was not complete. */
> > > +#define XFS_SCRUB_OFLAG_INCOMPLETE	(1 << 5)
> > > +
> > > +/* o: Metadata object looked funny but isn't corrupt. */
> > > +#define XFS_SCRUB_OFLAG_WARNING		(1 << 6)
> > > +
> > > +#define XFS_SCRUB_FLAGS_IN	(XFS_SCRUB_IFLAG_REPAIR)
> > > +#define XFS_SCRUB_FLAGS_OUT	(XFS_SCRUB_OFLAG_CORRUPT | \
> > > +				 XFS_SCRUB_OFLAG_PREEN | \
> > > +				 XFS_SCRUB_OFLAG_XFAIL | \
> > > +				 XFS_SCRUB_OFLAG_XCORRUPT | \
> > > +				 XFS_SCRUB_OFLAG_INCOMPLETE | \
> > > +				 XFS_SCRUB_OFLAG_WARNING)
> > > +#define XFS_SCRUB_FLAGS_ALL	(XFS_SCRUB_FLAGS_IN | XFS_SCRUB_FLAGS_OUT)
> > > +
> > >  /*
> > >   * AG reserved block counters
> > >   */
> > > @@ -522,6 +574,7 @@ struct xfs_fsop_ag_resblks {
> > >  #define XFS_IOC_ZERO_RANGE	_IOW ('X', 57, struct xfs_flock64)
> > >  #define XFS_IOC_FREE_EOFBLOCKS	_IOR ('X', 58, struct xfs_fs_eofblocks)
> > >  /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
> > > +#define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
> > >  
> > >  /*
> > >   * ioctl commands that replace IRIX syssgi()'s
> > > diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> > > new file mode 100644
> > > index 0000000..5db2a6f
> > > --- /dev/null
> > > +++ b/fs/xfs/scrub/scrub.c
> > > @@ -0,0 +1,54 @@
> > > +/*
> > > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > > + *
> > > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > + *
> > > + * This program is free software; you can redistribute it and/or
> > > + * modify it under the terms of the GNU General Public License
> > > + * as published by the Free Software Foundation; either version 2
> > > + * of the License, or (at your option) any later version.
> > > + *
> > > + * This program is distributed in the hope that it would be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > + * GNU General Public License for more details.
> > > + *
> > > + * You should have received a copy of the GNU General Public License
> > > + * along with this program; if not, write the Free Software Foundation,
> > > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > > + */
> > > +#include "xfs.h"
> > > +#include "xfs_fs.h"
> > > +#include "xfs_shared.h"
> > > +#include "xfs_format.h"
> > > +#include "xfs_trans_resv.h"
> > > +#include "xfs_mount.h"
> > > +#include "xfs_defer.h"
> > > +#include "xfs_btree.h"
> > > +#include "xfs_bit.h"
> > > +#include "xfs_log_format.h"
> > > +#include "xfs_trans.h"
> > > +#include "xfs_sb.h"
> > > +#include "xfs_inode.h"
> > > +#include "xfs_alloc.h"
> > > +#include "xfs_alloc_btree.h"
> > > +#include "xfs_bmap.h"
> > > +#include "xfs_bmap_btree.h"
> > > +#include "xfs_ialloc.h"
> > > +#include "xfs_ialloc_btree.h"
> > > +#include "xfs_refcount.h"
> > > +#include "xfs_refcount_btree.h"
> > > +#include "xfs_rmap.h"
> > > +#include "xfs_rmap_btree.h"
> > > +#include "scrub/xfs_scrub.h"
> > > +#include "scrub/scrub.h"
> > > +#include "scrub/trace.h"
> > > +
> > > +/* Dispatch metadata scrubbing. */
> > > +int
> > > +xfs_scrub_metadata(
> > > +	struct xfs_inode		*ip,
> > > +	struct xfs_scrub_metadata	*sm)
> > > +{
> > > +	return -EOPNOTSUPP;
> > > +}
> > > diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
> > > new file mode 100644
> > > index 0000000..eb1cd9d
> > > --- /dev/null
> > > +++ b/fs/xfs/scrub/scrub.h
> > > @@ -0,0 +1,25 @@
> > > +/*
> > > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > > + *
> > > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > + *
> > > + * This program is free software; you can redistribute it and/or
> > > + * modify it under the terms of the GNU General Public License
> > > + * as published by the Free Software Foundation; either version 2
> > > + * of the License, or (at your option) any later version.
> > > + *
> > > + * This program is distributed in the hope that it would be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > + * GNU General Public License for more details.
> > > + *
> > > + * You should have received a copy of the GNU General Public License
> > > + * along with this program; if not, write the Free Software Foundation,
> > > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > > + */
> > > +#ifndef __XFS_SCRUB_SCRUB_H__
> > > +#define __XFS_SCRUB_SCRUB_H__
> > > +
> > > +/* Metadata scrubbers */
> > > +
> > > +#endif	/* __XFS_SCRUB_SCRUB_H__ */
> > > diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c
> > > new file mode 100644
> > > index 0000000..c59fd41
> > > --- /dev/null
> > > +++ b/fs/xfs/scrub/trace.c
> > > @@ -0,0 +1,41 @@
> > > +/*
> > > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > > + *
> > > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > + *
> > > + * This program is free software; you can redistribute it and/or
> > > + * modify it under the terms of the GNU General Public License
> > > + * as published by the Free Software Foundation; either version 2
> > > + * of the License, or (at your option) any later version.
> > > + *
> > > + * This program is distributed in the hope that it would be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > + * GNU General Public License for more details.
> > > + *
> > > + * You should have received a copy of the GNU General Public License
> > > + * along with this program; if not, write the Free Software Foundation,
> > > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > > + */
> > > +#include "xfs.h"
> > > +#include "xfs_fs.h"
> > > +#include "xfs_shared.h"
> > > +#include "xfs_format.h"
> > > +#include "xfs_log_format.h"
> > > +#include "xfs_trans_resv.h"
> > > +#include "xfs_mount.h"
> > > +#include "xfs_defer.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_defer.h"
> > > +#include "xfs_inode.h"
> > > +#include "xfs_btree.h"
> > > +#include "xfs_trans.h"
> > > +#include "scrub/xfs_scrub.h"
> > > +#include "scrub/scrub.h"
> > > +
> > > +/*
> > > + * We include this last to have the helpers above available for the trace
> > > + * event implementations.
> > > + */
> > > +#define CREATE_TRACE_POINTS
> > > +#include "scrub/trace.h"
> > > diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
> > > new file mode 100644
> > > index 0000000..a95a7c8
> > > --- /dev/null
> > > +++ b/fs/xfs/scrub/trace.h
> > > @@ -0,0 +1,33 @@
> > > +/*
> > > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > > + *
> > > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > + *
> > > + * This program is free software; you can redistribute it and/or
> > > + * modify it under the terms of the GNU General Public License
> > > + * as published by the Free Software Foundation; either version 2
> > > + * of the License, or (at your option) any later version.
> > > + *
> > > + * This program is distributed in the hope that it would be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > + * GNU General Public License for more details.
> > > + *
> > > + * You should have received a copy of the GNU General Public License
> > > + * along with this program; if not, write the Free Software Foundation,
> > > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > > + */
> > > +#undef TRACE_SYSTEM
> > > +#define TRACE_SYSTEM xfs_scrub
> > > +
> > > +#if !defined(_TRACE_XFS_SCRUB_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
> > > +#define _TRACE_XFS_SCRUB_TRACE_H
> > > +
> > > +#include <linux/tracepoint.h>
> > > +
> > > +#endif /* _TRACE_XFS_SCRUB_TRACE_H */
> > > +
> > > +#undef TRACE_INCLUDE_PATH
> > > +#define TRACE_INCLUDE_PATH .
> > > +#define TRACE_INCLUDE_FILE scrub/trace
> > > +#include <trace/define_trace.h>
> > > diff --git a/fs/xfs/scrub/xfs_scrub.h b/fs/xfs/scrub/xfs_scrub.h
> > > new file mode 100644
> > > index 0000000..e00e0ea
> > > --- /dev/null
> > > +++ b/fs/xfs/scrub/xfs_scrub.h
> > > @@ -0,0 +1,29 @@
> > > +/*
> > > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > > + *
> > > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > + *
> > > + * This program is free software; you can redistribute it and/or
> > > + * modify it under the terms of the GNU General Public License
> > > + * as published by the Free Software Foundation; either version 2
> > > + * of the License, or (at your option) any later version.
> > > + *
> > > + * This program is distributed in the hope that it would be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > + * GNU General Public License for more details.
> > > + *
> > > + * You should have received a copy of the GNU General Public License
> > > + * along with this program; if not, write the Free Software Foundation,
> > > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > > + */
> > > +#ifndef __XFS_SCRUB_H__
> > > +#define __XFS_SCRUB_H__
> > > +
> > > +#ifndef CONFIG_XFS_ONLINE_SCRUB
> > > +# define xfs_scrub_metadata(ip, sm)	(-ENOTTY)
> > > +#else
> > > +int xfs_scrub_metadata(struct xfs_inode *ip, struct xfs_scrub_metadata *sm);
> > > +#endif /* CONFIG_XFS_ONLINE_SCRUB */
> > > +
> > > +#endif	/* __XFS_SCRUB_H__ */
> > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > > index 44dc178..ab7a7f8 100644
> > > --- a/fs/xfs/xfs_ioctl.c
> > > +++ b/fs/xfs/xfs_ioctl.c
> > > @@ -44,6 +44,7 @@
> > >  #include "xfs_btree.h"
> > >  #include <linux/fsmap.h>
> > >  #include "xfs_fsmap.h"
> > > +#include "scrub/xfs_scrub.h"
> > >  
> > >  #include <linux/capability.h>
> > >  #include <linux/cred.h>
> > > @@ -1702,6 +1703,30 @@ xfs_ioc_getfsmap(
> > >  	return 0;
> > >  }
> > >  
> > > +STATIC int
> > > +xfs_ioc_scrub_metadata(
> > > +	struct xfs_inode		*ip,
> > > +	void				__user *arg)
> > > +{
> > > +	struct xfs_scrub_metadata	scrub;
> > > +	int				error;
> > > +
> > > +	if (!capable(CAP_SYS_ADMIN))
> > > +		return -EPERM;
> > > +
> > > +	if (copy_from_user(&scrub, arg, sizeof(scrub)))
> > > +		return -EFAULT;
> > > +
> > > +	error = xfs_scrub_metadata(ip, &scrub);
> > > +	if (error)
> > > +		return error;
> > > +
> > > +	if (copy_to_user(arg, &scrub, sizeof(scrub)))
> > > +		return -EFAULT;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >  int
> > >  xfs_ioc_swapext(
> > >  	xfs_swapext_t	*sxp)
> > > @@ -1906,6 +1931,9 @@ xfs_file_ioctl(
> > >  	case FS_IOC_GETFSMAP:
> > >  		return xfs_ioc_getfsmap(ip, arg);
> > >  
> > > +	case XFS_IOC_SCRUB_METADATA:
> > > +		return xfs_ioc_scrub_metadata(ip, arg);
> > > +
> > >  	case XFS_IOC_FD_TO_HANDLE:
> > >  	case XFS_IOC_PATH_TO_HANDLE:
> > >  	case XFS_IOC_PATH_TO_FSHANDLE: {
> > > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> > > index e8b4de3..972d4bd 100644
> > > --- a/fs/xfs/xfs_ioctl32.c
> > > +++ b/fs/xfs/xfs_ioctl32.c
> > > @@ -557,6 +557,7 @@ xfs_file_compat_ioctl(
> > >  	case XFS_IOC_ERROR_CLEARALL:
> > >  	case FS_IOC_GETFSMAP:
> > >  	case XFS_IOC_GET_AG_RESBLKS:
> > > +	case XFS_IOC_SCRUB_METADATA:
> > >  		return xfs_file_ioctl(filp, cmd, p);
> > >  #ifndef BROKEN_X86_ALIGNMENT
> > >  	/* These are handled fine if no alignment issues */
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 04/27] xfs: dispatch metadata scrub subcommands
  2017-09-21 14:37   ` Brian Foster
@ 2017-09-21 18:08     ` Darrick J. Wong
  0 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21 18:08 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Sep 21, 2017 at 10:37:02AM -0400, Brian Foster wrote:
> On Wed, Sep 20, 2017 at 05:18:02PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create structures needed to hold scrubbing context and dispatch incoming
> > commands to the individual scrubbers.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/scrub/scrub.c |  172 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/scrub/scrub.h |   19 ++++++
> >  fs/xfs/scrub/trace.h |   43 +++++++++++++
> >  3 files changed, 233 insertions(+), 1 deletion(-)
> > 
> > 
> > diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> > index 5db2a6f..7cf518e 100644
> > --- a/fs/xfs/scrub/scrub.c
> > +++ b/fs/xfs/scrub/scrub.c
> > @@ -44,11 +44,181 @@
> >  #include "scrub/scrub.h"
> >  #include "scrub/trace.h"
> >  
> > +/*
> > + * Online Scrub and Repair
> > + *
> > + * Traditionally, XFS (the kernel driver) did not know how to check or
> > + * repair on-disk data structures.  That task was left to the xfs_check
> > + * and xfs_repair tools, both of which require taking the filesystem
> > + * offline for a thorough but time consuming examination.  Online
> > + * scrub & repair, on the other hand, enables us to check the metadata
> > + * for obvious errors while carefully stepping around the filesystem's
> > + * ongoing operations, locking rules, etc.
> > + *
> > + * Given that most XFS metadata consist of records stored in a btree,
> > + * most of the checking functions iterate the btree blocks themselves
> > + * looking for irregularities.  When a record block is encountered, each
> > + * record can be checked for obviously bad values.  Record values can
> > + * also be cross-referenced against other btrees to look for potential
> > + * misunderstandings between pieces of metadata.
> > + *
> > + * It is expected that the checkers responsible for per-AG metadata
> > + * structures will lock the AG headers (AGI, AGF, AGFL), iterate the
> > + * metadata structure, and perform any relevant cross-referencing before
> > + * unlocking the AG and returning the results to userspace.  These
> > + * scrubbers must not keep an AG locked for too long to avoid tying up
> > + * the block and inode allocators.
> > + *
> > + * Block maps and b-trees rooted in an inode present a special challenge
> > + * because they can involve extents from any AG.  The general scrubber
> > + * structure of lock -> check -> xref -> unlock still holds, but AG
> > + * locking order rules /must/ be obeyed to avoid deadlocks.  The
> > + * ordering rule, of course, is that we must lock in increasing AG
> > + * order.  Helper functions are provided to track which AG headers we've
> > + * already locked.  If we detect an imminent locking order violation, we
> > + * can signal a potential deadlock, in which case the scrubber can jump
> > + * out to the top level, lock all the AGs in order, and retry the scrub.
> > + *
> > + * For file data (directories, extended attributes, symlinks) scrub, we
> > + * can simply lock the inode and walk the data.  For btree data
> > + * (directories and attributes) we follow the same btree-scrubbing
> > + * strategy outlined previously to check the records.
> > + *
> > + * We use a bit of trickery with transactions to avoid buffer deadlocks
> > + * if there is a cycle in the metadata.  The basic problem is that
> > + * travelling down a btree involves locking the current buffer at each
> > + * tree level.  If a pointer should somehow point back to a buffer that
> > + * we've already examined, we will deadlock due to the second buffer
> > + * locking attempt.  Note however that grabbing a buffer in transaction
> > + * context links the locked buffer to the transaction.  If we try to
> > + * re-grab the buffer in the context of the same transaction, we avoid
> > + * the second lock attempt and continue.  Between the verifier and the
> > + * scrubber, something will notice that something is amiss and report
> > + * the corruption.  Therefore, each scrubber will allocate an empty
> > + * transaction, attach buffers to it, and cancel the transaction at the
> > + * end of the scrub run.  Cancelling a non-dirty transaction simply
> > + * unlocks the buffers.
> > + *
> > + * There are four pieces of data that scrub can communicate to
> > + * userspace.  The first is the error code (errno), which can be used to
> > + * communicate operational errors in performing the scrub.  There are
> > + * also three flags that can be set in the scrub context.  If the data
> > + * structure itself is corrupt, the CORRUPT flag will be set.  If
> > + * the metadata is correct but otherwise suboptimal, the PREEN flag
> > + * will be set.
> 
> Did you mean to describe other flags here?

Somewhere; the other flags get added in whichever patch(es) start using
them.
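
For reference, a minimal sketch of how a userspace consumer might decode the full set of output flags once they are all in use. The bit values are copied from the xfs_fs.h hunk quoted in patch 03; the function name scrub_describe_oflags and the lowercase labels are assumptions made for this example only.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Output flag bits, as defined in the quoted xfs_fs.h hunk. */
#define OFLAG_CORRUPT		(1u << 1)
#define OFLAG_PREEN		(1u << 2)
#define OFLAG_XFAIL		(1u << 3)
#define OFLAG_XCORRUPT		(1u << 4)
#define OFLAG_INCOMPLETE	(1u << 5)
#define OFLAG_WARNING		(1u << 6)

struct oflag_name {
	uint32_t	bit;
	const char	*name;
};

static const struct oflag_name oflag_names[] = {
	{ OFLAG_CORRUPT,	"corrupt" },
	{ OFLAG_PREEN,		"preen" },
	{ OFLAG_XFAIL,		"xfail" },
	{ OFLAG_XCORRUPT,	"xcorrupt" },
	{ OFLAG_INCOMPLETE,	"incomplete" },
	{ OFLAG_WARNING,	"warning" },
};

/*
 * Write a comma-separated list of the set output flags into buf.
 * Returns the number of flags that were written out.
 */
static int
scrub_describe_oflags(uint32_t flags, char *buf, size_t len)
{
	size_t	used = 0;
	size_t	i;
	int	nr = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (i = 0; i < sizeof(oflag_names) / sizeof(oflag_names[0]); i++) {
		int	n;

		if (!(flags & oflag_names[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     nr ? "," : "", oflag_names[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			/* Out of room: drop the partial label and stop. */
			buf[used] = '\0';
			break;
		}
		used += n;
		nr++;
	}
	return nr;
}
```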

> > + */
> > +
> > +/* Scrub setup and teardown */
> > +
> > +/* Free all the resources and finish the transactions. */
> > +STATIC int
> > +xfs_scrub_teardown(
> > +	struct xfs_scrub_context	*sc,
> > +	int				error)
> 
> What's the purpose of passing error just to return it?

Eventually repair needs it to decide if it's cancelling the transaction
or committing a repair.  I suppose I could remove it here and add it back
later.
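
A toy model of where that could end up, purely as a sketch: the real repair code is not part of this patch, so the commit-if-dirty policy and every toy_* name below are assumptions rather than the eventual implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a transaction; only tracks whether anything was modified. */
struct toy_trans {
	bool	dirty;
};

struct toy_scrub_context {
	struct toy_trans	*tp;
	int			commits;	/* bookkeeping for the example */
	int			cancels;
};

/* Stand-ins for committing and cancelling a transaction. */
static int
toy_trans_commit(struct toy_scrub_context *sc)
{
	sc->commits++;
	return 0;
}

static void
toy_trans_cancel(struct toy_scrub_context *sc)
{
	sc->cancels++;
}

/*
 * Teardown that uses the incoming error: a clean run that dirtied the
 * transaction (i.e. a repair) commits it; everything else cancels.
 */
static int
toy_scrub_teardown(struct toy_scrub_context *sc, int error)
{
	if (sc->tp) {
		if (error == 0 && sc->tp->dirty)
			error = toy_trans_commit(sc);
		else
			toy_trans_cancel(sc);
		sc->tp = NULL;
	}
	return error;
}
```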

> > +{
> > +	if (sc->tp) {
> > +		xfs_trans_cancel(sc->tp);
> > +		sc->tp = NULL;
> > +	}
> > +	return error;
> > +}
> > +
> > +/* Scrubbing dispatch. */
> > +
> > +static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
> > +};
> > +
> >  /* Dispatch metadata scrubbing. */
> >  int
> >  xfs_scrub_metadata(
> >  	struct xfs_inode		*ip,
> >  	struct xfs_scrub_metadata	*sm)
> >  {
> > -	return -EOPNOTSUPP;
> > +	struct xfs_scrub_context	sc;
> > +	struct xfs_mount		*mp = ip->i_mount;
> > +	const struct xfs_scrub_meta_ops	*ops;
> > +	bool				try_harder = false;
> > +	int				error = 0;
> > +
> > +	trace_xfs_scrub_start(ip, sm, error);
> > +
> > +	/* Forbidden if we are shut down or mounted norecovery. */
> > +	error = -ESHUTDOWN;
> > +	if (XFS_FORCED_SHUTDOWN(mp))
> > +		goto out;
> > +	error = -ENOTRECOVERABLE;
> > +	if (mp->m_flags & XFS_MOUNT_NORECOVERY)
> > +		goto out;
> > +
> > +	/* Check our inputs. */
> > +	error = -EINVAL;
> > +	sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT;
> > +	if (sm->sm_flags & ~XFS_SCRUB_FLAGS_IN)
> > +		goto out;
> > +	if (memchr_inv(sm->sm_reserved, 0, sizeof(sm->sm_reserved)))
> > +		goto out;
> > +
> > +	/* Do we know about this type of metadata? */
> > +	error = -ENOENT;
> > +	if (sm->sm_type >= XFS_SCRUB_TYPE_NR)
> > +		goto out;
> > +	ops = &meta_scrub_ops[sm->sm_type];
> > +	if (ops->scrub == NULL)
> > +		goto out;
> > +
> > +	/* Does this fs even support this type of metadata? */
> > +	if (ops->has && !ops->has(&mp->m_sb))
> > +		goto out;
> > +
> > +	/* We don't know how to repair anything yet. */
> > +	error = -EOPNOTSUPP;
> > +	if (sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
> > +		goto out;
> > +
> > +	/* This isn't a stable feature.  Use with care. */
> > +	{
> > +		static bool warned;
> > +
> > +		if (!warned)
> > +			xfs_alert(mp,
> > +	"EXPERIMENTAL online scrub feature in use. Use at your own risk!");
> > +		warned = true;
> > +	}
> > +
> > +retry_op:
> > +	/* Set up for the operation. */
> > +	memset(&sc, 0, sizeof(sc));
> > +	sc.mp = ip->i_mount;
> > +	sc.sm = sm;
> > +	sc.ops = ops;
> > +	sc.try_harder = try_harder;
> > +	error = sc.ops->setup(&sc, ip);
> > +	if (error)
> > +		goto out_teardown;
> > +
> > +	/* Scrub for errors. */
> > +	error = sc.ops->scrub(&sc);
> > +	if (!try_harder && error == -EDEADLOCK) {
> > +		/*
> > +		 * Scrubbers return -EDEADLOCK to mean 'try harder'.
> > +		 * Tear down everything we hold, then set up again with
> > +		 * preparation for worst-case scenarios.
> > +		 */
> > +		error = xfs_scrub_teardown(&sc, 0);
> > +		if (error)
> > +			goto out;
> > +		try_harder = true;
> > +		goto retry_op;
> > +	} else if (error)
> > +		goto out_teardown;
> > +
> > +	if (sc.sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
> > +			       XFS_SCRUB_OFLAG_XCORRUPT))
> > +		xfs_alert_ratelimited(mp, "Corruption detected during scrub.");
> > +
> > +out_teardown:
> > +	error = xfs_scrub_teardown(&sc, error);
> > +out:
> > +	trace_xfs_scrub_done(ip, sm, error);
> > +	return error;
> >  }
> > diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
> > index eb1cd9d..b271b2a 100644
> > --- a/fs/xfs/scrub/scrub.h
> > +++ b/fs/xfs/scrub/scrub.h
> > @@ -20,6 +20,25 @@
> >  #ifndef __XFS_SCRUB_SCRUB_H__
> >  #define __XFS_SCRUB_SCRUB_H__
> >  
> > +struct xfs_scrub_context;
> > +
> > +struct xfs_scrub_meta_ops {
> > +	int		(*setup)(struct xfs_scrub_context *,
> > +				 struct xfs_inode *);
> > +	int		(*scrub)(struct xfs_scrub_context *);
> > +	bool		(*has)(struct xfs_sb *);
> 
> I assume 'has' is to identify whether a particular mount supports a
> particular feature. I suppose a better name would be nice here, or
> perhaps just a comment to outline the purpose of each callout.

I'll add some comments.
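
Roughly what the commented callouts and the dispatch sequence look like, as a self-contained userspace model. This is a sketch only: the toy_* names are invented for illustration, the has() callout takes no superblock argument here, and EDEADLK is used where the kernel code uses -EDEADLOCK.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_ctx {
	bool	try_harder;	/* second attempt with worst-case setup */
	int	setups;		/* how many times setup ran (for the example) */
};

struct toy_ops {
	/* Acquire whatever resources are needed for the operation. */
	int	(*setup)(struct toy_ctx *);
	/* Examine the metadata; -EDEADLK asks the dispatcher to retry. */
	int	(*scrub)(struct toy_ctx *);
	/* Does this filesystem have this metadata type?  NULL means always. */
	bool	(*has)(void);
};

static int toy_setup(struct toy_ctx *c) { c->setups++; return 0; }
static int toy_scrub(struct toy_ctx *c) { return c->try_harder ? 0 : -EDEADLK; }
static bool toy_has_never(void) { return false; }

static const struct toy_ops toy_ops_table[] = {
	[0] = { .setup = toy_setup, .scrub = toy_scrub },
	[1] = { .setup = toy_setup, .scrub = toy_scrub, .has = toy_has_never },
};
#define TOY_TYPE_NR	(sizeof(toy_ops_table) / sizeof(toy_ops_table[0]))

static int
toy_dispatch(uint32_t type, struct toy_ctx *c)
{
	const struct toy_ops	*ops;
	bool			try_harder = false;
	int			error;

	/* Do we know about this type, and does the fs support it? */
	if (type >= TOY_TYPE_NR)
		return -ENOENT;
	ops = &toy_ops_table[type];
	if (ops->scrub == NULL)
		return -ENOENT;
	if (ops->has && !ops->has())
		return -ENOENT;

retry:
	c->try_harder = try_harder;
	error = ops->setup(c);
	if (error)
		return error;
	error = ops->scrub(c);
	if (!try_harder && error == -EDEADLK) {
		/* Retry exactly once with worst-case preparation. */
		try_harder = true;
		goto retry;
	}
	return error;
}
```

With this table, type 0 succeeds on the second attempt, while type 1 is rejected with -ENOENT because its has() callout reports the feature as absent.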

--D

> 
> Brian
> 
> > +};
> > +
> > +struct xfs_scrub_context {
> > +	/* General scrub state. */
> > +	struct xfs_mount		*mp;
> > +	struct xfs_scrub_metadata	*sm;
> > +	const struct xfs_scrub_meta_ops	*ops;
> > +	struct xfs_trans		*tp;
> > +	struct xfs_inode		*ip;
> > +	bool				try_harder;
> > +};
> > +
> >  /* Metadata scrubbers */
> >  
> >  #endif	/* __XFS_SCRUB_SCRUB_H__ */
> > diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
> > index a95a7c8..688517e 100644
> > --- a/fs/xfs/scrub/trace.h
> > +++ b/fs/xfs/scrub/trace.h
> > @@ -25,6 +25,49 @@
> >  
> >  #include <linux/tracepoint.h>
> >  
> > +DECLARE_EVENT_CLASS(xfs_scrub_class,
> > +	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
> > +		 int error),
> > +	TP_ARGS(ip, sm, error),
> > +	TP_STRUCT__entry(
> > +		__field(dev_t, dev)
> > +		__field(xfs_ino_t, ino)
> > +		__field(unsigned int, type)
> > +		__field(xfs_agnumber_t, agno)
> > +		__field(xfs_ino_t, inum)
> > +		__field(unsigned int, gen)
> > +		__field(unsigned int, flags)
> > +		__field(int, error)
> > +	),
> > +	TP_fast_assign(
> > +		__entry->dev = ip->i_mount->m_super->s_dev;
> > +		__entry->ino = ip->i_ino;
> > +		__entry->type = sm->sm_type;
> > +		__entry->agno = sm->sm_agno;
> > +		__entry->inum = sm->sm_ino;
> > +		__entry->gen = sm->sm_gen;
> > +		__entry->flags = sm->sm_flags;
> > +		__entry->error = error;
> > +	),
> > +	TP_printk("dev %d:%d ino %llu type %u agno %u inum %llu gen %u flags 0x%x error %d",
> > +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> > +		  __entry->ino,
> > +		  __entry->type,
> > +		  __entry->agno,
> > +		  __entry->inum,
> > +		  __entry->gen,
> > +		  __entry->flags,
> > +		  __entry->error)
> > +)
> > +#define DEFINE_SCRUB_EVENT(name) \
> > +DEFINE_EVENT(xfs_scrub_class, name, \
> > +	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm, \
> > +		 int error), \
> > +	TP_ARGS(ip, sm, error))
> > +
> > +DEFINE_SCRUB_EVENT(xfs_scrub_start);
> > +DEFINE_SCRUB_EVENT(xfs_scrub_done);
> > +
> >  #endif /* _TRACE_XFS_SCRUB_TRACE_H */
> >  
> >  #undef TRACE_INCLUDE_PATH
> > 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 05/27] xfs: test the scrub ioctl
  2017-09-21  6:04   ` Dave Chinner
@ 2017-09-21 18:14     ` Darrick J. Wong
  0 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-21 18:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Sep 21, 2017 at 04:04:16PM +1000, Dave Chinner wrote:
> On Wed, Sep 20, 2017 at 05:18:08PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create a test scrubber with id 0.  This will be used by xfs_scrub to
> > probe the kernel's abilities to scrub (and repair) the metadata.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/Makefile        |    1 +
> >  fs/xfs/libxfs/xfs_fs.h |    3 ++
> >  fs/xfs/scrub/common.c  |   60 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/scrub/common.h  |   44 +++++++++++++++++++++++++++++++++++
> >  fs/xfs/scrub/scrub.c   |   33 ++++++++++++++++++++++++++
> >  fs/xfs/scrub/scrub.h   |    1 +
> >  fs/xfs/scrub/trace.c   |    1 +
> >  7 files changed, 142 insertions(+), 1 deletion(-)
> >  create mode 100644 fs/xfs/scrub/common.c
> >  create mode 100644 fs/xfs/scrub/common.h
> > 
> > 
> > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > index f4312bc..ca14595 100644
> > --- a/fs/xfs/Makefile
> > +++ b/fs/xfs/Makefile
> > @@ -146,6 +146,7 @@ ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
> >  
> >  xfs-y				+= $(addprefix scrub/, \
> >  				   trace.o \
> > +				   common.o \
> >  				   scrub.o \
> >  				   )
> >  endif
> > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> > index a4b4c8c..5105bad 100644
> > --- a/fs/xfs/libxfs/xfs_fs.h
> > +++ b/fs/xfs/libxfs/xfs_fs.h
> > @@ -483,9 +483,10 @@ struct xfs_scrub_metadata {
> >   */
> >  
> >  /* Scrub subcommands. */
> > +#define XFS_SCRUB_TYPE_TEST	0	/* presence test ioctl */
> 
> Shouldn't we call this a "probe" - as in "probe for support" so it
> doesn't get confused with "use this to test whether scrub works"

I'd thought "test" as in "test if it's even there", but "probe" works
just as well and (hopefully) confuses everyone less.

> 
> >  /* Number of scrub subcommands. */
> > -#define XFS_SCRUB_TYPE_NR	0
> > +#define XFS_SCRUB_TYPE_NR	1
> >  
> >  /* i: Repair this metadata. */
> >  #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
> > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > new file mode 100644
> > index 0000000..13ccb36
> > --- /dev/null
> > +++ b/fs/xfs/scrub/common.c
> > @@ -0,0 +1,60 @@
> > +/*
> > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > + *
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation,
> > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > + */
> > +#include "xfs.h"
> > +#include "xfs_fs.h"
> > +#include "xfs_shared.h"
> > +#include "xfs_format.h"
> > +#include "xfs_trans_resv.h"
> > +#include "xfs_mount.h"
> > +#include "xfs_defer.h"
> > +#include "xfs_btree.h"
> > +#include "xfs_bit.h"
> > +#include "xfs_log_format.h"
> > +#include "xfs_trans.h"
> > +#include "xfs_sb.h"
> > +#include "xfs_inode.h"
> > +#include "xfs_alloc.h"
> > +#include "xfs_alloc_btree.h"
> > +#include "xfs_bmap.h"
> > +#include "xfs_bmap_btree.h"
> > +#include "xfs_ialloc.h"
> > +#include "xfs_ialloc_btree.h"
> > +#include "xfs_refcount.h"
> > +#include "xfs_refcount_btree.h"
> > +#include "xfs_rmap.h"
> > +#include "xfs_rmap_btree.h"
> > +#include "scrub/xfs_scrub.h"
> > +#include "scrub/scrub.h"
> > +#include "scrub/common.h"
> > +#include "scrub/trace.h"
> > +
> > +/* Common code for the metadata scrubbers. */
> > +
> > +/* Per-scrubber setup functions */
> > +
> > +/* Set us up with a transaction and an empty context. */
> > +int
> > +xfs_scrub_setup_fs(
> > +	struct xfs_scrub_context	*sc,
> > +	struct xfs_inode		*ip)
> > +{
> > +	return xfs_scrub_trans_alloc(sc->sm, sc->mp,
> > +			&M_RES(sc->mp)->tr_itruncate, 0, 0, 0, &sc->tp);
> > +}
> 
> Using the truncate transaction reservation really needs explaining
> here....

/*
 * Reserve a transaction with the largest reservation so that we can
 * handle the worst case log item space requirement if we have to repair
 * something big.
 */

Hm, come to think of it, all callers of xfs_scrub_trans_alloc differ only
in the resblks passed in, so I can factor out the rtblks/flags/type
parameters.

> 
> .....
> >  
> > +/*
> > + * Test scrubber -- userspace uses this to probe if we're willing to
> > + * scrub or repair a given mountpoint.
> > + */
> 
> Yup, definitely should be called xfs_scrub_probe()....

Ok.

> > +int
> > +xfs_scrub_tester(
> > +	struct xfs_scrub_context	*sc)
> > +{
> > +	if (sc->sm->sm_ino || sc->sm->sm_agno)
> > +		return -EINVAL;
> > +	if (sc->sm->sm_gen & XFS_SCRUB_OFLAG_CORRUPT)
> > +		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
> > +	if (sc->sm->sm_gen & XFS_SCRUB_OFLAG_PREEN)
> > +		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_PREEN;
> > +	if (sc->sm->sm_gen & XFS_SCRUB_OFLAG_XFAIL)
> > +		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_XFAIL;
> > +	if (sc->sm->sm_gen & XFS_SCRUB_OFLAG_XCORRUPT)
> > +		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_XCORRUPT;
> > +	if (sc->sm->sm_gen & XFS_SCRUB_OFLAG_INCOMPLETE)
> > +		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_INCOMPLETE;
> > +	if (sc->sm->sm_gen & XFS_SCRUB_OFLAG_WARNING)
> > +		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_WARNING;
> > +	if (sc->sm->sm_gen & ~XFS_SCRUB_FLAGS_OUT)
> > +		return -ENOENT;
> 
> Shouldn't this check be first? If not, add a comment to explain?
> 
> Also, I find that really hard to parse because it's so dense and
> so much is repeated over and over again (makes my pattern matching
> brain cells scream). It's copying the exact same flags from
> sc->sm->sm_gen to sc->sm->sm_flags, so why not something like:
> 
> 
> 	struct xfs_scrub_m...	*sm = sc->sm;
> 
> 	if (sm->sm_ino || sm->sm_agno)
> 		return -EINVAL;
> 
> 	sm->sm_flags = sm->sm_gen & XFS_SCRUB_FLAGS_OUT;
> 	if (sm->sm_gen & ~XFS_SCRUB_FLAGS_OUT)
> 		return -ENOENT;

Yes, that could be:

	if (sm->sm_ino || sm->sm_agno)
		return -EINVAL;
	if (sm->sm_gen & ~XFS_SCRUB_FLAGS_OUT)
		return -ENOENT;

	/* Echo parameters back to userspace to prove that we exist. */
	sm->sm_flags = sm->sm_gen & XFS_SCRUB_FLAGS_OUT;
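
For illustration only, the reordered probe logic can be modelled in
userspace.  This is a sketch under assumptions, not the kernel code:
struct probe_req and probe_model() are hypothetical names, the flag
values are copied from the xfs_fs.h hunk of patch 03/27 (made unsigned
here), and the flags field is spelled sm_flags to match that struct
definition.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Output flag bits, copied from the xfs_fs.h hunk in patch 03/27. */
#define XFS_SCRUB_OFLAG_CORRUPT		(1u << 1)
#define XFS_SCRUB_OFLAG_PREEN		(1u << 2)
#define XFS_SCRUB_OFLAG_XFAIL		(1u << 3)
#define XFS_SCRUB_OFLAG_XCORRUPT	(1u << 4)
#define XFS_SCRUB_OFLAG_INCOMPLETE	(1u << 5)
#define XFS_SCRUB_OFLAG_WARNING		(1u << 6)
#define XFS_SCRUB_FLAGS_OUT	(XFS_SCRUB_OFLAG_CORRUPT | \
				 XFS_SCRUB_OFLAG_PREEN | \
				 XFS_SCRUB_OFLAG_XFAIL | \
				 XFS_SCRUB_OFLAG_XCORRUPT | \
				 XFS_SCRUB_OFLAG_INCOMPLETE | \
				 XFS_SCRUB_OFLAG_WARNING)

/* Hypothetical stand-in with the same layout as struct xfs_scrub_metadata. */
struct probe_req {
	uint32_t	sm_type;
	uint32_t	sm_flags;
	uint64_t	sm_ino;
	uint32_t	sm_gen;
	uint32_t	sm_agno;
	uint64_t	sm_reserved[5];
};
static_assert(sizeof(struct probe_req) == 64, "request is padded to 64 bytes");

/* Model of the probe: validate everything first, then echo the flags. */
static int
probe_model(
	struct probe_req	*sm)
{
	/* A probe takes no inode or AG argument. */
	if (sm->sm_ino || sm->sm_agno)
		return -EINVAL;
	/* Reject any bit in sm_gen that is not a known output flag. */
	if (sm->sm_gen & ~XFS_SCRUB_FLAGS_OUT)
		return -ENOENT;

	/* Echo parameters back to the caller to prove that we exist. */
	sm->sm_flags = sm->sm_gen & XFS_SCRUB_FLAGS_OUT;
	return 0;
}
```

With this ordering an unknown bit in sm_gen fails the call before
anything is written back, so sm_flags is only modified on success.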

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/27] xfs: create an ioctl to scrub AG metadata
  2017-09-21 17:52       ` Brian Foster
@ 2017-09-22  3:26         ` Darrick J. Wong
  0 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-22  3:26 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Sep 21, 2017 at 01:52:19PM -0400, Brian Foster wrote:
> On Thu, Sep 21, 2017 at 10:35:10AM -0700, Darrick J. Wong wrote:
> > On Thu, Sep 21, 2017 at 10:36:39AM -0400, Brian Foster wrote:
> > > On Wed, Sep 20, 2017 at 05:17:56PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Create an ioctl that can be used to scrub internal filesystem metadata.
> > > > The new ioctl takes the metadata type, an (optional) AG number, an
> > > > (optional) inode number and generation, and a flags argument.  This will
> > > > be used by the upcoming XFS online scrub tool.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > >  fs/xfs/Kconfig           |   17 ++++++++++++++
> > > >  fs/xfs/Makefile          |   11 +++++++++
> > > >  fs/xfs/libxfs/xfs_fs.h   |   53 +++++++++++++++++++++++++++++++++++++++++++++
> > > >  fs/xfs/scrub/scrub.c     |   54 ++++++++++++++++++++++++++++++++++++++++++++++
> > > >  fs/xfs/scrub/scrub.h     |   25 +++++++++++++++++++++
> > > >  fs/xfs/scrub/trace.c     |   41 +++++++++++++++++++++++++++++++++++
> > > >  fs/xfs/scrub/trace.h     |   33 ++++++++++++++++++++++++++++
> > > >  fs/xfs/scrub/xfs_scrub.h |   29 +++++++++++++++++++++++++
> > > >  fs/xfs/xfs_ioctl.c       |   28 ++++++++++++++++++++++++
> > > >  fs/xfs/xfs_ioctl32.c     |    1 +
> > > >  10 files changed, 292 insertions(+)
> > > >  create mode 100644 fs/xfs/scrub/scrub.c
> > > >  create mode 100644 fs/xfs/scrub/scrub.h
> > > >  create mode 100644 fs/xfs/scrub/trace.c
> > > >  create mode 100644 fs/xfs/scrub/trace.h
> > > >  create mode 100644 fs/xfs/scrub/xfs_scrub.h
> > > > 
> > > > 
> > > 
> > > The code looks sane, though I think I need to understand the error codes
> > > a bit better. Perhaps once I get further into the series..
> > 
> > Yes, the ioctl needs documentation, though I'm unclear on where's an
> > appropriate place to put them.  The ioctl is not intended for general
> > consumption, so man-pages.git seems inappropriate.  I was thinking
> > either in xfs_fs.h directly, or perhaps xfsprogs' man pages?
> > 
> 
> For now (while all of this is still experimental and until you get a
> better answer :P), it seems reasonable enough to me to add basic
> interface documentation to the kernel source. Then we at least have the
> content and can move it out into a manpage or something later, if
> needed.

It's more or less documented in the source... but in a somewhat
piecemeal fashion as we add functionality.  I've drafted a manpage to
describe all of what the ioctl wants to do, so I'll tack that on the end
of this series (and cc you and Dave).

--D

> 
> Brian
> 
> > (We don't seem to document the xfs ioctls afaict?)
> > 
> > --D
> > 
> > > 
> > > Brian
> > > 
> > > > diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
> > > > index 1b98cfa..f42fcf1 100644
> > > > --- a/fs/xfs/Kconfig
> > > > +++ b/fs/xfs/Kconfig
> > > > @@ -71,6 +71,23 @@ config XFS_RT
> > > >  
> > > >  	  If unsure, say N.
> > > >  
> > > > +config XFS_ONLINE_SCRUB
> > > > +	bool "XFS online metadata check support"
> > > > +	default n
> > > > +	depends on XFS_FS
> > > > +	help
> > > > +	  If you say Y here you will be able to check metadata on a
> > > > +	  mounted XFS filesystem.  This feature is intended to reduce
> > > > +	  filesystem downtime by supplementing xfs_repair.  The key
> > > > +	  advantage here is to look for problems proactively so that
> > > > +	  they can be dealt with in a controlled manner.
> > > > +
> > > > +	  This feature is considered EXPERIMENTAL.  Use with caution!
> > > > +
> > > > +	  See the xfs_scrub man page in section 8 for additional information.
> > > > +
> > > > +	  If unsure, say N.
> > > > +
> > > >  config XFS_WARN
> > > >  	bool "XFS Verbose Warnings"
> > > >  	depends on XFS_FS && !XFS_DEBUG
> > > > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > > > index dbc33e0..f4312bc 100644
> > > > --- a/fs/xfs/Makefile
> > > > +++ b/fs/xfs/Makefile
> > > > @@ -138,3 +138,14 @@ xfs-$(CONFIG_XFS_POSIX_ACL)	+= xfs_acl.o
> > > >  xfs-$(CONFIG_SYSCTL)		+= xfs_sysctl.o
> > > >  xfs-$(CONFIG_COMPAT)		+= xfs_ioctl32.o
> > > >  xfs-$(CONFIG_EXPORTFS_BLOCK_OPS)	+= xfs_pnfs.o
> > > > +
> > > > +# online scrub/repair
> > > > +ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
> > > > +
> > > > +# Tracepoints like to blow up, so build that before everything else
> > > > +
> > > > +xfs-y				+= $(addprefix scrub/, \
> > > > +				   trace.o \
> > > > +				   scrub.o \
> > > > +				   )
> > > > +endif
> > > > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> > > > index 2c26c38..a4b4c8c 100644
> > > > --- a/fs/xfs/libxfs/xfs_fs.h
> > > > +++ b/fs/xfs/libxfs/xfs_fs.h
> > > > @@ -468,6 +468,58 @@ typedef struct xfs_swapext
> > > >  #define XFS_FSOP_GOING_FLAGS_LOGFLUSH		0x1	/* flush log but not data */
> > > >  #define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH		0x2	/* don't flush log nor data */
> > > >  
> > > > +/* metadata scrubbing */
> > > > +struct xfs_scrub_metadata {
> > > > +	__u32 sm_type;		/* What to check? */
> > > > +	__u32 sm_flags;		/* flags; see below. */
> > > > +	__u64 sm_ino;		/* inode number. */
> > > > +	__u32 sm_gen;		/* inode generation. */
> > > > +	__u32 sm_agno;		/* ag number. */
> > > > +	__u64 sm_reserved[5];	/* pad to 64 bytes */
> > > > +};
> > > > +
> > > > +/*
> > > > + * Metadata types and flags for scrub operation.
> > > > + */
> > > > +
> > > > +/* Scrub subcommands. */
> > > > +
> > > > +/* Number of scrub subcommands. */
> > > > +#define XFS_SCRUB_TYPE_NR	0
> > > > +
> > > > +/* i: Repair this metadata. */
> > > > +#define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
> > > > +
> > > > +/* o: Metadata object needs repair. */
> > > > +#define XFS_SCRUB_OFLAG_CORRUPT		(1 << 1)
> > > > +
> > > > +/*
> > > > + * o: Metadata object could be optimized.  It's not corrupt, but
> > > > + *    we could improve on it somehow.
> > > > + */
> > > > +#define XFS_SCRUB_OFLAG_PREEN		(1 << 2)
> > > > +
> > > > +/* o: Cross-referencing failed. */
> > > > +#define XFS_SCRUB_OFLAG_XFAIL		(1 << 3)
> > > > +
> > > > +/* o: Metadata object disagrees with cross-referenced metadata. */
> > > > +#define XFS_SCRUB_OFLAG_XCORRUPT	(1 << 4)
> > > > +
> > > > +/* o: Scan was not complete. */
> > > > +#define XFS_SCRUB_OFLAG_INCOMPLETE	(1 << 5)
> > > > +
> > > > +/* o: Metadata object looked funny but isn't corrupt. */
> > > > +#define XFS_SCRUB_OFLAG_WARNING		(1 << 6)
> > > > +
> > > > +#define XFS_SCRUB_FLAGS_IN	(XFS_SCRUB_IFLAG_REPAIR)
> > > > +#define XFS_SCRUB_FLAGS_OUT	(XFS_SCRUB_OFLAG_CORRUPT | \
> > > > +				 XFS_SCRUB_OFLAG_PREEN | \
> > > > +				 XFS_SCRUB_OFLAG_XFAIL | \
> > > > +				 XFS_SCRUB_OFLAG_XCORRUPT | \
> > > > +				 XFS_SCRUB_OFLAG_INCOMPLETE | \
> > > > +				 XFS_SCRUB_OFLAG_WARNING)
> > > > +#define XFS_SCRUB_FLAGS_ALL	(XFS_SCRUB_FLAGS_IN | XFS_SCRUB_FLAGS_OUT)
> > > > +
> > > >  /*
> > > >   * AG reserved block counters
> > > >   */
> > > > @@ -522,6 +574,7 @@ struct xfs_fsop_ag_resblks {
> > > >  #define XFS_IOC_ZERO_RANGE	_IOW ('X', 57, struct xfs_flock64)
> > > >  #define XFS_IOC_FREE_EOFBLOCKS	_IOR ('X', 58, struct xfs_fs_eofblocks)
> > > >  /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
> > > > +#define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
> > > >  
> > > >  /*
> > > >   * ioctl commands that replace IRIX syssgi()'s
> > > > diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> > > > new file mode 100644
> > > > index 0000000..5db2a6f
> > > > --- /dev/null
> > > > +++ b/fs/xfs/scrub/scrub.c
> > > > @@ -0,0 +1,54 @@
> > > > +/*
> > > > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > > > + *
> > > > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > > + *
> > > > + * This program is free software; you can redistribute it and/or
> > > > + * modify it under the terms of the GNU General Public License
> > > > + * as published by the Free Software Foundation; either version 2
> > > > + * of the License, or (at your option) any later version.
> > > > + *
> > > > + * This program is distributed in the hope that it would be useful,
> > > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > > + * GNU General Public License for more details.
> > > > + *
> > > > + * You should have received a copy of the GNU General Public License
> > > > + * along with this program; if not, write the Free Software Foundation,
> > > > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > > > + */
> > > > +#include "xfs.h"
> > > > +#include "xfs_fs.h"
> > > > +#include "xfs_shared.h"
> > > > +#include "xfs_format.h"
> > > > +#include "xfs_trans_resv.h"
> > > > +#include "xfs_mount.h"
> > > > +#include "xfs_defer.h"
> > > > +#include "xfs_btree.h"
> > > > +#include "xfs_bit.h"
> > > > +#include "xfs_log_format.h"
> > > > +#include "xfs_trans.h"
> > > > +#include "xfs_sb.h"
> > > > +#include "xfs_inode.h"
> > > > +#include "xfs_alloc.h"
> > > > +#include "xfs_alloc_btree.h"
> > > > +#include "xfs_bmap.h"
> > > > +#include "xfs_bmap_btree.h"
> > > > +#include "xfs_ialloc.h"
> > > > +#include "xfs_ialloc_btree.h"
> > > > +#include "xfs_refcount.h"
> > > > +#include "xfs_refcount_btree.h"
> > > > +#include "xfs_rmap.h"
> > > > +#include "xfs_rmap_btree.h"
> > > > +#include "scrub/xfs_scrub.h"
> > > > +#include "scrub/scrub.h"
> > > > +#include "scrub/trace.h"
> > > > +
> > > > +/* Dispatch metadata scrubbing. */
> > > > +int
> > > > +xfs_scrub_metadata(
> > > > +	struct xfs_inode		*ip,
> > > > +	struct xfs_scrub_metadata	*sm)
> > > > +{
> > > > +	return -EOPNOTSUPP;
> > > > +}
> > > > diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
> > > > new file mode 100644
> > > > index 0000000..eb1cd9d
> > > > --- /dev/null
> > > > +++ b/fs/xfs/scrub/scrub.h
> > > > @@ -0,0 +1,25 @@
> > > > +/*
> > > > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > > > + *
> > > > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > > + *
> > > > + * This program is free software; you can redistribute it and/or
> > > > + * modify it under the terms of the GNU General Public License
> > > > + * as published by the Free Software Foundation; either version 2
> > > > + * of the License, or (at your option) any later version.
> > > > + *
> > > > + * This program is distributed in the hope that it would be useful,
> > > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > > + * GNU General Public License for more details.
> > > > + *
> > > > + * You should have received a copy of the GNU General Public License
> > > > + * along with this program; if not, write the Free Software Foundation,
> > > > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > > > + */
> > > > +#ifndef __XFS_SCRUB_SCRUB_H__
> > > > +#define __XFS_SCRUB_SCRUB_H__
> > > > +
> > > > +/* Metadata scrubbers */
> > > > +
> > > > +#endif	/* __XFS_SCRUB_SCRUB_H__ */
> > > > diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c
> > > > new file mode 100644
> > > > index 0000000..c59fd41
> > > > --- /dev/null
> > > > +++ b/fs/xfs/scrub/trace.c
> > > > @@ -0,0 +1,41 @@
> > > > +/*
> > > > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > > > + *
> > > > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > > + *
> > > > + * This program is free software; you can redistribute it and/or
> > > > + * modify it under the terms of the GNU General Public License
> > > > + * as published by the Free Software Foundation; either version 2
> > > > + * of the License, or (at your option) any later version.
> > > > + *
> > > > + * This program is distributed in the hope that it would be useful,
> > > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > > + * GNU General Public License for more details.
> > > > + *
> > > > + * You should have received a copy of the GNU General Public License
> > > > + * along with this program; if not, write the Free Software Foundation,
> > > > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > > > + */
> > > > +#include "xfs.h"
> > > > +#include "xfs_fs.h"
> > > > +#include "xfs_shared.h"
> > > > +#include "xfs_format.h"
> > > > +#include "xfs_log_format.h"
> > > > +#include "xfs_trans_resv.h"
> > > > +#include "xfs_mount.h"
> > > > +#include "xfs_defer.h"
> > > > +#include "xfs_da_format.h"
> > > > +#include "xfs_defer.h"
> > > > +#include "xfs_inode.h"
> > > > +#include "xfs_btree.h"
> > > > +#include "xfs_trans.h"
> > > > +#include "scrub/xfs_scrub.h"
> > > > +#include "scrub/scrub.h"
> > > > +
> > > > +/*
> > > > + * We include this last to have the helpers above available for the trace
> > > > + * event implementations.
> > > > + */
> > > > +#define CREATE_TRACE_POINTS
> > > > +#include "scrub/trace.h"
> > > > diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
> > > > new file mode 100644
> > > > index 0000000..a95a7c8
> > > > --- /dev/null
> > > > +++ b/fs/xfs/scrub/trace.h
> > > > @@ -0,0 +1,33 @@
> > > > +/*
> > > > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > > > + *
> > > > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > > + *
> > > > + * This program is free software; you can redistribute it and/or
> > > > + * modify it under the terms of the GNU General Public License
> > > > + * as published by the Free Software Foundation; either version 2
> > > > + * of the License, or (at your option) any later version.
> > > > + *
> > > > + * This program is distributed in the hope that it would be useful,
> > > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > > + * GNU General Public License for more details.
> > > > + *
> > > > + * You should have received a copy of the GNU General Public License
> > > > + * along with this program; if not, write the Free Software Foundation,
> > > > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > > > + */
> > > > +#undef TRACE_SYSTEM
> > > > +#define TRACE_SYSTEM xfs_scrub
> > > > +
> > > > +#if !defined(_TRACE_XFS_SCRUB_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
> > > > +#define _TRACE_XFS_SCRUB_TRACE_H
> > > > +
> > > > +#include <linux/tracepoint.h>
> > > > +
> > > > +#endif /* _TRACE_XFS_SCRUB_TRACE_H */
> > > > +
> > > > +#undef TRACE_INCLUDE_PATH
> > > > +#define TRACE_INCLUDE_PATH .
> > > > +#define TRACE_INCLUDE_FILE scrub/trace
> > > > +#include <trace/define_trace.h>
> > > > diff --git a/fs/xfs/scrub/xfs_scrub.h b/fs/xfs/scrub/xfs_scrub.h
> > > > new file mode 100644
> > > > index 0000000..e00e0ea
> > > > --- /dev/null
> > > > +++ b/fs/xfs/scrub/xfs_scrub.h
> > > > @@ -0,0 +1,29 @@
> > > > +/*
> > > > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > > > + *
> > > > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > > + *
> > > > + * This program is free software; you can redistribute it and/or
> > > > + * modify it under the terms of the GNU General Public License
> > > > + * as published by the Free Software Foundation; either version 2
> > > > + * of the License, or (at your option) any later version.
> > > > + *
> > > > + * This program is distributed in the hope that it would be useful,
> > > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > > + * GNU General Public License for more details.
> > > > + *
> > > > + * You should have received a copy of the GNU General Public License
> > > > + * along with this program; if not, write the Free Software Foundation,
> > > > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > > > + */
> > > > +#ifndef __XFS_SCRUB_H__
> > > > +#define __XFS_SCRUB_H__
> > > > +
> > > > +#ifndef CONFIG_XFS_ONLINE_SCRUB
> > > > +# define xfs_scrub_metadata(ip, sm)	(-ENOTTY)
> > > > +#else
> > > > +int xfs_scrub_metadata(struct xfs_inode *ip, struct xfs_scrub_metadata *sm);
> > > > +#endif /* CONFIG_XFS_ONLINE_SCRUB */
> > > > +
> > > > +#endif	/* __XFS_SCRUB_H__ */
> > > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > > > index 44dc178..ab7a7f8 100644
> > > > --- a/fs/xfs/xfs_ioctl.c
> > > > +++ b/fs/xfs/xfs_ioctl.c
> > > > @@ -44,6 +44,7 @@
> > > >  #include "xfs_btree.h"
> > > >  #include <linux/fsmap.h>
> > > >  #include "xfs_fsmap.h"
> > > > +#include "scrub/xfs_scrub.h"
> > > >  
> > > >  #include <linux/capability.h>
> > > >  #include <linux/cred.h>
> > > > @@ -1702,6 +1703,30 @@ xfs_ioc_getfsmap(
> > > >  	return 0;
> > > >  }
> > > >  
> > > > +STATIC int
> > > > +xfs_ioc_scrub_metadata(
> > > > +	struct xfs_inode		*ip,
> > > > +	void				__user *arg)
> > > > +{
> > > > +	struct xfs_scrub_metadata	scrub;
> > > > +	int				error;
> > > > +
> > > > +	if (!capable(CAP_SYS_ADMIN))
> > > > +		return -EPERM;
> > > > +
> > > > +	if (copy_from_user(&scrub, arg, sizeof(scrub)))
> > > > +		return -EFAULT;
> > > > +
> > > > +	error = xfs_scrub_metadata(ip, &scrub);
> > > > +	if (error)
> > > > +		return error;
> > > > +
> > > > +	if (copy_to_user(arg, &scrub, sizeof(scrub)))
> > > > +		return -EFAULT;
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > >  int
> > > >  xfs_ioc_swapext(
> > > >  	xfs_swapext_t	*sxp)
> > > > @@ -1906,6 +1931,9 @@ xfs_file_ioctl(
> > > >  	case FS_IOC_GETFSMAP:
> > > >  		return xfs_ioc_getfsmap(ip, arg);
> > > >  
> > > > +	case XFS_IOC_SCRUB_METADATA:
> > > > +		return xfs_ioc_scrub_metadata(ip, arg);
> > > > +
> > > >  	case XFS_IOC_FD_TO_HANDLE:
> > > >  	case XFS_IOC_PATH_TO_HANDLE:
> > > >  	case XFS_IOC_PATH_TO_FSHANDLE: {
> > > > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> > > > index e8b4de3..972d4bd 100644
> > > > --- a/fs/xfs/xfs_ioctl32.c
> > > > +++ b/fs/xfs/xfs_ioctl32.c
> > > > @@ -557,6 +557,7 @@ xfs_file_compat_ioctl(
> > > >  	case XFS_IOC_ERROR_CLEARALL:
> > > >  	case FS_IOC_GETFSMAP:
> > > >  	case XFS_IOC_GET_AG_RESBLKS:
> > > > +	case XFS_IOC_SCRUB_METADATA:
> > > >  		return xfs_file_ioctl(filp, cmd, p);
> > > >  #ifndef BROKEN_X86_ALIGNMENT
> > > >  	/* These are handled fine if no alignment issues */
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH] man: describe the metadata scrubbing ioctl
  2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
                   ` (26 preceding siblings ...)
  2017-09-21  0:20 ` [PATCH 27/27] xfs: scrub quota information Darrick J. Wong
@ 2017-09-22  3:27 ` Darrick J. Wong
  27 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-22  3:27 UTC (permalink / raw)
  To: linux-xfs, Brian Foster, Dave Chinner

Document the XFS-specific metadata scrub/repair ioctl's behavior,
arguments, and side effects.  Dump this in xfsprogs for lack of a
better destination.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 man/Makefile                        |    2 
 man/man2/ioctl_xfs_scrub_metadata.2 |  298 +++++++++++++++++++++++++++++++++++
 2 files changed, 299 insertions(+), 1 deletion(-)
 create mode 100644 man/man2/ioctl_xfs_scrub_metadata.2

diff --git a/man/Makefile b/man/Makefile
index 863284c..cae891f 100644
--- a/man/Makefile
+++ b/man/Makefile
@@ -5,7 +5,7 @@
 TOPDIR = ..
 include $(TOPDIR)/include/builddefs
 
-SUBDIRS = man3 man5 man8
+SUBDIRS = man2 man3 man5 man8
 
 default : $(SUBDIRS)
 
diff --git a/man/man2/ioctl_xfs_scrub_metadata.2 b/man/man2/ioctl_xfs_scrub_metadata.2
new file mode 100644
index 0000000..fa1c56d
--- /dev/null
+++ b/man/man2/ioctl_xfs_scrub_metadata.2
@@ -0,0 +1,298 @@
+.\" Copyright (c) 2017, Oracle.  All rights reserved.
+.\"
+.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
+.\" This is free documentation; you can redistribute it and/or
+.\" modify it under the terms of the GNU General Public License as
+.\" published by the Free Software Foundation; either version 2 of
+.\" the License, or (at your option) any later version.
+.\"
+.\" The GNU General Public License's references to "object code"
+.\" and "executables" are to be interpreted as the output of any
+.\" document formatting or typesetting system, including
+.\" intermediate and printed output.
+.\"
+.\" This manual is distributed in the hope that it will be useful,
+.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
+.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+.\" GNU General Public License for more details.
+.\"
+.\" You should have received a copy of the GNU General Public
+.\" License along with this manual; if not, see
+.\" <http://www.gnu.org/licenses/>.
+.\" %%%LICENSE_END
+.TH IOCTL-XFS-SCRUB-METADATA 2 2017-09-21 "Linux" "Linux Programmer's Manual"
+.SH NAME
+ioctl_xfs_scrub_metadata \- check XFS filesystem metadata
+.SH SYNOPSIS
+.br
+.B #include <xfs/xfs_fs.h>
+.PP
+.BI "int ioctl(int " dest_fd ", XFS_IOC_SCRUB_METADATA, struct xfs_scrub_metadata *" arg );
+.SH DESCRIPTION
+This XFS ioctl asks the kernel driver to examine a piece of metadata for
+errors or suboptimal metadata.
+Examination includes running the metadata verifiers, checking records
+for obviously incorrect or impossible values, and cross-referencing each
+record with any other available metadata in the filesystem.
+This ioctl can also try to repair or optimize metadata, though this may
+tie up the filesystem for a long period of time.
+The type and location of the metadata is conveyed in a structure of the
+following form:
+.PP
+.in +4n
+.EX
+struct xfs_scrub_metadata {
+	__u32 sm_type;
+	__u32 sm_flags;
+	__u64 sm_ino;
+	__u32 sm_gen;
+	__u32 sm_agno;
+	__u64 sm_reserved[5];
+};
+.EE
+.in
+.PP
+The field
+.I sm_reserved
+must be zero.
+.PP
+The field
+.I sm_type
+indicates the type of metadata to check:
+.RS 0.4i
+.TP
+.B XFS_SCRUB_TYPE_PROBE
+Probe the kernel to see if it is willing to try to check or repair this
+filesystem.
+If any
+.B sm_flags
+output flags are set in
+.BR sm_gen ", "
+they will be copied to
+.B sm_flags
+before the call returns.
+
+.PD 0
+.PP
+.nf
+.B XFS_SCRUB_TYPE_SB
+.B XFS_SCRUB_TYPE_AGF
+.B XFS_SCRUB_TYPE_AGFL
+.fi
+.TP
+.B XFS_SCRUB_TYPE_AGI
+Examine a given allocation group's superblock, free space header, free
+block list, or inode header, respectively.
+Headers are checked for obviously incorrect values and cross-referenced
+against the allocation group's metadata btrees, if possible.
+The allocation group number must be given in
+.BR sm_agno "."
+
+.PP
+.nf
+.B XFS_SCRUB_TYPE_BNOBT
+.B XFS_SCRUB_TYPE_CNTBT
+.B XFS_SCRUB_TYPE_INOBT
+.B XFS_SCRUB_TYPE_FINOBT
+.B XFS_SCRUB_TYPE_RMAPBT
+.fi
+.TP
+.B XFS_SCRUB_TYPE_REFCNTBT
+Examine a given allocation group's free space btrees, inode btrees, reverse
+mapping btrees, or reference count btrees, respectively.
+Space extent records are checked for obviously incorrect values and
+cross-referenced with the other space extent metadata in the allocation
+group, if possible,
+to ensure that there are no conflicts.
+The allocation group number must be given in
+.BR sm_agno "."
+
+.TP
+.B XFS_SCRUB_TYPE_INODE
+Examine a given inode's inode record for obviously incorrect values and
+discrepancies with the rest of filesystem metadata.
+Parent pointers are checked for impossible inode values and are then
+followed up to the parent directory to ensure that the linkage makes
+sense.
+The inode to examine can be specified through
+.B sm_ino
+and
+.BR sm_gen "; "
+if not specified, then the file described by
+.B dest_fd
+will be examined.
+
+.PP
+.nf
+.B XFS_SCRUB_TYPE_BMBTD
+.B XFS_SCRUB_TYPE_BMBTA
+.fi
+.TP
+.B XFS_SCRUB_TYPE_BMBTC
+Examine a given inode's data block map, extended attribute block map,
+or copy on write block map, respectively.
+Inode records are examined for obviously incorrect values and
+discrepancies with the three block map types.
+The block maps are checked for obviously wrong values and
+cross-referenced with the allocation group space extent metadata for
+discrepancies.
+The inode to examine can be specified in the same manner as
+.BR XFS_SCRUB_TYPE_INODE "."
+
+.TP
+.B XFS_SCRUB_TYPE_XATTR
+Examine the extended attribute records and indices of a given inode for
+incorrect pointers and other signs of damage.
+The inode to examine can be specified in the same manner as
+.BR XFS_SCRUB_TYPE_INODE "."
+
+.TP
+.B XFS_SCRUB_TYPE_DIR
+Examine the entries in a given directory for invalid data or dangling pointers.
+The directory to examine can be specified in the same manner as
+.BR XFS_SCRUB_TYPE_INODE "."
+
+.TP
+.B XFS_SCRUB_TYPE_SYMLINK
+Examine the target of a symbolic link for obvious pathname problems.
+The link to examine can be specified in the same manner as
+.BR XFS_SCRUB_TYPE_INODE "."
+
+.PP
+.nf
+.B XFS_SCRUB_TYPE_RTBITMAP
+.fi
+.TP
+.B XFS_SCRUB_TYPE_RTSUM
+Examine the realtime block bitmap and realtime summary inodes for
+corruption.
+
+.PP
+.nf
+.B XFS_SCRUB_TYPE_UQUOTA
+.B XFS_SCRUB_TYPE_GQUOTA
+.fi
+.TP
+.B XFS_SCRUB_TYPE_PQUOTA
+Examine all user, group, or project quota records for corruption.
+.RE
+
+.PD 1
+.PP
+The field
+.I sm_flags
+controls the behavior of the scrub operation and provides more information
+about the outcome of the operation.
+If none of the
+.B XFS_SCRUB_OFLAG_*
+flags are set upon return, the metadata is clean.
+.RS 0.4i
+.TP
+.B XFS_SCRUB_IFLAG_REPAIR
+If the caller sets this flag, the checker will examine the metadata and
+try to fix any problems or suboptimal metadata that it finds.
+If no errors occur during the repair operation, the check is performed a
+second time to determine if the repair succeeded.
+If errors do occur, the call returns an error status immediately.
+.TP
+.B XFS_SCRUB_OFLAG_CORRUPT
+The metadata was corrupt when the call returned.
+If
+.B XFS_SCRUB_IFLAG_REPAIR
+was specified, then an attempted repair failed to fix the problem.
+Unmount the filesystem and run
+.B xfs_repair
+to fix the filesystem.
+.TP
+.B XFS_SCRUB_OFLAG_PREEN
+The metadata is ok, but some aspect of the metadata could be optimized
+to increase performance.
+.TP
+.B XFS_SCRUB_OFLAG_XFAIL
+Filesystem errors were encountered when accessing other metadata to
+cross-reference the records attached to this metadata object.
+.TP
+.B XFS_SCRUB_OFLAG_XCORRUPT
+Discrepancies were found when cross-referencing the records attached to
+this metadata object against all other available metadata in the system.
+.TP
+.B XFS_SCRUB_OFLAG_INCOMPLETE
+The checker was unable to complete its check of all records.
+.TP
+.B XFS_SCRUB_OFLAG_WARNING
+The checker encountered a metadata object with potentially problematic
+records.
+However, the records were not obviously corrupt.
+.RE
+.PP
+For metadata checkers that operate on inodes or inode metadata, the fields
+.IR sm_ino " and " sm_gen
+are the inode number and generation number of the inode to check.
+If the inode number is zero, the inode represented by
+.I dest_fd
+is used instead.
+.PP
+For metadata checkers that operate on allocation group metadata, the field
+.I sm_agno
+indicates the allocation group in which to find the metadata.
+.PP
+For metadata checkers that operate on filesystem-wide metadata, no
+further arguments are required.
+.SH RETURN VALUE
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.PP
+.SH ERRORS
+Error codes can be one of, but are not limited to, the following:
+.TP
+.B EBUSY
+The filesystem object is busy; the repair will have to be tried again.
+.TP
+.B EFSCORRUPTED
+Severe filesystem corruption was detected and could not be repaired.
+Unmount the filesystem and run
+.B xfs_repair
+to fix the filesystem.
+.TP
+.B EINVAL
+One or more of the arguments specified is invalid.
+.TP
+.B ENOENT
+The specified metadata object does not exist.
+For example, this error code is returned for a
+.B XFS_SCRUB_TYPE_REFCNTBT
+request on a filesystem that does not support reflink.
+.TP
+.B ENOMEM
+There was not sufficient memory to perform the scrub or repair operation.
+Some operations (most notably reference count checking) require a lot of
+memory.
+.TP
+.B ENOSPC
+There is not enough free disk space to attempt a repair.
+.TP
+.B ENOTRECOVERABLE
+Filesystem was mounted in
+.B norecovery
+mode and therefore has an unclean log.
+.TP
+.B ENOTTY
+Online scrubbing or repair was not enabled.
+.TP
+.B EOPNOTSUPP
+Repairs of the requested metadata object are not supported.
+.TP
+.B EROFS
+Filesystem is read-only and a repair was requested.
+.TP
+.B ESHUTDOWN
+Filesystem is shut down due to previous errors.
+.SH CONFORMING TO
+This API is specific to XFS on Linux.
+.SH NOTES
+These operations may tie up the filesystem for a long time.
+A calling process can stop the operation by being sent a fatal
+signal, but non-fatal signals are blocked.
+.SH SEE ALSO
+.BR ioctl (2)
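
As a rough illustration of the interface described in the manpage
above, a caller might probe a filesystem and interpret the output flags
along these lines.  This is a sketch under stated assumptions: the
struct, ioctl number, and flag bits are copied from patch 03/27 under
local names so that the example does not depend on an installed
xfs/xfs_fs.h; the value 0 for the probe type is an assumption; and
scrub_probe() and scrub_outcome() are hypothetical helpers whose
priority ordering is only one possible policy.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local copy of struct xfs_scrub_metadata as laid out in patch 03/27. */
struct scrub_metadata {
	uint32_t	sm_type;	/* what to check */
	uint32_t	sm_flags;	/* input and output flags */
	uint64_t	sm_ino;		/* inode number */
	uint32_t	sm_gen;		/* inode generation */
	uint32_t	sm_agno;	/* AG number */
	uint64_t	sm_reserved[5];	/* must be zero */
};
static_assert(sizeof(struct scrub_metadata) == 64, "padded to 64 bytes");

/* Same encoding as XFS_IOC_SCRUB_METADATA in the patch. */
#define SCRUB_IOC_METADATA	_IOWR('X', 60, struct scrub_metadata)

#define SCRUB_TYPE_PROBE	0	/* assumed value of the probe type */

#define SCRUB_IFLAG_REPAIR	(1u << 0)
#define SCRUB_OFLAG_CORRUPT	(1u << 1)
#define SCRUB_OFLAG_PREEN	(1u << 2)
#define SCRUB_OFLAG_XFAIL	(1u << 3)
#define SCRUB_OFLAG_XCORRUPT	(1u << 4)
#define SCRUB_OFLAG_INCOMPLETE	(1u << 5)
#define SCRUB_OFLAG_WARNING	(1u << 6)

/* Ask the kernel whether it will scrub the fs behind fd; 0 or -errno. */
static int
scrub_probe(
	int			fd)
{
	struct scrub_metadata	sm;

	memset(&sm, 0, sizeof(sm));	/* also zeroes sm_reserved */
	sm.sm_type = SCRUB_TYPE_PROBE;
	if (ioctl(fd, SCRUB_IOC_METADATA, &sm) < 0)
		return -errno;
	return 0;
}

/* Reduce the output flags to one label; input flags are ignored. */
static const char *
scrub_outcome(
	uint32_t		flags)
{
	if (flags & (SCRUB_OFLAG_CORRUPT | SCRUB_OFLAG_XCORRUPT))
		return "corrupt";
	if (flags & (SCRUB_OFLAG_XFAIL | SCRUB_OFLAG_INCOMPLETE))
		return "incomplete";
	if (flags & SCRUB_OFLAG_WARNING)
		return "warning";
	if (flags & SCRUB_OFLAG_PREEN)
		return "preen";
	return "clean";		/* no XFS_SCRUB_OFLAG_* bit set */
}
```

A caller would open any file or directory on the mounted filesystem,
pass the descriptor to scrub_probe(), and treat -ENOTTY as "scrub not
available", per the ERRORS section.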

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 06/27] xfs: create helpers to record and deal with scrub problems
  2017-09-21  0:18 ` [PATCH 06/27] xfs: create helpers to record and deal with scrub problems Darrick J. Wong
@ 2017-09-22  7:16   ` Dave Chinner
  2017-09-22 16:44     ` Darrick J. Wong
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Chinner @ 2017-09-22  7:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Sep 20, 2017 at 05:18:14PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create helper functions to record crc and corruption problems, and
> deal with any other runtime errors that arise.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/scrub/common.c |  243 +++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/common.h |   39 ++++++++
>  fs/xfs/scrub/trace.h  |  193 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 475 insertions(+)
> 
> 
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 13ccb36..cf3f1365 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -47,6 +47,249 @@
>  
>  /* Common code for the metadata scrubbers. */
>  
> +/* Check for operational errors. */
> +bool
> +xfs_scrub_op_ok(
> +	struct xfs_scrub_context	*sc,
> +	xfs_agnumber_t			agno,
> +	xfs_agblock_t			bno,
> +	int				*error)
> +{
> +	switch (*error) {
> +	case 0:
> +		return true;
> +	case -EDEADLOCK:
> +		/* Used to restart an op with deadlock avoidance. */
> +		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
> +		break;
> +	case -EFSBADCRC:
> +	case -EFSCORRUPTED:
> +		/* Note the badness but don't abort. */
> +		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
> +		*error = 0;
> +		/* fall through */
> +	default:
> +		trace_xfs_scrub_op_error(sc, agno, bno, *error,
> +				__return_address);
> +		break;
> +	}
> +	return false;
> +}

What are the semantics here w.r.t. *error? On some errors it's
cleared before we return, on others it's ignored. It's as clear as
mud what we should expect from these functions...

> +/* Check for metadata block optimization possibilities. */
> +bool
> +xfs_scrub_block_preen_ok(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_buf			*bp,
> +	bool				fs_ok)
> +{
> +	struct xfs_mount		*mp = sc->mp;
> +	xfs_fsblock_t			fsbno;
> +	xfs_agnumber_t			agno;
> +	xfs_agblock_t			bno;
> +
> +	if (fs_ok)
> +		return fs_ok;
> +
> +	fsbno = XFS_DADDR_TO_FSB(mp, bp->b_bn);
> +	agno = XFS_FSB_TO_AGNO(mp, fsbno);
> +	bno = XFS_FSB_TO_AGBNO(mp, fsbno);
> +
> +	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_PREEN;
> +	trace_xfs_scrub_block_preen(sc, agno, bno, __return_address);
> +	return fs_ok;
> +}

Again, I'm not sure what the return value semantics of the function
are? Why does the fs_ok return shortcut exist?

Same for all the other functions...

> +
> +/* Check for inode metadata non-corruption problems. */
> +bool
> +xfs_scrub_ino_warn_ok(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_buf			*bp,
> +	bool				fs_ok)

Confusing. What's the difference between a corruption problem and a
"non-corruption problem" that requires a warning?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 07/27] xfs: create helpers to scrub a metadata btree
  2017-09-21  0:18 ` [PATCH 07/27] xfs: create helpers to scrub a metadata btree Darrick J. Wong
@ 2017-09-22  7:23   ` Dave Chinner
  2017-09-22 16:59     ` Darrick J. Wong
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Chinner @ 2017-09-22  7:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Sep 20, 2017 at 05:18:20PM -0700, Darrick J. Wong wrote:
> +/* Check for btree operation errors. */
> +bool
> +xfs_scrub_btree_op_ok(
> +	struct xfs_scrub_context	*sc,
> +	struct xfs_btree_cur		*cur,
> +	int				level,
> +	int				*error)
> +{
> +	if (*error == 0)
> +		return true;
> +
> +	switch (*error) {
> +	case -EDEADLOCK:
> +		/* Used to restart an op with deadlock avoidance. */
> +		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
> +		break;
> +	case -EFSBADCRC:
> +	case -EFSCORRUPTED:
> +		/* Note the badness but don't abort. */
> +		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
> +		*error = 0;
> +		/* fall through */
> +	default:
> +		if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
> +			trace_xfs_scrub_ifork_btree_op_error(sc, cur, level,
> +					*error, __return_address);
> +		else
> +			trace_xfs_scrub_btree_op_error(sc, cur, level,
> +					*error, __return_address);

Why different tracepoints when you could just output the
cur->bc_flags in the trace output and use the same tracepoint?
Doing that looks like it would simplify the tracepoint code
in this patch....

> +/* Figure out which block the btree cursor was pointing to. */
> +static inline xfs_fsblock_t
> +xfs_scrub_btree_cur_fsbno(
> +	struct xfs_btree_cur		*cur,
> +	int				level)
> +{
> +	if (level < cur->bc_nlevels && cur->bc_bufs[level])
> +		return XFS_DADDR_TO_FSB(cur->bc_mp, cur->bc_bufs[level]->b_bn);
> +	else if (cur->bc_flags & XFS_BTREE_LONG_PTRS)

no need for "else if", just "if" will do.

> +		return XFS_INO_TO_FSB(cur->bc_mp, cur->bc_private.b.ip->i_ino);

This makes no sense to me. Why are we returning the block address of
the inode here? It's not part of the btree....


> +	return XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno, 0);

Nor is the first block of the AG....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 08/27] xfs: scrub the shape of a metadata btree
  2017-09-21  0:18 ` [PATCH 08/27] xfs: scrub the shape of " Darrick J. Wong
@ 2017-09-22 15:22   ` Brian Foster
  2017-09-22 17:22     ` Darrick J. Wong
  0 siblings, 1 reply; 51+ messages in thread
From: Brian Foster @ 2017-09-22 15:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Sep 20, 2017 at 05:18:26PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create a function that can check the shape of a btree -- each block
> passes basic inspection and all the pointers look ok.  In the next patch
> we'll add the ability to check the actual keys and records stored within
> the btree.  Add some helper functions so that we report detailed scrub
> errors in a uniform manner in dmesg.  These are helper functions for
> subsequent patches.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_btree.c |   16 +++
>  fs/xfs/libxfs/xfs_btree.h |    7 +
>  fs/xfs/scrub/btree.c      |  236 +++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/common.h     |   13 ++
>  4 files changed, 268 insertions(+), 4 deletions(-)
> 
> 
...
> diff --git a/fs/xfs/scrub/btree.c b/fs/xfs/scrub/btree.c
> index adf5d09..a9c2bf3 100644
> --- a/fs/xfs/scrub/btree.c
> +++ b/fs/xfs/scrub/btree.c
...
> @@ -109,6 +255,92 @@ xfs_scrub_btree(
>  	struct xfs_owner_info		*oinfo,
>  	void				*private)
>  {
> -	xfs_scrub_btree_op_ok(sc, cur, 0, false);
> -	return -EOPNOTSUPP;
> +	struct xfs_scrub_btree		bs = {0};
> +	union xfs_btree_ptr		ptr;
> +	union xfs_btree_ptr		*pp;
> +	struct xfs_btree_block		*block;
> +	int				level;
> +	struct xfs_buf			*bp;
> +	int				i;
> +	int				error = 0;
> +
> +	/* Initialize scrub state */
> +	bs.cur = cur;
> +	bs.scrub_rec = scrub_fn;
> +	bs.oinfo = oinfo;
> +	bs.firstrec = true;
> +	bs.private = private;
> +	bs.sc = sc;
> +	for (i = 0; i < XFS_BTREE_MAXLEVELS; i++)
> +		bs.firstkey[i] = true;
> +	INIT_LIST_HEAD(&bs.to_check);
> +
> +	/* Don't try to check a tree with a height we can't handle. */
> +	if (!xfs_scrub_btree_check_ok(sc, cur, 0, cur->bc_nlevels > 0 &&
> +			cur->bc_nlevels <= XFS_BTREE_MAXLEVELS))
> +		goto out;
> +
> +	/* Make sure the root isn't in the superblock. */
> +	if (!(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)) {
> +		cur->bc_ops->init_ptr_from_cur(cur, &ptr);
> +		error = xfs_scrub_btree_ptr(&bs, cur->bc_nlevels, &ptr);
> +		if (!xfs_scrub_btree_op_ok(sc, cur, cur->bc_nlevels - 1, &error))
> +			goto out;
> +	}
> +

This is kind of in line with Dave's comments on the previous patch that
introduces some of these helpers. I just glanced over them for now
because I didn't have enough context to grok the error processing.

FWIW, the btree_op_ok()/btree_check_ok() stuff kind of makes my eyes
cross a bit because I can't easily see the logic checks or distinguish
between those and error code checks. This is also a bit confusing
because it looks like we overload return codes for various things. E.g.,
we generate -EFSCORRUPTED in some cases just so the caller can set the
state on the context and clear it, but then we still use the fact that
an error _was_ set to control the flow of the task via the op_ok()
return value. This makes some of the code flow/decision making logic
hard to follow, particularly since some of that state looks like it can
be lost.

Case in point.. what happens if say xfs_btree_increment() returns
-EFSCORRUPTED back to xfs_scrub_btree_block_check_sibling()? It looks to
me that the latter calls btree_op_ok() to set the corrupt state, clears
the error code and skips out appropriately.
xfs_scrub_btree_block_check_sibling() now returns zero, which
potentially bubbles up to xfs_scrub_btree() where we check the error
code again. Is it expected that error == 0 here? What is supposed to
happen here?

I'm wondering if this could all be made more clear by trying to
explicitly separate out operational errors, scrub failures and whatever
we want to call the logic that clears an -EFSCORRUPTED/-EFSBADCRC error
code but still indicates something happened. :P

For starters, rather than wrap every logic check with btree_op_check(),
could we use explicit logic and let each function update the context
based on problems it found? For example, something like the following is
much easier for me to read than the associated logic above:

	/* Don't try to check a tree with a height we can't handle. */
	if (!(cur->bc_nlevels > 0 &&
	      cur->bc_nlevels <= XFS_BTREE_MAXLEVELS)) {
		xfs_scrub_sc_corrupt(...);
		goto out;
	}

And of course the context update calls could be factored into an
out_corrupt label or something where appropriate.

Beyond that, where we need to identify that a bit of metadata is busted,
perhaps to skip it but not abort (as we may have filtered out an
-EFSCORRUPTED return code), could we pass a flag down a
particular callchain (i.e., think 'bool *bad' or 'int *stat' a la the
core btree code)? Then we can still transfer that state back up the
chain and the caller(s) can distinguish operational errors from "this
thing is corrupted, act accordingly," regardless of how the corruption
was detected.
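
A rough standalone sketch of that idea, for illustration only. None of these names exist in the patchset: demo_check_sibling(), demo_scrub_blocks(), struct demo_stats and DEMO_EFSCORRUPTED are hypothetical stand-ins, and the error value is arbitrary. The return code carries only operational errors, while the 'bool *bad' out-parameter carries the "this thing is corrupted" state:

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the kernel's -EFSCORRUPTED; the value is arbitrary. */
#define DEMO_EFSCORRUPTED	1001

struct demo_stats {
	unsigned int	checked;	/* blocks that passed */
	unsigned int	skipped;	/* blocks flagged bad and skipped */
};

/*
 * Hypothetical low-level check.  op_result is the raw return code of
 * some btree operation.  Corruption is reported through *bad and the
 * return value is zero; any other error is returned unchanged.
 */
int
demo_check_sibling(
	int		op_result,
	bool		*bad)
{
	*bad = false;
	if (op_result == -DEMO_EFSCORRUPTED) {
		*bad = true;
		return 0;
	}
	return op_result;
}

/* Hypothetical caller: skip bad blocks, abort only on operational errors. */
int
demo_scrub_blocks(
	const int		*op_results,
	int			nr,
	struct demo_stats	*st)
{
	int			i;

	for (i = 0; i < nr; i++) {
		bool		bad;
		int		error;

		error = demo_check_sibling(op_results[i], &bad);
		if (error)
			return error;	/* operational error: stop now */
		if (bad) {
			st->skipped++;	/* corrupt: note it and move on */
			continue;
		}
		st->checked++;
	}
	return 0;
}
```

With this split the caller never has to infer corruption from an error code that has already been cleared.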

Brian

> +	/* Load the root of the btree. */
> +	level = cur->bc_nlevels - 1;
> +	cur->bc_ops->init_ptr_from_cur(cur, &ptr);
> +	error = xfs_scrub_btree_block(&bs, level, &ptr, &block, &bp);
> +	if (!xfs_scrub_btree_op_ok(sc, cur, cur->bc_nlevels - 1, &error))
> +		goto out;
> +
> +	cur->bc_ptrs[level] = 1;
> +
> +	while (level < cur->bc_nlevels) {
> +		block = xfs_btree_get_block(cur, level, &bp);
> +
> +		if (level == 0) {
> +			/* End of leaf, pop back towards the root. */
> +			if (cur->bc_ptrs[level] >
> +			    be16_to_cpu(block->bb_numrecs)) {
> +				if (level < cur->bc_nlevels - 1)
> +					cur->bc_ptrs[level + 1]++;
> +				level++;
> +				continue;
> +			}
> +
> +			if (xfs_scrub_should_terminate(&error))
> +				break;
> +
> +			cur->bc_ptrs[level]++;
> +			continue;
> +		}
> +
> +		/* End of node, pop back towards the root. */
> +		if (cur->bc_ptrs[level] > be16_to_cpu(block->bb_numrecs)) {
> +			if (level < cur->bc_nlevels - 1)
> +				cur->bc_ptrs[level + 1]++;
> +			level++;
> +			continue;
> +		}
> +
> +		/* Drill another level deeper. */
> +		pp = xfs_btree_ptr_addr(cur, cur->bc_ptrs[level], block);
> +		error = xfs_scrub_btree_ptr(&bs, level, pp);
> +		if (error) {
> +			error = 0;
> +			cur->bc_ptrs[level]++;
> +			continue;
> +		}
> +		level--;
> +		error = xfs_scrub_btree_block(&bs, level, pp, &block, &bp);
> +		if (!xfs_scrub_btree_op_ok(sc, cur, level, &error))
> +			goto out;
> +
> +		cur->bc_ptrs[level] = 1;
> +	}
> +
> +out:
> +	return error;
>  }
> diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> index e1bb14b..9920488 100644
> --- a/fs/xfs/scrub/common.h
> +++ b/fs/xfs/scrub/common.h
> @@ -20,6 +20,19 @@
>  #ifndef __XFS_SCRUB_COMMON_H__
>  #define __XFS_SCRUB_COMMON_H__
>  
> +/* Should we end the scrub early? */
> +static inline bool
> +xfs_scrub_should_terminate(
> +	int		*error)
> +{
> +	if (fatal_signal_pending(current)) {
> +		if (*error == 0)
> +			*error = -EAGAIN;
> +		return true;
> +	}
> +	return false;
> +}
> +
>  /*
>   * Grab a transaction.  If we're going to repair something, we need to
>   * ensure there's enough reservation to make all the changes.  If not,
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 06/27] xfs: create helpers to record and deal with scrub problems
  2017-09-22  7:16   ` Dave Chinner
@ 2017-09-22 16:44     ` Darrick J. Wong
  2017-09-23  7:22       ` Dave Chinner
  0 siblings, 1 reply; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-22 16:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Sep 22, 2017 at 05:16:08PM +1000, Dave Chinner wrote:
> On Wed, Sep 20, 2017 at 05:18:14PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create helper functions to record crc and corruption problems, and
> > deal with any other runtime errors that arise.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/scrub/common.c |  243 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/scrub/common.h |   39 ++++++++
> >  fs/xfs/scrub/trace.h  |  193 +++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 475 insertions(+)
> > 
> > 
> > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > index 13ccb36..cf3f1365 100644
> > --- a/fs/xfs/scrub/common.c
> > +++ b/fs/xfs/scrub/common.c
> > @@ -47,6 +47,249 @@
> >  
> >  /* Common code for the metadata scrubbers. */
> >  
> > +/* Check for operational errors. */
> > +bool
> > +xfs_scrub_op_ok(
> > +	struct xfs_scrub_context	*sc,
> > +	xfs_agnumber_t			agno,
> > +	xfs_agblock_t			bno,
> > +	int				*error)
> > +{
> > +	switch (*error) {
> > +	case 0:
> > +		return true;
> > +	case -EDEADLOCK:
> > +		/* Used to restart an op with deadlock avoidance. */
> > +		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
> > +		break;
> > +	case -EFSBADCRC:
> > +	case -EFSCORRUPTED:
> > +		/* Note the badness but don't abort. */
> > +		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
> > +		*error = 0;
> > +		/* fall through */
> > +	default:
> > +		trace_xfs_scrub_op_error(sc, agno, bno, *error,
> > +				__return_address);
> > +		break;
> > +	}
> > +	return false;
> > +}
> 
> What are the semantics here w.r.t. *error? On some errors it's
> cleared before we return, on others it's ignored. It's as clear as
> mud what we should expect from these functions...

If there's no error, we return true to tell the caller that it's ok to
move on to the next check in its list.

For non-verifier errors (e.g. ENOMEM) we return false to tell the caller
that there's no point in it continuing, and we preserve *error so that
the caller can return the *error up the stack.  Checking stops
immediately and the error is handed to userspace.

Verifier errors (EFSBADCRC/EFSCORRUPTED) are recorded in sm_flags and
the *error is cleared.  We return false to tell the caller that there's
no point in it continuing with this record.  The caller returns zero to its
caller, which means that checking continues, having skipped whatever
block failed the verifier.
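
The three cases above can be sketched as a standalone program fragment. This is only an illustration with mock types: struct demo_scrub_ctx, demo_op_ok(), demo_check_record() and the DEMO_* constants are hypothetical stand-ins for the kernel definitions, the error values are arbitrary, and the tracepoints are left out:

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-ins for the kernel's verifier error codes; values are arbitrary. */
#define DEMO_EFSCORRUPTED	1001
#define DEMO_EFSBADCRC		1002
/* Stand-in for XFS_SCRUB_OFLAG_CORRUPT. */
#define DEMO_OFLAG_CORRUPT	(1u << 0)

struct demo_scrub_ctx {
	unsigned int	sm_flags;
};

/*
 * Returns true only when *error is zero.  Verifier errors are recorded
 * in sm_flags and cleared; every other error is left in *error.
 */
bool
demo_op_ok(
	struct demo_scrub_ctx	*sc,
	int			*error)
{
	switch (*error) {
	case 0:
		return true;
	case -DEMO_EFSBADCRC:
	case -DEMO_EFSCORRUPTED:
		sc->sm_flags |= DEMO_OFLAG_CORRUPT;
		*error = 0;
		return false;
	default:
		return false;
	}
}

/* Hypothetical caller: op_result is whatever the lookup returned. */
int
demo_check_record(
	struct demo_scrub_ctx	*sc,
	int			op_result)
{
	int			error = op_result;

	if (!demo_op_ok(sc, &error))
		goto out;

	/* Further checks on the record would go here. */
out:
	return error;
}
```

So a caller sees -ENOMEM passed straight through, while a verifier error comes back as zero with the corrupt flag set in sm_flags.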

> > +/* Check for metadata block optimization possibilities. */
> > +bool
> > +xfs_scrub_block_preen_ok(
> > +	struct xfs_scrub_context	*sc,
> > +	struct xfs_buf			*bp,
> > +	bool				fs_ok)
> > +{
> > +	struct xfs_mount		*mp = sc->mp;
> > +	xfs_fsblock_t			fsbno;
> > +	xfs_agnumber_t			agno;
> > +	xfs_agblock_t			bno;
> > +
> > +	if (fs_ok)
> > +		return fs_ok;
> > +
> > +	fsbno = XFS_DADDR_TO_FSB(mp, bp->b_bn);
> > +	agno = XFS_FSB_TO_AGNO(mp, fsbno);
> > +	bno = XFS_FSB_TO_AGBNO(mp, fsbno);
> > +
> > +	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_PREEN;
> > +	trace_xfs_scrub_block_preen(sc, agno, bno, __return_address);
> > +	return fs_ok;
> > +}
> 
> > Again, I'm not sure what the return value semantics of the function
> are? Why does the fs_ok return shortcut exist?

The fs_ok functions are wrappers around an if test; the results of the
if test are passed in as fs_ok.

Therefore, if fs_ok then things are fine and we just skip out.

Otherwise, we found something and we should set sm_flags and jump out.
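
As a standalone illustration of the wrapper pattern, with mock types only: struct demo_preen_ctx, demo_preen_ok(), demo_check_fill() and the half-full threshold are all made up for this example and do not come from the patch. The caller evaluates the test inline and passes the result in as fs_ok:

```c
#include <stdbool.h>

/* Stand-in for XFS_SCRUB_OFLAG_PREEN. */
#define DEMO_OFLAG_PREEN	(1u << 1)

struct demo_preen_ctx {
	unsigned int	sm_flags;
};

/* Pass the test result through; record a preen hint if it failed. */
bool
demo_preen_ok(
	struct demo_preen_ctx	*sc,
	bool			fs_ok)
{
	if (fs_ok)
		return fs_ok;

	sc->sm_flags |= DEMO_OFLAG_PREEN;
	return fs_ok;
}

/*
 * Hypothetical caller: flag a block that is less than half full.
 * Returns the number of checks that were actually run.
 */
int
demo_check_fill(
	struct demo_preen_ctx	*sc,
	unsigned int		numrecs,
	unsigned int		maxrecs)
{
	int			checks = 1;

	if (!demo_preen_ok(sc, numrecs * 2 >= maxrecs))
		goto out;

	/* Later checks on this block would go here. */
	checks++;
out:
	return checks;
}
```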

> Same for all the other functions...
> 
> > +
> > +/* Check for inode metadata non-corruption problems. */
> > +bool
> > +xfs_scrub_ino_warn_ok(
> > +	struct xfs_scrub_context	*sc,
> > +	struct xfs_buf			*bp,
> > +	bool				fs_ok)
> 
> Confusing. What's the difference between a corruption problem and a
> "non-corruption problem" that requires a warning?

Anything that's less severe than "your fs is corrupt" but otherwise
requires administrator review.  The inode scrubber sets this for inodes
with a -1 uid/gid.  XFS seems fine with it, but the VFS treats -1ULL as
a magic "doesn't exist" value, and then userspace can't change it.
The quota code sets warnings if it detects quota usage exceeding the
hard limit, or if the limits are larger than the fs, etc.

In these cases I'd want the administrator to have a look and/or take
corrective action, but XFS doesn't flag those situations as fs
corruption nor does xfs_repair complain about them as corruption.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 07/27] xfs: create helpers to scrub a metadata btree
  2017-09-22  7:23   ` Dave Chinner
@ 2017-09-22 16:59     ` Darrick J. Wong
  0 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-22 16:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Sep 22, 2017 at 05:23:22PM +1000, Dave Chinner wrote:
> On Wed, Sep 20, 2017 at 05:18:20PM -0700, Darrick J. Wong wrote:
> > +/* Check for btree operation errors. */
> > +bool
> > +xfs_scrub_btree_op_ok(
> > +	struct xfs_scrub_context	*sc,
> > +	struct xfs_btree_cur		*cur,
> > +	int				level,
> > +	int				*error)
> > +{
> > +	if (*error == 0)
> > +		return true;
> > +
> > +	switch (*error) {
> > +	case -EDEADLOCK:
> > +		/* Used to restart an op with deadlock avoidance. */
> > +		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
> > +		break;
> > +	case -EFSBADCRC:
> > +	case -EFSCORRUPTED:
> > +		/* Note the badness but don't abort. */
> > +		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
> > +		*error = 0;
> > +		/* fall through */
> > +	default:
> > +		if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
> > +			trace_xfs_scrub_ifork_btree_op_error(sc, cur, level,
> > +					*error, __return_address);
> > +		else
> > +			trace_xfs_scrub_btree_op_error(sc, cur, level,
> > +					*error, __return_address);
> 
> Why different tracepoints when you could just output the
> cur->bc_flags in the trace output and use the same tracepoint?
> Doing that looks like it would simplify the tracepoint code
> in this patch....

The ifork_btree_op_error tracepoint records the inode number and
whichfork, whereas the btree_op_error of course does not.  I don't know
how to write a tracepoint that changes the TP_printk format string &
arguments based on the inputs.  If the bmap scrubber sees a problem with
an inode's fork, I want to know which inode and which fork without
having to go find the previous xfs_scrub_start trace.

> 
> > +/* Figure out which block the btree cursor was pointing to. */
> > +static inline xfs_fsblock_t
> > +xfs_scrub_btree_cur_fsbno(
> > +	struct xfs_btree_cur		*cur,
> > +	int				level)
> > +{
> > +	if (level < cur->bc_nlevels && cur->bc_bufs[level])
> > +		return XFS_DADDR_TO_FSB(cur->bc_mp, cur->bc_bufs[level]->b_bn);
> > +	else if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
> 
> no need for "else if", just "if" will do.
> 
> > +		return XFS_INO_TO_FSB(cur->bc_mp, cur->bc_private.b.ip->i_ino);
> 
> This makes no sense to me. Why are we returning the block address of
> the inode here? It's not part of the btree....
> 
> 
> > +	return XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno, 0);
> 
> Nor is the first block of the AG....

I should've written into the comment that the fsb returned by this
function is used only for tracepoint output.

These two clauses are fallbacks to handle the case that this level of
the btree cursor doesn't point to a buffer.  If there is no xfs_buf and
this is a long-pointer btree then either we're looking at the btree root
in the inode or staring into space; in either case we might as well
report the block that contains the inode.  If there is no xfs_buf and
it's a short-pointer btree then it must be an AG tree and so report the
AG.
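
The fallback order can be sketched with a mock cursor. This is illustration only: struct demo_cursor, its fields and demo_cur_fsbno() are hypothetical, and the block numbers are precomputed here rather than derived with the XFS conversion macros:

```c
#include <stdbool.h>
#include <stdint.h>

/* Mock cursor state for one btree level. */
struct demo_cursor {
	bool		has_buf;	/* this level points at a buffer */
	bool		long_ptrs;	/* long-pointer (inode-rooted) btree */
	uint64_t	buf_fsb;	/* block backing the buffer, if any */
	uint64_t	inode_fsb;	/* block containing the inode */
	uint64_t	ag_start_fsb;	/* first block of the AG */
};

/* Pick the block number to report in a tracepoint. */
uint64_t
demo_cur_fsbno(
	const struct demo_cursor	*cur)
{
	if (cur->has_buf)
		return cur->buf_fsb;
	if (cur->long_ptrs)
		return cur->inode_fsb;	/* inode root, or nothing better */
	return cur->ag_start_fsb;	/* short pointers: must be an AG tree */
}
```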

I could be persuaded that if we don't have a buffer and it's not the
inode root then just report NULLFSBLOCK in the tracepoint, since I
revised all the tracepoints to contain the inum/agno anyway.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 08/27] xfs: scrub the shape of a metadata btree
  2017-09-22 15:22   ` Brian Foster
@ 2017-09-22 17:22     ` Darrick J. Wong
  2017-09-22 19:13       ` Brian Foster
  0 siblings, 1 reply; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-22 17:22 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Sep 22, 2017 at 11:22:33AM -0400, Brian Foster wrote:
> On Wed, Sep 20, 2017 at 05:18:26PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create a function that can check the shape of a btree -- each block
> > passes basic inspection and all the pointers look ok.  In the next patch
> > we'll add the ability to check the actual keys and records stored within
> > the btree.  Add some helper functions so that we report detailed scrub
> > errors in a uniform manner in dmesg.  These are helper functions for
> > subsequent patches.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_btree.c |   16 +++
> >  fs/xfs/libxfs/xfs_btree.h |    7 +
> >  fs/xfs/scrub/btree.c      |  236 +++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/scrub/common.h     |   13 ++
> >  4 files changed, 268 insertions(+), 4 deletions(-)
> > 
> > 
> ...
> > diff --git a/fs/xfs/scrub/btree.c b/fs/xfs/scrub/btree.c
> > index adf5d09..a9c2bf3 100644
> > --- a/fs/xfs/scrub/btree.c
> > +++ b/fs/xfs/scrub/btree.c
> ...
> > @@ -109,6 +255,92 @@ xfs_scrub_btree(
> >  	struct xfs_owner_info		*oinfo,
> >  	void				*private)
> >  {
> > -	xfs_scrub_btree_op_ok(sc, cur, 0, false);
> > -	return -EOPNOTSUPP;
> > +	struct xfs_scrub_btree		bs = {0};
> > +	union xfs_btree_ptr		ptr;
> > +	union xfs_btree_ptr		*pp;
> > +	struct xfs_btree_block		*block;
> > +	int				level;
> > +	struct xfs_buf			*bp;
> > +	int				i;
> > +	int				error = 0;
> > +
> > +	/* Initialize scrub state */
> > +	bs.cur = cur;
> > +	bs.scrub_rec = scrub_fn;
> > +	bs.oinfo = oinfo;
> > +	bs.firstrec = true;
> > +	bs.private = private;
> > +	bs.sc = sc;
> > +	for (i = 0; i < XFS_BTREE_MAXLEVELS; i++)
> > +		bs.firstkey[i] = true;
> > +	INIT_LIST_HEAD(&bs.to_check);
> > +
> > +	/* Don't try to check a tree with a height we can't handle. */
> > +	if (!xfs_scrub_btree_check_ok(sc, cur, 0, cur->bc_nlevels > 0 &&
> > +			cur->bc_nlevels <= XFS_BTREE_MAXLEVELS))
> > +		goto out;
> > +
> > +	/* Make sure the root isn't in the superblock. */
> > +	if (!(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)) {
> > +		cur->bc_ops->init_ptr_from_cur(cur, &ptr);
> > +		error = xfs_scrub_btree_ptr(&bs, cur->bc_nlevels, &ptr);
> > +		if (!xfs_scrub_btree_op_ok(sc, cur, cur->bc_nlevels - 1, &error))
> > +			goto out;
> > +	}
> > +
> 
> This is kind of in line with Dave's comments on the previous patch that
> introduces some of these helpers. I just glanced over them for now
> because I didn't have enough context to grok the error processing.

(That's been a struggle with this patchset -- some of these helpers
don't get used until much further in the patchset.  I could have
sprinkled them into whichever patch uses them first, but now the hunks
are all over the series, I'd have to do more dependency tracking to make
sure bisect continues to work, and the frequency of auto-merge failures
as I push and pop the stack increases dramatically.)

> FWIW, the btree_op_ok()/btree_check_ok() stuff kind of makes my eyes
> cross a bit because I can't easily see the logic checks or distinguish
> between those and error code checks. This is also a bit confusing
> because it looks like we overload return codes for various things. E.g.,
> we generate -EFSCORRUPTED in some cases just so the caller can set the
> state on the context and clear it, but then we still use the fact that
> an error _was_ set to control the flow of the task via the op_ok()
> return value. This makes some of the code flow/decision making logic
> hard to follow, particularly since some of that state looks like it can
> be lost.
> 
> Case in point.. what happens if say xfs_btree_increment() returns
> -EFSCORRUPTED back to xfs_scrub_btree_block_check_sibling()? It looks to
> me that the latter calls btree_op_ok() to set the corrupt state, clears
> the error code and skips out appropriately.
> xfs_scrub_btree_block_check_sibling() now returns zero, which
> potentially bubbles up to xfs_scrub_btree() where we check the error
> code again. Is it expected that error == 0 here? What is supposed to
> happen here?

Yes, error == 0 is intended here.  Given a block B, we want to check
that B->rightsib->leftsib (if the sibling exists at all) points back to
B.  If the btree_increment operation returns EFSCORRUPTED we don't know
if that's because the B->rightsib points at something it's not supposed
to, or if B->rightsib points at a btree block, but that sibling block is
corrupt.  Therefore we set the corrupt flag and bubble error == 0 up the
call stack so that we can check the other records in the btree.  This
enables those following with ftrace to see everything that scrub thinks
is wrong with that piece of metadata.

IOWs, we only use error code returns for "runtime error, abort the whole
operation immediately".
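
To make the sibling check and the keep-going behaviour concrete, here is a small standalone sketch. It is not the scrub code: blocks are modelled as array indices, and struct demo_block, demo_rightsib_ok(), demo_count_bad_siblings() and DEMO_NULLBLOCK are names invented for this example:

```c
#include <stdbool.h>

/* Stand-in for a null sibling pointer. */
#define DEMO_NULLBLOCK	(-1)

/* Mock btree block: siblings are indices into an array of blocks. */
struct demo_block {
	int	leftsib;
	int	rightsib;
};

/* Does block b's right sibling (if any) point back at b? */
bool
demo_rightsib_ok(
	const struct demo_block	*blocks,
	int			nr,
	int			b)
{
	int			r = blocks[b].rightsib;

	if (r == DEMO_NULLBLOCK)
		return true;		/* no sibling, nothing to check */
	if (r < 0 || r >= nr)
		return false;		/* pointer goes somewhere bogus */
	return blocks[r].leftsib == b;
}

/* Check every block and count the problems instead of stopping early. */
unsigned int
demo_count_bad_siblings(
	const struct demo_block	*blocks,
	int			nr)
{
	unsigned int		bad = 0;
	int			b;

	for (b = 0; b < nr; b++) {
		if (!demo_rightsib_ok(blocks, nr, b))
			bad++;		/* note it, keep checking */
	}
	return bad;
}
```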

> I'm wondering if this could all be made more clear by trying to
> explicitly separate out operational errors, scrub failures and whatever
> we want to call the logic that clears an -EFSCORRUPTED/-EFSBADCRC error
> code but still indicates something happened. :P
> 
> For starters, rather than wrap every logic check with btree_op_check(),
> could we use explicit logic and let each function update the context
> based on problems it found? For example, something like the following is
> much easier for me to read than the associated logic above:
> 
> 	/* Don't try to check a tree with a height we can't handle. */
> 	if (!(cur->bc_nlevels > 0 &&
> 	      cur->bc_nlevels <= XFS_BTREE_MAXLEVELS)) {
> 		xfs_scrub_sc_corrupt(...);
> 		goto out;
> 	}
> 
> And of course the context update calls could be factored into an
> out_corrupt label or something where appropriate.

Yes, that could be done.

> Beyond that, where we need to identify that a bit of metadata is busted,
> perhaps to skip it but not abort (as we may have filtered out an
> -EFSCORRUPTED return code), could we pass a flag down a
> particular callchain (i.e., think 'bool *bad' or 'int *stat' a la the
> core btree code)? Then we can still transfer that state back up the
> chain and the caller(s) can distinguish operational errors from "this
> thing is corrupted, act accordingly," regardless of how the corruption
> was detected.

So far I haven't needed to distinguish between "no problems encountered"
and "this callchain hit a verifier error so we just set _CORRUPT" --
scrub always keeps going until it runs out of things to check.

(Maybe I'm missing something?)

--D

> 
> Brian
> 
> > +	/* Load the root of the btree. */
> > +	level = cur->bc_nlevels - 1;
> > +	cur->bc_ops->init_ptr_from_cur(cur, &ptr);
> > +	error = xfs_scrub_btree_block(&bs, level, &ptr, &block, &bp);
> > +	if (!xfs_scrub_btree_op_ok(sc, cur, cur->bc_nlevels - 1, &error))
> > +		goto out;
> > +
> > +	cur->bc_ptrs[level] = 1;
> > +
> > +	while (level < cur->bc_nlevels) {
> > +		block = xfs_btree_get_block(cur, level, &bp);
> > +
> > +		if (level == 0) {
> > +			/* End of leaf, pop back towards the root. */
> > +			if (cur->bc_ptrs[level] >
> > +			    be16_to_cpu(block->bb_numrecs)) {
> > +				if (level < cur->bc_nlevels - 1)
> > +					cur->bc_ptrs[level + 1]++;
> > +				level++;
> > +				continue;
> > +			}
> > +
> > +			if (xfs_scrub_should_terminate(&error))
> > +				break;
> > +
> > +			cur->bc_ptrs[level]++;
> > +			continue;
> > +		}
> > +
> > +		/* End of node, pop back towards the root. */
> > +		if (cur->bc_ptrs[level] > be16_to_cpu(block->bb_numrecs)) {
> > +			if (level < cur->bc_nlevels - 1)
> > +				cur->bc_ptrs[level + 1]++;
> > +			level++;
> > +			continue;
> > +		}
> > +
> > +		/* Drill another level deeper. */
> > +		pp = xfs_btree_ptr_addr(cur, cur->bc_ptrs[level], block);
> > +		error = xfs_scrub_btree_ptr(&bs, level, pp);
> > +		if (error) {
> > +			error = 0;
> > +			cur->bc_ptrs[level]++;
> > +			continue;
> > +		}
> > +		level--;
> > +		error = xfs_scrub_btree_block(&bs, level, pp, &block, &bp);
> > +		if (!xfs_scrub_btree_op_ok(sc, cur, level, &error))
> > +			goto out;
> > +
> > +		cur->bc_ptrs[level] = 1;
> > +	}
> > +
> > +out:
> > +	return error;
> >  }
> > diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> > index e1bb14b..9920488 100644
> > --- a/fs/xfs/scrub/common.h
> > +++ b/fs/xfs/scrub/common.h
> > @@ -20,6 +20,19 @@
> >  #ifndef __XFS_SCRUB_COMMON_H__
> >  #define __XFS_SCRUB_COMMON_H__
> >  
> > +/* Should we end the scrub early? */
> > +static inline bool
> > +xfs_scrub_should_terminate(
> > +	int		*error)
> > +{
> > +	if (fatal_signal_pending(current)) {
> > +		if (*error == 0)
> > +			*error = -EAGAIN;
> > +		return true;
> > +	}
> > +	return false;
> > +}
> > +
> >  /*
> >   * Grab a transaction.  If we're going to repair something, we need to
> >   * ensure there's enough reservation to make all the changes.  If not,
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 08/27] xfs: scrub the shape of a metadata btree
  2017-09-22 17:22     ` Darrick J. Wong
@ 2017-09-22 19:13       ` Brian Foster
  2017-09-22 20:14         ` Darrick J. Wong
  0 siblings, 1 reply; 51+ messages in thread
From: Brian Foster @ 2017-09-22 19:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Fri, Sep 22, 2017 at 10:22:07AM -0700, Darrick J. Wong wrote:
> On Fri, Sep 22, 2017 at 11:22:33AM -0400, Brian Foster wrote:
> > On Wed, Sep 20, 2017 at 05:18:26PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Create a function that can check the shape of a btree -- each block
> > > passes basic inspection and all the pointers look ok.  In the next patch
> > > we'll add the ability to check the actual keys and records stored within
> > > the btree.  Add some helper functions so that we report detailed scrub
> > > errors in a uniform manner in dmesg.  These are helper functions for
> > > subsequent patches.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_btree.c |   16 +++
> > >  fs/xfs/libxfs/xfs_btree.h |    7 +
> > >  fs/xfs/scrub/btree.c      |  236 +++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/scrub/common.h     |   13 ++
> > >  4 files changed, 268 insertions(+), 4 deletions(-)
> > > 
> > > 
> > ...
> > > diff --git a/fs/xfs/scrub/btree.c b/fs/xfs/scrub/btree.c
> > > index adf5d09..a9c2bf3 100644
> > > --- a/fs/xfs/scrub/btree.c
> > > +++ b/fs/xfs/scrub/btree.c
> > ...
> > > @@ -109,6 +255,92 @@ xfs_scrub_btree(
> > >  	struct xfs_owner_info		*oinfo,
> > >  	void				*private)
> > >  {
> > > -	xfs_scrub_btree_op_ok(sc, cur, 0, false);
> > > -	return -EOPNOTSUPP;
> > > +	struct xfs_scrub_btree		bs = {0};
> > > +	union xfs_btree_ptr		ptr;
> > > +	union xfs_btree_ptr		*pp;
> > > +	struct xfs_btree_block		*block;
> > > +	int				level;
> > > +	struct xfs_buf			*bp;
> > > +	int				i;
> > > +	int				error = 0;
> > > +
> > > +	/* Initialize scrub state */
> > > +	bs.cur = cur;
> > > +	bs.scrub_rec = scrub_fn;
> > > +	bs.oinfo = oinfo;
> > > +	bs.firstrec = true;
> > > +	bs.private = private;
> > > +	bs.sc = sc;
> > > +	for (i = 0; i < XFS_BTREE_MAXLEVELS; i++)
> > > +		bs.firstkey[i] = true;
> > > +	INIT_LIST_HEAD(&bs.to_check);
> > > +
> > > +	/* Don't try to check a tree with a height we can't handle. */
> > > +	if (!xfs_scrub_btree_check_ok(sc, cur, 0, cur->bc_nlevels > 0 &&
> > > +			cur->bc_nlevels <= XFS_BTREE_MAXLEVELS))
> > > +		goto out;
> > > +
> > > +	/* Make sure the root isn't in the superblock. */
> > > +	if (!(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)) {
> > > +		cur->bc_ops->init_ptr_from_cur(cur, &ptr);
> > > +		error = xfs_scrub_btree_ptr(&bs, cur->bc_nlevels, &ptr);
> > > +		if (!xfs_scrub_btree_op_ok(sc, cur, cur->bc_nlevels - 1, &error))
> > > +			goto out;
> > > +	}
> > > +
> > 
> > This is kind of in line with Dave's comments on the previous patch that
> > introduce some of these helpers. I just glanced over them for now
> > because I didn't have enough context to grok the error processing.
> 
> (That's been a struggle with this patchset -- some of these helpers
> don't get used until much further in the patchset.  I could have
> sprinkled them into whichever patch uses them first, but now the hunks
> are all over the series, I'd have to do more dependency tracking to make
> sure bisect continues to work, and the frequency of auto-merge failures
> as I push and pop the stack increase dramatically.)
> 
> > FWIW, the btree_op_ok()/btree_check_ok() stuff kind of makes my eyes
> > cross a bit because I can't easily see the logic checks or distinguish
> > between those and error code checks. This is also a bit confusing
> > because it looks like we overload return codes for various things. E.g.,
> > we generate -EFSCORRUPTED in some cases just so the caller can set the
> > state on the context and clear it, but then we still use the fact that
> > an error _was_ set to control the flow of the task via the op_ok()
> > return value. This makes some of the code flow/decision making logic
> > hard to follow, particularly since some of that state looks like it can
> > be lost.
> > 
> > Case in point.. what happens if say xfs_btree_increment() returns
> > -EFSCORRUPTED back to xfs_scrub_btree_block_check_sibling()? It looks to
> > me that the latter calls btree_op_ok() to set the corrupt state, clears
> > the error code and skips out appropriately.
> > xfs_scrub_btree_block_check_sibling() now returns zero, which
> > potentially bubbles up to xfs_scrub_btree() where we check the error
> > code again. Is it expected that error == 0 here? What is supposed to
> > happen here?
> 
> Yes, error == 0 is intended here.  Given a block B, we want to check
> that B->rightsib->leftsib (if the sibling exists at all) point back to
> B.  If the btree_increment operation returns EFSCORRUPTED we don't know
> if that's because the B->rightsib points at something it's not supposed
> to, or if B->rightsib points at a btree block, but that sibling block is
> corrupt.  Therefore we set the corrupt flag and bubble error == 0 up the
> call stack so that we can check the other records in the btree.   This
> enables those following with ftrace to see everything that scrub thinks
> is wrong with that piece of metadata.
> 
> IOWs, we only use error code returns for "runtime error, abort the whole
> operation immediately".
> 

Ok, so we intentionally have to consume the error here because it
doesn't necessarily reflect the corrupted state of the scrubbed block.
So IIUC, overloading return codes as such means error handling must
either return -EFSCORRUPTED for the current object being corrupted,
return some other error for an error in the infrastructure, or clear any
-EFSCORRUPTED error generated by checks that don't necessarily mean the
current object is corrupted (or too much so to interrupt processing).

> > I'm wondering if this could all be made more clear by trying to
> > explicitly separate out operational errors, scrub failures and whatever
> > we want to call the logic that clears an -EFSCORRUPTED/-EFSBADCRC error
> > code but still indicates something happened. :P
> > 
> > For starters, rather than wrap every logic check with btree_op_check(),
> > could we use explicit logic and let each function update the context
> > based on problems it found? For example, something like the following is
> > much more easy to read for me than the associated logic above:
> > 
> > 	/* Don't try to check a tree with a height we can't handle. */
> > 	if (!(cur->bc_nlevels > 0 &&
> > 	      cur->bc_nlevels <= XFS_BTREE_MAXLEVELS)) {
> > 		xfs_scrub_sc_corrupt(...);
> > 		goto out;
> > 	}
> > 
> > And of course the context update calls could be factored into an
> > out_corrupt label or something where appropriate.
> 
> Yes, that could be done.
> 
> > Beyond that, where we need to identify a bit of metadata is busted to
> > perhaps do something like skip it but not abort (as we may have filtered
> > out an -EFSCORRUPTED) return code, could we pass a flag down a
> > particular callchain (i.e., think 'bool *bad' or 'int *stat' a la the
> > core btree code)? Then we can still transfer that state back up the
> > chain and the caller(s) can distinguish operational errors from "this
> > thing is corrupted, act accordingly," regardless of how the corruption
> > was detected.
> 
> So far I haven't needed to distinguish between "no problems encountered"
> and "this callchain hit a verifier error so we just set _CORRUPT" --
> scrub always keeps going until it runs out of things to check.
> 

Ok. This is partly speculation on the above (trying to wrap my head
around the error consumption bits as is) and partly to try and see if we
can make the flow more readable.

In my mind, this is more clear if return codes are reserved for
operational/infrastructure errors and the corrupted state of a piece of
metadata is its own state. Using the example above, any -EFSCORRUPTED
errors from external calls (xfs_btree_check_block(),
xfs_btree_increment(), etc.) would always be cleared and replaced with a
return 0. The difference between those is the former (check_block())
error sets a 'bad = true' state on the currently scrubbed bit of
metadata and the latter (check_sibling()) does not. The latter can of
course still set the global corrupted state on the context to track that
there is an inconsistency in the fs. Thoughts?
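
To make that a bit more concrete, here is a rough userspace sketch of the
split I have in mind.  It is not taken from the patch: demo_ctx,
demo_check_block() and demo_check_sibling() are made-up names, and
op_result simply stands in for whatever the external btree call returned.
The only point is that the return value carries operational errors, while
"this block is bad" travels through a separate out-parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* XFS defines these as EUCLEAN and EBADMSG; mirrored here for userspace. */
#ifndef EUCLEAN
#define EUCLEAN		117	/* Linux value, fallback only */
#endif
#define EFSCORRUPTED	EUCLEAN
#define EFSBADCRC	EBADMSG

/* Hypothetical stand-in for the global corrupt state on the scrub context. */
struct demo_ctx {
	bool	fs_corrupt;
};

static bool
demo_is_corruption(int error)
{
	return error == -EFSCORRUPTED || error == -EFSBADCRC;
}

/*
 * Check of the block being scrubbed.  op_result stands in for the return
 * value of something like xfs_btree_check_block().  Corruption marks this
 * block bad and is replaced with a return of 0.
 */
static int
demo_check_block(struct demo_ctx *ctx, int op_result, bool *bad)
{
	assert(ctx != NULL && bad != NULL);

	if (demo_is_corruption(op_result)) {
		ctx->fs_corrupt = true;
		*bad = true;
		return 0;
	}
	return op_result;	/* 0, or an operational error to propagate */
}

/*
 * Sibling check.  Corruption is noted on the context, but the block being
 * scrubbed is not blamed, so there is no 'bad' out-parameter here.
 */
static int
demo_check_sibling(struct demo_ctx *ctx, int op_result)
{
	assert(ctx != NULL);

	if (demo_is_corruption(op_result)) {
		ctx->fs_corrupt = true;
		return 0;
	}
	return op_result;
}
```

With this shape a caller only ever sees a nonzero return for things like
-ENOMEM, and decides whether to skip the current block by looking at
'bad' rather than at the error code.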

Brian

> (Maybe I'm missing something?)
> 
> --D
> 
> > 
> > Brian
> > 
> > > +	/* Load the root of the btree. */
> > > +	level = cur->bc_nlevels - 1;
> > > +	cur->bc_ops->init_ptr_from_cur(cur, &ptr);
> > > +	error = xfs_scrub_btree_block(&bs, level, &ptr, &block, &bp);
> > > +	if (!xfs_scrub_btree_op_ok(sc, cur, cur->bc_nlevels - 1, &error))
> > > +		goto out;
> > > +
> > > +	cur->bc_ptrs[level] = 1;
> > > +
> > > +	while (level < cur->bc_nlevels) {
> > > +		block = xfs_btree_get_block(cur, level, &bp);
> > > +
> > > +		if (level == 0) {
> > > +			/* End of leaf, pop back towards the root. */
> > > +			if (cur->bc_ptrs[level] >
> > > +			    be16_to_cpu(block->bb_numrecs)) {
> > > +				if (level < cur->bc_nlevels - 1)
> > > +					cur->bc_ptrs[level + 1]++;
> > > +				level++;
> > > +				continue;
> > > +			}
> > > +
> > > +			if (xfs_scrub_should_terminate(&error))
> > > +				break;
> > > +
> > > +			cur->bc_ptrs[level]++;
> > > +			continue;
> > > +		}
> > > +
> > > +		/* End of node, pop back towards the root. */
> > > +		if (cur->bc_ptrs[level] > be16_to_cpu(block->bb_numrecs)) {
> > > +			if (level < cur->bc_nlevels - 1)
> > > +				cur->bc_ptrs[level + 1]++;
> > > +			level++;
> > > +			continue;
> > > +		}
> > > +
> > > +		/* Drill another level deeper. */
> > > +		pp = xfs_btree_ptr_addr(cur, cur->bc_ptrs[level], block);
> > > +		error = xfs_scrub_btree_ptr(&bs, level, pp);
> > > +		if (error) {
> > > +			error = 0;
> > > +			cur->bc_ptrs[level]++;
> > > +			continue;
> > > +		}
> > > +		level--;
> > > +		error = xfs_scrub_btree_block(&bs, level, pp, &block, &bp);
> > > +		if (!xfs_scrub_btree_op_ok(sc, cur, level, &error))
> > > +			goto out;
> > > +
> > > +		cur->bc_ptrs[level] = 1;
> > > +	}
> > > +
> > > +out:
> > > +	return error;
> > >  }
> > > diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> > > index e1bb14b..9920488 100644
> > > --- a/fs/xfs/scrub/common.h
> > > +++ b/fs/xfs/scrub/common.h
> > > @@ -20,6 +20,19 @@
> > >  #ifndef __XFS_SCRUB_COMMON_H__
> > >  #define __XFS_SCRUB_COMMON_H__
> > >  
> > > +/* Should we end the scrub early? */
> > > +static inline bool
> > > +xfs_scrub_should_terminate(
> > > +	int		*error)
> > > +{
> > > +	if (fatal_signal_pending(current)) {
> > > +		if (*error == 0)
> > > +			*error = -EAGAIN;
> > > +		return true;
> > > +	}
> > > +	return false;
> > > +}
> > > +
> > >  /*
> > >   * Grab a transaction.  If we're going to repair something, we need to
> > >   * ensure there's enough reservation to make all the changes.  If not,
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 08/27] xfs: scrub the shape of a metadata btree
  2017-09-22 19:13       ` Brian Foster
@ 2017-09-22 20:14         ` Darrick J. Wong
  2017-09-22 21:15           ` Brian Foster
  0 siblings, 1 reply; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-22 20:14 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Sep 22, 2017 at 03:13:20PM -0400, Brian Foster wrote:
> On Fri, Sep 22, 2017 at 10:22:07AM -0700, Darrick J. Wong wrote:
> > On Fri, Sep 22, 2017 at 11:22:33AM -0400, Brian Foster wrote:
> > > On Wed, Sep 20, 2017 at 05:18:26PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Create a function that can check the shape of a btree -- each block
> > > > passes basic inspection and all the pointers look ok.  In the next patch
> > > > we'll add the ability to check the actual keys and records stored within
> > > > the btree.  Add some helper functions so that we report detailed scrub
> > > > errors in a uniform manner in dmesg.  These are helper functions for
> > > > subsequent patches.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_btree.c |   16 +++
> > > >  fs/xfs/libxfs/xfs_btree.h |    7 +
> > > >  fs/xfs/scrub/btree.c      |  236 +++++++++++++++++++++++++++++++++++++++++++++
> > > >  fs/xfs/scrub/common.h     |   13 ++
> > > >  4 files changed, 268 insertions(+), 4 deletions(-)
> > > > 
> > > > 
> > > ...
> > > > diff --git a/fs/xfs/scrub/btree.c b/fs/xfs/scrub/btree.c
> > > > index adf5d09..a9c2bf3 100644
> > > > --- a/fs/xfs/scrub/btree.c
> > > > +++ b/fs/xfs/scrub/btree.c
> > > ...
> > > > @@ -109,6 +255,92 @@ xfs_scrub_btree(
> > > >  	struct xfs_owner_info		*oinfo,
> > > >  	void				*private)
> > > >  {
> > > > -	xfs_scrub_btree_op_ok(sc, cur, 0, false);
> > > > -	return -EOPNOTSUPP;
> > > > +	struct xfs_scrub_btree		bs = {0};
> > > > +	union xfs_btree_ptr		ptr;
> > > > +	union xfs_btree_ptr		*pp;
> > > > +	struct xfs_btree_block		*block;
> > > > +	int				level;
> > > > +	struct xfs_buf			*bp;
> > > > +	int				i;
> > > > +	int				error = 0;
> > > > +
> > > > +	/* Initialize scrub state */
> > > > +	bs.cur = cur;
> > > > +	bs.scrub_rec = scrub_fn;
> > > > +	bs.oinfo = oinfo;
> > > > +	bs.firstrec = true;
> > > > +	bs.private = private;
> > > > +	bs.sc = sc;
> > > > +	for (i = 0; i < XFS_BTREE_MAXLEVELS; i++)
> > > > +		bs.firstkey[i] = true;
> > > > +	INIT_LIST_HEAD(&bs.to_check);
> > > > +
> > > > +	/* Don't try to check a tree with a height we can't handle. */
> > > > +	if (!xfs_scrub_btree_check_ok(sc, cur, 0, cur->bc_nlevels > 0 &&
> > > > +			cur->bc_nlevels <= XFS_BTREE_MAXLEVELS))
> > > > +		goto out;
> > > > +
> > > > +	/* Make sure the root isn't in the superblock. */
> > > > +	if (!(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)) {
> > > > +		cur->bc_ops->init_ptr_from_cur(cur, &ptr);
> > > > +		error = xfs_scrub_btree_ptr(&bs, cur->bc_nlevels, &ptr);
> > > > +		if (!xfs_scrub_btree_op_ok(sc, cur, cur->bc_nlevels - 1, &error))
> > > > +			goto out;
> > > > +	}
> > > > +
> > > 
> > > This is kind of in line with Dave's comments on the previous patch that
> > > introduce some of these helpers. I just glanced over them for now
> > > because I didn't have enough context to grok the error processing.
> > 
> > (That's been a struggle with this patchset -- some of these helpers
> > don't get used until much further in the patchset.  I could have
> > sprinkled them into whichever patch uses them first, but now the hunks
> > are all over the series, I'd have to do more dependency tracking to make
> > sure bisect continues to work, and the frequency of auto-merge failures
> > as I push and pop the stack increase dramatically.)
> > 
> > > FWIW, the btree_op_ok()/btree_check_ok() stuff kind of makes my eyes
> > > cross a bit because I can't easily see the logic checks or distinguish
> > > between those and error code checks. This is also a bit confusing
> > > because it looks like we overload return codes for various things. E.g.,
> > > we generate -EFSCORRUPTED in some cases just so the caller can set the
> > > state on the context and clear it, but then we still use the fact that
> > > an error _was_ set to control the flow of the task via the op_ok()
> > > return value. This makes some of the code flow/decision making logic
> > > hard to follow, particularly since some of that state looks like it can
> > > be lost.
> > > 
> > > Case in point.. what happens if say xfs_btree_increment() returns
> > > -EFSCORRUPTED back to xfs_scrub_btree_block_check_sibling()? It looks to
> > > me that the latter calls btree_op_ok() to set the corrupt state, clears
> > > the error code and skips out appropriately.
> > > xfs_scrub_btree_block_check_sibling() now returns zero, which
> > > potentially bubbles up to xfs_scrub_btree() where we check the error
> > > code again. Is it expected that error == 0 here? What is supposed to
> > > happen here?
> > 
> > Yes, error == 0 is intended here.  Given a block B, we want to check
> > that B->rightsib->leftsib (if the sibling exists at all) point back to
> > B.  If the btree_increment operation returns EFSCORRUPTED we don't know
> > if that's because the B->rightsib points at something it's not supposed
> > to, or if B->rightsib points at a btree block, but that sibling block is
> > corrupt.  Therefore we set the corrupt flag and bubble error == 0 up the
> > call stack so that we can check the other records in the btree.   This
> > enables those following with ftrace to see everything that scrub thinks
> > is wrong with that piece of metadata.
> > 
> > IOWs, we only use error code returns for "runtime error, abort the whole
> > operation immediately".
> > 
> 
> Ok, so we intentionally have to consume the error here because it
> doesn't necessarily reflect the corrupted state of the scrubbed block.

Not quite -- while we're examining blocks or otherwise operating on the
btree we've been told to scrub, we always want to consume an
EFSCORRUPTED error (and set the CORRUPT flag) because we always want to
try to check everything, even if we find problems midway through a scan.

This is similar to how gcc will keep processing past the first error to
try to report everything that's wrong in the source file instead of
bailing out at the first error like it used to do.

> So IIUC, overloading return codes as such means error handling must
> either return -EFSCORRUPTED for the current object being corrupted,

We /never/ return EFSCORRUPTED to userspace, because we have the
OFLAG_CORRUPT flag to indicate any kind of corruption anywhere in this
data structure we're checking.

Let's say that a 2-level btree looks like this:

           B0
            |
+-----+-----+--------+-----+----------------+-------------------+
|     |     |        |     |                |                   |
|     |     |        V     |                |                   |
|     |     |  someblock   |                |                   |
|     |     |              |                |                   |
V     V     V              V                V                   V
B1 -> B2 -> B3 ----> B4 -> B5 (badmagic) -> B6 (bad records) -> B7

Here we have a bad pointer in B0 that should point to B4 but now points
to something that was never part of the btree.  In B5 we have a bad
magic number, and in B6 we have some incorrect records.

First we visit B0.  Nothing obviously wrong there, so we proceed with
the depth-first search of the btree.  We examine B1 and B2 via pointer[1] and
pointer[2], respectively, and find nothing wrong.  Now we try
pointer[3], which we follow to B3.

Then we get to B3's sibling pointer check.  Leftsib is ok, but when we
move on to checking rightsib, xfs_btree_increment() returns EFSBADCRC
because pointer[4] in B0 points to a block that isn't in the btree.
Here we want to consume the EFSBADCRC, so we set OFLAG_CORRUPT and
continue walking the tree.

Next we try to walk pointer[4] in B0 and again hit an EFSBADCRC error.
Again we set OFLAG_CORRUPT and continue walking the tree.

Then we try to walk pointer[5] in B0 and encounter B5.  The CRC matches,
but the magic number is wrong, so we hit EFSCORRUPTED.  The block is
toast, but we still need to keep walking.

Now we walk pointer[6] and encounter B6.  We encounter no operational
errors but then we see some incorrect records so we set OFLAG_CORRUPT
(it's still set) and continue.

Finally we get to pointer[7], where everything is fine again.  If we
haven't encountered any operational problems like ENOMEM then we'll
return to userspace with OFLAG_CORRUPT set and a return value of zero.

The ftrace buffer will have a report about the operational error trying
to walk down pointer[4], another one about B5, and record check failures
from B6.

(If instead we run out of memory checking B7 then we'll return the
ENOMEM to userspace.)
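
The walk above can be modelled in a few lines of userspace C.  This is
only a toy, not the scrub code: demo_child, demo_result, demo_filter()
and demo_walk() are invented for illustration, each child just records
what would happen when we follow that pointer out of B0, and the problem
counter is an artifact of the model rather than a count of ftrace
records.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* XFS defines these as EUCLEAN and EBADMSG; mirrored here for userspace. */
#ifndef EUCLEAN
#define EUCLEAN		117	/* Linux value, fallback only */
#endif
#define EFSCORRUPTED	EUCLEAN
#define EFSBADCRC	EBADMSG

/* One pointer in B0 and what happens when we follow it (toy model). */
struct demo_child {
	int	load_error;	/* result of reading the child block */
	int	sibling_error;	/* result of the rightsib check */
	bool	records_ok;	/* do the records pass inspection? */
};

struct demo_result {
	bool	corrupt;	/* stands in for OFLAG_CORRUPT */
	int	nr_problems;	/* how many times we flagged something */
};

/* Returns 0 if fine, 1 if a corruption error was consumed, <0 to abort. */
static int
demo_filter(int error, struct demo_result *res)
{
	if (error == 0)
		return 0;
	if (error == -EFSCORRUPTED || error == -EFSBADCRC) {
		res->corrupt = true;
		res->nr_problems++;
		return 1;
	}
	return error;
}

static int
demo_walk(const struct demo_child *children, int nr, struct demo_result *res)
{
	int	i, ret;

	assert(children != NULL && res != NULL);

	for (i = 0; i < nr; i++) {
		/* Can't read the block: note it, move to the next pointer. */
		ret = demo_filter(children[i].load_error, res);
		if (ret < 0)
			return ret;
		if (ret > 0)
			continue;

		/* In this model a bad sibling doesn't stop the record checks. */
		ret = demo_filter(children[i].sibling_error, res);
		if (ret < 0)
			return ret;

		if (!children[i].records_ok) {
			res->corrupt = true;
			res->nr_problems++;
		}
	}
	return 0;
}

/* pointer[1]..pointer[7] of B0 from the example above. */
static const struct demo_child demo_tree[7] = {
	{ 0, 0, true },			/* B1 */
	{ 0, 0, true },			/* B2 */
	{ 0, -EFSBADCRC, true },	/* B3: rightsib check fails */
	{ -EFSBADCRC, 0, true },	/* pointer[4]: points outside the btree */
	{ -EFSCORRUPTED, 0, true },	/* B5: bad magic */
	{ 0, 0, false },		/* B6: bad records */
	{ 0, 0, true },			/* B7 */
};
```

Calling demo_walk(demo_tree, 7, &res) visits all seven pointers and comes
back with zero and the corrupt flag set.  Changing the last entry's
load_error to -ENOMEM makes the same call return -ENOMEM, with the flag
still set from the earlier findings.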

> return some other error for an error in the infrastructure, or clear any
> -EFSCORRUPTED error generated by checks that don't necessarily mean the
> current object is corrupted (or too much so to interrupt processing).
> 
> > > I'm wondering if this could all be made more clear by trying to
> > > explicitly separate out operational errors, scrub failures and whatever
> > > we want to call the logic that clears an -EFSCORRUPTED/-EFSBADCRC error
> > > code but still indicates something happened. :P
> > > 
> > > For starters, rather than wrap every logic check with btree_op_check(),
> > > could we use explicit logic and let each function update the context
> > > based on problems it found? For example, something like the following is
> > > much more easy to read for me than the associated logic above:
> > > 
> > > 	/* Don't try to check a tree with a height we can't handle. */
> > > 	if (!(cur->bc_nlevels > 0 &&
> > > 	      cur->bc_nlevels <= XFS_BTREE_MAXLEVELS)) {
> > > 		xfs_scrub_sc_corrupt(...);
> > > 		goto out;
> > > 	}
> > > 
> > > And of course the context update calls could be factored into an
> > > out_corrupt label or something where appropriate.
> > 
> > Yes, that could be done.
> > 
> > > Beyond that, where we need to identify a bit of metadata is busted to
> > > perhaps do something like skip it but not abort (as we may have filtered
> > > out an -EFSCORRUPTED) return code, could we pass a flag down a
> > > particular callchain (i.e., think 'bool *bad' or 'int *stat' a la the
> > > core btree code)? Then we can still transfer that state back up the
> > > chain and the caller(s) can distinguish operational errors from "this
> > > thing is corrupted, act accordingly," regardless of how the corruption
> > > was detected.
> > 
> > So far I haven't needed to distinguish between "no problems encountered"
> > and "this callchain hit a verifier error so we just set _CORRUPT" --
> > scrub always keeps going until it runs out of things to check.
> > 
> 
> Ok. This is partly speculation on the above (trying to wrap my head
> around the error consumption bits as is) and partly to try and see if we
> can make the flow more readable.
> 
> In my mind, this is more clear if return codes are reserved for
> operational/infrastructure errors

Yes, this is true.

> and the corrupted state of a piece of metadata is its own state.

Also true -- this is OFLAG_CORRUPT.

> Using the example above, any -EFSCORRUPTED errors from external calls
> (xfs_btree_check_block(), xfs_btree_increment(), etc.) would always be
> cleared and replaced with a return 0.

<nod>

> The difference between those is the former (check_block()) error sets
> a 'bad = true' state on the currently scrubbed bit of metadata and the
> latter (check_sibling()) does not.

Nothing in the scrub code needs to track badness at that fine-grained
a level.  When we get to the repair patches you'll see that any kind of
error triggers a complete rebuild of the btree index, with absolutely no
attempt to touch the existing (inconsistent) btree.

We /could/ return to userspace as soon as we hit the first EFSCORRUPTED
or failed check, TBH.

> The latter can of course still set the global corrupted state on the
> context to track that there is an inconsistency in the fs. Thoughts?

I've wondered if it might be clearer if we did something like:

int error;
bool bailout;

error = xfs_btree_increment(...);
bailout = xfs_scrub_op_error(..., &error);
if (bailout)
	return error;

if (ptr->field == BADVAL) {
	xfs_scrub_corrupt(...);
	return error;
}
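
Filled in enough to compile in userspace, that might look roughly like
the following.  Everything prefixed demo_ is a placeholder rather than a
real scrub helper, op_result stands in for the return value of the btree
operation, and field_ok stands in for the "ptr->field == BADVAL" test.
The assumption baked in here is that the helper asks for a bailout on any
failed operation but zeroes the error when the failure was corruption, so
the caller returns 0 and the walk carries on elsewhere.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* XFS defines these as EUCLEAN and EBADMSG; mirrored here for userspace. */
#ifndef EUCLEAN
#define EUCLEAN		117	/* Linux value, fallback only */
#endif
#define EFSCORRUPTED	EUCLEAN
#define EFSBADCRC	EBADMSG

#define DEMO_OFLAG_CORRUPT	(1u << 0)	/* placeholder flag bit */

struct demo_scrub_context {
	unsigned int	flags;
};

/*
 * Returns true if the operation failed and the caller should return *error.
 * Corruption-class errors are consumed: the flag is set and *error becomes
 * 0.  Anything else (ENOMEM, EIO, ...) is left in *error to abort the scrub.
 */
static bool
demo_scrub_op_error(struct demo_scrub_context *sc, int *error)
{
	assert(sc != NULL && error != NULL);

	switch (*error) {
	case 0:
		return false;
	case -EFSCORRUPTED:
	case -EFSBADCRC:
		sc->flags |= DEMO_OFLAG_CORRUPT;
		*error = 0;
		return true;
	default:
		return true;
	}
}

/* A check written in the style sketched above. */
static int
demo_check(struct demo_scrub_context *sc, int op_result, bool field_ok)
{
	int	error = op_result;
	bool	bailout;

	bailout = demo_scrub_op_error(sc, &error);
	if (bailout)
		return error;

	/* Plain logic check, kept visibly separate from error handling. */
	if (!field_ok) {
		sc->flags |= DEMO_OFLAG_CORRUPT;
		return error;
	}
	return 0;
}
```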

<shrug>

--D

> 
> Brian
> 
> > (Maybe I'm missing something?)
> > 
> > --D
> > 
> > > 
> > > Brian
> > > 
> > > > +	/* Load the root of the btree. */
> > > > +	level = cur->bc_nlevels - 1;
> > > > +	cur->bc_ops->init_ptr_from_cur(cur, &ptr);
> > > > +	error = xfs_scrub_btree_block(&bs, level, &ptr, &block, &bp);
> > > > +	if (!xfs_scrub_btree_op_ok(sc, cur, cur->bc_nlevels - 1, &error))
> > > > +		goto out;
> > > > +
> > > > +	cur->bc_ptrs[level] = 1;
> > > > +
> > > > +	while (level < cur->bc_nlevels) {
> > > > +		block = xfs_btree_get_block(cur, level, &bp);
> > > > +
> > > > +		if (level == 0) {
> > > > +			/* End of leaf, pop back towards the root. */
> > > > +			if (cur->bc_ptrs[level] >
> > > > +			    be16_to_cpu(block->bb_numrecs)) {
> > > > +				if (level < cur->bc_nlevels - 1)
> > > > +					cur->bc_ptrs[level + 1]++;
> > > > +				level++;
> > > > +				continue;
> > > > +			}
> > > > +
> > > > +			if (xfs_scrub_should_terminate(&error))
> > > > +				break;
> > > > +
> > > > +			cur->bc_ptrs[level]++;
> > > > +			continue;
> > > > +		}
> > > > +
> > > > +		/* End of node, pop back towards the root. */
> > > > +		if (cur->bc_ptrs[level] > be16_to_cpu(block->bb_numrecs)) {
> > > > +			if (level < cur->bc_nlevels - 1)
> > > > +				cur->bc_ptrs[level + 1]++;
> > > > +			level++;
> > > > +			continue;
> > > > +		}
> > > > +
> > > > +		/* Drill another level deeper. */
> > > > +		pp = xfs_btree_ptr_addr(cur, cur->bc_ptrs[level], block);
> > > > +		error = xfs_scrub_btree_ptr(&bs, level, pp);
> > > > +		if (error) {
> > > > +			error = 0;
> > > > +			cur->bc_ptrs[level]++;
> > > > +			continue;
> > > > +		}
> > > > +		level--;
> > > > +		error = xfs_scrub_btree_block(&bs, level, pp, &block, &bp);
> > > > +		if (!xfs_scrub_btree_op_ok(sc, cur, level, &error))
> > > > +			goto out;
> > > > +
> > > > +		cur->bc_ptrs[level] = 1;
> > > > +	}
> > > > +
> > > > +out:
> > > > +	return error;
> > > >  }
> > > > diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> > > > index e1bb14b..9920488 100644
> > > > --- a/fs/xfs/scrub/common.h
> > > > +++ b/fs/xfs/scrub/common.h
> > > > @@ -20,6 +20,19 @@
> > > >  #ifndef __XFS_SCRUB_COMMON_H__
> > > >  #define __XFS_SCRUB_COMMON_H__
> > > >  
> > > > +/* Should we end the scrub early? */
> > > > +static inline bool
> > > > +xfs_scrub_should_terminate(
> > > > +	int		*error)
> > > > +{
> > > > +	if (fatal_signal_pending(current)) {
> > > > +		if (*error == 0)
> > > > +			*error = -EAGAIN;
> > > > +		return true;
> > > > +	}
> > > > +	return false;
> > > > +}
> > > > +
> > > >  /*
> > > >   * Grab a transaction.  If we're going to repair something, we need to
> > > >   * ensure there's enough reservation to make all the changes.  If not,
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 08/27] xfs: scrub the shape of a metadata btree
  2017-09-22 20:14         ` Darrick J. Wong
@ 2017-09-22 21:15           ` Brian Foster
  0 siblings, 0 replies; 51+ messages in thread
From: Brian Foster @ 2017-09-22 21:15 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Fri, Sep 22, 2017 at 01:14:39PM -0700, Darrick J. Wong wrote:
> On Fri, Sep 22, 2017 at 03:13:20PM -0400, Brian Foster wrote:
> > On Fri, Sep 22, 2017 at 10:22:07AM -0700, Darrick J. Wong wrote:
> > > On Fri, Sep 22, 2017 at 11:22:33AM -0400, Brian Foster wrote:
> > > > On Wed, Sep 20, 2017 at 05:18:26PM -0700, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > 
> > > > > Create a function that can check the shape of a btree -- each block
> > > > > passes basic inspection and all the pointers look ok.  In the next patch
> > > > > we'll add the ability to check the actual keys and records stored within
> > > > > the btree.  Add some helper functions so that we report detailed scrub
> > > > > errors in a uniform manner in dmesg.  These are helper functions for
> > > > > subsequent patches.
> > > > > 
> > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > ---
> > > > >  fs/xfs/libxfs/xfs_btree.c |   16 +++
> > > > >  fs/xfs/libxfs/xfs_btree.h |    7 +
> > > > >  fs/xfs/scrub/btree.c      |  236 +++++++++++++++++++++++++++++++++++++++++++++
> > > > >  fs/xfs/scrub/common.h     |   13 ++
> > > > >  4 files changed, 268 insertions(+), 4 deletions(-)
> > > > > 
> > > > > 
> > > > ...
> > > > > diff --git a/fs/xfs/scrub/btree.c b/fs/xfs/scrub/btree.c
> > > > > index adf5d09..a9c2bf3 100644
> > > > > --- a/fs/xfs/scrub/btree.c
> > > > > +++ b/fs/xfs/scrub/btree.c
> > > > ...
> > > > > @@ -109,6 +255,92 @@ xfs_scrub_btree(
> > > > >  	struct xfs_owner_info		*oinfo,
> > > > >  	void				*private)
> > > > >  {
> > > > > -	xfs_scrub_btree_op_ok(sc, cur, 0, false);
> > > > > -	return -EOPNOTSUPP;
> > > > > +	struct xfs_scrub_btree		bs = {0};
> > > > > +	union xfs_btree_ptr		ptr;
> > > > > +	union xfs_btree_ptr		*pp;
> > > > > +	struct xfs_btree_block		*block;
> > > > > +	int				level;
> > > > > +	struct xfs_buf			*bp;
> > > > > +	int				i;
> > > > > +	int				error = 0;
> > > > > +
> > > > > +	/* Initialize scrub state */
> > > > > +	bs.cur = cur;
> > > > > +	bs.scrub_rec = scrub_fn;
> > > > > +	bs.oinfo = oinfo;
> > > > > +	bs.firstrec = true;
> > > > > +	bs.private = private;
> > > > > +	bs.sc = sc;
> > > > > +	for (i = 0; i < XFS_BTREE_MAXLEVELS; i++)
> > > > > +		bs.firstkey[i] = true;
> > > > > +	INIT_LIST_HEAD(&bs.to_check);
> > > > > +
> > > > > +	/* Don't try to check a tree with a height we can't handle. */
> > > > > +	if (!xfs_scrub_btree_check_ok(sc, cur, 0, cur->bc_nlevels > 0 &&
> > > > > +			cur->bc_nlevels <= XFS_BTREE_MAXLEVELS))
> > > > > +		goto out;
> > > > > +
> > > > > +	/* Make sure the root isn't in the superblock. */
> > > > > +	if (!(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)) {
> > > > > +		cur->bc_ops->init_ptr_from_cur(cur, &ptr);
> > > > > +		error = xfs_scrub_btree_ptr(&bs, cur->bc_nlevels, &ptr);
> > > > > +		if (!xfs_scrub_btree_op_ok(sc, cur, cur->bc_nlevels - 1, &error))
> > > > > +			goto out;
> > > > > +	}
> > > > > +
> > > > 
> > > > This is kind of in line with Dave's comments on the previous patch that
> > > > introduce some of these helpers. I just glanced over them for now
> > > > because I didn't have enough context to grok the error processing.
> > > 
> > > (That's been a struggle with this patchset -- some of these helpers
> > > don't get used until much further in the patchset.  I could have
> > > sprinkled them into whichever patch uses them first, but now the hunks
> > > are all over the series, I'd have to do more dependency tracking to make
> > > sure bisect continues to work, and the frequency of auto-merge failures
> > > as I push and pop the stack increase dramatically.)
> > > 
> > > > FWIW, the btree_op_ok()/btree_check_ok() stuff kind of makes my eyes
> > > > cross a bit because I can't easily see the logic checks or distinguish
> > > > between those and error code checks. This is also a bit confusing
> > > > because it looks like we overload return codes for various things. E.g.,
> > > > we generate -EFSCORRUPTED in some cases just so the caller can set the
> > > > state on the context and clear it, but then we still use the fact that
> > > > an error _was_ set to control the flow of the task via the op_ok()
> > > > return value. This makes some of the code flow/decision making logic
> > > > hard to follow, particularly since some of that state looks like it can
> > > > be lost.
> > > > 
> > > > Case in point.. what happens if say xfs_btree_increment() returns
> > > > -EFSCORRUPTED back to xfs_scrub_btree_block_check_sibling()? It looks to
> > > > me that the latter calls btree_op_ok() to set the corrupt state, clears
> > > > the error code and skips out appropriately.
> > > > xfs_scrub_btree_block_check_sibling() now returns zero, which
> > > > potentially bubbles up to xfs_scrub_btree() where we check the error
> > > > code again. Is it expected that error == 0 here? What is supposed to
> > > > happen here?
> > > 
> > > Yes, error == 0 is intended here.  Given a block B, we want to check
> > > that B->rightsib->leftsib (if the sibling exists at all) point back to
> > > B.  If the btree_increment operation returns EFSCORRUPTED we don't know
> > > if that's because the B->rightsib points at something it's not supposed
> > > to, or if B->rightsib points at a btree block, but that sibling block is
> > > corrupt.  Therefore we set the corrupt flag and bubble error == 0 up the
> > > call stack so that we can check the other records in the btree.   This
> > > enables those following with ftrace to see everything that scrub thinks
> > > is wrong with that piece of metadata.
> > > 
> > > IOWs, we only use error code returns for "runtime error, abort the whole
> > > operation immediately".
> > > 
> > 
> > Ok, so we intentionally have to consume the error here because it
> > doesn't necessarily reflect the corrupted state of the scrubbed block.
> 
> Not quite -- while we're examining blocks or otherwise operating on the
> btree we've been told to scrub, we always want to consume an
> EFSCORRUPTED error (and set the CORRUPT flag) because we always want to
> try to check everything, even if we find problems midway through a scan.
> 

Then why don't we consume -EFSCORRUPTED errors from the call to
xfs_btree_check_block() in xfs_scrub_btree_block(), but leave it for the
caller? Note that the point I'm trying to get at is not to say that we
sometimes don't consume -EFSCORRUPTED errors at all, but rather we
consume them in different places to implicitly affect the code flow.

In the xfs_btree_check_block() case, my understanding is that we have to
leave that one for the caller to consume rather than consume it
immediately like we do in the check_sibling() code. If my understanding
is not correct, what's the difference between those two sites with
regard to consuming -EFSCORRUPTED immediately in one and not the other?

> This is similar to how gcc will keep processing past the first error to
> try to report everything that's wrong in the source file instead of
> bailing out at the first error like it used to do.
> 
> > So IIUC, overloading return codes as such means error handling must
> > either return -EFSCORRUPTED for the current object being corrupted,
> 
> We /never/ return EFSCORRUPTED to userspace, because we have the
> OFLAG_CORRUPT flag to indicate any kind of corruption anywhere in this
> data structure we're checking.
> 

Understood, I'm not suggesting we'd return -EFSCORRUPTED to userspace.

> Let's say that a 2-level btree looks like this:
> 
>            B0
>             |
> +-----+-----+--------+-----+----------------+-------------------+
> |     |     |        |     |                |                   |
> |     |     |        V     |                |                   |
> |     |     |  someblock   |                |                   |
> |     |     |              |                |                   |
> V     V     V              V                V                   V
> B1 -> B2 -> B3 ----> B4 -> B5 (badmagic) -> B6 (bad records) -> B7
> 
> Here we have a bad pointer in B0 that should point to B4 but now points
> to something that was never part of B.  In B6 we have some incorrect
> records, and in B5 we have a bad magic number.
> 
> First we visit B0.  Nothing obviously wrong there, so we proceed with
> the depth-first search of B.  We examine B1 and B2 via pointer[1] and
> pointer[2], respectively, and find nothing wrong.  Now we try
> pointer[3], which we follow to B3.
> 
> Then we get to B3's sibling pointer check.  Leftsib is ok, but when we
> move on to checking rightsib, the xfs_btree_increment returns EFSBADCRC
> because the pointer[4] in B0 points to a block that isn't in the btree.
> Here we want to consume the EFSBADCRC, so we set OFLAG_CORRUPT and
> continue walking the tree.
> 
> Next we try to walk pointer[4] in B0 and again hit a EFSBADCRC error.
> Again we set OFLAG_CORRUPT and continue walking the tree.
> 
> Then we try to walk pointer[5] in B0 and encounter B5.  The CRC matches,
> but the magic number is wrong, so we hit EFSCORRUPTED.  The block is
> toast, but we still need to keep walking.
> 
> Now we walk pointer[6] and encounter B6.  We encounter no operational
> errors but then we see some incorrect records so we set OFLAG_CORRUPT
> (it's still set) and continue.
> 
> Finally we get to pointer[7], where everything is fine again.  If we
> haven't encountered any operational problems like ENOMEM then we'll
> return to userspace with OFLAG_CORRUPT set and a return value of zero.
> 
> The ftrace buffer will have a report about the operational error trying
> to walk down pointer[4], another one about B5, and record check failures
> from B6.
> 
> (If instead we run out of memory checking B7 then we'll return the
> ENOMEM to userspace.)
> 

Got it, thanks. Note that I'm not suggesting that we deviate from this
behavior. It's just a refactoring such that -EFSCORRUPTED is always
consumed immediately from any external function that could generate it
and is translated into the local scrub state (i.e., corrupted = true
and/or OFLAG_CORRUPT, where the latter is a subset of the former).
Basically, think of it as instead of never returning -EFSCORRUPTED to
userspace, it should not be returned by any scrub function, ever.
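
As a rough illustration of that rule, here is a minimal userspace sketch
(not kernel code) of a walk in which every corruption error from an
external call is folded into local scrub state at the call site, and only
operational errors such as -ENOMEM propagate. All names here
(model_scrub, model_check_block, model_scrub_walk, MODEL_EFSCORRUPTED) are
hypothetical stand-ins, and the per-block results array is an assumption
made so the example can run on its own.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's EFSCORRUPTED; the value is an assumption. */
#define MODEL_EFSCORRUPTED	117

/* Stand-in for the scrub context; only corruption state is modelled. */
struct model_scrub {
	bool	corrupt;	/* plays the role of OFLAG_CORRUPT */
	int	nr_bad;		/* number of blocks that failed a check */
};

/* Pretend external btree call: returns 0 or a negative errno per block. */
static int model_check_block(const int *results, size_t i)
{
	return results[i];
}

/*
 * Walk every block.  Corruption errors are consumed immediately and
 * recorded in the scrub state; any other error aborts the walk.
 */
static int model_scrub_walk(struct model_scrub *sc, const int *results,
			    size_t nr)
{
	size_t	i;
	int	error;

	assert(sc != NULL);
	for (i = 0; i < nr; i++) {
		error = model_check_block(results, i);
		if (error == -MODEL_EFSCORRUPTED) {
			sc->corrupt = true;
			sc->nr_bad++;
			continue;	/* keep checking the other blocks */
		}
		if (error)
			return error;	/* operational error: bail out */
	}
	return 0;
}
```

With this shape, model_scrub_walk() can never return -MODEL_EFSCORRUPTED,
so a caller only has to distinguish zero from "something in the
infrastructure broke".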

> > return some other error for an error in the infrastructure, or clear any
> > -EFSCORRUPTED error generated by checks that don't necessarily mean the
> > current object is corrupted (or too much so to interrupt processing).
> > 
> > > > I'm wondering if this could all be made more clear by trying to
> > > > explicitly separate out operational errors, scrub failures and whatever
> > > > we want to call the logic that clears an -EFSCORRUPTED/-EFSBADCRC error
> > > > code but still indicates something happened. :P
> > > > 
> > > > For starters, rather than wrap every logic check with btree_op_check(),
> > > > could we use explicit logic and let each function update the context
> > > > based on problems it found? For example, something like the following is
> > > > much easier for me to read than the associated logic above:
> > > > 
> > > > 	/* Don't try to check a tree with a height we can't handle. */
> > > > 	if (!(cur->bc_nlevels > 0 &&
> > > > 	      cur->bc_nlevels <= XFS_BTREE_MAXLEVELS)) {
> > > > 		xfs_scrub_sc_corrupt(...);
> > > > 		goto out;
> > > > 	}
> > > > 
> > > > And of course the context update calls could be factored into an
> > > > out_corrupt label or something where appropriate.
> > > 
> > > Yes, that could be done.
> > > 
> > > > Beyond that, where we need to identify that a bit of metadata is busted
> > > > so we can do something like skip it but not abort (as we may have
> > > > filtered out an -EFSCORRUPTED return code), could we pass a flag down a
> > > > particular callchain (i.e., think 'bool *bad' or 'int *stat' a la the
> > > > core btree code)? Then we can still transfer that state back up the
> > > > chain and the caller(s) can distinguish operational errors from "this
> > > > thing is corrupted, act accordingly," regardless of how the corruption
> > > > was detected.
> > > 
> > > So far I haven't needed to distinguish between "no problems encountered"
> > > and "this callchain hit a verifier error so we just set _CORRUPT" --
> > > scrub always keeps going until it runs out of things to check.
> > > 
> > 
> > Ok. This is partly speculation on the above (trying to wrap my head
> > around the error consumption bits as is) and partly to try and see if we
> > can make the flow more readable.
> > 
> > In my mind, this is more clear if return codes are reserved for
> > operational/infrastructure errors
> 
> Yes, this is true.
> 
> > and the corrupted state of a piece of metadata is its own state.
> 
> Also true -- this is OFLAG_CORRUPT.
> 
> > Using the example above, any -EFSCORRUPTED errors from external calls
> > (xfs_btree_check_block(), xfs_btree_increment(), etc.) would always be
> > cleared and replaced with a return 0.
> 
> <nod>
> 
> > The difference between those is the former (check_block()) error sets
> > a 'bad = true' state on the currently scrubbed bit of metadata and the
> > latter (check_sibling()) does not.
> 
> Nothing in the scrub code needs to track badness at that fine-grained of
> a level.  When we get to the repair patches you'll see that any kind of
> error triggers a complete rebuild of the btree index, with absolutely no
> attempt to touch the existing (inconsistent) btree.
> 

Perhaps, but that's not really the point. My argument is that it
simplifies the logic and thus makes the code more clear. :)

> We /could/ return to userspace as soon as we hit the first EFSCORRUPTED
> or failed check, TBH.
> 

That's a separate question that I haven't really thought much about yet
tbh. I suppose I could see use for a oneshot option or something for
users who may want to exit quickly in favor of offline repair rather
than wait for a thorough scan.

> > The latter can of course still set the global corrupted state on the
> > context to track that there is an inconsistency in the fs. Thoughts?
> 
> I've wondered if it might be clearer if we did something like:
> 
> int error;
> bool bailout;
> 
> error = xfs_btree_increment(...);
> bailout = xfs_scrub_op_error(..., &error);
> if (bailout)
> 	return error;
> 
> if (ptr->field == BADVAL) {
> 	xfs_scrub_corrupt(...);
> 	return error;
> }
> 

It still appears confusing to me because isn't whatever we were going to
do that depends on the xfs_btree_increment() call now bogus if the
increment itself fails, for whatever reason? When would you not bail out
of here on increment failure?

Using a simpler example of xfs_scrub_btree_block(), I'd expect it to
look something like this:

xfs_scrub_btree_block(..., bool *bad, ...)
{
	...

	error = xfs_btree_lookup_get_block(bs->cur, ...);
	if (error)
		goto out_err;

	xfs_btree_get_block(bs->cur, ...);
	error = xfs_btree_check_block(bs->cur, ...);
	if (error)
		goto out_err;

	error = xfs_scrub_btree_block_check_siblings(bs, ...);
	ASSERT(error != -EFSCORRUPTED);
	return error;

out_err:
	if (error == -EFSCORRUPTED) {
		*bad = true;
		error = 0;
	}
	return error;
}

check_siblings() doesn't need bad because it's checking the siblings. It
just sets OFLAG_CORRUPT if it finds corruption or returns a non
corruption error if one occurs. The caller sets OFLAG_CORRUPT if bad ==
true or bails if error != 0.
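
The caller side is only described in words above, so here is a minimal
userspace sketch of how the two halves might fit together. It is not the
actual scrub code: model_scrub_block, model_scrub_one, MODEL_EFSCORRUPTED
and MODEL_OFLAG_CORRUPT are hypothetical names, and the check result is
passed in as a plain int so that the example is self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_EFSCORRUPTED	117		/* stand-in value (assumption) */
#define MODEL_OFLAG_CORRUPT	(1u << 0)	/* stand-in for OFLAG_CORRUPT */

/* Callee: never returns -MODEL_EFSCORRUPTED; reports it through *bad. */
static int model_scrub_block(int check_result, bool *bad)
{
	int	error = check_result;

	if (error == -MODEL_EFSCORRUPTED) {
		*bad = true;
		error = 0;
	}
	return error;
}

/* Caller: set the flag if the block is bad, bail on any other error. */
static int model_scrub_one(unsigned int *sm_flags, int check_result)
{
	bool	bad = false;
	int	error;

	error = model_scrub_block(check_result, &bad);
	assert(error != -MODEL_EFSCORRUPTED);
	if (error)
		return error;
	if (bad)
		*sm_flags |= MODEL_OFLAG_CORRUPT;
	return 0;
}
```

The return code carries only operational errors, and the corruption state
travels separately through the out-parameter.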

Brian

> <shrug>
> 
> --D
> 
> > 
> > Brian
> > 
> > > (Maybe I'm missing something?)
> > > 
> > > --D
> > > 
> > > > 
> > > > Brian
> > > > 
> > > > > +	/* Load the root of the btree. */
> > > > > +	level = cur->bc_nlevels - 1;
> > > > > +	cur->bc_ops->init_ptr_from_cur(cur, &ptr);
> > > > > +	error = xfs_scrub_btree_block(&bs, level, &ptr, &block, &bp);
> > > > > +	if (!xfs_scrub_btree_op_ok(sc, cur, cur->bc_nlevels - 1, &error))
> > > > > +		goto out;
> > > > > +
> > > > > +	cur->bc_ptrs[level] = 1;
> > > > > +
> > > > > +	while (level < cur->bc_nlevels) {
> > > > > +		block = xfs_btree_get_block(cur, level, &bp);
> > > > > +
> > > > > +		if (level == 0) {
> > > > > +			/* End of leaf, pop back towards the root. */
> > > > > +			if (cur->bc_ptrs[level] >
> > > > > +			    be16_to_cpu(block->bb_numrecs)) {
> > > > > +				if (level < cur->bc_nlevels - 1)
> > > > > +					cur->bc_ptrs[level + 1]++;
> > > > > +				level++;
> > > > > +				continue;
> > > > > +			}
> > > > > +
> > > > > +			if (xfs_scrub_should_terminate(&error))
> > > > > +				break;
> > > > > +
> > > > > +			cur->bc_ptrs[level]++;
> > > > > +			continue;
> > > > > +		}
> > > > > +
> > > > > +		/* End of node, pop back towards the root. */
> > > > > +		if (cur->bc_ptrs[level] > be16_to_cpu(block->bb_numrecs)) {
> > > > > +			if (level < cur->bc_nlevels - 1)
> > > > > +				cur->bc_ptrs[level + 1]++;
> > > > > +			level++;
> > > > > +			continue;
> > > > > +		}
> > > > > +
> > > > > +		/* Drill another level deeper. */
> > > > > +		pp = xfs_btree_ptr_addr(cur, cur->bc_ptrs[level], block);
> > > > > +		error = xfs_scrub_btree_ptr(&bs, level, pp);
> > > > > +		if (error) {
> > > > > +			error = 0;
> > > > > +			cur->bc_ptrs[level]++;
> > > > > +			continue;
> > > > > +		}
> > > > > +		level--;
> > > > > +		error = xfs_scrub_btree_block(&bs, level, pp, &block, &bp);
> > > > > +		if (!xfs_scrub_btree_op_ok(sc, cur, level, &error))
> > > > > +			goto out;
> > > > > +
> > > > > +		cur->bc_ptrs[level] = 1;
> > > > > +	}
> > > > > +
> > > > > +out:
> > > > > +	return error;
> > > > >  }
> > > > > diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
> > > > > index e1bb14b..9920488 100644
> > > > > --- a/fs/xfs/scrub/common.h
> > > > > +++ b/fs/xfs/scrub/common.h
> > > > > @@ -20,6 +20,19 @@
> > > > >  #ifndef __XFS_SCRUB_COMMON_H__
> > > > >  #define __XFS_SCRUB_COMMON_H__
> > > > >  
> > > > > +/* Should we end the scrub early? */
> > > > > +static inline bool
> > > > > +xfs_scrub_should_terminate(
> > > > > +	int		*error)
> > > > > +{
> > > > > +	if (fatal_signal_pending(current)) {
> > > > > +		if (*error == 0)
> > > > > +			*error = -EAGAIN;
> > > > > +		return true;
> > > > > +	}
> > > > > +	return false;
> > > > > +}
> > > > > +
> > > > >  /*
> > > > >   * Grab a transaction.  If we're going to repair something, we need to
> > > > >   * ensure there's enough reservation to make all the changes.  If not,
> > > > > 
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > the body of a message to majordomo@vger.kernel.org
> > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [PATCH 06/27] xfs: create helpers to record and deal with scrub problems
  2017-09-22 16:44     ` Darrick J. Wong
@ 2017-09-23  7:22       ` Dave Chinner
  2017-09-23  7:24         ` Darrick J. Wong
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Chinner @ 2017-09-23  7:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Darrick,

Excuse the nasty top post, but can you wrap all this up in comments
in the code? It all makes sense now you explain it, but I'm not
going to remember all that in a couple of months' time...

-Dave.

On Fri, Sep 22, 2017 at 09:44:18AM -0700, Darrick J. Wong wrote:
> On Fri, Sep 22, 2017 at 05:16:08PM +1000, Dave Chinner wrote:
> > On Wed, Sep 20, 2017 at 05:18:14PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Create helper functions to record crc and corruption problems, and
> > > deal with any other runtime errors that arise.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  fs/xfs/scrub/common.c |  243 +++++++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/scrub/common.h |   39 ++++++++
> > >  fs/xfs/scrub/trace.h  |  193 +++++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 475 insertions(+)
> > > 
> > > 
> > > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > > index 13ccb36..cf3f1365 100644
> > > --- a/fs/xfs/scrub/common.c
> > > +++ b/fs/xfs/scrub/common.c
> > > @@ -47,6 +47,249 @@
> > >  
> > >  /* Common code for the metadata scrubbers. */
> > >  
> > > +/* Check for operational errors. */
> > > +bool
> > > +xfs_scrub_op_ok(
> > > +	struct xfs_scrub_context	*sc,
> > > +	xfs_agnumber_t			agno,
> > > +	xfs_agblock_t			bno,
> > > +	int				*error)
> > > +{
> > > +	switch (*error) {
> > > +	case 0:
> > > +		return true;
> > > +	case -EDEADLOCK:
> > > +		/* Used to restart an op with deadlock avoidance. */
> > > +		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
> > > +		break;
> > > +	case -EFSBADCRC:
> > > +	case -EFSCORRUPTED:
> > > +		/* Note the badness but don't abort. */
> > > +		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
> > > +		*error = 0;
> > > +		/* fall through */
> > > +	default:
> > > +		trace_xfs_scrub_op_error(sc, agno, bno, *error,
> > > +				__return_address);
> > > +		break;
> > > +	}
> > > +	return false;
> > > +}
> > 
> > What are the semantics here w.r.t. *error? on some errors it's
> > cleared before we return, on others it's ignored. It's as clear as
> > mud what we should expect from these functions...
> 
> If there's no error, we return true to tell the caller that it's ok to
> move on to the next check in its list.
> 
> For non-verifier errors (e.g. ENOMEM) we return false to tell the caller
> that there's no point in it continuing, and we preserve *error so that
> the caller can return the *error up the stack.  Checking stops
> immediately and the error is handed to userspace.
> 
> Verifier errors (EFSBADCRC/EFSCORRUPTED) are recorded in sm_flags and
> the *error is cleared.  We return false to tell the caller that there's
> no point in it continuing with this record.  The caller returns zero to its
> caller, which means that checking continues, having skipped whatever
> block failed the verifier.
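
The three outcomes described above can be modelled in userspace roughly as
follows. This is only a sketch under stated assumptions: model_op_ok and
model_check_record are hypothetical names, the MODEL_* error values are
stand-ins, and the tracepoints and the -EDEADLOCK case from the patch are
left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_EFSBADCRC		74		/* stand-in value (assumption) */
#define MODEL_EFSCORRUPTED	117		/* stand-in value (assumption) */
#define MODEL_OFLAG_CORRUPT	(1u << 0)	/* stand-in for the sm_flags bit */

/*
 * Returns true if the caller may move on to its next check.  Verifier
 * errors are recorded in *sm_flags and cleared; other errors are left
 * in *error for the caller to return.
 */
static bool model_op_ok(unsigned int *sm_flags, int *error)
{
	switch (*error) {
	case 0:
		return true;
	case -MODEL_EFSBADCRC:
	case -MODEL_EFSCORRUPTED:
		*sm_flags |= MODEL_OFLAG_CORRUPT;
		*error = 0;
		return false;
	default:
		return false;
	}
}

/* Typical caller: stop working on this record whenever op_ok says no. */
static int model_check_record(unsigned int *sm_flags, int op_result,
			      int *nr_checked)
{
	int	error = op_result;

	if (!model_op_ok(sm_flags, &error))
		return error;	/* 0 after a verifier error, else the errno */
	(*nr_checked)++;	/* the rest of the record checks would run here */
	return 0;
}
```

A zero return from model_check_record() therefore means "keep scanning",
whether or not this particular record was skipped.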
> 
> > > +/* Check for metadata block optimization possibilities. */
> > > +bool
> > > +xfs_scrub_block_preen_ok(
> > > +	struct xfs_scrub_context	*sc,
> > > +	struct xfs_buf			*bp,
> > > +	bool				fs_ok)
> > > +{
> > > +	struct xfs_mount		*mp = sc->mp;
> > > +	xfs_fsblock_t			fsbno;
> > > +	xfs_agnumber_t			agno;
> > > +	xfs_agblock_t			bno;
> > > +
> > > +	if (fs_ok)
> > > +		return fs_ok;
> > > +
> > > +	fsbno = XFS_DADDR_TO_FSB(mp, bp->b_bn);
> > > +	agno = XFS_FSB_TO_AGNO(mp, fsbno);
> > > +	bno = XFS_FSB_TO_AGBNO(mp, fsbno);
> > > +
> > > +	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_PREEN;
> > > +	trace_xfs_scrub_block_preen(sc, agno, bno, __return_address);
> > > +	return fs_ok;
> > > +}
> > 
> > Again, I'm not sure what the return value semantics of the function
> > are? Why does the fs_ok return shortcut exist?
> 
> The fs_ok functions are wrappers around an if test; the results of the
> if test are passed in as fs_ok.
> 
> Therefore, if fs_ok then things are fine and we just skip out.
> 
> Otherwise, we found something and we should set sm_flags and jump out.
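
A minimal userspace sketch of that wrapper pattern, assuming a
hypothetical check on extent length: model_preen_ok, model_check_extent
and MODEL_OFLAG_PREEN are illustrative names only, and the tracepoint
from the patch is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_OFLAG_PREEN	(1u << 1)	/* stand-in for the sm_flags bit */

/* Wrapper around an if test: fs_ok is the result of the caller's test. */
static bool model_preen_ok(unsigned int *sm_flags, bool fs_ok)
{
	if (fs_ok)
		return true;	/* nothing to report */
	*sm_flags |= MODEL_OFLAG_PREEN;
	return false;
}

/* Caller: the condition is evaluated inline and handed to the wrapper. */
static int model_check_extent(unsigned int *sm_flags, unsigned int len,
			      unsigned int ideal_len, int *nr_later_checks)
{
	if (!model_preen_ok(sm_flags, len >= ideal_len))
		return 0;	/* flag is set; skip the dependent checks */
	(*nr_later_checks)++;	/* checks that rely on the test passing */
	return 0;
}
```

The wrapper hands back its fs_ok argument unchanged, so it can sit
directly inside the caller's if statement.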
> 
> > Same for all the other functions...
> > 
> > > +
> > > +/* Check for inode metadata non-corruption problems. */
> > > +bool
> > > +xfs_scrub_ino_warn_ok(
> > > +	struct xfs_scrub_context	*sc,
> > > +	struct xfs_buf			*bp,
> > > +	bool				fs_ok)
> > 
> > Confusing. What's the difference between a corruption problem and a
> > "non-corruption problem" that requires a warning?
> 
> Anything that's less severe than "your fs is corrupt" but otherwise
> requires administrator review.  The inode scrubber sets this for inodes
> with a -1 uid/gid.  XFS seems fine with it, but the VFS treats -1ULL as
> a magic "doesn't exist" value, and then userspace can't change it.
> The quota code sets warnings if it detects quota usage exceeding the
> hard limit, or if the limits are larger than the fs, etc.
> 
> In these cases I'd want the administrator to have a look and/or take
> corrective action, but XFS doesn't flag those situations as fs
> corruption nor does xfs_repair complain about them as corruption.
> 
> --D
> 
> > 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com
> 

-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 06/27] xfs: create helpers to record and deal with scrub problems
  2017-09-23  7:22       ` Dave Chinner
@ 2017-09-23  7:24         ` Darrick J. Wong
  0 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2017-09-23  7:24 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Sat, Sep 23, 2017 at 05:22:28PM +1000, Dave Chinner wrote:
> Darrick,
> 
> Excuse the nasty top post, but can you wrap all this up in comments
> in the code? It all makes sense now you explain it, but I'm not
> going to remember all that in a couple of months' time...

Already done.

(FWIW top-posting doesn't bother me...)

--D

> 
> -Dave.
> 
> On Fri, Sep 22, 2017 at 09:44:18AM -0700, Darrick J. Wong wrote:
> > On Fri, Sep 22, 2017 at 05:16:08PM +1000, Dave Chinner wrote:
> > > On Wed, Sep 20, 2017 at 05:18:14PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Create helper functions to record crc and corruption problems, and
> > > > deal with any other runtime errors that arise.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > >  fs/xfs/scrub/common.c |  243 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  fs/xfs/scrub/common.h |   39 ++++++++
> > > >  fs/xfs/scrub/trace.h  |  193 +++++++++++++++++++++++++++++++++++++++
> > > >  3 files changed, 475 insertions(+)
> > > > 
> > > > 
> > > > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > > > index 13ccb36..cf3f1365 100644
> > > > --- a/fs/xfs/scrub/common.c
> > > > +++ b/fs/xfs/scrub/common.c
> > > > @@ -47,6 +47,249 @@
> > > >  
> > > >  /* Common code for the metadata scrubbers. */
> > > >  
> > > > +/* Check for operational errors. */
> > > > +bool
> > > > +xfs_scrub_op_ok(
> > > > +	struct xfs_scrub_context	*sc,
> > > > +	xfs_agnumber_t			agno,
> > > > +	xfs_agblock_t			bno,
> > > > +	int				*error)
> > > > +{
> > > > +	switch (*error) {
> > > > +	case 0:
> > > > +		return true;
> > > > +	case -EDEADLOCK:
> > > > +		/* Used to restart an op with deadlock avoidance. */
> > > > +		trace_xfs_scrub_deadlock_retry(sc->ip, sc->sm, *error);
> > > > +		break;
> > > > +	case -EFSBADCRC:
> > > > +	case -EFSCORRUPTED:
> > > > +		/* Note the badness but don't abort. */
> > > > +		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
> > > > +		*error = 0;
> > > > +		/* fall through */
> > > > +	default:
> > > > +		trace_xfs_scrub_op_error(sc, agno, bno, *error,
> > > > +				__return_address);
> > > > +		break;
> > > > +	}
> > > > +	return false;
> > > > +}
> > > 
> > > What are the semantics here w.r.t. *error? on some errors it's
> > > cleared before we return, on others it's ignored. It's as clear as
> > > mud what we should expect from these functions...
> > 
> > If there's no error, we return true to tell the caller that it's ok to
> > move on to the next check in its list.
> > 
> > For non-verifier errors (e.g. ENOMEM) we return false to tell the caller
> > that there's no point in it continuing, and we preserve *error so that
> > the caller can return the *error up the stack.  Checking stops
> > immediately and the error is handed to userspace.
> > 
> > Verifier errors (EFSBADCRC/EFSCORRUPTED) are recorded in sm_flags and
> > the *error is cleared.  We return false to tell the caller that there's
> > no point in it continuing with this record.  The caller returns zero to its
> > caller, which means that checking continues, having skipped whatever
> > block failed the verifier.
> > 
> > > > +/* Check for metadata block optimization possibilities. */
> > > > +bool
> > > > +xfs_scrub_block_preen_ok(
> > > > +	struct xfs_scrub_context	*sc,
> > > > +	struct xfs_buf			*bp,
> > > > +	bool				fs_ok)
> > > > +{
> > > > +	struct xfs_mount		*mp = sc->mp;
> > > > +	xfs_fsblock_t			fsbno;
> > > > +	xfs_agnumber_t			agno;
> > > > +	xfs_agblock_t			bno;
> > > > +
> > > > +	if (fs_ok)
> > > > +		return fs_ok;
> > > > +
> > > > +	fsbno = XFS_DADDR_TO_FSB(mp, bp->b_bn);
> > > > +	agno = XFS_FSB_TO_AGNO(mp, fsbno);
> > > > +	bno = XFS_FSB_TO_AGBNO(mp, fsbno);
> > > > +
> > > > +	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_PREEN;
> > > > +	trace_xfs_scrub_block_preen(sc, agno, bno, __return_address);
> > > > +	return fs_ok;
> > > > +}
> > > 
> > > Again, I'm not sure what the return value semantics of the function
> > > are? Why does the fs_ok return shortcut exist?
> > 
> > The fs_ok functions are wrappers around an if test; the results of the
> > if test are passed in as fs_ok.
> > 
> > Therefore, if fs_ok then things are fine and we just skip out.
> > 
> > Otherwise, we found something and we should set sm_flags and jump out.
> > 
> > > Same for all the other functions...
> > > 
> > > > +
> > > > +/* Check for inode metadata non-corruption problems. */
> > > > +bool
> > > > +xfs_scrub_ino_warn_ok(
> > > > +	struct xfs_scrub_context	*sc,
> > > > +	struct xfs_buf			*bp,
> > > > +	bool				fs_ok)
> > > 
> > > Confusing. What's the difference between a corruption problem and a
> > > "non-corruption problem" that requires a warning?
> > 
> > Anything that's less severe than "your fs is corrupt" but otherwise
> > requires administrator review.  The inode scrubber sets this for inodes
> > with a -1 uid/gid.  XFS seems fine with it, but the VFS treats -1ULL as
> > a magic "doesn't exist" value, and then userspace can't change it.
> > The quota code sets warnings if it detects quota usage exceeding the
> > hard limit, or if the limits are larger than the fs, etc.
> > 
> > In these cases I'd want the administrator to have a look and/or take
> > corrective action, but XFS doesn't flag those situations as fs
> > corruption nor does xfs_repair complain about them as corruption.
> > 
> > --D
> > 
> > > 
> > > Cheers,
> > > 
> > > Dave.
> > > -- 
> > > Dave Chinner
> > > david@fromorbit.com
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 
> -- 
> Dave Chinner
> david@fromorbit.com

end of thread, other threads:[~2017-09-23  7:24 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
2017-09-21  0:17 [PATCH v10 00/27] xfs: online scrub support Darrick J. Wong
2017-09-21  0:17 ` [PATCH 01/27] xfs: return a distinct error code value for IGET_INCORE cache misses Darrick J. Wong
2017-09-21 14:36   ` Brian Foster
2017-09-21  0:17 ` [PATCH 02/27] xfs: query the per-AG reservation counters Darrick J. Wong
2017-09-21 14:36   ` Brian Foster
2017-09-21 17:30     ` Darrick J. Wong
2017-09-21  0:17 ` [PATCH 03/27] xfs: create an ioctl to scrub AG metadata Darrick J. Wong
2017-09-21 14:36   ` Brian Foster
2017-09-21 17:35     ` Darrick J. Wong
2017-09-21 17:52       ` Brian Foster
2017-09-22  3:26         ` Darrick J. Wong
2017-09-21  0:18 ` [PATCH 04/27] xfs: dispatch metadata scrub subcommands Darrick J. Wong
2017-09-21 14:37   ` Brian Foster
2017-09-21 18:08     ` Darrick J. Wong
2017-09-21  0:18 ` [PATCH 05/27] xfs: test the scrub ioctl Darrick J. Wong
2017-09-21  6:04   ` Dave Chinner
2017-09-21 18:14     ` Darrick J. Wong
2017-09-21  0:18 ` [PATCH 06/27] xfs: create helpers to record and deal with scrub problems Darrick J. Wong
2017-09-22  7:16   ` Dave Chinner
2017-09-22 16:44     ` Darrick J. Wong
2017-09-23  7:22       ` Dave Chinner
2017-09-23  7:24         ` Darrick J. Wong
2017-09-21  0:18 ` [PATCH 07/27] xfs: create helpers to scrub a metadata btree Darrick J. Wong
2017-09-22  7:23   ` Dave Chinner
2017-09-22 16:59     ` Darrick J. Wong
2017-09-21  0:18 ` [PATCH 08/27] xfs: scrub the shape of " Darrick J. Wong
2017-09-22 15:22   ` Brian Foster
2017-09-22 17:22     ` Darrick J. Wong
2017-09-22 19:13       ` Brian Foster
2017-09-22 20:14         ` Darrick J. Wong
2017-09-22 21:15           ` Brian Foster
2017-09-21  0:18 ` [PATCH 09/27] xfs: scrub btree keys and records Darrick J. Wong
2017-09-21  0:18 ` [PATCH 10/27] xfs: create helpers to scan an allocation group Darrick J. Wong
2017-09-21  0:18 ` [PATCH 11/27] xfs: scrub the backup superblocks Darrick J. Wong
2017-09-21  0:18 ` [PATCH 12/27] xfs: scrub AGF and AGFL Darrick J. Wong
2017-09-21  0:18 ` [PATCH 13/27] xfs: scrub the AGI Darrick J. Wong
2017-09-21  0:19 ` [PATCH 14/27] xfs: scrub free space btrees Darrick J. Wong
2017-09-21  0:19 ` [PATCH 15/27] xfs: scrub inode btrees Darrick J. Wong
2017-09-21  0:19 ` [PATCH 16/27] xfs: scrub rmap btrees Darrick J. Wong
2017-09-21  0:19 ` [PATCH 17/27] xfs: scrub refcount btrees Darrick J. Wong
2017-09-21  0:19 ` [PATCH 18/27] xfs: scrub inodes Darrick J. Wong
2017-09-21  0:19 ` [PATCH 19/27] xfs: scrub inode block mappings Darrick J. Wong
2017-09-21  0:19 ` [PATCH 20/27] xfs: scrub directory/attribute btrees Darrick J. Wong
2017-09-21  0:19 ` [PATCH 21/27] xfs: scrub directory metadata Darrick J. Wong
2017-09-21  0:19 ` [PATCH 22/27] xfs: scrub directory freespace Darrick J. Wong
2017-09-21  0:20 ` [PATCH 23/27] xfs: scrub extended attributes Darrick J. Wong
2017-09-21  0:20 ` [PATCH 24/27] xfs: scrub symbolic links Darrick J. Wong
2017-09-21  0:20 ` [PATCH 25/27] xfs: scrub parent pointers Darrick J. Wong
2017-09-21  0:20 ` [PATCH 26/27] xfs: scrub realtime bitmap/summary Darrick J. Wong
2017-09-21  0:20 ` [PATCH 27/27] xfs: scrub quota information Darrick J. Wong
2017-09-22  3:27 ` [PATCH] man: describe the metadata scrubbing ioctl Darrick J. Wong
