All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org
Subject: [PATCH 02/25] xfs: dispatch metadata scrub subcommands
Date: Tue, 03 Oct 2017 13:41:02 -0700	[thread overview]
Message-ID: <150706326240.19351.5195584361798357627.stgit@magnolia> (raw)
In-Reply-To: <150706324963.19351.17715069858921948692.stgit@magnolia>

From: Darrick J. Wong <darrick.wong@oracle.com>

Create structures needed to hold scrubbing context and dispatch incoming
commands to the individual scrubbers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/scrub.c |  178 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.h |   24 +++++++
 fs/xfs/scrub/trace.h |   43 ++++++++++++
 3 files changed, 244 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 5db2a6f..1f0ab15 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -44,11 +44,187 @@
 #include "scrub/scrub.h"
 #include "scrub/trace.h"
 
+/*
+ * Online Scrub and Repair
+ *
+ * Traditionally, XFS (the kernel driver) did not know how to check or
+ * repair on-disk data structures.  That task was left to the xfs_check
+ * and xfs_repair tools, both of which require taking the filesystem
+ * offline for a thorough but time consuming examination.  Online
+ * scrub & repair, on the other hand, enables us to check the metadata
+ * for obvious errors while carefully stepping around the filesystem's
+ * ongoing operations, locking rules, etc.
+ *
+ * Given that most XFS metadata consist of records stored in a btree,
+ * most of the checking functions iterate the btree blocks themselves
+ * looking for irregularities.  When a record block is encountered, each
+ * record can be checked for obviously bad values.  Record values can
+ * also be cross-referenced against other btrees to look for potential
+ * misunderstandings between pieces of metadata.
+ *
+ * It is expected that the checkers responsible for per-AG metadata
+ * structures will lock the AG headers (AGI, AGF, AGFL), iterate the
+ * metadata structure, and perform any relevant cross-referencing before
+ * unlocking the AG and returning the results to userspace.  These
+ * scrubbers must not keep an AG locked for too long to avoid tying up
+ * the block and inode allocators.
+ *
+ * Block maps and b-trees rooted in an inode present a special challenge
+ * because they can involve extents from any AG.  The general scrubber
+ * structure of lock -> check -> xref -> unlock still holds, but AG
+ * locking order rules /must/ be obeyed to avoid deadlocks.  The
+ * ordering rule, of course, is that we must lock in increasing AG
+ * order.  Helper functions are provided to track which AG headers we've
+ * already locked.  If we detect an imminent locking order violation, we
+ * can signal a potential deadlock, in which case the scrubber can jump
+ * out to the top level, lock all the AGs in order, and retry the scrub.
+ *
+ * For file data (directories, extended attributes, symlinks) scrub, we
+ * can simply lock the inode and walk the data.  For btree data
+ * (directories and attributes) we follow the same btree-scrubbing
+ * strategy outlined previously to check the records.
+ *
+ * We use a bit of trickery with transactions to avoid buffer deadlocks
+ * if there is a cycle in the metadata.  The basic problem is that
+ * travelling down a btree involves locking the current buffer at each
+ * tree level.  If a pointer should somehow point back to a buffer that
+ * we've already examined, we will deadlock due to the second buffer
+ * locking attempt.  Note however that grabbing a buffer in transaction
+ * context links the locked buffer to the transaction.  If we try to
+ * re-grab the buffer in the context of the same transaction, we avoid
+ * the second lock attempt and continue.  Between the verifier and the
+ * scrubber, something will notice that something is amiss and report
+ * the corruption.  Therefore, each scrubber will allocate an empty
+ * transaction, attach buffers to it, and cancel the transaction at the
+ * end of the scrub run.  Cancelling a non-dirty transaction simply
+ * unlocks the buffers.
+ *
+ * There are four pieces of data that scrub can communicate to
+ * userspace.  The first is the error code (errno), which can be used to
+ * communicate operational errors in performing the scrub.  There are
+ * also three flags that can be set in the scrub context.  If the data
+ * structure itself is corrupt, the CORRUPT flag will be set.  If
+ * the metadata is correct but otherwise suboptimal, the PREEN flag
+ * will be set.
+ */
+
+/* Scrub setup and teardown */
+
+/* Free all the resources and finish the transactions. */
+STATIC int
+xfs_scrub_teardown(
+	struct xfs_scrub_context	*sc,
+	int				error)
+{
+	if (sc->tp) {
+		xfs_trans_cancel(sc->tp);
+		sc->tp = NULL;
+	}
+	return error;
+}
+
+/* Scrubbing dispatch. */
+
+static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
+};
+
+/* This isn't a stable feature, warn once per day. */
+static inline void
+xfs_scrub_experimental_warning(
+	struct xfs_mount	*mp)
+{
+	static struct ratelimit_state scrub_warning = RATELIMIT_STATE_INIT(
+			"xfs_scrub_warning", 86400 * HZ, 1);
+	ratelimit_set_flags(&scrub_warning, RATELIMIT_MSG_ON_RELEASE);
+
+	if (__ratelimit(&scrub_warning))
+		xfs_alert(mp,
+"EXPERIMENTAL online scrub feature in use. Use at your own risk!");
+}
+
 /* Dispatch metadata scrubbing. */
 int
 xfs_scrub_metadata(
 	struct xfs_inode		*ip,
 	struct xfs_scrub_metadata	*sm)
 {
-	return -EOPNOTSUPP;
+	struct xfs_scrub_context	sc;
+	struct xfs_mount		*mp = ip->i_mount;
+	const struct xfs_scrub_meta_ops	*ops;
+	bool				try_harder = false;
+	int				error = 0;
+
+	trace_xfs_scrub_start(ip, sm, error);
+
+	/* Forbidden if we are shut down or mounted norecovery. */
+	error = -ESHUTDOWN;
+	if (XFS_FORCED_SHUTDOWN(mp))
+		goto out;
+	error = -ENOTRECOVERABLE;
+	if (mp->m_flags & XFS_MOUNT_NORECOVERY)
+		goto out;
+
+	/* Check our inputs. */
+	error = -EINVAL;
+	sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT;
+	if (sm->sm_flags & ~XFS_SCRUB_FLAGS_IN)
+		goto out;
+	if (memchr_inv(sm->sm_reserved, 0, sizeof(sm->sm_reserved)))
+		goto out;
+
+	/* Do we know about this type of metadata? */
+	error = -ENOENT;
+	if (sm->sm_type >= XFS_SCRUB_TYPE_NR)
+		goto out;
+	ops = &meta_scrub_ops[sm->sm_type];
+	if (ops->scrub == NULL)
+		goto out;
+
+	/* Does this fs even support this type of metadata? */
+	if (ops->has && !ops->has(&mp->m_sb))
+		goto out;
+
+	/* We don't know how to repair anything yet. */
+	error = -EOPNOTSUPP;
+	if (sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
+		goto out;
+
+	xfs_scrub_experimental_warning(mp);
+
+retry_op:
+	/* Set up for the operation. */
+	memset(&sc, 0, sizeof(sc));
+	sc.mp = ip->i_mount;
+	sc.sm = sm;
+	sc.ops = ops;
+	sc.try_harder = try_harder;
+	error = sc.ops->setup(&sc, ip);
+	if (error)
+		goto out_teardown;
+
+	/* Scrub for errors. */
+	error = sc.ops->scrub(&sc);
+	if (!try_harder && error == -EDEADLOCK) {
+		/*
+		 * Scrubbers return -EDEADLOCK to mean 'try harder'.
+		 * Tear down everything we hold, then set up again with
+		 * preparation for worst-case scenarios.
+		 */
+		error = xfs_scrub_teardown(&sc, 0);
+		if (error)
+			goto out;
+		try_harder = true;
+		goto retry_op;
+	} else if (error)
+		goto out_teardown;
+
+	if (sc.sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
+			       XFS_SCRUB_OFLAG_XCORRUPT))
+		xfs_alert_ratelimited(mp, "Corruption detected during scrub.");
+
+out_teardown:
+	error = xfs_scrub_teardown(&sc, error);
+out:
+	trace_xfs_scrub_done(ip, sm, error);
+	return error;
 }
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index eb1cd9d..ef7b50e 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -20,6 +20,30 @@
 #ifndef __XFS_SCRUB_SCRUB_H__
 #define __XFS_SCRUB_SCRUB_H__
 
+struct xfs_scrub_context;
+
+struct xfs_scrub_meta_ops {
+	/* Acquire whatever resources are needed for the operation. */
+	int		(*setup)(struct xfs_scrub_context *,
+				 struct xfs_inode *);
+
+	/* Examine metadata for errors. */
+	int		(*scrub)(struct xfs_scrub_context *);
+
+	/* Decide if we even have this piece of metadata. */
+	bool		(*has)(struct xfs_sb *);
+};
+
+struct xfs_scrub_context {
+	/* General scrub state. */
+	struct xfs_mount		*mp;
+	struct xfs_scrub_metadata	*sm;
+	const struct xfs_scrub_meta_ops	*ops;
+	struct xfs_trans		*tp;
+	struct xfs_inode		*ip;
+	bool				try_harder;
+};
+
 /* Metadata scrubbers */
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index a95a7c8..688517e 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -25,6 +25,49 @@
 
 #include <linux/tracepoint.h>
 
+DECLARE_EVENT_CLASS(xfs_scrub_class,
+	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
+		 int error),
+	TP_ARGS(ip, sm, error),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(unsigned int, type)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_ino_t, inum)
+		__field(unsigned int, gen)
+		__field(unsigned int, flags)
+		__field(int, error)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->type = sm->sm_type;
+		__entry->agno = sm->sm_agno;
+		__entry->inum = sm->sm_ino;
+		__entry->gen = sm->sm_gen;
+		__entry->flags = sm->sm_flags;
+		__entry->error = error;
+	),
+	TP_printk("dev %d:%d ino %llu type %u agno %u inum %llu gen %u flags 0x%x error %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->type,
+		  __entry->agno,
+		  __entry->inum,
+		  __entry->gen,
+		  __entry->flags,
+		  __entry->error)
+)
+#define DEFINE_SCRUB_EVENT(name) \
+DEFINE_EVENT(xfs_scrub_class, name, \
+	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm, \
+		 int error), \
+	TP_ARGS(ip, sm, error))
+
+DEFINE_SCRUB_EVENT(xfs_scrub_start);
+DEFINE_SCRUB_EVENT(xfs_scrub_done);
+
 #endif /* _TRACE_XFS_SCRUB_TRACE_H */
 
 #undef TRACE_INCLUDE_PATH


  parent reply	other threads:[~2017-10-03 20:41 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-03 20:40 [PATCH v11 00/25] xfs: online scrub support Darrick J. Wong
2017-10-03 20:40 ` [PATCH 01/25] xfs: create an ioctl to scrub AG metadata Darrick J. Wong
2017-10-03 20:41 ` Darrick J. Wong [this message]
2017-10-03 20:41 ` [PATCH 03/25] xfs: probe the scrub ioctl Darrick J. Wong
2017-10-03 23:32   ` Dave Chinner
2017-10-04  0:02     ` Darrick J. Wong
2017-10-04  1:56       ` Dave Chinner
2017-10-04  3:14         ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 04/25] xfs: create helpers to record and deal with scrub problems Darrick J. Wong
2017-10-03 23:44   ` Dave Chinner
2017-10-04  0:56     ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 05/25] xfs: create helpers to scrub a metadata btree Darrick J. Wong
2017-10-03 23:49   ` Dave Chinner
2017-10-04  0:13     ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 06/25] xfs: scrub the shape of " Darrick J. Wong
2017-10-04  0:15   ` Dave Chinner
2017-10-04  3:51     ` Darrick J. Wong
2017-10-04  5:48       ` Dave Chinner
2017-10-04 17:48         ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 07/25] xfs: scrub btree keys and records Darrick J. Wong
2017-10-04 20:52   ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 08/25] xfs: create helpers to scan an allocation group Darrick J. Wong
2017-10-04  0:46   ` Dave Chinner
2017-10-04  3:58     ` Darrick J. Wong
2017-10-04  5:59       ` Dave Chinner
2017-10-04 17:51         ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 09/25] xfs: scrub the backup superblocks Darrick J. Wong
2017-10-04  0:57   ` Dave Chinner
2017-10-04  4:06     ` Darrick J. Wong
2017-10-04  6:13       ` Dave Chinner
2017-10-04 17:56         ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 10/25] xfs: scrub AGF and AGFL Darrick J. Wong
2017-10-04  1:31   ` Dave Chinner
2017-10-04  4:21     ` Darrick J. Wong
2017-10-04  6:28       ` Dave Chinner
2017-10-04 17:57         ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 11/25] xfs: scrub the AGI Darrick J. Wong
2017-10-04  1:43   ` Dave Chinner
2017-10-04  4:25     ` Darrick J. Wong
2017-10-04  6:43       ` Dave Chinner
2017-10-04 18:02         ` Darrick J. Wong
2017-10-04 22:16           ` Dave Chinner
2017-10-04 23:12             ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 12/25] xfs: scrub free space btrees Darrick J. Wong
2017-10-05  0:59   ` Dave Chinner
2017-10-05  1:13     ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 13/25] xfs: scrub inode btrees Darrick J. Wong
2017-10-05  2:08   ` Dave Chinner
2017-10-05  5:47     ` Darrick J. Wong
2017-10-05  7:22       ` Dave Chinner
2017-10-05 18:26         ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 14/25] xfs: scrub rmap btrees Darrick J. Wong
2017-10-05  2:56   ` Dave Chinner
2017-10-05  5:02     ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 15/25] xfs: scrub refcount btrees Darrick J. Wong
2017-10-05  2:59   ` Dave Chinner
2017-10-05  5:02     ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 16/25] xfs: scrub inodes Darrick J. Wong
2017-10-05  4:04   ` Dave Chinner
2017-10-05  5:22     ` Darrick J. Wong
2017-10-05  7:13       ` Dave Chinner
2017-10-05 19:56         ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 17/25] xfs: scrub inode block mappings Darrick J. Wong
2017-10-06  2:51   ` Dave Chinner
2017-10-06 17:00     ` Darrick J. Wong
2017-10-07 23:10       ` Dave Chinner
2017-10-08  3:54         ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 18/25] xfs: scrub directory/attribute btrees Darrick J. Wong
2017-10-06  5:07   ` Dave Chinner
2017-10-06 18:30     ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 19/25] xfs: scrub directory metadata Darrick J. Wong
2017-10-06  7:07   ` Dave Chinner
2017-10-06 19:45     ` Darrick J. Wong
2017-10-06 22:16       ` Dave Chinner
2017-10-03 20:42 ` [PATCH 20/25] xfs: scrub directory freespace Darrick J. Wong
2017-10-09  1:44   ` Dave Chinner
2017-10-09 22:54     ` Darrick J. Wong
2017-10-03 20:43 ` [PATCH 21/25] xfs: scrub extended attributes Darrick J. Wong
2017-10-09  2:13   ` Dave Chinner
2017-10-09 21:14     ` Darrick J. Wong
2017-10-03 20:43 ` [PATCH 22/25] xfs: scrub symbolic links Darrick J. Wong
2017-10-09  2:17   ` Dave Chinner
2017-10-03 20:43 ` [PATCH 23/25] xfs: scrub parent pointers Darrick J. Wong
2017-10-03 20:43 ` [PATCH 24/25] xfs: scrub realtime bitmap/summary Darrick J. Wong
2017-10-09  2:28   ` Dave Chinner
2017-10-09 20:24     ` Darrick J. Wong
2017-10-03 20:43 ` [PATCH 25/25] xfs: scrub quota information Darrick J. Wong
2017-10-09  2:51   ` Dave Chinner
2017-10-09 20:03     ` Darrick J. Wong
2017-10-09 22:17       ` Dave Chinner
2017-10-09 23:08         ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=150706326240.19351.5195584361798357627.stgit@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.