* [PATCH v17 00/16] xfs-4.19: online repair support
@ 2018-07-26  0:19 Darrick J. Wong
From: Darrick J. Wong @ 2018-07-26  0:19 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, allison.henderson

Hi all,

This is the seventeenth revision of a patchset that adds kernel support
for online metadata scrubbing and repair to XFS.  There aren't any
on-disk format changes.

New for this version of the patch series are fixes for numerous review
comments that came from Dave and Allison.  The long prefixes of the
previous versions have been drastically shortened.  Comments about the
strategies used to repair broken parts of the filesystem have been
expanded where reviewers found them confusing.  A few data structures
have been renamed to reflect more accurately what they do.

Note, this series does not include any of the controversial repair
functionality that requires fs freezing; that has been deferred to a
later posting.

The first patch pushes a transaction pointer through the per-AG
reservation code so that scrub can reinitialize the per-AG reservations
after repairing metadata while maintaining the AG header lock.
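
As a rough illustration of why the transaction pointer has to travel
with the helpers, here is a small userspace sketch.  It is not kernel
code: every toy_* name is hypothetical, and it only models the
assumption that a buffer locked by a transaction can be grabbed again
through that same transaction but not from outside it, and that a
release with a NULL transaction is a plain unlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a transaction and a locked AG header buffer. */
struct toy_trans { int id; };

struct toy_buf {
	bool			locked;
	struct toy_trans	*owner;	/* transaction holding the lock, if any */
	int			recur;	/* extra grabs made by the owner */
};

/*
 * Read (lock) a buffer.  Returns 0 on success, or -1 where a real
 * implementation would block forever because the lock is held by a
 * transaction the caller did not pass in.
 */
static int toy_read_buf(struct toy_trans *tp, struct toy_buf *bp)
{
	if (bp->locked) {
		if (tp != NULL && bp->owner == tp) {
			bp->recur++;	/* already joined to this transaction */
			return 0;
		}
		return -1;
	}
	bp->locked = true;
	bp->owner = tp;
	bp->recur = 0;
	return 0;
}

/* Release a buffer; with tp == NULL this is just a plain unlock. */
static void toy_brelse(struct toy_trans *tp, struct toy_buf *bp)
{
	assert(bp->locked && bp->owner == tp);
	if (tp != NULL && bp->recur > 0) {
		bp->recur--;		/* the transaction keeps its lock */
		return;
	}
	bp->locked = false;
	bp->owner = NULL;
}

/* A helper shaped like the *_calc_reserves() functions in this series. */
static int toy_calc_reserves(struct toy_trans *tp, struct toy_buf *agf)
{
	if (toy_read_buf(tp, agf) != 0)
		return -1;
	/* ... a real helper would inspect the header here ... */
	toy_brelse(tp, agf);
	return 0;
}
```

In this model a caller that already holds the header under a
transaction succeeds only if it hands that transaction to the helper,
and the header is still locked when the helper returns.  A caller with
no transaction passes NULL and gets the old lock-and-release behaviour.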

The next two patches move the 'extent list' functionality into a
separate file and rename it xfs_bitmap, since that's what the data
structure actually represents.

Patches 4-14 implement reconstruction of the AGF/AGI/AGFL headers, the
free space btrees, the inode btrees, the inodes, the inode forks, the
inode block maps, symbolic links, and extended attributes.

Patch 15 augments scrub to rebuild extended attributes when any of the
attr blocks are fragmented.

Patch 16 implements reconstruction of quota blocks.

If you're going to start using this mess, you probably ought to just
pull from my git trees.  The kernel patches[1] should apply against
4.18-rc6.  xfsprogs[2] and xfstests[3] can be found in their usual
places.  The git trees contain all four series' worth of changes.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

[1] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=djwong-devel
[2] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=djwong-devel
[3] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=djwong-devel


* [PATCH 01/16] xfs: pass transaction lock while setting up agresv on cyclic metadata
From: Darrick J. Wong @ 2018-07-26  0:19 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <darrick.wong@oracle.com>

Pass a transaction pointer through to all helpers that calculate the
per-AG block reservation.  Online repair will use this to reinitialize
per-AG reservations while it still holds all the AG headers locked to
the repair transaction.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_ag_resv.c        |   13 +++++++------
 fs/xfs/libxfs/xfs_ag_resv.h        |    2 +-
 fs/xfs/libxfs/xfs_ialloc_btree.c   |   10 ++++++----
 fs/xfs/libxfs/xfs_ialloc_btree.h   |    4 ++--
 fs/xfs/libxfs/xfs_refcount_btree.c |    5 +++--
 fs/xfs/libxfs/xfs_refcount_btree.h |    3 ++-
 fs/xfs/libxfs/xfs_rmap_btree.c     |    5 +++--
 fs/xfs/libxfs/xfs_rmap_btree.h     |    2 +-
 fs/xfs/xfs_fsops.c                 |    2 +-
 9 files changed, 26 insertions(+), 20 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_ag_resv.c b/fs/xfs/libxfs/xfs_ag_resv.c
index fecd187fcf2c..e701ebc36c06 100644
--- a/fs/xfs/libxfs/xfs_ag_resv.c
+++ b/fs/xfs/libxfs/xfs_ag_resv.c
@@ -248,7 +248,8 @@ __xfs_ag_resv_init(
 /* Create a per-AG block reservation. */
 int
 xfs_ag_resv_init(
-	struct xfs_perag		*pag)
+	struct xfs_perag		*pag,
+	struct xfs_trans		*tp)
 {
 	struct xfs_mount		*mp = pag->pag_mount;
 	xfs_agnumber_t			agno = pag->pag_agno;
@@ -260,11 +261,11 @@ xfs_ag_resv_init(
 	if (pag->pag_meta_resv.ar_asked == 0) {
 		ask = used = 0;
 
-		error = xfs_refcountbt_calc_reserves(mp, agno, &ask, &used);
+		error = xfs_refcountbt_calc_reserves(mp, tp, agno, &ask, &used);
 		if (error)
 			goto out;
 
-		error = xfs_finobt_calc_reserves(mp, agno, &ask, &used);
+		error = xfs_finobt_calc_reserves(mp, tp, agno, &ask, &used);
 		if (error)
 			goto out;
 
@@ -282,7 +283,7 @@ xfs_ag_resv_init(
 
 			mp->m_inotbt_nores = true;
 
-			error = xfs_refcountbt_calc_reserves(mp, agno, &ask,
+			error = xfs_refcountbt_calc_reserves(mp, tp, agno, &ask,
 					&used);
 			if (error)
 				goto out;
@@ -298,7 +299,7 @@ xfs_ag_resv_init(
 	if (pag->pag_rmapbt_resv.ar_asked == 0) {
 		ask = used = 0;
 
-		error = xfs_rmapbt_calc_reserves(mp, agno, &ask, &used);
+		error = xfs_rmapbt_calc_reserves(mp, tp, agno, &ask, &used);
 		if (error)
 			goto out;
 
@@ -309,7 +310,7 @@ xfs_ag_resv_init(
 
 #ifdef DEBUG
 	/* need to read in the AGF for the ASSERT below to work */
-	error = xfs_alloc_pagf_init(pag->pag_mount, NULL, pag->pag_agno, 0);
+	error = xfs_alloc_pagf_init(pag->pag_mount, tp, pag->pag_agno, 0);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_ag_resv.h b/fs/xfs/libxfs/xfs_ag_resv.h
index 4619b554ee90..d1005116b43b 100644
--- a/fs/xfs/libxfs/xfs_ag_resv.h
+++ b/fs/xfs/libxfs/xfs_ag_resv.h
@@ -7,7 +7,7 @@
 #define	__XFS_AG_RESV_H__
 
 int xfs_ag_resv_free(struct xfs_perag *pag);
-int xfs_ag_resv_init(struct xfs_perag *pag);
+int xfs_ag_resv_init(struct xfs_perag *pag, struct xfs_trans *tp);
 
 bool xfs_ag_resv_critical(struct xfs_perag *pag, enum xfs_ag_resv_type type);
 xfs_extlen_t xfs_ag_resv_needed(struct xfs_perag *pag,
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index 735a33252eb2..86c50208a143 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -552,6 +552,7 @@ xfs_inobt_max_size(
 static int
 xfs_inobt_count_blocks(
 	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
 	xfs_agnumber_t		agno,
 	xfs_btnum_t		btnum,
 	xfs_extlen_t		*tree_blocks)
@@ -560,14 +561,14 @@ xfs_inobt_count_blocks(
 	struct xfs_btree_cur	*cur;
 	int			error;
 
-	error = xfs_ialloc_read_agi(mp, NULL, agno, &agbp);
+	error = xfs_ialloc_read_agi(mp, tp, agno, &agbp);
 	if (error)
 		return error;
 
-	cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno, btnum);
+	cur = xfs_inobt_init_cursor(mp, tp, agbp, agno, btnum);
 	error = xfs_btree_count_blocks(cur, tree_blocks);
 	xfs_btree_del_cursor(cur, error);
-	xfs_buf_relse(agbp);
+	xfs_trans_brelse(tp, agbp);
 
 	return error;
 }
@@ -578,6 +579,7 @@ xfs_inobt_count_blocks(
 int
 xfs_finobt_calc_reserves(
 	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
 	xfs_agnumber_t		agno,
 	xfs_extlen_t		*ask,
 	xfs_extlen_t		*used)
@@ -588,7 +590,7 @@ xfs_finobt_calc_reserves(
 	if (!xfs_sb_version_hasfinobt(&mp->m_sb))
 		return 0;
 
-	error = xfs_inobt_count_blocks(mp, agno, XFS_BTNUM_FINO, &tree_len);
+	error = xfs_inobt_count_blocks(mp, tp, agno, XFS_BTNUM_FINO, &tree_len);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.h b/fs/xfs/libxfs/xfs_ialloc_btree.h
index bf8f0c405e7d..ebdd0c6b8766 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.h
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.h
@@ -60,8 +60,8 @@ int xfs_inobt_rec_check_count(struct xfs_mount *,
 #define xfs_inobt_rec_check_count(mp, rec)	0
 #endif	/* DEBUG */
 
-int xfs_finobt_calc_reserves(struct xfs_mount *mp, xfs_agnumber_t agno,
-		xfs_extlen_t *ask, xfs_extlen_t *used);
+int xfs_finobt_calc_reserves(struct xfs_mount *mp, struct xfs_trans *tp,
+		xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used);
 extern xfs_extlen_t xfs_iallocbt_calc_size(struct xfs_mount *mp,
 		unsigned long long len);
 
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
index b71937982c5b..bcd65ee37260 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.c
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -408,6 +408,7 @@ xfs_refcountbt_max_size(
 int
 xfs_refcountbt_calc_reserves(
 	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
 	xfs_agnumber_t		agno,
 	xfs_extlen_t		*ask,
 	xfs_extlen_t		*used)
@@ -422,14 +423,14 @@ xfs_refcountbt_calc_reserves(
 		return 0;
 
 
-	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
+	error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp);
 	if (error)
 		return error;
 
 	agf = XFS_BUF_TO_AGF(agbp);
 	agblocks = be32_to_cpu(agf->agf_length);
 	tree_len = be32_to_cpu(agf->agf_refcount_blocks);
-	xfs_buf_relse(agbp);
+	xfs_trans_brelse(tp, agbp);
 
 	*ask += xfs_refcountbt_max_size(mp, agblocks);
 	*used += tree_len;
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.h b/fs/xfs/libxfs/xfs_refcount_btree.h
index d2852b6e1fa8..c868394ac02e 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.h
+++ b/fs/xfs/libxfs/xfs_refcount_btree.h
@@ -55,6 +55,7 @@ extern xfs_extlen_t xfs_refcountbt_max_size(struct xfs_mount *mp,
 		xfs_agblock_t agblocks);
 
 extern int xfs_refcountbt_calc_reserves(struct xfs_mount *mp,
-		xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used);
+		struct xfs_trans *tp, xfs_agnumber_t agno, xfs_extlen_t *ask,
+		xfs_extlen_t *used);
 
 #endif	/* __XFS_REFCOUNT_BTREE_H__ */
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
index 221a88ea60bb..f79cf040d745 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -554,6 +554,7 @@ xfs_rmapbt_max_size(
 int
 xfs_rmapbt_calc_reserves(
 	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
 	xfs_agnumber_t		agno,
 	xfs_extlen_t		*ask,
 	xfs_extlen_t		*used)
@@ -567,14 +568,14 @@ xfs_rmapbt_calc_reserves(
 	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
 		return 0;
 
-	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
+	error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp);
 	if (error)
 		return error;
 
 	agf = XFS_BUF_TO_AGF(agbp);
 	agblocks = be32_to_cpu(agf->agf_length);
 	tree_len = be32_to_cpu(agf->agf_rmap_blocks);
-	xfs_buf_relse(agbp);
+	xfs_trans_brelse(tp, agbp);
 
 	/* Reserve 1% of the AG or enough for 1 block per record. */
 	*ask += max(agblocks / 100, xfs_rmapbt_max_size(mp, agblocks));
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
index 50198b6c3bb2..820d668b063d 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -51,7 +51,7 @@ extern xfs_extlen_t xfs_rmapbt_calc_size(struct xfs_mount *mp,
 extern xfs_extlen_t xfs_rmapbt_max_size(struct xfs_mount *mp,
 		xfs_agblock_t agblocks);
 
-extern int xfs_rmapbt_calc_reserves(struct xfs_mount *mp,
+extern int xfs_rmapbt_calc_reserves(struct xfs_mount *mp, struct xfs_trans *tp,
 		xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used);
 
 #endif	/* __XFS_RMAP_BTREE_H__ */
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 3f2bd6032cf8..7c00b8bedfe3 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -536,7 +536,7 @@ xfs_fs_reserve_ag_blocks(
 
 	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
 		pag = xfs_perag_get(mp, agno);
-		err2 = xfs_ag_resv_init(pag);
+		err2 = xfs_ag_resv_init(pag, NULL);
 		xfs_perag_put(pag);
 		if (err2 && !error)
 			error = err2;



* [PATCH 02/16] xfs: move the repair extent list into its own file
From: Darrick J. Wong @ 2018-07-26  0:19 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <darrick.wong@oracle.com>

Move the xrep_extent_list code into a separate file.  Logically, this
data structure is really just a clumsy bitmap, and in the next patch
we'll make this more obvious.  No functional changes.
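
To make the "really just a bitmap" remark concrete, here is a minimal
userspace sketch.  The toy_* names are hypothetical and arrays stand in
for the kernel's linked list; the point is only that a list of
(start, length) records answers the same question as a bitmap, namely
whether a given block is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical analogue of one extent record: a run of blocks. */
struct toy_extent {
	uint64_t	start;	/* first block in the run */
	uint32_t	len;	/* number of blocks in the run */
};

/* Is @blk covered by any of the @nr extents?  Same as "is the bit set?" */
static bool toy_extent_list_test(const struct toy_extent *ex, size_t nr,
				 uint64_t blk)
{
	for (size_t i = 0; i < nr; i++) {
		if (blk >= ex[i].start && blk - ex[i].start < ex[i].len)
			return true;
	}
	return false;
}

/* Expand the extents into a real 64-bit bitmap (blocks 0..63 only). */
static uint64_t toy_extent_list_to_bits(const struct toy_extent *ex, size_t nr)
{
	uint64_t bits = 0;

	for (size_t i = 0; i < nr; i++) {
		for (uint32_t j = 0; j < ex[i].len; j++) {
			assert(ex[i].start + j < 64);
			bits |= UINT64_C(1) << (ex[i].start + j);
		}
	}
	return bits;
}
```

The extent form costs one record per run instead of one bit per block,
which is presumably why it suits sparse sets of filesystem blocks.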

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile       |    1 
 fs/xfs/scrub/bitmap.c |  208 +++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/bitmap.h |   37 +++++++++
 fs/xfs/scrub/repair.c |  194 ----------------------------------------------
 fs/xfs/scrub/repair.h |   27 ------
 5 files changed, 248 insertions(+), 219 deletions(-)
 create mode 100644 fs/xfs/scrub/bitmap.c
 create mode 100644 fs/xfs/scrub/bitmap.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index a36cccbec169..57ec46951ede 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -164,6 +164,7 @@ xfs-$(CONFIG_XFS_QUOTA)		+= scrub/quota.o
 ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y)
 xfs-y				+= $(addprefix scrub/, \
 				   agheader_repair.o \
+				   bitmap.o \
 				   repair.o \
 				   )
 endif
diff --git a/fs/xfs/scrub/bitmap.c b/fs/xfs/scrub/bitmap.c
new file mode 100644
index 000000000000..a7c2f4773f98
--- /dev/null
+++ b/fs/xfs/scrub/bitmap.c
@@ -0,0 +1,208 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+#include "scrub/bitmap.h"
+
+/* Collect a dead btree extent for later disposal. */
+int
+xrep_collect_btree_extent(
+	struct xfs_scrub	*sc,
+	struct xrep_extent_list	*exlist,
+	xfs_fsblock_t		fsbno,
+	xfs_extlen_t		len)
+{
+	struct xrep_extent	*rex;
+
+	trace_xrep_collect_btree_extent(sc->mp,
+			XFS_FSB_TO_AGNO(sc->mp, fsbno),
+			XFS_FSB_TO_AGBNO(sc->mp, fsbno), len);
+
+	rex = kmem_alloc(sizeof(struct xrep_extent), KM_MAYFAIL);
+	if (!rex)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&rex->list);
+	rex->fsbno = fsbno;
+	rex->len = len;
+	list_add_tail(&rex->list, &exlist->list);
+
+	return 0;
+}
+
+/*
+ * An error happened during the rebuild so the transaction will be cancelled.
+ * The fs will shut down, and the administrator has to unmount and run repair.
+ * Therefore, free all the memory associated with the list so we can die.
+ */
+void
+xrep_cancel_btree_extents(
+	struct xfs_scrub	*sc,
+	struct xrep_extent_list	*exlist)
+{
+	struct xrep_extent	*rex;
+	struct xrep_extent	*n;
+
+	for_each_xrep_extent_safe(rex, n, exlist) {
+		list_del(&rex->list);
+		kmem_free(rex);
+	}
+}
+
+/* Compare two btree extents. */
+static int
+xrep_btree_extent_cmp(
+	void			*priv,
+	struct list_head	*a,
+	struct list_head	*b)
+{
+	struct xrep_extent	*ap;
+	struct xrep_extent	*bp;
+
+	ap = container_of(a, struct xrep_extent, list);
+	bp = container_of(b, struct xrep_extent, list);
+
+	if (ap->fsbno > bp->fsbno)
+		return 1;
+	if (ap->fsbno < bp->fsbno)
+		return -1;
+	return 0;
+}
+
+/*
+ * Remove all the blocks mentioned in @sublist from the extents in @exlist.
+ *
+ * The intent is that callers will iterate the rmapbt for all of its records
+ * for a given owner to generate @exlist; and iterate all the blocks of the
+ * metadata structures that are not being rebuilt and have the same rmapbt
+ * owner to generate @sublist.  This routine subtracts all the extents
+ * mentioned in sublist from all the extents linked in @exlist, which leaves
+ * @exlist as the list of blocks that are not accounted for, which we assume
+ * are the dead blocks of the old metadata structure.  The blocks mentioned in
+ * @exlist can be reaped.
+ */
+#define LEFT_ALIGNED	(1 << 0)
+#define RIGHT_ALIGNED	(1 << 1)
+int
+xrep_subtract_extents(
+	struct xfs_scrub	*sc,
+	struct xrep_extent_list	*exlist,
+	struct xrep_extent_list	*sublist)
+{
+	struct list_head	*lp;
+	struct xrep_extent	*ex;
+	struct xrep_extent	*newex;
+	struct xrep_extent	*subex;
+	xfs_fsblock_t		sub_fsb;
+	xfs_extlen_t		sub_len;
+	int			state;
+	int			error = 0;
+
+	if (list_empty(&exlist->list) || list_empty(&sublist->list))
+		return 0;
+	ASSERT(!list_empty(&sublist->list));
+
+	list_sort(NULL, &exlist->list, xrep_btree_extent_cmp);
+	list_sort(NULL, &sublist->list, xrep_btree_extent_cmp);
+
+	/*
+	 * Now that we've sorted both lists, we iterate exlist once, rolling
+	 * forward through sublist and/or exlist as necessary until we find an
+	 * overlap or reach the end of either list.  We do not reset lp to the
+	 * head of exlist nor do we reset subex to the head of sublist.  The
+	 * list traversal is similar to merge sort, but we're deleting
+	 * instead.  In this manner we avoid O(n^2) operations.
+	 */
+	subex = list_first_entry(&sublist->list, struct xrep_extent,
+			list);
+	lp = exlist->list.next;
+	while (lp != &exlist->list) {
+		ex = list_entry(lp, struct xrep_extent, list);
+
+		/*
+		 * Advance subex and/or ex until we find a pair that
+		 * intersect or we run out of extents.
+		 */
+		while (subex->fsbno + subex->len <= ex->fsbno) {
+			if (list_is_last(&subex->list, &sublist->list))
+				goto out;
+			subex = list_next_entry(subex, list);
+		}
+		if (subex->fsbno >= ex->fsbno + ex->len) {
+			lp = lp->next;
+			continue;
+		}
+
+		/* trim subex to fit the extent we have */
+		sub_fsb = subex->fsbno;
+		sub_len = subex->len;
+		if (subex->fsbno < ex->fsbno) {
+			sub_len -= ex->fsbno - subex->fsbno;
+			sub_fsb = ex->fsbno;
+		}
+		if (sub_len > ex->len)
+			sub_len = ex->len;
+
+		state = 0;
+		if (sub_fsb == ex->fsbno)
+			state |= LEFT_ALIGNED;
+		if (sub_fsb + sub_len == ex->fsbno + ex->len)
+			state |= RIGHT_ALIGNED;
+		switch (state) {
+		case LEFT_ALIGNED:
+			/* Coincides with only the left. */
+			ex->fsbno += sub_len;
+			ex->len -= sub_len;
+			break;
+		case RIGHT_ALIGNED:
+			/* Coincides with only the right. */
+			ex->len -= sub_len;
+			lp = lp->next;
+			break;
+		case LEFT_ALIGNED | RIGHT_ALIGNED:
+			/* Total overlap, just delete ex. */
+			lp = lp->next;
+			list_del(&ex->list);
+			kmem_free(ex);
+			break;
+		case 0:
+			/*
+			 * Deleting from the middle: add the new right extent
+			 * and then shrink the left extent.
+			 */
+			newex = kmem_alloc(sizeof(struct xrep_extent),
+					KM_MAYFAIL);
+			if (!newex) {
+				error = -ENOMEM;
+				goto out;
+			}
+			INIT_LIST_HEAD(&newex->list);
+			newex->fsbno = sub_fsb + sub_len;
+			newex->len = ex->fsbno + ex->len - newex->fsbno;
+			list_add(&newex->list, &ex->list);
+			ex->len = sub_fsb - ex->fsbno;
+			lp = lp->next;
+			break;
+		default:
+			ASSERT(0);
+			break;
+		}
+	}
+
+out:
+	return error;
+}
+#undef LEFT_ALIGNED
+#undef RIGHT_ALIGNED
diff --git a/fs/xfs/scrub/bitmap.h b/fs/xfs/scrub/bitmap.h
new file mode 100644
index 000000000000..1038157695a8
--- /dev/null
+++ b/fs/xfs/scrub/bitmap.h
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#ifndef __XFS_SCRUB_BITMAP_H__
+#define __XFS_SCRUB_BITMAP_H__
+
+struct xrep_extent {
+	struct list_head	list;
+	xfs_fsblock_t		fsbno;
+	xfs_extlen_t		len;
+};
+
+struct xrep_extent_list {
+	struct list_head	list;
+};
+
+static inline void
+xrep_init_extent_list(
+	struct xrep_extent_list		*exlist)
+{
+	INIT_LIST_HEAD(&exlist->list);
+}
+
+#define for_each_xrep_extent_safe(rbe, n, exlist) \
+	list_for_each_entry_safe((rbe), (n), &(exlist)->list, list)
+int xrep_collect_btree_extent(struct xfs_scrub *sc,
+		struct xrep_extent_list *btlist, xfs_fsblock_t fsbno,
+		xfs_extlen_t len);
+void xrep_cancel_btree_extents(struct xfs_scrub *sc,
+		struct xrep_extent_list *btlist);
+int xrep_subtract_extents(struct xfs_scrub *sc,
+		struct xrep_extent_list *exlist,
+		struct xrep_extent_list *sublist);
+
+#endif	/* __XFS_SCRUB_BITMAP_H__ */
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index 5de1cac424ec..27a904ef6189 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -34,6 +34,7 @@
 #include "scrub/common.h"
 #include "scrub/trace.h"
 #include "scrub/repair.h"
+#include "scrub/bitmap.h"
 
 /*
  * Attempt to repair some metadata, if the metadata is corrupt and userspace
@@ -380,200 +381,7 @@ xrep_init_btblock(
  * sublist.  As with the other btrees we subtract sublist from exlist, and the
  * result (since the rmapbt lives in the free space) are the blocks from the
  * old rmapbt.
- */
-
-/* Collect a dead btree extent for later disposal. */
-int
-xrep_collect_btree_extent(
-	struct xfs_scrub	*sc,
-	struct xrep_extent_list	*exlist,
-	xfs_fsblock_t		fsbno,
-	xfs_extlen_t		len)
-{
-	struct xrep_extent	*rex;
-
-	trace_xrep_collect_btree_extent(sc->mp,
-			XFS_FSB_TO_AGNO(sc->mp, fsbno),
-			XFS_FSB_TO_AGBNO(sc->mp, fsbno), len);
-
-	rex = kmem_alloc(sizeof(struct xrep_extent), KM_MAYFAIL);
-	if (!rex)
-		return -ENOMEM;
-
-	INIT_LIST_HEAD(&rex->list);
-	rex->fsbno = fsbno;
-	rex->len = len;
-	list_add_tail(&rex->list, &exlist->list);
-
-	return 0;
-}
-
-/*
- * An error happened during the rebuild so the transaction will be cancelled.
- * The fs will shut down, and the administrator has to unmount and run repair.
- * Therefore, free all the memory associated with the list so we can die.
- */
-void
-xrep_cancel_btree_extents(
-	struct xfs_scrub	*sc,
-	struct xrep_extent_list	*exlist)
-{
-	struct xrep_extent	*rex;
-	struct xrep_extent	*n;
-
-	for_each_xrep_extent_safe(rex, n, exlist) {
-		list_del(&rex->list);
-		kmem_free(rex);
-	}
-}
-
-/* Compare two btree extents. */
-static int
-xrep_btree_extent_cmp(
-	void			*priv,
-	struct list_head	*a,
-	struct list_head	*b)
-{
-	struct xrep_extent	*ap;
-	struct xrep_extent	*bp;
-
-	ap = container_of(a, struct xrep_extent, list);
-	bp = container_of(b, struct xrep_extent, list);
-
-	if (ap->fsbno > bp->fsbno)
-		return 1;
-	if (ap->fsbno < bp->fsbno)
-		return -1;
-	return 0;
-}
-
-/*
- * Remove all the blocks mentioned in @sublist from the extents in @exlist.
  *
- * The intent is that callers will iterate the rmapbt for all of its records
- * for a given owner to generate @exlist; and iterate all the blocks of the
- * metadata structures that are not being rebuilt and have the same rmapbt
- * owner to generate @sublist.  This routine subtracts all the extents
- * mentioned in sublist from all the extents linked in @exlist, which leaves
- * @exlist as the list of blocks that are not accounted for, which we assume
- * are the dead blocks of the old metadata structure.  The blocks mentioned in
- * @exlist can be reaped.
- */
-#define LEFT_ALIGNED	(1 << 0)
-#define RIGHT_ALIGNED	(1 << 1)
-int
-xrep_subtract_extents(
-	struct xfs_scrub	*sc,
-	struct xrep_extent_list	*exlist,
-	struct xrep_extent_list	*sublist)
-{
-	struct list_head	*lp;
-	struct xrep_extent	*ex;
-	struct xrep_extent	*newex;
-	struct xrep_extent	*subex;
-	xfs_fsblock_t		sub_fsb;
-	xfs_extlen_t		sub_len;
-	int			state;
-	int			error = 0;
-
-	if (list_empty(&exlist->list) || list_empty(&sublist->list))
-		return 0;
-	ASSERT(!list_empty(&sublist->list));
-
-	list_sort(NULL, &exlist->list, xrep_btree_extent_cmp);
-	list_sort(NULL, &sublist->list, xrep_btree_extent_cmp);
-
-	/*
-	 * Now that we've sorted both lists, we iterate exlist once, rolling
-	 * forward through sublist and/or exlist as necessary until we find an
-	 * overlap or reach the end of either list.  We do not reset lp to the
-	 * head of exlist nor do we reset subex to the head of sublist.  The
-	 * list traversal is similar to merge sort, but we're deleting
-	 * instead.  In this manner we avoid O(n^2) operations.
-	 */
-	subex = list_first_entry(&sublist->list, struct xrep_extent,
-			list);
-	lp = exlist->list.next;
-	while (lp != &exlist->list) {
-		ex = list_entry(lp, struct xrep_extent, list);
-
-		/*
-		 * Advance subex and/or ex until we find a pair that
-		 * intersect or we run out of extents.
-		 */
-		while (subex->fsbno + subex->len <= ex->fsbno) {
-			if (list_is_last(&subex->list, &sublist->list))
-				goto out;
-			subex = list_next_entry(subex, list);
-		}
-		if (subex->fsbno >= ex->fsbno + ex->len) {
-			lp = lp->next;
-			continue;
-		}
-
-		/* trim subex to fit the extent we have */
-		sub_fsb = subex->fsbno;
-		sub_len = subex->len;
-		if (subex->fsbno < ex->fsbno) {
-			sub_len -= ex->fsbno - subex->fsbno;
-			sub_fsb = ex->fsbno;
-		}
-		if (sub_len > ex->len)
-			sub_len = ex->len;
-
-		state = 0;
-		if (sub_fsb == ex->fsbno)
-			state |= LEFT_ALIGNED;
-		if (sub_fsb + sub_len == ex->fsbno + ex->len)
-			state |= RIGHT_ALIGNED;
-		switch (state) {
-		case LEFT_ALIGNED:
-			/* Coincides with only the left. */
-			ex->fsbno += sub_len;
-			ex->len -= sub_len;
-			break;
-		case RIGHT_ALIGNED:
-			/* Coincides with only the right. */
-			ex->len -= sub_len;
-			lp = lp->next;
-			break;
-		case LEFT_ALIGNED | RIGHT_ALIGNED:
-			/* Total overlap, just delete ex. */
-			lp = lp->next;
-			list_del(&ex->list);
-			kmem_free(ex);
-			break;
-		case 0:
-			/*
-			 * Deleting from the middle: add the new right extent
-			 * and then shrink the left extent.
-			 */
-			newex = kmem_alloc(sizeof(struct xrep_extent),
-					KM_MAYFAIL);
-			if (!newex) {
-				error = -ENOMEM;
-				goto out;
-			}
-			INIT_LIST_HEAD(&newex->list);
-			newex->fsbno = sub_fsb + sub_len;
-			newex->len = ex->fsbno + ex->len - newex->fsbno;
-			list_add(&newex->list, &ex->list);
-			ex->len = sub_fsb - ex->fsbno;
-			lp = lp->next;
-			break;
-		default:
-			ASSERT(0);
-			break;
-		}
-	}
-
-out:
-	return error;
-}
-#undef LEFT_ALIGNED
-#undef RIGHT_ALIGNED
-
-/*
  * Disposal of Blocks from Old per-AG Btrees
  *
  * Now that we've constructed a new btree to replace the damaged one, we want
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 91355f6b0087..a3d491a438f4 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -27,33 +27,8 @@ int xrep_init_btblock(struct xfs_scrub *sc, xfs_fsblock_t fsb,
 		struct xfs_buf **bpp, xfs_btnum_t btnum,
 		const struct xfs_buf_ops *ops);
 
-struct xrep_extent {
-	struct list_head	list;
-	xfs_fsblock_t		fsbno;
-	xfs_extlen_t		len;
-};
-
-struct xrep_extent_list {
-	struct list_head	list;
-};
-
-static inline void
-xrep_init_extent_list(
-	struct xrep_extent_list	*exlist)
-{
-	INIT_LIST_HEAD(&exlist->list);
-}
+struct xrep_extent_list;
 
-#define for_each_xrep_extent_safe(rbe, n, exlist) \
-	list_for_each_entry_safe((rbe), (n), &(exlist)->list, list)
-int xrep_collect_btree_extent(struct xfs_scrub *sc,
-		struct xrep_extent_list *btlist, xfs_fsblock_t fsbno,
-		xfs_extlen_t len);
-void xrep_cancel_btree_extents(struct xfs_scrub *sc,
-		struct xrep_extent_list *btlist);
-int xrep_subtract_extents(struct xfs_scrub *sc,
-		struct xrep_extent_list *exlist,
-		struct xrep_extent_list *sublist);
 int xrep_fix_freelist(struct xfs_scrub *sc, bool can_shrink);
 int xrep_invalidate_blocks(struct xfs_scrub *sc,
 		struct xrep_extent_list *btlist);



* [PATCH 03/16] xfs: refactor the xrep_extent_list into xfs_bitmap
From: Darrick J. Wong @ 2018-07-26  0:19 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <darrick.wong@oracle.com>

As mentioned previously, the xrep_extent_list basically implements a
bitmap with two operations: set, and disunion (subtraction, i.e.
bitmap &= ~sub).  Rename the data structure and its functions to
xfs_bitmap to shorten the names and make it more obvious what we're
doing.
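
The second operation removes one set of ranges from another.  As a
hedged sketch of the per-extent step only, the userspace code below
subtracts a single range from a single extent and classifies the
overlap as left-aligned, right-aligned, both, or neither.  The toy_*
names are hypothetical, and unlike the patch it trims by end offsets
and returns the surviving pieces instead of editing a list in place.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_LEFT_ALIGNED	(1 << 0)
#define TOY_RIGHT_ALIGNED	(1 << 1)

/* Hypothetical half-open block range [start, start + len). */
struct toy_range {
	uint64_t	start;
	uint64_t	len;
};

/*
 * Remove the part of @r that overlaps @sub.  Up to two pieces of @r
 * survive; they are stored in @out and their count is returned.
 */
static int toy_range_subtract(struct toy_range r, struct toy_range sub,
			      struct toy_range out[2])
{
	uint64_t r_end = r.start + r.len;
	uint64_t s_start = sub.start;
	uint64_t s_end = sub.start + sub.len;
	int state = 0;

	assert(r.len > 0);

	/* No overlap: @r survives untouched. */
	if (sub.len == 0 || s_end <= r.start || s_start >= r_end) {
		out[0] = r;
		return 1;
	}

	/* Trim the subtrahend so that it fits inside @r. */
	if (s_start < r.start)
		s_start = r.start;
	if (s_end > r_end)
		s_end = r_end;

	if (s_start == r.start)
		state |= TOY_LEFT_ALIGNED;
	if (s_end == r_end)
		state |= TOY_RIGHT_ALIGNED;

	switch (state) {
	case TOY_LEFT_ALIGNED:
		/* Overlap touches only the left edge: keep the tail. */
		out[0] = (struct toy_range){ s_end, r_end - s_end };
		return 1;
	case TOY_RIGHT_ALIGNED:
		/* Overlap touches only the right edge: keep the head. */
		out[0] = (struct toy_range){ r.start, s_start - r.start };
		return 1;
	case TOY_LEFT_ALIGNED | TOY_RIGHT_ALIGNED:
		/* Total overlap: nothing survives. */
		return 0;
	default:
		/* Hole in the middle: keep both the head and the tail. */
		out[0] = (struct toy_range){ r.start, s_start - r.start };
		out[1] = (struct toy_range){ s_end, r_end - s_end };
		return 2;
	}
}
```

Only the middle case produces an extra piece, which matches the one
place where the list-based code has to allocate a new record.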

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/bitmap.c |  173 +++++++++++++++++++++++++------------------------
 fs/xfs/scrub/bitmap.h |   34 ++++------
 fs/xfs/scrub/repair.c |   85 +++++++++++-------------
 fs/xfs/scrub/repair.h |    8 +-
 fs/xfs/scrub/trace.h  |    1 
 5 files changed, 144 insertions(+), 157 deletions(-)


diff --git a/fs/xfs/scrub/bitmap.c b/fs/xfs/scrub/bitmap.c
index a7c2f4773f98..4840f5a1e179 100644
--- a/fs/xfs/scrub/bitmap.c
+++ b/fs/xfs/scrub/bitmap.c
@@ -16,51 +16,53 @@
 #include "scrub/repair.h"
 #include "scrub/bitmap.h"
 
-/* Collect a dead btree extent for later disposal. */
+/*
+ * Set a range of this bitmap.  Caller must ensure the range is not set.
+ *
+ * This is the logical equivalent of bitmap |= mask(fsbno, len).
+ */
 int
-xrep_collect_btree_extent(
-	struct xfs_scrub	*sc,
-	struct xrep_extent_list	*exlist,
+xfs_bitmap_set(
+	struct xfs_bitmap	*bitmap,
 	xfs_fsblock_t		fsbno,
-	xfs_extlen_t		len)
+	xfs_fsblock_t		len)
 {
-	struct xrep_extent	*rex;
+	struct xfs_bitmap_range	*bmr;
 
-	trace_xrep_collect_btree_extent(sc->mp,
-			XFS_FSB_TO_AGNO(sc->mp, fsbno),
-			XFS_FSB_TO_AGBNO(sc->mp, fsbno), len);
-
-	rex = kmem_alloc(sizeof(struct xrep_extent), KM_MAYFAIL);
-	if (!rex)
+	bmr = kmem_alloc(sizeof(struct xfs_bitmap_range), KM_MAYFAIL);
+	if (!bmr)
 		return -ENOMEM;
 
-	INIT_LIST_HEAD(&rex->list);
-	rex->fsbno = fsbno;
-	rex->len = len;
-	list_add_tail(&rex->list, &exlist->list);
+	INIT_LIST_HEAD(&bmr->list);
+	bmr->fsbno = fsbno;
+	bmr->len = len;
+	list_add_tail(&bmr->list, &bitmap->list);
 
 	return 0;
 }
 
-/*
- * An error happened during the rebuild so the transaction will be cancelled.
- * The fs will shut down, and the administrator has to unmount and run repair.
- * Therefore, free all the memory associated with the list so we can die.
- */
+/* Free everything related to this bitmap. */
 void
-xrep_cancel_btree_extents(
-	struct xfs_scrub	*sc,
-	struct xrep_extent_list	*exlist)
+xfs_bitmap_destroy(
+	struct xfs_bitmap	*bitmap)
 {
-	struct xrep_extent	*rex;
-	struct xrep_extent	*n;
+	struct xfs_bitmap_range	*bmr;
+	struct xfs_bitmap_range	*n;
 
-	for_each_xrep_extent_safe(rex, n, exlist) {
-		list_del(&rex->list);
-		kmem_free(rex);
+	for_each_xfs_bitmap_extent(bmr, n, bitmap) {
+		list_del(&bmr->list);
+		kmem_free(bmr);
 	}
 }
 
+/* Set up a per-AG block bitmap. */
+void
+xfs_bitmap_init(
+	struct xfs_bitmap	*bitmap)
+{
+	INIT_LIST_HEAD(&bitmap->list);
+}
+
 /* Compare two btree extents. */
 static int
 xrep_btree_extent_cmp(
@@ -68,11 +70,11 @@ xrep_btree_extent_cmp(
 	struct list_head	*a,
 	struct list_head	*b)
 {
-	struct xrep_extent	*ap;
-	struct xrep_extent	*bp;
+	struct xfs_bitmap_range	*ap;
+	struct xfs_bitmap_range	*bp;
 
-	ap = container_of(a, struct xrep_extent, list);
-	bp = container_of(b, struct xrep_extent, list);
+	ap = container_of(a, struct xfs_bitmap_range, list);
+	bp = container_of(b, struct xfs_bitmap_range, list);
 
 	if (ap->fsbno > bp->fsbno)
 		return 1;
@@ -82,117 +84,118 @@ xrep_btree_extent_cmp(
 }
 
 /*
- * Remove all the blocks mentioned in @sublist from the extents in @exlist.
+ * Remove all the blocks mentioned in @sub from the extents in @bitmap.
  *
  * The intent is that callers will iterate the rmapbt for all of its records
- * for a given owner to generate @exlist; and iterate all the blocks of the
+ * for a given owner to generate @bitmap; and iterate all the blocks of the
  * metadata structures that are not being rebuilt and have the same rmapbt
- * owner to generate @sublist.  This routine subtracts all the extents
- * mentioned in sublist from all the extents linked in @exlist, which leaves
- * @exlist as the list of blocks that are not accounted for, which we assume
+ * owner to generate @sub.  This routine subtracts all the extents
+ * mentioned in @sub from all the extents linked in @bitmap, which leaves
+ * @bitmap as the list of blocks that are not accounted for, which we assume
  * are the dead blocks of the old metadata structure.  The blocks mentioned in
- * @exlist can be reaped.
+ * @bitmap can be reaped.
+ *
+ * This is the logical equivalent of bitmap &= ~sub.
  */
 #define LEFT_ALIGNED	(1 << 0)
 #define RIGHT_ALIGNED	(1 << 1)
 int
-xrep_subtract_extents(
-	struct xfs_scrub	*sc,
-	struct xrep_extent_list	*exlist,
-	struct xrep_extent_list	*sublist)
+xfs_bitmap_disunion(
+	struct xfs_bitmap	*bitmap,
+	struct xfs_bitmap	*sub)
 {
 	struct list_head	*lp;
-	struct xrep_extent	*ex;
-	struct xrep_extent	*newex;
-	struct xrep_extent	*subex;
+	struct xfs_bitmap_range	*br;
+	struct xfs_bitmap_range	*new_br;
+	struct xfs_bitmap_range	*sub_br;
 	xfs_fsblock_t		sub_fsb;
-	xfs_extlen_t		sub_len;
+	xfs_fsblock_t		sub_len;
 	int			state;
 	int			error = 0;
 
-	if (list_empty(&exlist->list) || list_empty(&sublist->list))
+	if (list_empty(&bitmap->list) || list_empty(&sub->list))
 		return 0;
-	ASSERT(!list_empty(&sublist->list));
+	ASSERT(!list_empty(&sub->list));
 
-	list_sort(NULL, &exlist->list, xrep_btree_extent_cmp);
-	list_sort(NULL, &sublist->list, xrep_btree_extent_cmp);
+	list_sort(NULL, &bitmap->list, xrep_btree_extent_cmp);
+	list_sort(NULL, &sub->list, xrep_btree_extent_cmp);
 
 	/*
-	 * Now that we've sorted both lists, we iterate exlist once, rolling
-	 * forward through sublist and/or exlist as necessary until we find an
+	 * Now that we've sorted both lists, we iterate bitmap once, rolling
+	 * forward through sub and/or bitmap as necessary until we find an
 	 * overlap or reach the end of either list.  We do not reset lp to the
-	 * head of exlist nor do we reset subex to the head of sublist.  The
+	 * head of bitmap nor do we reset sub_br to the head of sub.  The
 	 * list traversal is similar to merge sort, but we're deleting
 	 * instead.  In this manner we avoid O(n^2) operations.
 	 */
-	subex = list_first_entry(&sublist->list, struct xrep_extent,
+	sub_br = list_first_entry(&sub->list, struct xfs_bitmap_range,
 			list);
-	lp = exlist->list.next;
-	while (lp != &exlist->list) {
-		ex = list_entry(lp, struct xrep_extent, list);
+	lp = bitmap->list.next;
+	while (lp != &bitmap->list) {
+		br = list_entry(lp, struct xfs_bitmap_range, list);
 
 		/*
-		 * Advance subex and/or ex until we find a pair that
+		 * Advance sub_br and/or br until we find a pair that
 		 * intersect or we run out of extents.
 		 */
-		while (subex->fsbno + subex->len <= ex->fsbno) {
-			if (list_is_last(&subex->list, &sublist->list))
+		while (sub_br->fsbno + sub_br->len <= br->fsbno) {
+			if (list_is_last(&sub_br->list, &sub->list))
 				goto out;
-			subex = list_next_entry(subex, list);
+			sub_br = list_next_entry(sub_br, list);
 		}
-		if (subex->fsbno >= ex->fsbno + ex->len) {
+		if (sub_br->fsbno >= br->fsbno + br->len) {
 			lp = lp->next;
 			continue;
 		}
 
-		/* trim subex to fit the extent we have */
-		sub_fsb = subex->fsbno;
-		sub_len = subex->len;
-		if (subex->fsbno < ex->fsbno) {
-			sub_len -= ex->fsbno - subex->fsbno;
-			sub_fsb = ex->fsbno;
+		/* trim sub_br to fit the extent we have */
+		sub_fsb = sub_br->fsbno;
+		sub_len = sub_br->len;
+		if (sub_br->fsbno < br->fsbno) {
+			sub_len -= br->fsbno - sub_br->fsbno;
+			sub_fsb = br->fsbno;
 		}
-		if (sub_len > ex->len)
-			sub_len = ex->len;
+		if (sub_len > br->len)
+			sub_len = br->len;
 
 		state = 0;
-		if (sub_fsb == ex->fsbno)
+		if (sub_fsb == br->fsbno)
 			state |= LEFT_ALIGNED;
-		if (sub_fsb + sub_len == ex->fsbno + ex->len)
+		if (sub_fsb + sub_len == br->fsbno + br->len)
 			state |= RIGHT_ALIGNED;
 		switch (state) {
 		case LEFT_ALIGNED:
 			/* Coincides with only the left. */
-			ex->fsbno += sub_len;
-			ex->len -= sub_len;
+			br->fsbno += sub_len;
+			br->len -= sub_len;
 			break;
 		case RIGHT_ALIGNED:
 			/* Coincides with only the right. */
-			ex->len -= sub_len;
+			br->len -= sub_len;
 			lp = lp->next;
 			break;
 		case LEFT_ALIGNED | RIGHT_ALIGNED:
-			/* Total overlap, just delete ex. */
+			/* Total overlap, just delete br. */
 			lp = lp->next;
-			list_del(&ex->list);
-			kmem_free(ex);
+			list_del(&br->list);
+			kmem_free(br);
 			break;
 		case 0:
 			/*
 			 * Deleting from the middle: add the new right extent
 			 * and then shrink the left extent.
 			 */
-			newex = kmem_alloc(sizeof(struct xrep_extent),
+			new_br = kmem_alloc(sizeof(struct xfs_bitmap_range),
 					KM_MAYFAIL);
-			if (!newex) {
+			if (!new_br) {
 				error = -ENOMEM;
 				goto out;
 			}
-			INIT_LIST_HEAD(&newex->list);
-			newex->fsbno = sub_fsb + sub_len;
-			newex->len = ex->fsbno + ex->len - newex->fsbno;
-			list_add(&newex->list, &ex->list);
-			ex->len = sub_fsb - ex->fsbno;
+			INIT_LIST_HEAD(&new_br->list);
+			new_br->fsbno = sub_fsb + sub_len;
+			new_br->len = br->fsbno + br->len - new_br->fsbno;
+			list_add(&new_br->list, &br->list);
+			br->len = sub_fsb - br->fsbno;
 			lp = lp->next;
 			break;
 		default:
diff --git a/fs/xfs/scrub/bitmap.h b/fs/xfs/scrub/bitmap.h
index 1038157695a8..3c39900e9269 100644
--- a/fs/xfs/scrub/bitmap.h
+++ b/fs/xfs/scrub/bitmap.h
@@ -6,32 +6,28 @@
 #ifndef __XFS_SCRUB_BITMAP_H__
 #define __XFS_SCRUB_BITMAP_H__
 
-struct xrep_extent {
+struct xfs_bitmap_range {
 	struct list_head	list;
 	xfs_fsblock_t		fsbno;
-	xfs_extlen_t		len;
+	xfs_fsblock_t		len;
 };
 
-struct xrep_extent_list {
+struct xfs_bitmap {
 	struct list_head	list;
 };
 
-static inline void
-xrep_init_extent_list(
-	struct xrep_extent_list		*exlist)
-{
-	INIT_LIST_HEAD(&exlist->list);
-}
+void xfs_bitmap_init(struct xfs_bitmap *bitmap);
+void xfs_bitmap_destroy(struct xfs_bitmap *bitmap);
 
-#define for_each_xrep_extent_safe(rbe, n, exlist) \
-	list_for_each_entry_safe((rbe), (n), &(exlist)->list, list)
-int xrep_collect_btree_extent(struct xfs_scrub *sc,
-		struct xrep_extent_list *btlist, xfs_fsblock_t fsbno,
-		xfs_extlen_t len);
-void xrep_cancel_btree_extents(struct xfs_scrub *sc,
-		struct xrep_extent_list *btlist);
-int xrep_subtract_extents(struct xfs_scrub *sc,
-		struct xrep_extent_list *exlist,
-		struct xrep_extent_list *sublist);
+#define for_each_xfs_bitmap_extent(bex, n, bitmap) \
+	list_for_each_entry_safe((bex), (n), &(bitmap)->list, list)
+
+#define for_each_xfs_bitmap_block(fsbno, bex, n, bitmap) \
+	list_for_each_entry_safe((bex), (n), &(bitmap)->list, list) \
+		for (fsbno = bex->fsbno; fsbno < bex->fsbno + bex->len; fsbno++)
+
+int xfs_bitmap_set(struct xfs_bitmap *bitmap, xfs_fsblock_t fsbno,
+		xfs_fsblock_t len);
+int xfs_bitmap_disunion(struct xfs_bitmap *bitmap, struct xfs_bitmap *sub);
 
 #endif	/* __XFS_SCRUB_BITMAP_H__ */
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index 27a904ef6189..85b048b341a0 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -368,17 +368,17 @@ xrep_init_btblock(
  *
  * However, that leaves the matter of removing all the metadata describing the
  * old broken structure.  For primary metadata we use the rmap data to collect
- * every extent with a matching rmap owner (exlist); we then iterate all other
+ * every extent with a matching rmap owner (bitmap); we then iterate all other
  * metadata structures with the same rmap owner to collect the extents that
- * cannot be removed (sublist).  We then subtract sublist from exlist to
+ * cannot be removed (sublist).  We then subtract sublist from bitmap to
  * derive the blocks that were used by the old btree.  These blocks can be
  * reaped.
  *
  * For rmapbt reconstructions we must use different tactics for extent
  * collection.  First we iterate all primary metadata (this excludes the old
  * rmapbt, obviously) to generate new rmap records.  The gaps in the rmap
- * records are collected as exlist.  The bnobt records are collected as
- * sublist.  As with the other btrees we subtract sublist from exlist, and the
+ * records are collected as bitmap.  The bnobt records are collected as
+ * sublist.  As with the other btrees we subtract sublist from bitmap, and the
  * result (since the rmapbt lives in the free space) are the blocks from the
  * old rmapbt.
  *
@@ -386,11 +386,11 @@ xrep_init_btblock(
  *
  * Now that we've constructed a new btree to replace the damaged one, we want
  * to dispose of the blocks that (we think) the old btree was using.
- * Previously, we used the rmapbt to collect the extents (exlist) with the
+ * Previously, we used the rmapbt to collect the extents (bitmap) with the
  * rmap owner corresponding to the tree we rebuilt, collected extents for any
  * blocks with the same rmap owner that are owned by another data structure
- * (sublist), and subtracted sublist from exlist.  In theory the extents
- * remaining in exlist are the old btree's blocks.
+ * (sublist), and subtracted sublist from bitmap.  In theory the extents
+ * remaining in bitmap are the old btree's blocks.
  *
  * Unfortunately, it's possible that the btree was crosslinked with other
  * blocks on disk.  The rmap data can tell us if there are multiple owners, so
@@ -406,7 +406,7 @@ xrep_init_btblock(
  * If there are no rmap records at all, we also free the block.  If the btree
  * being rebuilt lives in the free space (bnobt/cntbt/rmapbt) then there isn't
  * supposed to be a rmap record and everything is ok.  For other btrees there
- * had to have been an rmap entry for the block to have ended up on @exlist,
+ * had to have been an rmap entry for the block to have ended up on @bitmap,
  * so if it's gone now there's something wrong and the fs will shut down.
  *
  * Note: If there are multiple rmap records with only the same rmap owner as
@@ -419,7 +419,7 @@ xrep_init_btblock(
  * The caller is responsible for locking the AG headers for the entire rebuild
  * operation so that nothing else can sneak in and change the AG state while
  * we're not looking.  We also assume that the caller already invalidated any
- * buffers associated with @exlist.
+ * buffers associated with @bitmap.
  */
 
 /*
@@ -429,13 +429,12 @@ xrep_init_btblock(
 int
 xrep_invalidate_blocks(
 	struct xfs_scrub	*sc,
-	struct xrep_extent_list	*exlist)
+	struct xfs_bitmap	*bitmap)
 {
-	struct xrep_extent	*rex;
-	struct xrep_extent	*n;
+	struct xfs_bitmap_range	*bmr;
+	struct xfs_bitmap_range	*n;
 	struct xfs_buf		*bp;
 	xfs_fsblock_t		fsbno;
-	xfs_agblock_t		i;
 
 	/*
 	 * For each block in each extent, see if there's an incore buffer for
@@ -445,18 +444,16 @@ xrep_invalidate_blocks(
 	 * because we never own those; and if we can't TRYLOCK the buffer we
 	 * assume it's owned by someone else.
 	 */
-	for_each_xrep_extent_safe(rex, n, exlist) {
-		for (fsbno = rex->fsbno, i = rex->len; i > 0; fsbno++, i--) {
-			/* Skip AG headers and post-EOFS blocks */
-			if (!xfs_verify_fsbno(sc->mp, fsbno))
-				continue;
-			bp = xfs_buf_incore(sc->mp->m_ddev_targp,
-					XFS_FSB_TO_DADDR(sc->mp, fsbno),
-					XFS_FSB_TO_BB(sc->mp, 1), XBF_TRYLOCK);
-			if (bp) {
-				xfs_trans_bjoin(sc->tp, bp);
-				xfs_trans_binval(sc->tp, bp);
-			}
+	for_each_xfs_bitmap_block(fsbno, bmr, n, bitmap) {
+		/* Skip AG headers and post-EOFS blocks */
+		if (!xfs_verify_fsbno(sc->mp, fsbno))
+			continue;
+		bp = xfs_buf_incore(sc->mp->m_ddev_targp,
+				XFS_FSB_TO_DADDR(sc->mp, fsbno),
+				XFS_FSB_TO_BB(sc->mp, 1), XBF_TRYLOCK);
+		if (bp) {
+			xfs_trans_bjoin(sc->tp, bp);
+			xfs_trans_binval(sc->tp, bp);
 		}
 	}
 
@@ -519,9 +516,9 @@ xrep_put_freelist(
 	return 0;
 }
 
-/* Dispose of a single metadata block. */
+/* Dispose of a single block. */
 STATIC int
-xrep_dispose_btree_block(
+xrep_reap_block(
 	struct xfs_scrub	*sc,
 	xfs_fsblock_t		fsbno,
 	struct xfs_owner_info	*oinfo,
@@ -593,41 +590,35 @@ xrep_dispose_btree_block(
 	return error;
 }
 
-/* Dispose of btree blocks from an old per-AG btree. */
+/* Dispose of every block of every extent in the bitmap. */
 int
-xrep_reap_btree_extents(
+xrep_reap_extents(
 	struct xfs_scrub	*sc,
-	struct xrep_extent_list	*exlist,
+	struct xfs_bitmap	*bitmap,
 	struct xfs_owner_info	*oinfo,
 	enum xfs_ag_resv_type	type)
 {
-	struct xrep_extent	*rex;
-	struct xrep_extent	*n;
+	struct xfs_bitmap_range	*bmr;
+	struct xfs_bitmap_range	*n;
+	xfs_fsblock_t		fsbno;
 	int			error = 0;
 
 	ASSERT(xfs_sb_version_hasrmapbt(&sc->mp->m_sb));
 
-	/* Dispose of every block from the old btree. */
-	for_each_xrep_extent_safe(rex, n, exlist) {
+	for_each_xfs_bitmap_block(fsbno, bmr, n, bitmap) {
 		ASSERT(sc->ip != NULL ||
-		       XFS_FSB_TO_AGNO(sc->mp, rex->fsbno) == sc->sa.agno);
-
+		       XFS_FSB_TO_AGNO(sc->mp, fsbno) == sc->sa.agno);
 		trace_xrep_dispose_btree_extent(sc->mp,
-				XFS_FSB_TO_AGNO(sc->mp, rex->fsbno),
-				XFS_FSB_TO_AGBNO(sc->mp, rex->fsbno), rex->len);
+				XFS_FSB_TO_AGNO(sc->mp, fsbno),
+				XFS_FSB_TO_AGBNO(sc->mp, fsbno), 1);
 
-		for (; rex->len > 0; rex->len--, rex->fsbno++) {
-			error = xrep_dispose_btree_block(sc, rex->fsbno,
-					oinfo, type);
-			if (error)
-				goto out;
-		}
-		list_del(&rex->list);
-		kmem_free(rex);
+		error = xrep_reap_block(sc, fsbno, oinfo, type);
+		if (error)
+			goto out;
 	}
 
 out:
-	xrep_cancel_btree_extents(sc, exlist);
+	xfs_bitmap_destroy(bitmap);
 	return error;
 }
 
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index a3d491a438f4..5a4e92221916 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -27,13 +27,11 @@ int xrep_init_btblock(struct xfs_scrub *sc, xfs_fsblock_t fsb,
 		struct xfs_buf **bpp, xfs_btnum_t btnum,
 		const struct xfs_buf_ops *ops);
 
-struct xrep_extent_list;
+struct xfs_bitmap;
 
 int xrep_fix_freelist(struct xfs_scrub *sc, bool can_shrink);
-int xrep_invalidate_blocks(struct xfs_scrub *sc,
-		struct xrep_extent_list *btlist);
-int xrep_reap_btree_extents(struct xfs_scrub *sc,
-		struct xrep_extent_list *exlist,
+int xrep_invalidate_blocks(struct xfs_scrub *sc, struct xfs_bitmap *bitmap);
+int xrep_reap_extents(struct xfs_scrub *sc, struct xfs_bitmap *bitmap,
 		struct xfs_owner_info *oinfo, enum xfs_ag_resv_type type);
 
 struct xrep_find_ag_btree {
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 93db22c39b51..4e20f0e48232 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -511,7 +511,6 @@ DEFINE_EVENT(xrep_extent_class, name, \
 		 xfs_agblock_t agbno, xfs_extlen_t len), \
 	TP_ARGS(mp, agno, agbno, len))
 DEFINE_REPAIR_EXTENT_EVENT(xrep_dispose_btree_extent);
-DEFINE_REPAIR_EXTENT_EVENT(xrep_collect_btree_extent);
 DEFINE_REPAIR_EXTENT_EVENT(xrep_agfl_insert);
 
 DECLARE_EVENT_CLASS(xrep_rmap_class,



* [PATCH 04/16] xfs: repair the AGF
  2018-07-26  0:19 [PATCH v17 00/16] xfs-4.19: online repair support Darrick J. Wong
                   ` (2 preceding siblings ...)
  2018-07-26  0:19 ` [PATCH 03/16] xfs: refactor the xrep_extent_list into xfs_bitmap Darrick J. Wong
@ 2018-07-26  0:19 ` Darrick J. Wong
  2018-07-27 14:23   ` Brian Foster
  2018-07-26  0:20 ` [PATCH 05/16] xfs: repair the AGFL Darrick J. Wong
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2018-07-26  0:19 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <darrick.wong@oracle.com>

Regenerate the AGF from the rmap data.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/agheader_repair.c |  366 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.c          |   27 ++-
 fs/xfs/scrub/repair.h          |    4 
 fs/xfs/scrub/scrub.c           |    2 
 4 files changed, 389 insertions(+), 10 deletions(-)
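
For anyone following the counter arithmetic in xrep_agf_calc_from_btrees(),
here is a minimal userspace sketch of it.  It is not kernel code: struct
free_rec, struct agf_counters and agf_counters_calc() are hypothetical names
made up for illustration, and the per-btree block counts are passed in
directly rather than obtained by walking a btree.  It assumes, as the patch
does, that freeblks and longest come from the bnobt records, that btreeblks
excludes the root block of each of the bnobt, cntbt and rmapbt, and that
rmap_blocks counts every rmapbt block including the root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one free space (bnobt) record. */
struct free_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
};

/* Hypothetical stand-in for the AGF counters that get recomputed. */
struct agf_counters {
	uint32_t	freeblks;	/* sum of all free extent lengths */
	uint32_t	longest;	/* longest single free extent */
	uint32_t	btreeblks;	/* non-root bnobt/cntbt/rmapbt blocks */
	uint32_t	rmap_blocks;	/* all rmapbt blocks, root included */
};

/* Derive the counters from free space records and btree block counts. */
static struct agf_counters
agf_counters_calc(
	const struct free_rec	*recs,
	size_t			nr,
	uint32_t		bnobt_blocks,
	uint32_t		cntbt_blocks,
	uint32_t		rmapbt_blocks)
{
	struct agf_counters	c = { 0, 0, 0, 0 };
	size_t			i;

	assert(nr == 0 || recs != NULL);
	/* Every btree has at least a root block. */
	assert(bnobt_blocks >= 1 && cntbt_blocks >= 1 && rmapbt_blocks >= 1);

	for (i = 0; i < nr; i++) {
		c.freeblks += recs[i].blockcount;
		if (recs[i].blockcount > c.longest)
			c.longest = recs[i].blockcount;
	}

	/* The root block of each btree is left out of btreeblks. */
	c.btreeblks = (bnobt_blocks - 1) + (cntbt_blocks - 1) +
		      (rmapbt_blocks - 1);
	c.rmap_blocks = rmapbt_blocks;
	return c;
}
```

With free extents of 5, 12 and 3 blocks, a single-block bnobt, a two-block
cntbt and a four-block rmapbt, the sketch yields freeblks = 20, longest = 12,
btreeblks = 4 and rmap_blocks = 4.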


diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
index 1e96621ece3a..938af216cb1c 100644
--- a/fs/xfs/scrub/agheader_repair.c
+++ b/fs/xfs/scrub/agheader_repair.c
@@ -17,12 +17,19 @@
 #include "xfs_sb.h"
 #include "xfs_inode.h"
 #include "xfs_alloc.h"
+#include "xfs_alloc_btree.h"
 #include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
 #include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_refcount.h"
+#include "xfs_refcount_btree.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/trace.h"
+#include "scrub/repair.h"
+#include "scrub/bitmap.h"
 
 /* Superblock */
 
@@ -54,3 +61,362 @@ xrep_superblock(
 	xfs_trans_log_buf(sc->tp, bp, 0, BBTOB(bp->b_length) - 1);
 	return error;
 }
+
+/* AGF */
+
+struct xrep_agf_allocbt {
+	struct xfs_scrub	*sc;
+	xfs_agblock_t		freeblks;
+	xfs_agblock_t		longest;
+};
+
+/* Record free space shape information. */
+STATIC int
+xrep_agf_walk_allocbt(
+	struct xfs_btree_cur		*cur,
+	struct xfs_alloc_rec_incore	*rec,
+	void				*priv)
+{
+	struct xrep_agf_allocbt		*raa = priv;
+	int				error = 0;
+
+	if (xchk_should_terminate(raa->sc, &error))
+		return error;
+
+	raa->freeblks += rec->ar_blockcount;
+	if (rec->ar_blockcount > raa->longest)
+		raa->longest = rec->ar_blockcount;
+	return error;
+}
+
+/* Does this AGFL block look sane? */
+STATIC int
+xrep_agf_check_agfl_block(
+	struct xfs_mount	*mp,
+	xfs_agblock_t		agbno,
+	void			*priv)
+{
+	struct xfs_scrub	*sc = priv;
+
+	if (!xfs_verify_agbno(mp, sc->sa.agno, agbno))
+		return -EFSCORRUPTED;
+	return 0;
+}
+
+/*
+ * Offset within the xrep_find_ag_btree array for each btree type.  Avoid the
+ * XFS_BTNUM_ names here to avoid creating a sparse array.
+ */
+enum {
+	XREP_AGF_BNOBT = 0,
+	XREP_AGF_CNTBT,
+	XREP_AGF_RMAPBT,
+	XREP_AGF_REFCOUNTBT,
+	XREP_AGF_END,
+	XREP_AGF_MAX
+};
+
+/*
+ * Given the btree roots described by *fab, find the roots, check them for
+ * sanity, and pass the root data back out via *fab.
+ *
+ * This is /also/ a chicken and egg problem because we have to use the rmapbt
+ * (rooted in the AGF) to find the btrees rooted in the AGF.  We also have no
+ * idea if the btrees make any sense.  If we hit obvious corruptions in those
+ * btrees we'll bail out.
+ */
+STATIC int
+xrep_agf_find_btrees(
+	struct xfs_scrub		*sc,
+	struct xfs_buf			*agf_bp,
+	struct xrep_find_ag_btree	*fab,
+	struct xfs_buf			*agfl_bp)
+{
+	struct xfs_agf			*old_agf = XFS_BUF_TO_AGF(agf_bp);
+	int				error;
+
+	/* Go find the root data. */
+	error = xrep_find_ag_btree_roots(sc, agf_bp, fab, agfl_bp);
+	if (error)
+		return error;
+
+	/* We must find the bnobt, cntbt, and rmapbt roots. */
+	if (fab[XREP_AGF_BNOBT].root == NULLAGBLOCK ||
+	    fab[XREP_AGF_BNOBT].height > XFS_BTREE_MAXLEVELS ||
+	    fab[XREP_AGF_CNTBT].root == NULLAGBLOCK ||
+	    fab[XREP_AGF_CNTBT].height > XFS_BTREE_MAXLEVELS ||
+	    fab[XREP_AGF_RMAPBT].root == NULLAGBLOCK ||
+	    fab[XREP_AGF_RMAPBT].height > XFS_BTREE_MAXLEVELS)
+		return -EFSCORRUPTED;
+
+	/*
+	 * We relied on the rmapbt to reconstruct the AGF.  If we get a
+	 * different root then something's seriously wrong.
+	 */
+	if (fab[XREP_AGF_RMAPBT].root !=
+	    be32_to_cpu(old_agf->agf_roots[XFS_BTNUM_RMAPi]))
+		return -EFSCORRUPTED;
+
+	/* We must find the refcountbt root if that feature is enabled. */
+	if (xfs_sb_version_hasreflink(&sc->mp->m_sb) &&
+	    (fab[XREP_AGF_REFCOUNTBT].root == NULLAGBLOCK ||
+	     fab[XREP_AGF_REFCOUNTBT].height > XFS_BTREE_MAXLEVELS))
+		return -EFSCORRUPTED;
+
+	return 0;
+}
+
+/*
+ * Reinitialize the AGF header, making an in-core copy of the old contents so
+ * that we know which in-core state needs to be reinitialized.
+ */
+STATIC void
+xrep_agf_init_header(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*agf_bp,
+	struct xfs_agf		*old_agf)
+{
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agf_bp);
+
+	memcpy(old_agf, agf, sizeof(*old_agf));
+	memset(agf, 0, BBTOB(agf_bp->b_length));
+	agf->agf_magicnum = cpu_to_be32(XFS_AGF_MAGIC);
+	agf->agf_versionnum = cpu_to_be32(XFS_AGF_VERSION);
+	agf->agf_seqno = cpu_to_be32(sc->sa.agno);
+	agf->agf_length = cpu_to_be32(xfs_ag_block_count(mp, sc->sa.agno));
+	agf->agf_flfirst = old_agf->agf_flfirst;
+	agf->agf_fllast = old_agf->agf_fllast;
+	agf->agf_flcount = old_agf->agf_flcount;
+	if (xfs_sb_version_hascrc(&mp->m_sb))
+		uuid_copy(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid);
+
+	/* Mark the incore AGF data stale until we're done fixing things. */
+	ASSERT(sc->sa.pag->pagf_init);
+	sc->sa.pag->pagf_init = 0;
+}
+
+/* Set btree root information in an AGF. */
+STATIC void
+xrep_agf_set_roots(
+	struct xfs_scrub		*sc,
+	struct xfs_agf			*agf,
+	struct xrep_find_ag_btree	*fab)
+{
+	agf->agf_roots[XFS_BTNUM_BNOi] =
+			cpu_to_be32(fab[XREP_AGF_BNOBT].root);
+	agf->agf_levels[XFS_BTNUM_BNOi] =
+			cpu_to_be32(fab[XREP_AGF_BNOBT].height);
+
+	agf->agf_roots[XFS_BTNUM_CNTi] =
+			cpu_to_be32(fab[XREP_AGF_CNTBT].root);
+	agf->agf_levels[XFS_BTNUM_CNTi] =
+			cpu_to_be32(fab[XREP_AGF_CNTBT].height);
+
+	agf->agf_roots[XFS_BTNUM_RMAPi] =
+			cpu_to_be32(fab[XREP_AGF_RMAPBT].root);
+	agf->agf_levels[XFS_BTNUM_RMAPi] =
+			cpu_to_be32(fab[XREP_AGF_RMAPBT].height);
+
+	if (xfs_sb_version_hasreflink(&sc->mp->m_sb)) {
+		agf->agf_refcount_root =
+				cpu_to_be32(fab[XREP_AGF_REFCOUNTBT].root);
+		agf->agf_refcount_level =
+				cpu_to_be32(fab[XREP_AGF_REFCOUNTBT].height);
+	}
+}
+
+/* Update all AGF fields which derive from btree contents. */
+STATIC int
+xrep_agf_calc_from_btrees(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*agf_bp)
+{
+	struct xrep_agf_allocbt	raa = { .sc = sc };
+	struct xfs_btree_cur	*cur = NULL;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agf_bp);
+	struct xfs_mount	*mp = sc->mp;
+	xfs_agblock_t		btreeblks;
+	xfs_agblock_t		blocks;
+	int			error;
+
+	/* Update the AGF counters from the bnobt. */
+	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
+			XFS_BTNUM_BNO);
+	error = xfs_alloc_query_all(cur, xrep_agf_walk_allocbt, &raa);
+	if (error)
+		goto err;
+	error = xfs_btree_count_blocks(cur, &blocks);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, error);
+	btreeblks = blocks - 1;
+	agf->agf_freeblks = cpu_to_be32(raa.freeblks);
+	agf->agf_longest = cpu_to_be32(raa.longest);
+
+	/* Update the AGF counters from the cntbt. */
+	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
+			XFS_BTNUM_CNT);
+	error = xfs_btree_count_blocks(cur, &blocks);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, error);
+	btreeblks += blocks - 1;
+
+	/* Update the AGF counters from the rmapbt. */
+	cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno);
+	error = xfs_btree_count_blocks(cur, &blocks);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, error);
+	agf->agf_rmap_blocks = cpu_to_be32(blocks);
+	btreeblks += blocks - 1;
+
+	agf->agf_btreeblks = cpu_to_be32(btreeblks);
+
+	/* Update the AGF counters from the refcountbt. */
+	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+		cur = xfs_refcountbt_init_cursor(mp, sc->tp, agf_bp,
+				sc->sa.agno, NULL);
+		error = xfs_btree_count_blocks(cur, &blocks);
+		if (error)
+			goto err;
+		xfs_btree_del_cursor(cur, error);
+		agf->agf_refcount_blocks = cpu_to_be32(blocks);
+	}
+
+	return 0;
+err:
+	xfs_btree_del_cursor(cur, error);
+	return error;
+}
+
+/* Commit the new AGF and reinitialize the incore state. */
+STATIC int
+xrep_agf_commit_new(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*agf_bp)
+{
+	struct xfs_perag	*pag;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agf_bp);
+
+	/* Trigger fdblocks recalculation */
+	xfs_force_summary_recalc(sc->mp);
+
+	/* Write this to disk. */
+	xfs_trans_buf_set_type(sc->tp, agf_bp, XFS_BLFT_AGF_BUF);
+	xfs_trans_log_buf(sc->tp, agf_bp, 0, BBTOB(agf_bp->b_length) - 1);
+
+	/* Now reinitialize the in-core counters we changed. */
+	pag = sc->sa.pag;
+	sc->sa.pag->pagf_init = 1;
+	pag->pagf_btreeblks = be32_to_cpu(agf->agf_btreeblks);
+	pag->pagf_freeblks = be32_to_cpu(agf->agf_freeblks);
+	pag->pagf_longest = be32_to_cpu(agf->agf_longest);
+	pag->pagf_levels[XFS_BTNUM_BNOi] =
+			be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]);
+	pag->pagf_levels[XFS_BTNUM_CNTi] =
+			be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
+	pag->pagf_levels[XFS_BTNUM_RMAPi] =
+			be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
+	pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level);
+
+	return 0;
+}
+
+/* Repair the AGF. v5 filesystems only. */
+int
+xrep_agf(
+	struct xfs_scrub		*sc)
+{
+	struct xrep_find_ag_btree	fab[XREP_AGF_MAX] = {
+		[XREP_AGF_BNOBT] = {
+			.rmap_owner = XFS_RMAP_OWN_AG,
+			.buf_ops = &xfs_allocbt_buf_ops,
+			.magic = XFS_ABTB_CRC_MAGIC,
+		},
+		[XREP_AGF_CNTBT] = {
+			.rmap_owner = XFS_RMAP_OWN_AG,
+			.buf_ops = &xfs_allocbt_buf_ops,
+			.magic = XFS_ABTC_CRC_MAGIC,
+		},
+		[XREP_AGF_RMAPBT] = {
+			.rmap_owner = XFS_RMAP_OWN_AG,
+			.buf_ops = &xfs_rmapbt_buf_ops,
+			.magic = XFS_RMAP_CRC_MAGIC,
+		},
+		[XREP_AGF_REFCOUNTBT] = {
+			.rmap_owner = XFS_RMAP_OWN_REFC,
+			.buf_ops = &xfs_refcountbt_buf_ops,
+			.magic = XFS_REFC_CRC_MAGIC,
+		},
+		[XREP_AGF_END] = {
+			.buf_ops = NULL,
+		},
+	};
+	struct xfs_agf			old_agf;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*agf_bp;
+	struct xfs_buf			*agfl_bp;
+	struct xfs_agf			*agf;
+	int				error;
+
+	/* We require the rmapbt to rebuild anything. */
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return -EOPNOTSUPP;
+
+	xchk_perag_get(sc->mp, &sc->sa);
+	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGF_DADDR(mp)),
+			XFS_FSS_TO_BB(mp, 1), 0, &agf_bp, NULL);
+	if (error)
+		return error;
+	agf_bp->b_ops = &xfs_agf_buf_ops;
+	agf = XFS_BUF_TO_AGF(agf_bp);
+
+	/*
+	 * Load the AGFL so that we can screen out OWN_AG blocks that are on
+	 * the AGFL now; these blocks might have once been part of the
+	 * bno/cnt/rmap btrees but are not now.  This is a chicken and egg
+	 * problem: the AGF is corrupt, so we have to trust the AGFL contents
+	 * because we can't do any serious cross-referencing with any of the
+	 * btrees rooted in the AGF.  If the AGFL contents are obviously bad
+	 * then we'll bail out.
+	 */
+	error = xfs_alloc_read_agfl(mp, sc->tp, sc->sa.agno, &agfl_bp);
+	if (error)
+		return error;
+
+	/*
+	 * Spot-check the AGFL blocks; if they're obviously corrupt then
+	 * there's nothing we can do but bail out.
+	 */
+	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(agf_bp), agfl_bp,
+			xrep_agf_check_agfl_block, sc);
+	if (error)
+		return error;
+
+	/*
+	 * Find the AGF btree roots.  This is also a chicken-and-egg situation;
+	 * see the function for more details.
+	 */
+	error = xrep_agf_find_btrees(sc, agf_bp, fab, agfl_bp);
+	if (error)
+		return error;
+
+	/* Start rewriting the header and implant the btrees we found. */
+	xrep_agf_init_header(sc, agf_bp, &old_agf);
+	xrep_agf_set_roots(sc, agf, fab);
+	error = xrep_agf_calc_from_btrees(sc, agf_bp);
+	if (error)
+		goto out_revert;
+
+	/* Commit the changes and reinitialize incore state. */
+	return xrep_agf_commit_new(sc, agf_bp);
+
+out_revert:
+	/* Mark the incore AGF state stale and revert the AGF. */
+	sc->sa.pag->pagf_init = 0;
+	memcpy(agf, &old_agf, sizeof(old_agf));
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index 85b048b341a0..17cf48564390 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -128,9 +128,12 @@ xrep_roll_ag_trans(
 	int			error;
 
 	/* Keep the AG header buffers locked so we can keep going. */
-	xfs_trans_bhold(sc->tp, sc->sa.agi_bp);
-	xfs_trans_bhold(sc->tp, sc->sa.agf_bp);
-	xfs_trans_bhold(sc->tp, sc->sa.agfl_bp);
+	if (sc->sa.agi_bp)
+		xfs_trans_bhold(sc->tp, sc->sa.agi_bp);
+	if (sc->sa.agf_bp)
+		xfs_trans_bhold(sc->tp, sc->sa.agf_bp);
+	if (sc->sa.agfl_bp)
+		xfs_trans_bhold(sc->tp, sc->sa.agfl_bp);
 
 	/* Roll the transaction. */
 	error = xfs_trans_roll(&sc->tp);
@@ -138,9 +141,12 @@ xrep_roll_ag_trans(
 		goto out_release;
 
 	/* Join AG headers to the new transaction. */
-	xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
-	xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
-	xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp);
+	if (sc->sa.agi_bp)
+		xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
+	if (sc->sa.agf_bp)
+		xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
+	if (sc->sa.agfl_bp)
+		xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp);
 
 	return 0;
 
@@ -150,9 +156,12 @@ xrep_roll_ag_trans(
 	 * buffers will be released during teardown on our way out
 	 * of the kernel.
 	 */
-	xfs_trans_bhold_release(sc->tp, sc->sa.agi_bp);
-	xfs_trans_bhold_release(sc->tp, sc->sa.agf_bp);
-	xfs_trans_bhold_release(sc->tp, sc->sa.agfl_bp);
+	if (sc->sa.agi_bp)
+		xfs_trans_bhold_release(sc->tp, sc->sa.agi_bp);
+	if (sc->sa.agf_bp)
+		xfs_trans_bhold_release(sc->tp, sc->sa.agf_bp);
+	if (sc->sa.agfl_bp)
+		xfs_trans_bhold_release(sc->tp, sc->sa.agfl_bp);
 
 	return error;
 }
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 5a4e92221916..1d283360b5ab 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -58,6 +58,8 @@ int xrep_ino_dqattach(struct xfs_scrub *sc);
 
 int xrep_probe(struct xfs_scrub *sc);
 int xrep_superblock(struct xfs_scrub *sc);
+int xrep_agf(struct xfs_scrub *sc);
+int xrep_agfl(struct xfs_scrub *sc);
 
 #else
 
@@ -81,6 +83,8 @@ xrep_calc_ag_resblks(
 
 #define xrep_probe			xrep_notsupported
 #define xrep_superblock			xrep_notsupported
+#define xrep_agf			xrep_notsupported
+#define xrep_agfl			xrep_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 6efb926f3cf8..1e8a17c8e2b9 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -214,7 +214,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.type	= ST_PERAG,
 		.setup	= xchk_setup_fs,
 		.scrub	= xchk_agf,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_agf,
 	},
 	[XFS_SCRUB_TYPE_AGFL]= {	/* agfl */
 		.type	= ST_PERAG,



* [PATCH 05/16] xfs: repair the AGFL
  2018-07-26  0:19 [PATCH v17 00/16] xfs-4.19: online repair support Darrick J. Wong
                   ` (3 preceding siblings ...)
  2018-07-26  0:19 ` [PATCH 04/16] xfs: repair the AGF Darrick J. Wong
@ 2018-07-26  0:20 ` Darrick J. Wong
  2018-07-26  0:20 ` [PATCH 06/16] xfs: repair the AGI Darrick J. Wong
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-07-26  0:20 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <darrick.wong@oracle.com>

Repair the AGFL from the rmap data.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/agheader_repair.c |  272 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/bitmap.c          |   92 ++++++++++++++
 fs/xfs/scrub/bitmap.h          |    4 +
 fs/xfs/scrub/scrub.c           |    2 
 4 files changed, 369 insertions(+), 1 deletion(-)
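
The AGFL repair works by subtraction: collect every OWN_AG block from the
rmapbt, collect every block in use by the bnobt, cntbt and rmapbt, and
treat the difference as the old AGFL.  Below is a minimal userspace sketch
of that set difference (bitmap &= ~sub) over sorted ranges.  It is not the
kernel implementation: struct range and range_disunion() are hypothetical
names, and the sketch uses flat arrays that are assumed to be pre-sorted
and non-overlapping, where xfs_bitmap_disunion() sorts and edits linked
lists in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_bitmap_range. */
struct range {
	uint64_t	start;
	uint64_t	len;
};

/*
 * Write (a & ~sub) to out and return the number of ranges written.
 * Both inputs must be sorted by start and free of internal overlaps.
 * out needs room for na + nsub entries.
 */
static size_t
range_disunion(
	const struct range	*a,
	size_t			na,
	const struct range	*sub,
	size_t			nsub,
	struct range		*out)
{
	size_t			i, k;
	size_t			j = 0;
	size_t			nout = 0;

	assert(out != NULL);

	for (i = 0; i < na; i++) {
		uint64_t	pos = a[i].start;
		uint64_t	end = a[i].start + a[i].len;

		/* Skip sub ranges that end at or before this range starts. */
		while (j < nsub && sub[j].start + sub[j].len <= pos)
			j++;

		/* Carve out every sub range overlapping [pos, end). */
		for (k = j; k < nsub && sub[k].start < end; k++) {
			if (sub[k].start > pos) {
				out[nout].start = pos;
				out[nout].len = sub[k].start - pos;
				nout++;
			}
			if (sub[k].start + sub[k].len > pos)
				pos = sub[k].start + sub[k].len;
		}

		/* Keep whatever is left to the right of the last cut. */
		if (pos < end) {
			out[nout].start = pos;
			out[nout].len = end - pos;
			nout++;
		}
	}
	return nout;
}
```

Because neither cursor is ever rewound past ranges that have already been
ruled out, one pass over both inputs suffices once they are sorted, which
is the same reason the list-based version avoids O(n^2) behaviour.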


diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
index 938af216cb1c..407fee8d94dd 100644
--- a/fs/xfs/scrub/agheader_repair.c
+++ b/fs/xfs/scrub/agheader_repair.c
@@ -420,3 +420,275 @@ xrep_agf(
 	memcpy(agf, &old_agf, sizeof(old_agf));
 	return error;
 }
+
+/* AGFL */
+
+struct xrep_agfl {
+	/* Bitmap of other OWN_AG metadata blocks. */
+	struct xfs_bitmap	agmetablocks;
+
+	/* Bitmap of free space. */
+	struct xfs_bitmap	*freesp;
+
+	struct xfs_scrub	*sc;
+};
+
+/* Record all OWN_AG (free space btree) information from the rmap data. */
+STATIC int
+xrep_agfl_walk_rmap(
+	struct xfs_btree_cur	*cur,
+	struct xfs_rmap_irec	*rec,
+	void			*priv)
+{
+	struct xrep_agfl	*ra = priv;
+	xfs_fsblock_t		fsb;
+	int			error = 0;
+
+	if (xchk_should_terminate(ra->sc, &error))
+		return error;
+
+	/* Record all the OWN_AG blocks. */
+	if (rec->rm_owner == XFS_RMAP_OWN_AG) {
+		fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
+				rec->rm_startblock);
+		error = xfs_bitmap_set(ra->freesp, fsb, rec->rm_blockcount);
+		if (error)
+			return error;
+	}
+
+	return xfs_bitmap_set_btcur_path(&ra->agmetablocks, cur);
+}
+
+/*
+ * Map out all the non-AGFL OWN_AG space in this AG so that we can deduce
+ * which blocks belong to the AGFL.
+ *
+ * Compute the set of old AGFL blocks by subtracting from the list of OWN_AG
+ * blocks the list of blocks owned by all other OWN_AG metadata (bnobt, cntbt,
+ * rmapbt).  These are the old AGFL blocks, so return that list and the number
+ * of blocks we're actually going to put back on the AGFL.
+ */
+STATIC int
+xrep_agfl_collect_blocks(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*agf_bp,
+	struct xfs_bitmap	*agfl_extents,
+	xfs_agblock_t		*flcount)
+{
+	struct xrep_agfl	ra;
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_btree_cur	*cur;
+	struct xfs_bitmap_range	*br;
+	struct xfs_bitmap_range	*n;
+	int			error;
+
+	ra.sc = sc;
+	ra.freesp = agfl_extents;
+	xfs_bitmap_init(&ra.agmetablocks);
+
+	/* Find all space used by the free space btrees & rmapbt. */
+	cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno);
+	error = xfs_rmap_query_all(cur, xrep_agfl_walk_rmap, &ra);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, error);
+
+	/* Find all blocks currently being used by the bnobt. */
+	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
+			XFS_BTNUM_BNO);
+	error = xfs_bitmap_set_btblocks(&ra.agmetablocks, cur);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, error);
+
+	/* Find all blocks currently being used by the cntbt. */
+	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
+			XFS_BTNUM_CNT);
+	error = xfs_bitmap_set_btblocks(&ra.agmetablocks, cur);
+	if (error)
+		goto err;
+
+	xfs_btree_del_cursor(cur, error);
+
+	/*
+	 * Drop the freesp meta blocks that are in use by btrees.
+	 * The remaining blocks /should/ be AGFL blocks.
+	 */
+	error = xfs_bitmap_disunion(agfl_extents, &ra.agmetablocks);
+	xfs_bitmap_destroy(&ra.agmetablocks);
+	if (error)
+		return error;
+
+	/*
+	 * Calculate the new AGFL size.  If we found more blocks than fit in
+	 * the AGFL we'll free them later.
+	 */
+	*flcount = 0;
+	for_each_xfs_bitmap_extent(br, n, agfl_extents) {
+		*flcount += br->len;
+		if (*flcount > xfs_agfl_size(mp))
+			break;
+	}
+	if (*flcount > xfs_agfl_size(mp))
+		*flcount = xfs_agfl_size(mp);
+	return 0;
+
+err:
+	xfs_bitmap_destroy(&ra.agmetablocks);
+	xfs_btree_del_cursor(cur, error);
+	return error;
+}
+
+/* Update the AGF and reset the in-core state. */
+STATIC int
+xrep_agfl_update_agf(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*agf_bp,
+	xfs_agblock_t		flcount)
+{
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agf_bp);
+
+	ASSERT(flcount <= xfs_agfl_size(sc->mp));
+
+	/* Trigger fdblocks recalculation */
+	xfs_force_summary_recalc(sc->mp);
+
+	/* Update the AGF counters. */
+	if (sc->sa.pag->pagf_init)
+		sc->sa.pag->pagf_flcount = flcount;
+	agf->agf_flfirst = cpu_to_be32(0);
+	agf->agf_flcount = cpu_to_be32(flcount);
+	agf->agf_fllast = cpu_to_be32(flcount - 1);
+
+	xfs_alloc_log_agf(sc->tp, agf_bp,
+			XFS_AGF_FLFIRST | XFS_AGF_FLLAST | XFS_AGF_FLCOUNT);
+	return 0;
+}
+
+/* Write out a totally new AGFL. */
+STATIC void
+xrep_agfl_init_header(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*agfl_bp,
+	struct xfs_bitmap	*agfl_extents,
+	xfs_agblock_t		flcount)
+{
+	struct xfs_mount	*mp = sc->mp;
+	__be32			*agfl_bno;
+	struct xfs_bitmap_range	*br;
+	struct xfs_bitmap_range	*n;
+	struct xfs_agfl		*agfl;
+	xfs_agblock_t		agbno;
+	unsigned int		fl_off;
+
+	ASSERT(flcount <= xfs_agfl_size(mp));
+
+	/* Start rewriting the header. */
+	agfl = XFS_BUF_TO_AGFL(agfl_bp);
+	memset(agfl, 0xFF, BBTOB(agfl_bp->b_length));
+	agfl->agfl_magicnum = cpu_to_be32(XFS_AGFL_MAGIC);
+	agfl->agfl_seqno = cpu_to_be32(sc->sa.agno);
+	uuid_copy(&agfl->agfl_uuid, &mp->m_sb.sb_meta_uuid);
+
+	/*
+	 * Fill the AGFL with the remaining blocks.  If agfl_extents has more
+	 * blocks than fit in the AGFL, they will be freed in a subsequent
+	 * step.
+	 */
+	fl_off = 0;
+	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agfl_bp);
+	for_each_xfs_bitmap_extent(br, n, agfl_extents) {
+		agbno = XFS_FSB_TO_AGBNO(mp, br->fsbno);
+
+		trace_xrep_agfl_insert(mp, sc->sa.agno, agbno, br->len);
+
+		while (br->len > 0 && fl_off < flcount) {
+			agfl_bno[fl_off] = cpu_to_be32(agbno);
+			fl_off++;
+			agbno++;
+			br->fsbno++;
+			br->len--;
+		}
+
+		if (br->len)
+			break;
+		list_del(&br->list);
+		kmem_free(br);
+	}
+
+	/* Write new AGFL to disk. */
+	xfs_trans_buf_set_type(sc->tp, agfl_bp, XFS_BLFT_AGFL_BUF);
+	xfs_trans_log_buf(sc->tp, agfl_bp, 0, BBTOB(agfl_bp->b_length) - 1);
+}
+
+/* Repair the AGFL. */
+int
+xrep_agfl(
+	struct xfs_scrub	*sc)
+{
+	struct xfs_owner_info	oinfo;
+	struct xfs_bitmap	agfl_extents;
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_buf		*agf_bp;
+	struct xfs_buf		*agfl_bp;
+	xfs_agblock_t		flcount;
+	int			error;
+
+	/* We require the rmapbt to rebuild anything. */
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return -EOPNOTSUPP;
+
+	xchk_perag_get(sc->mp, &sc->sa);
+	xfs_bitmap_init(&agfl_extents);
+
+	/*
+	 * Read the AGF so that we can query the rmapbt.  We hope that there's
+	 * nothing wrong with the AGF, but all the AG header repair functions
+	 * have this chicken-and-egg problem.
+	 */
+	error = xfs_alloc_read_agf(mp, sc->tp, sc->sa.agno, 0, &agf_bp);
+	if (error)
+		return error;
+	if (!agf_bp)
+		return -ENOMEM;
+
+	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGFL_DADDR(mp)),
+			XFS_FSS_TO_BB(mp, 1), 0, &agfl_bp, NULL);
+	if (error)
+		return error;
+	agfl_bp->b_ops = &xfs_agfl_buf_ops;
+
+	/* Gather all the extents we're going to put on the new AGFL. */
+	error = xrep_agfl_collect_blocks(sc, agf_bp, &agfl_extents, &flcount);
+	if (error)
+		goto err;
+
+	/*
+	 * Update AGF and AGFL.  We reset the global free block counter when
+	 * we adjust the AGF flcount (which can fail) so avoid updating any
+	 * buffers until we know that part works.
+	 */
+	error = xrep_agfl_update_agf(sc, agf_bp, flcount);
+	if (error)
+		goto err;
+	xrep_agfl_init_header(sc, agfl_bp, &agfl_extents, flcount);
+
+	/*
+	 * Ok, the AGFL should be ready to go now.  Roll the transaction to
+	 * make the new AGFL permanent before we start using it to return
+	 * freespace overflow to the freespace btrees.
+	 */
+	sc->sa.agf_bp = agf_bp;
+	sc->sa.agfl_bp = agfl_bp;
+	error = xrep_roll_ag_trans(sc);
+	if (error)
+		goto err;
+
+	/* Dump any AGFL overflow. */
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
+	return xrep_reap_extents(sc, &agfl_extents, &oinfo, XFS_AG_RESV_AGFL);
+err:
+	xfs_bitmap_destroy(&agfl_extents);
+	return error;
+}
diff --git a/fs/xfs/scrub/bitmap.c b/fs/xfs/scrub/bitmap.c
index 4840f5a1e179..046ee63982c9 100644
--- a/fs/xfs/scrub/bitmap.c
+++ b/fs/xfs/scrub/bitmap.c
@@ -9,6 +9,7 @@
 #include "xfs_format.h"
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
+#include "xfs_btree.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
@@ -209,3 +210,94 @@ xfs_bitmap_disunion(
 }
 #undef LEFT_ALIGNED
 #undef RIGHT_ALIGNED
+
+/*
+ * Record all btree blocks seen while iterating all records of a btree.
+ *
+ * We know that the btree query_all function starts at the left edge and walks
+ * towards the right edge of the tree.  Therefore, we can walk up the btree
+ * cursor towards the root: if the pointer for a given level points to the
+ * first record/key in that block, we haven't seen this block before, so we
+ * need to remember that we saw this block in the btree.
+ *
+ * So if our btree is:
+ *
+ *    4
+ *  / | \
+ * 1  2  3
+ *
+ * Pretend for this example that each leaf block has 100 btree records.  For
+ * the first btree record, we'll observe that bc_ptrs[0] == 1, so we record
+ * that we saw block 1.  Then we observe that bc_ptrs[1] == 1, so we record
+ * block 4.  The list is [1, 4].
+ *
+ * For the second btree record, we see that bc_ptrs[0] == 2, so we exit the
+ * loop.  The list remains [1, 4].
+ *
+ * For the 101st btree record, we've moved onto leaf block 2.  Now
+ * bc_ptrs[0] == 1 again, so we record that we saw block 2.  We see that
+ * bc_ptrs[1] == 2, so we exit the loop.  The list is now [1, 4, 2].
+ *
+ * For the 102nd record, bc_ptrs[0] == 2, so we exit the loop again.
+ *
+ * For the 201st record, we've moved on to leaf block 3.  bc_ptrs[0] == 1, so
+ * we add 3 to the list.  Now it is [1, 4, 2, 3].
+ *
+ * For the 300th record we just exit, with the list being [1, 4, 2, 3].
+ */
+
+/*
+ * Record all the buffers pointed to by the btree cursor.  Callers already
+ * engaged in a btree walk should call this function to capture the list of
+ * blocks going from the leaf towards the root.
+ */
+int
+xfs_bitmap_set_btcur_path(
+	struct xfs_bitmap	*bitmap,
+	struct xfs_btree_cur	*cur)
+{
+	struct xfs_buf		*bp;
+	xfs_fsblock_t		fsb;
+	int			i;
+	int			error;
+
+	for (i = 0; i < cur->bc_nlevels && cur->bc_ptrs[i] == 1; i++) {
+		xfs_btree_get_block(cur, i, &bp);
+		if (!bp)
+			continue;
+		fsb = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
+		error = xfs_bitmap_set(bitmap, fsb, 1);
+		if (error)
+			return error;
+	}
+
+	return 0;
+}
+
+/* Collect a btree's block in the bitmap. */
+STATIC int
+xfs_bitmap_collect_btblock(
+	struct xfs_btree_cur	*cur,
+	int			level,
+	void			*priv)
+{
+	struct xfs_bitmap	*bitmap = priv;
+	struct xfs_buf		*bp;
+	xfs_fsblock_t		fsbno;
+
+	xfs_btree_get_block(cur, level, &bp);
+	if (!bp)
+		return 0;
+
+	fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
+	return xfs_bitmap_set(bitmap, fsbno, 1);
+}
+
+/* Walk the btree and mark the bitmap wherever a btree block is found. */
+int
+xfs_bitmap_set_btblocks(
+	struct xfs_bitmap	*bitmap,
+	struct xfs_btree_cur	*cur)
+{
+	return xfs_btree_visit_blocks(cur, xfs_bitmap_collect_btblock, bitmap);
+}
diff --git a/fs/xfs/scrub/bitmap.h b/fs/xfs/scrub/bitmap.h
index 3c39900e9269..b5df433b6e9c 100644
--- a/fs/xfs/scrub/bitmap.h
+++ b/fs/xfs/scrub/bitmap.h
@@ -29,5 +29,9 @@ void xfs_bitmap_destroy(struct xfs_bitmap *bitmap);
 int xfs_bitmap_set(struct xfs_bitmap *bitmap, xfs_fsblock_t fsbno,
 		xfs_fsblock_t len);
 int xfs_bitmap_disunion(struct xfs_bitmap *bitmap, struct xfs_bitmap *sub);
+int xfs_bitmap_set_btcur_path(struct xfs_bitmap *bitmap,
+		struct xfs_btree_cur *cur);
+int xfs_bitmap_set_btblocks(struct xfs_bitmap *bitmap,
+		struct xfs_btree_cur *cur);
 
 #endif	/* __XFS_SCRUB_BITMAP_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 1e8a17c8e2b9..2670f4cf62f4 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -220,7 +220,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.type	= ST_PERAG,
 		.setup	= xchk_setup_fs,
 		.scrub	= xchk_agfl,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_agfl,
 	},
 	[XFS_SCRUB_TYPE_AGI] = {	/* agi */
 		.type	= ST_PERAG,
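
The worked example in the xfs_bitmap_set_btcur_path comment can be
replayed in userspace.  The following is a sketch under stated
assumptions, not the kernel implementation: the two-level cursor is
simulated (struct sketch_cursor and all sketch_* names are
hypothetical), block numbers are the small integers from the comment,
and "recording" a block means appending it to an array.  Only the loop
condition, which stops climbing as soon as a level's pointer is not 1,
is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_LEVELS	2	/* leaf level 0, root level 1 */
#define SKETCH_MAX_SEEN	16

/* Simulated cursor: per-level 1-based slot and the block it sits in. */
struct sketch_cursor {
	int	ptrs[SKETCH_LEVELS];
	int	blocks[SKETCH_LEVELS];
};

/* Blocks recorded so far, in the order they were first seen. */
struct sketch_seen {
	int	blocks[SKETCH_MAX_SEEN];
	size_t	nr;
};

/*
 * Climb from the leaf towards the root, recording each block whose
 * pointer is at slot 1, and stop at the first level where it is not.
 */
static void
sketch_record_path(struct sketch_seen *seen,
		   const struct sketch_cursor *cur)
{
	for (int i = 0; i < SKETCH_LEVELS && cur->ptrs[i] == 1; i++) {
		assert(seen->nr < SKETCH_MAX_SEEN);
		seen->blocks[seen->nr++] = cur->blocks[i];
	}
}

/* Visit every record left to right, as a query_all walk would. */
static void
sketch_walk(struct sketch_seen *seen, int nr_leaves, int recs_per_leaf,
	    int root_block)
{
	seen->nr = 0;
	for (int leaf = 1; leaf <= nr_leaves; leaf++) {
		for (int rec = 1; rec <= recs_per_leaf; rec++) {
			struct sketch_cursor cur = {
				.ptrs	= { rec, leaf },
				.blocks	= { leaf, root_block },
			};

			sketch_record_path(seen, &cur);
		}
	}
}
```

Calling sketch_walk() with three leaves of 100 records each and root
block 4 visits 300 records but records only four blocks, in the order
given in the comment.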


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 06/16] xfs: repair the AGI
  2018-07-26  0:19 [PATCH v17 00/16] xfs-4.19: online repair support Darrick J. Wong
                   ` (4 preceding siblings ...)
  2018-07-26  0:20 ` [PATCH 05/16] xfs: repair the AGFL Darrick J. Wong
@ 2018-07-26  0:20 ` Darrick J. Wong
  2018-07-26  0:20 ` [PATCH 07/16] xfs: repair free space btrees Darrick J. Wong
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-07-26  0:20 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <darrick.wong@oracle.com>

Rebuild the AGI header items with some help from the rmapbt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/agheader_repair.c |  216 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.h          |    2 
 fs/xfs/scrub/scrub.c           |    2 
 3 files changed, 219 insertions(+), 1 deletion(-)
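
The overall shape of xrep_agi is: validate the roots that were found,
snapshot the old header, rewrite it, and copy the snapshot back if a
later step fails.  Below is a minimal userspace sketch of that pattern
only.  The header layout, the sketch_* names, the height limit of 9,
the errno values and the two sample counting callbacks are all
assumptions made for illustration; none of them is the kernel's API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_AGI_MAGIC	0x58414749u	/* ASCII "XAGI" */
#define SKETCH_NULLAGBLOCK	UINT32_MAX	/* "no such block" marker */
#define SKETCH_MAXLEVELS	9u		/* assumed height limit */

/* Hypothetical, heavily reduced AGI header. */
struct sketch_agi {
	uint32_t	magic, seqno, root, level, count, freecount;
};

typedef int (*sketch_count_fn)(uint32_t *count, uint32_t *freecount);

/* Sample callback: pretend the inobt walk succeeded. */
static int
sketch_count_ok(uint32_t *count, uint32_t *freecount)
{
	*count = 128;
	*freecount = 17;
	return 0;
}

/* Sample callback: pretend the inobt walk hit an I/O error. */
static int
sketch_count_fail(uint32_t *count, uint32_t *freecount)
{
	(void)count;
	(void)freecount;
	return -EIO;
}

/* Rebuild the header in place; on failure leave it as it was. */
static int
sketch_agi_rebuild(struct sketch_agi *agi, uint32_t agno, uint32_t root,
		   uint32_t level, sketch_count_fn count_inodes)
{
	struct sketch_agi	old;
	uint32_t		count, freecount;
	int			error;

	assert(agi && count_inodes);

	/* Refuse bad roots before touching the header at all. */
	if (root == SKETCH_NULLAGBLOCK || level > SKETCH_MAXLEVELS)
		return -EINVAL;	/* stand-in for -EFSCORRUPTED */

	/* Snapshot, then start from a zeroed header. */
	memcpy(&old, agi, sizeof(old));
	memset(agi, 0, sizeof(*agi));
	agi->magic = SKETCH_AGI_MAGIC;
	agi->seqno = agno;
	agi->root = root;
	agi->level = level;

	error = count_inodes(&count, &freecount);
	if (error) {
		/* Revert to the snapshot. */
		memcpy(agi, &old, sizeof(old));
		return error;
	}
	agi->count = count;
	agi->freecount = freecount;
	return 0;
}
```

The point of the snapshot is that a failed counting pass leaves the
caller with exactly the header it started with, which is what the
out_revert label in the patch does for the real AGI buffer.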


diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
index 407fee8d94dd..8d525fa28f17 100644
--- a/fs/xfs/scrub/agheader_repair.c
+++ b/fs/xfs/scrub/agheader_repair.c
@@ -692,3 +692,219 @@ xrep_agfl(
 	xfs_bitmap_destroy(&agfl_extents);
 	return error;
 }
+
+/* AGI */
+
+/*
+ * Offset within the xrep_find_ag_btree array for each btree type.  Avoid the
+ * XFS_BTNUM_ names here to avoid creating a sparse array.
+ */
+enum {
+	XREP_AGI_INOBT = 0,
+	XREP_AGI_FINOBT,
+	XREP_AGI_END,
+	XREP_AGI_MAX
+};
+
+/*
+ * Given the inode btree roots described by *fab, find the roots, check them
+ * for sanity, and pass the root data back out via *fab.
+ */
+STATIC int
+xrep_agi_find_btrees(
+	struct xfs_scrub		*sc,
+	struct xrep_find_ag_btree	*fab)
+{
+	struct xfs_buf			*agf_bp;
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	/* Read the AGF. */
+	error = xfs_alloc_read_agf(mp, sc->tp, sc->sa.agno, 0, &agf_bp);
+	if (error)
+		return error;
+	if (!agf_bp)
+		return -ENOMEM;
+
+	/* Find the btree roots. */
+	error = xrep_find_ag_btree_roots(sc, agf_bp, fab, NULL);
+	if (error)
+		return error;
+
+	/* We must find the inobt root. */
+	if (fab[XREP_AGI_INOBT].root == NULLAGBLOCK ||
+	    fab[XREP_AGI_INOBT].height > XFS_BTREE_MAXLEVELS)
+		return -EFSCORRUPTED;
+
+	/* We must find the finobt root if that feature is enabled. */
+	if (xfs_sb_version_hasfinobt(&mp->m_sb) &&
+	    (fab[XREP_AGI_FINOBT].root == NULLAGBLOCK ||
+	     fab[XREP_AGI_FINOBT].height > XFS_BTREE_MAXLEVELS))
+		return -EFSCORRUPTED;
+
+	return 0;
+}
+
+/*
+ * Reinitialize the AGI header, making an in-core copy of the old contents so
+ * that we know which in-core state needs to be reinitialized.
+ */
+STATIC void
+xrep_agi_init_header(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*agi_bp,
+	struct xfs_agi		*old_agi)
+{
+	struct xfs_agi		*agi = XFS_BUF_TO_AGI(agi_bp);
+	struct xfs_mount	*mp = sc->mp;
+
+	memcpy(old_agi, agi, sizeof(*old_agi));
+	memset(agi, 0, BBTOB(agi_bp->b_length));
+	agi->agi_magicnum = cpu_to_be32(XFS_AGI_MAGIC);
+	agi->agi_versionnum = cpu_to_be32(XFS_AGI_VERSION);
+	agi->agi_seqno = cpu_to_be32(sc->sa.agno);
+	agi->agi_length = cpu_to_be32(xfs_ag_block_count(mp, sc->sa.agno));
+	agi->agi_newino = cpu_to_be32(NULLAGINO);
+	agi->agi_dirino = cpu_to_be32(NULLAGINO);
+	if (xfs_sb_version_hascrc(&mp->m_sb))
+		uuid_copy(&agi->agi_uuid, &mp->m_sb.sb_meta_uuid);
+
+	/* We don't know how to fix the unlinked list yet. */
+	memcpy(&agi->agi_unlinked, &old_agi->agi_unlinked,
+			sizeof(agi->agi_unlinked));
+
+	/* Mark the incore AGI data stale until we're done fixing things. */
+	ASSERT(sc->sa.pag->pagi_init);
+	sc->sa.pag->pagi_init = 0;
+}
+
+/* Set btree root information in an AGI. */
+STATIC void
+xrep_agi_set_roots(
+	struct xfs_scrub		*sc,
+	struct xfs_agi			*agi,
+	struct xrep_find_ag_btree	*fab)
+{
+	agi->agi_root = cpu_to_be32(fab[XREP_AGI_INOBT].root);
+	agi->agi_level = cpu_to_be32(fab[XREP_AGI_INOBT].height);
+
+	if (xfs_sb_version_hasfinobt(&sc->mp->m_sb)) {
+		agi->agi_free_root = cpu_to_be32(fab[XREP_AGI_FINOBT].root);
+		agi->agi_free_level = cpu_to_be32(fab[XREP_AGI_FINOBT].height);
+	}
+}
+
+/* Update the AGI counters. */
+STATIC int
+xrep_agi_calc_from_btrees(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*agi_bp)
+{
+	struct xfs_btree_cur	*cur;
+	struct xfs_agi		*agi = XFS_BUF_TO_AGI(agi_bp);
+	struct xfs_mount	*mp = sc->mp;
+	xfs_agino_t		count;
+	xfs_agino_t		freecount;
+	int			error;
+
+	cur = xfs_inobt_init_cursor(mp, sc->tp, agi_bp, sc->sa.agno,
+			XFS_BTNUM_INO);
+	error = xfs_ialloc_count_inodes(cur, &count, &freecount);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, error);
+
+	agi->agi_count = cpu_to_be32(count);
+	agi->agi_freecount = cpu_to_be32(freecount);
+	return 0;
+err:
+	xfs_btree_del_cursor(cur, error);
+	return error;
+}
+
+/* Trigger reinitialization of the in-core data. */
+STATIC int
+xrep_agi_commit_new(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*agi_bp,
+	const struct xfs_agi	*old_agi)
+{
+	struct xfs_perag	*pag;
+	struct xfs_agi		*agi = XFS_BUF_TO_AGI(agi_bp);
+
+	/* Trigger inode count recalculation */
+	xfs_force_summary_recalc(sc->mp);
+
+	/* Write this to disk. */
+	xfs_trans_buf_set_type(sc->tp, agi_bp, XFS_BLFT_AGI_BUF);
+	xfs_trans_log_buf(sc->tp, agi_bp, 0, BBTOB(agi_bp->b_length) - 1);
+
+	/* Now reinitialize the in-core counters if necessary. */
+	pag = sc->sa.pag;
+	sc->sa.pag->pagi_init = 1;
+	pag->pagi_count = be32_to_cpu(agi->agi_count);
+	pag->pagi_freecount = be32_to_cpu(agi->agi_freecount);
+
+	return 0;
+}
+
+/* Repair the AGI. */
+int
+xrep_agi(
+	struct xfs_scrub		*sc)
+{
+	struct xrep_find_ag_btree	fab[XREP_AGI_MAX] = {
+		[XREP_AGI_INOBT] = {
+			.rmap_owner = XFS_RMAP_OWN_INOBT,
+			.buf_ops = &xfs_inobt_buf_ops,
+			.magic = XFS_IBT_CRC_MAGIC,
+		},
+		[XREP_AGI_FINOBT] = {
+			.rmap_owner = XFS_RMAP_OWN_INOBT,
+			.buf_ops = &xfs_inobt_buf_ops,
+			.magic = XFS_FIBT_CRC_MAGIC,
+		},
+		[XREP_AGI_END] = {
+			.buf_ops = NULL
+		},
+	};
+	struct xfs_agi			old_agi;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*agi_bp;
+	struct xfs_agi			*agi;
+	int				error;
+
+	/* We require the rmapbt to rebuild anything. */
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return -EOPNOTSUPP;
+
+	xchk_perag_get(sc->mp, &sc->sa);
+	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGI_DADDR(mp)),
+			XFS_FSS_TO_BB(mp, 1), 0, &agi_bp, NULL);
+	if (error)
+		return error;
+	agi_bp->b_ops = &xfs_agi_buf_ops;
+	agi = XFS_BUF_TO_AGI(agi_bp);
+
+	/* Find the AGI btree roots. */
+	error = xrep_agi_find_btrees(sc, fab);
+	if (error)
+		return error;
+
+	/* Start rewriting the header and implant the btrees we found. */
+	xrep_agi_init_header(sc, agi_bp, &old_agi);
+	xrep_agi_set_roots(sc, agi, fab);
+	error = xrep_agi_calc_from_btrees(sc, agi_bp);
+	if (error)
+		goto out_revert;
+
+	/* Reinitialize in-core state. */
+	return xrep_agi_commit_new(sc, agi_bp, &old_agi);
+
+out_revert:
+	/* Mark the incore AGI state stale and revert the AGI. */
+	sc->sa.pag->pagi_init = 0;
+	memcpy(agi, &old_agi, sizeof(old_agi));
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 1d283360b5ab..9de321eee4ab 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -60,6 +60,7 @@ int xrep_probe(struct xfs_scrub *sc);
 int xrep_superblock(struct xfs_scrub *sc);
 int xrep_agf(struct xfs_scrub *sc);
 int xrep_agfl(struct xfs_scrub *sc);
+int xrep_agi(struct xfs_scrub *sc);
 
 #else
 
@@ -85,6 +86,7 @@ xrep_calc_ag_resblks(
 #define xrep_superblock			xrep_notsupported
 #define xrep_agf			xrep_notsupported
 #define xrep_agfl			xrep_notsupported
+#define xrep_agi			xrep_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 2670f4cf62f4..4bfae1e61d30 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -226,7 +226,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.type	= ST_PERAG,
 		.setup	= xchk_setup_fs,
 		.scrub	= xchk_agi,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_agi,
 	},
 	[XFS_SCRUB_TYPE_BNOBT] = {	/* bnobt */
 		.type	= ST_PERAG,


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 07/16] xfs: repair free space btrees
  2018-07-26  0:19 [PATCH v17 00/16] xfs-4.19: online repair support Darrick J. Wong
                   ` (5 preceding siblings ...)
  2018-07-26  0:20 ` [PATCH 06/16] xfs: repair the AGI Darrick J. Wong
@ 2018-07-26  0:20 ` Darrick J. Wong
  2018-07-26  0:21 ` [PATCH 08/16] xfs: repair inode btrees Darrick J. Wong
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-07-26  0:20 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <darrick.wong@oracle.com>

Rebuild the free space btrees from the gaps in the rmap btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile             |    1 
 fs/xfs/scrub/alloc.c        |    1 
 fs/xfs/scrub/alloc_repair.c |  581 +++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.c       |    8 +
 fs/xfs/scrub/repair.h       |    2 
 fs/xfs/scrub/scrub.c        |    4 
 fs/xfs/scrub/trace.h        |    2 
 fs/xfs/xfs_extent_busy.c    |   14 +
 fs/xfs/xfs_extent_busy.h    |    2 
 9 files changed, 610 insertions(+), 5 deletions(-)
 create mode 100644 fs/xfs/scrub/alloc_repair.c
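
The gap-finding rule used by xrep_abt_walk_rmap can be illustrated in
userspace.  This is a sketch, not the kernel code: it assumes the rmap
records arrive sorted by start block (as an rmapbt walk delivers them),
uses plain arrays instead of list_heads, and every sketch_* name is
hypothetical.  It shows why next_bno only ever moves forward: on a
reflink filesystem records may overlap, so a record that ends before
next_bno must not pull it backwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced rmap record: a used extent within the AG. */
struct sketch_rmap {
	uint32_t	start;
	uint32_t	len;
};

/* A free extent found between used extents. */
struct sketch_extent {
	uint32_t	bno;
	uint32_t	len;
};

/*
 * Walk records sorted by start block and emit every gap, including the
 * one between the last record and the end of the AG.  Returns the
 * number of free extents written to out.
 */
static size_t
sketch_find_gaps(const struct sketch_rmap *recs, size_t nr_recs,
		 uint32_t ag_length, struct sketch_extent *out,
		 size_t out_max)
{
	uint32_t	next_bno = 0;
	size_t		nr = 0;

	for (size_t i = 0; i < nr_recs; i++) {
		uint32_t end = recs[i].start + recs[i].len;

		/* Space before this record that nobody claimed. */
		if (recs[i].start > next_bno) {
			assert(nr < out_max);
			out[nr].bno = next_bno;
			out[nr].len = recs[i].start - next_bno;
			nr++;
		}

		/* Overlapping records must never move us backwards. */
		if (end > next_bno)
			next_bno = end;
	}

	/* Space between the last record and the end of the AG. */
	if (next_bno < ag_length) {
		assert(nr < out_max);
		out[nr].bno = next_bno;
		out[nr].len = ag_length - next_bno;
		nr++;
	}
	return nr;
}
```

For example, with records (0,4), (2,6), (10,2), (11,1), (20,5) in a
32-block AG, the overlapping pairs collapse and the gaps are blocks
8-9, 12-19 and 25-31.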


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 57ec46951ede..44ddd112acd2 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -164,6 +164,7 @@ xfs-$(CONFIG_XFS_QUOTA)		+= scrub/quota.o
 ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y)
 xfs-y				+= $(addprefix scrub/, \
 				   agheader_repair.o \
+				   alloc_repair.o \
 				   bitmap.o \
 				   repair.o \
 				   )
diff --git a/fs/xfs/scrub/alloc.c b/fs/xfs/scrub/alloc.c
index 036b5c7021eb..c9b34ba312ab 100644
--- a/fs/xfs/scrub/alloc.c
+++ b/fs/xfs/scrub/alloc.c
@@ -15,7 +15,6 @@
 #include "xfs_log_format.h"
 #include "xfs_trans.h"
 #include "xfs_sb.h"
-#include "xfs_alloc.h"
 #include "xfs_rmap.h"
 #include "xfs_alloc.h"
 #include "scrub/xfs_scrub.h"
diff --git a/fs/xfs/scrub/alloc_repair.c b/fs/xfs/scrub/alloc_repair.c
new file mode 100644
index 000000000000..b228c2906de2
--- /dev/null
+++ b/fs/xfs/scrub/alloc_repair.c
@@ -0,0 +1,581 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_alloc.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_inode.h"
+#include "xfs_refcount.h"
+#include "xfs_extent_busy.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+#include "scrub/bitmap.h"
+
+/*
+ * Free Space Btree Repair
+ * =======================
+ *
+ * The reverse mappings are supposed to record all space usage for the entire
+ * AG.  Therefore, we can recalculate the free extents in an AG by looking for
+ * gaps in the physical extents recorded in the rmapbt.  On a reflink
+ * filesystem this is a little more tricky in that we have to be aware that
+ * the rmap records are allowed to overlap.
+ *
+ * We derive which blocks belonged to the old bnobt/cntbt by recording all the
+ * OWN_AG extents and subtracting out the blocks owned by all other OWN_AG
+ * metadata: the rmapbt blocks visited while iterating the reverse mappings
+ * and the AGFL blocks.
+ *
+ * Once we have both of those pieces, we can reconstruct the bnobt and cntbt
+ * by blowing out the free block state and freeing all the extents that we
+ * found.  This adds the requirement that we can't have any busy extents in
+ * the AG because the busy code cannot handle duplicate records.
+ *
+ * Note that we can only rebuild both free space btrees at the same time
+ * because the regular extent freeing infrastructure loads both btrees at the
+ * same time.
+ *
+ * We use the prefix 'xrep_abt' here because we regenerate both free space
+ * allocation btrees at the same time.
+ */
+
+struct xrep_abt_extent {
+	struct list_head	list;
+	xfs_agblock_t		bno;
+	xfs_extlen_t		len;
+};
+
+struct xrep_abt {
+	/* Blocks owned by the rmapbt or the agfl. */
+	struct xfs_bitmap	nobtlist;
+
+	/* All OWN_AG blocks. */
+	struct xfs_bitmap	*btlist;
+
+	/* Free space extents. */
+	struct list_head	*extlist;
+
+	struct xfs_scrub	*sc;
+
+	/* Length of extlist. */
+	uint64_t		nr_records;
+
+	/*
+	 * Next block we anticipate seeing in the rmap records.  If the next
+	 * rmap record is greater than next_bno, we have found unused space.
+	 */
+	xfs_agblock_t		next_bno;
+
+	/* Number of free blocks in this AG. */
+	xfs_agblock_t		nr_blocks;
+};
+
+/* Record extents that aren't in use from gaps in the rmap records. */
+STATIC int
+xrep_abt_walk_rmap(
+	struct xfs_btree_cur	*cur,
+	struct xfs_rmap_irec	*rec,
+	void			*priv)
+{
+	struct xrep_abt		*ra = priv;
+	struct xrep_abt_extent	*rae;
+	xfs_fsblock_t		fsb;
+	int			error;
+
+	/* Record all the OWN_AG blocks... */
+	if (rec->rm_owner == XFS_RMAP_OWN_AG) {
+		fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
+				rec->rm_startblock);
+		error = xfs_bitmap_set(ra->btlist, fsb, rec->rm_blockcount);
+		if (error)
+			return error;
+	}
+
+	/* ...and all the rmapbt blocks... */
+	error = xfs_bitmap_set_btcur_path(&ra->nobtlist, cur);
+	if (error)
+		return error;
+
+	/* ...and all the free space. */
+	if (rec->rm_startblock > ra->next_bno) {
+		trace_xrep_abt_walk_rmap(cur->bc_mp, cur->bc_private.a.agno,
+				ra->next_bno, rec->rm_startblock - ra->next_bno,
+				XFS_RMAP_OWN_NULL, 0, 0);
+
+		rae = kmem_alloc(sizeof(struct xrep_abt_extent), KM_MAYFAIL);
+		if (!rae)
+			return -ENOMEM;
+		INIT_LIST_HEAD(&rae->list);
+		rae->bno = ra->next_bno;
+		rae->len = rec->rm_startblock - ra->next_bno;
+		list_add_tail(&rae->list, ra->extlist);
+		ra->nr_records++;
+		ra->nr_blocks += rae->len;
+	}
+	ra->next_bno = max_t(xfs_agblock_t, ra->next_bno,
+			rec->rm_startblock + rec->rm_blockcount);
+	return 0;
+}
+
+/* Collect an AGFL block for the not-to-release list. */
+static int
+xrep_abt_walk_agfl(
+	struct xfs_mount	*mp,
+	xfs_agblock_t		bno,
+	void			*priv)
+{
+	struct xrep_abt		*ra = priv;
+	xfs_fsblock_t		fsb;
+
+	fsb = XFS_AGB_TO_FSB(mp, ra->sc->sa.agno, bno);
+	return xfs_bitmap_set(&ra->nobtlist, fsb, 1);
+}
+
+/* Compare two free space extents. */
+static int
+xrep_abt_extent_cmp(
+	void			*priv,
+	struct list_head	*a,
+	struct list_head	*b)
+{
+	struct xrep_abt_extent	*ap;
+	struct xrep_abt_extent	*bp;
+
+	ap = container_of(a, struct xrep_abt_extent, list);
+	bp = container_of(b, struct xrep_abt_extent, list);
+
+	if (ap->bno > bp->bno)
+		return 1;
+	else if (ap->bno < bp->bno)
+		return -1;
+	return 0;
+}
+
+/* Free an extent, which creates a record in the bnobt/cntbt. */
+STATIC int
+xrep_abt_free_extent(
+	struct xfs_scrub	*sc,
+	xfs_fsblock_t		fsbno,
+	xfs_extlen_t		len,
+	struct xfs_owner_info	*oinfo)
+{
+	int			error;
+
+	error = xfs_free_extent(sc->tp, fsbno, len, oinfo, 0);
+	if (error)
+		return error;
+	error = xrep_roll_ag_trans(sc);
+	if (error)
+		return error;
+	return xfs_mod_fdblocks(sc->mp, -(int64_t)len, false);
+}
+
+/* Find the longest free extent in the list. */
+static struct xrep_abt_extent *
+xrep_abt_get_longest(
+	struct list_head	*free_extents)
+{
+	struct xrep_abt_extent	*rae;
+	struct xrep_abt_extent	*res = NULL;
+
+	list_for_each_entry(rae, free_extents, list) {
+		if (!res || rae->len > res->len)
+			res = rae;
+	}
+	return res;
+}
+
+/*
+ * Allocate a block from the (cached) first extent in the AG.  In theory
+ * this should never fail, since we already checked that there was enough
+ * space to handle the new btrees.
+ */
+STATIC xfs_fsblock_t
+xrep_abt_alloc_block(
+	struct xfs_scrub	*sc,
+	struct list_head	*free_extents)
+{
+	struct xrep_abt_extent	*ext;
+	xfs_agblock_t		agbno;
+
+	/* Pull the first free space extent off the list, and... */
+	ext = list_first_entry(free_extents, struct xrep_abt_extent, list);
+
+	/* ...take its first block; save it, since ext may be freed below. */
+	agbno = ext->bno++;
+	ext->len--;
+	if (ext->len == 0) {
+		list_del(&ext->list);
+		kmem_free(ext);
+	}
+	return XFS_AGB_TO_FSB(sc->mp, sc->sa.agno, agbno);
+}
+
+/* Free every record in the extent list. */
+STATIC void
+xrep_abt_cancel_freelist(
+	struct list_head	*extlist)
+{
+	struct xrep_abt_extent	*rae;
+	struct xrep_abt_extent	*n;
+
+	list_for_each_entry_safe(rae, n, extlist, list) {
+		list_del(&rae->list);
+		kmem_free(rae);
+	}
+}
+
+/*
+ * Iterate all reverse mappings to find (1) the free extents, (2) the OWN_AG
+ * extents, (3) the rmapbt blocks, and (4) the AGFL blocks.  The free space is
+ * (1) + (2) - (3) - (4).  Figure out if we have enough free space to
+ * reconstruct the free space btrees.  Caller must clean up the input lists
+ * if something goes wrong.
+ */
+STATIC int
+xrep_abt_find_freespace(
+	struct xfs_scrub	*sc,
+	struct list_head	*free_extents,
+	struct xfs_bitmap	*old_allocbt_blocks)
+{
+	struct xrep_abt		ra;
+	struct xrep_abt_extent	*rae;
+	struct xfs_btree_cur	*cur;
+	struct xfs_mount	*mp = sc->mp;
+	xfs_agblock_t		agend;
+	xfs_agblock_t		nr_blocks;
+	int			error;
+
+	ra.extlist = free_extents;
+	ra.btlist = old_allocbt_blocks;
+	xfs_bitmap_init(&ra.nobtlist);
+	ra.next_bno = 0;
+	ra.nr_records = 0;
+	ra.nr_blocks = 0;
+	ra.sc = sc;
+
+	/*
+	 * Iterate all the reverse mappings to find gaps in the physical
+	 * mappings, all the OWN_AG blocks, and all the rmapbt extents.
+	 */
+	cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno);
+	error = xfs_rmap_query_all(cur, xrep_abt_walk_rmap, &ra);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, error);
+	cur = NULL;
+
+	/* Insert a record for space between the last rmap and EOAG. */
+	agend = be32_to_cpu(XFS_BUF_TO_AGF(sc->sa.agf_bp)->agf_length);
+	if (ra.next_bno < agend) {
+		rae = kmem_alloc(sizeof(struct xrep_abt_extent), KM_MAYFAIL);
+		if (!rae) {
+			error = -ENOMEM;
+			goto err;
+		}
+		INIT_LIST_HEAD(&rae->list);
+		rae->bno = ra.next_bno;
+		rae->len = agend - ra.next_bno;
+		list_add_tail(&rae->list, free_extents);
+		ra.nr_records++;
+		ra.nr_blocks += rae->len;
+	}
+
+	/* Collect all the AGFL blocks. */
+	error = xfs_agfl_walk(mp, XFS_BUF_TO_AGF(sc->sa.agf_bp),
+			sc->sa.agfl_bp, xrep_abt_walk_agfl, &ra);
+	if (error)
+		goto err;
+
+	/* Do we have enough space to rebuild both freespace btrees? */
+	nr_blocks = 2 * xfs_allocbt_calc_size(mp, ra.nr_records);
+	if (!xrep_ag_has_space(sc->sa.pag, nr_blocks, XFS_AG_RESV_NONE) ||
+	    ra.nr_blocks < nr_blocks) {
+		error = -ENOSPC;
+		goto err;
+	}
+
+	/* Compute the old bnobt/cntbt blocks. */
+	error = xfs_bitmap_disunion(old_allocbt_blocks, &ra.nobtlist);
+err:
+	xfs_bitmap_destroy(&ra.nobtlist);
+	if (cur)
+		xfs_btree_del_cursor(cur, error);
+	return error;
+}
+
+/*
+ * Reset the global free block counter and the per-AG counters to make it look
+ * like this AG has no free space.
+ */
+STATIC int
+xrep_abt_reset_counters(
+	struct xfs_scrub	*sc,
+	int			*log_flags)
+{
+	struct xfs_perag	*pag = sc->sa.pag;
+	struct xfs_agf		*agf;
+	xfs_agblock_t		new_btblks;
+	xfs_agblock_t		to_free;
+	int			error;
+
+	/*
+	 * Since we're abandoning the old bnobt/cntbt, we have to decrease
+	 * fdblocks by the # of blocks in those trees.  btreeblks counts the
+	 * non-root blocks of the free space and rmap btrees.  Do this before
+	 * resetting the AGF counters.
+	 */
+	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+
+	/* agf_rmap_blocks counts the rmapbt root block; btreeblks doesn't */
+	new_btblks = be32_to_cpu(agf->agf_rmap_blocks) - 1;
+
+	/* btreeblks doesn't count the bno/cnt root blocks either */
+	to_free = pag->pagf_btreeblks + 2;
+
+	/* and don't account for the blocks we aren't freeing */
+	to_free -= new_btblks;
+
+	error = xfs_mod_fdblocks(sc->mp, -(int64_t)to_free, false);
+	if (error)
+		return error;
+
+	/*
+	 * Reset the per-AG info, both incore and ondisk.  Mark the incore
+	 * state stale in case we fail out of here.
+	 */
+	ASSERT(pag->pagf_init);
+	pag->pagf_init = 0;
+	pag->pagf_btreeblks = new_btblks;
+	pag->pagf_freeblks = 0;
+	pag->pagf_longest = 0;
+
+	agf->agf_btreeblks = cpu_to_be32(new_btblks);
+	agf->agf_freeblks = 0;
+	agf->agf_longest = 0;
+	*log_flags |= XFS_AGF_BTREEBLKS | XFS_AGF_LONGEST | XFS_AGF_FREEBLKS;
+
+	return 0;
+}
+
+/* Initialize a new free space btree root and implant into AGF. */
+STATIC int
+xrep_abt_reset_btree(
+	struct xfs_scrub	*sc,
+	xfs_btnum_t		btnum,
+	struct list_head	*free_extents)
+{
+	struct xfs_owner_info	oinfo;
+	struct xfs_buf		*bp;
+	struct xfs_perag	*pag = sc->sa.pag;
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+	xfs_fsblock_t		fsbno;
+	int			error;
+
+	/* Allocate new root block. */
+	fsbno = xrep_abt_alloc_block(sc, free_extents);
+	if (fsbno == NULLFSBLOCK)
+		return -ENOSPC;
+
+	/* Initialize new tree root. */
+	error = xrep_init_btblock(sc, fsbno, &bp, btnum, &xfs_allocbt_buf_ops);
+	if (error)
+		return error;
+
+	/* Implant into AGF. */
+	agf->agf_roots[btnum] = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, fsbno));
+	agf->agf_levels[btnum] = cpu_to_be32(1);
+
+	/* Add rmap records for the btree roots */
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
+	error = xfs_rmap_alloc(sc->tp, sc->sa.agf_bp, sc->sa.agno,
+			XFS_FSB_TO_AGBNO(mp, fsbno), 1, &oinfo);
+	if (error)
+		return error;
+
+	/* Reset the incore state. */
+	pag->pagf_levels[btnum] = 1;
+
+	return 0;
+}
+
+/* Initialize new bnobt/cntbt roots and implant them into the AGF. */
+STATIC int
+xrep_abt_reset_btrees(
+	struct xfs_scrub	*sc,
+	struct list_head	*free_extents,
+	int			*log_flags)
+{
+	int			error;
+
+	error = xrep_abt_reset_btree(sc, XFS_BTNUM_BNOi, free_extents);
+	if (error)
+		return error;
+	error = xrep_abt_reset_btree(sc, XFS_BTNUM_CNTi, free_extents);
+	if (error)
+		return error;
+
+	*log_flags |= XFS_AGF_ROOTS | XFS_AGF_LEVELS;
+	return 0;
+}
+
+/*
+ * Make our new freespace btree roots permanent so that we can start freeing
+ * unused space back into the AG.
+ */
+STATIC int
+xrep_abt_commit_new(
+	struct xfs_scrub	*sc,
+	struct xfs_bitmap	*old_allocbt_blocks,
+	int			log_flags)
+{
+	int			error;
+
+	xfs_alloc_log_agf(sc->tp, sc->sa.agf_bp, log_flags);
+
+	/* Invalidate the old freespace btree blocks and commit. */
+	error = xrep_invalidate_blocks(sc, old_allocbt_blocks);
+	if (error)
+		return error;
+	error = xrep_roll_ag_trans(sc);
+	if (error)
+		return error;
+
+	/* Now that we've succeeded, mark the incore state valid again. */
+	sc->sa.pag->pagf_init = 1;
+	return 0;
+}
+
+/* Build new free space btrees and dispose of the old ones. */
+STATIC int
+xrep_abt_rebuild_trees(
+	struct xfs_scrub	*sc,
+	struct list_head	*free_extents,
+	struct xfs_bitmap	*old_allocbt_blocks)
+{
+	struct xfs_owner_info	oinfo;
+	struct xrep_abt_extent	*rae;
+	struct xrep_abt_extent	*n;
+	struct xrep_abt_extent	*longest;
+	int			error;
+
+	xfs_rmap_skip_owner_update(&oinfo);
+
+	/*
+	 * Insert the longest free extent in case it's necessary to
+	 * refresh the AGFL with multiple blocks.  If there is no longest
+	 * extent, we had exactly the free space we needed; we're done.
+	 */
+	longest = xrep_abt_get_longest(free_extents);
+	if (!longest)
+		goto done;
+	error = xrep_abt_free_extent(sc,
+			XFS_AGB_TO_FSB(sc->mp, sc->sa.agno, longest->bno),
+			longest->len, &oinfo);
+	list_del(&longest->list);
+	kmem_free(longest);
+	if (error)
+		return error;
+
+	/* Insert records into the new btrees. */
+	list_for_each_entry_safe(rae, n, free_extents, list) {
+		error = xrep_abt_free_extent(sc,
+				XFS_AGB_TO_FSB(sc->mp, sc->sa.agno, rae->bno),
+				rae->len, &oinfo);
+		if (error)
+			return error;
+		list_del(&rae->list);
+		kmem_free(rae);
+	}
+
+done:
+	/* Free all the OWN_AG blocks that are not in the rmapbt/agfl. */
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
+	return xrep_reap_extents(sc, old_allocbt_blocks, &oinfo,
+			XFS_AG_RESV_NONE);
+}
+
+/* Repair the freespace btrees for some AG. */
+int
+xrep_allocbt(
+	struct xfs_scrub	*sc)
+{
+	struct list_head	free_extents;
+	struct xfs_bitmap	old_allocbt_blocks;
+	struct xfs_mount	*mp = sc->mp;
+	int			log_flags = 0;
+	int			error;
+
+	/* We require the rmapbt to rebuild anything. */
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return -EOPNOTSUPP;
+
+	xchk_perag_get(sc->mp, &sc->sa);
+
+	/*
+	 * Make sure the busy extent list is clear because we can't put
+	 * extents on there twice.
+	 */
+	if (!xfs_extent_busy_list_empty(sc->sa.pag))
+		return -EDEADLOCK;
+
+	/* Collect the free space data and find the old btree blocks. */
+	INIT_LIST_HEAD(&free_extents);
+	xfs_bitmap_init(&old_allocbt_blocks);
+	error = xrep_abt_find_freespace(sc, &free_extents, &old_allocbt_blocks);
+	if (error)
+		goto out;
+
+	/* Make sure we got some free space. */
+	if (list_empty(&free_extents)) {
+		error = -ENOSPC;
+		goto out;
+	}
+
+	/*
+	 * Sort the free extents by block number to avoid bnobt splits when we
+	 * rebuild the free space btrees.
+	 */
+	list_sort(NULL, &free_extents, xrep_abt_extent_cmp);
+
+	/*
+	 * Blow out the old free space btrees.  This is the point at which
+	 * we are no longer able to bail out gracefully.
+	 */
+	error = xrep_abt_reset_counters(sc, &log_flags);
+	if (error)
+		goto out;
+	error = xrep_abt_reset_btrees(sc, &free_extents, &log_flags);
+	if (error)
+		goto out;
+	error = xrep_abt_commit_new(sc, &old_allocbt_blocks, log_flags);
+	if (error)
+		goto out;
+
+	/* Now rebuild the freespace information. */
+	error = xrep_abt_rebuild_trees(sc, &free_extents, &old_allocbt_blocks);
+out:
+	xrep_abt_cancel_freelist(&free_extents);
+	xfs_bitmap_destroy(&old_allocbt_blocks);
+	return error;
+}
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 6512d8fb67e1..5b910aecc59a 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -623,8 +623,14 @@ xchk_setup_ag_btree(
 	 * expensive operation should be performed infrequently and only
 	 * as a last resort.  Any caller that sets force_log should
 	 * document why they need to do so.
+	 *
+	 * Force everything in memory out to disk if we're repairing.
+	 * This ensures we won't get tripped up by btree blocks sitting
+	 * in memory waiting to have LSNs stamped in.  The AGF/AGI repair
+	 * routines use any available rmap data to try to find a btree
+	 * root that also passes the read verifiers.
 	 */
-	if (force_log) {
+	if (force_log || (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)) {
 		error = xchk_checkpoint_log(mp);
 		if (error)
 			return error;
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 9de321eee4ab..bc1a5f1cbcdc 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -61,6 +61,7 @@ int xrep_superblock(struct xfs_scrub *sc);
 int xrep_agf(struct xfs_scrub *sc);
 int xrep_agfl(struct xfs_scrub *sc);
 int xrep_agi(struct xfs_scrub *sc);
+int xrep_allocbt(struct xfs_scrub *sc);
 
 #else
 
@@ -87,6 +88,7 @@ xrep_calc_ag_resblks(
 #define xrep_agf			xrep_notsupported
 #define xrep_agfl			xrep_notsupported
 #define xrep_agi			xrep_notsupported
+#define xrep_allocbt			xrep_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 4bfae1e61d30..2133a3199372 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -232,13 +232,13 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.type	= ST_PERAG,
 		.setup	= xchk_setup_ag_allocbt,
 		.scrub	= xchk_bnobt,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_allocbt,
 	},
 	[XFS_SCRUB_TYPE_CNTBT] = {	/* cntbt */
 		.type	= ST_PERAG,
 		.setup	= xchk_setup_ag_allocbt,
 		.scrub	= xchk_cntbt,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_allocbt,
 	},
 	[XFS_SCRUB_TYPE_INOBT] = {	/* inobt */
 		.type	= ST_PERAG,
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 4e20f0e48232..26bd5dc68efe 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -551,7 +551,7 @@ DEFINE_EVENT(xrep_rmap_class, name, \
 		 xfs_agblock_t agbno, xfs_extlen_t len, \
 		 uint64_t owner, uint64_t offset, unsigned int flags), \
 	TP_ARGS(mp, agno, agbno, len, owner, offset, flags))
-DEFINE_REPAIR_RMAP_EVENT(xrep_alloc_extent_fn);
+DEFINE_REPAIR_RMAP_EVENT(xrep_abt_walk_rmap);
 DEFINE_REPAIR_RMAP_EVENT(xrep_ialloc_extent_fn);
 DEFINE_REPAIR_RMAP_EVENT(xrep_rmap_extent_fn);
 DEFINE_REPAIR_RMAP_EVENT(xrep_bmap_extent_fn);
diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c
index 0ed68379e551..82f99633a597 100644
--- a/fs/xfs/xfs_extent_busy.c
+++ b/fs/xfs/xfs_extent_busy.c
@@ -657,3 +657,17 @@ xfs_extent_busy_ag_cmp(
 		diff = b1->bno - b2->bno;
 	return diff;
 }
+
+/* Are there any busy extents in this AG? */
+bool
+xfs_extent_busy_list_empty(
+	struct xfs_perag	*pag)
+{
+	spin_lock(&pag->pagb_lock);
+	if (pag->pagb_tree.rb_node) {
+		spin_unlock(&pag->pagb_lock);
+		return false;
+	}
+	spin_unlock(&pag->pagb_lock);
+	return true;
+}
diff --git a/fs/xfs/xfs_extent_busy.h b/fs/xfs/xfs_extent_busy.h
index 990ab3891971..2f8c73c712c6 100644
--- a/fs/xfs/xfs_extent_busy.h
+++ b/fs/xfs/xfs_extent_busy.h
@@ -65,4 +65,6 @@ static inline void xfs_extent_busy_sort(struct list_head *list)
 	list_sort(NULL, list, xfs_extent_busy_ag_cmp);
 }
 
+bool xfs_extent_busy_list_empty(struct xfs_perag *pag);
+
 #endif /* __XFS_EXTENT_BUSY_H__ */
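
For anyone who wants to sanity-check the fdblocks adjustment made in
xrep_abt_reset_counters(), the following is a minimal userspace sketch of
the same arithmetic.  It is only an illustration: the helper name
abt_blocks_to_unaccount() and the sample numbers are made up here and are
not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-alone model of the arithmetic in
 * xrep_abt_reset_counters(); this is not kernel code.
 *
 * old_btreeblks: pagf_btreeblks, i.e. the non-root blocks of the bnobt,
 *                cntbt and rmapbt combined
 * rmap_blocks:   agf_rmap_blocks, i.e. every rmapbt block including the root
 *
 * Returns the number of blocks held by the old bnobt and cntbt, which is
 * what gets subtracted from fdblocks.
 */
static uint32_t
abt_blocks_to_unaccount(uint32_t old_btreeblks, uint32_t rmap_blocks)
{
	uint32_t	new_btblks;
	uint32_t	to_free;

	/* rmap_blocks accounts the root block, btreeblks doesn't */
	assert(rmap_blocks >= 1);
	new_btblks = rmap_blocks - 1;

	/* btreeblks doesn't account the bnobt/cntbt root blocks */
	to_free = old_btreeblks + 2;

	/* the rmapbt blocks stay where they are */
	assert(to_free >= new_btblks);
	return to_free - new_btblks;
}
```

Take an AG whose bnobt and cntbt each have a root plus two more blocks and
whose rmapbt has a root plus four more blocks.  Then pagf_btreeblks is
2 + 2 + 4 = 8 and agf_rmap_blocks is 5, so abt_blocks_to_unaccount(8, 5)
yields 6, which is exactly the 3 + 3 blocks of the two abandoned trees.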



* [PATCH 08/16] xfs: repair inode btrees
  2018-07-26  0:19 [PATCH v17 00/16] xfs-4.19: online repair support Darrick J. Wong
                   ` (6 preceding siblings ...)
  2018-07-26  0:20 ` [PATCH 07/16] xfs: repair free space btrees Darrick J. Wong
@ 2018-07-26  0:21 ` Darrick J. Wong
  2018-07-26  0:21 ` [PATCH 09/16] xfs: repair refcount btrees Darrick J. Wong
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-07-26  0:21 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <darrick.wong@oracle.com>

Use the rmapbt to find inode chunks, query the chunks to compute
hole and free masks, and with that information rebuild the inobt
and finobt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile              |    1 
 fs/xfs/scrub/common.c        |    2 
 fs/xfs/scrub/ialloc_repair.c |  673 ++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.c        |   20 +
 fs/xfs/scrub/repair.h        |   11 +
 fs/xfs/scrub/scrub.c         |    4 
 fs/xfs/scrub/scrub.h         |    1 
 fs/xfs/scrub/trace.h         |    4 
 8 files changed, 712 insertions(+), 4 deletions(-)
 create mode 100644 fs/xfs/scrub/ialloc_repair.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 44ddd112acd2..af1dc9aeb1a7 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -166,6 +166,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   agheader_repair.o \
 				   alloc_repair.o \
 				   bitmap.o \
+				   ialloc_repair.o \
 				   repair.o \
 				   )
 endif
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 5b910aecc59a..d03c4df38ac8 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -516,6 +516,8 @@ xchk_ag_free(
 	struct xchk_ag		*sa)
 {
 	xchk_ag_btcur_free(sa);
+	if (sa->pag != NULL && sc->reset_perag_resv)
+		xrep_reset_perag_resv(sc);
 	if (sa->agfl_bp) {
 		xfs_trans_brelse(sc->tp, sa->agfl_bp);
 		sa->agfl_bp = NULL;
diff --git a/fs/xfs/scrub/ialloc_repair.c b/fs/xfs/scrub/ialloc_repair.c
new file mode 100644
index 000000000000..126135c1a147
--- /dev/null
+++ b/fs/xfs/scrub/ialloc_repair.c
@@ -0,0 +1,673 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_alloc.h"
+#include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_icache.h"
+#include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_log.h"
+#include "xfs_trans_priv.h"
+#include "xfs_error.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+#include "scrub/bitmap.h"
+
+/*
+ * Inode Btree Repair
+ * ==================
+ *
+ * A quick refresher of inode btrees on a v5 filesystem:
+ *
+ * - Each inode btree record can describe a single 'inode chunk'.  The chunk
+ *   size is defined to be 64 inodes.  If sparse inodes are enabled, every
+ *   inobt record must be aligned to the chunk size.  A chunk can be smaller
+ *   than a fs block.  One must be careful with 64k-block filesystems whose
+ *   inodes are smaller than 1k.
+ *
+ * - Inode buffers are read into memory in units of 'inode clusters'.  The
+ *   number of inodes that fit in a cluster buffer is the smallest number of
+ *   inodes that can be allocated or freed.  Clusters are never larger than a
+ *   chunk and never smaller than a fs block.  If sparse inodes are not
+ *   enabled, then records can be aligned to a cluster.
+ *
+ * - If sparse inodes are enabled, the holemask field will be active.  Each
+ *   bit of the holemask represents 4 potential inodes; if set, the
+ *   corresponding space does *not* contain inodes and must be left alone.
+ *
+ * So what's the rebuild algorithm?
+ *
+ * Iterate the reverse mapping records looking for OWN_INODES and OWN_INOBT
+ * records.  The OWN_INOBT records are the old inode btree blocks and will be
+ * cleared out after we've rebuilt the tree.  Each possible inode chunk within
+ * an OWN_INODES record will be read in and the freemask calculated from the
+ * i_mode data in the inode chunk.  For sparse inodes the holemask will be
+ * calculated by creating the properly aligned inobt record and punching out
+ * any chunk that's missing.  Inode allocations and frees grab the AGI first,
+ * so repair protects itself from concurrent access by locking the AGI.
+ *
+ * Once we've reconstructed all the inode records, we can create new inode
+ * btree roots and reload the btrees.  We rebuild both inode trees at the same
+ * time because they have the same rmap owner and it would be more complex to
+ * figure out if the other tree isn't in need of a rebuild and which OWN_INOBT
+ * blocks it owns.  We have all the data we need to build both, so dump
+ * everything and start over.
+ *
+ * We use the prefix 'xrep_ibt' because we rebuild both inode btrees.
+ */
+
+struct xrep_ibt_extent {
+	struct list_head	list;
+	xfs_inofree_t		freemask;
+	xfs_agino_t		startino;
+	unsigned int		count;
+	unsigned int		usedcount;
+	uint16_t		holemask;
+};
+
+struct xrep_ibt {
+	/* Reconstructed inode records. */
+	struct list_head	*extlist;
+
+	/* Old inode btree blocks we found in the rmap. */
+	struct xfs_bitmap	*btlist;
+
+	struct xfs_scrub	*sc;
+
+	/* Number of inode btree block records. */
+	uint64_t		nr_records;
+};
+
+/*
+ * Is this inode in use?  If the inode is in memory we can tell from i_mode,
+ * otherwise we have to check di_mode in the on-disk buffer.  We only care
+ * that the high (i.e. non-permission) bits of _mode are zero.  This should be
+ * safe because repair keeps all AG headers locked until the end, and any
+ * process trying to perform an inode allocation/free must lock the AGI.
+ */
+STATIC int
+xrep_ibt_check_free(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*bp,
+	xfs_ino_t		fsino,
+	xfs_agino_t		bpino,
+	bool			*inuse)
+{
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_dinode	*dip;
+	int			error;
+
+	/* Will the in-core inode tell us if it's in use? */
+	error = xfs_icache_inode_is_allocated(mp, sc->tp, fsino, inuse);
+	if (!error)
+		return 0;
+
+	/* Inode uncached or half assembled, read disk buffer */
+	dip = xfs_buf_offset(bp, bpino * mp->m_sb.sb_inodesize);
+	if (be16_to_cpu(dip->di_magic) != XFS_DINODE_MAGIC)
+		return -EFSCORRUPTED;
+
+	if (dip->di_version >= 3 && be64_to_cpu(dip->di_ino) != fsino)
+		return -EFSCORRUPTED;
+
+	*inuse = dip->di_mode != 0;
+	return 0;
+}
+
+/*
+ * For each inode cluster covering the physical extent recorded by the rmapbt,
+ * we must calculate the properly aligned startino of that cluster, then
+ * iterate each cluster to fill in used and filled masks appropriately.  We
+ * then use the (startino, used, filled) information to construct the
+ * appropriate inode records.
+ */
+STATIC int
+xrep_ibt_process_cluster(
+	struct xrep_ibt		*ri,
+	xfs_agblock_t		agbno,
+	int			blks_per_cluster,
+	xfs_agino_t		rec_agino)
+{
+	struct xfs_imap		imap;
+	struct xrep_ibt_extent	*rie;
+	struct xfs_dinode	*dip;
+	struct xfs_buf		*bp;
+	struct xfs_scrub	*sc = ri->sc;
+	struct xfs_mount	*mp = sc->mp;
+	xfs_ino_t		fsino;
+	xfs_inofree_t		usedmask;
+	xfs_agino_t		nr_inodes;
+	xfs_agino_t		startino;
+	xfs_agino_t		clusterino;
+	xfs_agino_t		clusteroff;
+	xfs_agino_t		agino;
+	uint16_t		fillmask;
+	bool			inuse;
+	int			usedcount;
+	int			error;
+
+	/* The per-AG inum of this inode cluster. */
+	agino = XFS_OFFBNO_TO_AGINO(mp, agbno, 0);
+
+	/* The per-AG inum of the inobt record. */
+	startino = rec_agino + rounddown(agino - rec_agino,
+			XFS_INODES_PER_CHUNK);
+
+	/* The per-AG inum of the cluster within the inobt record. */
+	clusteroff = agino - startino;
+
+	/* Every inode in this holemask slot is filled. */
+	nr_inodes = XFS_OFFBNO_TO_AGINO(mp, blks_per_cluster, 0);
+	fillmask = xfs_inobt_maskn(clusteroff / XFS_INODES_PER_HOLEMASK_BIT,
+			nr_inodes / XFS_INODES_PER_HOLEMASK_BIT);
+
+	/*
+	 * Grab the inode cluster buffer.  This is safe to do with a broken
+	 * inobt because imap_to_bp directly maps the buffer without touching
+	 * either inode btree.
+	 */
+	imap.im_blkno = XFS_AGB_TO_DADDR(mp, sc->sa.agno, agbno);
+	imap.im_len = XFS_FSB_TO_BB(mp, blks_per_cluster);
+	imap.im_boffset = 0;
+	error = xfs_imap_to_bp(mp, sc->tp, &imap, &dip, &bp, 0,
+			XFS_IGET_UNTRUSTED);
+	if (error)
+		return error;
+
+	usedmask = 0;
+	usedcount = 0;
+	/* Which inodes within this cluster are free? */
+	for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
+		fsino = XFS_AGINO_TO_INO(mp, sc->sa.agno, agino + clusterino);
+		error = xrep_ibt_check_free(sc, bp, fsino,
+				clusterino, &inuse);
+		if (error) {
+			xfs_trans_brelse(sc->tp, bp);
+			return error;
+		}
+		if (inuse) {
+			usedcount++;
+			usedmask |= XFS_INOBT_MASK(clusteroff + clusterino);
+		}
+	}
+	xfs_trans_brelse(sc->tp, bp);
+
+	/*
+	 * If the last item in the list is our chunk record,
+	 * update that.
+	 */
+	if (!list_empty(ri->extlist)) {
+		rie = list_last_entry(ri->extlist, struct xrep_ibt_extent,
+				list);
+		if (rie->startino + XFS_INODES_PER_CHUNK > startino) {
+			rie->freemask &= ~usedmask;
+			rie->holemask &= ~fillmask;
+			rie->count += nr_inodes;
+			rie->usedcount += usedcount;
+			return 0;
+		}
+	}
+
+	/* New inode chunk; add to the list. */
+	rie = kmem_alloc(sizeof(struct xrep_ibt_extent), KM_MAYFAIL);
+	if (!rie)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&rie->list);
+	rie->startino = startino;
+	rie->freemask = XFS_INOBT_ALL_FREE & ~usedmask;
+	rie->holemask = XFS_INOBT_ALL_FREE & ~fillmask;
+	rie->count = nr_inodes;
+	rie->usedcount = usedcount;
+	list_add_tail(&rie->list, ri->extlist);
+	ri->nr_records++;
+
+	return 0;
+}
+
+/* Record extents that belong to inode btrees. */
+STATIC int
+xrep_ibt_walk_rmap(
+	struct xfs_btree_cur	*cur,
+	struct xfs_rmap_irec	*rec,
+	void			*priv)
+{
+	struct xrep_ibt		*ri = priv;
+	struct xfs_mount	*mp = cur->bc_mp;
+	xfs_fsblock_t		fsbno;
+	xfs_agblock_t		agbno = rec->rm_startblock;
+	xfs_agino_t		inoalign;
+	xfs_agino_t		agino;
+	xfs_agino_t		rec_agino;
+	int			blks_per_cluster;
+	int			error = 0;
+
+	if (xchk_should_terminate(ri->sc, &error))
+		return error;
+
+	/* Fragment of the old btrees; dispose of them later. */
+	if (rec->rm_owner == XFS_RMAP_OWN_INOBT) {
+		fsbno = XFS_AGB_TO_FSB(mp, ri->sc->sa.agno, agbno);
+		return xfs_bitmap_set(ri->btlist, fsbno, rec->rm_blockcount);
+	}
+
+	/* Skip extents that are not inode chunks. */
+	if (rec->rm_owner != XFS_RMAP_OWN_INODES)
+		return 0;
+
+	blks_per_cluster = xfs_icluster_size_fsb(mp);
+
+	if (agbno % blks_per_cluster != 0)
+		return -EFSCORRUPTED;
+
+	trace_xrep_ibt_walk_rmap(mp, ri->sc->sa.agno, rec->rm_startblock,
+			rec->rm_blockcount, rec->rm_owner, rec->rm_offset,
+			rec->rm_flags);
+
+	/*
+	 * Determine the inode block alignment, and where the block
+	 * ought to start if it's aligned properly.  On a sparse inode
+	 * system the rmap doesn't have to start on an alignment boundary,
+	 * but the record does.  On pre-sparse filesystems, we /must/
+	 * start both rmap and inobt on an alignment boundary.
+	 */
+	inoalign = xfs_ialloc_cluster_alignment(mp);
+	agino = XFS_OFFBNO_TO_AGINO(mp, agbno, 0);
+	rec_agino = XFS_OFFBNO_TO_AGINO(mp, rounddown(agbno, inoalign), 0);
+	if (!xfs_sb_version_hassparseinodes(&mp->m_sb) && agino != rec_agino)
+		return -EFSCORRUPTED;
+
+	/*
+	 * Set up the free/hole masks for each inode cluster that could be
+	 * mapped by this rmap record.
+	 */
+	for (;
+	     agbno < rec->rm_startblock + rec->rm_blockcount;
+	     agbno += blks_per_cluster) {
+		error = xrep_ibt_process_cluster(ri, agbno, blks_per_cluster,
+				rec_agino);
+		if (error)
+			return error;
+	}
+
+	return 0;
+}
+
+/* Compare two ialloc extents. */
+static int
+xrep_ibt_extent_cmp(
+	void			*priv,
+	struct list_head	*a,
+	struct list_head	*b)
+{
+	struct xrep_ibt_extent	*ap;
+	struct xrep_ibt_extent	*bp;
+
+	ap = container_of(a, struct xrep_ibt_extent, list);
+	bp = container_of(b, struct xrep_ibt_extent, list);
+
+	if (ap->startino > bp->startino)
+		return 1;
+	else if (ap->startino < bp->startino)
+		return -1;
+	return 0;
+}
+
+/* Insert an inode chunk record into a given btree. */
+static int
+xrep_ibt_insert_btrec(
+	struct xfs_btree_cur	*cur,
+	struct xrep_ibt_extent	*rie)
+{
+	int			stat;
+	int			error;
+
+	error = xfs_inobt_lookup(cur, rie->startino, XFS_LOOKUP_EQ, &stat);
+	if (error)
+		return error;
+	XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, stat == 0);
+	error = xfs_inobt_insert_rec(cur, rie->holemask, rie->count,
+			rie->count - rie->usedcount, rie->freemask, &stat);
+	if (error)
+		return error;
+	XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, stat == 1);
+	return error;
+}
+
+/* Insert an inode chunk record into both inode btrees. */
+static int
+xrep_ibt_insert_rec(
+	struct xfs_scrub	*sc,
+	struct xrep_ibt_extent	*rie)
+{
+	struct xfs_btree_cur	*cur;
+	int			error;
+
+	trace_xrep_ibt_insert(sc->mp, sc->sa.agno, rie->startino,
+			rie->holemask, rie->count, rie->count - rie->usedcount,
+			rie->freemask);
+
+	/* Insert into the inobt. */
+	cur = xfs_inobt_init_cursor(sc->mp, sc->tp, sc->sa.agi_bp, sc->sa.agno,
+			XFS_BTNUM_INO);
+	error = xrep_ibt_insert_btrec(cur, rie);
+	if (error)
+		goto out_cur;
+	xfs_btree_del_cursor(cur, error);
+
+	/* Insert into the finobt if chunk has free inodes. */
+	if (xfs_sb_version_hasfinobt(&sc->mp->m_sb) &&
+	    rie->count != rie->usedcount) {
+		cur = xfs_inobt_init_cursor(sc->mp, sc->tp, sc->sa.agi_bp,
+				sc->sa.agno, XFS_BTNUM_FINO);
+		error = xrep_ibt_insert_btrec(cur, rie);
+		if (error)
+			goto out_cur;
+		xfs_btree_del_cursor(cur, error);
+	}
+
+	return xrep_roll_ag_trans(sc);
+out_cur:
+	xfs_btree_del_cursor(cur, error);
+	return error;
+}
+
+/* Free every record in the inode list. */
+STATIC void
+xrep_ibt_cancel_inorecs(
+	struct list_head	*reclist)
+{
+	struct xrep_ibt_extent	*rie;
+	struct xrep_ibt_extent	*n;
+
+	list_for_each_entry_safe(rie, n, reclist, list) {
+		list_del(&rie->list);
+		kmem_free(rie);
+	}
+}
+
+/*
+ * Iterate all reverse mappings to find the inodes (OWN_INODES) and the inode
+ * btrees (OWN_INOBT).  Figure out if we have enough free space to reconstruct
+ * the inode btrees.  The caller must clean up the lists if anything goes
+ * wrong.
+ */
+STATIC int
+xrep_ibt_find_inodes(
+	struct xfs_scrub	*sc,
+	struct list_head	*inode_records,
+	struct xfs_bitmap	*old_iallocbt_blocks)
+{
+	struct xrep_ibt		ri;
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_btree_cur	*cur;
+	xfs_agblock_t		nr_blocks;
+	int			error;
+
+	/* Collect all reverse mappings for inode blocks. */
+	ri.extlist = inode_records;
+	ri.btlist = old_iallocbt_blocks;
+	ri.nr_records = 0;
+	ri.sc = sc;
+
+	cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno);
+	error = xfs_rmap_query_all(cur, xrep_ibt_walk_rmap, &ri);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, error);
+
+	/* Do we have enough space to rebuild all inode trees? */
+	nr_blocks = xfs_iallocbt_calc_size(mp, ri.nr_records);
+	if (xfs_sb_version_hasfinobt(&mp->m_sb))
+		nr_blocks *= 2;
+	if (!xrep_ag_has_space(sc->sa.pag, nr_blocks, XFS_AG_RESV_NONE))
+		return -ENOSPC;
+
+	return 0;
+
+err:
+	xfs_btree_del_cursor(cur, error);
+	return error;
+}
+
+/* Update the AGI counters. */
+STATIC int
+xrep_ibt_reset_counters(
+	struct xfs_scrub	*sc,
+	struct list_head	*inode_records,
+	int			*log_flags)
+{
+	struct xfs_agi		*agi;
+	struct xrep_ibt_extent	*rie;
+	struct xfs_perag	*pag = sc->sa.pag;
+	unsigned int		count = 0;
+	unsigned int		usedcount = 0;
+	unsigned int		freecount;
+
+	/* Figure out the new counters. */
+	list_for_each_entry(rie, inode_records, list) {
+		count += rie->count;
+		usedcount += rie->usedcount;
+	}
+
+	agi = XFS_BUF_TO_AGI(sc->sa.agi_bp);
+	freecount = count - usedcount;
+
+	/* Trigger inode count recalculation */
+	xfs_force_summary_recalc(sc->mp);
+
+	/*
+	 * Reset the per-AG info, both incore and ondisk.  Mark the incore
+	 * state stale in case we fail out of here.
+	 */
+	ASSERT(pag->pagi_init);
+	pag->pagi_init = 0;
+	pag->pagi_count = count;
+	pag->pagi_freecount = freecount;
+
+	agi->agi_count = cpu_to_be32(count);
+	agi->agi_freecount = cpu_to_be32(freecount);
+	*log_flags |= XFS_AGI_COUNT | XFS_AGI_FREECOUNT;
+
+	return 0;
+}
+
+/* Initialize a new inode btree root and implant it into the AGI. */
+STATIC int
+xrep_ibt_reset_btree(
+	struct xfs_scrub	*sc,
+	xfs_btnum_t		btnum,
+	struct xfs_owner_info	*oinfo,
+	enum xfs_ag_resv_type	resv,
+	int			*log_flags)
+{
+	struct xfs_agi		*agi;
+	struct xfs_buf		*bp;
+	struct xfs_mount	*mp = sc->mp;
+	xfs_fsblock_t		fsbno;
+	int			error;
+
+	agi = XFS_BUF_TO_AGI(sc->sa.agi_bp);
+
+	/* Initialize new btree root. */
+	error = xrep_alloc_ag_block(sc, oinfo, &fsbno, resv);
+	if (error)
+		return error;
+	error = xrep_init_btblock(sc, fsbno, &bp, btnum, &xfs_inobt_buf_ops);
+	if (error)
+		return error;
+
+	switch (btnum) {
+	case XFS_BTNUM_INOi:
+		agi->agi_root = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, fsbno));
+		agi->agi_level = cpu_to_be32(1);
+		*log_flags |= XFS_AGI_ROOT | XFS_AGI_LEVEL;
+		break;
+	case XFS_BTNUM_FINOi:
+		agi->agi_free_root = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, fsbno));
+		agi->agi_free_level = cpu_to_be32(1);
+		*log_flags |= XFS_AGI_FREE_ROOT | XFS_AGI_FREE_LEVEL;
+		break;
+	default:
+		ASSERT(0);
+	}
+
+	return 0;
+}
+
+/* Initialize new inobt/finobt roots and implant them into the AGI. */
+STATIC int
+xrep_ibt_reset_btrees(
+	struct xfs_scrub	*sc,
+	struct xfs_owner_info	*oinfo,
+	int			*log_flags)
+{
+	enum xfs_ag_resv_type	resv;
+	int			error;
+
+	resv = XFS_AG_RESV_NONE;
+	error = xrep_ibt_reset_btree(sc, XFS_BTNUM_INO, oinfo, XFS_AG_RESV_NONE,
+			log_flags);
+	if (error || !xfs_sb_version_hasfinobt(&sc->mp->m_sb))
+		return error;
+
+	/*
+	 * If we made a per-AG reservation for the finobt then we must account
+	 * the new block correctly.
+	 */
+	if (!sc->mp->m_inotbt_nores)
+		resv = XFS_AG_RESV_METADATA;
+	return xrep_ibt_reset_btree(sc, XFS_BTNUM_FINO, oinfo, resv, log_flags);
+}
+
+/* Build new inode btrees and dispose of the old ones. */
+STATIC int
+xrep_ibt_rebuild_trees(
+	struct xfs_scrub	*sc,
+	struct list_head	*inode_records,
+	struct xfs_owner_info	*oinfo,
+	struct xfs_bitmap	*old_iallocbt_blocks)
+{
+	struct xrep_ibt_extent	*rie;
+	struct xrep_ibt_extent	*n;
+	int			error;
+
+	/* Add all records. */
+	list_sort(NULL, inode_records, xrep_ibt_extent_cmp);
+	list_for_each_entry_safe(rie, n, inode_records, list) {
+		error = xrep_ibt_insert_rec(sc, rie);
+		if (error)
+			return error;
+
+		list_del(&rie->list);
+		kmem_free(rie);
+	}
+
+	/* Free the old inode btree blocks if they're not in use. */
+	return xrep_reap_extents(sc, old_iallocbt_blocks, oinfo,
+			XFS_AG_RESV_NONE);
+}
+
+/*
+ * Make our new inode btree roots permanent so that we can start re-adding
+ * inode records back into the AG.
+ */
+STATIC int
+xrep_ibt_commit_new(
+	struct xfs_scrub	*sc,
+	struct xfs_bitmap	*old_iallocbt_blocks,
+	int			log_flags)
+{
+	int			error;
+
+	xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp, log_flags);
+
+	/* Invalidate all the inobt/finobt blocks in btlist. */
+	error = xrep_invalidate_blocks(sc, old_iallocbt_blocks);
+	if (error)
+		return error;
+	error = xrep_roll_ag_trans(sc);
+	if (error)
+		return error;
+
+	/*
+	 * Now that we've succeeded, mark the incore state valid again.  If the
+	 * finobt is enabled, make sure we reinitialize the per-AG reservations
+	 * when we're done.
+	 */
+	sc->sa.pag->pagi_init = 1;
+	if (xfs_sb_version_hasfinobt(&sc->mp->m_sb))
+		sc->reset_perag_resv = true;
+	return 0;
+}
+
+/* Repair both inode btrees. */
+int
+xrep_iallocbt(
+	struct xfs_scrub	*sc)
+{
+	struct xfs_owner_info	oinfo;
+	struct list_head	inode_records;
+	struct xfs_bitmap	old_iallocbt_blocks;
+	struct xfs_mount	*mp = sc->mp;
+	int			log_flags = 0;
+	int			error = 0;
+
+	/* We require the rmapbt to rebuild anything. */
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return -EOPNOTSUPP;
+
+	xchk_perag_get(sc->mp, &sc->sa);
+
+	/* Collect the inode records and find the old btree blocks. */
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT);
+	INIT_LIST_HEAD(&inode_records);
+	xfs_bitmap_init(&old_iallocbt_blocks);
+	error = xrep_ibt_find_inodes(sc, &inode_records, &old_iallocbt_blocks);
+	if (error)
+		goto out;
+
+	/*
+	 * Blow out the old inode btrees.  This is the point at which
+	 * we are no longer able to bail out gracefully.
+	 */
+	error = xrep_ibt_reset_counters(sc, &inode_records, &log_flags);
+	if (error)
+		goto out;
+	error = xrep_ibt_reset_btrees(sc, &oinfo, &log_flags);
+	if (error)
+		goto out;
+	error = xrep_ibt_commit_new(sc, &old_iallocbt_blocks, log_flags);
+	if (error)
+		goto out;
+
+	/* Now rebuild the inode information. */
+	error = xrep_ibt_rebuild_trees(sc, &inode_records, &oinfo,
+			&old_iallocbt_blocks);
+	if (error)
+		goto out;
+out:
+	xrep_ibt_cancel_inorecs(&inode_records);
+	xfs_bitmap_destroy(&old_iallocbt_blocks);
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index 17cf48564390..a44deb6f06ab 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -880,3 +880,23 @@ xrep_ino_dqattach(
 
 	return error;
 }
+
+/*
+ * Reinitialize the per-AG block reservation for the AG we just fixed.
+ */
+int
+xrep_reset_perag_resv(
+	struct xfs_scrub	*sc)
+{
+	int			error;
+
+	ASSERT(sc->ops->type == ST_PERAG);
+	ASSERT(sc->tp);
+
+	error = xfs_ag_resv_free(sc->sa.pag);
+	if (error)
+		goto out;
+	error = xfs_ag_resv_init(sc->sa.pag, sc->tp);
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index bc1a5f1cbcdc..0cc53dee3228 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -53,6 +53,7 @@ int xrep_find_ag_btree_roots(struct xfs_scrub *sc, struct xfs_buf *agf_bp,
 		struct xrep_find_ag_btree *btree_info, struct xfs_buf *agfl_bp);
 void xrep_force_quotacheck(struct xfs_scrub *sc, uint dqtype);
 int xrep_ino_dqattach(struct xfs_scrub *sc);
+int xrep_reset_perag_resv(struct xfs_scrub *sc);
 
 /* Metadata repairers */
 
@@ -62,6 +63,7 @@ int xrep_agf(struct xfs_scrub *sc);
 int xrep_agfl(struct xfs_scrub *sc);
 int xrep_agi(struct xfs_scrub *sc);
 int xrep_allocbt(struct xfs_scrub *sc);
+int xrep_iallocbt(struct xfs_scrub *sc);
 
 #else
 
@@ -83,12 +85,21 @@ xrep_calc_ag_resblks(
 	return 0;
 }
 
+static inline int
+xrep_reset_perag_resv(
+	struct xfs_scrub	*sc)
+{
+	ASSERT(0);
+	return -EOPNOTSUPP;
+}
+
 #define xrep_probe			xrep_notsupported
 #define xrep_superblock			xrep_notsupported
 #define xrep_agf			xrep_notsupported
 #define xrep_agfl			xrep_notsupported
 #define xrep_agi			xrep_notsupported
 #define xrep_allocbt			xrep_notsupported
+#define xrep_iallocbt			xrep_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 2133a3199372..631b0b06db99 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -244,14 +244,14 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.type	= ST_PERAG,
 		.setup	= xchk_setup_ag_iallocbt,
 		.scrub	= xchk_inobt,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_iallocbt,
 	},
 	[XFS_SCRUB_TYPE_FINOBT] = {	/* finobt */
 		.type	= ST_PERAG,
 		.setup	= xchk_setup_ag_iallocbt,
 		.scrub	= xchk_finobt,
 		.has	= xfs_sb_version_hasfinobt,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_iallocbt,
 	},
 	[XFS_SCRUB_TYPE_RMAPBT] = {	/* rmapbt */
 		.type	= ST_PERAG,
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index af323b229c4b..762db46fd696 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -64,6 +64,7 @@ struct xfs_scrub {
 	uint				ilock_flags;
 	bool				try_harder;
 	bool				has_quotaofflock;
+	bool				reset_perag_resv;
 
 	/* State tracking for single-AG operations. */
 	struct xchk_ag			sa;
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 26bd5dc68efe..9126dc66f726 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -552,7 +552,7 @@ DEFINE_EVENT(xrep_rmap_class, name, \
 		 uint64_t owner, uint64_t offset, unsigned int flags), \
 	TP_ARGS(mp, agno, agbno, len, owner, offset, flags))
 DEFINE_REPAIR_RMAP_EVENT(xrep_abt_walk_rmap);
-DEFINE_REPAIR_RMAP_EVENT(xrep_ialloc_extent_fn);
+DEFINE_REPAIR_RMAP_EVENT(xrep_ibt_walk_rmap);
 DEFINE_REPAIR_RMAP_EVENT(xrep_rmap_extent_fn);
 DEFINE_REPAIR_RMAP_EVENT(xrep_bmap_extent_fn);
 
@@ -700,7 +700,7 @@ TRACE_EVENT(xrep_reset_counters,
 		  MAJOR(__entry->dev), MINOR(__entry->dev))
 )
 
-TRACE_EVENT(xrep_ialloc_insert,
+TRACE_EVENT(xrep_ibt_insert,
 	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
 		 xfs_agino_t startino, uint16_t holemask, uint8_t count,
 		 uint8_t freecount, uint64_t freemask),



* [PATCH 09/16] xfs: repair refcount btrees
  2018-07-26  0:19 [PATCH v17 00/16] xfs-4.19: online repair support Darrick J. Wong
                   ` (7 preceding siblings ...)
  2018-07-26  0:21 ` [PATCH 08/16] xfs: repair inode btrees Darrick J. Wong
@ 2018-07-26  0:21 ` Darrick J. Wong
  2018-07-26  0:21 ` [PATCH 10/16] xfs: repair inode records Darrick J. Wong
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-07-26  0:21 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <darrick.wong@oracle.com>

Reconstruct the refcount data from the rmap btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
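A minimal userspace sketch of the bag sweep described in the comment at
the top of refcount_repair.c, for anyone who wants to poke at the
algorithm outside the kernel.  Everything in it is a stand-in: struct
rmap, struct refc, BAG_MAX, NO_BLOCK, push_at(), next_change() and
build_refcounts() are hypothetical names, the bag is a fixed-size array
rather than the list_head bag used in the patch, and CoW staging extents
and owner filtering are ignored.  Like xrep_refc_generate_refcounts(), it
emits a record only when the bag size changes and the old size was at
least 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's rmap and refcount records. */
struct rmap { uint32_t start, len; };
struct refc { uint32_t start, len, refcount; };

#define BAG_MAX  64          /* illustrative fixed bag capacity */
#define NO_BLOCK UINT32_MAX  /* plays the role of NULLAGBLOCK */

/* Move every unprocessed rmap that starts at bno into the bag. */
static int push_at(struct rmap *bag, size_t *bag_sz, const struct rmap *rmaps,
		   size_t *next, size_t nr, uint32_t bno)
{
	while (*next < nr && rmaps[*next].start == bno) {
		if (*bag_sz == BAG_MAX)
			return -1;
		bag[(*bag_sz)++] = rmaps[(*next)++];
	}
	return 0;
}

/* Next block at which the bag size can change. */
static uint32_t next_change(const struct rmap *bag, size_t bag_sz,
			    const struct rmap *rmaps, size_t next, size_t nr)
{
	uint32_t np = next < nr ? rmaps[next].start : NO_BLOCK;

	for (size_t i = 0; i < bag_sz; i++)
		if (bag[i].start + bag[i].len < np)
			np = bag[i].start + bag[i].len;
	return np;
}

/*
 * Sweep rmaps[] (sorted by start block) and store every extent whose
 * refcount is at least 2 in out[].  Returns the number of records, or
 * -1 if the bag or the output array overflows.
 */
static int build_refcounts(const struct rmap *rmaps, size_t nr,
			   struct refc *out, size_t out_max)
{
	struct rmap bag[BAG_MAX];
	size_t bag_sz = 0, old_sz, next = 0, kept;
	uint32_t sp, np;
	int nout = 0;

	for (size_t i = 1; i < nr; i++)
		assert(rmaps[i - 1].start <= rmaps[i].start);

	while (next < nr) {
		/* Start a new run at the next unprocessed rmap. */
		sp = rmaps[next].start;
		if (push_at(bag, &bag_sz, rmaps, &next, nr, sp))
			return -1;
		np = next_change(bag, bag_sz, rmaps, next, nr);
		old_sz = bag_sz;

		while (bag_sz) {
			/* Drop rmaps ending at np, add those starting there. */
			kept = 0;
			for (size_t i = 0; i < bag_sz; i++)
				if (bag[i].start + bag[i].len != np)
					bag[kept++] = bag[i];
			bag_sz = kept;
			if (push_at(bag, &bag_sz, rmaps, &next, nr, np))
				return -1;

			/* The level changed: close out the run [sp, np). */
			if (bag_sz != old_sz) {
				if (old_sz > 1) {
					if ((size_t)nout == out_max)
						return -1;
					out[nout++] = (struct refc){ sp, np - sp,
								     (uint32_t)old_sz };
				}
				sp = np;
			}
			if (bag_sz == 0)
				break;
			old_sz = bag_sz;
			np = next_change(bag, bag_sz, rmaps, next, nr);
		}
	}
	return nout;
}
```

Feeding it the rmaps (0,10), (5,10) and (5,3) yields two records,
(5,3,3) and (8,2,2); blocks with a single owner produce nothing, which
matches the "refcount < 2" rule in the comment.
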
 fs/xfs/Makefile                |    1 
 fs/xfs/scrub/refcount_repair.c |  586 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.h          |    2 
 fs/xfs/scrub/scrub.c           |    2 
 4 files changed, 590 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/refcount_repair.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index af1dc9aeb1a7..4ca97e026f94 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -167,6 +167,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   alloc_repair.o \
 				   bitmap.o \
 				   ialloc_repair.o \
+				   refcount_repair.o \
 				   repair.o \
 				   )
 endif
diff --git a/fs/xfs/scrub/refcount_repair.c b/fs/xfs/scrub/refcount_repair.c
new file mode 100644
index 000000000000..549e1adc972c
--- /dev/null
+++ b/fs/xfs/scrub/refcount_repair.c
@@ -0,0 +1,586 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_itable.h"
+#include "xfs_alloc.h"
+#include "xfs_ialloc.h"
+#include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_refcount.h"
+#include "xfs_refcount_btree.h"
+#include "xfs_error.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+#include "scrub/bitmap.h"
+
+/*
+ * Rebuilding the Reference Count Btree
+ * ====================================
+ *
+ * This algorithm is "borrowed" from xfs_repair.  Imagine the rmap
+ * entries as rectangles representing extents of physical blocks, and
+ * that the rectangles can be laid down to allow them to overlap each
+ * other; then we know that we must emit a refcnt btree entry wherever
+ * the amount of overlap changes, i.e. the emission stimulus is
+ * level-triggered:
+ *
+ *                 -    ---
+ *       --      ----- ----   ---        ------
+ * --   ----     ----------- ----     ---------
+ * -------------------------------- -----------
+ * ^ ^  ^^ ^^    ^ ^^ ^^^  ^^^^  ^ ^^ ^  ^     ^
+ * 2 1  23 21    3 43 234  2123  1 01 2  3     0
+ *
+ * For our purposes, an rmap is a tuple (startblock, len, fileoff, owner).
+ *
+ * Note that in the actual refcnt btree we don't store the refcount < 2
+ * cases because the bnobt tells us which blocks are free; single-use
+ * blocks aren't recorded in the bnobt or the refcntbt.  If the rmapbt
+ * supports storing multiple entries covering a given block we could
+ * theoretically dispense with the refcntbt and simply count rmaps, but
+ * that's inefficient in the (hot) write path, so we'll take the cost of
+ * the extra tree to save time.  Also there's no guarantee that rmap
+ * will be enabled.
+ *
+ * Given an array of rmaps sorted by physical block number, a starting
+ * physical block (sp), a bag to hold rmaps that cover sp, and the next
+ * physical block where the level changes (np), we can reconstruct the
+ * refcount btree as follows:
+ *
+ * While there are still unprocessed rmaps in the array,
+ *  - Set sp to the physical block (pblk) of the next unprocessed rmap.
+ *  - Add to the bag all rmaps in the array where startblock == sp.
+ *  - Set np to the physical block where the bag size will change.  This
+ *    is the minimum of (the pblk of the next unprocessed rmap) and
+ *    (startblock + len of each rmap in the bag).
+ *  - Record the bag size as old_bag_size.
+ *
+ *  - While the bag isn't empty,
+ *     - Remove from the bag all rmaps where startblock + len == np.
+ *     - Add to the bag all rmaps in the array where startblock == np.
+ *     - If the bag size isn't old_bag_size, set sp = np after storing
+ *       the refcount entry (sp, np - sp, old_bag_size) in the refcnt
+ *       btree; entries with a refcount below 2 are not stored.
+ *     - If the bag is empty, break out of the inner loop.
+ *     - Set old_bag_size to the bag size.
+ *     - Set np to the physical block where the bag size will change.
+ *       This is the minimum of (the pblk of the next unprocessed rmap)
+ *       and (startblock + len of each rmap in the bag).
+ *
+ * Like all the other repairers, we make a list of all the refcount
+ * records we need, then reinitialize the refcount btree root and
+ * insert all the records.
+ */
+
+struct xrep_refc_rmap {
+	struct list_head	list;
+	struct xfs_rmap_irec	rmap;
+};
+
+struct xrep_refc_extent {
+	struct list_head		list;
+	struct xfs_refcount_irec	refc;
+};
+
+struct xrep_refc {
+	struct list_head	rmap_bag;  /* rmaps we're tracking */
+	struct list_head	rmap_idle; /* idle rmaps */
+	struct list_head	*extlist;  /* refcount extents */
+	struct xfs_bitmap	*btlist;   /* old refcountbt blocks */
+	struct xfs_scrub	*sc;
+	unsigned long		nr_records;/* nr refcount extents */
+	xfs_extlen_t		btblocks;  /* # of refcountbt blocks */
+};
+
+/* Grab the next record from the rmapbt. */
+STATIC int
+xrep_refc_next_rmap(
+	struct xfs_btree_cur	*cur,
+	struct xrep_refc	*rr,
+	struct xfs_rmap_irec	*rec,
+	bool			*have_rec)
+{
+	struct xfs_rmap_irec	rmap;
+	struct xfs_mount	*mp = cur->bc_mp;
+	struct xrep_refc_extent	*rre;
+	xfs_fsblock_t		fsbno;
+	int			have_gt;
+	int			error = 0;
+
+	*have_rec = false;
+	/*
+	 * Loop through the remaining rmaps.  Remember CoW staging
+	 * extents and the refcountbt blocks from the old tree for later
+	 * disposal.  We can only share written data fork extents, so
+	 * keep looping until we find an rmap for one.
+	 */
+	do {
+		if (xchk_should_terminate(rr->sc, &error))
+			goto out_error;
+
+		error = xfs_btree_increment(cur, 0, &have_gt);
+		if (error)
+			goto out_error;
+		if (!have_gt)
+			return 0;
+
+		error = xfs_rmap_get_rec(cur, &rmap, &have_gt);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(mp, have_gt == 1, out_error);
+
+		if (rmap.rm_owner == XFS_RMAP_OWN_COW) {
+			/* Pass CoW staging extents right through. */
+			rre = kmem_alloc(sizeof(*rre), KM_MAYFAIL);
+			error = rre ? 0 : -ENOMEM;
+			if (error)
+				goto out_error;
+
+			INIT_LIST_HEAD(&rre->list);
+			rre->refc.rc_startblock = rmap.rm_startblock +
+					XFS_REFC_COW_START;
+			rre->refc.rc_blockcount = rmap.rm_blockcount;
+			rre->refc.rc_refcount = 1;
+			list_add_tail(&rre->list, rr->extlist);
+		} else if (rmap.rm_owner == XFS_RMAP_OWN_REFC) {
+			/* refcountbt block, dump it when we're done. */
+			rr->btblocks += rmap.rm_blockcount;
+			fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
+					cur->bc_private.a.agno,
+					rmap.rm_startblock);
+			error = xfs_bitmap_set(rr->btlist, fsbno,
+					rmap.rm_blockcount);
+			if (error)
+				goto out_error;
+		}
+	} while (XFS_RMAP_NON_INODE_OWNER(rmap.rm_owner) ||
+		 xfs_internal_inum(mp, rmap.rm_owner) ||
+		 (rmap.rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK |
+				   XFS_RMAP_UNWRITTEN)));
+
+	*rec = rmap;
+	*have_rec = true;
+	return 0;
+
+out_error:
+	return error;
+}
+
+/* Recycle an idle rmap or allocate a new one. */
+static struct xrep_refc_rmap *
+xrep_refc_get_rmap(
+	struct xrep_refc	*rr)
+{
+	struct xrep_refc_rmap	*rrm;
+
+	if (list_empty(&rr->rmap_idle)) {
+		rrm = kmem_alloc(sizeof(struct xrep_refc_rmap), KM_MAYFAIL);
+		if (!rrm)
+			return NULL;
+		INIT_LIST_HEAD(&rrm->list);
+		return rrm;
+	}
+
+	rrm = list_first_entry(&rr->rmap_idle, struct xrep_refc_rmap, list);
+	list_del_init(&rrm->list);
+	return rrm;
+}
+
+/* Compare two btree extents. */
+static int
+xrep_refcount_extent_cmp(
+	void			*priv,
+	struct list_head	*a,
+	struct list_head	*b)
+{
+	struct xrep_refc_extent	*ap;
+	struct xrep_refc_extent	*bp;
+
+	ap = container_of(a, struct xrep_refc_extent, list);
+	bp = container_of(b, struct xrep_refc_extent, list);
+
+	if (ap->refc.rc_startblock > bp->refc.rc_startblock)
+		return 1;
+	else if (ap->refc.rc_startblock < bp->refc.rc_startblock)
+		return -1;
+	return 0;
+}
+
+/* Record a reference count extent. */
+STATIC int
+xrep_refc_new_refc(
+	struct xfs_scrub		*sc,
+	struct xrep_refc		*rr,
+	xfs_agblock_t			agbno,
+	xfs_extlen_t			len,
+	xfs_nlink_t			refcount)
+{
+	struct xrep_refc_extent		*rre;
+	struct xfs_refcount_irec	irec;
+
+	irec.rc_startblock = agbno;
+	irec.rc_blockcount = len;
+	irec.rc_refcount = refcount;
+
+	trace_xrep_refcount_extent_fn(sc->mp, sc->sa.agno, &irec);
+
+	rre = kmem_alloc(sizeof(struct xrep_refc_extent), KM_MAYFAIL);
+	if (!rre)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&rre->list);
+	rre->refc = irec;
+	list_add_tail(&rre->list, rr->extlist);
+
+	return 0;
+}
+
+/* Iterate all the rmap records to generate reference count data. */
+#define RMAP_NEXT(r)	((r).rm_startblock + (r).rm_blockcount)
+STATIC int
+xrep_refc_generate_refcounts(
+	struct xfs_scrub	*sc,
+	struct xrep_refc	*rr)
+{
+	struct xfs_rmap_irec	rmap;
+	struct xfs_btree_cur	*cur;
+	struct xrep_refc_rmap	*rrm;
+	struct xrep_refc_rmap	*n;
+	xfs_agblock_t		sbno;
+	xfs_agblock_t		cbno;
+	xfs_agblock_t		nbno;
+	size_t			old_stack_sz;
+	size_t			stack_sz = 0;
+	bool			have;
+	int			have_gt;
+	int			error;
+
+	/* Start the rmapbt cursor to the left of all records. */
+	cur = xfs_rmapbt_init_cursor(sc->mp, sc->tp, sc->sa.agf_bp,
+			sc->sa.agno);
+	error = xfs_rmap_lookup_le(cur, 0, 0, 0, 0, 0, &have_gt);
+	if (error)
+		goto out;
+	ASSERT(have_gt == 0);
+
+	/* Process reverse mappings into refcount data. */
+	while (xfs_btree_has_more_records(cur)) {
+		/* Push all rmaps with pblk == sbno onto the stack */
+		error = xrep_refc_next_rmap(cur, rr, &rmap, &have);
+		if (error)
+			goto out;
+		if (!have)
+			break;
+		sbno = cbno = rmap.rm_startblock;
+		while (have && rmap.rm_startblock == sbno) {
+			rrm = xrep_refc_get_rmap(rr);
+			if (!rrm)
+				goto out;
+			rrm->rmap = rmap;
+			list_add_tail(&rrm->list, &rr->rmap_bag);
+			stack_sz++;
+			error = xrep_refc_next_rmap(cur, rr, &rmap, &have);
+			if (error)
+				goto out;
+		}
+		error = xfs_btree_decrement(cur, 0, &have_gt);
+		if (error)
+			goto out;
+		XFS_WANT_CORRUPTED_GOTO(sc->mp, have_gt, out);
+
+		/* Set nbno to the bno of the next refcount change */
+		nbno = have ? rmap.rm_startblock : NULLAGBLOCK;
+		list_for_each_entry(rrm, &rr->rmap_bag, list)
+			nbno = min_t(xfs_agblock_t, nbno, RMAP_NEXT(rrm->rmap));
+
+		ASSERT(nbno > sbno);
+		old_stack_sz = stack_sz;
+
+		/* While stack isn't empty... */
+		while (stack_sz) {
+			/* Pop all rmaps that end at nbno */
+			list_for_each_entry_safe(rrm, n, &rr->rmap_bag, list) {
+				if (RMAP_NEXT(rrm->rmap) != nbno)
+					continue;
+				stack_sz--;
+				list_move(&rrm->list, &rr->rmap_idle);
+			}
+
+			/* Push array items that start at nbno */
+			error = xrep_refc_next_rmap(cur, rr, &rmap, &have);
+			if (error)
+				goto out;
+			while (have && rmap.rm_startblock == nbno) {
+				rrm = xrep_refc_get_rmap(rr);
+				if (!rrm)
+					goto out;
+				rrm->rmap = rmap;
+				list_add_tail(&rrm->list, &rr->rmap_bag);
+				stack_sz++;
+				error = xrep_refc_next_rmap(cur, rr, &rmap,
+						&have);
+				if (error)
+					goto out;
+			}
+			error = xfs_btree_decrement(cur, 0, &have_gt);
+			if (error)
+				goto out;
+			XFS_WANT_CORRUPTED_GOTO(sc->mp, have_gt, out);
+
+			/* Emit refcount if necessary */
+			ASSERT(nbno > cbno);
+			if (stack_sz != old_stack_sz) {
+				if (old_stack_sz > 1) {
+					error = xrep_refc_new_refc(sc, rr, cbno,
+							nbno - cbno,
+							old_stack_sz);
+					if (error)
+						goto out;
+					rr->nr_records++;
+				}
+				cbno = nbno;
+			}
+
+			/* Stack empty, go find the next rmap */
+			if (stack_sz == 0)
+				break;
+			old_stack_sz = stack_sz;
+			sbno = nbno;
+
+			/* Set nbno to the bno of the next refcount change */
+			nbno = have ? rmap.rm_startblock : NULLAGBLOCK;
+			list_for_each_entry(rrm, &rr->rmap_bag, list)
+				nbno = min_t(xfs_agblock_t, nbno,
+						RMAP_NEXT(rrm->rmap));
+
+			ASSERT(nbno > sbno);
+		}
+	}
+
+	/* Free all the leftover rmap records. */
+	list_for_each_entry_safe(rrm, n, &rr->rmap_idle, list) {
+		list_del(&rrm->list);
+		kmem_free(rrm);
+	}
+
+	ASSERT(list_empty(&rr->rmap_bag));
+out:
+	xfs_btree_del_cursor(cur, error);
+	return error;
+}
+#undef RMAP_NEXT
+
+/*
+ * Generate all the reference counts for this AG and a list of the old
+ * refcount btree blocks.  Figure out if we have enough free space to
+ * reconstruct the refcount btree.  The caller must clean up the lists if
+ * anything goes wrong.
+ */
+STATIC int
+xrep_refc_find_refcounts(
+	struct xfs_scrub	*sc,
+	struct list_head	*refcount_records,
+	struct xfs_bitmap	*old_refcountbt_blocks)
+{
+	struct xrep_refc	rr;
+	struct xrep_refc_rmap	*rrm;
+	struct xrep_refc_rmap	*n;
+	struct xfs_mount	*mp = sc->mp;
+	int			error;
+
+	INIT_LIST_HEAD(&rr.rmap_bag);
+	INIT_LIST_HEAD(&rr.rmap_idle);
+	rr.extlist = refcount_records;
+	rr.btlist = old_refcountbt_blocks;
+	rr.btblocks = 0;
+	rr.sc = sc;
+	rr.nr_records = 0;
+
+	/* Generate all the refcount records. */
+	error = xrep_refc_generate_refcounts(sc, &rr);
+	if (error)
+		goto out;
+
+	/* Do we actually have enough space to do this? */
+	if (!xrep_ag_has_space(sc->sa.pag,
+			xfs_refcountbt_calc_size(mp, rr.nr_records),
+			XFS_AG_RESV_METADATA)) {
+		error = -ENOSPC;
+		goto out;
+	}
+
+out:
+	list_for_each_entry_safe(rrm, n, &rr.rmap_idle, list) {
+		list_del(&rrm->list);
+		kmem_free(rrm);
+	}
+	list_for_each_entry_safe(rrm, n, &rr.rmap_bag, list) {
+		list_del(&rrm->list);
+		kmem_free(rrm);
+	}
+	return error;
+}
+
+/* Initialize new refcountbt root and implant it into the AGF. */
+STATIC int
+xrep_refc_reset_btree(
+	struct xfs_scrub	*sc,
+	struct xfs_owner_info	*oinfo,
+	int			*log_flags)
+{
+	struct xfs_buf		*bp;
+	struct xfs_agf		*agf;
+	xfs_fsblock_t		btfsb;
+	int			error;
+
+	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+
+	/* Initialize a new refcountbt root. */
+	error = xrep_alloc_ag_block(sc, oinfo, &btfsb, XFS_AG_RESV_METADATA);
+	if (error)
+		return error;
+	error = xrep_init_btblock(sc, btfsb, &bp, XFS_BTNUM_REFC,
+			&xfs_refcountbt_buf_ops);
+	if (error)
+		return error;
+	agf->agf_refcount_root = cpu_to_be32(XFS_FSB_TO_AGBNO(sc->mp, btfsb));
+	agf->agf_refcount_level = cpu_to_be32(1);
+	agf->agf_refcount_blocks = cpu_to_be32(1);
+	*log_flags |= XFS_AGF_REFCOUNT_BLOCKS | XFS_AGF_REFCOUNT_ROOT |
+		      XFS_AGF_REFCOUNT_LEVEL;
+
+	return 0;
+}
+
+/* Build new refcount btree and dispose of the old one. */
+STATIC int
+xrep_refc_rebuild_tree(
+	struct xfs_scrub	*sc,
+	struct list_head	*refcount_records,
+	struct xfs_owner_info	*oinfo,
+	struct xfs_bitmap	*old_refcountbt_blocks)
+{
+	struct xrep_refc_extent	*rre;
+	struct xrep_refc_extent	*n;
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_btree_cur	*cur;
+	int			have_gt;
+	int			error;
+
+	/* Add all records. */
+	list_sort(NULL, refcount_records, xrep_refcount_extent_cmp);
+	list_for_each_entry_safe(rre, n, refcount_records, list) {
+		/* Insert into the refcountbt. */
+		cur = xfs_refcountbt_init_cursor(mp, sc->tp, sc->sa.agf_bp,
+				sc->sa.agno, NULL);
+		error = xfs_refcount_lookup_eq(cur, rre->refc.rc_startblock,
+				&have_gt);
+		if (error)
+			return error;
+		XFS_WANT_CORRUPTED_RETURN(mp, have_gt == 0);
+		error = xfs_refcount_insert(cur, &rre->refc, &have_gt);
+		if (error)
+			return error;
+		XFS_WANT_CORRUPTED_RETURN(mp, have_gt == 1);
+		xfs_btree_del_cursor(cur, error);
+		cur = NULL;
+
+		error = xrep_roll_ag_trans(sc);
+		if (error)
+			return error;
+
+		list_del(&rre->list);
+		kmem_free(rre);
+	}
+
+	/* Free the old refcountbt blocks if they're not in use. */
+	return xrep_reap_extents(sc, old_refcountbt_blocks, oinfo,
+			XFS_AG_RESV_METADATA);
+}
+
+/* Free every record in the refcount list. */
+STATIC void
+xrep_refc_cancel_recs(
+	struct list_head	*recs)
+{
+	struct xrep_refc_extent	*rre;
+	struct xrep_refc_extent	*n;
+
+	list_for_each_entry_safe(rre, n, recs, list) {
+		list_del(&rre->list);
+		kmem_free(rre);
+	}
+}
+
+/* Rebuild the refcount btree. */
+int
+xrep_refcountbt(
+	struct xfs_scrub	*sc)
+{
+	struct xfs_owner_info	oinfo;
+	struct list_head	refcount_records;
+	struct xfs_bitmap	old_refcountbt_blocks;
+	struct xfs_mount	*mp = sc->mp;
+	int			log_flags = 0;
+	int			error;
+
+	/* We require the rmapbt to rebuild anything. */
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return -EOPNOTSUPP;
+
+	xchk_perag_get(sc->mp, &sc->sa);
+
+	/* Collect all reference counts. */
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_REFC);
+	INIT_LIST_HEAD(&refcount_records);
+	xfs_bitmap_init(&old_refcountbt_blocks);
+	error = xrep_refc_find_refcounts(sc, &refcount_records,
+			&old_refcountbt_blocks);
+	if (error)
+		goto out;
+
+	/*
+	 * Blow out the old refcount btrees.  This is the point at which
+	 * we are no longer able to bail out gracefully.
+	 */
+	error = xrep_refc_reset_btree(sc, &oinfo, &log_flags);
+	if (error)
+		goto out;
+	xfs_alloc_log_agf(sc->tp, sc->sa.agf_bp, log_flags);
+
+	/* Invalidate all the old refcountbt blocks in btlist. */
+	error = xrep_invalidate_blocks(sc, &old_refcountbt_blocks);
+	if (error)
+		goto out;
+	error = xrep_roll_ag_trans(sc);
+	if (error)
+		goto out;
+
+	/* Now rebuild the refcount information. */
+	error = xrep_refc_rebuild_tree(sc, &refcount_records, &oinfo,
+			&old_refcountbt_blocks);
+	if (error)
+		goto out;
+	sc->reset_perag_resv = true;
+out:
+	xfs_bitmap_destroy(&old_refcountbt_blocks);
+	xrep_refc_cancel_recs(&refcount_records);
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 0cc53dee3228..da12c20376ae 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -64,6 +64,7 @@ int xrep_agfl(struct xfs_scrub *sc);
 int xrep_agi(struct xfs_scrub *sc);
 int xrep_allocbt(struct xfs_scrub *sc);
 int xrep_iallocbt(struct xfs_scrub *sc);
+int xrep_refcountbt(struct xfs_scrub *sc);
 
 #else
 
@@ -100,6 +101,7 @@ xrep_reset_perag_resv(
 #define xrep_agi			xrep_notsupported
 #define xrep_allocbt			xrep_notsupported
 #define xrep_iallocbt			xrep_notsupported
+#define xrep_refcountbt			xrep_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 631b0b06db99..843eafe0acef 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -265,7 +265,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.setup	= xchk_setup_ag_refcountbt,
 		.scrub	= xchk_refcountbt,
 		.has	= xfs_sb_version_hasreflink,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_refcountbt,
 	},
 	[XFS_SCRUB_TYPE_INODE] = {	/* inode record */
 		.type	= ST_INODE,



* [PATCH 10/16] xfs: repair inode records
  2018-07-26  0:19 [PATCH v17 00/16] xfs-4.19: online repair support Darrick J. Wong
                   ` (8 preceding siblings ...)
  2018-07-26  0:21 ` [PATCH 09/16] xfs: repair refcount btrees Darrick J. Wong
@ 2018-07-26  0:21 ` Darrick J. Wong
  2018-07-26  0:21 ` [PATCH 11/16] xfs: zap broken inode forks Darrick J. Wong
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-07-26  0:21 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <darrick.wong@oracle.com>

Try to reinitialize corrupt inodes, or clear the reflink flag
if it's not needed.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile             |    1 
 fs/xfs/libxfs/xfs_format.h  |    3 
 fs/xfs/scrub/inode_repair.c |  659 +++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.h       |    2 
 fs/xfs/scrub/scrub.c        |    2 
 5 files changed, 665 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/inode_repair.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 4ca97e026f94..e01b5003d543 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -167,6 +167,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   alloc_repair.o \
 				   bitmap.o \
 				   ialloc_repair.o \
+				   inode_repair.o \
 				   refcount_repair.o \
 				   repair.o \
 				   )
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 059bc44c27e8..d4ebf1a4f3e8 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -973,7 +973,8 @@ typedef enum xfs_dinode_fmt {
 #define XFS_DFORK_APTR(dip)	\
 	(XFS_DFORK_DPTR(dip) + XFS_DFORK_BOFF(dip))
 #define XFS_DFORK_PTR(dip,w)	\
-	((w) == XFS_DATA_FORK ? XFS_DFORK_DPTR(dip) : XFS_DFORK_APTR(dip))
+	((void *)((w) == XFS_DATA_FORK ? XFS_DFORK_DPTR(dip) : \
+					 XFS_DFORK_APTR(dip)))
 
 #define XFS_DFORK_FORMAT(dip,w) \
 	((w) == XFS_DATA_FORK ? \
diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c
new file mode 100644
index 000000000000..5ff929ea3d11
--- /dev/null
+++ b/fs/xfs/scrub/inode_repair.c
@@ -0,0 +1,659 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_inode_buf.h"
+#include "xfs_inode_fork.h"
+#include "xfs_ialloc.h"
+#include "xfs_da_format.h"
+#include "xfs_reflink.h"
+#include "xfs_rmap.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
+#include "xfs_dir2.h"
+#include "xfs_quota_defs.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+
+/*
+ * Inode Repair
+ *
+ * Roughly speaking, inode problems can be classified based on whether or not
+ * they trip the dinode verifiers.  If those trip, then we won't be able to
+ * _iget ourselves the inode.
+ *
+ * Therefore, the xrep_dinode_* functions fix anything that will trip the
+ * inode buffer verifier or the dinode verifier.  The xrep_inode_* functions
+ * fix things on live incore inodes.
+ */
+
+/* Make sure this buffer can pass the inode buffer verifier. */
+STATIC void
+xrep_dinode_buf(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*bp)
+{
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_trans	*tp = sc->tp;
+	struct xfs_dinode	*dip;
+	xfs_agnumber_t		agno;
+	xfs_agino_t		agino;
+	int			ioff;
+	int			i;
+	int			ni;
+	bool			crc_ok;
+	bool			magic_ok;
+	bool			unlinked_ok;
+
+	ni = XFS_BB_TO_FSB(mp, bp->b_length) * mp->m_sb.sb_inopblock;
+	agno = xfs_daddr_to_agno(mp, XFS_BUF_ADDR(bp));
+	for (i = 0; i < ni; i++) {
+		ioff = i << mp->m_sb.sb_inodelog;
+		dip = xfs_buf_offset(bp, ioff);
+		agino = be32_to_cpu(dip->di_next_unlinked);
+
+		unlinked_ok = magic_ok = crc_ok = false;
+
+		if (agino == NULLAGINO || xfs_verify_agino(sc->mp, agno, agino))
+			unlinked_ok = true;
+
+		if (dip->di_magic == cpu_to_be16(XFS_DINODE_MAGIC) &&
+		    xfs_dinode_good_version(mp, dip->di_version))
+			magic_ok = true;
+
+		if (xfs_verify_cksum((char *)dip, mp->m_sb.sb_inodesize,
+				XFS_DINODE_CRC_OFF))
+			crc_ok = true;
+
+		if (magic_ok && unlinked_ok && crc_ok)
+			continue;
+
+		if (!magic_ok) {
+			dip->di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
+			dip->di_version = 3;
+		}
+		if (!unlinked_ok)
+			dip->di_next_unlinked = cpu_to_be32(NULLAGINO);
+		xfs_dinode_calc_crc(mp, dip);
+		xfs_trans_buf_set_type(tp, bp, XFS_BLFT_DINO_BUF);
+		xfs_trans_log_buf(tp, bp, ioff, ioff + sizeof(*dip) - 1);
+	}
+}
+
+/* Reinitialize things that never change in an inode. */
+STATIC void
+xrep_dinode_header(
+	struct xfs_scrub	*sc,
+	struct xfs_dinode	*dip)
+{
+	dip->di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
+	if (!xfs_dinode_good_version(sc->mp, dip->di_version))
+		dip->di_version = 3;
+	dip->di_ino = cpu_to_be64(sc->sm->sm_ino);
+	uuid_copy(&dip->di_uuid, &sc->mp->m_sb.sb_meta_uuid);
+	dip->di_gen = cpu_to_be32(sc->sm->sm_gen);
+}
+
+/*
+ * Turn di_mode into /something/ recognizable.
+ *
+ * XXX: Ideally we'd try to read data block 0 to see if it's a directory.
+ */
+STATIC void
+xrep_dinode_mode(
+	struct xfs_dinode	*dip)
+{
+	uint16_t		mode;
+
+	mode = be16_to_cpu(dip->di_mode);
+	if (mode == 0 || xfs_mode_to_ftype(mode) != XFS_DIR3_FT_UNKNOWN)
+		return;
+
+	/* bad mode, so we set it to a file that only root can read */
+	mode = S_IFREG;
+	dip->di_mode = cpu_to_be16(mode);
+	dip->di_uid = 0;
+	dip->di_gid = 0;
+}
+
+/* Fix any conflicting flags that the verifiers complain about. */
+STATIC void
+xrep_dinode_flags(
+	struct xfs_scrub	*sc,
+	struct xfs_dinode	*dip)
+{
+	struct xfs_mount	*mp = sc->mp;
+	uint64_t		flags2;
+	uint16_t		mode;
+	uint16_t		flags;
+
+	mode = be16_to_cpu(dip->di_mode);
+	flags = be16_to_cpu(dip->di_flags);
+	flags2 = be64_to_cpu(dip->di_flags2);
+
+	if (xfs_sb_version_hasreflink(&mp->m_sb) && S_ISREG(mode))
+		flags2 |= XFS_DIFLAG2_REFLINK;
+	else
+		flags2 &= ~(XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE);
+	if (flags & XFS_DIFLAG_REALTIME)
+		flags2 &= ~XFS_DIFLAG2_REFLINK;
+	if (flags2 & XFS_DIFLAG2_REFLINK)
+		flags2 &= ~XFS_DIFLAG2_DAX;
+	dip->di_flags = cpu_to_be16(flags);
+	dip->di_flags2 = cpu_to_be64(flags2);
+}
+
+/*
+ * Blow out symlink; now it points to the current dir.  We don't have to worry
+ * about incore state because this inode is failing the verifiers.
+ */
+STATIC void
+xrep_dinode_zap_symlink(
+	struct xfs_dinode	*dip)
+{
+	char			*p;
+
+	dip->di_format = XFS_DINODE_FMT_LOCAL;
+	dip->di_size = cpu_to_be64(1);
+	p = XFS_DFORK_PTR(dip, XFS_DATA_FORK);
+	*p = '.';
+}
+
+/*
+ * Blow out dir, make it point to the root.  In the future repair will
+ * reconstruct this directory for us.  Note that there's no in-core directory
+ * inode because the sf verifier tripped, so we don't have to worry about the
+ * dentry cache.
+ */
+STATIC void
+xrep_dinode_zap_dir(
+	struct xfs_mount		*mp,
+	struct xfs_dinode		*dip)
+{
+	const struct xfs_dir_ops	*ops;
+	struct xfs_dir2_sf_hdr		*sfp;
+	int				i8count;
+
+	dip->di_format = XFS_DINODE_FMT_LOCAL;
+	i8count = mp->m_sb.sb_rootino > XFS_DIR2_MAX_SHORT_INUM;
+	ops = xfs_dir_get_ops(mp, NULL);
+	sfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK);
+	sfp->count = 0;
+	sfp->i8count = i8count;
+	ops->sf_put_parent_ino(sfp, mp->m_sb.sb_rootino);
+	dip->di_size = cpu_to_be64(xfs_dir2_sf_hdr_size(i8count));
+}
+
+/* Make sure we don't have a garbage file size. */
+STATIC void
+xrep_dinode_size(
+	struct xfs_mount	*mp,
+	struct xfs_dinode	*dip)
+{
+	uint64_t		size;
+	uint16_t		mode;
+
+	mode = be16_to_cpu(dip->di_mode);
+	size = be64_to_cpu(dip->di_size);
+	switch (mode & S_IFMT) {
+	case S_IFIFO:
+	case S_IFCHR:
+	case S_IFBLK:
+	case S_IFSOCK:
+		/* di_size can't be nonzero for special files */
+		dip->di_size = 0;
+		break;
+	case S_IFREG:
+		/* Regular files can't be larger than 2^63-1 bytes. */
+		dip->di_size = cpu_to_be64(size & ~(1ULL << 63));
+		break;
+	case S_IFLNK:
+		/*
+		 * Truncate ridiculously oversized symlinks.  If the size is
+		 * zero, reset it to point to the current directory.  Both of
+		 * these conditions trigger dinode verifier errors, so there
+		 * is no in-core state to reset.
+		 */
+		if (size > XFS_SYMLINK_MAXLEN)
+			dip->di_size = cpu_to_be64(XFS_SYMLINK_MAXLEN);
+		else if (size == 0)
+			xrep_dinode_zap_symlink(dip);
+		break;
+	case S_IFDIR:
+		/*
+		 * Directories can't have a size larger than 32G.  If the size
+		 * is zero, reset it to an empty directory.  Both of these
+		 * conditions trigger dinode verifier errors, so there is no
+		 * in-core state to reset.
+		 */
+		if (size > XFS_DIR2_SPACE_SIZE)
+			dip->di_size = cpu_to_be64(XFS_DIR2_SPACE_SIZE);
+		else if (size == 0)
+			xrep_dinode_zap_dir(mp, dip);
+		break;
+	}
+}
+
+/* Fix extent size hints. */
+STATIC void
+xrep_dinode_extsize_hints(
+	struct xfs_scrub	*sc,
+	struct xfs_dinode	*dip)
+{
+	struct xfs_mount	*mp = sc->mp;
+	uint64_t		flags2;
+	uint16_t		flags;
+	uint16_t		mode;
+	xfs_failaddr_t		fa;
+
+	mode = be16_to_cpu(dip->di_mode);
+	flags = be16_to_cpu(dip->di_flags);
+	flags2 = be64_to_cpu(dip->di_flags2);
+
+	fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize),
+			mode, flags);
+	if (fa) {
+		dip->di_extsize = 0;
+		dip->di_flags &= ~cpu_to_be16(XFS_DIFLAG_EXTSIZE |
+					      XFS_DIFLAG_EXTSZINHERIT);
+	}
+
+	if (dip->di_version < 3)
+		return;
+
+	fa = xfs_inode_validate_cowextsize(mp, be32_to_cpu(dip->di_cowextsize),
+			mode, flags, flags2);
+	if (fa) {
+		dip->di_cowextsize = 0;
+		dip->di_flags2 &= ~cpu_to_be64(XFS_DIFLAG2_COWEXTSIZE);
+	}
+}
+
+/* Inode didn't pass verifiers, so fix the raw buffer and retry iget. */
+STATIC int
+xrep_dinode_core(
+	struct xfs_scrub	*sc)
+{
+	struct xfs_imap		imap;
+	struct xfs_buf		*bp;
+	struct xfs_dinode	*dip;
+	xfs_ino_t		ino;
+	bool			inuse;
+	int			error;
+
+	/* Map & read inode. */
+	ino = sc->sm->sm_ino;
+	error = xfs_imap(sc->mp, sc->tp, ino, &imap, XFS_IGET_UNTRUSTED);
+	if (error)
+		return error;
+
+	error = xfs_trans_read_buf(sc->mp, sc->tp, sc->mp->m_ddev_targp,
+			imap.im_blkno, imap.im_len, XBF_UNMAPPED, &bp, NULL);
+	if (error)
+		return error;
+
+	/* Make absolutely sure this inode isn't in core. */
+	error = xfs_icache_inode_is_allocated(sc->mp, sc->tp, ino, &inuse);
+	if (error == 0) {
+		ASSERT(0);
+		return -EFSCORRUPTED;
+	}
+
+	/* Make sure we can pass the inode buffer verifier. */
+	xrep_dinode_buf(sc, bp);
+	bp->b_ops = &xfs_inode_buf_ops;
+
+	/* Fix everything the verifier will complain about. */
+	dip = xfs_buf_offset(bp, imap.im_boffset);
+	xrep_dinode_header(sc, dip);
+	xrep_dinode_mode(dip);
+	xrep_dinode_flags(sc, dip);
+	xrep_dinode_size(sc->mp, dip);
+	xrep_dinode_extsize_hints(sc, dip);
+
+	/* Write out the inode... */
+	xfs_dinode_calc_crc(sc->mp, dip);
+	xfs_trans_buf_set_type(sc->tp, bp, XFS_BLFT_DINO_BUF);
+	xfs_trans_log_buf(sc->tp, bp, imap.im_boffset,
+			imap.im_boffset + sc->mp->m_sb.sb_inodesize - 1);
+	error = xfs_trans_commit(sc->tp);
+	if (error)
+		return error;
+	sc->tp = NULL;
+
+	/* ...and reload it? */
+	error = xfs_iget(sc->mp, sc->tp, ino,
+			XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE, 0, &sc->ip);
+	if (error)
+		return error;
+	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+	error = xchk_trans_alloc(sc, 0);
+	if (error)
+		return error;
+	sc->ilock_flags |= XFS_ILOCK_EXCL;
+	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
+
+	return 0;
+}
+
+/* Fix everything xfs_dinode_verify cares about. */
+STATIC int
+xrep_dinode_problems(
+	struct xfs_scrub	*sc)
+{
+	int			error;
+
+	error = xrep_dinode_core(sc);
+	if (error)
+		return error;
+
+	/* We had to fix a totally busted inode, schedule quotacheck. */
+	if (XFS_IS_UQUOTA_ON(sc->mp))
+		xrep_force_quotacheck(sc, XFS_DQ_USER);
+	if (XFS_IS_GQUOTA_ON(sc->mp))
+		xrep_force_quotacheck(sc, XFS_DQ_GROUP);
+	if (XFS_IS_PQUOTA_ON(sc->mp))
+		xrep_force_quotacheck(sc, XFS_DQ_PROJ);
+
+	return 0;
+}
+
+/*
+ * Fix problems that the verifiers don't care about.  In general these are
+ * easily detected errors that don't cause problems elsewhere in the kernel,
+ * so we don't check them all that rigorously.
+ */
+
+/* Make sure block and extent counts are ok. */
+STATIC int
+xrep_inode_blockcounts(
+	struct xfs_scrub	*sc)
+{
+	xfs_filblks_t		count;
+	xfs_filblks_t		acount;
+	xfs_extnum_t		nextents;
+	int			error;
+
+	/* Set data fork counters from the data fork mappings. */
+	error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_DATA_FORK,
+			&nextents, &count);
+	if (error)
+		return error;
+	if (XFS_IS_REALTIME_INODE(sc->ip)) {
+		if (count >= sc->mp->m_sb.sb_rblocks)
+			return -EFSCORRUPTED;
+	} else if (!xfs_sb_version_hasreflink(&sc->mp->m_sb)) {
+		if (count >= sc->mp->m_sb.sb_dblocks)
+			return -EFSCORRUPTED;
+	}
+	sc->ip->i_d.di_nextents = nextents;
+
+	/* Set attr fork counters from the attr fork mappings. */
+	error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_ATTR_FORK,
+			&nextents, &acount);
+	if (error)
+		return error;
+	if (acount >= sc->mp->m_sb.sb_dblocks)
+		return -EFSCORRUPTED;
+	if (nextents >= (uint16_t)-1U)
+		return -EFSCORRUPTED;
+	sc->ip->i_d.di_anextents = nextents;
+
+	sc->ip->i_d.di_nblocks = count + acount;
+
+	/*
+	 * If we found attr fork extents but no attr fork root, zero the
+	 * attr fork extent count so that the attr fork repair will run.
+	 */
+	if (sc->ip->i_d.di_anextents != 0 && sc->ip->i_d.di_forkoff == 0)
+		sc->ip->i_d.di_anextents = 0;
+
+	return 0;
+}
+
+/* Check for invalid uid/gid.  Note that a -1U projid is allowed. */
+STATIC void
+xrep_inode_ids(
+	struct xfs_scrub	*sc)
+{
+	if (sc->ip->i_d.di_uid == -1U) {
+		sc->ip->i_d.di_uid = 0;
+		VFS_I(sc->ip)->i_mode &= ~(S_ISUID | S_ISGID);
+		if (XFS_IS_UQUOTA_ON(sc->mp))
+			xrep_force_quotacheck(sc, XFS_DQ_USER);
+	}
+
+	if (sc->ip->i_d.di_gid == -1U) {
+		sc->ip->i_d.di_gid = 0;
+		VFS_I(sc->ip)->i_mode &= ~(S_ISUID | S_ISGID);
+		if (XFS_IS_GQUOTA_ON(sc->mp))
+			xrep_force_quotacheck(sc, XFS_DQ_GROUP);
+	}
+}
+
+/* Nanosecond counters must be less than one billion (NSEC_PER_SEC). */
+STATIC void
+xrep_inode_timestamps(
+	struct xfs_inode	*ip)
+{
+	if ((unsigned long)VFS_I(ip)->i_atime.tv_nsec >= NSEC_PER_SEC)
+		VFS_I(ip)->i_atime.tv_nsec = 0;
+	if ((unsigned long)VFS_I(ip)->i_mtime.tv_nsec >= NSEC_PER_SEC)
+		VFS_I(ip)->i_mtime.tv_nsec = 0;
+	if ((unsigned long)VFS_I(ip)->i_ctime.tv_nsec >= NSEC_PER_SEC)
+		VFS_I(ip)->i_ctime.tv_nsec = 0;
+	if (ip->i_d.di_version > 2 &&
+	    (unsigned long)ip->i_d.di_crtime.t_nsec >= NSEC_PER_SEC)
+		ip->i_d.di_crtime.t_nsec = 0;
+}
+
+/* Fix inode flags that don't make sense together. */
+STATIC void
+xrep_inode_flags(
+	struct xfs_scrub	*sc)
+{
+	uint16_t		mode;
+
+	mode = VFS_I(sc->ip)->i_mode;
+
+	/* Clear junk flags */
+	if (sc->ip->i_d.di_flags & ~XFS_DIFLAG_ANY)
+		sc->ip->i_d.di_flags &= XFS_DIFLAG_ANY;
+
+	/* NEWRTBM only applies to realtime bitmaps */
+	if (sc->ip->i_ino == sc->mp->m_sb.sb_rbmino)
+		sc->ip->i_d.di_flags |= XFS_DIFLAG_NEWRTBM;
+	else
+		sc->ip->i_d.di_flags &= ~XFS_DIFLAG_NEWRTBM;
+
+	/* These only make sense for directories. */
+	if (!S_ISDIR(mode))
+		sc->ip->i_d.di_flags &= ~(XFS_DIFLAG_RTINHERIT |
+					  XFS_DIFLAG_EXTSZINHERIT |
+					  XFS_DIFLAG_PROJINHERIT |
+					  XFS_DIFLAG_NOSYMLINKS);
+
+	/* These only make sense for files. */
+	if (!S_ISREG(mode))
+		sc->ip->i_d.di_flags &= ~(XFS_DIFLAG_REALTIME |
+					  XFS_DIFLAG_EXTSIZE);
+
+	/* These only make sense for non-rt files. */
+	if (sc->ip->i_d.di_flags & XFS_DIFLAG_REALTIME)
+		sc->ip->i_d.di_flags &= ~XFS_DIFLAG_FILESTREAM;
+
+	/* Immutable and append only?  Drop the append. */
+	if ((sc->ip->i_d.di_flags & XFS_DIFLAG_IMMUTABLE) &&
+	    (sc->ip->i_d.di_flags & XFS_DIFLAG_APPEND))
+		sc->ip->i_d.di_flags &= ~XFS_DIFLAG_APPEND;
+
+	if (sc->ip->i_d.di_version < 3)
+		return;
+
+	/* Clear junk flags. */
+	if (sc->ip->i_d.di_flags2 & ~XFS_DIFLAG2_ANY)
+		sc->ip->i_d.di_flags2 &= XFS_DIFLAG2_ANY;
+
+	/* No reflink flag unless we support it and it's a file. */
+	if (!xfs_sb_version_hasreflink(&sc->mp->m_sb) ||
+	    !S_ISREG(mode))
+		sc->ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
+
+	/* DAX only applies to files and dirs. */
+	if (!(S_ISREG(mode) || S_ISDIR(mode)))
+		sc->ip->i_d.di_flags2 &= ~XFS_DIFLAG2_DAX;
+
+	/* No reflink files on the realtime device. */
+	if (sc->ip->i_d.di_flags & XFS_DIFLAG_REALTIME)
+		sc->ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
+
+	/* No mixing reflink and DAX yet. */
+	if (sc->ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK)
+		sc->ip->i_d.di_flags2 &= ~XFS_DIFLAG2_DAX;
+}
+
+/* Fix size problems with block/node format directories. */
+STATIC int
+xrep_inode_blockdir_size(
+	struct xfs_scrub	*sc)
+{
+	struct xfs_iext_cursor	icur;
+	struct xfs_bmbt_irec	got;
+	struct xfs_ifork	*ifp;
+	xfs_fileoff_t		off;
+	int			error;
+
+	/* Find the last block before 32G; this is the dir size. */
+	ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK);
+	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
+		error = xfs_iread_extents(sc->tp, sc->ip, XFS_DATA_FORK);
+		if (error)
+			return error;
+	}
+
+	off = XFS_B_TO_FSB(sc->mp, XFS_DIR2_SPACE_SIZE);
+	if (!xfs_iext_lookup_extent_before(sc->ip, ifp, &off, &icur, &got)) {
+		/* zero-extents directory? */
+		return -EFSCORRUPTED;
+	}
+
+	off = got.br_startoff + got.br_blockcount;
+	sc->ip->i_d.di_size = min_t(loff_t, XFS_DIR2_SPACE_SIZE,
+			XFS_FSB_TO_B(sc->mp, off));
+	return 0;
+}
+
+/* Fix size problems with short format directories. */
+STATIC int
+xrep_inode_sfdir_size(
+	struct xfs_scrub	*sc)
+{
+	struct xfs_ifork	*ifp;
+
+	ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK);
+	sc->ip->i_d.di_size = ifp->if_bytes;
+	return 0;
+}
+
+/*
+ * Fix any irregularities in an inode's size now that we can iterate extent
+ * maps and access other regular inode data.
+ */
+STATIC int
+xrep_inode_size(
+	struct xfs_scrub	*sc)
+{
+	/*
+	 * Currently we only support fixing the size of directories.  Files
+	 * can be any size, and sizes for the other special inode types are
+	 * fixed by xrep_dinode_size.
+	 */
+	if (!S_ISDIR(VFS_I(sc->ip)->i_mode))
+		return 0;
+	switch (XFS_IFORK_FORMAT(sc->ip, XFS_DATA_FORK)) {
+	case XFS_DINODE_FMT_EXTENTS:
+	case XFS_DINODE_FMT_BTREE:
+		return xrep_inode_blockdir_size(sc);
+	case XFS_DINODE_FMT_LOCAL:
+		return xrep_inode_sfdir_size(sc);
+	default:
+		return 0;
+	}
+}
+
+/* Fix any irregularities in an inode that the verifiers don't catch. */
+STATIC int
+xrep_inode_problems(
+	struct xfs_scrub	*sc)
+{
+	int			error;
+
+	error = xrep_inode_blockcounts(sc);
+	if (error)
+		return error;
+	xrep_inode_timestamps(sc->ip);
+	xrep_inode_flags(sc);
+	xrep_inode_ids(sc);
+	error = xrep_inode_size(sc);
+	if (error)
+		return error;
+	xfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE);
+	return xfs_trans_roll_inode(&sc->tp, sc->ip);
+}
+
+/* Repair an inode's fields. */
+int
+xrep_inode(
+	struct xfs_scrub	*sc)
+{
+	int			error = 0;
+
+	/*
+	 * No inode?  That means we failed the _iget verifiers.  Repair all
+	 * the things that the inode verifiers care about, then retry _iget.
+	 */
+	if (!sc->ip) {
+		error = xrep_dinode_problems(sc);
+		if (error)
+			goto out;
+	}
+
+	/* By this point we had better have a working incore inode. */
+	ASSERT(sc->ip);
+	xfs_trans_ijoin(sc->tp, sc->ip, 0);
+
+	/* If we found corruption of any kind, try to fix it. */
+	if ((sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) ||
+	    (sc->sm->sm_flags & XFS_SCRUB_OFLAG_XCORRUPT)) {
+		error = xrep_inode_problems(sc);
+		if (error)
+			goto out;
+	}
+
+	/* See if we can clear the reflink flag. */
+	if (xfs_is_reflink_inode(sc->ip))
+		return xfs_reflink_clear_inode_flag(sc->ip, &sc->tp);
+
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index da12c20376ae..20e449c7a0df 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -65,6 +65,7 @@ int xrep_agi(struct xfs_scrub *sc);
 int xrep_allocbt(struct xfs_scrub *sc);
 int xrep_iallocbt(struct xfs_scrub *sc);
 int xrep_refcountbt(struct xfs_scrub *sc);
+int xrep_inode(struct xfs_scrub *sc);
 
 #else
 
@@ -102,6 +103,7 @@ xrep_reset_perag_resv(
 #define xrep_allocbt			xrep_notsupported
 #define xrep_iallocbt			xrep_notsupported
 #define xrep_refcountbt			xrep_notsupported
+#define xrep_inode			xrep_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 843eafe0acef..ae922801808d 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -271,7 +271,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.type	= ST_INODE,
 		.setup	= xchk_setup_inode,
 		.scrub	= xchk_inode,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_inode,
 	},
 	[XFS_SCRUB_TYPE_BMBTD] = {	/* inode data fork */
 		.type	= ST_INODE,



* [PATCH 11/16] xfs: zap broken inode forks
  2018-07-26  0:19 [PATCH v17 00/16] xfs-4.19: online repair support Darrick J. Wong
                   ` (9 preceding siblings ...)
  2018-07-26  0:21 ` [PATCH 10/16] xfs: repair inode records Darrick J. Wong
@ 2018-07-26  0:21 ` Darrick J. Wong
  2018-07-26  0:21 ` [PATCH 12/16] xfs: repair inode block maps Darrick J. Wong
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-07-26  0:21 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <darrick.wong@oracle.com>

Determine if inode fork damage is responsible for the inode being unable
to pass the ifork verifiers in xfs_iget and zap the fork contents if
this is true.  Once this is done the fork will be empty but we'll be
able to construct an in-core inode, and a subsequent call to the inode
fork repair ioctl will search the rmapbt to rebuild the records that
were in the fork.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_attr_leaf.c |   32 ++-
 fs/xfs/libxfs/xfs_attr_leaf.h |    2 
 fs/xfs/libxfs/xfs_bmap.c      |   21 ++
 fs/xfs/libxfs/xfs_bmap.h      |    2 
 fs/xfs/scrub/inode_repair.c   |  401 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 439 insertions(+), 19 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index a673037c7d37..54e24001a2d0 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -898,23 +898,16 @@ xfs_attr_shortform_allfit(
 	return xfs_attr_shortform_bytesfit(dp, bytes);
 }
 
-/* Verify the consistency of an inline attribute fork. */
+/* Verify the consistency of a raw inline attribute fork. */
 xfs_failaddr_t
-xfs_attr_shortform_verify(
-	struct xfs_inode		*ip)
+xfs_attr_shortform_verify_struct(
+	struct xfs_attr_shortform	*sfp,
+	size_t				size)
 {
-	struct xfs_attr_shortform	*sfp;
 	struct xfs_attr_sf_entry	*sfep;
 	struct xfs_attr_sf_entry	*next_sfep;
 	char				*endp;
-	struct xfs_ifork		*ifp;
 	int				i;
-	int				size;
-
-	ASSERT(ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL);
-	ifp = XFS_IFORK_PTR(ip, XFS_ATTR_FORK);
-	sfp = (struct xfs_attr_shortform *)ifp->if_u1.if_data;
-	size = ifp->if_bytes;
 
 	/*
 	 * Give up if the attribute is way too short.
@@ -972,6 +965,23 @@ xfs_attr_shortform_verify(
 	return NULL;
 }
 
+/* Verify the consistency of an inline attribute fork. */
+xfs_failaddr_t
+xfs_attr_shortform_verify(
+	struct xfs_inode		*ip)
+{
+	struct xfs_attr_shortform	*sfp;
+	struct xfs_ifork		*ifp;
+	int				size;
+
+	ASSERT(ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL);
+	ifp = XFS_IFORK_PTR(ip, XFS_ATTR_FORK);
+	sfp = (struct xfs_attr_shortform *)ifp->if_u1.if_data;
+	size = ifp->if_bytes;
+
+	return xfs_attr_shortform_verify_struct(sfp, size);
+}
+
 /*
  * Convert a leaf attribute list to shortform attribute list
  */
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.h b/fs/xfs/libxfs/xfs_attr_leaf.h
index 7b74e18becff..728af25a1738 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.h
+++ b/fs/xfs/libxfs/xfs_attr_leaf.h
@@ -41,6 +41,8 @@ int	xfs_attr_shortform_to_leaf(struct xfs_da_args *args,
 int	xfs_attr_shortform_remove(struct xfs_da_args *args);
 int	xfs_attr_shortform_allfit(struct xfs_buf *bp, struct xfs_inode *dp);
 int	xfs_attr_shortform_bytesfit(struct xfs_inode *dp, int bytes);
+xfs_failaddr_t xfs_attr_shortform_verify_struct(struct xfs_attr_shortform *sfp,
+		size_t size);
 xfs_failaddr_t xfs_attr_shortform_verify(struct xfs_inode *ip);
 void	xfs_attr_fork_remove(struct xfs_inode *ip, struct xfs_trans *tp);
 
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 50119b54a2b5..cf89c5cfd8f6 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -6201,18 +6201,16 @@ xfs_bmap_finish_one(
 	return error;
 }
 
-/* Check that an inode's extent does not have invalid flags or bad ranges. */
+/* Check that an extent does not have invalid flags or bad ranges. */
 xfs_failaddr_t
-xfs_bmap_validate_extent(
-	struct xfs_inode	*ip,
+xfs_bmap_validate_extent_raw(
+	struct xfs_mount	*mp,
+	bool			isrt,
 	int			whichfork,
 	struct xfs_bmbt_irec	*irec)
 {
-	struct xfs_mount	*mp = ip->i_mount;
 	xfs_fsblock_t		endfsb;
-	bool			isrt;
 
-	isrt = XFS_IS_REALTIME_INODE(ip);
 	endfsb = irec->br_startblock + irec->br_blockcount - 1;
 	if (isrt) {
 		if (!xfs_verify_rtbno(mp, irec->br_startblock))
@@ -6236,3 +6234,14 @@ xfs_bmap_validate_extent(
 	}
 	return NULL;
 }
+
+/* Check that an inode's extent does not have invalid flags or bad ranges. */
+xfs_failaddr_t
+xfs_bmap_validate_extent(
+	struct xfs_inode	*ip,
+	int			whichfork,
+	struct xfs_bmbt_irec	*irec)
+{
+	return xfs_bmap_validate_extent_raw(ip->i_mount,
+			XFS_IS_REALTIME_INODE(ip), whichfork, irec);
+}
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 9b49ddf99c41..d2c15b2f0fc9 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -284,6 +284,8 @@ static inline int xfs_bmap_fork_to_state(int whichfork)
 	}
 }
 
+xfs_failaddr_t xfs_bmap_validate_extent_raw(struct xfs_mount *mp, bool isrt,
+		int whichfork, struct xfs_bmbt_irec *irec);
 xfs_failaddr_t xfs_bmap_validate_extent(struct xfs_inode *ip, int whichfork,
 		struct xfs_bmbt_irec *irec);
 
diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c
index 5ff929ea3d11..48918f09ebc9 100644
--- a/fs/xfs/scrub/inode_repair.c
+++ b/fs/xfs/scrub/inode_repair.c
@@ -22,11 +22,15 @@
 #include "xfs_ialloc.h"
 #include "xfs_da_format.h"
 #include "xfs_reflink.h"
+#include "xfs_alloc.h"
 #include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
 #include "xfs_bmap.h"
+#include "xfs_bmap_btree.h"
 #include "xfs_bmap_util.h"
 #include "xfs_dir2.h"
 #include "xfs_quota_defs.h"
+#include "xfs_attr_leaf.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
@@ -139,7 +143,8 @@ xrep_dinode_mode(
 STATIC void
 xrep_dinode_flags(
 	struct xfs_scrub	*sc,
-	struct xfs_dinode	*dip)
+	struct xfs_dinode	*dip,
+	bool			is_rt_file)
 {
 	struct xfs_mount	*mp = sc->mp;
 	uint64_t		flags2;
@@ -150,6 +155,11 @@ xrep_dinode_flags(
 	flags = be16_to_cpu(dip->di_flags);
 	flags2 = be64_to_cpu(dip->di_flags2);
 
+	if (is_rt_file)
+		flags |= XFS_DIFLAG_REALTIME;
+	else
+		flags &= ~XFS_DIFLAG_REALTIME;
+
 	if (xfs_sb_version_hasreflink(&mp->m_sb) && S_ISREG(mode))
 		flags2 |= XFS_DIFLAG2_REFLINK;
 	else
@@ -288,11 +298,392 @@ xrep_dinode_extsize_hints(
 	}
 }
 
+/* Blocks and extents associated with an inode, according to rmap records. */
+struct xrep_dinode_stats {
+	struct xfs_scrub	*sc;
+
+	/* Blocks in use on the data device by data extents or bmbt blocks. */
+	xfs_rfsblock_t		data_blocks;
+
+	/* Blocks in use on the rt device. */
+	xfs_rfsblock_t		rt_blocks;
+
+	/* Blocks in use by the attr fork. */
+	xfs_rfsblock_t		attr_blocks;
+
+	/* Number of data device extents for the data fork. */
+	xfs_extnum_t		data_extents;
+
+	/*
+	 * Number of realtime device extents for the data fork.  If
+	 * data_extents and rt_extents indicate that the data fork has extents
+	 * on both devices, we'll just back away slowly.
+	 */
+	xfs_extnum_t		rt_extents;
+
+	/* Number of (data device) extents for the attr fork. */
+	xfs_aextnum_t		attr_extents;
+};
+
+/* Count extents and blocks for an inode given an rmap. */
+STATIC int
+xrep_dinode_walk_rmap(
+	struct xfs_btree_cur		*cur,
+	struct xfs_rmap_irec		*rec,
+	void				*priv)
+{
+	struct xrep_dinode_stats	*dis = priv;
+
+	/* Is this even the right fork? */
+	if (rec->rm_owner != dis->sc->sm->sm_ino)
+		return 0;
+	if (rec->rm_flags & XFS_RMAP_ATTR_FORK) {
+		dis->attr_blocks += rec->rm_blockcount;
+		if (!(rec->rm_flags & XFS_RMAP_BMBT_BLOCK))
+			dis->attr_extents++;
+	} else {
+		dis->data_blocks += rec->rm_blockcount;
+		if (!(rec->rm_flags & XFS_RMAP_BMBT_BLOCK))
+			dis->data_extents++;
+	}
+	return 0;
+}
+
+/* Count extents and blocks for an inode from all AG rmap data. */
+STATIC int
+xrep_dinode_count_ag_rmaps(
+	struct xrep_dinode_stats	*dis,
+	xfs_agnumber_t			agno)
+{
+	struct xfs_btree_cur		*cur;
+	struct xfs_buf			*agf;
+	int				error;
+
+	error = xfs_alloc_read_agf(dis->sc->mp, dis->sc->tp, agno, 0, &agf);
+	if (error)
+		return error;
+
+	cur = xfs_rmapbt_init_cursor(dis->sc->mp, dis->sc->tp, agf, agno);
+	if (!cur) {
+		error = -ENOMEM;
+		goto out_agf;
+	}
+
+	error = xfs_rmap_query_all(cur, xrep_dinode_walk_rmap, dis);
+	if (error == XFS_BTREE_QUERY_RANGE_ABORT)
+		error = 0;
+
+	xfs_btree_del_cursor(cur, error);
+out_agf:
+	xfs_trans_brelse(dis->sc->tp, agf);
+	return error;
+}
+
+/* Count extents and blocks for a given inode from all rmap data. */
+STATIC int
+xrep_dinode_count_rmaps(
+	struct xrep_dinode_stats	*dis)
+{
+	xfs_agnumber_t			agno;
+	int				error;
+
+	if (!xfs_sb_version_hasrmapbt(&dis->sc->mp->m_sb) ||
+	    xfs_sb_version_hasrealtime(&dis->sc->mp->m_sb))
+		return -EOPNOTSUPP;
+
+	/* XXX: find rt blocks too */
+	if (dis->rt_extents != 0) {
+		ASSERT(0);
+		return -EOPNOTSUPP;
+	}
+
+	for (agno = 0; agno < dis->sc->mp->m_sb.sb_agcount; agno++) {
+		error = xrep_dinode_count_ag_rmaps(dis, agno);
+		if (error)
+			return error;
+	}
+
+	/* Can't have extents on both the rt and the data device. */
+	if (dis->data_extents && dis->rt_extents)
+		return -EFSCORRUPTED;
+
+	return 0;
+}
+
+/* Return true if this extents-format ifork looks like garbage. */
+STATIC bool
+xrep_dinode_bad_extents_fork(
+	struct xfs_scrub	*sc,
+	struct xfs_dinode	*dip,
+	int			dfork_size,
+	int			whichfork)
+{
+	struct xfs_bmbt_irec	new;
+	struct xfs_bmbt_rec	*dp;
+	bool			isrt;
+	int			i;
+	int			nex;
+	int			fork_size;
+
+	nex = XFS_DFORK_NEXTENTS(dip, whichfork);
+	fork_size = nex * sizeof(struct xfs_bmbt_rec);
+	if (fork_size < 0 || fork_size > dfork_size)
+		return true;
+	if (whichfork == XFS_ATTR_FORK && nex > ((uint16_t)-1U))
+		return true;
+	dp = XFS_DFORK_PTR(dip, whichfork);
+
+	isrt = dip->di_flags & cpu_to_be16(XFS_DIFLAG_REALTIME);
+	for (i = 0; i < nex; i++, dp++) {
+		xfs_failaddr_t	fa;
+
+		xfs_bmbt_disk_get_all(dp, &new);
+		fa = xfs_bmap_validate_extent_raw(sc->mp, isrt, whichfork,
+				&new);
+		if (fa)
+			return true;
+	}
+
+	return false;
+}
+
+/* Return true if this btree-format ifork looks like garbage. */
+STATIC bool
+xrep_dinode_bad_btree_fork(
+	struct xfs_scrub	*sc,
+	struct xfs_dinode	*dip,
+	int			dfork_size,
+	int			whichfork)
+{
+	struct xfs_bmdr_block	*dfp;
+	int			nrecs;
+	int			level;
+
+	if (XFS_DFORK_NEXTENTS(dip, whichfork) <=
+			dfork_size / sizeof(struct xfs_bmbt_rec))
+		return true;
+
+	dfp = XFS_DFORK_PTR(dip, whichfork);
+	nrecs = be16_to_cpu(dfp->bb_numrecs);
+	level = be16_to_cpu(dfp->bb_level);
+
+	if (nrecs == 0 || XFS_BMDR_SPACE_CALC(nrecs) > dfork_size)
+		return true;
+	if (level == 0 || level > XFS_BTREE_MAXLEVELS)
+		return true;
+	return false;
+}
+
+/*
+ * Check the data fork for things that will fail the ifork verifiers or the
+ * ifork formatters.
+ */
+STATIC bool
+xrep_dinode_check_dfork(
+	struct xfs_scrub	*sc,
+	struct xfs_dinode	*dip,
+	uint16_t		mode)
+{
+	uint64_t		size;
+	unsigned int		fmt;
+	int			dfork_size;
+
+	fmt = XFS_DFORK_FORMAT(dip, XFS_DATA_FORK);
+	size = be64_to_cpu(dip->di_size);
+	switch (mode & S_IFMT) {
+	case S_IFIFO:
+	case S_IFCHR:
+	case S_IFBLK:
+	case S_IFSOCK:
+		if (fmt != XFS_DINODE_FMT_DEV)
+			return true;
+		break;
+	case S_IFREG:
+		if (fmt == XFS_DINODE_FMT_LOCAL)
+			return true;
+		/* fall through */
+	case S_IFLNK:
+	case S_IFDIR:
+		switch (fmt) {
+		case XFS_DINODE_FMT_LOCAL:
+		case XFS_DINODE_FMT_EXTENTS:
+		case XFS_DINODE_FMT_BTREE:
+			break;
+		default:
+			return true;
+		}
+		break;
+	default:
+		return true;
+	}
+	dfork_size = XFS_DFORK_SIZE(dip, sc->mp, XFS_DATA_FORK);
+	switch (fmt) {
+	case XFS_DINODE_FMT_DEV:
+		break;
+	case XFS_DINODE_FMT_LOCAL:
+		if (size > dfork_size)
+			return true;
+		break;
+	case XFS_DINODE_FMT_EXTENTS:
+		if (xrep_dinode_bad_extents_fork(sc, dip, dfork_size,
+				XFS_DATA_FORK))
+			return true;
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		if (xrep_dinode_bad_btree_fork(sc, dip, dfork_size,
+				XFS_DATA_FORK))
+			return true;
+		break;
+	default:
+		return true;
+	}
+
+	return false;
+}
+
+/* Reset the data fork to something sane. */
+STATIC void
+xrep_dinode_zap_dfork(
+	struct xfs_scrub		*sc,
+	struct xfs_dinode		*dip,
+	uint16_t			mode,
+	struct xrep_dinode_stats	*dis)
+{
+	/* Special files always get reset to DEV */
+	switch (mode & S_IFMT) {
+	case S_IFIFO:
+	case S_IFCHR:
+	case S_IFBLK:
+	case S_IFSOCK:
+		dip->di_format = XFS_DINODE_FMT_DEV;
+		dip->di_size = 0;
+		return;
+	}
+
+	/*
+	 * If we have data extents, reset to an empty map and hope the user
+	 * will run the bmapbtd checker next.
+	 */
+	if (dis->data_extents || dis->rt_extents || S_ISREG(mode)) {
+		dip->di_format = XFS_DINODE_FMT_EXTENTS;
+		dip->di_nextents = 0;
+		return;
+	}
+
+	/* Otherwise, reset the local format to the minimum. */
+	switch (mode & S_IFMT) {
+	case S_IFLNK:
+		xrep_dinode_zap_symlink(dip);
+		break;
+	case S_IFDIR:
+		xrep_dinode_zap_dir(sc->mp, dip);
+		break;
+	}
+}
+
+/*
+ * Check the attr fork for things that will fail the ifork verifiers or the
+ * ifork formatters.
+ */
+STATIC bool
+xrep_dinode_check_afork(
+	struct xfs_scrub		*sc,
+	struct xfs_dinode		*dip)
+{
+	struct xfs_attr_shortform	*sfp;
+	int				size;
+
+	if (XFS_DFORK_BOFF(dip) == 0)
+		return dip->di_aformat != XFS_DINODE_FMT_EXTENTS ||
+		       dip->di_anextents != 0;
+
+	size = XFS_DFORK_SIZE(dip, sc->mp, XFS_ATTR_FORK);
+	switch (XFS_DFORK_FORMAT(dip, XFS_ATTR_FORK)) {
+	case XFS_DINODE_FMT_LOCAL:
+		sfp = XFS_DFORK_PTR(dip, XFS_ATTR_FORK);
+		return xfs_attr_shortform_verify_struct(sfp, size) != NULL;
+	case XFS_DINODE_FMT_EXTENTS:
+		if (xrep_dinode_bad_extents_fork(sc, dip, size, XFS_ATTR_FORK))
+			return true;
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		if (xrep_dinode_bad_btree_fork(sc, dip, size, XFS_ATTR_FORK))
+			return true;
+		break;
+	default:
+		return true;
+	}
+
+	return false;
+}
+
+/* Reset the attr fork to something sane. */
+STATIC void
+xrep_dinode_zap_afork(
+	struct xfs_scrub		*sc,
+	struct xfs_dinode		*dip,
+	struct xrep_dinode_stats	*dis)
+{
+	dip->di_aformat = XFS_DINODE_FMT_EXTENTS;
+	dip->di_anextents = 0;
+	/*
+	 * We leave a nonzero forkoff so that the bmap scrub will look for
+	 * attr rmaps.
+	 */
+	dip->di_forkoff = dis->attr_extents ? 1 : 0;
+}
+
+/*
+ * Zap the data/attr forks if we spot anything that isn't going to pass the
+ * ifork verifiers or the ifork formatters, because we need to get the inode
+ * into good enough shape that the higher level repair functions can run.
+ */
+STATIC void
+xrep_dinode_zap_forks(
+	struct xfs_scrub		*sc,
+	struct xfs_dinode		*dip,
+	struct xrep_dinode_stats	*dis)
+{
+	uint16_t			mode;
+	bool				zap_datafork = false;
+	bool				zap_attrfork = false;
+
+	mode = be16_to_cpu(dip->di_mode);
+
+	/* Inode counters don't make sense? */
+	if (be32_to_cpu(dip->di_nextents) > be64_to_cpu(dip->di_nblocks))
+		zap_datafork = true;
+	if (be16_to_cpu(dip->di_anextents) > be64_to_cpu(dip->di_nblocks))
+		zap_attrfork = true;
+	if (be32_to_cpu(dip->di_nextents) + be16_to_cpu(dip->di_anextents) >
+			be64_to_cpu(dip->di_nblocks))
+		zap_datafork = zap_attrfork = true;
+
+	if (!zap_datafork)
+		zap_datafork = xrep_dinode_check_dfork(sc, dip, mode);
+	if (!zap_attrfork)
+		zap_attrfork = xrep_dinode_check_afork(sc, dip);
+
+	/* Zap whatever's bad. */
+	if (zap_attrfork)
+		xrep_dinode_zap_afork(sc, dip, dis);
+	if (zap_datafork)
+		xrep_dinode_zap_dfork(sc, dip, mode, dis);
+	dip->di_nblocks = 0;
+	if (!zap_attrfork)
+		be64_add_cpu(&dip->di_nblocks, dis->attr_blocks);
+	if (!zap_datafork) {
+		be64_add_cpu(&dip->di_nblocks, dis->data_blocks);
+		be64_add_cpu(&dip->di_nblocks, dis->rt_blocks);
+	}
+}
+
 /* Inode didn't pass verifiers, so fix the raw buffer and retry iget. */
 STATIC int
 xrep_dinode_core(
 	struct xfs_scrub	*sc)
 {
+	struct xrep_dinode_stats	dis = { .sc = sc };
 	struct xfs_imap		imap;
 	struct xfs_buf		*bp;
 	struct xfs_dinode	*dip;
@@ -300,6 +691,11 @@ xrep_dinode_core(
 	bool			inuse;
 	int			error;
 
+	/* Figure out what this inode had mapped in both forks. */
+	error = xrep_dinode_count_rmaps(&dis);
+	if (error)
+		return error;
+
 	/* Map & read inode. */
 	ino = sc->sm->sm_ino;
 	error = xfs_imap(sc->mp, sc->tp, ino, &imap, XFS_IGET_UNTRUSTED);
@@ -326,9 +722,10 @@ xrep_dinode_core(
 	dip = xfs_buf_offset(bp, imap.im_boffset);
 	xrep_dinode_header(sc, dip);
 	xrep_dinode_mode(dip);
-	xrep_dinode_flags(sc, dip);
+	xrep_dinode_flags(sc, dip, dis.rt_extents > 0);
 	xrep_dinode_size(sc->mp, dip);
 	xrep_dinode_extsize_hints(sc, dip);
+	xrep_dinode_zap_forks(sc, dip, &dis);
 
 	/* Write out the inode... */
 	xfs_dinode_calc_crc(sc->mp, dip);



* [PATCH 12/16] xfs: repair inode block maps
  2018-07-26  0:19 [PATCH v17 00/16] xfs-4.19: online repair support Darrick J. Wong
                   ` (10 preceding siblings ...)
  2018-07-26  0:21 ` [PATCH 11/16] xfs: zap broken inode forks Darrick J. Wong
@ 2018-07-26  0:21 ` Darrick J. Wong
  2018-07-26  0:21 ` [PATCH 13/16] xfs: repair damaged symlinks Darrick J. Wong
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-07-26  0:21 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <darrick.wong@oracle.com>

Use the reverse-mapping btree information to rebuild an inode fork.
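
For readers new to the idea, the gather-and-sort step can be sketched in
userspace roughly as follows.  This is only an illustration under
simplifying assumptions: struct toy_rmap and toy_rebuild_fork() are
hypothetical names, the records live in a flat array instead of per-AG
rmap btrees, and nothing here touches bmbt blocks or transactions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified reverse-mapping record. */
struct toy_rmap {
	uint64_t	owner;		/* inode that owns the blocks */
	uint64_t	offset;		/* file offset, in blocks */
	uint64_t	startblock;	/* physical start block */
	uint32_t	blockcount;	/* length, in blocks */
	bool		attr_fork;	/* mapping belongs to the attr fork */
};

/* Order mappings by file offset, as a fork's extent list must be. */
static int
toy_rmap_offset_cmp(
	const void		*a,
	const void		*b)
{
	const struct toy_rmap	*ap = a;
	const struct toy_rmap	*bp = b;

	if (ap->offset > bp->offset)
		return 1;
	if (ap->offset < bp->offset)
		return -1;
	return 0;
}

/*
 * Copy every record owned by the given inode and fork into out[], which
 * must have room for nr entries, then sort the copies by file offset.
 * Returns the number of mappings gathered.
 */
size_t
toy_rebuild_fork(
	const struct toy_rmap	*recs,
	size_t			nr,
	uint64_t		ino,
	bool			attr_fork,
	struct toy_rmap		*out)
{
	size_t			found = 0;
	size_t			i;

	for (i = 0; i < nr; i++) {
		if (recs[i].owner != ino || recs[i].attr_fork != attr_fork)
			continue;
		out[found++] = recs[i];
	}
	qsort(out, found, sizeof(*out), toy_rmap_offset_cmp);
	return found;
}
```

The sorted array plays the role of the list of new bmap records; the
real code would then reset the in-core fork and re-add each mapping.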

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile            |    1 
 fs/xfs/scrub/bmap.c        |   22 ++
 fs/xfs/scrub/bmap_repair.c |  520 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.h      |    4 
 fs/xfs/scrub/scrub.c       |    4 
 fs/xfs/scrub/trace.h       |    2 
 fs/xfs/xfs_trans.c         |   54 +++++
 fs/xfs/xfs_trans.h         |    2 
 8 files changed, 606 insertions(+), 3 deletions(-)
 create mode 100644 fs/xfs/scrub/bmap_repair.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index e01b5003d543..7f5467bb18b9 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -166,6 +166,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   agheader_repair.o \
 				   alloc_repair.o \
 				   bitmap.o \
+				   bmap_repair.o \
 				   ialloc_repair.o \
 				   inode_repair.o \
 				   refcount_repair.o \
diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
index e1d11f3223e3..6659f41e7b4c 100644
--- a/fs/xfs/scrub/bmap.c
+++ b/fs/xfs/scrub/bmap.c
@@ -37,6 +37,7 @@ xchk_setup_inode_bmap(
 	struct xfs_scrub	*sc,
 	struct xfs_inode	*ip)
 {
+	bool			is_repair = false;
 	int			error;
 
 	error = xchk_get_inode(sc, ip);
@@ -46,6 +47,10 @@ xchk_setup_inode_bmap(
 	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
 	xfs_ilock(sc->ip, sc->ilock_flags);
 
+#ifdef CONFIG_XFS_ONLINE_REPAIR
+	is_repair = (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR);
+#endif
+
 	/*
 	 * We don't want any ephemeral data fork updates sitting around
 	 * while we inspect block mappings, so wait for directio to finish
@@ -53,10 +58,27 @@ xchk_setup_inode_bmap(
 	 */
 	if (S_ISREG(VFS_I(sc->ip)->i_mode) &&
 	    sc->sm->sm_type == XFS_SCRUB_TYPE_BMBTD) {
+		/* Break all our leases, we're going to mess with things. */
+		if (is_repair) {
+			error = xfs_break_layouts(VFS_I(sc->ip),
+					&sc->ilock_flags, BREAK_UNMAP);
+			if (error)
+				goto out;
+		}
+
 		inode_dio_wait(VFS_I(sc->ip));
 		error = filemap_write_and_wait(VFS_I(sc->ip)->i_mapping);
 		if (error)
 			goto out;
+
+		/* Drop the page cache if we're repairing block mappings. */
+		if (is_repair) {
+			error = invalidate_inode_pages2(
+					VFS_I(sc->ip)->i_mapping);
+			if (error)
+				goto out;
+		}
+
 	}
 
 	/* Got the inode, lock it and we're ready to go. */
diff --git a/fs/xfs/scrub/bmap_repair.c b/fs/xfs/scrub/bmap_repair.c
new file mode 100644
index 000000000000..9b0172be0335
--- /dev/null
+++ b/fs/xfs/scrub/bmap_repair.c
@@ -0,0 +1,520 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_inode_fork.h"
+#include "xfs_alloc.h"
+#include "xfs_rtalloc.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_refcount.h"
+#include "xfs_quota.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+#include "scrub/bitmap.h"
+
+/*
+ * Inode fork block mapping (BMBT) repair.
+ *
+ * Basically, we gather all the rmap records for the inode and fork we're
+ * fixing, reset the incore fork, then re-add all the records.
+ */
+
+struct xrep_bmap_extent {
+	struct list_head	list;
+	struct xfs_rmap_irec	rmap;
+	xfs_agnumber_t		agno;
+};
+
+struct xrep_bmap {
+	/* List of new bmap records. */
+	struct list_head	*extlist;
+
+	/* Old bmbt blocks */
+	struct xfs_bitmap	*btlist;
+
+	struct xfs_scrub	*sc;
+
+	/* Inode we're fixing. */
+	xfs_ino_t		ino;
+
+	/* How many blocks did we find in the other fork? */
+	xfs_rfsblock_t		otherfork_blocks;
+
+	/* How many bmbt blocks did we find for this fork? */
+	xfs_rfsblock_t		bmbt_blocks;
+
+	/* How many extents did we find for this fork? */
+	xfs_extnum_t		extents;
+
+	/* Which fork are we fixing? */
+	int			whichfork;
+};
+
+/* Record extents that belong to this inode's fork. */
+STATIC int
+xrep_bmap_walk_rmap(
+	struct xfs_btree_cur	*cur,
+	struct xfs_rmap_irec	*rec,
+	void			*priv)
+{
+	struct xrep_bmap	*rb = priv;
+	struct xrep_bmap_extent	*rbe;
+	struct xfs_mount	*mp = cur->bc_mp;
+	xfs_fsblock_t		fsbno;
+	int			error = 0;
+
+	if (xchk_should_terminate(rb->sc, &error))
+		return error;
+
+	/* Skip extents which are not owned by this inode and fork. */
+	if (rec->rm_owner != rb->ino) {
+		return 0;
+	} else if (rb->whichfork == XFS_DATA_FORK &&
+		 (rec->rm_flags & XFS_RMAP_ATTR_FORK)) {
+		rb->otherfork_blocks += rec->rm_blockcount;
+		return 0;
+	} else if (rb->whichfork == XFS_ATTR_FORK &&
+		 !(rec->rm_flags & XFS_RMAP_ATTR_FORK)) {
+		rb->otherfork_blocks += rec->rm_blockcount;
+		return 0;
+	}
+
+	/* Delete the old bmbt blocks later. */
+	if (rec->rm_flags & XFS_RMAP_BMBT_BLOCK) {
+		fsbno = XFS_AGB_TO_FSB(mp, cur->bc_private.a.agno,
+				rec->rm_startblock);
+		rb->bmbt_blocks += rec->rm_blockcount;
+		return xfs_bitmap_set(rb->btlist, fsbno, rec->rm_blockcount);
+	}
+
+	/* Remember this rmap. */
+	rb->extents++;
+	trace_xrep_bmap_walk_rmap(mp, cur->bc_private.a.agno,
+			rec->rm_startblock, rec->rm_blockcount, rec->rm_owner,
+			rec->rm_offset, rec->rm_flags);
+
+	rbe = kmem_alloc(sizeof(struct xrep_bmap_extent), KM_MAYFAIL);
+	if (!rbe)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&rbe->list);
+	rbe->rmap = *rec;
+	rbe->agno = cur->bc_private.a.agno;
+	list_add_tail(&rbe->list, rb->extlist);
+
+	return 0;
+}
+
+/* Compare two bmap extents. */
+static int
+xrep_bmap_extent_cmp(
+	void			*priv,
+	struct list_head	*a,
+	struct list_head	*b)
+{
+	struct xrep_bmap_extent	*ap;
+	struct xrep_bmap_extent	*bp;
+
+	ap = container_of(a, struct xrep_bmap_extent, list);
+	bp = container_of(b, struct xrep_bmap_extent, list);
+
+	if (ap->rmap.rm_offset > bp->rmap.rm_offset)
+		return 1;
+	else if (ap->rmap.rm_offset < bp->rmap.rm_offset)
+		return -1;
+	return 0;
+}
+
+/* Scan one AG for reverse mappings that we can turn into extent maps. */
+STATIC int
+xrep_bmap_scan_ag(
+	struct xrep_bmap	*rb,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_scrub	*sc = rb->sc;
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_buf		*agf_bp = NULL;
+	struct xfs_btree_cur	*cur;
+	int			error;
+
+	error = xfs_alloc_read_agf(mp, sc->tp, agno, 0, &agf_bp);
+	if (error)
+		return error;
+	if (!agf_bp)
+		return -ENOMEM;
+	cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, agno);
+	error = xfs_rmap_query_all(cur, xrep_bmap_walk_rmap, rb);
+	if (error == XFS_BTREE_QUERY_RANGE_ABORT)
+		error = 0;
+	xfs_btree_del_cursor(cur, error);
+	xfs_trans_brelse(sc->tp, agf_bp);
+	return error;
+}
+
+/* Insert bmap records into an inode fork, given an rmap. */
+STATIC int
+xrep_bmap_insert_rec(
+	struct xfs_scrub	*sc,
+	struct xrep_bmap_extent	*rbe,
+	int			baseflags)
+{
+	struct xfs_bmbt_irec	bmap;
+	struct xfs_defer_ops	dfops;
+	xfs_fsblock_t		firstfsb;
+	xfs_extlen_t		extlen;
+	int			flags;
+	int			error = 0;
+
+	/* Form the "new" mapping... */
+	bmap.br_startblock = XFS_AGB_TO_FSB(sc->mp, rbe->agno,
+			rbe->rmap.rm_startblock);
+	bmap.br_startoff = rbe->rmap.rm_offset;
+
+	flags = 0;
+	if (rbe->rmap.rm_flags & XFS_RMAP_UNWRITTEN)
+		flags = XFS_BMAPI_PREALLOC;
+	while (rbe->rmap.rm_blockcount > 0) {
+		xfs_defer_init(&dfops, &firstfsb);
+		extlen = min_t(xfs_extlen_t, rbe->rmap.rm_blockcount,
+				MAXEXTLEN);
+		bmap.br_blockcount = extlen;
+
+		/* Re-add the extent to the fork. */
+		error = xfs_bmapi_remap(sc->tp, sc->ip, bmap.br_startoff,
+				extlen, bmap.br_startblock, &dfops,
+				baseflags | flags);
+		if (error)
+			goto out_cancel;
+
+		bmap.br_startblock += extlen;
+		bmap.br_startoff += extlen;
+		rbe->rmap.rm_blockcount -= extlen;
+		error = xfs_defer_ijoin(&dfops, sc->ip);
+		if (error)
+			goto out_cancel;
+		error = xfs_defer_finish(&sc->tp, &dfops);
+		if (error)
+			goto out;
+		/* Make sure we roll the transaction. */
+		error = xfs_trans_roll_inode(&sc->tp, sc->ip);
+		if (error)
+			goto out;
+	}
+
+	return 0;
+out_cancel:
+	xfs_defer_cancel(&dfops);
+out:
+	return error;
+}
+
+/* Check for garbage inputs. */
+STATIC int
+xrep_bmap_check_inputs(
+	struct xfs_scrub	*sc,
+	int			whichfork)
+{
+	ASSERT(whichfork == XFS_DATA_FORK || whichfork == XFS_ATTR_FORK);
+
+	/* Don't know how to repair the other fork formats. */
+	if (XFS_IFORK_FORMAT(sc->ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
+	    XFS_IFORK_FORMAT(sc->ip, whichfork) != XFS_DINODE_FMT_BTREE)
+		return -EOPNOTSUPP;
+
+	/*
+	 * If there's no attr fork area in the inode, there's no attr fork to
+	 * rebuild.
+	 */
+	if (whichfork == XFS_ATTR_FORK) {
+		if (!XFS_IFORK_Q(sc->ip))
+			return -ENOENT;
+		return 0;
+	}
+
+	/* Only files, symlinks, and directories get to have data forks. */
+	switch (VFS_I(sc->ip)->i_mode & S_IFMT) {
+	case S_IFREG:
+	case S_IFDIR:
+	case S_IFLNK:
+		/* ok */
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	/* If we somehow have delalloc extents, forget it. */
+	if (sc->ip->i_delayed_blks)
+		return -EBUSY;
+
+	/* Don't know how to rebuild realtime data forks. */
+	if (XFS_IS_REALTIME_INODE(sc->ip))
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
+/*
+ * Collect block mappings for this fork of this inode and decide if we have
+ * enough space to rebuild.  Caller is responsible for cleaning up the list if
+ * anything goes wrong.
+ */
+STATIC int
+xrep_bmap_find_mappings(
+	struct xfs_scrub	*sc,
+	int			whichfork,
+	struct list_head	*mapping_records,
+	struct xfs_bitmap	*old_bmbt_blocks,
+	xfs_rfsblock_t		*old_bmbt_block_count,
+	xfs_rfsblock_t		*otherfork_blocks)
+{
+	struct xrep_bmap	rb;
+	xfs_agnumber_t		agno;
+	unsigned int		resblks;
+	int			error;
+
+	memset(&rb, 0, sizeof(rb));
+	rb.extlist = mapping_records;
+	rb.btlist = old_bmbt_blocks;
+	rb.ino = sc->ip->i_ino;
+	rb.whichfork = whichfork;
+	rb.sc = sc;
+
+	/* Iterate the rmaps for extents. */
+	for (agno = 0; agno < sc->mp->m_sb.sb_agcount; agno++) {
+		error = xrep_bmap_scan_ag(&rb, agno);
+		if (error)
+			return error;
+	}
+
+	/*
+	 * Guess how many blocks we're going to need to rebuild an entire bmap
+	 * from the number of extents we found, and pump up our transaction to
+	 * have sufficient block reservation.
+	 */
+	resblks = xfs_bmbt_calc_size(sc->mp, rb.extents);
+	error = xfs_trans_reserve_more(sc->tp, resblks, 0);
+	if (error)
+		return error;
+
+	*otherfork_blocks = rb.otherfork_blocks;
+	*old_bmbt_block_count = rb.bmbt_blocks;
+	return 0;
+}
+
+/* Update the inode counters. */
+STATIC int
+xrep_bmap_reset_counters(
+	struct xfs_scrub	*sc,
+	xfs_rfsblock_t		old_bmbt_block_count,
+	xfs_rfsblock_t		otherfork_blocks,
+	int			*log_flags)
+{
+	int			error;
+
+	xfs_trans_ijoin(sc->tp, sc->ip, 0);
+
+	/*
+	 * We're going to use the bmap routines to reconstruct a fork from rmap
+	 * records.  Those functions increment di_nblocks for us, so we need to
+	 * subtract out all the data and bmbt blocks from the fork we're about
+	 * to rebuild.  otherfork_blocks reflects all the data and bmbt blocks
+	 * for the other fork, so this assignment effectively performs the
+	 * subtraction for us.
+	 */
+	sc->ip->i_d.di_nblocks = otherfork_blocks;
+	*log_flags |= XFS_ILOG_CORE;
+
+	if (!old_bmbt_block_count)
+		return 0;
+
+	/* Release quota counts for the old bmbt blocks. */
+	error = xrep_ino_dqattach(sc);
+	if (error)
+		return error;
+	xfs_trans_mod_dquot_byino(sc->tp, sc->ip, XFS_TRANS_DQ_BCOUNT,
+			-(int64_t)old_bmbt_block_count);
+	return 0;
+}
+
+/* Initialize a new fork and implant it in the inode. */
+STATIC void
+xrep_bmap_reset_fork(
+	struct xfs_scrub	*sc,
+	int			whichfork,
+	bool			no_mappings,
+	int			*log_flags)
+{
+	/* Set us back to extents format with zero records. */
+	XFS_IFORK_FMT_SET(sc->ip, whichfork, XFS_DINODE_FMT_EXTENTS);
+	XFS_IFORK_NEXT_SET(sc->ip, whichfork, 0);
+
+	/* Reinitialize the in-core fork. */
+	if (XFS_IFORK_PTR(sc->ip, whichfork) != NULL)
+		xfs_idestroy_fork(sc->ip, whichfork);
+	if (whichfork == XFS_DATA_FORK) {
+		memset(&sc->ip->i_df, 0, sizeof(struct xfs_ifork));
+		sc->ip->i_df.if_flags |= XFS_IFEXTENTS;
+	} else if (whichfork == XFS_ATTR_FORK) {
+		if (no_mappings) {
+			sc->ip->i_afp = NULL;
+		} else {
+			sc->ip->i_afp = kmem_zone_zalloc(xfs_ifork_zone,
+					KM_SLEEP);
+			sc->ip->i_afp->if_flags |= XFS_IFEXTENTS;
+		}
+	}
+
+	/*
+	 * Now that we've reinitialized the in-memory fork and set the inode
+	 * back to extents format with zero extents, any extents that we
+	 * subsequently map into the file will reinitialize the on-disk fork
+	 * area for us.  All we have to do is log the inode core to preserve
+	 * the format and extent count fields.
+	 */
+	*log_flags |= XFS_ILOG_CORE;
+}
+
+/* Make our changes permanent so that we can start rebuilding the fork. */
+STATIC int
+xrep_bmap_commit_new(
+	struct xfs_scrub	*sc,
+	int			log_flags)
+{
+	xfs_trans_log_inode(sc->tp, sc->ip, log_flags);
+	return xfs_trans_roll_inode(&sc->tp, sc->ip);
+}
+
+/* Build new fork mappings and dispose of the old bmbt blocks. */
+STATIC int
+xrep_bmap_rebuild_tree(
+	struct xfs_scrub	*sc,
+	int			whichfork,
+	struct list_head	*mapping_records,
+	struct xfs_bitmap	*old_bmbt_blocks)
+{
+	struct xfs_owner_info	oinfo;
+	struct xrep_bmap_extent	*rbe;
+	struct xrep_bmap_extent	*n;
+	int			baseflags;
+	int			error;
+
+	baseflags = XFS_BMAPI_NORMAP;
+	if (whichfork == XFS_ATTR_FORK)
+		baseflags |= XFS_BMAPI_ATTRFORK;
+
+	/* "Remap" the extents into the fork. */
+	list_sort(NULL, mapping_records, xrep_bmap_extent_cmp);
+	list_for_each_entry_safe(rbe, n, mapping_records, list) {
+		error = xrep_bmap_insert_rec(sc, rbe, baseflags);
+		if (error)
+			return error;
+		list_del(&rbe->list);
+		kmem_free(rbe);
+	}
+
+	/* Dispose of all the old bmbt blocks. */
+	xfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, whichfork);
+	return xrep_reap_extents(sc, old_bmbt_blocks, &oinfo,
+			XFS_AG_RESV_NONE);
+}
+
+/* Free every record in the mapping list. */
+STATIC void
+xrep_bmap_cancel_recs(
+	struct list_head	*recs)
+{
+	struct xrep_bmap_extent	*rbe;
+	struct xrep_bmap_extent	*n;
+
+	list_for_each_entry_safe(rbe, n, recs, list) {
+		list_del(&rbe->list);
+		kmem_free(rbe);
+	}
+}
+
+/* Repair an inode fork. */
+STATIC int
+xrep_bmap(
+	struct xfs_scrub	*sc,
+	int			whichfork)
+{
+	struct list_head	mapping_records;
+	struct xfs_bitmap	old_bmbt_blocks;
+	xfs_rfsblock_t		old_bmbt_block_count;
+	xfs_rfsblock_t		otherfork_blocks;
+	int			log_flags = 0;
+	int			error = 0;
+
+	error = xrep_bmap_check_inputs(sc, whichfork);
+	if (error)
+		return error;
+
+	/* Collect all reverse mappings for this fork's extents. */
+	INIT_LIST_HEAD(&mapping_records);
+	xfs_bitmap_init(&old_bmbt_blocks);
+	error = xrep_bmap_find_mappings(sc, whichfork, &mapping_records,
+			&old_bmbt_blocks, &old_bmbt_block_count,
+			&otherfork_blocks);
+	if (error)
+		goto out;
+
+	/*
+	 * Blow out the in-core fork and zero the on-disk fork.  This is the
+	 * point at which we are no longer able to bail out gracefully.
+	 */
+	error = xrep_bmap_reset_counters(sc, old_bmbt_block_count,
+			otherfork_blocks, &log_flags);
+	if (error)
+		goto out;
+	xrep_bmap_reset_fork(sc, whichfork, list_empty(&mapping_records),
+			&log_flags);
+	error = xrep_bmap_commit_new(sc, log_flags);
+	if (error)
+		goto out;
+
+	/* Now rebuild the fork extent map information. */
+	error = xrep_bmap_rebuild_tree(sc, whichfork, &mapping_records,
+			&old_bmbt_blocks);
+out:
+	xfs_bitmap_destroy(&old_bmbt_blocks);
+	xrep_bmap_cancel_recs(&mapping_records);
+	return error;
+}
+
+/* Repair an inode's data fork. */
+int
+xrep_bmap_data(
+	struct xfs_scrub	*sc)
+{
+	return xrep_bmap(sc, XFS_DATA_FORK);
+}
+
+/* Repair an inode's attr fork. */
+int
+xrep_bmap_attr(
+	struct xfs_scrub	*sc)
+{
+	return xrep_bmap(sc, XFS_ATTR_FORK);
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 20e449c7a0df..38444fec70db 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -66,6 +66,8 @@ int xrep_allocbt(struct xfs_scrub *sc);
 int xrep_iallocbt(struct xfs_scrub *sc);
 int xrep_refcountbt(struct xfs_scrub *sc);
 int xrep_inode(struct xfs_scrub *sc);
+int xrep_bmap_data(struct xfs_scrub *sc);
+int xrep_bmap_attr(struct xfs_scrub *sc);
 
 #else
 
@@ -104,6 +106,8 @@ xrep_reset_perag_resv(
 #define xrep_iallocbt			xrep_notsupported
 #define xrep_refcountbt			xrep_notsupported
 #define xrep_inode			xrep_notsupported
+#define xrep_bmap_data			xrep_notsupported
+#define xrep_bmap_attr			xrep_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index ae922801808d..45af20a3ab50 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -277,13 +277,13 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.type	= ST_INODE,
 		.setup	= xchk_setup_inode_bmap,
 		.scrub	= xchk_bmap_data,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_bmap_data,
 	},
 	[XFS_SCRUB_TYPE_BMBTA] = {	/* inode attr fork */
 		.type	= ST_INODE,
 		.setup	= xchk_setup_inode_bmap,
 		.scrub	= xchk_bmap_attr,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_bmap_attr,
 	},
 	[XFS_SCRUB_TYPE_BMBTC] = {	/* inode CoW fork */
 		.type	= ST_INODE,
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 9126dc66f726..3383b14fd0c0 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -554,7 +554,7 @@ DEFINE_EVENT(xrep_rmap_class, name, \
 DEFINE_REPAIR_RMAP_EVENT(xrep_abt_walk_rmap);
 DEFINE_REPAIR_RMAP_EVENT(xrep_ibt_walk_rmap);
 DEFINE_REPAIR_RMAP_EVENT(xrep_rmap_extent_fn);
-DEFINE_REPAIR_RMAP_EVENT(xrep_bmap_extent_fn);
+DEFINE_REPAIR_RMAP_EVENT(xrep_bmap_walk_rmap);
 
 TRACE_EVENT(xrep_refcount_extent_fn,
 	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 524f543c5b82..c08785cf83a9 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -126,6 +126,60 @@ xfs_trans_dup(
 	return ntp;
 }
 
+/*
+ * Try to reserve more blocks for a transaction.  The single use case we
+ * support is for online repair -- use a transaction to gather data without
+ * fear of btree cycle deadlocks; calculate how many blocks we really need
+ * from that data; and only then start modifying data.  This can fail due to
+ * ENOSPC, so we have to be able to cancel the transaction.
+ */
+int
+xfs_trans_reserve_more(
+	struct xfs_trans	*tp,
+	uint			blocks,
+	uint			rtextents)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
+	int			error = 0;
+
+	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
+
+	/*
+	 * Attempt to reserve the needed disk blocks by decrementing
+	 * the number needed from the number available.  This will
+	 * fail if the count would go below zero.
+	 */
+	if (blocks > 0) {
+		error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
+		if (error)
+			return -ENOSPC;
+		tp->t_blk_res += blocks;
+	}
+
+	/*
+	 * Attempt to reserve the needed realtime extents by decrementing
+	 * the number needed from the number available.  This will
+	 * fail if the count would go below zero.
+	 */
+	if (rtextents > 0) {
+		error = xfs_mod_frextents(mp, -((int64_t)rtextents));
+		if (error) {
+			error = -ENOSPC;
+			goto out_blocks;
+		}
+		tp->t_rtx_res += rtextents;
+	}
+
+	return 0;
+out_blocks:
+	if (blocks > 0) {
+		xfs_mod_fdblocks(mp, (int64_t)blocks, rsvd);
+		tp->t_blk_res -= blocks;
+	}
+	return error;
+}
+
 /*
  * This is called to reserve free disk blocks and log space for the
  * given transaction.  This must be done before allocating any resources
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 6526314f0b8f..bdbd3d5fd7b0 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -153,6 +153,8 @@ typedef struct xfs_trans {
 int		xfs_trans_alloc(struct xfs_mount *mp, struct xfs_trans_res *resp,
 			uint blocks, uint rtextents, uint flags,
 			struct xfs_trans **tpp);
+int		xfs_trans_reserve_more(struct xfs_trans *tp, uint blocks,
+			uint rtextents);
 int		xfs_trans_alloc_empty(struct xfs_mount *mp,
 			struct xfs_trans **tpp);
 void		xfs_trans_mod_sb(xfs_trans_t *, uint, int64_t);

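xrep_bmap_insert_rec in the patch above feeds each rmap record back into
the fork in pieces of at most MAXEXTLEN blocks, advancing the file offset
and the start block together.  The loop can be sketched in userspace as
below.  The demo_* names and DEMO_MAXEXTLEN are assumptions for this
example only; the kernel code calls xfs_bmapi_remap and rolls the
transaction for each piece rather than filling an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap on one mapping's length; stands in for MAXEXTLEN. */
#define DEMO_MAXEXTLEN	((uint64_t)0x1fffff)

/* Hypothetical stand-in for the bmap record fields used here. */
struct demo_extent {
	uint64_t		startoff;	/* file offset, in blocks */
	uint64_t		startblock;	/* disk block */
	uint64_t		blockcount;	/* length, in blocks */
};

/*
 * Split one long extent into pieces of at most DEMO_MAXEXTLEN blocks.
 * Stores up to @max pieces in @out and returns how many pieces the whole
 * extent needs, so the caller can detect an array that is too small.
 */
size_t
demo_split_extent(
	struct demo_extent	whole,
	struct demo_extent	*out,
	size_t			max)
{
	size_t			n = 0;

	assert(out != NULL || max == 0);

	while (whole.blockcount > 0) {
		uint64_t	len = whole.blockcount;

		if (len > DEMO_MAXEXTLEN)
			len = DEMO_MAXEXTLEN;
		if (n < max) {
			out[n].startoff = whole.startoff;
			out[n].startblock = whole.startblock;
			out[n].blockcount = len;
		}
		n++;

		/* Advance the file offset and the disk block in step. */
		whole.startoff += len;
		whole.startblock += len;
		whole.blockcount -= len;
	}
	return n;
}
```

An extent of 2 * DEMO_MAXEXTLEN + 5 blocks comes out as three pieces, the
last one five blocks long.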

* [PATCH 13/16] xfs: repair damaged symlinks
  2018-07-26  0:19 [PATCH v17 00/16] xfs-4.19: online repair support Darrick J. Wong
                   ` (11 preceding siblings ...)
  2018-07-26  0:21 ` [PATCH 12/16] xfs: repair inode block maps Darrick J. Wong
@ 2018-07-26  0:21 ` Darrick J. Wong
  2018-07-26  0:21 ` [PATCH 14/16] xfs: repair extended attributes Darrick J. Wong
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-07-26  0:21 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <darrick.wong@oracle.com>

Repair inconsistent symbolic link data.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile               |    1 
 fs/xfs/scrub/repair.h         |    2 
 fs/xfs/scrub/scrub.c          |    2 
 fs/xfs/scrub/symlink.c        |    5 +
 fs/xfs/scrub/symlink_repair.c |  254 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_symlink.c          |  151 ++++++++++++++----------
 fs/xfs/xfs_symlink.h          |    4 +
 7 files changed, 350 insertions(+), 69 deletions(-)
 create mode 100644 fs/xfs/scrub/symlink_repair.c
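
The salvage strategy in this patch is to keep the target up to its first
NUL byte and to turn an empty target into a link to the current
directory.  A rough userspace sketch of that rule is below.
demo_salvage_target() is a made-up helper for illustration, not a function
from this series, and it works on a flat byte array rather than on inode
forks or remote blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most @rawlen bytes of a possibly damaged link target into @buf,
 * which must hold at least @bufsize bytes.  Stop at the first NUL byte
 * and fall back to "." if nothing usable remains.  Returns the length of
 * the salvaged target, or 0 if @buf cannot hold even ".".
 */
size_t
demo_salvage_target(
	const char		*raw,
	size_t			rawlen,
	char			*buf,
	size_t			bufsize)
{
	size_t			len = 0;

	assert(buf != NULL);
	if (bufsize < 2)
		return 0;

	/* Leave room for the terminating NUL. */
	if (rawlen > bufsize - 1)
		rawlen = bufsize - 1;

	while (raw != NULL && len < rawlen && raw[len] != '\0') {
		buf[len] = raw[len];
		len++;
	}

	/* An empty target becomes a link to the current directory. */
	if (len == 0)
		buf[len++] = '.';
	buf[len] = '\0';
	return len;
}
```

The returned length is never zero for a usable buffer, which mirrors the
nonzero target length that xrep_symlink_reinitialize expects.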


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 7f5467bb18b9..e25cde969d99 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -171,6 +171,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   inode_repair.o \
 				   refcount_repair.o \
 				   repair.o \
+				   symlink_repair.o \
 				   )
 endif
 endif
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 38444fec70db..17769efb20d9 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -68,6 +68,7 @@ int xrep_refcountbt(struct xfs_scrub *sc);
 int xrep_inode(struct xfs_scrub *sc);
 int xrep_bmap_data(struct xfs_scrub *sc);
 int xrep_bmap_attr(struct xfs_scrub *sc);
+int xrep_symlink(struct xfs_scrub *sc);
 
 #else
 
@@ -108,6 +109,7 @@ xrep_reset_perag_resv(
 #define xrep_inode			xrep_notsupported
 #define xrep_bmap_data			xrep_notsupported
 #define xrep_bmap_attr			xrep_notsupported
+#define xrep_symlink			xrep_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 45af20a3ab50..0a8eea77e58f 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -307,7 +307,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.type	= ST_INODE,
 		.setup	= xchk_setup_symlink,
 		.scrub	= xchk_symlink,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_symlink,
 	},
 	[XFS_SCRUB_TYPE_PARENT] = {	/* parent pointers */
 		.type	= ST_INODE,
diff --git a/fs/xfs/scrub/symlink.c b/fs/xfs/scrub/symlink.c
index f7ebaa946999..ee968c62d0f2 100644
--- a/fs/xfs/scrub/symlink.c
+++ b/fs/xfs/scrub/symlink.c
@@ -29,12 +29,15 @@ xchk_setup_symlink(
 	struct xfs_scrub	*sc,
 	struct xfs_inode	*ip)
 {
+	uint			resblks;
+
 	/* Allocate the buffer without the inode lock held. */
 	sc->buf = kmem_zalloc_large(XFS_SYMLINK_MAXLEN + 1, KM_SLEEP);
 	if (!sc->buf)
 		return -ENOMEM;
 
-	return xchk_setup_inode_contents(sc, ip, 0);
+	resblks = xfs_symlink_blocks(sc->mp, XFS_SYMLINK_MAXLEN);
+	return xchk_setup_inode_contents(sc, ip, resblks);
 }
 
 /* Symbolic links. */
diff --git a/fs/xfs/scrub/symlink_repair.c b/fs/xfs/scrub/symlink_repair.c
new file mode 100644
index 000000000000..6ebf2d5913ed
--- /dev/null
+++ b/fs/xfs/scrub/symlink_repair.c
@@ -0,0 +1,254 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_inode_fork.h"
+#include "xfs_symlink.h"
+#include "xfs_bmap.h"
+#include "xfs_quota.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_trans_space.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+
+/*
+ * Symbolic Link Repair
+ * ====================
+ *
+ * There's not much we can do to repair symbolic links -- we truncate them to
+ * the first NUL byte and reinitialize the target.  Zero-length symlinks are
+ * turned into links to the current dir.
+ */
+
+/* Try to salvage the pathname from rmt blocks. */
+STATIC int
+xrep_symlink_salvage_remote(
+	struct xfs_scrub	*sc)
+{
+	struct xfs_bmbt_irec	mval[XFS_SYMLINK_MAPS];
+	struct xfs_inode	*ip = sc->ip;
+	struct xfs_buf		*bp;
+	char			*target_buf = sc->buf;
+	xfs_failaddr_t		fa;
+	xfs_filblks_t		fsblocks;
+	xfs_daddr_t		d;
+	loff_t			len;
+	loff_t			offset;
+	unsigned int		byte_cnt;
+	bool			magic_ok;
+	bool			hdr_ok;
+	int			n;
+	int			nmaps = XFS_SYMLINK_MAPS;
+	int			error;
+
+	/* We'll only read until the buffer is full. */
+	len = min_t(loff_t, ip->i_d.di_size, XFS_SYMLINK_MAXLEN);
+	fsblocks = xfs_symlink_blocks(sc->mp, len);
+	error = xfs_bmapi_read(ip, 0, fsblocks, mval, &nmaps, 0);
+	if (error)
+		return error;
+
+	offset = 0;
+	for (n = 0; n < nmaps; n++) {
+		struct xfs_dsymlink_hdr	*dsl;
+
+		d = XFS_FSB_TO_DADDR(sc->mp, mval[n].br_startblock);
+
+		/* Read the rmt block.  We'll run the verifiers manually. */
+		error = xfs_trans_read_buf(sc->mp, sc->tp, sc->mp->m_ddev_targp,
+				d, XFS_FSB_TO_BB(sc->mp, mval[n].br_blockcount),
+				0, &bp, NULL);
+		if (error)
+			return error;
+		bp->b_ops = &xfs_symlink_buf_ops;
+
+		/* How many bytes do we expect to get out of this buffer? */
+		byte_cnt = XFS_FSB_TO_B(sc->mp, mval[n].br_blockcount);
+		byte_cnt = XFS_SYMLINK_BUF_SPACE(sc->mp, byte_cnt);
+		byte_cnt = min_t(unsigned int, byte_cnt, len);
+
+		/*
+		 * See if the verifiers accept this block.  We're willing to
+		 * salvage if the offset/byte/ino are ok and either the
+		 * verifier passed or the magic is ok.  Anything else and we
+		 * stop dead in our tracks.
+		 */
+		fa = bp->b_ops->verify_struct(bp);
+		dsl = bp->b_addr;
+		magic_ok = dsl->sl_magic == cpu_to_be32(XFS_SYMLINK_MAGIC);
+		hdr_ok = xfs_symlink_hdr_ok(ip->i_ino, offset, byte_cnt, bp);
+		if (!hdr_ok || (fa != NULL && !magic_ok))
+			break;
+
+		memcpy(target_buf + offset, dsl + 1, byte_cnt);
+
+		len -= byte_cnt;
+		offset += byte_cnt;
+	}
+
+	/* Ensure we have a zero at the end, and /some/ contents. */
+	if (offset == 0)
+		sprintf(target_buf, ".");
+	else
+		target_buf[offset] = 0;
+	return 0;
+}
+
+/*
+ * Try to salvage an inline symlink's contents.  Empty symlinks become a link
+ * to the current directory.
+ */
+STATIC void
+xrep_symlink_salvage_inline(
+	struct xfs_scrub	*sc)
+{
+	struct xfs_inode	*ip = sc->ip;
+	struct xfs_ifork	*ifp;
+
+	ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+	if (ifp->if_u1.if_data)
+		strncpy(sc->buf, ifp->if_u1.if_data, XFS_IFORK_DSIZE(ip));
+	if (strlen(sc->buf) == 0)
+		sprintf(sc->buf, ".");
+}
+
+/* Reset an inline symlink to its fresh configuration. */
+STATIC void
+xrep_symlink_truncate_inline(
+	struct xfs_inode	*ip)
+{
+	xfs_idestroy_fork(ip, XFS_DATA_FORK);
+	ip->i_d.di_format = XFS_DINODE_FMT_EXTENTS;
+	ip->i_d.di_nextents = 0;
+	memset(&ip->i_df, 0, sizeof(struct xfs_ifork));
+	ip->i_df.if_flags |= XFS_IFEXTENTS;
+}
+
+/*
+ * Salvage an inline symlink's contents and reset data fork.
+ * Returns with the inode joined to the transaction.
+ */
+STATIC int
+xrep_symlink_inline(
+	struct xfs_scrub	*sc)
+{
+	/* Salvage whatever link target information we can find. */
+	xrep_symlink_salvage_inline(sc);
+
+	/* Truncate the symlink. */
+	xrep_symlink_truncate_inline(sc->ip);
+
+	xfs_trans_ijoin(sc->tp, sc->ip, 0);
+	return 0;
+}
+
+/*
+ * Salvage a remote symlink's contents and truncate the data fork.
+ * Returns with the inode joined to the transaction.
+ */
+STATIC int
+xrep_symlink_remote(
+	struct xfs_scrub	*sc)
+{
+	int			error;
+
+	/* Salvage whatever link target information we can find. */
+	error = xrep_symlink_salvage_remote(sc);
+	if (error)
+		return error;
+
+	/* Truncate the symlink. */
+	xfs_trans_ijoin(sc->tp, sc->ip, 0);
+	return xfs_itruncate_extents(&sc->tp, sc->ip, XFS_DATA_FORK, 0);
+}
+
+/*
+ * Reinitialize a link target.  Caller must ensure the inode is joined to
+ * the transaction.
+ */
+STATIC int
+xrep_symlink_reinitialize(
+	struct xfs_scrub	*sc)
+{
+	struct xfs_defer_ops	dfops;
+	xfs_fsblock_t		first_block;
+	xfs_fsblock_t		fs_blocks;
+	unsigned int		target_len;
+	uint			resblks;
+	int			error;
+
+	/* How many blocks do we need? */
+	target_len = strlen(sc->buf);
+	ASSERT(target_len != 0);
+	if (target_len == 0 || target_len > XFS_SYMLINK_MAXLEN)
+		return -EFSCORRUPTED;
+
+	/* Set up to reinitialize the target. */
+	xfs_defer_init(&dfops, &first_block);
+
+	fs_blocks = xfs_symlink_blocks(sc->mp, target_len);
+	resblks = XFS_SYMLINK_SPACE_RES(sc->mp, target_len, fs_blocks);
+	error = xfs_trans_reserve_quota_nblks(sc->tp, sc->ip, resblks, 0,
+			XFS_QMOPT_RES_REGBLKS);
+
+	/* Try to write the new target back out. */
+	xfs_defer_ijoin(&dfops, sc->ip);
+	error = xfs_symlink_write_target(sc->tp, sc->ip, &dfops, sc->buf,
+			target_len, &first_block, fs_blocks, resblks);
+	if (error)
+		goto err;
+
+	/* Finish up any block mapping activities. */
+	error = xfs_defer_finish(&sc->tp, &dfops);
+	if (error)
+		goto err;
+	return 0;
+err:
+	xfs_defer_cancel(&dfops);
+	return error;
+}
+
+/* Repair a symbolic link. */
+int
+xrep_symlink(
+	struct xfs_scrub	*sc)
+{
+	struct xfs_ifork	*ifp;
+	int			error;
+
+	error = xfs_qm_dqattach_locked(sc->ip, false);
+	if (error)
+		return error;
+
+	/* Salvage whatever we can of the target. */
+	*((char *)sc->buf) = 0;
+	ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK);
+	if (ifp->if_flags & XFS_IFINLINE)
+		error = xrep_symlink_inline(sc);
+	else
+		error = xrep_symlink_remote(sc);
+	if (error)
+		return error;
+
+	/* Now reset the target. */
+	return xrep_symlink_reinitialize(sc);
+}
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 7f85342a09e6..ea689a0e502d 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -150,6 +150,86 @@ xfs_readlink(
 	return error;
 }
 
+/* Write the symlink target into the inode. */
+int
+xfs_symlink_write_target(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	struct xfs_defer_ops	*dfops,
+	const char		*target_path,
+	int			pathlen,
+	xfs_fsblock_t		*first_block,
+	xfs_fsblock_t		fs_blocks,
+	uint			resblks)
+{
+	struct xfs_bmbt_irec	mval[XFS_SYMLINK_MAPS];
+	struct xfs_mount	*mp = tp->t_mountp;
+	const char		*cur_chunk;
+	struct xfs_buf		*bp;
+	xfs_daddr_t		d;
+	int			byte_cnt;
+	int			nmaps;
+	int			offset;
+	int			n;
+	int			error;
+
+	/*
+	 * If the symlink will fit into the inode, write it inline.
+	 */
+	if (pathlen <= XFS_IFORK_DSIZE(ip)) {
+		xfs_init_local_fork(ip, XFS_DATA_FORK, target_path, pathlen);
+
+		ip->i_d.di_size = pathlen;
+		ip->i_d.di_format = XFS_DINODE_FMT_LOCAL;
+		xfs_trans_log_inode(tp, ip, XFS_ILOG_DDATA | XFS_ILOG_CORE);
+
+		return 0;
+	}
+
+	/* Write target to remote blocks. */
+	nmaps = XFS_SYMLINK_MAPS;
+	error = xfs_bmapi_write(tp, ip, 0, fs_blocks, XFS_BMAPI_METADATA,
+			first_block, resblks, mval, &nmaps, dfops);
+	if (error)
+		return error;
+
+	ip->i_d.di_size = pathlen;
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+	cur_chunk = target_path;
+	offset = 0;
+	for (n = 0; n < nmaps; n++) {
+		char	*buf;
+
+		d = XFS_FSB_TO_DADDR(mp, mval[n].br_startblock);
+		byte_cnt = XFS_FSB_TO_B(mp, mval[n].br_blockcount);
+		bp = xfs_trans_get_buf(tp, mp->m_ddev_targp, d,
+				BTOBB(byte_cnt), 0);
+		if (!bp)
+			return -ENOMEM;
+		bp->b_ops = &xfs_symlink_buf_ops;
+
+		byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt);
+		byte_cnt = min(byte_cnt, pathlen);
+
+		buf = bp->b_addr;
+		buf += xfs_symlink_hdr_set(mp, ip->i_ino, offset,
+				byte_cnt, bp);
+
+		memcpy(buf, cur_chunk, byte_cnt);
+
+		cur_chunk += byte_cnt;
+		pathlen -= byte_cnt;
+		offset += byte_cnt;
+
+		xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SYMLINK_BUF);
+		xfs_trans_log_buf(tp, bp, 0, (buf + byte_cnt - 1) -
+				(char *)bp->b_addr);
+	}
+	ASSERT(pathlen == 0);
+	return 0;
+}
+
 int
 xfs_symlink(
 	struct xfs_inode	*dp,
@@ -166,15 +246,7 @@ xfs_symlink(
 	struct xfs_defer_ops	dfops;
 	xfs_fsblock_t		first_block;
 	bool                    unlock_dp_on_error = false;
-	xfs_fileoff_t		first_fsb;
 	xfs_filblks_t		fs_blocks;
-	int			nmaps;
-	struct xfs_bmbt_irec	mval[XFS_SYMLINK_MAPS];
-	xfs_daddr_t		d;
-	const char		*cur_chunk;
-	int			byte_cnt;
-	int			n;
-	xfs_buf_t		*bp;
 	prid_t			prid;
 	struct xfs_dquot	*udqp = NULL;
 	struct xfs_dquot	*gdqp = NULL;
@@ -274,66 +346,11 @@ xfs_symlink(
 
 	if (resblks)
 		resblks -= XFS_IALLOC_SPACE_RES(mp);
-	/*
-	 * If the symlink will fit into the inode, write it inline.
-	 */
-	if (pathlen <= XFS_IFORK_DSIZE(ip)) {
-		xfs_init_local_fork(ip, XFS_DATA_FORK, target_path, pathlen);
-
-		ip->i_d.di_size = pathlen;
-		ip->i_d.di_format = XFS_DINODE_FMT_LOCAL;
-		xfs_trans_log_inode(tp, ip, XFS_ILOG_DDATA | XFS_ILOG_CORE);
-	} else {
-		int	offset;
-
-		first_fsb = 0;
-		nmaps = XFS_SYMLINK_MAPS;
-
-		error = xfs_bmapi_write(tp, ip, first_fsb, fs_blocks,
-				  XFS_BMAPI_METADATA, &first_block, resblks,
-				  mval, &nmaps, &dfops);
-		if (error)
-			goto out_bmap_cancel;
 
-		if (resblks)
-			resblks -= fs_blocks;
-		ip->i_d.di_size = pathlen;
-		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-
-		cur_chunk = target_path;
-		offset = 0;
-		for (n = 0; n < nmaps; n++) {
-			char	*buf;
-
-			d = XFS_FSB_TO_DADDR(mp, mval[n].br_startblock);
-			byte_cnt = XFS_FSB_TO_B(mp, mval[n].br_blockcount);
-			bp = xfs_trans_get_buf(tp, mp->m_ddev_targp, d,
-					       BTOBB(byte_cnt), 0);
-			if (!bp) {
-				error = -ENOMEM;
-				goto out_bmap_cancel;
-			}
-			bp->b_ops = &xfs_symlink_buf_ops;
-
-			byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt);
-			byte_cnt = min(byte_cnt, pathlen);
-
-			buf = bp->b_addr;
-			buf += xfs_symlink_hdr_set(mp, ip->i_ino, offset,
-						   byte_cnt, bp);
-
-			memcpy(buf, cur_chunk, byte_cnt);
-
-			cur_chunk += byte_cnt;
-			pathlen -= byte_cnt;
-			offset += byte_cnt;
-
-			xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SYMLINK_BUF);
-			xfs_trans_log_buf(tp, bp, 0, (buf + byte_cnt - 1) -
-							(char *)bp->b_addr);
-		}
-		ASSERT(pathlen == 0);
-	}
+	error = xfs_symlink_write_target(tp, ip, &dfops, target_path, pathlen,
+			&first_block, fs_blocks, resblks);
+	if (error)
+		goto out_bmap_cancel;
 
 	/*
 	 * Create the directory entry for the symlink.
diff --git a/fs/xfs/xfs_symlink.h b/fs/xfs/xfs_symlink.h
index 9743d8c9394b..fd7eaa155939 100644
--- a/fs/xfs/xfs_symlink.h
+++ b/fs/xfs/xfs_symlink.h
@@ -12,5 +12,9 @@ int xfs_symlink(struct xfs_inode *dp, struct xfs_name *link_name,
 int xfs_readlink_bmap_ilocked(struct xfs_inode *ip, char *link);
 int xfs_readlink(struct xfs_inode *ip, char *link);
 int xfs_inactive_symlink(struct xfs_inode *ip);
+int xfs_symlink_write_target(struct xfs_trans *tp, struct xfs_inode *ip,
+		struct xfs_defer_ops *dfops, const char *target_path,
+		int pathlen, xfs_fsblock_t *first_block,
+		xfs_fsblock_t fs_blocks, uint resblks);
 
 #endif /* __XFS_SYMLINK_H */


* [PATCH 14/16] xfs: repair extended attributes
  2018-07-26  0:19 [PATCH v17 00/16] xfs-4.19: online repair support Darrick J. Wong
                   ` (12 preceding siblings ...)
  2018-07-26  0:21 ` [PATCH 13/16] xfs: repair damaged symlinks Darrick J. Wong
@ 2018-07-26  0:21 ` Darrick J. Wong
  2018-07-26  0:21 ` [PATCH 15/16] xfs: scrub should set preen if attr leaf has holes Darrick J. Wong
  2018-07-26  0:21 ` [PATCH 16/16] xfs: repair quotas Darrick J. Wong
  15 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-07-26  0:21 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <darrick.wong@oracle.com>

If the extended attributes look bad, try to sift through the rubble to
find whatever keys/values we can, zap the attr tree, and re-add the
values.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile            |    1 
 fs/xfs/scrub/attr.c        |    2 
 fs/xfs/scrub/attr_repair.c |  611 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.h      |    2 
 fs/xfs/scrub/scrub.c       |    2 
 fs/xfs/scrub/scrub.h       |    3 
 6 files changed, 619 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/attr_repair.c
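
The leaf salvage code below leans on a per-block "used bytes" bitmap
(xchk_xattr_set_map) to skip any entry whose name area collides with
something that was already claimed.  A minimal userspace sketch of that idea
follows.  The names usedmap, usedmap_init and usedmap_claim and the 4096-byte
block size are assumptions made for illustration, and the sketch simplifies
the kernel helper: it rejects a bad range without marking it.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define BLKSIZE		4096	/* hypothetical attr block size, in bytes */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define MAP_WORDS	((BLKSIZE + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per byte of the block; a set bit means "already claimed". */
struct usedmap {
	unsigned long	bits[MAP_WORDS];
};

static void usedmap_init(struct usedmap *m)
{
	assert(m != NULL);
	memset(m->bits, 0, sizeof(m->bits));
}

/*
 * Try to claim bytes [start, start + len).  Returns false, leaving the map
 * untouched, if the range leaves the block or overlaps an earlier claim.
 */
static bool usedmap_claim(struct usedmap *m, unsigned int start,
			  unsigned int len)
{
	unsigned int	i;

	if (len == 0 || start >= BLKSIZE || len > BLKSIZE - start)
		return false;

	/* First pass: look for any byte that somebody else owns. */
	for (i = start; i < start + len; i++)
		if (m->bits[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
			return false;

	/* Second pass: the range is free, so mark it used. */
	for (i = start; i < start + len; i++)
		m->bits[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
	return true;
}
```

A salvager built this way would claim the leaf header first, then each entry
and its name/value area, and drop any attribute whose claim fails.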


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index e25cde969d99..c3963c88f952 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -164,6 +164,7 @@ xfs-$(CONFIG_XFS_QUOTA)		+= scrub/quota.o
 ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y)
 xfs-y				+= $(addprefix scrub/, \
 				   agheader_repair.o \
+				   attr_repair.o \
 				   alloc_repair.o \
 				   bitmap.o \
 				   bmap_repair.o \
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 81d5e90547a1..e20074c241b5 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -125,7 +125,7 @@ xchk_xattr_listent(
  * Within a char, the lowest bit of the char represents the byte with
  * the smallest address
  */
-STATIC bool
+bool
 xchk_xattr_set_map(
 	struct xfs_scrub	*sc,
 	unsigned long		*map,
diff --git a/fs/xfs/scrub/attr_repair.c b/fs/xfs/scrub/attr_repair.c
new file mode 100644
index 000000000000..5bacfb88f25e
--- /dev/null
+++ b/fs/xfs/scrub/attr_repair.c
@@ -0,0 +1,611 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2.h"
+#include "xfs_attr.h"
+#include "xfs_attr_leaf.h"
+#include "xfs_attr_sf.h"
+#include "xfs_attr_remote.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+
+/*
+ * Extended Attribute Repair
+ * =========================
+ *
+ * We repair extended attributes by reading the attribute fork blocks looking
+ * for keys and values, then truncate the entire attr fork and reinsert all
+ * the attributes.  Unfortunately, there's no secondary copy of most extended
+ * attribute data, which means that if we blow up midway through there's
+ * little we can do.
+ */
+
+struct xrep_xattr_key {
+	struct list_head	list;
+	unsigned char		*value;
+	int			valuelen;
+	int			flags;
+	int			namelen;
+	unsigned char		name[0];
+};
+
+#define XREP_XATTR_KEY_LEN(namelen) \
+	(sizeof(struct xrep_xattr_key) + (namelen) + 1)
+
+struct xrep_xattr {
+	struct list_head	*attrlist;
+	struct xfs_scrub	*sc;
+};
+
+/*
+ * Iterate each block in an attr fork extent.  The m_attr_geo fsbcount is
+ * always 1 for now, but code defensively in case this ever changes.
+ */
+#define for_each_xfs_attr_block(mp, irec, dabno) \
+	for ((dabno) = roundup((xfs_dablk_t)(irec)->br_startoff, \
+			(mp)->m_attr_geo->fsbcount); \
+	     (dabno) < (irec)->br_startoff + (irec)->br_blockcount; \
+	     (dabno) += (mp)->m_attr_geo->fsbcount)
+
+/*
+ * Decide if we want to salvage this attribute.  We don't bother with
+ * incomplete or oversized keys or values.
+ */
+STATIC bool
+xrep_xattr_want_salvage(
+	int			flags,
+	int			namelen,
+	int			valuelen)
+{
+	if (flags & XFS_ATTR_INCOMPLETE)
+		return false;
+	if (namelen > XATTR_NAME_MAX || namelen <= 0)
+		return false;
+	if (valuelen > XATTR_SIZE_MAX || valuelen < 0)
+		return false;
+	return true;
+}
+
+/* Allocate an in-core record to hold xattrs while we rebuild the xattr data. */
+STATIC struct xrep_xattr_key *
+xrep_xattr_salvage_key(
+	int			flags,
+	unsigned char		*name,
+	int			namelen,
+	int			valuelen)
+{
+	struct xrep_xattr_key	*key;
+
+	/* Store attr key. */
+	key = kmem_alloc(XREP_XATTR_KEY_LEN(namelen), KM_MAYFAIL);
+	if (!key)
+		return NULL;
+	INIT_LIST_HEAD(&key->list);
+	key->valuelen = valuelen;
+	key->flags = flags & (ATTR_ROOT | ATTR_SECURE);
+	key->namelen = namelen;
+	key->name[namelen] = 0;
+	memcpy(key->name, name, namelen);
+	key->value = NULL;
+	if (valuelen) {
+		key->value = kmem_alloc_large(valuelen, KM_MAYFAIL);
+		if (!key->value) {
+			kmem_free(key);
+			return NULL;
+		}
+	}
+	return key;
+}
+
+/*
+ * Record a shortform extended attribute key & value for later reinsertion
+ * into the inode.
+ */
+STATIC int
+xrep_xattr_salvage_sf_attr(
+	struct xrep_xattr		*rx,
+	struct xfs_attr_sf_entry	*sfe)
+{
+	unsigned char			*value = &sfe->nameval[sfe->namelen];
+	struct xrep_xattr_key		*key;
+
+	if (!xrep_xattr_want_salvage(sfe->flags, sfe->namelen, sfe->valuelen))
+		return 0;
+	key = xrep_xattr_salvage_key(sfe->flags, sfe->nameval, sfe->namelen,
+			sfe->valuelen);
+	if (!key)
+		return -ENOMEM;
+	if (sfe->valuelen)
+		memcpy(key->value, value, sfe->valuelen);
+	list_add_tail(&key->list, rx->attrlist);
+	return 0;
+}
+
+/*
+ * Record a local format extended attribute key & value for later reinsertion
+ * into the inode.
+ */
+STATIC int
+xrep_xattr_salvage_local_attr(
+	struct xrep_xattr		*rx,
+	struct xfs_attr_leaf_entry	*ent,
+	unsigned int			nameidx,
+	const char			*buf_end,
+	struct xfs_attr_leaf_name_local	*lentry)
+{
+	struct xrep_xattr_key		*key;
+	unsigned long			*usedmap = rx->sc->buf;
+	unsigned int			valuelen;
+	unsigned int			namesize;
+
+	/*
+	 * Decode the leaf local entry format.  If something seems wrong, we
+	 * junk the attribute.
+	 */
+	valuelen = be16_to_cpu(lentry->valuelen);
+	namesize = xfs_attr_leaf_entsize_local(lentry->namelen, valuelen);
+	if ((char *)lentry + namesize > buf_end)
+		return 0;
+	if (!xrep_xattr_want_salvage(ent->flags, lentry->namelen, valuelen))
+		return 0;
+	if (!xchk_xattr_set_map(rx->sc, usedmap, nameidx, namesize))
+		return 0;
+
+	/* Try to save this attribute. */
+	key = xrep_xattr_salvage_key(ent->flags, lentry->nameval,
+			lentry->namelen, valuelen);
+	if (!key)
+		return -ENOMEM;
+	if (valuelen)
+		memcpy(key->value, &lentry->nameval[lentry->namelen], valuelen);
+	list_add_tail(&key->list, rx->attrlist);
+	return 0;
+}
+
+/*
+ * Record a remote format extended attribute key & value for later reinsertion
+ * into the inode.
+ */
+STATIC int
+xrep_xattr_salvage_remote_attr(
+	struct xrep_xattr		*rx,
+	struct xfs_attr_leaf_entry	*ent,
+	unsigned int			nameidx,
+	const char			*buf_end,
+	struct xfs_attr_leaf_name_remote *rentry,
+	unsigned int			ent_idx,
+	struct xfs_buf			*leaf_bp)
+{
+	struct xfs_da_args		args = {
+		.trans		= rx->sc->tp,
+		.dp		= rx->sc->ip,
+		.index		= ent_idx,
+		.geo		= rx->sc->mp->m_attr_geo,
+	};
+	struct xrep_xattr_key		*key;
+	unsigned long			*usedmap = rx->sc->buf;
+	unsigned int			valuelen;
+	unsigned int			namesize;
+	int				error;
+
+	/*
+	 * Decode the leaf remote entry format.  If something seems wrong, we
+	 * junk the attribute.  Note that we should never find a zero-length
+	 * remote attribute value.
+	 */
+	valuelen = be32_to_cpu(rentry->valuelen);
+	namesize = xfs_attr_leaf_entsize_remote(rentry->namelen);
+	if ((char *)rentry + namesize > buf_end)
+		return 0;
+	if (valuelen == 0 ||
+	    !xrep_xattr_want_salvage(ent->flags, rentry->namelen, valuelen))
+		return 0;
+	if (!xchk_xattr_set_map(rx->sc, usedmap, nameidx, namesize))
+		return 0;
+
+	/* Try to save this attribute. */
+	key = xrep_xattr_salvage_key(ent->flags, rentry->name, rentry->namelen,
+			valuelen);
+	if (!key)
+		return -ENOMEM;
+
+	/* Look up the remote value and stash it for reconstruction. */
+	args.valuelen = valuelen;
+	args.namelen = rentry->namelen;
+	args.name = key->name;
+	args.value = key->value;
+	error = xfs_attr3_leaf_getvalue(leaf_bp, &args);
+	if (error || args.rmtblkno == 0)
+		goto err_free;
+
+	error = xfs_attr_rmtval_get(&args);
+	if (error == 0) {
+		/* Got the value, add the attr and get out. */
+		list_add_tail(&key->list, rx->attrlist);
+		return 0;
+	}
+
+err_free:
+	/* remote value was garbage, junk it */
+	if (error == -EFSBADCRC || error == -EFSCORRUPTED)
+		error = 0;
+	kmem_free(key->value);
+	kmem_free(key);
+	return error;
+}
+
+/* Extract every xattr key that we can from this attr fork block. */
+STATIC int
+xrep_xattr_recover_leaf(
+	struct xrep_xattr		*rx,
+	struct xfs_buf			*bp)
+{
+	struct xfs_attr3_icleaf_hdr	leafhdr;
+	struct xfs_scrub		*sc = rx->sc;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_attr_leafblock	*leaf;
+	unsigned long			*usedmap = sc->buf;
+	struct xfs_attr_leaf_name_local	*lentry;
+	struct xfs_attr_leaf_name_remote *rentry;
+	struct xfs_attr_leaf_entry	*ent;
+	struct xfs_attr_leaf_entry	*entries;
+	char				*buf_end;
+	size_t				off;
+	unsigned int			nameidx;
+	unsigned int			hdrsize;
+	int				i;
+	int				error = 0;
+
+	bitmap_zero(usedmap, mp->m_attr_geo->blksize);
+
+	/* Check the leaf header */
+	leaf = bp->b_addr;
+	xfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
+	hdrsize = xfs_attr3_leaf_hdr_size(leaf);
+	xchk_xattr_set_map(sc, usedmap, 0, hdrsize);
+	entries = xfs_attr3_leaf_entryp(leaf);
+
+	buf_end = (char *)bp->b_addr + mp->m_attr_geo->blksize;
+	for (i = 0, ent = entries; i < leafhdr.count; ent++, i++) {
+		/* Skip key if it conflicts with something else? */
+		off = (char *)ent - (char *)leaf;
+		if (!xchk_xattr_set_map(sc, usedmap, off,
+				sizeof(xfs_attr_leaf_entry_t)))
+			continue;
+
+		/* Check the name information. */
+		nameidx = be16_to_cpu(ent->nameidx);
+		if (nameidx < leafhdr.firstused ||
+		    nameidx >= mp->m_attr_geo->blksize)
+			continue;
+
+		if (ent->flags & XFS_ATTR_LOCAL) {
+			lentry = xfs_attr3_leaf_name_local(leaf, i);
+			error = xrep_xattr_salvage_local_attr(rx, ent, nameidx,
+					buf_end, lentry);
+		} else {
+			rentry = xfs_attr3_leaf_name_remote(leaf, i);
+			error = xrep_xattr_salvage_remote_attr(rx, ent, nameidx,
+					buf_end, rentry, i, bp);
+		}
+		if (error)
+			break;
+	}
+
+	return error;
+}
+
+/* Try to recover shortform attrs. */
+STATIC int
+xrep_xattr_recover_sf(
+	struct xrep_xattr		*rx)
+{
+	struct xfs_attr_shortform	*sf;
+	struct xfs_attr_sf_entry	*sfe;
+	struct xfs_attr_sf_entry	*next;
+	struct xfs_ifork		*ifp;
+	unsigned char			*end;
+	int				i;
+	int				error;
+
+	ifp = XFS_IFORK_PTR(rx->sc->ip, XFS_ATTR_FORK);
+	sf = (struct xfs_attr_shortform *)rx->sc->ip->i_afp->if_u1.if_data;
+	end = (unsigned char *)ifp->if_u1.if_data + ifp->if_bytes;
+
+	for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) {
+		next = XFS_ATTR_SF_NEXTENTRY(sfe);
+		if ((unsigned char *)next > end)
+			break;
+
+		/* Ok, let's save this key/value. */
+		error = xrep_xattr_salvage_sf_attr(rx, sfe);
+		if (error)
+			return error;
+
+		sfe = next;
+	}
+
+	return 0;
+}
+
+/* Extract as many attribute keys and values as we can. */
+STATIC int
+xrep_xattr_recover(
+	struct xrep_xattr	*rx)
+{
+	struct xfs_iext_cursor	icur;
+	struct xfs_bmbt_irec	got;
+	struct xfs_scrub	*sc = rx->sc;
+	struct xfs_ifork	*ifp;
+	struct xfs_da_blkinfo	*info;
+	struct xfs_buf		*bp;
+	xfs_dablk_t		dabno;
+	int			error = 0;
+
+	if (sc->ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL)
+		return xrep_xattr_recover_sf(rx);
+
+	/* Iterate each attr block in the attr fork. */
+	ifp = XFS_IFORK_PTR(sc->ip, XFS_ATTR_FORK);
+	for_each_xfs_iext(ifp, &icur, &got) {
+		for_each_xfs_attr_block(sc->mp, &got, dabno) {
+			/*
+			 * Try to read buffer.  We invalidate them in the next
+			 * step so we don't bother to set a buffer type or
+			 * ops.
+			 */
+			error = xfs_da_read_buf(sc->tp, sc->ip, dabno, -1, &bp,
+					XFS_ATTR_FORK, NULL);
+			if (error || !bp)
+				continue;
+
+			/* Screen out non-leaves & other garbage. */
+			info = bp->b_addr;
+			if (info->magic != cpu_to_be16(XFS_ATTR3_LEAF_MAGIC) ||
+			    xfs_attr3_leaf_buf_ops.verify_struct(bp) != NULL)
+				continue;
+
+			error = xrep_xattr_recover_leaf(rx, bp);
+			if (error)
+				return error;
+		}
+	}
+
+	return error;
+}
+
+/* Free all the attribute fork blocks and delete the fork. */
+STATIC int
+xrep_xattr_reset_btree(
+	struct xfs_scrub	*sc)
+{
+	struct xfs_iext_cursor	icur;
+	struct xfs_bmbt_irec	got;
+	struct xfs_ifork	*ifp;
+	struct xfs_buf		*bp;
+	xfs_fileoff_t		lblk;
+	int			error;
+
+	xfs_trans_ijoin(sc->tp, sc->ip, 0);
+
+	if (sc->ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL)
+		goto out_fork_remove;
+
+	/* Invalidate each attr block in the attr fork. */
+	ifp = XFS_IFORK_PTR(sc->ip, XFS_ATTR_FORK);
+	for_each_xfs_iext(ifp, &icur, &got) {
+		for_each_xfs_attr_block(sc->mp, &got, lblk) {
+			error = xfs_da_get_buf(sc->tp, sc->ip, lblk, -1, &bp,
+					XFS_ATTR_FORK);
+			if (error || !bp)
+				continue;
+			xfs_trans_binval(sc->tp, bp);
+			error = xfs_trans_roll_inode(&sc->tp, sc->ip);
+			if (error)
+				return error;
+		}
+	}
+
+	/* Now free all the blocks. */
+	error = xfs_itruncate_extents(&sc->tp, sc->ip, XFS_ATTR_FORK, 0);
+	if (error)
+		return error;
+
+out_fork_remove:
+	/* Reset the attribute fork - this also destroys the in-core fork */
+	xfs_attr_fork_remove(sc->ip, sc->tp);
+	return 0;
+}
+
+/*
+ * Compare two xattr keys.  User attrs come before ATTR_ROOT keys and
+ * ATTR_ROOT keys come before ATTR_SECURE.  Otherwise sort in hash order.
+ */
+static int
+xrep_xattr_key_cmp(
+	void			*priv,
+	struct list_head	*a,
+	struct list_head	*b)
+{
+	struct xrep_xattr_key	*ap;
+	struct xrep_xattr_key	*bp;
+	uint			ahash;
+	uint			bhash;
+
+	ap = container_of(a, struct xrep_xattr_key, list);
+	bp = container_of(b, struct xrep_xattr_key, list);
+
+	if (ap->flags > bp->flags)
+		return 1;
+	else if (ap->flags < bp->flags)
+		return -1;
+
+	ahash = xfs_da_hashname(ap->name, ap->namelen);
+	bhash = xfs_da_hashname(bp->name, bp->namelen);
+	if (ahash > bhash)
+		return 1;
+	else if (ahash < bhash)
+		return -1;
+	return 0;
+}
+
+/*
+ * Find all the extended attributes for this inode by scraping them out of the
+ * attribute key blocks by hand.  The caller must clean up the list if
+ * anything goes wrong.
+ */
+STATIC int
+xrep_xattr_find_attributes(
+	struct xfs_scrub	*sc,
+	struct list_head	*attrlist)
+{
+	struct xrep_xattr	rx;
+	struct xfs_ifork	*ifp;
+	int			error;
+
+	error = xrep_ino_dqattach(sc);
+	if (error)
+		return error;
+
+	/* Extent map should be loaded. */
+	ifp = XFS_IFORK_PTR(sc->ip, XFS_ATTR_FORK);
+	if (XFS_IFORK_FORMAT(sc->ip, XFS_ATTR_FORK) != XFS_DINODE_FMT_LOCAL &&
+	    !(ifp->if_flags & XFS_IFEXTENTS)) {
+		error = xfs_iread_extents(sc->tp, sc->ip, XFS_ATTR_FORK);
+		if (error)
+			return error;
+	}
+
+	rx.attrlist = attrlist;
+	rx.sc = sc;
+
+	/* Read every attr key and value and record them in memory. */
+	return xrep_xattr_recover(&rx);
+}
+
+/* Free all the attributes. */
+STATIC void
+xrep_xattr_cancel_attrs(
+	struct list_head	*attrlist)
+{
+	struct xrep_xattr_key	*key;
+	struct xrep_xattr_key	*n;
+
+	list_for_each_entry_safe(key, n, attrlist, list) {
+		list_del(&key->list);
+		kmem_free(key->value);
+		kmem_free(key);
+	}
+}
+
+/*
+ * Insert all the attributes that we collected.
+ *
+ * Commit the repair transaction and drop the ilock because the attribute
+ * setting code needs to be able to allocate special transactions and take the
+ * ilock on its own.  Some day we'll have deferred attribute setting, at which
+ * point we'll be able to use that to replace the attributes atomically and
+ * safely.
+ */
+STATIC int
+xrep_xattr_rebuild_tree(
+	struct xfs_scrub	*sc,
+	struct list_head	*attrlist)
+{
+	struct xrep_xattr_key	*key;
+	struct xrep_xattr_key	*n;
+	int			error;
+
+	error = xfs_trans_commit(sc->tp);
+	sc->tp = NULL;
+	if (error)
+		return error;
+
+	xfs_iunlock(sc->ip, XFS_ILOCK_EXCL);
+	sc->ilock_flags &= ~XFS_ILOCK_EXCL;
+
+	/* Re-add every attr to the file. */
+	list_sort(NULL, attrlist, xrep_xattr_key_cmp);
+	list_for_each_entry_safe(key, n, attrlist, list) {
+		error = xfs_attr_set(sc->ip, key->name, key->value,
+				key->valuelen, key->flags);
+		if (error)
+			return error;
+
+		/*
+		 * If the attr value is larger than a single page, free the
+		 * key now so that we aren't hogging memory while doing a lot
+		 * of metadata updates.  Otherwise, we want to spend as little
+		 * time reconstructing the attrs as we possibly can.
+		 */
+		if (key->valuelen <= PAGE_SIZE)
+			continue;
+		list_del(&key->list);
+		kmem_free(key->value);
+		kmem_free(key);
+	}
+
+	xrep_xattr_cancel_attrs(attrlist);
+	return 0;
+}
+
+/*
+ * Repair the extended attribute metadata.
+ *
+ * XXX: Remote attribute value buffers encompass the entire (up to 64k) buffer.
+ * The buffer cache in XFS can't handle aliased multiblock buffers, so this
+ * might misbehave if the attr fork is crosslinked with other filesystem
+ * metadata.
+ */
+int
+xrep_xattr(
+	struct xfs_scrub	*sc)
+{
+	struct list_head	attrlist;
+	int			error;
+
+	if (!xfs_inode_hasattr(sc->ip))
+		return -ENOENT;
+
+	/* Collect extended attributes by parsing raw blocks. */
+	INIT_LIST_HEAD(&attrlist);
+	error = xrep_xattr_find_attributes(sc, &attrlist);
+	if (error)
+		goto out;
+
+	/*
+	 * Invalidate and truncate all attribute fork extents.  This is the
+	 * point at which we are no longer able to bail out gracefully.
+	 * The rebuild step below commits the repair transaction because
+	 * xfs_attr_set allocates its own transactions.
+	 */
+	error = xrep_xattr_reset_btree(sc);
+	if (error)
+		goto out;
+
+	/* Now rebuild the attribute information. */
+	error = xrep_xattr_rebuild_tree(sc, &attrlist);
+out:
+	xrep_xattr_cancel_attrs(&attrlist);
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 17769efb20d9..b630084d0f39 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -69,6 +69,7 @@ int xrep_inode(struct xfs_scrub *sc);
 int xrep_bmap_data(struct xfs_scrub *sc);
 int xrep_bmap_attr(struct xfs_scrub *sc);
 int xrep_symlink(struct xfs_scrub *sc);
+int xrep_xattr(struct xfs_scrub *sc);
 
 #else
 
@@ -110,6 +111,7 @@ xrep_reset_perag_resv(
 #define xrep_bmap_data			xrep_notsupported
 #define xrep_bmap_attr			xrep_notsupported
 #define xrep_symlink			xrep_notsupported
+#define xrep_xattr			xrep_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 0a8eea77e58f..537636d789fb 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -301,7 +301,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.type	= ST_INODE,
 		.setup	= xchk_setup_xattr,
 		.scrub	= xchk_xattr,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_xattr,
 	},
 	[XFS_SCRUB_TYPE_SYMLINK] = {	/* symbolic link */
 		.type	= ST_INODE,
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 762db46fd696..d7ad8fad9318 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -139,4 +139,7 @@ void xchk_xref_is_used_rt_space(struct xfs_scrub *sc, xfs_rtblock_t rtbno,
 # define xchk_xref_is_used_rt_space(sc, rtbno, len) do { } while (0)
 #endif
 
+bool xchk_xattr_set_map(struct xfs_scrub *sc, unsigned long *map,
+		unsigned int start, unsigned int len);
+
 #endif	/* __XFS_SCRUB_SCRUB_H__ */



* [PATCH 15/16] xfs: scrub should set preen if attr leaf has holes
  2018-07-26  0:19 [PATCH v17 00/16] xfs-4.19: online repair support Darrick J. Wong
                   ` (13 preceding siblings ...)
  2018-07-26  0:21 ` [PATCH 14/16] xfs: repair extended attributes Darrick J. Wong
@ 2018-07-26  0:21 ` Darrick J. Wong
  2018-07-26  0:21 ` [PATCH 16/16] xfs: repair quotas Darrick J. Wong
  15 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-07-26  0:21 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, allison.henderson, Dave Chinner

From: Darrick J. Wong <darrick.wong@oracle.com>

If an attr block indicates that it could use compaction, set the preen
flag to have the attr fork rebuilt, since the attr fork rebuilder can
take care of that for us.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/scrub/attr.c    |    2 ++
 fs/xfs/scrub/dabtree.c |   15 +++++++++++++++
 fs/xfs/scrub/dabtree.h |    1 +
 fs/xfs/scrub/trace.h   |    1 +
 4 files changed, 19 insertions(+)


diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index e20074c241b5..0956d4588dc5 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -293,6 +293,8 @@ xchk_xattr_block(
 		xchk_da_set_corrupt(ds, level);
 	if (!xchk_xattr_set_map(ds->sc, usedmap, 0, hdrsize))
 		xchk_da_set_corrupt(ds, level);
+	if (leafhdr.holes)
+		xchk_da_set_preen(ds, level);
 
 	if (ds->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
 		goto out;
diff --git a/fs/xfs/scrub/dabtree.c b/fs/xfs/scrub/dabtree.c
index f1260b4bfdee..e2ecf9c77010 100644
--- a/fs/xfs/scrub/dabtree.c
+++ b/fs/xfs/scrub/dabtree.c
@@ -85,6 +85,21 @@ xchk_da_set_corrupt(
 			__return_address);
 }
 
+/* Flag a da btree node in need of optimization. */
+void
+xchk_da_set_preen(
+	struct xchk_da_btree	*ds,
+	int			level)
+{
+	struct xfs_scrub	*sc = ds->sc;
+
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_PREEN;
+	trace_xchk_fblock_preen(sc, ds->dargs.whichfork,
+			xfs_dir2_da_to_db(ds->dargs.geo,
+				ds->state->path.blk[level].blkno),
+			__return_address);
+}
+
 /* Find an entry at a certain level in a da btree. */
 STATIC void *
 xchk_da_btree_entry(
diff --git a/fs/xfs/scrub/dabtree.h b/fs/xfs/scrub/dabtree.h
index cb3f0003245b..b367bf87a183 100644
--- a/fs/xfs/scrub/dabtree.h
+++ b/fs/xfs/scrub/dabtree.h
@@ -36,6 +36,7 @@ bool xchk_da_process_error(struct xchk_da_btree *ds, int level, int *error);
 
 /* Check for da btree corruption. */
 void xchk_da_set_corrupt(struct xchk_da_btree *ds, int level);
+void xchk_da_set_preen(struct xchk_da_btree *ds, int level);
 
 int xchk_da_btree_hash(struct xchk_da_btree *ds, int level, __be32 *hashp);
 int xchk_da_btree(struct xfs_scrub *sc, int whichfork,
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 3383b14fd0c0..d7133d1d23d6 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -230,6 +230,7 @@ DEFINE_EVENT(xchk_fblock_error_class, name, \
 
 DEFINE_SCRUB_FBLOCK_ERROR_EVENT(xchk_fblock_error);
 DEFINE_SCRUB_FBLOCK_ERROR_EVENT(xchk_fblock_warning);
+DEFINE_SCRUB_FBLOCK_ERROR_EVENT(xchk_fblock_preen);
 
 TRACE_EVENT(xchk_incomplete,
 	TP_PROTO(struct xfs_scrub *sc, void *ret_ip),



* [PATCH 16/16] xfs: repair quotas
  2018-07-26  0:19 [PATCH v17 00/16] xfs-4.19: online repair support Darrick J. Wong
                   ` (14 preceding siblings ...)
  2018-07-26  0:21 ` [PATCH 15/16] xfs: scrub should set preen if attr leaf has holes Darrick J. Wong
@ 2018-07-26  0:21 ` Darrick J. Wong
  15 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-07-26  0:21 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <darrick.wong@oracle.com>

Fix anything that causes the quota verifiers to fail.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile             |    1 
 fs/xfs/scrub/attr_repair.c  |    2 
 fs/xfs/scrub/common.h       |    9 +
 fs/xfs/scrub/quota.c        |    2 
 fs/xfs/scrub/quota_repair.c |  363 +++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.c       |   58 +++++++
 fs/xfs/scrub/repair.h       |    8 +
 fs/xfs/scrub/scrub.c        |   11 +
 8 files changed, 446 insertions(+), 8 deletions(-)
 create mode 100644 fs/xfs/scrub/quota_repair.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index c3963c88f952..ed1fc827ed15 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -174,5 +174,6 @@ xfs-y				+= $(addprefix scrub/, \
 				   repair.o \
 				   symlink_repair.o \
 				   )
+xfs-$(CONFIG_XFS_QUOTA)		+= scrub/quota_repair.o
 endif
 endif
diff --git a/fs/xfs/scrub/attr_repair.c b/fs/xfs/scrub/attr_repair.c
index 5bacfb88f25e..e01ca4350857 100644
--- a/fs/xfs/scrub/attr_repair.c
+++ b/fs/xfs/scrub/attr_repair.c
@@ -395,7 +395,7 @@ xrep_xattr_recover(
 }
 
 /* Free all the attribute fork blocks and delete the fork. */
-STATIC int
+int
 xrep_xattr_reset_btree(
 	struct xfs_scrub	*sc)
 {
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 2d4324d12f9a..aab82f7f9a67 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -138,4 +138,13 @@ static inline bool xchk_skip_xref(struct xfs_scrub_metadata *sm)
 int xchk_metadata_inode_forks(struct xfs_scrub *sc);
 int xchk_ilock_inverted(struct xfs_inode *ip, uint lock_mode);
 
+/* Do we need to invoke the repair tool? */
+static inline bool xfs_scrub_needs_repair(struct xfs_scrub_metadata *sm)
+{
+	return sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
+			       XFS_SCRUB_OFLAG_XCORRUPT |
+			       XFS_SCRUB_OFLAG_PREEN);
+}
+uint xchk_quota_to_dqtype(struct xfs_scrub *sc);
+
 #endif	/* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/quota.c b/fs/xfs/scrub/quota.c
index 782d582d3edd..0e5578ab088e 100644
--- a/fs/xfs/scrub/quota.c
+++ b/fs/xfs/scrub/quota.c
@@ -29,7 +29,7 @@
 #include "scrub/trace.h"
 
 /* Convert a scrub type code to a DQ flag, or return 0 if error. */
-static inline uint
+uint
 xchk_quota_to_dqtype(
 	struct xfs_scrub	*sc)
 {
diff --git a/fs/xfs/scrub/quota_repair.c b/fs/xfs/scrub/quota_repair.c
new file mode 100644
index 000000000000..36635f7ca217
--- /dev/null
+++ b/fs/xfs/scrub/quota_repair.c
@@ -0,0 +1,363 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_inode_fork.h"
+#include "xfs_alloc.h"
+#include "xfs_bmap.h"
+#include "xfs_quota.h"
+#include "xfs_qm.h"
+#include "xfs_dquot.h"
+#include "xfs_dquot_item.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+
+/*
+ * Quota Repair
+ * ============
+ *
+ * Quota repairs are fairly simplistic; we fix everything that the dquot
+ * verifiers complain about, cap any counters or limits that make no sense,
+ * and schedule a quotacheck if we had to fix anything.  We also repair any
+ * data fork extent records that don't apply to metadata files.
+ */
+
+struct xrep_quota_info {
+	struct xfs_scrub	*sc;
+	bool			need_quotacheck;
+};
+
+/* Repair the fields in an individual quota item. */
+STATIC int
+xrep_quota_item(
+	struct xfs_dquot	*dq,
+	uint			dqtype,
+	void			*priv)
+{
+	struct xrep_quota_info	*rqi = priv;
+	struct xfs_scrub	*sc = rqi->sc;
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_disk_dquot	*d = &dq->q_core;
+	unsigned long long	bsoft;
+	unsigned long long	isoft;
+	unsigned long long	rsoft;
+	unsigned long long	bhard;
+	unsigned long long	ihard;
+	unsigned long long	rhard;
+	unsigned long long	bcount;
+	unsigned long long	icount;
+	unsigned long long	rcount;
+	xfs_ino_t		fs_icount;
+	bool			dirty = false;
+	int			error;
+
+	/* Did we get the dquot type we wanted? */
+	if (dqtype != (d->d_flags & XFS_DQ_ALLTYPES)) {
+		d->d_flags = dqtype;
+		dirty = true;
+	}
+
+	if (d->d_pad0 || d->d_pad) {
+		d->d_pad0 = 0;
+		d->d_pad = 0;
+		dirty = true;
+	}
+
+	/* Check the limits. */
+	bhard = be64_to_cpu(d->d_blk_hardlimit);
+	ihard = be64_to_cpu(d->d_ino_hardlimit);
+	rhard = be64_to_cpu(d->d_rtb_hardlimit);
+
+	bsoft = be64_to_cpu(d->d_blk_softlimit);
+	isoft = be64_to_cpu(d->d_ino_softlimit);
+	rsoft = be64_to_cpu(d->d_rtb_softlimit);
+
+	if (bsoft > bhard) {
+		d->d_blk_softlimit = d->d_blk_hardlimit;
+		dirty = true;
+	}
+
+	if (isoft > ihard) {
+		d->d_ino_softlimit = d->d_ino_hardlimit;
+		dirty = true;
+	}
+
+	if (rsoft > rhard) {
+		d->d_rtb_softlimit = d->d_rtb_hardlimit;
+		dirty = true;
+	}
+
+	/* Check the resource counts. */
+	bcount = be64_to_cpu(d->d_bcount);
+	icount = be64_to_cpu(d->d_icount);
+	rcount = be64_to_cpu(d->d_rtbcount);
+	fs_icount = percpu_counter_sum(&mp->m_icount);
+
+	/*
+	 * Check that usage doesn't exceed physical limits.  However, on
+	 * a reflink filesystem we're allowed to exceed physical space
+	 * if there are no quota limits.  We don't know what the real number
+	 * is, but we can make quotacheck find out for us.
+	 */
+	if (!xfs_sb_version_hasreflink(&mp->m_sb) &&
+	    mp->m_sb.sb_dblocks < bcount) {
+		dq->q_res_bcount -= be64_to_cpu(dq->q_core.d_bcount);
+		dq->q_res_bcount += mp->m_sb.sb_dblocks;
+		d->d_bcount = cpu_to_be64(mp->m_sb.sb_dblocks);
+		rqi->need_quotacheck = true;
+		dirty = true;
+	}
+	if (icount > fs_icount) {
+		dq->q_res_icount -= be64_to_cpu(dq->q_core.d_icount);
+		dq->q_res_icount += fs_icount;
+		d->d_icount = cpu_to_be64(fs_icount);
+		rqi->need_quotacheck = true;
+		dirty = true;
+	}
+	if (rcount > mp->m_sb.sb_rblocks) {
+		dq->q_res_rtbcount -= be64_to_cpu(dq->q_core.d_rtbcount);
+		dq->q_res_rtbcount += mp->m_sb.sb_rblocks;
+		d->d_rtbcount = cpu_to_be64(mp->m_sb.sb_rblocks);
+		rqi->need_quotacheck = true;
+		dirty = true;
+	}
+
+	if (!dirty)
+		return 0;
+
+	dq->dq_flags |= XFS_DQ_DIRTY;
+	xfs_trans_dqjoin(sc->tp, dq);
+	xfs_trans_log_dquot(sc->tp, dq);
+	error = xfs_trans_roll(&sc->tp);
+	xfs_dqlock(dq);
+	return error;
+}
+
+/* Fix a quota timer so that we can pass the verifier. */
+STATIC void
+xrep_quota_fix_timer(
+	__be64			softlimit,
+	__be64			countnow,
+	__be32			*timer,
+	time_t			timelimit)
+{
+	uint64_t		soft = be64_to_cpu(softlimit);
+	uint64_t		count = be64_to_cpu(countnow);
+
+	if (soft && count > soft && *timer == 0)
+		*timer = cpu_to_be32(get_seconds() + timelimit);
+}
+
+/* Fix anything the verifiers complain about. */
+STATIC int
+xrep_quota_block(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*bp,
+	uint			dqtype,
+	xfs_dqid_t		id)
+{
+	struct xfs_dqblk	*d = (struct xfs_dqblk *)bp->b_addr;
+	struct xfs_disk_dquot	*ddq;
+	struct xfs_quotainfo	*qi = sc->mp->m_quotainfo;
+	enum xfs_blft		buftype = 0;
+	int			i;
+
+	bp->b_ops = &xfs_dquot_buf_ops;
+	for (i = 0; i < qi->qi_dqperchunk; i++) {
+		ddq = &d[i].dd_diskdq;
+
+		ddq->d_magic = cpu_to_be16(XFS_DQUOT_MAGIC);
+		ddq->d_version = XFS_DQUOT_VERSION;
+		ddq->d_flags = dqtype;
+		ddq->d_id = cpu_to_be32(id + i);
+
+		xrep_quota_fix_timer(ddq->d_blk_softlimit,
+				ddq->d_bcount, &ddq->d_btimer,
+				qi->qi_btimelimit);
+		xrep_quota_fix_timer(ddq->d_ino_softlimit,
+				ddq->d_icount, &ddq->d_itimer,
+				qi->qi_itimelimit);
+		xrep_quota_fix_timer(ddq->d_rtb_softlimit,
+				ddq->d_rtbcount, &ddq->d_rtbtimer,
+				qi->qi_rtbtimelimit);
+
+		/* We only support v5 filesystems so always set these. */
+		uuid_copy(&d[i].dd_uuid, &sc->mp->m_sb.sb_meta_uuid);
+		d[i].dd_lsn = 0;
+		xfs_update_cksum((char *)&d[i], sizeof(struct xfs_dqblk),
+				 XFS_DQUOT_CRC_OFF);
+	}
+	switch (dqtype) {
+	case XFS_DQ_USER:
+		buftype = XFS_BLFT_UDQUOT_BUF;
+		break;
+	case XFS_DQ_GROUP:
+		buftype = XFS_BLFT_GDQUOT_BUF;
+		break;
+	case XFS_DQ_PROJ:
+		buftype = XFS_BLFT_PDQUOT_BUF;
+		break;
+	}
+	xfs_trans_buf_set_type(sc->tp, bp, buftype);
+	xfs_trans_log_buf(sc->tp, bp, 0, BBTOB(bp->b_length) - 1);
+	return xfs_trans_roll(&sc->tp);
+}
+
+/* Repair quota's data fork. */
+STATIC int
+xrep_quota_data_fork(
+	struct xfs_scrub	*sc,
+	uint			dqtype)
+{
+	struct xfs_bmbt_irec	irec = { 0 };
+	struct xfs_iext_cursor	icur;
+	struct xfs_quotainfo	*qi = sc->mp->m_quotainfo;
+	struct xfs_ifork	*ifp;
+	struct xfs_buf		*bp;
+	struct xfs_dqblk	*d;
+	xfs_dqid_t		id;
+	xfs_fileoff_t		max_dqid_off;
+	xfs_fileoff_t		off;
+	xfs_fsblock_t		fsbno;
+	bool			truncate = false;
+	int			error = 0;
+
+	error = xrep_metadata_inode_forks(sc);
+	if (error)
+		goto out;
+
+	/* Check for data fork problems that apply only to quota files. */
+	max_dqid_off = ((xfs_dqid_t)-1) / qi->qi_dqperchunk;
+	ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK);
+	for_each_xfs_iext(ifp, &icur, &irec) {
+		if (isnullstartblock(irec.br_startblock)) {
+			error = -EFSCORRUPTED;
+			goto out;
+		}
+
+		if (irec.br_startoff > max_dqid_off ||
+		    irec.br_startoff + irec.br_blockcount - 1 > max_dqid_off) {
+			truncate = true;
+			break;
+		}
+	}
+	if (truncate) {
+		error = xfs_itruncate_extents(&sc->tp, sc->ip, XFS_DATA_FORK,
+				max_dqid_off * sc->mp->m_sb.sb_blocksize);
+		if (error)
+			goto out;
+	}
+
+	/* Now go fix anything that fails the verifiers. */
+	for_each_xfs_iext(ifp, &icur, &irec) {
+		for (fsbno = irec.br_startblock, off = irec.br_startoff;
+		     fsbno < irec.br_startblock + irec.br_blockcount;
+		     fsbno += XFS_DQUOT_CLUSTER_SIZE_FSB,
+				off += XFS_DQUOT_CLUSTER_SIZE_FSB) {
+			id = off * qi->qi_dqperchunk;
+			error = xfs_trans_read_buf(sc->mp, sc->tp,
+					sc->mp->m_ddev_targp,
+					XFS_FSB_TO_DADDR(sc->mp, fsbno),
+					qi->qi_dqchunklen,
+					0, &bp, &xfs_dquot_buf_ops);
+			if (error == 0) {
+				d = (struct xfs_dqblk *)bp->b_addr;
+				if (id == be32_to_cpu(d->dd_diskdq.d_id)) {
+					xfs_trans_brelse(sc->tp, bp);
+					continue;
+				}
+				error = -EFSCORRUPTED;
+				xfs_trans_brelse(sc->tp, bp);
+			}
+			if (error != -EFSBADCRC && error != -EFSCORRUPTED)
+				goto out;
+
+			/* Failed verifier, try again. */
+			error = xfs_trans_read_buf(sc->mp, sc->tp,
+					sc->mp->m_ddev_targp,
+					XFS_FSB_TO_DADDR(sc->mp, fsbno),
+					qi->qi_dqchunklen,
+					0, &bp, NULL);
+			if (error)
+				goto out;
+
+			/*
+			 * Fix the quota block, which will roll our transaction
+			 * and release bp.
+			 */
+			error = xrep_quota_block(sc, bp, dqtype, id);
+			if (error)
+				goto out;
+		}
+	}
+
+out:
+	return error;
+}
+
+/*
+ * Go fix anything in the quota items that we could have been mad about.  Now
+ * that we've checked the quota inode data fork we have to drop ILOCK_EXCL to
+ * use the regular dquot functions.
+ */
+STATIC int
+xrep_quota_problems(
+	struct xfs_scrub	*sc,
+	uint			dqtype)
+{
+	struct xrep_quota_info	rqi;
+	int			error;
+
+	rqi.sc = sc;
+	rqi.need_quotacheck = false;
+	error = xfs_qm_dqiterate(sc->mp, dqtype, xrep_quota_item, &rqi);
+	if (error)
+		return error;
+
+	/* Make a quotacheck happen. */
+	if (rqi.need_quotacheck)
+		xrep_force_quotacheck(sc, dqtype);
+	return 0;
+}
+
+/* Repair all of a quota type's items. */
+int
+xrep_quota(
+	struct xfs_scrub	*sc)
+{
+	uint			dqtype;
+	int			error;
+
+	dqtype = xchk_quota_to_dqtype(sc);
+
+	/* Fix problematic data fork mappings. */
+	error = xrep_quota_data_fork(sc, dqtype);
+	if (error)
+		goto out;
+
+	/* Unlock quota inode; we play only with dquots from now on. */
+	xfs_iunlock(sc->ip, sc->ilock_flags);
+	sc->ilock_flags = 0;
+
+	/* Fix anything the dquot verifiers complain about. */
+	error = xrep_quota_problems(sc, dqtype);
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index a44deb6f06ab..27cc50178d86 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -29,6 +29,8 @@
 #include "xfs_ag_resv.h"
 #include "xfs_trans_space.h"
 #include "xfs_quota.h"
+#include "xfs_attr.h"
+#include "xfs_reflink.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
@@ -900,3 +902,59 @@ xrep_reset_perag_resv(
 out:
 	return error;
 }
+
+/*
+ * Repair the attr/data forks of a metadata inode.  The metadata inode must be
+ * pointed to by sc->ip and the ILOCK must be held.
+ */
+int
+xrep_metadata_inode_forks(
+	struct xfs_scrub	*sc)
+{
+	__u32			smtype;
+	__u32			smflags;
+	int			error;
+
+	smtype = sc->sm->sm_type;
+	smflags = sc->sm->sm_flags;
+
+	/* Let's see if the forks need repair. */
+	sc->sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT;
+	error = xchk_metadata_inode_forks(sc);
+	if (error || !xfs_scrub_needs_repair(sc->sm))
+		goto out;
+
+	xfs_trans_ijoin(sc->tp, sc->ip, 0);
+
+	/* Clear the reflink flag & attr forks that we shouldn't have. */
+	if (xfs_is_reflink_inode(sc->ip)) {
+		error = xfs_reflink_clear_inode_flag(sc->ip, &sc->tp);
+		if (error)
+			goto out;
+	}
+
+	if (xfs_inode_hasattr(sc->ip)) {
+		error = xrep_xattr_reset_btree(sc);
+		if (error)
+			goto out;
+	}
+
+	/* Repair the data fork. */
+	sc->sm->sm_type = XFS_SCRUB_TYPE_BMBTD;
+	error = xrep_bmap_data(sc);
+	sc->sm->sm_type = smtype;
+	if (error)
+		goto out;
+
+	/* Bail out if we still need repairs. */
+	sc->sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT;
+	error = xchk_metadata_inode_forks(sc);
+	if (error)
+		goto out;
+	if (xfs_scrub_needs_repair(sc->sm))
+		error = -EFSCORRUPTED;
+out:
+	sc->sm->sm_type = smtype;
+	sc->sm->sm_flags = smflags;
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index b630084d0f39..aa032a7b99d0 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -54,6 +54,8 @@ int xrep_find_ag_btree_roots(struct xfs_scrub *sc, struct xfs_buf *agf_bp,
 void xrep_force_quotacheck(struct xfs_scrub *sc, uint dqtype);
 int xrep_ino_dqattach(struct xfs_scrub *sc);
 int xrep_reset_perag_resv(struct xfs_scrub *sc);
+int xrep_xattr_reset_btree(struct xfs_scrub *sc);
+int xrep_metadata_inode_forks(struct xfs_scrub *sc);
 
 /* Metadata repairers */
 
@@ -70,6 +72,11 @@ int xrep_bmap_data(struct xfs_scrub *sc);
 int xrep_bmap_attr(struct xfs_scrub *sc);
 int xrep_symlink(struct xfs_scrub *sc);
 int xrep_xattr(struct xfs_scrub *sc);
+#ifdef CONFIG_XFS_QUOTA
+int xrep_quota(struct xfs_scrub *sc);
+#else
+# define xrep_quota			xrep_notsupported
+#endif /* CONFIG_XFS_QUOTA */
 
 #else
 
@@ -112,6 +119,7 @@ xrep_reset_perag_resv(
 #define xrep_bmap_attr			xrep_notsupported
 #define xrep_symlink			xrep_notsupported
 #define xrep_xattr			xrep_notsupported
+#define xrep_quota			xrep_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 537636d789fb..a9f969214e69 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -333,19 +333,19 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.type	= ST_FS,
 		.setup	= xchk_setup_quota,
 		.scrub	= xchk_quota,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_quota,
 	},
 	[XFS_SCRUB_TYPE_GQUOTA] = {	/* group quota */
 		.type	= ST_FS,
 		.setup	= xchk_setup_quota,
 		.scrub	= xchk_quota,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_quota,
 	},
 	[XFS_SCRUB_TYPE_PQUOTA] = {	/* project quota */
 		.type	= ST_FS,
 		.setup	= xchk_setup_quota,
 		.scrub	= xchk_quota,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_quota,
 	},
 };
 
@@ -539,9 +539,8 @@ xfs_scrub_metadata(
 		if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_FORCE_SCRUB_REPAIR))
 			sc.sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
 
-		needs_fix = (sc.sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
-						XFS_SCRUB_OFLAG_XCORRUPT |
-						XFS_SCRUB_OFLAG_PREEN));
+		needs_fix = xfs_scrub_needs_repair(sc.sm);
+
 		/*
 		 * If userspace asked for a repair but it wasn't necessary,
 		 * report that back to userspace.
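
For readers following the quota repair logic, the grace-timer fix-up that
xrep_quota_fix_timer() applies to each dquot can be sketched in userspace
roughly as follows.  This is only an illustration under stated assumptions:
struct toy_dquot and toy_fix_timer() are made-up names, the fields are
host-endian rather than the big-endian on-disk fields that the patch converts
with be64_to_cpu()/cpu_to_be32(), and the current time is passed in as a
parameter instead of being read with get_seconds().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the on-disk dquot; host-endian for brevity. */
struct toy_dquot {
	uint64_t	soft_limit;	/* 0 means "no soft limit set" */
	uint64_t	count;		/* current resource usage */
	uint32_t	timer;		/* grace expiry time, 0 if not running */
};

/*
 * Start the grace timer if usage is over the soft limit and no timer is
 * running yet.  Returns 1 if the timer was started, 0 if nothing changed.
 */
static int
toy_fix_timer(
	struct toy_dquot	*dq,
	uint32_t		now,
	uint32_t		timelimit)
{
	assert(dq != NULL);

	if (dq->soft_limit && dq->count > dq->soft_limit && dq->timer == 0) {
		dq->timer = now + timelimit;
		return 1;
	}
	return 0;
}
```

With soft_limit = 100, count = 150 and no timer running, a call such as
toy_fix_timer(&dq, 1000, 604800) sets dq.timer to 605800.  A dquot that
already has a timer, has no soft limit, or is at or under its limit is left
untouched, which mirrors the three-part condition in the patch.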


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 01/16] xfs: pass transaction lock while setting up agresv on cyclic metadata
  2018-07-26  0:19 ` [PATCH 01/16] xfs: pass transaction lock while setting up agresv on cyclic metadata Darrick J. Wong
@ 2018-07-27 14:21   ` Brian Foster
  0 siblings, 0 replies; 26+ messages in thread
From: Brian Foster @ 2018-07-27 14:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, allison.henderson

On Wed, Jul 25, 2018 at 05:19:33PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Pass a transaction pointer through to all helpers that calculate the
> per-AG block reservation.  Online repair will use this to reinitialize
> per-ag reservations while it still holds all the AG headers locked to
> the repair transaction.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Seems fine:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/libxfs/xfs_ag_resv.c        |   13 +++++++------
>  fs/xfs/libxfs/xfs_ag_resv.h        |    2 +-
>  fs/xfs/libxfs/xfs_ialloc_btree.c   |   10 ++++++----
>  fs/xfs/libxfs/xfs_ialloc_btree.h   |    4 ++--
>  fs/xfs/libxfs/xfs_refcount_btree.c |    5 +++--
>  fs/xfs/libxfs/xfs_refcount_btree.h |    3 ++-
>  fs/xfs/libxfs/xfs_rmap_btree.c     |    5 +++--
>  fs/xfs/libxfs/xfs_rmap_btree.h     |    2 +-
>  fs/xfs/xfs_fsops.c                 |    2 +-
>  9 files changed, 26 insertions(+), 20 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_ag_resv.c b/fs/xfs/libxfs/xfs_ag_resv.c
> index fecd187fcf2c..e701ebc36c06 100644
> --- a/fs/xfs/libxfs/xfs_ag_resv.c
> +++ b/fs/xfs/libxfs/xfs_ag_resv.c
> @@ -248,7 +248,8 @@ __xfs_ag_resv_init(
>  /* Create a per-AG block reservation. */
>  int
>  xfs_ag_resv_init(
> -	struct xfs_perag		*pag)
> +	struct xfs_perag		*pag,
> +	struct xfs_trans		*tp)
>  {
>  	struct xfs_mount		*mp = pag->pag_mount;
>  	xfs_agnumber_t			agno = pag->pag_agno;
> @@ -260,11 +261,11 @@ xfs_ag_resv_init(
>  	if (pag->pag_meta_resv.ar_asked == 0) {
>  		ask = used = 0;
>  
> -		error = xfs_refcountbt_calc_reserves(mp, agno, &ask, &used);
> +		error = xfs_refcountbt_calc_reserves(mp, tp, agno, &ask, &used);
>  		if (error)
>  			goto out;
>  
> -		error = xfs_finobt_calc_reserves(mp, agno, &ask, &used);
> +		error = xfs_finobt_calc_reserves(mp, tp, agno, &ask, &used);
>  		if (error)
>  			goto out;
>  
> @@ -282,7 +283,7 @@ xfs_ag_resv_init(
>  
>  			mp->m_inotbt_nores = true;
>  
> -			error = xfs_refcountbt_calc_reserves(mp, agno, &ask,
> +			error = xfs_refcountbt_calc_reserves(mp, tp, agno, &ask,
>  					&used);
>  			if (error)
>  				goto out;
> @@ -298,7 +299,7 @@ xfs_ag_resv_init(
>  	if (pag->pag_rmapbt_resv.ar_asked == 0) {
>  		ask = used = 0;
>  
> -		error = xfs_rmapbt_calc_reserves(mp, agno, &ask, &used);
> +		error = xfs_rmapbt_calc_reserves(mp, tp, agno, &ask, &used);
>  		if (error)
>  			goto out;
>  
> @@ -309,7 +310,7 @@ xfs_ag_resv_init(
>  
>  #ifdef DEBUG
>  	/* need to read in the AGF for the ASSERT below to work */
> -	error = xfs_alloc_pagf_init(pag->pag_mount, NULL, pag->pag_agno, 0);
> +	error = xfs_alloc_pagf_init(pag->pag_mount, tp, pag->pag_agno, 0);
>  	if (error)
>  		return error;
>  
> diff --git a/fs/xfs/libxfs/xfs_ag_resv.h b/fs/xfs/libxfs/xfs_ag_resv.h
> index 4619b554ee90..d1005116b43b 100644
> --- a/fs/xfs/libxfs/xfs_ag_resv.h
> +++ b/fs/xfs/libxfs/xfs_ag_resv.h
> @@ -7,7 +7,7 @@
>  #define	__XFS_AG_RESV_H__
>  
>  int xfs_ag_resv_free(struct xfs_perag *pag);
> -int xfs_ag_resv_init(struct xfs_perag *pag);
> +int xfs_ag_resv_init(struct xfs_perag *pag, struct xfs_trans *tp);
>  
>  bool xfs_ag_resv_critical(struct xfs_perag *pag, enum xfs_ag_resv_type type);
>  xfs_extlen_t xfs_ag_resv_needed(struct xfs_perag *pag,
> diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
> index 735a33252eb2..86c50208a143 100644
> --- a/fs/xfs/libxfs/xfs_ialloc_btree.c
> +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
> @@ -552,6 +552,7 @@ xfs_inobt_max_size(
>  static int
>  xfs_inobt_count_blocks(
>  	struct xfs_mount	*mp,
> +	struct xfs_trans	*tp,
>  	xfs_agnumber_t		agno,
>  	xfs_btnum_t		btnum,
>  	xfs_extlen_t		*tree_blocks)
> @@ -560,14 +561,14 @@ xfs_inobt_count_blocks(
>  	struct xfs_btree_cur	*cur;
>  	int			error;
>  
> -	error = xfs_ialloc_read_agi(mp, NULL, agno, &agbp);
> +	error = xfs_ialloc_read_agi(mp, tp, agno, &agbp);
>  	if (error)
>  		return error;
>  
> -	cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno, btnum);
> +	cur = xfs_inobt_init_cursor(mp, tp, agbp, agno, btnum);
>  	error = xfs_btree_count_blocks(cur, tree_blocks);
>  	xfs_btree_del_cursor(cur, error);
> -	xfs_buf_relse(agbp);
> +	xfs_trans_brelse(tp, agbp);
>  
>  	return error;
>  }
> @@ -578,6 +579,7 @@ xfs_inobt_count_blocks(
>  int
>  xfs_finobt_calc_reserves(
>  	struct xfs_mount	*mp,
> +	struct xfs_trans	*tp,
>  	xfs_agnumber_t		agno,
>  	xfs_extlen_t		*ask,
>  	xfs_extlen_t		*used)
> @@ -588,7 +590,7 @@ xfs_finobt_calc_reserves(
>  	if (!xfs_sb_version_hasfinobt(&mp->m_sb))
>  		return 0;
>  
> -	error = xfs_inobt_count_blocks(mp, agno, XFS_BTNUM_FINO, &tree_len);
> +	error = xfs_inobt_count_blocks(mp, tp, agno, XFS_BTNUM_FINO, &tree_len);
>  	if (error)
>  		return error;
>  
> diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.h b/fs/xfs/libxfs/xfs_ialloc_btree.h
> index bf8f0c405e7d..ebdd0c6b8766 100644
> --- a/fs/xfs/libxfs/xfs_ialloc_btree.h
> +++ b/fs/xfs/libxfs/xfs_ialloc_btree.h
> @@ -60,8 +60,8 @@ int xfs_inobt_rec_check_count(struct xfs_mount *,
>  #define xfs_inobt_rec_check_count(mp, rec)	0
>  #endif	/* DEBUG */
>  
> -int xfs_finobt_calc_reserves(struct xfs_mount *mp, xfs_agnumber_t agno,
> -		xfs_extlen_t *ask, xfs_extlen_t *used);
> +int xfs_finobt_calc_reserves(struct xfs_mount *mp, struct xfs_trans *tp,
> +		xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used);
>  extern xfs_extlen_t xfs_iallocbt_calc_size(struct xfs_mount *mp,
>  		unsigned long long len);
>  
> diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
> index b71937982c5b..bcd65ee37260 100644
> --- a/fs/xfs/libxfs/xfs_refcount_btree.c
> +++ b/fs/xfs/libxfs/xfs_refcount_btree.c
> @@ -408,6 +408,7 @@ xfs_refcountbt_max_size(
>  int
>  xfs_refcountbt_calc_reserves(
>  	struct xfs_mount	*mp,
> +	struct xfs_trans	*tp,
>  	xfs_agnumber_t		agno,
>  	xfs_extlen_t		*ask,
>  	xfs_extlen_t		*used)
> @@ -422,14 +423,14 @@ xfs_refcountbt_calc_reserves(
>  		return 0;
>  
>  
> -	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
> +	error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp);
>  	if (error)
>  		return error;
>  
>  	agf = XFS_BUF_TO_AGF(agbp);
>  	agblocks = be32_to_cpu(agf->agf_length);
>  	tree_len = be32_to_cpu(agf->agf_refcount_blocks);
> -	xfs_buf_relse(agbp);
> +	xfs_trans_brelse(tp, agbp);
>  
>  	*ask += xfs_refcountbt_max_size(mp, agblocks);
>  	*used += tree_len;
> diff --git a/fs/xfs/libxfs/xfs_refcount_btree.h b/fs/xfs/libxfs/xfs_refcount_btree.h
> index d2852b6e1fa8..c868394ac02e 100644
> --- a/fs/xfs/libxfs/xfs_refcount_btree.h
> +++ b/fs/xfs/libxfs/xfs_refcount_btree.h
> @@ -55,6 +55,7 @@ extern xfs_extlen_t xfs_refcountbt_max_size(struct xfs_mount *mp,
>  		xfs_agblock_t agblocks);
>  
>  extern int xfs_refcountbt_calc_reserves(struct xfs_mount *mp,
> -		xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used);
> +		struct xfs_trans *tp, xfs_agnumber_t agno, xfs_extlen_t *ask,
> +		xfs_extlen_t *used);
>  
>  #endif	/* __XFS_REFCOUNT_BTREE_H__ */
> diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
> index 221a88ea60bb..f79cf040d745 100644
> --- a/fs/xfs/libxfs/xfs_rmap_btree.c
> +++ b/fs/xfs/libxfs/xfs_rmap_btree.c
> @@ -554,6 +554,7 @@ xfs_rmapbt_max_size(
>  int
>  xfs_rmapbt_calc_reserves(
>  	struct xfs_mount	*mp,
> +	struct xfs_trans	*tp,
>  	xfs_agnumber_t		agno,
>  	xfs_extlen_t		*ask,
>  	xfs_extlen_t		*used)
> @@ -567,14 +568,14 @@ xfs_rmapbt_calc_reserves(
>  	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
>  		return 0;
>  
> -	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
> +	error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp);
>  	if (error)
>  		return error;
>  
>  	agf = XFS_BUF_TO_AGF(agbp);
>  	agblocks = be32_to_cpu(agf->agf_length);
>  	tree_len = be32_to_cpu(agf->agf_rmap_blocks);
> -	xfs_buf_relse(agbp);
> +	xfs_trans_brelse(tp, agbp);
>  
>  	/* Reserve 1% of the AG or enough for 1 block per record. */
>  	*ask += max(agblocks / 100, xfs_rmapbt_max_size(mp, agblocks));
> diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
> index 50198b6c3bb2..820d668b063d 100644
> --- a/fs/xfs/libxfs/xfs_rmap_btree.h
> +++ b/fs/xfs/libxfs/xfs_rmap_btree.h
> @@ -51,7 +51,7 @@ extern xfs_extlen_t xfs_rmapbt_calc_size(struct xfs_mount *mp,
>  extern xfs_extlen_t xfs_rmapbt_max_size(struct xfs_mount *mp,
>  		xfs_agblock_t agblocks);
>  
> -extern int xfs_rmapbt_calc_reserves(struct xfs_mount *mp,
> +extern int xfs_rmapbt_calc_reserves(struct xfs_mount *mp, struct xfs_trans *tp,
>  		xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used);
>  
>  #endif	/* __XFS_RMAP_BTREE_H__ */
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 3f2bd6032cf8..7c00b8bedfe3 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -536,7 +536,7 @@ xfs_fs_reserve_ag_blocks(
>  
>  	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
>  		pag = xfs_perag_get(mp, agno);
> -		err2 = xfs_ag_resv_init(pag);
> +		err2 = xfs_ag_resv_init(pag, NULL);
>  		xfs_perag_put(pag);
>  		if (err2 && !error)
>  			error = err2;
> 
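
As an aside on why the transaction pointer matters in this patch, here is a
toy userspace model; it is not XFS code.  Every name in it (toy_trans,
toy_buf, toy_read_buf, toy_trans_brelse) is invented for illustration.  It
assumes the behaviour the commit message relies on: a buffer that is already
locked to a transaction can be read again through that same transaction,
whereas a read with no transaction would have to wait for the lock, and a
release with a NULL transaction degrades to a plain release.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; none of these are real XFS types. */
struct toy_trans {
	int			nr_bufs;	/* buffers joined to this transaction */
};

struct toy_buf {
	int			locked;		/* 1 while the buffer lock is held */
	struct toy_trans	*owner;		/* transaction holding it, if any */
	int			recursion;	/* extra reads by the owner */
};

/*
 * Lock a buffer, optionally on behalf of a transaction.  Returns 0 on
 * success, or -1 where a real buffer lock would block (a self-deadlock if
 * the caller is the one already holding it through a transaction).
 */
static int
toy_read_buf(
	struct toy_trans	*tp,
	struct toy_buf		*bp)
{
	if (bp->locked) {
		/* Only the owning transaction may re-enter. */
		if (tp != NULL && bp->owner == tp) {
			bp->recursion++;
			return 0;
		}
		return -1;
	}
	bp->locked = 1;
	bp->owner = tp;
	if (tp != NULL)
		tp->nr_bufs++;
	return 0;
}

/* Release a buffer; with tp == NULL this is just a plain unlock. */
static void
toy_trans_brelse(
	struct toy_trans	*tp,
	struct toy_buf		*bp)
{
	assert(bp->locked && bp->owner == tp);

	if (tp == NULL) {
		bp->locked = 0;
		return;
	}
	if (bp->recursion > 0) {
		/* Nested read: the buffer stays locked to the transaction. */
		bp->recursion--;
		return;
	}
	bp->locked = 0;
	bp->owner = NULL;
	tp->nr_bufs--;
}
```

In this model, a repair transaction that has already locked the AGF can call
toy_read_buf(&tp, &agf) a second time and release it again without losing the
lock, while toy_read_buf(NULL, &agf) fails.  Callers with no transaction, as
xfs_fs_reserve_ag_blocks() does by passing NULL, take the plain lock/unlock
path.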
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 02/16] xfs: move the repair extent list into its own file
  2018-07-26  0:19 ` [PATCH 02/16] xfs: move the repair extent list into its own file Darrick J. Wong
@ 2018-07-27 14:21   ` Brian Foster
  0 siblings, 0 replies; 26+ messages in thread
From: Brian Foster @ 2018-07-27 14:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, allison.henderson

On Wed, Jul 25, 2018 at 05:19:39PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Move the xrep_extent_list code into a separate file.  Logically, this
> data structure is really just a clumsy bitmap, and in the next patch
> we'll make this more obvious.  No functional changes.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

I'm not terribly familiar with the existing code, but looks like a
straight move:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/Makefile       |    1 
>  fs/xfs/scrub/bitmap.c |  208 +++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/bitmap.h |   37 +++++++++
>  fs/xfs/scrub/repair.c |  194 ----------------------------------------------
>  fs/xfs/scrub/repair.h |   27 ------
>  5 files changed, 248 insertions(+), 219 deletions(-)
>  create mode 100644 fs/xfs/scrub/bitmap.c
>  create mode 100644 fs/xfs/scrub/bitmap.h
> 
> 
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index a36cccbec169..57ec46951ede 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -164,6 +164,7 @@ xfs-$(CONFIG_XFS_QUOTA)		+= scrub/quota.o
>  ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y)
>  xfs-y				+= $(addprefix scrub/, \
>  				   agheader_repair.o \
> +				   bitmap.o \
>  				   repair.o \
>  				   )
>  endif
> diff --git a/fs/xfs/scrub/bitmap.c b/fs/xfs/scrub/bitmap.c
> new file mode 100644
> index 000000000000..a7c2f4773f98
> --- /dev/null
> +++ b/fs/xfs/scrub/bitmap.c
> @@ -0,0 +1,208 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Copyright (C) 2018 Oracle.  All Rights Reserved.
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "scrub/xfs_scrub.h"
> +#include "scrub/scrub.h"
> +#include "scrub/common.h"
> +#include "scrub/trace.h"
> +#include "scrub/repair.h"
> +#include "scrub/bitmap.h"
> +
> +/* Collect a dead btree extent for later disposal. */
> +int
> +xrep_collect_btree_extent(
> +	struct xfs_scrub	*sc,
> +	struct xrep_extent_list	*exlist,
> +	xfs_fsblock_t		fsbno,
> +	xfs_extlen_t		len)
> +{
> +	struct xrep_extent	*rex;
> +
> +	trace_xrep_collect_btree_extent(sc->mp,
> +			XFS_FSB_TO_AGNO(sc->mp, fsbno),
> +			XFS_FSB_TO_AGBNO(sc->mp, fsbno), len);
> +
> +	rex = kmem_alloc(sizeof(struct xrep_extent), KM_MAYFAIL);
> +	if (!rex)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(&rex->list);
> +	rex->fsbno = fsbno;
> +	rex->len = len;
> +	list_add_tail(&rex->list, &exlist->list);
> +
> +	return 0;
> +}
> +
> +/*
> + * An error happened during the rebuild so the transaction will be cancelled.
> + * The fs will shut down, and the administrator has to unmount and run repair.
> + * Therefore, free all the memory associated with the list so we can die.
> + */
> +void
> +xrep_cancel_btree_extents(
> +	struct xfs_scrub	*sc,
> +	struct xrep_extent_list	*exlist)
> +{
> +	struct xrep_extent	*rex;
> +	struct xrep_extent	*n;
> +
> +	for_each_xrep_extent_safe(rex, n, exlist) {
> +		list_del(&rex->list);
> +		kmem_free(rex);
> +	}
> +}
> +
> +/* Compare two btree extents. */
> +static int
> +xrep_btree_extent_cmp(
> +	void			*priv,
> +	struct list_head	*a,
> +	struct list_head	*b)
> +{
> +	struct xrep_extent	*ap;
> +	struct xrep_extent	*bp;
> +
> +	ap = container_of(a, struct xrep_extent, list);
> +	bp = container_of(b, struct xrep_extent, list);
> +
> +	if (ap->fsbno > bp->fsbno)
> +		return 1;
> +	if (ap->fsbno < bp->fsbno)
> +		return -1;
> +	return 0;
> +}
> +
> +/*
> + * Remove all the blocks mentioned in @sublist from the extents in @exlist.
> + *
> + * The intent is that callers will iterate the rmapbt for all of its records
> + * for a given owner to generate @exlist; and iterate all the blocks of the
> + * metadata structures that are not being rebuilt and have the same rmapbt
> + * owner to generate @sublist.  This routine subtracts all the extents
> + * mentioned in sublist from all the extents linked in @exlist, which leaves
> + * @exlist as the list of blocks that are not accounted for, which we assume
> + * are the dead blocks of the old metadata structure.  The blocks mentioned in
> + * @exlist can be reaped.
> + */
> +#define LEFT_ALIGNED	(1 << 0)
> +#define RIGHT_ALIGNED	(1 << 1)
> +int
> +xrep_subtract_extents(
> +	struct xfs_scrub	*sc,
> +	struct xrep_extent_list	*exlist,
> +	struct xrep_extent_list	*sublist)
> +{
> +	struct list_head	*lp;
> +	struct xrep_extent	*ex;
> +	struct xrep_extent	*newex;
> +	struct xrep_extent	*subex;
> +	xfs_fsblock_t		sub_fsb;
> +	xfs_extlen_t		sub_len;
> +	int			state;
> +	int			error = 0;
> +
> +	if (list_empty(&exlist->list) || list_empty(&sublist->list))
> +		return 0;
> +	ASSERT(!list_empty(&sublist->list));
> +
> +	list_sort(NULL, &exlist->list, xrep_btree_extent_cmp);
> +	list_sort(NULL, &sublist->list, xrep_btree_extent_cmp);
> +
> +	/*
> +	 * Now that we've sorted both lists, we iterate exlist once, rolling
> +	 * forward through sublist and/or exlist as necessary until we find an
> +	 * overlap or reach the end of either list.  We do not reset lp to the
> +	 * head of exlist nor do we reset subex to the head of sublist.  The
> +	 * list traversal is similar to merge sort, but we're deleting
> +	 * instead.  In this manner we avoid O(n^2) operations.
> +	 */
> +	subex = list_first_entry(&sublist->list, struct xrep_extent,
> +			list);
> +	lp = exlist->list.next;
> +	while (lp != &exlist->list) {
> +		ex = list_entry(lp, struct xrep_extent, list);
> +
> +		/*
> +		 * Advance subex and/or ex until we find a pair that
> +		 * intersect or we run out of extents.
> +		 */
> +		while (subex->fsbno + subex->len <= ex->fsbno) {
> +			if (list_is_last(&subex->list, &sublist->list))
> +				goto out;
> +			subex = list_next_entry(subex, list);
> +		}
> +		if (subex->fsbno >= ex->fsbno + ex->len) {
> +			lp = lp->next;
> +			continue;
> +		}
> +
> +		/* trim subex to fit the extent we have */
> +		sub_fsb = subex->fsbno;
> +		sub_len = subex->len;
> +		if (subex->fsbno < ex->fsbno) {
> +			sub_len -= ex->fsbno - subex->fsbno;
> +			sub_fsb = ex->fsbno;
> +		}
> +		if (sub_len > ex->len)
> +			sub_len = ex->len;
> +
> +		state = 0;
> +		if (sub_fsb == ex->fsbno)
> +			state |= LEFT_ALIGNED;
> +		if (sub_fsb + sub_len == ex->fsbno + ex->len)
> +			state |= RIGHT_ALIGNED;
> +		switch (state) {
> +		case LEFT_ALIGNED:
> +			/* Coincides with only the left. */
> +			ex->fsbno += sub_len;
> +			ex->len -= sub_len;
> +			break;
> +		case RIGHT_ALIGNED:
> +			/* Coincides with only the right. */
> +			ex->len -= sub_len;
> +			lp = lp->next;
> +			break;
> +		case LEFT_ALIGNED | RIGHT_ALIGNED:
> +			/* Total overlap, just delete ex. */
> +			lp = lp->next;
> +			list_del(&ex->list);
> +			kmem_free(ex);
> +			break;
> +		case 0:
> +			/*
> +			 * Deleting from the middle: add the new right extent
> +			 * and then shrink the left extent.
> +			 */
> +			newex = kmem_alloc(sizeof(struct xrep_extent),
> +					KM_MAYFAIL);
> +			if (!newex) {
> +				error = -ENOMEM;
> +				goto out;
> +			}
> +			INIT_LIST_HEAD(&newex->list);
> +			newex->fsbno = sub_fsb + sub_len;
> +			newex->len = ex->fsbno + ex->len - newex->fsbno;
> +			list_add(&newex->list, &ex->list);
> +			ex->len = sub_fsb - ex->fsbno;
> +			lp = lp->next;
> +			break;
> +		default:
> +			ASSERT(0);
> +			break;
> +		}
> +	}
> +
> +out:
> +	return error;
> +}
> +#undef LEFT_ALIGNED
> +#undef RIGHT_ALIGNED
> diff --git a/fs/xfs/scrub/bitmap.h b/fs/xfs/scrub/bitmap.h
> new file mode 100644
> index 000000000000..1038157695a8
> --- /dev/null
> +++ b/fs/xfs/scrub/bitmap.h
> @@ -0,0 +1,37 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Copyright (C) 2018 Oracle.  All Rights Reserved.
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + */
> +#ifndef __XFS_SCRUB_BITMAP_H__
> +#define __XFS_SCRUB_BITMAP_H__
> +
> +struct xrep_extent {
> +	struct list_head	list;
> +	xfs_fsblock_t		fsbno;
> +	xfs_extlen_t		len;
> +};
> +
> +struct xrep_extent_list {
> +	struct list_head	list;
> +};
> +
> +static inline void
> +xrep_init_extent_list(
> +	struct xrep_extent_list		*exlist)
> +{
> +	INIT_LIST_HEAD(&exlist->list);
> +}
> +
> +#define for_each_xrep_extent_safe(rbe, n, exlist) \
> +	list_for_each_entry_safe((rbe), (n), &(exlist)->list, list)
> +int xrep_collect_btree_extent(struct xfs_scrub *sc,
> +		struct xrep_extent_list *btlist, xfs_fsblock_t fsbno,
> +		xfs_extlen_t len);
> +void xrep_cancel_btree_extents(struct xfs_scrub *sc,
> +		struct xrep_extent_list *btlist);
> +int xrep_subtract_extents(struct xfs_scrub *sc,
> +		struct xrep_extent_list *exlist,
> +		struct xrep_extent_list *sublist);
> +
> +#endif	/* __XFS_SCRUB_BITMAP_H__ */
> diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> index 5de1cac424ec..27a904ef6189 100644
> --- a/fs/xfs/scrub/repair.c
> +++ b/fs/xfs/scrub/repair.c
> @@ -34,6 +34,7 @@
>  #include "scrub/common.h"
>  #include "scrub/trace.h"
>  #include "scrub/repair.h"
> +#include "scrub/bitmap.h"
>  
>  /*
>   * Attempt to repair some metadata, if the metadata is corrupt and userspace
> @@ -380,200 +381,7 @@ xrep_init_btblock(
>   * sublist.  As with the other btrees we subtract sublist from exlist, and the
>   * result (since the rmapbt lives in the free space) are the blocks from the
>   * old rmapbt.
> - */
> -
> -/* Collect a dead btree extent for later disposal. */
> -int
> -xrep_collect_btree_extent(
> -	struct xfs_scrub	*sc,
> -	struct xrep_extent_list	*exlist,
> -	xfs_fsblock_t		fsbno,
> -	xfs_extlen_t		len)
> -{
> -	struct xrep_extent	*rex;
> -
> -	trace_xrep_collect_btree_extent(sc->mp,
> -			XFS_FSB_TO_AGNO(sc->mp, fsbno),
> -			XFS_FSB_TO_AGBNO(sc->mp, fsbno), len);
> -
> -	rex = kmem_alloc(sizeof(struct xrep_extent), KM_MAYFAIL);
> -	if (!rex)
> -		return -ENOMEM;
> -
> -	INIT_LIST_HEAD(&rex->list);
> -	rex->fsbno = fsbno;
> -	rex->len = len;
> -	list_add_tail(&rex->list, &exlist->list);
> -
> -	return 0;
> -}
> -
> -/*
> - * An error happened during the rebuild so the transaction will be cancelled.
> - * The fs will shut down, and the administrator has to unmount and run repair.
> - * Therefore, free all the memory associated with the list so we can die.
> - */
> -void
> -xrep_cancel_btree_extents(
> -	struct xfs_scrub	*sc,
> -	struct xrep_extent_list	*exlist)
> -{
> -	struct xrep_extent	*rex;
> -	struct xrep_extent	*n;
> -
> -	for_each_xrep_extent_safe(rex, n, exlist) {
> -		list_del(&rex->list);
> -		kmem_free(rex);
> -	}
> -}
> -
> -/* Compare two btree extents. */
> -static int
> -xrep_btree_extent_cmp(
> -	void			*priv,
> -	struct list_head	*a,
> -	struct list_head	*b)
> -{
> -	struct xrep_extent	*ap;
> -	struct xrep_extent	*bp;
> -
> -	ap = container_of(a, struct xrep_extent, list);
> -	bp = container_of(b, struct xrep_extent, list);
> -
> -	if (ap->fsbno > bp->fsbno)
> -		return 1;
> -	if (ap->fsbno < bp->fsbno)
> -		return -1;
> -	return 0;
> -}
> -
> -/*
> - * Remove all the blocks mentioned in @sublist from the extents in @exlist.
>   *
> - * The intent is that callers will iterate the rmapbt for all of its records
> - * for a given owner to generate @exlist; and iterate all the blocks of the
> - * metadata structures that are not being rebuilt and have the same rmapbt
> - * owner to generate @sublist.  This routine subtracts all the extents
> - * mentioned in sublist from all the extents linked in @exlist, which leaves
> - * @exlist as the list of blocks that are not accounted for, which we assume
> - * are the dead blocks of the old metadata structure.  The blocks mentioned in
> - * @exlist can be reaped.
> - */
> -#define LEFT_ALIGNED	(1 << 0)
> -#define RIGHT_ALIGNED	(1 << 1)
> -int
> -xrep_subtract_extents(
> -	struct xfs_scrub	*sc,
> -	struct xrep_extent_list	*exlist,
> -	struct xrep_extent_list	*sublist)
> -{
> -	struct list_head	*lp;
> -	struct xrep_extent	*ex;
> -	struct xrep_extent	*newex;
> -	struct xrep_extent	*subex;
> -	xfs_fsblock_t		sub_fsb;
> -	xfs_extlen_t		sub_len;
> -	int			state;
> -	int			error = 0;
> -
> -	if (list_empty(&exlist->list) || list_empty(&sublist->list))
> -		return 0;
> -	ASSERT(!list_empty(&sublist->list));
> -
> -	list_sort(NULL, &exlist->list, xrep_btree_extent_cmp);
> -	list_sort(NULL, &sublist->list, xrep_btree_extent_cmp);
> -
> -	/*
> -	 * Now that we've sorted both lists, we iterate exlist once, rolling
> -	 * forward through sublist and/or exlist as necessary until we find an
> -	 * overlap or reach the end of either list.  We do not reset lp to the
> -	 * head of exlist nor do we reset subex to the head of sublist.  The
> -	 * list traversal is similar to merge sort, but we're deleting
> -	 * instead.  In this manner we avoid O(n^2) operations.
> -	 */
> -	subex = list_first_entry(&sublist->list, struct xrep_extent,
> -			list);
> -	lp = exlist->list.next;
> -	while (lp != &exlist->list) {
> -		ex = list_entry(lp, struct xrep_extent, list);
> -
> -		/*
> -		 * Advance subex and/or ex until we find a pair that
> -		 * intersect or we run out of extents.
> -		 */
> -		while (subex->fsbno + subex->len <= ex->fsbno) {
> -			if (list_is_last(&subex->list, &sublist->list))
> -				goto out;
> -			subex = list_next_entry(subex, list);
> -		}
> -		if (subex->fsbno >= ex->fsbno + ex->len) {
> -			lp = lp->next;
> -			continue;
> -		}
> -
> -		/* trim subex to fit the extent we have */
> -		sub_fsb = subex->fsbno;
> -		sub_len = subex->len;
> -		if (subex->fsbno < ex->fsbno) {
> -			sub_len -= ex->fsbno - subex->fsbno;
> -			sub_fsb = ex->fsbno;
> -		}
> -		if (sub_len > ex->len)
> -			sub_len = ex->len;
> -
> -		state = 0;
> -		if (sub_fsb == ex->fsbno)
> -			state |= LEFT_ALIGNED;
> -		if (sub_fsb + sub_len == ex->fsbno + ex->len)
> -			state |= RIGHT_ALIGNED;
> -		switch (state) {
> -		case LEFT_ALIGNED:
> -			/* Coincides with only the left. */
> -			ex->fsbno += sub_len;
> -			ex->len -= sub_len;
> -			break;
> -		case RIGHT_ALIGNED:
> -			/* Coincides with only the right. */
> -			ex->len -= sub_len;
> -			lp = lp->next;
> -			break;
> -		case LEFT_ALIGNED | RIGHT_ALIGNED:
> -			/* Total overlap, just delete ex. */
> -			lp = lp->next;
> -			list_del(&ex->list);
> -			kmem_free(ex);
> -			break;
> -		case 0:
> -			/*
> -			 * Deleting from the middle: add the new right extent
> -			 * and then shrink the left extent.
> -			 */
> -			newex = kmem_alloc(sizeof(struct xrep_extent),
> -					KM_MAYFAIL);
> -			if (!newex) {
> -				error = -ENOMEM;
> -				goto out;
> -			}
> -			INIT_LIST_HEAD(&newex->list);
> -			newex->fsbno = sub_fsb + sub_len;
> -			newex->len = ex->fsbno + ex->len - newex->fsbno;
> -			list_add(&newex->list, &ex->list);
> -			ex->len = sub_fsb - ex->fsbno;
> -			lp = lp->next;
> -			break;
> -		default:
> -			ASSERT(0);
> -			break;
> -		}
> -	}
> -
> -out:
> -	return error;
> -}
> -#undef LEFT_ALIGNED
> -#undef RIGHT_ALIGNED
> -
> -/*
>   * Disposal of Blocks from Old per-AG Btrees
>   *
>   * Now that we've constructed a new btree to replace the damaged one, we want
> diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
> index 91355f6b0087..a3d491a438f4 100644
> --- a/fs/xfs/scrub/repair.h
> +++ b/fs/xfs/scrub/repair.h
> @@ -27,33 +27,8 @@ int xrep_init_btblock(struct xfs_scrub *sc, xfs_fsblock_t fsb,
>  		struct xfs_buf **bpp, xfs_btnum_t btnum,
>  		const struct xfs_buf_ops *ops);
>  
> -struct xrep_extent {
> -	struct list_head	list;
> -	xfs_fsblock_t		fsbno;
> -	xfs_extlen_t		len;
> -};
> -
> -struct xrep_extent_list {
> -	struct list_head	list;
> -};
> -
> -static inline void
> -xrep_init_extent_list(
> -	struct xrep_extent_list	*exlist)
> -{
> -	INIT_LIST_HEAD(&exlist->list);
> -}
> +struct xrep_extent_list;
>  
> -#define for_each_xrep_extent_safe(rbe, n, exlist) \
> -	list_for_each_entry_safe((rbe), (n), &(exlist)->list, list)
> -int xrep_collect_btree_extent(struct xfs_scrub *sc,
> -		struct xrep_extent_list *btlist, xfs_fsblock_t fsbno,
> -		xfs_extlen_t len);
> -void xrep_cancel_btree_extents(struct xfs_scrub *sc,
> -		struct xrep_extent_list *btlist);
> -int xrep_subtract_extents(struct xfs_scrub *sc,
> -		struct xrep_extent_list *exlist,
> -		struct xrep_extent_list *sublist);
>  int xrep_fix_freelist(struct xfs_scrub *sc, bool can_shrink);
>  int xrep_invalidate_blocks(struct xfs_scrub *sc,
>  		struct xrep_extent_list *btlist);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 03/16] xfs: refactor the xrep_extent_list into xfs_bitmap
  2018-07-26  0:19 ` [PATCH 03/16] xfs: refactor the xrep_extent_list into xfs_bitmap Darrick J. Wong
@ 2018-07-27 14:21   ` Brian Foster
  2018-07-27 15:52     ` Darrick J. Wong
  0 siblings, 1 reply; 26+ messages in thread
From: Brian Foster @ 2018-07-27 14:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, allison.henderson

On Wed, Jul 25, 2018 at 05:19:48PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> As mentioned previously, the xrep_extent_list basically implements a
> bitmap with two functions: set and disjoint union.  Rename all these
> functions to xfs_bitmap to shorten the name and make it more obvious
> what we're doing.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/scrub/bitmap.c |  173 +++++++++++++++++++++++++------------------------
>  fs/xfs/scrub/bitmap.h |   34 ++++------
>  fs/xfs/scrub/repair.c |   85 +++++++++++-------------
>  fs/xfs/scrub/repair.h |    8 +-
>  fs/xfs/scrub/trace.h  |    1 
>  5 files changed, 144 insertions(+), 157 deletions(-)
> 
> 
> diff --git a/fs/xfs/scrub/bitmap.c b/fs/xfs/scrub/bitmap.c
> index a7c2f4773f98..4840f5a1e179 100644
> --- a/fs/xfs/scrub/bitmap.c
> +++ b/fs/xfs/scrub/bitmap.c
...
> @@ -82,117 +84,118 @@ xrep_btree_extent_cmp(
>  }
>  
>  /*
> - * Remove all the blocks mentioned in @sublist from the extents in @exlist.
> + * Remove all the blocks mentioned in @sub from the extents in @bitmap.
>   *
>   * The intent is that callers will iterate the rmapbt for all of its records
> - * for a given owner to generate @exlist; and iterate all the blocks of the
> + * for a given owner to generate @bitmap; and iterate all the blocks of the
>   * metadata structures that are not being rebuilt and have the same rmapbt
> - * owner to generate @sublist.  This routine subtracts all the extents
> - * mentioned in sublist from all the extents linked in @exlist, which leaves
> - * @exlist as the list of blocks that are not accounted for, which we assume
> + * owner to generate @sub.  This routine subtracts all the extents
> + * mentioned in sub from all the extents linked in @bitmap, which leaves
> + * @bitmap as the list of blocks that are not accounted for, which we assume
>   * are the dead blocks of the old metadata structure.  The blocks mentioned in
> - * @exlist can be reaped.
> + * @bitmap can be reaped.
> + *
> + * This is the logical equivalent of bitmap &= ~sub.
>   */
>  #define LEFT_ALIGNED	(1 << 0)
>  #define RIGHT_ALIGNED	(1 << 1)
>  int
> -xrep_subtract_extents(
> -	struct xfs_scrub	*sc,
> -	struct xrep_extent_list	*exlist,
> -	struct xrep_extent_list	*sublist)
> +xfs_bitmap_disunion(
> +	struct xfs_bitmap	*bitmap,
> +	struct xfs_bitmap	*sub)
>  {
>  	struct list_head	*lp;
> -	struct xrep_extent	*ex;
> -	struct xrep_extent	*newex;
> -	struct xrep_extent	*subex;
> +	struct xfs_bitmap_range	*br;
> +	struct xfs_bitmap_range	*new_br;
> +	struct xfs_bitmap_range	*sub_br;
>  	xfs_fsblock_t		sub_fsb;
> -	xfs_extlen_t		sub_len;
> +	xfs_fsblock_t		sub_len;
>  	int			state;
>  	int			error = 0;
>  
> -	if (list_empty(&exlist->list) || list_empty(&sublist->list))
> +	if (list_empty(&bitmap->list) || list_empty(&sub->list))
>  		return 0;
> -	ASSERT(!list_empty(&sublist->list));
> +	ASSERT(!list_empty(&sub->list));
>  
> -	list_sort(NULL, &exlist->list, xrep_btree_extent_cmp);
> -	list_sort(NULL, &sublist->list, xrep_btree_extent_cmp);
> +	list_sort(NULL, &bitmap->list, xrep_btree_extent_cmp);
> +	list_sort(NULL, &sub->list, xrep_btree_extent_cmp);

Still a couple of xrep_ function names here (xrep_btree_extent_cmp in both
list_sort calls). I guess I'm not clear on how generic this is intended to
be..? FWIW, the ->fsbno field name in xfs_bitmap_range stood out a bit to me
as well (as opposed to something more generic like ->key or whatever).

Brian
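
For readers following along, the subtraction that the quoted comment
describes (bitmap &= ~sub) can be sketched in userspace C.  This is only an
illustration under stated assumptions, not the patch's implementation: it
uses plain arrays instead of list_head, and struct range and
range_subtract() are hypothetical names.  Both inputs are assumed to be
sorted by start and free of internal overlaps, which is what the list_sort
calls above arrange for the sort order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct xfs_bitmap_range. */
struct range {
	uint64_t	start;
	uint64_t	len;
};

/*
 * Subtract the ranges in @sub from the ranges in @in and write whatever is
 * left to @out.  Both inputs must be sorted by start and must not overlap
 * internally.  @out needs room for n_in + n_sub entries.  Returns the number
 * of ranges written.
 */
static size_t
range_subtract(
	const struct range	*in,
	size_t			n_in,
	const struct range	*sub,
	size_t			n_sub,
	struct range		*out)
{
	size_t			i, k, j = 0, n_out = 0;

	assert(out != NULL);

	for (i = 0; i < n_in; i++) {
		uint64_t	pos = in[i].start;
		uint64_t	end = in[i].start + in[i].len;

		/* Skip sub ranges that end before this range; never rewind. */
		while (j < n_sub && sub[j].start + sub[j].len <= pos)
			j++;

		/* Carve out every sub range that overlaps [pos, end). */
		for (k = j; k < n_sub && sub[k].start < end; k++) {
			if (sub[k].start > pos) {
				/* Keep the piece to the left of the hole. */
				out[n_out].start = pos;
				out[n_out].len = sub[k].start - pos;
				n_out++;
			}
			if (sub[k].start + sub[k].len > pos)
				pos = sub[k].start + sub[k].len;
		}

		/* Keep whatever survives to the right of the last hole. */
		if (pos < end) {
			out[n_out].start = pos;
			out[n_out].len = end - pos;
			n_out++;
		}
	}
	return n_out;
}
```

Subtracting {12,3} and {18,14} from {10,10} and {30,5} leaves {10,2},
{15,3} and {32,3}; the second sub range straddles both input ranges, which
is why j is only advanced past sub ranges that are completely behind us.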

>  
>  	/*
> -	 * Now that we've sorted both lists, we iterate exlist once, rolling
> -	 * forward through sublist and/or exlist as necessary until we find an
> +	 * Now that we've sorted both lists, we iterate bitmap once, rolling
> +	 * forward through sub and/or bitmap as necessary until we find an
>  	 * overlap or reach the end of either list.  We do not reset lp to the
> -	 * head of exlist nor do we reset subex to the head of sublist.  The
> +	 * head of bitmap nor do we reset sub_br to the head of sub.  The
>  	 * list traversal is similar to merge sort, but we're deleting
>  	 * instead.  In this manner we avoid O(n^2) operations.
>  	 */
> -	subex = list_first_entry(&sublist->list, struct xrep_extent,
> +	sub_br = list_first_entry(&sub->list, struct xfs_bitmap_range,
>  			list);
> -	lp = exlist->list.next;
> -	while (lp != &exlist->list) {
> -		ex = list_entry(lp, struct xrep_extent, list);
> +	lp = bitmap->list.next;
> +	while (lp != &bitmap->list) {
> +		br = list_entry(lp, struct xfs_bitmap_range, list);
>  
>  		/*
> -		 * Advance subex and/or ex until we find a pair that
> +		 * Advance sub_br and/or br until we find a pair that
>  		 * intersect or we run out of extents.
>  		 */
> -		while (subex->fsbno + subex->len <= ex->fsbno) {
> -			if (list_is_last(&subex->list, &sublist->list))
> +		while (sub_br->fsbno + sub_br->len <= br->fsbno) {
> +			if (list_is_last(&sub_br->list, &sub->list))
>  				goto out;
> -			subex = list_next_entry(subex, list);
> +			sub_br = list_next_entry(sub_br, list);
>  		}
> -		if (subex->fsbno >= ex->fsbno + ex->len) {
> +		if (sub_br->fsbno >= br->fsbno + br->len) {
>  			lp = lp->next;
>  			continue;
>  		}
>  
> -		/* trim subex to fit the extent we have */
> -		sub_fsb = subex->fsbno;
> -		sub_len = subex->len;
> -		if (subex->fsbno < ex->fsbno) {
> -			sub_len -= ex->fsbno - subex->fsbno;
> -			sub_fsb = ex->fsbno;
> +		/* trim sub_br to fit the extent we have */
> +		sub_fsb = sub_br->fsbno;
> +		sub_len = sub_br->len;
> +		if (sub_br->fsbno < br->fsbno) {
> +			sub_len -= br->fsbno - sub_br->fsbno;
> +			sub_fsb = br->fsbno;
>  		}
> -		if (sub_len > ex->len)
> -			sub_len = ex->len;
> +		if (sub_len > br->len)
> +			sub_len = br->len;
>  
>  		state = 0;
> -		if (sub_fsb == ex->fsbno)
> +		if (sub_fsb == br->fsbno)
>  			state |= LEFT_ALIGNED;
> -		if (sub_fsb + sub_len == ex->fsbno + ex->len)
> +		if (sub_fsb + sub_len == br->fsbno + br->len)
>  			state |= RIGHT_ALIGNED;
>  		switch (state) {
>  		case LEFT_ALIGNED:
>  			/* Coincides with only the left. */
> -			ex->fsbno += sub_len;
> -			ex->len -= sub_len;
> +			br->fsbno += sub_len;
> +			br->len -= sub_len;
>  			break;
>  		case RIGHT_ALIGNED:
>  			/* Coincides with only the right. */
> -			ex->len -= sub_len;
> +			br->len -= sub_len;
>  			lp = lp->next;
>  			break;
>  		case LEFT_ALIGNED | RIGHT_ALIGNED:
>  			/* Total overlap, just delete ex. */
>  			lp = lp->next;
> -			list_del(&ex->list);
> -			kmem_free(ex);
> +			list_del(&br->list);
> +			kmem_free(br);
>  			break;
>  		case 0:
>  			/*
>  			 * Deleting from the middle: add the new right extent
>  			 * and then shrink the left extent.
>  			 */
> -			newex = kmem_alloc(sizeof(struct xrep_extent),
> +			new_br = kmem_alloc(sizeof(struct xfs_bitmap_range),
>  					KM_MAYFAIL);
> -			if (!newex) {
> +			if (!new_br) {
>  				error = -ENOMEM;
>  				goto out;
>  			}
> -			INIT_LIST_HEAD(&newex->list);
> -			newex->fsbno = sub_fsb + sub_len;
> -			newex->len = ex->fsbno + ex->len - newex->fsbno;
> -			list_add(&newex->list, &ex->list);
> -			ex->len = sub_fsb - ex->fsbno;
> +			INIT_LIST_HEAD(&new_br->list);
> +			new_br->fsbno = sub_fsb + sub_len;
> +			new_br->len = br->fsbno + br->len - new_br->fsbno;
> +			list_add(&new_br->list, &br->list);
> +			br->len = sub_fsb - br->fsbno;
>  			lp = lp->next;
>  			break;
>  		default:
> diff --git a/fs/xfs/scrub/bitmap.h b/fs/xfs/scrub/bitmap.h
> index 1038157695a8..3c39900e9269 100644
> --- a/fs/xfs/scrub/bitmap.h
> +++ b/fs/xfs/scrub/bitmap.h
> @@ -6,32 +6,28 @@
>  #ifndef __XFS_SCRUB_BITMAP_H__
>  #define __XFS_SCRUB_BITMAP_H__
>  
> -struct xrep_extent {
> +struct xfs_bitmap_range {
>  	struct list_head	list;
>  	xfs_fsblock_t		fsbno;
> -	xfs_extlen_t		len;
> +	xfs_fsblock_t		len;
>  };
>  
> -struct xrep_extent_list {
> +struct xfs_bitmap {
>  	struct list_head	list;
>  };
>  
> -static inline void
> -xrep_init_extent_list(
> -	struct xrep_extent_list		*exlist)
> -{
> -	INIT_LIST_HEAD(&exlist->list);
> -}
> +void xfs_bitmap_init(struct xfs_bitmap *bitmap);
> +void xfs_bitmap_destroy(struct xfs_bitmap *bitmap);
>  
> -#define for_each_xrep_extent_safe(rbe, n, exlist) \
> -	list_for_each_entry_safe((rbe), (n), &(exlist)->list, list)
> -int xrep_collect_btree_extent(struct xfs_scrub *sc,
> -		struct xrep_extent_list *btlist, xfs_fsblock_t fsbno,
> -		xfs_extlen_t len);
> -void xrep_cancel_btree_extents(struct xfs_scrub *sc,
> -		struct xrep_extent_list *btlist);
> -int xrep_subtract_extents(struct xfs_scrub *sc,
> -		struct xrep_extent_list *exlist,
> -		struct xrep_extent_list *sublist);
> +#define for_each_xfs_bitmap_extent(bex, n, bitmap) \
> +	list_for_each_entry_safe((bex), (n), &(bitmap)->list, list)
> +
> +#define for_each_xfs_bitmap_block(fsbno, bex, n, bitmap) \
> +	list_for_each_entry_safe((bex), (n), &(bitmap)->list, list) \
> +		for (fsbno = bex->fsbno; fsbno < bex->fsbno + bex->len; fsbno++)
> +
> +int xfs_bitmap_set(struct xfs_bitmap *bitmap, xfs_fsblock_t fsbno,
> +		xfs_fsblock_t len);
> +int xfs_bitmap_disunion(struct xfs_bitmap *bitmap, struct xfs_bitmap *sub);
>  
>  #endif	/* __XFS_SCRUB_BITMAP_H__ */
> diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> index 27a904ef6189..85b048b341a0 100644
> --- a/fs/xfs/scrub/repair.c
> +++ b/fs/xfs/scrub/repair.c
> @@ -368,17 +368,17 @@ xrep_init_btblock(
>   *
>   * However, that leaves the matter of removing all the metadata describing the
>   * old broken structure.  For primary metadata we use the rmap data to collect
> - * every extent with a matching rmap owner (exlist); we then iterate all other
> + * every extent with a matching rmap owner (bitmap); we then iterate all other
>   * metadata structures with the same rmap owner to collect the extents that
> - * cannot be removed (sublist).  We then subtract sublist from exlist to
> + * cannot be removed (sublist).  We then subtract sublist from bitmap to
>   * derive the blocks that were used by the old btree.  These blocks can be
>   * reaped.
>   *
>   * For rmapbt reconstructions we must use different tactics for extent
>   * collection.  First we iterate all primary metadata (this excludes the old
>   * rmapbt, obviously) to generate new rmap records.  The gaps in the rmap
> - * records are collected as exlist.  The bnobt records are collected as
> - * sublist.  As with the other btrees we subtract sublist from exlist, and the
> + * records are collected as bitmap.  The bnobt records are collected as
> + * sublist.  As with the other btrees we subtract sublist from bitmap, and the
>   * result (since the rmapbt lives in the free space) are the blocks from the
>   * old rmapbt.
>   *
> @@ -386,11 +386,11 @@ xrep_init_btblock(
>   *
>   * Now that we've constructed a new btree to replace the damaged one, we want
>   * to dispose of the blocks that (we think) the old btree was using.
> - * Previously, we used the rmapbt to collect the extents (exlist) with the
> + * Previously, we used the rmapbt to collect the extents (bitmap) with the
>   * rmap owner corresponding to the tree we rebuilt, collected extents for any
>   * blocks with the same rmap owner that are owned by another data structure
> - * (sublist), and subtracted sublist from exlist.  In theory the extents
> - * remaining in exlist are the old btree's blocks.
> + * (sublist), and subtracted sublist from bitmap.  In theory the extents
> + * remaining in bitmap are the old btree's blocks.
>   *
>   * Unfortunately, it's possible that the btree was crosslinked with other
>   * blocks on disk.  The rmap data can tell us if there are multiple owners, so
> @@ -406,7 +406,7 @@ xrep_init_btblock(
>   * If there are no rmap records at all, we also free the block.  If the btree
>   * being rebuilt lives in the free space (bnobt/cntbt/rmapbt) then there isn't
>   * supposed to be a rmap record and everything is ok.  For other btrees there
> - * had to have been an rmap entry for the block to have ended up on @exlist,
> + * had to have been an rmap entry for the block to have ended up on @bitmap,
>   * so if it's gone now there's something wrong and the fs will shut down.
>   *
>   * Note: If there are multiple rmap records with only the same rmap owner as
> @@ -419,7 +419,7 @@ xrep_init_btblock(
>   * The caller is responsible for locking the AG headers for the entire rebuild
>   * operation so that nothing else can sneak in and change the AG state while
>   * we're not looking.  We also assume that the caller already invalidated any
> - * buffers associated with @exlist.
> + * buffers associated with @bitmap.
>   */
>  
>  /*
> @@ -429,13 +429,12 @@ xrep_init_btblock(
>  int
>  xrep_invalidate_blocks(
>  	struct xfs_scrub	*sc,
> -	struct xrep_extent_list	*exlist)
> +	struct xfs_bitmap	*bitmap)
>  {
> -	struct xrep_extent	*rex;
> -	struct xrep_extent	*n;
> +	struct xfs_bitmap_range	*bmr;
> +	struct xfs_bitmap_range	*n;
>  	struct xfs_buf		*bp;
>  	xfs_fsblock_t		fsbno;
> -	xfs_agblock_t		i;
>  
>  	/*
>  	 * For each block in each extent, see if there's an incore buffer for
> @@ -445,18 +444,16 @@ xrep_invalidate_blocks(
>  	 * because we never own those; and if we can't TRYLOCK the buffer we
>  	 * assume it's owned by someone else.
>  	 */
> -	for_each_xrep_extent_safe(rex, n, exlist) {
> -		for (fsbno = rex->fsbno, i = rex->len; i > 0; fsbno++, i--) {
> -			/* Skip AG headers and post-EOFS blocks */
> -			if (!xfs_verify_fsbno(sc->mp, fsbno))
> -				continue;
> -			bp = xfs_buf_incore(sc->mp->m_ddev_targp,
> -					XFS_FSB_TO_DADDR(sc->mp, fsbno),
> -					XFS_FSB_TO_BB(sc->mp, 1), XBF_TRYLOCK);
> -			if (bp) {
> -				xfs_trans_bjoin(sc->tp, bp);
> -				xfs_trans_binval(sc->tp, bp);
> -			}
> +	for_each_xfs_bitmap_block(fsbno, bmr, n, bitmap) {
> +		/* Skip AG headers and post-EOFS blocks */
> +		if (!xfs_verify_fsbno(sc->mp, fsbno))
> +			continue;
> +		bp = xfs_buf_incore(sc->mp->m_ddev_targp,
> +				XFS_FSB_TO_DADDR(sc->mp, fsbno),
> +				XFS_FSB_TO_BB(sc->mp, 1), XBF_TRYLOCK);
> +		if (bp) {
> +			xfs_trans_bjoin(sc->tp, bp);
> +			xfs_trans_binval(sc->tp, bp);
>  		}
>  	}
>  
> @@ -519,9 +516,9 @@ xrep_put_freelist(
>  	return 0;
>  }
>  
> -/* Dispose of a single metadata block. */
> +/* Dispose of a single block. */
>  STATIC int
> -xrep_dispose_btree_block(
> +xrep_reap_block(
>  	struct xfs_scrub	*sc,
>  	xfs_fsblock_t		fsbno,
>  	struct xfs_owner_info	*oinfo,
> @@ -593,41 +590,35 @@ xrep_dispose_btree_block(
>  	return error;
>  }
>  
> -/* Dispose of btree blocks from an old per-AG btree. */
> +/* Dispose of every block of every extent in the bitmap. */
>  int
> -xrep_reap_btree_extents(
> +xrep_reap_extents(
>  	struct xfs_scrub	*sc,
> -	struct xrep_extent_list	*exlist,
> +	struct xfs_bitmap	*bitmap,
>  	struct xfs_owner_info	*oinfo,
>  	enum xfs_ag_resv_type	type)
>  {
> -	struct xrep_extent	*rex;
> -	struct xrep_extent	*n;
> +	struct xfs_bitmap_range	*bmr;
> +	struct xfs_bitmap_range	*n;
> +	xfs_fsblock_t		fsbno;
>  	int			error = 0;
>  
>  	ASSERT(xfs_sb_version_hasrmapbt(&sc->mp->m_sb));
>  
> -	/* Dispose of every block from the old btree. */
> -	for_each_xrep_extent_safe(rex, n, exlist) {
> +	for_each_xfs_bitmap_block(fsbno, bmr, n, bitmap) {
>  		ASSERT(sc->ip != NULL ||
> -		       XFS_FSB_TO_AGNO(sc->mp, rex->fsbno) == sc->sa.agno);
> -
> +		       XFS_FSB_TO_AGNO(sc->mp, fsbno) == sc->sa.agno);
>  		trace_xrep_dispose_btree_extent(sc->mp,
> -				XFS_FSB_TO_AGNO(sc->mp, rex->fsbno),
> -				XFS_FSB_TO_AGBNO(sc->mp, rex->fsbno), rex->len);
> +				XFS_FSB_TO_AGNO(sc->mp, fsbno),
> +				XFS_FSB_TO_AGBNO(sc->mp, fsbno), 1);
>  
> -		for (; rex->len > 0; rex->len--, rex->fsbno++) {
> -			error = xrep_dispose_btree_block(sc, rex->fsbno,
> -					oinfo, type);
> -			if (error)
> -				goto out;
> -		}
> -		list_del(&rex->list);
> -		kmem_free(rex);
> +		error = xrep_reap_block(sc, fsbno, oinfo, type);
> +		if (error)
> +			goto out;
>  	}
>  
>  out:
> -	xrep_cancel_btree_extents(sc, exlist);
> +	xfs_bitmap_destroy(bitmap);
>  	return error;
>  }
>  
> diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
> index a3d491a438f4..5a4e92221916 100644
> --- a/fs/xfs/scrub/repair.h
> +++ b/fs/xfs/scrub/repair.h
> @@ -27,13 +27,11 @@ int xrep_init_btblock(struct xfs_scrub *sc, xfs_fsblock_t fsb,
>  		struct xfs_buf **bpp, xfs_btnum_t btnum,
>  		const struct xfs_buf_ops *ops);
>  
> -struct xrep_extent_list;
> +struct xfs_bitmap;
>  
>  int xrep_fix_freelist(struct xfs_scrub *sc, bool can_shrink);
> -int xrep_invalidate_blocks(struct xfs_scrub *sc,
> -		struct xrep_extent_list *btlist);
> -int xrep_reap_btree_extents(struct xfs_scrub *sc,
> -		struct xrep_extent_list *exlist,
> +int xrep_invalidate_blocks(struct xfs_scrub *sc, struct xfs_bitmap *btlist);
> +int xrep_reap_extents(struct xfs_scrub *sc, struct xfs_bitmap *exlist,
>  		struct xfs_owner_info *oinfo, enum xfs_ag_resv_type type);
>  
>  struct xrep_find_ag_btree {
> diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
> index 93db22c39b51..4e20f0e48232 100644
> --- a/fs/xfs/scrub/trace.h
> +++ b/fs/xfs/scrub/trace.h
> @@ -511,7 +511,6 @@ DEFINE_EVENT(xrep_extent_class, name, \
>  		 xfs_agblock_t agbno, xfs_extlen_t len), \
>  	TP_ARGS(mp, agno, agbno, len))
>  DEFINE_REPAIR_EXTENT_EVENT(xrep_dispose_btree_extent);
> -DEFINE_REPAIR_EXTENT_EVENT(xrep_collect_btree_extent);
>  DEFINE_REPAIR_EXTENT_EVENT(xrep_agfl_insert);
>  
>  DECLARE_EVENT_CLASS(xrep_rmap_class,
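
One property of the for_each_xfs_bitmap_block() helper added to bitmap.h
above: it expands to two nested loops, so the loop body sits inside the
inner (per-block) loop.  The userspace sketch below shows the shape with
plain arrays; for_each_block(), struct range and count_blocks_upto() are
hypothetical names, not part of the patch.  It leaves the walk early with
goto, as xrep_reap_extents does, since a bare break would only leave the
inner loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct xfs_bitmap_range. */
struct range {
	uint64_t	start;
	uint64_t	len;
};

/* Visit every block of every range; the body runs inside the inner loop. */
#define for_each_block(blk, r, ranges, n) \
	for ((r) = (ranges); (r) < (ranges) + (n); (r)++) \
		for ((blk) = (r)->start; (blk) < (r)->start + (r)->len; (blk)++)

/* Count blocks, stopping early once @limit blocks have been seen. */
static uint64_t
count_blocks_upto(
	const struct range	*ranges,
	size_t			n,
	uint64_t		limit)
{
	const struct range	*r;
	uint64_t		blk;
	uint64_t		seen = 0;

	assert(ranges != NULL || n == 0);

	for_each_block(blk, r, ranges, n) {
		if (seen == limit)
			goto out;	/* break would only exit the inner loop */
		seen++;
	}
out:
	return seen;
}
```

With ranges {5,3} and {20,2} the walk visits blocks 5, 6, 7, 20 and 21 in
that order.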

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 04/16] xfs: repair the AGF
  2018-07-26  0:19 ` [PATCH 04/16] xfs: repair the AGF Darrick J. Wong
@ 2018-07-27 14:23   ` Brian Foster
  2018-07-27 16:02     ` Darrick J. Wong
  0 siblings, 1 reply; 26+ messages in thread
From: Brian Foster @ 2018-07-27 14:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, allison.henderson

On Wed, Jul 25, 2018 at 05:19:55PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Regenerate the AGF from the rmap data.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Mostly seems sane to me. I still need to come up to speed on the broader
xfs_scrub context. A few comments in the meantime..

>  fs/xfs/scrub/agheader_repair.c |  366 ++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/repair.c          |   27 ++-
>  fs/xfs/scrub/repair.h          |    4 
>  fs/xfs/scrub/scrub.c           |    2 
>  4 files changed, 389 insertions(+), 10 deletions(-)
> 
> 
> diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
> index 1e96621ece3a..938af216cb1c 100644
> --- a/fs/xfs/scrub/agheader_repair.c
> +++ b/fs/xfs/scrub/agheader_repair.c
...
> @@ -54,3 +61,362 @@ xrep_superblock(
>  	xfs_trans_log_buf(sc->tp, bp, 0, BBTOB(bp->b_length) - 1);
>  	return error;
>  }
...
> +/* Update all AGF fields which derive from btree contents. */
> +STATIC int
> +xrep_agf_calc_from_btrees(
> +	struct xfs_scrub	*sc,
> +	struct xfs_buf		*agf_bp)
> +{
> +	struct xrep_agf_allocbt	raa = { .sc = sc };
> +	struct xfs_btree_cur	*cur = NULL;
> +	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agf_bp);
> +	struct xfs_mount	*mp = sc->mp;
> +	xfs_agblock_t		btreeblks;
> +	xfs_agblock_t		blocks;
> +	int			error;
> +
> +	/* Update the AGF counters from the bnobt. */
> +	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
> +			XFS_BTNUM_BNO);
> +	error = xfs_alloc_query_all(cur, xrep_agf_walk_allocbt, &raa);
> +	if (error)
> +		goto err;
> +	error = xfs_btree_count_blocks(cur, &blocks);
> +	if (error)
> +		goto err;
> +	xfs_btree_del_cursor(cur, error);
> +	btreeblks = blocks - 1;

Why the -1? We don't count the root or something?
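
To spell out the arithmetic in question: the hunk adds (blocks - 1) for each
of the bnobt, cntbt and rmapbt to agf_btreeblks, while agf_rmap_blocks gets
the full count.  A minimal sketch of that sum follows, assuming the -1 is
there to leave each tree's root block out of the counter (an assumption; the
patch does not say so here).  calc_btreeblks() is a hypothetical helper, not
something in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sum the non-root blocks of the three btrees, mirroring the "blocks - 1"
 * arithmetic in the quoted hunk.  Each count includes the tree's root, so
 * every argument must be at least 1.
 */
static uint32_t
calc_btreeblks(
	uint32_t	bno_blocks,
	uint32_t	cnt_blocks,
	uint32_t	rmap_blocks)
{
	assert(bno_blocks >= 1 && cnt_blocks >= 1 && rmap_blocks >= 1);

	return (bno_blocks - 1) + (cnt_blocks - 1) + (rmap_blocks - 1);
}
```

Under that reading, an AG whose three trees are each a single root block
ends up with a btreeblks value of zero.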

> +	agf->agf_freeblks = cpu_to_be32(raa.freeblks);
> +	agf->agf_longest = cpu_to_be32(raa.longest);
> +
> +	/* Update the AGF counters from the cntbt. */
> +	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
> +			XFS_BTNUM_CNT);
> +	error = xfs_btree_count_blocks(cur, &blocks);
> +	if (error)
> +		goto err;
> +	xfs_btree_del_cursor(cur, error);
> +	btreeblks += blocks - 1;
> +
> +	/* Update the AGF counters from the rmapbt. */
> +	cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno);
> +	error = xfs_btree_count_blocks(cur, &blocks);
> +	if (error)
> +		goto err;
> +	xfs_btree_del_cursor(cur, error);
> +	agf->agf_rmap_blocks = cpu_to_be32(blocks);
> +	btreeblks += blocks - 1;
> +
> +	agf->agf_btreeblks = cpu_to_be32(btreeblks);
> +
> +	/* Update the AGF counters from the refcountbt. */
> +	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> +		cur = xfs_refcountbt_init_cursor(mp, sc->tp, agf_bp,
> +				sc->sa.agno, NULL);

FYI, this fails to compile on for-next (the dfops param, the trailing NULL
here, has been removed).

> +		error = xfs_btree_count_blocks(cur, &blocks);
> +		if (error)
> +			goto err;
> +		xfs_btree_del_cursor(cur, error);
> +		agf->agf_refcount_blocks = cpu_to_be32(blocks);
> +	}
> +
> +	return 0;
> +err:
> +	xfs_btree_del_cursor(cur, error);
> +	return error;
> +}
> +
> +/* Commit the new AGF and reinitialize the incore state. */
> +STATIC int
> +xrep_agf_commit_new(
> +	struct xfs_scrub	*sc,
> +	struct xfs_buf		*agf_bp)
> +{
> +	struct xfs_perag	*pag;
> +	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agf_bp);
> +
> +	/* Trigger fdblocks recalculation */
> +	xfs_force_summary_recalc(sc->mp);
> +
> +	/* Write this to disk. */
> +	xfs_trans_buf_set_type(sc->tp, agf_bp, XFS_BLFT_AGF_BUF);
> +	xfs_trans_log_buf(sc->tp, agf_bp, 0, BBTOB(agf_bp->b_length) - 1);
> +
> +	/* Now reinitialize the in-core counters we changed. */
> +	pag = sc->sa.pag;
> +	sc->sa.pag->pagf_init = 1;

Nit: can probably do 'pag->pagf_init = 1' here since we just initialized
pag on the line above.

That aside, is ordering important at all here? I'm wondering if somebody
can grab the pag right after we set this and see pagf_init == 1 before
we've updated the values below. Perhaps it doesn't really matter since
we have the agf buffer.
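
If the ordering did turn out to matter, the usual shape is "fill in the
values, then set the flag last".  The sketch below shows that pattern in
userspace C11; it is an illustration only.  struct pag_sketch and both
functions are hypothetical, and C11 atomics merely stand in for whatever the
kernel would use.  As said above, holding the agf buffer may make this moot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a few of the per-AG fields touched here. */
struct pag_sketch {
	atomic_bool	init;
	uint32_t	btreeblks;
	uint32_t	freeblks;
	uint32_t	longest;
};

/* Fill in every counter first, then publish them by setting init last. */
static void
pag_publish(
	struct pag_sketch	*pag,
	uint32_t		btreeblks,
	uint32_t		freeblks,
	uint32_t		longest)
{
	assert(pag != NULL);

	pag->btreeblks = btreeblks;
	pag->freeblks = freeblks;
	pag->longest = longest;

	/* Release: the stores above become visible before init does. */
	atomic_store_explicit(&pag->init, true, memory_order_release);
}

/* Readers only trust the counters after observing init. */
static bool
pag_read_freeblks(
	struct pag_sketch	*pag,
	uint32_t		*freeblks)
{
	if (!atomic_load_explicit(&pag->init, memory_order_acquire))
		return false;
	*freeblks = pag->freeblks;
	return true;
}
```

A reader that sees init == false simply falls back to whatever it would do
for an uninitialized pag, rather than reading half-updated counters.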

> +	pag->pagf_btreeblks = be32_to_cpu(agf->agf_btreeblks);
> +	pag->pagf_freeblks = be32_to_cpu(agf->agf_freeblks);
> +	pag->pagf_longest = be32_to_cpu(agf->agf_longest);
> +	pag->pagf_levels[XFS_BTNUM_BNOi] =
> +			be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]);
> +	pag->pagf_levels[XFS_BTNUM_CNTi] =
> +			be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
> +	pag->pagf_levels[XFS_BTNUM_RMAPi] =
> +			be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
> +	pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level);
> +
> +	return 0;
> +}
> +
> +/* Repair the AGF. v5 filesystems only. */
> +int
> +xrep_agf(
> +	struct xfs_scrub		*sc)
> +{
> +	struct xrep_find_ag_btree	fab[XREP_AGF_MAX] = {
> +		[XREP_AGF_BNOBT] = {
> +			.rmap_owner = XFS_RMAP_OWN_AG,
> +			.buf_ops = &xfs_allocbt_buf_ops,
> +			.magic = XFS_ABTB_CRC_MAGIC,
> +		},
> +		[XREP_AGF_CNTBT] = {
> +			.rmap_owner = XFS_RMAP_OWN_AG,
> +			.buf_ops = &xfs_allocbt_buf_ops,
> +			.magic = XFS_ABTC_CRC_MAGIC,
> +		},
> +		[XREP_AGF_RMAPBT] = {
> +			.rmap_owner = XFS_RMAP_OWN_AG,
> +			.buf_ops = &xfs_rmapbt_buf_ops,
> +			.magic = XFS_RMAP_CRC_MAGIC,
> +		},
> +		[XREP_AGF_REFCOUNTBT] = {
> +			.rmap_owner = XFS_RMAP_OWN_REFC,
> +			.buf_ops = &xfs_refcountbt_buf_ops,
> +			.magic = XFS_REFC_CRC_MAGIC,
> +		},
> +		[XREP_AGF_END] = {
> +			.buf_ops = NULL,
> +		},
> +	};
> +	struct xfs_agf			old_agf;
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_buf			*agf_bp;
> +	struct xfs_buf			*agfl_bp;
> +	struct xfs_agf			*agf;
> +	int				error;
> +
> +	/* We require the rmapbt to rebuild anything. */
> +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> +		return -EOPNOTSUPP;
> +
> +	xchk_perag_get(sc->mp, &sc->sa);
> +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> +			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGF_DADDR(mp)),
> +			XFS_FSS_TO_BB(mp, 1), 0, &agf_bp, NULL);
> +	if (error)
> +		return error;
> +	agf_bp->b_ops = &xfs_agf_buf_ops;

Any reason we don't call xfs_read_agf() here? It looks like we use a
similar helper (xfs_alloc_read_agfl) for the agfl below.

> +	agf = XFS_BUF_TO_AGF(agf_bp);
> +
> +	/*
> +	 * Load the AGFL so that we can screen out OWN_AG blocks that are on
> +	 * the AGFL now; these blocks might have once been part of the
> +	 * bno/cnt/rmap btrees but are not now.  This is a chicken and egg
> +	 * problem: the AGF is corrupt, so we have to trust the AGFL contents
> +	 * because we can't do any serious cross-referencing with any of the
> +	 * btrees rooted in the AGF.  If the AGFL contents are obviously bad
> +	 * then we'll bail out.
> +	 */
> +	error = xfs_alloc_read_agfl(mp, sc->tp, sc->sa.agno, &agfl_bp);
> +	if (error)
> +		return error;
> +
> +	/*
> +	 * Spot-check the AGFL blocks; if they're obviously corrupt then
> +	 * there's nothing we can do but bail out.
> +	 */

Why? Can't we reset the agfl, or is that handled elsewhere?

Brian

> +	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(agf_bp), agfl_bp,
> +			xrep_agf_check_agfl_block, sc);
> +	if (error)
> +		return error;
> +
> +	/*
> +	 * Find the AGF btree roots.  This is also a chicken-and-egg situation;
> +	 * see the function for more details.
> +	 */
> +	error = xrep_agf_find_btrees(sc, agf_bp, fab, agfl_bp);
> +	if (error)
> +		return error;
> +
> +	/* Start rewriting the header and implant the btrees we found. */
> +	xrep_agf_init_header(sc, agf_bp, &old_agf);
> +	xrep_agf_set_roots(sc, agf, fab);
> +	error = xrep_agf_calc_from_btrees(sc, agf_bp);
> +	if (error)
> +		goto out_revert;
> +
> +	/* Commit the changes and reinitialize incore state. */
> +	return xrep_agf_commit_new(sc, agf_bp);
> +
> +out_revert:
> +	/* Mark the incore AGF state stale and revert the AGF. */
> +	sc->sa.pag->pagf_init = 0;
> +	memcpy(agf, &old_agf, sizeof(old_agf));
> +	return error;
> +}
> diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> index 85b048b341a0..17cf48564390 100644
> --- a/fs/xfs/scrub/repair.c
> +++ b/fs/xfs/scrub/repair.c
> @@ -128,9 +128,12 @@ xrep_roll_ag_trans(
>  	int			error;
>  
>  	/* Keep the AG header buffers locked so we can keep going. */
> -	xfs_trans_bhold(sc->tp, sc->sa.agi_bp);
> -	xfs_trans_bhold(sc->tp, sc->sa.agf_bp);
> -	xfs_trans_bhold(sc->tp, sc->sa.agfl_bp);
> +	if (sc->sa.agi_bp)
> +		xfs_trans_bhold(sc->tp, sc->sa.agi_bp);
> +	if (sc->sa.agf_bp)
> +		xfs_trans_bhold(sc->tp, sc->sa.agf_bp);
> +	if (sc->sa.agfl_bp)
> +		xfs_trans_bhold(sc->tp, sc->sa.agfl_bp);
>  
>  	/* Roll the transaction. */
>  	error = xfs_trans_roll(&sc->tp);
> @@ -138,9 +141,12 @@ xrep_roll_ag_trans(
>  		goto out_release;
>  
>  	/* Join AG headers to the new transaction. */
> -	xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
> -	xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
> -	xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp);
> +	if (sc->sa.agi_bp)
> +		xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
> +	if (sc->sa.agf_bp)
> +		xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
> +	if (sc->sa.agfl_bp)
> +		xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp);
>  
>  	return 0;
>  
> @@ -150,9 +156,12 @@ xrep_roll_ag_trans(
>  	 * buffers will be released during teardown on our way out
>  	 * of the kernel.
>  	 */
> -	xfs_trans_bhold_release(sc->tp, sc->sa.agi_bp);
> -	xfs_trans_bhold_release(sc->tp, sc->sa.agf_bp);
> -	xfs_trans_bhold_release(sc->tp, sc->sa.agfl_bp);
> +	if (sc->sa.agi_bp)
> +		xfs_trans_bhold_release(sc->tp, sc->sa.agi_bp);
> +	if (sc->sa.agf_bp)
> +		xfs_trans_bhold_release(sc->tp, sc->sa.agf_bp);
> +	if (sc->sa.agfl_bp)
> +		xfs_trans_bhold_release(sc->tp, sc->sa.agfl_bp);
>  
>  	return error;
>  }
> diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
> index 5a4e92221916..1d283360b5ab 100644
> --- a/fs/xfs/scrub/repair.h
> +++ b/fs/xfs/scrub/repair.h
> @@ -58,6 +58,8 @@ int xrep_ino_dqattach(struct xfs_scrub *sc);
>  
>  int xrep_probe(struct xfs_scrub *sc);
>  int xrep_superblock(struct xfs_scrub *sc);
> +int xrep_agf(struct xfs_scrub *sc);
> +int xrep_agfl(struct xfs_scrub *sc);
>  
>  #else
>  
> @@ -81,6 +83,8 @@ xrep_calc_ag_resblks(
>  
>  #define xrep_probe			xrep_notsupported
>  #define xrep_superblock			xrep_notsupported
> +#define xrep_agf			xrep_notsupported
> +#define xrep_agfl			xrep_notsupported
>  
>  #endif /* CONFIG_XFS_ONLINE_REPAIR */
>  
> diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> index 6efb926f3cf8..1e8a17c8e2b9 100644
> --- a/fs/xfs/scrub/scrub.c
> +++ b/fs/xfs/scrub/scrub.c
> @@ -214,7 +214,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
>  		.type	= ST_PERAG,
>  		.setup	= xchk_setup_fs,
>  		.scrub	= xchk_agf,
> -		.repair	= xrep_notsupported,
> +		.repair	= xrep_agf,
>  	},
>  	[XFS_SCRUB_TYPE_AGFL]= {	/* agfl */
>  		.type	= ST_PERAG,

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 03/16] xfs: refactor the xrep_extent_list into xfs_bitmap
  2018-07-27 14:21   ` Brian Foster
@ 2018-07-27 15:52     ` Darrick J. Wong
  0 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-07-27 15:52 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, david, allison.henderson

On Fri, Jul 27, 2018 at 10:21:35AM -0400, Brian Foster wrote:
> On Wed, Jul 25, 2018 at 05:19:48PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > As mentioned previously, the xrep_extent_list basically implements a
> > bitmap with two functions: set and disjoint union.  Rename all these
> > functions to xfs_bitmap to shorten the name and make it more obvious
> > what we're doing.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/scrub/bitmap.c |  173 +++++++++++++++++++++++++------------------------
> >  fs/xfs/scrub/bitmap.h |   34 ++++------
> >  fs/xfs/scrub/repair.c |   85 +++++++++++-------------
> >  fs/xfs/scrub/repair.h |    8 +-
> >  fs/xfs/scrub/trace.h  |    1 
> >  5 files changed, 144 insertions(+), 157 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/scrub/bitmap.c b/fs/xfs/scrub/bitmap.c
> > index a7c2f4773f98..4840f5a1e179 100644
> > --- a/fs/xfs/scrub/bitmap.c
> > +++ b/fs/xfs/scrub/bitmap.c
> ...
> > @@ -82,117 +84,118 @@ xrep_btree_extent_cmp(
> >  }
> >  
> >  /*
> > - * Remove all the blocks mentioned in @sublist from the extents in @exlist.
> > + * Remove all the blocks mentioned in @sub from the extents in @bitmap.
> >   *
> >   * The intent is that callers will iterate the rmapbt for all of its records
> > - * for a given owner to generate @exlist; and iterate all the blocks of the
> > + * for a given owner to generate @bitmap; and iterate all the blocks of the
> >   * metadata structures that are not being rebuilt and have the same rmapbt
> > - * owner to generate @sublist.  This routine subtracts all the extents
> > - * mentioned in sublist from all the extents linked in @exlist, which leaves
> > - * @exlist as the list of blocks that are not accounted for, which we assume
> > + * owner to generate @sub.  This routine subtracts all the extents
> > + * mentioned in sub from all the extents linked in @bitmap, which leaves
> > + * @bitmap as the list of blocks that are not accounted for, which we assume
> >   * are the dead blocks of the old metadata structure.  The blocks mentioned in
> > - * @exlist can be reaped.
> > + * @bitmap can be reaped.
> > + *
> > + * This is the logical equivalent of bitmap &= ~sub.
> >   */
> >  #define LEFT_ALIGNED	(1 << 0)
> >  #define RIGHT_ALIGNED	(1 << 1)
> >  int
> > -xrep_subtract_extents(
> > -	struct xfs_scrub	*sc,
> > -	struct xrep_extent_list	*exlist,
> > -	struct xrep_extent_list	*sublist)
> > +xfs_bitmap_disunion(
> > +	struct xfs_bitmap	*bitmap,
> > +	struct xfs_bitmap	*sub)
> >  {
> >  	struct list_head	*lp;
> > -	struct xrep_extent	*ex;
> > -	struct xrep_extent	*newex;
> > -	struct xrep_extent	*subex;
> > +	struct xfs_bitmap_range	*br;
> > +	struct xfs_bitmap_range	*new_br;
> > +	struct xfs_bitmap_range	*sub_br;
> >  	xfs_fsblock_t		sub_fsb;
> > -	xfs_extlen_t		sub_len;
> > +	xfs_fsblock_t		sub_len;
> >  	int			state;
> >  	int			error = 0;
> >  
> > -	if (list_empty(&exlist->list) || list_empty(&sublist->list))
> > +	if (list_empty(&bitmap->list) || list_empty(&sub->list))
> >  		return 0;
> > -	ASSERT(!list_empty(&sublist->list));
> > +	ASSERT(!list_empty(&sub->list));
> >  
> > -	list_sort(NULL, &exlist->list, xrep_btree_extent_cmp);
> > -	list_sort(NULL, &sublist->list, xrep_btree_extent_cmp);
> > +	list_sort(NULL, &bitmap->list, xrep_btree_extent_cmp);
> > +	list_sort(NULL, &sub->list, xrep_btree_extent_cmp);
> 
> Still a couple xrep_ function names here.

Oops, I'll fix that.

> I guess I'm not clear on how generic this is intended to be..? FWIW,
> the xfs_bitmap->fsbno name stood out a bit to me as well (as opposed
> to something more generic like ->key or whatever).

I suppose if I ever want to turn this into a generic 64-bit bitmap
(right now it's just a fsblock bitmap) I should probably change the
type to uint64_t and rename the parameter fsbno->start or something.

It probably won't make this patch that much fatter, so I'll take
care of that too.
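
For illustration, here is a minimal userspace sketch of the per-range
subtraction step with the key made generic (uint64_t start/len instead
of fsbno).  The struct and function names are hypothetical and are not
the kernel's; the sketch clamps on end offsets rather than lengths, and
it handles only a single pair of ranges, not the sorted list walk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generic range covering [start, start + len). */
struct range {
	uint64_t	start;
	uint64_t	len;
};

#define LEFT_ALIGNED	(1 << 0)
#define RIGHT_ALIGNED	(1 << 1)

/*
 * Subtract @sub from @br.  The 0, 1, or 2 surviving pieces are written
 * to @out; the return value is the number of pieces written.
 */
static int
range_subtract(
	struct range	br,
	struct range	sub,
	struct range	out[2])
{
	uint64_t	br_end = br.start + br.len;
	uint64_t	sub_start = sub.start;
	uint64_t	sub_end = sub.start + sub.len;
	int		state = 0;

	assert(br.len > 0);

	/* No overlap at all: br survives untouched. */
	if (sub.len == 0 || sub_end <= br.start || sub_start >= br_end) {
		out[0] = br;
		return 1;
	}

	/* Trim sub so that it lies entirely inside br. */
	if (sub_start < br.start)
		sub_start = br.start;
	if (sub_end > br_end)
		sub_end = br_end;

	if (sub_start == br.start)
		state |= LEFT_ALIGNED;
	if (sub_end == br_end)
		state |= RIGHT_ALIGNED;

	switch (state) {
	case LEFT_ALIGNED:
		/* Coincides with only the left: keep the tail. */
		out[0].start = sub_end;
		out[0].len = br_end - sub_end;
		return 1;
	case RIGHT_ALIGNED:
		/* Coincides with only the right: keep the head. */
		out[0].start = br.start;
		out[0].len = sub_start - br.start;
		return 1;
	case LEFT_ALIGNED | RIGHT_ALIGNED:
		/* Total overlap: nothing survives. */
		return 0;
	default:
		/* Hole in the middle: keep both head and tail. */
		out[0].start = br.start;
		out[0].len = sub_start - br.start;
		out[1].start = sub_end;
		out[1].len = br_end - sub_end;
		return 2;
	}
}
```

Subtracting [12, 15) from [10, 20) leaves two pieces, [10, 12) and
[15, 20), which is the "deleting from the middle" case in the patch.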

--D

> Brian
> 
> >  
> >  	/*
> > -	 * Now that we've sorted both lists, we iterate exlist once, rolling
> > -	 * forward through sublist and/or exlist as necessary until we find an
> > +	 * Now that we've sorted both lists, we iterate bitmap once, rolling
> > +	 * forward through sub and/or bitmap as necessary until we find an
> >  	 * overlap or reach the end of either list.  We do not reset lp to the
> > -	 * head of exlist nor do we reset subex to the head of sublist.  The
> > +	 * head of bitmap nor do we reset sub_br to the head of sub.  The
> >  	 * list traversal is similar to merge sort, but we're deleting
> >  	 * instead.  In this manner we avoid O(n^2) operations.
> >  	 */
> > -	subex = list_first_entry(&sublist->list, struct xrep_extent,
> > +	sub_br = list_first_entry(&sub->list, struct xfs_bitmap_range,
> >  			list);
> > -	lp = exlist->list.next;
> > -	while (lp != &exlist->list) {
> > -		ex = list_entry(lp, struct xrep_extent, list);
> > +	lp = bitmap->list.next;
> > +	while (lp != &bitmap->list) {
> > +		br = list_entry(lp, struct xfs_bitmap_range, list);
> >  
> >  		/*
> > -		 * Advance subex and/or ex until we find a pair that
> > +		 * Advance sub_br and/or br until we find a pair that
> >  		 * intersect or we run out of extents.
> >  		 */
> > -		while (subex->fsbno + subex->len <= ex->fsbno) {
> > -			if (list_is_last(&subex->list, &sublist->list))
> > +		while (sub_br->fsbno + sub_br->len <= br->fsbno) {
> > +			if (list_is_last(&sub_br->list, &sub->list))
> >  				goto out;
> > -			subex = list_next_entry(subex, list);
> > +			sub_br = list_next_entry(sub_br, list);
> >  		}
> > -		if (subex->fsbno >= ex->fsbno + ex->len) {
> > +		if (sub_br->fsbno >= br->fsbno + br->len) {
> >  			lp = lp->next;
> >  			continue;
> >  		}
> >  
> > -		/* trim subex to fit the extent we have */
> > -		sub_fsb = subex->fsbno;
> > -		sub_len = subex->len;
> > -		if (subex->fsbno < ex->fsbno) {
> > -			sub_len -= ex->fsbno - subex->fsbno;
> > -			sub_fsb = ex->fsbno;
> > +		/* trim sub_br to fit the extent we have */
> > +		sub_fsb = sub_br->fsbno;
> > +		sub_len = sub_br->len;
> > +		if (sub_br->fsbno < br->fsbno) {
> > +			sub_len -= br->fsbno - sub_br->fsbno;
> > +			sub_fsb = br->fsbno;
> >  		}
> > -		if (sub_len > ex->len)
> > -			sub_len = ex->len;
> > +		if (sub_len > br->len)
> > +			sub_len = br->len;
> >  
> >  		state = 0;
> > -		if (sub_fsb == ex->fsbno)
> > +		if (sub_fsb == br->fsbno)
> >  			state |= LEFT_ALIGNED;
> > -		if (sub_fsb + sub_len == ex->fsbno + ex->len)
> > +		if (sub_fsb + sub_len == br->fsbno + br->len)
> >  			state |= RIGHT_ALIGNED;
> >  		switch (state) {
> >  		case LEFT_ALIGNED:
> >  			/* Coincides with only the left. */
> > -			ex->fsbno += sub_len;
> > -			ex->len -= sub_len;
> > +			br->fsbno += sub_len;
> > +			br->len -= sub_len;
> >  			break;
> >  		case RIGHT_ALIGNED:
> >  			/* Coincides with only the right. */
> > -			ex->len -= sub_len;
> > +			br->len -= sub_len;
> >  			lp = lp->next;
> >  			break;
> >  		case LEFT_ALIGNED | RIGHT_ALIGNED:
> >  			/* Total overlap, just delete ex. */
> >  			lp = lp->next;
> > -			list_del(&ex->list);
> > -			kmem_free(ex);
> > +			list_del(&br->list);
> > +			kmem_free(br);
> >  			break;
> >  		case 0:
> >  			/*
> >  			 * Deleting from the middle: add the new right extent
> >  			 * and then shrink the left extent.
> >  			 */
> > -			newex = kmem_alloc(sizeof(struct xrep_extent),
> > +			new_br = kmem_alloc(sizeof(struct xfs_bitmap_range),
> >  					KM_MAYFAIL);
> > -			if (!newex) {
> > +			if (!new_br) {
> >  				error = -ENOMEM;
> >  				goto out;
> >  			}
> > -			INIT_LIST_HEAD(&newex->list);
> > -			newex->fsbno = sub_fsb + sub_len;
> > -			newex->len = ex->fsbno + ex->len - newex->fsbno;
> > -			list_add(&newex->list, &ex->list);
> > -			ex->len = sub_fsb - ex->fsbno;
> > +			INIT_LIST_HEAD(&new_br->list);
> > +			new_br->fsbno = sub_fsb + sub_len;
> > +			new_br->len = br->fsbno + br->len - new_br->fsbno;
> > +			list_add(&new_br->list, &br->list);
> > +			br->len = sub_fsb - br->fsbno;
> >  			lp = lp->next;
> >  			break;
> >  		default:
> > diff --git a/fs/xfs/scrub/bitmap.h b/fs/xfs/scrub/bitmap.h
> > index 1038157695a8..3c39900e9269 100644
> > --- a/fs/xfs/scrub/bitmap.h
> > +++ b/fs/xfs/scrub/bitmap.h
> > @@ -6,32 +6,28 @@
> >  #ifndef __XFS_SCRUB_BITMAP_H__
> >  #define __XFS_SCRUB_BITMAP_H__
> >  
> > -struct xrep_extent {
> > +struct xfs_bitmap_range {
> >  	struct list_head	list;
> >  	xfs_fsblock_t		fsbno;
> > -	xfs_extlen_t		len;
> > +	xfs_fsblock_t		len;
> >  };
> >  
> > -struct xrep_extent_list {
> > +struct xfs_bitmap {
> >  	struct list_head	list;
> >  };
> >  
> > -static inline void
> > -xrep_init_extent_list(
> > -	struct xrep_extent_list		*exlist)
> > -{
> > -	INIT_LIST_HEAD(&exlist->list);
> > -}
> > +void xfs_bitmap_init(struct xfs_bitmap *bitmap);
> > +void xfs_bitmap_destroy(struct xfs_bitmap *bitmap);
> >  
> > -#define for_each_xrep_extent_safe(rbe, n, exlist) \
> > -	list_for_each_entry_safe((rbe), (n), &(exlist)->list, list)
> > -int xrep_collect_btree_extent(struct xfs_scrub *sc,
> > -		struct xrep_extent_list *btlist, xfs_fsblock_t fsbno,
> > -		xfs_extlen_t len);
> > -void xrep_cancel_btree_extents(struct xfs_scrub *sc,
> > -		struct xrep_extent_list *btlist);
> > -int xrep_subtract_extents(struct xfs_scrub *sc,
> > -		struct xrep_extent_list *exlist,
> > -		struct xrep_extent_list *sublist);
> > +#define for_each_xfs_bitmap_extent(bex, n, bitmap) \
> > +	list_for_each_entry_safe((bex), (n), &(bitmap)->list, list)
> > +
> > +#define for_each_xfs_bitmap_block(fsbno, bex, n, bitmap) \
> > +	list_for_each_entry_safe((bex), (n), &(bitmap)->list, list) \
> > +		for (fsbno = bex->fsbno; fsbno < bex->fsbno + bex->len; fsbno++)
> > +
> > +int xfs_bitmap_set(struct xfs_bitmap *bitmap, xfs_fsblock_t fsbno,
> > +		xfs_fsblock_t len);
> > +int xfs_bitmap_disunion(struct xfs_bitmap *bitmap, struct xfs_bitmap *sub);
> >  
> >  #endif	/* __XFS_SCRUB_BITMAP_H__ */
> > diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> > index 27a904ef6189..85b048b341a0 100644
> > --- a/fs/xfs/scrub/repair.c
> > +++ b/fs/xfs/scrub/repair.c
> > @@ -368,17 +368,17 @@ xrep_init_btblock(
> >   *
> >   * However, that leaves the matter of removing all the metadata describing the
> >   * old broken structure.  For primary metadata we use the rmap data to collect
> > - * every extent with a matching rmap owner (exlist); we then iterate all other
> > + * every extent with a matching rmap owner (bitmap); we then iterate all other
> >   * metadata structures with the same rmap owner to collect the extents that
> > - * cannot be removed (sublist).  We then subtract sublist from exlist to
> > + * cannot be removed (sublist).  We then subtract sublist from bitmap to
> >   * derive the blocks that were used by the old btree.  These blocks can be
> >   * reaped.
> >   *
> >   * For rmapbt reconstructions we must use different tactics for extent
> >   * collection.  First we iterate all primary metadata (this excludes the old
> >   * rmapbt, obviously) to generate new rmap records.  The gaps in the rmap
> > - * records are collected as exlist.  The bnobt records are collected as
> > - * sublist.  As with the other btrees we subtract sublist from exlist, and the
> > + * records are collected as bitmap.  The bnobt records are collected as
> > + * sublist.  As with the other btrees we subtract sublist from bitmap, and the
> >   * result (since the rmapbt lives in the free space) are the blocks from the
> >   * old rmapbt.
> >   *
> > @@ -386,11 +386,11 @@ xrep_init_btblock(
> >   *
> >   * Now that we've constructed a new btree to replace the damaged one, we want
> >   * to dispose of the blocks that (we think) the old btree was using.
> > - * Previously, we used the rmapbt to collect the extents (exlist) with the
> > + * Previously, we used the rmapbt to collect the extents (bitmap) with the
> >   * rmap owner corresponding to the tree we rebuilt, collected extents for any
> >   * blocks with the same rmap owner that are owned by another data structure
> > - * (sublist), and subtracted sublist from exlist.  In theory the extents
> > - * remaining in exlist are the old btree's blocks.
> > + * (sublist), and subtracted sublist from bitmap.  In theory the extents
> > + * remaining in bitmap are the old btree's blocks.
> >   *
> >   * Unfortunately, it's possible that the btree was crosslinked with other
> >   * blocks on disk.  The rmap data can tell us if there are multiple owners, so
> > @@ -406,7 +406,7 @@ xrep_init_btblock(
> >   * If there are no rmap records at all, we also free the block.  If the btree
> >   * being rebuilt lives in the free space (bnobt/cntbt/rmapbt) then there isn't
> >   * supposed to be a rmap record and everything is ok.  For other btrees there
> > - * had to have been an rmap entry for the block to have ended up on @exlist,
> > + * had to have been an rmap entry for the block to have ended up on @bitmap,
> >   * so if it's gone now there's something wrong and the fs will shut down.
> >   *
> >   * Note: If there are multiple rmap records with only the same rmap owner as
> > @@ -419,7 +419,7 @@ xrep_init_btblock(
> >   * The caller is responsible for locking the AG headers for the entire rebuild
> >   * operation so that nothing else can sneak in and change the AG state while
> >   * we're not looking.  We also assume that the caller already invalidated any
> > - * buffers associated with @exlist.
> > + * buffers associated with @bitmap.
> >   */
> >  
> >  /*
> > @@ -429,13 +429,12 @@ xrep_init_btblock(
> >  int
> >  xrep_invalidate_blocks(
> >  	struct xfs_scrub	*sc,
> > -	struct xrep_extent_list	*exlist)
> > +	struct xfs_bitmap	*bitmap)
> >  {
> > -	struct xrep_extent	*rex;
> > -	struct xrep_extent	*n;
> > +	struct xfs_bitmap_range	*bmr;
> > +	struct xfs_bitmap_range	*n;
> >  	struct xfs_buf		*bp;
> >  	xfs_fsblock_t		fsbno;
> > -	xfs_agblock_t		i;
> >  
> >  	/*
> >  	 * For each block in each extent, see if there's an incore buffer for
> > @@ -445,18 +444,16 @@ xrep_invalidate_blocks(
> >  	 * because we never own those; and if we can't TRYLOCK the buffer we
> >  	 * assume it's owned by someone else.
> >  	 */
> > -	for_each_xrep_extent_safe(rex, n, exlist) {
> > -		for (fsbno = rex->fsbno, i = rex->len; i > 0; fsbno++, i--) {
> > -			/* Skip AG headers and post-EOFS blocks */
> > -			if (!xfs_verify_fsbno(sc->mp, fsbno))
> > -				continue;
> > -			bp = xfs_buf_incore(sc->mp->m_ddev_targp,
> > -					XFS_FSB_TO_DADDR(sc->mp, fsbno),
> > -					XFS_FSB_TO_BB(sc->mp, 1), XBF_TRYLOCK);
> > -			if (bp) {
> > -				xfs_trans_bjoin(sc->tp, bp);
> > -				xfs_trans_binval(sc->tp, bp);
> > -			}
> > +	for_each_xfs_bitmap_block(fsbno, bmr, n, bitmap) {
> > +		/* Skip AG headers and post-EOFS blocks */
> > +		if (!xfs_verify_fsbno(sc->mp, fsbno))
> > +			continue;
> > +		bp = xfs_buf_incore(sc->mp->m_ddev_targp,
> > +				XFS_FSB_TO_DADDR(sc->mp, fsbno),
> > +				XFS_FSB_TO_BB(sc->mp, 1), XBF_TRYLOCK);
> > +		if (bp) {
> > +			xfs_trans_bjoin(sc->tp, bp);
> > +			xfs_trans_binval(sc->tp, bp);
> >  		}
> >  	}
> >  
> > @@ -519,9 +516,9 @@ xrep_put_freelist(
> >  	return 0;
> >  }
> >  
> > -/* Dispose of a single metadata block. */
> > +/* Dispose of a single block. */
> >  STATIC int
> > -xrep_dispose_btree_block(
> > +xrep_reap_block(
> >  	struct xfs_scrub	*sc,
> >  	xfs_fsblock_t		fsbno,
> >  	struct xfs_owner_info	*oinfo,
> > @@ -593,41 +590,35 @@ xrep_dispose_btree_block(
> >  	return error;
> >  }
> >  
> > -/* Dispose of btree blocks from an old per-AG btree. */
> > +/* Dispose of every block of every extent in the bitmap. */
> >  int
> > -xrep_reap_btree_extents(
> > +xrep_reap_extents(
> >  	struct xfs_scrub	*sc,
> > -	struct xrep_extent_list	*exlist,
> > +	struct xfs_bitmap	*bitmap,
> >  	struct xfs_owner_info	*oinfo,
> >  	enum xfs_ag_resv_type	type)
> >  {
> > -	struct xrep_extent	*rex;
> > -	struct xrep_extent	*n;
> > +	struct xfs_bitmap_range	*bmr;
> > +	struct xfs_bitmap_range	*n;
> > +	xfs_fsblock_t		fsbno;
> >  	int			error = 0;
> >  
> >  	ASSERT(xfs_sb_version_hasrmapbt(&sc->mp->m_sb));
> >  
> > -	/* Dispose of every block from the old btree. */
> > -	for_each_xrep_extent_safe(rex, n, exlist) {
> > +	for_each_xfs_bitmap_block(fsbno, bmr, n, bitmap) {
> >  		ASSERT(sc->ip != NULL ||
> > -		       XFS_FSB_TO_AGNO(sc->mp, rex->fsbno) == sc->sa.agno);
> > -
> > +		       XFS_FSB_TO_AGNO(sc->mp, fsbno) == sc->sa.agno);
> >  		trace_xrep_dispose_btree_extent(sc->mp,
> > -				XFS_FSB_TO_AGNO(sc->mp, rex->fsbno),
> > -				XFS_FSB_TO_AGBNO(sc->mp, rex->fsbno), rex->len);
> > +				XFS_FSB_TO_AGNO(sc->mp, fsbno),
> > +				XFS_FSB_TO_AGBNO(sc->mp, fsbno), 1);
> >  
> > -		for (; rex->len > 0; rex->len--, rex->fsbno++) {
> > -			error = xrep_dispose_btree_block(sc, rex->fsbno,
> > -					oinfo, type);
> > -			if (error)
> > -				goto out;
> > -		}
> > -		list_del(&rex->list);
> > -		kmem_free(rex);
> > +		error = xrep_reap_block(sc, fsbno, oinfo, type);
> > +		if (error)
> > +			goto out;
> >  	}
> >  
> >  out:
> > -	xrep_cancel_btree_extents(sc, exlist);
> > +	xfs_bitmap_destroy(bitmap);
> >  	return error;
> >  }
> >  
> > diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
> > index a3d491a438f4..5a4e92221916 100644
> > --- a/fs/xfs/scrub/repair.h
> > +++ b/fs/xfs/scrub/repair.h
> > @@ -27,13 +27,11 @@ int xrep_init_btblock(struct xfs_scrub *sc, xfs_fsblock_t fsb,
> >  		struct xfs_buf **bpp, xfs_btnum_t btnum,
> >  		const struct xfs_buf_ops *ops);
> >  
> > -struct xrep_extent_list;
> > +struct xfs_bitmap;
> >  
> >  int xrep_fix_freelist(struct xfs_scrub *sc, bool can_shrink);
> > -int xrep_invalidate_blocks(struct xfs_scrub *sc,
> > -		struct xrep_extent_list *btlist);
> > -int xrep_reap_btree_extents(struct xfs_scrub *sc,
> > -		struct xrep_extent_list *exlist,
> > +int xrep_invalidate_blocks(struct xfs_scrub *sc, struct xfs_bitmap *btlist);
> > +int xrep_reap_extents(struct xfs_scrub *sc, struct xfs_bitmap *exlist,
> >  		struct xfs_owner_info *oinfo, enum xfs_ag_resv_type type);
> >  
> >  struct xrep_find_ag_btree {
> > diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
> > index 93db22c39b51..4e20f0e48232 100644
> > --- a/fs/xfs/scrub/trace.h
> > +++ b/fs/xfs/scrub/trace.h
> > @@ -511,7 +511,6 @@ DEFINE_EVENT(xrep_extent_class, name, \
> >  		 xfs_agblock_t agbno, xfs_extlen_t len), \
> >  	TP_ARGS(mp, agno, agbno, len))
> >  DEFINE_REPAIR_EXTENT_EVENT(xrep_dispose_btree_extent);
> > -DEFINE_REPAIR_EXTENT_EVENT(xrep_collect_btree_extent);
> >  DEFINE_REPAIR_EXTENT_EVENT(xrep_agfl_insert);
> >  
> >  DECLARE_EVENT_CLASS(xrep_rmap_class,
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [PATCH 04/16] xfs: repair the AGF
  2018-07-27 14:23   ` Brian Foster
@ 2018-07-27 16:02     ` Darrick J. Wong
  2018-07-27 16:25       ` Brian Foster
  0 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2018-07-27 16:02 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, david, allison.henderson

On Fri, Jul 27, 2018 at 10:23:48AM -0400, Brian Foster wrote:
> On Wed, Jul 25, 2018 at 05:19:55PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Regenerate the AGF from the rmap data.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> 
> Mostly seems sane to me. I still need to come up to speed on the broader
> xfs_scrub context. A few comments in the meantime..

<nod> Thanks for taking a look at this series. :)

> >  fs/xfs/scrub/agheader_repair.c |  366 ++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/scrub/repair.c          |   27 ++-
> >  fs/xfs/scrub/repair.h          |    4 
> >  fs/xfs/scrub/scrub.c           |    2 
> >  4 files changed, 389 insertions(+), 10 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
> > index 1e96621ece3a..938af216cb1c 100644
> > --- a/fs/xfs/scrub/agheader_repair.c
> > +++ b/fs/xfs/scrub/agheader_repair.c
> ...
> > @@ -54,3 +61,362 @@ xrep_superblock(
> >  	xfs_trans_log_buf(sc->tp, bp, 0, BBTOB(bp->b_length) - 1);
> >  	return error;
> >  }
> ...
> > +/* Update all AGF fields which derive from btree contents. */
> > +STATIC int
> > +xrep_agf_calc_from_btrees(
> > +	struct xfs_scrub	*sc,
> > +	struct xfs_buf		*agf_bp)
> > +{
> > +	struct xrep_agf_allocbt	raa = { .sc = sc };
> > +	struct xfs_btree_cur	*cur = NULL;
> > +	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agf_bp);
> > +	struct xfs_mount	*mp = sc->mp;
> > +	xfs_agblock_t		btreeblks;
> > +	xfs_agblock_t		blocks;
> > +	int			error;
> > +
> > +	/* Update the AGF counters from the bnobt. */
> > +	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
> > +			XFS_BTNUM_BNO);
> > +	error = xfs_alloc_query_all(cur, xrep_agf_walk_allocbt, &raa);
> > +	if (error)
> > +		goto err;
> > +	error = xfs_btree_count_blocks(cur, &blocks);
> > +	if (error)
> > +		goto err;
> > +	xfs_btree_del_cursor(cur, error);
> > +	btreeblks = blocks - 1;
> 
> Why the -1? We don't count the root or something?

The AGF btreeblks field only counts the number of blocks added to the
bno/cnt/rmapbt since they were initialized (each with a single root
block).  I find it a little strange not to count the root, but oh well.
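
As a plain-arithmetic sketch of that rule (the helper name is made up
for illustration and is not part of the patch), assuming each of the
three counts includes its root block:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the value stored in agf_btreeblks from
 * the total block counts of the bnobt, cntbt and rmapbt.  Each tree's
 * root block is excluded, hence the "- 1" per tree.
 */
static uint32_t
calc_agf_btreeblks(
	uint32_t	bno_blocks,
	uint32_t	cnt_blocks,
	uint32_t	rmap_blocks)
{
	/* Every btree has at least its root block. */
	assert(bno_blocks >= 1 && cnt_blocks >= 1 && rmap_blocks >= 1);

	return (bno_blocks - 1) + (cnt_blocks - 1) + (rmap_blocks - 1);
}
```

So three freshly initialized single-block trees give btreeblks == 0,
and trees of 3, 2 and 5 blocks give 2 + 1 + 4 = 7.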

> > +	agf->agf_freeblks = cpu_to_be32(raa.freeblks);
> > +	agf->agf_longest = cpu_to_be32(raa.longest);
> > +
> > +	/* Update the AGF counters from the cntbt. */
> > +	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
> > +			XFS_BTNUM_CNT);
> > +	error = xfs_btree_count_blocks(cur, &blocks);
> > +	if (error)
> > +		goto err;
> > +	xfs_btree_del_cursor(cur, error);
> > +	btreeblks += blocks - 1;
> > +
> > +	/* Update the AGF counters from the rmapbt. */
> > +	cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno);
> > +	error = xfs_btree_count_blocks(cur, &blocks);
> > +	if (error)
> > +		goto err;
> > +	xfs_btree_del_cursor(cur, error);
> > +	agf->agf_rmap_blocks = cpu_to_be32(blocks);
> > +	btreeblks += blocks - 1;
> > +
> > +	agf->agf_btreeblks = cpu_to_be32(btreeblks);
> > +
> > +	/* Update the AGF counters from the refcountbt. */
> > +	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > +		cur = xfs_refcountbt_init_cursor(mp, sc->tp, agf_bp,
> > +				sc->sa.agno, NULL);
> 
> FYI this fails to compile on for-next (dfops param has been removed).

Yeah, I'm working on a rebase to for-next (once I settle on the locking
question in hch's "reduce cow lookups" series).

> > +		error = xfs_btree_count_blocks(cur, &blocks);
> > +		if (error)
> > +			goto err;
> > +		xfs_btree_del_cursor(cur, error);
> > +		agf->agf_refcount_blocks = cpu_to_be32(blocks);
> > +	}
> > +
> > +	return 0;
> > +err:
> > +	xfs_btree_del_cursor(cur, error);
> > +	return error;
> > +}
> > +
> > +/* Commit the new AGF and reinitialize the incore state. */
> > +STATIC int
> > +xrep_agf_commit_new(
> > +	struct xfs_scrub	*sc,
> > +	struct xfs_buf		*agf_bp)
> > +{
> > +	struct xfs_perag	*pag;
> > +	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agf_bp);
> > +
> > +	/* Trigger fdblocks recalculation */
> > +	xfs_force_summary_recalc(sc->mp);
> > +
> > +	/* Write this to disk. */
> > +	xfs_trans_buf_set_type(sc->tp, agf_bp, XFS_BLFT_AGF_BUF);
> > +	xfs_trans_log_buf(sc->tp, agf_bp, 0, BBTOB(agf_bp->b_length) - 1);
> > +
> > +	/* Now reinitialize the in-core counters we changed. */
> > +	pag = sc->sa.pag;
> > +	sc->sa.pag->pagf_init = 1;
> 
> Nit: can probably do 'pag->pagf_init = 1' here since we just initialized
> pag on the line above.

Ok.

> That aside, is ordering important at all here? I'm wondering if somebody
> can grab the pag right after we set this and see pagf_init == 1 before
> we've updated the values below. Perhaps it doesn't really matter since
> we have the agf buffer.

Hmm, I'll move it to the end to minimize the wtf factor. :)

> > +	pag->pagf_btreeblks = be32_to_cpu(agf->agf_btreeblks);
> > +	pag->pagf_freeblks = be32_to_cpu(agf->agf_freeblks);
> > +	pag->pagf_longest = be32_to_cpu(agf->agf_longest);
> > +	pag->pagf_levels[XFS_BTNUM_BNOi] =
> > +			be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]);
> > +	pag->pagf_levels[XFS_BTNUM_CNTi] =
> > +			be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
> > +	pag->pagf_levels[XFS_BTNUM_RMAPi] =
> > +			be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
> > +	pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level);
> > +
> > +	return 0;
> > +}
> > +
> > +/* Repair the AGF. v5 filesystems only. */
> > +int
> > +xrep_agf(
> > +	struct xfs_scrub		*sc)
> > +{
> > +	struct xrep_find_ag_btree	fab[XREP_AGF_MAX] = {
> > +		[XREP_AGF_BNOBT] = {
> > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > +			.buf_ops = &xfs_allocbt_buf_ops,
> > +			.magic = XFS_ABTB_CRC_MAGIC,
> > +		},
> > +		[XREP_AGF_CNTBT] = {
> > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > +			.buf_ops = &xfs_allocbt_buf_ops,
> > +			.magic = XFS_ABTC_CRC_MAGIC,
> > +		},
> > +		[XREP_AGF_RMAPBT] = {
> > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > +			.buf_ops = &xfs_rmapbt_buf_ops,
> > +			.magic = XFS_RMAP_CRC_MAGIC,
> > +		},
> > +		[XREP_AGF_REFCOUNTBT] = {
> > +			.rmap_owner = XFS_RMAP_OWN_REFC,
> > +			.buf_ops = &xfs_refcountbt_buf_ops,
> > +			.magic = XFS_REFC_CRC_MAGIC,
> > +		},
> > +		[XREP_AGF_END] = {
> > +			.buf_ops = NULL,
> > +		},
> > +	};
> > +	struct xfs_agf			old_agf;
> > +	struct xfs_mount		*mp = sc->mp;
> > +	struct xfs_buf			*agf_bp;
> > +	struct xfs_buf			*agfl_bp;
> > +	struct xfs_agf			*agf;
> > +	int				error;
> > +
> > +	/* We require the rmapbt to rebuild anything. */
> > +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> > +		return -EOPNOTSUPP;
> > +
> > +	xchk_perag_get(sc->mp, &sc->sa);
> > +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> > +			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGF_DADDR(mp)),
> > +			XFS_FSS_TO_BB(mp, 1), 0, &agf_bp, NULL);
> > +	if (error)
> > +		return error;
> > +	agf_bp->b_ops = &xfs_agf_buf_ops;
> 
> Any reason we don't call xfs_read_agf() here? It looks like we use the
> similar helper for the agfl below.

We're grabbing the agf buffer without read verifiers so that we can
reinitialize it.  Note that scrub tries xfs_read_agf, and if it fails
with -EFSCORRUPTED/-EFSBADCRC it marks the agf as corrupt, so it's
possible that sc->sa.agf_bp is still null.

This probably could have been trans_get_buf though...

> > +	agf = XFS_BUF_TO_AGF(agf_bp);
> > +
> > +	/*
> > +	 * Load the AGFL so that we can screen out OWN_AG blocks that are on
> > +	 * the AGFL now; these blocks might have once been part of the
> > +	 * bno/cnt/rmap btrees but are not now.  This is a chicken and egg
> > +	 * problem: the AGF is corrupt, so we have to trust the AGFL contents
> > +	 * because we can't do any serious cross-referencing with any of the
> > +	 * btrees rooted in the AGF.  If the AGFL contents are obviously bad
> > +	 * then we'll bail out.
> > +	 */
> > +	error = xfs_alloc_read_agfl(mp, sc->tp, sc->sa.agno, &agfl_bp);
> > +	if (error)
> > +		return error;
> > +
> > +	/*
> > +	 * Spot-check the AGFL blocks; if they're obviously corrupt then
> > +	 * there's nothing we can do but bail out.
> > +	 */
> 
> Why? Can't we reset the agfl, or is that handled elsewhere?

It's handled in xrep_agfl, but userspace will have to call us again to
fix the agfl and then call us a third time about the agf repair.

(xfs_scrub does this, naturally...)
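
A toy model of that call sequence, for illustration only.  Nothing
here is real kernel or xfs_scrub code; every name is hypothetical, and
the model assumes the simplest possible dependency (AGF repair bails
out while the AGFL is bad, AGFL repair always succeeds):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state of one AG's headers. */
struct toy_ag {
	bool	agf_ok;
	bool	agfl_ok;
};

/* AGF repair trusts the AGFL, so it bails out if the AGFL is bad. */
static int
toy_repair_agf(struct toy_ag *ag)
{
	if (!ag->agfl_ok)
		return -1;
	ag->agf_ok = true;
	return 0;
}

/* In this toy model the AGFL repair always succeeds. */
static int
toy_repair_agfl(struct toy_ag *ag)
{
	ag->agfl_ok = true;
	return 0;
}

/*
 * Drive the repairs the way the caller would have to: try the AGF,
 * fall back to fixing the AGFL, then retry the AGF.  Returns the
 * number of repair calls made, or -1 if the AG could not be fixed.
 */
static int
toy_fix_ag(struct toy_ag *ag)
{
	int	calls = 0;

	calls++;
	if (toy_repair_agf(ag) == 0)
		return calls;

	calls++;
	if (toy_repair_agfl(ag) != 0)
		return -1;

	calls++;
	if (toy_repair_agf(ag) != 0)
		return -1;
	return calls;
}
```

With both headers bad, toy_fix_ag() makes three calls; with only the
AGF bad, the first call is enough.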

--D

> Brian
> 
> > +	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(agf_bp), agfl_bp,
> > +			xrep_agf_check_agfl_block, sc);
> > +	if (error)
> > +		return error;
> > +
> > +	/*
> > +	 * Find the AGF btree roots.  This is also a chicken-and-egg situation;
> > +	 * see the function for more details.
> > +	 */
> > +	error = xrep_agf_find_btrees(sc, agf_bp, fab, agfl_bp);
> > +	if (error)
> > +		return error;
> > +
> > +	/* Start rewriting the header and implant the btrees we found. */
> > +	xrep_agf_init_header(sc, agf_bp, &old_agf);
> > +	xrep_agf_set_roots(sc, agf, fab);
> > +	error = xrep_agf_calc_from_btrees(sc, agf_bp);
> > +	if (error)
> > +		goto out_revert;
> > +
> > +	/* Commit the changes and reinitialize incore state. */
> > +	return xrep_agf_commit_new(sc, agf_bp);
> > +
> > +out_revert:
> > +	/* Mark the incore AGF state stale and revert the AGF. */
> > +	sc->sa.pag->pagf_init = 0;
> > +	memcpy(agf, &old_agf, sizeof(old_agf));
> > +	return error;
> > +}
> > diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> > index 85b048b341a0..17cf48564390 100644
> > --- a/fs/xfs/scrub/repair.c
> > +++ b/fs/xfs/scrub/repair.c
> > @@ -128,9 +128,12 @@ xrep_roll_ag_trans(
> >  	int			error;
> >  
> >  	/* Keep the AG header buffers locked so we can keep going. */
> > -	xfs_trans_bhold(sc->tp, sc->sa.agi_bp);
> > -	xfs_trans_bhold(sc->tp, sc->sa.agf_bp);
> > -	xfs_trans_bhold(sc->tp, sc->sa.agfl_bp);
> > +	if (sc->sa.agi_bp)
> > +		xfs_trans_bhold(sc->tp, sc->sa.agi_bp);
> > +	if (sc->sa.agf_bp)
> > +		xfs_trans_bhold(sc->tp, sc->sa.agf_bp);
> > +	if (sc->sa.agfl_bp)
> > +		xfs_trans_bhold(sc->tp, sc->sa.agfl_bp);
> >  
> >  	/* Roll the transaction. */
> >  	error = xfs_trans_roll(&sc->tp);
> > @@ -138,9 +141,12 @@ xrep_roll_ag_trans(
> >  		goto out_release;
> >  
> >  	/* Join AG headers to the new transaction. */
> > -	xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
> > -	xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
> > -	xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp);
> > +	if (sc->sa.agi_bp)
> > +		xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
> > +	if (sc->sa.agf_bp)
> > +		xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
> > +	if (sc->sa.agfl_bp)
> > +		xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp);
> >  
> >  	return 0;
> >  
> > @@ -150,9 +156,12 @@ xrep_roll_ag_trans(
> >  	 * buffers will be released during teardown on our way out
> >  	 * of the kernel.
> >  	 */
> > -	xfs_trans_bhold_release(sc->tp, sc->sa.agi_bp);
> > -	xfs_trans_bhold_release(sc->tp, sc->sa.agf_bp);
> > -	xfs_trans_bhold_release(sc->tp, sc->sa.agfl_bp);
> > +	if (sc->sa.agi_bp)
> > +		xfs_trans_bhold_release(sc->tp, sc->sa.agi_bp);
> > +	if (sc->sa.agf_bp)
> > +		xfs_trans_bhold_release(sc->tp, sc->sa.agf_bp);
> > +	if (sc->sa.agfl_bp)
> > +		xfs_trans_bhold_release(sc->tp, sc->sa.agfl_bp);
> >  
> >  	return error;
> >  }
> > diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
> > index 5a4e92221916..1d283360b5ab 100644
> > --- a/fs/xfs/scrub/repair.h
> > +++ b/fs/xfs/scrub/repair.h
> > @@ -58,6 +58,8 @@ int xrep_ino_dqattach(struct xfs_scrub *sc);
> >  
> >  int xrep_probe(struct xfs_scrub *sc);
> >  int xrep_superblock(struct xfs_scrub *sc);
> > +int xrep_agf(struct xfs_scrub *sc);
> > +int xrep_agfl(struct xfs_scrub *sc);
> >  
> >  #else
> >  
> > @@ -81,6 +83,8 @@ xrep_calc_ag_resblks(
> >  
> >  #define xrep_probe			xrep_notsupported
> >  #define xrep_superblock			xrep_notsupported
> > +#define xrep_agf			xrep_notsupported
> > +#define xrep_agfl			xrep_notsupported
> >  
> >  #endif /* CONFIG_XFS_ONLINE_REPAIR */
> >  
> > diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> > index 6efb926f3cf8..1e8a17c8e2b9 100644
> > --- a/fs/xfs/scrub/scrub.c
> > +++ b/fs/xfs/scrub/scrub.c
> > @@ -214,7 +214,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
> >  		.type	= ST_PERAG,
> >  		.setup	= xchk_setup_fs,
> >  		.scrub	= xchk_agf,
> > -		.repair	= xrep_notsupported,
> > +		.repair	= xrep_agf,
> >  	},
> >  	[XFS_SCRUB_TYPE_AGFL]= {	/* agfl */
> >  		.type	= ST_PERAG,
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [PATCH 04/16] xfs: repair the AGF
  2018-07-27 16:02     ` Darrick J. Wong
@ 2018-07-27 16:25       ` Brian Foster
  2018-07-27 18:19         ` Darrick J. Wong
  0 siblings, 1 reply; 26+ messages in thread
From: Brian Foster @ 2018-07-27 16:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, allison.henderson

On Fri, Jul 27, 2018 at 09:02:38AM -0700, Darrick J. Wong wrote:
> On Fri, Jul 27, 2018 at 10:23:48AM -0400, Brian Foster wrote:
> > On Wed, Jul 25, 2018 at 05:19:55PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Regenerate the AGF from the rmap data.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > 
> > Mostly seems sane to me. I still need to come up to speed on the broader
> > xfs_scrub context. A few comments in the meantime..
> 
> <nod> Thanks for taking a look at this series. :)
> 
> > >  fs/xfs/scrub/agheader_repair.c |  366 ++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/scrub/repair.c          |   27 ++-
> > >  fs/xfs/scrub/repair.h          |    4 
> > >  fs/xfs/scrub/scrub.c           |    2 
> > >  4 files changed, 389 insertions(+), 10 deletions(-)
> > > 
> > > 
> > > diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
> > > index 1e96621ece3a..938af216cb1c 100644
> > > --- a/fs/xfs/scrub/agheader_repair.c
> > > +++ b/fs/xfs/scrub/agheader_repair.c
> > ...
> > > @@ -54,3 +61,362 @@ xrep_superblock(
> > >  	xfs_trans_log_buf(sc->tp, bp, 0, BBTOB(bp->b_length) - 1);
> > >  	return error;
> > >  }
> > ...
> > > +/* Update all AGF fields which derive from btree contents. */
> > > +STATIC int
> > > +xrep_agf_calc_from_btrees(
> > > +	struct xfs_scrub	*sc,
> > > +	struct xfs_buf		*agf_bp)
> > > +{
> > > +	struct xrep_agf_allocbt	raa = { .sc = sc };
> > > +	struct xfs_btree_cur	*cur = NULL;
> > > +	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agf_bp);
> > > +	struct xfs_mount	*mp = sc->mp;
> > > +	xfs_agblock_t		btreeblks;
> > > +	xfs_agblock_t		blocks;
> > > +	int			error;
> > > +
> > > +	/* Update the AGF counters from the bnobt. */
> > > +	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
> > > +			XFS_BTNUM_BNO);
> > > +	error = xfs_alloc_query_all(cur, xrep_agf_walk_allocbt, &raa);
> > > +	if (error)
> > > +		goto err;
> > > +	error = xfs_btree_count_blocks(cur, &blocks);
> > > +	if (error)
> > > +		goto err;
> > > +	xfs_btree_del_cursor(cur, error);
> > > +	btreeblks = blocks - 1;
> > 
> > Why the -1? We don't count the root or something?
> 
> The AGF btreeblks field only counts the number of blocks added to the
> bno/cnt/rmapbt since they were initialized (each with a single root
> block).  I find it a little strange not to count the root, but oh well.
> 

Got it.
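
For reference, a minimal sketch of that accounting as I understand it from
the explanation above.  The struct and function names here are hypothetical
and are not taken from the patch; the only thing carried over is the rule
that each of the bno/cnt/rmap btrees contributes (blocks - 1) to
agf_btreeblks because its root block is not counted.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical container for the per-btree block counts that a
 * block-counting walk would report.  Not a kernel structure.
 */
struct ag_btree_counts {
	uint32_t	bnobt_blocks;
	uint32_t	cntbt_blocks;
	uint32_t	rmapbt_blocks;
};

/*
 * Each btree starts life with a single root block, and that root is
 * excluded from the btreeblks total, so every tree contributes
 * (blocks - 1).
 */
static uint32_t
calc_agf_btreeblks(
	const struct ag_btree_counts	*c)
{
	/* A btree always has at least its root block. */
	assert(c->bnobt_blocks >= 1);
	assert(c->cntbt_blocks >= 1);
	assert(c->rmapbt_blocks >= 1);

	return (c->bnobt_blocks - 1) +
	       (c->cntbt_blocks - 1) +
	       (c->rmapbt_blocks - 1);
}
```

So three freshly initialized single-block btrees yield a total of zero.
Note that in the quoted patch agf_rmap_blocks is set to the full rmapbt
block count, root included; only the btreeblks total drops the roots.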

> > > +	agf->agf_freeblks = cpu_to_be32(raa.freeblks);
> > > +	agf->agf_longest = cpu_to_be32(raa.longest);
> > > +
> > > +	/* Update the AGF counters from the cntbt. */
> > > +	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
> > > +			XFS_BTNUM_CNT);
> > > +	error = xfs_btree_count_blocks(cur, &blocks);
> > > +	if (error)
> > > +		goto err;
> > > +	xfs_btree_del_cursor(cur, error);
> > > +	btreeblks += blocks - 1;
> > > +
> > > +	/* Update the AGF counters from the rmapbt. */
> > > +	cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno);
> > > +	error = xfs_btree_count_blocks(cur, &blocks);
> > > +	if (error)
> > > +		goto err;
> > > +	xfs_btree_del_cursor(cur, error);
> > > +	agf->agf_rmap_blocks = cpu_to_be32(blocks);
> > > +	btreeblks += blocks - 1;
> > > +
> > > +	agf->agf_btreeblks = cpu_to_be32(btreeblks);
> > > +
> > > +	/* Update the AGF counters from the refcountbt. */
> > > +	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > > +		cur = xfs_refcountbt_init_cursor(mp, sc->tp, agf_bp,
> > > +				sc->sa.agno, NULL);
> > 
> > FYI this fails to compile on for-next (dfops param has been removed).
> 
> Yeah, I'm working on a rebase to for-next (once I settle on the locking
> question in hch's "reduce cow lookups" series).
> 
> > > +		error = xfs_btree_count_blocks(cur, &blocks);
> > > +		if (error)
> > > +			goto err;
> > > +		xfs_btree_del_cursor(cur, error);
> > > +		agf->agf_refcount_blocks = cpu_to_be32(blocks);
> > > +	}
> > > +
> > > +	return 0;
> > > +err:
> > > +	xfs_btree_del_cursor(cur, error);
> > > +	return error;
> > > +}
> > > +
...
> > > +/* Repair the AGF. v5 filesystems only. */
> > > +int
> > > +xrep_agf(
> > > +	struct xfs_scrub		*sc)
> > > +{
> > > +	struct xrep_find_ag_btree	fab[XREP_AGF_MAX] = {
> > > +		[XREP_AGF_BNOBT] = {
> > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > +			.buf_ops = &xfs_allocbt_buf_ops,
> > > +			.magic = XFS_ABTB_CRC_MAGIC,
> > > +		},
> > > +		[XREP_AGF_CNTBT] = {
> > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > +			.buf_ops = &xfs_allocbt_buf_ops,
> > > +			.magic = XFS_ABTC_CRC_MAGIC,
> > > +		},
> > > +		[XREP_AGF_RMAPBT] = {
> > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > +			.buf_ops = &xfs_rmapbt_buf_ops,
> > > +			.magic = XFS_RMAP_CRC_MAGIC,
> > > +		},
> > > +		[XREP_AGF_REFCOUNTBT] = {
> > > +			.rmap_owner = XFS_RMAP_OWN_REFC,
> > > +			.buf_ops = &xfs_refcountbt_buf_ops,
> > > +			.magic = XFS_REFC_CRC_MAGIC,
> > > +		},
> > > +		[XREP_AGF_END] = {
> > > +			.buf_ops = NULL,
> > > +		},
> > > +	};
> > > +	struct xfs_agf			old_agf;
> > > +	struct xfs_mount		*mp = sc->mp;
> > > +	struct xfs_buf			*agf_bp;
> > > +	struct xfs_buf			*agfl_bp;
> > > +	struct xfs_agf			*agf;
> > > +	int				error;
> > > +
> > > +	/* We require the rmapbt to rebuild anything. */
> > > +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> > > +		return -EOPNOTSUPP;
> > > +
> > > +	xchk_perag_get(sc->mp, &sc->sa);
> > > +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> > > +			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGF_DADDR(mp)),
> > > +			XFS_FSS_TO_BB(mp, 1), 0, &agf_bp, NULL);
> > > +	if (error)
> > > +		return error;
> > > +	agf_bp->b_ops = &xfs_agf_buf_ops;
> > 
> > Any reason we don't call xfs_read_agf() here? It looks like we use the
> > similar helper for the agfl below.
> 
> We're grabbing the agf buffer without read verifiers so that we can
> reinitialize it.  Note that scrub tries xfs_read_agf, and if it fails
> with -EFSCORRUPTED/-EFSBADCRC it marks the agf as corrupt, so it's
> possible that sc->sa.agf_bp is still null.
> 

Ah, makes sense. I missed the NULL b_ops..
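
To check my understanding, here is a toy model of the pattern: read the
buffer with no verifier so that a corrupt header does not fail the read,
then attach the ops afterwards so that the rewritten contents get verified
from then on.  Everything below (toy_buf, toy_buf_ops, the magic value) is
made up for illustration and is not the real xfs_buf API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAGIC	0x58414746u	/* value chosen for this sketch only */

struct toy_buf;

/* Hypothetical verifier table, loosely modelled on buffer ops. */
struct toy_buf_ops {
	int	(*verify_read)(const struct toy_buf *bp);
};

struct toy_buf {
	uint32_t			magic;
	const struct toy_buf_ops	*ops;
};

/* Reject any buffer whose magic number is wrong. */
static int
toy_verify(const struct toy_buf *bp)
{
	return bp->magic == TOY_MAGIC ? 0 : -1;
}

static const struct toy_buf_ops toy_ops = {
	.verify_read	= toy_verify,
};

/* Read a buffer; the verifier runs only if ops is non-NULL. */
static int
toy_read_buf(struct toy_buf *bp, const struct toy_buf_ops *ops)
{
	assert(bp);
	if (ops) {
		int	error = ops->verify_read(bp);

		if (error)
			return error;
	}
	bp->ops = ops;
	return 0;
}

/*
 * Repair path: skip verification on read so that a damaged header can
 * still be loaded, then attach the ops for later use.
 */
static int
toy_read_for_repair(struct toy_buf *bp)
{
	int	error = toy_read_buf(bp, NULL);

	if (error)
		return error;
	bp->ops = &toy_ops;
	return 0;
}
```

In this model a verified read of a damaged buffer fails, while the repair
read succeeds and leaves the ops attached, which is how I read the
NULL-ops xfs_trans_read_buf() call followed by the b_ops assignment.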

> This probably could have been trans_get_buf though...
> 
> > > +	agf = XFS_BUF_TO_AGF(agf_bp);
> > > +
> > > +	/*
> > > +	 * Load the AGFL so that we can screen out OWN_AG blocks that are on
> > > +	 * the AGFL now; these blocks might have once been part of the
> > > +	 * bno/cnt/rmap btrees but are not now.  This is a chicken and egg
> > > +	 * problem: the AGF is corrupt, so we have to trust the AGFL contents
> > > +	 * because we can't do any serious cross-referencing with any of the
> > > +	 * btrees rooted in the AGF.  If the AGFL contents are obviously bad
> > > +	 * then we'll bail out.
> > > +	 */
> > > +	error = xfs_alloc_read_agfl(mp, sc->tp, sc->sa.agno, &agfl_bp);
> > > +	if (error)
> > > +		return error;
> > > +
> > > +	/*
> > > +	 * Spot-check the AGFL blocks; if they're obviously corrupt then
> > > +	 * there's nothing we can do but bail out.
> > > +	 */
> > 
> > Why? Can't we reset the agfl, or is that handled elsewhere?
> 
> It's handled in xrep_agfl, but userspace will have to call us again to
> fix the agfl and then call us a third time about the agf repair.
> 

Ok, this was one of the things I feel like I don't have enough context
on wrt online repair: in general, what dependent structures we expect
to be consistent in order to repair some other interrelated structure.
Userspace repair is straightforward in this regard since we slurp the
whole fs into memory, adjust the global state as we go, then essentially
regenerate new metadata based on the finalized state.

For online repair, it sounds like we're potentially limited because
repair of one structure may always depend on some other subset of
metadata being consistent, right? If so, is the goal of online repair to
essentially "do the best we can but otherwise the most severe
corruptions may have to always fall back to xfs_repair?"

So in this particular case, we expect a sane agfl and otherwise buzz off
because 1.) this is a targeted agf repair request and 2.) we have a
separate request to deal with the agfl. It sounds like the smarts to
understand how we might have to jump back and forth between them is in
userspace, so the end-user doesn't necessarily have to understand the
dependency.

Brian

> (xfs_scrub does this, naturally...)
> 
> --D
> 
> > Brian
> > 
> > > +	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(agf_bp), agfl_bp,
> > > +			xrep_agf_check_agfl_block, sc);
> > > +	if (error)
> > > +		return error;
> > > +
> > > +	/*
> > > +	 * Find the AGF btree roots.  This is also a chicken-and-egg situation;
> > > +	 * see the function for more details.
> > > +	 */
> > > +	error = xrep_agf_find_btrees(sc, agf_bp, fab, agfl_bp);
> > > +	if (error)
> > > +		return error;
> > > +
> > > +	/* Start rewriting the header and implant the btrees we found. */
> > > +	xrep_agf_init_header(sc, agf_bp, &old_agf);
> > > +	xrep_agf_set_roots(sc, agf, fab);
> > > +	error = xrep_agf_calc_from_btrees(sc, agf_bp);
> > > +	if (error)
> > > +		goto out_revert;
> > > +
> > > +	/* Commit the changes and reinitialize incore state. */
> > > +	return xrep_agf_commit_new(sc, agf_bp);
> > > +
> > > +out_revert:
> > > +	/* Mark the incore AGF state stale and revert the AGF. */
> > > +	sc->sa.pag->pagf_init = 0;
> > > +	memcpy(agf, &old_agf, sizeof(old_agf));
> > > +	return error;
> > > +}
> > > diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> > > index 85b048b341a0..17cf48564390 100644
> > > --- a/fs/xfs/scrub/repair.c
> > > +++ b/fs/xfs/scrub/repair.c
> > > @@ -128,9 +128,12 @@ xrep_roll_ag_trans(
> > >  	int			error;
> > >  
> > >  	/* Keep the AG header buffers locked so we can keep going. */
> > > -	xfs_trans_bhold(sc->tp, sc->sa.agi_bp);
> > > -	xfs_trans_bhold(sc->tp, sc->sa.agf_bp);
> > > -	xfs_trans_bhold(sc->tp, sc->sa.agfl_bp);
> > > +	if (sc->sa.agi_bp)
> > > +		xfs_trans_bhold(sc->tp, sc->sa.agi_bp);
> > > +	if (sc->sa.agf_bp)
> > > +		xfs_trans_bhold(sc->tp, sc->sa.agf_bp);
> > > +	if (sc->sa.agfl_bp)
> > > +		xfs_trans_bhold(sc->tp, sc->sa.agfl_bp);
> > >  
> > >  	/* Roll the transaction. */
> > >  	error = xfs_trans_roll(&sc->tp);
> > > @@ -138,9 +141,12 @@ xrep_roll_ag_trans(
> > >  		goto out_release;
> > >  
> > >  	/* Join AG headers to the new transaction. */
> > > -	xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
> > > -	xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
> > > -	xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp);
> > > +	if (sc->sa.agi_bp)
> > > +		xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
> > > +	if (sc->sa.agf_bp)
> > > +		xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
> > > +	if (sc->sa.agfl_bp)
> > > +		xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp);
> > >  
> > >  	return 0;
> > >  
> > > @@ -150,9 +156,12 @@ xrep_roll_ag_trans(
> > >  	 * buffers will be released during teardown on our way out
> > >  	 * of the kernel.
> > >  	 */
> > > -	xfs_trans_bhold_release(sc->tp, sc->sa.agi_bp);
> > > -	xfs_trans_bhold_release(sc->tp, sc->sa.agf_bp);
> > > -	xfs_trans_bhold_release(sc->tp, sc->sa.agfl_bp);
> > > +	if (sc->sa.agi_bp)
> > > +		xfs_trans_bhold_release(sc->tp, sc->sa.agi_bp);
> > > +	if (sc->sa.agf_bp)
> > > +		xfs_trans_bhold_release(sc->tp, sc->sa.agf_bp);
> > > +	if (sc->sa.agfl_bp)
> > > +		xfs_trans_bhold_release(sc->tp, sc->sa.agfl_bp);
> > >  
> > >  	return error;
> > >  }
> > > diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
> > > index 5a4e92221916..1d283360b5ab 100644
> > > --- a/fs/xfs/scrub/repair.h
> > > +++ b/fs/xfs/scrub/repair.h
> > > @@ -58,6 +58,8 @@ int xrep_ino_dqattach(struct xfs_scrub *sc);
> > >  
> > >  int xrep_probe(struct xfs_scrub *sc);
> > >  int xrep_superblock(struct xfs_scrub *sc);
> > > +int xrep_agf(struct xfs_scrub *sc);
> > > +int xrep_agfl(struct xfs_scrub *sc);
> > >  
> > >  #else
> > >  
> > > @@ -81,6 +83,8 @@ xrep_calc_ag_resblks(
> > >  
> > >  #define xrep_probe			xrep_notsupported
> > >  #define xrep_superblock			xrep_notsupported
> > > +#define xrep_agf			xrep_notsupported
> > > +#define xrep_agfl			xrep_notsupported
> > >  
> > >  #endif /* CONFIG_XFS_ONLINE_REPAIR */
> > >  
> > > diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> > > index 6efb926f3cf8..1e8a17c8e2b9 100644
> > > --- a/fs/xfs/scrub/scrub.c
> > > +++ b/fs/xfs/scrub/scrub.c
> > > @@ -214,7 +214,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
> > >  		.type	= ST_PERAG,
> > >  		.setup	= xchk_setup_fs,
> > >  		.scrub	= xchk_agf,
> > > -		.repair	= xrep_notsupported,
> > > +		.repair	= xrep_agf,
> > >  	},
> > >  	[XFS_SCRUB_TYPE_AGFL]= {	/* agfl */
> > >  		.type	= ST_PERAG,
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH 04/16] xfs: repair the AGF
  2018-07-27 16:25       ` Brian Foster
@ 2018-07-27 18:19         ` Darrick J. Wong
  0 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-07-27 18:19 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, david, allison.henderson

On Fri, Jul 27, 2018 at 12:25:56PM -0400, Brian Foster wrote:
> On Fri, Jul 27, 2018 at 09:02:38AM -0700, Darrick J. Wong wrote:
> > On Fri, Jul 27, 2018 at 10:23:48AM -0400, Brian Foster wrote:
> > > On Wed, Jul 25, 2018 at 05:19:55PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Regenerate the AGF from the rmap data.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > 
> > > Mostly seems sane to me. I still need to come up to speed on the broader
> > > xfs_scrub context. A few comments in the meantime..
> > 
> > <nod> Thanks for taking a look at this series. :)
> > 
> > > >  fs/xfs/scrub/agheader_repair.c |  366 ++++++++++++++++++++++++++++++++++++++++
> > > >  fs/xfs/scrub/repair.c          |   27 ++-
> > > >  fs/xfs/scrub/repair.h          |    4 
> > > >  fs/xfs/scrub/scrub.c           |    2 
> > > >  4 files changed, 389 insertions(+), 10 deletions(-)
> > > > 
> > > > 
> > > > diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
> > > > index 1e96621ece3a..938af216cb1c 100644
> > > > --- a/fs/xfs/scrub/agheader_repair.c
> > > > +++ b/fs/xfs/scrub/agheader_repair.c
> > > ...
> > > > @@ -54,3 +61,362 @@ xrep_superblock(
> > > >  	xfs_trans_log_buf(sc->tp, bp, 0, BBTOB(bp->b_length) - 1);
> > > >  	return error;
> > > >  }
> > > ...
> > > > +/* Update all AGF fields which derive from btree contents. */
> > > > +STATIC int
> > > > +xrep_agf_calc_from_btrees(
> > > > +	struct xfs_scrub	*sc,
> > > > +	struct xfs_buf		*agf_bp)
> > > > +{
> > > > +	struct xrep_agf_allocbt	raa = { .sc = sc };
> > > > +	struct xfs_btree_cur	*cur = NULL;
> > > > +	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agf_bp);
> > > > +	struct xfs_mount	*mp = sc->mp;
> > > > +	xfs_agblock_t		btreeblks;
> > > > +	xfs_agblock_t		blocks;
> > > > +	int			error;
> > > > +
> > > > +	/* Update the AGF counters from the bnobt. */
> > > > +	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
> > > > +			XFS_BTNUM_BNO);
> > > > +	error = xfs_alloc_query_all(cur, xrep_agf_walk_allocbt, &raa);
> > > > +	if (error)
> > > > +		goto err;
> > > > +	error = xfs_btree_count_blocks(cur, &blocks);
> > > > +	if (error)
> > > > +		goto err;
> > > > +	xfs_btree_del_cursor(cur, error);
> > > > +	btreeblks = blocks - 1;
> > > 
> > > Why the -1? We don't count the root or something?
> > 
> > The AGF btreeblks field only counts the number of blocks added to the
> > bno/cnt/rmapbt since they were initialized (each with a single root
> > block).  I find it a little strange not to count the root, but oh well.
> > 
> 
> Got it.
> 
> > > > +	agf->agf_freeblks = cpu_to_be32(raa.freeblks);
> > > > +	agf->agf_longest = cpu_to_be32(raa.longest);
> > > > +
> > > > +	/* Update the AGF counters from the cntbt. */
> > > > +	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
> > > > +			XFS_BTNUM_CNT);
> > > > +	error = xfs_btree_count_blocks(cur, &blocks);
> > > > +	if (error)
> > > > +		goto err;
> > > > +	xfs_btree_del_cursor(cur, error);
> > > > +	btreeblks += blocks - 1;
> > > > +
> > > > +	/* Update the AGF counters from the rmapbt. */
> > > > +	cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno);
> > > > +	error = xfs_btree_count_blocks(cur, &blocks);
> > > > +	if (error)
> > > > +		goto err;
> > > > +	xfs_btree_del_cursor(cur, error);
> > > > +	agf->agf_rmap_blocks = cpu_to_be32(blocks);
> > > > +	btreeblks += blocks - 1;
> > > > +
> > > > +	agf->agf_btreeblks = cpu_to_be32(btreeblks);
> > > > +
> > > > +	/* Update the AGF counters from the refcountbt. */
> > > > +	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > > > +		cur = xfs_refcountbt_init_cursor(mp, sc->tp, agf_bp,
> > > > +				sc->sa.agno, NULL);
> > > 
> > > FYI this fails to compile on for-next (dfops param has been removed).
> > 
> > Yeah, I'm working on a rebase to for-next (once I settle on the locking
> > question in hch's "reduce cow lookups" series).
> > 
> > > > +		error = xfs_btree_count_blocks(cur, &blocks);
> > > > +		if (error)
> > > > +			goto err;
> > > > +		xfs_btree_del_cursor(cur, error);
> > > > +		agf->agf_refcount_blocks = cpu_to_be32(blocks);
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +err:
> > > > +	xfs_btree_del_cursor(cur, error);
> > > > +	return error;
> > > > +}
> > > > +
> ...
> > > > +/* Repair the AGF. v5 filesystems only. */
> > > > +int
> > > > +xrep_agf(
> > > > +	struct xfs_scrub		*sc)
> > > > +{
> > > > +	struct xrep_find_ag_btree	fab[XREP_AGF_MAX] = {
> > > > +		[XREP_AGF_BNOBT] = {
> > > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > > +			.buf_ops = &xfs_allocbt_buf_ops,
> > > > +			.magic = XFS_ABTB_CRC_MAGIC,
> > > > +		},
> > > > +		[XREP_AGF_CNTBT] = {
> > > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > > +			.buf_ops = &xfs_allocbt_buf_ops,
> > > > +			.magic = XFS_ABTC_CRC_MAGIC,
> > > > +		},
> > > > +		[XREP_AGF_RMAPBT] = {
> > > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > > +			.buf_ops = &xfs_rmapbt_buf_ops,
> > > > +			.magic = XFS_RMAP_CRC_MAGIC,
> > > > +		},
> > > > +		[XREP_AGF_REFCOUNTBT] = {
> > > > +			.rmap_owner = XFS_RMAP_OWN_REFC,
> > > > +			.buf_ops = &xfs_refcountbt_buf_ops,
> > > > +			.magic = XFS_REFC_CRC_MAGIC,
> > > > +		},
> > > > +		[XREP_AGF_END] = {
> > > > +			.buf_ops = NULL,
> > > > +		},
> > > > +	};
> > > > +	struct xfs_agf			old_agf;
> > > > +	struct xfs_mount		*mp = sc->mp;
> > > > +	struct xfs_buf			*agf_bp;
> > > > +	struct xfs_buf			*agfl_bp;
> > > > +	struct xfs_agf			*agf;
> > > > +	int				error;
> > > > +
> > > > +	/* We require the rmapbt to rebuild anything. */
> > > > +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> > > > +		return -EOPNOTSUPP;
> > > > +
> > > > +	xchk_perag_get(sc->mp, &sc->sa);
> > > > +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> > > > +			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGF_DADDR(mp)),
> > > > +			XFS_FSS_TO_BB(mp, 1), 0, &agf_bp, NULL);
> > > > +	if (error)
> > > > +		return error;
> > > > +	agf_bp->b_ops = &xfs_agf_buf_ops;
> > > 
> > > Any reason we don't call xfs_read_agf() here? It looks like we use the
> > > similar helper for the agfl below.
> > 
> > We're grabbing the agf buffer without read verifiers so that we can
> > reinitialize it.  Note that scrub tries xfs_read_agf, and if it fails
> > with -EFSCORRUPTED/-EFSBADCRC it marks the agf as corrupt, so it's
> > possible that sc->sa.agf_bp is still null.
> > 
> 
> Ah, makes sense. I missed the NULL b_ops..
> 
> > This probably could have been trans_get_buf though...
> > 
> > > > +	agf = XFS_BUF_TO_AGF(agf_bp);
> > > > +
> > > > +	/*
> > > > +	 * Load the AGFL so that we can screen out OWN_AG blocks that are on
> > > > +	 * the AGFL now; these blocks might have once been part of the
> > > > +	 * bno/cnt/rmap btrees but are not now.  This is a chicken and egg
> > > > +	 * problem: the AGF is corrupt, so we have to trust the AGFL contents
> > > > +	 * because we can't do any serious cross-referencing with any of the
> > > > +	 * btrees rooted in the AGF.  If the AGFL contents are obviously bad
> > > > +	 * then we'll bail out.
> > > > +	 */
> > > > +	error = xfs_alloc_read_agfl(mp, sc->tp, sc->sa.agno, &agfl_bp);
> > > > +	if (error)
> > > > +		return error;
> > > > +
> > > > +	/*
> > > > +	 * Spot-check the AGFL blocks; if they're obviously corrupt then
> > > > +	 * there's nothing we can do but bail out.
> > > > +	 */
> > > 
> > > Why? Can't we reset the agfl, or is that handled elsewhere?
> > 
> > It's handled in xrep_agfl, but userspace will have to call us again to
> > fix the agfl and then call us a third time about the agf repair.
> > 
> 
> Ok, this was one of the things I feel like I don't have enough context
> on wrt online repair: in general, what dependent structures we expect
> to be consistent in order to repair some other interrelated structure.
> Userspace repair is straightforward in this regard since we slurp the
> whole fs into memory, adjust the global state as we go, then essentially
> regenerate new metadata based on the finalized state.

<nod>

> For online repair, it sounds like we're potentially limited because
> repair of one structure may always depend on some other subset of
> metadata being consistent, right?

Correct.  There's an implicit assumption here (which is coded into a
warning message in xfs_scrub) that if both the primary metadata and the
secondary metadata (usually the rmapbt) are corrupt, then the fs may be
unrecoverable online...

> If so, is the goal of online repair to essentially "do the best we can
> but otherwise the most severe corruptions may have to always fall back
> to xfs_repair?"

...and so the only recourse is xfs_repair, with the usual caveat that a
severely damaged filesystem might be a total loss.  If the online repair
fails then the fs will be shut down (either due to cancelling a dirty
repair transaction or because xfs_scrub can call FS_IOC_SHUTDOWN after a
failure).

> So in this particular case, we expect a sane agfl and otherwise buzz off
> because 1.) this is a targeted agf repair request and 2.) we have a
> separate request to deal with the agfl. It sounds like the smarts to
> understand how we might have to jump back and forth between them is in
> userspace, so the end-user doesn't necessarily have to understand the
> dependency.

Correct.  Userspace should be running xfs_scrub, which knows in which
order metadata has to be checked, and what dependencies must be
satisfied for repairs to succeed.  While it's possible to invoke
individual repairs manually via xfs_io, in practice nobody but xfstests
should be using that method of invocation.
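
As a rough illustration of the three-call sequence discussed above (AGF
repair refused, AGFL repaired, AGF repair retried), here is a toy model.
The state struct, the two repair stand-ins, and the -1 error value are all
hypothetical; this is not how xfs_scrub is actually structured, only the
ordering it has to honor.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one AG's header state; not a kernel structure. */
struct ag_state {
	bool	agf_ok;
	bool	agfl_ok;
	int	calls;		/* number of repair requests issued */
};

/* Stand-in for an AGFL repair request; always succeeds in this model. */
static int
repair_agfl(struct ag_state *ag)
{
	ag->calls++;
	ag->agfl_ok = true;
	return 0;
}

/* Stand-in for an AGF repair request; bails out if the AGFL looks bad. */
static int
repair_agf(struct ag_state *ag)
{
	ag->calls++;
	if (!ag->agfl_ok)
		return -1;	/* placeholder for a corruption error */
	ag->agf_ok = true;
	return 0;
}

/*
 * Driver: try the AGF first; if that is refused, fix the AGFL and then
 * ask for the AGF repair again.
 */
static int
repair_ag_headers(struct ag_state *ag)
{
	int	error;

	assert(ag);
	error = repair_agf(ag);
	if (!error)
		return 0;

	error = repair_agfl(ag);
	if (error)
		return error;

	return repair_agf(ag);
}
```

With both headers damaged the driver issues three requests; with a sane
AGFL it issues one.  The dependency knowledge lives entirely in the
caller, which is the point: each kernel repair function stays narrowly
targeted.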

--D

> Brian
> 
> > (xfs_scrub does this, naturally...)
> > 
> > --D
> > 
> > > Brian
> > > 
> > > > +	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(agf_bp), agfl_bp,
> > > > +			xrep_agf_check_agfl_block, sc);
> > > > +	if (error)
> > > > +		return error;
> > > > +
> > > > +	/*
> > > > +	 * Find the AGF btree roots.  This is also a chicken-and-egg situation;
> > > > +	 * see the function for more details.
> > > > +	 */
> > > > +	error = xrep_agf_find_btrees(sc, agf_bp, fab, agfl_bp);
> > > > +	if (error)
> > > > +		return error;
> > > > +
> > > > +	/* Start rewriting the header and implant the btrees we found. */
> > > > +	xrep_agf_init_header(sc, agf_bp, &old_agf);
> > > > +	xrep_agf_set_roots(sc, agf, fab);
> > > > +	error = xrep_agf_calc_from_btrees(sc, agf_bp);
> > > > +	if (error)
> > > > +		goto out_revert;
> > > > +
> > > > +	/* Commit the changes and reinitialize incore state. */
> > > > +	return xrep_agf_commit_new(sc, agf_bp);
> > > > +
> > > > +out_revert:
> > > > +	/* Mark the incore AGF state stale and revert the AGF. */
> > > > +	sc->sa.pag->pagf_init = 0;
> > > > +	memcpy(agf, &old_agf, sizeof(old_agf));
> > > > +	return error;
> > > > +}
> > > > diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> > > > index 85b048b341a0..17cf48564390 100644
> > > > --- a/fs/xfs/scrub/repair.c
> > > > +++ b/fs/xfs/scrub/repair.c
> > > > @@ -128,9 +128,12 @@ xrep_roll_ag_trans(
> > > >  	int			error;
> > > >  
> > > >  	/* Keep the AG header buffers locked so we can keep going. */
> > > > -	xfs_trans_bhold(sc->tp, sc->sa.agi_bp);
> > > > -	xfs_trans_bhold(sc->tp, sc->sa.agf_bp);
> > > > -	xfs_trans_bhold(sc->tp, sc->sa.agfl_bp);
> > > > +	if (sc->sa.agi_bp)
> > > > +		xfs_trans_bhold(sc->tp, sc->sa.agi_bp);
> > > > +	if (sc->sa.agf_bp)
> > > > +		xfs_trans_bhold(sc->tp, sc->sa.agf_bp);
> > > > +	if (sc->sa.agfl_bp)
> > > > +		xfs_trans_bhold(sc->tp, sc->sa.agfl_bp);
> > > >  
> > > >  	/* Roll the transaction. */
> > > >  	error = xfs_trans_roll(&sc->tp);
> > > > @@ -138,9 +141,12 @@ xrep_roll_ag_trans(
> > > >  		goto out_release;
> > > >  
> > > >  	/* Join AG headers to the new transaction. */
> > > > -	xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
> > > > -	xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
> > > > -	xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp);
> > > > +	if (sc->sa.agi_bp)
> > > > +		xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
> > > > +	if (sc->sa.agf_bp)
> > > > +		xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
> > > > +	if (sc->sa.agfl_bp)
> > > > +		xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp);
> > > >  
> > > >  	return 0;
> > > >  
> > > > @@ -150,9 +156,12 @@ xrep_roll_ag_trans(
> > > >  	 * buffers will be released during teardown on our way out
> > > >  	 * of the kernel.
> > > >  	 */
> > > > -	xfs_trans_bhold_release(sc->tp, sc->sa.agi_bp);
> > > > -	xfs_trans_bhold_release(sc->tp, sc->sa.agf_bp);
> > > > -	xfs_trans_bhold_release(sc->tp, sc->sa.agfl_bp);
> > > > +	if (sc->sa.agi_bp)
> > > > +		xfs_trans_bhold_release(sc->tp, sc->sa.agi_bp);
> > > > +	if (sc->sa.agf_bp)
> > > > +		xfs_trans_bhold_release(sc->tp, sc->sa.agf_bp);
> > > > +	if (sc->sa.agfl_bp)
> > > > +		xfs_trans_bhold_release(sc->tp, sc->sa.agfl_bp);
> > > >  
> > > >  	return error;
> > > >  }
> > > > diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
> > > > index 5a4e92221916..1d283360b5ab 100644
> > > > --- a/fs/xfs/scrub/repair.h
> > > > +++ b/fs/xfs/scrub/repair.h
> > > > @@ -58,6 +58,8 @@ int xrep_ino_dqattach(struct xfs_scrub *sc);
> > > >  
> > > >  int xrep_probe(struct xfs_scrub *sc);
> > > >  int xrep_superblock(struct xfs_scrub *sc);
> > > > +int xrep_agf(struct xfs_scrub *sc);
> > > > +int xrep_agfl(struct xfs_scrub *sc);
> > > >  
> > > >  #else
> > > >  
> > > > @@ -81,6 +83,8 @@ xrep_calc_ag_resblks(
> > > >  
> > > >  #define xrep_probe			xrep_notsupported
> > > >  #define xrep_superblock			xrep_notsupported
> > > > +#define xrep_agf			xrep_notsupported
> > > > +#define xrep_agfl			xrep_notsupported
> > > >  
> > > >  #endif /* CONFIG_XFS_ONLINE_REPAIR */
> > > >  
> > > > diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> > > > index 6efb926f3cf8..1e8a17c8e2b9 100644
> > > > --- a/fs/xfs/scrub/scrub.c
> > > > +++ b/fs/xfs/scrub/scrub.c
> > > > @@ -214,7 +214,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
> > > >  		.type	= ST_PERAG,
> > > >  		.setup	= xchk_setup_fs,
> > > >  		.scrub	= xchk_agf,
> > > > -		.repair	= xrep_notsupported,
> > > > +		.repair	= xrep_agf,
> > > >  	},
> > > >  	[XFS_SCRUB_TYPE_AGFL]= {	/* agfl */
> > > >  		.type	= ST_PERAG,
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH v17 00/16] xfs-4.19: online repair support
@ 2018-07-26  0:12 Darrick J. Wong
  0 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-07-26  0:12 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, allison.henderson

Hi all,

This is the seventeenth revision of a patchset that adds to XFS kernel
support for online metadata scrubbing and repair.  There aren't any
on-disk format changes.

New for this version of the patch series are fixes for numerous review
comments that came from Dave and Allison.  The long prefixes of the
previous versions have been drastically shortened.  Comments about the
strategies used to repair broken parts of the filesystem have been
expanded where reviewers thought it confusing.  A few data structures
have been renamed to reflect more accurately what they do.

Note, this series does not include any of the controversial repair
functionality that requires fs freezing; that has been deferred to a
later posting.

The first patch pushes a transaction pointer through the per-AG
reservation code so that scrub can reinitialize the per-AG reservations
after repairing metadata while maintaining the AG header lock.

The next two patches move the 'extent list' functionality into a
separate file and rename it xfs_bitmap, since that's what the data
structure actually represents.

Patches 4-14 implement reconstruction of the AGF/AGI/AGFL headers, the
free space btrees, the inode btrees, the inodes, the inode forks, the
inode block maps, symbolic links, and extended attributes.

Patch 15 augments scrub to rebuild extended attributes when any of the
attr blocks are fragmented.

Patch 16 implements reconstruction of quota blocks.

If you're going to start using this mess, you probably ought to just
pull from my git trees.  The kernel patches[1] should apply against
4.18-rc6.  xfsprogs[2] and xfstests[3] can be found in their usual
places.  The git trees contain all four series' worth of changes.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

[1] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=djwong-devel
[2] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=djwong-devel
[3] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=djwong-devel


Thread overview: 26+ messages
2018-07-26  0:19 [PATCH v17 00/16] xfs-4.19: online repair support Darrick J. Wong
2018-07-26  0:19 ` [PATCH 01/16] xfs: pass transaction lock while setting up agresv on cyclic metadata Darrick J. Wong
2018-07-27 14:21   ` Brian Foster
2018-07-26  0:19 ` [PATCH 02/16] xfs: move the repair extent list into its own file Darrick J. Wong
2018-07-27 14:21   ` Brian Foster
2018-07-26  0:19 ` [PATCH 03/16] xfs: refactor the xrep_extent_list into xfs_bitmap Darrick J. Wong
2018-07-27 14:21   ` Brian Foster
2018-07-27 15:52     ` Darrick J. Wong
2018-07-26  0:19 ` [PATCH 04/16] xfs: repair the AGF Darrick J. Wong
2018-07-27 14:23   ` Brian Foster
2018-07-27 16:02     ` Darrick J. Wong
2018-07-27 16:25       ` Brian Foster
2018-07-27 18:19         ` Darrick J. Wong
2018-07-26  0:20 ` [PATCH 05/16] xfs: repair the AGFL Darrick J. Wong
2018-07-26  0:20 ` [PATCH 06/16] xfs: repair the AGI Darrick J. Wong
2018-07-26  0:20 ` [PATCH 07/16] xfs: repair free space btrees Darrick J. Wong
2018-07-26  0:21 ` [PATCH 08/16] xfs: repair inode btrees Darrick J. Wong
2018-07-26  0:21 ` [PATCH 09/16] xfs: repair refcount btrees Darrick J. Wong
2018-07-26  0:21 ` [PATCH 10/16] xfs: repair inode records Darrick J. Wong
2018-07-26  0:21 ` [PATCH 11/16] xfs: zap broken inode forks Darrick J. Wong
2018-07-26  0:21 ` [PATCH 12/16] xfs: repair inode block maps Darrick J. Wong
2018-07-26  0:21 ` [PATCH 13/16] xfs: repair damaged symlinks Darrick J. Wong
2018-07-26  0:21 ` [PATCH 14/16] xfs: repair extended attributes Darrick J. Wong
2018-07-26  0:21 ` [PATCH 15/16] xfs: scrub should set preen if attr leaf has holes Darrick J. Wong
2018-07-26  0:21 ` [PATCH 16/16] xfs: repair quotas Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2018-07-26  0:12 [PATCH v17 00/16] xfs-4.19: online repair support Darrick J. Wong
