* [PATCH v15.2 00/14] xfs-4.18: online repair support
@ 2018-05-30 19:30 Darrick J. Wong
  2018-05-30 19:30 ` [PATCH 01/14] xfs: repair the AGF and AGFL Darrick J. Wong
                   ` (13 more replies)
  0 siblings, 14 replies; 35+ messages in thread
From: Darrick J. Wong @ 2018-05-30 19:30 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

Hi all,

This is the fifteenth revision of a patchset that adds support to the
XFS kernel for online metadata scrubbing and repair.  There aren't any
on-disk format changes.

New in v15 of the patch series is the ability to scavenge broken
attribute forks for intact extended attributes, to repair minor
corruptions in the on-disk quota records, and to perform quotacheck
online.  The series was rebased atop for-next for 4.18 and a number
of minor bitrot bugs were fixed.

Patches 1-14 introduce the online repair functionality for space
metadata and certain file data.  Our general strategy for rebuilding
damaged primary metadata is to rebuild the structure completely from
secondary metadata and free the old structure after the fact; we do not
try to salvage anything.  Consequently, online repair requires rmapbt.
Rebuilding the secondary metadata (rmap) is much harder -- due to our
locking rules (primary and then secondary) we have to shut down the
filesystem temporarily while we scan all the primary metadata for data
to put in the new secondary structure.

Reconstructing inodes is difficult -- the ability to rebuild files
depends on the filesystem being able to load an inode (xfs_iget), which
means repair has to know how to zap any part of an inode record that
might trigger corruption errors from iget.  To that end, we can now
reset most of an inode record or an inode fork so that we can rebuild
the file.

The refcount rebuilder is more or less the same algorithm that
xfs_repair uses, but modified to reflect the constraints of running in
kernel space.

For rmap rebuilds, we cannot have anything on the filesystem taking
exclusive locks and we cannot have any allocation activity at all.
Therefore, we start by freezing the filesystem to allow other
transactions to finish.  Next, we scan all other AG metadata structures,
every inode, and every block map to reconstruct the rmap data.  Then, we
reinitialize the rmap btree root and reload the rmap btree.  Finally, we
release all the resources we grabbed and the filesystem returns to
normal.

The extended attribute repair function uses a different strategy from
the other repair code.  Since there is no secondary metadata for
extended attributes, we can't simply rebuild from an alternate data
source.  Therefore, this repairer simply walks through the blocks in the
attribute fork looking for attribute names and values that appear to be
intact, zaps the attr fork, and re-adds the collected names and values
to the new fork.  This enables us to trigger optimization notices for
attribute blocks with holes.

Quota repairs are fairly straightforward -- repair anything wrong with
the inode data fork, eliminate garbage extents, and then iterate all the
dquot blocks fixing up things that the dquot buffer verifier will
complain about.  This should leave the quota inode in good enough shape
online quotacheck!  Here we reuse the same fs freezing mechanism as in
the rmap repair to block all other filesystem users.  Then we zero all
the quota counters, iterate all the inodes in the system to recalculate
the counts, and log all the dquots to disk.  We of course clear the CHKD
flags before starting out, so if we crash midway through, the mount time
quotacheck will run.

Looking forward, the parent pointer feature that Allison Henderson is
working on will enable us to reconstruct directories, at which point
we'll be able to reconstruct most of a lightly damaged filesystem.  But
that's future talk.

If you're going to start using this mess, you probably ought to just
pull from my git trees.  The kernel patches[1] should apply against
4.17-rc7.  xfsprogs[2] and xfstests[3] can be found in their usual
places.  The git trees contain all four series' worth of changes.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

[1] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=djwong-devel
[2] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=djwong-devel
[3] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=djwong-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 01/14] xfs: repair the AGF and AGFL
  2018-05-30 19:30 [PATCH v15.2 00/14] xfs-4.18: online repair support Darrick J. Wong
@ 2018-05-30 19:30 ` Darrick J. Wong
  2018-06-04  1:52   ` Dave Chinner
  2018-05-30 19:30 ` [PATCH 02/14] xfs: repair the AGI Darrick J. Wong
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 35+ messages in thread
From: Darrick J. Wong @ 2018-05-30 19:30 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Regenerate the AGF and AGFL from the rmap data.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/agheader_repair.c |  484 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.c          |   30 ++
 fs/xfs/scrub/repair.h          |    6 
 fs/xfs/scrub/scrub.c           |    4 
 fs/xfs/xfs_trans.c             |   54 ++++
 fs/xfs/xfs_trans.h             |    2 
 6 files changed, 578 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
index 8b91e9ebe1e7..0f794d27382a 100644
--- a/fs/xfs/scrub/agheader_repair.c
+++ b/fs/xfs/scrub/agheader_repair.c
@@ -31,12 +31,18 @@
 #include "xfs_sb.h"
 #include "xfs_inode.h"
 #include "xfs_alloc.h"
+#include "xfs_alloc_btree.h"
 #include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
 #include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_refcount.h"
+#include "xfs_refcount_btree.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/trace.h"
+#include "scrub/repair.h"
 
 /* Superblock */
 
@@ -68,3 +74,481 @@ xfs_repair_superblock(
 	xfs_trans_log_buf(sc->tp, bp, 0, BBTOB(bp->b_length) - 1);
 	return error;
 }
+
+/* AGF */
+
+struct xfs_repair_agf_allocbt {
+	struct xfs_scrub_context	*sc;
+	xfs_agblock_t			freeblks;
+	xfs_agblock_t			longest;
+};
+
+/* Record free space shape information. */
+STATIC int
+xfs_repair_agf_walk_allocbt(
+	struct xfs_btree_cur		*cur,
+	struct xfs_alloc_rec_incore	*rec,
+	void				*priv)
+{
+	struct xfs_repair_agf_allocbt	*raa = priv;
+	int				error = 0;
+
+	if (xfs_scrub_should_terminate(raa->sc, &error))
+		return error;
+
+	raa->freeblks += rec->ar_blockcount;
+	if (rec->ar_blockcount > raa->longest)
+		raa->longest = rec->ar_blockcount;
+	return error;
+}
+
+/* Does this AGFL block look sane? */
+STATIC int
+xfs_repair_agf_check_agfl_block(
+	struct xfs_mount		*mp,
+	xfs_agblock_t			agbno,
+	void				*priv)
+{
+	struct xfs_scrub_context	*sc = priv;
+
+	if (!xfs_verify_agbno(mp, sc->sa.agno, agbno))
+		return -EFSCORRUPTED;
+	return 0;
+}
+
+/* Repair the AGF. */
+int
+xfs_repair_agf(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_repair_find_ag_btree	fab[] = {
+		{
+			.rmap_owner = XFS_RMAP_OWN_AG,
+			.buf_ops = &xfs_allocbt_buf_ops,
+			.magic = XFS_ABTB_CRC_MAGIC,
+		},
+		{
+			.rmap_owner = XFS_RMAP_OWN_AG,
+			.buf_ops = &xfs_allocbt_buf_ops,
+			.magic = XFS_ABTC_CRC_MAGIC,
+		},
+		{
+			.rmap_owner = XFS_RMAP_OWN_AG,
+			.buf_ops = &xfs_rmapbt_buf_ops,
+			.magic = XFS_RMAP_CRC_MAGIC,
+		},
+		{
+			.rmap_owner = XFS_RMAP_OWN_REFC,
+			.buf_ops = &xfs_refcountbt_buf_ops,
+			.magic = XFS_REFC_CRC_MAGIC,
+		},
+		{
+			.buf_ops = NULL,
+		},
+	};
+	struct xfs_repair_agf_allocbt	raa;
+	struct xfs_agf			old_agf;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*agf_bp;
+	struct xfs_buf			*agfl_bp;
+	struct xfs_agf			*agf;
+	struct xfs_btree_cur		*cur = NULL;
+	struct xfs_perag		*pag;
+	xfs_agblock_t			blocks;
+	xfs_agblock_t			freesp_blocks;
+	int64_t				delta_fdblocks = 0;
+	int				error;
+
+	/* We require the rmapbt to rebuild anything. */
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return -EOPNOTSUPP;
+
+	xfs_scrub_perag_get(sc->mp, &sc->sa);
+	pag = sc->sa.pag;
+	memset(&raa, 0, sizeof(raa));
+	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGF_DADDR(mp)),
+			XFS_FSS_TO_BB(mp, 1), 0, &agf_bp, NULL);
+	if (error)
+		return error;
+	agf_bp->b_ops = &xfs_agf_buf_ops;
+
+	/*
+	 * Load the AGFL so that we can screen out OWN_AG blocks that
+	 * are on the AGFL now; these blocks might have once been part
+	 * of the bno/cnt/rmap btrees but are not now.
+	 */
+	error = xfs_alloc_read_agfl(mp, sc->tp, sc->sa.agno, &agfl_bp);
+	if (error)
+		return error;
+	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(agf_bp), agfl_bp,
+			xfs_repair_agf_check_agfl_block, sc);
+	if (error)
+		return error;
+
+	/* Find the btree roots. */
+	error = xfs_repair_find_ag_btree_roots(sc, agf_bp, fab, agfl_bp);
+	if (error)
+		return error;
+	if (fab[0].root == NULLAGBLOCK || fab[0].height > XFS_BTREE_MAXLEVELS ||
+	    fab[1].root == NULLAGBLOCK || fab[1].height > XFS_BTREE_MAXLEVELS ||
+	    fab[2].root == NULLAGBLOCK || fab[2].height > XFS_BTREE_MAXLEVELS)
+		return -EFSCORRUPTED;
+	if (xfs_sb_version_hasreflink(&mp->m_sb) &&
+	    (fab[3].root == NULLAGBLOCK || fab[3].height > XFS_BTREE_MAXLEVELS))
+		return -EFSCORRUPTED;
+
+	/* Start rewriting the header. */
+	agf = XFS_BUF_TO_AGF(agf_bp);
+	memcpy(&old_agf, agf, sizeof(old_agf));
+
+	/*
+	 * We relied on the rmapbt to reconstruct the AGF.  If we get a
+	 * different root then something's seriously wrong.
+	 */
+	if (be32_to_cpu(old_agf.agf_roots[XFS_BTNUM_RMAPi]) != fab[2].root)
+		return -EFSCORRUPTED;
+	memset(agf, 0, mp->m_sb.sb_sectsize);
+	agf->agf_magicnum = cpu_to_be32(XFS_AGF_MAGIC);
+	agf->agf_versionnum = cpu_to_be32(XFS_AGF_VERSION);
+	agf->agf_seqno = cpu_to_be32(sc->sa.agno);
+	agf->agf_length = cpu_to_be32(xfs_ag_block_count(mp, sc->sa.agno));
+	agf->agf_roots[XFS_BTNUM_BNOi] = cpu_to_be32(fab[0].root);
+	agf->agf_roots[XFS_BTNUM_CNTi] = cpu_to_be32(fab[1].root);
+	agf->agf_roots[XFS_BTNUM_RMAPi] = cpu_to_be32(fab[2].root);
+	agf->agf_levels[XFS_BTNUM_BNOi] = cpu_to_be32(fab[0].height);
+	agf->agf_levels[XFS_BTNUM_CNTi] = cpu_to_be32(fab[1].height);
+	agf->agf_levels[XFS_BTNUM_RMAPi] = cpu_to_be32(fab[2].height);
+	agf->agf_flfirst = old_agf.agf_flfirst;
+	agf->agf_fllast = old_agf.agf_fllast;
+	agf->agf_flcount = old_agf.agf_flcount;
+	if (xfs_sb_version_hascrc(&mp->m_sb))
+		uuid_copy(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid);
+	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+		agf->agf_refcount_root = cpu_to_be32(fab[3].root);
+		agf->agf_refcount_level = cpu_to_be32(fab[3].height);
+	}
+
+	/* Update the AGF counters from the bnobt. */
+	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
+			XFS_BTNUM_BNO);
+	raa.sc = sc;
+	error = xfs_alloc_query_all(cur, xfs_repair_agf_walk_allocbt, &raa);
+	if (error)
+		goto err;
+	error = xfs_btree_count_blocks(cur, &blocks);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+	freesp_blocks = blocks - 1;
+	agf->agf_freeblks = cpu_to_be32(raa.freeblks);
+	agf->agf_longest = cpu_to_be32(raa.longest);
+
+	/* Update the AGF counters from the cntbt. */
+	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
+			XFS_BTNUM_CNT);
+	error = xfs_btree_count_blocks(cur, &blocks);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+	freesp_blocks += blocks - 1;
+
+	/* Update the AGF counters from the rmapbt. */
+	cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno);
+	error = xfs_btree_count_blocks(cur, &blocks);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+	agf->agf_rmap_blocks = cpu_to_be32(blocks);
+	freesp_blocks += blocks - 1;
+
+	/* Update the AGF counters from the refcountbt. */
+	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+		cur = xfs_refcountbt_init_cursor(mp, sc->tp, agf_bp,
+				sc->sa.agno, NULL);
+		error = xfs_btree_count_blocks(cur, &blocks);
+		if (error)
+			goto err;
+		xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+		agf->agf_refcount_blocks = cpu_to_be32(blocks);
+	}
+	agf->agf_btreeblks = cpu_to_be32(freesp_blocks);
+	cur = NULL;
+
+	/* Trigger reinitialization of the in-core data. */
+	if (raa.freeblks != be32_to_cpu(old_agf.agf_freeblks)) {
+		delta_fdblocks += (int64_t)raa.freeblks -
+				be32_to_cpu(old_agf.agf_freeblks);
+		if (pag->pagf_init)
+			pag->pagf_freeblks = be32_to_cpu(agf->agf_freeblks);
+	}
+
+	if (freesp_blocks != be32_to_cpu(old_agf.agf_btreeblks)) {
+		delta_fdblocks += (int64_t)freesp_blocks -
+				be32_to_cpu(old_agf.agf_btreeblks);
+		if (pag->pagf_init)
+			pag->pagf_btreeblks = be32_to_cpu(agf->agf_btreeblks);
+	}
+
+	if (pag->pagf_init &&
+	    (raa.longest != be32_to_cpu(old_agf.agf_longest) ||
+	     fab[0].height != be32_to_cpu(old_agf.agf_levels[XFS_BTNUM_BNOi]) ||
+	     fab[1].height != be32_to_cpu(old_agf.agf_levels[XFS_BTNUM_CNTi]) ||
+	     fab[2].height != be32_to_cpu(old_agf.agf_levels[XFS_BTNUM_RMAPi]) ||
+	     fab[3].height != be32_to_cpu(old_agf.agf_refcount_level))) {
+		pag->pagf_longest = be32_to_cpu(agf->agf_longest);
+		pag->pagf_levels[XFS_BTNUM_BNOi] =
+				be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]);
+		pag->pagf_levels[XFS_BTNUM_CNTi] =
+				be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
+		pag->pagf_levels[XFS_BTNUM_RMAPi] =
+				be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
+		pag->pagf_refcount_level =
+				be32_to_cpu(agf->agf_refcount_level);
+	}
+
+	error = xfs_repair_mod_fdblocks(sc, delta_fdblocks);
+	if (error)
+		goto err;
+
+	/* Write this to disk. */
+	xfs_trans_buf_set_type(sc->tp, agf_bp, XFS_BLFT_AGF_BUF);
+	xfs_trans_log_buf(sc->tp, agf_bp, 0, BBTOB(agf_bp->b_length) - 1);
+	return error;
+
+err:
+	if (cur)
+		xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR :
+				XFS_BTREE_NOERROR);
+	memcpy(agf, &old_agf, sizeof(old_agf));
+	return error;
+}
+
+/* AGFL */
+
+struct xfs_repair_agfl {
+	struct xfs_repair_extent_list	freesp_list;
+	struct xfs_repair_extent_list	agmeta_list;
+	struct xfs_scrub_context	*sc;
+};
+
+/* Record all freespace information. */
+STATIC int
+xfs_repair_agfl_rmap_fn(
+	struct xfs_btree_cur		*cur,
+	struct xfs_rmap_irec		*rec,
+	void				*priv)
+{
+	struct xfs_repair_agfl		*ra = priv;
+	struct xfs_buf			*bp;
+	xfs_fsblock_t			fsb;
+	int				i;
+	int				error = 0;
+
+	if (xfs_scrub_should_terminate(ra->sc, &error))
+		return error;
+
+	/* Record all the OWN_AG blocks... */
+	if (rec->rm_owner == XFS_RMAP_OWN_AG) {
+		fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
+				rec->rm_startblock);
+		error = xfs_repair_collect_btree_extent(ra->sc,
+				&ra->freesp_list, fsb, rec->rm_blockcount);
+		if (error)
+			return error;
+	}
+
+	/* ...and all the rmapbt blocks... */
+	for (i = 0; i < cur->bc_nlevels && cur->bc_ptrs[i] == 1; i++) {
+		xfs_btree_get_block(cur, i, &bp);
+		if (!bp)
+			continue;
+		fsb = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
+		error = xfs_repair_collect_btree_extent(ra->sc,
+				&ra->agmeta_list, fsb, 1);
+		if (error)
+			return error;
+	}
+
+	return 0;
+}
+
+/* Add a btree block to the agmeta list. */
+STATIC int
+xfs_repair_agfl_visit_btblock(
+	struct xfs_btree_cur		*cur,
+	int				level,
+	void				*priv)
+{
+	struct xfs_repair_agfl		*ra = priv;
+	struct xfs_buf			*bp;
+	xfs_fsblock_t			fsb;
+	int				error = 0;
+
+	if (xfs_scrub_should_terminate(ra->sc, &error))
+		return error;
+
+	xfs_btree_get_block(cur, level, &bp);
+	if (!bp)
+		return 0;
+
+	fsb = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
+	return xfs_repair_collect_btree_extent(ra->sc, &ra->agmeta_list,
+			fsb, 1);
+}
+
+/* Repair the AGFL. */
+int
+xfs_repair_agfl(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_repair_agfl		ra;
+	struct xfs_owner_info		oinfo;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*agf_bp;
+	struct xfs_buf			*agfl_bp;
+	struct xfs_agf			*agf;
+	struct xfs_agfl			*agfl;
+	struct xfs_btree_cur		*cur = NULL;
+	__be32				*agfl_bno;
+	struct xfs_repair_extent	*rae;
+	struct xfs_repair_extent	*n;
+	xfs_agblock_t			flcount;
+	xfs_agblock_t			agbno;
+	xfs_agblock_t			bno;
+	xfs_agblock_t			old_flcount;
+	int				error;
+
+	/* We require the rmapbt to rebuild anything. */
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return -EOPNOTSUPP;
+
+	xfs_scrub_perag_get(sc->mp, &sc->sa);
+	xfs_repair_init_extent_list(&ra.freesp_list);
+	xfs_repair_init_extent_list(&ra.agmeta_list);
+	ra.sc = sc;
+
+	error = xfs_alloc_read_agf(mp, sc->tp, sc->sa.agno, 0, &agf_bp);
+	if (error)
+		return error;
+	if (!agf_bp)
+		return -ENOMEM;
+
+	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGFL_DADDR(mp)),
+			XFS_FSS_TO_BB(mp, 1), 0, &agfl_bp, NULL);
+	if (error)
+		return error;
+	agfl_bp->b_ops = &xfs_agfl_buf_ops;
+
+	/* Find all space used by the free space btrees & rmapbt. */
+	cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno);
+	error = xfs_rmap_query_all(cur, xfs_repair_agfl_rmap_fn, &ra);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+
+	/* Find all space used by bnobt. */
+	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
+			XFS_BTNUM_BNO);
+	error = xfs_btree_visit_blocks(cur, xfs_repair_agfl_visit_btblock,
+			&ra);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+
+	/* Find all space used by cntbt. */
+	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
+			XFS_BTNUM_CNT);
+	error = xfs_btree_visit_blocks(cur, xfs_repair_agfl_visit_btblock,
+			&ra);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+	cur = NULL;
+
+	/*
+	 * Drop the freesp meta blocks that are in use by btrees.
+	 * The remaining blocks /should/ be AGFL blocks.
+	 */
+	error = xfs_repair_subtract_extents(sc, &ra.freesp_list,
+			&ra.agmeta_list);
+	if (error)
+		goto err;
+	xfs_repair_cancel_btree_extents(sc, &ra.agmeta_list);
+
+	/* Calculate the new AGFL size. */
+	flcount = 0;
+	for_each_xfs_repair_extent_safe(rae, n, &ra.freesp_list) {
+		for (bno = 0; bno < rae->len; bno++) {
+			if (flcount >= xfs_agfl_size(mp) - 1)
+				break;
+			flcount++;
+		}
+	}
+
+	/* Update fdblocks if flcount changed. */
+	agf = XFS_BUF_TO_AGF(agf_bp);
+	old_flcount = be32_to_cpu(agf->agf_flcount);
+	if (flcount != old_flcount) {
+		int64_t	delta_fdblocks = (int64_t)flcount - old_flcount;
+
+		error = xfs_repair_mod_fdblocks(sc, delta_fdblocks);
+		if (error)
+			goto err;
+		if (sc->sa.pag->pagf_init)
+			sc->sa.pag->pagf_flcount = flcount;
+	}
+
+	/* Update the AGF pointers. */
+	agf->agf_flfirst = cpu_to_be32(1);
+	agf->agf_flcount = cpu_to_be32(flcount);
+	agf->agf_fllast = cpu_to_be32(flcount);
+
+	/* Start rewriting the header. */
+	agfl = XFS_BUF_TO_AGFL(agfl_bp);
+	memset(agfl, 0xFF, mp->m_sb.sb_sectsize);
+	agfl->agfl_magicnum = cpu_to_be32(XFS_AGFL_MAGIC);
+	agfl->agfl_seqno = cpu_to_be32(sc->sa.agno);
+	uuid_copy(&agfl->agfl_uuid, &mp->m_sb.sb_meta_uuid);
+
+	/* Fill the AGFL with the remaining blocks. */
+	flcount = 0;
+	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agfl_bp);
+	for_each_xfs_repair_extent_safe(rae, n, &ra.freesp_list) {
+		agbno = XFS_FSB_TO_AGBNO(mp, rae->fsbno);
+
+		trace_xfs_repair_agfl_insert(mp, sc->sa.agno, agbno, rae->len);
+
+		for (bno = 0; bno < rae->len; bno++) {
+			if (flcount >= xfs_agfl_size(mp) - 1)
+				break;
+			agfl_bno[flcount + 1] = cpu_to_be32(agbno + bno);
+			flcount++;
+		}
+		rae->fsbno += bno;
+		rae->len -= bno;
+		if (rae->len)
+			break;
+		list_del(&rae->list);
+		kmem_free(rae);
+	}
+
+	/* Write AGF and AGFL to disk. */
+	xfs_alloc_log_agf(sc->tp, agf_bp,
+			XFS_AGF_FLFIRST | XFS_AGF_FLLAST | XFS_AGF_FLCOUNT);
+	xfs_trans_buf_set_type(sc->tp, agfl_bp, XFS_BLFT_AGFL_BUF);
+	xfs_trans_log_buf(sc->tp, agfl_bp, 0, BBTOB(agfl_bp->b_length) - 1);
+
+	/* Dump any AGFL overflow. */
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
+	return xfs_repair_reap_btree_extents(sc, &ra.freesp_list, &oinfo,
+			XFS_AG_RESV_AGFL);
+err:
+	xfs_repair_cancel_btree_extents(sc, &ra.agmeta_list);
+	xfs_repair_cancel_btree_extents(sc, &ra.freesp_list);
+	if (cur)
+		xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR :
+				XFS_BTREE_NOERROR);
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index e3e8fba1c99c..5f31dc8af505 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -1087,3 +1087,33 @@ xfs_repair_ino_dqattach(
 
 	return error;
 }
+
+/*
+ * We changed this AGF's free block count, so now we need to reset the global
+ * counters.  We use the transaction to update the global counters, so if the
+ * AG free counts were low we have to ask the transaction for more block
+ * reservation before decreasing fdblocks.
+ *
+ * XXX: We ought to have some mechanism for checking and fixing the superblock
+ * counters (particularly if we're close to ENOSPC) but that's left as an open
+ * research question for now.
+ */
+int
+xfs_repair_mod_fdblocks(
+	struct xfs_scrub_context	*sc,
+	int64_t				delta_fdblocks)
+{
+	int				error;
+
+	if (delta_fdblocks == 0)
+		return 0;
+
+	if (delta_fdblocks < 0) {
+		error = xfs_trans_reserve_more(sc->tp, -delta_fdblocks, 0);
+		if (error)
+			return error;
+	}
+
+	xfs_trans_mod_sb(sc->tp, XFS_TRANS_SB_FDBLOCKS, delta_fdblocks);
+	return 0;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index f2b0895294db..97794c281a23 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -98,11 +98,15 @@ int xfs_repair_find_ag_btree_roots(struct xfs_scrub_context *sc,
 		struct xfs_buf *agfl_bp);
 void xfs_repair_force_quotacheck(struct xfs_scrub_context *sc, uint dqtype);
 int xfs_repair_ino_dqattach(struct xfs_scrub_context *sc);
+int xfs_repair_mod_fdblocks(struct xfs_scrub_context *sc,
+		int64_t delta_fdblocks);
 
 /* Metadata repairers */
 
 int xfs_repair_probe(struct xfs_scrub_context *sc);
 int xfs_repair_superblock(struct xfs_scrub_context *sc);
+int xfs_repair_agf(struct xfs_scrub_context *sc);
+int xfs_repair_agfl(struct xfs_scrub_context *sc);
 
 #else
 
@@ -126,6 +130,8 @@ xfs_repair_calc_ag_resblks(
 
 #define xfs_repair_probe		xfs_repair_notsupported
 #define xfs_repair_superblock		xfs_repair_notsupported
+#define xfs_repair_agf			xfs_repair_notsupported
+#define xfs_repair_agfl			xfs_repair_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 36db098ba583..0b523ab9b8b0 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -222,13 +222,13 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.type	= ST_PERAG,
 		.setup	= xfs_scrub_setup_fs,
 		.scrub	= xfs_scrub_agf,
-		.repair	= xfs_repair_notsupported,
+		.repair	= xfs_repair_agf,
 	},
 	[XFS_SCRUB_TYPE_AGFL]= {	/* agfl */
 		.type	= ST_PERAG,
 		.setup	= xfs_scrub_setup_fs,
 		.scrub	= xfs_scrub_agfl,
-		.repair	= xfs_repair_notsupported,
+		.repair	= xfs_repair_agfl,
 	},
 	[XFS_SCRUB_TYPE_AGI] = {	/* agi */
 		.type	= ST_PERAG,
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index fc7ba75b8b69..5c24e66170fe 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -138,6 +138,60 @@ xfs_trans_dup(
 	return ntp;
 }
 
+/*
+ * Try to reserve more blocks for a transaction.  The single use case we
+ * support is for online repair -- use a transaction to gather data without
+ * fear of btree cycle deadlocks; calculate how many blocks we really need
+ * from that data; and only then start modifying data.  This can fail due to
+ * ENOSPC, so we have to be able to cancel the transaction.
+ */
+int
+xfs_trans_reserve_more(
+	struct xfs_trans	*tp,
+	uint			blocks,
+	uint			rtextents)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
+	int			error = 0;
+
+	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
+
+	/*
+	 * Attempt to reserve the needed disk blocks by decrementing
+	 * the number needed from the number available.  This will
+	 * fail if the count would go below zero.
+	 */
+	if (blocks > 0) {
+		error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
+		if (error != 0)
+			return -ENOSPC;
+		tp->t_blk_res += blocks;
+	}
+
+	/*
+	 * Attempt to reserve the needed realtime extents by decrementing
+	 * the number needed from the number available.  This will
+	 * fail if the count would go below zero.
+	 */
+	if (rtextents > 0) {
+		error = xfs_mod_frextents(mp, -((int64_t)rtextents));
+		if (error) {
+			error = -ENOSPC;
+			goto out_blocks;
+		}
+		tp->t_rtx_res += rtextents;
+	}
+
+	return 0;
+out_blocks:
+	if (blocks > 0) {
+		xfs_mod_fdblocks(mp, (int64_t)blocks, rsvd);
+		tp->t_blk_res -= blocks;
+	}
+	return error;
+}
+
 /*
  * This is called to reserve free disk blocks and log space for the
  * given transaction.  This must be done before allocating any resources
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 29706b8b3bd4..7284555c4801 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -165,6 +165,8 @@ typedef struct xfs_trans {
 int		xfs_trans_alloc(struct xfs_mount *mp, struct xfs_trans_res *resp,
 			uint blocks, uint rtextents, uint flags,
 			struct xfs_trans **tpp);
+int		xfs_trans_reserve_more(struct xfs_trans *tp, uint blocks,
+			uint rtextents);
 int		xfs_trans_alloc_empty(struct xfs_mount *mp,
 			struct xfs_trans **tpp);
 void		xfs_trans_mod_sb(xfs_trans_t *, uint, int64_t);



* [PATCH 02/14] xfs: repair the AGI
  2018-05-30 19:30 [PATCH v15.2 00/14] xfs-4.18: online repair support Darrick J. Wong
  2018-05-30 19:30 ` [PATCH 01/14] xfs: repair the AGF and AGFL Darrick J. Wong
@ 2018-05-30 19:30 ` Darrick J. Wong
  2018-06-04  1:56   ` Dave Chinner
  2018-05-30 19:30 ` [PATCH 03/14] xfs: repair free space btrees Darrick J. Wong
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 35+ messages in thread
From: Darrick J. Wong @ 2018-05-30 19:30 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Rebuild the AGI header items with some help from the rmapbt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/agheader_repair.c |  108 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.c          |   29 +++++++++++
 fs/xfs/scrub/repair.h          |    5 ++
 fs/xfs/scrub/scrub.c           |    2 -
 4 files changed, 143 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
index 0f794d27382a..ba750d3d11f0 100644
--- a/fs/xfs/scrub/agheader_repair.c
+++ b/fs/xfs/scrub/agheader_repair.c
@@ -552,3 +552,111 @@ xfs_repair_agfl(
 				XFS_BTREE_NOERROR);
 	return error;
 }
+
+/* AGI */
+
+int
+xfs_repair_agi(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_repair_find_ag_btree	fab[] = {
+		{
+			.rmap_owner = XFS_RMAP_OWN_INOBT,
+			.buf_ops = &xfs_inobt_buf_ops,
+			.magic = XFS_IBT_CRC_MAGIC,
+		},
+		{
+			.rmap_owner = XFS_RMAP_OWN_INOBT,
+			.buf_ops = &xfs_inobt_buf_ops,
+			.magic = XFS_FIBT_CRC_MAGIC,
+		},
+		{
+			.buf_ops = NULL
+		},
+	};
+	struct xfs_agi			old_agi;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*agi_bp;
+	struct xfs_buf			*agf_bp;
+	struct xfs_agi			*agi;
+	struct xfs_btree_cur		*cur;
+	xfs_agino_t			old_count;
+	xfs_agino_t			old_freecount;
+	xfs_agino_t			count;
+	xfs_agino_t			freecount;
+	int				bucket;
+	int				error;
+
+	/* We require the rmapbt to rebuild anything. */
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return -EOPNOTSUPP;
+
+	xfs_scrub_perag_get(sc->mp, &sc->sa);
+	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGI_DADDR(mp)),
+			XFS_FSS_TO_BB(mp, 1), 0, &agi_bp, NULL);
+	if (error)
+		return error;
+	agi_bp->b_ops = &xfs_agi_buf_ops;
+
+	error = xfs_alloc_read_agf(mp, sc->tp, sc->sa.agno, 0, &agf_bp);
+	if (error)
+		return error;
+	if (!agf_bp)
+		return -ENOMEM;
+
+	/* Find the btree roots. */
+	error = xfs_repair_find_ag_btree_roots(sc, agf_bp, fab, NULL);
+	if (error)
+		return error;
+	if (fab[0].root == NULLAGBLOCK || fab[0].height > XFS_BTREE_MAXLEVELS)
+		return -EFSCORRUPTED;
+	if (xfs_sb_version_hasfinobt(&mp->m_sb) &&
+	    (fab[1].root == NULLAGBLOCK || fab[1].height > XFS_BTREE_MAXLEVELS))
+		return -EFSCORRUPTED;
+
+	/* Start rewriting the header. */
+	agi = XFS_BUF_TO_AGI(agi_bp);
+	old_agi = *agi;
+	old_count = be32_to_cpu(old_agi.agi_count);
+	old_freecount = be32_to_cpu(old_agi.agi_freecount);
+	memset(agi, 0, mp->m_sb.sb_sectsize);
+	agi->agi_magicnum = cpu_to_be32(XFS_AGI_MAGIC);
+	agi->agi_versionnum = cpu_to_be32(XFS_AGI_VERSION);
+	agi->agi_seqno = cpu_to_be32(sc->sa.agno);
+	agi->agi_length = cpu_to_be32(xfs_ag_block_count(mp, sc->sa.agno));
+	agi->agi_newino = cpu_to_be32(NULLAGINO);
+	agi->agi_dirino = cpu_to_be32(NULLAGINO);
+	if (xfs_sb_version_hascrc(&mp->m_sb))
+		uuid_copy(&agi->agi_uuid, &mp->m_sb.sb_meta_uuid);
+	for (bucket = 0; bucket < XFS_AGI_UNLINKED_BUCKETS; bucket++)
+		agi->agi_unlinked[bucket] = cpu_to_be32(NULLAGINO);
+	agi->agi_root = cpu_to_be32(fab[0].root);
+	agi->agi_level = cpu_to_be32(fab[0].height);
+	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
+		agi->agi_free_root = cpu_to_be32(fab[1].root);
+		agi->agi_free_level = cpu_to_be32(fab[1].height);
+	}
+
+	/* Update the AGI counters. */
+	cur = xfs_inobt_init_cursor(mp, sc->tp, agi_bp, sc->sa.agno,
+			XFS_BTNUM_INO);
+	error = xfs_ialloc_count_inodes(cur, &count, &freecount);
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+	if (error)
+		goto err;
+	agi->agi_count = cpu_to_be32(count);
+	agi->agi_freecount = cpu_to_be32(freecount);
+
+	xfs_repair_mod_ino_counts(sc, old_count, count, old_freecount,
+			freecount);
+
+	/* Write this to disk. */
+	xfs_trans_buf_set_type(sc->tp, agi_bp, XFS_BLFT_AGI_BUF);
+	xfs_trans_log_buf(sc->tp, agi_bp, 0, BBTOB(agi_bp->b_length) - 1);
+	return error;
+
+err:
+	*agi = old_agi;
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index 5f31dc8af505..45a91841c0ac 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -1117,3 +1117,32 @@ xfs_repair_mod_fdblocks(
 	xfs_trans_mod_sb(sc->tp, XFS_TRANS_SB_FDBLOCKS, delta_fdblocks);
 	return 0;
 }
+
+/*
+ * Update in-core and superblock inode counters.
+ *
+ * XXX: Need to check the sb counters, see the comment in
+ * xfs_repair_mod_fdblocks about sb counter checking being unfinished.
+ */
+void
+xfs_repair_mod_ino_counts(
+	struct xfs_scrub_context	*sc,
+	xfs_agino_t			old_count,
+	xfs_agino_t			count,
+	xfs_agino_t			old_freecount,
+	xfs_agino_t			freecount)
+{
+	if (old_count != count) {
+		if (sc->sa.pag->pagi_init)
+			sc->sa.pag->pagi_count = count;
+		xfs_trans_mod_sb(sc->tp, XFS_TRANS_SB_ICOUNT,
+				(int64_t)count - old_count);
+	}
+
+	if (old_freecount != freecount) {
+		if (sc->sa.pag->pagi_init)
+			sc->sa.pag->pagi_freecount = freecount;
+		xfs_trans_mod_sb(sc->tp, XFS_TRANS_SB_IFREE,
+				(int64_t)freecount - old_freecount);
+	}
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 97794c281a23..9d69d03f1bfe 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -100,6 +100,9 @@ void xfs_repair_force_quotacheck(struct xfs_scrub_context *sc, uint dqtype);
 int xfs_repair_ino_dqattach(struct xfs_scrub_context *sc);
 int xfs_repair_mod_fdblocks(struct xfs_scrub_context *sc,
 		int64_t delta_fdblocks);
+void xfs_repair_mod_ino_counts(struct xfs_scrub_context *sc,
+		xfs_agino_t old_count, xfs_agino_t count,
+		xfs_agino_t old_freecount, xfs_agino_t freecount);
 
 /* Metadata repairers */
 
@@ -107,6 +110,7 @@ int xfs_repair_probe(struct xfs_scrub_context *sc);
 int xfs_repair_superblock(struct xfs_scrub_context *sc);
 int xfs_repair_agf(struct xfs_scrub_context *sc);
 int xfs_repair_agfl(struct xfs_scrub_context *sc);
+int xfs_repair_agi(struct xfs_scrub_context *sc);
 
 #else
 
@@ -132,6 +136,7 @@ xfs_repair_calc_ag_resblks(
 #define xfs_repair_superblock		xfs_repair_notsupported
 #define xfs_repair_agf			xfs_repair_notsupported
 #define xfs_repair_agfl			xfs_repair_notsupported
+#define xfs_repair_agi			xfs_repair_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 0b523ab9b8b0..d68cfcd31f30 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -234,7 +234,7 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.type	= ST_PERAG,
 		.setup	= xfs_scrub_setup_fs,
 		.scrub	= xfs_scrub_agi,
-		.repair	= xfs_repair_notsupported,
+		.repair	= xfs_repair_agi,
 	},
 	[XFS_SCRUB_TYPE_BNOBT] = {	/* bnobt */
 		.type	= ST_PERAG,



* [PATCH 03/14] xfs: repair free space btrees
  2018-05-30 19:30 [PATCH v15.2 00/14] xfs-4.18: online repair support Darrick J. Wong
  2018-05-30 19:30 ` [PATCH 01/14] xfs: repair the AGF and AGFL Darrick J. Wong
  2018-05-30 19:30 ` [PATCH 02/14] xfs: repair the AGI Darrick J. Wong
@ 2018-05-30 19:30 ` Darrick J. Wong
  2018-06-04  2:12   ` Dave Chinner
  2018-05-30 19:31 ` [PATCH 04/14] xfs: repair inode btrees Darrick J. Wong
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 35+ messages in thread
From: Darrick J. Wong @ 2018-05-30 19:30 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Rebuild the free space btrees from the gaps in the rmap btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile             |    1 
 fs/xfs/scrub/alloc.c        |    1 
 fs/xfs/scrub/alloc_repair.c |  430 +++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.c       |    8 +
 fs/xfs/scrub/repair.h       |    2 
 fs/xfs/scrub/scrub.c        |    4 
 6 files changed, 442 insertions(+), 4 deletions(-)
 create mode 100644 fs/xfs/scrub/alloc_repair.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 29fe115f29d5..abe035ad0aa4 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -175,6 +175,7 @@ xfs-$(CONFIG_XFS_QUOTA)		+= scrub/quota.o
 ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y)
 xfs-y				+= $(addprefix scrub/, \
 				   agheader_repair.o \
+				   alloc_repair.o \
 				   repair.o \
 				   )
 endif
diff --git a/fs/xfs/scrub/alloc.c b/fs/xfs/scrub/alloc.c
index 941a0a55224e..fe7e8bdf4a52 100644
--- a/fs/xfs/scrub/alloc.c
+++ b/fs/xfs/scrub/alloc.c
@@ -29,7 +29,6 @@
 #include "xfs_log_format.h"
 #include "xfs_trans.h"
 #include "xfs_sb.h"
-#include "xfs_alloc.h"
 #include "xfs_rmap.h"
 #include "xfs_alloc.h"
 #include "scrub/xfs_scrub.h"
diff --git a/fs/xfs/scrub/alloc_repair.c b/fs/xfs/scrub/alloc_repair.c
new file mode 100644
index 000000000000..5a81713a69cd
--- /dev/null
+++ b/fs/xfs/scrub/alloc_repair.c
@@ -0,0 +1,430 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_alloc.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_inode.h"
+#include "xfs_refcount.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+
+/* Free space btree repair. */
+
+struct xfs_repair_alloc_extent {
+	struct list_head		list;
+	xfs_agblock_t			bno;
+	xfs_extlen_t			len;
+};
+
+struct xfs_repair_alloc {
+	struct list_head		extlist;
+	struct xfs_repair_extent_list	btlist;	  /* OWN_AG blocks */
+	struct xfs_repair_extent_list	nobtlist; /* rmapbt/agfl blocks */
+	struct xfs_scrub_context	*sc;
+	xfs_agblock_t			next_bno;
+	uint64_t			nr_records;
+};
+
+/* Record extents that aren't in use from gaps in the rmap records. */
+STATIC int
+xfs_repair_alloc_extent_fn(
+	struct xfs_btree_cur		*cur,
+	struct xfs_rmap_irec		*rec,
+	void				*priv)
+{
+	struct xfs_repair_alloc		*ra = priv;
+	struct xfs_repair_alloc_extent	*rae;
+	struct xfs_buf			*bp;
+	xfs_fsblock_t			fsb;
+	int				i;
+	int				error;
+
+	/* Record all the OWN_AG blocks... */
+	if (rec->rm_owner == XFS_RMAP_OWN_AG) {
+		fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
+				rec->rm_startblock);
+		error = xfs_repair_collect_btree_extent(ra->sc,
+				&ra->btlist, fsb, rec->rm_blockcount);
+		if (error)
+			return error;
+	}
+
+	/* ...and all the rmapbt blocks... */
+	for (i = 0; i < cur->bc_nlevels && cur->bc_ptrs[i] == 1; i++) {
+		xfs_btree_get_block(cur, i, &bp);
+		if (!bp)
+			continue;
+		fsb = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
+		error = xfs_repair_collect_btree_extent(ra->sc,
+				&ra->nobtlist, fsb, 1);
+		if (error)
+			return error;
+	}
+
+	/* ...and all the free space. */
+	if (rec->rm_startblock > ra->next_bno) {
+		trace_xfs_repair_alloc_extent_fn(cur->bc_mp,
+				cur->bc_private.a.agno,
+				ra->next_bno, rec->rm_startblock - ra->next_bno,
+				XFS_RMAP_OWN_NULL, 0, 0);
+
+		rae = kmem_alloc(sizeof(struct xfs_repair_alloc_extent),
+				KM_MAYFAIL);
+		if (!rae)
+			return -ENOMEM;
+		INIT_LIST_HEAD(&rae->list);
+		rae->bno = ra->next_bno;
+		rae->len = rec->rm_startblock - ra->next_bno;
+		list_add_tail(&rae->list, &ra->extlist);
+		ra->nr_records++;
+	}
+	ra->next_bno = max_t(xfs_agblock_t, ra->next_bno,
+			rec->rm_startblock + rec->rm_blockcount);
+	return 0;
+}
+
+/* Find the longest free extent in the list. */
+static struct xfs_repair_alloc_extent *
+xfs_repair_allocbt_get_longest(
+	struct xfs_repair_alloc		*ra)
+{
+	struct xfs_repair_alloc_extent	*rae;
+	struct xfs_repair_alloc_extent	*longest = NULL;
+
+	list_for_each_entry(rae, &ra->extlist, list) {
+		if (!longest || rae->len > longest->len)
+			longest = rae;
+	}
+	return longest;
+}
+
+/* Collect an AGFL block for the not-to-release list. */
+static int
+xfs_repair_collect_agfl_block(
+	struct xfs_mount		*mp,
+	xfs_agblock_t			bno,
+	void				*priv)
+{
+	struct xfs_repair_alloc		*ra = priv;
+	xfs_fsblock_t			fsb;
+
+	fsb = XFS_AGB_TO_FSB(mp, ra->sc->sa.agno, bno);
+	return xfs_repair_collect_btree_extent(ra->sc, &ra->nobtlist, fsb, 1);
+}
+
+/* Compare two btree extents. */
+static int
+xfs_repair_allocbt_extent_cmp(
+	void				*priv,
+	struct list_head		*a,
+	struct list_head		*b)
+{
+	struct xfs_repair_alloc_extent	*ap;
+	struct xfs_repair_alloc_extent	*bp;
+
+	ap = container_of(a, struct xfs_repair_alloc_extent, list);
+	bp = container_of(b, struct xfs_repair_alloc_extent, list);
+
+	if (ap->bno > bp->bno)
+		return 1;
+	else if (ap->bno < bp->bno)
+		return -1;
+	return 0;
+}
+
+/* Put an extent onto the free list. */
+STATIC int
+xfs_repair_allocbt_free_extent(
+	struct xfs_scrub_context	*sc,
+	xfs_fsblock_t			fsbno,
+	xfs_extlen_t			len,
+	struct xfs_owner_info		*oinfo)
+{
+	int				error;
+
+	error = xfs_free_extent(sc->tp, fsbno, len, oinfo, 0);
+	if (error)
+		return error;
+	error = xfs_repair_roll_ag_trans(sc);
+	if (error)
+		return error;
+	return xfs_mod_fdblocks(sc->mp, -(int64_t)len, false);
+}
+
+/* Allocate a block from the (cached) longest extent in the AG. */
+STATIC xfs_fsblock_t
+xfs_repair_allocbt_alloc_from_longest(
+	struct xfs_repair_alloc		*ra,
+	struct xfs_repair_alloc_extent	**longest)
+{
+	xfs_fsblock_t			fsb;
+
+	if (*longest && (*longest)->len == 0) {
+		list_del(&(*longest)->list);
+		kmem_free(*longest);
+		*longest = NULL;
+	}
+
+	if (*longest == NULL) {
+		*longest = xfs_repair_allocbt_get_longest(ra);
+		if (*longest == NULL)
+			return NULLFSBLOCK;
+	}
+
+	fsb = XFS_AGB_TO_FSB(ra->sc->mp, ra->sc->sa.agno, (*longest)->bno);
+	(*longest)->bno++;
+	(*longest)->len--;
+	return fsb;
+}
+
+/* Repair the freespace btrees for some AG. */
+int
+xfs_repair_allocbt(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_repair_alloc		ra;
+	struct xfs_owner_info		oinfo;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_btree_cur		*cur = NULL;
+	struct xfs_repair_alloc_extent	*longest = NULL;
+	struct xfs_repair_alloc_extent	*rae;
+	struct xfs_repair_alloc_extent	*n;
+	struct xfs_perag		*pag;
+	struct xfs_agf			*agf;
+	struct xfs_buf			*bp;
+	xfs_fsblock_t			bnofsb;
+	xfs_fsblock_t			cntfsb;
+	xfs_extlen_t			oldf;
+	xfs_extlen_t			nr_blocks;
+	xfs_agblock_t			agend;
+	int				error;
+
+	/* We require the rmapbt to rebuild anything. */
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return -EOPNOTSUPP;
+
+	xfs_scrub_perag_get(sc->mp, &sc->sa);
+	pag = sc->sa.pag;
+	/*
+	 * Make sure the busy extent list is clear because we can't put
+	 * extents on there twice.
+	 */
+	spin_lock(&pag->pagb_lock);
+	if (pag->pagb_tree.rb_node) {
+		spin_unlock(&pag->pagb_lock);
+		return -EDEADLOCK;
+	}
+	spin_unlock(&pag->pagb_lock);
+
+	/*
+	 * Collect all reverse mappings for free extents, and the rmapbt
+	 * blocks.  We can discover the rmapbt blocks completely from a
+	 * query_all handler because there are always rmapbt entries.
+	 * (One cannot use query_all to visit all of a btree's blocks
+	 * unless that btree is guaranteed to have at least one entry.)
+	 */
+	INIT_LIST_HEAD(&ra.extlist);
+	xfs_repair_init_extent_list(&ra.btlist);
+	xfs_repair_init_extent_list(&ra.nobtlist);
+	ra.next_bno = 0;
+	ra.nr_records = 0;
+	ra.sc = sc;
+
+	cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno);
+	error = xfs_rmap_query_all(cur, xfs_repair_alloc_extent_fn, &ra);
+	if (error)
+		goto out;
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+	cur = NULL;
+
+	/* Insert a record for space between the last rmap and EOAG. */
+	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+	agend = be32_to_cpu(agf->agf_length);
+	if (ra.next_bno < agend) {
+		rae = kmem_alloc(sizeof(struct xfs_repair_alloc_extent),
+				KM_MAYFAIL);
+		if (!rae) {
+			error = -ENOMEM;
+			goto out;
+		}
+		INIT_LIST_HEAD(&rae->list);
+		rae->bno = ra.next_bno;
+		rae->len = agend - ra.next_bno;
+		list_add_tail(&rae->list, &ra.extlist);
+		ra.nr_records++;
+	}
+
+	/* Collect all the AGFL blocks. */
+	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(sc->sa.agf_bp),
+			sc->sa.agfl_bp, xfs_repair_collect_agfl_block, &ra);
+	if (error)
+		goto out;
+
+	/* Do we actually have enough space to do this? */
+	nr_blocks = 2 * xfs_allocbt_calc_size(mp, ra.nr_records);
+	if (!xfs_repair_ag_has_space(pag, nr_blocks, XFS_AG_RESV_NONE)) {
+		error = -ENOSPC;
+		goto out;
+	}
+
+	/* Invalidate all the bnobt/cntbt blocks in btlist. */
+	error = xfs_repair_subtract_extents(sc, &ra.btlist, &ra.nobtlist);
+	if (error)
+		goto out;
+	xfs_repair_cancel_btree_extents(sc, &ra.nobtlist);
+	error = xfs_repair_invalidate_blocks(sc, &ra.btlist);
+	if (error)
+		goto out;
+
+	/* Allocate new bnobt root. */
+	bnofsb = xfs_repair_allocbt_alloc_from_longest(&ra, &longest);
+	if (bnofsb == NULLFSBLOCK) {
+		error = -ENOSPC;
+		goto out;
+	}
+
+	/* Allocate new cntbt root. */
+	cntfsb = xfs_repair_allocbt_alloc_from_longest(&ra, &longest);
+	if (cntfsb == NULLFSBLOCK) {
+		error = -ENOSPC;
+		goto out;
+	}
+
+	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+	/* Initialize new bnobt root. */
+	error = xfs_repair_init_btblock(sc, bnofsb, &bp, XFS_BTNUM_BNO,
+			&xfs_allocbt_buf_ops);
+	if (error)
+		goto out;
+	agf->agf_roots[XFS_BTNUM_BNOi] =
+			cpu_to_be32(XFS_FSB_TO_AGBNO(mp, bnofsb));
+	agf->agf_levels[XFS_BTNUM_BNOi] = cpu_to_be32(1);
+
+	/* Initialize new cntbt root. */
+	error = xfs_repair_init_btblock(sc, cntfsb, &bp, XFS_BTNUM_CNT,
+			&xfs_allocbt_buf_ops);
+	if (error)
+		goto out;
+	agf->agf_roots[XFS_BTNUM_CNTi] =
+			cpu_to_be32(XFS_FSB_TO_AGBNO(mp, cntfsb));
+	agf->agf_levels[XFS_BTNUM_CNTi] = cpu_to_be32(1);
+
+	/*
+	 * Since we're abandoning the old bnobt/cntbt, we have to
+	 * decrease fdblocks by the # of blocks in those trees.
+	 * btreeblks counts the non-root blocks of the free space
+	 * and rmap btrees.  Do this before resetting the AGF counters.
+	 */
+	oldf = pag->pagf_btreeblks + 2;
+	oldf -= (be32_to_cpu(agf->agf_rmap_blocks) - 1);
+	error = xfs_mod_fdblocks(mp, -(int64_t)oldf, false);
+	if (error)
+		goto out;
+
+	/* Reset the perag info. */
+	pag->pagf_btreeblks = be32_to_cpu(agf->agf_rmap_blocks) - 1;
+	pag->pagf_freeblks = 0;
+	pag->pagf_longest = 0;
+	pag->pagf_levels[XFS_BTNUM_BNOi] =
+			be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]);
+	pag->pagf_levels[XFS_BTNUM_CNTi] =
+			be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
+
+	/* Now reset the AGF counters. */
+	agf->agf_btreeblks = cpu_to_be32(pag->pagf_btreeblks);
+	agf->agf_freeblks = cpu_to_be32(pag->pagf_freeblks);
+	agf->agf_longest = cpu_to_be32(pag->pagf_longest);
+	xfs_alloc_log_agf(sc->tp, sc->sa.agf_bp,
+			XFS_AGF_ROOTS | XFS_AGF_LEVELS | XFS_AGF_BTREEBLKS |
+			XFS_AGF_LONGEST | XFS_AGF_FREEBLKS);
+	error = xfs_repair_roll_ag_trans(sc);
+	if (error)
+		goto out;
+
+	/*
+	 * Insert the longest free extent first in case it's necessary
+	 * to refresh the AGFL with multiple blocks.
+	 */
+	xfs_rmap_skip_owner_update(&oinfo);
+	if (longest && longest->len > 0) {
+		error = xfs_repair_allocbt_free_extent(sc,
+				XFS_AGB_TO_FSB(sc->mp, sc->sa.agno,
+					longest->bno),
+				longest->len, &oinfo);
+		if (error)
+			goto out;
+		list_del(&longest->list);
+		kmem_free(longest);
+	}
+
+	/* Insert records into the new btrees. */
+	list_sort(NULL, &ra.extlist, xfs_repair_allocbt_extent_cmp);
+	list_for_each_entry_safe(rae, n, &ra.extlist, list) {
+		error = xfs_repair_allocbt_free_extent(sc,
+				XFS_AGB_TO_FSB(sc->mp, sc->sa.agno, rae->bno),
+				rae->len, &oinfo);
+		if (error)
+			goto out;
+		list_del(&rae->list);
+		kmem_free(rae);
+	}
+
+	/* Add rmap records for the btree roots */
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
+	error = xfs_rmap_alloc(sc->tp, sc->sa.agf_bp, sc->sa.agno,
+			XFS_FSB_TO_AGBNO(mp, bnofsb), 1, &oinfo);
+	if (error)
+		goto out;
+	error = xfs_rmap_alloc(sc->tp, sc->sa.agf_bp, sc->sa.agno,
+			XFS_FSB_TO_AGBNO(mp, cntfsb), 1, &oinfo);
+	if (error)
+		goto out;
+
+	/* Free all the OWN_AG blocks that are not in the rmapbt/agfl. */
+	return xfs_repair_reap_btree_extents(sc, &ra.btlist, &oinfo,
+			XFS_AG_RESV_NONE);
+out:
+	xfs_repair_cancel_btree_extents(sc, &ra.btlist);
+	xfs_repair_cancel_btree_extents(sc, &ra.nobtlist);
+	if (cur)
+		xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
+	list_for_each_entry_safe(rae, n, &ra.extlist, list) {
+		list_del(&rae->list);
+		kmem_free(rae);
+	}
+	return error;
+}
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 41198a5f872c..89938b328954 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -637,8 +637,14 @@ xfs_scrub_setup_ag_btree(
 	 * expensive operation should be performed infrequently and only
 	 * as a last resort.  Any caller that sets force_log should
 	 * document why they need to do so.
+	 *
+	 * Force everything in memory out to disk if we're repairing.
+	 * This ensures we won't get tripped up by btree blocks sitting
+	 * in memory waiting to have LSNs stamped in.  The AGF/AGI repair
+	 * routines use any available rmap data to try to find a btree
+	 * root that also passes the read verifiers.
 	 */
-	if (force_log) {
+	if (force_log || (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)) {
 		error = xfs_scrub_checkpoint_log(mp);
 		if (error)
 			return error;
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 9d69d03f1bfe..82cf2d012ebc 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -111,6 +111,7 @@ int xfs_repair_superblock(struct xfs_scrub_context *sc);
 int xfs_repair_agf(struct xfs_scrub_context *sc);
 int xfs_repair_agfl(struct xfs_scrub_context *sc);
 int xfs_repair_agi(struct xfs_scrub_context *sc);
+int xfs_repair_allocbt(struct xfs_scrub_context *sc);
 
 #else
 
@@ -137,6 +138,7 @@ xfs_repair_calc_ag_resblks(
 #define xfs_repair_agf			xfs_repair_notsupported
 #define xfs_repair_agfl			xfs_repair_notsupported
 #define xfs_repair_agi			xfs_repair_notsupported
+#define xfs_repair_allocbt		xfs_repair_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index d68cfcd31f30..213d7e21466a 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -240,13 +240,13 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.type	= ST_PERAG,
 		.setup	= xfs_scrub_setup_ag_allocbt,
 		.scrub	= xfs_scrub_bnobt,
-		.repair	= xfs_repair_notsupported,
+		.repair	= xfs_repair_allocbt,
 	},
 	[XFS_SCRUB_TYPE_CNTBT] = {	/* cntbt */
 		.type	= ST_PERAG,
 		.setup	= xfs_scrub_setup_ag_allocbt,
 		.scrub	= xfs_scrub_cntbt,
-		.repair	= xfs_repair_notsupported,
+		.repair	= xfs_repair_allocbt,
 	},
 	[XFS_SCRUB_TYPE_INOBT] = {	/* inobt */
 		.type	= ST_PERAG,



* [PATCH 04/14] xfs: repair inode btrees
  2018-05-30 19:30 [PATCH v15.2 00/14] xfs-4.18: online repair support Darrick J. Wong
                   ` (2 preceding siblings ...)
  2018-05-30 19:30 ` [PATCH 03/14] xfs: repair free space btrees Darrick J. Wong
@ 2018-05-30 19:31 ` Darrick J. Wong
  2018-06-04  3:41   ` Dave Chinner
  2018-05-30 19:31 ` [PATCH 05/14] xfs: repair the rmapbt Darrick J. Wong
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 35+ messages in thread
From: Darrick J. Wong @ 2018-05-30 19:31 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Use the rmapbt to find inode chunks, query the chunks to compute
hole and free masks, and with that information rebuild the inobt
and finobt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile              |    1 
 fs/xfs/scrub/ialloc_repair.c |  479 ++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.h        |    2 
 fs/xfs/scrub/scrub.c         |    4 
 4 files changed, 484 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/ialloc_repair.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index abe035ad0aa4..7c442f83b179 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -176,6 +176,7 @@ ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y)
 xfs-y				+= $(addprefix scrub/, \
 				   agheader_repair.o \
 				   alloc_repair.o \
+				   ialloc_repair.o \
 				   repair.o \
 				   )
 endif
diff --git a/fs/xfs/scrub/ialloc_repair.c b/fs/xfs/scrub/ialloc_repair.c
new file mode 100644
index 000000000000..dc7b3e95b6c1
--- /dev/null
+++ b/fs/xfs/scrub/ialloc_repair.c
@@ -0,0 +1,479 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_alloc.h"
+#include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_icache.h"
+#include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_log.h"
+#include "xfs_trans_priv.h"
+#include "xfs_error.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+
+/* Inode btree repair. */
+
+struct xfs_repair_ialloc_extent {
+	struct list_head		list;
+	xfs_inofree_t			freemask;
+	xfs_agino_t			startino;
+	unsigned int			count;
+	unsigned int			usedcount;
+	uint16_t			holemask;
+};
+
+struct xfs_repair_ialloc {
+	struct list_head		extlist;
+	struct xfs_repair_extent_list		btlist;
+	struct xfs_scrub_context	*sc;
+	uint64_t			nr_records;
+};
+
+/* Set usedmask if the inode is in use. */
+STATIC int
+xfs_repair_ialloc_check_free(
+	struct xfs_btree_cur	*cur,
+	struct xfs_buf		*bp,
+	xfs_ino_t		fsino,
+	xfs_agino_t		bpino,
+	bool			*inuse)
+{
+	struct xfs_mount	*mp = cur->bc_mp;
+	struct xfs_dinode	*dip;
+	int			error;
+
+	/* Will the in-core inode tell us if it's in use? */
+	error = xfs_icache_inode_is_allocated(mp, cur->bc_tp, fsino, inuse);
+	if (!error)
+		return 0;
+
+	/* Inode uncached or half assembled; read the disk buffer. */
+	dip = xfs_buf_offset(bp, bpino * mp->m_sb.sb_inodesize);
+	if (be16_to_cpu(dip->di_magic) != XFS_DINODE_MAGIC)
+		return -EFSCORRUPTED;
+
+	if (dip->di_version >= 3 && be64_to_cpu(dip->di_ino) != fsino)
+		return -EFSCORRUPTED;
+
+	*inuse = dip->di_mode != 0;
+	return 0;
+}
+
+/* Record extents that belong to inode btrees. */
+STATIC int
+xfs_repair_ialloc_extent_fn(
+	struct xfs_btree_cur		*cur,
+	struct xfs_rmap_irec		*rec,
+	void				*priv)
+{
+	struct xfs_imap			imap;
+	struct xfs_repair_ialloc	*ri = priv;
+	struct xfs_repair_ialloc_extent	*rie;
+	struct xfs_dinode		*dip;
+	struct xfs_buf			*bp;
+	struct xfs_mount		*mp = cur->bc_mp;
+	xfs_ino_t			fsino;
+	xfs_inofree_t			usedmask;
+	xfs_fsblock_t			fsbno;
+	xfs_agnumber_t			agno;
+	xfs_agblock_t			agbno;
+	xfs_agino_t			cdist;
+	xfs_agino_t			startino;
+	xfs_agino_t			clusterino;
+	xfs_agino_t			nr_inodes;
+	xfs_agino_t			inoalign;
+	xfs_agino_t			agino;
+	xfs_agino_t			rmino;
+	uint16_t			fillmask;
+	bool				inuse;
+	int				blks_per_cluster;
+	int				usedcount;
+	int				error = 0;
+
+	if (xfs_scrub_should_terminate(ri->sc, &error))
+		return error;
+
+	/* Fragment of the old btrees; dispose of them later. */
+	if (rec->rm_owner == XFS_RMAP_OWN_INOBT) {
+		fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
+				rec->rm_startblock);
+		return xfs_repair_collect_btree_extent(ri->sc, &ri->btlist,
+				fsbno, rec->rm_blockcount);
+	}
+
+	/* Skip extents that are not owned by inode chunks. */
+	if (rec->rm_owner != XFS_RMAP_OWN_INODES)
+		return 0;
+
+	agno = cur->bc_private.a.agno;
+	blks_per_cluster = xfs_icluster_size_fsb(mp);
+	nr_inodes = XFS_OFFBNO_TO_AGINO(mp, blks_per_cluster, 0);
+
+	if (rec->rm_startblock % blks_per_cluster != 0)
+		return -EFSCORRUPTED;
+
+	trace_xfs_repair_ialloc_extent_fn(mp, cur->bc_private.a.agno,
+			rec->rm_startblock, rec->rm_blockcount, rec->rm_owner,
+			rec->rm_offset, rec->rm_flags);
+
+	/*
+	 * Determine the inode block alignment, and where the block
+	 * ought to start if it's aligned properly.  On a sparse inode
+	 * system the rmap doesn't have to start on an alignment boundary,
+	 * but the record does.  On pre-sparse filesystems, we /must/
+	 * start both rmap and inobt on an alignment boundary.
+	 */
+	inoalign = xfs_ialloc_cluster_alignment(mp);
+	agbno = rec->rm_startblock;
+	agino = XFS_OFFBNO_TO_AGINO(mp, agbno, 0);
+	rmino = XFS_OFFBNO_TO_AGINO(mp, rounddown(agbno, inoalign), 0);
+	if (!xfs_sb_version_hassparseinodes(&mp->m_sb) && agino != rmino)
+		return -EFSCORRUPTED;
+
+	/*
+	 * For each cluster in this blob of inodes, we must calculate the
+	 * properly aligned startino of that cluster, then iterate each
+	 * cluster to fill in used and filled masks appropriately.  We
+	 * then use the (startino, used, filled) information to construct
+	 * the appropriate inode records.
+	 */
+	for (agbno = rec->rm_startblock;
+	     agbno < rec->rm_startblock + rec->rm_blockcount;
+	     agbno += blks_per_cluster) {
+		/* The per-AG inum of this inode cluster. */
+		agino = XFS_OFFBNO_TO_AGINO(mp, agbno, 0);
+
+		/* The per-AG inum of the inobt record. */
+		startino = rmino +
+				rounddown(agino - rmino, XFS_INODES_PER_CHUNK);
+		cdist = agino - startino;
+
+		/* Every inode in this holemask slot is filled. */
+		fillmask = xfs_inobt_maskn(
+				cdist / XFS_INODES_PER_HOLEMASK_BIT,
+				nr_inodes / XFS_INODES_PER_HOLEMASK_BIT);
+
+		/* Grab the inode cluster buffer. */
+		imap.im_blkno = XFS_AGB_TO_DADDR(mp, agno, agbno);
+		imap.im_len = XFS_FSB_TO_BB(mp, blks_per_cluster);
+		imap.im_boffset = 0;
+
+		error = xfs_imap_to_bp(mp, cur->bc_tp, &imap,
+				&dip, &bp, 0, XFS_IGET_UNTRUSTED);
+		if (error)
+			return error;
+
+		usedmask = 0;
+		usedcount = 0;
+		/* Which inodes within this cluster are free? */
+		for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
+			fsino = XFS_AGINO_TO_INO(mp, cur->bc_private.a.agno,
+					agino + clusterino);
+			error = xfs_repair_ialloc_check_free(cur, bp, fsino,
+					clusterino, &inuse);
+			if (error) {
+				xfs_trans_brelse(cur->bc_tp, bp);
+				return error;
+			}
+			if (inuse) {
+				usedcount++;
+				usedmask |= XFS_INOBT_MASK(cdist + clusterino);
+			}
+		}
+		xfs_trans_brelse(cur->bc_tp, bp);
+
+		/*
+		 * If the last item in the list is our chunk record,
+		 * update that.
+		 */
+		if (!list_empty(&ri->extlist)) {
+			rie = list_last_entry(&ri->extlist,
+					struct xfs_repair_ialloc_extent, list);
+			if (rie->startino + XFS_INODES_PER_CHUNK > startino) {
+				rie->freemask &= ~usedmask;
+				rie->holemask &= ~fillmask;
+				rie->count += nr_inodes;
+				rie->usedcount += usedcount;
+				continue;
+			}
+		}
+
+		/* New inode chunk; add to the list. */
+		rie = kmem_alloc(sizeof(struct xfs_repair_ialloc_extent),
+				KM_MAYFAIL);
+		if (!rie)
+			return -ENOMEM;
+
+		INIT_LIST_HEAD(&rie->list);
+		rie->startino = startino;
+		rie->freemask = XFS_INOBT_ALL_FREE & ~usedmask;
+		rie->holemask = XFS_INOBT_ALL_FREE & ~fillmask;
+		rie->count = nr_inodes;
+		rie->usedcount = usedcount;
+		list_add_tail(&rie->list, &ri->extlist);
+		ri->nr_records++;
+	}
+
+	return 0;
+}
+
+/* Compare two ialloc extents. */
+static int
+xfs_repair_ialloc_extent_cmp(
+	void				*priv,
+	struct list_head		*a,
+	struct list_head		*b)
+{
+	struct xfs_repair_ialloc_extent	*ap;
+	struct xfs_repair_ialloc_extent	*bp;
+
+	ap = container_of(a, struct xfs_repair_ialloc_extent, list);
+	bp = container_of(b, struct xfs_repair_ialloc_extent, list);
+
+	if (ap->startino > bp->startino)
+		return 1;
+	else if (ap->startino < bp->startino)
+		return -1;
+	return 0;
+}
+
+/* Insert an inode chunk record into a given btree. */
+static int
+xfs_repair_iallocbt_insert_btrec(
+	struct xfs_btree_cur		*cur,
+	struct xfs_repair_ialloc_extent	*rie)
+{
+	int				stat;
+	int				error;
+
+	error = xfs_inobt_lookup(cur, rie->startino, XFS_LOOKUP_EQ, &stat);
+	if (error)
+		return error;
+	XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, stat == 0);
+	error = xfs_inobt_insert_rec(cur, rie->holemask, rie->count,
+			rie->count - rie->usedcount, rie->freemask, &stat);
+	if (error)
+		return error;
+	XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, stat == 1);
+	return error;
+}
+
+/* Insert an inode chunk record into both inode btrees. */
+static int
+xfs_repair_iallocbt_insert_rec(
+	struct xfs_scrub_context	*sc,
+	struct xfs_repair_ialloc_extent	*rie)
+{
+	struct xfs_btree_cur		*cur;
+	int				error;
+
+	trace_xfs_repair_ialloc_insert(sc->mp, sc->sa.agno, rie->startino,
+			rie->holemask, rie->count, rie->count - rie->usedcount,
+			rie->freemask);
+
+	/* Insert into the inobt. */
+	cur = xfs_inobt_init_cursor(sc->mp, sc->tp, sc->sa.agi_bp, sc->sa.agno,
+			XFS_BTNUM_INO);
+	error = xfs_repair_iallocbt_insert_btrec(cur, rie);
+	if (error)
+		goto out_cur;
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+
+	/* Insert into the finobt if chunk has free inodes. */
+	if (xfs_sb_version_hasfinobt(&sc->mp->m_sb) &&
+	    rie->count != rie->usedcount) {
+		cur = xfs_inobt_init_cursor(sc->mp, sc->tp, sc->sa.agi_bp,
+				sc->sa.agno, XFS_BTNUM_FINO);
+		error = xfs_repair_iallocbt_insert_btrec(cur, rie);
+		if (error)
+			goto out_cur;
+		xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+	}
+
+	return xfs_repair_roll_ag_trans(sc);
+out_cur:
+	xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
+	return error;
+}
+
+/* Repair both inode btrees. */
+int
+xfs_repair_iallocbt(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_repair_ialloc	ri;
+	struct xfs_owner_info		oinfo;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*bp;
+	struct xfs_repair_ialloc_extent	*rie;
+	struct xfs_repair_ialloc_extent	*n;
+	struct xfs_agi			*agi;
+	struct xfs_btree_cur		*cur = NULL;
+	struct xfs_perag		*pag;
+	xfs_fsblock_t			inofsb;
+	xfs_fsblock_t			finofsb;
+	xfs_extlen_t			nr_blocks;
+	xfs_agino_t			old_count;
+	xfs_agino_t			old_freecount;
+	xfs_agino_t			freecount;
+	unsigned int			count;
+	unsigned int			usedcount;
+	int				logflags;
+	int				error = 0;
+
+	/* We require the rmapbt to rebuild anything. */
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return -EOPNOTSUPP;
+
+	xfs_scrub_perag_get(sc->mp, &sc->sa);
+	pag = sc->sa.pag;
+	/* Collect all reverse mappings for inode blocks. */
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT);
+	INIT_LIST_HEAD(&ri.extlist);
+	xfs_repair_init_extent_list(&ri.btlist);
+	ri.nr_records = 0;
+	ri.sc = sc;
+
+	cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno);
+	error = xfs_rmap_query_all(cur, xfs_repair_ialloc_extent_fn, &ri);
+	if (error)
+		goto out;
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+	cur = NULL;
+
+	/* Do we actually have enough space to do this? */
+	nr_blocks = xfs_iallocbt_calc_size(mp, ri.nr_records);
+	if (xfs_sb_version_hasfinobt(&mp->m_sb))
+		nr_blocks *= 2;
+	if (!xfs_repair_ag_has_space(pag, nr_blocks, XFS_AG_RESV_NONE)) {
+		error = -ENOSPC;
+		goto out;
+	}
+
+	/* Invalidate all the inobt/finobt blocks in btlist. */
+	error = xfs_repair_invalidate_blocks(sc, &ri.btlist);
+	if (error)
+		goto out;
+
+	agi = XFS_BUF_TO_AGI(sc->sa.agi_bp);
+	/* Initialize new btree roots. */
+	error = xfs_repair_alloc_ag_block(sc, &oinfo, &inofsb,
+			XFS_AG_RESV_NONE);
+	if (error)
+		goto out;
+	error = xfs_repair_init_btblock(sc, inofsb, &bp, XFS_BTNUM_INO,
+			&xfs_inobt_buf_ops);
+	if (error)
+		goto out;
+	agi->agi_root = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, inofsb));
+	agi->agi_level = cpu_to_be32(1);
+	logflags = XFS_AGI_ROOT | XFS_AGI_LEVEL;
+
+	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
+		error = xfs_repair_alloc_ag_block(sc, &oinfo, &finofsb,
+				mp->m_inotbt_nores ? XFS_AG_RESV_NONE :
+						     XFS_AG_RESV_METADATA);
+		if (error)
+			goto out;
+		error = xfs_repair_init_btblock(sc, finofsb, &bp,
+				XFS_BTNUM_FINO, &xfs_inobt_buf_ops);
+		if (error)
+			goto out;
+		agi->agi_free_root = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, finofsb));
+		agi->agi_free_level = cpu_to_be32(1);
+		logflags |= XFS_AGI_FREE_ROOT | XFS_AGI_FREE_LEVEL;
+	}
+
+	xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp, logflags);
+	error = xfs_repair_roll_ag_trans(sc);
+	if (error)
+		goto out;
+
+	/* Insert records into the new btrees. */
+	count = 0;
+	usedcount = 0;
+	list_sort(NULL, &ri.extlist, xfs_repair_ialloc_extent_cmp);
+	list_for_each_entry_safe(rie, n, &ri.extlist, list) {
+		count += rie->count;
+		usedcount += rie->usedcount;
+
+		error = xfs_repair_iallocbt_insert_rec(sc, rie);
+		if (error)
+			goto out;
+
+		list_del(&rie->list);
+		kmem_free(rie);
+	}
+
+	/* Update the AGI counters. */
+	agi = XFS_BUF_TO_AGI(sc->sa.agi_bp);
+	old_count = be32_to_cpu(agi->agi_count);
+	old_freecount = be32_to_cpu(agi->agi_freecount);
+	freecount = count - usedcount;
+
+	xfs_repair_mod_ino_counts(sc, old_count, count, old_freecount,
+			freecount);
+
+	if (count != old_count) {
+		if (sc->sa.pag->pagi_init)
+			sc->sa.pag->pagi_count = count;
+		agi->agi_count = cpu_to_be32(count);
+		xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp, XFS_AGI_COUNT);
+	}
+
+	if (freecount != old_freecount) {
+		if (sc->sa.pag->pagi_init)
+			sc->sa.pag->pagi_freecount = freecount;
+		agi->agi_freecount = cpu_to_be32(freecount);
+		xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp, XFS_AGI_FREECOUNT);
+	}
+
+	/* Free the old inode btree blocks if they're not in use. */
+	return xfs_repair_reap_btree_extents(sc, &ri.btlist, &oinfo,
+			XFS_AG_RESV_NONE);
+out:
+	if (cur)
+		xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
+	xfs_repair_cancel_btree_extents(sc, &ri.btlist);
+	list_for_each_entry_safe(rie, n, &ri.extlist, list) {
+		list_del(&rie->list);
+		kmem_free(rie);
+	}
+	return error;
+}
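[Editor's aside: the rebuild loop above rolls every reconstructed inode-chunk record into running count/usedcount totals and then derives the AGI free count as the difference. A hypothetical stand-alone sketch of that roll-up (all names invented; not the kernel API):]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of the counter roll-up performed before updating the AGI:
 * each reconstructed inode-chunk record contributes its total and
 * in-use inode counts; freecount is derived as total minus in-use.
 */
struct ichunk_rec {
	uint32_t	count;		/* inodes in this chunk */
	uint32_t	usedcount;	/* inodes currently allocated */
};

struct agi_counts {
	uint32_t	count;
	uint32_t	freecount;
};

static struct agi_counts
sum_ichunks(const struct ichunk_rec *recs, size_t nr)
{
	struct agi_counts	out = { 0, 0 };
	uint32_t		used = 0;
	size_t			i;

	for (i = 0; i < nr; i++) {
		out.count += recs[i].count;
		used += recs[i].usedcount;
	}
	out.freecount = out.count - used;
	return out;
}
```

The kernel code additionally logs XFS_AGI_COUNT/XFS_AGI_FREECOUNT only when the derived values differ from the on-disk ones, so an already-correct AGI is left untouched.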
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 82cf2d012ebc..eaacbd589754 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -112,6 +112,7 @@ int xfs_repair_agf(struct xfs_scrub_context *sc);
 int xfs_repair_agfl(struct xfs_scrub_context *sc);
 int xfs_repair_agi(struct xfs_scrub_context *sc);
 int xfs_repair_allocbt(struct xfs_scrub_context *sc);
+int xfs_repair_iallocbt(struct xfs_scrub_context *sc);
 
 #else
 
@@ -139,6 +140,7 @@ xfs_repair_calc_ag_resblks(
 #define xfs_repair_agfl			xfs_repair_notsupported
 #define xfs_repair_agi			xfs_repair_notsupported
 #define xfs_repair_allocbt		xfs_repair_notsupported
+#define xfs_repair_iallocbt		xfs_repair_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 213d7e21466a..95f82c7f77f6 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -252,14 +252,14 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.type	= ST_PERAG,
 		.setup	= xfs_scrub_setup_ag_iallocbt,
 		.scrub	= xfs_scrub_inobt,
-		.repair	= xfs_repair_notsupported,
+		.repair	= xfs_repair_iallocbt,
 	},
 	[XFS_SCRUB_TYPE_FINOBT] = {	/* finobt */
 		.type	= ST_PERAG,
 		.setup	= xfs_scrub_setup_ag_iallocbt,
 		.scrub	= xfs_scrub_finobt,
 		.has	= xfs_sb_version_hasfinobt,
-		.repair	= xfs_repair_notsupported,
+		.repair	= xfs_repair_iallocbt,
 	},
 	[XFS_SCRUB_TYPE_RMAPBT] = {	/* rmapbt */
 		.type	= ST_PERAG,


* [PATCH 05/14] xfs: repair the rmapbt
From: Darrick J. Wong @ 2018-05-30 19:31 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Rebuild the reverse mapping btree from all primary metadata.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile            |    1 
 fs/xfs/scrub/common.c      |    6 
 fs/xfs/scrub/repair.c      |  119 +++++++
 fs/xfs/scrub/repair.h      |   27 +
 fs/xfs/scrub/rmap.c        |    6 
 fs/xfs/scrub/rmap_repair.c |  796 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c       |   18 +
 fs/xfs/scrub/scrub.h       |    2 
 fs/xfs/xfs_mount.h         |    1 
 fs/xfs/xfs_super.c         |   27 +
 fs/xfs/xfs_trans.c         |    7 
 11 files changed, 1004 insertions(+), 6 deletions(-)
 create mode 100644 fs/xfs/scrub/rmap_repair.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 7c442f83b179..b9bbac3d5075 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -178,6 +178,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   alloc_repair.o \
 				   ialloc_repair.o \
 				   repair.o \
+				   rmap_repair.o \
 				   )
 endif
 endif
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 89938b328954..f92994716522 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -603,9 +603,13 @@ xfs_scrub_trans_alloc(
 	struct xfs_scrub_context	*sc,
 	uint				resblks)
 {
+	uint				flags = 0;
+
+	if (sc->fs_frozen)
+		flags |= XFS_TRANS_NO_WRITECOUNT;
 	if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
 		return xfs_trans_alloc(sc->mp, &M_RES(sc->mp)->tr_itruncate,
-				resblks, 0, 0, &sc->tp);
+				resblks, 0, flags, &sc->tp);
 
 	return xfs_trans_alloc_empty(sc->mp, &sc->tp);
 }
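[Editor's aside: the hunk above is subtle — an ordinary transaction bumps the superblock write counter and would deadlock against a freeze that the repair code itself initiated, so a scrub context that froze the fs must ask for a no-writecount transaction. A minimal sketch of that flag selection, with invented names and flag values:]

```c
#include <stdbool.h>

/* Stand-in for XFS_TRANS_NO_WRITECOUNT; value invented for the sketch. */
#define TRANS_NO_WRITECOUNT	(1u << 0)

/*
 * Pick transaction-allocation flags for a repair: if this context froze
 * the filesystem, skip the write counter so allocation cannot block on
 * the freeze we hold.
 */
static unsigned int
repair_trans_flags(bool fs_frozen)
{
	unsigned int	flags = 0;

	if (fs_frozen)
		flags |= TRANS_NO_WRITECOUNT;
	return flags;
}
```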
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index 45a91841c0ac..4b5d599d53b9 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -43,6 +43,8 @@
 #include "xfs_ag_resv.h"
 #include "xfs_trans_space.h"
 #include "xfs_quota.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
@@ -1146,3 +1148,120 @@ xfs_repair_mod_ino_counts(
 				(int64_t)freecount - old_freecount);
 	}
 }
+
+/*
+ * Freeze the FS against all other activity so that we can avoid ABBA
+ * deadlocks while taking locks in unusual orders so that we can rebuild
+ * metadata structures such as the rmapbt.
+ */
+int
+xfs_repair_fs_freeze(
+	struct xfs_scrub_context	*sc)
+{
+	int				error;
+
+	error = freeze_super(sc->mp->m_super);
+	if (error)
+		return error;
+	sc->fs_frozen = true;
+	return 0;
+}
+
+/* Unfreeze the FS. */
+int
+xfs_repair_fs_thaw(
+	struct xfs_scrub_context	*sc)
+{
+	struct inode			*inode, *o;
+	int				error;
+
+	sc->fs_frozen = false;
+	error = thaw_super(sc->mp->m_super);
+
+	inode = sc->frozen_inode_list;
+	while (inode) {
+		o = inode->i_private;
+		inode->i_private = NULL;
+		iput(inode);
+		inode = o;
+	}
+
+	return error;
+}
+
+/*
+ * Release an inode while the fs is frozen for a repair.
+ *
+ * We froze the fs so that everything in the fs will be static except for the
+ * metadata that we are rebuilding.  Users can't modify things and periodic
+ * block reclaim is stopped, which leaves only the reclamation that happens
+ * as part of evicting an inode from memory.  We can't have that either, so
+ * redirect those inodes onto a side list and free them once we've thawed the
+ * fs.  Note that memory reclaim is allowed to get to the other inodes.
+ */
+void
+xfs_repair_frozen_iput(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	struct xfs_bmbt_irec		imap;
+	xfs_fileoff_t			end_fsb;
+	xfs_fileoff_t			last_fsb;
+	xfs_filblks_t			map_len;
+	int				nimaps;
+	int				error;
+
+	if (!xfs_can_free_eofblocks(ip, true))
+		goto iput;
+
+	/*
+	 * Figure out if there are any blocks beyond the end of the file.
+	 * If not, then free immediately.
+	 */
+	end_fsb = XFS_B_TO_FSB(sc->mp, (xfs_ufsize_t)XFS_ISIZE(ip));
+	last_fsb = XFS_B_TO_FSB(sc->mp, sc->mp->m_super->s_maxbytes);
+	if (last_fsb <= end_fsb)
+		goto iput;
+	map_len = last_fsb - end_fsb;
+
+	nimaps = 1;
+	xfs_ilock(ip, XFS_ILOCK_SHARED);
+	error = xfs_bmapi_read(ip, end_fsb, map_len, &imap, &nimaps, 0);
+	xfs_iunlock(ip, XFS_ILOCK_SHARED);
+
+	/*
+	 * If there are blocks after the end of file, hang on to them so that
+	 * they don't get destroyed while we aren't able to handle any fs
+	 * modifications.
+	 */
+	if (!error && (nimaps != 0) &&
+	    (imap.br_startblock != HOLESTARTBLOCK ||
+	     ip->i_delayed_blks)) {
+		VFS_I(ip)->i_private = sc->frozen_inode_list;
+		sc->frozen_inode_list = VFS_I(ip);
+		return;
+	}
+iput:
+	iput(VFS_I(ip));
+}
+
+/* Read all AG headers and attach to this transaction. */
+int
+xfs_repair_grab_all_ag_headers(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*agi;
+	struct xfs_buf			*agf;
+	struct xfs_buf			*agfl;
+	xfs_agnumber_t			agno;
+	int				error = 0;
+
+	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+		error = xfs_scrub_ag_read_headers(sc, agno, &agi, &agf, &agfl);
+		if (error)
+			break;
+	}
+
+	return error;
+}
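[Editor's aside: xfs_repair_frozen_iput() above parks inodes on a singly linked list threaded through i_private so their release is deferred until after the thaw. A hypothetical sketch of that side-list trick (invented names; released stands in for the real iput()):]

```c
#include <stddef.h>

/* An object whose release must be deferred while the fs is frozen. */
struct held_inode {
	struct held_inode	*i_private;	/* next deferred inode */
	int			released;
};

/* Push an inode onto the deferred-release list. */
static void
defer_release(struct held_inode **list, struct held_inode *ip)
{
	ip->i_private = *list;
	*list = ip;
}

/* After thawing, drain the list and release everything on it. */
static void
drain_deferred(struct held_inode **list)
{
	struct held_inode	*ip = *list;
	struct held_inode	*next;

	*list = NULL;
	while (ip) {
		next = ip->i_private;
		ip->i_private = NULL;
		ip->released = 1;	/* stands in for iput() */
		ip = next;
	}
}
```

Threading the list through a field the object already owns means no allocation is needed at defer time, which matters here because the deferral happens precisely when the fs cannot absorb new transactions.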
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index eaacbd589754..a3ee8d1035c0 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -103,6 +103,11 @@ int xfs_repair_mod_fdblocks(struct xfs_scrub_context *sc,
 void xfs_repair_mod_ino_counts(struct xfs_scrub_context *sc,
 		xfs_agino_t old_count, xfs_agino_t count,
 		xfs_agino_t old_freecount, xfs_agino_t freecount);
+int xfs_repair_fs_freeze(struct xfs_scrub_context *sc);
+int xfs_repair_fs_thaw(struct xfs_scrub_context *sc);
+void xfs_repair_frozen_iput(struct xfs_scrub_context *sc, struct xfs_inode *ip);
+int xfs_repair_grab_all_ag_headers(struct xfs_scrub_context *sc);
+int xfs_repair_rmapbt_setup(struct xfs_scrub_context *sc, struct xfs_inode *ip);
 
 /* Metadata repairers */
 
@@ -113,6 +118,7 @@ int xfs_repair_agfl(struct xfs_scrub_context *sc);
 int xfs_repair_agi(struct xfs_scrub_context *sc);
 int xfs_repair_allocbt(struct xfs_scrub_context *sc);
 int xfs_repair_iallocbt(struct xfs_scrub_context *sc);
+int xfs_repair_rmapbt(struct xfs_scrub_context *sc);
 
 #else
 
@@ -134,6 +140,26 @@ xfs_repair_calc_ag_resblks(
 	return 0;
 }
 
+static inline int xfs_repair_fs_freeze(struct xfs_scrub_context *sc)
+{
+	ASSERT(0);
+	return -EOPNOTSUPP;
+}
+
+static inline int xfs_repair_fs_thaw(struct xfs_scrub_context *sc)
+{
+	ASSERT(0);
+	return -EIO;
+}
+
+static inline int xfs_repair_rmapbt_setup(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	/* We don't support rmap repair, but we can still do a scan. */
+	return xfs_scrub_setup_ag_btree(sc, ip, false);
+}
+
 #define xfs_repair_probe		xfs_repair_notsupported
 #define xfs_repair_superblock		xfs_repair_notsupported
 #define xfs_repair_agf			xfs_repair_notsupported
@@ -141,6 +167,7 @@ xfs_repair_calc_ag_resblks(
 #define xfs_repair_agi			xfs_repair_notsupported
 #define xfs_repair_allocbt		xfs_repair_notsupported
 #define xfs_repair_iallocbt		xfs_repair_notsupported
+#define xfs_repair_rmapbt		xfs_repair_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/rmap.c b/fs/xfs/scrub/rmap.c
index b376a9a77c04..14d9b7b40f7b 100644
--- a/fs/xfs/scrub/rmap.c
+++ b/fs/xfs/scrub/rmap.c
@@ -38,6 +38,7 @@
 #include "scrub/common.h"
 #include "scrub/btree.h"
 #include "scrub/trace.h"
+#include "scrub/repair.h"
 
 /*
  * Set us up to scrub reverse mapping btrees.
@@ -47,7 +48,10 @@ xfs_scrub_setup_ag_rmapbt(
 	struct xfs_scrub_context	*sc,
 	struct xfs_inode		*ip)
 {
-	return xfs_scrub_setup_ag_btree(sc, ip, false);
+	if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
+		return xfs_repair_rmapbt_setup(sc, ip);
+	else
+		return xfs_scrub_setup_ag_btree(sc, ip, false);
 }
 
 /* Reverse-mapping scrubber. */
diff --git a/fs/xfs/scrub/rmap_repair.c b/fs/xfs/scrub/rmap_repair.c
new file mode 100644
index 000000000000..6d2883f11c49
--- /dev/null
+++ b/fs/xfs/scrub/rmap_repair.c
@@ -0,0 +1,796 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_alloc.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_refcount.h"
+#include "xfs_refcount_btree.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+
+/* Reverse-mapping repair. */
+
+/* Set us up to repair reverse mapping btrees. */
+int
+xfs_repair_rmapbt_setup(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip)
+{
+	int				error;
+
+	/*
+	 * Freeze out anything that can lock an inode.  We reconstruct
+	 * the rmapbt by reading inode bmaps with the AGF held, which is
+	 * only safe w.r.t. ABBA deadlocks if we're the only ones locking
+	 * inodes.
+	 */
+	error = xfs_repair_fs_freeze(sc);
+	if (error)
+		return error;
+
+	/* Check the AG number and set up the scrub context. */
+	error = xfs_scrub_setup_fs(sc, ip);
+	if (error)
+		return error;
+
+	/*
+	 * Lock all the AG header buffers so that we can read all the
+	 * per-AG metadata too.
+	 */
+	error = xfs_repair_grab_all_ag_headers(sc);
+	if (error)
+		return error;
+
+	return xfs_scrub_ag_init(sc, sc->sm->sm_agno, &sc->sa);
+}
+
+struct xfs_repair_rmapbt_extent {
+	struct list_head		list;
+	struct xfs_rmap_irec		rmap;
+};
+
+struct xfs_repair_rmapbt {
+	struct list_head		rmaplist;
+	struct xfs_repair_extent_list	rmap_freelist;
+	struct xfs_repair_extent_list	bno_freelist;
+	struct xfs_scrub_context	*sc;
+	uint64_t			owner;
+	xfs_extlen_t			btblocks;
+	xfs_agblock_t			next_bno;
+	uint64_t			nr_records;
+};
+
+/* Initialize an rmap. */
+static inline int
+xfs_repair_rmapbt_new_rmap(
+	struct xfs_repair_rmapbt	*rr,
+	xfs_agblock_t			startblock,
+	xfs_extlen_t			blockcount,
+	uint64_t			owner,
+	uint64_t			offset,
+	unsigned int			flags)
+{
+	struct xfs_repair_rmapbt_extent	*rre;
+	int				error = 0;
+
+	trace_xfs_repair_rmap_extent_fn(rr->sc->mp, rr->sc->sa.agno,
+			startblock, blockcount, owner, offset, flags);
+
+	if (xfs_scrub_should_terminate(rr->sc, &error))
+		return error;
+
+	rre = kmem_alloc(sizeof(struct xfs_repair_rmapbt_extent), KM_MAYFAIL);
+	if (!rre)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&rre->list);
+	rre->rmap.rm_startblock = startblock;
+	rre->rmap.rm_blockcount = blockcount;
+	rre->rmap.rm_owner = owner;
+	rre->rmap.rm_offset = offset;
+	rre->rmap.rm_flags = flags;
+	list_add_tail(&rre->list, &rr->rmaplist);
+	rr->nr_records++;
+
+	return 0;
+}
+
+/* Add an AGFL block to the rmap list. */
+STATIC int
+xfs_repair_rmapbt_walk_agfl(
+	struct xfs_mount		*mp,
+	xfs_agblock_t			bno,
+	void				*priv)
+{
+	struct xfs_repair_rmapbt	*rr = priv;
+
+	return xfs_repair_rmapbt_new_rmap(rr, bno, 1, XFS_RMAP_OWN_AG, 0, 0);
+}
+
+/* Add a btree block to the rmap list. */
+STATIC int
+xfs_repair_rmapbt_visit_btblock(
+	struct xfs_btree_cur		*cur,
+	int				level,
+	void				*priv)
+{
+	struct xfs_repair_rmapbt	*rr = priv;
+	struct xfs_buf			*bp;
+	xfs_fsblock_t			fsb;
+
+	xfs_btree_get_block(cur, level, &bp);
+	if (!bp)
+		return 0;
+
+	rr->btblocks++;
+	fsb = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
+	return xfs_repair_rmapbt_new_rmap(rr, XFS_FSB_TO_AGBNO(cur->bc_mp, fsb),
+			1, rr->owner, 0, 0);
+}
+
+/* Record inode btree rmaps. */
+STATIC int
+xfs_repair_rmapbt_inodes(
+	struct xfs_btree_cur		*cur,
+	union xfs_btree_rec		*rec,
+	void				*priv)
+{
+	struct xfs_inobt_rec_incore	irec;
+	struct xfs_repair_rmapbt	*rr = priv;
+	struct xfs_mount		*mp = cur->bc_mp;
+	struct xfs_buf			*bp;
+	xfs_fsblock_t			fsb;
+	xfs_agino_t			agino;
+	xfs_agino_t			iperhole;
+	unsigned int			i;
+	int				error;
+
+	/* Record the inobt blocks */
+	for (i = 0; i < cur->bc_nlevels && cur->bc_ptrs[i] == 1; i++) {
+		xfs_btree_get_block(cur, i, &bp);
+		if (!bp)
+			continue;
+		fsb = XFS_DADDR_TO_FSB(mp, bp->b_bn);
+		error = xfs_repair_rmapbt_new_rmap(rr,
+				XFS_FSB_TO_AGBNO(mp, fsb), 1,
+				XFS_RMAP_OWN_INOBT, 0, 0);
+		if (error)
+			return error;
+	}
+
+	xfs_inobt_btrec_to_irec(mp, rec, &irec);
+
+	/* Record a non-sparse inode chunk. */
+	if (irec.ir_holemask == XFS_INOBT_HOLEMASK_FULL)
+		return xfs_repair_rmapbt_new_rmap(rr,
+				XFS_AGINO_TO_AGBNO(mp, irec.ir_startino),
+				XFS_INODES_PER_CHUNK / mp->m_sb.sb_inopblock,
+				XFS_RMAP_OWN_INODES, 0, 0);
+
+	/* Iterate each chunk. */
+	iperhole = max_t(xfs_agino_t, mp->m_sb.sb_inopblock,
+			XFS_INODES_PER_HOLEMASK_BIT);
+	for (i = 0, agino = irec.ir_startino;
+	     i < XFS_INOBT_HOLEMASK_BITS;
+	     i += iperhole / XFS_INODES_PER_HOLEMASK_BIT, agino += iperhole) {
+		/* Skip holes. */
+		if (irec.ir_holemask & (1 << i))
+			continue;
+
+		/* Record the inode chunk otherwise. */
+		error = xfs_repair_rmapbt_new_rmap(rr,
+				XFS_AGINO_TO_AGBNO(mp, agino),
+				iperhole / mp->m_sb.sb_inopblock,
+				XFS_RMAP_OWN_INODES, 0, 0);
+		if (error)
+			return error;
+	}
+
+	return 0;
+}
+
+/* Record a CoW staging extent. */
+STATIC int
+xfs_repair_rmapbt_refcount(
+	struct xfs_btree_cur		*cur,
+	union xfs_btree_rec		*rec,
+	void				*priv)
+{
+	struct xfs_repair_rmapbt	*rr = priv;
+	struct xfs_refcount_irec	refc;
+
+	xfs_refcount_btrec_to_irec(rec, &refc);
+	if (refc.rc_refcount != 1)
+		return -EFSCORRUPTED;
+
+	return xfs_repair_rmapbt_new_rmap(rr,
+			refc.rc_startblock - XFS_REFC_COW_START,
+			refc.rc_blockcount, XFS_RMAP_OWN_COW, 0, 0);
+}
+
+/* Add a bmbt block to the rmap list. */
+STATIC int
+xfs_repair_rmapbt_visit_bmbt(
+	struct xfs_btree_cur		*cur,
+	int				level,
+	void				*priv)
+{
+	struct xfs_repair_rmapbt	*rr = priv;
+	struct xfs_buf			*bp;
+	xfs_fsblock_t			fsb;
+	unsigned int			flags = XFS_RMAP_BMBT_BLOCK;
+
+	xfs_btree_get_block(cur, level, &bp);
+	if (!bp)
+		return 0;
+
+	fsb = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
+	if (XFS_FSB_TO_AGNO(cur->bc_mp, fsb) != rr->sc->sa.agno)
+		return 0;
+
+	if (cur->bc_private.b.whichfork == XFS_ATTR_FORK)
+		flags |= XFS_RMAP_ATTR_FORK;
+	return xfs_repair_rmapbt_new_rmap(rr,
+			XFS_FSB_TO_AGBNO(cur->bc_mp, fsb), 1,
+			cur->bc_private.b.ip->i_ino, 0, flags);
+}
+
+/* Determine rmap flags from fork and bmbt state. */
+static inline unsigned int
+xfs_repair_rmapbt_bmap_flags(
+	int			whichfork,
+	xfs_exntst_t		state)
+{
+	return  (whichfork == XFS_ATTR_FORK ? XFS_RMAP_ATTR_FORK : 0) |
+		(state == XFS_EXT_UNWRITTEN ? XFS_RMAP_UNWRITTEN : 0);
+}
+
+/* Find all the extents from a given AG in an inode fork. */
+STATIC int
+xfs_repair_rmapbt_scan_ifork(
+	struct xfs_repair_rmapbt	*rr,
+	struct xfs_inode		*ip,
+	int				whichfork)
+{
+	struct xfs_bmbt_irec		rec;
+	struct xfs_iext_cursor		icur;
+	struct xfs_mount		*mp = rr->sc->mp;
+	struct xfs_btree_cur		*cur = NULL;
+	struct xfs_ifork		*ifp;
+	unsigned int			rflags;
+	int				fmt;
+	int				error = 0;
+
+	/* Do we even have data mapping extents? */
+	fmt = XFS_IFORK_FORMAT(ip, whichfork);
+	ifp = XFS_IFORK_PTR(ip, whichfork);
+	switch (fmt) {
+	case XFS_DINODE_FMT_BTREE:
+		if (!(ifp->if_flags & XFS_IFEXTENTS)) {
+			error = xfs_iread_extents(rr->sc->tp, ip, whichfork);
+			if (error)
+				return error;
+		}
+		break;
+	case XFS_DINODE_FMT_EXTENTS:
+		break;
+	default:
+		return 0;
+	}
+	if (!ifp)
+		return 0;
+
+	/* Find all the BMBT blocks in the AG. */
+	if (fmt == XFS_DINODE_FMT_BTREE) {
+		cur = xfs_bmbt_init_cursor(mp, rr->sc->tp, ip, whichfork);
+		error = xfs_btree_visit_blocks(cur,
+				xfs_repair_rmapbt_visit_bmbt, rr);
+		if (error)
+			goto out;
+		xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+		cur = NULL;
+	}
+
+	/* We're done if this is an rt inode's data fork. */
+	if (whichfork == XFS_DATA_FORK && XFS_IS_REALTIME_INODE(ip))
+		return 0;
+
+	/* Find all the extents in the AG. */
+	for_each_xfs_iext(ifp, &icur, &rec) {
+		if (isnullstartblock(rec.br_startblock))
+			continue;
+		/* Stash non-hole extent. */
+		if (XFS_FSB_TO_AGNO(mp, rec.br_startblock) == rr->sc->sa.agno) {
+			rflags = xfs_repair_rmapbt_bmap_flags(whichfork,
+					rec.br_state);
+			error = xfs_repair_rmapbt_new_rmap(rr,
+					XFS_FSB_TO_AGBNO(mp, rec.br_startblock),
+					rec.br_blockcount, ip->i_ino,
+					rec.br_startoff, rflags);
+			if (error)
+				goto out;
+		}
+	}
+out:
+	if (cur)
+		xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
+	return error;
+}
+
+/* Iterate all the inodes in an AG. */
+STATIC int
+xfs_repair_rmapbt_scan_inobt(
+	struct xfs_btree_cur		*cur,
+	union xfs_btree_rec		*rec,
+	void				*priv)
+{
+	struct xfs_inobt_rec_incore	irec;
+	struct xfs_repair_rmapbt	*rr = priv;
+	struct xfs_mount		*mp = cur->bc_mp;
+	struct xfs_inode		*ip = NULL;
+	xfs_ino_t			ino;
+	xfs_agino_t			agino;
+	int				chunkidx;
+	int				lock_mode = 0;
+	int				error = 0;
+
+	xfs_inobt_btrec_to_irec(mp, rec, &irec);
+
+	for (chunkidx = 0, agino = irec.ir_startino;
+	     chunkidx < XFS_INODES_PER_CHUNK;
+	     chunkidx++, agino++) {
+		bool	inuse;
+
+		/* Skip if this inode is free */
+		if (XFS_INOBT_MASK(chunkidx) & irec.ir_free)
+			continue;
+		ino = XFS_AGINO_TO_INO(mp, cur->bc_private.a.agno, agino);
+
+		/* Back off and try again if an inode is being reclaimed */
+		error = xfs_icache_inode_is_allocated(mp, cur->bc_tp, ino,
+				&inuse);
+		if (error == -EAGAIN)
+			return -EDEADLOCK;
+
+		/*
+		 * Grab inode for scanning.  We cannot use DONTCACHE here
+		 * because we already have a transaction so the iput must not
+		 * trigger inode reclaim (which might allocate a transaction
+		 * to clean up posteof blocks).
+		 */
+		error = xfs_iget(mp, cur->bc_tp, ino, 0, 0, &ip);
+		if (error)
+			return error;
+
+		if ((ip->i_d.di_format == XFS_DINODE_FMT_BTREE &&
+		     !(ip->i_df.if_flags & XFS_IFEXTENTS)) ||
+		    (ip->i_d.di_aformat == XFS_DINODE_FMT_BTREE &&
+		     !(ip->i_afp->if_flags & XFS_IFEXTENTS)))
+			lock_mode = XFS_ILOCK_EXCL;
+		else
+			lock_mode = XFS_ILOCK_SHARED;
+		if (!xfs_ilock_nowait(ip, lock_mode)) {
+			error = -EBUSY;
+			goto out_rele;
+		}
+
+		/* Check the data fork. */
+		error = xfs_repair_rmapbt_scan_ifork(rr, ip, XFS_DATA_FORK);
+		if (error)
+			goto out_unlock;
+
+		/* Check the attr fork. */
+		error = xfs_repair_rmapbt_scan_ifork(rr, ip, XFS_ATTR_FORK);
+		if (error)
+			goto out_unlock;
+
+		xfs_iunlock(ip, lock_mode);
+		xfs_repair_frozen_iput(rr->sc, ip);
+		ip = NULL;
+	}
+
+	return error;
+out_unlock:
+	xfs_iunlock(ip, lock_mode);
+out_rele:
+	iput(VFS_I(ip));
+	return error;
+}
+
+/* Record extents that aren't in use from gaps in the rmap records. */
+STATIC int
+xfs_repair_rmapbt_record_rmap_freesp(
+	struct xfs_btree_cur		*cur,
+	struct xfs_rmap_irec		*rec,
+	void				*priv)
+{
+	struct xfs_repair_rmapbt	*rr = priv;
+	xfs_fsblock_t			fsb;
+	int				error;
+
+	/* Record the free space we find. */
+	if (rec->rm_startblock > rr->next_bno) {
+		fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
+				rr->next_bno);
+		error = xfs_repair_collect_btree_extent(rr->sc,
+				&rr->rmap_freelist, fsb,
+				rec->rm_startblock - rr->next_bno);
+		if (error)
+			return error;
+	}
+	rr->next_bno = max_t(xfs_agblock_t, rr->next_bno,
+			rec->rm_startblock + rec->rm_blockcount);
+	return 0;
+}
+
+/* Record extents that aren't in use from the bnobt records. */
+STATIC int
+xfs_repair_rmapbt_record_bno_freesp(
+	struct xfs_btree_cur		*cur,
+	struct xfs_alloc_rec_incore	*rec,
+	void				*priv)
+{
+	struct xfs_repair_rmapbt	*rr = priv;
+	xfs_fsblock_t			fsb;
+
+	/* Record the free space we find. */
+	fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
+			rec->ar_startblock);
+	return xfs_repair_collect_btree_extent(rr->sc, &rr->bno_freelist,
+			fsb, rec->ar_blockcount);
+}
+
+/* Compare two rmapbt extents. */
+static int
+xfs_repair_rmapbt_extent_cmp(
+	void				*priv,
+	struct list_head		*a,
+	struct list_head		*b)
+{
+	struct xfs_repair_rmapbt_extent	*ap;
+	struct xfs_repair_rmapbt_extent	*bp;
+
+	ap = container_of(a, struct xfs_repair_rmapbt_extent, list);
+	bp = container_of(b, struct xfs_repair_rmapbt_extent, list);
+	return xfs_rmap_compare(&ap->rmap, &bp->rmap);
+}
+
+#define RMAP(type, startblock, blockcount) xfs_repair_rmapbt_new_rmap( \
+		&rr, (startblock), (blockcount), \
+		XFS_RMAP_OWN_##type, 0, 0)
+/* Repair the rmap btree for some AG. */
+int
+xfs_repair_rmapbt(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_repair_rmapbt	rr;
+	struct xfs_owner_info		oinfo;
+	struct xfs_repair_rmapbt_extent	*rre;
+	struct xfs_repair_rmapbt_extent	*n;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_btree_cur		*cur = NULL;
+	struct xfs_buf			*bp = NULL;
+	struct xfs_agf			*agf;
+	struct xfs_agi			*agi;
+	struct xfs_perag		*pag;
+	xfs_fsblock_t			btfsb;
+	xfs_agnumber_t			ag;
+	xfs_agblock_t			agend;
+	xfs_extlen_t			freesp_btblocks;
+	int				error;
+
+	xfs_scrub_perag_get(sc->mp, &sc->sa);
+	pag = sc->sa.pag;
+	INIT_LIST_HEAD(&rr.rmaplist);
+	xfs_repair_init_extent_list(&rr.rmap_freelist);
+	xfs_repair_init_extent_list(&rr.bno_freelist);
+	rr.sc = sc;
+	rr.nr_records = 0;
+
+	/* Collect rmaps for all AG headers. */
+	error = RMAP(FS, XFS_SB_BLOCK(mp), 1);
+	if (error)
+		goto out;
+	rre = list_last_entry(&rr.rmaplist, struct xfs_repair_rmapbt_extent,
+			list);
+
+	if (rre->rmap.rm_startblock != XFS_AGF_BLOCK(mp)) {
+		error = RMAP(FS, XFS_AGF_BLOCK(mp), 1);
+		if (error)
+			goto out;
+		rre = list_last_entry(&rr.rmaplist,
+				struct xfs_repair_rmapbt_extent, list);
+	}
+
+	if (rre->rmap.rm_startblock != XFS_AGI_BLOCK(mp)) {
+		error = RMAP(FS, XFS_AGI_BLOCK(mp), 1);
+		if (error)
+			goto out;
+		rre = list_last_entry(&rr.rmaplist,
+				struct xfs_repair_rmapbt_extent, list);
+	}
+
+	if (rre->rmap.rm_startblock != XFS_AGFL_BLOCK(mp)) {
+		error = RMAP(FS, XFS_AGFL_BLOCK(mp), 1);
+		if (error)
+			goto out;
+	}
+
+	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(sc->sa.agf_bp),
+			sc->sa.agfl_bp, xfs_repair_rmapbt_walk_agfl, &rr);
+	if (error)
+		goto out;
+
+	/* Collect rmap for the log if it's in this AG. */
+	if (mp->m_sb.sb_logstart &&
+	    XFS_FSB_TO_AGNO(mp, mp->m_sb.sb_logstart) == sc->sa.agno) {
+		error = RMAP(LOG, XFS_FSB_TO_AGBNO(mp, mp->m_sb.sb_logstart),
+				mp->m_sb.sb_logblocks);
+		if (error)
+			goto out;
+	}
+
+	/* Collect rmaps for the free space btrees. */
+	rr.owner = XFS_RMAP_OWN_AG;
+	rr.btblocks = 0;
+	cur = xfs_allocbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno,
+			XFS_BTNUM_BNO);
+	error = xfs_btree_visit_blocks(cur, xfs_repair_rmapbt_visit_btblock,
+			&rr);
+	if (error)
+		goto out;
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+	cur = NULL;
+
+	/* Collect rmaps for the cntbt. */
+	cur = xfs_allocbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno,
+			XFS_BTNUM_CNT);
+	error = xfs_btree_visit_blocks(cur, xfs_repair_rmapbt_visit_btblock,
+			&rr);
+	if (error)
+		goto out;
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+	cur = NULL;
+	freesp_btblocks = rr.btblocks;
+
+	/* Collect rmaps for the inode btree. */
+	cur = xfs_inobt_init_cursor(mp, sc->tp, sc->sa.agi_bp, sc->sa.agno,
+			XFS_BTNUM_INO);
+	error = xfs_btree_query_all(cur, xfs_repair_rmapbt_inodes, &rr);
+	if (error)
+		goto out;
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+
+	/* If there are no inodes, we have to include the inobt root. */
+	agi = XFS_BUF_TO_AGI(sc->sa.agi_bp);
+	if (agi->agi_count == cpu_to_be32(0)) {
+		error = xfs_repair_rmapbt_new_rmap(&rr,
+				be32_to_cpu(agi->agi_root), 1,
+				XFS_RMAP_OWN_INOBT, 0, 0);
+		if (error)
+			goto out;
+	}
+
+	/* Collect rmaps for the free inode btree. */
+	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
+		rr.owner = XFS_RMAP_OWN_INOBT;
+		cur = xfs_inobt_init_cursor(mp, sc->tp, sc->sa.agi_bp,
+				sc->sa.agno, XFS_BTNUM_FINO);
+		error = xfs_btree_visit_blocks(cur,
+				xfs_repair_rmapbt_visit_btblock, &rr);
+		if (error)
+			goto out;
+		xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+		cur = NULL;
+	}
+
+	/* Collect rmaps for the refcount btree. */
+	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+		union xfs_btree_irec		low;
+		union xfs_btree_irec		high;
+
+		rr.owner = XFS_RMAP_OWN_REFC;
+		cur = xfs_refcountbt_init_cursor(mp, sc->tp, sc->sa.agf_bp,
+				sc->sa.agno, NULL);
+		error = xfs_btree_visit_blocks(cur,
+				xfs_repair_rmapbt_visit_btblock, &rr);
+		if (error)
+			goto out;
+
+		/* Collect rmaps for CoW staging extents. */
+		memset(&low, 0, sizeof(low));
+		low.rc.rc_startblock = XFS_REFC_COW_START;
+		memset(&high, 0xFF, sizeof(high));
+		error = xfs_btree_query_range(cur, &low, &high,
+				xfs_repair_rmapbt_refcount, &rr);
+		if (error)
+			goto out;
+		xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+		cur = NULL;
+	}
+
+	/* Iterate all AGs for inodes. */
+	for (ag = 0; ag < mp->m_sb.sb_agcount; ag++) {
+		error = xfs_ialloc_read_agi(mp, sc->tp, ag, &bp);
+		if (error)
+			goto out;
+		cur = xfs_inobt_init_cursor(mp, sc->tp, bp, ag, XFS_BTNUM_INO);
+		error = xfs_btree_query_all(cur, xfs_repair_rmapbt_scan_inobt,
+				&rr);
+		if (error)
+			goto out;
+		xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+		cur = NULL;
+		xfs_trans_brelse(sc->tp, bp);
+		bp = NULL;
+	}
+
+	/* Do we actually have enough space to do this? */
+	if (!xfs_repair_ag_has_space(pag,
+			xfs_rmapbt_calc_size(mp, rr.nr_records),
+			XFS_AG_RESV_RMAPBT)) {
+		error = -ENOSPC;
+		goto out;
+	}
+
+	/* Initialize a new rmapbt root. */
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_UNKNOWN);
+	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+	error = xfs_repair_alloc_ag_block(sc, &oinfo, &btfsb,
+			XFS_AG_RESV_RMAPBT);
+	if (error)
+		goto out;
+	error = xfs_repair_init_btblock(sc, btfsb, &bp, XFS_BTNUM_RMAP,
+			&xfs_rmapbt_buf_ops);
+	if (error)
+		goto out;
+	agf->agf_roots[XFS_BTNUM_RMAPi] = cpu_to_be32(XFS_FSB_TO_AGBNO(mp,
+			btfsb));
+	agf->agf_levels[XFS_BTNUM_RMAPi] = cpu_to_be32(1);
+	agf->agf_rmap_blocks = cpu_to_be32(1);
+
+	/* Reset the perag info. */
+	pag->pagf_btreeblks = freesp_btblocks - 2;
+	pag->pagf_levels[XFS_BTNUM_RMAPi] =
+			be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
+
+	/* Now reset the AGF counters. */
+	agf->agf_btreeblks = cpu_to_be32(pag->pagf_btreeblks);
+	xfs_alloc_log_agf(sc->tp, sc->sa.agf_bp, XFS_AGF_ROOTS |
+			XFS_AGF_LEVELS | XFS_AGF_RMAP_BLOCKS |
+			XFS_AGF_BTREEBLKS);
+	bp = NULL;
+	error = xfs_repair_roll_ag_trans(sc);
+	if (error)
+		goto out;
+
+	/* Insert all the metadata rmaps. */
+	list_sort(NULL, &rr.rmaplist, xfs_repair_rmapbt_extent_cmp);
+	list_for_each_entry_safe(rre, n, &rr.rmaplist, list) {
+		/* Add the rmap. */
+		cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp,
+				sc->sa.agno);
+		error = xfs_rmap_map_raw(cur, &rre->rmap);
+		if (error)
+			goto out;
+		xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+		cur = NULL;
+
+		error = xfs_repair_roll_ag_trans(sc);
+		if (error)
+			goto out;
+
+		list_del(&rre->list);
+		kmem_free(rre);
+
+		/*
+		 * Ensure the freelist is full, but don't let it shrink.
+		 * The rmapbt isn't fully set up yet, which means that
+		 * the current AGFL blocks might not be reflected in the
+		 * rmapbt, which is a problem if we want to unmap blocks
+		 * from the AGFL.
+		 */
+		error = xfs_repair_fix_freelist(sc, false);
+		if (error)
+			goto out;
+	}
+
+	/* Compute free space from the new rmapbt. */
+	rr.next_bno = 0;
+	cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno);
+	error = xfs_rmap_query_all(cur, xfs_repair_rmapbt_record_rmap_freesp,
+			&rr);
+	if (error)
+		goto out;
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+	cur = NULL;
+
+	/* Insert a record for space between the last rmap and EOAG. */
+	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+	agend = be32_to_cpu(agf->agf_length);
+	if (rr.next_bno < agend) {
+		btfsb = XFS_AGB_TO_FSB(mp, sc->sa.agno, rr.next_bno);
+		error = xfs_repair_collect_btree_extent(sc, &rr.rmap_freelist,
+				btfsb, agend - rr.next_bno);
+		if (error)
+			goto out;
+	}
+
+	/* Compute free space from the existing bnobt. */
+	cur = xfs_allocbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno,
+			XFS_BTNUM_BNO);
+	error = xfs_alloc_query_all(cur, xfs_repair_rmapbt_record_bno_freesp,
+			&rr);
+	if (error)
+		goto out;
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+	cur = NULL;
+
+	/*
+	 * Free the "free" blocks that the new rmapbt knows about but
+	 * the old bnobt doesn't.  These are the old rmapbt blocks.
+	 */
+	error = xfs_repair_subtract_extents(sc, &rr.rmap_freelist,
+			&rr.bno_freelist);
+	if (error)
+		goto out;
+	xfs_repair_cancel_btree_extents(sc, &rr.bno_freelist);
+	error = xfs_repair_invalidate_blocks(sc, &rr.rmap_freelist);
+	if (error)
+		goto out;
+	return xfs_repair_reap_btree_extents(sc, &rr.rmap_freelist, &oinfo,
+			XFS_AG_RESV_RMAPBT);
+out:
+	if (cur)
+		xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
+	if (bp)
+		xfs_trans_brelse(sc->tp, bp);
+	xfs_repair_cancel_btree_extents(sc, &rr.bno_freelist);
+	xfs_repair_cancel_btree_extents(sc, &rr.rmap_freelist);
+	list_for_each_entry_safe(rre, n, &rr.rmaplist, list) {
+		list_del(&rre->list);
+		kmem_free(rre);
+	}
+	return error;
+}
+#undef RMAP
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 95f82c7f77f6..d79696c23a93 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -178,6 +178,8 @@ xfs_scrub_teardown(
 	struct xfs_inode		*ip_in,
 	int				error)
 {
+	int				err2;
+
 	xfs_scrub_ag_free(sc, &sc->sa);
 	if (sc->tp) {
 		if (error == 0 && (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR))
@@ -194,6 +196,12 @@ xfs_scrub_teardown(
 			iput(VFS_I(sc->ip));
 		sc->ip = NULL;
 	}
+	if (sc->fs_frozen) {
+		err2 = xfs_repair_fs_thaw(sc);
+		if (!error && err2)
+			error = err2;
+		sc->fs_frozen = false;
+	}
 	if (sc->has_quotaofflock)
 		mutex_unlock(&sc->mp->m_quotainfo->qi_quotaofflock);
 	if (sc->buf) {
@@ -266,7 +274,7 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_ag_rmapbt,
 		.scrub	= xfs_scrub_rmapbt,
 		.has	= xfs_sb_version_hasrmapbt,
-		.repair	= xfs_repair_notsupported,
+		.repair	= xfs_repair_rmapbt,
 	},
 	[XFS_SCRUB_TYPE_REFCNTBT] = {	/* refcountbt */
 		.type	= ST_PERAG,
@@ -512,6 +520,8 @@ xfs_scrub_metadata(
 
 	xfs_scrub_experimental_warning(mp);
 
+	atomic_inc(&mp->m_scrubbers);
+
 retry_op:
 	/* Set up for the operation. */
 	memset(&sc, 0, sizeof(sc));
@@ -534,7 +544,7 @@ xfs_scrub_metadata(
 		 */
 		error = xfs_scrub_teardown(&sc, ip, 0);
 		if (error)
-			goto out;
+			goto out_dec;
 		try_harder = true;
 		goto retry_op;
 	} else if (error)
@@ -570,7 +580,7 @@ xfs_scrub_metadata(
 			error = xfs_scrub_teardown(&sc, ip, 0);
 			if (error) {
 				xfs_repair_failure(mp);
-				goto out;
+				goto out_dec;
 			}
 			goto retry_op;
 		}
@@ -580,6 +590,8 @@ xfs_scrub_metadata(
 	xfs_scrub_postmortem(&sc);
 out_teardown:
 	error = xfs_scrub_teardown(&sc, ip, error);
+out_dec:
+	atomic_dec(&mp->m_scrubbers);
 out:
 	trace_xfs_scrub_done(ip, sm, error);
 	if (error == -EFSCORRUPTED || error == -EFSBADCRC) {
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 636424d5e2ee..d4141b336491 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -69,6 +69,7 @@ struct xfs_scrub_ag {
 
 struct xfs_scrub_context {
 	/* General scrub state. */
+	struct inode			*frozen_inode_list;
 	struct xfs_mount		*mp;
 	struct xfs_scrub_metadata	*sm;
 	const struct xfs_scrub_meta_ops	*ops;
@@ -78,6 +79,7 @@ struct xfs_scrub_context {
 	uint				ilock_flags;
 	bool				try_harder;
 	bool				has_quotaofflock;
+	bool				fs_frozen;
 
 	/* State tracking for single-AG operations. */
 	struct xfs_scrub_ag		sa;
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 10b90bbc5162..44ad46182077 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -205,6 +205,7 @@ typedef struct xfs_mount {
 	unsigned int		*m_errortag;
 	struct xfs_kobj		m_errortag_kobj;
 #endif
+	atomic_t		m_scrubbers;	/* # of active scrub processes */
 } xfs_mount_t;
 
 /*
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 39e5ec3d407f..7f5d335a3f70 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1457,6 +1457,30 @@ xfs_fs_unfreeze(
 	return 0;
 }
 
+/* Don't let userspace freeze while we're scrubbing the filesystem. */
+STATIC int
+xfs_fs_freeze_super(
+	struct super_block	*sb)
+{
+	struct xfs_mount	*mp = XFS_M(sb);
+
+	if (atomic_read(&mp->m_scrubbers) > 0)
+		return -EBUSY;
+	return freeze_super(sb);
+}
+
+/* Don't let userspace thaw while we're scrubbing the filesystem. */
+STATIC int
+xfs_fs_thaw_super(
+	struct super_block	*sb)
+{
+	struct xfs_mount	*mp = XFS_M(sb);
+
+	if (atomic_read(&mp->m_scrubbers) > 0)
+		return -EBUSY;
+	return thaw_super(sb);
+}
+
 STATIC int
 xfs_fs_show_options(
 	struct seq_file		*m,
@@ -1595,6 +1619,7 @@ xfs_mount_alloc(
 	spin_lock_init(&mp->m_perag_lock);
 	mutex_init(&mp->m_growlock);
 	atomic_set(&mp->m_active_trans, 0);
+	atomic_set(&mp->m_scrubbers, 0);
 	INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker);
 	INIT_DELAYED_WORK(&mp->m_eofblocks_work, xfs_eofblocks_worker);
 	INIT_DELAYED_WORK(&mp->m_cowblocks_work, xfs_cowblocks_worker);
@@ -1852,6 +1877,8 @@ static const struct super_operations xfs_super_operations = {
 	.show_options		= xfs_fs_show_options,
 	.nr_cached_objects	= xfs_fs_nr_cached_objects,
 	.free_cached_objects	= xfs_fs_free_cached_objects,
+	.freeze_super		= xfs_fs_freeze_super,
+	.thaw_super		= xfs_fs_thaw_super,
 };
 
 static struct file_system_type xfs_fs_type = {
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 5c24e66170fe..5583c20a91fe 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -324,7 +324,12 @@ xfs_trans_alloc(
 	if (!(flags & XFS_TRANS_NO_WRITECOUNT))
 		sb_start_intwrite(mp->m_super);
 
-	WARN_ON(mp->m_super->s_writers.frozen == SB_FREEZE_COMPLETE);
+	/*
+	 * Scrub is allowed to freeze the filesystem in order to obtain
+	 * exclusive access to the filesystem.
+	 */
+	WARN_ON(atomic_read(&mp->m_scrubbers) == 0 &&
+		mp->m_super->s_writers.frozen == SB_FREEZE_COMPLETE);
 	atomic_inc(&mp->m_active_trans);
 
 	tp = kmem_zone_zalloc(xfs_trans_zone,


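[Editorial aside: the core trick in the patch above is a sorted-extent set difference -- the free space implied by the new rmapbt, minus the free space the bnobt already records, leaves exactly the old rmap btree's blocks, which can then be reaped.  A minimal userspace sketch of that subtraction follows; the types and function are hypothetical illustrations, not the kernel's xfs_repair_subtract_extents.]

```c
#include <assert.h>

/* A contiguous run of AG blocks; both input lists are sorted and disjoint. */
struct ext {
	unsigned int	start;
	unsigned int	len;
};

/*
 * Compute a minus b, assuming every extent in b is covered by some
 * extent in a (true for bnobt free space vs. rmap-implied free space).
 * Surviving pieces go to out[]; returns the number of output extents.
 */
static int
subtract_extents(const struct ext *a, int na, const struct ext *b, int nb,
		 struct ext *out)
{
	int i, j = 0, n = 0;

	for (i = 0; i < na; i++) {
		unsigned int lo = a[i].start;
		unsigned int hi = a[i].start + a[i].len;

		/* Skip b extents entirely to the left of this a extent. */
		while (j < nb && b[j].start + b[j].len <= lo)
			j++;
		/* Carve each overlapping b extent out of [lo, hi). */
		while (j < nb && b[j].start < hi) {
			if (b[j].start > lo) {
				out[n].start = lo;
				out[n].len = b[j].start - lo;
				n++;
			}
			if (b[j].start + b[j].len < hi) {
				lo = b[j].start + b[j].len;
				j++;
			} else {
				lo = hi;
				break;
			}
		}
		/* Whatever is left of the a extent survives. */
		if (lo < hi) {
			out[n].start = lo;
			out[n].len = hi - lo;
			n++;
		}
	}
	return n;
}
```

In the patch, the surviving extents are exactly the blocks owned by the old rmap btree, which xfs_repair_reap_btree_extents then frees.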

* [PATCH 06/14] xfs: repair refcount btrees
  2018-05-30 19:30 [PATCH v15.2 00/14] xfs-4.18: online repair support Darrick J. Wong
                   ` (4 preceding siblings ...)
  2018-05-30 19:31 ` [PATCH 05/14] xfs: repair the rmapbt Darrick J. Wong
@ 2018-05-30 19:31 ` Darrick J. Wong
  2018-05-30 19:31 ` [PATCH 07/14] xfs: repair inode records Darrick J. Wong
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2018-05-30 19:31 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Reconstruct the refcount data from the rmap btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile                |    1 
 fs/xfs/scrub/refcount_repair.c |  526 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.h          |    2 
 fs/xfs/scrub/scrub.c           |    2 
 4 files changed, 530 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/refcount_repair.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index b9bbac3d5075..36ad73145c25 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -177,6 +177,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   agheader_repair.o \
 				   alloc_repair.o \
 				   ialloc_repair.o \
+				   refcount_repair.o \
 				   repair.o \
 				   rmap_repair.o \
 				   )
diff --git a/fs/xfs/scrub/refcount_repair.c b/fs/xfs/scrub/refcount_repair.c
new file mode 100644
index 000000000000..73c64b5a0cd2
--- /dev/null
+++ b/fs/xfs/scrub/refcount_repair.c
@@ -0,0 +1,526 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_itable.h"
+#include "xfs_alloc.h"
+#include "xfs_ialloc.h"
+#include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_refcount.h"
+#include "xfs_refcount_btree.h"
+#include "xfs_error.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+
+/*
+ * Rebuilding the Reference Count Btree
+ *
+ * This algorithm is "borrowed" from xfs_repair.  Imagine the rmap
+ * entries as rectangles representing extents of physical blocks, and
+ * that the rectangles can be laid down to allow them to overlap each
+ * other; then we know that we must emit a refcnt btree entry wherever
+ * the amount of overlap changes, i.e. the emission stimulus is
+ * level-triggered:
+ *
+ *                 -    ---
+ *       --      ----- ----   ---        ------
+ * --   ----     ----------- ----     ---------
+ * -------------------------------- -----------
+ * ^ ^  ^^ ^^    ^ ^^ ^^^  ^^^^  ^ ^^ ^  ^     ^
+ * 2 1  23 21    3 43 234  2123  1 01 2  3     0
+ *
+ * For our purposes, a rmap is a tuple (startblock, len, fileoff, owner).
+ *
+ * Note that in the actual refcnt btree we don't store the refcount < 2
+ * cases because the bnobt tells us which blocks are free; single-use
+ * blocks aren't recorded in the bnobt or the refcntbt.  If the rmapbt
+ * supports storing multiple entries covering a given block we could
+ * theoretically dispense with the refcntbt and simply count rmaps, but
+ * that's inefficient in the (hot) write path, so we'll take the cost of
+ * the extra tree to save time.  Also there's no guarantee that rmap
+ * will be enabled.
+ *
+ * Given an array of rmaps sorted by physical block number, a starting
+ * physical block (sp), a bag to hold rmaps that cover sp, and the next
+ * physical block where the level changes (np), we can reconstruct the
+ * refcount btree as follows:
+ *
+ * While there are still unprocessed rmaps in the array,
+ *  - Set sp to the physical block (pblk) of the next unprocessed rmap.
+ *  - Add to the bag all rmaps in the array where startblock == sp.
+ *  - Set np to the physical block where the bag size will change.  This
+ *    is the minimum of (the pblk of the next unprocessed rmap) and
+ *    (startblock + len of each rmap in the bag).
+ *  - Record the bag size as old_bag_size.
+ *
+ *  - While the bag isn't empty,
+ *     - Remove from the bag all rmaps where startblock + len == np.
+ *     - Add to the bag all rmaps in the array where startblock == np.
+ *     - If the bag size isn't old_bag_size, store the refcount entry
+ *       (sp, np - sp, bag_size) in the refcnt btree.
+ *     - If the bag is empty, break out of the inner loop.
+ *     - Set old_bag_size to the bag size
+ *     - Set sp = np.
+ *     - Set np to the physical block where the bag size will change.
+ *       This is the minimum of (the pblk of the next unprocessed rmap)
+ *       and (startblock + len of each rmap in the bag).
+ *
+ * Like all the other repairers, we make a list of all the refcount
+ * records we need, then reinitialize the refcount btree root and
+ * insert all the records.
+ */
+
+struct xfs_repair_refc_rmap {
+	struct list_head		list;
+	struct xfs_rmap_irec		rmap;
+};
+
+struct xfs_repair_refc_extent {
+	struct list_head		list;
+	struct xfs_refcount_irec	refc;
+};
+
+struct xfs_repair_refc {
+	struct list_head		rmap_bag;  /* rmaps we're tracking */
+	struct list_head		rmap_idle; /* idle rmaps */
+	struct list_head		extlist;   /* refcount extents */
+	struct xfs_repair_extent_list	btlist;    /* old refcountbt blocks */
+	struct xfs_scrub_context	*sc;
+	unsigned long			nr_records;/* nr refcount extents */
+	xfs_extlen_t			btblocks;  /* # of refcountbt blocks */
+};
+
+/* Grab the next record from the rmapbt. */
+STATIC int
+xfs_repair_refcountbt_next_rmap(
+	struct xfs_btree_cur		*cur,
+	struct xfs_repair_refc		*rr,
+	struct xfs_rmap_irec		*rec,
+	bool				*have_rec)
+{
+	struct xfs_rmap_irec		rmap;
+	struct xfs_mount		*mp = cur->bc_mp;
+	struct xfs_repair_refc_extent	*rre;
+	xfs_fsblock_t			fsbno;
+	int				have_gt;
+	int				error = 0;
+
+	*have_rec = false;
+	/*
+	 * Loop through the remaining rmaps.  Remember CoW staging
+	 * extents and the refcountbt blocks from the old tree for later
+	 * disposal.  We can only share written data fork extents, so
+	 * keep looping until we find an rmap for one.
+	 */
+	do {
+		if (xfs_scrub_should_terminate(rr->sc, &error))
+			goto out_error;
+
+		error = xfs_btree_increment(cur, 0, &have_gt);
+		if (error)
+			goto out_error;
+		if (!have_gt)
+			return 0;
+
+		error = xfs_rmap_get_rec(cur, &rmap, &have_gt);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(mp, have_gt == 1, out_error);
+
+		if (rmap.rm_owner == XFS_RMAP_OWN_COW) {
+			/* Pass CoW staging extents right through. */
+			rre = kmem_alloc(sizeof(struct xfs_repair_refc_extent),
+					KM_MAYFAIL);
+			if (!rre) {
+				error = -ENOMEM;
+				goto out_error;
+			}
+
+			INIT_LIST_HEAD(&rre->list);
+			rre->refc.rc_startblock = rmap.rm_startblock +
+					XFS_REFC_COW_START;
+			rre->refc.rc_blockcount = rmap.rm_blockcount;
+			rre->refc.rc_refcount = 1;
+			list_add_tail(&rre->list, &rr->extlist);
+		} else if (rmap.rm_owner == XFS_RMAP_OWN_REFC) {
+			/* refcountbt block, dump it when we're done. */
+			rr->btblocks += rmap.rm_blockcount;
+			fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
+					cur->bc_private.a.agno,
+					rmap.rm_startblock);
+			error = xfs_repair_collect_btree_extent(rr->sc,
+					&rr->btlist, fsbno, rmap.rm_blockcount);
+			if (error)
+				goto out_error;
+		}
+	} while (XFS_RMAP_NON_INODE_OWNER(rmap.rm_owner) ||
+		 xfs_internal_inum(mp, rmap.rm_owner) ||
+		 (rmap.rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK |
+				   XFS_RMAP_UNWRITTEN)));
+
+	*rec = rmap;
+	*have_rec = true;
+	return 0;
+
+out_error:
+	return error;
+}
+
+/* Recycle an idle rmap or allocate a new one. */
+static struct xfs_repair_refc_rmap *
+xfs_repair_refcountbt_get_rmap(
+	struct xfs_repair_refc		*rr)
+{
+	struct xfs_repair_refc_rmap	*rrm;
+
+	if (list_empty(&rr->rmap_idle)) {
+		rrm = kmem_alloc(sizeof(struct xfs_repair_refc_rmap),
+				KM_MAYFAIL);
+		if (!rrm)
+			return NULL;
+		INIT_LIST_HEAD(&rrm->list);
+		return rrm;
+	}
+
+	rrm = list_first_entry(&rr->rmap_idle, struct xfs_repair_refc_rmap,
+			list);
+	list_del_init(&rrm->list);
+	return rrm;
+}
+
+/* Compare two btree extents. */
+static int
+xfs_repair_refcount_extent_cmp(
+	void				*priv,
+	struct list_head		*a,
+	struct list_head		*b)
+{
+	struct xfs_repair_refc_extent	*ap;
+	struct xfs_repair_refc_extent	*bp;
+
+	ap = container_of(a, struct xfs_repair_refc_extent, list);
+	bp = container_of(b, struct xfs_repair_refc_extent, list);
+
+	if (ap->refc.rc_startblock > bp->refc.rc_startblock)
+		return 1;
+	else if (ap->refc.rc_startblock < bp->refc.rc_startblock)
+		return -1;
+	return 0;
+}
+
+/* Record a reference count extent. */
+STATIC int
+xfs_repair_refcountbt_new_refc(
+	struct xfs_scrub_context	*sc,
+	struct xfs_repair_refc		*rr,
+	xfs_agblock_t			agbno,
+	xfs_extlen_t			len,
+	xfs_nlink_t			refcount)
+{
+	struct xfs_repair_refc_extent	*rre;
+	struct xfs_refcount_irec	irec;
+
+	irec.rc_startblock = agbno;
+	irec.rc_blockcount = len;
+	irec.rc_refcount = refcount;
+
+	trace_xfs_repair_refcount_extent_fn(sc->mp, sc->sa.agno,
+			&irec);
+
+	rre = kmem_alloc(sizeof(struct xfs_repair_refc_extent),
+			KM_MAYFAIL);
+	if (!rre)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&rre->list);
+	rre->refc = irec;
+	list_add_tail(&rre->list, &rr->extlist);
+
+	return 0;
+}
+
+/* Iterate all the rmap records to generate reference count data. */
+#define RMAP_NEXT(r)	((r).rm_startblock + (r).rm_blockcount)
+STATIC int
+xfs_repair_refcountbt_generate_refcounts(
+	struct xfs_scrub_context	*sc,
+	struct xfs_repair_refc		*rr)
+{
+	struct xfs_rmap_irec		rmap;
+	struct xfs_btree_cur		*cur;
+	struct xfs_repair_refc_rmap	*rrm;
+	struct xfs_repair_refc_rmap	*n;
+	xfs_agblock_t			sbno;
+	xfs_agblock_t			cbno;
+	xfs_agblock_t			nbno;
+	size_t				old_stack_sz;
+	size_t				stack_sz = 0;
+	bool				have;
+	int				have_gt;
+	int				error;
+
+	/* Start the rmapbt cursor to the left of all records. */
+	cur = xfs_rmapbt_init_cursor(sc->mp, sc->tp, sc->sa.agf_bp,
+			sc->sa.agno);
+	error = xfs_rmap_lookup_le(cur, 0, 0, 0, 0, 0, &have_gt);
+	if (error)
+		goto out;
+	ASSERT(have_gt == 0);
+
+	/* Process reverse mappings into refcount data. */
+	while (xfs_btree_has_more_records(cur)) {
+		/* Push all rmaps with pblk == sbno onto the stack */
+		error = xfs_repair_refcountbt_next_rmap(cur, rr, &rmap, &have);
+		if (error)
+			goto out;
+		if (!have)
+			break;
+		sbno = cbno = rmap.rm_startblock;
+		while (have && rmap.rm_startblock == sbno) {
+			rrm = xfs_repair_refcountbt_get_rmap(rr);
+			if (!rrm) {
+				error = -ENOMEM;
+				goto out;
+			}
+			rrm->rmap = rmap;
+			list_add_tail(&rrm->list, &rr->rmap_bag);
+			stack_sz++;
+			error = xfs_repair_refcountbt_next_rmap(cur, rr, &rmap,
+					&have);
+			if (error)
+				goto out;
+		}
+		error = xfs_btree_decrement(cur, 0, &have_gt);
+		if (error)
+			goto out;
+		XFS_WANT_CORRUPTED_GOTO(sc->mp, have_gt, out);
+
+		/* Set nbno to the bno of the next refcount change */
+		nbno = have ? rmap.rm_startblock : NULLAGBLOCK;
+		list_for_each_entry(rrm, &rr->rmap_bag, list)
+			nbno = min_t(xfs_agblock_t, nbno, RMAP_NEXT(rrm->rmap));
+
+		ASSERT(nbno > sbno);
+		old_stack_sz = stack_sz;
+
+		/* While stack isn't empty... */
+		while (stack_sz) {
+			/* Pop all rmaps that end at nbno */
+			list_for_each_entry_safe(rrm, n, &rr->rmap_bag, list) {
+				if (RMAP_NEXT(rrm->rmap) != nbno)
+					continue;
+				stack_sz--;
+				list_move(&rrm->list, &rr->rmap_idle);
+			}
+
+			/* Push array items that start at nbno */
+			error = xfs_repair_refcountbt_next_rmap(cur, rr, &rmap,
+					&have);
+			if (error)
+				goto out;
+			while (have && rmap.rm_startblock == nbno) {
+				rrm = xfs_repair_refcountbt_get_rmap(rr);
+				if (!rrm) {
+					error = -ENOMEM;
+					goto out;
+				}
+				rrm->rmap = rmap;
+				list_add_tail(&rrm->list, &rr->rmap_bag);
+				stack_sz++;
+				error = xfs_repair_refcountbt_next_rmap(cur,
+						rr, &rmap, &have);
+				if (error)
+					goto out;
+			}
+			error = xfs_btree_decrement(cur, 0, &have_gt);
+			if (error)
+				goto out;
+			XFS_WANT_CORRUPTED_GOTO(sc->mp, have_gt, out);
+
+			/* Emit refcount if necessary */
+			ASSERT(nbno > cbno);
+			if (stack_sz != old_stack_sz) {
+				if (old_stack_sz > 1) {
+					error = xfs_repair_refcountbt_new_refc(
+							sc, rr, cbno,
+							nbno - cbno,
+							old_stack_sz);
+					if (error)
+						goto out;
+					rr->nr_records++;
+				}
+				cbno = nbno;
+			}
+
+			/* Stack empty, go find the next rmap */
+			if (stack_sz == 0)
+				break;
+			old_stack_sz = stack_sz;
+			sbno = nbno;
+
+			/* Set nbno to the bno of the next refcount change */
+			nbno = have ? rmap.rm_startblock : NULLAGBLOCK;
+			list_for_each_entry(rrm, &rr->rmap_bag, list)
+				nbno = min_t(xfs_agblock_t, nbno,
+						RMAP_NEXT(rrm->rmap));
+
+			ASSERT(nbno > sbno);
+		}
+	}
+
+	/* Free all the leftover rmap records. */
+	list_for_each_entry_safe(rrm, n, &rr->rmap_idle, list) {
+		list_del(&rrm->list);
+		kmem_free(rrm);
+	}
+
+	ASSERT(list_empty(&rr->rmap_bag));
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+	return 0;
+out:
+	xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
+	return error;
+}
+#undef RMAP_NEXT
+
+/* Rebuild the refcount btree. */
+int
+xfs_repair_refcountbt(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_repair_refc		rr;
+	struct xfs_owner_info		oinfo;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_repair_refc_rmap	*rrm;
+	struct xfs_repair_refc_rmap	*n;
+	struct xfs_repair_refc_extent	*rre;
+	struct xfs_repair_refc_extent	*o;
+	struct xfs_buf			*bp = NULL;
+	struct xfs_agf			*agf;
+	struct xfs_btree_cur		*cur = NULL;
+	xfs_fsblock_t			btfsb;
+	int				have_gt;
+	int				error = 0;
+
+	/* We require the rmapbt to rebuild anything. */
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return -EOPNOTSUPP;
+
+	xfs_scrub_perag_get(sc->mp, &sc->sa);
+	INIT_LIST_HEAD(&rr.rmap_bag);
+	INIT_LIST_HEAD(&rr.rmap_idle);
+	INIT_LIST_HEAD(&rr.extlist);
+	xfs_repair_init_extent_list(&rr.btlist);
+	rr.btblocks = 0;
+	rr.sc = sc;
+	rr.nr_records = 0;
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_REFC);
+
+	error = xfs_repair_refcountbt_generate_refcounts(sc, &rr);
+	if (error)
+		goto out;
+
+	/* Do we actually have enough space to do this? */
+	if (!xfs_repair_ag_has_space(sc->sa.pag,
+			xfs_refcountbt_calc_size(mp, rr.nr_records),
+			XFS_AG_RESV_METADATA)) {
+		error = -ENOSPC;
+		goto out;
+	}
+
+	/* Invalidate all the refcountbt blocks in btlist. */
+	error = xfs_repair_invalidate_blocks(sc, &rr.btlist);
+	if (error)
+		goto out;
+
+	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+	/* Initialize a new btree root. */
+	error = xfs_repair_alloc_ag_block(sc, &oinfo, &btfsb,
+			XFS_AG_RESV_METADATA);
+	if (error)
+		goto out;
+	error = xfs_repair_init_btblock(sc, btfsb, &bp, XFS_BTNUM_REFC,
+			&xfs_refcountbt_buf_ops);
+	if (error)
+		goto out;
+	agf->agf_refcount_root = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, btfsb));
+	agf->agf_refcount_level = cpu_to_be32(1);
+	agf->agf_refcount_blocks = cpu_to_be32(1);
+	xfs_alloc_log_agf(sc->tp, sc->sa.agf_bp, XFS_AGF_REFCOUNT_BLOCKS |
+			XFS_AGF_REFCOUNT_ROOT | XFS_AGF_REFCOUNT_LEVEL);
+	error = xfs_repair_roll_ag_trans(sc);
+	if (error)
+		goto out;
+
+	/* Insert records into the new btree. */
+	list_sort(NULL, &rr.extlist, xfs_repair_refcount_extent_cmp);
+	list_for_each_entry_safe(rre, o, &rr.extlist, list) {
+		/* Insert into the refcountbt. */
+		cur = xfs_refcountbt_init_cursor(mp, sc->tp, sc->sa.agf_bp,
+				sc->sa.agno, NULL);
+		error = xfs_refcount_lookup_eq(cur, rre->refc.rc_startblock,
+				&have_gt);
+		if (error)
+			goto out;
+		XFS_WANT_CORRUPTED_GOTO(mp, have_gt == 0, out);
+		error = xfs_refcount_insert(cur, &rre->refc, &have_gt);
+		if (error)
+			goto out;
+		XFS_WANT_CORRUPTED_GOTO(mp, have_gt == 1, out);
+		xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+		cur = NULL;
+
+		error = xfs_repair_roll_ag_trans(sc);
+		if (error)
+			goto out;
+
+		list_del(&rre->list);
+		kmem_free(rre);
+	}
+
+	/* Free the old refcountbt blocks if they're not in use. */
+	return xfs_repair_reap_btree_extents(sc, &rr.btlist, &oinfo,
+			XFS_AG_RESV_METADATA);
+out:
+	if (cur)
+		xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
+	xfs_repair_cancel_btree_extents(sc, &rr.btlist);
+	list_for_each_entry_safe(rrm, n, &rr.rmap_idle, list) {
+		list_del(&rrm->list);
+		kmem_free(rrm);
+	}
+	list_for_each_entry_safe(rrm, n, &rr.rmap_bag, list) {
+		list_del(&rrm->list);
+		kmem_free(rrm);
+	}
+	list_for_each_entry_safe(rre, o, &rr.extlist, list) {
+		list_del(&rre->list);
+		kmem_free(rre);
+	}
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index a3ee8d1035c0..193059d10c59 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -119,6 +119,7 @@ int xfs_repair_agi(struct xfs_scrub_context *sc);
 int xfs_repair_allocbt(struct xfs_scrub_context *sc);
 int xfs_repair_iallocbt(struct xfs_scrub_context *sc);
 int xfs_repair_rmapbt(struct xfs_scrub_context *sc);
+int xfs_repair_refcountbt(struct xfs_scrub_context *sc);
 
 #else
 
@@ -168,6 +169,7 @@ static inline int xfs_repair_rmapbt_setup(
 #define xfs_repair_allocbt		xfs_repair_notsupported
 #define xfs_repair_iallocbt		xfs_repair_notsupported
 #define xfs_repair_rmapbt		xfs_repair_notsupported
+#define xfs_repair_refcountbt		xfs_repair_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index d79696c23a93..2be8e818f7ff 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -281,7 +281,7 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.setup	= xfs_scrub_setup_ag_refcountbt,
 		.scrub	= xfs_scrub_refcountbt,
 		.has	= xfs_sb_version_hasreflink,
-		.repair	= xfs_repair_notsupported,
+		.repair	= xfs_repair_refcountbt,
 	},
 	[XFS_SCRUB_TYPE_INODE] = {	/* inode record */
 		.type	= ST_INODE,


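[Editorial aside: the "bag" algorithm described in the big comment of refcount_repair.c above can be prototyped in plain C -- keep the end block of every rmap overlapping the current position, and emit a refcount record each time the overlap count changes, skipping refcount < 2 because the bnobt already implies it.  The sketch below is a hypothetical userspace illustration of that sweep, not the kernel implementation.]

```c
#include <assert.h>

#define MAXBAG	64

/* A reverse mapping: an extent of physical blocks, sorted by start. */
struct rmap {
	unsigned int	start;
	unsigned int	len;
};

struct refc {
	unsigned int	start;
	unsigned int	len;
	unsigned int	refcount;
};

/*
 * Sweep the sorted rmaps left to right.  bag[] holds the end block of
 * every rmap overlapping the current position, so its size is the
 * current reference count.  Emit a record whenever that size changes,
 * but only for refcount >= 2.  Returns the record count written to out[].
 */
static int
make_refcounts(const struct rmap *r, int nr, struct refc *out)
{
	unsigned int bag[MAXBAG];
	int nbag = 0, i = 0, nrec = 0;

	while (i < nr) {
		unsigned int cbno = r[i].start;
		int old_sz;

		/* Push all rmaps starting at cbno into the bag. */
		while (i < nr && r[i].start == cbno) {
			bag[nbag++] = r[i].start + r[i].len;
			i++;
		}
		old_sz = nbag;

		while (nbag) {
			unsigned int np;
			int j, k;

			/* np: next block where the overlap count changes. */
			np = (i < nr) ? r[i].start : (unsigned int)-1;
			for (j = 0; j < nbag; j++)
				if (bag[j] < np)
					np = bag[j];

			/* Pop rmaps ending at np; push rmaps starting there. */
			for (j = k = 0; j < nbag; j++)
				if (bag[j] != np)
					bag[k++] = bag[j];
			nbag = k;
			while (i < nr && r[i].start == np) {
				bag[nbag++] = r[i].start + r[i].len;
				i++;
			}

			/* Level change: emit (cbno, np - cbno, old count). */
			if (nbag != old_sz) {
				if (old_sz > 1) {
					out[nrec].start = cbno;
					out[nrec].len = np - cbno;
					out[nrec].refcount = old_sz;
					nrec++;
				}
				cbno = np;
			}
			old_sz = nbag;
		}
	}
	return nrec;
}
```

Note that when one rmap ends where another begins, the bag size is unchanged, so no record boundary is emitted -- matching how adjacent equal-count refcountbt records stay merged.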

* [PATCH 07/14] xfs: repair inode records
  2018-05-30 19:30 [PATCH v15.2 00/14] xfs-4.18: online repair support Darrick J. Wong
                   ` (5 preceding siblings ...)
  2018-05-30 19:31 ` [PATCH 06/14] xfs: repair refcount btrees Darrick J. Wong
@ 2018-05-30 19:31 ` Darrick J. Wong
  2018-05-30 19:31 ` [PATCH 08/14] xfs: zap broken inode forks Darrick J. Wong
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2018-05-30 19:31 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Try to reinitialize corrupt inodes, or clear the reflink flag
if it's not needed.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile             |    1 
 fs/xfs/scrub/inode_repair.c |  392 +++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.h       |    2 
 fs/xfs/scrub/scrub.c        |    2 
 4 files changed, 396 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/inode_repair.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 36ad73145c25..b0f25bf07207 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -177,6 +177,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   agheader_repair.o \
 				   alloc_repair.o \
 				   ialloc_repair.o \
+				   inode_repair.o \
 				   refcount_repair.o \
 				   repair.o \
 				   rmap_repair.o \
diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c
new file mode 100644
index 000000000000..90208a58a1d1
--- /dev/null
+++ b/fs/xfs/scrub/inode_repair.c
@@ -0,0 +1,392 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_inode_buf.h"
+#include "xfs_inode_fork.h"
+#include "xfs_ialloc.h"
+#include "xfs_da_format.h"
+#include "xfs_reflink.h"
+#include "xfs_rmap.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
+#include "xfs_dir2.h"
+#include "xfs_quota_defs.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+
+/* Make sure this buffer can pass the inode buffer verifier. */
+STATIC void
+xfs_repair_inode_buf(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_trans		*tp = sc->tp;
+	struct xfs_dinode		*dip;
+	xfs_agnumber_t			agno;
+	xfs_agino_t			agino;
+	int				ioff;
+	int				i;
+	int				ni;
+	int				di_ok;
+	bool				unlinked_ok;
+
+	ni = XFS_BB_TO_FSB(mp, bp->b_length) * mp->m_sb.sb_inopblock;
+	agno = xfs_daddr_to_agno(mp, XFS_BUF_ADDR(bp));
+	for (i = 0; i < ni; i++) {
+		ioff = i << mp->m_sb.sb_inodelog;
+		dip = xfs_buf_offset(bp, ioff);
+		agino = be32_to_cpu(dip->di_next_unlinked);
+		unlinked_ok = (agino == NULLAGINO ||
+			       xfs_verify_agino(sc->mp, agno, agino));
+		di_ok = dip->di_magic == cpu_to_be16(XFS_DINODE_MAGIC) &&
+			xfs_dinode_good_version(mp, dip->di_version);
+		if (di_ok && unlinked_ok)
+			continue;
+		dip->di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
+		dip->di_version = 3;
+		if (!unlinked_ok)
+			dip->di_next_unlinked = cpu_to_be32(NULLAGINO);
+		xfs_dinode_calc_crc(mp, dip);
+		xfs_trans_buf_set_type(tp, bp, XFS_BLFT_DINO_BUF);
+		xfs_trans_log_buf(tp, bp, ioff, ioff + sizeof(*dip) - 1);
+	}
+}
+
+/* Inode didn't pass verifiers, so fix the raw buffer and retry iget. */
+STATIC int
+xfs_repair_inode_core(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_imap			imap;
+	struct xfs_buf			*bp;
+	struct xfs_dinode		*dip;
+	xfs_ino_t			ino;
+	uint64_t			flags2;
+	uint16_t			flags;
+	uint16_t			mode;
+	int				error;
+
+	/* Map & read inode. */
+	ino = sc->sm->sm_ino;
+	error = xfs_imap(sc->mp, sc->tp, ino, &imap, XFS_IGET_UNTRUSTED);
+	if (error)
+		return error;
+
+	error = xfs_trans_read_buf(sc->mp, sc->tp, sc->mp->m_ddev_targp,
+			imap.im_blkno, imap.im_len, XBF_UNMAPPED, &bp, NULL);
+	if (error)
+		return error;
+
+	/* Make sure we can pass the inode buffer verifier. */
+	xfs_repair_inode_buf(sc, bp);
+	bp->b_ops = &xfs_inode_buf_ops;
+
+	/* Fix everything the verifier will complain about. */
+	dip = xfs_buf_offset(bp, imap.im_boffset);
+	mode = be16_to_cpu(dip->di_mode);
+	if (mode && xfs_mode_to_ftype(mode) == XFS_DIR3_FT_UNKNOWN) {
+		/* bad mode, so we set it to a file that only root can read */
+		mode = S_IFREG;
+		dip->di_mode = cpu_to_be16(mode);
+		dip->di_uid = 0;
+		dip->di_gid = 0;
+	}
+	dip->di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
+	if (!xfs_dinode_good_version(sc->mp, dip->di_version))
+		dip->di_version = 3;
+	dip->di_ino = cpu_to_be64(ino);
+	uuid_copy(&dip->di_uuid, &sc->mp->m_sb.sb_meta_uuid);
+	flags = be16_to_cpu(dip->di_flags);
+	flags2 = be64_to_cpu(dip->di_flags2);
+	if (xfs_sb_version_hasreflink(&sc->mp->m_sb) && S_ISREG(mode))
+		flags2 |= XFS_DIFLAG2_REFLINK;
+	else
+		flags2 &= ~(XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE);
+	if (flags & XFS_DIFLAG_REALTIME)
+		flags2 &= ~XFS_DIFLAG2_REFLINK;
+	if (flags2 & XFS_DIFLAG2_REFLINK)
+		flags2 &= ~XFS_DIFLAG2_DAX;
+	dip->di_flags = cpu_to_be16(flags);
+	dip->di_flags2 = cpu_to_be64(flags2);
+	dip->di_gen = cpu_to_be32(sc->sm->sm_gen);
+	if (be64_to_cpu(dip->di_size) & (1ULL << 63))
+		dip->di_size = cpu_to_be64((1ULL << 63) - 1);
+
+	/* Write out the inode... */
+	xfs_dinode_calc_crc(sc->mp, dip);
+	xfs_trans_buf_set_type(sc->tp, bp, XFS_BLFT_DINO_BUF);
+	xfs_trans_log_buf(sc->tp, bp, imap.im_boffset,
+			imap.im_boffset + sc->mp->m_sb.sb_inodesize - 1);
+	error = xfs_trans_commit(sc->tp);
+	if (error)
+		return error;
+	sc->tp = NULL;
+
+	/* ...and reload it? */
+	error = xfs_iget(sc->mp, sc->tp, ino,
+			XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE, 0, &sc->ip);
+	if (error)
+		return error;
+	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+	error = xfs_scrub_trans_alloc(sc, 0);
+	if (error)
+		return error;
+	sc->ilock_flags |= XFS_ILOCK_EXCL;
+	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
+
+	return 0;
+}
+
+/* Fix di_extsize hint. */
+STATIC void
+xfs_repair_inode_extsize(
+	struct xfs_scrub_context	*sc)
+{
+	xfs_failaddr_t			fa;
+
+	fa = xfs_inode_validate_extsize(sc->mp, sc->ip->i_d.di_extsize,
+			VFS_I(sc->ip)->i_mode, sc->ip->i_d.di_flags);
+	if (!fa)
+		return;
+
+	sc->ip->i_d.di_extsize = 0;
+	sc->ip->i_d.di_flags &= ~(XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT);
+}
+
+/* Fix di_cowextsize hint. */
+STATIC void
+xfs_repair_inode_cowextsize(
+	struct xfs_scrub_context	*sc)
+{
+	xfs_failaddr_t			fa;
+
+	if (sc->ip->i_d.di_version < 3)
+		return;
+
+	fa = xfs_inode_validate_cowextsize(sc->mp, sc->ip->i_d.di_cowextsize,
+			VFS_I(sc->ip)->i_mode, sc->ip->i_d.di_flags,
+			sc->ip->i_d.di_flags2);
+	if (!fa)
+		return;
+
+	sc->ip->i_d.di_cowextsize = 0;
+	sc->ip->i_d.di_flags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
+}
+
+/* Fix inode flags. */
+STATIC void
+xfs_repair_inode_flags(
+	struct xfs_scrub_context	*sc)
+{
+	uint16_t			mode;
+
+	mode = VFS_I(sc->ip)->i_mode;
+
+	if (sc->ip->i_d.di_flags & ~XFS_DIFLAG_ANY)
+		sc->ip->i_d.di_flags &= ~XFS_DIFLAG_ANY;
+
+	if (sc->ip->i_ino == sc->mp->m_sb.sb_rbmino)
+		sc->ip->i_d.di_flags |= XFS_DIFLAG_NEWRTBM;
+	else
+		sc->ip->i_d.di_flags &= ~XFS_DIFLAG_NEWRTBM;
+
+	if (!S_ISDIR(mode))
+		sc->ip->i_d.di_flags &= ~(XFS_DIFLAG_RTINHERIT |
+					  XFS_DIFLAG_EXTSZINHERIT |
+					  XFS_DIFLAG_PROJINHERIT |
+					  XFS_DIFLAG_NOSYMLINKS);
+	if (!S_ISREG(mode))
+		sc->ip->i_d.di_flags &= ~(XFS_DIFLAG_REALTIME |
+					  XFS_DIFLAG_EXTSIZE);
+
+	if (sc->ip->i_d.di_flags & XFS_DIFLAG_REALTIME)
+		sc->ip->i_d.di_flags &= ~XFS_DIFLAG_FILESTREAM;
+}
+
+/* Fix inode flags2 */
+STATIC void
+xfs_repair_inode_flags2(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_mount		*mp = sc->mp;
+	uint16_t			mode;
+
+	if (sc->ip->i_d.di_version < 3)
+		return;
+
+	mode = VFS_I(sc->ip)->i_mode;
+
+	if (sc->ip->i_d.di_flags2 & ~XFS_DIFLAG2_ANY)
+		sc->ip->i_d.di_flags2 &= ~XFS_DIFLAG2_ANY;
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb) ||
+	    !S_ISREG(mode))
+		sc->ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
+
+	if (!(S_ISREG(mode) || S_ISDIR(mode)))
+		sc->ip->i_d.di_flags2 &= ~XFS_DIFLAG2_DAX;
+
+	if (sc->ip->i_d.di_flags & XFS_DIFLAG_REALTIME)
+		sc->ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
+
+	if (sc->ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK)
+		sc->ip->i_d.di_flags2 &= ~XFS_DIFLAG2_DAX;
+}
+
+/* Repair an inode's fields. */
+int
+xfs_repair_inode(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_inode		*ip;
+	xfs_filblks_t			count;
+	xfs_filblks_t			acount;
+	xfs_extnum_t			nextents;
+	uint16_t			flags;
+	int				error = 0;
+
+	if (!xfs_sb_version_hascrc(&mp->m_sb))
+		return -EOPNOTSUPP;
+
+	/* Skip inode core repair if we're here only for preening. */
+	if (sc->ip &&
+	    (sc->sm->sm_flags & XFS_SCRUB_OFLAG_PREEN) &&
+	    !(sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) &&
+	    !(sc->sm->sm_flags & XFS_SCRUB_OFLAG_XCORRUPT)) {
+		xfs_trans_ijoin(sc->tp, sc->ip, 0);
+		goto preen_only;
+	}
+
+	if (!sc->ip) {
+		error = xfs_repair_inode_core(sc);
+		if (error)
+			goto out;
+		if (XFS_IS_UQUOTA_ON(mp))
+			xfs_repair_force_quotacheck(sc, XFS_DQ_USER);
+		if (XFS_IS_GQUOTA_ON(mp))
+			xfs_repair_force_quotacheck(sc, XFS_DQ_GROUP);
+		if (XFS_IS_PQUOTA_ON(mp))
+			xfs_repair_force_quotacheck(sc, XFS_DQ_PROJ);
+	}
+	ASSERT(sc->ip);
+
+	ip = sc->ip;
+	xfs_trans_ijoin(sc->tp, ip, 0);
+
+	/* di_[acm]time.nsec */
+	if ((unsigned long)VFS_I(ip)->i_atime.tv_nsec >= NSEC_PER_SEC)
+		VFS_I(ip)->i_atime.tv_nsec = 0;
+	if ((unsigned long)VFS_I(ip)->i_mtime.tv_nsec >= NSEC_PER_SEC)
+		VFS_I(ip)->i_mtime.tv_nsec = 0;
+	if ((unsigned long)VFS_I(ip)->i_ctime.tv_nsec >= NSEC_PER_SEC)
+		VFS_I(ip)->i_ctime.tv_nsec = 0;
+	if (ip->i_d.di_version > 2 &&
+	    (unsigned long)ip->i_d.di_crtime.t_nsec >= NSEC_PER_SEC)
+		ip->i_d.di_crtime.t_nsec = 0;
+
+	/* di_size */
+	if (!S_ISDIR(VFS_I(ip)->i_mode) && !S_ISREG(VFS_I(ip)->i_mode) &&
+	    !S_ISLNK(VFS_I(ip)->i_mode)) {
+		i_size_write(VFS_I(ip), 0);
+		ip->i_d.di_size = 0;
+	}
+
+	/* di_flags */
+	flags = ip->i_d.di_flags;
+	if ((flags & XFS_DIFLAG_IMMUTABLE) && (flags & XFS_DIFLAG_APPEND))
+		flags &= ~XFS_DIFLAG_APPEND;
+
+	if ((flags & XFS_DIFLAG_FILESTREAM) && (flags & XFS_DIFLAG_REALTIME))
+		flags &= ~XFS_DIFLAG_FILESTREAM;
+	ip->i_d.di_flags = flags;
+
+	/* di_nblocks/di_nextents/di_anextents */
+	error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_DATA_FORK,
+			&nextents, &count);
+	if (error)
+		goto out;
+	ip->i_d.di_nextents = nextents;
+
+	error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_ATTR_FORK,
+			&nextents, &acount);
+	if (error)
+		goto out;
+	ip->i_d.di_anextents = nextents;
+
+	ip->i_d.di_nblocks = count + acount;
+	if (ip->i_d.di_anextents != 0 && ip->i_d.di_forkoff == 0)
+		ip->i_d.di_anextents = 0;
+
+	/* Invalid uid/gid? */
+	if (ip->i_d.di_uid == -1U) {
+		ip->i_d.di_uid = 0;
+		VFS_I(ip)->i_mode &= ~(S_ISUID | S_ISGID);
+		if (XFS_IS_UQUOTA_ON(mp))
+			xfs_repair_force_quotacheck(sc, XFS_DQ_USER);
+	}
+	if (ip->i_d.di_gid == -1U) {
+		ip->i_d.di_gid = 0;
+		VFS_I(ip)->i_mode &= ~(S_ISUID | S_ISGID);
+		if (XFS_IS_GQUOTA_ON(mp))
+			xfs_repair_force_quotacheck(sc, XFS_DQ_GROUP);
+	}
+
+	/* Invalid flags? */
+	xfs_repair_inode_flags(sc);
+	xfs_repair_inode_flags2(sc);
+
+	/* Invalid extent size hints? */
+	xfs_repair_inode_extsize(sc);
+	xfs_repair_inode_cowextsize(sc);
+
+	/* Commit inode core changes. */
+	xfs_trans_log_inode(sc->tp, ip, XFS_ILOG_CORE);
+	error = xfs_trans_roll_inode(&sc->tp, ip);
+	if (error)
+		goto out;
+
+preen_only:
+	/* Inode must be _trans_ijoin'd here */
+	if (xfs_is_reflink_inode(sc->ip))
+		return xfs_reflink_clear_inode_flag(sc->ip, &sc->tp);
+
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 193059d10c59..b7fbdfe1e4b0 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -120,6 +120,7 @@ int xfs_repair_allocbt(struct xfs_scrub_context *sc);
 int xfs_repair_iallocbt(struct xfs_scrub_context *sc);
 int xfs_repair_rmapbt(struct xfs_scrub_context *sc);
 int xfs_repair_refcountbt(struct xfs_scrub_context *sc);
+int xfs_repair_inode(struct xfs_scrub_context *sc);
 
 #else
 
@@ -170,6 +171,7 @@ static inline int xfs_repair_rmapbt_setup(
 #define xfs_repair_iallocbt		xfs_repair_notsupported
 #define xfs_repair_rmapbt		xfs_repair_notsupported
 #define xfs_repair_refcountbt		xfs_repair_notsupported
+#define xfs_repair_inode		xfs_repair_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 2be8e818f7ff..f08db28fb145 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -287,7 +287,7 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.type	= ST_INODE,
 		.setup	= xfs_scrub_setup_inode,
 		.scrub	= xfs_scrub_inode,
-		.repair	= xfs_repair_notsupported,
+		.repair	= xfs_repair_inode,
 	},
 	[XFS_SCRUB_TYPE_BMBTD] = {	/* inode data fork */
 		.type	= ST_INODE,

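The di_[acm]time.nsec fixups in xfs_repair_inode() above rely on a single unsigned comparison to catch both overlarge and negative nanosecond counts. A minimal standalone sketch of that trick (the helper name and the inlined NSEC_PER_SEC constant are illustrative, not kernel API):

```c
#include <assert.h>

#define FAKE_NSEC_PER_SEC	1000000000L	/* stand-in for NSEC_PER_SEC */

/*
 * Mirror of the timestamp repair: any tv_nsec outside [0, 1s) resets
 * to zero.  Casting to unsigned long makes a negative nsec wrap to a
 * huge value, so one comparison rejects both out-of-range cases.
 */
static long repair_nsec(long nsec)
{
	if ((unsigned long)nsec >= FAKE_NSEC_PER_SEC)
		return 0;
	return nsec;
}
```

The unsigned cast is the whole point: without it, a corrupt negative nanosecond field would slip past a plain `>=` test.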


* [PATCH 08/14] xfs: zap broken inode forks
  2018-05-30 19:30 [PATCH v15.2 00/14] xfs-4.18: online repair support Darrick J. Wong
                   ` (6 preceding siblings ...)
  2018-05-30 19:31 ` [PATCH 07/14] xfs: repair inode records Darrick J. Wong
@ 2018-05-30 19:31 ` Darrick J. Wong
  2018-05-30 19:31 ` [PATCH 09/14] xfs: repair inode block maps Darrick J. Wong
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2018-05-30 19:31 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Determine if inode fork damage is responsible for the inode being unable
to pass the ifork verifiers in xfs_iget and zap the fork contents if
this is true.  Once this is done the fork will be empty but we'll be
able to construct an in-core inode, and a subsequent call to the inode
fork repair ioctl will search the rmapbt to rebuild the records that
were in the fork.
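The zap decision starts from a cheap invariant before any per-extent checking: an inode cannot claim more mapped extents than it owns blocks. A standalone sketch of that counter sanity test, assuming illustrative type and field names rather than the kernel's own structures:

```c
/*
 * Sketch of the extent-count plausibility check that gates fork
 * zapping.  struct fake_dinode and counters_plausible() are
 * hypothetical stand-ins, not kernel code.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct fake_dinode {
	uint32_t di_nextents;	/* data fork extent count */
	uint16_t di_anextents;	/* attr fork extent count */
	uint64_t di_nblocks;	/* blocks owned by both forks */
};

/* True if the on-disk counters could describe a real file. */
static bool counters_plausible(const struct fake_dinode *dip)
{
	uint64_t total = (uint64_t)dip->di_nextents + dip->di_anextents;

	if (dip->di_nextents > dip->di_nblocks)
		return false;		/* data fork overcommitted */
	if (dip->di_anextents > dip->di_nblocks)
		return false;		/* attr fork overcommitted */
	return total <= dip->di_nblocks;
}
```

Only forks that pass this gate get the more expensive per-extent and btree-header validation; anything failing it is zapped outright.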

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_attr_leaf.c |   32 ++-
 fs/xfs/libxfs/xfs_attr_leaf.h |    2 
 fs/xfs/libxfs/xfs_bmap.c      |   21 ++
 fs/xfs/libxfs/xfs_bmap.h      |    2 
 fs/xfs/scrub/inode_repair.c   |  393 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 433 insertions(+), 17 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index 2135b8e67dcc..01ce59a4fc92 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -889,23 +889,16 @@ xfs_attr_shortform_allfit(
 	return xfs_attr_shortform_bytesfit(dp, bytes);
 }
 
-/* Verify the consistency of an inline attribute fork. */
+/* Verify the consistency of a raw inline attribute fork. */
 xfs_failaddr_t
-xfs_attr_shortform_verify(
-	struct xfs_inode		*ip)
+xfs_attr_shortform_verify_struct(
+	struct xfs_attr_shortform	*sfp,
+	size_t				size)
 {
-	struct xfs_attr_shortform	*sfp;
 	struct xfs_attr_sf_entry	*sfep;
 	struct xfs_attr_sf_entry	*next_sfep;
 	char				*endp;
-	struct xfs_ifork		*ifp;
 	int				i;
-	int				size;
-
-	ASSERT(ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL);
-	ifp = XFS_IFORK_PTR(ip, XFS_ATTR_FORK);
-	sfp = (struct xfs_attr_shortform *)ifp->if_u1.if_data;
-	size = ifp->if_bytes;
 
 	/*
 	 * Give up if the attribute is way too short.
@@ -963,6 +956,23 @@ xfs_attr_shortform_verify(
 	return NULL;
 }
 
+/* Verify the consistency of an inline attribute fork. */
+xfs_failaddr_t
+xfs_attr_shortform_verify(
+	struct xfs_inode		*ip)
+{
+	struct xfs_attr_shortform	*sfp;
+	struct xfs_ifork		*ifp;
+	int				size;
+
+	ASSERT(ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL);
+	ifp = XFS_IFORK_PTR(ip, XFS_ATTR_FORK);
+	sfp = (struct xfs_attr_shortform *)ifp->if_u1.if_data;
+	size = ifp->if_bytes;
+
+	return xfs_attr_shortform_verify_struct(sfp, size);
+}
+
 /*
  * Convert a leaf attribute list to shortform attribute list
  */
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.h b/fs/xfs/libxfs/xfs_attr_leaf.h
index 4da08af5b134..e5b4102772c1 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.h
+++ b/fs/xfs/libxfs/xfs_attr_leaf.h
@@ -53,6 +53,8 @@ int	xfs_attr_shortform_to_leaf(struct xfs_da_args *args,
 int	xfs_attr_shortform_remove(struct xfs_da_args *args);
 int	xfs_attr_shortform_allfit(struct xfs_buf *bp, struct xfs_inode *dp);
 int	xfs_attr_shortform_bytesfit(struct xfs_inode *dp, int bytes);
+xfs_failaddr_t xfs_attr_shortform_verify_struct(struct xfs_attr_shortform *sfp,
+		size_t size);
 xfs_failaddr_t xfs_attr_shortform_verify(struct xfs_inode *ip);
 void	xfs_attr_fork_remove(struct xfs_inode *ip, struct xfs_trans *tp);
 
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7b0e2b551e23..16d17e6a16d2 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -6187,18 +6187,16 @@ xfs_bmap_finish_one(
 	return error;
 }
 
-/* Check that an inode's extent does not have invalid flags or bad ranges. */
+/* Check that an extent does not have invalid flags or bad ranges. */
 xfs_failaddr_t
-xfs_bmap_validate_extent(
-	struct xfs_inode	*ip,
+xfs_bmbt_validate_extent(
+	struct xfs_mount	*mp,
+	bool			isrt,
 	int			whichfork,
 	struct xfs_bmbt_irec	*irec)
 {
-	struct xfs_mount	*mp = ip->i_mount;
 	xfs_fsblock_t		endfsb;
-	bool			isrt;
 
-	isrt = XFS_IS_REALTIME_INODE(ip);
 	endfsb = irec->br_startblock + irec->br_blockcount - 1;
 	if (isrt) {
 		if (!xfs_verify_rtbno(mp, irec->br_startblock))
@@ -6222,3 +6220,14 @@ xfs_bmap_validate_extent(
 	}
 	return NULL;
 }
+
+/* Check that an inode's extent does not have invalid flags or bad ranges. */
+xfs_failaddr_t
+xfs_bmap_validate_extent(
+	struct xfs_inode	*ip,
+	int			whichfork,
+	struct xfs_bmbt_irec	*irec)
+{
+	return xfs_bmbt_validate_extent(ip->i_mount, XFS_IS_REALTIME_INODE(ip),
+			whichfork, irec);
+}
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 2c233f9f1a26..3b9a83e054c9 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -294,6 +294,8 @@ static inline int xfs_bmap_fork_to_state(int whichfork)
 	}
 }
 
+xfs_failaddr_t xfs_bmbt_validate_extent(struct xfs_mount *mp, bool isrt,
+		int whichfork, struct xfs_bmbt_irec *irec);
 xfs_failaddr_t xfs_bmap_validate_extent(struct xfs_inode *ip, int whichfork,
 		struct xfs_bmbt_irec *irec);
 
diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c
index 90208a58a1d1..da37e04bc4df 100644
--- a/fs/xfs/scrub/inode_repair.c
+++ b/fs/xfs/scrub/inode_repair.c
@@ -36,11 +36,15 @@
 #include "xfs_ialloc.h"
 #include "xfs_da_format.h"
 #include "xfs_reflink.h"
+#include "xfs_alloc.h"
 #include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
 #include "xfs_bmap.h"
+#include "xfs_bmap_btree.h"
 #include "xfs_bmap_util.h"
 #include "xfs_dir2.h"
 #include "xfs_quota_defs.h"
+#include "xfs_attr_leaf.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
@@ -87,11 +91,387 @@ xfs_repair_inode_buf(
 	}
 }
 
+struct xfs_repair_inode_fork_counters {
+	struct xfs_scrub_context	*sc;
+	xfs_rfsblock_t			data_blocks;
+	xfs_rfsblock_t			rt_blocks;
+	xfs_rfsblock_t			attr_blocks;
+	xfs_extnum_t			data_extents;
+	xfs_extnum_t			rt_extents;
+	xfs_aextnum_t			attr_extents;
+};
+
+/* Count extents and blocks for an inode given an rmap. */
+STATIC int
+xfs_repair_inode_count_rmap(
+	struct xfs_btree_cur		*cur,
+	struct xfs_rmap_irec		*rec,
+	void				*priv)
+{
+	struct xfs_repair_inode_fork_counters	*rifc = priv;
+
+	/* Is this even the right fork? */
+	if (rec->rm_owner != rifc->sc->sm->sm_ino)
+		return 0;
+	if (rec->rm_flags & XFS_RMAP_ATTR_FORK) {
+		rifc->attr_blocks += rec->rm_blockcount;
+		if (!(rec->rm_flags & XFS_RMAP_BMBT_BLOCK))
+			rifc->attr_extents++;
+	} else {
+		rifc->data_blocks += rec->rm_blockcount;
+		if (!(rec->rm_flags & XFS_RMAP_BMBT_BLOCK))
+			rifc->data_extents++;
+	}
+	return 0;
+}
+
+/* Count extents and blocks for an inode from all AG rmap data. */
+STATIC int
+xfs_repair_inode_count_ag_rmaps(
+	struct xfs_repair_inode_fork_counters	*rifc,
+	xfs_agnumber_t			agno)
+{
+	struct xfs_btree_cur		*cur;
+	struct xfs_buf			*agf;
+	int				error;
+
+	error = xfs_alloc_read_agf(rifc->sc->mp, rifc->sc->tp, agno, 0, &agf);
+	if (error)
+		return error;
+
+	cur = xfs_rmapbt_init_cursor(rifc->sc->mp, rifc->sc->tp, agf, agno);
+	if (!cur) {
+		error = -ENOMEM;
+		goto out_agf;
+	}
+
+	error = xfs_rmap_query_all(cur, xfs_repair_inode_count_rmap, rifc);
+	if (error == XFS_BTREE_QUERY_RANGE_ABORT)
+		error = 0;
+
+	xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
+out_agf:
+	xfs_trans_brelse(rifc->sc->tp, agf);
+	return error;
+}
+
+/* Count extents and blocks for a given inode from all rmap data. */
+STATIC int
+xfs_repair_inode_count_rmaps(
+	struct xfs_repair_inode_fork_counters	*rifc)
+{
+	xfs_agnumber_t			agno;
+	int				error;
+
+	if (!xfs_sb_version_hasrmapbt(&rifc->sc->mp->m_sb) ||
+	    xfs_sb_version_hasrealtime(&rifc->sc->mp->m_sb))
+		return -EOPNOTSUPP;
+
+	/* XXX: find rt blocks too */
+
+	for (agno = 0; agno < rifc->sc->mp->m_sb.sb_agcount; agno++) {
+		error = xfs_repair_inode_count_ag_rmaps(rifc, agno);
+		if (error)
+			return error;
+	}
+
+	/* Can't have extents on both the rt and the data device. */
+	if (rifc->data_extents && rifc->rt_extents)
+		return -EFSCORRUPTED;
+
+	return 0;
+}
+
+/* Figure out if we need to zap this extents format fork. */
+STATIC bool
+xfs_repair_inode_core_check_extents_fork(
+	struct xfs_scrub_context	*sc,
+	struct xfs_dinode		*dip,
+	int				dfork_size,
+	int				whichfork)
+{
+	struct xfs_bmbt_irec		new;
+	struct xfs_bmbt_rec		*dp;
+	bool				isrt;
+	int				i;
+	int				nex;
+	int				fork_size;
+
+	nex = XFS_DFORK_NEXTENTS(dip, whichfork);
+	fork_size = nex * sizeof(struct xfs_bmbt_rec);
+	if (fork_size < 0 || fork_size > dfork_size)
+		return true;
+	dp = (struct xfs_bmbt_rec *)XFS_DFORK_PTR(dip, whichfork);
+
+	isrt = dip->di_flags & cpu_to_be16(XFS_DIFLAG_REALTIME);
+	for (i = 0; i < nex; i++, dp++) {
+		xfs_failaddr_t	fa;
+
+		xfs_bmbt_disk_get_all(dp, &new);
+		fa = xfs_bmbt_validate_extent(sc->mp, isrt, whichfork, &new);
+		if (fa)
+			return true;
+	}
+
+	return false;
+}
+
+/* Figure out if we need to zap this btree format fork. */
+STATIC bool
+xfs_repair_inode_core_check_btree_fork(
+	struct xfs_scrub_context	*sc,
+	struct xfs_dinode		*dip,
+	int				dfork_size,
+	int				whichfork)
+{
+	struct xfs_bmdr_block		*dfp;
+	int				nrecs;
+	int				level;
+
+	if (XFS_DFORK_NEXTENTS(dip, whichfork) <=
+			dfork_size / sizeof(struct xfs_bmbt_irec))
+		return true;
+
+	dfp = (struct xfs_bmdr_block *)XFS_DFORK_PTR(dip, whichfork);
+	nrecs = be16_to_cpu(dfp->bb_numrecs);
+	level = be16_to_cpu(dfp->bb_level);
+
+	if (nrecs == 0 || XFS_BMDR_SPACE_CALC(nrecs) > dfork_size)
+		return true;
+	if (level == 0 || level > XFS_BTREE_MAXLEVELS)
+		return true;
+	return false;
+}
+
+/*
+ * Check the data fork for things that will fail the ifork verifiers or the
+ * ifork formatters.
+ */
+STATIC bool
+xfs_repair_inode_core_check_data_fork(
+	struct xfs_scrub_context	*sc,
+	struct xfs_dinode		*dip,
+	uint16_t			mode)
+{
+	uint64_t			size;
+	int				dfork_size;
+
+	size = be64_to_cpu(dip->di_size);
+	switch (mode & S_IFMT) {
+	case S_IFIFO:
+	case S_IFCHR:
+	case S_IFBLK:
+	case S_IFSOCK:
+		if (XFS_DFORK_FORMAT(dip, XFS_DATA_FORK) != XFS_DINODE_FMT_DEV)
+			return true;
+		break;
+	case S_IFREG:
+	case S_IFLNK:
+	case S_IFDIR:
+		switch (XFS_DFORK_FORMAT(dip, XFS_DATA_FORK)) {
+		case XFS_DINODE_FMT_LOCAL:
+		case XFS_DINODE_FMT_EXTENTS:
+		case XFS_DINODE_FMT_BTREE:
+			break;
+		default:
+			return true;
+		}
+		break;
+	default:
+		return true;
+	}
+	dfork_size = XFS_DFORK_SIZE(dip, sc->mp, XFS_DATA_FORK);
+	switch (XFS_DFORK_FORMAT(dip, XFS_DATA_FORK)) {
+	case XFS_DINODE_FMT_DEV:
+		break;
+	case XFS_DINODE_FMT_LOCAL:
+		if (size > dfork_size)
+			return true;
+		break;
+	case XFS_DINODE_FMT_EXTENTS:
+		if (xfs_repair_inode_core_check_extents_fork(sc, dip,
+				dfork_size, XFS_DATA_FORK))
+			return true;
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		if (xfs_repair_inode_core_check_btree_fork(sc, dip,
+				dfork_size, XFS_DATA_FORK))
+			return true;
+		break;
+	default:
+		return true;
+	}
+
+	return false;
+}
+
+/* Reset the data fork to something sane. */
+STATIC void
+xfs_repair_inode_core_zap_data_fork(
+	struct xfs_scrub_context	*sc,
+	struct xfs_dinode		*dip,
+	uint16_t			mode,
+	struct xfs_repair_inode_fork_counters	*rifc)
+{
+	char				*p;
+	const struct xfs_dir_ops	*ops;
+	struct xfs_dir2_sf_hdr		*sfp;
+	int				i8count;
+
+	/* Special files always get reset to DEV */
+	switch (mode & S_IFMT) {
+	case S_IFIFO:
+	case S_IFCHR:
+	case S_IFBLK:
+	case S_IFSOCK:
+		dip->di_format = XFS_DINODE_FMT_DEV;
+		dip->di_size = 0;
+		return;
+	}
+
+	/*
+	 * If we have data extents, reset to an empty map and hope the user
+	 * will run the bmapbtd checker next.
+	 */
+	if (rifc->data_extents || rifc->rt_extents || S_ISREG(mode)) {
+		dip->di_format = XFS_DINODE_FMT_EXTENTS;
+		dip->di_nextents = 0;
+		return;
+	}
+
+	/* Otherwise, reset the local format to the minimum. */
+	switch (mode & S_IFMT) {
+	case S_IFLNK:
+		/* Blow out symlink; now it points to root dir */
+		dip->di_format = XFS_DINODE_FMT_LOCAL;
+		dip->di_size = cpu_to_be64(1);
+		p = XFS_DFORK_PTR(dip, XFS_DATA_FORK);
+		*p = '/';
+		break;
+	case S_IFDIR:
+		/*
+		 * Blow out dir, make it point to the root.  In the
+		 * future the directory repair will reconstruct this
+		 * dir for us.
+		 */
+		dip->di_format = XFS_DINODE_FMT_LOCAL;
+		i8count = sc->mp->m_sb.sb_rootino > XFS_DIR2_MAX_SHORT_INUM;
+		ops = xfs_dir_get_ops(sc->mp, NULL);
+		sfp = (struct xfs_dir2_sf_hdr *)XFS_DFORK_PTR(dip,
+				XFS_DATA_FORK);
+		sfp->count = 0;
+		sfp->i8count = i8count;
+		ops->sf_put_parent_ino(sfp, sc->mp->m_sb.sb_rootino);
+		dip->di_size = cpu_to_be64(xfs_dir2_sf_hdr_size(i8count));
+		break;
+	}
+}
+
+/*
+ * Check the attr fork for things that will fail the ifork verifiers or the
+ * ifork formatters.
+ */
+STATIC bool
+xfs_repair_inode_core_check_attr_fork(
+	struct xfs_scrub_context	*sc,
+	struct xfs_dinode		*dip)
+{
+	struct xfs_attr_shortform	*sfp;
+	int				size;
+
+	if (XFS_DFORK_BOFF(dip) == 0)
+		return dip->di_aformat != XFS_DINODE_FMT_EXTENTS ||
+		       dip->di_anextents != 0;
+
+	size = XFS_DFORK_SIZE(dip, sc->mp, XFS_ATTR_FORK);
+	switch (XFS_DFORK_FORMAT(dip, XFS_ATTR_FORK)) {
+	case XFS_DINODE_FMT_LOCAL:
+		sfp = (struct xfs_attr_shortform *)XFS_DFORK_PTR(dip,
+				XFS_ATTR_FORK);
+		return xfs_attr_shortform_verify_struct(sfp, size) != NULL;
+	case XFS_DINODE_FMT_EXTENTS:
+		if (xfs_repair_inode_core_check_extents_fork(sc, dip, size,
+				XFS_ATTR_FORK))
+			return true;
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		if (xfs_repair_inode_core_check_btree_fork(sc, dip, size,
+				XFS_ATTR_FORK))
+			return true;
+		break;
+	default:
+		return true;
+	}
+
+	return false;
+}
+
+/* Reset the attr fork to something sane. */
+STATIC void
+xfs_repair_inode_core_zap_attr_fork(
+	struct xfs_scrub_context	*sc,
+	struct xfs_dinode		*dip,
+	struct xfs_repair_inode_fork_counters	*rifc)
+{
+	dip->di_aformat = XFS_DINODE_FMT_EXTENTS;
+	dip->di_anextents = 0;
+	/*
+	 * We leave a nonzero forkoff so that the bmap scrub will look for
+	 * attr rmaps.
+	 */
+	dip->di_forkoff = rifc->attr_extents ? 1 : 0;
+}
+
+/*
+ * Zap the data/attr forks if we spot anything that isn't going to pass the
+ * ifork verifiers or the ifork formatters, because we need to get the inode
+ * into good enough shape that the higher level repair functions can run.
+ */
+STATIC void
+xfs_repair_inode_core_zap_forks(
+	struct xfs_scrub_context	*sc,
+	struct xfs_dinode		*dip,
+	uint16_t			mode,
+	struct xfs_repair_inode_fork_counters	*rifc)
+{
+	bool				zap_datafork = false;
+	bool				zap_attrfork = false;
+
+	/* Inode counters don't make sense? */
+	if (be32_to_cpu(dip->di_nextents) > be64_to_cpu(dip->di_nblocks))
+		zap_datafork = true;
+	if (be16_to_cpu(dip->di_anextents) > be64_to_cpu(dip->di_nblocks))
+		zap_attrfork = true;
+	if (be32_to_cpu(dip->di_nextents) + be16_to_cpu(dip->di_anextents) >
+			be64_to_cpu(dip->di_nblocks))
+		zap_datafork = zap_attrfork = true;
+
+	if (!zap_datafork)
+		zap_datafork = xfs_repair_inode_core_check_data_fork(sc, dip,
+				mode);
+	if (!zap_attrfork)
+		zap_attrfork = xfs_repair_inode_core_check_attr_fork(sc, dip);
+
+	/* Zap whatever's bad. */
+	if (zap_attrfork)
+		xfs_repair_inode_core_zap_attr_fork(sc, dip, rifc);
+	if (zap_datafork)
+		xfs_repair_inode_core_zap_data_fork(sc, dip, mode, rifc);
+	dip->di_nblocks = 0;
+	if (!zap_attrfork)
+		be64_add_cpu(&dip->di_nblocks, rifc->attr_blocks);
+	if (!zap_datafork) {
+		be64_add_cpu(&dip->di_nblocks, rifc->data_blocks);
+		be64_add_cpu(&dip->di_nblocks, rifc->rt_blocks);
+	}
+}
+
 /* Inode didn't pass verifiers, so fix the raw buffer and retry iget. */
 STATIC int
 xfs_repair_inode_core(
 	struct xfs_scrub_context	*sc)
 {
+	struct xfs_repair_inode_fork_counters	rifc;
 	struct xfs_imap			imap;
 	struct xfs_buf			*bp;
 	struct xfs_dinode		*dip;
@@ -101,6 +481,13 @@ xfs_repair_inode_core(
 	uint16_t			mode;
 	int				error;
 
+	/* Figure out what this inode had mapped in both forks. */
+	memset(&rifc, 0, sizeof(rifc));
+	rifc.sc = sc;
+	error = xfs_repair_inode_count_rmaps(&rifc);
+	if (error)
+		return error;
+
 	/* Map & read inode. */
 	ino = sc->sm->sm_ino;
 	error = xfs_imap(sc->mp, sc->tp, ino, &imap, XFS_IGET_UNTRUSTED);
@@ -133,6 +520,10 @@ xfs_repair_inode_core(
 	uuid_copy(&dip->di_uuid, &sc->mp->m_sb.sb_meta_uuid);
 	flags = be16_to_cpu(dip->di_flags);
 	flags2 = be64_to_cpu(dip->di_flags2);
+	if (rifc.rt_extents)
+		flags |= XFS_DIFLAG_REALTIME;
+	else
+		flags &= ~XFS_DIFLAG_REALTIME;
 	if (xfs_sb_version_hasreflink(&sc->mp->m_sb) && S_ISREG(mode))
 		flags2 |= XFS_DIFLAG2_REFLINK;
 	else
@@ -147,6 +538,8 @@ xfs_repair_inode_core(
 	if (be64_to_cpu(dip->di_size) & (1ULL << 63))
 		dip->di_size = cpu_to_be64((1ULL << 63) - 1);
 
+	xfs_repair_inode_core_zap_forks(sc, dip, mode, &rifc);
+
 	/* Write out the inode... */
 	xfs_dinode_calc_crc(sc->mp, dip);
 	xfs_trans_buf_set_type(sc->tp, bp, XFS_BLFT_DINO_BUF);



* [PATCH 09/14] xfs: repair inode block maps
  2018-05-30 19:30 [PATCH v15.2 00/14] xfs-4.18: online repair support Darrick J. Wong
                   ` (7 preceding siblings ...)
  2018-05-30 19:31 ` [PATCH 08/14] xfs: zap broken inode forks Darrick J. Wong
@ 2018-05-30 19:31 ` Darrick J. Wong
  2018-05-30 19:31 ` [PATCH 10/14] xfs: repair damaged symlinks Darrick J. Wong
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2018-05-30 19:31 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Use the reverse-mapping btree information to rebuild an inode fork.
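One detail of the rebuild is that a single rmap record can cover more blocks than one on-disk bmbt extent may hold, so the insert loop splits each record into MAXEXTLEN-sized pieces, remapping one piece per transaction cycle. A standalone sketch of that chunking loop, with FAKE_MAXEXTLEN set to XFS's 21-bit blockcount limit and the callback standing in for xfs_bmapi_remap (names are illustrative, not kernel API):

```c
#include <assert.h>
#include <stdint.h>

/* On-disk bmbt extents carry a 21-bit block count: 2^21 - 1 max. */
#define FAKE_MAXEXTLEN	((uint32_t)((1U << 21) - 1))

static uint64_t remapped_blocks;	/* total blocks handed to the callback */

static void count_chunk(uint64_t off, uint64_t bno, uint32_t len)
{
	(void)off;
	(void)bno;
	remapped_blocks += len;
}

/* Invoke cb() once per MAXEXTLEN-sized piece; return the piece count. */
static unsigned int remap_in_chunks(uint64_t startoff, uint64_t startblock,
				    uint64_t blockcount,
				    void (*cb)(uint64_t off, uint64_t bno,
					       uint32_t len))
{
	unsigned int nr = 0;

	while (blockcount > 0) {
		uint32_t len = blockcount > FAKE_MAXEXTLEN ?
				FAKE_MAXEXTLEN : (uint32_t)blockcount;

		cb(startoff, startblock, len);	/* one remap per chunk */
		startoff += len;
		startblock += len;
		blockcount -= len;
		nr++;
	}
	return nr;
}
```

In the real patch each chunk also initializes and finishes a deferred-ops list and rolls the transaction, keeping any single transaction's log footprint bounded.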

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile            |    1 
 fs/xfs/scrub/bmap.c        |    8 +
 fs/xfs/scrub/bmap_repair.c |  399 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.h      |    4 
 fs/xfs/scrub/scrub.c       |    4 
 5 files changed, 414 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/bmap_repair.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index b0f25bf07207..653da1fe6b26 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -176,6 +176,7 @@ ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y)
 xfs-y				+= $(addprefix scrub/, \
 				   agheader_repair.o \
 				   alloc_repair.o \
+				   bmap_repair.o \
 				   ialloc_repair.o \
 				   inode_repair.o \
 				   refcount_repair.o \
diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
index eeadb33a701c..bbbac1083181 100644
--- a/fs/xfs/scrub/bmap.c
+++ b/fs/xfs/scrub/bmap.c
@@ -71,6 +71,14 @@ xfs_scrub_setup_inode_bmap(
 		error = filemap_write_and_wait(VFS_I(sc->ip)->i_mapping);
 		if (error)
 			goto out;
+
+		/* Drop the page cache if we're repairing block mappings. */
+		if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) {
+			error = invalidate_inode_pages2(
+					VFS_I(sc->ip)->i_mapping);
+			if (error)
+				goto out;
+		}
 	}
 
 	/* Got the inode, lock it and we're ready to go. */
diff --git a/fs/xfs/scrub/bmap_repair.c b/fs/xfs/scrub/bmap_repair.c
new file mode 100644
index 000000000000..aae780a0032c
--- /dev/null
+++ b/fs/xfs/scrub/bmap_repair.c
@@ -0,0 +1,399 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_inode_fork.h"
+#include "xfs_alloc.h"
+#include "xfs_rtalloc.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_refcount.h"
+#include "xfs_quota.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/btree.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+
+/* Inode fork block mapping (BMBT) repair. */
+
+struct xfs_repair_bmap_extent {
+	struct list_head		list;
+	struct xfs_rmap_irec		rmap;
+	xfs_agnumber_t			agno;
+};
+
+struct xfs_repair_bmap {
+	struct list_head		extlist;
+	struct xfs_repair_extent_list	btlist;
+	struct xfs_repair_bmap_extent	ext;	/* most files have 1 extent */
+	struct xfs_scrub_context	*sc;
+	xfs_ino_t			ino;
+	xfs_rfsblock_t			otherfork_blocks;
+	xfs_rfsblock_t			bmbt_blocks;
+	xfs_extnum_t			extents;
+	int				whichfork;
+};
+
+/* Record extents that belong to this inode's fork. */
+STATIC int
+xfs_repair_bmap_extent_fn(
+	struct xfs_btree_cur		*cur,
+	struct xfs_rmap_irec		*rec,
+	void				*priv)
+{
+	struct xfs_repair_bmap		*rb = priv;
+	struct xfs_repair_bmap_extent	*rbe;
+	struct xfs_mount		*mp = cur->bc_mp;
+	xfs_fsblock_t			fsbno;
+	int				error = 0;
+
+	if (xfs_scrub_should_terminate(rb->sc, &error))
+		return error;
+
+	/* Skip extents which are not owned by this inode and fork. */
+	if (rec->rm_owner != rb->ino) {
+		return 0;
+	} else if (rb->whichfork == XFS_DATA_FORK &&
+		 (rec->rm_flags & XFS_RMAP_ATTR_FORK)) {
+		rb->otherfork_blocks += rec->rm_blockcount;
+		return 0;
+	} else if (rb->whichfork == XFS_ATTR_FORK &&
+		 !(rec->rm_flags & XFS_RMAP_ATTR_FORK)) {
+		rb->otherfork_blocks += rec->rm_blockcount;
+		return 0;
+	}
+
+	rb->extents++;
+
+	/* Delete the old bmbt blocks later. */
+	if (rec->rm_flags & XFS_RMAP_BMBT_BLOCK) {
+		fsbno = XFS_AGB_TO_FSB(mp, cur->bc_private.a.agno,
+				rec->rm_startblock);
+		rb->bmbt_blocks += rec->rm_blockcount;
+		return xfs_repair_collect_btree_extent(rb->sc, &rb->btlist,
+				fsbno, rec->rm_blockcount);
+	}
+
+	/* Remember this rmap. */
+	trace_xfs_repair_bmap_extent_fn(mp, cur->bc_private.a.agno,
+			rec->rm_startblock, rec->rm_blockcount, rec->rm_owner,
+			rec->rm_offset, rec->rm_flags);
+
+	if (list_empty(&rb->extlist)) {
+		rbe = &rb->ext;
+	} else {
+		rbe = kmem_alloc(sizeof(struct xfs_repair_bmap_extent),
+				KM_MAYFAIL);
+		if (!rbe)
+			return -ENOMEM;
+	}
+
+	INIT_LIST_HEAD(&rbe->list);
+	rbe->rmap = *rec;
+	rbe->agno = cur->bc_private.a.agno;
+	list_add_tail(&rbe->list, &rb->extlist);
+
+	return 0;
+}
+
+/* Compare two bmap extents. */
+static int
+xfs_repair_bmap_extent_cmp(
+	void				*priv,
+	struct list_head		*a,
+	struct list_head		*b)
+{
+	struct xfs_repair_bmap_extent	*ap;
+	struct xfs_repair_bmap_extent	*bp;
+
+	ap = container_of(a, struct xfs_repair_bmap_extent, list);
+	bp = container_of(b, struct xfs_repair_bmap_extent, list);
+
+	if (ap->rmap.rm_offset > bp->rmap.rm_offset)
+		return 1;
+	else if (ap->rmap.rm_offset < bp->rmap.rm_offset)
+		return -1;
+	return 0;
+}
+
+/* Scan one AG for reverse mappings that we can turn into extent maps. */
+STATIC int
+xfs_repair_bmap_scan_ag(
+	struct xfs_repair_bmap		*rb,
+	xfs_agnumber_t			agno)
+{
+	struct xfs_scrub_context	*sc = rb->sc;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*agf_bp = NULL;
+	struct xfs_btree_cur		*cur;
+	int				error;
+
+	error = xfs_alloc_read_agf(mp, sc->tp, agno, 0, &agf_bp);
+	if (error)
+		return error;
+	if (!agf_bp)
+		return -ENOMEM;
+	cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, agno);
+	error = xfs_rmap_query_all(cur, xfs_repair_bmap_extent_fn, rb);
+	if (error == XFS_BTREE_QUERY_RANGE_ABORT)
+		error = 0;
+	xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR :
+			XFS_BTREE_NOERROR);
+	xfs_trans_brelse(sc->tp, agf_bp);
+	return error;
+}
+
+/* Insert bmap records into an inode fork, given an rmap. */
+STATIC int
+xfs_repair_bmap_insert_rec(
+	struct xfs_scrub_context	*sc,
+	struct xfs_repair_bmap_extent	*rbe,
+	int				baseflags)
+{
+	struct xfs_bmbt_irec		bmap;
+	struct xfs_defer_ops		dfops;
+	xfs_fsblock_t			firstfsb;
+	xfs_extlen_t			extlen;
+	int				flags;
+	int				error = 0;
+
+	/* Form the "new" mapping... */
+	bmap.br_startblock = XFS_AGB_TO_FSB(sc->mp, rbe->agno,
+			rbe->rmap.rm_startblock);
+	bmap.br_startoff = rbe->rmap.rm_offset;
+
+	flags = 0;
+	if (rbe->rmap.rm_flags & XFS_RMAP_UNWRITTEN)
+		flags = XFS_BMAPI_PREALLOC;
+	while (rbe->rmap.rm_blockcount > 0) {
+		xfs_defer_init(&dfops, &firstfsb);
+		extlen = min_t(xfs_extlen_t, rbe->rmap.rm_blockcount,
+				MAXEXTLEN);
+		bmap.br_blockcount = extlen;
+
+		/* Re-add the extent to the fork. */
+		error = xfs_bmapi_remap(sc->tp, sc->ip,
+				bmap.br_startoff, extlen,
+				bmap.br_startblock, &dfops,
+				baseflags | flags);
+		if (error)
+			goto out_cancel;
+
+		bmap.br_startblock += extlen;
+		bmap.br_startoff += extlen;
+		rbe->rmap.rm_blockcount -= extlen;
+		error = xfs_defer_ijoin(&dfops, sc->ip);
+		if (error)
+			goto out_cancel;
+		error = xfs_defer_finish(&sc->tp, &dfops);
+		if (error)
+			goto out;
+		/* Make sure we roll the transaction. */
+		error = xfs_trans_roll_inode(&sc->tp, sc->ip);
+		if (error)
+			goto out;
+	}
+
+	return 0;
+out_cancel:
+	xfs_defer_cancel(&dfops);
+out:
+	return error;
+}
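
xfs_repair_bmap_insert_rec() above re-adds each reverse mapping in MAXEXTLEN-sized pieces, advancing the start block and offset and rolling the transaction after each piece. The chunking arithmetic can be sketched on its own; MAX_CHUNK here is a hypothetical stand-in for MAXEXTLEN:

```c
#include <assert.h>

/* Hypothetical stand-in for MAXEXTLEN, the per-mapping length cap. */
#define MAX_CHUNK 4u

/*
 * Split a mapping's block count into capped chunks, as the remap loop
 * above does; returns how many remap calls the loop would make.
 */
static unsigned int count_chunks(unsigned int blockcount)
{
	unsigned int chunks = 0;

	while (blockcount > 0) {
		unsigned int len =
			blockcount < MAX_CHUNK ? blockcount : MAX_CHUNK;

		/* startblock and startoff would each advance by len here */
		blockcount -= len;
		chunks++;
	}
	return chunks;
}
```

Capping each piece keeps every transaction's log footprint bounded, which is why the loop rolls the transaction between pieces instead of remapping the whole extent at once.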
+
+/* Repair an inode fork. */
+STATIC int
+xfs_repair_bmap(
+	struct xfs_scrub_context	*sc,
+	int				whichfork)
+{
+	struct xfs_repair_bmap		rb;
+	struct xfs_owner_info		oinfo;
+	struct xfs_inode		*ip = sc->ip;
+	struct xfs_mount		*mp = ip->i_mount;
+	struct xfs_repair_bmap_extent	*rbe;
+	struct xfs_repair_bmap_extent	*n;
+	xfs_agnumber_t			agno;
+	unsigned int			resblks;
+	int				baseflags;
+	int				error = 0;
+
+	ASSERT(whichfork == XFS_DATA_FORK || whichfork == XFS_ATTR_FORK);
+
+	/* Don't know how to repair the other fork formats. */
+	if (XFS_IFORK_FORMAT(sc->ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
+	    XFS_IFORK_FORMAT(sc->ip, whichfork) != XFS_DINODE_FMT_BTREE)
+		return -EOPNOTSUPP;
+
+	/* Only files, symlinks, and directories get to have data forks. */
+	if (whichfork == XFS_DATA_FORK && !S_ISREG(VFS_I(ip)->i_mode) &&
+	    !S_ISDIR(VFS_I(ip)->i_mode) && !S_ISLNK(VFS_I(ip)->i_mode))
+		return -EINVAL;
+
+	/* If we somehow have delalloc extents, forget it. */
+	if (whichfork == XFS_DATA_FORK && ip->i_delayed_blks)
+		return -EBUSY;
+
+	/*
+	 * If there's no attr fork area in the inode, there's
+	 * no attr fork to rebuild.
+	 */
+	if (whichfork == XFS_ATTR_FORK && !XFS_IFORK_Q(ip))
+		return -ENOENT;
+
+	/* We require the rmapbt to rebuild anything. */
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return -EOPNOTSUPP;
+
+	/* Don't know how to rebuild realtime data forks. */
+	if (XFS_IS_REALTIME_INODE(ip) && whichfork == XFS_DATA_FORK)
+		return -EOPNOTSUPP;
+
+	/*
+	 * If this is a file data fork, wait for all pending directio to
+	 * complete, then tear everything out of the page cache.
+	 */
+	if (S_ISREG(VFS_I(ip)->i_mode) && whichfork == XFS_DATA_FORK) {
+		inode_dio_wait(VFS_I(ip));
+		truncate_inode_pages(VFS_I(ip)->i_mapping, 0);
+	}
+
+	/* Collect all reverse mappings for this fork's extents. */
+	memset(&rb, 0, sizeof(rb));
+	INIT_LIST_HEAD(&rb.extlist);
+	xfs_repair_init_extent_list(&rb.btlist);
+	rb.ino = ip->i_ino;
+	rb.whichfork = whichfork;
+	rb.sc = sc;
+
+	/* Iterate the rmaps for extents. */
+	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+		error = xfs_repair_bmap_scan_ag(&rb, agno);
+		if (error)
+			goto out;
+	}
+
+	/*
+	 * Guess how many blocks we're going to need to rebuild an entire bmap
+	 * from the number of extents we found, and get ourselves a new
+	 * transaction with proper block reservations.
+	 */
+	resblks = xfs_bmbt_calc_size(mp, rb.extents);
+	error = xfs_trans_reserve_more(sc->tp, resblks, 0);
+	if (error)
+		goto out;
+
+	/* Blow out the in-core fork and zero the on-disk fork. */
+	sc->ip->i_d.di_nblocks = rb.otherfork_blocks;
+	xfs_trans_ijoin(sc->tp, sc->ip, 0);
+	if (XFS_IFORK_PTR(ip, whichfork) != NULL)
+		xfs_idestroy_fork(sc->ip, whichfork);
+	XFS_IFORK_FMT_SET(sc->ip, whichfork, XFS_DINODE_FMT_EXTENTS);
+	XFS_IFORK_NEXT_SET(sc->ip, whichfork, 0);
+
+	/* Reinitialize the on-disk fork. */
+	if (whichfork == XFS_DATA_FORK) {
+		memset(&ip->i_df, 0, sizeof(struct xfs_ifork));
+		ip->i_df.if_flags |= XFS_IFEXTENTS;
+	} else if (whichfork == XFS_ATTR_FORK) {
+		if (list_empty(&rb.extlist))
+			ip->i_afp = NULL;
+		else {
+			ip->i_afp = kmem_zone_zalloc(xfs_ifork_zone, KM_SLEEP);
+			ip->i_afp->if_flags |= XFS_IFEXTENTS;
+		}
+	}
+	xfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE);
+	error = xfs_trans_roll_inode(&sc->tp, sc->ip);
+	if (error)
+		goto out;
+
+	baseflags = XFS_BMAPI_NORMAP;
+	if (whichfork == XFS_ATTR_FORK)
+		baseflags |= XFS_BMAPI_ATTRFORK;
+
+	/* Release quota counts for the old bmbt blocks. */
+	if (rb.bmbt_blocks) {
+		error = xfs_repair_ino_dqattach(sc);
+		if (error)
+			goto out;
+		xfs_trans_mod_dquot_byino(sc->tp, sc->ip, XFS_TRANS_DQ_BCOUNT,
+				-rb.bmbt_blocks);
+		error = xfs_trans_roll_inode(&sc->tp, sc->ip);
+		if (error)
+			goto out;
+	}
+
+	/* "Remap" the extents into the fork. */
+	list_sort(NULL, &rb.extlist, xfs_repair_bmap_extent_cmp);
+	list_for_each_entry_safe(rbe, n, &rb.extlist, list) {
+		error = xfs_repair_bmap_insert_rec(sc, rbe, baseflags);
+		if (error)
+			goto out;
+		list_del(&rbe->list);
+		if (rbe != &rb.ext)
+			kmem_free(rbe);
+	}
+
+	/* Dispose of all the old bmbt blocks. */
+	xfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, whichfork);
+	return xfs_repair_reap_btree_extents(sc, &rb.btlist, &oinfo,
+			XFS_AG_RESV_NONE);
+out:
+	xfs_repair_cancel_btree_extents(sc, &rb.btlist);
+	list_for_each_entry_safe(rbe, n, &rb.extlist, list) {
+		list_del(&rbe->list);
+		if (rbe != &rb.ext)
+			kmem_free(rbe);
+	}
+	return error;
+}
+
+/* Repair an inode's data fork. */
+int
+xfs_repair_bmap_data(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_repair_bmap(sc, XFS_DATA_FORK);
+}
+
+/* Repair an inode's attr fork. */
+int
+xfs_repair_bmap_attr(
+	struct xfs_scrub_context	*sc)
+{
+	return xfs_repair_bmap(sc, XFS_ATTR_FORK);
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index b7fbdfe1e4b0..f572539e7b55 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -121,6 +121,8 @@ int xfs_repair_iallocbt(struct xfs_scrub_context *sc);
 int xfs_repair_rmapbt(struct xfs_scrub_context *sc);
 int xfs_repair_refcountbt(struct xfs_scrub_context *sc);
 int xfs_repair_inode(struct xfs_scrub_context *sc);
+int xfs_repair_bmap_data(struct xfs_scrub_context *sc);
+int xfs_repair_bmap_attr(struct xfs_scrub_context *sc);
 
 #else
 
@@ -172,6 +174,8 @@ static inline int xfs_repair_rmapbt_setup(
 #define xfs_repair_rmapbt		xfs_repair_notsupported
 #define xfs_repair_refcountbt		xfs_repair_notsupported
 #define xfs_repair_inode		xfs_repair_notsupported
+#define xfs_repair_bmap_data		xfs_repair_notsupported
+#define xfs_repair_bmap_attr		xfs_repair_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index f08db28fb145..675b04f07205 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -293,13 +293,13 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.type	= ST_INODE,
 		.setup	= xfs_scrub_setup_inode_bmap,
 		.scrub	= xfs_scrub_bmap_data,
-		.repair	= xfs_repair_notsupported,
+		.repair	= xfs_repair_bmap_data,
 	},
 	[XFS_SCRUB_TYPE_BMBTA] = {	/* inode attr fork */
 		.type	= ST_INODE,
 		.setup	= xfs_scrub_setup_inode_bmap,
 		.scrub	= xfs_scrub_bmap_attr,
-		.repair	= xfs_repair_notsupported,
+		.repair	= xfs_repair_bmap_attr,
 	},
 	[XFS_SCRUB_TYPE_BMBTC] = {	/* inode CoW fork */
 		.type	= ST_INODE,



* [PATCH 10/14] xfs: repair damaged symlinks
  2018-05-30 19:30 [PATCH v15.2 00/14] xfs-4.18: online repair support Darrick J. Wong
                   ` (8 preceding siblings ...)
  2018-05-30 19:31 ` [PATCH 09/14] xfs: repair inode block maps Darrick J. Wong
@ 2018-05-30 19:31 ` Darrick J. Wong
  2018-05-30 19:31 ` [PATCH 11/14] xfs: repair extended attributes Darrick J. Wong
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2018-05-30 19:31 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Repair inconsistent symbolic link data.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile               |    1 
 fs/xfs/scrub/repair.h         |    2 
 fs/xfs/scrub/scrub.c          |    2 
 fs/xfs/scrub/symlink.c        |    2 
 fs/xfs/scrub/symlink_repair.c |  284 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 289 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/symlink_repair.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 653da1fe6b26..5e336892f21f 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -182,6 +182,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   refcount_repair.o \
 				   repair.o \
 				   rmap_repair.o \
+				   symlink_repair.o \
 				   )
 endif
 endif
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index f572539e7b55..9897649b659f 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -123,6 +123,7 @@ int xfs_repair_refcountbt(struct xfs_scrub_context *sc);
 int xfs_repair_inode(struct xfs_scrub_context *sc);
 int xfs_repair_bmap_data(struct xfs_scrub_context *sc);
 int xfs_repair_bmap_attr(struct xfs_scrub_context *sc);
+int xfs_repair_symlink(struct xfs_scrub_context *sc);
 
 #else
 
@@ -176,6 +177,7 @@ static inline int xfs_repair_rmapbt_setup(
 #define xfs_repair_inode		xfs_repair_notsupported
 #define xfs_repair_bmap_data		xfs_repair_notsupported
 #define xfs_repair_bmap_attr		xfs_repair_notsupported
+#define xfs_repair_symlink		xfs_repair_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 675b04f07205..462c44ca3080 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -323,7 +323,7 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.type	= ST_INODE,
 		.setup	= xfs_scrub_setup_symlink,
 		.scrub	= xfs_scrub_symlink,
-		.repair	= xfs_repair_notsupported,
+		.repair	= xfs_repair_symlink,
 	},
 	[XFS_SCRUB_TYPE_PARENT] = {	/* parent pointers */
 		.type	= ST_INODE,
diff --git a/fs/xfs/scrub/symlink.c b/fs/xfs/scrub/symlink.c
index 3aa3d60f7c16..a370aad5233f 100644
--- a/fs/xfs/scrub/symlink.c
+++ b/fs/xfs/scrub/symlink.c
@@ -48,7 +48,7 @@ xfs_scrub_setup_symlink(
 	if (!sc->buf)
 		return -ENOMEM;
 
-	return xfs_scrub_setup_inode_contents(sc, ip, 0);
+	return xfs_scrub_setup_inode_contents(sc, ip, XFS_SYMLINK_MAPS);
 }
 
 /* Symbolic links. */
diff --git a/fs/xfs/scrub/symlink_repair.c b/fs/xfs/scrub/symlink_repair.c
new file mode 100644
index 000000000000..a58eb96dd448
--- /dev/null
+++ b/fs/xfs/scrub/symlink_repair.c
@@ -0,0 +1,284 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_inode_fork.h"
+#include "xfs_symlink.h"
+#include "xfs_bmap.h"
+#include "xfs_quota.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+
+/* Blow out the whole symlink; replace contents. */
+STATIC int
+xfs_repair_symlink_rewrite(
+	struct xfs_trans	**tpp,
+	struct xfs_inode	*ip,
+	const char		*target_path,
+	int			pathlen)
+{
+	struct xfs_defer_ops	dfops;
+	struct xfs_bmbt_irec	mval[XFS_SYMLINK_MAPS];
+	struct xfs_ifork	*ifp;
+	const char		*cur_chunk;
+	struct xfs_mount	*mp = (*tpp)->t_mountp;
+	struct xfs_buf		*bp;
+	xfs_fsblock_t		first_block;
+	xfs_fileoff_t		first_fsb;
+	xfs_filblks_t		fs_blocks;
+	xfs_daddr_t		d;
+	int			byte_cnt;
+	int			n;
+	int			nmaps;
+	int			offset;
+	int			error = 0;
+
+	ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+
+	/* Truncate the whole data fork if it wasn't inline. */
+	if (!(ifp->if_flags & XFS_IFINLINE)) {
+		error = xfs_itruncate_extents(tpp, ip, XFS_DATA_FORK, 0);
+		if (error)
+			goto out;
+	}
+
+	/* Blow out the in-core fork and zero the on-disk fork. */
+	xfs_idestroy_fork(ip, XFS_DATA_FORK);
+	ip->i_d.di_format = XFS_DINODE_FMT_EXTENTS;
+	ip->i_d.di_nextents = 0;
+	memset(&ip->i_df, 0, sizeof(struct xfs_ifork));
+	ip->i_df.if_flags |= XFS_IFEXTENTS;
+
+	/* Rewrite an inline symlink. */
+	if (pathlen <= XFS_IFORK_DSIZE(ip)) {
+		xfs_init_local_fork(ip, XFS_DATA_FORK, target_path, pathlen);
+
+		i_size_write(VFS_I(ip), pathlen);
+		ip->i_d.di_size = pathlen;
+		ip->i_d.di_format = XFS_DINODE_FMT_LOCAL;
+		xfs_trans_log_inode(*tpp, ip, XFS_ILOG_DDATA | XFS_ILOG_CORE);
+		goto out;
+
+	}
+
+	/* Rewrite a remote symlink. */
+	fs_blocks = xfs_symlink_blocks(mp, pathlen);
+	first_fsb = 0;
+	nmaps = XFS_SYMLINK_MAPS;
+
+	/* Reserve quota for new blocks. */
+	error = xfs_trans_reserve_quota_nblks(*tpp, ip, fs_blocks, 0,
+			XFS_QMOPT_RES_REGBLKS);
+	if (error)
+		goto out;
+
+	/* Map blocks, write symlink target. */
+	xfs_defer_init(&dfops, &first_block);
+
+	error = xfs_bmapi_write(*tpp, ip, first_fsb, fs_blocks,
+			  XFS_BMAPI_METADATA, &first_block, fs_blocks,
+			  mval, &nmaps, &dfops);
+	if (error)
+		goto out_bmap_cancel;
+
+	ip->i_d.di_size = pathlen;
+	i_size_write(VFS_I(ip), pathlen);
+	xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE);
+
+	cur_chunk = target_path;
+	offset = 0;
+	for (n = 0; n < nmaps; n++) {
+		char	*buf;
+
+		d = XFS_FSB_TO_DADDR(mp, mval[n].br_startblock);
+		byte_cnt = XFS_FSB_TO_B(mp, mval[n].br_blockcount);
+		bp = xfs_trans_get_buf(*tpp, mp->m_ddev_targp, d,
+				       BTOBB(byte_cnt), 0);
+		if (!bp) {
+			error = -ENOMEM;
+			goto out_bmap_cancel;
+		}
+		bp->b_ops = &xfs_symlink_buf_ops;
+
+		byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt);
+		byte_cnt = min(byte_cnt, pathlen);
+
+		buf = bp->b_addr;
+		buf += xfs_symlink_hdr_set(mp, ip->i_ino, offset,
+					   byte_cnt, bp);
+
+		memcpy(buf, cur_chunk, byte_cnt);
+
+		cur_chunk += byte_cnt;
+		pathlen -= byte_cnt;
+		offset += byte_cnt;
+
+		xfs_trans_buf_set_type(*tpp, bp, XFS_BLFT_SYMLINK_BUF);
+		xfs_trans_log_buf(*tpp, bp, 0, (buf + byte_cnt - 1) -
+						(char *)bp->b_addr);
+	}
+	ASSERT(pathlen == 0);
+
+	error = xfs_defer_finish(tpp, &dfops);
+	if (error)
+		goto out_bmap_cancel;
+
+	return 0;
+
+out_bmap_cancel:
+	xfs_defer_cancel(&dfops);
+out:
+	return error;
+}
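
The remote-target loop above walks the mapped blocks, capping each copy at the usable space in the block and advancing cur_chunk, pathlen, and offset in lock step. A hedged standalone sketch of that chunked copy (chunk_space is an assumed capacity, not the real XFS_SYMLINK_BUF_SPACE geometry):

```c
#include <assert.h>
#include <string.h>

/*
 * Copy src[0..len) into fixed-capacity chunks, mirroring how the rewrite
 * above fills each symlink block buffer; returns the number of chunks.
 * copy_in_chunks is a hypothetical helper, not an XFS function.
 */
static unsigned int copy_in_chunks(char *dst, const char *src, size_t len,
				   size_t chunk_space)
{
	unsigned int nchunks = 0;

	while (len > 0) {
		size_t byte_cnt = len < chunk_space ? len : chunk_space;

		memcpy(dst, src, byte_cnt);
		dst += byte_cnt;
		src += byte_cnt;
		len -= byte_cnt;
		nchunks++;
	}
	return nchunks;
}
```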
+
+/* Fix everything that fails the verifiers in the remote blocks. */
+STATIC int
+xfs_repair_symlink_fix_remotes(
+	struct xfs_scrub_context	*sc,
+	loff_t				len)
+{
+	struct xfs_bmbt_irec		mval[XFS_SYMLINK_MAPS];
+	struct xfs_buf			*bp;
+	xfs_filblks_t			fsblocks;
+	xfs_daddr_t			d;
+	loff_t				offset;
+	unsigned int			byte_cnt;
+	int				n;
+	int				nmaps = XFS_SYMLINK_MAPS;
+	int				nr;
+	int				error;
+
+	fsblocks = xfs_symlink_blocks(sc->mp, len);
+	error = xfs_bmapi_read(sc->ip, 0, fsblocks, mval, &nmaps, 0);
+	if (error)
+		return error;
+
+	offset = 0;
+	for (n = 0; n < nmaps; n++) {
+		d = XFS_FSB_TO_DADDR(sc->mp, mval[n].br_startblock);
+		byte_cnt = XFS_FSB_TO_B(sc->mp, mval[n].br_blockcount);
+
+		error = xfs_trans_read_buf(sc->mp, sc->tp, sc->mp->m_ddev_targp,
+				d, BTOBB(byte_cnt), 0, &bp, NULL);
+		if (error)
+			return error;
+		bp->b_ops = &xfs_symlink_buf_ops;
+
+		byte_cnt = XFS_SYMLINK_BUF_SPACE(sc->mp, byte_cnt);
+		if (len < byte_cnt)
+			byte_cnt = len;
+
+		nr = xfs_symlink_hdr_set(sc->mp, sc->ip->i_ino, offset,
+				byte_cnt, bp);
+
+		len -= byte_cnt;
+		offset += byte_cnt;
+
+		xfs_trans_buf_set_type(sc->tp, bp, XFS_BLFT_SYMLINK_BUF);
+		xfs_trans_log_buf(sc->tp, bp, 0, nr - 1);
+		xfs_trans_brelse(sc->tp, bp);
+	}
+	if (len != 0)
+		return -EFSCORRUPTED;
+
+	return 0;
+}
+
+int
+xfs_repair_symlink(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_inode		*ip = sc->ip;
+	struct xfs_ifork		*ifp;
+	loff_t				len;
+	size_t				newlen;
+	int				error = 0;
+
+	ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+	len = i_size_read(VFS_I(ip));
+	xfs_trans_ijoin(sc->tp, ip, 0);
+
+	/* Truncate the symlink if the target contains an embedded NUL. */
+	if (ifp->if_flags & XFS_IFINLINE) {
+		if (ifp->if_u1.if_data)
+			newlen = strnlen(ifp->if_u1.if_data,
+					XFS_IFORK_DSIZE(ip));
+		else {
+			/* A zero-length symlink is rewritten to "/". */
+			ifp->if_u1.if_data = kmem_alloc(4, KM_SLEEP);
+			snprintf(ifp->if_u1.if_data, 4, "/");
+			newlen = 1;
+		}
+		if (len > newlen) {
+			i_size_write(VFS_I(ip), newlen);
+			ip->i_d.di_size = newlen;
+			xfs_trans_log_inode(sc->tp, ip, XFS_ILOG_DDATA |
+					XFS_ILOG_CORE);
+		}
+		goto out;
+	}
+
+	error = xfs_repair_symlink_fix_remotes(sc, len);
+	if (error)
+		goto out;
+
+	/* Roll transaction, release buffers. */
+	error = xfs_trans_roll_inode(&sc->tp, ip);
+	if (error)
+		goto out;
+
+	/* Re-read the target to check that the size is set correctly. */
+	len = i_size_read(VFS_I(ip));
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	error = xfs_readlink(ip, sc->buf);
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	if (error)
+		goto out;
+
+	/*
+	 * Figure out the new target length.  We can't handle zero-length
+	 * symlinks, so make sure that we don't write that out.
+	 */
+	newlen = strnlen(sc->buf, XFS_SYMLINK_MAXLEN);
+	if (newlen == 0) {
+		*((char *)sc->buf) = '/';
+		newlen = 1;
+	}
+
+	if (len > newlen)
+		error = xfs_repair_symlink_rewrite(&sc->tp, ip, sc->buf,
+				newlen);
+out:
+	return error;
+}
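
xfs_repair_symlink() normalizes the recovered target in two steps: strnlen() truncates at the first embedded NUL, and a zero-length result is replaced with "/" so that a zero-length symlink is never written out. That normalization rule, extracted as a small helper (normalize_target is a hypothetical name, plain C):

```c
#include <assert.h>
#include <string.h>

/*
 * Normalize a candidate symlink target the way xfs_repair_symlink does:
 * truncate at the first NUL, and turn an empty target into "/".
 * Returns the new target length; buf must hold at least one byte.
 */
static size_t normalize_target(char *buf, size_t maxlen)
{
	size_t newlen = strnlen(buf, maxlen);

	if (newlen == 0) {
		buf[0] = '/';
		newlen = 1;
	}
	return newlen;
}
```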



* [PATCH 11/14] xfs: repair extended attributes
  2018-05-30 19:30 [PATCH v15.2 00/14] xfs-4.18: online repair support Darrick J. Wong
                   ` (9 preceding siblings ...)
  2018-05-30 19:31 ` [PATCH 10/14] xfs: repair damaged symlinks Darrick J. Wong
@ 2018-05-30 19:31 ` Darrick J. Wong
  2018-05-30 19:31 ` [PATCH 12/14] xfs: scrub should set preen if attr leaf has holes Darrick J. Wong
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2018-05-30 19:31 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

If the extended attributes look bad, try to sift through the rubble to
find whatever keys/values we can, zap the attr tree, and re-add the
values.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile            |    1 
 fs/xfs/scrub/attr.c        |    2 
 fs/xfs/scrub/attr_repair.c |  519 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.h      |    2 
 fs/xfs/scrub/scrub.c       |    2 
 fs/xfs/scrub/scrub.h       |    3 
 6 files changed, 527 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/attr_repair.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 5e336892f21f..5bc7e2deacbd 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -175,6 +175,7 @@ xfs-$(CONFIG_XFS_QUOTA)		+= scrub/quota.o
 ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y)
 xfs-y				+= $(addprefix scrub/, \
 				   agheader_repair.o \
+				   attr_repair.o \
 				   alloc_repair.o \
 				   bmap_repair.o \
 				   ialloc_repair.o \
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 84b6d6b66578..ac25d624286e 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -139,7 +139,7 @@ xfs_scrub_xattr_listent(
  * Within a char, the lowest bit of the char represents the byte with
  * the smallest address
  */
-STATIC bool
+bool
 xfs_scrub_xattr_set_map(
 	struct xfs_scrub_context	*sc,
 	unsigned long			*map,
diff --git a/fs/xfs/scrub/attr_repair.c b/fs/xfs/scrub/attr_repair.c
new file mode 100644
index 000000000000..c7a50fd8f0f5
--- /dev/null
+++ b/fs/xfs/scrub/attr_repair.c
@@ -0,0 +1,519 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2.h"
+#include "xfs_attr.h"
+#include "xfs_attr_leaf.h"
+#include "xfs_attr_sf.h"
+#include "xfs_attr_remote.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+
+/* Extended attribute repair. */
+
+struct xfs_attr_key {
+	struct list_head		list;
+	unsigned char			*value;
+	int				valuelen;
+	int				flags;
+	int				namelen;
+	unsigned char			name[0];
+};
+
+#define XFS_ATTR_KEY_LEN(namelen) (sizeof(struct xfs_attr_key) + (namelen) + 1)
+
+struct xfs_repair_xattr {
+	struct list_head		attrlist;
+	struct xfs_scrub_context	*sc;
+};
+
+/* Iterate each block in an attr fork extent */
+#define for_each_xfs_attr_block(mp, irec, dabno) \
+	for ((dabno) = roundup((xfs_dablk_t)(irec)->br_startoff, \
+			(mp)->m_attr_geo->fsbcount); \
+	     (dabno) < (irec)->br_startoff + (irec)->br_blockcount; \
+	     (dabno) += (mp)->m_attr_geo->fsbcount)
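
for_each_xfs_attr_block() rounds the extent's start offset up to an attr-geometry block boundary and then steps one geometry block at a time until it passes the end of the extent. The loop bounds can be sketched with plain integers; fsbcount here is a hypothetical stand-in for mp->m_attr_geo->fsbcount:

```c
#include <assert.h>

/* Round v up to a multiple of step (step > 0), like the kernel's roundup(). */
static unsigned int round_up_to(unsigned int v, unsigned int step)
{
	return ((v + step - 1) / step) * step;
}

/*
 * Count the aligned attr blocks inside [start, start + count), using the
 * same bounds as the for_each_xfs_attr_block loop above.
 */
static unsigned int count_attr_blocks(unsigned int start, unsigned int count,
				      unsigned int fsbcount)
{
	unsigned int dabno, n = 0;

	for (dabno = round_up_to(start, fsbcount);
	     dabno < start + count;
	     dabno += fsbcount)
		n++;
	return n;
}
```

Rounding up at the start means a mapping that begins partway through a geometry block contributes no blocks until the next aligned boundary, matching the macro's behavior.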
+
+/*
+ * Record an extended attribute key & value for later reinsertion into the
+ * inode.  Use the helpers below, don't call this directly.
+ */
+STATIC int
+__xfs_repair_xattr_salvage_attr(
+	struct xfs_repair_xattr		*rx,
+	struct xfs_buf			*bp,
+	int				flags,
+	int				idx,
+	unsigned char			*name,
+	int				namelen,
+	unsigned char			*value,
+	int				valuelen)
+{
+	struct xfs_attr_key		*key;
+	struct xfs_da_args		args;
+	int				error = -ENOMEM;
+
+	/* Ignore incomplete or oversized attributes. */
+	if ((flags & XFS_ATTR_INCOMPLETE) ||
+	    namelen > XATTR_NAME_MAX || namelen < 0 ||
+	    valuelen > XATTR_SIZE_MAX || valuelen < 0)
+		return 0;
+
+	/* Store attr key. */
+	key = kmem_alloc(XFS_ATTR_KEY_LEN(namelen), KM_MAYFAIL);
+	if (!key)
+		goto err;
+	INIT_LIST_HEAD(&key->list);
+	key->value = kmem_zalloc_large(valuelen, KM_MAYFAIL);
+	if (!key->value)
+		goto err_key;
+	key->valuelen = valuelen;
+	key->flags = flags & (ATTR_ROOT | ATTR_SECURE);
+	key->namelen = namelen;
+	key->name[namelen] = 0;
+	memcpy(key->name, name, namelen);
+
+	/* Caller already had the value, so copy it and exit. */
+	if (value) {
+		memcpy(key->value, value, valuelen);
+		goto out_ok;
+	}
+
+	/* Otherwise look up the remote value directly. */
+	memset(&args, 0, sizeof(args));
+	args.geo = rx->sc->mp->m_attr_geo;
+	args.index = idx;
+	args.namelen = namelen;
+	args.name = key->name;
+	args.valuelen = valuelen;
+	args.value = key->value;
+	args.dp = rx->sc->ip;
+	args.trans = rx->sc->tp;
+	error = xfs_attr3_leaf_getvalue(bp, &args);
+	if (error || args.rmtblkno == 0)
+		goto err_value;
+
+	error = xfs_attr_rmtval_get(&args);
+	switch (error) {
+	case 0:
+		break;
+	case -EFSBADCRC:
+	case -EFSCORRUPTED:
+		error = 0;
+		/* fall through */
+	default:
+		goto err_value;
+	}
+
+out_ok:
+	list_add_tail(&key->list, &rx->attrlist);
+	return 0;
+
+err_value:
+	kmem_free(key->value);
+err_key:
+	kmem_free(key);
+err:
+	return error;
+}
+
+/*
+ * Record a local format extended attribute key & value for later reinsertion
+ * into the inode.
+ */
+static inline int
+xfs_repair_xattr_salvage_local_attr(
+	struct xfs_repair_xattr		*rx,
+	int				flags,
+	unsigned char			*name,
+	int				namelen,
+	unsigned char			*value,
+	int				valuelen)
+{
+	return __xfs_repair_xattr_salvage_attr(rx, NULL, flags, 0, name,
+			namelen, value, valuelen);
+}
+
+/*
+ * Record a remote format extended attribute key & value for later reinsertion
+ * into the inode.
+ */
+static inline int
+xfs_repair_xattr_salvage_remote_attr(
+	struct xfs_repair_xattr		*rx,
+	int				flags,
+	unsigned char			*name,
+	int				namelen,
+	struct xfs_buf			*leaf_bp,
+	int				idx,
+	int				valuelen)
+{
+	return __xfs_repair_xattr_salvage_attr(rx, leaf_bp, flags, idx,
+			name, namelen, NULL, valuelen);
+}
+
+/* Extract every xattr key that we can from this attr fork block. */
+STATIC int
+xfs_repair_xattr_recover_leaf(
+	struct xfs_repair_xattr		*rx,
+	struct xfs_buf			*bp)
+{
+	struct xfs_attr3_icleaf_hdr	leafhdr;
+	struct xfs_scrub_context	*sc = rx->sc;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_attr_leafblock	*leaf;
+	unsigned long			*usedmap = sc->buf;
+	struct xfs_attr_leaf_name_local	*lentry;
+	struct xfs_attr_leaf_name_remote *rentry;
+	struct xfs_attr_leaf_entry	*ent;
+	struct xfs_attr_leaf_entry	*entries;
+	char				*buf_end;
+	char				*name;
+	char				*name_end;
+	char				*value;
+	size_t				off;
+	unsigned int			nameidx;
+	unsigned int			namesize;
+	unsigned int			hdrsize;
+	unsigned int			namelen;
+	unsigned int			valuelen;
+	int				i;
+	int				error;
+
+	bitmap_zero(usedmap, mp->m_attr_geo->blksize);
+
+	/* Check the leaf header */
+	leaf = bp->b_addr;
+	xfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
+	hdrsize = xfs_attr3_leaf_hdr_size(leaf);
+	xfs_scrub_xattr_set_map(sc, usedmap, 0, hdrsize);
+	entries = xfs_attr3_leaf_entryp(leaf);
+
+	buf_end = (char *)bp->b_addr + mp->m_attr_geo->blksize;
+	for (i = 0, ent = entries; i < leafhdr.count; ent++, i++) {
+		/* Skip this entry if it overlaps something we already saw. */
+		off = (char *)ent - (char *)leaf;
+		if (!xfs_scrub_xattr_set_map(sc, usedmap, off,
+				sizeof(xfs_attr_leaf_entry_t)))
+			continue;
+
+		/* Check the name information. */
+		nameidx = be16_to_cpu(ent->nameidx);
+		if (nameidx < leafhdr.firstused ||
+		    nameidx >= mp->m_attr_geo->blksize)
+			continue;
+
+		if (ent->flags & XFS_ATTR_LOCAL) {
+			lentry = xfs_attr3_leaf_name_local(leaf, i);
+			namesize = xfs_attr_leaf_entsize_local(lentry->namelen,
+					be16_to_cpu(lentry->valuelen));
+			name_end = (char *)lentry + namesize;
+			if (lentry->namelen == 0)
+				continue;
+			name = lentry->nameval;
+			namelen = lentry->namelen;
+			valuelen = be16_to_cpu(lentry->valuelen);
+			value = &name[namelen];
+		} else {
+			rentry = xfs_attr3_leaf_name_remote(leaf, i);
+			namesize = xfs_attr_leaf_entsize_remote(rentry->namelen);
+			name_end = (char *)rentry + namesize;
+			if (rentry->namelen == 0 || rentry->valueblk == 0)
+				continue;
+			name = rentry->name;
+			namelen = rentry->namelen;
+			valuelen = be32_to_cpu(rentry->valuelen);
+			value = NULL;
+		}
+		if (name_end > buf_end)
+			continue;
+		if (!xfs_scrub_xattr_set_map(sc, usedmap, nameidx, namesize))
+			continue;
+
+		/* Ok, let's save this key/value. */
+		if (ent->flags & XFS_ATTR_LOCAL)
+			error = xfs_repair_xattr_salvage_local_attr(rx,
+				ent->flags, name, namelen, value, valuelen);
+		else
+			error = xfs_repair_xattr_salvage_remote_attr(rx,
+				ent->flags, name, namelen, bp, i, valuelen);
+		if (error)
+			return error;
+	}
+
+	return 0;
+}
+
+/* Try to recover shortform attrs. */
+STATIC int
+xfs_repair_xattr_recover_sf(
+	struct xfs_repair_xattr		*rx)
+{
+	struct xfs_attr_shortform	*sf;
+	struct xfs_attr_sf_entry	*sfe;
+	struct xfs_attr_sf_entry	*next;
+	struct xfs_ifork		*ifp;
+	unsigned char			*end;
+	int				i;
+	int				error;
+
+	ifp = XFS_IFORK_PTR(rx->sc->ip, XFS_ATTR_FORK);
+	sf = (struct xfs_attr_shortform *)rx->sc->ip->i_afp->if_u1.if_data;
+	end = (unsigned char *)ifp->if_u1.if_data + ifp->if_bytes;
+
+	for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) {
+		next = XFS_ATTR_SF_NEXTENTRY(sfe);
+		if ((unsigned char *)next > end)
+			break;
+
+		/* Ok, let's save this key/value. */
+		error = xfs_repair_xattr_salvage_local_attr(rx, sfe->flags,
+				sfe->nameval, sfe->namelen,
+				&sfe->nameval[sfe->namelen], sfe->valuelen);
+		if (error)
+			return error;
+
+		sfe = next;
+	}
+
+	return 0;
+}
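
xfs_repair_xattr_recover_sf() above walks variable-length shortform entries, computing each entry's end (XFS_ATTR_SF_NEXTENTRY) before touching it and bailing out as soon as the next entry would cross the end of the fork data. The same overrun-safe walk, sketched over a hypothetical record format of a one-byte length followed by that many payload bytes:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count intact records in buf[0..buflen), where each record is a
 * hypothetical one-byte length followed by that many payload bytes.
 * Like the shortform walk above, compute the end of each record first
 * and stop as soon as it would cross the end of the buffer.
 */
static unsigned int count_records(const unsigned char *buf, size_t buflen)
{
	const unsigned char *p = buf;
	const unsigned char *end = buf + buflen;
	unsigned int n = 0;

	while (p < end) {
		const unsigned char *next = p + 1 + p[0];

		if (next > end)		/* truncated record: bail out */
			break;
		n++;
		p = next;
	}
	return n;
}
```

Checking the computed end against the buffer bound before advancing is what lets the salvage code harvest the intact leading records from a corrupt fork without reading past it.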
+
+/* Extract as many attribute keys and values as we can. */
+STATIC int
+xfs_repair_xattr_recover(
+	struct xfs_repair_xattr		*rx)
+{
+	struct xfs_iext_cursor		icur;
+	struct xfs_bmbt_irec		got;
+	struct xfs_scrub_context	*sc = rx->sc;
+	struct xfs_ifork		*ifp;
+	struct xfs_da_blkinfo		*info;
+	struct xfs_buf			*bp;
+	xfs_dablk_t			dabno;
+	int				error = 0;
+
+	if (sc->ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL)
+		return xfs_repair_xattr_recover_sf(rx);
+
+	/* Iterate each attr block in the attr fork. */
+	ifp = XFS_IFORK_PTR(sc->ip, XFS_ATTR_FORK);
+	for_each_xfs_iext(ifp, &icur, &got) {
+		for_each_xfs_attr_block(sc->mp, &got, dabno) {
+			/*
+			 * Try to read the buffer.  We invalidate these buffers
+			 * in the next step anyway, so don't bother setting a
+			 * buffer type or ops.
+			 */
+			error = xfs_da_read_buf(sc->tp, sc->ip, dabno, -1, &bp,
+					XFS_ATTR_FORK, NULL);
+			if (error || !bp)
+				continue;
+
+			/* Screen out non-leaves & other garbage. */
+			info = bp->b_addr;
+			if (info->magic != cpu_to_be16(XFS_ATTR3_LEAF_MAGIC) ||
+			    xfs_attr3_leaf_buf_ops.verify_struct(bp) != NULL)
+				continue;
+
+			error = xfs_repair_xattr_recover_leaf(rx, bp);
+			if (error)
+				return error;
+		}
+	}
+
+	return 0;
+}
+
+/* Free all the attribute fork blocks and delete the fork. */
+STATIC int
+xfs_repair_xattr_zap(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_iext_cursor		icur;
+	struct xfs_bmbt_irec		got;
+	struct xfs_ifork		*ifp;
+	struct xfs_buf			*bp;
+	xfs_fileoff_t			lblk;
+	int				error;
+
+	xfs_trans_ijoin(sc->tp, sc->ip, 0);
+
+	if (sc->ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL)
+		goto out_fork_remove;
+
+	/* Invalidate each attr block in the attr fork. */
+	ifp = XFS_IFORK_PTR(sc->ip, XFS_ATTR_FORK);
+	for_each_xfs_iext(ifp, &icur, &got) {
+		for_each_xfs_attr_block(sc->mp, &got, lblk) {
+			error = xfs_da_get_buf(sc->tp, sc->ip, lblk, -1, &bp,
+					XFS_ATTR_FORK);
+			if (error || !bp)
+				continue;
+			xfs_trans_binval(sc->tp, bp);
+			error = xfs_trans_roll_inode(&sc->tp, sc->ip);
+			if (error)
+				return error;
+		}
+	}
+
+	error = xfs_itruncate_extents(&sc->tp, sc->ip, XFS_ATTR_FORK, 0);
+	if (error)
+		return error;
+
+out_fork_remove:
+	/* Reset the attribute fork - this also destroys the in-core fork */
+	xfs_attr_fork_remove(sc->ip, sc->tp);
+	return 0;
+}
+
+/*
+ * Compare two xattr keys.  ATTR_SECURE keys come before ATTR_ROOT and
+ * ATTR_ROOT keys come before user attrs.  Otherwise sort in hash order.
+ */
+static int
+xfs_repair_xattr_key_cmp(
+	void			*priv,
+	struct list_head	*a,
+	struct list_head	*b)
+{
+	struct xfs_attr_key	*ap;
+	struct xfs_attr_key	*bp;
+	uint			ahash, bhash;
+
+	ap = container_of(a, struct xfs_attr_key, list);
+	bp = container_of(b, struct xfs_attr_key, list);
+
+	if (ap->flags > bp->flags)
+		return 1;
+	else if (ap->flags < bp->flags)
+		return -1;
+
+	ahash = xfs_da_hashname(ap->name, ap->namelen);
+	bhash = xfs_da_hashname(bp->name, bp->namelen);
+	if (ahash > bhash)
+		return 1;
+	else if (ahash < bhash)
+		return -1;
+	return 0;
+}
+
+/* Repair the extended attribute metadata. */
+int
+xfs_repair_xattr(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_repair_xattr		rx;
+	struct xfs_attr_key		*key, *next;
+	struct xfs_ifork		*ifp;
+	int				error;
+
+	if (!xfs_inode_hasattr(sc->ip))
+		return -ENOENT;
+	error = xfs_repair_ino_dqattach(sc);
+	if (error)
+		return error;
+
+	/* Extent map should be loaded. */
+	ifp = XFS_IFORK_PTR(sc->ip, XFS_ATTR_FORK);
+	if (XFS_IFORK_FORMAT(sc->ip, XFS_ATTR_FORK) != XFS_DINODE_FMT_LOCAL &&
+	    !(ifp->if_flags & XFS_IFEXTENTS)) {
+		error = xfs_iread_extents(sc->tp, sc->ip, XFS_ATTR_FORK);
+		if (error)
+			return error;
+	}
+
+	memset(&rx, 0, sizeof(rx));
+	rx.sc = sc;
+	INIT_LIST_HEAD(&rx.attrlist);
+
+	/* Read every attr key and value and record them in memory. */
+	error = xfs_repair_xattr_recover(&rx);
+	if (error)
+		return error;
+
+	/* Reinsert the security and root attrs first. */
+	list_sort(NULL, &rx.attrlist, xfs_repair_xattr_key_cmp);
+
+	/*
+	 * Invalidate and truncate the attribute fork extents, commit the
+	 * repair transaction, and drop the ilock.  The attribute setting code
+	 * needs to be able to allocate special transactions and take the
+	 * ilock on its own.  This means that we can't 100% prevent other
+	 * programs from accessing the inode while we're rebuilding the
+	 * attributes.
+	 */
+	error = xfs_repair_xattr_zap(sc);
+	if (error)
+		goto out_attrs;
+	error = xfs_trans_commit(sc->tp);
+	sc->tp = NULL;
+	if (error)
+		goto out_attrs;
+	xfs_iunlock(sc->ip, XFS_ILOCK_EXCL);
+	sc->ilock_flags &= ~XFS_ILOCK_EXCL;
+
+	/* Re-add every attr to the file. */
+	list_for_each_entry_safe(key, next, &rx.attrlist, list) {
+		error = xfs_attr_set(sc->ip, key->name, key->value,
+				key->valuelen, key->flags);
+		if (error)
+			goto out_attrs;
+
+		/*
+		 * If the attr value is larger than a single page, free the
+		 * key now so that we aren't hogging memory while doing a lot
+		 * of metadata updates.  Otherwise, we want to spend as little
+		 * time reconstructing the attrs as we possibly can.
+		 */
+		if (key->valuelen <= PAGE_SIZE)
+			continue;
+		list_del(&key->list);
+		kmem_free(key->value);
+		kmem_free(key);
+	}
+
+out_attrs:
+	/* Free attribute list. */
+	list_for_each_entry_safe(key, next, &rx.attrlist, list) {
+		list_del(&key->list);
+		kmem_free(key->value);
+		kmem_free(key);
+	}
+
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 9897649b659f..393adfdc255e 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -124,6 +124,7 @@ int xfs_repair_inode(struct xfs_scrub_context *sc);
 int xfs_repair_bmap_data(struct xfs_scrub_context *sc);
 int xfs_repair_bmap_attr(struct xfs_scrub_context *sc);
 int xfs_repair_symlink(struct xfs_scrub_context *sc);
+int xfs_repair_xattr(struct xfs_scrub_context *sc);
 
 #else
 
@@ -178,6 +179,7 @@ static inline int xfs_repair_rmapbt_setup(
 #define xfs_repair_bmap_data		xfs_repair_notsupported
 #define xfs_repair_bmap_attr		xfs_repair_notsupported
 #define xfs_repair_symlink		xfs_repair_notsupported
+#define xfs_repair_xattr		xfs_repair_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 462c44ca3080..a306a31f46cc 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -317,7 +317,7 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.type	= ST_INODE,
 		.setup	= xfs_scrub_setup_xattr,
 		.scrub	= xfs_scrub_xattr,
-		.repair	= xfs_repair_notsupported,
+		.repair	= xfs_repair_xattr,
 	},
 	[XFS_SCRUB_TYPE_SYMLINK] = {	/* symbolic link */
 		.type	= ST_INODE,
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index d4141b336491..50c6d7917f5f 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -156,4 +156,7 @@ void xfs_scrub_xref_is_used_rt_space(struct xfs_scrub_context *sc,
 # define xfs_scrub_xref_is_used_rt_space(sc, rtbno, len) do { } while (0)
 #endif
 
+bool xfs_scrub_xattr_set_map(struct xfs_scrub_context *sc, unsigned long *map,
+		unsigned int start, unsigned int len);
+
 #endif	/* __XFS_SCRUB_SCRUB_H__ */


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 12/14] xfs: scrub should set preen if attr leaf has holes
  2018-05-30 19:30 [PATCH v15.2 00/14] xfs-4.18: online repair support Darrick J. Wong
                   ` (10 preceding siblings ...)
  2018-05-30 19:31 ` [PATCH 11/14] xfs: repair extended attributes Darrick J. Wong
@ 2018-05-30 19:31 ` Darrick J. Wong
  2018-05-30 19:32 ` [PATCH 13/14] xfs: repair quotas Darrick J. Wong
  2018-05-30 19:32 ` [PATCH 14/14] xfs: implement live quotacheck as part of quota repair Darrick J. Wong
  13 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2018-05-30 19:31 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

If an attr block indicates that it could use compaction, set the preen
flag to have the attr fork rebuilt, since the attr fork rebuilder can
take care of that for us.
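
For illustration only, the preen-vs-corrupt decision reduces to OR-ing output
flags into the scrub state; here is a minimal userspace sketch (the flag names
and struct are invented stand-ins for the kernel's XFS_SCRUB_OFLAG_* machinery,
not the real API):

```c
#include <stdbool.h>

/* Illustrative scrub output flags, mirroring XFS_SCRUB_OFLAG_* semantics. */
#define SCRUB_OFLAG_CORRUPT	(1u << 0)	/* metadata is broken */
#define SCRUB_OFLAG_PREEN	(1u << 1)	/* works, but could be optimized */

struct scrub_state {
	unsigned int	oflags;
};

/*
 * If the attr leaf merely has holes (i.e. it could use compaction), flag it
 * for preening -- an optimization opportunity, not corruption.  Corruption
 * is reported independently.
 */
static void check_attr_leaf(struct scrub_state *st, bool holes, bool corrupt)
{
	if (corrupt)
		st->oflags |= SCRUB_OFLAG_CORRUPT;
	if (holes)
		st->oflags |= SCRUB_OFLAG_PREEN;
}
```

The point of the distinction is that userspace can schedule preen-flagged
repairs lazily, whereas corruption demands immediate attention.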

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/attr.c    |    2 ++
 fs/xfs/scrub/dabtree.c |   15 +++++++++++++++
 fs/xfs/scrub/dabtree.h |    1 +
 fs/xfs/scrub/trace.h   |    1 +
 4 files changed, 19 insertions(+)


diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index ac25d624286e..ce27357a8dd1 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -307,6 +307,8 @@ xfs_scrub_xattr_block(
 		xfs_scrub_da_set_corrupt(ds, level);
 	if (!xfs_scrub_xattr_set_map(ds->sc, usedmap, 0, hdrsize))
 		xfs_scrub_da_set_corrupt(ds, level);
+	if (leafhdr.holes)
+		xfs_scrub_da_set_preen(ds, level);
 
 	if (ds->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
 		goto out;
diff --git a/fs/xfs/scrub/dabtree.c b/fs/xfs/scrub/dabtree.c
index bffdb7dc09bf..d11364d48286 100644
--- a/fs/xfs/scrub/dabtree.c
+++ b/fs/xfs/scrub/dabtree.c
@@ -99,6 +99,21 @@ xfs_scrub_da_set_corrupt(
 			__return_address);
 }
 
+/* Flag a da btree node in need of optimization. */
+void
+xfs_scrub_da_set_preen(
+	struct xfs_scrub_da_btree	*ds,
+	int				level)
+{
+	struct xfs_scrub_context	*sc = ds->sc;
+
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_PREEN;
+	trace_xfs_scrub_fblock_preen(sc, ds->dargs.whichfork,
+			xfs_dir2_da_to_db(ds->dargs.geo,
+				ds->state->path.blk[level].blkno),
+			__return_address);
+}
+
 /* Find an entry at a certain level in a da btree. */
 STATIC void *
 xfs_scrub_da_btree_entry(
diff --git a/fs/xfs/scrub/dabtree.h b/fs/xfs/scrub/dabtree.h
index d31468d68cef..681f82faee3e 100644
--- a/fs/xfs/scrub/dabtree.h
+++ b/fs/xfs/scrub/dabtree.h
@@ -50,6 +50,7 @@ bool xfs_scrub_da_process_error(struct xfs_scrub_da_btree *ds, int level, int *e
 
 /* Check for da btree corruption. */
 void xfs_scrub_da_set_corrupt(struct xfs_scrub_da_btree *ds, int level);
+void xfs_scrub_da_set_preen(struct xfs_scrub_da_btree *ds, int level);
 
 int xfs_scrub_da_btree_hash(struct xfs_scrub_da_btree *ds, int level,
 			    __be32 *hashp);
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 794d56bb1af8..1e25cc1cf34b 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -244,6 +244,7 @@ DEFINE_EVENT(xfs_scrub_fblock_error_class, name, \
 
 DEFINE_SCRUB_FBLOCK_ERROR_EVENT(xfs_scrub_fblock_error);
 DEFINE_SCRUB_FBLOCK_ERROR_EVENT(xfs_scrub_fblock_warning);
+DEFINE_SCRUB_FBLOCK_ERROR_EVENT(xfs_scrub_fblock_preen);
 
 TRACE_EVENT(xfs_scrub_incomplete,
 	TP_PROTO(struct xfs_scrub_context *sc, void *ret_ip),



* [PATCH 13/14] xfs: repair quotas
  2018-05-30 19:30 [PATCH v15.2 00/14] xfs-4.18: online repair support Darrick J. Wong
                   ` (11 preceding siblings ...)
  2018-05-30 19:31 ` [PATCH 12/14] xfs: scrub should set preen if attr leaf has holes Darrick J. Wong
@ 2018-05-30 19:32 ` Darrick J. Wong
  2018-05-30 19:32 ` [PATCH 14/14] xfs: implement live quotacheck as part of quota repair Darrick J. Wong
  13 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2018-05-30 19:32 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Fix anything that causes the quota verifiers to fail.
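
The core of the verifier-directed fixups below is clamping: soft limits may
not exceed hard limits, and resource counts may not exceed what the filesystem
can physically hold.  A hedged userspace sketch of that rule (the struct is a
simplified stand-in for xfs_disk_dquot, ignoring on-disk endianness):

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified in-memory dquot; the real xfs_disk_dquot is big-endian on disk. */
struct dquot_fields {
	uint64_t	blk_soft;	/* soft block limit */
	uint64_t	blk_hard;	/* hard block limit */
	uint64_t	bcount;		/* blocks in use */
};

/* Clamp nonsense values the way the repair code does; returns true if dirty. */
static bool repair_dquot(struct dquot_fields *d, uint64_t fs_blocks)
{
	bool	dirty = false;

	if (d->blk_soft > d->blk_hard) {	/* soft limit above hard limit */
		d->blk_soft = d->blk_hard;
		dirty = true;
	}
	if (d->bcount > fs_blocks) {		/* usage beyond physical space */
		d->bcount = fs_blocks;
		dirty = true;	/* real code also schedules a quotacheck */
	}
	return dirty;
}
```

In the kernel the dirty flag additionally triggers logging the dquot and, for
count fixups, forcing a quotacheck, since the clamped value is only an upper
bound on the true usage.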

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile             |    1 
 fs/xfs/scrub/attr_repair.c  |    2 
 fs/xfs/scrub/common.h       |    8 +
 fs/xfs/scrub/quota.c        |    2 
 fs/xfs/scrub/quota_repair.c |  355 +++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.c       |   58 +++++++
 fs/xfs/scrub/repair.h       |    8 +
 fs/xfs/scrub/scrub.c        |   11 +
 fs/xfs/scrub/scrub.h        |    1 
 9 files changed, 438 insertions(+), 8 deletions(-)
 create mode 100644 fs/xfs/scrub/quota_repair.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 5bc7e2deacbd..0018ba84944d 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -185,5 +185,6 @@ xfs-y				+= $(addprefix scrub/, \
 				   rmap_repair.o \
 				   symlink_repair.o \
 				   )
+xfs-$(CONFIG_XFS_QUOTA)		+= scrub/quota_repair.o
 endif
 endif
diff --git a/fs/xfs/scrub/attr_repair.c b/fs/xfs/scrub/attr_repair.c
index c7a50fd8f0f5..d66855860b7f 100644
--- a/fs/xfs/scrub/attr_repair.c
+++ b/fs/xfs/scrub/attr_repair.c
@@ -360,7 +360,7 @@ xfs_repair_xattr_recover(
 }
 
 /* Free all the attribute fork blocks and delete the fork. */
-STATIC int
+int
 xfs_repair_xattr_zap(
 	struct xfs_scrub_context	*sc)
 {
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 76bb2d1d808c..235c91065ad5 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -152,6 +152,14 @@ static inline bool xfs_scrub_skip_xref(struct xfs_scrub_metadata *sm)
 			       XFS_SCRUB_OFLAG_XCORRUPT);
 }
 
+/* Do we need to invoke the repair tool? */
+static inline bool xfs_scrub_needs_repair(struct xfs_scrub_metadata *sm)
+{
+	return sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
+			       XFS_SCRUB_OFLAG_XCORRUPT |
+			       XFS_SCRUB_OFLAG_PREEN);
+}
+
 int xfs_scrub_metadata_inode_forks(struct xfs_scrub_context *sc);
 int xfs_scrub_ilock_inverted(struct xfs_inode *ip, uint lock_mode);
 
diff --git a/fs/xfs/scrub/quota.c b/fs/xfs/scrub/quota.c
index 15ae4d23d6ac..64776257fd88 100644
--- a/fs/xfs/scrub/quota.c
+++ b/fs/xfs/scrub/quota.c
@@ -43,7 +43,7 @@
 #include "scrub/trace.h"
 
 /* Convert a scrub type code to a DQ flag, or return 0 if error. */
-static inline uint
+uint
 xfs_scrub_quota_to_dqtype(
 	struct xfs_scrub_context	*sc)
 {
diff --git a/fs/xfs/scrub/quota_repair.c b/fs/xfs/scrub/quota_repair.c
new file mode 100644
index 000000000000..68b7082af30a
--- /dev/null
+++ b/fs/xfs/scrub/quota_repair.c
@@ -0,0 +1,355 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_inode_fork.h"
+#include "xfs_alloc.h"
+#include "xfs_bmap.h"
+#include "xfs_quota.h"
+#include "xfs_qm.h"
+#include "xfs_dquot.h"
+#include "xfs_dquot_item.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+
+/* Quota repair. */
+
+struct xfs_repair_quota_info {
+	struct xfs_scrub_context	*sc;
+	bool				need_quotacheck;
+};
+
+/* Repair the fields in an individual quota item. */
+STATIC int
+xfs_repair_quota_item(
+	struct xfs_dquot		*dq,
+	uint				dqtype,
+	void				*priv)
+{
+	struct xfs_repair_quota_info	*rqi = priv;
+	struct xfs_scrub_context	*sc = rqi->sc;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_disk_dquot		*d = &dq->q_core;
+	unsigned long long		bsoft;
+	unsigned long long		isoft;
+	unsigned long long		rsoft;
+	unsigned long long		bhard;
+	unsigned long long		ihard;
+	unsigned long long		rhard;
+	unsigned long long		bcount;
+	unsigned long long		icount;
+	unsigned long long		rcount;
+	xfs_ino_t			fs_icount;
+	bool				dirty = false;
+	int				error;
+
+	/* Did we get the dquot type we wanted? */
+	if (dqtype != (d->d_flags & XFS_DQ_ALLTYPES)) {
+		d->d_flags = dqtype;
+		dirty = true;
+	}
+
+	if (d->d_pad0 || d->d_pad) {
+		d->d_pad0 = 0;
+		d->d_pad = 0;
+		dirty = true;
+	}
+
+	/* Check the limits. */
+	bhard = be64_to_cpu(d->d_blk_hardlimit);
+	ihard = be64_to_cpu(d->d_ino_hardlimit);
+	rhard = be64_to_cpu(d->d_rtb_hardlimit);
+
+	bsoft = be64_to_cpu(d->d_blk_softlimit);
+	isoft = be64_to_cpu(d->d_ino_softlimit);
+	rsoft = be64_to_cpu(d->d_rtb_softlimit);
+
+	if (bsoft > bhard) {
+		d->d_blk_softlimit = d->d_blk_hardlimit;
+		dirty = true;
+	}
+
+	if (isoft > ihard) {
+		d->d_ino_softlimit = d->d_ino_hardlimit;
+		dirty = true;
+	}
+
+	if (rsoft > rhard) {
+		d->d_rtb_softlimit = d->d_rtb_hardlimit;
+		dirty = true;
+	}
+
+	/* Check the resource counts. */
+	bcount = be64_to_cpu(d->d_bcount);
+	icount = be64_to_cpu(d->d_icount);
+	rcount = be64_to_cpu(d->d_rtbcount);
+	fs_icount = percpu_counter_sum(&mp->m_icount);
+
+	/*
+	 * Check that usage doesn't exceed physical limits.  However, on
+	 * a reflink filesystem we're allowed to exceed physical space
+	 * if there are no quota limits.  We don't know what the real number
+	 * is, but we can make quotacheck find out for us.
+	 */
+	if (!xfs_sb_version_hasreflink(&mp->m_sb) &&
+	    mp->m_sb.sb_dblocks < bcount) {
+		dq->q_res_bcount -= be64_to_cpu(dq->q_core.d_bcount);
+		dq->q_res_bcount += mp->m_sb.sb_dblocks;
+		d->d_bcount = cpu_to_be64(mp->m_sb.sb_dblocks);
+		rqi->need_quotacheck = true;
+		dirty = true;
+	}
+	if (icount > fs_icount) {
+		dq->q_res_icount -= be64_to_cpu(dq->q_core.d_icount);
+		dq->q_res_icount += fs_icount;
+		d->d_icount = cpu_to_be64(fs_icount);
+		rqi->need_quotacheck = true;
+		dirty = true;
+	}
+	if (rcount > mp->m_sb.sb_rblocks) {
+		dq->q_res_rtbcount -= be64_to_cpu(dq->q_core.d_rtbcount);
+		dq->q_res_rtbcount += mp->m_sb.sb_rblocks;
+		d->d_rtbcount = cpu_to_be64(mp->m_sb.sb_rblocks);
+		rqi->need_quotacheck = true;
+		dirty = true;
+	}
+
+	if (!dirty)
+		return 0;
+
+	dq->dq_flags |= XFS_DQ_DIRTY;
+	xfs_trans_dqjoin(sc->tp, dq);
+	xfs_trans_log_dquot(sc->tp, dq);
+	error = xfs_trans_roll(&sc->tp);
+	xfs_dqlock(dq);
+	return error;
+}
+
+/* Fix a quota timer so that we can pass the verifier. */
+STATIC void
+xfs_repair_quota_fix_timer(
+	__be64			softlimit,
+	__be64			countnow,
+	__be32			*timer,
+	time_t			timelimit)
+{
+	uint64_t		soft = be64_to_cpu(softlimit);
+	uint64_t		count = be64_to_cpu(countnow);
+
+	if (soft && count > soft && *timer == 0)
+		*timer = cpu_to_be32(get_seconds() + timelimit);
+}
+
+/* Fix anything the verifiers complain about. */
+STATIC int
+xfs_repair_quota_block(
+	struct xfs_scrub_context	*sc,
+	struct xfs_buf			*bp,
+	uint				dqtype,
+	xfs_dqid_t			id)
+{
+	struct xfs_dqblk		*d = (struct xfs_dqblk *)bp->b_addr;
+	struct xfs_disk_dquot		*ddq;
+	struct xfs_quotainfo		*qi = sc->mp->m_quotainfo;
+	enum xfs_blft			buftype = 0;
+	int				i;
+
+	bp->b_ops = &xfs_dquot_buf_ops;
+	for (i = 0; i < qi->qi_dqperchunk; i++) {
+		ddq = &d[i].dd_diskdq;
+
+		ddq->d_magic = cpu_to_be16(XFS_DQUOT_MAGIC);
+		ddq->d_version = XFS_DQUOT_VERSION;
+		ddq->d_flags = dqtype;
+		ddq->d_id = cpu_to_be32(id + i);
+
+		xfs_repair_quota_fix_timer(ddq->d_blk_softlimit,
+				ddq->d_bcount, &ddq->d_btimer,
+				qi->qi_btimelimit);
+		xfs_repair_quota_fix_timer(ddq->d_ino_softlimit,
+				ddq->d_icount, &ddq->d_itimer,
+				qi->qi_itimelimit);
+		xfs_repair_quota_fix_timer(ddq->d_rtb_softlimit,
+				ddq->d_rtbcount, &ddq->d_rtbtimer,
+				qi->qi_rtbtimelimit);
+
+		if (xfs_sb_version_hascrc(&sc->mp->m_sb)) {
+			uuid_copy(&d->dd_uuid, &sc->mp->m_sb.sb_meta_uuid);
+			xfs_update_cksum((char *)d, sizeof(struct xfs_dqblk),
+					 XFS_DQUOT_CRC_OFF);
+		} else {
+			memset(&d->dd_uuid, 0, sizeof(d->dd_uuid));
+			d->dd_lsn = 0;
+			d->dd_crc = 0;
+		}
+	}
+	switch (dqtype) {
+	case XFS_DQ_USER:
+		buftype = XFS_BLFT_UDQUOT_BUF;
+		break;
+	case XFS_DQ_GROUP:
+		buftype = XFS_BLFT_GDQUOT_BUF;
+		break;
+	case XFS_DQ_PROJ:
+		buftype = XFS_BLFT_PDQUOT_BUF;
+		break;
+	}
+	xfs_trans_buf_set_type(sc->tp, bp, buftype);
+	xfs_trans_log_buf(sc->tp, bp, 0, BBTOB(bp->b_length) - 1);
+	return xfs_trans_roll(&sc->tp);
+}
+
+/* Repair quota's data fork. */
+STATIC int
+xfs_repair_quota_data_fork(
+	struct xfs_scrub_context	*sc,
+	uint				dqtype)
+{
+	struct xfs_bmbt_irec		irec = { 0 };
+	struct xfs_iext_cursor		icur;
+	struct xfs_scrub_metadata	*real_sm = sc->sm;
+	struct xfs_quotainfo		*qi = sc->mp->m_quotainfo;
+	struct xfs_ifork		*ifp;
+	struct xfs_buf			*bp;
+	struct xfs_dqblk		*d;
+	xfs_dqid_t			id;
+	xfs_fileoff_t			max_dqid_off;
+	xfs_fileoff_t			off;
+	xfs_fsblock_t			fsbno;
+	bool				truncate = false;
+	int				error = 0;
+
+	error = xfs_repair_metadata_inode_forks(sc);
+	if (error)
+		goto out;
+
+	/* Check for data fork problems that apply only to quota files. */
+	max_dqid_off = ((xfs_dqid_t)-1) / qi->qi_dqperchunk;
+	ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK);
+	for_each_xfs_iext(ifp, &icur, &irec) {
+		if (isnullstartblock(irec.br_startblock)) {
+			error = -EFSCORRUPTED;
+			goto out;
+		}
+
+		if (irec.br_startoff > max_dqid_off ||
+		    irec.br_startoff + irec.br_blockcount - 1 > max_dqid_off) {
+			truncate = true;
+			break;
+		}
+	}
+	if (truncate) {
+		error = xfs_itruncate_extents(&sc->tp, sc->ip, XFS_DATA_FORK,
+				max_dqid_off * sc->mp->m_sb.sb_blocksize);
+		if (error)
+			goto out;
+	}
+
+	/* Now go fix anything that fails the verifiers. */
+	for_each_xfs_iext(ifp, &icur, &irec) {
+		for (fsbno = irec.br_startblock, off = irec.br_startoff;
+		     fsbno < irec.br_startblock + irec.br_blockcount;
+		     fsbno += XFS_DQUOT_CLUSTER_SIZE_FSB,
+				off += XFS_DQUOT_CLUSTER_SIZE_FSB) {
+			id = off * qi->qi_dqperchunk;
+			error = xfs_trans_read_buf(sc->mp, sc->tp,
+					sc->mp->m_ddev_targp,
+					XFS_FSB_TO_DADDR(sc->mp, fsbno),
+					qi->qi_dqchunklen,
+					0, &bp, &xfs_dquot_buf_ops);
+			if (error == 0) {
+				d = (struct xfs_dqblk *)bp->b_addr;
+				if (id == be32_to_cpu(d->dd_diskdq.d_id))
+					continue;
+				error = -EFSCORRUPTED;
+			}
+			if (error != -EFSBADCRC && error != -EFSCORRUPTED)
+				goto out;
+
+			/* Failed verifier, try again. */
+			error = xfs_trans_read_buf(sc->mp, sc->tp,
+					sc->mp->m_ddev_targp,
+					XFS_FSB_TO_DADDR(sc->mp, fsbno),
+					qi->qi_dqchunklen,
+					0, &bp, NULL);
+			if (error)
+				goto out;
+			error = xfs_repair_quota_block(sc, bp, dqtype, id);
+		}
+	}
+
+out:
+	sc->sm = real_sm;
+	return error;
+}
+
+/* Repair all of a quota type's items. */
+int
+xfs_repair_quota(
+	struct xfs_scrub_context	*sc)
+{
+	struct xfs_repair_quota_info	rqi;
+	struct xfs_mount		*mp = sc->mp;
+	uint				dqtype;
+	int				error = 0;
+
+	dqtype = xfs_scrub_quota_to_dqtype(sc);
+
+	error = xfs_repair_quota_data_fork(sc, dqtype);
+	if (error)
+		goto out;
+
+	/*
+	 * Go fix anything in the quota items that the verifiers would have
+	 * flagged.  Now that we've checked the quota inode data fork we have
+	 * to drop ILOCK_EXCL to use the regular dquot functions.
+	 */
+	xfs_iunlock(sc->ip, sc->ilock_flags);
+	sc->ilock_flags = 0;
+	rqi.sc = sc;
+	rqi.need_quotacheck = false;
+	error = xfs_qm_dqiterate(mp, dqtype, xfs_repair_quota_item, &rqi);
+	if (error)
+		goto out_relock;
+
+	/* Make a quotacheck happen. */
+	if (rqi.need_quotacheck)
+		xfs_repair_force_quotacheck(sc, dqtype);
+
+out_relock:
+	sc->ilock_flags = XFS_ILOCK_EXCL;
+	xfs_ilock(sc->ip, sc->ilock_flags);
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index 4b5d599d53b9..6143a159da88 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -45,6 +45,8 @@
 #include "xfs_quota.h"
 #include "xfs_bmap.h"
 #include "xfs_bmap_util.h"
+#include "xfs_attr.h"
+#include "xfs_reflink.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
@@ -1265,3 +1267,59 @@ xfs_repair_grab_all_ag_headers(
 
 	return error;
 }
+
+/*
+ * Repair the attr/data forks of a metadata inode.  The metadata inode must be
+ * pointed to by sc->ip and the ILOCK must be held.
+ */
+int
+xfs_repair_metadata_inode_forks(
+	struct xfs_scrub_context	*sc)
+{
+	__u32				smtype;
+	__u32				smflags;
+	int				error;
+
+	smtype = sc->sm->sm_type;
+	smflags = sc->sm->sm_flags;
+
+	/* Let's see if the forks need repair. */
+	sc->sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT;
+	error = xfs_scrub_metadata_inode_forks(sc);
+	if (error || !xfs_scrub_needs_repair(sc->sm))
+		goto out;
+
+	xfs_trans_ijoin(sc->tp, sc->ip, 0);
+
+	/* Clear the reflink flag & attr forks that we shouldn't have. */
+	if (xfs_is_reflink_inode(sc->ip)) {
+		error = xfs_reflink_clear_inode_flag(sc->ip, &sc->tp);
+		if (error)
+			goto out;
+	}
+
+	if (xfs_inode_hasattr(sc->ip)) {
+		error = xfs_repair_xattr_zap(sc);
+		if (error)
+			goto out;
+	}
+
+	/* Repair the data fork. */
+	sc->sm->sm_type = XFS_SCRUB_TYPE_BMBTD;
+	error = xfs_repair_bmap_data(sc);
+	sc->sm->sm_type = smtype;
+	if (error)
+		goto out;
+
+	/* Bail out if we still need repairs. */
+	sc->sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT;
+	error = xfs_scrub_metadata_inode_forks(sc);
+	if (error)
+		goto out;
+	if (xfs_scrub_needs_repair(sc->sm))
+		error = -EFSCORRUPTED;
+out:
+	sc->sm->sm_type = smtype;
+	sc->sm->sm_flags = smflags;
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 393adfdc255e..88be83752956 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -108,6 +108,8 @@ int xfs_repair_fs_thaw(struct xfs_scrub_context *sc);
 void xfs_repair_frozen_iput(struct xfs_scrub_context *sc, struct xfs_inode *ip);
 int xfs_repair_grab_all_ag_headers(struct xfs_scrub_context *sc);
 int xfs_repair_rmapbt_setup(struct xfs_scrub_context *sc, struct xfs_inode *ip);
+int xfs_repair_xattr_zap(struct xfs_scrub_context *sc);
+int xfs_repair_metadata_inode_forks(struct xfs_scrub_context *sc);
 
 /* Metadata repairers */
 
@@ -125,6 +127,11 @@ int xfs_repair_bmap_data(struct xfs_scrub_context *sc);
 int xfs_repair_bmap_attr(struct xfs_scrub_context *sc);
 int xfs_repair_symlink(struct xfs_scrub_context *sc);
 int xfs_repair_xattr(struct xfs_scrub_context *sc);
+#ifdef CONFIG_XFS_QUOTA
+int xfs_repair_quota(struct xfs_scrub_context *sc);
+#else
+# define xfs_repair_quota		xfs_repair_notsupported
+#endif /* CONFIG_XFS_QUOTA */
 
 #else
 
@@ -180,6 +187,7 @@ static inline int xfs_repair_rmapbt_setup(
 #define xfs_repair_bmap_attr		xfs_repair_notsupported
 #define xfs_repair_symlink		xfs_repair_notsupported
 #define xfs_repair_xattr		xfs_repair_notsupported
+#define xfs_repair_quota		xfs_repair_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index a306a31f46cc..6ca3da5ee2ca 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -349,19 +349,19 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
 		.type	= ST_FS,
 		.setup	= xfs_scrub_setup_quota,
 		.scrub	= xfs_scrub_quota,
-		.repair	= xfs_repair_notsupported,
+		.repair	= xfs_repair_quota,
 	},
 	[XFS_SCRUB_TYPE_GQUOTA] = {	/* group quota */
 		.type	= ST_FS,
 		.setup	= xfs_scrub_setup_quota,
 		.scrub	= xfs_scrub_quota,
-		.repair	= xfs_repair_notsupported,
+		.repair	= xfs_repair_quota,
 	},
 	[XFS_SCRUB_TYPE_PQUOTA] = {	/* project quota */
 		.type	= ST_FS,
 		.setup	= xfs_scrub_setup_quota,
 		.scrub	= xfs_scrub_quota,
-		.repair	= xfs_repair_notsupported,
+		.repair	= xfs_repair_quota,
 	},
 };
 
@@ -557,9 +557,8 @@ xfs_scrub_metadata(
 		if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_FORCE_SCRUB_REPAIR))
 			sc.sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
 
-		needs_fix = (sc.sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
-						XFS_SCRUB_OFLAG_XCORRUPT |
-						XFS_SCRUB_OFLAG_PREEN));
+		needs_fix = xfs_scrub_needs_repair(sc.sm);
+
 		/*
 		 * If userspace asked for a repair but it wasn't necessary,
 		 * report that back to userspace.
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 50c6d7917f5f..08f10ab36e6b 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -158,5 +158,6 @@ void xfs_scrub_xref_is_used_rt_space(struct xfs_scrub_context *sc,
 
 bool xfs_scrub_xattr_set_map(struct xfs_scrub_context *sc, unsigned long *map,
 		unsigned int start, unsigned int len);
+uint xfs_scrub_quota_to_dqtype(struct xfs_scrub_context *sc);
 
 #endif	/* __XFS_SCRUB_SCRUB_H__ */



* [PATCH 14/14] xfs: implement live quotacheck as part of quota repair
  2018-05-30 19:30 [PATCH v15.2 00/14] xfs-4.18: online repair support Darrick J. Wong
                   ` (12 preceding siblings ...)
  2018-05-30 19:32 ` [PATCH 13/14] xfs: repair quotas Darrick J. Wong
@ 2018-05-30 19:32 ` Darrick J. Wong
  13 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2018-05-30 19:32 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Use the fs freezing mechanism we developed for the rmapbt repair to
freeze the fs, this time so that we can scan every inode for a live
quotacheck.  We add a new dqget variant that uses the existing scrub
transaction to allocate an on-disk dquot block if one is missing.
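
Conceptually, quotacheck regenerates the counters from scratch: zero every
dquot, walk every inode adding its usage to its owner's dquot, then log the
results.  A toy userspace model of that accumulation (fixed-size ID table and
invented names; the kernel version walks the inode btrees under a filesystem
freeze to keep the usage stable):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_IDS	16

struct toy_dquot {
	uint64_t	icount;		/* inodes owned by this id */
	uint64_t	bcount;		/* blocks owned by this id */
};

struct toy_inode {
	uint32_t	uid;		/* owner id, < MAX_IDS */
	uint64_t	nblocks;	/* blocks mapped by this inode */
};

/* Zero the counters, then re-add every inode's usage, as quotacheck does. */
static void toy_quotacheck(struct toy_dquot *dq, size_t ndq,
			   const struct toy_inode *inodes, size_t nino)
{
	size_t	i;

	memset(dq, 0, ndq * sizeof(*dq));
	for (i = 0; i < nino; i++) {
		dq[inodes[i].uid].icount += 1;
		dq[inodes[i].uid].bcount += inodes[i].nblocks;
	}
}
```

The freeze is what makes the two passes (zero, then accumulate) safe: no one
can chown or allocate blocks between them, so the regenerated counts cannot go
stale mid-scan.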

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/quota.c        |   20 +++
 fs/xfs/scrub/quota_repair.c |  286 +++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_dquot.c          |   59 ++++++++-
 fs/xfs/xfs_dquot.h          |    3 
 4 files changed, 360 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/scrub/quota.c b/fs/xfs/scrub/quota.c
index 64776257fd88..596c660ca155 100644
--- a/fs/xfs/scrub/quota.c
+++ b/fs/xfs/scrub/quota.c
@@ -41,6 +41,7 @@
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/trace.h"
+#include "scrub/repair.h"
 
 /* Convert a scrub type code to a DQ flag, or return 0 if error. */
 uint
@@ -78,12 +79,29 @@ xfs_scrub_setup_quota(
 	mutex_lock(&sc->mp->m_quotainfo->qi_quotaofflock);
 	if (!xfs_this_quota_on(sc->mp, dqtype))
 		return -ENOENT;
+	/*
+	 * Freeze out anything that can alter an inode because we reconstruct
+	 * the quota counts by iterating all the inodes in the system.
+	 */
+	if ((sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) &&
+	    (sc->try_harder || XFS_QM_NEED_QUOTACHECK(sc->mp))) {
+		error = xfs_repair_fs_freeze(sc);
+		if (error)
+			return error;
+	}
 	error = xfs_scrub_setup_fs(sc, ip);
 	if (error)
 		return error;
 	sc->ip = xfs_quota_inode(sc->mp, dqtype);
-	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
 	sc->ilock_flags = XFS_ILOCK_EXCL;
+	/*
+	 * Pretend to be an ILOCK parent to shut up lockdep if we're going to
+	 * do a full inode scan of the fs.  Quota inodes do not count towards
+	 * quota accounting, so we shouldn't deadlock on ourselves.
+	 */
+	if (sc->fs_frozen)
+		sc->ilock_flags |= XFS_ILOCK_PARENT;
+	xfs_ilock(sc->ip, sc->ilock_flags);
 	return 0;
 }
 
diff --git a/fs/xfs/scrub/quota_repair.c b/fs/xfs/scrub/quota_repair.c
index 68b7082af30a..5e51dc0dcb9c 100644
--- a/fs/xfs/scrub/quota_repair.c
+++ b/fs/xfs/scrub/quota_repair.c
@@ -30,13 +30,20 @@
 #include "xfs_trans.h"
 #include "xfs_sb.h"
 #include "xfs_inode.h"
+#include "xfs_icache.h"
 #include "xfs_inode_fork.h"
 #include "xfs_alloc.h"
 #include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
+#include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
 #include "xfs_quota.h"
 #include "xfs_qm.h"
 #include "xfs_dquot.h"
 #include "xfs_dquot_item.h"
+#include "xfs_trans_space.h"
+#include "xfs_error.h"
+#include "xfs_errortag.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
@@ -314,6 +321,269 @@ xfs_repair_quota_data_fork(
 	return error;
 }
 
+/*
+ * Add this inode's resource usage to the dquot.  We adjust the in-core and
+ * the (cached) on-disk copies of the counters and leave the dquot dirty.  A
+ * subsequent pass through the dquots logs them all to disk.  Fortunately we
+ * froze the filesystem before starting so at least we don't have to deal
+ * with chown/chproj races.
+ */
+STATIC int
+xfs_repair_quotacheck_dqadjust(
+	struct xfs_scrub_context	*sc,
+	struct xfs_inode		*ip,
+	uint				type,
+	xfs_qcnt_t			nblks,
+	xfs_qcnt_t			rtblks)
+{
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_dquot		*dqp;
+	xfs_dqid_t			id;
+	int				error;
+
+	/* Try to read in the dquot. */
+	id = xfs_qm_id_for_quotatype(ip, type);
+	error = xfs_qm_dqget(mp, id, type, false, &dqp);
+	if (error == -ENOENT) {
+		/* Allocate a dquot using our special transaction. */
+		error = xfs_qm_dqget_alloc(&sc->tp, id, type, &dqp);
+		if (error)
+			return error;
+		error = xfs_trans_roll_inode(&sc->tp, sc->ip);
+	}
+	if (error) {
+		/*
+		 * Shouldn't be able to turn off quotas here.
+		 */
+		ASSERT(error != -ESRCH);
+		ASSERT(error != -ENOENT);
+		return error;
+	}
+
+	/*
+	 * Adjust the inode count and the block count to reflect this inode's
+	 * resource usage.
+	 */
+	be64_add_cpu(&dqp->q_core.d_icount, 1);
+	dqp->q_res_icount++;
+	if (nblks) {
+		be64_add_cpu(&dqp->q_core.d_bcount, nblks);
+		dqp->q_res_bcount += nblks;
+	}
+	if (rtblks) {
+		be64_add_cpu(&dqp->q_core.d_rtbcount, rtblks);
+		dqp->q_res_rtbcount += rtblks;
+	}
+
+	/*
+	 * Set default limits, adjust timers (since we changed usages)
+	 *
+	 * There are no timers for the default values set in the root dquot.
+	 */
+	if (dqp->q_core.d_id) {
+		xfs_qm_adjust_dqlimits(mp, dqp);
+		xfs_qm_adjust_dqtimers(mp, &dqp->q_core);
+	}
+
+	dqp->dq_flags |= XFS_DQ_DIRTY;
+	xfs_qm_dqput(dqp);
+	return 0;
+}
+
+/* Record this inode's quota use. */
+STATIC int
+xfs_repair_quotacheck_inode(
+	struct xfs_scrub_context	*sc,
+	uint				dqtype,
+	struct xfs_inode		*ip)
+{
+	struct xfs_ifork		*ifp;
+	xfs_filblks_t			rtblks = 0;	/* total rt blks */
+	xfs_qcnt_t			nblks;
+	int				error;
+
+	/* Count the realtime blocks. */
+	if (XFS_IS_REALTIME_INODE(ip)) {
+		ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+
+		if (!(ifp->if_flags & XFS_IFEXTENTS)) {
+			error = xfs_iread_extents(sc->tp, ip, XFS_DATA_FORK);
+			if (error)
+				return error;
+		}
+
+		xfs_bmap_count_leaves(ifp, &rtblks);
+	}
+
+	nblks = (xfs_qcnt_t)ip->i_d.di_nblocks - rtblks;
+
+	/* Adjust the dquot. */
+	return xfs_repair_quotacheck_dqadjust(sc, ip, dqtype, nblks, rtblks);
+}
+
+struct xfs_repair_quotacheck {
+	struct xfs_scrub_context	*sc;
+	uint				dqtype;
+};
+
+/* Iterate all the inodes covered by an inode btree record. */
+STATIC int
+xfs_repair_quotacheck_inobt(
+	struct xfs_btree_cur		*cur,
+	union xfs_btree_rec		*rec,
+	void				*priv)
+{
+	struct xfs_inobt_rec_incore	irec;
+	struct xfs_mount		*mp = cur->bc_mp;
+	struct xfs_inode		*ip = NULL;
+	struct xfs_repair_quotacheck	*rq = priv;
+	xfs_ino_t			ino;
+	xfs_agino_t			agino;
+	int				chunkidx;
+	int				error = 0;
+
+	xfs_inobt_btrec_to_irec(mp, rec, &irec);
+
+	for (chunkidx = 0, agino = irec.ir_startino;
+	     chunkidx < XFS_INODES_PER_CHUNK;
+	     chunkidx++, agino++) {
+		bool	inuse;
+
+		/* Skip if this inode is free */
+		if (XFS_INOBT_MASK(chunkidx) & irec.ir_free)
+			continue;
+		ino = XFS_AGINO_TO_INO(mp, cur->bc_private.a.agno, agino);
+		if (xfs_is_quota_inode(&mp->m_sb, ino))
+			continue;
+
+		/* Back off and try again if an inode is being reclaimed */
+		error = xfs_icache_inode_is_allocated(mp, NULL, ino, &inuse);
+		if (error == -EAGAIN)
+			return -EDEADLOCK;
+
+		/*
+		 * Grab inode for scanning.  We cannot use DONTCACHE here
+		 * because we already have a transaction so the iput must not
+		 * trigger inode reclaim (which might allocate a transaction
+		 * to clean up posteof blocks).
+		 */
+		error = xfs_iget(mp, NULL, ino, 0, XFS_ILOCK_EXCL, &ip);
+		if (error)
+			return error;
+
+		error = xfs_repair_quotacheck_inode(rq->sc, rq->dqtype, ip);
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
+		xfs_repair_frozen_iput(rq->sc, ip);
+		if (error)
+			return error;
+	}
+
+	return 0;
+}
+
+/* Zero a dquot prior to regenerating the counts. */
+static int
+xfs_repair_quotacheck_zero_dquot(
+	struct xfs_dquot		*dq,
+	uint				dqtype,
+	void				*priv)
+{
+	dq->q_res_bcount -= be64_to_cpu(dq->q_core.d_bcount);
+	dq->q_core.d_bcount = 0;
+	dq->q_res_icount -= be64_to_cpu(dq->q_core.d_icount);
+	dq->q_core.d_icount = 0;
+	dq->q_res_rtbcount -= be64_to_cpu(dq->q_core.d_rtbcount);
+	dq->q_core.d_rtbcount = 0;
+	dq->dq_flags |= XFS_DQ_DIRTY;
+	return 0;
+}
+
+/* Log a dirty dquot after we regenerated the counters. */
+static int
+xfs_repair_quotacheck_log_dquot(
+	struct xfs_dquot		*dq,
+	uint				dqtype,
+	void				*priv)
+{
+	struct xfs_scrub_context	*sc = priv;
+	int				error;
+
+	xfs_trans_dqjoin(sc->tp, dq);
+	xfs_trans_log_dquot(sc->tp, dq);
+	error = xfs_trans_roll(&sc->tp);
+	xfs_dqlock(dq);
+	return error;
+}
+
+/* Execute an online quotacheck. */
+STATIC int
+xfs_repair_quotacheck(
+	struct xfs_scrub_context	*sc,
+	uint				dqtype)
+{
+	struct xfs_repair_quotacheck	rq;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*bp;
+	struct xfs_btree_cur		*cur;
+	xfs_agnumber_t			ag;
+	uint				flag;
+	int				error;
+
+	/*
+	 * Commit the transaction so that we can allocate new quota ip
+	 * mappings if we have to.  If we crash after this point, the sb
+	 * still has the CHKD flags cleared, so mount quotacheck will fix
+	 * all of this up.
+	 */
+	error = xfs_trans_commit(sc->tp);
+	sc->tp = NULL;
+	if (error)
+		return error;
+
+	/* Zero all the quota items. */
+	error = xfs_qm_dqiterate(mp, dqtype, xfs_repair_quotacheck_zero_dquot,
+			sc);
+	if (error)
+		goto out;
+
+	rq.sc = sc;
+	rq.dqtype = dqtype;
+
+	/* Iterate all AGs for inodes. */
+	for (ag = 0; ag < mp->m_sb.sb_agcount; ag++) {
+		error = xfs_ialloc_read_agi(mp, NULL, ag, &bp);
+		if (error)
+			goto out;
+		cur = xfs_inobt_init_cursor(mp, NULL, bp, ag, XFS_BTNUM_INO);
+		error = xfs_btree_query_all(cur, xfs_repair_quotacheck_inobt,
+				&rq);
+		xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR :
+						  XFS_BTREE_NOERROR);
+		xfs_buf_relse(bp);
+		if (error)
+			goto out;
+	}
+
+	/* Log dquots. */
+	error = xfs_scrub_trans_alloc(sc, 0);
+	if (error)
+		goto out;
+	error = xfs_qm_dqiterate(mp, dqtype, xfs_repair_quotacheck_log_dquot,
+			sc);
+	if (error)
+		goto out;
+
+	/* Set quotachecked flag. */
+	flag = xfs_quota_chkd_flag(dqtype);
+	sc->mp->m_qflags |= flag;
+	spin_lock(&sc->mp->m_sb_lock);
+	sc->mp->m_sb.sb_qflags |= flag;
+	spin_unlock(&sc->mp->m_sb_lock);
+	xfs_log_sb(sc->tp);
+out:
+	return error;
+}
+
 /* Repair all of a quota type's items. */
 int
 xfs_repair_quota(
@@ -322,6 +592,7 @@ xfs_repair_quota(
 	struct xfs_repair_quota_info	rqi;
 	struct xfs_mount		*mp = sc->mp;
 	uint				dqtype;
+	uint				flag;
 	int				error = 0;
 
 	dqtype = xfs_scrub_quota_to_dqtype(sc);
@@ -344,9 +615,22 @@ xfs_repair_quota(
 		goto out_relock;
 
 	/* Make a quotacheck happen. */
-	if (rqi.need_quotacheck)
+	if (rqi.need_quotacheck ||
+	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_FORCE_SCRUB_REPAIR))
 		xfs_repair_force_quotacheck(sc, dqtype);
 
+	/* Do we need a quotacheck?  Did we need one? */
+	flag = xfs_quota_chkd_flag(dqtype);
+	if (!(flag & sc->mp->m_qflags)) {
+		/* We need to freeze the fs before we can scan inodes. */
+		if (!sc->fs_frozen) {
+			error = -EDEADLOCK;
+			goto out_relock;
+		}
+
+		error = xfs_repair_quotacheck(sc, dqtype);
+	}
+
 out_relock:
 	sc->ilock_flags = XFS_ILOCK_EXCL;
 	xfs_ilock(sc->ip, sc->ilock_flags);
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 2567391489bd..be0e07f42b17 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -546,6 +546,7 @@ xfs_dquot_from_disk(
 static int
 xfs_qm_dqread_alloc(
 	struct xfs_mount	*mp,
+	struct xfs_trans	**tpp,
 	struct xfs_dquot	*dqp,
 	struct xfs_buf		**bpp)
 {
@@ -553,6 +554,18 @@ xfs_qm_dqread_alloc(
 	struct xfs_buf		*bp;
 	int			error;
 
+	/*
+	 * The caller passed in a transaction which we don't control, so
+	 * release the hold before passing back the buffer.
+	 */
+	if (tpp) {
+		error = xfs_dquot_disk_alloc(tpp, dqp, &bp);
+		if (error)
+			return error;
+		xfs_trans_bhold_release(*tpp, bp);
+		return 0;
+	}
+
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_dqalloc,
 			XFS_QM_DQALLOC_SPACE_RES(mp), 0, 0, &tp);
 	if (error)
@@ -588,6 +601,7 @@ xfs_qm_dqread_alloc(
 static int
 xfs_qm_dqread(
 	struct xfs_mount	*mp,
+	struct xfs_trans	**tpp,
 	xfs_dqid_t		id,
 	uint			type,
 	bool			can_alloc,
@@ -603,7 +617,7 @@ xfs_qm_dqread(
 	/* Try to read the buffer, allocating if necessary. */
 	error = xfs_dquot_disk_read(mp, dqp, &bp);
 	if (error == -ENOENT && can_alloc)
-		error = xfs_qm_dqread_alloc(mp, dqp, &bp);
+		error = xfs_qm_dqread_alloc(mp, tpp, dqp, &bp);
 	if (error)
 		goto err;
 
@@ -787,9 +801,10 @@ xfs_qm_dqget_checks(
  * Given the file system, id, and type (UDQUOT/GDQUOT), return a locked
  * dquot, doing an allocation (if requested) as needed.
  */
-int
-xfs_qm_dqget(
+static int
+__xfs_qm_dqget(
 	struct xfs_mount	*mp,
+	struct xfs_trans	**tpp,
 	xfs_dqid_t		id,
 	uint			type,
 	bool			can_alloc,
@@ -811,7 +826,7 @@ xfs_qm_dqget(
 		return 0;
 	}
 
-	error = xfs_qm_dqread(mp, id, type, can_alloc, &dqp);
+	error = xfs_qm_dqread(mp, NULL, id, type, can_alloc, &dqp);
 	if (error)
 		return error;
 
@@ -850,7 +865,39 @@ xfs_qm_dqget_uncached(
 	if (error)
 		return error;
 
-	return xfs_qm_dqread(mp, id, type, 0, dqpp);
+	return xfs_qm_dqread(mp, NULL, id, type, 0, dqpp);
+}
+
+/*
+ * Given the file system, id, and type (UDQUOT/GDQUOT), return a locked
+ * dquot, doing an allocation (if requested) as needed.
+ */
+int
+xfs_qm_dqget(
+	struct xfs_mount	*mp,
+	xfs_dqid_t		id,
+	uint			type,
+	bool			can_alloc,
+	struct xfs_dquot	**O_dqpp)
+{
+	return __xfs_qm_dqget(mp, NULL, id, type, can_alloc, O_dqpp);
+}
+
+/*
+ * Given the file system, id, and type (UDQUOT/GDQUOT) and a hole in the quota
+ * data where the on-disk dquot is supposed to live, return a locked dquot
+ * having allocated blocks with the transaction.  This is a corner case
+ * required by online repair, which already has a transaction and has to pass
+ * that into dquot_setup.
+ */
+int
+xfs_qm_dqget_alloc(
+	struct xfs_trans	**tpp,
+	xfs_dqid_t		id,
+	uint			type,
+	struct xfs_dquot	**dqpp)
+{
+	return __xfs_qm_dqget((*tpp)->t_mountp, tpp, id, type, true, dqpp);
 }
 
 /* Return the quota id for a given inode and type. */
@@ -914,7 +961,7 @@ xfs_qm_dqget_inode(
 	 * we re-acquire the lock.
 	 */
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	error = xfs_qm_dqread(mp, id, type, can_alloc, &dqp);
+	error = xfs_qm_dqread(mp, NULL, id, type, can_alloc, &dqp);
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	if (error)
 		return error;
diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
index bdd6bd921528..27e6df439493 100644
--- a/fs/xfs/xfs_dquot.h
+++ b/fs/xfs/xfs_dquot.h
@@ -180,6 +180,9 @@ extern int		xfs_qm_dqget_next(struct xfs_mount *mp, xfs_dqid_t id,
 extern int		xfs_qm_dqget_uncached(struct xfs_mount *mp,
 					xfs_dqid_t id, uint type,
 					struct xfs_dquot **dqpp);
+extern int		xfs_qm_dqget_alloc(struct xfs_trans **tpp,
+					xfs_dqid_t id, uint type,
+					struct xfs_dquot **dqpp);
 extern void		xfs_qm_dqput(xfs_dquot_t *);
 
 extern void		xfs_dqlock2(struct xfs_dquot *, struct xfs_dquot *);


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 05/14] xfs: repair the rmapbt
  2018-05-30 19:31 ` [PATCH 05/14] xfs: repair the rmapbt Darrick J. Wong
@ 2018-05-31  5:42   ` Amir Goldstein
  2018-06-06 21:13     ` Darrick J. Wong
  0 siblings, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2018-05-31  5:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, May 30, 2018 at 10:31 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Rebuild the reverse mapping btree from all primary metadata.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile            |    1
>  fs/xfs/scrub/common.c      |    6
>  fs/xfs/scrub/repair.c      |  119 +++++++
>  fs/xfs/scrub/repair.h      |   27 +
>  fs/xfs/scrub/rmap.c        |    6
>  fs/xfs/scrub/rmap_repair.c |  796 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/scrub.c       |   18 +
>  fs/xfs/scrub/scrub.h       |    2
>  fs/xfs/xfs_mount.h         |    1
>  fs/xfs/xfs_super.c         |   27 +
>  fs/xfs/xfs_trans.c         |    7
>  11 files changed, 1004 insertions(+), 6 deletions(-)
>  create mode 100644 fs/xfs/scrub/rmap_repair.c
>
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 7c442f83b179..b9bbac3d5075 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -178,6 +178,7 @@ xfs-y                               += $(addprefix scrub/, \
>                                    alloc_repair.o \
>                                    ialloc_repair.o \
>                                    repair.o \
> +                                  rmap_repair.o \
>                                    )
>  endif
>  endif
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 89938b328954..f92994716522 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -603,9 +603,13 @@ xfs_scrub_trans_alloc(
>         struct xfs_scrub_context        *sc,
>         uint                            resblks)
>  {
> +       uint                            flags = 0;
> +
> +       if (sc->fs_frozen)
> +               flags |= XFS_TRANS_NO_WRITECOUNT;
>         if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
>                 return xfs_trans_alloc(sc->mp, &M_RES(sc->mp)->tr_itruncate,
> -                               resblks, 0, 0, &sc->tp);
> +                               resblks, 0, flags, &sc->tp);
>
>         return xfs_trans_alloc_empty(sc->mp, &sc->tp);
>  }
> diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> index 45a91841c0ac..4b5d599d53b9 100644
> --- a/fs/xfs/scrub/repair.c
> +++ b/fs/xfs/scrub/repair.c
> @@ -43,6 +43,8 @@
>  #include "xfs_ag_resv.h"
>  #include "xfs_trans_space.h"
>  #include "xfs_quota.h"
> +#include "xfs_bmap.h"
> +#include "xfs_bmap_util.h"
>  #include "scrub/xfs_scrub.h"
>  #include "scrub/scrub.h"
>  #include "scrub/common.h"
> @@ -1146,3 +1148,120 @@ xfs_repair_mod_ino_counts(
>                                 (int64_t)freecount - old_freecount);
>         }
>  }
> +
> +/*
> + * Freeze the FS against all other activity so that we can avoid ABBA
> + * deadlocks while taking locks in unusual orders to rebuild metadata
> + * structures such as the rmapbt.
> + */
> +int
> +xfs_repair_fs_freeze(
> +       struct xfs_scrub_context        *sc)
> +{
> +       int                             error;
> +
> +       error = freeze_super(sc->mp->m_super);
> +       if (error)
> +               return error;
> +       sc->fs_frozen = true;
> +       return 0;
> +}
> +
> +/* Unfreeze the FS. */
> +int
> +xfs_repair_fs_thaw(
> +       struct xfs_scrub_context        *sc)
> +{
> +       struct inode                    *inode, *o;
> +       int                             error;
> +
> +       sc->fs_frozen = false;
> +       error = thaw_super(sc->mp->m_super);
> +
> +       inode = sc->frozen_inode_list;
> +       while (inode) {
> +               o = inode->i_private;
> +               inode->i_private = NULL;
> +               iput(inode);
> +               inode = o;
> +       }
> +
> +       return error;
> +}
> +


I think that new mechanism is worth a mention in the commit message,
if not a patch of its own with cc to fsdevel.
In a discussion on said patch I would ask: how does xfs_repair_fs_freeze()
work in collaboration with a user-initiated fsfreeze?
Is there a situation where LVM can be fooled into thinking that XFS is
really frozen, when it is actually "repair frozen" and metadata can change
while taking a snapshot?
This is why I suggested adding a VFS freeze level, e.g.
SB_FREEZE_FS_MAINTENANCE, so that you don't publish XFS state
as SB_FREEZE_COMPLETE while you are modifying metadata on disk.
It might be sufficient to get XFS to state SB_FREEZE_COMPLETE and
then drop back only to SB_FREEZE_FS in xfs_repair_fs_freeze() without
adding any new states.

Thanks,
Amir.


* Re: [PATCH 01/14] xfs: repair the AGF and AGFL
  2018-05-30 19:30 ` [PATCH 01/14] xfs: repair the AGF and AGFL Darrick J. Wong
@ 2018-06-04  1:52   ` Dave Chinner
  2018-06-05 23:18     ` Darrick J. Wong
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Chinner @ 2018-06-04  1:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, May 30, 2018 at 12:30:45PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Regenerate the AGF and AGFL from the rmap data.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

[...]

> +/* Repair the AGF. */
> +int
> +xfs_repair_agf(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_repair_find_ag_btree	fab[] = {
> +		{
> +			.rmap_owner = XFS_RMAP_OWN_AG,
> +			.buf_ops = &xfs_allocbt_buf_ops,
> +			.magic = XFS_ABTB_CRC_MAGIC,
> +		},
> +		{
> +			.rmap_owner = XFS_RMAP_OWN_AG,
> +			.buf_ops = &xfs_allocbt_buf_ops,
> +			.magic = XFS_ABTC_CRC_MAGIC,
> +		},
> +		{
> +			.rmap_owner = XFS_RMAP_OWN_AG,
> +			.buf_ops = &xfs_rmapbt_buf_ops,
> +			.magic = XFS_RMAP_CRC_MAGIC,
> +		},
> +		{
> +			.rmap_owner = XFS_RMAP_OWN_REFC,
> +			.buf_ops = &xfs_refcountbt_buf_ops,
> +			.magic = XFS_REFC_CRC_MAGIC,
> +		},
> +		{
> +			.buf_ops = NULL,
> +		},
> +	};
> +	struct xfs_repair_agf_allocbt	raa;
> +	struct xfs_agf			old_agf;
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_buf			*agf_bp;
> +	struct xfs_buf			*agfl_bp;
> +	struct xfs_agf			*agf;
> +	struct xfs_btree_cur		*cur = NULL;
> +	struct xfs_perag		*pag;
> +	xfs_agblock_t			blocks;
> +	xfs_agblock_t			freesp_blocks;
> +	int64_t				delta_fdblocks = 0;
> +	int				error;
> +
> +	/* We require the rmapbt to rebuild anything. */
> +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> +		return -EOPNOTSUPP;
> +
> +	xfs_scrub_perag_get(sc->mp, &sc->sa);
> +	pag = sc->sa.pag;
> +	memset(&raa, 0, sizeof(raa));
> +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> +			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGF_DADDR(mp)),
> +			XFS_FSS_TO_BB(mp, 1), 0, &agf_bp, NULL);
> +	if (error)
> +		return error;
> +	agf_bp->b_ops = &xfs_agf_buf_ops;
> +
> +	/*
> +	 * Load the AGFL so that we can screen out OWN_AG blocks that
> +	 * are on the AGFL now; these blocks might have once been part
> +	 * of the bno/cnt/rmap btrees but are not now.
> +	 */
> +	error = xfs_alloc_read_agfl(mp, sc->tp, sc->sa.agno, &agfl_bp);
> +	if (error)
> +		return error;
> +	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(agf_bp), agfl_bp,
> +			xfs_repair_agf_check_agfl_block, sc);
> +	if (error)
> +		return error;

This is a bit of a chicken/egg situation, isn't it? We haven't
repaired the AGFL yet, so how do we know what is valid here?

> +	/* Find the btree roots. */
> +	error = xfs_repair_find_ag_btree_roots(sc, agf_bp, fab, agfl_bp);
> +	if (error)
> +		return error;
> +	if (fab[0].root == NULLAGBLOCK || fab[0].height > XFS_BTREE_MAXLEVELS ||
> +	    fab[1].root == NULLAGBLOCK || fab[1].height > XFS_BTREE_MAXLEVELS ||
> +	    fab[2].root == NULLAGBLOCK || fab[2].height > XFS_BTREE_MAXLEVELS)
> +		return -EFSCORRUPTED;
> +	if (xfs_sb_version_hasreflink(&mp->m_sb) &&
> +	    (fab[3].root == NULLAGBLOCK || fab[3].height > XFS_BTREE_MAXLEVELS))
> +		return -EFSCORRUPTED;
> +
> +	/* Start rewriting the header. */
> +	agf = XFS_BUF_TO_AGF(agf_bp);
> +	memcpy(&old_agf, agf, sizeof(old_agf));
> +
> +	/*
> +	 * We relied on the rmapbt to reconstruct the AGF.  If we get a
> +	 * different root then something's seriously wrong.
> +	 */
> +	if (be32_to_cpu(old_agf.agf_roots[XFS_BTNUM_RMAPi]) != fab[2].root)
> +		return -EFSCORRUPTED;
> +	memset(agf, 0, mp->m_sb.sb_sectsize);
> +	agf->agf_magicnum = cpu_to_be32(XFS_AGF_MAGIC);
> +	agf->agf_versionnum = cpu_to_be32(XFS_AGF_VERSION);
> +	agf->agf_seqno = cpu_to_be32(sc->sa.agno);
> +	agf->agf_length = cpu_to_be32(xfs_ag_block_count(mp, sc->sa.agno));
> +	agf->agf_roots[XFS_BTNUM_BNOi] = cpu_to_be32(fab[0].root);
> +	agf->agf_roots[XFS_BTNUM_CNTi] = cpu_to_be32(fab[1].root);
> +	agf->agf_roots[XFS_BTNUM_RMAPi] = cpu_to_be32(fab[2].root);
> +	agf->agf_levels[XFS_BTNUM_BNOi] = cpu_to_be32(fab[0].height);
> +	agf->agf_levels[XFS_BTNUM_CNTi] = cpu_to_be32(fab[1].height);
> +	agf->agf_levels[XFS_BTNUM_RMAPi] = cpu_to_be32(fab[2].height);
> +	agf->agf_flfirst = old_agf.agf_flfirst;
> +	agf->agf_fllast = old_agf.agf_fllast;
> +	agf->agf_flcount = old_agf.agf_flcount;
> +	if (xfs_sb_version_hascrc(&mp->m_sb))
> +		uuid_copy(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid);
> +	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> +		agf->agf_refcount_root = cpu_to_be32(fab[3].root);
> +		agf->agf_refcount_level = cpu_to_be32(fab[3].height);
> +	}

Can we factor this function along rebuild-operation lines? That will
help document all the different pieces it is putting together. E.g.
move the AGF header init to before xfs_repair_find_ag_btree_roots(),
and then pass it into xfs_repair_agf_rebuild_roots(), which contains
the above fab-specific code.

> +
> +	/* Update the AGF counters from the bnobt. */
> +	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
> +			XFS_BTNUM_BNO);
> +	raa.sc = sc;
> +	error = xfs_alloc_query_all(cur, xfs_repair_agf_walk_allocbt, &raa);
> +	if (error)
> +		goto err;
> +	error = xfs_btree_count_blocks(cur, &blocks);
> +	if (error)
> +		goto err;
> +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> +	freesp_blocks = blocks - 1;
> +	agf->agf_freeblks = cpu_to_be32(raa.freeblks);
> +	agf->agf_longest = cpu_to_be32(raa.longest);
> +
> +	/* Update the AGF counters from the cntbt. */
> +	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
> +			XFS_BTNUM_CNT);
> +	error = xfs_btree_count_blocks(cur, &blocks);
> +	if (error)
> +		goto err;
> +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> +	freesp_blocks += blocks - 1;
> +
> +	/* Update the AGF counters from the rmapbt. */
> +	cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno);
> +	error = xfs_btree_count_blocks(cur, &blocks);
> +	if (error)
> +		goto err;
> +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> +	agf->agf_rmap_blocks = cpu_to_be32(blocks);
> +	freesp_blocks += blocks - 1;
> +
> +	/* Update the AGF counters from the refcountbt. */
> +	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> +		cur = xfs_refcountbt_init_cursor(mp, sc->tp, agf_bp,
> +				sc->sa.agno, NULL);
> +		error = xfs_btree_count_blocks(cur, &blocks);
> +		if (error)
> +			goto err;
> +		xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> +		agf->agf_refcount_blocks = cpu_to_be32(blocks);
> +	}
> +	agf->agf_btreeblks = cpu_to_be32(freesp_blocks);
> +	cur = NULL;

Then this is xfs_repair_agf_rebuild_counters()

> +
> +	/* Trigger reinitialization of the in-core data. */
> +	if (raa.freeblks != be32_to_cpu(old_agf.agf_freeblks)) {
> +		delta_fdblocks += (int64_t)raa.freeblks -
> +				be32_to_cpu(old_agf.agf_freeblks);
> +		if (pag->pagf_init)
> +			pag->pagf_freeblks = be32_to_cpu(agf->agf_freeblks);
> +	}
> +
> +	if (freesp_blocks != be32_to_cpu(old_agf.agf_btreeblks)) {
> +		delta_fdblocks += (int64_t)freesp_blocks -
> +				be32_to_cpu(old_agf.agf_btreeblks);
> +		if (pag->pagf_init)
> +			pag->pagf_btreeblks = be32_to_cpu(agf->agf_btreeblks);
> +	}
> +
> +	if (pag->pagf_init &&
> +	    (raa.longest != be32_to_cpu(old_agf.agf_longest) ||
> +	     fab[0].height != be32_to_cpu(old_agf.agf_levels[XFS_BTNUM_BNOi]) ||
> +	     fab[1].height != be32_to_cpu(old_agf.agf_levels[XFS_BTNUM_CNTi]) ||
> +	     fab[2].height != be32_to_cpu(old_agf.agf_levels[XFS_BTNUM_RMAPi]) ||
> +	     fab[3].height != be32_to_cpu(old_agf.agf_refcount_level))) {
> +		pag->pagf_longest = be32_to_cpu(agf->agf_longest);
> +		pag->pagf_levels[XFS_BTNUM_BNOi] =
> +				be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]);
> +		pag->pagf_levels[XFS_BTNUM_CNTi] =
> +				be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
> +		pag->pagf_levels[XFS_BTNUM_RMAPi] =
> +				be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
> +		pag->pagf_refcount_level =
> +				be32_to_cpu(agf->agf_refcount_level);
> +	}
> +
> +	error = xfs_repair_mod_fdblocks(sc, delta_fdblocks);
> +	if (error)
> +		goto err;

And xfs_repair_agf_update_pag().

> +
> +	/* Write this to disk. */
> +	xfs_trans_buf_set_type(sc->tp, agf_bp, XFS_BLFT_AGF_BUF);
> +	xfs_trans_log_buf(sc->tp, agf_bp, 0, BBTOB(agf_bp->b_length) - 1);
> +	return error;
> +
> +err:
> +	if (cur)
> +		xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR :
> +				XFS_BTREE_NOERROR);
> +	memcpy(agf, &old_agf, sizeof(old_agf));
> +	return error;
> +}
> +
> +/* AGFL */
> +
> +struct xfs_repair_agfl {
> +	struct xfs_repair_extent_list	freesp_list;
> +	struct xfs_repair_extent_list	agmeta_list;
> +	struct xfs_scrub_context	*sc;
> +};
> +
> +/* Record all freespace information. */
> +STATIC int
> +xfs_repair_agfl_rmap_fn(
> +	struct xfs_btree_cur		*cur,
> +	struct xfs_rmap_irec		*rec,
> +	void				*priv)
> +{
> +	struct xfs_repair_agfl		*ra = priv;
> +	struct xfs_buf			*bp;
> +	xfs_fsblock_t			fsb;
> +	int				i;
> +	int				error = 0;
> +
> +	if (xfs_scrub_should_terminate(ra->sc, &error))
> +		return error;
> +
> +	/* Record all the OWN_AG blocks... */
> +	if (rec->rm_owner == XFS_RMAP_OWN_AG) {
> +		fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
> +				rec->rm_startblock);
> +		error = xfs_repair_collect_btree_extent(ra->sc,
> +				&ra->freesp_list, fsb, rec->rm_blockcount);
> +		if (error)
> +			return error;
> +	}
> +
> +	/* ...and all the rmapbt blocks... */
> +	for (i = 0; i < cur->bc_nlevels && cur->bc_ptrs[i] == 1; i++) {

What is the significance of "cur->bc_ptrs[i] == 1"?

This loop looks like it is walking the btree path to this leaf, but
bc_ptrs[] will only have a "1" in it if we are at the left-most edge
of the tree, right? So what about all the other btree blocks?

> +		xfs_btree_get_block(cur, i, &bp);
> +		if (!bp)
> +			continue;
> +		fsb = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
> +		error = xfs_repair_collect_btree_extent(ra->sc,
> +				&ra->agmeta_list, fsb, 1);
> +		if (error)
> +			return error;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Add a btree block to the agmeta list. */
> +STATIC int
> +xfs_repair_agfl_visit_btblock(
> +	struct xfs_btree_cur		*cur,
> +	int				level,
> +	void				*priv)
> +{
> +	struct xfs_repair_agfl		*ra = priv;
> +	struct xfs_buf			*bp;
> +	xfs_fsblock_t			fsb;
> +	int				error = 0;
> +
> +	if (xfs_scrub_should_terminate(ra->sc, &error))
> +		return error;
> +
> +	xfs_btree_get_block(cur, level, &bp);
> +	if (!bp)
> +		return 0;
> +
> +	fsb = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
> +	return xfs_repair_collect_btree_extent(ra->sc, &ra->agmeta_list,
> +			fsb, 1);
> +}
> +
> +/* Repair the AGFL. */
> +int
> +xfs_repair_agfl(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_repair_agfl		ra;
> +	struct xfs_owner_info		oinfo;
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_buf			*agf_bp;
> +	struct xfs_buf			*agfl_bp;
> +	struct xfs_agf			*agf;
> +	struct xfs_agfl			*agfl;
> +	struct xfs_btree_cur		*cur = NULL;
> +	__be32				*agfl_bno;
> +	struct xfs_repair_extent	*rae;
> +	struct xfs_repair_extent	*n;
> +	xfs_agblock_t			flcount;
> +	xfs_agblock_t			agbno;
> +	xfs_agblock_t			bno;
> +	xfs_agblock_t			old_flcount;
> +	int				error;

Can we factor this function a little?

> +
> +	/* We require the rmapbt to rebuild anything. */
> +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> +		return -EOPNOTSUPP;
> +
> +	xfs_scrub_perag_get(sc->mp, &sc->sa);
> +	xfs_repair_init_extent_list(&ra.freesp_list);
> +	xfs_repair_init_extent_list(&ra.agmeta_list);
> +	ra.sc = sc;
> +
> +	error = xfs_alloc_read_agf(mp, sc->tp, sc->sa.agno, 0, &agf_bp);
> +	if (error)
> +		return error;
> +	if (!agf_bp)
> +		return -ENOMEM;
> +
> +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> +			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGFL_DADDR(mp)),
> +			XFS_FSS_TO_BB(mp, 1), 0, &agfl_bp, NULL);
> +	if (error)
> +		return error;
> +	agfl_bp->b_ops = &xfs_agfl_buf_ops;

Be nice to have a __xfs_alloc_read_agfl() function that didn't set
the ops, and have this and xfs_alloc_read_agfl() both call it.

From here:
> +
> +	/* Find all space used by the free space btrees & rmapbt. */
> +	cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno);
> +	error = xfs_rmap_query_all(cur, xfs_repair_agfl_rmap_fn, &ra);
> +	if (error)
> +		goto err;
> +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> +
> +	/* Find all space used by bnobt. */
> +	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
> +			XFS_BTNUM_BNO);
> +	error = xfs_btree_visit_blocks(cur, xfs_repair_agfl_visit_btblock,
> +			&ra);
> +	if (error)
> +		goto err;
> +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> +
> +	/* Find all space used by cntbt. */
> +	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
> +			XFS_BTNUM_CNT);
> +	error = xfs_btree_visit_blocks(cur, xfs_repair_agfl_visit_btblock,
> +			&ra);
> +	if (error)
> +		goto err;
> +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> +	cur = NULL;
> +
> +	/*
> +	 * Drop the freesp meta blocks that are in use by btrees.
> +	 * The remaining blocks /should/ be AGFL blocks.
> +	 */
> +	error = xfs_repair_subtract_extents(sc, &ra.freesp_list,
> +			&ra.agmeta_list);
> +	if (error)
> +		goto err;
> +	xfs_repair_cancel_btree_extents(sc, &ra.agmeta_list);

This whole section could go into a separate function, say
xfs_repair_agfl_find_extents()?

> +
> +	/* Calculate the new AGFL size. */
> +	flcount = 0;
> +	for_each_xfs_repair_extent_safe(rae, n, &ra.freesp_list) {

Why "safe" - we're not removing anything from the list here.

> +		for (bno = 0; bno < rae->len; bno++) {
> +			if (flcount >= xfs_agfl_size(mp) - 1)

What's the reason for the magic "- 1" there?

> +				break;
> +			flcount++;
> +		}
> +	}

This seems like a complex way of doing:

	for_each_xfs_repair_extent(rae, n, &ra.freesp_list) {
		flcount += rae->len;
		if (flcount >= xfs_agfl_size(mp) - 1) {
			flcount = xfs_agfl_size(mp) - 1;
			break;
		}
	}


> +	/* Update fdblocks if flcount changed. */
> +	agf = XFS_BUF_TO_AGF(agf_bp);
> +	old_flcount = be32_to_cpu(agf->agf_flcount);
> +	if (flcount != old_flcount) {
> +		int64_t	delta_fdblocks = (int64_t)flcount - old_flcount;
> +
> +		error = xfs_repair_mod_fdblocks(sc, delta_fdblocks);
> +		if (error)
> +			goto err;
> +		if (sc->sa.pag->pagf_init)
> +			sc->sa.pag->pagf_flcount = flcount;

No need to check pagf_init here - we've had a successful call to
xfs_alloc_read_agf() earlier and that means pagf has been
initialised.

> +	}
> +
> +	/* Update the AGF pointers. */
> +	agf->agf_flfirst = cpu_to_be32(1);

Why index 1? What is in index 0? (see earlier questions about magic
numbers :)

> +	agf->agf_flcount = cpu_to_be32(flcount);
> +	agf->agf_fllast = cpu_to_be32(flcount);
> +
> +	/* Start rewriting the header. */
> +	agfl = XFS_BUF_TO_AGFL(agfl_bp);
> +	memset(agfl, 0xFF, mp->m_sb.sb_sectsize);
> +	agfl->agfl_magicnum = cpu_to_be32(XFS_AGFL_MAGIC);
> +	agfl->agfl_seqno = cpu_to_be32(sc->sa.agno);
> +	uuid_copy(&agfl->agfl_uuid, &mp->m_sb.sb_meta_uuid);
> +
> +	/* Fill the AGFL with the remaining blocks. */
> +	flcount = 0;
> +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agfl_bp);
> +	for_each_xfs_repair_extent_safe(rae, n, &ra.freesp_list) {
> +		agbno = XFS_FSB_TO_AGBNO(mp, rae->fsbno);
> +
> +		trace_xfs_repair_agfl_insert(mp, sc->sa.agno, agbno, rae->len);
> +
> +		for (bno = 0; bno < rae->len; bno++) {
> +			if (flcount >= xfs_agfl_size(mp) - 1)
> +				break;
> +			agfl_bno[flcount + 1] = cpu_to_be32(agbno + bno);
> +			flcount++;
> +		}
> +		rae->fsbno += bno;
> +		rae->len -= bno;

This is a bit weird, using "bno" as an offset. But, also, there's
that magic "don't use index 0" thing again :P

> +		if (rae->len)
> +			break;
> +		list_del(&rae->list);
> +		kmem_free(rae);
> +	}
> +
> +	/* Write AGF and AGFL to disk. */
> +	xfs_alloc_log_agf(sc->tp, agf_bp,
> +			XFS_AGF_FLFIRST | XFS_AGF_FLLAST | XFS_AGF_FLCOUNT);
> +	xfs_trans_buf_set_type(sc->tp, agfl_bp, XFS_BLFT_AGFL_BUF);
> +	xfs_trans_log_buf(sc->tp, agfl_bp, 0, BBTOB(agfl_bp->b_length) - 1);
> +
> +	/* Dump any AGFL overflow. */
> +	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
> +	return xfs_repair_reap_btree_extents(sc, &ra.freesp_list, &oinfo,
> +			XFS_AG_RESV_AGFL);
> +err:
> +	xfs_repair_cancel_btree_extents(sc, &ra.agmeta_list);
> +	xfs_repair_cancel_btree_extents(sc, &ra.freesp_list);
> +	if (cur)
> +		xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR :
> +				XFS_BTREE_NOERROR);
> +	return error;
> +}
> diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> index e3e8fba1c99c..5f31dc8af505 100644
> --- a/fs/xfs/scrub/repair.c
> +++ b/fs/xfs/scrub/repair.c
> @@ -1087,3 +1087,33 @@ xfs_repair_ino_dqattach(
>  
>  	return error;
>  }
> +
> +/*
> + * We changed this AGF's free block count, so now we need to reset the global
> + * counters.  We use the transaction to update the global counters, so if the
> + * AG free counts were low we have to ask the transaction for more block
> + * reservation before decreasing fdblocks.
> + *
> + * XXX: We ought to have some mechanism for checking and fixing the superblock
> + * counters (particularly if we're close to ENOSPC) but that's left as an open
> + * research question for now.
> + */
> +int
> +xfs_repair_mod_fdblocks(
> +	struct xfs_scrub_context	*sc,
> +	int64_t				delta_fdblocks)
> +{
> +	int				error;
> +
> +	if (delta_fdblocks == 0)
> +		return 0;
> +
> +	if (delta_fdblocks < 0) {
> +		error = xfs_trans_reserve_more(sc->tp, -delta_fdblocks, 0);
> +		if (error)
> +			return error;
> +	}
> +
> +	xfs_trans_mod_sb(sc->tp, XFS_TRANS_SB_FDBLOCKS, delta_fdblocks);

This seems a little hacky - it's working around a transaction
reservation overflow warning, right? Would it simply be better to
have a different type for xfs_trans_mod_sb() that didn't shut down
the filesystem on transaction reservation overflows here? e.g
XFS_TRANS_SB_FDBLOCKS_REPAIR? That would get rid of the need for the
xfs_trans_reserve_more() code, right?

[...]
> +/*
> + * Try to reserve more blocks for a transaction.  The single use case we
> + * support is for online repair -- use a transaction to gather data without
> + * fear of btree cycle deadlocks; calculate how many blocks we really need
> + * from that data; and only then start modifying data.  This can fail due to
> + * ENOSPC, so we have to be able to cancel the transaction.
> + */
> +int
> +xfs_trans_reserve_more(
> +	struct xfs_trans	*tp,
> +	uint			blocks,
> +	uint			rtextents)
> +{
> +	struct xfs_mount	*mp = tp->t_mountp;
> +	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
> +	int			error = 0;
> +
> +	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
> +
> +	/*
> +	 * Attempt to reserve the needed disk blocks by decrementing
> +	 * the number needed from the number available.  This will
> +	 * fail if the count would go below zero.
> +	 */
> +	if (blocks > 0) {
> +		error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
> +		if (error != 0)
> +			return -ENOSPC;

		if (error)

> +		tp->t_blk_res += blocks;
> +	}

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 02/14] xfs: repair the AGI
  2018-05-30 19:30 ` [PATCH 02/14] xfs: repair the AGI Darrick J. Wong
@ 2018-06-04  1:56   ` Dave Chinner
  2018-06-05 23:54     ` Darrick J. Wong
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Chinner @ 2018-06-04  1:56 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, May 30, 2018 at 12:30:52PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Rebuild the AGI header items with some help from the rmapbt.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks OK, but I can't help but think it should be structured
similarly to the AGF rebuild, even though the functions would be
smaller and simpler...

Thoughts?

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 03/14] xfs: repair free space btrees
  2018-05-30 19:30 ` [PATCH 03/14] xfs: repair free space btrees Darrick J. Wong
@ 2018-06-04  2:12   ` Dave Chinner
  2018-06-06  1:50     ` Darrick J. Wong
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Chinner @ 2018-06-04  2:12 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, May 30, 2018 at 12:30:58PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Rebuild the free space btrees from the gaps in the rmap btree.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile             |    1 
>  fs/xfs/scrub/alloc.c        |    1 
>  fs/xfs/scrub/alloc_repair.c |  430 +++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/common.c       |    8 +
>  fs/xfs/scrub/repair.h       |    2 
>  fs/xfs/scrub/scrub.c        |    4 
>  6 files changed, 442 insertions(+), 4 deletions(-)
>  create mode 100644 fs/xfs/scrub/alloc_repair.c
> 
> 
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 29fe115f29d5..abe035ad0aa4 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -175,6 +175,7 @@ xfs-$(CONFIG_XFS_QUOTA)		+= scrub/quota.o
>  ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y)
>  xfs-y				+= $(addprefix scrub/, \
>  				   agheader_repair.o \
> +				   alloc_repair.o \
>  				   repair.o \
>  				   )
>  endif
> diff --git a/fs/xfs/scrub/alloc.c b/fs/xfs/scrub/alloc.c
> index 941a0a55224e..fe7e8bdf4a52 100644
> --- a/fs/xfs/scrub/alloc.c
> +++ b/fs/xfs/scrub/alloc.c
> @@ -29,7 +29,6 @@
>  #include "xfs_log_format.h"
>  #include "xfs_trans.h"
>  #include "xfs_sb.h"
> -#include "xfs_alloc.h"
>  #include "xfs_rmap.h"
>  #include "xfs_alloc.h"
>  #include "scrub/xfs_scrub.h"
> diff --git a/fs/xfs/scrub/alloc_repair.c b/fs/xfs/scrub/alloc_repair.c
> new file mode 100644
> index 000000000000..5a81713a69cd
> --- /dev/null
> +++ b/fs/xfs/scrub/alloc_repair.c
> @@ -0,0 +1,430 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_btree.h"
> +#include "xfs_bit.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_sb.h"
> +#include "xfs_alloc.h"
> +#include "xfs_alloc_btree.h"
> +#include "xfs_rmap.h"
> +#include "xfs_rmap_btree.h"
> +#include "xfs_inode.h"
> +#include "xfs_refcount.h"
> +#include "scrub/xfs_scrub.h"
> +#include "scrub/scrub.h"
> +#include "scrub/common.h"
> +#include "scrub/btree.h"
> +#include "scrub/trace.h"
> +#include "scrub/repair.h"
> +
> +/* Free space btree repair. */

Can you add a description of the algorithm used here?

> +
> +struct xfs_repair_alloc_extent {
> +	struct list_head		list;
> +	xfs_agblock_t			bno;
> +	xfs_extlen_t			len;
> +};
> +
> +struct xfs_repair_alloc {
> +	struct list_head		extlist;
> +	struct xfs_repair_extent_list	btlist;	  /* OWN_AG blocks */
> +	struct xfs_repair_extent_list	nobtlist; /* rmapbt/agfl blocks */
> +	struct xfs_scrub_context	*sc;
> +	xfs_agblock_t			next_bno;
> +	uint64_t			nr_records;
> +};
> +
> +/* Record extents that aren't in use from gaps in the rmap records. */
> +STATIC int
> +xfs_repair_alloc_extent_fn(
> +	struct xfs_btree_cur		*cur,
> +	struct xfs_rmap_irec		*rec,
> +	void				*priv)
> +{
> +	struct xfs_repair_alloc		*ra = priv;
> +	struct xfs_repair_alloc_extent	*rae;
> +	struct xfs_buf			*bp;
> +	xfs_fsblock_t			fsb;
> +	int				i;
> +	int				error;
> +
> +	/* Record all the OWN_AG blocks... */
> +	if (rec->rm_owner == XFS_RMAP_OWN_AG) {
> +		fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
> +				rec->rm_startblock);
> +		error = xfs_repair_collect_btree_extent(ra->sc,
> +				&ra->btlist, fsb, rec->rm_blockcount);
> +		if (error)
> +			return error;
> +	}
> +
> +	/* ...and all the rmapbt blocks... */
> +	for (i = 0; i < cur->bc_nlevels && cur->bc_ptrs[i] == 1; i++) {
> +		xfs_btree_get_block(cur, i, &bp);
> +		if (!bp)
> +			continue;
> +		fsb = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
> +		error = xfs_repair_collect_btree_extent(ra->sc,
> +				&ra->nobtlist, fsb, 1);
> +		if (error)
> +			return error;
> +	}

This looks familiar from previous patches, including the magic
bc_ptrs check. Factoring opportunity?

> +
> +	/* ...and all the free space. */
> +	if (rec->rm_startblock > ra->next_bno) {
> +		trace_xfs_repair_alloc_extent_fn(cur->bc_mp,
> +				cur->bc_private.a.agno,
> +				ra->next_bno, rec->rm_startblock - ra->next_bno,
> +				XFS_RMAP_OWN_NULL, 0, 0);
> +
> +		rae = kmem_alloc(sizeof(struct xfs_repair_alloc_extent),
> +				KM_MAYFAIL);
> +		if (!rae)
> +			return -ENOMEM;
> +		INIT_LIST_HEAD(&rae->list);
> +		rae->bno = ra->next_bno;
> +		rae->len = rec->rm_startblock - ra->next_bno;
> +		list_add_tail(&rae->list, &ra->extlist);
> +		ra->nr_records++;
> +	}
> +	ra->next_bno = max_t(xfs_agblock_t, ra->next_bno,
> +			rec->rm_startblock + rec->rm_blockcount);
> +	return 0;
> +}

[....]

> +/* Allocate a block from the (cached) longest extent in the AG. */
> +STATIC xfs_fsblock_t
> +xfs_repair_allocbt_alloc_from_longest(
> +	struct xfs_repair_alloc		*ra,
> +	struct xfs_repair_alloc_extent	**longest)
> +{
> +	xfs_fsblock_t			fsb;
> +
> +	if (*longest && (*longest)->len == 0) {
> +		list_del(&(*longest)->list);
> +		kmem_free(*longest);
> +		*longest = NULL;
> +	}
> +
> +	if (*longest == NULL) {
> +		*longest = xfs_repair_allocbt_get_longest(ra);
> +		if (*longest == NULL)
> +			return NULLFSBLOCK;
> +	}
> +
> +	fsb = XFS_AGB_TO_FSB(ra->sc->mp, ra->sc->sa.agno, (*longest)->bno);
> +	(*longest)->bno++;
> +	(*longest)->len--;

What if this makes the longest extent no longer the longest on the
extent list?

> +	return fsb;
> +}
> +
> +/* Repair the freespace btrees for some AG. */
> +int
> +xfs_repair_allocbt(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_repair_alloc		ra;
> +	struct xfs_owner_info		oinfo;
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_btree_cur		*cur = NULL;
> +	struct xfs_repair_alloc_extent	*longest = NULL;
> +	struct xfs_repair_alloc_extent	*rae;
> +	struct xfs_repair_alloc_extent	*n;
> +	struct xfs_perag		*pag;
> +	struct xfs_agf			*agf;
> +	struct xfs_buf			*bp;
> +	xfs_fsblock_t			bnofsb;
> +	xfs_fsblock_t			cntfsb;
> +	xfs_extlen_t			oldf;
> +	xfs_extlen_t			nr_blocks;
> +	xfs_agblock_t			agend;
> +	int				error;
> +
> +	/* We require the rmapbt to rebuild anything. */
> +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> +		return -EOPNOTSUPP;
> +
> +	xfs_scrub_perag_get(sc->mp, &sc->sa);
> +	pag = sc->sa.pag;

Probably should make xfs_scrub_perag_get() return the pag directly.

> +	/*
> +	 * Make sure the busy extent list is clear because we can't put
> +	 * extents on there twice.
> +	 */
> +	spin_lock(&pag->pagb_lock);
> +	if (pag->pagb_tree.rb_node) {
> +		spin_unlock(&pag->pagb_lock);
> +		return -EDEADLOCK;
> +	}
> +	spin_unlock(&pag->pagb_lock);

Can you wrap that up a helper, say, xfs_extent_busy_list_empty()?

	if (!xfs_extent_busy_list_empty(pag))
		return -EDEADLOCK;

> +	/*
> +	 * Collect all reverse mappings for free extents, and the rmapbt
> +	 * blocks.  We can discover the rmapbt blocks completely from a
> +	 * query_all handler because there are always rmapbt entries.
> +	 * (One cannot use query_all to visit all of a btree's blocks
> +	 * unless that btree is guaranteed to have at least one entry.)
> +	 */
> +	INIT_LIST_HEAD(&ra.extlist);
> +	xfs_repair_init_extent_list(&ra.btlist);
> +	xfs_repair_init_extent_list(&ra.nobtlist);
> +	ra.next_bno = 0;
> +	ra.nr_records = 0;
> +	ra.sc = sc;
> +
> +	cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno);
> +	error = xfs_rmap_query_all(cur, xfs_repair_alloc_extent_fn, &ra);
> +	if (error)
> +		goto out;
> +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> +	cur = NULL;
> +
> +	/* Insert a record for space between the last rmap and EOAG. */
> +	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
> +	agend = be32_to_cpu(agf->agf_length);
> +	if (ra.next_bno < agend) {
> +		rae = kmem_alloc(sizeof(struct xfs_repair_alloc_extent),
> +				KM_MAYFAIL);
> +		if (!rae) {
> +			error = -ENOMEM;
> +			goto out;
> +		}
> +		INIT_LIST_HEAD(&rae->list);
> +		rae->bno = ra.next_bno;
> +		rae->len = agend - ra.next_bno;
> +		list_add_tail(&rae->list, &ra.extlist);
> +		ra.nr_records++;
> +	}
> +
> +	/* Collect all the AGFL blocks. */
> +	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(sc->sa.agf_bp),
> +			sc->sa.agfl_bp, xfs_repair_collect_agfl_block, &ra);
> +	if (error)
> +		goto out;
> +
> +	/* Do we actually have enough space to do this? */
> +	nr_blocks = 2 * xfs_allocbt_calc_size(mp, ra.nr_records);
> +	if (!xfs_repair_ag_has_space(pag, nr_blocks, XFS_AG_RESV_NONE)) {
> +		error = -ENOSPC;
> +		goto out;
> +	}
> +
> +	/* Invalidate all the bnobt/cntbt blocks in btlist. */
> +	error = xfs_repair_subtract_extents(sc, &ra.btlist, &ra.nobtlist);
> +	if (error)
> +		goto out;
> +	xfs_repair_cancel_btree_extents(sc, &ra.nobtlist);
> +	error = xfs_repair_invalidate_blocks(sc, &ra.btlist);
> +	if (error)
> +		goto out;

So this could be factored into xfs_repair_allocbt_get_free_extents().

> +
> +	/* Allocate new bnobt root. */
> +	bnofsb = xfs_repair_allocbt_alloc_from_longest(&ra, &longest);
> +	if (bnofsb == NULLFSBLOCK) {
> +		error = -ENOSPC;
> +		goto out;
> +	}
> +
> +	/* Allocate new cntbt root. */
> +	cntfsb = xfs_repair_allocbt_alloc_from_longest(&ra, &longest);
> +	if (cntfsb == NULLFSBLOCK) {
> +		error = -ENOSPC;
> +		goto out;
> +	}
> +
> +	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
> +	/* Initialize new bnobt root. */
> +	error = xfs_repair_init_btblock(sc, bnofsb, &bp, XFS_BTNUM_BNO,
> +			&xfs_allocbt_buf_ops);
> +	if (error)
> +		goto out;
> +	agf->agf_roots[XFS_BTNUM_BNOi] =
> +			cpu_to_be32(XFS_FSB_TO_AGBNO(mp, bnofsb));
> +	agf->agf_levels[XFS_BTNUM_BNOi] = cpu_to_be32(1);
> +
> +	/* Initialize new cntbt root. */
> +	error = xfs_repair_init_btblock(sc, cntfsb, &bp, XFS_BTNUM_CNT,
> +			&xfs_allocbt_buf_ops);
> +	if (error)
> +		goto out;
> +	agf->agf_roots[XFS_BTNUM_CNTi] =
> +			cpu_to_be32(XFS_FSB_TO_AGBNO(mp, cntfsb));
> +	agf->agf_levels[XFS_BTNUM_CNTi] = cpu_to_be32(1);

xfs_repair_allocbt_new_btree_roots()

> +
> +	/*
> +	 * Since we're abandoning the old bnobt/cntbt, we have to
> +	 * decrease fdblocks by the # of blocks in those trees.
> +	 * btreeblks counts the non-root blocks of the free space
> +	 * and rmap btrees.  Do this before resetting the AGF counters.
> +	 */
> +	oldf = pag->pagf_btreeblks + 2;
> +	oldf -= (be32_to_cpu(agf->agf_rmap_blocks) - 1);
> +	error = xfs_mod_fdblocks(mp, -(int64_t)oldf, false);
> +	if (error)
> +		goto out;
> +
> +	/* Reset the perag info. */
> +	pag->pagf_btreeblks = be32_to_cpu(agf->agf_rmap_blocks) - 1;
> +	pag->pagf_freeblks = 0;
> +	pag->pagf_longest = 0;
> +	pag->pagf_levels[XFS_BTNUM_BNOi] =
> +			be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]);
> +	pag->pagf_levels[XFS_BTNUM_CNTi] =
> +			be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
> +
> +	/* Now reset the AGF counters. */
> +	agf->agf_btreeblks = cpu_to_be32(pag->pagf_btreeblks);
> +	agf->agf_freeblks = cpu_to_be32(pag->pagf_freeblks);
> +	agf->agf_longest = cpu_to_be32(pag->pagf_longest);
> +	xfs_alloc_log_agf(sc->tp, sc->sa.agf_bp,
> +			XFS_AGF_ROOTS | XFS_AGF_LEVELS | XFS_AGF_BTREEBLKS |
> +			XFS_AGF_LONGEST | XFS_AGF_FREEBLKS);
> +	error = xfs_repair_roll_ag_trans(sc);
> +	if (error)
> +		goto out;

xfs_repair_allocbt_reset_counters()?

> +	/*
> +	 * Insert the longest free extent in case it's necessary to
> +	 * refresh the AGFL with multiple blocks.
> +	 */
> +	xfs_rmap_skip_owner_update(&oinfo);
> +	if (longest && longest->len == 0) {
> +		error = xfs_repair_allocbt_free_extent(sc,
> +				XFS_AGB_TO_FSB(sc->mp, sc->sa.agno,
> +					longest->bno),
> +				longest->len, &oinfo);
> +		if (error)
> +			goto out;
> +		list_del(&longest->list);
> +		kmem_free(longest);
> +	}
> +
> +	/* Insert records into the new btrees. */
> +	list_sort(NULL, &ra.extlist, xfs_repair_allocbt_extent_cmp);
> +	list_for_each_entry_safe(rae, n, &ra.extlist, list) {
> +		error = xfs_repair_allocbt_free_extent(sc,
> +				XFS_AGB_TO_FSB(sc->mp, sc->sa.agno, rae->bno),
> +				rae->len, &oinfo);
> +		if (error)
> +			goto out;
> +		list_del(&rae->list);
> +		kmem_free(rae);
> +	}
> +
> +	/* Add rmap records for the btree roots */
> +	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
> +	error = xfs_rmap_alloc(sc->tp, sc->sa.agf_bp, sc->sa.agno,
> +			XFS_FSB_TO_AGBNO(mp, bnofsb), 1, &oinfo);
> +	if (error)
> +		goto out;
> +	error = xfs_rmap_alloc(sc->tp, sc->sa.agf_bp, sc->sa.agno,
> +			XFS_FSB_TO_AGBNO(mp, cntfsb), 1, &oinfo);
> +	if (error)
> +		goto out;

xfs_repair_allocbt_rebuild_tree()

> +
> +	/* Free all the OWN_AG blocks that are not in the rmapbt/agfl. */
> +	return xfs_repair_reap_btree_extents(sc, &ra.btlist, &oinfo,
> +			XFS_AG_RESV_NONE);
> +out:
> +	xfs_repair_cancel_btree_extents(sc, &ra.btlist);
> +	xfs_repair_cancel_btree_extents(sc, &ra.nobtlist);
> +	if (cur)
> +		xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
> +	list_for_each_entry_safe(rae, n, &ra.extlist, list) {
> +		list_del(&rae->list);
> +		kmem_free(rae);
> +	}
> +	return error;
> +}

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 04/14] xfs: repair inode btrees
  2018-05-30 19:31 ` [PATCH 04/14] xfs: repair inode btrees Darrick J. Wong
@ 2018-06-04  3:41   ` Dave Chinner
  2018-06-06  3:55     ` Darrick J. Wong
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Chinner @ 2018-06-04  3:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, May 30, 2018 at 12:31:04PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Use the rmapbt to find inode chunks, query the chunks to compute
> hole and free masks, and with that information rebuild the inobt
> and finobt.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

[...]

> +xfs_repair_ialloc_check_free(
> +	struct xfs_btree_cur	*cur,
> +	struct xfs_buf		*bp,
> +	xfs_ino_t		fsino,
> +	xfs_agino_t		bpino,
> +	bool			*inuse)
> +{
> +	struct xfs_mount	*mp = cur->bc_mp;
> +	struct xfs_dinode	*dip;
> +	int			error;
> +
> +	/* Will the in-core inode tell us if it's in use? */
> +	error = xfs_icache_inode_is_allocated(mp, cur->bc_tp, fsino, inuse);
> +	if (!error)
> +		return 0;
> +
> +	/* Inode uncached or half assembled, read disk buffer */
> +	dip = xfs_buf_offset(bp, bpino * mp->m_sb.sb_inodesize);
> +	if (be16_to_cpu(dip->di_magic) != XFS_DINODE_MAGIC)
> +		return -EFSCORRUPTED;

Do we hold the buffer locked here? i.e. can we race with someone
else allocating/freeing/reading the inode?

> +
> +	if (dip->di_version >= 3 && be64_to_cpu(dip->di_ino) != fsino)
> +		return -EFSCORRUPTED;
> +
> +	*inuse = dip->di_mode != 0;
> +	return 0;
> +}
> +
> +/* Record extents that belong to inode btrees. */
> +STATIC int
> +xfs_repair_ialloc_extent_fn(
> +	struct xfs_btree_cur		*cur,
> +	struct xfs_rmap_irec		*rec,
> +	void				*priv)
> +{
> +	struct xfs_imap			imap;
> +	struct xfs_repair_ialloc	*ri = priv;
> +	struct xfs_repair_ialloc_extent	*rie;
> +	struct xfs_dinode		*dip;
> +	struct xfs_buf			*bp;
> +	struct xfs_mount		*mp = cur->bc_mp;
> +	xfs_ino_t			fsino;
> +	xfs_inofree_t			usedmask;
> +	xfs_fsblock_t			fsbno;
> +	xfs_agnumber_t			agno;
> +	xfs_agblock_t			agbno;
> +	xfs_agino_t			cdist;
> +	xfs_agino_t			startino;
> +	xfs_agino_t			clusterino;
> +	xfs_agino_t			nr_inodes;
> +	xfs_agino_t			inoalign;
> +	xfs_agino_t			agino;
> +	xfs_agino_t			rmino;
> +	uint16_t			fillmask;
> +	bool				inuse;
> +	int				blks_per_cluster;
> +	int				usedcount;
> +	int				error = 0;
> +
> +	if (xfs_scrub_should_terminate(ri->sc, &error))
> +		return error;
> +
> +	/* Fragment of the old btrees; dispose of them later. */
> +	if (rec->rm_owner == XFS_RMAP_OWN_INOBT) {
> +		fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
> +				rec->rm_startblock);
> +		return xfs_repair_collect_btree_extent(ri->sc, &ri->btlist,
> +				fsbno, rec->rm_blockcount);
> +	}
> +
> +	/* Skip extents which are not owned by this inode and fork. */
> +	if (rec->rm_owner != XFS_RMAP_OWN_INODES)
> +		return 0;
> +
> +	agno = cur->bc_private.a.agno;
> +	blks_per_cluster = xfs_icluster_size_fsb(mp);
> +	nr_inodes = XFS_OFFBNO_TO_AGINO(mp, blks_per_cluster, 0);
> +
> +	if (rec->rm_startblock % blks_per_cluster != 0)
> +		return -EFSCORRUPTED;
> +
> +	trace_xfs_repair_ialloc_extent_fn(mp, cur->bc_private.a.agno,
> +			rec->rm_startblock, rec->rm_blockcount, rec->rm_owner,
> +			rec->rm_offset, rec->rm_flags);
> +
> +	/*
> +	 * Determine the inode block alignment, and where the block
> +	 * ought to start if it's aligned properly.  On a sparse inode
> +	 * system the rmap doesn't have to start on an alignment boundary,
> +	 * but the record does.  On pre-sparse filesystems, we /must/
> +	 * start both rmap and inobt on an alignment boundary.
> +	 */
> +	inoalign = xfs_ialloc_cluster_alignment(mp);
> +	agbno = rec->rm_startblock;
> +	agino = XFS_OFFBNO_TO_AGINO(mp, agbno, 0);
> +	rmino = XFS_OFFBNO_TO_AGINO(mp, rounddown(agbno, inoalign), 0);
> +	if (!xfs_sb_version_hassparseinodes(&mp->m_sb) && agino != rmino)
> +		return -EFSCORRUPTED;
> +
> +	/*
> +	 * For each cluster in this blob of inode, we must calculate the
> +	 * properly aligned startino of that cluster, then iterate each
> +	 * cluster to fill in used and filled masks appropriately.  We
> +	 * then use the (startino, used, filled) information to construct
> +	 * the appropriate inode records.
> +	 */
> +	for (agbno = rec->rm_startblock;
> +	     agbno < rec->rm_startblock + rec->rm_blockcount;
> +	     agbno += blks_per_cluster) {

I see a few problems with indenting and "just over" long lines here.
Can you factor the loop internals into a separate function to reduce
that issue? Say xfs_repair_ialloc_process_cluster()?

> +		/* The per-AG inum of this inode cluster. */
> +		agino = XFS_OFFBNO_TO_AGINO(mp, agbno, 0);
> +
> +		/* The per-AG inum of the inobt record. */
> +		startino = rmino +
> +				rounddown(agino - rmino, XFS_INODES_PER_CHUNK);
> +		cdist = agino - startino;

What does "cdist" mean? I can guess at its meaning, but I don't
recall seeing the inode number offset into a cluster being referred
to as a distance before....

> +		/* Every inode in this holemask slot is filled. */
> +		fillmask = xfs_inobt_maskn(
> +				cdist / XFS_INODES_PER_HOLEMASK_BIT,
> +				nr_inodes / XFS_INODES_PER_HOLEMASK_BIT);
> +
> +		/* Grab the inode cluster buffer. */
> +		imap.im_blkno = XFS_AGB_TO_DADDR(mp, agno, agbno);
> +		imap.im_len = XFS_FSB_TO_BB(mp, blks_per_cluster);
> +		imap.im_boffset = 0;
> +
> +		error = xfs_imap_to_bp(mp, cur->bc_tp, &imap,
> +				&dip, &bp, 0, XFS_IGET_UNTRUSTED);
> +		if (error)
> +			return error;
> +
> +		usedmask = 0;
> +		usedcount = 0;
> +		/* Which inodes within this cluster are free? */
> +		for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
> +			fsino = XFS_AGINO_TO_INO(mp, cur->bc_private.a.agno,
> +					agino + clusterino);
> +			error = xfs_repair_ialloc_check_free(cur, bp, fsino,
> +					clusterino, &inuse);
> +			if (error) {
> +				xfs_trans_brelse(cur->bc_tp, bp);
> +				return error;
> +			}
> +			if (inuse) {
> +				usedcount++;
> +				usedmask |= XFS_INOBT_MASK(cdist + clusterino);
> +			}
> +		}
> +		xfs_trans_brelse(cur->bc_tp, bp);
> +
> +		/*
> +		 * If the last item in the list is our chunk record,
> +		 * update that.
> +		 */
> +		if (!list_empty(&ri->extlist)) {
> +			rie = list_last_entry(&ri->extlist,
> +					struct xfs_repair_ialloc_extent, list);
> +			if (rie->startino + XFS_INODES_PER_CHUNK > startino) {
> +				rie->freemask &= ~usedmask;
> +				rie->holemask &= ~fillmask;
> +				rie->count += nr_inodes;
> +				rie->usedcount += usedcount;
> +				continue;
> +			}
> +		}
> +
> +		/* New inode chunk; add to the list. */
> +		rie = kmem_alloc(sizeof(struct xfs_repair_ialloc_extent),
> +				KM_MAYFAIL);
> +		if (!rie)
> +			return -ENOMEM;
> +
> +		INIT_LIST_HEAD(&rie->list);
> +		rie->startino = startino;
> +		rie->freemask = XFS_INOBT_ALL_FREE & ~usedmask;
> +		rie->holemask = XFS_INOBT_ALL_FREE & ~fillmask;
> +		rie->count = nr_inodes;
> +		rie->usedcount = usedcount;
> +		list_add_tail(&rie->list, &ri->extlist);
> +		ri->nr_records++;
> +	}
> +
> +	return 0;
> +}

[....]

> +/* Repair both inode btrees. */
> +int
> +xfs_repair_iallocbt(
> +	struct xfs_scrub_context	*sc)
> +{
> +	struct xfs_repair_ialloc	ri;
> +	struct xfs_owner_info		oinfo;
> +	struct xfs_mount		*mp = sc->mp;
> +	struct xfs_buf			*bp;
> +	struct xfs_repair_ialloc_extent	*rie;
> +	struct xfs_repair_ialloc_extent	*n;
> +	struct xfs_agi			*agi;
> +	struct xfs_btree_cur		*cur = NULL;
> +	struct xfs_perag		*pag;
> +	xfs_fsblock_t			inofsb;
> +	xfs_fsblock_t			finofsb;
> +	xfs_extlen_t			nr_blocks;
> +	xfs_agino_t			old_count;
> +	xfs_agino_t			old_freecount;
> +	xfs_agino_t			freecount;
> +	unsigned int			count;
> +	unsigned int			usedcount;
> +	int				logflags;
> +	int				error = 0;
> +
> +	/* We require the rmapbt to rebuild anything. */
> +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> +		return -EOPNOTSUPP;

This could be factored similarly to the allocbt repair function.

> +
> +	xfs_scrub_perag_get(sc->mp, &sc->sa);
> +	pag = sc->sa.pag;
> +	/* Collect all reverse mappings for inode blocks. */
> +	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT);
> +	INIT_LIST_HEAD(&ri.extlist);
> +	xfs_repair_init_extent_list(&ri.btlist);
> +	ri.nr_records = 0;
> +	ri.sc = sc;
> +
> +	cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno);
> +	error = xfs_rmap_query_all(cur, xfs_repair_ialloc_extent_fn, &ri);
> +	if (error)
> +		goto out;
> +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> +	cur = NULL;
> +
> +	/* Do we actually have enough space to do this? */
> +	nr_blocks = xfs_iallocbt_calc_size(mp, ri.nr_records);
> +	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> +		nr_blocks *= 2;
> +	if (!xfs_repair_ag_has_space(pag, nr_blocks, XFS_AG_RESV_NONE)) {
> +		error = -ENOSPC;
> +		goto out;
> +	}
> +
> +	/* Invalidate all the inobt/finobt blocks in btlist. */
> +	error = xfs_repair_invalidate_blocks(sc, &ri.btlist);
> +	if (error)
> +		goto out;
> +
> +	agi = XFS_BUF_TO_AGI(sc->sa.agi_bp);
> +	/* Initialize new btree roots. */
> +	error = xfs_repair_alloc_ag_block(sc, &oinfo, &inofsb,
> +			XFS_AG_RESV_NONE);
> +	if (error)
> +		goto out;
> +	error = xfs_repair_init_btblock(sc, inofsb, &bp, XFS_BTNUM_INO,
> +			&xfs_inobt_buf_ops);
> +	if (error)
> +		goto out;
> +	agi->agi_root = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, inofsb));
> +	agi->agi_level = cpu_to_be32(1);
> +	logflags = XFS_AGI_ROOT | XFS_AGI_LEVEL;
> +
> +	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
> +		error = xfs_repair_alloc_ag_block(sc, &oinfo, &finofsb,
> +				mp->m_inotbt_nores ? XFS_AG_RESV_NONE :
> +						     XFS_AG_RESV_METADATA);
> +		if (error)
> +			goto out;
> +		error = xfs_repair_init_btblock(sc, finofsb, &bp,
> +				XFS_BTNUM_FINO, &xfs_inobt_buf_ops);
> +		if (error)
> +			goto out;
> +		agi->agi_free_root = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, finofsb));
> +		agi->agi_free_level = cpu_to_be32(1);
> +		logflags |= XFS_AGI_FREE_ROOT | XFS_AGI_FREE_LEVEL;
> +	}
> +
> +	xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp, logflags);
> +	error = xfs_repair_roll_ag_trans(sc);
> +	if (error)
> +		goto out;
> +
> +	/* Insert records into the new btrees. */
> +	count = 0;
> +	usedcount = 0;
> +	list_sort(NULL, &ri.extlist, xfs_repair_ialloc_extent_cmp);
> +	list_for_each_entry_safe(rie, n, &ri.extlist, list) {
> +		count += rie->count;
> +		usedcount += rie->usedcount;
> +
> +		error = xfs_repair_iallocbt_insert_rec(sc, rie);
> +		if (error)
> +			goto out;
> +
> +		list_del(&rie->list);
> +		kmem_free(rie);
> +	}
> +
> +
> +	/* Update the AGI counters. */
> +	agi = XFS_BUF_TO_AGI(sc->sa.agi_bp);
> +	old_count = be32_to_cpu(agi->agi_count);
> +	old_freecount = be32_to_cpu(agi->agi_freecount);
> +	freecount = count - usedcount;
> +
> +	xfs_repair_mod_ino_counts(sc, old_count, count, old_freecount,
> +			freecount);
> +
> +	if (count != old_count) {
> +		if (sc->sa.pag->pagi_init)
> +			sc->sa.pag->pagi_count = count;
> +		agi->agi_count = cpu_to_be32(count);
> +		xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp, XFS_AGI_COUNT);
> +	}
> +
> +	if (freecount != old_freecount) {
> +		if (sc->sa.pag->pagi_init)
> +			sc->sa.pag->pagi_freecount = freecount;

We've read the AGI buffer in at this point, right? So it is
guaranteed that pagi_init is true, right?

> +		agi->agi_freecount = cpu_to_be32(freecount);
> +		xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp, XFS_AGI_FREECOUNT);
> +	}
> +
> +	/* Free the old inode btree blocks if they're not in use. */
> +	return xfs_repair_reap_btree_extents(sc, &ri.btlist, &oinfo,
> +			XFS_AG_RESV_NONE);
> +out:

out_error, perhaps, to distinguish it from the normal function
return path? (and perhaps apply that to all the previous main repair
functions on factoring?)

> +	if (cur)
> +		xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
> +	xfs_repair_cancel_btree_extents(sc, &ri.btlist);
> +	list_for_each_entry_safe(rie, n, &ri.extlist, list) {
> +		list_del(&rie->list);
> +		kmem_free(rie);
> +	}
> +	return error;
> +}

-Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 01/14] xfs: repair the AGF and AGFL
  2018-06-04  1:52   ` Dave Chinner
@ 2018-06-05 23:18     ` Darrick J. Wong
  2018-06-06  4:06       ` Dave Chinner
  0 siblings, 1 reply; 35+ messages in thread
From: Darrick J. Wong @ 2018-06-05 23:18 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Jun 04, 2018 at 11:52:55AM +1000, Dave Chinner wrote:
> On Wed, May 30, 2018 at 12:30:45PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Regenerate the AGF and AGFL from the rmap data.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> [...]
> 
> > +/* Repair the AGF. */
> > +int
> > +xfs_repair_agf(
> > +	struct xfs_scrub_context	*sc)
> > +{
> > +	struct xfs_repair_find_ag_btree	fab[] = {
> > +		{
> > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > +			.buf_ops = &xfs_allocbt_buf_ops,
> > +			.magic = XFS_ABTB_CRC_MAGIC,
> > +		},
> > +		{
> > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > +			.buf_ops = &xfs_allocbt_buf_ops,
> > +			.magic = XFS_ABTC_CRC_MAGIC,
> > +		},
> > +		{
> > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > +			.buf_ops = &xfs_rmapbt_buf_ops,
> > +			.magic = XFS_RMAP_CRC_MAGIC,
> > +		},
> > +		{
> > +			.rmap_owner = XFS_RMAP_OWN_REFC,
> > +			.buf_ops = &xfs_refcountbt_buf_ops,
> > +			.magic = XFS_REFC_CRC_MAGIC,
> > +		},
> > +		{
> > +			.buf_ops = NULL,
> > +		},
> > +	};
> > +	struct xfs_repair_agf_allocbt	raa;
> > +	struct xfs_agf			old_agf;
> > +	struct xfs_mount		*mp = sc->mp;
> > +	struct xfs_buf			*agf_bp;
> > +	struct xfs_buf			*agfl_bp;
> > +	struct xfs_agf			*agf;
> > +	struct xfs_btree_cur		*cur = NULL;
> > +	struct xfs_perag		*pag;
> > +	xfs_agblock_t			blocks;
> > +	xfs_agblock_t			freesp_blocks;
> > +	int64_t				delta_fdblocks = 0;
> > +	int				error;
> > +
> > +	/* We require the rmapbt to rebuild anything. */
> > +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> > +		return -EOPNOTSUPP;
> > +
> > +	xfs_scrub_perag_get(sc->mp, &sc->sa);
> > +	pag = sc->sa.pag;
> > +	memset(&raa, 0, sizeof(raa));
> > +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> > +			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGF_DADDR(mp)),
> > +			XFS_FSS_TO_BB(mp, 1), 0, &agf_bp, NULL);
> > +	if (error)
> > +		return error;
> > +	agf_bp->b_ops = &xfs_agf_buf_ops;
> > +
> > +	/*
> > +	 * Load the AGFL so that we can screen out OWN_AG blocks that
> > +	 * are on the AGFL now; these blocks might have once been part
> > +	 * of the bno/cnt/rmap btrees but are not now.
> > +	 */
> > +	error = xfs_alloc_read_agfl(mp, sc->tp, sc->sa.agno, &agfl_bp);
> > +	if (error)
> > +		return error;
> > +	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(agf_bp), agfl_bp,
> > +			xfs_repair_agf_check_agfl_block, sc);
> > +	if (error)
> > +		return error;
> 
> This is a bit of a chicken/egg situation, isn't it? We haven't
> repaired the AGFL yet, so how do we know what is valid here?

Yep.  The AGF is corrupt, so we have to trust the AGFL contents because
we can't do any serious cross-referencing with any of the btrees rooted
in the AGF.  If the AGFL contents are obviously bad then we'll bail out.

> > +	/* Find the btree roots. */
> > +	error = xfs_repair_find_ag_btree_roots(sc, agf_bp, fab, agfl_bp);
> > +	if (error)
> > +		return error;
> > +	if (fab[0].root == NULLAGBLOCK || fab[0].height > XFS_BTREE_MAXLEVELS ||
> > +	    fab[1].root == NULLAGBLOCK || fab[1].height > XFS_BTREE_MAXLEVELS ||
> > +	    fab[2].root == NULLAGBLOCK || fab[2].height > XFS_BTREE_MAXLEVELS)
> > +		return -EFSCORRUPTED;
> > +	if (xfs_sb_version_hasreflink(&mp->m_sb) &&
> > +	    (fab[3].root == NULLAGBLOCK || fab[3].height > XFS_BTREE_MAXLEVELS))
> > +		return -EFSCORRUPTED;
> > +
> > +	/* Start rewriting the header. */
> > +	agf = XFS_BUF_TO_AGF(agf_bp);
> > +	memcpy(&old_agf, agf, sizeof(old_agf));
> > +
> > +	/*
> > +	 * We relied on the rmapbt to reconstruct the AGF.  If we get a
> > +	 * different root then something's seriously wrong.
> > +	 */
> > +	if (be32_to_cpu(old_agf.agf_roots[XFS_BTNUM_RMAPi]) != fab[2].root)
> > +		return -EFSCORRUPTED;
> > +	memset(agf, 0, mp->m_sb.sb_sectsize);
> > +	agf->agf_magicnum = cpu_to_be32(XFS_AGF_MAGIC);
> > +	agf->agf_versionnum = cpu_to_be32(XFS_AGF_VERSION);
> > +	agf->agf_seqno = cpu_to_be32(sc->sa.agno);
> > +	agf->agf_length = cpu_to_be32(xfs_ag_block_count(mp, sc->sa.agno));
> > +	agf->agf_roots[XFS_BTNUM_BNOi] = cpu_to_be32(fab[0].root);
> > +	agf->agf_roots[XFS_BTNUM_CNTi] = cpu_to_be32(fab[1].root);
> > +	agf->agf_roots[XFS_BTNUM_RMAPi] = cpu_to_be32(fab[2].root);
> > +	agf->agf_levels[XFS_BTNUM_BNOi] = cpu_to_be32(fab[0].height);
> > +	agf->agf_levels[XFS_BTNUM_CNTi] = cpu_to_be32(fab[1].height);
> > +	agf->agf_levels[XFS_BTNUM_RMAPi] = cpu_to_be32(fab[2].height);
> > +	agf->agf_flfirst = old_agf.agf_flfirst;
> > +	agf->agf_fllast = old_agf.agf_fllast;
> > +	agf->agf_flcount = old_agf.agf_flcount;
> > +	if (xfs_sb_version_hascrc(&mp->m_sb))
> > +		uuid_copy(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid);
> > +	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > +		agf->agf_refcount_root = cpu_to_be32(fab[3].root);
> > +		agf->agf_refcount_level = cpu_to_be32(fab[3].height);
> > +	}
> 
> Can we factor this function along rebuild operation lines?

Yes...

> That will help document all the different pieces it is putting
> together. E.g move the AGF header init to before
> xfs_repair_find_ag_btree_roots(), and then pass it into
> xfs_repair_agf_rebuild_roots() which contains the above fab specific
> code.

...however, that's the second (and admittedly not well documented)
chicken-and-egg problem -- we find the agf btree roots by probing the rmapbt,
which is rooted in the agf.  So xfs_repair_find_ag_btree_roots has to be
fed the old agf_bp buffer, and if that blows up then we bail out without
changing anything.

> 
> > +
> > +	/* Update the AGF counters from the bnobt. */
> > +	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
> > +			XFS_BTNUM_BNO);
> > +	raa.sc = sc;
> > +	error = xfs_alloc_query_all(cur, xfs_repair_agf_walk_allocbt, &raa);
> > +	if (error)
> > +		goto err;
> > +	error = xfs_btree_count_blocks(cur, &blocks);
> > +	if (error)
> > +		goto err;
> > +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> > +	freesp_blocks = blocks - 1;
> > +	agf->agf_freeblks = cpu_to_be32(raa.freeblks);
> > +	agf->agf_longest = cpu_to_be32(raa.longest);
> > +
> > +	/* Update the AGF counters from the cntbt. */
> > +	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
> > +			XFS_BTNUM_CNT);
> > +	error = xfs_btree_count_blocks(cur, &blocks);
> > +	if (error)
> > +		goto err;
> > +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> > +	freesp_blocks += blocks - 1;
> > +
> > +	/* Update the AGF counters from the rmapbt. */
> > +	cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno);
> > +	error = xfs_btree_count_blocks(cur, &blocks);
> > +	if (error)
> > +		goto err;
> > +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> > +	agf->agf_rmap_blocks = cpu_to_be32(blocks);
> > +	freesp_blocks += blocks - 1;
> > +
> > +	/* Update the AGF counters from the refcountbt. */
> > +	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > +		cur = xfs_refcountbt_init_cursor(mp, sc->tp, agf_bp,
> > +				sc->sa.agno, NULL);
> > +		error = xfs_btree_count_blocks(cur, &blocks);
> > +		if (error)
> > +			goto err;
> > +		xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> > +		agf->agf_refcount_blocks = cpu_to_be32(blocks);
> > +	}
> > +	agf->agf_btreeblks = cpu_to_be32(freesp_blocks);
> > +	cur = NULL;
> 
> Then this is xfs_repair_agf_rebuild_counters()

Ok.

> > +
> > +	/* Trigger reinitialization of the in-core data. */
> > +	if (raa.freeblks != be32_to_cpu(old_agf.agf_freeblks)) {
> > +		delta_fdblocks += (int64_t)raa.freeblks -
> > +				be32_to_cpu(old_agf.agf_freeblks);
> > +		if (pag->pagf_init)
> > +			pag->pagf_freeblks = be32_to_cpu(agf->agf_freeblks);
> > +	}
> > +
> > +	if (freesp_blocks != be32_to_cpu(old_agf.agf_btreeblks)) {
> > +		delta_fdblocks += (int64_t)freesp_blocks -
> > +				be32_to_cpu(old_agf.agf_btreeblks);
> > +		if (pag->pagf_init)
> > +			pag->pagf_btreeblks = be32_to_cpu(agf->agf_btreeblks);
> > +	}
> > +
> > +	if (pag->pagf_init &&
> > +	    (raa.longest != be32_to_cpu(old_agf.agf_longest) ||
> > +	     fab[0].height != be32_to_cpu(old_agf.agf_levels[XFS_BTNUM_BNOi]) ||
> > +	     fab[1].height != be32_to_cpu(old_agf.agf_levels[XFS_BTNUM_CNTi]) ||
> > +	     fab[2].height != be32_to_cpu(old_agf.agf_levels[XFS_BTNUM_RMAPi]) ||
> > +	     fab[3].height != be32_to_cpu(old_agf.agf_refcount_level))) {
> > +		pag->pagf_longest = be32_to_cpu(agf->agf_longest);
> > +		pag->pagf_levels[XFS_BTNUM_BNOi] =
> > +				be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]);
> > +		pag->pagf_levels[XFS_BTNUM_CNTi] =
> > +				be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
> > +		pag->pagf_levels[XFS_BTNUM_RMAPi] =
> > +				be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
> > +		pag->pagf_refcount_level =
> > +				be32_to_cpu(agf->agf_refcount_level);
> > +	}
> > +
> > +	error = xfs_repair_mod_fdblocks(sc, delta_fdblocks);
> > +	if (error)
> > +		goto err;
> 
> And xfs_repair_agf_update_pag().

Ok.

> > +
> > +	/* Write this to disk. */
> > +	xfs_trans_buf_set_type(sc->tp, agf_bp, XFS_BLFT_AGF_BUF);
> > +	xfs_trans_log_buf(sc->tp, agf_bp, 0, BBTOB(agf_bp->b_length) - 1);
> > +	return error;
> > +
> > +err:
> > +	if (cur)
> > +		xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR :
> > +				XFS_BTREE_NOERROR);
> > +	memcpy(agf, &old_agf, sizeof(old_agf));
> > +	return error;
> > +}
> > +
> > +/* AGFL */
> > +
> > +struct xfs_repair_agfl {
> > +	struct xfs_repair_extent_list	freesp_list;
> > +	struct xfs_repair_extent_list	agmeta_list;
> > +	struct xfs_scrub_context	*sc;
> > +};
> > +
> > +/* Record all freespace information. */
> > +STATIC int
> > +xfs_repair_agfl_rmap_fn(
> > +	struct xfs_btree_cur		*cur,
> > +	struct xfs_rmap_irec		*rec,
> > +	void				*priv)
> > +{
> > +	struct xfs_repair_agfl		*ra = priv;
> > +	struct xfs_buf			*bp;
> > +	xfs_fsblock_t			fsb;
> > +	int				i;
> > +	int				error = 0;
> > +
> > +	if (xfs_scrub_should_terminate(ra->sc, &error))
> > +		return error;
> > +
> > +	/* Record all the OWN_AG blocks... */
> > +	if (rec->rm_owner == XFS_RMAP_OWN_AG) {
> > +		fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
> > +				rec->rm_startblock);
> > +		error = xfs_repair_collect_btree_extent(ra->sc,
> > +				&ra->freesp_list, fsb, rec->rm_blockcount);
> > +		if (error)
> > +			return error;
> > +	}
> > +
> > +	/* ...and all the rmapbt blocks... */
> > +	for (i = 0; i < cur->bc_nlevels && cur->bc_ptrs[i] == 1; i++) {
> 
> What is the significance of "cur->bc_ptrs[i] == 1"?
> 
> This loop looks like it is walking the btree path to this leaf, but
> bc_ptrs[] will only have a "1" in it if we are at the left-most edge
> of the tree, right? so what about all the other btree blocks?

Close.  We're walking up the tree from the leaf towards the root.  For
each level, we assume that if bc_ptrs[level] == 1, then this is the
first time we've seen the block at that level, so we remember that we
saw this rmapbt block.  bc_ptrs is the offset within a block, not the
offset for the entire level.

So if our rmapbt tree is:

   4
 / | \
1  2  3

Pretend for this example that each leaf block has 100 rmap records.  For
the first rmap record, we'll observe that bc_ptrs[0] == 1, so we record
that we saw block 1.  Then we observe that bc_ptrs[1] == 1, so we record
block 4.  agmeta_list is [1, 4].

For the second rmap record, we see that bc_ptrs[0] == 2, so we exit the
loop.  agmeta_list remains [1, 4].

For the 101st rmap record, we've moved onto leaf block 2.  Now
bc_ptrs[0] == 1 again, so we record that we saw block 2.  We see that
bc_ptrs[1] == 2, so we exit the loop.  agmeta_list = [1, 4, 2].

For the 102nd rmap, bc_ptrs[0] == 2, so we exit.

For the 201st rmap record, we've moved on to leaf block 3.  bc_ptrs[0]
== 1, so we add 3 to agmeta_list.  [1, 4, 2, 3].
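
The first-visit rule in that walkthrough can be modeled outside the
kernel.  The sketch below is a minimal standalone model (not XFS code):
ptrs[] plays the role of cur->bc_ptrs, the 1-based offset of the cursor
within the block at each level, and a level's block is recorded only
while every level at or below it sits at offset 1.

```c
#include <assert.h>

/*
 * Model of "record a btree block the first time we see it": return the
 * number of levels (counting up from the leaf) whose current block is
 * being visited for the first time, i.e. whose offset chain is all 1s.
 */
static int record_new_blocks(const int *ptrs, int nlevels)
{
	int level;

	for (level = 0; level < nlevels && ptrs[level] == 1; level++)
		; /* block at this level is seen for the first time */
	return level;
}
```

With the three-leaf tree above: the first rmap record yields 2 (leaf 1
and root 4), the second yields 0, and the 101st yields 1 (leaf 2 only),
matching the agmeta_list progression [1, 4], [1, 4], [1, 4, 2].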

> > +		xfs_btree_get_block(cur, i, &bp);
> > +		if (!bp)
> > +			continue;
> > +		fsb = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
> > +		error = xfs_repair_collect_btree_extent(ra->sc,
> > +				&ra->agmeta_list, fsb, 1);
> > +		if (error)
> > +			return error;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +/* Add a btree block to the agmeta list. */
> > +STATIC int
> > +xfs_repair_agfl_visit_btblock(
> > +	struct xfs_btree_cur		*cur,
> > +	int				level,
> > +	void				*priv)
> > +{
> > +	struct xfs_repair_agfl		*ra = priv;
> > +	struct xfs_buf			*bp;
> > +	xfs_fsblock_t			fsb;
> > +	int				error = 0;
> > +
> > +	if (xfs_scrub_should_terminate(ra->sc, &error))
> > +		return error;
> > +
> > +	xfs_btree_get_block(cur, level, &bp);
> > +	if (!bp)
> > +		return 0;
> > +
> > +	fsb = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
> > +	return xfs_repair_collect_btree_extent(ra->sc, &ra->agmeta_list,
> > +			fsb, 1);
> > +}
> > +
> > +/* Repair the AGFL. */
> > +int
> > +xfs_repair_agfl(
> > +	struct xfs_scrub_context	*sc)
> > +{
> > +	struct xfs_repair_agfl		ra;
> > +	struct xfs_owner_info		oinfo;
> > +	struct xfs_mount		*mp = sc->mp;
> > +	struct xfs_buf			*agf_bp;
> > +	struct xfs_buf			*agfl_bp;
> > +	struct xfs_agf			*agf;
> > +	struct xfs_agfl			*agfl;
> > +	struct xfs_btree_cur		*cur = NULL;
> > +	__be32				*agfl_bno;
> > +	struct xfs_repair_extent	*rae;
> > +	struct xfs_repair_extent	*n;
> > +	xfs_agblock_t			flcount;
> > +	xfs_agblock_t			agbno;
> > +	xfs_agblock_t			bno;
> > +	xfs_agblock_t			old_flcount;
> > +	int				error;
> 
> Can we factor this function a little?

Or a lot. :)

> > +
> > +	/* We require the rmapbt to rebuild anything. */
> > +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> > +		return -EOPNOTSUPP;
> > +
> > +	xfs_scrub_perag_get(sc->mp, &sc->sa);
> > +	xfs_repair_init_extent_list(&ra.freesp_list);
> > +	xfs_repair_init_extent_list(&ra.agmeta_list);
> > +	ra.sc = sc;
> > +
> > +	error = xfs_alloc_read_agf(mp, sc->tp, sc->sa.agno, 0, &agf_bp);
> > +	if (error)
> > +		return error;
> > +	if (!agf_bp)
> > +		return -ENOMEM;
> > +
> > +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> > +			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGFL_DADDR(mp)),
> > +			XFS_FSS_TO_BB(mp, 1), 0, &agfl_bp, NULL);
> > +	if (error)
> > +		return error;
> > +	agfl_bp->b_ops = &xfs_agfl_buf_ops;
> 
> Be nice to have a __xfs_alloc_read_agfl() function that didn't set
> the ops, and have this and xfs_alloc_read_agfl() both call it.

Huh?  xfs_alloc_read_agfl always reads the agfl buffer with
&xfs_agfl_buf_ops, why would we want to call it without the verifier?

It's only scrub that gets to do screwy things like read buffers with no
verifier.  libxfs functions should never do that.

<confused>

> From here:
> > +
> > +	/* Find all space used by the free space btrees & rmapbt. */
> > +	cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno);
> > +	error = xfs_rmap_query_all(cur, xfs_repair_agfl_rmap_fn, &ra);
> > +	if (error)
> > +		goto err;
> > +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> > +
> > +	/* Find all space used by bnobt. */
> > +	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
> > +			XFS_BTNUM_BNO);
> > +	error = xfs_btree_visit_blocks(cur, xfs_repair_agfl_visit_btblock,
> > +			&ra);
> > +	if (error)
> > +		goto err;
> > +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> > +
> > +	/* Find all space used by cntbt. */
> > +	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
> > +			XFS_BTNUM_CNT);
> > +	error = xfs_btree_visit_blocks(cur, xfs_repair_agfl_visit_btblock,
> > +			&ra);
> > +	if (error)
> > +		goto err;
> > +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> > +	cur = NULL;
> > +
> > +	/*
> > +	 * Drop the freesp meta blocks that are in use by btrees.
> > +	 * The remaining blocks /should/ be AGFL blocks.
> > +	 */
> > +	error = xfs_repair_subtract_extents(sc, &ra.freesp_list,
> > +			&ra.agmeta_list);
> > +	if (error)
> > +		goto err;
> > +	xfs_repair_cancel_btree_extents(sc, &ra.agmeta_list);
> 
> This whole section could go into a separate function, say
> xfs_repair_agfl_find_extents()?

Ok.

> > +
> > +	/* Calculate the new AGFL size. */
> > +	flcount = 0;
> > +	for_each_xfs_repair_extent_safe(rae, n, &ra.freesp_list) {
> 
> Why "safe" - we're not removing anything from the list here.

There's no for_each_xfs_repair_extent().  I'll make one.

> > +		for (bno = 0; bno < rae->len; bno++) {
> > +			if (flcount >= xfs_agfl_size(mp) - 1)
> 
> What's the reason for the magic "- 1" there?

I'm not sure I remember anymore, and my notes offer nothing.

> > +				break;
> > +			flcount++;
> > +		}
> > +	}
> 
> This seems like a complex way of doing:
> 
> 	for_each_xfs_repair_extent(rae, n, &ra.freesp_list) {
> 		flcount += rae->len;
> 		if (flcount >= xfs_agfl_size(mp) - 1) {
> 			flcount = xfs_agfl_size(mp) - 1;
> 			break;
> 		}
> 	}

Fixed.
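
For what it's worth, the capped sum can be modeled standalone (here
agfl_size stands in for xfs_agfl_size(mp), and the "- 1" cap is kept
as-is from the patch under discussion):

```c
#include <assert.h>

/*
 * Standalone model of the capped flcount computation: sum the extent
 * lengths, clamping at agfl_size - 1 as soon as the cap is reached.
 */
static unsigned int calc_flcount(const unsigned int *lens, int n,
				 unsigned int agfl_size)
{
	unsigned int flcount = 0;
	int i;

	for (i = 0; i < n; i++) {
		flcount += lens[i];
		if (flcount >= agfl_size - 1) {
			flcount = agfl_size - 1;
			break;
		}
	}
	return flcount;
}
```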

> 
> > +	/* Update fdblocks if flcount changed. */
> > +	agf = XFS_BUF_TO_AGF(agf_bp);
> > +	old_flcount = be32_to_cpu(agf->agf_flcount);
> > +	if (flcount != old_flcount) {
> > +		int64_t	delta_fdblocks = (int64_t)flcount - old_flcount;
> > +
> > +		error = xfs_repair_mod_fdblocks(sc, delta_fdblocks);
> > +		if (error)
> > +			goto err;
> > +		if (sc->sa.pag->pagf_init)
> > +			sc->sa.pag->pagf_flcount = flcount;
> 
> No need to check pagf_init here - we've had a successful call to
> xfs_alloc_read_agf() earlier and that means pagf has been
> initialised.

Ok.

> > +	}
> > +
> > +	/* Update the AGF pointers. */
> > +	agf->agf_flfirst = cpu_to_be32(1);
> 
> Why index 1? What is in index 0? (see earlier questions about magic
> numbers :)

Don't remember.  Will set the new list to start at 0 and end at
flcount-1.

> > +	agf->agf_flcount = cpu_to_be32(flcount);
> > +	agf->agf_fllast = cpu_to_be32(flcount);
> > +
> > +	/* Start rewriting the header. */
> > +	agfl = XFS_BUF_TO_AGFL(agfl_bp);
> > +	memset(agfl, 0xFF, mp->m_sb.sb_sectsize);
> > +	agfl->agfl_magicnum = cpu_to_be32(XFS_AGFL_MAGIC);
> > +	agfl->agfl_seqno = cpu_to_be32(sc->sa.agno);
> > +	uuid_copy(&agfl->agfl_uuid, &mp->m_sb.sb_meta_uuid);
> > +
> > +	/* Fill the AGFL with the remaining blocks. */
> > +	flcount = 0;
> > +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agfl_bp);
> > +	for_each_xfs_repair_extent_safe(rae, n, &ra.freesp_list) {
> > +		agbno = XFS_FSB_TO_AGBNO(mp, rae->fsbno);
> > +
> > +		trace_xfs_repair_agfl_insert(mp, sc->sa.agno, agbno, rae->len);
> > +
> > +		for (bno = 0; bno < rae->len; bno++) {
> > +			if (flcount >= xfs_agfl_size(mp) - 1)
> > +				break;
> > +			agfl_bno[flcount + 1] = cpu_to_be32(agbno + bno);
> > +			flcount++;
> > +		}
> > +		rae->fsbno += bno;
> > +		rae->len -= bno;
> 
> This is a bit weird, using "bno" as an offset. But, also, there's
> that magic "don't use index 0" thing again :P

Ok.

> > +		if (rae->len)
> > +			break;
> > +		list_del(&rae->list);
> > +		kmem_free(rae);
> > +	}
> > +
> > +	/* Write AGF and AGFL to disk. */
> > +	xfs_alloc_log_agf(sc->tp, agf_bp,
> > +			XFS_AGF_FLFIRST | XFS_AGF_FLLAST | XFS_AGF_FLCOUNT);
> > +	xfs_trans_buf_set_type(sc->tp, agfl_bp, XFS_BLFT_AGFL_BUF);
> > +	xfs_trans_log_buf(sc->tp, agfl_bp, 0, BBTOB(agfl_bp->b_length) - 1);
> > +
> > +	/* Dump any AGFL overflow. */
> > +	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
> > +	return xfs_repair_reap_btree_extents(sc, &ra.freesp_list, &oinfo,
> > +			XFS_AG_RESV_AGFL);
> > +err:
> > +	xfs_repair_cancel_btree_extents(sc, &ra.agmeta_list);
> > +	xfs_repair_cancel_btree_extents(sc, &ra.freesp_list);
> > +	if (cur)
> > +		xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR :
> > +				XFS_BTREE_NOERROR);
> > +	return error;
> > +}
> > diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> > index e3e8fba1c99c..5f31dc8af505 100644
> > --- a/fs/xfs/scrub/repair.c
> > +++ b/fs/xfs/scrub/repair.c
> > @@ -1087,3 +1087,33 @@ xfs_repair_ino_dqattach(
> >  
> >  	return error;
> >  }
> > +
> > +/*
> > + * We changed this AGF's free block count, so now we need to reset the global
> > + * counters.  We use the transaction to update the global counters, so if the
> > + * AG free counts were low we have to ask the transaction for more block
> > + * reservation before decreasing fdblocks.
> > + *
> > + * XXX: We ought to have some mechanism for checking and fixing the superblock
> > + * counters (particularly if we're close to ENOSPC) but that's left as an open
> > + * research question for now.
> > + */
> > +int
> > +xfs_repair_mod_fdblocks(
> > +	struct xfs_scrub_context	*sc,
> > +	int64_t				delta_fdblocks)
> > +{
> > +	int				error;
> > +
> > +	if (delta_fdblocks == 0)
> > +		return 0;
> > +
> > +	if (delta_fdblocks < 0) {
> > +		error = xfs_trans_reserve_more(sc->tp, -delta_fdblocks, 0);
> > +		if (error)
> > +			return error;
> > +	}
> > +
> > +	xfs_trans_mod_sb(sc->tp, XFS_TRANS_SB_FDBLOCKS, delta_fdblocks);
> 
> This seems a little hacky - it's working around a transaction
> reservation overflow warning, right?

More than that -- we're trying to avoid the situation where the incore
free block counter goes negative.  Things go south pretty quickly when
that happens because transaction reservations succeed when there's not
enough free space to accommodate them.  We'd rather error out to
userspace and have the admin unmount and xfs_repair than risk letting
the fs really blow up.

Note that this function has to be called before repair dirties anything
in the repair transaction so we're still at a place where we could back
out with no harm done.
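
That invariant can be sketched as a standalone model (not the XFS
implementation): any change to the counter that would take it negative
is rejected up front, the way xfs_trans_reserve_more() must succeed
before the repair dirties anything.

```c
#include <assert.h>

/*
 * Model of the rule described above: never let the incore free-block
 * counter go negative.  A caller that needs blocks takes them out of
 * the counter first and backs out cleanly if that fails.
 */
static int mod_fdblocks(long long *fdblocks, long long delta)
{
	if (*fdblocks + delta < 0)
		return -1;	/* would go negative: fail, like -ENOSPC */
	*fdblocks += delta;
	return 0;
}
```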

> Would it simply be better to
> have a different type for xfs_trans_mod_sb() that didn't shut down
> the filesystem on transaction reservation overflows here? e.g
> XFS_TRANS_SB_FDBLOCKS_REPAIR? That would get rid of the need for
> xfs_trans_reserve_more() code, right?
> 
> [...]
> > +/*
> > + * Try to reserve more blocks for a transaction.  The single use case we
> > + * support is for online repair -- use a transaction to gather data without
> > + * fear of btree cycle deadlocks; calculate how many blocks we really need
> > + * from that data; and only then start modifying data.  This can fail due to
> > + * ENOSPC, so we have to be able to cancel the transaction.
> > + */
> > +int
> > +xfs_trans_reserve_more(
> > +	struct xfs_trans	*tp,
> > +	uint			blocks,
> > +	uint			rtextents)
> > +{
> > +	struct xfs_mount	*mp = tp->t_mountp;
> > +	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
> > +	int			error = 0;
> > +
> > +	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
> > +
> > +	/*
> > +	 * Attempt to reserve the needed disk blocks by decrementing
> > +	 * the number needed from the number available.  This will
> > +	 * fail if the count would go below zero.
> > +	 */
> > +	if (blocks > 0) {
> > +		error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
> > +		if (error != 0)
> > +			return -ENOSPC;
> 
> 		if (error)

Ok.

--D

> > +		tp->t_blk_res += blocks;
> > +	}
> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 02/14] xfs: repair the AGI
  2018-06-04  1:56   ` Dave Chinner
@ 2018-06-05 23:54     ` Darrick J. Wong
  0 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2018-06-05 23:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Jun 04, 2018 at 11:56:38AM +1000, Dave Chinner wrote:
> On Wed, May 30, 2018 at 12:30:52PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Rebuild the AGI header items with some help from the rmapbt.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Looks OK, but I can't help but think it should be structured similar
> to the AGF rebuild, even though the functions would be smaller and
> simpler...
> 
> Thoughts?

Will do.

--D

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 03/14] xfs: repair free space btrees
  2018-06-04  2:12   ` Dave Chinner
@ 2018-06-06  1:50     ` Darrick J. Wong
  2018-06-06  3:34       ` Dave Chinner
  0 siblings, 1 reply; 35+ messages in thread
From: Darrick J. Wong @ 2018-06-06  1:50 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Jun 04, 2018 at 12:12:34PM +1000, Dave Chinner wrote:
> On Wed, May 30, 2018 at 12:30:58PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Rebuild the free space btrees from the gaps in the rmap btree.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/Makefile             |    1 
> >  fs/xfs/scrub/alloc.c        |    1 
> >  fs/xfs/scrub/alloc_repair.c |  430 +++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/scrub/common.c       |    8 +
> >  fs/xfs/scrub/repair.h       |    2 
> >  fs/xfs/scrub/scrub.c        |    4 
> >  6 files changed, 442 insertions(+), 4 deletions(-)
> >  create mode 100644 fs/xfs/scrub/alloc_repair.c
> > 
> > 
> > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > index 29fe115f29d5..abe035ad0aa4 100644
> > --- a/fs/xfs/Makefile
> > +++ b/fs/xfs/Makefile
> > @@ -175,6 +175,7 @@ xfs-$(CONFIG_XFS_QUOTA)		+= scrub/quota.o
> >  ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y)
> >  xfs-y				+= $(addprefix scrub/, \
> >  				   agheader_repair.o \
> > +				   alloc_repair.o \
> >  				   repair.o \
> >  				   )
> >  endif
> > diff --git a/fs/xfs/scrub/alloc.c b/fs/xfs/scrub/alloc.c
> > index 941a0a55224e..fe7e8bdf4a52 100644
> > --- a/fs/xfs/scrub/alloc.c
> > +++ b/fs/xfs/scrub/alloc.c
> > @@ -29,7 +29,6 @@
> >  #include "xfs_log_format.h"
> >  #include "xfs_trans.h"
> >  #include "xfs_sb.h"
> > -#include "xfs_alloc.h"
> >  #include "xfs_rmap.h"
> >  #include "xfs_alloc.h"
> >  #include "scrub/xfs_scrub.h"
> > diff --git a/fs/xfs/scrub/alloc_repair.c b/fs/xfs/scrub/alloc_repair.c
> > new file mode 100644
> > index 000000000000..5a81713a69cd
> > --- /dev/null
> > +++ b/fs/xfs/scrub/alloc_repair.c
> > @@ -0,0 +1,430 @@
> > +/*
> > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > + *
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.

Will clean this up by spdxinfying it.

// SPDX-License-Identifier: GPL-2.0+
/*
 * Copyright (C) 2018 Oracle.  All Rights Reserved.
 * Author: Darrick J. Wong <darrick.wong@oracle.com>
 */

> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation,
> > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > + */
> > +#include "xfs.h"
> > +#include "xfs_fs.h"
> > +#include "xfs_shared.h"
> > +#include "xfs_format.h"
> > +#include "xfs_trans_resv.h"
> > +#include "xfs_mount.h"
> > +#include "xfs_defer.h"
> > +#include "xfs_btree.h"
> > +#include "xfs_bit.h"
> > +#include "xfs_log_format.h"
> > +#include "xfs_trans.h"
> > +#include "xfs_sb.h"
> > +#include "xfs_alloc.h"
> > +#include "xfs_alloc_btree.h"
> > +#include "xfs_rmap.h"
> > +#include "xfs_rmap_btree.h"
> > +#include "xfs_inode.h"
> > +#include "xfs_refcount.h"
> > +#include "scrub/xfs_scrub.h"
> > +#include "scrub/scrub.h"
> > +#include "scrub/common.h"
> > +#include "scrub/btree.h"
> > +#include "scrub/trace.h"
> > +#include "scrub/repair.h"
> > +
> > +/* Free space btree repair. */
> 
> Can you add a decription of the algorithm used here.

Ok.

/*
 * Free Space Btree Repair
 * =======================
 *
 * The reverse mappings are supposed to record all space usage for the
 * entire AG.  Therefore, we can recalculate the free extents in an AG
 * by looking for gaps in the physical extents recorded in the rmapbt.
 * On a reflink filesystem this is a little trickier because the rmap
 * records are allowed to overlap.
 *
 * We derive which blocks belonged to the old bnobt/cntbt by recording
 * all the OWN_AG extents and subtracting out the blocks owned by all
 * other OWN_AG metadata: the rmapbt blocks visited while iterating the
 * reverse mappings and the AGFL blocks.
 *
 * Once we have both of those pieces, we can reconstruct the bnobt and
 * cntbt by blowing out the free block state and freeing all the extents
 * that we found.  This adds the requirement that we can't have any busy
 * extents in the AG because the busy code cannot handle duplicate
 * records.
 *
 * Note that we can only rebuild both free space btrees at the same
 * time.
 */
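
As a sanity check, the gap-walk that comment describes can be modeled
in userspace.  The sketch below is an illustration only, with plain
integers in place of the xfs_rmap_irec fields: records are visited in
start-block order, a gap before a record becomes a free extent, and
overlapping records (which reflink permits) are handled by tracking the
furthest block seen, as xfs_repair_alloc_extent_fn does with next_bno.

```c
#include <assert.h>

struct rec { unsigned int start, len; };

/*
 * Model of deriving free extents from the gaps between rmap records
 * (sorted by start block), plus the tail up to the end of the AG.
 * Returns the number of free extents written to out[].
 */
static int find_free_gaps(const struct rec *rm, int nrecs,
			  unsigned int ag_len, struct rec *out)
{
	unsigned int next_bno = 0;
	int i, nfree = 0;

	for (i = 0; i < nrecs; i++) {
		if (rm[i].start > next_bno) {
			out[nfree].start = next_bno;
			out[nfree].len = rm[i].start - next_bno;
			nfree++;
		}
		/* overlapping records only ever push next_bno forward */
		if (rm[i].start + rm[i].len > next_bno)
			next_bno = rm[i].start + rm[i].len;
	}
	if (next_bno < ag_len) {
		out[nfree].start = next_bno;
		out[nfree].len = ag_len - next_bno;
		nfree++;
	}
	return nfree;
}
```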


> > +
> > +struct xfs_repair_alloc_extent {
> > +	struct list_head		list;
> > +	xfs_agblock_t			bno;
> > +	xfs_extlen_t			len;
> > +};
> > +
> > +struct xfs_repair_alloc {
> > +	struct list_head		extlist;
> > +	struct xfs_repair_extent_list	btlist;	  /* OWN_AG blocks */
> > +	struct xfs_repair_extent_list	nobtlist; /* rmapbt/agfl blocks */
> > +	struct xfs_scrub_context	*sc;
> > +	xfs_agblock_t			next_bno;
> > +	uint64_t			nr_records;
> > +};
> > +
> > +/* Record extents that aren't in use from gaps in the rmap records. */
> > +STATIC int
> > +xfs_repair_alloc_extent_fn(
> > +	struct xfs_btree_cur		*cur,
> > +	struct xfs_rmap_irec		*rec,
> > +	void				*priv)
> > +{
> > +	struct xfs_repair_alloc		*ra = priv;
> > +	struct xfs_repair_alloc_extent	*rae;
> > +	struct xfs_buf			*bp;
> > +	xfs_fsblock_t			fsb;
> > +	int				i;
> > +	int				error;
> > +
> > +	/* Record all the OWN_AG blocks... */
> > +	if (rec->rm_owner == XFS_RMAP_OWN_AG) {
> > +		fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
> > +				rec->rm_startblock);
> > +		error = xfs_repair_collect_btree_extent(ra->sc,
> > +				&ra->btlist, fsb, rec->rm_blockcount);
> > +		if (error)
> > +			return error;
> > +	}
> > +
> > +	/* ...and all the rmapbt blocks... */
> > +	for (i = 0; i < cur->bc_nlevels && cur->bc_ptrs[i] == 1; i++) {
> > +		xfs_btree_get_block(cur, i, &bp);
> > +		if (!bp)
> > +			continue;
> > +		fsb = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
> > +		error = xfs_repair_collect_btree_extent(ra->sc,
> > +				&ra->nobtlist, fsb, 1);
> > +		if (error)
> > +			return error;
> > +	}
> 
> This looks familiar from previous patches, including the magic
> bc_ptrs check. factoring opportunity?

Ok.

> > +
> > +	/* ...and all the free space. */
> > +	if (rec->rm_startblock > ra->next_bno) {
> > +		trace_xfs_repair_alloc_extent_fn(cur->bc_mp,
> > +				cur->bc_private.a.agno,
> > +				ra->next_bno, rec->rm_startblock - ra->next_bno,
> > +				XFS_RMAP_OWN_NULL, 0, 0);
> > +
> > +		rae = kmem_alloc(sizeof(struct xfs_repair_alloc_extent),
> > +				KM_MAYFAIL);
> > +		if (!rae)
> > +			return -ENOMEM;
> > +		INIT_LIST_HEAD(&rae->list);
> > +		rae->bno = ra->next_bno;
> > +		rae->len = rec->rm_startblock - ra->next_bno;
> > +		list_add_tail(&rae->list, &ra->extlist);
> > +		ra->nr_records++;
> > +	}
> > +	ra->next_bno = max_t(xfs_agblock_t, ra->next_bno,
> > +			rec->rm_startblock + rec->rm_blockcount);
> > +	return 0;
> > +}
> 
> [....]
> 
> > +/* Allocate a block from the (cached) longest extent in the AG. */
> > +STATIC xfs_fsblock_t
> > +xfs_repair_allocbt_alloc_from_longest(
> > +	struct xfs_repair_alloc		*ra,
> > +	struct xfs_repair_alloc_extent	**longest)
> > +{
> > +	xfs_fsblock_t			fsb;
> > +
> > +	if (*longest && (*longest)->len == 0) {
> > +		list_del(&(*longest)->list);
> > +		kmem_free(*longest);
> > +		*longest = NULL;
> > +	}
> > +
> > +	if (*longest == NULL) {
> > +		*longest = xfs_repair_allocbt_get_longest(ra);
> > +		if (*longest == NULL)
> > +			return NULLFSBLOCK;
> > +	}
> > +
> > +	fsb = XFS_AGB_TO_FSB(ra->sc->mp, ra->sc->sa.agno, (*longest)->bno);
> > +	(*longest)->bno++;
> > +	(*longest)->len--;
> 
> What if this makes the longest extent no longer the longest on the
> extent list?

It should be fine, since all we do later is zero out the free space
counters in the AG and start freeing extents.  The regular extent
freeing code takes care to update the agf/perag longest-free counter
appropriately.

> > +	return fsb;
> > +}
> > +
> > +/* Repair the freespace btrees for some AG. */
> > +int
> > +xfs_repair_allocbt(
> > +	struct xfs_scrub_context	*sc)
> > +{
> > +	struct xfs_repair_alloc		ra;
> > +	struct xfs_owner_info		oinfo;
> > +	struct xfs_mount		*mp = sc->mp;
> > +	struct xfs_btree_cur		*cur = NULL;
> > +	struct xfs_repair_alloc_extent	*longest = NULL;
> > +	struct xfs_repair_alloc_extent	*rae;
> > +	struct xfs_repair_alloc_extent	*n;
> > +	struct xfs_perag		*pag;
> > +	struct xfs_agf			*agf;
> > +	struct xfs_buf			*bp;
> > +	xfs_fsblock_t			bnofsb;
> > +	xfs_fsblock_t			cntfsb;
> > +	xfs_extlen_t			oldf;
> > +	xfs_extlen_t			nr_blocks;
> > +	xfs_agblock_t			agend;
> > +	int				error;
> > +
> > +	/* We require the rmapbt to rebuild anything. */
> > +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> > +		return -EOPNOTSUPP;
> > +
> > +	xfs_scrub_perag_get(sc->mp, &sc->sa);
> > +	pag = sc->sa.pag;
> 
> Probably shoulld make xfs_scrub_perag_get() return the pag directly.

TBH I've been refactoring the other way, where I break up the huge
repair functions and only reference sc->sa.pag in the subfunctions as
necessary.

> > +	/*
> > +	 * Make sure the busy extent list is clear because we can't put
> > +	 * extents on there twice.
> > +	 */
> > +	spin_lock(&pag->pagb_lock);
> > +	if (pag->pagb_tree.rb_node) {
> > +		spin_unlock(&pag->pagb_lock);
> > +		return -EDEADLOCK;
> > +	}
> > +	spin_unlock(&pag->pagb_lock);
> 
> Can you wrap that up a helper, say, xfs_extent_busy_list_empty()?
> 
> 	if (!xfs_extent_busy_list_empty(pag))
> 		return -EDEADLOCK;

Ok.

> > +	/*
> > +	 * Collect all reverse mappings for free extents, and the rmapbt
> > +	 * blocks.  We can discover the rmapbt blocks completely from a
> > +	 * query_all handler because there are always rmapbt entries.
> > +	 * (One cannot use query_all to visit all of a btree's blocks
> > +	 * unless that btree is guaranteed to have at least one entry.)
> > +	 */
> > +	INIT_LIST_HEAD(&ra.extlist);
> > +	xfs_repair_init_extent_list(&ra.btlist);
> > +	xfs_repair_init_extent_list(&ra.nobtlist);
> > +	ra.next_bno = 0;
> > +	ra.nr_records = 0;
> > +	ra.sc = sc;
> > +
> > +	cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno);
> > +	error = xfs_rmap_query_all(cur, xfs_repair_alloc_extent_fn, &ra);
> > +	if (error)
> > +		goto out;
> > +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> > +	cur = NULL;
> > +
> > +	/* Insert a record for space between the last rmap and EOAG. */
> > +	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
> > +	agend = be32_to_cpu(agf->agf_length);
> > +	if (ra.next_bno < agend) {
> > +		rae = kmem_alloc(sizeof(struct xfs_repair_alloc_extent),
> > +				KM_MAYFAIL);
> > +		if (!rae) {
> > +			error = -ENOMEM;
> > +			goto out;
> > +		}
> > +		INIT_LIST_HEAD(&rae->list);
> > +		rae->bno = ra.next_bno;
> > +		rae->len = agend - ra.next_bno;
> > +		list_add_tail(&rae->list, &ra.extlist);
> > +		ra.nr_records++;
> > +	}
> > +
> > +	/* Collect all the AGFL blocks. */
> > +	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(sc->sa.agf_bp),
> > +			sc->sa.agfl_bp, xfs_repair_collect_agfl_block, &ra);
> > +	if (error)
> > +		goto out;
> > +
> > +	/* Do we actually have enough space to do this? */
> > +	nr_blocks = 2 * xfs_allocbt_calc_size(mp, ra.nr_records);
> > +	if (!xfs_repair_ag_has_space(pag, nr_blocks, XFS_AG_RESV_NONE)) {
> > +		error = -ENOSPC;
> > +		goto out;
> > +	}
> > +
> > +	/* Invalidate all the bnobt/cntbt blocks in btlist. */
> > +	error = xfs_repair_subtract_extents(sc, &ra.btlist, &ra.nobtlist);
> > +	if (error)
> > +		goto out;
> > +	xfs_repair_cancel_btree_extents(sc, &ra.nobtlist);
> > +	error = xfs_repair_invalidate_blocks(sc, &ra.btlist);
> > +	if (error)
> > +		goto out;
> 
> So this could be factored in xfs_repair_allocbt_get_free_extents().

Ok.

> > +
> > +	/* Allocate new bnobt root. */
> > +	bnofsb = xfs_repair_allocbt_alloc_from_longest(&ra, &longest);
> > +	if (bnofsb == NULLFSBLOCK) {
> > +		error = -ENOSPC;
> > +		goto out;
> > +	}
> > +
> > +	/* Allocate new cntbt root. */
> > +	cntfsb = xfs_repair_allocbt_alloc_from_longest(&ra, &longest);
> > +	if (cntfsb == NULLFSBLOCK) {
> > +		error = -ENOSPC;
> > +		goto out;
> > +	}
> > +
> > +	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
> > +	/* Initialize new bnobt root. */
> > +	error = xfs_repair_init_btblock(sc, bnofsb, &bp, XFS_BTNUM_BNO,
> > +			&xfs_allocbt_buf_ops);
> > +	if (error)
> > +		goto out;
> > +	agf->agf_roots[XFS_BTNUM_BNOi] =
> > +			cpu_to_be32(XFS_FSB_TO_AGBNO(mp, bnofsb));
> > +	agf->agf_levels[XFS_BTNUM_BNOi] = cpu_to_be32(1);
> > +
> > +	/* Initialize new cntbt root. */
> > +	error = xfs_repair_init_btblock(sc, cntfsb, &bp, XFS_BTNUM_CNT,
> > +			&xfs_allocbt_buf_ops);
> > +	if (error)
> > +		goto out;
> > +	agf->agf_roots[XFS_BTNUM_CNTi] =
> > +			cpu_to_be32(XFS_FSB_TO_AGBNO(mp, cntfsb));
> > +	agf->agf_levels[XFS_BTNUM_CNTi] = cpu_to_be32(1);
> 
> xfs_repair_allocbt_new_btree_roots()
> 
> > +
> > +	/*
> > +	 * Since we're abandoning the old bnobt/cntbt, we have to
> > +	 * decrease fdblocks by the # of blocks in those trees.
> > +	 * btreeblks counts the non-root blocks of the free space
> > +	 * and rmap btrees.  Do this before resetting the AGF counters.
> > +	 */
> > +	oldf = pag->pagf_btreeblks + 2;
> > +	oldf -= (be32_to_cpu(agf->agf_rmap_blocks) - 1);
> > +	error = xfs_mod_fdblocks(mp, -(int64_t)oldf, false);
> > +	if (error)
> > +		goto out;
> > +
> > +	/* Reset the perag info. */
> > +	pag->pagf_btreeblks = be32_to_cpu(agf->agf_rmap_blocks) - 1;
> > +	pag->pagf_freeblks = 0;
> > +	pag->pagf_longest = 0;
> > +	pag->pagf_levels[XFS_BTNUM_BNOi] =
> > +			be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]);
> > +	pag->pagf_levels[XFS_BTNUM_CNTi] =
> > +			be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
> > +
> > +	/* Now reset the AGF counters. */
> > +	agf->agf_btreeblks = cpu_to_be32(pag->pagf_btreeblks);
> > +	agf->agf_freeblks = cpu_to_be32(pag->pagf_freeblks);
> > +	agf->agf_longest = cpu_to_be32(pag->pagf_longest);
> > +	xfs_alloc_log_agf(sc->tp, sc->sa.agf_bp,
> > +			XFS_AGF_ROOTS | XFS_AGF_LEVELS | XFS_AGF_BTREEBLKS |
> > +			XFS_AGF_LONGEST | XFS_AGF_FREEBLKS);
> > +	error = xfs_repair_roll_ag_trans(sc);
> > +	if (error)
> > +		goto out;
> 
> xfs_repair_allocbt_reset_counters()?

Done.

> > +	/*
> > +	 * Insert the longest free extent in case it's necessary to
> > +	 * refresh the AGFL with multiple blocks.
> > +	 */
> > +	xfs_rmap_skip_owner_update(&oinfo);
> > +	if (longest && longest->len == 0) {
> > +		error = xfs_repair_allocbt_free_extent(sc,
> > +				XFS_AGB_TO_FSB(sc->mp, sc->sa.agno,
> > +					longest->bno),
> > +				longest->len, &oinfo);
> > +		if (error)
> > +			goto out;
> > +		list_del(&longest->list);
> > +		kmem_free(longest);
> > +	}
> > +
> > +	/* Insert records into the new btrees. */
> > +	list_sort(NULL, &ra.extlist, xfs_repair_allocbt_extent_cmp);
> > +	list_for_each_entry_safe(rae, n, &ra.extlist, list) {
> > +		error = xfs_repair_allocbt_free_extent(sc,
> > +				XFS_AGB_TO_FSB(sc->mp, sc->sa.agno, rae->bno),
> > +				rae->len, &oinfo);
> > +		if (error)
> > +			goto out;
> > +		list_del(&rae->list);
> > +		kmem_free(rae);
> > +	}
> > +
> > +	/* Add rmap records for the btree roots */
> > +	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
> > +	error = xfs_rmap_alloc(sc->tp, sc->sa.agf_bp, sc->sa.agno,
> > +			XFS_FSB_TO_AGBNO(mp, bnofsb), 1, &oinfo);
> > +	if (error)
> > +		goto out;
> > +	error = xfs_rmap_alloc(sc->tp, sc->sa.agf_bp, sc->sa.agno,
> > +			XFS_FSB_TO_AGBNO(mp, cntfsb), 1, &oinfo);
> > +	if (error)
> > +		goto out;
> 
> xfs_repair_allocbt_rebuild_tree()

Done.

--D

> > +
> > +	/* Free all the OWN_AG blocks that are not in the rmapbt/agfl. */
> > +	return xfs_repair_reap_btree_extents(sc, &ra.btlist, &oinfo,
> > +			XFS_AG_RESV_NONE);
> > +out:
> > +	xfs_repair_cancel_btree_extents(sc, &ra.btlist);
> > +	xfs_repair_cancel_btree_extents(sc, &ra.nobtlist);
> > +	if (cur)
> > +		xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
> > +	list_for_each_entry_safe(rae, n, &ra.extlist, list) {
> > +		list_del(&rae->list);
> > +		kmem_free(rae);
> > +	}
> > +	return error;
> > +}
> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH 03/14] xfs: repair free space btrees
  2018-06-06  1:50     ` Darrick J. Wong
@ 2018-06-06  3:34       ` Dave Chinner
  2018-06-06  4:01         ` Darrick J. Wong
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Chinner @ 2018-06-06  3:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 06:50:26PM -0700, Darrick J. Wong wrote:
> On Mon, Jun 04, 2018 at 12:12:34PM +1000, Dave Chinner wrote:
> > On Wed, May 30, 2018 at 12:30:58PM -0700, Darrick J. Wong wrote:
> > > +
> > > +/* Free space btree repair. */
> > 
> > Can you add a description of the algorithm used here.
> 
> Ok.
> 
> /*
>  * Free Space Btree Repair
>  * =======================
>  *
>  * The reverse mappings are supposed to record all space usage for the
>  * entire AG.  Therefore, we can recalculate the free extents in an AG
>  * by looking for gaps in the physical extents recorded in the rmapbt.
>  * On a reflink filesystem this is a little more tricky in that we have
>  * to be aware that the rmap records are allowed to overlap.
>  *
>  * We derive which blocks belonged to the old bnobt/cntbt by recording
>  * all the OWN_AG extents and subtracting out the blocks owned by all
>  * other OWN_AG metadata: the rmapbt blocks visited while iterating the
>  * reverse mappings and the AGFL blocks.
>  *
>  * Once we have both of those pieces, we can reconstruct the bnobt and
>  * cntbt by blowing out the free block state and freeing all the extents
>  * that we found.  This adds the requirement that we can't have any busy
>  * extents in the AG because the busy code cannot handle duplicate
>  * records.
>  *
>  * Note that we can only rebuild both free space btrees at the same
>  * time.

Ok, so if I've got this right, the limitations indicated in the last
two paragraphs are a result of marking space free by calling
xfs_free_extent() on all the extents in the list? If so, can you
mention that this is an implementation artifact, not an algorithmic
limitation?
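For reference, the gap-walk that comment describes boils down to
something like the following userspace sketch (names invented here, and
it ignores the transaction/cursor machinery the real code needs):

```c
#include <assert.h>
#include <stddef.h>

struct rmap { unsigned int start; unsigned int len; };	/* one reverse mapping */
struct ext  { unsigned int bno;   unsigned int len; };	/* one free extent     */

/*
 * Compute free extents as the gaps between rmap records, tolerating the
 * overlapping records a reflink filesystem can produce.  Records must be
 * sorted by start block; 'agend' is the AG length in blocks.  Returns the
 * number of free extents written to 'out'.
 */
static size_t
collect_free_extents(const struct rmap *recs, size_t nrecs,
		     unsigned int agend, struct ext *out)
{
	unsigned int	next_bno = 0;	/* first block not known to be used */
	size_t		nr = 0;
	size_t		i;

	for (i = 0; i < nrecs; i++) {
		/* Gap between the previous record(s) and this one is free. */
		if (recs[i].start > next_bno) {
			out[nr].bno = next_bno;
			out[nr].len = recs[i].start - next_bno;
			nr++;
		}
		/* Overlapping records only ever push next_bno forward. */
		if (recs[i].start + recs[i].len > next_bno)
			next_bno = recs[i].start + recs[i].len;
	}

	/* Space between the last rmap and EOAG is free, too. */
	if (next_bno < agend) {
		out[nr].bno = next_bno;
		out[nr].len = agend - next_bno;
		nr++;
	}
	return nr;
}
```

The max()-style update of next_bno is what makes the overlapping reflink
rmaps fall out naturally.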

> > > +/* Allocate a block from the (cached) longest extent in the AG. */
> > > +STATIC xfs_fsblock_t
> > > +xfs_repair_allocbt_alloc_from_longest(
> > > +	struct xfs_repair_alloc		*ra,
> > > +	struct xfs_repair_alloc_extent	**longest)
> > > +{
> > > +	xfs_fsblock_t			fsb;
> > > +
> > > +	if (*longest && (*longest)->len == 0) {
> > > +		list_del(&(*longest)->list);
> > > +		kmem_free(*longest);
> > > +		*longest = NULL;
> > > +	}
> > > +
> > > +	if (*longest == NULL) {
> > > +		*longest = xfs_repair_allocbt_get_longest(ra);
> > > +		if (*longest == NULL)
> > > +			return NULLFSBLOCK;
> > > +	}
> > > +
> > > +	fsb = XFS_AGB_TO_FSB(ra->sc->mp, ra->sc->sa.agno, (*longest)->bno);
> > > +	(*longest)->bno++;
> > > +	(*longest)->len--;
> > 
> > What if this makes the longest extent no longer the longest on the
> > extent list?
> 
> It should be fine, since all we do later is zero out the free space
> counters in the AG and start freeing extents.  The regular extent
> freeing code takes care to update the agf/perag longest-free counter
> appropriately.

Comment, please. :P

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 04/14] xfs: repair inode btrees
  2018-06-04  3:41   ` Dave Chinner
@ 2018-06-06  3:55     ` Darrick J. Wong
  2018-06-06  4:32       ` Dave Chinner
  0 siblings, 1 reply; 35+ messages in thread
From: Darrick J. Wong @ 2018-06-06  3:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Jun 04, 2018 at 01:41:30PM +1000, Dave Chinner wrote:
> On Wed, May 30, 2018 at 12:31:04PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Use the rmapbt to find inode chunks, query the chunks to compute
> > hole and free masks, and with that information rebuild the inobt
> > and finobt.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> [...]
> 
> > +xfs_repair_ialloc_check_free(
> > +	struct xfs_btree_cur	*cur,
> > +	struct xfs_buf		*bp,
> > +	xfs_ino_t		fsino,
> > +	xfs_agino_t		bpino,
> > +	bool			*inuse)
> > +{
> > +	struct xfs_mount	*mp = cur->bc_mp;
> > +	struct xfs_dinode	*dip;
> > +	int			error;
> > +
> > +	/* Will the in-core inode tell us if it's in use? */
> > +	error = xfs_icache_inode_is_allocated(mp, cur->bc_tp, fsino, inuse);
> > +	if (!error)
> > +		return 0;
> > +
> > +	/* Inode uncached or half assembled, read disk buffer */
> > +	dip = xfs_buf_offset(bp, bpino * mp->m_sb.sb_inodesize);
> > +	if (be16_to_cpu(dip->di_magic) != XFS_DINODE_MAGIC)
> > +		return -EFSCORRUPTED;
> 
> Do we hold the buffer locked here? i.e. can we race with someone
> else allocating/freeing/reading the inode?

I think repair should be ok from alloc/free because both of those paths
(xfs_dialloc/xfs_difree) will grab the AGI header, whereas repair locks
all three AG headers and keeps them locked until repairs are complete.
I don't think we have to worry about concurrent reads because the only
fields we care about are di_mode/i_mode, which don't change outside of
inode allocation and freeing.

> > +
> > +	if (dip->di_version >= 3 && be64_to_cpu(dip->di_ino) != fsino)
> > +		return -EFSCORRUPTED;
> > +
> > +	*inuse = dip->di_mode != 0;
> > +	return 0;
> > +}
> > +
> > +/* Record extents that belong to inode btrees. */
> > +STATIC int
> > +xfs_repair_ialloc_extent_fn(
> > +	struct xfs_btree_cur		*cur,
> > +	struct xfs_rmap_irec		*rec,
> > +	void				*priv)
> > +{
> > +	struct xfs_imap			imap;
> > +	struct xfs_repair_ialloc	*ri = priv;
> > +	struct xfs_repair_ialloc_extent	*rie;
> > +	struct xfs_dinode		*dip;
> > +	struct xfs_buf			*bp;
> > +	struct xfs_mount		*mp = cur->bc_mp;
> > +	xfs_ino_t			fsino;
> > +	xfs_inofree_t			usedmask;
> > +	xfs_fsblock_t			fsbno;
> > +	xfs_agnumber_t			agno;
> > +	xfs_agblock_t			agbno;
> > +	xfs_agino_t			cdist;
> > +	xfs_agino_t			startino;
> > +	xfs_agino_t			clusterino;
> > +	xfs_agino_t			nr_inodes;
> > +	xfs_agino_t			inoalign;
> > +	xfs_agino_t			agino;
> > +	xfs_agino_t			rmino;
> > +	uint16_t			fillmask;
> > +	bool				inuse;
> > +	int				blks_per_cluster;
> > +	int				usedcount;
> > +	int				error = 0;
> > +
> > +	if (xfs_scrub_should_terminate(ri->sc, &error))
> > +		return error;
> > +
> > +	/* Fragment of the old btrees; dispose of them later. */
> > +	if (rec->rm_owner == XFS_RMAP_OWN_INOBT) {
> > +		fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
> > +				rec->rm_startblock);
> > +		return xfs_repair_collect_btree_extent(ri->sc, &ri->btlist,
> > +				fsbno, rec->rm_blockcount);
> > +	}
> > +
> > +	/* Skip extents which are not owned by this inode and fork. */
> > +	if (rec->rm_owner != XFS_RMAP_OWN_INODES)
> > +		return 0;
> > +
> > +	agno = cur->bc_private.a.agno;
> > +	blks_per_cluster = xfs_icluster_size_fsb(mp);
> > +	nr_inodes = XFS_OFFBNO_TO_AGINO(mp, blks_per_cluster, 0);
> > +
> > +	if (rec->rm_startblock % blks_per_cluster != 0)
> > +		return -EFSCORRUPTED;
> > +
> > +	trace_xfs_repair_ialloc_extent_fn(mp, cur->bc_private.a.agno,
> > +			rec->rm_startblock, rec->rm_blockcount, rec->rm_owner,
> > +			rec->rm_offset, rec->rm_flags);
> > +
> > +	/*
> > +	 * Determine the inode block alignment, and where the block
> > +	 * ought to start if it's aligned properly.  On a sparse inode
> > +	 * system the rmap doesn't have to start on an alignment boundary,
> > +	 * but the record does.  On pre-sparse filesystems, we /must/
> > +	 * start both rmap and inobt on an alignment boundary.
> > +	 */
> > +	inoalign = xfs_ialloc_cluster_alignment(mp);
> > +	agbno = rec->rm_startblock;
> > +	agino = XFS_OFFBNO_TO_AGINO(mp, agbno, 0);
> > +	rmino = XFS_OFFBNO_TO_AGINO(mp, rounddown(agbno, inoalign), 0);
> > +	if (!xfs_sb_version_hassparseinodes(&mp->m_sb) && agino != rmino)
> > +		return -EFSCORRUPTED;
> > +
> > +	/*
> > +	 * For each cluster in this blob of inodes, we must calculate the
> > +	 * properly aligned startino of that cluster, then iterate each
> > +	 * cluster to fill in used and filled masks appropriately.  We
> > +	 * then use the (startino, used, filled) information to construct
> > +	 * the appropriate inode records.
> > +	 */
> > +	for (agbno = rec->rm_startblock;
> > +	     agbno < rec->rm_startblock + rec->rm_blockcount;
> > +	     agbno += blks_per_cluster) {
> 
> I see a few problems with indenting and "just over" long lines here.
> Can you factor the loop internals into a separate function to reduce
> that issue? Say xfs_repair_ialloc_process_cluster()?

Ok, done.

> > +		/* The per-AG inum of this inode cluster. */
> > +		agino = XFS_OFFBNO_TO_AGINO(mp, agbno, 0);
> > +
> > +		/* The per-AG inum of the inobt record. */
> > +		startino = rmino +
> > +				rounddown(agino - rmino, XFS_INODES_PER_CHUNK);
> > +		cdist = agino - startino;
> 
> What does "cdist" mean? I can guess at its meaning, but I don't recall
> seeing the inode number offset into a cluster being referred to as a
> distance before....

cluster offset?

I wasn't sure of the terminology for the offset of the cluster within a
chunk, in units of ag inodes.
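To make that arithmetic concrete, here's a userspace sketch (helper
names invented; XFS_INODES_PER_CHUNK is 64):

```c
#include <assert.h>

#define INODES_PER_CHUNK	64u	/* XFS_INODES_PER_CHUNK */

/* Round 'x' down to a multiple of 'y' (y must be nonzero). */
static unsigned int
rounddown_u(unsigned int x, unsigned int y)
{
	return x - (x % y);
}

/*
 * Given the per-AG inode number of a cluster ('agino') and the aligned
 * base implied by the rmap ('rmino'), compute the startino of the inobt
 * record covering that cluster...
 */
static unsigned int
chunk_startino(unsigned int rmino, unsigned int agino)
{
	return rmino + rounddown_u(agino - rmino, INODES_PER_CHUNK);
}

/* ...and the cluster's offset within the 64-inode chunk ("cdist"). */
static unsigned int
cluster_offset(unsigned int rmino, unsigned int agino)
{
	return agino - chunk_startino(rmino, agino);
}
```

So for a cluster at agino 224 with rmino 128, the chunk starts at 192
and the cluster sits 32 inodes into it.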

> > +		/* Every inode in this holemask slot is filled. */
> > +		fillmask = xfs_inobt_maskn(
> > +				cdist / XFS_INODES_PER_HOLEMASK_BIT,
> > +				nr_inodes / XFS_INODES_PER_HOLEMASK_BIT);
> > +
> > +		/* Grab the inode cluster buffer. */
> > +		imap.im_blkno = XFS_AGB_TO_DADDR(mp, agno, agbno);
> > +		imap.im_len = XFS_FSB_TO_BB(mp, blks_per_cluster);
> > +		imap.im_boffset = 0;
> > +
> > +		error = xfs_imap_to_bp(mp, cur->bc_tp, &imap,
> > +				&dip, &bp, 0, XFS_IGET_UNTRUSTED);
> > +		if (error)
> > +			return error;
> > +
> > +		usedmask = 0;
> > +		usedcount = 0;
> > +		/* Which inodes within this cluster are free? */
> > +		for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
> > +			fsino = XFS_AGINO_TO_INO(mp, cur->bc_private.a.agno,
> > +					agino + clusterino);
> > +			error = xfs_repair_ialloc_check_free(cur, bp, fsino,
> > +					clusterino, &inuse);
> > +			if (error) {
> > +				xfs_trans_brelse(cur->bc_tp, bp);
> > +				return error;
> > +			}
> > +			if (inuse) {
> > +				usedcount++;
> > +				usedmask |= XFS_INOBT_MASK(cdist + clusterino);
> > +			}
> > +		}
> > +		xfs_trans_brelse(cur->bc_tp, bp);
> > +
> > +		/*
> > +		 * If the last item in the list is our chunk record,
> > +		 * update that.
> > +		 */
> > +		if (!list_empty(&ri->extlist)) {
> > +			rie = list_last_entry(&ri->extlist,
> > +					struct xfs_repair_ialloc_extent, list);
> > +			if (rie->startino + XFS_INODES_PER_CHUNK > startino) {
> > +				rie->freemask &= ~usedmask;
> > +				rie->holemask &= ~fillmask;
> > +				rie->count += nr_inodes;
> > +				rie->usedcount += usedcount;
> > +				continue;
> > +			}
> > +		}
> > +
> > +		/* New inode chunk; add to the list. */
> > +		rie = kmem_alloc(sizeof(struct xfs_repair_ialloc_extent),
> > +				KM_MAYFAIL);
> > +		if (!rie)
> > +			return -ENOMEM;
> > +
> > +		INIT_LIST_HEAD(&rie->list);
> > +		rie->startino = startino;
> > +		rie->freemask = XFS_INOBT_ALL_FREE & ~usedmask;
> > +		rie->holemask = XFS_INOBT_ALL_FREE & ~fillmask;
> > +		rie->count = nr_inodes;
> > +		rie->usedcount = usedcount;
> > +		list_add_tail(&rie->list, &ri->extlist);
> > +		ri->nr_records++;
> > +	}
> > +
> > +	return 0;
> > +}
> 
> [....]
> 
> > +/* Repair both inode btrees. */
> > +int
> > +xfs_repair_iallocbt(
> > +	struct xfs_scrub_context	*sc)
> > +{
> > +	struct xfs_repair_ialloc	ri;
> > +	struct xfs_owner_info		oinfo;
> > +	struct xfs_mount		*mp = sc->mp;
> > +	struct xfs_buf			*bp;
> > +	struct xfs_repair_ialloc_extent	*rie;
> > +	struct xfs_repair_ialloc_extent	*n;
> > +	struct xfs_agi			*agi;
> > +	struct xfs_btree_cur		*cur = NULL;
> > +	struct xfs_perag		*pag;
> > +	xfs_fsblock_t			inofsb;
> > +	xfs_fsblock_t			finofsb;
> > +	xfs_extlen_t			nr_blocks;
> > +	xfs_agino_t			old_count;
> > +	xfs_agino_t			old_freecount;
> > +	xfs_agino_t			freecount;
> > +	unsigned int			count;
> > +	unsigned int			usedcount;
> > +	int				logflags;
> > +	int				error = 0;
> > +
> > +	/* We require the rmapbt to rebuild anything. */
> > +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> > +		return -EOPNOTSUPP;
> 
> This could be factored similarly to the allocbt repair function.

Will do.

> > +
> > +	xfs_scrub_perag_get(sc->mp, &sc->sa);
> > +	pag = sc->sa.pag;
> > +	/* Collect all reverse mappings for inode blocks. */
> > +	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT);
> > +	INIT_LIST_HEAD(&ri.extlist);
> > +	xfs_repair_init_extent_list(&ri.btlist);
> > +	ri.nr_records = 0;
> > +	ri.sc = sc;
> > +
> > +	cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno);
> > +	error = xfs_rmap_query_all(cur, xfs_repair_ialloc_extent_fn, &ri);
> > +	if (error)
> > +		goto out;
> > +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> > +	cur = NULL;
> > +
> > +	/* Do we actually have enough space to do this? */
> > +	nr_blocks = xfs_iallocbt_calc_size(mp, ri.nr_records);
> > +	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> > +		nr_blocks *= 2;
> > +	if (!xfs_repair_ag_has_space(pag, nr_blocks, XFS_AG_RESV_NONE)) {
> > +		error = -ENOSPC;
> > +		goto out;
> > +	}
> > +
> > +	/* Invalidate all the inobt/finobt blocks in btlist. */
> > +	error = xfs_repair_invalidate_blocks(sc, &ri.btlist);
> > +	if (error)
> > +		goto out;
> > +
> > +	agi = XFS_BUF_TO_AGI(sc->sa.agi_bp);
> > +	/* Initialize new btree roots. */
> > +	error = xfs_repair_alloc_ag_block(sc, &oinfo, &inofsb,
> > +			XFS_AG_RESV_NONE);
> > +	if (error)
> > +		goto out;
> > +	error = xfs_repair_init_btblock(sc, inofsb, &bp, XFS_BTNUM_INO,
> > +			&xfs_inobt_buf_ops);
> > +	if (error)
> > +		goto out;
> > +	agi->agi_root = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, inofsb));
> > +	agi->agi_level = cpu_to_be32(1);
> > +	logflags = XFS_AGI_ROOT | XFS_AGI_LEVEL;
> > +
> > +	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
> > +		error = xfs_repair_alloc_ag_block(sc, &oinfo, &finofsb,
> > +				mp->m_inotbt_nores ? XFS_AG_RESV_NONE :
> > +						     XFS_AG_RESV_METADATA);
> > +		if (error)
> > +			goto out;
> > +		error = xfs_repair_init_btblock(sc, finofsb, &bp,
> > +				XFS_BTNUM_FINO, &xfs_inobt_buf_ops);
> > +		if (error)
> > +			goto out;
> > +		agi->agi_free_root = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, finofsb));
> > +		agi->agi_free_level = cpu_to_be32(1);
> > +		logflags |= XFS_AGI_FREE_ROOT | XFS_AGI_FREE_LEVEL;
> > +	}
> > +
> > +	xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp, logflags);
> > +	error = xfs_repair_roll_ag_trans(sc);
> > +	if (error)
> > +		goto out;
> > +
> > +	/* Insert records into the new btrees. */
> > +	count = 0;
> > +	usedcount = 0;
> > +	list_sort(NULL, &ri.extlist, xfs_repair_ialloc_extent_cmp);
> > +	list_for_each_entry_safe(rie, n, &ri.extlist, list) {
> > +		count += rie->count;
> > +		usedcount += rie->usedcount;
> > +
> > +		error = xfs_repair_iallocbt_insert_rec(sc, rie);
> > +		if (error)
> > +			goto out;
> > +
> > +		list_del(&rie->list);
> > +		kmem_free(rie);
> > +	}
> > +
> > +
> > +	/* Update the AGI counters. */
> > +	agi = XFS_BUF_TO_AGI(sc->sa.agi_bp);
> > +	old_count = be32_to_cpu(agi->agi_count);
> > +	old_freecount = be32_to_cpu(agi->agi_freecount);
> > +	freecount = count - usedcount;
> > +
> > +	xfs_repair_mod_ino_counts(sc, old_count, count, old_freecount,
> > +			freecount);
> > +
> > +	if (count != old_count) {
> > +		if (sc->sa.pag->pagi_init)
> > +			sc->sa.pag->pagi_count = count;
> > +		agi->agi_count = cpu_to_be32(count);
> > +		xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp, XFS_AGI_COUNT);
> > +	}
> > +
> > +	if (freecount != old_freecount) {
> > +		if (sc->sa.pag->pagi_init)
> > +			sc->sa.pag->pagi_freecount = freecount;
> 
> We've read the AGI buffer in at this point, right? so it is
> guaranteed that pagi_init is true, right?

Yeah.

> > +		agi->agi_freecount = cpu_to_be32(freecount);
> > +		xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp, XFS_AGI_FREECOUNT);
> > +	}
> > +
> > +	/* Free the old inode btree blocks if they're not in use. */
> > +	return xfs_repair_reap_btree_extents(sc, &ri.btlist, &oinfo,
> > +			XFS_AG_RESV_NONE);
> > +out:
> 
> out_error, perhaps, to distinguish it from the normal function
> return path? (and perhaps apply that to all the previous main repair
> functions on factoring?)

Ok.

--D

> > +	if (cur)
> > +		xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
> > +	xfs_repair_cancel_btree_extents(sc, &ri.btlist);
> > +	list_for_each_entry_safe(rie, n, &ri.extlist, list) {
> > +		list_del(&rie->list);
> > +		kmem_free(rie);
> > +	}
> > +	return error;
> > +}
> 
> -Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH 03/14] xfs: repair free space btrees
  2018-06-06  3:34       ` Dave Chinner
@ 2018-06-06  4:01         ` Darrick J. Wong
  0 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2018-06-06  4:01 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Jun 06, 2018 at 01:34:38PM +1000, Dave Chinner wrote:
> On Tue, Jun 05, 2018 at 06:50:26PM -0700, Darrick J. Wong wrote:
> > On Mon, Jun 04, 2018 at 12:12:34PM +1000, Dave Chinner wrote:
> > > On Wed, May 30, 2018 at 12:30:58PM -0700, Darrick J. Wong wrote:
> > > > +
> > > > +/* Free space btree repair. */
> > > 
> > > Can you add a description of the algorithm used here.
> > 
> > Ok.
> > 
> > /*
> >  * Free Space Btree Repair
> >  * =======================
> >  *
> >  * The reverse mappings are supposed to record all space usage for the
> >  * entire AG.  Therefore, we can recalculate the free extents in an AG
> >  * by looking for gaps in the physical extents recorded in the rmapbt.
> >  * On a reflink filesystem this is a little more tricky in that we have
> >  * to be aware that the rmap records are allowed to overlap.
> >  *
> >  * We derive which blocks belonged to the old bnobt/cntbt by recording
> >  * all the OWN_AG extents and subtracting out the blocks owned by all
> >  * other OWN_AG metadata: the rmapbt blocks visited while iterating the
> >  * reverse mappings and the AGFL blocks.
> >  *
> >  * Once we have both of those pieces, we can reconstruct the bnobt and
> >  * cntbt by blowing out the free block state and freeing all the extents
> >  * that we found.  This adds the requirement that we can't have any busy
> >  * extents in the AG because the busy code cannot handle duplicate
> >  * records.
> >  *
> >  * Note that we can only rebuild both free space btrees at the same
> >  * time.
> 
> Ok, so if I've got this right, the limitations indicated in the last
> two paragraphs are a result of marking space free by calling
> xfs_free_extent() on all the extents in the list? If so, can you
> mention that this is an implementation artifact, not an algorithmic
> limitation?

Ok.

> > > > +/* Allocate a block from the (cached) longest extent in the AG. */
> > > > +STATIC xfs_fsblock_t
> > > > +xfs_repair_allocbt_alloc_from_longest(
> > > > +	struct xfs_repair_alloc		*ra,
> > > > +	struct xfs_repair_alloc_extent	**longest)
> > > > +{
> > > > +	xfs_fsblock_t			fsb;
> > > > +
> > > > +	if (*longest && (*longest)->len == 0) {
> > > > +		list_del(&(*longest)->list);
> > > > +		kmem_free(*longest);
> > > > +		*longest = NULL;
> > > > +	}
> > > > +
> > > > +	if (*longest == NULL) {
> > > > +		*longest = xfs_repair_allocbt_get_longest(ra);
> > > > +		if (*longest == NULL)
> > > > +			return NULLFSBLOCK;
> > > > +	}
> > > > +
> > > > +	fsb = XFS_AGB_TO_FSB(ra->sc->mp, ra->sc->sa.agno, (*longest)->bno);
> > > > +	(*longest)->bno++;
> > > > +	(*longest)->len--;
> > > 
> > > What if this makes the longest extent no longer the longest on the
> > > extent list?
> > 
> > It should be fine, since all we do later is zero out the free space
> > counters in the AG and start freeing extents.  The regular extent
> > freeing code takes care to update the agf/perag longest-free counter
> > appropriately.
> 
> Comment, please. :P

I'll do it one better and refactor it out of the code entirely. :)

The bnobt/cntbt roots can come from the shortest extent in the free
space, which means we can pluck the longest extent off our list and
insert it first, and it's always the longest one.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH 01/14] xfs: repair the AGF and AGFL
  2018-06-05 23:18     ` Darrick J. Wong
@ 2018-06-06  4:06       ` Dave Chinner
  2018-06-06  4:56         ` Darrick J. Wong
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Chinner @ 2018-06-06  4:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 04:18:56PM -0700, Darrick J. Wong wrote:
> On Mon, Jun 04, 2018 at 11:52:55AM +1000, Dave Chinner wrote:
> > On Wed, May 30, 2018 at 12:30:45PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Regenerate the AGF and AGFL from the rmap data.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > [...]
> > 
> > > +/* Repair the AGF. */
> > > +int
> > > +xfs_repair_agf(
> > > +	struct xfs_scrub_context	*sc)
> > > +{
> > > +	struct xfs_repair_find_ag_btree	fab[] = {
> > > +		{
> > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > +			.buf_ops = &xfs_allocbt_buf_ops,
> > > +			.magic = XFS_ABTB_CRC_MAGIC,
> > > +		},
> > > +		{
> > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > +			.buf_ops = &xfs_allocbt_buf_ops,
> > > +			.magic = XFS_ABTC_CRC_MAGIC,
> > > +		},
> > > +		{
> > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > +			.buf_ops = &xfs_rmapbt_buf_ops,
> > > +			.magic = XFS_RMAP_CRC_MAGIC,
> > > +		},
> > > +		{
> > > +			.rmap_owner = XFS_RMAP_OWN_REFC,
> > > +			.buf_ops = &xfs_refcountbt_buf_ops,
> > > +			.magic = XFS_REFC_CRC_MAGIC,
> > > +		},
> > > +		{
> > > +			.buf_ops = NULL,
> > > +		},
> > > +	};
> > > +	struct xfs_repair_agf_allocbt	raa;
> > > +	struct xfs_agf			old_agf;
> > > +	struct xfs_mount		*mp = sc->mp;
> > > +	struct xfs_buf			*agf_bp;
> > > +	struct xfs_buf			*agfl_bp;
> > > +	struct xfs_agf			*agf;
> > > +	struct xfs_btree_cur		*cur = NULL;
> > > +	struct xfs_perag		*pag;
> > > +	xfs_agblock_t			blocks;
> > > +	xfs_agblock_t			freesp_blocks;
> > > +	int64_t				delta_fdblocks = 0;
> > > +	int				error;
> > > +
> > > +	/* We require the rmapbt to rebuild anything. */
> > > +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> > > +		return -EOPNOTSUPP;
> > > +
> > > +	xfs_scrub_perag_get(sc->mp, &sc->sa);
> > > +	pag = sc->sa.pag;
> > > +	memset(&raa, 0, sizeof(raa));
> > > +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> > > +			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGF_DADDR(mp)),
> > > +			XFS_FSS_TO_BB(mp, 1), 0, &agf_bp, NULL);
> > > +	if (error)
> > > +		return error;
> > > +	agf_bp->b_ops = &xfs_agf_buf_ops;
> > > +
> > > +	/*
> > > +	 * Load the AGFL so that we can screen out OWN_AG blocks that
> > > +	 * are on the AGFL now; these blocks might have once been part
> > > +	 * of the bno/cnt/rmap btrees but are not now.
> > > +	 */
> > > +	error = xfs_alloc_read_agfl(mp, sc->tp, sc->sa.agno, &agfl_bp);
> > > +	if (error)
> > > +		return error;
> > > +	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(agf_bp), agfl_bp,
> > > +			xfs_repair_agf_check_agfl_block, sc);
> > > +	if (error)
> > > +		return error;
> > 
> > This is a bit of a chicken/egg situation, isn't it? We haven't
> > repaired the AGFL yet, so how do we know what is valid here?
> 
> Yep.  The AGF is corrupt, so we have to trust the AGFL contents because
> we can't do any serious cross-referencing with any of the btrees rooted
> in the AGF.  If the AGFL contents are obviously bad then we'll bail out.

Can you add that as a comment here?

> > Can we factor this function along rebuild operation lines?
> 
> Yes...
> 
> > That will help document all the different pieces it is putting
> > together. E.g move the AGF header init to before
> > xfs_repair_find_ag_btree_roots(), and then pass it into
> > xfs_repair_agf_rebuild_roots() which contains the above fab specific
> > code.
> 
> ...however, that's the second (and admittedly not well documented)
> chicken-and-egg -- we find the agf btree roots by probing the rmapbt,
> which is rooted in the agf.  So xfs_repair_find_ag_btree_roots has to be
> fed the old agf_bp buffer, and if that blows up then we bail out without
> changing anything.

Same again - factoring and adding comments to explain things like
this will make it much easier to understand.

> > > +/* Record all freespace information. */
> > > +STATIC int
> > > +xfs_repair_agfl_rmap_fn(
> > > +	struct xfs_btree_cur		*cur,
> > > +	struct xfs_rmap_irec		*rec,
> > > +	void				*priv)
> > > +{
> > > +	struct xfs_repair_agfl		*ra = priv;
> > > +	struct xfs_buf			*bp;
> > > +	xfs_fsblock_t			fsb;
> > > +	int				i;
> > > +	int				error = 0;
> > > +
> > > +	if (xfs_scrub_should_terminate(ra->sc, &error))
> > > +		return error;
> > > +
> > > +	/* Record all the OWN_AG blocks... */
> > > +	if (rec->rm_owner == XFS_RMAP_OWN_AG) {
> > > +		fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
> > > +				rec->rm_startblock);
> > > +		error = xfs_repair_collect_btree_extent(ra->sc,
> > > +				&ra->freesp_list, fsb, rec->rm_blockcount);
> > > +		if (error)
> > > +			return error;
> > > +	}
> > > +
> > > +	/* ...and all the rmapbt blocks... */
> > > +	for (i = 0; i < cur->bc_nlevels && cur->bc_ptrs[i] == 1; i++) {
> > 
> > What is the significance of "cur->bc_ptrs[i] == 1"?
> > 
> > This loop looks like it is walking the btree path to this leaf, but
> > bc_ptrs[] will only have a "1" in it if we are at the left-most edge
> > of the tree, right? so what about all the other btree blocks?
> 
> Close.  We're walking up the tree from the leaf towards the root.  For
> each level, we assume that if bc_ptrs[level] == 1, then this is the
> first time we've seen the block at that level, so we remember that we
> saw this rmapbt block.  bc_ptrs is the offset within a block, not the
> offset for the entire level.
> 
> So if our rmapbt tree is:
> 
>    4
>  / | \
> 1  2  3
> 
> Pretend for this example that each leaf block has 100 rmap records.  For
> the first rmap record, we'll observe that bc_ptrs[0] == 1, so we record
> that we saw block 1.  Then we observe that bc_ptrs[1] == 1, so we record
> block 4.  agmeta_list is [1, 4].
> 
> For the second rmap record, we see that bc_ptrs[0] == 2, so we exit the
> loop.  agmeta_list remains [1, 4].
> 
> For the 101st rmap record, we've moved onto leaf block 2.  Now
> bc_ptrs[0] == 1 again, so we record that we saw block 2.  We see that
> bc_ptrs[1] == 2, so we exit the loop.  agmeta_list = [1, 4, 2].
> 
> For the 102nd rmap, bc_ptrs[0] == 2, so we exit.
> 
> For the 201st rmap record, we've moved on to leaf block 3.  bc_ptrs[0]
> == 1, so we add 3 to agmeta_list.  [1, 4, 2, 3].

And that is crying out for either an iterator macro or a helper
function with that explanation above it :P
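Something like this, perhaps (userspace sketch with invented names; the
real thing would take the struct xfs_btree_cur and its bc_ptrs/path):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Record the btree blocks entered for the first time while visiting one
 * record.  'ptrs[i]' is the cursor's offset within the block at level i
 * (leaf is level 0) and 'blocks[i]' is that block's number.  A block is
 * newly entered exactly when the offsets at its level and every level
 * below it are all 1 -- the bc_ptrs[i] == 1 loop condition above.
 * Appends to 'seen' and returns the new count.
 */
static size_t
visit_record(const unsigned int *ptrs, const unsigned int *blocks,
	     size_t nlevels, unsigned int *seen, size_t nseen)
{
	size_t	i;

	for (i = 0; i < nlevels && ptrs[i] == 1; i++)
		seen[nseen++] = blocks[i];
	return nseen;
}
```

Running the three-leaf example through it reproduces the [1, 4, 2, 3]
walk described above.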

> > > +
> > > +	/* We require the rmapbt to rebuild anything. */
> > > +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> > > +		return -EOPNOTSUPP;
> > > +
> > > +	xfs_scrub_perag_get(sc->mp, &sc->sa);
> > > +	xfs_repair_init_extent_list(&ra.freesp_list);
> > > +	xfs_repair_init_extent_list(&ra.agmeta_list);
> > > +	ra.sc = sc;
> > > +
> > > +	error = xfs_alloc_read_agf(mp, sc->tp, sc->sa.agno, 0, &agf_bp);
> > > +	if (error)
> > > +		return error;
> > > +	if (!agf_bp)
> > > +		return -ENOMEM;
> > > +
> > > +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> > > +			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGFL_DADDR(mp)),
> > > +			XFS_FSS_TO_BB(mp, 1), 0, &agfl_bp, NULL);
> > > +	if (error)
> > > +		return error;
> > > +	agfl_bp->b_ops = &xfs_agfl_buf_ops;
> > 
> > Be nice to have a __xfs_alloc_read_agfl() function that didn't set
> > the ops, and have this and xfs_alloc_read_agfl() both call it.
> 
> Huh?  xfs_alloc_read_agfl always reads the agfl buffer with
> &xfs_agfl_buf_ops, why would we want to call it without the verifier?

You wouldn't:

xfs_alloc_read_agfl()
{
	return __xfs_alloc_read_agfl(..., &xfs_agfl_buf_ops);
}

And then the above simply becomes

	error = __xfs_alloc_read_agfl(..., NULL);
	if (error)
		return error;
	agfl_bp->b_ops = &xfs_agfl_buf_ops;

I'm more concerned about open coding of things we have currently
centralised in helpers, and trying to see if there's ways to keep
the functions centralised.

Don't worry about it - it was really just a comment about "it would
be nice to have...".
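
[Editor's note: the shape being sketched here is the common "raw helper
plus thin public wrapper" pattern.  A standalone toy version follows;
the names and types are illustrative stand-ins, not the real
xfs_alloc_read_agfl() signature.]

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the real XFS buffer types. */
struct buf_ops { const char *name; };
struct buf { const struct buf_ops *b_ops; };

static const struct buf_ops agfl_buf_ops = { "xfs_agfl" };

/*
 * Raw read helper: the caller chooses the verifier ops, which may be
 * NULL when the caller (e.g. scrub) wants to verify the buffer itself.
 */
static int __read_agfl(struct buf *bp, const struct buf_ops *ops)
{
	bp->b_ops = ops;
	return 0;		/* pretend the disk read succeeded */
}

/* Public wrapper: normal callers always get the default verifier. */
static int read_agfl(struct buf *bp)
{
	return __read_agfl(bp, &agfl_buf_ops);
}
```

Scrub would then call the raw helper with NULL ops and attach the
verifier after checking the buffer by hand, exactly as in the snippet
above.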

> It's only scrub that gets to do screwy things like read buffers with no
> verifier.  libxfs functions should never do that.

I didn't know (well, I don't recall that) we have this rule. Can you
point me at the discussion so I can read up on it?  IMO libxfs is
for centralising common operations, not for enforcing boundaries or
rules on how we access objects.


> > > +int
> > > +xfs_repair_mod_fdblocks(
> > > +	struct xfs_scrub_context	*sc,
> > > +	int64_t				delta_fdblocks)
> > > +{
> > > +	int				error;
> > > +
> > > +	if (delta_fdblocks == 0)
> > > +		return 0;
> > > +
> > > +	if (delta_fdblocks < 0) {
> > > +		error = xfs_trans_reserve_more(sc->tp, -delta_fdblocks, 0);
> > > +		if (error)
> > > +			return error;
> > > +	}
> > > +
> > > +	xfs_trans_mod_sb(sc->tp, XFS_TRANS_SB_FDBLOCKS, delta_fdblocks);
> > 
> > This seems a little hacky - it's working around a transaction
> > reservation overflow warning, right?
> 
> More than that -- we're trying to avoid the situation where the incore
> free block counter goes negative.

Which will only happen if we overflow the transaction reservation,
yes?

> Things go south pretty quickly when
> that happens because transaction reservations succeed when there's not
> enough free space to accommodate them.  We'd rather error out to
> userspace and have the admin unmount and xfs_repair than risk letting
> the fs really blow up.

Sure, but I really don't like retrospective modification of
transaction reservations.  The repair code is already supposed to
have a reservation that is big enough to rebuild the AG trees, so
why should we need to reserve more space while rebuilding the AG
trees?

> Note that this function has to be called before repair dirties anything
> in the repair transaction so we're still at a place where we could back
> out with no harm done.

Still doesn't explain to me what the problem is that this code works
around. And because I don't understand why it is necessary, this just
seems like a hack....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 04/14] xfs: repair inode btrees
  2018-06-06  3:55     ` Darrick J. Wong
@ 2018-06-06  4:32       ` Dave Chinner
  2018-06-06  4:58         ` Darrick J. Wong
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Chinner @ 2018-06-06  4:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 08:55:28PM -0700, Darrick J. Wong wrote:
> On Mon, Jun 04, 2018 at 01:41:30PM +1000, Dave Chinner wrote:
> > On Wed, May 30, 2018 at 12:31:04PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Use the rmapbt to find inode chunks, query the chunks to compute
> > > hole and free masks, and with that information rebuild the inobt
> > > and finobt.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > [...]
> > 
> > > +xfs_repair_ialloc_check_free(
> > > +	struct xfs_btree_cur	*cur,
> > > +	struct xfs_buf		*bp,
> > > +	xfs_ino_t		fsino,
> > > +	xfs_agino_t		bpino,
> > > +	bool			*inuse)
> > > +{
> > > +	struct xfs_mount	*mp = cur->bc_mp;
> > > +	struct xfs_dinode	*dip;
> > > +	int			error;
> > > +
> > > +	/* Will the in-core inode tell us if it's in use? */
> > > +	error = xfs_icache_inode_is_allocated(mp, cur->bc_tp, fsino, inuse);
> > > +	if (!error)
> > > +		return 0;
> > > +
> > > +	/* Inode uncached or half assembled, read disk buffer */
> > > +	dip = xfs_buf_offset(bp, bpino * mp->m_sb.sb_inodesize);
> > > +	if (be16_to_cpu(dip->di_magic) != XFS_DINODE_MAGIC)
> > > +		return -EFSCORRUPTED;
> > 
> > Do we hold the buffer locked here? i.e. can we race with someone
> > else allocating/freeing/reading the inode?
> 
> I think repair should be ok from alloc/free because both of those paths
> (xfs_dialloc/xfs_difree) will grab the AGI header, whereas repair locks
> all three AG headers and keeps them locked until repairs are complete.
> I don't think we have to worry about concurrent reads because the only
> fields we care about are di_mode/i_mode, which don't change outside of
> inode allocation and freeing.

Comment please :P

And, to be technically correct - di_mode/i_mode can change outside
of alloc/free. However, only the permission bits can change so it
doesn't affect the test we are doing here.

> > > +		/* The per-AG inum of this inode cluster. */
> > > +		agino = XFS_OFFBNO_TO_AGINO(mp, agbno, 0);
> > > +
> > > +		/* The per-AG inum of the inobt record. */
> > > +		startino = rmino +
> > > +				rounddown(agino - rmino, XFS_INODES_PER_CHUNK);
> > > +		cdist = agino - startino;
> > 
> > What's "cdist" mean? I can guess at its meaning, but I don't recall
> > seeing the inode number offset into a cluster being referred to as a
> > distance before....
> 
> cluster offset?
>
> I wasn't sure of the terminology for the offset of the cluster within a
> chunk, in units of ag inodes.

I'm not sure we have one. :/

But, yeah, going by the definition of inode offset from
XFS_INO_TO_OFFSET() and XFS_AGINO_TO_OFFSET() - "offset" is the
inode number index from the start of the block - cluster offset is
probably the best name for it.
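
[Editor's note: to make the arithmetic above concrete, here is a
standalone version of the startino/cluster-offset computation.  The
numbers in the test are chosen for illustration; the constant mirrors
XFS_INODES_PER_CHUNK.]

```c
#include <assert.h>

#define INODES_PER_CHUNK	64	/* mirrors XFS_INODES_PER_CHUNK */

/* rounddown() as in the kernel, for positive step values. */
#define rounddown(x, y)		(((x) / (y)) * (y))

/*
 * Given the first inode covered by the rmap record (rmino) and the
 * first inode of a cluster (agino), compute the per-AG inum of the
 * inobt record containing the cluster (startino) and the cluster's
 * offset within that record -- the "cluster offset", formerly "cdist".
 */
static unsigned int cluster_offset(unsigned int rmino, unsigned int agino,
				   unsigned int *startino)
{
	*startino = rmino + rounddown(agino - rmino, INODES_PER_CHUNK);
	return agino - *startino;
}
```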

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 01/14] xfs: repair the AGF and AGFL
  2018-06-06  4:06       ` Dave Chinner
@ 2018-06-06  4:56         ` Darrick J. Wong
  2018-06-07  0:31           ` Dave Chinner
  0 siblings, 1 reply; 35+ messages in thread
From: Darrick J. Wong @ 2018-06-06  4:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Jun 06, 2018 at 02:06:24PM +1000, Dave Chinner wrote:
> On Tue, Jun 05, 2018 at 04:18:56PM -0700, Darrick J. Wong wrote:
> > On Mon, Jun 04, 2018 at 11:52:55AM +1000, Dave Chinner wrote:
> > > On Wed, May 30, 2018 at 12:30:45PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Regenerate the AGF and AGFL from the rmap data.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > [...]
> > > 
> > > > +/* Repair the AGF. */
> > > > +int
> > > > +xfs_repair_agf(
> > > > +	struct xfs_scrub_context	*sc)
> > > > +{
> > > > +	struct xfs_repair_find_ag_btree	fab[] = {
> > > > +		{
> > > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > > +			.buf_ops = &xfs_allocbt_buf_ops,
> > > > +			.magic = XFS_ABTB_CRC_MAGIC,
> > > > +		},
> > > > +		{
> > > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > > +			.buf_ops = &xfs_allocbt_buf_ops,
> > > > +			.magic = XFS_ABTC_CRC_MAGIC,
> > > > +		},
> > > > +		{
> > > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > > +			.buf_ops = &xfs_rmapbt_buf_ops,
> > > > +			.magic = XFS_RMAP_CRC_MAGIC,
> > > > +		},
> > > > +		{
> > > > +			.rmap_owner = XFS_RMAP_OWN_REFC,
> > > > +			.buf_ops = &xfs_refcountbt_buf_ops,
> > > > +			.magic = XFS_REFC_CRC_MAGIC,
> > > > +		},
> > > > +		{
> > > > +			.buf_ops = NULL,
> > > > +		},
> > > > +	};
> > > > +	struct xfs_repair_agf_allocbt	raa;
> > > > +	struct xfs_agf			old_agf;
> > > > +	struct xfs_mount		*mp = sc->mp;
> > > > +	struct xfs_buf			*agf_bp;
> > > > +	struct xfs_buf			*agfl_bp;
> > > > +	struct xfs_agf			*agf;
> > > > +	struct xfs_btree_cur		*cur = NULL;
> > > > +	struct xfs_perag		*pag;
> > > > +	xfs_agblock_t			blocks;
> > > > +	xfs_agblock_t			freesp_blocks;
> > > > +	int64_t				delta_fdblocks = 0;
> > > > +	int				error;
> > > > +
> > > > +	/* We require the rmapbt to rebuild anything. */
> > > > +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> > > > +		return -EOPNOTSUPP;
> > > > +
> > > > +	xfs_scrub_perag_get(sc->mp, &sc->sa);
> > > > +	pag = sc->sa.pag;
> > > > +	memset(&raa, 0, sizeof(raa));
> > > > +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> > > > +			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGF_DADDR(mp)),
> > > > +			XFS_FSS_TO_BB(mp, 1), 0, &agf_bp, NULL);
> > > > +	if (error)
> > > > +		return error;
> > > > +	agf_bp->b_ops = &xfs_agf_buf_ops;
> > > > +
> > > > +	/*
> > > > +	 * Load the AGFL so that we can screen out OWN_AG blocks that
> > > > +	 * are on the AGFL now; these blocks might have once been part
> > > > +	 * of the bno/cnt/rmap btrees but are not now.
> > > > +	 */
> > > > +	error = xfs_alloc_read_agfl(mp, sc->tp, sc->sa.agno, &agfl_bp);
> > > > +	if (error)
> > > > +		return error;
> > > > +	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(agf_bp), agfl_bp,
> > > > +			xfs_repair_agf_check_agfl_block, sc);
> > > > +	if (error)
> > > > +		return error;
> > > 
> > > THis is a bit of a chicken/egg situation, isn't it? We haven't
> > > repaired the AGFL yet, so how do we know what is valid here?
> > 
> > Yep.  The AGF is corrupt, so we have to trust the AGFL contents because
> > we can't do any serious cross-referencing with any of the btrees rooted
> > in the AGF.  If the AGFL contents are obviously bad then we'll bail out.
> 
> Can you add that as a comment here?

Done.  FWIW each of these functions that gets split out has a nice big
comment above it now.

> > > Can we factor this function along rebuild operation lines?
> > 
> > Yes...
> > 
> > > That will help document all the different pieces it is putting
> > > together. E.g move the AGF header init to before
> > > xfs_repair_find_ag_btree_roots(), and then pass it into
> > > xfs_repair_agf_rebuild_roots() which contains the above fab specific
> > > code.
> > 
> > ...however, that's the second (and admittedly not well documented)
> > chicken-and-egg -- we find the agf btree roots by probing the rmapbt,
> > which is rooted in the agf.  So xfs_repair_find_ag_btree_roots has to be
> > fed the old agf_bp buffer, and if that blows up then we bail out without
> > changing anything.
> 
> Same again - factoring and adding comments to explain things like
> this will make it much easier to understand.

(Done)

> > > > +/* Record all freespace information. */
> > > > +STATIC int
> > > > +xfs_repair_agfl_rmap_fn(
> > > > +	struct xfs_btree_cur		*cur,
> > > > +	struct xfs_rmap_irec		*rec,
> > > > +	void				*priv)
> > > > +{
> > > > +	struct xfs_repair_agfl		*ra = priv;
> > > > +	struct xfs_buf			*bp;
> > > > +	xfs_fsblock_t			fsb;
> > > > +	int				i;
> > > > +	int				error = 0;
> > > > +
> > > > +	if (xfs_scrub_should_terminate(ra->sc, &error))
> > > > +		return error;
> > > > +
> > > > +	/* Record all the OWN_AG blocks... */
> > > > +	if (rec->rm_owner == XFS_RMAP_OWN_AG) {
> > > > +		fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
> > > > +				rec->rm_startblock);
> > > > +		error = xfs_repair_collect_btree_extent(ra->sc,
> > > > +				&ra->freesp_list, fsb, rec->rm_blockcount);
> > > > +		if (error)
> > > > +			return error;
> > > > +	}
> > > > +
> > > > +	/* ...and all the rmapbt blocks... */
> > > > +	for (i = 0; i < cur->bc_nlevels && cur->bc_ptrs[i] == 1; i++) {
> > > 
> > > What is the significance of "cur->bc_ptrs[i] == 1"?
> > > 
> > > This loop looks like it is walking the btree path to this leaf, but
> > > bc_ptrs[] will only have a "1" in it if we are at the left-most edge
> > > of the tree, right? so what about all the other btree blocks?
> > 
> > Close.  We're walking up the tree from the leaf towards the root.  For
> > each level, we assume that if bc_ptrs[level] == 1, then this is the
> > first time we've seen the block at that level, so we remember that we
> > saw this rmapbt block.  bc_ptrs is the offset within a block, not the
> > offset for the entire level.
> > 
> > So if our rmapbt tree is:
> > 
> >    4
> >  / | \
> > 1  2  3
> > 
> > Pretend for this example that each leaf block has 100 rmap records.  For
> > the first rmap record, we'll observe that bc_ptrs[0] == 1, so we record
> > that we saw block 1.  Then we observe that bc_ptrs[1] == 1, so we record
> > block 4.  agmeta_list is [1, 4].
> > 
> > For the second rmap record, we see that bc_ptrs[0] == 2, so we exit the
> > loop.  agmeta_list remains [1, 4].
> > 
> > For the 101st rmap record, we've moved onto leaf block 2.  Now
> > bc_ptrs[0] == 1 again, so we record that we saw block 2.  We see that
> > bc_ptrs[1] == 2, so we exit the loop.  agmeta_list = [1, 4, 2].
> > 
> > For the 102nd rmap, bc_ptrs[0] == 2, so we exit.
> > 
> > For the 201st rmap record, we've moved on to leaf block 3.  bc_ptrs[0]
> > == 1, so we add 3 to agmeta_list.  [1, 4, 2, 3].
> 
> And that is crying out for either an iterator macro or a helper
> function with that explanation above it :P

Ok.  I'll fix up that explanation and make the whole thing a helper
function.
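
[Editor's note: the walk described above can be modelled outside the
kernel to check the invariant.  The struct below is a two-level toy
stand-in for struct xfs_btree_cur, using the example from the thread:
root block 4 over leaves 1, 2, 3 with 100 records per leaf.]

```c
#include <assert.h>

#define NLEVELS		2
#define RECS_PER_LEAF	100

/* Minimal model of the cursor state the loop inspects. */
struct cur {
	int nlevels;
	int ptrs[NLEVELS];	/* 1-based offset within the block, per level */
	int blocks[NLEVELS];	/* which block the cursor is in, per level */
};

/*
 * Record every btree block we are visiting for the first time: at each
 * level, ptrs[level] == 1 means this is the first record we've seen in
 * that block, so the block is new to us.  Mirrors the loop in
 * xfs_repair_agfl_rmap_fn().  Returns the updated count of recorded
 * blocks.
 */
static int visit_new_blocks(const struct cur *cur, int *out, int n)
{
	int i;

	for (i = 0; i < cur->nlevels && cur->ptrs[i] == 1; i++)
		out[n++] = cur->blocks[i];
	return n;
}
```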

> > > > +
> > > > +	/* We require the rmapbt to rebuild anything. */
> > > > +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> > > > +		return -EOPNOTSUPP;
> > > > +
> > > > +	xfs_scrub_perag_get(sc->mp, &sc->sa);
> > > > +	xfs_repair_init_extent_list(&ra.freesp_list);
> > > > +	xfs_repair_init_extent_list(&ra.agmeta_list);
> > > > +	ra.sc = sc;
> > > > +
> > > > +	error = xfs_alloc_read_agf(mp, sc->tp, sc->sa.agno, 0, &agf_bp);
> > > > +	if (error)
> > > > +		return error;
> > > > +	if (!agf_bp)
> > > > +		return -ENOMEM;
> > > > +
> > > > +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> > > > +			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGFL_DADDR(mp)),
> > > > +			XFS_FSS_TO_BB(mp, 1), 0, &agfl_bp, NULL);
> > > > +	if (error)
> > > > +		return error;
> > > > +	agfl_bp->b_ops = &xfs_agfl_buf_ops;
> > > 
> > > Be nice to have a __xfs_alloc_read_agfl() function that didn't set
> > > the ops, and have this and xfs_alloc_read_agfl() both call it.
> > 
> > Huh?  xfs_alloc_read_agfl always reads the agfl buffer with
> > &xfs_agfl_buf_ops, why would we want to call it without the verifier?
> 
> You wouldn't:
> 
> xfs_alloc_read_agfl()
> {
> 	return __xfs_alloc_read_agfl(..., &xfs_agfl_buf_ops);
> }
> 
> And then the above simply becomes
> 
> 	error = __xfs_alloc_read_agfl(..., NULL);
> 	if (error)
> 		return error;
> 	agfl_bp->b_ops = &xfs_agfl_buf_ops;
> 
> I'm more concerned about open coding of things we have currently
> centralised in helpers, and trying to see if there's ways to keep
> the functions centralised.
> 
> Don't worry about it - it was really just a comment about "it would
> be nice to have...".

<nod>

> > It's only scrub that gets to do screwy things like read buffers with no
> > verifier.  libxfs functions should never do that.
> 
> I didn't know (well, I don't recall that) we have this rule. Can you
> point me at the discussion so I can read up on it?  IMO libxfs is
> for centralising common operations, not for enforcing boundaries or
> rules on how we access objects.

Ok, fair enough, I concede. :)

I had simply thought that the convention was that in general we don't
let anyone do that, except for the thing that deals with exceptional
situations.

> 
> > > > +int
> > > > +xfs_repair_mod_fdblocks(
> > > > +	struct xfs_scrub_context	*sc,
> > > > +	int64_t				delta_fdblocks)
> > > > +{
> > > > +	int				error;
> > > > +
> > > > +	if (delta_fdblocks == 0)
> > > > +		return 0;
> > > > +
> > > > +	if (delta_fdblocks < 0) {
> > > > +		error = xfs_trans_reserve_more(sc->tp, -delta_fdblocks, 0);
> > > > +		if (error)
> > > > +			return error;
> > > > +	}
> > > > +
> > > > +	xfs_trans_mod_sb(sc->tp, XFS_TRANS_SB_FDBLOCKS, delta_fdblocks);
> > > 
> > > This seems a little hacky - it's working around a transaction
> > > reservation overflow warning, right?
> > 
> > More than that -- we're trying to avoid the situation where the incore
> > free block counter goes negative.
> 
> Which will only happen if we overflow the transaction reservation,
> yes?
> 
> > Things go south pretty quickly when
> > that happens because transaction reservations succeed when there's not
> > enough free space to accommodate them.  We'd rather error out to
> > userspace and have the admin unmount and xfs_repair than risk letting
> > the fs really blow up.
> 
> Sure, but I really don't like retrospective modification of
> transaction reservations.  The repair code is already supposed to
> have a reservation that is big enough to rebuild the AG trees, so
> why should we need to reserve more space while rebuilding the AG
> trees?
> 
> > Note that this function has to be called before repair dirties anything
> > in the repair transaction so we're still at a place where we could back
> > out with no harm done.
> 
> Still doesn't explain to me what the problem is that this code works
> around. And because I don't understand why it is necessary, this just
> seems like a hack....

It /is/ a hack while I figure out a sane strategy for checking the
summary counters that doesn't regularly shut down the filesystem.  I've
thought that perhaps we should leave the global counters alone.  If
something corrupts agf_flcount to 0x1000032 (when the real flcount is
0x32) then we're going to subtract a huge quantity from the global
counter, which is totally stupid.

I've been mulling over what to do here -- normally, repair always writes
out fresh AG headers and summary counters, and under normal circumstances
we'll always keep the AG counts and the global counts in sync, right?

The idea I have to solve all these problems is to add a superblock state
flag that we set whenever we think the global counters might be wrong.
We'd only do this if we had to fix the counters in an AGF, or if the
agfl padding fixer triggered, etc.

Then we add a new FSSUMMARY scrubber that only does anything if the
'counters might be wrong' flag is set.  When it does, we freeze the fs,
tally up all the counters and reservations, and fix the counters if we
can.  Then unfreeze and exit.  This way we're not locking the entire
filesystem every time scrub runs, but we have a way to fix the global
counters if we need to.  Granted, figuring out the total amount of
incore reservation might not be quick.

The other repair functions no longer need xfs_repair_mod_fdblocks; if
they have to make a correction to any of the per-ag free counters, all
they do is set the "wrong counters" flag.
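
[Editor's note: the proposal above can be sketched as a toy model:
per-AG repairs only flip a "counters might be wrong" flag, and a later
summary pass reconciles the global counter from the per-AG values.
All names and the structure here are illustrative, not the eventual
implementation.]

```c
#include <assert.h>
#include <stdbool.h>

#define AGCOUNT	4

struct fs {
	unsigned long	ag_fdblocks[AGCOUNT];	/* per-AG free block counts */
	unsigned long	fdblocks;		/* global summary counter */
	bool		counters_dirty;		/* "might be wrong" flag */
};

/* A repair that fixes a per-AG counter never touches the global count. */
static void repair_ag_counter(struct fs *fs, int agno, unsigned long good)
{
	if (fs->ag_fdblocks[agno] != good) {
		fs->ag_fdblocks[agno] = good;
		fs->counters_dirty = true;
	}
}

/* The FSSUMMARY pass: recompute the global counter while "frozen". */
static void scrub_fs_summary(struct fs *fs)
{
	unsigned long sum = 0;
	int agno;

	if (!fs->counters_dirty)
		return;
	for (agno = 0; agno < AGCOUNT; agno++)
		sum += fs->ag_fdblocks[agno];
	fs->fdblocks = sum;
	fs->counters_dirty = false;
}
```

Note how the corrupt 0x1000032-style value never gets subtracted from
the global counter; the summary pass simply recomputes from scratch.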

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH 04/14] xfs: repair inode btrees
  2018-06-06  4:32       ` Dave Chinner
@ 2018-06-06  4:58         ` Darrick J. Wong
  0 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2018-06-06  4:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Jun 06, 2018 at 02:32:46PM +1000, Dave Chinner wrote:
> On Tue, Jun 05, 2018 at 08:55:28PM -0700, Darrick J. Wong wrote:
> > On Mon, Jun 04, 2018 at 01:41:30PM +1000, Dave Chinner wrote:
> > > On Wed, May 30, 2018 at 12:31:04PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Use the rmapbt to find inode chunks, query the chunks to compute
> > > > hole and free masks, and with that information rebuild the inobt
> > > > and finobt.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > [...]
> > > 
> > > > +xfs_repair_ialloc_check_free(
> > > > +	struct xfs_btree_cur	*cur,
> > > > +	struct xfs_buf		*bp,
> > > > +	xfs_ino_t		fsino,
> > > > +	xfs_agino_t		bpino,
> > > > +	bool			*inuse)
> > > > +{
> > > > +	struct xfs_mount	*mp = cur->bc_mp;
> > > > +	struct xfs_dinode	*dip;
> > > > +	int			error;
> > > > +
> > > > +	/* Will the in-core inode tell us if it's in use? */
> > > > +	error = xfs_icache_inode_is_allocated(mp, cur->bc_tp, fsino, inuse);
> > > > +	if (!error)
> > > > +		return 0;
> > > > +
> > > > +	/* Inode uncached or half assembled, read disk buffer */
> > > > +	dip = xfs_buf_offset(bp, bpino * mp->m_sb.sb_inodesize);
> > > > +	if (be16_to_cpu(dip->di_magic) != XFS_DINODE_MAGIC)
> > > > +		return -EFSCORRUPTED;
> > > 
> > > Do we hold the buffer locked here? i.e. can we race with someone
> > > else allocating/freeing/reading the inode?
> > 
> > I think repair should be ok from alloc/free because both of those paths
> > (xfs_dialloc/xfs_difree) will grab the AGI header, whereas repair locks
> > all three AG headers and keeps them locked until repairs are complete.
> > I don't think we have to worry about concurrent reads because the only
> > fields we care about are di_mode/i_mode, which don't change outside of
> > inode allocation and freeing.
> 
> Comment please :P
> 
> And, to be technically correct - di_mode/i_mode can change outside
> of alloc/free. However, only the permission bits can change so it
> doesn't affect the test we are doing here.

Done.

> > > > +		/* The per-AG inum of this inode cluster. */
> > > > +		agino = XFS_OFFBNO_TO_AGINO(mp, agbno, 0);
> > > > +
> > > > +		/* The per-AG inum of the inobt record. */
> > > > +		startino = rmino +
> > > > +				rounddown(agino - rmino, XFS_INODES_PER_CHUNK);
> > > > +		cdist = agino - startino;
> > > 
> > > What's "cdist" mean? I can guess at its meaning, but I don't recall
> > > seeing the inode number offset into a cluster being referred to as a
> > > distance before....
> > 
> > cluster offset?
> >
> > I wasn't sure of the terminology for the offset of the cluster within a
> > chunk, in units of ag inodes.
> 
> I'm not sure we have one. :/
> 
> But, yeah, going by the definition of inode offset from
> XFS_INO_TO_OFFSET() and XFS_AGINO_TO_OFFSET() - "offset" is the
> inode number index from the start of the block - cluster offset is
> probably the best name for it.

Yeah, that's what I picked.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH 05/14] xfs: repair the rmapbt
  2018-05-31  5:42   ` Amir Goldstein
@ 2018-06-06 21:13     ` Darrick J. Wong
  0 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2018-06-06 21:13 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: linux-xfs

On Thu, May 31, 2018 at 08:42:06AM +0300, Amir Goldstein wrote:
> On Wed, May 30, 2018 at 10:31 PM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> >
> > Rebuild the reverse mapping btree from all primary metadata.
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/Makefile            |    1
> >  fs/xfs/scrub/common.c      |    6
> >  fs/xfs/scrub/repair.c      |  119 +++++++
> >  fs/xfs/scrub/repair.h      |   27 +
> >  fs/xfs/scrub/rmap.c        |    6
> >  fs/xfs/scrub/rmap_repair.c |  796 ++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/scrub/scrub.c       |   18 +
> >  fs/xfs/scrub/scrub.h       |    2
> >  fs/xfs/xfs_mount.h         |    1
> >  fs/xfs/xfs_super.c         |   27 +
> >  fs/xfs/xfs_trans.c         |    7
> >  11 files changed, 1004 insertions(+), 6 deletions(-)
> >  create mode 100644 fs/xfs/scrub/rmap_repair.c
> >
> >
> > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > index 7c442f83b179..b9bbac3d5075 100644
> > --- a/fs/xfs/Makefile
> > +++ b/fs/xfs/Makefile
> > @@ -178,6 +178,7 @@ xfs-y                               += $(addprefix scrub/, \
> >                                    alloc_repair.o \
> >                                    ialloc_repair.o \
> >                                    repair.o \
> > +                                  rmap_repair.o \
> >                                    )
> >  endif
> >  endif
> > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > index 89938b328954..f92994716522 100644
> > --- a/fs/xfs/scrub/common.c
> > +++ b/fs/xfs/scrub/common.c
> > @@ -603,9 +603,13 @@ xfs_scrub_trans_alloc(
> >         struct xfs_scrub_context        *sc,
> >         uint                            resblks)
> >  {
> > +       uint                            flags = 0;
> > +
> > +       if (sc->fs_frozen)
> > +               flags |= XFS_TRANS_NO_WRITECOUNT;
> >         if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
> >                 return xfs_trans_alloc(sc->mp, &M_RES(sc->mp)->tr_itruncate,
> > -                               resblks, 0, 0, &sc->tp);
> > +                               resblks, 0, flags, &sc->tp);
> >
> >         return xfs_trans_alloc_empty(sc->mp, &sc->tp);
> >  }
> > diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> > index 45a91841c0ac..4b5d599d53b9 100644
> > --- a/fs/xfs/scrub/repair.c
> > +++ b/fs/xfs/scrub/repair.c
> > @@ -43,6 +43,8 @@
> >  #include "xfs_ag_resv.h"
> >  #include "xfs_trans_space.h"
> >  #include "xfs_quota.h"
> > +#include "xfs_bmap.h"
> > +#include "xfs_bmap_util.h"
> >  #include "scrub/xfs_scrub.h"
> >  #include "scrub/scrub.h"
> >  #include "scrub/common.h"
> > @@ -1146,3 +1148,120 @@ xfs_repair_mod_ino_counts(
> >                                 (int64_t)freecount - old_freecount);
> >         }
> >  }
> > +
> > +/*
> > + * Freeze the FS against all other activity so that we can avoid ABBA
> > + * deadlocks while taking locks in unusual orders so that we can rebuild
> > + * metadata structures such as the rmapbt.
> > + */
> > +int
> > +xfs_repair_fs_freeze(
> > +       struct xfs_scrub_context        *sc)
> > +{
> > +       int                             error;
> > +
> > +       error = freeze_super(sc->mp->m_super);
> > +       if (error)
> > +               return error;
> > +       sc->fs_frozen = true;
> > +       return 0;
> > +}
> > +
> > +/* Unfreeze the FS. */
> > +int
> > +xfs_repair_fs_thaw(
> > +       struct xfs_scrub_context        *sc)
> > +{
> > +       struct inode                    *inode, *o;
> > +       int                             error;
> > +
> > +       sc->fs_frozen = false;
> > +       error = thaw_super(sc->mp->m_super);
> > +
> > +       inode = sc->frozen_inode_list;
> > +       while (inode) {
> > +               o = inode->i_private;
> > +               inode->i_private = NULL;
> > +               iput(inode);
> > +               inode = o;
> > +       }
> > +
> > +       return error;
> > +}
> > +
> 
> 
> I think that new mechanism is worth a mention in the commit message,
> if not a patch of its own with cc to fsdevel.
> In a discussion on said patch I would ask: how does xfs_repair_fs_freeze()
> work in collaboration with user-initiated fsfreeze?
> Is there a situation where LVM can be fooled into thinking that XFS is
> really frozen, but it is actually "repair frozen", and metadata can
> change while taking a snapshot?

Notice how xfs added a ->freeze_super handler to the superblock
operations that prohibits userspace from initiating a freeze while any
repair operations are running?  If userspace (lvm, etc.) try to initiate
a freeze while repair is running, the freeze attempt is kicked back to
userspace with -EBUSY.  Similarly, a new xfs ->thaw_super handler
prevents userspace from unfreezing while repair is running.

Granted, the current patch doesn't quite work right either; I've
replaced m_scrubbers with a mutex that repair holds for the duration of
the repair freeze; this way regular freeze/thaw requests will block
until the repair is finished.

> This is why I suggested to add a VFS freeze level, e.g.
> SB_FREEZE_FS_MAINTAINANCE so that you don't publish XFS state
> as SB_FREEZE_COMPLETE while you are modifying metadata on disk.
> It might be sufficient to get XFS to state SB_FREEZE_COMPLETE and
> then up only to SB_FREEZE_FS in xfs_repair_fs_freeze() without
> adding any new states.

I tried that, and it didn't work.  We actually /do/ want to be at
SB_FREEZE_COMPLETE so that repair is the /only/ thread that can change
any filesystem state.  Under normal circumstances, XFS transaction
allocation will block until the SB_FREEZE_COMPLETE condition clears.
This stops any background space reclamation from happening, at least if
it requires a transaction.  Online repair of course grants itself the
ability to run transactions even during FREEZE_COMPLETE.

Will the following comment (to be embedded in the repair code) explain
this all sufficiently?

/*
 * Freezing the Filesystem for a Repair
 * ====================================
 *
 * While most repair activity can occur while the filesystem is live,
 * there are certain scenarios where we cannot tolerate concurrent
 * metadata updates.  We therefore must freeze the filesystem against
 * all other changes.
 *
 * The typical scenarios envisioned for repair freezes are (a) to avoid
 * ABBA deadlocks when we need to take locks in an unusual order; or (b) to
 * update global filesystem state.  For example, reconstruction of a
 * damaged reverse mapping btree requires us to hold the AG header locks
 * while scanning inodes, which goes against the usual inode -> AG
 * header locking order.
 *
 * A note about inode reclaim: when we freeze the filesystem, users
 * can't modify things and periodic background reclaim of speculative
 * preallocations and copy-on-write staging extents is stopped.
 * However, the repair thread must be careful about evicting an inode
 * from memory -- if the eviction would require a transaction, we must
 * defer the iput until after the repair freeze.  The reasons for this
 * are twofold: first, repair already has a transaction and xfs can't
 * nest transactions; and second, we froze the fs to prevent
 * modifications that repair doesn't control directly.
 *
 * Userspace is prevented from freezing or thawing the filesystem during
 * a repair freeze by the ->freeze_super and ->thaw_super superblock
 * operations, which block any changes to the freeze state while a
 * repair freeze is running through the use of the m_repair_freeze
 * mutex.  It only makes sense to run one repair freeze at a time, so
 * the mutex is fine.
 *
 * Repair freezes cannot be initiated during a regular freeze because
 * freeze_super does not allow nested freeze.  Repair activity that does
 * not require a repair freeze is also prevented from running during a
 * regular freeze because transaction allocation blocks on the regular
 * freeze.  We assume that the only other users of
 * XFS_TRANS_NO_WRITECOUNT transactions either aren't modifying space
 * metadata in a way that would affect repair, or that we can inhibit
 * any of the ones that do.
 */
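
[Editor's note: the userspace-exclusion rule in the comment can be
demonstrated with an ordinary mutex.  This is a toy model of the
m_repair_freeze idea; trylock is used so the exclusion is observable in
a single thread, where the earlier draft returned -EBUSY and the later
variant described above simply blocks.]

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Toy stand-in for the per-mount repair freeze mutex. */
static pthread_mutex_t repair_freeze_mutex = PTHREAD_MUTEX_INITIALIZER;

static void repair_freeze_start(void)
{
	pthread_mutex_lock(&repair_freeze_mutex);
}

static void repair_freeze_end(void)
{
	pthread_mutex_unlock(&repair_freeze_mutex);
}

/*
 * Model of ->freeze_super: a userspace freeze request must take the
 * same mutex, so it cannot change the freeze state while a repair
 * freeze is running.
 */
static int user_freeze_attempt(void)
{
	if (pthread_mutex_trylock(&repair_freeze_mutex) != 0)
		return -EBUSY;
	pthread_mutex_unlock(&repair_freeze_mutex);
	return 0;
}
```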

--D

> 
> Thanks,
> Amir.


* Re: [PATCH 01/14] xfs: repair the AGF and AGFL
  2018-06-06  4:56         ` Darrick J. Wong
@ 2018-06-07  0:31           ` Dave Chinner
  2018-06-07  4:42             ` Darrick J. Wong
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Chinner @ 2018-06-07  0:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 09:56:03PM -0700, Darrick J. Wong wrote:
> On Wed, Jun 06, 2018 at 02:06:24PM +1000, Dave Chinner wrote:
> > On Tue, Jun 05, 2018 at 04:18:56PM -0700, Darrick J. Wong wrote:
> > > On Mon, Jun 04, 2018 at 11:52:55AM +1000, Dave Chinner wrote:
> > > > > +int
> > > > > +xfs_repair_mod_fdblocks(
> > > > > +	struct xfs_scrub_context	*sc,
> > > > > +	int64_t				delta_fdblocks)
> > > > > +{
> > > > > +	int				error;
> > > > > +
> > > > > +	if (delta_fdblocks == 0)
> > > > > +		return 0;
> > > > > +
> > > > > +	if (delta_fdblocks < 0) {
> > > > > +		error = xfs_trans_reserve_more(sc->tp, -delta_fdblocks, 0);
> > > > > +		if (error)
> > > > > +			return error;
> > > > > +	}
> > > > > +
> > > > > +	xfs_trans_mod_sb(sc->tp, XFS_TRANS_SB_FDBLOCKS, delta_fdblocks);
> > > > 
> > > > This seems a little hacky - it's working around a transaction
> > > > reservation overflow warning, right?
> > > 
> > > More than that -- we're trying to avoid the situation where the incore
> > > free block counter goes negative.
> > 
> > Which will only happen if we overflow the transaction reservation,
> > yes?
> > 
> > > Things go south pretty quickly when
> > > that happens because transaction reservations succeed when there's not
> > > enough free space to accomodate them.  We'd rather error out to
> > > userspace and have the admin unmount and xfs_repair than risk letting
> > > the fs really blow up.
> > 
> > Sure, but I really don't like retrospective modification of
> > transaction reservations.  The repair code is already supposed to
> > have a reservation that is big enough to rebuild the AG trees, so
> > why should we need to reserve more space while rebuilding the AG
> > trees?
> > 
> > > Note that this function has to be called before repair dirties anything
> > > in the repair transaction so we're still at a place where we could back
> > > out with no harm done.
> > 
> > Still doesn't explain to me what the problem is that this code works
> > around. And because I don't understand why it is necessary, this just
> > seems like a hack....
> 
> It /is/ a hack while I figure out a sane strategy for checking the
> summary counters that doesn't regularly shut down the filesystem.  I've
> thought that perhaps we should leave the global counters alone.  If
> something corrupts agf_flcount to 0x1000032 (when the real flcount is
> 0x32) then we're going to subtract a huge quantity from the global
> counter, which is totally stupid.

But that sort of thing is easy to deal with via bounding:

	min(agf_flcount, XFS_AGFL_SIZE(mp))

I'd like to avoid hacks for the "near to ENOSPC" conditions for the
moment. Repair being unreliable at ENOSPC, or even having repair
shut down because of unexpected ENOSPC is fine for the initial
commits. Document it as a problem that needs fixing and add it to
the list of things that need to be addressed before we can remove the
EXPERIMENTAL tag.

> I've been mulling over what to do here -- normally, repair always writes
> out fresh AG headers and summary counters and under normal circumstances
> we'll always keep the AG counts and the global counts in sync, right?

Userspace repair rebuilds the freespace and inode trees in each ag,
and the rebuild keeps its own count of the free space and the used and
free inodes tracked in the new versions of the trees.  Once all AGs
have been rebuilt, it sums the counts gathered in memory from each
AG, and then it calls sync_sb() to write those aggregated counters
back into the global superblock.

IOWs, userspace repair doesn't rely on the existing counters (on
disk or in memory) at all, nor does it try to keep them up to date
as it goes.  It keeps its own state and assumes nothing else is
modifying the filesystem so it's always going to be correct.

> The idea I have to solve all these problems is to add a superblock state
> flag that we set whenever we think the global counters might be wrong.
> We'd only do this if we had to fix the counters in an AGF, or if the
> agfl padding fixer triggered, etc.

That's pretty much what I suggested via an "unclean unmount" state
flag. That way a new mount would always trigger a resync.

> Then we add a new FSSUMMARY scrubber that only does anything if the
> 'counters might be wrong' flag is set.  When it does, we freeze the fs,
> tally up all the counters and reservations, and fix the counters if we
> can.  Then unfreeze and exit.  This way we're not locking the entire
> filesystem every time scrub runs, but we have a way to fix the global
> counters if we need to.  Granted, figuring out the total amount of
> incore reservation might not be quick.

Right - that's basically the problem doing it at mount time avoids -
having to freeze the filesystem for an unknown amount of time to fix
it up. I agree with you that "unmount/mount" is not really an option
for fixing this up, and that freeze/fix/thaw is a much preferable
option. However, this really is something that needs to be scheduled
for a maintenance period, not be done on a live production
filesystem where a freeze will violate performance SLAs....

IOWs, I think this "sync global counters" op needs to be a separate
admin controlled repair operation, not something we do automatically
as part of the normal scrub/repair process. i.e. when repair
completes, it tells the admin that:

**** IMPORTANT - Repair operations still pending ****

There are pending repair operations that need a quiesced filesystem
to perform. Quiescing the filesystem will block all access to the
filesystem while the repair operation is being performed, so this
should be performed only during a scheduled maintenance period.

To perform the pending repair operations, please run:

<repair prog> --quiesce --pending <mntpt>

**** /IMPORTANT ****

> The other repair functions no longer need xfs_repair_mod_fdblocks; if
> they have to make a correction to any of the per-ag free counters, all
> they do is set the "wrong counters" flag.

*nod*

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 01/14] xfs: repair the AGF and AGFL
  2018-06-07  0:31           ` Dave Chinner
@ 2018-06-07  4:42             ` Darrick J. Wong
  2018-06-08  0:55               ` Dave Chinner
  0 siblings, 1 reply; 35+ messages in thread
From: Darrick J. Wong @ 2018-06-07  4:42 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jun 07, 2018 at 10:31:32AM +1000, Dave Chinner wrote:
> On Tue, Jun 05, 2018 at 09:56:03PM -0700, Darrick J. Wong wrote:
> > On Wed, Jun 06, 2018 at 02:06:24PM +1000, Dave Chinner wrote:
> > > On Tue, Jun 05, 2018 at 04:18:56PM -0700, Darrick J. Wong wrote:
> > > > On Mon, Jun 04, 2018 at 11:52:55AM +1000, Dave Chinner wrote:
> > > > > > +int
> > > > > > +xfs_repair_mod_fdblocks(
> > > > > > +	struct xfs_scrub_context	*sc,
> > > > > > +	int64_t				delta_fdblocks)
> > > > > > +{
> > > > > > +	int				error;
> > > > > > +
> > > > > > +	if (delta_fdblocks == 0)
> > > > > > +		return 0;
> > > > > > +
> > > > > > +	if (delta_fdblocks < 0) {
> > > > > > +		error = xfs_trans_reserve_more(sc->tp, -delta_fdblocks, 0);
> > > > > > +		if (error)
> > > > > > +			return error;
> > > > > > +	}
> > > > > > +
> > > > > > +	xfs_trans_mod_sb(sc->tp, XFS_TRANS_SB_FDBLOCKS, delta_fdblocks);
> > > > > 
> > > > > This seems a little hacky - it's working around a transaction
> > > > > reservation overflow warning, right?
> > > > 
> > > > More than that -- we're trying to avoid the situation where the incore
> > > > free block counter goes negative.
> > > 
> > > Which will only happen if we overflow the transaction reservation,
> > > yes?
> > > 
> > > > Things go south pretty quickly when
> > > > that happens because transaction reservations succeed when there's not
> > > > enough free space to accommodate them.  We'd rather error out to
> > > > userspace and have the admin unmount and xfs_repair than risk letting
> > > > the fs really blow up.
> > > 
> > > Sure, but I really don't like retrospective modification of
> > > transaction reservations.  The repair code is already supposed to
> > > have a reservation that is big enough to rebuild the AG trees, so
> > > why should we need to reserve more space while rebuilding the AG
> > > trees?
> > > 
> > > > Note that this function has to be called before repair dirties anything
> > > > in the repair transaction so we're still at a place where we could back
> > > > out with no harm done.
> > > 
> > > Still doesn't explain to me what the problem is that this code works
> > > around. And because I don't understand why it is necessary, this just
> > > seems like a hack....
> > 
> > It /is/ a hack while I figure out a sane strategy for checking the
> > summary counters that doesn't regularly shut down the filesystem.  I've
> > thought that perhaps we should leave the global counters alone.  If
> > something corrupts agf_flcount to 0x1000032 (when the real flcount is
> > 0x32) then we're going to subtract a huge quantity from the global
> > counter, which is totally stupid.
> 
> But that sort of thing is easy to deal with via bounding:
> 
> 	min(agf_flcount, XFS_AGFL_SIZE(mp))
> 
> I'd like to avoid hacks for the "near to ENOSPC" conditions for the
> moment. Repair being unreliable at ENOSPC, or even having repair
> shut down because of unexpected ENOSPC is fine for the initial
> commits. Document it as a problem that needs fixing and add it to
> the list of things that need to be addressed before we can remove the
> EXPERIMENTAL tag.

Yeah.  I agree that online repair should avoid pushing the system off an
ENOSPC cliff.  Or a corrupted fs cliff of any kind.

> > I've been mulling over what to do here -- normally, repair always writes
> > out fresh AG headers and summary counters and under normal circumstances
> > we'll always keep the AG counts and the global counts in sync, right?
> 
> Userspace repair rebuilds the freespace and inode trees in each ag,
> and the rebuild keeps its own count of the free space and the used and
> free inodes tracked in the new versions of the trees.  Once all AGs
> have been rebuilt, it sums the counts gathered in memory from each
> AG, and then it calls sync_sb() to write those aggregated counters
> back into the global superblock.
> 
> IOWs, userspace repair doesn't rely on the existing counters (on
> disk or in memory) at all, nor does it try to keep them up to date
> as it goes.  It keeps its own state and assumes nothing else is
> modifying the filesystem so it's always going to be correct.

Ok, that's what I thought too.

> > The idea I have to solve all these problems is to add a superblock state
> > flag that we set whenever we think the global counters might be wrong.
> > We'd only do this if we had to fix the counters in an AGF, or if the
> > agfl padding fixer triggered, etc.
> 
> That's pretty much what I suggested via an "unclean unmount" state
> flag. That way a new mount would always trigger a resync.

Heh ok. :)

> > Then we add a new FSSUMMARY scrubber that only does anything if the
> > 'counters might be wrong' flag is set.  When it does, we freeze the fs,
> > tally up all the counters and reservations, and fix the counters if we
> > can.  Then unfreeze and exit.  This way we're not locking the entire
> > filesystem every time scrub runs, but we have a way to fix the global
> > counters if we need to.  Granted, figuring out the total amount of
> > incore reservation might not be quick.
> 
> Right - that's basically the problem doing it at mount time avoids -
> having to freeze the filesystem for an unknown amount of time to fix
> it up. I agree with you that "unmount/mount" is not really an option

(Yeah.  I'll put it out there that mount time quotacheck should (some
day) just be an online repair function that runs at mount, and all the
metadata-corruptions that can stop a mount dead in its tracks
(finobt/refcountbt corruption) should just invoke repair.  But
not for several years while we stabilize this beast....)

> for fixing this up, and that freeze/fix/thaw is a much preferable
> option. However, this really is something that needs to be scheduled
> for a maintenance period, not be done on a live production
> filesystem where a freeze will violate performance SLAs....

[catching the list up with irc]

Agreed.  We can't just decide to freeze the fs, even if it's relatively
fast.  For a while I had ruminated about adding a time budget field to
the scrub ioctl which would cause it to abort (or chicken out) if it
couldn't execute the job within a certain time constraint, but that
mostly just ended up as the current behavior, where specifying more -b
to xfs_scrub adds sleeps in between scrub calls to throttle the disk/cpu
that online repair eats up.

(I figure if that proves particularly irksome we can rip it out before
we remove the EXPERIMENTAL tags.)

> IOWs, I think this "sync global counters" op needs to be a separate
> admin controlled repair operation, not something we do automatically
> as part of the normal scrub/repair process. i.e. when repair
> completes, it tells the admin that:
>
> **** IMPORTANT - Repair operations still pending ****
> 
> There are pending repair operations that need a quiesced filesystem
> to perform. Quiescing the filesystem will block all access to the
> filesystem while the repair operation is being performed, so this
> should be performed only during a scheduled maintenance period.
> 
> To perform the pending repair operations, please run:
> 
> <repair prog> --quiesce --pending <mntpt>

Yeah, I agree that when the kernel tells scrub it avoided doing
something for fear of freezing the fs then it should print something to
the effect of:

"xfs_scrub: 439 repairs completed."
"xfs_scrub: rerun me with --do-the-slow-thing to complete repairs."

> 
> **** /IMPORTANT ****
> 
> > The other repair functions no longer need xfs_repair_mod_fdblocks; if
> > they have to make a correction to any of the per-ag free counters, all
> > they do is set the "wrong counters" flag.
> 
> *nod*

Ok, so two extra scrub ioctl flags, then:

_IFLAG_FREEZE_OK	/* userspace allows repair to freeze the fs */
_OFLAG_AVOIDED_FREEZE	/* would have done something but couldn't freeze */

I'll think about how to add a new scrubber to take care of the global
summary counters, and in the meantime I think I'll nominate online
rmapbt repair and online quotacheck for _IFLAG_FREEZE_OK.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH 01/14] xfs: repair the AGF and AGFL
  2018-06-07  4:42             ` Darrick J. Wong
@ 2018-06-08  0:55               ` Dave Chinner
  2018-06-08  1:23                 ` Darrick J. Wong
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Chinner @ 2018-06-08  0:55 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Jun 06, 2018 at 09:42:55PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 07, 2018 at 10:31:32AM +1000, Dave Chinner wrote:
> > On Tue, Jun 05, 2018 at 09:56:03PM -0700, Darrick J. Wong wrote:
> > > On Wed, Jun 06, 2018 at 02:06:24PM +1000, Dave Chinner wrote:
> > > > On Tue, Jun 05, 2018 at 04:18:56PM -0700, Darrick J. Wong wrote:
> > > > > On Mon, Jun 04, 2018 at 11:52:55AM +1000, Dave Chinner wrote:

[big snip]

I think we're in agreement with the direction we need to head, so
rather than bikeshed it to death, I'll just say "yes, sounds like a
good plan" and leave the rest to you, Darrick. :)

> Ok, so two extra scrub ioctl flags, then:
> 
> _IFLAG_FREEZE_OK	/* userspace allows repair to freeze the fs */
> _OFLAG_AVOIDED_FREEZE	/* would have done something but couldn't freeze */
> 
> I'll think about how to add a new scrubber to take care of the global
> summary counters, and in the meantime I think I'll nominate online
> rmapbt repair and online quotacheck for _IFLAG_FREEZE_OK.

That's reasonable - I'm guessing the new global counter
repair/scrubber will need this too?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 01/14] xfs: repair the AGF and AGFL
  2018-06-08  0:55               ` Dave Chinner
@ 2018-06-08  1:23                 ` Darrick J. Wong
  0 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2018-06-08  1:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Jun 08, 2018 at 10:55:17AM +1000, Dave Chinner wrote:
> On Wed, Jun 06, 2018 at 09:42:55PM -0700, Darrick J. Wong wrote:
> > On Thu, Jun 07, 2018 at 10:31:32AM +1000, Dave Chinner wrote:
> > > On Tue, Jun 05, 2018 at 09:56:03PM -0700, Darrick J. Wong wrote:
> > > > On Wed, Jun 06, 2018 at 02:06:24PM +1000, Dave Chinner wrote:
> > > > > On Tue, Jun 05, 2018 at 04:18:56PM -0700, Darrick J. Wong wrote:
> > > > > > On Mon, Jun 04, 2018 at 11:52:55AM +1000, Dave Chinner wrote:
> 
> [big snip]
> 
> I think we're in agreement with the direction we need to head, so
> rather than bikeshed it to death, I'll just say "yes, sounds like a
> good plan" and leave the rest to you, Darrick. :)

Ok!

> > Ok, so two extra scrub ioctl flags, then:
> > 
> > _IFLAG_FREEZE_OK	/* userspace allows repair to freeze the fs */
> > _OFLAG_AVOIDED_FREEZE	/* would have done something but couldn't freeze */
> > 
> > I'll think about how to add a new scrubber to take care of the global
> > summary counters, and in the meantime I think I'll nominate online
> > rmapbt repair and online quotacheck for _IFLAG_FREEZE_OK.
> 
> That's reasonable - I'm guessing the new global counter
> repair/scrubber will need this too?

Most likely.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


end of thread, other threads:[~2018-06-08  1:23 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-30 19:30 [PATCH v15.2 00/14] xfs-4.18: online repair support Darrick J. Wong
2018-05-30 19:30 ` [PATCH 01/14] xfs: repair the AGF and AGFL Darrick J. Wong
2018-06-04  1:52   ` Dave Chinner
2018-06-05 23:18     ` Darrick J. Wong
2018-06-06  4:06       ` Dave Chinner
2018-06-06  4:56         ` Darrick J. Wong
2018-06-07  0:31           ` Dave Chinner
2018-06-07  4:42             ` Darrick J. Wong
2018-06-08  0:55               ` Dave Chinner
2018-06-08  1:23                 ` Darrick J. Wong
2018-05-30 19:30 ` [PATCH 02/14] xfs: repair the AGI Darrick J. Wong
2018-06-04  1:56   ` Dave Chinner
2018-06-05 23:54     ` Darrick J. Wong
2018-05-30 19:30 ` [PATCH 03/14] xfs: repair free space btrees Darrick J. Wong
2018-06-04  2:12   ` Dave Chinner
2018-06-06  1:50     ` Darrick J. Wong
2018-06-06  3:34       ` Dave Chinner
2018-06-06  4:01         ` Darrick J. Wong
2018-05-30 19:31 ` [PATCH 04/14] xfs: repair inode btrees Darrick J. Wong
2018-06-04  3:41   ` Dave Chinner
2018-06-06  3:55     ` Darrick J. Wong
2018-06-06  4:32       ` Dave Chinner
2018-06-06  4:58         ` Darrick J. Wong
2018-05-30 19:31 ` [PATCH 05/14] xfs: repair the rmapbt Darrick J. Wong
2018-05-31  5:42   ` Amir Goldstein
2018-06-06 21:13     ` Darrick J. Wong
2018-05-30 19:31 ` [PATCH 06/14] xfs: repair refcount btrees Darrick J. Wong
2018-05-30 19:31 ` [PATCH 07/14] xfs: repair inode records Darrick J. Wong
2018-05-30 19:31 ` [PATCH 08/14] xfs: zap broken inode forks Darrick J. Wong
2018-05-30 19:31 ` [PATCH 09/14] xfs: repair inode block maps Darrick J. Wong
2018-05-30 19:31 ` [PATCH 10/14] xfs: repair damaged symlinks Darrick J. Wong
2018-05-30 19:31 ` [PATCH 11/14] xfs: repair extended attributes Darrick J. Wong
2018-05-30 19:31 ` [PATCH 12/14] xfs: scrub should set preen if attr leaf has holes Darrick J. Wong
2018-05-30 19:32 ` [PATCH 13/14] xfs: repair quotas Darrick J. Wong
2018-05-30 19:32 ` [PATCH 14/14] xfs: implement live quotacheck as part of quota repair Darrick J. Wong
