From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: david@fromorbit.com, darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org, xfs@oss.sgi.com
Subject: [PATCH 62/71] xfs_repair: process reverse-mapping data into refcount data
Date: Thu, 25 Aug 2016 16:53:14 -0700
Message-ID: <147216919493.4420.15402732901727826690.stgit@birch.djwong.org>
In-Reply-To: <147216879156.4420.2446767701729565218.stgit@birch.djwong.org>

Take all the reverse-mapping data we've acquired and use it to generate
reference count data.  This data is used in phase 5 to rebuild the
refcount btree.

v2: Update to reflect separation of rmap_irec flags.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase4.c |   27 ++++++
 repair/rmap.c   |  232 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/rmap.h   |    2 
 3 files changed, 259 insertions(+), 2 deletions(-)
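
The sweep described in the long comment added to repair/rmap.c can be
tried in isolation. The following is a minimal standalone sketch, not
part of this patch. It assumes every extent has a nonzero length. The
names struct extent, struct refc and sweep_refcounts() are hypothetical
stand-ins for xfs_rmap_irec, xfs_refcount_irec and compute_refcounts();
a plain pointer array replaces the bag, and qsort() replaces the sorted
slab cursor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for xfs_rmap_irec and xfs_refcount_irec. */
struct extent { uint32_t start; uint32_t len; };	/* len must be > 0 */
struct refc { uint32_t start; uint32_t len; size_t count; };

#define EXT_END(e)	((e)->start + (e)->len)

static int
extent_cmp(const void *a, const void *b)
{
	const struct extent *ea = a, *eb = b;

	if (ea->start != eb->start)
		return ea->start < eb->start ? -1 : 1;
	return 0;
}

/*
 * Sort ext[] by start block, sweep it, and store every range shared by
 * two or more extents in out[].  Returns the number of records written,
 * or -1 if memory runs out or out[] is too small.
 */
static int
sweep_refcounts(struct extent *ext, size_t nr, struct refc *out,
		size_t max_out)
{
	struct extent **bag;	/* extents covering the current block */
	size_t bag_nr = 0, old_nr, next = 0, i;
	uint32_t cbno, nbno;	/* start of this range, next change point */
	int nr_out = 0;

	if (nr == 0)
		return 0;
	bag = malloc(nr * sizeof(*bag));
	if (!bag)
		return -1;
	qsort(ext, nr, sizeof(*ext), extent_cmp);

	while (next < nr) {
		/* The bag is empty here; start a new run of extents. */
		cbno = ext[next].start;
		while (next < nr && ext[next].start == cbno)
			bag[bag_nr++] = &ext[next++];

		while (bag_nr > 0) {
			old_nr = bag_nr;

			/* Find the next block where the bag can change. */
			nbno = next < nr ? ext[next].start : UINT32_MAX;
			for (i = 0; i < bag_nr; i++)
				if (EXT_END(bag[i]) < nbno)
					nbno = EXT_END(bag[i]);
			assert(nbno > cbno);

			/* Pop extents ending at nbno (swap with last). */
			for (i = bag_nr; i-- > 0; )
				if (EXT_END(bag[i]) == nbno)
					bag[i] = bag[--bag_nr];

			/* Push extents starting at nbno. */
			while (next < nr && ext[next].start == nbno)
				bag[bag_nr++] = &ext[next++];

			/* Level changed: emit the old level if shared. */
			if (bag_nr != old_nr) {
				if (old_nr > 1) {
					if ((size_t)nr_out == max_out) {
						free(bag);
						return -1;
					}
					out[nr_out++] = (struct refc){
						cbno, nbno - cbno, old_nr };
				}
				cbno = nbno;
			}
		}
	}
	free(bag);
	return nr_out;
}
```

As in compute_refcounts(), a record is emitted only when the bag size
changes, and it carries the size the bag had before the change, so
neighbouring ranges with the same share count come out as one record.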


diff --git a/repair/phase4.c b/repair/phase4.c
index 9da1bb1..86992c9 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -193,6 +193,21 @@ _("%s while checking reverse-mappings"),
 }
 
 static void
+compute_ag_refcounts(
+	work_queue_t	*wq,
+	xfs_agnumber_t	agno,
+	void		*arg)
+{
+	int		error;
+
+	error = compute_refcounts(wq->mp, agno);
+	if (error)
+		do_error(
+_("%s while computing reference count records.\n"),
+			 strerror(-error));
+}
+
+static void
 process_rmap_data(
 	struct xfs_mount	*mp)
 {
@@ -206,6 +221,14 @@ process_rmap_data(
 	for (i = 0; i < mp->m_sb.sb_agcount; i++)
 		queue_work(&wq, check_rmap_btrees, i, NULL);
 	destroy_work_queue(&wq);
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return;
+
+	create_work_queue(&wq, mp, libxfs_nproc());
+	for (i = 0; i < mp->m_sb.sb_agcount; i++)
+		queue_work(&wq, compute_ag_refcounts, i, NULL);
+	destroy_work_queue(&wq);
 }
 
 void
@@ -359,7 +382,9 @@ phase4(xfs_mount_t *mp)
 
 	/*
 	 * Process all the reverse-mapping data that we collected.  This
-	 * involves checking the rmap data against the btree.
+	 * involves checking the rmap data against the btree, computing
+	 * reference counts based on the rmap data, and checking the counts
+	 * against the refcount btree.
 	 */
 	process_rmap_data(mp);
 
diff --git a/repair/rmap.c b/repair/rmap.c
index 0baf4eb..0753448 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -42,6 +42,7 @@ struct xfs_ag_rmap {
 	int		ar_flcount;		/* agfl entries from leftover */
 						/* agbt allocations */
 	struct xfs_rmap_irec	ar_last_rmap;	/* last rmap seen */
+	struct xfs_slab	*ar_refcount_items;	/* refcount items, p4-5 */
 };
 
 static struct xfs_ag_rmap *ag_rmaps;
@@ -88,7 +89,8 @@ bool
 rmap_needs_work(
 	struct xfs_mount	*mp)
 {
-	return xfs_sb_version_hasrmapbt(&mp->m_sb);
+	return xfs_sb_version_hasreflink(&mp->m_sb) ||
+	       xfs_sb_version_hasrmapbt(&mp->m_sb);
 }
 
 /*
@@ -120,6 +122,11 @@ _("Insufficient memory while allocating reverse mapping slabs."));
 			do_error(
 _("Insufficient memory while allocating raw metadata reverse mapping slabs."));
 		ag_rmaps[i].ar_last_rmap.rm_owner = XFS_RMAP_OWN_UNKNOWN;
+		error = init_slab(&ag_rmaps[i].ar_refcount_items,
+				  sizeof(struct xfs_refcount_irec));
+		if (error)
+			do_error(
+_("Insufficient memory while allocating refcount item slabs."));
 	}
 }
 
@@ -138,6 +145,7 @@ rmaps_free(
 	for (i = 0; i < mp->m_sb.sb_agcount; i++) {
 		free_slab(&ag_rmaps[i].ar_rmaps);
 		free_slab(&ag_rmaps[i].ar_raw_rmaps);
+		free_slab(&ag_rmaps[i].ar_refcount_items);
 	}
 	free(ag_rmaps);
 	ag_rmaps = NULL;
@@ -591,6 +599,228 @@ rmap_dump(
 #endif
 
 /*
+ * Rebuilding the Reference Count & Reverse Mapping Btrees
+ *
+ * The reference count (refcnt) and reverse mapping (rmap) btrees are rebuilt
+ * during phase 5, like all other AG btrees.  Therefore, reverse mappings must
+ * be processed into reference counts at the end of phase 4, and the rmaps must
+ * be recorded during phase 4.  There is a need to access the rmaps in physical
+ * block order, but no particular need for random access, so the slab.c code
+ * provides a big logical array (consisting of smaller slabs) and some in-order
+ * iterator functions.
+ *
+ * Once we've recorded all the reverse mappings, we're ready to translate the
+ * rmaps into refcount entries.  Imagine the rmap entries as rectangles
+ * representing extents of physical blocks, and that the rectangles can be laid
+ * down to allow them to overlap each other; then we know that we must emit
+ * a refcnt btree entry wherever the amount of overlap changes, i.e. the
+ * emission stimulus is level-triggered:
+ *
+ *                 -    ---
+ *       --      ----- ----   ---        ------
+ * --   ----     ----------- ----     ---------
+ * -------------------------------- -----------
+ * ^ ^  ^^ ^^    ^ ^^ ^^^  ^^^^  ^ ^^ ^  ^     ^
+ * 2 1  23 21    3 43 234  2123  1 01 2  3     0
+ *
+ * For our purposes, an rmap is a tuple (startblock, len, fileoff, owner).
+ *
+ * Note that in the actual refcnt btree we don't store the refcount < 2 cases
+ * because the bnobt tells us which blocks are free; single-use blocks aren't
+ * recorded in the bnobt or the refcntbt.  If the rmapbt supports storing
+ * multiple entries covering a given block we could theoretically dispense with
+ * the refcntbt and simply count rmaps, but that's inefficient in the (hot)
+ * write path, so we'll take the cost of the extra tree to save time.  Also
+ * there's no guarantee that rmap will be enabled.
+ *
+ * Given an array of rmaps sorted by physical block number, a starting physical
+ * block (sp), a bag to hold rmaps that cover sp, and the next physical
+ * block where the level changes (np), we can reconstruct the refcount
+ * btree as follows:
+ *
+ * While there are still unprocessed rmaps in the array,
+ *  - Set sp to the physical block (pblk) of the next unprocessed rmap.
+ *  - Add to the bag all rmaps in the array where startblock == sp.
+ *  - Set np to the physical block where the bag size will change.
+ *    This is the minimum of (the pblk of the next unprocessed rmap) and
+ *    (startblock + len of each rmap in the bag).
+ *  - Record the bag size as old_bag_size.
+ *
+ *  - While the bag isn't empty,
+ *     - Remove from the bag all rmaps where startblock + len == np.
+ *     - Add to the bag all rmaps in the array where startblock == np.
+ *     - If the bag size isn't old_bag_size, store the refcount entry
+ *       (sp, np - sp, old_bag_size) in the refcnt btree.
+ *     - If the bag is empty, break out of the inner loop.
+ *     - Set old_bag_size to the bag size
+ *     - Set sp = np.
+ *     - Set np to the physical block where the bag size will change.
+ *       This is the minimum of (the pblk of the next unprocessed rmap) and
+ *       (startblock + len of each rmap in the bag).
+ *
+ * An implementation detail is that because this processing happens during
+ * phase 4, the refcount entries are stored in an array so that phase 5 can
+ * load them into the refcount btree.  The rmaps can be loaded directly into
+ * the rmap btree during phase 5 as well.
+ */
+
+/*
+ * Emit a refcount object for refcntbt reconstruction during phase 5.
+ */
+#define REFCOUNT_CLAMP(nr)	((nr) > MAXREFCOUNT ? MAXREFCOUNT : (nr))
+static void
+refcount_emit(
+	struct xfs_mount		*mp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		len,
+	size_t			nr_rmaps)
+{
+	struct xfs_refcount_irec	rlrec;
+	int			error;
+	struct xfs_slab		*rlslab;
+
+	rlslab = ag_rmaps[agno].ar_refcount_items;
+	ASSERT(nr_rmaps > 0);
+
+	dbg_printf("REFL: agno=%u pblk=%u, len=%u -> refcount=%zu\n",
+		agno, agbno, len, nr_rmaps);
+	rlrec.rc_startblock = agbno;
+	rlrec.rc_blockcount = len;
+	rlrec.rc_refcount = REFCOUNT_CLAMP(nr_rmaps);
+	error = slab_add(rlslab, &rlrec);
+	if (error)
+		do_error(
+_("Insufficient memory while recreating refcount tree."));
+}
+#undef REFCOUNT_CLAMP
+
+/*
+ * Transform a pile of physical block mapping observations into refcount data
+ * for eventual rebuilding of the btrees.
+ */
+#define RMAP_END(r)	((r)->rm_startblock + (r)->rm_blockcount)
+int
+compute_refcounts(
+	struct xfs_mount		*mp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_bag		*stack_top = NULL;
+	struct xfs_slab		*rmaps;
+	struct xfs_slab_cursor	*rmaps_cur;
+	struct xfs_rmap_irec	*array_cur;
+	struct xfs_rmap_irec	*rmap;
+	xfs_agblock_t		sbno;	/* first bno of this rmap set */
+	xfs_agblock_t		cbno;	/* first bno of this refcount set */
+	xfs_agblock_t		nbno;	/* next bno where rmap set changes */
+	size_t			n, idx;
+	size_t			old_stack_nr;
+	int			error;
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return 0;
+
+	rmaps = ag_rmaps[agno].ar_rmaps;
+
+	error = init_slab_cursor(rmaps, rmap_compare, &rmaps_cur);
+	if (error)
+		return error;
+
+	error = init_bag(&stack_top);
+	if (error)
+		goto err;
+
+	/* While there are rmaps to be processed... */
+	n = 0;
+	while (n < slab_count(rmaps)) {
+		array_cur = peek_slab_cursor(rmaps_cur);
+		sbno = cbno = array_cur->rm_startblock;
+		/* Push all rmaps with pblk == sbno onto the stack */
+		for (;
+		     array_cur && array_cur->rm_startblock == sbno;
+		     array_cur = peek_slab_cursor(rmaps_cur)) {
+			advance_slab_cursor(rmaps_cur); n++;
+			rmap_dump("push0", agno, array_cur);
+			error = bag_add(stack_top, array_cur);
+			if (error)
+				goto err;
+		}
+
+		/* Set nbno to the bno of the next refcount change */
+		if (n < slab_count(rmaps))
+			nbno = array_cur->rm_startblock;
+		else
+			nbno = NULLAGBLOCK;
+		foreach_bag_ptr(stack_top, idx, rmap) {
+			nbno = min(nbno, RMAP_END(rmap));
+		}
+
+		/* The next change point must lie beyond sbno */
+		ASSERT(nbno > sbno);
+		old_stack_nr = bag_count(stack_top);
+
+		/* While stack isn't empty... */
+		while (bag_count(stack_top)) {
+			/* Pop all rmaps that end at nbno */
+			foreach_bag_ptr_reverse(stack_top, idx, rmap) {
+				if (RMAP_END(rmap) != nbno)
+					continue;
+				rmap_dump("pop", agno, rmap);
+				error = bag_remove(stack_top, idx);
+				if (error)
+					goto err;
+			}
+
+			/* Push array items that start at nbno */
+			for (;
+			     array_cur && array_cur->rm_startblock == nbno;
+			     array_cur = peek_slab_cursor(rmaps_cur)) {
+				advance_slab_cursor(rmaps_cur); n++;
+				rmap_dump("push1", agno, array_cur);
+				error = bag_add(stack_top, array_cur);
+				if (error)
+					goto err;
+			}
+
+			/* Emit refcount if necessary */
+			ASSERT(nbno > cbno);
+			if (bag_count(stack_top) != old_stack_nr) {
+				if (old_stack_nr > 1) {
+					refcount_emit(mp, agno, cbno,
+						      nbno - cbno,
+						      old_stack_nr);
+				}
+				cbno = nbno;
+			}
+
+			/* Stack empty, go find the next rmap */
+			if (bag_count(stack_top) == 0)
+				break;
+			old_stack_nr = bag_count(stack_top);
+			sbno = nbno;
+
+			/* Set nbno to the bno of the next refcount change */
+			if (n < slab_count(rmaps))
+				nbno = array_cur->rm_startblock;
+			else
+				nbno = NULLAGBLOCK;
+			foreach_bag_ptr(stack_top, idx, rmap) {
+				nbno = min(nbno, RMAP_END(rmap));
+			}
+
+			/* The next change point must lie beyond sbno */
+			ASSERT(nbno > sbno);
+		}
+	}
+err:
+	free_bag(&stack_top);
+	free_slab_cursor(&rmaps_cur);
+
+	return error;
+}
+#undef RMAP_END
+
+/*
  * Return the number of rmap objects for an AG.
  */
 size_t
diff --git a/repair/rmap.h b/repair/rmap.h
index 7106dfc..01dec9f 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -49,6 +49,8 @@ extern __int64_t rmap_diffkeys(struct xfs_rmap_irec *kp1,
 extern void rmap_high_key_from_rec(struct xfs_rmap_irec *rec,
 		struct xfs_rmap_irec *key);
 
+extern int compute_refcounts(struct xfs_mount *, xfs_agnumber_t);
+
 extern void fix_freelist(struct xfs_mount *, xfs_agnumber_t, bool);
 extern void rmap_store_agflcount(struct xfs_mount *, xfs_agnumber_t, int);
 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
