linux-xfs.vger.kernel.org archive mirror
* [PATCHSET v23.1 0/5] xfs: detect incorrect gaps in refcount btree
@ 2022-10-02 18:20 Darrick J. Wong
  2022-10-02 18:20 ` [PATCH 3/5] xfs: mask key comparisons for keyspace fill scans Darrick J. Wong
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Darrick J. Wong @ 2022-10-02 18:20 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

Hi all,

The next few patchsets address a deficiency in scrub that I found while
QAing the refcount btree scrubber.  If there's a gap between refcount
records, we need to cross-reference that gap with the reverse mappings
to ensure that there are no overlapping records in the rmap btree.  If
we find any, then the refcount btree is not consistent.  This property
is not specific to the refcount btree; all of the btrees need this
sort of keyspace scanning logic to detect inconsistencies.
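
(For example: if the refcountbt has records covering agbno ranges
[10, 20) and [40, 50), then every block in the gap [20, 40) has a
reference count of at most 1 and so must be covered by at most one
reverse mapping.  A second rmapbt record overlapping any block in that
gap means the refcountbt is missing a record.)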

To do this accurately, we need to be able to scan the keyspace of a
btree (which we already do) and tell the caller whether that keyspace
is empty, sparse, or fully covered by records.  The first few patches
add the keyspace scanner to the generic btree code, along with the
ability to mask off parts of btree keys, because when we scan the
rmapbt, we only care about space usage, not the owners.
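
As a rough sketch (using the names these patches introduce), a
scrubber consumes the result of the new scanner like this:

	enum xfs_btree_keyfill	keyfill;
	int			error;

	error = xfs_rmap_scan_keyfill(cur, bno, len, &keyfill);
	if (error)
		return error;

	switch (keyfill) {
	case XFS_BTREE_KEYFILL_EMPTY:
		/* no records at all in [bno, bno + len) */
		break;
	case XFS_BTREE_KEYFILL_SPARSE:
		/* some of the range is mapped by records, some is not */
		break;
	case XFS_BTREE_KEYFILL_FULL:
		/* every block in the range is mapped by a record */
		break;
	}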

The final patch closes the scanning gap in the refcountbt scanner.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-refcount-gaps

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-detect-refcount-gaps
---
 fs/xfs/libxfs/xfs_alloc.c          |   11 ++-
 fs/xfs/libxfs/xfs_alloc.h          |    4 +
 fs/xfs/libxfs/xfs_alloc_btree.c    |   16 ++++
 fs/xfs/libxfs/xfs_bmap_btree.c     |   16 ++++
 fs/xfs/libxfs/xfs_btree.c          |  140 +++++++++++++++++++++++++++++++-----
 fs/xfs/libxfs/xfs_btree.h          |   23 +++++-
 fs/xfs/libxfs/xfs_ialloc_btree.c   |   17 ++++
 fs/xfs/libxfs/xfs_refcount.c       |   11 ++-
 fs/xfs/libxfs/xfs_refcount.h       |    5 +
 fs/xfs/libxfs/xfs_refcount_btree.c |   16 ++++
 fs/xfs/libxfs/xfs_rmap.c           |   17 +++-
 fs/xfs/libxfs/xfs_rmap.h           |    4 +
 fs/xfs/libxfs/xfs_rmap_btree.c     |   53 ++++++++++++++
 fs/xfs/libxfs/xfs_types.h          |   12 +++
 fs/xfs/scrub/agheader.c            |    5 +
 fs/xfs/scrub/alloc.c               |    7 +-
 fs/xfs/scrub/bmap.c                |   11 ++-
 fs/xfs/scrub/ialloc.c              |    2 +-
 fs/xfs/scrub/inode.c               |    1 +
 fs/xfs/scrub/refcount.c            |  112 +++++++++++++++++++++++++++--
 fs/xfs/scrub/rmap.c                |    6 +-
 fs/xfs/scrub/scrub.h               |    2 +
 22 files changed, 432 insertions(+), 59 deletions(-)



* [PATCH 1/5] xfs: replace xfs_btree_has_record with a general keyspace scanner
  2022-10-02 18:20 [PATCHSET v23.1 0/5] xfs: detect incorrect gaps in refcount btree Darrick J. Wong
  2022-10-02 18:20 ` [PATCH 3/5] xfs: mask key comparisons for keyspace fill scans Darrick J. Wong
  2022-10-02 18:20 ` [PATCH 2/5] xfs: refactor converting btree irec to btree key Darrick J. Wong
@ 2022-10-02 18:20 ` Darrick J. Wong
  2022-11-02  1:47   ` Dave Chinner
  2022-10-02 18:20 ` [PATCH 4/5] xfs: check the reference counts of gaps in the refcount btree Darrick J. Wong
  2022-10-02 18:20 ` [PATCH 5/5] xfs: ensure that all metadata and data blocks are not cow staging extents Darrick J. Wong
  4 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2022-10-02 18:20 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The current implementation of xfs_btree_has_record returns true if it
finds /any/ record within the given range.  Unfortunately, that's not
sufficient for scrub.  We want to be able to tell if a range of keyspace
for a btree is devoid of records, is totally mapped to records, or is
somewhere in between.  By forcing this to be a boolean, we lump the
"in between" case together with "totally mapped" and return incorrect
results.
Fix the API so that we can tell the caller which of those three is the
current state.
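
Roughly, each caller converts like this (see the diff below for the
real call sites):

	/* before: did _any_ record overlap the range? */
	bool			exists;

	error = xfs_btree_has_record(cur, &low, &high, &exists);

	/* after: how much of the range do records cover? */
	enum xfs_btree_keyfill	outcome;

	error = xfs_btree_scan_keyfill(cur, &low, &high, &outcome);
	/* outcome is XFS_BTREE_KEYFILL_{EMPTY,SPARSE,FULL} */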

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_alloc.c          |   11 +++-
 fs/xfs/libxfs/xfs_alloc.h          |    4 +-
 fs/xfs/libxfs/xfs_alloc_btree.c    |   13 +++++
 fs/xfs/libxfs/xfs_bmap_btree.c     |   13 +++++
 fs/xfs/libxfs/xfs_btree.c          |   92 +++++++++++++++++++++++++++++++-----
 fs/xfs/libxfs/xfs_btree.h          |   15 +++++-
 fs/xfs/libxfs/xfs_ialloc_btree.c   |   14 +++++
 fs/xfs/libxfs/xfs_refcount.c       |   11 +++-
 fs/xfs/libxfs/xfs_refcount.h       |    5 +-
 fs/xfs/libxfs/xfs_refcount_btree.c |   13 +++++
 fs/xfs/libxfs/xfs_rmap.c           |   12 +++--
 fs/xfs/libxfs/xfs_rmap.h           |    4 +-
 fs/xfs/libxfs/xfs_rmap_btree.c     |   13 +++++
 fs/xfs/libxfs/xfs_types.h          |   12 +++++
 fs/xfs/scrub/alloc.c               |    6 +-
 fs/xfs/scrub/refcount.c            |    7 ++-
 fs/xfs/scrub/rmap.c                |    6 +-
 17 files changed, 209 insertions(+), 42 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index e2bdf089c0a3..f0c92093db0a 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -3498,13 +3498,16 @@ xfs_alloc_query_all(
 	return xfs_btree_query_all(cur, xfs_alloc_query_range_helper, &query);
 }
 
-/* Is there a record covering a given extent? */
+/*
+ * Scan part of the keyspace of the free space and tell us if the area has no
+ * records, is fully mapped by records, or is partially filled.
+ */
 int
-xfs_alloc_has_record(
+xfs_alloc_scan_keyfill(
 	struct xfs_btree_cur	*cur,
 	xfs_agblock_t		bno,
 	xfs_extlen_t		len,
-	bool			*exists)
+	enum xfs_btree_keyfill	*outcome)
 {
 	union xfs_btree_irec	low;
 	union xfs_btree_irec	high;
@@ -3514,7 +3517,7 @@ xfs_alloc_has_record(
 	memset(&high, 0xFF, sizeof(high));
 	high.a.ar_startblock = bno + len - 1;
 
-	return xfs_btree_has_record(cur, &low, &high, exists);
+	return xfs_btree_scan_keyfill(cur, &low, &high, outcome);
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 2c3f762dfb58..ebda867aa6f4 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -194,8 +194,8 @@ int xfs_alloc_query_range(struct xfs_btree_cur *cur,
 int xfs_alloc_query_all(struct xfs_btree_cur *cur, xfs_alloc_query_range_fn fn,
 		void *priv);
 
-int xfs_alloc_has_record(struct xfs_btree_cur *cur, xfs_agblock_t bno,
-		xfs_extlen_t len, bool *exist);
+int xfs_alloc_scan_keyfill(struct xfs_btree_cur *cur, xfs_agblock_t bno,
+		xfs_extlen_t len, enum xfs_btree_keyfill *outcome);
 
 typedef int (*xfs_agfl_walk_fn)(struct xfs_mount *mp, xfs_agblock_t bno,
 		void *priv);
diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
index 549a3cba0234..916d278204f5 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.c
+++ b/fs/xfs/libxfs/xfs_alloc_btree.c
@@ -423,6 +423,18 @@ xfs_cntbt_recs_inorder(
 		 be32_to_cpu(r2->alloc.ar_startblock));
 }
 
+STATIC bool
+xfs_allocbt_has_key_gap(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_key	*key1,
+	const union xfs_btree_key	*key2)
+{
+	xfs_agblock_t			next;
+
+	next = be32_to_cpu(key1->alloc.ar_startblock) + 1;
+	return next != be32_to_cpu(key2->alloc.ar_startblock);
+}
+
 static const struct xfs_btree_ops xfs_bnobt_ops = {
 	.rec_len		= sizeof(xfs_alloc_rec_t),
 	.key_len		= sizeof(xfs_alloc_key_t),
@@ -443,6 +455,7 @@ static const struct xfs_btree_ops xfs_bnobt_ops = {
 	.diff_two_keys		= xfs_bnobt_diff_two_keys,
 	.keys_inorder		= xfs_bnobt_keys_inorder,
 	.recs_inorder		= xfs_bnobt_recs_inorder,
+	.has_key_gap		= xfs_allocbt_has_key_gap,
 };
 
 static const struct xfs_btree_ops xfs_cntbt_ops = {
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index cfa052d40105..d1225b957649 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -518,6 +518,18 @@ xfs_bmbt_recs_inorder(
 		xfs_bmbt_disk_get_startoff(&r2->bmbt);
 }
 
+STATIC bool
+xfs_bmbt_has_key_gap(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_key	*key1,
+	const union xfs_btree_key	*key2)
+{
+	xfs_fileoff_t			next;
+
+	next = be64_to_cpu(key1->bmbt.br_startoff) + 1;
+	return next != be64_to_cpu(key2->bmbt.br_startoff);
+}
+
 static const struct xfs_btree_ops xfs_bmbt_ops = {
 	.rec_len		= sizeof(xfs_bmbt_rec_t),
 	.key_len		= sizeof(xfs_bmbt_key_t),
@@ -538,6 +550,7 @@ static const struct xfs_btree_ops xfs_bmbt_ops = {
 	.buf_ops		= &xfs_bmbt_buf_ops,
 	.keys_inorder		= xfs_bmbt_keys_inorder,
 	.recs_inorder		= xfs_bmbt_recs_inorder,
+	.has_key_gap		= xfs_bmbt_has_key_gap,
 };
 
 /*
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 4c16c8c31fcb..5710d3ee582a 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -5008,34 +5008,100 @@ xfs_btree_diff_two_ptrs(
 	return (int64_t)be32_to_cpu(a->s) - be32_to_cpu(b->s);
 }
 
-/* If there's an extent, we're done. */
+struct xfs_btree_scan_keyfill {
+	/* Keys for the start and end of the range we want to know about. */
+	union xfs_btree_key	start_key;
+	union xfs_btree_key	end_key;
+
+	/* Highest record key we've seen so far. */
+	union xfs_btree_key	high_key;
+
+	enum xfs_btree_keyfill	outcome;
+};
+
 STATIC int
-xfs_btree_has_record_helper(
+xfs_btree_scan_keyfill_helper(
 	struct xfs_btree_cur		*cur,
 	const union xfs_btree_rec	*rec,
 	void				*priv)
 {
-	return -ECANCELED;
+	union xfs_btree_key		rec_key;
+	union xfs_btree_key		rec_high_key;
+	struct xfs_btree_scan_keyfill	*info = priv;
+	int64_t				res;
+
+	cur->bc_ops->init_key_from_rec(&rec_key, rec);
+
+	if (info->outcome == XFS_BTREE_KEYFILL_EMPTY) {
+		info->outcome = XFS_BTREE_KEYFILL_SPARSE;
+
+		/* Bail if the first record starts after the start key. */
+		res = cur->bc_ops->diff_two_keys(cur, &info->start_key,
+				&rec_key);
+		if (res < 0)
+			return -ECANCELED;
+	} else {
+		/* Bail if there's a gap with the previous record. */
+		if (cur->bc_ops->has_key_gap(cur, &info->high_key, &rec_key))
+			return -ECANCELED;
+	}
+
+	/* If the current record is higher than what we've seen, remember it. */
+	cur->bc_ops->init_high_key_from_rec(&rec_high_key, rec);
+	res = cur->bc_ops->diff_two_keys(cur, &rec_high_key, &info->high_key);
+	if (res > 0)
+		info->high_key = rec_high_key; /* struct copy */
+
+	return 0;
 }
 
-/* Is there a record covering a given range of keys? */
+/*
+ * Scan part of the keyspace of a btree and tell us if the area has no records,
+ * is fully mapped by records, or is partially filled.
+ */
 int
-xfs_btree_has_record(
+xfs_btree_scan_keyfill(
 	struct xfs_btree_cur		*cur,
 	const union xfs_btree_irec	*low,
 	const union xfs_btree_irec	*high,
-	bool				*exists)
+	enum xfs_btree_keyfill		*outcome)
 {
+	struct xfs_btree_scan_keyfill	info = {
+		.outcome		= XFS_BTREE_KEYFILL_EMPTY,
+	};
+	union xfs_btree_rec		rec;
+	int64_t				res;
 	int				error;
 
+	if (!cur->bc_ops->has_key_gap)
+		return -EOPNOTSUPP;
+
+	cur->bc_rec = *low;
+	cur->bc_ops->init_rec_from_cur(cur, &rec);
+	cur->bc_ops->init_key_from_rec(&info.start_key, &rec);
+
+	cur->bc_rec = *high;
+	cur->bc_ops->init_rec_from_cur(cur, &rec);
+	cur->bc_ops->init_key_from_rec(&info.end_key, &rec);
+
 	error = xfs_btree_query_range(cur, low, high,
-			&xfs_btree_has_record_helper, NULL);
-	if (error == -ECANCELED) {
-		*exists = true;
-		return 0;
-	}
-	*exists = false;
-	return error;
+			xfs_btree_scan_keyfill_helper, &info);
+	if (error == -ECANCELED)
+		goto out;
+	if (error)
+		return error;
+
+	if (info.outcome == XFS_BTREE_KEYFILL_EMPTY)
+		goto out;
+
+	/* Did the record set go at least as far as the end? */
+	res = cur->bc_ops->diff_two_keys(cur, &info.high_key, &info.end_key);
+	if (res >= 0)
+		info.outcome = XFS_BTREE_KEYFILL_FULL;
+
+out:
+	*outcome = info.outcome;
+	return 0;
 }
 
 /* Are there more records in this btree? */
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index eef27858a013..58a05f0d1f1b 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -157,6 +157,11 @@ struct xfs_btree_ops {
 	int	(*recs_inorder)(struct xfs_btree_cur *cur,
 				const union xfs_btree_rec *r1,
 				const union xfs_btree_rec *r2);
+
+	/* decide if there's a gap in the keyspace between two keys */
+	bool	(*has_key_gap)(struct xfs_btree_cur *cur,
+			       const union xfs_btree_key *key1,
+			       const union xfs_btree_key *key2);
 };
 
 /*
@@ -540,9 +545,15 @@ void xfs_btree_get_keys(struct xfs_btree_cur *cur,
 		struct xfs_btree_block *block, union xfs_btree_key *key);
 union xfs_btree_key *xfs_btree_high_key_from_key(struct xfs_btree_cur *cur,
 		union xfs_btree_key *key);
-int xfs_btree_has_record(struct xfs_btree_cur *cur,
+typedef bool (*xfs_btree_key_gap_fn)(struct xfs_btree_cur *cur,
+		const union xfs_btree_key *key1,
+		const union xfs_btree_key *key2);
+
+int xfs_btree_scan_keyfill(struct xfs_btree_cur *cur,
 		const union xfs_btree_irec *low,
-		const union xfs_btree_irec *high, bool *exists);
+		const union xfs_btree_irec *high,
+		enum xfs_btree_keyfill *outcome);
+
 bool xfs_btree_has_more_records(struct xfs_btree_cur *cur);
 struct xfs_ifork *xfs_btree_ifork_ptr(struct xfs_btree_cur *cur);
 
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index 8c83e265770c..fd48b95b4f4e 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -380,6 +380,18 @@ xfs_inobt_recs_inorder(
 		be32_to_cpu(r2->inobt.ir_startino);
 }
 
+STATIC bool
+xfs_inobt_has_key_gap(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_key	*key1,
+	const union xfs_btree_key	*key2)
+{
+	xfs_agino_t			next;
+
+	next = be32_to_cpu(key1->inobt.ir_startino) + XFS_INODES_PER_CHUNK;
+	return next != be32_to_cpu(key2->inobt.ir_startino);
+}
+
 static const struct xfs_btree_ops xfs_inobt_ops = {
 	.rec_len		= sizeof(xfs_inobt_rec_t),
 	.key_len		= sizeof(xfs_inobt_key_t),
@@ -399,6 +411,7 @@ static const struct xfs_btree_ops xfs_inobt_ops = {
 	.diff_two_keys		= xfs_inobt_diff_two_keys,
 	.keys_inorder		= xfs_inobt_keys_inorder,
 	.recs_inorder		= xfs_inobt_recs_inorder,
+	.has_key_gap		= xfs_inobt_has_key_gap,
 };
 
 static const struct xfs_btree_ops xfs_finobt_ops = {
@@ -420,6 +433,7 @@ static const struct xfs_btree_ops xfs_finobt_ops = {
 	.diff_two_keys		= xfs_inobt_diff_two_keys,
 	.keys_inorder		= xfs_inobt_keys_inorder,
 	.recs_inorder		= xfs_inobt_recs_inorder,
+	.has_key_gap		= xfs_inobt_has_key_gap,
 };
 
 /*
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 64b910caafaa..607fd25fda56 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -1766,13 +1766,16 @@ xfs_refcount_recover_cow_leftovers(
 	return error;
 }
 
-/* Is there a record covering a given extent? */
+/*
+ * Scan part of the keyspace of the refcount records and tell us if the area
+ * has no records, is fully mapped by records, or is partially filled.
+ */
 int
-xfs_refcount_has_record(
+xfs_refcount_scan_keyfill(
 	struct xfs_btree_cur	*cur,
 	xfs_agblock_t		bno,
 	xfs_extlen_t		len,
-	bool			*exists)
+	enum xfs_btree_keyfill	*outcome)
 {
 	union xfs_btree_irec	low;
 	union xfs_btree_irec	high;
@@ -1782,7 +1785,7 @@ xfs_refcount_has_record(
 	memset(&high, 0xFF, sizeof(high));
 	high.rc.rc_startblock = bno + len - 1;
 
-	return xfs_btree_has_record(cur, &low, &high, exists);
+	return xfs_btree_scan_keyfill(cur, &low, &high, outcome);
 }
 
 int __init
diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h
index e8b322de7f3d..14b8edc289fa 100644
--- a/fs/xfs/libxfs/xfs_refcount.h
+++ b/fs/xfs/libxfs/xfs_refcount.h
@@ -78,8 +78,9 @@ extern int xfs_refcount_recover_cow_leftovers(struct xfs_mount *mp,
  */
 #define XFS_REFCOUNT_ITEM_OVERHEAD	32
 
-extern int xfs_refcount_has_record(struct xfs_btree_cur *cur,
-		xfs_agblock_t bno, xfs_extlen_t len, bool *exists);
+extern int xfs_refcount_scan_keyfill(struct xfs_btree_cur *cur,
+		xfs_agblock_t bno, xfs_extlen_t len,
+		enum xfs_btree_keyfill *outcome);
 union xfs_btree_rec;
 extern void xfs_refcount_btrec_to_irec(const union xfs_btree_rec *rec,
 		struct xfs_refcount_irec *irec);
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
index 316c1ec0c3c2..f7982b2ecc49 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.c
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -290,6 +290,18 @@ xfs_refcountbt_recs_inorder(
 		be32_to_cpu(r2->refc.rc_startblock);
 }
 
+STATIC bool
+xfs_refcountbt_has_key_gap(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_key	*key1,
+	const union xfs_btree_key	*key2)
+{
+	xfs_agblock_t			next;
+
+	next = be32_to_cpu(key1->refc.rc_startblock) + 1;
+	return next != be32_to_cpu(key2->refc.rc_startblock);
+}
+
 static const struct xfs_btree_ops xfs_refcountbt_ops = {
 	.rec_len		= sizeof(struct xfs_refcount_rec),
 	.key_len		= sizeof(struct xfs_refcount_key),
@@ -309,6 +321,7 @@ static const struct xfs_btree_ops xfs_refcountbt_ops = {
 	.diff_two_keys		= xfs_refcountbt_diff_two_keys,
 	.keys_inorder		= xfs_refcountbt_keys_inorder,
 	.recs_inorder		= xfs_refcountbt_recs_inorder,
+	.has_key_gap		= xfs_refcountbt_has_key_gap,
 };
 
 /*
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 094dfc897ebc..08d47cbf4697 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -2671,13 +2671,17 @@ xfs_rmap_compare(
 		return 0;
 }
 
-/* Is there a record covering a given extent? */
+/*
+ * Scan the physical storage part of the keyspace of the reverse mapping index
+ * and tell us if the area has no records, is fully mapped by records, or is
+ * partially filled.
+ */
 int
-xfs_rmap_has_record(
+xfs_rmap_scan_keyfill(
 	struct xfs_btree_cur	*cur,
 	xfs_agblock_t		bno,
 	xfs_extlen_t		len,
-	bool			*exists)
+	enum xfs_btree_keyfill	*outcome)
 {
 	union xfs_btree_irec	low;
 	union xfs_btree_irec	high;
@@ -2687,7 +2691,7 @@ xfs_rmap_has_record(
 	memset(&high, 0xFF, sizeof(high));
 	high.r.rm_startblock = bno + len - 1;
 
-	return xfs_btree_has_record(cur, &low, &high, exists);
+	return xfs_btree_scan_keyfill(cur, &low, &high, outcome);
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_rmap.h b/fs/xfs/libxfs/xfs_rmap.h
index 54741a591a17..263a2dd09216 100644
--- a/fs/xfs/libxfs/xfs_rmap.h
+++ b/fs/xfs/libxfs/xfs_rmap.h
@@ -192,8 +192,8 @@ int xfs_rmap_compare(const struct xfs_rmap_irec *a,
 union xfs_btree_rec;
 int xfs_rmap_btrec_to_irec(const union xfs_btree_rec *rec,
 		struct xfs_rmap_irec *irec);
-int xfs_rmap_has_record(struct xfs_btree_cur *cur, xfs_agblock_t bno,
-		xfs_extlen_t len, bool *exists);
+int xfs_rmap_scan_keyfill(struct xfs_btree_cur *cur, xfs_agblock_t bno,
+		xfs_extlen_t len, enum xfs_btree_keyfill *outcome);
 int xfs_rmap_record_exists(struct xfs_btree_cur *cur, xfs_agblock_t bno,
 		xfs_extlen_t len, const struct xfs_owner_info *oinfo,
 		bool *has_rmap);
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
index e2e1f68cedf5..d64143a842ce 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -433,6 +433,18 @@ xfs_rmapbt_recs_inorder(
 	return 0;
 }
 
+STATIC bool
+xfs_rmapbt_has_key_gap(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_key	*key1,
+	const union xfs_btree_key	*key2)
+{
+	xfs_agblock_t			next;
+
+	next = be32_to_cpu(key1->rmap.rm_startblock) + 1;
+	return next != be32_to_cpu(key2->rmap.rm_startblock);
+}
+
 static const struct xfs_btree_ops xfs_rmapbt_ops = {
 	.rec_len		= sizeof(struct xfs_rmap_rec),
 	.key_len		= 2 * sizeof(struct xfs_rmap_key),
@@ -452,6 +464,7 @@ static const struct xfs_btree_ops xfs_rmapbt_ops = {
 	.diff_two_keys		= xfs_rmapbt_diff_two_keys,
 	.keys_inorder		= xfs_rmapbt_keys_inorder,
 	.recs_inorder		= xfs_rmapbt_recs_inorder,
+	.has_key_gap		= xfs_rmapbt_has_key_gap,
 };
 
 static struct xfs_btree_cur *
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index a6b7d98cf68f..d63637a3b873 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -174,6 +174,18 @@ enum xfs_ag_resv_type {
 	XFS_AG_RESV_RMAPBT,
 };
 
+/* Results of scanning a btree keyspace to check occupancy. */
+enum xfs_btree_keyfill {
+	/* None of the keyspace maps to records. */
+	XFS_BTREE_KEYFILL_EMPTY = 0,
+
+	/* Some, but not all, of the keyspace maps to records. */
+	XFS_BTREE_KEYFILL_SPARSE,
+
+	/* The entire keyspace maps to records. */
+	XFS_BTREE_KEYFILL_FULL,
+};
+
 /*
  * Type verifier functions
  */
diff --git a/fs/xfs/scrub/alloc.c b/fs/xfs/scrub/alloc.c
index 4d9ccc15b048..d8f2ba7efa22 100644
--- a/fs/xfs/scrub/alloc.c
+++ b/fs/xfs/scrub/alloc.c
@@ -146,15 +146,15 @@ xchk_xref_is_used_space(
 	xfs_agblock_t		agbno,
 	xfs_extlen_t		len)
 {
-	bool			is_freesp;
+	enum xfs_btree_keyfill	keyfill;
 	int			error;
 
 	if (!sc->sa.bno_cur || xchk_skip_xref(sc->sm))
 		return;
 
-	error = xfs_alloc_has_record(sc->sa.bno_cur, agbno, len, &is_freesp);
+	error = xfs_alloc_scan_keyfill(sc->sa.bno_cur, agbno, len, &keyfill);
 	if (!xchk_should_check_xref(sc, &error, &sc->sa.bno_cur))
 		return;
-	if (is_freesp)
+	if (keyfill != XFS_BTREE_KEYFILL_EMPTY)
 		xchk_btree_xref_set_corrupt(sc, sc->sa.bno_cur, 0);
 }
diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c
index 4e70cd821b62..bfd48aaceb82 100644
--- a/fs/xfs/scrub/refcount.c
+++ b/fs/xfs/scrub/refcount.c
@@ -475,15 +475,16 @@ xchk_xref_is_not_shared(
 	xfs_agblock_t		agbno,
 	xfs_extlen_t		len)
 {
-	bool			shared;
+	enum xfs_btree_keyfill	keyfill;
 	int			error;
 
 	if (!sc->sa.refc_cur || xchk_skip_xref(sc->sm))
 		return;
 
-	error = xfs_refcount_has_record(sc->sa.refc_cur, agbno, len, &shared);
+	error = xfs_refcount_scan_keyfill(sc->sa.refc_cur, agbno, len,
+			&keyfill);
 	if (!xchk_should_check_xref(sc, &error, &sc->sa.refc_cur))
 		return;
-	if (shared)
+	if (keyfill != XFS_BTREE_KEYFILL_EMPTY)
 		xchk_btree_xref_set_corrupt(sc, sc->sa.refc_cur, 0);
 }
diff --git a/fs/xfs/scrub/rmap.c b/fs/xfs/scrub/rmap.c
index afc4f840b6bc..6525202c6a8a 100644
--- a/fs/xfs/scrub/rmap.c
+++ b/fs/xfs/scrub/rmap.c
@@ -224,15 +224,15 @@ xchk_xref_has_no_owner(
 	xfs_agblock_t		bno,
 	xfs_extlen_t		len)
 {
-	bool			has_rmap;
+	enum xfs_btree_keyfill	keyfill;
 	int			error;
 
 	if (!sc->sa.rmap_cur || xchk_skip_xref(sc->sm))
 		return;
 
-	error = xfs_rmap_has_record(sc->sa.rmap_cur, bno, len, &has_rmap);
+	error = xfs_rmap_scan_keyfill(sc->sa.rmap_cur, bno, len, &keyfill);
 	if (!xchk_should_check_xref(sc, &error, &sc->sa.rmap_cur))
 		return;
-	if (has_rmap)
+	if (keyfill != XFS_BTREE_KEYFILL_EMPTY)
 		xchk_btree_xref_set_corrupt(sc, sc->sa.rmap_cur, 0);
 }



* [PATCH 2/5] xfs: refactor converting btree irec to btree key
  2022-10-02 18:20 [PATCHSET v23.1 0/5] xfs: detect incorrect gaps in refcount btree Darrick J. Wong
  2022-10-02 18:20 ` [PATCH 3/5] xfs: mask key comparisons for keyspace fill scans Darrick J. Wong
@ 2022-10-02 18:20 ` Darrick J. Wong
  2022-11-02  0:31   ` Dave Chinner
  2022-10-02 18:20 ` [PATCH 1/5] xfs: replace xfs_btree_has_record with a general keyspace scanner Darrick J. Wong
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2022-10-02 18:20 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

We keep doing these conversions to support btree queries, so refactor
this into a helper.
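
Both xfs_btree_query_range() and the new keyfill scanner need to turn
an incore irec into a btree key, so the helper below wraps the
init_rec_from_cur/init_key_from_rec two-step in one place.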

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_btree.c |   33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 5710d3ee582a..edea6db8d8e4 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -4918,6 +4918,19 @@ xfs_btree_overlapped_query_range(
 	return error;
 }
 
+static inline void
+xfs_btree_key_from_irec(
+	struct xfs_btree_cur		*cur,
+	union xfs_btree_key		*key,
+	const union xfs_btree_irec	*irec)
+{
+	union xfs_btree_rec		rec;
+
+	cur->bc_rec = *irec;
+	cur->bc_ops->init_rec_from_cur(cur, &rec);
+	cur->bc_ops->init_key_from_rec(key, &rec);
+}
+
 /*
  * Query a btree for all records overlapping a given interval of keys.  The
  * supplied function will be called with each record found; return one of the
@@ -4932,18 +4945,12 @@ xfs_btree_query_range(
 	xfs_btree_query_range_fn	fn,
 	void				*priv)
 {
-	union xfs_btree_rec		rec;
 	union xfs_btree_key		low_key;
 	union xfs_btree_key		high_key;
 
 	/* Find the keys of both ends of the interval. */
-	cur->bc_rec = *high_rec;
-	cur->bc_ops->init_rec_from_cur(cur, &rec);
-	cur->bc_ops->init_key_from_rec(&high_key, &rec);
-
-	cur->bc_rec = *low_rec;
-	cur->bc_ops->init_rec_from_cur(cur, &rec);
-	cur->bc_ops->init_key_from_rec(&low_key, &rec);
+	xfs_btree_key_from_irec(cur, &high_key, high_rec);
+	xfs_btree_key_from_irec(cur, &low_key, low_rec);
 
 	/* Enforce low key < high key. */
 	if (cur->bc_ops->diff_two_keys(cur, &low_key, &high_key) > 0)
@@ -5069,20 +5076,14 @@ xfs_btree_scan_keyfill(
 	struct xfs_btree_scan_keyfill	info = {
 		.outcome		= XFS_BTREE_KEYFILL_EMPTY,
 	};
-	union xfs_btree_rec		rec;
 	int64_t				res;
 	int				error;
 
 	if (!cur->bc_ops->has_key_gap)
 		return -EOPNOTSUPP;
 
-	cur->bc_rec = *low;
-	cur->bc_ops->init_rec_from_cur(cur, &rec);
-	cur->bc_ops->init_key_from_rec(&info.start_key, &rec);
-
-	cur->bc_rec = *high;
-	cur->bc_ops->init_rec_from_cur(cur, &rec);
-	cur->bc_ops->init_key_from_rec(&info.end_key, &rec);
+	xfs_btree_key_from_irec(cur, &info.start_key, low);
+	xfs_btree_key_from_irec(cur, &info.end_key, high);
 
 	error = xfs_btree_query_range(cur, low, high,
 			xfs_btree_scan_keyfill_helper, &info);



* [PATCH 3/5] xfs: mask key comparisons for keyspace fill scans
  2022-10-02 18:20 [PATCHSET v23.1 0/5] xfs: detect incorrect gaps in refcount btree Darrick J. Wong
@ 2022-10-02 18:20 ` Darrick J. Wong
  2022-11-02  2:16   ` Dave Chinner
  2022-10-02 18:20 ` [PATCH 2/5] xfs: refactor converting btree irec to btree key Darrick J. Wong
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2022-10-02 18:20 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

For keyspace fullness scans, we want to be able to mask off the parts of
the key that we don't care about.  For most btree types we /do/ want the
full keyspace, but for checking that a given space usage also has a full
complement of rmapbt records (even if different/multiple owners) we need
this masking so that we only track sparseness of rm_startblock, not the
whole keyspace (which is extremely sparse).
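
Concretely, the rmapbt caller builds a mask irec with only
rm_startblock set, so that key comparisons ignore the owner and offset
fields (see the diff below):

	union xfs_btree_irec	mask;

	/* Only care about space scans here */
	memset(&mask, 0, sizeof(mask));
	memset(&mask.r.rm_startblock, 0xFF, sizeof(mask.r.rm_startblock));

	error = xfs_btree_scan_keyfill(cur, &low, &high, &mask, &outcome);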

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_alloc.c          |    2 +-
 fs/xfs/libxfs/xfs_alloc_btree.c    |    5 +++-
 fs/xfs/libxfs/xfs_bmap_btree.c     |    5 +++-
 fs/xfs/libxfs/xfs_btree.c          |   41 +++++++++++++++++++++++++++----
 fs/xfs/libxfs/xfs_btree.h          |   10 +++++++-
 fs/xfs/libxfs/xfs_ialloc_btree.c   |    7 ++++-
 fs/xfs/libxfs/xfs_refcount.c       |    2 +-
 fs/xfs/libxfs/xfs_refcount_btree.c |    5 +++-
 fs/xfs/libxfs/xfs_rmap.c           |    7 +++++
 fs/xfs/libxfs/xfs_rmap_btree.c     |   48 +++++++++++++++++++++++++++++++++---
 10 files changed, 114 insertions(+), 18 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index f0c92093db0a..0ce1f914f7cc 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -3517,7 +3517,7 @@ xfs_alloc_scan_keyfill(
 	memset(&high, 0xFF, sizeof(high));
 	high.a.ar_startblock = bno + len - 1;
 
-	return xfs_btree_scan_keyfill(cur, &low, &high, outcome);
+	return xfs_btree_scan_keyfill(cur, &low, &high, NULL, outcome);
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
index 916d278204f5..4a69d767a32b 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.c
+++ b/fs/xfs/libxfs/xfs_alloc_btree.c
@@ -427,10 +427,13 @@ STATIC bool
 xfs_allocbt_has_key_gap(
 	struct xfs_btree_cur		*cur,
 	const union xfs_btree_key	*key1,
-	const union xfs_btree_key	*key2)
+	const union xfs_btree_key	*key2,
+	const union xfs_btree_key	*mask)
 {
 	xfs_agblock_t			next;
 
+	ASSERT(!mask || mask->alloc.ar_startblock);
+
 	next = be32_to_cpu(key1->alloc.ar_startblock) + 1;
 	return next != be32_to_cpu(key2->alloc.ar_startblock);
 }
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index d1225b957649..81433a027912 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -522,10 +522,13 @@ STATIC bool
 xfs_bmbt_has_key_gap(
 	struct xfs_btree_cur		*cur,
 	const union xfs_btree_key	*key1,
-	const union xfs_btree_key	*key2)
+	const union xfs_btree_key	*key2,
+	const union xfs_btree_key	*mask)
 {
 	xfs_fileoff_t			next;
 
+	ASSERT(!mask || mask->bmbt.br_startoff);
+
 	next = be64_to_cpu(key1->bmbt.br_startoff) + 1;
 	return next != be64_to_cpu(key2->bmbt.br_startoff);
 }
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index edea6db8d8e4..6fbce2f3c17e 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -5020,12 +5020,33 @@ struct xfs_btree_scan_keyfill {
 	union xfs_btree_key	start_key;
 	union xfs_btree_key	end_key;
 
+	/* Mask for key comparisons, if desired. */
+	union xfs_btree_key	*key_mask;
+
 	/* Highest record key we've seen so far. */
 	union xfs_btree_key	high_key;
 
 	enum xfs_btree_keyfill	outcome;
 };
 
+STATIC int64_t
+xfs_btree_diff_two_masked_keys(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_key	*key1,
+	const union xfs_btree_key	*key2,
+	const union xfs_btree_key	*mask)
+{
+	union xfs_btree_key		mk1, mk2;
+
+	if (likely(!mask))
+		return cur->bc_ops->diff_two_keys(cur, key1, key2);
+
+	cur->bc_ops->mask_key(cur, &mk1, key1, mask);
+	cur->bc_ops->mask_key(cur, &mk2, key2, mask);
+
+	return cur->bc_ops->diff_two_keys(cur, &mk1, &mk2);
+}
+
 STATIC int
 xfs_btree_scan_keyfill_helper(
 	struct xfs_btree_cur		*cur,
@@ -5043,19 +5064,22 @@ xfs_btree_scan_keyfill_helper(
 		info->outcome = XFS_BTREE_KEYFILL_SPARSE;
 
 		/* Bail if the first record starts after the start key. */
-		res = cur->bc_ops->diff_two_keys(cur, &info->start_key,
-				&rec_key);
+		res = xfs_btree_diff_two_masked_keys(cur, &info->start_key,
+				&rec_key, info->key_mask);
 		if (res < 0)
 			return -ECANCELED;
 	} else {
 		/* Bail if there's a gap with the previous record. */
-		if (cur->bc_ops->has_key_gap(cur, &info->high_key, &rec_key))
+		if (cur->bc_ops->has_key_gap(cur, &info->high_key, &rec_key,
+					info->key_mask))
 			return -ECANCELED;
 	}
 
 	/* If the current record is higher than what we've seen, remember it. */
 	cur->bc_ops->init_high_key_from_rec(&rec_high_key, rec);
-	res = cur->bc_ops->diff_two_keys(cur, &rec_high_key, &info->high_key);
+
+	res = xfs_btree_diff_two_masked_keys(cur, &rec_high_key,
+			&info->high_key, info->key_mask);
 	if (res > 0)
 		info->high_key = rec_high_key; /* struct copy */
 
@@ -5071,11 +5095,13 @@ xfs_btree_scan_keyfill(
 	struct xfs_btree_cur		*cur,
 	const union xfs_btree_irec	*low,
 	const union xfs_btree_irec	*high,
+	const union xfs_btree_irec	*mask,
 	enum xfs_btree_keyfill		*outcome)
 {
 	struct xfs_btree_scan_keyfill	info = {
 		.outcome		= XFS_BTREE_KEYFILL_EMPTY,
 	};
+	union xfs_btree_key		key_mask;
 	int64_t				res;
 	int				error;
 
@@ -5084,6 +5110,10 @@ xfs_btree_scan_keyfill(
 
 	xfs_btree_key_from_irec(cur, &info.start_key, low);
 	xfs_btree_key_from_irec(cur, &info.end_key, high);
+	if (mask) {
+		xfs_btree_key_from_irec(cur, &key_mask, mask);
+		info.key_mask = &key_mask;
+	}
 
 	error = xfs_btree_query_range(cur, low, high,
 			xfs_btree_scan_keyfill_helper, &info);
@@ -5096,7 +5126,8 @@ xfs_btree_scan_keyfill(
 		goto out;
 
 	/* Did the record set go at least as far as the end? */
-	res = cur->bc_ops->diff_two_keys(cur, &info.high_key, &info.end_key);
+	res = xfs_btree_diff_two_masked_keys(cur, &info.high_key,
+			&info.end_key, info.key_mask);
 	if (res >= 0)
 		info.outcome = XFS_BTREE_KEYFILL_FULL;
 
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 58a05f0d1f1b..99baa8283049 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -158,10 +158,17 @@ struct xfs_btree_ops {
 				const union xfs_btree_rec *r1,
 				const union xfs_btree_rec *r2);
 
+	/* mask a key for us */
+	void	(*mask_key)(struct xfs_btree_cur *cur,
+			    union xfs_btree_key *out_key,
+			    const union xfs_btree_key *in_key,
+			    const union xfs_btree_key *mask);
+
 	/* decide if there's a gap in the keyspace between two keys */
 	bool	(*has_key_gap)(struct xfs_btree_cur *cur,
 			       const union xfs_btree_key *key1,
-			       const union xfs_btree_key *key2);
+			       const union xfs_btree_key *key2,
+			       const union xfs_btree_key *mask);
 };
 
 /*
@@ -552,6 +559,7 @@ typedef bool (*xfs_btree_key_gap_fn)(struct xfs_btree_cur *cur,
 int xfs_btree_scan_keyfill(struct xfs_btree_cur *cur,
 		const union xfs_btree_irec *low,
 		const union xfs_btree_irec *high,
+		const union xfs_btree_irec *mask,
 		enum xfs_btree_keyfill *outcome);
 
 bool xfs_btree_has_more_records(struct xfs_btree_cur *cur);
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index fd48b95b4f4e..d429ca8d9dd8 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -384,11 +384,14 @@ STATIC bool
 xfs_inobt_has_key_gap(
 	struct xfs_btree_cur		*cur,
 	const union xfs_btree_key	*key1,
-	const union xfs_btree_key	*key2)
+	const union xfs_btree_key	*key2,
+	const union xfs_btree_key	*mask)
 {
 	xfs_agino_t			next;
 
-	next = be32_to_cpu(key1->inobt.ir_startino) + XFS_INODES_PER_CHUNK;
+	ASSERT(!mask || mask->inobt.ir_startino);
+
+	next = be32_to_cpu(key1->inobt.ir_startino) + 1;
 	return next != be32_to_cpu(key2->inobt.ir_startino);
 }
 
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 607fd25fda56..3ce77c9d2504 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -1785,7 +1785,7 @@ xfs_refcount_scan_keyfill(
 	memset(&high, 0xFF, sizeof(high));
 	high.rc.rc_startblock = bno + len - 1;
 
-	return xfs_btree_scan_keyfill(cur, &low, &high, outcome);
+	return xfs_btree_scan_keyfill(cur, &low, &high, NULL, outcome);
 }
 
 int __init
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
index f7982b2ecc49..301d036f7081 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.c
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -294,10 +294,13 @@ STATIC bool
 xfs_refcountbt_has_key_gap(
 	struct xfs_btree_cur		*cur,
 	const union xfs_btree_key	*key1,
-	const union xfs_btree_key	*key2)
+	const union xfs_btree_key	*key2,
+	const union xfs_btree_key	*mask)
 {
 	xfs_agblock_t			next;
 
+	ASSERT(!mask || mask->refc.rc_startblock);
+
 	next = be32_to_cpu(key1->refc.rc_startblock) + 1;
 	return next != be32_to_cpu(key2->refc.rc_startblock);
 }
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 08d47cbf4697..4c123b6dd080 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -2685,13 +2685,18 @@ xfs_rmap_scan_keyfill(
 {
 	union xfs_btree_irec	low;
 	union xfs_btree_irec	high;
+	union xfs_btree_irec	mask;
+
+	/* Only care about space scans here */
+	memset(&mask, 0, sizeof(mask));
+	memset(&mask.r.rm_startblock, 0xFF, sizeof(mask.r.rm_startblock));
 
 	memset(&low, 0, sizeof(low));
 	low.r.rm_startblock = bno;
 	memset(&high, 0xFF, sizeof(high));
 	high.r.rm_startblock = bno + len - 1;
 
-	return xfs_btree_scan_keyfill(cur, &low, &high, outcome);
+	return xfs_btree_scan_keyfill(cur, &low, &high, &mask, outcome);
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
index d64143a842ce..9ca60f709c4b 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -433,16 +433,55 @@ xfs_rmapbt_recs_inorder(
 	return 0;
 }
 
+STATIC void
+xfs_rmapbt_mask_key(
+	struct xfs_btree_cur		*cur,
+	union xfs_btree_key		*out_key,
+	const union xfs_btree_key	*in_key,
+	const union xfs_btree_key	*mask)
+{
+	memset(out_key, 0, sizeof(union xfs_btree_key));
+
+	if (mask->rmap.rm_startblock)
+		out_key->rmap.rm_startblock = in_key->rmap.rm_startblock;
+	if (mask->rmap.rm_owner)
+		out_key->rmap.rm_owner = in_key->rmap.rm_owner;
+	if (mask->rmap.rm_offset)
+		out_key->rmap.rm_offset = in_key->rmap.rm_offset;
+}
+
 STATIC bool
 xfs_rmapbt_has_key_gap(
 	struct xfs_btree_cur		*cur,
 	const union xfs_btree_key	*key1,
-	const union xfs_btree_key	*key2)
+	const union xfs_btree_key	*key2,
+	const union xfs_btree_key	*mask)
 {
-	xfs_agblock_t			next;
+	bool				reflink = xfs_has_reflink(cur->bc_mp);
+	uint64_t			x, y;
 
-	next = be32_to_cpu(key1->rmap.rm_startblock) + 1;
-	return next != be32_to_cpu(key2->rmap.rm_startblock);
+	if (mask->rmap.rm_offset) {
+		x = be64_to_cpu(key1->rmap.rm_offset) + 1;
+		y = be64_to_cpu(key2->rmap.rm_offset);
+		if ((reflink && x < y) || (!reflink && x != y))
+			return true;
+	}
+
+	if (mask->rmap.rm_owner) {
+		x = be64_to_cpu(key1->rmap.rm_owner) + 1;
+		y = be64_to_cpu(key2->rmap.rm_owner);
+		if ((reflink && x < y) || (!reflink && x != y))
+			return true;
+	}
+
+	if (mask->rmap.rm_startblock) {
+		x = be32_to_cpu(key1->rmap.rm_startblock) + 1;
+		y = be32_to_cpu(key2->rmap.rm_startblock);
+		if ((reflink && x < y) || (!reflink && x != y))
+			return true;
+	}
+
+	return false;
 }
 
 static const struct xfs_btree_ops xfs_rmapbt_ops = {
@@ -465,6 +504,7 @@ static const struct xfs_btree_ops xfs_rmapbt_ops = {
 	.keys_inorder		= xfs_rmapbt_keys_inorder,
 	.recs_inorder		= xfs_rmapbt_recs_inorder,
 	.has_key_gap		= xfs_rmapbt_has_key_gap,
+	.mask_key		= xfs_rmapbt_mask_key,
 };
 
 static struct xfs_btree_cur *



* [PATCH 4/5] xfs: check the reference counts of gaps in the refcount btree
  2022-10-02 18:20 [PATCHSET v23.1 0/5] xfs: detect incorrect gaps in refcount btree Darrick J. Wong
                   ` (2 preceding siblings ...)
  2022-10-02 18:20 ` [PATCH 1/5] xfs: replace xfs_btree_has_record with a general keyspace scanner Darrick J. Wong
@ 2022-10-02 18:20 ` Darrick J. Wong
  2022-11-02  2:20   ` Dave Chinner
  2022-10-02 18:20 ` [PATCH 5/5] xfs: ensure that all metadata and data blocks are not cow staging extents Darrick J. Wong
  4 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2022-10-02 18:20 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Gaps in the reference count btree are also significant -- for these
regions, there must not be any overlapping reverse mappings.  We don't
currently check this, so make the refcount scrubber more complete.
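
For example, if one shared extent ends at agbno 100 and the next
begins at 200, the scrubber walks every rmapbt record in [100, 199]
and remembers where each mapping ends.  If a record covering
[150, 170) is followed by one starting at 160, the second mapping
begins before the first one ends, those blocks are shared, and the
missing refcountbt record is flagged as a cross-referencing failure.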

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/refcount.c |   84 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 79 insertions(+), 5 deletions(-)


diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c
index bfd48aaceb82..d97e7e372b9c 100644
--- a/fs/xfs/scrub/refcount.c
+++ b/fs/xfs/scrub/refcount.c
@@ -332,14 +332,69 @@ xchk_refcountbt_xref(
 	xchk_refcountbt_xref_rmap(sc, agbno, len, refcount);
 }
 
+struct xchk_refcbt_records {
+	/* The next AG block where we aren't expecting shared extents. */
+	xfs_agblock_t		next_unshared_agbno;
+
+	/* Number of CoW blocks we expect. */
+	xfs_agblock_t		cow_blocks;
+};
+
+STATIC int
+xchk_refcountbt_rmap_check_gap(
+	struct xfs_btree_cur		*cur,
+	const struct xfs_rmap_irec	*rec,
+	void				*priv)
+{
+	xfs_agblock_t			*next_bno = priv;
+
+	if (*next_bno != NULLAGBLOCK && rec->rm_startblock < *next_bno)
+		return -ECANCELED;
+
+	*next_bno = rec->rm_startblock + rec->rm_blockcount;
+	return 0;
+}
+
+/*
+ * Make sure that a gap in the reference count records does not correspond to
+ * overlapping records (i.e. shared extents) in the reverse mappings.
+ */
+static inline void
+xchk_refcountbt_xref_gaps(
+	struct xfs_scrub	*sc,
+	struct xchk_refcbt_records *rrc,
+	xfs_agblock_t		bno)
+{
+	struct xfs_rmap_irec	low;
+	struct xfs_rmap_irec	high;
+	xfs_agblock_t		next_bno = NULLAGBLOCK;
+	int			error;
+
+	if (bno <= rrc->next_unshared_agbno || !sc->sa.rmap_cur ||
+	    xchk_skip_xref(sc->sm))
+		return;
+
+	memset(&low, 0, sizeof(low));
+	low.rm_startblock = rrc->next_unshared_agbno;
+	memset(&high, 0xFF, sizeof(high));
+	high.rm_startblock = bno - 1;
+
+	error = xfs_rmap_query_range(sc->sa.rmap_cur, &low, &high,
+			xchk_refcountbt_rmap_check_gap, &next_bno);
+	if (error == -ECANCELED)
+		xchk_btree_xref_set_corrupt(sc, sc->sa.rmap_cur, 0);
+	else
+		xchk_should_check_xref(sc, &error, &sc->sa.rmap_cur);
+}
+
 /* Scrub a refcountbt record. */
 STATIC int
 xchk_refcountbt_rec(
 	struct xchk_btree	*bs,
 	const union xfs_btree_rec *rec)
 {
-	xfs_agblock_t		*cow_blocks = bs->private;
 	struct xfs_perag	*pag = bs->cur->bc_ag.pag;
+	struct xchk_refcbt_records *rrc = bs->private;
 	xfs_agblock_t		bno;
 	xfs_extlen_t		len;
 	xfs_nlink_t		refcount;
@@ -354,7 +409,7 @@ xchk_refcountbt_rec(
 	if ((refcount == 1 && !has_cowflag) || (refcount != 1 && has_cowflag))
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 	if (has_cowflag)
-		(*cow_blocks) += len;
+		rrc->cow_blocks += len;
 
 	/* Check the extent. */
 	bno &= ~XFS_REFC_COW_START;
@@ -368,6 +423,16 @@ xchk_refcountbt_rec(
 
 	xchk_refcountbt_xref(bs->sc, bno, len, refcount);
 
+	/*
+	 * If this is a record for a shared extent, check that all blocks
+	 * between the previous record and this one have at most one reverse
+	 * mapping.
+	 */
+	if (!has_cowflag) {
+		xchk_refcountbt_xref_gaps(bs->sc, rrc, bno);
+		rrc->next_unshared_agbno = bno + len;
+	}
+
 	return 0;
 }
 
@@ -409,15 +474,24 @@ int
 xchk_refcountbt(
 	struct xfs_scrub	*sc)
 {
-	xfs_agblock_t		cow_blocks = 0;
+	struct xchk_refcbt_records rrc = {
+		.cow_blocks		= 0,
+		.next_unshared_agbno	= 0,
+	};
 	int			error;
 
 	error = xchk_btree(sc, sc->sa.refc_cur, xchk_refcountbt_rec,
-			&XFS_RMAP_OINFO_REFC, &cow_blocks);
+			&XFS_RMAP_OINFO_REFC, &rrc);
 	if (error)
 		return error;
 
-	xchk_refcount_xref_rmap(sc, cow_blocks);
+	/*
+	 * Check that all blocks between the last refcount > 1 record and the
+	 * end of the AG have at most one reverse mapping.
+	 */
+	xchk_refcountbt_xref_gaps(sc, &rrc, sc->mp->m_sb.sb_agblocks);
+
+	xchk_refcount_xref_rmap(sc, rrc.cow_blocks);
 
 	return 0;
 }



* [PATCH 5/5] xfs: ensure that all metadata and data blocks are not cow staging extents
  2022-10-02 18:20 [PATCHSET v23.1 0/5] xfs: detect incorrect gaps in refcount btree Darrick J. Wong
                   ` (3 preceding siblings ...)
  2022-10-02 18:20 ` [PATCH 4/5] xfs: check the reference counts of gaps in the refcount btree Darrick J. Wong
@ 2022-10-02 18:20 ` Darrick J. Wong
  2022-11-02  0:29   ` Dave Chinner
  4 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2022-10-02 18:20 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Make sure that all filesystem metadata blocks and file data blocks are
not also marked as CoW staging extents.  The extra checking added here
was inspired by an actual VM host filesystem corruption incident due to
bugs in the CoW handling of 4.x kernels.
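
(CoW staging extents are tracked in the refcount btree with
rc_startblock offset by XFS_REFC_COW_START, so each new xref hook
simply scans the refcount keyspace at agbno + XFS_REFC_COW_START and
flags corruption unless the scan comes back XFS_BTREE_KEYFILL_EMPTY.)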

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/agheader.c |    5 +++++
 fs/xfs/scrub/alloc.c    |    1 +
 fs/xfs/scrub/bmap.c     |   11 ++++++++---
 fs/xfs/scrub/ialloc.c   |    2 +-
 fs/xfs/scrub/inode.c    |    1 +
 fs/xfs/scrub/refcount.c |   21 +++++++++++++++++++++
 fs/xfs/scrub/scrub.h    |    2 ++
 7 files changed, 39 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
index 3dd9151a20ad..520ec054e4a6 100644
--- a/fs/xfs/scrub/agheader.c
+++ b/fs/xfs/scrub/agheader.c
@@ -53,6 +53,7 @@ xchk_superblock_xref(
 	xchk_xref_is_not_inode_chunk(sc, agbno, 1);
 	xchk_xref_is_owned_by(sc, agbno, 1, &XFS_RMAP_OINFO_FS);
 	xchk_xref_is_not_shared(sc, agbno, 1);
+	xchk_xref_is_not_cow_staging(sc, agbno, 1);
 
 	/* scrub teardown will take care of sc->sa for us */
 }
@@ -517,6 +518,7 @@ xchk_agf_xref(
 	xchk_xref_is_owned_by(sc, agbno, 1, &XFS_RMAP_OINFO_FS);
 	xchk_agf_xref_btreeblks(sc);
 	xchk_xref_is_not_shared(sc, agbno, 1);
+	xchk_xref_is_not_cow_staging(sc, agbno, 1);
 	xchk_agf_xref_refcblks(sc);
 
 	/* scrub teardown will take care of sc->sa for us */
@@ -644,6 +646,7 @@ xchk_agfl_block_xref(
 	xchk_xref_is_not_inode_chunk(sc, agbno, 1);
 	xchk_xref_is_owned_by(sc, agbno, 1, &XFS_RMAP_OINFO_AG);
 	xchk_xref_is_not_shared(sc, agbno, 1);
+	xchk_xref_is_not_cow_staging(sc, agbno, 1);
 }
 
 /* Scrub an AGFL block. */
@@ -700,6 +703,7 @@ xchk_agfl_xref(
 	xchk_xref_is_not_inode_chunk(sc, agbno, 1);
 	xchk_xref_is_owned_by(sc, agbno, 1, &XFS_RMAP_OINFO_FS);
 	xchk_xref_is_not_shared(sc, agbno, 1);
+	xchk_xref_is_not_cow_staging(sc, agbno, 1);
 
 	/*
 	 * Scrub teardown will take care of sc->sa for us.  Leave sc->sa
@@ -855,6 +859,7 @@ xchk_agi_xref(
 	xchk_agi_xref_icounts(sc);
 	xchk_xref_is_owned_by(sc, agbno, 1, &XFS_RMAP_OINFO_FS);
 	xchk_xref_is_not_shared(sc, agbno, 1);
+	xchk_xref_is_not_cow_staging(sc, agbno, 1);
 	xchk_agi_xref_fiblocks(sc);
 
 	/* scrub teardown will take care of sc->sa for us */
diff --git a/fs/xfs/scrub/alloc.c b/fs/xfs/scrub/alloc.c
index d8f2ba7efa22..0cd20d998368 100644
--- a/fs/xfs/scrub/alloc.c
+++ b/fs/xfs/scrub/alloc.c
@@ -88,6 +88,7 @@ xchk_allocbt_xref(
 	xchk_xref_is_not_inode_chunk(sc, agbno, len);
 	xchk_xref_has_no_owner(sc, agbno, len);
 	xchk_xref_is_not_shared(sc, agbno, len);
+	xchk_xref_is_not_cow_staging(sc, agbno, len);
 }
 
 /* Scrub a bnobt/cntbt record. */
diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
index 5c4b25585b8c..1e4813c82cc5 100644
--- a/fs/xfs/scrub/bmap.c
+++ b/fs/xfs/scrub/bmap.c
@@ -328,12 +328,17 @@ xchk_bmap_iextent_xref(
 	xchk_bmap_xref_rmap(info, irec, agbno);
 	switch (info->whichfork) {
 	case XFS_DATA_FORK:
-		if (xfs_is_reflink_inode(info->sc->ip))
-			break;
-		fallthrough;
+		if (!xfs_is_reflink_inode(info->sc->ip))
+			xchk_xref_is_not_shared(info->sc, agbno,
+					irec->br_blockcount);
+		xchk_xref_is_not_cow_staging(info->sc, agbno,
+				irec->br_blockcount);
+		break;
 	case XFS_ATTR_FORK:
 		xchk_xref_is_not_shared(info->sc, agbno,
 				irec->br_blockcount);
+		xchk_xref_is_not_cow_staging(info->sc, agbno,
+				irec->br_blockcount);
 		break;
 	case XFS_COW_FORK:
 		xchk_xref_is_cow_staging(info->sc, agbno,
diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index 0b27c3520b74..efe346ddd1b7 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -116,7 +116,7 @@ xchk_iallocbt_chunk(
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 
 	xchk_iallocbt_chunk_xref(bs->sc, irec, agino, bno, len);
-
+	xchk_xref_is_not_cow_staging(bs->sc, bno, len);
 	return true;
 }
 
diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
index 998bf06d2347..a68ba8684465 100644
--- a/fs/xfs/scrub/inode.c
+++ b/fs/xfs/scrub/inode.c
@@ -558,6 +558,7 @@ xchk_inode_xref(
 	xchk_inode_xref_finobt(sc, ino);
 	xchk_xref_is_owned_by(sc, agbno, 1, &XFS_RMAP_OINFO_INODES);
 	xchk_xref_is_not_shared(sc, agbno, 1);
+	xchk_xref_is_not_cow_staging(sc, agbno, 1);
 	xchk_inode_xref_bmap(sc, dip);
 
 out_free:
diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c
index d97e7e372b9c..2009efea923c 100644
--- a/fs/xfs/scrub/refcount.c
+++ b/fs/xfs/scrub/refcount.c
@@ -562,3 +562,24 @@ xchk_xref_is_not_shared(
 	if (keyfill != XFS_BTREE_KEYFILL_EMPTY)
 		xchk_btree_xref_set_corrupt(sc, sc->sa.refc_cur, 0);
 }
+
+/* xref check that the extent is not being used for CoW staging. */
+void
+xchk_xref_is_not_cow_staging(
+	struct xfs_scrub	*sc,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		len)
+{
+	enum xfs_btree_keyfill	keyfill;
+	int			error;
+
+	if (!sc->sa.refc_cur || xchk_skip_xref(sc->sm))
+		return;
+
+	error = xfs_refcount_scan_keyfill(sc->sa.refc_cur, agbno +
+			XFS_REFC_COW_START, len, &keyfill);
+	if (!xchk_should_check_xref(sc, &error, &sc->sa.refc_cur))
+		return;
+	if (keyfill != XFS_BTREE_KEYFILL_EMPTY)
+		xchk_btree_xref_set_corrupt(sc, sc->sa.refc_cur, 0);
+}
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 85c055c2ddc5..a331838e22ff 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -166,6 +166,8 @@ void xchk_xref_is_cow_staging(struct xfs_scrub *sc, xfs_agblock_t bno,
 		xfs_extlen_t len);
 void xchk_xref_is_not_shared(struct xfs_scrub *sc, xfs_agblock_t bno,
 		xfs_extlen_t len);
+void xchk_xref_is_not_cow_staging(struct xfs_scrub *sc, xfs_agblock_t bno,
+		xfs_extlen_t len);
 #ifdef CONFIG_XFS_RT
 void xchk_xref_is_used_rt_space(struct xfs_scrub *sc, xfs_rtblock_t rtbno,
 		xfs_extlen_t len);



* Re: [PATCH 5/5] xfs: ensure that all metadata and data blocks are not cow staging extents
  2022-10-02 18:20 ` [PATCH 5/5] xfs: ensure that all metadata and data blocks are not cow staging extents Darrick J. Wong
@ 2022-11-02  0:29   ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2022-11-02  0:29 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sun, Oct 02, 2022 at 11:20:16AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Make sure that all filesystem metadata blocks and file data blocks are
> not also marked as CoW staging extents.  The extra checking added here
> was inspired by an actual VM host filesystem corruption incident due to
> bugs in the CoW handling of 4.x kernels.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Looks reasonable.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 2/5] xfs: refactor converting btree irec to btree key
  2022-10-02 18:20 ` [PATCH 2/5] xfs: refactor converting btree irec to btree key Darrick J. Wong
@ 2022-11-02  0:31   ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2022-11-02  0:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sun, Oct 02, 2022 at 11:20:16AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> We keep doing these conversions to support btree queries, so refactor
> this into a helper.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_btree.c |   33 +++++++++++++++++----------------
>  1 file changed, 17 insertions(+), 16 deletions(-)

LGTM.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 1/5] xfs: replace xfs_btree_has_record with a general keyspace scanner
  2022-10-02 18:20 ` [PATCH 1/5] xfs: replace xfs_btree_has_record with a general keyspace scanner Darrick J. Wong
@ 2022-11-02  1:47   ` Dave Chinner
  2022-11-03  1:56     ` Darrick J. Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2022-11-02  1:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sun, Oct 02, 2022 at 11:20:16AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> The current implementation of xfs_btree_has_record returns true if it
> finds /any/ record within the given range.  Unfortunately, that's not
> sufficient for scrub.  We want to be able to tell if a range of keyspace
> for a btree is devoid of records, is totally mapped to records, or is
> somewhere in between.  By forcing this to be a boolean, we were
> generally missing the "in between" case and returning incorrect results.
> Fix the API so that we can tell the caller which of those three is the
> current state.

My first reaction is that this "keyfill" API name is .... awful.

From an API perspective, all we are doing is changing the
"has_record()" API that returns a boolean to return a tri-state - we
add a "partial" return to the "all" and "none" states we currently
return. The whole API doesn't need renaming - it's impossible to
work out what "scan_keyfill" is actually intended to do, whereas
"has_record" is very much self-documenting....

Hence I think that the implementation of xfs_btree_has_record()
needs to change, but I don't think the entire API needs to be renamed.
All that needs to happen to the higher level API is this conversion:

> -	bool			*exists)
> +	enum xfs_btree_keyfill	*exists)

Even then, the enum could be named for what it means -
xfs_btree_rec_overlap - with values for none, full and partial. At
this point, nothing above xfs_btree_has_record() needs to know
anything about whatever a "key fill" operation might actually be.
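
Something like this (names illustrative):

enum xfs_btree_rec_overlap {
	XFS_BTREE_REC_OVERLAP_NONE = 0,
	XFS_BTREE_REC_OVERLAP_PARTIAL,
	XFS_BTREE_REC_OVERLAP_FULL,
};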


>  static const struct xfs_btree_ops xfs_cntbt_ops = {
> diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
> index cfa052d40105..d1225b957649 100644
> --- a/fs/xfs/libxfs/xfs_bmap_btree.c
> +++ b/fs/xfs/libxfs/xfs_bmap_btree.c
> @@ -518,6 +518,18 @@ xfs_bmbt_recs_inorder(
>  		xfs_bmbt_disk_get_startoff(&r2->bmbt);
>  }
>  
> +STATIC bool
> +xfs_bmbt_has_key_gap(
> +	struct xfs_btree_cur		*cur,
> +	const union xfs_btree_key	*key1,
> +	const union xfs_btree_key	*key2)
> +{
> +	xfs_fileoff_t			next;
> +
> +	next = be64_to_cpu(key1->bmbt.br_startoff) + 1;
> +	return next != be64_to_cpu(key2->bmbt.br_startoff);
> +}

IDGI - this is an extent tree - there is no gap if the extent at
key2 starts at the end of key1. IOWs, this only returns "no gap"
if the extent at key1 is a single block in length. I'll come back
to this...

Oh, does this assume that the caller has already created a key to a
nonexistent record in the BMBT that points to the end of the first
extent? i.e. that this method is being called with key1 being a high
key for the bmbt record (i.e. an end pointer) and key2 being a low
key for the bmbt record (i.e. a start pointer)?
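
i.e. for two adjacent extents [0, 10) and [10, N), key1 would be the
first record's high key (br_startoff = 9) and key2 the next record's
low key (br_startoff = 10), so 9 + 1 == 10 reads as "no gap"?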

If so, this API needs some documentation on how it is expected to be
used - at least name the two keys something more descriptive like
"high key" and "next low key"....

It occurs to me that what I'm actually doing here is reverse
engineering the design of this mechanism from the code because
there's no documentation in the patch or the commit description of
the algorithm being used to find overlapping records....

>  static const struct xfs_btree_ops xfs_bmbt_ops = {
>  	.rec_len		= sizeof(xfs_bmbt_rec_t),
>  	.key_len		= sizeof(xfs_bmbt_key_t),
> @@ -538,6 +550,7 @@ static const struct xfs_btree_ops xfs_bmbt_ops = {
>  	.buf_ops		= &xfs_bmbt_buf_ops,
>  	.keys_inorder		= xfs_bmbt_keys_inorder,
>  	.recs_inorder		= xfs_bmbt_recs_inorder,
> +	.has_key_gap		= xfs_bmbt_has_key_gap,
>  };
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> index 4c16c8c31fcb..5710d3ee582a 100644
> --- a/fs/xfs/libxfs/xfs_btree.c
> +++ b/fs/xfs/libxfs/xfs_btree.c
> @@ -5008,34 +5008,100 @@ xfs_btree_diff_two_ptrs(
>  	return (int64_t)be32_to_cpu(a->s) - be32_to_cpu(b->s);
>  }
>  
> -/* If there's an extent, we're done. */
> +struct xfs_btree_scan_keyfill {
> +	/* Keys for the start and end of the range we want to know about. */
> +	union xfs_btree_key	start_key;
> +	union xfs_btree_key	end_key;
> +
> +	/* Highest record key we've seen so far. */
> +	union xfs_btree_key	high_key;
> +
> +	enum xfs_btree_keyfill	outcome;
> +};

This "keyfill" scan operation is completely private to
xfs_btree_has_record(), which further indicates the higher level API
should not be renamed "keyfill"....

> +
>  STATIC int
> -xfs_btree_has_record_helper(
> +xfs_btree_scan_keyfill_helper(
>  	struct xfs_btree_cur		*cur,
>  	const union xfs_btree_rec	*rec,
>  	void				*priv)
>  {
> -	return -ECANCELED;
> +	union xfs_btree_key		rec_key;
> +	union xfs_btree_key		rec_high_key;
> +	struct xfs_btree_scan_keyfill	*info = priv;
> +	int64_t				res;
> +
> +	cur->bc_ops->init_key_from_rec(&rec_key, rec);
> +
> +	if (info->outcome == XFS_BTREE_KEYFILL_EMPTY) {
> +		info->outcome = XFS_BTREE_KEYFILL_SPARSE;
> +
> +		/* Bail if the first record starts after the start key. */
> +		res = cur->bc_ops->diff_two_keys(cur, &info->start_key,
> +				&rec_key);
> +		if (res < 0)
> +			return -ECANCELED;

Better comment needed.

		/*
		 * If the first record we find does not overlap the
		 * start key, then there is a hole at the start of
		 * the search range before the overlap was found.
		 * Hence we can classify this as a sparse overlap
		 * and we don't need to search any further.
		 */

> +	} else {
> +		/* Bail if there's a gap with the previous record. */
> +		if (cur->bc_ops->has_key_gap(cur, &info->high_key, &rec_key))
> +			return -ECANCELED;

Ah, yeah, there's the high key -> low key implementation assumption.

> +	}
> +
> +	/* If the current record is higher than what we've seen, remember it. */
> +	cur->bc_ops->init_high_key_from_rec(&rec_high_key, rec);
> +	res = cur->bc_ops->diff_two_keys(cur, &rec_high_key, &info->high_key);
> +	if (res > 0)
> +		info->high_key = rec_high_key; /* struct copy */
> +
> +	return 0;
>  }
>  
> -/* Is there a record covering a given range of keys? */
> +/*
> + * Scan part of the keyspace of a btree and tell us if the area has no records,
> + * is fully mapped by records, or is partially filled.
> + */
>  int
> -xfs_btree_has_record(
> +xfs_btree_scan_keyfill(
>  	struct xfs_btree_cur		*cur,
>  	const union xfs_btree_irec	*low,
>  	const union xfs_btree_irec	*high,
> -	bool				*exists)
> +	enum xfs_btree_keyfill		*outcome)
>  {
> +	struct xfs_btree_scan_keyfill	info = {
> +		.outcome		= XFS_BTREE_KEYFILL_EMPTY,
> +	};
> +	union xfs_btree_rec		rec;
> +	int64_t				res;
>  	int				error;
>  
> +	if (!cur->bc_ops->has_key_gap)
> +		return -EOPNOTSUPP;
> +
> +	cur->bc_rec = *low;
> +	cur->bc_ops->init_rec_from_cur(cur, &rec);
> +	cur->bc_ops->init_key_from_rec(&info.start_key, &rec);
> +
> +	cur->bc_rec = *high;
> +	cur->bc_ops->init_rec_from_cur(cur, &rec);
> +	cur->bc_ops->init_key_from_rec(&info.end_key, &rec);

Didn't a previous patch I just reviewed create helpers for these?

Oh.... patches in the series were threaded in the wrong order...


> +
>  	error = xfs_btree_query_range(cur, low, high,
> -			&xfs_btree_has_record_helper, NULL);
> -	if (error == -ECANCELED) {
> -		*exists = true;
> -		return 0;
> -	}
> -	*exists = false;
> -	return error;
> +			xfs_btree_scan_keyfill_helper, &info);
> +	if (error == -ECANCELED)
> +		goto out;
> +	if (error)
> +		return error;
> +
> +	if (info.outcome == XFS_BTREE_KEYFILL_EMPTY)
> +		goto out;
> +
> +	/* Did the record set go at least as far as the end? */
> +	res = cur->bc_ops->diff_two_keys(cur, &info.high_key, &info.end_key);
> +	if (res >= 0)
> +		info.outcome = XFS_BTREE_KEYFILL_FULL;

Hmmm. I'm wondering if we should have helpers for these sorts of
key comparisons.

static bool
xfs_btree_keycmp_lt(
	struct xfs_btree_cur	*cur,
	const union xfs_btree_key	*key1,
	const union xfs_btree_key	*key2)
{
	return cur->bc_ops->diff_two_keys(cur, key1, key2) < 0;
}

static bool
xfs_btree_keycmp_gt(
	struct xfs_btree_cur	*cur,
	const union xfs_btree_key	*key1,
	const union xfs_btree_key	*key2)
{
	return cur->bc_ops->diff_two_keys(cur, key1, key2) > 0;
}

static bool
xfs_btree_keycmp_ge(
	struct xfs_btree_cur	*cur,
	const union xfs_btree_key	*key1,
	const union xfs_btree_key	*key2)
{
	return cur->bc_ops->diff_two_keys(cur, key1, key2) >= 0;
}

Which then makes the code read a whole lot nicer:

	/* Did the record set go at least as far as the end? */
	if (xfs_btree_keycmp_ge(cur, &info.high_key, &info.end_key))
		info.outcome = XFS_BTREE_KEYFILL_FULL;
...

Not necessary for this patch, but I note there are a few places
where these sorts of key range/ordering checks are open coded...
> +
> +out:
> +	*outcome = info.outcome;
> +	return 0;
>  }
>  
>  /* Are there more records in this btree? */
> diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> index eef27858a013..58a05f0d1f1b 100644
> --- a/fs/xfs/libxfs/xfs_btree.h
> +++ b/fs/xfs/libxfs/xfs_btree.h
> @@ -157,6 +157,11 @@ struct xfs_btree_ops {
>  	int	(*recs_inorder)(struct xfs_btree_cur *cur,
>  				const union xfs_btree_rec *r1,
>  				const union xfs_btree_rec *r2);
> +
> +	/* decide if there's a gap in the keyspace between two keys */
> +	bool	(*has_key_gap)(struct xfs_btree_cur *cur,
> +			       const union xfs_btree_key *key1,
> +			       const union xfs_btree_key *key2);

Having read through it this far, this looks like it is checking whether
two discrete keys form a contiguous range? Perhaps that's a better
name than "gap", because "contiguous" means different things for
different keys. e.g. extents vs inode records.


> diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
> index 8c83e265770c..fd48b95b4f4e 100644
> --- a/fs/xfs/libxfs/xfs_ialloc_btree.c
> +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
> @@ -380,6 +380,18 @@ xfs_inobt_recs_inorder(
>  		be32_to_cpu(r2->inobt.ir_startino);
>  }
>  
> +STATIC bool
> +xfs_inobt_has_key_gap(
> +	struct xfs_btree_cur		*cur,
> +	const union xfs_btree_key	*key1,
> +	const union xfs_btree_key	*key2)
> +{
> +	xfs_agino_t			next;
> +
> +	next = be32_to_cpu(key1->inobt.ir_startino) + XFS_INODES_PER_CHUNK;
> +	return next != be32_to_cpu(key2->inobt.ir_startino);
> +}

Huh. Is that correct? The high key for an inode chunk is:

STATIC void
xfs_inobt_init_high_key_from_rec(
        union xfs_btree_key             *key,
        const union xfs_btree_rec       *rec)
{
        __u32                           x;

        x = be32_to_cpu(rec->inobt.ir_startino);
        x += XFS_INODES_PER_CHUNK - 1;
        key->inobt.ir_startino = cpu_to_be32(x);
}

Which means highkey->ir_startino (end range pointer) points to
low_key->ir_startino + 63 (start range pointer + inodes in chunk)

Hence if this "gap" API is supposed to be passed {high_key,
low_key}, then xfs_inobt_has_key_gap() is checking
(low_key->ir_startino + 127) against next_low_key->ir_startino...

> +STATIC bool
> +xfs_refcountbt_has_key_gap(
> +	struct xfs_btree_cur		*cur,
> +	const union xfs_btree_key	*key1,
> +	const union xfs_btree_key	*key2)
> +{
> +	xfs_agblock_t			next;
> +
> +	next = be32_to_cpu(key1->refc.rc_startblock) + 1;
> +	return next != be32_to_cpu(key2->refc.rc_startblock);
> +}

... and this matches the BMBT code (as does the rmapbt code), which seems to
assume a high key (end extent pointer) is being passed as key1, and key2 is
a low key (start extent pointer).

Am I completely misunderstanding what the key gap API uses for
low_key and high_key? I am completely confused now...

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/5] xfs: mask key comparisons for keyspace fill scans
  2022-10-02 18:20 ` [PATCH 3/5] xfs: mask key comparisons for keyspace fill scans Darrick J. Wong
@ 2022-11-02  2:16   ` Dave Chinner
  2022-11-03  1:08     ` Darrick J. Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2022-11-02  2:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sun, Oct 02, 2022 at 11:20:16AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> For keyspace fullness scans, we want to be able to mask off the parts of
> the key that we don't care about.  For most btree types we /do/ want the
> full keyspace, but for checking that a given space usage also has a full
> complement of rmapbt records (even if different/multiple owners) we need
> this masking so that we only track sparseness of rm_startblock, not the
> whole keyspace (which is extremely sparse).
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
....
> diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> index edea6db8d8e4..6fbce2f3c17e 100644
> --- a/fs/xfs/libxfs/xfs_btree.c
> +++ b/fs/xfs/libxfs/xfs_btree.c
> @@ -5020,12 +5020,33 @@ struct xfs_btree_scan_keyfill {
>  	union xfs_btree_key	start_key;
>  	union xfs_btree_key	end_key;
>  
> +	/* Mask for key comparisons, if desired. */
> +	union xfs_btree_key	*key_mask;

How does this mask work? i.e. the way it is supposed to be used is
not documented in either the commit message or the code....


> +STATIC int64_t
> +xfs_btree_diff_two_masked_keys(
> +	struct xfs_btree_cur		*cur,
> +	const union xfs_btree_key	*key1,
> +	const union xfs_btree_key	*key2,
> +	const union xfs_btree_key	*mask)
> +{
> +	union xfs_btree_key		mk1, mk2;
> +
> +	if (likely(!mask))
> +		return cur->bc_ops->diff_two_keys(cur, key1, key2);
> +
> +	cur->bc_ops->mask_key(cur, &mk1, key1, mask);
> +	cur->bc_ops->mask_key(cur, &mk2, key2, mask);
> +
> +	return cur->bc_ops->diff_two_keys(cur, &mk1, &mk2);
> +}

This seems .... very abstract.

Why not just add a mask pointer to xfs_btree_diff_two_keys(),
and in each of the btree implementations of ->diff_two_keys()
change them to:

	if (mask) {
		/* mask keys */
	}
	/* run existing diff on two keys */

That gets rid of all these function pointer calls, and we only need
a single "diff two keys" api to be defined...
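
e.g. for the rmapbt, something like this - an untested sketch that
elides the unwritten-flag masking of rm_offset the real comparison
would also need:

STATIC int64_t
xfs_rmapbt_diff_two_keys(
	struct xfs_btree_cur		*cur,
	const union xfs_btree_key	*k1,
	const union xfs_btree_key	*k2,
	const union xfs_btree_key	*mask)
{
	int64_t				d;
	uint64_t			x, y;

	/* Space scans must always compare rm_startblock. */
	ASSERT(!mask || mask->rmap.rm_startblock);

	d = (int64_t)be32_to_cpu(k1->rmap.rm_startblock) -
		     be32_to_cpu(k2->rmap.rm_startblock);
	if (d)
		return d;

	/* A zeroed mask field means "ignore this part of the key". */
	if (!mask || mask->rmap.rm_owner) {
		x = be64_to_cpu(k1->rmap.rm_owner);
		y = be64_to_cpu(k2->rmap.rm_owner);
		if (x > y)
			return 1;
		if (y > x)
			return -1;
	}

	if (!mask || mask->rmap.rm_offset) {
		x = be64_to_cpu(k1->rmap.rm_offset);
		y = be64_to_cpu(k2->rmap.rm_offset);
		if (x > y)
			return 1;
		if (y > x)
			return -1;
	}

	return 0;
}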

> diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> index 58a05f0d1f1b..99baa8283049 100644
> --- a/fs/xfs/libxfs/xfs_btree.h
> +++ b/fs/xfs/libxfs/xfs_btree.h
> @@ -158,10 +158,17 @@ struct xfs_btree_ops {
>  				const union xfs_btree_rec *r1,
>  				const union xfs_btree_rec *r2);
>  
> +	/* mask a key for us */
> +	void	(*mask_key)(struct xfs_btree_cur *cur,
> +			    union xfs_btree_key *out_key,
> +			    const union xfs_btree_key *in_key,
> +			    const union xfs_btree_key *mask);
> +
>  	/* decide if there's a gap in the keyspace between two keys */
>  	bool	(*has_key_gap)(struct xfs_btree_cur *cur,
>  			       const union xfs_btree_key *key1,
> -			       const union xfs_btree_key *key2);
> +			       const union xfs_btree_key *key2,
> +			       const union xfs_btree_key *mask);
>  };
>  
>  /*
> @@ -552,6 +559,7 @@ typedef bool (*xfs_btree_key_gap_fn)(struct xfs_btree_cur *cur,
>  int xfs_btree_scan_keyfill(struct xfs_btree_cur *cur,
>  		const union xfs_btree_irec *low,
>  		const union xfs_btree_irec *high,
> +		const union xfs_btree_irec *mask,
>  		enum xfs_btree_keyfill *outcome);
>  
>  bool xfs_btree_has_more_records(struct xfs_btree_cur *cur);
> diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
> index fd48b95b4f4e..d429ca8d9dd8 100644
> --- a/fs/xfs/libxfs/xfs_ialloc_btree.c
> +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
> @@ -384,11 +384,14 @@ STATIC bool
>  xfs_inobt_has_key_gap(
>  	struct xfs_btree_cur		*cur,
>  	const union xfs_btree_key	*key1,
> -	const union xfs_btree_key	*key2)
> +	const union xfs_btree_key	*key2,
> +	const union xfs_btree_key	*mask)
>  {
>  	xfs_agino_t			next;
>  
> -	next = be32_to_cpu(key1->inobt.ir_startino) + XFS_INODES_PER_CHUNK;
> +	ASSERT(!mask || mask->inobt.ir_startino);
> +
> +	next = be32_to_cpu(key1->inobt.ir_startino) + 1;
>  	return next != be32_to_cpu(key2->inobt.ir_startino);
>  }

I think you just fixed the issue I noticed in the last patch....


> diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
> index 08d47cbf4697..4c123b6dd080 100644
> --- a/fs/xfs/libxfs/xfs_rmap.c
> +++ b/fs/xfs/libxfs/xfs_rmap.c
> @@ -2685,13 +2685,18 @@ xfs_rmap_scan_keyfill(
>  {
>  	union xfs_btree_irec	low;
>  	union xfs_btree_irec	high;
> +	union xfs_btree_irec	mask;
> +
> +	/* Only care about space scans here */
> +	memset(&mask, 0, sizeof(low));

sizeof(mask)?

> diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
> index d64143a842ce..9ca60f709c4b 100644
> --- a/fs/xfs/libxfs/xfs_rmap_btree.c
> +++ b/fs/xfs/libxfs/xfs_rmap_btree.c
> @@ -433,16 +433,55 @@ xfs_rmapbt_recs_inorder(
>  	return 0;
>  }
>  
> +STATIC void
> +xfs_rmapbt_mask_key(
> +	struct xfs_btree_cur		*cur,
> +	union xfs_btree_key		*out_key,
> +	const union xfs_btree_key	*in_key,
> +	const union xfs_btree_key	*mask)
> +{
> +	memset(out_key, 0, sizeof(union xfs_btree_key));
> +
> +	if (mask->rmap.rm_startblock)
> +		out_key->rmap.rm_startblock = in_key->rmap.rm_startblock;
> +	if (mask->rmap.rm_owner)
> +		out_key->rmap.rm_owner = in_key->rmap.rm_owner;
> +	if (mask->rmap.rm_offset)
> +		out_key->rmap.rm_offset = in_key->rmap.rm_offset;
> +}

So the mask fields are just used as boolean state to select what
should be copied into the masked key?

> +
>  STATIC bool
>  xfs_rmapbt_has_key_gap(
>  	struct xfs_btree_cur		*cur,
>  	const union xfs_btree_key	*key1,
> -	const union xfs_btree_key	*key2)
> +	const union xfs_btree_key	*key2,
> +	const union xfs_btree_key	*mask)
>  {
> -	xfs_agblock_t			next;
> +	bool				reflink = xfs_has_reflink(cur->bc_mp);
> +	uint64_t			x, y;
>  
> -	next = be32_to_cpu(key1->rmap.rm_startblock) + 1;
> -	return next != be32_to_cpu(key2->rmap.rm_startblock);
> +	if (mask->rmap.rm_offset) {
> +		x = be64_to_cpu(key1->rmap.rm_offset) + 1;
> +		y = be64_to_cpu(key2->rmap.rm_offset);
> +		if ((reflink && x < y) || (!reflink && x != y))
> +			return true;
> +	}
> +
> +	if (mask->rmap.rm_owner) {
> +		x = be64_to_cpu(key1->rmap.rm_owner) + 1;
> +		y = be64_to_cpu(key2->rmap.rm_owner);
> +		if ((reflink && x < y) || (!reflink && x != y))
> +			return true;
> +	}
> +
> +	if (mask->rmap.rm_startblock) {
> +		x = be32_to_cpu(key1->rmap.rm_startblock) + 1;
> +		y = be32_to_cpu(key2->rmap.rm_startblock);
> +		if ((reflink && x < y) || (!reflink && x != y))
> +			return true;
> +	}
> +
> +	return false;

Urk. That needs a comment explaining what all the mystery reflink
logic is doing. It also needs to explain the order of precedence on
the mask checks and why that order is important (or not!).

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] xfs: check the reference counts of gaps in the refcount btree
  2022-10-02 18:20 ` [PATCH 4/5] xfs: check the reference counts of gaps in the refcount btree Darrick J. Wong
@ 2022-11-02  2:20   ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2022-11-02  2:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sun, Oct 02, 2022 at 11:20:16AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Gaps in the reference count btree are also significant -- for these
> regions, there must not be any overlapping reverse mappings.  We don't
> currently check this, so make the refcount scrubber more complete.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/scrub/refcount.c |   84 ++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 79 insertions(+), 5 deletions(-)

Makes sense.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/5] xfs: mask key comparisons for keyspace fill scans
  2022-11-02  2:16   ` Dave Chinner
@ 2022-11-03  1:08     ` Darrick J. Wong
  2022-11-03 21:12       ` Darrick J. Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2022-11-03  1:08 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Nov 02, 2022 at 01:16:26PM +1100, Dave Chinner wrote:
> On Sun, Oct 02, 2022 at 11:20:16AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > For keyspace fullness scans, we want to be able to mask off the parts of
> > the key that we don't care about.  For most btree types we /do/ want the
> > full keyspace, but for checking that a given space usage also has a full
> > complement of rmapbt records (even if different/multiple owners) we need
> > this masking so that we only track sparseness of rm_startblock, not the
> > whole keyspace (which is extremely sparse).
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> ....
> > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > index edea6db8d8e4..6fbce2f3c17e 100644
> > --- a/fs/xfs/libxfs/xfs_btree.c
> > +++ b/fs/xfs/libxfs/xfs_btree.c
> > @@ -5020,12 +5020,33 @@ struct xfs_btree_scan_keyfill {
> >  	union xfs_btree_key	start_key;
> >  	union xfs_btree_key	end_key;
> >  
> > +	/* Mask for key comparisons, if desired. */
> > +	union xfs_btree_key	*key_mask;
> 
> How does this mask work? i.e. the way it is supposed to be used is
> not documented in either the commit message or the code....

When I merge all of this into _diff_two_keys, I'll add this to its
description:

"Normally, each btree type's _diff_two_keys implementation will use all
available btree key fields to compare the given keys.  However, some
callers may prefer to ignore some part of the btree record keyspace when
performing the comparison.

These callers should create a union xfs_btree_key object, set the fields
that *should* be a part of the comparison to any nonzero value, and
leave the rest zeroed.  That object should be passed in as the @key_mask
value."

For a concrete example, take the rmap space scanning function below.  If
we only want to know if a certain range of physical blocks has zero rmap
records, enough rmap records to account for every block in the range, or
some records in between, we'd initialize the key mask as follows:

	union xfs_btree_key	km = {
		.rmap.rm_startblock = 1,
	};

and the _scan_keyfill function will compare only that part of the
keys.

> 
> > +STATIC int64_t
> > +xfs_btree_diff_two_masked_keys(
> > +	struct xfs_btree_cur		*cur,
> > +	const union xfs_btree_key	*key1,
> > +	const union xfs_btree_key	*key2,
> > +	const union xfs_btree_key	*mask)
> > +{
> > +	union xfs_btree_key		mk1, mk2;
> > +
> > +	if (likely(!mask))
> > +		return cur->bc_ops->diff_two_keys(cur, key1, key2);
> > +
> > +	cur->bc_ops->mask_key(cur, &mk1, key1, mask);
> > +	cur->bc_ops->mask_key(cur, &mk2, key2, mask);
> > +
> > +	return cur->bc_ops->diff_two_keys(cur, &mk1, &mk2);
> > +}
> 
> This seems .... very abstract.

Yes, I've gone unusually deep into database theory here...

> Why not just add a mask pointer to xfs_btree_diff_two_keys(),
> and in each of the btree implementations of ->diff_two_keys()
> change them to:
> 
> 	if (mask) {
> 		/* mask keys */
> 	}
> 	/* run existing diff on two keys */
> 
> That gets rid of all these function pointer calls, and we only need
> a single "diff two keys" api to be defined...

Ok, I'll look into combining mask_key into diff_two_keys.

> > diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> > index 58a05f0d1f1b..99baa8283049 100644
> > --- a/fs/xfs/libxfs/xfs_btree.h
> > +++ b/fs/xfs/libxfs/xfs_btree.h
> > @@ -158,10 +158,17 @@ struct xfs_btree_ops {
> >  				const union xfs_btree_rec *r1,
> >  				const union xfs_btree_rec *r2);
> >  
> > +	/* mask a key for us */
> > +	void	(*mask_key)(struct xfs_btree_cur *cur,
> > +			    union xfs_btree_key *out_key,
> > +			    const union xfs_btree_key *in_key,
> > +			    const union xfs_btree_key *mask);
> > +
> >  	/* decide if there's a gap in the keyspace between two keys */
> >  	bool	(*has_key_gap)(struct xfs_btree_cur *cur,
> >  			       const union xfs_btree_key *key1,
> > -			       const union xfs_btree_key *key2);
> > +			       const union xfs_btree_key *key2,
> > +			       const union xfs_btree_key *mask);
> >  };
> >  
> >  /*
> > @@ -552,6 +559,7 @@ typedef bool (*xfs_btree_key_gap_fn)(struct xfs_btree_cur *cur,
> >  int xfs_btree_scan_keyfill(struct xfs_btree_cur *cur,
> >  		const union xfs_btree_irec *low,
> >  		const union xfs_btree_irec *high,
> > +		const union xfs_btree_irec *mask,
> >  		enum xfs_btree_keyfill *outcome);
> >  
> >  bool xfs_btree_has_more_records(struct xfs_btree_cur *cur);
> > diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
> > index fd48b95b4f4e..d429ca8d9dd8 100644
> > --- a/fs/xfs/libxfs/xfs_ialloc_btree.c
> > +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
> > @@ -384,11 +384,14 @@ STATIC bool
> >  xfs_inobt_has_key_gap(
> >  	struct xfs_btree_cur		*cur,
> >  	const union xfs_btree_key	*key1,
> > -	const union xfs_btree_key	*key2)
> > +	const union xfs_btree_key	*key2,
> > +	const union xfs_btree_key	*mask)
> >  {
> >  	xfs_agino_t			next;
> >  
> > -	next = be32_to_cpu(key1->inobt.ir_startino) + XFS_INODES_PER_CHUNK;
> > +	ASSERT(!mask || mask->inobt.ir_startino);
> > +
> > +	next = be32_to_cpu(key1->inobt.ir_startino) + 1;
> >  	return next != be32_to_cpu(key2->inobt.ir_startino);
> >  }
> 
> I think you just fixed the issue I noticed in the last patch....

Oops, I clearly committed this to the wrong patch.  Sorry about that.

> > diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
> > index 08d47cbf4697..4c123b6dd080 100644
> > --- a/fs/xfs/libxfs/xfs_rmap.c
> > +++ b/fs/xfs/libxfs/xfs_rmap.c
> > @@ -2685,13 +2685,18 @@ xfs_rmap_scan_keyfill(
> >  {
> >  	union xfs_btree_irec	low;
> >  	union xfs_btree_irec	high;
> > +	union xfs_btree_irec	mask;
> > +
> > +	/* Only care about space scans here */
> > +	memset(&mask, 0, sizeof(low));
> 
> sizeof(mask)?

Yep.

> > diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
> > index d64143a842ce..9ca60f709c4b 100644
> > --- a/fs/xfs/libxfs/xfs_rmap_btree.c
> > +++ b/fs/xfs/libxfs/xfs_rmap_btree.c
> > @@ -433,16 +433,55 @@ xfs_rmapbt_recs_inorder(
> >  	return 0;
> >  }
> >  
> > +STATIC void
> > +xfs_rmapbt_mask_key(
> > +	struct xfs_btree_cur		*cur,
> > +	union xfs_btree_key		*out_key,
> > +	const union xfs_btree_key	*in_key,
> > +	const union xfs_btree_key	*mask)
> > +{
> > +	memset(out_key, 0, sizeof(union xfs_btree_key));
> > +
> > +	if (mask->rmap.rm_startblock)
> > +		out_key->rmap.rm_startblock = in_key->rmap.rm_startblock;
> > +	if (mask->rmap.rm_owner)
> > +		out_key->rmap.rm_owner = in_key->rmap.rm_owner;
> > +	if (mask->rmap.rm_offset)
> > +		out_key->rmap.rm_offset = in_key->rmap.rm_offset;
> > +}
> 
> So the mask fields are just used as boolean state to select what
> should be copied into the masked key?

Yes.  A zeroed keymask field will be ignored, a nonzero keymask field
will be retained.

> > +
> >  STATIC bool
> >  xfs_rmapbt_has_key_gap(
> >  	struct xfs_btree_cur		*cur,
> >  	const union xfs_btree_key	*key1,
> > -	const union xfs_btree_key	*key2)
> > +	const union xfs_btree_key	*key2,
> > +	const union xfs_btree_key	*mask)
> >  {
> > -	xfs_agblock_t			next;
> > +	bool				reflink = xfs_has_reflink(cur->bc_mp);
> > +	uint64_t			x, y;
> >  
> > -	next = be32_to_cpu(key1->rmap.rm_startblock) + 1;
> > -	return next != be32_to_cpu(key2->rmap.rm_startblock);
> > +	if (mask->rmap.rm_offset) {
> > +		x = be64_to_cpu(key1->rmap.rm_offset) + 1;
> > +		y = be64_to_cpu(key2->rmap.rm_offset);
> > +		if ((reflink && x < y) || (!reflink && x != y))
> > +			return true;
> > +	}
> > +
> > +	if (mask->rmap.rm_owner) {
> > +		x = be64_to_cpu(key1->rmap.rm_owner) + 1;
> > +		y = be64_to_cpu(key2->rmap.rm_owner);
> > +		if ((reflink && x < y) || (!reflink && x != y))
> > +			return true;
> > +	}
> > +
> > +	if (mask->rmap.rm_startblock) {
> > +		x = be32_to_cpu(key1->rmap.rm_startblock) + 1;
> > +		y = be32_to_cpu(key2->rmap.rm_startblock);
> > +		if ((reflink && x < y) || (!reflink && x != y))
> > +			return true;
> > +	}
> > +
> > +	return false;
> 
> Urk. That needs a comment explaining what all the mystery reflink
> logic is doing. It also needs to explain the order of precedence on
> the mask checks and why that order is important (or not!).

For the purpose of comparing two keys to decide if there's a gap in the
keyspace, we do the comparisons in order of most sensitive to least
sensitive.  This is the opposite order of diff_two_keys -- if two keys
have the same startblock and inode but discontiguous file offsets,
there's a gap.  If two keys have the same startblock but different
owners, there's a gap regardless of the offset.  What that means is a
bit murky, since the only user of this functionality passes a mask so
that we only compare the perag parts of the keys.
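
Something like this above the function, perhaps:

	/*
	 * Check the key fields in decreasing order of precision:
	 * offset, then owner, then startblock.  A mismatch in a more
	 * precise field is a gap no matter what the less precise
	 * fields contain.  With reflink enabled, records may overlap,
	 * so key2 falling short of key1 + 1 is tolerated; only key2
	 * landing beyond key1 + 1 counts as a gap.  Without reflink,
	 * any mismatch is a gap.
	 */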

Except for the masking, I think the comparison logic all got committed
to the wrong patch.  :(

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/5] xfs: replace xfs_btree_has_record with a general keyspace scanner
  2022-11-02  1:47   ` Dave Chinner
@ 2022-11-03  1:56     ` Darrick J. Wong
  0 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2022-11-03  1:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Nov 02, 2022 at 12:47:45PM +1100, Dave Chinner wrote:
> On Sun, Oct 02, 2022 at 11:20:16AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > The current implementation of xfs_btree_has_record returns true if it
> > finds /any/ record within the given range.  Unfortunately, that's not
> > sufficient for scrub.  We want to be able to tell if a range of keyspace
> > for a btree is devoid of records, is totally mapped to records, or is
> > somewhere in between.  By forcing this to be a boolean, we were
> > generally missing the "in between" case and returning incorrect results.
> > Fix the API so that we can tell the caller which of those three is the
> > current state.
> 
> My first reaction is that this "keyfill" API name is .... awful.

/me smacks head, realizes that "fill" could be interpreted as an action
verb, instead of a noun.  "fullness" might have been better.

This function scans part of a btree's keyspace to determine the fullness
factor (empty, totally full, sparse).

xfs_rmapbt_keyspace_fullness ?
                   _sparseness?
		   _contiguity?

That's the best thing I can think of, though my brain is a little tired
right now.  I could even leave it as _has_record just to avoid the
rename costs, though "has records" is a little vague.

OTOH "keyspace" is one of those jargon terms that comes from database
theory land.

> From an API perspective, all we are doing is changing the
> "has_record()" API that returns a boolean to return a tri-state - we
> add a "partial" return to the "all" and "none" states we currently
> return. The whole API doesn't need renaming - it's impossible to
> work out what "scan_keyfill" is actually intended to do, whereas
> "has_record"  is very much self documenting....

Ok.

> Hence I think that the implementation of xfs_btree_has_record()
> needs to change, I don't think the entire API needs to be renamed.
> All that needs to happen to the higher level API is this conversion:
> 
> > -	bool			*exists)
> > +	enum xfs_btree_keyfill	*exists)
> 
> Even then, the enum could be named for what it means -
> xfs_btree_rec_overlap - with values for none, full and partial. At
> this point, nothing above xfs_btree_has record needs to know
> anything about whatever a "key fill" operation might actually be.

/me wishes he'd thought of "keyspace contiguity" earlier.

Though that's a lot of long words.  I'll take your suggestion to leave
the name as _has_records.  However, we're not actually measuring the
amount of *overlap* between records -- what we're really looking for is
the btree record equivalent of file holes.

enum xfs_btree_rec_contiguity?
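
i.e. something like this, with all the names still up for grabs:

enum xfs_btree_rec_contiguity {
	XFS_BTREE_REC_EMPTY,	/* no records in the search range */
	XFS_BTREE_REC_SPARSE,	/* records cover part of the range */
	XFS_BTREE_REC_FULL,	/* records cover the entire range */
};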

> >  static const struct xfs_btree_ops xfs_cntbt_ops = {
> > diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
> > index cfa052d40105..d1225b957649 100644
> > --- a/fs/xfs/libxfs/xfs_bmap_btree.c
> > +++ b/fs/xfs/libxfs/xfs_bmap_btree.c
> > @@ -518,6 +518,18 @@ xfs_bmbt_recs_inorder(
> >  		xfs_bmbt_disk_get_startoff(&r2->bmbt);
> >  }
> >  
> > +STATIC bool
> > +xfs_bmbt_has_key_gap(
> > +	struct xfs_btree_cur		*cur,
> > +	const union xfs_btree_key	*key1,
> > +	const union xfs_btree_key	*key2)
> > +{
> > +	xfs_fileoff_t			next;
> > +
> > +	next = be64_to_cpu(key1->bmbt.br_startoff) + 1;
> > +	return next != be64_to_cpu(key2->bmbt.br_startoff);
> > +}
> 
> IDGI - this is an extent tree - there is no gap if the extent at
> key2 starts at the end of key1. IOWs, this only returns "no gap"
> if the extent at key 1 is a single block in length. I'll come back
> to this...
> 
> Oh, does this assume that the caller has already created a key to a
> nonexistent record in the BMBT that points to the end of the first
> extent?

Yes.

> i.e. that this method is being called with key1 being a high
> key for the bmbt record (i.e. an end pointer) and key2 being a low
> key for the bmbt record (i.e. a start pointer)?

Generically, the _has_key_gap functions take two record keys A and C and
decide if it is possible for there to be a third record key B satisfying
this relationship: A < B < C.  For the operation to make any sense, it's
very strongly implied that A is the high key of a record R and C is the
low key of a record S.  Technically, however, there's no reason why you
can't pass any two keys.
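
A worked refcountbt example with made-up numbers: say record R covers
agblocks 10-14 and the next record S starts at agblock 15, so:

	union xfs_btree_key	a = {
		.refc.rc_startblock = cpu_to_be32(14),	/* high key of R */
	};
	union xfs_btree_key	c = {
		.refc.rc_startblock = cpu_to_be32(15),	/* low key of S */
	};

Since 14 + 1 == 15, there's no room for a third key B with a < B < c,
so there is no gap.  If S instead started at agblock 17, startblocks
15 and 16 would fit in between -- a gap.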

> If so, this API needs some documentation on how it is expected to be
> used - at least name the two keys something more descriptive like
> "high key" and "next low key"....

Now that I know that scrub is the only user of the key gap functions,
I'm confident that the function signatures can s/key1/high_key/ and
s/key2/low_key/.

Clearly I also need to improve the documentation for this function.

"Given two btree keys @high_key and @low_key, decide if it is possible
for there to be a third btree key K satisfying the relationship
@high_key < K < @low_key.  To determine the sparseness of the keyspace
for a pair of btree records, @high_key should be the high key of a
record and @low_key should be the low key of the next record in the
record set."

Not sure if that's any better though.

> It occurs to me that what I'm actually doing here is reverse
> engineering the design of this mechanism from the code because
> there's no documentation in the patch or the commit description of
> the algorithm being used to find overlapping records....

That's the (severe!) downside of talking to a database guy -- these
kinds of things are obvious to me, but that's not everyone's background.

> >  static const struct xfs_btree_ops xfs_bmbt_ops = {
> >  	.rec_len		= sizeof(xfs_bmbt_rec_t),
> >  	.key_len		= sizeof(xfs_bmbt_key_t),
> > @@ -538,6 +550,7 @@ static const struct xfs_btree_ops xfs_bmbt_ops = {
> >  	.buf_ops		= &xfs_bmbt_buf_ops,
> >  	.keys_inorder		= xfs_bmbt_keys_inorder,
> >  	.recs_inorder		= xfs_bmbt_recs_inorder,
> > +	.has_key_gap		= xfs_bmbt_has_key_gap,
> >  };
> >  
> >  /*
> > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > index 4c16c8c31fcb..5710d3ee582a 100644
> > --- a/fs/xfs/libxfs/xfs_btree.c
> > +++ b/fs/xfs/libxfs/xfs_btree.c
> > @@ -5008,34 +5008,100 @@ xfs_btree_diff_two_ptrs(
> >  	return (int64_t)be32_to_cpu(a->s) - be32_to_cpu(b->s);
> >  }
> >  
> > -/* If there's an extent, we're done. */
> > +struct xfs_btree_scan_keyfill {
> > +	/* Keys for the start and end of the range we want to know about. */
> > +	union xfs_btree_key	start_key;
> > +	union xfs_btree_key	end_key;
> > +
> > +	/* Highest record key we've seen so far. */
> > +	union xfs_btree_key	high_key;
> > +
> > +	enum xfs_btree_keyfill	outcome;
> > +};
> 
> This "keyfill" scan operation is completely private to
> xfs_btree_has_record(), which further indicates the higher level API
> should not be renamed "keyfill"....

struct xfs_btree_has_records?

> > +
> >  STATIC int
> > -xfs_btree_has_record_helper(
> > +xfs_btree_scan_keyfill_helper(
> >  	struct xfs_btree_cur		*cur,
> >  	const union xfs_btree_rec	*rec,
> >  	void				*priv)
> >  {
> > -	return -ECANCELED;
> > +	union xfs_btree_key		rec_key;
> > +	union xfs_btree_key		rec_high_key;
> > +	struct xfs_btree_scan_keyfill	*info = priv;
> > +	int64_t				res;
> > +
> > +	cur->bc_ops->init_key_from_rec(&rec_key, rec);
> > +
> > +	if (info->outcome == XFS_BTREE_KEYFILL_EMPTY) {
> > +		info->outcome = XFS_BTREE_KEYFILL_SPARSE;
> > +
> > +		/* Bail if the first record starts after the start key. */
> > +		res = cur->bc_ops->diff_two_keys(cur, &info->start_key,
> > +				&rec_key);
> > +		if (res < 0)
> > +			return -ECANCELED;
> 
> Better comment needed.
> 
> 		/*
> 		 * If the first record we find does not overlap the
> 		 * start key, then there is a hole at the start of
> 		 * the search range before the overlap was found.
> 		 * Hence we can classify this as a sparse overlap
> 		 * and we don't need to search any further.
> 		 */

Added.

> > +	} else {
> > +		/* Bail if there's a gap with the previous record. */
> > +		if (cur->bc_ops->has_key_gap(cur, &info->high_key, &rec_key))
> > +			return -ECANCELED;
> 
> Ah, yeah, there's the high key -> low key implementation assumption.

Yes.

> > +	}
> > +
> > +	/* If the current record is higher than what we've seen, remember it. */
> > +	cur->bc_ops->init_high_key_from_rec(&rec_high_key, rec);
> > +	res = cur->bc_ops->diff_two_keys(cur, &rec_high_key, &info->high_key);
> > +	if (res > 0)
> > +		info->high_key = rec_high_key; /* struct copy */
> > +
> > +	return 0;
> >  }
> >  
> > -/* Is there a record covering a given range of keys? */
> > +/*
> > + * Scan part of the keyspace of a btree and tell us if the area has no records,
> > + * is fully mapped by records, or is partially filled.
> > + */
> >  int
> > -xfs_btree_has_record(
> > +xfs_btree_scan_keyfill(
> >  	struct xfs_btree_cur		*cur,
> >  	const union xfs_btree_irec	*low,
> >  	const union xfs_btree_irec	*high,
> > -	bool				*exists)
> > +	enum xfs_btree_keyfill		*outcome)
> >  {
> > +	struct xfs_btree_scan_keyfill	info = {
> > +		.outcome		= XFS_BTREE_KEYFILL_EMPTY,
> > +	};
> > +	union xfs_btree_rec		rec;
> > +	int64_t				res;
> >  	int				error;
> >  
> > +	if (!cur->bc_ops->has_key_gap)
> > +		return -EOPNOTSUPP;
> > +
> > +	cur->bc_rec = *low;
> > +	cur->bc_ops->init_rec_from_cur(cur, &rec);
> > +	cur->bc_ops->init_key_from_rec(&info.start_key, &rec);
> > +
> > +	cur->bc_rec = *high;
> > +	cur->bc_ops->init_rec_from_cur(cur, &rec);
> > +	cur->bc_ops->init_key_from_rec(&info.end_key, &rec);
> 
> Didn't a previous patch I just reviewed create helpers for these?
> 
> Oh.... patches in the series were threaded in the wrong order...

Yeah.  I'll rearrange these.

> 
> > +
> >  	error = xfs_btree_query_range(cur, low, high,
> > -			&xfs_btree_has_record_helper, NULL);
> > -	if (error == -ECANCELED) {
> > -		*exists = true;
> > -		return 0;
> > -	}
> > -	*exists = false;
> > -	return error;
> > +			xfs_btree_scan_keyfill_helper, &info);
> > +	if (error == -ECANCELED)
> > +		goto out;
> > +	if (error)
> > +		return error;
> > +
> > +	if (info.outcome == XFS_BTREE_KEYFILL_EMPTY)
> > +		goto out;
> > +
> > +	/* Did the record set go at least as far as the end? */
> > +	res = cur->bc_ops->diff_two_keys(cur, &info.high_key, &info.end_key);
> > +	if (res >= 0)
> > +		info.outcome = XFS_BTREE_KEYFILL_FULL;
> 
> Hmmm. I'm wondering if we should have helpers for these sorts of
> key comparisons.
> 
> static bool
> xfs_btree_keycmp_lt(
> 	struct xfs_btree_cur	*cur,
> 	const union xfs_btree_key	*key1,
> 	const union xfs_btree_key	*key2)
> {
> 	return cur->bc_ops->diff_two_keys(cur, key1, key2) < 0;
> }
> 
> static bool
> xfs_btree_keycmp_gt(
> 	struct xfs_btree_cur	*cur,
> 	const union xfs_btree_key	*key1,
> 	const union xfs_btree_key	*key2)
> {
> 	return cur->bc_ops->diff_two_keys(cur, key1, key2) > 0;
> }
> 
> static bool
> xfs_btree_keycmp_ge(
> 	struct xfs_btree_cur	*cur,
> 	const union xfs_btree_key	*key1,
> 	const union xfs_btree_key	*key2)
> {
> 	return cur->bc_ops->diff_two_keys(cur, key1, key2) >= 0;
> }
> 
> Which then makes the code read a whole lot nicer:
> 
> 	/* Did the record set go at least as far as the end? */
> 	if (xfs_btree_keycmp_ge(cur, &info.high_key, &info.end_key))
> 		info.outcome = XFS_BTREE_KEYFILL_FULL;
> ...
> 
> Not necessary for this patch, but I note there are a few places
> where these sorts of key range/ordering checks are open coded...

Yeah.  Every time I squint at "< 0" and "> 0" I have to remember what all
that means.  I'll clean that one up too.
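
The full set probably wants the other comparisons done the same way,
e.g. (sketch):

static inline bool
xfs_btree_keycmp_eq(
	struct xfs_btree_cur		*cur,
	const union xfs_btree_key	*key1,
	const union xfs_btree_key	*key2)
{
	return cur->bc_ops->diff_two_keys(cur, key1, key2) == 0;
}

static inline bool
xfs_btree_keycmp_le(
	struct xfs_btree_cur		*cur,
	const union xfs_btree_key	*key1,
	const union xfs_btree_key	*key2)
{
	return cur->bc_ops->diff_two_keys(cur, key1, key2) <= 0;
}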

> > +
> > +out:
> > +	*outcome = info.outcome;
> > +	return 0;
> >  }
> >  
> >  /* Are there more records in this btree? */
> > diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> > index eef27858a013..58a05f0d1f1b 100644
> > --- a/fs/xfs/libxfs/xfs_btree.h
> > +++ b/fs/xfs/libxfs/xfs_btree.h
> > @@ -157,6 +157,11 @@ struct xfs_btree_ops {
> >  	int	(*recs_inorder)(struct xfs_btree_cur *cur,
> >  				const union xfs_btree_rec *r1,
> >  				const union xfs_btree_rec *r2);
> > +
> > +	/* decide if there's a gap in the keyspace between two keys */
> > +	bool	(*has_key_gap)(struct xfs_btree_cur *cur,
> > +			       const union xfs_btree_key *key1,
> > +			       const union xfs_btree_key *key2);
> 
> Having read through it this far, this looks like it is checking whether
> two discrete keys form a contiguous range? Perhaps that's a better
> name than "gap", because "contiguous" means different things for
> different keys. e.g. extents vs inode records.

bc_ops->keys_contiguous()?  Sounds good to me.

> 
> 
> > diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
> > index 8c83e265770c..fd48b95b4f4e 100644
> > --- a/fs/xfs/libxfs/xfs_ialloc_btree.c
> > +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
> > @@ -380,6 +380,18 @@ xfs_inobt_recs_inorder(
> >  		be32_to_cpu(r2->inobt.ir_startino);
> >  }
> >  
> > +STATIC bool
> > +xfs_inobt_has_key_gap(
> > +	struct xfs_btree_cur		*cur,
> > +	const union xfs_btree_key	*key1,
> > +	const union xfs_btree_key	*key2)
> > +{
> > +	xfs_agino_t			next;
> > +
> > +	next = be32_to_cpu(key1->inobt.ir_startino) + XFS_INODES_PER_CHUNK;
> > +	return next != be32_to_cpu(key2->inobt.ir_startino);
> > +}
> 
> Huh. Is that correct? The high key for an inode chunk is:
> 
> STATIC void
> xfs_inobt_init_high_key_from_rec(
>         union xfs_btree_key             *key,
>         const union xfs_btree_rec       *rec)
> {
>         __u32                           x;
> 
>         x = be32_to_cpu(rec->inobt.ir_startino);
>         x += XFS_INODES_PER_CHUNK - 1;
>         key->inobt.ir_startino = cpu_to_be32(x);
> }
> 
> Which means highkey->ir_startino (end range pointer) points to
> low_key->ir_startino + 63 (start range pointer + inodes in chunk)
> 
> Hence if this "gap" API is supposed to be passed {high_key,
> low_key}, then xfs_inobt_has_key_gap() is checking
> (low_key->ir_startino + 127) against next_low_key->ir_startino...

Oops, I committed the correct code into the wrong patch.  Sometimes I
really hate stgit.  This has gotten better recently now that I figured
out how to dump the branch and patch name into PS1.

> > +STATIC bool
> > +xfs_refcountbt_has_key_gap(
> > +	struct xfs_btree_cur		*cur,
> > +	const union xfs_btree_key	*key1,
> > +	const union xfs_btree_key	*key2)
> > +{
> > +	xfs_agblock_t			next;
> > +
> > +	next = be32_to_cpu(key1->refc.rc_startblock) + 1;
> > +	return next != be32_to_cpu(key2->refc.rc_startblock);
> > +}
> 
> ... and this matches the BMBT code (as does the rmapbt code), which seems to
> assume a high key (end extent pointer) is being passed as key1, and key2 is
> a low key (start extent pointer).
> 
> Am I completely misunderstanding what the key gap API uses for
> low_key and high_key? I am completely confused now...

You've understood btree keyspace sparseness scanning correctly.
My apologies for making it harder than it had to be, and thanks for
wading through all this.

--D

> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/5] xfs: mask key comparisons for keyspace fill scans
  2022-11-03  1:08     ` Darrick J. Wong
@ 2022-11-03 21:12       ` Darrick J. Wong
  0 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2022-11-03 21:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Nov 02, 2022 at 06:08:12PM -0700, Darrick J. Wong wrote:
> On Wed, Nov 02, 2022 at 01:16:26PM +1100, Dave Chinner wrote:
> > On Sun, Oct 02, 2022 at 11:20:16AM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > For keyspace fullness scans, we want to be able to mask off the parts of
> > > the key that we don't care about.  For most btree types we /do/ want the
> > > full keyspace, but for checking that a given space usage also has a full
> > > complement of rmapbt records (even if different/multiple owners) we need
> > > this masking so that we only track sparseness of rm_startblock, not the
> > > whole keyspace (which is extremely sparse).
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > ....
> > > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > > index edea6db8d8e4..6fbce2f3c17e 100644
> > > --- a/fs/xfs/libxfs/xfs_btree.c
> > > +++ b/fs/xfs/libxfs/xfs_btree.c
> > > @@ -5020,12 +5020,33 @@ struct xfs_btree_scan_keyfill {
> > >  	union xfs_btree_key	start_key;
> > >  	union xfs_btree_key	end_key;
> > >  
> > > +	/* Mask for key comparisons, if desired. */
> > > +	union xfs_btree_key	*key_mask;
> > 
> > How does this mask work? i.e. the way it is supposed to be used is
> > not documented in either the commit message or the code....
> 
> When I merge all of this into _diff_two_keys, I'll add this to its
> description:
> 
> "Normally, each btree type's _diff_two_keys implementation will use all
> available btree key fields to compare the given keys.  However, some
> callers may prefer to ignore some part of the btree record keyspace when
> performing the comparison.
> 
> These callers should create a union xfs_btree_key object, set the fields
> that *should* be a part of the comparison to any nonzero value, and
> leave the rest zeroed.  That object should be passed in as the @key_mask
> value."
> 
> For a concrete example, take the rmap space scanning function below.  If
> we only want to know if a certain range of physical blocks has zero rmap
> records, enough rmap records to account for every block in the range, or
> some records in between, we'd initialize the key mask as follows:
> 
> 	union xfs_btree_key	km = {
> 		.rmap.rm_startblock = 1,
> 	};
> 
> and the _scan_keyfill function will compare only that part of the
> keys.
> 
> > 
> > > +STATIC int64_t
> > > +xfs_btree_diff_two_masked_keys(
> > > +	struct xfs_btree_cur		*cur,
> > > +	const union xfs_btree_key	*key1,
> > > +	const union xfs_btree_key	*key2,
> > > +	const union xfs_btree_key	*mask)
> > > +{
> > > +	union xfs_btree_key		mk1, mk2;
> > > +
> > > +	if (likely(!mask))
> > > +		return cur->bc_ops->diff_two_keys(cur, key1, key2);
> > > +
> > > +	cur->bc_ops->mask_key(cur, &mk1, key1, mask);
> > > +	cur->bc_ops->mask_key(cur, &mk2, key2, mask);
> > > +
> > > +	return cur->bc_ops->diff_two_keys(cur, &mk1, &mk2);
> > > +}
> > 
> > This seems .... very abstract.
> 
> Yes, I've gone unusually deep into database theory here...
> 
> > Why not just add a mask pointer to xfs_btree_diff_two_keys(),
> > and in each of the btree implementations of ->diff_two_keys()
> > change them to:
> > 
> > 	if (mask) {
> > 		/* mask keys */
> > 	}
> > 	/* run existing diff on two keys */
> > 
> > That gets rid of all these function pointer calls, and we only need
> > a single "diff two keys" api to be defined...
> 
> Ok, I'll look into combining mask_key into diff_two_keys.
> 
> > > diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> > > index 58a05f0d1f1b..99baa8283049 100644
> > > --- a/fs/xfs/libxfs/xfs_btree.h
> > > +++ b/fs/xfs/libxfs/xfs_btree.h
> > > @@ -158,10 +158,17 @@ struct xfs_btree_ops {
> > >  				const union xfs_btree_rec *r1,
> > >  				const union xfs_btree_rec *r2);
> > >  
> > > +	/* mask a key for us */
> > > +	void	(*mask_key)(struct xfs_btree_cur *cur,
> > > +			    union xfs_btree_key *out_key,
> > > +			    const union xfs_btree_key *in_key,
> > > +			    const union xfs_btree_key *mask);
> > > +
> > >  	/* decide if there's a gap in the keyspace between two keys */
> > >  	bool	(*has_key_gap)(struct xfs_btree_cur *cur,
> > >  			       const union xfs_btree_key *key1,
> > > -			       const union xfs_btree_key *key2);
> > > +			       const union xfs_btree_key *key2,
> > > +			       const union xfs_btree_key *mask);
> > >  };
> > >  
> > >  /*
> > > @@ -552,6 +559,7 @@ typedef bool (*xfs_btree_key_gap_fn)(struct xfs_btree_cur *cur,
> > >  int xfs_btree_scan_keyfill(struct xfs_btree_cur *cur,
> > >  		const union xfs_btree_irec *low,
> > >  		const union xfs_btree_irec *high,
> > > +		const union xfs_btree_irec *mask,
> > >  		enum xfs_btree_keyfill *outcome);
> > >  
> > >  bool xfs_btree_has_more_records(struct xfs_btree_cur *cur);
> > > diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
> > > index fd48b95b4f4e..d429ca8d9dd8 100644
> > > --- a/fs/xfs/libxfs/xfs_ialloc_btree.c
> > > +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
> > > @@ -384,11 +384,14 @@ STATIC bool
> > >  xfs_inobt_has_key_gap(
> > >  	struct xfs_btree_cur		*cur,
> > >  	const union xfs_btree_key	*key1,
> > > -	const union xfs_btree_key	*key2)
> > > +	const union xfs_btree_key	*key2,
> > > +	const union xfs_btree_key	*mask)
> > >  {
> > >  	xfs_agino_t			next;
> > >  
> > > -	next = be32_to_cpu(key1->inobt.ir_startino) + XFS_INODES_PER_CHUNK;
> > > +	ASSERT(!mask || mask->inobt.ir_startino);
> > > +
> > > +	next = be32_to_cpu(key1->inobt.ir_startino) + 1;
> > >  	return next != be32_to_cpu(key2->inobt.ir_startino);
> > >  }
> > 
> > I think you just fixed the issue I noticed in the last patch....
> 
> Oops, I clearly committed this to the wrong patch.  Sorry about that.
> 
> > > diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
> > > index 08d47cbf4697..4c123b6dd080 100644
> > > --- a/fs/xfs/libxfs/xfs_rmap.c
> > > +++ b/fs/xfs/libxfs/xfs_rmap.c
> > > @@ -2685,13 +2685,18 @@ xfs_rmap_scan_keyfill(
> > >  {
> > >  	union xfs_btree_irec	low;
> > >  	union xfs_btree_irec	high;
> > > +	union xfs_btree_irec	mask;
> > > +
> > > +	/* Only care about space scans here */
> > > +	memset(&mask, 0, sizeof(low));
> > 
> > sizeof(mask)?
> 
> Yep.
> 
> > > diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
> > > index d64143a842ce..9ca60f709c4b 100644
> > > --- a/fs/xfs/libxfs/xfs_rmap_btree.c
> > > +++ b/fs/xfs/libxfs/xfs_rmap_btree.c
> > > @@ -433,16 +433,55 @@ xfs_rmapbt_recs_inorder(
> > >  	return 0;
> > >  }
> > >  
> > > +STATIC void
> > > +xfs_rmapbt_mask_key(
> > > +	struct xfs_btree_cur		*cur,
> > > +	union xfs_btree_key		*out_key,
> > > +	const union xfs_btree_key	*in_key,
> > > +	const union xfs_btree_key	*mask)
> > > +{
> > > +	memset(out_key, 0, sizeof(union xfs_btree_key));
> > > +
> > > +	if (mask->rmap.rm_startblock)
> > > +		out_key->rmap.rm_startblock = in_key->rmap.rm_startblock;
> > > +	if (mask->rmap.rm_owner)
> > > +		out_key->rmap.rm_owner = in_key->rmap.rm_owner;
> > > +	if (mask->rmap.rm_offset)
> > > +		out_key->rmap.rm_offset = in_key->rmap.rm_offset;
> > > +}
> > 
> > So the mask fields are just used as boolean state to select what
> > should be copied into the masked key?
> 
> Yes.  A zeroed keymask field will be ignored, a nonzero keymask field
> will be retained.
> 
> > > +
> > >  STATIC bool
> > >  xfs_rmapbt_has_key_gap(
> > >  	struct xfs_btree_cur		*cur,
> > >  	const union xfs_btree_key	*key1,
> > > -	const union xfs_btree_key	*key2)
> > > +	const union xfs_btree_key	*key2,
> > > +	const union xfs_btree_key	*mask)
> > >  {
> > > -	xfs_agblock_t			next;
> > > +	bool				reflink = xfs_has_reflink(cur->bc_mp);
> > > +	uint64_t			x, y;
> > >  
> > > -	next = be32_to_cpu(key1->rmap.rm_startblock) + 1;
> > > -	return next != be32_to_cpu(key2->rmap.rm_startblock);
> > > +	if (mask->rmap.rm_offset) {
> > > +		x = be64_to_cpu(key1->rmap.rm_offset) + 1;
> > > +		y = be64_to_cpu(key2->rmap.rm_offset);
> > > +		if ((reflink && x < y) || (!reflink && x != y))
> > > +			return true;
> > > +	}
> > > +
> > > +	if (mask->rmap.rm_owner) {
> > > +		x = be64_to_cpu(key1->rmap.rm_owner) + 1;
> > > +		y = be64_to_cpu(key2->rmap.rm_owner);
> > > +		if ((reflink && x < y) || (!reflink && x != y))
> > > +			return true;
> > > +	}
> > > +
> > > +	if (mask->rmap.rm_startblock) {
> > > +		x = be32_to_cpu(key1->rmap.rm_startblock) + 1;
> > > +		y = be32_to_cpu(key2->rmap.rm_startblock);
> > > +		if ((reflink && x < y) || (!reflink && x != y))
> > > +			return true;
> > > +	}
> > > +
> > > +	return false;
> > 
> > Urk. That needs a comment explaining what all the mystery reflink
> > logic is doing.

I decided to fix this in the "_scan_keyfill" patch by changing the
->keys_contiguous function to return whether the keys are contiguous, have a
gap, or overlap.  That way, I could make _has_records detect improperly
overlapping records and return EFSCORRUPTED.  Having done that, it's no
longer necessary to have all this reflink logic to handle rmaps for
shared blocks.
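
Roughly this, with all names provisional:

enum xfs_btree_keycontig {
	XFS_BTREE_KEYS_GAP,		/* keys do not touch */
	XFS_BTREE_KEYS_CONTIGUOUS,	/* keys are exactly adjacent */
	XFS_BTREE_KEYS_OVERLAP,		/* keys overlap improperly */
};

...so that the scan helper can do:

	switch (cur->bc_ops->keys_contiguous(cur, &info->high_key,
			&rec_key)) {
	case XFS_BTREE_KEYS_CONTIGUOUS:
		break;
	case XFS_BTREE_KEYS_GAP:
		return -ECANCELED;	/* range is sparse */
	case XFS_BTREE_KEYS_OVERLAP:
		return -EFSCORRUPTED;	/* improperly overlapping records */
	}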

Then I thought about the complexity of this patch, and realized that
there's only one user of the btree key masks, and that is
xfs_rmapbt_has_records.  That single user masks off everything in the
xfs_btree_key except for rm_startblock.  All other callers do not
provide masks at all.

IOWs, I could avoid introducing all this untestable complexity by
dropping this patch entirely.  That would leave the function incomplete,
but that makes things simpler for now because I no longer have to worry
about making the comparison handle numeric gaps in the keyspace where
it's never possible to have keys (e.g. detecting records for a bmbt
block and skipping the offset comparison).

So.  I'm dropping this patch.

--D

> > It also needs to explain the order of precedence on
> > the mask checks and why that order is important (or not!).
> 
> For the purpose of comparing two keys to decide if there's a gap in the
> keyspace, we do the comparisons in order of most sensitive to least
> sensitive.  This is the opposite order of diff_two_keys -- if two keys
> have the same startblock and inode but discontiguous file offsets,
> there's a gap.  If two keys have the same startblock but different
> owners, there's a gap regardless of the offset.  What that means is a
> bit murky, since the only user of this functionality passes a mask so
> that we only compare the perag parts of the keys.
>
> Except for the masking, I think the comparison logic all got committed
> to the wrong patch.  :(
> 
> --D
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-11-03 21:12 UTC | newest]

Thread overview: 14+ messages
2022-10-02 18:20 [PATCHSET v23.1 0/5] xfs: detect incorrect gaps in refcount btree Darrick J. Wong
2022-10-02 18:20 ` [PATCH 3/5] xfs: mask key comparisons for keyspace fill scans Darrick J. Wong
2022-11-02  2:16   ` Dave Chinner
2022-11-03  1:08     ` Darrick J. Wong
2022-11-03 21:12       ` Darrick J. Wong
2022-10-02 18:20 ` [PATCH 2/5] xfs: refactor converting btree irec to btree key Darrick J. Wong
2022-11-02  0:31   ` Dave Chinner
2022-10-02 18:20 ` [PATCH 1/5] xfs: replace xfs_btree_has_record with a general keyspace scanner Darrick J. Wong
2022-11-02  1:47   ` Dave Chinner
2022-11-03  1:56     ` Darrick J. Wong
2022-10-02 18:20 ` [PATCH 4/5] xfs: check the reference counts of gaps in the refcount btree Darrick J. Wong
2022-11-02  2:20   ` Dave Chinner
2022-10-02 18:20 ` [PATCH 5/5] xfs: ensure that all metadata and data blocks are not cow staging extents Darrick J. Wong
2022-11-02  0:29   ` Dave Chinner
