All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET 0/5] xfs: improve runtime refcountbt corruption detection
@ 2022-10-24 21:33 Darrick J. Wong
  2022-10-24 21:33 ` [PATCH 1/5] xfs: move _irec structs to xfs_types.h Darrick J. Wong
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Darrick J. Wong @ 2022-10-24 21:33 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

Hi all,

Fuzz testing of the refcount btree demonstrated a weakness in validation
of refcount btree records during normal runtime.  The idea of using the
upper bit of the rc_startblock field to separate the refcount records
into one group for shared space and another for CoW staging extents was
added at the last minute.  The incore struct left this bit encoded in
the upper bit of the startblock field, which makes it all too easy for
arithmetic operations to overflow if we don't detect the cowflag
properly.

When I ran a norepair fuzz tester, I was able to crash the kernel on one
of these accidental overflows by fuzzing a key record in a node block,
which broke lookups.  To fix the problem, make the domain (shared/cow) a
separate field in the incore record.

Unfortunately, a customer also hit this once in production.  Due to bugs
in the kernel running on the VM host, writes to the disk image would
occasionally be lost.  Given sufficient memory pressure on the VM guest,
a refcountbt xfs_buf could be reclaimed and later reloaded from the
stale copy on the virtual disk.  The stale disk contents were a refcount
btree leaf block full of records for the wrong domain, and this caused
an infinite loop in the guest VM.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=refcount-cow-domain-6.1
---
 fs/xfs/libxfs/xfs_format.h         |   22 ---
 fs/xfs/libxfs/xfs_refcount.c       |  269 ++++++++++++++++++++++++++----------
 fs/xfs/libxfs/xfs_refcount.h       |    9 +
 fs/xfs/libxfs/xfs_refcount_btree.c |   26 +++
 fs/xfs/libxfs/xfs_types.h          |   30 ++++
 fs/xfs/scrub/refcount.c            |   72 ++++------
 fs/xfs/xfs_trace.h                 |   48 +++++-
 7 files changed, 324 insertions(+), 152 deletions(-)


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/5] xfs: move _irec structs to xfs_types.h
  2022-10-24 21:33 [PATCHSET 0/5] xfs: improve runtime refcountbt corruption detection Darrick J. Wong
@ 2022-10-24 21:33 ` Darrick J. Wong
  2022-10-25 23:56   ` Dave Chinner
  2022-10-24 21:33 ` [PATCH 2/5] xfs: refactor refcount record usage in xchk_refcountbt_rec Darrick J. Wong
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2022-10-24 21:33 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Structure definitions for incore objects do not belong in the ondisk
format header.  Move them to the incore types header where they belong.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_format.h |   20 --------------------
 fs/xfs/libxfs/xfs_types.h  |   20 ++++++++++++++++++++
 2 files changed, 20 insertions(+), 20 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index b55bdfa9c8a8..005dd65b71cd 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1564,20 +1564,6 @@ struct xfs_rmap_rec {
 #define RMAPBT_UNUSED_OFFSET_BITLEN	7
 #define RMAPBT_OFFSET_BITLEN		54
 
-#define XFS_RMAP_ATTR_FORK		(1 << 0)
-#define XFS_RMAP_BMBT_BLOCK		(1 << 1)
-#define XFS_RMAP_UNWRITTEN		(1 << 2)
-#define XFS_RMAP_KEY_FLAGS		(XFS_RMAP_ATTR_FORK | \
-					 XFS_RMAP_BMBT_BLOCK)
-#define XFS_RMAP_REC_FLAGS		(XFS_RMAP_UNWRITTEN)
-struct xfs_rmap_irec {
-	xfs_agblock_t	rm_startblock;	/* extent start block */
-	xfs_extlen_t	rm_blockcount;	/* extent length */
-	uint64_t	rm_owner;	/* extent owner */
-	uint64_t	rm_offset;	/* offset within the owner */
-	unsigned int	rm_flags;	/* state flags */
-};
-
 /*
  * Key structure
  *
@@ -1640,12 +1626,6 @@ struct xfs_refcount_key {
 	__be32		rc_startblock;	/* starting block number */
 };
 
-struct xfs_refcount_irec {
-	xfs_agblock_t	rc_startblock;	/* starting block number */
-	xfs_extlen_t	rc_blockcount;	/* count of free blocks */
-	xfs_nlink_t	rc_refcount;	/* number of inodes linked here */
-};
-
 #define MAXREFCOUNT	((xfs_nlink_t)~0U)
 #define MAXREFCEXTLEN	((xfs_extlen_t)~0U)
 
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index a6b7d98cf68f..2d9ebc7338b1 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -166,6 +166,26 @@ typedef struct xfs_bmbt_irec
 	xfs_exntst_t	br_state;	/* extent state */
 } xfs_bmbt_irec_t;
 
+struct xfs_refcount_irec {
+	xfs_agblock_t	rc_startblock;	/* starting block number */
+	xfs_extlen_t	rc_blockcount;	/* count of free blocks */
+	xfs_nlink_t	rc_refcount;	/* number of inodes linked here */
+};
+
+#define XFS_RMAP_ATTR_FORK		(1 << 0)
+#define XFS_RMAP_BMBT_BLOCK		(1 << 1)
+#define XFS_RMAP_UNWRITTEN		(1 << 2)
+#define XFS_RMAP_KEY_FLAGS		(XFS_RMAP_ATTR_FORK | \
+					 XFS_RMAP_BMBT_BLOCK)
+#define XFS_RMAP_REC_FLAGS		(XFS_RMAP_UNWRITTEN)
+struct xfs_rmap_irec {
+	xfs_agblock_t	rm_startblock;	/* extent start block */
+	xfs_extlen_t	rm_blockcount;	/* extent length */
+	uint64_t	rm_owner;	/* extent owner */
+	uint64_t	rm_offset;	/* offset within the owner */
+	unsigned int	rm_flags;	/* state flags */
+};
+
 /* per-AG block reservation types */
 enum xfs_ag_resv_type {
 	XFS_AG_RESV_NONE = 0,


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/5] xfs: refactor refcount record usage in xchk_refcountbt_rec
  2022-10-24 21:33 [PATCHSET 0/5] xfs: improve runtime refcountbt corruption detection Darrick J. Wong
  2022-10-24 21:33 ` [PATCH 1/5] xfs: move _irec structs to xfs_types.h Darrick J. Wong
@ 2022-10-24 21:33 ` Darrick J. Wong
  2022-10-25 23:59   ` Dave Chinner
  2022-10-24 21:33 ` [PATCH 3/5] xfs: track cow/shared record domains explicitly in xfs_refcount_irec Darrick J. Wong
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2022-10-24 21:33 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Consolidate the open-coded xfs_refcount_irec fields into an actual
struct and use the existing _btrec_to_irec to decode the ondisk record.
This will reduce code churn in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/refcount.c |   58 +++++++++++++++++++++--------------------------
 1 file changed, 26 insertions(+), 32 deletions(-)


diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c
index c68b767dc08f..8ab55e791ea8 100644
--- a/fs/xfs/scrub/refcount.c
+++ b/fs/xfs/scrub/refcount.c
@@ -269,15 +269,13 @@ xchk_refcountbt_process_rmap_fragments(
 STATIC void
 xchk_refcountbt_xref_rmap(
 	struct xfs_scrub		*sc,
-	xfs_agblock_t			bno,
-	xfs_extlen_t			len,
-	xfs_nlink_t			refcount)
+	const struct xfs_refcount_irec	*irec)
 {
 	struct xchk_refcnt_check	refchk = {
-		.sc = sc,
-		.bno = bno,
-		.len = len,
-		.refcount = refcount,
+		.sc			= sc,
+		.bno			= irec->rc_startblock,
+		.len			= irec->rc_blockcount,
+		.refcount		= irec->rc_refcount,
 		.seen = 0,
 	};
 	struct xfs_rmap_irec		low;
@@ -291,9 +289,9 @@ xchk_refcountbt_xref_rmap(
 
 	/* Cross-reference with the rmapbt to confirm the refcount. */
 	memset(&low, 0, sizeof(low));
-	low.rm_startblock = bno;
+	low.rm_startblock = irec->rc_startblock;
 	memset(&high, 0xFF, sizeof(high));
-	high.rm_startblock = bno + len - 1;
+	high.rm_startblock = irec->rc_startblock + irec->rc_blockcount - 1;
 
 	INIT_LIST_HEAD(&refchk.fragments);
 	error = xfs_rmap_query_range(sc->sa.rmap_cur, &low, &high,
@@ -302,7 +300,7 @@ xchk_refcountbt_xref_rmap(
 		goto out_free;
 
 	xchk_refcountbt_process_rmap_fragments(&refchk);
-	if (refcount != refchk.seen)
+	if (irec->rc_refcount != refchk.seen)
 		xchk_btree_xref_set_corrupt(sc, sc->sa.rmap_cur, 0);
 
 out_free:
@@ -315,17 +313,16 @@ xchk_refcountbt_xref_rmap(
 /* Cross-reference with the other btrees. */
 STATIC void
 xchk_refcountbt_xref(
-	struct xfs_scrub	*sc,
-	xfs_agblock_t		agbno,
-	xfs_extlen_t		len,
-	xfs_nlink_t		refcount)
+	struct xfs_scrub		*sc,
+	const struct xfs_refcount_irec	*irec)
 {
 	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
 		return;
 
-	xchk_xref_is_used_space(sc, agbno, len);
-	xchk_xref_is_not_inode_chunk(sc, agbno, len);
-	xchk_refcountbt_xref_rmap(sc, agbno, len, refcount);
+	xchk_xref_is_used_space(sc, irec->rc_startblock, irec->rc_blockcount);
+	xchk_xref_is_not_inode_chunk(sc, irec->rc_startblock,
+			irec->rc_blockcount);
+	xchk_refcountbt_xref_rmap(sc, irec);
 }
 
 /* Scrub a refcountbt record. */
@@ -334,35 +331,32 @@ xchk_refcountbt_rec(
 	struct xchk_btree	*bs,
 	const union xfs_btree_rec *rec)
 {
+	struct xfs_refcount_irec irec;
 	xfs_agblock_t		*cow_blocks = bs->private;
 	struct xfs_perag	*pag = bs->cur->bc_ag.pag;
-	xfs_agblock_t		bno;
-	xfs_extlen_t		len;
-	xfs_nlink_t		refcount;
 	bool			has_cowflag;
 
-	bno = be32_to_cpu(rec->refc.rc_startblock);
-	len = be32_to_cpu(rec->refc.rc_blockcount);
-	refcount = be32_to_cpu(rec->refc.rc_refcount);
+	xfs_refcount_btrec_to_irec(rec, &irec);
 
 	/* Only CoW records can have refcount == 1. */
-	has_cowflag = (bno & XFS_REFC_COW_START);
-	if ((refcount == 1 && !has_cowflag) || (refcount != 1 && has_cowflag))
+	has_cowflag = (irec.rc_startblock & XFS_REFC_COW_START);
+	if ((irec.rc_refcount == 1 && !has_cowflag) ||
+	    (irec.rc_refcount != 1 && has_cowflag))
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 	if (has_cowflag)
-		(*cow_blocks) += len;
+		(*cow_blocks) += irec.rc_blockcount;
 
 	/* Check the extent. */
-	bno &= ~XFS_REFC_COW_START;
-	if (bno + len <= bno ||
-	    !xfs_verify_agbno(pag, bno) ||
-	    !xfs_verify_agbno(pag, bno + len - 1))
+	irec.rc_startblock &= ~XFS_REFC_COW_START;
+	if (irec.rc_startblock + irec.rc_blockcount <= irec.rc_startblock ||
+	    !xfs_verify_agbno(pag, irec.rc_startblock) ||
+	    !xfs_verify_agbno(pag, irec.rc_startblock + irec.rc_blockcount - 1))
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 
-	if (refcount == 0)
+	if (irec.rc_refcount == 0)
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 
-	xchk_refcountbt_xref(bs->sc, bno, len, refcount);
+	xchk_refcountbt_xref(bs->sc, &irec);
 
 	return 0;
 }


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/5] xfs: track cow/shared record domains explicitly in xfs_refcount_irec
  2022-10-24 21:33 [PATCHSET 0/5] xfs: improve runtime refcountbt corruption detection Darrick J. Wong
  2022-10-24 21:33 ` [PATCH 1/5] xfs: move _irec structs to xfs_types.h Darrick J. Wong
  2022-10-24 21:33 ` [PATCH 2/5] xfs: refactor refcount record usage in xchk_refcountbt_rec Darrick J. Wong
@ 2022-10-24 21:33 ` Darrick J. Wong
  2022-10-26  0:40   ` Dave Chinner
  2022-10-24 21:33 ` [PATCH 4/5] xfs: rename XFS_REFC_COW_START to _COWFLAG Darrick J. Wong
  2022-10-24 21:33 ` [PATCH 5/5] xfs: check deferred refcount op continuation parameters Darrick J. Wong
  4 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2022-10-24 21:33 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Just prior to committing the reflink code into upstream, the xfs
maintainer at the time requested that I find a way to shard the refcount
records into two domains -- one for records tracking shared extents, and
a second for tracking CoW staging extents.  The idea here was to
minimize mount time CoW reclamation by pushing all the CoW records to
the right edge of the keyspace, and it was accomplished by setting the
upper bit in rc_startblock.  We don't allow AGs to have more than 2^31
blocks, so the bit was free.

Unfortunately, this was a very late addition to the codebase, so most of
the refcount record processing code still treats rc_startblock as a u32
and pays no attention to whether or not the upper bit (the cow flag) is
set.  This is a weakness is theoretically exploitable, since we're not
fully validating the incoming metadata records.

Fuzzing demonstrates practical exploits of this weakness.  If the cow
flag of a node block key record is corrupted, a lookup operation can go
to the wrong record block and start returning records from the wrong
cow/shared domain.  This causes the math to go all wrong (since cow
domain is still implicit in the upper bit of rc_startblock) and we can
crash the kernel by tricking xfs into jumping into a nonexistent AG and
tripping over xfs_perag_get(mp, <nonexistent AG>) returning NULL.

To fix this, start tracking the domain as an explicit part of struct
xfs_refcount_irec, adjust all refcount functions to check the domain
of a returned record, and alter the function definitions to accept them
where necessary.

Found by fuzzing keys[2].cowflag = add in xfs/464.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_refcount.c       |  221 ++++++++++++++++++++++++------------
 fs/xfs/libxfs/xfs_refcount.h       |    9 +
 fs/xfs/libxfs/xfs_refcount_btree.c |   26 ++++
 fs/xfs/libxfs/xfs_types.h          |   10 ++
 fs/xfs/scrub/refcount.c            |   22 ++--
 fs/xfs/xfs_trace.h                 |   48 ++++++--
 6 files changed, 235 insertions(+), 101 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 64b910caafaa..fa75e785652b 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -46,13 +46,16 @@ STATIC int __xfs_refcount_cow_free(struct xfs_btree_cur *rcur,
 int
 xfs_refcount_lookup_le(
 	struct xfs_btree_cur	*cur,
+	enum xfs_rcext_domain	domain,
 	xfs_agblock_t		bno,
 	int			*stat)
 {
-	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno, bno,
+	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno,
+			bno | (domain == XFS_RCDOM_COW ? XFS_REFC_COW_START : 0),
 			XFS_LOOKUP_LE);
 	cur->bc_rec.rc.rc_startblock = bno;
 	cur->bc_rec.rc.rc_blockcount = 0;
+	cur->bc_rec.rc.rc_domain = domain;
 	return xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
 }
 
@@ -63,13 +66,16 @@ xfs_refcount_lookup_le(
 int
 xfs_refcount_lookup_ge(
 	struct xfs_btree_cur	*cur,
+	enum xfs_rcext_domain	domain,
 	xfs_agblock_t		bno,
 	int			*stat)
 {
-	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno, bno,
+	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno,
+			bno | (domain == XFS_RCDOM_COW ? XFS_REFC_COW_START : 0),
 			XFS_LOOKUP_GE);
 	cur->bc_rec.rc.rc_startblock = bno;
 	cur->bc_rec.rc.rc_blockcount = 0;
+	cur->bc_rec.rc.rc_domain = domain;
 	return xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat);
 }
 
@@ -80,13 +86,16 @@ xfs_refcount_lookup_ge(
 int
 xfs_refcount_lookup_eq(
 	struct xfs_btree_cur	*cur,
+	enum xfs_rcext_domain	domain,
 	xfs_agblock_t		bno,
 	int			*stat)
 {
-	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno, bno,
+	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno,
+			bno | (domain == XFS_RCDOM_COW ? XFS_REFC_COW_START : 0),
 			XFS_LOOKUP_LE);
 	cur->bc_rec.rc.rc_startblock = bno;
 	cur->bc_rec.rc.rc_blockcount = 0;
+	cur->bc_rec.rc.rc_domain = domain;
 	return xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat);
 }
 
@@ -96,7 +105,17 @@ xfs_refcount_btrec_to_irec(
 	const union xfs_btree_rec	*rec,
 	struct xfs_refcount_irec	*irec)
 {
-	irec->rc_startblock = be32_to_cpu(rec->refc.rc_startblock);
+	__u32				start;
+
+	start = be32_to_cpu(rec->refc.rc_startblock);
+	if (start & XFS_REFC_COW_START) {
+		start &= ~XFS_REFC_COW_START;
+		irec->rc_domain = XFS_RCDOM_COW;
+	} else {
+		irec->rc_domain = XFS_RCDOM_SHARED;
+	}
+
+	irec->rc_startblock = start;
 	irec->rc_blockcount = be32_to_cpu(rec->refc.rc_blockcount);
 	irec->rc_refcount = be32_to_cpu(rec->refc.rc_refcount);
 }
@@ -114,7 +133,6 @@ xfs_refcount_get_rec(
 	struct xfs_perag		*pag = cur->bc_ag.pag;
 	union xfs_btree_rec		*rec;
 	int				error;
-	xfs_agblock_t			realstart;
 
 	error = xfs_btree_get_rec(cur, &rec, stat);
 	if (error || !*stat)
@@ -124,22 +142,18 @@ xfs_refcount_get_rec(
 	if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN)
 		goto out_bad_rec;
 
-	/* handle special COW-staging state */
-	realstart = irec->rc_startblock;
-	if (realstart & XFS_REFC_COW_START) {
-		if (irec->rc_refcount != 1)
-			goto out_bad_rec;
-		realstart &= ~XFS_REFC_COW_START;
-	} else if (irec->rc_refcount < 2) {
+	/* handle special COW-staging domain */
+	if (irec->rc_domain == XFS_RCDOM_COW && irec->rc_refcount != 1)
+		goto out_bad_rec;
+	if (irec->rc_domain == XFS_RCDOM_SHARED && irec->rc_refcount < 2)
 		goto out_bad_rec;
-	}
 
 	/* check for valid extent range, including overflow */
-	if (!xfs_verify_agbno(pag, realstart))
+	if (!xfs_verify_agbno(pag, irec->rc_startblock))
 		goto out_bad_rec;
-	if (realstart > realstart + irec->rc_blockcount)
+	if (irec->rc_startblock > irec->rc_startblock + irec->rc_blockcount)
 		goto out_bad_rec;
-	if (!xfs_verify_agbno(pag, realstart + irec->rc_blockcount - 1))
+	if (!xfs_verify_agbno(pag, irec->rc_startblock + irec->rc_blockcount - 1))
 		goto out_bad_rec;
 
 	if (irec->rc_refcount == 0 || irec->rc_refcount > MAXREFCOUNT)
@@ -169,12 +183,19 @@ xfs_refcount_update(
 	struct xfs_refcount_irec	*irec)
 {
 	union xfs_btree_rec	rec;
+	__u32			start;
 	int			error;
 
 	trace_xfs_refcount_update(cur->bc_mp, cur->bc_ag.pag->pag_agno, irec);
-	rec.refc.rc_startblock = cpu_to_be32(irec->rc_startblock);
+
+	start = irec->rc_startblock & ~XFS_REFC_COW_START;
+	if (irec->rc_domain == XFS_RCDOM_COW)
+		start |= XFS_REFC_COW_START;
+
+	rec.refc.rc_startblock = cpu_to_be32(start);
 	rec.refc.rc_blockcount = cpu_to_be32(irec->rc_blockcount);
 	rec.refc.rc_refcount = cpu_to_be32(irec->rc_refcount);
+
 	error = xfs_btree_update(cur, &rec);
 	if (error)
 		trace_xfs_refcount_update_error(cur->bc_mp,
@@ -196,9 +217,12 @@ xfs_refcount_insert(
 	int				error;
 
 	trace_xfs_refcount_insert(cur->bc_mp, cur->bc_ag.pag->pag_agno, irec);
+
 	cur->bc_rec.rc.rc_startblock = irec->rc_startblock;
 	cur->bc_rec.rc.rc_blockcount = irec->rc_blockcount;
 	cur->bc_rec.rc.rc_refcount = irec->rc_refcount;
+	cur->bc_rec.rc.rc_domain = irec->rc_domain;
+
 	error = xfs_btree_insert(cur, i);
 	if (error)
 		goto out_error;
@@ -244,7 +268,8 @@ xfs_refcount_delete(
 	}
 	if (error)
 		goto out_error;
-	error = xfs_refcount_lookup_ge(cur, irec.rc_startblock, &found_rec);
+	error = xfs_refcount_lookup_ge(cur, irec.rc_domain, irec.rc_startblock,
+			&found_rec);
 out_error:
 	if (error)
 		trace_xfs_refcount_delete_error(cur->bc_mp,
@@ -343,6 +368,7 @@ xfs_refc_next(
 STATIC int
 xfs_refcount_split_extent(
 	struct xfs_btree_cur		*cur,
+	enum xfs_rcext_domain		domain,
 	xfs_agblock_t			agbno,
 	bool				*shape_changed)
 {
@@ -351,7 +377,7 @@ xfs_refcount_split_extent(
 	int				error;
 
 	*shape_changed = false;
-	error = xfs_refcount_lookup_le(cur, agbno, &found_rec);
+	error = xfs_refcount_lookup_le(cur, domain, agbno, &found_rec);
 	if (error)
 		goto out_error;
 	if (!found_rec)
@@ -364,6 +390,9 @@ xfs_refcount_split_extent(
 		error = -EFSCORRUPTED;
 		goto out_error;
 	}
+
+	if (rcext.rc_domain != domain)
+		return 0;
 	if (rcext.rc_startblock == agbno || xfs_refc_next(&rcext) <= agbno)
 		return 0;
 
@@ -412,6 +441,9 @@ xfs_refcount_merge_center_extents(
 	int				error;
 	int				found_rec;
 
+	ASSERT(left->rc_domain == center->rc_domain);
+	ASSERT(right->rc_domain == center->rc_domain);
+
 	trace_xfs_refcount_merge_center_extents(cur->bc_mp,
 			cur->bc_ag.pag->pag_agno, left, center, right);
 
@@ -423,8 +455,8 @@ xfs_refcount_merge_center_extents(
 	 * call removes the center and the second one removes the right
 	 * extent.
 	 */
-	error = xfs_refcount_lookup_ge(cur, center->rc_startblock,
-			&found_rec);
+	error = xfs_refcount_lookup_ge(cur, center->rc_domain,
+			center->rc_startblock, &found_rec);
 	if (error)
 		goto out_error;
 	if (XFS_IS_CORRUPT(cur->bc_mp, found_rec != 1)) {
@@ -451,8 +483,8 @@ xfs_refcount_merge_center_extents(
 	}
 
 	/* Enlarge the left extent. */
-	error = xfs_refcount_lookup_le(cur, left->rc_startblock,
-			&found_rec);
+	error = xfs_refcount_lookup_le(cur, left->rc_domain,
+			left->rc_startblock, &found_rec);
 	if (error)
 		goto out_error;
 	if (XFS_IS_CORRUPT(cur->bc_mp, found_rec != 1)) {
@@ -488,13 +520,15 @@ xfs_refcount_merge_left_extent(
 	int				error;
 	int				found_rec;
 
+	ASSERT(left->rc_domain == cleft->rc_domain);
+
 	trace_xfs_refcount_merge_left_extent(cur->bc_mp,
 			cur->bc_ag.pag->pag_agno, left, cleft);
 
 	/* If the extent at agbno (cleft) wasn't synthesized, remove it. */
 	if (cleft->rc_refcount > 1) {
-		error = xfs_refcount_lookup_le(cur, cleft->rc_startblock,
-				&found_rec);
+		error = xfs_refcount_lookup_le(cur, cleft->rc_domain,
+				cleft->rc_startblock, &found_rec);
 		if (error)
 			goto out_error;
 		if (XFS_IS_CORRUPT(cur->bc_mp, found_rec != 1)) {
@@ -512,8 +546,8 @@ xfs_refcount_merge_left_extent(
 	}
 
 	/* Enlarge the left extent. */
-	error = xfs_refcount_lookup_le(cur, left->rc_startblock,
-			&found_rec);
+	error = xfs_refcount_lookup_le(cur, left->rc_domain,
+			left->rc_startblock, &found_rec);
 	if (error)
 		goto out_error;
 	if (XFS_IS_CORRUPT(cur->bc_mp, found_rec != 1)) {
@@ -549,6 +583,8 @@ xfs_refcount_merge_right_extent(
 	int				error;
 	int				found_rec;
 
+	ASSERT(right->rc_domain == cright->rc_domain);
+
 	trace_xfs_refcount_merge_right_extent(cur->bc_mp,
 			cur->bc_ag.pag->pag_agno, cright, right);
 
@@ -557,8 +593,8 @@ xfs_refcount_merge_right_extent(
 	 * remove it.
 	 */
 	if (cright->rc_refcount > 1) {
-		error = xfs_refcount_lookup_le(cur, cright->rc_startblock,
-			&found_rec);
+		error = xfs_refcount_lookup_le(cur, cright->rc_domain,
+				cright->rc_startblock, &found_rec);
 		if (error)
 			goto out_error;
 		if (XFS_IS_CORRUPT(cur->bc_mp, found_rec != 1)) {
@@ -576,8 +612,8 @@ xfs_refcount_merge_right_extent(
 	}
 
 	/* Enlarge the right extent. */
-	error = xfs_refcount_lookup_le(cur, right->rc_startblock,
-			&found_rec);
+	error = xfs_refcount_lookup_le(cur, right->rc_domain,
+			right->rc_startblock, &found_rec);
 	if (error)
 		goto out_error;
 	if (XFS_IS_CORRUPT(cur->bc_mp, found_rec != 1)) {
@@ -600,8 +636,6 @@ xfs_refcount_merge_right_extent(
 	return error;
 }
 
-#define XFS_FIND_RCEXT_SHARED	1
-#define XFS_FIND_RCEXT_COW	2
 /*
  * Find the left extent and the one after it (cleft).  This function assumes
  * that we've already split any extent crossing agbno.
@@ -611,16 +645,16 @@ xfs_refcount_find_left_extents(
 	struct xfs_btree_cur		*cur,
 	struct xfs_refcount_irec	*left,
 	struct xfs_refcount_irec	*cleft,
+	enum xfs_rcext_domain		domain,
 	xfs_agblock_t			agbno,
-	xfs_extlen_t			aglen,
-	int				flags)
+	xfs_extlen_t			aglen)
 {
 	struct xfs_refcount_irec	tmp;
 	int				error;
 	int				found_rec;
 
 	left->rc_startblock = cleft->rc_startblock = NULLAGBLOCK;
-	error = xfs_refcount_lookup_le(cur, agbno - 1, &found_rec);
+	error = xfs_refcount_lookup_le(cur, domain, agbno - 1, &found_rec);
 	if (error)
 		goto out_error;
 	if (!found_rec)
@@ -634,11 +668,13 @@ xfs_refcount_find_left_extents(
 		goto out_error;
 	}
 
+	if (tmp.rc_domain != domain)
+		return 0;
 	if (xfs_refc_next(&tmp) != agbno)
 		return 0;
-	if ((flags & XFS_FIND_RCEXT_SHARED) && tmp.rc_refcount < 2)
+	if (domain == XFS_RCDOM_SHARED && tmp.rc_refcount < 2)
 		return 0;
-	if ((flags & XFS_FIND_RCEXT_COW) && tmp.rc_refcount > 1)
+	if (domain == XFS_RCDOM_COW && tmp.rc_refcount > 1)
 		return 0;
 	/* We have a left extent; retrieve (or invent) the next right one */
 	*left = tmp;
@@ -655,6 +691,9 @@ xfs_refcount_find_left_extents(
 			goto out_error;
 		}
 
+		if (tmp.rc_domain != domain)
+			goto not_found;
+
 		/* if tmp starts at the end of our range, just use that */
 		if (tmp.rc_startblock == agbno)
 			*cleft = tmp;
@@ -671,8 +710,10 @@ xfs_refcount_find_left_extents(
 			cleft->rc_blockcount = min(aglen,
 					tmp.rc_startblock - agbno);
 			cleft->rc_refcount = 1;
+			cleft->rc_domain = domain;
 		}
 	} else {
+not_found:
 		/*
 		 * No extents, so pretend that there's one covering the whole
 		 * range.
@@ -680,6 +721,7 @@ xfs_refcount_find_left_extents(
 		cleft->rc_startblock = agbno;
 		cleft->rc_blockcount = aglen;
 		cleft->rc_refcount = 1;
+		cleft->rc_domain = domain;
 	}
 	trace_xfs_refcount_find_left_extent(cur->bc_mp, cur->bc_ag.pag->pag_agno,
 			left, cleft, agbno);
@@ -700,16 +742,16 @@ xfs_refcount_find_right_extents(
 	struct xfs_btree_cur		*cur,
 	struct xfs_refcount_irec	*right,
 	struct xfs_refcount_irec	*cright,
+	enum xfs_rcext_domain		domain,
 	xfs_agblock_t			agbno,
-	xfs_extlen_t			aglen,
-	int				flags)
+	xfs_extlen_t			aglen)
 {
 	struct xfs_refcount_irec	tmp;
 	int				error;
 	int				found_rec;
 
 	right->rc_startblock = cright->rc_startblock = NULLAGBLOCK;
-	error = xfs_refcount_lookup_ge(cur, agbno + aglen, &found_rec);
+	error = xfs_refcount_lookup_ge(cur, domain, agbno + aglen, &found_rec);
 	if (error)
 		goto out_error;
 	if (!found_rec)
@@ -723,11 +765,13 @@ xfs_refcount_find_right_extents(
 		goto out_error;
 	}
 
+	if (tmp.rc_domain != domain)
+		return 0;
 	if (tmp.rc_startblock != agbno + aglen)
 		return 0;
-	if ((flags & XFS_FIND_RCEXT_SHARED) && tmp.rc_refcount < 2)
+	if (domain == XFS_RCDOM_SHARED && tmp.rc_refcount < 2)
 		return 0;
-	if ((flags & XFS_FIND_RCEXT_COW) && tmp.rc_refcount > 1)
+	if (domain == XFS_RCDOM_COW && tmp.rc_refcount > 1)
 		return 0;
 	/* We have a right extent; retrieve (or invent) the next left one */
 	*right = tmp;
@@ -744,6 +788,9 @@ xfs_refcount_find_right_extents(
 			goto out_error;
 		}
 
+		if (tmp.rc_domain != domain)
+			goto not_found;
+
 		/* if tmp ends at the end of our range, just use that */
 		if (xfs_refc_next(&tmp) == agbno + aglen)
 			*cright = tmp;
@@ -760,8 +807,10 @@ xfs_refcount_find_right_extents(
 			cright->rc_blockcount = right->rc_startblock -
 					cright->rc_startblock;
 			cright->rc_refcount = 1;
+			cright->rc_domain = domain;
 		}
 	} else {
+not_found:
 		/*
 		 * No extents, so pretend that there's one covering the whole
 		 * range.
@@ -769,6 +818,7 @@ xfs_refcount_find_right_extents(
 		cright->rc_startblock = agbno;
 		cright->rc_blockcount = aglen;
 		cright->rc_refcount = 1;
+		cright->rc_domain = domain;
 	}
 	trace_xfs_refcount_find_right_extent(cur->bc_mp, cur->bc_ag.pag->pag_agno,
 			cright, right, agbno + aglen);
@@ -794,10 +844,10 @@ xfs_refc_valid(
 STATIC int
 xfs_refcount_merge_extents(
 	struct xfs_btree_cur	*cur,
+	enum xfs_rcext_domain	domain,
 	xfs_agblock_t		*agbno,
 	xfs_extlen_t		*aglen,
 	enum xfs_refc_adjust_op adjust,
-	int			flags,
 	bool			*shape_changed)
 {
 	struct xfs_refcount_irec	left = {0}, cleft = {0};
@@ -812,12 +862,12 @@ xfs_refcount_merge_extents(
 	 * just below (agbno + aglen) [cright], and just above (agbno + aglen)
 	 * [right].
 	 */
-	error = xfs_refcount_find_left_extents(cur, &left, &cleft, *agbno,
-			*aglen, flags);
+	error = xfs_refcount_find_left_extents(cur, &left, &cleft, domain,
+			*agbno, *aglen);
 	if (error)
 		return error;
-	error = xfs_refcount_find_right_extents(cur, &right, &cright, *agbno,
-			*aglen, flags);
+	error = xfs_refcount_find_right_extents(cur, &right, &cright, domain,
+			*agbno, *aglen);
 	if (error)
 		return error;
 
@@ -870,7 +920,7 @@ xfs_refcount_merge_extents(
 				aglen);
 	}
 
-	return error;
+	return 0;
 }
 
 /*
@@ -933,7 +983,8 @@ xfs_refcount_adjust_extents(
 	if (*aglen == 0)
 		return 0;
 
-	error = xfs_refcount_lookup_ge(cur, *agbno, &found_rec);
+	error = xfs_refcount_lookup_ge(cur, XFS_RCDOM_SHARED, *agbno,
+			&found_rec);
 	if (error)
 		goto out_error;
 
@@ -941,10 +992,11 @@ xfs_refcount_adjust_extents(
 		error = xfs_refcount_get_rec(cur, &ext, &found_rec);
 		if (error)
 			goto out_error;
-		if (!found_rec) {
+		if (!found_rec || ext.rc_domain != XFS_RCDOM_SHARED) {
 			ext.rc_startblock = cur->bc_mp->m_sb.sb_agblocks;
 			ext.rc_blockcount = 0;
 			ext.rc_refcount = 0;
+			ext.rc_domain = XFS_RCDOM_SHARED;
 		}
 
 		/*
@@ -957,6 +1009,8 @@ xfs_refcount_adjust_extents(
 			tmp.rc_blockcount = min(*aglen,
 					ext.rc_startblock - *agbno);
 			tmp.rc_refcount = 1 + adj;
+			tmp.rc_domain = XFS_RCDOM_SHARED;
+
 			trace_xfs_refcount_modify_extent(cur->bc_mp,
 					cur->bc_ag.pag->pag_agno, &tmp);
 
@@ -986,8 +1040,8 @@ xfs_refcount_adjust_extents(
 			(*agbno) += tmp.rc_blockcount;
 			(*aglen) -= tmp.rc_blockcount;
 
-			error = xfs_refcount_lookup_ge(cur, *agbno,
-					&found_rec);
+			error = xfs_refcount_lookup_ge(cur, XFS_RCDOM_SHARED,
+					*agbno, &found_rec);
 			if (error)
 				goto out_error;
 		}
@@ -1070,13 +1124,15 @@ xfs_refcount_adjust(
 	/*
 	 * Ensure that no rcextents cross the boundary of the adjustment range.
 	 */
-	error = xfs_refcount_split_extent(cur, agbno, &shape_changed);
+	error = xfs_refcount_split_extent(cur, XFS_RCDOM_SHARED, agbno,
+			&shape_changed);
 	if (error)
 		goto out_error;
 	if (shape_changed)
 		shape_changes++;
 
-	error = xfs_refcount_split_extent(cur, agbno + aglen, &shape_changed);
+	error = xfs_refcount_split_extent(cur, XFS_RCDOM_SHARED, agbno + aglen,
+			&shape_changed);
 	if (error)
 		goto out_error;
 	if (shape_changed)
@@ -1085,8 +1141,8 @@ xfs_refcount_adjust(
 	/*
 	 * Try to merge with the left or right extents of the range.
 	 */
-	error = xfs_refcount_merge_extents(cur, new_agbno, new_aglen, adj,
-			XFS_FIND_RCEXT_SHARED, &shape_changed);
+	error = xfs_refcount_merge_extents(cur, XFS_RCDOM_SHARED, new_agbno,
+			new_aglen, adj, &shape_changed);
 	if (error)
 		goto out_error;
 	if (shape_changed)
@@ -1307,7 +1363,7 @@ xfs_refcount_find_shared(
 	*flen = 0;
 
 	/* Try to find a refcount extent that crosses the start */
-	error = xfs_refcount_lookup_le(cur, agbno, &have);
+	error = xfs_refcount_lookup_le(cur, XFS_RCDOM_SHARED, agbno, &have);
 	if (error)
 		goto out_error;
 	if (!have) {
@@ -1325,6 +1381,8 @@ xfs_refcount_find_shared(
 		error = -EFSCORRUPTED;
 		goto out_error;
 	}
+	if (tmp.rc_domain != XFS_RCDOM_SHARED)
+		goto done;
 
 	/* If the extent ends before the start, look at the next one */
 	if (tmp.rc_startblock + tmp.rc_blockcount <= agbno) {
@@ -1340,6 +1398,8 @@ xfs_refcount_find_shared(
 			error = -EFSCORRUPTED;
 			goto out_error;
 		}
+		if (tmp.rc_domain != XFS_RCDOM_SHARED)
+			goto done;
 	}
 
 	/* If the extent starts after the range we want, bail out */
@@ -1371,7 +1431,8 @@ xfs_refcount_find_shared(
 			error = -EFSCORRUPTED;
 			goto out_error;
 		}
-		if (tmp.rc_startblock >= agbno + aglen ||
+		if (tmp.rc_domain != XFS_RCDOM_SHARED ||
+		    tmp.rc_startblock >= agbno + aglen ||
 		    tmp.rc_startblock != *fbno + *flen)
 			break;
 		*flen = min(*flen + tmp.rc_blockcount, agbno + aglen - *fbno);
@@ -1455,17 +1516,22 @@ xfs_refcount_adjust_cow_extents(
 		return 0;
 
 	/* Find any overlapping refcount records */
-	error = xfs_refcount_lookup_ge(cur, agbno, &found_rec);
+	error = xfs_refcount_lookup_ge(cur, XFS_RCDOM_COW, agbno, &found_rec);
 	if (error)
 		goto out_error;
 	error = xfs_refcount_get_rec(cur, &ext, &found_rec);
 	if (error)
 		goto out_error;
+	if (XFS_IS_CORRUPT(cur->bc_mp, found_rec &&
+				ext.rc_domain != XFS_RCDOM_COW)) {
+		error = -EFSCORRUPTED;
+		goto out_error;
+	}
 	if (!found_rec) {
-		ext.rc_startblock = cur->bc_mp->m_sb.sb_agblocks +
-				XFS_REFC_COW_START;
+		ext.rc_startblock = cur->bc_mp->m_sb.sb_agblocks;
 		ext.rc_blockcount = 0;
 		ext.rc_refcount = 0;
+		ext.rc_domain = XFS_RCDOM_COW;
 	}
 
 	switch (adj) {
@@ -1480,6 +1546,8 @@ xfs_refcount_adjust_cow_extents(
 		tmp.rc_startblock = agbno;
 		tmp.rc_blockcount = aglen;
 		tmp.rc_refcount = 1;
+		tmp.rc_domain = XFS_RCDOM_COW;
+
 		trace_xfs_refcount_modify_extent(cur->bc_mp,
 				cur->bc_ag.pag->pag_agno, &tmp);
 
@@ -1542,24 +1610,24 @@ xfs_refcount_adjust_cow(
 	bool			shape_changed;
 	int			error;
 
-	agbno += XFS_REFC_COW_START;
-
 	/*
 	 * Ensure that no rcextents cross the boundary of the adjustment range.
 	 */
-	error = xfs_refcount_split_extent(cur, agbno, &shape_changed);
+	error = xfs_refcount_split_extent(cur, XFS_RCDOM_COW, agbno,
+			&shape_changed);
 	if (error)
 		goto out_error;
 
-	error = xfs_refcount_split_extent(cur, agbno + aglen, &shape_changed);
+	error = xfs_refcount_split_extent(cur, XFS_RCDOM_COW, agbno + aglen,
+			&shape_changed);
 	if (error)
 		goto out_error;
 
 	/*
 	 * Try to merge with the left or right extents of the range.
 	 */
-	error = xfs_refcount_merge_extents(cur, &agbno, &aglen, adj,
-			XFS_FIND_RCEXT_COW, &shape_changed);
+	error = xfs_refcount_merge_extents(cur, XFS_RCDOM_COW, &agbno, &aglen,
+			adj, &shape_changed);
 	if (error)
 		goto out_error;
 
@@ -1666,6 +1734,10 @@ xfs_refcount_recover_extent(
 			   be32_to_cpu(rec->refc.rc_refcount) != 1))
 		return -EFSCORRUPTED;
 
+	if (XFS_IS_CORRUPT(cur->bc_mp, !(rec->refc.rc_startblock &
+					 cpu_to_be32(XFS_REFC_COW_START))))
+		return -EFSCORRUPTED;
+
 	rr = kmem_alloc(sizeof(struct xfs_refcount_recovery), 0);
 	xfs_refcount_btrec_to_irec(rec, &rr->rr_rrec);
 	list_add_tail(&rr->rr_list, debris);
@@ -1687,10 +1759,11 @@ xfs_refcount_recover_cow_leftovers(
 	union xfs_btree_irec		low;
 	union xfs_btree_irec		high;
 	xfs_fsblock_t			fsb;
-	xfs_agblock_t			agbno;
 	int				error;
 
-	if (mp->m_sb.sb_agblocks >= XFS_REFC_COW_START)
+	/* reflink filesystems mustn't have AGs larger than 2^31-1 blocks */
+	BUILD_BUG_ON(XFS_MAX_CRC_AG_BLOCKS >= XFS_REFC_COW_START);
+	if (mp->m_sb.sb_agblocks > XFS_MAX_CRC_AG_BLOCKS)
 		return -EOPNOTSUPP;
 
 	INIT_LIST_HEAD(&debris);
@@ -1717,7 +1790,7 @@ xfs_refcount_recover_cow_leftovers(
 	/* Find all the leftover CoW staging extents. */
 	memset(&low, 0, sizeof(low));
 	memset(&high, 0, sizeof(high));
-	low.rc.rc_startblock = XFS_REFC_COW_START;
+	low.rc.rc_domain = high.rc.rc_domain = XFS_RCDOM_COW;
 	high.rc.rc_startblock = -1U;
 	error = xfs_btree_query_range(cur, &low, &high,
 			xfs_refcount_recover_extent, &debris);
@@ -1738,8 +1811,8 @@ xfs_refcount_recover_cow_leftovers(
 				&rr->rr_rrec);
 
 		/* Free the orphan record */
-		agbno = rr->rr_rrec.rc_startblock - XFS_REFC_COW_START;
-		fsb = XFS_AGB_TO_FSB(mp, pag->pag_agno, agbno);
+		fsb = XFS_AGB_TO_FSB(mp, pag->pag_agno,
+				rr->rr_rrec.rc_startblock);
 		xfs_refcount_free_cow_extent(tp, fsb,
 				rr->rr_rrec.rc_blockcount);
 
@@ -1770,6 +1843,7 @@ xfs_refcount_recover_cow_leftovers(
 int
 xfs_refcount_has_record(
 	struct xfs_btree_cur	*cur,
+	enum xfs_rcext_domain	domain,
 	xfs_agblock_t		bno,
 	xfs_extlen_t		len,
 	bool			*exists)
@@ -1781,6 +1855,7 @@ xfs_refcount_has_record(
 	low.rc.rc_startblock = bno;
 	memset(&high, 0xFF, sizeof(high));
 	high.rc.rc_startblock = bno + len - 1;
+	low.rc.rc_domain = high.rc.rc_domain = domain;
 
 	return xfs_btree_has_record(cur, &low, &high, exists);
 }
diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h
index e8b322de7f3d..35f7db1959e5 100644
--- a/fs/xfs/libxfs/xfs_refcount.h
+++ b/fs/xfs/libxfs/xfs_refcount.h
@@ -14,11 +14,11 @@ struct xfs_bmbt_irec;
 struct xfs_refcount_irec;
 
 extern int xfs_refcount_lookup_le(struct xfs_btree_cur *cur,
-		xfs_agblock_t bno, int *stat);
+		enum xfs_rcext_domain domain, xfs_agblock_t bno, int *stat);
 extern int xfs_refcount_lookup_ge(struct xfs_btree_cur *cur,
-		xfs_agblock_t bno, int *stat);
+		enum xfs_rcext_domain domain, xfs_agblock_t bno, int *stat);
 extern int xfs_refcount_lookup_eq(struct xfs_btree_cur *cur,
-		xfs_agblock_t bno, int *stat);
+		enum xfs_rcext_domain domain, xfs_agblock_t bno, int *stat);
 extern int xfs_refcount_get_rec(struct xfs_btree_cur *cur,
 		struct xfs_refcount_irec *irec, int *stat);
 
@@ -79,7 +79,8 @@ extern int xfs_refcount_recover_cow_leftovers(struct xfs_mount *mp,
 #define XFS_REFCOUNT_ITEM_OVERHEAD	32
 
 extern int xfs_refcount_has_record(struct xfs_btree_cur *cur,
-		xfs_agblock_t bno, xfs_extlen_t len, bool *exists);
+		enum xfs_rcext_domain domain, xfs_agblock_t bno,
+		xfs_extlen_t len, bool *exists);
 union xfs_btree_rec;
 extern void xfs_refcount_btrec_to_irec(const union xfs_btree_rec *rec,
 		struct xfs_refcount_irec *irec);
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
index 316c1ec0c3c2..b0818063aa20 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.c
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -155,12 +155,31 @@ xfs_refcountbt_init_high_key_from_rec(
 	key->refc.rc_startblock = cpu_to_be32(x);
 }
 
+static inline __u32
+xfs_refcountbt_encode_startblock(
+	struct xfs_btree_cur	*cur)
+{
+	__u32			start;
+
+	/*
+	 * low level btree operations need to handle the generic btree range
+	 * query functions (which set rc_domain == -1U), so we check that the
+	 * domain is /not/ shared.
+	 */
+	start = cur->bc_rec.rc.rc_startblock & ~XFS_REFC_COW_START;
+	if (cur->bc_rec.rc.rc_domain != XFS_RCDOM_SHARED)
+		start |= XFS_REFC_COW_START;
+	return start;
+}
+
 STATIC void
 xfs_refcountbt_init_rec_from_cur(
 	struct xfs_btree_cur	*cur,
 	union xfs_btree_rec	*rec)
 {
-	rec->refc.rc_startblock = cpu_to_be32(cur->bc_rec.rc.rc_startblock);
+	__u32			start = xfs_refcountbt_encode_startblock(cur);
+
+	rec->refc.rc_startblock = cpu_to_be32(start);
 	rec->refc.rc_blockcount = cpu_to_be32(cur->bc_rec.rc.rc_blockcount);
 	rec->refc.rc_refcount = cpu_to_be32(cur->bc_rec.rc.rc_refcount);
 }
@@ -182,10 +201,11 @@ xfs_refcountbt_key_diff(
 	struct xfs_btree_cur		*cur,
 	const union xfs_btree_key	*key)
 {
-	struct xfs_refcount_irec	*rec = &cur->bc_rec.rc;
 	const struct xfs_refcount_key	*kp = &key->refc;
+	__u32				start;
 
-	return (int64_t)be32_to_cpu(kp->rc_startblock) - rec->rc_startblock;
+	start = xfs_refcountbt_encode_startblock(cur);
+	return (int64_t)be32_to_cpu(kp->rc_startblock) - start;
 }
 
 STATIC int64_t
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index 2d9ebc7338b1..50e610a9b89c 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -166,10 +166,20 @@ typedef struct xfs_bmbt_irec
 	xfs_exntst_t	br_state;	/* extent state */
 } xfs_bmbt_irec_t;
 
+enum xfs_rcext_domain {
+	XFS_RCDOM_SHARED = 0,
+	XFS_RCDOM_COW,
+};
+
+#define XFS_RCEXT_DOM_STRINGS \
+	{ XFS_RCDOM_SHARED,	"shared" }, \
+	{ XFS_RCDOM_COW,	"cow" }
+
 struct xfs_refcount_irec {
 	xfs_agblock_t	rc_startblock;	/* starting block number */
 	xfs_extlen_t	rc_blockcount;	/* count of free blocks */
 	xfs_nlink_t	rc_refcount;	/* number of inodes linked here */
+	enum xfs_rcext_domain	rc_domain; /* shared or cow staging extent? */
 };
 
 #define XFS_RMAP_ATTR_FORK		(1 << 0)
diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c
index 8ab55e791ea8..9cf4be9cbb89 100644
--- a/fs/xfs/scrub/refcount.c
+++ b/fs/xfs/scrub/refcount.c
@@ -334,20 +334,19 @@ xchk_refcountbt_rec(
 	struct xfs_refcount_irec irec;
 	xfs_agblock_t		*cow_blocks = bs->private;
 	struct xfs_perag	*pag = bs->cur->bc_ag.pag;
-	bool			has_cowflag;
 
 	xfs_refcount_btrec_to_irec(rec, &irec);
 
 	/* Only CoW records can have refcount == 1. */
-	has_cowflag = (irec.rc_startblock & XFS_REFC_COW_START);
-	if ((irec.rc_refcount == 1 && !has_cowflag) ||
-	    (irec.rc_refcount != 1 && has_cowflag))
+	if (irec.rc_domain == XFS_RCDOM_SHARED && irec.rc_refcount == 1)
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
-	if (has_cowflag)
+	if (irec.rc_domain == XFS_RCDOM_COW) {
+		if (irec.rc_refcount != 1)
+			xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 		(*cow_blocks) += irec.rc_blockcount;
+	}
 
 	/* Check the extent. */
-	irec.rc_startblock &= ~XFS_REFC_COW_START;
 	if (irec.rc_startblock + irec.rc_blockcount <= irec.rc_startblock ||
 	    !xfs_verify_agbno(pag, irec.rc_startblock) ||
 	    !xfs_verify_agbno(pag, irec.rc_startblock + irec.rc_blockcount - 1))
@@ -420,7 +419,6 @@ xchk_xref_is_cow_staging(
 	xfs_extlen_t			len)
 {
 	struct xfs_refcount_irec	rc;
-	bool				has_cowflag;
 	int				has_refcount;
 	int				error;
 
@@ -428,8 +426,8 @@ xchk_xref_is_cow_staging(
 		return;
 
 	/* Find the CoW staging extent. */
-	error = xfs_refcount_lookup_le(sc->sa.refc_cur,
-			agbno + XFS_REFC_COW_START, &has_refcount);
+	error = xfs_refcount_lookup_le(sc->sa.refc_cur, XFS_RCDOM_COW, agbno,
+			&has_refcount);
 	if (!xchk_should_check_xref(sc, &error, &sc->sa.refc_cur))
 		return;
 	if (!has_refcount) {
@@ -446,8 +444,7 @@ xchk_xref_is_cow_staging(
 	}
 
 	/* CoW flag must be set, refcount must be 1. */
-	has_cowflag = (rc.rc_startblock & XFS_REFC_COW_START);
-	if (!has_cowflag || rc.rc_refcount != 1)
+	if (rc.rc_domain != XFS_RCDOM_COW || rc.rc_refcount != 1)
 		xchk_btree_xref_set_corrupt(sc, sc->sa.refc_cur, 0);
 
 	/* Must be at least as long as what was passed in */
@@ -471,7 +468,8 @@ xchk_xref_is_not_shared(
 	if (!sc->sa.refc_cur || xchk_skip_xref(sc->sm))
 		return;
 
-	error = xfs_refcount_has_record(sc->sa.refc_cur, agbno, len, &shared);
+	error = xfs_refcount_has_record(sc->sa.refc_cur, XFS_RCDOM_SHARED,
+			agbno, len, &shared);
 	if (!xchk_should_check_xref(sc, &error, &sc->sa.refc_cur))
 		return;
 	if (shared)
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index cb7c81ba7fa3..820da4452499 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -799,6 +799,9 @@ TRACE_DEFINE_ENUM(PE_SIZE_PTE);
 TRACE_DEFINE_ENUM(PE_SIZE_PMD);
 TRACE_DEFINE_ENUM(PE_SIZE_PUD);
 
+TRACE_DEFINE_ENUM(XFS_RCDOM_SHARED);
+TRACE_DEFINE_ENUM(XFS_RCDOM_COW);
+
 TRACE_EVENT(xfs_filemap_fault,
 	TP_PROTO(struct xfs_inode *ip, enum page_entry_size pe_size,
 		 bool write_fault),
@@ -2925,6 +2928,7 @@ DECLARE_EVENT_CLASS(xfs_refcount_extent_class,
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_agnumber_t, agno)
+		__field(enum xfs_rcext_domain, domain)
 		__field(xfs_agblock_t, startblock)
 		__field(xfs_extlen_t, blockcount)
 		__field(xfs_nlink_t, refcount)
@@ -2932,13 +2936,15 @@ DECLARE_EVENT_CLASS(xfs_refcount_extent_class,
 	TP_fast_assign(
 		__entry->dev = mp->m_super->s_dev;
 		__entry->agno = agno;
+		__entry->domain = irec->rc_domain;
 		__entry->startblock = irec->rc_startblock;
 		__entry->blockcount = irec->rc_blockcount;
 		__entry->refcount = irec->rc_refcount;
 	),
-	TP_printk("dev %d:%d agno 0x%x agbno 0x%x fsbcount 0x%x refcount %u",
+	TP_printk("dev %d:%d agno 0x%x dom %s agbno 0x%x fsbcount 0x%x refcount %u",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->agno,
+		  __print_symbolic(__entry->domain, XFS_RCEXT_DOM_STRINGS),
 		  __entry->startblock,
 		  __entry->blockcount,
 		  __entry->refcount)
@@ -2958,6 +2964,7 @@ DECLARE_EVENT_CLASS(xfs_refcount_extent_at_class,
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_agnumber_t, agno)
+		__field(enum xfs_rcext_domain, domain)
 		__field(xfs_agblock_t, startblock)
 		__field(xfs_extlen_t, blockcount)
 		__field(xfs_nlink_t, refcount)
@@ -2966,14 +2973,16 @@ DECLARE_EVENT_CLASS(xfs_refcount_extent_at_class,
 	TP_fast_assign(
 		__entry->dev = mp->m_super->s_dev;
 		__entry->agno = agno;
+		__entry->domain = irec->rc_domain;
 		__entry->startblock = irec->rc_startblock;
 		__entry->blockcount = irec->rc_blockcount;
 		__entry->refcount = irec->rc_refcount;
 		__entry->agbno = agbno;
 	),
-	TP_printk("dev %d:%d agno 0x%x agbno 0x%x fsbcount 0x%x refcount %u @ agbno 0x%x",
+	TP_printk("dev %d:%d agno 0x%x dom %s agbno 0x%x fsbcount 0x%x refcount %u @ agbno 0x%x",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->agno,
+		  __print_symbolic(__entry->domain, XFS_RCEXT_DOM_STRINGS),
 		  __entry->startblock,
 		  __entry->blockcount,
 		  __entry->refcount,
@@ -2994,9 +3003,11 @@ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_class,
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_agnumber_t, agno)
+		__field(enum xfs_rcext_domain, i1_domain)
 		__field(xfs_agblock_t, i1_startblock)
 		__field(xfs_extlen_t, i1_blockcount)
 		__field(xfs_nlink_t, i1_refcount)
+		__field(enum xfs_rcext_domain, i2_domain)
 		__field(xfs_agblock_t, i2_startblock)
 		__field(xfs_extlen_t, i2_blockcount)
 		__field(xfs_nlink_t, i2_refcount)
@@ -3004,20 +3015,24 @@ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_class,
 	TP_fast_assign(
 		__entry->dev = mp->m_super->s_dev;
 		__entry->agno = agno;
+		__entry->i1_domain = i1->rc_domain;
 		__entry->i1_startblock = i1->rc_startblock;
 		__entry->i1_blockcount = i1->rc_blockcount;
 		__entry->i1_refcount = i1->rc_refcount;
+		__entry->i2_domain = i2->rc_domain;
 		__entry->i2_startblock = i2->rc_startblock;
 		__entry->i2_blockcount = i2->rc_blockcount;
 		__entry->i2_refcount = i2->rc_refcount;
 	),
-	TP_printk("dev %d:%d agno 0x%x agbno 0x%x fsbcount 0x%x refcount %u -- "
-		  "agbno 0x%x fsbcount 0x%x refcount %u",
+	TP_printk("dev %d:%d agno 0x%x dom %s agbno 0x%x fsbcount 0x%x refcount %u -- "
+		  "dom %s agbno 0x%x fsbcount 0x%x refcount %u",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->agno,
+		  __print_symbolic(__entry->i1_domain, XFS_RCEXT_DOM_STRINGS),
 		  __entry->i1_startblock,
 		  __entry->i1_blockcount,
 		  __entry->i1_refcount,
+		  __print_symbolic(__entry->i2_domain, XFS_RCEXT_DOM_STRINGS),
 		  __entry->i2_startblock,
 		  __entry->i2_blockcount,
 		  __entry->i2_refcount)
@@ -3038,9 +3053,11 @@ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_at_class,
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_agnumber_t, agno)
+		__field(enum xfs_rcext_domain, i1_domain)
 		__field(xfs_agblock_t, i1_startblock)
 		__field(xfs_extlen_t, i1_blockcount)
 		__field(xfs_nlink_t, i1_refcount)
+		__field(enum xfs_rcext_domain, i2_domain)
 		__field(xfs_agblock_t, i2_startblock)
 		__field(xfs_extlen_t, i2_blockcount)
 		__field(xfs_nlink_t, i2_refcount)
@@ -3049,21 +3066,25 @@ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_at_class,
 	TP_fast_assign(
 		__entry->dev = mp->m_super->s_dev;
 		__entry->agno = agno;
+		__entry->i1_domain = i1->rc_domain;
 		__entry->i1_startblock = i1->rc_startblock;
 		__entry->i1_blockcount = i1->rc_blockcount;
 		__entry->i1_refcount = i1->rc_refcount;
+		__entry->i2_domain = i2->rc_domain;
 		__entry->i2_startblock = i2->rc_startblock;
 		__entry->i2_blockcount = i2->rc_blockcount;
 		__entry->i2_refcount = i2->rc_refcount;
 		__entry->agbno = agbno;
 	),
-	TP_printk("dev %d:%d agno 0x%x agbno 0x%x fsbcount 0x%x refcount %u -- "
-		  "agbno 0x%x fsbcount 0x%x refcount %u @ agbno 0x%x",
+	TP_printk("dev %d:%d agno 0x%x dom %s agbno 0x%x fsbcount 0x%x refcount %u -- "
+		  "dom %s agbno 0x%x fsbcount 0x%x refcount %u @ agbno 0x%x",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->agno,
+		  __print_symbolic(__entry->i1_domain, XFS_RCEXT_DOM_STRINGS),
 		  __entry->i1_startblock,
 		  __entry->i1_blockcount,
 		  __entry->i1_refcount,
+		  __print_symbolic(__entry->i2_domain, XFS_RCEXT_DOM_STRINGS),
 		  __entry->i2_startblock,
 		  __entry->i2_blockcount,
 		  __entry->i2_refcount,
@@ -3086,12 +3107,15 @@ DECLARE_EVENT_CLASS(xfs_refcount_triple_extent_class,
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_agnumber_t, agno)
+		__field(enum xfs_rcext_domain, i1_domain)
 		__field(xfs_agblock_t, i1_startblock)
 		__field(xfs_extlen_t, i1_blockcount)
 		__field(xfs_nlink_t, i1_refcount)
+		__field(enum xfs_rcext_domain, i2_domain)
 		__field(xfs_agblock_t, i2_startblock)
 		__field(xfs_extlen_t, i2_blockcount)
 		__field(xfs_nlink_t, i2_refcount)
+		__field(enum xfs_rcext_domain, i3_domain)
 		__field(xfs_agblock_t, i3_startblock)
 		__field(xfs_extlen_t, i3_blockcount)
 		__field(xfs_nlink_t, i3_refcount)
@@ -3099,27 +3123,33 @@ DECLARE_EVENT_CLASS(xfs_refcount_triple_extent_class,
 	TP_fast_assign(
 		__entry->dev = mp->m_super->s_dev;
 		__entry->agno = agno;
+		__entry->i1_domain = i1->rc_domain;
 		__entry->i1_startblock = i1->rc_startblock;
 		__entry->i1_blockcount = i1->rc_blockcount;
 		__entry->i1_refcount = i1->rc_refcount;
+		__entry->i2_domain = i2->rc_domain;
 		__entry->i2_startblock = i2->rc_startblock;
 		__entry->i2_blockcount = i2->rc_blockcount;
 		__entry->i2_refcount = i2->rc_refcount;
+		__entry->i3_domain = i3->rc_domain;
 		__entry->i3_startblock = i3->rc_startblock;
 		__entry->i3_blockcount = i3->rc_blockcount;
 		__entry->i3_refcount = i3->rc_refcount;
 	),
-	TP_printk("dev %d:%d agno 0x%x agbno 0x%x fsbcount 0x%x refcount %u -- "
-		  "agbno 0x%x fsbcount 0x%x refcount %u -- "
-		  "agbno 0x%x fsbcount 0x%x refcount %u",
+	TP_printk("dev %d:%d agno 0x%x dom %s agbno 0x%x fsbcount 0x%x refcount %u -- "
+		  "dom %s agbno 0x%x fsbcount 0x%x refcount %u -- "
+		  "dom %s agbno 0x%x fsbcount 0x%x refcount %u",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->agno,
+		  __print_symbolic(__entry->i1_domain, XFS_RCEXT_DOM_STRINGS),
 		  __entry->i1_startblock,
 		  __entry->i1_blockcount,
 		  __entry->i1_refcount,
+		  __print_symbolic(__entry->i2_domain, XFS_RCEXT_DOM_STRINGS),
 		  __entry->i2_startblock,
 		  __entry->i2_blockcount,
 		  __entry->i2_refcount,
+		  __print_symbolic(__entry->i3_domain, XFS_RCEXT_DOM_STRINGS),
 		  __entry->i3_startblock,
 		  __entry->i3_blockcount,
 		  __entry->i3_refcount)


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/5] xfs: rename XFS_REFC_COW_START to _COWFLAG
  2022-10-24 21:33 [PATCHSET 0/5] xfs: improve runtime refcountbt corruption detection Darrick J. Wong
                   ` (2 preceding siblings ...)
  2022-10-24 21:33 ` [PATCH 3/5] xfs: track cow/shared record domains explicitly in xfs_refcount_irec Darrick J. Wong
@ 2022-10-24 21:33 ` Darrick J. Wong
  2022-10-26  0:41   ` Dave Chinner
  2022-10-24 21:33 ` [PATCH 5/5] xfs: check deferred refcount op continuation parameters Darrick J. Wong
  4 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2022-10-24 21:33 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

We've been (ab)using XFS_REFC_COW_START as both an integer quantity and
a bit flag, even though it's *only* a bit flag.  Rename the variable to
reflect its nature and update the cast target since we're not supposed
to be comparing it to xfs_agblock_t now.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_format.h         |    2 +-
 fs/xfs/libxfs/xfs_refcount.c       |   18 +++++++++---------
 fs/xfs/libxfs/xfs_refcount_btree.c |    4 ++--
 3 files changed, 12 insertions(+), 12 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 005dd65b71cd..2ce588f154e1 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1612,7 +1612,7 @@ unsigned int xfs_refc_block(struct xfs_mount *mp);
  * on the startblock.  This speeds up mount time deletion of stale
  * staging extents because they're all at the right side of the tree.
  */
-#define XFS_REFC_COW_START		((xfs_agblock_t)(1U << 31))
+#define XFS_REFC_COWFLAG		((uint32_t)(1U << 31))
 #define REFCNTBT_COWFLAG_BITLEN		1
 #define REFCNTBT_AGBLOCK_BITLEN		31
 
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index fa75e785652b..3e6cc1777ffb 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -51,7 +51,7 @@ xfs_refcount_lookup_le(
 	int			*stat)
 {
 	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno,
-			bno | (domain == XFS_RCDOM_COW ? XFS_REFC_COW_START : 0),
+			bno | (domain == XFS_RCDOM_COW ? XFS_REFC_COWFLAG : 0),
 			XFS_LOOKUP_LE);
 	cur->bc_rec.rc.rc_startblock = bno;
 	cur->bc_rec.rc.rc_blockcount = 0;
@@ -71,7 +71,7 @@ xfs_refcount_lookup_ge(
 	int			*stat)
 {
 	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno,
-			bno | (domain == XFS_RCDOM_COW ? XFS_REFC_COW_START : 0),
+			bno | (domain == XFS_RCDOM_COW ? XFS_REFC_COWFLAG : 0),
 			XFS_LOOKUP_GE);
 	cur->bc_rec.rc.rc_startblock = bno;
 	cur->bc_rec.rc.rc_blockcount = 0;
@@ -91,7 +91,7 @@ xfs_refcount_lookup_eq(
 	int			*stat)
 {
 	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno,
-			bno | (domain == XFS_RCDOM_COW ? XFS_REFC_COW_START : 0),
+			bno | (domain == XFS_RCDOM_COW ? XFS_REFC_COWFLAG : 0),
 			XFS_LOOKUP_LE);
 	cur->bc_rec.rc.rc_startblock = bno;
 	cur->bc_rec.rc.rc_blockcount = 0;
@@ -108,8 +108,8 @@ xfs_refcount_btrec_to_irec(
 	__u32				start;
 
 	start = be32_to_cpu(rec->refc.rc_startblock);
-	if (start & XFS_REFC_COW_START) {
-		start &= ~XFS_REFC_COW_START;
+	if (start & XFS_REFC_COWFLAG) {
+		start &= ~XFS_REFC_COWFLAG;
 		irec->rc_domain = XFS_RCDOM_COW;
 	} else {
 		irec->rc_domain = XFS_RCDOM_SHARED;
@@ -188,9 +188,9 @@ xfs_refcount_update(
 
 	trace_xfs_refcount_update(cur->bc_mp, cur->bc_ag.pag->pag_agno, irec);
 
-	start = irec->rc_startblock & ~XFS_REFC_COW_START;
+	start = irec->rc_startblock & ~XFS_REFC_COWFLAG;
 	if (irec->rc_domain == XFS_RCDOM_COW)
-		start |= XFS_REFC_COW_START;
+		start |= XFS_REFC_COWFLAG;
 
 	rec.refc.rc_startblock = cpu_to_be32(start);
 	rec.refc.rc_blockcount = cpu_to_be32(irec->rc_blockcount);
@@ -1735,7 +1735,7 @@ xfs_refcount_recover_extent(
 		return -EFSCORRUPTED;
 
 	if (XFS_IS_CORRUPT(cur->bc_mp, !(rec->refc.rc_startblock &
-					 cpu_to_be32(XFS_REFC_COW_START))))
+					 cpu_to_be32(XFS_REFC_COWFLAG))))
 		return -EFSCORRUPTED;
 
 	rr = kmem_alloc(sizeof(struct xfs_refcount_recovery), 0);
@@ -1762,7 +1762,7 @@ xfs_refcount_recover_cow_leftovers(
 	int				error;
 
 	/* reflink filesystems mustn't have AGs larger than 2^31-1 blocks */
-	BUILD_BUG_ON(XFS_MAX_CRC_AG_BLOCKS >= XFS_REFC_COW_START);
+	BUILD_BUG_ON(XFS_MAX_CRC_AG_BLOCKS >= XFS_REFC_COWFLAG);
 	if (mp->m_sb.sb_agblocks > XFS_MAX_CRC_AG_BLOCKS)
 		return -EOPNOTSUPP;
 
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
index b0818063aa20..fdbb2895d8c3 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.c
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -166,9 +166,9 @@ xfs_refcountbt_encode_startblock(
 	 * query functions (which set rc_domain == -1U), so we check that the
 	 * domain is /not/ shared.
 	 */
-	start = cur->bc_rec.rc.rc_startblock & ~XFS_REFC_COW_START;
+	start = cur->bc_rec.rc.rc_startblock & ~XFS_REFC_COWFLAG;
 	if (cur->bc_rec.rc.rc_domain != XFS_RCDOM_SHARED)
-		start |= XFS_REFC_COW_START;
+		start |= XFS_REFC_COWFLAG;
 	return start;
 }
 


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/5] xfs: check deferred refcount op continuation parameters
  2022-10-24 21:33 [PATCHSET 0/5] xfs: improve runtime refcountbt corruption detection Darrick J. Wong
                   ` (3 preceding siblings ...)
  2022-10-24 21:33 ` [PATCH 4/5] xfs: rename XFS_REFC_COW_START to _COWFLAG Darrick J. Wong
@ 2022-10-24 21:33 ` Darrick J. Wong
  2022-10-26  0:46   ` Dave Chinner
  4 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2022-10-24 21:33 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If we're in the middle of a deferred refcount operation and decide to
roll the transaction to avoid overflowing the transaction space, we need
to check the new agbno/aglen parameters that we're about to record in
the new intent.  Specifically, we need to check that the new extent is
completely within the filesystem, and that continuation does not put us
into a different AG.

If the keys of a node block are wrong, the lookup to resume an
xfs_refcount_adjust_extents operation can put us into the wrong record
block.  If this happens, we might not find that we run out of aglen at
an exact record boundary, which will cause the loop control to do the
wrong thing.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_refcount.c |   48 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 3e6cc1777ffb..a311851a627a 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -1180,6 +1180,44 @@ xfs_refcount_finish_one_cleanup(
 		xfs_trans_brelse(tp, agbp);
 }
 
+/*
+ * Set up a continuation a deferred refcount operation by updating the intent.
+ * Checks to make sure we're not going to run off the end of the AG.
+ */
+static inline int
+xfs_refcount_continue_op(
+	struct xfs_btree_cur		*cur,
+	xfs_fsblock_t			startblock,
+	xfs_agblock_t			new_agbno,
+	xfs_extlen_t			new_len,
+	xfs_fsblock_t			*fsbp)
+{
+	struct xfs_mount		*mp = cur->bc_mp;
+	struct xfs_perag		*pag = cur->bc_ag.pag;
+	xfs_fsblock_t			new_fsbno;
+	xfs_agnumber_t			old_agno;
+
+	old_agno = XFS_FSB_TO_AGNO(mp, startblock);
+	new_fsbno = XFS_AGB_TO_FSB(mp, pag->pag_agno, new_agbno);
+
+	/*
+	 * If we don't have any work left to do, then there's no need
+	 * to perform the validation of the new parameters.
+	 */
+	if (!new_len)
+		goto done;
+
+	if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbext(mp, new_fsbno, new_len)))
+		return -EFSCORRUPTED;
+
+	if (XFS_IS_CORRUPT(mp, old_agno != XFS_FSB_TO_AGNO(mp, new_fsbno)))
+		return -EFSCORRUPTED;
+
+done:
+	*fsbp = new_fsbno;
+	return 0;
+}
+
 /*
  * Process one of the deferred refcount operations.  We pass back the
  * btree cursor to maintain our lock on the btree between calls.
@@ -1247,12 +1285,18 @@ xfs_refcount_finish_one(
 	case XFS_REFCOUNT_INCREASE:
 		error = xfs_refcount_adjust(rcur, bno, blockcount, &new_agbno,
 				new_len, XFS_REFCOUNT_ADJUST_INCREASE);
-		*new_fsb = XFS_AGB_TO_FSB(mp, pag->pag_agno, new_agbno);
+		if (error)
+			goto out_drop;
+		error = xfs_refcount_continue_op(rcur, startblock, new_agbno,
+				*new_len, new_fsb);
 		break;
 	case XFS_REFCOUNT_DECREASE:
 		error = xfs_refcount_adjust(rcur, bno, blockcount, &new_agbno,
 				new_len, XFS_REFCOUNT_ADJUST_DECREASE);
-		*new_fsb = XFS_AGB_TO_FSB(mp, pag->pag_agno, new_agbno);
+		if (error)
+			goto out_drop;
+		error = xfs_refcount_continue_op(rcur, startblock, new_agbno,
+				*new_len, new_fsb);
 		break;
 	case XFS_REFCOUNT_ALLOC_COW:
 		*new_fsb = startblock + blockcount;


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/5] xfs: move _irec structs to xfs_types.h
  2022-10-24 21:33 ` [PATCH 1/5] xfs: move _irec structs to xfs_types.h Darrick J. Wong
@ 2022-10-25 23:56   ` Dave Chinner
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2022-10-25 23:56 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Oct 24, 2022 at 02:33:14PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Structure definitions for incore objects do not belong in the ondisk
> format header.  Move them to the incore types header where they belong.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_format.h |   20 --------------------
>  fs/xfs/libxfs/xfs_types.h  |   20 ++++++++++++++++++++
>  2 files changed, 20 insertions(+), 20 deletions(-)

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/5] xfs: refactor refcount record usage in xchk_refcountbt_rec
  2022-10-24 21:33 ` [PATCH 2/5] xfs: refactor refcount record usage in xchk_refcountbt_rec Darrick J. Wong
@ 2022-10-25 23:59   ` Dave Chinner
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2022-10-25 23:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Oct 24, 2022 at 02:33:20PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Consolidate the open-coded xfs_refcount_irec fields into an actual
> struct and use the existing _btrec_to_irec to decode the ondisk record.
> This will reduce code churn in the next patch.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/scrub/refcount.c |   58 +++++++++++++++++++++--------------------------
>  1 file changed, 26 insertions(+), 32 deletions(-)

Seems like a straight forward conversion.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/5] xfs: track cow/shared record domains explicitly in xfs_refcount_irec
  2022-10-24 21:33 ` [PATCH 3/5] xfs: track cow/shared record domains explicitly in xfs_refcount_irec Darrick J. Wong
@ 2022-10-26  0:40   ` Dave Chinner
  2022-10-26 22:06     ` Darrick J. Wong
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2022-10-26  0:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Oct 24, 2022 at 02:33:25PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Just prior to committing the reflink code into upstream, the xfs
> maintainer at the time requested that I find a way to shard the refcount
> records into two domains -- one for records tracking shared extents, and
> a second for tracking CoW staging extents.  The idea here was to
> minimize mount time CoW reclamation by pushing all the CoW records to
> the right edge of the keyspace, and it was accomplished by setting the
> upper bit in rc_startblock.  We don't allow AGs to have more than 2^31
> blocks, so the bit was free.
> 
> Unfortunately, this was a very late addition to the codebase, so most of
> the refcount record processing code still treats rc_startblock as a u32
> and pays no attention to whether or not the upper bit (the cow flag) is
> set.  This is a weakness is theoretically exploitable, since we're not
> fully validating the incoming metadata records.
> 
> Fuzzing demonstrates practical exploits of this weakness.  If the cow
> flag of a node block key record is corrupted, a lookup operation can go
> to the wrong record block and start returning records from the wrong
> cow/shared domain.  This causes the math to go all wrong (since cow
> domain is still implicit in the upper bit of rc_startblock) and we can
> crash the kernel by tricking xfs into jumping into a nonexistent AG and
> tripping over xfs_perag_get(mp, <nonexistent AG>) returning NULL.
> 
> To fix this, start tracking the domain as an explicit part of struct
> xfs_refcount_irec, adjust all refcount functions to check the domain
> of a returned record, and alter the function definitions to accept them
> where necessary.
> 
> Found by fuzzing keys[2].cowflag = add in xfs/464.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_refcount.c       |  221 ++++++++++++++++++++++++------------
>  fs/xfs/libxfs/xfs_refcount.h       |    9 +
>  fs/xfs/libxfs/xfs_refcount_btree.c |   26 ++++
>  fs/xfs/libxfs/xfs_types.h          |   10 ++
>  fs/xfs/scrub/refcount.c            |   22 ++--
>  fs/xfs/xfs_trace.h                 |   48 ++++++--
>  6 files changed, 235 insertions(+), 101 deletions(-)

My first thought was that this is complex enough that it needs to be
split up. Reading the very first hunk:

> 
> 
> diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
> index 64b910caafaa..fa75e785652b 100644
> --- a/fs/xfs/libxfs/xfs_refcount.c
> +++ b/fs/xfs/libxfs/xfs_refcount.c
> @@ -46,13 +46,16 @@ STATIC int __xfs_refcount_cow_free(struct xfs_btree_cur *rcur,
>  int
>  xfs_refcount_lookup_le(
>  	struct xfs_btree_cur	*cur,
> +	enum xfs_rcext_domain	domain,
>  	xfs_agblock_t		bno,
>  	int			*stat)
>  {
> -	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno, bno,
> +	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno,
> +			bno | (domain == XFS_RCDOM_COW ? XFS_REFC_COW_START : 0),
>  			XFS_LOOKUP_LE);
>  	cur->bc_rec.rc.rc_startblock = bno;
>  	cur->bc_rec.rc.rc_blockcount = 0;
> +	cur->bc_rec.rc.rc_domain = domain;
>  	return xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);

I found that I had to go looking through the patch to find what the
definitions of xfs_rcext_domain, XFS_RCDOM_COW and rc_domain
actually were, because without knowing what the actual new
structures being introduced actually were this didn't make a whole
lot of sense.

Hence I think I'd like this to be broken up into a patch that
introduces the rc_domain and the helpers that build/split the
domain/startblock information in the irec, and a followup patch that
modifies the implementation to use the new domain interfaces to make
it a bit easier to see where the significant changes in the code
actually are.

I also suspect that trace_xfs_refcount_lookup() should be passed the
domain directly rather than encoding it at every call site of the
tracepoint, or it code encoded in a helper such as
xfs_refcount_domain_to_block()...

For consistency, calling the enums XFS_REFC_DOMAIN_{COW/SHARED}
would be more consistent with the naming used in other refcount
btree constants.

>  }
>  
> @@ -63,13 +66,16 @@ xfs_refcount_lookup_le(
>  int
>  xfs_refcount_lookup_ge(
>  	struct xfs_btree_cur	*cur,
> +	enum xfs_rcext_domain	domain,
>  	xfs_agblock_t		bno,
>  	int			*stat)
>  {
> -	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno, bno,
> +	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno,
> +			bno | (domain == XFS_RCDOM_COW ? XFS_REFC_COW_START : 0),
>  			XFS_LOOKUP_GE);
>  	cur->bc_rec.rc.rc_startblock = bno;
>  	cur->bc_rec.rc.rc_blockcount = 0;
> +	cur->bc_rec.rc.rc_domain = domain;
>  	return xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat);
>  }
>  
> @@ -80,13 +86,16 @@ xfs_refcount_lookup_ge(
>  int
>  xfs_refcount_lookup_eq(
>  	struct xfs_btree_cur	*cur,
> +	enum xfs_rcext_domain	domain,
>  	xfs_agblock_t		bno,
>  	int			*stat)
>  {
> -	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno, bno,
> +	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno,
> +			bno | (domain == XFS_RCDOM_COW ? XFS_REFC_COW_START : 0),
>  			XFS_LOOKUP_LE);
>  	cur->bc_rec.rc.rc_startblock = bno;
>  	cur->bc_rec.rc.rc_blockcount = 0;
> +	cur->bc_rec.rc.rc_domain = domain;
>  	return xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat);
>  }
>  
> @@ -96,7 +105,17 @@ xfs_refcount_btrec_to_irec(
>  	const union xfs_btree_rec	*rec,
>  	struct xfs_refcount_irec	*irec)
>  {
> -	irec->rc_startblock = be32_to_cpu(rec->refc.rc_startblock);
> +	__u32				start;
> +
> +	start = be32_to_cpu(rec->refc.rc_startblock);
> +	if (start & XFS_REFC_COW_START) {
> +		start &= ~XFS_REFC_COW_START;
> +		irec->rc_domain = XFS_RCDOM_COW;
> +	} else {
> +		irec->rc_domain = XFS_RCDOM_SHARED;
> +	}

xfs_refcount_block_to_domain()?



> +
> +	irec->rc_startblock = start;
>  	irec->rc_blockcount = be32_to_cpu(rec->refc.rc_blockcount);
>  	irec->rc_refcount = be32_to_cpu(rec->refc.rc_refcount);
>  }
> @@ -114,7 +133,6 @@ xfs_refcount_get_rec(
>  	struct xfs_perag		*pag = cur->bc_ag.pag;
>  	union xfs_btree_rec		*rec;
>  	int				error;
> -	xfs_agblock_t			realstart;
>  
>  	error = xfs_btree_get_rec(cur, &rec, stat);
>  	if (error || !*stat)
> @@ -124,22 +142,18 @@ xfs_refcount_get_rec(
>  	if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN)
>  		goto out_bad_rec;
>  
> -	/* handle special COW-staging state */
> -	realstart = irec->rc_startblock;
> -	if (realstart & XFS_REFC_COW_START) {
> -		if (irec->rc_refcount != 1)
> -			goto out_bad_rec;
> -		realstart &= ~XFS_REFC_COW_START;
> -	} else if (irec->rc_refcount < 2) {
> +	/* handle special COW-staging domain */
> +	if (irec->rc_domain == XFS_RCDOM_COW && irec->rc_refcount != 1)
> +		goto out_bad_rec;
> +	if (irec->rc_domain == XFS_RCDOM_SHARED && irec->rc_refcount < 2)
>  		goto out_bad_rec;
> -	}
>  
>  	/* check for valid extent range, including overflow */
> -	if (!xfs_verify_agbno(pag, realstart))
> +	if (!xfs_verify_agbno(pag, irec->rc_startblock))
>  		goto out_bad_rec;
> -	if (realstart > realstart + irec->rc_blockcount)
> +	if (irec->rc_startblock > irec->rc_startblock + irec->rc_blockcount)
>  		goto out_bad_rec;
> -	if (!xfs_verify_agbno(pag, realstart + irec->rc_blockcount - 1))
> +	if (!xfs_verify_agbno(pag, irec->rc_startblock + irec->rc_blockcount - 1))
>  		goto out_bad_rec;
>  
>  	if (irec->rc_refcount == 0 || irec->rc_refcount > MAXREFCOUNT)
> @@ -169,12 +183,19 @@ xfs_refcount_update(
>  	struct xfs_refcount_irec	*irec)
>  {
>  	union xfs_btree_rec	rec;
> +	__u32			start;
>  	int			error;
>  
>  	trace_xfs_refcount_update(cur->bc_mp, cur->bc_ag.pag->pag_agno, irec);
> -	rec.refc.rc_startblock = cpu_to_be32(irec->rc_startblock);
> +
> +	start = irec->rc_startblock & ~XFS_REFC_COW_START;
> +	if (irec->rc_domain == XFS_RCDOM_COW)
> +		start |= XFS_REFC_COW_START;

Yeah, this definitely looks like we want a
xfs_refcount_domain_to_block() helper...

> +
> +	rec.refc.rc_startblock = cpu_to_be32(start);
>  	rec.refc.rc_blockcount = cpu_to_be32(irec->rc_blockcount);
>  	rec.refc.rc_refcount = cpu_to_be32(irec->rc_refcount);
> +
>  	error = xfs_btree_update(cur, &rec);
>  	if (error)
>  		trace_xfs_refcount_update_error(cur->bc_mp,
> @@ -196,9 +217,12 @@ xfs_refcount_insert(
>  	int				error;
>  
>  	trace_xfs_refcount_insert(cur->bc_mp, cur->bc_ag.pag->pag_agno, irec);
> +
>  	cur->bc_rec.rc.rc_startblock = irec->rc_startblock;
>  	cur->bc_rec.rc.rc_blockcount = irec->rc_blockcount;
>  	cur->bc_rec.rc.rc_refcount = irec->rc_refcount;
> +	cur->bc_rec.rc.rc_domain = irec->rc_domain;

Does a struct copy make more sense here now?

[.....]

> @@ -600,8 +636,6 @@ xfs_refcount_merge_right_extent(
>  	return error;
>  }
>  
> -#define XFS_FIND_RCEXT_SHARED	1
> -#define XFS_FIND_RCEXT_COW	2
>  /*
>   * Find the left extent and the one after it (cleft).  This function assumes
>   * that we've already split any extent crossing agbno.
> @@ -611,16 +645,16 @@ xfs_refcount_find_left_extents(
>  	struct xfs_btree_cur		*cur,
>  	struct xfs_refcount_irec	*left,
>  	struct xfs_refcount_irec	*cleft,
> +	enum xfs_rcext_domain		domain,
>  	xfs_agblock_t			agbno,
> -	xfs_extlen_t			aglen,
> -	int				flags)
> +	xfs_extlen_t			aglen)
>  {
>  	struct xfs_refcount_irec	tmp;
>  	int				error;
>  	int				found_rec;
>  
>  	left->rc_startblock = cleft->rc_startblock = NULLAGBLOCK;
> -	error = xfs_refcount_lookup_le(cur, agbno - 1, &found_rec);
> +	error = xfs_refcount_lookup_le(cur, domain, agbno - 1, &found_rec);
>  	if (error)
>  		goto out_error;
>  	if (!found_rec)
> @@ -634,11 +668,13 @@ xfs_refcount_find_left_extents(
>  		goto out_error;
>  	}
>  
> +	if (tmp.rc_domain != domain)
> +		return 0;
>  	if (xfs_refc_next(&tmp) != agbno)
>  		return 0;
> -	if ((flags & XFS_FIND_RCEXT_SHARED) && tmp.rc_refcount < 2)
> +	if (domain == XFS_RCDOM_SHARED && tmp.rc_refcount < 2)
>  		return 0;
> -	if ((flags & XFS_FIND_RCEXT_COW) && tmp.rc_refcount > 1)
> +	if (domain == XFS_RCDOM_COW && tmp.rc_refcount > 1)
>  		return 0;

Hmmm - this pattern is repeated in a couple of places. Perhaps a
helper is in order?

[....]

> diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
> index 316c1ec0c3c2..b0818063aa20 100644
> --- a/fs/xfs/libxfs/xfs_refcount_btree.c
> +++ b/fs/xfs/libxfs/xfs_refcount_btree.c
> @@ -155,12 +155,31 @@ xfs_refcountbt_init_high_key_from_rec(
>  	key->refc.rc_startblock = cpu_to_be32(x);
>  }
>  
> +static inline __u32
> +xfs_refcountbt_encode_startblock(
> +	struct xfs_btree_cur	*cur)
> +{
> +	__u32			start;
> +
> +	/*
> +	 * low level btree operations need to handle the generic btree range
> +	 * query functions (which set rc_domain == -1U), so we check that the
> +	 * domain is /not/ shared.
> +	 */
> +	start = cur->bc_rec.rc.rc_startblock & ~XFS_REFC_COW_START;
> +	if (cur->bc_rec.rc.rc_domain != XFS_RCDOM_SHARED)
> +		start |= XFS_REFC_COW_START;
> +	return start;
> +}

Oh, there is a xfs_refcount_domain_to_block() helper already - it
just needs to be passed the agbno + domain, not a btree cursor :)

[....]

> diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c
> index 8ab55e791ea8..9cf4be9cbb89 100644
> --- a/fs/xfs/scrub/refcount.c
> +++ b/fs/xfs/scrub/refcount.c
> @@ -334,20 +334,19 @@ xchk_refcountbt_rec(
>  	struct xfs_refcount_irec irec;
>  	xfs_agblock_t		*cow_blocks = bs->private;
>  	struct xfs_perag	*pag = bs->cur->bc_ag.pag;
> -	bool			has_cowflag;
>  
>  	xfs_refcount_btrec_to_irec(rec, &irec);
>  
>  	/* Only CoW records can have refcount == 1. */
> -	has_cowflag = (irec.rc_startblock & XFS_REFC_COW_START);
> -	if ((irec.rc_refcount == 1 && !has_cowflag) ||
> -	    (irec.rc_refcount != 1 && has_cowflag))
> +	if (irec.rc_domain == XFS_RCDOM_SHARED && irec.rc_refcount == 1)
>  		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> -	if (has_cowflag)
> +	if (irec.rc_domain == XFS_RCDOM_COW) {
> +		if (irec.rc_refcount != 1)
> +			xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
>  		(*cow_blocks) += irec.rc_blockcount;
> +	}

Hmmm, that looks like another of those shared/cow domain + refcount
checks like I pointed out above...

>  
>  	/* Check the extent. */
> -	irec.rc_startblock &= ~XFS_REFC_COW_START;
>  	if (irec.rc_startblock + irec.rc_blockcount <= irec.rc_startblock ||
>  	    !xfs_verify_agbno(pag, irec.rc_startblock) ||
>  	    !xfs_verify_agbno(pag, irec.rc_startblock + irec.rc_blockcount - 1))
> @@ -420,7 +419,6 @@ xchk_xref_is_cow_staging(
>  	xfs_extlen_t			len)
>  {
>  	struct xfs_refcount_irec	rc;
> -	bool				has_cowflag;
>  	int				has_refcount;
>  	int				error;
>  
> @@ -428,8 +426,8 @@ xchk_xref_is_cow_staging(
>  		return;
>  
>  	/* Find the CoW staging extent. */
> -	error = xfs_refcount_lookup_le(sc->sa.refc_cur,
> -			agbno + XFS_REFC_COW_START, &has_refcount);
> +	error = xfs_refcount_lookup_le(sc->sa.refc_cur, XFS_RCDOM_COW, agbno,
> +			&has_refcount);
>  	if (!xchk_should_check_xref(sc, &error, &sc->sa.refc_cur))
>  		return;
>  	if (!has_refcount) {
> @@ -446,8 +444,7 @@ xchk_xref_is_cow_staging(
>  	}
>  
>  	/* CoW flag must be set, refcount must be 1. */
> -	has_cowflag = (rc.rc_startblock & XFS_REFC_COW_START);
> -	if (!has_cowflag || rc.rc_refcount != 1)
> +	if (rc.rc_domain != XFS_RCDOM_COW || rc.rc_refcount != 1)
>  		xchk_btree_xref_set_corrupt(sc, sc->sa.refc_cur, 0);

That looks like the same validation logic check as the above hunk
in xchk_refcountbt_rec()...

That's just a first pass - I haven't really concentrated on
correctness that much yet, the patch is really doing too much for me
to get a good picture of everything...

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/5] xfs: rename XFS_REFC_COW_START to _COWFLAG
  2022-10-24 21:33 ` [PATCH 4/5] xfs: rename XFS_REFC_COW_START to _COWFLAG Darrick J. Wong
@ 2022-10-26  0:41   ` Dave Chinner
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2022-10-26  0:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Oct 24, 2022 at 02:33:31PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> We've been (ab)using XFS_REFC_COW_START as both an integer quantity and
> a bit flag, even though it's *only* a bit flag.  Rename the variable to
> reflect its nature and update the cast target since we're not supposed
> to be comparing it to xfs_agblock_t now.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_format.h         |    2 +-
>  fs/xfs/libxfs/xfs_refcount.c       |   18 +++++++++---------
>  fs/xfs/libxfs/xfs_refcount_btree.c |    4 ++--
>  3 files changed, 12 insertions(+), 12 deletions(-)

Seems fine, but I think this will change quite a bit if helpers get
used for most of these XFS_REFC_COW_START checks...

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/5] xfs: check deferred refcount op continuation parameters
  2022-10-24 21:33 ` [PATCH 5/5] xfs: check deferred refcount op continuation parameters Darrick J. Wong
@ 2022-10-26  0:46   ` Dave Chinner
  2022-10-26  1:28     ` Darrick J. Wong
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2022-10-26  0:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Oct 24, 2022 at 02:33:37PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> If we're in the middle of a deferred refcount operation and decide to
> roll the transaction to avoid overflowing the transaction space, we need
> to check the new agbno/aglen parameters that we're about to record in
> the new intent.  Specifically, we need to check that the new extent is
> completely within the filesystem, and that continuation does not put us
> into a different AG.

Huh. Why would they not be withing the filesystem or AG, given that
the current transaction should only be operating on an extent within
the current filesystem/AG?

IIUC, this is code intended to catch the sort of refcount irec
startblock corruption that the series fixes, right? If so, shouldn't it be
first in the patch series, not last?

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/5] xfs: check deferred refcount op continuation parameters
  2022-10-26  0:46   ` Dave Chinner
@ 2022-10-26  1:28     ` Darrick J. Wong
  0 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2022-10-26  1:28 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Oct 26, 2022 at 11:46:03AM +1100, Dave Chinner wrote:
> On Mon, Oct 24, 2022 at 02:33:37PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > If we're in the middle of a deferred refcount operation and decide to
> > roll the transaction to avoid overflowing the transaction space, we need
> > to check the new agbno/aglen parameters that we're about to record in
> > the new intent.  Specifically, we need to check that the new extent is
> > completely within the filesystem, and that continuation does not put us
> > into a different AG.
> 
> Huh. Why would they not be withing the filesystem or AG, given that
> the current transaction should only be operating on an extent within
> the current filesystem/AG?

Math errors.

Callers of xfs_refcount_adjust_extents are supposed to split (and
update) any refcount records that cross the start or end of the range
that we're adjusting.  This guarantees that _adjust_extents will stop at
exactly the end of a refcount record.

However, if a record in the middle of that range has a blockcount that
is longer than *aglen at the point that we encounter the record, we'll
increment agbno beyond the range that we were supposed to adjust.  This
happens either because we fuzzed the refcountbt deliberately, or ...
because we actually did write the refcountbt record block but due to
bugs on the vm host, the write was silently dropped and memory pressure
caused the xfs_buf to get reclaimed and reloaded with stale contents.

(That's an argument for making xfs_refcount_adjust_extents check that
*aglen never underflows; I'll update the patch.

Oh.  I already wrote that patch, but forgot to add it to this series.
Sigh.)

If agbno is now large enough that the segmented xfs_fsblock_t points
into a nonexistent AG, bad things will happen.

> IIUC, this is code intended to catch the sort of refcount irec
> startblock corruption that the series fixes, right? If so, shouldn't it be
> first in the patch series, not last?

<shrug> Based on reviewer feedback over the last few years I got in the
habit of putting the actual bug fixes at the front of the series.
Patches adding more layers of checking so that we can die gracefully
ended up after that.

--D

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/5] xfs: track cow/shared record domains explicitly in xfs_refcount_irec
  2022-10-26  0:40   ` Dave Chinner
@ 2022-10-26 22:06     ` Darrick J. Wong
  0 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2022-10-26 22:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Oct 26, 2022 at 11:40:29AM +1100, Dave Chinner wrote:
> On Mon, Oct 24, 2022 at 02:33:25PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Just prior to committing the reflink code into upstream, the xfs
> > maintainer at the time requested that I find a way to shard the refcount
> > records into two domains -- one for records tracking shared extents, and
> > a second for tracking CoW staging extents.  The idea here was to
> > minimize mount time CoW reclamation by pushing all the CoW records to
> > the right edge of the keyspace, and it was accomplished by setting the
> > upper bit in rc_startblock.  We don't allow AGs to have more than 2^31
> > blocks, so the bit was free.
> > 
> > Unfortunately, this was a very late addition to the codebase, so most of
> > the refcount record processing code still treats rc_startblock as a u32
> > and pays no attention to whether or not the upper bit (the cow flag) is
> > set.  This is a weakness is theoretically exploitable, since we're not
> > fully validating the incoming metadata records.
> > 
> > Fuzzing demonstrates practical exploits of this weakness.  If the cow
> > flag of a node block key record is corrupted, a lookup operation can go
> > to the wrong record block and start returning records from the wrong
> > cow/shared domain.  This causes the math to go all wrong (since cow
> > domain is still implicit in the upper bit of rc_startblock) and we can
> > crash the kernel by tricking xfs into jumping into a nonexistent AG and
> > tripping over xfs_perag_get(mp, <nonexistent AG>) returning NULL.
> > 
> > To fix this, start tracking the domain as an explicit part of struct
> > xfs_refcount_irec, adjust all refcount functions to check the domain
> > of a returned record, and alter the function definitions to accept them
> > where necessary.
> > 
> > Found by fuzzing keys[2].cowflag = add in xfs/464.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/libxfs/xfs_refcount.c       |  221 ++++++++++++++++++++++++------------
> >  fs/xfs/libxfs/xfs_refcount.h       |    9 +
> >  fs/xfs/libxfs/xfs_refcount_btree.c |   26 ++++
> >  fs/xfs/libxfs/xfs_types.h          |   10 ++
> >  fs/xfs/scrub/refcount.c            |   22 ++--
> >  fs/xfs/xfs_trace.h                 |   48 ++++++--
> >  6 files changed, 235 insertions(+), 101 deletions(-)
> 
> My first thought was that this is complex enough that it needs to be
> split up. Reading the very first hunk:
> 
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
> > index 64b910caafaa..fa75e785652b 100644
> > --- a/fs/xfs/libxfs/xfs_refcount.c
> > +++ b/fs/xfs/libxfs/xfs_refcount.c
> > @@ -46,13 +46,16 @@ STATIC int __xfs_refcount_cow_free(struct xfs_btree_cur *rcur,
> >  int
> >  xfs_refcount_lookup_le(
> >  	struct xfs_btree_cur	*cur,
> > +	enum xfs_rcext_domain	domain,
> >  	xfs_agblock_t		bno,
> >  	int			*stat)
> >  {
> > -	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno, bno,
> > +	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno,
> > +			bno | (domain == XFS_RCDOM_COW ? XFS_REFC_COW_START : 0),
> >  			XFS_LOOKUP_LE);
> >  	cur->bc_rec.rc.rc_startblock = bno;
> >  	cur->bc_rec.rc.rc_blockcount = 0;
> > +	cur->bc_rec.rc.rc_domain = domain;
> >  	return xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
> 
> I found that I had to go looking through the patch to find what the
> definitions of xfs_rcext_domain, XFS_RCDOM_COW and rc_domain
> actually were, because without knowing what the actual new
> structures being introduced actually were this didn't make a whole
> lot of sense.

<nod> Sorry about that.

> Hence I think I'd like this to be broken up into a patch that
> introduces the rc_domain and the helpers that build/split the
> domain/startblock information in the irec, and a followup patch that
> modifies the implementation to use the new domain interfaces to make
> it a bit easier to see where the significant changes in the code
> actually are.

Ok, I've done that.  The mechanical change itself is still pretty large,
but I think it's a bit more straightforward.  The tracepoint updtes and
the change to _get_rec callers to compare the desired domain against the
one returned are each separate patches.

> I also suspect that trace_xfs_refcount_lookup() should be passed the
> domain directly rather than encoding it at every call site of the
> tracepoint, or it code encoded in a helper such as
> xfs_refcount_domain_to_block()...

Done.

> For consistency, calling the enums XFS_REFC_DOMAIN_{COW/SHARED}
> would be more consistent with the naming used in other refcount
> btree constants.

Done.

> >  }
> >  
> > @@ -63,13 +66,16 @@ xfs_refcount_lookup_le(
> >  int
> >  xfs_refcount_lookup_ge(
> >  	struct xfs_btree_cur	*cur,
> > +	enum xfs_rcext_domain	domain,
> >  	xfs_agblock_t		bno,
> >  	int			*stat)
> >  {
> > -	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno, bno,
> > +	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno,
> > +			bno | (domain == XFS_RCDOM_COW ? XFS_REFC_COW_START : 0),
> >  			XFS_LOOKUP_GE);
> >  	cur->bc_rec.rc.rc_startblock = bno;
> >  	cur->bc_rec.rc.rc_blockcount = 0;
> > +	cur->bc_rec.rc.rc_domain = domain;
> >  	return xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat);
> >  }
> >  
> > @@ -80,13 +86,16 @@ xfs_refcount_lookup_ge(
> >  int
> >  xfs_refcount_lookup_eq(
> >  	struct xfs_btree_cur	*cur,
> > +	enum xfs_rcext_domain	domain,
> >  	xfs_agblock_t		bno,
> >  	int			*stat)
> >  {
> > -	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno, bno,
> > +	trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno,
> > +			bno | (domain == XFS_RCDOM_COW ? XFS_REFC_COW_START : 0),
> >  			XFS_LOOKUP_LE);
> >  	cur->bc_rec.rc.rc_startblock = bno;
> >  	cur->bc_rec.rc.rc_blockcount = 0;
> > +	cur->bc_rec.rc.rc_domain = domain;
> >  	return xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat);
> >  }
> >  
> > @@ -96,7 +105,17 @@ xfs_refcount_btrec_to_irec(
> >  	const union xfs_btree_rec	*rec,
> >  	struct xfs_refcount_irec	*irec)
> >  {
> > -	irec->rc_startblock = be32_to_cpu(rec->refc.rc_startblock);
> > +	__u32				start;
> > +
> > +	start = be32_to_cpu(rec->refc.rc_startblock);
> > +	if (start & XFS_REFC_COW_START) {
> > +		start &= ~XFS_REFC_COW_START;
> > +		irec->rc_domain = XFS_RCDOM_COW;
> > +	} else {
> > +		irec->rc_domain = XFS_RCDOM_SHARED;
> > +	}
> 
> xfs_refcount_block_to_domain()?

This is the only place where this happens, so I've left it this way.

> 
> 
> > +
> > +	irec->rc_startblock = start;
> >  	irec->rc_blockcount = be32_to_cpu(rec->refc.rc_blockcount);
> >  	irec->rc_refcount = be32_to_cpu(rec->refc.rc_refcount);
> >  }
> > @@ -114,7 +133,6 @@ xfs_refcount_get_rec(
> >  	struct xfs_perag		*pag = cur->bc_ag.pag;
> >  	union xfs_btree_rec		*rec;
> >  	int				error;
> > -	xfs_agblock_t			realstart;
> >  
> >  	error = xfs_btree_get_rec(cur, &rec, stat);
> >  	if (error || !*stat)
> > @@ -124,22 +142,18 @@ xfs_refcount_get_rec(
> >  	if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN)
> >  		goto out_bad_rec;
> >  
> > -	/* handle special COW-staging state */
> > -	realstart = irec->rc_startblock;
> > -	if (realstart & XFS_REFC_COW_START) {
> > -		if (irec->rc_refcount != 1)
> > -			goto out_bad_rec;
> > -		realstart &= ~XFS_REFC_COW_START;
> > -	} else if (irec->rc_refcount < 2) {
> > +	/* handle special COW-staging domain */
> > +	if (irec->rc_domain == XFS_RCDOM_COW && irec->rc_refcount != 1)
> > +		goto out_bad_rec;
> > +	if (irec->rc_domain == XFS_RCDOM_SHARED && irec->rc_refcount < 2)
> >  		goto out_bad_rec;
> > -	}
> >  
> >  	/* check for valid extent range, including overflow */
> > -	if (!xfs_verify_agbno(pag, realstart))
> > +	if (!xfs_verify_agbno(pag, irec->rc_startblock))
> >  		goto out_bad_rec;
> > -	if (realstart > realstart + irec->rc_blockcount)
> > +	if (irec->rc_startblock > irec->rc_startblock + irec->rc_blockcount)
> >  		goto out_bad_rec;
> > -	if (!xfs_verify_agbno(pag, realstart + irec->rc_blockcount - 1))
> > +	if (!xfs_verify_agbno(pag, irec->rc_startblock + irec->rc_blockcount - 1))
> >  		goto out_bad_rec;
> >  
> >  	if (irec->rc_refcount == 0 || irec->rc_refcount > MAXREFCOUNT)
> > @@ -169,12 +183,19 @@ xfs_refcount_update(
> >  	struct xfs_refcount_irec	*irec)
> >  {
> >  	union xfs_btree_rec	rec;
> > +	__u32			start;
> >  	int			error;
> >  
> >  	trace_xfs_refcount_update(cur->bc_mp, cur->bc_ag.pag->pag_agno, irec);
> > -	rec.refc.rc_startblock = cpu_to_be32(irec->rc_startblock);
> > +
> > +	start = irec->rc_startblock & ~XFS_REFC_COW_START;
> > +	if (irec->rc_domain == XFS_RCDOM_COW)
> > +		start |= XFS_REFC_COW_START;
> 
> Yeah, this definitely looks like we want a
> xfs_refcount_domain_to_block() helper...

Yep.

> > +
> > +	rec.refc.rc_startblock = cpu_to_be32(start);
> >  	rec.refc.rc_blockcount = cpu_to_be32(irec->rc_blockcount);
> >  	rec.refc.rc_refcount = cpu_to_be32(irec->rc_refcount);
> > +
> >  	error = xfs_btree_update(cur, &rec);
> >  	if (error)
> >  		trace_xfs_refcount_update_error(cur->bc_mp,
> > @@ -196,9 +217,12 @@ xfs_refcount_insert(
> >  	int				error;
> >  
> >  	trace_xfs_refcount_insert(cur->bc_mp, cur->bc_ag.pag->pag_agno, irec);
> > +
> >  	cur->bc_rec.rc.rc_startblock = irec->rc_startblock;
> >  	cur->bc_rec.rc.rc_blockcount = irec->rc_blockcount;
> >  	cur->bc_rec.rc.rc_refcount = irec->rc_refcount;
> > +	cur->bc_rec.rc.rc_domain = irec->rc_domain;
> 
> Does a struct copy make more sense here now?

Perhaps, but not in a bugfix patchset.

> [.....]
> 
> > @@ -600,8 +636,6 @@ xfs_refcount_merge_right_extent(
> >  	return error;
> >  }
> >  
> > -#define XFS_FIND_RCEXT_SHARED	1
> > -#define XFS_FIND_RCEXT_COW	2
> >  /*
> >   * Find the left extent and the one after it (cleft).  This function assumes
> >   * that we've already split any extent crossing agbno.
> > @@ -611,16 +645,16 @@ xfs_refcount_find_left_extents(
> >  	struct xfs_btree_cur		*cur,
> >  	struct xfs_refcount_irec	*left,
> >  	struct xfs_refcount_irec	*cleft,
> > +	enum xfs_rcext_domain		domain,
> >  	xfs_agblock_t			agbno,
> > -	xfs_extlen_t			aglen,
> > -	int				flags)
> > +	xfs_extlen_t			aglen)
> >  {
> >  	struct xfs_refcount_irec	tmp;
> >  	int				error;
> >  	int				found_rec;
> >  
> >  	left->rc_startblock = cleft->rc_startblock = NULLAGBLOCK;
> > -	error = xfs_refcount_lookup_le(cur, agbno - 1, &found_rec);
> > +	error = xfs_refcount_lookup_le(cur, domain, agbno - 1, &found_rec);
> >  	if (error)
> >  		goto out_error;
> >  	if (!found_rec)
> > @@ -634,11 +668,13 @@ xfs_refcount_find_left_extents(
> >  		goto out_error;
> >  	}
> >  
> > +	if (tmp.rc_domain != domain)
> > +		return 0;
> >  	if (xfs_refc_next(&tmp) != agbno)
> >  		return 0;
> > -	if ((flags & XFS_FIND_RCEXT_SHARED) && tmp.rc_refcount < 2)
> > +	if (domain == XFS_RCDOM_SHARED && tmp.rc_refcount < 2)
> >  		return 0;
> > -	if ((flags & XFS_FIND_RCEXT_COW) && tmp.rc_refcount > 1)
> > +	if (domain == XFS_RCDOM_COW && tmp.rc_refcount > 1)
> >  		return 0;
> 
> Hmmm - this pattern is repeated in a couple of places. Perhaps a
> helper is in order?

Ok, I'll clean this up.

Actually -- upon second examination, these two if tests aren't needed
anymore, because they were a rudimentary means of ignoring records from
the wrong domain.  We added an explicit domain check above, so this can
go away entirely.

_get_rec will return -EFSCORRUPTED if the domain/refcount are
inconsistent, so we're protected there.

> [....]
> 
> > diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
> > index 316c1ec0c3c2..b0818063aa20 100644
> > --- a/fs/xfs/libxfs/xfs_refcount_btree.c
> > +++ b/fs/xfs/libxfs/xfs_refcount_btree.c
> > @@ -155,12 +155,31 @@ xfs_refcountbt_init_high_key_from_rec(
> >  	key->refc.rc_startblock = cpu_to_be32(x);
> >  }
> >  
> > +static inline __u32
> > +xfs_refcountbt_encode_startblock(
> > +	struct xfs_btree_cur	*cur)
> > +{
> > +	__u32			start;
> > +
> > +	/*
> > +	 * low level btree operations need to handle the generic btree range
> > +	 * query functions (which set rc_domain == -1U), so we check that the
> > +	 * domain is /not/ shared.
> > +	 */
> > +	start = cur->bc_rec.rc.rc_startblock & ~XFS_REFC_COW_START;
> > +	if (cur->bc_rec.rc.rc_domain != XFS_RCDOM_SHARED)
> > +		start |= XFS_REFC_COW_START;
> > +	return start;
> > +}
> 
> Oh, there is a xfs_refcount_domain_to_block() helper already - it
> just needs to be passed the agbno + domain, not a btree cursor :)

Done.

> [....]
> 
> > diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c
> > index 8ab55e791ea8..9cf4be9cbb89 100644
> > --- a/fs/xfs/scrub/refcount.c
> > +++ b/fs/xfs/scrub/refcount.c
> > @@ -334,20 +334,19 @@ xchk_refcountbt_rec(
> >  	struct xfs_refcount_irec irec;
> >  	xfs_agblock_t		*cow_blocks = bs->private;
> >  	struct xfs_perag	*pag = bs->cur->bc_ag.pag;
> > -	bool			has_cowflag;
> >  
> >  	xfs_refcount_btrec_to_irec(rec, &irec);
> >  
> >  	/* Only CoW records can have refcount == 1. */
> > -	has_cowflag = (irec.rc_startblock & XFS_REFC_COW_START);
> > -	if ((irec.rc_refcount == 1 && !has_cowflag) ||
> > -	    (irec.rc_refcount != 1 && has_cowflag))
> > +	if (irec.rc_domain == XFS_RCDOM_SHARED && irec.rc_refcount == 1)
> >  		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> > -	if (has_cowflag)
> > +	if (irec.rc_domain == XFS_RCDOM_COW) {
> > +		if (irec.rc_refcount != 1)
> > +			xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
> >  		(*cow_blocks) += irec.rc_blockcount;
> > +	}
> 
> Hmmm, that looks like another of those shared/cow domain + refcount
> checks like I pointed out above...

Yep.  Separate cleanup patch.

> >  
> >  	/* Check the extent. */
> > -	irec.rc_startblock &= ~XFS_REFC_COW_START;
> >  	if (irec.rc_startblock + irec.rc_blockcount <= irec.rc_startblock ||
> >  	    !xfs_verify_agbno(pag, irec.rc_startblock) ||
> >  	    !xfs_verify_agbno(pag, irec.rc_startblock + irec.rc_blockcount - 1))
> > @@ -420,7 +419,6 @@ xchk_xref_is_cow_staging(
> >  	xfs_extlen_t			len)
> >  {
> >  	struct xfs_refcount_irec	rc;
> > -	bool				has_cowflag;
> >  	int				has_refcount;
> >  	int				error;
> >  
> > @@ -428,8 +426,8 @@ xchk_xref_is_cow_staging(
> >  		return;
> >  
> >  	/* Find the CoW staging extent. */
> > -	error = xfs_refcount_lookup_le(sc->sa.refc_cur,
> > -			agbno + XFS_REFC_COW_START, &has_refcount);
> > +	error = xfs_refcount_lookup_le(sc->sa.refc_cur, XFS_RCDOM_COW, agbno,
> > +			&has_refcount);
> >  	if (!xchk_should_check_xref(sc, &error, &sc->sa.refc_cur))
> >  		return;
> >  	if (!has_refcount) {
> > @@ -446,8 +444,7 @@ xchk_xref_is_cow_staging(
> >  	}
> >  
> >  	/* CoW flag must be set, refcount must be 1. */
> > -	has_cowflag = (rc.rc_startblock & XFS_REFC_COW_START);
> > -	if (!has_cowflag || rc.rc_refcount != 1)
> > +	if (rc.rc_domain != XFS_RCDOM_COW || rc.rc_refcount != 1)
> >  		xchk_btree_xref_set_corrupt(sc, sc->sa.refc_cur, 0);
> 
> That looks like the same validation logic check as the above hunk
> in xchk_refcountbt_rec()...

...but _get_rec will check this for us, so this check collapses to:

	if (rc.rc_domain != XFS_REFC_DOMAIN_COW)
		xchk_btree_xref_set_corrupt(...);

> That's just a first pass - I haven't really concentrated on
> correctness that much yet, the patch is really doing too much for me
> to get a good picture of everything...

<nod> I'll have a new patchset with changes broken up into smaller
pieces tomorrow.

--D

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-10-26 22:06 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-24 21:33 [PATCHSET 0/5] xfs: improve runtime refcountbt corruption detection Darrick J. Wong
2022-10-24 21:33 ` [PATCH 1/5] xfs: move _irec structs to xfs_types.h Darrick J. Wong
2022-10-25 23:56   ` Dave Chinner
2022-10-24 21:33 ` [PATCH 2/5] xfs: refactor refcount record usage in xchk_refcountbt_rec Darrick J. Wong
2022-10-25 23:59   ` Dave Chinner
2022-10-24 21:33 ` [PATCH 3/5] xfs: track cow/shared record domains explicitly in xfs_refcount_irec Darrick J. Wong
2022-10-26  0:40   ` Dave Chinner
2022-10-26 22:06     ` Darrick J. Wong
2022-10-24 21:33 ` [PATCH 4/5] xfs: rename XFS_REFC_COW_START to _COWFLAG Darrick J. Wong
2022-10-26  0:41   ` Dave Chinner
2022-10-24 21:33 ` [PATCH 5/5] xfs: check deferred refcount op continuation parameters Darrick J. Wong
2022-10-26  0:46   ` Dave Chinner
2022-10-26  1:28     ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.