linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET v23.1 0/2] xfs: enhance btree key checking
@ 2022-10-02 18:20 Darrick J. Wong
  2022-10-02 18:20 ` [PATCH 2/2] xfs: check btree keys reflect the child block Darrick J. Wong
  2022-10-02 18:20 ` [PATCH 1/2] xfs: fix rmap key comparison functions Darrick J. Wong
  0 siblings, 2 replies; 6+ messages in thread
From: Darrick J. Wong @ 2022-10-02 18:20 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

Hi all,

This series addresses some deficiencies in btree key comparisons that I
noticed while testing online scrub.  The first fixes a bug in the rmap
key comparison that prevents the rmap scrubber from detecting rmap
records for file space allocations with mismatched attr/bmbt flags.
The second patch fixes the scrub btree block checker to ensure that the
keys in the parent block accurately represent the block.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=btree-key-enhancements

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=btree-key-enhancements
---
 fs/xfs/libxfs/xfs_rmap_btree.c |   25 ++++++++++++++------
 fs/xfs/scrub/btree.c           |   49 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 65 insertions(+), 9 deletions(-)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] xfs: fix rmap key comparison functions
  2022-10-02 18:20 [PATCHSET v23.1 0/2] xfs: enhance btree key checking Darrick J. Wong
  2022-10-02 18:20 ` [PATCH 2/2] xfs: check btree keys reflect the child block Darrick J. Wong
@ 2022-10-02 18:20 ` Darrick J. Wong
  2022-11-01 23:40   ` Dave Chinner
  1 sibling, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2022-10-02 18:20 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Keys for extent interval records in the reverse mapping btree are
supposed to be computed as follows:

(physical block, owner, fork, is_btree, offset)

This provides users the ability to look up a reverse mapping from a file
block mapping record -- start with the physical block; then if there are
multiple records for the same block, move on to the owner; then the
inode fork type; and so on to the file offset.

However, the key comparison functions incorrectly remove the fork/bmbt
information that's encoded in the on-disk offset.  This means that
lookup comparisons are only done with:

(physical block, owner, offset)

This means that queries can return incorrect results.  On consistent
filesystems this isn't an issue because bmbt blocks and blocks mapped to
an attr fork cannot be shared, but this prevents us from detecting
incorrect fork and bmbt flag bits in the rmap btree.

A previous version of this patch forgot to keep the (un)written state
flag masked during the comparison and caused a major regression in
5.9.x since unwritten extent conversion can update an rmap record
without requiring key updates.

Note that blocks cannot go directly from data fork to attr fork without
being deallocated and reallocated, nor can they be added to or removed
from a bmbt without a free/alloc cycle, so this should not cause any
regressions.

Found by fuzzing keys[1].attrfork = ones on xfs/371.

Fixes: 4b8ed67794fe ("xfs: add rmap btree operations")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_rmap_btree.c |   25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
index 7f83f62e51e0..e2e1f68cedf5 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -219,6 +219,15 @@ xfs_rmapbt_init_ptr_from_cur(
 	ptr->s = agf->agf_roots[cur->bc_btnum];
 }
 
+/*
+ * Fork and bmbt are significant parts of the rmap record key, but written
+ * status is merely a record attribute.
+ */
+static inline uint64_t offset_keymask(uint64_t offset)
+{
+	return offset & ~XFS_RMAP_OFF_UNWRITTEN;
+}
+
 STATIC int64_t
 xfs_rmapbt_key_diff(
 	struct xfs_btree_cur		*cur,
@@ -240,8 +249,8 @@ xfs_rmapbt_key_diff(
 	else if (y > x)
 		return -1;
 
-	x = XFS_RMAP_OFF(be64_to_cpu(kp->rm_offset));
-	y = rec->rm_offset;
+	x = offset_keymask(be64_to_cpu(kp->rm_offset));
+	y = offset_keymask(xfs_rmap_irec_offset_pack(rec));
 	if (x > y)
 		return 1;
 	else if (y > x)
@@ -272,8 +281,8 @@ xfs_rmapbt_diff_two_keys(
 	else if (y > x)
 		return -1;
 
-	x = XFS_RMAP_OFF(be64_to_cpu(kp1->rm_offset));
-	y = XFS_RMAP_OFF(be64_to_cpu(kp2->rm_offset));
+	x = offset_keymask(be64_to_cpu(kp1->rm_offset));
+	y = offset_keymask(be64_to_cpu(kp2->rm_offset));
 	if (x > y)
 		return 1;
 	else if (y > x)
@@ -387,8 +396,8 @@ xfs_rmapbt_keys_inorder(
 		return 1;
 	else if (a > b)
 		return 0;
-	a = XFS_RMAP_OFF(be64_to_cpu(k1->rmap.rm_offset));
-	b = XFS_RMAP_OFF(be64_to_cpu(k2->rmap.rm_offset));
+	a = offset_keymask(be64_to_cpu(k1->rmap.rm_offset));
+	b = offset_keymask(be64_to_cpu(k2->rmap.rm_offset));
 	if (a <= b)
 		return 1;
 	return 0;
@@ -417,8 +426,8 @@ xfs_rmapbt_recs_inorder(
 		return 1;
 	else if (a > b)
 		return 0;
-	a = XFS_RMAP_OFF(be64_to_cpu(r1->rmap.rm_offset));
-	b = XFS_RMAP_OFF(be64_to_cpu(r2->rmap.rm_offset));
+	a = offset_keymask(be64_to_cpu(r1->rmap.rm_offset));
+	b = offset_keymask(be64_to_cpu(r2->rmap.rm_offset));
 	if (a <= b)
 		return 1;
 	return 0;


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] xfs: check btree keys reflect the child block
  2022-10-02 18:20 [PATCHSET v23.1 0/2] xfs: enhance btree key checking Darrick J. Wong
@ 2022-10-02 18:20 ` Darrick J. Wong
  2022-11-01 23:41   ` Dave Chinner
  2022-10-02 18:20 ` [PATCH 1/2] xfs: fix rmap key comparison functions Darrick J. Wong
  1 sibling, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2022-10-02 18:20 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

When scrub is checking a non-root btree block, it should make sure that
the keys in the parent btree block accurately capture the keyspace that
the child block stores.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/btree.c |   49 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 48 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/scrub/btree.c b/fs/xfs/scrub/btree.c
index 47fa14311d17..2e2aa0773858 100644
--- a/fs/xfs/scrub/btree.c
+++ b/fs/xfs/scrub/btree.c
@@ -517,6 +517,48 @@ xchk_btree_check_minrecs(
 		xchk_btree_set_corrupt(bs->sc, cur, level);
 }
 
+/*
+ * If this btree block has a parent, make sure that the parent's keys capture
+ * the keyspace contained in this block.
+ */
+STATIC void
+xchk_btree_block_check_keys(
+	struct xchk_btree	*bs,
+	int			level,
+	struct xfs_btree_block	*block)
+{
+	union xfs_btree_key	block_key;
+	union xfs_btree_key	*block_high_key;
+	union xfs_btree_key	*parent_low_key, *parent_high_key;
+	struct xfs_btree_cur	*cur = bs->cur;
+	struct xfs_btree_block	*parent_block;
+	struct xfs_buf		*bp;
+
+	if (level == cur->bc_nlevels - 1)
+		return;
+
+	xfs_btree_get_keys(cur, block, &block_key);
+
+	/* Make sure the low key of this block matches the parent. */
+	parent_block = xfs_btree_get_block(cur, level + 1, &bp);
+	parent_low_key = xfs_btree_key_addr(cur, cur->bc_levels[level + 1].ptr,
+			parent_block);
+	if (cur->bc_ops->diff_two_keys(cur, &block_key, parent_low_key)) {
+		xchk_btree_set_corrupt(bs->sc, bs->cur, level);
+		return;
+	}
+
+	if (!(cur->bc_flags & XFS_BTREE_OVERLAPPING))
+		return;
+
+	/* Make sure the high key of this block matches the parent. */
+	parent_high_key = xfs_btree_high_key_addr(cur,
+			cur->bc_levels[level + 1].ptr, parent_block);
+	block_high_key = xfs_btree_high_key_from_key(cur, &block_key);
+	if (cur->bc_ops->diff_two_keys(cur, block_high_key, parent_high_key))
+		xchk_btree_set_corrupt(bs->sc, bs->cur, level);
+}
+
 /*
  * Grab and scrub a btree block given a btree pointer.  Returns block
  * and buffer pointers (if applicable) if they're ok to use.
@@ -568,7 +610,12 @@ xchk_btree_get_block(
 	 * Check the block's siblings; this function absorbs error codes
 	 * for us.
 	 */
-	return xchk_btree_block_check_siblings(bs, *pblock);
+	error = xchk_btree_block_check_siblings(bs, *pblock);
+	if (error)
+		return error;
+
+	xchk_btree_block_check_keys(bs, level, *pblock);
+	return 0;
 }
 
 /*


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] xfs: fix rmap key comparison functions
  2022-10-02 18:20 ` [PATCH 1/2] xfs: fix rmap key comparison functions Darrick J. Wong
@ 2022-11-01 23:40   ` Dave Chinner
  2022-11-02 23:53     ` Darrick J. Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2022-11-01 23:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sun, Oct 02, 2022 at 11:20:12AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Keys for extent interval records in the reverse mapping btree are
> supposed to be computed as follows:
> 
> (physical block, owner, fork, is_btree, offset)
> 
> This provides users the ability to look up a reverse mapping from a file
> block mapping record -- start with the physical block; then if there are
> multiple records for the same block, move on to the owner; then the
> inode fork type; and so on to the file offset.
> 
> However, the key comparison functions incorrectly remove the fork/bmbt
> information that's encoded in the on-disk offset.  This means that
> lookup comparisons are only done with:
> 
> (physical block, owner, offset)
> 
> This means that queries can return incorrect results.  On consistent
> filesystems this isn't an issue because bmbt blocks and blocks mapped to
> an attr fork cannot be shared, but this prevents us from detecting
> incorrect fork and bmbt flag bits in the rmap btree.
> 
> A previous version of this patch forgot to keep the (un)written state
> flag masked during the comparison and caused a major regression in
> 5.9.x since unwritten extent conversion can update an rmap record
> without requiring key updates.
> 
> Note that blocks cannot go directly from data fork to attr fork without
> being deallocated and reallocated, nor can they be added to or removed
> from a bmbt without a free/alloc cycle, so this should not cause any
> regressions.
> 
> Found by fuzzing keys[1].attrfork = ones on xfs/371.
> 
> Fixes: 4b8ed67794fe ("xfs: add rmap btree operations")
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_rmap_btree.c |   25 +++++++++++++++++--------
>  1 file changed, 17 insertions(+), 8 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
> index 7f83f62e51e0..e2e1f68cedf5 100644
> --- a/fs/xfs/libxfs/xfs_rmap_btree.c
> +++ b/fs/xfs/libxfs/xfs_rmap_btree.c
> @@ -219,6 +219,15 @@ xfs_rmapbt_init_ptr_from_cur(
>  	ptr->s = agf->agf_roots[cur->bc_btnum];
>  }
>  
> +/*
> + * Fork and bmbt are significant parts of the rmap record key, but written
> + * status is merely a record attribute.
> + */
> +static inline uint64_t offset_keymask(uint64_t offset)
> +{
> +	return offset & ~XFS_RMAP_OFF_UNWRITTEN;
> +}

Ok. but doesn't that mean xfs_rmapbt_init_key_from_rec() and
xfs_rmapbt_init_high_key_from_rec() should be masking out the
XFS_RMAP_OFF_UNWRITTEN bit as well?

-Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] xfs: check btree keys reflect the child block
  2022-10-02 18:20 ` [PATCH 2/2] xfs: check btree keys reflect the child block Darrick J. Wong
@ 2022-11-01 23:41   ` Dave Chinner
  0 siblings, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2022-11-01 23:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sun, Oct 02, 2022 at 11:20:12AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> When scrub is checking a non-root btree block, it should make sure that
> the keys in the parent btree block accurately capture the keyspace that
> the child block stores.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/scrub/btree.c |   49 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 48 insertions(+), 1 deletion(-)

Looks sensible.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] xfs: fix rmap key comparison functions
  2022-11-01 23:40   ` Dave Chinner
@ 2022-11-02 23:53     ` Darrick J. Wong
  0 siblings, 0 replies; 6+ messages in thread
From: Darrick J. Wong @ 2022-11-02 23:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Nov 02, 2022 at 10:40:22AM +1100, Dave Chinner wrote:
> On Sun, Oct 02, 2022 at 11:20:12AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Keys for extent interval records in the reverse mapping btree are
> > supposed to be computed as follows:
> > 
> > (physical block, owner, fork, is_btree, offset)
> > 
> > This provides users the ability to look up a reverse mapping from a file
> > block mapping record -- start with the physical block; then if there are
> > multiple records for the same block, move on to the owner; then the
> > inode fork type; and so on to the file offset.
> > 
> > However, the key comparison functions incorrectly remove the fork/bmbt
> > information that's encoded in the on-disk offset.  This means that
> > lookup comparisons are only done with:
> > 
> > (physical block, owner, offset)
> > 
> > This means that queries can return incorrect results.  On consistent
> > filesystems this isn't an issue because bmbt blocks and blocks mapped to
> > an attr fork cannot be shared, but this prevents us from detecting
> > incorrect fork and bmbt flag bits in the rmap btree.
> > 
> > A previous version of this patch forgot to keep the (un)written state
> > flag masked during the comparison and caused a major regression in
> > 5.9.x since unwritten extent conversion can update an rmap record
> > without requiring key updates.
> > 
> > Note that blocks cannot go directly from data fork to attr fork without
> > being deallocated and reallocated, nor can they be added to or removed
> > from a bmbt without a free/alloc cycle, so this should not cause any
> > regressions.
> > 
> > Found by fuzzing keys[1].attrfork = ones on xfs/371.
> > 
> > Fixes: 4b8ed67794fe ("xfs: add rmap btree operations")
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/libxfs/xfs_rmap_btree.c |   25 +++++++++++++++++--------
> >  1 file changed, 17 insertions(+), 8 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
> > index 7f83f62e51e0..e2e1f68cedf5 100644
> > --- a/fs/xfs/libxfs/xfs_rmap_btree.c
> > +++ b/fs/xfs/libxfs/xfs_rmap_btree.c
> > @@ -219,6 +219,15 @@ xfs_rmapbt_init_ptr_from_cur(
> >  	ptr->s = agf->agf_roots[cur->bc_btnum];
> >  }
> >  
> > +/*
> > + * Fork and bmbt are significant parts of the rmap record key, but written
> > + * status is merely a record attribute.
> > + */
> > +static inline uint64_t offset_keymask(uint64_t offset)
> > +{
> > +	return offset & ~XFS_RMAP_OFF_UNWRITTEN;
> > +}
> 
> Ok. but doesn't that mean xfs_rmapbt_init_key_from_rec() and
> xfs_rmapbt_init_high_key_from_rec() should be masking out the
> XFS_RMAP_OFF_UNWRITTEN bit as well?

It ought to, but it might be too late for that because
_init_*key_from_rec have been letting the unwritten bit slip into the
rmap key structures since 4.8.  Somewhere out there is a filesystem with
rmapbt node blocks containing struct xfs_rmap_key's with that unwritten
bit set.  The best we can do is ignore it in the key comparison
function.

Let me think about this overnight though, because once we stop paying
attention to the unwritten bit for key comparisons, it might not matter
what's in the ondisk node blocks.

--D

> -Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-11-02 23:53 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-02 18:20 [PATCHSET v23.1 0/2] xfs: enhance btree key checking Darrick J. Wong
2022-10-02 18:20 ` [PATCH 2/2] xfs: check btree keys reflect the child block Darrick J. Wong
2022-11-01 23:41   ` Dave Chinner
2022-10-02 18:20 ` [PATCH 1/2] xfs: fix rmap key comparison functions Darrick J. Wong
2022-11-01 23:40   ` Dave Chinner
2022-11-02 23:53     ` Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).