[PATCH v3 0/5] xfs stable patches for 5.10.y (backports from v5.15.y)

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 0/5] xfs stable patches for 5.10.y (backports from v5.15.y)
@ 2022-06-27  6:46 Amir Goldstein
  2022-06-27  6:46 ` [PATCH v3 1/5] xfs: use kmem_cache_free() for kmem_cache objects Amir Goldstein
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Amir Goldstein @ 2022-06-27  6:46 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Sasha Levin, Darrick J . Wong, Leah Rumancik, Chandan Babu R,
	Luis Chamberlain, Adam Manzanares, linux-xfs, stable

Hi Greg,

We started to experiment with the "xfs stable maintainers" concept that
Darrick has proposed.

My focus is on 5.10.y backports,
Leah's focus is on 5.15.y backports and
Chandan may pick up the 5.4.y backports.

This stable update is the first collaborated series for 5.10.y/5.15.y.

It started with a 5.15.y patch series from Leah [1] of patches that
specifically fix bugs prior to 5.10, to whom I already had tested
5.10 backport patches for.

Leah will be sending the ACKed 5.15.y series [2].
This 5.10.y series is a subset of the 5.15.y series that was
ACKed by xfs developers for 5.10.y.

All the patches in the 5.15.y series were backported to 5.10.
The ones that are not included in this 5.10.y update were more subtle
to backport, so the backports need more time for review and I will send
them in one of the following stable updates.

I would like to thank Darrick for reviewing the backport candidates.
I would like to thank Luis for his ongoing support of the kdevops [3]
test environment and Samsung for contributing the hardware to drive it.

Thanks,
Amir.

Changes since [v2]:
- CC stable

Changes since [v1]:
- Leave 5 out of 11
- Accked by Darrick

[1] https://lore.kernel.org/linux-xfs/20220616182749.1200971-1-leah.rumancik@gmail.com/
[2] https://lore.kernel.org/linux-xfs/20220623203641.1710377-1-leah.rumancik@gmail.com/
[3] https://github.com/linux-kdevops/kdevops
[v1] https://lore.kernel.org/linux-xfs/20220617100641.1653164-1-amir73il@gmail.com/
[v2] https://lore.kernel.org/linux-xfs/20220624063702.2380990-1-amir73il@gmail.com/

Brian Foster (1):
  xfs: punch out data fork delalloc blocks on COW writeback failure

Darrick J. Wong (1):
  xfs: remove all COW fork extents when remounting readonly

Dave Chinner (1):
  xfs: check sb_meta_uuid for dabuf buffer recovery

Rustam Kovhaev (1):
  xfs: use kmem_cache_free() for kmem_cache objects

Yang Xu (1):
  xfs: Fix the free logic of state in xfs_attr_node_hasname

 fs/xfs/libxfs/xfs_attr.c      | 13 +++++--------
 fs/xfs/xfs_aops.c             | 15 ++++++++++++---
 fs/xfs/xfs_buf_item_recover.c |  2 +-
 fs/xfs/xfs_extfree_item.c     |  6 +++---
 fs/xfs/xfs_super.c            | 14 +++++++++++---
 5 files changed, 32 insertions(+), 18 deletions(-)

-- 
2.25.1

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v3 1/5] xfs: use kmem_cache_free() for kmem_cache objects
  2022-06-27  6:46 [PATCH v3 0/5] xfs stable patches for 5.10.y (backports from v5.15.y) Amir Goldstein
@ 2022-06-27  6:46 ` Amir Goldstein
  2022-06-27  6:47 ` [PATCH v3 2/5] xfs: punch out data fork delalloc blocks on COW writeback failure Amir Goldstein
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Amir Goldstein @ 2022-06-27  6:46 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Sasha Levin, Darrick J . Wong, Leah Rumancik, Chandan Babu R,
	Luis Chamberlain, Adam Manzanares, linux-xfs, stable,
	Rustam Kovhaev

From: Rustam Kovhaev <rkovhaev@gmail.com>

commit c30a0cbd07ecc0eec7b3cd568f7b1c7bb7913f93 upstream.

For kmalloc() allocations SLOB prepends the blocks with a 4-byte header,
and it puts the size of the allocated blocks in that header.
Blocks allocated with kmem_cache_alloc() allocations do not have that
header.

SLOB explodes when you allocate memory with kmem_cache_alloc() and then
try to free it with kfree() instead of kmem_cache_free().
SLOB will assume that there is a header when there is none, read some
garbage to size variable and corrupt the adjacent objects, which
eventually leads to hang or panic.

Let's make XFS work with SLOB by using proper free function.

Fixes: 9749fee83f38 ("xfs: enable the xfs_defer mechanism to process extents to free")
Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_extfree_item.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 5c0395256bd1..11474770d630 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -482,7 +482,7 @@ xfs_extent_free_finish_item(
 			free->xefi_startblock,
 			free->xefi_blockcount,
 			&free->xefi_oinfo, free->xefi_skip_discard);
-	kmem_free(free);
+	kmem_cache_free(xfs_bmap_free_item_zone, free);
 	return error;
 }
 
@@ -502,7 +502,7 @@ xfs_extent_free_cancel_item(
 	struct xfs_extent_free_item	*free;
 
 	free = container_of(item, struct xfs_extent_free_item, xefi_list);
-	kmem_free(free);
+	kmem_cache_free(xfs_bmap_free_item_zone, free);
 }
 
 const struct xfs_defer_op_type xfs_extent_free_defer_type = {
@@ -564,7 +564,7 @@ xfs_agfl_free_finish_item(
 	extp->ext_len = free->xefi_blockcount;
 	efdp->efd_next_extent++;
 
-	kmem_free(free);
+	kmem_cache_free(xfs_bmap_free_item_zone, free);
 	return error;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v3 2/5] xfs: punch out data fork delalloc blocks on COW writeback failure
  2022-06-27  6:46 [PATCH v3 0/5] xfs stable patches for 5.10.y (backports from v5.15.y) Amir Goldstein
  2022-06-27  6:46 ` [PATCH v3 1/5] xfs: use kmem_cache_free() for kmem_cache objects Amir Goldstein
@ 2022-06-27  6:47 ` Amir Goldstein
  2022-06-27  6:47 ` [PATCH v3 3/5] xfs: Fix the free logic of state in xfs_attr_node_hasname Amir Goldstein
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Amir Goldstein @ 2022-06-27  6:47 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Sasha Levin, Darrick J . Wong, Leah Rumancik, Chandan Babu R,
	Luis Chamberlain, Adam Manzanares, linux-xfs, stable,
	Brian Foster

From: Brian Foster <bfoster@redhat.com>

commit 5ca5916b6bc93577c360c06cb7cdf71adb9b5faf upstream.

If writeback I/O to a COW extent fails, the COW fork blocks are
punched out and the data fork blocks left alone. It is possible for
COW fork blocks to overlap non-shared data fork blocks (due to
cowextsz hint prealloc), however, and writeback unconditionally maps
to the COW fork whenever blocks exist at the corresponding offset of
the page undergoing writeback. This means it's quite possible for a
COW fork extent to overlap delalloc data fork blocks, writeback to
convert and map to the COW fork blocks, writeback to fail, and
finally for ioend completion to cancel the COW fork blocks and leave
stale data fork delalloc blocks around in the inode. The blocks are
effectively stale because writeback failure also discards dirty page
state.

If this occurs, it is likely to trigger assert failures, free space
accounting corruption and failures in unrelated file operations. For
example, a subsequent reflink attempt of the affected file to a new
target file will trip over the stale delalloc in the source file and
fail. Several of these issues are occasionally reproduced by
generic/648, but are reproducible on demand with the right sequence
of operations and timely I/O error injection.

To fix this problem, update the ioend failure path to also punch out
underlying data fork delalloc blocks on I/O error. This is analogous
to the writeback submission failure path in xfs_discard_page() where
we might fail to map data fork delalloc blocks and consistent with
the successful COW writeback completion path, which is responsible
for unmapping from the data fork and remapping in COW fork blocks.

Fixes: 787eb485509f ("xfs: fix and streamline error handling in xfs_end_io")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_aops.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 4304c6416fbb..4b76a32d2f16 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -145,6 +145,7 @@ xfs_end_ioend(
 	struct iomap_ioend	*ioend)
 {
 	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
+	struct xfs_mount	*mp = ip->i_mount;
 	xfs_off_t		offset = ioend->io_offset;
 	size_t			size = ioend->io_size;
 	unsigned int		nofs_flag;
@@ -160,18 +161,26 @@ xfs_end_ioend(
 	/*
 	 * Just clean up the in-memory strutures if the fs has been shut down.
 	 */
-	if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
+	if (XFS_FORCED_SHUTDOWN(mp)) {
 		error = -EIO;
 		goto done;
 	}

 	/*
-	 * Clean up any COW blocks on an I/O error.
+	 * Clean up all COW blocks and underlying data fork delalloc blocks on
+	 * I/O error. The delalloc punch is required because this ioend was
+	 * mapped to blocks in the COW fork and the associated pages are no
+	 * longer dirty. If we don't remove delalloc blocks here, they become
+	 * stale and can corrupt free space accounting on unmount.
 	 */
 	error = blk_status_to_errno(ioend->io_bio->bi_status);
 	if (unlikely(error)) {
-		if (ioend->io_flags & IOMAP_F_SHARED)
+		if (ioend->io_flags & IOMAP_F_SHARED) {
 			xfs_reflink_cancel_cow_range(ip, offset, size, true);
+			xfs_bmap_punch_delalloc_range(ip,
+						      XFS_B_TO_FSBT(mp, offset),
+						      XFS_B_TO_FSB(mp, size));
+		}
 		goto done;
 	}

-- 
2.25.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v3 3/5] xfs: Fix the free logic of state in xfs_attr_node_hasname
  2022-06-27  6:46 [PATCH v3 0/5] xfs stable patches for 5.10.y (backports from v5.15.y) Amir Goldstein
  2022-06-27  6:46 ` [PATCH v3 1/5] xfs: use kmem_cache_free() for kmem_cache objects Amir Goldstein
  2022-06-27  6:47 ` [PATCH v3 2/5] xfs: punch out data fork delalloc blocks on COW writeback failure Amir Goldstein
@ 2022-06-27  6:47 ` Amir Goldstein
  2022-06-27  6:47 ` [PATCH v3 4/5] xfs: remove all COW fork extents when remounting readonly Amir Goldstein
  2022-06-27  6:47 ` [PATCH v3 5/5] xfs: check sb_meta_uuid for dabuf buffer recovery Amir Goldstein
  4 siblings, 0 replies; 6+ messages in thread
From: Amir Goldstein @ 2022-06-27  6:47 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Sasha Levin, Darrick J . Wong, Leah Rumancik, Chandan Babu R,
	Luis Chamberlain, Adam Manzanares, linux-xfs, stable, Yang Xu

From: Yang Xu <xuyang2018.jy@fujitsu.com>

commit a1de97fe296c52eafc6590a3506f4bbd44ecb19a upstream.

When testing xfstests xfs/126 on lastest upstream kernel, it will hang on some machine.
Adding a getxattr operation after xattr corrupted, I can reproduce it 100%.

The deadlock as below:
[983.923403] task:setfattr        state:D stack:    0 pid:17639 ppid: 14687 flags:0x00000080
[  983.923405] Call Trace:
[  983.923410]  __schedule+0x2c4/0x700
[  983.923412]  schedule+0x37/0xa0
[  983.923414]  schedule_timeout+0x274/0x300
[  983.923416]  __down+0x9b/0xf0
[  983.923451]  ? xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs]
[  983.923453]  down+0x3b/0x50
[  983.923471]  xfs_buf_lock+0x33/0xf0 [xfs]
[  983.923490]  xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs]
[  983.923508]  xfs_buf_get_map+0x4c/0x320 [xfs]
[  983.923525]  xfs_buf_read_map+0x53/0x310 [xfs]
[  983.923541]  ? xfs_da_read_buf+0xcf/0x120 [xfs]
[  983.923560]  xfs_trans_read_buf_map+0x1cf/0x360 [xfs]
[  983.923575]  ? xfs_da_read_buf+0xcf/0x120 [xfs]
[  983.923590]  xfs_da_read_buf+0xcf/0x120 [xfs]
[  983.923606]  xfs_da3_node_read+0x1f/0x40 [xfs]
[  983.923621]  xfs_da3_node_lookup_int+0x69/0x4a0 [xfs]
[  983.923624]  ? kmem_cache_alloc+0x12e/0x270
[  983.923637]  xfs_attr_node_hasname+0x6e/0xa0 [xfs]
[  983.923651]  xfs_has_attr+0x6e/0xd0 [xfs]
[  983.923664]  xfs_attr_set+0x273/0x320 [xfs]
[  983.923683]  xfs_xattr_set+0x87/0xd0 [xfs]
[  983.923686]  __vfs_removexattr+0x4d/0x60
[  983.923688]  __vfs_removexattr_locked+0xac/0x130
[  983.923689]  vfs_removexattr+0x4e/0xf0
[  983.923690]  removexattr+0x4d/0x80
[  983.923693]  ? __check_object_size+0xa8/0x16b
[  983.923695]  ? strncpy_from_user+0x47/0x1a0
[  983.923696]  ? getname_flags+0x6a/0x1e0
[  983.923697]  ? _cond_resched+0x15/0x30
[  983.923699]  ? __sb_start_write+0x1e/0x70
[  983.923700]  ? mnt_want_write+0x28/0x50
[  983.923701]  path_removexattr+0x9b/0xb0
[  983.923702]  __x64_sys_removexattr+0x17/0x20
[  983.923704]  do_syscall_64+0x5b/0x1a0
[  983.923705]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  983.923707] RIP: 0033:0x7f080f10ee1b

When getxattr calls xfs_attr_node_get function, xfs_da3_node_lookup_int fails with EFSCORRUPTED in
xfs_attr_node_hasname because we have use blocktrash to random it in xfs/126. So it
free state in internal and xfs_attr_node_get doesn't do xfs_buf_trans release job.

Then subsequent removexattr will hang because of it.

This bug was introduced by kernel commit 07120f1abdff ("xfs: Add xfs_has_attr and subroutines").
It adds xfs_attr_node_hasname helper and said caller will be responsible for freeing the state
in this case. But xfs_attr_node_hasname will free state itself instead of caller if
xfs_da3_node_lookup_int fails.

Fix this bug by moving the step of free state into caller.

[amir: this text from original commit is not relevant for 5.10 backport:
Also, use "goto error/out" instead of returning error directly in xfs_attr_node_addname_find_attr and
xfs_attr_node_removename_setup function because we should free state ourselves.
]

Fixes: 07120f1abdff ("xfs: Add xfs_has_attr and subroutines")
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 96ac7e562b87..fcca36bbd997 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -876,21 +876,18 @@ xfs_attr_node_hasname(
 
 	state = xfs_da_state_alloc(args);
 	if (statep != NULL)
-		*statep = NULL;
+		*statep = state;
 
 	/*
 	 * Search to see if name exists, and get back a pointer to it.
 	 */
 	error = xfs_da3_node_lookup_int(state, &retval);
-	if (error) {
-		xfs_da_state_free(state);
-		return error;
-	}
+	if (error)
+		retval = error;
 
-	if (statep != NULL)
-		*statep = state;
-	else
+	if (!statep)
 		xfs_da_state_free(state);
+
 	return retval;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v3 4/5] xfs: remove all COW fork extents when remounting readonly
  2022-06-27  6:46 [PATCH v3 0/5] xfs stable patches for 5.10.y (backports from v5.15.y) Amir Goldstein
                   ` (2 preceding siblings ...)
  2022-06-27  6:47 ` [PATCH v3 3/5] xfs: Fix the free logic of state in xfs_attr_node_hasname Amir Goldstein
@ 2022-06-27  6:47 ` Amir Goldstein
  2022-06-27  6:47 ` [PATCH v3 5/5] xfs: check sb_meta_uuid for dabuf buffer recovery Amir Goldstein
  4 siblings, 0 replies; 6+ messages in thread
From: Amir Goldstein @ 2022-06-27  6:47 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Sasha Levin, Darrick J . Wong, Leah Rumancik, Chandan Babu R,
	Luis Chamberlain, Adam Manzanares, linux-xfs, stable,
	Dave Chinner

From: "Darrick J. Wong" <djwong@kernel.org>

commit 089558bc7ba785c03815a49c89e28ad9b8de51f9 upstream.

[backport xfs_icwalk -> xfs_eofblocks for 5.10.y]

As part of multiple customer escalations due to file data corruption
after copy on write operations, I wrote some fstests that use fsstress
to hammer on COW to shake things loose.  Regrettably, I caught some
filesystem shutdowns due to incorrect rmap operations with the following
loop:

mount <filesystem>				# (0)
fsstress <run only readonly ops> &		# (1)
while true; do
	fsstress <run all ops>
	mount -o remount,ro			# (2)
	fsstress <run only readonly ops>
	mount -o remount,rw			# (3)
done

When (2) happens, notice that (1) is still running.  xfs_remount_ro will
call xfs_blockgc_stop to walk the inode cache to free all the COW
extents, but the blockgc mechanism races with (1)'s reader threads to
take IOLOCKs and loses, which means that it doesn't clean them all out.
Call such a file (A).

When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which
walks the ondisk refcount btree and frees any COW extent that it finds.
This function does not check the inode cache, which means that incore
COW forks of inode (A) is now inconsistent with the ondisk metadata.  If
one of those former COW extents are allocated and mapped into another
file (B) and someone triggers a COW to the stale reservation in (A), A's
dirty data will be written into (B) and once that's done, those blocks
will be transferred to (A)'s data fork without bumping the refcount.

The results are catastrophic -- file (B) and the refcount btree are now
corrupt.  Solve this race by forcing the xfs_blockgc_free_space to run
synchronously, which causes xfs_icwalk to return to inodes that were
skipped because the blockgc code couldn't take the IOLOCK.  This is safe
to do here because the VFS has already prohibited new writer threads.

Fixes: 10ddf64e420f ("xfs: remove leftover CoW reservations when remounting ro")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_super.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 5ebd6cdc44a7..05cea7788d49 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1695,7 +1695,10 @@ static int
 xfs_remount_ro(
 	struct xfs_mount	*mp)
 {
-	int error;
+	struct xfs_eofblocks	eofb = {
+		.eof_flags	= XFS_EOF_FLAGS_SYNC,
+	};
+	int			error;

 	/*
 	 * Cancel background eofb scanning so it cannot race with the final
@@ -1703,8 +1706,13 @@ xfs_remount_ro(
 	 */
 	xfs_stop_block_reaping(mp);

-	/* Get rid of any leftover CoW reservations... */
-	error = xfs_icache_free_cowblocks(mp, NULL);
+	/*
+	 * Clear out all remaining COW staging extents and speculative post-EOF
+	 * preallocations so that we don't leave inodes requiring inactivation
+	 * cleanups during reclaim on a read-only mount.  We must process every
+	 * cached inode, so this requires a synchronous cache scan.
+	 */
+	error = xfs_icache_free_cowblocks(mp, &eofb);
 	if (error) {
 		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 		return error;
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v3 5/5] xfs: check sb_meta_uuid for dabuf buffer recovery
  2022-06-27  6:46 [PATCH v3 0/5] xfs stable patches for 5.10.y (backports from v5.15.y) Amir Goldstein
                   ` (3 preceding siblings ...)
  2022-06-27  6:47 ` [PATCH v3 4/5] xfs: remove all COW fork extents when remounting readonly Amir Goldstein
@ 2022-06-27  6:47 ` Amir Goldstein
  4 siblings, 0 replies; 6+ messages in thread
From: Amir Goldstein @ 2022-06-27  6:47 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Sasha Levin, Darrick J . Wong, Leah Rumancik, Chandan Babu R,
	Luis Chamberlain, Adam Manzanares, linux-xfs, stable,
	Dave Chinner

From: Dave Chinner <dchinner@redhat.com>

commit 09654ed8a18cfd45027a67d6cbca45c9ea54feab upstream.

Got a report that a repeated crash test of a container host would
eventually fail with a log recovery error preventing the system from
mounting the root filesystem. It manifested as a directory leaf node
corruption on writeback like so:

 XFS (loop0): Mounting V5 Filesystem
 XFS (loop0): Starting recovery (logdev: internal)
 XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158
 XFS (loop0): Unmount and run xfs_repair
 XFS (loop0): First 128 bytes of corrupted metadata buffer:
 00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b  ........=.......
 00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc  .......X...)....
 00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23  ..x..~J}.S...G.#
 00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00  .........C......
 00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a  ................
 00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50  .5y....0.......P
 00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4  .@.......A......
 00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c  .b.......P!A....
 XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514).  Shutting down.
 XFS (loop0): Please unmount the filesystem and rectify the problem(s)
 XFS (loop0): log mount/recovery failed: error -117
 XFS (loop0): log mount failed

Tracing indicated that we were recovering changes from a transaction
at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57.
That is, log recovery was overwriting a buffer with newer changes on
disk than was in the transaction. Tracing indicated that we were
hitting the "recovery immediately" case in
xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the
buffer.

The code was extracting the LSN correctly, then ignoring it because
the UUID in the buffer did not match the superblock UUID. The
problem arises because the UUID check uses the wrong UUID - it
should be checking the sb_meta_uuid, not sb_uuid. This filesystem
has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the
correct matching sb_meta_uuid in it, it's just the code checked it
against the wrong superblock uuid.

The is no corruption in the filesystem, and failing to recover the
buffer due to a write verifier failure means the recovery bug did
not propagate the corruption to disk. Hence there is no corruption
before or after this bug has manifested, the impact is limited
simply to an unmountable filesystem....

This was missed back in 2015 during an audit of incorrect sb_uuid
usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs
to validate against sb_meta_uuid") that fixed the magic32 buffers to
validate against sb_meta_uuid instead of sb_uuid. It missed the
magicda buffers....

Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_buf_item_recover.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
index d44e8b4a3391..1d649462d731 100644
--- a/fs/xfs/xfs_buf_item_recover.c
+++ b/fs/xfs/xfs_buf_item_recover.c
@@ -805,7 +805,7 @@ xlog_recover_get_buf_lsn(
 	}

 	if (lsn != (xfs_lsn_t)-1) {
-		if (!uuid_equal(&mp->m_sb.sb_uuid, uuid))
+		if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
 			goto recover_immediately;
 		return lsn;
 	}
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-06-27  6:47 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-27  6:46 [PATCH v3 0/5] xfs stable patches for 5.10.y (backports from v5.15.y) Amir Goldstein
2022-06-27  6:46 ` [PATCH v3 1/5] xfs: use kmem_cache_free() for kmem_cache objects Amir Goldstein
2022-06-27  6:47 ` [PATCH v3 2/5] xfs: punch out data fork delalloc blocks on COW writeback failure Amir Goldstein
2022-06-27  6:47 ` [PATCH v3 3/5] xfs: Fix the free logic of state in xfs_attr_node_hasname Amir Goldstein
2022-06-27  6:47 ` [PATCH v3 4/5] xfs: remove all COW fork extents when remounting readonly Amir Goldstein
2022-06-27  6:47 ` [PATCH v3 5/5] xfs: check sb_meta_uuid for dabuf buffer recovery Amir Goldstein

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.