All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET v2 0/3] xfs: random fixes for 5.19-rc5
@ 2022-06-27 21:35 Darrick J. Wong
  2022-06-27 21:35 ` [PATCH 1/3] xfs: empty xattr leaf header blocks are not corruption Darrick J. Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Darrick J. Wong @ 2022-06-27 21:35 UTC (permalink / raw)
  To: djwong; +Cc: Dave Chinner, linux-xfs, david, allison.henderson

Hi all,

This week, we're fixing a log recovery regression that appeared in
5.19-rc1 due to an incorrect verifier patch, and a problem where closing
an O_RDONLY realtime file on a frozen fs incorrectly triggers eofblocks
removal.

v2: move the history of the leaf header count checks to the commit message

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=xfs-fixes-5.19
---
 fs/xfs/libxfs/xfs_attr.c      |   38 +++++++++-----------------------------
 fs/xfs/libxfs/xfs_attr.h      |    5 -----
 fs/xfs/libxfs/xfs_attr_leaf.c |   35 +++++++++++++++++++----------------
 fs/xfs/libxfs/xfs_attr_leaf.h |    3 +--
 fs/xfs/xfs_attr_item.c        |   22 ----------------------
 fs/xfs/xfs_bmap_util.c        |    2 ++
 6 files changed, 31 insertions(+), 74 deletions(-)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] xfs: empty xattr leaf header blocks are not corruption
  2022-06-27 21:35 [PATCHSET v2 0/3] xfs: random fixes for 5.19-rc5 Darrick J. Wong
@ 2022-06-27 21:35 ` Darrick J. Wong
  2022-06-28  0:27   ` Dave Chinner
  2022-06-27 21:35 ` [PATCH 2/3] xfs: don't hold xattr leaf buffers across transaction rolls Darrick J. Wong
  2022-06-27 21:35 ` [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared Darrick J. Wong
  2 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2022-06-27 21:35 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <djwong@kernel.org>

TLDR: Revert commit 51e6104fdb95 ("xfs: detect empty attr leaf blocks in
xfs_attr3_leaf_verify") because it was wrong.

Every now and then we get a corruption report from the kernel or
xfs_repair about empty leaf blocks in the extended attribute structure.
We've long thought that these shouldn't be possible, but prior to 5.18
one would shake loose in the recoveryloop fstests about once a month.

A new addition to the xattr leaf block verifier in 5.19-rc1 makes this
happen every 7 minutes on my testing cloud.  I added a ton of logging to
detect any time we set the header count on an xattr leaf block to zero.
This produced the following dmesg output on generic/388:

XFS (sda4): ino 0x21fcbaf leaf 0x129bf78 hdcount==0!
Call Trace:
 <TASK>
 dump_stack_lvl+0x34/0x44
 xfs_attr3_leaf_create+0x187/0x230
 xfs_attr_shortform_to_leaf+0xd1/0x2f0
 xfs_attr_set_iter+0x73e/0xa90
 xfs_xattri_finish_update+0x45/0x80
 xfs_attr_finish_item+0x1b/0xd0
 xfs_defer_finish_noroll+0x19c/0x770
 __xfs_trans_commit+0x153/0x3e0
 xfs_attr_set+0x36b/0x740
 xfs_xattr_set+0x89/0xd0
 __vfs_setxattr+0x67/0x80
 __vfs_setxattr_noperm+0x6e/0x120
 vfs_setxattr+0x97/0x180
 setxattr+0x88/0xa0
 path_setxattr+0xc3/0xe0
 __x64_sys_setxattr+0x27/0x30
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

So now we know that someone is creating empty xattr leaf blocks as part
of converting a sf xattr structure into a leaf xattr structure.  The
conversion routine logs any existing sf attributes in the same
transaction that creates the leaf block, so we know this is a setxattr
to a file that has no attributes at all.

Next, g/388 calls the shutdown ioctl and cycles the mount to trigger log
recovery.  I also augmented buffer item recovery to call ->verify_struct
on any attr leaf blocks and complain if it finds a failure:

XFS (sda4): Unmounting Filesystem
XFS (sda4): Mounting V5 Filesystem
XFS (sda4): Starting recovery (logdev: internal)
XFS (sda4): xattr leaf daddr 0x129bf78 hdrcount == 0!
Call Trace:
 <TASK>
 dump_stack_lvl+0x34/0x44
 xfs_attr3_leaf_verify+0x3b8/0x420
 xlog_recover_buf_commit_pass2+0x60a/0x6c0
 xlog_recover_items_pass2+0x4e/0xc0
 xlog_recover_commit_trans+0x33c/0x350
 xlog_recovery_process_trans+0xa5/0xe0
 xlog_recover_process_data+0x8d/0x140
 xlog_do_recovery_pass+0x19b/0x720
 xlog_do_log_recovery+0x62/0xc0
 xlog_do_recover+0x33/0x1d0
 xlog_recover+0xda/0x190
 xfs_log_mount+0x14c/0x360
 xfs_mountfs+0x517/0xa60
 xfs_fs_fill_super+0x6bc/0x950
 get_tree_bdev+0x175/0x280
 vfs_get_tree+0x1a/0x80
 path_mount+0x6f5/0xaa0
 __x64_sys_mount+0x103/0x140
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fc61e241eae

And a moment later, the _delwri_submit of the recovered buffers trips
the same verifier and recovery fails:

XFS (sda4): Metadata corruption detected at xfs_attr3_leaf_verify+0x393/0x420 [xfs], xfs_attr3_leaf block 0x129bf78
XFS (sda4): Unmount and run xfs_repair
XFS (sda4): First 128 bytes of corrupted metadata buffer:
00000000: 00 00 00 00 00 00 00 00 3b ee 00 00 00 00 00 00  ........;.......
00000010: 00 00 00 00 01 29 bf 78 00 00 00 00 00 00 00 00  .....).x........
00000020: a5 1b d0 02 b2 9a 49 df 8e 9c fb 8d f8 31 3e 9d  ......I......1>.
00000030: 00 00 00 00 02 1f cb af 00 00 00 00 10 00 00 00  ................
00000040: 00 50 0f b0 00 00 00 00 00 00 00 00 00 00 00 00  .P..............
00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
XFS (sda4): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0x37f/0x3b0 [xfs] (fs/xfs/xfs_buf.c:1518).  Shutting down filesystem.
XFS (sda4): Please unmount the filesystem and rectify the problem(s)
XFS (sda4): log mount/recovery failed: error -117
XFS (sda4): log mount failed

I think I see what's going on here -- setxattr is racing with something
that shuts down the filesystem:

Thread 1				Thread 2
--------				--------
xfs_attr_sf_addname
xfs_attr_shortform_to_leaf
<create empty leaf>
xfs_trans_bhold(leaf)
xattri_dela_state = XFS_DAS_LEAF_ADD
<roll transaction>
					<flush log>
					<shut down filesystem>
xfs_trans_bhold_release(leaf)
<discover fs is dead, bail>

Thread 3
--------
<cycle mount, start recovery>
xlog_recover_buf_commit_pass2
xlog_recover_do_reg_buffer
<replay empty leaf buffer from recovered buf item>
xfs_buf_delwri_queue(leaf)
xfs_buf_delwri_submit
_xfs_buf_ioapply(leaf)
xfs_attr3_leaf_write_verify
<trip over empty leaf buffer>
<fail recovery>

As you can see, the bhold keeps the leaf buffer locked and thus prevents
the *AIL* from tripping over the ichdr.count==0 check in the write
verifier.  Unfortunately, it doesn't prevent the log from getting
flushed to disk, which sets up log recovery to fail.

So.  It's clear that the kernel has always had the ability to persist
attr leaf blocks with ichdr.count==0, which means that it's part of the
ondisk format now.

Unfortunately, this check has been added and removed multiple times
throughout history.  It first appeared in[1] kernel 3.10 as part of the
early V5 format patches.  The check was later discovered to break log
recovery and hence disabled[2] during log recovery in kernel 4.10.
Simultaneously, the check was added[3] to xfs_repair 4.9.0 to try to
weed out the empty leaf blocks.  This was still not correct because log
recovery would recover an empty attr leaf block successfully only for
regular xattr operations to trip over the empty block during of the
block during regular operation.  Therefore, the check was removed
entirely[4] in kernel 5.7 but removal of the xfs_repair check was
forgotten.  The continued complaints from xfs_repair lead to us
mistakenly re-adding[5] the verifier check for kernel 5.19.  Remove it
once again.

[1] 517c22207b04 ("xfs: add CRCs to attr leaf blocks")
[2] 2e1d23370e75 ("xfs: ignore leaf attr ichdr.count in verifier
                   during log replay")
[3] f7140161 ("xfs_repair: junk leaf attribute if count == 0")
[4] f28cef9e4dac ("xfs: don't fail verifier on empty attr3 leaf
                   block")
[5] 51e6104fdb95 ("xfs: detect empty attr leaf blocks in
                   xfs_attr3_leaf_verify")

Looking at the rest of the xattr code, it seems that files with empty
leaf blocks behave as expected -- listxattr reports no attributes;
getxattr on any xattr returns nothing as expected; removexattr does
nothing; and setxattr can add attributes just fine.

Original-bug: 517c22207b04 ("xfs: add CRCs to attr leaf blocks")
Still-not-fixed-by: 2e1d23370e75 ("xfs: ignore leaf attr ichdr.count in verifier during log replay")
Removed-in: f28cef9e4dac ("xfs: don't fail verifier on empty attr3 leaf block")
Fixes: 51e6104fdb95 ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr_leaf.c |   26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index 37e7c33f6283..f6629960e17b 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -289,6 +289,23 @@ xfs_attr3_leaf_verify_entry(
 	return NULL;
 }
 
+/*
+ * Validate an attribute leaf block.
+ *
+ * Empty leaf blocks can occur under the following circumstances:
+ *
+ * 1. setxattr adds a new extended attribute to a file;
+ * 2. The file has zero existing attributes;
+ * 3. The attribute is too large to fit in the attribute fork;
+ * 4. The attribute is small enough to fit in a leaf block;
+ * 5. A log flush occurs after committing the transaction that creates
+ *    the (empty) leaf block; and
+ * 6. The filesystem goes down after the log flush but before the new
+ *    attribute can be committed to the leaf block.
+ *
+ * Hence we need to ensure that we don't fail the validation purely
+ * because the leaf is empty.
+ */
 static xfs_failaddr_t
 xfs_attr3_leaf_verify(
 	struct xfs_buf			*bp)
@@ -310,15 +327,6 @@ xfs_attr3_leaf_verify(
 	if (fa)
 		return fa;
 
-	/*
-	 * Empty leaf blocks should never occur;  they imply the existence of a
-	 * software bug that needs fixing. xfs_repair also flags them as a
-	 * corruption that needs fixing, so we should never let these go to
-	 * disk.
-	 */
-	if (ichdr.count == 0)
-		return __this_address;
-
 	/*
 	 * firstused is the block offset of the first name info structure.
 	 * Make sure it doesn't go off the block or crash into the header.


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3] xfs: don't hold xattr leaf buffers across transaction rolls
  2022-06-27 21:35 [PATCHSET v2 0/3] xfs: random fixes for 5.19-rc5 Darrick J. Wong
  2022-06-27 21:35 ` [PATCH 1/3] xfs: empty xattr leaf header blocks are not corruption Darrick J. Wong
@ 2022-06-27 21:35 ` Darrick J. Wong
  2022-06-27 21:35 ` [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared Darrick J. Wong
  2 siblings, 0 replies; 12+ messages in thread
From: Darrick J. Wong @ 2022-06-27 21:35 UTC (permalink / raw)
  To: djwong; +Cc: Dave Chinner, linux-xfs, david, allison.henderson

From: Darrick J. Wong <djwong@kernel.org>

Now that we've established (again!) that empty xattr leaf buffers are
ok, we no longer need to bhold them to transactions when we're creating
new leaf blocks.  Get rid of the entire mechanism, which should simplify
the xattr code quite a bit.

The original justification for using bhold here was to prevent the AIL
from trying to write the empty leaf block into the fs during the brief
time that we release the buffer lock.  The reason for /that/ was to
prevent recovery from tripping over the empty ondisk block.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c      |   38 +++++++++-----------------------------
 fs/xfs/libxfs/xfs_attr.h      |    5 -----
 fs/xfs/libxfs/xfs_attr_leaf.c |    9 ++-------
 fs/xfs/libxfs/xfs_attr_leaf.h |    3 +--
 fs/xfs/xfs_attr_item.c        |   22 ----------------------
 5 files changed, 12 insertions(+), 65 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 1824f61621a2..224649a76cbb 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -50,7 +50,7 @@ STATIC int xfs_attr_shortform_addname(xfs_da_args_t *args);
 STATIC int xfs_attr_leaf_get(xfs_da_args_t *args);
 STATIC int xfs_attr_leaf_removename(xfs_da_args_t *args);
 STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
-STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
+STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args);
 
 /*
  * Internal routines when attribute list is more than one block.
@@ -393,16 +393,10 @@ xfs_attr_sf_addname(
 	 * It won't fit in the shortform, transform to a leaf block.  GROT:
 	 * another possible req'mt for a double-split btree op.
 	 */
-	error = xfs_attr_shortform_to_leaf(args, &attr->xattri_leaf_bp);
+	error = xfs_attr_shortform_to_leaf(args);
 	if (error)
 		return error;
 
-	/*
-	 * Prevent the leaf buffer from being unlocked so that a concurrent AIL
-	 * push cannot grab the half-baked leaf buffer and run into problems
-	 * with the write verifier.
-	 */
-	xfs_trans_bhold(args->trans, attr->xattri_leaf_bp);
 	attr->xattri_dela_state = XFS_DAS_LEAF_ADD;
 out:
 	trace_xfs_attr_sf_addname_return(attr->xattri_dela_state, args->dp);
@@ -447,11 +441,9 @@ xfs_attr_leaf_addname(
 
 	/*
 	 * Use the leaf buffer we may already hold locked as a result of
-	 * a sf-to-leaf conversion. The held buffer is no longer valid
-	 * after this call, regardless of the result.
+	 * a sf-to-leaf conversion.
 	 */
-	error = xfs_attr_leaf_try_add(args, attr->xattri_leaf_bp);
-	attr->xattri_leaf_bp = NULL;
+	error = xfs_attr_leaf_try_add(args);
 
 	if (error == -ENOSPC) {
 		error = xfs_attr3_leaf_to_node(args);
@@ -497,8 +489,6 @@ xfs_attr_node_addname(
 	struct xfs_da_args	*args = attr->xattri_da_args;
 	int			error;
 
-	ASSERT(!attr->xattri_leaf_bp);
-
 	error = xfs_attr_node_addname_find_attr(attr);
 	if (error)
 		return error;
@@ -1215,24 +1205,14 @@ xfs_attr_restore_rmt_blk(
  */
 STATIC int
 xfs_attr_leaf_try_add(
-	struct xfs_da_args	*args,
-	struct xfs_buf		*bp)
+	struct xfs_da_args	*args)
 {
+	struct xfs_buf		*bp;
 	int			error;
 
-	/*
-	 * If the caller provided a buffer to us, it is locked and held in
-	 * the transaction because it just did a shortform to leaf conversion.
-	 * Hence we don't need to read it again. Otherwise read in the leaf
-	 * buffer.
-	 */
-	if (bp) {
-		xfs_trans_bhold_release(args->trans, bp);
-	} else {
-		error = xfs_attr3_leaf_read(args->trans, args->dp, 0, &bp);
-		if (error)
-			return error;
-	}
+	error = xfs_attr3_leaf_read(args->trans, args->dp, 0, &bp);
+	if (error)
+		return error;
 
 	/*
 	 * Look up the xattr name to set the insertion point for the new xattr.
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index b4a2fc77017e..dfb47fa63c6d 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -515,11 +515,6 @@ struct xfs_attr_intent {
 	 */
 	struct xfs_attri_log_nameval	*xattri_nameval;
 
-	/*
-	 * Used by xfs_attr_set to hold a leaf buffer across a transaction roll
-	 */
-	struct xfs_buf			*xattri_leaf_bp;
-
 	/* Used to keep track of current state of delayed operation */
 	enum xfs_delattr_state		xattri_dela_state;
 
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index f6629960e17b..8f47396f8dd2 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -930,14 +930,10 @@ xfs_attr_shortform_getvalue(
 	return -ENOATTR;
 }
 
-/*
- * Convert from using the shortform to the leaf.  On success, return the
- * buffer so that we can keep it locked until we're totally done with it.
- */
+/* Convert from using the shortform to the leaf format. */
 int
 xfs_attr_shortform_to_leaf(
-	struct xfs_da_args		*args,
-	struct xfs_buf			**leaf_bp)
+	struct xfs_da_args		*args)
 {
 	struct xfs_inode		*dp;
 	struct xfs_attr_shortform	*sf;
@@ -999,7 +995,6 @@ xfs_attr_shortform_to_leaf(
 		sfe = xfs_attr_sf_nextentry(sfe);
 	}
 	error = 0;
-	*leaf_bp = bp;
 out:
 	kmem_free(tmpbuffer);
 	return error;
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.h b/fs/xfs/libxfs/xfs_attr_leaf.h
index efa757f1e912..368f4d9fa1d5 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.h
+++ b/fs/xfs/libxfs/xfs_attr_leaf.h
@@ -49,8 +49,7 @@ void	xfs_attr_shortform_create(struct xfs_da_args *args);
 void	xfs_attr_shortform_add(struct xfs_da_args *args, int forkoff);
 int	xfs_attr_shortform_lookup(struct xfs_da_args *args);
 int	xfs_attr_shortform_getvalue(struct xfs_da_args *args);
-int	xfs_attr_shortform_to_leaf(struct xfs_da_args *args,
-			struct xfs_buf **leaf_bp);
+int	xfs_attr_shortform_to_leaf(struct xfs_da_args *args);
 int	xfs_attr_sf_removename(struct xfs_da_args *args);
 int	xfs_attr_sf_findname(struct xfs_da_args *args,
 			     struct xfs_attr_sf_entry **sfep,
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 6ee905a09eb2..5077a7ad5646 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -455,8 +455,6 @@ static inline void
 xfs_attr_free_item(
 	struct xfs_attr_intent		*attr)
 {
-	ASSERT(attr->xattri_leaf_bp == NULL);
-
 	if (attr->xattri_da_state)
 		xfs_da_state_free(attr->xattri_da_state);
 	xfs_attri_log_nameval_put(attr->xattri_nameval);
@@ -511,10 +509,6 @@ xfs_attr_cancel_item(
 	struct xfs_attr_intent		*attr;
 
 	attr = container_of(item, struct xfs_attr_intent, xattri_list);
-	if (attr->xattri_leaf_bp) {
-		xfs_buf_relse(attr->xattri_leaf_bp);
-		attr->xattri_leaf_bp = NULL;
-	}
 	xfs_attr_free_item(attr);
 }
 
@@ -672,16 +666,6 @@ xfs_attri_item_recover(
 		if (error)
 			goto out_unlock;
 
-		/*
-		 * The defer capture structure took its own reference to the
-		 * attr leaf buffer and will give that to the continuation
-		 * transaction.  The attr intent struct drives the continuation
-		 * work, so release our refcount on the attr leaf buffer but
-		 * retain the pointer in the intent structure.
-		 */
-		if (attr->xattri_leaf_bp)
-			xfs_buf_relse(attr->xattri_leaf_bp);
-
 		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 		xfs_irele(ip);
 		return 0;
@@ -692,13 +676,7 @@ xfs_attri_item_recover(
 	}
 
 	error = xfs_defer_ops_capture_and_commit(tp, capture_list);
-
 out_unlock:
-	if (attr->xattri_leaf_bp) {
-		xfs_buf_relse(attr->xattri_leaf_bp);
-		attr->xattri_leaf_bp = NULL;
-	}
-
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	xfs_irele(ip);
 out:


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared
  2022-06-27 21:35 [PATCHSET v2 0/3] xfs: random fixes for 5.19-rc5 Darrick J. Wong
  2022-06-27 21:35 ` [PATCH 1/3] xfs: empty xattr leaf header blocks are not corruption Darrick J. Wong
  2022-06-27 21:35 ` [PATCH 2/3] xfs: don't hold xattr leaf buffers across transaction rolls Darrick J. Wong
@ 2022-06-27 21:35 ` Darrick J. Wong
  2022-06-29  7:26   ` Christoph Hellwig
  2 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2022-06-27 21:35 UTC (permalink / raw)
  To: djwong; +Cc: Dave Chinner, linux-xfs, david, allison.henderson

From: Darrick J. Wong <djwong@kernel.org>

On a system with a realtime volume and a 28k realtime extent,
generic/491 fails because the test opens a file on a frozen filesystem
and closing it causes xfs_release -> xfs_can_free_eofblocks to
mistakenly think that the the blocks of the realtime extent beyond EOF
are posteof blocks to be freed.  Realtime extents cannot be partially
unmapped, so this is pointless.  Worse yet, this triggers posteof
cleanup, which stalls on a transaction allocation, which is why the test
fails.

Teach the predicate to account for realtime extents properly.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_bmap_util.c |    2 ++
 1 file changed, 2 insertions(+)


diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 52be58372c63..85e1a26c92e8 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -686,6 +686,8 @@ xfs_can_free_eofblocks(
 	 * forever.
 	 */
 	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
+	if (XFS_IS_REALTIME_INODE(ip) && mp->m_sb.sb_rextsize > 1)
+		end_fsb = roundup_64(end_fsb, mp->m_sb.sb_rextsize);
 	last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
 	if (last_fsb <= end_fsb)
 		return false;


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] xfs: empty xattr leaf header blocks are not corruption
  2022-06-27 21:35 ` [PATCH 1/3] xfs: empty xattr leaf header blocks are not corruption Darrick J. Wong
@ 2022-06-28  0:27   ` Dave Chinner
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2022-06-28  0:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, allison.henderson

On Mon, Jun 27, 2022 at 02:35:27PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> TLDR: Revert commit 51e6104fdb95 ("xfs: detect empty attr leaf blocks in
> xfs_attr3_leaf_verify") because it was wrong.
....
> 
> Original-bug: 517c22207b04 ("xfs: add CRCs to attr leaf blocks")
> Still-not-fixed-by: 2e1d23370e75 ("xfs: ignore leaf attr ichdr.count in verifier during log replay")
> Removed-in: f28cef9e4dac ("xfs: don't fail verifier on empty attr3 leaf block")
> Fixes: 51e6104fdb95 ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify")
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_attr_leaf.c |   26 +++++++++++++++++---------
>  1 file changed, 17 insertions(+), 9 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> index 37e7c33f6283..f6629960e17b 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -289,6 +289,23 @@ xfs_attr3_leaf_verify_entry(
>  	return NULL;
>  }
>  
> +/*
> + * Validate an attribute leaf block.
> + *
> + * Empty leaf blocks can occur under the following circumstances:
> + *
> + * 1. setxattr adds a new extended attribute to a file;
> + * 2. The file has zero existing attributes;
> + * 3. The attribute is too large to fit in the attribute fork;
> + * 4. The attribute is small enough to fit in a leaf block;
> + * 5. A log flush occurs after committing the transaction that creates
> + *    the (empty) leaf block; and
> + * 6. The filesystem goes down after the log flush but before the new
> + *    attribute can be committed to the leaf block.
> + *
> + * Hence we need to ensure that we don't fail the validation purely
> + * because the leaf is empty.
> + */
>  static xfs_failaddr_t
>  xfs_attr3_leaf_verify(
>  	struct xfs_buf			*bp)
> @@ -310,15 +327,6 @@ xfs_attr3_leaf_verify(
>  	if (fa)
>  		return fa;
>  
> -	/*
> -	 * Empty leaf blocks should never occur;  they imply the existence of a
> -	 * software bug that needs fixing. xfs_repair also flags them as a
> -	 * corruption that needs fixing, so we should never let these go to
> -	 * disk.
> -	 */
> -	if (ichdr.count == 0)
> -		return __this_address;
> -
>  	/*
>  	 * firstused is the block offset of the first name info structure.
>  	 * Make sure it doesn't go off the block or crash into the header.

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared
  2022-06-27 21:35 ` [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared Darrick J. Wong
@ 2022-06-29  7:26   ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2022-06-29  7:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs, david, allison.henderson

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared
  2022-06-27 20:59         ` Darrick J. Wong
@ 2022-06-27 22:27           ` Dave Chinner
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2022-06-27 22:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, allison.henderson

On Mon, Jun 27, 2022 at 01:59:32PM -0700, Darrick J. Wong wrote:
> On Mon, Jun 27, 2022 at 03:16:49PM +1000, Dave Chinner wrote:
> > On Sun, Jun 26, 2022 at 08:57:25PM -0700, Darrick J. Wong wrote:
> > > On Mon, Jun 27, 2022 at 11:37:31AM +1000, Dave Chinner wrote:
> > > > On Sun, Jun 26, 2022 at 03:04:04PM -0700, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > > 
> > > > > On a system with a realtime volume and a 28k realtime extent,
> > > > > generic/491 fails because the test opens a file on a frozen filesystem
> > > > > and closing it causes xfs_release -> xfs_can_free_eofblocks to
> > > > > mistakenly think that the the blocks of the realtime extent beyond EOF
> > > > > are posteof blocks to be freed.  Realtime extents cannot be partially
> > > > > unmapped, so this is pointless.  Worse yet, this triggers posteof
> > > > > cleanup, which stalls on a transaction allocation, which is why the test
> > > > > fails.
> > > > > 
> > > > > Teach the predicate to account for realtime extents properly.
> > > > > 
> > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > > ---
> > > > >  fs/xfs/xfs_bmap_util.c |    2 ++
> > > > >  1 file changed, 2 insertions(+)
> > > > > 
> > > > > 
> > > > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > > > > index 52be58372c63..85e1a26c92e8 100644
> > > > > --- a/fs/xfs/xfs_bmap_util.c
> > > > > +++ b/fs/xfs/xfs_bmap_util.c
> > > > > @@ -686,6 +686,8 @@ xfs_can_free_eofblocks(
> > > > >  	 * forever.
> > > > >  	 */
> > > > >  	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
> > > > > +	if (XFS_IS_REALTIME_INODE(ip) && mp->m_sb.sb_rextsize > 1)
> > > > > +		end_fsb = roundup_64(end_fsb, mp->m_sb.sb_rextsize);
> > > > >  	last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
> > > > >  	if (last_fsb <= end_fsb)
> > > > >  		return false;
> > > > 
> > > > Ok, that works.
> > 
> > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > 
> > > > 
> > > > However, I was looking at xfs_can_free_eofblocks() w.r.t. freeze a
> > > > couple of days ago and wondering why there isn't a freeze/RO state
> > > > check in xfs_can_free_eofblocks(). Shouldn't we have one here so
> > > > that we never try to run xfs_free_eofblocks() on RO/frozen
> > > > filesystems regardless of unexpected state/alignment issues?
> > > 
> > > I asked myself that question too.  I found three callers of this
> > > predicate:
> > > 
> > > 1. fallocate, which should have obtained freeze protection
> > 
> > *nod*
> > 
> > > 2. inodegc, which should never be running when we get to the innermost
> > > freeze protection level
> > 
> > So inodegc could still do IO here on a read-only fs?
> 
> Correct.
> 
> > > 3. xfs_release, which doesn't take freeze protection at all.  Either it
> > > needs to take freeze protection so that xfs_free_eofblocks can't get
> > > stuck in xfs_trans_alloc, or we'd need to modify xfs_trans_alloc to
> > > sb_start_intwrite_trylock
> > 
> > That looks to me like it is simply a case of replacing the
> > !xfs_is_readonly() check in xfs_release() with a
> > !xfs_fs_writeable(mp, SB_FREEZE_WRITE) check and we shouldn't have
> > to touch anythign else, right?
> 
> I think there would still be a race if we did that -- I don't see
> anything in __fput that prohibits another thread from initiating a
> freeze after the release process calls _can_free_eofblocks but before
> the actual call to _free_eofblocks.

Yup, the probability of that happening is pretty small, and there's
every chance it could have got stuck on something else racing with
the freeze.

In reality, I don't think that blocking in ->release on freeze is
inherently bad - it will just wait until the filesystem is thawed
and then continue on. i.e. just because the test hangs on closing a
fd on a frozen fs doesn't mean that the freeze mechanism is broken
in any way....

> Hm.  How often would we have a readonly fd pointing to a file that has
> posteof blocks?  I suppose this could happen if the system was extending
> a file, crashed, and then someone remounted, opened a ro fd, and then
> closed and froze the fs at the same time...?

Snapshots. Open files at freeze time will result in post-eof blocks
existing in the snapshot. Mount the snapshot read-only ..... and
that's what the read-only check in xfs_release() catches......

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared
  2022-06-27  5:16       ` Dave Chinner
@ 2022-06-27 20:59         ` Darrick J. Wong
  2022-06-27 22:27           ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2022-06-27 20:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, allison.henderson

On Mon, Jun 27, 2022 at 03:16:49PM +1000, Dave Chinner wrote:
> On Sun, Jun 26, 2022 at 08:57:25PM -0700, Darrick J. Wong wrote:
> > On Mon, Jun 27, 2022 at 11:37:31AM +1000, Dave Chinner wrote:
> > > On Sun, Jun 26, 2022 at 03:04:04PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > On a system with a realtime volume and a 28k realtime extent,
> > > > generic/491 fails because the test opens a file on a frozen filesystem
> > > > and closing it causes xfs_release -> xfs_can_free_eofblocks to
> > > > mistakenly think that the the blocks of the realtime extent beyond EOF
> > > > are posteof blocks to be freed.  Realtime extents cannot be partially
> > > > unmapped, so this is pointless.  Worse yet, this triggers posteof
> > > > cleanup, which stalls on a transaction allocation, which is why the test
> > > > fails.
> > > > 
> > > > Teach the predicate to account for realtime extents properly.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > ---
> > > >  fs/xfs/xfs_bmap_util.c |    2 ++
> > > >  1 file changed, 2 insertions(+)
> > > > 
> > > > 
> > > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > > > index 52be58372c63..85e1a26c92e8 100644
> > > > --- a/fs/xfs/xfs_bmap_util.c
> > > > +++ b/fs/xfs/xfs_bmap_util.c
> > > > @@ -686,6 +686,8 @@ xfs_can_free_eofblocks(
> > > >  	 * forever.
> > > >  	 */
> > > >  	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
> > > > +	if (XFS_IS_REALTIME_INODE(ip) && mp->m_sb.sb_rextsize > 1)
> > > > +		end_fsb = roundup_64(end_fsb, mp->m_sb.sb_rextsize);
> > > >  	last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
> > > >  	if (last_fsb <= end_fsb)
> > > >  		return false;
> > > 
> > > Ok, that works.
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> > > 
> > > However, I was looking at xfs_can_free_eofblocks() w.r.t. freeze a
> > > couple of days ago and wondering why there isn't a freeze/RO state
> > > check in xfs_can_free_eofblocks(). Shouldn't we have one here so
> > > that we never try to run xfs_free_eofblocks() on RO/frozen
> > > filesystems regardless of unexpected state/alignment issues?
> > 
> > I asked myself that question too.  I found three callers of this
> > predicate:
> > 
> > 1. fallocate, which should have obtained freeze protection
> 
> *nod*
> 
> > 2. inodegc, which should never be running when we get to the innermost
> > freeze protection level
> 
> So inodegc could still do IO here on a read-only fs?

Correct.

> > 3. xfs_release, which doesn't take freeze protection at all.  Either it
> > needs to take freeze protection so that xfs_free_eofblocks can't get
> > stuck in xfs_trans_alloc, or we'd need to modify xfs_trans_alloc to
> > sb_start_intwrite_trylock
> 
> That looks to me like it is simply a case of replacing the
> !xfs_is_readonly() check in xfs_release() with a
> !xfs_fs_writeable(mp, SB_FREEZE_WRITE) check and we shouldn't have
> to touch anythign else, right?

I think there would still be a race if we did that -- I don't see
anything in __fput that prohibits another thread from initiating a
freeze after the release process calls _can_free_eofblocks but before
the actual call to _free_eofblocks.

Hm.  How often would we have a readonly fd pointing to a file that has
posteof blocks?  I suppose this could happen if the system was extending
a file, crashed, and then someone remounted, opened a ro fd, and then
closed and froze the fs at the same time...?

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared
  2022-06-27  3:57     ` Darrick J. Wong
@ 2022-06-27  5:16       ` Dave Chinner
  2022-06-27 20:59         ` Darrick J. Wong
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2022-06-27  5:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, allison.henderson

On Sun, Jun 26, 2022 at 08:57:25PM -0700, Darrick J. Wong wrote:
> On Mon, Jun 27, 2022 at 11:37:31AM +1000, Dave Chinner wrote:
> > On Sun, Jun 26, 2022 at 03:04:04PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > On a system with a realtime volume and a 28k realtime extent,
> > > generic/491 fails because the test opens a file on a frozen filesystem
> > > and closing it causes xfs_release -> xfs_can_free_eofblocks to
> > > mistakenly think that the the blocks of the realtime extent beyond EOF
> > > are posteof blocks to be freed.  Realtime extents cannot be partially
> > > unmapped, so this is pointless.  Worse yet, this triggers posteof
> > > cleanup, which stalls on a transaction allocation, which is why the test
> > > fails.
> > > 
> > > Teach the predicate to account for realtime extents properly.
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > >  fs/xfs/xfs_bmap_util.c |    2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > 
> > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > > index 52be58372c63..85e1a26c92e8 100644
> > > --- a/fs/xfs/xfs_bmap_util.c
> > > +++ b/fs/xfs/xfs_bmap_util.c
> > > @@ -686,6 +686,8 @@ xfs_can_free_eofblocks(
> > >  	 * forever.
> > >  	 */
> > >  	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
> > > +	if (XFS_IS_REALTIME_INODE(ip) && mp->m_sb.sb_rextsize > 1)
> > > +		end_fsb = roundup_64(end_fsb, mp->m_sb.sb_rextsize);
> > >  	last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
> > >  	if (last_fsb <= end_fsb)
> > >  		return false;
> > 
> > Ok, that works.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

> > 
> > However, I was looking at xfs_can_free_eofblocks() w.r.t. freeze a
> > couple of days ago and wondering why there isn't a freeze/RO state
> > check in xfs_can_free_eofblocks(). Shouldn't we have one here so
> > that we never try to run xfs_free_eofblocks() on RO/frozen
> > filesystems regardless of unexpected state/alignment issues?
> 
> I asked myself that question too.  I found three callers of this
> predicate:
> 
> 1. fallocate, which should have obtained freeze protection

*nod*

> 2. inodegc, which should never be running when we get to the innermost
> freeze protection level

So inodegc could still do IO here on a read-only fs?

> 3. xfs_release, which doesn't take freeze protection at all.  Either it
> needs to take freeze protection so that xfs_free_eofblocks can't get
> stuck in xfs_trans_alloc, or we'd need to modify xfs_trans_alloc to
> sb_start_intwrite_trylock

That looks to me like it is simply a case of replacing the
!xfs_is_readonly() check in xfs_release() with a
!xfs_fs_writeable(mp, SB_FREEZE_WRITE) check and we shouldn't have
to touch anythign else, right?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared
  2022-06-27  1:37   ` Dave Chinner
@ 2022-06-27  3:57     ` Darrick J. Wong
  2022-06-27  5:16       ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2022-06-27  3:57 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, allison.henderson

On Mon, Jun 27, 2022 at 11:37:31AM +1000, Dave Chinner wrote:
> On Sun, Jun 26, 2022 at 03:04:04PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > On a system with a realtime volume and a 28k realtime extent,
> > generic/491 fails because the test opens a file on a frozen filesystem
> > and closing it causes xfs_release -> xfs_can_free_eofblocks to
> > mistakenly think that the the blocks of the realtime extent beyond EOF
> > are posteof blocks to be freed.  Realtime extents cannot be partially
> > unmapped, so this is pointless.  Worse yet, this triggers posteof
> > cleanup, which stalls on a transaction allocation, which is why the test
> > fails.
> > 
> > Teach the predicate to account for realtime extents properly.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/xfs_bmap_util.c |    2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > 
> > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > index 52be58372c63..85e1a26c92e8 100644
> > --- a/fs/xfs/xfs_bmap_util.c
> > +++ b/fs/xfs/xfs_bmap_util.c
> > @@ -686,6 +686,8 @@ xfs_can_free_eofblocks(
> >  	 * forever.
> >  	 */
> >  	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
> > +	if (XFS_IS_REALTIME_INODE(ip) && mp->m_sb.sb_rextsize > 1)
> > +		end_fsb = roundup_64(end_fsb, mp->m_sb.sb_rextsize);
> >  	last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
> >  	if (last_fsb <= end_fsb)
> >  		return false;
> 
> Ok, that works.
> 
> However, I was looking at xfs_can_free_eofblocks() w.r.t. freeze a
> couple of days ago and wondering why there isn't a freeze/RO state
> check in xfs_can_free_eofblocks(). Shouldn't we have one here so
> that we never try to run xfs_free_eofblocks() on RO/frozen
> filesystems regardless of unexpected state/alignment issues?

I asked myself that question too.  I found three callers of this
predicate:

1. fallocate, which should have obtained freeze protection

2. inodegc, which should never be running when we get to the innermost
freeze protection level

3. xfs_release, which doesn't take freeze protection at all.  Either it
needs to take freeze protection so that xfs_free_eofblocks can't get
stuck in xfs_trans_alloc, or we'd need to modify xfs_trans_alloc to
sb_start_intwrite_trylock

I don't really want to try to add (3) as part of a fix for 5.19, but I
would like to get these fixes merged so I can concentrate on finding and
fixing the file corruption problems that are still present in -rc4.  If
we want to engineer a freeze/ro state check later, we can do that too.

So, can we move ahead with this fix?

--D

> Cheers,
> 
> Dave.
> 
> 
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared
  2022-06-26 22:04 ` [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared Darrick J. Wong
@ 2022-06-27  1:37   ` Dave Chinner
  2022-06-27  3:57     ` Darrick J. Wong
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2022-06-27  1:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, allison.henderson

On Sun, Jun 26, 2022 at 03:04:04PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> On a system with a realtime volume and a 28k realtime extent,
> generic/491 fails because the test opens a file on a frozen filesystem
> and closing it causes xfs_release -> xfs_can_free_eofblocks to
> mistakenly think that the the blocks of the realtime extent beyond EOF
> are posteof blocks to be freed.  Realtime extents cannot be partially
> unmapped, so this is pointless.  Worse yet, this triggers posteof
> cleanup, which stalls on a transaction allocation, which is why the test
> fails.
> 
> Teach the predicate to account for realtime extents properly.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/xfs_bmap_util.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 52be58372c63..85e1a26c92e8 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -686,6 +686,8 @@ xfs_can_free_eofblocks(
>  	 * forever.
>  	 */
>  	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
> +	if (XFS_IS_REALTIME_INODE(ip) && mp->m_sb.sb_rextsize > 1)
> +		end_fsb = roundup_64(end_fsb, mp->m_sb.sb_rextsize);
>  	last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
>  	if (last_fsb <= end_fsb)
>  		return false;

Ok, that works.

However, I was looking at xfs_can_free_eofblocks() w.r.t. freeze a
couple of days ago and wondering why there isn't a freeze/RO state
check in xfs_can_free_eofblocks(). Shouldn't we have one here so
that we never try to run xfs_free_eofblocks() on RO/frozen
filesystems regardless of unexpected state/alignment issues?

Cheers,

Dave.


-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared
  2022-06-26 22:03 [PATCHSET 0/3] xfs: random fixes for 5.19-rc5 Darrick J. Wong
@ 2022-06-26 22:04 ` Darrick J. Wong
  2022-06-27  1:37   ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2022-06-26 22:04 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <djwong@kernel.org>

On a system with a realtime volume and a 28k realtime extent,
generic/491 fails because the test opens a file on a frozen filesystem
and closing it causes xfs_release -> xfs_can_free_eofblocks to
mistakenly think that the the blocks of the realtime extent beyond EOF
are posteof blocks to be freed.  Realtime extents cannot be partially
unmapped, so this is pointless.  Worse yet, this triggers posteof
cleanup, which stalls on a transaction allocation, which is why the test
fails.

Teach the predicate to account for realtime extents properly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_bmap_util.c |    2 ++
 1 file changed, 2 insertions(+)


diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 52be58372c63..85e1a26c92e8 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -686,6 +686,8 @@ xfs_can_free_eofblocks(
 	 * forever.
 	 */
 	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
+	if (XFS_IS_REALTIME_INODE(ip) && mp->m_sb.sb_rextsize > 1)
+		end_fsb = roundup_64(end_fsb, mp->m_sb.sb_rextsize);
 	last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
 	if (last_fsb <= end_fsb)
 		return false;


^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-06-29  7:26 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-27 21:35 [PATCHSET v2 0/3] xfs: random fixes for 5.19-rc5 Darrick J. Wong
2022-06-27 21:35 ` [PATCH 1/3] xfs: empty xattr leaf header blocks are not corruption Darrick J. Wong
2022-06-28  0:27   ` Dave Chinner
2022-06-27 21:35 ` [PATCH 2/3] xfs: don't hold xattr leaf buffers across transaction rolls Darrick J. Wong
2022-06-27 21:35 ` [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared Darrick J. Wong
2022-06-29  7:26   ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2022-06-26 22:03 [PATCHSET 0/3] xfs: random fixes for 5.19-rc5 Darrick J. Wong
2022-06-26 22:04 ` [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared Darrick J. Wong
2022-06-27  1:37   ` Dave Chinner
2022-06-27  3:57     ` Darrick J. Wong
2022-06-27  5:16       ` Dave Chinner
2022-06-27 20:59         ` Darrick J. Wong
2022-06-27 22:27           ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.