* [PATCHSET 0/3] xfs: random fixes for 5.19-rc5 @ 2022-06-26 22:03 Darrick J. Wong 2022-06-26 22:03 ` [PATCH 1/3] xfs: empty xattr leaf header blocks are not corruption Darrick J. Wong ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Darrick J. Wong @ 2022-06-26 22:03 UTC (permalink / raw) To: djwong; +Cc: linux-xfs, david, allison.henderson Hi all, This week, we're fixing a log recovery regression that appeared in 5.19-rc1 due to an incorrect verifier patch, and a problem where closing an O_RDONLY realtime file on a frozen fs incorrectly triggers eofblocks removal. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. --D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=xfs-fixes-5.19 --- fs/xfs/libxfs/xfs_attr.c | 38 ++++++--------------------- fs/xfs/libxfs/xfs_attr.h | 5 ---- fs/xfs/libxfs/xfs_attr_leaf.c | 57 ++++++++++++++++++++++++++++++++--------- fs/xfs/libxfs/xfs_attr_leaf.h | 3 +- fs/xfs/xfs_attr_item.c | 22 ---------------- fs/xfs/xfs_bmap_util.c | 2 + 6 files changed, 56 insertions(+), 71 deletions(-) ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/3] xfs: empty xattr leaf header blocks are not corruption 2022-06-26 22:03 [PATCHSET 0/3] xfs: random fixes for 5.19-rc5 Darrick J. Wong @ 2022-06-26 22:03 ` Darrick J. Wong 2022-06-27 1:16 ` Dave Chinner 2022-06-26 22:03 ` [PATCH 2/3] xfs: don't hold xattr leaf buffers across transaction rolls Darrick J. Wong 2022-06-26 22:04 ` [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared Darrick J. Wong 2 siblings, 1 reply; 14+ messages in thread From: Darrick J. Wong @ 2022-06-26 22:03 UTC (permalink / raw) To: djwong; +Cc: linux-xfs, david, allison.henderson From: Darrick J. Wong <djwong@kernel.org> Every now and then we get a corruption report from the kernel or xfs_repair about empty leaf blocks in the extended attribute structure. We've long thought that these shouldn't be possible, but prior to 5.18 one would shake loose in the recoveryloop fstests about once a month. A new addition to the xattr leaf block verifier in 5.19-rc1 makes this happen every 7 minutes on my testing cloud. I added a ton of logging to detect any time we set the header count on an xattr leaf block to zero. This produced the following dmesg output on generic/388: XFS (sda4): ino 0x21fcbaf leaf 0x129bf78 hdcount==0! Call Trace: <TASK> dump_stack_lvl+0x34/0x44 xfs_attr3_leaf_create+0x187/0x230 xfs_attr_shortform_to_leaf+0xd1/0x2f0 xfs_attr_set_iter+0x73e/0xa90 xfs_xattri_finish_update+0x45/0x80 xfs_attr_finish_item+0x1b/0xd0 xfs_defer_finish_noroll+0x19c/0x770 __xfs_trans_commit+0x153/0x3e0 xfs_attr_set+0x36b/0x740 xfs_xattr_set+0x89/0xd0 __vfs_setxattr+0x67/0x80 __vfs_setxattr_noperm+0x6e/0x120 vfs_setxattr+0x97/0x180 setxattr+0x88/0xa0 path_setxattr+0xc3/0xe0 __x64_sys_setxattr+0x27/0x30 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 So now we know that someone is creating empty xattr leaf blocks as part of converting a sf xattr structure into a leaf xattr structure. The conversion routine logs any existing sf attributes in the same transaction that creates the leaf block, so we know this is a setxattr to a file that has no attributes at all. Next, g/388 calls the shutdown ioctl and cycles the mount to trigger log recovery. I also augmented buffer item recovery to call ->verify_struct on any attr leaf blocks and complain if it finds a failure: XFS (sda4): Unmounting Filesystem XFS (sda4): Mounting V5 Filesystem XFS (sda4): Starting recovery (logdev: internal) XFS (sda4): xattr leaf daddr 0x129bf78 hdrcount == 0! Call Trace: <TASK> dump_stack_lvl+0x34/0x44 xfs_attr3_leaf_verify+0x3b8/0x420 xlog_recover_buf_commit_pass2+0x60a/0x6c0 xlog_recover_items_pass2+0x4e/0xc0 xlog_recover_commit_trans+0x33c/0x350 xlog_recovery_process_trans+0xa5/0xe0 xlog_recover_process_data+0x8d/0x140 xlog_do_recovery_pass+0x19b/0x720 xlog_do_log_recovery+0x62/0xc0 xlog_do_recover+0x33/0x1d0 xlog_recover+0xda/0x190 xfs_log_mount+0x14c/0x360 xfs_mountfs+0x517/0xa60 xfs_fs_fill_super+0x6bc/0x950 get_tree_bdev+0x175/0x280 vfs_get_tree+0x1a/0x80 path_mount+0x6f5/0xaa0 __x64_sys_mount+0x103/0x140 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7fc61e241eae And a moment later, the _delwri_submit of the recovered buffers trips the same verifier and recovery fails: XFS (sda4): Metadata corruption detected at xfs_attr3_leaf_verify+0x393/0x420 [xfs], xfs_attr3_leaf block 0x129bf78 XFS (sda4): Unmount and run xfs_repair XFS (sda4): First 128 bytes of corrupted metadata buffer: 00000000: 00 00 00 00 00 00 00 00 3b ee 00 00 00 00 00 00 ........;....... 00000010: 00 00 00 00 01 29 bf 78 00 00 00 00 00 00 00 00 .....).x........ 00000020: a5 1b d0 02 b2 9a 49 df 8e 9c fb 8d f8 31 3e 9d ......I......1>. 00000030: 00 00 00 00 02 1f cb af 00 00 00 00 10 00 00 00 ................ 00000040: 00 50 0f b0 00 00 00 00 00 00 00 00 00 00 00 00 .P.............. 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ XFS (sda4): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0x37f/0x3b0 [xfs] (fs/xfs/xfs_buf.c:1518). Shutting down filesystem. XFS (sda4): Please unmount the filesystem and rectify the problem(s) XFS (sda4): log mount/recovery failed: error -117 XFS (sda4): log mount failed I think I see what's going on here -- setxattr is racing with something that shuts down the filesystem: Thread 1 Thread 2 -------- -------- xfs_attr_sf_addname xfs_attr_shortform_to_leaf <create empty leaf> xfs_trans_bhold(leaf) xattri_dela_state = XFS_DAS_LEAF_ADD <roll transaction> <flush log> <shut down filesystem> xfs_trans_bhold_release(leaf) <discover fs is dead, bail> Thread 3 -------- <cycle mount, start recovery> xlog_recover_buf_commit_pass2 xlog_recover_do_reg_buffer <replay empty leaf buffer from recovered buf item> xfs_buf_delwri_queue(leaf) xfs_buf_delwri_submit _xfs_buf_ioapply(leaf) xfs_attr3_leaf_write_verify <trip over empty leaf buffer> <fail recovery> As you can see, the bhold keeps the leaf buffer locked and thus prevents the *AIL* from tripping over the ichdr.count==0 check in the write verifier. Unfortunately, it doesn't prevent the log from getting flushed to disk, which sets up log recovery to fail. So. It's clear that the kernel has always had the ability to persist attr leaf blocks with ichdr.count==0, which means that it's part of the ondisk format now. The original patch adding a check to the verifier was therefore not correct, and the interim patches neutering the check did not fix the problem completely. Removing it was the correct solution, but then the check was added back in 51e6104fdb95. We clearly suck at this, so this time replace the incorrect check with a tombstone comment laying out the entire sad history. Looking at the rest of the xattr code, it seems that files with empty leaf blocks behave as expected -- listxattr reports no attributes; getxattr on any xattr returns nothing as expected; removexattr does nothing; and setxattr can add attributes just fine. Original-bug: 517c22207b04 ("xfs: add CRCs to attr leaf blocks") Still-not-fixed: 2e1d23370e75 ("xfs: ignore leaf attr ichdr.count in verifier during log replay") Removed-in: f28cef9e4dac ("xfs: don't fail verifier on empty attr3 leaf block") Fixes: 51e6104fdb95 ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify") Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_attr_leaf.c | 48 ++++++++++++++++++++++++++++++++++++----- 1 file changed, 42 insertions(+), 6 deletions(-) diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c index 37e7c33f6283..be7c216ec8f2 100644 --- a/fs/xfs/libxfs/xfs_attr_leaf.c +++ b/fs/xfs/libxfs/xfs_attr_leaf.c @@ -311,13 +311,49 @@ xfs_attr3_leaf_verify( return fa; /* - * Empty leaf blocks should never occur; they imply the existence of a - * software bug that needs fixing. xfs_repair also flags them as a - * corruption that needs fixing, so we should never let these go to - * disk. + * Empty leaf blocks can occur under the following circumstances: + * + * 1. setxattr adds a new extended attribute to a file; + * 2. The file has zero existing attributes; + * 3. The attribute is too large to fit in the attribute fork; + * 4. The attribute is small enough to fit in a leaf block; + * 5. A log flush occurs after committing the transaction that creates + * the (empty) leaf block; and + * 6. The filesystem goes down after the log flush but before the new + * attribute can be committed to the leaf block. + * + * xfs_repair used to flag these empty leaf blocks as corruption, but + * aside from wasting space, they are benign. The rest of the xattr + * code will happily add attributes to empty leaf blocks. Hence this + * comment serves as a tombstone to incorrect verifier code. + * + * Unfortunately, this check has been added and removed multiple times + * throughout history. It first appeared in[1] kernel 3.10 as part of + * the early V5 format patches. The check was later discovered to + * break log recovery and hence disabled[2] during log recovery in + * kernel 4.10. Simultaneously, the check was added[3] to xfs_repair + * 4.9.0 to try to weed out the empty leaf blocks. This was still not + * correct because log recovery would recover an empty attr leaf block + * successfully only for regular xattr operations to trip over the + * empty block during of the block during regular operation. + * Therefore, the check was removed entirely[4] in kernel 5.7 but + * removal of the xfs_repair check was forgotten. The continued + * complaints from xfs_repair lead to us mistakenly re-adding[5] the + * verifier check for kernel 5.19, and has been removed once again. + * + * [1] 517c22207b04 ("xfs: add CRCs to attr leaf blocks") + * [2] 2e1d23370e75 ("xfs: ignore leaf attr ichdr.count in verifier + * during log replay") + * [3] f7140161 ("xfs_repair: junk leaf attribute if count == 0") + * [4] f28cef9e4dac ("xfs: don't fail verifier on empty attr3 leaf + * block") + * [5] 51e6104fdb95 ("xfs: detect empty attr leaf blocks in + * xfs_attr3_leaf_verify") + * + * Normally this would go in the commit message, but as we've a history + * of getting this wrong, this now goes in the code base as a gigantic + * comment. */ - if (ichdr.count == 0) - return __this_address; /* * firstused is the block offset of the first name info structure. ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] xfs: empty xattr leaf header blocks are not corruption 2022-06-26 22:03 ` [PATCH 1/3] xfs: empty xattr leaf header blocks are not corruption Darrick J. Wong @ 2022-06-27 1:16 ` Dave Chinner 2022-06-27 3:59 ` Darrick J. Wong 0 siblings, 1 reply; 14+ messages in thread From: Dave Chinner @ 2022-06-27 1:16 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs, allison.henderson On Sun, Jun 26, 2022 at 03:03:53PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@kernel.org> > > Every now and then we get a corruption report from the kernel or > xfs_repair about empty leaf blocks in the extended attribute structure. > We've long thought that these shouldn't be possible, but prior to 5.18 > one would shake loose in the recoveryloop fstests about once a month. > > A new addition to the xattr leaf block verifier in 5.19-rc1 makes this > happen every 7 minutes on my testing cloud. Ok, so this is all just a long way of saying: Revert commit 51e6104fdb95 ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify") because it was wrong. Yes? > Original-bug: 517c22207b04 ("xfs: add CRCs to attr leaf blocks") > Still-not-fixed: 2e1d23370e75 ("xfs: ignore leaf attr ichdr.count in verifier during log replay") > Removed-in: f28cef9e4dac ("xfs: don't fail verifier on empty attr3 leaf block") > Fixes: 51e6104fdb95 ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify") > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > --- > fs/xfs/libxfs/xfs_attr_leaf.c | 48 ++++++++++++++++++++++++++++++++++++----- > 1 file changed, 42 insertions(+), 6 deletions(-) > > > diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c > index 37e7c33f6283..be7c216ec8f2 100644 > --- a/fs/xfs/libxfs/xfs_attr_leaf.c > +++ b/fs/xfs/libxfs/xfs_attr_leaf.c > @@ -311,13 +311,49 @@ xfs_attr3_leaf_verify( > return fa; > > /* > - * Empty leaf blocks should never occur; they imply the existence of a > - * software bug that needs fixing. xfs_repair also flags them as a > - * corruption that needs fixing, so we should never let these go to > - * disk. > + * Empty leaf blocks can occur under the following circumstances: > + * > + * 1. setxattr adds a new extended attribute to a file; > + * 2. The file has zero existing attributes; > + * 3. The attribute is too large to fit in the attribute fork; > + * 4. The attribute is small enough to fit in a leaf block; > + * 5. A log flush occurs after committing the transaction that creates > + * the (empty) leaf block; and > + * 6. The filesystem goes down after the log flush but before the new > + * attribute can be committed to the leaf block. > + * > + * xfs_repair used to flag these empty leaf blocks as corruption, but > + * aside from wasting space, they are benign. The rest of the xattr > + * code will happily add attributes to empty leaf blocks. Hence this > + * comment serves as a tombstone to incorrect verifier code. > + * > + * Unfortunately, this check has been added and removed multiple times > + * throughout history. It first appeared in[1] kernel 3.10 as part of > + * the early V5 format patches. The check was later discovered to > + * break log recovery and hence disabled[2] during log recovery in > + * kernel 4.10. Simultaneously, the check was added[3] to xfs_repair > + * 4.9.0 to try to weed out the empty leaf blocks. This was still not > + * correct because log recovery would recover an empty attr leaf block > + * successfully only for regular xattr operations to trip over the > + * empty block during of the block during regular operation. > + * Therefore, the check was removed entirely[4] in kernel 5.7 but > + * removal of the xfs_repair check was forgotten. The continued > + * complaints from xfs_repair lead to us mistakenly re-adding[5] the > + * verifier check for kernel 5.19, and has been removed once again. > + * > + * [1] 517c22207b04 ("xfs: add CRCs to attr leaf blocks") > + * [2] 2e1d23370e75 ("xfs: ignore leaf attr ichdr.count in verifier > + * during log replay") > + * [3] f7140161 ("xfs_repair: junk leaf attribute if count == 0") > + * [4] f28cef9e4dac ("xfs: don't fail verifier on empty attr3 leaf > + * block") > + * [5] 51e6104fdb95 ("xfs: detect empty attr leaf blocks in > + * xfs_attr3_leaf_verify") > + * > + * Normally this would go in the commit message, but as we've a history > + * of getting this wrong, this now goes in the code base as a gigantic > + * comment. > */ I still think it should be in the commit history, not here in the code. The reason I missed this is that the existing comment about empty leaf attrs is above a section that is verifying entries *after* various fields in the header had been validated. Hence I thought it was a case that the header field had not been validated and so I added it. Simple mistake. This needs to be a comment at the head of the function, not associated with validity checking the entries. i.e. /* * Validate the attribute leaf. * * A leaf block can be empty as a result of transient state whilst * creating a new leaf form attribute: * * 1. setxattr adds a new extended attribute to a file; * 2. The file has zero existing attributes; * 3. The attribute is too large to fit in the attribute fork; * 4. The attribute is small enough to fit in a leaf block; * 5. A log flush occurs after committing the transaction that creates * the (empty) leaf block; and * 6. The filesystem goes down after the log flush but before the new * attribute can be committed to the leaf block. * * Hence we need to ensure that we don't fail the validation purely * because the leaf is empty. */ Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] xfs: empty xattr leaf header blocks are not corruption 2022-06-27 1:16 ` Dave Chinner @ 2022-06-27 3:59 ` Darrick J. Wong 0 siblings, 0 replies; 14+ messages in thread From: Darrick J. Wong @ 2022-06-27 3:59 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs, allison.henderson On Mon, Jun 27, 2022 at 11:16:54AM +1000, Dave Chinner wrote: > On Sun, Jun 26, 2022 at 03:03:53PM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong <djwong@kernel.org> > > > > Every now and then we get a corruption report from the kernel or > > xfs_repair about empty leaf blocks in the extended attribute structure. > > We've long thought that these shouldn't be possible, but prior to 5.18 > > one would shake loose in the recoveryloop fstests about once a month. > > > > A new addition to the xattr leaf block verifier in 5.19-rc1 makes this > > happen every 7 minutes on my testing cloud. > > Ok, so this is all just a long way of saying: > > Revert commit 51e6104fdb95 ("xfs: detect empty attr leaf blocks in > xfs_attr3_leaf_verify") because it was wrong. > > Yes? Yep. > > Original-bug: 517c22207b04 ("xfs: add CRCs to attr leaf blocks") > > Still-not-fixed: 2e1d23370e75 ("xfs: ignore leaf attr ichdr.count in verifier during log replay") > > Removed-in: f28cef9e4dac ("xfs: don't fail verifier on empty attr3 leaf block") > > Fixes: 51e6104fdb95 ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify") > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > > --- > > fs/xfs/libxfs/xfs_attr_leaf.c | 48 ++++++++++++++++++++++++++++++++++++----- > > 1 file changed, 42 insertions(+), 6 deletions(-) > > > > > > diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c > > index 37e7c33f6283..be7c216ec8f2 100644 > > --- a/fs/xfs/libxfs/xfs_attr_leaf.c > > +++ b/fs/xfs/libxfs/xfs_attr_leaf.c > > @@ -311,13 +311,49 @@ xfs_attr3_leaf_verify( > > return fa; > > > > /* > > - * Empty leaf blocks should never occur; they imply the existence of a > > - * software bug that needs fixing. xfs_repair also flags them as a > > - * corruption that needs fixing, so we should never let these go to > > - * disk. > > + * Empty leaf blocks can occur under the following circumstances: > > + * > > + * 1. setxattr adds a new extended attribute to a file; > > + * 2. The file has zero existing attributes; > > + * 3. The attribute is too large to fit in the attribute fork; > > + * 4. The attribute is small enough to fit in a leaf block; > > + * 5. A log flush occurs after committing the transaction that creates > > + * the (empty) leaf block; and > > + * 6. The filesystem goes down after the log flush but before the new > > + * attribute can be committed to the leaf block. > > + * > > + * xfs_repair used to flag these empty leaf blocks as corruption, but > > + * aside from wasting space, they are benign. The rest of the xattr > > + * code will happily add attributes to empty leaf blocks. Hence this > > + * comment serves as a tombstone to incorrect verifier code. > > + * > > + * Unfortunately, this check has been added and removed multiple times > > + * throughout history. It first appeared in[1] kernel 3.10 as part of > > + * the early V5 format patches. The check was later discovered to > > + * break log recovery and hence disabled[2] during log recovery in > > + * kernel 4.10. Simultaneously, the check was added[3] to xfs_repair > > + * 4.9.0 to try to weed out the empty leaf blocks. This was still not > > + * correct because log recovery would recover an empty attr leaf block > > + * successfully only for regular xattr operations to trip over the > > + * empty block during of the block during regular operation. > > + * Therefore, the check was removed entirely[4] in kernel 5.7 but > > + * removal of the xfs_repair check was forgotten. The continued > > + * complaints from xfs_repair lead to us mistakenly re-adding[5] the > > + * verifier check for kernel 5.19, and has been removed once again. > > + * > > + * [1] 517c22207b04 ("xfs: add CRCs to attr leaf blocks") > > + * [2] 2e1d23370e75 ("xfs: ignore leaf attr ichdr.count in verifier > > + * during log replay") > > + * [3] f7140161 ("xfs_repair: junk leaf attribute if count == 0") > > + * [4] f28cef9e4dac ("xfs: don't fail verifier on empty attr3 leaf > > + * block") > > + * [5] 51e6104fdb95 ("xfs: detect empty attr leaf blocks in > > + * xfs_attr3_leaf_verify") > > + * > > + * Normally this would go in the commit message, but as we've a history > > + * of getting this wrong, this now goes in the code base as a gigantic > > + * comment. > > */ > > I still think it should be in the commit history, not here in the > code. The reason I missed this is that the existing comment about > empty leaf attrs is above a section that is verifying entries > *after* various fields in the header had been validated. Hence I > thought it was a case that the header field had not been validated > and so I added it. Simple mistake. <nod> ...and I wanted redundant breadcrumbs in the commit history and the code itself because you and I keep making the same mistakes. :/ > This needs to be a comment at the head of the function, not > associated with validity checking the entries. i.e. > > /* > * Validate the attribute leaf. > * > * A leaf block can be empty as a result of transient state whilst > * creating a new leaf form attribute: > * > * 1. setxattr adds a new extended attribute to a file; > * 2. The file has zero existing attributes; > * 3. The attribute is too large to fit in the attribute fork; > * 4. The attribute is small enough to fit in a leaf block; > * 5. A log flush occurs after committing the transaction that creates > * the (empty) leaf block; and > * 6. The filesystem goes down after the log flush but before the new > * attribute can be committed to the leaf block. > * > * Hence we need to ensure that we don't fail the validation purely > * because the leaf is empty. > */ Ok done. --D > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/3] xfs: don't hold xattr leaf buffers across transaction rolls 2022-06-26 22:03 [PATCHSET 0/3] xfs: random fixes for 5.19-rc5 Darrick J. Wong 2022-06-26 22:03 ` [PATCH 1/3] xfs: empty xattr leaf header blocks are not corruption Darrick J. Wong @ 2022-06-26 22:03 ` Darrick J. Wong 2022-06-27 1:23 ` Dave Chinner 2022-06-26 22:04 ` [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared Darrick J. Wong 2 siblings, 1 reply; 14+ messages in thread From: Darrick J. Wong @ 2022-06-26 22:03 UTC (permalink / raw) To: djwong; +Cc: linux-xfs, david, allison.henderson From: Darrick J. Wong <djwong@kernel.org> Now that we've established (again!) that empty xattr leaf buffers are ok, we no longer need to bhold them to transactions when we're creating new leaf blocks. Get rid of the entire mechanism, which should simplify the xattr code quite a bit. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_attr.c | 38 +++++++++----------------------------- fs/xfs/libxfs/xfs_attr.h | 5 ----- fs/xfs/libxfs/xfs_attr_leaf.c | 9 ++------- fs/xfs/libxfs/xfs_attr_leaf.h | 3 +-- fs/xfs/xfs_attr_item.c | 22 ---------------------- 5 files changed, 12 insertions(+), 65 deletions(-) diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c index 1824f61621a2..224649a76cbb 100644 --- a/fs/xfs/libxfs/xfs_attr.c +++ b/fs/xfs/libxfs/xfs_attr.c @@ -50,7 +50,7 @@ STATIC int xfs_attr_shortform_addname(xfs_da_args_t *args); STATIC int xfs_attr_leaf_get(xfs_da_args_t *args); STATIC int xfs_attr_leaf_removename(xfs_da_args_t *args); STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp); -STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp); +STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args); /* * Internal routines when attribute list is more than one block. @@ -393,16 +393,10 @@ xfs_attr_sf_addname( * It won't fit in the shortform, transform to a leaf block. GROT: * another possible req'mt for a double-split btree op. */ - error = xfs_attr_shortform_to_leaf(args, &attr->xattri_leaf_bp); + error = xfs_attr_shortform_to_leaf(args); if (error) return error; - /* - * Prevent the leaf buffer from being unlocked so that a concurrent AIL - * push cannot grab the half-baked leaf buffer and run into problems - * with the write verifier. - */ - xfs_trans_bhold(args->trans, attr->xattri_leaf_bp); attr->xattri_dela_state = XFS_DAS_LEAF_ADD; out: trace_xfs_attr_sf_addname_return(attr->xattri_dela_state, args->dp); @@ -447,11 +441,9 @@ xfs_attr_leaf_addname( /* * Use the leaf buffer we may already hold locked as a result of - * a sf-to-leaf conversion. The held buffer is no longer valid - * after this call, regardless of the result. + * a sf-to-leaf conversion. */ - error = xfs_attr_leaf_try_add(args, attr->xattri_leaf_bp); - attr->xattri_leaf_bp = NULL; + error = xfs_attr_leaf_try_add(args); if (error == -ENOSPC) { error = xfs_attr3_leaf_to_node(args); @@ -497,8 +489,6 @@ xfs_attr_node_addname( struct xfs_da_args *args = attr->xattri_da_args; int error; - ASSERT(!attr->xattri_leaf_bp); - error = xfs_attr_node_addname_find_attr(attr); if (error) return error; @@ -1215,24 +1205,14 @@ xfs_attr_restore_rmt_blk( */ STATIC int xfs_attr_leaf_try_add( - struct xfs_da_args *args, - struct xfs_buf *bp) + struct xfs_da_args *args) { + struct xfs_buf *bp; int error; - /* - * If the caller provided a buffer to us, it is locked and held in - * the transaction because it just did a shortform to leaf conversion. - * Hence we don't need to read it again. Otherwise read in the leaf - * buffer. - */ - if (bp) { - xfs_trans_bhold_release(args->trans, bp); - } else { - error = xfs_attr3_leaf_read(args->trans, args->dp, 0, &bp); - if (error) - return error; - } + error = xfs_attr3_leaf_read(args->trans, args->dp, 0, &bp); + if (error) + return error; /* * Look up the xattr name to set the insertion point for the new xattr. diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h index b4a2fc77017e..dfb47fa63c6d 100644 --- a/fs/xfs/libxfs/xfs_attr.h +++ b/fs/xfs/libxfs/xfs_attr.h @@ -515,11 +515,6 @@ struct xfs_attr_intent { */ struct xfs_attri_log_nameval *xattri_nameval; - /* - * Used by xfs_attr_set to hold a leaf buffer across a transaction roll - */ - struct xfs_buf *xattri_leaf_bp; - /* Used to keep track of current state of delayed operation */ enum xfs_delattr_state xattri_dela_state; diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c index be7c216ec8f2..997b3f3a9b94 100644 --- a/fs/xfs/libxfs/xfs_attr_leaf.c +++ b/fs/xfs/libxfs/xfs_attr_leaf.c @@ -958,14 +958,10 @@ xfs_attr_shortform_getvalue( return -ENOATTR; } -/* - * Convert from using the shortform to the leaf. On success, return the - * buffer so that we can keep it locked until we're totally done with it. - */ +/* Convert from using the shortform to the leaf format. */ int xfs_attr_shortform_to_leaf( - struct xfs_da_args *args, - struct xfs_buf **leaf_bp) + struct xfs_da_args *args) { struct xfs_inode *dp; struct xfs_attr_shortform *sf; @@ -1027,7 +1023,6 @@ xfs_attr_shortform_to_leaf( sfe = xfs_attr_sf_nextentry(sfe); } error = 0; - *leaf_bp = bp; out: kmem_free(tmpbuffer); return error; diff --git a/fs/xfs/libxfs/xfs_attr_leaf.h b/fs/xfs/libxfs/xfs_attr_leaf.h index efa757f1e912..368f4d9fa1d5 100644 --- a/fs/xfs/libxfs/xfs_attr_leaf.h +++ b/fs/xfs/libxfs/xfs_attr_leaf.h @@ -49,8 +49,7 @@ void xfs_attr_shortform_create(struct xfs_da_args *args); void xfs_attr_shortform_add(struct xfs_da_args *args, int forkoff); int xfs_attr_shortform_lookup(struct xfs_da_args *args); int xfs_attr_shortform_getvalue(struct xfs_da_args *args); -int xfs_attr_shortform_to_leaf(struct xfs_da_args *args, - struct xfs_buf **leaf_bp); +int xfs_attr_shortform_to_leaf(struct xfs_da_args *args); int xfs_attr_sf_removename(struct xfs_da_args *args); int xfs_attr_sf_findname(struct xfs_da_args *args, struct xfs_attr_sf_entry **sfep, diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c index 6ee905a09eb2..5077a7ad5646 100644 --- a/fs/xfs/xfs_attr_item.c +++ b/fs/xfs/xfs_attr_item.c @@ -455,8 +455,6 @@ static inline void xfs_attr_free_item( struct xfs_attr_intent *attr) { - ASSERT(attr->xattri_leaf_bp == NULL); - if (attr->xattri_da_state) xfs_da_state_free(attr->xattri_da_state); xfs_attri_log_nameval_put(attr->xattri_nameval); @@ -511,10 +509,6 @@ xfs_attr_cancel_item( struct xfs_attr_intent *attr; attr = container_of(item, struct xfs_attr_intent, xattri_list); - if (attr->xattri_leaf_bp) { - xfs_buf_relse(attr->xattri_leaf_bp); - attr->xattri_leaf_bp = NULL; - } xfs_attr_free_item(attr); } @@ -672,16 +666,6 @@ xfs_attri_item_recover( if (error) goto out_unlock; - /* - * The defer capture structure took its own reference to the - * attr leaf buffer and will give that to the continuation - * transaction. The attr intent struct drives the continuation - * work, so release our refcount on the attr leaf buffer but - * retain the pointer in the intent structure. - */ - if (attr->xattri_leaf_bp) - xfs_buf_relse(attr->xattri_leaf_bp); - xfs_iunlock(ip, XFS_ILOCK_EXCL); xfs_irele(ip); return 0; @@ -692,13 +676,7 @@ xfs_attri_item_recover( } error = xfs_defer_ops_capture_and_commit(tp, capture_list); - out_unlock: - if (attr->xattri_leaf_bp) { - xfs_buf_relse(attr->xattri_leaf_bp); - attr->xattri_leaf_bp = NULL; - } - xfs_iunlock(ip, XFS_ILOCK_EXCL); xfs_irele(ip); out: ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] xfs: don't hold xattr leaf buffers across transaction rolls 2022-06-26 22:03 ` [PATCH 2/3] xfs: don't hold xattr leaf buffers across transaction rolls Darrick J. Wong @ 2022-06-27 1:23 ` Dave Chinner 2022-06-27 3:46 ` Darrick J. Wong 0 siblings, 1 reply; 14+ messages in thread From: Dave Chinner @ 2022-06-27 1:23 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs, allison.henderson On Sun, Jun 26, 2022 at 03:03:58PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@kernel.org> > > Now that we've established (again!) that empty xattr leaf buffers are > ok, we no longer need to bhold them to transactions when we're creating > new leaf blocks. Get rid of the entire mechanism, which should simplify > the xattr code quite a bit. > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> Why? This code isn't there for correctness - it's just a way of avoiding needing to look up and lock the buffer immediately after we just created it and had a reference to it. This is a valid use of xfs_trans_bhold(), so I'm not convinced that removing it makes the code better. Simpler, yes, but not necessarily better. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] xfs: don't hold xattr leaf buffers across transaction rolls 2022-06-27 1:23 ` Dave Chinner @ 2022-06-27 3:46 ` Darrick J. Wong 2022-06-27 5:10 ` Dave Chinner 0 siblings, 1 reply; 14+ messages in thread From: Darrick J. Wong @ 2022-06-27 3:46 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs, allison.henderson On Mon, Jun 27, 2022 at 11:23:55AM +1000, Dave Chinner wrote: > On Sun, Jun 26, 2022 at 03:03:58PM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong <djwong@kernel.org> > > > > Now that we've established (again!) that empty xattr leaf buffers are > > ok, we no longer need to bhold them to transactions when we're creating > > new leaf blocks. Get rid of the entire mechanism, which should simplify > > the xattr code quite a bit. > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > > Why? The original justification for using bhold here was to prevent the AIL from trying to write the empty leaf block into the fs during the brief time that we release the buffer lock. The reason for /that/ was to prevent recovery from tripping over the empty ondisk block. The bhold didn't totally solve that problem (hence removing the !count check entirely 3 years later) but it /has/ made things sufficiently more complicated that the resident expert (allison) and two maintainers (you and I) tripped over the bheld leaf buffer handling... > This code isn't there for correctness - it's just a way of > avoiding needing to look up and lock the buffer immediately after we > just created it and had a reference to it. This is a valid use of > xfs_trans_bhold(), so I'm not convinced that removing it makes the > code better. Simpler, yes, but not necessarily better. ...so while I agree that this is a valid use of bhold, I also see that the three people you'd most expect to wield the extra complexity have not done it well, and decided to simplify the buffer usage back to a more common use case in xfs. --D > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] xfs: don't hold xattr leaf buffers across transaction rolls 2022-06-27 3:46 ` Darrick J. Wong @ 2022-06-27 5:10 ` Dave Chinner 0 siblings, 0 replies; 14+ messages in thread From: Dave Chinner @ 2022-06-27 5:10 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs, allison.henderson On Sun, Jun 26, 2022 at 08:46:52PM -0700, Darrick J. Wong wrote: > On Mon, Jun 27, 2022 at 11:23:55AM +1000, Dave Chinner wrote: > > On Sun, Jun 26, 2022 at 03:03:58PM -0700, Darrick J. Wong wrote: > > > From: Darrick J. Wong <djwong@kernel.org> > > > > > > Now that we've established (again!) that empty xattr leaf buffers are > > > ok, we no longer need to bhold them to transactions when we're creating > > > new leaf blocks. Get rid of the entire mechanism, which should simplify > > > the xattr code quite a bit. > > > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > > > > Why? > > The original justification for using bhold here was to prevent the AIL > from trying to write the empty leaf block into the fs during the brief > time that we release the buffer lock. The reason for /that/ was to > prevent recovery from tripping over the empty ondisk block. Ok. Reviewed-by: Dave Chinner <dchinner@redhat.com> -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared 2022-06-26 22:03 [PATCHSET 0/3] xfs: random fixes for 5.19-rc5 Darrick J. Wong 2022-06-26 22:03 ` [PATCH 1/3] xfs: empty xattr leaf header blocks are not corruption Darrick J. Wong 2022-06-26 22:03 ` [PATCH 2/3] xfs: don't hold xattr leaf buffers across transaction rolls Darrick J. Wong @ 2022-06-26 22:04 ` Darrick J. Wong 2022-06-27 1:37 ` Dave Chinner 2 siblings, 1 reply; 14+ messages in thread From: Darrick J. Wong @ 2022-06-26 22:04 UTC (permalink / raw) To: djwong; +Cc: linux-xfs, david, allison.henderson From: Darrick J. Wong <djwong@kernel.org> On a system with a realtime volume and a 28k realtime extent, generic/491 fails because the test opens a file on a frozen filesystem and closing it causes xfs_release -> xfs_can_free_eofblocks to mistakenly think that the the blocks of the realtime extent beyond EOF are posteof blocks to be freed. Realtime extents cannot be partially unmapped, so this is pointless. Worse yet, this triggers posteof cleanup, which stalls on a transaction allocation, which is why the test fails. Teach the predicate to account for realtime extents properly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_bmap_util.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 52be58372c63..85e1a26c92e8 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -686,6 +686,8 @@ xfs_can_free_eofblocks( * forever. */ end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip)); + if (XFS_IS_REALTIME_INODE(ip) && mp->m_sb.sb_rextsize > 1) + end_fsb = roundup_64(end_fsb, mp->m_sb.sb_rextsize); last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes); if (last_fsb <= end_fsb) return false; ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared 2022-06-26 22:04 ` [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared Darrick J. Wong @ 2022-06-27 1:37 ` Dave Chinner 2022-06-27 3:57 ` Darrick J. Wong 0 siblings, 1 reply; 14+ messages in thread From: Dave Chinner @ 2022-06-27 1:37 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs, allison.henderson On Sun, Jun 26, 2022 at 03:04:04PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@kernel.org> > > On a system with a realtime volume and a 28k realtime extent, > generic/491 fails because the test opens a file on a frozen filesystem > and closing it causes xfs_release -> xfs_can_free_eofblocks to > mistakenly think that the the blocks of the realtime extent beyond EOF > are posteof blocks to be freed. Realtime extents cannot be partially > unmapped, so this is pointless. Worse yet, this triggers posteof > cleanup, which stalls on a transaction allocation, which is why the test > fails. > > Teach the predicate to account for realtime extents properly. > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > --- > fs/xfs/xfs_bmap_util.c | 2 ++ > 1 file changed, 2 insertions(+) > > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c > index 52be58372c63..85e1a26c92e8 100644 > --- a/fs/xfs/xfs_bmap_util.c > +++ b/fs/xfs/xfs_bmap_util.c > @@ -686,6 +686,8 @@ xfs_can_free_eofblocks( > * forever. > */ > end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip)); > + if (XFS_IS_REALTIME_INODE(ip) && mp->m_sb.sb_rextsize > 1) > + end_fsb = roundup_64(end_fsb, mp->m_sb.sb_rextsize); > last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes); > if (last_fsb <= end_fsb) > return false; Ok, that works. However, I was looking at xfs_can_free_eofblocks() w.r.t. freeze a couple of days ago and wondering why there isn't a freeze/RO state check in xfs_can_free_eofblocks(). Shouldn't we have one here so that we never try to run xfs_free_eofblocks() on RO/frozen filesystems regardless of unexpected state/alignment issues? Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared 2022-06-27 1:37 ` Dave Chinner @ 2022-06-27 3:57 ` Darrick J. Wong 2022-06-27 5:16 ` Dave Chinner 0 siblings, 1 reply; 14+ messages in thread From: Darrick J. Wong @ 2022-06-27 3:57 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs, allison.henderson On Mon, Jun 27, 2022 at 11:37:31AM +1000, Dave Chinner wrote: > On Sun, Jun 26, 2022 at 03:04:04PM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong <djwong@kernel.org> > > > > On a system with a realtime volume and a 28k realtime extent, > > generic/491 fails because the test opens a file on a frozen filesystem > > and closing it causes xfs_release -> xfs_can_free_eofblocks to > > mistakenly think that the the blocks of the realtime extent beyond EOF > > are posteof blocks to be freed. Realtime extents cannot be partially > > unmapped, so this is pointless. Worse yet, this triggers posteof > > cleanup, which stalls on a transaction allocation, which is why the test > > fails. > > > > Teach the predicate to account for realtime extents properly. > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > > --- > > fs/xfs/xfs_bmap_util.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c > > index 52be58372c63..85e1a26c92e8 100644 > > --- a/fs/xfs/xfs_bmap_util.c > > +++ b/fs/xfs/xfs_bmap_util.c > > @@ -686,6 +686,8 @@ xfs_can_free_eofblocks( > > * forever. > > */ > > end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip)); > > + if (XFS_IS_REALTIME_INODE(ip) && mp->m_sb.sb_rextsize > 1) > > + end_fsb = roundup_64(end_fsb, mp->m_sb.sb_rextsize); > > last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes); > > if (last_fsb <= end_fsb) > > return false; > > Ok, that works. > > However, I was looking at xfs_can_free_eofblocks() w.r.t. freeze a > couple of days ago and wondering why there isn't a freeze/RO state > check in xfs_can_free_eofblocks(). Shouldn't we have one here so > that we never try to run xfs_free_eofblocks() on RO/frozen > filesystems regardless of unexpected state/alignment issues? I asked myself that question too. I found three callers of this predicate: 1. fallocate, which should have obtained freeze protection 2. inodegc, which should never be running when we get to the innermost freeze protection level 3. xfs_release, which doesn't take freeze protection at all. Either it needs to take freeze protection so that xfs_free_eofblocks can't get stuck in xfs_trans_alloc, or we'd need to modify xfs_trans_alloc to sb_start_intwrite_trylock I don't really want to try to add (3) as part of a fix for 5.19, but I would like to get these fixes merged so I can concentrate on finding and fixing the file corruption problems that are still present in -rc4. If we want to engineer a freeze/ro state check later, we can do that too. So, can we move ahead with this fix? --D > Cheers, > > Dave. > > > -- > Dave Chinner > david@fromorbit.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared 2022-06-27 3:57 ` Darrick J. Wong @ 2022-06-27 5:16 ` Dave Chinner 2022-06-27 20:59 ` Darrick J. Wong 0 siblings, 1 reply; 14+ messages in thread From: Dave Chinner @ 2022-06-27 5:16 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs, allison.henderson On Sun, Jun 26, 2022 at 08:57:25PM -0700, Darrick J. Wong wrote: > On Mon, Jun 27, 2022 at 11:37:31AM +1000, Dave Chinner wrote: > > On Sun, Jun 26, 2022 at 03:04:04PM -0700, Darrick J. Wong wrote: > > > From: Darrick J. Wong <djwong@kernel.org> > > > > > > On a system with a realtime volume and a 28k realtime extent, > > > generic/491 fails because the test opens a file on a frozen filesystem > > > and closing it causes xfs_release -> xfs_can_free_eofblocks to > > > mistakenly think that the the blocks of the realtime extent beyond EOF > > > are posteof blocks to be freed. Realtime extents cannot be partially > > > unmapped, so this is pointless. Worse yet, this triggers posteof > > > cleanup, which stalls on a transaction allocation, which is why the test > > > fails. > > > > > > Teach the predicate to account for realtime extents properly. > > > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > > > --- > > > fs/xfs/xfs_bmap_util.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > > > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c > > > index 52be58372c63..85e1a26c92e8 100644 > > > --- a/fs/xfs/xfs_bmap_util.c > > > +++ b/fs/xfs/xfs_bmap_util.c > > > @@ -686,6 +686,8 @@ xfs_can_free_eofblocks( > > > * forever. > > > */ > > > end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip)); > > > + if (XFS_IS_REALTIME_INODE(ip) && mp->m_sb.sb_rextsize > 1) > > > + end_fsb = roundup_64(end_fsb, mp->m_sb.sb_rextsize); > > > last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes); > > > if (last_fsb <= end_fsb) > > > return false; > > > > Ok, that works. Reviewed-by: Dave Chinner <dchinner@redhat.com> > > > > However, I was looking at xfs_can_free_eofblocks() w.r.t. freeze a > > couple of days ago and wondering why there isn't a freeze/RO state > > check in xfs_can_free_eofblocks(). Shouldn't we have one here so > > that we never try to run xfs_free_eofblocks() on RO/frozen > > filesystems regardless of unexpected state/alignment issues? > > I asked myself that question too. I found three callers of this > predicate: > > 1. fallocate, which should have obtained freeze protection *nod* > 2. inodegc, which should never be running when we get to the innermost > freeze protection level So inodegc could still do IO here on a read-only fs? > 3. xfs_release, which doesn't take freeze protection at all. Either it > needs to take freeze protection so that xfs_free_eofblocks can't get > stuck in xfs_trans_alloc, or we'd need to modify xfs_trans_alloc to > sb_start_intwrite_trylock That looks to me like it is simply a case of replacing the !xfs_is_readonly() check in xfs_release() with a !xfs_fs_writeable(mp, SB_FREEZE_WRITE) check and we shouldn't have to touch anythign else, right? Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared 2022-06-27 5:16 ` Dave Chinner @ 2022-06-27 20:59 ` Darrick J. Wong 2022-06-27 22:27 ` Dave Chinner 0 siblings, 1 reply; 14+ messages in thread From: Darrick J. Wong @ 2022-06-27 20:59 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs, allison.henderson On Mon, Jun 27, 2022 at 03:16:49PM +1000, Dave Chinner wrote: > On Sun, Jun 26, 2022 at 08:57:25PM -0700, Darrick J. Wong wrote: > > On Mon, Jun 27, 2022 at 11:37:31AM +1000, Dave Chinner wrote: > > > On Sun, Jun 26, 2022 at 03:04:04PM -0700, Darrick J. Wong wrote: > > > > From: Darrick J. Wong <djwong@kernel.org> > > > > > > > > On a system with a realtime volume and a 28k realtime extent, > > > > generic/491 fails because the test opens a file on a frozen filesystem > > > > and closing it causes xfs_release -> xfs_can_free_eofblocks to > > > > mistakenly think that the the blocks of the realtime extent beyond EOF > > > > are posteof blocks to be freed. Realtime extents cannot be partially > > > > unmapped, so this is pointless. Worse yet, this triggers posteof > > > > cleanup, which stalls on a transaction allocation, which is why the test > > > > fails. > > > > > > > > Teach the predicate to account for realtime extents properly. > > > > > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > > > > --- > > > > fs/xfs/xfs_bmap_util.c | 2 ++ > > > > 1 file changed, 2 insertions(+) > > > > > > > > > > > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c > > > > index 52be58372c63..85e1a26c92e8 100644 > > > > --- a/fs/xfs/xfs_bmap_util.c > > > > +++ b/fs/xfs/xfs_bmap_util.c > > > > @@ -686,6 +686,8 @@ xfs_can_free_eofblocks( > > > > * forever. > > > > */ > > > > end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip)); > > > > + if (XFS_IS_REALTIME_INODE(ip) && mp->m_sb.sb_rextsize > 1) > > > > + end_fsb = roundup_64(end_fsb, mp->m_sb.sb_rextsize); > > > > last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes); > > > > if (last_fsb <= end_fsb) > > > > return false; > > > > > > Ok, that works. > > Reviewed-by: Dave Chinner <dchinner@redhat.com> > > > > > > > However, I was looking at xfs_can_free_eofblocks() w.r.t. freeze a > > > couple of days ago and wondering why there isn't a freeze/RO state > > > check in xfs_can_free_eofblocks(). Shouldn't we have one here so > > > that we never try to run xfs_free_eofblocks() on RO/frozen > > > filesystems regardless of unexpected state/alignment issues? > > > > I asked myself that question too. I found three callers of this > > predicate: > > > > 1. fallocate, which should have obtained freeze protection > > *nod* > > > 2. inodegc, which should never be running when we get to the innermost > > freeze protection level > > So inodegc could still do IO here on a read-only fs? Correct. > > 3. xfs_release, which doesn't take freeze protection at all. Either it > > needs to take freeze protection so that xfs_free_eofblocks can't get > > stuck in xfs_trans_alloc, or we'd need to modify xfs_trans_alloc to > > sb_start_intwrite_trylock > > That looks to me like it is simply a case of replacing the > !xfs_is_readonly() check in xfs_release() with a > !xfs_fs_writeable(mp, SB_FREEZE_WRITE) check and we shouldn't have > to touch anythign else, right? I think there would still be a race if we did that -- I don't see anything in __fput that prohibits another thread from initiating a freeze after the release process calls _can_free_eofblocks but before the actual call to _free_eofblocks. Hm. How often would we have a readonly fd pointing to a file that has posteof blocks? I suppose this could happen if the system was extending a file, crashed, and then someone remounted, opened a ro fd, and then closed and froze the fs at the same time...? --D > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared 2022-06-27 20:59 ` Darrick J. Wong @ 2022-06-27 22:27 ` Dave Chinner 0 siblings, 0 replies; 14+ messages in thread From: Dave Chinner @ 2022-06-27 22:27 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs, allison.henderson On Mon, Jun 27, 2022 at 01:59:32PM -0700, Darrick J. Wong wrote: > On Mon, Jun 27, 2022 at 03:16:49PM +1000, Dave Chinner wrote: > > On Sun, Jun 26, 2022 at 08:57:25PM -0700, Darrick J. Wong wrote: > > > On Mon, Jun 27, 2022 at 11:37:31AM +1000, Dave Chinner wrote: > > > > On Sun, Jun 26, 2022 at 03:04:04PM -0700, Darrick J. Wong wrote: > > > > > From: Darrick J. Wong <djwong@kernel.org> > > > > > > > > > > On a system with a realtime volume and a 28k realtime extent, > > > > > generic/491 fails because the test opens a file on a frozen filesystem > > > > > and closing it causes xfs_release -> xfs_can_free_eofblocks to > > > > > mistakenly think that the the blocks of the realtime extent beyond EOF > > > > > are posteof blocks to be freed. Realtime extents cannot be partially > > > > > unmapped, so this is pointless. Worse yet, this triggers posteof > > > > > cleanup, which stalls on a transaction allocation, which is why the test > > > > > fails. > > > > > > > > > > Teach the predicate to account for realtime extents properly. > > > > > > > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > > > > > --- > > > > > fs/xfs/xfs_bmap_util.c | 2 ++ > > > > > 1 file changed, 2 insertions(+) > > > > > > > > > > > > > > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c > > > > > index 52be58372c63..85e1a26c92e8 100644 > > > > > --- a/fs/xfs/xfs_bmap_util.c > > > > > +++ b/fs/xfs/xfs_bmap_util.c > > > > > @@ -686,6 +686,8 @@ xfs_can_free_eofblocks( > > > > > * forever. > > > > > */ > > > > > end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip)); > > > > > + if (XFS_IS_REALTIME_INODE(ip) && mp->m_sb.sb_rextsize > 1) > > > > > + end_fsb = roundup_64(end_fsb, mp->m_sb.sb_rextsize); > > > > > last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes); > > > > > if (last_fsb <= end_fsb) > > > > > return false; > > > > > > > > Ok, that works. > > > > Reviewed-by: Dave Chinner <dchinner@redhat.com> > > > > > > > > > > However, I was looking at xfs_can_free_eofblocks() w.r.t. freeze a > > > > couple of days ago and wondering why there isn't a freeze/RO state > > > > check in xfs_can_free_eofblocks(). Shouldn't we have one here so > > > > that we never try to run xfs_free_eofblocks() on RO/frozen > > > > filesystems regardless of unexpected state/alignment issues? > > > > > > I asked myself that question too. I found three callers of this > > > predicate: > > > > > > 1. fallocate, which should have obtained freeze protection > > > > *nod* > > > > > 2. inodegc, which should never be running when we get to the innermost > > > freeze protection level > > > > So inodegc could still do IO here on a read-only fs? > > Correct. > > > > 3. xfs_release, which doesn't take freeze protection at all. Either it > > > needs to take freeze protection so that xfs_free_eofblocks can't get > > > stuck in xfs_trans_alloc, or we'd need to modify xfs_trans_alloc to > > > sb_start_intwrite_trylock > > > > That looks to me like it is simply a case of replacing the > > !xfs_is_readonly() check in xfs_release() with a > > !xfs_fs_writeable(mp, SB_FREEZE_WRITE) check and we shouldn't have > > to touch anythign else, right? > > I think there would still be a race if we did that -- I don't see > anything in __fput that prohibits another thread from initiating a > freeze after the release process calls _can_free_eofblocks but before > the actual call to _free_eofblocks. Yup, the probability of that happening is pretty small, and there's every chance it could have got stuck on something else racing with the freeze. In reality, I don't think that blocking in ->release on freeze is inherently bad - it will just wait until the filesystem is thawed and then continue on. i.e. just because the test hangs on closing a fd on a frozen fs doesn't mean that the freeze mechanism is broken in any way.... > Hm. How often would we have a readonly fd pointing to a file that has > posteof blocks? I suppose this could happen if the system was extending > a file, crashed, and then someone remounted, opened a ro fd, and then > closed and froze the fs at the same time...? Snapshots. Open files at freeze time will result in post-eof blocks existing in the snapshot. Mount the snapshot read-only ..... and that's what the read-only check in xfs_release() catches...... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2022-06-27 22:27 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-06-26 22:03 [PATCHSET 0/3] xfs: random fixes for 5.19-rc5 Darrick J. Wong 2022-06-26 22:03 ` [PATCH 1/3] xfs: empty xattr leaf header blocks are not corruption Darrick J. Wong 2022-06-27 1:16 ` Dave Chinner 2022-06-27 3:59 ` Darrick J. Wong 2022-06-26 22:03 ` [PATCH 2/3] xfs: don't hold xattr leaf buffers across transaction rolls Darrick J. Wong 2022-06-27 1:23 ` Dave Chinner 2022-06-27 3:46 ` Darrick J. Wong 2022-06-27 5:10 ` Dave Chinner 2022-06-26 22:04 ` [PATCH 3/3] xfs: dont treat rt extents beyond EOF as eofblocks to be cleared Darrick J. Wong 2022-06-27 1:37 ` Dave Chinner 2022-06-27 3:57 ` Darrick J. Wong 2022-06-27 5:16 ` Dave Chinner 2022-06-27 20:59 ` Darrick J. Wong 2022-06-27 22:27 ` Dave Chinner
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.