* [inline_data] ext4: Stale flags before sync when convert to non-inline @ 2023-11-29 6:15 Daniel Dawson 2023-11-29 20:05 ` Daniel Dawson ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: Daniel Dawson @ 2023-11-29 6:15 UTC (permalink / raw) To: Theodore Ts'o, Andreas Dilger; +Cc: linux-ext4, linux-kernel When a file is converted from inline to non-inline, it has stale flags until sync. If a file with inline data is written to such that it must become non-inline, it temporarily appears to have the inline data flag and not (if applicable) the extent flag. This is corrected on sync, but can cause problems in certain situations. Details: All that is needed to show this behavior is the following command: $ rm -r test-file; dd if=/dev/zero of=test-file bs=64 count=3 status=none; lsattr test-file Assuming extents are in use, this should show --------------e------- test-file but instead shows ------------------N--- test-file until test-file is synced. Despite this, the file is already non-inline and is treated as such for most purposes. Why is this a problem? Because some code will fail under such a condition, for example, lseek(..., SEEK_HOLE) will result in ENOENT. I ran into this with Gentoo's Portage, which uses the call to handle sparse files when copying. Sometimes, an ebuild creates a temporary file that is quickly copied, and apparently the temporary is written in small increments, triggering this. Here is a small program that reproduces the SEEK_HOLE problem (pass it the pathname of a file to create): https://gist.github.com/ddawson/22cfd4cac32916f6f1dcc86f90eed21a Tested with kernel: 6.7.0-rc3 (also 6.6 series) /proc/version: Linux version 6.7.0-rc3 (ddawson@ddawson.local) (gcc (Gentoo 13.2.1_p20231014 p8) 13.2.1 20231014, GNU ld (Gentoo 2.41 p2) 2.41.0) #4 SMP PREEMPT_DYNAMIC Tue Nov 28 20:09:05 PST 2023 Operating System: Gentoo Linux uname -mi: x86_64 GenuineIntel .config: https://gist.github.com/ddawson/2f2e60c6e44a62047d7b7d99c7ce5632 dmesg output: https://gist.github.com/ddawson/026ea63f099ee3e0c301f522dff00764 -- PGP fingerprint: 5BBD5080FEB0EF7F142F8173D572B791F7B4422A ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [inline_data] ext4: Stale flags before sync when convert to non-inline 2023-11-29 6:15 [inline_data] ext4: Stale flags before sync when convert to non-inline Daniel Dawson @ 2023-11-29 20:05 ` Daniel Dawson 2023-11-30 4:06 ` Theodore Ts'o 2024-01-23 5:56 ` Daniel Dawson 2 siblings, 0 replies; 8+ messages in thread From: Daniel Dawson @ 2023-11-29 20:05 UTC (permalink / raw) To: Theodore Ts'o, Andreas Dilger; +Cc: linux-ext4, linux-kernel On 11/28/23 10:15 PM, Daniel Dawson wrote: > $ rm -r test-file; dd if=/dev/zero of=test-file bs=64 count=3 > status=none; lsattr test-file My apologies. That should be "rm -f" at the start. -- PGP fingerprint: 5BBD5080FEB0EF7F142F8173D572B791F7B4422A ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [inline_data] ext4: Stale flags before sync when convert to non-inline 2023-11-29 6:15 [inline_data] ext4: Stale flags before sync when convert to non-inline Daniel Dawson 2023-11-29 20:05 ` Daniel Dawson @ 2023-11-30 4:06 ` Theodore Ts'o 2023-12-28 9:29 ` Daniel Dawson 2024-01-23 5:56 ` Daniel Dawson 2 siblings, 1 reply; 8+ messages in thread From: Theodore Ts'o @ 2023-11-30 4:06 UTC (permalink / raw) To: Daniel Dawson; +Cc: Andreas Dilger, linux-ext4, linux-kernel On Tue, Nov 28, 2023 at 10:15:09PM -0800, Daniel Dawson wrote: > When a file is converted from inline to non-inline, it has stale flags until > sync. > > If a file with inline data is written to such that it must become > non-inline, it temporarily appears to have the inline data flag and not (if > applicable) the extent flag. This is corrected on sync, but can cause > problems in certain situations. The issue here is delayed allocation. When you write to a file with delayed allocation enabled, the file system doesn't decide where the data will be written on the storage device until the last possible minute, when writeback occurs. This can be triggered by a fsync(2) or sync(2) system call, > Why is this a problem? Because some code will fail under such a condition, > for example, lseek(..., SEEK_HOLE) will result in ENOENT. I ran into this > with Gentoo's Portage, which uses the call to handle sparse files when > copying. Sometimes, an ebuild creates a temporary file that is quickly > copied, and apparently the temporary is written in small increments, > triggering this. This is caused by missing logic in ext4_iomap_begin_report(). For the non-inline case, this function does this: ret = ext4_map_blocks(NULL, inode, &map, 0); if (ret < 0) return ret; if (ret == 0) delalloc = ext4_iomap_is_delalloc(inode, &map); For a non-inline file, if you write some data blocks that hasn't been written back due to delayed allocation, ext4_map_blocks() won't be able to map the logical block to a physical block. This logic is missing in the inline_data case: if (ext4_has_inline_data(inode)) { ret = ext4_inline_data_iomap(inode, iomap); if (ret != -EAGAIN) { if (ret == 0 && offset >= iomap->length) ret = -ENOENT; return ret; } } What's missing is a call to ext4_iomap_is_delalloc() in the case where ret == 0, and then setting the delayed allocation flag: if (delalloc && iomap->type == IOMAP_HOLE) iomap->type = IOMAP_DELALLOC; This will deal with the combination of inline_data and delayed allocation for SEEK_HOLE and for FIEMAP. I'll need to turn this into an actual patch and then create a test to validate the patch but I'm pretty sure this should deal with the problem you've come across. Cheers, - Ted ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [inline_data] ext4: Stale flags before sync when convert to non-inline 2023-11-30 4:06 ` Theodore Ts'o @ 2023-12-28 9:29 ` Daniel Dawson 0 siblings, 0 replies; 8+ messages in thread From: Daniel Dawson @ 2023-12-28 9:29 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Andreas Dilger, linux-ext4, linux-kernel On 11/29/23 8:06 PM, Theodore Ts'o wrote: > I'll need to turn this into an actual patch and then create a test to > validate the patch but I'm pretty sure this should deal with the > problem you've come across. Is there anything new on this (apologies if I managed to miss it)? Let me know if there's anything I might do to help, like maybe testing. -- PGP fingerprint: 5BBD5080FEB0EF7F142F8173D572B791F7B4422A ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [inline_data] ext4: Stale flags before sync when convert to non-inline 2023-11-29 6:15 [inline_data] ext4: Stale flags before sync when convert to non-inline Daniel Dawson 2023-11-29 20:05 ` Daniel Dawson 2023-11-30 4:06 ` Theodore Ts'o @ 2024-01-23 5:56 ` Daniel Dawson 2024-01-24 17:13 ` Luis Henriques 2 siblings, 1 reply; 8+ messages in thread From: Daniel Dawson @ 2024-01-23 5:56 UTC (permalink / raw) To: Theodore Ts'o, Andreas Dilger; +Cc: linux-ext4, linux-kernel On 11/28/23 10:15 PM, Daniel Dawson wrote: > When a file is converted from inline to non-inline, it has stale flags > until sync. > Why is this a problem? Because some code will fail under such a > condition, for example, lseek(..., SEEK_HOLE) will result in ENOENT. Just tested. Still happening on 6.8-rc1. -- PGP fingerprint: 5BBD5080FEB0EF7F142F8173D572B791F7B4422A ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [inline_data] ext4: Stale flags before sync when convert to non-inline 2024-01-23 5:56 ` Daniel Dawson @ 2024-01-24 17:13 ` Luis Henriques 2024-01-28 12:06 ` Daniel Dawson 0 siblings, 1 reply; 8+ messages in thread From: Luis Henriques @ 2024-01-24 17:13 UTC (permalink / raw) To: Daniel Dawson; +Cc: Theodore Ts'o, Andreas Dilger, linux-ext4, linux-kernel Daniel Dawson <danielcdawson@gmail.com> writes: > On 11/28/23 10:15 PM, Daniel Dawson wrote: >> When a file is converted from inline to non-inline, it has stale flags until >> sync. > >> Why is this a problem? Because some code will fail under such a condition, for >> example, lseek(..., SEEK_HOLE) will result in ENOENT. > > > Just tested. Still happening on 6.8-rc1. FWIW, I've been looking into a similar issue related with inline-data and delayed allocation. It may however be quite different because it seems to add small block sizes into the mix: https://bugzilla.kernel.org/show_bug.cgi?id=200681 Unfortunately, I'm still trying to find my way around all this code and I can't say I fully understand the whole flow using the reproducer provided in that bugzilla. Bellow, I'm inlining a patch that started as debug patch that I've used to try to understand what was going on. It seems to workaround that bug, but I know it's not a real fix -- I don't yet understand what's going on. Regarding your specific usecase, I can reproduce it and, unfortunately, I don't thing Ted's suggestion will fix it as I don't even see ext4_iomap_begin_report() being executed at all. Anyway, just my 2 cents... let's see if I can come up with something. Cheers, -- Luís diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index d5bd1e3a5d36..d0c3d6fd48de 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -528,7 +528,19 @@ int ext4_readpage_inline(struct inode *inode, struct folio *folio) if (!folio->index) ret = ext4_read_inline_folio(inode, folio); else if (!folio_test_uptodate(folio)) { - folio_zero_segment(folio, 0, folio_size(folio)); + struct buffer_head *bh, *head; + size_t start = 0; + + head = folio_buffers(folio); + if (head) { + bh = head; + do { + if (!buffer_uptodate(bh)) + break; + start += inode->i_sb->s_blocksize; + } while ((bh = bh->b_this_page) != head); + } + folio_zero_segment(folio, start, folio_size(folio)); folio_mark_uptodate(folio); } ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [inline_data] ext4: Stale flags before sync when convert to non-inline 2024-01-24 17:13 ` Luis Henriques @ 2024-01-28 12:06 ` Daniel Dawson 2024-01-29 11:17 ` Luis Henriques 0 siblings, 1 reply; 8+ messages in thread From: Daniel Dawson @ 2024-01-28 12:06 UTC (permalink / raw) To: Luis Henriques Cc: Theodore Ts'o, Andreas Dilger, linux-ext4, linux-kernel I didn't see your message until now. Sorry. On 1/24/24 9:13 AM, Luis Henriques wrote: > Bellow, I'm inlining a patch that started as debug patch that I've used to > try to understand what was going on. It seems to workaround that bug, but > I know it's not a real fix -- I don't yet understand what's going on. Thanks for this. I'm not sure if you meant to say you think it works around the present issue. I just tested it, and it does not. In case you missed the start of the thread, here is the test I gave for triggering the issue: $ rm -f test-file; dd if=/dev/zero of=test-file bs=64 count=3 status=none; lsattr test-file Instead of writing the file all at once, it splits it into 3 writes, where the first is small enough to make the file inline, and then it becomes non-inline. Ideally, the output should be --------------e------- test-file but delayed allocation means it instead shows ------------------N--- test-file until sync. I also gave this code for testing SEEK_HOLE: https://gist.github.com/ddawson/22cfd4cac32916f6f1dcc86f90eed21a > Regarding your specific usecase, I can reproduce it and, unfortunately, I > don't thing Ted's suggestion will fix it as I don't even see > ext4_iomap_begin_report() being executed at all. To be clear, that function is called in a few specific circumstances, such as when lseek() is called with SEEK_HOLE or SEEK_DATA, or with FIEMAP. When I traced the kernel myself, I did see it being executed from the lseek() call. The changes are to address the file not yet being converted from inline, where the contents are still written where the map would otherwise be. If you treat it as the map, you get nonsense. Something else needs to be done. I'm not clear on whether his proposed changes would then allow an application to function properly under such a condition, but it should at least *not* give ENOENT. After testing what I think are the changes he proposed, I find it doesn't work. If I remove the "&& iomap->type == IOMAP_HOLE", lseek() no longer gives an error, but instead returns 0, which I'm pretty sure won't work for the affected use case. Either way, I'm not sure I interpreted his description of the changes correctly. -- PGP fingerprint: 5BBD5080FEB0EF7F142F8173D572B791F7B4422A ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [inline_data] ext4: Stale flags before sync when convert to non-inline 2024-01-28 12:06 ` Daniel Dawson @ 2024-01-29 11:17 ` Luis Henriques 0 siblings, 0 replies; 8+ messages in thread From: Luis Henriques @ 2024-01-29 11:17 UTC (permalink / raw) To: Daniel Dawson; +Cc: Theodore Ts'o, Andreas Dilger, linux-ext4, linux-kernel Daniel Dawson <danielcdawson@gmail.com> writes: > I didn't see your message until now. Sorry. > > On 1/24/24 9:13 AM, Luis Henriques wrote: >> Bellow, I'm inlining a patch that started as debug patch that I've used to >> try to understand what was going on. It seems to workaround that bug, but >> I know it's not a real fix -- I don't yet understand what's going on. > > Thanks for this. I'm not sure if you meant to say you think it works around the > present issue. I just tested it, and it does not. In case you missed the start I was referring to the bugzilla bug [1] I've been looking into recently. My patch was a debug patch for that bug, but it definitely does not fix the issue you're reporting. Sorry for mixing up everything and confusing everyone ;-) [1] https://bugzilla.kernel.org/show_bug.cgi?id=200681 > of the thread, here is the test I gave for triggering the issue: > > $ rm -f test-file; dd if=/dev/zero of=test-file bs=64 count=3 status=none; > lsattr test-file > > Instead of writing the file all at once, it splits it into 3 writes, where the > first is small enough to make the file inline, and then it becomes > non-inline. Ideally, the output should be > > --------------e------- test-file > > but delayed allocation means it instead shows > > ------------------N--- test-file > > until sync. I also gave this code for testing SEEK_HOLE: > > https://gist.github.com/ddawson/22cfd4cac32916f6f1dcc86f90eed21a > >> Regarding your specific usecase, I can reproduce it and, unfortunately, I >> don't thing Ted's suggestion will fix it as I don't even see >> ext4_iomap_begin_report() being executed at all. > > To be clear, that function is called in a few specific circumstances, such as > when lseek() is called with SEEK_HOLE or SEEK_DATA, or with FIEMAP. When I > traced the kernel myself, I did see it being executed from the lseek() call. The > changes are to address the file not yet being converted from inline, where the > contents are still written where the map would otherwise be. If you treat it as > the map, you get nonsense. Something else needs to be done. > > I'm not clear on whether his proposed changes would then allow an application to > function properly under such a condition, but it should at least *not* give > ENOENT. > > After testing what I think are the changes he proposed, I find it doesn't > work. If I remove the "&& iomap->type == IOMAP_HOLE", lseek() no longer gives an > error, but instead returns 0, which I'm pretty sure won't work for the affected > use case. Either way, I'm not sure I interpreted his description of the changes > correctly. Sure, I can understand under which circumstances ext4_iomap_begin_report() can be executed. What I meant was that, for your very simple test case (using 'dd' and 'lsattr'), this function will not be executed and thus the suggested fix will still show the 'N' in the file attributes. Anyway, thanks a lot for the extra information your providing here. I'll try to find some time in the next few days to keep debugging the issue and hopefully get some more useful info (and, who knows! a patch). Cheers, -- Luís ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-01-29 11:17 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-11-29 6:15 [inline_data] ext4: Stale flags before sync when convert to non-inline Daniel Dawson 2023-11-29 20:05 ` Daniel Dawson 2023-11-30 4:06 ` Theodore Ts'o 2023-12-28 9:29 ` Daniel Dawson 2024-01-23 5:56 ` Daniel Dawson 2024-01-24 17:13 ` Luis Henriques 2024-01-28 12:06 ` Daniel Dawson 2024-01-29 11:17 ` Luis Henriques
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.