All of lore.kernel.org
 help / color / mirror / Atom feed
* [inline_data] ext4: Stale flags before sync when convert to non-inline
@ 2023-11-29  6:15 Daniel Dawson
  2023-11-29 20:05 ` Daniel Dawson
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Daniel Dawson @ 2023-11-29  6:15 UTC (permalink / raw)
  To: Theodore Ts'o, Andreas Dilger; +Cc: linux-ext4, linux-kernel

When a file is converted from inline to non-inline, it has stale flags 
until sync.

If a file with inline data is written to such that it must become 
non-inline, it temporarily appears to have the inline data flag and not 
(if applicable) the extent flag. This is corrected on sync, but can 
cause problems in certain situations.

Details:

All that is needed to show this behavior is the following command:

$ rm -r test-file; dd if=/dev/zero of=test-file bs=64 count=3 
status=none; lsattr test-file

Assuming extents are in use, this should show

--------------e------- test-file

but instead shows

------------------N--- test-file

until test-file is synced. Despite this, the file is already non-inline 
and is treated as such for most purposes.

Why is this a problem? Because some code will fail under such a 
condition, for example, lseek(..., SEEK_HOLE) will result in ENOENT. I 
ran into this with Gentoo's Portage, which uses the call to handle 
sparse files when copying. Sometimes, an ebuild creates a temporary file 
that is quickly copied, and apparently the temporary is written in small 
increments, triggering this.

Here is a small program that reproduces the SEEK_HOLE problem (pass it 
the pathname of a file to create):
https://gist.github.com/ddawson/22cfd4cac32916f6f1dcc86f90eed21a

Tested with kernel: 6.7.0-rc3 (also 6.6 series)
/proc/version: Linux version 6.7.0-rc3 (ddawson@ddawson.local) (gcc 
(Gentoo 13.2.1_p20231014 p8) 13.2.1 20231014, GNU ld (Gentoo 2.41 p2) 
2.41.0) #4 SMP PREEMPT_DYNAMIC Tue Nov 28 20:09:05 PST 2023
Operating System: Gentoo Linux
uname -mi: x86_64 GenuineIntel
.config: https://gist.github.com/ddawson/2f2e60c6e44a62047d7b7d99c7ce5632
dmesg output: 
https://gist.github.com/ddawson/026ea63f099ee3e0c301f522dff00764

-- 
PGP fingerprint: 5BBD5080FEB0EF7F142F8173D572B791F7B4422A


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [inline_data] ext4: Stale flags before sync when convert to non-inline
  2023-11-29  6:15 [inline_data] ext4: Stale flags before sync when convert to non-inline Daniel Dawson
@ 2023-11-29 20:05 ` Daniel Dawson
  2023-11-30  4:06 ` Theodore Ts'o
  2024-01-23  5:56 ` Daniel Dawson
  2 siblings, 0 replies; 8+ messages in thread
From: Daniel Dawson @ 2023-11-29 20:05 UTC (permalink / raw)
  To: Theodore Ts'o, Andreas Dilger; +Cc: linux-ext4, linux-kernel

On 11/28/23 10:15 PM, Daniel Dawson wrote:
> $ rm -r test-file; dd if=/dev/zero of=test-file bs=64 count=3 
> status=none; lsattr test-file

My apologies. That should be "rm -f" at the start.

-- 
PGP fingerprint: 5BBD5080FEB0EF7F142F8173D572B791F7B4422A


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [inline_data] ext4: Stale flags before sync when convert to non-inline
  2023-11-29  6:15 [inline_data] ext4: Stale flags before sync when convert to non-inline Daniel Dawson
  2023-11-29 20:05 ` Daniel Dawson
@ 2023-11-30  4:06 ` Theodore Ts'o
  2023-12-28  9:29   ` Daniel Dawson
  2024-01-23  5:56 ` Daniel Dawson
  2 siblings, 1 reply; 8+ messages in thread
From: Theodore Ts'o @ 2023-11-30  4:06 UTC (permalink / raw)
  To: Daniel Dawson; +Cc: Andreas Dilger, linux-ext4, linux-kernel

On Tue, Nov 28, 2023 at 10:15:09PM -0800, Daniel Dawson wrote:
> When a file is converted from inline to non-inline, it has stale flags until
> sync.
> 
> If a file with inline data is written to such that it must become
> non-inline, it temporarily appears to have the inline data flag and not (if
> applicable) the extent flag. This is corrected on sync, but can cause
> problems in certain situations.

The issue here is delayed allocation.  When you write to a file with
delayed allocation enabled, the file system doesn't decide where the
data will be written on the storage device until the last possible
minute, when writeback occurs.  This can be triggered by a fsync(2) or
sync(2) system call,

> Why is this a problem? Because some code will fail under such a condition,
> for example, lseek(..., SEEK_HOLE) will result in ENOENT. I ran into this
> with Gentoo's Portage, which uses the call to handle sparse files when
> copying. Sometimes, an ebuild creates a temporary file that is quickly
> copied, and apparently the temporary is written in small increments,
> triggering this.

This is caused by missing logic in ext4_iomap_begin_report().  For the
non-inline case, this function does this:

	ret = ext4_map_blocks(NULL, inode, &map, 0);
	if (ret < 0)
		return ret;
	if (ret == 0)
		delalloc = ext4_iomap_is_delalloc(inode, &map);

For a non-inline file, if you write some data blocks that hasn't been
written back due to delayed allocation, ext4_map_blocks() won't be
able to map the logical block to a physical block.  This logic is
missing in the inline_data case:

	if (ext4_has_inline_data(inode)) {
		ret = ext4_inline_data_iomap(inode, iomap);
		if (ret != -EAGAIN) {
			if (ret == 0 && offset >= iomap->length)
				ret = -ENOENT;
			return ret;
		}
	}

What's missing is a call to ext4_iomap_is_delalloc() in the case where
ret == 0, and then setting the delayed allocation flag:

	if (delalloc && iomap->type == IOMAP_HOLE)
		iomap->type = IOMAP_DELALLOC;

This will deal with the combination of inline_data and delayed
allocation for SEEK_HOLE and for FIEMAP.

I'll need to turn this into an actual patch and then create a test to
validate the patch but I'm pretty sure this should deal with the
problem you've come across.

Cheers,

						- Ted

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [inline_data] ext4: Stale flags before sync when convert to non-inline
  2023-11-30  4:06 ` Theodore Ts'o
@ 2023-12-28  9:29   ` Daniel Dawson
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel Dawson @ 2023-12-28  9:29 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Andreas Dilger, linux-ext4, linux-kernel

On 11/29/23 8:06 PM, Theodore Ts'o wrote:
> I'll need to turn this into an actual patch and then create a test to
> validate the patch but I'm pretty sure this should deal with the
> problem you've come across.
Is there anything new on this (apologies if I managed to miss it)? Let 
me know if there's anything I might do to help, like maybe testing.

-- 
PGP fingerprint: 5BBD5080FEB0EF7F142F8173D572B791F7B4422A


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [inline_data] ext4: Stale flags before sync when convert to non-inline
  2023-11-29  6:15 [inline_data] ext4: Stale flags before sync when convert to non-inline Daniel Dawson
  2023-11-29 20:05 ` Daniel Dawson
  2023-11-30  4:06 ` Theodore Ts'o
@ 2024-01-23  5:56 ` Daniel Dawson
  2024-01-24 17:13   ` Luis Henriques
  2 siblings, 1 reply; 8+ messages in thread
From: Daniel Dawson @ 2024-01-23  5:56 UTC (permalink / raw)
  To: Theodore Ts'o, Andreas Dilger; +Cc: linux-ext4, linux-kernel

On 11/28/23 10:15 PM, Daniel Dawson wrote:
> When a file is converted from inline to non-inline, it has stale flags 
> until sync.

> Why is this a problem? Because some code will fail under such a 
> condition, for example, lseek(..., SEEK_HOLE) will result in ENOENT.


Just tested. Still happening on 6.8-rc1.

-- 
PGP fingerprint: 5BBD5080FEB0EF7F142F8173D572B791F7B4422A


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [inline_data] ext4: Stale flags before sync when convert to non-inline
  2024-01-23  5:56 ` Daniel Dawson
@ 2024-01-24 17:13   ` Luis Henriques
  2024-01-28 12:06     ` Daniel Dawson
  0 siblings, 1 reply; 8+ messages in thread
From: Luis Henriques @ 2024-01-24 17:13 UTC (permalink / raw)
  To: Daniel Dawson; +Cc: Theodore Ts'o, Andreas Dilger, linux-ext4, linux-kernel

Daniel Dawson <danielcdawson@gmail.com> writes:

> On 11/28/23 10:15 PM, Daniel Dawson wrote:
>> When a file is converted from inline to non-inline, it has stale flags until
>> sync.
>
>> Why is this a problem? Because some code will fail under such a condition, for
>> example, lseek(..., SEEK_HOLE) will result in ENOENT.
>
>
> Just tested. Still happening on 6.8-rc1.

FWIW, I've been looking into a similar issue related with inline-data and
delayed allocation.  It may however be quite different because it seems to
add small block sizes into the mix:

  https://bugzilla.kernel.org/show_bug.cgi?id=200681

Unfortunately, I'm still trying to find my way around all this code and I
can't say I fully understand the whole flow using the reproducer provided
in that bugzilla.

Bellow, I'm inlining a patch that started as debug patch that I've used to
try to understand what was going on.  It seems to workaround that bug, but
I know it's not a real fix -- I don't yet understand what's going on.

Regarding your specific usecase, I can reproduce it and, unfortunately, I
don't thing Ted's suggestion will fix it as I don't even see
ext4_iomap_begin_report() being executed at all.  Anyway, just my 2
cents... let's see if I can come up with something.

Cheers,
-- 
Luís

diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index d5bd1e3a5d36..d0c3d6fd48de 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -528,7 +528,19 @@ int ext4_readpage_inline(struct inode *inode, struct folio *folio)
 	if (!folio->index)
 		ret = ext4_read_inline_folio(inode, folio);
 	else if (!folio_test_uptodate(folio)) {
-		folio_zero_segment(folio, 0, folio_size(folio));
+		struct buffer_head *bh, *head;
+		size_t start = 0;
+
+		head = folio_buffers(folio);
+		if (head) {
+			bh = head;
+			do {
+				if (!buffer_uptodate(bh))
+					break;
+				start += inode->i_sb->s_blocksize;
+			} while ((bh = bh->b_this_page) != head);
+		}
+		folio_zero_segment(folio, start, folio_size(folio));
 		folio_mark_uptodate(folio);
 	}

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [inline_data] ext4: Stale flags before sync when convert to non-inline
  2024-01-24 17:13   ` Luis Henriques
@ 2024-01-28 12:06     ` Daniel Dawson
  2024-01-29 11:17       ` Luis Henriques
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Dawson @ 2024-01-28 12:06 UTC (permalink / raw)
  To: Luis Henriques
  Cc: Theodore Ts'o, Andreas Dilger, linux-ext4, linux-kernel

I didn't see your message until now. Sorry.

On 1/24/24 9:13 AM, Luis Henriques wrote:
> Bellow, I'm inlining a patch that started as debug patch that I've used to
> try to understand what was going on.  It seems to workaround that bug, but
> I know it's not a real fix -- I don't yet understand what's going on.

Thanks for this. I'm not sure if you meant to say you think it works 
around the present issue. I just tested it, and it does not. In case you 
missed the start of the thread, here is the test I gave for triggering 
the issue:

$ rm -f test-file; dd if=/dev/zero of=test-file bs=64 count=3 
status=none; lsattr test-file

Instead of writing the file all at once, it splits it into 3 writes, 
where the first is small enough to make the file inline, and then it 
becomes non-inline. Ideally, the output should be

--------------e------- test-file

but delayed allocation means it instead shows

------------------N--- test-file

until sync. I also gave this code for testing SEEK_HOLE:

https://gist.github.com/ddawson/22cfd4cac32916f6f1dcc86f90eed21a

> Regarding your specific usecase, I can reproduce it and, unfortunately, I
> don't thing Ted's suggestion will fix it as I don't even see
> ext4_iomap_begin_report() being executed at all.

To be clear, that function is called in a few specific circumstances, 
such as when lseek() is called with SEEK_HOLE or SEEK_DATA, or with 
FIEMAP. When I traced the kernel myself, I did see it being executed 
from the lseek() call. The changes are to address the file not yet being 
converted from inline, where the contents are still written where the 
map would otherwise be. If you treat it as the map, you get nonsense. 
Something else needs to be done.

I'm not clear on whether his proposed changes would then allow an 
application to function properly under such a condition, but it should 
at least *not* give ENOENT.

After testing what I think are the changes he proposed, I find it 
doesn't work. If I remove the "&& iomap->type == IOMAP_HOLE", lseek() no 
longer gives an error, but instead returns 0, which I'm pretty sure 
won't work for the affected use case. Either way, I'm not sure I 
interpreted his description of the changes correctly.

-- 
PGP fingerprint: 5BBD5080FEB0EF7F142F8173D572B791F7B4422A


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [inline_data] ext4: Stale flags before sync when convert to non-inline
  2024-01-28 12:06     ` Daniel Dawson
@ 2024-01-29 11:17       ` Luis Henriques
  0 siblings, 0 replies; 8+ messages in thread
From: Luis Henriques @ 2024-01-29 11:17 UTC (permalink / raw)
  To: Daniel Dawson; +Cc: Theodore Ts'o, Andreas Dilger, linux-ext4, linux-kernel

Daniel Dawson <danielcdawson@gmail.com> writes:

> I didn't see your message until now. Sorry.
>
> On 1/24/24 9:13 AM, Luis Henriques wrote:
>> Bellow, I'm inlining a patch that started as debug patch that I've used to
>> try to understand what was going on.  It seems to workaround that bug, but
>> I know it's not a real fix -- I don't yet understand what's going on.
>
> Thanks for this. I'm not sure if you meant to say you think it works around the
> present issue. I just tested it, and it does not. In case you missed the start

I was referring to the bugzilla bug [1] I've been looking into recently.
My patch was a debug patch for that bug, but it definitely does not fix
the issue you're reporting.  Sorry for mixing up everything and confusing
everyone ;-)

[1] https://bugzilla.kernel.org/show_bug.cgi?id=200681

> of the thread, here is the test I gave for triggering the issue:
>
> $ rm -f test-file; dd if=/dev/zero of=test-file bs=64 count=3 status=none;
> lsattr test-file
>
> Instead of writing the file all at once, it splits it into 3 writes, where the
> first is small enough to make the file inline, and then it becomes
> non-inline. Ideally, the output should be
>
> --------------e------- test-file
>
> but delayed allocation means it instead shows
>
> ------------------N--- test-file
>
> until sync. I also gave this code for testing SEEK_HOLE:
>
> https://gist.github.com/ddawson/22cfd4cac32916f6f1dcc86f90eed21a
>
>> Regarding your specific usecase, I can reproduce it and, unfortunately, I
>> don't thing Ted's suggestion will fix it as I don't even see
>> ext4_iomap_begin_report() being executed at all.
>
> To be clear, that function is called in a few specific circumstances, such as
> when lseek() is called with SEEK_HOLE or SEEK_DATA, or with FIEMAP. When I
> traced the kernel myself, I did see it being executed from the lseek() call. The
> changes are to address the file not yet being converted from inline, where the
> contents are still written where the map would otherwise be. If you treat it as
> the map, you get nonsense. Something else needs to be done.
>
> I'm not clear on whether his proposed changes would then allow an application to
> function properly under such a condition, but it should at least *not* give
> ENOENT.
>
> After testing what I think are the changes he proposed, I find it doesn't
> work. If I remove the "&& iomap->type == IOMAP_HOLE", lseek() no longer gives an
> error, but instead returns 0, which I'm pretty sure won't work for the affected
> use case. Either way, I'm not sure I interpreted his description of the changes
> correctly.

Sure, I can understand under which circumstances ext4_iomap_begin_report()
can be executed.  What I meant was that, for your very simple test case
(using 'dd' and 'lsattr'), this function will not be executed and thus the
suggested fix will still show the 'N' in the file attributes.

Anyway, thanks a lot for the extra information your providing here.  I'll
try to find some time in the next few days to keep debugging the issue and
hopefully get some more useful info (and, who knows! a patch).

Cheers,
-- 
Luís

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-01-29 11:17 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-29  6:15 [inline_data] ext4: Stale flags before sync when convert to non-inline Daniel Dawson
2023-11-29 20:05 ` Daniel Dawson
2023-11-30  4:06 ` Theodore Ts'o
2023-12-28  9:29   ` Daniel Dawson
2024-01-23  5:56 ` Daniel Dawson
2024-01-24 17:13   ` Luis Henriques
2024-01-28 12:06     ` Daniel Dawson
2024-01-29 11:17       ` Luis Henriques

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.