linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] btrfs: populate extent_map::generation when reading from disk
@ 2022-02-05  8:55 Qu Wenruo
  2022-02-07 10:52 ` Filipe Manana
  0 siblings, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2022-02-05  8:55 UTC (permalink / raw)
  To: linux-btrfs

[WEIRD BEHAVIOR]

When btrfs_get_extent() tries to get some file extent from disk, it
never populates extent_map::generation , leaving the value to be 0.

On the other hand, for extent map generated by IO, it will get its
generation properly set at finish_ordered_io()

 finish_ordered_io()
 |- unpin_extent_cache(gen = trans->transid)
    |- em->generation = gen;

[REGRESSION?]
I have no idea when such behavior is introduced, but at least in v5.15
this incorrect behavior is already there.

[AFFECT]
Not really sure if there is any behavior really get affected.

Sure there are locations like extent map merging, but there is no value
smaller than 0 for u64, thus it won't really cause a difference.

For autodefrag, although it's checking em->generation to determine if we
need to defrag a range, but that @new_than value is always from IO, thus
all those extent maps with 0 generation will just be skipped, and that's
the expected behavior anyway.

For manual defrag, @newer_than is 0, and our check is to skip generation
smaller than @newer_than, thus it still makes no difference.

[FIX]
To make things less weird, let us populate extent_map::generation in
btrfs_extent_item_to_extent_map().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file-item.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 90c5c38836ab..9a3de652ada8 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -1211,6 +1211,7 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 	extent_start = key.offset;
 	extent_end = btrfs_file_extent_end(path);
 	em->ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
+	em->generation = btrfs_file_extent_generation(leaf, fi);
 	if (type == BTRFS_FILE_EXTENT_REG ||
 	    type == BTRFS_FILE_EXTENT_PREALLOC) {
 		em->start = extent_start;
-- 
2.35.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: populate extent_map::generation when reading from disk
  2022-02-05  8:55 [PATCH] btrfs: populate extent_map::generation when reading from disk Qu Wenruo
@ 2022-02-07 10:52 ` Filipe Manana
  2022-02-07 11:39   ` Qu Wenruo
  0 siblings, 1 reply; 5+ messages in thread
From: Filipe Manana @ 2022-02-07 10:52 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Sat, Feb 05, 2022 at 04:55:47PM +0800, Qu Wenruo wrote:
> [WEIRD BEHAVIOR]
> 
> When btrfs_get_extent() tries to get some file extent from disk, it
> never populates extent_map::generation , leaving the value to be 0.
> 
> On the other hand, for extent map generated by IO, it will get its
> generation properly set at finish_ordered_io()
> 
>  finish_ordered_io()
>  |- unpin_extent_cache(gen = trans->transid)
>     |- em->generation = gen;
> 
> [REGRESSION?]
> I have no idea when such behavior is introduced, but at least in v5.15
> this incorrect behavior is already there.

The extent map generation is basically only used by the fsync code, but
as it deals only with modified extents, it always sees non-zero generation.

> 
> [AFFECT]
> Not really sure if there is any behavior really get affected.

affect -> effect

No, I don't think it affects anything.

> 
> Sure there are locations like extent map merging, but there is no value
> smaller than 0 for u64, thus it won't really cause a difference.
> 
> For autodefrag, although it's checking em->generation to determine if we
> need to defrag a range, but that @new_than value is always from IO, thus

This is confusing.
You mean the minimum generation threshold for autodefrag. Referring to
a function parameter (and it's named "newer_than") out of context, is
hard to follow.


> all those extent maps with 0 generation will just be skipped, and that's
> the expected behavior anyway.
> 
> For manual defrag, @newer_than is 0, and our check is to skip generation
> smaller than @newer_than, thus it still makes no difference.

Same here, saying the minimum generation threshold for defrag is more
informative than referring to the name of a function parameter. A function
that is not even touched by the patch makes it hard to understand.

> 
> [FIX]
> To make things less weird, let us populate extent_map::generation in
> btrfs_extent_item_to_extent_map().

Looks good.
Though I don't think this fixes anything. As I pointed out in the other
thread, the extent map generation is basically only I used by fsync, which
doesn't use extent maps that are not the in the list of modified extents
(and those always have a generation > 0).

Thnaks.

> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/file-item.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> index 90c5c38836ab..9a3de652ada8 100644
> --- a/fs/btrfs/file-item.c
> +++ b/fs/btrfs/file-item.c
> @@ -1211,6 +1211,7 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
>  	extent_start = key.offset;
>  	extent_end = btrfs_file_extent_end(path);
>  	em->ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
> +	em->generation = btrfs_file_extent_generation(leaf, fi);
>  	if (type == BTRFS_FILE_EXTENT_REG ||
>  	    type == BTRFS_FILE_EXTENT_PREALLOC) {
>  		em->start = extent_start;
> -- 
> 2.35.0
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: populate extent_map::generation when reading from disk
  2022-02-07 10:52 ` Filipe Manana
@ 2022-02-07 11:39   ` Qu Wenruo
  2022-02-08 16:34     ` Filipe Manana
  0 siblings, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2022-02-07 11:39 UTC (permalink / raw)
  To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs



On 2022/2/7 18:52, Filipe Manana wrote:
> On Sat, Feb 05, 2022 at 04:55:47PM +0800, Qu Wenruo wrote:
>> [WEIRD BEHAVIOR]
>>
>> When btrfs_get_extent() tries to get some file extent from disk, it
>> never populates extent_map::generation , leaving the value to be 0.
>>
>> On the other hand, for extent map generated by IO, it will get its
>> generation properly set at finish_ordered_io()
>>
>>   finish_ordered_io()
>>   |- unpin_extent_cache(gen = trans->transid)
>>      |- em->generation = gen;
>>
>> [REGRESSION?]
>> I have no idea when such behavior is introduced, but at least in v5.15
>> this incorrect behavior is already there.
>
> The extent map generation is basically only used by the fsync code, but
> as it deals only with modified extents, it always sees non-zero generation.
>
>>
>> [AFFECT]
>> Not really sure if there is any behavior really get affected.
>
> affect -> effect
>
> No, I don't think it affects anything.
>
>>
>> Sure there are locations like extent map merging, but there is no value
>> smaller than 0 for u64, thus it won't really cause a difference.
>>
>> For autodefrag, although it's checking em->generation to determine if we
>> need to defrag a range, but that @new_than value is always from IO, thus
>
> This is confusing.
> You mean the minimum generation threshold for autodefrag. Referring to
> a function parameter (and it's named "newer_than") out of context, is
> hard to follow.
>
>
>> all those extent maps with 0 generation will just be skipped, and that's
>> the expected behavior anyway.
>>
>> For manual defrag, @newer_than is 0, and our check is to skip generation
>> smaller than @newer_than, thus it still makes no difference.
>
> Same here, saying the minimum generation threshold for defrag is more
> informative than referring to the name of a function parameter. A function
> that is not even touched by the patch makes it hard to understand.
>
>>
>> [FIX]
>> To make things less weird, let us populate extent_map::generation in
>> btrfs_extent_item_to_extent_map().
>
> Looks good.
> Though I don't think this fixes anything.

I'm considering the following extreme corner case, which this patch may
make a difference (although to cause extra IO)

E.g.
Root 5, last_trans = 500.

In transaction 510, we write back some data for inode A of root 5, the
extent is at file offset 0 len 32K, triggering autodefrag to create an
inode_defrag structure with transid 500 (from root 5 last_trans).

Then we do some fragmented writeback following inode A file offset 32K,
they all happen before cleaner get triggered.

Before cleaner get triggered, fd for inode A is closed, and all cache is
dropped, including the extent map cache.

Then autodefrag get triggered, it tries to get the extent map for offset
0, and got an em, with generation 0.

Since 0 (unpopulated em::generation) < 500 (inode_defrag::transid), the
extent doesn't need to be defragged.

But in fact, these new extents at file offset 0 and onward all have
generation newer than 510, and can be defragged.


Although this is opposite what we're chasing, it will cause more IO,
instead of less...

Thanks,
Qu



> As I pointed out in the other
> thread, the extent map generation is basically only I used by fsync, which
> doesn't use extent maps that are not the in the list of modified extents
> (and those always have a generation > 0).
>
> Thnaks.
>
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/file-item.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
>> index 90c5c38836ab..9a3de652ada8 100644
>> --- a/fs/btrfs/file-item.c
>> +++ b/fs/btrfs/file-item.c
>> @@ -1211,6 +1211,7 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
>>   	extent_start = key.offset;
>>   	extent_end = btrfs_file_extent_end(path);
>>   	em->ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
>> +	em->generation = btrfs_file_extent_generation(leaf, fi);
>>   	if (type == BTRFS_FILE_EXTENT_REG ||
>>   	    type == BTRFS_FILE_EXTENT_PREALLOC) {
>>   		em->start = extent_start;
>> --
>> 2.35.0
>>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: populate extent_map::generation when reading from disk
  2022-02-07 11:39   ` Qu Wenruo
@ 2022-02-08 16:34     ` Filipe Manana
  2022-02-09  0:41       ` Qu Wenruo
  0 siblings, 1 reply; 5+ messages in thread
From: Filipe Manana @ 2022-02-08 16:34 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

On Mon, Feb 07, 2022 at 07:39:06PM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/2/7 18:52, Filipe Manana wrote:
> > On Sat, Feb 05, 2022 at 04:55:47PM +0800, Qu Wenruo wrote:
> > > [WEIRD BEHAVIOR]
> > > 
> > > When btrfs_get_extent() tries to get some file extent from disk, it
> > > never populates extent_map::generation , leaving the value to be 0.
> > > 
> > > On the other hand, for extent map generated by IO, it will get its
> > > generation properly set at finish_ordered_io()
> > > 
> > >   finish_ordered_io()
> > >   |- unpin_extent_cache(gen = trans->transid)
> > >      |- em->generation = gen;
> > > 
> > > [REGRESSION?]
> > > I have no idea when such behavior is introduced, but at least in v5.15
> > > this incorrect behavior is already there.
> > 
> > The extent map generation is basically only used by the fsync code, but
> > as it deals only with modified extents, it always sees non-zero generation.
> > 
> > > 
> > > [AFFECT]
> > > Not really sure if there is any behavior really get affected.
> > 
> > affect -> effect
> > 
> > No, I don't think it affects anything.
> > 
> > > 
> > > Sure there are locations like extent map merging, but there is no value
> > > smaller than 0 for u64, thus it won't really cause a difference.
> > > 
> > > For autodefrag, although it's checking em->generation to determine if we
> > > need to defrag a range, but that @new_than value is always from IO, thus
> > 
> > This is confusing.
> > You mean the minimum generation threshold for autodefrag. Referring to
> > a function parameter (and it's named "newer_than") out of context, is
> > hard to follow.
> > 
> > 
> > > all those extent maps with 0 generation will just be skipped, and that's
> > > the expected behavior anyway.
> > > 
> > > For manual defrag, @newer_than is 0, and our check is to skip generation
> > > smaller than @newer_than, thus it still makes no difference.
> > 
> > Same here, saying the minimum generation threshold for defrag is more
> > informative than referring to the name of a function parameter. A function
> > that is not even touched by the patch makes it hard to understand.
> > 
> > > 
> > > [FIX]
> > > To make things less weird, let us populate extent_map::generation in
> > > btrfs_extent_item_to_extent_map().
> > 
> > Looks good.
> > Though I don't think this fixes anything.
> 
> I'm considering the following extreme corner case, which this patch may
> make a difference (although to cause extra IO)
> 
> E.g.
> Root 5, last_trans = 500.
> 
> In transaction 510, we write back some data for inode A of root 5, the
> extent is at file offset 0 len 32K, triggering autodefrag to create an
> inode_defrag structure with transid 500 (from root 5 last_trans).
> 
> Then we do some fragmented writeback following inode A file offset 32K,
> they all happen before cleaner get triggered.
> 
> Before cleaner get triggered, fd for inode A is closed, and all cache is
> dropped, including the extent map cache.
> 
> Then autodefrag get triggered, it tries to get the extent map for offset
> 0, and got an em, with generation 0.
> 
> Since 0 (unpopulated em::generation) < 500 (inode_defrag::transid), the
> extent doesn't need to be defragged.
> 
> But in fact, these new extents at file offset 0 and onward all have
> generation newer than 510, and can be defragged.
> 
> 
> Although this is opposite what we're chasing, it will cause more IO,
> instead of less...

Right, when I said it didn't fix anything, it was in regards to 5.15 and
previous kernels, since the only code that checks for generations in
extent maps is the fsync code. But the extent maps it looks at are always
in the list of modified extents, so their generation is never 0.

Another quick look, and I can't find anything else relying on the generation
of extent maps that could break a 0 value.

Yes, this is the opposite of the problems being reported regarding excessive
IO. With the new defrag code from 5.16 this will often cause even more IO.

Looks fine to me, but I think right now the priority should be to find out
why is so much more IO being triggered.

I still think the change log should be phrased differently, specially parts
like "Not really sure if there is any behavior really get affected.", as
that doesn't sound very good.

Once that's updated:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.


> 
> Thanks,
> Qu
> 
> 
> 
> > As I pointed out in the other
> > thread, the extent map generation is basically only I used by fsync, which
> > doesn't use extent maps that are not the in the list of modified extents
> > (and those always have a generation > 0).
> > 
> > Thnaks.
> > 
> > > 
> > > Signed-off-by: Qu Wenruo <wqu@suse.com>
> > > ---
> > >   fs/btrfs/file-item.c | 1 +
> > >   1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> > > index 90c5c38836ab..9a3de652ada8 100644
> > > --- a/fs/btrfs/file-item.c
> > > +++ b/fs/btrfs/file-item.c
> > > @@ -1211,6 +1211,7 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
> > >   	extent_start = key.offset;
> > >   	extent_end = btrfs_file_extent_end(path);
> > >   	em->ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
> > > +	em->generation = btrfs_file_extent_generation(leaf, fi);
> > >   	if (type == BTRFS_FILE_EXTENT_REG ||
> > >   	    type == BTRFS_FILE_EXTENT_PREALLOC) {
> > >   		em->start = extent_start;
> > > --
> > > 2.35.0
> > > 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: populate extent_map::generation when reading from disk
  2022-02-08 16:34     ` Filipe Manana
@ 2022-02-09  0:41       ` Qu Wenruo
  0 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2022-02-09  0:41 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Qu Wenruo, linux-btrfs



On 2022/2/9 00:34, Filipe Manana wrote:
> On Mon, Feb 07, 2022 at 07:39:06PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2022/2/7 18:52, Filipe Manana wrote:
>>> On Sat, Feb 05, 2022 at 04:55:47PM +0800, Qu Wenruo wrote:
>>>> [WEIRD BEHAVIOR]
>>>>
>>>> When btrfs_get_extent() tries to get some file extent from disk, it
>>>> never populates extent_map::generation , leaving the value to be 0.
>>>>
>>>> On the other hand, for extent map generated by IO, it will get its
>>>> generation properly set at finish_ordered_io()
>>>>
>>>>    finish_ordered_io()
>>>>    |- unpin_extent_cache(gen = trans->transid)
>>>>       |- em->generation = gen;
>>>>
>>>> [REGRESSION?]
>>>> I have no idea when such behavior is introduced, but at least in v5.15
>>>> this incorrect behavior is already there.
>>>
>>> The extent map generation is basically only used by the fsync code, but
>>> as it deals only with modified extents, it always sees non-zero generation.
>>>
>>>>
>>>> [AFFECT]
>>>> Not really sure if there is any behavior really get affected.
>>>
>>> affect -> effect
>>>
>>> No, I don't think it affects anything.
>>>
>>>>
>>>> Sure there are locations like extent map merging, but there is no value
>>>> smaller than 0 for u64, thus it won't really cause a difference.
>>>>
>>>> For autodefrag, although it's checking em->generation to determine if we
>>>> need to defrag a range, but that @new_than value is always from IO, thus
>>>
>>> This is confusing.
>>> You mean the minimum generation threshold for autodefrag. Referring to
>>> a function parameter (and it's named "newer_than") out of context, is
>>> hard to follow.
>>>
>>>
>>>> all those extent maps with 0 generation will just be skipped, and that's
>>>> the expected behavior anyway.
>>>>
>>>> For manual defrag, @newer_than is 0, and our check is to skip generation
>>>> smaller than @newer_than, thus it still makes no difference.
>>>
>>> Same here, saying the minimum generation threshold for defrag is more
>>> informative than referring to the name of a function parameter. A function
>>> that is not even touched by the patch makes it hard to understand.
>>>
>>>>
>>>> [FIX]
>>>> To make things less weird, let us populate extent_map::generation in
>>>> btrfs_extent_item_to_extent_map().
>>>
>>> Looks good.
>>> Though I don't think this fixes anything.
>>
>> I'm considering the following extreme corner case, which this patch may
>> make a difference (although to cause extra IO)
>>
>> E.g.
>> Root 5, last_trans = 500.
>>
>> In transaction 510, we write back some data for inode A of root 5, the
>> extent is at file offset 0 len 32K, triggering autodefrag to create an
>> inode_defrag structure with transid 500 (from root 5 last_trans).
>>
>> Then we do some fragmented writeback following inode A file offset 32K,
>> they all happen before cleaner get triggered.
>>
>> Before cleaner get triggered, fd for inode A is closed, and all cache is
>> dropped, including the extent map cache.
>>
>> Then autodefrag get triggered, it tries to get the extent map for offset
>> 0, and got an em, with generation 0.
>>
>> Since 0 (unpopulated em::generation) < 500 (inode_defrag::transid), the
>> extent doesn't need to be defragged.
>>
>> But in fact, these new extents at file offset 0 and onward all have
>> generation newer than 510, and can be defragged.
>>
>>
>> Although this is opposite what we're chasing, it will cause more IO,
>> instead of less...
>
> Right, when I said it didn't fix anything, it was in regards to 5.15 and
> previous kernels, since the only code that checks for generations in
> extent maps is the fsync code. But the extent maps it looks at are always
> in the list of modified extents, so their generation is never 0.
>
> Another quick look, and I can't find anything else relying on the generation
> of extent maps that could break a 0 value.
>
> Yes, this is the opposite of the problems being reported regarding excessive
> IO. With the new defrag code from 5.16 this will often cause even more IO.
>
> Looks fine to me, but I think right now the priority should be to find out
> why is so much more IO being triggered.

Yes, that's still my top list goal.

But unfortunately I don't yet have any good reproducer for this problem.

So for now I can only rely on kind reporters to test with ftrace patches
to show every range we choose to defrag...

Thanks,
Qu

>
> I still think the change log should be phrased differently, specially parts
> like "Not really sure if there is any behavior really get affected.", as
> that doesn't sound very good.
>
> Once that's updated:
>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>
> Thanks.
>
>
>>
>> Thanks,
>> Qu
>>
>>
>>
>>> As I pointed out in the other
>>> thread, the extent map generation is basically only I used by fsync, which
>>> doesn't use extent maps that are not the in the list of modified extents
>>> (and those always have a generation > 0).
>>>
>>> Thnaks.
>>>
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> ---
>>>>    fs/btrfs/file-item.c | 1 +
>>>>    1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
>>>> index 90c5c38836ab..9a3de652ada8 100644
>>>> --- a/fs/btrfs/file-item.c
>>>> +++ b/fs/btrfs/file-item.c
>>>> @@ -1211,6 +1211,7 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
>>>>    	extent_start = key.offset;
>>>>    	extent_end = btrfs_file_extent_end(path);
>>>>    	em->ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
>>>> +	em->generation = btrfs_file_extent_generation(leaf, fi);
>>>>    	if (type == BTRFS_FILE_EXTENT_REG ||
>>>>    	    type == BTRFS_FILE_EXTENT_PREALLOC) {
>>>>    		em->start = extent_start;
>>>> --
>>>> 2.35.0
>>>>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-02-09  0:42 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-05  8:55 [PATCH] btrfs: populate extent_map::generation when reading from disk Qu Wenruo
2022-02-07 10:52 ` Filipe Manana
2022-02-07 11:39   ` Qu Wenruo
2022-02-08 16:34     ` Filipe Manana
2022-02-09  0:41       ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).