* [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag
@ 2022-01-25 6:50 Qu Wenruo
  2022-01-25 6:50 ` [POC for v5.15 1/2] btrfs: defrag: don't defrag preallocated extents Qu Wenruo
                  ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-01-25 6:50 UTC (permalink / raw)
To: linux-btrfs

** DON'T MERGE, THIS IS JUST A PROOF OF CONCEPT **

There are several reports that btrfs autodefrag in v5.16 is causing more
IO than in v5.15.

But it turns out that commit f458a3873ae ("btrfs: fix race when
defragmenting leads to unnecessary IO") made defrag do less work than it
should, thus damping the IO for autodefrag.

This POC series makes the v5.15 kernel do a proper defrag of all good
candidates while still not defragging any hole/preallocated range.

The test script looks like this:

	wipefs -fa $dev
	mkfs.btrfs -f $dev -U $uuid > /dev/null
	mount $dev $mnt -o autodefrag
	$fsstress -w -n 2000 -p 1 -d $mnt -s 1642319517
	sync
	echo "=== baseline ==="
	cat /sys/fs/btrfs/$uuid/debug/io_accounting/data_write
	echo 0 > /sys/fs/btrfs/$uuid/debug/cleaner_trigger
	sleep 3
	sync
	echo "=== after autodefrag ==="
	cat /sys/fs/btrfs/$uuid/debug/io_accounting/data_write
	umount $mnt

<uuid>/debug/io_accounting/data_write is a new debug feature showing how
many bytes have been written for a btrfs filesystem.
The numbers are taken before chunk mapping.
cleaner_trigger is the trigger to wake up cleaner_kthread so autodefrag
can do its work.

Now here are the results:

                | Data bytes written | Diff to baseline
----------------+--------------------+------------------
 no autodefrag  |           36896768 |      0
 v5.15 vanilla  |           40079360 |  +8.6%
 v5.15 POC      |           42491904 | +15.2%
 v5.16 fixes    |           42536960 | +15.3%

The data shows that, although vanilla v5.15 really causes the least
amount of IO for autodefrag, once v5.15 is patched with this POC to do a
proper defrag, the final IO is almost the same as v5.16 with the
submitted fixes.
So this proves that the lower IO of v5.15 is not a valid baseline, but
the result of a regression which leads to a less efficient defrag.

And the IO increase is in fact proof of a regression being fixed.

Qu Wenruo (2):
  btrfs: defrag: don't defrag preallocated extents
  btrfs: defrag: limit cluster size to the first hole/prealloc range

 fs/btrfs/ioctl.c | 48 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 42 insertions(+), 6 deletions(-)

-- 
2.34.1

^ permalink raw reply	[flat|nested] 18+ messages in thread
* [POC for v5.15 1/2] btrfs: defrag: don't defrag preallocated extents
  2022-01-25 6:50 [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag Qu Wenruo
@ 2022-01-25 6:50 ` Qu Wenruo
  2022-01-25 6:50 ` [POC for v5.15 2/2] btrfs: defrag: limit cluster size to the first hole/prealloc range Qu Wenruo
  2022-01-25 10:37 ` [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag Filipe Manana
  2 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-01-25 6:50 UTC (permalink / raw)
To: linux-btrfs

This is the same patch as "btrfs: defrag: don't try to merge regular
extents with preallocated extents", but for v5.15.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ioctl.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0ae5777bf056..8350864a4bd8 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1152,19 +1152,23 @@ static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em,
 				     bool locked)
 {
 	struct extent_map *next;
-	bool ret = true;
+	bool ret = false;
 
 	/* this is the last extent */
 	if (em->start + em->len >= i_size_read(inode))
-		return false;
+		return ret;
 
 	next = defrag_lookup_extent(inode, em->start + em->len, locked);
 	if (!next || next->block_start >= EXTENT_MAP_LAST_BYTE)
-		ret = false;
-	else if ((em->block_start + em->block_len == next->block_start) &&
-		 (em->block_len > SZ_128K && next->block_len > SZ_128K))
-		ret = false;
+		goto out;
+	if (test_bit(EXTENT_FLAG_COMPRESSED, &next->flags))
+		goto out;
+	if ((em->block_start + em->block_len == next->block_start) &&
+	    (em->block_len > SZ_128K && next->block_len > SZ_128K))
+		goto out;
 
+	ret = true;
+out:
 	free_extent_map(next);
 	return ret;
 }
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread
* [POC for v5.15 2/2] btrfs: defrag: limit cluster size to the first hole/prealloc range
  2022-01-25 6:50 [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag Qu Wenruo
  2022-01-25 6:50 ` [POC for v5.15 1/2] btrfs: defrag: don't defrag preallocated extents Qu Wenruo
@ 2022-01-25 6:50 ` Qu Wenruo
  2022-01-25 10:37 ` [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag Filipe Manana
  2 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-01-25 6:50 UTC (permalink / raw)
To: linux-btrfs

We always use a 256K cluster size if possible, but it's very common that
there are only several small regular extents which are targets for
defrag.

Those extents do not fill the whole 256K cluster, and the holes after
them will make defrag reject the whole cluster.

This patch mitigates the problem by doing another search for the first
hole/preallocated range, and limiting the size of the cluster to just
the regular extents part.

With this, defrag can do more of its work without being interrupted by a
hole.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ioctl.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 8350864a4bd8..b66bb10e2e4a 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1481,6 +1481,29 @@ static int cluster_pages_for_defrag(struct inode *inode,
 
 }
 
+static u64 get_cluster_size(struct inode *inode, u64 start, u64 len)
+{
+	u64 cur = start;
+	u64 cluster_len = 0;
+
+	while (cur < start + len) {
+		struct extent_map *em;
+
+		em = defrag_lookup_extent(inode, cur, false);
+		if (!em)
+			break;
+		/*
+		 * Here we don't do comprehensive checks, we just
+		 * find the first hole/preallocated.
+		 */
+		if (em->block_start >= EXTENT_MAP_LAST_BYTE ||
+		    test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
+			break;
+		cluster_len += min(em->start + em->len - cur, start + len - cur);
+		cur = min(em->start + em->len, start + len);
+	}
+	return cluster_len;
+}
+
 /*
  * Entry point to file defragmentation.
  *
@@ -1618,6 +1641,13 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
 		} else {
 			cluster = max_cluster;
 		}
+		cluster = min_t(unsigned long,
+				get_cluster_size(inode, i << PAGE_SHIFT,
+						 cluster << PAGE_SHIFT)
+					>> PAGE_SHIFT,
+				cluster);
+		if (cluster == 0)
+			goto next;
 
 		if (i + cluster > ra_index) {
 			ra_index = max(i, ra_index);
@@ -1644,6 +1674,7 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
 		balance_dirty_pages_ratelimited(inode->i_mapping);
 		btrfs_inode_unlock(inode, 0);
 
+next:
 		if (newer_than) {
 			if (newer_off == (u64)-1)
 				break;
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread
* Re: [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag
  2022-01-25 6:50 [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag Qu Wenruo
  2022-01-25 6:50 ` [POC for v5.15 1/2] btrfs: defrag: don't defrag preallocated extents Qu Wenruo
  2022-01-25 6:50 ` [POC for v5.15 2/2] btrfs: defrag: limit cluster size to the first hole/prealloc range Qu Wenruo
@ 2022-01-25 10:37 ` Filipe Manana
  2022-01-25 10:55   ` Qu Wenruo
  2 siblings, 1 reply; 18+ messages in thread
From: Filipe Manana @ 2022-01-25 10:37 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Jan 25, 2022 at 02:50:55PM +0800, Qu Wenruo wrote:
> ** DON'T MERGE, THIS IS JUST A PROOF OF CONCEPT **
>
> There are several reports about v5.16 btrfs autodefrag is causing more
> IO than v5.15.
>
> But it turns out that, commit f458a3873ae ("btrfs: fix race when
> defragmenting leads to unnecessary IO") is making defrags doing less
> work than it should.
> Thus damping the IO for autodefrag.
>
> This POC series is to make v5.15 kernel to do proper defrag of all good
> candidates while still not defrag any hole/preallocated range.
>
> The test script here looks like this:
>
> 	wipefs -fa $dev
> 	mkfs.btrfs -f $dev -U $uuid > /dev/null
> 	mount $dev $mnt -o autodefrag
> 	$fsstress -w -n 2000 -p 1 -d $mnt -s 1642319517
> 	sync
> 	echo "=== baseline ==="
> 	cat /sys/fs/btrfs/$uuid/debug/io_accounting/data_write
> 	echo 0 > /sys/fs/btrfs/$uuid/debug/cleaner_trigger
> 	sleep 3
> 	sync
> 	echo "=== after autodefrag ==="
> 	cat /sys/fs/btrfs/$uuid/debug/io_accounting/data_write
> 	umount $mnt
>
> <uuid>/debug/io_accounting/data_write is the new debug features showing
> how many bytes has been written for a btrfs.
> The numbers are before chunk mapping.
> cleaer_trigger is the trigger to wake up cleaner_kthread so autodefrag
> can do its work.
>
> Now there is result:
>
>                 | Data bytes written | Diff to baseline
> ----------------+--------------------+------------------
>  no autodefrag  |           36896768 |      0
>  v5.15 vanilla  |           40079360 |  +8.6%
>  v5.15 POC      |           42491904 | +15.2%
>  v5.16 fixes    |           42536960 | +15.3%
>
> The data shows, although v5.15 vanilla is really causing the least
> amount of IO for autodefrag, if v5.15 is patched with POC to do proper
> defrag, the final IO is almost the same as v5.16 with submitted fixes.
>
> So this proves that, the v5.15 has lower IO is not a valid default, but
> a regression which leads to less efficient defrag.
>
> And the IO increase is in fact a proof of a regression being fixed.

Are you sure that's the only thing?

Users report massive IO difference, 15% more does not seem to be massive.
François for example reported a difference of 10 ops/s vs 1k ops/s [1]

It also does not explain the 100% cpu usage of the cleaner kthread.
Scanning the whole file based on extent maps and not using
btrfs_search_forward() anymore, as discussed yesterday on slack, can
however contribute to much higher cpu usage.

[1] https://lore.kernel.org/linux-btrfs/CAEwRaO4y3PPPUdwYjNDoB9m9CLzfd3DFFk2iK1X6OyyEWG5-mg@mail.gmail.com/

Thanks.

>
> Qu Wenruo (2):
>   btrfs: defrag: don't defrag preallocated extents
>   btrfs: defrag: limit cluster size to the first hole/prealloc range
>
>  fs/btrfs/ioctl.c | 48 ++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 42 insertions(+), 6 deletions(-)
>
> -- 
> 2.34.1
>

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag
  2022-01-25 10:37 ` [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag Filipe Manana
@ 2022-01-25 10:55   ` Qu Wenruo
  2022-01-25 11:05     ` Qu Wenruo
  0 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2022-01-25 10:55 UTC (permalink / raw)
To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs

On 2022/1/25 18:37, Filipe Manana wrote:
> On Tue, Jan 25, 2022 at 02:50:55PM +0800, Qu Wenruo wrote:
>> ** DON'T MERGE, THIS IS JUST A PROOF OF CONCEPT **
>>
>> There are several reports about v5.16 btrfs autodefrag is causing more
>> IO than v5.15.
>>
>> But it turns out that, commit f458a3873ae ("btrfs: fix race when
>> defragmenting leads to unnecessary IO") is making defrags doing less
>> work than it should.
>> Thus damping the IO for autodefrag.
>>
>> This POC series is to make v5.15 kernel to do proper defrag of all good
>> candidates while still not defrag any hole/preallocated range.
>>
>> The test script here looks like this:
>>
>> 	wipefs -fa $dev
>> 	mkfs.btrfs -f $dev -U $uuid > /dev/null
>> 	mount $dev $mnt -o autodefrag
>> 	$fsstress -w -n 2000 -p 1 -d $mnt -s 1642319517
>> 	sync
>> 	echo "=== baseline ==="
>> 	cat /sys/fs/btrfs/$uuid/debug/io_accounting/data_write
>> 	echo 0 > /sys/fs/btrfs/$uuid/debug/cleaner_trigger
>> 	sleep 3
>> 	sync
>> 	echo "=== after autodefrag ==="
>> 	cat /sys/fs/btrfs/$uuid/debug/io_accounting/data_write
>> 	umount $mnt
>>
>> <uuid>/debug/io_accounting/data_write is the new debug features showing
>> how many bytes has been written for a btrfs.
>> The numbers are before chunk mapping.
>> cleaer_trigger is the trigger to wake up cleaner_kthread so autodefrag
>> can do its work.
>>
>> Now there is result:
>>
>>                 | Data bytes written | Diff to baseline
>> ----------------+--------------------+------------------
>>  no autodefrag  |           36896768 |      0
>>  v5.15 vanilla  |           40079360 |  +8.6%
>>  v5.15 POC      |           42491904 | +15.2%
>>  v5.16 fixes    |           42536960 | +15.3%
>>
>> The data shows, although v5.15 vanilla is really causing the least
>> amount of IO for autodefrag, if v5.15 is patched with POC to do proper
>> defrag, the final IO is almost the same as v5.16 with submitted fixes.
>>
>> So this proves that, the v5.15 has lower IO is not a valid default, but
>> a regression which leads to less efficient defrag.
>>
>> And the IO increase is in fact a proof of a regression being fixed.
>
> Are you sure that's the only thing?

Not the only thing, but it proves that the v5.15 baseline is not a
reliable one.

> Users report massive IO difference, 15% more does not seem to be massive.
> François for example reported a difference of 10 ops/s vs 1k ops/s [1]

This is just for the seed I'm using, as it provides a reliable and
constant baseline which all my previous testing and debugging are based
on.

It can definitely be way more IO if the load involves more full-cluster
rejection.

>
> It also does not explain the 100% cpu usage of the cleaner kthread.
> Scanning the whole file based on extent maps and not using
> btrfs_search_forward() anymore, as discussed yesterday on slack, can
> however contribute to much higher cpu usage.

That's definitely one possible reason.

But this particular analysis has io_accounting focused on data_write,
so metadata can definitely play its part in it.

Nevertheless, this already shows that there are problems in the old
autodefrag path which were never really exposed.

Thanks,
Qu

>
> [1] https://lore.kernel.org/linux-btrfs/CAEwRaO4y3PPPUdwYjNDoB9m9CLzfd3DFFk2iK1X6OyyEWG5-mg@mail.gmail.com/
>
> Thanks.
>
>>
>> Qu Wenruo (2):
>>   btrfs: defrag: don't defrag preallocated extents
>>   btrfs: defrag: limit cluster size to the first hole/prealloc range
>>
>>  fs/btrfs/ioctl.c | 48 ++++++++++++++++++++++++++++++++++++++++++------
>>  1 file changed, 42 insertions(+), 6 deletions(-)
>>
>> -- 
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag
  2022-01-25 10:55   ` Qu Wenruo
@ 2022-01-25 11:05     ` Qu Wenruo
  2022-01-25 19:58       ` François-Xavier Thomas
  0 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2022-01-25 11:05 UTC (permalink / raw)
To: Filipe Manana, Qu Wenruo, François-Xavier Thomas; +Cc: linux-btrfs

On 2022/1/25 18:55, Qu Wenruo wrote:
>
>
> On 2022/1/25 18:37, Filipe Manana wrote:
>> On Tue, Jan 25, 2022 at 02:50:55PM +0800, Qu Wenruo wrote:
>>> ** DON'T MERGE, THIS IS JUST A PROOF OF CONCEPT **
>>>
>>> There are several reports about v5.16 btrfs autodefrag is causing more
>>> IO than v5.15.
>>>
>>> But it turns out that, commit f458a3873ae ("btrfs: fix race when
>>> defragmenting leads to unnecessary IO") is making defrags doing less
>>> work than it should.
>>> Thus damping the IO for autodefrag.
>>>
>>> This POC series is to make v5.15 kernel to do proper defrag of all good
>>> candidates while still not defrag any hole/preallocated range.
>>>
>>> The test script here looks like this:
>>>
>>> 	wipefs -fa $dev
>>> 	mkfs.btrfs -f $dev -U $uuid > /dev/null
>>> 	mount $dev $mnt -o autodefrag
>>> 	$fsstress -w -n 2000 -p 1 -d $mnt -s 1642319517
>>> 	sync
>>> 	echo "=== baseline ==="
>>> 	cat /sys/fs/btrfs/$uuid/debug/io_accounting/data_write
>>> 	echo 0 > /sys/fs/btrfs/$uuid/debug/cleaner_trigger
>>> 	sleep 3
>>> 	sync
>>> 	echo "=== after autodefrag ==="
>>> 	cat /sys/fs/btrfs/$uuid/debug/io_accounting/data_write
>>> 	umount $mnt
>>>
>>> <uuid>/debug/io_accounting/data_write is the new debug features showing
>>> how many bytes has been written for a btrfs.
>>> The numbers are before chunk mapping.
>>> cleaer_trigger is the trigger to wake up cleaner_kthread so autodefrag
>>> can do its work.
>>>
>>> Now there is result:
>>>
>>>                 | Data bytes written | Diff to baseline
>>> ----------------+--------------------+------------------
>>>  no autodefrag  |           36896768 |      0
>>>  v5.15 vanilla  |           40079360 |  +8.6%
>>>  v5.15 POC      |           42491904 | +15.2%
>>>  v5.16 fixes    |           42536960 | +15.3%
>>>
>>> The data shows, although v5.15 vanilla is really causing the least
>>> amount of IO for autodefrag, if v5.15 is patched with POC to do proper
>>> defrag, the final IO is almost the same as v5.16 with submitted fixes.
>>>
>>> So this proves that, the v5.15 has lower IO is not a valid default, but
>>> a regression which leads to less efficient defrag.
>>>
>>> And the IO increase is in fact a proof of a regression being fixed.
>>
>> Are you sure that's the only thing?
>
> Not the only thing, but it's proving the baseline of v5.15 is not a
> reliable one.
>
>> Users report massive IO difference, 15% more does not seem to be massive.
>> François for example reported a difference of 10 ops/s vs 1k ops/s [1]
>
> This is just for the seed I'm using.
> As it provides a reliable and constant baseline where all my previous
> testing and debugging are based on.
>
> It can be definitely way more IO if the load involves more full cluster
> rejection.
>
>>
>> It also does not explain the 100% cpu usage of the cleaner kthread.
>> Scanning the whole file based on extent maps and not using
>> btrfs_search_forward() anymore, as discussed yesterday on slack, can
>> however contribute to much higher cpu usage.
>
> That's definitely one possible reason.
>
> But this particular analyse has io_accounting focused on data_write,
> thus metadata can definitely have its part in it.

BTW, with the io_accounting debugging feature, we can easily distinguish
metadata read/write from the data read/write caused by fsstress and
autodefrag.

Although in my environment the nr_ops are pretty small, so the whole
metadata can easily be cached in memory, and the btrfs_search_forward()
problem is not that observable here.

For a real-world proof, François, would you mind testing v5.15 with
these two patches applied, to see if there is a difference in IO
compared to vanilla v5.15?

Thanks,
Qu

>
> Nevertheless, this has already shows there are problems in the old
> autodefrag path and not really exposed.
>
> Thanks,
> Qu
>
>>
>> [1]
>> https://lore.kernel.org/linux-btrfs/CAEwRaO4y3PPPUdwYjNDoB9m9CLzfd3DFFk2iK1X6OyyEWG5-mg@mail.gmail.com/
>>
>> Thanks.
>>
>>>
>>> Qu Wenruo (2):
>>>   btrfs: defrag: don't defrag preallocated extents
>>>   btrfs: defrag: limit cluster size to the first hole/prealloc range
>>>
>>>  fs/btrfs/ioctl.c | 48 ++++++++++++++++++++++++++++++++++++++++++------
>>>  1 file changed, 42 insertions(+), 6 deletions(-)
>>>
>>> -- 
>>> 2.34.1
>>>

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag
  2022-01-25 11:05     ` Qu Wenruo
@ 2022-01-25 19:58       ` François-Xavier Thomas
  2022-02-01 21:18         ` François-Xavier Thomas
  0 siblings, 1 reply; 18+ messages in thread
From: François-Xavier Thomas @ 2022-01-25 19:58 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Filipe Manana, Qu Wenruo, linux-btrfs

Hi Qu, all,

> For the real world proof, François, mind to test v5.15 with these two
> patches applied to see if there is a difference in IO, compared to v5.15
> vanilla?

Sure, I'll take the last 5.15 I used (5.15.13) as a base + the 2 patches
from this thread, is that good?
  1-btrfs-dont-defrag-preallocated-extents.patch
  2-defrag-limit-cluster-size-to-first-hole.patch

Will post results tomorrow when it finishes compiling.

François-Xavier

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag
  2022-01-25 19:58       ` François-Xavier Thomas
@ 2022-02-01 21:18         ` François-Xavier Thomas
  2022-02-02  0:35           ` Qu Wenruo
  0 siblings, 1 reply; 18+ messages in thread
From: François-Xavier Thomas @ 2022-02-01 21:18 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Filipe Manana, Qu Wenruo, linux-btrfs

> Sure, I'll take the last 5.15 I used (5.15.13) as a base + the 2
> patches from this thread, is that good?
>   1-btrfs-dont-defrag-preallocated-extents.patch
>   2-defrag-limit-cluster-size-to-first-hole.patch
>
> Will post results tomorrow when it finishes compiling.

Tomorrow ended up taking a week, as it happens, sorry for that. No
significant difference compared to before:
https://i.imgur.com/BNoxixL.png

I'm not sure what we can infer from this; if I understood correctly,
this should have shown similar amounts of I/O to 5.16 with all the
previous fixes, right?

I pasted the full diff I used at the end of this mail. Since
`defrag_lookup_extent` changed signature between 5.15 and 5.16, I had
to make a couple of minor changes to get it to compile on 5.15.13.
Hopefully that was what you initially intended on testing.

Thanks,
François-Xavier

----

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index cc61813213d8..07eb7417b6e7 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1119,19 +1119,23 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start)
 static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em)
 {
 	struct extent_map *next;
-	bool ret = true;
+	bool ret = false;
 
 	/* this is the last extent */
 	if (em->start + em->len >= i_size_read(inode))
-		return false;
+		return ret;
 
 	next = defrag_lookup_extent(inode, em->start + em->len);
 	if (!next || next->block_start >= EXTENT_MAP_LAST_BYTE)
-		ret = false;
-	else if ((em->block_start + em->block_len == next->block_start) &&
-		 (em->block_len > SZ_128K && next->block_len > SZ_128K))
-		ret = false;
+		goto out;
+	if (test_bit(EXTENT_FLAG_COMPRESSED, &next->flags))
+		goto out;
+	if ((em->block_start + em->block_len == next->block_start) &&
+	    (em->block_len > SZ_128K && next->block_len > SZ_128K))
+		goto out;
 
+	ret = true;
+out:
 	free_extent_map(next);
 	return ret;
 }
@@ -1403,6 +1407,30 @@ static int cluster_pages_for_defrag(struct inode *inode,
 
 }
 
+static u64 get_cluster_size(struct inode *inode, u64 start, u64 len)
+{
+	u64 cur = start;
+	u64 cluster_len = 0;
+
+	while (cur < start + len) {
+		struct extent_map *em;
+
+		em = defrag_lookup_extent(inode, cur);
+		if (!em)
+			break;
+		/*
+		 * Here we don't do comprehensive checks, we just
+		 * find the first hole/preallocated.
+		 */
+		if (em->block_start >= EXTENT_MAP_LAST_BYTE ||
+		    test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
+			break;
+		cluster_len += min(em->start + em->len - cur, start + len - cur);
+		cur = min(em->start + em->len, start + len);
+	}
+	return cluster_len;
+}
+
+
 int btrfs_defrag_file(struct inode *inode, struct file *file,
		      struct btrfs_ioctl_defrag_range_args *range,
		      u64 newer_than, unsigned long max_to_defrag)
@@ -1531,6 +1559,13 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		} else {
 			cluster = max_cluster;
 		}
+		cluster = min_t(unsigned long,
+				get_cluster_size(inode, i << PAGE_SHIFT,
+						 cluster << PAGE_SHIFT)
+					>> PAGE_SHIFT,
+				cluster);
+		if (cluster == 0)
+			goto next;
 
 		if (i + cluster > ra_index) {
 			ra_index = max(i, ra_index);
@@ -1557,6 +1592,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		balance_dirty_pages_ratelimited(inode->i_mapping);
 		btrfs_inode_unlock(inode, 0);
 
+next:
 		if (newer_than) {
 			if (newer_off == (u64)-1)
 				break;

^ permalink raw reply related	[flat|nested] 18+ messages in thread
* Re: [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag
  2022-02-01 21:18         ` François-Xavier Thomas
@ 2022-02-02  0:35           ` Qu Wenruo
  2022-02-02  1:18             ` Qu Wenruo
  0 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2022-02-02  0:35 UTC (permalink / raw)
To: François-Xavier Thomas; +Cc: Filipe Manana, Qu Wenruo, linux-btrfs

On 2022/2/2 05:18, François-Xavier Thomas wrote:
>> Sure, I'll take the last 5.15 I used (5.15.13) as a base + the 2
>> patches from this thread, is that good?
>>   1-btrfs-dont-defrag-preallocated-extents.patch
>>   2-defrag-limit-cluster-size-to-first-hole.patch
>>
>> Will post results tomorrow when it finishes compiling.
>
> Tomorrow ended up taking a week, as it happens, sorry for that. No
> significant difference compared to before:
> https://i.imgur.com/BNoxixL.png
>
> I'm not sure what we can infer from this, if I understood correctly
> this should have shown similar amounts of I/O to 5.16 with all the
> previous fixes, right?

OK, things are getting more complex now.

Unless your workload includes quite a few preallocated ranges, this is
definitely not what I expected.

As a final attempt, would you mind testing the small diff below (on its
own, without any of the previous diffs) against v5.15?

Thanks,
Qu

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index cc61813213d8..bcc075ea56c9 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1528,8 +1528,6 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 			cluster = (PAGE_ALIGN(defrag_end) >>
 				   PAGE_SHIFT) - i;
 			cluster = min(cluster, max_cluster);
-		} else {
-			cluster = max_cluster;
 		}
 
 		if (i + cluster > ra_index) {

>
> I pasted the full diff I used at the end of this mail, since
> `defrag_lookup_extent` changed signature between 5.15 and 5.16 I had
> to do a couple of minor changes to get it to compile on 5.15.13.
> Hopefully that was what you initially intended on testing.
>
> Thanks,
> François-Xavier
>
> ----
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index cc61813213d8..07eb7417b6e7 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1119,19 +1119,23 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start)
>  static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em)
>  {
>  	struct extent_map *next;
> -	bool ret = true;
> +	bool ret = false;
> 
>  	/* this is the last extent */
>  	if (em->start + em->len >= i_size_read(inode))
> -		return false;
> +		return ret;
> 
>  	next = defrag_lookup_extent(inode, em->start + em->len);
>  	if (!next || next->block_start >= EXTENT_MAP_LAST_BYTE)
> -		ret = false;
> -	else if ((em->block_start + em->block_len == next->block_start) &&
> -		 (em->block_len > SZ_128K && next->block_len > SZ_128K))
> -		ret = false;
> +		goto out;
> +	if (test_bit(EXTENT_FLAG_COMPRESSED, &next->flags))
> +		goto out;
> +	if ((em->block_start + em->block_len == next->block_start) &&
> +	    (em->block_len > SZ_128K && next->block_len > SZ_128K))
> +		goto out;
> 
> +	ret = true;
> +out:
>  	free_extent_map(next);
>  	return ret;
>  }
> @@ -1403,6 +1407,30 @@ static int cluster_pages_for_defrag(struct inode *inode,
> 
>  }
> 
> +static u64 get_cluster_size(struct inode *inode, u64 start, u64 len)
> +{
> +	u64 cur = start;
> +	u64 cluster_len = 0;
> +
> +	while (cur < start + len) {
> +		struct extent_map *em;
> +
> +		em = defrag_lookup_extent(inode, cur);
> +		if (!em)
> +			break;
> +		/*
> +		 * Here we don't do comprehensive checks, we just
> +		 * find the first hole/preallocated.
> +		 */
> +		if (em->block_start >= EXTENT_MAP_LAST_BYTE ||
> +		    test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
> +			break;
> +		cluster_len += min(em->start + em->len - cur, start + len - cur);
> +		cur = min(em->start + em->len, start + len);
> +	}
> +	return cluster_len;
> +}
> +
> +
>  int btrfs_defrag_file(struct inode *inode, struct file *file,
> 		      struct btrfs_ioctl_defrag_range_args *range,
> 		      u64 newer_than, unsigned long max_to_defrag)
> @@ -1531,6 +1559,13 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
>  		} else {
>  			cluster = max_cluster;
>  		}
> +		cluster = min_t(unsigned long,
> +				get_cluster_size(inode, i << PAGE_SHIFT,
> +						 cluster << PAGE_SHIFT)
> +					>> PAGE_SHIFT,
> +				cluster);
> +		if (cluster == 0)
> +			goto next;
> 
>  		if (i + cluster > ra_index) {
>  			ra_index = max(i, ra_index);
> @@ -1557,6 +1592,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
>  		balance_dirty_pages_ratelimited(inode->i_mapping);
>  		btrfs_inode_unlock(inode, 0);
> 
> +next:
>  		if (newer_than) {
>  			if (newer_off == (u64)-1)
>  				break;

^ permalink raw reply related	[flat|nested] 18+ messages in thread
* Re: [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag
  2022-02-02  0:35           ` Qu Wenruo
@ 2022-02-02  1:18             ` Qu Wenruo
  2022-02-02 19:01               ` François-Xavier Thomas
  0 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2022-02-02  1:18 UTC (permalink / raw)
To: François-Xavier Thomas; +Cc: Filipe Manana, Qu Wenruo, linux-btrfs

On 2022/2/2 08:35, Qu Wenruo wrote:
>
>
> On 2022/2/2 05:18, François-Xavier Thomas wrote:
>>> Sure, I'll take the last 5.15 I used (5.15.13) as a base + the 2
>>> patches from this thread, is that good?
>>>   1-btrfs-dont-defrag-preallocated-extents.patch
>>>   2-defrag-limit-cluster-size-to-first-hole.patch
>>>
>>> Will post results tomorrow when it finishes compiling.
>>
>> Tomorrow ended up taking a week, as it happens, sorry for that. No
>> significant difference compared to before:
>> https://i.imgur.com/BNoxixL.png
>>
>> I'm not sure what we can infer from this, if I understood correctly
>> this should have shown similar amounts of I/O to 5.16 with all the
>> previous fixes, right?
>
> OK, things are getting more complex now.
>
> Unless your workload includes quite some preallocated range, it's
> definitely not what I expected.
>
> As a final struggle, mind to test the small diff (without any previous
> diff) against v5.15?
>
> Thanks,
> Qu
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index cc61813213d8..bcc075ea56c9 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1528,8 +1528,6 @@ int btrfs_defrag_file(struct inode *inode, struct
> file *file,
>  			cluster = (PAGE_ALIGN(defrag_end) >>
>  				   PAGE_SHIFT) - i;
>  			cluster = min(cluster, max_cluster);
> -		} else {
> -			cluster = max_cluster;
>  		}
> 
>  		if (i + cluster > ra_index) {

Wait, this is not correct. The correct one is this:

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index cc61813213d8..f6f2468d4883 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1524,13 +1524,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 			continue;
 		}
 
-		if (!newer_than) {
-			cluster = (PAGE_ALIGN(defrag_end) >>
-				   PAGE_SHIFT) - i;
-			cluster = min(cluster, max_cluster);
-		} else {
-			cluster = max_cluster;
-		}
+		cluster = (PAGE_ALIGN(defrag_end) >> PAGE_SHIFT) - i;
+		cluster = min(cluster, max_cluster);
 
 		if (i + cluster > ra_index) {
 			ra_index = max(i, ra_index);

>
>> I pasted the full diff I used at the end of this mail, since
>> `defrag_lookup_extent` changed signature between 5.15 and 5.16 I had
>> to do a couple of minor changes to get it to compile on 5.15.13.
>> Hopefully that was what you initially intended on testing.
>>
>> Thanks,
>> François-Xavier
>>
>> ----
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index cc61813213d8..07eb7417b6e7 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -1119,19 +1119,23 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start)
>>  static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em)
>>  {
>>  	struct extent_map *next;
>> -	bool ret = true;
>> +	bool ret = false;
>> 
>>  	/* this is the last extent */
>>  	if (em->start + em->len >= i_size_read(inode))
>> -		return false;
>> +		return ret;
>> 
>>  	next = defrag_lookup_extent(inode, em->start + em->len);
>>  	if (!next || next->block_start >= EXTENT_MAP_LAST_BYTE)
>> -		ret = false;
>> -	else if ((em->block_start + em->block_len == next->block_start) &&
>> -		 (em->block_len > SZ_128K && next->block_len > SZ_128K))
>> -		ret = false;
>> +		goto out;
>> +	if (test_bit(EXTENT_FLAG_COMPRESSED, &next->flags))
>> +		goto out;
>> +	if ((em->block_start + em->block_len == next->block_start) &&
>> +	    (em->block_len > SZ_128K && next->block_len > SZ_128K))
>> +		goto out;
>> 
>> +	ret = true;
>> +out:
>>  	free_extent_map(next);
>>  	return ret;
>>  }
>> @@ -1403,6 +1407,30 @@ static int cluster_pages_for_defrag(struct inode *inode,
>> 
>>  }
>> 
>> +static u64 get_cluster_size(struct inode *inode, u64 start, u64 len)
>> +{
>> +	u64 cur = start;
>> +	u64 cluster_len = 0;
>> +
>> +	while (cur < start + len) {
>> +		struct extent_map *em;
>> +
>> +		em = defrag_lookup_extent(inode, cur);
>> +		if (!em)
>> +			break;
>> +		/*
>> +		 * Here we don't do comprehensive checks, we just
>> +		 * find the first hole/preallocated.
>> +		 */
>> +		if (em->block_start >= EXTENT_MAP_LAST_BYTE ||
>> +		    test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
>> +			break;
>> +		cluster_len += min(em->start + em->len - cur, start + len - cur);
>> +		cur = min(em->start + em->len, start + len);
>> +	}
>> +	return cluster_len;
>> +}
>> +
>> +
>>  int btrfs_defrag_file(struct inode *inode, struct file *file,
>> 		      struct btrfs_ioctl_defrag_range_args *range,
>> 		      u64 newer_than, unsigned long max_to_defrag)
>> @@ -1531,6 +1559,13 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
>>  		} else {
>>  			cluster = max_cluster;
>>  		}
>> +		cluster = min_t(unsigned long,
>> +				get_cluster_size(inode, i << PAGE_SHIFT,
>> +						 cluster << PAGE_SHIFT)
>> +					>> PAGE_SHIFT,
>> +				cluster);
>> +		if (cluster == 0)
>> +			goto next;
>> 
>>  		if (i + cluster > ra_index) {
>>  			ra_index = max(i, ra_index);
>> @@ -1557,6 +1592,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
>>  		balance_dirty_pages_ratelimited(inode->i_mapping);
>>  		btrfs_inode_unlock(inode, 0);
>> 
>> +next:
>>  		if (newer_than) {
>>  			if (newer_off == (u64)-1)
>>  				break;

^ permalink raw reply related	[flat|nested] 18+ messages in thread
* Re: [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag
  2022-02-02  1:18 ` Qu Wenruo
@ 2022-02-02 19:01   ` François-Xavier Thomas
  2022-02-04  9:32     ` François-Xavier Thomas
  0 siblings, 1 reply; 18+ messages in thread
From: François-Xavier Thomas @ 2022-02-02 19:01 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Filipe Manana, Qu Wenruo, linux-btrfs

> As a final struggle, mind testing the small diff (without any of the previous diffs) against v5.15?

Sure, will let you know when I get the results!

François-Xavier
* Re: [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag
  2022-02-02 19:01 ` François-Xavier Thomas
@ 2022-02-04  9:32   ` François-Xavier Thomas
  2022-02-04  9:49     ` Qu Wenruo
  0 siblings, 1 reply; 18+ messages in thread
From: François-Xavier Thomas @ 2022-02-04 9:32 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Filipe Manana, Qu Wenruo, linux-btrfs

The increase in I/O is small but noticeable:
https://i.imgur.com/IJ4lLI2.png

I also checked 5.16.5 just in case, and the difference is still marked.

François-Xavier
* Re: [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag
  2022-02-04  9:32 ` François-Xavier Thomas
@ 2022-02-04  9:49   ` Qu Wenruo
  2022-02-04 11:05     ` François-Xavier Thomas
  0 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2022-02-04 9:49 UTC (permalink / raw)
To: François-Xavier Thomas, Qu Wenruo; +Cc: Filipe Manana, linux-btrfs

On 2022/2/4 17:32, François-Xavier Thomas wrote:
> The increase in I/O is small but noticeable:
> https://i.imgur.com/IJ4lLI2.png
>
> I also checked 5.16.5 just in case, and the difference is still marked.
>
> François-Xavier

OK, then it means there is still something else involved.

But one thing to mention: the color scheme is a little weird to me.
Although the legend shows dark yellow (?) as read bytes/s, in the graph
it looks more like pink. The same applies to dark green (?), which looks
more like regular green for sda write bytes/s.

So what I got from the graph is:

- v5.15.13 vanilla
  Less than 3 MB/s write for autodefrag, with burst reads around 15 MB/s.

- v5.15.13 + final patch
  Approaching 5 MB/s write, and near 20 MB/s burst reads.
  I believe this should be our baseline.

- v5.16.5 vanilla
  10 MB/s write, near 35 MB/s burst reads.

For the write part, I don't really have a better idea yet.

But for the read part, I'm working to bring back two optimizations.

One is to skip large ranges which are not our defrag targets.
That patchset has been sent to the mailing list:
https://patchwork.kernel.org/project/linux-btrfs/list/?series=611214

The other one is to reduce the metadata IO and memory usage for extent
map searching. In theory it should reduce the metadata IO, but I'm not
yet confident enough.

Will give you an update when the testing branch is fully prepared.

Thanks,
Qu
* Re: [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag
  2022-02-04  9:49 ` Qu Wenruo
@ 2022-02-04 11:05   ` François-Xavier Thomas
  2022-02-04 11:21     ` Qu Wenruo
  2022-02-05 10:19     ` Qu Wenruo
  0 siblings, 2 replies; 18+ messages in thread
From: François-Xavier Thomas @ 2022-02-04 11:05 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Qu Wenruo, Filipe Manana, linux-btrfs

> But one thing to mention: the color scheme is a little weird to me.

Good point, here's a custom graph with just the write rate, which
should be easier to read than the default graph with everything on it:
https://i.imgur.com/vlRPOFr.png

Waiting for your next update then. In the meantime, are there other
statistics I should collect that would make this easier to debug?

Thanks,
François-Xavier
* Re: [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag
  2022-02-04 11:05 ` François-Xavier Thomas
@ 2022-02-04 11:21   ` Qu Wenruo
  2022-02-04 11:27     ` François-Xavier Thomas
  0 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2022-02-04 11:21 UTC (permalink / raw)
To: François-Xavier Thomas, Qu Wenruo; +Cc: Filipe Manana, linux-btrfs

On 2022/2/4 19:05, François-Xavier Thomas wrote:
>> But one thing to mention: the color scheme is a little weird to me.
>
> Good point, here's a custom graph with just the write rate, which
> should be easier to read than the default graph with everything on it:
> https://i.imgur.com/vlRPOFr.png

This is indeed way better.

Although the legend now only shows write MB/s, I still see some pink
lines. Do the pink lines still belong to the sda write MB/s? Or are
they just the read MB/s?

And if possible, read MB/s would also help.

> Waiting for your next update then. In the meantime, are there other
> statistics I should collect that would make this easier to debug?

Other debug data can be too large and slow down the system, like ftrace
events for each cluster.

Another thing which could greatly help is sorting the IO by its METADATA
flag (e.g. providing metadata/data read/write MB/s separately).

But unfortunately I don't have any idea how to monitor that without
using eBPF...

Thanks,
Qu
* Re: [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag
  2022-02-04 11:21 ` Qu Wenruo
@ 2022-02-04 11:27   ` François-Xavier Thomas
  2022-02-04 11:28     ` François-Xavier Thomas
  0 siblings, 1 reply; 18+ messages in thread
From: François-Xavier Thomas @ 2022-02-04 11:27 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Qu Wenruo, Filipe Manana, linux-btrfs

> Although the legend now only shows write MB/s, I still see some pink lines.

Zabbix shows slightly different colors for statistics when the timeline
is zoomed out far enough: reddish pink is the maximum, light green is
the minimum; the legend shows the average. That was indeed a little
muddy in the previous graphs!

> read MB/s would also help.

Here's a set of separate graphs with read/write ops/bytes (no real
difference in reads):
https://imgur.com/1Zp47zj

François-Xavier
* Re: [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag
  2022-02-04 11:27 ` François-Xavier Thomas
@ 2022-02-04 11:28   ` François-Xavier Thomas
  0 siblings, 0 replies; 18+ messages in thread
From: François-Xavier Thomas @ 2022-02-04 11:28 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Qu Wenruo, Filipe Manana, linux-btrfs

> Here's a set of separate graphs with read/write ops/bytes (no real difference in reads):

Ah, that was the wrong link, here's the correct one:
https://imgur.com/a/nB4dDmF

François-Xavier
* Re: [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag
  2022-02-04 11:05 ` François-Xavier Thomas
  2022-02-04 11:21   ` Qu Wenruo
@ 2022-02-05 10:19   ` Qu Wenruo
  1 sibling, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-02-05 10:19 UTC (permalink / raw)
To: François-Xavier Thomas, Qu Wenruo; +Cc: Filipe Manana, linux-btrfs

On 2022/2/4 19:05, François-Xavier Thomas wrote:
>> But one thing to mention: the color scheme is a little weird to me.
>
> Good point, here's a custom graph with just the write rate, which
> should be easier to read than the default graph with everything on it:
> https://i.imgur.com/vlRPOFr.png
>
> Waiting for your next update then. In the meantime, are there other
> statistics I should collect that would make this easier to debug?

Here is my branch:
https://github.com/adam900710/linux/tree/autodefrag_fixes

This branch contains the following fixes which are not yet in the
misc-next branch:

("btrfs: defrag: bring back the old file extent search behavior")

This addresses several problems exposed by Filipe, all related to the
btrfs_get_extent() call which gets heavily used in v5.16.

Unfortunately, I don't yet have a good idea of how to craft a pinpoint
test case for it, thus I'm not that confident whether this is the last
piece.

("btrfs: populate extent_map::generation when reading from disk")

This problem has been there since at least v5.15. Thankfully it doesn't
affect the autodefrag behavior yet.
("btrfs: defrag: allow defrag_one_cluster() to skip large extent which is not a target")
("btrfs: defrag: make btrfs_defrag_file() to report accurate number of defragged sectors")
("btrfs: defrag: use btrfs_defrag_ctrl to replace btrfs_ioctl_defrag_range_args for btrfs_defrag_file()")
("btrfs: defrag: introduce btrfs_defrag_ctrl structure for later usage")
("btrfs: uapi: introduce BTRFS_DEFRAG_RANGE_MASK for later sanity check")

This series has been sent to the list, and is mostly a pure optimization
to skip large extents faster; it should not change the data write IO
though.

("btrfs: defrag: remove an ambiguous condition for rejection")
("btrfs: defrag: don't defrag extents which is already at its max capacity")
("btrfs: defrag: don't try to merge regular extents with preallocated extents")

This series has been sent to the list, and is to address the old bad
behavior around preallocated extents.

The base is the misc-next branch from David, which is in turn based on
v5.17-rc2.

Thanks,
Qu

> Thanks,
> François-Xavier
end of thread, other threads:[~2022-02-05 10:19 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-25  6:50 [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag Qu Wenruo
2022-01-25  6:50 ` [POC for v5.15 1/2] btrfs: defrag: don't defrag preallocated extents Qu Wenruo
2022-01-25  6:50 ` [POC for v5.15 2/2] btrfs: defrag: limit cluster size to the first hole/prealloc range Qu Wenruo
2022-01-25 10:37 ` [POC for v5.15 0/2] btrfs: defrag: what if v5.15 is doing proper defrag Filipe Manana
2022-01-25 10:55   ` Qu Wenruo
2022-01-25 11:05     ` Qu Wenruo
2022-01-25 19:58       ` François-Xavier Thomas
2022-02-01 21:18         ` François-Xavier Thomas
2022-02-02  0:35           ` Qu Wenruo
2022-02-02  1:18           ` Qu Wenruo
2022-02-02 19:01             ` François-Xavier Thomas
2022-02-04  9:32               ` François-Xavier Thomas
2022-02-04  9:49                 ` Qu Wenruo
2022-02-04 11:05                   ` François-Xavier Thomas
2022-02-04 11:21                     ` Qu Wenruo
2022-02-04 11:27                       ` François-Xavier Thomas
2022-02-04 11:28                         ` François-Xavier Thomas
2022-02-05 10:19                     ` Qu Wenruo