* btrfs fi defrag hangs on small files, 100% CPU thread
@ 2022-01-16 19:15 Anthony Ruhier
  2022-01-17 12:10 ` Filipe Manana
  0 siblings, 1 reply; 14+ messages in thread
From: Anthony Ruhier @ 2022-01-16 19:15 UTC (permalink / raw)
  To: linux-btrfs


[-- Attachment #1.1.1: Type: text/plain, Size: 2935 bytes --]

Hi,
Since I upgraded from Linux 5.15 to 5.16, `btrfs filesystem defrag 
-t128K` hangs on small files (~1 byte) and triggers what seems to be an 
infinite loop in the kernel. One CPU thread ends up pinned at 100%. I 
cannot kill the process, and rebooting is blocked by btrfs.
This is a copy of bug https://bugzilla.kernel.org/show_bug.cgi?id=215498

Rebooting into Linux 5.15 shows no issue, and I have no problem running 
a defrag on bigger files (I filter out files smaller than 3.9KB).
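For illustration, the size filtering can be sketched roughly like this (a 
hedged sketch with a hypothetical demo directory; `echo` stands in for the 
real `btrfs filesystem defrag -t 128K` call so the selection logic can be 
shown without touching a btrfs mount):

```shell
# Sketch of the size-filtered manual defrag (hypothetical paths).
target=$(mktemp -d)
head -c 100  /dev/zero > "$target/small.txt"   # below the ~4K cutoff
head -c 8192 /dev/zero > "$target/big.txt"     # above the cutoff
# -size +4k skips files of 4KiB or less, mirroring the 3.9KB filter;
# replace `echo` with the real defrag invocation on an actual btrfs mount.
find "$target" -xdev -type f -size +4k \
    -exec echo btrfs filesystem defrag -t 128K {} \;
```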

I discussed this on #btrfs on IRC; here's what we debugged:

I can replicate the issue by copying an impacted file with 
`cp --reflink=never`. I attached one of the impacted files to this 
report, named README.md.

Someone suggested it could be a bug related to the inline extent, so we 
tried to check that.

filefrag shows that README.md is a single inline extent. I created a new 
18-byte file with random text (slightly bigger than the other file) that 
is also a single inline extent. This file doesn't trigger the bug and 
defragments without issue.
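The inline-extent check itself was along these lines (a sketch: `filefrag 
-v` reports an `inline` flag for btrfs inline extents; on filesystems that 
don't inline small files the flag is simply absent, so treat the output as 
illustrative only):

```shell
f=$(mktemp)
head -c 18 /dev/urandom > "$f"      # 18-byte file, as in the experiment above
# On btrfs, a small file like this shows one extent flagged "inline":
command -v filefrag >/dev/null && filefrag -v "$f" || true
```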

I remounted my system with `max_inline=0` and created a copy of 
README.md. `filefrag` shows that the new copy is now 1 regular (not 
inline) extent. This new file also triggers the bug, so the problem 
doesn't seem to be due to the inline extent.

Someone asked me to provide the output of `perf top` while the defrag is 
stuck:

     28.70%  [kernel]          [k] generic_bin_search
     14.90%  [kernel]          [k] free_extent_buffer
     13.17%  [kernel]          [k] btrfs_search_slot
     12.63%  [kernel]          [k] btrfs_root_node
      8.33%  [kernel]          [k] btrfs_get_64
      3.88%  [kernel]          [k] __down_read_common.llvm
      3.00%  [kernel]          [k] up_read
      2.63%  [kernel]          [k] read_block_for_search
      2.40%  [kernel]          [k] read_extent_buffer
      1.38%  [kernel]          [k] memset_erms
      1.11%  [kernel]          [k] find_extent_buffer
      0.69%  [kernel]          [k] kmem_cache_free
      0.69%  [kernel]          [k] memcpy_erms
      0.57%  [kernel]          [k] kmem_cache_alloc
      0.45%  [kernel]          [k] radix_tree_lookup

I can reproduce the bug on 2 different machines, running 2 different 
Linux distributions (Arch and Gentoo) with 2 different kernel configs.
One kernel is compiled with Clang, the other with GCC.

Kernel version: 5.16.0
Mount options:
     Machine 1: 
rw,noatime,compress-force=zstd:2,ssd,discard=async,space_cache=v2,autodefrag
     Machine 2: rw,noatime,compress-force=zstd:3,nossd,space_cache=v2

When the error happens, no message is shown in dmesg.

Thanks,
Anthony Ruhier


[-- Attachment #1.1.2: README.md --]
[-- Type: text/markdown, Size: 1 bytes --]

 

[-- Attachment #1.1.3: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 21807 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: btrfs fi defrag hangs on small files, 100% CPU thread
  2022-01-16 19:15 btrfs fi defrag hangs on small files, 100% CPU thread Anthony Ruhier
@ 2022-01-17 12:10 ` Filipe Manana
  2022-01-17 14:24   ` Anthony Ruhier
  0 siblings, 1 reply; 14+ messages in thread
From: Filipe Manana @ 2022-01-17 12:10 UTC (permalink / raw)
  To: Anthony Ruhier; +Cc: linux-btrfs

On Sun, Jan 16, 2022 at 08:15:37PM +0100, Anthony Ruhier wrote:
> Hi,
> Since I upgraded from linux 5.15 to 5.16, `btrfs filesystem defrag -t128K`
> hangs on small files (~1 byte) and triggers what it seems to be a loop in
> the kernel. It results in one CPU thread running being used at 100%. I
> cannot kill the process, and rebooting is blocked by btrfs.
> It is a copy of the bug https://bugzilla.kernel.org/show_bug.cgi?id=215498
> 
> Rebooting to linux 5.15 shows no issue. I have no issue to run a defrag on
> bigger files (I filter out files smaller than 3.9KB).
> 
> I had a conversation on #btrfs on IRC, so here's what we debugged:
> 
> I can replicate the issue by copying a file impacted by this bug, by using
> `cp --reflink=never`. I attached one of the impacted files to this bug,
> named README.md.
> 
> Someone told me that it could be a bug due to the inline extent. So we tried
> to check that.
> 
> filefrag shows that the file Readme.md is 1 inline extent. I tried to create
> a new file with random text, of 18 bytes (slightly bigger than the other
> file), that is also 1 inline extent. This file doesn't trigger the bug and
> has no issue to be defragmented.
> 
> I tried to mount my system with `max_inline=0`, created a copy of README.md.
> `filefrag` shows me that the new file is now 1 extent, not inline. This new
> file also triggers the bug, so it doesn't seem to be due to the inline
> extent.
> 
> Someone asked me to provide the output of a perf top when the defrag is
> stuck:
> 
>     28.70%  [kernel]          [k] generic_bin_search
>     14.90%  [kernel]          [k] free_extent_buffer
>     13.17%  [kernel]          [k] btrfs_search_slot
>     12.63%  [kernel]          [k] btrfs_root_node
>      8.33%  [kernel]          [k] btrfs_get_64
>      3.88%  [kernel]          [k] __down_read_common.llvm
>      3.00%  [kernel]          [k] up_read
>      2.63%  [kernel]          [k] read_block_for_search
>      2.40%  [kernel]          [k] read_extent_buffer
>      1.38%  [kernel]          [k] memset_erms
>      1.11%  [kernel]          [k] find_extent_buffer
>      0.69%  [kernel]          [k] kmem_cache_free
>      0.69%  [kernel]          [k] memcpy_erms
>      0.57%  [kernel]          [k] kmem_cache_alloc
>      0.45%  [kernel]          [k] radix_tree_lookup
> 
> I can reproduce the bug on 2 different machines, running 2 different linux
> distributions (Arch and Gentoo) with 2 different kernel configs.
> This kernel is compiled with clang, the other with GCC.
> 
> Kernel version: 5.16.0
> Mount options:
>     Machine 1:
> rw,noatime,compress-force=zstd:2,ssd,discard=async,space_cache=v2,autodefrag
>     Machine 2: rw,noatime,compress-force=zstd:3,nossd,space_cache=v2
> 
> When the error happens, no message is shown in dmesg.

This is very likely the same issue as reported at this thread:

https://lore.kernel.org/linux-btrfs/YeVawBBE3r6hVhgs@debian9.Home/T/#ma1c8a9848c9b7e4edb471f7be184599d38e288bb

Are you able to test the patch posted there?

Thanks.

> 
> Thanks,
> Anthony Ruhier
> 

>  






* Re: btrfs fi defrag hangs on small files, 100% CPU thread
  2022-01-17 12:10 ` Filipe Manana
@ 2022-01-17 14:24   ` Anthony Ruhier
  2022-01-17 16:52     ` Filipe Manana
  0 siblings, 1 reply; 14+ messages in thread
From: Anthony Ruhier @ 2022-01-17 14:24 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs


[-- Attachment #1.1.1: Type: text/plain, Size: 4286 bytes --]

Thanks for the answer.

I had the exact same issue as in the thread you linked, and I have 
monitoring graphs showing that btrfs-cleaner did constant writes for 12 
hours right after I upgraded to Linux 5.16. Weirdly enough, the issue 
almost disappeared after I ran a btrfs balance filtered on 10% data 
usage.
That's also why I initially disabled autodefrag, which led to 
discovering this bug when I switched to manual defragmentation (which, 
in the end, makes more sense with my setup anyway).

I tried that patch, but sadly it doesn't help with the initial issue. I 
can't say anything about the bug in the other thread, as the 
btrfs-cleaner problem has disappeared (I still see some writes from it, 
but they are so rare that I can't tell whether they're normal or not).

Thanks,
Anthony

On 17/01/2022 at 13:10, Filipe Manana wrote:
> On Sun, Jan 16, 2022 at 08:15:37PM +0100, Anthony Ruhier wrote:
>> Hi,
>> Since I upgraded from linux 5.15 to 5.16, `btrfs filesystem defrag -t128K`
>> hangs on small files (~1 byte) and triggers what it seems to be a loop in
>> the kernel. It results in one CPU thread running being used at 100%. I
>> cannot kill the process, and rebooting is blocked by btrfs.
>> It is a copy of the bug https://bugzilla.kernel.org/show_bug.cgi?id=215498
>>
>> Rebooting to linux 5.15 shows no issue. I have no issue to run a defrag on
>> bigger files (I filter out files smaller than 3.9KB).
>>
>> I had a conversation on #btrfs on IRC, so here's what we debugged:
>>
>> I can replicate the issue by copying a file impacted by this bug, by using
>> `cp --reflink=never`. I attached one of the impacted files to this bug,
>> named README.md.
>>
>> Someone told me that it could be a bug due to the inline extent. So we tried
>> to check that.
>>
>> filefrag shows that the file Readme.md is 1 inline extent. I tried to create
>> a new file with random text, of 18 bytes (slightly bigger than the other
>> file), that is also 1 inline extent. This file doesn't trigger the bug and
>> has no issue to be defragmented.
>>
>> I tried to mount my system with `max_inline=0`, created a copy of README.md.
>> `filefrag` shows me that the new file is now 1 extent, not inline. This new
>> file also triggers the bug, so it doesn't seem to be due to the inline
>> extent.
>>
>> Someone asked me to provide the output of a perf top when the defrag is
>> stuck:
>>
>>      28.70%  [kernel]          [k] generic_bin_search
>>      14.90%  [kernel]          [k] free_extent_buffer
>>      13.17%  [kernel]          [k] btrfs_search_slot
>>      12.63%  [kernel]          [k] btrfs_root_node
>>       8.33%  [kernel]          [k] btrfs_get_64
>>       3.88%  [kernel]          [k] __down_read_common.llvm
>>       3.00%  [kernel]          [k] up_read
>>       2.63%  [kernel]          [k] read_block_for_search
>>       2.40%  [kernel]          [k] read_extent_buffer
>>       1.38%  [kernel]          [k] memset_erms
>>       1.11%  [kernel]          [k] find_extent_buffer
>>       0.69%  [kernel]          [k] kmem_cache_free
>>       0.69%  [kernel]          [k] memcpy_erms
>>       0.57%  [kernel]          [k] kmem_cache_alloc
>>       0.45%  [kernel]          [k] radix_tree_lookup
>>
>> I can reproduce the bug on 2 different machines, running 2 different linux
>> distributions (Arch and Gentoo) with 2 different kernel configs.
>> This kernel is compiled with clang, the other with GCC.
>>
>> Kernel version: 5.16.0
>> Mount options:
>>      Machine 1:
>> rw,noatime,compress-force=zstd:2,ssd,discard=async,space_cache=v2,autodefrag
>>      Machine 2: rw,noatime,compress-force=zstd:3,nossd,space_cache=v2
>>
>> When the error happens, no message is shown in dmesg.
> This is very likely the same issue as reported at this thread:
>
> https://lore.kernel.org/linux-btrfs/YeVawBBE3r6hVhgs@debian9.Home/T/#ma1c8a9848c9b7e4edb471f7be184599d38e288bb
>
> Are you able to test the patch posted there?
>
> Thanks.
>
>> Thanks,
>> Anthony Ruhier
>>
>>   

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 21807 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


* Re: btrfs fi defrag hangs on small files, 100% CPU thread
  2022-01-17 14:24   ` Anthony Ruhier
@ 2022-01-17 16:52     ` Filipe Manana
  2022-01-17 17:04       ` Anthony Ruhier
  0 siblings, 1 reply; 14+ messages in thread
From: Filipe Manana @ 2022-01-17 16:52 UTC (permalink / raw)
  To: Anthony Ruhier; +Cc: linux-btrfs

On Mon, Jan 17, 2022 at 03:24:00PM +0100, Anthony Ruhier wrote:
> Thanks for the answer.
> 
> I had the exact same issue as in the thread you've linked, and have some
> monitoring and graphs that showed that btrfs-cleaner did constant writes
> during 12 hours just after I upgraded to linux 5.16. Weirdly enough, the
> issue almost disappeared after I did a btrfs balance by filtering on 10%
> usage of data.
> But that's why I initially disabled autodefrag, what has lead to discovering
> this bug as I switched to manual defragmentation (which, in the end, makes
> more sense anyway with my setup).
> 
> I tried this patch, but sadly it doesn't help for the initial issue. I
> cannot say for the bug in the other thread, as the problem with
> btrfs-cleaner disappeared (I can still see some writes from it, but it so
> rare that I cannot say if it's normal or not).

Ok, reading your first mail more carefully, I see there's the case of the
1 byte file, which possibly does not share a cause with the excessive IO
problem triggered by autodefrag.

For the 1 byte file problem, I've just sent a fix:

https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/

It's actually trivial to trigger.

Can you check if it also fixes your problem with autodefrag?

If not, then try the following (after applying the 1 byte file fix):

https://pastebin.com/raw/EbEfk1tF

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a5bd6926f7ff..db795e226cca 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1191,6 +1191,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 				  u64 newer_than, bool do_compress,
 				  bool locked, struct list_head *target_list)
 {
+	const u32 min_thresh = extent_thresh / 2;
 	u64 cur = start;
 	int ret = 0;
 
@@ -1198,6 +1199,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 		struct extent_map *em;
 		struct defrag_target_range *new;
 		bool next_mergeable = true;
+		u64 range_start;
 		u64 range_len;
 
 		em = defrag_lookup_extent(&inode->vfs_inode, cur, locked);
@@ -1213,6 +1215,24 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 		if (em->generation < newer_than)
 			goto next;
 
+		/*
+		 * Our start offset might be in the middle of an existing extent
+		 * map, so take that into account.
+		 */
+		range_len = em->len - (cur - em->start);
+
+		/*
+		 * If there's already a good range for delalloc within the range
+		 * covered by the extent map, skip it, otherwise we can end up
+		 * doing on the same IO range for a long time when using auto
+		 * defrag.
+		 */
+		range_start = cur;
+		if (count_range_bits(&inode->io_tree, &range_start,
+				     range_start + range_len - 1, min_thresh,
+				     EXTENT_DELALLOC, 1) >= min_thresh)
+			goto next;
+
 		/*
 		 * For do_compress case, we want to compress all valid file
 		 * extents, thus no @extent_thresh or mergeable check.
@@ -1220,8 +1240,8 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 		if (do_compress)
 			goto add;
 
-		/* Skip too large extent */
-		if (em->len >= extent_thresh)
+		/* Skip large enough ranges. */
+		if (range_len >= extent_thresh)
 			goto next;
 
 		next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em,


Thanks.


> 
> Thanks,
> Anthony
> 
> On 17/01/2022 at 13:10, Filipe Manana wrote:
> > On Sun, Jan 16, 2022 at 08:15:37PM +0100, Anthony Ruhier wrote:
> > > Hi,
> > > Since I upgraded from linux 5.15 to 5.16, `btrfs filesystem defrag -t128K`
> > > hangs on small files (~1 byte) and triggers what it seems to be a loop in
> > > the kernel. It results in one CPU thread running being used at 100%. I
> > > cannot kill the process, and rebooting is blocked by btrfs.
> > > It is a copy of the bug https://bugzilla.kernel.org/show_bug.cgi?id=215498
> > > 
> > > Rebooting to linux 5.15 shows no issue. I have no issue to run a defrag on
> > > bigger files (I filter out files smaller than 3.9KB).
> > > 
> > > I had a conversation on #btrfs on IRC, so here's what we debugged:
> > > 
> > > I can replicate the issue by copying a file impacted by this bug, by using
> > > `cp --reflink=never`. I attached one of the impacted files to this bug,
> > > named README.md.
> > > 
> > > Someone told me that it could be a bug due to the inline extent. So we tried
> > > to check that.
> > > 
> > > filefrag shows that the file Readme.md is 1 inline extent. I tried to create
> > > a new file with random text, of 18 bytes (slightly bigger than the other
> > > file), that is also 1 inline extent. This file doesn't trigger the bug and
> > > has no issue to be defragmented.
> > > 
> > > I tried to mount my system with `max_inline=0`, created a copy of README.md.
> > > `filefrag` shows me that the new file is now 1 extent, not inline. This new
> > > file also triggers the bug, so it doesn't seem to be due to the inline
> > > extent.
> > > 
> > > Someone asked me to provide the output of a perf top when the defrag is
> > > stuck:
> > > 
> > >      28.70%  [kernel]          [k] generic_bin_search
> > >      14.90%  [kernel]          [k] free_extent_buffer
> > >      13.17%  [kernel]          [k] btrfs_search_slot
> > >      12.63%  [kernel]          [k] btrfs_root_node
> > >       8.33%  [kernel]          [k] btrfs_get_64
> > >       3.88%  [kernel]          [k] __down_read_common.llvm
> > >       3.00%  [kernel]          [k] up_read
> > >       2.63%  [kernel]          [k] read_block_for_search
> > >       2.40%  [kernel]          [k] read_extent_buffer
> > >       1.38%  [kernel]          [k] memset_erms
> > >       1.11%  [kernel]          [k] find_extent_buffer
> > >       0.69%  [kernel]          [k] kmem_cache_free
> > >       0.69%  [kernel]          [k] memcpy_erms
> > >       0.57%  [kernel]          [k] kmem_cache_alloc
> > >       0.45%  [kernel]          [k] radix_tree_lookup
> > > 
> > > I can reproduce the bug on 2 different machines, running 2 different linux
> > > distributions (Arch and Gentoo) with 2 different kernel configs.
> > > This kernel is compiled with clang, the other with GCC.
> > > 
> > > Kernel version: 5.16.0
> > > Mount options:
> > >      Machine 1:
> > > rw,noatime,compress-force=zstd:2,ssd,discard=async,space_cache=v2,autodefrag
> > >      Machine 2: rw,noatime,compress-force=zstd:3,nossd,space_cache=v2
> > > 
> > > When the error happens, no message is shown in dmesg.
> > This is very likely the same issue as reported at this thread:
> > 
> > https://lore.kernel.org/linux-btrfs/YeVawBBE3r6hVhgs@debian9.Home/T/#ma1c8a9848c9b7e4edb471f7be184599d38e288bb
> > 
> > Are you able to test the patch posted there?
> > 
> > Thanks.
> > 
> > > Thanks,
> > > Anthony Ruhier
> > > 







* Re: btrfs fi defrag hangs on small files, 100% CPU thread
  2022-01-17 16:52     ` Filipe Manana
@ 2022-01-17 17:04       ` Anthony Ruhier
  2022-01-17 17:50         ` Anthony Ruhier
  0 siblings, 1 reply; 14+ messages in thread
From: Anthony Ruhier @ 2022-01-17 17:04 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 7852 bytes --]

I'm going to apply your patch for the 1-byte file and quickly confirm 
whether it works.
Thanks a lot for your patch!

About the autodefrag issue: it's going to be trickier to check that your 
patch fixes it, because for whatever reason the problem seems to have 
resolved itself (or at least, btrfs-cleaner does far fewer writes) after 
a partial btrfs balance.
I'll look at the amount of writes again after several hours. For this 
one, maybe check with the author of the other bug: if they can easily 
reproduce the issue, it will be easier to verify the fix.

Thanks,
Anthony

On 17/01/2022 at 17:52, Filipe Manana wrote:
> On Mon, Jan 17, 2022 at 03:24:00PM +0100, Anthony Ruhier wrote:
>> Thanks for the answer.
>>
>> I had the exact same issue as in the thread you've linked, and have some
>> monitoring and graphs that showed that btrfs-cleaner did constant writes
>> during 12 hours just after I upgraded to linux 5.16. Weirdly enough, the
>> issue almost disappeared after I did a btrfs balance by filtering on 10%
>> usage of data.
>> But that's why I initially disabled autodefrag, what has lead to discovering
>> this bug as I switched to manual defragmentation (which, in the end, makes
>> more sense anyway with my setup).
>>
>> I tried this patch, but sadly it doesn't help for the initial issue. I
>> cannot say for the bug in the other thread, as the problem with
>> btrfs-cleaner disappeared (I can still see some writes from it, but it so
>> rare that I cannot say if it's normal or not).
> Ok, reading better your first mail, I see there's the case of the 1 byte
> file, which is possibly not the cause from the autodefrag causing the
> excessive IO problem.
>
> For the 1 byte file problem, I've just sent a fix:
>
> https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/
>
> It's actually trivial to trigger.
>
> Can you check if it also fixes your problem with autodefrag?
>
> If not, then try the following (after applying the 1 file fix):
>
> https://pastebin.com/raw/EbEfk1tF
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index a5bd6926f7ff..db795e226cca 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1191,6 +1191,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>   				  u64 newer_than, bool do_compress,
>   				  bool locked, struct list_head *target_list)
>   {
> +	const u32 min_thresh = extent_thresh / 2;
>   	u64 cur = start;
>   	int ret = 0;
>   
> @@ -1198,6 +1199,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>   		struct extent_map *em;
>   		struct defrag_target_range *new;
>   		bool next_mergeable = true;
> +		u64 range_start;
>   		u64 range_len;
>   
>   		em = defrag_lookup_extent(&inode->vfs_inode, cur, locked);
> @@ -1213,6 +1215,24 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>   		if (em->generation < newer_than)
>   			goto next;
>   
> +		/*
> +		 * Our start offset might be in the middle of an existing extent
> +		 * map, so take that into account.
> +		 */
> +		range_len = em->len - (cur - em->start);
> +
> +		/*
> +		 * If there's already a good range for delalloc within the range
> +		 * covered by the extent map, skip it, otherwise we can end up
> +		 * doing on the same IO range for a long time when using auto
> +		 * defrag.
> +		 */
> +		range_start = cur;
> +		if (count_range_bits(&inode->io_tree, &range_start,
> +				     range_start + range_len - 1, min_thresh,
> +				     EXTENT_DELALLOC, 1) >= min_thresh)
> +			goto next;
> +
>   		/*
>   		 * For do_compress case, we want to compress all valid file
>   		 * extents, thus no @extent_thresh or mergeable check.
> @@ -1220,8 +1240,8 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>   		if (do_compress)
>   			goto add;
>   
> -		/* Skip too large extent */
> -		if (em->len >= extent_thresh)
> +		/* Skip large enough ranges. */
> +		if (range_len >= extent_thresh)
>   			goto next;
>   
>   		next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em,
>
>
> Thanks.
>
>
>> Thanks,
>> Anthony
>>
>> On 17/01/2022 at 13:10, Filipe Manana wrote:
>>> On Sun, Jan 16, 2022 at 08:15:37PM +0100, Anthony Ruhier wrote:
>>>> Hi,
>>>> Since I upgraded from linux 5.15 to 5.16, `btrfs filesystem defrag -t128K`
>>>> hangs on small files (~1 byte) and triggers what it seems to be a loop in
>>>> the kernel. It results in one CPU thread running being used at 100%. I
>>>> cannot kill the process, and rebooting is blocked by btrfs.
>>>> It is a copy of the bug https://bugzilla.kernel.org/show_bug.cgi?id=215498
>>>>
>>>> Rebooting to linux 5.15 shows no issue. I have no issue to run a defrag on
>>>> bigger files (I filter out files smaller than 3.9KB).
>>>>
>>>> I had a conversation on #btrfs on IRC, so here's what we debugged:
>>>>
>>>> I can replicate the issue by copying a file impacted by this bug, by using
>>>> `cp --reflink=never`. I attached one of the impacted files to this bug,
>>>> named README.md.
>>>>
>>>> Someone told me that it could be a bug due to the inline extent. So we tried
>>>> to check that.
>>>>
>>>> filefrag shows that the file Readme.md is 1 inline extent. I tried to create
>>>> a new file with random text, of 18 bytes (slightly bigger than the other
>>>> file), that is also 1 inline extent. This file doesn't trigger the bug and
>>>> has no issue to be defragmented.
>>>>
>>>> I tried to mount my system with `max_inline=0`, created a copy of README.md.
>>>> `filefrag` shows me that the new file is now 1 extent, not inline. This new
>>>> file also triggers the bug, so it doesn't seem to be due to the inline
>>>> extent.
>>>>
>>>> Someone asked me to provide the output of a perf top when the defrag is
>>>> stuck:
>>>>
>>>>       28.70%  [kernel]          [k] generic_bin_search
>>>>       14.90%  [kernel]          [k] free_extent_buffer
>>>>       13.17%  [kernel]          [k] btrfs_search_slot
>>>>       12.63%  [kernel]          [k] btrfs_root_node
>>>>        8.33%  [kernel]          [k] btrfs_get_64
>>>>        3.88%  [kernel]          [k] __down_read_common.llvm
>>>>        3.00%  [kernel]          [k] up_read
>>>>        2.63%  [kernel]          [k] read_block_for_search
>>>>        2.40%  [kernel]          [k] read_extent_buffer
>>>>        1.38%  [kernel]          [k] memset_erms
>>>>        1.11%  [kernel]          [k] find_extent_buffer
>>>>        0.69%  [kernel]          [k] kmem_cache_free
>>>>        0.69%  [kernel]          [k] memcpy_erms
>>>>        0.57%  [kernel]          [k] kmem_cache_alloc
>>>>        0.45%  [kernel]          [k] radix_tree_lookup
>>>>
>>>> I can reproduce the bug on 2 different machines, running 2 different linux
>>>> distributions (Arch and Gentoo) with 2 different kernel configs.
>>>> This kernel is compiled with clang, the other with GCC.
>>>>
>>>> Kernel version: 5.16.0
>>>> Mount options:
>>>>       Machine 1:
>>>> rw,noatime,compress-force=zstd:2,ssd,discard=async,space_cache=v2,autodefrag
>>>>       Machine 2: rw,noatime,compress-force=zstd:3,nossd,space_cache=v2
>>>>
>>>> When the error happens, no message is shown in dmesg.
>>> This is very likely the same issue as reported at this thread:
>>>
>>> https://lore.kernel.org/linux-btrfs/YeVawBBE3r6hVhgs@debian9.Home/T/#ma1c8a9848c9b7e4edb471f7be184599d38e288bb
>>>
>>> Are you able to test the patch posted there?
>>>
>>> Thanks.
>>>
>>>> Thanks,
>>>> Anthony Ruhier
>>>>
>
>
>
>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


* Re: btrfs fi defrag hangs on small files, 100% CPU thread
  2022-01-17 17:04       ` Anthony Ruhier
@ 2022-01-17 17:50         ` Anthony Ruhier
  2022-01-17 17:56           ` Filipe Manana
  0 siblings, 1 reply; 14+ messages in thread
From: Anthony Ruhier @ 2022-01-17 17:50 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 9182 bytes --]

Filipe,

Just a quick update after my previous email: your patch fixed the issue 
for `btrfs fi defrag`.
Thanks a lot! I closed my bug on Bugzilla.

I'll keep you posted about the autodefrag.

On 17/01/2022 at 18:04, Anthony Ruhier wrote:
> I'm going to apply your patch for the 1B file, and quickly confirm if 
> it works.
> Thanks a lot for your patch!
>
> About the autodefrag issue, it's going to be trickier to check that 
> your patch fixes it, because for whatever reason the problem seems to 
> have resolved itself (or at least, btrfs-cleaner does way less writes) 
> after a partial btrfs balance.
> I'll try and look the amount of writes after several hours. Maybe for 
> this one, see with the author of the other bug. If they can easily 
> reproduce the issue then it's going to be easier to check.
>
> Thanks,
> Anthony
>
> On 17/01/2022 at 17:52, Filipe Manana wrote:
>> On Mon, Jan 17, 2022 at 03:24:00PM +0100, Anthony Ruhier wrote:
>>> Thanks for the answer.
>>>
>>> I had the exact same issue as in the thread you've linked, and have 
>>> some
>>> monitoring and graphs that showed that btrfs-cleaner did constant 
>>> writes
>>> during 12 hours just after I upgraded to linux 5.16. Weirdly enough, 
>>> the
>>> issue almost disappeared after I did a btrfs balance by filtering on 
>>> 10%
>>> usage of data.
>>> But that's why I initially disabled autodefrag, what has lead to 
>>> discovering
>>> this bug as I switched to manual defragmentation (which, in the end, 
>>> makes
>>> more sense anyway with my setup).
>>>
>>> I tried this patch, but sadly it doesn't help for the initial issue. I
>>> cannot say for the bug in the other thread, as the problem with
>>> btrfs-cleaner disappeared (I can still see some writes from it, but 
>>> it so
>>> rare that I cannot say if it's normal or not).
>> Ok, reading better your first mail, I see there's the case of the 1 byte
>> file, which is possibly not the cause from the autodefrag causing the
>> excessive IO problem.
>>
>> For the 1 byte file problem, I've just sent a fix:
>>
>> https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/ 
>>
>>
>> It's actually trivial to trigger.
>>
>> Can you check if it also fixes your problem with autodefrag?
>>
>> If not, then try the following (after applying the 1 file fix):
>>
>> https://pastebin.com/raw/EbEfk1tF
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index a5bd6926f7ff..db795e226cca 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -1191,6 +1191,7 @@ static int defrag_collect_targets(struct 
>> btrfs_inode *inode,
>>                     u64 newer_than, bool do_compress,
>>                     bool locked, struct list_head *target_list)
>>   {
>> +    const u32 min_thresh = extent_thresh / 2;
>>       u64 cur = start;
>>       int ret = 0;
>>   @@ -1198,6 +1199,7 @@ static int defrag_collect_targets(struct 
>> btrfs_inode *inode,
>>           struct extent_map *em;
>>           struct defrag_target_range *new;
>>           bool next_mergeable = true;
>> +        u64 range_start;
>>           u64 range_len;
>>             em = defrag_lookup_extent(&inode->vfs_inode, cur, locked);
>> @@ -1213,6 +1215,24 @@ static int defrag_collect_targets(struct 
>> btrfs_inode *inode,
>>           if (em->generation < newer_than)
>>               goto next;
>>   +        /*
>> +         * Our start offset might be in the middle of an existing 
>> extent
>> +         * map, so take that into account.
>> +         */
>> +        range_len = em->len - (cur - em->start);
>> +
>> +        /*
>> +         * If there's already a good range for delalloc within the 
>> range
>> +         * covered by the extent map, skip it, otherwise we can end up
>> +         * doing on the same IO range for a long time when using auto
>> +         * defrag.
>> +         */
>> +        range_start = cur;
>> +        if (count_range_bits(&inode->io_tree, &range_start,
>> +                     range_start + range_len - 1, min_thresh,
>> +                     EXTENT_DELALLOC, 1) >= min_thresh)
>> +            goto next;
>> +
>>           /*
>>            * For do_compress case, we want to compress all valid file
>>            * extents, thus no @extent_thresh or mergeable check.
>> @@ -1220,8 +1240,8 @@ static int defrag_collect_targets(struct 
>> btrfs_inode *inode,
>>           if (do_compress)
>>               goto add;
>>   -        /* Skip too large extent */
>> -        if (em->len >= extent_thresh)
>> +        /* Skip large enough ranges. */
>> +        if (range_len >= extent_thresh)
>>               goto next;
>>             next_mergeable = 
>> defrag_check_next_extent(&inode->vfs_inode, em,
>>
>>
>> Thanks.
>>
>>
>>> Thanks,
>>> Anthony
>>>
>>> On 17/01/2022 at 13:10, Filipe Manana wrote:
>>>> On Sun, Jan 16, 2022 at 08:15:37PM +0100, Anthony Ruhier wrote:
>>>>> Hi,
>>>>> Since I upgraded from linux 5.15 to 5.16, `btrfs filesystem defrag 
>>>>> -t128K`
>>>>> hangs on small files (~1 byte) and triggers what seems to be a
>>>>> loop in the kernel. It results in one CPU thread being used at
>>>>> 100%. I cannot kill the process, and rebooting is blocked by btrfs.
>>>>> It is a copy of the bug: https://bugzilla.kernel.org/show_bug.cgi?id=215498
>>>>>
>>>>> Rebooting to linux 5.15 shows no issue. I have no issue to run a 
>>>>> defrag on
>>>>> bigger files (I filter out files smaller than 3.9KB).
>>>>>
>>>>> I had a conversation on #btrfs on IRC, so here's what we debugged:
>>>>>
>>>>> I can replicate the issue by copying a file impacted by this bug, 
>>>>> by using
>>>>> `cp --reflink=never`. I attached one of the impacted files to this 
>>>>> bug,
>>>>> named README.md.
>>>>>
>>>>> Someone told me that it could be a bug due to the inline extent. 
>>>>> So we tried
>>>>> to check that.
>>>>>
>>>>> filefrag shows that the file Readme.md is 1 inline extent. I tried 
>>>>> to create
>>>>> a new file with random text, of 18 bytes (slightly bigger than the 
>>>>> other
>>>>> file), that is also 1 inline extent. This file doesn't trigger the 
>>>>> bug and
>>>>> has no issue to be defragmented.
>>>>>
>>>>> I tried to mount my system with `max_inline=0`, created a copy of 
>>>>> README.md.
>>>>> `filefrag` shows me that the new file is now 1 extent, not inline. 
>>>>> This new
>>>>> file also triggers the bug, so it doesn't seem to be due to the 
>>>>> inline
>>>>> extent.
>>>>>
>>>>> Someone asked me to provide the output of a perf top when the 
>>>>> defrag is
>>>>> stuck:
>>>>>
>>>>>       28.70%  [kernel]          [k] generic_bin_search
>>>>>       14.90%  [kernel]          [k] free_extent_buffer
>>>>>       13.17%  [kernel]          [k] btrfs_search_slot
>>>>>       12.63%  [kernel]          [k] btrfs_root_node
>>>>>        8.33%  [kernel]          [k] btrfs_get_64
>>>>>        3.88%  [kernel]          [k] __down_read_common.llvm
>>>>>        3.00%  [kernel]          [k] up_read
>>>>>        2.63%  [kernel]          [k] read_block_for_search
>>>>>        2.40%  [kernel]          [k] read_extent_buffer
>>>>>        1.38%  [kernel]          [k] memset_erms
>>>>>        1.11%  [kernel]          [k] find_extent_buffer
>>>>>        0.69%  [kernel]          [k] kmem_cache_free
>>>>>        0.69%  [kernel]          [k] memcpy_erms
>>>>>        0.57%  [kernel]          [k] kmem_cache_alloc
>>>>>        0.45%  [kernel]          [k] radix_tree_lookup
>>>>>
>>>>> I can reproduce the bug on 2 different machines, running 2 
>>>>> different linux
>>>>> distributions (Arch and Gentoo) with 2 different kernel configs.
>>>>> This kernel is compiled with clang, the other with GCC.
>>>>>
>>>>> Kernel version: 5.16.0
>>>>> Mount options:
>>>>>       Machine 1:
>>>>> rw,noatime,compress-force=zstd:2,ssd,discard=async,space_cache=v2,autodefrag 
>>>>>
>>>>>       Machine 2: 
>>>>> rw,noatime,compress-force=zstd:3,nossd,space_cache=v2
>>>>>
>>>>> When the error happens, no message is shown in dmesg.
>>>> This is very likely the same issue as reported at this thread:
>>>>
>>>> https://lore.kernel.org/linux-btrfs/YeVawBBE3r6hVhgs@debian9.Home/T/#ma1c8a9848c9b7e4edb471f7be184599d38e288bb 
>>>>
>>>>
>>>> Are you able to test the patch posted there?
>>>>
>>>> Thanks.
>>>>
>>>>> Thanks,
>>>>> Anthony Ruhier
>>>>>
>>
>>
>>
>>



* Re: btrfs fi defrag hangs on small files, 100% CPU thread
  2022-01-17 17:50         ` Anthony Ruhier
@ 2022-01-17 17:56           ` Filipe Manana
  2022-01-17 19:39             ` Anthony Ruhier
  0 siblings, 1 reply; 14+ messages in thread
From: Filipe Manana @ 2022-01-17 17:56 UTC (permalink / raw)
  To: Anthony Ruhier; +Cc: linux-btrfs

On Mon, Jan 17, 2022 at 5:50 PM Anthony Ruhier <aruhier@mailbox.org> wrote:
>
> Filipe,
>
> Just a quick update after my previous email, your patch fixed the issue
> for `btrfs fi defrag`.
> Thanks a lot! I closed my bug on bugzilla.
>
> I'll keep you in touch about the autodefrag.

Please do.
The 1 byte file case was very specific, so it's probably a different issue.

Thanks for testing!

>
> Le 17/01/2022 à 18:04, Anthony Ruhier a écrit :
> > I'm going to apply your patch for the 1B file, and quickly confirm if
> > it works.
> > Thanks a lot for your patch!
> >
> > About the autodefrag issue, it's going to be trickier to check that
> > your patch fixes it, because for whatever reason the problem seems to
> > have resolved itself (or at least, btrfs-cleaner does way less writes)
> > after a partial btrfs balance.
> > I'll try to look at the amount of writes after several hours. Maybe for
> > this one, check with the author of the other bug. If they can easily
> > reproduce the issue then it's going to be easier to check.
> >
> > Thanks,
> > Anthony
> >
> > Le 17/01/2022 à 17:52, Filipe Manana a écrit :
> >> On Mon, Jan 17, 2022 at 03:24:00PM +0100, Anthony Ruhier wrote:
> >>> Thanks for the answer.
> >>>
> >>> I had the exact same issue as in the thread you've linked, and have
> >>> some
> >>> monitoring and graphs that showed that btrfs-cleaner did constant
> >>> writes
> >>> during 12 hours just after I upgraded to linux 5.16. Weirdly enough,
> >>> the
> >>> issue almost disappeared after I did a btrfs balance by filtering on
> >>> 10%
> >>> usage of data.
> >>> But that's why I initially disabled autodefrag, which led to discovering
> >>> this bug as I switched to manual defragmentation (which, in the end,
> >>> makes more sense anyway with my setup).
> >>>
> >>> I tried this patch, but sadly it doesn't help for the initial issue. I
> >>> cannot say for the bug in the other thread, as the problem with
> >>> btrfs-cleaner disappeared (I can still see some writes from it, but
> >>> it so
> >>> rare that I cannot say if it's normal or not).
> >> Ok, reading your first mail more carefully, I see there's the case of the
> >> 1-byte file, which is possibly not the same cause as the autodefrag
> >> excessive IO problem.
> >>
> >> For the 1 byte file problem, I've just sent a fix:
> >>
> >> https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/
> >>
> >>
> >> It's actually trivial to trigger.
> >>
> >> Can you check if it also fixes your problem with autodefrag?
> >>
> >> If not, then try the following (after applying the 1 file fix):
> >>
> >> https://pastebin.com/raw/EbEfk1tF
> >>
> >> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> >> index a5bd6926f7ff..db795e226cca 100644
> >> --- a/fs/btrfs/ioctl.c
> >> +++ b/fs/btrfs/ioctl.c
> >> @@ -1191,6 +1191,7 @@ static int defrag_collect_targets(struct
> >> btrfs_inode *inode,
> >>                     u64 newer_than, bool do_compress,
> >>                     bool locked, struct list_head *target_list)
> >>   {
> >> +    const u32 min_thresh = extent_thresh / 2;
> >>       u64 cur = start;
> >>       int ret = 0;
> >>   @@ -1198,6 +1199,7 @@ static int defrag_collect_targets(struct
> >> btrfs_inode *inode,
> >>           struct extent_map *em;
> >>           struct defrag_target_range *new;
> >>           bool next_mergeable = true;
> >> +        u64 range_start;
> >>           u64 range_len;
> >>             em = defrag_lookup_extent(&inode->vfs_inode, cur, locked);
> >> @@ -1213,6 +1215,24 @@ static int defrag_collect_targets(struct
> >> btrfs_inode *inode,
> >>           if (em->generation < newer_than)
> >>               goto next;
> >>   +        /*
> >> +         * Our start offset might be in the middle of an existing
> >> extent
> >> +         * map, so take that into account.
> >> +         */
> >> +        range_len = em->len - (cur - em->start);
> >> +
> >> +        /*
> >> +         * If there's already a good range for delalloc within the
> >> range
> >> +         * covered by the extent map, skip it, otherwise we can end up
> >> +         * doing on the same IO range for a long time when using auto
> >> +         * defrag.
> >> +         */
> >> +        range_start = cur;
> >> +        if (count_range_bits(&inode->io_tree, &range_start,
> >> +                     range_start + range_len - 1, min_thresh,
> >> +                     EXTENT_DELALLOC, 1) >= min_thresh)
> >> +            goto next;
> >> +
> >>           /*
> >>            * For do_compress case, we want to compress all valid file
> >>            * extents, thus no @extent_thresh or mergeable check.
> >> @@ -1220,8 +1240,8 @@ static int defrag_collect_targets(struct
> >> btrfs_inode *inode,
> >>           if (do_compress)
> >>               goto add;
> >>   -        /* Skip too large extent */
> >> -        if (em->len >= extent_thresh)
> >> +        /* Skip large enough ranges. */
> >> +        if (range_len >= extent_thresh)
> >>               goto next;
> >>             next_mergeable =
> >> defrag_check_next_extent(&inode->vfs_inode, em,
> >>
> >>
> >> Thanks.
> >>
> >>
> >>> Thanks,
> >>> Anthony
> >>>
> >>> Le 17/01/2022 à 13:10, Filipe Manana a écrit :
> >>>> On Sun, Jan 16, 2022 at 08:15:37PM +0100, Anthony Ruhier wrote:
> >>>>> Hi,
> >>>>> Since I upgraded from linux 5.15 to 5.16, `btrfs filesystem defrag
> >>>>> -t128K`
> >>>>> hangs on small files (~1 byte) and triggers what seems to be a
> >>>>> loop in the kernel. It results in one CPU thread being used at
> >>>>> 100%. I cannot kill the process, and rebooting is blocked by btrfs.
> >>>>> It is a copy of the bug: https://bugzilla.kernel.org/show_bug.cgi?id=215498
> >>>>>
> >>>>> Rebooting to linux 5.15 shows no issue. I have no issue to run a
> >>>>> defrag on
> >>>>> bigger files (I filter out files smaller than 3.9KB).
> >>>>>
> >>>>> I had a conversation on #btrfs on IRC, so here's what we debugged:
> >>>>>
> >>>>> I can replicate the issue by copying a file impacted by this bug,
> >>>>> by using
> >>>>> `cp --reflink=never`. I attached one of the impacted files to this
> >>>>> bug,
> >>>>> named README.md.
> >>>>>
> >>>>> Someone told me that it could be a bug due to the inline extent.
> >>>>> So we tried
> >>>>> to check that.
> >>>>>
> >>>>> filefrag shows that the file Readme.md is 1 inline extent. I tried
> >>>>> to create
> >>>>> a new file with random text, of 18 bytes (slightly bigger than the
> >>>>> other
> >>>>> file), that is also 1 inline extent. This file doesn't trigger the
> >>>>> bug and
> >>>>> has no issue to be defragmented.
> >>>>>
> >>>>> I tried to mount my system with `max_inline=0`, created a copy of
> >>>>> README.md.
> >>>>> `filefrag` shows me that the new file is now 1 extent, not inline.
> >>>>> This new
> >>>>> file also triggers the bug, so it doesn't seem to be due to the
> >>>>> inline
> >>>>> extent.
> >>>>>
> >>>>> Someone asked me to provide the output of a perf top when the
> >>>>> defrag is
> >>>>> stuck:
> >>>>>
> >>>>>       28.70%  [kernel]          [k] generic_bin_search
> >>>>>       14.90%  [kernel]          [k] free_extent_buffer
> >>>>>       13.17%  [kernel]          [k] btrfs_search_slot
> >>>>>       12.63%  [kernel]          [k] btrfs_root_node
> >>>>>        8.33%  [kernel]          [k] btrfs_get_64
> >>>>>        3.88%  [kernel]          [k] __down_read_common.llvm
> >>>>>        3.00%  [kernel]          [k] up_read
> >>>>>        2.63%  [kernel]          [k] read_block_for_search
> >>>>>        2.40%  [kernel]          [k] read_extent_buffer
> >>>>>        1.38%  [kernel]          [k] memset_erms
> >>>>>        1.11%  [kernel]          [k] find_extent_buffer
> >>>>>        0.69%  [kernel]          [k] kmem_cache_free
> >>>>>        0.69%  [kernel]          [k] memcpy_erms
> >>>>>        0.57%  [kernel]          [k] kmem_cache_alloc
> >>>>>        0.45%  [kernel]          [k] radix_tree_lookup
> >>>>>
> >>>>> I can reproduce the bug on 2 different machines, running 2
> >>>>> different linux
> >>>>> distributions (Arch and Gentoo) with 2 different kernel configs.
> >>>>> This kernel is compiled with clang, the other with GCC.
> >>>>>
> >>>>> Kernel version: 5.16.0
> >>>>> Mount options:
> >>>>>       Machine 1:
> >>>>> rw,noatime,compress-force=zstd:2,ssd,discard=async,space_cache=v2,autodefrag
> >>>>>
> >>>>>       Machine 2:
> >>>>> rw,noatime,compress-force=zstd:3,nossd,space_cache=v2
> >>>>>
> >>>>> When the error happens, no message is shown in dmesg.
> >>>> This is very likely the same issue as reported at this thread:
> >>>>
> >>>> https://lore.kernel.org/linux-btrfs/YeVawBBE3r6hVhgs@debian9.Home/T/#ma1c8a9848c9b7e4edb471f7be184599d38e288bb
> >>>>
> >>>>
> >>>> Are you able to test the patch posted there?
> >>>>
> >>>> Thanks.
> >>>>
> >>>>> Thanks,
> >>>>> Anthony Ruhier
> >>>>>
> >>
> >>
> >>
> >>


* Re: btrfs fi defrag hangs on small files, 100% CPU thread
  2022-01-17 17:56           ` Filipe Manana
@ 2022-01-17 19:39             ` Anthony Ruhier
  2022-01-17 23:15               ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: Anthony Ruhier @ 2022-01-17 19:39 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs



I have some good news and bad news: the bad news is that it didn't fix 
the autodefrag problem (I applied the 2 patches).

The good news is that when I enable autodefrag, I can quickly see if the 
problem is still there or not.
It's not as obvious as the constant writes that happened just after my
upgrade to linux 5.16, because the problem has mitigated itself a bit
since then, but it's still noticeable.

If I compare the write average (in total, I don't have it per process) 
when taking idle periods on the same machine:
     Linux 5.16:
         without autodefrag: ~10KiB/s
         with autodefrag: between 1 and 2MiB/s

     Linux 5.15:
         with autodefrag: ~10KiB/s (around the same as without autodefrag on 5.16)

Feel free to ask me anything to help your debugging; just try to be
fairly explicit about what I should do, as I'm not experienced in
filesystem debugging.
Thanks
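
For per-process numbers (which the totals above lack), a small helper can
sample one process's write rate from /proc/<pid>/io. This is only a sketch:
`proc_write_rate` is a made-up name, and it assumes a kernel built with task
IO accounting (the default on distro kernels).

```shell
# Sketch: sample a process's write rate in KiB/s over an interval,
# using the write_bytes counter from /proc/<pid>/io.
proc_write_rate() {
    pid="$1"
    interval="${2:-5}"
    before=$(awk '/^write_bytes:/ {print $2}' "/proc/$pid/io")
    sleep "$interval"
    after=$(awk '/^write_bytes:/ {print $2}' "/proc/$pid/io")
    echo $(( (after - before) / 1024 / interval ))
}

# Example: watch btrfs-cleaner for 10 seconds (PID lookup assumed):
# proc_write_rate "$(pgrep -o btrfs-cleaner)" 10
```

Reading another process's io file may require root; comparing the result with
and without autodefrag should show whether btrfs-cleaner accounts for the
extra 1-2MiB/s.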

Le 17/01/2022 à 18:56, Filipe Manana a écrit :
> On Mon, Jan 17, 2022 at 5:50 PM Anthony Ruhier <aruhier@mailbox.org> wrote:
>> Filipe,
>>
>> Just a quick update after my previous email, your patch fixed the issue
>> for `btrfs fi defrag`.
>> Thanks a lot! I closed my bug on bugzilla.
>>
>> I'll keep you in touch about the autodefrag.
> Please do.
> The 1 byte file case was very specific, so it's probably a different issue.
>
> Thanks for testing!
>
>> Le 17/01/2022 à 18:04, Anthony Ruhier a écrit :
>>> I'm going to apply your patch for the 1B file, and quickly confirm if
>>> it works.
>>> Thanks a lot for your patch!
>>>
>>> About the autodefrag issue, it's going to be trickier to check that
>>> your patch fixes it, because for whatever reason the problem seems to
>>> have resolved itself (or at least, btrfs-cleaner does way less writes)
>>> after a partial btrfs balance.
>>> I'll try to look at the amount of writes after several hours. Maybe for
>>> this one, check with the author of the other bug. If they can easily
>>> reproduce the issue then it's going to be easier to check.
>>>
>>> Thanks,
>>> Anthony
>>>
>>> Le 17/01/2022 à 17:52, Filipe Manana a écrit :
>>>> On Mon, Jan 17, 2022 at 03:24:00PM +0100, Anthony Ruhier wrote:
>>>>> Thanks for the answer.
>>>>>
>>>>> I had the exact same issue as in the thread you've linked, and have
>>>>> some
>>>>> monitoring and graphs that showed that btrfs-cleaner did constant
>>>>> writes
>>>>> during 12 hours just after I upgraded to linux 5.16. Weirdly enough,
>>>>> the
>>>>> issue almost disappeared after I did a btrfs balance by filtering on
>>>>> 10%
>>>>> usage of data.
>>>>> But that's why I initially disabled autodefrag, which led to discovering
>>>>> this bug as I switched to manual defragmentation (which, in the end,
>>>>> makes more sense anyway with my setup).
>>>>>
>>>>> I tried this patch, but sadly it doesn't help for the initial issue. I
>>>>> cannot say for the bug in the other thread, as the problem with
>>>>> btrfs-cleaner disappeared (I can still see some writes from it, but
>>>>> it so
>>>>> rare that I cannot say if it's normal or not).
>>>> Ok, reading your first mail more carefully, I see there's the case of the
>>>> 1-byte file, which is possibly not the same cause as the autodefrag
>>>> excessive IO problem.
>>>>
>>>> For the 1 byte file problem, I've just sent a fix:
>>>>
>>>> https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/
>>>>
>>>>
>>>> It's actually trivial to trigger.
>>>>
>>>> Can you check if it also fixes your problem with autodefrag?
>>>>
>>>> If not, then try the following (after applying the 1 file fix):
>>>>
>>>> https://pastebin.com/raw/EbEfk1tF
>>>>
>>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>>> index a5bd6926f7ff..db795e226cca 100644
>>>> --- a/fs/btrfs/ioctl.c
>>>> +++ b/fs/btrfs/ioctl.c
>>>> @@ -1191,6 +1191,7 @@ static int defrag_collect_targets(struct
>>>> btrfs_inode *inode,
>>>>                      u64 newer_than, bool do_compress,
>>>>                      bool locked, struct list_head *target_list)
>>>>    {
>>>> +    const u32 min_thresh = extent_thresh / 2;
>>>>        u64 cur = start;
>>>>        int ret = 0;
>>>>    @@ -1198,6 +1199,7 @@ static int defrag_collect_targets(struct
>>>> btrfs_inode *inode,
>>>>            struct extent_map *em;
>>>>            struct defrag_target_range *new;
>>>>            bool next_mergeable = true;
>>>> +        u64 range_start;
>>>>            u64 range_len;
>>>>              em = defrag_lookup_extent(&inode->vfs_inode, cur, locked);
>>>> @@ -1213,6 +1215,24 @@ static int defrag_collect_targets(struct
>>>> btrfs_inode *inode,
>>>>            if (em->generation < newer_than)
>>>>                goto next;
>>>>    +        /*
>>>> +         * Our start offset might be in the middle of an existing
>>>> extent
>>>> +         * map, so take that into account.
>>>> +         */
>>>> +        range_len = em->len - (cur - em->start);
>>>> +
>>>> +        /*
>>>> +         * If there's already a good range for delalloc within the
>>>> range
>>>> +         * covered by the extent map, skip it, otherwise we can end up
>>>> +         * doing on the same IO range for a long time when using auto
>>>> +         * defrag.
>>>> +         */
>>>> +        range_start = cur;
>>>> +        if (count_range_bits(&inode->io_tree, &range_start,
>>>> +                     range_start + range_len - 1, min_thresh,
>>>> +                     EXTENT_DELALLOC, 1) >= min_thresh)
>>>> +            goto next;
>>>> +
>>>>            /*
>>>>             * For do_compress case, we want to compress all valid file
>>>>             * extents, thus no @extent_thresh or mergeable check.
>>>> @@ -1220,8 +1240,8 @@ static int defrag_collect_targets(struct
>>>> btrfs_inode *inode,
>>>>            if (do_compress)
>>>>                goto add;
>>>>    -        /* Skip too large extent */
>>>> -        if (em->len >= extent_thresh)
>>>> +        /* Skip large enough ranges. */
>>>> +        if (range_len >= extent_thresh)
>>>>                goto next;
>>>>              next_mergeable =
>>>> defrag_check_next_extent(&inode->vfs_inode, em,
>>>>
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>> Thanks,
>>>>> Anthony
>>>>>
>>>>> Le 17/01/2022 à 13:10, Filipe Manana a écrit :
>>>>>> On Sun, Jan 16, 2022 at 08:15:37PM +0100, Anthony Ruhier wrote:
>>>>>>> Hi,
>>>>>>> Since I upgraded from linux 5.15 to 5.16, `btrfs filesystem defrag
>>>>>>> -t128K`
>>>>>>> hangs on small files (~1 byte) and triggers what seems to be a
>>>>>>> loop in the kernel. It results in one CPU thread being used at
>>>>>>> 100%. I cannot kill the process, and rebooting is blocked by btrfs.
>>>>>>> It is a copy of the bug: https://bugzilla.kernel.org/show_bug.cgi?id=215498
>>>>>>>
>>>>>>> Rebooting to linux 5.15 shows no issue. I have no issue to run a
>>>>>>> defrag on
>>>>>>> bigger files (I filter out files smaller than 3.9KB).
>>>>>>>
>>>>>>> I had a conversation on #btrfs on IRC, so here's what we debugged:
>>>>>>>
>>>>>>> I can replicate the issue by copying a file impacted by this bug,
>>>>>>> by using
>>>>>>> `cp --reflink=never`. I attached one of the impacted files to this
>>>>>>> bug,
>>>>>>> named README.md.
>>>>>>>
>>>>>>> Someone told me that it could be a bug due to the inline extent.
>>>>>>> So we tried
>>>>>>> to check that.
>>>>>>>
>>>>>>> filefrag shows that the file Readme.md is 1 inline extent. I tried
>>>>>>> to create
>>>>>>> a new file with random text, of 18 bytes (slightly bigger than the
>>>>>>> other
>>>>>>> file), that is also 1 inline extent. This file doesn't trigger the
>>>>>>> bug and
>>>>>>> has no issue to be defragmented.
>>>>>>>
>>>>>>> I tried to mount my system with `max_inline=0`, created a copy of
>>>>>>> README.md.
>>>>>>> `filefrag` shows me that the new file is now 1 extent, not inline.
>>>>>>> This new
>>>>>>> file also triggers the bug, so it doesn't seem to be due to the
>>>>>>> inline
>>>>>>> extent.
>>>>>>>
>>>>>>> Someone asked me to provide the output of a perf top when the
>>>>>>> defrag is
>>>>>>> stuck:
>>>>>>>
>>>>>>>        28.70%  [kernel]          [k] generic_bin_search
>>>>>>>        14.90%  [kernel]          [k] free_extent_buffer
>>>>>>>        13.17%  [kernel]          [k] btrfs_search_slot
>>>>>>>        12.63%  [kernel]          [k] btrfs_root_node
>>>>>>>         8.33%  [kernel]          [k] btrfs_get_64
>>>>>>>         3.88%  [kernel]          [k] __down_read_common.llvm
>>>>>>>         3.00%  [kernel]          [k] up_read
>>>>>>>         2.63%  [kernel]          [k] read_block_for_search
>>>>>>>         2.40%  [kernel]          [k] read_extent_buffer
>>>>>>>         1.38%  [kernel]          [k] memset_erms
>>>>>>>         1.11%  [kernel]          [k] find_extent_buffer
>>>>>>>         0.69%  [kernel]          [k] kmem_cache_free
>>>>>>>         0.69%  [kernel]          [k] memcpy_erms
>>>>>>>         0.57%  [kernel]          [k] kmem_cache_alloc
>>>>>>>         0.45%  [kernel]          [k] radix_tree_lookup
>>>>>>>
>>>>>>> I can reproduce the bug on 2 different machines, running 2
>>>>>>> different linux
>>>>>>> distributions (Arch and Gentoo) with 2 different kernel configs.
>>>>>>> This kernel is compiled with clang, the other with GCC.
>>>>>>>
>>>>>>> Kernel version: 5.16.0
>>>>>>> Mount options:
>>>>>>>        Machine 1:
>>>>>>> rw,noatime,compress-force=zstd:2,ssd,discard=async,space_cache=v2,autodefrag
>>>>>>>
>>>>>>>        Machine 2:
>>>>>>> rw,noatime,compress-force=zstd:3,nossd,space_cache=v2
>>>>>>>
>>>>>>> When the error happens, no message is shown in dmesg.
>>>>>> This is very likely the same issue as reported at this thread:
>>>>>>
>>>>>> https://lore.kernel.org/linux-btrfs/YeVawBBE3r6hVhgs@debian9.Home/T/#ma1c8a9848c9b7e4edb471f7be184599d38e288bb
>>>>>>
>>>>>>
>>>>>> Are you able to test the patch posted there?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>> Thanks,
>>>>>>> Anthony Ruhier
>>>>>>>
>>>>
>>>>
>>>>



* Re: btrfs fi defrag hangs on small files, 100% CPU thread
  2022-01-17 19:39             ` Anthony Ruhier
@ 2022-01-17 23:15               ` Qu Wenruo
  2022-01-17 23:32                 ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2022-01-17 23:15 UTC (permalink / raw)
  To: Anthony Ruhier, Filipe Manana; +Cc: linux-btrfs



On 2022/1/18 03:39, Anthony Ruhier wrote:
> I have some good news and bad news: the bad news is that it didn't fix
> the autodefrag problem (I applied the 2 patches).
>
> The good news is that when I enable autodefrag, I can quickly see if the
> problem is still there or not.
> It's not as obvious as the constant writes that happened just after my
> upgrade to linux 5.16, because the problem has mitigated itself a bit
> since then, but it's still noticeable.
>
> If I compare the write average (in total, I don't have it per process)
> when taking idle periods on the same machine:
>      Linux 5.16:
>          without autodefrag: ~10KiB/s
>          with autodefrag: between 1 and 2MiB/s
>
>      Linux 5.15:
>          with autodefrag: ~10KiB/s (around the same as without autodefrag on 5.16)
>
> Feel free to ask me anything to help your debugging; just try to be
> fairly explicit about what I should do, as I'm not experienced in
> filesystem debugging.

Mind testing the following diff (along with the previous two patches from
Filipe)?

I screwed up: the refactor changed how the defragged bytes are accounted,
and I didn't notice that autodefrag relies on that to requeue the inode.

The refactor was originally meant to add support for subpage defrag, but it
looks like I pushed the boundary too hard and changed some behaviors.


diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 11204dbbe053..c720260f9656 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -312,7 +312,9 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info,
          */
         if (num_defrag == BTRFS_DEFRAG_BATCH) {
                 defrag->last_offset = range.start;
+               /*
                 btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
+               */
         } else if (defrag->last_offset && !defrag->cycled) {
                 /*
                  * we didn't fill our defrag batch, but
@@ -321,7 +323,9 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info,
                  */
                 defrag->last_offset = 0;
                 defrag->cycled = 1;
+               /*
                 btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
+               */
         } else {
                 kmem_cache_free(btrfs_inode_defrag_cachep, defrag);
         }
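
To make explicit what those two commented-out calls control, here is a toy
shell model of the requeue decision in __btrfs_run_defrag_inode() (semantics
read off the diff above; BTRFS_DEFRAG_BATCH is 1024 in fs/btrfs/file.c,
everything else is illustrative):

```shell
# Toy model of the requeue decision in __btrfs_run_defrag_inode().
# Only the control flow is mirrored, nothing else.
BATCH=1024

requeue_decision() {
    num_defrag="$1"   # extents defragged in this pass
    last_offset="$2"  # where the previous pass stopped
    cycled="$3"       # 1 if we already wrapped around to offset 0
    if [ "$num_defrag" -eq "$BATCH" ]; then
        echo requeue      # full batch: more of the file may remain
    elif [ "$last_offset" -ne 0 ] && [ "$cycled" -eq 0 ]; then
        echo requeue      # short batch mid-file: retry once from offset 0
    else
        echo free         # done: drop the defrag record
    fi
}

requeue_decision 1024 0 0      # prints: requeue
requeue_decision 10   4096 0   # prints: requeue
requeue_decision 10   0    1   # prints: free
```

The diff disables both `requeue` branches, so if the constant background
writes stop with it applied, the requeue path is what keeps feeding the same
inode back for another defrag pass.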



> Thanks
>
> Le 17/01/2022 à 18:56, Filipe Manana a écrit :
>> On Mon, Jan 17, 2022 at 5:50 PM Anthony Ruhier <aruhier@mailbox.org>
>> wrote:
>>> Filipe,
>>>
>>> Just a quick update after my previous email, your patch fixed the issue
>>> for `btrfs fi defrag`.
>>> Thanks a lot! I closed my bug on bugzilla.
>>>
>>> I'll keep you in touch about the autodefrag.
>> Please do.
>> The 1 byte file case was very specific, so it's probably a different
>> issue.
>>
>> Thanks for testing!
>>
>>> Le 17/01/2022 à 18:04, Anthony Ruhier a écrit :
>>>> I'm going to apply your patch for the 1B file, and quickly confirm if
>>>> it works.
>>>> Thanks a lot for your patch!
>>>>
>>>> About the autodefrag issue, it's going to be trickier to check that
>>>> your patch fixes it, because for whatever reason the problem seems to
>>>> have resolved itself (or at least, btrfs-cleaner does way less writes)
>>>> after a partial btrfs balance.
>>>> I'll try to look at the amount of writes after several hours. Maybe for
>>>> this one, check with the author of the other bug. If they can easily
>>>> reproduce the issue then it's going to be easier to check.
>>>>
>>>> Thanks,
>>>> Anthony
>>>>
>>>> Le 17/01/2022 à 17:52, Filipe Manana a écrit :
>>>>> On Mon, Jan 17, 2022 at 03:24:00PM +0100, Anthony Ruhier wrote:
>>>>>> Thanks for the answer.
>>>>>>
>>>>>> I had the exact same issue as in the thread you've linked, and have
>>>>>> some
>>>>>> monitoring and graphs that showed that btrfs-cleaner did constant
>>>>>> writes
>>>>>> during 12 hours just after I upgraded to linux 5.16. Weirdly enough,
>>>>>> the
>>>>>> issue almost disappeared after I did a btrfs balance by filtering on
>>>>>> 10%
>>>>>> usage of data.
>>>>>> But that's why I initially disabled autodefrag, which led to discovering
>>>>>> this bug as I switched to manual defragmentation (which, in the end,
>>>>>> makes more sense anyway with my setup).
>>>>>>
>>>>>> I tried this patch, but sadly it doesn't help for the initial
>>>>>> issue. I
>>>>>> cannot say for the bug in the other thread, as the problem with
>>>>>> btrfs-cleaner disappeared (I can still see some writes from it, but
>>>>>> it so
>>>>>> rare that I cannot say if it's normal or not).
>>>>> Ok, reading your first mail more carefully, I see there's the case of
>>>>> the 1-byte file, which is possibly not the same cause as the autodefrag
>>>>> excessive IO problem.
>>>>>
>>>>> For the 1 byte file problem, I've just sent a fix:
>>>>>
>>>>> https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/
>>>>>
>>>>>
>>>>>
>>>>> It's actually trivial to trigger.
>>>>>
>>>>> Can you check if it also fixes your problem with autodefrag?
>>>>>
>>>>> If not, then try the following (after applying the 1 file fix):
>>>>>
>>>>> https://pastebin.com/raw/EbEfk1tF
>>>>>
>>>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>>>> index a5bd6926f7ff..db795e226cca 100644
>>>>> --- a/fs/btrfs/ioctl.c
>>>>> +++ b/fs/btrfs/ioctl.c
>>>>> @@ -1191,6 +1191,7 @@ static int defrag_collect_targets(struct
>>>>> btrfs_inode *inode,
>>>>>                      u64 newer_than, bool do_compress,
>>>>>                      bool locked, struct list_head *target_list)
>>>>>    {
>>>>> +    const u32 min_thresh = extent_thresh / 2;
>>>>>        u64 cur = start;
>>>>>        int ret = 0;
>>>>>    @@ -1198,6 +1199,7 @@ static int defrag_collect_targets(struct
>>>>> btrfs_inode *inode,
>>>>>            struct extent_map *em;
>>>>>            struct defrag_target_range *new;
>>>>>            bool next_mergeable = true;
>>>>> +        u64 range_start;
>>>>>            u64 range_len;
>>>>>              em = defrag_lookup_extent(&inode->vfs_inode, cur,
>>>>> locked);
>>>>> @@ -1213,6 +1215,24 @@ static int defrag_collect_targets(struct
>>>>> btrfs_inode *inode,
>>>>>            if (em->generation < newer_than)
>>>>>                goto next;
>>>>>    +        /*
>>>>> +         * Our start offset might be in the middle of an existing
>>>>> extent
>>>>> +         * map, so take that into account.
>>>>> +         */
>>>>> +        range_len = em->len - (cur - em->start);
>>>>> +
>>>>> +        /*
>>>>> +         * If there's already a good range for delalloc within the
>>>>> range
>>>>> +         * covered by the extent map, skip it, otherwise we can
>>>>> end up
>>>>> +         * doing on the same IO range for a long time when using auto
>>>>> +         * defrag.
>>>>> +         */
>>>>> +        range_start = cur;
>>>>> +        if (count_range_bits(&inode->io_tree, &range_start,
>>>>> +                     range_start + range_len - 1, min_thresh,
>>>>> +                     EXTENT_DELALLOC, 1) >= min_thresh)
>>>>> +            goto next;
>>>>> +
>>>>>            /*
>>>>>             * For do_compress case, we want to compress all valid file
>>>>>             * extents, thus no @extent_thresh or mergeable check.
>>>>> @@ -1220,8 +1240,8 @@ static int defrag_collect_targets(struct
>>>>> btrfs_inode *inode,
>>>>>            if (do_compress)
>>>>>                goto add;
>>>>>    -        /* Skip too large extent */
>>>>> -        if (em->len >= extent_thresh)
>>>>> +        /* Skip large enough ranges. */
>>>>> +        if (range_len >= extent_thresh)
>>>>>                goto next;
>>>>>              next_mergeable =
>>>>> defrag_check_next_extent(&inode->vfs_inode, em,
>>>>>
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Anthony
>>>>>>
>>>>>> Le 17/01/2022 à 13:10, Filipe Manana a écrit :
>>>>>>> On Sun, Jan 16, 2022 at 08:15:37PM +0100, Anthony Ruhier wrote:
>>>>>>>> Hi,
>>>>>>>> Since I upgraded from linux 5.15 to 5.16, `btrfs filesystem defrag
>>>>>>>> -t128K`
>>>>>>>> hangs on small files (~1 byte) and triggers what seems to be a
>>>>>>>> loop in the kernel. It results in one CPU thread being used at
>>>>>>>> 100%. I cannot kill the process, and rebooting is blocked by btrfs.
>>>>>>>> It is a copy of the bug: https://bugzilla.kernel.org/show_bug.cgi?id=215498
>>>>>>>>
>>>>>>>> Rebooting to linux 5.15 shows no issue. I have no issue to run a
>>>>>>>> defrag on
>>>>>>>> bigger files (I filter out files smaller than 3.9KB).
>>>>>>>>
>>>>>>>> I had a conversation on #btrfs on IRC, so here's what we debugged:
>>>>>>>>
>>>>>>>> I can replicate the issue by copying a file impacted by this bug,
>>>>>>>> by using
>>>>>>>> `cp --reflink=never`. I attached one of the impacted files to this
>>>>>>>> bug,
>>>>>>>> named README.md.
>>>>>>>>
>>>>>>>> Someone told me that it could be a bug due to the inline extent.
>>>>>>>> So we tried
>>>>>>>> to check that.
>>>>>>>>
>>>>>>>> filefrag shows that the file Readme.md is 1 inline extent. I tried
>>>>>>>> to create
>>>>>>>> a new file with random text, of 18 bytes (slightly bigger than the
>>>>>>>> other
>>>>>>>> file), that is also 1 inline extent. This file doesn't trigger the
>>>>>>>> bug and
>>>>>>>> has no issue to be defragmented.
>>>>>>>>
>>>>>>>> I tried to mount my system with `max_inline=0`, created a copy of
>>>>>>>> README.md.
>>>>>>>> `filefrag` shows me that the new file is now 1 extent, not inline.
>>>>>>>> This new
>>>>>>>> file also triggers the bug, so it doesn't seem to be due to the
>>>>>>>> inline
>>>>>>>> extent.
>>>>>>>>
>>>>>>>> Someone asked me to provide the output of a perf top when the
>>>>>>>> defrag is
>>>>>>>> stuck:
>>>>>>>>
>>>>>>>>        28.70%  [kernel]          [k] generic_bin_search
>>>>>>>>        14.90%  [kernel]          [k] free_extent_buffer
>>>>>>>>        13.17%  [kernel]          [k] btrfs_search_slot
>>>>>>>>        12.63%  [kernel]          [k] btrfs_root_node
>>>>>>>>         8.33%  [kernel]          [k] btrfs_get_64
>>>>>>>>         3.88%  [kernel]          [k] __down_read_common.llvm
>>>>>>>>         3.00%  [kernel]          [k] up_read
>>>>>>>>         2.63%  [kernel]          [k] read_block_for_search
>>>>>>>>         2.40%  [kernel]          [k] read_extent_buffer
>>>>>>>>         1.38%  [kernel]          [k] memset_erms
>>>>>>>>         1.11%  [kernel]          [k] find_extent_buffer
>>>>>>>>         0.69%  [kernel]          [k] kmem_cache_free
>>>>>>>>         0.69%  [kernel]          [k] memcpy_erms
>>>>>>>>         0.57%  [kernel]          [k] kmem_cache_alloc
>>>>>>>>         0.45%  [kernel]          [k] radix_tree_lookup
>>>>>>>>
>>>>>>>> I can reproduce the bug on 2 different machines, running 2
>>>>>>>> different linux
>>>>>>>> distributions (Arch and Gentoo) with 2 different kernel configs.
>>>>>>>> This kernel is compiled with clang, the other with GCC.
>>>>>>>>
>>>>>>>> Kernel version: 5.16.0
>>>>>>>> Mount options:
>>>>>>>>        Machine 1:
>>>>>>>> rw,noatime,compress-force=zstd:2,ssd,discard=async,space_cache=v2,autodefrag
>>>>>>>>
>>>>>>>>
>>>>>>>>        Machine 2:
>>>>>>>> rw,noatime,compress-force=zstd:3,nossd,space_cache=v2
>>>>>>>>
>>>>>>>> When the error happens, no message is shown in dmesg.
>>>>>>> This is very likely the same issue as reported at this thread:
>>>>>>>
>>>>>>> https://lore.kernel.org/linux-btrfs/YeVawBBE3r6hVhgs@debian9.Home/T/#ma1c8a9848c9b7e4edb471f7be184599d38e288bb
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Are you able to test the patch posted there?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Anthony Ruhier
>>>>>>>>
>>>>>
>>>>>
>>>>>

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: btrfs fi defrag hangs on small files, 100% CPU thread
  2022-01-17 23:15               ` Qu Wenruo
@ 2022-01-17 23:32                 ` Qu Wenruo
  2022-01-18  1:58                   ` Anthony Ruhier
  0 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2022-01-17 23:32 UTC (permalink / raw)
  To: Anthony Ruhier, Filipe Manana; +Cc: linux-btrfs



On 2022/1/18 07:15, Qu Wenruo wrote:
>
>
> On 2022/1/18 03:39, Anthony Ruhier wrote:
>> I have some good news and bad news: the bad news is that it didn't fix
>> the autodefrag problem (I applied the 2 patches).
>>
>> The good news is that when I enable autodefrag, I can quickly see if the
>> problem is still there or not.
>> It's not super obvious compared to the amount of writes that happened
>> constantly just after my upgrade to linux 5.16, because since then the
>> problem mitigated itself a bit, but it's still noticeable.
>>
>> If I compare the write average (in total, I don't have it per process)
>> when taking idle periods on the same machine:
>>      Linux 5.16:
>>          without autodefrag: ~ 10KiB/s
>>          with autodefrag: between 1 and 2MiB/s.
>>
>>      Linux 5.15:
>>          with autodefrag:~ 10KiB/s (around the same as without
>> autodefrag on 5.16)
>>
>> Feel free to ask me anything to help your debugging, just try to be
>> quite explicit about what I should do, I'm not experimented in
>> filesystems debugging.
>
> Would you mind testing the following diff (along with the previous two
> patches from Filipe)?
>
> I screwed up: the refactor changed how the defragged bytes are
> accounted, and I didn't notice that autodefrag relies on that to
> requeue the inode.
>
> The refactor was originally meant to add support for subpage defrag,
> but it looks like I pushed the boundary too hard and changed some
> behaviors.

Sorry, the previous diff has a memory leak.

The autodefrag code is more complex than I thought; it has extra logic
to requeue inodes for further defrag.

Please try this one instead.

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 11204dbbe053..096feecf2602 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -305,26 +305,7 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info,
        num_defrag = btrfs_defrag_file(inode, NULL, &range, defrag->transid,
                                       BTRFS_DEFRAG_BATCH);
         sb_end_write(fs_info->sb);
-       /*
-        * if we filled the whole defrag batch, there
-        * must be more work to do.  Queue this defrag
-        * again
-        */
-       if (num_defrag == BTRFS_DEFRAG_BATCH) {
-               defrag->last_offset = range.start;
-               btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
-       } else if (defrag->last_offset && !defrag->cycled) {
-               /*
-                * we didn't fill our defrag batch, but
-                * we didn't start at zero.  Make sure we loop
-                * around to the start of the file.
-                */
-               defrag->last_offset = 0;
-               defrag->cycled = 1;
-               btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
-       } else {
-               kmem_cache_free(btrfs_inode_defrag_cachep, defrag);
-       }
+       kmem_cache_free(btrfs_inode_defrag_cachep, defrag);

         iput(inode);
         return 0;

>
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 11204dbbe053..c720260f9656 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -312,7 +312,9 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info,
>           */
>          if (num_defrag == BTRFS_DEFRAG_BATCH) {
>                  defrag->last_offset = range.start;
> +               /*
>                  btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
> +               */
>          } else if (defrag->last_offset && !defrag->cycled) {
>                  /*
>                   * we didn't fill our defrag batch, but
> @@ -321,7 +323,9 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info,
>                   */
>                  defrag->last_offset = 0;
>                  defrag->cycled = 1;
> +               /*
>                  btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
> +               */
>          } else {
>                  kmem_cache_free(btrfs_inode_defrag_cachep, defrag);
>          }
>
>
>
>> Thanks
>>
>> Le 17/01/2022 à 18:56, Filipe Manana a écrit :
>>> On Mon, Jan 17, 2022 at 5:50 PM Anthony Ruhier <aruhier@mailbox.org>
>>> wrote:
>>>> Filipe,
>>>>
>>>> Just a quick update after my previous email, your patch fixed the issue
>>>> for `btrfs fi defrag`.
>>>> Thanks a lot! I closed my bug on bugzilla.
>>>>
>>>> I'll keep you in touch about the autodefrag.
>>> Please do.
>>> The 1 byte file case was very specific, so it's probably a different
>>> issue.
>>>
>>> Thanks for testing!
>>>
>>>> Le 17/01/2022 à 18:04, Anthony Ruhier a écrit :
>>>>> I'm going to apply your patch for the 1B file, and quickly confirm if
>>>>> it works.
>>>>> Thanks a lot for your patch!
>>>>>
>>>>> About the autodefrag issue, it's going to be trickier to check that
>>>>> your patch fixes it, because for whatever reason the problem seems to
>>>>> have resolved itself (or at least, btrfs-cleaner does way less writes)
>>>>> after a partial btrfs balance.
>>>>> I'll try and look the amount of writes after several hours. Maybe for
>>>>> this one, see with the author of the other bug. If they can easily
>>>>> reproduce the issue then it's going to be easier to check.
>>>>>
>>>>> Thanks,
>>>>> Anthony
>>>>>
>>>>> Le 17/01/2022 à 17:52, Filipe Manana a écrit :
>>>>>> On Mon, Jan 17, 2022 at 03:24:00PM +0100, Anthony Ruhier wrote:
>>>>>>> Thanks for the answer.
>>>>>>>
>>>>>>> I had the exact same issue as in the thread you've linked, and have
>>>>>>> some
>>>>>>> monitoring and graphs that showed that btrfs-cleaner did constant
>>>>>>> writes
>>>>>>> during 12 hours just after I upgraded to linux 5.16. Weirdly enough,
>>>>>>> the
>>>>>>> issue almost disappeared after I did a btrfs balance by filtering on
>>>>>>> 10%
>>>>>>> usage of data.
>>>>>>> But that's why I initially disabled autodefrag, what has lead to
>>>>>>> discovering
>>>>>>> this bug as I switched to manual defragmentation (which, in the end,
>>>>>>> makes
>>>>>>> more sense anyway with my setup).
>>>>>>>
>>>>>>> I tried this patch, but sadly it doesn't help for the initial
>>>>>>> issue. I
>>>>>>> cannot say for the bug in the other thread, as the problem with
>>>>>>> btrfs-cleaner disappeared (I can still see some writes from it, but
>>>>>>> it so
>>>>>>> rare that I cannot say if it's normal or not).
>>>>>> Ok, reading better your first mail, I see there's the case of the 1
>>>>>> byte
>>>>>> file, which is possibly not the cause from the autodefrag causing the
>>>>>> excessive IO problem.
>>>>>>
>>>>>> For the 1 byte file problem, I've just sent a fix:
>>>>>>
>>>>>> https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> It's actually trivial to trigger.
>>>>>>
>>>>>> Can you check if it also fixes your problem with autodefrag?
>>>>>>
>>>>>> If not, then try the following (after applying the 1 file fix):
>>>>>>
>>>>>> https://pastebin.com/raw/EbEfk1tF
>>>>>>
>>>>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>>>>> index a5bd6926f7ff..db795e226cca 100644
>>>>>> --- a/fs/btrfs/ioctl.c
>>>>>> +++ b/fs/btrfs/ioctl.c
>>>>>> @@ -1191,6 +1191,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>>>>>>                      u64 newer_than, bool do_compress,
>>>>>>                      bool locked, struct list_head *target_list)
>>>>>>    {
>>>>>> +    const u32 min_thresh = extent_thresh / 2;
>>>>>>        u64 cur = start;
>>>>>>        int ret = 0;
>>>>>>    @@ -1198,6 +1199,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>>>>>>            struct extent_map *em;
>>>>>>            struct defrag_target_range *new;
>>>>>>            bool next_mergeable = true;
>>>>>> +        u64 range_start;
>>>>>>            u64 range_len;
>>>>>>              em = defrag_lookup_extent(&inode->vfs_inode, cur, locked);
>>>>>> @@ -1213,6 +1215,24 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>>>>>>            if (em->generation < newer_than)
>>>>>>                goto next;
>>>>>>    +        /*
>>>>>> +         * Our start offset might be in the middle of an existing extent
>>>>>> +         * map, so take that into account.
>>>>>> +         */
>>>>>> +        range_len = em->len - (cur - em->start);
>>>>>> +
>>>>>> +        /*
>>>>>> +         * If there's already a good range for delalloc within the range
>>>>>> +         * covered by the extent map, skip it, otherwise we can end up
>>>>>> +         * doing on the same IO range for a long time when using auto
>>>>>> +         * defrag.
>>>>>> +         */
>>>>>> +        range_start = cur;
>>>>>> +        if (count_range_bits(&inode->io_tree, &range_start,
>>>>>> +                     range_start + range_len - 1, min_thresh,
>>>>>> +                     EXTENT_DELALLOC, 1) >= min_thresh)
>>>>>> +            goto next;
>>>>>> +
>>>>>>            /*
>>>>>>             * For do_compress case, we want to compress all valid file
>>>>>>             * extents, thus no @extent_thresh or mergeable check.
>>>>>> @@ -1220,8 +1240,8 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>>>>>>            if (do_compress)
>>>>>>                goto add;
>>>>>>    -        /* Skip too large extent */
>>>>>> -        if (em->len >= extent_thresh)
>>>>>> +        /* Skip large enough ranges. */
>>>>>> +        if (range_len >= extent_thresh)
>>>>>>                goto next;
>>>>>>              next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em,
>>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> Anthony
>>>>>>>
>>>>>>> Le 17/01/2022 à 13:10, Filipe Manana a écrit :
>>>>>>>> [...]
>>>>>>
>>>>>>
>>>>>>

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: btrfs fi defrag hangs on small files, 100% CPU thread
  2022-01-17 23:32                 ` Qu Wenruo
@ 2022-01-18  1:58                   ` Anthony Ruhier
  2022-01-18  2:22                     ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: Anthony Ruhier @ 2022-01-18  1:58 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Filipe Manana


[-- Attachment #1.1: Type: text/plain, Size: 16235 bytes --]

Hi Qu,

No problem, thanks a lot for your patch! After a quick look over a
10-minute period, it looks like your patch fixed the issue. I've applied
the 2 patches from Filipe plus yours.
I'll let my NAS run overnight and watch the average write rate over a
longer period. I'll send an update in ~10 hours.

Thanks a lot for your help,
Anthony

Le 18/01/2022 à 00:32, Qu Wenruo a écrit :
>
>
> On 2022/1/18 07:15, Qu Wenruo wrote:
>>
>>
>> On 2022/1/18 03:39, Anthony Ruhier wrote:
>>> I have some good news and bad news: the bad news is that it didn't fix
>>> the autodefrag problem (I applied the 2 patches).
>>>
>>> The good news is that when I enable autodefrag, I can quickly see if 
>>> the
>>> problem is still there or not.
>>> It's not super obvious compared to the amount of writes that happened
>>> constantly just after my upgrade to linux 5.16, because since then the
>>> problem mitigated itself a bit, but it's still noticeable.
>>>
>>> If I compare the write average (in total, I don't have it per process)
>>> when taking idle periods on the same machine:
>>>      Linux 5.16:
>>>          without autodefrag: ~ 10KiB/s
>>>          with autodefrag: between 1 and 2MiB/s.
>>>
>>>      Linux 5.15:
>>>          with autodefrag:~ 10KiB/s (around the same as without
>>> autodefrag on 5.16)
>>>
>>> Feel free to ask me anything to help your debugging, just try to be
>>> quite explicit about what I should do, I'm not experimented in
>>> filesystems debugging.
>>
>> Mind to test the following diff (along with previous two patches from
>> Filipe)?
>>
>> I screwed up, the refactor changed how the defraged bytes accounting,
>> which I didn't notice that autodefrag relies on that to requeue the 
>> inode.
>>
>> The refactor is originally to add support for subpage defrag, but it
>> looks like I pushed the boundary too hard to change some behaviors.
>
> Sorry, previous diff has a memory leakage bug.
>
> The autodefrag code is more complex than I thought, it has extra logic
> to handle defrag.
>
> Please try this one instead.
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 11204dbbe053..096feecf2602 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -305,26 +305,7 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info,
>         num_defrag = btrfs_defrag_file(inode, NULL, &range, defrag->transid,
>                                        BTRFS_DEFRAG_BATCH);
>         sb_end_write(fs_info->sb);
> -       /*
> -        * if we filled the whole defrag batch, there
> -        * must be more work to do.  Queue this defrag
> -        * again
> -        */
> -       if (num_defrag == BTRFS_DEFRAG_BATCH) {
> -               defrag->last_offset = range.start;
> -               btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
> -       } else if (defrag->last_offset && !defrag->cycled) {
> -               /*
> -                * we didn't fill our defrag batch, but
> -                * we didn't start at zero.  Make sure we loop
> -                * around to the start of the file.
> -                */
> -               defrag->last_offset = 0;
> -               defrag->cycled = 1;
> -               btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
> -       } else {
> -               kmem_cache_free(btrfs_inode_defrag_cachep, defrag);
> -       }
> +       kmem_cache_free(btrfs_inode_defrag_cachep, defrag);
>
>         iput(inode);
>         return 0;
>
>>
>>
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index 11204dbbe053..c720260f9656 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -312,7 +312,9 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info,
>>           */
>>          if (num_defrag == BTRFS_DEFRAG_BATCH) {
>>                  defrag->last_offset = range.start;
>> +               /*
>>                  btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
>> +               */
>>          } else if (defrag->last_offset && !defrag->cycled) {
>>                  /*
>>                   * we didn't fill our defrag batch, but
>> @@ -321,7 +323,9 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info,
>>                   */
>>                  defrag->last_offset = 0;
>>                  defrag->cycled = 1;
>> +               /*
>>                  btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
>> +               */
>>          } else {
>>                  kmem_cache_free(btrfs_inode_defrag_cachep, defrag);
>>          }
>>
>>
>>
>>> Thanks
>>>
>>> Le 17/01/2022 à 18:56, Filipe Manana a écrit :
>>>> On Mon, Jan 17, 2022 at 5:50 PM Anthony Ruhier <aruhier@mailbox.org>
>>>> wrote:
>>>>> Filipe,
>>>>>
>>>>> Just a quick update after my previous email, your patch fixed the 
>>>>> issue
>>>>> for `btrfs fi defrag`.
>>>>> Thanks a lot! I closed my bug on bugzilla.
>>>>>
>>>>> I'll keep you in touch about the autodefrag.
>>>> Please do.
>>>> The 1 byte file case was very specific, so it's probably a different
>>>> issue.
>>>>
>>>> Thanks for testing!
>>>>
>>>>> Le 17/01/2022 à 18:04, Anthony Ruhier a écrit :
>>>>>> I'm going to apply your patch for the 1B file, and quickly 
>>>>>> confirm if
>>>>>> it works.
>>>>>> Thanks a lot for your patch!
>>>>>>
>>>>>> About the autodefrag issue, it's going to be trickier to check that
>>>>>> your patch fixes it, because for whatever reason the problem 
>>>>>> seems to
>>>>>> have resolved itself (or at least, btrfs-cleaner does way less 
>>>>>> writes)
>>>>>> after a partial btrfs balance.
>>>>>> I'll try and look the amount of writes after several hours. Maybe 
>>>>>> for
>>>>>> this one, see with the author of the other bug. If they can easily
>>>>>> reproduce the issue then it's going to be easier to check.
>>>>>>
>>>>>> Thanks,
>>>>>> Anthony
>>>>>>
>>>>>> Le 17/01/2022 à 17:52, Filipe Manana a écrit :
>>>>>>> On Mon, Jan 17, 2022 at 03:24:00PM +0100, Anthony Ruhier wrote:
>>>>>>>> Thanks for the answer.
>>>>>>>>
>>>>>>>> I had the exact same issue as in the thread you've linked, and 
>>>>>>>> have
>>>>>>>> some
>>>>>>>> monitoring and graphs that showed that btrfs-cleaner did constant
>>>>>>>> writes
>>>>>>>> during 12 hours just after I upgraded to linux 5.16. Weirdly 
>>>>>>>> enough,
>>>>>>>> the
>>>>>>>> issue almost disappeared after I did a btrfs balance by 
>>>>>>>> filtering on
>>>>>>>> 10%
>>>>>>>> usage of data.
>>>>>>>> But that's why I initially disabled autodefrag, what has lead to
>>>>>>>> discovering
>>>>>>>> this bug as I switched to manual defragmentation (which, in the 
>>>>>>>> end,
>>>>>>>> makes
>>>>>>>> more sense anyway with my setup).
>>>>>>>>
>>>>>>>> I tried this patch, but sadly it doesn't help for the initial
>>>>>>>> issue. I
>>>>>>>> cannot say for the bug in the other thread, as the problem with
>>>>>>>> btrfs-cleaner disappeared (I can still see some writes from it, 
>>>>>>>> but
>>>>>>>> it so
>>>>>>>> rare that I cannot say if it's normal or not).
>>>>>>> Ok, reading better your first mail, I see there's the case of the 1
>>>>>>> byte
>>>>>>> file, which is possibly not the cause from the autodefrag 
>>>>>>> causing the
>>>>>>> excessive IO problem.
>>>>>>>
>>>>>>> For the 1 byte file problem, I've just sent a fix:
>>>>>>>
>>>>>>> https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/ 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> It's actually trivial to trigger.
>>>>>>>
>>>>>>> Can you check if it also fixes your problem with autodefrag?
>>>>>>>
>>>>>>> If not, then try the following (after applying the 1 file fix):
>>>>>>>
>>>>>>> https://pastebin.com/raw/EbEfk1tF
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Anthony
>>>>>>>>
>>>>>>>> Le 17/01/2022 à 13:10, Filipe Manana a écrit :
>>>>>>>>> [...]
>>>>>>>
>>>>>>>
>>>>>>>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: btrfs fi defrag hangs on small files, 100% CPU thread
  2022-01-18  1:58                   ` Anthony Ruhier
@ 2022-01-18  2:22                     ` Qu Wenruo
  2022-01-18 10:57                       ` Anthony Ruhier
  0 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2022-01-18  2:22 UTC (permalink / raw)
  To: Anthony Ruhier; +Cc: linux-btrfs, Filipe Manana



On 2022/1/18 09:58, Anthony Ruhier wrote:
> Hi Qu,
>
> No problem, thanks a lot for your patch! After a quick look on a 10min
> period, it looks like your patch fixed the issue. I've applied the 2
> patches from Filipe and yours.
> I'll let my NAS run over the night and watch the writes average on a
> longer period. I'll send an update in ~10 hours.

I'm really sorry for all the problems.

Currently I believe the root cause is that I changed the return value of
btrfs_defrag_file(), which should return the number of sectors defragged.

Even though I renamed the variable to "sectors_defragged", it still
returns the value in bytes instead.

(Furthermore, it tends to report more bytes than were really defragged.)

The diff I mentioned is not really meant to solve the problem, but to help
us pin it down.
(The diff itself makes autodefrag do a little less work, so it may not
defrag as hard as before.)

Thank you very much for helping to fix the mess I caused.


BTW, if you have spare machines, would you mind testing the following two
patches? (If everything works as expected, they should be the final fixes,
so please test them without the other patches.)

https://patchwork.kernel.org/project/linux-btrfs/patch/20220118021544.18543-1-wqu@suse.com/
https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/

Thanks,
Qu


>
> Thanks a lot for your help,
> Anthony
>
> Le 18/01/2022 à 00:32, Qu Wenruo a écrit :
>>
>>
>> On 2022/1/18 07:15, Qu Wenruo wrote:
>>>
>>>
>>> On 2022/1/18 03:39, Anthony Ruhier wrote:
>>>> I have some good news and bad news: the bad news is that it didn't fix
>>>> the autodefrag problem (I applied the 2 patches).
>>>>
>>>> The good news is that when I enable autodefrag, I can quickly see if
>>>> the
>>>> problem is still there or not.
>>>> It's not super obvious compared to the amount of writes that happened
>>>> constantly just after my upgrade to linux 5.16, because since then the
>>>> problem mitigated itself a bit, but it's still noticeable.
>>>>
>>>> If I compare the write average (in total, I don't have it per process)
>>>> when taking idle periods on the same machine:
>>>>      Linux 5.16:
>>>>          without autodefrag: ~ 10KiB/s
>>>>          with autodefrag: between 1 and 2MiB/s.
>>>>
>>>>      Linux 5.15:
>>>>          with autodefrag:~ 10KiB/s (around the same as without
>>>> autodefrag on 5.16)
>>>>
>>>> Feel free to ask me anything to help your debugging, just try to be
>>>> quite explicit about what I should do, I'm not experimented in
>>>> filesystems debugging.
>>>
>>> Mind to test the following diff (along with previous two patches from
>>> Filipe)?
>>>
>>> I screwed up, the refactor changed how the defraged bytes accounting,
>>> which I didn't notice that autodefrag relies on that to requeue the
>>> inode.
>>>
>>> The refactor is originally to add support for subpage defrag, but it
>>> looks like I pushed the boundary too hard to change some behaviors.
>>
>> Sorry, previous diff has a memory leakage bug.
>>
>> The autodefrag code is more complex than I thought, it has extra logic
>> to handle defrag.
>>
>> Please try this one instead.
>>
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index 11204dbbe053..096feecf2602 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -305,26 +305,7 @@ static int __btrfs_run_defrag_inode(struct
>> btrfs_fs_info *fs_info,
>>         num_defrag = btrfs_defrag_file(inode, NULL, &range,
>> defrag->transid,
>>                                        BTRFS_DEFRAG_BATCH);
>>         sb_end_write(fs_info->sb);
>> -       /*
>> -        * if we filled the whole defrag batch, there
>> -        * must be more work to do.  Queue this defrag
>> -        * again
>> -        */
>> -       if (num_defrag == BTRFS_DEFRAG_BATCH) {
>> -               defrag->last_offset = range.start;
>> -               btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
>> -       } else if (defrag->last_offset && !defrag->cycled) {
>> -               /*
>> -                * we didn't fill our defrag batch, but
>> -                * we didn't start at zero.  Make sure we loop
>> -                * around to the start of the file.
>> -                */
>> -               defrag->last_offset = 0;
>> -               defrag->cycled = 1;
>> -               btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
>> -       } else {
>> -               kmem_cache_free(btrfs_inode_defrag_cachep, defrag);
>> -       }
>> +       kmem_cache_free(btrfs_inode_defrag_cachep, defrag);
>>
>>         iput(inode);
>>         return 0;
>>
>>>
>>>
>>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>>> index 11204dbbe053..c720260f9656 100644
>>> --- a/fs/btrfs/file.c
>>> +++ b/fs/btrfs/file.c
>>> @@ -312,7 +312,9 @@ static int __btrfs_run_defrag_inode(struct
>>> btrfs_fs_info *fs_info,
>>>           */
>>>          if (num_defrag == BTRFS_DEFRAG_BATCH) {
>>>                  defrag->last_offset = range.start;
>>> +               /*
>>>                  btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
>>> +               */
>>>          } else if (defrag->last_offset && !defrag->cycled) {
>>>                  /*
>>>                   * we didn't fill our defrag batch, but
>>> @@ -321,7 +323,9 @@ static int __btrfs_run_defrag_inode(struct
>>> btrfs_fs_info *fs_info,
>>>                   */
>>>                  defrag->last_offset = 0;
>>>                  defrag->cycled = 1;
>>> +               /*
>>>                  btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
>>> +               */
>>>          } else {
>>>                  kmem_cache_free(btrfs_inode_defrag_cachep, defrag);
>>>          }
>>>
>>>
>>>
>>>> Thanks
>>>>
>>>> Le 17/01/2022 à 18:56, Filipe Manana a écrit :
>>>>> On Mon, Jan 17, 2022 at 5:50 PM Anthony Ruhier <aruhier@mailbox.org>
>>>>> wrote:
>>>>>> Filipe,
>>>>>>
>>>>>> Just a quick update after my previous email, your patch fixed the
>>>>>> issue
>>>>>> for `btrfs fi defrag`.
>>>>>> Thanks a lot! I closed my bug on bugzilla.
>>>>>>
>>>>>> I'll keep you in touch about the autodefrag.
>>>>> Please do.
>>>>> The 1 byte file case was very specific, so it's probably a different
>>>>> issue.
>>>>>
>>>>> Thanks for testing!
>>>>>
>>>>>> Le 17/01/2022 à 18:04, Anthony Ruhier a écrit :
>>>>>>> I'm going to apply your patch for the 1B file, and quickly
>>>>>>> confirm if
>>>>>>> it works.
>>>>>>> Thanks a lot for your patch!
>>>>>>>
>>>>>>> About the autodefrag issue, it's going to be trickier to check that
>>>>>>> your patch fixes it, because for whatever reason the problem
>>>>>>> seems to
>>>>>>> have resolved itself (or at least, btrfs-cleaner does way less
>>>>>>> writes)
>>>>>>> after a partial btrfs balance.
>>>>>>> I'll try and look the amount of writes after several hours. Maybe
>>>>>>> for
>>>>>>> this one, see with the author of the other bug. If they can easily
>>>>>>> reproduce the issue then it's going to be easier to check.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Anthony
>>>>>>>
>>>>>>> Le 17/01/2022 à 17:52, Filipe Manana a écrit :
>>>>>>>> On Mon, Jan 17, 2022 at 03:24:00PM +0100, Anthony Ruhier wrote:
>>>>>>>>> Thanks for the answer.
>>>>>>>>>
>>>>>>>>> I had the exact same issue as in the thread you've linked, and
>>>>>>>>> have
>>>>>>>>> some
>>>>>>>>> monitoring and graphs that showed that btrfs-cleaner did constant
>>>>>>>>> writes
>>>>>>>>> during 12 hours just after I upgraded to linux 5.16. Weirdly
>>>>>>>>> enough,
>>>>>>>>> the
>>>>>>>>> issue almost disappeared after I did a btrfs balance by
>>>>>>>>> filtering on
>>>>>>>>> 10%
>>>>>>>>> usage of data.
>>>>>>>>> But that's why I initially disabled autodefrag, what has lead to
>>>>>>>>> discovering
>>>>>>>>> this bug as I switched to manual defragmentation (which, in the
>>>>>>>>> end,
>>>>>>>>> makes
>>>>>>>>> more sense anyway with my setup).
>>>>>>>>>
>>>>>>>>> I tried this patch, but sadly it doesn't help for the initial
>>>>>>>>> issue. I
>>>>>>>>> cannot say for the bug in the other thread, as the problem with
>>>>>>>>> btrfs-cleaner disappeared (I can still see some writes from it,
>>>>>>>>> but
>>>>>>>>> it so
>>>>>>>>> rare that I cannot say if it's normal or not).
>>>>>>>> Ok, reading better your first mail, I see there's the case of the 1
>>>>>>>> byte
>>>>>>>> file, which is possibly not the cause from the autodefrag
>>>>>>>> causing the
>>>>>>>> excessive IO problem.
>>>>>>>>
>>>>>>>> For the 1 byte file problem, I've just sent a fix:
>>>>>>>>
>>>>>>>> https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> It's actually trivial to trigger.
>>>>>>>>
>>>>>>>> Can you check if it also fixes your problem with autodefrag?
>>>>>>>>
>>>>>>>> If not, then try the following (after applying the 1 file fix):
>>>>>>>>
>>>>>>>> https://pastebin.com/raw/EbEfk1tF
>>>>>>>>
>>>>>>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>>>>>>> index a5bd6926f7ff..db795e226cca 100644
>>>>>>>> --- a/fs/btrfs/ioctl.c
>>>>>>>> +++ b/fs/btrfs/ioctl.c
>>>>>>>> @@ -1191,6 +1191,7 @@ static int defrag_collect_targets(struct
>>>>>>>> btrfs_inode *inode,
>>>>>>>>                      u64 newer_than, bool do_compress,
>>>>>>>>                      bool locked, struct list_head *target_list)
>>>>>>>>    {
>>>>>>>> +    const u32 min_thresh = extent_thresh / 2;
>>>>>>>>        u64 cur = start;
>>>>>>>>        int ret = 0;
>>>>>>>>    @@ -1198,6 +1199,7 @@ static int defrag_collect_targets(struct
>>>>>>>> btrfs_inode *inode,
>>>>>>>>            struct extent_map *em;
>>>>>>>>            struct defrag_target_range *new;
>>>>>>>>            bool next_mergeable = true;
>>>>>>>> +        u64 range_start;
>>>>>>>>            u64 range_len;
>>>>>>>>              em = defrag_lookup_extent(&inode->vfs_inode, cur,
>>>>>>>> locked);
>>>>>>>> @@ -1213,6 +1215,24 @@ static int defrag_collect_targets(struct
>>>>>>>> btrfs_inode *inode,
>>>>>>>>            if (em->generation < newer_than)
>>>>>>>>                goto next;
>>>>>>>>    +        /*
>>>>>>>> +         * Our start offset might be in the middle of an existing
>>>>>>>> extent
>>>>>>>> +         * map, so take that into account.
>>>>>>>> +         */
>>>>>>>> +        range_len = em->len - (cur - em->start);
>>>>>>>> +
>>>>>>>> +        /*
>>>>>>>> +         * If there's already a good range for delalloc within the
>>>>>>>> range
>>>>>>>> +         * covered by the extent map, skip it, otherwise we can
>>>>>>>> end up
>>>>>>>> +         * doing on the same IO range for a long time when using
>>>>>>>> auto
>>>>>>>> +         * defrag.
>>>>>>>> +         */
>>>>>>>> +        range_start = cur;
>>>>>>>> +        if (count_range_bits(&inode->io_tree, &range_start,
>>>>>>>> +                     range_start + range_len - 1, min_thresh,
>>>>>>>> +                     EXTENT_DELALLOC, 1) >= min_thresh)
>>>>>>>> +            goto next;
>>>>>>>> +
>>>>>>>>            /*
>>>>>>>>             * For do_compress case, we want to compress all valid
>>>>>>>> file
>>>>>>>>             * extents, thus no @extent_thresh or mergeable check.
>>>>>>>> @@ -1220,8 +1240,8 @@ static int defrag_collect_targets(struct
>>>>>>>> btrfs_inode *inode,
>>>>>>>>            if (do_compress)
>>>>>>>>                goto add;
>>>>>>>>    -        /* Skip too large extent */
>>>>>>>> -        if (em->len >= extent_thresh)
>>>>>>>> +        /* Skip large enough ranges. */
>>>>>>>> +        if (range_len >= extent_thresh)
>>>>>>>>                goto next;
>>>>>>>>              next_mergeable =
>>>>>>>> defrag_check_next_extent(&inode->vfs_inode, em,
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Anthony
>>>>>>>>>
>>>>>>>>> Le 17/01/2022 à 13:10, Filipe Manana a écrit :
>>>>>>>>>> On Sun, Jan 16, 2022 at 08:15:37PM +0100, Anthony Ruhier wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>> Since I upgraded from linux 5.15 to 5.16, `btrfs filesystem
>>>>>>>>>>> defrag
>>>>>>>>>>> -t128K`
>>>>>>>>>>> hangs on small files (~1 byte) and triggers what it seems to
>>>>>>>>>>> be a
>>>>>>>>>>> loop in
>>>>>>>>>>> the kernel. It results in one CPU thread running being used at
>>>>>>>>>>> 100%. I
>>>>>>>>>>> cannot kill the process, and rebooting is blocked by btrfs.
>>>>>>>>>>> It is a copy of the
>>>>>>>>>>> bughttps://bugzilla.kernel.org/show_bug.cgi?id=215498
>>>>>>>>>>>
>>>>>>>>>>> Rebooting to linux 5.15 shows no issue. I have no issue to run a
>>>>>>>>>>> defrag on
>>>>>>>>>>> bigger files (I filter out files smaller than 3.9KB).
>>>>>>>>>>>
>>>>>>>>>>> I had a conversation on #btrfs on IRC, so here's what we
>>>>>>>>>>> debugged:
>>>>>>>>>>>
>>>>>>>>>>> I can replicate the issue by copying a file impacted by this
>>>>>>>>>>> bug,
>>>>>>>>>>> by using
>>>>>>>>>>> `cp --reflink=never`. I attached one of the impacted files to
>>>>>>>>>>> this
>>>>>>>>>>> bug,
>>>>>>>>>>> named README.md.
>>>>>>>>>>>
>>>>>>>>>>> Someone told me that it could be a bug due to the inline extent.
>>>>>>>>>>> So we tried
>>>>>>>>>>> to check that.
>>>>>>>>>>>
>>>>>>>>>>> filefrag shows that the file Readme.md is 1 inline extent. I
>>>>>>>>>>> tried
>>>>>>>>>>> to create
>>>>>>>>>>> a new file with random text, of 18 bytes (slightly bigger
>>>>>>>>>>> than the
>>>>>>>>>>> other
>>>>>>>>>>> file), that is also 1 inline extent. This file doesn't
>>>>>>>>>>> trigger the
>>>>>>>>>>> bug and
>>>>>>>>>>> has no issue to be defragmented.
>>>>>>>>>>>
>>>>>>>>>>> I tried to mount my system with `max_inline=0`, created a
>>>>>>>>>>> copy of
>>>>>>>>>>> README.md.
>>>>>>>>>>> `filefrag` shows me that the new file is now 1 extent, not
>>>>>>>>>>> inline.
>>>>>>>>>>> This new
>>>>>>>>>>> file also triggers the bug, so it doesn't seem to be due to the
>>>>>>>>>>> inline
>>>>>>>>>>> extent.
>>>>>>>>>>>
>>>>>>>>>>> Someone asked me to provide the output of a perf top when the
>>>>>>>>>>> defrag is
>>>>>>>>>>> stuck:
>>>>>>>>>>>
>>>>>>>>>>>        28.70%  [kernel]          [k] generic_bin_search
>>>>>>>>>>>        14.90%  [kernel]          [k] free_extent_buffer
>>>>>>>>>>>        13.17%  [kernel]          [k] btrfs_search_slot
>>>>>>>>>>>        12.63%  [kernel]          [k] btrfs_root_node
>>>>>>>>>>>         8.33%  [kernel]          [k] btrfs_get_64
>>>>>>>>>>>         3.88%  [kernel]          [k] __down_read_common.llvm
>>>>>>>>>>>         3.00%  [kernel]          [k] up_read
>>>>>>>>>>>         2.63%  [kernel]          [k] read_block_for_search
>>>>>>>>>>>         2.40%  [kernel]          [k] read_extent_buffer
>>>>>>>>>>>         1.38%  [kernel]          [k] memset_erms
>>>>>>>>>>>         1.11%  [kernel]          [k] find_extent_buffer
>>>>>>>>>>>         0.69%  [kernel]          [k] kmem_cache_free
>>>>>>>>>>>         0.69%  [kernel]          [k] memcpy_erms
>>>>>>>>>>>         0.57%  [kernel]          [k] kmem_cache_alloc
>>>>>>>>>>>         0.45%  [kernel]          [k] radix_tree_lookup
>>>>>>>>>>>
>>>>>>>>>>> I can reproduce the bug on 2 different machines, running 2
>>>>>>>>>>> different linux
>>>>>>>>>>> distributions (Arch and Gentoo) with 2 different kernel configs.
>>>>>>>>>>> This kernel is compiled with clang, the other with GCC.
>>>>>>>>>>>
>>>>>>>>>>> Kernel version: 5.16.0
>>>>>>>>>>> Mount options:
>>>>>>>>>>>        Machine 1:
>>>>>>>>>>> rw,noatime,compress-force=zstd:2,ssd,discard=async,space_cache=v2,autodefrag
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>        Machine 2:
>>>>>>>>>>> rw,noatime,compress-force=zstd:3,nossd,space_cache=v2
>>>>>>>>>>>
>>>>>>>>>>> When the error happens, no message is shown in dmesg.
>>>>>>>>>> This is very likely the same issue as reported at this thread:
>>>>>>>>>>
>>>>>>>>>> https://lore.kernel.org/linux-btrfs/YeVawBBE3r6hVhgs@debian9.Home/T/#ma1c8a9848c9b7e4edb471f7be184599d38e288bb
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Are you able to test the patch posted there?
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Anthony Ruhier
>>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: btrfs fi defrag hangs on small files, 100% CPU thread
  2022-01-18  2:22                     ` Qu Wenruo
@ 2022-01-18 10:57                       ` Anthony Ruhier
  2022-01-18 11:58                         ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: Anthony Ruhier @ 2022-01-18 10:57 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Filipe Manana


[-- Attachment #1.1: Type: text/plain, Size: 18617 bytes --]

Indeed, the problem just took longer to appear with your previous patch.
This time it wasn't from btrfs-cleaner itself, but I still saw a lot of
writes.
I'm trying the last one and will keep you posted!

Thanks

Le 18/01/2022 à 03:22, Qu Wenruo a écrit :
>
>
> On 2022/1/18 09:58, Anthony Ruhier wrote:
>> Hi Qu,
>>
>> No problem, thanks a lot for your patch! After a quick look on a 10min
>> period, it looks like your patch fixed the issue. I've applied the 2
>> patches from Filipe and yours.
>> I'll let my NAS run over the night and watch the writes average on a
>> longer period. I'll send an update in ~10 hours.
>
> I'm really sorry for all the problems.
>
> Currently I believe the root cause is I changed the return value of
> btrfs_defrag_file(), which should return the number of sectors defragged.
>
> Even I renamed the variable to "sectors_defragged", I still return it in
> bytes instead.
>
> (Furthermore it tends to report more bytes than really defragged).
>
> The diff I mentioned is not really to solve the problem, but to help us
> to pin down the problem.
> (The diff itself will make autodefrag to do a little less work, thus it
> may not defrag as hard as previously)
>
> Thank you very much helping fixing the mess I caused.
>
>
> BTW, if you have spare machines, mind to test the following two patches?
> (if everything works as expected, they should be the final fixes, thus
> please test without other patches)
>
> https://patchwork.kernel.org/project/linux-btrfs/patch/20220118021544.18543-1-wqu@suse.com/ 
>
> https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/ 
>
>
> Thanks,
> Qu
>
>
>>
>> Thanks a lot for your help,
>> Anthony
>>
>> Le 18/01/2022 à 00:32, Qu Wenruo a écrit :
>>>
>>>
>>> On 2022/1/18 07:15, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2022/1/18 03:39, Anthony Ruhier wrote:
>>>>> I have some good news and bad news: the bad news is that it didn't 
>>>>> fix
>>>>> the autodefrag problem (I applied the 2 patches).
>>>>>
>>>>> The good news is that when I enable autodefrag, I can quickly see if
>>>>> the
>>>>> problem is still there or not.
>>>>> It's not super obvious compared to the amount of writes that happened
>>>>> constantly just after my upgrade to linux 5.16, because since then 
>>>>> the
>>>>> problem mitigated itself a bit, but it's still noticeable.
>>>>>
>>>>> If I compare the write average (in total, I don't have it per 
>>>>> process)
>>>>> when taking idle periods on the same machine:
>>>>>      Linux 5.16:
>>>>>          without autodefrag: ~ 10KiB/s
>>>>>          with autodefrag: between 1 and 2MiB/s.
>>>>>
>>>>>      Linux 5.15:
>>>>>          with autodefrag:~ 10KiB/s (around the same as without
>>>>> autodefrag on 5.16)
>>>>>
>>>>> Feel free to ask me anything to help your debugging, just try to be
>>>>> quite explicit about what I should do, I'm not experimented in
>>>>> filesystems debugging.
>>>>
>>>> Mind to test the following diff (along with previous two patches from
>>>> Filipe)?
>>>>
>>>> I screwed up, the refactor changed how the defraged bytes accounting,
>>>> which I didn't notice that autodefrag relies on that to requeue the
>>>> inode.
>>>>
>>>> The refactor is originally to add support for subpage defrag, but it
>>>> looks like I pushed the boundary too hard to change some behaviors.
>>>
>>> Sorry, previous diff has a memory leakage bug.
>>>
>>> The autodefrag code is more complex than I thought, it has extra logic
>>> to handle defrag.
>>>
>>> Please try this one instead.
>>>
>>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>>> index 11204dbbe053..096feecf2602 100644
>>> --- a/fs/btrfs/file.c
>>> +++ b/fs/btrfs/file.c
>>> @@ -305,26 +305,7 @@ static int __btrfs_run_defrag_inode(struct
>>> btrfs_fs_info *fs_info,
>>>         num_defrag = btrfs_defrag_file(inode, NULL, &range,
>>> defrag->transid,
>>>                                        BTRFS_DEFRAG_BATCH);
>>>         sb_end_write(fs_info->sb);
>>> -       /*
>>> -        * if we filled the whole defrag batch, there
>>> -        * must be more work to do.  Queue this defrag
>>> -        * again
>>> -        */
>>> -       if (num_defrag == BTRFS_DEFRAG_BATCH) {
>>> -               defrag->last_offset = range.start;
>>> -               btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
>>> -       } else if (defrag->last_offset && !defrag->cycled) {
>>> -               /*
>>> -                * we didn't fill our defrag batch, but
>>> -                * we didn't start at zero.  Make sure we loop
>>> -                * around to the start of the file.
>>> -                */
>>> -               defrag->last_offset = 0;
>>> -               defrag->cycled = 1;
>>> -               btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
>>> -       } else {
>>> -               kmem_cache_free(btrfs_inode_defrag_cachep, defrag);
>>> -       }
>>> +       kmem_cache_free(btrfs_inode_defrag_cachep, defrag);
>>>
>>>         iput(inode);
>>>         return 0;
>>>
>>>>
>>>>
>>>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>>>> index 11204dbbe053..c720260f9656 100644
>>>> --- a/fs/btrfs/file.c
>>>> +++ b/fs/btrfs/file.c
>>>> @@ -312,7 +312,9 @@ static int __btrfs_run_defrag_inode(struct
>>>> btrfs_fs_info *fs_info,
>>>>           */
>>>>          if (num_defrag == BTRFS_DEFRAG_BATCH) {
>>>>                  defrag->last_offset = range.start;
>>>> +               /*
>>>>                  btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
>>>> +               */
>>>>          } else if (defrag->last_offset && !defrag->cycled) {
>>>>                  /*
>>>>                   * we didn't fill our defrag batch, but
>>>> @@ -321,7 +323,9 @@ static int __btrfs_run_defrag_inode(struct
>>>> btrfs_fs_info *fs_info,
>>>>                   */
>>>>                  defrag->last_offset = 0;
>>>>                  defrag->cycled = 1;
>>>> +               /*
>>>>                  btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
>>>> +               */
>>>>          } else {
>>>>                  kmem_cache_free(btrfs_inode_defrag_cachep, defrag);
>>>>          }
>>>>
>>>>
>>>>
>>>>> Thanks
>>>>>
>>>>> Le 17/01/2022 à 18:56, Filipe Manana a écrit :
>>>>>> On Mon, Jan 17, 2022 at 5:50 PM Anthony Ruhier <aruhier@mailbox.org>
>>>>>> wrote:
>>>>>>> Filipe,
>>>>>>>
>>>>>>> Just a quick update after my previous email, your patch fixed the
>>>>>>> issue
>>>>>>> for `btrfs fi defrag`.
>>>>>>> Thanks a lot! I closed my bug on bugzilla.
>>>>>>>
>>>>>>> I'll keep you in touch about the autodefrag.
>>>>>> Please do.
>>>>>> The 1 byte file case was very specific, so it's probably a different
>>>>>> issue.
>>>>>>
>>>>>> Thanks for testing!
>>>>>>
>>>>>>> Le 17/01/2022 à 18:04, Anthony Ruhier a écrit :
>>>>>>>> I'm going to apply your patch for the 1B file, and quickly
>>>>>>>> confirm if
>>>>>>>> it works.
>>>>>>>> Thanks a lot for your patch!
>>>>>>>>
>>>>>>>> About the autodefrag issue, it's going to be trickier to check 
>>>>>>>> that
>>>>>>>> your patch fixes it, because for whatever reason the problem
>>>>>>>> seems to
>>>>>>>> have resolved itself (or at least, btrfs-cleaner does way less
>>>>>>>> writes)
>>>>>>>> after a partial btrfs balance.
>>>>>>>> I'll try and look the amount of writes after several hours. Maybe
>>>>>>>> for
>>>>>>>> this one, see with the author of the other bug. If they can easily
>>>>>>>> reproduce the issue then it's going to be easier to check.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Anthony
>>>>>>>>
>>>>>>>> Le 17/01/2022 à 17:52, Filipe Manana a écrit :
>>>>>>>>> On Mon, Jan 17, 2022 at 03:24:00PM +0100, Anthony Ruhier wrote:
>>>>>>>>>> Thanks for the answer.
>>>>>>>>>>
>>>>>>>>>> I had the exact same issue as in the thread you've linked, and
>>>>>>>>>> have
>>>>>>>>>> some
>>>>>>>>>> monitoring and graphs that showed that btrfs-cleaner did 
>>>>>>>>>> constant
>>>>>>>>>> writes
>>>>>>>>>> during 12 hours just after I upgraded to linux 5.16. Weirdly
>>>>>>>>>> enough,
>>>>>>>>>> the
>>>>>>>>>> issue almost disappeared after I did a btrfs balance by
>>>>>>>>>> filtering on
>>>>>>>>>> 10%
>>>>>>>>>> usage of data.
>>>>>>>>>> But that's why I initially disabled autodefrag, what has lead to
>>>>>>>>>> discovering
>>>>>>>>>> this bug as I switched to manual defragmentation (which, in the
>>>>>>>>>> end,
>>>>>>>>>> makes
>>>>>>>>>> more sense anyway with my setup).
>>>>>>>>>>
>>>>>>>>>> I tried this patch, but sadly it doesn't help for the initial
>>>>>>>>>> issue. I
>>>>>>>>>> cannot say for the bug in the other thread, as the problem with
>>>>>>>>>> btrfs-cleaner disappeared (I can still see some writes from it,
>>>>>>>>>> but
>>>>>>>>>> it so
>>>>>>>>>> rare that I cannot say if it's normal or not).
>>>>>>>>> Ok, reading better your first mail, I see there's the case of 
>>>>>>>>> the 1
>>>>>>>>> byte
>>>>>>>>> file, which is possibly not the cause from the autodefrag
>>>>>>>>> causing the
>>>>>>>>> excessive IO problem.
>>>>>>>>>
>>>>>>>>> For the 1 byte file problem, I've just sent a fix:
>>>>>>>>>
>>>>>>>>> https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It's actually trivial to trigger.
>>>>>>>>>
>>>>>>>>> Can you check if it also fixes your problem with autodefrag?
>>>>>>>>>
>>>>>>>>> If not, then try the following (after applying the 1 file fix):
>>>>>>>>>
>>>>>>>>> https://pastebin.com/raw/EbEfk1tF
>>>>>>>>>
>>>>>>>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>>>>>>>> index a5bd6926f7ff..db795e226cca 100644
>>>>>>>>> --- a/fs/btrfs/ioctl.c
>>>>>>>>> +++ b/fs/btrfs/ioctl.c
>>>>>>>>> @@ -1191,6 +1191,7 @@ static int defrag_collect_targets(struct
>>>>>>>>> btrfs_inode *inode,
>>>>>>>>>                      u64 newer_than, bool do_compress,
>>>>>>>>>                      bool locked, struct list_head *target_list)
>>>>>>>>>    {
>>>>>>>>> +    const u32 min_thresh = extent_thresh / 2;
>>>>>>>>>        u64 cur = start;
>>>>>>>>>        int ret = 0;
>>>>>>>>>    @@ -1198,6 +1199,7 @@ static int defrag_collect_targets(struct
>>>>>>>>> btrfs_inode *inode,
>>>>>>>>>            struct extent_map *em;
>>>>>>>>>            struct defrag_target_range *new;
>>>>>>>>>            bool next_mergeable = true;
>>>>>>>>> +        u64 range_start;
>>>>>>>>>            u64 range_len;
>>>>>>>>>              em = defrag_lookup_extent(&inode->vfs_inode, cur,
>>>>>>>>> locked);
>>>>>>>>> @@ -1213,6 +1215,24 @@ static int defrag_collect_targets(struct
>>>>>>>>> btrfs_inode *inode,
>>>>>>>>>            if (em->generation < newer_than)
>>>>>>>>>                goto next;
>>>>>>>>>    +        /*
>>>>>>>>> +         * Our start offset might be in the middle of an 
>>>>>>>>> existing
>>>>>>>>> extent
>>>>>>>>> +         * map, so take that into account.
>>>>>>>>> +         */
>>>>>>>>> +        range_len = em->len - (cur - em->start);
>>>>>>>>> +
>>>>>>>>> +        /*
>>>>>>>>> +         * If there's already a good range for delalloc 
>>>>>>>>> within the
>>>>>>>>> range
>>>>>>>>> +         * covered by the extent map, skip it, otherwise we can
>>>>>>>>> end up
>>>>>>>>> +         * doing on the same IO range for a long time when using
>>>>>>>>> auto
>>>>>>>>> +         * defrag.
>>>>>>>>> +         */
>>>>>>>>> +        range_start = cur;
>>>>>>>>> +        if (count_range_bits(&inode->io_tree, &range_start,
>>>>>>>>> +                     range_start + range_len - 1, min_thresh,
>>>>>>>>> +                     EXTENT_DELALLOC, 1) >= min_thresh)
>>>>>>>>> +            goto next;
>>>>>>>>> +
>>>>>>>>>            /*
>>>>>>>>>             * For do_compress case, we want to compress all valid
>>>>>>>>> file
>>>>>>>>>             * extents, thus no @extent_thresh or mergeable check.
>>>>>>>>> @@ -1220,8 +1240,8 @@ static int defrag_collect_targets(struct
>>>>>>>>> btrfs_inode *inode,
>>>>>>>>>            if (do_compress)
>>>>>>>>>                goto add;
>>>>>>>>>    -        /* Skip too large extent */
>>>>>>>>> -        if (em->len >= extent_thresh)
>>>>>>>>> +        /* Skip large enough ranges. */
>>>>>>>>> +        if (range_len >= extent_thresh)
>>>>>>>>>                goto next;
>>>>>>>>>              next_mergeable =
>>>>>>>>> defrag_check_next_extent(&inode->vfs_inode, em,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Anthony
>>>>>>>>>>
>>>>>>>>>> Le 17/01/2022 à 13:10, Filipe Manana a écrit :


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: btrfs fi defrag hangs on small files, 100% CPU thread
  2022-01-18 10:57                       ` Anthony Ruhier
@ 2022-01-18 11:58                         ` Qu Wenruo
  0 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2022-01-18 11:58 UTC (permalink / raw)
  To: Anthony Ruhier; +Cc: linux-btrfs, Filipe Manana



On 2022/1/18 18:57, Anthony Ruhier wrote:
> Indeed, the problem just took longer to appear with your previous
> patch. It wasn't from btrfs-cleaner itself this time, but I still had a
> lot of writes.
> I'm trying the latest one and will keep you posted!

And another autodefrag-related problem exposed...

Thankfully, the involved fixes are all small enough.

So would you please try the following patches? (The first one from
Filipe is untouched, the 2nd one is mostly the same but with minor
tweaks to the commit message, and the final one is new):

https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/

https://patchwork.kernel.org/project/linux-btrfs/patch/20220118071904.29991-1-wqu@suse.com/

https://lore.kernel.org/linux-btrfs/20220118115352.52126-1-wqu@suse.com/T/#u

Thanks,
Qu

>
> Thanks
>
> On 18/01/2022 at 03:22, Qu Wenruo wrote:
>>
>>
>> On 2022/1/18 09:58, Anthony Ruhier wrote:
>>> Hi Qu,
>>>
>>> No problem, thanks a lot for your patch! After a quick look over a
>>> 10-minute period, it looks like your patch fixed the issue. I've applied
>>> the 2 patches from Filipe plus yours.
>>> I'll let my NAS run overnight and watch the write average over a longer
>>> period. I'll send an update in ~10 hours.
>>
>> I'm really sorry for all the problems.
>>
>> Currently I believe the root cause is that I changed the return value of
>> btrfs_defrag_file(), which should return the number of sectors defragged.
>>
>> Even though I renamed the variable to "sectors_defragged", I still return
>> it in bytes instead.
>>
>> (Furthermore, it tends to report more bytes than were really defragged.)
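As an aside, the unit mix-up Qu describes can be sketched with a tiny user-space model (illustrative only: the sector size and batch constant are assumptions, not taken from the kernel source):

```python
SECTORSIZE = 4096          # assumed sector size
BTRFS_DEFRAG_BATCH = 1024  # assumed batch size, counted in sectors

def defrag_file(bytes_defragged, return_bytes):
    # Models the return value of btrfs_defrag_file(): it should report
    # *sectors*, but the bug made it report *bytes*.
    if return_bytes:
        return bytes_defragged            # buggy: wrong unit
    return bytes_defragged // SECTORSIZE  # intended: sectors

# One full batch of work is BTRFS_DEFRAG_BATCH sectors of data:
full_batch = BTRFS_DEFRAG_BATCH * SECTORSIZE

# With the intended unit, the caller's "batch filled, requeue" comparison
# against BTRFS_DEFRAG_BATCH behaves as designed...
assert defrag_file(full_batch, return_bytes=False) == BTRFS_DEFRAG_BATCH
# ...with bytes, the same amount of work no longer matches the batch size,
# so the caller misjudges how much was really defragged.
assert defrag_file(full_batch, return_bytes=True) != BTRFS_DEFRAG_BATCH
```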
>>
>> The diff I mentioned is not really meant to solve the problem, but to
>> help us pin it down.
>> (The diff itself will make autodefrag do a little less work, thus it may
>> not defrag as hard as before.)
>>
>> Thank you very much for helping fix the mess I caused.
>>
>>
>> BTW, if you have spare machines, would you mind testing the following two
>> patches? (If everything works as expected, they should be the final fixes,
>> so please test without other patches.)
>>
>> https://patchwork.kernel.org/project/linux-btrfs/patch/20220118021544.18543-1-wqu@suse.com/
>>
>> https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/
>>
>>
>> Thanks,
>> Qu
>>
>>
>>>
>>> Thanks a lot for your help,
>>> Anthony
>>>
>>> On 18/01/2022 at 00:32, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2022/1/18 07:15, Qu Wenruo wrote:
>>>>>
>>>>>
>>>>> On 2022/1/18 03:39, Anthony Ruhier wrote:
>>>>>> I have some good news and bad news: the bad news is that it didn't
>>>>>> fix the autodefrag problem (I applied the 2 patches).
>>>>>>
>>>>>> The good news is that when I enable autodefrag, I can quickly see if
>>>>>> the problem is still there or not.
>>>>>> It's not super obvious compared to the amount of writes that happened
>>>>>> constantly just after my upgrade to linux 5.16, because since then the
>>>>>> problem mitigated itself a bit, but it's still noticeable.
>>>>>>
>>>>>> If I compare the write average (in total, I don't have it per
>>>>>> process) when taking idle periods on the same machine:
>>>>>>      Linux 5.16:
>>>>>>          without autodefrag: ~ 10KiB/s
>>>>>>          with autodefrag: between 1 and 2MiB/s.
>>>>>>
>>>>>>      Linux 5.15:
>>>>>>          with autodefrag: ~10KiB/s (around the same as without
>>>>>> autodefrag on 5.16)
>>>>>>
>>>>>> Feel free to ask me anything to help your debugging; just try to be
>>>>>> quite explicit about what I should do, as I'm not experienced in
>>>>>> filesystem debugging.
>>>>>
>>>>> Would you mind testing the following diff (along with the previous two
>>>>> patches from Filipe)?
>>>>>
>>>>> I screwed up: the refactor changed how the defragged bytes are
>>>>> accounted, and I didn't notice that autodefrag relies on that count to
>>>>> requeue the inode.
>>>>>
>>>>> The refactor was originally meant to add support for subpage defrag,
>>>>> but it looks like I pushed the boundary too hard and changed some
>>>>> behaviors.
>>>>
>>>> Sorry, previous diff has a memory leakage bug.
>>>>
>>>> The autodefrag code is more complex than I thought; it has extra logic
>>>> around the defrag itself.
>>>>
>>>> Please try this one instead.
>>>>
>>>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>>>> index 11204dbbe053..096feecf2602 100644
>>>> --- a/fs/btrfs/file.c
>>>> +++ b/fs/btrfs/file.c
>>>> @@ -305,26 +305,7 @@ static int __btrfs_run_defrag_inode(struct
>>>> btrfs_fs_info *fs_info,
>>>>         num_defrag = btrfs_defrag_file(inode, NULL, &range,
>>>> defrag->transid,
>>>>                                        BTRFS_DEFRAG_BATCH);
>>>>         sb_end_write(fs_info->sb);
>>>> -       /*
>>>> -        * if we filled the whole defrag batch, there
>>>> -        * must be more work to do.  Queue this defrag
>>>> -        * again
>>>> -        */
>>>> -       if (num_defrag == BTRFS_DEFRAG_BATCH) {
>>>> -               defrag->last_offset = range.start;
>>>> -               btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
>>>> -       } else if (defrag->last_offset && !defrag->cycled) {
>>>> -               /*
>>>> -                * we didn't fill our defrag batch, but
>>>> -                * we didn't start at zero.  Make sure we loop
>>>> -                * around to the start of the file.
>>>> -                */
>>>> -               defrag->last_offset = 0;
>>>> -               defrag->cycled = 1;
>>>> -               btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
>>>> -       } else {
>>>> -               kmem_cache_free(btrfs_inode_defrag_cachep, defrag);
>>>> -       }
>>>> +       kmem_cache_free(btrfs_inode_defrag_cachep, defrag);
>>>>
>>>>         iput(inode);
>>>>         return 0;
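For readers following along, the requeue policy that this diff removes can be modeled in user space roughly like this (a sketch under assumed semantics, not the kernel implementation):

```python
BTRFS_DEFRAG_BATCH = 1024  # sectors defragged per autodefrag pass (assumed)

def run_defrag_pass(defrag, defrag_file):
    """Model the tail of __btrfs_run_defrag_inode().

    defrag: dict with 'last_offset' and 'cycled' fields.
    defrag_file: callable(start) -> (num_defrag, next_start).
    Returns True when the inode would be requeued for another pass.
    """
    num_defrag, next_start = defrag_file(defrag['last_offset'])
    if num_defrag == BTRFS_DEFRAG_BATCH:
        # Filled the whole batch: there is likely more work, requeue.
        defrag['last_offset'] = next_start
        return True
    if defrag['last_offset'] and not defrag['cycled']:
        # Batch not filled, but the scan did not start at zero: wrap
        # around to the start of the file exactly once.
        defrag['last_offset'] = 0
        defrag['cycled'] = True
        return True
    # Done: the kernel frees the defrag record at this point.
    return False
```

The point of the model: any mis-accounting that keeps satisfying one of the two requeue branches turns this into an endless loop of background writes, which is consistent with the constant autodefrag IO reported in this thread.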
>>>>
>>>>>
>>>>>
>>>>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>>>>> index 11204dbbe053..c720260f9656 100644
>>>>> --- a/fs/btrfs/file.c
>>>>> +++ b/fs/btrfs/file.c
>>>>> @@ -312,7 +312,9 @@ static int __btrfs_run_defrag_inode(struct
>>>>> btrfs_fs_info *fs_info,
>>>>>           */
>>>>>          if (num_defrag == BTRFS_DEFRAG_BATCH) {
>>>>>                  defrag->last_offset = range.start;
>>>>> +               /*
>>>>>                  btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
>>>>> +               */
>>>>>          } else if (defrag->last_offset && !defrag->cycled) {
>>>>>                  /*
>>>>>                   * we didn't fill our defrag batch, but
>>>>> @@ -321,7 +323,9 @@ static int __btrfs_run_defrag_inode(struct
>>>>> btrfs_fs_info *fs_info,
>>>>>                   */
>>>>>                  defrag->last_offset = 0;
>>>>>                  defrag->cycled = 1;
>>>>> +               /*
>>>>>                  btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
>>>>> +               */
>>>>>          } else {
>>>>>                  kmem_cache_free(btrfs_inode_defrag_cachep, defrag);
>>>>>          }
>>>>>
>>>>>
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On 17/01/2022 at 18:56, Filipe Manana wrote:
>>>>>>> On Mon, Jan 17, 2022 at 5:50 PM Anthony Ruhier <aruhier@mailbox.org>
>>>>>>> wrote:
>>>>>>>> Filipe,
>>>>>>>>
>>>>>>>> Just a quick update after my previous email: your patch fixed the
>>>>>>>> issue for `btrfs fi defrag`.
>>>>>>>> Thanks a lot! I closed my bug on bugzilla.
>>>>>>>>
>>>>>>>> I'll keep you in touch about the autodefrag.
>>>>>>> Please do.
>>>>>>> The 1 byte file case was very specific, so it's probably a different
>>>>>>> issue.
>>>>>>>
>>>>>>> Thanks for testing!
>>>>>>>
>>>>>>>> On 17/01/2022 at 18:04, Anthony Ruhier wrote:
>>>>>>>>> I'm going to apply your patch for the 1B file, and quickly
>>>>>>>>> confirm if
>>>>>>>>> it works.
>>>>>>>>> Thanks a lot for your patch!
>>>>>>>>>
>>>>>>>>> About the autodefrag issue, it's going to be trickier to check that
>>>>>>>>> your patch fixes it, because for whatever reason the problem seems to
>>>>>>>>> have resolved itself (or at least, btrfs-cleaner does way fewer
>>>>>>>>> writes) after a partial btrfs balance.
>>>>>>>>> I'll try to look at the amount of writes after several hours. Maybe
>>>>>>>>> for this one, check with the author of the other bug. If they can
>>>>>>>>> easily reproduce the issue then it's going to be easier to verify.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Anthony
>>>>>>>>>
>>>>>>>>> On 17/01/2022 at 17:52, Filipe Manana wrote:
>>>>>>>>>> On Mon, Jan 17, 2022 at 03:24:00PM +0100, Anthony Ruhier wrote:
>>>>>>>>>>> Thanks for the answer.
>>>>>>>>>>>
>>>>>>>>>>> I had the exact same issue as in the thread you've linked, and have
>>>>>>>>>>> some monitoring and graphs that showed that btrfs-cleaner did
>>>>>>>>>>> constant writes for 12 hours just after I upgraded to linux 5.16.
>>>>>>>>>>> Weirdly enough, the issue almost disappeared after I did a btrfs
>>>>>>>>>>> balance filtering on 10% data usage.
>>>>>>>>>>> But that's why I initially disabled autodefrag, which led to
>>>>>>>>>>> discovering this bug when I switched to manual defragmentation
>>>>>>>>>>> (which, in the end, makes more sense anyway with my setup).
>>>>>>>>>>>
>>>>>>>>>>> I tried this patch, but sadly it doesn't help for the initial
>>>>>>>>>>> issue. I cannot say for the bug in the other thread, as the problem
>>>>>>>>>>> with btrfs-cleaner disappeared (I can still see some writes from
>>>>>>>>>>> it, but it's so rare that I cannot say if it's normal or not).
>>>>>>>>>> Ok, reading your first mail more carefully, I see there's the case
>>>>>>>>>> of the 1 byte file, which is possibly not related to the autodefrag
>>>>>>>>>> issue causing the excessive IO problem.
>>>>>>>>>>
>>>>>>>>>> For the 1 byte file problem, I've just sent a fix:
>>>>>>>>>>
>>>>>>>>>> https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/
>>>>>>>>>>
>>>>>>>>>> It's actually trivial to trigger.
>>>>>>>>>>
>>>>>>>>>> Can you check if it also fixes your problem with autodefrag?
>>>>>>>>>>
>>>>>>>>>> If not, then try the following (after applying the 1 byte file fix):
>>>>>>>>>>
>>>>>>>>>> https://pastebin.com/raw/EbEfk1tF
>>>>>>>>>>
>>>>>>>>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>>>>>>>>> index a5bd6926f7ff..db795e226cca 100644
>>>>>>>>>> --- a/fs/btrfs/ioctl.c
>>>>>>>>>> +++ b/fs/btrfs/ioctl.c
>>>>>>>>>> @@ -1191,6 +1191,7 @@ static int defrag_collect_targets(struct
>>>>>>>>>> btrfs_inode *inode,
>>>>>>>>>>                      u64 newer_than, bool do_compress,
>>>>>>>>>>                      bool locked, struct list_head *target_list)
>>>>>>>>>>    {
>>>>>>>>>> +    const u32 min_thresh = extent_thresh / 2;
>>>>>>>>>>        u64 cur = start;
>>>>>>>>>>        int ret = 0;
>>>>>>>>>>    @@ -1198,6 +1199,7 @@ static int defrag_collect_targets(struct
>>>>>>>>>> btrfs_inode *inode,
>>>>>>>>>>            struct extent_map *em;
>>>>>>>>>>            struct defrag_target_range *new;
>>>>>>>>>>            bool next_mergeable = true;
>>>>>>>>>> +        u64 range_start;
>>>>>>>>>>            u64 range_len;
>>>>>>>>>>              em = defrag_lookup_extent(&inode->vfs_inode, cur,
>>>>>>>>>> locked);
>>>>>>>>>> @@ -1213,6 +1215,24 @@ static int defrag_collect_targets(struct
>>>>>>>>>> btrfs_inode *inode,
>>>>>>>>>>            if (em->generation < newer_than)
>>>>>>>>>>                goto next;
>>>>>>>>>>    +        /*
>>>>>>>>>> +         * Our start offset might be in the middle of an
>>>>>>>>>> existing
>>>>>>>>>> extent
>>>>>>>>>> +         * map, so take that into account.
>>>>>>>>>> +         */
>>>>>>>>>> +        range_len = em->len - (cur - em->start);
>>>>>>>>>> +
>>>>>>>>>> +        /*
>>>>>>>>>> +         * If there's already a good range for delalloc
>>>>>>>>>> within the
>>>>>>>>>> range
>>>>>>>>>> +         * covered by the extent map, skip it, otherwise we can
>>>>>>>>>> end up
>>>>>>>>>> +         * doing on the same IO range for a long time when using
>>>>>>>>>> auto
>>>>>>>>>> +         * defrag.
>>>>>>>>>> +         */
>>>>>>>>>> +        range_start = cur;
>>>>>>>>>> +        if (count_range_bits(&inode->io_tree, &range_start,
>>>>>>>>>> +                     range_start + range_len - 1, min_thresh,
>>>>>>>>>> +                     EXTENT_DELALLOC, 1) >= min_thresh)
>>>>>>>>>> +            goto next;
>>>>>>>>>> +
>>>>>>>>>>            /*
>>>>>>>>>>             * For do_compress case, we want to compress all valid
>>>>>>>>>> file
>>>>>>>>>>             * extents, thus no @extent_thresh or mergeable check.
>>>>>>>>>> @@ -1220,8 +1240,8 @@ static int defrag_collect_targets(struct
>>>>>>>>>> btrfs_inode *inode,
>>>>>>>>>>            if (do_compress)
>>>>>>>>>>                goto add;
>>>>>>>>>>    -        /* Skip too large extent */
>>>>>>>>>> -        if (em->len >= extent_thresh)
>>>>>>>>>> +        /* Skip large enough ranges. */
>>>>>>>>>> +        if (range_len >= extent_thresh)
>>>>>>>>>>                goto next;
>>>>>>>>>>              next_mergeable =
>>>>>>>>>> defrag_check_next_extent(&inode->vfs_inode, em,
>>>>>>>>>>
>>>>>>>>>>
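A rough user-space model of the selection logic in the patch above (hypothetical helper names; the kernel tracks delalloc via count_range_bits() on the inode's io_tree, modeled here as a plain set of byte offsets):

```python
def collect_targets(extents, delalloc_bytes, start, length, extent_thresh):
    """Pick ranges worth defragmenting.

    extents: list of (start, len) extent maps, sorted by start.
    delalloc_bytes: set of byte offsets currently marked delalloc (toy model).
    Returns the list of (start, len) ranges selected for defrag.
    """
    min_thresh = extent_thresh // 2
    targets = []
    cur = start
    end = start + length
    while cur < end:
        em = next((e for e in extents if e[0] <= cur < e[0] + e[1]), None)
        if em is None:
            break
        # Our start offset might be in the middle of the extent map.
        range_len = em[1] - (cur - em[0])
        # New check: if enough of the range is already delalloc, skip it,
        # so autodefrag stops redoing IO on the same range.
        delalloc = sum(1 for b in range(cur, cur + range_len)
                       if b in delalloc_bytes)
        if delalloc < min_thresh and range_len < extent_thresh:
            targets.append((cur, range_len))
        cur += range_len
    return targets
```

For example, a 64-byte extent under a 128-byte threshold is selected, but the same range is skipped once it is fully covered by delalloc, and a 256-byte range is skipped for already being large enough.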
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Anthony
>>>>>>>>>>>
>>>>>>>>>>> On 17/01/2022 at 13:10, Filipe Manana wrote:
>>>>>>>>>>>> On Sun, Jan 16, 2022 at 08:15:37PM +0100, Anthony Ruhier wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> Since I upgraded from linux 5.15 to 5.16, `btrfs filesystem
>>>>>>>>>>>>> defrag
>>>>>>>>>>>>> -t128K`
>>>>>>>>>>>>> hangs on small files (~1 byte) and triggers what it seems to
>>>>>>>>>>>>> be a
>>>>>>>>>>>>> loop in
>>>>>>>>>>>>> the kernel. It results in one CPU thread running being used at
>>>>>>>>>>>>> 100%. I
>>>>>>>>>>>>> cannot kill the process, and rebooting is blocked by btrfs.
>>>>>>>>>>>>> It is a copy of the bug:
>>>>>>>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=215498
>>>>>>>>>>>>>
>>>>>>>>>>>>> Rebooting to linux 5.15 shows no issue. I have no issue to
>>>>>>>>>>>>> run a
>>>>>>>>>>>>> defrag on
>>>>>>>>>>>>> bigger files (I filter out files smaller than 3.9KB).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I had a conversation on #btrfs on IRC, so here's what we
>>>>>>>>>>>>> debugged:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can replicate the issue by copying a file impacted by this
>>>>>>>>>>>>> bug,
>>>>>>>>>>>>> by using
>>>>>>>>>>>>> `cp --reflink=never`. I attached one of the impacted files to
>>>>>>>>>>>>> this
>>>>>>>>>>>>> bug,
>>>>>>>>>>>>> named README.md.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Someone told me that it could be a bug due to the inline
>>>>>>>>>>>>> extent.
>>>>>>>>>>>>> So we tried
>>>>>>>>>>>>> to check that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> filefrag shows that the file Readme.md is 1 inline extent. I
>>>>>>>>>>>>> tried
>>>>>>>>>>>>> to create
>>>>>>>>>>>>> a new file with random text, of 18 bytes (slightly bigger
>>>>>>>>>>>>> than the
>>>>>>>>>>>>> other
>>>>>>>>>>>>> file), that is also 1 inline extent. This file doesn't
>>>>>>>>>>>>> trigger the
>>>>>>>>>>>>> bug and
>>>>>>>>>>>>> has no issue to be defragmented.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I tried to mount my system with `max_inline=0`, created a
>>>>>>>>>>>>> copy of
>>>>>>>>>>>>> README.md.
>>>>>>>>>>>>> `filefrag` shows me that the new file is now 1 extent, not
>>>>>>>>>>>>> inline.
>>>>>>>>>>>>> This new
>>>>>>>>>>>>> file also triggers the bug, so it doesn't seem to be due to
>>>>>>>>>>>>> the
>>>>>>>>>>>>> inline
>>>>>>>>>>>>> extent.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Someone asked me to provide the output of a perf top when the
>>>>>>>>>>>>> defrag is
>>>>>>>>>>>>> stuck:
>>>>>>>>>>>>>
>>>>>>>>>>>>>        28.70%  [kernel]          [k] generic_bin_search
>>>>>>>>>>>>>        14.90%  [kernel]          [k] free_extent_buffer
>>>>>>>>>>>>>        13.17%  [kernel]          [k] btrfs_search_slot
>>>>>>>>>>>>>        12.63%  [kernel]          [k] btrfs_root_node
>>>>>>>>>>>>>         8.33%  [kernel]          [k] btrfs_get_64
>>>>>>>>>>>>>         3.88%  [kernel]          [k] __down_read_common.llvm
>>>>>>>>>>>>>         3.00%  [kernel]          [k] up_read
>>>>>>>>>>>>>         2.63%  [kernel]          [k] read_block_for_search
>>>>>>>>>>>>>         2.40%  [kernel]          [k] read_extent_buffer
>>>>>>>>>>>>>         1.38%  [kernel]          [k] memset_erms
>>>>>>>>>>>>>         1.11%  [kernel]          [k] find_extent_buffer
>>>>>>>>>>>>>         0.69%  [kernel]          [k] kmem_cache_free
>>>>>>>>>>>>>         0.69%  [kernel]          [k] memcpy_erms
>>>>>>>>>>>>>         0.57%  [kernel]          [k] kmem_cache_alloc
>>>>>>>>>>>>>         0.45%  [kernel]          [k] radix_tree_lookup
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can reproduce the bug on 2 different machines, running 2
>>>>>>>>>>>>> different linux
>>>>>>>>>>>>> distributions (Arch and Gentoo) with 2 different kernel
>>>>>>>>>>>>> configs.
>>>>>>>>>>>>> This kernel is compiled with clang, the other with GCC.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kernel version: 5.16.0
>>>>>>>>>>>>> Mount options:
>>>>>>>>>>>>>        Machine 1:
>>>>>>>>>>>>> rw,noatime,compress-force=zstd:2,ssd,discard=async,space_cache=v2,autodefrag
>>>>>>>>>>>>>
>>>>>>>>>>>>>        Machine 2:
>>>>>>>>>>>>> rw,noatime,compress-force=zstd:3,nossd,space_cache=v2
>>>>>>>>>>>>>
>>>>>>>>>>>>> When the error happens, no message is shown in dmesg.
>>>>>>>>>>>> This is very likely the same issue as reported at this thread:
>>>>>>>>>>>>
>>>>>>>>>>>> https://lore.kernel.org/linux-btrfs/YeVawBBE3r6hVhgs@debian9.Home/T/#ma1c8a9848c9b7e4edb471f7be184599d38e288bb
>>>>>>>>>>>>
>>>>>>>>>>>> Are you able to test the patch posted there?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Anthony Ruhier
>>>>>>>>>>>>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-01-18 11:58 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-16 19:15 btrfs fi defrag hangs on small files, 100% CPU thread Anthony Ruhier
2022-01-17 12:10 ` Filipe Manana
2022-01-17 14:24   ` Anthony Ruhier
2022-01-17 16:52     ` Filipe Manana
2022-01-17 17:04       ` Anthony Ruhier
2022-01-17 17:50         ` Anthony Ruhier
2022-01-17 17:56           ` Filipe Manana
2022-01-17 19:39             ` Anthony Ruhier
2022-01-17 23:15               ` Qu Wenruo
2022-01-17 23:32                 ` Qu Wenruo
2022-01-18  1:58                   ` Anthony Ruhier
2022-01-18  2:22                     ` Qu Wenruo
2022-01-18 10:57                       ` Anthony Ruhier
2022-01-18 11:58                         ` Qu Wenruo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.