linux-btrfs.vger.kernel.org archive mirror
* Massive I/O usage from btrfs-cleaner after upgrading to 5.16
@ 2022-01-17 10:06 François-Xavier Thomas
  2022-01-17 12:02 ` Filipe Manana
  0 siblings, 1 reply; 20+ messages in thread
From: François-Xavier Thomas @ 2022-01-17 10:06 UTC (permalink / raw)
  To: linux-btrfs

Hello all,

Just in case someone is having the same issue: Btrfs (in the
btrfs-cleaner process) is taking a large amount of disk IO after
upgrading to 5.16 on one of my volumes, and multiple other people seem
to be having the same issue, see discussion in [0].

[1] is a close-up screenshot of disk I/O history (blue line is write
ops, going from a baseline of some 10 ops/s to around 1k ops/s). I
downgraded from 5.16 to 5.15 in the middle, which immediately restored
previous performance.
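
In case it helps to reproduce those numbers without a full monitoring
stack, here is a minimal C sketch (the device name "sda" and the
one-second interval are assumptions) that diffs the "writes completed"
and "sectors written" columns of /proc/diskstats:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * /proc/diskstats columns after the device name: reads completed, reads
 * merged, sectors read, ms reading, writes completed, writes merged,
 * sectors written, ...  We only keep the two write columns.
 */
static int read_write_stats(const char *dev, unsigned long long *ops,
                            unsigned long long *sectors)
{
        char line[512], name[64];
        FILE *fp = fopen("/proc/diskstats", "r");
        int ret = -1;

        if (!fp)
                return -1;
        while (fgets(line, sizeof(line), fp)) {
                if (sscanf(line, "%*u %*u %63s %*u %*u %*u %*u %llu %*u %llu",
                           name, ops, sectors) == 3 &&
                    strcmp(name, dev) == 0) {
                        ret = 0;
                        break;
                }
        }
        fclose(fp);
        return ret;
}

int main(int argc, char **argv)
{
        const char *dev = argc > 1 ? argv[1] : "sda";
        unsigned long long ops0, sec0, ops1, sec1;

        while (read_write_stats(dev, &ops0, &sec0) == 0) {
                sleep(1);
                if (read_write_stats(dev, &ops1, &sec1) != 0)
                        break;
                /* Sectors are 512 bytes, so divide by 2 for KiB. */
                printf("%s: %llu write ops/s, %llu KiB/s written\n",
                       dev, ops1 - ops0, (sec1 - sec0) / 2);
        }
        return 0;
}

Build with something like "gcc -O2 -o diskwatch diskwatch.c" and run it
next to the workload to see the same write-ops jump as on the graph.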

Common options between affected people are: ssd, autodefrag. No error
in the logs, and no other issue aside from performance (the volume
works just fine for accessing data).

One person reports that SMART stats show a massive amount of blocks
being written; unfortunately I do not have historical data for that so
I cannot confirm, but this sounds likely given what I see on what
should be a relatively new SSD.
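
To build that missing history, a minimal C sketch (an assumption-laden
example, not something I actually ran): pass the PID of btrfs-cleaner
(from ps or pgrep, for example) and it samples the write_bytes counter in
/proc/<pid>/io once per minute. write_bytes is the kernel's per-task count
of bytes the task caused to be written, so it should roughly track what
the cleaner pushes to the SSD; reading it for a kernel thread normally
needs root.

#include <stdio.h>
#include <unistd.h>

/* Return the write_bytes value from /proc/<pid>/io, or -1 on error. */
static long long read_write_bytes(const char *pid)
{
        char path[64], line[128];
        long long val = -1;
        FILE *fp;

        snprintf(path, sizeof(path), "/proc/%s/io", pid);
        fp = fopen(path, "r");
        if (!fp)
                return -1;
        while (fgets(line, sizeof(line), fp)) {
                if (sscanf(line, "write_bytes: %lld", &val) == 1)
                        break;
        }
        fclose(fp);
        return val;
}

int main(int argc, char **argv)
{
        const char *pid = argc > 1 ? argv[1] : "1";  /* pass btrfs-cleaner's PID */
        long long prev = read_write_bytes(pid), cur;

        while (prev >= 0) {
                sleep(60);
                cur = read_write_bytes(pid);
                if (cur < 0)
                        break;
                printf("pid %s: %.1f MiB written in the last minute\n",
                       pid, (double)(cur - prev) / (1024 * 1024));
                prev = cur;
        }
        return 0;
}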

Any idea of what it could be related to?

François-Xavier

[0] https://www.reddit.com/r/btrfs/comments/s4nrzb/massive_performance_degradation_after_upgrading/
[1] https://imgur.com/oYhYat1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
  2022-01-17 10:06 Massive I/O usage from btrfs-cleaner after upgrading to 5.16 François-Xavier Thomas
@ 2022-01-17 12:02 ` Filipe Manana
  2022-01-17 16:59   ` Filipe Manana
  0 siblings, 1 reply; 20+ messages in thread
From: Filipe Manana @ 2022-01-17 12:02 UTC (permalink / raw)
  To: François-Xavier Thomas; +Cc: linux-btrfs

On Mon, Jan 17, 2022 at 11:06:42AM +0100, François-Xavier Thomas wrote:
> Hello all,
> 
> Just in case someone is having the same issue: Btrfs (in the
> btrfs-cleaner process) is taking a large amount of disk IO after
> upgrading to 5.16 on one of my volumes, and multiple other people seem
> to be having the same issue, see discussion in [0].
> 
> [1] is a close-up screenshot of disk I/O history (blue line is write
> ops, going from a baseline of some 10 ops/s to around 1k ops/s). I
> downgraded from 5.16 to 5.15 in the middle, which immediately restored
> previous performance.
> 
> Common options between affected people are: ssd, autodefrag. No error
> in the logs, and no other issue aside from performance (the volume
> works just fine for accessing data).
> 
> One person reports that SMART stats show a massive amount of blocks
> being written; unfortunately I do not have historical data for that so
> I cannot confirm, but this sounds likely given what I see on what
> should be a relatively new SSD.
> 
> Any idea of what it could be related to?

There was a big refactor of the defrag code that landed in 5.16.

On a quick glance, when using autodefrag it seems we now can end up in an
infinite loop by marking the same range for defrag (IO) over and over.

Can you try the following patch? (also at https://pastebin.com/raw/QR27Jv6n)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a5bd6926f7ff..0a9f6125a566 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1213,6 +1213,13 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
                if (em->generation < newer_than)
                        goto next;
 
+               /*
+                * Skip extents already under IO, otherwise we can end up in an
+                * infinite loop when using auto defrag.
+                */
+               if (em->generation == (u64)-1)
+                       goto next;
+
                /*
                 * For do_compress case, we want to compress all valid file
                 * extents, thus no @extent_thresh or mergeable check.


> 
> François-Xavier
> 
> [0] https://www.reddit.com/r/btrfs/comments/s4nrzb/massive_performance_degradation_after_upgrading/
> [1] https://imgur.com/oYhYat1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
  2022-01-17 12:02 ` Filipe Manana
@ 2022-01-17 16:59   ` Filipe Manana
  2022-01-17 21:37     ` François-Xavier Thomas
  0 siblings, 1 reply; 20+ messages in thread
From: Filipe Manana @ 2022-01-17 16:59 UTC (permalink / raw)
  To: François-Xavier Thomas; +Cc: linux-btrfs

On Mon, Jan 17, 2022 at 12:02:08PM +0000, Filipe Manana wrote:
> On Mon, Jan 17, 2022 at 11:06:42AM +0100, François-Xavier Thomas wrote:
> > Hello all,
> > 
> > Just in case someone is having the same issue: Btrfs (in the
> > btrfs-cleaner process) is taking a large amount of disk IO after
> > upgrading to 5.16 on one of my volumes, and multiple other people seem
> > to be having the same issue, see discussion in [0].
> > 
> > [1] is a close-up screenshot of disk I/O history (blue line is write
> > ops, going from a baseline of some 10 ops/s to around 1k ops/s). I
> > downgraded from 5.16 to 5.15 in the middle, which immediately restored
> > previous performance.
> > 
> > Common options between affected people are: ssd, autodefrag. No error
> > in the logs, and no other issue aside from performance (the volume
> > works just fine for accessing data).
> > 
> > One person reports that SMART stats show a massive amount of blocks
> > being written; unfortunately I do not have historical data for that so
> > I cannot confirm, but this sounds likely given what I see on what
> > should be a relatively new SSD.
> > 
> > Any idea of what it could be related to?
> 
> There was a big refactor of the defrag code that landed in 5.16.
> 
> On a quick glance, when using autodefrag it seems we now can end up in an
> infinite loop by marking the same range for defrag (IO) over and over.
> 
> Can you try the following patch? (also at https://pastebin.com/raw/QR27Jv6n)

Actually try this one instead:

https://pastebin.com/raw/EbEfk1tF

Also, there's a bug with defrag running into an (almost) infinite loop when
attempting to defrag a 1 byte file. Someone ran into this and I've just sent
a fix for it:

https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/

Maybe that is what you are running into when using autodefrag.
First try that fix for the 1 byte file case, and if after that you still run
into problems, then try with the other patch above as well (both patches
applied).
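
In case it helps to check that fix in isolation, here is a minimal sketch
of one way to exercise the 1 byte case by hand (the mount point and file
name are assumptions, and this is not the reproducer from the patch):
create a 1 byte file on the affected btrfs volume and issue the defrag
ioctl on it. Without the fix this was reported to spin for a very long
time.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>        /* BTRFS_IOC_DEFRAG */

int main(void)
{
        /* Assumed path on an affected btrfs mount. */
        int fd = open("/mnt/btrfs/one-byte-file", O_RDWR | O_CREAT, 0644);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        if (pwrite(fd, "x", 1, 0) != 1) {
                perror("pwrite");
                return 1;
        }
        fsync(fd);

        /* Defragment the whole file with default options. */
        if (ioctl(fd, BTRFS_IOC_DEFRAG, NULL) < 0)
                perror("BTRFS_IOC_DEFRAG");

        close(fd);
        return 0;
}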

Thanks.



> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index a5bd6926f7ff..0a9f6125a566 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1213,6 +1213,13 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>                 if (em->generation < newer_than)
>                         goto next;
>  
> +               /*
> +                * Skip extents already under IO, otherwise we can end up in an
> +                * infinite loop when using auto defrag.
> +                */
> +               if (em->generation == (u64)-1)
> +                       goto next;
> +
>                 /*
>                  * For do_compress case, we want to compress all valid file
>                  * extents, thus no @extent_thresh or mergeable check.
> 
> 
> > 
> > François-Xavier
> > 
> > [0] https://www.reddit.com/r/btrfs/comments/s4nrzb/massive_performance_degradation_after_upgrading/
> > [1] https://imgur.com/oYhYat1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
  2022-01-17 16:59   ` Filipe Manana
@ 2022-01-17 21:37     ` François-Xavier Thomas
  2022-01-19  9:44       ` François-Xavier Thomas
  0 siblings, 1 reply; 20+ messages in thread
From: François-Xavier Thomas @ 2022-01-17 21:37 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs

Hi Filipe,

Thank you so much for the hints!

I compiled 5.16 with the 1-byte file patch and have been running it
for a couple of hours now. I/O seems to have been gradually increasing
compared to 5.15, but I will wait for tomorrow to have a clearer view
on the graphs, then I'll try both patches.

François-Xavier

On Mon, Jan 17, 2022 at 5:59 PM Filipe Manana <fdmanana@kernel.org> wrote:
>
> On Mon, Jan 17, 2022 at 12:02:08PM +0000, Filipe Manana wrote:
> > On Mon, Jan 17, 2022 at 11:06:42AM +0100, François-Xavier Thomas wrote:
> > > Hello all,
> > >
> > > Just in case someone is having the same issue: Btrfs (in the
> > > btrfs-cleaner process) is taking a large amount of disk IO after
> > > upgrading to 5.16 on one of my volumes, and multiple other people seem
> > > to be having the same issue, see discussion in [0].
> > >
> > > [1] is a close-up screenshot of disk I/O history (blue line is write
> > > ops, going from a baseline of some 10 ops/s to around 1k ops/s). I
> > > downgraded from 5.16 to 5.15 in the middle, which immediately restored
> > > previous performance.
> > >
> > > Common options between affected people are: ssd, autodefrag. No error
> > > in the logs, and no other issue aside from performance (the volume
> > > works just fine for accessing data).
> > >
> > > One person reports that SMART stats show a massive amount of blocks
> > > being written; unfortunately I do not have historical data for that so
> > > I cannot confirm, but this sounds likely given what I see on what
> > > should be a relatively new SSD.
> > >
> > > Any idea of what it could be related to?
> >
> > There was a big refactor of the defrag code that landed in 5.16.
> >
> > On a quick glance, when using autodefrag it seems we now can end up in an
> > infinite loop by marking the same range for defrag (IO) over and over.
> >
> > Can you try the following patch? (also at https://pastebin.com/raw/QR27Jv6n)
>
> Actually try this one instead:
>
> https://pastebin.com/raw/EbEfk1tF
>
> Also, there's a bug with defrag running into an (almost) infinite loop when
> attempting to defrag a 1 byte file. Someone ran into this and I've just sent
> a fix for it:
>
> https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/
>
> Maybe that is what you are running into when using autodefrag.
> First try that fix for the 1 byte file case, and if after that you still run
> into problems, then try with the other patch above as well (both patches
> applied).
>
> Thanks.
>
>
>
> >
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index a5bd6926f7ff..0a9f6125a566 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -1213,6 +1213,13 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
> >                 if (em->generation < newer_than)
> >                         goto next;
> >
> > +               /*
> > +                * Skip extents already under IO, otherwise we can end up in an
> > +                * infinite loop when using auto defrag.
> > +                */
> > +               if (em->generation == (u64)-1)
> > +                       goto next;
> > +
> >                 /*
> >                  * For do_compress case, we want to compress all valid file
> >                  * extents, thus no @extent_thresh or mergeable check.
> >
> >
> > >
> > > François-Xavier
> > >
> > > [0] https://www.reddit.com/r/btrfs/comments/s4nrzb/massive_performance_degradation_after_upgrading/
> > > [1] https://imgur.com/oYhYat1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
  2022-01-17 21:37     ` François-Xavier Thomas
@ 2022-01-19  9:44       ` François-Xavier Thomas
  2022-01-19 10:13         ` Filipe Manana
  0 siblings, 1 reply; 20+ messages in thread
From: François-Xavier Thomas @ 2022-01-19  9:44 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs

Hi,

More details on graph[0]:
- First patch (1-byte file) on 5.16.0 did not have a significant impact.
- Both patches on 5.16.0 did reduce a large part of the I/O but still
have a high baseline I/O compared to 5.15

Some people reported that 5.16.1 improved the situation for them, so
I'm testing that. It's too early to tell but for now the baseline I/O
still seems to be high compared to 5.15. Will update with more results
tomorrow.

François-Xavier

[0] https://i.imgur.com/agzAKGc.png

On Mon, Jan 17, 2022 at 10:37 PM François-Xavier Thomas
<fx.thomas@gmail.com> wrote:
>
> Hi Filipe,
>
> Thank you so much for the hints!
>
> I compiled 5.16 with the 1-byte file patch and have been running it
> for a couple of hours now. I/O seems to have been gradually increasing
> compared to 5.15, but I will wait for tomorrow to have a clearer view
> on the graphs, then I'll try both patches.
>
> François-Xavier
>
> On Mon, Jan 17, 2022 at 5:59 PM Filipe Manana <fdmanana@kernel.org> wrote:
> >
> > On Mon, Jan 17, 2022 at 12:02:08PM +0000, Filipe Manana wrote:
> > > On Mon, Jan 17, 2022 at 11:06:42AM +0100, François-Xavier Thomas wrote:
> > > > Hello all,
> > > >
> > > > Just in case someone is having the same issue: Btrfs (in the
> > > > btrfs-cleaner process) is taking a large amount of disk IO after
> > > > upgrading to 5.16 on one of my volumes, and multiple other people seem
> > > > to be having the same issue, see discussion in [0].
> > > >
> > > > [1] is a close-up screenshot of disk I/O history (blue line is write
> > > > ops, going from a baseline of some 10 ops/s to around 1k ops/s). I
> > > > downgraded from 5.16 to 5.15 in the middle, which immediately restored
> > > > previous performance.
> > > >
> > > > Common options between affected people are: ssd, autodefrag. No error
> > > > in the logs, and no other issue aside from performance (the volume
> > > > works just fine for accessing data).
> > > >
> > > > One person reports that SMART stats show a massive amount of blocks
> > > > being written; unfortunately I do not have historical data for that so
> > > > I cannot confirm, but this sounds likely given what I see on what
> > > > should be a relatively new SSD.
> > > >
> > > > Any idea of what it could be related to?
> > >
> > > There was a big refactor of the defrag code that landed in 5.16.
> > >
> > > On a quick glance, when using autodefrag it seems we now can end up in an
> > > infinite loop by marking the same range for defrag (IO) over and over.
> > >
> > > Can you try the following patch? (also at https://pastebin.com/raw/QR27Jv6n)
> >
> > Actually try this one instead:
> >
> > https://pastebin.com/raw/EbEfk1tF
> >
> > Also, there's a bug with defrag running into an (almost) infinite loop when
> > attempting to defrag a 1 byte file. Someone ran into this and I've just sent
> > a fix for it:
> >
> > https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/
> >
> > Maybe that is what you are running into when using autodefrag.
> > First try that fix for the 1 byte file case, and if after that you still run
> > into problems, then try with the other patch above as well (both patches
> > applied).
> >
> > Thanks.
> >
> >
> >
> > >
> > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > > index a5bd6926f7ff..0a9f6125a566 100644
> > > --- a/fs/btrfs/ioctl.c
> > > +++ b/fs/btrfs/ioctl.c
> > > @@ -1213,6 +1213,13 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
> > >                 if (em->generation < newer_than)
> > >                         goto next;
> > >
> > > +               /*
> > > +                * Skip extents already under IO, otherwise we can end up in an
> > > +                * infinite loop when using auto defrag.
> > > +                */
> > > +               if (em->generation == (u64)-1)
> > > +                       goto next;
> > > +
> > >                 /*
> > >                  * For do_compress case, we want to compress all valid file
> > >                  * extents, thus no @extent_thresh or mergeable check.
> > >
> > >
> > > >
> > > > François-Xavier
> > > >
> > > > [0] https://www.reddit.com/r/btrfs/comments/s4nrzb/massive_performance_degradation_after_upgrading/
> > > > [1] https://imgur.com/oYhYat1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
  2022-01-19  9:44       ` François-Xavier Thomas
@ 2022-01-19 10:13         ` Filipe Manana
  2022-01-20 11:37           ` François-Xavier Thomas
  0 siblings, 1 reply; 20+ messages in thread
From: Filipe Manana @ 2022-01-19 10:13 UTC (permalink / raw)
  To: François-Xavier Thomas; +Cc: linux-btrfs

On Wed, Jan 19, 2022 at 9:44 AM François-Xavier Thomas
<fx.thomas@gmail.com> wrote:
>
> Hi,
>
> More details on graph[0]:
> - First patch (1-byte file) on 5.16.0 did not have a significant impact.
> - Both patches on 5.16.0 did reduce a large part of the I/O but still
> have a high baseline I/O compared to 5.15

So, try with these two more patches on top of that:

https://patchwork.kernel.org/project/linux-btrfs/patch/20220118071904.29991-1-wqu@suse.com/

https://patchwork.kernel.org/project/linux-btrfs/patch/20220118115352.52126-1-wqu@suse.com/

>
> Some people reported that 5.16.1 improved the situation for them, so

I don't see how that's possible, nothing was added to 5.16.1 that
involves defrag.
Might just be a coincidence.

Thanks.

> I'm testing that. It's too early to tell but for now the baseline I/O
> still seems to be high compared to 5.15. Will update with more results
> tomorrow.
>
> François-Xavier
>
> [0] https://i.imgur.com/agzAKGc.png
>
> On Mon, Jan 17, 2022 at 10:37 PM François-Xavier Thomas
> <fx.thomas@gmail.com> wrote:
> >
> > Hi Filipe,
> >
> > Thank you so much for the hints!
> >
> > I compiled 5.16 with the 1-byte file patch and have been running it
> > for a couple of hours now. I/O seems to have been gradually increasing
> > compared to 5.15, but I will wait for tomorrow to have a clearer view
> > on the graphs, then I'll try both patches.
> >
> > François-Xavier
> >
> > On Mon, Jan 17, 2022 at 5:59 PM Filipe Manana <fdmanana@kernel.org> wrote:
> > >
> > > On Mon, Jan 17, 2022 at 12:02:08PM +0000, Filipe Manana wrote:
> > > > On Mon, Jan 17, 2022 at 11:06:42AM +0100, François-Xavier Thomas wrote:
> > > > > Hello all,
> > > > >
> > > > > Just in case someone is having the same issue: Btrfs (in the
> > > > > btrfs-cleaner process) is taking a large amount of disk IO after
> > > > > upgrading to 5.16 on one of my volumes, and multiple other people seem
> > > > > to be having the same issue, see discussion in [0].
> > > > >
> > > > > [1] is a close-up screenshot of disk I/O history (blue line is write
> > > > > ops, going from a baseline of some 10 ops/s to around 1k ops/s). I
> > > > > downgraded from 5.16 to 5.15 in the middle, which immediately restored
> > > > > previous performance.
> > > > >
> > > > > Common options between affected people are: ssd, autodefrag. No error
> > > > > in the logs, and no other issue aside from performance (the volume
> > > > > works just fine for accessing data).
> > > > >
> > > > > One person reports that SMART stats show a massive amount of blocks
> > > > > being written; unfortunately I do not have historical data for that so
> > > > > I cannot confirm, but this sounds likely given what I see on what
> > > > > should be a relatively new SSD.
> > > > >
> > > > > Any idea of what it could be related to?
> > > >
> > > > There was a big refactor of the defrag code that landed in 5.16.
> > > >
> > > > On a quick glance, when using autodefrag it seems we now can end up in an
> > > > infinite loop by marking the same range for defrag (IO) over and over.
> > > >
> > > > Can you try the following patch? (also at https://pastebin.com/raw/QR27Jv6n)
> > >
> > > Actually try this one instead:
> > >
> > > https://pastebin.com/raw/EbEfk1tF
> > >
> > > Also, there's a bug with defrag running into an (almost) infinite loop when
> > > attempting to defrag a 1 byte file. Someone ran into this and I've just sent
> > > a fix for it:
> > >
> > > https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/
> > >
> > > Maybe that is what you are running into when using autodefrag.
> > > First try that fix for the 1 byte file case, and if after that you still run
> > > into problems, then try with the other patch above as well (both patches
> > > applied).
> > >
> > > Thanks.
> > >
> > >
> > >
> > > >
> > > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > > > index a5bd6926f7ff..0a9f6125a566 100644
> > > > --- a/fs/btrfs/ioctl.c
> > > > +++ b/fs/btrfs/ioctl.c
> > > > @@ -1213,6 +1213,13 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
> > > >                 if (em->generation < newer_than)
> > > >                         goto next;
> > > >
> > > > +               /*
> > > > +                * Skip extents already under IO, otherwise we can end up in an
> > > > +                * infinite loop when using auto defrag.
> > > > +                */
> > > > +               if (em->generation == (u64)-1)
> > > > +                       goto next;
> > > > +
> > > >                 /*
> > > >                  * For do_compress case, we want to compress all valid file
> > > >                  * extents, thus no @extent_thresh or mergeable check.
> > > >
> > > >
> > > > >
> > > > > François-Xavier
> > > > >
> > > > > [0] https://www.reddit.com/r/btrfs/comments/s4nrzb/massive_performance_degradation_after_upgrading/
> > > > > [1] https://imgur.com/oYhYat1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
  2022-01-19 10:13         ` Filipe Manana
@ 2022-01-20 11:37           ` François-Xavier Thomas
  2022-01-20 11:44             ` Filipe Manana
  0 siblings, 1 reply; 20+ messages in thread
From: François-Xavier Thomas @ 2022-01-20 11:37 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs

Hi Filipe,

> So, try with these two more patches on top of that:

Thanks, I did just that, see graph with annotations:
https://i.imgur.com/pu66nz0.png

No visible improvement, average baseline I/O (for roughly similar
workloads, the server I'm testing it on is not very busy I/O-wise) is
still 3-4x higher in 5.16 than in 5.15 with autodefrag enabled.

The good news is that patch 2 did fix a large part of the issues 5.16.0 had.
I also checked that disabling autodefrag immediately brings I/O rate
back to how it was in 5.15.

>> Some people reported that 5.16.1 improved the situation for them, so
> I don't see how that's possible, nothing was added to 5.16.1 that
> involves defrag.
> Might just be a coincidence.

Yes, I found no evidence that official 5.16.1 is any better than the
rest on my side.

François-Xavier

On Wed, Jan 19, 2022 at 11:14 AM Filipe Manana <fdmanana@kernel.org> wrote:
>
> On Wed, Jan 19, 2022 at 9:44 AM François-Xavier Thomas
> <fx.thomas@gmail.com> wrote:
> >
> > Hi,
> >
> > More details on graph[0]:
> > - First patch (1-byte file) on 5.16.0 did not have a significant impact.
> > - Both patches on 5.16.0 did reduce a large part of the I/O but still
> > have a high baseline I/O compared to 5.15
>
> So, try with these two more patches on top of that:
>
> https://patchwork.kernel.org/project/linux-btrfs/patch/20220118071904.29991-1-wqu@suse.com/
>
> https://patchwork.kernel.org/project/linux-btrfs/patch/20220118115352.52126-1-wqu@suse.com/
>
> >
> > Some people reported that 5.16.1 improved the situation for them, so
>
> I don't see how that's possible, nothing was added to 5.16.1 that
> involves defrag.
> Might just be a coincidence.
>
> Thanks.
>
> > I'm testing that. It's too early to tell but for now the baseline I/O
> > still seems to be high compared to 5.15. Will update with more results
> > tomorrow.
> >
> > François-Xavier
> >
> > [0] https://i.imgur.com/agzAKGc.png
> >
> > On Mon, Jan 17, 2022 at 10:37 PM François-Xavier Thomas
> > <fx.thomas@gmail.com> wrote:
> > >
> > > Hi Filipe,
> > >
> > > Thank you so much for the hints!
> > >
> > > I compiled 5.16 with the 1-byte file patch and have been running it
> > > for a couple of hours now. I/O seems to have been gradually increasing
> > > compared to 5.15, but I will wait for tomorrow to have a clearer view
> > > on the graphs, then I'll try both patches.
> > >
> > > François-Xavier
> > >
> > > On Mon, Jan 17, 2022 at 5:59 PM Filipe Manana <fdmanana@kernel.org> wrote:
> > > >
> > > > On Mon, Jan 17, 2022 at 12:02:08PM +0000, Filipe Manana wrote:
> > > > > On Mon, Jan 17, 2022 at 11:06:42AM +0100, François-Xavier Thomas wrote:
> > > > > > Hello all,
> > > > > >
> > > > > > Just in case someone is having the same issue: Btrfs (in the
> > > > > > btrfs-cleaner process) is taking a large amount of disk IO after
> > > > > > upgrading to 5.16 on one of my volumes, and multiple other people seem
> > > > > > to be having the same issue, see discussion in [0].
> > > > > >
> > > > > > [1] is a close-up screenshot of disk I/O history (blue line is write
> > > > > > ops, going from a baseline of some 10 ops/s to around 1k ops/s). I
> > > > > > downgraded from 5.16 to 5.15 in the middle, which immediately restored
> > > > > > previous performance.
> > > > > >
> > > > > > Common options between affected people are: ssd, autodefrag. No error
> > > > > > in the logs, and no other issue aside from performance (the volume
> > > > > > works just fine for accessing data).
> > > > > >
> > > > > > One person reports that SMART stats show a massive amount of blocks
> > > > > > being written; unfortunately I do not have historical data for that so
> > > > > > I cannot confirm, but this sounds likely given what I see on what
> > > > > > should be a relatively new SSD.
> > > > > >
> > > > > > Any idea of what it could be related to?
> > > > >
> > > > > There was a big refactor of the defrag code that landed in 5.16.
> > > > >
> > > > > On a quick glance, when using autodefrag it seems we now can end up in an
> > > > > infinite loop by marking the same range for defrag (IO) over and over.
> > > > >
> > > > > Can you try the following patch? (also at https://pastebin.com/raw/QR27Jv6n)
> > > >
> > > > Actually try this one instead:
> > > >
> > > > https://pastebin.com/raw/EbEfk1tF
> > > >
> > > > Also, there's a bug with defrag running into an (almost) infinite loop when
> > > > attempting to defrag a 1 byte file. Someone ran into this and I've just sent
> > > > a fix for it:
> > > >
> > > > https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/
> > > >
> > > > Maybe that is what you are running into when using autodefrag.
> > > > First try that fix for the 1 byte file case, and if after that you still run
> > > > into problems, then try with the other patch above as well (both patches
> > > > applied).
> > > >
> > > > Thanks.
> > > >
> > > >
> > > >
> > > > >
> > > > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > > > > index a5bd6926f7ff..0a9f6125a566 100644
> > > > > --- a/fs/btrfs/ioctl.c
> > > > > +++ b/fs/btrfs/ioctl.c
> > > > > @@ -1213,6 +1213,13 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
> > > > >                 if (em->generation < newer_than)
> > > > >                         goto next;
> > > > >
> > > > > +               /*
> > > > > +                * Skip extents already under IO, otherwise we can end up in an
> > > > > +                * infinite loop when using auto defrag.
> > > > > +                */
> > > > > +               if (em->generation == (u64)-1)
> > > > > +                       goto next;
> > > > > +
> > > > >                 /*
> > > > >                  * For do_compress case, we want to compress all valid file
> > > > >                  * extents, thus no @extent_thresh or mergeable check.
> > > > >
> > > > >
> > > > > >
> > > > > > François-Xavier
> > > > > >
> > > > > > [0] https://www.reddit.com/r/btrfs/comments/s4nrzb/massive_performance_degradation_after_upgrading/
> > > > > > [1] https://imgur.com/oYhYat1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
  2022-01-20 11:37           ` François-Xavier Thomas
@ 2022-01-20 11:44             ` Filipe Manana
  2022-01-20 12:02               ` François-Xavier Thomas
  0 siblings, 1 reply; 20+ messages in thread
From: Filipe Manana @ 2022-01-20 11:44 UTC (permalink / raw)
  To: François-Xavier Thomas; +Cc: linux-btrfs, Qu Wenruo

On Thu, Jan 20, 2022 at 11:37 AM François-Xavier Thomas
<fx.thomas@gmail.com> wrote:
>
> Hi Filipe,
>
> > So, try with these two more patches on top of that:
>
> Thanks, I did just that, see graph with annotations:
> https://i.imgur.com/pu66nz0.png
>
> No visible improvement, average baseline I/O (for roughly similar
> workloads, the server I'm testing it on is not very busy I/O-wise) is
> still 3-4x higher in 5.16 than in 5.15 with autodefrag enabled.

What if on top of those patches, you also add this one:

https://pastebin.com/raw/EbEfk1tF

Can you see if it helps?

>
> The good news is that patch 2 did fix a large part of the issues 5.16.0 had.
> I also checked that disabling autodefrag immediately brings I/O rate
> back to how it was in 5.15.

At least that!
Thanks.

>
> >> Some people reported that 5.16.1 improved the situation for them, so
> > I don't see how that's possible, nothing was added to 5.16.1 that
> > involves defrag.
> > Might just be a coincidence.
>
> Yes, I found no evidence that official 5.16.1 is any better than the
> rest on my side.
>
> François-Xavier
>
> On Wed, Jan 19, 2022 at 11:14 AM Filipe Manana <fdmanana@kernel.org> wrote:
> >
> > On Wed, Jan 19, 2022 at 9:44 AM François-Xavier Thomas
> > <fx.thomas@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > More details on graph[0]:
> > > - First patch (1-byte file) on 5.16.0 did not have a significant impact.
> > > - Both patches on 5.16.0 did reduce a large part of the I/O but still
> > > have a high baseline I/O compared to 5.15
> >
> > So, try with these two more patches on top of that:
> >
> > https://patchwork.kernel.org/project/linux-btrfs/patch/20220118071904.29991-1-wqu@suse.com/
> >
> > https://patchwork.kernel.org/project/linux-btrfs/patch/20220118115352.52126-1-wqu@suse.com/
> >
> > >
> > > Some people reported that 5.16.1 improved the situation for them, so
> >
> > I don't see how that's possible, nothing was added to 5.16.1 that
> > involves defrag.
> > Might just be a coincidence.
> >
> > Thanks.
> >
> > > I'm testing that. It's too early to tell but for now the baseline I/O
> > > still seems to be high compared to 5.15. Will update with more results
> > > tomorrow.
> > >
> > > François-Xavier
> > >
> > > [0] https://i.imgur.com/agzAKGc.png
> > >
> > > On Mon, Jan 17, 2022 at 10:37 PM François-Xavier Thomas
> > > <fx.thomas@gmail.com> wrote:
> > > >
> > > > Hi Filipe,
> > > >
> > > > Thank you so much for the hints!
> > > >
> > > > I compiled 5.16 with the 1-byte file patch and have been running it
> > > > for a couple of hours now. I/O seems to have been gradually increasing
> > > > compared to 5.15, but I will wait for tomorrow to have a clearer view
> > > > on the graphs, then I'll try both patches.
> > > >
> > > > François-Xavier
> > > >
> > > > On Mon, Jan 17, 2022 at 5:59 PM Filipe Manana <fdmanana@kernel.org> wrote:
> > > > >
> > > > > On Mon, Jan 17, 2022 at 12:02:08PM +0000, Filipe Manana wrote:
> > > > > > On Mon, Jan 17, 2022 at 11:06:42AM +0100, François-Xavier Thomas wrote:
> > > > > > > Hello all,
> > > > > > >
> > > > > > > Just in case someone is having the same issue: Btrfs (in the
> > > > > > > btrfs-cleaner process) is taking a large amount of disk IO after
> > > > > > > upgrading to 5.16 on one of my volumes, and multiple other people seem
> > > > > > > to be having the same issue, see discussion in [0].
> > > > > > >
> > > > > > > [1] is a close-up screenshot of disk I/O history (blue line is write
> > > > > > > ops, going from a baseline of some 10 ops/s to around 1k ops/s). I
> > > > > > > downgraded from 5.16 to 5.15 in the middle, which immediately restored
> > > > > > > previous performance.
> > > > > > >
> > > > > > > Common options between affected people are: ssd, autodefrag. No error
> > > > > > > in the logs, and no other issue aside from performance (the volume
> > > > > > > works just fine for accessing data).
> > > > > > >
> > > > > > > One person reports that SMART stats show a massive amount of blocks
> > > > > > > being written; unfortunately I do not have historical data for that so
> > > > > > > I cannot confirm, but this sounds likely given what I see on what
> > > > > > > should be a relatively new SSD.
> > > > > > >
> > > > > > > Any idea of what it could be related to?
> > > > > >
> > > > > > There was a big refactor of the defrag code that landed in 5.16.
> > > > > >
> > > > > > On a quick glance, when using autodefrag it seems we now can end up in an
> > > > > > infinite loop by marking the same range for defrag (IO) over and over.
> > > > > >
> > > > > > Can you try the following patch? (also at https://pastebin.com/raw/QR27Jv6n)
> > > > >
> > > > > Actually try this one instead:
> > > > >
> > > > > https://pastebin.com/raw/EbEfk1tF
> > > > >
> > > > > Also, there's a bug with defrag running into an (almost) infinite loop when
> > > > > attempting to defrag a 1 byte file. Someone ran into this and I've just sent
> > > > > a fix for it:
> > > > >
> > > > > https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/
> > > > >
> > > > > Maybe that is what you are running into when using autodefrag.
> > > > > First try that fix for the 1 byte file case, and if after that you still run
> > > > > into problems, then try with the other patch above as well (both patches
> > > > > applied).
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > > > > > index a5bd6926f7ff..0a9f6125a566 100644
> > > > > > --- a/fs/btrfs/ioctl.c
> > > > > > +++ b/fs/btrfs/ioctl.c
> > > > > > @@ -1213,6 +1213,13 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
> > > > > >                 if (em->generation < newer_than)
> > > > > >                         goto next;
> > > > > >
> > > > > > +               /*
> > > > > > +                * Skip extents already under IO, otherwise we can end up in an
> > > > > > +                * infinite loop when using auto defrag.
> > > > > > +                */
> > > > > > +               if (em->generation == (u64)-1)
> > > > > > +                       goto next;
> > > > > > +
> > > > > >                 /*
> > > > > >                  * For do_compress case, we want to compress all valid file
> > > > > >                  * extents, thus no @extent_thresh or mergeable check.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > François-Xavier
> > > > > > >
> > > > > > > [0] https://www.reddit.com/r/btrfs/comments/s4nrzb/massive_performance_degradation_after_upgrading/
> > > > > > > [1] https://imgur.com/oYhYat1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
  2022-01-20 11:44             ` Filipe Manana
@ 2022-01-20 12:02               ` François-Xavier Thomas
  2022-01-20 12:45                 ` Qu Wenruo
  2022-01-20 17:46                 ` Filipe Manana
  0 siblings, 2 replies; 20+ messages in thread
From: François-Xavier Thomas @ 2022-01-20 12:02 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs, Qu Wenruo

> What if on top of those patches, you also add this one:
> https://pastebin.com/raw/EbEfk1tF

That's exactly patch 2 in my stack of patches in fact, is that the correct link?

On Thu, Jan 20, 2022 at 12:45 PM Filipe Manana <fdmanana@kernel.org> wrote:
>
> On Thu, Jan 20, 2022 at 11:37 AM François-Xavier Thomas
> <fx.thomas@gmail.com> wrote:
> >
> > Hi Filipe,
> >
> > > So, try with these two more patches on top of that:
> >
> > Thanks, I did just that, see graph with annotations:
> > https://i.imgur.com/pu66nz0.png
> >
> > No visible improvement, average baseline I/O (for roughly similar
> > workloads, the server I'm testing it on is not very busy I/O-wise) is
> > still 3-4x higher in 5.16 than in 5.15 with autodefrag enabled.
>
> What if on top of those patches, you also add this one:
>
> https://pastebin.com/raw/EbEfk1tF
>
> Can you see if it helps?
>
> >
> > The good news is that patch 2 did fix a large part of the issues 5.16.0 had.
> > I also checked that disabling autodefrag immediately brings I/O rate
> > back to how it was in 5.15.
>
> At least that!
> Thanks.
>
> >
> > >> Some people reported that 5.16.1 improved the situation for them, so
> > > I don't see how that's possible, nothing was added to 5.16.1 that
> > > involves defrag.
> > > Might just be a coincidence.
> >
> > Yes, I found no evidence that official 5.16.1 is any better than the
> > rest on my side.
> >
> > François-Xavier
> >
> > On Wed, Jan 19, 2022 at 11:14 AM Filipe Manana <fdmanana@kernel.org> wrote:
> > >
> > > On Wed, Jan 19, 2022 at 9:44 AM François-Xavier Thomas
> > > <fx.thomas@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > More details on graph[0]:
> > > > - First patch (1-byte file) on 5.16.0 did not have a significant impact.
> > > > - Both patches on 5.16.0 did reduce a large part of the I/O but still
> > > > have a high baseline I/O compared to 5.15
> > >
> > > So, try with these two more patches on top of that:
> > >
> > > https://patchwork.kernel.org/project/linux-btrfs/patch/20220118071904.29991-1-wqu@suse.com/
> > >
> > > https://patchwork.kernel.org/project/linux-btrfs/patch/20220118115352.52126-1-wqu@suse.com/
> > >
> > > >
> > > > Some people reported that 5.16.1 improved the situation for them, so
> > >
> > > I don't see how that's possible, nothing was added to 5.16.1 that
> > > involves defrag.
> > > Might just be a coincidence.
> > >
> > > Thanks.
> > >
> > > > I'm testing that. It's too early to tell but for now the baseline I/O
> > > > still seems to be high compared to 5.15. Will update with more results
> > > > tomorrow.
> > > >
> > > > François-Xavier
> > > >
> > > > [0] https://i.imgur.com/agzAKGc.png
> > > >
> > > > On Mon, Jan 17, 2022 at 10:37 PM François-Xavier Thomas
> > > > <fx.thomas@gmail.com> wrote:
> > > > >
> > > > > Hi Filipe,
> > > > >
> > > > > Thank you so much for the hints!
> > > > >
> > > > > I compiled 5.16 with the 1-byte file patch and have been running it
> > > > > for a couple of hours now. I/O seems to have been gradually increasing
> > > > > compared to 5.15, but I will wait for tomorrow to have a clearer view
> > > > > on the graphs, then I'll try both patches.
> > > > >
> > > > > François-Xavier
> > > > >
> > > > > On Mon, Jan 17, 2022 at 5:59 PM Filipe Manana <fdmanana@kernel.org> wrote:
> > > > > >
> > > > > > On Mon, Jan 17, 2022 at 12:02:08PM +0000, Filipe Manana wrote:
> > > > > > > On Mon, Jan 17, 2022 at 11:06:42AM +0100, François-Xavier Thomas wrote:
> > > > > > > > Hello all,
> > > > > > > >
> > > > > > > > Just in case someone is having the same issue: Btrfs (in the
> > > > > > > > btrfs-cleaner process) is taking a large amount of disk IO after
> > > > > > > > upgrading to 5.16 on one of my volumes, and multiple other people seem
> > > > > > > > to be having the same issue, see discussion in [0].
> > > > > > > >
> > > > > > > > [1] is a close-up screenshot of disk I/O history (blue line is write
> > > > > > > > ops, going from a baseline of some 10 ops/s to around 1k ops/s). I
> > > > > > > > downgraded from 5.16 to 5.15 in the middle, which immediately restored
> > > > > > > > previous performance.
> > > > > > > >
> > > > > > > > Common options between affected people are: ssd, autodefrag. No error
> > > > > > > > in the logs, and no other issue aside from performance (the volume
> > > > > > > > works just fine for accessing data).
> > > > > > > >
> > > > > > > > One person reports that SMART stats show a massive amount of blocks
> > > > > > > > being written; unfortunately I do not have historical data for that so
> > > > > > > > I cannot confirm, but this sounds likely given what I see on what
> > > > > > > > should be a relatively new SSD.
> > > > > > > >
> > > > > > > > Any idea of what it could be related to?
> > > > > > >
> > > > > > > There was a big refactor of the defrag code that landed in 5.16.
> > > > > > >
> > > > > > > On a quick glance, when using autodefrag it seems we now can end up in an
> > > > > > > infinite loop by marking the same range for defrag (IO) over and over.
> > > > > > >
> > > > > > > Can you try the following patch? (also at https://pastebin.com/raw/QR27Jv6n)
> > > > > >
> > > > > > Actually try this one instead:
> > > > > >
> > > > > > https://pastebin.com/raw/EbEfk1tF
> > > > > >
> > > > > > Also, there's a bug with defrag running into an (almost) infinite loop when
> > > > > > attempting to defrag a 1 byte file. Someone ran into this and I've just sent
> > > > > > a fix for it:
> > > > > >
> > > > > > https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/
> > > > > >
> > > > > > Maybe that is what you are running into when using autodefrag.
> > > > > > First try that fix for the 1 byte file case, and if after that you still run
> > > > > > into problems, then try with the other patch above as well (both patches
> > > > > > applied).
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > > > > > > index a5bd6926f7ff..0a9f6125a566 100644
> > > > > > > --- a/fs/btrfs/ioctl.c
> > > > > > > +++ b/fs/btrfs/ioctl.c
> > > > > > > @@ -1213,6 +1213,13 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
> > > > > > >                 if (em->generation < newer_than)
> > > > > > >                         goto next;
> > > > > > >
> > > > > > > +               /*
> > > > > > > +                * Skip extents already under IO, otherwise we can end up in an
> > > > > > > +                * infinite loop when using auto defrag.
> > > > > > > +                */
> > > > > > > +               if (em->generation == (u64)-1)
> > > > > > > +                       goto next;
> > > > > > > +
> > > > > > >                 /*
> > > > > > >                  * For do_compress case, we want to compress all valid file
> > > > > > >                  * extents, thus no @extent_thresh or mergeable check.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > François-Xavier
> > > > > > > >
> > > > > > > > [0] https://www.reddit.com/r/btrfs/comments/s4nrzb/massive_performance_degradation_after_upgrading/
> > > > > > > > [1] https://imgur.com/oYhYat1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
  2022-01-20 12:02               ` François-Xavier Thomas
@ 2022-01-20 12:45                 ` Qu Wenruo
  2022-01-20 12:55                   ` Filipe Manana
  2022-01-20 17:46                 ` Filipe Manana
  1 sibling, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2022-01-20 12:45 UTC (permalink / raw)
  To: François-Xavier Thomas, Filipe Manana; +Cc: linux-btrfs, Qu Wenruo



On 2022/1/20 20:02, François-Xavier Thomas wrote:
>> What if on top of those patches, you also add this one:
>> https://pastebin.com/raw/EbEfk1tF
>
> That's exactly patch 2 in my stack of patches in fact, is that the correct link?

Would you mind sharing the full stack of patches or diffs?

I'd say for the known autodefrag issue, the following seems to solve the
problem for at least one reporter:

- btrfs: fix too long loop when defragging a 1 byte file

https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/

- [v2] btrfs: defrag: fix the wrong number of defragged sectors

https://patchwork.kernel.org/project/linux-btrfs/patch/20220118071904.29991-1-wqu@suse.com/

- btrfs: defrag: properly update range->start for autodefrag

https://patchwork.kernel.org/project/linux-btrfs/patch/20220118115352.52126-1-wqu@suse.com/

Thanks,
Qu

>
> On Thu, Jan 20, 2022 at 12:45 PM Filipe Manana <fdmanana@kernel.org> wrote:
>>
>> On Thu, Jan 20, 2022 at 11:37 AM François-Xavier Thomas
>> <fx.thomas@gmail.com> wrote:
>>>
>>> Hi Filipe,
>>>
>>>> So, try with these two more patches on top of that:
>>>
>>> Thanks, I did just that, see graph with annotations:
>>> https://i.imgur.com/pu66nz0.png
>>>
>>> No visible improvement, average baseline I/O (for roughly similar
>>> workloads, the server I'm testing it on is not very busy I/O-wise) is
>>> still 3-4x higher in 5.16 than in 5.15 with autodefrag enabled.
>>
>> What if on top of those patches, you also add this one:
>>
>> https://pastebin.com/raw/EbEfk1tF
>>
>> Can you see if it helps?
>>
>>>
>>> The good news is that patch 2 did fix a large part of the issues 5.16.0 had.
>>> I also checked that disabling autodefrag immediately brings I/O rate
>>> back to how it was in 5.15.
>>
>> At least that!
>> Thanks.
>>
>>>
>>>>> Some people reported that 5.16.1 improved the situation for them, so
>>>> I don't see how that's possible, nothing was added to 5.16.1 that
>>>> involves defrag.
>>>> Might just be a coincidence.
>>>
>>> Yes, I found no evidence that official 5.16.1 is any better than the
>>> rest on my side.
>>>
>>> François-Xavier
>>>
>>> On Wed, Jan 19, 2022 at 11:14 AM Filipe Manana <fdmanana@kernel.org> wrote:
>>>>
>>>> On Wed, Jan 19, 2022 at 9:44 AM François-Xavier Thomas
>>>> <fx.thomas@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> More details on graph[0]:
>>>>> - First patch (1-byte file) on 5.16.0 did not have a significant impact.
>>>>> - Both patches on 5.16.0 did reduce a large part of the I/O but still
>>>>> have a high baseline I/O compared to 5.15
>>>>
>>>> So, try with these two more patches on top of that:
>>>>
>>>> https://patchwork.kernel.org/project/linux-btrfs/patch/20220118071904.29991-1-wqu@suse.com/
>>>>
>>>> https://patchwork.kernel.org/project/linux-btrfs/patch/20220118115352.52126-1-wqu@suse.com/
>>>>
>>>>>
>>>>> Some people reported that 5.16.1 improved the situation for them, so
>>>>
>>>> I don't see how that's possible, nothing was added to 5.16.1 that
>>>> involves defrag.
>>>> Might just be a coincidence.
>>>>
>>>> Thanks.
>>>>
>>>>> I'm testing that. It's too early to tell but for now the baseline I/O
>>>>> still seems to be high compared to 5.15. Will update with more results
>>>>> tomorrow.
>>>>>
>>>>> François-Xavier
>>>>>
>>>>> [0] https://i.imgur.com/agzAKGc.png
>>>>>
>>>>> On Mon, Jan 17, 2022 at 10:37 PM François-Xavier Thomas
>>>>> <fx.thomas@gmail.com> wrote:
>>>>>>
>>>>>> Hi Filipe,
>>>>>>
>>>>>> Thank you so much for the hints!
>>>>>>
>>>>>> I compiled 5.16 with the 1-byte file patch and have been running it
>>>>>> for a couple of hours now. I/O seems to have been gradually increasing
>>>>>> compared to 5.15, but I will wait for tomorrow to have a clearer view
>>>>>> on the graphs, then I'll try both patches.
>>>>>>
>>>>>> François-Xavier
>>>>>>
>>>>>> On Mon, Jan 17, 2022 at 5:59 PM Filipe Manana <fdmanana@kernel.org> wrote:
>>>>>>>
>>>>>>> On Mon, Jan 17, 2022 at 12:02:08PM +0000, Filipe Manana wrote:
>>>>>>>> On Mon, Jan 17, 2022 at 11:06:42AM +0100, François-Xavier Thomas wrote:
>>>>>>>>> Hello all,
>>>>>>>>>
>>>>>>>>> Just in case someone is having the same issue: Btrfs (in the
>>>>>>>>> btrfs-cleaner process) is taking a large amount of disk IO after
>>>>>>>>> upgrading to 5.16 on one of my volumes, and multiple other people seem
>>>>>>>>> to be having the same issue, see discussion in [0].
>>>>>>>>>
>>>>>>>>> [1] is a close-up screenshot of disk I/O history (blue line is write
>>>>>>>>> ops, going from a baseline of some 10 ops/s to around 1k ops/s). I
>>>>>>>>> downgraded from 5.16 to 5.15 in the middle, which immediately restored
>>>>>>>>> previous performance.
>>>>>>>>>
>>>>>>>>> Common options between affected people are: ssd, autodefrag. No error
>>>>>>>>> in the logs, and no other issue aside from performance (the volume
>>>>>>>>> works just fine for accessing data).
>>>>>>>>>
>>>>>>>>> One person reports that SMART stats show a massive amount of blocks
>>>>>>>>> being written; unfortunately I do not have historical data for that so
>>>>>>>>> I cannot confirm, but this sounds likely given what I see on what
>>>>>>>>> should be a relatively new SSD.
>>>>>>>>>
>>>>>>>>> Any idea of what it could be related to?
>>>>>>>>
>>>>>>>> There was a big refactor of the defrag code that landed in 5.16.
>>>>>>>>
>>>>>>>> On a quick glance, when using autodefrag it seems we now can end up in an
>>>>>>>> infinite loop by marking the same range for defrag (IO) over and over.
>>>>>>>>
>>>>>>>> Can you try the following patch? (also at https://pastebin.com/raw/QR27Jv6n)
>>>>>>>
>>>>>>> Actually try this one instead:
>>>>>>>
>>>>>>> https://pastebin.com/raw/EbEfk1tF
>>>>>>>
>>>>>>> Also, there's a bug with defrag running into an (almost) infinite loop when
>>>>>>> attempting to defrag a 1 byte file. Someone ran into this and I've just sent
>>>>>>> a fix for it:
>>>>>>>
>>>>>>> https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/
>>>>>>>
>>>>>>> Maybe that is what you are running into when using autodefrag.
>>>>>>> First try that fix for the 1 byte file case, and if after that you still run
>>>>>>> into problems, then try with the other patch above as well (both patches
>>>>>>> applied).
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>>>>>>> index a5bd6926f7ff..0a9f6125a566 100644
>>>>>>>> --- a/fs/btrfs/ioctl.c
>>>>>>>> +++ b/fs/btrfs/ioctl.c
>>>>>>>> @@ -1213,6 +1213,13 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>>>>>>>>                  if (em->generation < newer_than)
>>>>>>>>                          goto next;
>>>>>>>>
>>>>>>>> +               /*
>>>>>>>> +                * Skip extents already under IO, otherwise we can end up in an
>>>>>>>> +                * infinite loop when using auto defrag.
>>>>>>>> +                */
>>>>>>>> +               if (em->generation == (u64)-1)
>>>>>>>> +                       goto next;
>>>>>>>> +
>>>>>>>>                  /*
>>>>>>>>                   * For do_compress case, we want to compress all valid file
>>>>>>>>                   * extents, thus no @extent_thresh or mergeable check.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> François-Xavier
>>>>>>>>>
>>>>>>>>> [0] https://www.reddit.com/r/btrfs/comments/s4nrzb/massive_performance_degradation_after_upgrading/
>>>>>>>>> [1] https://imgur.com/oYhYat1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
  2022-01-20 12:45                 ` Qu Wenruo
@ 2022-01-20 12:55                   ` Filipe Manana
  0 siblings, 0 replies; 20+ messages in thread
From: Filipe Manana @ 2022-01-20 12:55 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: François-Xavier Thomas, linux-btrfs, Qu Wenruo

On Thu, Jan 20, 2022 at 12:45 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2022/1/20 20:02, François-Xavier Thomas wrote:
> >> What if on top of those patches, you also add this one:
> >> https://pastebin.com/raw/EbEfk1tF
> >
> > That's exactly patch 2 in my stack of patches in fact, is that the correct link?
>
> Would you mind sharing the full stack of patches or diffs?
>
> I'd say for the known autodefrag issue, the following seems to solve the
> problem for at least one reporter:
>
> - btrfs: fix too long loop when defragging a 1 byte file
>
> https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/
>
> - [v2] btrfs: defrag: fix the wrong number of defragged sectors
>
> https://patchwork.kernel.org/project/linux-btrfs/patch/20220118071904.29991-1-wqu@suse.com/
>
> - btrfs: defrag: properly update range->start for autodefrag
>
> https://patchwork.kernel.org/project/linux-btrfs/patch/20220118115352.52126-1-wqu@suse.com/

That's the stack of patches I pointed out earlier in the thread...

>
> Thanks,
> Qu
>
> >
> > On Thu, Jan 20, 2022 at 12:45 PM Filipe Manana <fdmanana@kernel.org> wrote:
> >>
> >> On Thu, Jan 20, 2022 at 11:37 AM François-Xavier Thomas
> >> <fx.thomas@gmail.com> wrote:
> >>>
> >>> Hi Filipe,
> >>>
> >>>> So, try with these two more patches on top of that:
> >>>
> >>> Thanks, I did just that, see graph with annotations:
> >>> https://i.imgur.com/pu66nz0.png
> >>>
> >>> No visible improvement, average baseline I/O (for roughly similar
> >>> workloads, the server I'm testing it on is not very busy I/O-wise) is
> >>> still 3-4x higher in 5.16 than in 5.15 with autodefrag enabled.
> >>
> >> What if on top of those patches, you also add this one:
> >>
> >> https://pastebin.com/raw/EbEfk1tF
> >>
> >> Can you see if it helps?
> >>
> >>>
> >>> The good news is that patch 2 did fix a large part of the issues 5.16.0 had.
> >>> I also checked that disabling autodefrag immediately brings I/O rate
> >>> back to how it was in 5.15.
> >>
> >> At least that!
> >> Thanks.
> >>
> >>>
> >>>>> Some people reported that 5.16.1 improved the situation for them, so
> >>>> I don't see how that's possible, nothing was added to 5.16.1 that
> >>>> involves defrag.
> >>>> Might just be a coincidence.
> >>>
> >>> Yes, I found no evidence that official 5.16.1 is any better than the
> >>> rest on my side.
> >>>
> >>> François-Xavier
> >>>
> >>> On Wed, Jan 19, 2022 at 11:14 AM Filipe Manana <fdmanana@kernel.org> wrote:
> >>>>
> >>>> On Wed, Jan 19, 2022 at 9:44 AM François-Xavier Thomas
> >>>> <fx.thomas@gmail.com> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> More details on graph[0]:
> >>>>> - First patch (1-byte file) on 5.16.0 did not have a significant impact.
> >>>>> - Both patches on 5.16.0 did reduce a large part of the I/O but still
> >>>>> have a high baseline I/O compared to 5.15
> >>>>
> >>>> So, try with these two more patches on top of that:
> >>>>
> >>>> https://patchwork.kernel.org/project/linux-btrfs/patch/20220118071904.29991-1-wqu@suse.com/
> >>>>
> >>>> https://patchwork.kernel.org/project/linux-btrfs/patch/20220118115352.52126-1-wqu@suse.com/
> >>>>
> >>>>>
> >>>>> Some people reported that 5.16.1 improved the situation for them, so
> >>>>
> >>>> I don't see how that's possible, nothing was added to 5.16.1 that
> >>>> involves defrag.
> >>>> Might just be a coincidence.
> >>>>
> >>>> Thanks.
> >>>>
> >>>>> I'm testing that. It's too early to tell but for now the baseline I/O
> >>>>> still seems to be high compared to 5.15. Will update with more results
> >>>>> tomorrow.
> >>>>>
> >>>>> François-Xavier
> >>>>>
> >>>>> [0] https://i.imgur.com/agzAKGc.png
> >>>>>
> >>>>> On Mon, Jan 17, 2022 at 10:37 PM François-Xavier Thomas
> >>>>> <fx.thomas@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi Filipe,
> >>>>>>
> >>>>>> Thank you so much for the hints!
> >>>>>>
> >>>>>> I compiled 5.16 with the 1-byte file patch and have been running it
> >>>>>> for a couple of hours now. I/O seems to have been gradually increasing
> >>>>>> compared to 5.15, but I will wait for tomorrow to have a clearer view
> >>>>>> on the graphs, then I'll try both patches.
> >>>>>>
> >>>>>> François-Xavier
> >>>>>>
> >>>>>> On Mon, Jan 17, 2022 at 5:59 PM Filipe Manana <fdmanana@kernel.org> wrote:
> >>>>>>>
> >>>>>>> On Mon, Jan 17, 2022 at 12:02:08PM +0000, Filipe Manana wrote:
> >>>>>>>> On Mon, Jan 17, 2022 at 11:06:42AM +0100, François-Xavier Thomas wrote:
> >>>>>>>>> Hello all,
> >>>>>>>>>
> >>>>>>>>> Just in case someone is having the same issue: Btrfs (in the
> >>>>>>>>> btrfs-cleaner process) is taking a large amount of disk IO after
> >>>>>>>>> upgrading to 5.16 on one of my volumes, and multiple other people seem
> >>>>>>>>> to be having the same issue, see discussion in [0].
> >>>>>>>>>
> >>>>>>>>> [1] is a close-up screenshot of disk I/O history (blue line is write
> >>>>>>>>> ops, going from a baseline of some 10 ops/s to around 1k ops/s). I
> >>>>>>>>> downgraded from 5.16 to 5.15 in the middle, which immediately restored
> >>>>>>>>> previous performance.
> >>>>>>>>>
> >>>>>>>>> Common options between affected people are: ssd, autodefrag. No error
> >>>>>>>>> in the logs, and no other issue aside from performance (the volume
> >>>>>>>>> works just fine for accessing data).
> >>>>>>>>>
> >>>>>>>>> One person reports that SMART stats show a massive amount of blocks
> >>>>>>>>> being written; unfortunately I do not have historical data for that so
> >>>>>>>>> I cannot confirm, but this sounds likely given what I see on what
> >>>>>>>>> should be a relatively new SSD.
> >>>>>>>>>
> >>>>>>>>> Any idea of what it could be related to?
> >>>>>>>>
> >>>>>>>> There was a big refactor of the defrag code that landed in 5.16.
> >>>>>>>>
> >>>>>>>> On a quick glance, when using autodefrag it seems we now can end up in an
> >>>>>>>> infinite loop by marking the same range for degrag (IO) over and over.
> >>>>>>>>
> >>>>>>>> Can you try the following patch? (also at https://pastebin.com/raw/QR27Jv6n)
> >>>>>>>
> >>>>>>> Actually try this one instead:
> >>>>>>>
> >>>>>>> https://pastebin.com/raw/EbEfk1tF
> >>>>>>>
> >>>>>>> Also, there's a bug with defrag running into an (almost) infinite loop when
> >>>>>>> attempting to defrag a 1 byte file. Someone ran into this and I've just sent
> >>>>>>> a fix for it:
> >>>>>>>
> >>>>>>> https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/
> >>>>>>>
> >>>>>>> Maybe that is what you are running into when using autodefrag.
> >>>>>>> Firt try that fix for the 1 byte file case, and if after that you still run
> >>>>>>> into problems, then try with the other patch above as well (both patches
> >>>>>>> applied).
> >>>>>>>
> >>>>>>> Thanks.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> >>>>>>>> index a5bd6926f7ff..0a9f6125a566 100644
> >>>>>>>> --- a/fs/btrfs/ioctl.c
> >>>>>>>> +++ b/fs/btrfs/ioctl.c
> >>>>>>>> @@ -1213,6 +1213,13 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
> >>>>>>>>                  if (em->generation < newer_than)
> >>>>>>>>                          goto next;
> >>>>>>>>
> >>>>>>>> +               /*
> >>>>>>>> +                * Skip extents already under IO, otherwise we can end up in an
> >>>>>>>> +                * infinite loop when using auto defrag.
> >>>>>>>> +                */
> >>>>>>>> +               if (em->generation == (u64)-1)
> >>>>>>>> +                       goto next;
> >>>>>>>> +
> >>>>>>>>                  /*
> >>>>>>>>                   * For do_compress case, we want to compress all valid file
> >>>>>>>>                   * extents, thus no @extent_thresh or mergeable check.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> François-Xavier
> >>>>>>>>>
> >>>>>>>>> [0] https://www.reddit.com/r/btrfs/comments/s4nrzb/massive_performance_degradation_after_upgrading/
> >>>>>>>>> [1] https://imgur.com/oYhYat1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
  2022-01-20 12:02               ` François-Xavier Thomas
  2022-01-20 12:45                 ` Qu Wenruo
@ 2022-01-20 17:46                 ` Filipe Manana
  2022-01-20 18:21                   ` François-Xavier Thomas
  1 sibling, 1 reply; 20+ messages in thread
From: Filipe Manana @ 2022-01-20 17:46 UTC (permalink / raw)
  To: François-Xavier Thomas; +Cc: linux-btrfs, Qu Wenruo

On Thu, Jan 20, 2022 at 12:02 PM François-Xavier Thomas
<fx.thomas@gmail.com> wrote:
>
> > What if on top of those patches, you also add this one:
> > https://pastebin.com/raw/EbEfk1tF
>
> That's exactly patch 2 in my stack of patches in fact, is that the correct link?

It was the correct link, but I forgot that I had already given it to
you (there's another thread from another
user that reported defrag/autodefrag issues in 5.16 as well).

Ok, so new patches to try; the new stack of patches should be:

1) https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/

2) https://patchwork.kernel.org/project/linux-btrfs/patch/20220118071904.29991-1-wqu@suse.com/

3) https://patchwork.kernel.org/project/linux-btrfs/patch/20220118115352.52126-1-wqu@suse.com/

4) https://patchwork.kernel.org/project/linux-btrfs/patch/5cb3ce140c84b0283be685bae8a5d75d5d19af08.1642688018.git.fdmanana@suse.com/

5) https://patchwork.kernel.org/project/linux-btrfs/patch/3fe2f747e0a9319064d59d051dc3f993fc41b172.1642698605.git.fdmanana@suse.com/

6) https://patchwork.kernel.org/project/linux-btrfs/patch/20aad8ccf0fbdecddd49216f25fa772754f77978.1642700395.git.fdmanana@suse.com/

Hope that helps.
Thanks.


>
> On Thu, Jan 20, 2022 at 12:45 PM Filipe Manana <fdmanana@kernel.org> wrote:
> >
> > On Thu, Jan 20, 2022 at 11:37 AM François-Xavier Thomas
> > <fx.thomas@gmail.com> wrote:
> > >
> > > Hi Felipe,
> > >
> > > > So, try with these two more patches on top of that:
> > >
> > > Thanks, I did just that, see graph with annotations:
> > > https://i.imgur.com/pu66nz0.png
> > >
> > > No visible improvement, average baseline I/O (for roughly similar
> > > workloads, the server I'm testing it on is not very busy I/O-wise) is
> > > still 3-4x higher in 5.16 than in 5.15 with autodefrag enabled.
> >
> > What if on top of those patches, you also add this one:
> >
> > https://pastebin.com/raw/EbEfk1tF
> >
> > Can you see if it helps?
> >
> > >
> > > The good news is that patch 2 did fix a large part of the issues 5.16.0 had.
> > > I also checked that disabling autodefrag immediately brings I/O rate
> > > back to how it was in 5.15.
> >
> > At least that!
> > Thanks.
> >
> > >
> > > >> Some people reported that 5.16.1 improved the situation for them, so
> > > > I don't see how that's possible, nothing was added to 5.16.1 that
> > > > involves defrag.
> > > > Might just be a coincidence.
> > >
> > > Yes, I found no evidence that official 5.16.1 is any better than the
> > > rest on my side.
> > >
> > > François-Xavier
> > >
> > > On Wed, Jan 19, 2022 at 11:14 AM Filipe Manana <fdmanana@kernel.org> wrote:
> > > >
> > > > On Wed, Jan 19, 2022 at 9:44 AM François-Xavier Thomas
> > > > <fx.thomas@gmail.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > More details on graph[0]:
> > > > > - First patch (1-byte file) on 5.16.0 did not have a significant impact.
> > > > > - Both patches on 5.16.0 did reduce a large part of the I/O but still
> > > > > have a high baseline I/O compared to 5.15
> > > >
> > > > So, try with these two more patches on top of that:
> > > >
> > > > https://patchwork.kernel.org/project/linux-btrfs/patch/20220118071904.29991-1-wqu@suse.com/
> > > >
> > > > https://patchwork.kernel.org/project/linux-btrfs/patch/20220118115352.52126-1-wqu@suse.com/
> > > >
> > > > >
> > > > > Some people reported that 5.16.1 improved the situation for them, so
> > > >
> > > > I don't see how that's possible, nothing was added to 5.16.1 that
> > > > involves defrag.
> > > > Might just be a coincidence.
> > > >
> > > > Thanks.
> > > >
> > > > > I'm testing that. It's too early to tell but for now the baseline I/O
> > > > > still seems to be high compared to 5.15. Will update with more results
> > > > > tomorrow.
> > > > >
> > > > > François-Xavier
> > > > >
> > > > > [0] https://i.imgur.com/agzAKGc.png
> > > > >
> > > > > On Mon, Jan 17, 2022 at 10:37 PM François-Xavier Thomas
> > > > > <fx.thomas@gmail.com> wrote:
> > > > > >
> > > > > > Hi Filipe,
> > > > > >
> > > > > > Thank you so much for the hints!
> > > > > >
> > > > > > I compiled 5.16 with the 1-byte file patch and have been running it
> > > > > > for a couple of hours now. I/O seems to have been gradually increasing
> > > > > > compared to 5.15, but I will wait for tomorrow to have a clearer view
> > > > > > on the graphs, then I'll try the both patches.
> > > > > >
> > > > > > François-Xavier
> > > > > >
> > > > > > On Mon, Jan 17, 2022 at 5:59 PM Filipe Manana <fdmanana@kernel.org> wrote:
> > > > > > >
> > > > > > > On Mon, Jan 17, 2022 at 12:02:08PM +0000, Filipe Manana wrote:
> > > > > > > > On Mon, Jan 17, 2022 at 11:06:42AM +0100, François-Xavier Thomas wrote:
> > > > > > > > > Hello all,
> > > > > > > > >
> > > > > > > > > Just in case someone is having the same issue: Btrfs (in the
> > > > > > > > > btrfs-cleaner process) is taking a large amount of disk IO after
> > > > > > > > > upgrading to 5.16 on one of my volumes, and multiple other people seem
> > > > > > > > > to be having the same issue, see discussion in [0].
> > > > > > > > >
> > > > > > > > > [1] is a close-up screenshot of disk I/O history (blue line is write
> > > > > > > > > ops, going from a baseline of some 10 ops/s to around 1k ops/s). I
> > > > > > > > > downgraded from 5.16 to 5.15 in the middle, which immediately restored
> > > > > > > > > previous performance.
> > > > > > > > >
> > > > > > > > > Common options between affected people are: ssd, autodefrag. No error
> > > > > > > > > in the logs, and no other issue aside from performance (the volume
> > > > > > > > > works just fine for accessing data).
> > > > > > > > >
> > > > > > > > > One person reports that SMART stats show a massive amount of blocks
> > > > > > > > > being written; unfortunately I do not have historical data for that so
> > > > > > > > > I cannot confirm, but this sounds likely given what I see on what
> > > > > > > > > should be a relatively new SSD.
> > > > > > > > >
> > > > > > > > > Any idea of what it could be related to?
> > > > > > > >
> > > > > > > > There was a big refactor of the defrag code that landed in 5.16.
> > > > > > > >
> > > > > > > > On a quick glance, when using autodefrag it seems we now can end up in an
> > > > > > > > infinite loop by marking the same range for degrag (IO) over and over.
> > > > > > > >
> > > > > > > > Can you try the following patch? (also at https://pastebin.com/raw/QR27Jv6n)
> > > > > > >
> > > > > > > Actually try this one instead:
> > > > > > >
> > > > > > > https://pastebin.com/raw/EbEfk1tF
> > > > > > >
> > > > > > > Also, there's a bug with defrag running into an (almost) infinite loop when
> > > > > > > attempting to defrag a 1 byte file. Someone ran into this and I've just sent
> > > > > > > a fix for it:
> > > > > > >
> > > > > > > https://patchwork.kernel.org/project/linux-btrfs/patch/bcbfce0ff7e21bbfed2484b1457e560edf78020d.1642436805.git.fdmanana@suse.com/
> > > > > > >
> > > > > > > Maybe that is what you are running into when using autodefrag.
> > > > > > > Firt try that fix for the 1 byte file case, and if after that you still run
> > > > > > > into problems, then try with the other patch above as well (both patches
> > > > > > > applied).
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > > > > > > > index a5bd6926f7ff..0a9f6125a566 100644
> > > > > > > > --- a/fs/btrfs/ioctl.c
> > > > > > > > +++ b/fs/btrfs/ioctl.c
> > > > > > > > @@ -1213,6 +1213,13 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
> > > > > > > >                 if (em->generation < newer_than)
> > > > > > > >                         goto next;
> > > > > > > >
> > > > > > > > +               /*
> > > > > > > > +                * Skip extents already under IO, otherwise we can end up in an
> > > > > > > > +                * infinite loop when using auto defrag.
> > > > > > > > +                */
> > > > > > > > +               if (em->generation == (u64)-1)
> > > > > > > > +                       goto next;
> > > > > > > > +
> > > > > > > >                 /*
> > > > > > > >                  * For do_compress case, we want to compress all valid file
> > > > > > > >                  * extents, thus no @extent_thresh or mergeable check.
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > François-Xavier
> > > > > > > > >
> > > > > > > > > [0] https://www.reddit.com/r/btrfs/comments/s4nrzb/massive_performance_degradation_after_upgrading/
> > > > > > > > > [1] https://imgur.com/oYhYat1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
  2022-01-20 17:46                 ` Filipe Manana
@ 2022-01-20 18:21                   ` François-Xavier Thomas
  2022-01-21 10:49                     ` Filipe Manana
  0 siblings, 1 reply; 20+ messages in thread
From: François-Xavier Thomas @ 2022-01-20 18:21 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs, Qu Wenruo

> Ok, so new patches to try

Nice, thanks, I'll let you know how that goes tomorrow!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
  2022-01-20 18:21                   ` François-Xavier Thomas
@ 2022-01-21 10:49                     ` Filipe Manana
  2022-01-21 19:39                       ` François-Xavier Thomas
  0 siblings, 1 reply; 20+ messages in thread
From: Filipe Manana @ 2022-01-21 10:49 UTC (permalink / raw)
  To: François-Xavier Thomas; +Cc: linux-btrfs, Qu Wenruo

On Thu, Jan 20, 2022 at 6:21 PM François-Xavier Thomas
<fx.thomas@gmail.com> wrote:
>
> > Ok, so new patches to try
>
> Nice, thanks, I'll let you know how that goes tomorrow!

You can also get one more on top of those 6:

https://pastebin.com/raw/p87HX6AF

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
  2022-01-21 10:49                     ` Filipe Manana
@ 2022-01-21 19:39                       ` François-Xavier Thomas
  2022-01-21 23:34                         ` Qu Wenruo
  0 siblings, 1 reply; 20+ messages in thread
From: François-Xavier Thomas @ 2022-01-21 19:39 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs, Qu Wenruo

Thanks, will add that to the list and test. FYI the 6 patches didn't
seem to have much additional effect today compared to my previous
stack of 4.

On Fri, Jan 21, 2022 at 11:49 AM Filipe Manana <fdmanana@kernel.org> wrote:
>
> On Thu, Jan 20, 2022 at 6:21 PM François-Xavier Thomas
> <fx.thomas@gmail.com> wrote:
> >
> > > Ok, so new patches to try
> >
> > Nice, thanks, I'll let you know how that goes tomorrow!
>
> You can also get one more on top of those 6:
>
> https://pastebin.com/raw/p87HX6AF
>
> Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
  2022-01-21 19:39                       ` François-Xavier Thomas
@ 2022-01-21 23:34                         ` Qu Wenruo
  2022-01-22 18:20                           ` François-Xavier Thomas
  0 siblings, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2022-01-21 23:34 UTC (permalink / raw)
  To: François-Xavier Thomas, Filipe Manana; +Cc: linux-btrfs, Qu Wenruo



On 2022/1/22 03:39, François-Xavier Thomas wrote:
> Thanks, will add that to the list and test. FYI the 6 patches didn't
> seem to have much additional effect today compared to my previous
> stack of 4.

Good and bad news.

Good news is, I found a way to reproduce (at least part of) the problem.

With fsstress, a way to trigger autodefrag at will, and IO accounting
for data/metadata reads/writes, it's clear that the newer kernel is
indeed causing more IO.

v5.15 (or 5.16 with the defrag code reverted) spends around 8.7% of its
total data IO on autodefrag.

While v5.16, even with the 6 patches, spends 18% of its total data IO
on autodefrag.
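
For reference, a minimal userspace sketch of this kind of measurement
(not the data/metadata accounting used for the numbers above; the
device name and workload below are just placeholders) is to diff
/proc/diskstats around a run:

/*
 * Sample /proc/diskstats before and after a workload and print the
 * delta in write requests and sectors written for one block device.
 * Userspace approximation only; it cannot split data vs. metadata IO.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct wr_stats { unsigned long long ios, sectors; };

static int read_wr_stats(const char *dev, struct wr_stats *st)
{
    FILE *f = fopen("/proc/diskstats", "r");
    char line[512];

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        unsigned int major, minor;
        char name[64];
        unsigned long long rd_ios, rd_merges, rd_sec, rd_ticks;
        unsigned long long wr_ios, wr_merges, wr_sec;

        if (sscanf(line, "%u %u %63s %llu %llu %llu %llu %llu %llu %llu",
                   &major, &minor, name, &rd_ios, &rd_merges,
                   &rd_sec, &rd_ticks, &wr_ios, &wr_merges,
                   &wr_sec) != 10)
            continue;
        if (strcmp(name, dev) == 0) {
            st->ios = wr_ios;
            st->sectors = wr_sec;
            fclose(f);
            return 0;
        }
    }
    fclose(f);
    return -1;
}

int main(int argc, char **argv)
{
    /* e.g. ./iostat-delta nvme0n1 "fsstress -d /mnt/scratch -n 10000" */
    const char *dev = argc > 1 ? argv[1] : "sda";
    const char *cmd = argc > 2 ? argv[2] : "sleep 60";
    struct wr_stats before, after;

    if (read_wr_stats(dev, &before) || system(cmd) == -1 ||
        read_wr_stats(dev, &after))
        return 1;
    printf("%s: +%llu write reqs, +%llu sectors (%llu KiB) written\n",
           dev, after.ios - before.ios, after.sectors - before.sectors,
           (after.sectors - before.sectors) / 2);
    return 0;
}

Running the same fsstress invocation under v5.15 and v5.16 with
autodefrag enabled then gives a rough per-kernel comparison of the
write amplification.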


Then bad news.

I have seen cases where v5.15 doesn't defrag ranges which are
completely sane to defrag.

Something like this:

         item 59 key (287 EXTENT_DATA 118784) itemoff 6211 itemsize 53
                 generation 85 type 1 (regular)
                 extent data disk byte 339296256 nr 8192
                 extent data offset 0 nr 8192 ram 8192
                 extent compression 0 (none)
         item 60 key (287 EXTENT_DATA 126976) itemoff 6158 itemsize 53
                 generation 85 type 1 (regular)
                 extent data disk byte 300445696 nr 4096
                 extent data offset 0 nr 4096 ram 4096
                 extent compression 0 (none)
         item 61 key (287 EXTENT_DATA 131072) itemoff 6105 itemsize 53
                 generation 85 type 1 (regular)
                 extent data disk byte 339304448 nr 4096
                 extent data offset 0 nr 4096 ram 4096
                 extent compression 0 (none)
         item 62 key (287 EXTENT_DATA 135168) itemoff 6052 itemsize 53
                 generation 85 type 1 (regular)
                 extent data disk byte 301170688 nr 4096
                 extent data offset 0 nr 4096 ram 4096
                 extent compression 0 (none)
         item 63 key (287 EXTENT_DATA 139264) itemoff 5999 itemsize 53
                 generation 85 type 1 (regular)
                 extent data disk byte 339308544 nr 106496
                 extent data offset 0 nr 106496 ram 106496
                 extent compression 0 (none)

This 124K range is definitely sane to defrag (the newer_than parameter
is only 35, so all extents are a good fit).

But the older kernel, for some reason (still under investigation),
doesn't choose to defrag it at all, while the newer kernel is pretty
happy to defrag it.
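
For illustration, here is a simplified userspace model of the basic
selection filters (this is not the kernel's defrag_collect_targets(),
and the 256K threshold is only an assumed default), fed with the five
extents of inode 287 from the dump above:

/*
 * Simplified model: an extent is a defrag target if it is new enough
 * and smaller than the extent size threshold. The real kernel code
 * applies more checks (holes, inline extents, mergeability, pages
 * already under IO).
 */
#include <stdbool.h>
#include <stdio.h>

struct extent {
    unsigned long long offset;  /* file offset */
    unsigned long long len;     /* extent length */
    unsigned long long gen;     /* extent generation */
};

static bool is_defrag_target(const struct extent *em,
                             unsigned long long newer_than,
                             unsigned long long extent_thresh)
{
    if (em->gen < newer_than)
        return false;       /* unchanged since the requested generation */
    if (em->len >= extent_thresh)
        return false;       /* already large enough, nothing to gain */
    return true;
}

int main(void)
{
    /* items 59-63 of inode 287 from the dump above */
    const struct extent extents[] = {
        { 118784,   8192, 85 },
        { 126976,   4096, 85 },
        { 131072,   4096, 85 },
        { 135168,   4096, 85 },
        { 139264, 106496, 85 },
    };
    const unsigned long long newer_than = 35;
    const unsigned long long extent_thresh = 256 * 1024; /* assumed */

    for (size_t i = 0; i < sizeof(extents) / sizeof(extents[0]); i++)
        printf("extent at %llu, len %llu: %s\n",
               extents[i].offset, extents[i].len,
               is_defrag_target(&extents[i], newer_than, extent_thresh) ?
               "defrag target" : "skipped");
    return 0;
}

Under these filters all five extents qualify, which matches what the
newer kernel does with this range.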

Although there are cases where the newer kernel does defrags that are
too small to make sense, even with such cases fixed it still results in
15% of the total IO going to autodefrag.

I'm afraid there may be some bugs or questionable behaviors in the old
defrag code that prevent it from defragging all good candidates.

So even with more fixes, we may just end up with more IO for autodefrag,
purely because old code is not defragging as hard.

Thanks,
Qu
>
> On Fri, Jan 21, 2022 at 11:49 AM Filipe Manana <fdmanana@kernel.org> wrote:
>>
>> On Thu, Jan 20, 2022 at 6:21 PM François-Xavier Thomas
>> <fx.thomas@gmail.com> wrote:
>>>
>>>> Ok, so new patches to try
>>>
>>> Nice, thanks, I'll let you know how that goes tomorrow!
>>
>> You can also get one more on top of those 6:
>>
>> https://pastebin.com/raw/p87HX6AF
>>
>> Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
  2022-01-21 23:34                         ` Qu Wenruo
@ 2022-01-22 18:20                           ` François-Xavier Thomas
  2022-01-24  7:00                             ` Qu Wenruo
  0 siblings, 1 reply; 20+ messages in thread
From: François-Xavier Thomas @ 2022-01-22 18:20 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Filipe Manana, linux-btrfs, Qu Wenruo

> https://pastebin.com/raw/p87HX6AF

The 7th patch doesn't seem to bring a noticeable improvement so far.

> So even with more fixes, we may just end up with more IO for autodefrag,
> purely because old code is not defragging as hard.

That's unfortunate, but thanks for having looked into it; at least
there's a known reason for the IO increase.

François-Xavier

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
  2022-01-22 18:20                           ` François-Xavier Thomas
@ 2022-01-24  7:00                             ` Qu Wenruo
  2022-01-25 20:00                               ` François-Xavier Thomas
  0 siblings, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2022-01-24  7:00 UTC (permalink / raw)
  To: François-Xavier Thomas; +Cc: Filipe Manana, linux-btrfs, Qu Wenruo



On 2022/1/23 02:20, François-Xavier Thomas wrote:
>> https://pastebin.com/raw/p87HX6AF
>
> The 7th patch doesn't seem to bring a noticeable improvement so far.

Mind testing the latest two patches, which still need the first 6 patches:

https://patchwork.kernel.org/project/linux-btrfs/patch/20220123045242.25247-1-wqu@suse.com/

https://patchwork.kernel.org/project/linux-btrfs/patch/20220124063419.40114-1-wqu@suse.com/

The last one would greatly reduce IO and almost disable autodefrag, as
it will only defrag a fully 256K-aligned range with no holes or
preallocated extents.
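
A rough sketch of that rejection rule (not the RFC patch itself; the
extent layout in main() is hypothetical):

/*
 * Walk the extents covering one 256K-aligned cluster (assumed sorted
 * by file offset) and reject the whole cluster if any part of it is a
 * hole or a preallocated extent.
 */
#include <stdbool.h>
#include <stdio.h>

#define CLUSTER_SIZE (256 * 1024)

struct extent {
    unsigned long long offset;  /* file offset */
    unsigned long long len;
    bool is_hole;
    bool is_prealloc;
};

static bool cluster_is_defrag_candidate(const struct extent *extents,
                                        size_t nr_extents,
                                        unsigned long long cluster_start)
{
    unsigned long long cur = cluster_start;
    unsigned long long end = cluster_start + CLUSTER_SIZE;

    for (size_t i = 0; i < nr_extents && cur < end; i++) {
        const struct extent *em = &extents[i];

        if (em->offset + em->len <= cur)
            continue;           /* entirely before the cluster */
        if (em->offset > cur)
            return false;       /* gap, i.e. an implicit hole */
        if (em->is_hole || em->is_prealloc)
            return false;       /* reject the whole cluster */
        cur = em->offset + em->len;
    }
    return cur >= end;          /* cluster fully covered by written extents */
}

int main(void)
{
    /* hypothetical layout: 128K regular extent, 64K hole, 64K regular */
    const struct extent extents[] = {
        { 0,      128 * 1024, false, false },
        { 131072,  64 * 1024, true,  false },
        { 196608,  64 * 1024, false, false },
    };

    printf("cluster at 0: %s\n",
           cluster_is_defrag_candidate(extents, 3, 0) ?
           "defrag" : "rejected");
    return 0;
}

Any hole or preallocated extent anywhere in the cluster disqualifies it
entirely, which is why this variant generates the least IO.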

>
>> So even with more fixes, we may just end up with more IO for autodefrag,
>> purely because old code is not defragging as hard.
>
> That's unfortunate, but thanks for having looked into it, at least
> there's a known reason for the IO increase.

And as mentioned in the long commit message of the last RFC patch, the
defrag behavior in fact first changed in v5.11, which reduced the IO
(if the up-to-256K cluster has any hole in it, the whole cluster is
rejected).

While the even older behavior (v5.10 and before) would try to defrag
holes, which is even less acceptable.

My guess is that, sorted by the amount of IO caused by autodefrag, the
whole picture looks like this:

v5.10 > v5.16 vanilla > v5.16 + 7 patches > v5.11~v5.15 > v5.16 + 8 patches

v5.10 should be the worst: it causes the most IO, but wastes a lot of
it on holes/preallocated ranges.

v5.11~v5.15 reduced IO by rejecting a lot of valid cases, but still has
a small bug related to preallocated extents.
Overall, rejecting those defrags results in less IO.

v5.16 vanilla is slightly better than v5.10: it skips holes properly,
but just like v5.10 it doesn't handle preallocated ranges, and it has
extra bugs on top.

v5.16 + 7 patches should be the most balanced one (leaning a little
more towards defrag though).
It can skip all hole/preallocated ranges properly, while still trying
its best to defrag small extents.

v5.16 + 8 patches has the worst defrag efficiency, and thus the least
amount of IO.


 From the beginning, the defrag code has not been that well documented,
thus causing such "hidden" behavior.

I hope that, with the pain felt in v5.16, we can catch up on the test
coverage and end up with better defined/documented defrag behavior.

Thanks,
Qu

>
> François-Xavier

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
  2022-01-24  7:00                             ` Qu Wenruo
@ 2022-01-25 20:00                               ` François-Xavier Thomas
  2022-01-25 23:29                                 ` Qu Wenruo
  0 siblings, 1 reply; 20+ messages in thread
From: François-Xavier Thomas @ 2022-01-25 20:00 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Filipe Manana, linux-btrfs, Qu Wenruo

Hi,

> > Mind testing the latest two patches, which still need the first 6 patches:
>
> Gotcha, I'll test the following stack tomorrow, omitting patch 7 from
> Filipe (hopefully the filenames are descriptive enough):
>
> 1-btrfs-fix-too-long-loop-when-defragging-a-1-byte-file.patch
> 2-v2-btrfs-defrag-fix-the-wrong-number-of-defragged-sectors.patch
> 3-btrfs-defrag-properly-update-range--start-for-autodefrag.patch
> 4-btrfs-fix-deadlock-when-reserving-space-during-defrag.patch
> 5-btrfs-add-back-missing-dirty-page-rate-limiting-to-defrag.patch
> 6-btrfs-update-writeback-index-when-starting-defrag.patch
> 7-btrfs-defrag-don-t-try-to-merge-regular-extents-with-preallocated-extents.patch
> 8-RFC-btrfs-defrag-abort-the-whole-cluster-if-there-is-any-hole-in-the-range.patch

After testing, this one immediately reduces I/O to the 5.15 baseline of
a few tens of ops/s, so your hypothesis does seem correct.

François-Xavier

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
  2022-01-25 20:00                               ` François-Xavier Thomas
@ 2022-01-25 23:29                                 ` Qu Wenruo
  0 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2022-01-25 23:29 UTC (permalink / raw)
  To: François-Xavier Thomas; +Cc: Filipe Manana, linux-btrfs, Qu Wenruo



On 2022/1/26 04:00, François-Xavier Thomas wrote:
> Hi,
>
>>> Mind testing the latest two patches, which still need the first 6 patches:
>>
>> Gotcha, I'll test the following stack tomorrow, omitting patch 7 from
>> Filipe (hopefully the filenames are descriptive enough):
>>
>> 1-btrfs-fix-too-long-loop-when-defragging-a-1-byte-file.patch
>> 2-v2-btrfs-defrag-fix-the-wrong-number-of-defragged-sectors.patch
>> 3-btrfs-defrag-properly-update-range--start-for-autodefrag.patch
>> 4-btrfs-fix-deadlock-when-reserving-space-during-defrag.patch
>> 5-btrfs-add-back-missing-dirty-page-rate-limiting-to-defrag.patch
>> 6-btrfs-update-writeback-index-when-starting-defrag.patch
>> 7-btrfs-defrag-don-t-try-to-merge-regular-extents-with-preallocated-extents.patch
>> 8-RFC-btrfs-defrag-abort-the-whole-cluster-if-there-is-any-hole-in-the-range.patch
>
> After testing, this one immediately reduces I/O to the 5.15 baseline
> of a few tens of ops/s, so your hypothesis does seem correct.

Awesome! This indeed proves that the bulk of the IO comes from the more
extensive defrag behavior.

Looking forward to the v5.15 POC to put in the final nail.

Thanks,
Qu

>
> François-Xavier

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-01-25 23:29 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-17 10:06 Massive I/O usage from btrfs-cleaner after upgrading to 5.16 François-Xavier Thomas
2022-01-17 12:02 ` Filipe Manana
2022-01-17 16:59   ` Filipe Manana
2022-01-17 21:37     ` François-Xavier Thomas
2022-01-19  9:44       ` François-Xavier Thomas
2022-01-19 10:13         ` Filipe Manana
2022-01-20 11:37           ` François-Xavier Thomas
2022-01-20 11:44             ` Filipe Manana
2022-01-20 12:02               ` François-Xavier Thomas
2022-01-20 12:45                 ` Qu Wenruo
2022-01-20 12:55                   ` Filipe Manana
2022-01-20 17:46                 ` Filipe Manana
2022-01-20 18:21                   ` François-Xavier Thomas
2022-01-21 10:49                     ` Filipe Manana
2022-01-21 19:39                       ` François-Xavier Thomas
2022-01-21 23:34                         ` Qu Wenruo
2022-01-22 18:20                           ` François-Xavier Thomas
2022-01-24  7:00                             ` Qu Wenruo
2022-01-25 20:00                               ` François-Xavier Thomas
2022-01-25 23:29                                 ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).