Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16

From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: "François-Xavier Thomas" <fx.thomas@gmail.com>
Cc: Filipe Manana <fdmanana@kernel.org>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	Qu Wenruo <wqu@suse.com>
Subject: Re: Massive I/O usage from btrfs-cleaner after upgrading to 5.16
Date: Mon, 24 Jan 2022 15:00:24 +0800	[thread overview]
Message-ID: <c88c1438-ebcf-a652-1940-4daa4ee53be9@gmx.com> (raw)
In-Reply-To: <CAEwRaO58oCzY4GjU3gCSru3Gq02GvGSdkg5hPwCKMwYcZ+cM2Q@mail.gmail.com>

On 2022/1/23 02:20, François-Xavier Thomas wrote:
>> https://pastebin.com/raw/p87HX6AF
>
> The 7th patch doesn't seem to be having a noticeable improvement so far.

Mind to test the latest two patches, which still needs the first 6 patches:

https://patchwork.kernel.org/project/linux-btrfs/patch/20220123045242.25247-1-wqu@suse.com/

https://patchwork.kernel.org/project/linux-btrfs/patch/20220124063419.40114-1-wqu@suse.com/

The last one would greatly reduce IO, almost disable autodefrag, as it
will only defrag a full 256K aligned, no hole/preallocated range.

>
>> So even with more fixes, we may just end up with more IO for autodefrag,
>> purely because old code is not defragging as hard.
>
> That's unfortunate, but thanks for having looked into it, at least
> there's a known reason for the IO increase.

And just mentioned in that long commit message of the last RFC patch,
the defrag behavior in fact changed in v5.11 first, which reduced the IO
(if the up-to-256K cluster has any hole in it, the cluster will be
rejected).

While the even older (v5.10-) behavior will try to defrag holes, which
is even less acceptable.

My guess is, sorting by IO caused by autodefrag, the whole picture would
look like this:

v5.10 > v5.16 vanilla > v5.16 + 7 patches > v5.11~v5.15 > v5.16 + 8 patches

v5.10 should be the worst, it has the most amount of IO, but wastes them
for holes/preallocated a lot.

v5.11~v5.15 reduced IO by rejecting a lot of valid cases, but still has
a small bug related to preallocated extents.
But overall, the rejected defrags causes less IO.

v5.16 vanilla is slightly better than v5.10, it skips holes properly,
but doesn't handle preallocated range just like v5.10, along with extra
bugs.

v5.16 + 7 patches, it should be the most balanced one (a little more
towards defrag though).
It can skip all hole/preallocated ranges properly, while still try its
best to defrag small extents.

v5.16 + 8 patches, the worst efficiency for defrag, thus the least
amount of IO.

 From the beginning, defrag code is not that well documented, thus
causing such "hidden" behavior.

I hope with the pain felt in v5.16, we can catch up on the testing
coverage and more defined/documented defrag behavior.

Thanks,
Qu

>
> François-Xavier