Re: Btrfs autodefrag wrote 5TB in one day to a 0.5TB SSD without a measurable benefit

From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Zygo Blaxell <zblaxell@furryterror.org>,
	Jan Ziak <0xe2.0x9a.0x9b@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Btrfs autodefrag wrote 5TB in one day to a 0.5TB SSD without a measurable benefit
Date: Sat, 12 Mar 2022 11:24:18 +0800	[thread overview]
Message-ID: <59c57200-9c77-3b8a-ab9d-11aef96da852@gmx.com> (raw)
In-Reply-To: <YiwIxnCMjsl8BPPA@hungrycats.org>

On 2022/3/12 10:43, Zygo Blaxell wrote:
> On Sat, Mar 12, 2022 at 12:28:10AM +0100, Jan Ziak wrote:
>> On Sat, Mar 12, 2022 at 12:04 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>> As stated before, autodefrag is not really that useful for database.
>>
>> Do you realize that you are claiming that btrfs autodefrag should not
>> - by design - be effective in the case of high-fragmentation files? If
>> it isn't supposed to be useful for high-fragmentation files then where
>> is it supposed to be useful? Low-fragmentation files?
>
> IMHO it's best to deprecate the in-kernel autodefrag option, and start
> over with a better approach.  The kernel is the wrong place to solve
> this problem, and the undesirable and unfixable things in autodefrag
> are a consequence of that early design error.

I'm having the same feeling exactly.

Especially the current autodefrag is putting its own policy (transid
filter) without providing a mechanism to utilize from user space.

Exactly the opposite what we should do, provide a mechanism not a policy.

Not to mention there are quite some limitations of the current policy.

But unfortunately, even we deprecate it right now, it will takes a long
time to really remove it from kernel.

While on the other hand, we also need to introduce new parameters like
@newer_than, and @max_to_defrag to the ioctl interface.

Which may already eat up the unused bytes (only 16 bytes, while
newer_than needs u64, max_to_defrag may also need to be u64).

And user space tool lacks one of the critical info, where the small
writes are.

So even I can't be more happier to deprecate the autodefrag, we still
need to hang on it for a pretty lone time, before a user space tool
which can do everything the same as autodefrag.

Thanks,
Qu

>
> As far as I can tell, in-kernel autodefrag's only purpose is to provide
> exposure to new and exciting bugs on each kernel release, and a lot of
> uncontrolled IO demands even when it's working perfectly.  Inevitably,
> re-reading old fragments that are no longer in memory will consume RAM
> and iops during writeback activity, when memory and IO bandwidth is least
> available.  If we avoid expensive re-reading of extents, then we don't
> get a useful rate of reduction of fragmentation, because we can't coalesce
> small new exists with small existing ones.  If we try to fix these issues
> one at a time, the feature would inevitably grow a lot of complicated
> and brittle configuration knobs to turn it off selectively, because it's
> so awful without extensive filtering.
>
> All the above criticism applies to abstract ideal in-kernel autodefrag,
> _before_ considering whether a concrete implementation might have
> limitations or bugs which make it worse than the already-bad best case.
> 5.16 happened to have a lot of examples of these, but fixing the
> regressions can only restore autodefrag's relative harmlessness, not
> add utility within the constraints the kernel is under.
>
> The right place to do autodefrag is userspace.  Interfaces already
> exist for userspace to 1) discover new extents and their neighbors,
> quickly and safely, across the entire filesystem; 2) invoke defrag_range
> on file extent ranges found in step 1; and 3) run a while (true)
> loop that periodically performs steps 1 and 2.  Indeed, the existing
> kernel autodefrag implementation is already using the same back-end
> infrastructure for parts 1 and 2, so all that would be required for
> userspace is to reimplement (and start improving upon) part 3.
>
> A command-line utility or daemon can locate new extents immediately with
> tree_search queries, either at filesystem-wide scales, or directed at
> user-chosen file subsets.  Tools can quickly assess whether new extents
> are good candidates for defrag, then coalesce them with their neighbors.
>
> The user can choose between different tools to decide basic policy
> questions like: whether to run once in a batch job or continuously in
> the background, what amounts of IO bandwidth and memory to consume,
> whether to recompress data with a more aggressive algorithm/level, which
> reference to a snapshot-shared extent should be preferred for defrag,
> file-type-specific layout optimizations to apply, or any custom or
> experimental selection, scheduling, or optimization logic desired.
>
> Implementations can be kept simple because it's not necessary for
> userspace tools to pile every possible option into a single implementation,
> and support every released option forever (as required for the kernel).
> A specialist implementation can discard existing code with impunity or
> start from scratch with an experimental algorithm, and spend its life
> in a fork of the main userspace autodefrag project with niche users
> who never have to cope with generic users' use cases and vice versa.
> This efficiently distributes development and maintenance costs.
>
> Userspace autodefrag can be implemented today in any programming language
> with btrfs ioctl support, and run on any kernel released in the last
> 6 years.  Alas, I don't know of anybody who's released a userspace
> autodefrag tool yet, and it hasn't been important enough to me to build
> one myself (other than a few proof-of-concept prototypes).
>
> For now, I do defrag mostly ad-hoc with 'btrfs fi defrag' on the most
> severely fragmented files (top N list of files with the highest extent
> counts on the filesystem), and ignore fragmentation everywhere else.
>
>
>> -Jan