From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Zygo Blaxell <zblaxell@furryterror.org>,
	Jan Ziak <0xe2.0x9a.0x9b@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Btrfs autodefrag wrote 5TB in one day to a 0.5TB SSD without a measurable benefit
Date: Sat, 12 Mar 2022 11:24:18 +0800
Message-ID: <59c57200-9c77-3b8a-ab9d-11aef96da852@gmx.com>
In-Reply-To: <YiwIxnCMjsl8BPPA@hungrycats.org>



On 2022/3/12 10:43, Zygo Blaxell wrote:
> On Sat, Mar 12, 2022 at 12:28:10AM +0100, Jan Ziak wrote:
>> On Sat, Mar 12, 2022 at 12:04 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>> As stated before, autodefrag is not really that useful for database.
>>
>> Do you realize that you are claiming that btrfs autodefrag should not
>> - by design - be effective in the case of high-fragmentation files? If
>> it isn't supposed to be useful for high-fragmentation files then where
>> is it supposed to be useful? Low-fragmentation files?
>
> IMHO it's best to deprecate the in-kernel autodefrag option, and start
> over with a better approach.  The kernel is the wrong place to solve
> this problem, and the undesirable and unfixable things in autodefrag
> are a consequence of that early design error.

I have exactly the same feeling.

In particular, the current autodefrag imposes its own policy (the
transid filter) without providing a mechanism that user space can use.

That is exactly the opposite of what we should do: provide a mechanism,
not a policy.

Not to mention that the current policy has quite a few limitations.


But unfortunately, even if we deprecate it right now, it will take a
long time to really remove it from the kernel.

On the other hand, we also need to introduce new parameters such as
@newer_than and @max_to_defrag to the ioctl interface.

That may already eat up the unused bytes (only 16 bytes are left, while
newer_than needs a u64 and max_to_defrag may also need to be a u64).
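
To put numbers on that, here is a rough sketch of the space budget. It
assumes the current struct btrfs_ioctl_defrag_range_args layout from the
uapi header (three u64 fields, two u32 fields, then four unused u32
words); please check it against your own headers. The extended layout
with newer_than and max_to_defrag is purely hypothetical, not an
existing interface.

```python
import struct

# Assumed layout of struct btrfs_ioctl_defrag_range_args:
#   __u64 start; __u64 len; __u64 flags;
#   __u32 extent_thresh; __u32 compress_type; __u32 unused[4];
CURRENT_FMT = "=QQQII4I"

# Hypothetical extension: reuse the unused words for two u64 fields
# (newer_than, max_to_defrag). This is NOT an existing kernel ABI.
PROPOSED_FMT = "=QQQIIQQ"

total = struct.calcsize(CURRENT_FMT)    # size of the whole argument struct
unused = struct.calcsize("=4I")         # spare bytes available today
needed = struct.calcsize("=QQ")         # bytes two new u64 fields would take

# Prints: 48 16 16 0
print(total, unused, needed, unused - needed)
```

So under these assumptions the two fields would fit, but with nothing
left over for any later extension.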

And a user space tool lacks one critical piece of information: where
the small writes are.

So even though I couldn't be happier to deprecate autodefrag, we still
need to hang on to it for a pretty long time, until there is a user
space tool that can do everything autodefrag does.

Thanks,
Qu

>
> As far as I can tell, in-kernel autodefrag's only purpose is to provide
> exposure to new and exciting bugs on each kernel release, and a lot of
> uncontrolled IO demands even when it's working perfectly.  Inevitably,
> re-reading old fragments that are no longer in memory will consume RAM
> and iops during writeback activity, when memory and IO bandwidth are least
> available.  If we avoid expensive re-reading of extents, then we don't
> get a useful rate of reduction of fragmentation, because we can't coalesce
> small new extents with small existing ones.  If we try to fix these issues
> one at a time, the feature would inevitably grow a lot of complicated
> and brittle configuration knobs to turn it off selectively, because it's
> so awful without extensive filtering.
>
> All the above criticism applies to abstract ideal in-kernel autodefrag,
> _before_ considering whether a concrete implementation might have
> limitations or bugs which make it worse than the already-bad best case.
> 5.16 happened to have a lot of examples of these, but fixing the
> regressions can only restore autodefrag's relative harmlessness, not
> add utility within the constraints the kernel is under.
>
> The right place to do autodefrag is userspace.  Interfaces already
> exist for userspace to 1) discover new extents and their neighbors,
> quickly and safely, across the entire filesystem; 2) invoke defrag_range
> on file extent ranges found in step 1; and 3) run a while (true)
> loop that periodically performs steps 1 and 2.  Indeed, the existing
> kernel autodefrag implementation is already using the same back-end
> infrastructure for parts 1 and 2, so all that would be required for
> userspace is to reimplement (and start improving upon) part 3.
>
> A command-line utility or daemon can locate new extents immediately with
> tree_search queries, either at filesystem-wide scales, or directed at
> user-chosen file subsets.  Tools can quickly assess whether new extents
> are good candidates for defrag, then coalesce them with their neighbors.
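
As an illustration of the "assess, then coalesce with neighbors" step
described above, here is a minimal sketch of one possible selection
policy. Everything in it is an assumption made for the example: the
Extent record, the defrag_ranges() helper, the 256 KiB "small"
threshold, and the rule that a run needs at least one new extent.
Discovering the extents (step 1) and issuing the defrag call (step 2)
are deliberately left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Extent:
    offset: int   # file offset in bytes
    length: int   # extent length in bytes
    transid: int  # generation in which the extent was written


def defrag_ranges(extents: Iterable[Extent], newer_than: int,
                  small: int = 256 * 1024) -> List[Tuple[int, int]]:
    """Return (offset, length) ranges that look worth defragmenting.

    A range is a run of at least two file-adjacent small extents that
    contains at least one extent newer than `newer_than`.
    """
    ranges: List[Tuple[int, int]] = []
    run: List[Extent] = []

    def flush() -> None:
        # Emit the current run if it qualifies, then start a new one.
        if len(run) >= 2 and any(e.transid > newer_than for e in run):
            start = run[0].offset
            end = run[-1].offset + run[-1].length
            ranges.append((start, end - start))
        run.clear()

    for ext in sorted(extents, key=lambda e: e.offset):
        contiguous = bool(run) and run[-1].offset + run[-1].length == ext.offset
        # A large extent or a gap in the file ends the current run.
        if ext.length >= small or not contiguous:
            flush()
        if ext.length < small:
            run.append(ext)
    flush()
    return ranges
```

A different tool could swap in a completely different policy here
without touching the discovery or defrag parts.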
>
> The user can choose between different tools to decide basic policy
> questions like: whether to run once in a batch job or continuously in
> the background, what amounts of IO bandwidth and memory to consume,
> whether to recompress data with a more aggressive algorithm/level, which
> reference to a snapshot-shared extent should be preferred for defrag,
> file-type-specific layout optimizations to apply, or any custom or
> experimental selection, scheduling, or optimization logic desired.
>
> Implementations can be kept simple because it's not necessary for
> userspace tools to pile every possible option into a single implementation,
> and support every released option forever (as required for the kernel).
> A specialist implementation can discard existing code with impunity or
> start from scratch with an experimental algorithm, and spend its life
> in a fork of the main userspace autodefrag project with niche users
> who never have to cope with generic users' use cases and vice versa.
> This efficiently distributes development and maintenance costs.
>
> Userspace autodefrag can be implemented today in any programming language
> with btrfs ioctl support, and run on any kernel released in the last
> 6 years.  Alas, I don't know of anybody who's released a userspace
> autodefrag tool yet, and it hasn't been important enough to me to build
> one myself (other than a few proof-of-concept prototypes).
>
> For now, I do defrag mostly ad-hoc with 'btrfs fi defrag' on the most
> severely fragmented files (top N list of files with the highest extent
> counts on the filesystem), and ignore fragmentation everywhere else.
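
The "top N list" part of that workflow could be sketched roughly as
below. It assumes the per-file summary lines printed by filefrag(8)
look like "path: 12 extents found"; the top_fragmented() helper is
hypothetical and only ranks the lines it is given. Running the actual
defrag on the result is left to the user.

```python
import heapq
import re
from typing import Iterable, List, Tuple

# Assumed filefrag summary format: "<path>: <N> extent(s) found".
_LINE = re.compile(r"^(?P<path>.*): (?P<count>\d+) extents? found$")


def top_fragmented(lines: Iterable[str], n: int) -> List[Tuple[int, str]]:
    """Return the n (extent_count, path) pairs with the highest counts."""
    counts = []
    for line in lines:
        m = _LINE.match(line.rstrip("\n"))
        if m:  # silently skip anything that is not a summary line
            counts.append((int(m.group("count")), m.group("path")))
    return heapq.nlargest(n, counts)
```

The input can be the captured output of filefrag over whatever file set
is of interest; lines that do not match the assumed format are ignored.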
>
>
>> -Jan
