From: Andreas Dilger <adilger@dilger.ca>
To: Mike Frysinger <vapier@gentoo.org>
Cc: linux-ext4@vger.kernel.org
Subject: Re: e4defrag seems too optimistic
Date: Thu, 29 Apr 2021 14:47:36 -0600
Message-ID: <A0C999DB-A6D5-4C95-A5B8-92E7002395A7@dilger.ca>
In-Reply-To: <YIpFK3or2Creo1qg@vapier>

On Apr 28, 2021, at 11:33 PM, Mike Frysinger <vapier@gentoo.org> wrote:
> 
> i started running e4defrag out of curiosity on some large files that i'm
> archiving long term.  its results seem exceedingly optimistic and i have
> a hard time agreeing with it.  am i pessimistic ?
> 
> for example, i have a ~4GB archive:
> $ e4defrag -c ./foo.tar.xz
> <File>                                         now/best       size/ext
> ./foo.tar.xz
>                                             39442/2             93 KB
> 
> Total/best extents				39442/2
> Average size per extent			93 KB
> Fragmentation score				34
> [0-30 no problem: 31-55 a little bit fragmented: 56- needs defrag]
> This file (./foo.tar.xz) does not need defragmentation.
> Done.
> 
> i have a real hard time seeing this file as barely "a little bit fragmented".
> shouldn't the fragmentation score be higher ?

I would tend to agree.  A 4GB file in 39k extents of ~100KB each is not great.
On an HDD with 125 IOPS (not counting track buffers and such) this would
take about 300s to read at a whopping 13MB/s.  On flash, small writes do
lead to increased wear, but the seeks are free and you may not care.
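
To make that estimate concrete, here is a minimal sketch of the arithmetic.
It assumes one seek per extent, ignores transfer time entirely, and treats
the file as exactly 4 GiB; the function name is made up for illustration.

```python
def seek_bound_read(file_bytes, extents, iops):
    """Return (seconds, MB/s) when every extent costs one seek.

    Assumption: transfer time is ignored, so this is a seek-only bound.
    """
    seconds = extents / iops
    mb_per_s = file_bytes / seconds / 1e6
    return seconds, mb_per_s


# 39442 extents (from the e4defrag -c output) at 125 IOPS
seconds, rate = seek_bound_read(4 * 1024**3, 39442, 125)
print(f"{seconds:.0f} s, {rate:.1f} MB/s")
```

This lands a little over 300s and between 13 and 14 MB/s, in line with the
rough figures above.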

IMHO, anything below 1MB/extent is sub-optimal in terms of IO performance,
and a sign of filesystem fragmentation (or a very poor IO pattern), since
mballoc should try to do allocation in 8MB chunks for large writes.

In many respects, if the extents are large enough, the "cost" of a seek is
hidden by the device bandwidth (e.g. 250 MB/s / 125 seeks/sec = 2MB for
a good HDD today, scale linearly for RAID-5/6), so any extent larger than
this is not limited by seeks. Should 1024 x 4MB extents in a 4GB file be
considered fragmented or not?  Definitely ~100KB/extent should be.
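
The break-even point can be sketched the same way. The first function is
just the division quoted above. The second is my own simplification, not
anything e4defrag computes: it assumes each extent costs one seek of
1/seeks_per_s seconds plus a transfer at full bandwidth.

```python
def break_even_extent_bytes(bandwidth_bytes_per_s, seeks_per_s):
    """Extent size at which one seek takes as long as the transfer."""
    return bandwidth_bytes_per_s / seeks_per_s


def seek_fraction(extent_bytes, bandwidth_bytes_per_s, seeks_per_s):
    """Fraction of per-extent time spent seeking (assumed simple model)."""
    seek_time = 1.0 / seeks_per_s
    transfer_time = extent_bytes / bandwidth_bytes_per_s
    return seek_time / (seek_time + transfer_time)


BANDWIDTH = 250e6  # 250 MB/s, the HDD figure used above
SEEKS = 125        # seeks per second, the HDD figure used above

print(break_even_extent_bytes(BANDWIDTH, SEEKS) / 1e6, "MB")
for size in (93e3, 2e6, 4e6):
    print(f"{size / 1e6:.3f} MB extents: "
          f"{seek_fraction(size, BANDWIDTH, SEEKS):.0%} of time seeking")
```

Under this model the ~93KB extents spend nearly all their time seeking,
while 4MB extents spend about a third, which is why the 4MB case is a
judgment call and the small-extent case is not.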

However, the "ideal = 2" case is bogus, since extents are max size 128MB,
so you would need at least 32 for a perfect 4GB file.  In that respect,
e4defrag is at best a "working prototype", but I don't think many people
use it, and it has not gotten many improvements since it first landed.
If you have a better idea for a "fragmentation score" I would be open
to looking at it, doubly so if it comes in the form of a patch.
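
As a starting point for a saner "best" value, a minimal sketch of the
lower bound on extent count. It takes the 128MB maximum extent size
mentioned above as given; the constant and function names are made up
for illustration.

```python
MAX_EXTENT_BYTES = 128 * 1024**2  # 128MB maximum extent size, as stated above


def min_extents(file_bytes, max_extent_bytes=MAX_EXTENT_BYTES):
    """Smallest number of extents that can cover file_bytes."""
    # Integer ceiling division, avoids float rounding on large sizes.
    return -(-file_bytes // max_extent_bytes)


print(min_extents(4 * 1024**3))  # 32
```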

You could check the actual file layout using "filefrag -v" before/after
running e4defrag to see how the allocation was changed.  This would tell
you if it is actually helping or not.  I've thought for a while that it
would be useful to add the same "fragmentation score" to filefrag, but
that would be contingent on the score actually making sense.
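
For a before/after comparison, the extent count can be pulled out of
filefrag's summary line. This is a rough sketch, assuming filefrag is
installed and that its summary has the form "<file>: N extents found";
both helper names are hypothetical.

```python
import re
import subprocess


def parse_extent_count(filefrag_output):
    """Extract N from a '<file>: N extent(s) found' line, or None."""
    match = re.search(r":\s*(\d+) extents? found", filefrag_output)
    return int(match.group(1)) if match else None


def extent_count(path):
    """Run filefrag on path and return the reported extent count."""
    result = subprocess.run(
        ["filefrag", path], capture_output=True, text=True, check=True
    )
    return parse_extent_count(result.stdout)


# Parsing a sample summary line; the real call would be extent_count(path).
print(parse_extent_count("./foo.tar.xz: 39442 extents found"))  # 39442
```

Calling extent_count() on the file before and after running e4defrag
would show whether the number of extents actually dropped.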

You can also use "e2freefrag" to check the filesystem as a whole to see
whether the free space is badly fragmented (i.e. most free chunks < 8MB).
In that case, running e4defrag _may_ help you, but it is not "smart" like
the old DOS defrag utilities, since it just rewrites each file separately
instead of having a "plan" for how to defrag the whole filesystem.

> as a measure of "how fragmented is it really", if i copy the file and then
> delete the original, there's a noticeable delay before `rm` finishes.

Yes, that would be totally clear if you ran filefrag on the file first.

Cheers, Andreas





