linux-kernel.vger.kernel.org archive mirror
From: Filipe Manana <fdmanana@kernel.org>
To: Dominique MARTINET <dominique.martinet@atmark-techno.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>,
	Josef Bacik <josef@toxicpanda.com>, Chris Mason <clm@fb.com>,
	David Sterba <dsterba@suse.com>,
	linux-btrfs@vger.kernel.org, lkml <linux-kernel@vger.kernel.org>,
	Chen Liang-Chun <featherclc@gmail.com>,
	Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>,
	kernel@openvz.org, Yu Kuai <yukuai3@huawei.com>,
	"Theodore Ts'o" <tytso@mit.edu>
Subject: Re: fiemap is slow on btrfs on files with multiple extents
Date: Wed, 21 Sep 2022 10:00:37 +0100
Message-ID: <CAL3q7H5cL+4W6SQApq=ZhkzffvZAR2cEWK0bduNun+OkFevk=g@mail.gmail.com>
In-Reply-To: <Yyq9lfH3AP8I/pwd@atmark-techno.com>

On Wed, Sep 21, 2022 at 8:30 AM Dominique MARTINET
<dominique.martinet@atmark-techno.com> wrote:
>
> Filipe Manana wrote on Thu, Sep 01, 2022 at 02:25:12PM +0100:
> > It took me a bit more than I expected, but here is the patchset to make fiemap
> > (and lseek) much more efficient on btrfs:
> >
> > https://lore.kernel.org/linux-btrfs/cover.1662022922.git.fdmanana@suse.com/
> >
> > And also available in this git branch:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=lseek_fiemap_scalability
>
> Thanks a lot!
> Sorry for the slow reply, it took me a while to find time to get back to
> my test setup.
>
> There's still this weird behaviour that later calls to cp are slower
> than the first, but the improvement is so good that it doesn't matter
> quite as much -- I haven't been able to reproduce the rcu stalls in qemu
> so I can't say for sure but they probably won't be a problem anymore.
>
> From a quick look with perf record/report the difference still seems to
> stem from fiemap (time spent there goes from 4.13 to 45.20%), so there
> is still more processing once the file is (at least partially) in cache,
> but it has gotten much better.
>
>
> (tests run on a laptop so assume some inconsistency with thermal
> throttling etc)
>
> /mnt/t/t # compsize bigfile
> Processed 1 file, 194955 regular extents (199583 refs), 0 inline.
> Type       Perc     Disk Usage   Uncompressed Referenced
> TOTAL       15%      3.7G          23G          23G
> none       100%      477M         477M         514M
> zstd        14%      3.2G          23G          23G
> /mnt/t/t # time cp bigfile /dev/null
> real    0m 44.52s
> user    0m 0.49s
> sys     0m 32.91s
> /mnt/t/t # time cp bigfile /dev/null
> real    0m 46.81s
> user    0m 0.55s
> sys     0m 35.63s
> /mnt/t/t # time cp bigfile /dev/null
> real    1m 13.63s
> user    0m 0.55s
> sys     1m 1.89s
> /mnt/t/t # time cp bigfile /dev/null
> real    1m 13.44s
> user    0m 0.53s
> sys     1m 2.08s
>
>
> For comparison, here's how it was on 6.0-rc2, which your branch is based on:
> /mnt/t/t # time cp atde-test /dev/null
> real    0m 46.17s
> user    0m 0.60s
> sys     0m 33.21s
> /mnt/t/t # time cp atde-test /dev/null
> real    5m 35.92s
> user    0m 0.57s
> sys     5m 24.20s
>
>
>
> If you're curious, the report blames set_extent_bit and
> clear_state_bit as follows; get_extent_skip_holes is completely gone, but
> I wouldn't necessarily say this needs much more time spent on it.

get_extent_skip_holes() no longer exists, so 0% of time spent there :)

Yes, I know. You see so much time spent in lock_extent_bits() because
cp makes too many fiemap calls, each with a very small extent buffer.
I pointed that out here:

https://lore.kernel.org/linux-btrfs/CAL3q7H5NSVicm7nYBJ7x8fFkDpno8z3PYt5aPU43Bajc1H0h1Q@mail.gmail.com/
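
To illustrate what the extent buffer size means for a fiemap caller, here
is a minimal userspace sketch in Python. It is not what cp does, only an
illustration: the helper names (calls_needed, iter_extents) and the
default batch of 1000 extents are assumptions, while the ioctl number and
struct layouts follow linux/fs.h and linux/fiemap.h. Each ioctl returns at
most `batch` extents, so a file with roughly the 194955 extents reported
by compsize above needs about 195 calls at 1000 extents per call.

```python
import fcntl
import struct

# Constants and layouts taken from the Linux UAPI headers.
FS_IOC_FIEMAP = 0xC020660B        # _IOWR('f', 11, struct fiemap)
FIEMAP_EXTENT_LAST = 0x00000001   # flag set on the last extent of the file
# struct fiemap header: fm_start, fm_length, fm_flags,
# fm_mapped_extents, fm_extent_count, fm_reserved
FIEMAP_HDR = struct.Struct("=QQIIII")
# struct fiemap_extent: fe_logical, fe_physical, fe_length,
# fe_reserved64[2], fe_flags, fe_reserved[3]
FIEMAP_EXT = struct.Struct("=QQQQQIIII")


def calls_needed(num_extents, batch):
    """Hypothetical helper: fiemap calls needed at `batch` extents per call."""
    return -(-num_extents // batch)  # ceiling division


def iter_extents(fd, batch=1000):
    """Yield (logical, physical, length, flags) for every extent of fd.

    `batch` is the extent buffer size: how many extent records the kernel
    may fill per ioctl. The default of 1000 is an illustrative assumption.
    """
    buf = bytearray(FIEMAP_HDR.size + batch * FIEMAP_EXT.size)
    start = 0
    while True:
        # Request up to `batch` extents from byte offset `start` to EOF.
        FIEMAP_HDR.pack_into(buf, 0, start, 0xFFFFFFFFFFFFFFFF, 0, 0, batch, 0)
        fcntl.ioctl(fd, FS_IOC_FIEMAP, buf, True)
        mapped = FIEMAP_HDR.unpack_from(buf, 0)[3]
        if mapped == 0:
            return
        for i in range(mapped):
            rec = FIEMAP_EXT.unpack_from(buf, FIEMAP_HDR.size + i * FIEMAP_EXT.size)
            logical, physical, length, flags = rec[0], rec[1], rec[2], rec[5]
            yield logical, physical, length, flags
            if flags & FIEMAP_EXTENT_LAST:
                return
        # Continue right after the last extent returned by this call.
        start = logical + length
```

A caller would open the file read-only and loop over
iter_extents(f.fileno()). The ioctl fails with an OSError on filesystems
that do not implement fiemap.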

Making cp use a larger buffer (say 500 or 1000 extents) would make it
a lot better.
But as I pointed out there, cp was changed last year to stop using fiemap
to detect holes; it now uses lseek with SEEK_HOLE. So with time, everyone
will get a cp version that no longer uses fiemap.
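
As a rough illustration of hole detection with lseek instead of fiemap,
the sketch below walks a file with SEEK_DATA and SEEK_HOLE. It is not the
coreutils implementation; the function name data_segments is hypothetical.
On filesystems without hole support the whole file is reported as a single
data segment.

```python
import errno
import os


def data_segments(fd):
    """Return a list of (start, end) byte ranges of fd that contain data.

    Note that this moves the file offset of fd.
    """
    size = os.fstat(fd).st_size
    segments = []
    offset = 0
    while offset < size:
        try:
            # Find the next byte at or after `offset` that is not in a hole.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:  # only a hole remains until EOF
                break
            raise
        # Find where that data run ends (EOF counts as an implicit hole).
        end = os.lseek(fd, start, os.SEEK_HOLE)
        segments.append((start, end))
        offset = end
    return segments
```

A copy tool can then read only the returned ranges and skip everything in
between, without ever asking the filesystem for its extent list.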

Also, since cp does many read and fiemap calls on the source file, the
following patch probably helps too:

https://lore.kernel.org/linux-btrfs/20220819024408.9714-1-ethanlien@synology.com/

It will make the io tree smaller. That should land in 6.1 too.

Thanks for testing and the report.

>
> 45.20%--extent_fiemap
> |
> |--31.02%--lock_extent_bits
> |          |
> |           --30.78%--set_extent_bit
> |                     |
> |                     |--6.93%--insert_state
> |                     |          |
> |                     |           --0.70%--set_state_bits
> |                     |
> |                     |--4.25%--alloc_extent_state
> |                     |          |
> |                     |           --3.86%--kmem_cache_alloc
> |                     |
> |                     |--2.77%--_raw_spin_lock
> |                     |          |
> |                     |           --1.23%--preempt_count_add
> |                     |
> |                     |--2.48%--rb_next
> |                     |
> |                     |--1.13%--_raw_spin_unlock
> |                     |          |
> |                     |           --0.55%--preempt_count_sub
> |                     |
> |                      --0.92%--set_state_bits
> |
>  --13.80%--__clear_extent_bit
>            |
>             --13.30%--clear_state_bit
>                       |
>                       |           --3.48%--_raw_spin_unlock_irqrestore
>                       |
>                       |--2.45%--merge_state.part.0
>                       |          |
>                       |           --1.57%--rb_next
>                       |
>                       |--2.14%--__slab_free
>                       |          |
>                       |           --1.26%--cmpxchg_double_slab.constprop.0.isra.0
>                       |
>                       |--0.74%--free_extent_state
>                       |
>                       |--0.70%--kmem_cache_free
>                       |
>                       |--0.69%--btrfs_clear_delalloc_extent
>                       |
>                        --0.52%--rb_next
>
>
>
> Thanks!
> --
> Dominique
