linux-btrfs.vger.kernel.org archive mirror
From: Filipe Manana <fdmanana@kernel.org>
To: Wang Yugui <wangyugui@e16-tech.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 00/10] btrfs: make lseek and fiemap much more efficient
Date: Fri, 2 Sep 2022 09:24:12 +0100
Message-ID: <CAL3q7H79BWAJVk2ecWqa4mbW0+WFJrEX-=a+Zg9FOc_UcAKjLg@mail.gmail.com>
In-Reply-To: <20220902085320.642A.409509F4@e16-tech.com>

On Fri, Sep 2, 2022 at 2:09 AM Wang Yugui <wangyugui@e16-tech.com> wrote:
>
> Hi,
>
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > We often get reports of fiemap and hole/data seeking (lseek) being too slow
> > on btrfs, or even unusable in some cases due to being extremely slow.
> >
> > Some recent reports for fiemap:
> >
> >     https://lore.kernel.org/linux-btrfs/21dd32c6-f1f9-f44a-466a-e18fdc6788a7@virtuozzo.com/
> >     https://lore.kernel.org/linux-btrfs/Ysace25wh5BbLd5f@atmark-techno.com/
> >
> > For lseek (LSF/MM from 2017):
> >
> >    https://lwn.net/Articles/718805/
> >
> > Basically both are slow due to very high algorithmic complexity which
> > scales badly with the number of extents in a file and the height of
> > subvolume and extent b+trees.
> >
> > Using Pavel's test case (first Link tag for fiemap), which uses files with
> > many 4K extents and holes before and after each extent (kind of a worst
> > case scenario), the speedup is of several orders of magnitude (for the 1G
> > file, from ~225 seconds down to ~0.1 seconds).
> >
> > Finally the new algorithm for fiemap also ends up solving a bug with the
> > current algorithm. This happens because we are currently relying on extent
> > maps to report extents, which can be merged, and this may cause us to
> > report 2 different extents as a single one that is not shared but one of
> > them is shared (or the other way around). More details on this on patches
> > 9/10 and 10/10.
> >
> > Patches 1/10 and 2/10 are for lseek, introducing some code that will later
> > be used by fiemap too (patch 10/10). More details in the changelogs.
> >
> > There are a few more things that can be done to speed up fiemap and lseek,
> > but I'll leave those other optimizations I have in mind for some other time.
> >
> > Filipe Manana (10):
> >   btrfs: allow hole and data seeking to be interruptible
> >   btrfs: make hole and data seeking a lot more efficient
> >   btrfs: remove check for impossible block start for an extent map at fiemap
> >   btrfs: remove zero length check when entering fiemap
> >   btrfs: properly flush delalloc when entering fiemap
> >   btrfs: allow fiemap to be interruptible
> >   btrfs: rename btrfs_check_shared() to a more descriptive name
> >   btrfs: speedup checking for extent sharedness during fiemap
> >   btrfs: skip unnecessary extent buffer sharedness checks during fiemap
> >   btrfs: make fiemap more efficient and accurate reporting extent sharedness
> >
> >  fs/btrfs/backref.c     | 153 ++++++++-
> >  fs/btrfs/backref.h     |  20 +-
> >  fs/btrfs/ctree.h       |  22 +-
> >  fs/btrfs/extent-tree.c |  10 +-
> >  fs/btrfs/extent_io.c   | 703 ++++++++++++++++++++++++++++-------------
> >  fs/btrfs/file.c        | 439 +++++++++++++++++++++++--
> >  fs/btrfs/inode.c       | 146 ++-------
> >  7 files changed, 1111 insertions(+), 382 deletions(-)
>
>
> An infinite loop happens when the 10 patches are applied to 6.0-rc3.

No, it's not an infinite loop, and it also happens without the patchset.
The reason is that the files created by the test are very sparse, with
small extents: they are full of 4K extents surrounded by 8K holes.

So anyone doing hole seeking advances 8K on every lseek call.
If you strace the cp process with

strace -p <cp pid>

you'll see something like this filling your terminal:

(...)
lseek(3, 18808832, SEEK_SET)            = 18808832
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(3, "a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(4, "a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(3, 18817024, SEEK_SET)            = 18817024
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(3, "a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(4, "a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(3, 18825216, SEEK_SET)            = 18825216
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(3, "a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(4, "a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(3, 18833408, SEEK_SET)            = 18833408
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(3, "a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(4, "a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(3, 18841600, SEEK_SET)            = 18841600
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(3, "a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(4, "a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(3, 18849792, SEEK_SET)            = 18849792
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(3, "a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(4, "a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(3, 18857984, SEEK_SET)            = 18857984
(...)

It takes a long time, but it finishes. Notice that the difference
between consecutive lseek return values is exactly 8K.

That happens both before and after the patchset.
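
To make the cost concrete, here is a minimal sketch (not taken from cp or
from the patchset) of how a hole-aware reader might walk a sparse file with
SEEK_DATA and SEEK_HOLE, counting the lseek calls it issues. The helper names
data_segments and make_sparse_file, and the 4K-data-per-8K-stride layout, are
assumptions chosen to resemble the offsets in the trace above; the real test
file comes from pavels-test.c.

```python
import errno
import os
import tempfile


def data_segments(fd):
    """Walk an open file with SEEK_DATA/SEEK_HOLE.

    Returns (segments, lseek_calls), where segments is a list of
    (start, end) byte ranges that the filesystem reports as data.
    """
    size = os.fstat(fd).st_size
    segments, calls, offset = [], 0, 0
    while offset < size:
        calls += 1
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:  # no data past offset: trailing hole
                break
            raise
        calls += 1
        end = os.lseek(fd, start, os.SEEK_HOLE)  # EOF is an implicit hole
        segments.append((start, end))
        offset = end
    return segments, calls


def make_sparse_file(path, extents, stride=8192, extent_size=4096):
    """Hypothetical layout: extent_size bytes of data at every stride offset."""
    with open(path, "wb") as f:
        for i in range(extents):
            f.seek(i * stride)
            f.write(b"a" * extent_size)
        f.truncate(extents * stride)  # leave a hole after the last extent


def demo(extents=64):
    """Build a small sparse file and report (data segments, lseek calls)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sparse")
        make_sparse_file(path, extents)
        fd = os.open(path, os.O_RDONLY)
        try:
            segments, calls = data_segments(fd)
        finally:
            os.close(fd)
    return len(segments), calls
```

On a filesystem that reports holes at 4K granularity, the number of lseek
calls grows linearly with the number of extents, so a large file laid out
this way needs a very large number of calls even though each one returns
quickly. Filesystems that do not report holes simply return the whole file
as a single data segment.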

Thanks.


>
> A file is created by 'pavels-test.c' from [PATCH 10/10], and then
> '/bin/cp /mnt/test/file1 /dev/null' triggers an infinite loop.
>
> 'sysrq -l' output:
>
> [ 1437.765228] Call Trace:
> [ 1437.765228]  <TASK>
> [ 1437.765228]  set_extent_bit+0x33d/0x6e0 [btrfs]
> [ 1437.765228]  lock_extent_bits+0x64/0xa0 [btrfs]
> [ 1437.765228]  btrfs_file_llseek+0x192/0x5b0 [btrfs]
> [ 1437.765228]  ksys_lseek+0x64/0xb0
> [ 1437.765228]  do_syscall_64+0x58/0x80
> [ 1437.765228]  ? syscall_exit_to_user_mode+0x12/0x30
> [ 1437.765228]  ? do_syscall_64+0x67/0x80
> [ 1437.765228]  ? do_syscall_64+0x67/0x80
> [ 1437.765228]  ? exc_page_fault+0x64/0x140
> [ 1437.765228]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [ 1437.765228] RIP: 0033:0x7f5a263441bb
>
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2022/09/02
>
>
