From: Josef Bacik <josef@toxicpanda.com>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 09/10] btrfs: skip unnecessary extent buffer sharedness checks during fiemap
Date: Thu, 1 Sep 2022 10:26:55 -0400 [thread overview]
Message-ID: <YxDBL5qXmAU9qllX@localhost.localdomain> (raw)
In-Reply-To: <d80f75e12d0212da59cbcccac2eddd506c8998af.1662022922.git.fdmanana@suse.com>
On Thu, Sep 01, 2022 at 02:18:29PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> During fiemap, for each file extent we find, we must check if it's shared
> or not. The sharedness check starts by verifying if the extent is directly
> shared (its refcount in the extent tree is > 1), and if it is not directly
> shared, then we will check if every node in the subvolume b+tree leading
> from the root to the leaf that has the file extent item (in reverse order),
> is shared (through snapshots).
>
> However this second step is not needed if our extent was created in a
> transaction more recent than the last transaction where a snapshot of the
> inode's root happened, because it can't be shared indirectly (through
> shared subtrees) without a snapshot created in a more recent transaction.
>
> So grab the generation of the extent from the extent map and pass it to
> btrfs_is_data_extent_shared(), which will skip this second phase when the
> generation is more recent than the root's last snapshot value. Note that
> we skip this optimization if the extent map is the result of merging 2
> or more extent maps, because in this case its generation is the maximum
> of the generations of all merged extent maps.
>
> The fact the we use extent maps and they can be merged despite the
> underlying extents being distinct (different file extent items in the
> subvolume b+tree and different extent items in the extent b+tree), can
> result in some bugs when reporting shared extents. But this is a problem
> of the current implementation of fiemap relying on extent maps.
> One example where we get incorrect results is:
>
> $ cat fiemap-bug.sh
> #!/bin/bash
>
> DEV=/dev/sdj
> MNT=/mnt/sdj
>
> mkfs.btrfs -f $DEV
> mount $DEV $MNT
>
> # Create a file with two 256K extents.
> # Since there is no other write activity, they will be contiguous,
> # and their extent maps merged, despite having two distinct extents.
> xfs_io -f -c "pwrite -S 0xab 0 256K" \
> -c "fsync" \
> -c "pwrite -S 0xcd 256K 256K" \
> -c "fsync" \
> $MNT/foo
>
> # Now clone only the second extent into another file.
> xfs_io -f -c "reflink $MNT/foo 256K 0 256K" $MNT/bar
>
> # Filefrag will report a single 512K extent, and say it's not shared.
> echo
> filefrag -v $MNT/foo
>
> umount $MNT
>
> Running the reproducer:
>
> $ ./fiemap-bug.sh
> wrote 262144/262144 bytes at offset 0
> 256 KiB, 64 ops; 0.0038 sec (65.479 MiB/sec and 16762.7030 ops/sec)
> wrote 262144/262144 bytes at offset 262144
> 256 KiB, 64 ops; 0.0040 sec (61.125 MiB/sec and 15647.9218 ops/sec)
> linked 262144/262144 bytes at offset 0
> 256 KiB, 1 ops; 0.0002 sec (1.034 GiB/sec and 4237.2881 ops/sec)
>
> Filesystem type is: 9123683e
> File size of /mnt/sdj/foo is 524288 (128 blocks of 4096 bytes)
> ext: logical_offset: physical_offset: length: expected: flags:
> 0: 0.. 127: 3328.. 3455: 128: last,eof
> /mnt/sdj/foo: 1 extent found
>
> We end up reporting that we have a single 512K that is not shared, however
> we have two 256K extents, and the second one is shared. Changing the
> reproducer to clone instead the first extent into file 'bar', makes us
> report a single 512K extent that is shared, which is algo incorrect since
> we have two 256K extents and only the first one is shared.
>
> This is z problem that existed before this change, and remains after this
> change, as it can't be easily fixed. The next patch in the series reworks
> fiemap to primarily use file extent items instead of extent maps (except
> for checking for delalloc ranges), with the goal of improving its
> scalability and performance, but it also ends up fixing this particular
> bug caused by extent map merging.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
next prev parent reply other threads:[~2022-09-01 14:27 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-01 13:18 [PATCH 00/10] btrfs: make lseek and fiemap much more efficient fdmanana
2022-09-01 13:18 ` [PATCH 01/10] btrfs: allow hole and data seeking to be interruptible fdmanana
2022-09-01 13:58 ` Josef Bacik
2022-09-01 21:49 ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 02/10] btrfs: make hole and data seeking a lot more efficient fdmanana
2022-09-01 14:03 ` Josef Bacik
2022-09-01 15:00 ` Filipe Manana
2022-09-02 13:26 ` Josef Bacik
2022-09-01 22:18 ` Qu Wenruo
2022-09-02 8:36 ` Filipe Manana
2022-09-11 22:12 ` Qu Wenruo
2022-09-12 8:38 ` Filipe Manana
2022-09-01 13:18 ` [PATCH 03/10] btrfs: remove check for impossible block start for an extent map at fiemap fdmanana
2022-09-01 14:03 ` Josef Bacik
2022-09-01 22:19 ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 04/10] btrfs: remove zero length check when entering fiemap fdmanana
2022-09-01 14:04 ` Josef Bacik
2022-09-01 22:24 ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 05/10] btrfs: properly flush delalloc " fdmanana
2022-09-01 14:06 ` Josef Bacik
2022-09-01 22:38 ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 06/10] btrfs: allow fiemap to be interruptible fdmanana
2022-09-01 14:07 ` Josef Bacik
2022-09-01 22:42 ` Qu Wenruo
2022-09-02 8:38 ` Filipe Manana
2022-09-01 13:18 ` [PATCH 07/10] btrfs: rename btrfs_check_shared() to a more descriptive name fdmanana
2022-09-01 14:08 ` Josef Bacik
2022-09-01 22:45 ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 08/10] btrfs: speedup checking for extent sharedness during fiemap fdmanana
2022-09-01 14:23 ` Josef Bacik
2022-09-01 22:50 ` Qu Wenruo
2022-09-02 8:46 ` Filipe Manana
2022-09-01 13:18 ` [PATCH 09/10] btrfs: skip unnecessary extent buffer sharedness checks " fdmanana
2022-09-01 14:26 ` Josef Bacik [this message]
2022-09-01 23:01 ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 10/10] btrfs: make fiemap more efficient and accurate reporting extent sharedness fdmanana
2022-09-01 14:35 ` Josef Bacik
2022-09-01 15:04 ` Filipe Manana
2022-09-02 13:25 ` Josef Bacik
2022-09-01 23:27 ` Qu Wenruo
2022-09-02 8:59 ` Filipe Manana
2022-09-02 9:34 ` Qu Wenruo
2022-09-02 9:41 ` Filipe Manana
2022-09-02 9:50 ` Qu Wenruo
2022-09-02 0:53 ` [PATCH 00/10] btrfs: make lseek and fiemap much more efficient Wang Yugui
2022-09-02 8:24 ` Filipe Manana
2022-09-02 11:41 ` Wang Yugui
2022-09-02 11:45 ` Filipe Manana
2022-09-05 14:39 ` Filipe Manana
2022-09-06 16:20 ` David Sterba
2022-09-06 17:13 ` Filipe Manana
2022-09-07 9:12 ` Christoph Hellwig
2022-09-07 9:47 ` Filipe Manana
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YxDBL5qXmAU9qllX@localhost.localdomain \
--to=josef@toxicpanda.com \
--cc=fdmanana@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).