linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 09/10] btrfs: skip unnecessary extent buffer sharedness checks during fiemap
Date: Thu, 1 Sep 2022 10:26:55 -0400	[thread overview]
Message-ID: <YxDBL5qXmAU9qllX@localhost.localdomain> (raw)
In-Reply-To: <d80f75e12d0212da59cbcccac2eddd506c8998af.1662022922.git.fdmanana@suse.com>

On Thu, Sep 01, 2022 at 02:18:29PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> During fiemap, for each file extent we find, we must check if it's shared
> or not. The sharedness check starts by verifying if the extent is directly
> shared (its refcount in the extent tree is > 1), and if it is not directly
> shared, then we will check if every node in the subvolume b+tree leading
> from the root to the leaf that has the file extent item (in reverse order),
> is shared (through snapshots).
> 
> However this second step is not needed if our extent was created in a
> transaction more recent than the last transaction where a snapshot of the
> inode's root happened, because it can't be shared indirectly (through
> shared subtrees) without a snapshot created in a more recent transaction.
> 
> So grab the generation of the extent from the extent map and pass it to
> btrfs_is_data_extent_shared(), which will skip this second phase when the
> generation is more recent than the root's last snapshot value. Note that
> we skip this optimization if the extent map is the result of merging 2
> or more extent maps, because in this case its generation is the maximum
> of the generations of all merged extent maps.
> 
> The fact the we use extent maps and they can be merged despite the
> underlying extents being distinct (different file extent items in the
> subvolume b+tree and different extent items in the extent b+tree), can
> result in some bugs when reporting shared extents. But this is a problem
> of the current implementation of fiemap relying on extent maps.
> One example where we get incorrect results is:
> 
>     $ cat fiemap-bug.sh
>     #!/bin/bash
> 
>     DEV=/dev/sdj
>     MNT=/mnt/sdj
> 
>     mkfs.btrfs -f $DEV
>     mount $DEV $MNT
> 
>     # Create a file with two 256K extents.
>     # Since there is no other write activity, they will be contiguous,
>     # and their extent maps merged, despite having two distinct extents.
>     xfs_io -f -c "pwrite -S 0xab 0 256K" \
>               -c "fsync" \
>               -c "pwrite -S 0xcd 256K 256K" \
>               -c "fsync" \
>               $MNT/foo
> 
>     # Now clone only the second extent into another file.
>     xfs_io -f -c "reflink $MNT/foo 256K 0 256K" $MNT/bar
> 
>     # Filefrag will report a single 512K extent, and say it's not shared.
>     echo
>     filefrag -v $MNT/foo
> 
>     umount $MNT
> 
> Running the reproducer:
> 
>     $ ./fiemap-bug.sh
>     wrote 262144/262144 bytes at offset 0
>     256 KiB, 64 ops; 0.0038 sec (65.479 MiB/sec and 16762.7030 ops/sec)
>     wrote 262144/262144 bytes at offset 262144
>     256 KiB, 64 ops; 0.0040 sec (61.125 MiB/sec and 15647.9218 ops/sec)
>     linked 262144/262144 bytes at offset 0
>     256 KiB, 1 ops; 0.0002 sec (1.034 GiB/sec and 4237.2881 ops/sec)
> 
>     Filesystem type is: 9123683e
>     File size of /mnt/sdj/foo is 524288 (128 blocks of 4096 bytes)
>      ext:     logical_offset:        physical_offset: length:   expected: flags:
>        0:        0..     127:       3328..      3455:    128:             last,eof
>     /mnt/sdj/foo: 1 extent found
> 
> We end up reporting that we have a single 512K that is not shared, however
> we have two 256K extents, and the second one is shared. Changing the
> reproducer to clone instead the first extent into file 'bar', makes us
> report a single 512K extent that is shared, which is algo incorrect since
> we have two 256K extents and only the first one is shared.
> 
> This is z problem that existed before this change, and remains after this
> change, as it can't be easily fixed. The next patch in the series reworks
> fiemap to primarily use file extent items instead of extent maps (except
> for checking for delalloc ranges), with the goal of improving its
> scalability and performance, but it also ends up fixing this particular
> bug caused by extent map merging.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

  reply	other threads:[~2022-09-01 14:27 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-01 13:18 [PATCH 00/10] btrfs: make lseek and fiemap much more efficient fdmanana
2022-09-01 13:18 ` [PATCH 01/10] btrfs: allow hole and data seeking to be interruptible fdmanana
2022-09-01 13:58   ` Josef Bacik
2022-09-01 21:49   ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 02/10] btrfs: make hole and data seeking a lot more efficient fdmanana
2022-09-01 14:03   ` Josef Bacik
2022-09-01 15:00     ` Filipe Manana
2022-09-02 13:26       ` Josef Bacik
2022-09-01 22:18   ` Qu Wenruo
2022-09-02  8:36     ` Filipe Manana
2022-09-11 22:12   ` Qu Wenruo
2022-09-12  8:38     ` Filipe Manana
2022-09-01 13:18 ` [PATCH 03/10] btrfs: remove check for impossible block start for an extent map at fiemap fdmanana
2022-09-01 14:03   ` Josef Bacik
2022-09-01 22:19   ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 04/10] btrfs: remove zero length check when entering fiemap fdmanana
2022-09-01 14:04   ` Josef Bacik
2022-09-01 22:24   ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 05/10] btrfs: properly flush delalloc " fdmanana
2022-09-01 14:06   ` Josef Bacik
2022-09-01 22:38   ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 06/10] btrfs: allow fiemap to be interruptible fdmanana
2022-09-01 14:07   ` Josef Bacik
2022-09-01 22:42   ` Qu Wenruo
2022-09-02  8:38     ` Filipe Manana
2022-09-01 13:18 ` [PATCH 07/10] btrfs: rename btrfs_check_shared() to a more descriptive name fdmanana
2022-09-01 14:08   ` Josef Bacik
2022-09-01 22:45   ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 08/10] btrfs: speedup checking for extent sharedness during fiemap fdmanana
2022-09-01 14:23   ` Josef Bacik
2022-09-01 22:50   ` Qu Wenruo
2022-09-02  8:46     ` Filipe Manana
2022-09-01 13:18 ` [PATCH 09/10] btrfs: skip unnecessary extent buffer sharedness checks " fdmanana
2022-09-01 14:26   ` Josef Bacik [this message]
2022-09-01 23:01   ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 10/10] btrfs: make fiemap more efficient and accurate reporting extent sharedness fdmanana
2022-09-01 14:35   ` Josef Bacik
2022-09-01 15:04     ` Filipe Manana
2022-09-02 13:25       ` Josef Bacik
2022-09-01 23:27   ` Qu Wenruo
2022-09-02  8:59     ` Filipe Manana
2022-09-02  9:34       ` Qu Wenruo
2022-09-02  9:41         ` Filipe Manana
2022-09-02  9:50           ` Qu Wenruo
2022-09-02  0:53 ` [PATCH 00/10] btrfs: make lseek and fiemap much more efficient Wang Yugui
2022-09-02  8:24   ` Filipe Manana
2022-09-02 11:41     ` Wang Yugui
2022-09-02 11:45     ` Filipe Manana
2022-09-05 14:39       ` Filipe Manana
2022-09-06 16:20 ` David Sterba
2022-09-06 17:13   ` Filipe Manana
2022-09-07  9:12 ` Christoph Hellwig
2022-09-07  9:47   ` Filipe Manana

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YxDBL5qXmAU9qllX@localhost.localdomain \
    --to=josef@toxicpanda.com \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).