All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Sterba <dsterba@suse.cz>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: dsterba@suse.cz, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v3] btrfs: fiemap: Cache and merge fiemap extent before submit it to user
Date: Fri, 5 May 2017 19:41:45 +0200	[thread overview]
Message-ID: <20170505174145.GK10575@twin.jikos.cz> (raw)
In-Reply-To: <f7edd49d-8955-2e4e-1122-8b5026f14eb6@cn.fujitsu.com>

On Thu, Apr 13, 2017 at 08:36:24AM +0800, Qu Wenruo wrote:
> 
> 
> At 04/12/2017 11:05 PM, David Sterba wrote:
> > On Fri, Apr 07, 2017 at 10:43:15AM +0800, Qu Wenruo wrote:
> >> [BUG]
> >> Cycle mount btrfs can cause fiemap to return different result.
> >> Like:
> >>   # mount /dev/vdb5 /mnt/btrfs
> >>   # dd if=/dev/zero bs=16K count=4 oflag=dsync of=/mnt/btrfs/file
> >>   # xfs_io -c "fiemap -v" /mnt/btrfs/file
> >>   /mnt/test/file:
> >>   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
> >>     0: [0..127]:        25088..25215       128   0x1
> >>   # umount /mnt/btrfs
> >>   # mount /dev/vdb5 /mnt/btrfs
> >>   # xfs_io -c "fiemap -v" /mnt/btrfs/file
> >>   /mnt/test/file:
> >>   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
> >>     0: [0..31]:         25088..25119        32   0x0
> >>     1: [32..63]:        25120..25151        32   0x0
> >>     2: [64..95]:        25152..25183        32   0x0
> >>     3: [96..127]:       25184..25215        32   0x1
> >> But after above fiemap, we get correct merged result if we call fiemap
> >> again.
> >>   # xfs_io -c "fiemap -v" /mnt/btrfs/file
> >>   /mnt/test/file:
> >>   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
> >>     0: [0..127]:        25088..25215       128   0x1
> >>
> >> [REASON]
> >> Btrfs will try to merge extent map when inserting new extent map.
> >>
> >> btrfs_fiemap(start=0 len=(u64)-1)
> >> |- extent_fiemap(start=0 len=(u64)-1)
> >>     |- get_extent_skip_holes(start=0 len=64k)
> >>     |  |- btrfs_get_extent_fiemap(start=0 len=64k)
> >>     |     |- btrfs_get_extent(start=0 len=64k)
> >>     |        |  Found on-disk (ino, EXTENT_DATA, 0)
> >>     |        |- add_extent_mapping()
> >>     |        |- Return (em->start=0, len=16k)
> >>     |
> >>     |- fiemap_fill_next_extent(logic=0 phys=X len=16k)
> >>     |
> >>     |- get_extent_skip_holes(start=0 len=64k)
> >>     |  |- btrfs_get_extent_fiemap(start=0 len=64k)
> >>     |     |- btrfs_get_extent(start=16k len=48k)
> >>     |        |  Found on-disk (ino, EXTENT_DATA, 16k)
> >>     |        |- add_extent_mapping()
> >>     |        |  |- try_merge_map()
> >>     |        |     Merge with previous em start=0 len=16k
> >>     |        |     resulting em start=0 len=32k
> >>     |        |- Return (em->start=0, len=32K)    << Merged result
> >>     |- Stripe off the unrelated range (0~16K) of return em
> >>     |- fiemap_fill_next_extent(logic=16K phys=X+16K len=16K)
> >>        ^^^ Causing split fiemap extent.
> >>
> >> And since in add_extent_mapping(), em is already merged, in next
> >> fiemap() call, we will get merged result.
> >>
> >> [FIX]
> >> Here we introduce a new structure, fiemap_cache, which records previous
> >> fiemap extent.
> >>
> >> And will always try to merge current fiemap_cache result before calling
> >> fiemap_fill_next_extent().
> >> Only when we failed to merge current fiemap extent with cached one, we
> >> will call fiemap_fill_next_extent() to submit cached one.
> >>
> >> So by this method, we can merge all fiemap extents.
> > 
> > The cache gets reset on each call to extent_fiemap, so if fi_extents_max
> > is 1, the cache will be always unset and we'll never merge anything. The
> > same can happen if the number of extents reaches the limit
> > (FIEMAP_MAX_EXTENTS or any other depending on the ioctl caller). And
> > this leads to the unmerged extents.
> 
> Nope, extents will still be merged, as long as they can be merged.
> 
> The fiemap extent is only submitted if we found an unmergeable extent.
> 
> Even fi_extents_max is 1, it still possible for us to merge extents.
> 
> File A:
> Extent 1: offset=0 len=4k phys=X
> Extent 2: offset=4k len=4k phys=X+4
> Extent 3: offset=8k len=4k phys=Y
> 
> 1) Found Extent 1
>     Cache it, not submitted yet.
> 2) Found Extent 2
>     Merge it with cached one, not submitted yet.
> 3) Found Extent 3
>     Can't merge, submit cached first.
>     Submitted one reach fi_extents_max, exit current extent_fiemap.
> 
> 4) Next fiemap call starts from offset 8K,
>     Extent 3 is the last extent, no need to cache just submit.
> 
> So we still got merged fiemap extent, without anything wrong.
> 
> The point is, fi_extents_max or other limit can only be merged when we 
> submit fiemap_extent, in that case either we found unmergable extent, or 
> we already hit the last extent.

Thanks for the explanation.

> >> It can also be done in fs/ioctl.c, however the problem is if
> >> fieinfo->fi_extents_max == 0, we have no space to cache previous fiemap
> >> extent.
> > 
> > I don't see why, it's the same code path, no?
> 
> My original design in VFS is to check if we can merge current fiemap 
> extent with the last one in fiemap_info.
> 
> But for fi_extents_max == 0 case, fiemap_info doesn't store any extent 
> so that's not possible.
> 
> 
> So for fi_extents_max == 0 case, either do it in each fs like what we 
> are doing, or introduce a new function like fiemap_cache_next_extent() 
> with reference to cached structure.
> 
> > 
> >> So I choose to merge it in btrfs.
> > 
> > Lifting that to the vfs interface is probably not the right approach.
> > The ioctl has never done any postprocessing of the data returned by
> > filesystems, it's really up to the filesystem to prepare the data.
> 
> OK, let's keep it in btrfs.
> 
> > 
> >> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> >> ---
> >> v2:
> >>    Since fiemap_extent_info has a limit for number of fiemap_extent, it's possible
> >>    that fiemap_fill_next_extent() return 1 halfway. Remove the WARN_ON() which can
> >>    cause kernel warning if we fiemap is called on large compressed file.
> >> v3:
> >>    Rename finish_fiemap_extent() to check_fiemap_extent(), as in v3 we ensured
> >>    submit_fiemap_extent() to submit fiemap cache, so it just acts as a
> >>    sanity check.
> >>    Remove BTRFS_MAX_EXTENT_SIZE limit in submit_fiemap_extent(), as
> >>    extent map can be larger than BTRFS_MAX_EXTENT_SIZE.
> >>    Don't do backward jump, suggested by David.
> >>    Better sanity check and recoverable fix.
> >>
> >> To David:
> >>    What about adding a btrfs_debug_warn(), which will only call WARN_ON(1) if
> >>    BTRFS_CONFIG_DEBUG is specified for recoverable bug?
> >>
> >>    And modify ASSERT() to always WARN_ON() and exit error code?
> > 
> > That's for a separate discussion.
> > 
> >> ---
> >>   fs/btrfs/extent_io.c | 124 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> >>   1 file changed, 122 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> >> index 28e8192..c4cb65d 100644
> >> --- a/fs/btrfs/extent_io.c
> >> +++ b/fs/btrfs/extent_io.c
> >> @@ -4353,6 +4353,123 @@ static struct extent_map *get_extent_skip_holes(struct inode *inode,
> >>   	return NULL;
> >>   }
> >>   
> >> +/*
> >> + * To cache previous fiemap extent
> >> + *
> >> + * Will be used for merging fiemap extent
> >> + */
> >> +struct fiemap_cache {
> >> +	u64 offset;
> >> +	u64 phys;
> >> +	u64 len;
> >> +	u32 flags;
> >> +	bool cached;
> >> +};
> >> +
> >> +/*
> >> + * Helper to submit fiemap extent.
> >> + *
> >> + * Will try to merge current fiemap extent specified by @offset, @phys,
> >> + * @len and @flags with cached one.
> >> + * And only when we fails to merge, cached one will be submitted as
> >> + * fiemap extent.
> >> + *
> >> + * Return value is the same as fiemap_fill_next_extent().
> >> + */
> >> +static int submit_fiemap_extent(struct fiemap_extent_info *fieinfo,
> > 
> > I'd suggest to rename it to emmit_fiemap_extent
> 
> I'm OK to change it in next version.

As we agree on the patch, I'll rename the varaible and you don't need to
send a new version. I made a typo, the intended name was emit_fiemap_extent.

  reply	other threads:[~2017-05-05 17:41 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-07  2:43 [PATCH v3] btrfs: fiemap: Cache and merge fiemap extent before submit it to user Qu Wenruo
2017-04-12 15:05 ` David Sterba
2017-04-13  0:36   ` Qu Wenruo
2017-05-05 17:41     ` David Sterba [this message]
2017-04-20  1:25 ` Liu Bo
2017-04-20  1:52   ` Qu Wenruo
2017-04-20  1:58     ` Liu Bo
2017-04-20  2:09       ` Qu Wenruo
2017-04-20 19:52         ` Liu Bo
2017-05-05 17:38           ` David Sterba
2017-05-05 17:36         ` David Sterba
2017-04-20  2:08     ` Liu Bo
2017-04-20  2:16       ` Qu Wenruo
2017-06-16 12:33 ` David Sterba
2017-06-17  7:43   ` Qu Wenruo
2017-06-17  8:30     ` Adam Borowski
2017-06-17 13:28       ` Qu Wenruo
2017-06-17 14:57         ` Adam Borowski
2017-06-17 21:24         ` Adam Borowski
2017-06-18  9:38           ` Qu Wenruo
2017-06-18 11:23             ` Qu Wenruo
2017-06-18 11:56               ` Holger Hoffstätte
2017-06-18 13:42               ` Adam Borowski
2017-06-21  8:13                 ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170505174145.GK10575@twin.jikos.cz \
    --to=dsterba@suse.cz \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.