All of lore.kernel.org
 help / color / mirror / Atom feed
From: Liu Bo <bo.li.liu@oracle.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: linux-btrfs@vger.kernel.org, dsterba@suse.cz
Subject: Re: [PATCH v3] btrfs: fiemap: Cache and merge fiemap extent before submit it to user
Date: Wed, 19 Apr 2017 19:08:17 -0700	[thread overview]
Message-ID: <20170420020817.GD15096@lim.localdomain> (raw)
In-Reply-To: <98ab28d8-7ab6-0d2d-92d6-38b78e572d39@cn.fujitsu.com>

On Thu, Apr 20, 2017 at 09:52:00AM +0800, Qu Wenruo wrote:
> 
> 
> At 04/20/2017 09:25 AM, Liu Bo wrote:
> > On Fri, Apr 07, 2017 at 10:43:15AM +0800, Qu Wenruo wrote:
> > > [BUG]
> > > Cycle mount btrfs can cause fiemap to return different result.
> > > Like:
> > >   # mount /dev/vdb5 /mnt/btrfs
> > >   # dd if=/dev/zero bs=16K count=4 oflag=dsync of=/mnt/btrfs/file
> > >   # xfs_io -c "fiemap -v" /mnt/btrfs/file
> > >   /mnt/test/file:
> > >   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
> > >     0: [0..127]:        25088..25215       128   0x1
> > >   # umount /mnt/btrfs
> > >   # mount /dev/vdb5 /mnt/btrfs
> > >   # xfs_io -c "fiemap -v" /mnt/btrfs/file
> > >   /mnt/test/file:
> > >   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
> > >     0: [0..31]:         25088..25119        32   0x0
> > >     1: [32..63]:        25120..25151        32   0x0
> > >     2: [64..95]:        25152..25183        32   0x0
> > >     3: [96..127]:       25184..25215        32   0x1
> > > But after above fiemap, we get correct merged result if we call fiemap
> > > again.
> > >   # xfs_io -c "fiemap -v" /mnt/btrfs/file
> > >   /mnt/test/file:
> > >   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
> > >     0: [0..127]:        25088..25215       128   0x1
> > > 
> > > [REASON]
> > > Btrfs will try to merge extent map when inserting new extent map.
> > > 
> > > btrfs_fiemap(start=0 len=(u64)-1)
> > > |- extent_fiemap(start=0 len=(u64)-1)
> > >     |- get_extent_skip_holes(start=0 len=64k)
> > >     |  |- btrfs_get_extent_fiemap(start=0 len=64k)
> > >     |     |- btrfs_get_extent(start=0 len=64k)
> > >     |        |  Found on-disk (ino, EXTENT_DATA, 0)
> > >     |        |- add_extent_mapping()
> > >     |        |- Return (em->start=0, len=16k)
> > >     |
> > >     |- fiemap_fill_next_extent(logic=0 phys=X len=16k)
> > >     |
> > >     |- get_extent_skip_holes(start=0 len=64k)
> > >     |  |- btrfs_get_extent_fiemap(start=0 len=64k)
> > >     |     |- btrfs_get_extent(start=16k len=48k)
> > >     |        |  Found on-disk (ino, EXTENT_DATA, 16k)
> > >     |        |- add_extent_mapping()
> > >     |        |  |- try_merge_map()
> > >     |        |     Merge with previous em start=0 len=16k
> > >     |        |     resulting em start=0 len=32k
> > >     |        |- Return (em->start=0, len=32K)    << Merged result
> > >     |- Stripe off the unrelated range (0~16K) of return em
> > >     |- fiemap_fill_next_extent(logic=16K phys=X+16K len=16K)
> > >        ^^^ Causing split fiemap extent.
> > > 
> > > And since in add_extent_mapping(), em is already merged, in next
> > > fiemap() call, we will get merged result.
> > > 
> > > [FIX]
> > > Here we introduce a new structure, fiemap_cache, which records previous
> > > fiemap extent.
> > > 
> > > And will always try to merge current fiemap_cache result before calling
> > > fiemap_fill_next_extent().
> > > Only when we failed to merge current fiemap extent with cached one, we
> > > will call fiemap_fill_next_extent() to submit cached one.
> > > 
> > > So by this method, we can merge all fiemap extents.
> > > 
> > 
> > If I understand it correctly, what it's missing currently is 'merge', a
> > straightfoward idea is to make use of the 'merge' ability of btrfs_get_extent. >
> > Since btrfs_get_extent_fiemap is a wrapper of btrfs_get_extent, does it make
> > sense if we make btrfs_get_extent_fiemap iterate all extent maps within the
> > range passing to it or just one more extent map to check if btrfs_get_extent
> > could return a merged extent map before returning?
> 
> So your idea to to do the merging inside btrfs_get_extent(), instead of
> merging it in extent_fiemap()?
> 
> In theory, they have the same effect for fiemap.
> And that's my original idea.
> 
> But the problem is, btrfs_get_extent() is called almost everywhere, more
> frequently than extent_fiemap(), the extra time used to merging extent map
> may cause performance problem.
> 
> For the worst case, if a file is made up by 262144 4K extent (takes up 1G),
> btrfs_get_extent() call on the file will iterate all these extents,
> which will iterate more than 500 16K tree blocks.
> 
> If only fiemap takes such longer time, it's still acceptable. But if
> btrfs_get_extent() takes such longer time, I think it will be a problem
> then.
>

Not related to the patch, but the question is, does fiemap ioctl think
it is important to have unified output?

The filefrag manual only says it's attempting to get extent
information.  And this inconsistent output of filefrag doesn't seem to
confusing users as the reported extent count is correct.

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>

Thanks,

-liubo

  parent reply	other threads:[~2017-04-20  2:09 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-07  2:43 [PATCH v3] btrfs: fiemap: Cache and merge fiemap extent before submit it to user Qu Wenruo
2017-04-12 15:05 ` David Sterba
2017-04-13  0:36   ` Qu Wenruo
2017-05-05 17:41     ` David Sterba
2017-04-20  1:25 ` Liu Bo
2017-04-20  1:52   ` Qu Wenruo
2017-04-20  1:58     ` Liu Bo
2017-04-20  2:09       ` Qu Wenruo
2017-04-20 19:52         ` Liu Bo
2017-05-05 17:38           ` David Sterba
2017-05-05 17:36         ` David Sterba
2017-04-20  2:08     ` Liu Bo [this message]
2017-04-20  2:16       ` Qu Wenruo
2017-06-16 12:33 ` David Sterba
2017-06-17  7:43   ` Qu Wenruo
2017-06-17  8:30     ` Adam Borowski
2017-06-17 13:28       ` Qu Wenruo
2017-06-17 14:57         ` Adam Borowski
2017-06-17 21:24         ` Adam Borowski
2017-06-18  9:38           ` Qu Wenruo
2017-06-18 11:23             ` Qu Wenruo
2017-06-18 11:56               ` Holger Hoffstätte
2017-06-18 13:42               ` Adam Borowski
2017-06-21  8:13                 ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170420020817.GD15096@lim.localdomain \
    --to=bo.li.liu@oracle.com \
    --cc=dsterba@suse.cz \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.