linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Filipe Manana <fdmanana@gmail.com>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: Qu Wenruo <quwenruo.btrfs@gmx.com>, halfdog <me@halfdog.net>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: FIDEDUPERANGE woes
Date: Sat, 14 Dec 2019 11:21:11 +0000	[thread overview]
Message-ID: <CAL3q7H6Rj7VzdBh_bZaqosTwrBFgDs6jj0St5rqThvo-PCGO-g@mail.gmail.com> (raw)
In-Reply-To: <20191214051938.GA13306@hungrycats.org>

On Sat, Dec 14, 2019 at 5:46 AM Zygo Blaxell
<ce3g8jdj@umail.furryterror.org> wrote:
>
> On Fri, Dec 13, 2019 at 05:32:14PM +0800, Qu Wenruo wrote:
> >
> >
> > On 2019/12/13 上午12:15, halfdog wrote:
> > > Hello list,
> > >
> > > Using btrfs on
> > >
> > > Linux version 5.3.0-2-amd64 (debian-kernel@lists.debian.org) (gcc version 9.2.1 20191109 (Debian 9.2.1-19)) #1 SMP Debian 5.3.9-3 (2019-11-19)
> > >
> > > the FIDEDUPERANGE exposes weird behaviour on two identical but
> > > not too large files that seems to be depending on the file size.
> > > Before FIDEDUPERANGE both files have a single extent, afterwards
> > > first file is still single extent, second file has all bytes sharing
> > > with the extent of the first file except for the last 4096 bytes.
> > >
> > > Is there anything known about a bug fixed since the above mentioned
> > > kernel version?
> > >
> > >
> > >
> > > If no, does following reproducer still show the same behaviour
> > > on current Linux kernel (my Python test tools also attached)?
> > >
> > >> dd if=/dev/zero bs=1M count=32 of=disk
> > >> mkfs.btrfs --mixed --metadata single --data single --nodesize 4096 disk
> > >> mount disk /mnt/test
> > >> mkdir /mnt/test/x
> > >> dd bs=1 count=155489 if=/dev/urandom of=/mnt/test/x/file-0
> >
> > 155489 is not sector size aligned, thus the last extent will be padded
> > with zero.
> >
> > >> cat /mnt/test/x/file-0 > /mnt/test/x/file-1
> >
> > Same for the new file.
> >
> > For the tailing padding part, it's not aligned, and it's smaller than
> > the inode size.
> >
> > Thus we won't dedupe that tailing part.
>
> We definitely *must* dedupe the tailing part on btrfs; otherwise, we can't
> eliminate the reference to the last (partial) block in the last extent of
> the file, and there is no way to dedupe the _entire_ file in this example.
> It does pretty bad things to dedupe hit rates on uncompressed contiguous
> files, where you can lose an average of 64MB of space per file.
>
> I had been wondering why dedupe scores seemed so low on recent kernels,
> and this bug would certainly contribute to that.
>
> It worked in 4.20, broken in 5.0.  My guess is commit
> 34a28e3d77535efc7761aa8d67275c07d1fe2c58 ("Btrfs: use
> generic_remap_file_range_prep() for cloning and deduplication") but I
> haven't run a test to confirm.

The problem comes from the generic (vfs) code, which always rounds
down the deduplication length to a multiple of the filesystem's sector
size.
That should be done only when the destination range's end does not
match the destination's file size, to avoid the corruption I found
over a year ago [1].
For some odd reason it has the correct behavior for cloning, it only
rounds down the destination range's end is less then the destination's
file size.

I'll see if I get that fixed next week.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=de02b9f6bb65a6a1848f346f7a3617b7a9b930c0

>
>
> > Thanks,
> > Qu
> >
> > >
> > >> ./SimpleIndexer x > x.json
> > >> ./IndexDeduplicationAnalyzer --IndexFile /mnt/test/x.json /mnt/test/x > dedup.list
> > > Got dict: {b'/mnt/test/x/file-0': [(0, 5316608, 155648)], b'/mnt/test/x/file-1': [(0, 5472256, 155648)]}
> > > ...
> > >> strace -s256 -f btrfs-extent-same 155489 /mnt/test/x/file-0 0 /mnt/test/x/file-1 0 2>&1 | grep -E -e FIDEDUPERANGE
> > > ioctl(3, BTRFS_IOC_FILE_EXTENT_SAME or FIDEDUPERANGE, {src_offset=0, src_length=155489, dest_count=1, info=[{dest_fd=4, dest_offset=0}]} => {info=[{bytes_deduped=155489, status=0}]}) = 0
> > >> ./IndexDeduplicationAnalyzer --IndexFile /mnt/test/x.json /mnt/test/x > dedup.list
> > > Got dict: {b'/mnt/test/x/file-0': [(0, 5316608, 155648)], b'/mnt/test/x/file-1': [(0, 5316608, 151552), (151552, 5623808, 4096)]}
> > > ...
> > >> strace -s256 -f btrfs-extent-same 155489 /mnt/test/x/file-0 0 /mnt/test/x/file-1 0 2>&1 | grep -E -e FIDEDUPERANGE
> > > ioctl(3, BTRFS_IOC_FILE_EXTENT_SAME or FIDEDUPERANGE, {src_offset=0, src_length=155489, dest_count=1, info=[{dest_fd=4, dest_offset=0}]} => {info=[{bytes_deduped=155489, status=0}]}) = 0
> > >> strace -s256 -f btrfs-extent-same 4096 /mnt/test/x/file-0 151552 /mnt/test/x/file-1 151552 2>&1 | grep -E -e FIDEDUPERANGE
> > > ioctl(3, BTRFS_IOC_FILE_EXTENT_SAME or FIDEDUPERANGE, {src_offset=151552, src_length=4096, dest_count=1, info=[{dest_fd=4, dest_offset=151552}]}) = -1 EINVAL (Invalid argument)
> > >
> >
>
>
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

  reply	other threads:[~2019-12-14 11:21 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-12 16:15 FIDEDUPERANGE woes halfdog
2019-12-13  9:32 ` Qu Wenruo
2019-12-14  5:19   ` Zygo Blaxell
2019-12-14 11:21     ` Filipe Manana [this message]
2020-01-30 16:59 ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAL3q7H6Rj7VzdBh_bZaqosTwrBFgDs6jj0St5rqThvo-PCGO-g@mail.gmail.com \
    --to=fdmanana@gmail.com \
    --cc=ce3g8jdj@umail.furryterror.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=me@halfdog.net \
    --cc=quwenruo.btrfs@gmx.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).