All of lore.kernel.org
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: fdmanana@kernel.org, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: send, fix invalid clone operations when cloning from the same file and root
Date: Mon, 11 Jan 2021 11:06:25 -0500	[thread overview]
Message-ID: <83c11071-2e0c-ef9d-4e6c-072b9027b166@toxicpanda.com> (raw)
In-Reply-To: <900493c40f7edbd42fe861ccd9a68851ea952499.1610363502.git.fdmanana@suse.com>

On 1/11/21 6:41 AM, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When an incremental send finds an extent that is shared, it checks which
> file extent items in the range refer to that extent, and for those it
> emits clone operations, while for others it emits regular write operations
> to avoid corruption at the destination (as described and fixed by commit
> d906d49fc5f4 ("Btrfs: send, fix file corruption due to incorrect cloning
> operations")).
> 
> However when the root we are cloning from is the send root, we are cloning
> from the inode currently being processed and the source file range has
> several extent items that partially point to the desired extent, with an
> offset smaller than the offset in the file extent item for the range we
> want to clone into, it can cause the algorithm to issue a clone operation
> that starts at the current eof of the file being processed in the receiver
> side, in which case the receiver will fail, with -EINVAL, when attempting
> to execute the clone operation.
> 
> Example reproducer:
> 
>    $ cat test-send-clone.sh
>    #!/bin/bash
> 
>    DEV=/dev/sdi
>    MNT=/mnt/sdi
> 
>    mkfs.btrfs -f $DEV >/dev/null
>    mount $DEV $MNT
> 
>    # Create our test file with a single and large extent (1M) and with
>    # different content for different file ranges that will be reflinked
>    # later.
>    xfs_io -f \
>           -c "pwrite -S 0xab 0 128K" \
>           -c "pwrite -S 0xcd 128K 128K" \
>           -c "pwrite -S 0xef 256K 256K" \
>           -c "pwrite -S 0x1a 512K 512K" \
>           $MNT/foobar
> 
>    btrfs subvolume snapshot -r $MNT $MNT/snap1
>    btrfs send -f /tmp/snap1.send $MNT/snap1
> 
>    # Now do a series of changes to our file such that we end up with
>    # different parts of the extent reflinked into different file offsets
>    # and we overwrite a large part of the extent too, so no file extent
>    # items refer to that part that was overwritten. This used to confure
>    # the algorithm used by the kernel to figure out which file ranges to
>    # clone, making it attempt to clone from a source range starting at
>    # the current eof of the file, resulting in the receiver to fail since
>    # it is an invalid clone operation.
>    #
>    xfs_io -c "reflink $MNT/foobar 64K 1M 960K" \
>           -c "reflink $MNT/foobar 0K 512K 256K" \
>           -c "reflink $MNT/foobar 512K 128K 256K" \
>           -c "pwrite -S 0x73 384K 640K" \
>           $MNT/foobar
> 
>    btrfs subvolume snapshot -r $MNT $MNT/snap2
>    btrfs send -f /tmp/snap2.send -p $MNT/snap1 $MNT/snap2
> 
>    echo -e "\nFile digest in the original filesystem:"
>    md5sum $MNT/snap2/foobar
> 
>    # Now unmount the filesystem, create a new one, mount it and try to
>    # apply both send streams to recreate both snapshots.
>    umount $DEV
> 
>    mkfs.btrfs -f $DEV >/dev/null
>    mount $DEV $MNT
> 
>    btrfs receive -f /tmp/snap1.send $MNT
>    btrfs receive -f /tmp/snap2.send $MNT
> 
>    # Must match what we got in the original filesystem of course.
>    echo -e "\nFile digest in the new filesystem:"
>    md5sum $MNT/snap2/foobar
> 
>    umount $MNT
> 
> When running the reproducer, the incremental send operation fails due to
> an invalid clone operation:
> 
>    $ ./test-send-clone.sh
>    wrote 131072/131072 bytes at offset 0
>    128 KiB, 32 ops; 0.0015 sec (80.906 MiB/sec and 20711.9741 ops/sec)
>    wrote 131072/131072 bytes at offset 131072
>    128 KiB, 32 ops; 0.0013 sec (90.514 MiB/sec and 23171.6148 ops/sec)
>    wrote 262144/262144 bytes at offset 262144
>    256 KiB, 64 ops; 0.0025 sec (98.270 MiB/sec and 25157.2327 ops/sec)
>    wrote 524288/524288 bytes at offset 524288
>    512 KiB, 128 ops; 0.0052 sec (95.730 MiB/sec and 24506.9883 ops/sec)
>    Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1'
>    At subvol /mnt/sdi/snap1
>    linked 983040/983040 bytes at offset 1048576
>    960 KiB, 1 ops; 0.0006 sec (1.419 GiB/sec and 1550.3876 ops/sec)
>    linked 262144/262144 bytes at offset 524288
>    256 KiB, 1 ops; 0.0020 sec (120.192 MiB/sec and 480.7692 ops/sec)
>    linked 262144/262144 bytes at offset 131072
>    256 KiB, 1 ops; 0.0018 sec (133.833 MiB/sec and 535.3319 ops/sec)
>    wrote 655360/655360 bytes at offset 393216
>    640 KiB, 160 ops; 0.0093 sec (66.781 MiB/sec and 17095.8436 ops/sec)
>    Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2'
>    At subvol /mnt/sdi/snap2
> 
>    File digest in the original filesystem:
>    9c13c61cb0b9f5abf45344375cb04dfa  /mnt/sdi/snap2/foobar
>    At subvol snap1
>    At snapshot snap2
>    ERROR: failed to clone extents to foobar: Invalid argument
> 
>    File digest in the new filesystem:
>    132f0396da8f48d2e667196bff882cfc  /mnt/sdi/snap2/foobar
> 
> The clone operation is invalid because its source range starts at the
> current eof of the file in the receiver, causing the receiver to get
> an -EINVAL error from the clone operation when attempting it.
> 
> For the example above, what happens is the following:
> 
> 1) When processing the extent at file offset 1M, the algorithm checks that
>     the extent is shared and can be (fully or partially) found at file
>     offset 0.
> 
>     At this point the file has a size (and eof) of 1M at the receiver;
> 
> 2) It finds that our extent item at file offset 1M has a data offset of
>     64K and, since the file extent item at file offset 0 has a data offset
>     of 0, it issues a clone operation, from the same file and root, that
>     has a source range offset of 64K, destination offset of 1M and a length
>     of 64K, since the extent item at file offset 0 refers only to the first
>     128K of the shared extent.
> 
>     After this clone operation, the file size (and eof) at the receiver is
>     increased from 1M to 1088K (1M + 64K);
> 
> 3) Now there's still 896K (960K - 64K) of data left to clone or write, so
>     it checks for the next file extent item, which starts at file offset
>     128K. This file extent item has a data offset of 0 and a length of
>     256K, so a clone operation with a source range offset of 256K, a
>     destination offset of 1088K (1M + 64K) and length of 128K is issued.
> 
>     After this operation the file size (and eof) at the receiver increases
>     from 1088K to 1216K (1088K + 128K);
> 
> 4) Now there's still 768K (896K - 128K) of data left to clone or write, so
>     it checks for the next file extent item, located at file offset 384K.
>     This file extent item points to a different extent, not the one we want
>     to clone, with a length of 640K. So we issue a write operation into the
>     file range 1216K (1088K + 128K, end of the last clone operation), with
>     a length of 640K and with a data matching the one we can find for that
>     range in send root.
> 
>     After this operation, the file size (and eof) at the receiver increases
>     from 1216K to 1856K (1216K + 640K);
> 
> 5) Now there's still 128K (768K - 640K) of data left to clone or write, so
>     we look into the file extent item, which is for file offset 1M and it
>     points to the extent we want to clone, with a data offset of 64K and a
>     length of 960K.
> 
>     However this matches the file offset we started with, the start of the
>     range to clone into. So we can't for sure find any file extent item
>     from here onwards with the rest of the data we want to clone, yet we
>     proceed and since the file extent item points to the shared extent,
>     with a data offset of 64K, we issue a clone operation with a source
>     range starting at file offset 1856K, which matches the file extent
>     item's offset, 1M, plus the amount of data cloned and written so far,
>     which is 64K (step 2) + 128K (step 3) + 640K (step 4). This clone
>     operation is invalid since the source range offset matches the current
>     eof of the file in the receiver. We should have stopped looking for
>     extents to clone at this point and instead fallback to write, which
>     would simply the contain the data in the file range from 1856K to
>     1856K + 128K.
> 
> So fix this by stopping the loop that looks for file ranges to clone at
> clone_range() when we reach the current eof of the file being processed,
> if we are cloning from the same file and using the send root as the clone
> root. This ensures any data not yet cloned will be sent to the receiver
> through a write operation.
> 
> A test case for fstests will follow soon.
> 
> Reported-by: Massimo B. <massimo.b@gmx.net>
> Link: https://lore.kernel.org/linux-btrfs/6ae34776e85912960a253a8327068a892998e685.camel@gmx.net/
> Fixes: 11f2069c113e ("Btrfs: send, allow clone operations within the same file")
> CC: stable@vger.kernel.org # 5.5+
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Read the commit log like 9 times, then read the comment and understood what was 
happening.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

  reply	other threads:[~2021-01-11 16:07 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-11 11:41 [PATCH] btrfs: send, fix invalid clone operations when cloning from the same file and root fdmanana
2021-01-11 16:06 ` Josef Bacik [this message]
2021-01-12 14:01 ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=83c11071-2e0c-ef9d-4e6c-072b9027b166@toxicpanda.com \
    --to=josef@toxicpanda.com \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.