* Re: [PATCH] Btrfs: send, improve clone range
2019-03-28 9:40 [PATCH] Btrfs: send, improve clone range robbieko
@ 2019-03-28 22:52 ` Filipe Manana
2019-03-29 8:26 ` robbieko
2019-03-31 9:36 ` kbuild test robot
1 sibling, 1 reply; 4+ messages in thread
From: Filipe Manana @ 2019-03-28 22:52 UTC (permalink / raw)
To: robbieko; +Cc: linux-btrfs
On Thu, Mar 28, 2019 at 9:41 AM robbieko <robbieko@synology.com> wrote:
>
> From: Robbie Ko <robbieko@synology.com>
>
> Improve clone_range two use scenarios.
>
> 1. Remove the limit of clone inode size
> We can do partial clone range, so there is no need to limit
> the inode size.
There is.
Cloning from a source range that goes beyond the source's i_size
results in an -EINVAL error returned from the clone ioctl.
Try running fstests btrfs/007, with a seed value of 1553175244 and
2000 operations instead of 200:
# /home/fdmanana/git/hub/btrfs-progs/btrfs receive
/home/fdmanana/btrfs-tests/scratch_1
ERROR: failed to clone extents to p0/df/d10/f2c: Invalid argument
At snapshot incr
failed: '/home/fdmanana/git/hub/btrfs-progs/btrfs receive
/home/fdmanana/btrfs-tests/scratch_1'
>
> 2. In the scenarios of rewrite or clone_range, data_offset
> rarely matches exactly, so the chance of a clone is missed.
>
> e.g.
> 1. Write a 1M file
> dd if=/dev/zero of=1M bs=1M count=1
>
> 2. Clone 1M file
> cp --reflink 1M clone
>
> 3. Rewrite 4k on the clone file
> dd if=/dev/zero of=clone bs=4k count=1 conv=notrunc
>
> The disk layout is as follows:
> item 16 key (257 EXTENT_DATA 0) itemoff 15353 itemsize 53
> extent data disk byte 1103101952 nr 1048576
> extent data offset 0 nr 1048576 ram 1048576
> extent compression(none)
> ...
> item 22 key (258 EXTENT_DATA 0) itemoff 14959 itemsize 53
> extent data disk byte 1104150528 nr 4096
> extent data offset 0 nr 4096 ram 4096
> extent compression(none)
> item 23 key (258 EXTENT_DATA 4096) itemoff 14906 itemsize 53
> extent data disk byte 1103101952 nr 1048576
> extent data offset 4096 nr 1044480 ram 1048576
> extent compression(none)
>
> When send, inode 258 file offset 4096~1048576 (item 23) has a
> chance to clone_range, but because data_offset does not match
> inode 257 (item 16), it causes missed clone and can only transfer
> actual data.
>
> Improve the problem by judging whether the current data_offset
> has overlap with the file extent item, and if so, adjusting
> offset and extent_len so that we can clone correctly.
>
> Signed-off-by: Robbie Ko <robbieko@synology.com>
> ---
> fs/btrfs/send.c | 20 ++++++++++++++++----
> 1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 7ea2d6b..7766b12 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -1240,9 +1240,6 @@ static int __iterate_backrefs(u64 ino, u64 offset, u64 root, void *ctx_)
> if (ret < 0)
> return ret;
>
> - if (offset + bctx->data_offset + bctx->extent_len > i_size)
> - return 0;
And this is why the failure above happens.
> -
> /*
> * Make sure we don't consider clones from send_root that are
> * behind the current inode/offset.
> @@ -5148,6 +5145,7 @@ static int clone_range(struct send_ctx *sctx,
> u8 type;
> u64 ext_len;
> u64 clone_len;
> + u64 clone_data_offset;
CC [M] fs/btrfs/send.o
fs/btrfs/send.c: In function 'process_extent':
fs/btrfs/send.c:5218:60: warning: 'clone_data_offset' may be used
uninitialized in this function [-Wmaybe-uninitialized]
if (btrfs_file_extent_disk_bytenr(leaf, ei) == disk_byte &&
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
clone_data_offset == data_offset)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/btrfs/send.c:5148:7: note: 'clone_data_offset' was declared here
u64 clone_data_offset;
^~~~~~~~~~~~~~~~~
LD [M] fs/btrfs/btrfs.o
$ gcc --version
gcc --version
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
Harmless but we shouldn't have warnings, initialize it to something
impossible (u64(-1) / U64_MAX) or re-structure the change below.
>
> if (slot >= btrfs_header_nritems(leaf)) {
> ret = btrfs_next_leaf(clone_root->root, path);
> @@ -5201,10 +5199,24 @@ static int clone_range(struct send_ctx *sctx,
> if (key.offset >= clone_root->offset + len)
> break;
>
> + if (btrfs_file_extent_disk_bytenr(leaf, ei) == disk_byte) {
> + clone_root->offset = key.offset;
> + clone_data_offset = btrfs_file_extent_offset(leaf, ei);
> + if (clone_data_offset < data_offset &&
> + clone_data_offset + ext_len > data_offset) {
> + u64 extent_offset;
> +
> + extent_offset = data_offset - clone_data_offset;
> + ext_len -= extent_offset;
> + clone_data_offset += extent_offset;
> + clone_root->offset += extent_offset;
> + }
> + }
> +
> clone_len = min_t(u64, ext_len, len);
>
> if (btrfs_file_extent_disk_bytenr(leaf, ei) == disk_byte &&
> - btrfs_file_extent_offset(leaf, ei) == data_offset)
> + clone_data_offset == data_offset)
> ret = send_clone(sctx, offset, clone_len, clone_root);
> else
> ret = send_extent_data(sctx, offset, clone_len);
> --
> 1.9.1
>
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] Btrfs: send, improve clone range
2019-03-28 9:40 [PATCH] Btrfs: send, improve clone range robbieko
2019-03-28 22:52 ` Filipe Manana
@ 2019-03-31 9:36 ` kbuild test robot
1 sibling, 0 replies; 4+ messages in thread
From: kbuild test robot @ 2019-03-31 9:36 UTC (permalink / raw)
To: robbieko; +Cc: kbuild-all, linux-btrfs, Robbie Ko
[-- Attachment #1: Type: text/plain, Size: 14929 bytes --]
Hi robbieko,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on btrfs/next]
[also build test WARNING on v5.1-rc2 next-20190329]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/robbieko/Btrfs-send-improve-clone-range/20190331-162406
base: https://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git next
config: i386-randconfig-x005-201913 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings
All warnings (new ones prefixed by >>):
fs/btrfs/send.c: In function 'process_extent':
>> fs/btrfs/send.c:5195:60: warning: 'clone_data_offset' may be used uninitialized in this function [-Wmaybe-uninitialized]
if (btrfs_file_extent_disk_bytenr(leaf, ei) == disk_byte &&
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
clone_data_offset == data_offset)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/btrfs/send.c:5125:7: note: 'clone_data_offset' was declared here
u64 clone_data_offset;
^~~~~~~~~~~~~~~~~
vim +/clone_data_offset +5195 fs/btrfs/send.c
d906d49f Filipe Manana 2015-10-02 5048
d906d49f Filipe Manana 2015-10-02 5049 static int clone_range(struct send_ctx *sctx,
d906d49f Filipe Manana 2015-10-02 5050 struct clone_root *clone_root,
d906d49f Filipe Manana 2015-10-02 5051 const u64 disk_byte,
d906d49f Filipe Manana 2015-10-02 5052 u64 data_offset,
d906d49f Filipe Manana 2015-10-02 5053 u64 offset,
d906d49f Filipe Manana 2015-10-02 5054 u64 len)
d906d49f Filipe Manana 2015-10-02 5055 {
d906d49f Filipe Manana 2015-10-02 5056 struct btrfs_path *path;
d906d49f Filipe Manana 2015-10-02 5057 struct btrfs_key key;
d906d49f Filipe Manana 2015-10-02 5058 int ret;
d906d49f Filipe Manana 2015-10-02 5059
72610b1b Filipe Manana 2017-08-10 5060 /*
72610b1b Filipe Manana 2017-08-10 5061 * Prevent cloning from a zero offset with a length matching the sector
72610b1b Filipe Manana 2017-08-10 5062 * size because in some scenarios this will make the receiver fail.
72610b1b Filipe Manana 2017-08-10 5063 *
72610b1b Filipe Manana 2017-08-10 5064 * For example, if in the source filesystem the extent at offset 0
72610b1b Filipe Manana 2017-08-10 5065 * has a length of sectorsize and it was written using direct IO, then
72610b1b Filipe Manana 2017-08-10 5066 * it can never be an inline extent (even if compression is enabled).
72610b1b Filipe Manana 2017-08-10 5067 * Then this extent can be cloned in the original filesystem to a non
72610b1b Filipe Manana 2017-08-10 5068 * zero file offset, but it may not be possible to clone in the
72610b1b Filipe Manana 2017-08-10 5069 * destination filesystem because it can be inlined due to compression
72610b1b Filipe Manana 2017-08-10 5070 * on the destination filesystem (as the receiver's write operations are
72610b1b Filipe Manana 2017-08-10 5071 * always done using buffered IO). The same happens when the original
72610b1b Filipe Manana 2017-08-10 5072 * filesystem does not have compression enabled but the destination
72610b1b Filipe Manana 2017-08-10 5073 * filesystem has.
72610b1b Filipe Manana 2017-08-10 5074 */
72610b1b Filipe Manana 2017-08-10 5075 if (clone_root->offset == 0 &&
72610b1b Filipe Manana 2017-08-10 5076 len == sctx->send_root->fs_info->sectorsize)
72610b1b Filipe Manana 2017-08-10 5077 return send_extent_data(sctx, offset, len);
72610b1b Filipe Manana 2017-08-10 5078
d906d49f Filipe Manana 2015-10-02 5079 path = alloc_path_for_send();
d906d49f Filipe Manana 2015-10-02 5080 if (!path)
d906d49f Filipe Manana 2015-10-02 5081 return -ENOMEM;
d906d49f Filipe Manana 2015-10-02 5082
d906d49f Filipe Manana 2015-10-02 5083 /*
d906d49f Filipe Manana 2015-10-02 5084 * We can't send a clone operation for the entire range if we find
d906d49f Filipe Manana 2015-10-02 5085 * extent items in the respective range in the source file that
d906d49f Filipe Manana 2015-10-02 5086 * refer to different extents or if we find holes.
d906d49f Filipe Manana 2015-10-02 5087 * So check for that and do a mix of clone and regular write/copy
d906d49f Filipe Manana 2015-10-02 5088 * operations if needed.
d906d49f Filipe Manana 2015-10-02 5089 *
d906d49f Filipe Manana 2015-10-02 5090 * Example:
d906d49f Filipe Manana 2015-10-02 5091 *
d906d49f Filipe Manana 2015-10-02 5092 * mkfs.btrfs -f /dev/sda
d906d49f Filipe Manana 2015-10-02 5093 * mount /dev/sda /mnt
d906d49f Filipe Manana 2015-10-02 5094 * xfs_io -f -c "pwrite -S 0xaa 0K 100K" /mnt/foo
d906d49f Filipe Manana 2015-10-02 5095 * cp --reflink=always /mnt/foo /mnt/bar
d906d49f Filipe Manana 2015-10-02 5096 * xfs_io -c "pwrite -S 0xbb 50K 50K" /mnt/foo
d906d49f Filipe Manana 2015-10-02 5097 * btrfs subvolume snapshot -r /mnt /mnt/snap
d906d49f Filipe Manana 2015-10-02 5098 *
d906d49f Filipe Manana 2015-10-02 5099 * If when we send the snapshot and we are processing file bar (which
d906d49f Filipe Manana 2015-10-02 5100 * has a higher inode number than foo) we blindly send a clone operation
d906d49f Filipe Manana 2015-10-02 5101 * for the [0, 100K[ range from foo to bar, the receiver ends up getting
d906d49f Filipe Manana 2015-10-02 5102 * a file bar that matches the content of file foo - iow, doesn't match
d906d49f Filipe Manana 2015-10-02 5103 * the content from bar in the original filesystem.
d906d49f Filipe Manana 2015-10-02 5104 */
d906d49f Filipe Manana 2015-10-02 5105 key.objectid = clone_root->ino;
d906d49f Filipe Manana 2015-10-02 5106 key.type = BTRFS_EXTENT_DATA_KEY;
d906d49f Filipe Manana 2015-10-02 5107 key.offset = clone_root->offset;
d906d49f Filipe Manana 2015-10-02 5108 ret = btrfs_search_slot(NULL, clone_root->root, &key, path, 0, 0);
d906d49f Filipe Manana 2015-10-02 5109 if (ret < 0)
d906d49f Filipe Manana 2015-10-02 5110 goto out;
d906d49f Filipe Manana 2015-10-02 5111 if (ret > 0 && path->slots[0] > 0) {
d906d49f Filipe Manana 2015-10-02 5112 btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0] - 1);
d906d49f Filipe Manana 2015-10-02 5113 if (key.objectid == clone_root->ino &&
d906d49f Filipe Manana 2015-10-02 5114 key.type == BTRFS_EXTENT_DATA_KEY)
d906d49f Filipe Manana 2015-10-02 5115 path->slots[0]--;
d906d49f Filipe Manana 2015-10-02 5116 }
d906d49f Filipe Manana 2015-10-02 5117
d906d49f Filipe Manana 2015-10-02 5118 while (true) {
d906d49f Filipe Manana 2015-10-02 5119 struct extent_buffer *leaf = path->nodes[0];
d906d49f Filipe Manana 2015-10-02 5120 int slot = path->slots[0];
d906d49f Filipe Manana 2015-10-02 5121 struct btrfs_file_extent_item *ei;
d906d49f Filipe Manana 2015-10-02 5122 u8 type;
d906d49f Filipe Manana 2015-10-02 5123 u64 ext_len;
d906d49f Filipe Manana 2015-10-02 5124 u64 clone_len;
c4b3268d Robbie Ko 2019-03-28 5125 u64 clone_data_offset;
d906d49f Filipe Manana 2015-10-02 5126
d906d49f Filipe Manana 2015-10-02 5127 if (slot >= btrfs_header_nritems(leaf)) {
d906d49f Filipe Manana 2015-10-02 5128 ret = btrfs_next_leaf(clone_root->root, path);
d906d49f Filipe Manana 2015-10-02 5129 if (ret < 0)
d906d49f Filipe Manana 2015-10-02 5130 goto out;
d906d49f Filipe Manana 2015-10-02 5131 else if (ret > 0)
d906d49f Filipe Manana 2015-10-02 5132 break;
d906d49f Filipe Manana 2015-10-02 5133 continue;
d906d49f Filipe Manana 2015-10-02 5134 }
d906d49f Filipe Manana 2015-10-02 5135
d906d49f Filipe Manana 2015-10-02 5136 btrfs_item_key_to_cpu(leaf, &key, slot);
d906d49f Filipe Manana 2015-10-02 5137
d906d49f Filipe Manana 2015-10-02 5138 /*
d906d49f Filipe Manana 2015-10-02 5139 * We might have an implicit trailing hole (NO_HOLES feature
d906d49f Filipe Manana 2015-10-02 5140 * enabled). We deal with it after leaving this loop.
d906d49f Filipe Manana 2015-10-02 5141 */
d906d49f Filipe Manana 2015-10-02 5142 if (key.objectid != clone_root->ino ||
d906d49f Filipe Manana 2015-10-02 5143 key.type != BTRFS_EXTENT_DATA_KEY)
d906d49f Filipe Manana 2015-10-02 5144 break;
d906d49f Filipe Manana 2015-10-02 5145
d906d49f Filipe Manana 2015-10-02 5146 ei = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
d906d49f Filipe Manana 2015-10-02 5147 type = btrfs_file_extent_type(leaf, ei);
d906d49f Filipe Manana 2015-10-02 5148 if (type == BTRFS_FILE_EXTENT_INLINE) {
d906d49f Filipe Manana 2015-10-02 5149 ext_len = btrfs_file_extent_inline_len(leaf, slot, ei);
09cbfeaf Kirill A. Shutemov 2016-04-01 5150 ext_len = PAGE_ALIGN(ext_len);
d906d49f Filipe Manana 2015-10-02 5151 } else {
d906d49f Filipe Manana 2015-10-02 5152 ext_len = btrfs_file_extent_num_bytes(leaf, ei);
d906d49f Filipe Manana 2015-10-02 5153 }
d906d49f Filipe Manana 2015-10-02 5154
d906d49f Filipe Manana 2015-10-02 5155 if (key.offset + ext_len <= clone_root->offset)
d906d49f Filipe Manana 2015-10-02 5156 goto next;
d906d49f Filipe Manana 2015-10-02 5157
d906d49f Filipe Manana 2015-10-02 5158 if (key.offset > clone_root->offset) {
d906d49f Filipe Manana 2015-10-02 5159 /* Implicit hole, NO_HOLES feature enabled. */
d906d49f Filipe Manana 2015-10-02 5160 u64 hole_len = key.offset - clone_root->offset;
d906d49f Filipe Manana 2015-10-02 5161
d906d49f Filipe Manana 2015-10-02 5162 if (hole_len > len)
d906d49f Filipe Manana 2015-10-02 5163 hole_len = len;
d906d49f Filipe Manana 2015-10-02 5164 ret = send_extent_data(sctx, offset, hole_len);
d906d49f Filipe Manana 2015-10-02 5165 if (ret < 0)
d906d49f Filipe Manana 2015-10-02 5166 goto out;
d906d49f Filipe Manana 2015-10-02 5167
d906d49f Filipe Manana 2015-10-02 5168 len -= hole_len;
d906d49f Filipe Manana 2015-10-02 5169 if (len == 0)
d906d49f Filipe Manana 2015-10-02 5170 break;
d906d49f Filipe Manana 2015-10-02 5171 offset += hole_len;
d906d49f Filipe Manana 2015-10-02 5172 clone_root->offset += hole_len;
d906d49f Filipe Manana 2015-10-02 5173 data_offset += hole_len;
d906d49f Filipe Manana 2015-10-02 5174 }
d906d49f Filipe Manana 2015-10-02 5175
d906d49f Filipe Manana 2015-10-02 5176 if (key.offset >= clone_root->offset + len)
d906d49f Filipe Manana 2015-10-02 5177 break;
d906d49f Filipe Manana 2015-10-02 5178
c4b3268d Robbie Ko 2019-03-28 5179 if (btrfs_file_extent_disk_bytenr(leaf, ei) == disk_byte) {
c4b3268d Robbie Ko 2019-03-28 5180 clone_root->offset = key.offset;
c4b3268d Robbie Ko 2019-03-28 5181 clone_data_offset = btrfs_file_extent_offset(leaf, ei);
c4b3268d Robbie Ko 2019-03-28 5182 if (clone_data_offset < data_offset &&
c4b3268d Robbie Ko 2019-03-28 5183 clone_data_offset + ext_len > data_offset) {
c4b3268d Robbie Ko 2019-03-28 5184 u64 extent_offset;
c4b3268d Robbie Ko 2019-03-28 5185
c4b3268d Robbie Ko 2019-03-28 5186 extent_offset = data_offset - clone_data_offset;
c4b3268d Robbie Ko 2019-03-28 5187 ext_len -= extent_offset;
c4b3268d Robbie Ko 2019-03-28 5188 clone_data_offset += extent_offset;
c4b3268d Robbie Ko 2019-03-28 5189 clone_root->offset += extent_offset;
c4b3268d Robbie Ko 2019-03-28 5190 }
c4b3268d Robbie Ko 2019-03-28 5191 }
c4b3268d Robbie Ko 2019-03-28 5192
d906d49f Filipe Manana 2015-10-02 5193 clone_len = min_t(u64, ext_len, len);
d906d49f Filipe Manana 2015-10-02 5194
d906d49f Filipe Manana 2015-10-02 @5195 if (btrfs_file_extent_disk_bytenr(leaf, ei) == disk_byte &&
c4b3268d Robbie Ko 2019-03-28 5196 clone_data_offset == data_offset)
d906d49f Filipe Manana 2015-10-02 5197 ret = send_clone(sctx, offset, clone_len, clone_root);
d906d49f Filipe Manana 2015-10-02 5198 else
d906d49f Filipe Manana 2015-10-02 5199 ret = send_extent_data(sctx, offset, clone_len);
d906d49f Filipe Manana 2015-10-02 5200
d906d49f Filipe Manana 2015-10-02 5201 if (ret < 0)
d906d49f Filipe Manana 2015-10-02 5202 goto out;
d906d49f Filipe Manana 2015-10-02 5203
d906d49f Filipe Manana 2015-10-02 5204 len -= clone_len;
d906d49f Filipe Manana 2015-10-02 5205 if (len == 0)
d906d49f Filipe Manana 2015-10-02 5206 break;
d906d49f Filipe Manana 2015-10-02 5207 offset += clone_len;
d906d49f Filipe Manana 2015-10-02 5208 clone_root->offset += clone_len;
d906d49f Filipe Manana 2015-10-02 5209 data_offset += clone_len;
d906d49f Filipe Manana 2015-10-02 5210 next:
d906d49f Filipe Manana 2015-10-02 5211 path->slots[0]++;
d906d49f Filipe Manana 2015-10-02 5212 }
d906d49f Filipe Manana 2015-10-02 5213
d906d49f Filipe Manana 2015-10-02 5214 if (len > 0)
d906d49f Filipe Manana 2015-10-02 5215 ret = send_extent_data(sctx, offset, len);
d906d49f Filipe Manana 2015-10-02 5216 else
d906d49f Filipe Manana 2015-10-02 5217 ret = 0;
d906d49f Filipe Manana 2015-10-02 5218 out:
d906d49f Filipe Manana 2015-10-02 5219 btrfs_free_path(path);
d906d49f Filipe Manana 2015-10-02 5220 return ret;
d906d49f Filipe Manana 2015-10-02 5221 }
d906d49f Filipe Manana 2015-10-02 5222
:::::: The code at line 5195 was first introduced by commit
:::::: d906d49fc5f49a0129527e8fbc13312f36b9b9ce Btrfs: send, fix file corruption due to incorrect cloning operations
:::::: TO: Filipe Manana <fdmanana@suse.com>
:::::: CC: Filipe Manana <fdmanana@suse.com>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29562 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread