* [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing
@ 2020-08-21  7:39 Omar Sandoval
  2020-08-21  7:39 ` [PATCH 1/9] btrfs: send: get rid of i_size logic in send_write() Omar Sandoval
                   ` (21 more replies)
  0 siblings, 22 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

This series uses the interface added in "fs: interface for directly
reading/writing compressed data" to send and receive compressed data
without wastefully decompressing and recompressing it. It does so by

1. Bumping the send stream protocol version to 2.
2. Adding a new command, BTRFS_SEND_C_ENCODED_WRITE, which indicates a
   write using the new encoded I/O interface, along with its associated
   attributes.
3. Sending compressed extents with BTRFS_SEND_C_ENCODED_WRITE when
   requested by the user.
4. Falling back to decompressing and writing the decompressed data if
   encoded I/O fails.

Benchmarks
==========

I ran some benchmarks on send and receive of a zstd (level 3) compressed
snapshot of a server's root filesystem which is about 23GB when
compressed and 50GB when decompressed.

Send v1:
0.41user 81.97system 2:21.71elapsed 58%CPU (0avgtext+0avgdata 2900maxresident)k
47182656inputs+0outputs (10major+119minor)pagefaults 0swaps

Send compressed:
0.43user 60.53system 2:20.62elapsed 43%CPU (0avgtext+0avgdata 2836maxresident)k
47778864inputs+0outputs (8major+117minor)pagefaults 0swaps

In this case, the bottleneck for send is reading the metadata trees and
data from disk, so there's not much of a wall time improvement, but
since the kernel doesn't have to decompress the data in the compressed
case, it uses significantly less CPU and system time.

Receive v1 into a filesystem with compress=none:
15.58user 62.36system 7:34.44elapsed 17%CPU (0avgtext+0avgdata 3028maxresident)k
104719648inputs+105333248outputs (1major+140minor)pagefaults 0swaps

Receive v1 into a filesystem with compress-force=zstd:
15.45user 63.99system 5:11.57elapsed 25%CPU (0avgtext+0avgdata 3100maxresident)k
104587240inputs+105379328outputs (1major+143minor)pagefaults 0swaps

Receive compressed into a filesystem with compress-force=zstd:
7.95user 44.53system 3:42.79elapsed 23%CPU (0avgtext+0avgdata 2992maxresident)k
46909600inputs+21603216outputs (2major+176minor)pagefaults 0swaps

Without compressed receive, recompressing the data is still a wall time
win because it requires much less I/O. However, compressed receive
reduces the wall time even further.
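
To put the timings above in relative terms, here is a small C sketch
(not part of the series) that computes fractional reductions. The helper
names are hypothetical; the inputs are the figures quoted above.

```c
#include <assert.h>

/* Convert a time(1)-style "minutes:seconds" elapsed value to seconds. */
static double elapsed_seconds(int minutes, double seconds)
{
	assert(minutes >= 0 && seconds >= 0.0 && seconds < 60.0);
	return minutes * 60.0 + seconds;
}

/* Fraction by which "after" is smaller than "before" (0.25 means 25% less). */
static double reduction(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before;
}
```

For example, reduction(81.97, 60.53) gives the drop in send system time
(roughly a quarter), and reduction(elapsed_seconds(5, 11.57),
elapsed_seconds(3, 42.79)) gives the wall time saved by compressed
receive relative to v1 receive with recompression.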

The v1 send stream is 50GB, and the v2 send stream is 23GB. The v1 send
stream compresses down to 17GB with zstd (level 3), so compressed send
gets pretty close with no extra CPU overhead (the reason that compressed
send is still larger is of course that we compress extents individually,
which does not compress as efficiently as compressing the entire
filesystem representation in one go).

# ls -lh v1.send v1.send.zst compressed.send
-rw-r--r-- 1 root root 23G Aug 17 12:34 compressed.send                 
-rw-r--r-- 1 root root 50G Aug 17 12:13 v1.send                           
-rw-r--r-- 1 root root 17G Aug 17 12:28 v1.send.zst               

Protocol Updates
================

This series makes some changes to the send stream protocol beyond adding
the encoded write command/attributes and bumping the version. Namely, v1
has a 64k limit on the size of a write due to the 16-bit attribute
length. This is not enough for encoded writes, as compressed extents may
be up to 128k and cannot be split up. To address this, the
BTRFS_SEND_A_DATA attribute is treated specially in v2: its length is
implicitly the remaining length of the command (which has a 32-bit
length). This was the least bad of the options I considered.
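
As a rough illustration of what this means for a stream parser, the
following user-space C sketch decodes one attribute from a command
payload. It is not the btrfs-progs implementation: struct attr_view,
get_le16() and next_attr() are hypothetical names, and the numeric value
of BTRFS_SEND_A_DATA is passed in by the caller rather than hard-coded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded attribute: a type plus a view into the command payload. */
struct attr_view {
	uint16_t type;
	const uint8_t *data;
	size_t len;
};

/* Read a little-endian 16-bit value regardless of host byte order. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Decode the attribute at the start of buf[0..remaining).
 * Returns the number of bytes consumed, or 0 if the payload is truncated.
 * data_type is the numeric value of BTRFS_SEND_A_DATA (an input here, so
 * the sketch makes no claim about the actual number).
 */
static size_t next_attr(const uint8_t *buf, size_t remaining,
			int stream_version, uint16_t data_type,
			struct attr_view *out)
{
	assert(stream_version == 1 || stream_version == 2);
	if (remaining < 2)
		return 0;
	out->type = get_le16(buf);
	if (stream_version == 2 && out->type == data_type) {
		/* v2 DATA: type only; the value runs to the end of the command. */
		out->data = buf + 2;
		out->len = remaining - 2;
		return remaining;
	}
	/* Regular TLV: 16-bit type, 16-bit length, then the value. */
	if (remaining < 4)
		return 0;
	out->len = get_le16(buf + 2);
	if (remaining - 4 < out->len)
		return 0;
	out->data = buf + 4;
	return 4 + out->len;
}
```

Because the v2 DATA value consumes everything that is left, it has to be
the last attribute in its command, and its size is bounded only by the
command's 32-bit length.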

There are other commands that we've been wanting to add to the protocol:
fallocate and FS_IOC_SETFLAGS. This series reserves their command and
attribute numbers but does not implement kernel support for emitting
them. However, it does implement support in receive for them, so the
kernel can start emitting those whenever we get around to implementing
them.

Interface
=========

For the send ioctl, stream version 2 is opt-in, and compressed writes
are a separate opt-in that depends on stream version 2.

Accordingly, `btrfs send` now accepts a `--stream-version` option and a
`--compressed` option; the latter implies `--stream-version 2`.

`btrfs receive` also accepts a `--force-decompress` option that forces
the fallback to decompressing and writing the decompressed data.

These options are provided to give the user flexibility in case they
don't want their receiving filesystem to be compressed.
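
The dependency between the two opt-ins can be sketched in user-space C
as follows. This is an illustration only: the macro names are local to
the sketch (the values are the ones posted in patch 5), the function
names are hypothetical, and returning -EINVAL for an invalid combination
is an assumption rather than something this cover letter states.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values as posted in patch 5 of this series; names local to the sketch. */
#define SKETCH_SEND_FLAG_STREAM_V2	0x8
#define SKETCH_SEND_FLAG_COMPRESSED	0x10

/* Returns 0 if the combination is valid, -EINVAL otherwise (assumed errno). */
static int check_send_flags(uint64_t flags)
{
	if ((flags & SKETCH_SEND_FLAG_COMPRESSED) &&
	    !(flags & SKETCH_SEND_FLAG_STREAM_V2))
		return -EINVAL;
	return 0;
}

/* Mirrors `--compressed` implying `--stream-version 2` on the command line. */
static uint64_t normalize_cli_flags(uint64_t flags)
{
	if (flags & SKETCH_SEND_FLAG_COMPRESSED)
		flags |= SKETCH_SEND_FLAG_STREAM_V2;
	/* After normalization the combination is always valid. */
	assert(check_send_flags(flags) == 0);
	return flags;
}
```

Splitting the check from the normalization keeps the ioctl-side rule
(compressed requires v2) separate from the command-line convenience
(compressed implies v2).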

Patches
=======

The kernel patches are based on kdave/misc-next plus my "fs: interface
for directly reading/writing compressed data" series. Patches 1-3 are
improvements to the generic send code.  Patches 4-7 do some preparation
for stream v2 and compressed send. Patch 8 implements compressed send.
Patch 9 modifies the ioctl to accept the new flags and enable the new
feature.

Omar Sandoval (9):
  btrfs: send: get rid of i_size logic in send_write()
  btrfs: send: avoid copying file data
  btrfs: send: use btrfs_file_extent_end() in send_write_or_clone()
  btrfs: add send_stream_version attribute to sysfs
  btrfs: add send stream v2 definitions
  btrfs: send: write larger chunks when using stream v2
  btrfs: send: allocate send buffer with alloc_page() and vmap() for v2
  btrfs: send: send compressed extents with encoded writes
  btrfs: send: enable support for stream v2 and compressed writes

 fs/btrfs/ctree.h           |   4 +
 fs/btrfs/inode.c           |   6 +-
 fs/btrfs/send.c            | 419 ++++++++++++++++++++++++++++---------
 fs/btrfs/send.h            |  33 ++-
 fs/btrfs/sysfs.c           |   9 +
 include/uapi/linux/btrfs.h |  17 +-
 6 files changed, 384 insertions(+), 104 deletions(-)

The btrfs-progs patches were written by Boris Burkov. Patches 1-5 are
preparation. Patch 6 implements encoded writes. Patch 7 implements the
fallback to decompressing. Patches 8-9 implement the other commands. Patch
10 adds the new `btrfs send` options. Patch 11 adds a test case.

Boris Burkov (11):
  btrfs-progs: receive: support v2 send stream larger tlv_len
  btrfs-progs: receive: dynamically allocate sctx->read_buf
  btrfs-progs: receive: support v2 send stream DATA tlv format
  btrfs-progs: receive: add send stream v2 cmds and attrs to send.h
  btrfs-progs: receive: add stub implementation for pwritev2
  btrfs-progs: receive: process encoded_write commands
  btrfs-progs: receive: encoded_write fallback to explicit decode and
    write
  btrfs-progs: receive: process fallocate commands
  btrfs-progs: receive: process setflags ioctl commands
  btrfs-progs: send: stream v2 ioctl flags
  btrfs-progs: receive: add tests for basic encoded_write send/receive

 Makefile                                      |   4 +-
 cmds/receive-dump.c                           |  31 +-
 cmds/receive.c                                | 402 +++++++++++++++++-
 cmds/send.c                                   |  39 +-
 common/send-stream.c                          | 159 +++++--
 common/send-stream.h                          |   7 +
 configure.ac                                  |   1 +
 ioctl.h                                       |  17 +-
 libbtrfsutil/btrfs.h                          |  17 +-
 send.h                                        |  19 +-
 stubs.c                                       |  24 ++
 stubs.h                                       |  50 +++
 .../040-receive-write-encoded/test.sh         | 114 +++++
 13 files changed, 832 insertions(+), 52 deletions(-)
 create mode 100644 stubs.c
 create mode 100644 stubs.h
 create mode 100755 tests/misc-tests/040-receive-write-encoded/test.sh

Thanks!

-- 
2.28.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 1/9] btrfs: send: get rid of i_size logic in send_write()
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
@ 2020-08-21  7:39 ` Omar Sandoval
  2020-08-21 17:26   ` Filipe Manana
  2020-08-24 17:39   ` Josef Bacik
  2020-08-21  7:39 ` [PATCH 2/9] btrfs: send: avoid copying file data Omar Sandoval
                   ` (20 subsequent siblings)
  21 siblings, 2 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Omar Sandoval <osandov@fb.com>

send_write()/fill_read_buf() have some logic for avoiding reading past
i_size. However, everywhere that we call
send_write()/send_extent_data(), we've already clamped the length down
to i_size. Get rid of the i_size handling, which simplifies the next
change.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/send.c | 37 ++++++++++---------------------------
 1 file changed, 10 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 7c7c09fc65e8..8af5e867e4ca 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -4794,7 +4794,7 @@ static int process_all_new_xattrs(struct send_ctx *sctx)
 	return ret;
 }
 
-static ssize_t fill_read_buf(struct send_ctx *sctx, u64 offset, u32 len)
+static int fill_read_buf(struct send_ctx *sctx, u64 offset, u32 len)
 {
 	struct btrfs_root *root = sctx->send_root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
@@ -4804,21 +4804,13 @@ static ssize_t fill_read_buf(struct send_ctx *sctx, u64 offset, u32 len)
 	pgoff_t index = offset >> PAGE_SHIFT;
 	pgoff_t last_index;
 	unsigned pg_offset = offset_in_page(offset);
-	ssize_t ret = 0;
+	int ret = 0;
+	size_t read = 0;
 
 	inode = btrfs_iget(fs_info->sb, sctx->cur_ino, root);
 	if (IS_ERR(inode))
 		return PTR_ERR(inode);
 
-	if (offset + len > i_size_read(inode)) {
-		if (offset > i_size_read(inode))
-			len = 0;
-		else
-			len = offset - i_size_read(inode);
-	}
-	if (len == 0)
-		goto out;
-
 	last_index = (offset + len - 1) >> PAGE_SHIFT;
 
 	/* initial readahead */
@@ -4859,16 +4851,15 @@ static ssize_t fill_read_buf(struct send_ctx *sctx, u64 offset, u32 len)
 		}
 
 		addr = kmap(page);
-		memcpy(sctx->read_buf + ret, addr + pg_offset, cur_len);
+		memcpy(sctx->read_buf + read, addr + pg_offset, cur_len);
 		kunmap(page);
 		unlock_page(page);
 		put_page(page);
 		index++;
 		pg_offset = 0;
 		len -= cur_len;
-		ret += cur_len;
+		read += cur_len;
 	}
-out:
 	iput(inode);
 	return ret;
 }
@@ -4882,7 +4873,6 @@ static int send_write(struct send_ctx *sctx, u64 offset, u32 len)
 	struct btrfs_fs_info *fs_info = sctx->send_root->fs_info;
 	int ret = 0;
 	struct fs_path *p;
-	ssize_t num_read = 0;
 
 	p = fs_path_alloc();
 	if (!p)
@@ -4890,12 +4880,9 @@ static int send_write(struct send_ctx *sctx, u64 offset, u32 len)
 
 	btrfs_debug(fs_info, "send_write offset=%llu, len=%d", offset, len);
 
-	num_read = fill_read_buf(sctx, offset, len);
-	if (num_read <= 0) {
-		if (num_read < 0)
-			ret = num_read;
+	ret = fill_read_buf(sctx, offset, len);
+	if (ret < 0)
 		goto out;
-	}
 
 	ret = begin_cmd(sctx, BTRFS_SEND_C_WRITE);
 	if (ret < 0)
@@ -4907,16 +4894,14 @@ static int send_write(struct send_ctx *sctx, u64 offset, u32 len)
 
 	TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p);
 	TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset);
-	TLV_PUT(sctx, BTRFS_SEND_A_DATA, sctx->read_buf, num_read);
+	TLV_PUT(sctx, BTRFS_SEND_A_DATA, sctx->read_buf, len);
 
 	ret = send_cmd(sctx);
 
 tlv_put_failure:
 out:
 	fs_path_free(p);
-	if (ret < 0)
-		return ret;
-	return num_read;
+	return ret;
 }
 
 /*
@@ -5095,9 +5080,7 @@ static int send_extent_data(struct send_ctx *sctx,
 		ret = send_write(sctx, offset + sent, size);
 		if (ret < 0)
 			return ret;
-		if (!ret)
-			break;
-		sent += ret;
+		sent += size;
 	}
 	return 0;
 }
-- 
2.28.0


* [PATCH 2/9] btrfs: send: avoid copying file data
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
  2020-08-21  7:39 ` [PATCH 1/9] btrfs: send: get rid of i_size logic in send_write() Omar Sandoval
@ 2020-08-21  7:39 ` Omar Sandoval
  2020-08-21 17:29   ` Filipe Manana
                     ` (2 more replies)
  2020-08-21  7:39 ` [PATCH 3/9] btrfs: send: use btrfs_file_extent_end() in send_write_or_clone() Omar Sandoval
                   ` (19 subsequent siblings)
  21 siblings, 3 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Omar Sandoval <osandov@fb.com>

send_write() currently copies from the page cache to sctx->read_buf, and
then from sctx->read_buf to sctx->send_buf. Similarly, send_hole()
zeroes sctx->read_buf and then copies from sctx->read_buf to
sctx->send_buf. However, if we write the TLV header manually, we can
copy to sctx->send_buf directly and get rid of sctx->read_buf.
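
The idea can be shown outside the kernel with a simplified user-space
sketch: write the header first, then append the value in pieces straight
into the output buffer, with no staging buffer in between. struct
out_buf, put_le16(), put_header() and put_chunk() are hypothetical
stand-ins, not the functions added by the diff below.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the send context's output buffer. */
struct out_buf {
	uint8_t *buf;
	size_t size;	/* bytes used so far */
	size_t max;	/* capacity */
};

/* Store a 16-bit value in little-endian byte order. */
static void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

/*
 * Write a v1 TLV header (16-bit type, 16-bit length) after checking that
 * the header plus len bytes of value fit. The caller then appends exactly
 * len bytes of value.
 */
static int put_header(struct out_buf *o, uint16_t type, uint16_t len)
{
	if (o->max - o->size < 4u + len)
		return -EOVERFLOW;
	put_le16(o->buf + o->size, type);
	put_le16(o->buf + o->size + 2, len);
	o->size += 4;
	return 0;
}

/* Append value bytes produced piecewise, e.g. one page at a time. */
static void put_chunk(struct out_buf *o, const void *src, size_t n)
{
	assert(o->max - o->size >= n);
	memcpy(o->buf + o->size, src, n);
	o->size += n;
}
```

Checking the space for header and value up front is what lets the
per-chunk copies skip their own error handling.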

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/send.c | 65 +++++++++++++++++++++++++++++--------------------
 fs/btrfs/send.h |  1 -
 2 files changed, 39 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 8af5e867e4ca..e70f5ceb3261 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -122,8 +122,6 @@ struct send_ctx {
 
 	struct file_ra_state ra;
 
-	char *read_buf;
-
 	/*
 	 * We process inodes by their increasing order, so if before an
 	 * incremental send we reverse the parent/child relationship of
@@ -4794,7 +4792,25 @@ static int process_all_new_xattrs(struct send_ctx *sctx)
 	return ret;
 }
 
-static int fill_read_buf(struct send_ctx *sctx, u64 offset, u32 len)
+static u64 max_send_read_size(struct send_ctx *sctx)
+{
+	return sctx->send_max_size - SZ_16K;
+}
+
+static int put_data_header(struct send_ctx *sctx, u32 len)
+{
+	struct btrfs_tlv_header *hdr;
+
+	if (sctx->send_max_size - sctx->send_size < sizeof(*hdr) + len)
+		return -EOVERFLOW;
+	hdr = (struct btrfs_tlv_header *)(sctx->send_buf + sctx->send_size);
+	hdr->tlv_type = cpu_to_le16(BTRFS_SEND_A_DATA);
+	hdr->tlv_len = cpu_to_le16(len);
+	sctx->send_size += sizeof(*hdr);
+	return 0;
+}
+
+static int put_file_data(struct send_ctx *sctx, u64 offset, u32 len)
 {
 	struct btrfs_root *root = sctx->send_root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
@@ -4804,8 +4820,11 @@ static int fill_read_buf(struct send_ctx *sctx, u64 offset, u32 len)
 	pgoff_t index = offset >> PAGE_SHIFT;
 	pgoff_t last_index;
 	unsigned pg_offset = offset_in_page(offset);
-	int ret = 0;
-	size_t read = 0;
+	int ret;
+
+	ret = put_data_header(sctx, len);
+	if (ret)
+		return ret;
 
 	inode = btrfs_iget(fs_info->sb, sctx->cur_ino, root);
 	if (IS_ERR(inode))
@@ -4851,14 +4870,15 @@ static int fill_read_buf(struct send_ctx *sctx, u64 offset, u32 len)
 		}
 
 		addr = kmap(page);
-		memcpy(sctx->read_buf + read, addr + pg_offset, cur_len);
+		memcpy(sctx->send_buf + sctx->send_size, addr + pg_offset,
+		       cur_len);
 		kunmap(page);
 		unlock_page(page);
 		put_page(page);
 		index++;
 		pg_offset = 0;
 		len -= cur_len;
-		read += cur_len;
+		sctx->send_size += cur_len;
 	}
 	iput(inode);
 	return ret;
@@ -4880,10 +4900,6 @@ static int send_write(struct send_ctx *sctx, u64 offset, u32 len)
 
 	btrfs_debug(fs_info, "send_write offset=%llu, len=%d", offset, len);
 
-	ret = fill_read_buf(sctx, offset, len);
-	if (ret < 0)
-		goto out;
-
 	ret = begin_cmd(sctx, BTRFS_SEND_C_WRITE);
 	if (ret < 0)
 		goto out;
@@ -4894,7 +4910,9 @@ static int send_write(struct send_ctx *sctx, u64 offset, u32 len)
 
 	TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p);
 	TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset);
-	TLV_PUT(sctx, BTRFS_SEND_A_DATA, sctx->read_buf, len);
+	ret = put_file_data(sctx, offset, len);
+	if (ret < 0)
+		goto out;
 
 	ret = send_cmd(sctx);
 
@@ -5013,8 +5031,8 @@ static int send_update_extent(struct send_ctx *sctx,
 static int send_hole(struct send_ctx *sctx, u64 end)
 {
 	struct fs_path *p = NULL;
+	u64 read_size = max_send_read_size(sctx);
 	u64 offset = sctx->cur_inode_last_extent;
-	u64 len;
 	int ret = 0;
 
 	/*
@@ -5041,16 +5059,19 @@ static int send_hole(struct send_ctx *sctx, u64 end)
 	ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p);
 	if (ret < 0)
 		goto tlv_put_failure;
-	memset(sctx->read_buf, 0, BTRFS_SEND_READ_SIZE);
 	while (offset < end) {
-		len = min_t(u64, end - offset, BTRFS_SEND_READ_SIZE);
+		u64 len = min(end - offset, read_size);
 
 		ret = begin_cmd(sctx, BTRFS_SEND_C_WRITE);
 		if (ret < 0)
 			break;
 		TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p);
 		TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset);
-		TLV_PUT(sctx, BTRFS_SEND_A_DATA, sctx->read_buf, len);
+		ret = put_data_header(sctx, len);
+		if (ret < 0)
+			break;
+		memset(sctx->send_buf + sctx->send_size, 0, len);
+		sctx->send_size += len;
 		ret = send_cmd(sctx);
 		if (ret < 0)
 			break;
@@ -5066,17 +5087,16 @@ static int send_extent_data(struct send_ctx *sctx,
 			    const u64 offset,
 			    const u64 len)
 {
+	u64 read_size = max_send_read_size(sctx);
 	u64 sent = 0;
 
 	if (sctx->flags & BTRFS_SEND_FLAG_NO_FILE_DATA)
 		return send_update_extent(sctx, offset, len);
 
 	while (sent < len) {
-		u64 size = len - sent;
+		u64 size = min(len - sent, read_size);
 		int ret;
 
-		if (size > BTRFS_SEND_READ_SIZE)
-			size = BTRFS_SEND_READ_SIZE;
 		ret = send_write(sctx, offset + sent, size);
 		if (ret < 0)
 			return ret;
@@ -7145,12 +7165,6 @@ long btrfs_ioctl_send(struct file *mnt_file, struct btrfs_ioctl_send_args *arg)
 		goto out;
 	}
 
-	sctx->read_buf = kvmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL);
-	if (!sctx->read_buf) {
-		ret = -ENOMEM;
-		goto out;
-	}
-
 	sctx->pending_dir_moves = RB_ROOT;
 	sctx->waiting_dir_moves = RB_ROOT;
 	sctx->orphan_dirs = RB_ROOT;
@@ -7354,7 +7368,6 @@ long btrfs_ioctl_send(struct file *mnt_file, struct btrfs_ioctl_send_args *arg)
 
 		kvfree(sctx->clone_roots);
 		kvfree(sctx->send_buf);
-		kvfree(sctx->read_buf);
 
 		name_cache_free(sctx);
 
diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
index ead397f7034f..de91488b7cd0 100644
--- a/fs/btrfs/send.h
+++ b/fs/btrfs/send.h
@@ -13,7 +13,6 @@
 #define BTRFS_SEND_STREAM_VERSION 1
 
 #define BTRFS_SEND_BUF_SIZE SZ_64K
-#define BTRFS_SEND_READ_SIZE (48 * SZ_1K)
 
 enum btrfs_tlv_type {
 	BTRFS_TLV_U8,
-- 
2.28.0


* [PATCH 3/9] btrfs: send: use btrfs_file_extent_end() in send_write_or_clone()
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
  2020-08-21  7:39 ` [PATCH 1/9] btrfs: send: get rid of i_size logic in send_write() Omar Sandoval
  2020-08-21  7:39 ` [PATCH 2/9] btrfs: send: avoid copying file data Omar Sandoval
@ 2020-08-21  7:39 ` Omar Sandoval
  2020-08-21 17:30   ` Filipe Manana
  2020-08-21  7:39 ` [PATCH 4/9] btrfs: add send_stream_version attribute to sysfs Omar Sandoval
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Omar Sandoval <osandov@fb.com>

send_write_or_clone() basically has an open-coded copy of
btrfs_file_extent_end() except that it (incorrectly) aligns to PAGE_SIZE
instead of sectorsize. Fix and simplify the code by using
btrfs_file_extent_end().
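
To see why the alignment matters, consider this user-space sketch. It
assumes, as the description above implies, that the end of an inline
extent is its start plus its uncompressed size rounded up to the chosen
alignment; the function names are hypothetical and this is not the
kernel helper itself.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a, where a is a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* End of an inline extent: start + ram_bytes, rounded up to alignment. */
static uint64_t inline_extent_end(uint64_t offset, uint64_t ram_bytes,
				  uint64_t alignment)
{
	return align_up(offset + ram_bytes, alignment);
}

/* Number of bytes to send: the extent end is clamped to the inode size. */
static uint64_t send_len(uint64_t offset, uint64_t extent_end,
			 uint64_t inode_size)
{
	uint64_t end = extent_end < inode_size ? extent_end : inode_size;

	return offset >= end ? 0 : end - offset;
}
```

With a 4k sector size and 64k pages, a 100-byte inline extent in a
10000-byte file ends at 4096 when aligned to the sector size but at
65536 when aligned to the page size, so the page-aligned variant would
cover the file's remaining bytes as well (10000 bytes sent instead of
4096).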

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/send.c | 44 +++++++++++---------------------------------
 1 file changed, 11 insertions(+), 33 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index e70f5ceb3261..37ce21361782 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -5400,51 +5400,29 @@ static int send_write_or_clone(struct send_ctx *sctx,
 			       struct clone_root *clone_root)
 {
 	int ret = 0;
-	struct btrfs_file_extent_item *ei;
 	u64 offset = key->offset;
-	u64 len;
-	u8 type;
+	u64 end;
 	u64 bs = sctx->send_root->fs_info->sb->s_blocksize;
 
-	ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
-			struct btrfs_file_extent_item);
-	type = btrfs_file_extent_type(path->nodes[0], ei);
-	if (type == BTRFS_FILE_EXTENT_INLINE) {
-		len = btrfs_file_extent_ram_bytes(path->nodes[0], ei);
-		/*
-		 * it is possible the inline item won't cover the whole page,
-		 * but there may be items after this page.  Make
-		 * sure to send the whole thing
-		 */
-		len = PAGE_ALIGN(len);
-	} else {
-		len = btrfs_file_extent_num_bytes(path->nodes[0], ei);
-	}
-
-	if (offset >= sctx->cur_inode_size) {
-		ret = 0;
-		goto out;
-	}
-	if (offset + len > sctx->cur_inode_size)
-		len = sctx->cur_inode_size - offset;
-	if (len == 0) {
-		ret = 0;
-		goto out;
-	}
+	end = min(btrfs_file_extent_end(path), sctx->cur_inode_size);
+	if (offset >= end)
+		return 0;
 
-	if (clone_root && IS_ALIGNED(offset + len, bs)) {
+	if (clone_root && IS_ALIGNED(end, bs)) {
+		struct btrfs_file_extent_item *ei;
 		u64 disk_byte;
 		u64 data_offset;
 
+		ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
+				    struct btrfs_file_extent_item);
 		disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
 		data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
 		ret = clone_range(sctx, clone_root, disk_byte, data_offset,
-				  offset, len);
+				  offset, end - offset);
 	} else {
-		ret = send_extent_data(sctx, offset, len);
+		ret = send_extent_data(sctx, offset, end - offset);
 	}
-	sctx->cur_inode_next_write_offset = offset + len;
-out:
+	sctx->cur_inode_next_write_offset = end;
 	return ret;
 }
 
-- 
2.28.0


* [PATCH 4/9] btrfs: add send_stream_version attribute to sysfs
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
                   ` (2 preceding siblings ...)
  2020-08-21  7:39 ` [PATCH 3/9] btrfs: send: use btrfs_file_extent_end() in send_write_or_clone() Omar Sandoval
@ 2020-08-21  7:39 ` Omar Sandoval
  2020-08-21  7:39 ` [PATCH 5/9] btrfs: add send stream v2 definitions Omar Sandoval
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Omar Sandoval <osandov@fb.com>

This reports the latest send stream version supported by the kernel.
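
A user-space tool could consume the attribute roughly like this. The
sketch is hedged: the helper names are hypothetical, the caller supplies
the path (presumably /sys/fs/btrfs/features/send_stream_version, which
is an assumption and not stated in this patch), and falling back to
version 1 when the file is absent is a design choice of the sketch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a small decimal integer from a sysfs-style file.
 * Returns the value, or -1 if the file is missing or malformed.
 */
static int read_int_attr(const char *path)
{
	FILE *f;
	int value;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}

/* Treat a missing or unreadable attribute as "v1 only". */
static int kernel_send_stream_version(const char *path)
{
	int v = read_int_attr(path);

	return v < 1 ? 1 : v;
}
```

A sender could compare the result against the requested stream version
before issuing the ioctl, and report a clear error instead of relying on
the ioctl to reject unknown flags.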

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/sysfs.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 8593086d1d10..34d21edff7d7 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -14,6 +14,7 @@
 #include "ctree.h"
 #include "discard.h"
 #include "disk-io.h"
+#include "send.h"
 #include "transaction.h"
 #include "sysfs.h"
 #include "volumes.h"
@@ -321,9 +322,17 @@ static ssize_t supported_checksums_show(struct kobject *kobj,
 }
 BTRFS_ATTR(static_feature, supported_checksums, supported_checksums_show);
 
+static ssize_t send_stream_version_show(struct kobject *kobj,
+					struct kobj_attribute *ka, char *buf)
+{
+	return snprintf(buf, PAGE_SIZE, "%d\n", BTRFS_SEND_STREAM_VERSION);
+}
+BTRFS_ATTR(static_feature, send_stream_version, send_stream_version_show);
+
 static struct attribute *btrfs_supported_static_feature_attrs[] = {
 	BTRFS_ATTR_PTR(static_feature, rmdir_subvol),
 	BTRFS_ATTR_PTR(static_feature, supported_checksums),
+	BTRFS_ATTR_PTR(static_feature, send_stream_version),
 	NULL
 };
 
-- 
2.28.0


* [PATCH 5/9] btrfs: add send stream v2 definitions
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
                   ` (3 preceding siblings ...)
  2020-08-21  7:39 ` [PATCH 4/9] btrfs: add send_stream_version attribute to sysfs Omar Sandoval
@ 2020-08-21  7:39 ` Omar Sandoval
  2020-08-24 17:49   ` Josef Bacik
  2020-08-21  7:39 ` [PATCH 6/9] btrfs: send: write larger chunks when using stream v2 Omar Sandoval
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Omar Sandoval <osandov@fb.com>

This adds the definitions of the new commands for send stream version 2
and their respective attributes: fallocate, FS_IOC_SETFLAGS (a.k.a.
chattr), and encoded writes. It also documents two changes to the send
stream format in v2: the receiver shouldn't assume a maximum command
size, and the DATA attribute is encoded differently to allow for writes
larger than 64k. These will be implemented in subsequent changes, and
then the ioctl will accept the new flags.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/send.c            |  2 +-
 fs/btrfs/send.h            | 30 +++++++++++++++++++++++++++++-
 include/uapi/linux/btrfs.h | 13 +++++++++++++
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 37ce21361782..e25c3391fc02 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -7136,7 +7136,7 @@ long btrfs_ioctl_send(struct file *mnt_file, struct btrfs_ioctl_send_args *arg)
 
 	sctx->clone_roots_cnt = arg->clone_sources_count;
 
-	sctx->send_max_size = BTRFS_SEND_BUF_SIZE;
+	sctx->send_max_size = BTRFS_SEND_BUF_SIZE_V1;
 	sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
 	if (!sctx->send_buf) {
 		ret = -ENOMEM;
diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
index de91488b7cd0..9f4f7b96b1eb 100644
--- a/fs/btrfs/send.h
+++ b/fs/btrfs/send.h
@@ -12,7 +12,11 @@
 #define BTRFS_SEND_STREAM_MAGIC "btrfs-stream"
 #define BTRFS_SEND_STREAM_VERSION 1
 
-#define BTRFS_SEND_BUF_SIZE SZ_64K
+/*
+ * In send stream v1, no command is larger than 64k. In send stream v2, no limit
+ * should be assumed.
+ */
+#define BTRFS_SEND_BUF_SIZE_V1 SZ_64K
 
 enum btrfs_tlv_type {
 	BTRFS_TLV_U8,
@@ -76,6 +80,13 @@ enum btrfs_send_cmd {
 
 	BTRFS_SEND_C_END,
 	BTRFS_SEND_C_UPDATE_EXTENT,
+
+	/* The following commands were added in send stream v2. */
+
+	BTRFS_SEND_C_FALLOCATE,
+	BTRFS_SEND_C_SETFLAGS,
+	BTRFS_SEND_C_ENCODED_WRITE,
+
 	__BTRFS_SEND_C_MAX,
 };
 #define BTRFS_SEND_C_MAX (__BTRFS_SEND_C_MAX - 1)
@@ -106,6 +117,11 @@ enum {
 	BTRFS_SEND_A_PATH_LINK,
 
 	BTRFS_SEND_A_FILE_OFFSET,
+	/*
+	 * In send stream v2, this attribute is special: it must be the last
+	 * attribute in a command, its header contains only the type, and its
+	 * length is implicitly the remaining length of the command.
+	 */
 	BTRFS_SEND_A_DATA,
 
 	BTRFS_SEND_A_CLONE_UUID,
@@ -114,6 +130,18 @@ enum {
 	BTRFS_SEND_A_CLONE_OFFSET,
 	BTRFS_SEND_A_CLONE_LEN,
 
+	/* The following attributes were added in send stream v2. */
+
+	BTRFS_SEND_A_FALLOCATE_MODE,
+
+	BTRFS_SEND_A_SETFLAGS_FLAGS,
+
+	BTRFS_SEND_A_UNENCODED_FILE_LEN,
+	BTRFS_SEND_A_UNENCODED_LEN,
+	BTRFS_SEND_A_UNENCODED_OFFSET,
+	BTRFS_SEND_A_COMPRESSION,
+	BTRFS_SEND_A_ENCRYPTION,
+
 	__BTRFS_SEND_A_MAX,
 };
 #define BTRFS_SEND_A_MAX (__BTRFS_SEND_A_MAX - 1)
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 2c39d15a2beb..51e69f28d22d 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -769,6 +769,19 @@ struct btrfs_ioctl_received_subvol_args {
  */
 #define BTRFS_SEND_FLAG_OMIT_END_CMD		0x4
 
+/*
+ * Use version 2 of the send stream, which adds new commands and supports larger
+ * writes.
+ */
+#define BTRFS_SEND_FLAG_STREAM_V2		0x8
+
+/*
+ * Send compressed data using the ENCODED_WRITE command instead of decompressing
+ * the data and sending it with the WRITE command. This requires
+ * BTRFS_SEND_FLAG_STREAM_V2.
+ */
+#define BTRFS_SEND_FLAG_COMPRESSED		0x10
+
 #define BTRFS_SEND_FLAG_MASK \
 	(BTRFS_SEND_FLAG_NO_FILE_DATA | \
 	 BTRFS_SEND_FLAG_OMIT_STREAM_HEADER | \
-- 
2.28.0


* [PATCH 6/9] btrfs: send: write larger chunks when using stream v2
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
                   ` (4 preceding siblings ...)
  2020-08-21  7:39 ` [PATCH 5/9] btrfs: add send stream v2 definitions Omar Sandoval
@ 2020-08-21  7:39 ` Omar Sandoval
  2020-08-24 17:57   ` Josef Bacik
  2020-08-21  7:39 ` [PATCH 7/9] btrfs: send: allocate send buffer with alloc_page() and vmap() for v2 Omar Sandoval
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Omar Sandoval <osandov@fb.com>

The length field of the send stream TLV header is 16 bits. This means
that the maximum amount of data that can be sent for one write is 64k
minus one. However, encoded writes must be able to send the maximum
compressed extent (128k) in one command. To support this, send stream
version 2 encodes the DATA attribute differently: it has no length
field, and its length implicitly extends to the end of the containing
command (which has a 32-bit length field). Although this is necessary
for encoded writes, normal writes can benefit from it, too.

For v2, let's bump up the send buffer to the maximum compressed extent
size plus 16k for the other metadata (144k total). Since this will most
likely be vmalloc'd (and always will be after the next commit), we round
it up to the next page since we might as well use the rest of the page
on systems with >16k pages.
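
The sizes involved can be checked with a few lines of user-space C. The
macro and function names are local to this sketch; the 16k and 128k
figures are the ones given above.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SZ_16K		(16u * 1024)
#define SKETCH_MAX_COMPRESSED	(128u * 1024)	/* maximum compressed extent */

/* Largest value length that a 16-bit tlv_len field can describe. */
static uint32_t v1_max_tlv_len(void)
{
	return UINT16_MAX;
}

/* Send buffer size for v2: 144k rounded up to a whole number of pages. */
static uint32_t v2_send_buf_size(uint32_t page_size)
{
	uint32_t raw = SKETCH_SZ_16K + SKETCH_MAX_COMPRESSED;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (raw + page_size - 1) & ~(page_size - 1);
}
```

With 4k or 16k pages the buffer stays at exactly 144k (147456 bytes);
with 64k pages it rounds up to 192k (196608 bytes).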

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/send.c | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index e25c3391fc02..c0f81d302f49 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -4799,14 +4799,27 @@ static u64 max_send_read_size(struct send_ctx *sctx)
 
 static int put_data_header(struct send_ctx *sctx, u32 len)
 {
-	struct btrfs_tlv_header *hdr;
+	if (sctx->flags & BTRFS_SEND_FLAG_STREAM_V2) {
+		__le16 tlv_type;
+
+		if (sctx->send_max_size - sctx->send_size <
+		    sizeof(tlv_type) + len)
+			return -EOVERFLOW;
+		tlv_type = cpu_to_le16(BTRFS_SEND_A_DATA);
+		memcpy(sctx->send_buf + sctx->send_size, &tlv_type,
+		       sizeof(tlv_type));
+		sctx->send_size += sizeof(tlv_type);
+	} else {
+		struct btrfs_tlv_header *hdr;
 
-	if (sctx->send_max_size - sctx->send_size < sizeof(*hdr) + len)
-		return -EOVERFLOW;
-	hdr = (struct btrfs_tlv_header *)(sctx->send_buf + sctx->send_size);
-	hdr->tlv_type = cpu_to_le16(BTRFS_SEND_A_DATA);
-	hdr->tlv_len = cpu_to_le16(len);
-	sctx->send_size += sizeof(*hdr);
+		if (sctx->send_max_size - sctx->send_size < sizeof(*hdr) + len)
+			return -EOVERFLOW;
+		hdr = (struct btrfs_tlv_header *)(sctx->send_buf +
+						  sctx->send_size);
+		hdr->tlv_type = cpu_to_le16(BTRFS_SEND_A_DATA);
+		hdr->tlv_len = cpu_to_le16(len);
+		sctx->send_size += sizeof(*hdr);
+	}
 	return 0;
 }
 
@@ -7136,7 +7149,12 @@ long btrfs_ioctl_send(struct file *mnt_file, struct btrfs_ioctl_send_args *arg)
 
 	sctx->clone_roots_cnt = arg->clone_sources_count;
 
-	sctx->send_max_size = BTRFS_SEND_BUF_SIZE_V1;
+	if (sctx->flags & BTRFS_SEND_FLAG_STREAM_V2) {
+		sctx->send_max_size = ALIGN(SZ_16K + BTRFS_MAX_COMPRESSED,
+					    PAGE_SIZE);
+	} else {
+		sctx->send_max_size = BTRFS_SEND_BUF_SIZE_V1;
+	}
 	sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
 	if (!sctx->send_buf) {
 		ret = -ENOMEM;
-- 
2.28.0


* [PATCH 7/9] btrfs: send: allocate send buffer with alloc_page() and vmap() for v2
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
                   ` (5 preceding siblings ...)
  2020-08-21  7:39 ` [PATCH 6/9] btrfs: send: write larger chunks when using stream v2 Omar Sandoval
@ 2020-08-21  7:39 ` Omar Sandoval
  2020-08-21  7:39 ` [PATCH 8/9] btrfs: send: send compressed extents with encoded writes Omar Sandoval
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Omar Sandoval <osandov@fb.com>

For encoded writes, we need the raw pages for reading compressed data
directly via a bio. So, for stream v2, replace kvmalloc() with
alloc_page() and vmap() so we have access to the raw pages. 144k is large
enough that kvmalloc() usually falls back to vmalloc() anyway.
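
As a sanity check on the 144k figure, here is a small userspace sketch of
the size arithmetic. The constants mirror what I believe SZ_16K and
BTRFS_MAX_COMPRESSED are in the kernel (16K and 128K); align_up() and the
send_buf_*_v2() helpers are hypothetical names for illustration only, not
kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match the kernel's SZ_16K and BTRFS_MAX_COMPRESSED. */
#define SZ_16K (16u * 1024u)
#define BTRFS_MAX_COMPRESSED (128u * 1024u)

/* Round x up to a multiple of a; a must be a power of two (like ALIGN()). */
static size_t align_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Hypothetical helper: size of the v2 send buffer for a given page size. */
static size_t send_buf_size_v2(size_t page_size)
{
	return align_up(SZ_16K + BTRFS_MAX_COMPRESSED, page_size);
}

/* Hypothetical helper: number of pages that back the v2 send buffer. */
static size_t send_buf_num_pages_v2(size_t page_size)
{
	return send_buf_size_v2(page_size) / page_size;
}
```

With 4K pages this gives 147456 bytes (144k) spread over 36 pages. With
64K pages the alignment rounds the buffer up to 192k, i.e. 3 pages.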

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/send.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index c0f81d302f49..efa6f8f27e4d 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -81,6 +81,7 @@ struct send_ctx {
 	char *send_buf;
 	u32 send_size;
 	u32 send_max_size;
+	struct page **send_buf_pages;
 	u64 total_send_size;
 	u64 cmd_send_size[BTRFS_SEND_C_MAX + 1];
 	u64 flags;	/* 'flags' member of btrfs_ioctl_send_args is u64 */
@@ -7072,6 +7073,7 @@ long btrfs_ioctl_send(struct file *mnt_file, struct btrfs_ioctl_send_args *arg)
 	struct btrfs_root *clone_root;
 	struct send_ctx *sctx = NULL;
 	u32 i;
+	u32 send_buf_num_pages = 0;
 	u64 *clone_sources_tmp = NULL;
 	int clone_sources_to_rollback = 0;
 	unsigned alloc_size;
@@ -7152,10 +7154,28 @@ long btrfs_ioctl_send(struct file *mnt_file, struct btrfs_ioctl_send_args *arg)
 	if (sctx->flags & BTRFS_SEND_FLAG_STREAM_V2) {
 		sctx->send_max_size = ALIGN(SZ_16K + BTRFS_MAX_COMPRESSED,
 					    PAGE_SIZE);
+		send_buf_num_pages = sctx->send_max_size >> PAGE_SHIFT;
+		sctx->send_buf_pages = kcalloc(send_buf_num_pages,
+					       sizeof(*sctx->send_buf_pages),
+					       GFP_KERNEL);
+		if (!sctx->send_buf_pages) {
+			send_buf_num_pages = 0;
+			ret = -ENOMEM;
+			goto out;
+		}
+		for (i = 0; i < send_buf_num_pages; i++) {
+			sctx->send_buf_pages[i] = alloc_page(GFP_KERNEL);
+			if (!sctx->send_buf_pages[i]) {
+				ret = -ENOMEM;
+				goto out;
+			}
+		}
+		sctx->send_buf = vmap(sctx->send_buf_pages, send_buf_num_pages,
+				      VM_MAP, PAGE_KERNEL);
 	} else {
 		sctx->send_max_size = BTRFS_SEND_BUF_SIZE_V1;
+		sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
 	}
-	sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
 	if (!sctx->send_buf) {
 		ret = -ENOMEM;
 		goto out;
@@ -7363,7 +7383,16 @@ long btrfs_ioctl_send(struct file *mnt_file, struct btrfs_ioctl_send_args *arg)
 			fput(sctx->send_filp);
 
 		kvfree(sctx->clone_roots);
-		kvfree(sctx->send_buf);
+		if (sctx->flags & BTRFS_SEND_FLAG_STREAM_V2) {
+			vunmap(sctx->send_buf);
+			for (i = 0; i < send_buf_num_pages; i++) {
+				if (sctx->send_buf_pages[i])
+					__free_page(sctx->send_buf_pages[i]);
+			}
+			kfree(sctx->send_buf_pages);
+		} else {
+			kvfree(sctx->send_buf);
+		}
 
 		name_cache_free(sctx);
 
-- 
2.28.0



* [PATCH 8/9] btrfs: send: send compressed extents with encoded writes
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
                   ` (6 preceding siblings ...)
  2020-08-21  7:39 ` [PATCH 7/9] btrfs: send: allocate send buffer with alloc_page() and vmap() for v2 Omar Sandoval
@ 2020-08-21  7:39 ` Omar Sandoval
  2020-08-24 17:32   ` Josef Bacik
  2020-08-21  7:39 ` [PATCH 9/9] btrfs: send: enable support for stream v2 and compressed writes Omar Sandoval
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Omar Sandoval <osandov@fb.com>

Now that all of the pieces are in place, we can use the ENCODED_WRITE
command to send compressed extents when appropriate.
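
One detail worth illustrating: in send_encoded_extent(), the command
header and attributes sit at the start of the send buffer while the
compressed data is read in at a page-aligned offset, so the CRC is
computed in two chained steps that skip the gap. The userspace sketch
below shows that chaining the CRC over the two regions is equivalent to a
CRC over the concatenated bytes. crc32c_update() is a plain bitwise
CRC32C standing in for btrfs_crc32c(), and crc_header_and_data() is a
hypothetical helper name; neither is kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78) with a
 * caller-supplied seed and no final inversion, so calls can be chained.
 */
static uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/*
 * CRC over buf[0..hdr_len) followed by buf[data_offset..data_offset+data_len),
 * skipping the padding in between.
 */
static uint32_t crc_header_and_data(const unsigned char *buf, size_t hdr_len,
				    size_t data_offset, size_t data_len)
{
	uint32_t crc;

	assert(data_offset >= hdr_len);
	crc = crc32c_update(0, buf, hdr_len);
	return crc32c_update(crc, buf + data_offset, data_len);
}
```

The receiver sees the two regions back to back in the stream, so it can
verify the checksum with a single pass over the whole command.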

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/ctree.h |   4 +
 fs/btrfs/inode.c |   6 +-
 fs/btrfs/send.c  | 230 +++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 220 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 744f4212b5f7..918ae5471994 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3020,6 +3020,10 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
 int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end);
 void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
 					  u64 end, int uptodate);
+int encoded_iov_compression_from_btrfs(unsigned int compress_type);
+int btrfs_encoded_read_regular_fill_pages(struct inode *inode, u64 offset,
+					  u64 disk_io_size,
+					  struct page **pages);
 ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter);
 ssize_t btrfs_encoded_write(struct kiocb *iocb, struct iov_iter *from,
 			    struct encoded_iov *encoded);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 174889774b10..e7cc966d7cf8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9818,7 +9818,7 @@ void btrfs_set_range_writeback(struct extent_io_tree *tree, u64 start, u64 end)
 	}
 }
 
-static int encoded_iov_compression_from_btrfs(unsigned int compress_type)
+int encoded_iov_compression_from_btrfs(unsigned int compress_type)
 {
 	switch (compress_type) {
 	case BTRFS_COMPRESS_NONE:
@@ -10019,8 +10019,8 @@ static void btrfs_encoded_read_endio(struct bio *bio)
 	bio_put(bio);
 }
 
-static int btrfs_encoded_read_regular_fill_pages(struct inode *inode, u64 offset,
-						 u64 disk_io_size, struct page **pages)
+int btrfs_encoded_read_regular_fill_pages(struct inode *inode, u64 offset,
+					  u64 disk_io_size, struct page **pages)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_encoded_read_private priv = {
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index efa6f8f27e4d..df6882b3ab2b 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -594,6 +594,7 @@ static int tlv_put(struct send_ctx *sctx, u16 attr, const void *data, int len)
 		return tlv_put(sctx, attr, &__tmp, sizeof(__tmp));	\
 	}
 
+TLV_PUT_DEFINE_INT(32)
 TLV_PUT_DEFINE_INT(64)
 
 static int tlv_put_string(struct send_ctx *sctx, u16 attr,
@@ -5097,16 +5098,211 @@ static int send_hole(struct send_ctx *sctx, u64 end)
 	return ret;
 }
 
-static int send_extent_data(struct send_ctx *sctx,
-			    const u64 offset,
-			    const u64 len)
+static int send_encoded_inline_extent(struct send_ctx *sctx,
+				      struct btrfs_path *path, u64 offset,
+				      u64 len)
 {
+	struct btrfs_root *root = sctx->send_root;
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct inode *inode;
+	struct fs_path *p;
+	struct extent_buffer *leaf = path->nodes[0];
+	struct btrfs_key key;
+	struct btrfs_file_extent_item *ei;
+	u64 ram_bytes;
+	size_t inline_size;
+	int ret;
+
+	inode = btrfs_iget(fs_info->sb, sctx->cur_ino, root);
+	if (IS_ERR(inode))
+		return PTR_ERR(inode);
+
+	p = fs_path_alloc();
+	if (!p) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = begin_cmd(sctx, BTRFS_SEND_C_ENCODED_WRITE);
+	if (ret < 0)
+		goto out;
+
+	ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p);
+	if (ret < 0)
+		goto out;
+
+	btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
+	ei = btrfs_item_ptr(leaf, path->slots[0],
+			    struct btrfs_file_extent_item);
+	ram_bytes = btrfs_file_extent_ram_bytes(leaf, ei);
+	inline_size = btrfs_file_extent_inline_item_len(leaf,
+						btrfs_item_nr(path->slots[0]));
+
+	TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p);
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset);
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_UNENCODED_FILE_LEN,
+		    min(key.offset + ram_bytes - offset, len));
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_UNENCODED_LEN, ram_bytes);
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_UNENCODED_OFFSET, offset - key.offset);
+	ret = encoded_iov_compression_from_btrfs(
+				btrfs_file_extent_compression(leaf, ei));
+	if (ret < 0)
+		goto out;
+	TLV_PUT_U32(sctx, BTRFS_SEND_A_COMPRESSION, ret);
+	TLV_PUT_U32(sctx, BTRFS_SEND_A_ENCRYPTION, 0);
+
+	ret = put_data_header(sctx, inline_size);
+	if (ret < 0)
+		goto out;
+	read_extent_buffer(leaf, sctx->send_buf + sctx->send_size,
+			   btrfs_file_extent_inline_start(ei), inline_size);
+	sctx->send_size += inline_size;
+
+	ret = send_cmd(sctx);
+
+tlv_put_failure:
+out:
+	fs_path_free(p);
+	iput(inode);
+	return ret;
+}
+
+static int send_encoded_extent(struct send_ctx *sctx, struct btrfs_path *path,
+			       u64 offset, u64 len)
+{
+	struct btrfs_root *root = sctx->send_root;
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct inode *inode;
+	struct fs_path *p;
+	struct extent_buffer *leaf = path->nodes[0];
+	struct btrfs_key key;
+	struct btrfs_file_extent_item *ei;
+	u64 block_start;
+	u64 block_len;
+	u32 data_offset;
+	struct btrfs_cmd_header *hdr;
+	u32 crc;
+	int ret;
+
+	inode = btrfs_iget(fs_info->sb, sctx->cur_ino, root);
+	if (IS_ERR(inode))
+		return PTR_ERR(inode);
+
+	p = fs_path_alloc();
+	if (!p) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = begin_cmd(sctx, BTRFS_SEND_C_ENCODED_WRITE);
+	if (ret < 0)
+		goto out;
+
+	ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p);
+	if (ret < 0)
+		goto out;
+
+	btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
+	ei = btrfs_item_ptr(leaf, path->slots[0],
+			    struct btrfs_file_extent_item);
+	block_start = btrfs_file_extent_disk_bytenr(leaf, ei);
+	block_len = btrfs_file_extent_disk_num_bytes(leaf, ei);
+
+	TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p);
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset);
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_UNENCODED_FILE_LEN,
+		    min(key.offset + btrfs_file_extent_num_bytes(leaf, ei) - offset,
+			len));
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_UNENCODED_LEN,
+		    btrfs_file_extent_ram_bytes(leaf, ei));
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_UNENCODED_OFFSET,
+		    offset - key.offset + btrfs_file_extent_offset(leaf, ei));
+	ret = encoded_iov_compression_from_btrfs(
+				btrfs_file_extent_compression(leaf, ei));
+	if (ret < 0)
+		goto out;
+	TLV_PUT_U32(sctx, BTRFS_SEND_A_COMPRESSION, ret);
+	TLV_PUT_U32(sctx, BTRFS_SEND_A_ENCRYPTION, 0);
+
+	ret = put_data_header(sctx, block_len);
+	if (ret < 0)
+		goto out;
+
+	data_offset = ALIGN(sctx->send_size, PAGE_SIZE);
+	if (data_offset > sctx->send_max_size ||
+	    sctx->send_max_size - data_offset < block_len) {
+		ret = -EOVERFLOW;
+		goto out;
+	}
+
+	ret = btrfs_encoded_read_regular_fill_pages(inode, block_start,
+						    block_len,
+						    sctx->send_buf_pages +
+						    (data_offset >> PAGE_SHIFT));
+	if (ret)
+		goto out;
+
+	hdr = (struct btrfs_cmd_header *)sctx->send_buf;
+	hdr->len = cpu_to_le32(sctx->send_size + block_len - sizeof(*hdr));
+	hdr->crc = 0;
+	crc = btrfs_crc32c(0, sctx->send_buf, sctx->send_size);
+	crc = btrfs_crc32c(crc, sctx->send_buf + data_offset, block_len);
+	hdr->crc = cpu_to_le32(crc);
+
+	ret = write_buf(sctx->send_filp, sctx->send_buf, sctx->send_size,
+			&sctx->send_off);
+	if (!ret) {
+		ret = write_buf(sctx->send_filp, sctx->send_buf + data_offset,
+				block_len, &sctx->send_off);
+	}
+	sctx->total_send_size += sctx->send_size + block_len;
+	sctx->cmd_send_size[le16_to_cpu(hdr->cmd)] +=
+		sctx->send_size + block_len;
+	sctx->send_size = 0;
+
+tlv_put_failure:
+out:
+	fs_path_free(p);
+	iput(inode);
+	return ret;
+}
+
+static int send_extent_data(struct send_ctx *sctx, struct btrfs_path *path,
+			    const u64 offset, const u64 len)
+{
+	struct extent_buffer *leaf = path->nodes[0];
+	struct btrfs_file_extent_item *ei;
 	u64 read_size = max_send_read_size(sctx);
 	u64 sent = 0;
 
 	if (sctx->flags & BTRFS_SEND_FLAG_NO_FILE_DATA)
 		return send_update_extent(sctx, offset, len);
 
+	ei = btrfs_item_ptr(leaf, path->slots[0],
+			    struct btrfs_file_extent_item);
+	if ((sctx->flags & BTRFS_SEND_FLAG_COMPRESSED) &&
+	    btrfs_file_extent_compression(leaf, ei) != BTRFS_COMPRESS_NONE) {
+		bool is_inline = (btrfs_file_extent_type(leaf, ei) ==
+				  BTRFS_FILE_EXTENT_INLINE);
+
+		/*
+		 * Send the compressed extent unless the compressed data is
+		 * larger than the decompressed data. This can happen if we're
+		 * not sending the entire extent, either because it has been
+		 * partially overwritten/truncated or because this is a part of
+		 * the extent that we couldn't clone in clone_range().
+		 */
+		if (is_inline &&
+		    btrfs_file_extent_inline_item_len(leaf,
+					btrfs_item_nr(path->slots[0])) <= len) {
+			return send_encoded_inline_extent(sctx, path, offset,
+							  len);
+		} else if (!is_inline &&
+			   btrfs_file_extent_disk_num_bytes(leaf, ei) <= len) {
+			return send_encoded_extent(sctx, path, offset, len);
+		}
+	}
+
 	while (sent < len) {
 		u64 size = min(len - sent, read_size);
 		int ret;
@@ -5177,12 +5373,9 @@ static int send_capabilities(struct send_ctx *sctx)
 	return ret;
 }
 
-static int clone_range(struct send_ctx *sctx,
-		       struct clone_root *clone_root,
-		       const u64 disk_byte,
-		       u64 data_offset,
-		       u64 offset,
-		       u64 len)
+static int clone_range(struct send_ctx *sctx, struct btrfs_path *dst_path,
+		       struct clone_root *clone_root, const u64 disk_byte,
+		       u64 data_offset, u64 offset, u64 len)
 {
 	struct btrfs_path *path;
 	struct btrfs_key key;
@@ -5206,7 +5399,7 @@ static int clone_range(struct send_ctx *sctx,
 	 */
 	if (clone_root->offset == 0 &&
 	    len == sctx->send_root->fs_info->sectorsize)
-		return send_extent_data(sctx, offset, len);
+		return send_extent_data(sctx, dst_path, offset, len);
 
 	path = alloc_path_for_send();
 	if (!path)
@@ -5303,7 +5496,8 @@ static int clone_range(struct send_ctx *sctx,
 
 			if (hole_len > len)
 				hole_len = len;
-			ret = send_extent_data(sctx, offset, hole_len);
+			ret = send_extent_data(sctx, dst_path, offset,
+					       hole_len);
 			if (ret < 0)
 				goto out;
 
@@ -5376,14 +5570,16 @@ static int clone_range(struct send_ctx *sctx,
 					if (ret < 0)
 						goto out;
 				}
-				ret = send_extent_data(sctx, offset + slen,
+				ret = send_extent_data(sctx, dst_path,
+						       offset + slen,
 						       clone_len - slen);
 			} else {
 				ret = send_clone(sctx, offset, clone_len,
 						 clone_root);
 			}
 		} else {
-			ret = send_extent_data(sctx, offset, clone_len);
+			ret = send_extent_data(sctx, dst_path, offset,
+					       clone_len);
 		}
 
 		if (ret < 0)
@@ -5400,7 +5596,7 @@ static int clone_range(struct send_ctx *sctx,
 	}
 
 	if (len > 0)
-		ret = send_extent_data(sctx, offset, len);
+		ret = send_extent_data(sctx, dst_path, offset, len);
 	else
 		ret = 0;
 out:
@@ -5431,10 +5627,10 @@ static int send_write_or_clone(struct send_ctx *sctx,
 				    struct btrfs_file_extent_item);
 		disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
 		data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
-		ret = clone_range(sctx, clone_root, disk_byte, data_offset,
-				  offset, end - offset);
+		ret = clone_range(sctx, path, clone_root, disk_byte,
+				  data_offset, offset, end - offset);
 	} else {
-		ret = send_extent_data(sctx, offset, end - offset);
+		ret = send_extent_data(sctx, path, offset, end - offset);
 	}
 	sctx->cur_inode_next_write_offset = end;
 	return ret;
-- 
2.28.0



* [PATCH 9/9] btrfs: send: enable support for stream v2 and compressed writes
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
                   ` (7 preceding siblings ...)
  2020-08-21  7:39 ` [PATCH 8/9] btrfs: send: send compressed extents with encoded writes Omar Sandoval
@ 2020-08-21  7:39 ` Omar Sandoval
  2020-08-21  7:40 ` [PATCH 01/11] btrfs-progs: receive: support v2 send stream larger tlv_len Omar Sandoval
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Omar Sandoval <osandov@fb.com>

Now that the new support is implemented, allow the ioctl to accept the
flags and update the version in sysfs.
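
For reference, the flag rules this patch enforces can be sketched in
userspace as follows. The bit values below are placeholders chosen for
illustration (the real definitions live in include/uapi/linux/btrfs.h),
and check_send_flags() and stream_version() are hypothetical helper
names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Placeholder bit values, NOT the real uapi definitions. */
#define SEND_FLAG_STREAM_V2  (1ULL << 3)
#define SEND_FLAG_COMPRESSED (1ULL << 4)

/* Compressed (encoded) writes only exist in stream v2. */
static int check_send_flags(uint64_t flags)
{
	if ((flags & SEND_FLAG_COMPRESSED) && !(flags & SEND_FLAG_STREAM_V2))
		return -EINVAL;
	return 0;
}

/* Version written into the stream header. */
static uint32_t stream_version(uint64_t flags)
{
	return (flags & SEND_FLAG_STREAM_V2) ? 2 : 1;
}

/* Small usage demo of the two helpers. */
static void send_flags_example(void)
{
	assert(check_send_flags(SEND_FLAG_STREAM_V2 | SEND_FLAG_COMPRESSED) == 0);
	assert(stream_version(0) == 1);
}
```

Requesting v2 without compression remains valid; only compression without
v2 is rejected.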

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/send.c            | 10 +++++++++-
 fs/btrfs/send.h            |  2 +-
 include/uapi/linux/btrfs.h |  4 +++-
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index df6882b3ab2b..e87dea7bd915 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -670,7 +670,10 @@ static int send_header(struct send_ctx *sctx)
 	struct btrfs_stream_header hdr;
 
 	strcpy(hdr.magic, BTRFS_SEND_STREAM_MAGIC);
-	hdr.version = cpu_to_le32(BTRFS_SEND_STREAM_VERSION);
+	if (sctx->flags & BTRFS_SEND_FLAG_STREAM_V2)
+		hdr.version = cpu_to_le32(2);
+	else
+		hdr.version = cpu_to_le32(1);
 
 	return write_buf(sctx->send_filp, &hdr, sizeof(hdr),
 					&sctx->send_off);
@@ -7315,6 +7318,11 @@ long btrfs_ioctl_send(struct file *mnt_file, struct btrfs_ioctl_send_args *arg)
 		ret = -EINVAL;
 		goto out;
 	}
+	if ((arg->flags & BTRFS_SEND_FLAG_COMPRESSED) &&
+	    !(arg->flags & BTRFS_SEND_FLAG_STREAM_V2)) {
+		ret = -EINVAL;
+		goto out;
+	}
 
 	sctx = kzalloc(sizeof(struct send_ctx), GFP_KERNEL);
 	if (!sctx) {
diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
index 9f4f7b96b1eb..9c83e14a43b2 100644
--- a/fs/btrfs/send.h
+++ b/fs/btrfs/send.h
@@ -10,7 +10,7 @@
 #include "ctree.h"
 
 #define BTRFS_SEND_STREAM_MAGIC "btrfs-stream"
-#define BTRFS_SEND_STREAM_VERSION 1
+#define BTRFS_SEND_STREAM_VERSION 2
 
 /*
  * In send stream v1, no command is larger than 64k. In send stream v2, no limit
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 51e69f28d22d..6f29c456e4d7 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -785,7 +785,9 @@ struct btrfs_ioctl_received_subvol_args {
 #define BTRFS_SEND_FLAG_MASK \
 	(BTRFS_SEND_FLAG_NO_FILE_DATA | \
 	 BTRFS_SEND_FLAG_OMIT_STREAM_HEADER | \
-	 BTRFS_SEND_FLAG_OMIT_END_CMD)
+	 BTRFS_SEND_FLAG_OMIT_END_CMD | \
+	 BTRFS_SEND_FLAG_STREAM_V2 | \
+	 BTRFS_SEND_FLAG_COMPRESSED)
 
 struct btrfs_ioctl_send_args {
 	__s64 send_fd;			/* in */
-- 
2.28.0



* [PATCH 01/11] btrfs-progs: receive: support v2 send stream larger tlv_len
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
                   ` (8 preceding siblings ...)
  2020-08-21  7:39 ` [PATCH 9/9] btrfs: send: enable support for stream v2 and compressed writes Omar Sandoval
@ 2020-08-21  7:40 ` Omar Sandoval
  2020-08-21  7:40 ` [PATCH 02/11] btrfs-progs: receive: dynamically allocate sctx->read_buf Omar Sandoval
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:40 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Boris Burkov <borisb@fb.com>

An encoded extent can be up to 128K in length, which exceeds the largest
value expressible by the current send stream format's 16 bit tlv_len
field. Since encoded writes cannot be split into multiple writes by
btrfs send, the send stream format must change to accommodate encoded
writes.

Supporting this changed format requires retooling how we store the
commands we have processed. Since we can no longer use btrfs_tlv_header
to describe every attribute, we define a new struct btrfs_send_attribute
which has a 32 bit length field, and use that to store the attribute
information needed for receive processing. This is transparent to users
of the various TLV_GET macros.
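
To make the size problem concrete, here is a tiny sketch showing that a
128K length does not fit in a 16 bit field and would silently truncate to
zero. ENCODED_EXTENT_MAX and fits_in_tlv_len16() are hypothetical names
used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest encoded extent mentioned above: 128K. */
#define ENCODED_EXTENT_MAX (128u * 1024u)

/* True if len can be stored in the v1 format's 16 bit tlv_len. */
static bool fits_in_tlv_len16(uint32_t len)
{
	return len <= UINT16_MAX;
}

/* What a v1-style 16 bit field would actually hold for a given length. */
static uint16_t truncate_to_tlv_len16(uint32_t len)
{
	return (uint16_t)len;
}
```

128K is 0x20000, so its low 16 bits are all zero; storing it in a 16 bit
tlv_len would yield a length of 0.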

Signed-off-by: Boris Burkov <boris@bur.io>
---
 common/send-stream.c | 34 +++++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/common/send-stream.c b/common/send-stream.c
index 69d75168..3bd21d3f 100644
--- a/common/send-stream.c
+++ b/common/send-stream.c
@@ -24,13 +24,23 @@
 #include "crypto/crc32c.h"
 #include "common/utils.h"
 
+struct btrfs_send_attribute {
+	u16 tlv_type;
+	/*
+	 * Note: in btrfs_tlv_header, this is __le16, but we need 32 bits for
+	 * attributes with file data as of version 2 of the send stream format
+	 */
+	u32 tlv_len;
+	char *data;
+};
+
 struct btrfs_send_stream {
 	int fd;
 	char read_buf[BTRFS_SEND_BUF_SIZE];
 
 	int cmd;
 	struct btrfs_cmd_header *cmd_hdr;
-	struct btrfs_tlv_header *cmd_attrs[BTRFS_SEND_A_MAX + 1];
+	struct btrfs_send_attribute cmd_attrs[BTRFS_SEND_A_MAX + 1];
 	u32 version;
 
 	/*
@@ -152,6 +162,7 @@ static int read_cmd(struct btrfs_send_stream *sctx)
 		struct btrfs_tlv_header *tlv_hdr;
 		u16 tlv_type;
 		u16 tlv_len;
+		struct btrfs_send_attribute *send_attr;
 
 		tlv_hdr = (struct btrfs_tlv_header *)data;
 		tlv_type = le16_to_cpu(tlv_hdr->tlv_type);
@@ -164,10 +175,15 @@ static int read_cmd(struct btrfs_send_stream *sctx)
 			goto out;
 		}
 
-		sctx->cmd_attrs[tlv_type] = tlv_hdr;
+		send_attr = &sctx->cmd_attrs[tlv_type];
+		send_attr->tlv_type = tlv_type;
+		send_attr->tlv_len = tlv_len;
+		pos += sizeof(*tlv_hdr);
+		data += sizeof(*tlv_hdr);
 
-		data += sizeof(*tlv_hdr) + tlv_len;
-		pos += sizeof(*tlv_hdr) + tlv_len;
+		send_attr->data = data;
+		pos += send_attr->tlv_len;
+		data += send_attr->tlv_len;
 	}
 
 	sctx->cmd = cmd;
@@ -180,7 +196,7 @@ out:
 static int tlv_get(struct btrfs_send_stream *sctx, int attr, void **data, int *len)
 {
 	int ret;
-	struct btrfs_tlv_header *hdr;
+	struct btrfs_send_attribute *send_attr;
 
 	if (attr <= 0 || attr > BTRFS_SEND_A_MAX) {
 		error("invalid attribute requested, attr = %d", attr);
@@ -188,15 +204,15 @@ static int tlv_get(struct btrfs_send_stream *sctx, int attr, void **data, int *l
 		goto out;
 	}
 
-	hdr = sctx->cmd_attrs[attr];
-	if (!hdr) {
+	send_attr = &sctx->cmd_attrs[attr];
+	if (!send_attr->data) {
 		error("attribute %d requested but not present", attr);
 		ret = -ENOENT;
 		goto out;
 	}
 
-	*len = le16_to_cpu(hdr->tlv_len);
-	*data = hdr + 1;
+	*len = send_attr->tlv_len;
+	*data = send_attr->data;
 
 	ret = 0;
 
-- 
2.28.0



* [PATCH 02/11] btrfs-progs: receive: dynamically allocate sctx->read_buf
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
                   ` (9 preceding siblings ...)
  2020-08-21  7:40 ` [PATCH 01/11] btrfs-progs: receive: support v2 send stream larger tlv_len Omar Sandoval
@ 2020-08-21  7:40 ` Omar Sandoval
  2020-08-21  7:40 ` [PATCH 03/11] btrfs-progs: receive: support v2 send stream DATA tlv format Omar Sandoval
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:40 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Boris Burkov <boris@bur.io>

In send stream v2, write commands can now be an arbitrary size. For that
reason, we can no longer allocate a fixed array in sctx for read_cmd.
Instead, read_cmd dynamically allocates sctx->read_buf. To avoid
needless reallocations, we reuse read_buf between read_cmd calls by also
keeping track of the size of the allocated buffer in sctx->read_buf_sz.

We do the first allocation of the old default size at the start of
processing the stream, and we only reallocate if we encounter a command
that needs a larger buffer.
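
The grow-only reuse pattern can be sketched in isolation like this.
struct read_buffer and read_buffer_reserve() are hypothetical names, not
part of btrfs-progs. Note that this sketch deliberately goes through a
temporary pointer so the old buffer is not lost if realloc() fails, which
is a slightly different choice from assigning the result of realloc()
straight back to the buffer pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the read_buf/read_buf_sz pair in the stream. */
struct read_buffer {
	char *buf;
	size_t size;
};

/*
 * Make sure rb can hold at least `needed` bytes. The buffer only ever
 * grows, so a run of small commands after one large command causes no
 * further reallocations.
 */
static int read_buffer_reserve(struct read_buffer *rb, size_t needed)
{
	char *tmp;

	assert(rb != NULL);
	if (rb->size >= needed)
		return 0;

	tmp = realloc(rb->buf, needed);
	if (!tmp)
		return -ENOMEM;	/* rb->buf is still valid and still owned by rb */

	rb->buf = tmp;
	rb->size = needed;
	return 0;
}
```

Any pointer into the old buffer (such as a command header pointer) has to
be recomputed after a successful grow, since realloc() may move the data.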

Signed-off-by: Boris Burkov <boris@bur.io>
---
 common/send-stream.c | 55 ++++++++++++++++++++++++++++----------------
 send.h               |  2 +-
 2 files changed, 36 insertions(+), 21 deletions(-)

diff --git a/common/send-stream.c b/common/send-stream.c
index 3bd21d3f..51a6a94a 100644
--- a/common/send-stream.c
+++ b/common/send-stream.c
@@ -36,10 +36,10 @@ struct btrfs_send_attribute {
 
 struct btrfs_send_stream {
 	int fd;
-	char read_buf[BTRFS_SEND_BUF_SIZE];
+	char *read_buf;
+	size_t read_buf_sz;
 
 	int cmd;
-	struct btrfs_cmd_header *cmd_hdr;
 	struct btrfs_send_attribute cmd_attrs[BTRFS_SEND_A_MAX + 1];
 	u32 version;
 
@@ -111,11 +111,12 @@ static int read_cmd(struct btrfs_send_stream *sctx)
 	u32 pos;
 	u32 crc;
 	u32 crc2;
+	struct btrfs_cmd_header *cmd_hdr;
+	size_t buf_len;
 
 	memset(sctx->cmd_attrs, 0, sizeof(sctx->cmd_attrs));
 
-	ASSERT(sizeof(*sctx->cmd_hdr) <= sizeof(sctx->read_buf));
-	ret = read_buf(sctx, sctx->read_buf, sizeof(*sctx->cmd_hdr));
+	ret = read_buf(sctx, sctx->read_buf, sizeof(*cmd_hdr));
 	if (ret < 0)
 		goto out;
 	if (ret) {
@@ -124,18 +125,22 @@ static int read_cmd(struct btrfs_send_stream *sctx)
 		goto out;
 	}
 
-	sctx->cmd_hdr = (struct btrfs_cmd_header *)sctx->read_buf;
-	cmd = le16_to_cpu(sctx->cmd_hdr->cmd);
-	cmd_len = le32_to_cpu(sctx->cmd_hdr->len);
-
-	if (cmd_len + sizeof(*sctx->cmd_hdr) >= sizeof(sctx->read_buf)) {
-		ret = -EINVAL;
-		error("command length %u too big for buffer %zu",
-				cmd_len, sizeof(sctx->read_buf));
-		goto out;
+	cmd_hdr = (struct btrfs_cmd_header *)sctx->read_buf;
+	cmd_len = le32_to_cpu(cmd_hdr->len);
+	cmd = le16_to_cpu(cmd_hdr->cmd);
+	buf_len = sizeof(*cmd_hdr) + cmd_len;
+	if (sctx->read_buf_sz < buf_len) {
+		sctx->read_buf = realloc(sctx->read_buf, buf_len);
+		if (!sctx->read_buf) {
+			ret = -ENOMEM;
+			error("failed to reallocate read buffer for cmd");
+			goto out;
+		}
+		sctx->read_buf_sz = buf_len;
+		/* We need to reset cmd_hdr after realloc of sctx->read_buf */
+		cmd_hdr = (struct btrfs_cmd_header *)sctx->read_buf;
 	}
-
-	data = sctx->read_buf + sizeof(*sctx->cmd_hdr);
+	data = sctx->read_buf + sizeof(*cmd_hdr);
 	ret = read_buf(sctx, data, cmd_len);
 	if (ret < 0)
 		goto out;
@@ -145,11 +150,12 @@ static int read_cmd(struct btrfs_send_stream *sctx)
 		goto out;
 	}
 
-	crc = le32_to_cpu(sctx->cmd_hdr->crc);
-	sctx->cmd_hdr->crc = 0;
+	crc = le32_to_cpu(cmd_hdr->crc);
+	/* in send, crc is computed with header crc = 0, replicate that */
+	cmd_hdr->crc = 0;
 
 	crc2 = crc32c(0, (unsigned char*)sctx->read_buf,
-			sizeof(*sctx->cmd_hdr) + cmd_len);
+			sizeof(*cmd_hdr) + cmd_len);
 
 	if (crc != crc2) {
 		ret = -EINVAL;
@@ -524,19 +530,28 @@ int btrfs_read_and_process_send_stream(int fd,
 		goto out;
 	}
 
+	sctx.read_buf = malloc(BTRFS_SEND_BUF_SIZE_V1);
+	if (!sctx.read_buf) {
+		ret = -ENOMEM;
+		error("unable to allocate send stream read buffer");
+		goto out;
+	}
+	sctx.read_buf_sz = BTRFS_SEND_BUF_SIZE_V1;
+
 	while (1) {
 		ret = read_and_process_cmd(&sctx);
 		if (ret < 0) {
 			last_err = ret;
 			errors++;
 			if (max_errors > 0 && errors >= max_errors)
-				goto out;
+				break;
 		} else if (ret > 0) {
 			if (!honor_end_cmd)
 				ret = 0;
-			goto out;
+			break;
 		}
 	}
+	free(sctx.read_buf);
 
 out:
 	if (last_err && !ret)
diff --git a/send.h b/send.h
index 8dd865ec..228928a0 100644
--- a/send.h
+++ b/send.h
@@ -33,7 +33,7 @@ extern "C" {
 #define BTRFS_SEND_STREAM_MAGIC "btrfs-stream"
 #define BTRFS_SEND_STREAM_VERSION 1
 
-#define BTRFS_SEND_BUF_SIZE SZ_64K
+#define BTRFS_SEND_BUF_SIZE_V1 SZ_64K
 #define BTRFS_SEND_READ_SIZE (1024 * 48)
 
 enum btrfs_tlv_type {
-- 
2.28.0



* [PATCH 03/11] btrfs-progs: receive: support v2 send stream DATA tlv format
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
                   ` (10 preceding siblings ...)
  2020-08-21  7:40 ` [PATCH 02/11] btrfs-progs: receive: dynamically allocate sctx->read_buf Omar Sandoval
@ 2020-08-21  7:40 ` Omar Sandoval
  2020-08-21  7:40 ` [PATCH 04/11] btrfs-progs: receive: add send stream v2 cmds and attrs to send.h Omar Sandoval
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:40 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Boris Burkov <borisb@fb.com>

The new format privileges the BTRFS_SEND_A_DATA attribute by
guaranteeing it will always be the last attribute in any command that
needs it, and by implicitly encoding the data length as the difference
between the total command length in the command header and the sizes of
the rest of the attributes (and of course the tlv_type identifying the
DATA attribute). To parse the new stream, we read the tlv_type first. If
it is not DATA, we proceed normally and read a tlv_len. If it is DATA, we
don't parse a tlv_len; we simply compute the length from what remains of
the command.

In addition, we add some bounds checking when parsing each chunk of
data, as well as for the tlv_len itself.
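
A standalone sketch of the parsing rule, for illustration: parse_attr()
and struct attr are hypothetical names, and ATTR_DATA is assumed to carry
the same value as BTRFS_SEND_A_DATA (19, as far as I can tell from the
enum order). The bounds checks follow the ones described above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ATTR_DATA 19	/* assumed value of BTRFS_SEND_A_DATA */

struct attr {
	uint16_t type;
	uint32_t len;
	const unsigned char *data;
};

/* Read a little-endian 16 bit value without alignment assumptions. */
static uint16_t get_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Parse one attribute starting at payload[*pos], where payload is the
 * command body (after the command header) and cmd_len is its length.
 * Returns 0 on success or -EINVAL if the stream is truncated.
 */
static int parse_attr(const unsigned char *payload, uint32_t cmd_len,
		      uint32_t *pos, uint32_t version, struct attr *out)
{
	assert(*pos <= cmd_len);
	if (cmd_len - *pos < 2)
		return -EINVAL;
	out->type = get_le16(payload + *pos);
	*pos += 2;

	if (version == 2 && out->type == ATTR_DATA) {
		/* v2 DATA: no tlv_len, the data runs to the end of the command */
		out->len = cmd_len - *pos;
	} else {
		if (cmd_len - *pos < 2)
			return -EINVAL;
		out->len = get_le16(payload + *pos);
		*pos += 2;
	}
	if (cmd_len - *pos < out->len)
		return -EINVAL;

	out->data = payload + *pos;
	*pos += out->len;
	return 0;
}
```

Because the DATA length is implicit, a v1 parser fed a v2 DATA attribute
would misread the first two data bytes as a length, which is why the
version check matters.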

Signed-off-by: Boris Burkov <boris@bur.io>
---
 common/send-stream.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/common/send-stream.c b/common/send-stream.c
index 51a6a94a..77d5cd04 100644
--- a/common/send-stream.c
+++ b/common/send-stream.c
@@ -165,28 +165,44 @@ static int read_cmd(struct btrfs_send_stream *sctx)
 
 	pos = 0;
 	while (pos < cmd_len) {
-		struct btrfs_tlv_header *tlv_hdr;
 		u16 tlv_type;
-		u16 tlv_len;
 		struct btrfs_send_attribute *send_attr;
 
-		tlv_hdr = (struct btrfs_tlv_header *)data;
-		tlv_type = le16_to_cpu(tlv_hdr->tlv_type);
-		tlv_len = le16_to_cpu(tlv_hdr->tlv_len);
+		if (cmd_len - pos < sizeof(__le16)) {
+			error("send stream is truncated");
+			ret = -EINVAL;
+			goto out;
+		}
+		tlv_type = le16_to_cpu(*(__le16 *)data);
 
 		if (tlv_type == 0 || tlv_type > BTRFS_SEND_A_MAX) {
-			error("invalid tlv in cmd tlv_type = %hu, tlv_len = %hu",
-					tlv_type, tlv_len);
+			error("invalid tlv in cmd tlv_type = %hu", tlv_type);
 			ret = -EINVAL;
 			goto out;
 		}
 
 		send_attr = &sctx->cmd_attrs[tlv_type];
 		send_attr->tlv_type = tlv_type;
-		send_attr->tlv_len = tlv_len;
-		pos += sizeof(*tlv_hdr);
-		data += sizeof(*tlv_hdr);
 
+		pos += sizeof(tlv_type);
+		data += sizeof(tlv_type);
+		if (sctx->version == 2 && tlv_type == BTRFS_SEND_A_DATA) {
+			send_attr->tlv_len = cmd_len - pos;
+		} else {
+			if (cmd_len - pos < sizeof(__le16)) {
+				error("send stream is truncated");
+				ret = -EINVAL;
+				goto out;
+			}
+			send_attr->tlv_len = le16_to_cpu(*(__le16 *)data);
+			pos += sizeof(__le16);
+			data += sizeof(__le16);
+		}
+		if (cmd_len - pos < send_attr->tlv_len) {
+			error("send stream is truncated");
+			ret = -EINVAL;
+			goto out;
+		}
 		send_attr->data = data;
 		pos += send_attr->tlv_len;
 		data += send_attr->tlv_len;
-- 
2.28.0



* [PATCH 04/11] btrfs-progs: receive: add send stream v2 cmds and attrs to send.h
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
                   ` (11 preceding siblings ...)
  2020-08-21  7:40 ` [PATCH 03/11] btrfs-progs: receive: support v2 send stream DATA tlv format Omar Sandoval
@ 2020-08-21  7:40 ` Omar Sandoval
  2020-08-21  7:40 ` [PATCH 05/11] btrfs-progs: receive: add stub implementation for pwritev2 Omar Sandoval
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:40 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Boris Burkov <boris@bur.io>

Send stream v2 adds three commands and several attributes associated
with those commands. Before we implement processing them, add all the
commands and attributes. This avoids leaving the enums in an
intermediate state that doesn't correspond to any version of send
stream.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 send.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/send.h b/send.h
index 228928a0..3c47e0c7 100644
--- a/send.h
+++ b/send.h
@@ -98,6 +98,11 @@ enum btrfs_send_cmd {
 
 	BTRFS_SEND_C_END,
 	BTRFS_SEND_C_UPDATE_EXTENT,
+
+	BTRFS_SEND_C_FALLOCATE,
+	BTRFS_SEND_C_SETFLAGS,
+	BTRFS_SEND_C_ENCODED_WRITE,
+
 	__BTRFS_SEND_C_MAX,
 };
 #define BTRFS_SEND_C_MAX (__BTRFS_SEND_C_MAX - 1)
@@ -136,6 +141,16 @@ enum {
 	BTRFS_SEND_A_CLONE_OFFSET,
 	BTRFS_SEND_A_CLONE_LEN,
 
+	BTRFS_SEND_A_FALLOCATE_MODE,
+
+	BTRFS_SEND_A_SETFLAGS_FLAGS,
+
+	BTRFS_SEND_A_UNENCODED_FILE_LEN,
+	BTRFS_SEND_A_UNENCODED_LEN,
+	BTRFS_SEND_A_UNENCODED_OFFSET,
+	BTRFS_SEND_A_COMPRESSION,
+	BTRFS_SEND_A_ENCRYPTION,
+
 	__BTRFS_SEND_A_MAX,
 };
 #define BTRFS_SEND_A_MAX (__BTRFS_SEND_A_MAX - 1)
-- 
2.28.0



* [PATCH 05/11] btrfs-progs: receive: add stub implementation for pwritev2
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
                   ` (12 preceding siblings ...)
  2020-08-21  7:40 ` [PATCH 04/11] btrfs-progs: receive: add send stream v2 cmds and attrs to send.h Omar Sandoval
@ 2020-08-21  7:40 ` Omar Sandoval
  2020-08-21  7:40 ` [PATCH 06/11] btrfs-progs: receive: process encoded_write commands Omar Sandoval
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:40 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Boris Burkov <borisb@fb.com>

Encoded writes in receive will use pwritev2. The system libc may not
export this function, so add a stub that invokes the syscall directly
and use autoconf to detect whether the stub needs to be built.

The raw syscall has different semantics on x32: it takes the offset as a
single loff_t rather than as a hi/lo pair, so detect that case and pass
the appropriate arguments.

Signed-off-by: Boris Burkov <boris@bur.io>
---
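The hi/lo split used by the stub can be sketched in isolation. The
helper below is hypothetical (demo_split_offset is not part of this
patch) and takes the width of long as a parameter so that both the
32-bit and 64-bit cases can be checked on any host.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Split a 64-bit file offset into the (lo, hi) pair passed to the raw
 * pwritev2 syscall, for a hypothetical `long` of the given width.
 */
static void demo_split_offset(uint64_t offset, unsigned int bits_per_long,
			      uint64_t *lo, uint64_t *hi)
{
	unsigned int half = bits_per_long / 2;
	uint64_t mask;

	assert(bits_per_long == 32 || bits_per_long == 64);
	mask = bits_per_long == 64 ? UINT64_MAX : ((UINT64_C(1) << 32) - 1);

	/* The low word is whatever fits into one long. */
	*lo = offset & mask;
	/* Two half-width shifts: a single shift by the full width of a
	 * 64-bit long would be undefined behavior. */
	*hi = (offset >> half) >> half;
}
```

With a 64-bit long the high word is always zero, which is why the
double shift matters only for correctness of the expression, not for
the result.
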
 Makefile     |  4 ++--
 configure.ac |  1 +
 stubs.c      | 24 ++++++++++++++++++++++++
 stubs.h      | 11 +++++++++++
 4 files changed, 38 insertions(+), 2 deletions(-)
 create mode 100644 stubs.c
 create mode 100644 stubs.h

diff --git a/Makefile b/Makefile
index c788b91b..11ecfc6c 100644
--- a/Makefile
+++ b/Makefile
@@ -173,12 +173,12 @@ libbtrfs_objects = common/send-stream.o common/send-utils.o kernel-lib/rbtree.o
 		   kernel-lib/raid56.o kernel-lib/tables.o \
 		   common/device-scan.o common/path-utils.o \
 		   common/utils.o libbtrfsutil/subvolume.o libbtrfsutil/stubs.o \
-		   crypto/hash.o crypto/xxhash.o $(CRYPTO_OBJECTS)
+		   crypto/hash.o crypto/xxhash.o $(CRYPTO_OBJECTS) stubs.o
 libbtrfs_headers = common/send-stream.h common/send-utils.h send.h kernel-lib/rbtree.h btrfs-list.h \
 	       crypto/crc32c.h kernel-lib/list.h kerncompat.h \
 	       kernel-lib/radix-tree.h kernel-lib/sizes.h kernel-lib/raid56.h \
 	       common/extent-cache.h kernel-shared/extent_io.h ioctl.h \
-	       kernel-shared/ctree.h btrfsck.h version.h
+	       kernel-shared/ctree.h btrfsck.h version.h stubs.h
 libbtrfsutil_major := $(shell sed -rn 's/^\#define BTRFS_UTIL_VERSION_MAJOR ([0-9])+$$/\1/p' libbtrfsutil/btrfsutil.h)
 libbtrfsutil_minor := $(shell sed -rn 's/^\#define BTRFS_UTIL_VERSION_MINOR ([0-9])+$$/\1/p' libbtrfsutil/btrfsutil.h)
 libbtrfsutil_patch := $(shell sed -rn 's/^\#define BTRFS_UTIL_VERSION_PATCH ([0-9])+$$/\1/p' libbtrfsutil/btrfsutil.h)
diff --git a/configure.ac b/configure.ac
index 7c2c9b8d..cbcfbe6d 100644
--- a/configure.ac
+++ b/configure.ac
@@ -45,6 +45,7 @@ AC_CHECK_FUNCS([openat], [],
 	[AC_MSG_ERROR([cannot find openat() function])])
 
 AC_CHECK_FUNCS([reallocarray])
+AC_CHECK_FUNCS([pwritev2])
 
 m4_ifndef([PKG_PROG_PKG_CONFIG],
   [m4_fatal([Could not locate the pkg-config autoconf
diff --git a/stubs.c b/stubs.c
new file mode 100644
index 00000000..ab68a411
--- /dev/null
+++ b/stubs.c
@@ -0,0 +1,24 @@
+#if HAVE_PWRITEV2 != 1
+
+#include "stubs.h"
+
+#include "kerncompat.h"
+
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/uio.h>
+
+ssize_t pwritev2(int fd, const struct iovec *iov, int iovcnt, off_t offset,
+		 int flags)
+{
+/* these conditions indicate an x32 system, which has a different pwritev2 */
+#if defined(__x86_64__) && defined(__ILP32__)
+	return syscall(SYS_pwritev2, fd, iov, iovcnt, offset, flags);
+#else
+	unsigned long hi = offset >> (BITS_PER_LONG / 2) >> (BITS_PER_LONG / 2);
+	unsigned long lo = offset;
+
+	return syscall(SYS_pwritev2, fd, iov, iovcnt, lo, hi, flags);
+#endif // X32
+}
+#endif /* HAVE_PWRITEV2 */
diff --git a/stubs.h b/stubs.h
new file mode 100644
index 00000000..b39f8a69
--- /dev/null
+++ b/stubs.h
@@ -0,0 +1,11 @@
+#ifndef _BTRFS_STUBS_H
+#define _BTRFS_STUBS_H
+
+#include <sys/types.h>
+
+struct iovec;
+
+ssize_t pwritev2(int fd, const struct iovec *iov, int iovcnt, off_t offset,
+		 int flags);
+
+#endif
-- 
2.28.0



* [PATCH 06/11] btrfs-progs: receive: process encoded_write commands
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
                   ` (13 preceding siblings ...)
  2020-08-21  7:40 ` [PATCH 05/11] btrfs-progs: receive: add stub implementation for pwritev2 Omar Sandoval
@ 2020-08-21  7:40 ` Omar Sandoval
  2020-08-21  7:40 ` [PATCH 07/11] btrfs-progs: receive: encoded_write fallback to explicit decode and write Omar Sandoval
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:40 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Boris Burkov <borisb@fb.com>

Add a new btrfs_send_op for encoded writes, and implement it both for
dumping the stream and for proper receive processing, which performs the
actual encoded writes.

Encoded writes are only allowed on a file descriptor opened with an extra
flag, O_ALLOW_ENCODED, so also add support for setting this flag when
opening or reusing a file for writing.

Signed-off-by: Boris Burkov <boris@bur.io>
---
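The "fix up the flag on a reused fd" step is a read-modify-write of the
file status flags. A minimal sketch follows; demo_add_status_flag is a
hypothetical helper, and since O_ALLOW_ENCODED comes from the proposed
kernel series rather than released headers, any settable status flag
such as O_NONBLOCK can stand in for it when trying this out.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/*
 * Add one file status flag to an already-open descriptor.
 * Returns 0 on success or a negative errno value.
 */
static int demo_add_status_flag(int fd, int flag)
{
	int flags;

	assert(flag != 0);
	flags = fcntl(fd, F_GETFL);
	if (flags < 0)
		return -errno;
	if (flags & flag)
		return 0;	/* already set, nothing to do */
	/* F_SETFL replaces the whole set, so OR the new flag in. */
	if (fcntl(fd, F_SETFL, flags | flag) < 0)
		return -errno;
	return 0;
}
```

Unlike this sketch, the patch also closes the descriptor on failure so
that a later write reopens the file from scratch.
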
 cmds/receive-dump.c  | 16 +++++++-
 cmds/receive.c       | 98 +++++++++++++++++++++++++++++++++++++++++---
 common/send-stream.c | 22 ++++++++++
 common/send-stream.h |  4 ++
 stubs.h              | 39 ++++++++++++++++++
 5 files changed, 173 insertions(+), 6 deletions(-)

diff --git a/cmds/receive-dump.c b/cmds/receive-dump.c
index 648d9314..20ec2b70 100644
--- a/cmds/receive-dump.c
+++ b/cmds/receive-dump.c
@@ -316,6 +316,19 @@ static int print_update_extent(const char *path, u64 offset, u64 len,
 			  offset, len);
 }
 
+static int print_encoded_write(const char *path, const void *data, u64 offset,
+			       u64 len, u64 unencoded_file_len,
+			       u64 unencoded_len, u64 unencoded_offset,
+			       u32 compression, u32 encryption, void *user)
+{
+	return PRINT_DUMP(user, path, "encoded_write",
+			  "offset=%llu len=%llu, unencoded_file_len=%llu, "
+			  "unencoded_len=%llu, unencoded_offset=%llu, "
+			  "compression=%u, encryption=%u",
+			  offset, len, unencoded_file_len, unencoded_len,
+			  unencoded_offset, compression, encryption);
+}
+
 struct btrfs_send_ops btrfs_print_send_ops = {
 	.subvol = print_subvol,
 	.snapshot = print_snapshot,
@@ -337,5 +350,6 @@ struct btrfs_send_ops btrfs_print_send_ops = {
 	.chmod = print_chmod,
 	.chown = print_chown,
 	.utimes = print_utimes,
-	.update_extent = print_update_extent
+	.update_extent = print_update_extent,
+	.encoded_write = print_encoded_write,
 };
diff --git a/cmds/receive.c b/cmds/receive.c
index 2aaba3ff..cd0f47ec 100644
--- a/cmds/receive.c
+++ b/cmds/receive.c
@@ -30,12 +30,14 @@
 #include <assert.h>
 #include <getopt.h>
 #include <limits.h>
+#include <errno.h>
 
 #include <sys/stat.h>
 #include <sys/types.h>
 #include <sys/ioctl.h>
 #include <sys/time.h>
 #include <sys/types.h>
+#include <sys/uio.h>
 #include <sys/xattr.h>
 #include <uuid/uuid.h>
 
@@ -52,6 +54,7 @@
 #include "cmds/receive-dump.h"
 #include "common/help.h"
 #include "common/path-utils.h"
+#include "stubs.h"
 
 struct btrfs_receive
 {
@@ -60,6 +63,7 @@ struct btrfs_receive
 
 	int write_fd;
 	char write_path[PATH_MAX];
+	int write_fd_allow_encoded;
 
 	char *root_path;
 	char *dest_dir_path; /* relative to root_path */
@@ -643,24 +647,65 @@ out:
 	return ret;
 }
 
-static int open_inode_for_write(struct btrfs_receive *rctx, const char *path)
+static int set_write_fd_allow_encoded(struct btrfs_receive *rctx)
+{
+	int ret;
+	int flags;
+
+	flags = fcntl(rctx->write_fd, F_GETFL);
+	if (flags < 0) {
+		ret = -errno;
+		error("failed to fetch old fd flags");
+		goto close_fd;
+	}
+	ret = fcntl(rctx->write_fd, F_SETFL, flags | O_ALLOW_ENCODED);
+	if (ret < 0) {
+		ret = -errno;
+		error("failed to enable encoded writes");
+		goto close_fd;
+	}
+	rctx->write_fd_allow_encoded = true;
+	ret = 0;
+	goto out;
+close_fd:
+	close(rctx->write_fd);
+	rctx->write_fd = -1;
+	rctx->write_fd_allow_encoded = false;
+out:
+	return ret;
+}
+
+static int open_inode_for_write(struct btrfs_receive *rctx, const char *path,
+				bool allow_encoded)
 {
 	int ret = 0;
+	int flags = O_RDWR;
 
 	if (rctx->write_fd != -1) {
-		if (strcmp(rctx->write_path, path) == 0)
+		/*
+		 * if the existing fd is for this path and the needed flags are
+		 * satisfied, no need to open a new one
+		 */
+		if (strcmp(rctx->write_path, path) == 0) {
+			/* fixup the allow encoded flag, if necessary */
+			if (allow_encoded && !rctx->write_fd_allow_encoded)
+				ret = set_write_fd_allow_encoded(rctx);
 			goto out;
+		}
 		close(rctx->write_fd);
 		rctx->write_fd = -1;
 	}
 
-	rctx->write_fd = open(path, O_RDWR);
+	if (allow_encoded)
+		flags |= O_ALLOW_ENCODED;
+	rctx->write_fd = open(path, flags);
 	if (rctx->write_fd < 0) {
 		ret = -errno;
 		error("cannot open %s: %m", path);
 		goto out;
 	}
 	strncpy_null(rctx->write_path, path);
+	rctx->write_fd_allow_encoded = allow_encoded;
 
 out:
 	return ret;
@@ -691,7 +736,7 @@ static int process_write(const char *path, const void *data, u64 offset,
 		goto out;
 	}
 
-	ret = open_inode_for_write(rctx, full_path);
+	ret = open_inode_for_write(rctx, full_path, false);
 	if (ret < 0)
 		goto out;
 
@@ -734,7 +779,7 @@ static int process_clone(const char *path, u64 offset, u64 len,
 		goto out;
 	}
 
-	ret = open_inode_for_write(rctx, full_path);
+	ret = open_inode_for_write(rctx, full_path, false);
 	if (ret < 0)
 		goto out;
 
@@ -1028,6 +1073,48 @@ static int process_update_extent(const char *path, u64 offset, u64 len,
 	return 0;
 }
 
+static int process_encoded_write(const char *path, const void *data, u64 offset,
+	u64 len, u64 unencoded_file_len, u64 unencoded_len,
+	u64 unencoded_offset, u32 compression, u32 encryption, void *user)
+{
+	int ret;
+	struct btrfs_receive *rctx = user;
+	char full_path[PATH_MAX];
+	struct encoded_iov encoded = {
+		.len = unencoded_file_len,
+		.unencoded_len = unencoded_len,
+		.unencoded_offset = unencoded_offset,
+		.compression = compression,
+		.encryption = encryption,
+	};
+	struct iovec iov[2] = {
+		{ &encoded, sizeof(encoded) },
+		{ (char *)data, len }
+	};
+
+	ret = path_cat_out(full_path, rctx->full_subvol_path, path);
+	if (ret < 0) {
+		error("encoded_write: path invalid: %s", path);
+		goto out;
+	}
+
+	ret = open_inode_for_write(rctx, full_path, true);
+	if (ret < 0)
+		goto out;
+
+	/*
+	 * NOTE: encoded writes guarantee no partial writes,
+	 * so we don't need to handle that possibility.
+	 */
+	ret = pwritev2(rctx->write_fd, iov, 2, offset, RWF_ENCODED);
+	if (ret < 0) {
+		ret = -errno;
+		error("encoded_write: writing to %s failed: %m", path);
+	}
+out:
+	return ret;
+}
+
 static struct btrfs_send_ops send_ops = {
 	.subvol = process_subvol,
 	.snapshot = process_snapshot,
@@ -1050,6 +1137,7 @@ static struct btrfs_send_ops send_ops = {
 	.chown = process_chown,
 	.utimes = process_utimes,
 	.update_extent = process_update_extent,
+	.encoded_write = process_encoded_write,
 };
 
 static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
diff --git a/common/send-stream.c b/common/send-stream.c
index 77d5cd04..1376e00b 100644
--- a/common/send-stream.c
+++ b/common/send-stream.c
@@ -354,6 +354,8 @@ static int read_and_process_cmd(struct btrfs_send_stream *sctx)
 	struct timespec mt;
 	u8 uuid[BTRFS_UUID_SIZE];
 	u8 clone_uuid[BTRFS_UUID_SIZE];
+	u32 compression;
+	u32 encryption;
 	u64 tmp;
 	u64 tmp2;
 	u64 ctransid;
@@ -362,6 +364,9 @@ static int read_and_process_cmd(struct btrfs_send_stream *sctx)
 	u64 dev;
 	u64 clone_offset;
 	u64 offset;
+	u64 unencoded_file_len;
+	u64 unencoded_len;
+	u64 unencoded_offset;
 	int len;
 	int xattr_len;
 
@@ -436,6 +441,23 @@ static int read_and_process_cmd(struct btrfs_send_stream *sctx)
 		TLV_GET(sctx, BTRFS_SEND_A_DATA, &data, &len);
 		ret = sctx->ops->write(path, data, offset, len, sctx->user);
 		break;
+	case BTRFS_SEND_C_ENCODED_WRITE:
+		TLV_GET_STRING(sctx, BTRFS_SEND_A_PATH, &path);
+		TLV_GET_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, &offset);
+		TLV_GET_U64(sctx, BTRFS_SEND_A_UNENCODED_FILE_LEN,
+			    &unencoded_file_len);
+		TLV_GET_U64(sctx, BTRFS_SEND_A_UNENCODED_LEN, &unencoded_len);
+		TLV_GET_U64(sctx, BTRFS_SEND_A_UNENCODED_OFFSET,
+			    &unencoded_offset);
+		TLV_GET_U32(sctx, BTRFS_SEND_A_COMPRESSION, &compression);
+		TLV_GET_U32(sctx, BTRFS_SEND_A_ENCRYPTION, &encryption);
+		TLV_GET(sctx, BTRFS_SEND_A_DATA, &data, &len);
+		ret = sctx->ops->encoded_write(path, data, offset, len,
+					       unencoded_file_len,
+					       unencoded_len, unencoded_offset,
+					       compression, encryption,
+					       sctx->user);
+		break;
 	case BTRFS_SEND_C_CLONE:
 		TLV_GET_STRING(sctx, BTRFS_SEND_A_PATH, &path);
 		TLV_GET_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, &offset);
diff --git a/common/send-stream.h b/common/send-stream.h
index 39901f86..607bc007 100644
--- a/common/send-stream.h
+++ b/common/send-stream.h
@@ -66,6 +66,10 @@ struct btrfs_send_ops {
 		      struct timespec *mt, struct timespec *ct,
 		      void *user);
 	int (*update_extent)(const char *path, u64 offset, u64 len, void *user);
+	int (*encoded_write)(const char *path, const void *data, u64 offset,
+			     u64 len, u64 unencoded_file_len, u64 unencoded_len,
+			     u64 unencoded_offset, u32 compression,
+			     u32 encryption, void *user);
 };
 
 int btrfs_read_and_process_send_stream(int fd,
diff --git a/stubs.h b/stubs.h
index b39f8a69..d0ad0d06 100644
--- a/stubs.h
+++ b/stubs.h
@@ -1,6 +1,8 @@
 #ifndef _BTRFS_STUBS_H
 #define _BTRFS_STUBS_H
 
+#include <fcntl.h>
+#include <linux/fs.h>
 #include <sys/types.h>
 
 struct iovec;
@@ -8,4 +10,41 @@ struct iovec;
 ssize_t pwritev2(int fd, const struct iovec *iov, int iovcnt, off_t offset,
 		 int flags);
 
+#ifndef O_ALLOW_ENCODED
+#define O_ALLOW_ENCODED      040000000
 #endif
+
+#ifndef RWF_ENCODED
+enum {
+	ENCODED_IOV_COMPRESSION_NONE,
+#define ENCODED_IOV_COMPRESSION_NONE ENCODED_IOV_COMPRESSION_NONE
+	ENCODED_IOV_COMPRESSION_ZLIB,
+#define ENCODED_IOV_COMPRESSION_ZLIB ENCODED_IOV_COMPRESSION_ZLIB
+	ENCODED_IOV_COMPRESSION_LZO,
+#define ENCODED_IOV_COMPRESSION_LZO ENCODED_IOV_COMPRESSION_LZO
+	ENCODED_IOV_COMPRESSION_ZSTD,
+#define ENCODED_IOV_COMPRESSION_ZSTD ENCODED_IOV_COMPRESSION_ZSTD
+	ENCODED_IOV_COMPRESSION_TYPES = ENCODED_IOV_COMPRESSION_ZSTD,
+};
+
+enum {
+	ENCODED_IOV_ENCRYPTION_NONE,
+#define ENCODED_IOV_ENCRYPTION_NONE ENCODED_IOV_ENCRYPTION_NONE
+	ENCODED_IOV_ENCRYPTION_TYPES = ENCODED_IOV_ENCRYPTION_NONE,
+};
+
+struct encoded_iov {
+	__aligned_u64 len;
+	__aligned_u64 unencoded_len;
+	__aligned_u64 unencoded_offset;
+	__u32 compression;
+	__u32 encryption;
+};
+
+#define ENCODED_IOV_SIZE_VER0 32
+
+/* encoded (e.g., compressed and/or encrypted) IO */
+#define RWF_ENCODED    ((__kernel_rwf_t)0x00000020)
+#endif /* RWF_ENCODED */
+
+#endif /* _BTRFS_STUBS_H */
-- 
2.28.0



* [PATCH 07/11] btrfs-progs: receive: encoded_write fallback to explicit decode and write
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
                   ` (14 preceding siblings ...)
  2020-08-21  7:40 ` [PATCH 06/11] btrfs-progs: receive: process encoded_write commands Omar Sandoval
@ 2020-08-21  7:40 ` Omar Sandoval
  2020-08-21  7:40 ` [PATCH 08/11] btrfs-progs: receive: process fallocate commands Omar Sandoval
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:40 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Boris Burkov <boris@bur.io>

An encoded_write can fail if the file system it is being applied to does
not support encoded writes or if it can't find enough contiguous space
to accommodate the encoded extent. In those cases, we can likely still
process an encoded_write by explicitly decoding the data and doing a
normal write.

Add the necessary fallback path for decoding data compressed with zlib,
lzo, or zstd. zlib and zstd have reusable decoding context data
structures which we cache in the receive context so that we don't have
to recreate them on every encoded_write.

Finally, add a --force-decompress command line flag, which causes
receive to always use the fallback path rather than first attempting the
encoded write.

Signed-off-by: Boris Burkov <boris@bur.io>
---
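The fallback decision can be read as a small predicate over errno. The
sketch below uses a hypothetical helper name (demo_should_fall_back);
the per-error comments reflect my reading of the commit message, not
documented kernel behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Decide whether a failed encoded write (positive errno value) is worth
 * retrying as decompress + plain write.
 */
static bool demo_should_fall_back(int err)
{
	assert(err > 0);
	switch (err) {
	case ENOSPC:		/* no contiguous space for the encoded extent */
	case EOPNOTSUPP:	/* filesystem does not support encoded writes */
	case EINVAL:		/* request rejected by the filesystem */
		return true;
	default:
		return false;	/* e.g. EIO: report the error instead */
	}
}
```

Keeping the list explicit means any other error still fails the
receive rather than being silently retried.
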
 cmds/receive.c | 273 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 262 insertions(+), 11 deletions(-)
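
For reference, the lzo framing that the fallback walks can be sketched
without the decompressor: a le32 total length (header included), then
repeated { le32 segment length, payload }. The helper names are
hypothetical, and the sketch mirrors only the header checks done in
decompress_lzo() in this patch; it ignores any padding rules the
on-disk format may have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian u32 without relying on host byte order. */
static uint32_t demo_get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Count the segments in a framed buffer.
 * Returns the segment count, or -1 if a header is truncated or invalid.
 */
static int demo_count_lzo_segments(const unsigned char *buf, size_t len)
{
	uint32_t total_len;
	size_t pos = 4;
	int count = 0;

	assert(buf != NULL);
	if (len < 4)
		return -1;
	total_len = demo_get_le32(buf);
	if (total_len < 4 || total_len > len)
		return -1;

	while (pos < total_len) {
		uint32_t seg_len;

		if (total_len - pos < 4)
			return -1;	/* segment header is truncated */
		seg_len = demo_get_le32(buf + pos);
		pos += 4;
		if (seg_len > total_len - pos)
			return -1;	/* segment overruns the buffer */
		pos += seg_len;	/* a real decoder would decompress here */
		count++;
	}
	return count;
}
```

Every length is validated against the remaining bytes before it is
used, so a corrupt stream is rejected instead of read out of bounds.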

diff --git a/cmds/receive.c b/cmds/receive.c
index cd0f47ec..c67d4653 100644
--- a/cmds/receive.c
+++ b/cmds/receive.c
@@ -41,6 +41,10 @@
 #include <sys/xattr.h>
 #include <uuid/uuid.h>
 
+#include <lzo/lzo1x.h>
+#include <zlib.h>
+#include <zstd.h>
+
 #include "kernel-shared/ctree.h"
 #include "ioctl.h"
 #include "cmds/commands.h"
@@ -82,6 +86,8 @@ struct btrfs_receive
 
 	int honor_end_cmd;
 
+	int force_decompress;
+
 	/*
 	 * Buffer to store capabilities from security.capabilities xattr,
 	 * usually 20 bytes, but make same room for potentially larger
@@ -89,6 +95,10 @@ struct btrfs_receive
 	 */
 	char cached_capabilities[64];
 	int cached_capabilities_len;
+
+	/* Reuse stream objects for encoded_write decompression fallback */
+	ZSTD_DStream *zstd_dstream;
+	z_stream *zlib_stream;
 };
 
 static int finish_subvol(struct btrfs_receive *rctx)
@@ -1073,9 +1083,210 @@ static int process_update_extent(const char *path, u64 offset, u64 len,
 	return 0;
 }
 
+static int decompress_zlib(struct btrfs_receive *rctx, const void *encoded_data,
+			   u64 encoded_len, char *unencoded_data,
+			   u64 unencoded_len)
+{
+	int status = 0;
+	bool init = false;
+	int ret;
+
+	if (!rctx->zlib_stream) {
+		init = true;
+		rctx->zlib_stream = malloc(sizeof(z_stream));
+		if (!rctx->zlib_stream) {
+			error("failed to allocate zlib stream %m");
+			status = -ENOMEM;
+			goto out;
+		}
+	}
+	rctx->zlib_stream->next_in = (void *)encoded_data;
+	rctx->zlib_stream->avail_in = encoded_len;
+	rctx->zlib_stream->next_out = (void *)unencoded_data;
+	rctx->zlib_stream->avail_out = unencoded_len;
+
+	if (!init)
+		ret = inflateReset(rctx->zlib_stream);
+	else {
+		rctx->zlib_stream->zalloc = Z_NULL;
+		rctx->zlib_stream->zfree = Z_NULL;
+		rctx->zlib_stream->opaque = Z_NULL;
+		ret = inflateInit(rctx->zlib_stream);
+	}
+	if (ret != Z_OK) {
+		error("zlib inflate init failed %d", ret);
+		status = -EIO;
+		goto out;
+	}
+
+	while (rctx->zlib_stream->avail_in > 0 &&
+	       rctx->zlib_stream->avail_out > 0) {
+		ret = inflate(rctx->zlib_stream, Z_FINISH);
+		if (ret == Z_STREAM_END) {
+			break;
+		} else if (ret != Z_OK) {
+			error("zlib inflate failed %d", ret);
+			status = -EIO;
+			break;
+		}
+	}
+out:
+	return status;
+}
+
+static int decompress_lzo(const void *encoded_data, u64 encoded_len,
+			  char *unencoded_data, u64 unencoded_len)
+{
+	uint32_t total_len;
+	size_t in_pos, out_pos;
+
+	if (encoded_len < 4) {
+		error("lzo header is truncated");
+		return -EIO;
+	}
+	memcpy(&total_len, encoded_data, 4);
+	total_len = le32toh(total_len);
+	if (total_len > encoded_len) {
+		error("lzo header is invalid");
+		return -EIO;
+	}
+
+	in_pos = 4;
+	out_pos = 0;
+	while (in_pos < total_len && out_pos < unencoded_len) {
+		uint32_t src_len;
+		lzo_uint dst_len = unencoded_len - out_pos;
+		int ret;
+
+		if (total_len - in_pos < 4) {
+			error("lzo segment header is truncated");
+			return -EIO;
+		}
+		memcpy(&src_len, encoded_data + in_pos, 4);
+		src_len = le32toh(src_len);
+		in_pos += 4;
+		if (src_len > total_len - in_pos) {
+			error("lzo segment header is invalid");
+			return -EIO;
+		}
+
+		ret = lzo1x_decompress_safe((void *)(encoded_data + in_pos),
+			src_len, (void *)(unencoded_data + out_pos), &dst_len,
+			NULL);
+		if (ret != LZO_E_OK) {
+			error("lzo1x_decompress_safe failed: %d", ret);
+			return -EIO;
+		}
+
+		in_pos += src_len;
+		out_pos += dst_len;
+	}
+	return 0;
+}
+
+static int decompress_zstd(struct btrfs_receive *rctx, const void *encoded_buf,
+			   u64 encoded_len, char *unencoded_buf,
+			   u64 unencoded_len)
+{
+	ZSTD_inBuffer in_buf = {
+		.src = encoded_buf,
+		.size = encoded_len
+	};
+	ZSTD_outBuffer out_buf = {
+		.dst = unencoded_buf,
+		.size = unencoded_len
+	};
+	int status = 0;
+	size_t ret;
+
+	if (!rctx->zstd_dstream) {
+		rctx->zstd_dstream = ZSTD_createDStream();
+		if (!rctx->zstd_dstream) {
+			error("failed to create zstd dstream");
+			status = -ENOMEM;
+			goto out;
+		}
+	}
+	ret = ZSTD_initDStream(rctx->zstd_dstream);
+	if (ZSTD_isError(ret)) {
+		error("failed to init zstd stream %s", ZSTD_getErrorName(ret));
+		status = -EIO;
+		goto out;
+	}
+	while (in_buf.pos < in_buf.size && out_buf.pos < out_buf.size) {
+		ret = ZSTD_decompressStream(rctx->zstd_dstream, &out_buf, &in_buf);
+		if (ret == 0) {
+			break;
+		} else if (ZSTD_isError(ret)) {
+			error("failed to decompress zstd stream: %s",
+			      ZSTD_getErrorName(ret));
+			status = -EIO;
+			goto out;
+		}
+	}
+
+out:
+	return status;
+}
+
+static int decompress_and_write(const void *encoded_data, u64 encoded_len,
+				u64 unencoded_file_len, u64 unencoded_len,
+				u64 unencoded_offset, u32 compression,
+				void *user)
+{
+	int ret = 0;
+	size_t pos;
+	ssize_t w;
+	struct btrfs_receive *rctx = user;
+	char *unencoded_data;
+
+	unencoded_data = calloc(unencoded_len, sizeof(*unencoded_data));
+	if (!unencoded_data) {
+		error("allocating space for unencoded data failed: %m");
+		return -ENOMEM;
+	}
+
+	switch (compression) {
+	case ENCODED_IOV_COMPRESSION_ZLIB:
+		ret = decompress_zlib(rctx, encoded_data, encoded_len,
+				  unencoded_data, unencoded_len);
+		if (ret)
+			goto out;
+		break;
+	case ENCODED_IOV_COMPRESSION_LZO:
+		ret = decompress_lzo(encoded_data, encoded_len,
+				 unencoded_data, unencoded_len);
+		if (ret)
+			goto out;
+		break;
+	case ENCODED_IOV_COMPRESSION_ZSTD:
+		ret = decompress_zstd(rctx, encoded_data, encoded_len,
+				  unencoded_data, unencoded_len);
+		if (ret)
+			goto out;
+		break;
+	}
+
+	pos = unencoded_offset;
+	while (pos < unencoded_file_len) {
+		w = pwrite(rctx->write_fd, unencoded_data + pos,
+			   unencoded_file_len - pos, unencoded_offset + pos);
+		if (w < 0) {
+			ret = -errno;
+			error("writing unencoded data failed: %m");
+			goto out;
+		}
+		pos += w;
+	}
+out:
+	free(unencoded_data);
+	return ret;
+}
+
 static int process_encoded_write(const char *path, const void *data, u64 offset,
-	u64 len, u64 unencoded_file_len, u64 unencoded_len,
-	u64 unencoded_offset, u32 compression, u32 encryption, void *user)
+				 u64 len, u64 unencoded_file_len,
+				 u64 unencoded_len, u64 unencoded_offset,
+				 u32 compression, u32 encryption, void *user)
 {
 	int ret;
 	struct btrfs_receive *rctx = user;
@@ -1091,6 +1302,14 @@ static int process_encoded_write(const char *path, const void *data, u64 offset,
 		{ &encoded, sizeof(encoded) },
 		{ (char *)data, len }
 	};
+	bool encoded_write = !rctx->force_decompress;
+	bool decompress = rctx->force_decompress;
+
+	if (encryption) {
+		error("encoded_write: encryption not supported");
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
 
 	ret = path_cat_out(full_path, rctx->full_subvol_path, path);
 	if (ret < 0) {
@@ -1102,15 +1321,37 @@ static int process_encoded_write(const char *path, const void *data, u64 offset,
 	if (ret < 0)
 		goto out;
 
-	/*
-	 * NOTE: encoded writes guarantee no partial writes,
-	 * so we don't need to handle that possibility.
-	 */
-	ret = pwritev2(rctx->write_fd, iov, 2, offset, RWF_ENCODED);
-	if (ret < 0) {
-		ret = -errno;
-		error("encoded_write: writing to %s failed: %m", path);
+	if (encoded_write) {
+		/*
+		 * NOTE: encoded writes guarantee no partial writes,
+		 * so we don't need to handle that possibility.
+		 */
+		ret = pwritev2(rctx->write_fd, iov, 2, offset, RWF_ENCODED);
+		if (ret < 0) {
+			/*
+			 * error conditions where fallback to manual decompress
+			 * and write make sense.
+			 */
+			if (errno == ENOSPC ||
+			    errno == EOPNOTSUPP ||
+			    errno == EINVAL)
+				decompress = true;
+			else {
+				ret = -errno;
+				error("encoded_write: writing to %s failed: %m", path);
+				goto out;
+			}
+		}
 	}
+
+	if (decompress) {
+		ret = decompress_and_write(data, len, unencoded_file_len,
+				unencoded_len, unencoded_offset,
+				compression, user);
+		if (ret < 0)
+			goto out;
+	}
+	ret = 0;
 out:
 	return ret;
 }
@@ -1300,6 +1541,12 @@ out:
 		close(rctx->dest_dir_fd);
 		rctx->dest_dir_fd = -1;
 	}
+	if (rctx->zstd_dstream)
+		ZSTD_freeDStream(rctx->zstd_dstream);
+	if (rctx->zlib_stream) {
+		inflateEnd(rctx->zlib_stream);
+		free(rctx->zlib_stream);
+	}
 
 	return ret;
 }
@@ -1373,12 +1620,13 @@ static int cmd_receive(const struct cmd_struct *cmd, int argc, char **argv)
 	optind = 0;
 	while (1) {
 		int c;
-		enum { GETOPT_VAL_DUMP = 257 };
+		enum { GETOPT_VAL_DUMP = 257, GETOPT_VAL_FORCE_DECOMPRESS };
 		static const struct option long_opts[] = {
 			{ "max-errors", required_argument, NULL, 'E' },
 			{ "chroot", no_argument, NULL, 'C' },
 			{ "dump", no_argument, NULL, GETOPT_VAL_DUMP },
 			{ "quiet", no_argument, NULL, 'q' },
+			{ "force-decompress", no_argument, NULL, GETOPT_VAL_FORCE_DECOMPRESS },
 			{ NULL, 0, NULL, 0 }
 		};
 
@@ -1421,6 +1669,9 @@ static int cmd_receive(const struct cmd_struct *cmd, int argc, char **argv)
 		case GETOPT_VAL_DUMP:
 			dump = 1;
 			break;
+		case GETOPT_VAL_FORCE_DECOMPRESS:
+			rctx.force_decompress = 1;
+			break;
 		default:
 			usage_unknown_option(cmd, argv);
 		}
-- 
2.28.0



* [PATCH 08/11] btrfs-progs: receive: process fallocate commands
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
                   ` (15 preceding siblings ...)
  2020-08-21  7:40 ` [PATCH 07/11] btrfs-progs: receive: encoded_write fallback to explicit decode and write Omar Sandoval
@ 2020-08-21  7:40 ` Omar Sandoval
  2020-08-21  7:40 ` [PATCH 09/11] btrfs-progs: receive: process setflags ioctl commands Omar Sandoval
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:40 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Boris Burkov <boris@bur.io>

Send stream v2 can emit fallocate commands, so receive must support them
as well. The implementation simply passes the arguments along to the
syscall. Note that mode is encoded as a u32 in the send stream but
fallocate takes an int, so there is an unsigned-to-signed conversion.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 cmds/receive-dump.c  |  9 +++++++++
 cmds/receive.c       | 26 ++++++++++++++++++++++++++
 common/send-stream.c |  9 +++++++++
 common/send-stream.h |  2 ++
 4 files changed, 46 insertions(+)

diff --git a/cmds/receive-dump.c b/cmds/receive-dump.c
index 20ec2b70..acc0ba32 100644
--- a/cmds/receive-dump.c
+++ b/cmds/receive-dump.c
@@ -329,6 +329,14 @@ static int print_encoded_write(const char *path, const void *data, u64 offset,
 			  unencoded_offset, compression, encryption);
 }
 
+static int print_fallocate(const char *path, int mode, u64 offset, u64 len,
+			   void *user)
+{
+	return PRINT_DUMP(user, path, "fallocate",
+			  "mode=%d offset=%llu len=%llu",
+			  mode, offset, len);
+}
+
 struct btrfs_send_ops btrfs_print_send_ops = {
 	.subvol = print_subvol,
 	.snapshot = print_snapshot,
@@ -352,4 +360,5 @@ struct btrfs_send_ops btrfs_print_send_ops = {
 	.utimes = print_utimes,
 	.update_extent = print_update_extent,
 	.encoded_write = print_encoded_write,
+	.fallocate = print_fallocate,
 };
diff --git a/cmds/receive.c b/cmds/receive.c
index c67d4653..5c0930cb 100644
--- a/cmds/receive.c
+++ b/cmds/receive.c
@@ -1356,6 +1356,31 @@ out:
 	return ret;
 }
 
+static int process_fallocate(const char *path, int mode, u64 offset, u64 len,
+			     void *user)
+{
+	int ret;
+	struct btrfs_receive *rctx = user;
+	char full_path[PATH_MAX];
+
+	ret = path_cat_out(full_path, rctx->full_subvol_path, path);
+	if (ret < 0) {
+		error("fallocate: path invalid: %s", path);
+		goto out;
+	}
+	ret = open_inode_for_write(rctx, full_path, false);
+	if (ret < 0)
+		goto out;
+	ret = fallocate(rctx->write_fd, mode, offset, len);
+	if (ret < 0) {
+		ret = -errno;
+		error("fallocate: fallocate on %s failed: %m", path);
+		goto out;
+	}
+out:
+	return ret;
+}
+
 static struct btrfs_send_ops send_ops = {
 	.subvol = process_subvol,
 	.snapshot = process_snapshot,
@@ -1379,6 +1404,7 @@ static struct btrfs_send_ops send_ops = {
 	.utimes = process_utimes,
 	.update_extent = process_update_extent,
 	.encoded_write = process_encoded_write,
+	.fallocate = process_fallocate,
 };
 
 static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
diff --git a/common/send-stream.c b/common/send-stream.c
index 1376e00b..d455cdfb 100644
--- a/common/send-stream.c
+++ b/common/send-stream.c
@@ -369,6 +369,7 @@ static int read_and_process_cmd(struct btrfs_send_stream *sctx)
 	u64 unencoded_offset;
 	int len;
 	int xattr_len;
+	int fallocate_mode;
 
 	ret = read_cmd(sctx);
 	if (ret)
@@ -514,6 +515,14 @@ static int read_and_process_cmd(struct btrfs_send_stream *sctx)
 	case BTRFS_SEND_C_END:
 		ret = 1;
 		break;
+	case BTRFS_SEND_C_FALLOCATE:
+		TLV_GET_STRING(sctx, BTRFS_SEND_A_PATH, &path);
+		TLV_GET_U32(sctx, BTRFS_SEND_A_FALLOCATE_MODE, &fallocate_mode);
+		TLV_GET_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, &offset);
+		TLV_GET_U64(sctx, BTRFS_SEND_A_SIZE, &tmp);
+		ret = sctx->ops->fallocate(path, fallocate_mode, offset, tmp,
+					   sctx->user);
+		break;
 	}
 
 tlv_get_failed:
diff --git a/common/send-stream.h b/common/send-stream.h
index 607bc007..a58739bb 100644
--- a/common/send-stream.h
+++ b/common/send-stream.h
@@ -70,6 +70,8 @@ struct btrfs_send_ops {
 			     u64 len, u64 unencoded_file_len, u64 unencoded_len,
 			     u64 unencoded_offset, u32 compression,
 			     u32 encryption, void *user);
+	int (*fallocate)(const char *path, int mode, u64 offset, u64 len,
+			 void *user);
 };
 
 int btrfs_read_and_process_send_stream(int fd,
-- 
2.28.0



* [PATCH 09/11] btrfs-progs: receive: process setflags ioctl commands
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
                   ` (16 preceding siblings ...)
  2020-08-21  7:40 ` [PATCH 08/11] btrfs-progs: receive: process fallocate commands Omar Sandoval
@ 2020-08-21  7:40 ` Omar Sandoval
  2020-08-21  7:40 ` [PATCH 10/11] btrfs-progs: send: stream v2 ioctl flags Omar Sandoval
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:40 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Boris Burkov <boris@bur.io>

In send stream v2, send can emit a command for setting inode flags via
the setflags ioctl. Pass the flags attribute through to the ioctl call
in receive.

Signed-off-by: Boris Burkov <boris@bur.io>
---
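The pass-through amounts to a single ioctl on the open file. A minimal
sketch, with a hypothetical helper name (demo_set_inode_flags) and the
same negative-errno convention that receive uses:

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/*
 * Replace the inode flags of an open file with `flags`.
 * Returns 0 on success or a negative errno value.
 */
static int demo_set_inode_flags(int fd, int flags)
{
	/* FS_IOC_SETFLAGS takes a pointer to the new flag word. */
	if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
		return -errno;
	return 0;
}
```

Note that the ioctl replaces the whole flag word, so a caller that only
wants to add a flag would first read the current value with
FS_IOC_GETFLAGS.
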
 cmds/receive-dump.c  |  6 ++++++
 cmds/receive.c       | 25 +++++++++++++++++++++++++
 common/send-stream.c |  7 +++++++
 common/send-stream.h |  1 +
 4 files changed, 39 insertions(+)

diff --git a/cmds/receive-dump.c b/cmds/receive-dump.c
index acc0ba32..40f07ad4 100644
--- a/cmds/receive-dump.c
+++ b/cmds/receive-dump.c
@@ -337,6 +337,11 @@ static int print_fallocate(const char *path, int mode, u64 offset, u64 len,
 			  mode, offset, len);
 }
 
+static int print_setflags(const char *path, int flags, void *user)
+{
+	return PRINT_DUMP(user, path, "setflags", "flags=%d", flags);
+}
+
 struct btrfs_send_ops btrfs_print_send_ops = {
 	.subvol = print_subvol,
 	.snapshot = print_snapshot,
@@ -361,4 +366,5 @@ struct btrfs_send_ops btrfs_print_send_ops = {
 	.update_extent = print_update_extent,
 	.encoded_write = print_encoded_write,
 	.fallocate = print_fallocate,
+	.setflags = print_setflags,
 };
diff --git a/cmds/receive.c b/cmds/receive.c
index 5c0930cb..50c843ed 100644
--- a/cmds/receive.c
+++ b/cmds/receive.c
@@ -1381,6 +1381,30 @@ out:
 	return ret;
 }
 
+static int process_setflags(const char *path, int flags, void *user)
+{
+	int ret;
+	struct btrfs_receive *rctx = user;
+	char full_path[PATH_MAX];
+
+	ret = path_cat_out(full_path, rctx->full_subvol_path, path);
+	if (ret < 0) {
+		error("setflags: path invalid: %s", path);
+		goto out;
+	}
+	ret = open_inode_for_write(rctx, full_path, false);
+	if (ret < 0)
+		goto out;
+	ret = ioctl(rctx->write_fd, FS_IOC_SETFLAGS, &flags);
+	if (ret < 0) {
+		ret = -errno;
+		error("setflags: setflags ioctl on %s failed: %m", path);
+		goto out;
+	}
+out:
+	return ret;
+}
+
 static struct btrfs_send_ops send_ops = {
 	.subvol = process_subvol,
 	.snapshot = process_snapshot,
@@ -1405,6 +1429,7 @@ static struct btrfs_send_ops send_ops = {
 	.update_extent = process_update_extent,
 	.encoded_write = process_encoded_write,
 	.fallocate = process_fallocate,
+	.setflags = process_setflags,
 };
 
 static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
diff --git a/common/send-stream.c b/common/send-stream.c
index d455cdfb..da0c0e5d 100644
--- a/common/send-stream.c
+++ b/common/send-stream.c
@@ -370,6 +370,7 @@ static int read_and_process_cmd(struct btrfs_send_stream *sctx)
 	int len;
 	int xattr_len;
 	int fallocate_mode;
+	int setflags_flags;
 
 	ret = read_cmd(sctx);
 	if (ret)
@@ -523,8 +524,14 @@ static int read_and_process_cmd(struct btrfs_send_stream *sctx)
 		ret = sctx->ops->fallocate(path, fallocate_mode, offset, tmp,
 					   sctx->user);
 		break;
+	case BTRFS_SEND_C_SETFLAGS:
+		TLV_GET_STRING(sctx, BTRFS_SEND_A_PATH, &path);
+		TLV_GET_U32(sctx, BTRFS_SEND_A_SETFLAGS_FLAGS, &setflags_flags);
+		ret = sctx->ops->setflags(path, setflags_flags, sctx->user);
+		break;
 	}
 
+
 tlv_get_failed:
 out:
 	free(path);
diff --git a/common/send-stream.h b/common/send-stream.h
index a58739bb..5373bf69 100644
--- a/common/send-stream.h
+++ b/common/send-stream.h
@@ -72,6 +72,7 @@ struct btrfs_send_ops {
 			     u32 encryption, void *user);
 	int (*fallocate)(const char *path, int mode, u64 offset, u64 len,
 			 void *user);
+	int (*setflags)(const char *path, int flags, void *user);
 };
 
 int btrfs_read_and_process_send_stream(int fd,
-- 
2.28.0



* [PATCH 10/11] btrfs-progs: send: stream v2 ioctl flags
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
                   ` (17 preceding siblings ...)
  2020-08-21  7:40 ` [PATCH 09/11] btrfs-progs: receive: process setflags ioctl commands Omar Sandoval
@ 2020-08-21  7:40 ` Omar Sandoval
  2020-08-21  7:40 ` [PATCH 11/11] btrfs-progs: receive: add tests for basic encoded_write send/receive Omar Sandoval
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:40 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Boris Burkov <boris@bur.io>

Making the btrfs send ioctl use the stream v2 format requires passing
BTRFS_SEND_FLAG_STREAM_V2 in flags. Further, to cause the ioctl to emit
encoded_write commands for encoded extents, we must set that flag as
well as BTRFS_SEND_FLAG_COMPRESSED. Finally, we bump up the version in
send.h as well, since we are now fully compatible with v2.

Add two command line arguments to btrfs send: --stream-version and
--compressed. --stream-version requires an argument which it parses as
an integer and sets STREAM_V2 if the argument is 2. --compressed does
not require an argument and automatically implies STREAM_V2 as well
(COMPRESSED alone causes the ioctl to error out).

Some examples to illustrate edge cases:

// v1, old format and no encoded_writes
btrfs send subvol
btrfs send --stream-version 1 subvol

// v2 and compressed, we will see encoded_writes
btrfs send --compressed subvol
btrfs send --compressed --stream-version 2 subvol

// v2 only, new format but no encoded_writes
btrfs send --stream-version 2 subvol

// error: compressed needs version >= 2
btrfs send --compressed --stream-version 1 subvol

// error: invalid version (not 1 or 2)
btrfs send --stream-version 3 subvol
btrfs send --compressed --stream-version 0 subvol
btrfs send --compressed --stream-version 10 subvol

Signed-off-by: Boris Burkov <boris@bur.io>
---
 cmds/send.c          | 39 +++++++++++++++++++++++++++++++++++++--
 ioctl.h              | 17 ++++++++++++++++-
 libbtrfsutil/btrfs.h | 17 ++++++++++++++++-
 send.h               |  2 +-
 4 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/cmds/send.c b/cmds/send.c
index b8e3ba12..4c4eaa84 100644
--- a/cmds/send.c
+++ b/cmds/send.c
@@ -474,6 +474,7 @@ static int cmd_send(const struct cmd_struct *cmd, int argc, char **argv)
 	int full_send = 1;
 	int new_end_cmd_semantic = 0;
 	u64 send_flags = 0;
+	long stream_version = 0;
 
 	memset(&send, 0, sizeof(send));
 	send.dump_fd = fileno(stdout);
@@ -492,11 +493,17 @@ static int cmd_send(const struct cmd_struct *cmd, int argc, char **argv)
 
 	optind = 0;
 	while (1) {
-		enum { GETOPT_VAL_SEND_NO_DATA = 256 };
+		enum {
+			GETOPT_VAL_SEND_NO_DATA = 256,
+			GETOPT_VAL_SEND_STREAM_V2,
+			GETOPT_VAL_SEND_COMPRESSED
+		};
 		static const struct option long_options[] = {
 			{ "verbose", no_argument, NULL, 'v' },
 			{ "quiet", no_argument, NULL, 'q' },
-			{ "no-data", no_argument, NULL, GETOPT_VAL_SEND_NO_DATA }
+			{ "no-data", no_argument, NULL, GETOPT_VAL_SEND_NO_DATA },
+			{ "stream-version", required_argument, NULL, GETOPT_VAL_SEND_STREAM_V2 },
+			{ "compressed", no_argument, NULL, GETOPT_VAL_SEND_COMPRESSED }
 		};
 		int c = getopt_long(argc, argv, "vqec:f:i:p:", long_options, NULL);
 
@@ -584,10 +591,38 @@ static int cmd_send(const struct cmd_struct *cmd, int argc, char **argv)
 		case GETOPT_VAL_SEND_NO_DATA:
 			send_flags |= BTRFS_SEND_FLAG_NO_FILE_DATA;
 			break;
+		case GETOPT_VAL_SEND_STREAM_V2:
+			stream_version = strtol(optarg, NULL, 10);
+			if (stream_version < 1 || stream_version > 2) {
+				ret = 1;
+				error("invalid --stream-version. valid values: {1, 2}");
+				goto out;
+			}
+			if (stream_version == 2)
+				send_flags |= BTRFS_SEND_FLAG_STREAM_V2;
+			break;
+		case GETOPT_VAL_SEND_COMPRESSED:
+			send_flags |= BTRFS_SEND_FLAG_COMPRESSED;
+			/*
+			 * We want to default to stream v2 if only compressed is
+			 * set. If stream_version is explicitly set to 0, that
+			 * will trigger its own error condition for being an
+			 * invalid version.
+			 */
+			if (stream_version == 0) {
+				stream_version = 2;
+				send_flags |= BTRFS_SEND_FLAG_STREAM_V2;
+			}
+			break;
 		default:
 			usage_unknown_option(cmd, argv);
 		}
 	}
+	if (stream_version < 2 && (send_flags & BTRFS_SEND_FLAG_COMPRESSED)) {
+		ret = 1;
+		error("--compressed requires --stream-version >= 2");
+		goto out;
+	}
 
 	if (check_argc_min(argc - optind, 1))
 		return 1;
diff --git a/ioctl.h b/ioctl.h
index ade6dcb9..46de8ac8 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -653,10 +653,25 @@ BUILD_ASSERT(sizeof(struct btrfs_ioctl_received_subvol_args_32) == 192);
  */
 #define BTRFS_SEND_FLAG_OMIT_END_CMD		0x4
 
+/*
+ * Use version 2 of the send stream, which adds new commands and supports larger
+ * writes.
+ */
+#define BTRFS_SEND_FLAG_STREAM_V2		0x8
+
+/*
+ * Send compressed data using the ENCODED_WRITE command instead of decompressing
+ * the data and sending it with the WRITE command. This requires
+ * BTRFS_SEND_FLAG_STREAM_V2.
+ */
+#define BTRFS_SEND_FLAG_COMPRESSED		0x10
+
 #define BTRFS_SEND_FLAG_MASK \
 	(BTRFS_SEND_FLAG_NO_FILE_DATA | \
 	 BTRFS_SEND_FLAG_OMIT_STREAM_HEADER | \
-	 BTRFS_SEND_FLAG_OMIT_END_CMD)
+	 BTRFS_SEND_FLAG_OMIT_END_CMD | \
+	 BTRFS_SEND_FLAG_STREAM_V2 | \
+	 BTRFS_SEND_FLAG_COMPRESSED)
 
 struct btrfs_ioctl_send_args {
 	__s64 send_fd;			/* in */
diff --git a/libbtrfsutil/btrfs.h b/libbtrfsutil/btrfs.h
index 60d51ff6..8430a40d 100644
--- a/libbtrfsutil/btrfs.h
+++ b/libbtrfsutil/btrfs.h
@@ -731,10 +731,25 @@ struct btrfs_ioctl_received_subvol_args {
  */
 #define BTRFS_SEND_FLAG_OMIT_END_CMD		0x4
 
+/*
+ * Use version 2 of the send stream, which adds new commands and supports larger
+ * writes.
+ */
+#define BTRFS_SEND_FLAG_STREAM_V2		0x8
+
+/*
+ * Send compressed data using the ENCODED_WRITE command instead of decompressing
+ * the data and sending it with the WRITE command. This requires
+ * BTRFS_SEND_FLAG_STREAM_V2.
+ */
+#define BTRFS_SEND_FLAG_COMPRESSED		0x10
+
 #define BTRFS_SEND_FLAG_MASK \
 	(BTRFS_SEND_FLAG_NO_FILE_DATA | \
 	 BTRFS_SEND_FLAG_OMIT_STREAM_HEADER | \
-	 BTRFS_SEND_FLAG_OMIT_END_CMD)
+	 BTRFS_SEND_FLAG_OMIT_END_CMD | \
+	 BTRFS_SEND_FLAG_STREAM_V2 | \
+	 BTRFS_SEND_FLAG_COMPRESSED)
 
 struct btrfs_ioctl_send_args {
 	__s64 send_fd;			/* in */
diff --git a/send.h b/send.h
index 3c47e0c7..fac90588 100644
--- a/send.h
+++ b/send.h
@@ -31,7 +31,7 @@ extern "C" {
 #endif
 
 #define BTRFS_SEND_STREAM_MAGIC "btrfs-stream"
-#define BTRFS_SEND_STREAM_VERSION 1
+#define BTRFS_SEND_STREAM_VERSION 2
 
 #define BTRFS_SEND_BUF_SIZE_V1 SZ_64K
 #define BTRFS_SEND_READ_SIZE (1024 * 48)
-- 
2.28.0



* [PATCH 11/11] btrfs-progs: receive: add tests for basic encoded_write send/receive
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
                   ` (18 preceding siblings ...)
  2020-08-21  7:40 ` [PATCH 10/11] btrfs-progs: send: stream v2 ioctl flags Omar Sandoval
@ 2020-08-21  7:40 ` Omar Sandoval
  2020-08-24 19:57 ` [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing David Sterba
  2020-09-10 11:28 ` David Sterba
  21 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-21  7:40 UTC (permalink / raw)
  To: linux-btrfs; +Cc: linux-fsdevel

From: Boris Burkov <boris@bur.io>

Adapt the existing send/receive tests by passing '-o compress-force=<algorithm>'
to the mount commands in a new test. After writing a few files in the
various compression formats, send/receive them with and without
--force-decompress to test both the encoded_write path and the
fallback to decode+write.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 .../040-receive-write-encoded/test.sh         | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100755 tests/misc-tests/040-receive-write-encoded/test.sh

diff --git a/tests/misc-tests/040-receive-write-encoded/test.sh b/tests/misc-tests/040-receive-write-encoded/test.sh
new file mode 100755
index 00000000..4df6ccd6
--- /dev/null
+++ b/tests/misc-tests/040-receive-write-encoded/test.sh
@@ -0,0 +1,114 @@
+#!/bin/bash
+#
+# test that we can send and receive encoded writes for three modes of
+# transparent compression: zlib, lzo, and zstd.
+
+source "$TEST_TOP/common"
+
+check_prereq mkfs.btrfs
+check_prereq btrfs
+
+setup_root_helper
+prepare_test_dev
+
+here=`pwd`
+
+# assumes the filesystem exists, and does mount, write, snapshot, send, unmount
+# for the specified encoding option
+send_one() {
+	local str
+	local subv
+	local snap
+
+	algorithm="$1"
+	shift
+	str="$1"
+	shift
+
+	subv="subv-$algorithm"
+	snap="snap-$algorithm"
+
+	run_check_mount_test_dev "-o" "compress-force=$algorithm"
+	cd "$TEST_MNT" || _fail "cannot chdir to TEST_MNT"
+
+	run_check $SUDO_HELPER "$TOP/btrfs" subvolume create "$subv"
+	run_check $SUDO_HELPER dd if=/dev/zero of="$subv/file1" bs=1M count=1
+	run_check $SUDO_HELPER dd if=/dev/zero of="$subv/file2" bs=500K count=1
+	run_check $SUDO_HELPER "$TOP/btrfs" subvolume snapshot -r "$subv" "$snap"
+	run_check $SUDO_HELPER "$TOP/btrfs" send -f "$str" "$snap" "$@"
+
+	cd "$here" || _fail "cannot chdir back to test directory"
+	run_check_umount_test_dev
+}
+
+receive_one() {
+	local str
+	str="$1"
+	shift
+
+	run_check_mkfs_test_dev
+	run_check_mount_test_dev
+	run_check $SUDO_HELPER "$TOP/btrfs" receive "$@" -v -f "$str" "$TEST_MNT"
+	run_check_umount_test_dev
+	run_check rm -f -- "$str"
+}
+
+test_one_write_encoded() {
+	local str
+	local algorithm
+	algorithm="$1"
+	shift
+	str="$here/stream-$algorithm.stream"
+
+	run_check_mkfs_test_dev
+	send_one "$algorithm" "$str" --compressed
+	receive_one "$str" "$@"
+}
+
+test_one_stream_v1() {
+	local str
+	local algorithm
+	algorithm="$1"
+	shift
+	str="$here/stream-$algorithm.stream"
+
+	run_check_mkfs_test_dev
+	send_one "$algorithm" "$str" --stream-version 1
+	receive_one "$str" "$@"
+}
+
+test_mix_write_encoded() {
+	local strzlib
+	local strlzo
+	local strzstd
+	strzlib="$here/stream-zlib.stream"
+	strlzo="$here/stream-lzo.stream"
+	strzstd="$here/stream-zstd.stream"
+
+	run_check_mkfs_test_dev
+
+	send_one "zlib" "$strzlib" --compressed
+	send_one "lzo" "$strlzo" --compressed
+	send_one "zstd" "$strzstd" --compressed
+
+	receive_one "$strzlib"
+	receive_one "$strlzo"
+	receive_one "$strzstd"
+}
+
+test_one_write_encoded "zlib"
+test_one_write_encoded "lzo"
+test_one_write_encoded "zstd"
+
+# with decompression forced
+test_one_write_encoded "zlib" "--force-decompress"
+test_one_write_encoded "lzo" "--force-decompress"
+test_one_write_encoded "zstd" "--force-decompress"
+
+# send stream v1
+test_one_stream_v1 "zlib"
+test_one_stream_v1 "lzo"
+test_one_stream_v1 "zstd"
+
+# files use a mix of compression algorithms
+test_mix_write_encoded
-- 
2.28.0



* Re: [PATCH 1/9] btrfs: send: get rid of i_size logic in send_write()
  2020-08-21  7:39 ` [PATCH 1/9] btrfs: send: get rid of i_size logic in send_write() Omar Sandoval
@ 2020-08-21 17:26   ` Filipe Manana
  2020-08-24 17:39   ` Josef Bacik
  1 sibling, 0 replies; 37+ messages in thread
From: Filipe Manana @ 2020-08-21 17:26 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, linux-fsdevel

On Fri, Aug 21, 2020 at 8:42 AM Omar Sandoval <osandov@osandov.com> wrote:
>
> From: Omar Sandoval <osandov@fb.com>
>
> send_write()/fill_read_buf() have some logic for avoiding reading past
> i_size. However, everywhere that we call
> send_write()/send_extent_data(), we've already clamped the length down
> to i_size. Get rid of the i_size handling, which simplifies the next
> change.
>
> Signed-off-by: Omar Sandoval <osandov@fb.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, and it passed some long duration tests with both full and
incremental sends here (with and without compression, no-holes, etc).

Thanks.

> ---
>  fs/btrfs/send.c | 37 ++++++++++---------------------------
>  1 file changed, 10 insertions(+), 27 deletions(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 7c7c09fc65e8..8af5e867e4ca 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -4794,7 +4794,7 @@ static int process_all_new_xattrs(struct send_ctx *sctx)
>         return ret;
>  }
>
> -static ssize_t fill_read_buf(struct send_ctx *sctx, u64 offset, u32 len)
> +static int fill_read_buf(struct send_ctx *sctx, u64 offset, u32 len)
>  {
>         struct btrfs_root *root = sctx->send_root;
>         struct btrfs_fs_info *fs_info = root->fs_info;
> @@ -4804,21 +4804,13 @@ static ssize_t fill_read_buf(struct send_ctx *sctx, u64 offset, u32 len)
>         pgoff_t index = offset >> PAGE_SHIFT;
>         pgoff_t last_index;
>         unsigned pg_offset = offset_in_page(offset);
> -       ssize_t ret = 0;
> +       int ret = 0;
> +       size_t read = 0;
>
>         inode = btrfs_iget(fs_info->sb, sctx->cur_ino, root);
>         if (IS_ERR(inode))
>                 return PTR_ERR(inode);
>
> -       if (offset + len > i_size_read(inode)) {
> -               if (offset > i_size_read(inode))
> -                       len = 0;
> -               else
> -                       len = offset - i_size_read(inode);
> -       }
> -       if (len == 0)
> -               goto out;
> -
>         last_index = (offset + len - 1) >> PAGE_SHIFT;
>
>         /* initial readahead */
> @@ -4859,16 +4851,15 @@ static ssize_t fill_read_buf(struct send_ctx *sctx, u64 offset, u32 len)
>                 }
>
>                 addr = kmap(page);
> -               memcpy(sctx->read_buf + ret, addr + pg_offset, cur_len);
> +               memcpy(sctx->read_buf + read, addr + pg_offset, cur_len);
>                 kunmap(page);
>                 unlock_page(page);
>                 put_page(page);
>                 index++;
>                 pg_offset = 0;
>                 len -= cur_len;
> -               ret += cur_len;
> +               read += cur_len;
>         }
> -out:
>         iput(inode);
>         return ret;
>  }
> @@ -4882,7 +4873,6 @@ static int send_write(struct send_ctx *sctx, u64 offset, u32 len)
>         struct btrfs_fs_info *fs_info = sctx->send_root->fs_info;
>         int ret = 0;
>         struct fs_path *p;
> -       ssize_t num_read = 0;
>
>         p = fs_path_alloc();
>         if (!p)
> @@ -4890,12 +4880,9 @@ static int send_write(struct send_ctx *sctx, u64 offset, u32 len)
>
>         btrfs_debug(fs_info, "send_write offset=%llu, len=%d", offset, len);
>
> -       num_read = fill_read_buf(sctx, offset, len);
> -       if (num_read <= 0) {
> -               if (num_read < 0)
> -                       ret = num_read;
> +       ret = fill_read_buf(sctx, offset, len);
> +       if (ret < 0)
>                 goto out;
> -       }
>
>         ret = begin_cmd(sctx, BTRFS_SEND_C_WRITE);
>         if (ret < 0)
> @@ -4907,16 +4894,14 @@ static int send_write(struct send_ctx *sctx, u64 offset, u32 len)
>
>         TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p);
>         TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset);
> -       TLV_PUT(sctx, BTRFS_SEND_A_DATA, sctx->read_buf, num_read);
> +       TLV_PUT(sctx, BTRFS_SEND_A_DATA, sctx->read_buf, len);
>
>         ret = send_cmd(sctx);
>
>  tlv_put_failure:
>  out:
>         fs_path_free(p);
> -       if (ret < 0)
> -               return ret;
> -       return num_read;
> +       return ret;
>  }
>
>  /*
> @@ -5095,9 +5080,7 @@ static int send_extent_data(struct send_ctx *sctx,
>                 ret = send_write(sctx, offset + sent, size);
>                 if (ret < 0)
>                         return ret;
> -               if (!ret)
> -                       break;
> -               sent += ret;
> +               sent += size;
>         }
>         return 0;
>  }
> --
> 2.28.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


* Re: [PATCH 2/9] btrfs: send: avoid copying file data
  2020-08-21  7:39 ` [PATCH 2/9] btrfs: send: avoid copying file data Omar Sandoval
@ 2020-08-21 17:29   ` Filipe Manana
  2020-08-24 21:34     ` Omar Sandoval
  2020-08-24 17:47   ` Josef Bacik
  2020-09-11 14:13   ` David Sterba
  2 siblings, 1 reply; 37+ messages in thread
From: Filipe Manana @ 2020-08-21 17:29 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, linux-fsdevel

On Fri, Aug 21, 2020 at 8:42 AM Omar Sandoval <osandov@osandov.com> wrote:
>
> From: Omar Sandoval <osandov@fb.com>
>
> send_write() currently copies from the page cache to sctx->read_buf, and
> then from sctx->read_buf to sctx->send_buf. Similarly, send_hole()
> zeroes sctx->read_buf and then copies from sctx->read_buf to
> sctx->send_buf. However, if we write the TLV header manually, we can
> copy to sctx->send_buf directly and get rid of sctx->read_buf.
>
> Signed-off-by: Omar Sandoval <osandov@fb.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, and it passed some long duration tests with both full and
incremental sends here (with and without compression, no-holes, etc).
Only one minor thing below, but it's really subjective and doesn't
make much of a difference.

Thanks.
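
For illustration, the idea from the commit message (write the TLV header
by hand, then copy the payload straight into the send buffer) can be
sketched in userspace as below. This is a simplified model rather than
the kernel code: struct send_buf, put_le16() and put_data() are
hypothetical names, the attribute type is passed in by the caller, and
the explicit 16-bit length check is an addition made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define TLV_HDR_SIZE 4	/* le16 type followed by le16 length */

/* Hypothetical stand-in for the send_buf fields of struct send_ctx. */
struct send_buf {
	uint8_t *buf;
	uint32_t size;		/* bytes already used */
	uint32_t max_size;	/* total capacity */
};

/* Store a 16-bit value in little-endian byte order. */
static void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

/* Reserve room for header plus payload, then emit only the header. */
static int put_data_header(struct send_buf *sb, uint16_t type, uint32_t len)
{
	assert(sb != NULL && sb->buf != NULL);

	/* Sketch-only check: the length field is 16 bits wide. */
	if (len > UINT16_MAX)
		return -EOVERFLOW;
	if ((uint64_t)(sb->max_size - sb->size) < (uint64_t)TLV_HDR_SIZE + len)
		return -EOVERFLOW;

	put_le16(sb->buf + sb->size, type);
	put_le16(sb->buf + sb->size + 2, (uint16_t)len);
	sb->size += TLV_HDR_SIZE;
	return 0;
}

/* Copy the payload directly behind the header, with no bounce buffer. */
static int put_data(struct send_buf *sb, uint16_t type, const void *src,
		    uint32_t len)
{
	int ret = put_data_header(sb, type, len);

	if (ret)
		return ret;
	memcpy(sb->buf + sb->size, src, len);
	sb->size += len;
	return 0;
}
```

Splitting the header from the payload is what lets the same header
helper serve both the page cache copy and the memset() for holes.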

> ---
>  fs/btrfs/send.c | 65 +++++++++++++++++++++++++++++--------------------
>  fs/btrfs/send.h |  1 -
>  2 files changed, 39 insertions(+), 27 deletions(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 8af5e867e4ca..e70f5ceb3261 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -122,8 +122,6 @@ struct send_ctx {
>
>         struct file_ra_state ra;
>
> -       char *read_buf;
> -
>         /*
>          * We process inodes by their increasing order, so if before an
>          * incremental send we reverse the parent/child relationship of
> @@ -4794,7 +4792,25 @@ static int process_all_new_xattrs(struct send_ctx *sctx)
>         return ret;
>  }
>
> -static int fill_read_buf(struct send_ctx *sctx, u64 offset, u32 len)
> +static u64 max_send_read_size(struct send_ctx *sctx)

We could make this inline, since it's so small and trivial, and
constify the argument too.
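
A rough sketch of what that suggestion might look like, compiled outside
the kernel: struct send_ctx here is a one-field stand-in for the real
structure, uint32_t/uint64_t replace u32/u64, and the assert() is an
addition for the sketch that is not in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_16K (16 * 1024)

/* Stand-in for the kernel's struct send_ctx; only the field used here. */
struct send_ctx {
	uint32_t send_max_size;
};

/* Inline and const-qualified, as suggested in the review comment. */
static inline uint64_t max_send_read_size(const struct send_ctx *sctx)
{
	/* Sketch-only guard against unsigned wrap-around. */
	assert(sctx->send_max_size > SZ_16K);
	return sctx->send_max_size - SZ_16K;
}
```

With the 64K send buffer this yields 48K, the same value as the
BTRFS_SEND_READ_SIZE constant that the patch removes.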

> +{
> +       return sctx->send_max_size - SZ_16K;
> +}
> +
> +static int put_data_header(struct send_ctx *sctx, u32 len)
> +{
> +       struct btrfs_tlv_header *hdr;
> +
> +       if (sctx->send_max_size - sctx->send_size < sizeof(*hdr) + len)
> +               return -EOVERFLOW;
> +       hdr = (struct btrfs_tlv_header *)(sctx->send_buf + sctx->send_size);
> +       hdr->tlv_type = cpu_to_le16(BTRFS_SEND_A_DATA);
> +       hdr->tlv_len = cpu_to_le16(len);
> +       sctx->send_size += sizeof(*hdr);
> +       return 0;
> +}
> +
> +static int put_file_data(struct send_ctx *sctx, u64 offset, u32 len)
>  {
>         struct btrfs_root *root = sctx->send_root;
>         struct btrfs_fs_info *fs_info = root->fs_info;
> @@ -4804,8 +4820,11 @@ static int fill_read_buf(struct send_ctx *sctx, u64 offset, u32 len)
>         pgoff_t index = offset >> PAGE_SHIFT;
>         pgoff_t last_index;
>         unsigned pg_offset = offset_in_page(offset);
> -       int ret = 0;
> -       size_t read = 0;
> +       int ret;
> +
> +       ret = put_data_header(sctx, len);
> +       if (ret)
> +               return ret;
>
>         inode = btrfs_iget(fs_info->sb, sctx->cur_ino, root);
>         if (IS_ERR(inode))
> @@ -4851,14 +4870,15 @@ static int fill_read_buf(struct send_ctx *sctx, u64 offset, u32 len)
>                 }
>
>                 addr = kmap(page);
> -               memcpy(sctx->read_buf + read, addr + pg_offset, cur_len);
> +               memcpy(sctx->send_buf + sctx->send_size, addr + pg_offset,
> +                      cur_len);
>                 kunmap(page);
>                 unlock_page(page);
>                 put_page(page);
>                 index++;
>                 pg_offset = 0;
>                 len -= cur_len;
> -               read += cur_len;
> +               sctx->send_size += cur_len;
>         }
>         iput(inode);
>         return ret;
> @@ -4880,10 +4900,6 @@ static int send_write(struct send_ctx *sctx, u64 offset, u32 len)
>
>         btrfs_debug(fs_info, "send_write offset=%llu, len=%d", offset, len);
>
> -       ret = fill_read_buf(sctx, offset, len);
> -       if (ret < 0)
> -               goto out;
> -
>         ret = begin_cmd(sctx, BTRFS_SEND_C_WRITE);
>         if (ret < 0)
>                 goto out;
> @@ -4894,7 +4910,9 @@ static int send_write(struct send_ctx *sctx, u64 offset, u32 len)
>
>         TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p);
>         TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset);
> -       TLV_PUT(sctx, BTRFS_SEND_A_DATA, sctx->read_buf, len);
> +       ret = put_file_data(sctx, offset, len);
> +       if (ret < 0)
> +               goto out;
>
>         ret = send_cmd(sctx);
>
> @@ -5013,8 +5031,8 @@ static int send_update_extent(struct send_ctx *sctx,
>  static int send_hole(struct send_ctx *sctx, u64 end)
>  {
>         struct fs_path *p = NULL;
> +       u64 read_size = max_send_read_size(sctx);
>         u64 offset = sctx->cur_inode_last_extent;
> -       u64 len;
>         int ret = 0;
>
>         /*
> @@ -5041,16 +5059,19 @@ static int send_hole(struct send_ctx *sctx, u64 end)
>         ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p);
>         if (ret < 0)
>                 goto tlv_put_failure;
> -       memset(sctx->read_buf, 0, BTRFS_SEND_READ_SIZE);
>         while (offset < end) {
> -               len = min_t(u64, end - offset, BTRFS_SEND_READ_SIZE);
> +               u64 len = min(end - offset, read_size);
>
>                 ret = begin_cmd(sctx, BTRFS_SEND_C_WRITE);
>                 if (ret < 0)
>                         break;
>                 TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p);
>                 TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset);
> -               TLV_PUT(sctx, BTRFS_SEND_A_DATA, sctx->read_buf, len);
> +               ret = put_data_header(sctx, len);
> +               if (ret < 0)
> +                       break;
> +               memset(sctx->send_buf + sctx->send_size, 0, len);
> +               sctx->send_size += len;
>                 ret = send_cmd(sctx);
>                 if (ret < 0)
>                         break;
> @@ -5066,17 +5087,16 @@ static int send_extent_data(struct send_ctx *sctx,
>                             const u64 offset,
>                             const u64 len)
>  {
> +       u64 read_size = max_send_read_size(sctx);
>         u64 sent = 0;
>
>         if (sctx->flags & BTRFS_SEND_FLAG_NO_FILE_DATA)
>                 return send_update_extent(sctx, offset, len);
>
>         while (sent < len) {
> -               u64 size = len - sent;
> +               u64 size = min(len - sent, read_size);
>                 int ret;
>
> -               if (size > BTRFS_SEND_READ_SIZE)
> -                       size = BTRFS_SEND_READ_SIZE;
>                 ret = send_write(sctx, offset + sent, size);
>                 if (ret < 0)
>                         return ret;
> @@ -7145,12 +7165,6 @@ long btrfs_ioctl_send(struct file *mnt_file, struct btrfs_ioctl_send_args *arg)
>                 goto out;
>         }
>
> -       sctx->read_buf = kvmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL);
> -       if (!sctx->read_buf) {
> -               ret = -ENOMEM;
> -               goto out;
> -       }
> -
>         sctx->pending_dir_moves = RB_ROOT;
>         sctx->waiting_dir_moves = RB_ROOT;
>         sctx->orphan_dirs = RB_ROOT;
> @@ -7354,7 +7368,6 @@ long btrfs_ioctl_send(struct file *mnt_file, struct btrfs_ioctl_send_args *arg)
>
>                 kvfree(sctx->clone_roots);
>                 kvfree(sctx->send_buf);
> -               kvfree(sctx->read_buf);
>
>                 name_cache_free(sctx);
>
> diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
> index ead397f7034f..de91488b7cd0 100644
> --- a/fs/btrfs/send.h
> +++ b/fs/btrfs/send.h
> @@ -13,7 +13,6 @@
>  #define BTRFS_SEND_STREAM_VERSION 1
>
>  #define BTRFS_SEND_BUF_SIZE SZ_64K
> -#define BTRFS_SEND_READ_SIZE (48 * SZ_1K)
>
>  enum btrfs_tlv_type {
>         BTRFS_TLV_U8,
> --
> 2.28.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


* Re: [PATCH 3/9] btrfs: send: use btrfs_file_extent_end() in send_write_or_clone()
  2020-08-21  7:39 ` [PATCH 3/9] btrfs: send: use btrfs_file_extent_end() in send_write_or_clone() Omar Sandoval
@ 2020-08-21 17:30   ` Filipe Manana
  0 siblings, 0 replies; 37+ messages in thread
From: Filipe Manana @ 2020-08-21 17:30 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, linux-fsdevel

On Fri, Aug 21, 2020 at 8:42 AM Omar Sandoval <osandov@osandov.com> wrote:
>
> From: Omar Sandoval <osandov@fb.com>
>
> send_write_or_clone() basically has an open-coded copy of
> btrfs_file_extent_end() except that it (incorrectly) aligns to PAGE_SIZE
> instead of sectorsize. Fix and simplify the code by using
> btrfs_file_extent_end().
>
> Signed-off-by: Omar Sandoval <osandov@fb.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, and it passed some long duration tests with both full and
incremental sends here (with and without compression, no-holes, etc).

Thanks.
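
To make the alignment point from the commit message concrete, here is a
simplified userspace model. It assumes, as the commit message implies,
that btrfs_file_extent_end() rounds the end of an inline extent up to
the sector size and leaves regular extents unrounded; extent_end(),
clamped_end() and align_up() are hypothetical names, not kernel
functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x up to a multiple of a, where a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Simplified model of the extent end: inline extents are rounded up. */
static uint64_t extent_end(uint64_t offset, uint64_t bytes, bool is_inline,
			   uint32_t sectorsize)
{
	if (is_inline)
		return align_up(offset + bytes, sectorsize);
	return offset + bytes;
}

/* Model of the new clamp: min(extent end, inode size). */
static uint64_t clamped_end(uint64_t offset, uint64_t bytes, bool is_inline,
			    uint32_t sectorsize, uint64_t inode_size)
{
	uint64_t end = extent_end(offset, bytes, is_inline, sectorsize);

	return end < inode_size ? end : inode_size;
}
```

On a system with 64K pages and a 4K sector size, a 100-byte inline
extent ends at 4096 under this model, while rounding to the page size
would give 65536.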

> ---
>  fs/btrfs/send.c | 44 +++++++++++---------------------------------
>  1 file changed, 11 insertions(+), 33 deletions(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index e70f5ceb3261..37ce21361782 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -5400,51 +5400,29 @@ static int send_write_or_clone(struct send_ctx *sctx,
>                                struct clone_root *clone_root)
>  {
>         int ret = 0;
> -       struct btrfs_file_extent_item *ei;
>         u64 offset = key->offset;
> -       u64 len;
> -       u8 type;
> +       u64 end;
>         u64 bs = sctx->send_root->fs_info->sb->s_blocksize;
>
> -       ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
> -                       struct btrfs_file_extent_item);
> -       type = btrfs_file_extent_type(path->nodes[0], ei);
> -       if (type == BTRFS_FILE_EXTENT_INLINE) {
> -               len = btrfs_file_extent_ram_bytes(path->nodes[0], ei);
> -               /*
> -                * it is possible the inline item won't cover the whole page,
> -                * but there may be items after this page.  Make
> -                * sure to send the whole thing
> -                */
> -               len = PAGE_ALIGN(len);
> -       } else {
> -               len = btrfs_file_extent_num_bytes(path->nodes[0], ei);
> -       }
> -
> -       if (offset >= sctx->cur_inode_size) {
> -               ret = 0;
> -               goto out;
> -       }
> -       if (offset + len > sctx->cur_inode_size)
> -               len = sctx->cur_inode_size - offset;
> -       if (len == 0) {
> -               ret = 0;
> -               goto out;
> -       }
> +       end = min(btrfs_file_extent_end(path), sctx->cur_inode_size);
> +       if (offset >= end)
> +               return 0;
>
> -       if (clone_root && IS_ALIGNED(offset + len, bs)) {
> +       if (clone_root && IS_ALIGNED(end, bs)) {
> +               struct btrfs_file_extent_item *ei;
>                 u64 disk_byte;
>                 u64 data_offset;
>
> +               ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
> +                                   struct btrfs_file_extent_item);
>                 disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
>                 data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
>                 ret = clone_range(sctx, clone_root, disk_byte, data_offset,
> -                                 offset, len);
> +                                 offset, end - offset);
>         } else {
> -               ret = send_extent_data(sctx, offset, len);
> +               ret = send_extent_data(sctx, offset, end - offset);
>         }
> -       sctx->cur_inode_next_write_offset = offset + len;
> -out:
> +       sctx->cur_inode_next_write_offset = end;
>         return ret;
>  }
>
> --
> 2.28.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


* Re: [PATCH 8/9] btrfs: send: send compressed extents with encoded writes
  2020-08-21  7:39 ` [PATCH 8/9] btrfs: send: send compressed extents with encoded writes Omar Sandoval
@ 2020-08-24 17:32   ` Josef Bacik
  2020-08-24 17:52     ` Omar Sandoval
  0 siblings, 1 reply; 37+ messages in thread
From: Josef Bacik @ 2020-08-24 17:32 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: linux-fsdevel

On 8/21/20 3:39 AM, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Now that all of the pieces are in place, we can use the ENCODED_WRITE
> command to send compressed extents when appropriate.
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>

This one doesn't apply cleanly to misc-next, the ctree.h and inode.c chunks all 
fail, and the last hunk of the send stuff doesn't apply.  Thanks,

Josef


* Re: [PATCH 1/9] btrfs: send: get rid of i_size logic in send_write()
  2020-08-21  7:39 ` [PATCH 1/9] btrfs: send: get rid of i_size logic in send_write() Omar Sandoval
  2020-08-21 17:26   ` Filipe Manana
@ 2020-08-24 17:39   ` Josef Bacik
  1 sibling, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-08-24 17:39 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: linux-fsdevel

On 8/21/20 3:39 AM, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> send_write()/fill_read_buf() have some logic for avoiding reading past
> i_size. However, everywhere that we call
> send_write()/send_extent_data(), we've already clamped the length down
> to i_size. Get rid of the i_size handling, which simplifies the next
> change.
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef


* Re: [PATCH 2/9] btrfs: send: avoid copying file data
  2020-08-21  7:39 ` [PATCH 2/9] btrfs: send: avoid copying file data Omar Sandoval
  2020-08-21 17:29   ` Filipe Manana
@ 2020-08-24 17:47   ` Josef Bacik
  2020-09-11 14:13   ` David Sterba
  2 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-08-24 17:47 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: linux-fsdevel

On 8/21/20 3:39 AM, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> send_write() currently copies from the page cache to sctx->read_buf, and
> then from sctx->read_buf to sctx->send_buf. Similarly, send_hole()
> zeroes sctx->read_buf and then copies from sctx->read_buf to
> sctx->send_buf. However, if we write the TLV header manually, we can
> copy to sctx->send_buf directly and get rid of sctx->read_buf.
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>

I couldn't figure out why you weren't just using the TLV_ helpers for this, but
then I realized the len is the length of the data, so you need a special helper
for the header.  Just in case anybody else gets confused,

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
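
The point about needing a dedicated header helper can be illustrated with a
small userspace sketch. This is not the kernel code: struct sketch_ctx, the
sketch_* functions, and the SKETCH_A_DATA value are hypothetical stand-ins for
sctx, put_data_header(), and BTRFS_SEND_A_DATA. The idea is that the header
announces len bytes that have not been copied yet, so the payload can then be
written straight into the send buffer with no intermediate read buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_A_DATA 19u /* assumed attribute number, for illustration only */

struct sketch_ctx {
	unsigned char *buf; /* plays the role of sctx->send_buf */
	size_t size;        /* bytes used so far (sctx->send_size) */
	size_t max_size;    /* capacity (sctx->send_max_size) */
};

/* Write a v1 TLV header: 16-bit type, then 16-bit length, little-endian. */
static int sketch_put_data_header(struct sketch_ctx *c, uint16_t len)
{
	/* Make sure the header and the payload that follows both fit. */
	if (c->max_size - c->size < 4 + (size_t)len)
		return -EOVERFLOW;
	c->buf[c->size + 0] = SKETCH_A_DATA & 0xff;
	c->buf[c->size + 1] = (SKETCH_A_DATA >> 8) & 0xff;
	c->buf[c->size + 2] = len & 0xff;
	c->buf[c->size + 3] = (len >> 8) & 0xff;
	c->size += 4;
	return 0;
}

/* Append the payload directly after the header (one copy, not two). */
static int sketch_put_data(struct sketch_ctx *c, const void *src, uint16_t len)
{
	int ret = sketch_put_data_header(c, len);

	if (ret)
		return ret;
	/* In the kernel this copy would come from the page cache. */
	memcpy(c->buf + c->size, src, len);
	c->size += len;
	return 0;
}
```

A generic "put type, length, and value" helper would need the value in a
separate buffer first, which is the extra copy the patch removes.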


* Re: [PATCH 5/9] btrfs: add send stream v2 definitions
  2020-08-21  7:39 ` [PATCH 5/9] btrfs: add send stream v2 definitions Omar Sandoval
@ 2020-08-24 17:49   ` Josef Bacik
  0 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-08-24 17:49 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: linux-fsdevel

On 8/21/20 3:39 AM, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> This adds the definitions of the new commands for send stream version 2
> and their respective attributes: fallocate, FS_IOC_SETFLAGS (a.k.a.
> chattr), and encoded writes. It also documents two changes to the send
> stream format in v2: the receiver shouldn't assume a maximum command
> size, and the DATA attribute is encoded differently to allow for writes
> larger than 64k. These will be implemented in subsequent changes, and
> then the ioctl will accept the new flags.
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef


* Re: [PATCH 8/9] btrfs: send: send compressed extents with encoded writes
  2020-08-24 17:32   ` Josef Bacik
@ 2020-08-24 17:52     ` Omar Sandoval
  0 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-24 17:52 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, linux-fsdevel

On Mon, Aug 24, 2020 at 01:32:47PM -0400, Josef Bacik wrote:
> On 8/21/20 3:39 AM, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > Now that all of the pieces are in place, we can use the ENCODED_WRITE
> > command to send compressed extents when appropriate.
> > 
> > Signed-off-by: Omar Sandoval <osandov@fb.com>
> 
> This one doesn't apply cleanly to misc-next: the ctree.h and inode.c hunks
> all fail, and the last hunk of the send stuff doesn't apply.  Thanks,
> 
> Josef

Looks like a few patches just went in that conflict with this. I'll
rebase for the next version, but I also have this in a git branch in the
meantime: https://github.com/osandov/linux/tree/btrfs-send-encoded-v1


* Re: [PATCH 6/9] btrfs: send: write larger chunks when using stream v2
  2020-08-21  7:39 ` [PATCH 6/9] btrfs: send: write larger chunks when using stream v2 Omar Sandoval
@ 2020-08-24 17:57   ` Josef Bacik
  0 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-08-24 17:57 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: linux-fsdevel

On 8/21/20 3:39 AM, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> The length field of the send stream TLV header is 16 bits. This means
> that the maximum amount of data that can be sent for one write is 64k
> minus one. However, encoded writes must be able to send the maximum
> compressed extent (128k) in one command. To support this, send stream
> version 2 encodes the DATA attribute differently: it has no length
> field, and the length is implicitly up to the end of the containing command
> (which has a 32-bit length field). Although this is necessary for
> encoded writes, normal writes can benefit from it, too.
> 
> For v2, let's bump up the send buffer to the maximum compressed extent
> size plus 16k for the other metadata (144k total). Since this will most
> likely be vmalloc'd (and always will be after the next commit), we round
> it up to the next page since we might as well use the rest of the page
> on systems with >16k pages.
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>   fs/btrfs/send.c | 34 ++++++++++++++++++++++++++--------
>   1 file changed, 26 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index e25c3391fc02..c0f81d302f49 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -4799,14 +4799,27 @@ static u64 max_send_read_size(struct send_ctx *sctx)
>   
>   static int put_data_header(struct send_ctx *sctx, u32 len)
>   {
> -	struct btrfs_tlv_header *hdr;
> +	if (sctx->flags & BTRFS_SEND_FLAG_STREAM_V2) {
> +		__le16 tlv_type;
> +
> +		if (sctx->send_max_size - sctx->send_size <
> +		    sizeof(tlv_type) + len)
> +			return -EOVERFLOW;
> +		tlv_type = cpu_to_le16(BTRFS_SEND_A_DATA);
> +		memcpy(sctx->send_buf + sctx->send_size, &tlv_type,
> +		       sizeof(tlv_type));
> +		sctx->send_size += sizeof(tlv_type);

Can we add a comment about the implied length here?  I was reviewing this in
vimdiff without the commit message, so I missed the implied length detail.  Thanks,

Josef
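
As a rough illustration of what the commit message describes, here is a
userspace sketch of the two header layouts and of the v2 buffer sizing. The
sketch_* names and SKETCH_* constants are hypothetical, the attribute number is
an assumption, and the 16-bit range check in the v1 path is this sketch's own
guard rather than something taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_A_DATA         19u            /* assumed attribute number */
#define SKETCH_MAX_COMPRESSED (128u * 1024u) /* largest compressed extent */
#define SKETCH_METADATA_ROOM  (16u * 1024u)  /* room for the other attributes */

/* 128k + 16k = 144k, rounded up to a whole number of pages. */
static size_t sketch_v2_buf_size(size_t page_size)
{
	size_t want = SKETCH_MAX_COMPRESSED + SKETCH_METADATA_ROOM;

	return (want + page_size - 1) / page_size * page_size;
}

/*
 * Emit the DATA attribute header at buf[*used].
 * v1: 16-bit type followed by a 16-bit length.
 * v2: 16-bit type only. The length is implied: the data runs to the end
 *     of the enclosing command, whose own header has a 32-bit length.
 */
static int sketch_put_data_header(unsigned char *buf, size_t cap, size_t *used,
				  uint32_t len, bool v2)
{
	size_t hdr = v2 ? 2 : 4;
	size_t room = cap - *used;

	if (!v2 && len > UINT16_MAX)
		return -EOVERFLOW; /* cannot be expressed in a v1 header */
	if (room < hdr || room - hdr < len)
		return -EOVERFLOW; /* header plus payload would not fit */
	buf[*used] = SKETCH_A_DATA & 0xff;
	buf[*used + 1] = (SKETCH_A_DATA >> 8) & 0xff;
	if (!v2) {
		buf[*used + 2] = len & 0xff;
		buf[*used + 3] = (len >> 8) & 0xff;
	}
	*used += hdr;
	return 0;
}
```

With 4k pages the buffer stays at 144k; with 64k pages the same formula gives
192k, which matches the "might as well use the rest of the page" reasoning.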


* Re: [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
                   ` (19 preceding siblings ...)
  2020-08-21  7:40 ` [PATCH 11/11] btrfs-progs: receive: add tests for basic encoded_write send/receive Omar Sandoval
@ 2020-08-24 19:57 ` David Sterba
  2020-08-24 22:16   ` Omar Sandoval
  2020-09-10 11:28 ` David Sterba
  21 siblings, 1 reply; 37+ messages in thread
From: David Sterba @ 2020-08-24 19:57 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, linux-fsdevel

On Fri, Aug 21, 2020 at 12:39:50AM -0700, Omar Sandoval wrote:
> Protocol Updates
> ================
> 
> This series makes some changes to the send stream protocol beyond adding
> the encoded write command/attributes and bumping the version. Namely, v1
> has a 64k limit on the size of a write due to the 16-bit attribute
> length. This is not enough for encoded writes, as compressed extents may
> be up to 128k and cannot be split up. To address this, the
> BTRFS_SEND_A_DATA is treated specially in v2: its length is implicitly
> the remaining length of the command (which has a 32-bit length). This
> was the least bad of the options I considered.
> 
> There are other commands that we've been wanting to add to the protocol:
> fallocate and FS_IOC_SETFLAGS. This series reserves their command and
> attribute numbers but does not implement kernel support for emitting
> them. However, it does implement support in receive for them, so the
> kernel can start emitting those whenever we get around to implementing
> them.

Can you please outline the protocol changes (as a bullet list) and
possibly cross-reference them with the items at
https://btrfs.wiki.kernel.org/index.php/Design_notes_on_Send/Receive#Send_stream_v2_draft

I'd like to know which ones you did not implement, and why. The decision
here is between getting v2 out with the most desired options and doing a
v3 later with the rest, or making v2 as complete as possible.
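
For reference, the implicit-length rule for BTRFS_SEND_A_DATA quoted above can
be sketched from the receiver's side. This is a hypothetical userspace parser,
not the btrfs-progs code: struct sketch_attr, sketch_parse_attr(), and the
SKETCH_A_DATA value are assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_A_DATA 19u /* assumed attribute number */

struct sketch_attr {
	uint16_t type;
	const unsigned char *data;
	size_t len;
};

/* Read a little-endian 16-bit value one byte at a time. */
static uint16_t sketch_get_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Parse one attribute from what is left of a command payload.
 * Returns the number of bytes consumed, or -EINVAL on truncated input.
 */
static long sketch_parse_attr(const unsigned char *p, size_t remaining,
			      int version, struct sketch_attr *out)
{
	size_t hdr, len;

	if (remaining < 2)
		return -EINVAL;
	out->type = sketch_get_le16(p);
	if (version >= 2 && out->type == SKETCH_A_DATA) {
		/* v2: no length field, DATA runs to the end of the command. */
		hdr = 2;
		len = remaining - hdr;
	} else {
		/* Every other attribute carries an explicit 16-bit length. */
		if (remaining < 4)
			return -EINVAL;
		hdr = 4;
		len = sketch_get_le16(p + 2);
		if (len > remaining - hdr)
			return -EINVAL;
	}
	out->data = p + hdr;
	out->len = len;
	return (long)(hdr + len);
}
```

One consequence visible in the sketch is that, under this encoding, DATA has to
be the last attribute of a v2 command, since nothing marks where it ends other
than the end of the command itself.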


* Re: [PATCH 2/9] btrfs: send: avoid copying file data
  2020-08-21 17:29   ` Filipe Manana
@ 2020-08-24 21:34     ` Omar Sandoval
  0 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-24 21:34 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs, linux-fsdevel

On Fri, Aug 21, 2020 at 06:29:30PM +0100, Filipe Manana wrote:
> On Fri, Aug 21, 2020 at 8:42 AM Omar Sandoval <osandov@osandov.com> wrote:
> >
> > From: Omar Sandoval <osandov@fb.com>
> >
> > send_write() currently copies from the page cache to sctx->read_buf, and
> > then from sctx->read_buf to sctx->send_buf. Similarly, send_hole()
> > zeroes sctx->read_buf and then copies from sctx->read_buf to
> > sctx->send_buf. However, if we write the TLV header manually, we can
> > copy to sctx->send_buf directly and get rid of sctx->read_buf.
> >
> > Signed-off-by: Omar Sandoval <osandov@fb.com>
> 
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> 
> Looks good, and it passed some long duration tests with both full and
> incremental sends here (with and without compression, no-holes, etc).
> Only one minor thing below, but it's really subjective and doesn't
> make much of a difference.
> 
> Thanks.
> 
> > ---
> >  fs/btrfs/send.c | 65 +++++++++++++++++++++++++++++--------------------
> >  fs/btrfs/send.h |  1 -
> >  2 files changed, 39 insertions(+), 27 deletions(-)
> >
> > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> > index 8af5e867e4ca..e70f5ceb3261 100644
> > --- a/fs/btrfs/send.c
> > +++ b/fs/btrfs/send.c
> > @@ -122,8 +122,6 @@ struct send_ctx {
> >
> >         struct file_ra_state ra;
> >
> > -       char *read_buf;
> > -
> >         /*
> >          * We process inodes by their increasing order, so if before an
> >          * incremental send we reverse the parent/child relationship of
> > @@ -4794,7 +4792,25 @@ static int process_all_new_xattrs(struct send_ctx *sctx)
> >         return ret;
> >  }
> >
> > -static int fill_read_buf(struct send_ctx *sctx, u64 offset, u32 len)
> > +static u64 max_send_read_size(struct send_ctx *sctx)
> 
> We could make this inline, since it's so small and trivial, and
> constify the argument too.

Good point, fixed. Thanks, Filipe!


* Re: [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing
  2020-08-24 19:57 ` [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing David Sterba
@ 2020-08-24 22:16   ` Omar Sandoval
  0 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-08-24 22:16 UTC (permalink / raw)
  To: dsterba, linux-btrfs, linux-fsdevel

On Mon, Aug 24, 2020 at 09:57:55PM +0200, David Sterba wrote:
> On Fri, Aug 21, 2020 at 12:39:50AM -0700, Omar Sandoval wrote:
> > Protocol Updates
> > ================
> > 
> > This series makes some changes to the send stream protocol beyond adding
> > the encoded write command/attributes and bumping the version. Namely, v1
> > has a 64k limit on the size of a write due to the 16-bit attribute
> > length. This is not enough for encoded writes, as compressed extents may
> > be up to 128k and cannot be split up. To address this, the
> > BTRFS_SEND_A_DATA is treated specially in v2: its length is implicitly
> > the remaining length of the command (which has a 32-bit length). This
> > was the least bad of the options I considered.
> > 
> > There are other commands that we've been wanting to add to the protocol:
> > fallocate and FS_IOC_SETFLAGS. This series reserves their command and
> > attribute numbers but does not implement kernel support for emitting
> > them. However, it does implement support in receive for them, so the
> > kernel can start emitting those whenever we get around to implementing
> > them.
> 
> Can you please outline the protocol changes (as a bullet list) and
> eventually cross-ref with items
> https://btrfs.wiki.kernel.org/index.php/Design_notes_on_Send/Receive#Send_stream_v2_draft
> 
> I'd like to know which and why you did not implement. The decision here
> is between get v2 out with most desired options and rev v3 later with
> the rest, or do v2 as complete as possible.

The short version is that I didn't implement the kernel side of any of
those :) the RWF_ENCODED series + this series is already big, and I
didn't want to make it even bigger. I figured updating the
protocol/receive now and doing the kernel side later was a good
compromise (rather than doing a huge code dump or constantly bumping the
protocol version). Is there some reason you don't like this approach?
I'm of course happy to go about this in whatever way you think is best.

Here's a breakdown of the list from the wiki:

* Send extent holes, send preallocated extents: both require fallocate.
  Boris implemented the receive side. I have some old patches
  implementing the send side [1], but they're a largish rework of extent
  tracking in send.
* Extent clones within one file: as far as I can tell, this is already
  possible with v1, it just sends redundant file paths.
* Send otime for inodes: the consensus when I posted patches to enable
  this [2] was that we don't want this after all.
* Send file flags (FS_IOC_GETFLAGS/FS_IOC_SETFLAGS): again, Boris
  implemented the receive side. I previously took a stab at the send
  side, but it's really annoying because of all of the interactions
  between directory inheritance, writes vs. NOCOW/append-only/immutable,
  etc. It's do-able, it would just take a lot of care.
* Optionally send owner/group as strings: this one I wasn't aware of.
* "block device is not sent over the stream": I don't know what this is
  referring to. It looks like we send block device nodes with mknod.

In my opinion, fallocate support is the most important, SETFLAGS would
be good but is a lot of effort, and the rest are nice-to-have.

Let me know how you'd like me to go about this.

1: https://github.com/osandov/linux/commits/btrfs-send-v2
2: https://lore.kernel.org/linux-btrfs/cover.1550136164.git.osandov@fb.com/


* Re: [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing
  2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
                   ` (20 preceding siblings ...)
  2020-08-24 19:57 ` [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing David Sterba
@ 2020-09-10 11:28 ` David Sterba
  21 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2020-09-10 11:28 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, linux-fsdevel

On Fri, Aug 21, 2020 at 12:39:50AM -0700, Omar Sandoval wrote:
> Omar Sandoval (9):
>   btrfs: send: get rid of i_size logic in send_write()
>   btrfs: send: avoid copying file data
>   btrfs: send: use btrfs_file_extent_end() in send_write_or_clone()
>   btrfs: add send_stream_version attribute to sysfs

For the record, I'll add 1-4 to misc-next.


* Re: [PATCH 2/9] btrfs: send: avoid copying file data
  2020-08-21  7:39 ` [PATCH 2/9] btrfs: send: avoid copying file data Omar Sandoval
  2020-08-21 17:29   ` Filipe Manana
  2020-08-24 17:47   ` Josef Bacik
@ 2020-09-11 14:13   ` David Sterba
  2020-09-14 22:04     ` Omar Sandoval
  2 siblings, 1 reply; 37+ messages in thread
From: David Sterba @ 2020-09-11 14:13 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, linux-fsdevel

On Fri, Aug 21, 2020 at 12:39:52AM -0700, Omar Sandoval wrote:
> +static int put_data_header(struct send_ctx *sctx, u32 len)
> +{
> +	struct btrfs_tlv_header *hdr;
> +
> +	if (sctx->send_max_size - sctx->send_size < sizeof(*hdr) + len)
> +		return -EOVERFLOW;
> +	hdr = (struct btrfs_tlv_header *)(sctx->send_buf + sctx->send_size);
> +	hdr->tlv_type = cpu_to_le16(BTRFS_SEND_A_DATA);
> +	hdr->tlv_len = cpu_to_le16(len);

I think we need put_unaligned_le16 here: this casts an arbitrary offset in
the buffer to a struct pointer, which is not alignment safe in general.
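
To make the concern concrete, here is a userspace sketch of an alignment-safe
16-bit little-endian store and load. The kernel provides put_unaligned_le16()
for this; the sketch_* functions below are hypothetical stand-ins that show the
same idea with byte-wise access, and are not the kernel implementation.

```c
#include <stdint.h>
#include <string.h>

/*
 * Store val as little-endian at p, which may have any alignment.
 * Going through bytes (or memcpy) avoids dereferencing a possibly
 * misaligned uint16_t pointer.
 */
static void sketch_put_unaligned_le16(uint16_t val, void *p)
{
	unsigned char b[2] = {
		(unsigned char)(val & 0xff),
		(unsigned char)(val >> 8),
	};

	memcpy(p, b, sizeof(b));
}

/* Load a little-endian 16-bit value from a pointer of any alignment. */
static uint16_t sketch_get_unaligned_le16(const void *p)
{
	unsigned char b[2];

	memcpy(b, p, sizeof(b));
	return (uint16_t)(b[0] | (b[1] << 8));
}
```

Writing the header fields this way works at an odd offset into the send
buffer, whereas assigning through a cast struct pointer relies on the offset
happening to be suitably aligned.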


* Re: [PATCH 2/9] btrfs: send: avoid copying file data
  2020-09-11 14:13   ` David Sterba
@ 2020-09-14 22:04     ` Omar Sandoval
  2020-09-15  8:14       ` David Sterba
  0 siblings, 1 reply; 37+ messages in thread
From: Omar Sandoval @ 2020-09-14 22:04 UTC (permalink / raw)
  To: dsterba, linux-btrfs, linux-fsdevel

On Fri, Sep 11, 2020 at 04:13:39PM +0200, David Sterba wrote:
> On Fri, Aug 21, 2020 at 12:39:52AM -0700, Omar Sandoval wrote:
> > +static int put_data_header(struct send_ctx *sctx, u32 len)
> > +{
> > +	struct btrfs_tlv_header *hdr;
> > +
> > +	if (sctx->send_max_size - sctx->send_size < sizeof(*hdr) + len)
> > +		return -EOVERFLOW;
> > +	hdr = (struct btrfs_tlv_header *)(sctx->send_buf + sctx->send_size);
> > +	hdr->tlv_type = cpu_to_le16(BTRFS_SEND_A_DATA);
> > +	hdr->tlv_len = cpu_to_le16(len);
> 
> I think we need put_unaligned_le16 here, it's mapping a random buffer to
> a pointer, this is not alignment safe in general.

I think you're right, although tlv_put() seems to have this same
problem.


* Re: [PATCH 2/9] btrfs: send: avoid copying file data
  2020-09-14 22:04     ` Omar Sandoval
@ 2020-09-15  8:14       ` David Sterba
  0 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2020-09-15  8:14 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: dsterba, linux-btrfs, linux-fsdevel

On Mon, Sep 14, 2020 at 03:04:48PM -0700, Omar Sandoval wrote:
> On Fri, Sep 11, 2020 at 04:13:39PM +0200, David Sterba wrote:
> > On Fri, Aug 21, 2020 at 12:39:52AM -0700, Omar Sandoval wrote:
> > > +static int put_data_header(struct send_ctx *sctx, u32 len)
> > > +{
> > > +	struct btrfs_tlv_header *hdr;
> > > +
> > > +	if (sctx->send_max_size - sctx->send_size < sizeof(*hdr) + len)
> > > +		return -EOVERFLOW;
> > > +	hdr = (struct btrfs_tlv_header *)(sctx->send_buf + sctx->send_size);
> > > +	hdr->tlv_type = cpu_to_le16(BTRFS_SEND_A_DATA);
> > > +	hdr->tlv_len = cpu_to_le16(len);
> > 
> > I think we need put_unaligned_le16 here, it's mapping a random buffer to
> > a pointer, this is not alignment safe in general.
> 
> I think you're right, although tlv_put() seems to have this same
> problem.

Indeed, and there's more: tlv_put, TLV_PUT_DEFINE_INT, begin_cmd, and
send_cmd. The other direct assignments are to local structs, so their
alignment is fine.


Thread overview: 37+ messages
2020-08-21  7:39 [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing Omar Sandoval
2020-08-21  7:39 ` [PATCH 1/9] btrfs: send: get rid of i_size logic in send_write() Omar Sandoval
2020-08-21 17:26   ` Filipe Manana
2020-08-24 17:39   ` Josef Bacik
2020-08-21  7:39 ` [PATCH 2/9] btrfs: send: avoid copying file data Omar Sandoval
2020-08-21 17:29   ` Filipe Manana
2020-08-24 21:34     ` Omar Sandoval
2020-08-24 17:47   ` Josef Bacik
2020-09-11 14:13   ` David Sterba
2020-09-14 22:04     ` Omar Sandoval
2020-09-15  8:14       ` David Sterba
2020-08-21  7:39 ` [PATCH 3/9] btrfs: send: use btrfs_file_extent_end() in send_write_or_clone() Omar Sandoval
2020-08-21 17:30   ` Filipe Manana
2020-08-21  7:39 ` [PATCH 4/9] btrfs: add send_stream_version attribute to sysfs Omar Sandoval
2020-08-21  7:39 ` [PATCH 5/9] btrfs: add send stream v2 definitions Omar Sandoval
2020-08-24 17:49   ` Josef Bacik
2020-08-21  7:39 ` [PATCH 6/9] btrfs: send: write larger chunks when using stream v2 Omar Sandoval
2020-08-24 17:57   ` Josef Bacik
2020-08-21  7:39 ` [PATCH 7/9] btrfs: send: allocate send buffer with alloc_page() and vmap() for v2 Omar Sandoval
2020-08-21  7:39 ` [PATCH 8/9] btrfs: send: send compressed extents with encoded writes Omar Sandoval
2020-08-24 17:32   ` Josef Bacik
2020-08-24 17:52     ` Omar Sandoval
2020-08-21  7:39 ` [PATCH 9/9] btrfs: send: enable support for stream v2 and compressed writes Omar Sandoval
2020-08-21  7:40 ` [PATCH 01/11] btrfs-progs: receive: support v2 send stream larger tlv_len Omar Sandoval
2020-08-21  7:40 ` [PATCH 02/11] btrfs-progs: receive: dynamically allocate sctx->read_buf Omar Sandoval
2020-08-21  7:40 ` [PATCH 03/11] btrfs-progs: receive: support v2 send stream DATA tlv format Omar Sandoval
2020-08-21  7:40 ` [PATCH 04/11] btrfs-progs: receive: add send stream v2 cmds and attrs to send.h Omar Sandoval
2020-08-21  7:40 ` [PATCH 05/11] btrfs-progs: receive: add stub implementation for pwritev2 Omar Sandoval
2020-08-21  7:40 ` [PATCH 06/11] btrfs-progs: receive: process encoded_write commands Omar Sandoval
2020-08-21  7:40 ` [PATCH 07/11] btrfs-progs: receive: encoded_write fallback to explicit decode and write Omar Sandoval
2020-08-21  7:40 ` [PATCH 08/11] btrfs-progs: receive: process fallocate commands Omar Sandoval
2020-08-21  7:40 ` [PATCH 09/11] btrfs-progs: receive: process setflags ioctl commands Omar Sandoval
2020-08-21  7:40 ` [PATCH 10/11] btrfs-progs: send: stream v2 ioctl flags Omar Sandoval
2020-08-21  7:40 ` [PATCH 11/11] btrfs-progs: receive: add tests for basic encoded_write send/receive Omar Sandoval
2020-08-24 19:57 ` [PATCH 0/9] btrfs: implement send/receive of compressed extents without decompressing David Sterba
2020-08-24 22:16   ` Omar Sandoval
2020-09-10 11:28 ` David Sterba
