All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v15 0/7] btrfs: add send/receive support for reading/writing compressed data
@ 2022-04-04 17:29 Omar Sandoval
  2022-04-04 17:29 ` [PATCH v15 1/7] btrfs: send: remove unused send_ctx::{total,cmd}_send_size Omar Sandoval
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Omar Sandoval @ 2022-04-04 17:29 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

This series adds support for sending compressed data via Btrfs send and
btrfs-progs support for sending/receiving compressed data and writing it
with BTRFS_IOC_ENCODED_WRITE, which was previously merged into
misc-next. See the previous posting for more details and benchmarks [1].

Patches 1 and 2 are cleanups for Btrfs send. Patches 3-5 prepare some
protocol changes for send stream v2. Patch 6 implements compressed send.
Patch 7 enables send stream v2 and compressed send in the send ioctl
when requested.

Changes since v14 [2]:

- Added a comment about "put_data" to patch 4.
- Replaced hard-coded 2s with sizeof(__le16) in patch 4.
- Replaced patch 5 with new approach using vmalloc() + vmalloc_to_page()
  (instead of alloc_page() + vmap()) as suggested by Sweet Tea.
- Added Sweet Tea's reviewed-bys.
- Rebased on latest misc-next branch.

The btrfs-progs patches are unchanged since v14, so I'm not resending
them this time.

1: https://lore.kernel.org/linux-btrfs/cover.1615922753.git.osandov@fb.com/
2: https://lore.kernel.org/linux-btrfs/cover.1647537027.git.osandov@fb.com/

Omar Sandoval (7):
  btrfs: send: remove unused send_ctx::{total,cmd}_send_size
  btrfs: send: explicitly number commands and attributes
  btrfs: add send stream v2 definitions
  btrfs: send: write larger chunks when using stream v2
  btrfs: send: get send buffer pages for protocol v2
  btrfs: send: send compressed extents with encoded writes
  btrfs: send: enable support for stream v2 and compressed writes

 fs/btrfs/ctree.h           |   6 +
 fs/btrfs/inode.c           |  13 +-
 fs/btrfs/send.c            | 320 +++++++++++++++++++++++++++++++++----
 fs/btrfs/send.h            | 142 +++++++++-------
 include/uapi/linux/btrfs.h |  10 +-
 5 files changed, 392 insertions(+), 99 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v15 1/7] btrfs: send: remove unused send_ctx::{total,cmd}_send_size
  2022-04-04 17:29 [PATCH v15 0/7] btrfs: add send/receive support for reading/writing compressed data Omar Sandoval
@ 2022-04-04 17:29 ` Omar Sandoval
  2022-04-04 17:29 ` [PATCH v15 2/7] btrfs: send: explicitly number commands and attributes Omar Sandoval
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Omar Sandoval @ 2022-04-04 17:29 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

We collect these statistics but have never exposed them in any way. I
also didn't find any patches that ever attempted to make use of them.

Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/send.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index cf86f1eafcb7..6d36dee1505f 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -82,8 +82,6 @@ struct send_ctx {
 	char *send_buf;
 	u32 send_size;
 	u32 send_max_size;
-	u64 total_send_size;
-	u64 cmd_send_size[BTRFS_SEND_C_MAX + 1];
 	u64 flags;	/* 'flags' member of btrfs_ioctl_send_args is u64 */
 	/* Protocol version compatibility requested */
 	u32 proto;
@@ -727,8 +725,6 @@ static int send_cmd(struct send_ctx *sctx)
 	ret = write_buf(sctx->send_filp, sctx->send_buf, sctx->send_size,
 					&sctx->send_off);
 
-	sctx->total_send_size += sctx->send_size;
-	sctx->cmd_send_size[get_unaligned_le16(&hdr->cmd)] += sctx->send_size;
 	sctx->send_size = 0;
 
 	return ret;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v15 2/7] btrfs: send: explicitly number commands and attributes
  2022-04-04 17:29 [PATCH v15 0/7] btrfs: add send/receive support for reading/writing compressed data Omar Sandoval
  2022-04-04 17:29 ` [PATCH v15 1/7] btrfs: send: remove unused send_ctx::{total,cmd}_send_size Omar Sandoval
@ 2022-04-04 17:29 ` Omar Sandoval
  2022-05-18 22:24   ` David Sterba
  2022-04-04 17:29 ` [PATCH v15 3/7] btrfs: add send stream v2 definitions Omar Sandoval
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Omar Sandoval @ 2022-04-04 17:29 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

Commit e77fbf990316 ("btrfs: send: prepare for v2 protocol") added
_BTRFS_SEND_C_MAX_V* macros equal to the maximum command number for the
version plus 1, but as written this creates gaps in the number space.
The maximum command number is currently 22, and __BTRFS_SEND_C_MAX_V1 is
accordingly 23. But then __BTRFS_SEND_C_MAX_V2 is 24, suggesting that v2
has a command numbered 23, and __BTRFS_SEND_C_MAX is 25, suggesting that
23 and 24 are valid commands.

Instead, let's explicitly number all of the commands, attributes, and
sentinel MAX constants.

Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/send.c |   4 +-
 fs/btrfs/send.h | 106 ++++++++++++++++++++++++------------------------
 2 files changed, 54 insertions(+), 56 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 6d36dee1505f..9363f625fa17 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -326,8 +326,8 @@ __maybe_unused
 static bool proto_cmd_ok(const struct send_ctx *sctx, int cmd)
 {
 	switch (sctx->proto) {
-	case 1:	 return cmd < __BTRFS_SEND_C_MAX_V1;
-	case 2:	 return cmd < __BTRFS_SEND_C_MAX_V2;
+	case 1:	 return cmd <= BTRFS_SEND_C_MAX_V1;
+	case 2:	 return cmd <= BTRFS_SEND_C_MAX_V2;
 	default: return false;
 	}
 }
diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
index 08602fdd600a..67721e0281ba 100644
--- a/fs/btrfs/send.h
+++ b/fs/btrfs/send.h
@@ -46,84 +46,82 @@ struct btrfs_tlv_header {
 
 /* commands */
 enum btrfs_send_cmd {
-	BTRFS_SEND_C_UNSPEC,
+	BTRFS_SEND_C_UNSPEC = 0,
 
 	/* Version 1 */
-	BTRFS_SEND_C_SUBVOL,
-	BTRFS_SEND_C_SNAPSHOT,
+	BTRFS_SEND_C_SUBVOL = 1,
+	BTRFS_SEND_C_SNAPSHOT = 2,
 
-	BTRFS_SEND_C_MKFILE,
-	BTRFS_SEND_C_MKDIR,
-	BTRFS_SEND_C_MKNOD,
-	BTRFS_SEND_C_MKFIFO,
-	BTRFS_SEND_C_MKSOCK,
-	BTRFS_SEND_C_SYMLINK,
+	BTRFS_SEND_C_MKFILE = 3,
+	BTRFS_SEND_C_MKDIR = 4,
+	BTRFS_SEND_C_MKNOD = 5,
+	BTRFS_SEND_C_MKFIFO = 6,
+	BTRFS_SEND_C_MKSOCK = 7,
+	BTRFS_SEND_C_SYMLINK = 8,
 
-	BTRFS_SEND_C_RENAME,
-	BTRFS_SEND_C_LINK,
-	BTRFS_SEND_C_UNLINK,
-	BTRFS_SEND_C_RMDIR,
+	BTRFS_SEND_C_RENAME = 9,
+	BTRFS_SEND_C_LINK = 10,
+	BTRFS_SEND_C_UNLINK = 11,
+	BTRFS_SEND_C_RMDIR = 12,
 
-	BTRFS_SEND_C_SET_XATTR,
-	BTRFS_SEND_C_REMOVE_XATTR,
+	BTRFS_SEND_C_SET_XATTR = 13,
+	BTRFS_SEND_C_REMOVE_XATTR = 14,
 
-	BTRFS_SEND_C_WRITE,
-	BTRFS_SEND_C_CLONE,
+	BTRFS_SEND_C_WRITE = 15,
+	BTRFS_SEND_C_CLONE = 16,
 
-	BTRFS_SEND_C_TRUNCATE,
-	BTRFS_SEND_C_CHMOD,
-	BTRFS_SEND_C_CHOWN,
-	BTRFS_SEND_C_UTIMES,
+	BTRFS_SEND_C_TRUNCATE = 17,
+	BTRFS_SEND_C_CHMOD = 18,
+	BTRFS_SEND_C_CHOWN = 19,
+	BTRFS_SEND_C_UTIMES = 20,
 
-	BTRFS_SEND_C_END,
-	BTRFS_SEND_C_UPDATE_EXTENT,
-	__BTRFS_SEND_C_MAX_V1,
+	BTRFS_SEND_C_END = 21,
+	BTRFS_SEND_C_UPDATE_EXTENT = 22,
+	BTRFS_SEND_C_MAX_V1 = 22,
 
 	/* Version 2 */
-	__BTRFS_SEND_C_MAX_V2,
+	BTRFS_SEND_C_MAX_V2 = 22,
 
 	/* End */
-	__BTRFS_SEND_C_MAX,
+	BTRFS_SEND_C_MAX = 22,
 };
-#define BTRFS_SEND_C_MAX (__BTRFS_SEND_C_MAX - 1)
 
 /* attributes in send stream */
 enum {
-	BTRFS_SEND_A_UNSPEC,
+	BTRFS_SEND_A_UNSPEC = 0,
 
-	BTRFS_SEND_A_UUID,
-	BTRFS_SEND_A_CTRANSID,
+	BTRFS_SEND_A_UUID = 1,
+	BTRFS_SEND_A_CTRANSID = 2,
 
-	BTRFS_SEND_A_INO,
-	BTRFS_SEND_A_SIZE,
-	BTRFS_SEND_A_MODE,
-	BTRFS_SEND_A_UID,
-	BTRFS_SEND_A_GID,
-	BTRFS_SEND_A_RDEV,
-	BTRFS_SEND_A_CTIME,
-	BTRFS_SEND_A_MTIME,
-	BTRFS_SEND_A_ATIME,
-	BTRFS_SEND_A_OTIME,
+	BTRFS_SEND_A_INO = 3,
+	BTRFS_SEND_A_SIZE = 4,
+	BTRFS_SEND_A_MODE = 5,
+	BTRFS_SEND_A_UID = 6,
+	BTRFS_SEND_A_GID = 7,
+	BTRFS_SEND_A_RDEV = 8,
+	BTRFS_SEND_A_CTIME = 9,
+	BTRFS_SEND_A_MTIME = 10,
+	BTRFS_SEND_A_ATIME = 11,
+	BTRFS_SEND_A_OTIME = 12,
 
-	BTRFS_SEND_A_XATTR_NAME,
-	BTRFS_SEND_A_XATTR_DATA,
+	BTRFS_SEND_A_XATTR_NAME = 13,
+	BTRFS_SEND_A_XATTR_DATA = 14,
 
-	BTRFS_SEND_A_PATH,
-	BTRFS_SEND_A_PATH_TO,
-	BTRFS_SEND_A_PATH_LINK,
+	BTRFS_SEND_A_PATH = 15,
+	BTRFS_SEND_A_PATH_TO = 16,
+	BTRFS_SEND_A_PATH_LINK = 17,
 
-	BTRFS_SEND_A_FILE_OFFSET,
-	BTRFS_SEND_A_DATA,
+	BTRFS_SEND_A_FILE_OFFSET = 18,
+	BTRFS_SEND_A_DATA = 19,
 
-	BTRFS_SEND_A_CLONE_UUID,
-	BTRFS_SEND_A_CLONE_CTRANSID,
-	BTRFS_SEND_A_CLONE_PATH,
-	BTRFS_SEND_A_CLONE_OFFSET,
-	BTRFS_SEND_A_CLONE_LEN,
+	BTRFS_SEND_A_CLONE_UUID = 20,
+	BTRFS_SEND_A_CLONE_CTRANSID = 21,
+	BTRFS_SEND_A_CLONE_PATH = 22,
+	BTRFS_SEND_A_CLONE_OFFSET = 23,
+	BTRFS_SEND_A_CLONE_LEN = 24,
 
-	__BTRFS_SEND_A_MAX,
+	BTRFS_SEND_A_MAX = 24,
 };
-#define BTRFS_SEND_A_MAX (__BTRFS_SEND_A_MAX - 1)
 
 #ifdef __KERNEL__
 long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v15 3/7] btrfs: add send stream v2 definitions
  2022-04-04 17:29 [PATCH v15 0/7] btrfs: add send/receive support for reading/writing compressed data Omar Sandoval
  2022-04-04 17:29 ` [PATCH v15 1/7] btrfs: send: remove unused send_ctx::{total,cmd}_send_size Omar Sandoval
  2022-04-04 17:29 ` [PATCH v15 2/7] btrfs: send: explicitly number commands and attributes Omar Sandoval
@ 2022-04-04 17:29 ` Omar Sandoval
  2022-05-18 21:00   ` David Sterba
  2022-04-04 17:29 ` [PATCH v15 4/7] btrfs: send: write larger chunks when using stream v2 Omar Sandoval
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Omar Sandoval @ 2022-04-04 17:29 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

This adds the definitions of the new commands for send stream version 2
and their respective attributes: fallocate, FS_IOC_SETFLAGS (a.k.a.
chattr), and encoded writes. It also documents two changes to the send
stream format in v2: the receiver shouldn't assume a maximum command
size, and the DATA attribute is encoded differently to allow for writes
larger than 64k. These will be implemented in subsequent changes, and
then the ioctl will accept the new version and flag.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/send.c            |  2 +-
 fs/btrfs/send.h            | 40 ++++++++++++++++++++++++++++++++++----
 include/uapi/linux/btrfs.h |  7 +++++++
 3 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 9363f625fa17..1f141de3a7d6 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -7459,7 +7459,7 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg)
 
 	sctx->clone_roots_cnt = arg->clone_sources_count;
 
-	sctx->send_max_size = BTRFS_SEND_BUF_SIZE;
+	sctx->send_max_size = BTRFS_SEND_BUF_SIZE_V1;
 	sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
 	if (!sctx->send_buf) {
 		ret = -ENOMEM;
diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
index 67721e0281ba..805d8095209a 100644
--- a/fs/btrfs/send.h
+++ b/fs/btrfs/send.h
@@ -12,7 +12,11 @@
 #define BTRFS_SEND_STREAM_MAGIC "btrfs-stream"
 #define BTRFS_SEND_STREAM_VERSION 1
 
-#define BTRFS_SEND_BUF_SIZE SZ_64K
+/*
+ * In send stream v1, no command is larger than 64k. In send stream v2, no limit
+ * should be assumed.
+ */
+#define BTRFS_SEND_BUF_SIZE_V1 SZ_64K
 
 enum btrfs_tlv_type {
 	BTRFS_TLV_U8,
@@ -80,16 +84,20 @@ enum btrfs_send_cmd {
 	BTRFS_SEND_C_MAX_V1 = 22,
 
 	/* Version 2 */
-	BTRFS_SEND_C_MAX_V2 = 22,
+	BTRFS_SEND_C_FALLOCATE = 23,
+	BTRFS_SEND_C_SETFLAGS = 24,
+	BTRFS_SEND_C_ENCODED_WRITE = 25,
+	BTRFS_SEND_C_MAX_V2 = 25,
 
 	/* End */
-	BTRFS_SEND_C_MAX = 22,
+	BTRFS_SEND_C_MAX = 25,
 };
 
 /* attributes in send stream */
 enum {
 	BTRFS_SEND_A_UNSPEC = 0,
 
+	/* Version 1 */
 	BTRFS_SEND_A_UUID = 1,
 	BTRFS_SEND_A_CTRANSID = 2,
 
@@ -112,6 +120,11 @@ enum {
 	BTRFS_SEND_A_PATH_LINK = 17,
 
 	BTRFS_SEND_A_FILE_OFFSET = 18,
+	/*
+	 * As of send stream v2, this attribute is special: it must be the last
+	 * attribute in a command, its header contains only the type, and its
+	 * length is implicitly the remaining length of the command.
+	 */
 	BTRFS_SEND_A_DATA = 19,
 
 	BTRFS_SEND_A_CLONE_UUID = 20,
@@ -120,7 +133,26 @@ enum {
 	BTRFS_SEND_A_CLONE_OFFSET = 23,
 	BTRFS_SEND_A_CLONE_LEN = 24,
 
-	BTRFS_SEND_A_MAX = 24,
+	BTRFS_SEND_A_MAX_V1 = 24,
+
+	/* Version 2 */
+	BTRFS_SEND_A_FALLOCATE_MODE = 25,
+
+	BTRFS_SEND_A_SETFLAGS_FLAGS = 26,
+
+	BTRFS_SEND_A_UNENCODED_FILE_LEN = 27,
+	BTRFS_SEND_A_UNENCODED_LEN = 28,
+	BTRFS_SEND_A_UNENCODED_OFFSET = 29,
+	/*
+	 * COMPRESSION and ENCRYPTION default to NONE (0) if omitted from
+	 * BTRFS_SEND_C_ENCODED_WRITE.
+	 */
+	BTRFS_SEND_A_COMPRESSION = 30,
+	BTRFS_SEND_A_ENCRYPTION = 31,
+	BTRFS_SEND_A_MAX_V2 = 31,
+
+	/* End */
+	BTRFS_SEND_A_MAX = 31,
 };
 
 #ifdef __KERNEL__
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index d956b2993970..b6f26a434b10 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -777,6 +777,13 @@ struct btrfs_ioctl_received_subvol_args {
  */
 #define BTRFS_SEND_FLAG_VERSION			0x8
 
+/*
+ * Send compressed data using the ENCODED_WRITE command instead of decompressing
+ * the data and sending it with the WRITE command. This requires protocol
+ * version >= 2.
+ */
+#define BTRFS_SEND_FLAG_COMPRESSED		0x10
+
 #define BTRFS_SEND_FLAG_MASK \
 	(BTRFS_SEND_FLAG_NO_FILE_DATA | \
 	 BTRFS_SEND_FLAG_OMIT_STREAM_HEADER | \
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v15 4/7] btrfs: send: write larger chunks when using stream v2
  2022-04-04 17:29 [PATCH v15 0/7] btrfs: add send/receive support for reading/writing compressed data Omar Sandoval
                   ` (2 preceding siblings ...)
  2022-04-04 17:29 ` [PATCH v15 3/7] btrfs: add send stream v2 definitions Omar Sandoval
@ 2022-04-04 17:29 ` Omar Sandoval
  2022-04-04 17:29 ` [PATCH v15 5/7] btrfs: send: get send buffer pages for protocol v2 Omar Sandoval
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Omar Sandoval @ 2022-04-04 17:29 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

The length field of the send stream TLV header is 16 bits. This means
that the maximum amount of data that can be sent for one write is 64k
minus one. However, encoded writes must be able to send the maximum
compressed extent (128k) in one command. To support this, send stream
version 2 encodes the DATA attribute differently: it has no length
field, and the length is implicitly up to the end of containing command
(which has a 32-bit length field). Although this is necessary for
encoded writes, normal writes can benefit from it, too.

Also add a check to enforce that the DATA attribute is last. It is only
strictly necessary for v2, but we might as well make v1 consistent with
it.

For v2, let's bump up the send buffer to the maximum compressed extent
size plus 16k for the other metadata (144k total). Since this will most
likely be vmalloc'd (and always will be after the next commit), we round
it up to the next page since we might as well use the rest of the page
on systems with >16k pages.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/send.c | 47 +++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 39 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 1f141de3a7d6..c0ca45dae6d6 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -82,6 +82,11 @@ struct send_ctx {
 	char *send_buf;
 	u32 send_size;
 	u32 send_max_size;
+	/*
+	 * Whether BTRFS_SEND_A_DATA attribute was already added to current
+	 * command (since protocol v2, data must be the last attribute).
+	 */
+	bool put_data;
 	u64 flags;	/* 'flags' member of btrfs_ioctl_send_args is u64 */
 	/* Protocol version compatibility requested */
 	u32 proto;
@@ -589,6 +594,9 @@ static int tlv_put(struct send_ctx *sctx, u16 attr, const void *data, int len)
 	int total_len = sizeof(*hdr) + len;
 	int left = sctx->send_max_size - sctx->send_size;
 
+	if (WARN_ON_ONCE(sctx->put_data))
+		return -EINVAL;
+
 	if (unlikely(left < total_len))
 		return -EOVERFLOW;
 
@@ -726,6 +734,7 @@ static int send_cmd(struct send_ctx *sctx)
 					&sctx->send_off);
 
 	sctx->send_size = 0;
+	sctx->put_data = false;
 
 	return ret;
 }
@@ -4853,14 +4862,31 @@ static inline u64 max_send_read_size(const struct send_ctx *sctx)
 
 static int put_data_header(struct send_ctx *sctx, u32 len)
 {
-	struct btrfs_tlv_header *hdr;
+	if (WARN_ON_ONCE(sctx->put_data))
+		return -EINVAL;
+	sctx->put_data = true;
+	if (sctx->proto >= 2) {
+		/*
+		 * Since v2, the data attribute header doesn't include a length;
+		 * it is implicitly to the end of the command.
+		 */
+		if (sctx->send_max_size - sctx->send_size <
+		    sizeof(__le16) + len)
+			return -EOVERFLOW;
+		put_unaligned_le16(BTRFS_SEND_A_DATA,
+				   sctx->send_buf + sctx->send_size);
+		sctx->send_size += sizeof(__le16);
+	} else {
+		struct btrfs_tlv_header *hdr;
 
-	if (sctx->send_max_size - sctx->send_size < sizeof(*hdr) + len)
-		return -EOVERFLOW;
-	hdr = (struct btrfs_tlv_header *)(sctx->send_buf + sctx->send_size);
-	put_unaligned_le16(BTRFS_SEND_A_DATA, &hdr->tlv_type);
-	put_unaligned_le16(len, &hdr->tlv_len);
-	sctx->send_size += sizeof(*hdr);
+		if (sctx->send_max_size - sctx->send_size < sizeof(*hdr) + len)
+			return -EOVERFLOW;
+		hdr = (struct btrfs_tlv_header *)(sctx->send_buf +
+						  sctx->send_size);
+		put_unaligned_le16(BTRFS_SEND_A_DATA, &hdr->tlv_type);
+		put_unaligned_le16(len, &hdr->tlv_len);
+		sctx->send_size += sizeof(*hdr);
+	}
 	return 0;
 }
 
@@ -7459,7 +7485,12 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg)
 
 	sctx->clone_roots_cnt = arg->clone_sources_count;
 
-	sctx->send_max_size = BTRFS_SEND_BUF_SIZE_V1;
+	if (sctx->proto >= 2) {
+		sctx->send_max_size = ALIGN(SZ_16K + BTRFS_MAX_COMPRESSED,
+					    PAGE_SIZE);
+	} else {
+		sctx->send_max_size = BTRFS_SEND_BUF_SIZE_V1;
+	}
 	sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
 	if (!sctx->send_buf) {
 		ret = -ENOMEM;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v15 5/7] btrfs: send: get send buffer pages for protocol v2
  2022-04-04 17:29 [PATCH v15 0/7] btrfs: add send/receive support for reading/writing compressed data Omar Sandoval
                   ` (3 preceding siblings ...)
  2022-04-04 17:29 ` [PATCH v15 4/7] btrfs: send: write larger chunks when using stream v2 Omar Sandoval
@ 2022-04-04 17:29 ` Omar Sandoval
  2022-04-04 17:29 ` [PATCH v15 6/7] btrfs: send: send compressed extents with encoded writes Omar Sandoval
  2022-04-04 17:29 ` [PATCH v15 7/7] btrfs: send: enable support for stream v2 and compressed writes Omar Sandoval
  6 siblings, 0 replies; 15+ messages in thread
From: Omar Sandoval @ 2022-04-04 17:29 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

For encoded writes in send v2, we will get the encoded data with
btrfs_encoded_read_regular_fill_pages(), which expects a list of raw
pages. To avoid extra buffers and copies, we should read directly into
the send buffer. Therefore, we need the raw pages for the send buffer.

We currently allocate the send buffer with kvmalloc(), which may return
a kmalloc'd buffer or a vmalloc'd buffer. For vmalloc, we can get the
pages with vmalloc_to_page(). For kmalloc, we could use virt_to_page().
However, the buffer size we use (144k) is not a power of two, which in
theory is not guaranteed to return a page-aligned buffer, and in
practice would waste a lot of memory due to rounding up to the next
power of two. 144k is large enough that it usually gets allocated with
vmalloc(), anyways. So, for send v2, replace kvmalloc() with vmalloc()
and save the pages in an array.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/send.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index c0ca45dae6d6..e574d4f4a167 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -87,6 +87,7 @@ struct send_ctx {
 	 * command (since protocol v2, data must be the last attribute).
 	 */
 	bool put_data;
+	struct page **send_buf_pages;
 	u64 flags;	/* 'flags' member of btrfs_ioctl_send_args is u64 */
 	/* Protocol version compatibility requested */
 	u32 proto;
@@ -7486,12 +7487,32 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg)
 	sctx->clone_roots_cnt = arg->clone_sources_count;
 
 	if (sctx->proto >= 2) {
+		u32 send_buf_num_pages;
+
 		sctx->send_max_size = ALIGN(SZ_16K + BTRFS_MAX_COMPRESSED,
 					    PAGE_SIZE);
+		sctx->send_buf = vmalloc(sctx->send_max_size);
+		if (!sctx->send_buf) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		send_buf_num_pages = sctx->send_max_size >> PAGE_SHIFT;
+		sctx->send_buf_pages = kcalloc(send_buf_num_pages,
+					       sizeof(*sctx->send_buf_pages),
+					       GFP_KERNEL);
+		if (!sctx->send_buf_pages) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		for (i = 0; i < send_buf_num_pages; i++) {
+			sctx->send_buf_pages[i] =
+				vmalloc_to_page(sctx->send_buf +
+						(i << PAGE_SHIFT));
+		}
 	} else {
 		sctx->send_max_size = BTRFS_SEND_BUF_SIZE_V1;
+		sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
 	}
-	sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
 	if (!sctx->send_buf) {
 		ret = -ENOMEM;
 		goto out;
@@ -7684,6 +7705,7 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg)
 			fput(sctx->send_filp);
 
 		kvfree(sctx->clone_roots);
+		kfree(sctx->send_buf_pages);
 		kvfree(sctx->send_buf);
 
 		name_cache_free(sctx);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v15 6/7] btrfs: send: send compressed extents with encoded writes
  2022-04-04 17:29 [PATCH v15 0/7] btrfs: add send/receive support for reading/writing compressed data Omar Sandoval
                   ` (4 preceding siblings ...)
  2022-04-04 17:29 ` [PATCH v15 5/7] btrfs: send: get send buffer pages for protocol v2 Omar Sandoval
@ 2022-04-04 17:29 ` Omar Sandoval
  2022-04-04 17:29 ` [PATCH v15 7/7] btrfs: send: enable support for stream v2 and compressed writes Omar Sandoval
  6 siblings, 0 replies; 15+ messages in thread
From: Omar Sandoval @ 2022-04-04 17:29 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

Now that all of the pieces are in place, we can use the ENCODED_WRITE
command to send compressed extents when appropriate.

Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/ctree.h |   6 ++
 fs/btrfs/inode.c |  13 +--
 fs/btrfs/send.c  | 234 +++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 228 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4af340c32986..2727b006a18a 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3356,6 +3356,12 @@ int btrfs_writepage_cow_fixup(struct page *page);
 void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode,
 					  struct page *page, u64 start,
 					  u64 end, bool uptodate);
+int btrfs_encoded_io_compression_from_extent(struct btrfs_fs_info *fs_info,
+					     int compress_type);
+int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
+					  u64 file_offset, u64 disk_bytenr,
+					  u64 disk_io_size,
+					  struct page **pages);
 ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter,
 			   struct btrfs_ioctl_encoded_io_args *encoded);
 ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4b3a5a8c581f..9b96e51b7870 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -10170,9 +10170,8 @@ void btrfs_set_range_writeback(struct btrfs_inode *inode, u64 start, u64 end)
 	}
 }
 
-static int btrfs_encoded_io_compression_from_extent(
-				struct btrfs_fs_info *fs_info,
-				int compress_type)
+int btrfs_encoded_io_compression_from_extent(struct btrfs_fs_info *fs_info,
+					     int compress_type)
 {
 	switch (compress_type) {
 	case BTRFS_COMPRESS_NONE:
@@ -10377,11 +10376,9 @@ static void btrfs_encoded_read_endio(struct bio *bio)
 	bio_put(bio);
 }
 
-static int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
-						 u64 file_offset,
-						 u64 disk_bytenr,
-						 u64 disk_io_size,
-						 struct page **pages)
+int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
+					  u64 file_offset, u64 disk_bytenr,
+					  u64 disk_io_size, struct page **pages)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct btrfs_encoded_read_private priv = {
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index e574d4f4a167..675f46d96539 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -618,6 +618,7 @@ static int tlv_put(struct send_ctx *sctx, u16 attr, const void *data, int len)
 		return tlv_put(sctx, attr, &__tmp, sizeof(__tmp));	\
 	}
 
+TLV_PUT_DEFINE_INT(32)
 TLV_PUT_DEFINE_INT(64)
 
 static int tlv_put_string(struct send_ctx *sctx, u16 attr,
@@ -5165,16 +5166,215 @@ static int send_hole(struct send_ctx *sctx, u64 end)
 	return ret;
 }
 
-static int send_extent_data(struct send_ctx *sctx,
-			    const u64 offset,
-			    const u64 len)
+static int send_encoded_inline_extent(struct send_ctx *sctx,
+				      struct btrfs_path *path, u64 offset,
+				      u64 len)
 {
+	struct btrfs_root *root = sctx->send_root;
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct inode *inode;
+	struct fs_path *p;
+	struct extent_buffer *leaf = path->nodes[0];
+	struct btrfs_key key;
+	struct btrfs_file_extent_item *ei;
+	u64 ram_bytes;
+	size_t inline_size;
+	int ret;
+
+	inode = btrfs_iget(fs_info->sb, sctx->cur_ino, root);
+	if (IS_ERR(inode))
+		return PTR_ERR(inode);
+
+	p = fs_path_alloc();
+	if (!p) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = begin_cmd(sctx, BTRFS_SEND_C_ENCODED_WRITE);
+	if (ret < 0)
+		goto out;
+
+	ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p);
+	if (ret < 0)
+		goto out;
+
+	btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
+	ei = btrfs_item_ptr(leaf, path->slots[0],
+			    struct btrfs_file_extent_item);
+	ram_bytes = btrfs_file_extent_ram_bytes(leaf, ei);
+	inline_size = btrfs_file_extent_inline_item_len(leaf, path->slots[0]);
+
+	TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p);
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset);
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_UNENCODED_FILE_LEN,
+		    min(key.offset + ram_bytes - offset, len));
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_UNENCODED_LEN, ram_bytes);
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_UNENCODED_OFFSET, offset - key.offset);
+	ret = btrfs_encoded_io_compression_from_extent(fs_info,
+				btrfs_file_extent_compression(leaf, ei));
+	if (ret < 0)
+		goto out;
+	TLV_PUT_U32(sctx, BTRFS_SEND_A_COMPRESSION, ret);
+
+	ret = put_data_header(sctx, inline_size);
+	if (ret < 0)
+		goto out;
+	read_extent_buffer(leaf, sctx->send_buf + sctx->send_size,
+			   btrfs_file_extent_inline_start(ei), inline_size);
+	sctx->send_size += inline_size;
+
+	ret = send_cmd(sctx);
+
+tlv_put_failure:
+out:
+	fs_path_free(p);
+	iput(inode);
+	return ret;
+}
+
+static int send_encoded_extent(struct send_ctx *sctx, struct btrfs_path *path,
+			       u64 offset, u64 len)
+{
+	struct btrfs_root *root = sctx->send_root;
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct inode *inode;
+	struct fs_path *p;
+	struct extent_buffer *leaf = path->nodes[0];
+	struct btrfs_key key;
+	struct btrfs_file_extent_item *ei;
+	u64 disk_bytenr, disk_num_bytes;
+	u32 data_offset;
+	struct btrfs_cmd_header *hdr;
+	u32 crc;
+	int ret;
+
+	inode = btrfs_iget(fs_info->sb, sctx->cur_ino, root);
+	if (IS_ERR(inode))
+		return PTR_ERR(inode);
+
+	p = fs_path_alloc();
+	if (!p) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = begin_cmd(sctx, BTRFS_SEND_C_ENCODED_WRITE);
+	if (ret < 0)
+		goto out;
+
+	ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p);
+	if (ret < 0)
+		goto out;
+
+	btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
+	ei = btrfs_item_ptr(leaf, path->slots[0],
+			    struct btrfs_file_extent_item);
+	disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, ei);
+	disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, ei);
+
+	TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p);
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset);
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_UNENCODED_FILE_LEN,
+		    min(key.offset + btrfs_file_extent_num_bytes(leaf, ei) - offset,
+			len));
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_UNENCODED_LEN,
+		    btrfs_file_extent_ram_bytes(leaf, ei));
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_UNENCODED_OFFSET,
+		    offset - key.offset + btrfs_file_extent_offset(leaf, ei));
+	ret = btrfs_encoded_io_compression_from_extent(fs_info,
+				btrfs_file_extent_compression(leaf, ei));
+	if (ret < 0)
+		goto out;
+	TLV_PUT_U32(sctx, BTRFS_SEND_A_COMPRESSION, ret);
+	TLV_PUT_U32(sctx, BTRFS_SEND_A_ENCRYPTION, 0);
+
+	ret = put_data_header(sctx, disk_num_bytes);
+	if (ret < 0)
+		goto out;
+
+	/*
+	 * We want to do I/O directly into the send buffer, so get the next page
+	 * boundary in the send buffer. This means that there may be a gap
+	 * between the beginning of the command and the file data.
+	 */
+	data_offset = ALIGN(sctx->send_size, PAGE_SIZE);
+	if (data_offset > sctx->send_max_size ||
+	    sctx->send_max_size - data_offset < disk_num_bytes) {
+		ret = -EOVERFLOW;
+		goto out;
+	}
+
+	/*
+	 * Note that send_buf is a mapping of send_buf_pages, so this is really
+	 * reading into send_buf.
+	 */
+	ret = btrfs_encoded_read_regular_fill_pages(BTRFS_I(inode), offset,
+						    disk_bytenr, disk_num_bytes,
+						    sctx->send_buf_pages +
+						    (data_offset >> PAGE_SHIFT));
+	if (ret)
+		goto out;
+
+	hdr = (struct btrfs_cmd_header *)sctx->send_buf;
+	hdr->len = cpu_to_le32(sctx->send_size + disk_num_bytes - sizeof(*hdr));
+	hdr->crc = 0;
+	crc = btrfs_crc32c(0, sctx->send_buf, sctx->send_size);
+	crc = btrfs_crc32c(crc, sctx->send_buf + data_offset, disk_num_bytes);
+	hdr->crc = cpu_to_le32(crc);
+
+	ret = write_buf(sctx->send_filp, sctx->send_buf, sctx->send_size,
+			&sctx->send_off);
+	if (!ret) {
+		ret = write_buf(sctx->send_filp, sctx->send_buf + data_offset,
+				disk_num_bytes, &sctx->send_off);
+	}
+	sctx->send_size = 0;
+	sctx->put_data = false;
+
+tlv_put_failure:
+out:
+	fs_path_free(p);
+	iput(inode);
+	return ret;
+}
+
+static int send_extent_data(struct send_ctx *sctx, struct btrfs_path *path,
+			    const u64 offset, const u64 len)
+{
+	struct extent_buffer *leaf = path->nodes[0];
+	struct btrfs_file_extent_item *ei;
 	u64 read_size = max_send_read_size(sctx);
 	u64 sent = 0;
 
 	if (sctx->flags & BTRFS_SEND_FLAG_NO_FILE_DATA)
 		return send_update_extent(sctx, offset, len);
 
+	ei = btrfs_item_ptr(leaf, path->slots[0],
+			    struct btrfs_file_extent_item);
+	if ((sctx->flags & BTRFS_SEND_FLAG_COMPRESSED) &&
+	    btrfs_file_extent_compression(leaf, ei) != BTRFS_COMPRESS_NONE) {
+		bool is_inline = (btrfs_file_extent_type(leaf, ei) ==
+				  BTRFS_FILE_EXTENT_INLINE);
+
+		/*
+		 * Send the compressed extent unless the compressed data is
+		 * larger than the decompressed data. This can happen if we're
+		 * not sending the entire extent, either because it has been
+		 * partially overwritten/truncated or because this is a part of
+		 * the extent that we couldn't clone in clone_range().
+		 */
+		if (is_inline &&
+		    btrfs_file_extent_inline_item_len(leaf,
+						      path->slots[0]) <= len) {
+			return send_encoded_inline_extent(sctx, path, offset,
+							  len);
+		} else if (!is_inline &&
+			   btrfs_file_extent_disk_num_bytes(leaf, ei) <= len) {
+			return send_encoded_extent(sctx, path, offset, len);
+		}
+	}
+
 	while (sent < len) {
 		u64 size = min(len - sent, read_size);
 		int ret;
@@ -5245,12 +5445,9 @@ static int send_capabilities(struct send_ctx *sctx)
 	return ret;
 }
 
-static int clone_range(struct send_ctx *sctx,
-		       struct clone_root *clone_root,
-		       const u64 disk_byte,
-		       u64 data_offset,
-		       u64 offset,
-		       u64 len)
+static int clone_range(struct send_ctx *sctx, struct btrfs_path *dst_path,
+		       struct clone_root *clone_root, const u64 disk_byte,
+		       u64 data_offset, u64 offset, u64 len)
 {
 	struct btrfs_path *path;
 	struct btrfs_key key;
@@ -5274,7 +5471,7 @@ static int clone_range(struct send_ctx *sctx,
 	 */
 	if (clone_root->offset == 0 &&
 	    len == sctx->send_root->fs_info->sectorsize)
-		return send_extent_data(sctx, offset, len);
+		return send_extent_data(sctx, dst_path, offset, len);
 
 	path = alloc_path_for_send();
 	if (!path)
@@ -5371,7 +5568,8 @@ static int clone_range(struct send_ctx *sctx,
 
 			if (hole_len > len)
 				hole_len = len;
-			ret = send_extent_data(sctx, offset, hole_len);
+			ret = send_extent_data(sctx, dst_path, offset,
+					       hole_len);
 			if (ret < 0)
 				goto out;
 
@@ -5444,14 +5642,16 @@ static int clone_range(struct send_ctx *sctx,
 					if (ret < 0)
 						goto out;
 				}
-				ret = send_extent_data(sctx, offset + slen,
+				ret = send_extent_data(sctx, dst_path,
+						       offset + slen,
 						       clone_len - slen);
 			} else {
 				ret = send_clone(sctx, offset, clone_len,
 						 clone_root);
 			}
 		} else {
-			ret = send_extent_data(sctx, offset, clone_len);
+			ret = send_extent_data(sctx, dst_path, offset,
+					       clone_len);
 		}
 
 		if (ret < 0)
@@ -5483,7 +5683,7 @@ static int clone_range(struct send_ctx *sctx,
 	}
 
 	if (len > 0)
-		ret = send_extent_data(sctx, offset, len);
+		ret = send_extent_data(sctx, dst_path, offset, len);
 	else
 		ret = 0;
 out:
@@ -5514,10 +5714,10 @@ static int send_write_or_clone(struct send_ctx *sctx,
 				    struct btrfs_file_extent_item);
 		disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
 		data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
-		ret = clone_range(sctx, clone_root, disk_byte, data_offset,
-				  offset, end - offset);
+		ret = clone_range(sctx, path, clone_root, disk_byte,
+				  data_offset, offset, end - offset);
 	} else {
-		ret = send_extent_data(sctx, offset, end - offset);
+		ret = send_extent_data(sctx, path, offset, end - offset);
 	}
 	sctx->cur_inode_next_write_offset = end;
 	return ret;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v15 7/7] btrfs: send: enable support for stream v2 and compressed writes
  2022-04-04 17:29 [PATCH v15 0/7] btrfs: add send/receive support for reading/writing compressed data Omar Sandoval
                   ` (5 preceding siblings ...)
  2022-04-04 17:29 ` [PATCH v15 6/7] btrfs: send: send compressed extents with encoded writes Omar Sandoval
@ 2022-04-04 17:29 ` Omar Sandoval
  6 siblings, 0 replies; 15+ messages in thread
From: Omar Sandoval @ 2022-04-04 17:29 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

Now that the new support is implemented, allow the ioctl to accept v2
and the compressed flag, and update the version in sysfs.

Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/send.c            | 7 +++++--
 fs/btrfs/send.h            | 2 +-
 include/uapi/linux/btrfs.h | 3 ++-
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 675f46d96539..1447b01c0bda 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -694,8 +694,7 @@ static int send_header(struct send_ctx *sctx)
 	struct btrfs_stream_header hdr;
 
 	strcpy(hdr.magic, BTRFS_SEND_STREAM_MAGIC);
-	hdr.version = cpu_to_le32(BTRFS_SEND_STREAM_VERSION);
-
+	hdr.version = cpu_to_le32(sctx->proto);
 	return write_buf(sctx->send_filp, &hdr, sizeof(hdr),
 					&sctx->send_off);
 }
@@ -7667,6 +7666,10 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg)
 	} else {
 		sctx->proto = 1;
 	}
+	if ((arg->flags & BTRFS_SEND_FLAG_COMPRESSED) && sctx->proto < 2) {
+		ret = -EINVAL;
+		goto out;
+	}
 
 	sctx->send_filp = fget(arg->send_fd);
 	if (!sctx->send_filp) {
diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
index 805d8095209a..50a2aceae929 100644
--- a/fs/btrfs/send.h
+++ b/fs/btrfs/send.h
@@ -10,7 +10,7 @@
 #include "ctree.h"
 
 #define BTRFS_SEND_STREAM_MAGIC "btrfs-stream"
-#define BTRFS_SEND_STREAM_VERSION 1
+#define BTRFS_SEND_STREAM_VERSION 2
 
 /*
  * In send stream v1, no command is larger than 64k. In send stream v2, no limit
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index b6f26a434b10..f54dc91e4025 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -788,7 +788,8 @@ struct btrfs_ioctl_received_subvol_args {
 	(BTRFS_SEND_FLAG_NO_FILE_DATA | \
 	 BTRFS_SEND_FLAG_OMIT_STREAM_HEADER | \
 	 BTRFS_SEND_FLAG_OMIT_END_CMD | \
-	 BTRFS_SEND_FLAG_VERSION)
+	 BTRFS_SEND_FLAG_VERSION | \
+	 BTRFS_SEND_FLAG_COMPRESSED)
 
 struct btrfs_ioctl_send_args {
 	__s64 send_fd;			/* in */
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v15 3/7] btrfs: add send stream v2 definitions
  2022-04-04 17:29 ` [PATCH v15 3/7] btrfs: add send stream v2 definitions Omar Sandoval
@ 2022-05-18 21:00   ` David Sterba
  2022-05-18 22:25     ` Omar Sandoval
  0 siblings, 1 reply; 15+ messages in thread
From: David Sterba @ 2022-05-18 21:00 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, kernel-team

On Mon, Apr 04, 2022 at 10:29:05AM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> This adds the definitions of the new commands for send stream version 2
> and their respective attributes: fallocate, FS_IOC_SETFLAGS (a.k.a.
> chattr), and encoded writes. It also documents two changes to the send
> stream format in v2: the receiver shouldn't assume a maximum command
> size, and the DATA attribute is encoded differently to allow for writes
> larger than 64k. These will be implemented in subsequent changes, and
> then the ioctl will accept the new version and flag.
> 
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/send.c            |  2 +-
>  fs/btrfs/send.h            | 40 ++++++++++++++++++++++++++++++++++----
>  include/uapi/linux/btrfs.h |  7 +++++++
>  3 files changed, 44 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 9363f625fa17..1f141de3a7d6 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -7459,7 +7459,7 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg)
>  
>  	sctx->clone_roots_cnt = arg->clone_sources_count;
>  
> -	sctx->send_max_size = BTRFS_SEND_BUF_SIZE;
> +	sctx->send_max_size = BTRFS_SEND_BUF_SIZE_V1;
>  	sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
>  	if (!sctx->send_buf) {
>  		ret = -ENOMEM;
> diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
> index 67721e0281ba..805d8095209a 100644
> --- a/fs/btrfs/send.h
> +++ b/fs/btrfs/send.h
> @@ -12,7 +12,11 @@
>  #define BTRFS_SEND_STREAM_MAGIC "btrfs-stream"
>  #define BTRFS_SEND_STREAM_VERSION 1
>  
> -#define BTRFS_SEND_BUF_SIZE SZ_64K
> +/*
> + * In send stream v1, no command is larger than 64k. In send stream v2, no limit
> + * should be assumed.
> + */
> +#define BTRFS_SEND_BUF_SIZE_V1 SZ_64K
>  
>  enum btrfs_tlv_type {
>  	BTRFS_TLV_U8,
> @@ -80,16 +84,20 @@ enum btrfs_send_cmd {
>  	BTRFS_SEND_C_MAX_V1 = 22,
>  
>  	/* Version 2 */
> -	BTRFS_SEND_C_MAX_V2 = 22,
> +	BTRFS_SEND_C_FALLOCATE = 23,
> +	BTRFS_SEND_C_SETFLAGS = 24,

Do you have patches that implement the fallocate modes and setflags? I
don't see it in this patchset. The setflags should be switched to
something closer to the recent refactoring that unifies all the
flags/attrs to fileattr. I have a prototype patch for that, comparing
the inode flags in the same way as file mode, the tricky part is on the
receive side how to apply them correctly. On the sending side it's
simple though.

> +	BTRFS_SEND_C_ENCODED_WRITE = 25,
> +	BTRFS_SEND_C_MAX_V2 = 25,
>  
>  	/* End */
> -	BTRFS_SEND_C_MAX = 22,
> +	BTRFS_SEND_C_MAX = 25,
>  };

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v15 2/7] btrfs: send: explicitly number commands and attributes
  2022-04-04 17:29 ` [PATCH v15 2/7] btrfs: send: explicitly number commands and attributes Omar Sandoval
@ 2022-05-18 22:24   ` David Sterba
  0 siblings, 0 replies; 15+ messages in thread
From: David Sterba @ 2022-05-18 22:24 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, kernel-team

On Mon, Apr 04, 2022 at 10:29:04AM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Commit e77fbf990316 ("btrfs: send: prepare for v2 protocol") added
> _BTRFS_SEND_C_MAX_V* macros equal to the maximum command number for the
> version plus 1, but as written this creates gaps in the number space.
> The maximum command number is currently 22, and __BTRFS_SEND_C_MAX_V1 is
> accordingly 23. But then __BTRFS_SEND_C_MAX_V2 is 24, suggesting that v2
> has a command numbered 23, and __BTRFS_SEND_C_MAX is 25, suggesting that
> 23 and 24 are valid commands.
> 
> Instead, let's explicitly number all of the commands, attributes, and
> sentinel MAX constants.
> 
> Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/send.c |   4 +-
>  fs/btrfs/send.h | 106 ++++++++++++++++++++++++------------------------
>  2 files changed, 54 insertions(+), 56 deletions(-)
> 
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 6d36dee1505f..9363f625fa17 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -326,8 +326,8 @@ __maybe_unused
>  static bool proto_cmd_ok(const struct send_ctx *sctx, int cmd)
>  {
>  	switch (sctx->proto) {
> -	case 1:	 return cmd < __BTRFS_SEND_C_MAX_V1;
> -	case 2:	 return cmd < __BTRFS_SEND_C_MAX_V2;
> +	case 1:	 return cmd <= BTRFS_SEND_C_MAX_V1;
> +	case 2:	 return cmd <= BTRFS_SEND_C_MAX_V2;
>  	default: return false;
>  	}
>  }
> diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
> index 08602fdd600a..67721e0281ba 100644
> --- a/fs/btrfs/send.h
> +++ b/fs/btrfs/send.h
> @@ -46,84 +46,82 @@ struct btrfs_tlv_header {
>  
>  /* commands */
>  enum btrfs_send_cmd {
> -	BTRFS_SEND_C_UNSPEC,
> +	BTRFS_SEND_C_UNSPEC = 0,
>  
>  	/* Version 1 */
> -	BTRFS_SEND_C_SUBVOL,
> -	BTRFS_SEND_C_SNAPSHOT,
> +	BTRFS_SEND_C_SUBVOL = 1,
> +	BTRFS_SEND_C_SNAPSHOT = 2,
>  
> -	BTRFS_SEND_C_MKFILE,
> -	BTRFS_SEND_C_MKDIR,
> -	BTRFS_SEND_C_MKNOD,
> -	BTRFS_SEND_C_MKFIFO,
> -	BTRFS_SEND_C_MKSOCK,
> -	BTRFS_SEND_C_SYMLINK,
> +	BTRFS_SEND_C_MKFILE = 3,
> +	BTRFS_SEND_C_MKDIR = 4,
> +	BTRFS_SEND_C_MKNOD = 5,
> +	BTRFS_SEND_C_MKFIFO = 6,
> +	BTRFS_SEND_C_MKSOCK = 7,
> +	BTRFS_SEND_C_SYMLINK = 8,

Sweat Tea suggested to align the "= number" in the previous iteration, I
agree with that, it's much more readable. As this is just cosmetic
change it could wait until we have all the other changes done.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v15 3/7] btrfs: add send stream v2 definitions
  2022-05-18 21:00   ` David Sterba
@ 2022-05-18 22:25     ` Omar Sandoval
  2022-05-19 16:07       ` David Sterba
  0 siblings, 1 reply; 15+ messages in thread
From: Omar Sandoval @ 2022-05-18 22:25 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, kernel-team

On Wed, May 18, 2022 at 11:00:03PM +0200, David Sterba wrote:
> On Mon, Apr 04, 2022 at 10:29:05AM -0700, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > This adds the definitions of the new commands for send stream version 2
> > and their respective attributes: fallocate, FS_IOC_SETFLAGS (a.k.a.
> > chattr), and encoded writes. It also documents two changes to the send
> > stream format in v2: the receiver shouldn't assume a maximum command
> > size, and the DATA attribute is encoded differently to allow for writes
> > larger than 64k. These will be implemented in subsequent changes, and
> > then the ioctl will accept the new version and flag.
> > 
> > Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> > Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
> > Signed-off-by: Omar Sandoval <osandov@fb.com>
> > ---
> >  fs/btrfs/send.c            |  2 +-
> >  fs/btrfs/send.h            | 40 ++++++++++++++++++++++++++++++++++----
> >  include/uapi/linux/btrfs.h |  7 +++++++
> >  3 files changed, 44 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> > index 9363f625fa17..1f141de3a7d6 100644
> > --- a/fs/btrfs/send.c
> > +++ b/fs/btrfs/send.c
> > @@ -7459,7 +7459,7 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg)
> >  
> >  	sctx->clone_roots_cnt = arg->clone_sources_count;
> >  
> > -	sctx->send_max_size = BTRFS_SEND_BUF_SIZE;
> > +	sctx->send_max_size = BTRFS_SEND_BUF_SIZE_V1;
> >  	sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
> >  	if (!sctx->send_buf) {
> >  		ret = -ENOMEM;
> > diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
> > index 67721e0281ba..805d8095209a 100644
> > --- a/fs/btrfs/send.h
> > +++ b/fs/btrfs/send.h
> > @@ -12,7 +12,11 @@
> >  #define BTRFS_SEND_STREAM_MAGIC "btrfs-stream"
> >  #define BTRFS_SEND_STREAM_VERSION 1
> >  
> > -#define BTRFS_SEND_BUF_SIZE SZ_64K
> > +/*
> > + * In send stream v1, no command is larger than 64k. In send stream v2, no limit
> > + * should be assumed.
> > + */
> > +#define BTRFS_SEND_BUF_SIZE_V1 SZ_64K
> >  
> >  enum btrfs_tlv_type {
> >  	BTRFS_TLV_U8,
> > @@ -80,16 +84,20 @@ enum btrfs_send_cmd {
> >  	BTRFS_SEND_C_MAX_V1 = 22,
> >  
> >  	/* Version 2 */
> > -	BTRFS_SEND_C_MAX_V2 = 22,
> > +	BTRFS_SEND_C_FALLOCATE = 23,
> > +	BTRFS_SEND_C_SETFLAGS = 24,
> 
> Do you have patches that implement the fallocate modes and setflags? I
> don't see it in this patchset.

Nope, as discussed before, in order to keep the patch series managable,
this series adds the definitions and receive support for fallocate and
setflags, but leaves the send side to be implemented at a later time.

I implemented fallocate for send back in 2019:
https://github.com/osandov/linux/commits/btrfs-send-v2. It passed some
basic testing back then, but it'd need a big rebase and more testing.

> The setflags should be switched to
> something closer to the recent refactoring that unifies all the
> flags/attrs to fileattr. I have a prototype patch for that, comparing
> the inode flags in the same way as file mode, the tricky part is on the
> receive side how to apply them correctly. On the sending side it's
> simple though.

The way this series documents (and implements in receive)
BTRFS_SEND_C_SETFLAGS is that it's a simple call to FS_IOC_SETFLAGS with
given flags. I don't think this is affected by the change to fileattr,
unless I'm misunderstanding.

This is in line with the other commands being straightforward system
calls, but it does mean that the sending side has to deal with the
complexities of an immutable or append-only file being modified between
incremental sends (by temporarily clearing the flag), and of inherited
flags (e.g., a COW file inside of a NOCOW directory). I suppose it'd
also be possible to have SETFLAGS define the final flags and leave it up
to receive to make that happen by temporarily setting/clearing flags as
necessary, but that is a bit inconsistent with how we've handled other
commands.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v15 3/7] btrfs: add send stream v2 definitions
  2022-05-18 22:25     ` Omar Sandoval
@ 2022-05-19 16:07       ` David Sterba
  2022-05-19 22:31         ` Omar Sandoval
  0 siblings, 1 reply; 15+ messages in thread
From: David Sterba @ 2022-05-19 16:07 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, kernel-team

On Wed, May 18, 2022 at 03:25:34PM -0700, Omar Sandoval wrote:
> On Wed, May 18, 2022 at 11:00:03PM +0200, David Sterba wrote:
> > On Mon, Apr 04, 2022 at 10:29:05AM -0700, Omar Sandoval wrote:
> > > @@ -80,16 +84,20 @@ enum btrfs_send_cmd {
> > >  	BTRFS_SEND_C_MAX_V1 = 22,
> > >  
> > >  	/* Version 2 */
> > > -	BTRFS_SEND_C_MAX_V2 = 22,
> > > +	BTRFS_SEND_C_FALLOCATE = 23,
> > > +	BTRFS_SEND_C_SETFLAGS = 24,
> > 
> > Do you have patches that implement the fallocate modes and setflags? I
> > don't see it in this patchset.
> 
> Nope, as discussed before, in order to keep the patch series managable,
> this series adds the definitions and receive support for fallocate and
> setflags, but leaves the send side to be implemented at a later time.
> 
> I implemented fallocate for send back in 2019:
> https://github.com/osandov/linux/commits/btrfs-send-v2. It passed some
> basic testing back then, but it'd need a big rebase and more testing.

The patches in the branch are partially cleanups and preparatory work,
so at least avoiding sending the holes would be nice to have for v2 as
it was one of the first bugs reported. The falllocate modes seem to be
easy. The rest is about the versioning infrastructure that we already
have merged.

> > The setflags should be switched to
> > something closer to the recent refactoring that unifies all the
> > flags/attrs to fileattr. I have a prototype patch for that, comparing
> > the inode flags in the same way as file mode, the tricky part is on the
> > receive side how to apply them correctly. On the sending side it's
> > simple though.
> 
> The way this series documents (and implements in receive)
> BTRFS_SEND_C_SETFLAGS is that it's a simple call to FS_IOC_SETFLAGS with
> given flags. I don't think this is affected by the change to fileattr,
> unless I'm misunderstanding.

The SETFLAGS ioctls are obsolete and I don't want to make them part of
the protocol defition because the bit namespace contains flags we don't
have implemented or are not releated to anything in btrfs.

https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/fs.h#L220

It's basically just naming and specifying what exactly is the value so
we should pick the most recent interface name that superseded SETFLAGS
and the XFLAGS.

> This is in line with the other commands being straightforward system
> calls, but it does mean that the sending side has to deal with the
> complexities of an immutable or append-only file being modified between
> incremental sends (by temporarily clearing the flag), and of inherited
> flags (e.g., a COW file inside of a NOCOW directory).

Yeah the receiving side needs to understand the constraints of the
flags, it has only the information about the final state and not the
order in which the flags get applied.

> I suppose it'd
> also be possible to have SETFLAGS define the final flags and leave it up
> to receive to make that happen by temporarily setting/clearing flags as
> necessary, but that is a bit inconsistent with how we've handled other
> commands.

I'm not sure we can always stick to 1:1 mapping to syscalls or ioctls,
of course it's the best option, but the protocol can transfer eg.
more complete information and it's up to the receiving side to apply it
(like if a file has NODATASUM flag set).

From the other side there are multiple actions for something that could
be just one, like creating file first as an orphan and then renaming it.
So I'd like to look at it from the protocol perspective and not
necessarily blindly copy the OS interfaces.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v15 3/7] btrfs: add send stream v2 definitions
  2022-05-19 16:07       ` David Sterba
@ 2022-05-19 22:31         ` Omar Sandoval
  2022-05-20 19:34           ` David Sterba
  0 siblings, 1 reply; 15+ messages in thread
From: Omar Sandoval @ 2022-05-19 22:31 UTC (permalink / raw)
  To: dsterba, linux-btrfs, kernel-team

On Thu, May 19, 2022 at 06:07:49PM +0200, David Sterba wrote:
> On Wed, May 18, 2022 at 03:25:34PM -0700, Omar Sandoval wrote:
> > On Wed, May 18, 2022 at 11:00:03PM +0200, David Sterba wrote:
> > > On Mon, Apr 04, 2022 at 10:29:05AM -0700, Omar Sandoval wrote:
> > > > @@ -80,16 +84,20 @@ enum btrfs_send_cmd {
> > > >  	BTRFS_SEND_C_MAX_V1 = 22,
> > > >  
> > > >  	/* Version 2 */
> > > > -	BTRFS_SEND_C_MAX_V2 = 22,
> > > > +	BTRFS_SEND_C_FALLOCATE = 23,
> > > > +	BTRFS_SEND_C_SETFLAGS = 24,
> > > 
> > > Do you have patches that implement the fallocate modes and setflags? I
> > > don't see it in this patchset.
> > 
> > Nope, as discussed before, in order to keep the patch series managable,
> > this series adds the definitions and receive support for fallocate and
> > setflags, but leaves the send side to be implemented at a later time.
> > 
> > I implemented fallocate for send back in 2019:
> > https://github.com/osandov/linux/commits/btrfs-send-v2. It passed some
> > basic testing back then, but it'd need a big rebase and more testing.
> 
> The patches in the branch are partially cleanups and preparatory work,
> so at least avoiding sending the holes would be nice to have for v2 as
> it was one of the first bugs reported. The falllocate modes seem to be
> easy. The rest is about the versioning infrastructure that we already
> have merged.

I rebased the patches on this series:
https://github.com/osandov/linux/commits/btrfs-send-v2-redux. It passes
some basic testing, but it'll definitely need a lot of fstests.

> > > The setflags should be switched to
> > > something closer to the recent refactoring that unifies all the
> > > flags/attrs to fileattr. I have a prototype patch for that, comparing
> > > the inode flags in the same way as file mode, the tricky part is on the
> > > receive side how to apply them correctly. On the sending side it's
> > > simple though.
> > 
> > The way this series documents (and implements in receive)
> > BTRFS_SEND_C_SETFLAGS is that it's a simple call to FS_IOC_SETFLAGS with
> > given flags. I don't think this is affected by the change to fileattr,
> > unless I'm misunderstanding.
> 
> The SETFLAGS ioctls are obsolete and I don't want to make them part of
> the protocol defition because the bit namespace contains flags we don't
> have implemented or are not releated to anything in btrfs.
> 
> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/fs.h#L220
> 
> It's basically just naming and specifying what exactly is the value so
> we should pick the most recent interface name that superseded SETFLAGS
> and the XFLAGS.

This is the situation with FS_IOC_SETFLAGS, FS_IOC_FSSETXATTR, and
fileattr as I understand it. Please correct me if I'm wrong:

- FS_IOC_SETFLAGS originally came from ext4 and was added to Btrfs very
  early on (commit 6cbff00f4632 ("Btrfs: implement
  FS_IOC_GETFLAGS/SETFLAGS/GETVERSION")).
- FS_IOC_FSSETXATTR originally came from XFS and was added to Btrfs a
  few years ago (in commit 025f2121488e ("btrfs: add FS_IOC_FSSETXATTR
  ioctl")).
- The two ioctls allow setting some of the same flags (e.g., IMMUTABLE,
  APPEND), but some are only supported by SETFLAGS (e.g., NOCOW) and
  some are only supported by FSSETXATTR (none of these are supported by
  Btrfs, however).
- fileattr is a recent VFS interface that is used to implement those two
  ioctls. It basically passes through the arguments for whichever ioctl
  was called and translates the equivalent flags between the two ioctls.
  It is not a new UAPI and doesn't have its own set of flags.

Is there another new UAPI that I'm missing that obsoletes SETFLAGS?

I see your point about the irrelevant flags in SETFLAGS, however. Is
your suggestion to have our own send protocol-specific set of flags that
we translate to whatever ioctl we need to make?

> > This is in line with the other commands being straightforward system
> > calls, but it does mean that the sending side has to deal with the
> > complexities of an immutable or append-only file being modified between
> > incremental sends (by temporarily clearing the flag), and of inherited
> > flags (e.g., a COW file inside of a NOCOW directory).
> 
> Yeah the receiving side needs to understand the constraints of the
> flags, it has only the information about the final state and not the
> order in which the flags get applied.

If the sender only tells the receiver what the final flags are, then
yes, the receiver would need to deal with, e.g., temporarily clearing
and resetting flags. The way I envisioned it was that the sender would
instead send commands for those intermediate flag operations. E.g., if
the incremental send requires writing some data to a file that is
immutable in both the source and the parent subvolume, the sender could
send commands to: clear the immutable flag, write the data, set the
immutable flag. This is a lot like the orphan renaming that you
mentioned.

If we want to have receive handle the intermediate states instead, then
I would like to postpone SETFLAGS (or whatever we call it) to send
protocol v3, since it'll be very tricky to get right and we can't add it
to the protocol without having an implementation in the receiver.

On the other hand, if send takes care of the intermediate states and
receive just has to blindly apply the flags, then we can add SETFLAGS to
the protocol and receive now and implement it in send later. That is
exactly what this patch series does.

I'm fine with either of those paths forward, but I don't want to block
the compressed send/receive on SETFLAGS or fallocate.

Thanks,
Omar

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v15 3/7] btrfs: add send stream v2 definitions
  2022-05-19 22:31         ` Omar Sandoval
@ 2022-05-20 19:34           ` David Sterba
  2022-05-20 20:58             ` g.btrfs
  0 siblings, 1 reply; 15+ messages in thread
From: David Sterba @ 2022-05-20 19:34 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: dsterba, linux-btrfs, kernel-team

On Thu, May 19, 2022 at 03:31:56PM -0700, Omar Sandoval wrote:
> On Thu, May 19, 2022 at 06:07:49PM +0200, David Sterba wrote:
> > The SETFLAGS ioctls are obsolete and I don't want to make them part of
> > the protocol defition because the bit namespace contains flags we don't
> > have implemented or are not releated to anything in btrfs.
> > 
> > https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/fs.h#L220
> > 
> > It's basically just naming and specifying what exactly is the value so
> > we should pick the most recent interface name that superseded SETFLAGS
> > and the XFLAGS.
> 
> This is the situation with FS_IOC_SETFLAGS, FS_IOC_FSSETXATTR, and
> fileattr as I understand it. Please correct me if I'm wrong:
> 
> - FS_IOC_SETFLAGS originally came from ext4 and was added to Btrfs very
>   early on (commit 6cbff00f4632 ("Btrfs: implement
>   FS_IOC_GETFLAGS/SETFLAGS/GETVERSION")).
> - FS_IOC_FSSETXATTR originally came from XFS and was added to Btrfs a
>   few years ago (in commit 025f2121488e ("btrfs: add FS_IOC_FSSETXATTR
>   ioctl")).
> - The two ioctls allow setting some of the same flags (e.g., IMMUTABLE,
>   APPEND), but some are only supported by SETFLAGS (e.g., NOCOW) and
>   some are only supported by FSSETXATTR (none of these are supported by
>   Btrfs, however).
> - fileattr is a recent VFS interface that is used to implement those two
>   ioctls. It basically passes through the arguments for whichever ioctl
>   was called and translates the equivalent flags between the two ioctls.
>   It is not a new UAPI and doesn't have its own set of flags.
> 
> Is there another new UAPI that I'm missing that obsoletes SETFLAGS?

That was supposed to be FSSETXATTR, new flags have appeared there, the
reason for btrfs was to allow the FS_XFLAG_DAX bit as people are were
working on the DAX support, and potentially other bits like
FS_XFLAG_NOSYMLINKS or FS_XFLAG_NODEFRAG. Or new flags that we want to
be able to set, NODATASUM for example.

> I see your point about the irrelevant flags in SETFLAGS, however. Is
> your suggestion to have our own send protocol-specific set of flags that
> we translate to whatever ioctl we need to make?

Yes, that's the idea, the flags are not protocol-specific but rather
btrfs-specific, ie we want to support namely the bits that btrfs inodes
can have.

> > > This is in line with the other commands being straightforward system
> > > calls, but it does mean that the sending side has to deal with the
> > > complexities of an immutable or append-only file being modified between
> > > incremental sends (by temporarily clearing the flag), and of inherited
> > > flags (e.g., a COW file inside of a NOCOW directory).
> > 
> > Yeah the receiving side needs to understand the constraints of the
> > flags, it has only the information about the final state and not the
> > order in which the flags get applied.
> 
> If the sender only tells the receiver what the final flags are, then
> yes, the receiver would need to deal with, e.g., temporarily clearing
> and resetting flags. The way I envisioned it was that the sender would
> instead send commands for those intermediate flag operations. E.g., if
> the incremental send requires writing some data to a file that is
> immutable in both the source and the parent subvolume, the sender could
> send commands to: clear the immutable flag, write the data, set the
> immutable flag. This is a lot like the orphan renaming that you
> mentioned.

I see, so the question is where do we want to put the logic. I'd go with
userspace as lots of things are easier there, eg. maitaining some
intermediate state or delayed application of bits/flags.

> If we want to have receive handle the intermediate states instead, then
> I would like to postpone SETFLAGS (or whatever we call it) to send
> protocol v3, since it'll be very tricky to get right and we can't add it
> to the protocol without having an implementation in the receiver.

Yeah it would be tricky to generate the sequence right, while if it's on
the receiving side we can simply ignore/report it or implement a subset
where we know how to apply (eg. immutable) and don't need to postpone
it.

> On the other hand, if send takes care of the intermediate states and
> receive just has to blindly apply the flags, then we can add SETFLAGS to
> the protocol and receive now and implement it in send later. That is
> exactly what this patch series does.

It adds a command to the protocol but does not outline the plan how to
use it, not counting this discussion.

> I'm fine with either of those paths forward, but I don't want to block
> the compressed send/receive on SETFLAGS or fallocate.

I get that you care only about the encoded write, but I don't want to
rev protocol every few releases because we did not bother to implement
something we know is missing in the protocol. Anyway, encoded write will
be in v2 scheduled for 5.20 and I'll implement the rest plus will have a
look at your fallocate patches.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v15 3/7] btrfs: add send stream v2 definitions
  2022-05-20 19:34           ` David Sterba
@ 2022-05-20 20:58             ` g.btrfs
  0 siblings, 0 replies; 15+ messages in thread
From: g.btrfs @ 2022-05-20 20:58 UTC (permalink / raw)
  To: dsterba, Omar Sandoval, linux-btrfs, kernel-team

On 20/05/2022 20:34, David Sterba wrote:
> On Thu, May 19, 2022 at 03:31:56PM -0700, Omar Sandoval wrote:
>> On Thu, May 19, 2022 at 06:07:49PM +0200, David Sterba wrote:
>>> The SETFLAGS ioctls are obsolete and I don't want to make them part of
>>> the protocol defition because the bit namespace contains flags we don't
>>> have implemented or are not releated to anything in btrfs.
>>>
>>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/fs.h#L220
>>>
>>> It's basically just naming and specifying what exactly is the value so
>>> we should pick the most recent interface name that superseded SETFLAGS
>>> and the XFLAGS.
>>
>> This is the situation with FS_IOC_SETFLAGS, FS_IOC_FSSETXATTR, and
>> fileattr as I understand it. Please correct me if I'm wrong:
>>
>> - FS_IOC_SETFLAGS originally came from ext4 and was added to Btrfs very
>>   early on (commit 6cbff00f4632 ("Btrfs: implement
>>   FS_IOC_GETFLAGS/SETFLAGS/GETVERSION")).
>> - FS_IOC_FSSETXATTR originally came from XFS and was added to Btrfs a
>>   few years ago (in commit 025f2121488e ("btrfs: add FS_IOC_FSSETXATTR
>>   ioctl")).
>> - The two ioctls allow setting some of the same flags (e.g., IMMUTABLE,
>>   APPEND), but some are only supported by SETFLAGS (e.g., NOCOW) and
>>   some are only supported by FSSETXATTR (none of these are supported by
>>   Btrfs, however).
>> - fileattr is a recent VFS interface that is used to implement those two
>>   ioctls. It basically passes through the arguments for whichever ioctl
>>   was called and translates the equivalent flags between the two ioctls.
>>   It is not a new UAPI and doesn't have its own set of flags.
>>
>> Is there another new UAPI that I'm missing that obsoletes SETFLAGS?
> 
> That was supposed to be FSSETXATTR, new flags have appeared there, the
> reason for btrfs was to allow the FS_XFLAG_DAX bit as people are were
> working on the DAX support, and potentially other bits like
> FS_XFLAG_NOSYMLINKS or FS_XFLAG_NODEFRAG. Or new flags that we want to
> be able to set, NODATASUM for example.
> 
>> I see your point about the irrelevant flags in SETFLAGS, however. Is
>> your suggestion to have our own send protocol-specific set of flags that
>> we translate to whatever ioctl we need to make?
> 
> Yes, that's the idea, the flags are not protocol-specific but rather
> btrfs-specific, ie we want to support namely the bits that btrfs inodes
> can have.
> 

>>>> This is in line with the other commands being straightforward system
>>>> calls, but it does mean that the sending side has to deal with the
>>>> complexities of an immutable or append-only file being modified between
>>>> incremental sends (by temporarily clearing the flag), and of inherited
>>>> flags (e.g., a COW file inside of a NOCOW directory).
>>>
>>> Yeah the receiving side needs to understand the constraints of the
>>> flags, it has only the information about the final state and not the
>>> order in which the flags get applied.
>>
>> If the sender only tells the receiver what the final flags are, then
>> yes, the receiver would need to deal with, e.g., temporarily clearing
>> and resetting flags. The way I envisioned it was that the sender would
>> instead send commands for those intermediate flag operations. E.g., if
>> the incremental send requires writing some data to a file that is
>> immutable in both the source and the parent subvolume, the sender could
>> send commands to: clear the immutable flag, write the data, set the
>> immutable flag. This is a lot like the orphan renaming that you
>> mentioned.
> 
> I see, so the question is where do we want to put the logic. I'd go with
> userspace as lots of things are easier there, eg. maitaining some
> intermediate state or delayed application of bits/flags.
>
We should remember that what you are designing here is a protocol for
transmission of a snapshot. The protocol features are numbers that have
to remain unchanged across all future software updates and all
implementations of this version of the protocol. The values may or may
not co-incidentally match some constant we know today in the Linux 5.x
ABI but it may have no relation to *anything* on the receiving side.
Don't forget that some people are archiving send streams as a form of
backup with the intention of playing them back in 10 or 20 years time on
a btrfs implementation that might bear little resemblance to anything we
would recognise today (not a good idea, but people are doing it).

There is no way the sender has any idea what those future
implementations might have to do to replicate the source data - that is
up to those implementations. For example, it is very easy to imagine
that some future OS might disallow the concept of "clearing the
immutable flag". In that case, what the receiver needs to know is that
this new data represents a new immutable file, based on the contents of
previous file with some specified differences - maybe it will handle
this by turning off the immutable flag, or deleting the old file and
writing a new one, or asking the system manager to authorize it, or
using some versioning feature built into a future version of btrfs, or ...

The right question isn't to ask "what would a Linux BTRFS receiver
running the same software rev as the sender need to do", it needs to be
"what information do I need to supply that gives a future receiver, on a
completely different system, with a different I/O architecture and a
different kernel, in 20 years, the best chance to implement it".

Graham

P.S. I'm an old network guy, not a file system guy. Send/receive is a
network protocol, with the added problems that (i) there is no
negotiation or feedback channel, and (ii) the data is probably mission
critical to some people and they expect it to be usable in 10's of years
time.

>> If we want to have receive handle the intermediate states instead, then
>> I would like to postpone SETFLAGS (or whatever we call it) to send
>> protocol v3, since it'll be very tricky to get right and we can't add it
>> to the protocol without having an implementation in the receiver.
> 
> Yeah it would be tricky to generate the sequence right, while if it's on
> the receiving side we can simply ignore/report it or implement a subset
> where we know how to apply (eg. immutable) and don't need to postpone
> it.
> 
>> On the other hand, if send takes care of the intermediate states and
>> receive just has to blindly apply the flags, then we can add SETFLAGS to
>> the protocol and receive now and implement it in send later. That is
>> exactly what this patch series does.
> 
> It adds a command to the protocol but does not outline the plan how to
> use it, not counting this discussion.
> 
>> I'm fine with either of those paths forward, but I don't want to block
>> the compressed send/receive on SETFLAGS or fallocate.
> 
> I get that you care only about the encoded write, but I don't want to
> rev protocol every few releases because we did not bother to implement
> something we know is missing in the protocol. Anyway, encoded write will
> be in v2 scheduled for 5.20 and I'll implement the rest plus will have a
> look at your fallocate patches.


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-05-20 20:59 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-04 17:29 [PATCH v15 0/7] btrfs: add send/receive support for reading/writing compressed data Omar Sandoval
2022-04-04 17:29 ` [PATCH v15 1/7] btrfs: send: remove unused send_ctx::{total,cmd}_send_size Omar Sandoval
2022-04-04 17:29 ` [PATCH v15 2/7] btrfs: send: explicitly number commands and attributes Omar Sandoval
2022-05-18 22:24   ` David Sterba
2022-04-04 17:29 ` [PATCH v15 3/7] btrfs: add send stream v2 definitions Omar Sandoval
2022-05-18 21:00   ` David Sterba
2022-05-18 22:25     ` Omar Sandoval
2022-05-19 16:07       ` David Sterba
2022-05-19 22:31         ` Omar Sandoval
2022-05-20 19:34           ` David Sterba
2022-05-20 20:58             ` g.btrfs
2022-04-04 17:29 ` [PATCH v15 4/7] btrfs: send: write larger chunks when using stream v2 Omar Sandoval
2022-04-04 17:29 ` [PATCH v15 5/7] btrfs: send: get send buffer pages for protocol v2 Omar Sandoval
2022-04-04 17:29 ` [PATCH v15 6/7] btrfs: send: send compressed extents with encoded writes Omar Sandoval
2022-04-04 17:29 ` [PATCH v15 7/7] btrfs: send: enable support for stream v2 and compressed writes Omar Sandoval

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.