All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] btrfs-progs: receive: introduce new --clone-fallback option
@ 2021-09-30 11:48 Qu Wenruo
  2021-09-30 11:48 ` [PATCH v2 1/2] btrfs-progs: receive: fallback to buffered copy if clone failed Qu Wenruo
  2021-09-30 11:48 ` [PATCH v2 2/2] btrfs-progs: misc-tests: add test case for receive --clone-fallback Qu Wenruo
  0 siblings, 2 replies; 8+ messages in thread
From: Qu Wenruo @ 2021-09-30 11:48 UTC (permalink / raw)
  To: linux-btrfs

When parent stream and incremental stream are received with different
nodatasum mount options, any clone opeartion in the incremental stream
will be rejected by kernel.

There are more situations to cause clone failure, like receiving a stream
on a fs with different sectorsize.

Thus this patchset will introduce a new option, --clone-fallback, for
btrfs receive to fall back to buffered write when clone failed.

This fall back behavior will only happen if the new option is explicitly
specified, as such behavior can hide some send bugs, and under most sane
cases users don't need such option.

Also add a test case for the new option.

Changelog:
RFC->v1:
- Introduce a new option for the fallback behavior
  To avoid hide send bugs.

- Hide the warning message behind -v option
  Since we have a special option for it thus users are aware of what
  they are doing, there is no need to output such warning by default.

- Add a new test case for it

v2:
- Add the missing help string for --clone-fallback option

- Rephrase the words in comments and commit messages

- Add run_check_remount_test_dev() helper

Qu Wenruo (2):
  btrfs-progs: receive: fallback to buffered copy if clone failed
  btrfs-progs: misc-tests: add test case for receive --clone-fallback

 Documentation/btrfs-receive.asciidoc          | 12 ++++
 cmds/receive.c                                | 62 ++++++++++++++++++-
 tests/common                                  |  9 +++
 .../049-receive-clone-fallback/test.sh        | 58 +++++++++++++++++
 4 files changed, 138 insertions(+), 3 deletions(-)
 create mode 100755 tests/misc-tests/049-receive-clone-fallback/test.sh

-- 
2.33.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/2] btrfs-progs: receive: fallback to buffered copy if clone failed
  2021-09-30 11:48 [PATCH v2 0/2] btrfs-progs: receive: introduce new --clone-fallback option Qu Wenruo
@ 2021-09-30 11:48 ` Qu Wenruo
  2021-09-30 12:47   ` Graham Cobb
  2021-09-30 12:51   ` Filipe Manana
  2021-09-30 11:48 ` [PATCH v2 2/2] btrfs-progs: misc-tests: add test case for receive --clone-fallback Qu Wenruo
  1 sibling, 2 replies; 8+ messages in thread
From: Qu Wenruo @ 2021-09-30 11:48 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
There are two very basic send streams:
(a/m/ctime and uuid omitted)

  Stream 1: (Parent subvolume)
  subvol   ./parent_subv           transid=8
  chown    ./parent_subv/          gid=0 uid=0
  chmod    ./parent_subv/          mode=755
  utimes   ./parent_subv/
  mkfile   ./parent_subv/o257-7-0
  rename   ./parent_subv/o257-7-0  dest=./parent_subv/source
  utimes   ./parent_subv/
  write    ./parent_subv/source    offset=0 len=16384
  chown    ./parent_subv/source    gid=0 uid=0
  chmod    ./parent_subv/source    mode=600
  utimes   ./parent_subv/source

  Stream 2: (snapshot and clone)
  snapshot ./dest_subv             transid=14 parent_transid=10
  utimes   ./dest_subv/
  mkfile   ./dest_subv/o258-14-0
  rename   ./dest_subv/o258-14-0   dest=./dest_subv/reflink
  utimes   ./dest_subv/
  clone    ./dest_subv/reflink     offset=0 len=16384 from=./dest_subv/source clone_offset=0
  chown    ./dest_subv/reflink     gid=0 uid=0
  chmod    ./dest_subv/reflink     mode=600
  utimes   ./dest_subv/reflink

But if we receive the first stream with default mount, then remount to
nodatasum, and try to receive the second stream, it will fail:

 # mount /mnt/btrfs
 # btrfs receive -f ~/parent_stream /mnt/btrfs/
 At subvol parent_subv
 # mount -o remount,nodatasum /mnt/btrfs
 # btrfs receive -f ~/clone_stream /mnt/btrfs/
 At snapshot dest_subv
 ERROR: failed to clone extents to reflink: Invalid argument
 # echo $?
 1

[CAUSE]
Btrfs doesn't allow clone source and destination files have different
NODATASUM flags.
This is to prevent a data extent to be owned by both NODATASUM inode and
regular DATASUM inode.

For above receive operations, the clone destination is inheriting the
NODATASUM flag from mount option, while the clone source has no
NODATASUM flag, thus preventing us from doing the clone.

[FIX]
Btrfs send/receive doesn't require the underlying inode has the same
flags (thus we can send from compressed extent and receive on a
non-compressed filesystem).

So here we add a new command line option, '--clone-fallback', to allow
btrfs-receive to fall back to buffered write to copy data from the
source file.

Since such behavior can result much less clone operations, which may not
be what the end users really want, and can hide bugs in send stream.
Thus this behavior must be explicitly specified by the new option.

And we will output a warning message each time such fallback is
triggered if the user wants extra debug output.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 Documentation/btrfs-receive.asciidoc | 12 ++++++
 cmds/receive.c                       | 62 ++++++++++++++++++++++++++--
 2 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/Documentation/btrfs-receive.asciidoc b/Documentation/btrfs-receive.asciidoc
index e4c4d2c0bf3d..9c934a399a9c 100644
--- a/Documentation/btrfs-receive.asciidoc
+++ b/Documentation/btrfs-receive.asciidoc
@@ -65,6 +65,18 @@ dump the stream metadata, one line per operation
 +
 Does not require the 'path' parameter. The filesystem remains unchanged.
 
+--clone-fallback::
+when clone opeartions fail, attempt to directly copy the data instead.
++
+When the source and destination filesystems have different sector sizes, or
+when source and destination files have differnt 'nodatacow' and/or 'nodatasum'
+flags (can be set per-file or through mount options), clone operations can fail.
++
+This option makes the receive process attempt to manually copy data from the
+source to the destination file when a clone operation fails (caused by above
+reasons). When this happens, extents will end up not being shared
+between the files, thus will take up more space.
+
 -q|--quiet::
 (deprecated) alias for global '-q' option
 
diff --git a/cmds/receive.c b/cmds/receive.c
index 48c774cea587..31746d571016 100644
--- a/cmds/receive.c
+++ b/cmds/receive.c
@@ -76,6 +76,8 @@ struct btrfs_receive
 	struct subvol_uuid_search sus;
 
 	int honor_end_cmd;
+
+	bool clone_fallback;
 };
 
 static int finish_subvol(struct btrfs_receive *rctx)
@@ -705,6 +707,44 @@ out:
 	return ret;
 }
 
+static int buffered_copy(int src_fd, int dst_fd, u64 src_offset, u64 len,
+			 u64 dest_offset)
+{
+	unsigned char buf[SZ_32K];
+	u64 copied = 0;
+	int ret = 0;
+
+	while (copied < len) {
+		u32 copy_len = min_t(u32, ARRAY_SIZE(buf), len - copied);
+		u32 written = 0;
+		ssize_t read_size;
+
+		read_size = pread(src_fd, buf, copy_len, src_offset + copied);
+		if (read < 0) {
+			ret = -errno;
+			error("failed to read source file: %m");
+			return ret;
+		}
+
+		/* Write the buffer to dest file */
+		while (written < read_size) {
+			ssize_t write_size;
+
+			write_size = pwrite(dst_fd, buf + written,
+					read_size - written,
+					dest_offset + copied + written);
+			if (write_size < 0) {
+				ret = -errno;
+				error("failed to write source file: %m");
+				return ret;
+			}
+			written += write_size;
+		}
+		copied += read_size;
+	}
+	return ret;
+}
+
 static int process_clone(const char *path, u64 offset, u64 len,
 			 const u8 *clone_uuid, u64 clone_ctransid,
 			 const char *clone_path, u64 clone_offset,
@@ -788,8 +828,17 @@ static int process_clone(const char *path, u64 offset, u64 len,
 	ret = ioctl(rctx->write_fd, BTRFS_IOC_CLONE_RANGE, &clone_args);
 	if (ret < 0) {
 		ret = -errno;
-		error("failed to clone extents to %s: %m", path);
-		goto out;
+		if (ret != -EINVAL || !rctx->clone_fallback) {
+			error("failed to clone extents to %s: %m", path);
+			goto out;
+		}
+
+		if (bconf.verbose >= 2)
+			warning(
+		"failed to clone extents to %s, fallback to buffered write",
+				path);
+		ret = buffered_copy(clone_fd, rctx->write_fd, clone_offset,
+				    len, offset);
 	}
 
 out:
@@ -1197,6 +1246,8 @@ static const char * const cmd_receive_usage[] = {
 	"                 this file system is mounted.",
 	"--dump           dump stream metadata, one line per operation,",
 	"                 does not require the MOUNT parameter",
+	"--clone-fallback when clone operations fail, attempt to directly copy"
+	"                 the data instead"
 	"-v               deprecated, alias for global -v option",
 	HELPINFO_INSERT_GLOBALS,
 	HELPINFO_INSERT_VERBOSE,
@@ -1238,11 +1289,13 @@ static int cmd_receive(const struct cmd_struct *cmd, int argc, char **argv)
 	optind = 0;
 	while (1) {
 		int c;
-		enum { GETOPT_VAL_DUMP = 257 };
+		enum { GETOPT_VAL_DUMP = 257, GETOPT_VAL_CLONE_FALLBACK };
 		static const struct option long_opts[] = {
 			{ "max-errors", required_argument, NULL, 'E' },
 			{ "chroot", no_argument, NULL, 'C' },
 			{ "dump", no_argument, NULL, GETOPT_VAL_DUMP },
+			{ "clone-fallback", no_argument, NULL,
+				GETOPT_VAL_CLONE_FALLBACK},
 			{ "quiet", no_argument, NULL, 'q' },
 			{ NULL, 0, NULL, 0 }
 		};
@@ -1286,6 +1339,9 @@ static int cmd_receive(const struct cmd_struct *cmd, int argc, char **argv)
 		case GETOPT_VAL_DUMP:
 			dump = 1;
 			break;
+		case GETOPT_VAL_CLONE_FALLBACK:
+			rctx.clone_fallback = true;
+			break;
 		default:
 			usage_unknown_option(cmd, argv);
 		}
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/2] btrfs-progs: misc-tests: add test case for receive --clone-fallback
  2021-09-30 11:48 [PATCH v2 0/2] btrfs-progs: receive: introduce new --clone-fallback option Qu Wenruo
  2021-09-30 11:48 ` [PATCH v2 1/2] btrfs-progs: receive: fallback to buffered copy if clone failed Qu Wenruo
@ 2021-09-30 11:48 ` Qu Wenruo
  2021-09-30 12:53   ` Filipe Manana
  1 sibling, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2021-09-30 11:48 UTC (permalink / raw)
  To: linux-btrfs

The new test case will create two send streams:

- parent_stream
  A full send stream.
  Contains one file, as clone source.

- clone_stream
  An incremental send stream.
  Contains one clone operation.

Then we will receive the parent_stream with default mount options, while
try to receive the clone_stream with nodatasum mount option.

This should result clone failure due to nodatasum flag mismatch.

Then check if the receive can continue with --clone-fallback option.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 tests/common                                  |  9 +++
 .../049-receive-clone-fallback/test.sh        | 58 +++++++++++++++++++
 2 files changed, 67 insertions(+)
 create mode 100755 tests/misc-tests/049-receive-clone-fallback/test.sh

diff --git a/tests/common b/tests/common
index 253071025db2..0423af4231a8 100644
--- a/tests/common
+++ b/tests/common
@@ -633,6 +633,15 @@ run_check_mount_test_dev()
 	run_check $SUDO_HELPER mount -t btrfs $loop_opt "$@" "$TEST_DEV" "$TEST_MNT"
 }
 
+run_check_remount_test_dev()
+{
+	setup_root_helper
+
+	local opts="$1"
+
+	run_check $SUDO_HELPER mount -o "remount,$opts" "$TEST_MNT"
+}
+
 # $1-$n: optional paths to unmount, otherwise fallback to TEST_DEV
 run_check_umount_test_dev()
 {
diff --git a/tests/misc-tests/049-receive-clone-fallback/test.sh b/tests/misc-tests/049-receive-clone-fallback/test.sh
new file mode 100755
index 000000000000..18136a9e63ee
--- /dev/null
+++ b/tests/misc-tests/049-receive-clone-fallback/test.sh
@@ -0,0 +1,58 @@
+#!/bin/bash
+# Verify that btrfs receive can fallback to buffered copy when clone
+# failed
+
+source "$TEST_TOP/common"
+
+check_prereq mkfs.btrfs
+check_prereq btrfs
+setup_root_helper
+prepare_test_dev
+
+tmp=$(mktemp -d --tmpdir btrfs-progs-send-stream.XXXXXX)
+
+# Create two sends stream, one as the parent and the other as an incremental
+# stream with one clone operation.
+run_check_mkfs_test_dev
+run_check_mount_test_dev -o datacow,datasum
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume create "$TEST_MNT/subv"
+run_check $SUDO_HELPER dd if=/dev/zero bs=1M count=1 of="$TEST_MNT/subv/file1" 
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume snapshot -r "$TEST_MNT/subv" \
+	"$TEST_MNT/snap1"
+run_check $SUDO_HELPER cp --reflink=always "$TEST_MNT/subv/file1" \
+	"$TEST_MNT/subv/file2"
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume snapshot -r "$TEST_MNT/subv" \
+	"$TEST_MNT/snap2"
+
+run_check $SUDO_HELPER "$TOP/btrfs" send -f "$tmp/parent_stream" \
+	"$TEST_MNT/snap1"
+run_check $SUDO_HELPER "$TOP/btrfs" send -f "$tmp/clone_stream" \
+	-p "$TEST_MNT/snap1" "$TEST_MNT/snap2"
+
+run_check_umount_test_dev
+run_check_mkfs_test_dev
+
+# Receive the first stream with the same mount option
+run_check_mount_test_dev -o datacow -o datasum
+
+# Receiving the full stream should not fail
+run_check $SUDO_HELPER "$TOP/btrfs" receive -f "$tmp/parent_stream" "$TEST_MNT"
+
+# Remount the fs with nodatasum mount option, so that the new file received
+# through the incremental stream will end up with the nodatasum flag set.
+run_check_remount_test_dev nodatasum
+
+# Receiving incremental send stream without --clone-fallback should fail.
+# As the clone source and destination have different NODATASUM flags
+run_mustfail "receiving clone with different NODATASUM should fail" \
+	$SUDO_HELPER "$TOP/btrfs" receive -f "$tmp/clone_stream" "$TEST_MNT"
+
+# Firstly we need to cleanup the partially received subvolume
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume delete "$TEST_MNT/snap2"
+
+# With --clone-fallback specified, the receive should finish without problem
+run_check $SUDO_HELPER "$TOP/btrfs" receive --clone-fallback \
+	-f "$tmp/clone_stream" "$TEST_MNT"
+run_check_umount_test_dev
+
+rm -rf -- "$tmp"
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/2] btrfs-progs: receive: fallback to buffered copy if clone failed
  2021-09-30 11:48 ` [PATCH v2 1/2] btrfs-progs: receive: fallback to buffered copy if clone failed Qu Wenruo
@ 2021-09-30 12:47   ` Graham Cobb
  2021-09-30 13:31     ` Filipe Manana
  2021-09-30 14:19     ` Qu Wenruo
  2021-09-30 12:51   ` Filipe Manana
  1 sibling, 2 replies; 8+ messages in thread
From: Graham Cobb @ 2021-09-30 12:47 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 30/09/2021 12:48, Qu Wenruo wrote:
> [BUG]
> There are two very basic send streams:
> (a/m/ctime and uuid omitted)
> 
>   Stream 1: (Parent subvolume)
>   subvol   ./parent_subv           transid=8
>   chown    ./parent_subv/          gid=0 uid=0
>   chmod    ./parent_subv/          mode=755
>   utimes   ./parent_subv/
>   mkfile   ./parent_subv/o257-7-0
>   rename   ./parent_subv/o257-7-0  dest=./parent_subv/source
>   utimes   ./parent_subv/
>   write    ./parent_subv/source    offset=0 len=16384
>   chown    ./parent_subv/source    gid=0 uid=0
>   chmod    ./parent_subv/source    mode=600
>   utimes   ./parent_subv/source
> 
>   Stream 2: (snapshot and clone)
>   snapshot ./dest_subv             transid=14 parent_transid=10
>   utimes   ./dest_subv/
>   mkfile   ./dest_subv/o258-14-0
>   rename   ./dest_subv/o258-14-0   dest=./dest_subv/reflink
>   utimes   ./dest_subv/
>   clone    ./dest_subv/reflink     offset=0 len=16384 from=./dest_subv/source clone_offset=0
>   chown    ./dest_subv/reflink     gid=0 uid=0
>   chmod    ./dest_subv/reflink     mode=600
>   utimes   ./dest_subv/reflink
> 
> But if we receive the first stream with default mount, then remount to
> nodatasum, and try to receive the second stream, it will fail:
> 
>  # mount /mnt/btrfs
>  # btrfs receive -f ~/parent_stream /mnt/btrfs/
>  At subvol parent_subv
>  # mount -o remount,nodatasum /mnt/btrfs
>  # btrfs receive -f ~/clone_stream /mnt/btrfs/
>  At snapshot dest_subv
>  ERROR: failed to clone extents to reflink: Invalid argument
>  # echo $?
>  1
> 
> [CAUSE]
> Btrfs doesn't allow clone source and destination files have different
> NODATASUM flags.
> This is to prevent a data extent to be owned by both NODATASUM inode and
> regular DATASUM inode.
> 
> For above receive operations, the clone destination is inheriting the
> NODATASUM flag from mount option, while the clone source has no
> NODATASUM flag, thus preventing us from doing the clone.
> 
> [FIX]
> Btrfs send/receive doesn't require the underlying inode has the same
> flags (thus we can send from compressed extent and receive on a
> non-compressed filesystem).
> 
> So here we add a new command line option, '--clone-fallback', to allow
> btrfs-receive to fall back to buffered write to copy data from the
> source file.
> 
> Since such behavior can result much less clone operations, which may not
> be what the end users really want, and can hide bugs in send stream.
> Thus this behavior must be explicitly specified by the new option.
> 
> And we will output a warning message each time such fallback is
> triggered if the user wants extra debug output.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  Documentation/btrfs-receive.asciidoc | 12 ++++++
>  cmds/receive.c                       | 62 ++++++++++++++++++++++++++--
>  2 files changed, 71 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/btrfs-receive.asciidoc b/Documentation/btrfs-receive.asciidoc
> index e4c4d2c0bf3d..9c934a399a9c 100644
> --- a/Documentation/btrfs-receive.asciidoc
> +++ b/Documentation/btrfs-receive.asciidoc
> @@ -65,6 +65,18 @@ dump the stream metadata, one line per operation
>  +
>  Does not require the 'path' parameter. The filesystem remains unchanged.
>  
> +--clone-fallback::
> +when clone opeartions fail, attempt to directly copy the data instead.
> ++
> +When the source and destination filesystems have different sector sizes, or
> +when source and destination files have differnt 'nodatacow' and/or 'nodatasum'

typo: "different"

> +flags (can be set per-file or through mount options), clone operations can fail.
> ++
> +This option makes the receive process attempt to manually copy data from the
> +source to the destination file when a clone operation fails (caused by above
> +reasons). When this happens, extents will end up not being shared
> +between the files, thus will take up more space.

Send/receive and the storage savings available by storing snapshots are
important btrfs features for many sysadmins. I think the documentation
needs to be a bit clearer.

1) It says that the fallback happens when the clone operation fails
"caused by above reasons". Is that right? Is it **only** those cases
that can cause EINVAL error? In the earlier email discussion there was
mention of different compression settings - would that cause a problem?
What about new features like the VerifyFS stuff being worked on (I have
no idea - just choosing a work-in-progress item as an example). If these
are the only two cases, I think there needs to be a code comment in the
kernel code that returns this error that if any other cases are
introduced the documentation for --clone-fallback needs to be updated.

2) In any case, "caused by above reasons" sounds a bit unnatural to me
(native English speaker). I would suggest replacing "(caused by above
reasons)" with "in this way".

3) And maybe add another sentence: "A warning message will be displayed
when this happens, if the --verbose option is in effect".

> +
>  -q|--quiet::
>  (deprecated) alias for global '-q' option
>  
> diff --git a/cmds/receive.c b/cmds/receive.c
> index 48c774cea587..31746d571016 100644
> --- a/cmds/receive.c
> +++ b/cmds/receive.c
> @@ -76,6 +76,8 @@ struct btrfs_receive
>  	struct subvol_uuid_search sus;
>  
>  	int honor_end_cmd;
> +
> +	bool clone_fallback;
>  };
>  
>  static int finish_subvol(struct btrfs_receive *rctx)
> @@ -705,6 +707,44 @@ out:
>  	return ret;
>  }
>  
> +static int buffered_copy(int src_fd, int dst_fd, u64 src_offset, u64 len,
> +			 u64 dest_offset)
> +{
> +	unsigned char buf[SZ_32K];
> +	u64 copied = 0;
> +	int ret = 0;
> +
> +	while (copied < len) {
> +		u32 copy_len = min_t(u32, ARRAY_SIZE(buf), len - copied);
> +		u32 written = 0;
> +		ssize_t read_size;
> +
> +		read_size = pread(src_fd, buf, copy_len, src_offset + copied);
> +		if (read < 0) {
> +			ret = -errno;
> +			error("failed to read source file: %m");
> +			return ret;
> +		}
> +
> +		/* Write the buffer to dest file */
> +		while (written < read_size) {
> +			ssize_t write_size;
> +
> +			write_size = pwrite(dst_fd, buf + written,
> +					read_size - written,
> +					dest_offset + copied + written);
> +			if (write_size < 0) {
> +				ret = -errno;
> +				error("failed to write source file: %m");
> +				return ret;
> +			}
> +			written += write_size;
> +		}
> +		copied += read_size;
> +	}
> +	return ret;
> +}
> +
>  static int process_clone(const char *path, u64 offset, u64 len,
>  			 const u8 *clone_uuid, u64 clone_ctransid,
>  			 const char *clone_path, u64 clone_offset,
> @@ -788,8 +828,17 @@ static int process_clone(const char *path, u64 offset, u64 len,
>  	ret = ioctl(rctx->write_fd, BTRFS_IOC_CLONE_RANGE, &clone_args);
>  	if (ret < 0) {
>  		ret = -errno;
> -		error("failed to clone extents to %s: %m", path);
> -		goto out;
> +		if (ret != -EINVAL || !rctx->clone_fallback) {
> +			error("failed to clone extents to %s: %m", path);
> +			goto out;
> +		}
> +
> +		if (bconf.verbose >= 2)
> +			warning(
> +		"failed to clone extents to %s, fallback to buffered write",

I think this message needs to tell the user how many bytes which they
expected to be cloned are now being duplicated. Something like "failed
to clone NNNNNN bytes to FILE, fallback to copying data".

Graham

> +				path);
> +		ret = buffered_copy(clone_fd, rctx->write_fd, clone_offset,
> +				    len, offset);
>  	}
>  
>  out:
> @@ -1197,6 +1246,8 @@ static const char * const cmd_receive_usage[] = {
>  	"                 this file system is mounted.",
>  	"--dump           dump stream metadata, one line per operation,",
>  	"                 does not require the MOUNT parameter",
> +	"--clone-fallback when clone operations fail, attempt to directly copy"
> +	"                 the data instead"
>  	"-v               deprecated, alias for global -v option",
>  	HELPINFO_INSERT_GLOBALS,
>  	HELPINFO_INSERT_VERBOSE,
> @@ -1238,11 +1289,13 @@ static int cmd_receive(const struct cmd_struct *cmd, int argc, char **argv)
>  	optind = 0;
>  	while (1) {
>  		int c;
> -		enum { GETOPT_VAL_DUMP = 257 };
> +		enum { GETOPT_VAL_DUMP = 257, GETOPT_VAL_CLONE_FALLBACK };
>  		static const struct option long_opts[] = {
>  			{ "max-errors", required_argument, NULL, 'E' },
>  			{ "chroot", no_argument, NULL, 'C' },
>  			{ "dump", no_argument, NULL, GETOPT_VAL_DUMP },
> +			{ "clone-fallback", no_argument, NULL,
> +				GETOPT_VAL_CLONE_FALLBACK},
>  			{ "quiet", no_argument, NULL, 'q' },
>  			{ NULL, 0, NULL, 0 }
>  		};
> @@ -1286,6 +1339,9 @@ static int cmd_receive(const struct cmd_struct *cmd, int argc, char **argv)
>  		case GETOPT_VAL_DUMP:
>  			dump = 1;
>  			break;
> +		case GETOPT_VAL_CLONE_FALLBACK:
> +			rctx.clone_fallback = true;
> +			break;
>  		default:
>  			usage_unknown_option(cmd, argv);
>  		}
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/2] btrfs-progs: receive: fallback to buffered copy if clone failed
  2021-09-30 11:48 ` [PATCH v2 1/2] btrfs-progs: receive: fallback to buffered copy if clone failed Qu Wenruo
  2021-09-30 12:47   ` Graham Cobb
@ 2021-09-30 12:51   ` Filipe Manana
  1 sibling, 0 replies; 8+ messages in thread
From: Filipe Manana @ 2021-09-30 12:51 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Sep 30, 2021 at 1:34 PM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> There are two very basic send streams:
> (a/m/ctime and uuid omitted)
>
>   Stream 1: (Parent subvolume)
>   subvol   ./parent_subv           transid=8
>   chown    ./parent_subv/          gid=0 uid=0
>   chmod    ./parent_subv/          mode=755
>   utimes   ./parent_subv/
>   mkfile   ./parent_subv/o257-7-0
>   rename   ./parent_subv/o257-7-0  dest=./parent_subv/source
>   utimes   ./parent_subv/
>   write    ./parent_subv/source    offset=0 len=16384
>   chown    ./parent_subv/source    gid=0 uid=0
>   chmod    ./parent_subv/source    mode=600
>   utimes   ./parent_subv/source
>
>   Stream 2: (snapshot and clone)
>   snapshot ./dest_subv             transid=14 parent_transid=10
>   utimes   ./dest_subv/
>   mkfile   ./dest_subv/o258-14-0
>   rename   ./dest_subv/o258-14-0   dest=./dest_subv/reflink
>   utimes   ./dest_subv/
>   clone    ./dest_subv/reflink     offset=0 len=16384 from=./dest_subv/source clone_offset=0
>   chown    ./dest_subv/reflink     gid=0 uid=0
>   chmod    ./dest_subv/reflink     mode=600
>   utimes   ./dest_subv/reflink
>
> But if we receive the first stream with default mount, then remount to
> nodatasum, and try to receive the second stream, it will fail:
>
>  # mount /mnt/btrfs
>  # btrfs receive -f ~/parent_stream /mnt/btrfs/
>  At subvol parent_subv
>  # mount -o remount,nodatasum /mnt/btrfs
>  # btrfs receive -f ~/clone_stream /mnt/btrfs/
>  At snapshot dest_subv
>  ERROR: failed to clone extents to reflink: Invalid argument
>  # echo $?
>  1
>
> [CAUSE]
> Btrfs doesn't allow clone source and destination files have different
> NODATASUM flags.
> This is to prevent a data extent to be owned by both NODATASUM inode and
> regular DATASUM inode.
>
> For above receive operations, the clone destination is inheriting the
> NODATASUM flag from mount option, while the clone source has no
> NODATASUM flag, thus preventing us from doing the clone.
>
> [FIX]
> Btrfs send/receive doesn't require the underlying inode has the same
> flags (thus we can send from compressed extent and receive on a
> non-compressed filesystem).
>
> So here we add a new command line option, '--clone-fallback', to allow
> btrfs-receive to fall back to buffered write to copy data from the
> source file.
>
> Since such behavior can result much less clone operations, which may not
> be what the end users really want, and can hide bugs in send stream.
> Thus this behavior must be explicitly specified by the new option.
>
> And we will output a warning message each time such fallback is
> triggered if the user wants extra debug output.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  Documentation/btrfs-receive.asciidoc | 12 ++++++
>  cmds/receive.c                       | 62 ++++++++++++++++++++++++++--
>  2 files changed, 71 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/btrfs-receive.asciidoc b/Documentation/btrfs-receive.asciidoc
> index e4c4d2c0bf3d..9c934a399a9c 100644
> --- a/Documentation/btrfs-receive.asciidoc
> +++ b/Documentation/btrfs-receive.asciidoc
> @@ -65,6 +65,18 @@ dump the stream metadata, one line per operation
>  +
>  Does not require the 'path' parameter. The filesystem remains unchanged.
>
> +--clone-fallback::
> +when clone opeartions fail, attempt to directly copy the data instead.

 opeartions -> operations

> ++
> +When the source and destination filesystems have different sector sizes, or
> +when source and destination files have differnt 'nodatacow' and/or 'nodatasum'

when source -> when the source
differnt -> different

> +flags (can be set per-file or through mount options), clone operations can fail.

inside the parenthesis:  can be set per file/directory or through mount options

> ++
> +This option makes the receive process attempt to manually copy data from the
> +source to the destination file when a clone operation fails (caused by above

by above -> by the above

Better yet, "for the reasons mentioned before".

> +reasons). When this happens, extents will end up not being shared
> +between the files, thus will take up more space.

Sorry, I hate to be picky just about language stuff, but this is user
oriented so it matters.
For simple things like the typos, checkpatch.pl from the kernel should help.

The rest seems fine to me, and after the corrections:  Reviewed-by:
Filipe Manana <fdmanana@suse.com>

Thanks.

> +
>  -q|--quiet::
>  (deprecated) alias for global '-q' option
>
> diff --git a/cmds/receive.c b/cmds/receive.c
> index 48c774cea587..31746d571016 100644
> --- a/cmds/receive.c
> +++ b/cmds/receive.c
> @@ -76,6 +76,8 @@ struct btrfs_receive
>         struct subvol_uuid_search sus;
>
>         int honor_end_cmd;
> +
> +       bool clone_fallback;
>  };
>
>  static int finish_subvol(struct btrfs_receive *rctx)
> @@ -705,6 +707,44 @@ out:
>         return ret;
>  }
>
> +static int buffered_copy(int src_fd, int dst_fd, u64 src_offset, u64 len,
> +                        u64 dest_offset)
> +{
> +       unsigned char buf[SZ_32K];
> +       u64 copied = 0;
> +       int ret = 0;
> +
> +       while (copied < len) {
> +               u32 copy_len = min_t(u32, ARRAY_SIZE(buf), len - copied);
> +               u32 written = 0;
> +               ssize_t read_size;
> +
> +               read_size = pread(src_fd, buf, copy_len, src_offset + copied);
> +               if (read < 0) {
> +                       ret = -errno;
> +                       error("failed to read source file: %m");
> +                       return ret;
> +               }
> +
> +               /* Write the buffer to dest file */
> +               while (written < read_size) {
> +                       ssize_t write_size;
> +
> +                       write_size = pwrite(dst_fd, buf + written,
> +                                       read_size - written,
> +                                       dest_offset + copied + written);
> +                       if (write_size < 0) {
> +                               ret = -errno;
> +                               error("failed to write source file: %m");
> +                               return ret;
> +                       }
> +                       written += write_size;
> +               }
> +               copied += read_size;
> +       }
> +       return ret;
> +}
> +
>  static int process_clone(const char *path, u64 offset, u64 len,
>                          const u8 *clone_uuid, u64 clone_ctransid,
>                          const char *clone_path, u64 clone_offset,
> @@ -788,8 +828,17 @@ static int process_clone(const char *path, u64 offset, u64 len,
>         ret = ioctl(rctx->write_fd, BTRFS_IOC_CLONE_RANGE, &clone_args);
>         if (ret < 0) {
>                 ret = -errno;
> -               error("failed to clone extents to %s: %m", path);
> -               goto out;
> +               if (ret != -EINVAL || !rctx->clone_fallback) {
> +                       error("failed to clone extents to %s: %m", path);
> +                       goto out;
> +               }
> +
> +               if (bconf.verbose >= 2)
> +                       warning(
> +               "failed to clone extents to %s, fallback to buffered write",
> +                               path);
> +               ret = buffered_copy(clone_fd, rctx->write_fd, clone_offset,
> +                                   len, offset);
>         }
>
>  out:
> @@ -1197,6 +1246,8 @@ static const char * const cmd_receive_usage[] = {
>         "                 this file system is mounted.",
>         "--dump           dump stream metadata, one line per operation,",
>         "                 does not require the MOUNT parameter",
> +       "--clone-fallback when clone operations fail, attempt to directly copy"
> +       "                 the data instead"
>         "-v               deprecated, alias for global -v option",
>         HELPINFO_INSERT_GLOBALS,
>         HELPINFO_INSERT_VERBOSE,
> @@ -1238,11 +1289,13 @@ static int cmd_receive(const struct cmd_struct *cmd, int argc, char **argv)
>         optind = 0;
>         while (1) {
>                 int c;
> -               enum { GETOPT_VAL_DUMP = 257 };
> +               enum { GETOPT_VAL_DUMP = 257, GETOPT_VAL_CLONE_FALLBACK };
>                 static const struct option long_opts[] = {
>                         { "max-errors", required_argument, NULL, 'E' },
>                         { "chroot", no_argument, NULL, 'C' },
>                         { "dump", no_argument, NULL, GETOPT_VAL_DUMP },
> +                       { "clone-fallback", no_argument, NULL,
> +                               GETOPT_VAL_CLONE_FALLBACK},
>                         { "quiet", no_argument, NULL, 'q' },
>                         { NULL, 0, NULL, 0 }
>                 };
> @@ -1286,6 +1339,9 @@ static int cmd_receive(const struct cmd_struct *cmd, int argc, char **argv)
>                 case GETOPT_VAL_DUMP:
>                         dump = 1;
>                         break;
> +               case GETOPT_VAL_CLONE_FALLBACK:
> +                       rctx.clone_fallback = true;
> +                       break;
>                 default:
>                         usage_unknown_option(cmd, argv);
>                 }
> --
> 2.33.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] btrfs-progs: misc-tests: add test case for receive --clone-fallback
  2021-09-30 11:48 ` [PATCH v2 2/2] btrfs-progs: misc-tests: add test case for receive --clone-fallback Qu Wenruo
@ 2021-09-30 12:53   ` Filipe Manana
  0 siblings, 0 replies; 8+ messages in thread
From: Filipe Manana @ 2021-09-30 12:53 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Sep 30, 2021 at 12:51 PM Qu Wenruo <wqu@suse.com> wrote:
>
> The new test case will create two send streams:
>
> - parent_stream
>   A full send stream.
>   Contains one file, as clone source.
>
> - clone_stream
>   An incremental send stream.
>   Contains one clone operation.
>
> Then we will receive the parent_stream with default mount options, while
> try to receive the clone_stream with nodatasum mount option.
>
> This should result clone failure due to nodatasum flag mismatch.
>
> Then check if the receive can continue with --clone-fallback option.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  tests/common                                  |  9 +++
>  .../049-receive-clone-fallback/test.sh        | 58 +++++++++++++++++++
>  2 files changed, 67 insertions(+)
>  create mode 100755 tests/misc-tests/049-receive-clone-fallback/test.sh
>
> diff --git a/tests/common b/tests/common
> index 253071025db2..0423af4231a8 100644
> --- a/tests/common
> +++ b/tests/common
> @@ -633,6 +633,15 @@ run_check_mount_test_dev()
>         run_check $SUDO_HELPER mount -t btrfs $loop_opt "$@" "$TEST_DEV" "$TEST_MNT"
>  }
>
> +run_check_remount_test_dev()
> +{
> +       setup_root_helper
> +
> +       local opts="$1"
> +
> +       run_check $SUDO_HELPER mount -o "remount,$opts" "$TEST_MNT"
> +}
> +
>  # $1-$n: optional paths to unmount, otherwise fallback to TEST_DEV
>  run_check_umount_test_dev()
>  {
> diff --git a/tests/misc-tests/049-receive-clone-fallback/test.sh b/tests/misc-tests/049-receive-clone-fallback/test.sh
> new file mode 100755
> index 000000000000..18136a9e63ee
> --- /dev/null
> +++ b/tests/misc-tests/049-receive-clone-fallback/test.sh
> @@ -0,0 +1,58 @@
> +#!/bin/bash
> +# Verify that btrfs receive can fallback to buffered copy when clone
> +# failed
> +
> +source "$TEST_TOP/common"
> +
> +check_prereq mkfs.btrfs
> +check_prereq btrfs
> +setup_root_helper
> +prepare_test_dev
> +
> +tmp=$(mktemp -d --tmpdir btrfs-progs-send-stream.XXXXXX)
> +
> +# Create two sends stream, one as the parent and the other as an incremental
> +# stream with one clone operation.
> +run_check_mkfs_test_dev
> +run_check_mount_test_dev -o datacow,datasum
> +run_check $SUDO_HELPER "$TOP/btrfs" subvolume create "$TEST_MNT/subv"
> +run_check $SUDO_HELPER dd if=/dev/zero bs=1M count=1 of="$TEST_MNT/subv/file1"
> +run_check $SUDO_HELPER "$TOP/btrfs" subvolume snapshot -r "$TEST_MNT/subv" \
> +       "$TEST_MNT/snap1"
> +run_check $SUDO_HELPER cp --reflink=always "$TEST_MNT/subv/file1" \
> +       "$TEST_MNT/subv/file2"
> +run_check $SUDO_HELPER "$TOP/btrfs" subvolume snapshot -r "$TEST_MNT/subv" \
> +       "$TEST_MNT/snap2"
> +
> +run_check $SUDO_HELPER "$TOP/btrfs" send -f "$tmp/parent_stream" \
> +       "$TEST_MNT/snap1"
> +run_check $SUDO_HELPER "$TOP/btrfs" send -f "$tmp/clone_stream" \
> +       -p "$TEST_MNT/snap1" "$TEST_MNT/snap2"
> +
> +run_check_umount_test_dev
> +run_check_mkfs_test_dev
> +
> +# Receive the first stream with the same mount option
> +run_check_mount_test_dev -o datacow -o datasum
> +
> +# Receiving the full stream should not fail
> +run_check $SUDO_HELPER "$TOP/btrfs" receive -f "$tmp/parent_stream" "$TEST_MNT"
> +
> +# Remount the fs with nodatasum mount option, so that the new file received
> +# through the incremental stream will end up with the nodatasum flag set.
> +run_check_remount_test_dev nodatasum
> +
> +# Receiving incremental send stream without --clone-fallback should fail.
> +# As the clone source and destination have different NODATASUM flags
> +run_mustfail "receiving clone with different NODATASUM should fail" \
> +       $SUDO_HELPER "$TOP/btrfs" receive -f "$tmp/clone_stream" "$TEST_MNT"
> +
> +# Firstly we need to cleanup the partially received subvolume
> +run_check $SUDO_HELPER "$TOP/btrfs" subvolume delete "$TEST_MNT/snap2"
> +
> +# With --clone-fallback specified, the receive should finish without problem
> +run_check $SUDO_HELPER "$TOP/btrfs" receive --clone-fallback \
> +       -f "$tmp/clone_stream" "$TEST_MNT"
> +run_check_umount_test_dev
> +
> +rm -rf -- "$tmp"
> --
> 2.33.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/2] btrfs-progs: receive: fallback to buffered copy if clone failed
  2021-09-30 12:47   ` Graham Cobb
@ 2021-09-30 13:31     ` Filipe Manana
  2021-09-30 14:19     ` Qu Wenruo
  1 sibling, 0 replies; 8+ messages in thread
From: Filipe Manana @ 2021-09-30 13:31 UTC (permalink / raw)
  To: Graham Cobb; +Cc: Qu Wenruo, linux-btrfs

On Thu, Sep 30, 2021 at 2:20 PM Graham Cobb <g.btrfs@cobb.uk.net> wrote:
>
> On 30/09/2021 12:48, Qu Wenruo wrote:
> > [BUG]
> > There are two very basic send streams:
> > (a/m/ctime and uuid omitted)
> >
> >   Stream 1: (Parent subvolume)
> >   subvol   ./parent_subv           transid=8
> >   chown    ./parent_subv/          gid=0 uid=0
> >   chmod    ./parent_subv/          mode=755
> >   utimes   ./parent_subv/
> >   mkfile   ./parent_subv/o257-7-0
> >   rename   ./parent_subv/o257-7-0  dest=./parent_subv/source
> >   utimes   ./parent_subv/
> >   write    ./parent_subv/source    offset=0 len=16384
> >   chown    ./parent_subv/source    gid=0 uid=0
> >   chmod    ./parent_subv/source    mode=600
> >   utimes   ./parent_subv/source
> >
> >   Stream 2: (snapshot and clone)
> >   snapshot ./dest_subv             transid=14 parent_transid=10
> >   utimes   ./dest_subv/
> >   mkfile   ./dest_subv/o258-14-0
> >   rename   ./dest_subv/o258-14-0   dest=./dest_subv/reflink
> >   utimes   ./dest_subv/
> >   clone    ./dest_subv/reflink     offset=0 len=16384 from=./dest_subv/source clone_offset=0
> >   chown    ./dest_subv/reflink     gid=0 uid=0
> >   chmod    ./dest_subv/reflink     mode=600
> >   utimes   ./dest_subv/reflink
> >
> > But if we receive the first stream with default mount, then remount to
> > nodatasum, and try to receive the second stream, it will fail:
> >
> >  # mount /mnt/btrfs
> >  # btrfs receive -f ~/parent_stream /mnt/btrfs/
> >  At subvol parent_subv
> >  # mount -o remount,nodatasum /mnt/btrfs
> >  # btrfs receive -f ~/clone_stream /mnt/btrfs/
> >  At snapshot dest_subv
> >  ERROR: failed to clone extents to reflink: Invalid argument
> >  # echo $?
> >  1
> >
> > [CAUSE]
> > Btrfs doesn't allow clone source and destination files have different
> > NODATASUM flags.
> > This is to prevent a data extent to be owned by both NODATASUM inode and
> > regular DATASUM inode.
> >
> > For above receive operations, the clone destination is inheriting the
> > NODATASUM flag from mount option, while the clone source has no
> > NODATASUM flag, thus preventing us from doing the clone.
> >
> > [FIX]
> > Btrfs send/receive doesn't require the underlying inode has the same
> > flags (thus we can send from compressed extent and receive on a
> > non-compressed filesystem).
> >
> > So here we add a new command line option, '--clone-fallback', to allow
> > btrfs-receive to fall back to buffered write to copy data from the
> > source file.
> >
> > Since such behavior can result much less clone operations, which may not
> > be what the end users really want, and can hide bugs in send stream.
> > Thus this behavior must be explicitly specified by the new option.
> >
> > And we will output a warning message each time such fallback is
> > triggered if the user wants extra debug output.
> >
> > Signed-off-by: Qu Wenruo <wqu@suse.com>
> > ---
> >  Documentation/btrfs-receive.asciidoc | 12 ++++++
> >  cmds/receive.c                       | 62 ++++++++++++++++++++++++++--
> >  2 files changed, 71 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/btrfs-receive.asciidoc b/Documentation/btrfs-receive.asciidoc
> > index e4c4d2c0bf3d..9c934a399a9c 100644
> > --- a/Documentation/btrfs-receive.asciidoc
> > +++ b/Documentation/btrfs-receive.asciidoc
> > @@ -65,6 +65,18 @@ dump the stream metadata, one line per operation
> >  +
> >  Does not require the 'path' parameter. The filesystem remains unchanged.
> >
> > +--clone-fallback::
> > +when clone opeartions fail, attempt to directly copy the data instead.
> > ++
> > +When the source and destination filesystems have different sector sizes, or
> > +when source and destination files have differnt 'nodatacow' and/or 'nodatasum'
>
> typo: "different"
>
> > +flags (can be set per-file or through mount options), clone operations can fail.
> > ++
> > +This option makes the receive process attempt to manually copy data from the
> > +source to the destination file when a clone operation fails (caused by above
> > +reasons). When this happens, extents will end up not being shared
> > +between the files, thus will take up more space.
>
> Send/receive and the storage savings available by storing snapshots are
> important btrfs features for many sysadmins. I think the documentation
> needs to be a bit clearer.
>
> 1) It says that the fallback happens when the clone operation fails
> "caused by above reasons". Is that right? Is it **only** those cases
> that can cause EINVAL error? In the earlier email discussion there was
> mention of different compression settings - would that cause a problem?
> What about new features like the VerifyFS stuff being worked on (I have
> no idea - just choosing a work-in-progress item as an example). If these
> are the only two cases, I think there needs to be a code comment in the
> kernel code that returns this error that if any other cases are
> introduced the documentation for --clone-fallback needs to be updated.
>
> 2) In any case, "caused by above reasons" sounds a bit unnatural to me
> (native English speaker). I would suggest replacing "(caused by above
> reasons)" with "in this way".
>
> 3) And maybe add another sentence: "A warning message will be displayed
> when this happens, if the --verbose option is in effect".
>
> > +
> >  -q|--quiet::
> >  (deprecated) alias for global '-q' option
> >
> > diff --git a/cmds/receive.c b/cmds/receive.c
> > index 48c774cea587..31746d571016 100644
> > --- a/cmds/receive.c
> > +++ b/cmds/receive.c
> > @@ -76,6 +76,8 @@ struct btrfs_receive
> >       struct subvol_uuid_search sus;
> >
> >       int honor_end_cmd;
> > +
> > +     bool clone_fallback;
> >  };
> >
> >  static int finish_subvol(struct btrfs_receive *rctx)
> > @@ -705,6 +707,44 @@ out:
> >       return ret;
> >  }
> >
> > +static int buffered_copy(int src_fd, int dst_fd, u64 src_offset, u64 len,
> > +                      u64 dest_offset)
> > +{
> > +     unsigned char buf[SZ_32K];
> > +     u64 copied = 0;
> > +     int ret = 0;
> > +
> > +     while (copied < len) {
> > +             u32 copy_len = min_t(u32, ARRAY_SIZE(buf), len - copied);
> > +             u32 written = 0;
> > +             ssize_t read_size;
> > +
> > +             read_size = pread(src_fd, buf, copy_len, src_offset + copied);
> > +             if (read < 0) {
> > +                     ret = -errno;
> > +                     error("failed to read source file: %m");
> > +                     return ret;
> > +             }
> > +
> > +             /* Write the buffer to dest file */
> > +             while (written < read_size) {
> > +                     ssize_t write_size;
> > +
> > +                     write_size = pwrite(dst_fd, buf + written,
> > +                                     read_size - written,
> > +                                     dest_offset + copied + written);
> > +                     if (write_size < 0) {
> > +                             ret = -errno;
> > +                             error("failed to write source file: %m");
> > +                             return ret;
> > +                     }
> > +                     written += write_size;
> > +             }
> > +             copied += read_size;
> > +     }
> > +     return ret;
> > +}
> > +
> >  static int process_clone(const char *path, u64 offset, u64 len,
> >                        const u8 *clone_uuid, u64 clone_ctransid,
> >                        const char *clone_path, u64 clone_offset,
> > @@ -788,8 +828,17 @@ static int process_clone(const char *path, u64 offset, u64 len,
> >       ret = ioctl(rctx->write_fd, BTRFS_IOC_CLONE_RANGE, &clone_args);
> >       if (ret < 0) {
> >               ret = -errno;
> > -             error("failed to clone extents to %s: %m", path);
> > -             goto out;
> > +             if (ret != -EINVAL || !rctx->clone_fallback) {
> > +                     error("failed to clone extents to %s: %m", path);
> > +                     goto out;
> > +             }
> > +
> > +             if (bconf.verbose >= 2)
> > +                     warning(
> > +             "failed to clone extents to %s, fallback to buffered write",
>
> I think this message needs to tell the user how many bytes which they
> expected to be cloned are now being duplicated. Something like "failed
> to clone NNNNNN bytes to FILE, fallback to copying data".

That is a good idea.
Perhaps even keep track of a sum to report once receive completes, so
that the user knows how many bytes in total were not
shared/deduplicated (we can easily have tens, hundreds of thousands or
more clone operations).

Adding the full path of the source file, instead of only the
destination file, could also be nice to have, as it can be used
afterwards for running simple deduplication tools.

>
> Graham
>
> > +                             path);
> > +             ret = buffered_copy(clone_fd, rctx->write_fd, clone_offset,
> > +                                 len, offset);
> >       }
> >
> >  out:
> > @@ -1197,6 +1246,8 @@ static const char * const cmd_receive_usage[] = {
> >       "                 this file system is mounted.",
> >       "--dump           dump stream metadata, one line per operation,",
> >       "                 does not require the MOUNT parameter",
> > +     "--clone-fallback when clone operations fail, attempt to directly copy"
> > +     "                 the data instead"
> >       "-v               deprecated, alias for global -v option",
> >       HELPINFO_INSERT_GLOBALS,
> >       HELPINFO_INSERT_VERBOSE,
> > @@ -1238,11 +1289,13 @@ static int cmd_receive(const struct cmd_struct *cmd, int argc, char **argv)
> >       optind = 0;
> >       while (1) {
> >               int c;
> > -             enum { GETOPT_VAL_DUMP = 257 };
> > +             enum { GETOPT_VAL_DUMP = 257, GETOPT_VAL_CLONE_FALLBACK };
> >               static const struct option long_opts[] = {
> >                       { "max-errors", required_argument, NULL, 'E' },
> >                       { "chroot", no_argument, NULL, 'C' },
> >                       { "dump", no_argument, NULL, GETOPT_VAL_DUMP },
> > +                     { "clone-fallback", no_argument, NULL,
> > +                             GETOPT_VAL_CLONE_FALLBACK},
> >                       { "quiet", no_argument, NULL, 'q' },
> >                       { NULL, 0, NULL, 0 }
> >               };
> > @@ -1286,6 +1339,9 @@ static int cmd_receive(const struct cmd_struct *cmd, int argc, char **argv)
> >               case GETOPT_VAL_DUMP:
> >                       dump = 1;
> >                       break;
> > +             case GETOPT_VAL_CLONE_FALLBACK:
> > +                     rctx.clone_fallback = true;
> > +                     break;
> >               default:
> >                       usage_unknown_option(cmd, argv);
> >               }
> >
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/2] btrfs-progs: receive: fallback to buffered copy if clone failed
  2021-09-30 12:47   ` Graham Cobb
  2021-09-30 13:31     ` Filipe Manana
@ 2021-09-30 14:19     ` Qu Wenruo
  1 sibling, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2021-09-30 14:19 UTC (permalink / raw)
  To: Graham Cobb, Qu Wenruo, linux-btrfs



On 2021/9/30 20:47, Graham Cobb wrote:
> On 30/09/2021 12:48, Qu Wenruo wrote:
>> [BUG]
>> There are two very basic send streams:
>> (a/m/ctime and uuid omitted)
>>
>>    Stream 1: (Parent subvolume)
>>    subvol   ./parent_subv           transid=8
>>    chown    ./parent_subv/          gid=0 uid=0
>>    chmod    ./parent_subv/          mode=755
>>    utimes   ./parent_subv/
>>    mkfile   ./parent_subv/o257-7-0
>>    rename   ./parent_subv/o257-7-0  dest=./parent_subv/source
>>    utimes   ./parent_subv/
>>    write    ./parent_subv/source    offset=0 len=16384
>>    chown    ./parent_subv/source    gid=0 uid=0
>>    chmod    ./parent_subv/source    mode=600
>>    utimes   ./parent_subv/source
>>
>>    Stream 2: (snapshot and clone)
>>    snapshot ./dest_subv             transid=14 parent_transid=10
>>    utimes   ./dest_subv/
>>    mkfile   ./dest_subv/o258-14-0
>>    rename   ./dest_subv/o258-14-0   dest=./dest_subv/reflink
>>    utimes   ./dest_subv/
>>    clone    ./dest_subv/reflink     offset=0 len=16384 from=./dest_subv/source clone_offset=0
>>    chown    ./dest_subv/reflink     gid=0 uid=0
>>    chmod    ./dest_subv/reflink     mode=600
>>    utimes   ./dest_subv/reflink
>>
>> But if we receive the first stream with default mount, then remount to
>> nodatasum, and try to receive the second stream, it will fail:
>>
>>   # mount /mnt/btrfs
>>   # btrfs receive -f ~/parent_stream /mnt/btrfs/
>>   At subvol parent_subv
>>   # mount -o remount,nodatasum /mnt/btrfs
>>   # btrfs receive -f ~/clone_stream /mnt/btrfs/
>>   At snapshot dest_subv
>>   ERROR: failed to clone extents to reflink: Invalid argument
>>   # echo $?
>>   1
>>
>> [CAUSE]
>> Btrfs doesn't allow clone source and destination files have different
>> NODATASUM flags.
>> This is to prevent a data extent to be owned by both NODATASUM inode and
>> regular DATASUM inode.
>>
>> For above receive operations, the clone destination is inheriting the
>> NODATASUM flag from mount option, while the clone source has no
>> NODATASUM flag, thus preventing us from doing the clone.
>>
>> [FIX]
>> Btrfs send/receive doesn't require the underlying inode has the same
>> flags (thus we can send from compressed extent and receive on a
>> non-compressed filesystem).
>>
>> So here we add a new command line option, '--clone-fallback', to allow
>> btrfs-receive to fall back to buffered write to copy data from the
>> source file.
>>
>> Since such behavior can result much less clone operations, which may not
>> be what the end users really want, and can hide bugs in send stream.
>> Thus this behavior must be explicitly specified by the new option.
>>
>> And we will output a warning message each time such fallback is
>> triggered if the user wants extra debug output.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   Documentation/btrfs-receive.asciidoc | 12 ++++++
>>   cmds/receive.c                       | 62 ++++++++++++++++++++++++++--
>>   2 files changed, 71 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/btrfs-receive.asciidoc b/Documentation/btrfs-receive.asciidoc
>> index e4c4d2c0bf3d..9c934a399a9c 100644
>> --- a/Documentation/btrfs-receive.asciidoc
>> +++ b/Documentation/btrfs-receive.asciidoc
>> @@ -65,6 +65,18 @@ dump the stream metadata, one line per operation
>>   +
>>   Does not require the 'path' parameter. The filesystem remains unchanged.
>>
>> +--clone-fallback::
>> +when clone opeartions fail, attempt to directly copy the data instead.
>> ++
>> +When the source and destination filesystems have different sector sizes, or
>> +when source and destination files have differnt 'nodatacow' and/or 'nodatasum'
>
> typo: "different"
>
>> +flags (can be set per-file or through mount options), clone operations can fail.
>> ++
>> +This option makes the receive process attempt to manually copy data from the
>> +source to the destination file when a clone operation fails (caused by above
>> +reasons). When this happens, extents will end up not being shared
>> +between the files, thus will take up more space.
>
> Send/receive and the storage savings available by storing snapshots are
> important btrfs features for many sysadmins.

BTW, the space saving by receiving a snapshot is not affected in this case.

As a snapshot is received by doing a snapshot of the target subvolume at
the receiving side.
Thus this particular situation is only affecting clone operations in a
send stream.

> I think the documentation
> needs to be a bit clearer.
>
> 1) It says that the fallback happens when the clone operation fails
> "caused by above reasons". Is that right? Is it **only** those cases
> that can cause EINVAL error?

And the recent RO->RW problems.

Despite those send stream problems, I don't have other obvious reasons yet.

> In the earlier email discussion there was
> mention of different compression settings - would that cause a problem?

No.

Send stream only contains the content of the file, not the compressed
content.

For the send stream, there is no difference in the content (all the
uncompressed data).

> What about new features like the VerifyFS stuff being worked on (I have
> no idea - just choosing a work-in-progress item as an example).

If those new features are adding new limitations to reflink, then it
could be.

E.g. if we're going to support encryption, then reflinking from
unencrypted inode to an encrypted inode will fail.

But unfortunately I don't have better description for such feature cases.

> If these
> are the only two cases, I think there needs to be a code comment in the
> kernel code that returns this error that if any other cases are
> introduced the documentation for --clone-fallback needs to be updated.

All the reasons are related to the reflink limitation:

- Sectorsize
   Reflink requires certain alignment

- Btrfs specific flags
   NODATASUM and NODATACOW flags difference between source/destination
   files will reject the reflink

There are some others that could return -EINVAL, like invalid remap flags.
But that's not possible inside btrfs-receive, as we will not use those
invalid flags.

Others are very basic checks, like the source should be a regular file,
the range should not overflow.

If the very basic requirement can't be met, it should be a send bug.
Thankfully we don't have such reports yet, except the ongoing RO->RW->RO
problem.

>
> 2) In any case, "caused by above reasons" sounds a bit unnatural to me
> (native English speaker). I would suggest replacing "(caused by above
> reasons)" with "in this way".

Thanks for the better expressions, I am definitely not the proper guy to
write docs...

>
> 3) And maybe add another sentence: "A warning message will be displayed
> when this happens, if the --verbose option is in effect".
>
>> +
>>   -q|--quiet::
>>   (deprecated) alias for global '-q' option
>>
>> diff --git a/cmds/receive.c b/cmds/receive.c
>> index 48c774cea587..31746d571016 100644
>> --- a/cmds/receive.c
>> +++ b/cmds/receive.c
>> @@ -76,6 +76,8 @@ struct btrfs_receive
>>   	struct subvol_uuid_search sus;
>>
>>   	int honor_end_cmd;
>> +
>> +	bool clone_fallback;
>>   };
>>
>>   static int finish_subvol(struct btrfs_receive *rctx)
>> @@ -705,6 +707,44 @@ out:
>>   	return ret;
>>   }
>>
>> +static int buffered_copy(int src_fd, int dst_fd, u64 src_offset, u64 len,
>> +			 u64 dest_offset)
>> +{
>> +	unsigned char buf[SZ_32K];
>> +	u64 copied = 0;
>> +	int ret = 0;
>> +
>> +	while (copied < len) {
>> +		u32 copy_len = min_t(u32, ARRAY_SIZE(buf), len - copied);
>> +		u32 written = 0;
>> +		ssize_t read_size;
>> +
>> +		read_size = pread(src_fd, buf, copy_len, src_offset + copied);
>> +		if (read < 0) {
>> +			ret = -errno;
>> +			error("failed to read source file: %m");
>> +			return ret;
>> +		}
>> +
>> +		/* Write the buffer to dest file */
>> +		while (written < read_size) {
>> +			ssize_t write_size;
>> +
>> +			write_size = pwrite(dst_fd, buf + written,
>> +					read_size - written,
>> +					dest_offset + copied + written);
>> +			if (write_size < 0) {
>> +				ret = -errno;
>> +				error("failed to write source file: %m");
>> +				return ret;
>> +			}
>> +			written += write_size;
>> +		}
>> +		copied += read_size;
>> +	}
>> +	return ret;
>> +}
>> +
>>   static int process_clone(const char *path, u64 offset, u64 len,
>>   			 const u8 *clone_uuid, u64 clone_ctransid,
>>   			 const char *clone_path, u64 clone_offset,
>> @@ -788,8 +828,17 @@ static int process_clone(const char *path, u64 offset, u64 len,
>>   	ret = ioctl(rctx->write_fd, BTRFS_IOC_CLONE_RANGE, &clone_args);
>>   	if (ret < 0) {
>>   		ret = -errno;
>> -		error("failed to clone extents to %s: %m", path);
>> -		goto out;
>> +		if (ret != -EINVAL || !rctx->clone_fallback) {
>> +			error("failed to clone extents to %s: %m", path);
>> +			goto out;
>> +		}
>> +
>> +		if (bconf.verbose >= 2)
>> +			warning(
>> +		"failed to clone extents to %s, fallback to buffered write",
>
> I think this message needs to tell the user how many bytes which they
> expected to be cloned are now being duplicated. Something like "failed
> to clone NNNNNN bytes to FILE, fallback to copying data".

Sure, that could help, and just like what Filipe mentioned, I'll also
add a summary for how many bytes in total are not cloned, after all
streams are received.

Thanks,
Qu

>
> Graham
>
>> +				path);
>> +		ret = buffered_copy(clone_fd, rctx->write_fd, clone_offset,
>> +				    len, offset);
>>   	}
>>
>>   out:
>> @@ -1197,6 +1246,8 @@ static const char * const cmd_receive_usage[] = {
>>   	"                 this file system is mounted.",
>>   	"--dump           dump stream metadata, one line per operation,",
>>   	"                 does not require the MOUNT parameter",
>> +	"--clone-fallback when clone operations fail, attempt to directly copy"
>> +	"                 the data instead"
>>   	"-v               deprecated, alias for global -v option",
>>   	HELPINFO_INSERT_GLOBALS,
>>   	HELPINFO_INSERT_VERBOSE,
>> @@ -1238,11 +1289,13 @@ static int cmd_receive(const struct cmd_struct *cmd, int argc, char **argv)
>>   	optind = 0;
>>   	while (1) {
>>   		int c;
>> -		enum { GETOPT_VAL_DUMP = 257 };
>> +		enum { GETOPT_VAL_DUMP = 257, GETOPT_VAL_CLONE_FALLBACK };
>>   		static const struct option long_opts[] = {
>>   			{ "max-errors", required_argument, NULL, 'E' },
>>   			{ "chroot", no_argument, NULL, 'C' },
>>   			{ "dump", no_argument, NULL, GETOPT_VAL_DUMP },
>> +			{ "clone-fallback", no_argument, NULL,
>> +				GETOPT_VAL_CLONE_FALLBACK},
>>   			{ "quiet", no_argument, NULL, 'q' },
>>   			{ NULL, 0, NULL, 0 }
>>   		};
>> @@ -1286,6 +1339,9 @@ static int cmd_receive(const struct cmd_struct *cmd, int argc, char **argv)
>>   		case GETOPT_VAL_DUMP:
>>   			dump = 1;
>>   			break;
>> +		case GETOPT_VAL_CLONE_FALLBACK:
>> +			rctx.clone_fallback = true;
>> +			break;
>>   		default:
>>   			usage_unknown_option(cmd, argv);
>>   		}
>>
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-09-30 14:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-30 11:48 [PATCH v2 0/2] btrfs-progs: receive: introduce new --clone-fallback option Qu Wenruo
2021-09-30 11:48 ` [PATCH v2 1/2] btrfs-progs: receive: fallback to buffered copy if clone failed Qu Wenruo
2021-09-30 12:47   ` Graham Cobb
2021-09-30 13:31     ` Filipe Manana
2021-09-30 14:19     ` Qu Wenruo
2021-09-30 12:51   ` Filipe Manana
2021-09-30 11:48 ` [PATCH v2 2/2] btrfs-progs: misc-tests: add test case for receive --clone-fallback Qu Wenruo
2021-09-30 12:53   ` Filipe Manana

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.